c
Hi! Sometimes we see some very long grpc.time_ms values, in this case about 16 minutes:
{"level":"info","traceID":"93d6aac56306fc4858e366e4a7bac301","protocol":"grpc","grpc.component":"server","grpc.service":"authzed.api.v1.PermissionsService","grpc.method":"WriteRelationships","grpc.method_type":"unary","requestID":"cudn2b75pq3c73fk49sg","peer.address":"10.42.5.93:53736","grpc.start_time":"2025-01-30T12:31:40Z","grpc.code":"OK","grpc.time_ms":950686,"time":"2025-01-30T12:47:30Z","message":"finished call"}
We are running v1.31.0 in Kubernetes and are using PostgreSQL. We have OTEL enabled, so we can see the trace. After 16 minutes the transaction continues as normal?
https://cdn.discordapp.com/attachments/844600078948630559/1334511826762989628/8PKAx5Cf7C0YAAAAASUVORK5CYII.png?ex=679ccc7d&is=679b7afd&hm=aef5c8475f9c47ebde27585fa49c13b6ec93d598d72ec969aef1e4a7665a9394&
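(950686 ms works out to roughly 15 minutes 51 seconds, which matches the gap between grpc.start_time 12:31:40 and time 12:47:30.)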
v
I wouldn't think that's possible, because there are server-side deadlines. So it feels like it's blocked somewhere else.
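For context, the server-side deadline is enforced through the request context, roughly like this (an illustrative sketch, not SpiceDB's actual middleware). The important caveat is that anything blocking without honoring the context won't be interrupted by the deadline:
```go
package middleware

import (
	"context"
	"time"

	"google.golang.org/grpc"
)

// deadlineUnaryInterceptor is a hypothetical sketch of a server-side deadline:
// every unary handler runs under a context that expires after max, so a handler
// that honors ctx (and passes it down to the datastore) should never run for
// 16 minutes. Code that blocks without checking ctx still can.
func deadlineUnaryInterceptor(max time.Duration) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
		ctx, cancel := context.WithTimeout(ctx, max)
		defer cancel()
		return handler(ctx, req)
	}
}
```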
is this a gRPC call or an HTTP API call?
c
This is a gRPC call
v
did a database restart happen during that timeframe?
c
No.
j
Is Postgres saturated on the number of open connections? Based on those events, it could have been stuck waiting for a connection.
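A quick way to check (rough sketch; DATABASE_URL here is just a placeholder for your connection string) is to compare the current backend count against max_connections:
```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL")) // placeholder connection string
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	var open, maxConns int
	// Count current backends and compare against the server's max_connections.
	err = conn.QueryRow(ctx,
		`SELECT count(*), current_setting('max_connections')::int FROM pg_stat_activity`,
	).Scan(&open, &maxConns)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("open connections: %d / %d\n", open, maxConns)
}
```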
c
We cannot see any problems with the connections. But another observation is that it always happens with CheckPermission or Dispatch?
We can see from the TCP dump that the connection is established correctly. So what could be happening during the next 16 minutes, before SpiceDB starts sending the BEGIN to PostgreSQL?
v
The trace above seems to suggest it also happens with `WriteRelationships`. We know there are some issues with cancellation of SQL connections, so I wonder if your connection pool is starved for 16 minutes, which seems excessive, but I can't think of anywhere else the service could be blocked. Can you check the events on the topmost `v1.PermissionsService/WriteRelationships` span? You should see something like the snapshot below; it should tell you exactly where it is waiting. If it's after `read write transaction` and before `preconditions`, then we know it's likely related to connection pools. I suggest you look into the connection pool metrics, which should give you a lot of insight.
https://cdn.discordapp.com/attachments/1334511826989350923/1335917941748666378/image.png?ex=67a1ea09&is=67a09889&hm=36976fde8305d40fd877d640fe463e2410a430e913f5aa82886cfc579998d05a& https://cdn.discordapp.com/attachments/1334511826989350923/1335917942008844369/image.png?ex=67a1ea09&is=67a09889&hm=bc968a72918b1de462430159755766b72ba973d8ee8c8e04af571e96775d6839&
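To the best of my knowledge the Postgres datastore uses pgx under the hood, so starvation shows up in the pgxpool counters. As a rough illustration (not SpiceDB code) of the signals to look for:
```go
package poolcheck

import (
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
)

// logPoolStats prints the pgxpool counters that reveal pool starvation: a growing
// EmptyAcquireCount and a large cumulative AcquireDuration mean callers are
// queueing for a connection instead of getting one immediately.
func logPoolStats(pool *pgxpool.Pool) {
	s := pool.Stat()
	log.Printf("acquired=%d idle=%d max=%d empty_acquires=%d total_acquire_wait=%s",
		s.AcquiredConns(), s.IdleConns(), s.MaxConns(),
		s.EmptyAcquireCount(), s.AcquireDuration())
}
```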
c
We solved this problem. It was the firewall dropping connections.
v
oh heh, good to hear!