c
Hi! Sometimes we see some very long grpc.time_ms values, in this case about 16 minutes:
{"level":"info","traceID":"93d6aac56306fc4858e366e4a7bac301","protocol":"grpc","grpc.component":"server","grpc.service":"authzed.api.v1.PermissionsService","grpc.method":"WriteRelationships","grpc.method_type":"unary","requestID":"cudn2b75pq3c73fk49sg","peer.address":"10.42.5.93:53736","grpc.start_time":"2025-01-30T12:31:40Z","grpc.code":"OK","grpc.time_ms":950686,"time":"2025-01-30T12:47:30Z","message":"finished call"}
We are running v1.31.0 in Kubernetes and are using PostgreSQL. We have OTEL enabled, so we can see the trace. After 16 minutes the transaction continues as normal?
https://cdn.discordapp.com/attachments/844600078948630559/1334511826762989628/8PKAx5Cf7C0YAAAAASUVORK5CYII.png?ex=679ccc7d&is=679b7afd&hm=aef5c8475f9c47ebde27585fa49c13b6ec93d598d72ec969aef1e4a7665a9394&
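(950686 ms works out to roughly 15 minutes 51 seconds, which matches the gap between grpc.start_time 12:31:40 and time 12:47:30.)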
v
I wouldn't think that's possible, because there are server-side deadlines. So it feels like it's blocked somewhere else.
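For context, the server-side deadline is enforced through the request context, roughly like this (an illustrative sketch, not SpiceDB's actual middleware). The important caveat is that anything blocking without honoring the context won't be interrupted by the deadline:
```go
package middleware

import (
	"context"
	"time"

	"google.golang.org/grpc"
)

// deadlineUnaryInterceptor is a hypothetical sketch of a server-side deadline:
// every unary handler runs under a context that expires after max, so a handler
// that honors ctx (and passes it down to the datastore) should never run for
// 16 minutes. Code that blocks without checking ctx still can.
func deadlineUnaryInterceptor(max time.Duration) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req any, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (any, error) {
		ctx, cancel := context.WithTimeout(ctx, max)
		defer cancel()
		return handler(ctx, req)
	}
}
```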
is this a gRPC call or an HTTP API call?
c
This is a gRPC call
v
did a database restart happen during that timeframe?
c
No.
j
Is Postgres saturated on the number of open connections? Based on those events, it could have been stuck waiting for a connection.
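A quick way to check (rough sketch; DATABASE_URL here is just a placeholder for your connection string) is to compare the current backend count against max_connections:
```go
package main

import (
	"context"
	"fmt"
	"log"
	"os"

	"github.com/jackc/pgx/v5"
)

func main() {
	ctx := context.Background()
	conn, err := pgx.Connect(ctx, os.Getenv("DATABASE_URL")) // placeholder connection string
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close(ctx)

	var open, maxConns int
	// Count current backends and compare against the server's max_connections.
	err = conn.QueryRow(ctx,
		`SELECT count(*), current_setting('max_connections')::int FROM pg_stat_activity`,
	).Scan(&open, &maxConns)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("open connections: %d / %d\n", open, maxConns)
}
```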
c
We cannot see any problems with the connections. But another observation is that it always happens with CheckPermission or Dispatch?
We can see from the TCP dump that the connection is established correctly. So what could be happening during the next 16 minutes, before SpiceDB starts sending the BEGIN to PostgreSQL?
v
The trace above seems to suggest it also happens with `WriteRelationships`. We know there are some issues with cancellation of SQL connections, so I wonder if your connection pool is starved for 16 minutes, which seems excessive, but I can't think of anywhere else the service could be blocked. Can you check the events on the topmost `v1.PermissionsService/WriteRelationships` span? You should see something like the snapshot below; it should tell you exactly where it is waiting. If it's after `read write transaction` and before `preconditions`, then we know it's likely related to connection pools. I suggest you look into the connection pool metrics, which should give you a lot of insight.
https://cdn.discordapp.com/attachments/1334511826989350923/1335917941748666378/image.png?ex=67a1ea09&is=67a09889&hm=36976fde8305d40fd877d640fe463e2410a430e913f5aa82886cfc579998d05a& https://cdn.discordapp.com/attachments/1334511826989350923/1335917942008844369/image.png?ex=67a1ea09&is=67a09889&hm=bc968a72918b1de462430159755766b72ba973d8ee8c8e04af571e96775d6839&
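To the best of my knowledge the Postgres datastore uses pgx under the hood, so starvation shows up in the pgxpool counters. As a rough illustration (not SpiceDB code) of the signals to look for:
```go
package poolcheck

import (
	"log"

	"github.com/jackc/pgx/v5/pgxpool"
)

// logPoolStats prints the pgxpool counters that reveal pool starvation: a growing
// EmptyAcquireCount and a large cumulative AcquireDuration mean callers are
// queueing for a connection instead of getting one immediately.
func logPoolStats(pool *pgxpool.Pool) {
	s := pool.Stat()
	log.Printf("acquired=%d idle=%d max=%d empty_acquires=%d total_acquire_wait=%s",
		s.AcquiredConns(), s.IdleConns(), s.MaxConns(),
		s.EmptyAcquireCount(), s.AcquireDuration())
}
```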
c
We solved this problem. It was the firewall dropping connections.
v
oh heh, good to hear!