# spicedb
t
Hey we're looking into reducing latency in our SpiceDB cluster. Looking at this trace, is it reasonable for the `DispatchCheck` to "hang" like it does? As far as I understand that's a gRPC call back to the cluster itself. Does it point to some networking issue? Or is it doing more than one might suspect from the span name? https://cdn.discordapp.com/attachments/844600078948630559/1341313337715200030/image.png?ex=67b58ae4&is=67b43964&hm=90f020200fa08b7a6f4b155059fcf8c501a4975d1df00616a3041b0afcc992d7&
v
That seems network related: the `DispatchCheck` gRPC call to another pod is taking a very long time. Are you using sidecars?
t
Yep. We're using Linkerd
v
Then that's almost certainly it. We've had other folks report similar issues when using a service mesh, and the issues disappear once it's removed.
SpiceDB does not need a service mesh - it knows how to discover its peers and talk to them directly. Adding one is unnecessary unless you have very specific requirements around transport and cluster topology.
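For reference, here's a minimal sketch of what direct dispatch looks like on Kubernetes. The `spicedb.default` service name and datastore settings are placeholders, and flag names can shift between releases, so double-check against `spicedb serve --help` for your version:

```sh
# Minimal sketch: SpiceDB resolves its own peers through the kubernetes:///
# resolver and dispatches to them directly on the dispatch port (50053 by
# default). Service name and datastore settings below are placeholders.
spicedb serve \
  --grpc-preshared-key "$SPICEDB_PRESHARED_KEY" \
  --datastore-engine postgres \
  --datastore-conn-uri "$DATASTORE_URI" \
  --dispatch-cluster-enabled=true \
  --dispatch-upstream-addr "kubernetes:///spicedb.default:50053"
```

The `kubernetes:///` resolver lets each pod find its peers through the service's endpoints, so there's nothing for a mesh to add on that path.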
t
Okay, thanks! We'll try to disable linkerd and see what happens!
v
did removing linkerd from the equation help?
t
I at least haven't observed spans like the above since removing it. Thanks for checking in! Still struggling with high (>250ms) request times in LookupResources, but I assume that's to be expected if you have more than a handful of relations/resources?
v
yeah LR can be slow, depending on your schema and data shape. One knob you can play with is `--dispatch-chunk-size`: if you have wide relations (relations with many relationships), it accelerates the dispatching by packing more resource IDs into a single dispatch. You should also definitely look into potential signs of contention in the connection pool, as LR calls cause fan-out in the number of queries, and into tuning your configured staleness to increase the odds of a cache hit.
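Very roughly, those knobs land on the serve command like this. This is a sketch only: the values are illustrative rather than recommendations, and flag names vary a little between releases, so check `spicedb serve --help` for yours:

```sh
# Illustrative values only; tune against your own traces and metrics.
#   --dispatch-chunk-size: packs more resource IDs into each dispatch,
#     which helps with wide relations
#   --datastore-conn-pool-read-max-open: headroom for the query fan-out
#     LookupResources causes (older releases expose a similarly named
#     connection pool flag instead)
#   --datastore-revision-quantization-interval: a wider staleness window
#     raises the odds of a cache hit
spicedb serve \
  --grpc-preshared-key "$SPICEDB_PRESHARED_KEY" \
  --datastore-engine postgres \
  --datastore-conn-uri "$DATASTORE_URI" \
  --dispatch-chunk-size 200 \
  --datastore-conn-pool-read-max-open 30 \
  --datastore-revision-quantization-interval 10s
```

Note that requests only benefit from the wider quantization window when they use `minimize_latency` (or `at_least_as_fresh`) consistency; fully consistent requests bypass that cache.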