# spicedb
t
Hey we're looking into reducing latency in our SpiceDB cluster. Looking at this trace, is it reasonable for the `DispatchCheck` to "hang" like it does? As far as I understand that's a gRPC call back to the cluster itself. Does it point to some networking issue? Or is it doing more than one might suspect from the span name? https://cdn.discordapp.com/attachments/844600078948630559/1341313337715200030/image.png?ex=67b58ae4&is=67b43964&hm=90f020200fa08b7a6f4b155059fcf8c501a4975d1df00616a3041b0afcc992d7&
v
That seems network related: the `DispatchCheck` gRPC call to another pod is taking a very long time. Are you using sidecars?
t
Yep. We're using Linkerd
v
Then that's almost certainly it. We've had other folks report similar issues when using a service mesh, and the issues disappear once it's removed.
SpiceDB does not need a service mesh - it knows how to discover its peers and talk to them directly. Adding one is unnecessary unless you have very specific requirements around transport and cluster topology.
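For reference, here's a minimal sketch of what direct dispatch looks like on Kubernetes. The `spicedb.default` service name and datastore settings are placeholders, and flag names can shift between releases, so double-check against `spicedb serve --help` for your version:

```sh
# Minimal sketch: SpiceDB resolves its own peers through the kubernetes:///
# resolver and dispatches to them directly on the dispatch port (50053 by
# default). Service name and datastore settings below are placeholders.
spicedb serve \
  --grpc-preshared-key "$SPICEDB_PRESHARED_KEY" \
  --datastore-engine postgres \
  --datastore-conn-uri "$DATASTORE_URI" \
  --dispatch-cluster-enabled=true \
  --dispatch-upstream-addr "kubernetes:///spicedb.default:50053"
```

The `kubernetes:///` resolver lets each pod find its peers through the service's endpoints, so there's nothing for a mesh to add on that path.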
t
Okay, thanks! We'll try to disable linkerd and see what happens!
v
did removing linkerd from the equation help?
t
I at least haven't observed spans like the above since removing it. Thanks for checking in! Still struggling with high (>250ms) request times in LookupResources, but I assume that's to be expected if you have more than a handful of relations/resources?
v
yeah LR can be slow, depending on your schema and data shape. One knob you can play with is `--dispatch-chunk-size`: if you have wide relations (relations with many relationships), it accelerates the dispatching by packing more resource IDs into a single dispatch. You should also definitely look into potential signs of contention in the connection pool, as LR calls cause fan-out in the number of queries, and into tuning your configured staleness to increase the odds of a cache hit.
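Very roughly, those knobs land on the serve command like this. This is a sketch only: the values are illustrative rather than recommendations, and flag names vary a little between releases, so check `spicedb serve --help` for yours:

```sh
# Illustrative values only; tune against your own traces and metrics.
#   --dispatch-chunk-size: packs more resource IDs into each dispatch,
#     which helps with wide relations
#   --datastore-conn-pool-read-max-open: headroom for the query fan-out
#     LookupResources causes (older releases expose a similarly named
#     connection pool flag instead)
#   --datastore-revision-quantization-interval: a wider staleness window
#     raises the odds of a cache hit
spicedb serve \
  --grpc-preshared-key "$SPICEDB_PRESHARED_KEY" \
  --datastore-engine postgres \
  --datastore-conn-uri "$DATASTORE_URI" \
  --dispatch-chunk-size 200 \
  --datastore-conn-pool-read-max-open 30 \
  --datastore-revision-quantization-interval 10s
```

Note that requests only benefit from the wider quantization window when they use `minimize_latency` (or `at_least_as_fresh`) consistency; fully consistent requests bypass that cache.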