# spicedb
p
👋 Hello y'all, how're things going? I'm seeing some strange behaviour in our SpiceDB that I wouldn't know how to debug. When sending a request like:
```
zed permission lookup-resources document can_comment user:$uuid --page-limit 100
```
(`can_comment` is a permission backed by a relation that allows `user:*`, just in case it makes any difference.) We get a flamegraph that looks like this (see attached). The strange thing is that the SQL queries occur during the first ~10ms of the request, but then the request hangs for 30s until it times out on the server side. Do you know what could be causing this? Should we just discourage the use of lookup-resources for wildcard-backed relations? https://cdn.discordapp.com/attachments/844600078948630559/1376849986460979313/CleanShot_2025-05-27_at_10.08.06.png?ex=6836d2f9&is=68358179&hm=d7e08c9480a952fb7138c84293967c4f0e5026cece0c7090bdc934ba636fa971&
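For context on "backed by a relation that allows `user:*`", here is a minimal sketch of what such a schema might look like; the `commenter` relation and the rest of the `document` definition are assumptions for illustration, not taken from the real schema:

```
definition user {}

definition document {
    // allowing `user:*` means a single relationship such as
    // `document:readme#commenter@user:*` grants the relation to every user at once
    relation commenter: user | user:*

    // a permission "backed by" the wildcard-capable relation above
    permission can_comment = commenter
}
```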
the only error we see from SpiceDB's side is a `context canceled`, btw
y
that's unexpected. is it a particularly deep/wide schema? are you using gRPC?
`context canceled` isn't necessarily a problem - the dispatch service will show that in normal operation
j
@pepegar are you using some form of networking layer like Istio between your SpiceDB pods?
g
Btw, we're also seeing this behavior on our side (I was coming to the Discord to ask a similar question). It doesn't reach 30s in our case, but there's still a lot of discrepancy between the SQL query span durations and the dispatch service. We are using gRPC and I don't believe we are using anything like Istio. We have our SpiceDB pods running on EKS, but we're not using the SpiceDB Operator, in case that makes a difference.
y
what are you using for ingress?
g
Kong on EKS
Actually, that's not accurate. Our services point to the kubernetes service directly. So no ingresses apply here? We do have coredns on the cluster
y
the operator would mean that spicedb nodes in the cluster are able to dispatch to each other
but i'm not sure how it would manifest as the behavior you're seeing 🤔
g
What would you guys consider a "deep/wide schema"? We're in the middle of migrating from JWT-based RBAC to Authzed, and we opted to list all scopes (roughly all of our REST API resources) as permissions on a single definition that holds all user relationships to a single "root" resource, where each relationship corresponds to a role. This leads to a pretty lengthy definition (100 relations and ~1000 permissions). Could this be impacting Authzed's ability to walk the graph efficiently?
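A heavily trimmed sketch of the single-definition layout described above (relation and permission names invented for illustration; the real definition reportedly has ~100 relations and ~1000 permissions):

```
definition user {}

// one "root" object; every user is related to it via one relation per role
definition root {
    relation admin: user
    relation support_agent: user
    relation billing_manager: user
    // ... roughly 100 role relations in the real schema

    // each former JWT scope becomes a permission unioning the roles that grant it
    permission read_invoices = admin + billing_manager
    permission create_tickets = admin + support_agent
    // ... roughly 1000 scope permissions in the real schema
}
```

Each permission here is only one hop away from its subjects, so the definition is long but not deep in the sense discussed next.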
Also, @pepegar sorry for hijacking the thread. Do let me know if you'd prefer I move this to another one!
y
"deep" would mean lots of edges that have to be walked in order to answer a permission computation
i'd say 10 layers would be "deep"
"wide" would be a single subject that has lots of relations going away from it or a single object that has lots of relations going to it
and i'd say 1-10k relations is what i'd call "wide"
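As a rough illustration of those two terms (hypothetical schema, not from this thread):

```
definition user {}

// "deep": checking can_view on a deeply nested folder walks one
// parent->can_view edge per nesting level (~10 hops would count as deep)
definition folder {
    relation parent: folder
    relation viewer: user
    permission can_view = viewer + parent->can_view
}

// "wide": a single object or subject participating in very many relationships,
// e.g. folder:root#viewer@user:<uuid> written thousands of times
```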
g
Got it. That's not our case then
We're still on v1.28.0, I can try upgrading to latest and see if there's any impact. Would appreciate any other possible paths forward.
j
yeah... you should see some major performance improvements
especially if you're on CRDB or Postgres (for Postgres, enable the hint extension to get the full benefit)
g
Great! Upgrading as we speak. Will follow up
p
Hey folks, sorry I've been AFK for some time 🙂
- @yetitwo It's not really a deep schema:
```
permission can_comment = owner + delete + editor + comment + parent->can_comment + can_participate
```
(we don't have any `parent` data in the DB, that's a feature that hasn't landed yet, but the schema is prepared). By the measures you mentioned before (10 levels deep / 10k outgoing/incoming connections), the schema is neither deep nor wide.
- @yetitwo yes, we're using gRPC to talk to SpiceDB, configuring the client as follows:
```kotlin
  // coroutine stub from the Authzed-generated gRPC Kotlin bindings
  PermissionsServiceGrpcKt.PermissionsServiceCoroutineStub(
    ManagedChannelBuilder
        // headless k8s service: DNS resolves to the individual SpiceDB pod IPs
        .forTarget(headlessServiceUrl)
        .usePlaintext()
        // client-side load balancing across the resolved pod addresses
        .defaultLoadBalancingPolicy("round_robin")
        .build()
  )
```
And the server with:
```typescript
{
  name: "SPICEDB_DATASTORE_CONNECTION_BALANCING",
  value: "true",
},
{
  name: "SPICEDB_ENABLE_EXPERIMENTAL_WATCHABLE_SCHEMA_CACHE",
  value: "true",
},
{
  name: "SPICEDB_DISPATCH_CLUSTER_MAX_CONN_AGE",
  value: "15s",
},
```
- @Joey We use internal k8s networking, and we've configured the k8s service as headless to do load balancing on the client side.
No problem 🙂