# spicedb
p
👋 Hello y'all, how're things going? I'm seeing some strange behaviour in our SpiceDB that I wouldn't know how to debug. When sending a request like:
```
zed permission lookup-resources document can_comment user:$uuid --page-limit 100
```
(`can_comment` is a permission backed by a relation that allows `user:*`, just in case it makes any difference.) We get a flamegraph that looks like this (see attached). The strange thing is that the SQL queries occur during the first ~10ms of the request, but then the request hangs for 30s until it times out on the server side. Do you know what could be causing this? Should we just discourage the use of lookup-resources for wildcard-backed relations? https://cdn.discordapp.com/attachments/844600078948630559/1376849986460979313/CleanShot_2025-05-27_at_10.08.06.png?ex=6836d2f9&is=68358179&hm=d7e08c9480a952fb7138c84293967c4f0e5026cece0c7090bdc934ba636fa971&
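For context on "backed by a relation that allows `user:*`", here is a minimal sketch of what such a schema might look like; the `commenter` relation and the rest of the `document` definition are assumptions for illustration, not taken from the real schema:

```
definition user {}

definition document {
    // allowing `user:*` means a single relationship such as
    // `document:readme#commenter@user:*` grants the relation to every user at once
    relation commenter: user | user:*

    // a permission "backed by" the wildcard-capable relation above
    permission can_comment = commenter
}
```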
the only error we see from SpiceDB's side is a `context canceled`, btw
y
that's unexpected. is it a particularly deep/wide schema? are you using gRPC?
`context canceled` isn't necessarily a problem - the dispatch service will show that in normal operation
j
@pepegar are you using some form of networking layer like Istio between your SpiceDB pods?
g
Btw, we're also seeing this behavior on our side (I was coming to the Discord to ask a similar question). It doesn't reach 30s in our case, but there's still a lot of discrepancy between the SQL query span durations and the dispatch service. We are using gRPC and I don't believe we are using anything like Istio. We have our SpiceDB pods running on EKS, but we're not using the SpiceDB Operator, in case that makes a difference.
y
what are you using for ingress?
g
Kong on EKS
Actually, that's not accurate. Our services point to the kubernetes service directly. So no ingresses apply here? We do have coredns on the cluster
y
the operator would mean that spicedb nodes in the cluster are able to dispatch to each other
but i'm not sure how it would manifest as the behavior you're seeing 🤔
g
What would you guys consider a "deep/wide schema"? We're in the middle of migrating from JWT-based RBAC to Authzed, and we opted to list all scopes (roughly all of our REST API resources) as permissions on a single definition that holds all user relationships to a single "root" resource, where each relationship corresponds to a role. This leads to a pretty lengthy definition (100 relations and ~1000 permissions). Could this be impacting Authzed's ability to walk the graph efficiently?
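A heavily trimmed sketch of the single-definition layout described above (relation and permission names invented for illustration; the real definition reportedly has ~100 relations and ~1000 permissions):

```
definition user {}

// one "root" object; every user is related to it via one relation per role
definition root {
    relation admin: user
    relation support_agent: user
    relation billing_manager: user
    // ... roughly 100 role relations in the real schema

    // each former JWT scope becomes a permission unioning the roles that grant it
    permission read_invoices = admin + billing_manager
    permission create_tickets = admin + support_agent
    // ... roughly 1000 scope permissions in the real schema
}
```

Each permission here is only one hop away from its subjects, so the definition is long but not deep in the sense discussed next.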
Also, @pepegar sorry for hijacking the thread. Do let me know if you'd prefer I move this to another one!
y
"deep" would mean lots of edges that have to be walked in order to answer a permission computation
i'd say 10 layers would be "deep"
"wide" would be a single subject that has lots of relations going away from it or a single object that has lots of relations going to it
and i'd say 1-10k relations is what i'd call "wide"
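As a rough illustration of those two terms (hypothetical schema, not from this thread):

```
definition user {}

// "deep": checking can_view on a deeply nested folder walks one
// parent->can_view edge per nesting level (~10 hops would count as deep)
definition folder {
    relation parent: folder
    relation viewer: user
    permission can_view = viewer + parent->can_view
}

// "wide": a single object or subject participating in very many relationships,
// e.g. folder:root#viewer@user:<uuid> written thousands of times
```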
g
Got it. That's not our case then
We're still on v1.28.0, I can try upgrading to latest and see if there's any impact. Would appreciate any other possible paths forward.
j
yeah... you should see some major performance improvements
especially if you're on CRDB or Postgres (for Postgres, enable the hint extension to get the full benefit)
g
Great! Upgrading as we speak. Will follow up
p
Hey folks, sorry I've been AFK for some time 🙂
- @yetitwo It's not really a deep schema:
```
permission can_comment = owner + delete + editor + comment + parent->can_comment + can_participate
```
(we don't have any `parent` data in the DB, that's a feature that hasn't landed yet, but the schema is prepared). By the measures you mentioned before (10 levels deep / 10k outgoing/incoming connections), the schema is neither deep nor wide.
- @yetitwo yes, we're using gRPC to talk to SpiceDB, configuring the client as follows:
```kotlin
  // coroutine stub from the Authzed-generated gRPC Kotlin bindings
  PermissionsServiceGrpcKt.PermissionsServiceCoroutineStub(
    ManagedChannelBuilder
        // headless k8s service: DNS resolves to the individual SpiceDB pod IPs
        .forTarget(headlessServiceUrl)
        .usePlaintext()
        // client-side load balancing across the resolved pod addresses
        .defaultLoadBalancingPolicy("round_robin")
        .build()
  )
```
And the server with:
```typescript
{
  name: "SPICEDB_DATASTORE_CONNECTION_BALANCING",
  value: "true",
},
{
  name: "SPICEDB_ENABLE_EXPERIMENTAL_WATCHABLE_SCHEMA_CACHE",
  value: "true",
},
{
  name: "SPICEDB_DISPATCH_CLUSTER_MAX_CONN_AGE",
  value: "15s",
},
```
- @Joey We use internal k8s networking, and we've configured the k8s service as headless to do load balancing on the client side.
No problem 🙂