LookupResources performances SpiceDB #spicedb

LookupResources performances

williamdclt

05/15/2024, 8:54 AM

👋 we've been digging into "how to compute all permissions for a user". We've tried 2 approaches: - Do a

lookupResources

for every permission (~50 permissions) - Do a

bulkCheck

for every permission and every resource (~50 permissions, ~1000 resources, ~50k checks total) I've been surprised to find that: -

lookupResources

completely overwhelms the database, the pods CPU and the pods memory even without any parallelism (1 lookup at a time) -

bulkCheck

actually performs somewhat decently (it's too slow for our use-case, but that's probably because having 50k permissions is a fundamentally flawed idea) What I don't understand is: how can bulk-checking every single permission on every single resource possibly be more efficient than a

lookupResource

? This massive bulk-check seems like the naive approach to implement

lookupResources

, so lookup should work at least as well as bulk-checking (probably much better) I'm happy to provide more hard data, but does the assumption sound correct or am I missing something?

williamdclt

05/15/2024, 10:23 AM

We are on 1.30.1 btw

vroldanbet

05/15/2024, 10:37 AM

bulk check is pretty good at checking the same permission over multiple resources, because it batches it. So a bulk request of

resource:1...1000 view user:1

leads to batching SQL queries that need to check resources, so in the best case it's pretty efficient.

LookupResources

is a different monster. It needs to traverse the reachability graph for the user and permission, and depending on the existence of intersections and exclusions, it may have to issue check permissions. To support cursors, there is a bunch of metadata that needs to be tracked by the spicedb process and dispatched back and forth. I'm sure there are opportunities to optimize the later. I'd suggest looking into the RDS analytics to see what are the queries taking place, to rule out any missing indexes at least.

williamdclt

05/15/2024, 11:34 AM

I'm probably being naive here, but why is traversing the reachability graph such a problem? If I take the following schema:

Copy code

definition Document {
  relation org: Organisation
  relation viewers: User

  permission view = org->user & viewers
}

definition User {}

definition Organisation {
  user: User
}

Then I'd expect

lookup document view user:1

to do something like: - SELECT the orgs that user:1 is a member of - SELECT the documents that are viewable by those orgs - SELECT which documents

user:1

is a

viewer

of There's an intersection in the permission, but it doesn't seem like dispatching `check`s is needed? Again I'm sure I'm being naive. Possibly I'm using knowledge of the schema which is difficult for SpiceDB to get (eg low cardinality of Organisation, high cardinality of Document, or the fact that Organisation is alike to a tenancy boundary), but I'd be keen to understand that better: maybe I can redesign my schema to make these lookups efficient?

williamdclt

05/15/2024, 11:35 AM

> I'd suggest looking into the RDS analytics to see what are the queries taking place, to rule out any missing indexes at least. I'll have a look at that. But the high CPU and RAM usage of the pods would still be a problem

Joey

05/15/2024, 3:33 PM

(will respond in a bit to explain)

Joey

05/15/2024, 3:51 PM

LR, as it stands, will do the following: - select the organizations the user is a member of - then walk to Document - see the intersection, mark the walk as needing a check - keep walking until the permission requested is found - since a check was marked as necessary, issue a check for that permission and the user

Joey

05/15/2024, 3:51 PM

in the above case, because we don't yet have stats, we pick the far left-most branch of the intersection for the walk

williamdclt

05/15/2024, 4:20 PM

I can't shake the idea that the very concept of needing a check is weird: LR is fundamentally a graph resolution problem, something that graph-focused technologies like Neo4J, RDF triplestores and datalog thingies can handle fine without (AFAIK) an extra check-like step

Joey

05/15/2024, 4:21 PM

the problem is how to "wait" for the other branch

Joey

05/15/2024, 4:21 PM

one branch could have 3 steps, the other 30

Joey

05/15/2024, 4:21 PM

since we start walking from the subject->permission

Joey

05/15/2024, 4:21 PM

we can't "wait" for the other branches to come in

Joey

05/15/2024, 4:22 PM

at least not without a very complicated dispatching mechanism

Joey

05/15/2024, 4:22 PM

checking at the root is not optimal and we know that

Joey

05/15/2024, 4:22 PM

for one, it redoes the walk we just did

williamdclt

05/15/2024, 5:25 PM

I'm struggling to wrap my head around this "waiting" problem, without the mental model of how things are implemented. But again, regardless of SpiceDB's current implementation, isn't this a fundamentally solved problem? Neo4j does it, something like (not a Cypher expert):

Copy code

cypher
MATCH (u:User WHERE id:1) -[:viewer]-> (d:Document),
      (d) -[:org]-> (o:Organisation) -[:user]-> (u)
RETURN d.id

RDF triplestores can do it efficiently too, and there's cypher-to-sql engines as well as SPARQL-to-SQL

williamdclt

05/15/2024, 5:26 PM

I suppose it's a problem you solved differently with Materialized? But that's not something available in SpiceDB sadly (any plan? 👀 )

Joey

05/15/2024, 5:30 PM

Materialize actively updates its store; its not running queries

Joey

05/15/2024, 5:30 PM

and it will eventually be available on prem

williamdclt

05/15/2024, 5:31 PM

> and it will eventually be available on prem that's exciting!

674 Views

Previous Next