LookupResources performances
# spicedb
w
đź‘‹ we've been digging into "how to compute all permissions for a user". We've tried 2 approaches: - Do a
lookupResources
for every permission (~50 permissions) - Do a
bulkCheck
for every permission and every resource (~50 permissions, ~1000 resources, ~50k checks total) I've been surprised to find that: -
lookupResources
completely overwhelms the database, the pods CPU and the pods memory even without any parallelism (1 lookup at a time) -
bulkCheck
actually performs somewhat decently (it's too slow for our use-case, but that's probably because having 50k permissions is a fundamentally flawed idea) What I don't understand is: how can bulk-checking every single permission on every single resource possibly be more efficient than a
lookupResource
? This massive bulk-check seems like the naive approach to implement
lookupResources
, so lookup should work at least as well as bulk-checking (probably much better) I'm happy to provide more hard data, but does the assumption sound correct or am I missing something?
We are on 1.30.1 btw
v
bulk check is pretty good at checking the same permission over multiple resources, because it batches it. So a bulk request of
resource:1...1000 view user:1
leads to batching SQL queries that need to check resources, so in the best case it's pretty efficient.
LookupResources
is a different monster. It needs to traverse the reachability graph for the user and permission, and depending on the existence of intersections and exclusions, it may have to issue check permissions. To support cursors, there is a bunch of metadata that needs to be tracked by the spicedb process and dispatched back and forth. I'm sure there are opportunities to optimize the later. I'd suggest looking into the RDS analytics to see what are the queries taking place, to rule out any missing indexes at least.
w
I'm probably being naive here, but why is traversing the reachability graph such a problem? If I take the following schema:
Copy code
definition Document {
  relation org: Organisation
  relation viewers: User

  permission view = org->user & viewers
}

definition User {}

definition Organisation {
  user: User
}
Then I'd expect
lookup document view user:1
to do something like: - SELECT the orgs that user:1 is a member of - SELECT the documents that are viewable by those orgs - SELECT which documents
user:1
is a
viewer
of There's an intersection in the permission, but it doesn't seem like dispatching `check`s is needed? Again I'm sure I'm being naive. Possibly I'm using knowledge of the schema which is difficult for SpiceDB to get (eg low cardinality of Organisation, high cardinality of Document, or the fact that Organisation is alike to a tenancy boundary), but I'd be keen to understand that better: maybe I can redesign my schema to make these lookups efficient?
> I'd suggest looking into the RDS analytics to see what are the queries taking place, to rule out any missing indexes at least. I'll have a look at that. But the high CPU and RAM usage of the pods would still be a problem
j
(will respond in a bit to explain)
LR, as it stands, will do the following: - select the organizations the user is a member of - then walk to Document - see the intersection, mark the walk as needing a check - keep walking until the permission requested is found - since a check was marked as necessary, issue a check for that permission and the user
in the above case, because we don't yet have stats, we pick the far left-most branch of the intersection for the walk
w
I can't shake the idea that the very concept of needing a check is weird: LR is fundamentally a graph resolution problem, something that graph-focused technologies like Neo4J, RDF triplestores and datalog thingies can handle fine without (AFAIK) an extra check-like step
j
the problem is how to "wait" for the other branch
one branch could have 3 steps, the other 30
since we start walking from the subject->permission
we can't "wait" for the other branches to come in
at least not without a very complicated dispatching mechanism
checking at the root is not optimal and we know that
for one, it redoes the walk we just did
w
I'm struggling to wrap my head around this "waiting" problem, without the mental model of how things are implemented. But again, regardless of SpiceDB's current implementation, isn't this a fundamentally solved problem? Neo4j does it, something like (not a Cypher expert):
Copy code
cypher
MATCH (u:User WHERE id:1) -[:viewer]-> (d:Document),
      (d) -[:org]-> (o:Organisation) -[:user]-> (u)
RETURN d.id
RDF triplestores can do it efficiently too, and there's cypher-to-sql engines as well as SPARQL-to-SQL
I suppose it's a problem you solved differently with Materialized? But that's not something available in SpiceDB sadly (any plan? 👀 )
j
Materialize actively updates its store; its not running queries
and it will eventually be available on prem
w
> and it will eventually be available on prem that's exciting!