acl filtering
# spicedb
t
Hey everyone, does anyone have an example of integrating SpiceDB with an application database so queries only load the authorized data? Working with single objects (read and write) seems relatively straightforward, since you validate whether the identity is allowed to interact with the specific object, but it gets hard once you start looking at lists of objects. I feel like I've missed something in my thinking on how to integrate this. So far I have this flow: imagine a recipe database; to list all recipes an identity has read access to, we do: 1. Request the list of all recipe ids the identity has access to from SpiceDB 2. Query the database with
SELECT * FROM recipes WHERE id IN (<list of ids from SpiceDB>)
3. Do stuff with the data. This seems like an incredibly slow solution for larger, more complicated queries, and also really awkward to maintain in source code, so I feel like I'm missing something fundamental here. Anyone have a better approach?
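For what it's worth, the pre-filter flow above can be sketched like this; sqlite3 stands in for the application database, and `lookup_accessible_recipe_ids` is a hypothetical stub for the SpiceDB lookup call (a real app would use a SpiceDB client library instead):

```python
import sqlite3

# Hypothetical stand-in for a SpiceDB lookup call that returns every
# recipe id the identity can read. A real app would call SpiceDB here.
def lookup_accessible_recipe_ids(identity: str) -> list[str]:
    fake_acls = {"user:tom": ["r1", "r3"]}  # made-up authorization data
    return fake_acls.get(identity, [])

def list_readable_recipes(conn: sqlite3.Connection, identity: str):
    ids = lookup_accessible_recipe_ids(identity)
    if not ids:
        return []
    # Step 2 from the flow: SELECT ... WHERE id IN (<ids from SpiceDB>),
    # with one bound placeholder per id.
    placeholders = ",".join("?" for _ in ids)
    return conn.execute(
        f"SELECT id, title FROM recipes WHERE id IN ({placeholders})", ids
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE recipes (id TEXT PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO recipes VALUES (?, ?)",
    [("r1", "mushroom soup"), ("r2", "salad"), ("r3", "stew")],
)
print(list_readable_recipes(conn, "user:tom"))
```

The pain point described above shows up in `lookup_accessible_recipe_ids`: the full authorized id list has to come back before the SQL query can even start.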
e
There are a few ways to do this. The one you've described we usually call "pre-filtering", because you're getting the authorized list up front and passing it to your database. This can be a good choice when the absolute number of authorized objects is fairly small - if it gets too large, it will start taking longer and longer to get the full list of authorized ids up front. You can also post-filter: query for the recipes you're looking for and then bulk-check the resulting ids in SpiceDB. This can work well as long as there's a decent distribution of authorized results in the full set - i.e. you don't want to have to page through a million recipes just to get to the million-and-first that is actually authorized. One of these two can work in most common cases, because usually you have some other limiting factor to reduce the set you're working with (e.g. the number of items you can reasonably display on a screen, or you can filter the set down ahead of time by authorizing some other object like an
org
). The times when they don't work as well is when you need complex searching and filtering over large sets. We built Materialize for that: https://authzed.com/blog/materialize-early-access - Materialize has an API that sends lists of authorized objects that you can ingest back into your own datastore to join against and filter by.
t
Thanks for that @ecordell , that's really helpful. The example is a bit contrived, but what I'm trying to do is find a performant way to get a list of all the entities of type X that identity Y has access to. There is a level of granularity to consider here: you could e.g. share a recipe with a different identity, but not an individual ingredient, as that seems silly. I was mostly worried about db performance, but it's quite a complicated query on the SpiceDB side as well. I hope SpiceDB can handle it hehe. I think pre-filtering is probably the way forward; I don't think loading millions of recipes into the app to filter them down is the right approach, since the db is designed with this work in mind. Materialize looks interesting though. Do you have any more details on how that works? I saw it integrates with PostgreSQL, does it generate PostgreSQL Row Level Security policies? That would be really interesting, and could potentially open the door for using ElectricSQL and local-first
m
FWIW, at my org we found the performance of pre-filtering to be untenable (the set of resources a user can have access to is quite large and the schema has a few intersections), but post-filtering has been very performant with this strategy: run our data query to select only ids, pass those to SpiceDB's bulk permission check, then plug the authorized ids back into the same original query to select all the fields needed to serialize the objects in the app. I mostly mention that in case you were picturing materializing objects in your app and then filtering down, rather than post-filtering still happening prior to materialization of your app's in-memory resources.
t
That's really interesting, and incredibly counter-intuitive to me. What scale are you working with? Millions of objects? It can't be free to load the data and transform it into the bulk format expected by SpiceDB. What's the process you go through? 1.
select id from recipes
2. transform data to SpiceDB format (in app) 3. Bulk check against SpiceDB (in app) 4. filter out unauthorized objects (in app) 5.
select * from recipes where id IN (<object ids>)
6. do stuff with data
e
Can you explain what's counter-intuitive about it? The process you wrote out sounds right to me, though you probably want to filter as much as you can in (1) with e.g.
select id from recipes where description like 'potato'
, etc. And it does have the problem that if authorized resources are sparse compared to all resources, it may take time to page through all of those checks.
> Materialize looks interesting though. Do you have any more details on how that works? I think we're planning to get the docs up on the doc site soon™️ - an example would make it clear, I think, but basically Materialize watches for changes to relationships in SpiceDB and proactively computes the permissions defined by the schema. There are two ways it can be consumed: either through accelerated queries (i.e. Checks and Lookups do no computation, they just read out pre-computed data, so they're very, very fast), or through a WatchPermissionSets API. The WatchPermissionSets API serves pairs of data like
(user:evan, 123789), (user:tom, 456123)
and
(123789, recipe:mushroom_soup#view), (456123, recipe:salad#view)
- the ids are internal materialize ids. You subscribe to specific permissions that you want to filter by and store those pairs in your database, and you join those two sets wherever the internal id is the same to find out what permissions you have - i.e. in this case,
user:evan
can view
recipe:mushroom_soup
and
user:tom
can view
recipe:salad
. You can join against those tables in your own queries to pre-filter without hitting SpiceDB at all. There will be some lag between when writes happen in SpiceDB and when the results pop up in Materialize (typically a second or two, but it depends on the schema and type of change). If your application requires it (i.e. very sensitive to new-enemies, probably not a recipe application), you can be extra safe by also post-filtering via the BulkCheck apis, which unlike before won't have the risk of being sparse.
so it would be like
select * from recipe join materialize on materialize.recipe_id = recipe.id where materialize.user = 'evan' and materialize.permission = 'view'
to get the filtered list
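To make that join concrete, here's a runnable sketch using sqlite3, with made-up table names for the ingested permission-set pairs (the actual schema you'd use to store the WatchPermissionSets output is up to you; this is just an illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE recipes (id TEXT PRIMARY KEY, title TEXT);
-- (subject, set_id) pairs ingested from the watch stream, e.g. (user:evan, 123789)
CREATE TABLE member_sets (subject TEXT, set_id INTEGER);
-- (set_id, resource, permission) pairs, e.g. (123789, mushroom_soup, view)
CREATE TABLE set_permissions (set_id INTEGER, resource TEXT, permission TEXT);
""")
conn.executemany("INSERT INTO recipes VALUES (?,?)",
                 [("mushroom_soup", "Mushroom Soup"), ("salad", "Salad")])
conn.executemany("INSERT INTO member_sets VALUES (?,?)",
                 [("user:evan", 123789), ("user:tom", 456123)])
conn.executemany("INSERT INTO set_permissions VALUES (?,?,?)",
                 [(123789, "mushroom_soup", "view"), (456123, "salad", "view")])

# Pre-filter locally: join the two ingested tables on the internal set id,
# then join against recipes - no SpiceDB call in the query path.
rows = conn.execute("""
    SELECT r.id, r.title
    FROM recipes r
    JOIN set_permissions sp ON sp.resource = r.id AND sp.permission = 'view'
    JOIN member_sets ms ON ms.set_id = sp.set_id
    WHERE ms.subject = 'user:evan'
""").fetchall()
print(rows)
```

The query returns only the recipes `user:evan` can view, matching the pairs described above.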
SpiceDB is still the source of truth so you get the benefits of separating your permissions but still the performance of local joins, and if you have microservices they can just subscribe to the permissions they care about. Materialize does the work of unwinding deep and nested queries so you're not doing a complicated recursive sql query to authorize (i.e. like you might end up with if you were implementing your own auth in SQL), just a simple join.
>I saw it integrates with PostgreSQL, does it generate PostgreSQL Row Level Security policies? That would be really interesting, and could potentially open the door for using ElectricSQL and local-first There's no integration for RLS right now, but that's a super interesting idea. It would require your users to map to pg users, but you could imagine the above example where, instead of joining, you just change the connected user.
t
The performance characteristics were the counter-intuitive part. I'm surprised that loading all the IDs from the DB into the app and then shipping them across to SpiceDB to process is more performant than using the LookupResources API. If you're using UUIDs, for a million rows that's 36MB of data for just the IDs that you're shipping over the wire. Twice, if you count from DB to app and from app to SpiceDB. And that doesn't account for any of the wrapping JSON
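(That 36MB figure checks out: the canonical textual form of a UUID is 36 characters, 32 hex digits plus 4 hyphens, so a million of them is roughly 36 MB before any framing.)

```python
import uuid

# The canonical string form of a UUID is 36 characters long.
uuid_len = len(str(uuid.uuid4()))

# A million textual UUIDs, ignoring JSON framing and separators.
payload_mb = uuid_len * 1_000_000 / 1_000_000
```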
Looking forward to the docs updating and getting to play with Materialize
e
The difference is that it just happens once, and then only the changes are synced down. You're not pulling 1m rows every time you need to make a query; Materialize keeps the dataset up to date for you
t
oh, I think I missed some semantics here. I thought Matt was talking about post-filtering without Materialize, which sounded counter-intuitive to me. The process described using Materialize makes a lot of sense to me, and I'm looking forward to seeing how it works in detail
m
I was indeed referring to post-filtering without Materialize. We are operating at the scale of a few million records in at least one case. I think it intuitively breaks down like this: if you have a large number of records and many of them are going to get excluded by SpiceDB, then a lookup call must check every record for inclusion only to end up choosing a few of them. If the SQL you run results in substantially fewer records (fewer than the total number of records SpiceDB knows about across all users), then a bulk check on only that reduced set results in many fewer checks within SpiceDB. Either way, caching of subproblems comes into play, but we have intersections in our schema that also make caching a bit less helpful for lookup requests, so there's just that much less work being done by SpiceDB (and less data being sent across the wire too). If instead you actually do expect the user to have access to a large portion of all records, or you expect your API to return millions of records rather than that just being the number you start with before filtering, then perf might be quite different for your use case than for ours.
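A back-of-envelope version of that trade-off, with entirely made-up numbers, just to show why the reduced candidate set wins when access is sparse:

```python
# Made-up scale: everything SpiceDB knows about vs. what survives the SQL filter.
total_records = 3_000_000   # all records a lookup would have to consider
sql_filtered = 500          # candidate ids surviving the WHERE clause

# Post-filter: one membership check per SQL-filtered candidate.
post_filter_checks = sql_filtered

# Lookup (pre-filter): membership must effectively be decided for every
# record the identity might have access to.
lookup_candidates = total_records

print(post_filter_checks, lookup_candidates)
```

The numbers flip, of course, if the user is authorized for most records or the SQL filter barely narrows anything, which is the caveat in the last sentence above.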