# spicedb
r
Hi team, I'm trying to understand how to strike a good balance between performance and maintainability/flexibility for my use case. My app is a data app: at its core it's conceptually one huge table, and our customers want to be able to query that data. They want to be able to grant/revoke permissions on a per-user basis for each data point in two ways:
- by the type of row (example: UserA is allowed to see rows with type `Amazon orders` but UserB is not)
- by the field (example: UserA is allowed to see data belonging to a field called `Profit` but UserB is not)
Row types are practically unlimited and set by the users, but fields are limited and set by the system. Row types and fields are essentially orthogonal, so any row type can have data in any field. So let's say I model this with something like:
definition row_type {
    relation viewer: user
}

definition field {
    relation viewer: user
}

definition data_point {
    relation type: row_type
    relation field: field

    permission view = type->viewer & field->viewer
}
This is highly simplified, but conceptually it maps onto what I'm doing. It's also very easy to understand, even once the added complexity is introduced (fields and row_types are recursive, users can be in groups, etc.), which is lovely. But if this is my schema, I'm going to have to register a relationship to both a row_type and a field for every single data point that I pull into my data lake, which is on the order of 10^7 to 10^8 per day, let's say. The compute involved is of course one thing, but also my information about auth relationships is going to be at least as big as my data lake itself, right? That isn't ideal.
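To put rough numbers on that concern, here is a minimal Python sketch of the first schema as an in-memory model. It is not SpiceDB client code: `Store`, `add_data_point`, `can_view`, and `daily_link_relationships` are names made up for illustration, and it reads the daily volume above as roughly 10^7 to 10^8 data points.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy in-memory stand-in for the relationships in the first schema."""
    row_type_viewers: dict = field(default_factory=dict)  # row_type -> set of users
    field_viewers: dict = field(default_factory=dict)     # field -> set of users
    data_points: dict = field(default_factory=dict)       # data_point -> (row_type, field)

    def grant_row_type(self, row_type, user):
        self.row_type_viewers.setdefault(row_type, set()).add(user)

    def grant_field(self, field_name, user):
        self.field_viewers.setdefault(field_name, set()).add(user)

    def add_data_point(self, dp_id, row_type, field_name):
        # In the schema this corresponds to two relationships:
        # data_point#type and data_point#field.
        self.data_points[dp_id] = (row_type, field_name)

    def can_view(self, dp_id, user):
        # Mirrors: permission view = type->viewer & field->viewer
        row_type, field_name = self.data_points[dp_id]
        return (user in self.row_type_viewers.get(row_type, set())
                and user in self.field_viewers.get(field_name, set()))

    def relationship_count(self):
        # Viewer grants plus two link relationships per data point.
        grants = sum(len(users) for users in self.row_type_viewers.values())
        grants += sum(len(users) for users in self.field_viewers.values())
        return grants + 2 * len(self.data_points)


def daily_link_relationships(points_per_day):
    """Two relationships (type + field) per ingested data point."""
    return 2 * points_per_day
```

Under these assumptions, `daily_link_relationships(10**7)` and `daily_link_relationships(10**8)` give 2×10^7 and 2×10^8 new link relationships per day, before counting any viewer grants, so the relationship store grows at twice the row rate of the data itself.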
So now if I think of adding a layer of abstraction like `view`, I get something like this:
definition view {
    relation row_type: row_type
    relation field: field
 
    permission view = row_type->viewer & field->viewer
}
But of course the problem there is that I need to have permissions to view all the row types and all the fields that are connected to that view, not just one. I could implement that at the application layer, but then why am I using SpiceDB? There's got to be a way... who wants to help me brainstorm?
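The "all, not just one" issue can be made concrete with a small in-memory Python sketch that contrasts two readings of a view linked to several row types and fields. The function names and the sample data are hypothetical. As far as I understand, an arrow such as `row_type->viewer` is satisfied when any one of the linked objects grants the permission; treat that as an assumption to verify against the SpiceDB docs.

```python
def can_view_any(view, user, row_type_viewers, field_viewers):
    """'Any' reading: one visible row type and one visible field is enough."""
    return (
        any(user in row_type_viewers.get(rt, set()) for rt in view["row_types"])
        and any(user in field_viewers.get(f, set()) for f in view["fields"])
    )


def can_view_all(view, user, row_type_viewers, field_viewers):
    """'All' reading: every linked row type and every linked field must be visible."""
    return (
        all(user in row_type_viewers.get(rt, set()) for rt in view["row_types"])
        and all(user in field_viewers.get(f, set()) for f in view["fields"])
    )


# Hypothetical sample data: the view spans two row types and one field,
# but user_a is a viewer of only one of the row types.
view = {"row_types": ["amazon_orders", "ebay_orders"], "fields": ["profit"]}
row_type_viewers = {"amazon_orders": {"user_a"}}
field_viewers = {"profit": {"user_a"}}

any_result = can_view_any(view, "user_a", row_type_viewers, field_viewers)  # True
all_result = can_view_all(view, "user_a", row_type_viewers, field_viewers)  # False
```

The two readings diverge as soon as a view spans more than one row type or field, which is the gap that would otherwise have to be closed in application code.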
y
i'd be curious to hear what the devs say, but my assumption is that your first inclination is probably a good place to start
spicedb is designed to deal with massive throughput and datasets
> my information about auth relationships is going to be at least as big as my data lake itself, right
do you mean just in the sense of row cardinality, or do you mean in terms of storage space?
r
I guess both, although I'm less concerned about the actual space it takes and more about the fact that it's just a massive set of rows
y
yeah... your schema is also very simple in that case, which i think would make for pretty quick lookups even with a large dataset
though take this with a grain of salt - i wouldn't call myself an expert in performance for either postgres or spicedb
also it sounds like your app is very tabular, which would mean lots of bulk/lookup operations, and those could get heavy 🤔
r
It's also possibly a problem for us that we're using BigQuery, so we don't actually own the individual data points at the level of application logic... when data workflows output data in our tables, we can't make a call to SpiceDB for each cell. But there must be some way to do it with a hook... We're also exploring using BQ's native authorization APIs, but they're not as granular as we'd like them to be
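One way such a hook could avoid a call per cell is to collect the rows a workflow has just written and send their relationships in batches. The Python sketch below shows only the batching step, under stated assumptions: the row keys (`id`, `row_type`, `field`) are hypothetical, `write_batch` is a placeholder for whatever bulk write mechanism is actually used (it is not a real SpiceDB client call), and how the hook gets triggered after a BigQuery workflow is left open.

```python
from itertools import islice


def relationships_for_rows(rows):
    """Yield the two relationship strings for each data point.

    Each row is a dict with hypothetical keys: id, row_type, field.
    """
    for row in rows:
        yield f"data_point:{row['id']}#type@row_type:{row['row_type']}"
        yield f"data_point:{row['id']}#field@field:{row['field']}"


def batched(iterable, size):
    """Split any iterable into lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


def sync_batch(rows, write_batch, batch_size=1000):
    """Send relationships in chunks and return the number of calls made.

    `write_batch` is a placeholder callable that receives a list of
    relationship strings.
    """
    calls = 0
    for chunk in batched(relationships_for_rows(rows), batch_size):
        write_batch(chunk)
        calls += 1
    return calls
```

With a batch size of 1000, a workflow that outputs 10^6 data points would make 2,000 calls instead of 2×10^6, though the total number of stored relationships stays the same.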