Caveats or Resources, how to decide?
# spicedb
v
I'm trying to represent users, groups, and resources over:
- DIDs (users)
- AtURIs (content-addressable resource ids: `<did>/<space>/[nsid]/<rkey>`)

[full schema](https://github.com/blebbit/atproto/blob/main/packages/pds/src/authz/spicedb/schema/atproto.zed)

I have a resource type for each segment of the AtURI, with a parenting setup to support nesting/hierarchy. Each AtURI has an associated data record tied to it, except for `nsid`. NSIDs are only for structuring records and granting permissions: in OAuth scopes today, with content/methods tbd, which is what I'm working on. NSIDs can be defined by any DID, so it's a very dynamic list. There are `nsid`s for queries and procedures that will never have records, but we still want to put permissions over them. Certain `nsid`s are expected to have high numbers of records, and storing the parenting relationship for each of them in Spice seems inefficient? Are caveats something that can help me here? What if those caveats are large sets?

The new way to specify OAuth scopes over the dynamic NSIDs is as a permission set: https://github.com/bluesky-social/proposals/blob/main/0011-auth-scopes/README.md#permission-sets I imagine we need something similar for the custom roles that we want in the content permission system, and then have to reflect those within Spice by making a number of calls, or a bulk input? (maybe one day even replace the OAuth permission setup and unify the two... #futurology)
y
my general advice is that if something can be represented and stored in relationships, it should. relations are going to cache better and (generally) evaluate faster than equivalent logic expressed in caveats. what's the concern with inefficiency here?
v
Here's where I got to, decided to try caveats, I like them https://bsky.app/profile/verdverm.com/post/3lycq2yxhzs26
y
right on
v
Is a schema that is 500+ lines pretty typical for real / large systems?
y
yeah, that's been our experience. we've had to bump the max allowed size of a schema in validation a couple of times to reflect this: https://github.com/authzed/api/pull/77
v
Another question:
1. I have a token (subject) with read/write permissions generally
2. I want to limit the permission graph based on a context. Read-only is a simple example; one could imagine it getting more complex, effectively another custom role
3. What if I don't know the permission subgraph ahead of time? (i.e. it's driven by external data / context, user-defined custom roles)
4. Part of this is coming from Capability-Based Authorization and delegation of capabilities

How should I model this? If I simplify to custom roles (less dynamic), maybe I can model context limitations and delegation with a caveat on the role's access? I suppose with two inputs (user says "def use this context", env is already restricted, we take the middle of the Venn)... perhaps this is where more of the algebra comes into SpiceDB?
I still have to wrap my head around banned users and other identities, how to bring these two together. Is there a way to be more succinct with the crud relations, and then again with the negations they need?
definition superuser {} // PDS admin / moderation
definition anon {}
definition acct {}
definition oauth {}
definition apikey {}
definition svcacct {}
definition service {}  // appview, labeler, feedgen, ... needs a DID

partial negative {
  // various negating relations (should / do we need all of these here)
  relation blocked:   acct | service
  relation muted:     acct | service
  relation banned:    acct | service
  relation takendown: acct | service

  // is there a meta-permission here like
  permission negated = blocked + muted + banned + takendown
}

definition record {
  // space containment / nesting
  relation parent: space

  ...owned
  ...negative
  ...record_crud
  ...record_iam
}

partial record_crud {
  // Role CRUD relations
  relation record_deleter: superuser |
    acct    | acct    with nsid_allowed |
    oauth   | oauth   with nsid_allowed |
    apikey  | apikey  with nsid_allowed |
    svcacct | svcacct with nsid_allowed |
    service | service with nsid_allowed |
    space#member | space#member with nsid_allowed |
    group#member | group#member with nsid_allowed |
    role#member  | role#member  with nsid_allowed

    ...

  // Role CRUD permissions
  permission record_delete = owner         + record_deleter + parent->record_delete
  permission record_update = record_delete + record_updater + parent->record_update
  permission record_create = record_update + record_creator + parent->record_create
  permission record_list =   record_create + record_lister  + parent->record_list
  permission record_read =   record_list   + record_reader  + parent->record_read
}
Or is this part of why schemas get so long, and also is the computation really that different, or does the underlying algebra & graph have good algos so it matters less? It's hard to know if the schema I write is "good" both in terms of correctness and performance
y
i would start with correctness, and then there are some heuristics that you can use to improve performance
ime the simplest schema you can write that expresses the logic you want is usually pretty close to the most performant
there are a couple of guidelines to follow and a couple of non-obvious constructions that can help
one is that negation is expensive, because you need to fully materialize the set on both sides of a negation to determine whether the resulting set is non-empty
whereas unions and intersections can both short-circuit
another is that intersection is more expensive than a union or an arrow (generally), which means that phrasing boolean logic in terms of self-relations can be beneficial:
definition resource {
  relation user: user
  relation active: user:*
  permission view = user & active
}
// becomes
definition resource {
  relation user: user
  relation active: resource
  permission view = active->user
}
and you'd write a relation from a resource to itself to make it active.
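One way to convince yourself the two definitions answer the same question is to model relationships as plain tuples. This is a toy Python model only, not SpiceDB's evaluator; the `rels` helper, the two stores, and the object ids are made up for illustration.

```python
# Toy model of the two schemas above; not SpiceDB code.
# A relationship is a (resource, relation, subject) tuple.
WILDCARD = "user:*"

def rels(store, resource, relation):
    """Return all subjects related to `resource` via `relation`."""
    return {s for (r, rel, s) in store if r == resource and rel == relation}

def view_intersection(store, resource, user):
    # permission view = user & active   (relation active: user:*)
    is_user = user in rels(store, resource, "user")
    is_active = WILDCARD in rels(store, resource, "active")
    return is_user and is_active

def view_arrow(store, resource, user):
    # permission view = active->user    (relation active: resource)
    # Walk `active` to the resource(s) it points at, then check `user` there.
    return any(user in rels(store, target, "user")
               for target in rels(store, resource, "active"))

intersection_store = {
    ("resource:doc", "user", "user:alice"),
    ("resource:doc", "active", WILDCARD),
    ("resource:old", "user", "user:alice"),      # no `active` relationship
}
arrow_store = {
    ("resource:doc", "user", "user:alice"),
    ("resource:doc", "active", "resource:doc"),  # self-relation marks it active
    ("resource:old", "user", "user:alice"),      # no `active` relationship
}
```

In both stores `resource:doc` is viewable by `user:alice` and `resource:old` is not; the arrow form gets there with a single walk instead of computing two sets and intersecting them.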
we also wrote up some best practices a little while ago: https://authzed.com/docs/best-practices
i'd be curious whether this doc is helpful for you or not
i'm also not sure i entirely understand your use case - where would a SpiceDB instance be running? what data would it hold?
v
The use-case is private data in ATProtocol, where every user gets their own database, so it seems they should each get their own permission system (so they can migrate both together). The user's database (repo) is managed by a PDS, which can have a single user all the way to 500k (though could get to nMillion). So SpiceDB would be running with the PDS for most cases. For a large outfit like Bluesky, they may run SpiceDB for the 30M+ accounts they manage the PDS for (they run a gateway for oauth already, which is run by the PDS if self-hosting as well). This is a really clean explanation of the distributed architecture of ATProtocol: https://atproto.com/articles/atproto-for-distsys-engineers
More generally, we need to map the permission system onto the existing resources in the protocol, and enable ATProto apps to leverage this permission system instead of having to write their own. So I'm not writing a permission system for one application, but for a network of applications and users. Not sure how much this will help without context or my words, but I have this slide deck I'm working on: https://docs.google.com/presentation/d/1504zw9wtNuG4FvyZSTAsfMbwPXrFuvorfRWOS1mWN44/edit?usp=sharing
tl;dr, it is very close to Google Docs, each account gets a root space and IAM therein. Organizations will create an account and then assign users within the org's atproto account / spaces
Just started skimming and it looks like it's going to be really helpful, thanks for the link!
I was thinking about the negation stuff I have, and your comment that it is far more complex. I believe we can handle this outside the permission system, before we even ask any questions. We are already checking these things in other places anyway, and they are broader strokes in terms of access. Then I see it as one of the first / top recommendations!
Now I see "Prefer Relationships to Caveats" and I will have to think about what I've done. Fortunately I have set up a testbed for our schema and should be able to try out both methods. Making the NSIDs a type and relation comes with more complexity on our end (there is no actual data or record), they are a scoping/authority mechanism in a content-addressable system. Maybe they can be a pseudo-resource like pseudo-relations?
y
ah yeah, if there's no data or record that sounds more like an attribute-based system which is what caveats are intended to help implement
generating ad-hoc relations sounds painful
v
it's more that we have this app-defined NSID (reverse domain namespace id) that sits in the middle of the content addressing, than that we have ad-hoc relations. The permissions over them should be relatively simple / limited (at least at the protocol level): CRUD + IAM for content. Custom functions over them live in the apps instead of the PDS, but we can still put permissions on invoking them (really just need one permission available)
The caveats align well with the oauth scopes permission sets (which are collections of these NSIDs)
//
// Caveats
//

// NSID scoping
caveat nsid_allowed(nsid string, allowed_nsid list<string>) {
  nsid in allowed_nsid
}

// Custom scoping for apps
caveat context_allowed(context string, allowed_context list<string>) {
  context in allowed_context
}

// for special use-cases, not expected to be generally used
caveat time_frame(beg string, end string, mode string) {
  // tbd...
}
If a caveat exists, but is rarely used, then performance experienced should be near-equivalent as if it wasn't in the schema?
y
yeah, if an evaluation path doesn't include a caveat it shouldn't be affected much (if at all) by the existence of caveats on other paths
v
what if it's set in the schema above, but never used in the relations or permission checks? I would imagine the query engine can recognize this and avoid calculations?
y
i'm not sure i understand why it would be in the schema if it's not used in checks 🤔
v
I mean >99% of the checks, imagine a caveat that is rarely used, those 99 would not be impacted by the existence of the caveat?
y
correct, yeah
up to slightly more stuff in the database and a slightly larger schema to be held in memory and interpreted, both of which i would expect to be negligible
v
I'm inevitably creating a short-list of links from spicedb for people in atproto, will share with y'all when they are in a good place
I might have to go back to the resource version instead of caveats, I need to nest them (or more so records under them, like a thread in a channel) and that makes the dedicated resource with relations seem more appropriate in my mind
lol, I think I'm back to caveats... having a hard time modeling the nsids as a resource, probably because we don't know what nsid until request time...
y
yeah that requirement would definitely push me towards caveats
this also sounds like it might be a decent use case for contextual tuples: https://github.com/authzed/spicedb/issues/1398 it's something that we've gone back and forth on actually implementing because it seems difficult to do in a sane and safe way, but for a relation that's not actually known until request time it sounds about right
v
Interesting, I'll look into adding the atproto use case there for more context, or maybe I'll just start a new discussion
I'm not sure we don't know before the request, maybe it's more that we have a large unknown list of NSIDs that are out of our control, but we do imagine users limiting to a sublist (still large, 100+), which can then have several more orders of magnitude of objects on the other side.
<space>
  - 100s <user> in 1+ <group>
  - 100s <nsid>
     - 1000s and beyond <records>
i.e. there is significant fan-out in the content tree, and also the records can refer to each other, which is probably a good relation to capture in Spice. Caveats seem like they will work for this, but I wonder about performance...
caveat nsids(allowed list<string>, nsid string) {
  nsid in allowed  // this appears to be O(n) instead of O(n_logn)?
}
Though if it was a map... it could be O(1) and support both positive and negative associations with an NSID...? It just seems like with NSIDs in the schema as a resource, to give access, we need to write 100s of relations to the "virtual" NSID (they are not stored, they are part of content addressing), versus one with the caveats on it
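To make the write-volume difference concrete, here is a small Python sketch that builds the relationships each approach would need for one account. Everything in it is illustrative: the `nsid` object type is hypothetical, the ids are invented, and the strings only loosely follow the `resource#relation@subject[caveat:{...}]` notation.

```python
import json

def nsid_resource_relationships(subject, nsids, relation="record_viewer"):
    # NSID-as-resource: one relationship per NSID.
    # The `nsid` object type is hypothetical here.
    return [f"nsid:{n}#{relation}@{subject}" for n in nsids]

def caveated_relationship(space, subject, nsids, relation="record_viewer"):
    # Caveat approach: one relationship carrying the whole allow map as context.
    context = json.dumps({"allowed": {n: True for n in nsids}},
                         separators=(",", ":"))
    return [f"{space}#{relation}@{subject}[nsids:{context}]"]

# 100 made-up NSIDs for one account in one space.
nsids = [f"app_example_{i}" for i in range(100)]
as_resources = nsid_resource_relationships("acct:bryan", nsids)
as_caveat = caveated_relationship("space:root", "acct:bryan", nsids)
```

With 100 NSIDs that is 100 relationships per subject in the resource version versus one caveated relationship, at the cost of a context payload that grows with the allow list.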
The latest NSID caveat I'm working with, is the big-O statement accurate?
// NSID filtering (think oauth permission sets and check)
// map  vs list: while specifying is more cumbersome,
// O(1) vs O(N) runtime performance is compelling
caveat nsids(allowed map<bool>, nsid string) {
  allowed[nsid] || false
}
ugh... why is existence in a map linear while lookup is constant?
> has(e.f): Space is constant.
> If e is a map, time is linear in size of e.

> In the boolean operators && and || if any of their operands uniquely determines the result (false for && and true for ||) the other operand may or may not be evaluated, and if that evaluation produces a runtime error, it will be ignored.

This is not what I'm seeing, is this a bug?
11:59AM ERR terminated with errors error="rpc error: code = InvalidArgument desc = evaluation error for caveat nsids: no such key: bsky_blob"
ERROR: got  expected false
from: https://github.com/google/cel-spec/blob/master/doc/langdef.md#logical-operators
new latest version
caveat nsids(allowed map<bool>, default bool, nsid string) {
  (nsid in allowed) ? allowed[nsid] : (default || false)
}
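For anyone following along, here is a plain-Python model of the two versions of the caveat. It only mirrors the logic; it is not CEL and says nothing about CEL's lookup cost. The function names are made up, and the `bryan` map reuses the NSID keys from my test data.

```python
def nsids_v1(allowed, nsid):
    # Models `allowed[nsid] || false`: indexing a missing key is an error,
    # and that error surfaces instead of being turned into false.
    return allowed[nsid] or False

def nsids_v2(allowed, default, nsid):
    # Models `(nsid in allowed) ? allowed[nsid] : (default || false)`:
    # check membership first, fall back to the default otherwise.
    return allowed[nsid] if nsid in allowed else (default or False)

bryan = {"bsky_post": True, "bsky_like": True}
```

In this model `nsids_v1(bryan, "bsky_blob")` raises `KeyError`, the analogue of the "no such key" error in the log, while `nsids_v2` falls back to whatever `default` says. An explicit `False` entry in `allowed` still wins over a permissive default, which is what lets the same caveat act as an allow list or a block list.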
y
that's bizarre
it might be? that's definitely surprising to me. what was the expression and what was the context provided?
v
They would be very simple, like
["  caveat bryan", [_space, "record_viewer", _bryan,  #"nsids:{"default": false, "allowed": {"bsky_post":true,"bsky_like":true}}"#]],
      ["  caveat devin", [_space, "record_viewer", _devin,  #"nsids:{"allowed": {"bsky_post":true}}"#]],
It's unclear if the docs are inaccurate, Gemini DR was insistent that it is O(1) because it uses that underlying Go map, which seems to be correct if the right kind of map is created (?) https://github.com/google/cel-go/blob/master/common/types/map.go#L587
fwiw, this is what Gemini said about the non-short-circuit to falsy for errors in CEL ... also that doing so could lead to unintended access in a permissions system, which I do think is a valid design point the CEL authors probably had in mind (?) https://cdn.discordapp.com/attachments/1414026804867371079/1422240490215964783/Screenshot_2025-09-29_at_8.13.39_AM.png?ex=68dbf42c&is=68daa2ac&hm=054bcf4ea086229b8b2f9ca013afc029cb0d9334a89357811f500d5b03820bcb&
y
ah hm. that makes some sense, though i have thoughts about map access returning errors.
"doing so" meaning short-circuiting?
v
yea, if you short circuit to false on error, i.e. if this was a block list rather than an allow list
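A minimal Python sketch of that failure mode, assuming a block-list style check. The "swallow errors" behavior is hypothetical (it is what I expected from my reading of the spec, not what SpiceDB does), and the function names are invented.

```python
def is_blocked(blocklist, nsid):
    # Raises if the blocklist is missing (None) or the NSID has no entry.
    return blocklist[nsid]

def allow_swallowing_errors(blocklist, nsid):
    # Hypothetical semantics: any evaluation error quietly becomes False.
    try:
        blocked = is_blocked(blocklist, nsid)
    except (KeyError, TypeError):
        blocked = False
    return not blocked

def allow_surfacing_errors(blocklist, nsid):
    # Errors propagate to the caller, who can fail closed.
    return not is_blocked(blocklist, nsid)
```

If the blocklist context fails to load and arrives as `None`, the swallowing version returns `True` and grants access, while the surfacing version raises and leaves the decision to the caller.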
y
hmmm yeah
v
I'm actually happier with the final caveat, it supports more use cases without being overly complex (defaulting to allow or block depending on the app or user)
y
🙌
j
@verdverm.com I explicitly had errors treated as such to prevent "hidden" issues