# spicedb
j
the general recommendation is not to have a single point of failure for any part of the SpiceDB stack: SpiceDB itself, of course, can (and should) be run multi-node, and you can have two clusters if you want complete isolation between them (although this is unlikely to be needed). The datastore itself is the most contentious potential SPOF; for that, we recommend a true HA multi-master datastore like CockroachDB or Spanner
w
Multi-node SpiceDB + an HA multi-master DB does mitigate some scenarios, but it only covers failures of a given node (for both SpiceDB and CockroachDB). Scenarios I'm looking at are things like:
- Overloading (of either SpiceDB or CockroachDB). If one node gets overloaded, it's likely that the others are too; this is usually caused by things like a client going wild, or crossing a data volume threshold
- Bad operations, like an upgrade going wrong. If a SpiceDB or CockroachDB upgrade causes issues (e.g. a regression), it'll likely impact the whole cluster
j
how do you mitigate those issues with your standard database today?
w
We don't 😄 but our databases don't tend to be SPOFs for the entire system: if they go down they only take a specific service with them; the rest of our product keeps running. Authorization is a central domain by nature, so I'm getting challenged on it being a SPOF for the whole product: if it goes down, everything goes down. It's not about "SpiceDB versus an in-house authorization service", it's about "centralised authz versus decentralised authz"
y
i'd probably talk about the complexity of decentralized authorization
specifically in needing to share state and logic between services in order for them to make their own authorization decisions
j
so in your case @williamdclt since you do have the read replicas isolated
you basically are running two distinct clusters except for the upgrade path
and for that, I'd recommend testing on a different stage before pushing to prod
longer term, we have plans for a second layer of caching that could answer queries if SpiceDB is down and the cache is backfilled
but it wouldn't be able to answer everything
w
> i'd probably talk about the complexity of decentralized authorization
> specifically in needing to share state and logic between services in order for them to make their own authorization decisions

Yeah, that's the discussion I'm having ATM 😄 I'm having a hard time convincing the principal engs, so I'm also exploring how to improve the SPOF situation

> you basically are running two distinct clusters except for the upgrade path

True, and I suppose we could make it "one cluster per product domain" if we really wanted to. The database is still a SPOF, as are upgrades (at least the ones containing a DB migration; otherwise we could run different SpiceDB versions across clusters), but maybe that's mitigation enough
> longer term, we have plans for a second layer of caching that could answer queries if SpiceDB is down and the cache is backfilled
> but it wouldn't be able to answer everything

Ha, we do that actually! We store the SpiceDB responses in Redis and replay them if SpiceDB is unavailable
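Roughly like this, if it helps (a minimal sketch with illustrative names, not our actual code; it assumes a client wrapper that raises ConnectionError when SpiceDB is unreachable, plus redis-py):

```python
# Minimal sketch of the replay layer (illustrative names, not our actual code).
# Assumes a SpiceDB client wrapper that raises ConnectionError when SpiceDB
# is unreachable, and the redis-py library.
import json
import redis

r = redis.Redis(host="localhost", port=6379)
CACHE_TTL_SECONDS = 3600  # how long a stored answer stays replayable

def cache_key(resource: str, permission: str, subject: str) -> str:
    return f"authz:{resource}#{permission}@{subject}"

def check_permission_cached(client, resource, permission, subject) -> bool:
    key = cache_key(resource, permission, subject)
    try:
        allowed = client.check_permission(resource, permission, subject)
        # Store the fresh answer so it can be replayed during an outage.
        r.set(key, json.dumps(allowed), ex=CACHE_TTL_SECONDS)
        return allowed
    except ConnectionError:
        # SpiceDB is unavailable: replay the last known answer if we have one.
        cached = r.get(key)
        if cached is not None:
            return json.loads(cached)
        raise  # nothing cached for this check; surface the outage
```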
j
> Ha, we do that actually! We store the SpiceDB responses in Redis and replay them if SpiceDB is unavailable

yeah, it might be something like this, but with intelligence to keep the responses long term until they've changed
w
> with intelligence to keep the responses long term until they've changed

Ohhh we actually implemented exactly that in our (now archived) home-grown Zanzibar-based service. There are interesting papers on this from the RDF people; the core idea was to store the tuples that were used to check the permission (called "BGPs" - basic graph patterns) with the cached response, for invalidation when something changes. If you're interested I can ask if that's something we'd be willing to open-source
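In spirit it was something like this (a minimal sketch with hypothetical names; it assumes each check reports the set of tuples it touched, and that tuple changes are observable, e.g. via a watch feed):

```python
# Minimal sketch of tuple-based cache invalidation (hypothetical names).
# Each cached check result is stored alongside the tuples (the "BGP") that
# produced it; a change to any of those tuples drops the cached answer.
from collections import defaultdict

class InvalidatingCache:
    def __init__(self):
        self.answers = {}                      # check key -> cached result
        self.tuple_to_keys = defaultdict(set)  # tuple -> check keys it supports

    def store(self, key: str, allowed: bool, tuples_used: set[str]) -> None:
        """Cache a check result with the tuples that were used to compute it."""
        self.answers[key] = allowed
        for t in tuples_used:
            self.tuple_to_keys[t].add(key)

    def lookup(self, key: str):
        return self.answers.get(key)  # None means "not cached"

    def on_tuple_changed(self, t: str) -> None:
        """Drop every cached answer that depended on the changed tuple."""
        for key in self.tuple_to_keys.pop(t, set()):
            self.answers.pop(key, None)

cache = InvalidatingCache()
cache.store("doc:1#view@user:alice", True, {"doc:1#reader@user:alice"})
cache.on_tuple_changed("doc:1#reader@user:alice")  # invalidates the check above
```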
j
we actually have a working prototype that does exactly that
although it uses some specialized data structures to ensure faster checks and less mem
and everything is revisioned
if you're interested in contributing, we'd love your insights into it 🙂
w
as usual, interested but probably too time-constrained 😄 If you have something ready to be read I'd be interested, but certainly don't go to too much effort for me!
j
k 🙂