# spicedb
p
I was talking to some internal teams about using SpiceDB instead of a homegrown authorization system in our applications running in Kubernetes, but got immediate pushback that this would introduce network latency to all checks, and that all our API endpoints that do permission checks would go from ns to ms, potentially even seconds, compared to the current authz system (it queries a database but also caches results, so it is in-memory for most checks and hard to beat). What are the best practices for getting latency as low as possible, and what kind of performance do you think we can expect? I assume deploying the SpiceDB cluster with node affinity to get "local" network access, for example? Are there other ways of increasing performance, and ways to help me convince them (besides running an actual benchmark with/without SpiceDB, which I hope we will do eventually, but I wanted to get them on board before that)?
v
It depends on many things, but even though the setup has an impact on latency, the most important factors are your schema and the shape and size of your data. Folks who say "sub-millisecond authorization" forget that the data needed to compute authorization decisions has to be loaded from somewhere. Policy engines claim they are fast, and they are, but only because the responsibility of loading the data is pushed somewhere else or the dataset fits in memory. There are scenarios where folks rely completely on state carried in things like JWT tokens, so the data load is amortized because it happened out of band of the request. That is all fine, but eventually requirements change and you face the hard reality of having to evolve a system that wasn't designed to accommodate them. Those systems also ignore the challenges of globally distributed systems, strong consistency, and the new-enemy problem.

Now back to your question: under normal circumstances you are not going to get sub-millisecond latency unless your clients are hitting SpiceDB at a high rate with a dataset that essentially always stays in memory. You can run SpiceDB as a library or a sidecar with the in-memory datastore if your dataset fits in memory. A more realistic scenario we've seen is something like 5ms p95 for certain workloads with good cache hit ratios. SpiceDB, inspired by Zanzibar, is meant to be scalable, secure, predictable, and, most importantly, easy to evolve your authorization logic with, but not to have sub-millisecond response times.
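To make the sidecar idea concrete, here's a rough sketch of a check over localhost using the authzed-java gRPC client (assuming recent versions where the v1 messages live under `com.authzed.api.v1`; the schema names, object IDs, and preshared key are made-up placeholders, not anything from your system):

```java
import com.authzed.api.v1.*;
import com.authzed.grpcutil.BearerToken;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class SidecarCheck {
    public static void main(String[] args) {
        // SpiceDB runs as a sidecar in the same pod, so plaintext over
        // localhost keeps the network hop as cheap as it gets.
        ManagedChannel channel = ManagedChannelBuilder
                .forTarget("localhost:50051")
                .usePlaintext()
                .build();

        PermissionsServiceGrpc.PermissionsServiceBlockingStub permissions =
                PermissionsServiceGrpc.newBlockingStub(channel)
                        .withCallCredentials(new BearerToken("sidecar-preshared-key"));

        CheckPermissionRequest request = CheckPermissionRequest.newBuilder()
                // minimize_latency lets SpiceDB pick a recent-enough revision,
                // which maximizes reuse of cached subproblems.
                .setConsistency(Consistency.newBuilder().setMinimizeLatency(true).build())
                .setResource(ObjectReference.newBuilder()
                        .setObjectType("document").setObjectId("finance-report").build())
                .setSubject(SubjectReference.newBuilder()
                        .setObject(ObjectReference.newBuilder()
                                .setObjectType("user").setObjectId("alice").build())
                        .build())
                .setPermission("view")
                .build();

        CheckPermissionResponse response = permissions.checkPermission(request);
        boolean allowed = response.getPermissionship()
                == CheckPermissionResponse.Permissionship.PERMISSIONSHIP_HAS_PERMISSION;
        System.out.println("allowed: " + allowed);
    }
}
```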
So I think the discussion has to be about more than just latency if you want to compare entirely different authz services.
I don't think I addressed "homegrown" solutions properly - that's precisely where the pain emerges as business needs evolve. The system cannot adjust to the new demands, and overall value delivery to customers suffers. If you are caching data, how stale are you willing to let it get?
p
Thanks for the thorough response! I totally agree that the discussion should not just be about latency; there are so many benefits to moving to a system like SpiceDB for the reasons you bring up, and in my mind the latency added would most likely be acceptable given all of them. Running it as a sidecar, or even as a library (unfortunately we're on Java, so no library?) in-process, should give them some feeling of being in control of the latency. I haven't dug into how much a "local" network call (which still has to travel through the network stack) adds compared to doing something in-process. And yes, managing evolving business needs is one of the reasons we are looking at a system like SpiceDB instead of the homegrown one, so we can deliver functionality quicker and support the more complex authz functionality we anticipate (with less code, and an easier-to-understand authz model!). As for caching, since the authz is (currently) part of a monolithic application that has intricate knowledge of how things are tied together and controls all data changes, it knows when to invalidate the cache (but this is of course complicated to maintain).
v
Right, in the end there is a critical point in time where the problems of maintaining a homegrown solution offset the fast response times.
I think that at the very least you want to make use of `at_least_as_fresh` semantics to make sure cached subproblems are leveraged for as long as possible. SpiceDB will also invalidate caches if state changes. We always recommend doing a proof of concept: model your schema with SpiceDB, load a dataset that is representative of your production dataset, and run a load test to understand the behaviour. From there you can start tweaking things to trim those milliseconds off.
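Roughly what that looks like with the authzed-java client (a sketch; `storedToken`, the schema names, and the idea of persisting the ZedToken alongside the resource are all placeholders for illustration):

```java
import com.authzed.api.v1.*;

public final class FreshnessExample {
    // Builds a check pinned to at_least_as_fresh. storedToken is a ZedToken
    // string captured from an earlier write (e.g. persisted alongside the
    // resource it protects; a placeholder scheme, not a recommendation).
    static CheckPermissionRequest checkAtLeastAsFresh(String storedToken) {
        ZedToken lastWrite = ZedToken.newBuilder().setToken(storedToken).build();
        return CheckPermissionRequest.newBuilder()
                // at_least_as_fresh: the result must reflect at least this
                // revision, but SpiceDB may answer from newer cached state,
                // which keeps cache hit ratios high.
                .setConsistency(Consistency.newBuilder()
                        .setAtLeastAsFresh(lastWrite).build())
                .setResource(ObjectReference.newBuilder()
                        .setObjectType("document").setObjectId("finance-report").build())
                .setSubject(SubjectReference.newBuilder()
                        .setObject(ObjectReference.newBuilder()
                                .setObjectType("user").setObjectId("alice").build())
                        .build())
                .setPermission("view")
                .build();
    }
}
```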
s
> SpiceDB will also invalidate caches if state changes.

sorry to interrupt your thread, but what do you mean here? I haven't seen anything like that in the source code yet, can you point me there please?
v
it's not actively invalidated. What it means is that the datastore revision will change, and any subsequent requests that select that revision will see no entries in the cache at that revision
so it's not removed from the cache, if that's what you mean, but it effectively disappears from the point of view of the request
actual cache eviction happens on a TTL
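to tie that back to the API: a write returns the ZedToken naming the new revision, which is why older cached entries just stop being selected rather than being actively evicted. A sketch, again assuming the authzed-java client and placeholder names:

```java
import com.authzed.api.v1.*;

public final class RevisionExample {
    // Touch a relationship: the write moves the datastore to a new revision,
    // and the returned ZedToken names that revision. Results cached at older
    // revisions are never rewritten; they simply stop matching requests that
    // select the newer revision, until the TTL evicts them.
    static ZedToken grantView(
            PermissionsServiceGrpc.PermissionsServiceBlockingStub permissions) {
        WriteRelationshipsRequest write = WriteRelationshipsRequest.newBuilder()
                .addUpdates(RelationshipUpdate.newBuilder()
                        .setOperation(RelationshipUpdate.Operation.OPERATION_TOUCH)
                        .setRelationship(Relationship.newBuilder()
                                .setResource(ObjectReference.newBuilder()
                                        .setObjectType("document")
                                        .setObjectId("finance-report").build())
                                .setRelation("viewer")
                                .setSubject(SubjectReference.newBuilder()
                                        .setObject(ObjectReference.newBuilder()
                                                .setObjectType("user")
                                                .setObjectId("alice").build())
                                        .build())
                                .build())
                        .build())
                .build();

        // writtenAt is the revision at which this write becomes visible.
        return permissions.writeRelationships(write).getWrittenAt();
    }
}
```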
s
ah, okay. that I observed in the code 🙂 thought for a moment I missed something and decided to double check. again, sorry to interrupt and thnx ))
v
no worries!
was trying to simplify the description and not go into detail around multi-version concurrency control