# spicedb
b
Q: I'm thinking of centralising how we make our SpiceDB queries across our services into a shared package, and as part of that I am considering abstracting the zedToken away entirely. Currently each microservice stores the zedToken for the records within its domain. The problem is that if another service needs to do a permissions check but doesn't have access to that zedToken, it will either need to use `fully_consistent` reads or run the risk of a race condition, which for certain flows is pretty much 100% of the time. I was thinking that the shared lib would automatically store the token per user namespace/object_id combination in Redis for the duration of the quantization window. Am I being too naive here with this implementation? On the other hand, if this does sound sensible, is there a reason SpiceDB doesn't do this out of the box?
v
I understand the zedtoken is not being stored alongside the resource, and that's the reason other services may not have access to it? You seem to store it in Redis, so providing utils to abstract this for the client service makes sense to me. I wouldn't store it just for the duration of the quantization window, though. If the revision is evicted from Redis, you'll basically fall back to full consistency, which is expensive. However, even if it's past the quantization window, it's still useful (and desirable) for SpiceDB to be provided with the zedtoken, as it lets SpiceDB determine how stale you are willing the response to be. In practice what that means is that past the quantization window you are effectively getting `minimize_latency` semantics, so long as the revision you provided is older than the new "optimized revision". You could also do the same thing SpiceDB does here and fall back to `minimize_latency` when no zedtoken is found, with the assumption that a key should always exist for a resource that changed.

W.r.t. having this done by SpiceDB: we've actually discussed something like this internally but haven't opened a proposal, as we are still not 100% convinced this is the way to go. It would be akin to "named zedtokens", and would mean SpiceDB would store a name→zedtoken list so you can reference it in your requests.
cc @Joey for thoughts here
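For illustration, a minimal sketch of what such a shared-library read path could look like. It assumes ioredis for the cache; the `Consistency` type, the `checkPermission` wrapper and the key format are hypothetical stand-ins, not the real authzed-node API.

```typescript
import Redis from "ioredis";

// Hypothetical consistency descriptor and client wrapper; in practice this would
// build the real CheckPermissionRequest via the SpiceDB client library.
type Consistency =
  | { kind: "atLeastAsFresh"; zedToken: string }
  | { kind: "minimizeLatency" };

declare function checkPermission(
  resource: string,
  permission: string,
  subject: string,
  consistency: Consistency
): Promise<boolean>;

const redis = new Redis();

// Assumed key layout: one zedtoken per namespace/object_id, as proposed above.
const tokenKey = (namespace: string, objectId: string) =>
  `zedtoken:${namespace}:${objectId}`;

export async function check(
  namespace: string,
  objectId: string,
  permission: string,
  subject: string
): Promise<boolean> {
  // If a recent write left a zedtoken behind, ask SpiceDB to be at least as fresh
  // as that revision; otherwise fall back to minimize_latency.
  const zedToken = await redis.get(tokenKey(namespace, objectId));
  const consistency: Consistency = zedToken
    ? { kind: "atLeastAsFresh", zedToken }
    : { kind: "minimizeLatency" };
  return checkPermission(`${namespace}:${objectId}`, permission, subject, consistency);
}
```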
b
> I understand the zedtoken is not being stored alongside the resource, and that's the reason other services may not have access to it? You seem to store it in Redis

Not quite, it's the other way around. We are currently storing it in the respective microservice's DB, but I am proposing moving all zedTokens to be stored in Redis, and the shared library would abstract this as an implementation detail, meaning the services would never need to know or care about zedTokens or consistency. The plan was to fall back to `minimize_latency` with the assumption that the new cache would already have the new tuple and all reads would be correct. With that, do you think it is still useful to pass the zedToken even if it's 'stale'?

^ w.r.t. this:

> However, even if it's past the quantization window, it's still useful (and desirable) for SpiceDB to be provided with the zedtoken, as it lets SpiceDB determine how stale you are willing the response to be.
v
> Not quite, it's the other way around. We are currently storing it in the respective microservice's DB, but I am proposing moving all zedTokens to be stored in Redis, and the shared library would abstract this as an implementation detail, meaning the services would never need to know or care about zedTokens or consistency.

So what I'm hearing is that the "microservice that does not have the zedtoken" situation exists because the owning microservice is not exposing the zedtoken? This approach is fine, some folks do it, it just sucks to add another dependency in the critical path and couple the failure domain of SpiceDB to Redis. It works so long as you can keep them isolated enough (i.e. the system knows how to handle the situation when Redis is down). You could even hedge a `minimize_latency` call in parallel while waiting for the Redis response, and only issue the zedtoken-based request if the token exists / Redis is up and healthy.

> The plan was to fall back to `minimize_latency` with the assumption that the new cache would already have the new tuple and all reads would be correct. With that, do you think it is still useful to pass the zedToken even if it's 'stale'?

I think so, but you'd have to add some padding to make sure the quantization window has really elapsed. SpiceDB has 3 parameters to compute the window:
- the quantization window itself
- the crossfade factor (by default 20% if I'm not mistaken)
- follower-replica lag (only CRDB and Spanner)

The TTL in the cache is actually set to 2 times the total above. @Joey can you confirm that, if clients store the zedtoken for at least 1x the total above (the effective quantization window), they are safe to fall back to `minimize_latency`?
b
Yea, if it's not ok to use the exact quantization window value, e.g. 5 sec... would there be a recommended factor to apply to it, e.g. window x2 = 10 sec? Thanks for your help btw 🙂
v
depends on your configuration. Note the parameters above: they are all configuration parameters, and unless you've touched them they should be set to the defaults, and you just need to 2x that total
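As a rough worked example of that arithmetic, using the figures mentioned in this thread (a 5s window, a 20% crossfade) rather than verified defaults, and assuming the crossfade is applied as a percentage of the window:

```typescript
// Placeholder values taken from the discussion above; check them against your
// actual SpiceDB configuration before relying on this.
const quantizationWindowSec = 5;   // the configured quantization window
const crossfadeFactor = 0.2;       // crossfade factor (assumed to be a % of the window)
const followerReplicaLagSec = 0;   // only relevant for CRDB / Spanner

const effectiveWindowSec =
  quantizationWindowSec * (1 + crossfadeFactor) + followerReplicaLagSec; // 6s
const redisTtlSec = 2 * effectiveWindowSec;                              // 12s
```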
w
FWIW, we do exactly that: we built an SDK on top of `authzed-node` to provide additional functionality and QOL: things like client-side OTel, caching, and zedtoken handling. For zedtoken handling we do exactly what you describe: on write we store the zedtoken in Redis (the key is the user ID), on read we get the zedtoken from Redis and fall back to `minimize_latency`.

There are downsides:
- It doesn't guarantee read-your-own-writes (you can't have strong consistency between 2 datastores), but it is likely enough for our practical use case (and we fall back on eventual consistency).
- There's a risk of race conditions: concurrent writes could leave you with an outdated zedtoken in Redis.
  - We accepted that; we had already accepted that we'd have eventual consistency in edge cases.
  - Makes me think that this could be avoided if zedtokens were lexicographically sortable 🤔 then we could make sure to only update Redis if the new value is bigger (would need some Lua scripting).
- It adds overhead. Redis is fast, but so is SpiceDB: Redis is a significant portion of the total, easily double-digit ms. At the P99 it's actually very significant, multiple times SpiceDB's response time.
  - The hedging described by @vroldanbet would help, but it's potentially doubling the load on SpiceDB and it's additional complexity.
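A minimal sketch of that write path, assuming ioredis; the key format and TTL are illustrative, and the second function is purely hypothetical, since zedtokens are not currently guaranteed to be comparable as strings (see the proposal linked below).

```typescript
import Redis from "ioredis";

const redis = new Redis();
const TTL_SEC = 12; // e.g. 2x the effective quantization window, as discussed above

// Plain write path: store the zedtoken returned by the write under the user ID,
// overwriting whatever was there. This is where the concurrent-write race lives.
export async function storeZedToken(userId: string, zedToken: string): Promise<void> {
  await redis.set(`zedtoken:user:${userId}`, zedToken, "EX", TTL_SEC);
}

// Hypothetical variant: only replace the stored token if the new one sorts higher.
// NOTE: this only makes sense if zedtokens were lexicographically comparable,
// which they are not guaranteed to be today.
const SET_IF_GREATER = `
local current = redis.call('GET', KEYS[1])
if current == false or ARGV[1] > current then
  redis.call('SET', KEYS[1], ARGV[1], 'EX', ARGV[2])
  return 1
end
return 0
`;

export async function storeZedTokenIfNewer(userId: string, zedToken: string): Promise<void> {
  await redis.eval(SET_IF_GREATER, 1, `zedtoken:user:${userId}`, zedToken, TTL_SEC);
}
```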
e
@williamdclt just FYI there is an active proposal for comparing zedtokens: https://github.com/authzed/spicedb/issues/1162 feel free to weigh in on that if you have an opinion
v
Those are great points, William. Indeed, there is no such thing as strong consistency when something involves a dual-write (writing to two or more transactional boundaries). The main problem with naive dual-writes is that you can't guarantee the system eventually gets into the right state, as processes, hardware or networks may fail mid-TX. Strategies like CDC or durable workflows can help make sure distributed transactions are atomic, but isolation can't be guaranteed. We are planning to build opinionated and ergonomic SDKs (versus the current dumb proto SDKs), so we are definitely interested in hearing what you'd like to see in such an SDK.

I don't think hedging strictly doubles the load, at least not on the DB side, which is the critical part. SpiceDB (like Zanzibar) has a bunch of request deduplication mechanisms in place to make sure the same subproblem is not requested twice in the same quantization window - this is really what the cache is for. So:
- In the scenario where you are in the same quantization window (read your own writes), data that needs to be loaded will only be loaded once, and there is no way around loading that data. Even if you end up with 2 requests in flight at the same time, the caches and datastore deduplication will do their work.
- In the scenario where your zedtoken is behind the quantization window, you will effectively be doing a `minimize_latency` request anyway, so the caches will be populated and that request will incur no DB access.
- Where I think this would really double the load, and there is a slimmer chance of it, is when the 2 requests land in 2 different quantization windows because they hit the cluster at 2 different times, right when it crosses the border.

I agree it's more complexity, and I definitely recommend always using the zedtoken, but as you noted, adding Redis basically makes the perceived performance and availability of SpiceDB bound to that of Redis.

Do you have numbers on how often folks end up with a request that 403's because the zedtoken used for read-your-writes was written and either partially failed or raced? I guess it's difficult to determine, unless you have obvious UI workflows like create-then-redirect.
b
@williamdclt thanks for your input 🙇‍♂️ Is this SDK open source? If so, can you link me? I've not yet decided if the Redis key should be the user or the object_id. The pro of using the user is that only they are affected by any additional latency from the use of `at_least_as_fresh`; the con is that any other requests made by the user to check permissions against any other object would also incur the same latency hit. Interesting to hear that Redis can become the bottleneck when it comes to latency, so if what @vroldanbet is saying is true and running both in parallel (the Redis key check and a SpiceDB call with `minimize_latency`) has no real cost, we can certainly do that, since it's going to be the hot path, i.e. only in certain circumstances will the zedtoken be in Redis.
v
not saying it does not have cost - it does - it's just not 2X
imho a worthy tradeoff
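A minimal sketch of that hedge, reusing the hypothetical `redis`, `tokenKey` and `checkPermission` helpers from the first snippet above; how it degrades when Redis is slow or down is an assumption, not a prescription.

```typescript
export async function checkHedged(
  namespace: string,
  objectId: string,
  permission: string,
  subject: string
): Promise<boolean> {
  const resource = `${namespace}:${objectId}`;

  // Fire the minimize_latency check immediately so it is already in flight
  // while we wait on Redis.
  const hedged = checkPermission(resource, permission, subject, {
    kind: "minimizeLatency",
  });

  // Look up the zedtoken, but don't let a slow or unavailable Redis fail the check.
  const zedToken = await redis
    .get(tokenKey(namespace, objectId))
    .catch(() => null);

  if (!zedToken) {
    // No token (or Redis unhealthy): the hedged result is all we have.
    return hedged;
  }

  // Token found: prefer the at_least_as_fresh result and discard the hedge.
  hedged.catch(() => undefined); // avoid an unhandled rejection for the unused promise
  return checkPermission(resource, permission, subject, {
    kind: "atLeastAsFresh",
    zedToken,
  });
}
```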
w
It isn't open source, sadly! I probably won't be able to open it up, as it contains a fair amount of internal stuff like our SpiceDB schema. Not that it should be a security problem, but we don't have a "develop in the open" culture
> Do you have numbers on how often folks end up with a request that 403's because the zedtoken used for read-your-writes was written and either partially failed or raced? I guess it's difficult to determine, unless you have obvious UI workflows like create-then-redirect.

I might be able to pull some numbers, I'll have a look later
> Strategies like CDC or durable workflows can help make sure distributed transactions are atomic, but isolation can't be guaranteed.

Yep, what we do is CDC for eventual consistency, and in addition a dual-write within the request lifetime (after commit) as best-effort read-your-own-writes. We need idempotency for CDC anyway, so that works fine
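Sketching the shape of that pattern: every name below is a hypothetical stand-in for application code, and `storeZedToken` mirrors the write-path helper sketched earlier.

```typescript
// Hypothetical application pieces, just to make the flow concrete.
type Document = { id: string };
declare const appDb: {
  transaction(
    fn: (tx: { insert(table: string, row: unknown): Promise<void> }) => Promise<void>
  ): Promise<void>;
};
// Wraps the SpiceDB relationship write and returns the zedtoken from the response.
declare function writeOwnerRelationship(docId: string, userId: string): Promise<string>;
declare function storeZedToken(userId: string, zedToken: string): Promise<void>;

async function createDocument(userId: string, doc: Document): Promise<void> {
  // 1. Commit the application write; CDC picks this up and (idempotently)
  //    replays the SpiceDB relationship write for eventual consistency.
  await appDb.transaction(async (tx) => {
    await tx.insert("documents", doc);
  });

  // 2. Best-effort dual-write after commit for read-your-own-writes: if this
  //    fails, CDC still converges SpiceDB, and the next read simply falls back
  //    to minimize_latency because no zedtoken was cached.
  try {
    const zedToken = await writeOwnerRelationship(doc.id, userId);
    await storeZedToken(userId, zedToken);
  } catch {
    // swallow: read-your-own-writes here is best effort by design
  }
}
```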
b
@williamdclt how long do you keep the zedToken in Redis, and how does that relate to your configured quantization window?
v
> Yep, what we do is CDC for eventual consistency, and in addition a dual-write within the request lifetime (after commit) as best-effort read-your-own-writes. We need idempotency for CDC anyway, so that works fine

Was this because it wasn't trivial to propagate the zedtoken from the CDC pipeline back into Redis?
w
> Was this because it wasn't trivial to propagate the zedtoken from the CDC pipeline back into Redis?

No, it is trivial (and we do), but CDC is async so it wouldn't give us read-your-own-writes consistency
v
you could have the UI synchronize on the events to be published