c
Hi! We are experiencing some strange behavior regarding CheckPermission and ZedTokens. Our case is the following: we simultaneously create relationships for 6 documents (ids doc1 - doc6) with the same owner relation (owner1). During the creation of e.g. doc3, the relationship for doc1 gets deleted. When we then check the isOwner permission after the creation of doc3, using the ZedToken we got from the write, we get permission denied. If we then retry the same check in a loop with a 500ms delay, then after 3 - 5 seconds we get permission allowed. Or if we check right away with fully_consistent, we also get permission allowed. Is there something we don't understand about ZedTokens? 🙂
y
> We simultaneously create
meaning in the same WriteRelationships call? my best guess is that the revision in the token you're using isn't pointing at the revision you think it's pointing at, given the behavior you're describing
it might help to show the code in question
c
No, not in the same write, but in multiple threads. Each write gets its own ZedToken, which we store together with the document.
j
zedtokens are not necessarily causally ordered
if you're getting different ones, they will only be guaranteed to reflect the changes from the specific write from which they were created
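For illustration, a minimal sketch of capturing that per-write token with the generated Java client (the endpoint, preshared key, and schema names here are assumptions, not taken from the thread):

```java
import com.authzed.api.v1.*;
import com.authzed.grpcutil.BearerToken;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class WriteDocOwner {
    public static void main(String[] args) {
        // Assumed local SpiceDB endpoint and preshared key.
        ManagedChannel channel = ManagedChannelBuilder
                .forTarget("localhost:50051")
                .usePlaintext()
                .build();
        PermissionsServiceGrpc.PermissionsServiceBlockingStub permissions =
                PermissionsServiceGrpc.newBlockingStub(channel)
                        .withCallCredentials(new BearerToken("somerandomkey"));

        // TOUCH document:doc3#owner@user:owner1 in its own write.
        WriteRelationshipsRequest request = WriteRelationshipsRequest.newBuilder()
                .addUpdates(RelationshipUpdate.newBuilder()
                        .setOperation(RelationshipUpdate.Operation.OPERATION_TOUCH)
                        .setRelationship(Relationship.newBuilder()
                                .setResource(ObjectReference.newBuilder()
                                        .setObjectType("document")
                                        .setObjectId("doc3"))
                                .setRelation("owner")
                                .setSubject(SubjectReference.newBuilder()
                                        .setObject(ObjectReference.newBuilder()
                                                .setObjectType("user")
                                                .setObjectId("owner1")))))
                .build();

        // The returned token is only guaranteed to reflect this one write;
        // store it alongside doc3.
        ZedToken writtenAt = permissions.writeRelationships(request).getWrittenAt();
        System.out.println(writtenAt.getToken());
        channel.shutdown();
    }
}
```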
c
Why do we then get permission denied on the CheckPermission for the specific write with its ZedToken, and why does the response change if we keep retrying for 5 seconds?
j
you shouldn't so long as the write reflects all the relationships for that check
if it only reflects a portion of them, it's possible to get a different answer
and after 5s, the cache has moved on
y
specifically, if the zedtoken you get for the document writes is functionally earlier than the zedtoken you get for the owner write, and you're checking using the zedtoken for the document write, the request may not reflect the change from the owner write
as for why it changes, it's because at that point the quantization window has moved to a revision that includes the user write
so you're seeing the cache roll over, more or less
j
what datastore are you using?
y
this blog post goes into depth on how quantization windows work if you're curious: https://authzed.com/blog/hotspot-caching-in-google-zanzibar-and-spicedb
c
We are using PostgreSQL
j
which has non-timestamp based revisions
so no causal ordering
also, if you have two writes that change the same relationship and they are started at the same time, the order in which they apply can be either order
so if you have one that adds a relationship on doc1 and another that deletes it, it's possible the delete applies first and the add applies second, so the relationship ends up existing
depends on which hits the datastore first
c
Does that mean that if we make more writes to the owner relationship for the same owner at the same time, but manage to delete one of the first relationships before a later one is written to the database, the tokens for the later ones can be unusable?
Is there a way to see this in the database?
j
it means if you're making changes to the same relationships in parallel
and those changes conflict
the order in which they apply (and therefore the end state) is more or less random
if I have a TOUCH document:1234#owner@user:tom and a DELETE document:1234#owner@user:tom, in two parallel writes
the outcome depends on which finishes first and which finishes second
you can avoid that using preconditions
for example, you could set a precondition on the DELETE that says "only delete this IF the relationship already exists"
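A sketch of such a guarded delete with the Java client, assuming a stub like the `permissions` one from the earlier example (the doc/owner ids are placeholders):

```java
import com.authzed.api.v1.*;

// Delete document:doc1#owner@user:owner1, but only if that relationship
// still exists when the write is applied; otherwise SpiceDB rejects the
// write with FAILED_PRECONDITION instead of silently racing.
static ZedToken deleteOwnerIfPresent(
        PermissionsServiceGrpc.PermissionsServiceBlockingStub permissions) {
    WriteRelationshipsRequest request = WriteRelationshipsRequest.newBuilder()
            .addUpdates(RelationshipUpdate.newBuilder()
                    .setOperation(RelationshipUpdate.Operation.OPERATION_DELETE)
                    .setRelationship(Relationship.newBuilder()
                            .setResource(ObjectReference.newBuilder()
                                    .setObjectType("document")
                                    .setObjectId("doc1"))
                            .setRelation("owner")
                            .setSubject(SubjectReference.newBuilder()
                                    .setObject(ObjectReference.newBuilder()
                                            .setObjectType("user")
                                            .setObjectId("owner1")))))
            .addOptionalPreconditions(Precondition.newBuilder()
                    .setOperation(Precondition.Operation.OPERATION_MUST_MATCH)
                    .setFilter(RelationshipFilter.newBuilder()
                            .setResourceType("document")
                            .setOptionalResourceId("doc1")
                            .setOptionalRelation("owner")
                            .setOptionalSubjectFilter(SubjectFilter.newBuilder()
                                    .setSubjectType("user")
                                    .setOptionalSubjectId("owner1"))))
            .build();
    return permissions.writeRelationships(request).getWrittenAt();
}
```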
c
We will not have a delete for a specific doc, ex doc1, before it is written. But we can have a delete for doc1 while ex doc3 is being written.
j
what is "ex" doc vs normal doc?
c
Sorry, I meant while doc3 is being written.
j
if you write a relationship for doc3 and then check for doc3 (assuming it doesn't rely on doc1 at all) with at_least_as_fresh with the zedtoken from the write, it should reflect that write
if doc3 relies on the relationships for doc1 though, all bets are off IF you're also changing doc1 at the same time
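For reference, a sketch of that write-then-check pattern (again assuming a `permissions` stub as above; `storedToken` is the ZedToken saved from the write that created doc3's relationship):

```java
import com.authzed.api.v1.*;

// Check isOwner on doc3 at least as fresh as the write that created it.
static boolean isOwner(
        PermissionsServiceGrpc.PermissionsServiceBlockingStub permissions,
        ZedToken storedToken) {
    CheckPermissionRequest request = CheckPermissionRequest.newBuilder()
            .setConsistency(Consistency.newBuilder()
                    .setAtLeastAsFresh(storedToken))
            .setResource(ObjectReference.newBuilder()
                    .setObjectType("document")
                    .setObjectId("doc3"))
            .setPermission("isOwner")
            .setSubject(SubjectReference.newBuilder()
                    .setObject(ObjectReference.newBuilder()
                            .setObjectType("user")
                            .setObjectId("owner1")))
            .build();
    return permissions.checkPermission(request).getPermissionship()
            == CheckPermissionResponse.Permissionship.PERMISSIONSHIP_HAS_PERMISSION;
}
```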
c
We will never delete a relation without ensuring we get a ZedToken for that relation write.
j
right, but you need to remember the zedtokens may not overlap
c
This is where we experience the faulty permission check.
j
it shouldn't do so, so you might need to put together a repro case
are you sure you're using the correct zedtoken?
c
And if we retry for 5 seconds, it works
j
because at that point the window has moved on
c
And to make this even more strange, it only happens sometimes.
j
well yeah
it would only happen if someone did a check on the same document, for the same subject
and got the negative result right before the write occurred
otherwise it won't be in the cache
but I would triple check you're sending the correct zedtoken
and the correct consistency level: if you don't set it correctly, it will default to minimize_latency
c
But could this be happening because of the cache window moving?
j
no
if you're using at_least_as_fresh with the correct zedtoken, it should avoid the cache for any data computed before the write committed
c
We use at_least_as_fresh for checks
j
I know - but it's easy to accidentally send the wrong zedtoken
so I recommend triple checking
if you verify it's the correct zedtoken and the consistency is specified correctly (which is easy to get wrong in TypeScript or Python)
then I'd suggest putting together a minimal repro
what client are you using?
c
We experience this even if we do a check in the code line just after the write, with the zedtoken from the write.
We are using Java
j
> We experience this even if we do a check in the code line just after the write, with the zedtoken from the write.
then please put together a repro and we'll see if I can identify the cause
if I recall, the Java client had something odd about specifying consistency levels
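One Java-specific pitfall worth checking (a guess at what that oddity might be, based on the proto-generated types, not confirmed in the thread): `consistency` is a protobuf oneof, and a `Consistency` message with no branch set is treated as minimize_latency by the server. A quick sanity check:

```java
import com.authzed.api.v1.*;

static Consistency consistencyFor(ZedToken storedToken) {
    // Wrong: an empty Consistency is treated as minimize_latency by the
    // server, because no oneof branch was set.
    Consistency empty = Consistency.newBuilder().build();
    assert empty.getRequirementCase()
            == Consistency.RequirementCase.REQUIREMENT_NOT_SET;

    // Right: explicitly set the at_least_as_fresh branch.
    Consistency atLeastAsFresh = Consistency.newBuilder()
            .setAtLeastAsFresh(storedToken)
            .build();
    assert atLeastAsFresh.getRequirementCase()
            == Consistency.RequirementCase.AT_LEAST_AS_FRESH;
    return atLeastAsFresh;
}
```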
c
Is it okay to store each write's ZedToken together with the doc and use it for checks on that document, or should we store the last ZedToken and use that for all checks?
j
yes, that's what's intended
store it alongside the doc
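Since ZedTokens are opaque strings on the wire, persisting one next to the document row is straightforward (a sketch; the variable names are made up):

```java
import com.authzed.api.v1.ZedToken;

// writtenAt is the ZedToken returned by writeRelationships(...).
// Serialize: store this string in a column next to the document row.
String tokenForDb = writtenAt.getToken();

// Rehydrate for a later at_least_as_fresh check on that same document.
ZedToken storedToken = ZedToken.newBuilder()
        .setToken(tokenForDb)
        .build();
```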
c
One last thing: if we synchronize the writes so only one write can happen at a time, everything works fine.
j
sounds like there might be some overlap
anyway, see if you can create a repro
c
I will. Thanks 🙂
I have just seen that this is set in our Docker Compose test setup. Could this be the reason? SPICEDB_DISPATCH_CACHE_ENABLED=false
j
no
disabling a cache would not cause a problem
it just makes things slower