c
Hi! We are experiencing some strange behavior regarding CheckPermission and ZedTokens. Our case is the following: we simultaneously create relationships for 6 documents (ids doc1 - doc6) with the same owner relation (owner1). During the creation of e.g. doc3, the relationship for doc1 gets deleted. When we then check the isOwner permission after the creation of doc3, using the ZedToken we got from the write, we get permission denied. If we then retry the same check in a loop with a 500ms delay, then after 3 - 5 seconds we get permission allowed. Or if we check right away with fully_consistent, we also get permission allowed. Is there something we don't understand about ZedTokens? 🙂
y
> We simultaneously create
meaning in the same WriteRelationships call? my best guess is that the revision in the token you're using isn't pointing at the revision you think it's pointing at, given the behavior you're describing
it might help to show the code in question
c
No, not in the same write, but in multiple threads. Each write gets its own ZedToken, which we store together with the document.
j
zedtokens are not necessarily causally ordered
if you're getting different ones, they will only be guaranteed to reflect the changes from the specific write from which they were created
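For illustration, a minimal sketch of capturing that per-write token with the generated Java client (the endpoint, preshared key, and schema names here are assumptions, not taken from the thread):

```java
import com.authzed.api.v1.*;
import com.authzed.grpcutil.BearerToken;
import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

public class WriteDocOwner {
    public static void main(String[] args) {
        // Assumed local SpiceDB endpoint and preshared key.
        ManagedChannel channel = ManagedChannelBuilder
                .forTarget("localhost:50051")
                .usePlaintext()
                .build();
        PermissionsServiceGrpc.PermissionsServiceBlockingStub permissions =
                PermissionsServiceGrpc.newBlockingStub(channel)
                        .withCallCredentials(new BearerToken("somerandomkey"));

        // TOUCH document:doc3#owner@user:owner1 in its own write.
        WriteRelationshipsRequest request = WriteRelationshipsRequest.newBuilder()
                .addUpdates(RelationshipUpdate.newBuilder()
                        .setOperation(RelationshipUpdate.Operation.OPERATION_TOUCH)
                        .setRelationship(Relationship.newBuilder()
                                .setResource(ObjectReference.newBuilder()
                                        .setObjectType("document")
                                        .setObjectId("doc3"))
                                .setRelation("owner")
                                .setSubject(SubjectReference.newBuilder()
                                        .setObject(ObjectReference.newBuilder()
                                                .setObjectType("user")
                                                .setObjectId("owner1")))))
                .build();

        // The returned token is only guaranteed to reflect this one write;
        // store it alongside doc3.
        ZedToken writtenAt = permissions.writeRelationships(request).getWrittenAt();
        System.out.println(writtenAt.getToken());
        channel.shutdown();
    }
}
```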
c
Why do we then get permission denied on the CheckPermission for the specific write with its ZedToken, and why does the response change if we keep retrying for 5 seconds?
j
you shouldn't so long as the write reflects all the relationships for that check
if it only reflects a portion of them, it's possible to get a different answer
and after 5s, the cache has moved on
y
specifically, if the zedtoken you get for the document writes is functionally earlier than the zedtoken you get for the owner write, and you're checking using the zedtoken for the document write, the request may not reflect the change from the owner write
as for why it changes, it's because at that point the quantization window has moved to a revision that includes the user write
so you're seeing the cache roll over, more or less
j
what datastore are you using?
y
this blog post goes into depth on how quantization windows work if you're curious: https://authzed.com/blog/hotspot-caching-in-google-zanzibar-and-spicedb
c
We are using PostgreSQL
j
which has non-timestamp based revisions
so no causal ordering
also, if you have two writes that change the same relationship and they are started at the same time, the order in which they apply can be either order
so if you have one that adds a relationship on doc1 and another that deletes it, it's possible the delete applies first and the add applies second, so the relationship ends up existing
depends on which hits the datastore first
c
Does that mean that if we make more writes to the owner relationship for the same owner at the same time, but manage to delete one of the first relationships before a later one is written to the database, the tokens for the later ones can be unusable?
Is there a way to see this in the database?
j
it means if you're making changes to the same relationships in parallel
and those changes conflict
the order in which they apply (and therefore the end state) is more or less random
if I have a TOUCH document:1234#owner@user:tom and a DELETE document:1234#owner@user:tom, in two parallel writes
the outcome depends on which finishes first and which finishes second
you can avoid that using preconditions
for example, you could set a precondition on the DELETE that says "only delete this IF the relationship already exists"
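A sketch of such a guarded delete with the Java client, assuming a stub like the `permissions` one from the earlier example (the doc/owner ids are placeholders):

```java
import com.authzed.api.v1.*;

// Delete document:doc1#owner@user:owner1, but only if that relationship
// still exists when the write is applied; otherwise SpiceDB rejects the
// write with FAILED_PRECONDITION instead of silently racing.
static ZedToken deleteOwnerIfPresent(
        PermissionsServiceGrpc.PermissionsServiceBlockingStub permissions) {
    WriteRelationshipsRequest request = WriteRelationshipsRequest.newBuilder()
            .addUpdates(RelationshipUpdate.newBuilder()
                    .setOperation(RelationshipUpdate.Operation.OPERATION_DELETE)
                    .setRelationship(Relationship.newBuilder()
                            .setResource(ObjectReference.newBuilder()
                                    .setObjectType("document")
                                    .setObjectId("doc1"))
                            .setRelation("owner")
                            .setSubject(SubjectReference.newBuilder()
                                    .setObject(ObjectReference.newBuilder()
                                            .setObjectType("user")
                                            .setObjectId("owner1")))))
            .addOptionalPreconditions(Precondition.newBuilder()
                    .setOperation(Precondition.Operation.OPERATION_MUST_MATCH)
                    .setFilter(RelationshipFilter.newBuilder()
                            .setResourceType("document")
                            .setOptionalResourceId("doc1")
                            .setOptionalRelation("owner")
                            .setOptionalSubjectFilter(SubjectFilter.newBuilder()
                                    .setSubjectType("user")
                                    .setOptionalSubjectId("owner1"))))
            .build();
    return permissions.writeRelationships(request).getWrittenAt();
}
```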
c
We will not have a delete for a specific doc, ex doc1, before it is written. But we can have a delete for doc1 while ex doc3 is being written.
j
what is "ex" doc vs normal doc?
c
Sorry, I meant while doc3 is being written.
j
if you write a relationship for doc3 and then check for doc3 (assuming it doesn't rely on doc1 at all) with at_least_as_fresh with the zedtoken from the write, it should reflect that write
if doc3 relies on the relationships for doc1 though, all bets are off IF you're also changing doc1 at the same time
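For reference, a sketch of that write-then-check pattern (again assuming a `permissions` stub as above; `storedToken` is the ZedToken saved from the write that created doc3's relationship):

```java
import com.authzed.api.v1.*;

// Check isOwner on doc3 at least as fresh as the write that created it.
static boolean isOwner(
        PermissionsServiceGrpc.PermissionsServiceBlockingStub permissions,
        ZedToken storedToken) {
    CheckPermissionRequest request = CheckPermissionRequest.newBuilder()
            .setConsistency(Consistency.newBuilder()
                    .setAtLeastAsFresh(storedToken))
            .setResource(ObjectReference.newBuilder()
                    .setObjectType("document")
                    .setObjectId("doc3"))
            .setPermission("isOwner")
            .setSubject(SubjectReference.newBuilder()
                    .setObject(ObjectReference.newBuilder()
                            .setObjectType("user")
                            .setObjectId("owner1")))
            .build();
    return permissions.checkPermission(request).getPermissionship()
            == CheckPermissionResponse.Permissionship.PERMISSIONSHIP_HAS_PERMISSION;
}
```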
c
We will never delete a relation without ensuring we get a ZedToken for that relation write.
j
right, but you need to remember the zedtokens may not overlap
c
This is where we experience the faulty permission check.
j
it shouldn't do so, so you might need to put together a repro case
are you sure you're using the correct zedtoken?
c
And if we retry for 5 seconds, it works
j
because at that point the window has moved on
c
And to make this even more strange, it only happens sometimes.
j
well yeah
it would only happen if someone did a check on the same document, for the same subject
and got the negative result right before the write occurred
otherwise it won't be in the cache
but I would triple check you're sending the correct zedtoken
and the correct consistency level: if you don't set it correctly, it will default to minimize_latency
c
But could this be happening because of the cache window moving?
j
no
if you're using at_least_as_fresh with the correct zedtoken, it should avoid the cache for any data computed before the write committed
c
We use at_least_as_fresh for checks
j
I know - but it's easy to accidentally send the wrong zedtoken
so I recommend triple checking
if you verify it's the correct zedtoken and the consistency is specified correctly (which is easy to get wrong in TypeScript or Python)
then I'd suggest putting together a minimal repro
what client are you using?
c
We experience this even if we do a check in the code line just after the write, with the zedtoken from the write.
We are using Java
j
> We experience this even if we do a check in the code line just after the write, with the zedtoken from the write.
then please put together a repro and we'll see if I can identify the cause
if I recall, the Java client had something odd about specifying consistency levels
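One Java-specific pitfall worth checking (a guess at what that oddity might be, based on the proto-generated types, not confirmed in the thread): `consistency` is a protobuf oneof, and a `Consistency` message with no branch set is treated as minimize_latency by the server. A quick sanity check:

```java
import com.authzed.api.v1.*;

static Consistency consistencyFor(ZedToken storedToken) {
    // Wrong: an empty Consistency is treated as minimize_latency by the
    // server, because no oneof branch was set.
    Consistency empty = Consistency.newBuilder().build();
    assert empty.getRequirementCase()
            == Consistency.RequirementCase.REQUIREMENT_NOT_SET;

    // Right: explicitly set the at_least_as_fresh branch.
    Consistency atLeastAsFresh = Consistency.newBuilder()
            .setAtLeastAsFresh(storedToken)
            .build();
    assert atLeastAsFresh.getRequirementCase()
            == Consistency.RequirementCase.AT_LEAST_AS_FRESH;
    return atLeastAsFresh;
}
```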
c
Is it okay to store each write's ZedToken together with the doc and use it for checks on that document, or should we store the last ZedToken and use that for all checks?
j
yes, that's what's intended
store it alongside the doc
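Since ZedTokens are opaque strings on the wire, persisting one next to the document row is straightforward (a sketch; the variable names are made up):

```java
import com.authzed.api.v1.ZedToken;

// writtenAt is the ZedToken returned by writeRelationships(...).
// Serialize: store this string in a column next to the document row.
String tokenForDb = writtenAt.getToken();

// Rehydrate for a later at_least_as_fresh check on that same document.
ZedToken storedToken = ZedToken.newBuilder()
        .setToken(tokenForDb)
        .build();
```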
c
One last thing: if we synchronize the writes so only one write can happen at a time, everything works fine.
j
sounds like there might be some overlap
anyway, see if you can create a repro
c
I will. Thanks 🙂
I have just seen that this is set in our Docker Compose test setup. Could this be the reason? SPICEDB_DISPATCH_CACHE_ENABLED=false
j
no
disabling a cache would not cause a problem
it just makes things slower