# spicedb
d
Hi all, at my company we've run into an issue. We use SpiceDB with a Postgres datastore, with our app DB as the source of truth for all the relationships in SpiceDB. We have a process that `Touch`es every relationship that should exist in SpiceDB based on the app DB's state and also extends expiration, since we use expiration on all relationships (as was suggested to me here a few months ago). It runs once every 23 hours. We have about 1.3M relationships in our datastore based on `relation_tuple`, not counting deleted ones, but only about 680k of those were written in the last day (the rest will expire eventually), so that's roughly the number of relationships this process has to write once a day, and it generally takes around 20 minutes. Of course we also usually have concurrent usage of SpiceDB by our application, mostly `CheckBulkPermissions` requests, while these syncs happen.

Now with that context, we are seeing two types of errors in the SpiceDB pod logs. I can confirm they appear during these syncs, and I can reliably reproduce them by re-running the sync process. What is interesting is that the `WriteRelationships` calls on the sync process side all seem to succeed, which is why we didn't notice this for some time (we didn't have alerts on the SpiceDB pod's errors). From our side it looked like the sync was succeeding, but we were not seeing some relationships that we thought we should be seeing (we were relying on the sync to create them in that case). I'll split the rest of the message, with the actual errors, into a second message.
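For context, the sync's write path is shaped roughly like the sketch below (simplified, using the authzed-go v1 types rather than our exact code; the `OptionalExpiresAt` field name is my assumption based on the v1 API's relationship expiration support):

```go
package relsync // hypothetical package name

import (
	"context"
	"time"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
	"google.golang.org/protobuf/types/known/timestamppb"
)

// touchAll TOUCHes every relationship derived from the app DB and pushes the
// expiration window forward, writing in fixed-size batches (the size we tune).
func touchAll(ctx context.Context, client v1.PermissionsServiceClient, rels []*v1.Relationship, batchSize int) error {
	// Hypothetical expiration window; ours gets extended on every 23-hour run.
	expiresAt := timestamppb.New(time.Now().Add(48 * time.Hour))

	for start := 0; start < len(rels); start += batchSize {
		end := start + batchSize
		if end > len(rels) {
			end = len(rels)
		}

		updates := make([]*v1.RelationshipUpdate, 0, end-start)
		for _, rel := range rels[start:end] {
			rel.OptionalExpiresAt = expiresAt // assumed field name for the v1 API's relationship expiration
			updates = append(updates, &v1.RelationshipUpdate{
				Operation:    v1.RelationshipUpdate_OPERATION_TOUCH,
				Relationship: rel,
			})
		}

		// Each batch becomes one WriteRelationships call, and (as it turns out)
		// one multi-row INSERT on the Postgres side.
		if _, err := client.WriteRelationships(ctx, &v1.WriteRelationshipsRequest{Updates: updates}); err != nil {
			return err
		}
	}
	return nil
}
```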
Here's the first error (schema-specific parts of the log removed):
{"level":"error","err":"extended protocol limited to 65535 parameters","pid":773,"sql":"INSERT INTO relation_tuple (namespace,object_id,relation,userset_namespace,userset_object_id,userset_relation,caveat_name,caveat_context,expiration) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9),($10,$11,$12,$13,$14,$15,$16,$17,$18),($19,$20,$21,$22,$23,$24,$25,$26,$27),($28,$29,$30,$31,$32,$33,$34,$35,$36),($37,$38,$39,$40,$41,$42,$43,$44,$45),($46,$47,$48,...","time":15.889214},"time":"2025-08-08T06:19:09Z","message":"Query"}
And the second:
{
  "level": "warn",
  "lock_id": 1,
  "message": "held lock not released; this likely indicates a bug"
}
Generally the first error occurs more often than the second; sometimes I don't see the second at all, but usually we get it too. We have SpiceDB configured to allow 10000 relationships as the batch size for `WriteRelationships` calls. Reducing the batch size to 5000 on the sync process's side made the errors go away completely. Could there be anything we're doing wrong, or something we can do to solve this, so that we can put the batch size back to 10000 without the risk of relationships that should have been inserted not showing up? Is it intended for an error like this to be possible when `WriteRelationships` returned a success response? Thanks in advance!
j
no, that error should return a write error
if it isn't, that is a bug
as for the limit, it is a Postgres limit
as for the warning, no idea where that is coming from
d
Hmm, alright. I will try to see if maybe we're somehow ignoring the error. I even ran the process locally, and it appeared that the call was returning a success response with its ZedToken, so I'm not sure; it really looked like the request was succeeding. And since the number of parameters used per relationship isn't related to the schema, would this mean that in practice it's not possible to actually insert a batch of 10000 relationships? I'm not sure how long this error has been coming up without us noticing (we usually insert the relationships in the actual application and only rely on the sync process as a backup in case there are errors), but we've been using 10000 as the limit for a while and at least didn't notice any issue until now. By the way, we are on 1.42.1 now. We just upgraded; it was also happening on 1.40.1 before the upgrade.
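For what it's worth, the check on our side is basically just this (a simplified sketch, not our exact sync code):

```go
package relsync // same hypothetical sketch as above

import (
	"context"
	"log"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
)

// writeBatch is roughly how the sync treats each call: fail on error, otherwise log the ZedToken.
func writeBatch(ctx context.Context, client v1.PermissionsServiceClient, updates []*v1.RelationshipUpdate) error {
	resp, err := client.WriteRelationships(ctx, &v1.WriteRelationshipsRequest{Updates: updates})
	if err != nil {
		return err // we never saw this fire, even for the batches the pod logged as failed INSERTs
	}
	// A ZedToken came back every time, which is why the sync looked successful from our side.
	log.Printf("batch written at revision %s", resp.WrittenAt.GetToken())
	return nil
}
```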
j
if there is any form of error, it should return that error
I'm adding a test now for it
as for hitting the limit, if you enabled expiration, then you added a parameter per relationship
that's 10,000 additional parameters
looking at the column list:
`(namespace,object_id,relation,userset_namespace,userset_object_id,userset_relation,caveat_name,caveat_context,expiration)`
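to put numbers on it: a full 10,000-row batch is 10,000 × 9 = 90,000 bind parameters, well over the 65,535 cap, so with 9 columns a single INSERT tops out at floor(65535 / 9) = 7,281 rows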
d
Ah ok, that makes sense
j
9 cols vs 8 before
I would have suspected it to fail before, given the 65K limit
but it's possible you had fewer than 10K rels sometimes
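even at 8 columns the ceiling was floor(65535 / 8) = 8,191 rows (a full 10k batch would have been 80,000 parameters), so only the batches that came in under ~8k rows would have fit before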
d
Yeah, it's definitely possible the error was appearing before. The actual number of 10k batches is quite small; most of the time they are 1k-2k, but occasionally there will be a 10k batch. We never looked at the SpiceDB logs much before.
j
yeah
that's likely the reason it only appears occasionally
I'm trying to reproduce why the error isn't being returned now
what version of PG are you using?
d
16.8
j
well, I've reproduced the insert failing and not returning an error
but I don't know why
I'll continue investigating
d
Ah interesting. Thank you
j
do you have the read replica support enabled?
d
No
j
okay, I see the root cause
missing error check on one particular branch
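i.e. the bug is shaped roughly like this (purely illustrative, made-up names, not the actual SpiceDB code):

```go
package example // hypothetical

import "context"

// Purely illustrative of the bug shape: one code path runs the write without
// checking the returned error, so the RPC can still report success after a
// failed INSERT.
func writeChunks(ctx context.Context, exec func(context.Context) error, chunks int) error {
	for i := 0; i < chunks; i++ {
		if i == chunks-1 {
			_ = exec(ctx) // error dropped here: the missing check on one branch
			continue
		}
		if err := exec(ctx); err != nil {
			return err
		}
	}
	return nil
}
```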