# spicedb
d
Hi all, at my company we've run into an issue. We use SpiceDB with a Postgres datastore, with our app DB as the source of truth for all the relationships in SpiceDB. We have a process that `Touch`es every relationship that should exist in SpiceDB based on the app DB's state and also extends expiration, since we use expiration on all relationships (as was suggested to me here a few months ago). It runs once every 23 hours. We have about 1.3M relationships in our datastore based on `relation_tuple`, not counting deleted ones, but only about 680k of those were written in the last day (the rest will expire eventually), so that's roughly the number of relationships this process has to write once a day, and it generally takes around 20 minutes. Of course we also usually have concurrent usage of SpiceDB by our application, mostly `CheckBulkPermissions` requests, while these syncs happen.

Now with that context, we are seeing two types of errors in the SpiceDB pod logs. I can confirm they appear during these syncs, and I can reliably reproduce them by re-running the sync process. What is interesting is that the `WriteRelationships` calls on the sync process side all seem to succeed, which is why we didn't notice this for some time (we didn't have alerts on the SpiceDB pod's errors). From our side it looked like the sync was succeeding, but we were not seeing some relationships that we thought we should be seeing (we were relying on the sync to create them in that case). I'll split the rest of the message, with the actual errors, into a second message.
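For context, the sync's write path is shaped roughly like the sketch below (simplified, using the authzed-go v1 types rather than our exact code; the `OptionalExpiresAt` field name is my assumption based on the v1 API's relationship expiration support):

```go
package relsync // hypothetical package name

import (
	"context"
	"time"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
	"google.golang.org/protobuf/types/known/timestamppb"
)

// touchAll TOUCHes every relationship derived from the app DB and pushes the
// expiration window forward, writing in fixed-size batches (the size we tune).
func touchAll(ctx context.Context, client v1.PermissionsServiceClient, rels []*v1.Relationship, batchSize int) error {
	// Hypothetical expiration window; ours gets extended on every 23-hour run.
	expiresAt := timestamppb.New(time.Now().Add(48 * time.Hour))

	for start := 0; start < len(rels); start += batchSize {
		end := start + batchSize
		if end > len(rels) {
			end = len(rels)
		}

		updates := make([]*v1.RelationshipUpdate, 0, end-start)
		for _, rel := range rels[start:end] {
			rel.OptionalExpiresAt = expiresAt // assumed field name for the v1 API's relationship expiration
			updates = append(updates, &v1.RelationshipUpdate{
				Operation:    v1.RelationshipUpdate_OPERATION_TOUCH,
				Relationship: rel,
			})
		}

		// Each batch becomes one WriteRelationships call, and (as it turns out)
		// one multi-row INSERT on the Postgres side.
		if _, err := client.WriteRelationships(ctx, &v1.WriteRelationshipsRequest{Updates: updates}); err != nil {
			return err
		}
	}
	return nil
}
```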
Here's the first error (schema-specific parts of the log removed):
{"level":"error","err":"extended protocol limited to 65535 parameters","pid":773,"sql":"INSERT INTO relation_tuple (namespace,object_id,relation,userset_namespace,userset_object_id,userset_relation,caveat_name,caveat_context,expiration) VALUES ($1,$2,$3,$4,$5,$6,$7,$8,$9),($10,$11,$12,$13,$14,$15,$16,$17,$18),($19,$20,$21,$22,$23,$24,$25,$26,$27),($28,$29,$30,$31,$32,$33,$34,$35,$36),($37,$38,$39,$40,$41,$42,$43,$44,$45),($46,$47,$48,...","time":15.889214},"time":"2025-08-08T06:19:09Z","message":"Query"}
And the second:
{
  "level": "warn",
  "lock_id": 1,
  "message": "held lock not released; this likely indicates a bug"
}
Generally the first error occurs more often than the second; sometimes I don't see the second at all, but usually we get it too. We have SpiceDB configured to allow 10000 relationships as the batch size for `WriteRelationships` calls. Reducing the batch size to 5000 on the sync process's side made the errors go away completely. Could there be anything we're doing wrong, or something we can do to solve this, so that we can put the batch size back to 10000 without the risk of relationships that should have been inserted not showing up? Is it intended for an error like this to be possible when `WriteRelationships` returned a success response? Thanks in advance!
j
no, that error should return a write error
if it isn't, that is a bug
as for the limit, it is a Postgres limit
as for the warning, no idea where that is coming from
d
Hmm, alright. I will try to see if maybe we're somehow ignoring the error. I even ran the process locally, and it appeared that the call was returning a success response with its ZedToken, so I'm not sure; it really looked like the request was succeeding. And since the number of parameters used per relationship isn't related to the schema, would this mean that in practice it's not possible to actually insert a batch of 10000 relationships? I'm not sure how long this error has been coming up without us noticing (we usually insert the relationships in the actual application and only rely on the sync process as a backup in case there are errors), but we've been using 10000 as the limit for a while and at least didn't notice any issue until now. By the way, we are on 1.42.1 now. We just upgraded; it was also happening on 1.40.1 before the upgrade.
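For what it's worth, the check on our side is basically just this (a simplified sketch, not our exact sync code):

```go
package relsync // same hypothetical sketch as above

import (
	"context"
	"log"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
)

// writeBatch is roughly how the sync treats each call: fail on error, otherwise log the ZedToken.
func writeBatch(ctx context.Context, client v1.PermissionsServiceClient, updates []*v1.RelationshipUpdate) error {
	resp, err := client.WriteRelationships(ctx, &v1.WriteRelationshipsRequest{Updates: updates})
	if err != nil {
		return err // we never saw this fire, even for the batches the pod logged as failed INSERTs
	}
	// A ZedToken came back every time, which is why the sync looked successful from our side.
	log.Printf("batch written at revision %s", resp.WrittenAt.GetToken())
	return nil
}
```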
j
if there is any form of error, it should return that error
I'm adding a test now for it
as for hitting the limit, if you enabled expiration, then you added a parameter per relationship
that's 10,000 additional parameters
looking at the column list:
`(namespace,object_id,relation,userset_namespace,userset_object_id,userset_relation,caveat_name,caveat_context,expiration)`
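to put numbers on it: a full 10,000-row batch is 10,000 × 9 = 90,000 bind parameters, well over the 65,535 cap, so with 9 columns a single INSERT tops out at floor(65535 / 9) = 7,281 rows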
d
Ah ok, that makes sense
j
9 cols vs 8 before
I would have suspected it to fail before, given the 65K limit
but it's possible you had fewer than 10K rels sometimes
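even at 8 columns the ceiling was floor(65535 / 8) = 8,191 rows (a full 10k batch would have been 80,000 parameters), so only the batches that came in under ~8k rows would have fit before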
d
Yeah, it's definitely possible the error was appearing before. The actual number of 10k batches is quite small; most of the time they are 1k-2k, but occasionally there will be a 10k batch. We never looked at the SpiceDB logs much before.
j
yeah
that's likely the reason it only appears occasionally
I'm trying to reproduce why the error isn't being returned now
what version of PG are you using?
d
16.8
j
well, I've reproduced the insert failing and not returning an error
but I don't know why
I'll continue investigating
d
Ah interesting. Thank you
j
do you have the read replica support enabled?
d
No
j
okay, I see the root cause
missing error check on one particular branch
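i.e. the bug is shaped roughly like this (purely illustrative, made-up names, not the actual SpiceDB code):

```go
package example // hypothetical

import "context"

// Purely illustrative of the bug shape: one code path runs the write without
// checking the returned error, so the RPC can still report success after a
// failed INSERT.
func writeChunks(ctx context.Context, exec func(context.Context) error, chunks int) error {
	for i := 0; i < chunks; i++ {
		if i == chunks-1 {
			_ = exec(ctx) // error dropped here: the missing check on one branch
			continue
		}
		if err := exec(ctx); err != nil {
			return err
		}
	}
	return nil
}
```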