Duncan
09/19/2023, 10:26 AM
panic: runtime error: makeslice: cap out of range
from github.com/authzed/spicedb/internal/datastore/postgres.parseRevisionDecimal({0xc000bce280?, 0x1?})
(larger trace in thread).
We suspect this is because we have some consistency tokens stored from when we were on serverless that don't match our self-hosted db, but wanted to:
1. check that this is a likely cause of the error we're seeing
2. report that this takes out our entire SpiceDB cluster when it happens. We haven't yet isolated whether this is because we retry the permission check on a service failure, so we quickly cycle through the available SpiceDB nodes until they're all dead, or whether the group failure is the result of internal cluster communication.
Duncan
09/19/2023, 10:30 AM
2023-09-18 16:15:10 panic: runtime error: makeslice: cap out of range
2023-09-18 16:15:10
2023-09-18 16:15:10 goroutine 811 [running]:
2023-09-18 16:15:10 github.com/authzed/spicedb/internal/datastore/postgres.parseRevisionDecimal({0xc000bce280?, 0x1?})
2023-09-18 16:15:10     /home/runner/work/spicedb/spicedb/internal/datastore/postgres/revisions.go:206 +0x1ba
2023-09-18 16:15:10 github.com/authzed/spicedb/internal/datastore/postgres.parseRevision({0xc000bce280, 0x1e})
2023-09-18 16:15:10     /home/runner/work/spicedb/spicedb/internal/datastore/postgres/revisions.go:132 +0x4f
2023-09-18 16:15:10 github.com/authzed/spicedb/internal/datastore/postgres.(*pgDatastore).RevisionFromString(0xc0011dc6e8?, {0xc000bce280?, 0xc000a8a8c0?})
2023-09-18 16:15:10     /home/runner/work/spicedb/spicedb/internal/datastore/postgres/revisions.go:126 +0x25
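For readers unfamiliar with this panic: Go raises "makeslice: cap out of range" when make is given a capacity it cannot allocate. One way to end up with such a capacity is an unsigned subtraction that wraps around. The following is a minimal sketch of that failure mode and of a guard against it. The helper spanCapacity, the limit maxSpan, and the sample values are hypothetical and are not taken from SpiceDB's revisions.go; the names xmin and xmax are used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// maxSpan is an arbitrary illustrative limit, not a SpiceDB constant.
const maxSpan = 1 << 20

// spanCapacity is a hypothetical helper. It returns xmax-xmin as a slice
// capacity and rejects inputs where the unsigned subtraction would wrap
// around or where the span is implausibly large.
func spanCapacity(xmin, xmax uint64) (int, error) {
	if xmax < xmin {
		return 0, errors.New("xmax is less than xmin")
	}
	if xmax-xmin > maxSpan {
		return 0, errors.New("span between xmin and xmax is too large")
	}
	return int(xmax - xmin), nil
}

func main() {
	// Illustrative values only: a very large xmin paired with a small xmax.
	var xmin, xmax uint64 = 1693540940373045727, 1

	// Unsigned subtraction wraps to a huge value instead of going negative.
	fmt.Println("unchecked xmax-xmin:", xmax-xmin)

	// The guard turns the bad input into an error instead of a panic.
	if _, err := spanCapacity(xmin, xmax); err != nil {
		fmt.Println("rejected:", err)
	}

	// A well-formed pair yields a small, safe capacity.
	n, _ := spanCapacity(100, 103)
	ids := make([]uint64, 0, n)
	fmt.Println("capacity for a valid pair:", cap(ids))
}
```

Returning an error from the guard lets the caller report an invalid revision to the client rather than crashing the process.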
Likely cause: https://github.com/authzed/spicedb/blob/main/internal/datastore/postgres/revisions.go#L206 where xmax-xmin is producing an invalid cap, similar to https://github.com/golang/go/issues/52783
Joey
09/19/2023, 2:56 PM
Joey
09/19/2023, 2:56 PM
Joey
09/19/2023, 2:56 PM
Duncan
09/19/2023, 3:44 PM
Joey
09/19/2023, 3:46 PM
Joey
09/19/2023, 3:46 PM
Duncan
09/19/2023, 6:11 PM
Some zedtokens are of the format GhUKEzE2OTM1NDA5NDQ5NTk3MjA1OTI=. These are handled fine and just cause a “revision was invalid” error or a fallback to another consistency mode.
Some zedtokens are of the format GiAKHjE2OTM1NDA5NDAzNzMwNDU3MjcuMDAwMDAwMDAwMQ==. That one nukes an instance.
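To see how the two token shapes differ, they can be base64-decoded and inspected. The sketch below assumes the layout observed in these two tokens only: two nested length-delimited fields, each with a one-byte tag and a one-byte length, followed by the revision string. The helper revisionFromToken is hypothetical and is not a general zedtoken decoder.

```go
package main

import (
	"encoding/base64"
	"errors"
	"fmt"
	"strings"
)

// revisionFromToken returns the revision string embedded in a zedtoken.
// Assumption: the token is base64 over two nested length-delimited fields
// with one-byte tags and one-byte lengths, as in the two tokens quoted in
// this thread. Anything else is rejected.
func revisionFromToken(token string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(token)
	if err != nil {
		return "", err
	}
	// raw[1] is the outer length, raw[3] the inner length.
	if len(raw) < 4 || int(raw[1]) != len(raw)-2 || int(raw[3]) != len(raw)-4 {
		return "", errors.New("token does not match the assumed layout")
	}
	return string(raw[4:]), nil
}

func main() {
	tokens := []string{
		"GhUKEzE2OTM1NDA5NDQ5NTk3MjA1OTI=",
		"GiAKHjE2OTM1NDA5NDAzNzMwNDU3MjcuMDAwMDAwMDAwMQ==",
	}
	for _, t := range tokens {
		rev, err := revisionFromToken(t)
		if err != nil {
			fmt.Println("decode failed:", err)
			continue
		}
		fmt.Printf("%s (has fractional part: %t)\n", rev, strings.Contains(rev, "."))
	}
}
```

The first token carries the plain integer 1693540944959720592. The second carries 1693540940373045727.0000000001, which is 30 bytes long and matches the 0x1e length argument in the parseRevision frame of the trace. A check like this could be used to find stored tokens with a fractional part before they are sent to the self-hosted cluster.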
zed --endpoint=spicedb.us.com:443 --token=<auth-token> --permissions-system perms/ relationship read perms/application:1 --consistency-at-least GiAKHjE2OTM1NDA5NDAzNzMwNDU3MjcuMDAwMDAwMDAwMQ==
Error: rpc error: code = Unavailable desc = unexpected HTTP status code received from server: 502 (Bad Gateway); transport: received unexpected content-type "text/html"
Same effect with --consistency-at-exactly. Both take out the spicedb pod that received the request, with the same error shown above.
The request only takes out the one pod though. The issue we saw with all pods in the SpiceDBCluster failing must have been the result of us retrying, or of a burst of similar requests coming in.
Duncan
09/19/2023, 6:20 PM
max>min on L205
Duncan
09/19/2023, 6:23 PM
Joey
09/19/2023, 6:32 PM
Joey
09/19/2023, 6:32 PM
Joey
09/19/2023, 6:32 PM
Joey
09/19/2023, 6:32 PM
Duncan
09/19/2023, 6:33 PM
Joey
09/19/2023, 6:33 PM
Duncan
09/19/2023, 6:34 PM
Joey
09/19/2023, 6:35 PM
Joey
09/19/2023, 6:35 PM
Joey
09/19/2023, 6:35 PM
Joey
09/19/2023, 6:58 PM
Joey
09/19/2023, 6:58 PM
Joey
09/19/2023, 6:58 PM
Joey
09/19/2023, 7:15 PM
Duncan
09/19/2023, 7:47 PM
Joey
09/19/2023, 7:59 PM
Duncan
09/20/2023, 9:34 AM
Joey
09/20/2023, 2:57 PM