Ben Simpson
03/13/2025, 3:13 AM"err": "ERROR: transaction ID 808116016 is in the future (SQLSTATE 22023)",
Datastore is AWS Aurora Postgres, we're running SpiceDB 1.41.0 and have set SPICEDB_DATASTORE_FOLLOWER_READ_DELAY_DURATION=500ms
Our maximum replication lag is around 20ms so 500ms should be more than enough. Using minimize_latency
for permission checksvroldanbet
03/13/2025, 9:47 AMminimize_latency
will basically look at the last 6s
of quantization window (5s seconds quantization window + 10% extra accepted staleness + 500ms delay).
In order to reduce the amount of replica misses, you gotta ingrease the follower read delay.Joey
03/13/2025, 11:42 AMJoey
03/13/2025, 11:42 AMJoey
03/13/2025, 11:43 AMBen Simpson
03/13/2025, 9:08 PMWriter
and Reader
endpoint for the cluster. The Writer
endpoint always connects to the single Writer
instance, while the Reader
endpoint always connects to one of the read instances. Based on the documentation, I believe that the only way (barring a failover, which we haven't had) for these endpoints to connect to the same instance would be to only run a single instance with no replicas.Ben Simpson
03/13/2025, 9:09 PMBen Simpson
03/13/2025, 9:20 PMJoey
03/13/2025, 9:28 PMBen Simpson
03/13/2025, 9:47 PMJoey
03/13/2025, 9:48 PMJoey
03/13/2025, 9:48 PMBen Simpson
03/13/2025, 10:04 PMBen Simpson
03/13/2025, 10:06 PMquery_total
vs fallback_query_total
it's working ok and we just need to tune the delay?Ben Simpson
03/13/2025, 10:18 PMBen Simpson
03/13/2025, 10:48 PMBen Simpson
03/13/2025, 11:14 PMBen Simpson
03/13/2025, 11:25 PMreplica = total - fallback
https://cdn.discordapp.com/attachments/1349581093334155325/1349886207416602664/image.png?ex=67d4baff&is=67d3697f&hm=3e5100d4cf2a1ec7065309f818a8bdb81d12008768a519aedb502f3b31c9ae34&Ben Simpson
03/14/2025, 1:52 AMecordell
03/14/2025, 3:16 AMBen Simpson
03/14/2025, 3:22 AMBen Simpson
03/14/2025, 3:26 AMBen Simpson
03/14/2025, 3:33 AMvroldanbet
03/14/2025, 12:00 PMvroldanbet
03/14/2025, 12:01 PMvroldanbet
03/14/2025, 12:02 PMvroldanbet
03/14/2025, 12:14 PMpg_current_snapshot()
, which is going to give you a revision that does not even exist at that point.
So I think that 10% fallback seems reasonable based on how the system is implemented, and I'd argue it's fantastic.
I think this scenario could be more likely if your write throuhgput is low and writes are sparse, and can get exacerbated if you have other workloads running in the same database or any other processes that cause out-of-band snapshot changes to Aurora. What's the write throughput in your system?
for reference, this is the [optimized revision query](https://github.com/authzed/spicedb/blob/5b3d384cbbe9bcc5d488bb1360f565110cecb184/internal/datastore/postgres/revisions.go#L41-L61)vroldanbet
03/14/2025, 12:15 PMvroldanbet
03/14/2025, 12:16 PMoptimized_revision
behave like full_consistency
vroldanbet
03/14/2025, 12:19 PMBen Simpson
03/16/2025, 6:54 PMminimize_latency
but could be something I missed. I'll keep the delay low seeing as it seems to have little impact on the overall %age of fallbacks. I agree that 10% isn't unreasonable, just with large numbers of requests it generates a lot of error logs. I've added a filter to the fluent-bit config to filter this particular error out seeing as we have visibility of this via the metrics.
Thanks for all the help, as usual!Joey
03/16/2025, 6:55 PMJoey
03/16/2025, 6:55 PMBen Simpson
03/16/2025, 10:33 PMERROR: transaction ID <id> is in the future (SQLSTATE 22023)
ERROR: replica missing revision (SQLSTATE P0004)
vroldanbet
03/17/2025, 12:09 PMJoey
03/17/2025, 12:09 PMJoey
03/17/2025, 12:09 PMJoey
03/17/2025, 12:09 PMvroldanbet
03/17/2025, 12:11 PMvroldanbet
03/17/2025, 12:11 PMvroldanbet
03/17/2025, 12:11 PMJoey
03/17/2025, 12:12 PMJoey
03/17/2025, 12:32 PMBen Simpson
03/17/2025, 8:37 PMjson
{
"time": "2025-03-14T00:57:27Z",
"level": "error",
"source": "stderr",
"message": "Query",
"pgx": {
"time": 1.26576,
"pid": 1266,
"sql": "\n\t\tSELECT * FROM (SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, caveat_name, caveat_context FROM relation_tuple WHERE pg_visible_in_snapshot(created_xid, $1) = $2 AND pg_visible_in_snapshot(deleted_xid, $3) = $4 AND namespace = $5 AND relation = $6 AND object_id IN ($7) AND ((userset_namespace = $8 A...",
"args": [
"REDACTED"
],
"err": "ERROR: transaction ID 811038035 is in the future (SQLSTATE 22023)"
}
}
json
{
"source": "stderr",
"pgx": {
"err": "ERROR: replica missing revision (SQLSTATE P0004)",
"pid": 8503,
"sql": "\n\t\tSELECT * FROM (SELECT serialized_config, created_xid FROM namespace_config WHERE pg_visible_in_snapshot(created_xid, $1) = $2 AND pg_visible_in_snapshot(deleted_xid, $3) = $4 AND (namespace = $5 OR namespace = $6)) AS results WHERE pg_xact_status(816057260::text::xid8) != 'in progress';\n\t\tDO $$\n\t\tBEGIN\n\t\t\tASSERT (select pg_xact_status(816057260:...",
"args": [
"REDACTED"
],
"time": 1.933301
},
"time": "2025-03-17T00:06:23Z",
"level": "error",
"message": "Query"
}
Joey
03/17/2025, 8:47 PMJoey
03/17/2025, 8:47 PMBen Simpson
03/17/2025, 9:00 PM