# spicedb
b
I'm seeing a lot of the following in the logs after enabling read replica:
`"err": "ERROR: transaction ID 808116016 is in the future (SQLSTATE 22023)"`
Datastore is AWS Aurora Postgres, we're running SpiceDB 1.41.0 and have set `SPICEDB_DATASTORE_FOLLOWER_READ_DELAY_DURATION=500ms`.
Our maximum replication lag is around 20ms, so 500ms should be more than enough. Using `minimize_latency` for permission checks.
v
That error is definitely an unreplicated revision. Is it bubbling up in the API call, or is it just being logged because it falls back to the primary? If the former, that's unexpected; if the latter, it's expected. There are metrics that show how many times the primary is hit vs the replica. It's entirely possible that, even if the delay is set, replica misses are hit. The optimized revision selection algorithm (what's used when you do `minimize_latency`) will basically look at the last `6s` of quantization window (5s quantization window + 10% extra accepted staleness + 500ms delay). In order to reduce the amount of replica misses, you gotta increase the follower read delay.
j
@Ben Simpson how are you configuring the read replica endpoint?
make absolutely certain the primary endpoint is not capable of reaching a replica and vice versa
if there is a shared DNS endpoint, it won't work
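for illustration (hostnames made up, and the read-replica variable name here is from memory, so double-check `spicedb serve --help` for your version), keeping the two endpoints fully separate would look something like:

```sh
# Illustrative only: hostnames are fake, and the read-replica variable name
# should be verified against `spicedb serve --help` for your SpiceDB version.
# Aurora's "cluster-" (writer) endpoint always resolves to the writer instance;
# the "cluster-ro-" (reader) endpoint only resolves to read replicas.
SPICEDB_DATASTORE_CONN_URI="postgres://user:pass@mycluster.cluster-abc123.eu-west-1.rds.amazonaws.com:5432/spicedb"
SPICEDB_DATASTORE_READ_REPLICA_CONN_URI="postgres://user:pass@mycluster.cluster-ro-abc123.eu-west-1.rds.amazonaws.com:5432/spicedb"
SPICEDB_DATASTORE_FOLLOWER_READ_DELAY_DURATION=500ms
```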
b
Aurora provides a `Writer` and a `Reader` endpoint for the cluster. The `Writer` endpoint always connects to the single `Writer` instance, while the `Reader` endpoint always connects to one of the read instances. Based on the documentation, I believe that the only way (barring a failover, which we haven't had) for these endpoints to connect to the same instance would be to only run a single instance with no replicas.
I'll have a look at the metrics
I'm 99% certain it's not bubbling up to the API; the problem is that the sheer volume of error logs has resulted in a noticeable increase in CloudWatch usage
j
oh; yeah, that's not unexpected
b
Is there a way to suppress specific errors?
j
no; you'll need to filter the logs on your side
we can look into filtering it on our end, but it won't be simple
But looking at `query_total` vs `fallback_query_total`, it's working ok and we just need to tune the delay?
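For reference, roughly the ratio I'm looking at (metric names as referenced above; the exported names may carry a prefix, so check `/metrics` for the exact spelling in your deployment):

```promql
# Fraction of replica-eligible queries that fell back to the primary (5m window).
sum(rate(fallback_query_total[5m]))
/
sum(rate(query_total[5m]))
```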
I'm just trying to understand why exactly this is missing so frequently with the follower delay set to 500ms, because, according to a comment from @ecordell, it should be set to somewhere around the replication lag, which in our case is ~20ms since it's only another AZ within the same region and not global:
> If you are going to run with Global DB with cross-region replicas, I would recommend running at least version 1.41 so that you can use the --datastore-follower-read-delay flag with postgres and set it to the replica lag. This will ensure SpiceDB picks revisions for minimize_latency requests that are very likely to be replicated in the local region.
I'll adjust it upwards and see if it makes any difference.
Yeah increasing it to 2s did basically nothing.
And lowering the value to 200ms appeared to do nothing either. I checked the maximum replication lag and it's <30ms
e
The metrics look like it's showing like 80-90% of queries are landing on the replica? Or am I misreading? Can you tell us more about your setup?
b
It's just an Aurora Postgres cluster with 2 instances i.e. single writer, single replica in the same region in different AZs with a maximum replication lag of ~30ms. I would have expected to see closer to 100% hits on the replica with a follower delay of >30ms, or am I misunderstanding something here?
And in terms of spicedb, multiple nodes with dispatch enabled
The point of the follower delay is to prevent spicedb from trying to fetch revisions from the replica that can't possibly be in the replica, right? If my replication lag is 30ms and I configure a 2s follower delay, how on earth are 10% of queries to the replica missing?
v
have you checked if 100% of the traffic is minimize_latency?
there are metrics for that
any chance you can check any sample queries that are hitting the primary?
I'm thinking the following. The quantization algorithm slices time into windows of 6s (assuming the configuration above). Let's say you hit the next quantization window, and the revision that is selected was just written (i.e. it falls in the tail end of the 6s window, say at 5.99s), and no other revision exists in that window. Now you use an optimized revision that is very fresh, and unlikely to be replicated. Also, if there is no transaction written in that window, you are 100% going to get an unreplicated snapshot because we use `pg_current_snapshot()`, which is going to give you a revision that does not even exist at that point. So I think that 10% fallback seems reasonable based on how the system is implemented, and I'd argue it's fantastic. I think this scenario could be more likely if your write throughput is low and writes are sparse, and it can get exacerbated if you have other workloads running in the same database or any other processes that cause out-of-band snapshot changes to Aurora. What's the write throughput in your system? For reference, this is the [optimized revision query](https://github.com/authzed/spicedb/blob/5b3d384cbbe9bcc5d488bb1360f565110cecb184/internal/datastore/postgres/revisions.go#L41-L61).
As you can see in the comment in the query:
> If there are no transactions newer than the quantization period, it just picks the latest transaction.
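Very roughly, and only as an illustration (the real query is the one linked above and is more careful than this; table/column names here are indicative), the selection behaves like:

```sql
-- Illustration only; see the linked revisions.go for the real query.
-- Prefer the newest transaction inside the current quantization window;
-- with sparse writes, fall back to the latest known transaction, and
-- ultimately to pg_current_snapshot(), which a replica may not have
-- replayed yet -- hence the "transaction ID ... is in the future" errors.
SELECT COALESCE(
    (SELECT xid::text FROM relation_tuple_transaction
      WHERE timestamp >= NOW() - INTERVAL '6 seconds'
      ORDER BY timestamp DESC LIMIT 1),
    (SELECT xid::text FROM relation_tuple_transaction
      ORDER BY timestamp DESC LIMIT 1),
    pg_current_snapshot()::text
);
```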
I happen to have changed all of this for SpiceDB 1.41, and can tell you that it's much better than before. The quantization system for Postgres was broken for databases with sparse writes or out-of-band writes, which basically made `optimized_revision` behave like `full_consistency`.
I debated with @Jake a lot about this new implementation; he convinced me that we don't want to give perfect caching, because it can become a double-edged sword.
b
I'm like 99% certain everything is `minimize_latency`, but it could be something I missed. I'll keep the delay low seeing as it seems to have little impact on the overall percentage of fallbacks. I agree that 10% isn't unreasonable; it's just that with large numbers of requests it generates a lot of error logs. I've added a filter to the fluent-bit config to filter this particular error out, seeing as we have visibility of this via the metrics. Thanks for all the help, as usual!
j
Yeah, we explicitly did not filter out those errors
so long as the clients are not seeing them, it's Working As Intended
b
For anyone else following along who wants to exclude these errors, turns out there are 2 different errors logged:
`ERROR: transaction ID <id> is in the future (SQLSTATE 22023)`
`ERROR: replica missing revision (SQLSTATE P0004)`
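And in case it helps, roughly the fluent-bit filter I mean (the record accessor assumes the error lands under `pgx.err` after JSON parsing; adjust the key and regex for your pipeline):

```
[FILTER]
    # Drops the two expected replica-miss errors; everything else passes through.
    # Adjust $pgx['err'] if your parser flattens the log differently.
    Name     grep
    Match    *
    Exclude  $pgx['err'] (is\sin\sthe\sfuture|replica\smissing\srevision)
```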
v
@Joey perhaps we could just turn those 2 specific errors into a label like "errorCode" on the metric that captures the fallback, and then just completely elide the error
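something like this sketch (names made up, not the actual SpiceDB metric code):

```go
package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// Hypothetical: the fallback counter extended with an errorCode label, so the
// cause of each replica miss stays visible without an error log per miss.
var replicaFallbacks = promauto.NewCounterVec(prometheus.CounterOpts{
	Namespace: "spicedb",
	Subsystem: "datastore",
	Name:      "fallback_query_total",
	Help:      "Queries that fell back to the primary, labeled by cause.",
}, []string{"errorCode"})

// recordFallback classifies the two known replica-miss errors by SQLSTATE.
func recordFallback(sqlstate string) {
	replicaFallbacks.WithLabelValues(sqlstate).Inc() // e.g. "22023" or "P0004"
}
```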
j
we could, but it can't be done at the proxy level
so it means putting it into the datastore code itself
it's doable, just something that will "bleed" into it
v
I see, yeah, it leaks out
perhaps just moving it to debug level?
since it's an expected error condition that users can't really do much about
j
@Ben Simpson can you paste the full log please?
b
```json
{
    "time": "2025-03-14T00:57:27Z",
    "level": "error",
    "source": "stderr",
    "message": "Query",
    "pgx": {
        "time": 1.26576,
        "pid": 1266,
        "sql": "\n\t\tSELECT * FROM (SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, caveat_name, caveat_context FROM relation_tuple WHERE pg_visible_in_snapshot(created_xid, $1) = $2 AND pg_visible_in_snapshot(deleted_xid, $3) = $4 AND namespace = $5 AND relation = $6 AND object_id IN ($7) AND ((userset_namespace = $8 A...",
        "args": [
            "REDACTED"
        ],
        "err": "ERROR: transaction ID 811038035 is in the future (SQLSTATE 22023)"
    }
}
```
```json
{
    "source": "stderr",
    "pgx": {
        "err": "ERROR: replica missing revision (SQLSTATE P0004)",
        "pid": 8503,
        "sql": "\n\t\tSELECT * FROM (SELECT serialized_config, created_xid FROM namespace_config WHERE pg_visible_in_snapshot(created_xid, $1) = $2 AND pg_visible_in_snapshot(deleted_xid, $3) = $4 AND (namespace = $5 OR namespace = $6)) AS results WHERE pg_xact_status(816057260::text::xid8) != 'in progress';\n\t\tDO $$\n\t\tBEGIN\n\t\t\tASSERT (select pg_xact_status(816057260:...",
        "args": [
            "REDACTED"
        ],
        "time": 1.933301
    },
    "time": "2025-03-17T00:06:23Z",
    "level": "error",
    "message": "Query"
}
```
j
fix already merged
will go out in the next release
b
🙌