I'm seeing a **lot** of the following in SpiceDB #spicedb

I'm seeing a **lot** of the following in

Ben Simpson

03/13/2025, 3:13 AM

I'm seeing a lot of the following in the logs after enabling read replica:

"err": "ERROR: transaction ID 808116016 is in the future (SQLSTATE 22023)",

Datastore is AWS Aurora Postgres, we're running SpiceDB 1.41.0 and have set

SPICEDB_DATASTORE_FOLLOWER_READ_DELAY_DURATION=500ms

Our maximum replication lag is around 20ms so 500ms should be more than enough. Using

minimize_latency

for permission checks

vroldanbet

03/13/2025, 9:47 AM

That error is definitely an unreplicated revision. Is it bubbling up in the API call, or is it just being logged because it falls to the primary? If the former, that's unexpected, if the latter, it's expected. There are metrics that show how many times the primary is hit vs the replica It's entirely possible that, even if delay is set, replica misses are hit. The optimized revision selection algorithm (what's used when you do

minimize_latency

will basically look at the last

6s

of quantization window (5s seconds quantization window + 10% extra accepted staleness + 500ms delay). In order to reduce the amount of replica misses, you gotta ingrease the follower read delay.

Joey

03/13/2025, 11:42 AM

@Ben Simpson how are you configuring the read replica endpoint?

Joey

03/13/2025, 11:42 AM

make absolutely certain the primary endpoint is not capable of reaching a replica and vice versa

Joey

03/13/2025, 11:43 AM

if there is a shared DNS endpoint, it won't work

Ben Simpson

03/13/2025, 9:08 PM

Aurora provides a

Writer

and

Reader

endpoint for the cluster. The

Writer

endpoint always connects to the single

Writer

instance, while the

Reader

endpoint always connects to one of the read instances. Based on the documentation, I believe that the only way (barring a failover, which we haven't had) for these endpoints to connect to the same instance would be to only run a single instance with no replicas.

Ben Simpson

03/13/2025, 9:09 PM

I'll have a look at the metrics

Ben Simpson

03/13/2025, 9:20 PM

I'm 99% certain it's not bubbling up to the API, the problem is the sheer amount of error logs resulted in a noticeable increase in cloudwatch usage

Joey

03/13/2025, 9:28 PM

oh; yeah, that's not unexpected

Ben Simpson

03/13/2025, 9:47 PM

Is there a way to suppress specific errors?

Joey

03/13/2025, 9:48 PM

no; you'll need to filter the logs on your side

Joey

03/13/2025, 9:48 PM

we can look into filtering it on our end, but it won't be simple

Ben Simpson

03/13/2025, 10:04 PM

I'm not sure what to make of the fact that 2 of those metrics are 0? https://cdn.discordapp.com/attachments/1349581093334155325/1349865838093533305/image.png?ex=67d4a807&is=67d35687&hm=4a99796ba9c46d414a27b84af33f3568a4e94b0672827ca56cc8af872449b220&

Ben Simpson

03/13/2025, 10:06 PM

But looking at

query_total

fallback_query_total

it's working ok and we just need to tune the delay?

Ben Simpson

03/13/2025, 10:18 PM

I'm just trying to understand why exactly this is missing so frequently with the follower delay set to 500ms because according to a comment from @ecordell about this it should be set to somewhere around the replication lag, which in our case is ~20ms because it's only an AZ within the same region and not global: > If you are going to run with Global DB with cross-region replicas, I would recommend running at least version 1.41 so that you can use the --datastore-follower-read-delay flag with postgres and set it to the replica lag. This will ensure SpiceDB picks revisions for minimize_latency requests that are very likely to be replicated in the local region.

Ben Simpson

03/13/2025, 10:48 PM

I'll adjust it upwards and see if it makes any difference.

Ben Simpson

03/13/2025, 11:14 PM

Yeah increasing it to 2s did basically nothing.

Ben Simpson

03/13/2025, 11:25 PM

New instances with follower delay of 2s were rolled out around 12:03,

replica = total - fallback

https://cdn.discordapp.com/attachments/1349581093334155325/1349886207416602664/image.png?ex=67d4baff&is=67d3697f&hm=3e5100d4cf2a1ec7065309f818a8bdb81d12008768a519aedb502f3b31c9ae34&

Ben Simpson

03/14/2025, 1:52 AM

And lowering the value to 200ms appeared to do nothing either. I checked the maximum replication lag and it's <30ms

ecordell

03/14/2025, 3:16 AM

The metrics look like it’s showing like 80-90% of queries are landing on the replica? Or am I misreading? Can you tell us more about your setup?

Ben Simpson

03/14/2025, 3:22 AM

It's just an Aurora Postgres cluster with 2 instances i.e. single writer, single replica in the same region in different AZs with a maximum replication lag of ~30ms. I would have expected to see closer to 100% hits on the replica with a follower delay of >30ms, or am I misunderstanding something here?

Ben Simpson

03/14/2025, 3:26 AM

And in terms of spicedb, multiple nodes with dispatch enabled

Ben Simpson

03/14/2025, 3:33 AM

The point of the follower delay is to prevent spicedb from trying to fetch revisions from the replica that can't possibly be in the replica, right? If my replication lag is 30ms and I configure a 2s follower delay, how on earth are 10% of queries to the replica missing?

vroldanbet

03/14/2025, 12:00 PM

have you checked if 100% of the traffic is minimize_latency?

vroldanbet

03/14/2025, 12:01 PM

there are metrics for that

vroldanbet

03/14/2025, 12:02 PM

any chance you can check any sample queries that are hitting the primary?

vroldanbet

03/14/2025, 12:14 PM

I'm thinking the following. The quantization algorithm slices time in windows of 6s (assuming the configuration above). Let's say you hit the next quantization window, and the revision that is selected was just written (i.e. it falls in the tail end of the 6s window, so imagine, at the 5.99s), and no other revision exists in that window. Now you use an optimized revision that is very fresh, and unlikely to be replicated. Also, if there is no transaction written in that window, you are 100% going to get an unreplicated snapshot because we use

pg_current_snapshot()

, which is going to give you a revision that does not even exist at that point. So I think that 10% fallback seems reasonable based on how the system is implemented, and I'd argue it's fantastic. I think this scenario could be more likely if your write throuhgput is low and writes are sparse, and can get exacerbated if you have other workloads running in the same database or any other processes that cause out-of-band snapshot changes to Aurora. What's the write throughput in your system? for reference, this is the [optimized revision query](https://github.com/authzed/spicedb/blob/5b3d384cbbe9bcc5d488bb1360f565110cecb184/internal/datastore/postgres/revisions.go#L41-L61)

vroldanbet

03/14/2025, 12:15 PM

As you can see in the comment in the query: > If there are no transactions newer than the quantization period, it just picks the latest transaction.

vroldanbet

03/14/2025, 12:16 PM

I happen to have changed all of this for SpiceDB 1.41, and can tell you that it's much better than before. The quantization system for postgres was broken for databases with sparse writes or out-of-band writes, which basically made

optimized_revision

behave like

full_consistency

vroldanbet

03/14/2025, 12:19 PM

I debated with @Jake a lot about this new implementation, he convinced me that we don't want to give perfect caching, because it can become a double-edged sword.

Ben Simpson

03/16/2025, 6:54 PM

I'm like 99% certain everything is

minimize_latency

but could be something I missed. I'll keep the delay low seeing as it seems to have little impact on the overall %age of fallbacks. I agree that 10% isn't unreasonable, just with large numbers of requests it generates a lot of error logs. I've added a filter to the fluent-bit config to filter this particular error out seeing as we have visibility of this via the metrics. Thanks for all the help, as usual!

Joey

03/16/2025, 6:55 PM

Yeah, we explicitly did not filter out those errors

Joey

03/16/2025, 6:55 PM

so long as the clients are not seeing them, its Working As Intended

Ben Simpson

03/16/2025, 10:33 PM

For anyone else following along who wants to exclude these errors, turns out there are 2 different errors logged:

ERROR: transaction ID <id> is in the future (SQLSTATE 22023)

ERROR: replica missing revision (SQLSTATE P0004)

vroldanbet

03/17/2025, 12:09 PM

@Joey perhaps we could just turn those 2 specific errors into a label like "errorCode" on the metric that captures the fallback, and then just completely elide the error

Joey

03/17/2025, 12:09 PM

we could, but it can't be done at the proxy level

Joey

03/17/2025, 12:09 PM

so it means putting it into the datastore code itself

Joey

03/17/2025, 12:09 PM

its doable, just something that will "bleed" into it

vroldanbet

03/17/2025, 12:11 PM

I see, yeah, it leaks out

vroldanbet

03/17/2025, 12:11 PM

perhaps just moving it to debug level?

vroldanbet

03/17/2025, 12:11 PM

since it's an expected error condition users can't really do much about it

Joey

03/17/2025, 12:12 PM

@Ben Simpson can you paste the full log please?

Joey

03/17/2025, 12:32 PM

https://github.com/authzed/spicedb/pull/2282

Ben Simpson

03/17/2025, 8:37 PM

Copy code

json
{
    "time": "2025-03-14T00:57:27Z",
    "level": "error",
    "source": "stderr",
    "message": "Query",
    "pgx": {
        "time": 1.26576,
        "pid": 1266,
        "sql": "\n\t\tSELECT * FROM (SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, caveat_name, caveat_context FROM relation_tuple WHERE pg_visible_in_snapshot(created_xid, $1) = $2 AND pg_visible_in_snapshot(deleted_xid, $3) = $4 AND namespace = $5 AND relation = $6 AND object_id IN ($7) AND ((userset_namespace = $8 A...",
        "args": [
           "REDACTED"
        ],
        "err": "ERROR: transaction ID 811038035 is in the future (SQLSTATE 22023)"
    }
}

Copy code

json
{
    "source": "stderr",
    "pgx": {
        "err": "ERROR: replica missing revision (SQLSTATE P0004)",
        "pid": 8503,
        "sql": "\n\t\tSELECT * FROM (SELECT serialized_config, created_xid FROM namespace_config WHERE pg_visible_in_snapshot(created_xid, $1) = $2 AND pg_visible_in_snapshot(deleted_xid, $3) = $4 AND (namespace = $5 OR namespace = $6)) AS results WHERE pg_xact_status(816057260::text::xid8) != 'in progress';\n\t\tDO $$\n\t\tBEGIN\n\t\t\tASSERT (select pg_xact_status(816057260:...",
        "args": [
            "REDACTED"
        ],
        "time": 1.933301
    },
    "time": "2025-03-17T00:06:23Z",
    "level": "error",

    "message": "Query"
}

Joey

03/17/2025, 8:47 PM

fix already merged

Joey

03/17/2025, 8:47 PM

will go out in the next release

Ben Simpson

03/17/2025, 9:00 PM

🙌

10 Views

Previous Next