Hello guys.
# spicedb
r
Hello guys. I got back to this problem https://discord.com/channels/844600078504951838/844600078948630559/1291681060605661194 and I am still struggling to set up read replicas. These are my container ENV variables:
- name: "SPICEDB_DATASTORE_CONN_URI"
  valueFrom:
    secretKeyRef:
      name: {{ .Values.spicedb.secrets.database }}
      key: url
- name: "SPICEDB_DATASTORE_READ_REPLICA_CONN_URI"
  valueFrom:
    secretKeyRef:
      name: {{ .Values.spicedb.secrets.database }}
      key: read_url
Like this:
- checkPermissions returns: FAILED_PRECONDITION: object definition
something
not found
- write works
If I swap read_url with url (to validate the connection strings), then checkPermission works and writes give an Error 500, probably because it can't write to the replica. (So that is expected, and the connection string is fine.) Both DBs have data in
relation_tuple_transaction
/
namespace_config
so the replica is working. Could you give me any pointer on how to debug this?
I am using bitnami postgres charts to setup my postgres. i can post my local values if needed
y
did you run
spicedb datastore repair
on the replica? if you're using logical replication, the transaction IDs will be out of sync
r
how can I check the transaction Ids?
I did not run
spicedb datastore repair
on the replica. Did I miss a guide? I can try to run it. I just let the bitnami postgres chart handle the replication, so I expect it to be physical replication, not logical,
as it set up the replicas and replication architecture for me
y
if it's physical replication I wouldn't expect to run into this problem
but the symptom you're describing sounds a lot like what happens when TXIDs don't line up
one way to check is to see whether the data appear to be the same in the replica
if they do, it's a txid issue
and all you have to do is run
spicedb datastore repair
with configuration that points at the replica
it looks like i need to add some documentation here: https://authzed.com/docs/spicedb/concepts/datastores#read-replicas i'll make a note to do that
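One way to make the "is the data the same in the replica?" check concrete is to run the same query on both databases and compare the rows. The sketch below is only an illustration: `fetch_rows`, `same_result`, `compare_all` and the `CHECKS` list are hypothetical names, and it assumes you already hold two DB-API connections (for example from psycopg). The table names are the ones mentioned earlier in this thread.

```python
from typing import Any, Dict, List, Tuple


def fetch_rows(conn: Any, sql: str) -> List[Tuple]:
    """Run one read-only query on a DB-API connection and return all rows."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


def same_result(primary: Any, replica: Any, sql: str) -> bool:
    """Return True when primary and replica give identical rows for `sql`."""
    return fetch_rows(primary, sql) == fetch_rows(replica, sql)


# Hypothetical checks; the table names come from this thread.
CHECKS = [
    "SELECT count(*) FROM relation_tuple_transaction",
    "SELECT count(*) FROM namespace_config",
]


def compare_all(primary: Any, replica: Any) -> Dict[str, bool]:
    """Map each check query to whether both sides agree."""
    return {sql: same_result(primary, replica, sql) for sql in CHECKS}
```

Row counts can differ for a moment while the replica catches up, so a mismatch is only meaningful when no writes are in flight.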
r
Hello there. I will be picking this up in a while, but just to give some news, I pulled the TXID string and I got to:
ERROR:  cannot execute pg_current_xact_id() during recovery
(executed manually) I will give more news later
I ran the
spicedb datastore repair
{"level":"info","time":"2024-10-14T10:36:12Z","message":"using postgres datastore engine"}
{"level":"debug","pgx":{"args":[],"commandTag":"SHOW","pid":11410,"sql":"SHOW track_commit_timestamp;","time":0.538},"time":"2024-10-14T10:36:12Z","message":"Query"}
{"level":"warn","time":"2024-10-14T10:36:12Z","message":"datastore background garbage collection disabled"}
{"level":"debug","pgx":{"args":[],"commandTag":"SHOW","pid":10873,"sql":"SHOW track_commit_timestamp;","time":1.342708},"time":"2024-10-14T10:36:12Z","message":"Query"}
{"level":"debug","replica-count":1,"time":"2024-10-14T10:36:12Z","message":"Using replicas for reads"}
{"level":"error","error":"datastore of type *proxy.strictReplicatedDatastore does not support the repair operation","time":"2024-10-14T10:36:12Z","message":"terminated with errors"}
TL;DR:
proxy.strictReplicatedDatastore does not support the repair operation
Notes: I ran it with the same ENV properties that I am running with the
serve
command. (it works on writes, but fails on reads)
SELECT max(xid::text::integer) FROM relation_tuple_transaction
is aligned between primary and replica. And looking at the SpiceDB code, I tried to run pg_current_xact_id on the replica and got
cannot execute pg_current_xact_id() during recovery
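One reading of the log above is that setting the read replica variable makes SpiceDB wrap the datastore in the replication proxy, and that the proxy is what rejects the repair. That is an assumption, not something confirmed in this thread. Under that assumption, a repair run would need an environment where the main connection URI points at the target database and the replica variable is absent. A minimal sketch (`repair_env` is a hypothetical helper; the variable names are the ones from the ENV section above):

```python
from typing import Dict, Mapping

PRIMARY_VAR = "SPICEDB_DATASTORE_CONN_URI"
REPLICA_VAR = "SPICEDB_DATASTORE_READ_REPLICA_CONN_URI"


def repair_env(env: Mapping[str, str], target_uri: str) -> Dict[str, str]:
    """Copy `env`, point the main datastore URI at `target_uri`,
    and drop the read replica URI so no replica proxy is configured."""
    out = dict(env)  # never mutate the caller's environment
    out[PRIMARY_VAR] = target_uri
    out.pop(REPLICA_VAR, None)
    return out
```

The returned mapping could be passed as `env=` to `subprocess.run` when launching the repair command. Note that a physical replica is read-only, so this alone may not be enough.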
y
so the replica is in a recovery mode of some sort?
r
yes. But I believe it is normal for postgres. The primary writes to the WAL while the read replicas stay in perpetual recovery mode, replaying it. If pg_current_xact_id() needs to run on a read replica, then that might be the issue. (I will continue the investigation tomorrow)
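The recovery state can be checked directly: PostgreSQL's `pg_is_in_recovery()` returns true on a standby, which is where `pg_current_xact_id()` fails with the error quoted above. The sketch below guards the call accordingly. `scalar`, `in_recovery` and `current_xact_id` are hypothetical helper names, and `conn` is assumed to be any DB-API connection (for example from psycopg).

```python
from typing import Any, Optional


def scalar(conn: Any, sql: str) -> Any:
    """Run a query that returns a single row and return its first column."""
    cur = conn.cursor()
    try:
        cur.execute(sql)
        return cur.fetchone()[0]
    finally:
        cur.close()


def in_recovery(conn: Any) -> bool:
    """True when the server reports it is in recovery (i.e. a standby)."""
    return bool(scalar(conn, "SELECT pg_is_in_recovery()"))


def current_xact_id(conn: Any) -> Optional[str]:
    """Return the current transaction ID as text, or None on a standby,
    where pg_current_xact_id() cannot be executed."""
    if in_recovery(conn):
        return None
    return str(scalar(conn, "SELECT pg_current_xact_id()"))
```

On a primary, calling `pg_current_xact_id()` assigns a transaction ID if the transaction does not have one yet, so this is for occasional debugging rather than frequent polling.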
y
> bitnami postgres charts
so this is within kubernetes? do you have a link to those charts?
i'll also say that we have tested logical replication and there are users who are running it in production
i'm not sure we've done the same for physical replication
i'm asking around though
we haven't tested physical replication ourselves and don't have much experience with it. one thing that may help is to disable replication, run
spicedb datastore repair
on the replica, and reenable replication
r
Hello, yes, I can help / give you my scenario. We are using AWS Aurora, so it is physical replication out of the box
(at least my cloud team set everything up this way)
y
gotcha
r
I will try to set up a repo and share it with you later this week
as I think it would be a shame if physical replication is not supported 😢
(we already have instance_count > 1 as a means of hot-swapping with the primary during maintenance)
y
yeah, agreed. i'm also not sure what would go into it if the transaction counters are desynced and we don't have a means of correcting that.