Hello guys. SpiceDB #spicedb

Rodolfo

10/11/2024, 3:15 PM

Hello guys. I got back to this problem https://discord.com/channels/844600078504951838/844600078948630559/1291681060605661194 and I am still struggling to set up read replicas. These are my container ENV variables:

- name: "SPICEDB_DATASTORE_CONN_URI"
  valueFrom:
    secretKeyRef:
      name: {{ .Values.spicedb.secrets.database }}
      key: url
- name: "SPICEDB_DATASTORE_READ_REPLICA_CONN_URI"
  valueFrom:
    secretKeyRef:
      name: {{ .Values.spicedb.secrets.database }}
      key: read_url

Like this:
- checkPermissions returns: FAILED_PRECONDITION: object definition something not found
- write works

If I swap read_url with url (to validate the connection strings), then checkPermission works and writes give an Error 500, probably because it can't write to the replica. (So that is expected, and the connection string is fine.) Both DBs have data in relation_tuple_transaction and namespace_config, so the replica is working. Could you give me any pointers on how to debug this?

Rodolfo

10/11/2024, 3:23 PM

I am using the bitnami postgres charts to set up my postgres. I can post my local values if needed.

yetitwo

10/12/2024, 4:13 AM

did you run spicedb datastore repair on the replica? if you're using logical replication, the transaction IDs will be out of sync

Rodolfo

10/12/2024, 4:08 PM

How can I check the transaction IDs?

Rodolfo

10/12/2024, 4:12 PM

I did not run spicedb datastore repair on the replica. Did I miss a guide? I can try to run it. I just let the bitnami postgres chart handle the replication, so I expect it to be physical replication, not logical,

Rodolfo

10/12/2024, 4:12 PM

as it set up the replicas and the replication architecture for me

yetitwo

10/13/2024, 10:59 PM

if it's physical replication I wouldn't expect to run into this problem

yetitwo

10/13/2024, 10:59 PM

but the symptom you're describing sounds a lot like what happens when TXIDs don't line up

yetitwo

10/13/2024, 10:59 PM

one way to check is to see whether the data appear to be the same in the replica

yetitwo

10/13/2024, 10:59 PM

if they do, it's a txid issue

yetitwo

10/13/2024, 10:59 PM

and all you have to do is run spicedb datastore repair with configuration that points at the replica
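The "is the data the same on both sides" check could be scripted roughly as below. This is only a sketch: the `Runner` type, `compare_nodes`, and `data_matches` are hypothetical names, the database wiring is deliberately left out, and the row count on namespace_config is an illustrative extra check. The max(xid) query is the one quoted in this thread.

```python
from typing import Callable, Dict, Tuple

# A "runner" is any callable that executes one SQL statement and returns a
# single scalar. In practice it would wrap a real database connection; that
# wiring is omitted here on purpose (hypothetical helper type).
Runner = Callable[[str], object]

# Both tables are named in this thread. The first query is quoted in the
# thread; the row count is an illustrative assumption.
CHECKS: Dict[str, str] = {
    "max_xid": "SELECT max(xid::text::integer) FROM relation_tuple_transaction",
    "namespace_rows": "SELECT count(*) FROM namespace_config",
}


def compare_nodes(primary: Runner, replica: Runner) -> Dict[str, Tuple[object, object, bool]]:
    """Run every check on both nodes and report (primary, replica, equal)."""
    report = {}
    for name, sql in CHECKS.items():
        p, r = primary(sql), replica(sql)
        report[name] = (p, r, p == r)
    return report


def data_matches(report: Dict[str, Tuple[object, object, bool]]) -> bool:
    """True only if every check returned the same value on both nodes."""
    return all(equal for _, _, equal in report.values())
```

If `data_matches` is true while reads through the replica still fail, that is consistent with the TXID explanation given above, though it does not prove it.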

yetitwo

10/13/2024, 11:00 PM

it looks like i need to add some documentation here: https://authzed.com/docs/spicedb/concepts/datastores#read-replicas i'll make a note to do that

Rodolfo

10/14/2024, 10:00 AM

Hello there. I will be picking this up in a while, but just to give some news: I pulled the TXID query string, executed it manually, and got:

ERROR:  cannot execute pg_current_xact_id() during recovery

I will give more news later.

Rodolfo

10/14/2024, 10:40 AM

I ran spicedb datastore repair:

{"level":"info","time":"2024-10-14T10:36:12Z","message":"using postgres datastore engine"}
{"level":"debug","pgx":{"args":[],"commandTag":"SHOW","pid":11410,"sql":"SHOW track_commit_timestamp;","time":0.538},"time":"2024-10-14T10:36:12Z","message":"Query"}
{"level":"warn","time":"2024-10-14T10:36:12Z","message":"datastore background garbage collection disabled"}
{"level":"debug","pgx":{"args":[],"commandTag":"SHOW","pid":10873,"sql":"SHOW track_commit_timestamp;","time":1.342708},"time":"2024-10-14T10:36:12Z","message":"Query"}
{"level":"debug","replica-count":1,"time":"2024-10-14T10:36:12Z","message":"Using replicas for reads"}
{"level":"error","error":"datastore of type *proxy.strictReplicatedDatastore does not support the repair operation","time":"2024-10-14T10:36:12Z","message":"terminated with errors"}

TL;DR: proxy.strictReplicatedDatastore does not support the repair operation

Notes:
- I ran it with the same ENV properties that I use with the serve command (it works on writes, but fails on reads).
- SELECT max(xid::text::integer) FROM relation_tuple_transaction is aligned between primary and replica.
- Looking at the SpiceDB code, I tried to run pg_current_xact_id() on the replica and got: cannot execute pg_current_xact_id() during recovery
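The log above prints "Using replicas for reads" right before the failure, which suggests the repair command was wrapped in the replica-aware proxy because the read-replica variable was still set. One untested guess is to run the command with only SPICEDB_DATASTORE_CONN_URI set, pointed at the database to repair. The sketch below only builds such an environment; `env_for_repair` is a hypothetical helper. Whether the repair can succeed against a read-only physical replica is a separate question that this thread does not answer.

```python
from typing import Dict, Mapping

# Variable names as they appear in the container config quoted in this thread.
PRIMARY_VAR = "SPICEDB_DATASTORE_CONN_URI"
REPLICA_VAR = "SPICEDB_DATASTORE_READ_REPLICA_CONN_URI"


def env_for_repair(serve_env: Mapping[str, str], target_uri: str) -> Dict[str, str]:
    """Copy serve_env, drop the read-replica URI, and aim the main URI at target_uri.

    Assumption: with no replica URI configured, the command talks to a plain
    datastore instead of the replicated proxy. The input mapping is not modified.
    """
    env = dict(serve_env)
    env.pop(REPLICA_VAR, None)      # no replica means no read-replica proxy (assumed)
    env[PRIMARY_VAR] = target_uri   # the single database the command should touch
    return env
```

The result could be passed as the `env=` argument of `subprocess.run`. The exact arguments of the repair command are not shown in the thread, so they are left out here.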

yetitwo

10/14/2024, 2:37 PM

so the replica is in a recovery mode of some sort?

Rodolfo

10/14/2024, 8:45 PM

Yes, but I believe that is normal for Postgres. The primary writes to the WAL while the read replicas stay in perpetual recovery mode, replaying it. If pg_current_xact_id() needs to run on a read replica, that might be the issue. (I will continue the investigation tomorrow.)
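To probe a node without tripping over the recovery error, one option is to ask it first whether it is a standby and only then pick a query. This is a sketch under assumptions: `txid_probe` and `Runner` are hypothetical names, both xid8 functions need PostgreSQL 13 or newer, and the claim that pg_current_snapshot() is allowed on a hot standby is my understanding and worth verifying on your version (and on Aurora in particular).

```python
from typing import Callable, Dict

# Any callable that runs one SQL statement and returns a single scalar
# (hypothetical helper type; the real connection handling is omitted).
Runner = Callable[[str], object]


def txid_probe(run: Runner) -> Dict[str, object]:
    """Pick a transaction-ID query that matches the node's recovery state."""
    in_recovery = bool(run("SELECT pg_is_in_recovery()"))
    if in_recovery:
        # pg_current_xact_id() would assign a new transaction ID, which a
        # standby refuses. Reading the current snapshot assigns nothing
        # (assumed to be permitted during recovery).
        sql = "SELECT pg_current_snapshot()::text"
    else:
        sql = "SELECT pg_current_xact_id()::text"
    return {"in_recovery": in_recovery, "query": sql, "value": run(sql)}
```

Running this against both nodes shows whether the replica reports recovery mode and which snapshot it currently sees, without issuing the call that failed above.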

yetitwo

10/15/2024, 5:45 PM

> bitnami postgres charts
so this is within kubernetes? do you have a link to those charts?

yetitwo

10/15/2024, 5:54 PM

i'll also say that we have tested logical replication and there are users who are running it in production

yetitwo

10/15/2024, 5:54 PM

i'm not sure we've done the same for physical replication

yetitwo

10/15/2024, 5:54 PM

i'm asking around though

yetitwo

10/15/2024, 6:48 PM

we haven't tested physical replication ourselves and don't have much experience with it. one thing that may help is to disable replication, run spicedb datastore repair on the replica, and reenable replication

Rodolfo

10/16/2024, 8:38 AM

Hello. Yes, I can help and give you my scenario. We are using AWS Aurora, so it is physical replication out of the box

Rodolfo

10/16/2024, 8:39 AM

(at least my cloud team set everything up this way)

yetitwo

10/16/2024, 2:33 PM

gotcha

Rodolfo

10/16/2024, 2:38 PM

I will try to set up a repo and share it with you later this week

Rodolfo

10/16/2024, 2:39 PM

as I think it would be a shame if physical replication is not supported 😢

Rodolfo

10/16/2024, 2:39 PM

(we already have instance_count > 1 as a means of hot-swapping with the primary during maintenance)

yetitwo

10/16/2024, 2:57 PM

yeah, agreed. i'm also not sure what would go into it if the transaction counters are desynced and we don't have a means of correcting that.
