# spicedb
u
Hi guys! We're migrating our SpiceDB datastore from a PostgreSQL server to Aurora (using AWS DMS for the data migration), and now we have to run `spicedb datastore repair`. It seems to need quite a lot of time to run! (If possible, please share more details on what it does.) Also curious: do we have to run it again for the new data added after the first migration? I think yes, and then will it need the same amount of repair time again? Or any proposals on how to approach this while keeping realtime sync with the old datastore (used in prod)? Thx 🙏 A thought:
1. First migrate all existing data, then repair the transaction IDs. This should cover most data; we will need to make sure the new DB works fine and the transaction ID is fixed.
2. Then the data added after the first migration (there shouldn't be much of it) will be handled via some scripting, using SpiceDB bulk import (so we don't need to repair again, if a second repair would still take a long time ~ not sure if it is a locking process).
v
SpiceDB does not officially support Amazon Aurora despite its compatibility with the Postgres wire protocol. We cannot ensure that SpiceDB's consistency guarantees will be maintained when using Aurora, although it is likely to be the case if Aurora supports the `SERIALIZABLE` isolation level. Could you please specify which version of Aurora you are using?

SpiceDB leverages Postgres's snapshotting primitives to manage its internal revisions for ZedTokens. Consequently, when migrating from one database to another, the internal snapshot high-watermark in Postgres will differ from the value stored in SpiceDB's transactions table. The `repair` command creates artificial transactions to increase the high watermark. So the time it takes to repair depends on the lifespan of your previous Postgres cluster (roughly, how many transactions were run on it) and how long it takes to run a transaction in Aurora. I think we could easily improve `repair` with some progress reporting, since we know the current transaction ID and the target transaction ID. The XID may also never converge to that of your original Postgres: we have not tested this scenario with Aurora.
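Purely for intuition (this is not a SpiceDB tool), here is a rough sketch of what that gap looks like: compare Postgres's current 64-bit transaction ID with the highest `xid` that SpiceDB recorded. The DSN is a placeholder, and the query assumes the current Postgres datastore schema (a `relation_tuple_transaction` table with an `xid8` column named `xid`); adjust if your SpiceDB version differs.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
)

func main() {
	// Placeholder DSN: point this at the Aurora writer endpoint that repair is running against.
	db, err := sql.Open("pgx", "postgres://user:pass@aurora-writer:5432/spicedb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var current, target int64
	// pg_current_xact_id() (PG 13+) assigns and returns the current 64-bit transaction ID.
	if err := db.QueryRow(`SELECT pg_current_xact_id()::text::bigint`).Scan(&current); err != nil {
		log.Fatal(err)
	}
	// Highest transaction ID SpiceDB has recorded; repair must push Postgres past it.
	if err := db.QueryRow(`SELECT max(xid::text::bigint) FROM relation_tuple_transaction`).Scan(&target); err != nil {
		log.Fatal(err)
	}

	if current >= target {
		fmt.Println("repair done: the live xid is past SpiceDB's stored high watermark")
		return
	}
	fmt.Printf("repair progress: %d/%d (~%d artificial transactions still to run)\n", current, target, target-current)
}
```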
> Also curious if we have to run it again if we have to second migrate the new added data after migration? I think yes, then will it need the same amount of repair again?

Any time you move to a new database, you have to run repair.

> Or if Any proposals on how to approach this while keeping realtime sync with old datastore (used in prod)!

Yes, you can:
1. Use bulk export to get a snapshot of your SpiceDB data.
2. Use bulk import to load it into the new SpiceDB backed by the new database.
3. Use the Watch API on the source SpiceDB at the revision of the snapshot, and write every event emitted to the new SpiceDB cluster.
4. Stop writes to the old cluster and move them to the new cluster.

(Rough sketches of steps 1-3 follow below.) This may get more complicated if you have stored ZedTokens in your database; in that case you'd have to drop all ZedTokens.
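Not an official recipe, just a hedged sketch of steps 1-2 using the authzed-go client. `ExportBulkRelationships` / `ImportBulkRelationships` are the v1 PermissionsService RPCs in recent releases (older releases expose equivalents on the Experimental service), and the endpoints, preshared keys, and batch size are placeholders.

```go
package main

import (
	"context"
	"io"
	"log"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
	"github.com/authzed/authzed-go/v1"
	"github.com/authzed/grpcutil"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// mustClient is a tiny helper for this sketch; use TLS credentials in production.
func mustClient(endpoint, presharedKey string) *authzed.Client {
	client, err := authzed.NewClient(
		endpoint,
		grpcutil.WithInsecureBearerToken(presharedKey),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatal(err)
	}
	return client
}

func main() {
	ctx := context.Background()
	oldCluster := mustClient("old-spicedb:50051", "old-presharedkey") // placeholders
	newCluster := mustClient("new-spicedb:50051", "new-presharedkey")

	// Step 1: stream a snapshot of every relationship out of the old cluster.
	export, err := oldCluster.ExportBulkRelationships(ctx, &v1.ExportBulkRelationshipsRequest{
		OptionalLimit: 1000, // relationships per response batch
	})
	if err != nil {
		log.Fatal(err)
	}

	// Step 2: stream those batches straight into the new cluster.
	importStream, err := newCluster.ImportBulkRelationships(ctx)
	if err != nil {
		log.Fatal(err)
	}

	for {
		batch, err := export.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// batch.AfterResultCursor can be persisted; if the export dies, pass it as
		// OptionalCursor on a fresh ExportBulkRelationshipsRequest to resume.
		if err := importStream.Send(&v1.ImportBulkRelationshipsRequest{
			Relationships: batch.Relationships,
		}); err != nil {
			log.Fatal(err)
		}
	}

	resp, err := importStream.CloseAndRecv()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("imported %d relationships into the new cluster", resp.NumLoaded)
}
```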
AWS DMS is more complicated because you'd need to identify the revision at which the database was migrated. Otherwise, if you can manually reconcile any missing writes before the cutover, you can do that too.
The bulk export approach works as well, yeah, as long as you know what you are doing at the application level.
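And a similarly hedged sketch of step 3: tail the Watch API on the old cluster from a ZedToken captured around the time of the export, and replay every emitted update into the new cluster until cutover. The `mirrorChanges` function and its wiring are hypothetical; the Watch and WriteRelationships calls are the standard v1 APIs.

```go
package mirror

import (
	"context"
	"log"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
	"github.com/authzed/authzed-go/v1"
)

// mirrorChanges replays every relationship update observed on the old cluster,
// starting at snapshotToken, into the new cluster. Run it until writes are
// stopped on the old cluster, then cut over.
func mirrorChanges(ctx context.Context, oldCluster, newCluster *authzed.Client, snapshotToken string) error {
	watch, err := oldCluster.Watch(ctx, &v1.WatchRequest{
		// Start from (roughly) the revision at which the bulk export was taken.
		OptionalStartCursor: &v1.ZedToken{Token: snapshotToken},
	})
	if err != nil {
		return err
	}

	for {
		event, err := watch.Recv()
		if err != nil {
			// Stream ended or errored; persist the last ChangesThrough token and resume from it.
			return err
		}
		// Apply the updates verbatim against the new cluster.
		if _, err := newCluster.WriteRelationships(ctx, &v1.WriteRelationshipsRequest{
			Updates: event.Updates,
		}); err != nil {
			return err
		}
		log.Printf("mirrored %d updates through revision %s", len(event.Updates), event.ChangesThrough.Token)
	}
}
```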
u
Thx for the swift reply and details!

> Could you please specify which version of Aurora you are using

New Aurora v16.6 (Postgres engine); the current Postgres DB is v16.3. Over 20M relationships are defined in `relation_tuple`. The repair command's progress shows: 17240000/1072123333

> Any time you move to a new database, you have to run repair

Not for a new database, but for the delta data that is added after the repair! (It is still not synced when the first migration phase is done.)

> Bulk export to get a snapshot of your SpiceDB

I don't have much info on how it works (it seems stream-based?). Do you think bulk export will work fine with 20M+ relationships? If it is killed in between, is there a way to resume?

> This may get more complicated if you have stored zedtokens in your database

Currently we don't store ZedTokens in the DB, only cache them in Redis with a 10s TTL, so that shouldn't be an issue.
v
> 17240000/1072123333

How long has it been running?
> Not for a new database, but for the delta data that is added after the repair! (It is still not synced when the first migration phase is done.)

No, that does not need repair. It is only needed once per backup restored into a new database.
> I don't have much info on how it works (it seems stream-based?). Do you think bulk export will work fine with 20M+ relationships?
> If it is killed in between, is there a way to resume?

Yes, it should be able to handle that quickly, and it can resume: the API exposes a cursor with each response.
> Currently we don't store ZedTokens in the DB, only cache them in Redis with a 10s TTL, so that shouldn't be an issue

Cool, then everything else should be relatively easy.
u
> How long has it been running?

3h just for 2%. Not sure why it needs so much time; `relation_tuple_transaction` has only ~500 entries.

> No, that does not need repair. It is only needed once per backup restored into a new database

Oh! I thought the new data added to the old database under different snapshots would still be based on the old DB's transaction counter 🤔 When we migrate it manually in a second pass, we thought it would need to be fixed again 🤔

> Yes, it should be able to handle that quickly, and it can resume: the API exposes a cursor with each response

Good! Nice to know.
v
> 3h just for 2%
> Not sure why it needs so much time; relation_tuple_transaction has only ~500 entries

It's not related to the number of transactions in your table, but to the highest transaction ID (`xid`). SpiceDB GCs old transactions after 24h by default, so 500 rows is not representative of the number of transactions the cluster has observed. It needs that amount of time because:
- your old cluster observed many transactions
- Aurora transactions seem to be slow enough that repair makes little progress

> Oh! I thought the new data added to the old database under different snapshots would still be based on the old DB's transaction counter 🤔 When we migrate it manually in a second pass, we thought it would need to be fixed again 🤔

That's irrelevant to the new DB. What we are trying to solve with `repair` is that you load a specific state into a database, but the highest transaction ID (`xid`) differs from what is stored in the `relation_tuple_transaction` table. Once the internal Postgres transaction ID moves past the highest one in that table, any further changes you apply will keep moving that number up. That's all. All we need is for the internal PG transaction ID to be larger than the biggest transaction ID in the `relation_tuple_transaction` table.