# spicedb
u
Hi guys! We're migrating our SpiceDB datastore from a PostgreSQL server to Aurora (using AWS DMS for the data migration), and now we have to run `spicedb datastore repair`. It seems to need quite a lot of time to run! (If possible, please share more details on what it does.) Also curious: do we have to run it again for the new data added after the first migration? I think yes, and then will it need the same amount of repair time again? Or any proposals on how to approach this while keeping realtime sync with the old datastore (used in prod)? Thx 🙏 A thought:
1. First migrate all existing data, then repair the transaction IDs. This should cover most data; we will need to make sure the new DB works fine and the transaction ID is fixed.
2. Then the data added after the first migration (there shouldn't be much of it) will be handled via some scripting, using SpiceDB bulk import (so we don't need to repair again, if a second repair would still take a long time ~ not sure if it is a locking process).
v
SpiceDB does not officially support Amazon Aurora despite its compatibility with the Postgres wire protocol. We cannot ensure that SpiceDB's consistency guarantees will be maintained when using Aurora, although it is likely to be the case if Aurora supports the `SERIALIZABLE` isolation level. Could you please specify which version of Aurora you are using?

SpiceDB leverages Postgres's snapshotting primitives to manage its internal revisions for ZedTokens. Consequently, when migrating from one database to another, the internal snapshot high-watermark in Postgres will differ from the value stored in SpiceDB's transactions table. The `repair` command creates artificial transactions to increase the high watermark. So the time it takes to repair depends on the lifespan of your previous Postgres cluster (roughly, how many transactions were run on it) and how long it takes to run a transaction in Aurora. I think we could easily improve `repair` with some progress reporting, since we know the current transaction ID and the target transaction ID. The XID may also never converge to that of your original Postgres: we have not tested this scenario with Aurora.
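Purely for intuition (this is not a SpiceDB tool), here is a rough sketch of what that gap looks like: compare Postgres's current 64-bit transaction ID with the highest `xid` that SpiceDB recorded. The DSN is a placeholder, and the query assumes the current Postgres datastore schema (a `relation_tuple_transaction` table with an `xid8` column named `xid`); adjust if your SpiceDB version differs.

```go
package main

import (
	"database/sql"
	"fmt"
	"log"

	_ "github.com/jackc/pgx/v5/stdlib" // registers the "pgx" database/sql driver
)

func main() {
	// Placeholder DSN: point this at the Aurora writer endpoint that repair is running against.
	db, err := sql.Open("pgx", "postgres://user:pass@aurora-writer:5432/spicedb")
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	var current, target int64
	// pg_current_xact_id() (PG 13+) assigns and returns the current 64-bit transaction ID.
	if err := db.QueryRow(`SELECT pg_current_xact_id()::text::bigint`).Scan(&current); err != nil {
		log.Fatal(err)
	}
	// Highest transaction ID SpiceDB has recorded; repair must push Postgres past it.
	if err := db.QueryRow(`SELECT max(xid::text::bigint) FROM relation_tuple_transaction`).Scan(&target); err != nil {
		log.Fatal(err)
	}

	if current >= target {
		fmt.Println("repair done: the live xid is past SpiceDB's stored high watermark")
		return
	}
	fmt.Printf("repair progress: %d/%d (~%d artificial transactions still to run)\n", current, target, target-current)
}
```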
> Also curious if we have to run it again if we have to second migrate the new added data after migration? I think yes, then will it need the same amount of repair again?

Any time you move to a new database, you have to run repair.

> Or if Any proposals on how to approach this while keeping realtime sync with old datastore (used in prod)!

Yes, you can:
1. Use bulk export to get a snapshot of your SpiceDB data.
2. Use bulk import to load it into the new SpiceDB backed by the new database.
3. Use the Watch API on the source SpiceDB at the revision of the snapshot, and write every event emitted to the new SpiceDB cluster.
4. Stop writes to the old cluster and move them to the new cluster.

(Rough sketches of steps 1-3 follow below.) This may get more complicated if you have stored ZedTokens in your database; in that case you'd have to drop all ZedTokens.
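Not an official recipe, just a hedged sketch of steps 1-2 using the authzed-go client. `ExportBulkRelationships` / `ImportBulkRelationships` are the v1 PermissionsService RPCs in recent releases (older releases expose equivalents on the Experimental service), and the endpoints, preshared keys, and batch size are placeholders.

```go
package main

import (
	"context"
	"io"
	"log"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
	"github.com/authzed/authzed-go/v1"
	"github.com/authzed/grpcutil"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// mustClient is a tiny helper for this sketch; use TLS credentials in production.
func mustClient(endpoint, presharedKey string) *authzed.Client {
	client, err := authzed.NewClient(
		endpoint,
		grpcutil.WithInsecureBearerToken(presharedKey),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatal(err)
	}
	return client
}

func main() {
	ctx := context.Background()
	oldCluster := mustClient("old-spicedb:50051", "old-presharedkey") // placeholders
	newCluster := mustClient("new-spicedb:50051", "new-presharedkey")

	// Step 1: stream a snapshot of every relationship out of the old cluster.
	export, err := oldCluster.ExportBulkRelationships(ctx, &v1.ExportBulkRelationshipsRequest{
		OptionalLimit: 1000, // relationships per response batch
	})
	if err != nil {
		log.Fatal(err)
	}

	// Step 2: stream those batches straight into the new cluster.
	importStream, err := newCluster.ImportBulkRelationships(ctx)
	if err != nil {
		log.Fatal(err)
	}

	for {
		batch, err := export.Recv()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		// batch.AfterResultCursor can be persisted; if the export dies, pass it as
		// OptionalCursor on a fresh ExportBulkRelationshipsRequest to resume.
		if err := importStream.Send(&v1.ImportBulkRelationshipsRequest{
			Relationships: batch.Relationships,
		}); err != nil {
			log.Fatal(err)
		}
	}

	resp, err := importStream.CloseAndRecv()
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("imported %d relationships into the new cluster", resp.NumLoaded)
}
```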
AWS DMS is more complicated because you'd need to identify the revision at which the database was migrated. Otherwise, if you can manually reconcile any missing writes before the cutover, you can do that too.
The bulk export approach works as well, yeah, as long as you know what you are doing at the application level.
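And a similarly hedged sketch of step 3: tail the Watch API on the old cluster from a ZedToken captured around the time of the export, and replay every emitted update into the new cluster until cutover. The `mirrorChanges` function and its wiring are hypothetical; the Watch and WriteRelationships calls are the standard v1 APIs.

```go
package mirror

import (
	"context"
	"log"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
	"github.com/authzed/authzed-go/v1"
)

// mirrorChanges replays every relationship update observed on the old cluster,
// starting at snapshotToken, into the new cluster. Run it until writes are
// stopped on the old cluster, then cut over.
func mirrorChanges(ctx context.Context, oldCluster, newCluster *authzed.Client, snapshotToken string) error {
	watch, err := oldCluster.Watch(ctx, &v1.WatchRequest{
		// Start from (roughly) the revision at which the bulk export was taken.
		OptionalStartCursor: &v1.ZedToken{Token: snapshotToken},
	})
	if err != nil {
		return err
	}

	for {
		event, err := watch.Recv()
		if err != nil {
			// Stream ended or errored; persist the last ChangesThrough token and resume from it.
			return err
		}
		// Apply the updates verbatim against the new cluster.
		if _, err := newCluster.WriteRelationships(ctx, &v1.WriteRelationshipsRequest{
			Updates: event.Updates,
		}); err != nil {
			return err
		}
		log.Printf("mirrored %d updates through revision %s", len(event.Updates), event.ChangesThrough.Token)
	}
}
```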
u
Thx for the swift reply and details!

> Could you please specify which version of Aurora you are using

New Aurora v16.6 (Postgres engine); the current Postgres DB is v16.3. Over 20M relationships are defined in `relation_tuple`. The repair command's progress shows: 17240000/1072123333

> Any time you move to a new database, you have to run repair

Not for a new database, but for the delta data that is added after the repair! (It is still not synced when the first migration phase is done.)

> Bulk export to get a snapshot of your SpiceDB

I don't have much info on how it works (it seems stream-based?). Do you think bulk export will work fine with 20M+ relationships? If it is killed in between, is there a way to resume?

> This may get more complicated if you have stored zedtokens in your database

Currently we don't store ZedTokens in the DB, only cache them in Redis with a 10s TTL, so that shouldn't be an issue.
v
> 17240000/1072123333

How long has it been running?
> Not for a new database, but for the delta data that is added after the repair! (It is still not synced when the first migration phase is done.)

No, that does not need repair. It is only needed once per backup restored into a new database.
> I don't have much info on how it works (it seems stream-based?). Do you think bulk export will work fine with 20M+ relationships?
> If it is killed in between, is there a way to resume?

Yes, it should be able to handle that quickly, and it can resume: the API exposes a cursor with each response.
> Currently we don't store ZedTokens in the DB, only cache them in Redis with a 10s TTL, so that shouldn't be an issue

Cool, then everything else should be relatively easy.
u
> How long has it been running?

3h just for 2%. Not sure why it needs so much time; `relation_tuple_transaction` has only ~500 entries.

> No, that does not need repair. It is only needed once per backup restored into a new database

Oh! I thought the new data added to the old database under different snapshots would still be based on the old DB's transaction counter 🤔 When we migrate it manually in a second pass, we thought it would need to be fixed again 🤔

> Yes, it should be able to handle that quickly, and it can resume: the API exposes a cursor with each response

Good! Nice to know.
v
> 3h just for 2%
> Not sure why it needs so much time; relation_tuple_transaction has only ~500 entries

It's not related to the number of transactions in your table, but to the highest transaction ID (`xid`). SpiceDB GCs old transactions after 24h by default, so 500 rows is not representative of the number of transactions the cluster has observed. It needs that amount of time because:
- your old cluster observed many transactions
- Aurora transactions seem to be slow enough that repair makes little progress

> Oh! I thought the new data added to the old database under different snapshots would still be based on the old DB's transaction counter 🤔 When we migrate it manually in a second pass, we thought it would need to be fixed again 🤔

That's irrelevant to the new DB. What we are trying to solve with `repair` is that you load a specific state into a database, but the highest transaction ID (`xid`) differs from what is stored in the `relation_tuple_transaction` table. Once the internal Postgres transaction ID moves past the highest one in that table, any further changes you apply will keep moving that number up. That's all. All we need is for the internal PG transaction ID to be larger than the biggest transaction ID in the `relation_tuple_transaction` table.