# spicedb
w
I'm backfilling a lot of historical data into SpiceDB (v1.16.1, Postgres datastore). I'm seeing a lot of `ERROR: could not serialize access due to read/write dependencies among transactions (SQLSTATE 40001)`. I'd like to understand it a bit more: it sounds like it's because I'm parallelising the ingestion of data (multiple writers) and SpiceDB uses a `serializable` isolation level? Is there a retry on the SpiceDB side? Why does SpiceDB require `serializable`?
v
SpiceDB uses serializable isolation because it's the level of guarantee Spanner offers to support Zanzibar, and it's what guarantees that SpiceDB solves the new enemy problem. I know we have handling in place for retrying overlapping CockroachDB transactions, but I'm not sure about the PostgreSQL implementation. Presumably it retries too, since the retry logic works like a middleware, but perhaps it's not catching the right errors to retry on. We merged some optimizations in 1.17 that should make transactions faster and thus reduce the likelihood of them conflicting, but I can't guarantee that would solve the problem. In general, each datastore working at the serializable level has a limit on how much data it can ingest.
@Jake may be able to add more, since he has a good understanding of the PostgreSQL datastore.
w
> We merged some optimizations in 1.17 that should make transactions faster

Do you have an idea of "how much faster"? 😄
j
@williamdclt also try using `CREATE` instead of `TOUCH` if you know the data is supposed to be new.
It’s faster
Longer term, for backfilling, we have plans to provide bulk importing capabilities that will “bypass” (not really, but somewhat circumvent) the transactional concurrency system for faster imports, at the cost of a bit of staleness during the import.
v
> Do you have an idea of "how much faster"? 😄

It depends on your `WriteRelationships` requests. It fixed an N+1 query. If your writes contain a single target relation, performance is the same. If they contain more than one target, it eliminates the extra queries and round trips: with N target relations, it goes from N queries to 1.
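The query arithmetic above can be sketched as follows. This Go example counts the distinct target relations in a batch and reports the lookup queries before the 1.17 fix (one per target) and after it (one in total). Treating a "target relation" as a (resource type, relation) pair, and the `Update` type itself, are assumptions for illustration; the sketch models the query count described above, not SpiceDB's actual queries.

```go
package main

import "fmt"

// Update is a hypothetical, simplified relationship write.
type Update struct {
	ResourceType, ResourceID, Relation, Subject string
}

// target is the assumed identity of a "target relation".
type target struct{ resourceType, relation string }

// distinctTargets counts the distinct target relations in a batch.
func distinctTargets(updates []Update) int {
	seen := map[target]struct{}{}
	for _, u := range updates {
		seen[target{u.ResourceType, u.Relation}] = struct{}{}
	}
	return len(seen)
}

// queryCounts returns the lookup queries before the fix (one per distinct
// target) and after it (a single batched query).
func queryCounts(updates []Update) (before, after int) {
	n := distinctTargets(updates)
	if n == 0 {
		return 0, 0
	}
	return n, 1
}

func main() {
	batch := []Update{
		{"document", "readme", "viewer", "user:alice"},
		{"document", "readme", "editor", "user:bob"},
		{"folder", "root", "viewer", "user:alice"},
		{"document", "spec", "viewer", "user:carol"}, // repeats document#viewer
	}
	before, after := queryCounts(batch)
	fmt.Println(before, after) // 3 1
}
```

Under this model the saving grows with the number of distinct target relations per request, so batches that mix many relation types benefit most.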