# spicedb
w
Re: migrating to 1.14 with Postgres: I'm finding that this query from the migration is pretty slow (~500ms):
```sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
LIMIT 1000 FOR UPDATE;
```
Query plan (EXPLAIN ANALYZE):
```
Limit  (cost=0.00..68.34 rows=1000 width=150) (actual time=468.112..472.134 rows=1000 loops=1)
  ->  LockRows  (cost=0.00..479050.71 rows=7009496 width=150) (actual time=468.111..472.023 rows=1000 loops=1)
        ->  Seq Scan on relation_tuple  (cost=0.00..408955.75 rows=7009496 width=150) (actual time=468.097..468.493 rows=1000 loops=1)
              Filter: (created_xid IS NULL)
              Rows Removed by Filter: 2947393
Planning Time: 0.105 ms
Execution Time: 472.248 ms
```
The index on `created_xid IS NULL` isn't used 🤔 Even after an `ANALYZE`. No idea why.
Tried:
```sql
CREATE INDEX CONCURRENTLY ix_backfill_tuple_temp_2 ON public.relation_tuple USING btree (created_xid) WHERE created_xid IS NULL;
```
To no avail
I have ~11M relationships, ~7M of which still have `created_xid IS NULL`
FWIW removing the `FOR UPDATE` doesn't change anything
j
i'm not sure why the index isn't getting used
we literally had to add it for a large beta tester of this migration, so I know it can get used
what version of postgres are you using?
if we fix this issue, 1hr should be plenty to migrate 11M
w
v13
j
my guess is that the query optimizer is looking at the cardinality of that index and saying: "i'm still left with 7m rows after using the index, since it's more than half of the total rows, why bother?"
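A way to check what the planner actually sees for that column (a sketch using the standard `pg_stats` view; the numbers reflect whatever the last `ANALYZE` sampled):
```sql
-- null_frac is the planner's estimate of the fraction of NULLs in the column;
-- a value around 0.64 here (7M of 11M) would explain why it deems the
-- partial index not selective enough to bother with.
SELECT null_frac, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'relation_tuple'
  AND attname = 'created_xid';
```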
w
from testing around, it is getting used if I only `SELECT created_xid`, but not if selecting anything else
That would make sense if we didn't have a LIMIT :/
j
oh interesting, when you select created_xid it can serve directly from the index
w
yeah it does an INDEX ONLY
j
if you're willing to just try things, try adding an index with all of the columns that are being selected
w
I think the reason is:
- PG expects that the rows matching `created_xid IS NULL` are more-or-less random in a sequential walk, so it's indeed theoretically cheaper to go for a seq scan (see SO)
- every time we do this SELECT, PG walks through the rows in the same order
- because we set the selected rows to NOT NULL, it takes PG longer and longer to find `IS NULL` rows in its sequential walk
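One way to test that hypothesis (a sketch; the `BUFFERS` option shows how many pages each run touches):
```sql
-- Run this before and after an UPDATE batch: if the seq scan restarts from
-- the front of the heap each time, "Rows Removed by Filter" and the buffer
-- counts should keep growing as more rows get set to NOT NULL.
EXPLAIN (ANALYZE, BUFFERS)
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
LIMIT 1000 FOR UPDATE;
```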
> if you're willing to just try things, try adding an index with all of the columns that are being selected

I did, no improvement
j
maybe you can trick it into using the index with like an `ORDER BY random()` or something
w
I tried something like that, but it random-sorts the entire table before selecting 1000 rows so it's crazy slow
j
your hypothesis sounds correct to me
> There are three kinds of lies: lies, damned lies, and statistics.
w
Alright, I got something
j
what about ordering by `created_xid DESC`, that should ask it to put the NULLs first?
w
This query:
```sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
ORDER BY created_xid
LIMIT 1000;
```
With this index:
```sql
CREATE INDEX ix_backfill_tuple_temp_2 ON public.relation_tuple USING btree (created_xid) WHERE (created_xid IS NULL);
```
great minds 😄
It feels stupid to order given we filter by IS NULL, but 🤷‍♂️
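A plausible reason it works: the `ORDER BY` matches the partial index's sort order, so the planner can serve the filter, the ordering, and the `LIMIT` straight from the index without a sort. A quick check (sketch; the expected plan shape is an assumption, not captured output):
```sql
EXPLAIN ANALYZE
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
ORDER BY created_xid
LIMIT 1000;
-- Expected shape: an Index Scan using ix_backfill_tuple_temp_2,
-- rather than the earlier Seq Scan.
```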
j
can you try the `DESC` thing without your new index?
w
```
Limit  (cost=504061.89..504178.57 rows=1000 width=153) (actual time=1998.702..2003.173 rows=1000 loops=1)
  ->  Gather Merge  (cost=504061.89..1184130.67 rows=5828754 width=153) (actual time=1998.700..2003.088 rows=1000 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=503061.87..510347.81 rows=2914377 width=153) (actual time=1989.904..1989.942 rows=568 loops=3)
              Sort Key: created_xid DESC
              Sort Method: top-N heapsort  Memory: 314kB
              Worker 0:  Sort Method: top-N heapsort  Memory: 296kB
              Worker 1:  Sort Method: top-N heapsort  Memory: 310kB
              ->  Parallel Seq Scan on relation_tuple  (cost=0.00..343269.72 rows=2914377 width=153) (actual time=268.602..1185.586 rows=2331861 loops=3)
                    Filter: (created_xid IS NULL)
                    Rows Removed by Filter: 1411000
Planning Time: 0.806 ms
Execution Time: 2003.370 ms
```
j
oof
w
for this query:
```sql
EXPLAIN ANALYZE
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
ORDER BY created_xid DESC
LIMIT 1000;
```
I'm not too happy with the solution I found given it's in spicedb-land
j
yeah, but we could cut a v1.14.1 if we had to
w
.2
actually but yea
Just means it'll take a few more days for me to upgrade. I hoped to have it done today but that doesn't seem realistic!
not the end of the world
j
i mean, you could also just create a binary that does what you want to get your migration done
after that you can forget about it
w
as in, a custom spicedb build? yeah, would rather avoid that though
Ha, there's another way: `SET enable_seqscan = off;` 😄
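For reference, a sketch of how that looks in practice; `enable_seqscan` is a session-local planner setting, so it has to be set on the same connection that runs the query:
```sql
SET enable_seqscan = off;  -- discourages seq scans for this session only
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
LIMIT 1000 FOR UPDATE;
RESET enable_seqscan;      -- restore the default when done
```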
j
we should probably just do that in our datastore tests
w
Do what sorry?
j
set enable_seqscan = off
it's basically a non-starter for this architecture at scale
i really like the way that google app engine's datastore used to work, you would write your code, and after your tests ran it would just tell you what indices you needed
w
Users will have it set to `on` in real life though, so setting it to `off` in tests won't be representative
it's not something that will make your test fail in case of a seq scan, it's a config of the query planner
j
if I understand it correctly it will protect us from writing queries that literally can't be completed without a seqscan though
ah
w
Sadly not, it just forces the use of indexes
I don't know what happens if there's no index to use
auto-explain (https://www.postgresql.org/docs/current/auto-explain.html) and `SET log_statement TO 'all';`, and grep through the logs for a seq scan?
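A sketch of what that could look like (`auto_explain` is a contrib module that ships with Postgres; the settings below are real, though some require superuser to change):
```sql
LOAD 'auto_explain';                    -- load the module for this session
SET auto_explain.log_min_duration = 0;  -- log the plan of every statement
SET auto_explain.log_analyze = on;      -- include actual times/row counts
-- Any query run now gets its plan written to the server log,
-- where "Seq Scan" lines can be grepped for.
```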
j
so are you good to get through the migration just by setting that variable? or should we try to prioritize a fix for you
w
Although it sped up this SELECT, the whole UPDATE is still pretty slow actually
That's with an additional index:
```sql
CREATE INDEX CONCURRENTLY ix_backfill_tuple_temp_2 ON public.relation_tuple USING btree (namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction);
```
it's not necessarily a blocker, but it means that it'll take a while to do the migration
(I'm working on a copy of the actual prod database ATM)
j
```
Seq Scan on relation_tuple relation_tuple_1  (cost=0.00..416979.27 rows=5246140 width=151) (actual time=732.178..733.167 rows=2000 loops=1)
```
what's this?
w
Ah, that's because the seq scan is only disabled for the session, must have run the EXPLAIN ANALYZE in a different session
j
1ms per row isn't that bad I guess
ooh, i have an idea
try to order by `created_xid, created_id`
you might get a better hit rate for the order the rows are physically stored
i.e. fewer overall page reads and writes
w
`created_id`? Is that `created_transaction`?
j
ah yup
sorry, working purely from memory
w
Making it much worse 😅
TBH I think that's acceptable, I'll just wait a few hours
it'll be faster than figuring out a patch and having it available!
m
Hello all, I am experiencing similar issues on this step. I found significant improvements by changing the subquery to use IDs directly instead:
```sql
UPDATE relation_tuple
SET deleted_xid = deleted_transaction::text::xid8,
    created_xid = created_transaction::text::xid8
WHERE (id) IN (SELECT id
               FROM relation_tuple
               WHERE created_xid IS NULL
               LIMIT 10000 FOR UPDATE);
```
any reason why the select is performed on the individual columns? From what I read in the Postgres docs (https://www.postgresql.org/docs/14/sql-select.html#SQL-FOR-UPDATE-SHARE), the lock is on the whole row (see also https://www.postgresql.org/docs/14/explicit-locking.html#LOCKING-ROWS), regardless of the selected fields. Am I missing something along the way? 🙂 Thank you very much
j
we're dropping the IDs altogether because their generation was limiting write throughput
in our other backends we can't have a globally unique ID like that at all and hope to scale performance
now you could arguably use them for this migration, but if there's another way to fix it I think we should pursue it
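For completeness, a hypothetical id-free variant of the same batching trick (a sketch only, not the project's method): `ctid` is a row's physical location, so it exists even without an `id` column, but it is not stable across `VACUUM FULL` or row updates, and `ctid IN (subquery)` may not plan efficiently on every Postgres version:
```sql
-- Hypothetical sketch: batch by the physical row identifier instead of id.
-- The subquery and UPDATE run in the same statement, so the ctid references
-- stay valid for the duration of the batch.
UPDATE relation_tuple
SET deleted_xid = deleted_transaction::text::xid8,
    created_xid = created_transaction::text::xid8
WHERE ctid IN (SELECT ctid
               FROM relation_tuple
               WHERE created_xid IS NULL
               LIMIT 10000 FOR UPDATE);
```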
m
alright, thank you very much for your reply 🙂