williamdclt

11/21/2022, 4:17 PM
Re: migrating to 1.14 with Postgres. I'm finding that this query from the migration is pretty slow (~500ms):
```sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
LIMIT 1000 FOR UPDATE;
```
Query plan (EXPLAIN ANALYZE):
```
Limit  (cost=0.00..68.34 rows=1000 width=150) (actual time=468.112..472.134 rows=1000 loops=1)
  ->  LockRows  (cost=0.00..479050.71 rows=7009496 width=150) (actual time=468.111..472.023 rows=1000 loops=1)
        ->  Seq Scan on relation_tuple  (cost=0.00..408955.75 rows=7009496 width=150) (actual time=468.097..468.493 rows=1000 loops=1)
              Filter: (created_xid IS NULL)
              Rows Removed by Filter: 2947393
Planning Time: 0.105 ms
Execution Time: 472.248 ms
```
The index on `created_xid IS NULL` isn't used 🤔 Even after an `ANALYZE`. No idea why.
Tried:
```sql
CREATE INDEX CONCURRENTLY ix_backfill_tuple_temp_2 ON public.relation_tuple USING btree (created_xid) WHERE created_xid IS NOT NULL;
```
To no avail.
I have ~11M relationships, ~7M of which still have `created_xid IS NULL`
FWIW, removing the `FOR UPDATE` doesn't change anything

Jake

11/21/2022, 5:01 PM
i'm not sure why the index isn't getting used
we literally had to add it for a large beta tester of this migration, so I know it can get used
what version of postgres are you using?
if we fix this issue, 1hr should be plenty to migrate 11M

williamdclt

11/21/2022, 5:06 PM
v13

Jake

11/21/2022, 5:07 PM
my guess is that the query optimizer is looking at the cardinality of that index and saying: "i'm still left with 7m rows after using the index, since it's more than half of the total rows, why bother?"
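A quick way to sanity-check that guess is the standard `pg_stats` view (table and column names taken from the thread):

```sql
-- If null_frac for created_xid is well above 0.5, the planner expects
-- the partial index to match most of the table, which makes a
-- sequential scan look cheaper on paper.
SELECT null_frac, n_distinct
FROM pg_stats
WHERE tablename = 'relation_tuple'
  AND attname = 'created_xid';
```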

williamdclt

11/21/2022, 5:07 PM
from testing around, it is getting used if I only `SELECT created_xid`, but not if selecting anything else
That would make sense if we didn't have a LIMIT :/

Jake

11/21/2022, 5:08 PM
oh interesting, when you select `created_xid` it can serve directly from the index

williamdclt

11/21/2022, 5:09 PM
yeah, it does an Index Only Scan

Jake

11/21/2022, 5:10 PM
if you're willing to just try things, try adding an index with all of the columns that are being selected

williamdclt

11/21/2022, 5:13 PM
I think the reason is:
- PG expects that the rows matching `created_xid IS NULL` are more-or-less random in a sequential walk, so it's indeed theoretically cheaper to go for a seq scan (see SO)
- every time we do this SELECT, PG walks through the rows in the same order
- because we set the selected rows to NOT NULL, it takes PG longer and longer to find `IS NULL` rows in its sequential walk

> if you're willing to just try things, try adding an index with all of the columns that are being selected

I did, no improvement

Jake

11/21/2022, 5:14 PM
maybe you can trick it into using the index with like an `order by rand()` or something

williamdclt

11/21/2022, 5:14 PM
I tried something like that, but it random-sorts the entire table before selecting 1000 rows, so it's crazy slow

Jake

11/21/2022, 5:16 PM
your hypothesis sounds correct to me
> There are three kinds of lies: lies, damned lies, and statistics.

williamdclt

11/21/2022, 5:18 PM
Alright, I got something

Jake

11/21/2022, 5:18 PM
what about ordering by created_xid DESC, that should ask it to put the nulls first?
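(That matches the Postgres defaults: NULLS FIRST is implied for DESC order, so the two forms below should return the same rows; a quick check against the table from the thread:)

```sql
-- Postgres sorts NULLs as if larger than any non-null value, so DESC
-- order already puts them first; these two queries are equivalent.
SELECT created_xid FROM relation_tuple
ORDER BY created_xid DESC LIMIT 10;

SELECT created_xid FROM relation_tuple
ORDER BY created_xid DESC NULLS FIRST LIMIT 10;
```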

williamdclt

11/21/2022, 5:18 PM
This query:
```sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
ORDER BY created_xid
LIMIT 1000;
```
With this index:
```sql
CREATE INDEX ix_backfill_tuple_temp_2 ON public.relation_tuple USING btree (created_xid) WHERE (created_xid IS NULL);
```
great minds 😄
It feels stupid to order given we filter by IS NULL, but 🤷‍♂️

Jake

11/21/2022, 5:20 PM
can you try the `desc` thing without your new index?

williamdclt

11/21/2022, 5:21 PM
```
Limit  (cost=504061.89..504178.57 rows=1000 width=153) (actual time=1998.702..2003.173 rows=1000 loops=1)
  ->  Gather Merge  (cost=504061.89..1184130.67 rows=5828754 width=153) (actual time=1998.700..2003.088 rows=1000 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=503061.87..510347.81 rows=2914377 width=153) (actual time=1989.904..1989.942 rows=568 loops=3)
              Sort Key: created_xid DESC
              Sort Method: top-N heapsort  Memory: 314kB
              Worker 0:  Sort Method: top-N heapsort  Memory: 296kB
              Worker 1:  Sort Method: top-N heapsort  Memory: 310kB
              ->  Parallel Seq Scan on relation_tuple  (cost=0.00..343269.72 rows=2914377 width=153) (actual time=268.602..1185.586 rows=2331861 loops=3)
                    Filter: (created_xid IS NULL)
                    Rows Removed by Filter: 1411000
Planning Time: 0.806 ms
Execution Time: 2003.370 ms
```

Jake

11/21/2022, 5:21 PM
oof

williamdclt

11/21/2022, 5:21 PM
for this query:
```sql
EXPLAIN ANALYZE
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
ORDER BY created_xid DESC
LIMIT 1000;
```
I'm not too happy with the solution I found, given it's in spicedb-land (the migration query itself would need to change)

Jake

11/21/2022, 5:23 PM
yeah, but we could cut a v1.14.1 if we had to

williamdclt

11/21/2022, 5:24 PM
.2 actually, but yeah
Just means it'll take a few more days for me to upgrade; I'd hoped to have it done today but that doesn't seem realistic!
not the end of the world

Jake

11/21/2022, 5:24 PM
i mean, you could also just create a binary that does what you want to get your migration done
after that you can forget about it
w

williamdclt

11/21/2022, 5:25 PM
as in, a custom spicedb build? yeah, would rather avoid that though
Ha, there's another way: `set enable_seqscan = off` 😄
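(For reference, a minimal sketch of how that would be scoped; `enable_seqscan` is a per-session planner setting, so only the connection running the migration is affected:)

```sql
-- Make sequential scans look prohibitively expensive to the planner
-- for this session only (it can still use one if no index applies).
SET enable_seqscan = off;

-- ...run the migration's batched SELECT/UPDATE in this same session...

RESET enable_seqscan;  -- restore the default afterwards
```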

Jake

11/21/2022, 5:27 PM
we should probably just do that in our datastore tests
w

williamdclt

11/21/2022, 5:27 PM
Do what sorry?

Jake

11/21/2022, 5:28 PM
`set enable_seqscan = off`
a seq scan is basically a non-starter for this architecture at scale
i really like the way google app engine's datastore used to work: you would write your code, and after your tests ran it would just tell you what indices you needed

williamdclt

11/21/2022, 5:29 PM
Users will have it set to `on` in real life though, so setting it to `off` in tests won't be representative
it's not something that will make your test fail in case of a seq scan, it's a config of the query planner

Jake

11/21/2022, 5:29 PM
if I understand it correctly it will protect us from writing queries that literally can't be completed without a seqscan though
ah
w

williamdclt

11/21/2022, 5:30 PM
Sadly not, it just forces the use of indexes
I don't know what happens if there's no index to use
auto_explain (https://www.postgresql.org/docs/current/auto-explain.html) with `SET log_statement TO 'all';`, then grep through the logs for a seq scan?
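(A sketch of that setup, using auto_explain's documented settings; the module is loaded per-session here, though it can also go in `session_preload_libraries`:)

```sql
-- Load auto_explain for this session and log the plan of every
-- statement; afterwards, grep the server log for "Seq Scan".
LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;  -- 0 = log all statements' plans
SET auto_explain.log_analyze = true;    -- include actual rows/timings
```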

Jake

11/21/2022, 6:16 PM
so are you good to get through the migration just by setting that variable? or should we try to prioritize a fix for you
w

williamdclt

11/21/2022, 6:27 PM
Although it sped up this SELECT, the whole UPDATE is still pretty slow actually
message has been deleted
That's with an additional index:
```sql
CREATE INDEX CONCURRENTLY ix_backfill_tuple_temp_2 ON public.relation_tuple USING btree (namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction);
```
it's not necessarily a blocker, but it means that it'll take a while to do the migration
(I'm working on a copy of the actual prod database ATM)

Jake

11/21/2022, 6:30 PM
```
Seq Scan on relation_tuple relation_tuple_1  (cost=0.00..416979.27 rows=5246140 width=151) (actual time=732.178..733.167 rows=2000 loops=1)
```
what's this?
w

williamdclt

11/21/2022, 6:31 PM
Ah, that's because the seqscan is only disabled for the session; I must have run the EXPLAIN ANALYZE in a different session
message has been deleted

Jake

11/21/2022, 6:32 PM
1ms per row isn't that bad I guess
ooh, i have an idea
try to order by `created_xid, created_id`
you might get a better hit rate for the order the rows are physically stored, i.e. fewer overall page reads and writes

williamdclt

11/21/2022, 6:35 PM
`created_id`? Is that `created_transaction`?

Jake

11/21/2022, 6:35 PM
ah yup
sorry, working purely from memory
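(For reference, a sketch of the suggested batch with the corrected column name, reusing the column list from the earlier query:)

```sql
-- Same backfill SELECT, but ordered by insertion order
-- (created_transaction) in the hope of better physical locality.
SELECT namespace, object_id, relation, userset_namespace,
       userset_object_id, userset_relation,
       created_transaction, deleted_transaction
FROM relation_tuple
WHERE created_xid IS NULL
ORDER BY created_xid, created_transaction
LIMIT 1000 FOR UPDATE;
```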

williamdclt

11/21/2022, 6:36 PM
Making it much worse 😅
TBH I think that's acceptable, I'll just wait a few hours
it'll be faster than figuring out a patch and having it available!

michael93

12/07/2022, 4:42 PM
Hello all, I am experiencing similar issues on this step. I found significant improvements by changing the subquery to use IDs directly instead:
```sql
UPDATE relation_tuple
SET deleted_xid = deleted_transaction::text::xid8,
    created_xid = created_transaction::text::xid8
WHERE (id) IN (SELECT id
               FROM relation_tuple
               WHERE created_xid IS NULL
               LIMIT 10000 FOR UPDATE);
```
any reason why the select is performed on the individual columns? From what I read in the Postgres docs (https://www.postgresql.org/docs/14/sql-select.html#SQL-FOR-UPDATE-SHARE), the lock is on the whole row (see also https://www.postgresql.org/docs/14/explicit-locking.html#LOCKING-ROWS), regardless of the selected fields. Am I missing something along the way? 🙂 Thank you very much!

Jake

12/07/2022, 7:00 PM
we're dropping the IDs altogether because their generation was limiting write throughput
in our other backends we can't have a globally unique ID like that at all and hope to scale performance
now you could arguably use them for this migration, but if there's another way to fix it I think we should pursue it

michael93

12/08/2022, 8:10 AM
alright, thank you very much for your reply 🙂