Ben Simpson (04/05/2024, 3:35 AM)
The xmin is from a transaction ~1hr old, and there are ~4.2 million tuples returned by:
select count(*) from spicedb.public.relation_tuple where deleted_xid < '185328836';
Which doesn't seem excessive for the size of the server, but maybe it would take >1min.
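Aside: one way to derive an xid8 horizon like the '185328836' above is from the current snapshot; a minimal sketch, assuming PostgreSQL >= 13 (where the xid8 snapshot functions live):

-- The snapshot xmin is the oldest transaction ID still considered in progress;
-- tuples whose deleted_xid is below a sufficiently old horizon are GC candidates.
SELECT pg_snapshot_xmin(pg_current_snapshot());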
Kent Chenery (04/05/2024, 5:07 AM)
We have db.r6g.xlarge nodes in the cluster. It's certainly not a large instance type, though I'd expect it to cope. Even if we were to increase the size I don't think we'd see an improvement, as the query is bound to a single CPU; a larger size would just give us more idle CPUs, although we would also get more RAM.
As Ben just said, each batch is taking a very long time. The index seems to be used for the query that finds the 1000 records, but I'm not sure it's being used for the actual DELETE. I suspect it's scanning the table (which is currently ~20GB).
Are we able to change the batch size to help confirm or disprove this hypothesis?
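Aside: the ~20GB figure is easy to double-check with the standard PostgreSQL size functions, assuming the relation and index names used elsewhere in this thread:

-- Heap plus indexes and TOAST for the table being GC'd:
SELECT pg_size_pretty(pg_total_relation_size('relation_tuple'));
-- The GC index on its own:
SELECT pg_size_pretty(pg_relation_size('ix_gc_index'));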
Kent Chenery (04/05/2024, 5:08 AM)
Are we able to run the DELETE statement manually to test/confirm this?
Kent Chenery (04/06/2024, 1:15 AM)
WITH rows AS (SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid FROM relation_tuple WHERE deleted_xid < $1 LIMIT ?)
DELETE FROM relation_tuple
WHERE (namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid) IN (SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid FROM rows);
It's the biggest load on our cluster. This cluster has a couple of databases, one for SpiceDB and one for FusionAuth; we've basically separated our "auth" onto its own DB cluster. Pretty much everything else is barely noticeable in the load (see the Delete Query Load.png image).
We first looked at the "rows" SELECT to make sure it returns quickly. As you'll see, Performance Insights gives us a $x placeholder for variables, so we worked out what we think goes into that value. As it's an xid8 value it was a bit of a struggle to get it to parse, but we found it worked if we used a string. So, for this example:
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid FROM relation_tuple WHERE deleted_xid < '999999999' LIMIT 1000
the query runs and returns the rows in milliseconds. Perfectly performant. The plan is:
Limit (cost=0.43..110.31 rows=1000 width=122)
  -> Index Scan using ix_gc_index on relation_tuple (cost=0.43..466125.91 rows=4242314 width=122)
       Index Cond: (deleted_xid < '999999999'::xid8)
Clearly it's using the index, as you'd expect.
(attachment) https://cdn.discordapp.com/attachments/1225632871214219294/1225977179830222950/Delete_Query_Load.png?ex=66231739&is=6610a239&hm=c636db80454a1d0e39f75f4cb9de2f770da9aa0f26a8fd20123c5ccb38f421c6&
Kent Chenery (04/06/2024, 1:18 AM)
For the DELETE statement the plan is:
Delete on relation_tuple (cost=140.87..2956.71 rows=0 width=0)
  -> Nested Loop (cost=140.87..2956.71 rows=227 width=152)
       -> HashAggregate (cost=140.31..150.31 rows=1000 width=268)
            Group Key: (rows.namespace)::text, (rows.object_id)::text, (rows.relation)::text, (rows.userset_namespace)::text, (rows.userset_object_id)::text, (rows.userset_relation)::text, rows.created_xid, rows.deleted_xid
            -> Subquery Scan on rows (cost=0.43..120.31 rows=1000 width=268)
                 -> Limit (cost=0.43..110.31 rows=1000 width=122)
                      -> Index Scan using ix_gc_index on relation_tuple relation_tuple_1 (cost=0.43..466125.91 rows=4242314 width=122)
                           Index Cond: (deleted_xid < '999999999'::xid8)
       -> Index Scan using ix_relation_tuple_by_subject on relation_tuple (cost=0.56..2.80 rows=1 width=128)
            Index Cond: (((userset_object_id)::text = (rows.userset_object_id)::text) AND ((userset_namespace)::text = (rows.userset_namespace)::text) AND ((userset_relation)::text = (rows.userset_relation)::text) AND ((namespace)::text = (rows.namespace)::text) AND ((relation)::text = (rows.relation)::text))
            Filter: (((rows.object_id)::text = (object_id)::text) AND (rows.created_xid = created_xid) AND (rows.deleted_xid = deleted_xid))
There's a large cost for a HashAggregate, which I believe is the issue.
Kent Chenery (04/06/2024, 1:18 AM)
We tried adding DISTINCT to the initial "rows" query. That appears to improve the plan quite a bit:
Delete on relation_tuple (cost=1.25..3324.01 rows=0 width=0)
  -> Nested Loop (cost=1.25..3324.01 rows=1 width=152)
       -> Subquery Scan on rows (cost=0.69..517.61 rows=1000 width=268)
            -> Limit (cost=0.69..507.61 rows=1000 width=122)
                 -> Unique (cost=0.69..1066720.82 rows=2104290 width=122)
                      -> Index Only Scan using pk_relation_tuple on relation_tuple relation_tuple_1 (cost=0.69..981874.54 rows=4242314 width=122)
                           Index Cond: (deleted_xid < '999999999'::xid8)
       -> Index Scan using ix_relation_tuple_by_subject on relation_tuple (cost=0.56..2.80 rows=1 width=128)
            Index Cond: (((userset_object_id)::text = (rows.userset_object_id)::text) AND ((userset_namespace)::text = (rows.userset_namespace)::text) AND ((userset_relation)::text = (rows.userset_relation)::text) AND ((namespace)::text = (rows.namespace)::text) AND ((relation)::text = (rows.relation)::text))
            Filter: (((rows.object_id)::text = (object_id)::text) AND (rows.created_xid = created_xid) AND (rows.deleted_xid = deleted_xid))
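Aside: the DISTINCT variant itself isn't pasted in the thread; it presumably looked roughly like this sketch, which matches the Unique node in the plan above:

WITH rows AS (
    -- DISTINCT dedupes the candidate tuples before the outer IN lookup
    SELECT DISTINCT namespace, object_id, relation, userset_namespace,
           userset_object_id, userset_relation, created_xid, deleted_xid
    FROM relation_tuple
    WHERE deleted_xid < '999999999'::xid8
    LIMIT 1000
)
DELETE FROM relation_tuple
WHERE (namespace, object_id, relation, userset_namespace,
       userset_object_id, userset_relation, created_xid, deleted_xid)
   IN (SELECT namespace, object_id, relation, userset_namespace,
              userset_object_id, userset_relation, created_xid, deleted_xid
       FROM rows);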
Kent Chenery (04/06/2024, 1:21 AM)
We also noticed the "rows" query doesn't ORDER BY deleted_xid. Not sure it matters, but it might be tidier to remove rows in the order they were deleted.
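Aside: that suggestion only changes the "rows" half of the statement; a sketch, assuming the ORDER BY can be satisfied by the index on deleted_xid so it doesn't add a sort step:

SELECT namespace, object_id, relation, userset_namespace,
       userset_object_id, userset_relation, created_xid, deleted_xid
FROM relation_tuple
WHERE deleted_xid < '999999999'::xid8
ORDER BY deleted_xid   -- oldest-deleted tuples first
LIMIT 1000;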
vroldanbet (04/08/2024, 8:17 AM)
Note the cast to xid8, done as you see above: '999999999'::xid8. This is important because we identified a while back that the query planner won't select xid8-based indexes on versions < PG15, hence the question and the recommendation to run SpiceDB with PostgreSQL >= 15.
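Aside: the string form matters because xid8 has no cast from integer literals, which is why the manual testing above had to quote the value:

SELECT '185328836'::xid8;    -- works: string literal cast to xid8
-- SELECT 185328836::xid8;   -- fails: cannot cast type integer to xid8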
vroldanbet (04/08/2024, 9:06 AM)
In your plan:
Limit (cost=0.43..110.31 rows=1000 width=122)
  -> Index Scan using ix_gc_index on relation_tuple (cost=0.43..466125.91 rows=4242314 width=122)
       Index Cond: (deleted_xid < '999999999'::xid8)
there are 4242314 rows scanned, which seems odd. This is what my local 15.6 PG shows, with 5M deleted rows. Interestingly, it's also not selecting the GC index:
Limit (cost=0.00..26.79 rows=1000 width=80) (actual time=0.006..0.198 rows=1000 loops=1)
  -> Seq Scan on relation_tuple (cost=0.00..137596.88 rows=5136950 width=80) (actual time=0.006..0.157 rows=1000 loops=1)
       Filter: (deleted_xid < '999999999'::xid8)
Planning Time: 0.058 ms
Execution Time: 0.227 ms
vroldanbet (04/08/2024, 9:14 AM)
And here it does select ix_gc_index:
Limit (cost=0.43..35.38 rows=1000 width=80) (actual time=1.183..1.740 rows=1000 loops=1)
  -> Index Scan using ix_gc_index on relation_tuple (cost=0.43..178606.61 rows=5110010 width=80) (actual time=1.182..1.705 rows=1000 loops=1)
       Index Cond: (deleted_xid < '901'::xid8)
Planning Time: 0.359 ms
Execution Time: 1.780 ms
vroldanbet (04/08/2024, 9:16 AM)
Ah, that plan was from EXPLAIN but not EXPLAIN ANALYZE, so you don't get the actual number of rows scanned, only the planner's estimate. Makes sense.
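Aside: plain EXPLAIN prints only the planner's cost and row estimates, while EXPLAIN ANALYZE executes the statement and reports actual row counts and timings. For example, on the SELECT side (safe to run, unlike ANALYZE of a DELETE):

EXPLAIN ANALYZE
SELECT * FROM relation_tuple
WHERE deleted_xid < '185328836'::xid8
LIMIT 1000;
-- The "rows=4242314" seen earlier was the estimated total for the index scan,
-- not the number of rows the LIMITed query actually touched.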
vroldanbet (04/08/2024, 9:22 AM)
EXPLAIN DELETE FROM relation_tuple
WHERE ctid IN (SELECT ctid
               FROM relation_tuple
               WHERE deleted_xid < 'the_xid8_to_GC_from'
               LIMIT 1000);
vroldanbet (04/08/2024, 9:23 AM)
Delete on relation_tuple (cost=47.89..854.38 rows=0 width=0)
  -> Nested Loop (cost=47.89..854.38 rows=1000 width=36)
       -> HashAggregate (cost=47.88..49.88 rows=200 width=36)
            Group Key: "ANY_subquery".ctid
            -> Subquery Scan on "ANY_subquery" (cost=0.43..45.38 rows=1000 width=36)
                 -> Limit (cost=0.43..35.38 rows=1000 width=6)
                      -> Index Scan using ix_gc_index on relation_tuple relation_tuple_1 (cost=0.43..178606.61 rows=5110010 width=6)
                           Index Cond: (deleted_xid < '901'::xid8)
       -> Tid Scan on relation_tuple (cost=0.00..4.01 rows=1 width=6)
            TID Cond: (ctid = "ANY_subquery".ctid)
Ben Simpson (04/08/2024, 9:23 AM)
185328836 is the one I grabbed on Friday.
vroldanbet (04/08/2024, 9:38 AM)
WITH rows AS (SELECT ctid FROM relation_tuple WHERE deleted_xid < 'your_XID8' LIMIT 1000)
DELETE FROM relation_tuple
WHERE ctid IN (SELECT ctid FROM rows);
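Aside: to time either DELETE variant without actually removing anything, the rollback pattern Kent mentions below works well; a sketch using the xid8 value Ben grabbed:

BEGIN;
EXPLAIN ANALYZE              -- note: ANALYZE really executes the DELETE
WITH rows AS (SELECT ctid FROM relation_tuple
              WHERE deleted_xid < '185328836'::xid8 LIMIT 1000)
DELETE FROM relation_tuple
WHERE ctid IN (SELECT ctid FROM rows);
ROLLBACK;                    -- nothing commits, so no tuples are lost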
Kent Chenery (04/08/2024, 7:05 PM)
We just tried that query (the WITH rows... one) on our DB (with a rollback) and it executed in 135ms.
vroldanbet (04/10/2024, 9:51 AM)
There is a spicedb datastore gc command to run these on demand. But because this is an issue with the queries SpiceDB issues when doing GC, and because that branch hasn't landed in main yet, we recommended running it yourself directly.
Once the PR lands, you should be able to grab the nightly build. Unfortunately it's a slow week for us, as several folks are on PTO.
Kent Chenery (04/23/2024, 12:58 AM)
We deployed 1.31.0 yesterday and the issue is all solved now. Thanks!