# spicedb
j
make sure you're on version 15 at least of PG
ok
well, that means the GC process is timing out
it should be fixed on PG 15
the GC timeout can be increased, but it's already taking over a minute (which is the default timeout)
then you're either generating a VERY large amount of garbage
or it's not using the correct indexes
which, with Aurora, who knows
b
I grabbed the snapshot `xmin` from a transaction ~1hr old and there are ~4.2 million tuples returned by `select count(*) from spicedb.public.relation_tuple where deleted_xid < '185328836';`
Which doesn't seem excessive for the size of the server but maybe would take >1min
I'll try get an EXPLAIN for the GC delete query
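For reference, a sketch of how an xid8 like that can be looked up, assuming the SpiceDB Postgres datastore's relation_tuple_transaction table (which stores a pg_snapshot and a timestamp per transaction; verify the column names against your own schema):
Copy code
-- Grab the snapshot xmin of the newest transaction that is at least an hour old;
-- this is the value compared against deleted_xid in the count above.
SELECT pg_snapshot_xmin(snapshot) AS gc_xid
FROM relation_tuple_transaction
WHERE timestamp < NOW() - INTERVAL '1 hour'
ORDER BY timestamp DESC
LIMIT 1;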
j
you're generating 4.2M updates within an hour?
if it is timing out, you may need to scale up your Aurora
b
No, I don't think the GC has succeeded in a long time
Like since we migrated all our users into SpiceDB probably
j
there is a prometheus metric we recommend monitoring for just that
but yeah, I'd scale up and see why it's failing
in PG < 15, it wasn't using the correct index
but it should be doing so with PG 15+
b
It's taking ~5min to delete 1 batch of 1000 rows which definitely ain't right
k
We're currently on 15.4. We are using Aurora PG and have a pair of `db.r6g.xlarge` nodes in the cluster. Certainly it's not a large instance type, though I'd expect it to cope. Even if we were to increase the size I don't think we'd see an improvement, as the query is single-CPU bound; a larger size would just give us more idle CPUs, although we would also get more RAM. As Ben has just said, each batch is taking a very long time. The index seems to be used for the query to find the 1000 records, but I'm not sure it's being used for the actual `DELETE`. I suspect it's scanning the table (which is currently ~20GB). Are we able to change the batch size to help confirm or disprove this hypothesis?
Or... can we run the `DELETE` statement manually to test/confirm this?
We've run a VACUUM and an ANALYZE to make sure it's not related to that. A query to do the SELECT for the 1000 rows is "immediate" (94ms in my case), which leads me to think either I'm getting a different plan there or it is scanning the table in the DELETE.
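One way to check this directly is to run the GC DELETE inside a transaction and roll it back; EXPLAIN ANALYZE then shows the real plan and timings without losing any rows. A sketch, using the GC delete statement (as later captured from Performance Insights) and the xid8 from the count above; both are just example values:
Copy code
BEGIN;

-- EXPLAIN ANALYZE actually executes the DELETE and reports per-node timings.
EXPLAIN ANALYZE
WITH rows AS (
  SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
         userset_relation, created_xid, deleted_xid
  FROM relation_tuple
  WHERE deleted_xid < '185328836'::xid8
  LIMIT 1000
)
DELETE FROM relation_tuple
WHERE (namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, created_xid, deleted_xid)
  IN (SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
      userset_relation, created_xid, deleted_xid FROM rows);

-- Undo the deletions so the test has no lasting effect.
ROLLBACK;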
v
something you can do to get this unstuck, which is not ideal but will likely work, is to increase the timeout of the GC via the corresponding ENV variable. Once it does complete, you should be able to go back to the original value. I did recently merge a metric improvement that should help show the number of rows deleted even when the operation timed out, which previously was not being reported via the prometheus endpoint
j
yeah, I'd see if you can get an EXPLAIN on the DELETE
the delete is being done via PK; it shouldn't take 5 minutes
k
We did this (threw it way high to a 10-minute timeout). We get a couple of batches done in that time but that's all. It'll take a very long time to catch up, and we still receive timeout errors, at which point the process stops and we have to wait for its next cycle.
v
Then change the cycle to be lower; eventually it will catch up. Just saying that as a stopgap. Right now it's an exercise in understanding what exactly is wrong with the query and whether it's missing another index. I did some work a while back and added an index to help here, but it seems it hasn't helped as I expected
k
Here's a bit of an analysis dump (across a few messages to stay under the max message length)... Since we use RDS Aurora, we have their Performance Insights data. From that we get the query:
Copy code
WITH rows AS (SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid FROM relation_tuple WHERE deleted_xid < $1 LIMIT ?)
          DELETE FROM relation_tuple
          WHERE (namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid) IN (SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid FROM rows);
It's our biggest load on our cluster. This cluster has a couple of databases, one for SpiceDB and one for FusionAuth; we've basically separated our "auth" onto its own DB cluster. Pretty much everything else is barely noticeable on the load (see the `Delete Query Load.png` image). We first looked at the "rows" select to make sure that it returns quickly. As you'll see, Performance Insights gives us a `$x` placeholder for variables, so we worked out what we think would be going into that value. As it's an xid8 value it was a bit of a struggle to get it to parse, but we found that if we used a string it worked. So for this example:
Copy code
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, created_xid, deleted_xid FROM relation_tuple WHERE deleted_xid < '999999999' LIMIT 1000
The query will run and return the rows in milliseconds. Perfectly performant. The plan is:
Copy code
Limit  (cost=0.43..110.31 rows=1000 width=122)
  ->  Index Scan using ix_gc_index on relation_tuple  (cost=0.43..466125.91 rows=4242314 width=122)
        Index Cond: (deleted_xid < '999999999'::xid8)
Clearly using the index as you'd expect. https://cdn.discordapp.com/attachments/1225632871214219294/1225977179830222950/Delete_Query_Load.png?ex=66231739&is=6610a239&hm=c636db80454a1d0e39f75f4cb9de2f770da9aa0f26a8fd20123c5ccb38f421c6&
Substituting this into the `DELETE` statement, the plan is:
Copy code
Delete on relation_tuple  (cost=140.87..2956.71 rows=0 width=0)
  ->  Nested Loop  (cost=140.87..2956.71 rows=227 width=152)
        ->  HashAggregate  (cost=140.31..150.31 rows=1000 width=268)
              Group Key: (rows.namespace)::text, (rows.object_id)::text, (rows.relation)::text, (rows.userset_namespace)::text, (rows.userset_object_id)::text, (rows.userset_relation)::text, rows.created_xid, rows.deleted_xid
              ->  Subquery Scan on rows  (cost=0.43..120.31 rows=1000 width=268)
                    ->  Limit  (cost=0.43..110.31 rows=1000 width=122)
                          ->  Index Scan using ix_gc_index on relation_tuple relation_tuple_1  (cost=0.43..466125.91 rows=4242314 width=122)
                                Index Cond: (deleted_xid < '999999999'::xid8)
        ->  Index Scan using ix_relation_tuple_by_subject on relation_tuple  (cost=0.56..2.80 rows=1 width=128)
              Index Cond: (((userset_object_id)::text = (rows.userset_object_id)::text) AND ((userset_namespace)::text = (rows.userset_namespace)::text) AND ((userset_relation)::text = (rows.userset_relation)::text) AND ((namespace)::text = (rows.namespace)::text) AND ((relation)::text = (rows.relation)::text))
              Filter: (((rows.object_id)::text = (object_id)::text) AND (rows.created_xid = created_xid) AND (rows.deleted_xid = deleted_xid))
There's a large cost for a `HashAggregate`, which I believe is the issue. Thinking about that, I just tried adding a `DISTINCT` to the initial `rows` query. That appears to improve the plan quite a bit:
Copy code
Delete on relation_tuple  (cost=1.25..3324.01 rows=0 width=0)
  ->  Nested Loop  (cost=1.25..3324.01 rows=1 width=152)
        ->  Subquery Scan on rows  (cost=0.69..517.61 rows=1000 width=268)
              ->  Limit  (cost=0.69..507.61 rows=1000 width=122)
                    ->  Unique  (cost=0.69..1066720.82 rows=2104290 width=122)
                          ->  Index Only Scan using pk_relation_tuple on relation_tuple relation_tuple_1  (cost=0.69..981874.54 rows=4242314 width=122)
                                Index Cond: (deleted_xid < '999999999'::xid8)
        ->  Index Scan using ix_relation_tuple_by_subject on relation_tuple  (cost=0.56..2.80 rows=1 width=128)
              Index Cond: (((userset_object_id)::text = (rows.userset_object_id)::text) AND ((userset_namespace)::text = (rows.userset_namespace)::text) AND ((userset_relation)::text = (rows.userset_relation)::text) AND ((namespace)::text = (rows.namespace)::text) AND ((relation)::text = (rows.relation)::text))
              Filter: (((rows.object_id)::text = (object_id)::text) AND (rows.created_xid = created_xid) AND (rows.deleted_xid = deleted_xid))
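For reference, the DISTINCT variant was presumably something like this (the exact statement wasn't posted, so this is a sketch):
Copy code
WITH rows AS (
  SELECT DISTINCT namespace, object_id, relation, userset_namespace,
         userset_object_id, userset_relation, created_xid, deleted_xid
  FROM relation_tuple
  WHERE deleted_xid < '999999999'::xid8
  LIMIT 1000
)
DELETE FROM relation_tuple
WHERE (namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, created_xid, deleted_xid)
  IN (SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
      userset_relation, created_xid, deleted_xid FROM rows);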
Another observation I had: the delete doesn't order the initial select by `deleted_xid`. Not sure it matters, but it might be tidier to remove rows in the order they were deleted.
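If ordering were wanted, the inner select would just gain an ORDER BY, e.g. (a sketch):
Copy code
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, created_xid, deleted_xid
FROM relation_tuple
WHERE deleted_xid < '999999999'::xid8
ORDER BY deleted_xid
LIMIT 1000;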
v
for additional context, you can cast to `xid8` as you see above: `'999999999'::xid8`. This is important because we identified a while back that the query planner won't select xid8-based indexes in versions < PG 15, hence the question and the recommendation to run SpiceDB with PG >= 15.
also note the index already has an order by clause, so presumably those are already being deleted in order. See https://github.com/authzed/spicedb/blob/1e81e6848a555fec2647923416aa0913ccd22fb1/internal/datastore/postgres/migrations/zz_migration.0016_add_tuned_gc_index.go#L12
may I ask how many relationships are in your database? I'm not able to reproduce the same issue locally with 5M deleted tuples
Notice your explain for the select query:
Copy code
Limit  (cost=0.43..110.31 rows=1000 width=122)
  ->  Index Scan using ix_gc_index on relation_tuple  (cost=0.43..466125.91 rows=4242314 width=122)
        Index Cond: (deleted_xid < '999999999'::xid8)
there are `4242314` rows scanned, which seems odd. This is what my local 15.6 PG shows with 5M deleted rows. Also it's not selecting the GC index, interestingly:
Copy code
| --------------------------------------------------------------------------------------------------------------------------- |
| Limit  (cost=0.00..26.79 rows=1000 width=80) (actual time=0.006..0.198 rows=1000 loops=1)                                   |
|   ->  Seq Scan on relation_tuple  (cost=0.00..137596.88 rows=5136950 width=80) (actual time=0.006..0.157 rows=1000 loops=1) |
|         Filter: (deleted_xid < '999999999'::xid8)                                                                           |
| Planning Time: 0.058 ms                                                                                                     |
| Execution Time: 0.227 ms                                                                                                    |
oh I see, I was able to reproduce that phenomenon when I added 5M live tuples:
Copy code
| c0                                                                                                                                              |
| ----------------------------------------------------------------------------------------------------------------------------------------------- |
| Limit  (cost=0.43..35.38 rows=1000 width=80) (actual time=1.183..1.740 rows=1000 loops=1)                                                       |
|   ->  Index Scan using ix_gc_index on relation_tuple  (cost=0.43..178606.61 rows=5110010 width=80) (actual time=1.182..1.705 rows=1000 loops=1) |
|         Index Cond: (deleted_xid < '901'::xid8)                                                                                                 |
| Planning Time: 0.359 ms                                                                                                                         |
| Execution Time: 1.780 ms                                                                                                                        |
ah, this could be because Performance Insights is doing `EXPLAIN` but not `EXPLAIN ANALYZE`, so you don't get the actual number of rows scanned, just the planner's estimate. Makes sense.
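To make the distinction concrete (a sketch): plain EXPLAIN only reports the planner's estimates, whereas EXPLAIN ANALYZE runs the statement and reports how many rows each node actually produced (capped at 1000 here by the LIMIT):
Copy code
EXPLAIN ANALYZE
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, created_xid, deleted_xid
FROM relation_tuple
WHERE deleted_xid < '999999999'::xid8
LIMIT 1000;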
b
iirc there are ~25M tuples total
v
I have an alternative query that seems quite a bit faster. Would you be able to test it in your DB? You just need to make sure you select the right xid8, e.g.
Copy code
EXPLAIN DELETE FROM relation_tuple
WHERE ctid in (SELECT ctid
       FROM relation_tuple 
       WHERE deleted_xid < 'the_xid8_to_GC_from'
       LIMIT 1000);
explain looks like this:
Copy code
| c0                                                                                                                                         |
| ------------------------------------------------------------------------------------------------------------------------------------------ |
| Delete on relation_tuple  (cost=47.89..854.38 rows=0 width=0)                                                                              |
|   ->  Nested Loop  (cost=47.89..854.38 rows=1000 width=36)                                                                                 |
|         ->  HashAggregate  (cost=47.88..49.88 rows=200 width=36)                                                                           |
|               Group Key: "ANY_subquery".ctid                                                                                               |
|               ->  Subquery Scan on "ANY_subquery"  (cost=0.43..45.38 rows=1000 width=36)                                                   |
|                     ->  Limit  (cost=0.43..35.38 rows=1000 width=6)                                                                        |
|                           ->  Index Scan using ix_gc_index on relation_tuple relation_tuple_1  (cost=0.43..178606.61 rows=5110010 width=6) |
|                                 Index Cond: (deleted_xid < '901'::xid8)                                                                    |
|         ->  Tid Scan on relation_tuple  (cost=0.00..4.01 rows=1 width=6)                                                                   |
|               TID Cond: (ctid = "ANY_subquery".ctid)                                                                                       |
b
@Kent Chenery if you wanna try that tomorrow you can use `185328836`, it's the one I grabbed on Friday
The nested loop seems one order of magnitude faster on the worst case
v
The tentative query I'm going to propose, to accommodate the current codebase, is below. It has the same explain result.
Copy code
WITH rows AS (SELECT ctid FROM relation_tuple WHERE deleted_xid < 'your_XID8' LIMIT 1000)
          DELETE FROM relation_tuple
          WHERE (ctid) IN (SELECT ctid FROM rows);
Opened https://github.com/authzed/spicedb/pull/1859 with the proposed change
k
I just tested the delete above (the `WITH rows...` one) on our db (with a rollback) and it executed in 135ms.
Are we able to clear up our table manually until that fix is merged/deployed?
v
@Kent Chenery that's great to hear. Yes, you can run that to clear up your DB. There are other operations that need to run too (cleaning up namespaces and transactions), but hopefully those tables won't have issues with the current code.
Basically you have to run it in a loop, or bump the limit (beware: the bigger the limit, the more impact it will have on the DB)
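A sketch of running it in a loop until the backlog is cleared, using the xid8 Ben captured earlier; the batch size is just an example, and the COMMIT inside the DO block requires PG 11+ and running it outside an explicit transaction:
Copy code
DO $$
DECLARE
  batch integer;
BEGIN
  LOOP
    -- Same statement as the proposed fix, bounded to 1000 rows per pass.
    WITH rows AS (
      SELECT ctid FROM relation_tuple
      WHERE deleted_xid < '185328836'::xid8
      LIMIT 1000
    )
    DELETE FROM relation_tuple
    WHERE ctid IN (SELECT ctid FROM rows);

    GET DIAGNOSTICS batch = ROW_COUNT;
    EXIT WHEN batch = 0;

    -- Commit each batch so progress is released incrementally.
    COMMIT;
  END LOOP;
END $$;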
k
So I understand things correctly... the other operations that need to run - will SpiceDB do these for us, or is there something we should run? I'd set up a process to start calling that delete directly outside of SpiceDB, simply because that is quicker and easier for us than waiting for a new release that includes the fix or building an image ourselves.
b
SpiceDB does those after it does the relationship GC, but that's never completing so it's not getting to them
v
>So I understand things correctly... the other operations that need to run - will SpiceDB do these for us? Or is there something we should run?
@User Typically SpiceDB will do this for you. There is also a `spicedb datastore gc` command to run these on-demand. But because this is an issue with the queries SpiceDB issues when doing GC, and because this branch hasn't landed in main yet, we recommended running it yourself directly. Once the PR lands, you should be able to grab the nightly build. Unfortunately we have a slow week because several folks are on PTO.
k
@vroldanbet just letting you know we updated to SpiceDB `1.31.0` yesterday and the issue is all solved now. Thanks!
v
Thanks for following up, and glad to hear! 🎉