# spicedb
w
I'm having serious issues trying to upgrade to 1.14.1 on Postgres (manual, no-downtime upgrade process). I've done the `add-xid-constraints` migration, I'm running `write-both-read-new`, and I'm observing much higher response times than on 1.13.0. Initially the response time shot through the roof (>1s P95); it seemed to be a pod CPU issue rather than a database issue, so I mitigated by doubling the number of replicas. That brought the response time back within a usable range, but it's still much higher than it was on 1.13.0: my P50 is 3x higher (~15ms) and my P95 is 2x higher (130ms). I'm not seeing any particular pressure on the database, and the CPU usage of my pods is way higher than it used to be (>0.5, used to be ~0.1), so I assume something is burning CPU that wasn't before. I have this problem with `write-both-read-new` but also with `write-both-read-old`. I have traces but I'm not getting any insight from them, maybe you do? I'm concerned about whether my cluster will be able to handle the load on Monday: I'm already breaching my SLOs with these response times on a Sunday, when we have low traffic 😰
Note: I don't use the dispatch cluster
Response times:
- I started the `add-xid-constraints` migration a bit before midnight; it finished around 4am.
- Around 7am, we see response times going wild, which correlates with our traffic.
- A bit before 11am, I added more replicas to handle the load.
CPU usage over the same period
here's a cpu profile over 30s
j
Can you get a trace from a 95th percentile request?
You should also be able to safely roll back your pods to `write-both-read-old`; does that improve the situation?
Also, do you have Postgres GC turned on, and do you frequently reuse or recreate relationships using TOUCH operations?
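One quick way to gauge how much soft-deleted data GC has to deal with (just a sketch; I'm assuming the max-int64 value is the "not deleted" sentinel, as it appears in the query filters):
```sql
-- Rows whose deleted_transaction is not the "alive" sentinel are soft-deleted
-- and only go away when the datastore GC runs; a large count here tends to
-- bloat the table and the indexes the read queries scan.
SELECT count(*) AS soft_deleted_rows
FROM relation_tuple
WHERE deleted_transaction <> 9223372036854775807;
```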
w
> Can you get a trace from a 95th percentile request?
Should be able to, I'll try to get that.
> do you have Postgres GC turned on, and do you frequently reuse or recreate relationships using TOUCH operations?
The SpiceDB GC, you mean? Yes and yes, we use TOUCH a lot.
> You should also be able to safely roll back your pods to `write-both-read-old`; does that improve the situation?
Will try, but I don't think so. I started seeing higher response times as soon as I upgraded to 1.14.1, with `write-both-read-old`. Didn't think much of it at the time as it wasn't dramatic and it was late at night with low traffic.
Also seeing memory increasing a lot, correlating with response times getting higher and higher. A `kubectl rollout restart deployment` seems to calm things down, but still to higher-than-before response times.
I'd love to roll back to 1.13.0 to get back to a stable situation, but it's not so easy after the `add-xid-constraints` migration; I expect writes would fail :/
j
If you’re running write both read old the read path should be the same as v1.13
I would be interested to see what is consuming more CPU in that configuration
Using TOUCH by itself is ok, even exclusively. There is an issue when you have too many copies of the same relationship generated by touch events
But that wouldn’t show up with write both read old
w
Here's a trace for a slow query
j
Dominated by queryrelationships
Can you catch one of the slow queries in flight and run explain on it?
w
Switched to `read-old` (at the timestamp my cursor is at); it is maybe slightly better, but not as fast as before.
> Can you catch one of the slow queries in flight and run explain on it?
I don't think I can; I don't have anything logging queries in-flight. I only have the "generic" queries with parameter placeholders.
Unless SpiceDB logs that with LOG_LEVEL=debug?
j
You will get them with log level debug or you can just run psql to dump running queries a few times by hand until you see one.
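For the psql route, something like this run a few times (a sketch; with parameterized statements `pg_stat_activity` will still show the placeholder text rather than the bound values):
```sql
-- Poll the currently running statements touching relation_tuple,
-- longest-running first.
SELECT pid,
       now() - query_start AS running_for,
       state,
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND query ILIKE '%relation_tuple%'
ORDER BY running_for DESC;
```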
w
I only get the generic plan with these 2 approaches:
```sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, caveat_name, caveat_context FROM relation_tuple WHERE created_transaction <= $1 AND (deleted_transaction = $2 OR deleted_transaction > $3) AND namespace = $4 AND relation = $5 AND object_id IN ($6) LIMIT 9223372036854775807
```
I'll try to EXPLAIN it with my best guess of parameters
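i.e. something like this, with the parameter values substituted by hand (the object IDs are just placeholders):
```sql
-- Hand-substituted version of the generic query above; values are my best
-- guess at realistic parameters, object IDs are placeholders.
EXPLAIN ANALYZE
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, caveat_name, caveat_context
FROM relation_tuple
WHERE created_transaction <= 49851674
  AND (deleted_transaction = 9223372036854775807 OR deleted_transaction > 49851674)
  AND namespace = 'care_recipient'
  AND relation = 'caregiver'
  AND object_id IN ('1', '2', '3', '4', '5', '6')
LIMIT 9223372036854775807;
```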
```sql
 Limit  (cost=0.69..643.69 rows=237 width=138) (actual time=5.947..5.948 rows=0 loops=1)
   ->  Index Scan using ix_backfill_tuple_temp_2 on relation_tuple  (cost=0.69..643.69 rows=237 width=138) (actual time=5.947..5.947 rows=0 loops=1)
         Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{1,2,3,4,5,6}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
 Planning Time: 81.407 ms
 Execution Time: 5.975 ms
```
Interesting, it's using `ix_backfill_tuple_temp_2`, which is an index I created to help the backfill of xids.
j
is that planning time actually representative of the real runtime? it seems crazy
w
I agree
I got this plan by running `psql` on a spicedb pod, using the same connection string as SpiceDB, so... it's as representative of the real runtime as I can imagine.
j
but i mean if you run that same query without explain does it take 90ms?
w
it takes ~40ms
Rerunning the same `EXPLAIN ANALYZE` gives me:
```sql
 Limit  (cost=0.69..643.69 rows=237 width=138) (actual time=0.075..0.075 rows=0 loops=1)
   ->  Index Scan using ix_backfill_tuple_temp_2 on relation_tuple  (cost=0.69..643.69 rows=237 width=138) (actual time=0.074..0.074 rows=0 loops=1)
         Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{1,2,3,4,5,6}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
 Planning Time: 34.907 ms
 Execution Time: 0.104 ms
(6 rows)
```
So that's consistent with the ~40ms it takes without EXPLAIN (it's almost all planning time)
j
can you run this?
```sql
select count(*) from relation_tuple group by namespace, relation, object_id, userset_namespace, userset_object_id, userset_relation order by count(*) desc limit 10;
```
maybe on a copy of the DB if you're worried about DB CPU
w
```sql
 count
-------
   952
   707
   707
   707
   707
   706
   692
   692
   681
   287
(10 rows)
```
j
those don't seem unreasonable
we have sort of a degenerate edge case that has 86k copies of the same tuple, and it definitely causes performance problems
your backfill is done now?
maybe try removing that index?
w
> your backfill is done now?
Yes
> maybe try removing that index?
I'll try that in a transaction first
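i.e. roughly this (a sketch; note `DROP INDEX` takes an ACCESS EXCLUSIVE lock on the table until the transaction ends, so I'll keep it short):
```sql
BEGIN;
-- Dropping the index inside a transaction lets me test the plan without it;
-- the ROLLBACK brings it back. This does lock relation_tuple for the
-- duration of the transaction.
DROP INDEX ix_backfill_tuple_temp_2;

EXPLAIN ANALYZE
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, caveat_name, caveat_context
FROM relation_tuple
WHERE created_transaction <= 49851674
  AND (deleted_transaction = 9223372036854775807 OR deleted_transaction > 49851674)
  AND namespace = 'care_recipient'
  AND relation = 'caregiver'
  AND object_id IN ('1', '2', '3', '4', '5', '6')
LIMIT 9223372036854775807;

ROLLBACK;  -- the index comes back untouched
```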
j
the last migration that you ran drops the old primary key indices, maybe something about the shape of your data on the old read path was relying on those bigserial primary keys
seems unlikely...
w
Without my custom index:
```sql
 Limit  (cost=0.69..691.69 rows=237 width=138) (actual time=4.441..4.442 rows=0 loops=1)
   ->  Index Scan using uq_relation_tuple_namespace on relation_tuple  (cost=0.69..691.69 rows=237 width=138) (actual time=4.440..4.440 rows=0 loops=1)
         Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{1,2,3,4,5,6}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
 Planning Time: 37.745 ms
 Execution Time: 4.481 ms
(6 rows)
```
It's pretty much the same. Not sure my custom index was really useful for anything.
j
i don't really have any clue how to debug long planning time
one thing to try is using real care_recipient IDs
also try varying the number of them
w
can do
```
 Limit  (cost=0.69..643.69 rows=237 width=138) (actual time=1.072..5.509 rows=138 loops=1)
   ->  Index Scan using ix_backfill_tuple_temp_2 on relation_tuple  (cost=0.69..643.69 rows=237 width=138) (actual time=1.071..5.495 rows=138 loops=1)
         Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{268ea094-92e4-11eb-a25b-067ff236ecb9,4bc4de79-1e3e-40ce-b58e-c5bd389e52ea,af6f7d21-eb5d-42a0-8b9e-cc55dddfeee0,99c8c11d-f6d2-471d-845b-ed2dbc21e822,90ae53f5-ad48-4929-9a12-b4a6ed8a3d80,6091d46b-4db7-11ea-8a02-06a80bfbb33e}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
 Planning Time: 0.568 ms
 Execution Time: 5.544 ms
```
Planning time went down, but it also goes down if I just rerun the previous EXPLAIN ANALYZE...
j
is that reduction reflected in your overall SLIs?
w
Not really, I'm not seeing any difference in response time
j
that query performance looks more or less like what I would expect
w
me too. I'm not really sure the problem is in the database queries
The pods' CPU usage is much higher than on 1.13.0; that's what looks most suspect to me.
The profile I sent earlier shows a lot of time spent unmarshalling stuff for namespace caching
j
yeah, the cache switched to using serialized copies for better cache costing
but the switch to using vtprotobuf was supposed to offset the performance losses of having to deserialize and reserialize on each request
w
I've tried disabling the ns cache (made things much worse, not unexpected) and raising/lowering the "ns cache max cost" (to no effect) earlier in the day
j
yeah that de/reserialization is in the critical path of the happy path
w
Is there anything I can do to test whether it's indeed this ns cache that's causing issues?
j
not an easy test unfortunately, easiest way to test would probably be to make a build with the PR reverted
if you want to isolate that as a factor
w
Ouch, I'm not confident to do that 😬
esp on the prod system
j
right
the commits don't cleanly revert either
the "easiest" thing to do is probably to give you a downgrade version of the add-xid-constraints migration that will let you move down to v1.13
w
That would be a great start πŸ˜…
I did just find a maybe something
Getting rid of `caveat_name, caveat_context` in the SELECT gives me this plan:
```sql
 Limit  (cost=0.69..205.12 rows=237 width=128) (actual time=0.072..0.072 rows=0 loops=1)
   ->  Index Only Scan using ix_backfill_tuple_temp_2 on relation_tuple  (cost=0.69..205.12 rows=237 width=128) (actual time=0.071..0.071 rows=0 loops=1)
         Index Cond: ((namespace = 'care_recipient'::text) AND (object_id = ANY ('{1,2,3,4,5,6}'::text[])) AND (relation = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
         Heap Fetches: 0
 Planning Time: 0.520 ms
 Execution Time: 0.095 ms
```
i.e. an INDEX ONLY scan.
The caveat columns aren't in the index, so PG needs to do some table reading, which slows down the query a bunch.
That doesn't explain the raised CPU in my pods, but maybe there are 2 distinct things happening.
j
that's super common though, right? I don't think those query times are sustainable
w
I don't get your point sorry? πŸ˜…
My point is that v1.14.1 introduced these caveat columns in the SELECT (I think), which slows down the query as PG can't do an INDEX ONLY anymore. Could explain why I'm seeing higher response time
j
i just mean, to me an index points to a row, if you always need to be able to read everything from the index to have acceptable performance that's probably not sustainable
w
I agree
j
I get what you're saying, but like the caveat context is a json column, I doubt you're going to fit it in your index
w
Yeah
but that's a pretty significant regression from 1.13.0 then
j
if it holds in the general case, I would agree
w
How feasible would it be to have a rollback migration? I'd be happy to move back to 1.13.0 and buy more time to investigate this 😅
j
since the migration is a pretty substantial change, i wonder if doing an ANALYZE on the table would help with planning
have you tried that yet?
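i.e. just (a sketch):
```sql
-- Refresh planner statistics for the table after the migration/backfill churn.
ANALYZE VERBOSE relation_tuple;
```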
w
I did
I can try again πŸ˜…
j
when did you do it last?
w
~7h ago. Just reran it, I'm not seeing any improvement
j
What version of Postgres are you using?
w
13
j
@williamdclt if you add a temp index including the caveat_name, caveat_context (which should both be empty unless you're using caveats right now, which I doubt), do you see a significant performance impact overall?
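Something along these lines, built concurrently so it doesn't block writes (just a sketch; the index name is a placeholder, and the column list covers everything the query filters on and selects so it can be answered index-only):
```sql
-- Temporary covering index so the hot query can be an index-only scan;
-- caveat_name/caveat_context should be empty since caveats aren't in use.
CREATE INDEX CONCURRENTLY ix_tuple_with_caveats_tmp
    ON relation_tuple (namespace, object_id, relation,
                       caveat_name, caveat_context,
                       userset_namespace, userset_object_id, userset_relation,
                       created_transaction, deleted_transaction);
```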
w
I can try
j
if that works, we can look into skipping selection of those columns unless caveats is enabled (for now)
but I'm quite curious to see if that is the primary driver
j
this branch will manually revert the migrations to a place that is compatible with v1.13: https://github.com/authzed/spicedb/tree/revert-migration-constraints
you'll have to make your own build though
j
another question: you don't have any more custom indices, right?
I want to try to rule out modifications you've made, because some of this might even be Postgres vs Aurora
w
> but I'm quite curious to see if that is the primary driver
I would expect the pod CPU to stay high and limit throughput (possibly due to the marshalling/unmarshalling discussed earlier), but we'll see.
> another question: you don't have any more custom indices, right?
No. Even this `ix_backfill_tuple_temp_2` is actually completely redundant with `uq_relation_tuple_namespace`.
Index created (`(namespace, object_id, relation, caveat_name, caveat_context, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction)`). The query now does an INDEX ONLY scan 👍 I'll give it a few mins to see the impact on response time.
j
cool; if that does have a significant impact, we'll come up with at least a workaround tomorrow; to confirm: you're not using caveats right now, right?
w
> you're not using caveats right now, right?
No, we have no plan to use caveats in the near future.
@Jake what would I need to run for this rollback? `spicedb migrate XXX`?
j
`spicedb migrate head` will do it
just make SURE you're using the branch
or `spicedb migrate add-xid-columns` should also do it; I made the destination migration name something that you can start moving forward from again
w
No improvement from the new index :/
I just observed something interesting: the planning time is very high (30-90ms) when I run `EXPLAIN ANALYZE` on my read replica (which is where all the permission checks are going). It's normal (<1ms) when running it on the master.
that's... odd
j
that is very interesting, i guess you should dig into that first before doing anything drastic
w
yeah
Simplest example I can find is:
```sql
explain analyze select deleted_transaction from relation_tuple WHERE (deleted_transaction > 49851674) limit 10
```
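One way to compare just the planner on each node (a sketch; `SUMMARY ON` makes plain `EXPLAIN` print the planning time without executing the query):
```sql
-- Run on both the read replica and the master and compare "Planning Time";
-- the statement itself is not executed.
EXPLAIN (SUMMARY ON)
SELECT deleted_transaction
FROM relation_tuple
WHERE deleted_transaction > 49851674
LIMIT 10;
```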
j
ok, i'm going to go play with my kids, @ me if there's something more I can help with tonight
w
Thank you for your help!
I rolled back to 1.13:
- I ran the migrations manually rather than making my own build; it felt simpler and safer.
- The migration went OK, but my 1.13.0 wasn't getting ready: the health probe checks that the current migration is known, and `add-xid-columns` isn't in 1.13.0. I set it to `add-ns-config-id` for now.
Thank you for your support, I'm going to bed now πŸ˜… I'll have to pick that up with you, we're not in a great place now (our staging is fully on v1.14.0 but prod isn't)
j
we think we have an idea on why the CPU usage is higher
with 1.13, are the queries back to expected performance?
w
Yes
downgrade was at 22:57
j
ok
j
No unexpected behavior today?
w
All good on 1.13