williamdclt

11/27/2022, 12:02 PM
I'm having serious issues trying to upgrade to 1.14.1 on Postgres (manual, no-downtime upgrade process). I've run the `add-xid-constraints` migration, I'm running in `write-both-read-new`, and I'm observing much higher response times. Initially the response time shot through the roof (>1s P95); it seemed to be a pod CPU issue rather than a database issue, so I mitigated by doubling the number of replicas. That brought response times back into a usable range, but they're still much higher than on 1.13.0: my P50 is 3x higher (~15ms) and my P95 is 2x higher (130ms). I'm not seeing any particular pressure on the database, and the CPU usage of my pods is way higher than it used to be (>0.5, versus ~0.1 before), so I assume something is burning CPU that wasn't before. I have this problem with `write-both-read-new` but also with `write-both-read-old`. I have traces but I'm not getting any insight from them; maybe you will? I'm concerned about whether my cluster will be able to handle the load on Monday: I'm already breaching my SLOs with these response times on a Sunday, when we have low traffic 😰
Note: I don't use the dispatch cluster
Response times:
- I started the `add-xid-constraints` migration a bit before midnight; it finished around 4am
- Around 7am, response times go wild, which correlates with our traffic
- A bit before 11am, I added more replicas to handle the load
CPU usage over the same period
Here's a CPU profile over 30s
Jake

11/27/2022, 2:46 PM
Can you get a trace from a 95th percentile request?
You should also be able to safely roll back your pods to write-both-read-old; does that improve the situation?
Also, do you have Postgres GC turned on, and do you frequently reuse or recreate relationships using TOUCH operations?
williamdclt

11/27/2022, 5:54 PM
> Can you get a trace from a 95th percentile request?
Should be able to, I'll try to get that.
> do you have Postgres GC turned on, and do you frequently reuse or recreate relationships using TOUCH operations?
The SpiceDB GC, you mean? Yes and yes, we use TOUCH a lot.
> You should also be able to safely roll back your pods to write both read old, does that improve the situation.
Will try, but I don't think so. I started seeing higher response times as soon as I upgraded to 1.14.1, with write-both-read-old. I didn't think much of it at the time as it wasn't dramatic; it was late at night with low traffic.
Also seeing memory increasing a lot, correlating with response times getting higher and higher. A `kubectl rollout restart deployment` seems to calm things down, but still only back to higher-than-before response times.
I'd love to roll back to 1.13.0 to get back to a stable situation, but it's not so easy after `add-xid-constraints`; I expect writes would fail :/
Jake

11/27/2022, 5:59 PM
If you're running write-both-read-old, the read path should be the same as v1.13
I would be interested to see what is consuming more CPU in that configuration
Using TOUCH by itself is ok, even exclusively. There is an issue when you have too many copies of the same relationship generated by touch events
But that wouldn’t show up with write both read old
williamdclt

11/27/2022, 6:03 PM
Here's a trace for a slow query
Jake

11/27/2022, 6:05 PM
Dominated by queryrelationships
Can you catch one of the slow queries in flight and run explain on it?
williamdclt

11/27/2022, 6:08 PM
Switched to `read-old` (at the timestamp my cursor is at); it is maybe slightly better, but not as fast as before.
> Can you catch one of the slow queries in flight and run explain on it?
I don't think I can; I don't have anything logging queries in flight. I only have the "generic" queries with parameter placeholders.
Unless SpiceDB logs that with LOG_LEVEL=debug?
Jake

11/27/2022, 6:11 PM
You will get them with log level debug or you can just run psql to dump running queries a few times by hand until you see one.
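(For reference, one way to catch in-flight statements from psql is to poll pg_stat_activity a few times; the filter on relation_tuple below is just an illustrative way to narrow the output down:)

```sql
-- Show currently-executing statements touching relation_tuple (illustrative filter)
SELECT pid, now() - query_start AS running_for, query
FROM pg_stat_activity
WHERE state = 'active'
  AND query ILIKE '%relation_tuple%'
ORDER BY query_start;
```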
williamdclt

11/27/2022, 6:15 PM
I only get the generic plan with these 2 approaches:
```sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, caveat_name, caveat_context FROM relation_tuple WHERE created_transaction <= $1 AND (deleted_transaction = $2 OR deleted_transaction > $3) AND namespace = $4 AND relation = $5 AND object_id IN ($6) LIMIT 9223372036854775807
```
I'll try to EXPLAIN it with my best guess of parameters
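(Filled in with guessed values, the statement would look roughly like the following; the values mirror the Index Cond and Filter in the plan output below:)

```sql
-- Generic statement above with guessed parameter values substituted in
EXPLAIN ANALYZE
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, caveat_name, caveat_context
FROM relation_tuple
WHERE created_transaction <= 49851674
  AND (deleted_transaction = 9223372036854775807 OR deleted_transaction > 49851674)
  AND namespace = 'care_recipient'
  AND relation = 'caregiver'
  AND object_id IN ('1', '2', '3', '4', '5', '6')
LIMIT 9223372036854775807;
```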
```sql
 Limit  (cost=0.69..643.69 rows=237 width=138) (actual time=5.947..5.948 rows=0 loops=1)
   ->  Index Scan using ix_backfill_tuple_temp_2 on relation_tuple  (cost=0.69..643.69 rows=237 width=138) (actual time=5.947..5.947 rows=0 loops=1)
         Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{1,2,3,4,5,6}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
 Planning Time: 81.407 ms
 Execution Time: 5.975 ms
```
Interesting, it's using `ix_backfill_tuple_temp_2`, which is an index I created to help the backfill of xids
Jake

11/27/2022, 6:18 PM
Is that planning time actually representative of the real runtime? It seems crazy
williamdclt

11/27/2022, 6:18 PM
I agree
I got this plan by running `psql` on a SpiceDB pod, using the same connection string as SpiceDB, so... it's as representative of the real runtime as I can imagine
Jake

11/27/2022, 6:20 PM
But I mean, if you run that same query without EXPLAIN, does it take 90ms?
williamdclt

11/27/2022, 6:24 PM
It takes ~40ms.
Rerunning the same `EXPLAIN ANALYZE` gives me:
```sql
 Limit  (cost=0.69..643.69 rows=237 width=138) (actual time=0.075..0.075 rows=0 loops=1)
   ->  Index Scan using ix_backfill_tuple_temp_2 on relation_tuple  (cost=0.69..643.69 rows=237 width=138) (actual time=0.074..0.074 rows=0 loops=1)
         Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{1,2,3,4,5,6}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
 Planning Time: 34.907 ms
 Execution Time: 0.104 ms
(6 rows)
```
So that's consistent with the ~40ms I measured; it's almost entirely planning time
Jake

11/27/2022, 6:26 PM
can you run this?
```sql
select count(*) from relation_tuple group by namespace, relation, object_id, userset_namespace, userset_object_id, userset_relation order by count(*) desc limit 10;
```
maybe on a copy of the DB if you're worried about DB CPU
williamdclt

11/27/2022, 6:32 PM
```sql
 count
-------
   952
   707
   707
   707
   707
   706
   692
   692
   681
   287
(10 rows)
```
Jake

11/27/2022, 6:32 PM
those don't seem unreasonable
we have sort of a degenerate edge case that has 86k copies of the same tuple, and it definitely causes performance problems
your backfill is done now?
maybe try removing that index?
williamdclt

11/27/2022, 6:34 PM
> your backfill is done now?
Yes
> maybe try removing that index?
I'll try that in a transaction first
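(A sketch of what trying it in a transaction can look like; note that DROP INDEX inside a transaction takes an ACCESS EXCLUSIVE lock on the table until COMMIT/ROLLBACK, so keep the transaction short on a busy database:)

```sql
BEGIN;
DROP INDEX ix_backfill_tuple_temp_2;  -- effect is visible only inside this transaction
EXPLAIN ANALYZE
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, caveat_name, caveat_context
FROM relation_tuple
WHERE created_transaction <= 49851674
  AND (deleted_transaction = 9223372036854775807 OR deleted_transaction > 49851674)
  AND namespace = 'care_recipient' AND relation = 'caregiver'
  AND object_id IN ('1', '2', '3', '4', '5', '6')
LIMIT 9223372036854775807;
ROLLBACK;  -- the index is kept
```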
Jake

11/27/2022, 6:42 PM
the last migration that you ran drops the old primary key indices, maybe something about the shape of your data on the old read path was relying on those bigserial primary keys
seems unlikely...
w

williamdclt

11/27/2022, 6:45 PM
Without my custom index:
```sql
 Limit  (cost=0.69..691.69 rows=237 width=138) (actual time=4.441..4.442 rows=0 loops=1)
   ->  Index Scan using uq_relation_tuple_namespace on relation_tuple  (cost=0.69..691.69 rows=237 width=138) (actual time=4.440..4.440 rows=0 loops=1)
         Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{1,2,3,4,5,6}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
 Planning Time: 37.745 ms
 Execution Time: 4.481 ms
(6 rows)
```
It's pretty much the same. Not sure my custom index was really useful for anything
Jake

11/27/2022, 6:46 PM
i don't really have any clue how to debug long planning time
one thing to try is using real care_recipient IDs
also try varying the number of them
williamdclt

11/27/2022, 6:50 PM
can do
```sql
 Limit  (cost=0.69..643.69 rows=237 width=138) (actual time=1.072..5.509 rows=138 loops=1)
   ->  Index Scan using ix_backfill_tuple_temp_2 on relation_tuple  (cost=0.69..643.69 rows=237 width=138) (actual time=1.071..5.495 rows=138 loops=1)
         Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{268ea094-92e4-11eb-a25b-067ff236ecb9,4bc4de79-1e3e-40ce-b58e-c5bd389e52ea,af6f7d21-eb5d-42a0-8b9e-cc55dddfeee0,99c8c11d-f6d2-471d-845b-ed2dbc21e822,90ae53f5-ad48-4929-9a12-b4a6ed8a3d80,6091d46b-4db7-11ea-8a02-06a80bfbb33e}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
 Planning Time: 0.568 ms
 Execution Time: 5.544 ms
```
Planning time went down, but it also went down when I reran the previous EXPLAIN ANALYZE...
Jake

11/27/2022, 6:54 PM
is that reduction reflected in your overall SLIs?
w

williamdclt

11/27/2022, 6:54 PM
Not really, I'm not seeing any difference in response time
Jake

11/27/2022, 6:55 PM
that query performance looks more or less like what I would expect
williamdclt

11/27/2022, 6:55 PM
me too. I'm not really sure the problem is in the database queries
The pods' CPU usage is much higher than on 1.13.0; that's what looks most suspect to me
The profile I sent earlier shows a lot of time spent unmarshalling stuff for namespace caching
Jake

11/27/2022, 6:58 PM
yeah, the cache switched to using serialized copies for better cache costing
but the switch to using vtprotobuf was supposed to offset the performance losses of having to deserialize and reserialize on each request
williamdclt

11/27/2022, 6:59 PM
I've tried disabling the ns cache (made things much worse, not unexpected) and raising/lowering the "ns cache max cost" (to no effect) earlier in the day
j

Jake

11/27/2022, 7:00 PM
yeah that de/reserialization is in the critical path of the happy path
williamdclt

11/27/2022, 7:01 PM
Is there anything I can do to test whether it's indeed this ns cache that's causing issues?
Jake

11/27/2022, 7:02 PM
Not an easy test unfortunately; the easiest way would probably be to make a build with the PR reverted
if you want to isolate that as a factor
williamdclt

11/27/2022, 7:04 PM
Ouch, I'm not confident doing that 😬
especially on the prod system
Jake

11/27/2022, 7:04 PM
right
the commits don't cleanly revert either
the "easiest" thing to do is probably to give you a downgrade version of the add-xid-constraints migration that will let you move down to v1.13
williamdclt

11/27/2022, 7:09 PM
That would be a great start πŸ˜…
I did just find a maybe-something: getting rid of `caveat_name, caveat_context` in the SELECT gives me this plan:
```sql
 Limit  (cost=0.69..205.12 rows=237 width=128) (actual time=0.072..0.072 rows=0 loops=1)
   ->  Index Only Scan using ix_backfill_tuple_temp_2 on relation_tuple  (cost=0.69..205.12 rows=237 width=128) (actual time=0.071..0.071 rows=0 loops=1)
         Index Cond: ((namespace = 'care_recipient'::text) AND (object_id = ANY ('{1,2,3,4,5,6}'::text[])) AND (relation = 'caregiver'::text) AND (created_transaction <= 49851674))
         Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
         Heap Fetches: 0
 Planning Time: 0.520 ms
 Execution Time: 0.095 ms
```
i.e. an Index Only Scan.
The caveat columns aren't in the index, so PG needs to do some reading from the table heap, which slows down the query a bunch.
That doesn't explain the raised CPU in my pods, but maybe there are 2 distinct things happening
Jake

11/27/2022, 7:12 PM
that's super common though, right? I don't think those query times are sustainable
w

williamdclt

11/27/2022, 7:12 PM
I don't get your point sorry? πŸ˜…
My point is that v1.14.1 introduced these caveat columns in the SELECT (I think), which slows down the query, as PG can't do an Index Only Scan anymore. That could explain why I'm seeing higher response times
Jake

11/27/2022, 7:13 PM
I just mean: to me, an index points to a row. If you always need to be able to read everything from the index to get acceptable performance, that's probably not sustainable
williamdclt

11/27/2022, 7:14 PM
I agree
Jake

11/27/2022, 7:14 PM
I get what you're saying, but like the caveat context is a json column, I doubt you're going to fit it in your index
williamdclt

11/27/2022, 7:16 PM
Yeah
but that's a pretty significant regression from 1.13.0 then
Jake

11/27/2022, 7:17 PM
if it holds in the general case, I would agree
williamdclt

11/27/2022, 7:23 PM
How feasible would it be to have a rollback migration? I'd be happy to move back to 1.13.0 and buy more time to investigate this 😅
jzelinskie

11/27/2022, 7:31 PM
since the migration is a pretty substantial change, i wonder if doing an ANALYZE on the table would help with planning
have you tried that yet?
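(i.e. something along these lines, to refresh the planner statistics for the table:)

```sql
-- Recompute planner statistics for the tuple table; VERBOSE just prints progress
ANALYZE VERBOSE relation_tuple;
```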
williamdclt

11/27/2022, 7:32 PM
I did
I can try again πŸ˜…
jzelinskie

11/27/2022, 7:32 PM
when did you do it last?
williamdclt

11/27/2022, 7:33 PM
~7h ago. Just reran it, I'm not seeing any improvement
jzelinskie

11/27/2022, 7:42 PM
What version of Postgres are you using?
williamdclt

11/27/2022, 7:42 PM
13
Joey

11/27/2022, 7:52 PM
@williamdclt if you add a temp index including the caveat_name, caveat_context (which should both be empty unless you're using caveats right now, which I doubt), do you see a significant performance impact overall?
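(A hypothetical sketch of such a temp index; the index name is made up, and since caveat_context is a JSON-typed column it would normally go into the INCLUDE list of a covering index rather than the btree key columns:)

```sql
-- Illustrative covering index so caveat_name/caveat_context can be served from the index
-- (name and exact column layout are assumptions, not taken from this thread)
CREATE INDEX CONCURRENTLY ix_relation_tuple_covering_tmp
ON relation_tuple (namespace, object_id, relation,
                   userset_namespace, userset_object_id, userset_relation,
                   created_transaction, deleted_transaction)
INCLUDE (caveat_name, caveat_context);
```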
williamdclt

11/27/2022, 7:56 PM
I can try
Joey

11/27/2022, 7:57 PM
if that works, we can look into skipping selection of those columns unless caveats is enabled (for now)
but I'm quite curious to see if that is the primary driver
Jake

11/27/2022, 7:58 PM
this branch will manually revert the migrations to a place that is compatible with v1.13: https://github.com/authzed/spicedb/tree/revert-migration-constraints
you'll have to make your own build though
jzelinskie

11/27/2022, 8:00 PM
another question: you don't have any more custom indices, right?
I want to try to rule out modifications you've made, because some of this might even be Postgres vs Aurora
williamdclt

11/27/2022, 8:02 PM
> but I'm quite curious to see if that is the primary driver
I would expect that the pod CPU will stay high and limit throughput (possibly due to the marshalling/unmarshalling discussed earlier), but we'll see.
> another question: you don't have any more custom indices, right?
No. Even this `ix_backfill_tuple_temp_2` is actually completely redundant with `uq_relation_tuple_namespace`.
Index created, on `(namespace, object_id, relation, caveat_name, caveat_context, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction)`. The query now does an Index Only Scan 👍 I'll give it a few minutes to see the impact on response time
Joey

11/27/2022, 8:12 PM
cool; if that does have a significant impact, we'll come up with at least a workaround tomorrow; to confirm: you're not using caveats right now, right?
williamdclt

11/27/2022, 8:13 PM
> you're not using caveats right now, right?
No, we have no plans to use caveats in the near future.
@Jake what would I need to run for this rollback? `spicedb migrate XXX`?
Jake

11/27/2022, 8:16 PM
`spicedb migrate head` will do it; just make SURE you're using the branch.
Or `spicedb migrate add-xid-columns` should also do it; I made the destination migration name something that you can start moving forward from again
williamdclt

11/27/2022, 8:18 PM
No improvement from the new index :/
I just observed something interesting: the planning time is very high (30-90ms) when I run `EXPLAIN ANALYZE` on my read replica (which is where all the permission checks go). It's normal (<1ms) when running it on the master.
Joey

11/27/2022, 8:20 PM
that's... odd
Jake

11/27/2022, 8:21 PM
that is very interesting, i guess you should dig into that first before doing anything drastic
williamdclt

11/27/2022, 8:21 PM
yeah
Simplest example I can find is:
```sql
explain analyze select deleted_transaction from relation_tuple WHERE (deleted_transaction > 49851674) limit 10
```
Jake

11/27/2022, 8:52 PM
ok, i'm going to go play with my kids, @ me if there's something more I can help with tonight
williamdclt

11/27/2022, 8:53 PM
Thank you for your help!
I rolled back to 1.13:
- I ran the migrations manually rather than making my own build; it felt simpler and safer
- The migration went OK, but my 1.13.0 pods weren't becoming ready. The health probe checks that the current migration is known, and `add-xid-columns` isn't known to 1.13.0, so I set it to `add-ns-config-id` for now
Thank you for your support, I'm going to bed now 😅 I'll have to pick this up with you; we're not in a great place right now (our staging is fully on v1.14.0 but prod isn't)
Joey

11/27/2022, 11:01 PM
we think we have an idea on why the CPU usage is higher
with 1.13, are the queries back to expected performance?
williamdclt

11/27/2022, 11:03 PM
Yes
downgrade was at 22:57
Joey

11/27/2022, 11:04 PM
ok
jzelinskie

11/28/2022, 4:11 PM
No unexpected behavior today?
williamdclt

11/28/2022, 4:26 PM
All good on 1.13