williamdclt
11/27/2022, 12:02 PMadd-xid-constraints
migration, running write-both-read-new
, and I'm observing much higher response time. Initially the response time shot through the roof (>1s P95), it seemed to be a pod CPU issue rather than a database issue, so I mitigated by doubling the # of replicas. It brought back the response time within usable range, but it's still much higher than it was on 1.13.0: my P50 is 3x higher (~15ms) and my P95 is 2x higher (130ms).
I'm not seeing any particular pressure on the database, and the CPU usage of my pods is way higher than it used to be (>0.5, used to be ~0.1), so I assume something is burning CPU that wasn't before.
I have this problem with write-both-read-new
but also write-both-read-old
.
I have traces but I'm not getting any insight from them, maybe you do?
I'm concerned about whether my cluster will be able to handle the load on Monday: I'm already breaching my SLOs with these response times on a Sunday where we have low traffic π°add-xid-constraints
a bit before midnight
- it finished around 4am
- around 7am, we see response time going wild, which correlate to our traffic
- A bit before 11am, I added more replicas to handle the loadJake
11/27/2022, 2:46 PMwilliamdclt
11/27/2022, 5:54 PMkubectl rollout restart deployment
seems to calm things down, but still to higher-than-before response timesadd-xid-constraints
, I expect writes will fail :/Jake
11/27/2022, 5:59 PMwilliamdclt
11/27/2022, 6:03 PMJake
11/27/2022, 6:05 PMwilliamdclt
11/27/2022, 6:08 PMread-old
(at the timestamp my cursor is at), it is maybe slightly better but not as fast as beforeJake
11/27/2022, 6:11 PMwilliamdclt
11/27/2022, 6:15 PMsql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, caveat_name, caveat_context FROM relation_tuple WHERE created_transaction <= $1 AND (deleted_transaction = $2 OR deleted_transaction > $3) AND namespace = $4 AND relation = $5 AND object_id IN ($6) LIMIT 9223372036854775807
I'll try to EXPLAIN it with my best guess of parameterssql
Limit (cost=0.69..643.69 rows=237 width=138) (actual time=5.947..5.948 rows=0 loops=1)
-> Index Scan using ix_backfill_tuple_temp_2 on relation_tuple (cost=0.69..643.69 rows=237 width=138) (actual time=5.947..5.947 rows=0 loops=1)
Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{1,2,3,4,5,6}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
Planning Time: 81.407 ms
Execution Time: 5.975 ms
ix_backfill_tuple_temp_2
which is an index I created to help the backfill of xidJake
11/27/2022, 6:18 PMwilliamdclt
11/27/2022, 6:18 PMpsql
on a spicedb pod, using the same connection string as SpiceDB so... it's as representative of the real runtime as I can imagineJake
11/27/2022, 6:20 PMwilliamdclt
11/27/2022, 6:24 PMEXPLAIN ANALYZE
gives me
sql
Limit (cost=0.69..643.69 rows=237 width=138) (actual time=0.075..0.075 rows=0 loops=1)
-> Index Scan using ix_backfill_tuple_temp_2 on relation_tuple (cost=0.69..643.69 rows=237 width=138) (actual time=0.074..0.074 rows=0 loops=1)
Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{1,2,3,4,5,6}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
Planning Time: 34.907 ms
Execution Time: 0.104 ms
(6 rows)
So that's consistent with 40ms execution timeJake
11/27/2022, 6:26 PMselect count(*) from relation_tuple group by namespace, relation, object_id, userset_namespace, userset_object_id, userset_relation order by count(*) desc limit 10;
williamdclt
11/27/2022, 6:32 PMsql
count
-------
952
707
707
707
707
706
692
692
681
287
(10 rows)
Jake
11/27/2022, 6:32 PMwilliamdclt
11/27/2022, 6:34 PMJake
11/27/2022, 6:42 PMwilliamdclt
11/27/2022, 6:45 PMsql
Limit (cost=0.69..691.69 rows=237 width=138) (actual time=4.441..4.442 rows=0 loops=1)
-> Index Scan using uq_relation_tuple_namespace on relation_tuple (cost=0.69..691.69 rows=237 width=138) (actual time=4.440..4.440 rows=0 loops=1)
Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{1,2,3,4,5,6}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
Planning Time: 37.745 ms
Execution Time: 4.481 ms
(6 rows)
Jake
11/27/2022, 6:46 PMwilliamdclt
11/27/2022, 6:50 PMLimit (cost=0.69..643.69 rows=237 width=138) (actual time=1.072..5.509 rows=138 loops=1)
-> Index Scan using ix_backfill_tuple_temp_2 on relation_tuple (cost=0.69..643.69 rows=237 width=138) (actual time=1.071..5.495 rows=138 loops=1)
Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = ANY ('{268ea094-92e4-11eb-a25b-067ff236ecb9,4bc4de79-1e3e-40ce-b58e-c5bd389e52ea,af6f7d21-eb5d-42a0-8b9e-cc55dddfeee0,99c8c11d-f6d2-471d-845b-ed2dbc21e822,90ae53f5-ad48-4929-9a12-b4a6ed8a3d80,6091d46b-4db7-11ea-8a02-06a80bfbb33e}'::text[])) AND ((relation)::text = 'caregiver'::text) AND (created_transaction <= 49851674))
Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
Planning Time: 0.568 ms
Execution Time: 5.544 ms
Jake
11/27/2022, 6:54 PMwilliamdclt
11/27/2022, 6:54 PMJake
11/27/2022, 6:55 PMwilliamdclt
11/27/2022, 6:55 PMJake
11/27/2022, 6:58 PMwilliamdclt
11/27/2022, 6:59 PMJake
11/27/2022, 7:00 PMwilliamdclt
11/27/2022, 7:01 PMJake
11/27/2022, 7:02 PMwilliamdclt
11/27/2022, 7:04 PMJake
11/27/2022, 7:04 PMwilliamdclt
11/27/2022, 7:09 PMcaveat_name, caveat_context
in the SELECT gives me this plan:
sql
Limit (cost=0.69..205.12 rows=237 width=128) (actual time=0.072..0.072 rows=0 loops=1)
-> Index Only Scan using ix_backfill_tuple_temp_2 on relation_tuple (cost=0.69..205.12 rows=237 width=128) (actual time=0.071..0.071 rows=0 loops=1)
Index Cond: ((namespace = 'care_recipient'::text) AND (object_id = ANY ('{1,2,3,4,5,6}'::text[])) AND (relation = 'caregiver'::text) AND (created_transaction <= 49851674))
Filter: ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 49851674))
Heap Fetches: 0
Planning Time: 0.520 ms
Execution Time: 0.095 ms
Jake
11/27/2022, 7:12 PMwilliamdclt
11/27/2022, 7:12 PMJake
11/27/2022, 7:13 PMwilliamdclt
11/27/2022, 7:14 PMJake
11/27/2022, 7:14 PMwilliamdclt
11/27/2022, 7:16 PMJake
11/27/2022, 7:17 PMwilliamdclt
11/27/2022, 7:23 PMjzelinskie
11/27/2022, 7:31 PMwilliamdclt
11/27/2022, 7:32 PMjzelinskie
11/27/2022, 7:32 PMwilliamdclt
11/27/2022, 7:33 PMjzelinskie
11/27/2022, 7:42 PMwilliamdclt
11/27/2022, 7:42 PMJoey
11/27/2022, 7:52 PMwilliamdclt
11/27/2022, 7:56 PMJoey
11/27/2022, 7:57 PMJake
11/27/2022, 7:58 PMjzelinskie
11/27/2022, 8:00 PMwilliamdclt
11/27/2022, 8:02 PMix_backfill_tuple_temp_2
is completely redundant with uq_relation_tuple_namespace
actually(namespace, object_id, relation, caveat_name, caveat_context, userset_namespace, userset_object_id, userset_relation, created_transaction, deleted_transaction)
).
The query now does an INDEX ONLY
π
I'll give it a few mins to see the impact on response timeJoey
11/27/2022, 8:12 PMwilliamdclt
11/27/2022, 8:13 PMspicedb migrate XXX
?Jake
11/27/2022, 8:16 PMspicedb migrate add-xid-columns
should also do it, I made the destination migration name something that you can start moving forward from againwilliamdclt
11/27/2022, 8:18 PMplanning time
is very high (30-90ms) when I run EXPLAIN ANALYZE
on my read replica (which is where all the permission checks are going). It's normal (<1ms) when running it on the masterJoey
11/27/2022, 8:20 PMJake
11/27/2022, 8:21 PMwilliamdclt
11/27/2022, 8:21 PMexplain analyze select deleted_transaction from relation_tuple WHERE (deleted_transaction > 49851674) limit 10
Jake
11/27/2022, 8:52 PMwilliamdclt
11/27/2022, 8:53 PMadd-xid-columns
isn't in 1.13.0. I set it to add-ns-config-id
for nowJoey
11/27/2022, 11:01 PMwilliamdclt
11/27/2022, 11:03 PMJoey
11/27/2022, 11:04 PMjzelinskie
11/28/2022, 4:11 PMwilliamdclt
11/28/2022, 4:26 PM