Another question: do I still need to set
# spicedb
w
Another question: do I still need to set
plan_cache_mode = force_custom_plan
myself in postgres? How does SpiceDB handle that now?
v
You don't have to set it anymore, it's set by default, unless you specify
plan_cache_mode
in the DSN https://github.com/authzed/spicedb/blob/main/internal/datastore/postgres/postgres.go#L615-L634
w
I still find it sad that the generic plans are disabled for all queries, when only a single query had an issue. For example, the following query is by far the most commonly queries (~2500 queries/s in my system), and from a quick test it looks like a generic plan speeds it up by >3x (most of the time is spent in ). That really adds up, and I have some hope that it might free up a bunch of CPU which is my bottleneck ATM
Copy code
sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, caveat_name, caveat_context FROM relation_tuple WHERE pg_visible_in_snapshot(created_xid, $1) = $2 AND pg_visible_in_snapshot(deleted_xid, $3) = $4 AND namespace = $5 AND relation = $6 AND object_id IN ($7) AND ((userset_namespace = $8 AND userset_object_id IN ($9) AND userset_relation = $10)) LIMIT 9223372036854775807
See the "planning time" in these 2 EXPLAIN ANALYZE:
With `plan_cache_mode = force_custom_plan`:
Copy code
sql
 Limit  (cost=0.69..8.72 rows=1 width=138) (actual time=0.033..0.034 rows=0 loops=1)
   ->  Index Scan using uq_relation_tuple_living_xid on relation_tuple  (cost=0.69..8.72 rows=1 width=138) (actual time=0.033..0.033 rows=0 loops=1)
         Index Cond: (((namespace)::text = 'care_recipient'::text) AND ((object_id)::text = '(268ea094-92e4-11eb-a25b-067ff236ecb9,af6f7d21-eb5d-42a0-8b9e-cc55dddfeee0,99c8c11d-f6d2-471d-845b-ed2dbc21e822,90ae53f5-ad48-4929-9a12-b4a6ed8a3d80,9f7a7d6d-9b6a-4b2e-8d33-dfbea8e4c59f)'::text) AND ((relation)::text = 'caregiver'::text) AND ((userset_namespace)::text = 'user_agency_role'::text) AND ((userset_object_id)::text = '(e3108ff2-4041-4287-972e-9fb5ca6d8cc7__cd79fa38-b6e8-4452-a14b-7f67141d0489,e3108ff2-4041-4287-972e-9fb5ca6d8cc7__cd79fa38-b6e8-4452-a14b-7f67141d0488,e3108ff2-4041-4287-972e-9fb5ca6d8cc7__cd79fa38-b6e8-4452-a14b-7f67141d0481)'::text) AND ((userset_relation)::text = '...'::text))
         Filter: (pg_visible_in_snapshot(created_xid, '795:799:795,797'::pg_snapshot) AND (NOT pg_visible_in_snapshot(deleted_xid, '795:799:795,797'::pg_snapshot)))
 Planning Time: 0.204 ms
 Execution Time: 0.057 ms
(6 rows)
With `plan_cache_mode = auto`:
Copy code
sql
 Limit  (cost=0.56..8.60 rows=1 width=138) (actual time=0.040..0.040 rows=0 loops=1)
   ->  Index Scan using ix_relation_tuple_by_subject on relation_tuple  (cost=0.56..8.60 rows=1 width=138) (actual time=0.039..0.039 rows=0 loops=1)
         Index Cond: (((userset_object_id)::text = $9) AND ((userset_namespace)::text = $8) AND ((userset_relation)::text = $10) AND ((namespace)::text = $5) AND ((relation)::text = $6))
         Filter: (((object_id)::text = $7) AND (pg_visible_in_snapshot(created_xid, $1) = $2) AND (pg_visible_in_snapshot(deleted_xid, $3) = $4))
 Planning Time: 0.055 ms
 Execution Time: 0.079 ms
(6 rows)
interestingly they don't use the same index actually 🤔
v
It's tricky to get all of the access patterns fully optimized, given all the variants SpiceDB exercises. I'm sure there are opportunities to make things better, but what works for your database may not work in another database instance, depending on the dataset.
w
I'd say that PG is likely to know better how to optimise a given query? It obviously wasn't true for the specific bug we hit, but it was the exception rather than the rule
v
You can enable the generic query plan via the URI right?
w
But then that would also enable generic query plans for the query that we know to not handle these well (the
SELECT MIN(id), MAX(id) from relation_tuple_transaction ...
), and basically kill my spicedb
v
Ie need to check if that query is still being used in the codebase
j
you should be able to override the param and just check to see if it still impacts
v
the query is no longer SELECT MIN, that predates the redesign of the MVCC in postgres: https://github.com/authzed/spicedb/blob/6390b785228c2d334fc1fdd7271e1fd24d6874e3/internal/datastore/postgres/revisions.go#L35-L43 which version are you using?
w
I'm on v1.27.0, will soon upgrade. It's totally possible that this plan_cache_mode discussion isn't relevant anymore! I'll try explicitely setting it to auto
j
yeah, we'd be quite curious to know
if we can remove it, wonderful
if not, we can discuss making it per query
w
Hm, I'm not sure how to set it actually! I'm using a postgresql URI (
--datastore-conn-uri="postgresql://user:password@mydb:5432/spicedb"
), which doesn't accept the
plan_cache_mode
setting as parameter AFAIK, only these params https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS
j
that's odd... that's how it worked before I thought
isn't that how you set it originally?
w
You're right it does work.
psql
can't deal with it but
pgx
(or whatever handles that) does
the bad news is that the problem is still very much there 😢 not sure if it's exactly the same problem, seems like this time it's this query that causes a massive CPU consumption of the DB:
Copy code
sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation, caveat_name, caveat_context FROM relation_tuple WHERE pg_visible_in_snapshot(created_xid, $1) = $2 AND pg_visible_in_snapshot(deleted_xid, $3) = $4 AND namespace = $5 AND relation = $6 AND object_id IN ($7) AND ((userset_namespace = $8 AND userset_object_id IN ($9) AND userset_relation = $10)) LIMIT ?
https://cdn.discordapp.com/attachments/1234843902091984917/1235271657724317759/image.png?ex=6633c3e1&is=66327261&hm=39e40c3b27faeca285bdd0da0723363bac17c084dd24df3393fdc19e24743269&
(it goes back to normal at ~17:40 because I rolled back)
j
darn
I was afraid of that
the output above shows it helps for specific queries?
wait... that query looks like the one above that you showed was helped by moving to
auto
now I'm quite confused
v
that looks like a normal tuple query, not the same
min(), max()
query, which makes sense because all that change a while back after the MVCC for postgres was redesigned. If you set debug level, you should see the parameters used to execute the query, so you can replay it with an explain.
it could also be obtained for tracing, but it's disabled by default: https://github.com/authzed/spicedb/blob/5a7c1c2623543f7cfdbf912e91c3864d847918c2/internal/datastore/postgres/common/pgx.go#L193-L195. We could potentially turn it into a parameter.
w
> wait... that query looks like the one above that you showed was helped by moving to auto It's the query I hoped would be helped by moving to auto 😓 the EXPLAIN suggested that it would, seems like it's wrong
28 Views