# spicedb
w
👋 I've just made a `schema write` where I removed 3 permissions. I was expecting this to be instant, but it took ~3min, during which I observed a huge increase in load on my (Postgres) database due to this query:
```sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id, userset_relation
FROM relation_tuple
WHERE created_transaction <= $1
  AND (deleted_transaction = $2 OR deleted_transaction > $3)
  AND namespace = $4
  AND relation = $5
LIMIT ?
```
How come removing a permission requires a DB query? And how come it is so expensive? My DB CPU jumped from ~10% to 50-60% for 3 minutes. This was my staging environment; I'm concerned about deploying this to production :/
j
When you say 3 minutes -- was it 3 minutes until clients saw the change, or 3 minutes for the `schema write` to return?
w
The latter: `zed schema write` took 3 minutes to return
j
what version of spicedb? just to be sure
also have you configured `--datastore-revision-quantization-interval` to be something other than the default?
j
@williamdclt three permissions or three relations?
w
permissions
> what version of spicedb? just to be sure

1.7.1

> also have you configured `--datastore-revision-quantization-interval` to be something other than the default?

it's explicitly at 5s
j
The reason that query is run is to ensure that removing a relation doesn't leave dangling data on it
We probably need to make it skip for permissions, since they cannot have data now
As for why the query takes so long, that is a good question, especially since it is `LIMIT 1`
I imagine writes were occurring at the same time?
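
For context, that check boils down to an existence probe over the live tuples of the relation being removed (a single row coming back presumably means the delete is rejected). A rough sketch, with illustrative values matching the EXPLAIN further down:

```sql
-- Does any live tuple still use the relation being removed?
-- 9223372036854775807 (max int64) appears to be the "never deleted" sentinel;
-- 12666087 stands in for the write's snapshot transaction.
SELECT namespace, object_id, relation,
       userset_namespace, userset_object_id, userset_relation
FROM relation_tuple
WHERE created_transaction <= 12666087
  AND (deleted_transaction = 9223372036854775807   -- row is still live
       OR deleted_transaction > 12666087)          -- or deleted after the snapshot
  AND namespace = 'agency'
  AND relation = 'caregiver_or_manager'
LIMIT 1;  -- one row is enough to prove the relation still has data
```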
j
ah yeah it's definitely the sanity check
There have been some pretty big changes/optimizations to the pg datastore since 1.7.1 which makes the perf harder to reason about in retrospect
@Joey should I create an issue for skipping permissions in the sanity check?
j
yep
I'll address that later today, since it is a small fix
w
> I imagine writes were occurring at the same time?

Yes, although not many. Thanks for the fix guys 🙂 appreciate it!
j
It should be a simple one to backport, but I'd suggest upgrading when the fix goes out
There are lots of performance fixes between 1.7.1 and 1.10.0/HEAD
j
@williamdclt could you do me a favor if you get a moment and run an `EXPLAIN` on your Postgres for that query?
I'd be curious to see what its doing on your install
w
```sql
Limit  (cost=0.69..4677.70 rows=1 width=129) (actual time=56.761..56.762 rows=0 loops=1)
  ->  Index Scan using uq_relation_tuple_living on relation_tuple  (cost=0.69..135634.24 rows=29 width=129) (actual time=56.760..56.760 rows=0 loops=1)
        Index Cond: (((namespace)::text = 'agency'::text) AND ((relation)::text = 'caregiver_or_manager'::text))
        Filter: ((created_transaction <= 12666087) AND ((deleted_transaction = '9223372036854775807'::bigint) OR (deleted_transaction > 12666087)))
Planning Time: 0.410 ms
Execution Time: 56.790 ms
```
This is with fairly random params though:
```sql
EXPLAIN ANALYZE SELECT
    namespace,
    object_id,
    relation,
    userset_namespace,
    userset_object_id,
    userset_relation
FROM
    relation_tuple
WHERE
    created_transaction <= 52666087
    AND (
        deleted_transaction = 9223372036854775807
        OR deleted_transaction > 52666087
    )
    AND namespace = 'agency'
    AND relation = 'caregiver_or_manager'
LIMIT 1;
```
I'm also forever suspicious of having a different plan when it's a prepared statement too
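
One way to test that suspicion (a sketch, assuming PostgreSQL 12+ for `plan_cache_mode`) is to force the generic plan a prepared statement would eventually settle on and compare it against the custom plan above:

```sql
-- Force the generic plan instead of the custom plan that EXPLAIN
-- builds when it sees literal values (PostgreSQL 12+).
SET plan_cache_mode = force_generic_plan;

PREPARE probe (bigint, bigint, bigint, text, text, bigint) AS
  SELECT namespace, object_id, relation,
         userset_namespace, userset_object_id, userset_relation
  FROM relation_tuple
  WHERE created_transaction <= $1
    AND (deleted_transaction = $2 OR deleted_transaction > $3)
    AND namespace = $4
    AND relation = $5
  LIMIT $6;

EXPLAIN ANALYZE EXECUTE probe
  (52666087, 9223372036854775807, 52666087, 'agency', 'caregiver_or_manager', 1);
```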
j
hmmm
seems pretty quick
could indeed be the consistency level
> I'm also forever suspicious of having a different plan when it's a prepared statement too

yeah
j
56ms is actually pretty slow for that
a filter over 29 rows returned by the index takes 56ms? something seems wrong
that definitely should be a single digit millisecond query
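
If it reproduces, a sketch of the next diagnostic step: adding `BUFFERS` to the `EXPLAIN` shows whether the index scan is spending its time on shared-buffer hits or on actual disk reads, which would help explain tens of milliseconds for so few rows:

```sql
-- BUFFERS reports shared-buffer hits vs. reads for each plan node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT namespace, object_id, relation,
       userset_namespace, userset_object_id, userset_relation
FROM relation_tuple
WHERE created_transaction <= 52666087
  AND (deleted_transaction = 9223372036854775807
       OR deleted_transaction > 52666087)
  AND namespace = 'agency'
  AND relation = 'caregiver_or_manager'
LIMIT 1;
```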
j
I meant compared to 60+s