# spicedb
w
Hi guys, just had some questions regarding LookupResources latency. We are currently attempting to self-host SpiceDB and are struggling a bit with performance. We are running 6 AWS instances of SpiceDB on separate nodes, each with 4 vCPU and 8 GB of RAM, pointed at an RDS instance with the same specs. Looking at metrics for both, neither is particularly saturated. All of this is running in eu-west-2, as are the applications that use this system. Even with minimize-latency consistency on all of our queries (we don't really care about the new-enemy problem), and with pagination, each individual request takes ~200ms to complete. Comparing this to your serverless environment, sending the same request, we see similar latencies - but that includes the round trip to America. Given the number of tuples we have, this can result in LookupResources requests taking ~15s if not more. Any advice here on how to improve our performance?
v
Hi 👋 What version are you running? I'd suggest looking into OpenTelemetry and running some checks with `--explain` to get a better understanding of the path traversed by the permission check used for `LookupResources`. Typically what we see is contention over connection pools - it's not so much a CPU/memory problem on the SpiceDB side. You want to understand where time is being spent, down to the SQL query.
`LookupResources` can put a bunch of load on your database cluster. I suggest turning on Performance Insights for RDS to get a sense of when spikes are happening and queries are getting queued: https://aws.amazon.com/rds/performance-insights/
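Alongside Performance Insights, the `pg_stat_statements` extension (available on RDS Postgres) gives a quick server-side view of the heaviest queries. A minimal sketch, assuming Postgres 13+ column names (`mean_exec_time`/`total_exec_time`; older versions use `mean_time`/`total_time`):

```sql
-- Requires the pg_stat_statements extension; on RDS,
-- shared_preload_libraries is configured via the parameter group.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top 10 statements by average execution time.
SELECT calls,
       round(mean_exec_time::numeric, 2)  AS mean_ms,
       round(total_exec_time::numeric, 2) AS total_ms,
       left(query, 80)                    AS query
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```

This points you at which normalized query shape to dig into before pulling individual examples from the logs.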
w
v1.25.0 of SpiceDB. I see that the top few queries are roughly of the same type - I assume these are the LookupResources queries?
```sql
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, caveat_name, caveat_context
FROM relation_tuple
WHERE pg_visible_in_snapshot(created_xid, $1) = $2
  AND pg_visible_in_snapshot(deleted_xid, $3) = $4
  AND ((userset_namespace = $5 AND userset_object_id IN (<LIST OF OBJECT_IDS>)))
  AND namespace = $30
  AND relation = $31
ORDER BY userset_namespace, userset_object_id, userset_relation, namespace, object_id, relation
LIMIT $32
```
These take on the order of `183±300`ms to complete, with the worst we've seen taking 3000ms
v
how long are the pages you are asking for? it sounds like the database is the bottleneck. Chances are there is an index missing too.
w
We've got our block size set to the default setting for Postgres: 8192 bytes. We have the default indexes that came with SpiceDB: `ix_relation_by_deleted_xid`, `ix_relation_tuple_by_subject` and `ix_relation_tuple_by_subject_relation`. I don't particularly mind optimizing heavily for read performance over write performance
v
sorry, I actually meant the `limit` parameter in the `LookupResources` call, which defines the size of the page to return
it's expected that you have those indexes, but your workload may have an access pattern we're missing an index for. RDS Performance Insights should tell you when queries are taking too long, and then you'd have to identify the query by enabling verbose logs and try running an `EXPLAIN` in RDS for that specific query that's being slow for you
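Concretely, that workflow might look like the sketch below. The literal filter values are placeholders taken from the earlier example; `$snapshot` stands in for the snapshot parameters that must be copied from a real captured query in the logs, so this is not runnable as-is:

```sql
-- EXPLAIN a captured LookupResources query with its real parameter values
-- substituted in ($snapshot is a placeholder for the captured snapshot args).
EXPLAIN (ANALYZE, BUFFERS)
SELECT namespace, object_id, relation, userset_namespace, userset_object_id,
       userset_relation, caveat_name, caveat_context
FROM relation_tuple
WHERE pg_visible_in_snapshot(created_xid, $snapshot) = true
  AND pg_visible_in_snapshot(deleted_xid, $snapshot) = false
  AND userset_namespace = 'user'
  AND userset_object_id IN ('7', '13', '15', '17', '22', '681')
  AND namespace = 'channel'
  AND relation = 'reader'
ORDER BY userset_namespace, userset_object_id, userset_relation,
         namespace, object_id, relation
LIMIT 1000;
```

If the plan shows a sequential scan, or `Buffers:` counts far exceed the rows returned, that's the query shape to index for.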
w
Ah the limit parameter is always 1000 for these queries
Okay, that makes some sense - noting that the above query is the worst performing, would an index over `userset_namespace`, `userset_object_id`, `namespace` and `relation` help here?
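For reference, such an index could be sketched like this - the index name and column order are assumptions for illustration, not an existing SpiceDB migration:

```sql
-- Hypothetical covering index for the slow query's filter columns.
-- Equality-filtered columns come first; the IN-list column
-- (userset_object_id) goes last among the filters.
-- CONCURRENTLY avoids blocking writes while the index builds.
CREATE INDEX CONCURRENTLY ix_relation_tuple_subject_ns_rel
    ON relation_tuple (userset_namespace, namespace, relation, userset_object_id);
```

Whether it actually helps depends on what `EXPLAIN` shows the planner choosing today, so it's worth verifying the plan before and after.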
y
@willthornton. how are you running spicedb? is it in a k8s environment?
w
Yes it is
Specifically v1.24.16 on eks
v
@willthornton. postgres query planning is a blackbox, I'd suggest:
1. capture an actual query that is being slow
2. run explain on your RDS cluster
3. if it does a sequential scan or something inefficient, add an index
4. run explain again until query time improves
w
Postgres query planning is massively opaque, I agree - feels like black magic when you finally get something working. Okay, will turn on the verbose logs and see what happens
See if I can find where it is struggling
v
I've had my fair share of fights with the RDS query planner over the last few weeks, and that was the strategy I followed - sorry I cannot give more advice. If you could provide a reproduction scenario, we could look into it
j
> I assume these are the LookupResource queries?
yeah, if you can get an EXPLAIN on one of them, that would help
w
```
Limit  (cost=379.12..379.24 rows=46 width=92) (actual time=0.201..0.202 rows=5 loops=1)
  ->  Sort  (cost=379.12..379.24 rows=46 width=92) (actual time=0.200..0.201 rows=5 loops=1)
        Sort Key: userset_object_id, userset_relation, object_id
        Sort Method: quicksort  Memory: 25kB
        ->  Bitmap Heap Scan on relation_tuple  (cost=209.78..377.85 rows=46 width=92) (actual time=0.184..0.190 rows=5 loops=1)
              Recheck Cond: (((userset_object_id)::text = ANY ('{7,13,15,17,22,681}'::text[])) AND ((userset_namespace)::text = 'user'::text) AND ((namespace)::text = 'channel'::text) AND ((relation)::text = 'reader'::text))
              Heap Blocks: exact=5
              ->  Bitmap Index Scan on ix_relation_tuple_by_subject  (cost=0.00..209.76 rows=46 width=0) (actual time=0.179..0.180 rows=5 loops=1)
                    Index Cond: (((userset_object_id)::text = ANY ('{7,13,15,17,22,681}'::text[])) AND ((userset_namespace)::text = 'user'::text) AND ((namespace)::text = 'channel'::text) AND ((relation)::text = 'reader'::text))
```
That is horribly formatted, let me see if I can post that as a nicer response
This is for an example query from above
j
execution time: 0.2ms
do you have an example of a request taking 200+ms?
w
I'll enable the logs as suggested above, and then find an example for you
There's a few cropping up like this
Also the 'execution time' here isn't necessarily accurate from what we see - when the database is under load and executing lots of these queries, we're seeing execution times of ~400ms
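One way to check whether queries are piling up behind each other under load (rather than being individually slow) is to sample `pg_stat_activity` while the latency is happening; the wait-event columns assume Postgres 9.6+:

```sql
-- Count backends by state and wait event during a load spike.
-- Many 'active' rows sharing the same wait_event_type (e.g. Lock, IO,
-- Client) suggests queuing/contention rather than slow individual plans.
SELECT state, wait_event_type, wait_event, count(*)
FROM pg_stat_activity
WHERE datname = current_database()
GROUP BY state, wait_event_type, wait_event
ORDER BY count(*) DESC;
```

That would help distinguish connection-pool contention (as suggested earlier in the thread) from a planner problem.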
Coming back from the postgres logs
The limit here looks very suspicious, being that it is the max of a signed 64-bit integer: 9223372036854775807
I specify a limit of 1000 in all LookupResources requests
y
thought: it may be an intermediate hop, where you need to get all results in order to have the final answer to the question contain the number of results that you want
w
Yeah, true
I just get suspicious whenever I see numbers like that
y
i feel you there
v
yeah there is an `IN` clause there, so the limit is implicit in the number of elements there, which I believe is controlled by the application
j
This is an intermediate lookup
Likely over an arrow
w
Is there a reason why it would be that slow, even for an intermediate query? I do have a lot of relations in that table at this point, but is there anything I can do about it?
v
I'd suggest doing `EXPLAIN (ANALYZE, VERBOSE, BUFFERS)` to have more data around the query
if it's taking 200ms+, it must be loading more rows than it actually needs to execute the query
do you happen to know how many relationships are in that table, and how many are there in that relation specifically?
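For reference, those counts can be pulled with something like the sketch below; the `channel`/`reader`/`user` values are just the ones from the earlier example plan:

```sql
-- Approximate total rows in the table (cheap; an exact count(*) can be slow
-- on a large relation_tuple table).
SELECT reltuples::bigint AS approx_rows
FROM pg_class
WHERE relname = 'relation_tuple';

-- Exact count for the specific relation from the example query.
SELECT count(*)
FROM relation_tuple
WHERE namespace = 'channel'
  AND relation = 'reader'
  AND userset_namespace = 'user';
```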