# spicedb
f
Hey there, I am currently trying to implement a search index for auth checks on listing queries. This is currently done by polling with `LookupResources` on a self-hosted instance in k8s using the spicedb-operator. The number of tuples is not high (< 200k), and the expected result set for a single request is also relatively small (< 50k tuples). I have already given the underlying database (Postgres) a lot of resources, and the SpiceDB pods as well. I am experiencing either:
- really long response times (~30s-1min), or
- errors: `4 DEADLINE_EXCEEDED`

Is there any way to configure SpiceDB to increase the deadline? List-query permission checking is an absolute must for us, so without it, and without Materialize being public (yet), I am not sure about other options :/

PS: As the underlying database is GCP Cloud SQL, I have some query insights into what's happening. Although I already gave Postgres 4 CPUs and 16 GB of RAM, it spikes to 100% CPU usage when requesting that data. It seems a bit fishy to me that this already leads to such high usage.
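For context on the deadline: gRPC deadlines are per-call values set by the client and propagated to the server, so they are usually raised on the caller's side rather than in SpiceDB's configuration. A minimal sketch, assuming the `@authzed/authzed-node` client, whose generated grpc-js methods accept standard `Metadata`/`CallOptions`; the token, endpoint, and `document`/`view`/`user` names are placeholders, not from this thread:

```typescript
// Sketch: raising the client-side gRPC deadline for a LookupResources call.
// Assumes @authzed/authzed-node; all identifiers below are illustrative.
import { v1 } from '@authzed/authzed-node';
import { Metadata } from '@grpc/grpc-js';

const client = v1.NewClient(
  'my-preshared-key',                     // placeholder token
  'spicedb.default.svc:50051',            // placeholder in-cluster endpoint
  v1.ClientSecurity.INSECURE_PLAINTEXT_CREDENTIALS,
);

const request = v1.LookupResourcesRequest.create({
  resourceObjectType: 'document',         // placeholder resource type
  permission: 'view',                     // placeholder permission
  subject: v1.SubjectReference.create({
    object: v1.ObjectReference.create({ objectType: 'user', objectId: 'alice' }),
  }),
});

// The deadline is a standard grpc-js CallOption; gRPC propagates it to the
// server, so extending it lifts the DEADLINE_EXCEEDED ceiling for this call.
const stream = client.lookupResources(request, new Metadata(), {
  deadline: Date.now() + 5 * 60 * 1000,   // allow up to 5 minutes
});

stream.on('data', (resp: v1.LookupResourcesResponse) => {
  console.log('accessible:', resp.resourceObjectId);
});
stream.on('error', (err: Error) => console.error('LookupResources failed:', err));
stream.on('end', () => console.log('stream complete'));
```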
v
Are you using `optional_limit` in your `LookupResources` queries?
The Zanzibar paper basically describes using the `Expand` API for what you are trying to achieve. @ecordell has put some thought into it.
f
Yes, I tried three things (Node.js user here):
- the promise-based API
- the regular "streaming" API
- limits with pagination (promise-based; see the sketch below)

Limits worked, but were even slower (> 2-3 min total response time).
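For reference, a sketch of the limits-with-pagination variant: bound each request with `optionalLimit` and resume from the cursor attached to each result. This assumes the promise-style client (`client.promises`) from `@authzed/authzed-node`, which collects each page into an array, and reuses the placeholder names from the sketch above:

```typescript
// Sketch: cursor pagination over LookupResources with a bounded page size.
// Assumes @authzed/authzed-node's promise client; names are placeholders.
import { v1 } from '@authzed/authzed-node';

const PAGE_SIZE = 1000;

async function lookupAllResources(
  client: ReturnType<typeof v1.NewClient>,
): Promise<string[]> {
  const ids: string[] = [];
  let cursor: v1.Cursor | undefined;

  for (;;) {
    // Each page is bounded, so the server never buffers the full result set.
    const page = await client.promises.lookupResources(
      v1.LookupResourcesRequest.create({
        resourceObjectType: 'document',    // placeholder resource type
        permission: 'view',                // placeholder permission
        subject: v1.SubjectReference.create({
          object: v1.ObjectReference.create({ objectType: 'user', objectId: 'alice' }),
        }),
        optionalLimit: PAGE_SIZE,
        optionalCursor: cursor,
      }),
    );

    for (const resp of page) {
      ids.push(resp.resourceObjectId);
      cursor = resp.afterResultCursor;     // resume token for the next page
    }
    if (page.length < PAGE_SIZE) break;    // short (or empty) page: done
  }
  return ids;
}
```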
v
Well, you have to use limits. The lack of limits is a backwards-compatibility guarantee we left in place because it was the first design of the API, but it can cause the server to get OOMKilled, since it has to buffer all elements in memory before streaming them.
How many elements are your LR requests returning?
f
Just did some spot-checks: roughly 20-30k.
v
Yeah, so that's going to take a bit to compute, but 2-3 min seems like a lot; it may be missing indexes. Have you looked into the database profiler?
A user identified https://github.com/authzed/spicedb/issues/1687, and I wonder if it could be affecting other DBs too
f
Yep, so it computes individual requests fast-ish. The most load-causing query is something like this:
```sql
SELECT
  namespace,
  object_id,
  relation,
  userset_namespace,
  userset_object_id,
  userset_relation,
  caveat_name,
  caveat_context
FROM
  relation_tuple
WHERE
  pg_visible_in_snapshot(created_xid,
    $1) = $2
  AND pg_visible_in_snapshot(deleted_xid,
    $3) = $4
  AND namespace = $5
  AND relation = $6
  AND object_id IN ($7, <....>) LIMIT $107
```
By itself it executes fast (~6 ms), but it is called a ridiculous number of times (roughly 600k).
v
Right. It really depends on your schema; chances are there is an opportunity to optimize how your schema is traversed.
To me it feels like something in your schema makes every tuple reachable.
f
Sounds reasonable! The schema is quite complex, as it does something similar to the Google Cloud permissions example in the playground.
v
Have you run `zed permission check --explain` on the same LR path you are using? What's the SpiceDB version?
You could use `zed backup create` and `zed backup redact` and send us a dump of your schema, but to be totally transparent, we are swamped with work, so it would be best effort.
f
> Have you run `zed permission check --explain` on the same LR path you are using?

Didn't do that, can try now.

> What's the SpiceDB version?

Latest, 1.30.1.
> but to be totally transparent, we are swamped with work, so it would be best effort

No worries, I have a call scheduled for next week to discuss further options. We are looking into hosting as well; I don't really want to manage this myself 🙂
Regarding the permission check explain: should it output something? For a sample permission I am getting:
```
9:55AM INF debugging requested on check
true
9:55AM WRN No debugging information returned for the check
```
v
Could you upgrade your zed version?