# spicedb
r
Hello Team, in our SpiceDB setup we have only 2 SpiceDB nodes, no client or operator. We are observing spikes in memory, CPU, and disk, and it sometimes crashes, but only in one pod. This is observed mainly when data is being pushed into the DB. We are using PostgreSQL as our DB. Can you please help us solve this issue? Please let me know if you need our YAML details.
v
please provide information about the deployment setup, the client workload and the version you are using
y
this could be SpiceDB's garbage collection mechanism, especially if: 1. you haven't modified the default SpiceDB settings, 2. your data push performs writes one by one, and 3. your datastore is undersized
r
What does "client workload" mean?
SpiceDB version: 1.16
Can you please suggest which settings we need to change to solve this issue? Currently we are using the default SpiceDB settings.
v
what kind of requests are you doing to SpiceDB when it crashes
Please move to the most recent version 1.27.0
This looks OK. Are you sure dispatching is properly configured?
Disk seems unusual - make sure you don't have debug log level enabled
r
In SpiceDB we are storing some permission-related data, around 80k records
sure
How can we check this dispatching thing?
v
you can check that the SpiceDB DispatchCheck API is being called
r
but how can we configure this properly?
v
it should be something like this
- name: SPICEDB_DISPATCH_UPSTREAM_ADDR
  value: kubernetes:///<your_service_name>.<your_service_kube_namespace>:<your_service_dispatch_port_name>
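(For context, a minimal sketch of the other piece this usually pairs with - the flag/env name below is from recent releases, so verify it with spicedb serve --help; the kubernetes:/// resolver also needs a ServiceAccount allowed to watch the Service's endpoints, and the dispatch port defaults to 50053:)
- name: SPICEDB_DISPATCH_CLUSTER_ENABLED   # enables the dispatch gRPC server that peers call (listens on :50053 by default)
  value: "true"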
y
are you making the writes in batches or are you making those writes individually?
r
From Kafka we are pushing the data one by one into SpiceDB.
Thanks, can you please tell me where this needs to be configured?
Can you please confirm which SpiceDB setting we need to configure to fix this issue?
Do you recommend using batching to push the data into SpiceDB?
v
this would be in your Kubernetes Deployment. This is why we recommend using the operator: it handles these bits for you and automates upgrades - we strongly recommend moving to the operator
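As a rough sketch, an operator-managed setup could look something like this (the resource and secret names are placeholders, and the exact spec fields may differ by operator version - check the spicedb-operator docs):
apiVersion: authzed.com/v1alpha1
kind: SpiceDBCluster
metadata:
  name: spicedb                 # placeholder name
spec:
  config:
    datastoreEngine: postgres
    replicas: "2"
  secretName: spicedb-config    # Secret containing preshared_key and datastore_uri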
y
that's likely a problem. are you using postgres as your datastore?
we ran into an issue where pushing updates one-by-one through Kafka created a new Postgres snapshot for each update. By default those snapshots are retained for 24hrs and then garbage collected, but if the garbage collection can't complete within its timeout, it will thrash, which will cause high CPU usage on your datastore even without any traffic on it
we fixed the issue by reducing the snapshot retention to 1hr (the window only needs to be as large as the maximum window in which you want to call at_exact_snapshot consistency) and making our Kafka consumer do batch updates
upsizing our database also helped
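If you want to try the same mitigation, the retention window is a SpiceDB flag that can be set via env on the Deployment - a sketch, assuming the flag/env name from recent releases (confirm against spicedb serve --help for your version):
- name: SPICEDB_DATASTORE_GC_WINDOW   # how long old snapshots/revisions are kept before GC (default 24h)
  value: "1h"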
v
that is fixed in 1.27.0 @yetitwo - GC is now way faster
the problem was that the queries were very inefficient. The GC should now be able to keep up: once it encounters garbage, it should iterate over it quickly
We added a new index for it. The problem is that the new index requires Postgres 15 to work, because the query planner would refuse to select it in earlier versions - due to the xid8 type, which was relatively new to PG
but at least we didn't observe this causing OOMKills - it saturated RDS and made SpiceDB slow overall
r
Thank you, let us try upgrading the SpiceDB version; I hope that will solve this CPU/memory issue.
Can you please suggest which version of Postgres we need to use as the datastore to fix this issue?
If we are upgrading SpiceDB to 1.27.0, which PostgreSQL version should we use?
v
PG 15
r
Is there any SpiceDB documentation that mentions using PG 15 with the latest SpiceDB version?
Can we use the latest version, PG 16?
v
The minimum supported version is PG 13.8. SpiceDB is being tested against 13.7, 14, 15, and 16. There should be documentation in the public docs about the minimum required version, but not about the recommended PG 15
r
My question is: to fix this SpiceDB spike issue we need to upgrade to 1.27, and we are using PG 14.7, so do we need to upgrade the PG version to 15 or 16 to fix that garbage collection issue?
v
I'm not sure your issue is related to the GC inefficiency fixed in 1.27. What I would definitely suggest is to update, because there have been many, many perf improvements since 1.16. You don't need to update to PG 15 - it's not strictly necessary to run 1.27 - but if the problem was indeed related to GC, then you need to run at least PG 15 to have it fixed
r
Ok, thank you. Let us try updating SpiceDB and upgrading PG to 15, then I will update you.
v
1.28 was just released - use that
r
Sure. Thanks !!