# spicedb
c
I enabled enable-experimental-watchable-schema-cache as recommended by @vroldanbet, but it seems to be causing some weird caching behavior. Turning it off immediately resolves the issue. The hit ratio is high for the duration of the quantization window and then falls over several hours until it flatlines around 10%. In the "Reboots" chart, each jump in cache hit ratio was caused by a reboot of the SpiceDB pods. It stays high for about the length of the quantization window and then falls. In the "On->Off" graph, enable-experimental-watchable-schema-cache was changed from true to false.
v
Hey that does not seem right, can you share all your SpiceDB flags?
c
RevisionQuantization and MaxRevisionStalenessPercent are exaggerated for testing, but I was seeing it with small values as well. It was just harder to see the relationship between cache hit ratio and quantization window, so I increased them. https://cdn.discordapp.com/attachments/1205241511973879869/1205250986717741096/configuration.json?ex=65d7b073&is=65c53b73&hm=aba5d30490e1077a533d4288a30c7febb52bb3b889c7029c763025b1b7b256f7&
We were also seeing some weird performance issues at about 2x the quantization window. All of the latency numbers were roughly doubling. Disabling enable-experimental-watchable-schema-cache fixed that as well.
j
you sure you're filtering the caching metrics?
the new schema cache also has its own caching metrics
so it's possible that your prom metric names are matching both
I doubt that's the root cause, but worth checking
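To illustrate the metric-name concern above, here is a minimal Python sketch of how a loose name filter can blend two caches into one misleading hit ratio. The metric names and counter values are hypothetical, invented for this example. They are not SpiceDB's actual metric names, so check the names your deployment really exports.

```python
import re

# Hypothetical counter samples. These names and values are assumptions
# made up for illustration, not real SpiceDB metrics.
samples = {
    "dispatch_cache_hits_total": 900.0,
    "dispatch_cache_misses_total": 100.0,
    "schema_cache_hits_total": 50.0,
    "schema_cache_misses_total": 450.0,
}


def hit_ratio(samples, name_pattern):
    """Return hits / (hits + misses) over all metrics matching name_pattern."""
    rx = re.compile(name_pattern)
    hits = sum(v for k, v in samples.items()
               if rx.search(k) and k.endswith("_hits_total"))
    misses = sum(v for k, v in samples.items()
                 if rx.search(k) and k.endswith("_misses_total"))
    total = hits + misses
    return hits / total if total else 0.0


# A loose pattern matches both caches and mixes their counters together.
loose = hit_ratio(samples, r"cache")
# Anchored patterns keep each cache separate.
dispatch_only = hit_ratio(samples, r"^dispatch_cache_")
schema_only = hit_ratio(samples, r"^schema_cache_")

print(f"loose={loose:.3f} dispatch={dispatch_only:.3f} schema={schema_only:.3f}")
```

With these made-up numbers, the blended ratio sits well below the dispatch cache's own ratio even though the dispatch cache itself is unchanged, which is the kind of artifact a too-broad dashboard query could produce.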
j
okay, those should be distinct then
and the dispatch cache should not be influenced by the schema cache at all
except in total memory used
it's possible you're getting less caching due to faster eviction
but that seems unlikely unless your nodes are VERY memory limited
see what your hit rate is on the schema cache
v
ristretto (the cache library used by SpiceDB) has some internal metrics which I believe can be exposed with an additional flag. It would be interesting to plot those and see what's happening when the schema cache is on/off. Maybe it's hitting the default cache capacity (which is inferred from the resource requests and is known to make inaccurate decisions) and thus causing eviction.