DAU used in the 1M QPS blogpost
# spicedb
s
Hi guys! I am trying to run a load test similar to the one described [in your blogpost](https://authzed.com/blog/google-scale-authorization). Just like in the blogpost, I came to the conclusion that DAU modelling is crucial if we want to measure SpiceDB performance and cache leverage. So I wonder what sample factor you used for your tests? 10%, 1%, or another? p.s. it seems like there is a mistake with the user ID sequence for sample factor 1% (it should be 0, 100, ..., 900).
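For illustration, a minimal Go sketch of how such a strided sample pool could be generated; the helper name and the assumption of sequential integer user IDs are mine, not from the blogpost:

```go
package main

import (
	"fmt"
	"math"
	"math/rand"
)

// samplePool returns the user IDs selected by a given sample factor,
// assuming user IDs are the sequential integers 0..totalUsers-1.
// For totalUsers=1000 and factor=0.01 this yields 0, 100, ..., 900.
func samplePool(totalUsers int, factor float64) []int {
	// Round to avoid float truncation (1/0.01 is just under 100 in float64).
	stride := int(math.Round(1 / factor))
	pool := make([]int, 0, totalUsers/stride)
	for id := 0; id < totalUsers; id += stride {
		pool = append(pool, id)
	}
	return pool
}

func main() {
	pool := samplePool(1000, 0.01) // 10 users out of 1000
	fmt.Println(pool)
	// a load generator would then pick uniformly from this pool:
	fmt.Println("picked:", pool[rand.Intn(len(pool))])
}
```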
v
We ran various tests with different combinations of RPS and dataset size. For the larger datasets it typically was under 1%.
We also ran tests to validate that increasing the sampling rate was just a matter of adding more compute.
1% of 1000 is 10, or am I missing something?
s
> 1% of 1000 is 10

yes. doesn't that mean you'd have 10 users in a sample with IDs (0, 100, ..., 900)?
v
you are right actually
so yeah, an error in the post. well spotted!
s
> We also ran tests to validate that increasing the sampling rate was a matter of adding more compute to it

Currently I observe the following. I am running SpiceDB v1.27.0; the quantization window remained at the default 5 seconds, with a max staleness of 100% (same as in the blogpost). Given that 1M users exist in SpiceDB, I try to run tests with a sampling factor of 10%, i.e. 100k users are in the sample, and I randomly pick a userID from (0, 10, 20, ..., 999990) to send CheckPermission. For a uniform distribution and 1k RPS, that means the same user hits the cache approximately every 100 seconds, while the cache is reusable for approximately 10s (according to my settings). Effectively, according to the metrics, that gives a <2% dispatch cache hit rate. So my logic here is that I either have to increase RPS or lower my sample rate to match the generated load. Is my logic correct? I can't reach the same cache hit ratio as you did, and I wonder what may be wrong.
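For reference, a minimal Go sketch of this kind of load loop using the authzed-go client; the endpoint, token, and the document/view schema names are placeholders, and a real generator would need concurrent workers to actually sustain the target RPS:

```go
package main

import (
	"context"
	"fmt"
	"log"
	"math/rand"
	"time"

	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
	"github.com/authzed/authzed-go/v1"
	"github.com/authzed/grpcutil"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

func main() {
	// Placeholder endpoint and token for a local, insecure test instance.
	client, err := authzed.NewClient(
		"localhost:50051",
		grpcutil.WithInsecureBearerToken("sometesttoken"),
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatalf("unable to create client: %v", err)
	}

	const (
		totalUsers = 1_000_000
		stride     = 10 // 10% sampling factor
		rps        = 1000
	)

	ctx := context.Background()
	ticker := time.NewTicker(time.Second / rps) // one request every 1ms
	defer ticker.Stop()

	// NOTE: a sequential loop cannot actually sustain 1K RPS once request
	// latency exceeds 1ms; a real generator would use concurrent workers.
	for range ticker.C {
		// Uniformly pick a user ID from the strided pool 0, 10, ..., 999990.
		userID := rand.Intn(totalUsers/stride) * stride

		// Consistency is omitted, which defaults to minimize_latency.
		resp, err := client.CheckPermission(ctx, &v1.CheckPermissionRequest{
			Resource:   &v1.ObjectReference{ObjectType: "document", ObjectId: "doc1"},
			Permission: "view",
			Subject: &v1.SubjectReference{
				Object: &v1.ObjectReference{
					ObjectType: "user",
					ObjectId:   fmt.Sprintf("%d", userID),
				},
			},
		})
		if err != nil {
			log.Printf("check failed: %v", err)
			continue
		}
		_ = resp.Permissionship // HAS_PERMISSION / NO_PERMISSION
	}
}
```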
v
That is reasonable, except that we randomly picked from the pool; not sure if that's the same as what you did. And yes, cache hit rate will increase with RPS: a very big dataset with low RPS will see a low cache hit rate.
Please note that SpiceDB's cache is built for hot-spot caching, not caching in the traditional sense (https://authzed.com/blog/hotspot-caching-in-google-zanzibar-and-spicedb). I understand that folks want to optimize for a higher cache hit rate to lower latency, but at least with the current Zanzibar-inspired architecture that can't be achieved without compromising security.
Using `at_least_as_fresh` can also get you better cache hit rates, because you are telling SpiceDB you are ok with an older revision. In contrast, `minimize_latency` has to compute a new revision each quantization window (plus staleness offset). If you are ok with more staleness, you can always increase the quantization window.
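For reference, a sketch of how the two consistency modes are selected with the authzed-go client; the surrounding function is illustrative, and the ZedToken would normally be saved from an earlier WriteRelationships or CheckPermission response:

```go
package loadtest

import (
	v1 "github.com/authzed/authzed-go/proto/authzed/api/v1"
)

// buildChecks shows the two consistency modes side by side.
func buildChecks(token *v1.ZedToken) (fast, fresh *v1.CheckPermissionRequest) {
	// minimize_latency: SpiceDB picks the quantized revision itself and
	// recomputes it each window (plus staleness offset), so cached results
	// are only reusable within that window.
	fast = &v1.CheckPermissionRequest{
		Consistency: &v1.Consistency{
			Requirement: &v1.Consistency_MinimizeLatency{MinimizeLatency: true},
		},
		// ... Resource, Permission, Subject as in the load loop above ...
	}

	// at_least_as_fresh: any revision at least as new as the token is
	// acceptable, so SpiceDB may keep serving older, already-cached results.
	fresh = &v1.CheckPermissionRequest{
		Consistency: &v1.Consistency{
			Requirement: &v1.Consistency_AtLeastAsFresh{AtLeastAsFresh: token},
		},
		// ...
	}
	return fast, fresh
}
```

The window itself is a server-side setting: on `spicedb serve` it is controlled by `--datastore-revision-quantization-interval` (default 5s); check `spicedb serve --help` for the exact staleness-related flags in your version.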
s
thanks for the confirmation! yeah, I've read that post as well 🙂 very well-written and clear. Unfortunately I can only use `minimize_latency` right now, which effectively invalidates the cache every 10s (for the settings above).
v
is 10%, i.e. 100K users, a realistic DAU? is that something you identified in your application?
like 10% of your userbase active at all times?
s
10% of my userbase is active during an hour. e.g. 1M is the total number of users in SpiceDB, and I know that average daily active users are around 27.5%, or 275k users in 24 hours. That means each hour I get approx. 11.5k unique users, which is approx 10% of the total userbase. Do you suggest I should calculate per-minute or per-second DAU instead?
v
I think so. You are generating 1K RPS out of a 10K pool; that means it takes (roughly) 10 seconds to have all users active. That seems different from 10% of your userbase being active in an hour.
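A back-of-the-envelope sketch of that arithmetic (my own illustration, assuming a uniform pick from the pool):

```go
package main

import "fmt"

// timeToCycle estimates how long a uniform load generator takes to
// touch every user in the pool once: poolSize / rps seconds.
func timeToCycle(poolSize, rps int) float64 {
	return float64(poolSize) / float64(rps)
}

func main() {
	// 1K RPS against an hourly-active pool of ~10K users:
	fmt.Printf("hourly pool: %.0fs to cycle\n", timeToCycle(10_000, 1_000)) // 10s
	// versus the 100K pool used in the 10% sampling test:
	fmt.Printf("sampled pool: %.0fs to cycle\n", timeToCycle(100_000, 1_000)) // 100s
	// With a cached revision reusable for only ~10s (5s window + 100%
	// max staleness), a ~100s revisit interval implies most checks miss.
}
```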