
littleberk

01/18/2023, 6:34 AM
Hi, I'm researching SpiceDB now. I am using postgres as a datasource. I have 350M records in the relation_tuple table, the size of the table is 150GB, while the size of the indexes is 115GB, is this normal? Can I use postgres for over 1 billion relations or should I be looking at something else?
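For anyone wanting to reproduce these numbers, a minimal sketch (assuming psycopg2 and the stock relation_tuple table name; the DSN is a placeholder):
```python
import psycopg2

conn = psycopg2.connect("dbname=spicedb")  # placeholder DSN, adjust for your setup
with conn.cursor() as cur:
    # Compare heap size, index size, and total size of the relation_tuple table.
    cur.execute("""
        SELECT pg_size_pretty(pg_relation_size('relation_tuple')),
               pg_size_pretty(pg_indexes_size('relation_tuple')),
               pg_size_pretty(pg_total_relation_size('relation_tuple'))
    """)
    table_size, index_size, total_size = cur.fetchone()
    print(f"table: {table_size}  indexes: {index_size}  total: {total_size}")
conn.close()
```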

vroldanbet

01/18/2023, 9:56 AM
hey @littleberk, all datastores rely on indexes to optimize row lookups in the application's critical path, which is necessary to achieve low latency. You could try other datastores, I'd suggest CockroachDB, but I'm not sure the outcome will be entirely different. That doesn't mean the current index setup is optimal, I'm sure there are still things we could look into, but I have to say we were also considering covering indexes, which would be even heavier on disk. I think @Jake has done some load tests, but I'm not sure we measured the size on disk.
I'd definitely suggest opening an issue to keep track of this

littleberk

01/20/2023, 10:12 AM
For information: in CockroachDB there are half as many indexes (3 instead of 6), and there is also data compression. The result is that where Postgres uses roughly 400 bytes of data per relation, CockroachDB takes only about 20 bytes per relation.
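As a rough sanity check on the Postgres side, using the sizes quoted earlier in the thread:
```python
relations = 350_000_000
table_bytes = 150 * 1024**3   # ~150 GB of table data reported above
index_bytes = 115 * 1024**3   # ~115 GB of indexes reported above

print(table_bytes / relations)                  # ~460 bytes of table data per relation
print((table_bytes + index_bytes) / relations)  # ~810 bytes per relation incl. indexes
```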

vroldanbet

01/20/2023, 10:19 AM
Thanks @littleberk! That's useful information to know. I'm assuming row-level compression is the big differentiator here, since tuples compress easily (most columns are strings, and there is a lot of redundancy at the table level). I assume there must also be compression in Postgres, but maybe it's off by default or kicks in under different conditions. Something the team has discussed is moving away from storing resource/subject types as strings and instead using a foreign key, which would reduce the amount of data that needs to be stored. We haven't put much effort into optimizing disk usage, since disk is cheap; the focus has been on performance instead.
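One way to check whether Postgres is compressing these values at all, as a sketch assuming Postgres 14+ and that the sampled column is named namespace (an assumption, adjust to the actual relation_tuple schema):
```python
import psycopg2

conn = psycopg2.connect("dbname=spicedb")  # placeholder DSN, adjust for your setup
with conn.cursor() as cur:
    # pg_column_compression() (Postgres 14+) returns the compression method used
    # for a stored value, or NULL when it was stored uncompressed. Short string
    # columns usually stay below the TOAST threshold and are never compressed.
    # The column name `namespace` is an assumption about the schema.
    cur.execute("""
        SELECT pg_column_compression(namespace) AS method, count(*)
        FROM (SELECT namespace FROM relation_tuple LIMIT 10000) sample
        GROUP BY 1
    """)
    for method, count in cur.fetchall():
        print(method, count)
conn.close()
```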