hello everyone, I'm here from the
# zanzibar
b
hello everyone, I'm here from the annotated paper (zanzibar.tech). I'm trying to grok the Leopard Indexing System section. Here's my understanding, please correct me if I am wrong: -
Member2Group(U)
returns all the groups for which U is a direct member -
Group2Group(G)
returns all the descendants of group G - These groups are implemented using data structures optimized for set operations - If the intersection of the results from Member2Group and Group2Group is non-empty, then U is a member of G - Though not explicitly stated in the paper, a tree is used to map Parent groups to child groups (so as to enumerate all the descendants)
v
I'm going to defer to @ecordell who's spent more time thinking about Leopard, but I'd say all of that is correct, although I cannot confirm the latter point, my understanding is that is not the case. The key is that Leopard wants to turn these membership calculations into O(1), and keeping a tree wouldn't give you that.
b
that's the part that confused me, for a system that's meant to index group membership, the
Cardinality(Intersection(Member2Group(U), Group2Group(G))) != 0
seems quite compute heavy
v
that's one of the challenges, yeah
e
this section describes how "leopard tuples" are stored at least at a high level: https://zanzibar.tech/2QaCOtx-it:0.1b4FuVweU:H they say they're using a "skiplist-like" structure for fast intersections, but the same structure may be used to seek to a specific parent group in O(logn). or they may build separate indexes on top, it's not really spelled out. but either way I would assume they can find the spanning regions of tuples for a given parent group (for group2group) and given user (for member2group) and then use the fast intersection they mention to compute whether the intersection is non-zero so if there are N entries, assume a logN seek to find the parent group, a logN seek to find the user, and then a M time to compute the intersection where M=min of the two sets and M is much smaller than N (i.e. if N is millions, M might be dozens or hundreds) and don't forget this index is sharded, so there may be some cap to N so that the absolute time of a given intersection computation can be limited just by adding more shards.
18 Views