
Perseus

03/14/2023, 5:52 AM
Continuing off this thread ^ I'm running into a weird problem while exposing SpiceDB behind an ALB on AWS - the ALB has an 'idle connection timeout' setting, which is 60s by default. My request flow is: [External Service] -> [ALB] -> SpiceDB Pod (EKS). What I noticed after setting up the ALB was that some requests on the external service were taking as long as the total idle connection timeout. By default that value is 60s, so some requests would take 60s to resolve and then get timed out. I increased it to 600s, and then those requests started taking 600s. Exposing SpiceDB through an ELB (EKS allocates an ELB by default for any LoadBalancer service) works fine. Not sure if anyone has experienced this (my assumption right now is that it's some odd gRPC behavior with ALBs)
This is the response time graph of the external service interacting with SpiceDB. The flattening at the end is when I switched back from ALB to ELB
It pretty much looked like every HTTP/2 connection would eventually run into this problem until a new one was established

vroldanbet

03/14/2023, 9:25 AM
I suspect connections may not be getting drained gracefully on either side of the load balancer. Your application could be attempting to use a connection that has since been closed by the load balancer. Are you using authzed's Go client? Have you configured your client application's connections to have a lifetime < 60s? This may also need some tweaking on the SpiceDB side, as I'm not sure we expose it - see https://pkg.go.dev/google.golang.org/grpc@v1.53.0/keepalive#ServerParameters
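A rough sketch of the client-side keepalive knobs this points at, using @grpc/grpc-js channel options; whether and how @authzed/authzed-node forwards these to the underlying channel depends on the client version, so treat the wiring as an assumption:

```typescript
// Sketch only: grpc-js channel options that keep the client's HTTP/2 connection
// from ever looking idle to the ALB, and that detect silently-dropped connections.
// Values are illustrative; how these get passed into the SpiceDB Node client is
// an assumption that depends on your @authzed/authzed-node version.
import { ChannelOptions } from '@grpc/grpc-js';

const channelOptions: ChannelOptions = {
  // Send an HTTP/2 keepalive PING every 30s, comfortably below the ALB's 60s
  // idle connection timeout, so the ALB never reaps the connection as idle.
  'grpc.keepalive_time_ms': 30_000,
  // If a PING isn't acknowledged within 5s, treat the connection as dead and
  // reconnect instead of letting calls hang until their deadline.
  'grpc.keepalive_timeout_ms': 5_000,
  // Keep pinging even when no RPCs are in flight.
  'grpc.keepalive_permit_without_calls': 1,
};

export default channelOptions;
```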

Perseus

03/14/2023, 10:01 AM
I'm using nodejs, so the node client
I haven't done any tweaking on the client gRPC settings - all defaults

vroldanbet

03/14/2023, 10:17 AM
I'd suggest starting with forcing a max connection lifetime on the client of, say, 59s, while keeping the ALB idle connection timeout at 60s. If the problem persists, then we need to look into the other side of the LB - SpiceDB
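As far as I know grpc-js doesn't expose a client-side max-connection-age option directly, so one blunt way to try the 59s idea is to rotate the client before the ALB's idle timeout can hit the connection. A sketch, with illustrative names (buildClient, getSpiceDBClient) and a placeholder endpoint/token:

```typescript
// Sketch of forcing a max connection lifetime below the ALB's 60s idle timeout
// by recreating the SpiceDB client. The endpoint, token handling, and skipping
// an explicit close() on the old client are all simplifications.
import { v1 } from '@authzed/authzed-node';

const MAX_CONNECTION_LIFETIME_MS = 59_000; // just under the ALB's 60s idle timeout

function buildClient() {
  // Placeholder endpoint and env-var token.
  return v1.NewClient(process.env.SPICEDB_TOKEN!, 'spicedb.example.com:443');
}

let client = buildClient();
let createdAt = Date.now();

// Hand out the current client, replacing it once it has lived ~59s so no single
// HTTP/2 connection outlives the ALB's idle timeout.
export function getSpiceDBClient() {
  if (Date.now() - createdAt > MAX_CONNECTION_LIFETIME_MS) {
    client = buildClient();
    createdAt = Date.now();
  }
  return client;
}
```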

yetitwo

03/14/2023, 1:53 PM
also in my experience ALBs support gRPC, but only just, and gRPC isn't designed to be run through a load balancer
a gRPC client wants to know about all of the nodes associated with a service because it does client-side load balancing
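For reference, the client-side load balancing described here looks roughly like this with grpc-js inside the cluster: point the client at a Kubernetes headless service (so DNS returns one record per SpiceDB pod) and switch the policy to round_robin. The service name below is made up:

```typescript
// Sketch: let the gRPC client see every SpiceDB pod and balance across them,
// instead of funnelling all traffic through a single connection to a load balancer.
import { ChannelOptions } from '@grpc/grpc-js';

// Channel options to pass when constructing the client (however your client
// library exposes them).
const channelOptions: ChannelOptions = {
  // Spread RPCs across all resolved addresses rather than the default
  // pick_first behaviour of pinning everything to one connection.
  'grpc.service_config': JSON.stringify({
    loadBalancingConfig: [{ round_robin: {} }],
  }),
};

// A dns:/// target against a headless service resolves to every pod IP,
// so no load balancer sits in the gRPC path at all.
const target = 'dns:///spicedb.spicedb.svc.cluster.local:50051';

export { channelOptions, target };
```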

Perseus

03/21/2023, 11:19 AM
managed to get request ids propagating from my application to SpiceDB - found that the requests weren't actually making their way to SpiceDB, so it's something between the application and the ALB

vroldanbet

03/21/2023, 11:20 AM
good tracing is here to help 😄

Perseus

03/21/2023, 11:21 AM
haha yeah, I think it'd be helpful to document how to propagate those ids as well - I looked through the SpiceDB source code to figure out that I need to be adding that data into the Metadata for each request - new to gRPC, so I wasn't aware of this
but no luck figuring out what it is between service -> ALB yet. I added 5s deadlines to each call, so now it gives a DEADLINE_EXCEEDED error for a few calls after 5s, but I'd like for this to not happen at all. Something to do with connection pooling or the like, I'd guess
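For anyone landing here later, this is roughly what attaching a request id and a 5s deadline looks like with the Node client. The 'x-request-id' metadata key and the (request, metadata, options, callback) call shape are what I'd expect from SpiceDB's middleware and the standard grpc-js generated client, but verify both against the source and your client version:

```typescript
// Sketch: propagate a request id to SpiceDB via gRPC metadata and cap each call
// at 5s. The metadata key, endpoint, and exact call signature are assumptions.
import { Metadata } from '@grpc/grpc-js';
import { v1 } from '@authzed/authzed-node';
import { randomUUID } from 'node:crypto';

const client = v1.NewClient(process.env.SPICEDB_TOKEN!, 'spicedb.example.com:443');

const metadata = new Metadata();
metadata.set('x-request-id', randomUUID()); // shows up in SpiceDB logs/traces

const request = v1.CheckPermissionRequest.create({
  resource: v1.ObjectReference.create({ objectType: 'document', objectId: 'doc1' }),
  permission: 'view',
  subject: v1.SubjectReference.create({
    object: v1.ObjectReference.create({ objectType: 'user', objectId: 'alice' }),
  }),
});

client.checkPermission(
  request,
  metadata,
  { deadline: new Date(Date.now() + 5_000) }, // surfaces DEADLINE_EXCEEDED after 5s
  (err, response) => {
    if (err) {
      console.error('check failed:', err.code, err.details);
      return;
    }
    console.log('permissionship:', response?.permissionship);
  },
);
```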

vroldanbet

03/21/2023, 12:54 PM
this is a good point, I'll open an issue in the docs repository to make this clearer
my best guess is this is related to stale connections in the pool caused by connection draining. The ALB, just like any service, will prune connections after a given lifetime. This is necessary in order to be able to perform operations on the ALB (think deploying a new version of the ALB). The reverse proxy terminates its side of the TCP connection, but the client doesn't notice, and the moment it goes to pick up the connection and use it, it's unusable. Perhaps the gRPC client is not able to surface this properly and instead returns a deadline error. A potential exercise would be to look into a way to get the connection pool to evict connections after a given lifetime. You could set it to something very low, say, 30s, and see if the problem continues