
vad

05/02/2023, 10:02 AM
Hi, we noticed during DB maintenance that SpiceDB 1.19 is not able to reconnect to PG after a failover. The fix was to restart the SpiceDB pod. Is this a known behaviour, or should we investigate further? We run on AWS using RDS with Multi-AZ.

vroldanbet

05/02/2023, 11:54 AM
That seems unexpected. Theoretically, if pgx detects that a connection to the backend is broken, it would remove it from the pool. I guess the usual node drain procedure for horizontally scalable databases like CRDB or Spanner does not work here (single primary), so SpiceDB has to "take the hit", but I would have expected it to eventually recover without manual intervention. I'd suggest opening an issue.
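For illustration only, here is a minimal sketch of the recovery behaviour described above: a pool that drops idle connections failing a health check and dials a fresh one, which would then reach the new primary. This is not pgx's implementation or API. `Conn`, `Pool`, `fakeConn`, `simulateFailover`, and the backend names `primary-a` / `primary-b` are all hypothetical and exist only for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// Conn is a hypothetical stand-in for a database connection.
type Conn interface {
	Ping() error
	Close() error
}

// Pool is a toy connection pool (NOT pgx's pool).
type Pool struct {
	idle []Conn
	// dial is invoked for every new connection, so a hostname-based
	// target would be resolved again after a failover.
	dial func() (Conn, error)
}

// Acquire returns a healthy connection, discarding idle ones that fail Ping.
func (p *Pool) Acquire() (Conn, error) {
	for len(p.idle) > 0 {
		c := p.idle[len(p.idle)-1]
		p.idle = p.idle[:len(p.idle)-1]
		if err := c.Ping(); err == nil {
			return c, nil
		}
		c.Close() // broken connection: drop it instead of reusing it
	}
	return p.dial()
}

// Release puts a connection back into the idle list.
func (p *Pool) Release(c Conn) { p.idle = append(p.idle, c) }

// fakeConn simulates a connection to a named backend (assumption for the demo).
type fakeConn struct {
	backend string
	alive   *bool
}

func (f *fakeConn) Ping() error {
	if *f.alive {
		return nil
	}
	return errors.New("connection reset")
}

func (f *fakeConn) Close() error { return nil }

// simulateFailover returns the backend reached before and after a failover.
func simulateFailover() (string, string) {
	aUp, bUp := true, true
	current, up := "primary-a", &aUp
	pool := &Pool{dial: func() (Conn, error) {
		return &fakeConn{backend: current, alive: up}, nil
	}}

	c1, _ := pool.Acquire()
	before := c1.(*fakeConn).backend
	pool.Release(c1)

	// Failover: the old primary goes away and the name points elsewhere.
	aUp = false
	current, up = "primary-b", &bUp

	c2, err := pool.Acquire()
	if err != nil {
		return before, ""
	}
	return before, c2.(*fakeConn).backend
}

func main() {
	before, after := simulateFailover()
	fmt.Println(before, "->", after)
}
```

In this toy model the stale connection is only detected when it is next acquired. A real pool may also check idle connections periodically or cap connection lifetime, so treat the sketch as a mental model rather than a description of what SpiceDB 1.19 does.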

vad

05/02/2023, 12:03 PM
I'll try to reproduce it and then open an issue. Thank you
I'm not able to reproduce the issue locally. I tried with a PG container that I moved to another IP (docker-compose down / up), with SpiceDB 1.19.0 pointed at a hostname defined in /etc/hosts (which I switched after the IP change). The main difference I can think of is that DNS resolution is not involved, but according to https://github.com/jackc/pgx/issues/913 it shouldn't be a problem.

vroldanbet

05/03/2023, 1:30 PM
Hmm, are you running the SpiceDB binary or the container for that test?
I'm asking in case the DNS resolution problems are related to the container base image (we use Chainguard images).

vad

05/03/2023, 2:45 PM
Container in prod (k8s), binary locally. Will try that.