# spicedb
m
@ecordell we just went to upgrade to v1.24.0 with the latest operator - did any of the configuration for connection limits change?
The upgrade with the same configuration settings we had before led to a large, sustained spike in database connection usage
j
from which version?
m
v1.21.0
we are using
SPICEDB_DATASTORE_CONNPOOL_READ_MAX_OPEN
and similar for write
has that changed since 1.21?
also using postgres
It does not appear to be respecting those values
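For reference, a minimal sketch of how those env vars would typically be set on the SpiceDB container; the manifest shape and values below are illustrative, not taken from this thread (the write-side name is inferred by analogy with the read-side one):

```yaml
# Illustrative only: the pre-1.24 env var spellings reported above,
# with hypothetical values. These are the "backwards-compatible" names
# that 1.24's env var parsing reportedly skips.
env:
  - name: SPICEDB_DATASTORE_CONNPOOL_READ_MAX_OPEN
    value: "20"
  - name: SPICEDB_DATASTORE_CONNPOOL_WRITE_MAX_OPEN   # inferred by analogy
    value: "10"
```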
j
I believe they changed at some point; @ecordell would know more
m
we got bit by this last time when it changed from
SPICEDB_DATASTORE_CONN_MAX_OPEN
to the ones split for read/write
j
yeah, that was the change to which I was referring
e
Try
SPICEDB_DATASTORE_CONN_POOL_READ_MAX_OPEN
/
datastoreConnPoolReadMaxOpen
we changed the names and kept the old ones around for backwards compatibility, but there is a bug in 1.24 where the env var parsing bypasses the backwards-compatible names
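A minimal sketch of that fix on a SpiceDBCluster resource, using the `datastoreConnPoolReadMaxOpen` key from the message above (the write-side key is inferred by analogy; the resource name and values are hypothetical):

```yaml
apiVersion: authzed.com/v1alpha1
kind: SpiceDBCluster
metadata:
  name: dev                          # hypothetical name
spec:
  config:
    datastoreEngine: postgres
    # Key named in this thread; value is hypothetical.
    datastoreConnPoolReadMaxOpen: "20"
    # Write-side key inferred by analogy; not confirmed in the thread.
    datastoreConnPoolWriteMaxOpen: "10"
  secretName: dev-spicedb-config     # datastore URI, preshared key, etc.
```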
m
Okay, I will try that now. Btw, we are only using the operator and deploying this directly to K8s. Is there any way we can ensure that these don't bite us next time? We ran into the same thing last time with an upgrade where the config names changed, and I'm not finding a way to ensure we are aware of it without trying to monitor this channel/the repo.
We aren't running authzed manually, so the deprecation warnings/etc are somewhat hidden from us unless we log into the pods
j
moving forward we're endeavoring to keep the config back-compat
but in this case that apparently didn't work
e
we can also address some of these directly in the operator, by adjusting the flags that get passed to spicedb to match the version
m
I would even prefer that the deployment fail if we pass configs that are invalid, but this one appears to have just ignored the old ones and used the default values, which put some strain on our available connections
Any method of having that immediate feedback and either stopping the deployment or warning in some way would be nice to have
e
That is good feedback. It's one drawback to using env vars for configuration: having extra ones around doesn't error out the way unknown CLI flags do. Do you check SpiceDB and/or operator release notes before upgrading? If we had put a clear message there, would you have seen it? Do you run a separate canary and/or staging instance? Or would that not have caught this because it's only noticeable at scale?
m
Yes, I normally check the SpiceDB and Operator release notes to ensure there are no breaking changes before we do an upgrade. We then bump our operator, ensure that it picks up the new version in the channel status, and then bump the cluster. This is actually our staging instance, so there was no prod outage here; I only raised the concern because we ran into the same issue on our last upgrade attempt, when the settings were split out to read/write (and I believe that change was not backwards-compatible). I'm not sure if it would be too much to put on the operator, but it would be interesting if it could leverage the status/upgrade channel logic to also flag deprecated config/env vars based on the new version - that could be a stretch, but I do monitor that status to ensure there isn't any crazy migration we need to be ready for
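As a rough sketch of the channel-based workflow described here, assuming the operator's `spec.channel`/`spec.version` fields (treat the field names and values as an approximation, not a verified manifest):

```yaml
apiVersion: authzed.com/v1alpha1
kind: SpiceDBCluster
metadata:
  name: dev                        # hypothetical name
spec:
  channel: stable                  # the operator follows this update channel
  version: v1.24.0                 # optional pin; omit to track the channel head
  config:
    datastoreEngine: postgres
  secretName: dev-spicedb-config
# The resolved and available versions the operator picks up are reported
# under .status, which is what's being monitored above for upcoming migrations.
```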