# spicedb
m
@ecordell we just went to upgrade to v1.24.0 with the latest operator - did any of the configuration for connection limits change?
The upgrade with the same configuration settings we had before led to a large, sustained spike in database connection usage
j
from which version?
m
v1.21.0
we are using
SPICEDB_DATASTORE_CONNPOOL_READ_MAX_OPEN
and similar for write
has that changed since 1.21?
also using postgres
It does not appear to be respecting those values
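For reference, a minimal sketch of how those env vars would typically be set on the SpiceDB container; the manifest shape and values below are illustrative, not taken from this thread (the write-side name is inferred by analogy with the read-side one):

```yaml
# Illustrative only: the pre-1.24 env var spellings reported above,
# with hypothetical values. These are the "backwards-compatible" names
# that 1.24's env var parsing reportedly skips.
env:
  - name: SPICEDB_DATASTORE_CONNPOOL_READ_MAX_OPEN
    value: "20"
  - name: SPICEDB_DATASTORE_CONNPOOL_WRITE_MAX_OPEN   # inferred by analogy
    value: "10"
```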
j
I believe they changed at some point; @ecordell would know more
m
we got bit by this last time when it changed from
SPICEDB_DATASTORE_CONN_MAX_OPEN
to the ones split for read/write
j
yeah, that was the change to which I was referring
e
Try
SPICEDB_DATASTORE_CONN_POOL_READ_MAX_OPEN
/
datastoreConnPoolReadMaxOpen
we changed the names and kept the old ones around for backwards compatibility, but there is a bug in 1.24 where the env var parsing bypasses the backwards-compatible names
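A minimal sketch of that fix on a SpiceDBCluster resource, using the `datastoreConnPoolReadMaxOpen` key from the message above (the write-side key is inferred by analogy; the resource name and values are hypothetical):

```yaml
apiVersion: authzed.com/v1alpha1
kind: SpiceDBCluster
metadata:
  name: dev                          # hypothetical name
spec:
  config:
    datastoreEngine: postgres
    # Key named in this thread; value is hypothetical.
    datastoreConnPoolReadMaxOpen: "20"
    # Write-side key inferred by analogy; not confirmed in the thread.
    datastoreConnPoolWriteMaxOpen: "10"
  secretName: dev-spicedb-config     # datastore URI, preshared key, etc.
```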
m
Okay, I will try that now. Btw, we are only using the operator and deploying this directly to K8s. Is there any way we can ensure that these don't bite us next time? We ran into the same thing last time with an upgrade where the config names changed, and I'm not finding a way to ensure we are aware of it without trying to monitor this channel/the repo.
We aren't running authzed manually, so the deprecation warnings/etc are somewhat hidden from us unless we log into the pods
j
moving forward we're endeavoring to keep the config back-compat
but in this case that apparently didn't work
e
we can also address some of these directly in the operator, by adjusting the flags that get passed to spicedb to match the version
m
I would even prefer that the deployment fail if we pass configs that are invalid, but this one appears to have just ignored the old ones and used the default values, which put some strain on our available connections
Any method of having that immediate feedback and either stopping the deployment or warning in some way would be nice to have
e
That is good feedback. It's one drawback to using env vars for configuration: having extra ones around doesn't error out the way unknown CLI flags do. Do you check SpiceDB and/or operator release notes before upgrading? If we had put a clear message there, would you have seen it? Do you run a separate canary and/or staging instance? Or would that not have caught this because it's only noticeable at scale?
m
Yes, I normally check the SpiceDB and Operator release notes to ensure there are no breaking changes before we do an upgrade. We then bump our operator, ensure that it picks up the new version in the channel status, and then bump the cluster. This is actually our staging instance, so there was no prod outage here; I only raised the concern because we ran into the same issue on our last upgrade attempt, when the settings were split out to read/write (and I believe that change was not backwards-compatible). I'm not sure if it would be too much to put on the operator, but it would be interesting if it could leverage the status/upgrade channel logic to also flag deprecated config/env vars based on the new version - that could be a stretch, but I do monitor that status to ensure there isn't any crazy migration we need to be ready for
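As a rough sketch of the channel-based workflow described here, assuming the operator's `spec.channel`/`spec.version` fields (treat the field names and values as an approximation, not a verified manifest):

```yaml
apiVersion: authzed.com/v1alpha1
kind: SpiceDBCluster
metadata:
  name: dev                        # hypothetical name
spec:
  channel: stable                  # the operator follows this update channel
  version: v1.24.0                 # optional pin; omit to track the channel head
  config:
    datastoreEngine: postgres
  secretName: dev-spicedb-config
# The resolved and available versions the operator picks up are reported
# under .status, which is what's being monitored above for upcoming migrations.
```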