# spicedb
r
Hello Guys. I have a slight problem in production where my pods are restarting due to an `invalid memory address or nil pointer dereference` 🧵
I had a look at the logs and at memory / CPU, and all looked normal
y
that's unexpected. what version of the client are you using and what version of SpiceDB are you on? were things working and then the problem started manifesting?
r
SpiceDB v1.44.0; the client is Java gRPC (not sure if that's super interesting). I will have a look at the logs for the last question
y
it might be 🤔
do you know the request where it's happening and how you're constructing that request?
r
the first occurrence was on 06/13. Around that date we have an associated release, one day earlier, with:
feat: bump spice db from v1.39.1 to v1.44.0
y
iiinteresting
r
@yetitwo it wouldn't be my first suspicion, as we have been using SpiceDB for 3 years now. But it is standard generated code, using Kotlin; we have a 500ms timeout (server side on Java), and from Java to SpiceDB we have 100-50ms timeouts with 3 retries (for CheckPermission), roughly as in the sketch below
if this rings any bells I can pursue anything you feel suspicious about
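A simplified sketch of what that call looks like, assuming the generated `PermissionsServiceGrpc` stubs from authzed-java; the target, object types, IDs, and the plaintext channel are placeholders rather than our production setup:

```kotlin
import com.authzed.api.v1.CheckPermissionRequest
import com.authzed.api.v1.CheckPermissionResponse
import com.authzed.api.v1.Consistency
import com.authzed.api.v1.ObjectReference
import com.authzed.api.v1.PermissionsServiceGrpc
import com.authzed.api.v1.SubjectReference
import io.grpc.ManagedChannelBuilder
import io.grpc.Status
import io.grpc.StatusRuntimeException
import java.util.concurrent.TimeUnit

// Illustrative channel setup; the real service uses TLS and a bearer token.
val channel = ManagedChannelBuilder.forTarget("spicedb:50051").usePlaintext().build()
val stub = PermissionsServiceGrpc.newBlockingStub(channel)

val request = CheckPermissionRequest.newBuilder()
    .setConsistency(Consistency.newBuilder().setMinimizeLatency(true))
    .setResource(ObjectReference.newBuilder().setObjectType("document").setObjectId("doc-1"))
    .setPermission("view")
    .setSubject(
        SubjectReference.newBuilder()
            .setObject(ObjectReference.newBuilder().setObjectType("user").setObjectId("alice"))
    )
    .build()

// Per-attempt deadline of ~100ms with up to 3 attempts on timeout,
// keeping the client-side worst case under the 500ms server-side timeout.
fun checkWithRetries(): Boolean {
    repeat(3) { attempt ->
        try {
            val response = stub
                .withDeadlineAfter(100, TimeUnit.MILLISECONDS)
                .checkPermission(request)
            return response.permissionship ==
                CheckPermissionResponse.Permissionship.PERMISSIONSHIP_HAS_PERMISSION
        } catch (e: StatusRuntimeException) {
            // Retry only on deadline expiry; rethrow on the last attempt.
            if (e.status.code != Status.Code.DEADLINE_EXCEEDED || attempt == 2) throw e
        }
    }
    return false
}
```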
y
it'd help to know which method is triggering it, or if all of them are
r
if you would like me to "downgrade SpiceDB" I can perhaps pursue that avenue as well to test
ah I wish I knew 🤔 (we are heavy users) is there a way to include a traceid by default in the logs?
on the SpiceDB side?
SpiceDB OTel is working fine, it's just not outputting a traceid for me to correlate requests. I can perhaps check from timestamps, but I wouldn't be sure
y
hmm yeah i don't know that we have that implemented
wait
it seems like there should be something in the logs that can be used to correlate
are there request failures associated with this, or are the panics happening out-of-band? i'm looking at the trace and it seems like it's related to trace export which should be async
m
Hey @Rodolfo thanks for reporting this! Are you able to share the schema that you are using?
r
@yetitwo oops, my bad 😄 We have traceIds on normal messages, just not on that system error. @mparnisari I do not think this problem comes from a request to SpiceDB. I believe it is internal span processing that the trace points towards:
```
/home/runner/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.35.0/trace/batch_span_processor.go:115 +0x2d9
```
This is probably the reason I do not have a traceId for that log line, as it comes from an internal job (probably from the OTEL lib). I already went through a similar problem around 2 years ago, where a bug in OTEL instrumentation was causing my application to crash (it happened to anyone that was not using OTEL metrics, as they were still being collected until an OOM). These are our configured parameters:
- `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`
- `SPICEDB_OTEL_PROVIDER`
- `SPICEDB_OTEL_SERVICE_NAME`
- `SPICEDB_OTEL_TRACE_PROPAGATOR`
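For comparison, this is roughly the trace pipeline those knobs configure, sketched with the OpenTelemetry Java SDK in Kotlin (an analogy only, not SpiceDB's own Go setup; the endpoint and service name are placeholders):

```kotlin
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator
import io.opentelemetry.context.propagation.ContextPropagators
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

fun buildOpenTelemetry(): OpenTelemetrySdk {
    // OTLP exporter pointed at the collector (placeholder endpoint),
    // the analogue of OTEL_EXPORTER_OTLP_TRACES_ENDPOINT.
    val exporter = OtlpGrpcSpanExporter.builder()
        .setEndpoint("http://otel-collector:4317")
        .build()

    // Finished spans are queued here and flushed by a background worker,
    // so export is asynchronous with respect to the request path
    // (the same batch-processor pattern as the Go SDK frame in the panic).
    val spanProcessor = BatchSpanProcessor.builder(exporter).build()

    val tracerProvider = SdkTracerProvider.builder()
        .addSpanProcessor(spanProcessor)
        .setResource(
            Resource.getDefault().merge(
                Resource.create(
                    Attributes.of(AttributeKey.stringKey("service.name"), "my-service")
                )
            )
        )
        .build()

    // W3C tracecontext propagation, analogous to SPICEDB_OTEL_TRACE_PROPAGATOR.
    return OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)
        .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
        .build()
}
```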
I upgraded to 1.44, let's see if that helps
m
y
yeah the trace pointing at our protobuf code makes it sound like it's trying to include something from the dispatch call as context and then failing to marshal it because it's a nil value
y'all aren't using the `CountRelations` API, are you?