# spicedb
r
Hello Guys. I have a slight problem in production where my pods are restarting due to an `invalid memory address or nil pointer dereference` 🧵
I had a look at the logs and at memory / CPU, and all looked normal
y
that's unexpected. what version of the client are you using and what version of SpiceDB are you on? were things working and then the problem started manifesting?
r
SpiceDB v1.44.0; the client is Java gRPC (not sure if that's super interesting). I will have a look at the logs for the last question
y
it might be 🤔
do you know the request where it's happening and how you're constructing that request?
r
the first occurrence was on 06/13. Around that date we have an associated release, one day earlier, with:
feat: bump spice db from v1.39.1 to v1.44.0
y
iiinteresting
r
@yetitwo it wouldn't be my first suspicion, as we have been using SpiceDB for 3 years now. But it is standard generated code, using Kotlin; we have a 500ms timeout (server side on Java), and from Java to SpiceDB we have 100-50ms timeouts with 3 retries (for CheckPermission), roughly as in the sketch below
if this rings any bells I can pursue anything you feel suspicious about
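A simplified sketch of what that call looks like, assuming the generated `PermissionsServiceGrpc` stubs from authzed-java; the target, object types, IDs, and the plaintext channel are placeholders rather than our production setup:

```kotlin
import com.authzed.api.v1.CheckPermissionRequest
import com.authzed.api.v1.CheckPermissionResponse
import com.authzed.api.v1.Consistency
import com.authzed.api.v1.ObjectReference
import com.authzed.api.v1.PermissionsServiceGrpc
import com.authzed.api.v1.SubjectReference
import io.grpc.ManagedChannelBuilder
import io.grpc.Status
import io.grpc.StatusRuntimeException
import java.util.concurrent.TimeUnit

// Illustrative channel setup; the real service uses TLS and a bearer token.
val channel = ManagedChannelBuilder.forTarget("spicedb:50051").usePlaintext().build()
val stub = PermissionsServiceGrpc.newBlockingStub(channel)

val request = CheckPermissionRequest.newBuilder()
    .setConsistency(Consistency.newBuilder().setMinimizeLatency(true))
    .setResource(ObjectReference.newBuilder().setObjectType("document").setObjectId("doc-1"))
    .setPermission("view")
    .setSubject(
        SubjectReference.newBuilder()
            .setObject(ObjectReference.newBuilder().setObjectType("user").setObjectId("alice"))
    )
    .build()

// Per-attempt deadline of ~100ms with up to 3 attempts on timeout,
// keeping the client-side worst case under the 500ms server-side timeout.
fun checkWithRetries(): Boolean {
    repeat(3) { attempt ->
        try {
            val response = stub
                .withDeadlineAfter(100, TimeUnit.MILLISECONDS)
                .checkPermission(request)
            return response.permissionship ==
                CheckPermissionResponse.Permissionship.PERMISSIONSHIP_HAS_PERMISSION
        } catch (e: StatusRuntimeException) {
            // Retry only on deadline expiry; rethrow on the last attempt.
            if (e.status.code != Status.Code.DEADLINE_EXCEEDED || attempt == 2) throw e
        }
    }
    return false
}
```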
y
it'd help to know which method is triggering it, or if all of them are
r
if you would like me to "downgrade SpiceDB" I can perhaps pursue that avenue as well to test
ah I wish I knew 🤔 (we are heavy users) is there a way to include a traceid by default in the logs?
on the SpiceDB side?
SpiceDB OTel is working fine, it's just not outputting a traceid for me to correlate requests. I can perhaps check from timestamps, but I wouldn't be sure
y
hmm yeah i don't know that we have that implemented
wait
it seems like there should be something in the logs that can be used to correlate
are there request failures associated with this, or are the panics happening out-of-band? i'm looking at the trace and it seems like it's related to trace export which should be async
m
Hey @Rodolfo thanks for reporting this! Are you able to share the schema that you are using?
r
@yetitwo oops, my bad 😄 We have traceIds on normal messages, just not on that system error. @mparnisari I do not think this problem comes from a request to SpiceDB. I believe it is internal span processing that the trace points towards:
```
/home/runner/go/pkg/mod/go.opentelemetry.io/otel/sdk@v1.35.0/trace/batch_span_processor.go:115 +0x2d9
```
This is probably the reason I do not have a traceId for that log line, as it comes from an internal job (probably from the OTEL lib). I already went through a similar problem around 2 years ago, where a bug in OTEL instrumentation was causing my application to crash (it happened to anyone that was not using OTEL metrics, as they were still being collected until an OOM). These are our configured parameters:
- `OTEL_EXPORTER_OTLP_TRACES_ENDPOINT`
- `SPICEDB_OTEL_PROVIDER`
- `SPICEDB_OTEL_SERVICE_NAME`
- `SPICEDB_OTEL_TRACE_PROPAGATOR`
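For comparison, this is roughly the trace pipeline those knobs configure, sketched with the OpenTelemetry Java SDK in Kotlin (an analogy only, not SpiceDB's own Go setup; the endpoint and service name are placeholders):

```kotlin
import io.opentelemetry.api.common.AttributeKey
import io.opentelemetry.api.common.Attributes
import io.opentelemetry.api.trace.propagation.W3CTraceContextPropagator
import io.opentelemetry.context.propagation.ContextPropagators
import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter
import io.opentelemetry.sdk.OpenTelemetrySdk
import io.opentelemetry.sdk.resources.Resource
import io.opentelemetry.sdk.trace.SdkTracerProvider
import io.opentelemetry.sdk.trace.export.BatchSpanProcessor

fun buildOpenTelemetry(): OpenTelemetrySdk {
    // OTLP exporter pointed at the collector (placeholder endpoint),
    // the analogue of OTEL_EXPORTER_OTLP_TRACES_ENDPOINT.
    val exporter = OtlpGrpcSpanExporter.builder()
        .setEndpoint("http://otel-collector:4317")
        .build()

    // Finished spans are queued here and flushed by a background worker,
    // so export is asynchronous with respect to the request path
    // (the same batch-processor pattern as the Go SDK frame in the panic).
    val spanProcessor = BatchSpanProcessor.builder(exporter).build()

    val tracerProvider = SdkTracerProvider.builder()
        .addSpanProcessor(spanProcessor)
        .setResource(
            Resource.getDefault().merge(
                Resource.create(
                    Attributes.of(AttributeKey.stringKey("service.name"), "my-service")
                )
            )
        )
        .build()

    // W3C tracecontext propagation, analogous to SPICEDB_OTEL_TRACE_PROPAGATOR.
    return OpenTelemetrySdk.builder()
        .setTracerProvider(tracerProvider)
        .setPropagators(ContextPropagators.create(W3CTraceContextPropagator.getInstance()))
        .build()
}
```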
I upgraded to 1.44, let's see if that helps
m
y
yeah the trace pointing at our protobuf code makes it sound like it's trying to include something from the dispatch call as context and then failing to marshal it because it's a nil value
y'all aren't using the `CountRelations` API, are you?