Hello Authzed people!
We had something fun happen on our SpiceDB deployment yesterday: a random SIGSEGV 🥲
- spicedb v1.25.0
- This was happening at pod start. The pod kept restarting for exactly an hour, then it solved itself.
- The first instance of the crash was at pod start (probably k8s rescheduling a pod), not a running instance
- After a few restarts, k8s scheduled a different pod (on a different node): it had the same thing happen (so it's not a pod or node-specific problem)
- Started at 20:48 UTC, solved itself at 21:49 (5min backoff, previous failed restart was 21:44)
- When it solved itself, it happened to be just when k8s rescheduled the workload on a third pod. Could be a coincidence.
- The other running pods were fine (although maybe restarting them at that time would have made them go loopy too)
- This was not related to any deployment, we haven't deployed anything in a while. We've been running 1.25.0 for a while too. Never had that issue.
- Doesn't seem correlated with any weird traffic: this is a rather low-traffic time for us (and it was crashing at startup, before any traffic, anyway)
Any idea what this could have been? Has this been seen before?
https://cdn.discordapp.com/attachments/844600078948630559/1214870533242494976/image.png?ex=65faaf5a&is=65e83a5a&hm=2610599845de5425fbcfd64541f7590fe7b4deae45522f732cdd9ba95c6bd746&