Averting excessive oopses

Posted Dec 1, 2022 18:05 UTC (Thu) by farnz (subscriber, #17727)
In reply to: Averting excessive oopses by esemwy
Parent article: Averting excessive oopses

It's also worth noting that this is an optional mechanism; if your system is meant to be extremely long running, and to keep going through oopses, you'd turn this off. But for a use case like Twitter (or Amazon Web Services, or Facebook, or Google), this mechanism means that servers accumulating software faults will panic, and thus reboot, before things get serious - and you'll already have other monitoring that says "this server is rebooting frequently, take it out of service".

Averting excessive oopses

Posted Jan 5, 2023 14:50 UTC (Thu) by sammythesnake (guest, #17693)

There's a point where "very long running" bumps up against the mean time between kernel upgrades you don't want to miss. Longer running than that is probably not something to aspire to!

Provided the "cosmic ray" and "meh - non-critical subsystem" oopsen don't add up to 10,000 much more quickly than that, your reboot rate will probably be pretty much unaffected, I'd have thought.
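To put rough numbers on that, here's a back-of-the-envelope sketch. The 10,000 figure is the limit discussed above; the oops rates and the 90-day upgrade interval are made-up values purely for illustration, and the function names are my own rather than anything from the kernel.

```python
def days_until_oops_limit(oopses_per_day: float, limit: int = 10_000) -> float:
    """Days of uptime before the oops count reaches the limit."""
    if oopses_per_day <= 0:
        # A machine that never oopses never reaches the limit.
        return float("inf")
    return limit / oopses_per_day


def limit_forces_reboot(oopses_per_day: float,
                        upgrade_interval_days: float,
                        limit: int = 10_000) -> bool:
    """True if the limit would be hit before the next planned upgrade reboot."""
    return days_until_oops_limit(oopses_per_day, limit) < upgrade_interval_days


if __name__ == "__main__":
    # Hypothetical rates: an occasional stray oops versus a badly broken box.
    for rate in (10, 200):
        days = days_until_oops_limit(rate)
        forced = limit_forces_reboot(rate, upgrade_interval_days=90)
        print(f"{rate} oopses/day: limit reached after {days:.0f} days, "
              f"reboots before a 90-day upgrade cycle: {forced}")
```

With those assumed numbers, a machine would need to sustain well over a hundred oopses a day before the limit changed its reboot schedule at all.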

For my own use case, the most common cause of reboots is accidentally knocking the power cable loose while hunting for something that fell down the back of my desk, so this is all a little academic for me :-P