
The Debian init system general resolution returns


Posted Oct 18, 2014 15:07 UTC (Sat) by ctpm (guest, #35884)
In reply to: The Debian init system general resolution returns by cortana
Parent article: The Debian init system general resolution returns

>Hardware watchdogs can be used to reboot the system if either systemd or the kernel stop responding. You can read more about these features at http://0pointer.de/blog/projects/watchdog.html.

Yes, hardware watchdogs have been around for a very very long time and are obviously useful when a *kernel* stops responding or there is no process running in userspace to let you in and recover the system manually. And that is where HW watchdog use is justified -- as a last resort.
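For reference, the Linux watchdog device interface behind such a last-resort setup is small. A process opens the device and writes to it periodically. If the writes stop before the driver's timeout expires, the hardware resets the machine. The following is a minimal Python sketch. It assumes a driver exposed at `/dev/watchdog` and uses the optional "magic close" character `V` to disarm the timer on a clean exit. `pet_watchdog` and its parameters are hypothetical names chosen for illustration.

```python
import os
import time


def pet_watchdog(path: str = "/dev/watchdog", pings: int = 3,
                 interval_s: float = 1.0, magic_close: bool = True) -> None:
    """Keep a Linux watchdog device alive for `pings` intervals.

    Opening the device arms the hardware timer; every write resets it. If the
    writer hangs or the kernel locks up, the timer expires and the hardware
    resets the machine.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        for _ in range(pings):
            os.write(fd, b"\0")  # keepalive: any write resets the timer
            time.sleep(interval_s)
        if magic_close:
            # Magic close: writing 'V' just before close() asks the driver to
            # stop the timer (no effect when the driver runs in nowayout mode).
            os.write(fd, b"V")
    finally:
        os.close(fd)
```

In practice the pinging process is a daemon that runs until shutdown. The `pings` argument exists only so the sketch terminates.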

But if your failure is caused by a bug in a potentially complex piece of software like systemd, and restarting that service alone might recover it, why would you want to bring the system down with mounted filesystems and everything? Think of a server exporting LUNs over the network, or a distributed FS shard. That spontaneous reboot may cost quite a lot in recovery for the whole system. Why would you choose to nuke the system as a first option, when at least a good portion of it might still be doing useful work?

So IOW, I'm not saying that the service monitoring and recovery features in systemd are useless. I'm saying that the complex logic and state machines that handle them should run in a child of PID 1, not in PID 1 itself.
Yes, PID 1 has the advantage of adopting every orphan in the system, but the cost of a failure there can be quite high. I think a more useful approach would be to work with the kernel people to solve the waiting and re-parenting problem, instead of just overloading PID 1, hoping for the best, and hoping that nobody minds being restarted by the HW watchdog. It's not as if the kernel isn't already getting code whose main user will be systemd anyway.
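The split proposed above can be sketched in a few lines: a supervisor runs as an ordinary child of PID 1, restarts its service when the service fails, and reaps orphans itself. This is a minimal Python sketch that assumes Linux. It uses `prctl(PR_SET_CHILD_SUBREAPER)`, available since Linux 3.4, which is one existing kernel mechanism that lets a process other than PID 1 adopt orphaned descendants. `supervise`, `reap_orphans`, and the restart policy are hypothetical.

```python
from __future__ import annotations

import ctypes
import os
import subprocess

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>


def become_subreaper() -> None:
    """Ask the kernel to re-parent orphaned descendants to this process
    instead of to PID 1."""
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_SET_CHILD_SUBREAPER, ctypes.c_ulong(1), 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def reap_orphans() -> None:
    """Collect exit statuses of any adopted orphans without blocking."""
    while True:
        try:
            pid, _ = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children left at all
        if pid == 0:
            return  # children exist, but none have exited yet


def supervise(argv: list[str], max_restarts: int = 3) -> int:
    """Run argv, restarting it after each non-zero exit, at most
    max_restarts times. Returns how many times the service was started."""
    become_subreaper()
    starts = 0
    while starts <= max_restarts:
        starts += 1
        result = subprocess.run(argv)
        reap_orphans()
        if result.returncode == 0:
            break  # clean exit: nothing to recover
    return starts
```

If the supervisor itself crashes, its children are re-parented to PID 1 and keep running. A deliberately simple PID 1 would then only need to respawn the supervisor.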



The Debian init system general resolution returns

Posted Oct 18, 2014 18:43 UTC (Sat) by peter-b (subscriber, #66996)

systemd uses the hardware watchdog to check if systemd stops responding. systemd then provides watchdog functionality to services. So if a service stops responding, systemd restarts the service, and if systemd stops responding (and thus stops monitoring services), the hardware watchdog restarts the system.
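The service-side half of that chain can be made concrete with a minimal Python sketch of systemd's notification protocol. For a unit configured with `WatchdogSec=`, the service sends the datagram `WATCHDOG=1` to the Unix socket named in `$NOTIFY_SOCKET`, at least once per `$WATCHDOG_USEC`. The systemd documentation recommends pinging at half that interval. C programs would normally call `sd_notify()` from libsystemd instead. The helper names below are hypothetical.

```python
from __future__ import annotations

import os
import socket
import time


def sd_notify(state: bytes) -> bool:
    """Send one state datagram to the service manager via $NOTIFY_SOCKET.

    Returns False when the process was not started by a notify-aware manager.
    """
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    path = os.fsencode(addr)
    if path.startswith(b"@"):
        path = b"\0" + path[1:]  # '@' denotes a Linux abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(state, path)
    return True


def watchdog_interval_s() -> float | None:
    """Return half of $WATCHDOG_USEC in seconds, or None if the watchdog
    is not enabled for this process."""
    usec = os.environ.get("WATCHDOG_USEC")
    pid = os.environ.get("WATCHDOG_PID")
    if not usec or (pid and int(pid) != os.getpid()):
        return None
    return int(usec) / 2_000_000


def serve(iterations: int) -> None:
    """Hypothetical service loop: do work, then tell systemd we are alive."""
    interval = watchdog_interval_s()
    for _ in range(iterations):
        # ... one unit of real work would go here ...
        # If pings stop arriving, systemd marks the unit failed;
        # Restart= decides whether it is restarted.
        sd_notify(b"WATCHDOG=1")
        if interval:
            time.sleep(interval)
```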

I'm not sure why you consider it unreasonable to put functions for starting and monitoring services into a process whose sole role and reason for being is to start and monitor services...


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds