These points have been made before. This does not alter their validity.
Arguing that reliability in a critical system component is not of concern is absurd. Yes, the Linux kernel can contain bugs. If you do care about reliability, you'll take steps to mitigate the chance of failure. This is a completely separate issue to the robustness and reliability of PID1; there's no need to confuse the discussion by mixing them together.
Ignoring the fact that this is an important concern does not do systemd any favours. systemd could certainly be changed to move the vast majority of the complexity in PID1 to a separate program running in a separate process. There's no intrinsic need for anything to be in PID1 except process reaping, starting/restarting another process and handling shutdown. Everything else can be done in another process; even shutdown--you can just exec the shutdown program. Take a look at how s6 is structured--it has a lot to be said for it, and there's no reason why systemd can't do this.
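To make the split concrete, here is a minimal sketch (in Python for brevity) of the only loop such a PID1 would need: reap whatever exits, and restart the one process it supervises. The names `spawn` and `supervise` are invented for this illustration, and `max_restarts` exists only so the sketch terminates when run as an ordinary process. It is not how s6 or systemd is actually implemented.

```python
import os

def spawn(argv):
    """Fork and exec argv; return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    return pid

def supervise(argv, max_restarts):
    """Start argv, reap every child that exits, restart the supervised one.

    Returns the exit codes seen for the supervised process. A real PID1
    would loop forever; max_restarts is an assumption of this sketch.
    """
    exit_codes = []
    child = spawn(argv)
    while True:
        pid, status = os.wait()  # blocks; also reaps orphans reparented to us
        if pid != child:
            continue             # unrelated zombie: reaping it is all we owe it
        if os.WIFEXITED(status):
            exit_codes.append(os.WEXITSTATUS(status))
        else:
            exit_codes.append(-os.WTERMSIG(status))
        if len(exit_codes) > max_restarts:
            return exit_codes
        child = spawn(argv)
```

Everything else (unit parsing, dependency ordering, socket activation) would live in the supervised process, so a crash there costs one restart rather than a frozen PID1.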
Posted Nov 5, 2012 1:04 UTC (Mon) by HelloWorld (guest, #56129)
Yeah well, you have a point: an absolutely minimal PID 1 will probably have fewer bugs than systemd.
Anyway, I think the risk is tolerable. I have never seen systemd crash on the systems I've been using. Also systemd doesn't just abort on SIGSEGV, it serializes its state and then execs itself anew. The code to do that is used in other places too (i.e. configuration reloading and reboot-less upgrades), so it's not some obscure code path that is never tested.
You'd have to be very unlucky to hit a bug that makes systemd crash and corrupts its internal state enough for the recovery mechanism to fail.
GNOME and/or systemd
Posted Nov 5, 2012 20:22 UTC (Mon) by nix (subscriber, #2304)
> Also systemd doesn't just abort on SIGSEGV, it serializes its state and then execs itself anew.
Now that is neat. (There's still the faint possibility of corrupted state causing a loop of endless crashes, but that's not as bad as a panic.)
GNOME and/or systemd
Posted Nov 5, 2012 20:50 UTC (Mon) by jimparis (subscriber, #38647)
>> Also systemd doesn't just abort on SIGSEGV, it serializes its state and then execs itself anew.
> Now that is neat.
It also appears to be completely untrue. The way I read it, systemd will (optionally) dump core, (optionally) switch VTs, (optionally) spawn an emergency shell, and (unconditionally) freeze.
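As a rough illustration of that reading (not systemd's code; the function and flag names are made up here), the crash path amounts to three optional steps followed by one that always happens:

```python
def crash_actions(dump_core=True, switch_vt=False, spawn_shell=False):
    """Return, in order, what the crash handler does under this reading.

    The three flags are hypothetical stand-ins for whatever configuration
    controls each step. Note there is no re-exec step anywhere.
    """
    actions = []
    if dump_core:
        actions.append("dump core")
    if switch_vt:
        actions.append("switch VT")
    if spawn_shell:
        actions.append("spawn emergency shell")
    actions.append("freeze")  # unconditional
    return actions
```

Whatever the flags are set to, the list ends in "freeze" and never contains a state-serializing re-exec.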
GNOME and/or systemd
Posted Nov 5, 2012 22:12 UTC (Mon) by raven667 (subscriber, #5198)
That seems to be correct. There is logic in there for serializing state and re-execing itself which, if I am reading it correctly, is part of the startup process, so maybe the OP thought that was part of the crash recovery process. It seems that there is some infrastructure such that the described recovery _could_ be attempted, in the same fashion that it drops to /bin/sh on SIGSEGV, SIGILL, SIGFPE, SIGBUS, SIGQUIT, SIGABRT.
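For anyone unfamiliar with the general serialize-and-re-exec pattern, a minimal sketch follows. It assumes the state fits in a small JSON blob handed to the new process image through an inherited file descriptor. The function names, the JSON format and the `--deserialize` argument as used here are assumptions of the sketch, not a description of systemd's actual code.

```python
import json
import os
import sys
import tempfile

def serialize_state(state):
    """Write state to an unlinked temp file; return an fd that survives exec."""
    fd, path = tempfile.mkstemp()
    os.unlink(path)  # the file now lives only as long as the fd does
    with os.fdopen(fd, "wb", closefd=False) as f:
        f.write(json.dumps(state).encode())
    os.lseek(fd, 0, os.SEEK_SET)   # rewind so the reader starts at the top
    os.set_inheritable(fd, True)   # Python fds are close-on-exec by default
    return fd

def deserialize_state(fd):
    """Read the state back from the inherited fd and close it."""
    with os.fdopen(fd, "rb") as f:
        return json.loads(f.read().decode())

def reexec(state):
    """Replace this process image with a fresh copy, passing the state fd."""
    fd = serialize_state(state)
    os.execv(sys.executable,
             [sys.executable, sys.argv[0], "--deserialize", str(fd)])
```

The weak point nix raised applies to any scheme like this: if the state being serialized is already corrupt, the fresh process faithfully reloads the corruption.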
GNOME and/or systemd
Posted Nov 6, 2012 0:23 UTC (Tue) by HelloWorld (guest, #56129)
Uh, yes, sorry, I had misunderstood what someone told me in systemd's IRC channel.
Anyway, if you configure systemd to spawn a shell, you can exec systemd from there, so not everything is lost.