GNOME and/or systemd

Posted Nov 3, 2012 2:45 UTC (Sat) by HelloWorld (guest, #56129)
In reply to: GNOME and/or systemd by nix
Parent article: GNOME and/or systemd

> And I don't care. In my entire professional life I have never ever been bitten by this problem.
Well, I guess you hadn't been bitten by the ext4 problem either until recently. And besides, even if you haven't been, others have; see Cyberax's comment.

> Er, if that's happened your network interfaces have almost certainly been shut down as well. How on earth is systemd supposed to help here?
systemd helps because, unlike sysvinit, it can reliably terminate a service so you don't have to mess around with kill(1) in the first place.

> You don't get it. They don't fail for me.
Well, lucky you. They do fail for others. I've had at least one boot failure due to a broken init script myself. And it broke for others too, see Scott's blog entry.

> but it is more unstable than sysvinit PID 1,
So you keep saying. But where are all those bugs that supposedly crash systemd all the time? I'm not from Missouri, but you'll have to show me anyway.

Besides, as Cyberax and raven667 have pointed out, there's not actually a whole lot going on in systemd itself; most of the recent activity is in systemd-journald and systemd-logind, neither of which runs as PID 1.


GNOME and/or systemd

Posted Nov 3, 2012 10:06 UTC (Sat) by rleigh (subscriber, #14622)

> > but it is more unstable than sysvinit PID 1,
> So you keep saying. But where are all those bugs that supposedly crash
> systemd all the time? I'm not from Missouri, but you'll have to show me
> anyway.

Code has bugs, and the number of bugs increases as a function of the code size. systemd is much bigger than sysvinit, with a correspondingly larger probability of hitting such a bug. init is absolutely critical, and having it kept as tiny and simple as possible is essential for a reliable system.

This is not to say that systemd can't be large and complex, just that the complexity should not be in PID1. There's no reason why systemd PID1 can't be as small and simple as sysvinit, with the rest in other processes. As a good example, look at s6, which has focussed on reliability to an even greater extent. There is no reason why systemd and other init systems couldn't adopt this approach.

At present, running a safety-critical or guaranteed reliable system with systemd is an untenable proposition. The risk of failure is too high. This isn't about "bugs that supposedly crash systemd"--it doesn't matter if any have been found or not. It's about the fact that a fault in PID1 will bring the system down, and managing that risk. systemd is a much greater risk than sysvinit. It's more reliable in other ways, as discussed in the thread. But that improvement is irrelevant so long as systemd PID1 remains a critical point of failure that is impossible to validate for correctness.

Regards,
Roger

GNOME and/or systemd

Posted Nov 3, 2012 11:39 UTC (Sat) by HelloWorld (guest, #56129)

> Code has bugs, and the number of bugs increases as a function of the code size. systemd is much bigger than sysvinit, with a correspondingly larger probability of hitting such a bug.
Great! Show one!

GNOME and/or systemd

Posted Nov 3, 2012 11:44 UTC (Sat) by rleigh (subscriber, #14622)

> > Code has bugs, and the number of bugs increases as a function of the
> > code size. systemd is much bigger than sysvinit, with a correspondingly
> > larger probability of hitting such a bug.
> Great! Show one!

This is unhelpful, and completely ignores what I said.

GNOME and/or systemd

Posted Nov 3, 2012 11:58 UTC (Sat) by HelloWorld (guest, #56129)

> This is unhelpful, and completely ignores what I said.
That's because there's nothing in your comment that hadn't been said already, bold formatting notwithstanding.

And again, if you care so much about reliability, the elephant in the room is the Linux kernel. If that is an acceptable risk, then so is systemd.

GNOME and/or systemd

Posted Nov 3, 2012 12:19 UTC (Sat) by rleigh (subscriber, #14622)

These points have been made before. This does not alter their validity.

Arguing that reliability in a critical system component is not of concern is absurd. Yes, the Linux kernel can contain bugs. If you do care about reliability, you'll take steps to mitigate the chance of failure. This is a completely separate issue to the robustness and reliability of PID1; there's no need to confuse the discussion by mixing them together.

Ignoring the fact that this is an important concern does not do systemd any favours. systemd could certainly be changed to move the vast majority of the complexity in PID1 to a separate program running in a separate process. There's no intrinsic need for anything to be in PID1 except process reaping, starting/restarting another process and handling shutdown. Everything else can be done in another process; even shutdown--you can just exec the shutdown program. Take a look at how s6 is structured--it has a lot to be said for it, and there's no reason why systemd can't do this.

Regards,
Roger
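
The division of labour described above (PID 1 only reaps children and keeps one other process running, and everything else lives in that other process) can be sketched in a few lines. This is a minimal illustration in Python, not how sysvinit, s6 or systemd are actually written. The names `spawn` and `run_minimal_init` and the `max_restarts` bound are invented for the example; a real PID 1 would loop forever and would also handle shutdown signals.

```python
import os
import sys


def spawn(argv):
    """Fork and exec argv; return the child's PID to the parent."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec itself failed
    return pid


def run_minimal_init(supervisor_argv, max_restarts):
    """Reap every child and restart the supervisor whenever it exits.

    Returns the supervisor's exit statuses (negative = killed by that signal).
    max_restarts is an assumption of this sketch so that it terminates.
    """
    statuses = []
    child = spawn(supervisor_argv)
    while True:
        pid, status = os.wait()  # blocks until any child exits, reaping it
        if pid != child:
            continue  # some other child: reaping it is all we do
        if os.WIFEXITED(status):
            statuses.append(os.WEXITSTATUS(status))
        else:
            statuses.append(-os.WTERMSIG(status))
        if len(statuses) > max_restarts:
            return statuses
        child = spawn(supervisor_argv)  # the complex part crashed; start it again


if __name__ == "__main__":
    # Stand-in "supervisor" that fails immediately with exit status 3.
    failing = [sys.executable, "-c", "raise SystemExit(3)"]
    print(run_minimal_init(failing, max_restarts=2))
```

The loop itself never depends on what the supervised process does: however that process dies, the parent only records the status and starts it again, which is the isolation property being argued for here.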

GNOME and/or systemd

Posted Nov 5, 2012 1:04 UTC (Mon) by HelloWorld (guest, #56129)

Yeah well, you have a point, an absolutely minimal PID 1 will probably have fewer bugs than systemd.

Anyway, I think the risk is tolerable. I have never seen systemd crash on the systems I've been using. Also, systemd doesn't just abort on SIGSEGV, it serializes its state and then execs itself anew. The code to do that is used in other places too (i.e. configuration reloading and reboot-less upgrades), so it's not some obscure code path that is never tested.

You'd have to be very unlucky to hit a bug that makes systemd crash and corrupts its internal state enough for the recovery mechanism to fail.
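
For readers unfamiliar with the "serialize state, then exec yourself anew" pattern mentioned above, here is a rough sketch of the general idea in Python. It says nothing about when systemd actually does this or what format it uses: the JSON state, the `generation` field, and the names `RESTORE`, `reexec_with_state` and `demo_in_child` are all assumptions made for the example.

```python
import json
import os
import sys
import tempfile

# Code run by the fresh process image: restore state from the inherited
# file descriptor (passed as an argument) and report the next generation.
RESTORE = (
    "import json, os, sys\n"
    "with os.fdopen(int(sys.argv[1])) as f:\n"
    "    state = json.load(f)\n"
    "print(state['generation'] + 1)\n"
)


def reexec_with_state(state):
    """Write state to an anonymous temp file, then replace this process image."""
    f = tempfile.TemporaryFile(mode="w+")
    json.dump(state, f)
    f.flush()
    f.seek(0)  # the file offset is shared with the post-exec process
    fd = f.fileno()
    os.set_inheritable(fd, True)  # keep the descriptor open across exec
    os.execv(sys.executable, [sys.executable, "-c", RESTORE, str(fd)])


def demo_in_child(state):
    """Run reexec_with_state in a forked child and return what it printed."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        try:
            os.dup2(w, 1)  # child's stdout goes to the pipe
            reexec_with_state(state)
        finally:
            os._exit(127)  # only reached if exec failed
    os.close(w)
    with os.fdopen(r) as out:
        data = out.read()
    os.waitpid(pid, 0)
    return data.strip()


if __name__ == "__main__":
    print(demo_in_child({"generation": 4}))
```

The PID stays the same across `execv`, which is why the pattern suits a process that must never exit; the cost is that the state written out has to be intact enough for the new image to read back.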

GNOME and/or systemd

Posted Nov 5, 2012 20:22 UTC (Mon) by nix (subscriber, #2304)

> Also systemd doesn't just abort on SIGSEGV, it serializes its state and then execs itself anew.
Now that is neat. (There's still the faint possibility of corrupted state causing a loop of endless crashes, but that's not as bad as a panic.)

GNOME and/or systemd

Posted Nov 5, 2012 20:50 UTC (Mon) by jimparis (subscriber, #38647)

> > Also systemd doesn't just abort on SIGSEGV, it serializes its state and then execs itself anew.
> Now that is neat.
It also appears to be completely untrue. The way I read it, systemd will (optionally) dump core, (optionally) switch VTs, (optionally) spawn an emergency shell, and (unconditionally) freeze.

GNOME and/or systemd

Posted Nov 5, 2012 22:12 UTC (Mon) by raven667 (subscriber, #5198)

That seems to be correct. There is logic in there for serializing state and re-execing itself which, if I am reading correctly, is part of the startup process, so maybe the OP thought that was part of the crash recovery process. It seems that there is some infrastructure such that the described recovery _could_ be attempted, in the same fashion that it drops to /bin/sh on SIGSEGV, SIGILL, SIGFPE, SIGBUS, SIGQUIT and SIGABRT.

GNOME and/or systemd

Posted Nov 6, 2012 0:23 UTC (Tue) by HelloWorld (guest, #56129)

Uh, yes, sorry, I had misunderstood what someone told me in systemd's IRC channel.

Anyway, if you configure systemd to spawn a shell, you can exec systemd from there, so not everything is lost.

GNOME and/or systemd

Posted Nov 3, 2012 20:32 UTC (Sat) by Cyberax (✭ supporter ✭, #52523)

> tiny and simple as possible is essential for a reliable system.
Yet init(1) is not reliable. Never has been, never will be.

What are you going to do now?

GNOME and/or systemd

Posted Nov 4, 2012 1:16 UTC (Sun) by rleigh (subscriber, #14622)

This really isn't about "init vs systemd", it's about the complexity and robustness of PID1, whatever the init system in use might be.

There is a difference between the reliability of PID1 (e.g. /sbin/init) and the reliability of the programs run by that init such as rc (/etc/init.d/rc) for runlevel change, getty, and then e.g. individual init scripts run by rc/startpar.

In the case of sysvinit, init itself is small, simple and robust. It does little more than run rc on runlevel change, respawn gettys and handle a few other events such as shutdown signals. There is nothing stopping systemd, or a systemd-like complex init, from running as a respawnable service started directly by init (like getty), layering the more complex stuff on top of an ultra-simple PID1. This is partly what openrc does, building a more complex dependency-based boot on top of sysvinit.

The point here is that a bug in rc or getty will not kill init. And a bug in an init script will not kill rc. PID1 will carry on running, as will your system, if there is a bug in one of these higher level layers. Even in the case of sysvinit, there is scope to strip down PID1 even further--the runlevel change and service respawning could be moved into a separate process, as could shutdown.

While systemd does split some functionality out into additional binaries, the chance of a bug compromising PID1 functioning is still much, much higher. Upstart is in a similar situation. Neither of these /need/ to have the complexity directly in PID1.

GNOME and/or systemd

Posted Nov 4, 2012 1:33 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

Well, in this case the sysvinit PID1 is !@#*&^!*& unreliable. It can't even kill processes robustly. It can't start them robustly as well - I've had more than one hangup during startup.

> The point here is that a bug in rc or getty will not kill init. And a bug in an init script will not kill rc. PID1 will carry on running, as will your system, if there is a bug in one of these higher level layers.
Yeah. It's kinda like old classic cars with a thick steel frame - they can run just fine after a collision. You just scrape the driver off the steering wheel, replace the glass and your descendants can drive it as if nothing has happened!

What use is a robust PID1 if it can't do ANYTHING reliably?

GNOME and/or systemd

Posted Nov 4, 2012 12:27 UTC (Sun) by nix (subscriber, #2304)

> Well, in this case the sysvinit PID1 is !@#*&^!*& unreliable. It can't even kill processes robustly. It can't start them robustly as well - I've had more than one hangup during startup.
You seem to misunderstand what I want of PID 1. Its job is to run an rc script when (in effect) the system starts up and shuts down and to reap processes. *Nothing else* (even forking gettys is really something else's job). Killing processes is killall's job. Monitoring processes is something else's job. Starting processes without hanging up is the job of some scripty thing or something. It is very definitely *not* the job of PID 1, since that is complexity that *can* be somewhere else, and thus *should* be somewhere else, rather than in the one process in the system whose death causes an instant kernel panic.

You keep on giving complaints about sysvinit that have nothing to do with PID 1 robustness, which is my primary concern when choosing an init implementation. sysvinit never fails to reap zombies: it never fails to run its single rc script per runlevel change (those scripts might later hang, but that is not PID 1's fault). It never, ever dies.

I would be happy with systemd were its PID 1 incredibly simple and never changing and all the work done by something else (which can change as often as it likes without causing instant kernel panics if it goes wrong). But instead its PID 1 is more of a kitchen sink than I'd like. Even sysvinit PID 1 really does too much: I'm definitely going to have a look at s6 and see if it has moved things like process supervision to some other binary. PID 1 should not do this job.

GNOME and/or systemd

Posted Nov 4, 2012 17:25 UTC (Sun) by Cyberax (✭ supporter ✭, #52523)

If you limit systemd to only running SysV scripts then systemd is incredibly stable. I don't think there has been a bug in sysv-compat for a long, long time.

Yet it makes absolutely NO sense to view PID1 functionality in isolation. It can't do anything by itself, and any script it runs becomes mission-critical. It's easy to make a non-bootable (or non-haltable) system by making a small mistake in a myriad of twisty [not so] little scripts. And it is no freaking consolation that PID1 itself worked fine.

A car analogy: sysv is a metal cube with thick metal walls. It's very safe (since it can't move) and simple. But to make it actually do anything useful, you need to add wheels, an engine, a steering system, windows and windshields, etc. And in the end it turns out that a cube on wheels doesn't really work as a car and isn't safe anymore.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds