Of course this goes to a General Resolution

Posted Feb 13, 2014 5:11 UTC (Thu) by mbt (guest, #81044)
In reply to: Of course this goes to a General Resolution by Cyberax
Parent article: The Debian technical committee vote concludes

Sigh. Let's look at some of the issues.
> Then you're in luck! Systemd has clearly defined external interfaces with a stability guarantee.
Stability is a good thing. What is not so fun is the great deal of disruption required to get to stability. Besides that, I'm really surprised that the systemd maintainers can be so sure that they've reached that point. That is certainly not to say that they haven't put in a huge amount of effort. I'm just saying that's a *lot* of public interface to be sure that you've got it set in stone.

Consider one package that certainly has the right to say that its API is in a really stable state: TeX. It's at version 3.14159265, a position to which it has taken more than 30 years to converge. TeX does just one thing--very, very well. It took more than 10 years to get into a reasonably fixed state. And that's from a guy who has a legendary concern with program correctness. Call me somewhat less impressed by systemd's stability guarantees.

Actually, the point I was raising was not about the external API, important as that is. I was looking at the major components (core process-starter/stopper, device-event handler, CGroups manager, logger, cron manager, login/session manager, power management, and others I have likely missed). Why are they not separable? I notice that FD.O has a page on minimal builds for systemd, but I have not found documentation on what you get by enabling or disabling some of the components and how well supported the resulting system would be.

> And again, you're in luck! Most of systemd dependencies are either trivial or can be turned off.
I notice the Gentoo ebuilds for systemd don't have USE flags to control whether to build components like logind. That is very telling since Gentoo loves configurability. How well does systemd work with some of its "optional" components missing? It would be nice to have some clarity.

The build process to get a standalone udev is painful. It is meant to be an indivisible part of systemd, yet it is (still) separable. To build just udev you have to build all of the core systemd, and then separate out just the udev parts. This also means hoping that sometime in the future they don't make a hard dependency between the actual systemd daemon and the udev component. This kind of hard dependency has already befallen logind: since systemd-205 the Gentoo developers have been unable to make a standalone logind because logind is now inextricably bound with the cgroups controller. Did there really *have* to be such a tight binding? I absolutely do not think so. All the same, if you want logind, you must allow systemd to assimilate you.

Re. the attack surface that systemd's many interfaces present:
> And so does the variety of tools that are used for all this stuff right now.
In current systems, an attacker might never be sure of the mix of components. Given the variety of init systems, loggers, cron daemons, power-control systems, and such in current Linux installations, a one-size-fits-all attack is harder to mount. Systemd, by contrast, stands for a monoculture of system components. Biology tells us a lot about the dangers of monocultures.

> Nope. Keeping the set of variants to the barest possible minimum is good. For example, Linus had resisted forking Linux into a 'server' and 'desktop' variants - with great results.
Linux manages a truly impressive thing. A single codebase works for generating kernels from the whole gamut from watches to supercomputers. There are no forks, but there are more than 3000 configuration options. The range of things you can get out of the kernel sources is well supported. I stand in awe of those developers.

Posted Feb 13, 2014 6:05 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

> I'm just saying that's a *lot* of public interface to be sure that you've got it set in stone.
Yup. And they're committed to supporting it. And we have assurances of that from Red Hat.

> Consider one package that certainly has the right to say that its API is in a really stable state: TeX.
Funny you mentioned it. It's a great example. The core is indeed very simple and robust, but modules that are required to do anything real are most certainly not.

I recently wanted to render my thesis, written in TeX in 2000. It turned out that a recent MiKTeX can no longer compile it, and I had to go and fix some of the modules used.

Oh, and all of the text is in the KOI8-R encoding, and it's not that trivial to make some of the new editors switch text encodings.

> Actually, the point I was raising was not about the external API, important as that is. I was looking at the major components (core process-starter/stopper, device-event handler, CGroups manager, logger, cron manager, login/session manager, power management, and others I have likely missed). Why are they not separable?
Why should they be? BTW, the power manager is separate.

> I notice the Gentoo ebuilds for systemd don't have USE flags to control whether to build components like logind. That is very telling since Gentoo loves configurability. How well does systemd work with some of its "optional" components missing? It would be nice to have some clarity.
Perfectly fine. I'm using it with only journald (and udev, of course) on embedded devices. I even removed localed and other auxiliaries.

> In current systems, an attacker might never be sure of the mix of components. Given the variety of init systems, loggers, cron daemons, power-control systems, and such in current Linux installations, a one-size-fits-all attack is harder to mount.
You need only one chink in the armor to get in. Are you sure that your syslog daemon is not subverted by the NSA?

> Systemd, by contrast, stands for a monoculture of system components. Biology tells us a lot about the dangers of monocultures.
Computing is not biology. And I'm a computational biologist.

> Linux manages a truly impressive thing. A single codebase works for generating kernels from the whole gamut from watches to supercomputers. There are no forks, but there are more than 3000 configuration options.
Yet the process scheduler has only a single implementation.

Posted Feb 13, 2014 7:52 UTC (Thu) by mathstuf (subscriber, #69389)

We all run a single kernel codebase and I'm sure spender or PaXTeam could tell you all about the problems you have running it before you stack stuff on top of it.

And besides, are you really arguing for keeping a pile of assorted stuff around just because attackers can't deal with it? Have you seen the stuff that gets worked around by people? Do you really expect every sysadmin to have such a diverse skillset just to deal with whatever heterogeneity their set of systems may have?

For logind and cgroups, logind uses a DBus API to talk to the single cgroup manager. What else can it do in the New World Order of cgroups? If you implement that API (like, say, systemd-shim), you can get away with not using systemd as PID 1.

Posted Feb 14, 2014 19:50 UTC (Fri) by smurf (subscriber, #17840)

> core process-starter/stopper

If you want to stop processes you need to be PID 1, otherwise you get a race condition.

> device-event handler,

Starting processes depends on events.

> CGroups manager,

When you start a server you need to put it in a CGroup.

> logger,

On-disk logging is separate. In-memory logging cannot be, since you want to start logging before root is even mounted.

> cron manager,

Because cron jobs don't just depend on the time of day, these days. They depend on being on AC power, or on the database running, or whatever. So best handled within the

> login/session manager,

Separate.

> power management

For instance, shutting down? init needs to cleanly stop all its children and then exec something that's in RAM, otherwise the root file system cannot be unmounted cleanly.

You seem to forget that all these jobs need to closely communicate with each other, need to run all the time, and need to serialize their internal state if they're to be updated without rebooting (i.e. you cannot just kill and restart your event handler and expect that everything is magically still hunky-dory).

So you either deal with five processes and their communication overhead and the subtle race conditions which are *certain* to crop up … or you put all of this into one single-threaded program which always has consistent local state which you can serialize safely if you want to re-exec yourself after an upgrade … and which you call "pid 1", necessarily.
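To make the "serialize and re-exec" idea concrete, here is a minimal sketch, assuming the whole state fits in one JSON document. It is not how systemd itself does it: the environment variable name `SKETCH_STATE_FD` and all four function names are made up for illustration. The state is written to an unlinked temporary file, the descriptor is marked inheritable, and the new image picks it up after `execv`.

```python
import json
import os
import sys
import tempfile

STATE_FD_ENV = "SKETCH_STATE_FD"  # hypothetical name, not a systemd interface


def save_state(state):
    """Write state to an unlinked temp file; return an fd that survives exec."""
    with tempfile.TemporaryFile() as tmp:
        tmp.write(json.dumps(state).encode())
        tmp.flush()
        fd = os.dup(tmp.fileno())  # the duplicate keeps the file alive
    os.lseek(fd, 0, os.SEEK_SET)   # rewind so the reader starts at the top
    os.set_inheritable(fd, True)   # Python fds are close-on-exec by default
    return fd


def load_state(fd):
    """Read the state back from the descriptor and close it."""
    with os.fdopen(fd, "rb") as f:
        return json.loads(f.read().decode())


def reexec(state):
    """Replace the running image with a fresh copy, handing over the state."""
    os.environ[STATE_FD_ENV] = str(save_state(state))
    os.execv(sys.executable, [sys.executable] + sys.argv)


def startup_state(default):
    """Call at startup: resume the inherited state, or start from default."""
    fd = os.environ.pop(STATE_FD_ENV, None)
    return default if fd is None else load_state(int(fd))
```

This only works safely because a single thread owns the state at the moment it is dumped, which is the point being made above. `reexec` assumes the program was started as a script, so that `sys.argv` can simply be replayed.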

In summary, can we please stop pretending that Lennart & Co. (a) put all of this into one huge binary that's /sbin/systemd, (b) cobbled the part that _is_ PID 1 together out of spite, or because they didn't know better? 'Cause, you know, it's quite obvious that they actually thought about this and had/have sound technical reasons.

Posted Feb 18, 2014 17:23 UTC (Tue) by HelloWorld (guest, #56129)

> If you want to stop processes you need to be PID 1, otherwise you get a race condition.
You mean because orphans are reparented to init? That can be avoided with prctl(PR_SET_CHILD_SUBREAPER). So I don't really see why systemd needs to run as PID 1 instead of, say, 2 nowadays.
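For anyone who hasn't seen the call, here is a rough sketch of what the subreaper flag does, assuming Linux 3.4 or later and going through ctypes because Python has no direct wrapper. The function names are made up for the example, and the constant 36 is the value of `PR_SET_CHILD_SUBREAPER` in `<linux/prctl.h>`. An intermediate child exits at once, and the orphaned grandchild reports which parent it sees afterwards.

```python
import ctypes
import os

PR_SET_CHILD_SUBREAPER = 36  # from <linux/prctl.h>; needs Linux 3.4 or later


def become_subreaper():
    """Mark the calling process as a child subreaper via prctl(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    args = [ctypes.c_ulong(v) for v in (1, 0, 0, 0)]
    if libc.prctl(PR_SET_CHILD_SUBREAPER, *args) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def orphan_parent_pid():
    """Orphan a grandchild and return the parent PID it sees afterwards."""
    go_r, go_w = os.pipe()    # us -> grandchild: "your parent is gone"
    out_r, out_w = os.pipe()  # grandchild -> us: "<own pid> <parent pid>"
    child = os.fork()
    if child == 0:
        if os.fork() == 0:
            # Grandchild: wait for the go-ahead, then report what it sees.
            os.close(go_w)
            os.close(out_r)
            os.read(go_r, 1)
            os.write(out_w, f"{os.getpid()} {os.getppid()}".encode())
            os._exit(0)
        os._exit(0)  # intermediate child exits, orphaning the grandchild
    os.close(go_r)
    os.close(out_w)
    os.waitpid(child, 0)  # reparenting happens before the exit is reported
    os.write(go_w, b"x")
    grandchild, parent = map(int, os.read(out_r, 64).split())
    os.waitpid(grandchild, 0)  # only possible because we adopted it
    os.close(go_w)
    os.close(out_r)
    return parent
```

After `become_subreaper()`, the value returned by `orphan_parent_pid()` should be the caller's own PID. Without the flag the orphan goes to PID 1 (or to some other subreaper further up the tree), and the final `waitpid` fails because the grandchild is not our child.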

Posted Feb 18, 2014 18:44 UTC (Tue) by smurf (subscriber, #17840)

Thanks, I wasn't aware of the SUBREAPER call.

Anyway, the point isn't whether it's possible to run systemd as PID-2. You'd still need a way to signal PID-1 that it should please pivot its root file system to the RAM disk and exec /shutdown, so that your root file system can be unmounted cleanly. And probably some other minor quibbles that seem perfectly solvable, but also completely unnecessary when you can just have the features in PID-1.

Which begs the question: why bother? I still haven't seen any explanation of what the advantage of running systemd as pid-2 (or pid-2+pid-3+pid-4) would be. "Clean separation of responsibilities into separate processes" is not an argument I can accept, because they all need to run continuously and they all need to talk to each other, so you get increased complexity for no net gain.

Posted Feb 19, 2014 10:15 UTC (Wed) by HelloWorld (guest, #56129)

> You'd still need a way to signal PID-1 that it should please pivot its root file system to the RAM disk and exec /shutdown, so that your root file system can be unmounted cleanly.
Why is it PID 1 who needs to do that?

> Which begs the question: why bother? I still haven't seen any reason what the advantage of running systemd as pid-2 (or pid-2+pid-3+pid-4) would be.
The kernel will panic if PID 1 crashes, so it should be as simple as possible. Now, systemd never actually crashed on any of my systems, but why take the risk if you don't have to?

Posted Feb 19, 2014 11:06 UTC (Wed) by mchapman (subscriber, #66589)

> The kernel will panic if PID 1 crashes, so it should be as simple as possible. Now, systemd never actually crashed on any of my systems, but why take the risk if you don't have to?

I think the alternative complicates things.

If systemd running as PID 2 and marked as a child subreaper were to crash, then its children would be inherited by PID 1. Even if PID 1 were to restart systemd, the new systemd wouldn't be able to wait on those reparented processes any more. PID 1 would be responsible for reaping them when they exit, and PID 1 would need to pass on notifications to that effect to the systemd process (so that it could re-exec them or whatever).

In short, I think using a separate child subreaper brings as many problems as it solves.

Posted Feb 19, 2014 11:34 UTC (Wed) by smurf (subscriber, #17840)

> Why is it PID 1 who needs to do that?

*Every* program which holds a file open on the root file system (or any file system, for that matter) needs to either exit, or exec() a program within the new (RAM disk) root. Otherwise you cannot unmount the root FS.
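As a rough illustration of how you could see which processes are pinning a file system, the sketch below walks `/proc` and compares device numbers. The function name is made up, and it only looks at open file descriptors. A process's working directory, root directory, executable and memory mappings pin a file system too, and this sketch ignores them. It also skips any process whose `fd` directory it lacks permission to read.

```python
import os


def pids_holding_device(dev):
    """Return PIDs with at least one open descriptor on the file system dev."""
    holders = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a process directory
        fd_dir = f"/proc/{entry}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process vanished, or we may not look at it
        for fd in fds:
            try:
                # stat() follows the /proc link to the file that is open
                if os.stat(f"{fd_dir}/{fd}").st_dev == dev:
                    holders.add(int(entry))
                    break
            except OSError:
                continue  # descriptor was closed in the meantime
    return holders
```

Calling it with `os.stat("/").st_dev` lists the processes that would have to exit or exec into the RAM disk before the root FS could be unmounted, to the extent that open descriptors are the only thing holding it.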

PID-1 may not exit. Therefore it's its job to exec the last step. PID-2 cannot do that. (OK, it could call the unmount-and-reboot program on the RAM disk after triggering PID-1 to exec a new init there, but again: what would be the point of that additional complexity?)