Missed some rows

Posted Apr 29, 2011 21:20 UTC (Fri) by dlang (guest, #313)
In reply to: Missed some rows by martinfick
Parent article: Poettering: Why systemd?

>> because the distro maintainers each want to do slightly different things and don't take the time to see what the other distros are doing

>And this is an advantage of not using systemd?

it's not an advantage, it's reality, and it will be reality with systemd as well, just give it a little bit of time

> So tell me again where is the advantage of having "different ways of solving the same problem"?

if we didn't allow for different ways of solving the same problem, nobody would ever be able to find a better way (and systemd for example could not even be attempted because the OS wouldn't work with it)

you are taking advantage of the fact that the system allows you to solve things in a new way to create systemd, but then claiming that the freedom to do so isn't important for anyone else.

Missed some rows

Posted Apr 29, 2011 21:46 UTC (Fri) by martinfick (subscriber, #4455) [Link] (17 responses)

> if we didn't allow for different ways of solving the same problem, nobody would ever be able to find a better way (and systemd for example could not even be attempted because the OS wouldn't work with it)

Who said anything about not allowing different ways? The freedom to do so is great. Does that mean it should be done differently without a good reason? Systemd attempts to unify different ways when it makes sense and when there is no good reason for them to be different. In most of these cases it makes sense, if not, why would the different distros agree to switch? Likely because they agree with the proposed way that systemd solves these problems.

Not to mention that most of these old ways are likely broken! Many of them are nasty old hacks because there isn't a good common solution implemented anywhere. For example, do you really think that keeping track of pids in files is a better way to kill a daemon than using cgroups? Such a method is fraught with potential miskills; I am shocked that enterprise distros allow such behavior... "oops, just killed the company database server when I meant to kill my homebrewed monitoring script". Today, most distro scripts are likely knowingly broken. Do you think the average homebrewed script even has a chance of being only half as broken as a distro script? I suspect that systemd makes writing an unbroken homebrewed startup config possible and likely.

Missed some rows

Posted Apr 30, 2011 0:22 UTC (Sat) by nicooo (guest, #69134) [Link] (16 responses)

> if not, why would the different distros agree to switch?
Which ones? I only know of F15.

> I am shocked that enterprise distros allow such behavior
It worked for enterprise unix.

Missed some rows

Posted Apr 30, 2011 8:16 UTC (Sat) by rahulsundaram (subscriber, #21946) [Link]

Meego has switched in their devel branch. OpenSUSE will in its next version. Mandriva has already switched as well. Keep looking.

Oh, yeah.

Posted Apr 30, 2011 9:51 UTC (Sat) by khim (subscriber, #9252) [Link]

> It worked for enterprise unix.

Wow. Talk about ignorance! There are many things which worked for Unix till it got serious competition. Then it lost first the desktop, then the server, and now the battle for the enterprise is in full swing.

I fail to see how "it worked for enterprise unix" can be used as justification. Sure it did - because there was no alternative. Some things "enterprise unix" does are still better than what Linux does, but the list shrinks over time... and usage of pid files to manage daemons instead of cgroups is not one of them.

Missed some rows

Posted Apr 30, 2011 11:29 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (13 responses)

Actually, no. The problems with miskills due to PID wraparound are very well-known.

Various 'enterprise' Unixes have had workarounds since forever. Like the ability to 'lock' the PID of a process (so it won't be reused). Or locking a PID for several minutes after getpid() calls (so "ps | grep ... | xargs kill" won't kill some innocent process).

Missed some rows

Posted May 3, 2011 1:21 UTC (Tue) by wahern (subscriber, #37304) [Link] (11 responses)

That all seems so convoluted. The whole problem boils down to the size of the namespace and the familiar TOCTOU race condition. The cgroups solution works because it uses a different namespace with well-crafted rules, and really only works in the context of systemd, which is taking on a role--maintaining a persistent, unique, global namespace--part of which should be done in the kernel.

The easiest and cleanest general purpose solution would be to extend the PID namespace to 64-bits, or maybe even 128-bits. Problem solved. This is a common solution for when maintaining and communicating a consistent global state is not practically feasible, which is the case with the historical paradigm of process management on Unix.

I don't know why this has never been done. The existing 16-bit namespace is ridiculous. There should be a kernel compile-time option to increase the pid_t width. Then over the course of several years broken applications that make unwarranted assumptions about pid_t could be fixed. The vast majority of issues are probably with printf formatting; people usually cast pid_t to (int). If PIDs were chosen at random (as on OpenBSD) then the 31 or 32 bits shown would actually be useful, much like Git's truncated hash identifiers. So even most broken apps would only be half broken.

I realize it's a *huge* change, but it's simple and straightforward, the consequences are mostly foreseeable, and with open source software readily addressed by even casual C programmers. GCC could be instrumented to track pid_t conversions, and in a matter of weeks I bet Debian's build system would uncover the vast majority of issues. All of a sudden one of the most ugly Unix warts--that is, fundamentally broken in the context of common usage--disappears.

Missed some rows

Posted May 3, 2011 9:10 UTC (Tue) by leighbb (subscriber, #1205) [Link] (1 responses)

Just so that you are aware, you can actually enable a 22-bit pid by doing:

sysctl -w kernel.pid_max=4194304

Not as much as you were after but bigger than you thought :-)
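
For anyone who wants to check what their own box is set to, here is a rough Python sketch (mine, not from the kernel docs) that reads the same sysctl through /proc and works out how many bits that limit amounts to. The helper names are made up for illustration; the only real interface used is /proc/sys/kernel/pid_max.

```python
from pathlib import Path

# Same value that "sysctl kernel.pid_max" reports.
PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")


def pid_bits(pid_max: int) -> int:
    """Bits needed to represent every PID below pid_max (hypothetical helper)."""
    return max(1, (pid_max - 1).bit_length())


def current_pid_max() -> int:
    """Read the running kernel's pid_max (Linux only)."""
    return int(PID_MAX_PATH.read_text().strip())


if __name__ == "__main__":
    limit = current_pid_max()
    print(f"pid_max={limit}, roughly a {pid_bits(limit)}-bit PID space")
```

With the value from the sysctl line above, pid_bits(4194304) gives 22; with a limit of 32768 it gives 15.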

Missed some rows

Posted May 3, 2011 13:13 UTC (Tue) by wahern (subscriber, #37304) [Link]

Thanks. I was completely unaware.

Missed some rows

Posted May 3, 2011 15:03 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

Not really. By going to 32 bits for PID namespace this problem still won't be solved, it will just be harder to trigger.

And larger PID lengths are way too clumsy for humans. That's definitely NOT good engineering.

Besides, even with 128-bit PID length you'll still have problems with double-forked processes (which are reparented to init).

systemd nicely solves these problems.

Missed some rows

Posted May 3, 2011 18:17 UTC (Tue) by wahern (subscriber, #37304) [Link] (7 responses)

A larger PID wouldn't do everything that systemd does with cgroups. cgroups does two things: (1) provides a larger namespace (roughly 2^(8 * 255) possible names, AFAIU) to identify processes, and (2) handles inheritance. But a larger PID would solve in a backwards compatible fashion the one clear issue in Unix process management, the signal-PID race, which is more-or-less the same as the first thing above. Although I'm not familiar with cgroup usage, I think that there's still a race in adding a fresh process to a cgroup, so even systemd could benefit from a larger PID space.

It's really only an unresolvable issue when you have errant, buggy processes. Otherwise, a sophisticated daemon should have a domain socket which takes control messages. But I'm presuming that process management means being able to handle processes that aren't well behaved.

Missed some rows

Posted May 3, 2011 18:21 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

Signals must die, they are a relic of ancient times.

>Although I'm not familiar with cgroup usage, I think that there's still a race in adding a fresh process to a cgroup, so even systemd could benefit from a larger PID space.

Nope. cgroups work at the kernel level and so they use proper locking, so PIDs won't be able to leak. Also, one can easily protect processes in a cgroup from an accidental kill (in fact, cgroups can be used as a complete lightweight virtualization solution).

Missed some rows

Posted May 4, 2011 5:03 UTC (Wed) by wahern (subscriber, #37304) [Link] (5 responses)

I'm confused then. Say I have a new process which I want to add to a cgroup. How do I assign the process to a cgroup? All the documentation I can find says to echo the PID to a cgroup control file. But if I'm using a PID--and I'm not the process with that PID--then I'm still subject to a race--the PID can become stale between acquiring the value and communicating it to the cgroup subsystem.

cgroup inheritance I can understand. A process forked from a process already assigned to a particular cgroup atomically inherits membership in the cgroup, just as it would atomically inherit a session id and process group id. But now, say, I want to reassign that process to a different cgroup PID. It seems like there's the same problem as above. What am I missing?

Missed some rows

Posted May 4, 2011 5:44 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

That's a trick question.

You need to somehow have a unique process handle, which PID is definitely not. On Linux it can be done using the /proc/PID/ directory. The sequence would be:
1) Change current directory to /proc/PID
2) Look around and check that this PID is still the correct one. That's safe because if the process dies, its /proc/PID directory becomes empty - and stays that way.
3) Write to /proc/PID/cgroup.

Of course, it's better to create a process directly in the required group in the first place.
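
Steps 1 and 2 of the /proc/PID sequence above can be sketched roughly like this in Python. It is only an illustration under my own assumptions: it holds a directory descriptor instead of calling chdir(), it uses the "comm" file as the identity check, and the function names are invented. It does not attempt step 3.

```python
import os


def open_proc_handle(pid: int) -> int:
    """Pin /proc/<pid> by holding a directory descriptor (stand-in for chdir)."""
    return os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)


def read_name(proc_fd: int):
    """Return the process name seen through the held directory, or None.

    Assumption: once the process is gone, lookups through the old
    descriptor fail instead of resolving to a new process that reused the PID.
    """
    try:
        fd = os.open("comm", os.O_RDONLY, dir_fd=proc_fd)
    except OSError:
        return None
    try:
        return os.read(fd, 64).decode(errors="replace").strip()
    except OSError:
        return None
    finally:
        os.close(fd)


def still_expected(proc_fd: int, expected_name: str) -> bool:
    """Step 2: 'look around' and confirm this is still the process we meant."""
    return read_name(proc_fd) == expected_name
```

The point of holding the descriptor is that every later check goes through the same directory entry rather than looking the number up again.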

Missed some rows

Posted May 4, 2011 8:26 UTC (Wed) by wahern (subscriber, #37304) [Link]

I thought /proc/$PID/cgroup was read-only; to add a process to a group you needed to write to /dev/cgroup/$TASK/tasks. In that case, you're left with a race condition. (I tried confirming or disproving this, but can't even get the example in cgroups.txt to work.)

My proposal was to make PID a unique quasi-handle the same way random UUIDs are unique.

Missed some rows

Posted May 4, 2011 19:31 UTC (Wed) by njs (subscriber, #40338) [Link] (2 responses)

> Say I have a new process which I want to add to a cgroup. How do I assign the process to a cgroup? All the documentation I can find says to echo the PID to a cgroup control file. But if I'm using a PID--and I'm not the process with that PID--then I'm still subject to a race--the PID can become stale between acquiring the value and communicating it to the cgroup subsystem.

In the above scheme, if you're the one who's spawning this new process that you want to end up in a cgroup, then you can do
1) fork
2) the child adds itself to the desired cgroup
3) the child calls exec()

That's race-free.
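
A minimal Python sketch of those three steps, assuming the cgroup is joined by writing a PID into a control file such as the tasks file mentioned earlier in the thread. The function name and the path argument are hypothetical; pass whatever control file your cgroup mount actually exposes.

```python
import os


def spawn_in_cgroup(tasks_file, argv):
    """fork, have the child join the cgroup, then exec (illustrative only).

    tasks_file is assumed to be a cgroup control file that accepts a PID.
    Returns the child's PID to the parent.
    """
    pid = os.fork()                          # 1) fork
    if pid == 0:
        try:
            # 2) the child adds *itself*, so nobody handles a PID that
            #    could have gone stale in the meantime.
            with open(tasks_file, "w") as f:
                f.write(str(os.getpid()))
            os.execvp(argv[0], argv)         # 3) exec; no return on success
        finally:
            os._exit(127)                    # only reached if join or exec failed
    return pid
```

The parent would still waitpid() on the returned PID as usual; the child is already in the group before the real program starts running.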

Missed some rows

Posted May 4, 2011 21:08 UTC (Wed) by wahern (subscriber, #37304) [Link] (1 responses)

Sure. But the issue is handling arbitrary, non-well behaving processes. And AFAICT there's still no provably safe way to handle that on Unix systems. With only a 16-bit (or 15-bit, or 22-bit) PID space, it's trivial to write a program to sit around and wait to take advantage of a race. (I don't have an attacker mindset, but I wouldn't bet against the proposition that it could be a useful vector.)

Of course, "who cares" is a valid reply; we've been living with it for 40 years. But that response challenges the value added by systemd's reliance on esoteric Linux subsystems. For example, when we talk about how a service manager is so much better than a race-prone PID file, nobody ever considers that the race condition is easily avoided by not using root. If you create a user per daemon--_www, _ftp, etc--then even if you read a stale PID and signal the wrong process, as long as you're sending the signal with a service-delegated UID then it will never be delivered.

I never brought it up before because it's arguably not very elegant. I'm loath to defend PID files. But if we're going to replace them with something, I'd like it to be generic and tailored to the specific issue, rather than lauding some supposed panacean init replacement.

The past decade in Linux-land has seen a parade of sophisticated daemon services intended to patch over some clunky Unix interface (device management, process management, etc, etc). They each require application developers to change from portable POSIX patterns to using some new API or library or protocol. But they come and go like the wind. Worthy solutions tend to be so obviously beneficial that all the free unices eagerly adopt or mimic them.

Missed some rows

Posted May 5, 2011 1:06 UTC (Thu) by njs (subscriber, #40338) [Link]

I guess I don't understand what you mean by "managing arbitrary, non-well behaving processes".

IIUC, when systemd starts a service, that service gets stuck (reliably, and race-freely) into its own cgroup, from which it cannot escape. Then you can kill it or whatever reliably, even if it's badly behaved (spawning children that double-fork and end up as orphans, forking to a new PID every 100 ms, whatever you like).

If you're trying to go after a process that was started outside of a cgroup, then this doesn't work so well, but not much does. That process that keeps switching PIDs as quickly as possible can't easily be killed even if you have a collision-free PID space.

Missed some rows

Posted May 4, 2011 21:28 UTC (Wed) by mjthayer (guest, #39183) [Link]

> Actually, no. The problem with miskills due to PID wraparound are very well-known.

> Various 'enterprise' Unixes had workarounds since forever.

A workaround I implemented a while ago for "normal" Unixes was for the daemon to place an advisory lock on its pidfile. It only works on filesystems with that feature of course, but by checking that the file is locked before issuing your kill command you greatly reduce the race window.
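
Roughly what that looks like, as a Python sketch rather than the original code. It assumes BSD-style flock() locks, and the function names are made up for the example.

```python
import fcntl
import os


def hold_pidfile(path):
    """Daemon side: write our PID and keep an exclusive advisory lock.

    The lock lasts until the returned descriptor is closed or the
    process exits, so a stale pidfile ends up unlocked.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


def pidfile_is_locked(path):
    """Killer side: True if some process still holds the lock."""
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        return True                  # the daemon is alive and holding it
    else:
        fcntl.flock(fd, fcntl.LOCK_UN)
        return False                 # nobody holds it: the PID inside is stale
    finally:
        os.close(fd)
```

A kill script would call pidfile_is_locked() first and only then read the PID and signal it. That narrows the window but does not close it, since the daemon can still die between the check and the kill.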