Posted Apr 30, 2011 11:29 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) In reply to: Missed some rows by nicooo
Parent article: Poettering: Why systemd?
Various 'enterprise' Unixes have had workarounds since forever. Like the ability to 'lock' the PID of a process (so it won't be reused). Or locking a PID for several minutes after getpid() calls (so "ps | grep ... | xargs kill" won't kill some innocent process).
Posted May 3, 2011 1:21 UTC (Tue)
by wahern (subscriber, #37304)
[Link] (11 responses)
The easiest and cleanest general purpose solution would be to extend the PID namespace to 64-bits, or maybe even 128-bits. Problem solved. This is a common solution for when maintaining and communicating a consistent global state is not practically feasible, which is the case with the historical paradigm of process management on Unix.
I don't know why this has never been done. The existing 16-bit namespace is ridiculous. There should be a kernel compile-time option to increase the pid_t width. Then over the course of several years broken applications that make unwarranted assumptions about pid_t could be fixed. The vast majority of issues are probably with printf formatting; people usually cast pid_t to (int). If PIDs were chosen at random (as on OpenBSD) then the 31 or 32 bits shown would actually be useful, much like Git's truncated hash identifiers. So even most broken apps would only be half broken.
I realize it's a *huge* change, but it's simple and straightforward, the consequences are mostly foreseeable, and with open source software readily addressed by even casual C programmers. GCC could be instrumented to track pid_t conversions, and in a matter of weeks I bet Debian's build system would uncover the vast majority of issues. All of a sudden one of the most ugly Unix warts--that is, fundamentally broken in the context of common usage--disappears.
Posted May 3, 2011 9:10 UTC (Tue)
by leighbb (subscriber, #1205)
[Link] (1 responses)
sysctl -w kernel.pid_max=4194304
Not as much as you were after but bigger than you thought :-)
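For anyone who wants to check what their own box is set to, here is a minimal Python sketch. The helper names `read_pid_max` and `pid_bits` are made up for illustration; the only thing assumed about the system is the standard Linux sysctl file /proc/sys/kernel/pid_max.

```python
def read_pid_max(path="/proc/sys/kernel/pid_max"):
    """Return the current PID limit as an integer (Linux sysctl file)."""
    with open(path) as f:
        return int(f.read().strip())


def pid_bits(pid_max):
    """Number of bits needed to represent the largest PID, pid_max - 1."""
    return (pid_max - 1).bit_length()


if __name__ == "__main__":
    limit = read_pid_max()
    print(f"pid_max={limit}, i.e. about {pid_bits(limit)} bits of PID space")
```

With the value from the sysctl command above, `pid_bits(4194304)` gives 22, so still a long way from 64 bits.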
Posted May 3, 2011 13:13 UTC (Tue)
by wahern (subscriber, #37304)
[Link]
Posted May 3, 2011 15:03 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (8 responses)
And larger PID lengths are way too clumsy for humans. That's definitely NOT good engineering.
Besides, even with 128-bit PID length you'll still have problems with double-forked processes (which are reparented to init).
systemd nicely solves these problems.
Posted May 3, 2011 18:17 UTC (Tue)
by wahern (subscriber, #37304)
[Link] (7 responses)
It's really only an unresolvable issue when you have errant, buggy processes. Otherwise, a sophisticated daemon should have a domain socket which takes control messages. But I'm presuming that process management means being able to handle processes that aren't well behaved.
Posted May 3, 2011 18:21 UTC (Tue)
by Cyberax (✭ supporter ✭, #52523)
[Link] (6 responses)
>Although I'm not familiar with cgroup usage, I think that there's still a race in adding a fresh process to a cgroup, so even systemd could benefit from a larger PID space.
Nope. cgroups work at the kernel level and so they use proper locking, so PIDs won't be able to leak. Also, one can easily protect processes in a cgroup from an accidental kill (in fact, cgroups can be used as a complete lightweight virtualization solution).
Posted May 4, 2011 5:03 UTC (Wed)
by wahern (subscriber, #37304)
[Link] (5 responses)
cgroup inheritance I can understand. A process forked from a process already assigned to a particular cgroup atomically inherits membership in the cgroup, just as it would atomically inherit a session id and process group id. But now, say, I want to reassign that process to a different cgroup PID. It seems like there's the same problem as above. What am I missing?
Posted May 4, 2011 5:44 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (1 responses)
You need to somehow have a unique process handle, which PID is definitely not. On Linux it can be done using the /proc/PID/ directory. The sequence would be:
Of course, it's better to create a process directly in the required group in the first place.
Posted May 4, 2011 8:26 UTC (Wed)
by wahern (subscriber, #37304)
[Link]
My proposal was to make PID a unique quasi-handle the same way random UUIDs are unique.
Posted May 4, 2011 19:31 UTC (Wed)
by njs (subscriber, #40338)
[Link] (2 responses)
In the above scheme, if you're the one who's spawning this new process that you want to end up in a cgroup, then you can do
That's race-free.
Posted May 4, 2011 21:08 UTC (Wed)
by wahern (subscriber, #37304)
[Link] (1 responses)
Of course, "who cares" is a valid reply; we've been living with it for 40 years. But that response challenges the value added by systemd's reliance on esoteric Linux subsystems. For example, when we talk about how a service manager is so much better than a race-prone PID file, nobody ever considers that the race condition is easily avoided by not using root. If you create a user per daemon--_www, _ftp, etc--then even if you read a stale PID and signal the wrong process, as long as you're sending the signal with a service-delegated UID then it will never be delivered.
I never brought it up before because it's arguably not very elegant. I'm loath to defend PID files. But if we're going to replace them with something, I'd like it to be generic and tailored to the specific issue, rather than lauding some supposed panacean init replacement.
The past decade in Linux-land has seen a parade of sophisticated daemon services intended to patch over some clunky Unix interface (device management, process management, etc, etc). They each require application developers to change from portable POSIX patterns to using some new API or library or protocol. But they come and go like the wind. Worthy solutions tend to be so obviously beneficial that all the free unices eagerly adopt or mimic them.
Posted May 5, 2011 1:06 UTC (Thu)
by njs (subscriber, #40338)
[Link]
IIUC, when systemd starts a service, that service gets stuck (reliably, and race-freely) into its own cgroup, from which it cannot escape. Then you can kill it or whatever reliably, even if it's badly behaved (spawning children that double-fork and end up as orphans, forking to a new PID every 100 ms, whatever you like).
If you're trying to go after a process that was started outside of a cgroup, then this doesn't work so well, but not much does. That process that keeps switching PIDs as quickly as possible can't easily be killed even if you have a collision-free PID space.
Posted May 4, 2011 21:28 UTC (Wed)
by mjthayer (guest, #39183)
[Link]
> Various 'enterprise' Unixes had workarounds since forever.
A workaround I implemented a while ago for "normal" Unixes was for the daemon to place an advisory lock on its pidfile. It only works on filesystems with that feature of course, but by checking that the file is locked before issuing your kill command you greatly reduce the race window.
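A rough Python sketch of the locked-pidfile idea, for illustration only. It assumes `flock()`-style advisory locks (the original may well have used `fcntl`/`lockf` record locks instead), and the helper names `claim_pidfile` and `read_live_pid` are hypothetical.

```python
import fcntl
import os


def claim_pidfile(path):
    """Daemon side: take an exclusive advisory lock and record our PID.

    The returned fd must stay open for the daemon's lifetime; the kernel
    drops the lock when the process exits, however it exits.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # raises if already held
    os.ftruncate(fd, 0)
    os.write(fd, b"%d\n" % os.getpid())
    return fd


def read_live_pid(path):
    """Killer side: return the PID only if someone still holds the lock."""
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return None
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
        except BlockingIOError:
            # Lock is held, so the daemon that wrote the PID is still alive.
            return int(os.read(fd, 32).decode().strip())
        # We got the lock ourselves: nobody owns the file, the PID is stale.
        return None
    finally:
        os.close(fd)  # closing also releases any lock we took
```

The caller would only issue `os.kill(pid, sig)` when `read_live_pid()` returns a number. There is still a small window between the check and the kill, so this shrinks the race rather than removing it.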
The /proc/PID sequence (from Cyberax's comment above):
1) Change the current directory to /proc/PID.
2) Look around and check that this PID is still the correct one. That's safe because if the process exits, its /proc/PID directory becomes empty - and stays that way.
3) Write to /proc/PID/cgroup.
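Steps 1 and 2 can be sketched in Python roughly as follows, holding a directory fd instead of doing a chdir. This is only an illustration under assumptions: the helper names `open_proc_handle` and `read_via_handle` are made up, the check against `comm` is just one possible way to "look around", and as far as I know /proc/PID/cgroup is read-only on mainline kernels, so the sketch only reads it rather than attempting step 3.

```python
import os


def open_proc_handle(pid):
    """Open /proc/<pid> as a directory fd; the fd stays tied to that process."""
    return os.open(f"/proc/{pid}", os.O_RDONLY | os.O_DIRECTORY)


def read_via_handle(handle, name):
    """Read a file relative to the handle.

    Returns None if the lookup or read fails, which is what happens once
    the process behind the handle has gone away.
    """
    try:
        fd = os.open(name, os.O_RDONLY, dir_fd=handle)
    except OSError:
        return None
    try:
        return os.read(fd, 65536).decode(errors="replace")
    except OSError:
        return None
    finally:
        os.close(fd)


def cgroup_if_still(handle, expected_comm):
    """Return the cgroup listing only if the handle still names expected_comm."""
    comm = read_via_handle(handle, "comm")
    if comm is None or comm.strip() != expected_comm:
        return None
    return read_via_handle(handle, "cgroup")
```

Because every lookup goes through the same handle, a recycled PID cannot sneak in between the check and the later read: a new process with the same number gets a different /proc directory.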
The spawn sequence (from njs's comment above):
1) fork
2) the child adds itself to the desired cgroup
3) the child calls exec()
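In Python the three steps might look something like this. It is a sketch under assumptions: `spawn_in_cgroup` is a made-up name, and `procs_file` stands for whatever file the cgroup hierarchy uses to accept PIDs (for example a `tasks` or `cgroup.procs` file under the mounted hierarchy; the exact path depends on the system).

```python
import os
import sys


def spawn_in_cgroup(procs_file, argv):
    """Fork, have the child join the cgroup, then exec argv.

    Returns (child_pid, exit_code); exit_code is -1 if the child was
    killed by a signal and 127 if joining the cgroup or exec failed.
    """
    pid = os.fork()
    if pid == 0:
        # Child: join the cgroup *before* exec, so the target program
        # never runs a single instruction outside the group.
        try:
            with open(procs_file, "w") as f:
                f.write(str(os.getpid()))
            os.execvp(argv[0], argv)
        except BaseException:
            pass
        os._exit(127)
    # Parent: wait for the child and report how it ended.
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return pid, code


if __name__ == "__main__":
    # Hypothetical usage: spawn_in_cgroup.py <procs_file> <command> [args...]
    print(spawn_in_cgroup(sys.argv[1], sys.argv[2:]))
```

Nobody ever has to look the child up by PID from the outside, which is why there is no window for the PID to be reused.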