Why not fix process ids?

Posted Sep 2, 2023 6:32 UTC (Sat) by epa (subscriber, #39769)
Parent article: Race-free process creation in the GNU C Library

Yes, a process id can be reused so you can’t reliably find the id and then use it later. This even applies to command line system administration where, in principle, you might kill a different process to the one you just saw in ‘top’. But why does it have to be that way?

Make process ids 64 bit and they can be unique for the lifetime of the system.

Why not fix process ids?

Posted Sep 2, 2023 7:04 UTC (Sat) by Subsentient (guest, #142918)

The obvious solution, the right solution, and I'm sure there's some irritating illegitimate reason that it won't happen.

Why not fix process ids?

Posted Sep 2, 2023 14:23 UTC (Sat) by corbet (editor, #1)

That can, indeed, be done now by messing with /proc/sys/kernel/pid_max. Making bigger process IDs the default will always risk breaking applications, though.

Why not fix process ids?

Posted Sep 2, 2023 16:04 UTC (Sat) by pebolle (guest, #35204)

Which has a hardcoded maximum of ~4,000,000 on 64-bit systems.

I could be misreading include/linux/threads.h, but since systemd on my (Fedora) system sets pid_max to that value out of the box I don't think I actually am.
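
For anyone who wants to check their own machine, here is a minimal sketch that reads the current setting and compares it with the 64-bit ceiling mentioned above. The helper name read_pid_max is made up for illustration, and the constant assumes the 4 * 1024 * 1024 limit from include/linux/threads.h on 64-bit builds.

```python
from pathlib import Path

PID_MAX_PATH = Path("/proc/sys/kernel/pid_max")

# Assumed 64-bit ceiling: 4 * 1024 * 1024, i.e. 2**22.
PID_MAX_LIMIT_64BIT = 4 * 1024 * 1024


def read_pid_max(path=PID_MAX_PATH):
    """Return the current pid_max as an int, or None if it cannot be read."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    current = read_pid_max()
    if current is None:
        print("pid_max is not readable on this system")
    else:
        print(f"pid_max = {current}")
        print(f"at assumed 64-bit ceiling: {current == PID_MAX_LIMIT_64BIT}")
```

Raising the value means writing to the same file as root; the sketch deliberately only reads it.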

Why not fix process ids?

Posted Sep 4, 2023 9:50 UTC (Mon) by mezcalero (subscriber, #45103)

systemd has been bumping this value to the max the kernel allows for quite a while now, btw. Not a single complaint about that has reached us. The incompatibilities turned out to be mostly theoretical.

That said, the kernel max is 22bit or so iirc, i.e. far from 32 or even 64bit...

Why not fix process ids?

Posted Sep 4, 2023 10:12 UTC (Mon) by pebolle (guest, #35204)

> the kernel max is 22bit or so iirc,

That's correct (and thanks for confirming my reading of include/linux/threads.h).
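
As a rough, back-of-the-envelope illustration of the gap between the 22-bit ceiling and a 64-bit ID space, the sketch below estimates how long each lasts at a steady process-creation rate. The rate of 10,000 processes per second is an arbitrary assumption, not a measurement, and the wide case uses 63 bits on the assumption that pid_t stays signed.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Assumed, purely illustrative process-creation rate.
ASSUMED_FORKS_PER_SECOND = 10_000


def seconds_to_exhaust(bits, forks_per_second=ASSUMED_FORKS_PER_SECOND):
    """Seconds until an ID space of 2**bits values has been used up once."""
    return 2 ** bits / forks_per_second


def years_to_exhaust(bits, forks_per_second=ASSUMED_FORKS_PER_SECOND):
    """Same estimate as seconds_to_exhaust(), expressed in years."""
    return seconds_to_exhaust(bits, forks_per_second) / SECONDS_PER_YEAR


if __name__ == "__main__":
    print(f"22-bit space: {seconds_to_exhaust(22) / 60:.1f} minutes")
    print(f"63-bit space: {years_to_exhaust(63):.3g} years")
```

At that assumed rate a 22-bit space wraps in about seven minutes, while a 63-bit space lasts on the order of tens of millions of years.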

Why not fix process ids?

Posted Sep 4, 2023 19:14 UTC (Mon) by adobriyan (subscriber, #30858)

> This even applies to command line system administration where, in principle, you might kill a different process to the one you just saw in ‘top’.

It is simple to implement correct process killing. All the programmer needs to do is hold a /proc/$pid descriptor while sending the signal.
32- versus 64-bitness doesn't change anything.

I've checked what htop does and it seems to get this wrong: it opens /proc/$pid, then openat()s a few files from there, but then closes the directory before sending the signal.

41073 openat(3, "41057", O_RDONLY|O_NOFOLLOW|O_DIRECTORY) = 4
41073 openat(4, "task", O_RDONLY|O_NOFOLLOW|O_DIRECTORY) = 5
...
41073 close(5) = 0
...
41073 close(4)
...
41073 kill(41057, SIGTERM) = 0

Why not fix process ids?

Posted Sep 4, 2023 19:21 UTC (Mon) by adobriyan (subscriber, #30858)

Hey, it is even possible to do this from the command line without "integrated" tools, as long as kill -TERM is run from inside /proc/$pid!

$ ./pause &
[1] 41956

$ cd /proc/41956

# double check it is the same process, VERY IMPORTANT
$ cat comm #cmdline
pause

# send signal WITHOUT LEAVING /proc/$pid (VERY IMPORTANT)
$ kill -TERM 41956

# ... and it's gone!
$ cat comm
cat: comm: No such process

Why not fix process ids?

Posted Sep 4, 2023 23:15 UTC (Mon) by mchapman (subscriber, #66589)

No, this is not sufficient.

Between your "cat" and "kill" commands, the process could have exited, been reaped by its parent, and another process could have been forked with PID 41956. By the time you run kill, that PID may not be the same process you thought it was.

Simply holding a reference to the (old) /proc/$PID directory does not prevent the PID from being reused.

Why not fix process ids?

Posted Sep 14, 2023 14:50 UTC (Thu) by ksandstr (guest, #60862)

Well, for one thing, a 64-bit pid_t will break ABI. That's a no-no, but maybe not as big as the alternatives. It'd still have that strictly theoretical issue of identifier wraparound once 8 exbi-PIDs have been spent, and command line users would curse your distant memory for having to reference PID 2**55+177, but ignoring those it'd work. Furthermore, if the format was redefined to reserve pid_t's top 32 bits for an L4-style version field which wouldn't appear in e.g. ps(1) output, wraparound would only have to be processed once a PID had been recycled 2**31 times -- though going from a "UI" PID to a full PID would entail an extremely tenuous TOCTOU issue[-1].
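
A toy model of the version-field idea, to make the packing concrete. This is only an illustrative sketch in user-space Python, not kernel code: the names ToyPidTable, pack and unpack are hypothetical, and the 32/32 split is taken from the paragraph above.

```python
INDEX_BITS = 32
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack(generation, index):
    """Combine a hidden generation counter with the user-visible PID."""
    return (generation << INDEX_BITS) | (index & INDEX_MASK)


def unpack(full_pid):
    """Split a full ID into (generation, user-visible PID)."""
    return full_pid >> INDEX_BITS, full_pid & INDEX_MASK


class ToyPidTable:
    """Tracks which generation of each visible PID is currently alive."""

    def __init__(self):
        self._last = {}  # visible PID -> most recent generation handed out
        self._live = {}  # visible PID -> generation of the running process

    def spawn(self, index):
        """Start a 'process' in slot index and return its full 64-bit ID."""
        if index in self._live:
            raise ValueError("visible PID still in use")
        generation = self._last.get(index, -1) + 1
        self._last[index] = generation
        self._live[index] = generation
        return pack(generation, index)

    def reap(self, full_pid):
        """Retire a process; a stale or unknown full ID is rejected."""
        generation, index = unpack(full_pid)
        if self._live.get(index) != generation:
            raise LookupError("stale or unknown ID")
        del self._live[index]

    def is_current(self, full_pid):
        """True only if full_pid names the process now occupying its slot."""
        generation, index = unpack(full_pid)
        return self._live.get(index) == generation
```

Two processes that both show up as PID 177 in ps(1)-style output get different full IDs here, so a handle kept from the first one no longer matches once the slot has been recycled.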

Another substitute solution to pidfds would make process IDs a capability of sorts, such that they're created by fork/spawn, transferred to other processes by unspecified means[0], and invalidated at wait() so they subsequently raise an error upon use. This would ensure that stale PIDs, being those that refer to a since-deceased process, don't end up referring to a different process. However, the cost of doing this is a slight API break, because kill() etc. would raise "unknown PID" while that PID might actually have come to exist again. Also, the question of validating such a capability from e.g. command line parameters will need an answer.

Considering that any use of a PID is an instant TOCTOU hazard to any but the parent process (because it's the only one that can call wait() on that PID), the idea of "just fix the call sites" can be recognized as unworkable in a great many cases. Analogously to the capability idea above, pidfds provide a process-local identifier in the file descriptor[1] and a means to communicate process termination at time of use. And their cost isn't even an ABI break -- just that the old API will be creaky, the new API will be both nonportable and so extensive as to cover every POSIX call that takes a pid_t, and any pidfd_getpid() band-aid call will be another instant TOCTOU hazard (unless). Out of these approaches, pidfds certainly seem like an attractive solution since they mainly require lots of footwork and the creation of a "pre-horizon" category of vulnerable programs that process PIDs in any way.
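
For comparison with the kill(41057, SIGTERM) pattern earlier in the thread, here is a minimal sketch of signalling through a pidfd. It assumes Linux 5.3 or later and Python 3.9 or later, where os.pidfd_open() and signal.pidfd_send_signal() are available; the helper name signal_child_via_pidfd is made up for illustration. The pidfd is opened by the parent before it reaps the child, which is the one case where going from a PID to a pidfd is not itself racy.

```python
import os
import signal
import subprocess
import sys


def signal_child_via_pidfd(argv, sig=signal.SIGTERM):
    """Spawn argv, signal it through a pidfd, then reap it.

    Returns (exit status, whether a send after reaping was rejected).
    """
    if not hasattr(os, "pidfd_open"):
        raise OSError("pidfd API not available in this Python build")
    child = subprocess.Popen(argv)
    try:
        # Safe for the parent: the PID cannot be recycled before wait().
        pidfd = os.pidfd_open(child.pid)
    except OSError:
        child.kill()
        child.wait()
        raise
    try:
        signal.pidfd_send_signal(pidfd, sig)
        status = child.wait()  # reap; the numeric PID may be reused now
        try:
            # The pidfd still names the old process, not the number.
            signal.pidfd_send_signal(pidfd, sig)
            stale_rejected = False
        except ProcessLookupError:
            stale_rejected = True
    finally:
        os.close(pidfd)
    return status, stale_rejected


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(signal_child_via_pidfd(sleeper))
```

Once the child has been reaped, the second send fails with ESRCH instead of reaching whatever process later inherits the number.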

[-1] this one would be soluble by invalidating wrapped PIDs in processes whose lifecycle intersects the wraparound point, another mild API break and perhaps the bane of init(8) in an interstellar probe or something.
[0] perhaps a general two-stage mechanism to validate a PID and then confirm its correct identity (using e.g. the program's fsid/inode# pair), or a Unix domain socket faff not unlike fd transfer.
[1] though transferring these to another process would seem to require a Unix domain socket between the two.
[2] there is no 4th footnote; I'm just using this space to point out that I'm currently unemployed but capable of spitting out this kind of off-hand analysis, and a suitably impressed reader's employer could almost certainly use a mad lad like me. *wink* *wink*