Completing the pidfd API
A pidfd recap
Unix-like systems traditionally represent many objects as files, but processes have always been an exception. They are, instead, represented by process IDs (PIDs), which are small integers — limited to 32767 by default, though that limit can be raised on Linux systems. There are a few problems with this representation, but the biggest one is arguably that PIDs are reused; when a process exits, its PID can be assigned to a new, unrelated process, and this can happen quickly. That creates a race condition where code that operates on a process, most often by sending it a signal, might end up performing an action on the wrong process.
A pidfd is, instead, a file descriptor that refers to an existing process. Once the pidfd exists, it will only refer to that one process, so it can be used to send signals without worry that the wrong process might end up being the recipient. This feature is valuable enough that some process-management systems, most notably the one used by Android, are being rewritten to take advantage of it.
There are two ways to create a pidfd. The preferred method in most cases will be to supply the CLONE_PIDFD flag to the clone() system call (or perhaps clone3() in the future); upon successful process creation, a pidfd representing the child will be returned to the parent. It is also possible to create a pidfd for an existing process with pidfd_open(), which was merged for the 5.3 kernel.
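As a rough illustration of the second method, the sketch below calls pidfd_open() through the generic syscall() interface, on the assumption that the C library in use may not yet provide a wrapper. The helper name open_pidfd() is hypothetical, and the fallback syscall number is an assumption that holds for x86-64 and most other architectures.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Older headers may not define the syscall number. 434 is the value on
 * x86-64 and most other architectures; verify it for your target. */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Hypothetical wrapper (not a libc function): returns a pidfd referring
 * to `pid`, or -1 with errno set. No flags are passed. */
static int open_pidfd(pid_t pid)
{
    return (int)syscall(SYS_pidfd_open, pid, 0U);
}
```

Calling open_pidfd(getpid()) yields a descriptor for the calling process itself. For any other process, the caller still has to be sure that the PID has not been recycled before the call is made; that holds for a child that the parent has not yet waited for, provided that SIGCHLD is not being ignored.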
A process that holds a pidfd can send a signal to the process it refers to using pidfd_send_signal():
int pidfd_send_signal(int pidfd, int signal, siginfo_t *info, unsigned int flags);
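A minimal sketch of how this call might be used follows. It assumes that neither pidfd_open() nor pidfd_send_signal() has a C-library wrapper, so both are reached via syscall(); the fallback syscall numbers are assumptions valid for x86-64 and most other architectures, and the helper name kill_child_via_pidfd() is invented for this example.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallback numbers for older headers (assumed; check your architecture). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif
#ifndef SYS_pidfd_send_signal
#define SYS_pidfd_send_signal 424
#endif

/* Hypothetical helper: fork a child that sleeps forever, kill it through
 * a pidfd, and return the signal that terminated it (-1 on failure). */
static int kill_child_via_pidfd(void)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* child: wait to be killed */
    }

    /* The child has not been reaped, so its PID cannot have been reused. */
    int pidfd = (int)syscall(SYS_pidfd_open, child, 0U);
    int rc = -1;
    if (pidfd >= 0) {
        /* A NULL info pointer makes this behave like kill(). */
        rc = (int)syscall(SYS_pidfd_send_signal, pidfd, SIGKILL, NULL, 0U);
        close(pidfd);
    }
    if (rc < 0)
        kill(child, SIGKILL);   /* fall back so the child never lingers */

    int status;
    if (waitpid(child, &status, 0) != child || rc < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

SIGKILL is used here only because it cannot be blocked or ignored, which keeps the example from hanging; any other signal can be sent the same way.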
The 5.3 kernel also adds the ability to pass a pidfd to poll(), which will provide a notification when the process represented by that pidfd exits.
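The following sketch shows one way that notification might be consumed, assuming a 5.3 or later kernel. The helper name poll_for_child_exit() is hypothetical, pidfd_open() is invoked through syscall() with an assumed fallback number, and the pidfd is expected to become readable (POLLIN) once the process has exited.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallback number for older headers (assumed; check your architecture). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Hypothetical helper: fork a child that exits with `exit_code`, learn of
 * its exit via poll() on a pidfd, then reap it. Returns the child's exit
 * status, or -1 if anything went wrong. */
static int poll_for_child_exit(int exit_code, int timeout_ms)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(exit_code);

    int notified = 0;
    int pidfd = (int)syscall(SYS_pidfd_open, child, 0U);
    if (pidfd >= 0) {
        struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
        int ready;
        do {
            ready = poll(&pfd, 1, timeout_ms);
        } while (ready < 0 && errno == EINTR);
        notified = (ready == 1 && (pfd.revents & POLLIN));
        close(pidfd);
    }

    /* poll() only reports the exit; the child must still be reaped. */
    int status;
    if (waitpid(child, &status, 0) != child || !notified)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Since a pidfd is an ordinary file descriptor, it could just as well be added to an existing select() or epoll loop alongside sockets and pipes.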
Waiting on a pidfd
While it is now possible to use poll() to learn when a process has exited, that is not a complete solution for process-management systems, which need to be able to wait for specific processes and reap the exit information once they are done. That requires some sort of variant on the wait() system call. To fill in that gap, Christian Brauner proposed the addition of yet another new system call:
int pidfd_wait(int pidfd, int *stat_addr, siginfo_t *info, struct rusage *rusage, int states, int flags);
This call would wait for the given pidfd; the states parameter can be used to specify which state transitions (WSTOPPED for when the process receives a stop signal, for example) to wait for. The flags field offers additional options, including WNOHANG for non-blocking operation; see the above-linked patch cover letter for the full list.
This call, Brauner said, is "one of the few missing pieces to make it possible to manage processes using only pidfds". It is destined to remain missing, though, at least in that form; Linus Torvalds made it clear that he didn't like it. He had no objection to the desired functionality, but questioned the need for a new system call; instead, he said, the waitid() system call should simply be extended with a new flag.
That is exactly what was done in a new patch series posted by Brauner; waitid() has gained a new P_PIDFD ID-type value that causes the given ID to be interpreted as a pidfd. This approach ended up being a rather smaller patch that does not need to add a new system call; there have been no responses to it as of this writing, but it would be unsurprising if this change were to be merged for 5.4.
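A sketch of what reaping a child through a pidfd could look like is shown below; it only works on a kernel that includes the P_PIDFD change. The constant PIDFD_ID_TYPE and the helper reap_via_pidfd() are names made up for this example; the value 3 is what the kernel's <linux/wait.h> assigns to P_PIDFD, spelled out here on the assumption that the installed C-library headers may not define it.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallback number for older headers (assumed; check your architecture). */
#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434
#endif

/* Local stand-in for P_PIDFD, whose kernel value is 3. */
#define PIDFD_ID_TYPE 3

/* Hypothetical helper: fork a child that exits with `exit_code` and reap
 * it with waitid() on a pidfd. Returns the exit status, or -1 on error. */
static int reap_via_pidfd(int exit_code)
{
    pid_t child = fork();
    if (child < 0)
        return -1;
    if (child == 0)
        _exit(exit_code);

    int pidfd = (int)syscall(SYS_pidfd_open, child, 0U);
    if (pidfd < 0) {
        waitpid(child, NULL, 0);    /* do not leave a zombie behind */
        return -1;
    }

    siginfo_t info;
    memset(&info, 0, sizeof(info));
    /* The "ID" argument is the file descriptor, not a PID. */
    int rc = waitid((idtype_t)PIDFD_ID_TYPE, (id_t)pidfd, &info, WEXITED);
    close(pidfd);
    if (rc < 0) {
        waitpid(child, NULL, 0);    /* kernel lacks P_PIDFD: reap by PID */
        return -1;
    }
    if (info.si_code != CLD_EXITED || info.si_pid != child)
        return -1;
    return info.si_status;
}
```

On a kernel without the change, waitid() should fail with EINVAL and the helper falls back to reaping by PID, so the example does not leave a zombie behind.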
Beyond the ability to unambiguously specify which process should be waited for, this change will eventually enable another interesting feature: it will make it possible to wait for a process that is not a child — something that waitid() cannot do now. Since a pidfd is a file descriptor, it can be passed to another process via an SCM_RIGHTS datagram in the usual manner. The recipient of a pidfd will, once this functionality is completed, be able to use it in most of the ways that the parent can to operate on (or wait for) the associated process.
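The descriptor-passing step works the same way for a pidfd as for any other file descriptor. The sketch below shows the usual SCM_RIGHTS pattern over a Unix-domain socket; the helper names send_fd() and recv_fd() are hypothetical, and a single dummy byte is sent because ancillary data needs at least some ordinary payload to ride along with on a stream socket.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Control buffer sized for one descriptor and aligned for a cmsghdr. */
union fd_ctrl {
    char buf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr align;
};

/* Hypothetical helper: send `fd` over the Unix socket `sock`.
 * Returns 0 on success, -1 on failure. */
static int send_fd(int sock, int fd)
{
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one descriptor from `sock`.
 * Returns the new descriptor, or -1 on failure. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_ctrl ctrl;
    struct msghdr msg;

    memset(&ctrl, 0, sizeof(ctrl));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    int fd;
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same underlying object, so a pidfd received this way identifies the same process that the sender's pidfd did.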
There was one other interesting piece in the original pidfd_wait() proposal: a new clone() flag (CLONE_WAIT_PID) that would cause the newly created process to be invisible to most wait() calls. Only a variant of wait() that specified that process in particular (by specifying its pidfd, for example) would be able to reap its exit information. There are a few use cases for this functionality; one that was listed is a library that needs to create a helper process that won't show up if the calling application calls wait(). This feature was not part of the second patch set, but is expected to show up in a separate posting in the near future.
There will almost certainly be other pidfd-oriented enhancements in the future; this feature is new and should not be considered to be complete. But the ability to wait on a pidfd might be seen as the end of the first round of development for the pidfd concept. It has been a relatively quiet set of changes, but the move to pidfds is a fundamental change in how processes are managed on Linux systems.
Posted Jul 27, 2019 11:47 UTC (Sat)
by ale2018 (guest, #128727)
[Link] (4 responses)
How do I know if the process is busy crunching, sleeping, or waiting for input?
Just fooling...
Posted Jul 26, 2019 22:53 UTC (Fri)
by roc (subscriber, #30627)
[Link] (1 responses)
Another question is whether this new API follows the ptrace/waitpid behavior, i.e. each ptraced thread of a process reports exit independently and is independently reaped. I really want that to be true, because that would give us a sane and reliable way to wait for some specific subset of all traced threads to exit, which is currently impossible.
Posted Jul 28, 2019 22:34 UTC (Sun)
by naptastic (guest, #60139)
[Link] (1 responses)
I think BSD saw the (valid, real) problems with /proc and took the wrong lesson, where Linux is now converging on something smarter: providing an even more UNIXy interface ("a process is now also a file") to the process space. I'm looking forward to using this functionality, even if only indirectly.
Posted Jul 29, 2019 10:19 UTC (Mon)
by wahern (subscriber, #37304)
[Link]
The real dilemma after this is how to acquire process fds when children fork. The BSD kqueue framework has permitted tracking forks and exits of descendants since almost the beginning[1], though there's still no mechanism to acquire a process fd for them.
I mention this because there's no grand theory for a better process model, unless you count Capsicum, from whence pdfork came. But in the Capsicum security model, forking is normally disabled in descendants. Arguably, one of the reasons it's taken Linux so long to get a process fd is precisely because of all the open-ended questions about where to go next, which, while unanswered, have the effect of casting doubt on the utility of process fds, notwithstanding that most people agree that in the abstract they're a great idea.
[1] Sometime between 1999, when kqueue was originally merged, and 2003, the earliest hit I got with a naive Google search.
Posted Mar 5, 2020 3:46 UTC (Thu)
by re:fi.64 (subscriber, #132628)
[Link] (1 responses)
I think this is false; at least, having tried it, waitid will always return ESRCH.
Posted Jun 14, 2023 2:44 UTC (Wed)
by jredfox_ (guest, #165585)
[Link] (3 responses)
Or, as an even better solution, create a call named reservePID(unsigned long PID). This will reserve the PID until the process that called it exits. For security reasons, it should limit the number of reservations per process to about 200 PIDs for IPC (unrelated, non-child processes), with an unlimited number for child processes.
Posted Jun 14, 2023 5:07 UTC (Wed)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Why not UUIDs then?
And pidreserve doesn't prevent all race attacks.
Posted Jun 15, 2023 1:42 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
But more practically, your system with time-based IDs is just ugly, just like UUIDs.
And pidreserve() won't help against targeted wraparound attacks.
Posted Oct 17, 2024 0:34 UTC (Thu)
by jengelh (guest, #33263)
[Link]
In glibc-nptl, pthread_self and _detach are functions that involve just userspace. There is not going to be a deadlock/deadlock-avoiding-noop as you envisioned.