Race-free process creation in the GNU C Library
Unix systems refer to processes via an integer ID (the "process ID" or PID) that is assigned at creation time. The problem with PIDs is that they are reused over time; once a process with a given PID has exited and been reaped, that PID can be assigned to a new and unrelated process with the result that any given PID might not, in fact, refer to the process that the user thinks it does. To address this problem, the pidfd concept was introduced; a pidfd is a file descriptor that acts as a handle for a process. The process associated with a pidfd can never change, so many of the race conditions associated with PIDs do not exist with pidfds.
Current glibc releases include wrappers for a number of the low-level pidfd-related system calls, including pidfd_open(), pidfd_getfd(), and others. There is one piece missing, though: the ability to obtain a pidfd for a new process as that process is created. It is possible to use pidfd_open() to get a pidfd from a PID immediately after creation, but that still leaves a narrow window during which the process identified by a PID could exit and be replaced by another. Closing that window requires obtaining a pidfd from the kernel as a result of creating a new process, and glibc provides no way to do that.
That functionality could be provided by adding a wrapper for the clone3() system call, but there is some resistance to doing that. Instead, Adhemerval Zanella has posted a patch set that enhances the posix_spawn() API, which is seen by many as being a better approach to process creation (when immediately followed by an exec() call) than the Unix fork() model. The result is two new functions:
    int pidfd_spawn(int *restrict pidfd,
                    const char *restrict file,
                    const posix_spawn_file_actions_t *restrict facts,
                    const posix_spawnattr_t *restrict attrp,
                    char *const argv[restrict],
                    char *const envp[restrict]);
    int pidfd_spawnp(int *restrict pidfd,
                     const char *restrict path,
                     const posix_spawn_file_actions_t *restrict facts,
                     const posix_spawnattr_t *restrict attrp,
                     char *const argv[restrict],
                     char *const envp[restrict]);
Just like posix_spawn() and posix_spawnp(), these functions execute a combination of clone() and exec() to create a new process running the program indicated by file or path. The value stored through the first argument, though, will be a pidfd identifying the created process rather than a PID; as with posix_spawn(), the return value itself is zero on success or an error number on failure.
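As a rough illustration of how a caller might use the new interface, here is a minimal sketch. The helpers spawn_with_pidfd() and wait_exit_status() are hypothetical names invented for this example. The version guard assumes that pidfd_spawn() is exported by glibc 2.39 and later with the prototype shown above. On older libraries, or when the call reports ENOSYS, the sketch falls back to the two-step posix_spawn() plus pidfd_open() pattern, which still has the race window described earlier.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434      /* value on x86-64 and arm64 */
#endif
#ifndef P_PIDFD
#define P_PIDFD 3               /* waitid() ID type for pidfds */
#endif

extern char **environ;

/* Hypothetical helper: start "path" and store a pidfd for it in *pidfd.
   Returns 0 on success or an error number, like posix_spawn(). */
static int spawn_with_pidfd(int *pidfd, const char *path, char *const argv[])
{
    assert(pidfd != NULL && path != NULL && argv != NULL);

#if defined(__GLIBC_PREREQ)
# if __GLIBC_PREREQ(2, 39)      /* assumption: pidfd_spawn() is available */
    int err = pidfd_spawn(pidfd, path, NULL, NULL, argv, environ);
    if (err != ENOSYS)
        return err;             /* success, or an error other than ENOSYS */
# endif
#endif

    /* Fallback: create the process, then look up a pidfd by PID. */
    pid_t pid;
    int rc = posix_spawn(&pid, path, NULL, NULL, argv, environ);
    if (rc != 0)
        return rc;
    /* Race window: "pid" is only a number until pidfd_open() returns. */
    int fd = (int) syscall(SYS_pidfd_open, pid, 0);
    if (fd < 0)
        return errno;           /* a real caller would still have to reap pid */
    *pidfd = fd;
    return 0;
}

/* Hypothetical helper: reap the child behind a pidfd and return its exit
   status, or -1 if waiting failed or the child did not exit normally. */
static int wait_exit_status(int pidfd)
{
    siginfo_t info = {0};
    if (waitid((idtype_t) P_PIDFD, (id_t) pidfd, &info, WEXITED) != 0)
        return -1;
    return info.si_code == CLD_EXITED ? info.si_status : -1;
}
```

A caller would pass an argument vector such as { "sh", "-c", "exit 7", NULL } along with "/bin/sh", then hand the resulting descriptor to wait_exit_status(). At no point does the caller need to handle the child's PID.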
If the creator needs to know the new process's PID, that can be obtained by a new function added by the patch set:
pid_t pidfd_getpid(int pidfd);
This function obtains the PID by reading the fdinfo entry under /proc for the given pidfd.
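The /proc lookup can be approximated in a few lines. The sketch below is not glibc's implementation. It assumes that the kernel exposes a "Pid:" line in /proc/self/fdinfo/<fd> for pidfds, and the names open_pidfd() and pid_from_pidfd() are made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434      /* value on x86-64 and arm64 */
#endif

/* Hypothetical stand-in for the pidfd_open() wrapper: raw system call. */
static int open_pidfd(pid_t pid)
{
    return (int) syscall(SYS_pidfd_open, pid, 0);
}

/* Hypothetical helper: return the PID behind a pidfd by scanning its
   fdinfo file for the "Pid:" line, or -1 if no such line is found. */
static pid_t pid_from_pidfd(int pidfd)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/self/fdinfo/%d", pidfd);
    assert(n > 0 && (size_t) n < sizeof path);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;              /* bad descriptor, or /proc not mounted */

    char line[256];
    long pid = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Other lines ("pos:", "flags:", "NSpid:", ...) do not match. */
        if (sscanf(line, "Pid: %ld", &pid) == 1)
            break;
    }
    fclose(f);
    return (pid_t) pid;
}
```

Calling pid_from_pidfd(open_pidfd(getpid())) should give back the caller's own PID. A descriptor that is not a pidfd has no "Pid:" line, so the helper returns -1 for it.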
The new functions are implemented with clone3() to obtain the pidfd during process creation, without a race window. Using clone3() makes some other things possible as well, specifically creating the new process in a different control group than the creator's. Zanella has made this capability available as well, via an extension to the posix_spawn() attribute mechanism. Creating into a different control group is available for posix_spawn() as well as pidfd_spawn().
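To show what clone3() offers here, the following sketch calls the system call directly with the CLONE_PIDFD flag, so that the kernel hands back the pidfd as part of process creation. It illustrates the kernel interface only and is not taken from the patch set. The names clone3_with_pidfd(), wait_via_pidfd(), and sample_child() are hypothetical, and the code assumes kernel headers recent enough to define struct clone_args.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/sched.h>        /* struct clone_args, CLONE_PIDFD */
#include <poll.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_clone3
#define SYS_clone3 435          /* value on x86-64 and arm64 */
#endif

/* Example child body; its return value becomes the exit status. */
static int sample_child(void)
{
    return 3;
}

/* Hypothetical helper: fork-like clone3() call that runs child_fn() in the
   child. In the parent it returns the child's PID and stores the pidfd in
   *pidfd; on failure it returns -1 with errno set. */
static pid_t clone3_with_pidfd(int (*child_fn)(void), int *pidfd)
{
    int fd = -1;
    struct clone_args args;

    memset(&args, 0, sizeof args);
    args.flags = CLONE_PIDFD;                   /* ask for a pidfd */
    args.pidfd = (uint64_t) (uintptr_t) &fd;    /* where the kernel stores it */
    args.exit_signal = SIGCHLD;                 /* behave like fork() */

    long ret = syscall(SYS_clone3, &args, sizeof args);
    if (ret == 0)
        _exit(child_fn());      /* child: never returns to the caller */
    if (ret > 0)
        *pidfd = fd;            /* parent: pidfd was filled in by the kernel */
    return (pid_t) ret;
}

/* Hypothetical helper: block until the pidfd becomes readable (the child
   has exited), then reap the child and return its exit status or -1. */
static int wait_via_pidfd(pid_t pid, int pidfd)
{
    struct pollfd pfd = { .fd = pidfd, .events = POLLIN };
    int status;

    if (poll(&pfd, 1, -1) < 0)
        return -1;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the descriptor is created by the same system call that creates the process, there is no interval in which only the PID is known. The control-group placement mentioned above maps, at the kernel level, to the CLONE_INTO_CGROUP flag and the cgroup field of the same structure, which this sketch leaves unset.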
While posix_spawn() is seen by many as a better model for the combination of fork() and exec(), it does not provide all of the functionality that is available. For cases where this API is not sufficient, earlier versions of the patch set included a function called fork_np() as a separate wrapper around clone3() that would return a pidfd identifying the new child process. Florian Weimer complained that this interface differs too much from what the kernel provides, though, and is "not future-proof at all". He asked Zanella to leave this function out of the series for now, and it has been duly removed from later versions of the series.
Rich Felker, instead, objected to the concept in general, claiming that any PID-related races are "purely programmer error" and that "making a new, complex, highly nonstandard interface to work around a problem that's programmer error, and getting this nonstandard and nonportable pattern into mainstream software, has negative value". It would be better, he said, to fix the software affected by this problem. Luca Boccassi disagreed, though, saying that "these are real race conditions, that cannot be solved otherwise". Weimer also said that there was value in introducing the pidfd functionality.
While there has been no definitive resolution to this particular disagreement, the fact remains that PID races can be a problem, and there are users (such as systemd) that would like to have this type of API to avoid those races. It thus seems reasonably likely that pidfd_spawn() (though perhaps not fork_np()) will eventually find its way into glibc.
