Race-free process creation in the GNU C Library

Race-free process creation in the GNU C Library

Posted Sep 1, 2023 19:27 UTC (Fri) by mb (subscriber, #50428)
In reply to: Race-free process creation in the GNU C Library by bluca
Parent article: Race-free process creation in the GNU C Library

>containers really should have it

One additional nail into the coffin of unprivileged containers?

>The way polkit/dbus

I'm talking about the fundamental pidfd API. Any process could use pidfds.



Race-free process creation in the GNU C Library

Posted Sep 1, 2023 19:35 UTC (Fri) by bluca (subscriber, #118303) [Link] (16 responses)

> One additional nail into the coffin of unprivileged containers?

I'm pretty sure those can have /proc too?

$ id -u
1000
$ unshare -U -m --mount-proc -p -f
$ mount | grep img
proc on /tmp/img type proc (rw,nosuid,nodev,noexec,relatime)

> I'm talking about the fundamental pidfd API. Any process could use pidfds.

Sure, to do process tracking - what kind of process would you need to track in a chroot? Besides, it's all moot, this is not glibc's fault, the kernel provides this interface, so that's what glibc can use to provide an abstraction

Race-free process creation in the GNU C Library

Posted Sep 1, 2023 19:36 UTC (Fri) by bluca (subscriber, #118303) [Link]

(copy-pasta, that should have been --mount-proc=/tmp/img - give us an edit button already!)

Race-free process creation in the GNU C Library

Posted Sep 1, 2023 20:57 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (11 responses)

> what kind of process would you need to track in a chroot

Any process that wants to spawn a process and use pidfd, but also write the pid in a log file or debug trace? Ignoring portability for a second, it could even be something like make or cargo.

Race-free process creation in the GNU C Library

Posted Sep 1, 2023 21:19 UTC (Fri) by bluca (subscriber, #118303) [Link] (10 responses)

That requires procfs to do today, no? So there shouldn't be a regression in that regard?

Race-free process creation in the GNU C Library

Posted Sep 1, 2023 21:30 UTC (Fri) by pbonzini (subscriber, #60935) [Link] (9 responses)

It doesn't require procfs if it uses the (inferior) pid-based API and SIGCHLD. So it's a regression if this hypothetical program wants to switch to pidfd. An ioctl does seem to be a good idea; it can return ESRCH in case of a race.

Race-free process creation in the GNU C Library

Posted Sep 1, 2023 23:23 UTC (Fri) by bluca (subscriber, #118303) [Link] (4 responses)

Ok - sounds like those use cases need to make a choice: continue to use pid-based APIs and no procfs, or switch to pidfds and mount procfs with hidepid= to sandbox it

Race-free process creation in the GNU C Library

Posted Sep 1, 2023 23:46 UTC (Fri) by josh (subscriber, #17465) [Link] (3 responses)

Or bypass glibc and call the nice race-free function the kernel provides, and continue advocating that glibc provide clone3.

Race-free process creation in the GNU C Library

Posted Sep 2, 2023 0:43 UTC (Sat) by bluca (subscriber, #118303) [Link] (2 responses)

The kernel doesn't provide functions to resolve pidfds

Race-free process creation in the GNU C Library

Posted Sep 2, 2023 1:08 UTC (Sat) by josh (subscriber, #17465) [Link] (1 responses)

Given access to clone3, you can directly obtain a pidfd and a pid simultaneously when you first create the process, rather than retrieving the pid later.

(That operation would still be useful when passed a pidfd from elsewhere, but not *necessary* for the common case where you got the pidfd by creating a process.)

Race-free process creation in the GNU C Library

Posted Sep 2, 2023 1:37 UTC (Sat) by bluca (subscriber, #118303) [Link]

The case when you want to resolve a pidfd received via SO_PEERPIDFD/SCM_PIDFD is exactly where you need that, and what is enabled by all these new APIs that have recently been added, and where this resolving glibc function is useful. I know because I had to reimplement it across 4 projects...

Race-free process creation in the GNU C Library

Posted Sep 3, 2023 4:14 UTC (Sun) by IanKelling (subscriber, #89418) [Link] (3 responses)

> So it's a regression if this hypothetical program wants to switch to pidfd.

I don't think it is hypothetical. From my sysadmin perspective, I often build software in a chroot without a /proc mount. Very rarely, the build has needed it and I wanted to know why. Bind mounting /proc, I see that find shows 546,160 user-listable files and 304,803 user-readable files. Making that a requirement for creating processes, just because a program opts in to an API that avoids a race condition, would be roughly a regression in my book.

Race-free process creation in the GNU C Library

Posted Sep 3, 2023 10:26 UTC (Sun) by bluca (subscriber, #118303) [Link] (2 responses)

Why would compiling some stuff require resolving pidfds?

Race-free process creation in the GNU C Library

Posted Sep 4, 2023 9:16 UTC (Mon) by taladar (subscriber, #68407) [Link] (1 responses)

Why wouldn't it? Compiling spawns lots of processes and that kind of thing usually involves printing the PID when logging what you are doing to be able to distinguish between different instances of the same program (e.g. the compiler when spawned by some sort of build tool).

Race-free process creation in the GNU C Library

Posted Sep 4, 2023 9:53 UTC (Mon) by bluca (subscriber, #118303) [Link]

Then the tools that spawn such processes, if they want to implement tracking by pidfd, will need to implement appropriate fallbacks (which are easy to add as the error codes are different). They'll need that anyway for compatibility with older kernels. So still not sure where the regression would be?
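
A minimal sketch of such a fallback, in Python for brevity. It assumes the only "pidfds are unavailable" signal is ENOSYS (an older kernel without the system call) or a Python build without os.pidfd_open; the helper name open_pidfd_or_none and its opener parameter are hypothetical and exist only so the logic can be exercised without a real kernel call.

```python
import errno
import os


def open_pidfd_or_none(pid, opener=None):
    """Return a pidfd for pid, or None when pidfds are unavailable.

    Hypothetical helper: None tells the caller to fall back to
    pid-based tracking (waitpid/SIGCHLD). Other errors, such as
    ESRCH for a process that no longer exists, are propagated.
    """
    if opener is None:
        # os.pidfd_open exists on Linux with Python 3.9 or later.
        opener = getattr(os, "pidfd_open", None)
    if opener is None:
        return None
    try:
        return opener(pid)
    except OSError as exc:
        if exc.errno == errno.ENOSYS:
            # Assumption: kernel too old to provide pidfd_open.
            return None
        raise
```

A build tool could call this once per spawned child and keep using the plain pid for logging and waiting whenever it gets None back.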

Race-free process creation in the GNU C Library

Posted Sep 1, 2023 22:07 UTC (Fri) by geofft (subscriber, #59789) [Link] (2 responses)

There's a practical problem that a Kubernetes container that is not marked "privileged" (which is a Kubernetes concept, rather different from the ordinary meaning of "privileged" as in "runs as root") gets certain things in /proc overmounted, e.g., /proc/sysrq-trigger and /proc/kcore, as a form of sandboxing. The goal is to reduce the impact of a malicious uid 0 inside a container. (User namespacing would also work, but most Kubernetes deployments don't use it yet - it's an alpha feature on k8s' end and only supports one container runtime.) This is, in isolation, an understandable / defensible feature, and I can see systems other than Kubernetes doing it (e.g., I can totally see it being a systemd Restrict option down the line).

Meanwhile, the kernel has a feature where, if your current /proc is in any way overmounted, you're not allowed to mount a new /proc - because that would give you access to the files that are supposed to be hidden to you. This is also, in isolation, an understandable / defensible feature.

The intersection of these features is that you can't correctly mount /proc inside a nested container or container-like thing inside a non-privileged Kubernetes container. If you make a new pidns (either because you're root or via a new userns, as in your example), all the paths in /proc are wrong because they refer to outer PIDs.

(The intersection of these features also ceases to be really defensible in the case where you don't allow your Kubernetes workloads to run as uid 0, which is a really good idea on its own.)

There have been some patches for a second procfs (whose exact name I'm forgetting) that provides /proc/$pid/ and the /proc/self/ symlink but not anything else in /proc, but I don't think they've been merged. If those could get merged and guaranteed mountable by anyone with CAP_SYS_ADMIN in their current namespace, regardless of what the existing /proc outside it looks like or even whether it exists, that would satisfactorily address the issue.

I suppose another option would be for /proc to always enumerate the calling process's PID namespace, but maybe that gets weird with open file descriptors passed between PID namespaces.

Race-free process creation in the GNU C Library

Posted Sep 1, 2023 22:28 UTC (Fri) by bluca (subscriber, #118303) [Link] (1 responses)

Isn't that what the hidepid= mount options (and systemd's ProtectProc=) do? To resolve pidfds you just need /proc/self/fd/ and /proc/self/fdinfo/, which are both available under those sandboxing options.
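
For illustration, a minimal sketch of that lookup in Python. It assumes the fdinfo entry for a pidfd carries a "Pid:" line, with -1 for a process that has already been reaped and 0 for one outside the procfs instance's PID namespace, as I understand the kernel's output. The function names parse_pidfd_fdinfo and resolve_pidfd and the proc_root parameter are hypothetical; this is not glibc's implementation.

```python
import errno
import os


def parse_pidfd_fdinfo(text):
    """Return the integer on the 'Pid:' line of fdinfo text, or None."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        # 'NSpid:' lines are skipped because the key must be exactly 'Pid'.
        if sep and key.strip() == "Pid":
            return int(value.strip())
    return None


def resolve_pidfd(pidfd, proc_root="/proc"):
    """Map a pidfd to a pid by reading <proc_root>/self/fdinfo/<fd>.

    Raises FileNotFoundError when procfs is not mounted at proc_root,
    which lets a caller tell that case apart from the others.
    """
    path = os.path.join(proc_root, "self", "fdinfo", str(pidfd))
    with open(path) as f:
        pid = parse_pidfd_fdinfo(f.read())
    if pid is None:
        raise ValueError("file descriptor is not a pidfd")
    if pid == -1:
        # Assumption: -1 means the process has already been reaped.
        raise ProcessLookupError(errno.ESRCH, "process already reaped")
    if pid == 0:
        # Assumption: 0 means the process is in another PID namespace.
        raise OSError(errno.EREMOTE, "process is in another PID namespace")
    return pid
```

On a Linux system with /proc mounted and Python 3.9 or later, resolve_pidfd(os.pidfd_open(os.getpid())) should return the caller's own pid.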

Race-free process creation in the GNU C Library

Posted Sep 2, 2023 1:56 UTC (Sat) by cyphar (subscriber, #110703) [Link]

subset=pids has no effect on the mount_too_revealing() check because all of the "are the flags the same" checks are based on the generic VFS flags not FS-specific ones. So if you only have an overmounted procfs you cannot mount subset=pids even if the overmounts are paths that don't exist with subset=pids.

In fact this also means you can bypass the check entirely -- if you have a "safe" subset=pids mount in your namespace, the kernel will allow you to mount an unmasked (fully-fledged) procfs.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds