LWN: Comments on "Grabbing file descriptors with pidfd_getfd()"

Grabbing file descriptors with pidfd_getfd()

mathstuf — Tue, 27 Jul 2021 15:03:37 +0000

Meddling from "outside" is likely to interfere with guarantees made by any language. I think once you introduce an "interloper" into your process space (either via ptrace, VM managers, OOM killer, interrupts, etc.), you're playing with fire. Sure, we know how to manage it most of the time and can keep it contained, but if it gets loose…well, I hope you have insurance[1].

[1] In code, that would be "sanity checks on top of the language guarantees". IMO, it's just normal defensive coding and the amount you put in depends on how paranoid you tend (or need) to be.

Grabbing file descriptors with pidfd_getfd()

taladar — Tue, 27 Jul 2021 08:13:33 +0000

Closing a file descriptor from another process would also circumvent any static analysis a language (like Haskell or Rust) might have done to ensure certain operations are only done on open file descriptors.

Grabbing file descriptors with pidfd_getfd()

cyphar — Wed, 29 Jan 2020 01:35:01 +0000

You're right about how magic-links work (and re-opening through /proc/$pid/fd does work for pipes), but this does not work for sockets or anonfds -- you'll get -ENXIO when you try to re-open them. Additionally, there is still a pid recycling race condition if you use procfs (unless you have a first-generation /proc/$pid-style pidfd).
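
A small sketch of the difference described above, assuming a Linux system with /proc mounted. It uses /proc/self/fd rather than another process's /proc/$pid/fd to stay self-contained, and the helper names (`try_reopen`, `demo_reopen`) are made up for illustration:

```python
import errno
import os
import socket


def try_reopen(fd, flags):
    """Re-open a descriptor through its /proc magic link.

    Returns (new_fd, None) on success or (None, errno_name) on failure.
    """
    try:
        return os.open(f"/proc/self/fd/{fd}", flags), None
    except OSError as exc:
        return None, errno.errorcode.get(exc.errno, str(exc.errno))


def demo_reopen():
    results = {}

    # A pipe can be re-opened: the new descriptor reads the same pipe.
    r, w = os.pipe()
    new_r, err = try_reopen(r, os.O_RDONLY | os.O_NONBLOCK)
    if new_r is not None:
        os.write(w, b"ping")
        results["pipe"] = os.read(new_r, 4)
        os.close(new_r)
    else:
        results["pipe"] = err
    os.close(r)
    os.close(w)

    # A socket cannot: open() on the magic link is expected to fail.
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        new_s, err = try_reopen(s.fileno(), os.O_RDWR)
        if new_s is not None:
            os.close(new_s)
        results["socket"] = err
    return results
```

Calling `demo_reopen()` should report the bytes read back through the re-opened pipe and the errno name seen for the socket.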

Grabbing file descriptors with pidfd_getfd()

dona73110 — Tue, 14 Jan 2020 14:52:41 +0000

> One thing that is possible in current kernels is to open a file that another process also has open; the information needed to do that is in each process's /proc directory. That does not work, though, for file descriptors referring to pipes, sockets, or other objects that do not appear in the filesystem hierarchy.

You sure can open a pipe that another process has open, by opening /proc/PID/fd/FD. open(2) opens the actual files that these symlinks represent, which, in the case of deleted files, pipes, etc., do not correspond to the path in the symlink target returned by readlink.

Grabbing file descriptors with pidfd_getfd()

cortana — Mon, 13 Jan 2020 09:24:23 +0000

It may be interesting to note another alternative: a Mandatory Access Control system such as SELinux, where confined processes are only allowed to bind to ports permitted by the policy (e.g., Apache running in the httpd_t domain can only listen on ports labelled with http_port_t).

Grabbing file descriptors with pidfd_getfd()

rra — Sat, 11 Jan 2020 23:12:18 +0000

To ask what's probably the same question in a slightly different way: is the rule that only root can bind to ports below 1024 still useful?

Back when that was added to UNIX's security model, there was a wealth of programs that used the ability to bind to specific ports as an authorization control of various kinds (remember identd?). Most of those protocols are thoroughly obsolete (I hope no one is using traditional rlogin with rhosts authentication these days), so protecting those ports doesn't serve the same purpose.

I would argue that, today, the security concern is preventing programs from grabbing ports they're not "supposed" to have, but that problem is not limited to ports under 1024 except by history and convention. There are a lot of services that listen to ports above 1024 where some race condition allowing a user process to bind to that port is equally problematic.

It feels like a more useful security primitive now would be controlling the specific ports to which a process can bind, which looks more like socket activation (as you describe), or like a container where the process can bind to any port it wants but only expected ports are routed outside the container, so binding to other ports is futile.

Grabbing file descriptors with pidfd_getfd()

cyphar — Fri, 10 Jan 2020 22:30:53 +0000

Using sendmsg(2) requires co-operation from the other side (or the injection of parasitic code à la CRIU or rr). Those approaches are really suboptimal for a bunch of reasons, and having an interface which does this properly and doesn't require shellcode injection as part of normal code execution is a massive benefit. Not to mention that seccomp filters on the target process may block some of the syscalls needed for that to work.
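
For comparison, here is roughly what the cooperative path looks like: an SCM_RIGHTS control message sent with sendmsg() over an AF_UNIX socket. This is only a sketch. Both ends live in one process here for brevity, whereas in practice the sender and receiver would be separate processes, and the function names are made up for illustration:

```python
import array
import os
import socket


def send_fd(sock, fd):
    """Sender side: the process that owns fd must run this itself."""
    fds = array.array("i", [fd])
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)])


def recv_fd(sock):
    """Receiver side: returns a new descriptor for the same open file."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(
        1, socket.CMSG_SPACE(fds.itemsize)
    )
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            usable = len(data) - (len(data) % fds.itemsize)
            fds.frombytes(data[:usable])
    if not fds:
        raise RuntimeError("no file descriptor received")
    return fds[0]


def demo_pass_fd():
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    r, w = os.pipe()
    try:
        send_fd(left, r)
        received = recv_fd(right)
        os.write(w, b"hello")
        data = os.read(received, 5)  # read through the received copy
        os.close(received)
        return received != r, data
    finally:
        for fd in (r, w):
            os.close(fd)
        left.close()
        right.close()
```

The point of the comparison is that `send_fd()` has to execute inside the process that holds the descriptor, which is exactly the co-operation (or code injection) that pidfd_getfd() avoids.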

Grabbing file descriptors with pidfd_getfd()

kylebot — Fri, 10 Jan 2020 20:53:56 +0000

If I remember correctly, one process can already send file descriptors to another through the sendmsg() syscall. What's the difference between the two methods?

Grabbing file descriptors with pidfd_getfd()

nix — Fri, 10 Jan 2020 15:14:01 +0000

innbind is usually installed mode 1550, group news, so it's only executable by things in the Usenet news subsystem, which are all in the same trust domain.

Grabbing file descriptors with pidfd_getfd()

Karellen — Fri, 10 Jan 2020 14:29:27 +0000

Thanks for pointing to those!

However, I'd have reservations about using authbind - LD_PRELOAD is handy for debugging and trying weird tricks out, but I'm wary about using it in production systems.

innbind looks much cleaner, and certainly would allow you to write a program that could bind to privileged ports without needing to run as root, but as far as I can tell it allows any program on the system to bind privileged ports. If you installed it so that only members of a specific group were able to run it, and limited which programs ran as members of that group, that could work.

Grabbing file descriptors with pidfd_getfd()

miquels — Fri, 10 Jan 2020 13:39:16 +0000

Or things like authbind and innbind?

Grabbing file descriptors with pidfd_getfd()

roc — Thu, 09 Jan 2020 21:23:08 +0000

This sounds great. We have code to do this in rr already:
https://github.com/mozilla/rr/blob/79eea40fe0d496abb6fcb0...
It's not nice, especially because we want it to work whether the tracee is 64-bit or 32-bit.

Of course it will be years before the new syscall is widely deployed enough that we can actually rip out our code, but ... progress.

Grabbing file descriptors with pidfd_getfd()

Karellen — Thu, 09 Jan 2020 19:04:46 +0000

> I still believe that binding sockets (to well known ports) ought to be something that is handled by system infrastructure and not separately by each individual server.

So, like sd_listen_fds()?
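
For readers who have not used it, the socket-activation protocol behind sd_listen_fds() is small enough to sketch. The environment variables LISTEN_PID and LISTEN_FDS and the starting descriptor number 3 come from the systemd protocol; the Python function below is an illustrative reimplementation, not the libsystemd API, and it skips details such as setting close-on-exec or clearing the environment:

```python
import os

SD_LISTEN_FDS_START = 3  # first inherited descriptor in the protocol


def listen_fds(environ=None, pid=None):
    """Return the descriptor numbers passed in by the service manager.

    environ and pid are parameters only so the logic can be exercised
    without actually being started by a service manager.
    """
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables only apply to the process they were meant for.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []
    if count < 0:
        return []
    return list(range(SD_LISTEN_FDS_START, SD_LISTEN_FDS_START + count))
```

A server would then wrap each number with something like `socket.socket(fileno=fd)` and go straight to accept(), never calling bind() itself.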

Grabbing file descriptors with pidfd_getfd()

zblaxell — Thu, 09 Jan 2020 18:57:02 +0000

> One would cause the file descriptor to be closed in the target process after being copied to the caller, thus truly "stealing" the descriptor from the target.

That sounds messy--the FD could end up being used again by an open in some other thread of the target process, causing hilarious confusion on the target side if the target is not expecting FD thievery.

Why not do an atomic FD swap?

int stolen_fd = pidfd_swapfd(int pid_fd, int target_fd, int flags, int caller_fd)

Set caller_fd = NOFD if you really want the FD closed in the target process; otherwise, the caller's caller_fd becomes the target's target_fd, while the former target's target_fd is returned in stolen_fd.

Set target_fd = NOFD to copy caller_fd to the target process, assigning a new FD as if the target process had performed an open(). The new FD number in the target is returned in stolen_fd.

caller_fd isn't closed in the calling process--close() is fine for that.
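
To make the three cases concrete, here is a toy model of the proposed semantics. pidfd_swapfd() is hypothetical and does not exist in any kernel; the model below stands in for each process's descriptor table with a plain dict, and the NOFD value of -1 and the lowest-free-number allocation rule are assumptions made for illustration:

```python
NOFD = -1  # assumed sentinel; the proposal does not give a value


def lowest_free(table):
    """Pick the lowest unused descriptor number, as open() would."""
    fd = 0
    while fd in table:
        fd += 1
    return fd


def swapfd(caller, target, target_fd, caller_fd):
    """Model of the proposed call on two dict-based descriptor tables."""
    if target_fd == NOFD and caller_fd == NOFD:
        raise ValueError("nothing to do")
    if target_fd == NOFD:
        # Copy caller_fd into the target under a fresh number.
        new_fd = lowest_free(target)
        target[new_fd] = caller[caller_fd]
        return new_fd
    stolen = target[target_fd]
    if caller_fd == NOFD:
        del target[target_fd]  # plain steal: slot is closed in target
    else:
        target[target_fd] = caller[caller_fd]  # slot is never left empty
    stolen_fd = lowest_free(caller)
    caller[stolen_fd] = stolen
    return stolen_fd  # caller_fd stays open in the caller
```

In the swap case the target's slot goes straight from the old file to the new one, so another thread's open() in the target cannot land on that number in between.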

Grabbing file descriptors with pidfd_getfd()

NYKevin — Thu, 09 Jan 2020 18:44:55 +0000

> That distinction matters if the objective is to modify that particular file descriptor. One use case mentioned in the patch series is using seccomp to intercept attempts to bind a socket to a privileged port. A privileged supervisor process could, if it so chose, grab the file descriptor for that socket from the target process and actually perform the bind — something the target process would not have the privilege to do on its own. Since the grabbed file descriptor is essentially identical to the original, the bind operation will be visible to the target process as well.
>
> For the sufficiently determined, it is actually possible to extract a file descriptor from another process now. The technique involves using ptrace() to attach to that process, stop it from executing, inject some code that opens a connection to the supervisor process and sends the file descriptor via an SCM_RIGHTS datagram, then running that code. This solution might justly be said to be slightly lacking in elegance. It also requires stopping the target process, which is likely to be unwelcome.

On first read, I found this rather confusing. Surely the sandboxed process would be able to open that AF_UNIX connection itself, right?

But no, because they're not talking about a sandboxed process that is cooperating with the supervisor. They're (I think) talking about a sandboxed process that is ignorant of its sandbox and thinks it can "just call bind(2)." In that case, you actually need to intercept that call and emulate it outside the sandbox, without the sandboxed process noticing.
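
The "grab" step itself can be sketched with raw syscalls through ctypes. Assumptions: Linux 5.6 or later, and the syscall numbers 434 (pidfd_open) and 438 (pidfd_getfd), which hold on x86-64, arm64 and other architectures using the common table but not everywhere. The function names are made up for illustration. A real supervisor would pass the sandboxed process's PID and needs ptrace-attach permission over it:

```python
import ctypes
import errno
import os

SYS_pidfd_open = 434   # assumed: common syscall table
SYS_pidfd_getfd = 438  # assumed: common syscall table

_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def _syscall(number, *args):
    """Invoke a raw syscall and raise OSError on failure."""
    ret = _libc.syscall(
        ctypes.c_long(number), *(ctypes.c_long(a) for a in args)
    )
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def grab_fd(pid, target_fd):
    """Return a local duplicate of target_fd from process pid."""
    pidfd = _syscall(SYS_pidfd_open, pid, 0)
    try:
        return _syscall(SYS_pidfd_getfd, pidfd, target_fd, 0)
    finally:
        os.close(pidfd)


def try_grab_fd(pid, target_fd):
    """Like grab_fd(), but return None if the call is unavailable or denied."""
    try:
        return grab_fd(pid, target_fd)
    except OSError as exc:
        if exc.errno in (errno.ENOSYS, errno.EPERM, errno.EACCES):
            return None
        raise
```

Once the supervisor holds the duplicate, it can call bind() on it, and because both descriptors refer to the same socket, the target sees the result without ever learning that the call was emulated.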

What bothers me most, however, is that this still feels like an antiquated system design. In the great before-times, inetd would spawn your server with the socket already hooked to stdin, and you wouldn't need to think about calling bind() or indeed any part of the sockets interface. While there are obvious scalability concerns with that approach, I still believe that binding sockets (to well known ports) ought to be something that is handled by system infrastructure and not separately by each individual server.