LWN: Comments on "New AT_ flags for restricting pathname lookup"

New AT_ flags for restricting pathname lookup

eru — Mon, 08 Oct 2018 04:18:22 +0000

Thanks to all for explaining the need and use of the somethingat() calls.

New AT_ flags for restricting pathname lookup

rweikusat2 — Sun, 07 Oct 2018 19:27:35 +0000

rc = fstatat(dirfd, d_ent->d_name, &, 0);

This should have been

rc = fstatat(dirfd, d_ent->d_name, &st, 0);

and was but got deleted when "htmlifying" the source ... :-(

New AT_ flags for restricting pathname lookup

rweikusat2 — Sun, 07 Oct 2018 17:22:24 +0000

Manipulating pathnames means "doing string operations", something that's fairly cumbersome in C. For an example, consider the following toy-program:

#define _GNU_SOURCE

#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>

static char *cwd[] = {
    ".",
    NULL
};

int main(int argc, char **argv)
{
    DIR *dir;
    struct dirent *d_ent;
    struct stat st;
    int dirfd, rc;
    
    ++argv;
    if (!*argv) argv = cwd;
    do {
        dirfd = open(*argv, O_RDONLY, 0);
        if (dirfd == -1) {
            perror("open");
            continue;
        }
        
        dir = fdopendir(dirfd);
        if (!dir) {
            perror("fdopendir");
            continue;
        }

        printf("-----\nfiles in %s\n-----\n", *argv);
        while ((d_ent = readdir(dir))) {
            rc = fstatat(dirfd, d_ent->d_name, &st, 0);
            if (rc == -1) {
                if (errno != ENOENT) perror("fstatat");
                continue;
            }

            if (S_ISREG(st.st_mode))
                printf("%s\t\t%zu bytes\n", d_ent->d_name, (size_t)st.st_size);
        }

        closedir(dir);
    } while (*++argv);

    return 0;
}

This takes a list of directory pathnames as arguments and prints the names and sizes of all files in any of the directories. It uses fstatat because the names returned by readdir are filenames relative to the directory being read. Thanks to the *at-call, they can be accessed without doing dynamic string manipulation and buffer management and also without changing the cwd of the process forward and backward for each directory.

Also, chdir is basically unusable in multi-threaded processes as it changes the working directory of the process, ie, it affects all threads, not just the one executing it and, as seen by another thread, the cwd change is an unpredictable, asynchronously occurring event. Eg, a thread desiring to create two files in the same directory might end up creating them in different directories.
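
A minimal sketch of the two-files case, assuming Linux/glibc: both names are resolved against a single directory descriptor, so a chdir() in another thread cannot send them to different places. The helper name create_pair is hypothetical and only for illustration.

```c
#define _GNU_SOURCE

#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: create two empty files inside `dirpath`.
 * The directory is resolved once; both files are then created relative
 * to that descriptor, independent of the process-wide cwd.
 * Returns 0 on success, -1 on error (errno set by the failing call).
 */
static int create_pair(const char *dirpath, const char *a, const char *b)
{
    int dirfd, fd, rc = -1;

    assert(dirpath && a && b);

    dirfd = open(dirpath, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd == -1) return -1;

    /* O_EXCL: fail instead of silently reusing an existing file */
    fd = openat(dirfd, a, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1) goto out;
    close(fd);

    fd = openat(dirfd, b, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1) goto out;
    close(fd);

    rc = 0;
out:
    close(dirfd);
    return rc;
}
```

If dirpath itself is relative, the initial open() still depends on the cwd at that instant; everything after it does not.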

Lastly, the directory a process was started in might have been picked intentionally, eg, as the location where core dumps should go, and the process shouldn't change it except if there's a very good reason for that (and this should be documented).

New AT_ flags for restricting pathname lookup

cyphar — Sun, 07 Oct 2018 04:36:36 +0000

Something like resolveat(2)? The problem is that this would necessarily be conceptually identical to openat(O_PATH). Maybe O_PATH should've been a different syscall but we are mostly stuck with it now, and I think it would be strange to have two methods of opening an O_PATH descriptor. Though, there are some aspects of O_PATH that I think need to be fixed (and would require more convoluted O_ flags -- maybe a new syscall is warranted to fix some of the semantics of O_PATH. I'm not sure.)

And remember that the widespread utility of any resolveat(2) syscall would likely require having AT_EMPTY_PATH support for every *at(2) syscall (which is unfortunately far from the case currently).
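
For illustration, a minimal Linux-only sketch of the pattern this depends on: resolve a path once into an O_PATH descriptor, then operate on the descriptor itself by passing an empty pathname with AT_EMPTY_PATH. fstatat() is one of the calls that does accept the flag; the helper name stat_via_opath is hypothetical.

```c
#define _GNU_SOURCE

#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: resolve `path` exactly once, then stat the
 * resulting object through the descriptor rather than through the
 * pathname. Returns 0 on success, -1 on error.
 */
static int stat_via_opath(const char *path, struct stat *st)
{
    int fd, rc;

    assert(path && st);

    /* O_PATH: obtain a handle on the location without opening the file */
    fd = open(path, O_PATH | O_CLOEXEC);
    if (fd == -1) return -1;

    /* empty pathname + AT_EMPTY_PATH: operate on fd itself */
    rc = fstatat(fd, "", st, AT_EMPTY_PATH);

    close(fd);
    return rc;
}
```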

New AT_ flags for restricting pathname lookup

judas_iscariote — Sun, 07 Oct 2018 00:34:15 +0000

It is quite unfortunate that kernel developers insist on extending openat() with more and more contrived semantics; I wish they just added new syscalls with well-defined behaviour.

New AT_ flags for restricting pathname lookup

wahern — Sat, 06 Oct 2018 01:37:28 +0000

Shouldn't it be possible to quiesce the runtime (pause GC, park all other goroutines, and join all kernel threads)? All the machinery in the scheduler must already be there, more or less. Maybe some component is currently running in a dedicated thread in an infinite loop, but conceptually it could be refactored to be able to enter and exit its core loop.

It might not be particularly efficient and come with a ton of gotchas, but it would at least make some currently impossible things possible, such as using geteuid and forking helper processes. Those things tend to happen early on, anyhow, so performance and other limitations wouldn't matter much.

New AT_ flags for restricting pathname lookup

nix — Fri, 05 Oct 2018 22:24:27 +0000

Generally I do the same thing when debugging eBPF that I do when debugging other programs: printf()! In the case of eBPF you throw in a helper that does a printk() and chuck in calls to the helper liberally. (This is not so useful if you can't modify the eBPF, mind you.)

New AT_ flags for restricting pathname lookup

Cyberax — Fri, 05 Oct 2018 17:14:36 +0000

It's way worse than assembly. With assembly you can typically use debuggers to trace the execution and inspect the environment. Nothing comparable exists for eBPF.

Places to block filesystem traversal

smurf — Fri, 05 Oct 2018 14:24:40 +0000

Also, userspace sanitization depends on the fact that no second thread exists that modifies the sanitized path before it's passed to the kernel. In-kernel defenses against that sort of thing at least work.

New AT_ flags for restricting pathname lookup

nix — Fri, 05 Oct 2018 12:10:48 +0000

eBPF is a nice thing to have if machine-generated (it's a rather nice and orthogonal assembler, and the ability to add helpers is just a killer feature that I wish real assemblers had!), but it's about as pleasant to debug programs written in it as any other assembler: i.e. fairly easy if you're familiar with the code generator, a nightmare otherwise, doubly so if this is the less regular land of handwritten code, disassembled and devoid of comments.

New AT_ flags for restricting pathname lookup

nix — Fri, 05 Oct 2018 12:08:38 +0000

Others have commented on the problems with chdir(). The problem with using long absolute pathnames is twofold: firstly, you race with people modifying symlinks and/or renaming out from underneath you (*at() can at least reduce this by nailing the walk to specific directory inodes). Secondly, the length of pathnames is capped at pathconf(..., _PC_PATH_MAX): but you can make directory trees of arbitrary depth, with absolute paths much longer than this and indeed longer than the hardware page size. Nobody does this manually, but it can and does happen with machine-generated hierarchies, and the deep parts of such hierarchies are *only* traversable via chdir() or the *at() syscalls: while you can compose an absolute path that should reach those parts, the kernel will reject it with -ENAMETOOLONG.

So generic code has no choice but to use chdir() or *at() to traverse hierarchies or fail on such deep hierarchies, and generic multithreaded code or library code which might be run in multithreaded contexts has no choice but to use *at().
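
A minimal sketch of such a traversal, assuming Linux/glibc: the path is walked one component at a time with openat(), so no single lookup is ever longer than one component, however deep the tree is. The helper name walk_components is hypothetical, and the sketch does nothing to restrict symlinks or ".." components.

```c
#define _GNU_SOURCE

#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: open the directory reached by following `path`
 * one component at a time, starting at `startfd` (which may be
 * AT_FDCWD). Leading and repeated slashes are skipped, so the path is
 * always treated as relative to `startfd`. `path` is modified in place.
 * Returns a directory descriptor, or -1 on error.
 */
static int walk_components(int startfd, char *path)
{
    char *save = NULL, *comp;
    int fd, next;

    assert(path);

    /* private descriptor for the starting directory */
    fd = openat(startfd, ".", O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (fd == -1) return -1;

    for (comp = strtok_r(path, "/", &save); comp;
         comp = strtok_r(NULL, "/", &save)) {
        /* each lookup names a single component relative to fd */
        next = openat(fd, comp, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
        close(fd);
        if (next == -1) return -1;
        fd = next;
    }

    return fd;
}
```

The caller owns the returned descriptor and can hand it to fdopendir(), fstatat() and friends.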

New AT_ flags for restricting pathname lookup

pbonzini — Fri, 05 Oct 2018 10:47:02 +0000

For one, chdir affects the entire process rather than the current thread only.

New AT_ flags for restricting pathname lookup

Cyberax — Fri, 05 Oct 2018 07:34:41 +0000

I hope so. I've just spent a day debugging an eBPF filter written by somebody else and it's NOT a nice experience at all.

Debugging infrastructure is sorely lacking for it.

New AT_ flags for restricting pathname lookup

flewellyn — Fri, 05 Oct 2018 07:31:40 +0000

I believe neilbrown was joking. I have no evidence for this, but I am desperately choosing to believe it anyway.

New AT_ flags for restricting pathname lookup

kostix — Fri, 05 Oct 2018 07:13:16 +0000

That wouldn't have helped anyway: the problem with not being able to do the classic fork+exec in Go programs is that the code executing in each of them heavily relies on the live Go runtime (which is linked with/into any compiled Go executable and actually manages the whole lifecycle of the program), and that runtime exploits multiple OS threads — both to run the program's goroutines and do its own chores.

Since fork() clones the state of just a single thread (the one which happened to execute that syscall), as soon as control resumes in the child process, there is literally no Go runtime anymore around the goroutine "awoken" in the cloned thread, and as soon as it happens to call anything which would normally reach for the runtime, it is hosed. And normally such a call would happen pretty soon.

So basically the only sensible thing one might safely do after forking a process running a Go program is to do a controlled set of preparations and exec().
And actually that's what the syscall.ForkExec does — with some added complexity stemming from Go having an execution model other than C ;-)

You can look at ForkExec in https://golang.org/src/syscall/exec_unix.go and then at forkAndExecInChild in https://golang.org/src/syscall/exec_linux.go — the code is very easy to follow for any programmer with a C background, and it is extensively commented.

Places to block filesystem traversal

epa — Fri, 05 Oct 2018 07:11:12 +0000

It’s not just containers. Path-traversal bugs are a common exploit in archivers like tar or unzip, where unpacking a malicious archive file overwrites things elsewhere in the filesystem. I imagine web servers might also use this flag as an additional defence to make sure they only serve content from the right directory. If the flag existed on all operating systems, a lot of userspace path sanitizing code could be removed.
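
For a sense of what that userspace sanitizing code looks like, here is a minimal sketch of a purely lexical check of the kind an archiver might apply to member names. The function name member_name_is_safe is hypothetical; a check like this cannot see symlinks inside the extraction directory and is subject to races, which is exactly what an in-kernel flag would cover.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical check: accept `name` only if it is non-empty, relative,
 * and contains no ".." component. Purely lexical: it does not look at
 * the filesystem at all.
 */
static bool member_name_is_safe(const char *name)
{
    const char *p, *end;
    size_t len;

    assert(name);

    if (name[0] == '\0' || name[0] == '/') return false;

    for (p = name; *p; p = end) {
        end = p + strcspn(p, "/");      /* end of current component */
        len = (size_t)(end - p);
        if (len == 2 && p[0] == '.' && p[1] == '.') return false;
        while (*end == '/') ++end;      /* skip separator(s) */
    }

    return true;
}
```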

New AT_ flags for restricting pathname lookup

Cyberax — Fri, 05 Oct 2018 04:33:52 +0000

You can't, not in a race-free way anyway.

New AT_ flags for restricting pathname lookup

eru — Fri, 05 Oct 2018 04:13:36 +0000

openat() is one of those Linux system calls whose rationale I don't quite understand. It allows opening files relative to a particular directory, but can't you do the same thing by manipulating the path name, or by using chdir() first?

New AT_ flags for restricting pathname lookup

luto — Thu, 04 Oct 2018 23:55:34 +0000

It would be “simple” in the sense that getting the eBPF right would be at least as difficult as getting the kernel code with the AT flags right would be. But with eBPF, no one would ever review it carefully or fix the bugs.

eBPF is flexible, but it’s not magic.

New AT_ flags for restricting pathname lookup

Cyberax — Thu, 04 Oct 2018 23:03:49 +0000

No......

Please, no more eBPF. It never ever works outside of kernel developers' machines.

New AT_ flags for restricting pathname lookup

neilbrown — Thu, 04 Oct 2018 22:52:54 +0000

Surely this could be vastly simplified by allowing an eBPF program to be attached to a file descriptor so that when a path_lookup starts from that file descriptor, the eBPF program is used to vet or modify the lookup of each component.

New AT_ flags for restricting pathname lookup

Cyberax — Thu, 04 Oct 2018 21:45:20 +0000

You can pin a goroutine to a thread using LockOSThread, but it basically locks this thread out of running other goroutines.

(Personally, I'd like for them to add goroutine IDs)

New AT_ flags for restricting pathname lookup

wahern — Thu, 04 Oct 2018 21:36:42 +0000

I don't understand why the Go team is so resistant to adding the ability to explicitly pin a goroutine to a machine thread. Goroutines are an amazing, almost ideal construct. But there's a very obvious and unresolvable impedance mismatch between how goroutines implement threading (linear flow of logical execution) and how traditional operating systems do. A similar mismatch exists with FFI ABIs (i.e. stack details) and with the blocking semantics of some syscalls. In those cases a goroutine *is* pinned to a machine thread; indeed, the very architecture of the Go runtime (the [G]oroutine, OS [M]achine thread, and [P]rocessor scheduling abstractions) is built around this mismatch. It's inexplicable to me why they refuse to expose the scheduling levers that must necessarily exist.