New AT_ flags for restricting pathname lookup
There have been previous attempts at restricting pathname lookup, but none of them have been merged thus far. David Drysdale posted an O_BENEATH option to openat() in 2014 that would require the eventual target to be underneath the starting directory (as provided to openat()) in the filesystem hierarchy. More recently, Al Viro suggested AT_NO_JUMPS as a way of preventing lookups from venturing outside of the current directory hierarchy or the starting directory's mount point. Both ideas have attracted interest, but neither has yet been pushed long or hard enough to make it into the mainline.
Sarai's venture into this territory takes the form of several new AT_ flags that can be used with system calls like openat():
- AT_BENEATH would, similar to O_BENEATH, prevent the pathname lookup from moving above the starting point in the filesystem hierarchy. So, as a simple example, an attempt to open ../foo would be blocked. This option does allow the use of ".." in a pathname as long as the result remains below the starting point, though, so opening foo/../bar would work.
- AT_XDEV prevents the lookup from crossing a mount-point boundary in either the upward or downward direction.
- AT_NO_PROCLINK prevents the following of symbolic links found in the /proc hierarchy; in particular, it is aimed at the links found under fd/ in any specific process's directory.
- AT_NO_SYMLINK prevents following any symbolic links at all, including those blocked by AT_NO_PROCLINK.
- AT_THIS_ROOT performs the equivalent of a chroot() call (to the starting directory) prior to the beginning of pathname lookup. This option, too, is meant to constrain lookups to the given directory hierarchy; it will also change how absolute symbolic links are interpreted.
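To make the AT_BENEATH rule concrete, here is a minimal user-space sketch in C of only the lexical part of the check. The function name stays_beneath() is hypothetical and is not part of the patch set, and rejecting absolute paths outright is this sketch's own assumption. It deliberately ignores symbolic links and mount points, which are exactly what an in-kernel implementation has to police during the walk. So it illustrates the rule rather than enforcing it.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the relative path, resolved purely
 * lexically (no symlinks, no mounts), never climbs above the starting
 * directory. "../foo" escapes; "foo/../bar" stays beneath.
 * NOT a security check: a symlink along the path could still escape.
 */
bool stays_beneath(const char *path)
{
    int depth = 0;              /* levels below the starting point */
    const char *p = path;

    if (*p == '/')
        return false;           /* assumption: absolute paths are refused */

    while (*p) {
        const char *end = strchr(p, '/');
        size_t len = end ? (size_t)(end - p) : strlen(p);

        if (len == 2 && p[0] == '.' && p[1] == '.') {
            if (--depth < 0)
                return false;   /* ".." would move above the start */
        } else if (len > 0 && !(len == 1 && p[0] == '.')) {
            depth++;            /* ordinary name: one level down */
        }
        /* empty components ("//") and "." leave depth unchanged */

        p = end ? end + 1 : p + len;
    }
    return true;
}
```

With this sketch, stays_beneath("foo/../bar") is true while stays_beneath("../foo") and stays_beneath("foo/../../x") are false, matching the examples given for AT_BENEATH.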
There are numerous use cases for these new flags, but the driving force this time around would appear to be container workloads and, in particular, runtime systems for containers. Those systems often have to look inside a container and, perhaps, act on files within a container's directory hierarchy. If the container itself is compromised or otherwise malicious, it can attempt to play games with its filesystems to confuse the runtime system and gain access to the host.
This posting got a reception that was positive overall, but with a number of concerns about the details. For example, Jann Horn liked AT_BENEATH, but would rather that it forbade the use of ".." entirely, even if the result remains beneath the starting point. Doing so would help to block exploitation of various types of directory-traversal bugs, he said. Sarai responded that 37% of all the symbolic links on his system contained ".."; "this indicates to me that you would be restricting a large amount of reasonable resolutions because of this restriction". That said, he indicated a willingness to change the behavior if need be.
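Sarai's 37% figure comes from counting symbolic links whose targets contain "..". The discussion does not say how he measured it, so the following is only one plausible way to take such a count, assuming a POSIX system with nftw(). The names dotdot_link_percentage(), has_dotdot_component(), and visit() are hypothetical. A target counts only when ".." appears as a whole path component, so a name such as "..bar" is not counted.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static long n_links, n_dotdot;

/* True if ".." appears in target as a whole path component. */
static bool has_dotdot_component(const char *target)
{
    const char *p = target;

    while ((p = strstr(p, "..")) != NULL) {
        bool starts = (p == target) || p[-1] == '/';
        bool ends = p[2] == '\0' || p[2] == '/';
        if (starts && ends)
            return true;
        p += 2;
    }
    return false;
}

static int visit(const char *fpath, const struct stat *sb, int typeflag,
                 struct FTW *ftwbuf)
{
    char target[PATH_MAX];
    ssize_t len;

    (void)sb;
    (void)ftwbuf;
    if (typeflag != FTW_SL)
        return 0;                   /* only symbolic links matter here */
    len = readlink(fpath, target, sizeof(target) - 1);
    if (len < 0)
        return 0;                   /* vanished or unreadable: skip it */
    target[len] = '\0';             /* readlink() does not NUL-terminate */
    n_links++;
    if (has_dotdot_component(target))
        n_dotdot++;
    return 0;                       /* keep walking */
}

/* Percentage of symlinks under root whose target uses "..", or -1 on error. */
double dotdot_link_percentage(const char *root)
{
    n_links = n_dotdot = 0;
    /* FTW_PHYS: report symlinks themselves instead of following them. */
    if (nftw(root, visit, 16, FTW_PHYS) == -1)
        return -1.0;
    return n_links ? 100.0 * n_dotdot / n_links : 0.0;
}
```

Calling dotdot_link_percentage("/usr"), for instance, gives the share for that tree only; the number will differ from system to system.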
Horn also complained about the "footgun potential" of AT_THIS_ROOT which, he said, shares all of the security failings of chroot(). He described a scenario where a hostile container could force an escape by moving directories around: "If the root of your walk is below an attacker-controlled directory, this of course means that you lose instantly". A possible mitigation here would be to require the starting directory in AT_THIS_ROOT lookups to be a mount point; Sarai was amenable to making this change as well.
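The suggested mitigation requires the starting directory of an AT_THIS_ROOT lookup to be a mount point. User space can only approximate that condition. The sketch below uses the traditional comparison of st_dev against the parent directory (".."), with hypothetical function names. It misses bind mounts from the same filesystem and reports the root of a chroot as a mount point, so treat it as an illustration of the idea, not as the check the kernel would perform.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical heuristic: does the directory open on dirfd look like the
 * root of a mount? Returns 1 if so, 0 if not, -1 on error.
 */
int looks_like_mount_point(int dirfd)
{
    struct stat self, parent;

    if (fstat(dirfd, &self) == -1 ||
        fstatat(dirfd, "..", &parent, 0) == -1)
        return -1;

    /* A different device than the parent means a filesystem boundary. */
    if (self.st_dev != parent.st_dev)
        return 1;

    /* "/" is its own parent (same device and inode) and is a mount.
     * A bind mount from the same filesystem is NOT detected here. */
    return self.st_ino == parent.st_ino;
}

/* Convenience wrapper taking a pathname instead of a descriptor. */
int path_looks_like_mount_point(const char *path)
{
    int fd = open(path, O_RDONLY | O_DIRECTORY);
    int rc;

    if (fd == -1)
        return -1;
    rc = looks_like_mount_point(fd);
    close(fd);
    return rc;
}
```

Like any user-space check, the answer can be stale by the time a lookup starts, which is one more reason the constraint is more useful when enforced inside the lookup itself.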
Horn, along with Andy Lutomirski, questioned the container-management use case; as Lutomirski put it: "Any sane container is based on pivot_root or similar, so the runtime can just do the walk in the container context". In this particular case, it turns out that part of the problem stems from the fact that the container runtime in question is written in Go.
Since the system cannot use the relatively cheap ways to get into a container's context, it has to use an expensive workaround instead; this expense could be avoided if files could be opened with the new AT_ flags. Lutomirski responded that he is "not very sympathetic to the argument that 'Go's runtime model is incompatible with the simpler solution'". He proposed an alternative that might work in this setting without adding the new flags.
That alternative might work, but the fact remains that there are other use cases for restricting the scope of pathname lookups; that is why the idea continues to pop up on the kernel's mailing lists. And Lutomirski, too, agreed that some of the flags seem useful. Whether this implementation will be the one that manages to go all the way to the mainline remains to be seen, but it seems likely that, one of these years, the kernel will gain the ability to control lookups in a way similar to the one that has been proposed here.
Posted Oct 4, 2018 21:36 UTC (Thu)
by wahern (subscriber, #37304)
Posted Oct 4, 2018 21:45 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
(Personally, I'd like for them to add goroutine IDs)
Posted Oct 5, 2018 7:13 UTC (Fri)
by kostix (guest, #119803)
Since fork() clones the state of just a single thread (the one which happened to execute that syscall), as soon as control resumes in the child process there is literally no Go runtime left around the goroutine "awoken" in the cloned thread, and as soon as it happens to call anything that would normally reach for the runtime, it is hosed. And normally such a call would happen pretty soon.
So basically the only sensible thing one might safely do after forking a process running a Go program is to do a controlled set of preparations and exec().
You can look at ForkExec in https://golang.org/src/syscall/exec_unix.go and then at forkAndExecInChild in https://golang.org/src/syscall/exec_linux.go; the code is very easy to follow for any programmer with a C background, and it is extensively commented.
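The "controlled set of preparations and exec()" pattern described above looks roughly like this in C. This is a minimal sketch under stated assumptions: spawn_in_dir() is a hypothetical name, chdir() stands in for whatever preparation a real caller needs, and this is not a transcription of Go's forkAndExecInChild, only the same idea in its simplest form. Between fork() and execv() the child calls only async-signal-safe functions, because in a multithreaded parent no other thread (and no lock holder) exists in the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run prog with argv in directory dir and return its
 * exit status (126 if chdir() failed, 127 if exec failed), or -1 on error.
 */
int spawn_in_dir(const char *dir, const char *prog, char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: no malloc(), no stdio, no locks; just prepare and exec. */
        if (chdir(dir) == -1)
            _exit(126);
        execv(prog, argv);
        _exit(127);             /* only reached if execv() failed */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    while (waitpid(pid, &status, 0) == -1) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Using _exit() rather than exit() in the child matters: exit() would run atexit handlers and flush stdio buffers copied from the parent.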
Posted Oct 6, 2018 1:37 UTC (Sat)
by wahern (subscriber, #37304)
It might not be particularly efficient and come with a ton of gotchas, but it would at least make some currently impossible things possible, such as using geteuid and forking helper processes. Those things tend to happen early on, anyhow, so performance and other limitations wouldn't matter much.
Posted Oct 4, 2018 22:52 UTC (Thu)
by neilbrown (subscriber, #359)
Posted Oct 4, 2018 23:03 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
Please, no more eBPF. It never ever works outside of kernel developers' machines.
Posted Oct 5, 2018 7:31 UTC (Fri)
by flewellyn (subscriber, #5047)
Posted Oct 5, 2018 7:34 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
Debugging infrastructure is sorely lacking for it.
Posted Oct 5, 2018 12:10 UTC (Fri)
by nix (subscriber, #2304)
Posted Oct 5, 2018 17:14 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
Posted Oct 5, 2018 22:24 UTC (Fri)
by nix (subscriber, #2304)
Posted Oct 4, 2018 23:55 UTC (Thu)
by luto (guest, #39314)
eBPF is flexible, but it’s not magic.
Posted Oct 5, 2018 4:13 UTC (Fri)
by eru (subscriber, #2753)
Posted Oct 5, 2018 4:33 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
Posted Oct 5, 2018 10:47 UTC (Fri)
by pbonzini (subscriber, #60935)
Posted Oct 5, 2018 12:08 UTC (Fri)
by nix (subscriber, #2304)
So generic code has no choice but to use chdir() or *at() to traverse hierarchies or fail on such deep hierarchies, and generic multithreaded code or library code which might be run in multithreaded contexts has no choice but to use *at().
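As a sketch of the *at() approach to deep hierarchies, the helpers below (all names hypothetical) build, walk, and remove a chain of nested directories one level at a time with mkdirat(), openat(), and unlinkat(AT_REMOVEDIR). No full pathname is ever constructed, so the total path length can exceed PATH_MAX without any call failing. The chain is deliberately linear (one subdirectory per level) to keep the example short; a general walker would read each level with fdopendir().

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create depth nested directories called name below base. 0 or -1. */
int make_chain(int base, const char *name, int depth)
{
    int fd = dup(base);     /* private copy so base itself stays open */

    for (int i = 0; i < depth && fd != -1; i++) {
        int next;

        if (mkdirat(fd, name, 0700) == -1) {
            close(fd);
            return -1;
        }
        next = openat(fd, name, O_RDONLY | O_DIRECTORY);
        close(fd);          /* only one level's descriptor is held */
        fd = next;
    }
    if (fd == -1)
        return -1;
    close(fd);
    return 0;
}

/* Count how deep the chain goes, again one openat() step at a time. */
int chain_depth(int base, const char *name)
{
    int depth = 0, fd = dup(base);

    while (fd != -1) {
        int next = openat(fd, name, O_RDONLY | O_DIRECTORY);

        close(fd);
        fd = next;
        if (fd != -1)
            depth++;
    }
    return depth;
}

/* Remove the chain bottom-up; holds one descriptor per level. 0 or -1. */
int remove_chain(int base, const char *name)
{
    int fd = openat(base, name, O_RDONLY | O_DIRECTORY);
    int rc;

    if (fd == -1)
        return 0;           /* nothing (openable) below: done */
    rc = remove_chain(fd, name);
    close(fd);
    if (rc == 0)
        rc = unlinkat(base, name, AT_REMOVEDIR);
    return rc;
}
```

With a 200-character name, 30 levels already give a path of about 6,000 bytes, beyond the usual Linux PATH_MAX of 4096. Note that remove_chain() keeps one descriptor open per level, so very deep chains would need an iterative variant to stay under the open-file limit.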
Posted Oct 7, 2018 17:22 UTC (Sun)
by rweikusat2 (subscriber, #117920)
Also, chdir is basically unusable in multi-threaded processes as it changes the working directory of the process, i.e., it affects all threads, not just the one executing it, and, as seen by another thread, the cwd change is an unpredictable, asynchronously occurring event. E.g., a thread desiring to create two files in the same directory might end up creating them in different directories.
Lastly, the directory a process was started in might have been picked intentionally, e.g., as the location where core dumps should go, and the process shouldn't change it except if there's a very good reason for that (and this should be documented).
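A minimal sketch of the alternative implied here is to hold a descriptor for the target directory and create both files relative to it. The function name create_pair() is hypothetical. Because both openat() calls resolve against dirfd rather than the process-wide cwd, a chdir() by another thread between the two calls cannot split the files across directories.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create two new files a and b in the directory open on dirfd.
 * Immune to cwd changes made by other threads. Returns 0 or -1.
 */
int create_pair(int dirfd, const char *a, const char *b)
{
    int fa, fb;

    fa = openat(dirfd, a, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fa == -1)
        return -1;
    close(fa);

    /* Even if the cwd moved since the first call, this still lands in
     * the same directory, because it is resolved relative to dirfd. */
    fb = openat(dirfd, b, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fb == -1)
        return -1;
    close(fb);
    return 0;
}
```

The directory descriptor itself is obtained once, for example with open(path, O_RDONLY | O_DIRECTORY), and can be shared by all threads that need to work in that directory.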
Posted Oct 7, 2018 19:27 UTC (Sun)
by rweikusat2 (subscriber, #117920)
This should have been
rc = fstatat(dirfd, d_ent->d_name, &st, 0);
and it was, but the &st got deleted when "htmlifying" the source ... :-(
Posted Oct 8, 2018 4:18 UTC (Mon)
by eru (subscriber, #2753)
Posted Oct 5, 2018 7:11 UTC (Fri)
by epa (subscriber, #39769)
Posted Oct 5, 2018 14:24 UTC (Fri)
by smurf (subscriber, #17840)
Posted Oct 7, 2018 0:34 UTC (Sun)
by judas_iscariote (guest, #47386)
Posted Oct 7, 2018 4:36 UTC (Sun)
by cyphar (subscriber, #110703)
And remember that the widespread utility of any resolveat(2) syscall would likely require having AT_EMPTY_PATH support for every *at(2) syscall (which is unfortunately far from the case currently).
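The point about AT_EMPTY_PATH is that a resolve-once call would hand back a descriptor, and every follow-on operation must then act on that descriptor rather than on a pathname. No resolveat(2) exists, so this sketch uses openat() with O_PATH as a stand-in for the resolution step, followed by fstatat() with AT_EMPTY_PATH, one of the calls that does support it on Linux. The function name stat_via_resolved_fd() is hypothetical.

```c
#define _GNU_SOURCE            /* O_PATH and AT_EMPTY_PATH are Linux-specific */
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Resolve path once (relative to dirfd), then stat the result through the
 * descriptor. O_PATH names the file without opening it for I/O; an empty
 * pathname plus AT_EMPTY_PATH makes fstatat() act on fd itself.
 * Returns 0 and fills *st on success, -1 on error.
 */
int stat_via_resolved_fd(int dirfd, const char *path, struct stat *st)
{
    int fd = openat(dirfd, path, O_PATH | O_CLOEXEC);
    int rc;

    if (fd == -1)
        return -1;
    rc = fstatat(fd, "", st, AT_EMPTY_PATH);
    close(fd);
    return rc;
}
```

The same two-step shape only works for system calls that accept AT_EMPTY_PATH (or an equivalent); for the rest, the caller would have to fall back to a pathname and lose the guarantees of the restricted lookup.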
New AT_ flags for restricting pathname lookup
And actually that's what syscall.ForkExec does, with some added complexity stemming from Go having an execution model other than C ;-)
New AT_ flags for restricting pathname lookup
openat() is one of those Linux system calls whose rationale I don't quite understand. It allows opening files relative to a particular directory, but can't you do the same thing by manipulating the path name, or by using chdir() first?
New AT_ flags for restricting pathname lookup
Manipulating pathnames means "doing string operations", something that's fairly cumbersome in C. For an example, consider the following toy program:
#define _GNU_SOURCE
#include <dirent.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* default argument list when no directories are given: the cwd */
static char *cwd[] = {
    ".",
    NULL
};

int main(int argc, char **argv)
{
    DIR *dir;
    struct dirent *d_ent;
    struct stat st;
    int dirfd, rc;

    (void)argc;
    ++argv;                     /* skip the program name */
    if (!*argv) argv = cwd;

    do {
        dirfd = open(*argv, O_RDONLY | O_DIRECTORY);
        if (dirfd == -1) {
            perror("open");
            continue;
        }

        /* on success the DIR stream owns dirfd */
        dir = fdopendir(dirfd);
        if (!dir) {
            perror("fdopendir");
            close(dirfd);
            continue;
        }

        printf("-----\nfiles in %s\n-----\n", *argv);
        while ((d_ent = readdir(dir))) {
            /* d_name is relative to the directory, so stat it via dirfd */
            rc = fstatat(dirfd, d_ent->d_name, &st, 0);
            if (rc == -1) {
                /* an entry may vanish between readdir() and fstatat() */
                if (errno != ENOENT) perror("fstatat");
                continue;
            }

            if (S_ISREG(st.st_mode))
                printf("%s\t\t%zu bytes\n", d_ent->d_name, (size_t)st.st_size);
        }

        closedir(dir);          /* also closes dirfd */
    } while (*++argv);

    return 0;
}
This takes a list of directory pathnames as arguments and prints the names and sizes of all regular files in each of the directories. It uses fstatat() because the names returned by readdir() are relative to the directory being read. Thanks to the *at() call, they can be accessed without dynamic string manipulation and buffer management, and also without changing the cwd of the process back and forth for each directory.
New AT_ flags for restricting pathname lookup
Thanks to all for explaining the need and use of the somethingat() calls.
Places to block filesystem traversal