|
|
Subscribe / Log in / New account

open() flags: O_TMPFILE and O_BENEATH

By Jonathan Corbet
November 5, 2014
Not much happens on a Linux system without one or more calls to open() or one of its variants. There is no other way to create a file or access a file that already exists. It thus follows that the various flags that control the behavior of open() have a significant effect on the functionality of the system as a whole. Here, we'll look at two specific open() flags; one of them is a relatively recent addition to the kernel, while the other is still in the proposal stage.

O_TMPFILE

The O_TMPFILE flag has been discussed a few times in these pages; the abrupt nature of its addition meant that it had little review and a fair number of post-merge problems. The concept behind this flag is simple enough: it requests the creation of a file with no associated directory entry. It is thus meant for temporary files that will not be opened by any other process.

Eric Rannaud recently asked a question: what should happen when a process makes a call like the following?

    int fd = open("/tmp", O_TMPFILE | O_RDWR, 0);

The flags request the creation of a writable temporary file, but the third argument (the file mode) says that there should be no access (read or write) allowed. As it happens, POSIX is clear enough about this situation when a file is created with ordinary O_CREAT: the provided mode only applies after the creation of the file. So, while a process can create a file that it cannot itself access in general, it can still get a working file descriptor in the act of creation itself.

As it happens, though, file creation with O_TMPFILE does not work that way; the file mode is applied from the beginning, so the open() call listed above will fail. This behavior was widely recognized to be a bug, and Eric's fix was merged for the 3.18-rc3 release. But there are a couple of interesting side notes that are worth looking at.

One is that this call:

    int fd = open("/tmp", O_TMPFILE | O_RDONLY, 0666);

will still fail. When the O_TMPFILE feature was implemented, it seemed that there was no use case for a temporary file that could not be written to, so this case (O_TMPFILE with O_RDONLY) was explicitly forbidden. But it turns out that there is a use case for this type of file: creating an empty file with a specific set of extended attributes atomically. The open() call would be followed by one or more fsetxattr() calls; once everything is in place, linkat() can be used to make the file visible in the filesystem. Linus initially agreed that this use case should be supported, but later changed his mind. So read-only O_TMPFILE files will remain unsupported.

Amusingly, the original bug was discovered while digging into a related glibc bug. It seems that, when O_TMPFILE is used, the mode argument isn't passed into the kernel at all. In the case of open() on x86-64 machines, things work out of sheer luck: the mode argument just happens to be sitting in the right register when glibc makes the call into the kernel. Things do not work as well with openat(), though, with the result that, in current glibc installations, O_TMPFILE cannot be used with openat() at all. The bug is well understood and should be fixed soon.

O_BENEATH

When a developer makes a call to openat(), they will normally expect that the file being opened or created will be located in the specified directory. As is often the case, though, surprises lurk for the unwary. Trouble can come from a surprising symbolic link or deliberately malicious input; either way, it can lead to files being created or opened where they should not be.

David Drysdale has a solution in the form of the O_BENEATH flag for openat(). If this flag is included in the call, the file being accessed must exist in or below the directory provided. The enforcement of this rule is simple enough: the provided path is constrained to not start with "/" or contain "../". Any symbolic links traversed while resolving the path must meet the same conditions.

This feature was implemented as part of the filesystem access restrictions found in the Capsicum patch set. It turns out that there are other potential users as well, though. In particular, when combined with a secure computing ("seccomp") filter, O_BENEATH can be used to safely give a sandboxed process a directory to create files in.

The initial review concerns raised against this patch have been addressed in the current version. It is a relatively simple and non-invasive patch, so there is a reasonable chance that we'll see it enter the mainline during a near-future merge window.

Index entries for this article
KernelO_BENEATH
KernelO_TMPFILE


to post comments

open() flags: O_TMPFILE and O_BENEATH

Posted Nov 6, 2014 17:20 UTC (Thu) by RobSeace (subscriber, #4435) [Link] (1 responses)

> There is no other way to create a file

Just to be a pedant: creat()... Though it's redundant, it is a separate syscall!

open() flags: O_TMPFILE and O_BENEATH

Posted Nov 9, 2014 9:47 UTC (Sun) by richard_weinberger (subscriber, #38938) [Link]

...which is just another wrapper around open:

SYSCALL_DEFINE2(creat, const char __user *, pathname, umode_t, mode)
{
return sys_open(pathname, O_CREAT | O_WRONLY | O_TRUNC, mode);
}

open() flags: O_TMPFILE and O_BENEATH

Posted Nov 21, 2014 6:33 UTC (Fri) by sstewartgallus (guest, #99898) [Link] (1 responses)

Pedant time, there are a bunch of ways to create files, pipe, socketpair, eventfd, etc.. but for regular files the only way to create a file is to use open, openat and mknod.

system call __NR_creat is different from __NR_open

Posted Jun 26, 2024 15:59 UTC (Wed) by jreiser (subscriber, #11027) [Link]

In addition to __NR_open*, there are separate system calls __NR_creat:

/usr/include/asm/unistd_32.h:#define __NR_creat 8
/usr/include/asm/unistd_64.h:#define __NR_creat 85
/usr/include/asm/unistd_x32.h:#define __NR_creat (__X32_SYSCALL_BIT + 85)

/usr/include/asm/unistd_32.h:#define __NR_open 5
/usr/include/asm/unistd_32.h:#define __NR_openat 295
/usr/include/asm/unistd_32.h:#define __NR_openat2 437

/usr/include/asm/unistd_64.h:#define __NR_open 2
/usr/include/asm/unistd_64.h:#define __NR_openat 257
/usr/include/asm/unistd_64.h:#define __NR_openat2 437

/usr/include/asm-generic/unistd.h:#define __NR_openat 56

open() flags: O_TMPFILE and O_BENEATH

Posted Nov 23, 2014 22:09 UTC (Sun) by oak (guest, #2786) [Link] (1 responses)

Let the devious adventure begin: fd = open("a Steel Sky", O_BENEATH, 666);

open() flags: O_TMPFILE and O_BENEATH

Posted Nov 25, 2014 15:59 UTC (Tue) by nix (subscriber, #2304) [Link]

Well, this is trying to solve a problem that's only going to be be triggered by a malicious attacker, "far from the light of day".


Copyright © 2014, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds