Flags for fchmodat()
The prototype for fchmodat() is defined as:
int fchmodat(int fd, const char *path, mode_t mode, int flag);
Its purpose is to change the permissions of the file identified by path to the given mode. In the style of all the *at() system calls, fd can be an open file descriptor referring to a directory; if path is relative, the lookup process will start at the directory indicated by fd rather than the current working directory. The flag argument can be either zero or AT_SYMLINK_NOFOLLOW.
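As a minimal sketch of the directory-relative lookup described above, the following C helper opens a directory and changes the mode of a name inside it with a flag of zero. The helper name chmod_in_dir() and its parameters are illustrative assumptions, not part of any library.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Change the mode of `name`, looked up relative to the directory `dirpath`.
 * Returns 0 on success or -1 with errno set.
 * chmod_in_dir() is a hypothetical helper written for this example.
 */
int chmod_in_dir(const char *dirpath, const char *name, mode_t mode)
{
    assert(dirpath != NULL && name != NULL);

    /* O_DIRECTORY makes the open fail unless dirpath is a directory. */
    int dirfd = open(dirpath, O_RDONLY | O_DIRECTORY | O_CLOEXEC);
    if (dirfd < 0)
        return -1;

    /* flag == 0: if `name` ends in a symbolic link, the link is followed. */
    int ret = fchmodat(dirfd, name, mode, 0);

    /* Preserve errno from fchmodat() across close(). */
    int saved = errno;
    close(dirfd);
    errno = saved;
    return ret;
}
```

Passing AT_FDCWD instead of a directory descriptor would make a relative path resolve against the current working directory, as with plain chmod().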
Support for fchmodat() was added to the Linux kernel for the 2.6.16 release in 2006 as part of a series from Ulrich Drepper adding a number of the *at() calls. That version of fchmodat(), though, did not include the flag argument, a situation that continues to the present. As a result, the kernel's fchmodat() implementation is not compliant with the specification, and is not what application developers will expect. That, in itself, is not entirely unusual; applications do not (usually) invoke system calls directly. Instead, they use wrappers in a low-level library, usually the C library, which do what is needed to provide the expected API. That is what happens here, but the result is not ideal.
The POSIX specification defines the behavior of the AT_SYMLINK_NOFOLLOW flag as: "If path names a symbolic link, then the mode of the symbolic link is changed". That behavior differs from the default, where the mode of the file pointed to by that link will be changed instead. There are two reasons why one might want a flag like this: to actually change permissions on a symbolic link, and, more importantly, to prevent the changing of permissions on a real file by way of a symbolic link. Attackers have been known to use symbolic links to confuse a privileged program into changing file modes that should not be changed; using this flag will prevent such an outcome.
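From the caller's side, the protective use of the flag might look like the sketch below. The enum and the helper set_mode_nofollow() are hypothetical names for this example. On Linux, a symbolic link is normally reported as "refused" rather than having its mode changed, and, depending on the C library version and on whether /proc is mounted, the call may be refused for regular files as well.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical result codes for this example. */
enum nofollow_result { NF_CHANGED, NF_REFUSED, NF_ERROR };

/*
 * Ask for a mode change that must not pass through a trailing symbolic
 * link. The file that a link points to is never touched by this call.
 */
enum nofollow_result set_mode_nofollow(const char *path, mode_t mode)
{
    assert(path != NULL);

    if (fchmodat(AT_FDCWD, path, mode, AT_SYMLINK_NOFOLLOW) == 0)
        return NF_CHANGED;

    /* ENOTSUP and EOPNOTSUPP share one value on Linux. */
    if (errno == EOPNOTSUPP)
        return NF_REFUSED;

    return NF_ERROR;
}
```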
If one looks at the (functionally identical) fchmodat() implementations in the GNU C library and musl libc, two things jump out: implementing AT_SYMLINK_NOFOLLOW in user space is inelegant at best and, due to limitations in Linux itself, neither library is able to implement exactly what the specification says (but they are able to provide the important part).
The C-library implementations start by opening the file indicated by the fd and path arguments to fchmodat() as an O_PATH file descriptor. Such a descriptor allows metadata operations, but cannot be used to read or write the file; thus, it does not require read or write permission on the file to open. That open() call also uses the O_NOFOLLOW flag; if the path ends with a symbolic link, that will cause the link itself to be opened, rather than the file pointed to.
At this point, the C libraries do an fstatat64() call to determine what kind of file has just been opened; if the new file descriptor turns out to be a symbolic link, an EOPNOTSUPP failure status will be returned to the caller. The Linux kernel does not support changing the permission bits on a symbolic link in general (those bits have no real meaning anyway), so neither C-library implementation even tries.
If the target is not a symbolic link, the library could just issue a normal fchmodat() call with the given parameters and no flag. That, however, could open the door to a time-of-check-to-time-of-use vulnerability, where an attacker would replace the file with a symbolic link between the check and the mode change. So, instead, the library must change the mode bits on the file that it actually opened in the first step, without using the path name again. Unfortunately, the obvious way (using fchmod()) won't work, because that system call cannot operate on O_PATH file descriptors in many filesystems. So, instead, the C library generates the path for the open file descriptor under /proc/self/fd, then passes that to chmod() to effect the mode change.
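The three steps described above (open with O_PATH and O_NOFOLLOW, check the file type, change the mode through /proc/self/fd) can be sketched roughly as follows. This is an illustration of the technique under stated assumptions, not the actual glibc or musl source; the function name fchmodat_nofollow_sketch() is hypothetical, and the sketch only succeeds on regular files when /proc is mounted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical user-space emulation of AT_SYMLINK_NOFOLLOW for fchmodat(). */
int fchmodat_nofollow_sketch(int dirfd, const char *path, mode_t mode)
{
    assert(path != NULL);

    /* Step 1: pin the object. With O_PATH, O_NOFOLLOW opens a trailing
     * symbolic link itself instead of failing or following it. */
    int pathfd = openat(dirfd, path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    if (pathfd < 0)
        return -1;

    /* Step 2: inspect what was opened, by descriptor rather than by name. */
    struct stat st;
    if (fstatat(pathfd, "", &st, AT_EMPTY_PATH) != 0) {
        int saved = errno;
        close(pathfd);
        errno = saved;
        return -1;
    }
    if (S_ISLNK(st.st_mode)) {
        close(pathfd);
        errno = EOPNOTSUPP;  /* symbolic-link modes are not changed */
        return -1;
    }

    /* Step 3: fchmod() rejects O_PATH descriptors, so go through the
     * /proc/self/fd entry, which refers to the already-opened file. */
    char procpath[64];
    snprintf(procpath, sizeof procpath, "/proc/self/fd/%d", pathfd);
    int ret = chmod(procpath, mode);

    int saved = errno;
    close(pathfd);
    errno = saved;
    return ret;
}
```

Because the mode change in step 3 acts on the descriptor opened in step 1, replacing the path with a symbolic link after the type check does not redirect the change to another file.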
This sequence seems unlikely to be the most efficient way to prevent the following of a symbolic link for an fchmodat() call. It also will fail to work in settings where /proc is not available. A much nicer solution would be to just implement the AT_SYMLINK_NOFOLLOW flag in the kernel, which already has the needed machinery to do so in an atomic and efficient manner.
That is what Gladkov's patch series does: it creates a new fchmodat2() system call that implements the AT_SYMLINK_NOFOLLOW flag. Once this system call is available in released kernels, the C-library implementations can use it for their implementation of fchmodat(), bypassing the current workarounds. The result should be a faster and more robust implementation. Chances are that change will happen soon; VFS maintainer Christian Brauner has applied the series and routed it into linux-next, meaning that it should be pushed during the 6.6 merge window.
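Until C libraries pick up the new call, a program could in principle invoke it directly through syscall(). The following is a hedged sketch rather than what any library does: the helper names are hypothetical, and the fallback number 452 is an assumption that holds for x86-64 and the generic system-call table when the installed headers do not yet define SYS_fchmodat2.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_fchmodat2
#define SYS_fchmodat2 452  /* assumption: x86-64 and generic table value */
#endif

/* Thin wrapper around the raw system call; returns -1 with errno set. */
int raw_fchmodat2(int dirfd, const char *path, mode_t mode, int flags)
{
    assert(path != NULL);
    return (int)syscall(SYS_fchmodat2, dirfd, path, mode, flags);
}

/*
 * Prefer the in-kernel AT_SYMLINK_NOFOLLOW handling; if the running
 * kernel lacks fchmodat2() (ENOSYS), fall back to the C library's
 * fchmodat(), which handles the flag in user space.
 */
int chmod_nofollow_prefer_kernel(int dirfd, const char *path, mode_t mode)
{
    int ret = raw_fchmodat2(dirfd, path, mode, AT_SYMLINK_NOFOLLOW);
    if (ret == 0 || errno != ENOSYS)
        return ret;
    return fchmodat(dirfd, path, mode, AT_SYMLINK_NOFOLLOW);
}
```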
Interestingly, this is not the first attempt to add an fchmodat2() implementation; there were patches posted by Rich Felker in 2020 and Greg Kurz in 2017. It is not entirely clear why the patches were not accepted at that time; it may be simply because VFS patches have occasionally tended to fall through the cracks over the years. The previous failure may be part of why Felker responded rather negatively to a suggestion from David Howells that, perhaps, it would be better to add a new set_file_attrs() system call, with a number of new features, rather than completing fchmodat(). That suggestion has not gained much support, so Gladkov's attempt appears to be the one that will actually succeed; after 17 years in the kernel, fchmodat() should finally get in-kernel AT_SYMLINK_NOFOLLOW support.
Posted Jul 27, 2023 18:34 UTC (Thu)
by bof (subscriber, #110741)
Posted Jul 28, 2023 0:55 UTC (Fri)
by Paf (subscriber, #91811)
Posted Jul 28, 2023 3:11 UTC (Fri)
by wahern (subscriber, #37304)
I personally prefer the former, partly perhaps because at some point long ago I was under the impression it was the more common practice. But more recently I've gotten the impression that most developers are unfamiliar with the pattern of versioning by argument count (i.e. arity). Have I always been in the minority camp or did the world change around me?
Posted Jul 28, 2023 4:10 UTC (Fri)
by willy (subscriber, #9762)
Most syscalls that end in a number are 16, 32 or 64, indicating their limits (eg sys_time32)
There's wait/waitpid/wait3/wait4, but the arity matches the sequence number. Similarly for dup/dup2/dup3 and pipe/pipe2
The less said about sys_vm86 the better ;-)
There's signalfd4, eventfd2, epoll_create1, accept4 which look to be arity based.
But then there's renameat2 which has 5 arguments. mlock2 which takes 3. preadv2 and pwritev2 which take 5. openat2 takes 4. faccessat2 takes 4. epoll_pwait2 takes 5.
pselect6 is named for its arity, but that's because you can't normally have more than 6 arguments to a syscall. clone3 was preceded by a clone2 that we don't talk about.
I think you could make an argument either way.
Posted Jul 28, 2023 8:50 UTC (Fri)
by brauner (subscriber, #109349)
There's also the possibility that a system call like bla4() is broken and you want to change a system call argument type but not the actual number of arguments. Then you'd not be able to call it bla5(), and bla4.2() would be rather weird. Imho, the simple versioning is just more flexible and is nowadays the de facto standard anyway.
I also had documentation for all of this, but there was never enough time to actually send it. Fwiw:
https://github.com/brauner/linux/commit/5fe619ce62bae64cf...
which is part of
https://github.com/brauner/linux/commits/docs_extensible_...
and contains a lot of other info.
Posted Jul 30, 2023 3:03 UTC (Sun)
by mirabilos (subscriber, #84359)
It’s also inflexible (what if you change one argument type in a revision, or even lose one?).
Posted Jul 30, 2023 3:08 UTC (Sun)
by willy (subscriber, #9762)
Posted Jul 30, 2023 3:20 UTC (Sun)
by mirabilos (subscriber, #84359)
I’m not too much of a fan of puns based on specific pronunciations of things, especially if they go unexplained, but it’s probably obvious to English speakers.
Posted Jul 30, 2023 3:25 UTC (Sun)
by willy (subscriber, #9762)
Posted Jul 30, 2023 9:23 UTC (Sun)
by Wol (subscriber, #4433)
My favourite example is the mutt man page - mutts (dogs) collect mail hence the name, and "mutts don't have bugs, they have fleas ..."
Cheers,
Posted Aug 3, 2023 9:00 UTC (Thu)
by NYKevin (subscriber, #129325)
https://pubs.opengroup.org/onlinepubs/9699919799/function...
Posted Aug 3, 2023 18:24 UTC (Thu)
by jwilk (subscriber, #63328)
Posted Jul 28, 2023 10:41 UTC (Fri)
by cyphar (subscriber, #110703)
> The Linux kernel does not support changing the permission bits on a symbolic link in general (those bits have no real meaning anyway), so neither C-library implementation even tries.
This is actually not quite true, at least not until Christian's patch to enforce this is merged. The restriction on symlink modes was always done on a per-filesystem basis (which led some filesystems to allow it by accident -- procfs allows this for several symlinks and magic-links). In fact, several filesystems (btrfs, xfs, and ext4) all returned -EOPNOTSUPP but still modified the inode mode.
> Unfortunately, the obvious way (using fchmod()) won't work, because that system call cannot operate on O_PATH file descriptors in many filesystems. So, instead, the C library generates the path for the open file descriptor under /proc/self/fd, then passes that to chmod() to effect the mode change.
This restriction is done at the VFS level; it's not per-filesystem (fchmod() uses fdget() rather than fdget_raw() -- and this behaviour is intentional per the description of O_PATH in open(2)). If fchmodat2() adds support for AT_EMPTY_PATH, it would be possible to avoid even the procfs nastiness when dealing with O_PATH file descriptors. That is necessary in plenty of cases where AT_SYMLINK_NOFOLLOW is inadequate, such as when dealing with paths you need to resolve safely with RESOLVE_IN_ROOT or other openat2() flags. I'll send a patch for this...
The patch originally used the name fchmodat4, but review requested
s/fchmodat4/fchmodat2/
With very few exceptions we don't version by argument number but by
revision and we should stick to one scheme:
In sheer numbers this scheme also wins iirc.
Wol