
Extended attributes for special files

By Jonathan Corbet
September 9, 2021
The Linux extended-attribute mechanism allows the attachment of metadata to files within a filesystem. It tends to be little used — at least, in the absence of a security module like SELinux. There is interest in how these attributes work, though, as evidenced by the discussions that have followed the posting of revisions of this patch by Vivek Goyal, which seeks to make a seemingly small change to the rules regarding extended attributes and special files.

Specifically, extended attributes (often referred to as "xattrs") are name-value pairs that can be attached to a file. The name of an extended attribute is divided into a namespace and an identifier within the namespace. The namespace is currently one of security, system, user, or trusted; each namespace has its own special rules. As a general rule, system and trusted see little use. The security namespace, instead, is used by a number of Linux security modules. An example of a security attribute can be seen by running getfattr on a system that is set up for SELinux:

    cd /; getfattr -d -m - .
    # file: .
    security.selinux="system_u:object_r:root_t:s0"

This is the SELinux label, which is used to determine which access policies apply to this directory. A process must have the CAP_SYS_ADMIN capability to change attributes in the security namespace; even then, a running security module might have its own ideas regarding whether to allow a change to happen.
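The same inspection can be done programmatically. What follows is a minimal Python sketch rather than a reference implementation: it relies on os.listxattr() and os.getxattr(), which Python only provides on Linux, and the helper names split_xattr_name() and dump_xattrs() are invented for this illustration.

```python
import os

def split_xattr_name(name):
    """Split 'security.selinux' into ('security', 'selinux')."""
    namespace, _, identifier = name.partition(".")
    return namespace, identifier

def dump_xattrs(path):
    """Return {name: value} for the attributes visible to the caller.

    Errors (no xattr support, a security module saying no) are
    swallowed, so the result may be empty or partial.
    """
    if not hasattr(os, "listxattr"):      # platform without xattr calls
        return {}
    result = {}
    try:
        for name in os.listxattr(path):
            result[name] = os.getxattr(path, name)
    except OSError:
        pass
    return result

if __name__ == "__main__":
    for name, value in dump_xattrs("/").items():
        namespace, identifier = split_xattr_name(name)
        print(f"{namespace:10} {identifier}: {value!r}")
```

Which attributes show up depends on the filesystem, the caller's privileges, and any active security module, so the output will differ from one system to the next.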

The user namespace is entirely unprivileged; users may assign whatever attributes they like to any files they are capable of writing. There is an exception, though: it is not possible to attach user extended attributes to symbolic links or device special files. The man page describes this restriction as a control on resource consumption; if any process that can write to /dev/null can attach extended attributes to it, the disk could quickly fill with attribute data. It is still possible for a suitably privileged process to attach security extended attributes to those files.
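The restriction is easy to probe from user space. The sketch below is an illustration under stated assumptions: the attribute name user.demo and both function names are made up for the example, and the outcome depends on whether the filesystem holding the temporary directory supports user extended attributes at all.

```python
import errno
import os
import tempfile

def try_set_user_xattr(path, follow_symlinks=True):
    """Try to set user.demo on path; return 'ok' or the errno name."""
    if not hasattr(os, "setxattr"):       # platform without xattr calls
        return "unsupported-platform"
    try:
        os.setxattr(path, "user.demo", b"1", follow_symlinks=follow_symlinks)
        return "ok"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))

def demo():
    """Compare a regular file with a symbolic link pointing at it."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "file")
        link = os.path.join(tmp, "link")
        with open(target, "w"):
            pass
        os.symlink(target, link)
        return {
            "regular file": try_set_user_xattr(target),
            # follow_symlinks=False acts on the link itself
            "symlink": try_set_user_xattr(link, follow_symlinks=False),
        }

if __name__ == "__main__":
    print(demo())
```

On a filesystem with user-attribute support, one would expect the regular file to succeed and the link to be refused with EPERM; a filesystem without that support will report an error such as ENOTSUP or EOPNOTSUPP for both.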

Goyal's patch relaxes that restriction slightly, so that a process that owns a symbolic link or device special file would be able to attach user extended attributes to that file. That allows users to change files they have control over anyway while avoiding the /dev/null problem. It seems to many like a reasonable change to make, but Andreas Gruenbacher opposed it:
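The difference between the two rules can be written down as a small decision function. This is a simplified model of the behavior as described in this article, not the kernel's actual permission code; the FileType enumeration, the function name, and the relaxed flag are all hypothetical.

```python
from enum import Enum

class FileType(Enum):
    REGULAR = "regular"
    DIRECTORY = "directory"
    SYMLINK = "symlink"
    DEVICE = "device"

def may_set_user_xattr(file_type, can_write, is_owner, relaxed=False):
    """Model of who may set user.* attributes on a file.

    relaxed=False models the current behavior as described here;
    relaxed=True models the proposed change.
    """
    if file_type in (FileType.SYMLINK, FileType.DEVICE):
        # Current rule: never. Proposed rule: only for the owner.
        return relaxed and is_owner
    return can_write

# /dev/null as seen by an ordinary user: writable, but not owned.
print(may_set_user_xattr(FileType.DEVICE, can_write=True, is_owner=False,
                         relaxed=True))
```

In this model an ordinary user's attempt on /dev/null is refused under both rules, since write access alone is never enough for a device file; only the owner case changes.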

The idea behind user.* xattrs is that they behave similar to file contents as far as permissions go. It follows from that that symlinks and special files cannot have user.* xattrs. This has been the model for many years now and applications may be expecting these semantics, so we cannot simply change the behavior.

The possibility of breaking user space in subtle ways is indeed worrisome; Goyal has suggested that the new behavior could be made opt-in to avoid creating such surprises. But the bigger sticking point has to do with the intended use case for this feature. Kernel developers will normally ask why a new behavior is needed; that information is necessary to evaluate whether the proposed solution is indeed the best way to solve the problem. In this case, not all of them were happy with the answer.

Goyal is working on virtualization and, in particular, with use cases where guests want to share a filesystem with the host. The kernel's virtiofs filesystem is designed for this application; it functions as a sort of FUSE filesystem that runs over Virtio, so it performs reasonably well. On the host side, the virtiofsd daemon performs the actual filesystem access needed by the client.

Virtiofsd tries to drop as many capabilities as possible, though it must still be able to manipulate files owned by any user and group ID. It seems to work well, except for one little problem: security extended attributes. The client may want to manipulate those attributes within the filesystem, but the host may prevent virtiofsd from carrying out those changes. In particular, if the host is running a security module like SELinux, that module will be placing its own interpretation on those attributes and may not be appreciative of the changes that virtiofsd is trying to make. Changing those extended attributes requires CAP_SYS_ADMIN in any case, and virtiofsd is trying to run without that capability.

The answer to the problem, naturally, is another layer of indirection. When properly configured, virtiofsd is able to remap extended attributes between the guest and the host; it can, for example, be made to turn the security attributes managed by the guest into user attributes on the host. No privilege is required for virtiofsd to change user attributes, so everything works as desired.
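At its core, such a remapping is a reversible rewrite of the attribute name. The sketch below shows the idea only; it is not virtiofsd's implementation, and the host-side prefix user.virtiofs. is an assumption chosen for the example.

```python
GUEST_PREFIX = "security."
HOST_PREFIX = "user.virtiofs."   # hypothetical prefix for this example

def guest_to_host(name):
    """Name used on the host for an attribute the guest refers to."""
    if name.startswith(GUEST_PREFIX):
        return HOST_PREFIX + name
    return name

def host_to_guest(name):
    """Inverse mapping, applied when reporting attributes to the guest."""
    if name.startswith(HOST_PREFIX + GUEST_PREFIX):
        return name[len(HOST_PREFIX):]
    return name

print(guest_to_host("security.selinux"))
```

A real remapper would presumably also have to stop the guest from reading or writing the prefixed names directly, since otherwise the two views could collide; that check is left out here.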

More accurately, almost everything works as desired. As Goyal notes, when the guest goes into the dreaded relabeling coma that SELinux seems to require from time to time, errors result. The problem is the restriction on user extended attributes for special files; the guest is trying to relabel its devices, but the host cannot assign those labels (remapped as user extended attributes) to the appropriate special files. That problem is what is driving Goyal's patch relaxing the restrictions, which makes this use case work.

Casey Schaufler let it be known that he had no problem with assigning user extended attributes to special files. But the remapping done in virtiofsd, he said, is "unreasonable":

As I have stated before, this introduces a breach in security. It allows an unprivileged process on the host to manipulate the security state of the guest. This is horribly wrong. It is not sufficient to claim that the breach requires misconfiguration to exploit. Don't do this.

Schaufler and Goyal have gone around on this topic numerous times over the three revisions of this patch set. Schaufler contends that the only secure way to handle a situation like this is for host-side changes to require the same level of privilege as is required on the guest. So, for example, it would be acceptable to remap the security extended attributes into the trusted namespace (which also requires CAP_SYS_ADMIN) instead. Goyal, meanwhile, continues to look for a solution that does not require additional privilege in virtiofsd.

That solution, in the end, may take a bit of a different form, inspired by this observation from David Alan Gilbert:

IMHO the real problem here is that the user/trusted/system/security 'namespaces' are arbitrary hacks rather than a proper namespacing mechanism that allows you to create new (nested) namespaces and associate permissions with each one.

Each one carries with it some arbitrary baggage (trusted not working on NFS, user. having the special rules on symlinks etc).

Miklos Szeredi suggested a next step (evidently originally proposed by Eric Biederman) to adapt trusted extended attributes for use within user namespaces. If a given user namespace is owned by user ID 1000, then trusted.foo within the namespace would be stored (and visible) in the initial namespace as something like trusted<1000>.foo. Virtiofsd could be run within a user namespace and use the trusted extended attributes without any extra privilege, and security modules on the host would still have a say in how those attributes are managed. Goyal agreed that this approach might work.
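The naming scheme can be illustrated with a pair of helper functions. Nothing like this exists in the kernel today; the function names are invented, and the trusted<uid>.name format simply follows the "something like" example given above.

```python
def to_initial_namespace(name, owner_uid):
    """Name under which trusted.* from a user namespace would be stored."""
    prefix = "trusted."
    if not name.startswith(prefix):
        return name
    return f"trusted<{owner_uid}>.{name[len(prefix):]}"

def from_initial_namespace(stored, owner_uid):
    """Inverse mapping; None if the attribute belongs to another owner."""
    prefix = f"trusted<{owner_uid}>."
    if stored.startswith(prefix):
        return "trusted." + stored[len(prefix):]
    return None

print(to_initial_namespace("trusted.foo", 1000))
```

The inverse returns None for a mismatched owner ID to reflect the idea that a namespace would only see the attributes stored on its own behalf.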

The only problem is that somebody has to implement the new behavior for user namespaces — a complex part of the system where it is easy to make security-wrecking mistakes. Biederman has already pointed out one potential problem related to nested namespaces. But, assuming that feature can be safely provided, Goyal's original problem should be solvable. This is why kernel developers tend to be inquisitive about the use cases driving a proposed change: the best solution to a problem is often not the first one that comes to mind.




Extended attributes for special files

Posted Sep 9, 2021 22:45 UTC (Thu) by xecycle (subscriber, #140261) [Link] (1 responses)

"system and trusted see little use" --- well, no idea about system, but I have seen trusted.overlay* just this week; I guess it is being used by a huge portion of the docker/OCI community.

Extended attributes for special files

Posted Sep 10, 2021 0:25 UTC (Fri) by vgoyal (guest, #49279) [Link]

Yes, trusted.* is used by overlay. But I think that was not possible with unprivileged mounting of overlayfs, as the mounter did not have CAP_SYS_ADMIN in init_user_ns. So overlay had to switch to using user.* extended attributes to support unprivileged mounting (check the mount option -o userxattr).

Extended attributes for special files

Posted Sep 10, 2021 0:22 UTC (Fri) by pabs (subscriber, #43278) [Link] (1 responses)

I would go with namespace.1000.trusted.foo rather than trusted<1000>.foo, or are there size limits on xattr names?

Extended attributes for special files

Posted Sep 10, 2021 13:20 UTC (Fri) by taladar (subscriber, #68407) [Link]

Whatever the limitations are, I agree that something that could be produced via concatenation of a prefix and the original name would be preferable.
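On the size question: Linux limits extended-attribute names to 255 bytes (XATTR_NAME_MAX), so a prefix scheme shortens the longest usable original name by the length of the prefix. A minimal sketch of the concatenation approach, with hypothetical function names:

```python
XATTR_NAME_MAX = 255  # bytes; the Linux limit on attribute names

def prefixed_name(name, owner_uid):
    """Build the 'namespace.<uid>.<original name>' form suggested above."""
    return f"namespace.{owner_uid}.{name}"

def fits(name):
    """True if the name is within the kernel's length limit."""
    return len(name.encode()) <= XATTR_NAME_MAX

stored = prefixed_name("trusted.foo", 1000)
print(stored, fits(stored))
```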

Extended attributes for special files

Posted Sep 10, 2021 9:04 UTC (Fri) by scientes (guest, #83068) [Link] (13 responses)

Attributes are not files. A file is an ordered byte stream with a size, and files are organized into a tree with a root. Attributes break the abstraction that makes files useful.

Extended attributes for special files

Posted Sep 10, 2021 9:05 UTC (Fri) by scientes (guest, #83068) [Link] (6 responses)

And sparse files are fine.

Extended attributes for special files

Posted Sep 10, 2021 16:38 UTC (Fri) by NYKevin (subscriber, #129325) [Link] (5 responses)

Sparse files are nothing more than an optimization. If the kernel went around opportunistically poking holes in files with lots of consecutive zeros, it would not break anything as far as I can tell (except for apps that assume that SEEK_HOLE always returns offsets which you can predict based on where you intentionally poked holes from userspace, a notion which the lseek(2) man page flatly contradicts).
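For reference, this is roughly how an application asks where the first hole is. It is a sketch with an invented function name; it assumes a platform that exposes os.SEEK_HOLE and returns None where that is missing or the call fails.

```python
import os

def first_hole(path):
    """Return the offset of the first hole in path, or None on failure."""
    if not hasattr(os, "SEEK_HOLE"):
        return None
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.lseek(fd, 0, os.SEEK_HOLE)
    except OSError:
        return None
    finally:
        os.close(fd)
```

A file with no holes reports its own size, because the end of the file counts as an implicit hole. Where holes show up inside a sparse file is up to the filesystem, which is the point made above.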

Extended attributes for special files

Posted Sep 10, 2021 18:49 UTC (Fri) by nybble41 (subscriber, #55106) [Link] (3 responses)

> If the kernel went around opportunistically poking holes in files with lots of consecutive zeros, it would not break anything as far as I can tell….

I think that would break swap files. The swapon(8) manual page[0] suggests that they are still required to be fully allocated, without holes.

Sometimes non-sparseness is better for performance, since filling holes in sparse files requires block allocation and can lead to fragmentation. One can also easily imagine workflows that would break if writing to a file that was fully allocated returned ENOSPC because consecutive zeros were replaced with a hole and the space was subsequently reallocated to something else.

[0] https://manpages.ubuntu.com/manpages/eoan/en/man8/swapon....

Extended attributes for special files

Posted Sep 10, 2021 19:02 UTC (Fri) by Wol (subscriber, #4433) [Link]

> I think that would break swap files. The swapon(8) manual page[0] suggests that they are still required to be fully allocated, without holes.

I'm pretty certain you're right. I seem to remember something about swap digging through the file system layer to get the raw blocks. The swap mechanism apparently writes directly to the disk layer. I think that because it hibernates to swap, it needs to know the disk blocks, a bit like lilo did/does.

Cheers,
Wol

Extended attributes for special files

Posted Sep 11, 2021 22:22 UTC (Sat) by NYKevin (subscriber, #129325) [Link]

Well, sure, but that's mostly because swap files are terrible and weird. They're also part of the kernel, so arguably the kernel could just not poke holes in them in the first place.

> Then how would the kernel know whether you're about to run swapon(8) on some random file in e.g. /home/whoever?

And that's *why* swap files are terrible. The kernel has no way of knowing whether something is going to be swapon'd, so if you do any funny business with block allocation, including COW stuff that btrfs (and zfs?) actually want to do, then you break swap files, swapon randomly fails, and whatever userspace utility is invoking it just has to deal with not having a usable swap file (or it has to recreate it and hope the filesystem plays nice this time around). For now, said COW stuff is sufficiently rare that it's not a Real Problem™ yet, but give it time.

Extended attributes for special files

Posted Sep 12, 2021 11:27 UTC (Sun) by eru (subscriber, #2753) [Link]

>One can also easily imagine workflows that would break if writing to file which was fully allocated returned ENOSPC because consecutive zeros were replaced with a hole

I think the same thing can happen with transparently compressed files, such as those supported by btrfs. In fact, the poking holes method would just be a simple form of transparent file compression.

Extended attributes for special files

Posted Sep 15, 2021 9:26 UTC (Wed) by geert (subscriber, #98403) [Link]

SEEK_HOLE sounds like a great side-channel to store hidden information in a file containing all zeroes ;-)

Extended attributes for special files

Posted Sep 10, 2021 23:42 UTC (Fri) by dvdeug (subscriber, #10998) [Link] (4 responses)

>A file is an ordered byte stream with a size, and files are organized into a tree with a root. Attributes break the abstraction that make files useful.

Unix has always stored attributes like access time in the filesystem, and while the Unix-style file has won out, many systems have had files with heavy attributes like file types, or even files that were not byte-stream oriented, and these were always quite useful to the users of those systems.

Extended attributes for special files

Posted Sep 13, 2021 5:18 UTC (Mon) by raof (subscriber, #57409) [Link]

Yeah. It turns out that there are lots of pieces of generally useful information you can have about an ordered byte stream with a known size. Name, access times, owner, permissions, an entirely different way of specifying permissions, and so on.

Extended attributes for special files

Posted Sep 13, 2021 20:08 UTC (Mon) by ballombe (subscriber, #9523) [Link] (2 responses)

Under Unix, to change metadata like the filename, you need write permission to the directory, not to the file itself, but this is the opposite for extended attributes (which causes the /dev/null problem in the first place!).

Extended attributes for special files

Posted Sep 14, 2021 6:10 UTC (Tue) by jem (subscriber, #24231) [Link] (1 responses)

> Under Unix, to change metadata like the filename, you need write permission to the directory, but this is the opposite for extended attributes.

The filename is the only piece of metadata for which you need write permission to the directory; it is the opposite for all other metadata. Also, it is arguable whether the filename is part of the file metadata at all, under the Unix model.

Extended attributes for special files

Posted Sep 14, 2021 12:20 UTC (Tue) by anselm (subscriber, #2796) [Link]

> Also, it is arguable whether the filename is part of the file metadata at all, under the Unix model.

I'd say it's pretty clear that in Unix-like systems, the filename is not part of the file metadata. After all, a file “knows” how many names it has, but not what these names are. To actually figure those out you have to examine the whole directory tree on that device to check for hard links to the file's inode – there's no way of getting the names directly from the file itself.
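The point is easy to see from user space: stat() reports how many names an inode has, but offers no way to list them. A small sketch, using a temporary directory and an invented function name:

```python
import os
import tempfile

def link_count_demo():
    """Give one file two names; return (link count, same inode?)."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "first")
        second = os.path.join(tmp, "second")
        with open(first, "w"):
            pass
        os.link(first, second)      # second name for the same inode
        st = os.stat(first)
        same_inode = st.st_ino == os.stat(second).st_ino
        return st.st_nlink, same_inode

if __name__ == "__main__":
    print(link_count_demo())
```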

Extended attributes for special files

Posted Sep 17, 2021 13:18 UTC (Fri) by jschrod (subscriber, #1646) [Link]

Having been raised at a time when OS/370 was the prominent OS, I take exception to the notion that files are "an ordered stream of bytes".

And I don't even take into account that we would first have to define what a "byte" is. Maybe you take octets for granted - they aren't.

Extended attributes for special files

Posted Sep 16, 2021 2:37 UTC (Thu) by jamesmorris (subscriber, #82698) [Link] (2 responses)

Interesting definition of little-used -- RHEL, CentOS, Fedora, and billions of Android devices.

Extended attributes for special files

Posted Sep 16, 2021 3:10 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link] (1 responses)

> Interesting definition of little-used -- RHEL, CentOS, Fedora, and billions of Android devices.

Yeah, but that's because of SELinux. I believe the intent of the statement, in context, is that it is not widely used outside of its use in SELinux.

Extended attributes for special files

Posted Sep 16, 2021 11:09 UTC (Thu) by mbunkus (subscriber, #87248) [Link]

Samba stores Windows ACLs in extended attributes.

Linux (POSIX) ACLs are stored in extended attributes (as "system.posix_acl_access" and "system.posix_acl_default").

Special capabilities such as cap_net_raw for the ping command are stored in extended attributes (as "security.capability"). A lot of distributions use these.

See "man 7 xattr" for further examples. It's not just SELinux.

Extended attributes for special files

Posted Sep 22, 2021 20:46 UTC (Wed) by rwmj (subscriber, #5474) [Link] (1 responses)

While it's certainly convenient for virtiofsd to store the security attributes as xattrs, does it really need to do that? Couldn't it store them in a cache somewhere else entirely?

Extended attributes for special files

Posted Sep 24, 2021 10:48 UTC (Fri) by njs (subscriber, #40338) [Link]

I suspect you'd run into nasty cache consistency issues. For example, the guest expects rename() to be atomic, which is difficult if the host has to convert it into a rename() + a separate metadata db update.


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds