LWN: Comments on "User namespaces + overlayfs = root privileges"

User namespaces + overlayfs = root privileges

butlerm — Fri, 15 Jan 2016 01:29:21 +0000

Wouldn't it be safer to ignore SUID / SGID bits if the file in question lacked a UUID in an extended attribute that matched a UUID assigned to the corresponding superuser or group, respectively?

Then presumably a user in one namespace could mount a filesystem created in a different namespace (possibly on a different system), and the security bits in question would be silently ignored because they fail to match the corresponding security identifiers?

And if you really wanted those bits to take effect, you would go change the extended attributes to match the UUIDs of the superuser and/or group in the appropriate namespace?
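To make the idea concrete, here is a rough model of the check being proposed. Nothing like this exists in the kernel; the UUID tags and the function name are made up for illustration, and the tags stand in for whatever the hypothetical extended attributes would hold.

```python
import stat

# Hypothetical model of the proposal above; Linux has no such xattr or check.
def effective_mode(mode, file_uid_tag, file_gid_tag, ns_uid_tag, ns_gid_tag):
    """Return mode with SUID/SGID dropped unless the file's UUID tags match.

    file_*_tag: UUID read from the file's (hypothetical) extended attribute,
                or None if the attribute is missing.
    ns_*_tag:   UUID assigned to the superuser / group of the namespace
                doing the exec.
    """
    if file_uid_tag is None or file_uid_tag != ns_uid_tag:
        mode &= ~stat.S_ISUID  # silently ignore the setuid bit
    if file_gid_tag is None or file_gid_tag != ns_gid_tag:
        mode &= ~stat.S_ISGID  # silently ignore the setgid bit
    return mode
```

A file carried in from another namespace would have no matching tag, so a mode of 0o6755 would be treated as 0o755 until someone with the right privilege rewrote the attributes.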

User namespaces + overlayfs = root privileges

nybble41 — Thu, 14 Jan 2016 18:48:25 +0000

> The step that seems most surprising to me is that a process in namespace A is affected by the setuid bit on a file in namespace B; I'd expect the VFS to treat files from a namespace not in your ancestry as if they were on a nosuid,nodev mount.

I would expect SUID/SGID files in subordinate namespaces to work normally, with the caveat that they are SUID/SGID to the corresponding unprivileged user and/or group outside the namespace and not the privileged user/group they appear to belong to inside the namespace. Note that normal users can create SUID/SGID binaries if they have write access to any non-nosuid filesystem; they just can't make them SUID to other users or groups of which they are not members, such as root. It is perfectly possible to create a binary which can be run by other users with your own UID and/or one of your groups (something to look out for if you're trying to revoke permissions).
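For anyone who hasn't tried it, a minimal sketch of an ordinary user marking one of their own files SUID. The helper name is mine; it assumes the caller owns the file, and whether the bit has any effect at exec time still depends on the filesystem not being mounted nosuid.

```python
import os
import stat

def make_setuid_to_self(path):
    """Set the SUID bit on a file the caller owns; return the new mode bits.

    No privilege is needed: the file becomes SUID to its owner (the caller),
    not to root or any other user.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, current | stat.S_ISUID)
    return stat.S_IMODE(os.stat(path).st_mode)
```

Anyone else who can execute that file then runs it with the owner's UID, which is the case to watch for when revoking someone's access.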

Forcing nodev for filesystems mounted by users who are not privileged in the root namespace does make a lot of sense, however. I would go so far as to say that it ought to be the default, with root namespace privileges required to enable device support. In most systems there are only two filesystems which should contain device nodes: /dev and /dev/pts.

User namespaces + overlayfs = root privileges

iabervon — Thu, 14 Jan 2016 17:46:43 +0000

The step that seems most surprising to me is that a process in namespace A is affected by the setuid bit on a file in namespace B; I'd expect the VFS to treat files from a namespace not in your ancestry as if they were on a nosuid,nodev mount.

On the other hand, the fact that it's possible to look into another namespace, but it's not obvious that you can, is a poor situation; it's hard to remember the security design when some things that are not actually prohibited are hard or awkward to do.

User namespaces + overlayfs = root privileges

alexl — Thu, 14 Jan 2016 08:25:01 +0000

Not to mention that Debian requires you to explicitly enable the kernel.unprivileged_userns_clone sysctl for non-privileged user namespace support, and the fact that even then you can't mount overlayfs (see Eric's comment above).
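If you want to check a given box, a small sketch that reads the sysctl through /proc. The function name is mine, and returning None when the file is absent is my assumption for kernels that don't carry the Debian patch.

```python
from pathlib import Path

# Location of the Debian-specific knob under /proc/sys.
SYSCTL = Path("/proc/sys/kernel/unprivileged_userns_clone")

def unprivileged_userns_enabled(path=SYSCTL):
    """Return True or False from the sysctl file, or None if it is not present."""
    try:
        return Path(path).read_text().strip() == "1"
    except OSError:
        # The knob does not exist on this kernel, or cannot be read.
        return None
```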

User namespaces + overlayfs = root privileges

clopez — Thu, 14 Jan 2016 02:30:47 +0000

OverlayFS was introduced in https://git.kernel.org/linus/e9be9d (v3.18-rc2)

So it doesn't affect any kernel < 3.18.
Debian stable has 3.16.

User namespaces + overlayfs = root privileges

nybble41 — Wed, 13 Jan 2016 21:55:48 +0000

> The exploit uses another property of namespaces that has always seemed like something of a bug: the /proc filesystem provides a route for processes outside of a namespace to "see" inside it.

This actually seems to me like the normal and expected operation of a namespace: processes outside the namespace can see into it, but processes inside the namespace cannot see out. It wouldn't make sense, for example, for a process to be able to create a PID namespace to hide child processes from the original user. Running processes inside a namespace is about limiting those processes, not the ones outside the namespace. Of course, everything needs to be translated properly so that outside processes looking into a namespace see the correct user IDs and so forth.

As for the issue of tricking mount, or probably any number of other programs, into writing to an inherited file descriptor for a SUID file, wouldn't it make more sense to revoke the SUID bit when the file is first opened for write access by a non-root process, rather than waiting until data is actually written? The target program wouldn't even need to be SUID, if it can receive file descriptors from non-root processes some other way. Unix domain sockets (as used in D-Bus) come to mind as a possible attack vector.
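Roughly what I have in mind, as a toy model rather than anything resembling the actual VFS code. The function name and the use of a bare UID for "root" are simplifications of mine.

```python
import os
import stat

def mode_after_open(mode, flags, opener_uid):
    """Model of the suggestion: drop SUID at open time, not at first write.

    mode:       the file's permission bits before the open
    flags:      open(2) flags supplied by the caller
    opener_uid: UID of the process performing the open (0 means root here)
    """
    wants_write = bool(flags & (os.O_WRONLY | os.O_RDWR))
    if wants_write and opener_uid != 0:
        mode &= ~stat.S_ISUID
    return mode
```

With this rule, by the time a writable descriptor can be handed to any other program, the file is already no longer SUID, so it doesn't matter who ends up doing the writing.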

User namespaces + overlayfs + ubuntu = root privileges

ebiederm — Wed, 13 Jan 2016 21:23:15 +0000

Mainline kernels are not affected as they do not allow mounting overlayfs with only user namespace privilege. Only Ubuntu was affected.

In that context, it seems reasonable that the mainline commit messages do not mention a problem which does not, and did not, exist in the kernel that was being modified.