Namespaced file capabilities

Posted Jul 3, 2017 20:18 UTC (Mon) by drag (guest, #31333)
Parent article: Namespaced file capabilities

One of the ways that containers are distributed between users is to create file system images.

How does this deal with situations where a file system created by one user on a system gets copied and reused by another user on the same or a different system?

You can't know the UID of the user using the container ahead of time.

Namespaced file capabilities

Posted Jul 5, 2017 0:23 UTC (Wed) by hallyn (subscriber, #22558)

You do not have to fill in the UID manually; it's supposed to be transparent to you. From the point of view of a process in the container, the file capabilities are completely uid-agnostic.

When you then run 'setcap cap_net_raw+pe /bin/ping', the kernel will automatically rewrite the xattr for you as one tagged with the kuid of root in your container. When you just run 'getcap /bin/ping', it will show up as a regular security.capability.

So if you create a tarfile (respecting the xattrs, which is a trick in itself :) containing that file inside one container, then untar it in the other, the capability on the new file will have the correct new root kuid.
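To make the "tagged with the kuid of root" idea concrete, here is a minimal sketch of what such an xattr could look like on disk. It assumes a layout like the one later defined in linux/capability.h as VFS_CAP_REVISION_3 (a magic word, two 32-bit permitted/inheritable pairs, then a 32-bit root id); the on-disk format was still under discussion when this thread was written, so treat the constants and the helper names pack_ns_caps and unpack_caps as illustrative assumptions.

```python
import struct

# Assumed constants, modelled on linux/capability.h.
VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_2 = 0x02000000      # classic, untagged capability xattr
VFS_CAP_REVISION_3 = 0x03000000      # namespaced: carries a root kuid
VFS_CAP_FLAGS_EFFECTIVE = 0x000001
CAP_NET_RAW = 13


def pack_ns_caps(permitted, inheritable, effective, rootid):
    """Build a 24-byte namespaced capability blob (hypothetical helper)."""
    magic = VFS_CAP_REVISION_3 | (VFS_CAP_FLAGS_EFFECTIVE if effective else 0)
    return struct.pack(
        "<IIIIII",
        magic,
        permitted & 0xFFFFFFFF, inheritable & 0xFFFFFFFF,  # low 32 bits
        permitted >> 32, inheritable >> 32,                # high 32 bits
        rootid,                                            # kuid of container root
    )


def unpack_caps(blob):
    """Decode either an untagged (v2) or a namespaced (v3) blob."""
    magic = struct.unpack_from("<I", blob, 0)[0]
    revision = magic & VFS_CAP_REVISION_MASK
    if revision == VFS_CAP_REVISION_3 and len(blob) == 24:
        _, p_lo, i_lo, p_hi, i_hi, rootid = struct.unpack("<IIIIII", blob)
    elif revision == VFS_CAP_REVISION_2 and len(blob) == 20:
        _, p_lo, i_lo, p_hi, i_hi = struct.unpack("<IIIII", blob)
        rootid = None  # no tag: a global capability
    else:
        raise ValueError("unsupported capability xattr")
    return {
        "permitted": p_lo | (p_hi << 32),
        "inheritable": i_lo | (i_hi << 32),
        "effective": bool(magic & VFS_CAP_FLAGS_EFFECTIVE),
        "rootid": rootid,
    }


# cap_net_raw+pe, tagged for a container whose root is kuid 100000 (made-up value)
blob = pack_ns_caps(1 << CAP_NET_RAW, 0, True, 100000)
info = unpack_caps(blob)
```

Inside the container none of this is visible: per the description above, getcap there would just report an ordinary security.capability.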

Namespaced file capabilities

Posted Jul 5, 2017 17:20 UTC (Wed) by nybble41 (subscriber, #55106)

> So if you create a tarfile (respecting the xattrs, which is a trick in itself :) containing that file inside one container, then untar it in the other, the capability on the new file will have the correct new root kuid.

I believe the question was about moving filesystem _images_ from one context to another, not tarfiles. If I understand correctly, the filesystem image will retain the original root kuid in the xattrs, which is not too different from normal UID/GID handling when filesystem images are moved between systems. However, unlike the root UID and GID which can be fixed with chown/chgrp, I'm not sure there is a good way to change the root kuid in the xattrs short of recreating all the capabilities.

Namespaced file capabilities

Posted Jul 5, 2017 18:39 UTC (Wed) by hallyn (subscriber, #22558)

How exactly is the filesystem image being moved?

You mention chown/chgrp. So long as that is happening, then any capability xattrs are being automatically removed anyway. So just as you have to re-set the setuid and setgid bits, you'll have to re-set the xattrs you care to preserve. As soon as there is agreement on the format for this, I'll write a patch for lxd's fuidshift to do this for namespaced xattrs.

If you're using something like shiftfs, then shiftfs will simply have to do the right shifting for the xattrs just as it does for ownership.

What other ways are you thinking of?

The two examples listed above are ways that root on the host could 'move' the filesystem image on behalf of the container. The tar example has the advantage that an unprivileged user on the host can do it entirely without becoming host-root, so long as both contexts are allocated to the user. Supporting that, and doing so without any risk of leaking privilege out of the user namespace, is an important feature here.
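The re-tagging that a uid-shifting tool would have to do for namespaced xattrs can be sketched as a pure byte transformation. This is only an illustration, not fuidshift's or shiftfs's code: it assumes a 24-byte blob whose last 32-bit word is the root kuid (the layout later adopted as revision 3), and shift_rootid, old_base, new_base and count are hypothetical names.

```python
import struct

VFS_CAP_REVISION_MASK = 0xFF000000
VFS_CAP_REVISION_3 = 0x03000000  # assumed namespaced revision
ROOTID_OFFSET = 20               # assumed: rootid is the last 32-bit word


def shift_rootid(blob, old_base, new_base, count=65536):
    """Move the root kuid tag from one uid range to another.

    Untagged (non-v3) blobs are returned unchanged, since a global
    capability has no root id to shift.
    """
    magic = struct.unpack_from("<I", blob, 0)[0]
    if (magic & VFS_CAP_REVISION_MASK) != VFS_CAP_REVISION_3 or len(blob) != 24:
        return blob
    rootid = struct.unpack_from("<I", blob, ROOTID_OFFSET)[0]
    if not old_base <= rootid < old_base + count:
        raise ValueError("rootid %d is outside the source range" % rootid)
    shifted = rootid - old_base + new_base
    return blob[:ROOTID_OFFSET] + struct.pack("<I", shifted)


# A cap_net_raw+pe blob tagged for root at kuid 100000, moved to 200000
# (both ranges are made-up values).
tagged = struct.pack("<IIIIII", 0x03000001, 1 << 13, 0, 0, 0, 100000)
moved = shift_rootid(tagged, 100000, 200000)
```

Because chown drops the capability xattr, a real tool would presumably need to read the value first (for instance with os.getxattr(path, "security.capability")), change ownership, and then write the shifted value back with os.setxattr, all as root on the host.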

Namespaced file capabilities

Posted Jul 6, 2017 15:37 UTC (Thu) by nybble41 (subscriber, #55106)

> What other ways are you thinking of?

I suppose the ideal case would be mounting the filesystem image from inside the user namespace, so that the image contains the container's UIDs and GIDs and there is no need to remap anything. So far as I know, that isn't allowed yet because there are too many potential vulnerabilities in the filesystem code to permit mounting untrusted block devices, including loopback devices. (Somehow removable media gets a free pass here. I understand that a potentially remote vulnerability is worse than one that requires physical access, but I don't buy the argument that limited physical access to plug in a USB device is equivalent to having root privileges. Just consider all the photo kiosks with exposed USB ports... the user is not intended to have full control, and root access could expose quite a bit of private data as well as provide a vector for spreading malware.)

> You mention chown/chgrp. So long as that is happening, then any capability xattrs are being automatically removed anyway.

I was thinking of systemd-nspawn --private-users-chown rather than fuidshift, but the same principle applies. It doesn't appear that systemd-nspawn preserves file capabilities when reassigning UIDs anyway, so this won't be any different from the current situation. You would still need to run a script as root inside the container afterward to restore the capabilities.

Namespaced file capabilities

Posted Jul 6, 2017 15:58 UTC (Thu) by hallyn (subscriber, #22558)

(my desktop doesn't auto-mount usb sticks :)

Indeed, actual mounting of filesystems in a user namespace is some time away, and if/when it does happen it's likely to be through fuse.

Two notes regarding avoiding the need to re-attach the capability xattrs. First, it's currently the case that if you go ahead and set a global capability (no uid= tag), it will be respected in all namespaces. Secondly, as James had suggested, we could add a 'uid=' (not followed by a number) tag, which would mean "this capability will work in any user namespace other than the initial one (or rather any where root is not mapped to kuid 0)." For the case where your host init system, or docker as host root, is arranging things, this could be useful.

I bet systemd would accept a patch to have it preserve namespaced file capabilities (once they are supported), so that you wouldn't have to have a script do it inside the container.
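The three kinds of tag mentioned in the two notes above (no tag, a numeric root kuid, and the proposed bare 'uid=') can be summarized as a small decision function. This is a simplified model of the semantics as described in this comment, not kernel code: it ignores nested user namespaces, and WILDCARD and capability_honored are hypothetical names.

```python
# Hypothetical marker for the proposed bare 'uid=' tag.
WILDCARD = "any-non-initial"


def capability_honored(tag, ns_root_kuid):
    """Would a file capability with this tag take effect in a namespace
    whose root user maps to ns_root_kuid on the host?

    tag is None      -> global capability (no uid= tag)
    tag is WILDCARD  -> proposed 'uid=' with no number
    tag is an int    -> capability tagged with one specific root kuid
    """
    if tag is None:
        return True                   # respected in all namespaces
    if tag == WILDCARD:
        return ns_root_kuid != 0      # anywhere root is not kuid 0
    return tag == ns_root_kuid        # only for the matching container root


# Example: a capability tagged for kuid 100000 (made-up value)
in_own_container = capability_honored(100000, 100000)
in_other_container = capability_honored(100000, 200000)
```

Under this model the wildcard is the only variant that survives a move between containers without rewriting, while still being inert when root is kuid 0.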

Namespaced file capabilities

Posted Jul 6, 2017 16:47 UTC (Thu) by nybble41 (subscriber, #55106)

> (my desktop doesn't auto-mount usb sticks :)

Neither does mine, but any user logged in to a local console can manually mount a USB device of their choice through udisks and exploit filesystem vulnerabilities. On my personal systems that isn't so much of a problem, because I'm both the only local user and the admin (via sudo). It's more of a problem for devices such as kiosks which have other protection against the usual physical attacks—such as being located in a public place and monitored by security cameras—but need to expose USB ports, SD card slots, etc. for use by untrusted individuals. I suppose they could mount the raw block device via FUSE as an untrusted user, bypassing the kernel's filesystem code altogether, but it would be better if filesystems could just treat data from block devices as untrusted input. Besides the obvious security implications this would also help protect the system against more pedestrian data corruption.

I really have to wonder what the point is of an option like "nodev" or "nosuid" when (a) creating device nodes or SUID-root executables the normal way already requires root (or equivalent capabilities) and (b) if a user can create an accessible device node or SUID-root executable _without_ root, by directly modifying the filesystem, they are already presumed to be able to mount a corrupted filesystem, which is equivalent to having root.

Namespaced file capabilities

Posted Jul 6, 2017 17:21 UTC (Thu) by smcv (subscriber, #53363)

> any user logged in to a local console can manually mount a USB device of their choice through udisks

Only if the policies loaded by polkit say they can. The default policies provided with udisks assume that your system is a typical laptop, desktop or server, where physical access means the attacker has essentially already won; but on a kiosk-style system you don't have to use those defaults.

Namespaced file capabilities

Posted Jul 6, 2017 19:05 UTC (Thu) by nybble41 (subscriber, #55106)

> Only if the policies loaded by polkit say they can. ... on a kiosk-style system you don't have to use those defaults.

Right, but part of the function of the kiosk (e.g. a self-service photo order kiosk) is loading files from user-supplied removable media. I would imagine one doesn't typically use udisks for this, but the kiosk still needs to access the files somehow.