Unprivileged filesystem mounts, 2018 edition
Attempts to make the mount operation safe for ordinary users are nothing new; LWN covered one patch set back in 2008. That work was never merged, but the effort to allow unprivileged mounts picked up again in 2015, when Eric Biederman (along with others, Seth Forshee in particular) got serious about allowing user namespaces to perform filesystem mounts. The initial work was merged in 2016 for the 4.8 kernel, but it was known not to be a complete solution to the problem, so most filesystems can still only be mounted by users who are privileged in the initial namespace.
Biederman has recently posted a new patch set "wrapping up" support for unprivileged mounts. It takes care of a number of details, such as allowing the creation of device nodes on filesystems mounted in user namespaces — an action that is deemed to be safe because the kernel will not recognize device nodes on such filesystems. He clearly thinks that this feature is getting closer to being ready for more general use.
The plan is not to allow the unprivileged mounting of any filesystem, though. Only filesystem types that have been explicitly marked as being safe for mounting in this mode will be allowed. The intended use case is evidently to allow mounting of filesystems via the FUSE mechanism, meaning that the actual implementation will be running in user space. That should shield the kernel from vulnerabilities in the filesystem code itself, which turns out to be a good thing.
In a separate discussion, the "syzbot" fuzzing project recently reported a problem with the XFS filesystem; syzbot has been doing some fuzzing of on-disk data and a number of bugs have turned up as a result. In this case, though, XFS developer Dave Chinner explained that the problem would not be fixed. It is a known problem that only affects an older ("version 4") on-disk format and which can only be defended against at the cost of breaking an unknown (but large) number of otherwise working filesystems. Beyond that, XFS development is focused on the version 5 format, which has checksumming and other mechanisms that catch most metadata corruption problems.
There was an extensive discussion over whether the XFS developers are taking the right approach, but it took a bit of a diversion after Eric Sandeen complained about bugs that involve "merely mounting a crafted filesystem that in reality would never (until the heat death of the universe) corrupt itself into that state on its own". Ted Ts'o pointed out that such filesystems (and the associated crashes) can indeed come about in real life if an attacker creates one and somehow convinces the system to mount it. He named Fedora and Chrome OS as two systems that facilitate this kind of attack by automatically mounting filesystems found on removable media — USB devices, for example.
There is a certain class of user that enjoys the convenience of automatically mounted filesystems, of course. There is also the container use case, where there are good reasons for allowing unprivileged users to mount filesystems on their own. So, one might think, it is important to fix all of the bugs associated with on-disk format corruption to make this safe. Chinner, though, has bad news for anybody who is waiting for that to happen.
Many types of corruption can be caught with checksums and such. Other types are more subtle, though; Chinner mentioned linking important metadata blocks into an ordinary file as an example. Defending the system fully against such attacks would be difficult to do, to say the least, and would likely slow the filesystem to a crawl.
That said, Chinner doesn't expect distributors like Fedora to stop mounting filesystems automatically: "They'll do that when we provide them with a safe, easy to use solution to the problem. This is our problem to solve, not blame-shift it away." That, obviously, leaves open the question of how to solve a problem that has just been described as unsolvable.
To Chinner, the answer is clear, at least in general terms: "We've learnt this lesson the hard way over and over again: don't parse untrusted input in privileged contexts". The meaning is that, if the contents of a particular filesystem image are not trusted (they come from an unprivileged user, for example), that filesystem should not be managed in kernel space. In other words, FUSE should be the mechanism of choice for any sort of unprivileged mount operation.
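Chinner's rule can be followed even outside of FUSE by simply refusing to parse image data with privileges. A minimal sketch of the pattern (the `inspect-image` tool here is hypothetical; `setpriv` is the real util-linux utility):

```shell
# Parse the untrusted image under a throwaway, unprivileged identity so
# that a bug in the parser cannot compromise the privileged caller.
setpriv --reuid=nobody --regid=nogroup --clear-groups \
    ./inspect-image --json /var/tmp/untrusted.img > /var/tmp/metadata.json

# The privileged side then acts only on the small, structured result,
# never on the raw image bytes.
```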
Ts'o protested that FUSE is "a pretty terrible security boundary" and that it lacks support for many important filesystem types. But FUSE is what we have for now, and it does move the handling of untrusted filesystems out of the kernel. The fusefs-lkl module (which seems to lack a web site of its own, but is built using the Linux kernel library project) makes any kernel-supported filesystem accessible via FUSE.
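In practice, that lets an untrusted ext4 image be handled entirely in user space. A sketch, assuming an `lklfuse` binary built from the LKL tree (option syntax may differ between versions):

```shell
# The ext4 code runs inside the lklfuse process, not the host kernel;
# a corrupted image can, at worst, take down that process.
lklfuse -o type=ext4 untrusted.img /mnt/img

# Unmount like any other FUSE filesystem.
fusermount -u /mnt/img
```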
When asked (by Ts'o) about making unprivileged filesystem mounts safe, Biederman made it clear that he, too, doesn't expect most kernel filesystems to be safe to use in this mode anytime soon.
It would thus seem that there is a reasonably well understood path toward
finally allowing unprivileged users to mount filesystems without
threatening the integrity of the system as a whole. There is clearly some
work yet to be done to fit all of the pieces together. Once that is done,
we may finally have a solution to a problem that developers have been
working on for at least a decade.
| Index entries for this article | |
|---|---|
| Kernel | Namespaces/User namespaces |
Posted May 30, 2018 16:07 UTC (Wed)
by ms-tg (subscriber, #89231)
On Mac OS in particular, is it possible to construct a malicious .dmg file using these principles, since Mac users typically mount those disk images to install software?
Posted Jun 1, 2018 8:23 UTC (Fri)
by ehiggs (subscriber, #90713)
https://www.cvedetails.com/cve/CVE-2018-4176/
Posted May 30, 2018 16:25 UTC (Wed)
by rahvin (guest, #16953)
Given the advancement in older filesystems over the last few years, how would developers rate filesystems like JFS, XFS, ext4, and others for being the most advanced and having the most development mind-share? XFS appears to have the most mind-share and to be advancing the fastest, but this could be because of Red Hat's other efforts; I'm curious what others think.
My concern is that there is a LOT of older information out there on which filesystem is best in which circumstances, and it may no longer be relevant now that some filesystems have seen more work than others.
Posted May 30, 2018 22:30 UTC (Wed)
by Paf (subscriber, #91811)
I can think of three broad types worth addressing.
For traditional extent based file systems on Linux, EXT4 and XFS are clearly best of breed. There is an emerging consensus among enterprise distributions in favor of XFS as the default, if that helps, but neither is dramatically superior in general.
I can’t speak to log structured except to say that those are mostly built in to flash devices rather than used directly.
For copy-on-write, there are three real choices: ZFS, almost certainly best of breed but with complex legal issues; BTRFS, on whose readiness you can get various answers; and bcachefs, which is compelling but pretty clearly still too new.
Posted Jun 21, 2018 21:33 UTC (Thu)
by philipstorry (subscriber, #45926)
It's curious to hear you call JFS a "me-too", as it predates the Linux kernel by over a year. (It originated with IBM's AIX systems in 1990, was later ported to OS/2, and finally to Linux.)
It's actually quite a nice filesystem for general use. It's got metadata journalling, uses extents and allocation groups, and has a reputation for being fast even under heavy loads whilst not consuming much CPU or memory itself.
XFS is probably the filesystem it's most natural to compare JFS to, as they have similar core features and were both ported to Linux at around the same time in 2001. It also came from an old UNIX (IRIX) and is only three years younger than JFS, so understandably it has a number of similar design decisions. It seems both were pretty cutting edge for the early 1990s!
I wasn't terribly involved with Linux back in 2001 when they were both ported, but it seems that XFS rapidly won the mindshare battle - it accrued more developers around it. Perhaps that's because SGI were more open to contributions from other developers than IBM were? Or maybe it's because its 64-bit on-disk structure gave it higher headline stats in terms of maximum sizes?
Certainly one of the things I've recently admired about JFS is that it's very much in "maintenance mode" these days. That may not be exciting or sexy, but it does make it attractive if you're looking for reliability. I suspect that the unchanging nature of JFS is why it tends to get discounted - it's not adding new features, but the ones it has are well implemented and reliable. But the tech industry and community likes the shiny new things, and JFS lost its shiny new feel over a decade ago.
Now it's simply a reliable workhorse.
The main reasons to avoid it are either feature requirements (and they're more likely to be COW based) or simply the concern that at some point it may be deprecated due to its inactivity. That sort of concern is a self-reinforcing feedback loop, really, and I suspect it's started to happen already.
However, it's served three different operating systems well, and is still a viable choice for many purposes. It's a pity JFS doesn't get a little more respect...
Posted May 30, 2018 16:30 UTC (Wed)
by phh (guest, #112196)
On the list of filesystems supported by FUSE: technically there is lklfuse, which makes it possible to mount any filesystem supported by Linux.
Posted Jun 7, 2018 7:08 UTC (Thu)
by Wol (subscriber, #4433)
Pr1mos was a Multics derivative, and I've always felt that in MANY ways it was better than Unix. Unix (in the form of BSD) just happened to be free and gained traction, and, well, we all know that "the good enough is the enemy of the best".
Cheers,
Wol
Posted Aug 13, 2018 3:52 UTC (Mon)
by fest3er (guest, #60379)
Brings to mind Brian Wilson's quip: "Beware the lollipop of mediocrity; lick it once and you'll suck forever."
Posted May 31, 2018 1:50 UTC (Thu)
by ncm (guest, #165)
Sshd (with password and challenge-response authentication turned off) might be in the set. Anything not specifically designed to be in Z can safely be assumed not to be.
Posted Jun 4, 2018 14:55 UTC (Mon)
by david.a.wheeler (subscriber, #72896)
FUSE, as far as I know, doesn't support all the options and features you'd want, and it's always playing catch-up. You can't run a different kernel in a container, either.
The only "easy safe way" I see to access a disk image you don't trust is to run a VM that accesses the drive and has no other access (in particular, no external network). Then "share" it over a simulated network that only has internal access. This does trust that the VMM is adequately protected, but that has a chance of holding. Then you can run the *native* kernel code to read it. If the system gets broken into, the attacker only gets the VM and what it can see.
That's a pretty heavyweight approach. Is there a better one?
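One packaged implementation of roughly this idea is libguestfs, which boots a small appliance VM around the kernel's filesystem code and re-exports the result to the host over FUSE. A sketch, assuming the libguestfs tools are installed:

```shell
# The filesystem parsing happens inside the throwaway appliance VM;
# the host only ever sees a FUSE mount of the result.
guestmount -a untrusted.img -m /dev/sda1 --ro /mnt/img

# Detach the appliance when done.
guestunmount /mnt/img
```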
Posted Jun 5, 2018 12:49 UTC (Tue)
by robbe (guest, #16131)
As to FUSE always playing catch-up, why not flip that around for filesystems like FAT, which are mounted untrusted in the *majority* of uses (e.g. the mentioned automount-my-usb-stick)? The in-kernel FAT implementation would be relegated to legacy status, while distributors made sure that the automount would set up a userspace equivalent (FUSE, or gvfs, or whatever).
That wouldn’t work out for the container case, though.
I like the FUSE-only approach, because it makes the attack surface fairly small. Ts'o's suggestion is basically to replace FUSE with 9P. Yeah sure, whatever works, I guess.