Another Linux capabilities hole found

By Jake Edge
April 15, 2009

A recent patch posted to the linux-kernel mailing list fixes a long-standing flaw in the Linux capabilities implementation. The problem has existed since capabilities were added to the kernel during the 2.1 development series—more than ten years ago. One of the obvious questions is how a bug of that sort could have escaped notice for so long.

The problem was reported in March by Igor Zhbanov, who provided an excellent analysis of the flaw and how it can be exploited. The basic problem lives in the VFS and NFS code, which tries to drop privileges, by way of capabilities, before performing operations. The mask of capability bits used for that purpose does not include CAP_MKNOD (the ability to create a device node entry) or CAP_LINUX_IMMUTABLE (which allows changing the S_APPEND and S_IMMUTABLE file attributes). As a result, those capability bits are not removed before the file operation is performed.
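
The masking step can be modeled in a few lines. What follows is a minimal Python sketch of the idea, not the kernel's code. The capability numbers match those in Linux's <linux/capability.h>, but the names OLD_FS_MASK, FIXED_FS_MASK, and drop_fs_caps() are hypothetical, and the exact membership of the old mask is a simplifying assumption made for illustration.

```python
# Capability numbers as defined in Linux's <linux/capability.h>.
CAP_CHOWN = 0
CAP_DAC_OVERRIDE = 1
CAP_DAC_READ_SEARCH = 2
CAP_FOWNER = 3
CAP_FSETID = 4
CAP_LINUX_IMMUTABLE = 9
CAP_MKNOD = 27

CAP_NAMES = {
    CAP_CHOWN: "CAP_CHOWN",
    CAP_DAC_OVERRIDE: "CAP_DAC_OVERRIDE",
    CAP_DAC_READ_SEARCH: "CAP_DAC_READ_SEARCH",
    CAP_FOWNER: "CAP_FOWNER",
    CAP_FSETID: "CAP_FSETID",
    CAP_LINUX_IMMUTABLE: "CAP_LINUX_IMMUTABLE",
    CAP_MKNOD: "CAP_MKNOD",
}


def cap_to_mask(cap):
    """Turn a capability number into its single-bit mask."""
    return 1 << cap


def make_mask(*caps):
    """OR together the bits for several capabilities."""
    mask = 0
    for cap in caps:
        mask |= cap_to_mask(cap)
    return mask


# Hypothetical stand-ins for the filesystem mask before and after the fix.
# ASSUMPTION: simplified membership, chosen to follow the article's description.
OLD_FS_MASK = make_mask(CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH,
                        CAP_FOWNER, CAP_FSETID)
FIXED_FS_MASK = OLD_FS_MASK | make_mask(CAP_MKNOD, CAP_LINUX_IMMUTABLE)


def drop_fs_caps(effective, fs_mask):
    """Clear every capability bit that appears in fs_mask."""
    return effective & ~fs_mask


def names(capset):
    """List the names of the capabilities present in capset."""
    return sorted(name for cap, name in CAP_NAMES.items()
                  if capset & cap_to_mask(cap))


# A user who has been handed CAP_CHOWN and CAP_MKNOD.
user_caps = make_mask(CAP_CHOWN, CAP_MKNOD)

print(names(drop_fs_caps(user_caps, OLD_FS_MASK)))    # ['CAP_MKNOD']
print(names(drop_fs_caps(user_caps, FIXED_FS_MASK)))  # []
```

With the incomplete mask, CAP_MKNOD survives the "drop"; once the missing bits are in the mask, nothing is left over.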

Zhbanov shows that on a compromised client machine, the root user could give another user CAP_MKNOD, which would allow that user to run the mknod command and create a device entry owned by them. If this were done on an NFS-mounted filesystem, that entry would be created on the server, still owned by the user. This works even if the export uses the root_squash option, which essentially maps root users on client machines to "nobody" on the server machine.

If the user on the compromised machine can execute code on the server or any other client, they can directly access the device that underlies the device node entry. They will not require any special permissions on the other machines because the device node is owned by them. For example, creating the equivalent of /dev/hda on the server's filesystem might allow direct access to the hard disk block device on any system that had the NFS filesystem mounted. Uglier exploits can certainly be imagined.
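
Since the attack leaves a user-owned device node on the exported filesystem, one way to look for traces of it is to walk the export and list any device nodes not owned by root. The Python sketch below is an illustration only and does not come from Zhbanov's report. The function names and the path "/srv/nfs/export" are hypothetical, and it uses only the standard os and stat modules.

```python
import os
import stat


def find_device_nodes(root):
    """Return (path, kind, owner_uid) for each device node found under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:  # os.walk lists non-directory entries here
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISBLK(st.st_mode):
                found.append((path, "block", st.st_uid))
            elif stat.S_ISCHR(st.st_mode):
                found.append((path, "char", st.st_uid))
    return found


def owned_by_non_root(nodes):
    """Keep only the device nodes whose owner is not uid 0."""
    return [node for node in nodes if node[2] != 0]


if __name__ == "__main__":
    # "/srv/nfs/export" is a hypothetical export path; substitute a real one.
    suspicious = owned_by_non_root(find_device_nodes("/srv/nfs/export"))
    for path, kind, uid in suspicious:
        print(f"{kind} device {path} owned by uid {uid}")
```

A scan like this only finds nodes that already exist; it is no substitute for the kernel fix.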

This is clearly a nasty problem. Linus Torvalds merged the fix for the recently released 2.6.30-rc2 kernel. One would guess the -stable tree folks won't be too far behind. Serge Hallyn also provided patches for 2.4 and 2.2 kernels, though the latter has become completely unsupported.

The patch was greeted with a question from Valdis Kletnieks: "Wow. How did this manage to stay un-noticed for this long?" Torvalds had a characteristically blunt answer: "Because nobody uses capabilities?" While that might explain how the bug went undetected for so long, it doesn't help alleviate the problem. Whether folks are using capabilities or not is irrelevant; the kernel itself certainly is.

This is not the first time capabilities have been the source of a nasty, exploitable hole. The unfortunately-named "sendmail-capabilities bug" provided a way to gain root privileges by exploiting the way sendmail dropped its privileges. The solution, when this bug was found in 2000, was to "cripple" capabilities in the kernel by disabling capability inheritance. That functionality was not restored until relatively recently.

If distributions and other users were doing more with capabilities, it does seem likely that this particular problem would have been seen sometime in the last decade. But, by and large, Torvalds is right. For one thing, capabilities are a Linux-specific feature, so anyone writing portable code is likely to avoid using them. In addition, they are fairly difficult to wrap your head around; that complexity tends to lead folks to ignore capabilities.

There have been some efforts at using capabilities in distributions more, but one has to wonder how many more exploits still lurk in that code. It is hard to imagine removing capabilities at this late date—it is a user-space interface from the kernel after all—but some must be wondering if the feature is worth all the trouble it has caused.

Posted Apr 16, 2009 8:48 UTC (Thu) by mjthayer (guest, #39183)

I wonder why we still need network filesystems in the kernel? The traditional reason was to allow for using nfs as a root partition, but surely with FUSE and an initramfs (and perhaps a bit of work to make them usable in this context) this is no longer such an important reason? And given the inherent latency of a network filesystem, the extra performance can't be a valid reason either.

Disclaimer: I am speaking as a naive outsider, not a clueful kernel dev.

Posted Apr 16, 2009 11:55 UTC (Thu) by Cyberax (✭ supporter ✭, #52523)

FUSE is not ready to replace native filesystems. It's far slower than real in-kernel FS, less powerful and can have some consistency problems.

Posted Apr 16, 2009 14:45 UTC (Thu) by bfields (subscriber, #19510)

Also, note the problem here was on the *server* side, not the client. And the question of why the server is in the kernel is also interesting, but irrelevant in this case since the userspace server was equally affected by this bug--the userspace server uses setfsuid(), which uses the same mask bits as the in-kernel nfsd is using.

Posted Apr 16, 2009 14:48 UTC (Thu) by ajb (subscriber, #9694)

There are other non-kernel nfs implementations. vesta (vestasys.org) has one which they claim is faster than the in-kernel implementation. There is also http://nfs-ganesha.sourceforge.net/

Posted Apr 17, 2009 7:44 UTC (Fri) by mjthayer (guest, #39183)

Are these fundamental issues though, or things that can be fixed? Bearing in mind, as I said, that a network fs will be slower than a local one anyway (until networks reach the speed of a local bus that is...) And do you think that it would still be good to keep network filesystems in the kernel if it were possible to put them in user space?

Posted Apr 18, 2009 17:05 UTC (Sat) by i3839 (guest, #31386)

Local network latency is very low, much lower than hard disk latencies. Throughput is about 100 MB/s, as fast as fast hard disks. The only reason it's slower is because there's a slow local fs at the other end.

The fundamental issue is that programs use system calls to communicate with the outside world, and most of those system calls deal (sometimes indirectly) with files. For a network filesystem client, going through the kernel, then to userspace, and back again is just a stupid way of doing something relatively simple.

To sum up, network filesystem clients are in-kernel for all the same reasons why normal filesystems are in-kernel. For network fs servers it's a slightly different trade-off.

Ever further off topic :)

Posted Apr 22, 2009 9:20 UTC (Wed) by mjthayer (guest, #39183)

Does Linux actually cache file data, or only block data? I would have thought that user space filesystem latency could be reduced quite a bit by clever caching - the kernel caches data from file reads (with a bit of read-ahead), and writes to files, and sends them to the user space filesystem as a package. If the filesystem has an underlying block device, this could be done shortly before the block device cache is due to be flushed.

Posted Apr 22, 2009 22:57 UTC (Wed) by nix (subscriber, #2304)

Of course Linux caches file data: in fact it won't work without it. Absolutely everything that gets put in a page in memory (all file data, anonymous mmapped pages, you name it) has to pass through the page cache first. Executables *run* from the page cache: their text pages reside nowhere else.

There *is* a cache of disk blocks (the buffer cache), but these days it's used pretty much entirely for metadata (as this doesn't necessarily have a page in memory devoted to it).