Another Linux capabilities hole found
A recent patch posted to the linux-kernel mailing list fixes a long-standing flaw in the Linux capabilities implementation. The problem has existed since capabilities were added to the kernel during the 2.1 development series—more than ten years ago. One of the obvious questions is how a bug of that sort could have escaped notice for so long.
The problem was reported in March by Igor Zhbanov, who provided an excellent analysis of the flaw and how it can be exploited. The basic problem lives in the VFS and NFS code which tries to drop privileges, by way of capabilities, before performing operations. The mask of capabilities bits that was used for that purpose does not include CAP_MKNOD (the ability to make a device node entry) or CAP_LINUX_IMMUTABLE (which allows changing the S_APPEND and S_IMMUTABLE file attributes). That means that those capabilities bits are not removed before the file operation is performed.
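The effect of an incomplete mask can be modelled in a few lines. The sketch below is an illustration, not kernel code: the capability numbers are the values from `<linux/capability.h>`, but the names `OLD_FS_MASK`, `FIXED_FS_MASK`, `drop_fs_caps`, and `has_cap` are hypothetical. The five capabilities placed in the old mask are an assumption about its contents; the article states only that CAP_MKNOD and CAP_LINUX_IMMUTABLE were missing from it.

```python
# Capability numbers as defined in <linux/capability.h>.
CAP_CHOWN = 0
CAP_DAC_OVERRIDE = 1
CAP_DAC_READ_SEARCH = 2
CAP_FOWNER = 3
CAP_FSETID = 4
CAP_LINUX_IMMUTABLE = 9
CAP_MKNOD = 27


def cap_to_mask(cap):
    """Return the single-bit mask for one capability number."""
    return 1 << cap


def build_mask(*caps):
    """OR together the bits for several capability numbers."""
    mask = 0
    for cap in caps:
        mask |= cap_to_mask(cap)
    return mask


# Assumed contents of the filesystem mask before the fix (hypothetical name).
OLD_FS_MASK = build_mask(
    CAP_CHOWN, CAP_DAC_OVERRIDE, CAP_DAC_READ_SEARCH, CAP_FOWNER, CAP_FSETID
)

# The same mask with the two missing capabilities added (hypothetical name).
FIXED_FS_MASK = OLD_FS_MASK | build_mask(CAP_MKNOD, CAP_LINUX_IMMUTABLE)


def drop_fs_caps(effective, fs_mask):
    """Clear every capability named in fs_mask from an effective set."""
    return effective & ~fs_mask


def has_cap(capset, cap):
    """Report whether a capability bit is still set."""
    return bool(capset & cap_to_mask(cap))


if __name__ == "__main__":
    full_set = (1 << 32) - 1  # simplified: a single 32-bit capability word
    for label, mask in (("old", OLD_FS_MASK), ("fixed", FIXED_FS_MASK)):
        remaining = drop_fs_caps(full_set, mask)
        print(label, "mask keeps CAP_MKNOD:", has_cap(remaining, CAP_MKNOD))
```

With the old mask, the CAP_MKNOD and CAP_LINUX_IMMUTABLE bits survive the drop; with the extended mask, they are cleared along with the rest.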
Zhbanov shows that on a compromised client machine, the root user could give another user CAP_MKNOD, which would allow that user to run the mknod command and create a device entry owned by them. If this were done on an NFS-mounted filesystem, that entry would be created on the server, still owned by the user. This works even if the root_squash option (essentially mapping root users on client machines to "nobody" on the server machine) was used on the export.
If the user on the compromised machine can execute code on the server or any other client, they can directly access the device that underlies the device node entry. They will not require any special permissions on the other machines because the device node is owned by them. For example, creating the equivalent of /dev/hda on the server's filesystem might allow direct access to the hard disk block device on any system that had the NFS filesystem mounted. Uglier exploits can certainly be imagined.
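Since the attack leaves a device node behind on the exported filesystem, one way to look for signs of it is to scan the export for block or character devices. The following is a minimal sketch of such a scan, not a tool mentioned in the article; the function name `find_device_nodes` is hypothetical, and it relies only on the standard `os` and `stat` modules.

```python
import os
import stat


def find_device_nodes(root):
    """Walk a directory tree and list block/character device nodes.

    Returns a list of (path, owner_uid, major, minor) tuples.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # os.walk reports every non-directory entry, including
        # device nodes, in the filenames list.
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: do not follow symlinks
            except OSError:
                continue  # entry vanished or is unreadable; skip it
            if stat.S_ISBLK(st.st_mode) or stat.S_ISCHR(st.st_mode):
                found.append(
                    (path, st.st_uid, os.major(st.st_rdev), os.minor(st.st_rdev))
                )
    return found


if __name__ == "__main__":
    import sys

    # Usage: python scan_devices.py /path/to/export
    for path, uid, major, minor in find_device_nodes(sys.argv[1]):
        print(f"{path}: uid={uid} device={major}:{minor}")
```

A device node owned by an unprivileged user anywhere outside /dev would be worth a closer look, though this scan only reports what exists and says nothing about how a node got there.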
This is clearly a nasty problem. Linus Torvalds merged the fix for the recently released 2.6.30-rc2 kernel. One would guess the -stable tree folks won't be too far behind. Serge Hallyn also provided patches for 2.4 and 2.2 kernels, though the latter has become completely unsupported.
The patch was greeted with a question from Valdis Kletnieks: "Wow. How did this manage to stay un-noticed for this long?" Torvalds had a characteristically blunt answer: "Because nobody uses capabilities?" While that might explain how the bug went undetected for so long, it doesn't help alleviate the problem. Whether folks are using capabilities or not is irrelevant; the kernel itself certainly is.
This is not the first time capabilities have been the source of a nasty, exploitable hole. The unfortunately-named "sendmail-capabilities bug" provided a way to gain root privileges by exploiting the way sendmail dropped its privileges. The solution, when this bug was found in 2000, was to "cripple" capabilities in the kernel by disabling capability inheritance. That functionality was not restored until relatively recently.
If distributions and other users were doing more with capabilities, it does seem likely that this particular problem would have been seen sometime in the last decade. But, by and large, Torvalds is right. For one thing, capabilities are a Linux-specific feature, so anyone writing portable code is likely to avoid using them. In addition, they are fairly difficult to wrap your head around; that complexity tends to lead folks to ignore capabilities.
There have been some efforts at using capabilities in distributions more, but one has to wonder how many more exploits still lurk in that code. It is hard to imagine removing capabilities at this late date—it is a user-space interface from the kernel after all—but some must be wondering if the feature is worth all the trouble it has caused.
| Index entries for this article | |
|---|---|
| Kernel | Security/Vulnerabilities |
| Security | Linux kernel/Linux/POSIX capabilities |
| Security | Vulnerabilities/Privilege escalation |
Posted Apr 16, 2009 8:48 UTC (Thu)
by mjthayer (guest, #39183)
[Link] (7 responses)
Disclaimer: I am speaking as a naive outsider, not a clueful kernel dev.
Posted Apr 16, 2009 11:55 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (6 responses)
Posted Apr 16, 2009 14:45 UTC (Thu)
by bfields (subscriber, #19510)
[Link]
Posted Apr 16, 2009 14:48 UTC (Thu)
by ajb (subscriber, #9694)
[Link]
Posted Apr 17, 2009 7:44 UTC (Fri)
by mjthayer (guest, #39183)
[Link] (3 responses)
Posted Apr 18, 2009 17:05 UTC (Sat)
by i3839 (guest, #31386)
[Link] (2 responses)
The fundamental issue is that programs use system calls to communicate with the outside world, and most of those system calls deal (sometimes indirectly) with files. For a network filesystem client, going through the kernel, then to userspace, and back again is just a stupid way of doing something relatively simple.
To sum up, network filesystem clients are in-kernel for all the same reasons why normal filesystems are in-kernel. For network fs servers it's a slightly different trade-off.
Posted Apr 22, 2009 9:20 UTC (Wed)
by mjthayer (guest, #39183)
[Link] (1 responses)
Posted Apr 22, 2009 22:57 UTC (Wed)
by nix (subscriber, #2304)
[Link]
Ever further off topic :)
Absolutely everything that gets put in a page in memory (all file data, anonymous mmapped pages, you name it) has to pass through the page cache first. Executables *run* from the page cache: their text pages reside nowhere else.
There *is* a cache of disk blocks (the buffer cache), but these days it's used pretty much entirely for metadata (as this doesn't necessarily have a page in memory devoted to it).
Another Linux capabilities hole found
Also, note the problem here was on the *server* side, not the client. And the question of why the server is in the kernel is also interesting, but irrelevant in this case since the userspace server was equally affected by this bug: the userspace server uses setfsuid(), which uses the same mask bits as the in-kernel nfsd is using.
