The trouble with CAP_SYS_RAWIO
A February linux-kernel mailing list discussion of a patch that extends the use of the CAP_COMPROMISE_KERNEL capability soon evolved into a discussion of the specific uses (or abuses) of the CAP_SYS_RAWIO capability within the kernel. More broadly, though, the discussion once again exposed some general difficulties in the Linux capabilities implementation, difficulties that seem to have no easy solution.
The discussion began when Kees Cook submitted a patch to guard writes to model-specific registers (MSRs) with a check to see if the caller has the CAP_COMPROMISE_KERNEL capability. MSRs are x86-specific control registers that are used for tasks such as debugging, tracing, and performance monitoring; those registers are accessible via the /dev/cpu/CPUNUM/msr interface. CAP_COMPROMISE_KERNEL (formerly known as CAP_SECURE_FIRMWARE) is a new capability designed for use in conjunction with UEFI secure boot, which is a mechanism to ensure that the kernel is booted from an on-disk representation that has not been modified.
If a process has the CAP_COMPROMISE_KERNEL capability, it can perform operations that are not allowed in a secure-boot environment; without that capability, such operations are denied. The idea is that if the kernel detects that it has been booted via the UEFI secure-boot mechanism, then this capability is disabled for all processes. In turn, the lack of that capability is intended to prevent operations that can modify the running kernel. CAP_COMPROMISE_KERNEL is not yet part of the mainline kernel, but already exists as a patch in the Fedora distribution and Matthew Garrett is working towards its inclusion in the mainline kernel.
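The policy described above can be sketched as a small user-space model. This is not the Fedora patch. The function `model_capable()`, the `booted_securely` flag, and the bit positions are hypothetical names chosen for illustration, and a real kernel would learn its boot state from the firmware rather than from a global variable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary bit positions for this model only (not the kernel's values). */
enum { MODEL_CAP_SYS_RAWIO = 0, MODEL_CAP_COMPROMISE_KERNEL = 1 };

typedef uint64_t capset_t;

static capset_t cap_bit(int cap)
{
	assert(cap >= 0 && cap < 64);
	return (capset_t)1 << cap;
}

/* Hypothetical: set once, early in boot, if UEFI secure boot was used. */
static bool booted_securely;

/*
 * Model of the described policy: on a secure-boot system,
 * CAP_COMPROMISE_KERNEL is treated as absent for every process,
 * whatever that process's effective set contains.
 */
static bool model_capable(capset_t effective, int cap)
{
	if (cap == MODEL_CAP_COMPROMISE_KERNEL && booted_securely)
		return false;
	return (effective & cap_bit(cap)) != 0;
}
```

Other capabilities, such as CAP_SYS_RAWIO, are unaffected in this model; only the one capability is masked system-wide.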
H. Peter Anvin wondered whether CAP_SYS_RAWIO did not already suffice for Kees's purpose. In response, Kees argued that CAP_SYS_RAWIO is for governing reads: "writing needs a much stronger check". Kees went on to elaborate:
This in turn led to a short discussion about whether a capability was the right way to achieve the goal of restricting certain operations in a secure-boot environment. Kees was inclined to think it probably was the right approach, but deferred to Matthew Garrett, implementer of much of the secure-boot work on Fedora. Matthew thought that a capability approach seemed the best fit, but noted:
In the current mainline kernel, the CAP_SYS_RAWIO capability is checked in the msr_open() function: if the caller has that capability, then it can open the MSR device and perform reads and writes on it. The purpose of Kees's patch is to add a CAP_COMPROMISE_KERNEL check on each write to the device, so that in a secure-boot environment the MSR devices are readable, but not writeable. The problem that Matthew alludes to is that this approach has the potential to break user space because, formerly, there was no capability check on MSR writes. An application that worked prior to the introduction of CAP_COMPROMISE_KERNEL can now fail in the following scenario:
- The application has a full set of privileges.
- The application opens an MSR device (requires CAP_SYS_RAWIO).
- The application drops all privileges, including CAP_SYS_RAWIO and CAP_COMPROMISE_KERNEL.
- The application performs a write on the previously opened MSR device (requires CAP_COMPROMISE_KERNEL).
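The four-step scenario can be reproduced with a toy model of the two checks. The functions `model_msr_open()`, `model_msr_write()`, and `run_scenario()` are hypothetical stand-ins that model only the capability checks, not the real msr driver; the `with_patch` flag selects the behavior with or without a per-write CAP_COMPROMISE_KERNEL check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary bit positions for this model only. */
enum { MODEL_CAP_SYS_RAWIO = 0, MODEL_CAP_COMPROMISE_KERNEL = 1 };

typedef uint64_t capset_t;

static capset_t cap_bit(int cap)
{
	assert(cap >= 0 && cap < 64);
	return (capset_t)1 << cap;
}

struct model_file {
	bool is_open;
};

/* Stand-in for the msr_open() check: CAP_SYS_RAWIO is needed to open. */
static int model_msr_open(capset_t effective, struct model_file *f)
{
	if (!(effective & cap_bit(MODEL_CAP_SYS_RAWIO)))
		return -EPERM;
	f->is_open = true;
	return 0;
}

/*
 * Stand-in for the write path. With with_patch set, each write also
 * checks CAP_COMPROMISE_KERNEL; without it, an already-open
 * descriptor can always be written.
 */
static int model_msr_write(capset_t effective, const struct model_file *f,
			   bool with_patch)
{
	if (!f->is_open)
		return -EBADF;
	if (with_patch && !(effective & cap_bit(MODEL_CAP_COMPROMISE_KERNEL)))
		return -EPERM;
	return 0;
}

/* The four steps: full privileges, open, drop everything, write. */
static int run_scenario(bool with_patch)
{
	capset_t effective = ~(capset_t)0;        /* step 1: all capabilities */
	struct model_file f = { .is_open = false };
	int err = model_msr_open(effective, &f);  /* step 2: open the device */

	if (err)
		return err;
	effective = 0;                            /* step 3: drop everything */
	return model_msr_write(effective, &f, with_patch); /* step 4: write */
}
```

In this model, `run_scenario(false)` succeeds while `run_scenario(true)` returns `-EPERM`: the same program, unchanged, starts failing once the new check is added.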
The last of the above steps would formerly have succeeded, but, with the addition of the CAP_COMPROMISE_KERNEL check, it now fails. In a subsequent reply, Matthew noted that QEMU was one program that was broken by a scenario similar to the above. Josh Boyer noted that Fedora has had a few reports of applications breaking on non-secure-boot systems because of scenarios like this. He highlighted why such breakages are so surprising to users and why the problem is seemingly unavoidable:
Really though, the main issue is that you cannot introduce new caps to enforce finer grained access without breaking something.
Shortly afterward, Peter stepped back to ask a question about the bigger picture: why should CAP_SYS_RAWIO be allowed on a secure-boot system? In other words, rather than adding a new CAP_COMPROMISE_KERNEL capability that is disabled in secure-boot environments, why not just disable CAP_SYS_RAWIO in such environments, since it is the possession of that capability that permits compromising a booted kernel?
That led Matthew to point out a major problem with CAP_SYS_RAWIO:
To see what Matthew is talking about, we need to look at a little history. Back in January 1999, when capabilities first appeared with the release of Linux 2.2, CAP_SYS_RAWIO was a single-purpose capability. It was used in just a single C file in the kernel source, where it governed access to two system calls: iopl() and ioperm(). Those system calls permit access to I/O ports, allowing uncontrolled access to devices (and providing various ways to modify the state of the running kernel); hence the requirement for a capability in order to employ the calls.
The problem was that CAP_SYS_RAWIO rapidly grew to cover a range of other uses. By the time of Linux 2.4.0, there were 37 uses across 24 of the kernel's C source files, and looking at the 3.9-rc2 kernel, there are 69 uses in 43 source files. By either measure, CAP_SYS_RAWIO is now the third most commonly used capability inside the kernel source (after CAP_SYS_ADMIN and CAP_NET_ADMIN).
CAP_SYS_RAWIO seems to have encountered a fate similar to CAP_SYS_ADMIN, albeit on a smaller scale. It has expanded well beyond its original narrow use. In particular, Matthew noted:
Peter had some choice words to describe the abuse of CAP_SYS_RAWIO to protect operations on SCSI devices. The problem, of course, is that in order to perform relatively harmless SCSI operations, an application requires the same capability that can trivially be used to damage the integrity of a secure-boot system. And that, as Matthew went on to point out, is the point of CAP_COMPROMISE_KERNEL: to disable the truly dangerous operations (such as MSR writes) that CAP_SYS_RAWIO permits, while still allowing the less dangerous operations (such as the SCSI device operations).
All of this leads to a conundrum that was nicely summarized by Matthew. On the one hand, CAP_COMPROMISE_KERNEL is needed to address the problem that CAP_SYS_RAWIO has become too diffuse in its meaning. On the other hand, the addition of CAP_COMPROMISE_KERNEL checks in places where there were previously no capability checks in the kernel means that applications that drop all capabilities will break. There is no easy way out of this difficulty. As Peter noted: "We thus have a bunch of unpalatable choices, **all of which are wrong**".
Some possible resolutions of the conundrum were mentioned by Josh Boyer earlier in the thread: CAP_COMPROMISE_KERNEL could be treated as a "hidden" capability whose state could be modified only internally by the kernel. Alternatively, CAP_COMPROMISE_KERNEL might be specially treated, so that it can be dropped only by a capset() call that operates on that capability alone; in other words, if a capset() call specified dropping multiple capabilities, including CAP_COMPROMISE_KERNEL, the state of the other capabilities would be changed, but not the state of CAP_COMPROMISE_KERNEL. The problem with these approaches is that they special-case the treatment of CAP_COMPROMISE_KERNEL in a surprising way (and surprises in security-related APIs have a way of coming back to bite in the future). Furthermore, it may well be the case that analogous problems are encountered in the future with other capabilities; handling each of these as a special case would further add to the complexity of the capabilities API.
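The second of Josh's alternatives amounts to a special-case rule in the drop path. Below is a minimal sketch of that rule applied to a capability bitmask. `model_drop()` and the bit positions are hypothetical, and the real capset() interface works on whole permitted/effective/inheritable sets rather than on a "drop" mask.

```c
#include <assert.h>
#include <stdint.h>

/* Arbitrary bit positions for this model only. */
enum { MODEL_CAP_SYS_RAWIO = 0, MODEL_CAP_COMPROMISE_KERNEL = 1 };

typedef uint64_t capset_t;

static capset_t cap_bit(int cap)
{
	assert(cap >= 0 && cap < 64);
	return (capset_t)1 << cap;
}

/*
 * Hypothetical drop rule: CAP_COMPROMISE_KERNEL is cleared only when it
 * is the sole capability named in the request; a multi-capability drop
 * clears the others and silently leaves it set.
 */
static capset_t model_drop(capset_t current, capset_t to_drop)
{
	capset_t cck = cap_bit(MODEL_CAP_COMPROMISE_KERNEL);

	if (to_drop != cck)
		to_drop &= ~cck;
	return current & ~to_drop;
}
```

Under this rule, a process that asks to drop every capability at once still holds CAP_COMPROMISE_KERNEL afterward. That is exactly the kind of surprise in a security-related API that the text warns about.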
The discussion in the thread touched on a number of other difficulties with capabilities. Part of the solution to the problem of the overly broad effect of CAP_SYS_RAWIO (and CAP_SYS_ADMIN) might be to split the capability into smaller pieces: replace one capability with several new capabilities that each govern a subset of the operations governed by the old capability. Each privileged operation in the kernel would then check whether the caller had either the old or the new capability. This would allow old binaries to continue to work while allowing new binaries to employ the new, tighter capability. The risk with this approach is, as Casey Schaufler noted, the possibility of an explosion in the number of capabilities, which would further complicate administering capabilities for applications. Furthermore, splitting capabilities in this manner doesn't solve the particular problem that the CAP_COMPROMISE_KERNEL patches attempt to solve for CAP_SYS_RAWIO.
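The "old or new" check can be sketched as follows, assuming a hypothetical finer-grained capability, `MODEL_CAP_RAWIO_SCSI`, split out of CAP_SYS_RAWIO. No such capability exists in the kernel. The function names are also illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary bit positions; MODEL_CAP_RAWIO_SCSI is hypothetical. */
enum { MODEL_CAP_SYS_RAWIO = 0, MODEL_CAP_RAWIO_SCSI = 1 };

typedef uint64_t capset_t;

static capset_t cap_bit(int cap)
{
	assert(cap >= 0 && cap < 64);
	return (capset_t)1 << cap;
}

/* Accept either the old, broad capability or the new, narrow one. */
static bool model_capable_either(capset_t effective, int old_cap, int new_cap)
{
	return (effective & (cap_bit(old_cap) | cap_bit(new_cap))) != 0;
}

/* An operation moved under the new capability: old binaries still work. */
static bool model_may_do_scsi(capset_t effective)
{
	return model_capable_either(effective, MODEL_CAP_SYS_RAWIO,
				    MODEL_CAP_RAWIO_SCSI);
}

/* An operation that was not split out still needs the old capability. */
static bool model_may_write_msr(capset_t effective)
{
	return (effective & cap_bit(MODEL_CAP_SYS_RAWIO)) != 0;
}
```

A new binary holding only the narrow capability can do the SCSI operation but not the MSR write. Note that nothing here takes the MSR write away from holders of the old capability, which is why splitting alone does not solve the secure-boot problem.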
Another general problem touched on by Casey is that capabilities still have not seen wide adoption as a replacement for set-user-ID and set-group-ID programs. But, as Peter noted, that may well be
With 502 uses in the 3.9-rc2 kernel, CAP_SYS_ADMIN is the most egregious example of this problem. That problem itself would appear to spring from the Linux kernel development model: the decisions about which capabilities should govern new kernel features typically are made by individual developers in a largely decentralized and uncoordinated manner. Lacking a coordinated big picture, many developers have adopted the seemingly safe choice, CAP_SYS_ADMIN. A related problem is that it turns out that a number of capabilities allow escalation to full root privileges in certain circumstances. To some degree, this is probably unavoidable, and it doesn't diminish the fact that a well-designed capabilities scheme can be used to reduce the attack surface of applications.
One approach that might help solve the problem of overly broad capabilities is hierarchical capabilities. The idea, mentioned by Peter, is to split some capabilities in a fashion similar to the way that the root privilege was split into capabilities. Thus, for instance, CAP_SYS_RAWIO could become a hierarchical capability with sub-capabilities called (say) CAP_DANGEROUS and CAP_MOSTLY_HARMLESS. A process that gained or lost CAP_SYS_RAWIO would implicitly gain or lose both CAP_DANGEROUS and CAP_MOSTLY_HARMLESS, in the same way that transitions to and from an effective user ID of 0 grant and drop all capabilities. In addition, sub-capabilities could be raised and dropped independently of their "siblings" at the same hierarchical level. However, sub-capabilities are not a concept that currently exists in the kernel, and it's not clear whether the existing capabilities API could be tweaked in such a way that they could be implemented sanely. Digging deeper into that topic remains an open challenge.
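The semantics described above can be modeled with plain bitmasks if a parent capability is treated as the union of its children. This sketch uses the article's illustrative names CAP_DANGEROUS and CAP_MOSTLY_HARMLESS with arbitrary bit positions. The rule that a process "holds" the parent only when it holds every child is an assumption of this model. The model also sidesteps the open question of how the existing capget()/capset() ABI would express such a hierarchy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The article's illustrative sub-capabilities; bit positions arbitrary. */
enum { MODEL_CAP_DANGEROUS = 0, MODEL_CAP_MOSTLY_HARMLESS = 1 };

typedef uint64_t capset_t;

static capset_t cap_bit(int cap)
{
	assert(cap >= 0 && cap < 64);
	return (capset_t)1 << cap;
}

/* In this model, the parent CAP_SYS_RAWIO is the union of its children. */
static capset_t model_sys_rawio(void)
{
	return cap_bit(MODEL_CAP_DANGEROUS) | cap_bit(MODEL_CAP_MOSTLY_HARMLESS);
}

/* Raising or dropping a mask works the same for parents and children. */
static capset_t model_raise(capset_t cur, capset_t mask)
{
	return cur | mask;
}

static capset_t model_drop(capset_t cur, capset_t mask)
{
	return cur & ~mask;
}

/* Assumption: a mask is "held" only if every bit in it is held. */
static bool model_holds(capset_t cur, capset_t mask)
{
	return (cur & mask) == mask;
}
```

Raising the parent grants both children. Dropping only the dangerous child leaves the harmless one in place, and dropping the parent clears both.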
The CAP_SYS_RAWIO discussion touched on a long list of difficulties in the current Linux capabilities implementation: capabilities whose range is too broad, the difficulties of splitting capabilities while maintaining binary compatibility (and, conversely, the administrative difficulties associated with defining too large a set of capabilities), the as-yet poor adoption of binaries with file capabilities vis-a-vis traditional set-user-ID binaries, and the (possible) need for an API for hierarchical capabilities. It would seem that capabilities still have a way to go before they can deliver on the promise of a manageable mechanism for granting discrete, non-elevatable privileges to applications.
| Index entries for this article | |
|---|---|
| Kernel | Capabilities |
| Security | Capabilities |
| Security | Secure boot |
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 17:21 UTC (Wed) by cesarb (subscriber, #6266)
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 17:47 UTC (Wed) by mjg59 (subscriber, #23239)
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 19:49 UTC (Wed) by cesarb (subscriber, #6266)
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 19:59 UTC (Wed) by mjg59 (subscriber, #23239)
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 20:11 UTC (Wed) by smurf (subscriber, #17840)
if ((file->f_mode & FMODE_READ) && !capable(CAP_SYS_RAWIO))
	return -EPERM;
if ((file->f_mode & FMODE_WRITE) && !capable(CAP_SYS_COMPROMISE_KERNEL))
	return -EPERM;
when the device is opened, with everything subsequent checking only the file's flags.
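A minimal user-space model of this open-time scheme follows. `model_open()` and `model_write()` are hypothetical stand-ins, and `MODEL_CAP_SYS_COMPROMISE_KERNEL` follows the comment's naming. Because the capability checks happen only at open time, dropping capabilities afterward does not affect later writes. The flip side is that a process holding only CAP_SYS_RAWIO can no longer open the device read-write at all.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary bit positions for this model only. */
enum { MODEL_CAP_SYS_RAWIO = 0, MODEL_CAP_SYS_COMPROMISE_KERNEL = 1 };

typedef uint64_t capset_t;

static capset_t cap_bit(int cap)
{
	assert(cap >= 0 && cap < 64);
	return (capset_t)1 << cap;
}

struct model_file {
	bool readable;
	bool writable;
};

/* All capability checks happen here, once, based on the requested mode. */
static int model_open(capset_t effective, bool want_read, bool want_write,
		      struct model_file *f)
{
	if (want_read && !(effective & cap_bit(MODEL_CAP_SYS_RAWIO)))
		return -EPERM;
	if (want_write && !(effective & cap_bit(MODEL_CAP_SYS_COMPROMISE_KERNEL)))
		return -EPERM;
	f->readable = want_read;
	f->writable = want_write;
	return 0;
}

/* Writes consult only the mode fixed at open time; no capability check. */
static int model_write(const struct model_file *f)
{
	return f->writable ? 0 : -EBADF;
}
```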
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 20:20 UTC (Wed) by mjg59 (subscriber, #23239)
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 21:25 UTC (Wed) by khim (subscriber, #9252)
An application that drops CAP_SYS_COMPROMISE_KERNEL will work just fine, because both checks happen in the open(2) system call. It will break an application that opens the file for reading and writing but then only issues read commands. This can be fixed by changing the logic: if a read/write open(2) request is attempted without CAP_SYS_COMPROMISE_KERNEL, then it is silently translated into a read-only open(2) request. Of course, an application that then tries to write to the file will see EBADF, which may crash it, but I'm not sure what can save such an application.
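The downgrade idea can be sketched the same way. `model_open_downgrade()` is a hypothetical stand-in. The comment does not say what should happen to a write-only open without the capability, so refusing it with `-EPERM` is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Arbitrary bit positions for this model only. */
enum { MODEL_CAP_SYS_RAWIO = 0, MODEL_CAP_SYS_COMPROMISE_KERNEL = 1 };

typedef uint64_t capset_t;

static capset_t cap_bit(int cap)
{
	assert(cap >= 0 && cap < 64);
	return (capset_t)1 << cap;
}

struct model_file {
	bool readable;
	bool writable;
};

/*
 * A read-write open without CAP_SYS_COMPROMISE_KERNEL is silently
 * downgraded to read-only. A write-only open without it is refused
 * (an assumption; there is nothing to fall back to).
 */
static int model_open_downgrade(capset_t effective, bool want_read,
				bool want_write, struct model_file *f)
{
	if (want_read && !(effective & cap_bit(MODEL_CAP_SYS_RAWIO)))
		return -EPERM;
	if (want_write && !(effective & cap_bit(MODEL_CAP_SYS_COMPROMISE_KERNEL))) {
		if (!want_read)
			return -EPERM;
		want_write = false;  /* fall back to read-only */
	}
	f->readable = want_read;
	f->writable = want_write;
	return 0;
}

/* A later write on a downgraded descriptor fails with EBADF. */
static int model_write(const struct model_file *f)
{
	return f->writable ? 0 : -EBADF;
}
```

A read-only application opening with O_RDWR keeps working, while one that actually writes gets `-EBADF` at write time rather than `-EPERM` at open time.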
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 21:38 UTC (Wed) by mjg59 (subscriber, #23239)
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 22:40 UTC (Wed) by smurf (subscriber, #17840)
It does not fix the "drop privileges and then open the device" case. I know that. But if userspace runs afoul of this problem, the fix is a simple re-ordering of two lines of code. Or retaining an additional capability.
Is there any real-world program that would run aground at this change, or is this an academic exercise?
Personally I never regarded CAP_SYS_RAWIO (or _ADMIN) as written in stone. It's a sufficiently broad catch-all category that a reasonable programmer should expect that requiring a different, or additional, capability for a task that used to work with only this one right might be in the cards.
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 22:48 UTC (Wed) by mjg59 (subscriber, #23239)
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 4:50 UTC (Thu) by heijo (guest, #88363)
Redefining those so that they can no longer do so is idiotic and breaks compatibility.
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 5:04 UTC (Thu) by mjg59 (subscriber, #23239)
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 19:52 UTC (Thu) by WolfWings (subscriber, #56790)
Right now you don't appear to be able to drop-all-caps then open /dev/msr, you need to open it first then drop privs as there already is a check to block so much as reads unless you have the RAWIO cap.
So how does the "read = RAWIO, write = RAWIO && COMPROMISE" check on open() instead of on write() break userspace? Programs would be refused access to /dev/msr and complain about it, same as before, and their existing 'Check your caps!' error messages would still apply.
There's a difference between 'breaking' userspace in ways that existing apps' error messages don't cover (and that they may not even have error-handling paths for), and simply enforcing stronger checks in a way that is compatible with existing error handling.
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 20:03 UTC (Thu) by mjg59 (subscriber, #23239)
The trouble with CAP_SYS_RAWIO
Posted Mar 17, 2013 15:46 UTC (Sun) by mrjk (subscriber, #48482)
Can you give an example that would now break -- that wouldn't have broken already?
The trouble with CAP_SYS_RAWIO
Posted Mar 17, 2013 17:02 UTC (Sun) by mjg59 (subscriber, #23239)
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 21:36 UTC (Wed) by kugel (subscriber, #70540)
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 21:49 UTC (Wed) by mjg59 (subscriber, #23239)
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 21:39 UTC (Wed) by ebiederm (guest, #35028)
bool capable(int cap)
{
return ns_capable(&init_user_ns, cap);
}
to:
bool capable(int cap)
{
if (we_dont_trust_root)
return false;
return ns_capable(&init_user_ns, cap);
}
Which is equivalent to running userspace outside the initial user namespace, and trivially gives you an environment that has been audited to work for an untrusted root.
Just a few more things won't work that way but I would not mind a little help flushing out the things that we can trust less than fully privileged users with doing.
As for msrs. Make no mistake someone will eventually implement rdmsr(HALT_AND_CATCH_FIRE). So I can't believe even reading msrs is safe.
The trouble with CAP_SYS_RAWIO
Posted Mar 13, 2013 22:10 UTC (Wed) by spender (guest, #23067)
http://stealth.openwall.net/xSports/clown-newuser.c
-Brad
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 3:12 UTC (Thu) by shlevy (guest, #87221)
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 6:51 UTC (Thu) by kees (subscriber, #27264)
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linu...
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 11:33 UTC (Thu) by spender (guest, #23067)
You should also know that the existing kernel exploit payloads for granting root privilege also break out of user namespaces without modification.
So the end result is opening up more attack surface to the most vulnerable part of the system, and soon on all distros you will have no choice but to be exposed to it. It's just broken security design.
-Brad
The trouble with CAP_SYS_RAWIO
Posted Mar 19, 2013 1:57 UTC (Tue) by wahern (subscriber, #37304)
Unfortunately, "less code, simpler code" is not one of the competing security paradigms in Linux Land.
*BSD "securelevel"
Posted Mar 14, 2013 10:03 UTC (Thu) by ewen (subscriber, #4772)
Reading through this article reminded me of the *BSD "securelevel", which is a one-way ratchet (i.e. once it has been raised, you can't change it back to a less secure level except by rebooting). It controls various "compromise the kernel"-like things. The exact set of things it controls is probably not an ideal match, but the idea of a sysctl value which can only ever be changed to "be at least as restrictive of insecure things you can do as now" seems like a fairly good fit. And it would be completely orthogonal to the Linux capabilities, which seems helpful. (As well as being "system wide", which seems desirable in this case: if you've booted via secure UEFI you probably don't want to end up in a situation where some processes can compromise the kernel and others cannot....)
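The one-way ratchet can be sketched in a few lines. This is a loose model, not the BSD implementation. `securelevel`, `model_set_securelevel()`, and `model_may_modify_kernel()` are illustrative names, and the threshold of 1 only roughly mirrors the BSD "secure mode" level.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Loose model of a BSD-style securelevel: a system-wide value that a
 * running system may raise but never lower.
 */
static int securelevel = 0;

static int model_set_securelevel(int new_level)
{
	if (new_level < securelevel)
		return -EPERM;  /* lowering requires a reboot */
	securelevel = new_level;
	return 0;
}

/* Hypothetical gate for a "compromise the kernel" operation. */
static bool model_may_modify_kernel(void)
{
	return securelevel < 1;
}
```

Because the value is global rather than per-process, there is no way for one process to keep a "compromise the kernel" right that others have lost, which is the system-wide property the comment points to.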
Ewen
*BSD "securelevel"
Posted Mar 14, 2013 11:40 UTC (Thu) by spender (guest, #23067)
-Brad
*BSD "securelevel"
Posted Mar 14, 2013 16:39 UTC (Thu) by ThinkRob (guest, #64513)
I know the full answer to this is probably not terribly succinct, but I'm just curious to hear what you think, since kernel security is obviously something you're quite passionate about. :)
*BSD "securelevel"
Posted Mar 20, 2013 9:13 UTC (Wed) by renox (subscriber, #23785)
AFAIK it's only useful iff programs are ported to use it.
*BSD "securelevel"
Posted Mar 14, 2013 18:35 UTC (Thu) by ewen (subscriber, #4772)
It sounds like the "compromise the kernel" flag is also aimed to cut off a more encompassing set of things, so hopefully it'd be less easy to do an end-run around the protection. (And there'd be more incentive to add "you can't do that either" into the set of things turned off as other ways to manipulate it are discovered: Firewire device DMA being one that comes to mind.)
Ewen
*BSD "securelevel"
Posted Mar 19, 2013 22:33 UTC (Tue) by wahern (subscriber, #37304)
For example, the immutable files protection can be bypassed by mounting over the directory. It doesn't allow you to change the original file, but allows you to fool other applications at runtime and is thus of little use for, e.g., preventing root kit installation once you've already attained root.
AFAIK nobody has bothered to fix it on systems where it was an issue (NetBSD was immune to this particular attack). The fundamental issue is that even this coarse-grained capabilities system gives a false sense of security. Invariably someone will forget about some corner case, or some new feature is added which allows circumvention of the whole pile of policies.
Fine-grained capabilities systems (both system-level and process-level) are just too brittle, including the policies, the mechanisms, and the actual implementations.
Unix systems have only just recently reached a decent level of correctness and reliability with basic file permissions. Anybody who relies on more sophisticated schemes (or allows them in their kernel) is just begging to be rooted.
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 12:35 UTC (Thu) by paulj (subscriber, #341)
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 12:55 UTC (Thu) by paulj (subscriber, #341)
I.e. the capability calls need a version flag, perhaps?
The trouble with CAP_SYS_RAWIO
Posted Mar 14, 2013 15:30 UTC (Thu) by paulj (subscriber, #341)
The trouble with CAP_SYS_RAWIO
Posted Mar 22, 2013 4:36 UTC (Fri) by kevinm (guest, #69913)
It still sounds to me like the simple solution is "remove CAP_SYS_RAWIO from the initial capability set on secure-booted kernels". So you'll lose the ability to perform some iffy SCSI commands - well, you signed up for some bondage and discipline when you asked for a locked-down, secure-booted kernel, didn't you?
The trouble with CAP_SYS_RAWIO
Posted Mar 22, 2013 7:12 UTC (Fri) by dlang (guest, #313)
Or in enterprise settings, commands to be able to manipulate tape changers
The trouble with CAP_SYS_RAWIO
Posted Mar 31, 2013 7:12 UTC (Sun) by Duncan (guest, #6647)
Actually, that's rather less of a problem now, and trending less so, than it was a few years ago when /the/ major form of removable media was optical, CD/DVD. Nowadays, the sub-GB size of a CD looks positively diminutive, and even the near-5-GB size of a standard DVD looks small, compared to the ubiquitous USB thumbdrive of say 8+GB. A Bluray's 25 gigs is a bit better priced media-only (US$6 individual, just over $1/ea in 25-packs, pricewatch.com), but while the stick's a bit more expensive (USB flash: 32G=US$15, 16G=$10) as it ships with its own housing and read/write hardware, it's also CONSIDERABLY less fragile and generally more easily handled, AND direct-block-device read-writable (well, as seen by the OS...).
Additionally, with current inet and smartphone penetrations, people that a few years ago might have used dedicated removable media (either USB sticks or CD/DVD) these days more often either use the inet directly (streaming what might have been on CD a few years ago, or pastebinning it to a friend if not attaching it to an email), or if they do play local media, say in the car, it's from a jacked-in phone more often than a CD.
Unfortunately when I upgraded machines last year I didn't think of that, and bought a blu-ray burner for it. I really haven't used it... Fortunately, it's a USB-based one and wasn't /that/ expensive, so it's usable on my netbook as well should I decide to and not taking any power when it's unplugged (see kernel 3.9's new ZPODD, only I've had that in the form of an unplugged USB-based bluray for a few months now), and being USB, as long as I don't mistreat it it should stay usable for years, so I suppose I'll get some use out of it, over time. But I'd have been better off simply not buying it at all.
So it's considerably easier to do without optical burning than it was even just a few years ago, to the point where many people would miss it about as much as they do their 1.44 MB floppy...
The trouble with CAP_SYS_RAWIO
Posted Mar 22, 2013 23:32 UTC (Fri) by clemenstimpler (guest, #71914)
The trouble with CAP_SYS_RAWIO
Posted Mar 25, 2013 13:16 UTC (Mon) by mkerrisk (subscriber, #1978)
Reviewing the article, it's a smart editorial suggestion and I agree with you; and it would have made for a punchier start. Next time...
