LSS: The kernel hardening roundtable

By Jake Edge
September 15, 2011

Hardening the kernel to make attackers' jobs harder was the topic of a wide-ranging discussion at the Linux Security Summit (LSS) held on September 8, 2011. Reducing the attack surface of the kernel, protecting it from user-space attacks, and finding ways to mitigate entire classes of exploitable bugs were all on the table. As might be expected, the biggest barrier to getting these hardening patches accepted into the mainline is often performance concerns. While no firm conclusions were drawn, many ideas were discussed, some of which may eventually find their way into the mainline.

Attack surface

The discussion began with an effort to quantify the "exposed surface" of the kernel, as roundtable leader Will Drewry of Google's Chrome OS team put it. He and the other roundtable leader, Kees Cook of the Ubuntu security team, put together their own list, but also asked those present to add to it. Obvious attack surfaces like the system call interface, /proc and sysfs files, the networking stack, and device drivers were mentioned, but so were less obvious things like filesystem parsing, auto-loaded kernel modules, device scanning, CPU or other hardware bugs, side-channel timing attacks, and so on.

Enumerating the attack surface "helps define what to pay attention to", Cook said. The intent of many of the kernel hardening patches is to try "to kill off a whole class of problems, rather than shooting individual bugs", he said. The latter is where most of the current kernel security effort goes, he said. Drewry added that the intent is to figure out what can be done now to reduce those attack surfaces. Many of the attack surfaces are still present even in a system that runs a mandatory access control (MAC) system like SELinux, Cook said, because the system call interface is still available to be used (and abused). That is one of the problems with looking to the LSM interface to provide confinement, he added.

Casey Schaufler also pointed out that there is often special-purpose hardware in Linux systems that is allowed to be directly accessed from user space; in years past it was graphics hardware, but these days it tends to be video hardware. That opens up a number of potential security problems, he said, but that won't stop it from happening. The capabilities provided by allowing direct access to these devices are "so compelling that security concerns are secondary".

But there are kernel installations that are more security-sensitive, Cook said, that could benefit from restricting some features even at the cost of performance. If a particular hardening feature has no real cost, it could be put into the kernel without providing a configuration option to disable it. Others that do have a cost could be optional, and distributions or users could enable them based on their needs.

API/ABI restrictions

The "biggest single exposure" in Linux systems is applications that run as root, Schaufler said, like the X server. Because the kernel is one "gigantic privileged application" it can't be protected against other privileged applications like X, Cook said. But, applications could have the ABI available to them reduced, Drewry said, which would reduce the damage they could do if they are compromised.

The only existing "API management" tool in the kernel (besides the LSM interface) is seccomp, Drewry said, but it allows only four system calls (read(), write(), exit(), and sigreturn()), which makes it too restrictive for many possible reduced-ABI applications. The Chrome/Chromium browser team would like to be able to reduce the system calls that its rendering processes can make, but seccomp is too limited for that, so the team has implemented a more complicated solution, with a "trusted" assembly language thread that mediates system calls. System call restrictions could also be enforced using ptrace(), Drewry said, but that comes with an "intense amount of overhead".
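To illustrate why four system calls are not enough for most programs, here is a minimal Python sketch that models the seccomp allowlist described above. It is only a model: it installs no real filter, and the renderer_needs list is a hypothetical set of calls chosen for illustration, not Chromium's actual requirements.

```python
# Model only: this does not install a real seccomp filter.
# The four system calls that seccomp permits, as listed in the article.
STRICT_MODE_SYSCALLS = frozenset({"read", "write", "exit", "sigreturn"})


def blocked_syscalls(needed):
    """Return, sorted, the system calls in `needed` that seccomp would refuse."""
    return sorted(set(needed) - STRICT_MODE_SYSCALLS)


# Hypothetical needs of a sandboxed process (an assumption for illustration).
renderer_needs = ["read", "write", "mmap", "futex", "recvmsg", "exit"]

if __name__ == "__main__":
    print(blocked_syscalls(renderer_needs))
```

Anything the function returns is a call the program would have to give up, or route through a helper such as the "trusted" thread, in order to run under seccomp.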

What Drewry is looking for is some kind of expanded seccomp where a subset of system calls would be allowed. So far, his patches to implement that have been shot down from various directions, but there is hope that there may be some kind of resolution at the upcoming Kernel Summit.

Some of the attendees were skeptical of an expanded seccomp approach. Schaufler pointed out that there is already a mechanism in the kernel (capabilities) for reducing the impact of vulnerabilities, but "no one uses it". Cook was not convinced that the granularity of capabilities was really all that useful because the number of capability bits that are equivalent to root is so large.
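The granularity concern can be made concrete with a small Python sketch that decodes a capability bitmask (such as the CapEff value in /proc/self/status) into names. The bit numbers follow the kernel's linux/capability.h for the first 32 capabilities only, which is a simplification. The ROOT_EQUIVALENT_ASSUMED set is purely an illustrative assumption about which bits might be convertible into full root; it is not a list taken from the roundtable.

```python
# Capability names indexed by bit number (bits 0-31 of linux/capability.h).
CAP_NAMES = [
    "CAP_CHOWN", "CAP_DAC_OVERRIDE", "CAP_DAC_READ_SEARCH", "CAP_FOWNER",
    "CAP_FSETID", "CAP_KILL", "CAP_SETGID", "CAP_SETUID",
    "CAP_SETPCAP", "CAP_LINUX_IMMUTABLE", "CAP_NET_BIND_SERVICE",
    "CAP_NET_BROADCAST", "CAP_NET_ADMIN", "CAP_NET_RAW", "CAP_IPC_LOCK",
    "CAP_IPC_OWNER", "CAP_SYS_MODULE", "CAP_SYS_RAWIO", "CAP_SYS_CHROOT",
    "CAP_SYS_PTRACE", "CAP_SYS_PACCT", "CAP_SYS_ADMIN", "CAP_SYS_BOOT",
    "CAP_SYS_NICE", "CAP_SYS_RESOURCE", "CAP_SYS_TIME",
    "CAP_SYS_TTY_CONFIG", "CAP_MKNOD", "CAP_LEASE", "CAP_AUDIT_WRITE",
    "CAP_AUDIT_CONTROL", "CAP_SETFCAP",
]

# Assumption for illustration only: capabilities often argued to be
# usable to regain full root privileges.
ROOT_EQUIVALENT_ASSUMED = frozenset({
    "CAP_DAC_OVERRIDE", "CAP_SETUID", "CAP_SYS_MODULE",
    "CAP_SYS_RAWIO", "CAP_SYS_PTRACE", "CAP_SYS_ADMIN",
})


def decode_caps(mask):
    """Return the names of the capability bits set in `mask`, lowest bit first."""
    return [name for bit, name in enumerate(CAP_NAMES) if mask & (1 << bit)]


def root_equivalent(mask):
    """Return the set bits that fall in the assumed root-equivalent list."""
    return [name for name in decode_caps(mask) if name in ROOT_EQUIVALENT_ASSUMED]


if __name__ == "__main__":
    mask = (1 << 10) | (1 << 19)  # CAP_NET_BIND_SERVICE and CAP_SYS_PTRACE
    print(decode_caps(mask), root_equivalent(mask))
```

Under that assumed list, a process that keeps even one flagged bit has given up little by dropping the rest, which is the shape of Cook's objection.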

As Drewry cast about for a way to limit system calls, there was discussion of possibly augmenting the LSM interface. As Cook pointed out, the current interface does not mediate all system calls, so it can't be used for Drewry's use case as it stands. James Morris noted that LSM is intended to be an access control framework and not anything more than that. In the end, Drewry doesn't particularly care how to get there; he is just looking for a way of "reducing what I expose to untrusted applications", he said.

Schaufler also pointed out that reducing the ABI available to an application doesn't help "if the ABI is completely well-defined and if it is consistent with the security policy" of the system. "That's a lot of 'if's", Drewry responded, and there was general agreement that neither of the two conditions is met on Linux systems. Because the system call interface is not well-defined, nor necessarily consistent with the system security policy, reducing the exposure of parts of that interface can help. Schaufler cautioned that the ad hoc documentation makes it hard to decide where the bugs actually are: "If the code is the documentation, it is impossible to have a bug".

There were questions about whether seccomp filtering (in whatever form) would actually be used by applications. Cook noted that, in addition to Chromium, several other projects popped up on linux-kernel to express interest in the feature, including QEMU, vsftpd, and others. One attendee also hypothesized a DNS server that was limited to recvmsg(), sendmsg(), and write() (to a log file) as another possible use-case.
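The hypothetical DNS server can be sketched as a per-application policy with a default-deny decision. The Python below is a model only, with no kernel interaction; SyscallPolicy, audit(), and the sample trace are names and data invented for illustration, and a real server would probably need more calls than the three mentioned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyscallPolicy:
    """Hypothetical per-application allowlist; anything not listed is denied."""
    name: str
    allowed: frozenset

    def decide(self, syscall):
        return "allow" if syscall in self.allowed else "deny"


# The three calls named for the hypothetical DNS server.
dns_policy = SyscallPolicy("dns-server", frozenset({"recvmsg", "sendmsg", "write"}))


def audit(policy, trace):
    """Pair each system call in `trace` with the policy's decision for it."""
    return [(syscall, policy.decide(syscall)) for syscall in trace]


if __name__ == "__main__":
    # Invented trace: a query is served and logged, then an exploit tries execve().
    for syscall, verdict in audit(dns_policy, ["recvmsg", "sendmsg", "write", "execve"]):
        print(syscall, verdict)
```

The point of the model is that the policy lives with the program: the author states up front what the server should ever need, and anything else is refused at the system call boundary.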

There were also concerns that seccomp filters would spread security policy throughout the system, but others saw that as a feature. Unlike MAC policy, which tends to be imposed from the outside, seccomp filter policy would embody "the programmer's idea of what it should be able to do", as Cook put it. While the system call granularity may not be exactly right, it is the place where user space enters the kernel, so mediating at that point makes some sense.

Attendees theorized that if a flexible seccomp filter facility were available, multiple applications would take advantage of it. Stephen Smalley was a bit skeptical that it would be straightforward for most applications to use the facility because it might require a major rework of the program. He pointed to the privilege separation efforts that went on in OpenSSH as an example; that work required "significant refactoring", he said.

Drewry said that the Chromium team's plan is to move the browser to whatever solution becomes available to better contain the renderers. Right now, that is the "trusted thread" sandbox, but if there are other facilities available, Chromium will use them. That could be some kind of SELinux containment, seccomp filtering, or something else entirely. In the future, the team would also like to confine renderers based on where the data comes from, he said, so that all renderers running for a given site were protected from each other as well.

PaX and grsecurity

The roundtable wrapped up with some discussion of bringing more of the grsecurity and PaX hardening patches into the mainline. Those patches tend to be fairly intrusive and have performance implications that make them undesirable to many kernel hackers, but they do provide protections that some would find valuable. According to Cook, there are many pieces of grsecurity and PaX that could make their way into the mainline.

Simple things, like constifying function pointers, are essentially free and should be mainlined immediately: "It's a shame that hasn't been done long ago", one attendee said. Others that have more impact are trickier. Making them optional is one possibility, but even that has a cost that maintainers are likely to push back against. Adding another path through core kernel code can be a maintenance headache, so even optional features may be difficult to get into the mainline.

Andre Hedrick mentioned that he has been pulling apart the grsecurity/PaX patches to try to make them more palatable. For one thing, grsecurity depends on a role-based access control (RBAC) mechanism that isn't present in the mainline (and isn't implemented as an LSM, so it isn't likely to ever be, at least in that form). Hedrick is trying to remove that dependency from the grsecurity features of interest, like better address-space layout randomization (ASLR) and a fully relocatable kernel, both of which can thwart various kinds of attacks.

One goal would be to find the grsecurity/PaX changes that have minimal impact and to get those into the mainline as non-optional protections. Turning RBAC into an LSM might be another useful exercise. At last year's LSS, grsecurity developer Brad Spengler provided a "long list" of features that could make their way into the kernel, Cook said. That list would make a good starting point.

Cook also noted several other efforts aimed at hardening the kernel. Those include the work that Openwall hacker Vasiliy Kulikov has been doing, much of which is being discussed on the kernel-hardening mailing list. Also, the Ubuntu security team has been working on a kernel hardening project of its own. There is no lack of ideas out there, and a clear need to make the kernel more resistant to attacks. Based on the discussion, and the various ongoing efforts, we are likely to see more and more hardening patches aimed at the mainline over the next few years.

[ I'd like to thank LWN subscribers for supporting my travel to LSS. ]


LSS: The kernel hardening roundtable

Posted Sep 17, 2011 13:33 UTC (Sat) by solardiz (guest, #35993)

Jake, thank you for this informative and well-written article!

LSS: The kernel hardening roundtable

Posted Sep 18, 2011 17:59 UTC (Sun) by Julie (guest, #66693)

This is a great roundup. Thanks, Jake!

LSS: The kernel hardening roundtable

Posted Sep 20, 2011 8:41 UTC (Tue) by kragilkragil2 (guest, #76172)

Great article.
I was wondering: the grsecurity/PaX stuff has been around for ages. What are the reasons some of the good parts didn't end up in the kernel? Are the devs (on both sides) hard to work with? Did it take kernel.org and linux.com going down to open maintainers' eyes to security? Did everybody think complex stuff like SELinux would be sufficient?

LSS: The kernel hardening roundtable

Posted Sep 21, 2011 11:35 UTC (Wed) by nix (subscriber, #2304)

> Are the devs (on both sides) hard to work with?

Is the sun hot?

The kernel developers do not get on very well with pseudonymous developers who believe they already know everything and whose response to any criticisms or suggestions at all is imputations of malice. Actually, the latter is sufficient: see the kernel list's stellar record of cooperation with Joerg Schilling. (Or, for that matter, anyone at all's record of cooperation with Joerg Schilling.)

LSS: The kernel hardening roundtable

Posted Sep 22, 2011 7:23 UTC (Thu) by trasz (guest, #45786)

Might be worth mentioning that FreeBSD already provides an "extended seccomp"; it's called Capsicum. In a talk (http://www.youtube.com/watch?v=raNx9L4VH2k) there is a nice table comparing the number of lines of code that it took to properly sandbox Chromium using different mechanisms: with Linux and seccomp, it was 11300 lines of code and it was still incomplete; with FreeBSD and Capsicum, it was 100 lines.

LSS: The kernel hardening roundtable

Posted Sep 22, 2011 19:52 UTC (Thu) by Yorick (guest, #19241)

A capability-based model like Capsicum's would indeed be very nice to have for Linux, for many reasons:

It would give a much more useful environment than a stark read()/write()/_exit() isolation cell
It is based on sound reasoning that is easy to understand (principle of least authority, zero ambient authority)
It would force a healthy review of all the different namespaces in Linux, making us ask ourselves "is this really needed?", and useful ways of converting them into honest file descriptors
Properly done, it would practically give process containers for free
The Capsicum project itself has demonstrated feasibility and we roughly know what to expect from their experience, both in terms of implementation and use

Last time I looked, Capsicum hadn't really addressed resource limitations; this might be necessary in the long run, but is probably not strictly necessary for a first useful attempt.

LSS: The kernel hardening roundtable

Posted Oct 11, 2011 11:58 UTC (Tue) by Pawlerson (guest, #74136)

This looks like nice propaganda, which is typical of BSD fanboys. I'd like to know how many lines of code FreeBSD needs to implement SELinux. The entire Linux kernel?

LSS: The kernel hardening roundtable

Posted Oct 11, 2011 12:19 UTC (Tue) by trasz (guest, #45786)

Not sure why anyone would want to reimplement those, but regarding SELinux: FreeBSD already implements several Mandatory Access Control policies. Unlike in Linux, they are stackable. This framework is also used by several commercial operating systems, including Mac OS X.

Points of confusion

Posted Sep 29, 2011 16:42 UTC (Thu) by Ross (guest, #4065)

Thanks for the very interesting writeup. However, I'm confused by the conversation at several points and I'm not sure if I'm being dense, something was lost in the writeup, or the participants were saying confusing things and sometimes talking past each other :)

1) "Because the kernel is one "gigantic privileged application" it can't be protected against other privileged applications like X, Cook said."

Umm... would this be any better if the kernel were not privileged? It seems like the problem is that X is a gigantic privileged application and/or that the kernel requires it to be privileged at all. Or is there a point I'm missing?

2) 'Some of the attendees were skeptical of an expanded seccomp approach. Schaufler pointed out that there is already a mechanism in the kernel (capabilities) for reducing the impact of vulnerabilities, but "no one uses it". Cook was not convinced that the granularity of capabilities was really all that useful because the number of capability bits that are equivalent to root is so large.'

Well yes, capabilities exist, but they don't really work. The reason Cook gave is true, but misses the much larger failure: they only remove capabilities that are normally exclusive to root. Hopefully people aren't making Chrome setuid root.

3) "If the code is the documentation, it is impossible to have a bug"

Documentation is good, and important if you want people to code to your intent and not the implementation. However the statement isn't completely fair. Things like stack-smashes, double-frees, and dereferencing of bad pointers would be widely recognized as bugs even for code that doesn't have documentation to say it doesn't crash your system :)

4) "In the future, the team would also like to confine renderers based on where the data comes from, he said, so that all renderers running for a given site were protected from each other as well."

Is there something preventing this now? There's something strange about the sentence: I would almost think that it should be s/a given site/different sites/, but I think that already happens. Wouldn't renderers for the same site seem to have data coming from the same place at least as much as renderers for different sites would?

Points of confusion

Posted Sep 30, 2011 22:44 UTC (Fri) by Jan_Zerebecki (guest, #70319)

> 4) "In the future, the team would also like to confine renderers based on where the data comes from, he said, so that all renderers running for a given site were protected from each other as well."

That sentence would make sense if one displayed site embedded something from a different security domain (e.g. example.com embeds content from google.com while you are authenticated by a cookie with google.com; a whole site by iframe or a picture by img src). The data comes from different domains and is displayed in one site, but the pieces still need to be protected from each other.

Points of confusion

Posted Oct 1, 2011 20:09 UTC (Sat) by oak (guest, #2786)

> Well yes, capabilities exist, but they don't really work. The reason Cook gave is true, but misses the much larger failure: they only remove capabilities that are normally exclusive to root.

And even for root operations they seem to have too little granularity.

The ptrace capability is a good (worst?) example of this. You need it to read things like the process maps and smaps files, which many (resource usage measurement) tools need, but that capability also allows attaching to, inspecting, and changing the internals of other users' processes, not just inspecting how many mappings they have and how much memory those mappings use. Also, instead of denying access to the maps and smaps /proc files, lacking the ptrace capability means that you get wrong (empty) content for those files...

Points of confusion

Posted Oct 11, 2011 12:22 UTC (Tue) by trasz (guest, #45786)

In other words, ptrace capability is an instant root.