Post-init read-only memory

By Jonathan Corbet
December 2, 2015

At the 2015 Kernel Summit, the assembled developers discussed the idea of incorporating more security-hardening patches into the kernel. As part of that effort, it was agreed that taking another look at the out-of-tree grsecurity patches made sense. The first fruit from this work would appear to be the post-init read-only memory patch set from Kees Cook. This work has been received well, but it also highlights some of the difficulties involved with hardening a general-purpose kernel.

The key to a successful exploit is often convincing the kernel to write to an unintended location. See, for example, this recent exploit, which uses a driver bug to overwrite a portion of the vDSO area; that, in turn, enables an attacker to run arbitrary code in kernel mode. One way to defend against such attacks is to minimize, to the greatest extent possible, the memory that the kernel is allowed to write to. A number of techniques, from simply marking data read-only to supervisor-mode access prevention, can be deployed toward that end. There is one class of data, identified by the grsecurity developers, that current techniques overlook, however.

When the kernel boots, it sets up a vast array of data structures describing the hardware it runs on and much more. In many cases, those data structures will never be changed again but, since they are resident in writable memory, they can still be changed by an errant write operation. The post-init read-only memory patch set, as posted by Kees, allows these data structures to be marked with a special __read_only annotation. That will cause them to be placed into a separate ELF section (".data..read_only"). Once the kernel has finished the initialization process, all data found in that section will be marked read-only, never to be changed again. At that point, exploits like the vDSO overwrite linked above will no longer work.
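The write-during-init, read-only-afterward life cycle can be approximated in user space. What follows is a minimal sketch under stated assumptions, not the patch set's implementation: a dedicated mmap() page stands in for the ".data..read_only" section, mprotect() stands in for the kernel's page-table update, and every name (struct boot_config, cfg_init(), cfg_seal(), try_set_cpu_count()) is hypothetical.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* MAP_ANONYMOUS and sigsetjmp() on glibc */
#endif
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical data that is filled in once at startup. */
struct boot_config {
    int cpu_count;
    int page_shift;
};

static struct boot_config *cfg; /* lives alone in its own page */
static size_t cfg_len;
static sigjmp_buf fault_jmp;

/* Turn a protection fault into a non-local return to the caller. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_jmp, 1);
}

/* "Init" phase: allocate a whole page, so that its protection can be
 * changed without affecting unrelated data, and fill in the structure. */
int cfg_init(void)
{
    struct sigaction sa;
    long page = sysconf(_SC_PAGESIZE);
    void *mem;

    if (page <= 0)
        return -1;
    cfg_len = (size_t)page;
    mem = mmap(NULL, cfg_len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    cfg = mem;
    cfg->cpu_count = 4;
    cfg->page_shift = 12;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* End of init: drop write permission for the rest of the run. */
int cfg_seal(void)
{
    return mprotect(cfg, cfg_len, PROT_READ);
}

/* Attempt a store; returns 0 if it landed, 1 if the hardware refused. */
int try_set_cpu_count(int value)
{
    if (sigsetjmp(fault_jmp, 1))
        return 1;
    *(volatile int *)&cfg->cpu_count = value;
    return 0;
}
```

Before cfg_seal() is called, try_set_cpu_count() stores normally; afterward the same store faults and the old value survives.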

This change seems like an obvious win: unchanging data is marked read-only, blocking known exploits and, perhaps, minimizing the impact of simple bugs as well. As an added bonus, read-only data will be kept together, leading to better cache behavior. It would appear to be a natural candidate for merging in the near future. That will probably come to pass, but, first, an important question has to be answered: what should happen when the hardware catches an attempt by the kernel to write, after initialization, to memory that had been marked __read_only?

When things go wrong

This question matters because there is a potential hazard whenever a data structure is marked __read_only: the developer involved may have overlooked the one case where, after a rare sequence of events on days with a waxing gibbous moon, that data structure must be changed. Or there may be a case where data structures are modified unnecessarily, perhaps storing data that is already there anyway. Such cases work in current kernels, but would break if the data being written were made read-only. Mathias Krause described one such experience, wherein the system would fail during the resume sequence. As he noted: "Debugging that kind of problem is sort of a PITA, you could imagine."

The ideal solution would be to have the compiler catch attempts to modify __read_only data outside of the initialization sequence, but that is not currently possible. Simply marking the relevant data structures const will not work; those data structures are written to during boot and, as PaX Team pointed out, making them const opens the door to all kinds of surprising, optimization-related behavior from the compiler. Where compilers are involved, surprising behavior is rarely a good thing. As an alternative, Mathias suggested the use of a special-purpose GCC module to detect inappropriate writes. There seems to be agreement that this is a good idea, but no such module exists and it will take time to create one. Holding this patch set until a checker module can be created seems undesirable.

But without such a checker, there will almost certainly be situations where the kernel tries to write to something marked __read_only, either because it was so marked in error or as the result of some other bug. There have been a number of ideas put forward on how such problems could be handled.

The most obvious thing to do is to simply oops the kernel, with the usual results for the process that was running and, perhaps, the machine as a whole. Andy Lutomirski supported this approach, saying: "We failed, we might be under attack, let's oops." The problem with this approach, of course, is that it takes the machine out of commission, possibly with an error that is less than fun to try to track down. Ingo Molnar also worried that the oops information would, in most desktop cases, never be seen by the user and, as a result, would never be reported to developers. That highlights an old problem with presenting such information on desktop systems, but that problem is unlikely to be fixed right now.

The alternative to oopsing the system would be to log the error and somehow try to continue. Ingo suggested simply skipping over the offending instruction and trying to continue, but that idea did not go far; as PaX Team pointed out, simply dropping an intended write operation could create no end of strange problems further down the line and may actually help exploit attempts. Linus suggested, instead, that the kernel could mark the relevant page writable and retry the instruction. That would, of course, remove the read-only protection from that page, but it would allow the system to continue to operate while generating diagnostic information for developers. One would probably not want things to work this way on a production system, but it could be an invaluable option for developers.
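The "mark the page writable and retry" idea has a well-known user-space analogue, sketched here under assumptions. A signal handler stands in for the kernel's page-fault path; all names (ro_setup(), ro_seal(), ro_store(), ro_violations) are hypothetical; and the sketch relies on mprotect() being usable from a signal handler, which holds on Linux in practice but is not promised by POSIX. A real version would also log the faulting address for developers; this one only counts violations.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1 /* MAP_ANONYMOUS on glibc */
#endif
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *ro_base;   /* start of the protected region */
static size_t ro_len;            /* its length in bytes */
static uintptr_t page_size;      /* cached so the handler need not ask */
static volatile sig_atomic_t ro_violations;

/* On a fault inside the protected region: count it, make the page
 * writable again, and return so that the faulting store is retried. */
static void ro_fault(int sig, siginfo_t *info, void *uctx)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)ro_base;

    (void)uctx;
    if (ro_base != NULL && addr >= base && addr < base + ro_len &&
        mprotect((void *)(addr & ~(page_size - 1)), page_size,
                 PROT_READ | PROT_WRITE) == 0) {
        ro_violations++;
        return;
    }
    /* Unrelated fault: restore the default action; the retried
     * instruction faults again and terminates the process. */
    signal(sig, SIG_DFL);
}

/* Allocate one page of "post-init read-only" data and hook faults. */
int ro_setup(void)
{
    struct sigaction sa;
    long page = sysconf(_SC_PAGESIZE);
    void *mem;

    if (page <= 0)
        return -1;
    page_size = (uintptr_t)page;
    ro_len = (size_t)page;
    mem = mmap(NULL, ro_len, PROT_READ | PROT_WRITE,
               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return -1;
    ro_base = mem;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = ro_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* End of init: write-protect the region. */
int ro_seal(void)
{
    return mprotect(ro_base, ro_len, PROT_READ);
}

/* A store that may hit protected memory. */
void ro_store(size_t off, unsigned char value)
{
    ((volatile unsigned char *)ro_base)[off] = value;
}
```

The first ro_store() after ro_seal() faults once, is recorded, and then completes; later stores to the same page no longer fault, which is exactly the loss of protection that makes this mode a debugging aid rather than a production setting.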

The final piece of the puzzle might be to have a kernel command-line option to disable the read-only marking entirely. That would provide an option to users who run into a bug and need to be able to get their work done until a proper fix is available.
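As an illustration only, such a switch could be read from a boot-style command line along the following lines. The parameter name "demo_rodata=" and the function ro_marking_enabled() are invented for this sketch; the article does not give the name used in the patch set, and the kernel has its own parameter-parsing machinery that this code does not use.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter name, chosen for this example only. */
#define RO_PARAM "demo_rodata="

/* Scan a whitespace-separated command line. Read-only marking is on by
 * default; "demo_rodata=off" disables it and "demo_rodata=on" enables
 * it again. The last recognized occurrence wins, and unrecognized
 * values are ignored. */
bool ro_marking_enabled(const char *cmdline)
{
    const size_t plen = strlen(RO_PARAM);
    const char *p = cmdline;
    bool enabled = true;

    while (*p != '\0') {
        size_t len;

        p += strspn(p, " \t\n");    /* skip whitespace before a word */
        len = strcspn(p, " \t\n");  /* length of the word itself */
        if (len > plen && strncmp(p, RO_PARAM, plen) == 0) {
            const char *val = p + plen;
            size_t vlen = len - plen;

            if (vlen == 3 && strncmp(val, "off", 3) == 0)
                enabled = false;
            else if (vlen == 2 && strncmp(val, "on", 2) == 0)
                enabled = true;
        }
        p += len;
    }
    return enabled;
}
```

Defaulting to "enabled" keeps the protection in place unless the user explicitly asks for it to be turned off.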

Kees has indicated that his current plan is to kill the machine by default. He has already implemented the command-line option, and said that Linus's "mark the page writable" suggestion would not be difficult to add. So the next version of the patch should address most of the concerns expressed so far. Getting it merged may prove to be the easy part, though; the task of identifying and marking truly read-only data could be a long and error-prone affair, even when starting with the work that the grsecurity developers have already done. The good news is that this work should make the kernel more secure, provide a (perhaps imperceptible) performance improvement, and turn up a few bugs along the way.

Post-init read-only memory

Posted Dec 3, 2015 2:46 UTC (Thu) by spender (guest, #23067) [Link] (1 responses)

A few notes:

The exploit linked to was just an example of one exploit for an educational kernel vulnerability created as part of a CTF. The linked blog links to another participant's exploit for the same vulnerability that would work regardless of the __read_only changes currently being discussed.

Of note however is that that exploit would be made more difficult (even in the absence of any other grsecurity/PaX features) by RANDSTRUCT. Both exploits also wouldn't work as-is solely due to USERCOPY (another grsecurity feature being discussed recently).

Finally, the initial source of the vulnerability, an overflow in a call to krealloc, is firmly in the class of vulnerabilities PaX's size_overflow GCC plugin was designed to prevent. So regardless of desired exploit method, catching the overflow and terminating the attacking process prevents the attacker from gaining the arbitrary read/write primitive via copy_*_user and thus prevents any exploitation of the vulnerability.

-Brad

Post-init read-only memory

Posted Dec 3, 2015 3:00 UTC (Thu) by spender (guest, #23067) [Link]

Two more comments actually that I forgot to mention:

The proposed patches currently don't handle the use of __read_only in modules; data marked that way in a module will simply remain writable.

Grsecurity makes use of __read_only in many places that won't be possible with the reduced infrastructure proposed upstream. Specifically, we are able to use __read_only on data that is writable infrequently even after init (for instance, to protect important sysctl values, or LSM's security_ops struct). It's able to accomplish this on ARM, x86, and x64 through a feature of our KERNEXEC architecture that temporarily allows write access to read-only data for the current CPU in a race-free manner.

-Brad

Post-init read-only memory

Posted Dec 3, 2015 5:49 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

PaX Team's reply has a cool article ID: https://lwn.net/Articles/666555/

I don't have anything more to say.

Post-init read-only memory

Posted Dec 3, 2015 8:48 UTC (Thu) by pabs (subscriber, #43278) [Link]

Are the Linux developers forgetting about kerneloops?

http://oops.kernel.org/

Post-init read-only memory

Posted Dec 3, 2015 9:10 UTC (Thu) by petur (guest, #73362) [Link] (1 responses)

I wonder if a transition period might help, where write attempts are logged/reported but still go through. It would help catch many unforeseen cases...

Post-init read-only memory

Posted Dec 4, 2015 9:53 UTC (Fri) by NAR (subscriber, #1313) [Link]

I guess there should be plenty of experience by now from users of the grsecurity kernel...

Post-init read-only memory

Posted Dec 3, 2015 16:03 UTC (Thu) by fandingo (guest, #67019) [Link]

To oops or not to oops is a policy question, right? Aren't LSMs the vector for making and enforcing security-related policy decisions? So trigger an LSM hook and let that policy make the proper decision for that organization/system/cluster/server/toaster. That allows the most flexibility, including the possibility of simultaneously using all 3 proposed actions (oops, relocate and write, and silently drop) customized for each module.

Post-init read-only memory

Posted Dec 4, 2015 5:39 UTC (Fri) by NCunningham (guest, #6457) [Link]

FWIW, I've been taking a similar approach while working on enhancing my hibernation patch to allow the creation of incremental images. Getting a COW mechanism working has been easy. The difficult part has been figuring out _what_ in kernelspace to make read only. Too much and you can't boot, too little and there's no point in doing it.

All of this is a long way of saying perhaps there's value in making something more generic that could be used for security and incremental hibernation images and whatever else might be able to use it in the future?

Post-init read-only memory

Posted Dec 11, 2015 21:13 UTC (Fri) by fratti (guest, #105722) [Link] (1 responses)

In the D programming language, there is a type qualifier "immutable" which, once the data has been initialised, cannot be changed. The compiler can then statically check that this is not violated. If I'm not misunderstanding this, this is essentially what __read_only is, minus the static checking part.

Such a qualifier might make for either a nice GCC compiler extension or an addition to the next C language specification revision, since (if I'm not mistaken) such a functionality would solve this particular case. The "initialise once, keep around read-only for a long time" paradigm is probably present in a lot of software, so while any language revisions or GCC extensions might be too far away for this Linux patch set, a lot of C code could probably benefit from it.

Post-init read-only memory

Posted Dec 11, 2015 23:40 UTC (Fri) by PaXTeam (guest, #24616) [Link]

i don't know D but if initialization is meant the C way then __read_only is not 'immutable' because a __read_only variable can be modified any number of times - provided it's all done during kernel init. also the __read_only attribute is primarily a hint to enforce a specific property at *runtime*, compile time checking is needed only to avoid false positives. as for its general usefulness, the kernel already has a notion of separating its init code from the rest, userland would need more extensive changes and also the infrastructure to enforce the runtime property (perhaps RELRO could be repurposed or extended for this, right now it's activated too early for being usable for __read_only).

Post-init read-only memory

Posted Dec 11, 2015 22:22 UTC (Fri) by ksandstr (guest, #60862) [Link] (2 responses)

There's another question concerning compiler behaviour. In C, data marked volatile must be read once per evaluation, and written at most once per modification and at no other times, and the order of access (per data object, i.e. irrespective of other data) must be maintained according to program source. This forbids the compiler from e.g. using an MMIO register to spill registers because it looks like memory that's either hot[0] right now, or could be warmed up for an eventual overwrite -- which a further slice of the stack might not be.

So the question is: what measures are there for __read_only sections that prevent the compiler from writing the memory willy-nilly? Presumably it's not marked volatile for its performance cost.

[0] wrt TLBs in particular

Post-init read-only memory

Posted Dec 11, 2015 23:05 UTC (Fri) by PaXTeam (guest, #24616) [Link] (1 responses)

does the standard allow accessing objects with static storage duration willy-nilly?

Post-init read-only memory

Posted Dec 12, 2015 1:04 UTC (Sat) by ksandstr (guest, #60862) [Link]

No idea, to be honest. I'm more concerned with whether the standard forbids such access (and I don't know that it does), given that this is the reading that compiler implementors are going to go by. There's a lot of stuff that's implicitly permitted to go on between points of external visibility.