Memory protection keys

By Jonathan Corbet
May 13, 2015

The memory-management units built into most contemporary processors are able to control access to memory on a per-page basis. Operating systems like Linux make that control available to applications in user space; the protection bits supplied to system calls like mmap() and mprotect() allow a process to say whether any given page should be readable, writable, or executable. This level of protection has served for a long time, so one might be tempted to conclude that it provides everything that applications need. But a new hardware feature under development at Intel suggests otherwise; the first round of patches supporting it explores how programs might gain access to the feature.

This feature is called "memory protection keys" (MPK); it will only be available in future 64-bit Intel processors. When this feature is enabled, four (previously unused) bits in each page-table entry can be used to assign one of sixteen "key" values to any given page. There is also a new 32-bit processor register with two bits for each key value. Setting the "write disable" bit for a given key will block all attempts to write a page with that key value; setting the "access disable" bit will block reads as well. The MPK feature thus allows a process to partition its memory into a maximum of sixteen regions and to selectively disable or enable access to any of those regions. The control register is local to each thread, so different threads can enable or disable different regions independently.
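As a rough illustration of the register described above, the following C sketch models a 32-bit value holding two bits per key. The bit positions (access-disable in the low bit of each pair, write-disable in the high bit) and all of the names are assumptions made for this example; the article does not spell out the layout, and real code would operate on the hardware register rather than on a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_KEYS            16u  /* four page-table bits give sixteen keys */
#define KEY_ACCESS_DISABLE 1u   /* assumed: low bit of each two-bit pair */
#define KEY_WRITE_DISABLE  2u   /* assumed: high bit of each two-bit pair */

/* Return a copy of reg with the two bits for `key` replaced by `bits`. */
uint32_t key_reg_set(uint32_t reg, unsigned key, uint32_t bits)
{
    assert(key < NR_KEYS && bits <= 3u);
    unsigned shift = 2u * key;
    reg &= ~(UINT32_C(3) << shift);
    return reg | (bits << shift);
}

/* Would a read of a page tagged with `key` be allowed? */
bool key_reg_may_read(uint32_t reg, unsigned key)
{
    assert(key < NR_KEYS);
    return ((reg >> (2u * key)) & KEY_ACCESS_DISABLE) == 0;
}

/* Would a write be allowed? Either disable bit blocks writes. */
bool key_reg_may_write(uint32_t reg, unsigned key)
{
    assert(key < NR_KEYS);
    uint32_t blocking = KEY_ACCESS_DISABLE | KEY_WRITE_DISABLE;
    return ((reg >> (2u * key)) & blocking) == 0;
}
```

Under these assumptions, key_reg_set(0, 5, KEY_WRITE_DISABLE) yields a value for which pages tagged with key 5 can still be read but not written, while pages with any other key are unaffected.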

A patch set enabling the MPK feature has been posted by Dave Hansen for review even though, as he noted, nobody outside of Intel will be able to actually run that code at this time. Dave is hoping to get comments on the (minimal) user-space API changes needed to support MPK once the hardware is available.

In the proposed design, applications can set the page keys using any of the system calls that set the other page protections — mprotect(), for example. There are four new flags defined (PROT_PKEY0 through PROT_PKEY3) to represent the key bits. Within the kernel, these bits are stored in the virtual memory area (VMA), and pushed into the relevant location in the hardware page tables. If a process attempts to access a page in a way that is not allowed by the protection keys, it will get the usual SIGSEGV signal. Should it catch that signal, it can look for the new SEGV_PKUERR code (in the si_code field of the siginfo_t structure passed to the handler) to detect a fault caused by a protection key. There is not currently a way to determine which key caused the fault, but adding that is on the list of things to do in the future.
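To make the proposed interface a little more concrete, here is a hedged C sketch of how a program might turn a key number into the four PROT_PKEY flags, and how a SIGSEGV handler might recognize a protection-key fault. The numeric flag values are placeholders invented for this example, the mapping of one flag per key bit is an assumption, and the helper names prot_with_key() and fault_is_pkey() are hypothetical; only the flag names and SEGV_PKUERR come from the patch set as described above.

```c
#define _POSIX_C_SOURCE 200809L   /* expose siginfo_t under strict -std modes */
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Placeholder values invented for this sketch; the real definitions would
 * come from the headers shipped with the patch set. */
#ifndef PROT_PKEY0
#define PROT_PKEY0 0x10
#define PROT_PKEY1 0x20
#define PROT_PKEY2 0x40
#define PROT_PKEY3 0x80
#endif

/* Fallback for older C libraries; 4 is the value in current Linux headers. */
#ifndef SEGV_PKUERR
#define SEGV_PKUERR 4
#endif

/* Spread a key number (0..15) over the four flags, one flag per key bit
 * (assumed ordering), and merge it into an existing set of protection bits. */
int prot_with_key(int prot, unsigned key)
{
    assert(key < 16);
    if (key & 1u) prot |= PROT_PKEY0;
    if (key & 2u) prot |= PROT_PKEY1;
    if (key & 4u) prot |= PROT_PKEY2;
    if (key & 8u) prot |= PROT_PKEY3;
    return prot;
}

/* True if the siginfo_t passed to a SIGSEGV handler reports a fault
 * caused by a protection key rather than by ordinary page permissions. */
bool fault_is_pkey(const siginfo_t *info)
{
    return info->si_signo == SIGSEGV && info->si_code == SEGV_PKUERR;
}
```

In the proposed design, the result of something like prot_with_key(PROT_READ | PROT_WRITE, 5) would be passed as the protection argument to mprotect(), and a handler installed with SA_SIGINFO would call fault_is_pkey() on the siginfo_t it receives.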

One might well wonder why this feature is needed when everything it does can be achieved with the memory-protection bits that already exist. The problem with the current bits is that they can be expensive to manipulate. A change requires invalidating translation lookaside buffer (TLB) entries across the entire system, which is bad enough, but changing the protections on a region of memory can also require individually changing the page-table entries for thousands of pages (or more). With protection keys, once the keys are set, a region of memory can instead be enabled or disabled with a single register write. For any application that frequently changes the protections on regions of its address space, the performance improvement will be large.
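The scale of the page-table work is easy to estimate. The small C helper below (a hypothetical name, written for this example) counts the page-table entries covering a region; it assumes the region starts on a page boundary and that every page in it has the same size.

```c
#include <assert.h>
#include <stdint.h>

/* Number of page-table entries covering `len` bytes, rounding up to
 * whole pages. Assumes a page-aligned start and a uniform page size. */
uint64_t ptes_for_region(uint64_t len, uint64_t page_size)
{
    assert(page_size != 0);
    return len / page_size + (len % page_size != 0);
}
```

With the 4KB pages normally used on x86-64, ptes_for_region(1 << 30, 4096) gives 262,144 entries to rewrite for a 1GB region; with protection keys, the same change in access is one register write, whatever the size of the region.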

There is still the question (as asked by Ingo Molnar) of just why a process would want to make this kind of frequent memory-protection change. There would appear to be a few use cases driving this development. One is the handling of sensitive cryptographic data. A network-facing daemon could use a cryptographic key to encrypt data to be sent over the wire, then disable access to the memory holding the key (and the plain-text data) before writing the data out. At that point, there is no way that the daemon can leak the key or the plain text over the wire; protecting sensitive data in this way might also make applications a bit more resistant to attack.

Another commonly mentioned use case is to protect regions of data from being corrupted by "stray" write operations. An in-memory database could prevent writes to the actual data most of the time, enabling them only briefly when an actual change needs to be made. In this way, database corruption due to bugs could be fended off, at least some of the time. Ingo was unconvinced by this use case; he suggested that a 64-bit address space should be big enough to hide data in and protect it from corruption. He also suggested that a version of mprotect() that optionally skipped TLB invalidation could address many of the performance issues, especially if huge pages were used. Alan Cox responded, though, that there is real-world demand for the ability to change protection on gigabytes of memory at a time, and that mprotect() is simply too slow.
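The "brief write window" pattern for the in-memory database case can be sketched as a user-space simulation. Everything here is a stand-in invented for illustration: key_reg plays the role of the per-thread register, db_value plays the role of data on pages tagged with the hypothetical key DB_KEY, the write-disable bit position is assumed, and db_store() returns false where real hardware would deliver SIGSEGV.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DB_KEY 1u   /* hypothetical key assigned to the database pages */
/* Assumed position of the write-disable bit for key k. */
#define WRITE_DISABLE_BIT(k) (UINT32_C(2) << (2u * (k)))

static uint32_t key_reg;   /* stand-in for the per-thread register */
static int db_value;       /* stand-in for data on pages tagged DB_KEY */

/* Leave the database write-protected; the normal state. */
void db_lock(void)
{
    key_reg |= WRITE_DISABLE_BIT(DB_KEY);
}

/* Simulated store: refused where the hardware would raise SIGSEGV. */
bool db_store(int v)
{
    if (key_reg & WRITE_DISABLE_BIT(DB_KEY))
        return false;
    db_value = v;
    return true;
}

/* Open the write window, apply the update, close the window again. */
bool db_update(int v)
{
    key_reg &= ~WRITE_DISABLE_BIT(DB_KEY);
    bool ok = db_store(v);
    db_lock();
    return ok;
}

int db_read(void)
{
    return db_value;
}
```

After db_lock(), a stray db_store() is refused and the data is unchanged, while db_update() succeeds and leaves the data protected again on return. Opening and closing the window costs two register updates, with no page-table changes.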

Being able to turn off unexpected writes could be especially useful when the underlying memory is a persistent memory device; any erroneous write there will go immediately to permanent storage. There have also been suggestions that tools like Valgrind could make good use of MPK.

Ingo's concerns notwithstanding, the MPK hardware feature is being added in response to customer interest; it would be surprising if the kernel did not end up supporting it, especially given that the required changes are not hugely invasive. So the real question is whether the proposed user-space API is correct and supportable in the long run. Hopefully, developers who think they might make use of this feature will take a look at the patches and make themselves heard if they find something they don't like.

Memory protection keys

Posted May 14, 2015 9:13 UTC (Thu) by cotte (subscriber, #7812) [Link] (7 responses)

This is hardly new technology, as key protection is a feature of the mainframe architecture from S/360 in 1964: http://en.wikipedia.org/wiki/IBM_System/360#Architectural...

Memory protection keys

Posted May 14, 2015 10:14 UTC (Thu) by meyert (subscriber, #32097) [Link] (1 responses)

Yes, this was also my first thought! This sounds very similar to s390 storage key protection :-)

Memory protection keys

Posted May 14, 2015 10:23 UTC (Thu) by meyert (subscriber, #32097) [Link]

Of course, somebody from IBM also pointed this out in the corresponding thread: https://lkml.org/lkml/2015/5/7/849

Memory protection keys

Posted May 14, 2015 18:14 UTC (Thu) by hansendc (subscriber, #7363) [Link] (2 responses)

Yes, the concept is not a new one in hardware. At least x86, s390, powerpc and ia64 have some form of protection keys. x86 was the outlier for *not* having it.

However, there is currently no general support for these features on any of these architectures in Linux. These patches are the first proposal I know of to use this hardware in Linux in any substantive way.

Memory protection keys

Posted May 19, 2015 17:32 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (1 responses)

> At least x86, s390, powerpc and ia64 have some form of protection keys. x86 was the outlier for *not* having it.

Typo? Which side is x86 supposed to be on and what did you intend?

Memory protection keys

Posted Nov 12, 2016 17:13 UTC (Sat) by eSyr (guest, #112051) [Link]

arm, maybe?

Memory protection keys

Posted Jun 8, 2015 9:00 UTC (Mon) by marcan (guest, #103032) [Link] (1 responses)

ARM has had this in their MMU for ages, certainly at least since ARM9/ARMv5 (it's called Domain Access Control). The implementation is almost identical: 4 bits in the page table entry select a domain, and a 32-bit Domain Access Control Register has two bits for each of 16 domains to control access.

Except instead of "write disable" and "read disable" bits, there is an extra level of indirection, where the bits choose "no access", "client access", or "manager access". "manager" is R/W, and "client access" can be configured per memory section (1MB virtual address space block) as various combinations of no access, read-only, and read-write for user and supervisor access levels.

Memory protection keys

Posted Jun 8, 2015 11:34 UTC (Mon) by spender (guest, #23067) [Link]

Some important differences though: DACR cannot be modified by userland without entering the kernel, while protection keys can. AFAIK it's also not possible to implement execute-only pages using domains as no access means no access, whereas protection keys apply to data access only, not instruction fetches. Domains affect the kernel as well while this only affects userland (currently). Domains permit granting permissions greater than that specified by the page tables, while protection keys can only give out a subset of existing permissions (due to the userland-only design).

-Brad

Memory protection keys

Posted May 21, 2015 17:19 UTC (Thu) by anton (subscriber, #25547) [Link]

One application use that may benefit from cheap changing of protections is garbage collection. There are schemes where you want to catch it if additional pointers from old to new memory are written. If such writes into old memory are rare, catching that with memory protection is more efficient than software checking of writes. I am not sure whether the keys described in the article are really useful for that; it has been some time since I read garbage collection papers.

Another application use may be hardening JIT compiler-generated code against vulnerabilities. Last year I heard a presentation where the JIT compiler was put in a separate process to get different protections for the JIT compiler than for the execution. The protection keys may be an easier way to get the same protection.

Memory protection keys

Posted Jun 15, 2015 8:55 UTC (Mon) by bgoglin (subscriber, #7800) [Link]

What prevents malicious code from enabling access to all 16 regions in its own thread's register before trying to access critical data?

Memory protection keys

Posted Apr 16, 2018 21:06 UTC (Mon) by sfink (guest, #6405) [Link]

Ooh, so in 2015 there was some serious work on something that would be extremely valuable for Spectre mitigation!

This seems extremely useful for partitioning untrusted code in a shared process, e.g., a web browser that doesn't want to take the process-per-domain hit, and for isolating components like the GC and JIT. It's not complete protection against a malicious attack, of course, since somewhat-controlled execution of code within one partition may be enough to change the key register, but it seems like a pretty good additional barrier against that access in the first place.

I would love to be able to efficiently ensure that no stray writes corrupt GC bookkeeping data. Any memory corruption anywhere tends to crash in the GC because it scans through and chases pointers with lots and lots of memory. This would make it so the mutator would crash immediately at the right place, making errors far far more obvious and likely to be fixed at the root.