Memory protection keys
This feature is called "memory protection keys" (MPK); it will only be available in future 64-bit Intel processors. When this feature is enabled, four (previously unused) bits in each page-table entry can be used to assign one of sixteen "key" values to any given page. There is also a new 32-bit processor register with two bits for each key value. Setting the "write disable" bit for a given key will block all attempts to write a page with that key value; setting the "access disable" bit will block reads as well. The MPK feature thus allows a process to partition its memory into a maximum of sixteen regions and to selectively disable or enable access to any of those regions. The control register is local to each thread, so different threads can enable or disable different regions independently.
A patch set enabling the MPK feature has been posted by Dave Hansen for review even though, as he noted, nobody outside of Intel will be able to actually run that code at this time. Dave is hoping to get comments on the (minimal) user-space API changes needed to support MPK once the hardware is available.
In the proposed design, applications can set the page keys using any of the system calls that set the other page protections — mprotect(), for example. There are four new flags defined (PROT_PKEY0 through PROT_PKEY3) to represent the key bits. Within the kernel, these bits are stored in the virtual memory area (VMA), and pushed into the relevant location in the hardware page tables. If a process attempts to access a page in a way that is not allowed by the protection keys, it will get the usual SIGSEGV signal. Should it catch that signal, it can look for the new SEGV_PKUERR code (in the si_code field of the siginfo_t structure passed to the handler) to detect a fault caused by a protection key. There is not currently a way to determine which key caused the fault, but adding that is on the list of things to do in the future.
One might well wonder why this feature is needed when everything it does can be achieved with the memory-protection bits that already exist. The problem with the current bits is that they can be expensive to manipulate. A change requires invalidating translation lookaside buffer (TLB) entries across the entire system, which is bad enough, but changing the protections on a region of memory can require individually changing the page-table entries for thousands (or more) pages. Instead, once the protection keys are set, a region of memory can be enabled or disabled with a single register write. For any application that frequently changes the protections on regions of its address space, the performance improvement will be large.
There is still the question (as asked by Ingo Molnar) of just why a process would want to make this kind of frequent memory-protection change. There would appear to be a few use cases driving this development. One is the handling of sensitive cryptographic data. A network-facing daemon could use a cryptographic key to encrypt data to be sent over the wire, then disable access to the memory holding the key (and the plain-text data) before writing the data out. At that point, there is no way that the daemon can leak the key or the plain text over the wire; protecting sensitive data in this way might also make applications a bit more resistant to attack.
Another commonly mentioned use case is to protect regions of data from being corrupted by "stray" write operations. An in-memory database could prevent writes to the actual data most of the time, enabling them only briefly when an actual change needs to be made. In this way, database corruption due to bugs could be fended off, at least some of the time. Ingo was unconvinced by this use case; he suggested that a 64-bit address space should be big enough to hide data in and protect it from corruption. He also suggested that a version of mprotect() that optionally skipped TLB invalidation could address many of the performance issues, especially if huge pages were used. Alan Cox responded, though, that there is real-world demand for the ability to change protection on gigabytes of memory at a time, and that mprotect() is simply too slow.
Being able to turn off unexpected writes could be especially useful when the underlying memory is a persistent memory device; any erroneous write there will go immediately to permanent storage. There have also been suggestions that tools like Valgrind could make good use of MPK.
Ingo's concerns notwithstanding, the MPK hardware feature is being added in
response to customer interest; it would be surprising if the kernel did not
end up supporting it, especially given that the required changes are not
hugely invasive. So the real question is whether the proposed user-space
API is correct and supportable in the long run. Hopefully, developers who
think they might make use of this feature will take a look at the patches
and make themselves heard if they find something they don't like.
| Index entries for this article | |
|---|---|
| Kernel | Memory protection keys |
| Kernel | Security/Security technologies |
Posted May 14, 2015 9:13 UTC (Thu)
by cotte (subscriber, #7812)
[Link] (7 responses)
Posted May 14, 2015 10:14 UTC (Thu)
by meyert (subscriber, #32097)
[Link] (1 responses)
Posted May 14, 2015 10:23 UTC (Thu)
by meyert (subscriber, #32097)
[Link]
Posted May 14, 2015 18:14 UTC (Thu)
by hansendc (subscriber, #7363)
[Link] (2 responses)
However, there is currently no general support for these features on any of these architectures in Linux. These patches are the first proposal I know of to use this hardware in Linux in any substantive way.
Posted May 19, 2015 17:32 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
Typo? Which side is x86 supposed to be on and what did you intend?
Posted Nov 12, 2016 17:13 UTC (Sat)
by eSyr (guest, #112051)
[Link]
Posted Jun 8, 2015 9:00 UTC (Mon)
by marcan (guest, #103032)
[Link] (1 responses)
Except instead of "write disable" and "read disable" bits, there is an extra level of indirection, where the bits choose "no access", "client access", or "manager access". "manager" is R/W, and "client access" can be configured per memory section (1MB virtual address space block) as various combinations of no access, read-only, and read-write for user and supervisor access levels.
Posted Jun 8, 2015 11:34 UTC (Mon)
by spender (guest, #23067)
[Link]
-Brad
Posted May 21, 2015 17:19 UTC (Thu)
by anton (subscriber, #25547)
[Link]
Another application use may be hardening JIT compiler-generated code against vulnerabilities. Last year I heard a presentation where the JIT compiler was put in a separate process to get different protections for the JIT compiler than for the execution. The protection keys may be an easier way to get the same protection.
Posted Jun 15, 2015 8:55 UTC (Mon)
by bgoglin (subscriber, #7800)
[Link]
Posted Apr 16, 2018 21:06 UTC (Mon)
by sfink (guest, #6405)
[Link]
This seems extremely useful for partitioning untrusted code in a shared process, eg a web browser that doesn't want to take the process-per-domain hit. And for isolating components like the GC and JIT. It's not a complete protection from malicious attack, of course, since if you have somewhat-controlled execution of code within one partition, you may be able to change the key register, but it seems like a pretty good additional barrier to preventing that access in the first place.
I would love to be able to efficiently ensure that no stray writes corrupt GC bookkeeping data. Any memory corruption anywhere tends to crash in the GC because it scans through and chases pointers with lots and lots of memory. This would make it so the mutator would crash immediately at the right place, making errors far far more obvious and likely to be fixed at the root.
Memory protection keys
Memory protection keys
Memory protection keys
Memory protection keys
Memory protection keys
Memory protection keys
Memory protection keys
Memory protection keys
Memory protection keys
Memory protection keys
Memory protection keys
