|
|
Subscribe / Log in / New account

Page-table hardening with memory protection keys

By Jonathan Corbet
January 9, 2025
Attacks on the kernel can take many forms; one popular exploitation path is to find a way to overwrite some memory with attacker-supplied data. If the right memory can be targeted, one well-targeted stray write is all that is needed to take control of the system. Since the system's page tables regulate access to memory, they are an attractive target for this type of attack. This patch set from Kevin Brodsky is an attempt to protect page tables (and, eventually, other data structures) using the "memory protection keys" feature provided by a number of CPU architectures.

Memory protection keys are an additional access-permission mechanism that is layered on top of the permissions implemented in the page tables. Memory can be partitioned into a relatively small number (eight or 16, typically) of domains (or "keys"). A key, in the sense used here, is simply a small integer value that has a set of memory-access permissions associated with it. Each page has an assigned key that can be used to impose additional access restrictions. Memory that is nominally writable cannot be written if its key denies that access. The permissions associated with a key can be changed quickly and affect all pages marked with that key; as a result, large swaths of memory can be quickly made accessible or inaccessible at any time.

Changing the permissions associated with a key is an unprivileged operation. Memory protection keys, thus, cannot protect against attackers who are able to execute arbitrary code. They can, though, be useful to protect against unintended access. Critical data can be write-protected using a key, with that key's permissions being briefly changed only when that data must be written. An attacker attempting to overwrite the same data, perhaps through exploitation of a use-after-free vulnerability, will be blocked, making the system that much harder to compromise. Similarly, memory containing sensitive data (cryptographic keys, for example) can be assigned a key that, most of the time, allows no access at all, reducing the likelihood that this data will be leaked to an attacker.

Linux first gained support for memory protection keys with the 4.6 kernel release in 2016. That support is available for 64-bit Arm and x86 systems, but only for user space. Some attempts over the years notwithstanding, memory protection keys have never been used to protect memory in kernel space, despite the fact that the CPUs support that functionality.

Brodsky's patch set is an attempt to change that situation by using memory protection keys to regulate access to page tables on 64-bit Arm systems. Page tables were chosen for protection because of their value as a target, but also because access to them is already well confined to a set of helper functions, making it relatively easy to add the necessary hooks to change the key protections for a brief period when page tables need to be modified.

A recurring concern with memory protection keys is their relatively small number; it is generally expected that there will be demand for more keys than the hardware can provide, though there has been little evidence of that happening so far. The user-space interface added a set of system calls, including pkey_alloc(), which is used to allocate a new key. On the kernel side, though, there may be no need for a general allocation mechanism; the kernel's code is all present in the repository, so keys can be assigned statically, at least for now.

The patch set does add a bit of structure, though, in the form of a concept called "kpkeys levels". Each level allows access to specific regions of memory. The intent would appear to be that access grows monotonically as the level increases, but there is nothing in the code that implements a hierarchy of levels; each level can be independent of the others. Since this is the first use of this mechanism, there are only two levels implemented: KPKEYS_LVL_DEFAULT, which provides access to kernel-space memory that is not further protected, and KPKEYS_LVL_PGTABLES, which enables write access to page tables.

This abstraction might seem like more than is really needed in this case, where one could simply assign a key for page-table pages and be done with it. Brodsky appears to be looking forward to future applications where more complex combinations of permissions are needed. Separating levels from specific keys also makes it possible for multiple levels to use the same key, which could be useful if the available keys are oversubscribed someday.

The interface to kpkeys levels, from the point of view of most kernel code, is fairly simple; there are two new functions:

    u64 kpkeys_set_level(int level);
    void kpkeys_restore_pkey_reg(u64 pkey_reg);

A call to kpkeys_set_level() will set the current kpkeys level, enabling whatever access that level provides. The return value is an architecture-specific representation of the state of key permissions prior to the change (not the previous level, since other code may be using some of the keys outside of the kpkeys levels mechanism). The previous protections can be restored by passing that returned value to kpkeys_restore_pkey_reg().

The page-table protection API is layered on top of the kpkeys levels machinery. It causes page-table pages to be assigned to the memory protection key set aside for page tables; by default, the associated protections do not allow writing to those pages. Any code that must modify a page-table page should first enable access with a call to kpkeys_set_level() setting the level to KPKEYS_LVL_PGTABLES, then use kpkeys_restore_pkreg_reg() to remove that access afterward. The easier and safer way, though, is to use a scope-based guard:

    guard(kpkeys_hardened_pgtables)();

This bit of magic will make page tables writable and ensure that the change is undone as soon as the current function returns, making it impossible to forget to restore the page-table protections regardless of which code path is taken.

The current implementation, Brodsky said, "should be considered a proof of concept only". It includes just enough support for Arm's kernel-space memory protection keys feature ("Permission Overlay Extension" or POE) to make the rest work; it is not intended to be a complete kernel-space POE implementation at this point. There are also no benchmark results showing what the impact of this mechanism is on performance; developers will want to see those measurements eventually.

As it happens, this is not the first attempt to protect page-table pages using memory protection keys; Rick Edgecombe posted an x86 patch set back in 2021. There was also an attempt to use memory protection keys to prevent stray writes to persistent memory by Ira Weiny in 2022. Neither series progressed to the point of being merged into the mainline, and Edgecombe eventually set the page-table work aside in favor of other projects.

It seems clear, though, that there is interest in providing this sort of protection for page-table pages. To be successful, a patch set will almost certainly need to incorporate elements from both the Arm and x86 work to show that it is, indeed, applicable to more than one architecture. If that barrier can be overcome, the kernel might eventually have hardening of page-table access. Thereafter, it may make sense to extend this protection to other critical data structures within the kernel (Brodsky suggests task credentials and SELinux state, among other things). First, though, there needs to be agreement on the core infrastructure, and that discussion has barely begun.

Index entries for this article
KernelMemory protection keys


to post comments

PGTY_table

Posted Jan 9, 2025 16:09 UTC (Thu) by willy (subscriber, #9762) [Link] (1 responses)

In 2018, I added PG_table (since renamed PGTY_table) to mark pages allocated for page tables. In March 2019, I added code to prevent page tables being mapped into userspace. That doesn't address the same attacks being defended against here, but it was a cheap check to add that doesn't require new CPU features.

PGTY_table

Posted Jan 16, 2025 5:03 UTC (Thu) by rgb (subscriber, #57129) [Link]

Is this enabled by default in most common distribution kernels nowadays?


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds