
No hardware memory isolation for BPF programs

By Daroc Alden
February 25, 2026

On February 12, Yeoreum Yun posted a suggestion for an improvement to the security of the kernel's BPF implementation: use memory protection keys to prevent unauthorized access to memory by BPF programs. Yun wanted to put the topic on the list for discussion at the Linux Storage, Filesystem, Memory Management, and BPF Summit in May, but the lack of engagement makes that unlikely. They also have a patch set implementing some of the proposed changes, but have not yet shared it with the mailing list. Yun's proposal does not seem likely to be accepted in its current form, but the kernel has added hardware-based hardening options in the past, sometimes after substantial discussion.

When a modern CPU needs to turn a virtual address into a physical address, it does so by consulting a page table. This table also dictates whether the memory in question is readable, writable, executable, accessible by user space, etc. Page tables have a multi-level structure, requiring several pointer indirections to find the actual entry for a page of memory. To avoid the overhead of following these indirections on every memory access, the CPU keeps a cache of recently accessed entries called the translation lookaside buffer (TLB).
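As a rough illustration of the lookup described above, here is a toy model of a multi-level page-table walk with a TLB in front of it. The constants (four levels, nine index bits per level, 4KB pages) follow the common x86-64 layout, which is an assumption here; the ToyMMU class and its method names are invented for this sketch and correspond to nothing in the kernel.

```python
PAGE_SHIFT = 12   # 4KB pages (assumed)
LEVEL_BITS = 9    # 512 entries per table (assumed, as on x86-64)
LEVELS = 4        # four-level page table (assumed)


def split(vaddr):
    """Return ([top-level index, ..., leaf index], offset within the page)."""
    offset = vaddr & ((1 << PAGE_SHIFT) - 1)
    vpn = vaddr >> PAGE_SHIFT
    indices = [(vpn >> (LEVEL_BITS * level)) & ((1 << LEVEL_BITS) - 1)
               for level in reversed(range(LEVELS))]
    return indices, offset


class ToyMMU:
    """Hypothetical model: nested dicts stand in for page-table pages."""

    def __init__(self):
        self.root = {}   # top-level table
        self.tlb = {}    # virtual page number -> (frame, perms)
        self.walks = 0   # number of full page-table walks performed

    def map_page(self, vaddr, frame, perms):
        indices, _ = split(vaddr)
        table = self.root
        for index in indices[:-1]:
            table = table.setdefault(index, {})
        table[indices[-1]] = (frame, perms)

    def translate(self, vaddr):
        indices, offset = split(vaddr)
        vpn = vaddr >> PAGE_SHIFT
        entry = self.tlb.get(vpn)
        if entry is None:
            # TLB miss: follow one pointer per level (KeyError ~ page fault)
            self.walks += 1
            node = self.root
            for index in indices:
                node = node[index]
            entry = node
            self.tlb[vpn] = entry
        frame, perms = entry
        return (frame << PAGE_SHIFT) | offset, perms


mmu = ToyMMU()
mmu.map_page(0x7F0000001000, frame=0x42, perms="rw")
first = mmu.translate(0x7F0000001234)    # misses the TLB, walks the table
second = mmu.translate(0x7F0000001238)   # same page, served from the TLB
```

The second lookup touches the same page, so it is answered from the cached entry and the walk counter stays at one.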

When the kernel wants to change the access permissions of a given area of memory, it needs to update the page table and then flush the TLB (causing an inevitable performance hit as it refills). Worse, if the area of memory is large, it may need to update many page-table entries. Even just keeping track of which page-table entries need to be updated and iterating through them all can be a time-consuming operation. Memory protection keys are a hardware feature that helps avoid the overhead of changing large sections of the page table, making it practical to change the permissions of memory as part of a routine operation.

Memory protection keys use four bits in the page table to associate each page in memory with one of sixteen keys; there is a special CPU register that associates each key with read and write permissions. This avoids the need to actually change individual page-table entries: just change the permission bits for the corresponding key and the memory becomes inaccessible, without the need to walk the page table or flush the TLB.
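To make the mechanism concrete, the following sketch models the key register as a single integer holding two bits per key. The bit layout (an access-disable bit and a write-disable bit for each key) is modeled on the x86 PKRU register; that choice, along with every name in the code, is an assumption made for illustration rather than something taken from Yun's proposal.

```python
NUM_KEYS = 16           # four key bits per page-table entry
ACCESS_DISABLE = 0b01   # x86 PKRU-style "AD" bit (assumed layout)
WRITE_DISABLE = 0b10    # x86 PKRU-style "WD" bit (assumed layout)


class KeyRegister:
    """Toy stand-in for the per-CPU register mapping keys to rights."""

    def __init__(self):
        self.value = 0  # all bits clear: every key allows reads and writes

    def set_rights(self, key, rights):
        if not 0 <= key < NUM_KEYS:
            raise ValueError("key out of range")
        shift = 2 * key
        self.value = (self.value & ~(0b11 << shift)) | ((rights & 0b11) << shift)

    def rights(self, key):
        return (self.value >> (2 * key)) & 0b11


def access_ok(page_writable, page_key, register, write):
    """The key's rights are checked on top of the page-table permissions."""
    rights = register.rights(page_key)
    if rights & ACCESS_DISABLE:
        return False
    if write and (rights & WRITE_DISABLE or not page_writable):
        return False
    return True


# A thousand pages tagged with key 3; none of these entries is modified below.
pages = {n: {"writable": True, "key": 3} for n in range(1000)}
reg = KeyRegister()

write_before = access_ok(pages[500]["writable"], pages[500]["key"], reg, write=True)
reg.set_rights(3, WRITE_DISABLE)  # one register update covers all 1000 pages
write_after = access_ok(pages[500]["writable"], pages[500]["key"], reg, write=True)
read_after = access_ok(pages[500]["writable"], pages[500]["key"], reg, write=False)
```

The point of the model is that the page dictionary is never touched after setup: revoking write access to every page tagged with key 3 is a single update to the register value.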

The kernel has had support for memory protection keys since 2016, but that support has only extended to user space. The related system calls allow user-space applications to use memory protection keys to implement a faster version of mprotect(). There is no reason, in theory, that memory protection keys couldn't be used within the kernel as well. In practice, there have been a number of attempts to integrate them into the kernel that have not come to fruition.
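For readers who have not used the user-space interface, the sketch below calls the glibc wrappers pkey_alloc(), pkey_mprotect(), pkey_set(), pkey_get(), and pkey_free() through ctypes. It assumes a Linux system with glibc 2.27 or later; on hardware or kernels without protection-key support, pkey_alloc() fails and the function simply returns None. The demo_pkeys() helper is invented for this example, and it deliberately never writes to the region while writes are disabled, since that would deliver SIGSEGV.

```python
import ctypes
import mmap

PROT_READ, PROT_WRITE = 0x1, 0x2
PKEY_DISABLE_WRITE = 0x2  # value from <sys/mman.h> on Linux


def demo_pkeys():
    """Return the key's rights after disabling writes, or None if unsupported."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        pkey_alloc, pkey_free = libc.pkey_alloc, libc.pkey_free
        pkey_mprotect = libc.pkey_mprotect
        pkey_set, pkey_get = libc.pkey_set, libc.pkey_get
    except (OSError, AttributeError, TypeError):
        return None  # no glibc pkey wrappers on this platform

    pkey_alloc.argtypes = [ctypes.c_uint, ctypes.c_uint]
    pkey_mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t,
                              ctypes.c_int, ctypes.c_int]
    pkey_set.argtypes = [ctypes.c_int, ctypes.c_uint]
    pkey_get.argtypes = [ctypes.c_int]
    pkey_free.argtypes = [ctypes.c_int]

    key = pkey_alloc(0, 0)
    if key < 0:
        return None  # CPU or kernel lacks protection-key support

    region = mmap.mmap(-1, mmap.PAGESIZE)     # anonymous mapping
    view = ctypes.c_char.from_buffer(region)  # only used to learn the address
    try:
        # Tag the page with the key; this is the one page-table change needed.
        if pkey_mprotect(ctypes.addressof(view), mmap.PAGESIZE,
                         PROT_READ | PROT_WRITE, key) != 0:
            return None
        pkey_set(key, PKEY_DISABLE_WRITE)  # register write, no system call
        rights = pkey_get(key)
        pkey_set(key, 0)                   # re-enable writes
        return rights
    finally:
        del view          # release the buffer export so the mapping can close
        region.close()
        pkey_free(key)


result = demo_pkeys()
```

After the initial pkey_mprotect() call, toggling access is just pkey_set(), which glibc implements by writing the key register directly; that is what makes this faster than repeated mprotect() calls.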

Yun suggested adding a new set of kmalloc_pkey() and vmalloc_pkey() functions to allocate kernel objects in parts of memory protected by a key. That would make it practical to give BPF programs access to only a subset of kernel memory. Specifically, memory that is owned by BPF programs, or that a subsystem specifically intends to share with a BPF program, could be allocated with a separate memory protection key from other kernel allocations. Then, when entering BPF code, access to general kernel memory could be swiftly disabled. Yun's message described how this would work in some depth, but did not include any of the actual code to implement it (even though they intend to share their work-in-progress code soon), so it's hard to judge how invasive a change this would be.
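Since no code has been posted, the following is only a guess at the overall shape of the idea, written as a small simulation rather than kernel code. The alloc() method stands in for the proposed kmalloc_pkey() and vmalloc_pkey() functions, whose real signatures are not public; the key numbers, the ToyKernel class, and the run_bpf() helper are all hypothetical.

```python
from contextlib import contextmanager

KEY_KERNEL = 0  # hypothetical: default key for ordinary kernel allocations
KEY_BPF = 1     # hypothetical: key for memory owned by or shared with BPF


class ToyKernel:
    def __init__(self):
        self.disabled = set()  # keys whose memory is currently inaccessible
        self.objects = {}      # object name -> (key, value)

    def alloc(self, name, value, key=KEY_KERNEL):
        # Stand-in for the proposed kmalloc_pkey()/vmalloc_pkey();
        # the actual interface has not been published.
        self.objects[name] = (key, value)

    def read(self, name):
        key, value = self.objects[name]
        if key in self.disabled:
            raise PermissionError(f"{name}: key {key} is disabled")
        return value

    @contextmanager
    def run_bpf(self):
        # On entry to BPF code, drop access to general kernel memory;
        # restore it on exit, even if the program fails.
        self.disabled.add(KEY_KERNEL)
        try:
            yield
        finally:
            self.disabled.discard(KEY_KERNEL)


kernel = ToyKernel()
kernel.alloc("task_struct", "sensitive kernel state")
kernel.alloc("bpf_map", {"hits": 1}, key=KEY_BPF)

with kernel.run_bpf():
    map_value = kernel.read("bpf_map")  # allowed: tagged with the BPF key
    try:
        kernel.read("task_struct")      # blocked: general kernel memory
        blocked = False
    except PermissionError:
        blocked = True

restored = kernel.read("task_struct")   # accessible again after BPF exits
```

In this model a verifier bug that let a BPF program compute a pointer into general kernel memory would still hit the disabled key, which is the extra barrier that Yun is arguing for.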

Dave Hansen thought that plan might be feasible for subsystems such as the scheduler that have a relatively limited amount of writable data, but that other areas of the kernel would have more problems:

Networking isn't my strong suit, but packet memory seems rather dynamically allocated and also needs to be written to by eBPF programs. I suspect anything that slows packet allocation down by even a few cycles is a non-starter.

IMNHO, any approach to solving this problem that starts with: we just need a new allocator or modification to existing kernel allocators to track a new memory type makes it a dead end. Or, best case, a very surgical, targeted solution.

Alexei Starovoitov, on the other hand, did not just think the suggestion would be difficult to pull off, but also pointless. Yun had listed several CVEs from between 2020 and 2023 as a way of showing that the BPF verifier alone was not enough to ensure security. Starovoitov disagreed: "None of them are security issues. They're just bugs."

Yun agreed that they were bugs, but disagreed that they have no security implications. Yun is of the opinion that the existence of bugs in the BPF verifier that have led to memory corruption in the past is enough justification to put another barrier between running a BPF program and having an exploitable vulnerability.

Considering the previous unsuccessful attempts to use memory protection keys in the kernel and the difficulty of implementation, Yun's proposal to introduce memory protection keys to the BPF subsystem seems unlikely to go anywhere. On the other hand, the kernel has slowly been adding hardening measures such as per-call-site slab caches; perhaps associating these caches with memory protection keys is a logical next step. Only time will tell whether memory protection keys are a useful addition to the kernel's various hardening tools, or whether they're a cumbersome distraction. Either way, they will likely have to make their way into the kernel via some other subsystem.


Index entries for this article
Kernel / Memory protection keys



Only sixteen ?

Posted Feb 25, 2026 16:14 UTC (Wed) by nim-nim (subscriber, #34454) [Link] (1 responses)

16 seems an awfully small number to build a new mechanism on, especially when it’s really 15 (the first key is used as the default for untagged memory) and userspace can consume an unknown number of those 15. Of course it is fast because it is small.

Only sixteen ?

Posted Feb 25, 2026 16:48 UTC (Wed) by aviallon (subscriber, #157205) [Link]

This is overlaid on top of the preexisting permissions.
So you can have overlap between userspace and kernel space over the keys, it doesn't matter, since this is all protected by regular page table permissions anyway.

eBPF accessibility to non-root?

Posted Feb 25, 2026 18:11 UTC (Wed) by geofft (subscriber, #59789) [Link] (2 responses)

The big question here seems to be the intended security model around eBPF. Is there supposed to be a security boundary between eBPF programs and general ring 0 access? That is, is the verifier a security-sensitive component or not?

Starovoitov's email says, "eBPF was restricted to root for many years, so the above is simply not true." I don't fully understand what the past tense means here. In a followup email, he says, "Again. They are not security issues. cap_bpf is effectively root." Is he just saying that eBPF was and still is restricted to root-level power, either actual root or the capability? Or did something change recently to allow eBPF to unprivileged users? A very recent LWN article mentions using classic BPF for io_uring specifically because it needs to be unprivileged, so my impression is this hasn't changed.

If eBPF is restricted to root or root-equivalent credentials, what makes it different from, say, kernel module loading / CAP_SYS_MODULE? Why even bother with the kernelspace verifier, if it's not a security issue for the verifier to not correctly verify? I understand we got here because the original vision was unprivileged eBPF, but if we've given up on the goal, why not get rid of the verifier too? This doesn't mean giving up on eBPF itself: the architecture-independent bytecode and CO-RE are still useful for interoperability. It's like how the JVM still remains highly useful a decade after we finally gave up on the idea that Java applets were a meaningful form of sandboxing. It's no slight against eBPF to admit that the verifier isn't needed, if indeed it isn't needed.

This LWN article covering a 2023 talk by Starovoitov makes it sound like the restriction of eBPF to privileged processes is permanent, and has some explicit comparisons with CAP_SYS_MODULE. He mentions that the verifier is good for verifying safety of kernel code to protect you from crashes, but if this isn't a security boundary and just a good feature for stability, why not run the verification offline in userspace instead of forcing it into the kernel? That puts it in the same class of tools as a static analyzer, a C compiler with various warnings on, or a Rust compiler, all of which prevent kernel crashes without running in the kernel themselves.

Specifically - why not deprecate CAP_BPF and make CAP_SYS_MODULE imply CAP_BPF? Then it's clear that any program that can load eBPF programs can just load unsandboxed kernel code as well, and that the verifier has no responsibility.

On the other hand, if it's still a good goal to make eBPF accessible to unprivileged programs, then efforts like this should be welcomed, and it seems reasonable to loosen the requirement to say that you must have CAP_BPF or the kernel + hardware must be in a configuration where there's memory isolation for eBPF execution.

Assuming the decision is that eBPF will forever require the level of privilege needed for arbitrary kernel access, I wonder if the kernel maintainers would be open to some third thing halfway between cBPF and eBPF, more expressive than cBPF so you can write real programs, but not quite as expressive as eBPF so its verifier can be a real security boundary. For instance, I would love the ability to use BPF maps from inside a seccomp filter, so I can track new file descriptors, and it seems like it should be possible to extend the cBPF bytecode to support BPF function calls while keeping it safe to use from unprivileged programs.

eBPF accessibility to non-root?

Posted Feb 25, 2026 18:49 UTC (Wed) by epa (subscriber, #39769) [Link] (1 responses)

Given that there is often a security boundary between userspace programs running as root and "general ring 0 access", I think it's reasonable to suppose that eBPF also has a security boundary, even if it can do everything root does. On a machine with Secure Boot, running a signed kernel, you may not be able to load arbitrary kernel modules or kexec() a different kernel image. Some bug that tramples memory and subverts the running kernel would be considered a security hole, whether from eBPF or otherwise.

eBPF accessibility to non-root?

Posted Feb 25, 2026 20:21 UTC (Wed) by geofft (subscriber, #59789) [Link]

Well, my question is whether there's a separate boundary, or whether the ability (or inability) to load arbitrary kernel code is equivalent to the ability (or inability) to load eBPF. Does it make sense to have a configuration where a certain process is trusted to load eBPF programs but not arbitrary kernel modules? If not, why distinguish CAP_BPF and CAP_SYS_MODULE?

kernel_lockdown(7), which is used for Secure Boot, says that lockdown mode requires module signatures and also restricts "kernel services that allow direct access of the kernel image" including "BPF," and this presentation from Plumbers 2020 suggests that adding signing to eBPF is the way forward. This 2025 LWN article also implies that signing would be needed for eBPF the same as for kernel modules, or at least that one very large operator that requires signing for kernel modules is not comfortable with unsigned eBPF programs.

If the userspace/module and userspace/eBPF boundaries are one and the same boundary (whether that boundary is "root," "root and it must be signed," "never past boot," "a capability," etc.), then there's no point in an eBPF verifier given that there's no kernel module verifier.

"None of them are security issues. They're just bugs."

Posted Feb 26, 2026 10:37 UTC (Thu) by k3ninho (subscriber, #50375) [Link] (1 responses)

> "None of them are security issues. They're just bugs."
In terms of LKML rhetoric, has this changed from being an irrelevant distinction? The classical Linus take is that no bug gets special treatment for being a security issue, all bugs need fixing. So how is Starovoitov objecting here?

K3n.

"None of them are security issues. They're just bugs."

Posted Feb 26, 2026 14:16 UTC (Thu) by daroc (editor, #160859) [Link]

As I understood him, Starovoitov's point is that they were discovered and fixed according to the normal bug-fixing process, and that therefore it's not reasonable to use them as evidence that the BPF verifier is insecure. It's just occasionally buggy, as every software component in the kernel is.

Whether you agree with Starovoitov or with Yun about the (lack of) need for additional security-risk-mitigation mechanisms is something where reasonable people can disagree, I think.


Copyright © 2026, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds