More flexible memory access for BPF programs

October 21, 2022

This article was contributed by David Vernet

All memory accesses in a BPF program are statically checked for safety using the verifier, which analyzes the program in its entirety before allowing it to run. While this allows BPF programs to safely run in kernel space, it restricts how that program is able to use pointers. Until recently, one such constraint was that the size of a memory region referenced by a pointer in a BPF program must be statically known when a BPF program is loaded. A recent patch set by Joanne Koong enhances BPF to support loading programs with pointers to dynamically sized memory regions.

Verifying kernel pointers in BPF programs

In order to safely load a BPF program, the verifier must validate that no memory access in the program will ever crash the kernel. This is a complex task, as "memory" can refer to a variety of different contexts in a program. For example, some pointers may reference the BPF program's stack, whereas other pointers, such as kptrs, may reference a structure that was passed from the main kernel via a kfunc. Both of these types of pointers have different scenarios in which an access would or would not be safe, and thus require separate logic in the verifier to ensure that any accesses to them are safe. For the stack pointer, the verifier needs to ensure that the offset of any read is within the program's active stack region. For kptrs returned from a kfunc, the verifier must ensure that the offset of any read is within the bounds of the structure as specified by the structure's BPF Type Format (BTF) information (write accesses are much more strictly controlled).

Yet, while the bounds of these two memory regions may differ, both require that all reads take place at static offsets so that the verifier can ensure that the access is safe. This restriction, of course, precludes any use case requiring a pointer to a dynamically sized data region. For example, the BPF ring-buffer map type allows BPF programs to write entries into a ring buffer for consumption by user space. If all memory references must be statically known at load time, a BPF program can only write entries whose sizes were fixed when the program was loaded. It would be useful, however, to be able to write entries whose sizes can be specified dynamically at run time.

dynptrs – Referencing dynamically sized memory

Koong's patch set adds support for accessing dynamically sized regions of memory in BPF programs with a new feature called dynptrs. The main idea behind dynptrs is to associate a pointer to a dynamically sized data region with metadata that is used by the verifier and some BPF helper functions to ensure that accesses to the region are valid. Koong's patch set creates this association in a newly defined type called struct bpf_dynptr. This structure is opaque to BPF programs; within the kernel it is represented by:

    /* the implementation of the opaque uapi struct bpf_dynptr */
    struct bpf_dynptr_kern {
        void *data;
        u32 size;
        u32 offset;
    } __aligned(8);

The size of the dynamic region is stored in a 32-bit, unsigned integer, with the upper eight bits being reserved for metadata about the dynptr itself. The highest-order bit specifies whether the dynptr is read-only, and the next seven highest-order bits describe the type of memory that is referenced by the dynptr. This leaves 24 bits for the size, implying that a dynptr can point to a region no larger than 16MB. The patch set adds support for two types of dynptrs: BPF_DYNPTR_TYPE_LOCAL, which points to memory that is local to the program such as a map value, and BPF_DYNPTR_TYPE_RINGBUF, which points to data in a BPF_MAP_TYPE_RINGBUF map.
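The packing described above can be modeled in a few lines of user-space C. This is only a sketch of the layout as the text describes it (one read-only bit, seven type bits, 24 size bits); every name beginning with sketch_ or SKETCH_ is hypothetical, the numeric type values are arbitrary, and the kernel's own macros and shifts may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants modeling the layout described in the text. */
#define SKETCH_SIZE_BITS   24u
#define SKETCH_SIZE_MASK   ((1u << SKETCH_SIZE_BITS) - 1u)  /* low 24 bits */
#define SKETCH_TYPE_SHIFT  SKETCH_SIZE_BITS                 /* type sits above the size */
#define SKETCH_TYPE_MASK   0x7fu                            /* seven bits */
#define SKETCH_RDONLY_BIT  (1u << 31)                       /* highest-order bit */

/* Arbitrary values chosen for the sketch, not the kernel's. */
enum sketch_dynptr_type {
	SKETCH_TYPE_LOCAL = 1,
	SKETCH_TYPE_RINGBUF = 2,
};

/* Combine size, type, and the read-only flag into one 32-bit word. */
uint32_t sketch_pack(uint32_t size, enum sketch_dynptr_type type, bool rdonly)
{
	assert(size <= SKETCH_SIZE_MASK);  /* must fit in 24 bits */
	return size
	       | (((uint32_t)type & SKETCH_TYPE_MASK) << SKETCH_TYPE_SHIFT)
	       | (rdonly ? SKETCH_RDONLY_BIT : 0u);
}

/* Extract the size, ignoring the eight metadata bits. */
uint32_t sketch_size(uint32_t word)
{
	return word & SKETCH_SIZE_MASK;
}

/* Extract the seven-bit type field. */
uint32_t sketch_type(uint32_t word)
{
	return (word >> SKETCH_TYPE_SHIFT) & SKETCH_TYPE_MASK;
}

/* Test the read-only flag. */
bool sketch_is_rdonly(uint32_t word)
{
	return (word & SKETCH_RDONLY_BIT) != 0;
}
```

With 24 bits available, the largest size this model can encode is 2^24 - 1 bytes, one byte short of 16MB.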

Dynptrs are created and accessed using a series of helper functions. A dynptr may be read using the bpf_dynptr_read() helper, or written using bpf_dynptr_write() for writeable dynptrs. bpf_dynptr_read() will copy memory from the dynptr data region into a buffer specified by the calling program, whereas bpf_dynptr_write() will copy data from a program buffer into the dynptr data region. Before performing the copy, the helper functions verify that the proposed length and offsets refer to a valid part of the dynptr memory region. If the user requires direct access to the memory region contained in the dynptr, they can use the bpf_dynptr_data() helper though, in this case, the size of the memory area being requested must be static so that the verifier can ensure that any accesses to it are valid.
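The bounds check that precedes each copy can be illustrated with a user-space model. This is a sketch rather than the kernel's implementation: struct sketch_dynptr and the sketch_* functions are hypothetical stand-ins, the argument order is chosen for the sketch, and returning -E2BIG for an out-of-range request is an assumption made here.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a dynptr: a base pointer plus the region size. */
struct sketch_dynptr {
	void *data;
	uint32_t size;
};

/* Return 0 if [offset, offset + len) lies inside the region, -E2BIG if not. */
int sketch_check_range(const struct sketch_dynptr *ptr, uint32_t offset, uint32_t len)
{
	assert(ptr != NULL);
	/* Widen to 64 bits so that offset + len cannot wrap around. */
	if ((uint64_t)offset + len > ptr->size)
		return -E2BIG;
	return 0;
}

/* Copy len bytes out of the region into dst, after validating the range. */
int sketch_dynptr_read(void *dst, uint32_t len,
		       const struct sketch_dynptr *src, uint32_t offset)
{
	int err = sketch_check_range(src, offset, len);

	if (err)
		return err;
	memcpy(dst, (const char *)src->data + offset, len);
	return 0;
}

/* Copy len bytes from src into the region, after validating the range. */
int sketch_dynptr_write(struct sketch_dynptr *dst, uint32_t offset,
			const void *src, uint32_t len)
{
	int err = sketch_check_range(dst, offset, len);

	if (err)
		return err;
	memcpy((char *)dst->data + offset, src, len);
	return 0;
}
```

Because the check happens at run time inside the helper, the offset and length handed to it do not need to be known when the program is loaded.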

Local memory dynptrs

BPF_DYNPTR_TYPE_LOCAL, or local dynptr support, is added by the second patch of the series. "Local memory" in BPF can refer to several different types of memory used by a program, including, for example, map values, map keys, and stack memory. Koong's patch set allows local dynptrs to be created via a new helper function called bpf_dynptr_from_mem(). Despite the existence of a wide variety of local memory types, the initial patch set only adds support for creating local dynptrs to a map value. This restriction is presumably because the verifier already provides a guarantee to helper functions that receive a pointer to a map value that it will be properly initialized and sized, thus allowing the initial implementation of dynptrs to be as simple as possible.

Other local memory types could be supported in the future as well, though each of these memory types would require additional logic in the verifier for validating the input arguments to bpf_dynptr_from_mem(). While there was no indication in the patch series about when (or whether) other types of local memory will be added, it seems prudent to add support for them so as to provide a more consistent experience in using the API. In the initial implementation, a user will have no way of knowing that a local dynptr only supports map values until their program is rejected by the verifier.

Dynamically sized ring-buffer entries using dynptrs

As mentioned above, the static-sizing constraint forces the size of ring-buffer entries published by the kernel in BPF_MAP_TYPE_RINGBUF maps to be statically known when the program is loaded. To address this problem, Koong included a patch that defines a new BPF_DYNPTR_TYPE_RINGBUF type of dynptr. The patch includes the bpf_ringbuf_reserve_dynptr() helper function for reserving a dynamically sized ring-buffer entry, as well as bpf_ringbuf_submit_dynptr() and bpf_ringbuf_discard_dynptr() for posting the entries to the ring buffer or discarding them respectively. These APIs closely match the existing APIs for reserving and posting statically sized ring-buffer entries.

Dynptrs are also used in the new BPF_MAP_TYPE_USER_RINGBUF map type patch set, recently merged into bpf-next, that I wrote. This map type, which allows user space to publish ring-buffer entries to BPF programs, provides a bpf_user_ringbuf_drain() helper function that allows a BPF program to consume entries from the ring buffer, and invoke some specified callback on each of those entries. This callback receives a dynptr to the ring-buffer entry as its first argument. In order to read the entries, the BPF program can simply use bpf_dynptr_read() or bpf_dynptr_data(), as described above.

Holding off on a kmalloc() type dynptr

One thing to note is that none of the above supported dynptr types refer to memory that was allocated via kmalloc(). This would have seemed like an obvious use case at first glance, and was in fact proposed in an earlier version of the patch set via a BPF_DYNPTR_TYPE_MALLOC dynptr type. The type was eventually dropped, however, following discussions that revealed some subtle, yet fundamental, issues that would need to be addressed before it could be supported.

For example, in response to the patch set, Daniel Borkmann raised the question of which memory control group (memcg) should be charged for the allocated memory. This point is relevant; allocations in the kernel that are done on behalf of a user-space process need to be charged to the memcg containing the allocating process. But identifying that process is not always straightforward. The memcg of the process that loaded the program would seem to fit that profile, but as Alexei Starovoitov pointed out, that process (and its memcg) do not necessarily persist after the program has been loaded.

Another question, posed by Starovoitov, is whether memory allocated by a BPF program should be charged to a memcg at all. Most kmalloc() calls in the kernel are not charged in this way and, as was reinforced at the 2022 Linux Kernel Maintainers Summit, BPF programs are instances of kernel programs, not user programs. Borkmann responded that, perhaps, the solution was to allow users to specify the memcg to charge explicitly, rather than implicitly relying on the loading task's memcg as BPF currently does. This would involve the user obtaining a file descriptor to a memcg and passing it to the kernel when a program is loaded. If no descriptor is passed, the default behavior would be to not charge the memory to any memcg. This suggestion was well received by both Starovoitov and Andrii Nakryiko, though the conversation tapered off without a firm conclusion, and Koong eventually sent a follow-on patch that replaced BPF_DYNPTR_TYPE_MALLOC with BPF_DYNPTR_TYPE_LOCAL.

The ability to dynamically allocate from BPF programs is an interesting prospect, so it seems likely that the feature will be revisited once the solution for memory accounting has been clarified.

Ongoing work with dynptrs

Work is currently ongoing that adds new dynptr types to support further BPF use cases in the networking stack. In one recent patch set, Koong proposes adding two new types of dynptrs, one whose underlying memory region contains a socket buffer, and the other whose memory region contains an eXpress Data Path (XDP) buffer. The benefits of these dynptr types are the same for both types of buffers, with the main one being that it allows BPF programs to use more ergonomic APIs for reading and mutating memory in the buffers.

Consider, for example, a user who wants to parse a type-length-value (TLV) header in a TCP packet contained in a struct xdp_md buffer. This structure contains data and data_end fields that represent the start and end of the packet's data region, respectively. A TLV header consists of an entry that encodes a type, followed by the length of the header value, and then the value itself, which occupies the specified number of bytes. The length entry can vary at run time between different packets and header types, so iterating over the headers in a packet requires non-static pointer offsets. Without dynptrs, a user would have to code explicit checks for every single read of the packet header to ensure that it falls between the data and data_end pointers of the xdp_md buffer. With dynptrs, getting a pointer to the next TLV header is simply a matter of calling bpf_dynptr_data() with an offset calculated from the length of the prior header, whatever its type, and then checking that the pointer received from the helper is non-NULL.
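The TLV walk described above might look roughly like the following user-space model. It is a sketch under several assumptions: the type and length fields are taken to be one byte each, struct sketch_buf and the sketch_* functions are hypothetical, and sketch_slice() only imitates the "NULL when out of bounds" behavior of a bpf_dynptr_data()-style call. In a real BPF program the length passed to bpf_dynptr_data() must be a constant, a restriction that this model does not enforce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dynptr over packet data. */
struct sketch_buf {
	const uint8_t *data;
	uint32_t size;
};

/* Return a pointer to [offset, offset + len), or NULL if it is out of bounds. */
const uint8_t *sketch_slice(const struct sketch_buf *buf, uint32_t offset, uint32_t len)
{
	assert(buf != NULL);
	if ((uint64_t)offset + len > buf->size)
		return NULL;
	return buf->data + offset;
}

/*
 * Walk TLVs made of a one-byte type, a one-byte length, and a value.
 * Returns a pointer to the value of the first entry matching 'type' and
 * stores its length in *len_out; returns NULL if no entry matches or if
 * an entry is truncated.
 */
const uint8_t *sketch_find_tlv(const struct sketch_buf *buf, uint8_t type, uint8_t *len_out)
{
	uint32_t offset = 0;

	for (;;) {
		const uint8_t *hdr = sketch_slice(buf, offset, 2);
		const uint8_t *val;

		if (!hdr)
			return NULL;	/* ran off the end of the buffer */
		val = sketch_slice(buf, offset + 2, hdr[1]);
		if (!val)
			return NULL;	/* value extends past the buffer */
		if (hdr[0] == type) {
			*len_out = hdr[1];
			return val;
		}
		/* The next offset depends on a length read at run time. */
		offset += 2u + hdr[1];
	}
}
```

The only safety logic the caller writes is the NULL check after each slice; the comparison against the end of the buffer lives in one place instead of being repeated before every read.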

While this doesn't enable entirely new use cases, it does address a significant usability concern in BPF networking programs that is a frequent source of complaints. Additionally, it makes the generated BPF program code more robust to changes in Clang and LLVM, which can sometimes cause the verifier to reject a previously safe program.

So far, the patches haven't received any strong pushback, and it seems unlikely that they will. At this time, yet another patch set has also been submitted upstream adding even more dynptr helper functions. Those functions may be the subject of another article in the future.


More flexible memory access for BPF programs

Posted Oct 22, 2022 0:40 UTC (Sat) by jhoblitt (subscriber, #77733) [Link] (4 responses)

If we could bundle bpf programs into the kernel image, soon we would be able to boot systems and not need a userland at all. ;)

More flexible memory access for BPF programs

Posted Oct 22, 2022 10:58 UTC (Sat) by jorgegv (subscriber, #60484) [Link] (1 responses)

You mean Unikernels?

More flexible memory access for BPF programs

Posted Oct 22, 2022 12:17 UTC (Sat) by jhoblitt (subscriber, #77733) [Link]

It could fit that definition. Although, I think of a unikernel as taking more traditional protected memory processes and stuffing them into kernel space.

One could imagine a high performance packet molester based completely on BPF+XDP.

More flexible memory access for BPF programs

Posted Oct 23, 2022 1:43 UTC (Sun) by danobi (subscriber, #102249) [Link] (1 responses)

I think that already exists — see CONFIG_BPF_PRELOAD.

More flexible memory access for BPF programs

Posted Oct 23, 2022 3:05 UTC (Sun) by jhoblitt (subscriber, #77733) [Link]

Wow! I missed the memo on that.

More flexible memory access for BPF programs

Posted Oct 22, 2022 22:03 UTC (Sat) by amarao (guest, #87073) [Link] (1 responses)

If only the kernel had had the notion of ownership and lifetimes for variables...

More flexible memory access for BPF programs

Posted Oct 23, 2022 20:31 UTC (Sun) by Manifault (guest, #155796) [Link]

Not sure I'm quite following how that's relevant to dynptrs, which are more about ensuring safe accesses to variably sized memory regions, rather than ensuring the lifetime of the memory it points to (though the verifier does guarantee the memory is still valid when it's accessed as well).

For what it's worth, BPF also does support ownership and object lifetime / reference counting. kfuncs can be "acquire" and "release" kfuncs, and maps can store pointers to kernel objects. See [0] and [1] for more information.

[0]: https://lwn.net/Articles/856005/
[1]: https://lwn.net/Articles/900749/

More flexible memory access for BPF programs

Posted Nov 24, 2022 10:19 UTC (Thu) by polyp (guest, #53146) [Link]

All this work to the BPF mechanism makes me think about the famous Tanenbaum-Torvalds debate. Where do we have an environment where pointer accesses are checked and memory corruption is prevented outside of the sandbox where the program executes? In user-space. Perhaps a micro-kernel with user-space drivers/helpers is the better model.
