BPF meets io_uring
The patch set itself is relatively straightforward, adding less than 300 lines of new code. It creates a new BPF program type (BPF_PROG_TYPE_IOURING) for programs that are meant to be run in the io_uring context. Any such programs must first be created with the bpf() system call, then registered with the ring in which they are intended to run using the new IORING_ATTACH_BPF command. Once that has been done, the IORING_OP_BPF operation will cause a program to be run within the ring. The final step in the patch series adds a helper function that BPF programs can use to submit new operations into the ring.
As a proof of concept, the patch series does a good job of showing how BPF programs might be run from an io_uring. This work does not, though, really enable any new capabilities in its current form, which may be part of why there have been no responses to it on the list. There is little value to running a BPF program asynchronously to submit another operation; one could simply submit that operation directly instead. As is acknowledged in the patch set, more infrastructure will be needed before this capability will become useful to users.
The obvious place where BPF can add value is making decisions based on the outcome of previous operations in the ring. Currently, these decisions must be made in user space, which involves potential delays as the relevant process is scheduled and run. Instead, when an operation completes, a BPF program might be able to decide what to do next without ever leaving the kernel. "What to do next" could include submitting more I/O operations, moving on to the next in a series of files to process, or aborting a series of commands if something unexpected happens.
Making that kind of decision requires the ability to run BPF programs in response to other events in the ring. The sequencing mechanisms built into io_uring now would suffice to run a program once a specific operation completes, but that program will not have access to much useful information about how the operation completed. Fixing that will, as Begunkov noted, require a way to pass the results of an operation into a BPF program when it runs. An alternative would be to tie programs directly to submitted operations (rather than making them separate operations, as is done in the patch set) that would simply run at completion time.
With that piece in place, and with the increasing number of system calls supported within io_uring, it will become possible to create complex, I/O-related programs that can run in kernel space for extended periods. Running BPF programs may look like an enhancement to io_uring, but it can also be seen as giving BPF the ability to perform I/O and run a wide range of system calls. It looks like a combination that people might do some surprising things with.
That said, this is not a feature that is likely to be widely used. On its own, io_uring brings a level of complexity that is only justified for workloads that will see a significant performance improvement from asynchronous processing. Adding BPF into the mix will increase the level of complexity significantly, and long sequences of operations and BPF programs could prove challenging to debug. Finally, loading io_uring programs requires either of the CAP_BPF or CAP_SYS_ADMIN capabilities, which means "root" in most configurations. As long as the current hostility toward unprivileged BPF programs remains, that is unlikely to change; as a result, relatively few programs are likely to use this feature.
Still, the combination of these two subsystems provides an interesting look
at where Linux may go in the future. Linux will (probably) never be a unikernel system, but
the line between user space and the kernel does appear to be getting
increasingly blurry.
| Index entries for this article | |
|---|---|
| Kernel | BPF/io_uring |
| Kernel | io_uring/BPF support |
