Standardizing the BPF ISA
While BPF may be most famous for its use in the Linux kernel, there is actually a growing effort to standardize BPF for use on other systems. These include eBPF for Windows, but also uBPF, rBPF, hBPF, bpftime, and others. Some hardware manufacturers are even considering integrating BPF directly into networking hardware. Dave Thaler led two sessions about all of the problems that cross-platform use inevitably brings and the current status of the standardization work at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit.
Thaler opened the first session (on the first day of the summit) by discussing the many platforms that are now capable of running BPF. With multiple compilers and runtimes, there are inevitable compatibility problems. He defined the goal of the ongoing IETF BPF standardization work as trying to ensure that any compiler can be used with any compliant runtime. He then went into a bit more detail about what "compliant" means in this specific context, which required first explaining a bit of background about the structure of the standardization documents.
In his later session, Thaler would go into more detail about the exact state of the first IETF draft from the working group; for the initial session, he merely stated that the working group had produced a draft instruction set architecture (ISA) specification for BPF. That draft defines the semantics of all of the BPF instructions. One wrinkle is that different implementations may not actually care about implementing every BPF instruction. For example, BPF started off with some instructions that are particular to its initial use case as a packet-filtering language; those packet-filtering instructions might not actually be useful to BPF code running in other contexts.
The draft ISA splits the defined instructions into sets of "conformance groups". A compliant runtime, then, is one that correctly implements the specified instructions for all of the conformance groups it claims to support. Splitting things up in this way helps runtimes (and compilers) communicate exactly what they support, Thaler explained.
The draft ISA splits the existing instructions into groups largely modeled after the RISC-V ISA: atomic32, atomic64, base32, base64, divmul32, divmul64, and packet. Some of these groups include other groups — for example, any implementation claiming to implement base64 must also implement base32. In fact, all of the 64-bit groups include their 32-bit counterparts. Any new instructions that get added to BPF in the future will be added to a new conformance group, so existing groups will never be modified. That means that once an implementation has become compliant, it doesn't necessarily need to stay up to date with new changes to BPF; it can continue claiming compatibility with the old instruction groups and leave things there.
Thaler also described the process that the working group has settled on for deprecations. If a group of instructions needs to be deprecated for whatever reason, they'll be added to a separate conformance group, and then new implementations can explicitly exclude that group and still be considered compliant. A compiler processing a BPF program will need to receive a set of conformance groups implemented by the target (either via compiler flags or other configuration), and take care to emit only supported instructions. The base32 group, which must be supported by all implementations, is already fairly broad, so code generation should not be much of an issue. Hopefully the end result for users will be seamless compatibility.
Instructions are not the only component of BPF, however. Another area requiring standardization is the platform-specific application binary interface (psABI), which includes details such as which registers are saved across a call, which register contains the frame pointer, how large the stack is, and other details, Thaler said. This is all a lot more up in the air, because the working group has not put together a draft of a psABI specification yet. He also floated the possibility that there might end up being multiple psABIs, in which case compilers would need to either choose one to support, or allow some way to specify which one would be used for code generation.
José Marchesi objected to the idea that the frame pointer was a choice that ought to be left to the psABI — BPF uses automatic stack allocation, meaning that the runtime manages the frame pointer in BPF register r10. Thaler responded that the ISA doesn't actually say that; the unwritten psABI would need to say that. Marchesi wasn't satisfied with that explanation, since r10 is treated differently from the other registers. In particular, it is read-only. There was some additional discussion of the point, but other members of the audience didn't seem to agree with Marchesi that the behavior of r10 ought to be specified in the ISA.
At that point, Thaler moved on to addressing another point of compatibility unique to BPF: the verifier. There already exist multiple BPF verifiers, notably the one in the Linux kernel and the PREVAIL verifier. A compiler hoping to produce portable BPF code would need an actual description of what code is or is not verifiable. That is something the working group has been considering, but has not yet written any draft specifications about.
The state of the standard
In his second session, late on the last day of the summit, Thaler updated everyone on the current state of the ISA standard. He began the session by explaining what it is the working group is chartered to do: produce a set of standards and informational documents on several specific topics. How the working group does that is up to them — so they could work on these documents in parallel, but have generally been pursuing them in priority order.
Because having an ISA is foundational to being able to discuss other topics such as the psABI and requirements for the verifier, the ISA is the first document the working group has been focusing on. At the time of the session, the ISA was "almost done", with the last call for comments ending the next day. As of this writing, the ISA is on the agenda for the June 13 meeting of the Internet Engineering Steering Group (IESG).
To people not well versed in IETF minutiae, that might not provide a clear picture of what the state of the document actually is; Thaler provided a brief overview of the remaining process as it applies to the BPF ISA. At the June 13 meeting, the IESG will vote on the proposed document. If it fails to pass, any questions or comments will go back to the working group, the document will be revised, and then it will return to the IESG at a later date. If the vote passes, the document enters the RFC editor's queue. The RFC editor converts the document to the specific format for RFCs, updates any references, and assigns it a tentative RFC number. Then the authors have a final chance to review the RFC editor's changes before it is published, and the assigned number becomes final.
In parallel, the document also needs to be reviewed by the Internet Assigned Numbers Authority (IANA), because IANA will become responsible for managing the official list of conformance groups. Thaler described IANA as comprised of "process people", who are unlikely to raise any objections to the document as long as the procedure described for registering new conformance groups does not have any problems.
All parts of that process are fairly fast, except waiting on the IESG, which only meets every two weeks. So it is quite likely that the BPF ISA may be an official RFC by the end of June, he said. David Vernet, the chair of the working group, asked Thaler whether there was anything that the assembled BPF developers could do to prevent delays. Thaler said that there was not, since it was all waiting on the IETF — except for providing fast responses to feedback.
In particular, Thaler had already received some feedback during the last call for comments. Since the comment period was scheduled to end the next day, he thought that if the attendees quickly replied to these concerns, there would likely not be any other delays. He went through the feedback, most of which was minor and already incorporated. One piece of feedback prompted actual discussion, however. Eric Klein had suggested that the ISA should not define the range of registers available to BPF programs, saying that should be moved to the (not yet written) psABI instead. This suggestion was not well received.
Several audience members, including Marchesi, spoke up to say that the number of registers a CPU has should always be part of the ISA. One audience member asked how compilers are supposed to produce code for a platform without knowing how many registers there are. Marchesi and Alexei Starovoitov separately mentioned that there were some details of how registers were used, such as the use of r0 for return values, or the use of the frame pointer, that did not necessarily need to be included in the ISA, but still thought the number of valid registers was important to include. Thaler noted everyone's responses, and intends to keep the number of registers (currently eleven — ten general purpose and one read-only frame pointer) in the ISA.
Vernet then questioned why they should standardize on eleven registers — other than to match the existing behavior of BPF implementations. Another member of the audience said that was a "very good question, but not one that should affect the standard", given that this is what all existing portable BPF programs do. Marchesi suggested that the ISA could say that the instruction encoding has space for up to 16 registers, but that the exact number depends on the implementation, with a minimum of eleven. Several other people pushed to move on with eleven as-is, noting that if this were a real issue it would have come up at some point before the final 48 hours of the last call for comments on the ISA.
Once the ISA is standardized, the next step (although he was clear that this was only a rough order), will be an informational standard describing the expectations for a BPF verifier. This might include ensuring properties like not using undefined instructions, not dereferencing invalid pointers, or ensuring that programs terminate. Marchesi noted that it would be convenient for compilers if the document took the form of a numbered or named list of rules, so that compiler error messages (and internal code) can reference them by name. Starovoitov thought that those kinds of requirements belonged in a separate document; Thaler concurred, noting that a compiler expectations document was later in his list. Other upcoming tasks for the working group includes standardizing the BPF Type Format (BTF), and informational documents on producing portable binaries — including documenting compiler expectations for verifiable code, and the psABI.
Thaler spoke a bit about his preferred form for the psABI work, and then moved into one last topic for the audience to help with: an ELF profile for BPF. He has a draft proposal, but he has concerns about the right way to perform the standardization. ELF is not an IETF standard — it is defined as part of the System V specifications. So, he asked, what is the right way to register BPF-specific ELF information (like the BPF CPU identifier in the ELF headers)?
The consensus among the audience was that System V was pretty much defunct, and that sending an email claiming the BPF CPU ID to the System V mailing list should be sufficient. With that, the session came to a close.
| Index entries for this article | |
|---|---|
| Kernel | BPF/Standardization |
| Conference | Storage, Filesystem, Memory-Management and BPF Summit/2024 |
Posted Jun 1, 2024 4:34 UTC (Sat)
by alison (subscriber, #63752)
[Link] (5 responses)
Posted Jun 2, 2024 17:19 UTC (Sun)
by willy (subscriber, #9762)
[Link] (4 responses)
https://github.com/vbpf/ebpf-verifier
It does talk about using sudo, but I'm pretty sure that's just to run the Linux verifier as a contrast to PREVAIL.
Posted Jun 2, 2024 17:35 UTC (Sun)
by alison (subscriber, #63752)
[Link] (3 responses)
Posted Jun 2, 2024 22:55 UTC (Sun)
by intelfx (subscriber, #130118)
[Link] (2 responses)
Isn't "Docker on Windows or MacOS" simply a Linux VM under the hood?
(That is not to say that PREVAIL needs Linux kernel. At a glance, it looks like a purely userspace solution.)
Posted Jun 3, 2024 0:02 UTC (Mon)
by alison (subscriber, #63752)
[Link]
On Linux, Docker is a container, so it's running the host's Linux kernel (with all the security implications thereof). Presumably therefore a Linux Docker container won't run on other hosts, but my ignorance of Windows and MacOs is total.
-- Alison, about to hit new LWN comment limit
Posted Jun 3, 2024 10:44 UTC (Mon)
by anselm (subscriber, #2796)
[Link]
On Linux, Docker containers are glorified chroot environments. On a non-Linux machine, you need to somehow bring in the underlying Linux bits that support the glorified chroot environment, and a VM is one reasonable way of doing this.
Posted Jun 4, 2024 11:09 UTC (Tue)
by joib (subscriber, #8541)
[Link] (1 responses)
Posted Jun 5, 2024 12:14 UTC (Wed)
by foom (subscriber, #14868)
[Link]
In all other contexts, use a WASM VM.
PREVAIL verifier question
PREVAIL verifier question
PREVAIL verifier question
PREVAIL verifier question
PREVAIL verifier question
PREVAIL verifier question
Isn't "Docker on Windows or MacOS" simply a Linux VM under the hood?
Standardizing the BPF ISA
Standardizing the BPF ISA
