Supporting BPF in GCC
The GCC project has been working to support compiling to BPF for some time. José Marchesi and David Faust spoke in an extended session at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit about how that work has been going, and what is left for GCC to be on par with LLVM with regard to BPF support. They also related tentative plans for how GCC BPF support would be maintained in the future.
Marchesi started with a brief overview of some of the recent work in GCC. In December 2023, the project rewrote the BPF-generation code to not use GCC's venerable CGEN library, which generates code generators from a description of the CPU. Marchesi said that the hand-written implementation of BPF code generation is much better; CGEN is abstract and concise, but BPF is "so weird [...] that torturing CGEN into supporting it was challenging".
GCC has also added support for BPF's pseudo-C syntax (a representation of BPF assembly that looks more like C than the other BPF representation, which resembles a traditional assembly language), BPF v4 instructions, converting short jumps into long jumps where appropriate, and platform-specific flags. GCC now puts the version of the BPF CPU into the platform-specific flags of the ELF object it produces. The disassembler uses that information to show the correct version of instructions, and readelf displays that information when inspecting an ELF file. Marchesi asked whether LLVM recorded that information anywhere. LLVM developer Yonghong Song replied that it didn't. Marchesi asked whether Song objected to recording version information in this way; Song did not.
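As a rough illustration of what a tool can do with that information, the sketch below reads the CPU version back out of an ELF header. It assumes a Linux system with <elf.h>, a header already loaded into memory in host byte order, and, importantly, that the version lives in the low four bits of e_flags; that mask (SKETCH_BPF_CPUVER_MASK) and the helper name bpf_cpu_version() are assumptions made for this example and should be checked against the binutils headers.

```c
#include <elf.h>
#include <string.h>

/* Older C libraries may not define the BPF machine number. */
#ifndef EM_BPF
#define EM_BPF 247
#endif

/* ASSUMPTION for this sketch: the BPF CPU version occupies the low
 * four bits of e_flags. Verify against binutils before relying on it. */
#define SKETCH_BPF_CPUVER_MASK 0x0000000fu

/* Hypothetical helper: return the recorded BPF CPU version, or -1 if
 * the header does not describe a BPF ELF object. */
static int bpf_cpu_version(const Elf64_Ehdr *eh)
{
    /* Reject anything that does not start with the ELF magic bytes. */
    if (memcmp(eh->e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    /* Only BPF objects carry a BPF CPU version. */
    if (eh->e_machine != EM_BPF)
        return -1;
    return (int)(eh->e_flags & SKETCH_BPF_CPUVER_MASK);
}
```

A value of zero would presumably mean that no version was recorded, which is what a consumer should expect from objects built by a compiler that does not set the flags.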
Marchesi continued the list of recent changes to GCC. He noted that bpf-helpers.h (a header file providing some macros for writing portable BPF programs) had been removed, since GCC now supports BPF's special three-underscore type suffixes. GCC also supports "compile once — run everywhere" (CO-RE), where the user-space BPF loader performs relocations on the program before loading it. CO-RE is now enabled by default.
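The three-underscore suffixes are typically used with CO-RE to declare several "flavors" of one kernel type, such as task_struct___old; a loader like libbpf ignores the suffix when matching the type against the running kernel. The sketch below shows one simplified way such a comparison could work. The helper names are hypothetical, and the rule used here (cut at the first "___") is a simplification for illustration, not libbpf's actual implementation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of NAME once any "___flavor" suffix
 * has been cut off. */
static size_t essential_name_len(const char *name)
{
    const char *sep = strstr(name, "___");

    /* No separator, or a separator at the very start (which would
     * leave no base name): the whole string is the essential name. */
    if (sep == NULL || sep == name)
        return strlen(name);
    return (size_t)(sep - name);
}

/* Hypothetical helper: do two type names refer to the same type once
 * flavor suffixes are ignored? */
static bool same_essential_name(const char *a, const char *b)
{
    size_t la = essential_name_len(a);
    size_t lb = essential_name_len(b);

    return la == lb && strncmp(a, b, la) == 0;
}
```

With this rule, task_struct___old and task_struct___v2 both match the kernel's task_struct, so a program can carry two incompatible definitions of the same type side by side.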
That isn't the only change that brings GCC's output closer to LLVM's; GCC now produces BPF Type Format (BTF) debugging information by default (when debugging information is enabled). It also emits pseudo-C BPF code by default, a syntax that, in Marchesi's words, "I despise with my whole soul". A lot of inline assembly in BPF programs uses the pseudo-C syntax, however, so GCC has to support it.
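To give a feel for what pseudo-C inline assembly looks like, here is a minimal sketch. The function add_one() is made up for this example; the BPF branch is an illustration that has not been checked against either compiler, and the non-BPF branch exists only so that the snippet also builds on an ordinary host.

```c
#include <stdint.h>

/* Hypothetical example: increment a value, using inline assembly when
 * compiling for BPF. */
static inline uint64_t add_one(uint64_t x)
{
#if defined(__bpf__) || defined(__BPF__)
    /* Pseudo-C syntax: the instruction reads like a C statement.
     * In the traditional assembly-style syntax, the same operation
     * would be written along the lines of "add %0, 1". */
    __asm__ volatile("%0 += 1" : "+r"(x));
#else
    /* Host fallback so the sketch can be compiled and run anywhere. */
    x += 1;
#endif
    return x;
}
```

Since existing BPF programs embed strings like the one above directly in their source, a compiler that only understood the traditional syntax could not build them unmodified.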
Another place where Marchesi had questions for the assembled developers was around support for memmove(), memcpy(), and memset(). On most platforms, GCC generates a call to the library implementing the C language runtime. This isn't possible in BPF (which lacks run-time libraries) so currently GCC inlines the functions instead. Unfortunately, BPF also doesn't have unrestricted loops, so this only works when the loops inside the functions can be unrolled. But that can cause quite large code when operating on large structures, possibly much more than programmers are expecting.
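The effect of inlining and unrolling can be sketched in plain C. The struct and function names below are invented for this example; the second function only approximates what an unrolled expansion of the first amounts to, and the instructions a compiler really emits will differ.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical 24-byte structure. */
struct sample {
    uint64_t a;
    uint64_t b;
    uint64_t c;
};

/* What the programmer writes: a single call. */
static void copy_plain(struct sample *dst, const struct sample *src)
{
    memcpy(dst, src, sizeof(*dst));
}

/* Roughly what a fully unrolled inline expansion looks like: one
 * load/store pair per 8-byte word, with no loop and no call. */
static void copy_unrolled(struct sample *dst, const struct sample *src)
{
    dst->a = src->a;
    dst->b = src->b;
    dst->c = src->c;
}
```

Each additional eight bytes in the structure adds another load/store pair, so a copy of a few kilobytes expands into hundreds of instructions even though the source contains only one line.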
GCC has a new option to emit an error if inlining these functions will produce code larger than a user-specified threshold, but Marchesi wanted to know how Clang handles this. Song indicated that Clang does the same inlining, but currently has a hard-coded limit for the size of the generated code. Marchesi suggested that, if the Clang developers ever wish to make the limit user-configurable, they adopt the same option name as GCC.
David Vernet suggested that GCC could perhaps emit BPF loops using bounded iterators (which the Linux verifier understands, but which some other BPF implementations do not) on platforms that support it. Marchesi agreed that this was possible, saying that the compiler doesn't really care what the generated code looks like as long as it can be verified.
Marchesi then went on to say that GCC now defines the same BPF feature macros — based on whether a particular class of instructions is available on a given BPF CPU version — that LLVM does. He asked the room whether those feature macros were covered by the work being done to standardize BPF, saying that now that GCC implements them, they need to be documented in the GCC manual, but he was unsure if there was an authoritative source to refer to. Dave Thaler indicated that he would talk about that in his session, right after the current one. That session was about the recent efforts by the IETF working group to standardize the BPF ISA, including standardizing ways for BPF implementations to advertise different optional functionality to compliant compilers.
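As an illustration of how a program might use such a macro, the sketch below selects between a native signed division and a fallback built on unsigned division. The function sdiv64() is made up for this example, and __BPF_FEATURE_SDIV_SMOD is used on the assumption that it is the macro advertising signed division and modulo; that name should be confirmed against the compiler documentation.

```c
#include <stdint.h>

/* Hypothetical helper: signed 64-bit division, truncating toward zero.
 * The caller must ensure b != 0. */
static int64_t sdiv64(int64_t a, int64_t b)
{
#ifdef __BPF_FEATURE_SDIV_SMOD
    /* The target CPU version has signed division instructions. */
    return a / b;
#else
    /* Fallback: divide the magnitudes as unsigned values, then
     * restore the sign. Unsigned negation avoids overflow. */
    uint64_t ua = a < 0 ? 0 - (uint64_t)a : (uint64_t)a;
    uint64_t ub = b < 0 ? 0 - (uint64_t)b : (uint64_t)b;
    uint64_t q = ua / ub;

    return (a < 0) != (b < 0) ? (int64_t)(0 - q) : (int64_t)q;
#endif
}
```

Because both compilers define the same macros, a source file written this way should not need to know which compiler is building it.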
Having covered many small compatibility features, Marchesi now arrived at "the exciting part" of the session. With all of this work combined, GCC now compiles 100% of the kernel's BPF self-tests, as well as the BPF components of several other projects such as systemd and DTrace. There are still 108 run-time failures in the kernel self-tests, but "it looks like GCC is actually generating code that can be verified".
This news was well received, and a member of the audience suggested it might now be appropriate to add GCC to the BPF continuous integration (CI) system to prevent regressions; Marchesi agreed, asking whether it was worth having test runners for different BPF CPU versions. The consensus seemed to be that it was not. Marchesi indicated that the next milestone for GCC would be to actually eliminate those run-time failures.
Marchesi indicated that the work he had been discussing was currently not in a GCC release, but that it would be incorporated into the next binutils release in June or July, and the GCC 14.2 release in August.
The future
At this point, Faust took over and the session turned to the future of BPF development in GCC. First on the agenda was improvements to inline assembly.
Inline assembly isn't as simple as just dropping some assembly code directly into the compiler's output. When information needs to pass between C and assembly, the programmer needs to indicate which registers correspond to which variables. Currently, GCC warns about using a variable that is shorter than the given register, even if the assembly code never actually touches the upper part of the register. To fix this, Faust proposed adding "w" and "R" register suffixes to indicate the lower 32 bits or full 64 bits of a register, respectively. These could also be used with immediate values, to indicate how large an immediate value GCC should use when assembling the output. Faust asked the audience what they thought of that design, and there were no particular objections.
Other future work includes following the formalized memory model, once the IETF standardization process actually produces one; adding support for BPF's may-goto instruction; and pruning excess BTF debug information. GCC's BTF support also needs a few improvements, because it currently doesn't work alongside link-time optimization. Binutils is also missing support for BTF, meaning that BTF doesn't show up in objdump, nor is it deduplicated by the linker like other debugging-information formats are.
Improving GCC's BTF support is no easy task, however. Internally, GCC treats DWARF as the canonical debugging-information format, and generates BTF from that, Faust explained. One audience member asked whether that was really the case — does GCC not have an internal representation for debugging information? Marchesi clarified that GCC actually uses a slightly tweaked version of DWARF that he called "internal DWARF", but that otherwise GCC really is limited to what can be represented in DWARF. Unfortunately, the upstream DWARF developers are pretty resistant to accepting new features found in BTF, such as type and declaration information. They believe that doing so would bloat the DWARF format, for no real gain.
Vernet noted that DWARF is already a fairly heavy format, so it's funny that size would be the basis of the objection. Marchesi elaborated that the way DWARF is designed makes it nearly impossible to extend without breaking backward compatibility, which means adding new features requires hacky workarounds that introduce extra bloat. Faust noted that implementing BTF type tags in the natural way would cause any DWARF reader that didn't know how to deal with them to be unable to parse the file.
Marchesi then turned the topic away from the new planned features, and toward the ongoing maintenance of BPF support. He began by stating that the GCC developers take producing verifiable programs seriously, a challenging prospect since both the BPF verifier and GCC are moving targets. Ideally, Marchesi said, the GCC developers would like to avoid people needing to make private forks or to maintain and package one toolchain per kernel. In order to do that, GCC needs to adopt a maintenance process that works for BPF.
The GCC developers are considering introducing a special maintenance branch for BPF where bug fixes are applied, but only those that seem unlikely to interfere with producing verifiable programs. He emphasized that this is just an idea, not yet set in stone, and asked everyone else what they thought about the issue.
Alexei Starovoitov noted that the BPF CI already catches similar regressions in LLVM, usually with plenty of time to fix them before a release. He also said: "I don't think we've ever had a case" of a bug fix breaking existing BPF code's verifiability. Marchesi asked how often Clang is released, noting that GCC is released once a year. Song said that Clang has two releases per year. Starovoitov said that once GCC is covered by the CI, the GCC developers will have plenty of time to fix any issues. He thought dedicated maintenance branches sounded nice, but that they were probably not worth the effort.
Thaler said that there was a problem with compilers breaking BPF programs — with eBPF for Windows. Vernet replied that people who need to worry about that should be running Clang in their CI and catching it early. Thaler said that this doesn't fix the problem — if they upgrade the compiler and the tests fail, then they remain stuck on an older version.
Another audience member brought up a different use case for an out-of-tree compiler version: Android. They noted that Android user-space software (including BPF loaders and BPF programs) can remain on a device for years. This would not be as much of an issue if Android were not also working to update kernels more often. They said that this had required Android to stick with a specific Clang version, one requiring out-of-tree patches. Marchesi asked whether, having had that experience, they had suggestions for how to make the situation better. The audience member replied that they hadn't thought about it. Marchesi asked everyone to please let him know if they did come up with any clever solutions.
Marchesi then went through some last notes before time for the session ran out. He announced that the Godbolt Compiler Explorer now supports GCC BPF, and then briefly covered GCC's support for non-C languages compiling to the BPF backend. In short, it may work, but isn't supported. "In practice, every BPF program needs BTF", which isn't available for other languages.
Thaler noted that this wasn't really true, because eBPF for Windows doesn't use "call by BTF-id", the core instruction that makes BTF mandatory, which the Linux kernel uses to call kfuncs. Vernet asked whether BTF could be extended to support other languages. Marchesi said that he had spoken to some Rust developers last year, and that they had said you would need to be careful to only use the subset of the type system that BTF can express. Vernet asserted that the IETF standardization process should probably consider non-C languages when it gets around to standardizing BTF.
At that point the session was threatening to put the BPF track, which had previously been running on time, behind schedule. After a few more quick questions from the audience, the session wrapped up.
