BTF, Rust, and the kernel toolchain
BPF Type Format (BTF), BPF's debugging information format, has changed rapidly to keep up with the needs of BPF programs. José Marchesi spoke at Kangrejos about some of that work and how it could affect Rust specifically. He discussed debug information, kernel-specific relocations, and the planned changes to kernel stack unwinding. Each of these will require some amount of work to fully support in Rust, but preliminary signs look promising.
BTF
Marchesi described BTF as a format to denote the compiled form of C types. He said that it was similar to DWARF, but "way, way simpler". BTF is designed for a particular use case: efficient, online operations on C types and functions as they exist in memory. DWARF information is concerned with mapping debugging information to the source-level constructs of a programming language; BTF is concerned with what is in the compiled object and "not much related to the source language". At run time, this information is used by BPF programs to access kernel structures correctly, among other uses.
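To give a sense of how simple the format is, here is a minimal sketch that decodes the fixed-size header found at the start of a BTF blob. The field layout follows `struct btf_header` from the kernel's UAPI headers; the Rust names `BtfHeader` and `parse_btf_header` are invented for this illustration, and the sketch assumes a little-endian blob.

```rust
/// Fixed-size header at the start of a BTF blob. The fields mirror
/// `struct btf_header` in the kernel's include/uapi/linux/btf.h.
#[derive(Debug, PartialEq)]
pub struct BtfHeader {
    pub magic: u16,
    pub version: u8,
    pub flags: u8,
    pub hdr_len: u32,
    /// Offset and length of the type section, relative to the end of the header.
    pub type_off: u32,
    pub type_len: u32,
    /// Offset and length of the string section, relative to the end of the header.
    pub str_off: u32,
    pub str_len: u32,
}

/// Magic number identifying BTF data.
pub const BTF_MAGIC: u16 = 0xeb9f;

/// Decode the 24-byte header from the front of `data`.
/// Assumption for this sketch: the blob is little-endian.
pub fn parse_btf_header(data: &[u8]) -> Option<BtfHeader> {
    if data.len() < 24 {
        return None;
    }
    let u32_at = |i: usize| u32::from_le_bytes([data[i], data[i + 1], data[i + 2], data[i + 3]]);
    let magic = u16::from_le_bytes([data[0], data[1]]);
    if magic != BTF_MAGIC {
        return None;
    }
    Some(BtfHeader {
        magic,
        version: data[2],
        flags: data[3],
        hdr_len: u32_at(4),
        type_off: u32_at(8),
        type_len: u32_at(12),
        str_off: u32_at(16),
        str_len: u32_at(20),
    })
}
```

On a kernel built with BTF, the bytes could come from /sys/kernel/btf/vmlinux; everything after the header is a type section and a string section, which is most of what there is to the format.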
The process of generating BTF for a given kernel is somewhat tortured. When the kernel is compiled with BTF support, it is built with DWARF information. Then pahole converts the DWARF to BTF. One consequence of this approach is that BTF can only include information that is also present in DWARF — a problem for some of the kernel's structure attributes that aren't properly represented, so Marchesi is working toward being able to generate BTF directly. This is already mostly working in GCC, but the kernel is not yet built that way.
When the C compiler does start producing BTF directly, though, it will cause problems for the parts of the kernel written in Rust: the Rust compiler will also need to generate BTF. There are benefits to having Rust generate it as well — BTF could be used for genksyms, the tool that generates lists of kernel symbols to check loadable module compatibility — but it will certainly require some work as well.
The Rust compiler will not have to start from scratch, Marchesi said. People do already write BPF programs in Rust, and LLVM emits "correct-enough BTF". "But that's not by design," he warned, just a result of supporting BTF for C. Properly supporting BTF for Rust will mean making sure it lines up with the BTF generated for the rest of the kernel, that it works even for obscure corner cases, and that it can fully capture the richness of Rust types.
Right now, pahole is sidestepping the issue by ignoring the DWARF generated for Rust code and creating no BTF from it. This has already caused problems for some users. Carlos Bilbao asked whether anyone had tried generating BTF from a program written in a mix of C and Rust to see what the problem is. Marchesi explained that Rust generates DWARF with some structures that pahole doesn't support. Miguel Ojeda expanded on that, saying that Rust uses some DWARF types that were originally introduced for C++ support, which pahole therefore does not yet handle.
Björn Roy Baron and Gary Guo listed some problems with Rust enums that might apply to BTF. In particular, Rust enums are more like tagged unions in C — they have a discriminant and then a set of fields. The Rust compiler doesn't guarantee any particular representation, however; it uses this freedom to optimize some types to take less space. For example, Option<T> is an enum that contains either None or a value of type T. When values of type T can never be zero, the compiler can save the space needed by the enum tag by using zero to represent None.
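The space optimization described above can be observed directly. The following is a small sketch using the standard library's `NonZeroU32`; the helper name `option_sizes` is made up for the example.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Returns the sizes, in bytes, of `u32`, `Option<NonZeroU32>`,
/// and `Option<u32>`, in that order.
pub fn option_sizes() -> (usize, usize, usize) {
    (
        size_of::<u32>(),
        // Zero is not a valid NonZeroU32, so the compiler reuses that bit
        // pattern for `None` and needs no separate tag.
        size_of::<Option<NonZeroU32>>(),
        // Every bit pattern of u32 is a valid value, so `None` needs
        // extra space for a discriminant.
        size_of::<Option<u32>>(),
    )
}
```

The first two sizes are equal, while the third is larger. A BTF description of `Option<NonZeroU32>` would therefore have no tag field to point at, which is the kind of case any BTF encoding of Rust enums would need to account for.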
This means that unlike structures, which can be annotated with #[repr(C)] to instruct the compiler to lay them out exactly like C structures, native Rust enums can't be forced to have a stable layout. The Rust compiler can, each time it is run, choose a different layout for each enum. In practice, a given version of the compiler always uses the same layout, but that isn't guaranteed. If BTF needs to refer to enum types, that freedom could complicate the implementation.
Marchesi also highlighted the difficulty that link-time optimization (LTO) poses. DWARF distinguishes between different compilation units, whereas BTF does not. So name clashes across compilation units are potentially a problem for using BTF in an LTO build of the kernel. Alice Ryhl raised a different problem — LTO can inline Rust code into C compilation units, meaning that the DWARF info can be mixed. That causes a problem for LTO builds today, since pahole can't handle the mixed DWARF info.
CO-RE
After laying out his basic concerns, Marchesi raised the topic of compile once - run everywhere (CO-RE), the approach that lets the kernel load BPF programs without requiring an exact match between the kernel headers the program was compiled against and the running kernel. In order to make this work, the compiler for the BPF program needs to take some special steps. In C, an attribute called preserve_access_index causes the compiler to generate loads and stores in a way that can be patched, and a relocation entry that tells the loader how to patch the program if the layout of the structure has changed from a different version of the kernel. Both GCC and LLVM have support for CO-RE; Marchesi wanted to know if the same approach made sense for Rust, given that the compiler can reorder fields of Rust structures (that aren't marked as using the C layout).
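The patching step can be modeled in a few lines. This is a conceptual sketch only, not the actual loader logic or the real relocation encoding: `FieldLoad`, `Reloc`, `Layout`, and `apply_relocs` are hypothetical names, and real CO-RE relocations identify fields through BTF type IDs and access strings rather than plain names.

```rust
use std::collections::HashMap;

/// One field access in the compiled program. `offset` starts out as the
/// value the compiler derived from the headers it was built against.
#[derive(Debug, PartialEq)]
pub struct FieldLoad {
    pub offset: u32,
}

/// Hypothetical relocation record: instruction `insn` reads
/// field `field` of type `type_name`.
pub struct Reloc {
    pub insn: usize,
    pub type_name: &'static str,
    pub field: &'static str,
}

/// Field offsets in the running kernel, keyed by (type, field).
/// In reality this information would come from the kernel's BTF.
pub type Layout = HashMap<(&'static str, &'static str), u32>;

/// Rewrite every relocated access to use the running kernel's offsets.
/// Fails if a field is missing from the target or a record is out of range.
pub fn apply_relocs(
    prog: &mut [FieldLoad],
    relocs: &[Reloc],
    target: &Layout,
) -> Result<(), String> {
    for r in relocs {
        let new_off = target
            .get(&(r.type_name, r.field))
            .ok_or_else(|| format!("{}.{} not found in target", r.type_name, r.field))?;
        let insn = prog
            .get_mut(r.insn)
            .ok_or_else(|| format!("relocation points past program end: {}", r.insn))?;
        insn.offset = *new_off;
    }
    Ok(())
}
```

The point of the model is that the program never hard-codes a layout it can rely on: each access is paired with a record describing which field it meant, so the loader can recompute the offset for the kernel that is actually running.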
Andreas Hindborg thought that support like that would be great to have in Rust, since it could potentially allow for linking object files from different compilers — something that currently requires explicitly using the C calling convention, since Rust lacks a stable ABI of its own. He did have some questions about how it could work in practice, however, including what happens if a BPF program is built against an incompatible version of the kernel headers.
"Nothing good", Marchesi answered. But in the case of BPF, the verifier would complain about any bad accesses. After some discussion, during which Ojeda and Guo clarified some details of Rust's layout semantics, Marchesi suggested that perhaps a good first step would be generating CO-RE relocations only for #[repr(C)] structures. Guo questioned how that would interact with the offset_of!() macro, which can be used to find the offset of a field within a structure. Marchesi explained that the value would have to change with the relocation, but that this meant that any math that depended on the offset would be broken. Baron suggested that this might require an opaque wrapper type to prevent things from breaking.
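Guo's concern about offset_of!() can be illustrated with a short sketch. The structure `Sample` and the constants are hypothetical, and `std::mem::offset_of!` requires Rust 1.77 or later.

```rust
use std::mem::{offset_of, size_of};

/// A structure laid out exactly as a C compiler would lay it out,
/// making it a candidate for the "#[repr(C)] only" first step.
#[repr(C)]
pub struct Sample {
    pub id: u32,
    pub stamp: u64,
    pub level: u8,
}

/// Offset of `stamp`, fixed by the compiler at build time.
pub const STAMP_OFFSET: usize = offset_of!(Sample, stamp);

/// A value derived from that offset. If a loader later relocated
/// STAMP_OFFSET to match a different layout, this already-folded
/// constant would not follow it.
pub const STAMP_END: usize = STAMP_OFFSET + size_of::<u64>();
```

Because both constants are evaluated at compile time, there is no instruction left for a relocation to patch by the time the program is loaded. That is presumably why an opaque wrapper type, which would prevent arithmetic on the raw offset, came up as a possible safeguard.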
Unwinding
Marchesi had one last topic: the potential switch from ORC to SFrame for stack unwinding in the kernel. He wanted to check that the switch would not cause problems for the Rust parts of the kernel. Guo assured him that Rust does support unwinding, currently with the same DWARF-based methods that C programs largely use. The important part is that compiled functions have unwinding information that matches what the C code does, so any potential compiler change might work out of the box. Marchesi called that "very good news", and wrapped up the session on a positive note.
Overall, BTF is unlikely to pose insurmountable challenges to the inclusion of Rust in the Linux kernel, but there are some areas that will need additional work. At the least, there will need to be testing for LLVM's BTF support, for applying CO-RE to the Rust parts of the kernel, and for ensuring that Rust's unwinding support remains working. Some of those areas may also need additional attention to ensure that the kernel can continue working smoothly as a conglomerate of C, BPF, and Rust.
Index entries for this article | |
---|---|
Conference | Kangrejos/2024 |
Posted Oct 2, 2024 18:43 UTC (Wed)
by roc (subscriber, #30627)
(5 responses)
Couldn't they fix the DWARF instead? Having accurate DWARF is still useful.
Posted Oct 2, 2024 19:19 UTC (Wed)
by daroc (editor, #160859)
(4 responses)
Posted Oct 2, 2024 21:23 UTC (Wed)
by roc (subscriber, #30627)
(3 responses)
I won't claim its extensibility has been particularly *well* designed, but it would be good to know what exactly is the problem here.
Posted Oct 4, 2024 15:02 UTC (Fri)
by jemarch (subscriber, #116773)
(2 responses)
Now, it may seem that the perfect solution on the DWARF side would be to create a new DW_TAG_annotated_type DIE and link it in the DW_AT_type chains. This would indeed reflect the intended semantics perfectly, and it would also be easy to implement. Unfortunately, the way DWARF is designed, it would also break all DWARF readers in existence. You can't add a new kind of link to this chain in a way that existing readers would just mindlessly skip. It would have been nice if DWARF had provided us with a DW_TAG_nop_type DIE, but it doesn't.
David Faust managed to find a backwards-compatible way to work around this particular situation, which seems to satisfy all involved parties (kernel, GCC, clang), but it is necessarily convoluted and ugly. It involves the creation of a new kind of DIE _and_ of a new DW_AT_annotation, and while it allows a great deal of node sharing in the DIE tree, it also leads to some duplication of data.
It is taking more than one year of discussions and several implementation attempts to get the tags in DWARF.
BTF got them in a day.
Posted Oct 4, 2024 19:58 UTC (Fri)
by intelfx (subscriber, #130118)
So it *is* extensible, just not backward-compatibly?
Posted Oct 5, 2024 4:13 UTC (Sat)
by roc (subscriber, #30627)
> It is taking more than one year of discussions and several implementation attempts to get the tags in DWARF.
Yes, the DWARF standardization process is a mess.
> BTF got them in a day.
Yeah, I understand that it's very attractive to have your own format that you control. (Of course if it lasts a long time, producers and consumers will multiply and eventually you'll have to have your own standards process, stability issues, etc.)
I'm a bit concerned that as consumers of compiler metadata proliferate over time and demand more of their own formats, the compiler maintenance burden will grow and so will the overhead at build time. It looks like in the future there will be projects that need to support debugging as well as tools that consume BTF and SFrame so will have to build with DWARF + BTF + SFrame + who knows what else, duplicating work and data.
Posted Oct 2, 2024 22:24 UTC (Wed)
by sam_c (subscriber, #139836)
[0] https://lore.kernel.org/linux-modules/20240923181846.5498...
Posted Oct 3, 2024 3:00 UTC (Thu)
by jhoblitt (subscriber, #77733)
(3 responses)
Posted Oct 3, 2024 10:43 UTC (Thu)
by wahern (subscriber, #37304)
(2 responses)
Posted Oct 3, 2024 16:03 UTC (Thu)
by willy (subscriber, #9762)
(1 response)
Posted Oct 4, 2024 3:23 UTC (Fri)
by sam_c (subscriber, #139836)
Posted Oct 4, 2024 11:23 UTC (Fri)
by jemarch (subscriber, #116773)
(2 responses)
One of the inconveniences of having to generate the BTF from DWARF is that it forces us to convey in the DWARF all the compiler-generated information we want in the BTF. pahole already uses sources other than the kernel DWARF to compose the final BTF, but for things like source code annotations to distinguish between kernel pointers and userland pointers, it is the compiler that needs to provide that information. It is not that DWARF is not powerful enough nor that it is broken: it is simply that it is in the way, for no good reason that I can see, given that both GCC and clang/llvm can already generate BTF directly.
Regarding unwinding, my goal at Kangrejos was to figure out whether Rust compiled code works well in both ORC (i.e. whether objtool is able to reverse-engineer the CFI for Rust compiled functions) and SFrame (i.e. whether the Rust compiler generates the proper cfi assembler directives). The answer was a rotund YES. So nothing seems to be lacking for Rust on that side.
Posted Oct 5, 2024 3:56 UTC (Sat)
by roc (subscriber, #30627)
(1 response)
Posted Oct 5, 2024 8:15 UTC (Sat)
by Wol (subscriber, #4433)
(hint - it's a TV program ...)
Cheers,
Posted Oct 30, 2024 8:10 UTC (Wed)
by vadorovsky (guest, #171932)
The first part, which is sanitizing the LLVM DebugInfo so that it produces BTF acceptable to the kernel, is already done.[2] It comes with the following quirks:
- Support of anonymous structs, which are needed for BTF maps, but are not supported by Rust. Maps can be anonymized using a marker called `AyaBtfMapMarker` (which is basically an alias for `PhantomData<()>`).
The remaining part, which is being worked on, is adding BTF relocations - an equivalent of BPF_CORE_READ/__builtin_preserve_access_index. There is an issue describing the steps.[3] The plan is to replace GEP+load instructions with @llvm.preserve.[...].access.index intrinsic calls. There will be a pull request with that work really soon.
[0] https://github.com/aya-rs/bpf-linker
fix the DWARF
Avoiding DWARF parsing...
other uses?
Couple of small clarifications
Wol
BTF support in Aya
- Skipping data-carrying enums. This is fine as long as there are no kernel modules actually using them.
- Sanitizing names of types with generics - type names like `MyType<u32>` are not correct BTF, but we sanitize them to names like `MyType_3C__5B_u32_5D__3E_` in a deterministic way (hex representations of all problematic characters).
[1] https://aya-rs.dev/
[2] https://github.com/aya-rs/bpf-linker/pull/182
[3] https://github.com/aya-rs/aya/issues/349