BTF, Rust, and the kernel toolchain

By Daroc Alden
October 2, 2024

Kangrejos 2024

BPF Type Format (BTF), BPF's debugging information format, has evolved rapidly to keep up with the changing needs of BPF programs. José Marchesi spoke at Kangrejos about some of that work and, specifically, how it could affect Rust. He discussed debug information, kernel-specific relocations, and the planned changes to kernel stack unwinding. Each of these will require some amount of work to fully support in Rust, but preliminary signs look promising.

BTF

Marchesi described BTF as a format to denote the compiled form of C types. He said that it was similar to DWARF, but "way, way simpler". BTF is designed for a particular use case: efficient, online operations on C types and functions as they exist in memory. DWARF information is concerned with mapping debugging information to the source-level constructs of a programming language; BTF is concerned with what is in the compiled object and "not much related to the source language". At run time, this information is used by BPF programs to access kernel structures correctly, among other uses.
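
To give a sense of how small the format is, here is a minimal Rust sketch of decoding a single BTF type record. The three 32-bit words and the bit positions inside `info` follow the kernel's BTF documentation; the Rust names (`BtfType`, `make_info`, `kind_name`) are invented for illustration and do not come from any existing crate.

```rust
/// A few kind numbers from the kernel's BTF documentation (not the full list).
const BTF_KIND_INT: u32 = 1;
const BTF_KIND_PTR: u32 = 2;
const BTF_KIND_STRUCT: u32 = 4;

/// Mirrors the three 32-bit words of the kernel's `struct btf_type`.
/// The last word is a size for some kinds and a referenced type ID for others.
#[repr(C)]
struct BtfType {
    name_off: u32,     // offset of the name in the BTF string section
    info: u32,         // packed: vlen (bits 0-15), kind (bits 24-28), kind_flag (bit 31)
    size_or_type: u32, // meaning depends on the kind
}

impl BtfType {
    /// Which sort of type this record describes.
    fn kind(&self) -> u32 {
        (self.info >> 24) & 0x1f
    }

    /// Number of trailing entries, such as struct members.
    fn vlen(&self) -> u32 {
        self.info & 0xffff
    }

    /// Kind-specific flag stored in the top bit.
    fn kind_flag(&self) -> bool {
        (self.info >> 31) != 0
    }
}

/// Illustrative helper: packs the three fields back into an `info` word.
fn make_info(kind: u32, vlen: u32, kind_flag: bool) -> u32 {
    ((kind_flag as u32) << 31) | ((kind & 0x1f) << 24) | (vlen & 0xffff)
}

/// Illustrative helper: human-readable name for the kinds listed above.
fn kind_name(kind: u32) -> &'static str {
    match kind {
        BTF_KIND_INT => "int",
        BTF_KIND_PTR => "ptr",
        BTF_KIND_STRUCT => "struct",
        _ => "other",
    }
}
```

A record whose `info` is `make_info(BTF_KIND_STRUCT, 3, false)` describes a structure with three members; there are no source locations, scopes, or language-level attributes to decode.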

The process of generating BTF for a given kernel is somewhat tortured. When the kernel is compiled with BTF support, it is built with DWARF information. Then pahole converts the DWARF to BTF. One consequence of this approach is that BTF can only include information that is also present in DWARF — a problem for some of the kernel's structure attributes that aren't properly represented, so Marchesi is working toward being able to generate BTF directly. This is already mostly working in GCC, but the kernel is not yet built that way.

When the C compiler does start producing BTF directly, though, it will cause problems for the parts of the kernel written in Rust: the Rust compiler will also need to generate BTF. There are benefits to having Rust generate it too (BTF could be used for genksyms, the tool that generates lists of kernel symbols to check loadable module compatibility), but doing so will certainly require some work.

The Rust compiler will not have to start from scratch, Marchesi said. People do already write BPF programs in Rust, and LLVM emits "correct-enough BTF". "But that's not by design," he warned, just a result of supporting BTF for C. Properly supporting BTF for Rust will mean making sure it lines up with the BTF generated for the rest of the kernel, that it works even for obscure corner cases, and that it can fully capture the richness of Rust types.

Right now, pahole is sidestepping the issue by just ignoring DWARF generated for Rust code, not creating BTF from it. This has already caused problems for some users. Carlos Bilbao asked whether anyone had tried generating BTF from a program written in a mix of C and Rust to see what the problem is. Marchesi explained that Rust generates DWARF with some structures that pahole doesn't support. Miguel Ojeda expanded on that, saying that Rust uses some DWARF types that were originally introduced for C++ support and that pahole therefore does not handle.

Björn Roy Baron and Gary Guo listed some problems with Rust enums that might apply to BTF. In particular, Rust enums are more like tagged unions in C — they have a discriminant and then a set of fields. The Rust compiler doesn't guarantee any particular representation, however; it uses this freedom to optimize some types to take less space. For example, Option<T> is an enum that contains either None or a value of type T. When values of type T can never be zero, the compiler can save the space needed by the enum tag by using zero to represent None.
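
The Option<T> case can be observed directly. The following is a small sketch using the standard library's `NonZeroU32` type, for which the space-saving representation is documented; the function names are illustrative only.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Size of `Option<u32>`: every bit pattern of `u32` is a valid value,
/// so the compiler has to store the discriminant somewhere else.
fn size_with_separate_tag() -> usize {
    size_of::<Option<u32>>()
}

/// Size of `Option<NonZeroU32>`: zero is never a valid payload,
/// so that bit pattern is free to stand for `None`.
fn size_with_niche() -> usize {
    size_of::<Option<NonZeroU32>>()
}

/// Converts a raw integer into the optional form; zero maps to `None`.
fn from_raw(raw: u32) -> Option<NonZeroU32> {
    NonZeroU32::new(raw)
}
```

Here `size_with_niche()` is four bytes, the same as a bare `u32`, while `size_with_separate_tag()` is necessarily larger. A BTF description of such a type would have to encode "zero means None" rather than pointing at a separate tag field.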

This means that unlike structures, which can be annotated with #[repr(C)] to instruct the compiler to lay them out exactly like C structures, native Rust enums can't be forced to have a stable layout. The Rust compiler can, each time it is run, choose a different layout for each enum. In practice, a given version of the compiler always uses the same layout, but that isn't guaranteed. If BTF needs to refer to enum types, that freedom could complicate the implementation.
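
As a rough illustration of the structure case, the sketch below compares a `#[repr(C)]` structure with an otherwise identical one using the default representation. It assumes a Rust toolchain recent enough to provide `std::mem::offset_of!` and a target where `u32` is four-byte aligned (true of x86-64 and arm64); the type and function names are invented for the example.

```rust
use std::mem::{offset_of, size_of};

/// Laid out exactly as a C compiler would: fields in declaration order,
/// with padding inserted to satisfy alignment.
#[allow(dead_code)]
#[repr(C)]
struct CLayout {
    tag: u8,
    value: u32,
    flags: u16,
}

/// Same fields, default representation: the compiler may reorder them.
#[allow(dead_code)]
struct RustLayout {
    tag: u8,
    value: u32,
    flags: u16,
}

/// Offsets of `tag`, `value`, and `flags` in the C-compatible layout.
fn c_layout_offsets() -> [usize; 3] {
    [
        offset_of!(CLayout, tag),
        offset_of!(CLayout, value),
        offset_of!(CLayout, flags),
    ]
}

/// Total size of the C-compatible layout, padding included.
fn c_layout_size() -> usize {
    size_of::<CLayout>()
}

/// Total size of the default layout; the language does not specify it.
fn rust_layout_size() -> usize {
    size_of::<RustLayout>()
}
```

Under those assumptions the C layout has offsets 0, 4, and 8 and a size of 12 bytes. The default layout only promises to hold all three fields; current compilers typically pack it more tightly, but nothing lets a BTF consumer rely on that.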

Marchesi also highlighted the difficulty that link-time optimization (LTO) poses. DWARF distinguishes between different compilation units, whereas BTF does not. So name clashes across compilation units are potentially a problem for using BTF in an LTO build of the kernel. Alice Ryhl raised a different problem — LTO can inline Rust code into C compilation units, meaning that the DWARF info can be mixed. That causes a problem for LTO builds today, since pahole can't handle the mixed DWARF info.

CO-RE

After laying out his basic concerns, Marchesi raised the topic of compile once - run everywhere (CO-RE), the approach that lets the kernel load BPF programs without requiring an exact match between the kernel headers the program was compiled against and the running kernel. In order to make this work, the compiler for the BPF program needs to take some special steps. In C, an attribute called preserve_access_index causes the compiler to generate loads and stores in a way that can be patched, and a relocation entry that tells the loader how to patch the program if the layout of the structure has changed from a different version of the kernel. Both GCC and LLVM have support for CO-RE; Marchesi wanted to know if the same approach made sense for Rust, given that the compiler can reorder fields of Rust structures (that aren't marked as using the C layout).
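
The patching step can be modeled in a few lines. The following is a deliberately simplified toy, not the real relocation format used by the kernel or libbpf: `Load`, `Relocation`, `Layouts`, and `relocate` are all hypothetical names, and the field offsets are made up. It only shows the idea that each patchable access carries a record naming the type and field it touches, so a loader can rewrite the offset for the running kernel.

```rust
use std::collections::HashMap;

/// Toy stand-in for a load instruction whose immediate is a field offset.
#[derive(Debug, Clone, PartialEq)]
struct Load {
    offset: u32,
}

/// Hypothetical, simplified relocation record:
/// "instruction `insn` reads field `field` of type `type_name`".
struct Relocation {
    insn: usize,
    type_name: &'static str,
    field: &'static str,
}

/// Field offsets on the running kernel, keyed by (type, field).
/// A real loader would derive these from that kernel's BTF.
type Layouts = HashMap<(&'static str, &'static str), u32>;

/// Rewrites every relocated instruction to use the target kernel's offsets.
/// Returns an error if the target lacks a field or a record is out of range.
fn relocate(
    program: &mut [Load],
    relocs: &[Relocation],
    target: &Layouts,
) -> Result<(), String> {
    for reloc in relocs {
        let new_offset = *target
            .get(&(reloc.type_name, reloc.field))
            .ok_or_else(|| {
                format!("target has no field {}.{}", reloc.type_name, reloc.field)
            })?;
        let insn = program
            .get_mut(reloc.insn)
            .ok_or_else(|| format!("no instruction at index {}", reloc.insn))?;
        insn.offset = new_offset;
    }
    Ok(())
}
```

Instructions without a relocation record are left alone, which is why the compiler has to be told which accesses to emit in patchable form.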

Andreas Hindborg thought that support like that would be great to have in Rust, since it could potentially allow for linking object files from different compilers — something that currently requires explicitly using the C calling convention, since Rust lacks a stable ABI of its own. He did have some questions about how it could work in practice, however, including what happens if a BPF program is built against an incompatible version of the kernel headers.

"Nothing good", Marchesi answered. But in the case of BPF, the verifier would complain about any bad accesses. After some discussion, during which Ojeda and Guo clarified some details of Rust's layout semantics, Marchesi suggested that perhaps a good first step would be generating CO-RE relocations only for #[repr(C)] structures. Guo questioned how that would interact with the offset_of!() macro, which can be used to find the offset of a field within a structure. Marchesi explained that the value would have to change with the relocation, but that this meant that any math that depended on the offset would be broken. Baron suggested that this might require an opaque wrapper type to prevent things from breaking.

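One possible reading of Baron's suggestion is sketched below. This is purely hypothetical: `RelocatableOffset` and its methods are invented here, and nothing like it was shown at the session. The idea is that the offset lives in a type that exposes no arithmetic and no getter, so the only thing code can do with it is access the field, and a later change to the value cannot invalidate math that was computed from the old one.

```rust
mod reloc {
    /// Hypothetical opaque wrapper around a field offset that a loader may patch.
    /// The inner value is private and no arithmetic traits are implemented.
    #[derive(Debug, Clone, Copy)]
    pub struct RelocatableOffset {
        bytes: usize,
    }

    impl RelocatableOffset {
        /// The offset as known when the program was compiled.
        pub fn compiled(bytes: usize) -> Self {
            RelocatableOffset { bytes }
        }

        /// Stand-in for the loader applying a relocation.
        pub fn relocate(&mut self, new_bytes: usize) {
            self.bytes = new_bytes;
        }

        /// The only supported use: read a `u32` field out of an object's bytes.
        /// Returns `None` if the field does not fit inside the object.
        pub fn read_u32(&self, object: &[u8]) -> Option<u32> {
            let end = self.bytes.checked_add(4)?;
            let raw = object.get(self.bytes..end)?;
            let mut buf = [0u8; 4];
            buf.copy_from_slice(raw);
            Some(u32::from_ne_bytes(buf))
        }
    }
}
```

Code outside the `reloc` module cannot add to the offset or store it in a constant, which is exactly the kind of use that a plain `offset_of!()` value would permit and a relocation would then silently break.
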
Unwinding

Marchesi had one last topic: the potential switch from ORC to SFrame for stack unwinding in the kernel. He wanted to check that the switch would not cause problems for the Rust parts of the kernel. Guo assured him that Rust does support unwinding, currently with the same DWARF-based methods that C programs largely use. The important part is that compiled functions have unwinding information that matches what the C code does, so any potential compiler change might work out of the box. Marchesi called that "very good news", and wrapped up the session on a positive note.

Overall, BTF is unlikely to pose insurmountable challenges to the inclusion of Rust in the Linux kernel, but there are some areas that will need additional work. At the least, there will need to be testing for LLVM's BTF support, for applying CO-RE to the Rust parts of the kernel, and for ensuring that Rust's unwinding support remains working. Some of those areas may also need additional attention to ensure that the kernel can continue working smoothly as a conglomerate of C, BPF, and Rust.

fix the DWARF

Posted Oct 2, 2024 18:43 UTC (Wed) by roc (subscriber, #30627) [Link] (5 responses)

> One consequence of this approach is that BTF can only include information that is also present in DWARF — a problem for some of the kernel's structure attributes that aren't properly represented, so Marchesi is working toward being able to generate BTF directly.

Couldn't they fix the DWARF instead? Having accurate DWARF is still useful.

fix the DWARF

Posted Oct 2, 2024 19:19 UTC (Wed) by daroc (editor, #160859) [Link] (4 responses)

Unfortunately, DWARF was not designed with extension in mind, which makes it difficult to add new types of information.

fix the DWARF

Posted Oct 2, 2024 21:23 UTC (Wed) by roc (subscriber, #30627) [Link] (3 responses)

DWARF definitely was designed to be extensible! And that extensibility has been used, a lot.

I won't claim its extensibility has been particularly *well* designed, but it would be good to know what exactly is the problem here.

fix the DWARF

Posted Oct 4, 2024 15:02 UTC (Fri) by jemarch (subscriber, #116773) [Link] (2 responses)

An example of the kind of problems we face is the support for the declaration and type tags that the BPF verifier needs. New C attributes (btf_decl_tag and btf_type_tag) are added in order to annotate particular declarations and types with arbitrary strings. The interpretation of the strings is in this case up to the kernel, like "percpu" or "user" or "kernel". The verifier needs the annotated types in BTF, but to reach the BTF we are forced to convey the information via DWARF.

Now, it may seem that the perfect solution on the DWARF side would be to create a new DW_TAG_annotated_type DIE and link it in the DW_AT_type chains. This would indeed reflect the intended semantics perfectly, and it would also be easy to implement. Unfortunately, the way DWARF is designed, it would also break all DWARF readers in existence. You can't add a new kind of link to this chain in a way that existing readers would just mindlessly skip. It would have been nice if DWARF had provided us with a DW_TAG_nop_type DIE, but it doesn't.

David Faust managed to find a backwards-compatible way to work around this particular situation, which seems to satisfy all involved parties (kernel, GCC, clang), but it is necessarily convoluted and ugly. It involves the creation of a new kind of DIE _and_ of a new DW_AT_annotation, and while it allows a great deal of node sharing in the DIE tree, it also leads to some duplication of data.

It is taking more than one year of discussions and several implementation attempts to get the tags in DWARF.

BTF got them in a day.

fix the DWARF

Posted Oct 4, 2024 19:58 UTC (Fri) by intelfx (subscriber, #130118) [Link]

> Now, it may seem that the perfect solution on the DWARF side would be to create a new DW_TAG_annotated_type DIE and link it in the DW_AT_Type chains. This would be indeed reflect the intended semantic perfectly, it would be also easy to implement. Unfortunately, the way DWARF is designed it would also break all DWARF reader in existence. You can't add a new kind of link to this chain in a way existing readers would just mindlessly skip it.

So it *is* extensible, just not backward-compatibly?

fix the DWARF

Posted Oct 5, 2024 4:13 UTC (Sat) by roc (subscriber, #30627) [Link]

It's not obvious from your description why you need a new type DIE. A new DWARF attribute could be applied to existing DWARF type or declaration DIEs and existing DWARF consumers ignore unknown attributes, as far as I know.

> It is taking more than one year of discussions and several implementation attempts to get the tags in DWARF.

Yes, the DWARF standardization process is a mess.

> BTF got them in a day.

Yeah, I understand that it's very attractive to have your own format that you control. (Of course if it lasts a long time, producers and consumers will multiply and eventually you'll have to have your own standards process, stability issues, etc.)

I'm a bit concerned that as consumers of compiler metadata proliferate over time and demand more of their own formats, the compiler maintenance burden will grow and so will the overhead at build time. It looks like in the future there will be projects that need to support debugging as well as tools that consume BTF and SFrame so will have to build with DWARF + BTF + SFrame + who knows what else, duplicating work and data.

Avoiding DWARF parsing...

Posted Oct 2, 2024 22:24 UTC (Wed) by sam_c (subscriber, #139836) [Link]

My hope is that BTF could possibly be used to avoid the need for the kernel to add a DWARF parser [0].

[0] https://lore.kernel.org/linux-modules/20240923181846.5498...

other uses?

Posted Oct 3, 2024 3:00 UTC (Thu) by jhoblitt (subscriber, #77733) [Link] (3 responses)

Out of curiosity, has BTF found any usage outside of BPF/kernel?

other uses?

Posted Oct 3, 2024 10:43 UTC (Thu) by wahern (subscriber, #37304) [Link] (2 responses)

BTF was derived from CTF, which is used by dtrace on Solaris and (IIUC) FreeBSD and macOS. Oracle seems to have ported dtrace over to Linux, and to support kernel images lacking CTF definitions wrote patches for in-kernel runtime generation of CTF from BTF. BTF and CTF remain very similar but aren't binary compatible.

other uses?

Posted Oct 3, 2024 16:03 UTC (Thu) by willy (subscriber, #9762) [Link] (1 responses)

I think it was parallel evolution, not one derived from the other. Obviously they ended up very similar, but they do have slightly different purposes and I don't see any appetite for unifying them.

other uses?

Posted Oct 4, 2024 3:23 UTC (Fri) by sam_c (subscriber, #139836) [Link]

CTFv4 will be "rebased" on BTF and there was some discussion on what stuff should go from CTF->BTF in a Plumbers talk.

A couple of small clarifications

Posted Oct 4, 2024 11:23 UTC (Fri) by jemarch (subscriber, #116773) [Link] (2 responses)

Thanks for the article.

One of the inconveniences of having to generate the BTF from DWARF is that it forces us to convey in the DWARF all the compiler-generated information we want in the BTF. pahole already uses sources other than the kernel DWARF to assemble the final BTF, but for things like source code annotations to distinguish between kernel pointers and userland pointers, it is the compiler that needs to provide that information. It is not that DWARF is not powerful enough nor that it is broken: it is simply that it is in the way, for no good reason that I can see, given that both GCC and clang/LLVM can already generate BTF directly.

Regarding unwinding, my goal at Kangrejos was to figure out whether Rust compiled code works well in both ORC (i.e. whether objtool is able to reverse-engineer the CFI for Rust compiled functions) and SFrame (i.e. whether the Rust compiler generates the proper cfi assembler directives). The answer was a rotund YES. So nothing seems to be lacking for Rust on that side.

A couple of small clarifications

Posted Oct 5, 2024 3:56 UTC (Sat) by roc (subscriber, #30627) [Link] (1 responses)

"Rotund" may not have been the word you were looking for :-).

A couple of small clarifications

Posted Oct 5, 2024 8:15 UTC (Sat) by Wol (subscriber, #4433) [Link]

I think he was looking for a Greek Wedding :-)

(hint - it's a TV program ...)

Cheers,
Wol

BTF support in Aya

Posted Oct 30, 2024 8:10 UTC (Wed) by vadorovsky (guest, #171932) [Link]

I just wanted to mention that the work on properly supporting BTF and making sure that the produced information lines up with what the kernel expects is already being done in bpf-linker[0], a project that is part of the Aya[1] ecosystem. The work done there is in line with what this article describes. bpf-linker is a bitcode linker, so it has the freedom to modify LLVM IR before the actual ELF object is produced.

The first part, which is sanitizing the LLVM DebugInfo so that it produces BTF acceptable to the kernel, is already done.[2] It comes with the following quirks:

- Support of anonymous structs, which are needed for BTF maps, but are not supported by Rust. Maps can be anonymized using a marker called `AyaBtfMapMarker` (which is basically an alias for `PhantomData<()>`).
- Skipping data-carrying enums. This is fine as long as there are no kernel modules actually using them.
- Sanitizing names of types with generics - type names like `MyType<[u32]>` are not correct BTF, but we sanitize them to `MyType_3C__5B_u32_5D__3E_` in a deterministic way (hex representations of all problematic characters).
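
A minimal sketch of that kind of name sanitizing follows. It assumes, based only on the encoded name shown above, that each character other than an ASCII letter, digit, or underscore is replaced by `_XX_`, where XX is its uppercase hexadecimal byte value; the actual rules in bpf-linker may differ, and `sanitize_type_name` is an illustrative name rather than a function taken from that project.

```rust
/// Replaces every byte that is not an ASCII letter, digit, or underscore
/// with `_XX_`, where XX is the byte's value in uppercase hexadecimal.
fn sanitize_type_name(name: &str) -> String {
    let mut out = String::with_capacity(name.len());
    for byte in name.bytes() {
        if byte.is_ascii_alphanumeric() || byte == b'_' {
            // Already acceptable in an identifier: keep as-is.
            out.push(byte as char);
        } else {
            // Problematic character: emit its hex code between underscores.
            out.push_str(&format!("_{:02X}_", byte));
        }
    }
    out
}
```

With this scheme, `sanitize_type_name("MyType<[u32]>")` yields `MyType_3C__5B_u32_5D__3E_`. The mapping is deterministic, so the same Rust type always produces the same BTF name.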

The remaining part, which is being worked on, is adding BTF relocations - an equivalent of BPF_CORE_READ/__builtin_preserve_access_index. There is an issue describing the steps[3]. The plan is to replace GEP+load instructions with @llvm.preserve.[...].access.index intrinsic calls. There will be a pull request with that work really soon.

[0] https://github.com/aya-rs/bpf-linker
[1] https://aya-rs.dev/
[2] https://github.com/aya-rs/bpf-linker/pull/182
[3] https://github.com/aya-rs/aya/issues/349