Binary portability for BPF programs
Many BPF programs are indeed portable, in that they will load and execute properly on any type of processor. Packet-filtering programs, in particular, usually just work. But there is a significant class of exceptions in the form of tracing programs, which are one of the biggest growth areas for BPF. Most tracing tools have two components: a user-space program invoked by the user, and a BPF program that is loaded into the kernel to filter, acquire, and possibly boil down the needed data. Both programs are normally written in C.
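For readers who have not seen one, the kernel half of such a tool can be quite small. Here is a minimal sketch in the BCC style, counting events per process; the map and function names are illustrative, and the user-space half (not shown) would attach it to a probe and read out the map:

    /* BPF half of a hypothetical tracing tool: count events per PID.
     * BCC rewrites the counts.lookup_or_init() call into the underlying
     * map helpers when the program is compiled. */
    BPF_HASH(counts, u32, u64);

    int trace_entry(struct pt_regs *ctx)
    {
        u32 pid = bpf_get_current_pid_tgid() >> 32;
        u64 zero = 0, *count;

        count = counts.lookup_or_init(&pid, &zero);
        if (count)
            (*count)++;
        return 0;
    }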
The BPF side of a tracing program may have to dig deeply into the guts of the kernel, and those guts can change significantly from one kernel to the next. The offsets of specific fields within structures are a particular problem; they can differ depending on architecture, kernel configuration options, and more. Tracing programs often need to use those offsets to get the data they are looking for. If the offsets built into a given BPF program do not match the current kernel, the program will not produce the correct results.
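As a hypothetical example of how such an offset gets baked in, consider a fragment that reads the pid field out of the current process's task_struct; the offsetof() value is fixed by whatever headers were present when the program was compiled:

    /* Sketch: read task->pid with the bpf_probe_read() helper. The
     * offset of "pid" within struct task_struct is computed at compile
     * time; if the running kernel lays the structure out differently,
     * this quietly reads the wrong bytes. */
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    u32 pid;

    bpf_probe_read(&pid, sizeof(pid),
                   (char *)task + offsetof(struct task_struct, pid));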
This problem is "solved" now by compiling BPF programs on the fly, just
prior to loading them into the kernel. To do that, the BPF Compiler Collection (BCC)
bundles a
copy of the Clang compiler, which is a lot of code to haul around — and
much of that code has to be linked into the tracing program itself, where
it consumes RAM. This
toolchain, along with the kernel development headers, must be installed on
the system being traced, a painful task on embedded systems. Even then,
it's often necessary to paste specific structure definitions into BPF
programs to be able to access the needed fields.
The proposed solution is to add structure-field offset information to the BPF Type Format (BTF) section describing a compiled BPF program. The compiler already builds those offsets into BPF programs; what is needed is a set of pointers to where each offset is used, along with the associated field names. The libbpf library will then be enhanced to "relocate" those offsets, adjusting them to match the current kernel before a given program is loaded.
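The precise encoding had not been settled at the time of the talk; conceptually, though, each relocation record needs to carry something like the following (the structure and field names here are invented for illustration):

    /* Hypothetical shape of one field-offset relocation record: enough
     * for libbpf to locate the instruction to patch, find the structure
     * and field in the running kernel's type information, and write in
     * the correct offset. */
    struct btf_field_reloc {
        __u32 insn_off;    /* instruction whose immediate holds the offset */
        __u32 type_id;     /* BTF ID of the containing structure */
        __u32 name_off;    /* field name, as a string-table offset */
    };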
Parts of this problem are hard. In particular, getting the field-name information through LLVM's intermediate representation is difficult; there is "a lot of compiler work" to be done to support this feature. The information needed to perform relocation is more readily available from the vmlinux kernel image file on the target system. Ongoing work includes converting the data-type information stored in the DWARF format in the kernel image to BTF, a process that reduces the size of that information from 120MB to 2MB.
Offsets to structure fields are not the only problem that needs to be solved, though. Imagine a bit of code that looks like:
    #if KERNEL_VERSION == 406
        minrtt = ms.v1;
    #else
        minrtt = ms.v2;
    #endif
The branch that is pruned by the preprocessor never appears in the output, with the result that the generated BPF code is dependent on the kernel version. The planned solution here is to turn the preprocessor variable into a BPF variable, so that the above code could be written as:
    if (__bpf_kernel_version == 406)
        minrtt = ms.v1;
    else
        minrtt = ms.v2;
Both paths are now present in the generated BPF code, which will do the right thing regardless of the kernel version. Other cases are harder; imagine, for example, code that is dependent on whether the REQ_OP_SHIFT macro is defined. Once again, a global variable (__bpf_req_op_shift) is created to delay the decision until run time and keep all paths present in the generated code. Things get more complicated when it comes to types that may not exist at all depending on something like a configuration variable. Solutions here include a complex "fuzzy struct-type matching" mechanism, or just creating a massive file full of type information (in the BTF format) for a wide range of kernel versions.
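Applied to the REQ_OP_SHIFT case, the same transformation might yield something like the sketch below; the convention that a negative value means "not defined in this kernel", and the fallback expression, are assumptions for illustration:

    /* Sketch: the loader sets __bpf_req_op_shift from the running
     * kernel before the program starts; a negative value is taken to
     * mean that REQ_OP_SHIFT does not exist there. Both branches
     * survive into the generated BPF code. */
    if (__bpf_req_op_shift >= 0)
        op = cmd_flags >> __bpf_req_op_shift;
    else
        op = cmd_flags & REQ_OP_MASK;  /* newer kernels mask instead of shifting */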
The problem can be made arbitrarily complex, though; Jes Sorensen asked whether it would be possible to handle CPU masks, which are stored on the kernel stack — unless the system is too large, in which case they are pushed out to heap storage. The answer was that some things will just never be possible.
Other problems include calling static inline functions and preprocessor macros from BPF programs; there does not appear to be a better solution than just copying them into the program at this point. That will bloat the size of the program, of course, and getting some of those functions past the BPF verifier could prove to be a challenge.
Some related work has to do with adding global variables and read-only data to BPF programs. Globals, which are needed to support some of the techniques described above, can be added without any compiler changes, but the kernel API to support them still needs to be designed and implemented. That is also true of read-only data, which would be especially useful for the handling of strings in BPF programs.
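If that work lands, a BPF program should be able to declare such data in the ordinary C way and have the loader wire it up behind the scenes; a sketch of what that might look like:

    /* Sketch: a global filled in by the loader before the program runs,
     * and a read-only string. With the planned kernel API, both would
     * live in data sections mapped in on the program's behalf. */
    static volatile u32 __bpf_kernel_version;    /* set at load time */
    static const char msg[] = "minrtt updated";  /* read-only data */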
There are clearly a few things still to be worked out in this area, and it may never be possible to run an arbitrary BPF program on any system. But it seems likely that BPF users will see a solution that works for many of the commonly used tools in the BCC collection, which should make life easier for a wide range of use cases.
(The slides from this presentation [PDF] are available.)
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my
travel to the event.]
Index entries for this article
Kernel: BPF
Conference: Linux Plumbers Conference/2018
Posted Nov 30, 2018 19:21 UTC (Fri)
by nix (subscriber, #2304)
[Link]
(One thing I *did* do was make it .config-independent to the extent that if you compile things as a module, or make them built in, the CTF is still placed in the same per-module location (really a CTF file member with a per-module name in the generated CTF archive), so that DTrace programs could reference e.g. ext4`ext4_inode_info and not have to change their scripts if users chose to build ext4 into the kernel.)
Posted Nov 30, 2018 20:26 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (19 responses)
Posted Nov 30, 2018 20:47 UTC (Fri)
by ibukanov (subscriber, #3942)
[Link] (17 responses)
The problem with WebAssembly is that it is binary. Binary means unportable and insecure. I would opt into a completely text-based kernel.
Posted Nov 30, 2018 21:57 UTC (Fri)
by sorokin (guest, #88478)
[Link] (16 responses)
I can imagine a micro-kernel where each service is a Node.js instance. IPC is done using simple text-based JSON. Kernel modules can be loaded from npm at boot time.
What about performance? Well you know JITs nowadays can be as fast as C and sometimes they are even faster. So I guess it can be even faster than Linux kernel, right?
Posted Nov 30, 2018 22:28 UTC (Fri)
by mfuzzey (subscriber, #57966)
[Link] (2 responses)
No not April yet :)
Posted Nov 30, 2018 23:04 UTC (Fri)
by rweikusat2 (subscriber, #117920)
[Link]
C isn't executed, hence, C code has no speed (if a compiler is being used). Machine code is executed. As "at the absolutely wrong time" (namely, while a user is waiting for something to happen) compilers generate machine code, there's obviously no reason why machine code generated "just in time" (when the program was compiled) would have a built-in speed difference compared to machine code generated by the "at the wrong time" compiler.
But that's a pretty meaningless statement: a typical Node.js module depends on every other typical Node.js module that was at least already planned by the time it was written. And a code path traversing all Node.js code on this planet will take longer to execute than a functionally equivalent, self-contained program. Not to mention that compiling all Node.js code on this planet "at the absolutely wrong time" will already take a lot of time on its own.
Hence, outside of doctored microbenchmarks, the "runs even faster" is not going to happen.
Posted Dec 1, 2018 13:58 UTC (Sat)
by meyert (subscriber, #32097)
[Link]
Posted Dec 1, 2018 14:52 UTC (Sat)
by nix (subscriber, #2304)
[Link] (7 responses)
The problem with binary protocols over textual ones is not that they are unportable and insecure, not if their properties are properly specified (as eBPF's has been). It is that they are opaque and hard to debug if you're looking at a raw packet dump. This is not usually considered a problem for assembly languages, which are not usually transmitted over the wire (if you want to debug it, you have a disassembler), and if you are throwing it over the network the ubiquity of tcpdump and/or Wireshark and its massive army of packet dissectors means that binary protocols are much less annoying than they used to be too. The only remaining advantage of textual protocols is that they are easy to write by hand... and who the hell writes major web apps by hand into a telnet session? (Or BPF programs, for that matter). Not even people doing early experimentation do any such thing.
There is a reason the successors to HTTP are all binary protocols. I like textual protocols but in some respects their benefits have declined to irrelevance. The tradeoff wheel has turned once more.
Posted Dec 2, 2018 4:12 UTC (Sun)
by sorokin (guest, #88478)
[Link] (5 responses)
No, I don't think it is a good idea to have a micro-kernel consisting of Node.js instances. My comment was intended to be humorous: I just took a few misconceptions I have heard from different people and mixed them all together in a single absurd comment.
One example of such misconception is "text based protocols/formats have inherently better backward compatibility than binary". The source of the misconception is comparing key/value-based text formats with sequence-based binary formats. People attribute the distinction to the difference between text and binary, instead of between key/value-based and sequence-based. One can formulate another statement like "key/value-based formats have inherently better backward compatibility than sequence-based". Well this is only partially true. This holds true for only one type of change of the format: adding a new key and assigning a default value if it is not present. Other changes (deleting key, renaming key) break backward-compatibility of key/value-based formats.
Above I've refuted only one misconception, but there are many others I used in my comment.
I completely agree with what you said. Thank you for the thoughtful answer to my comment. It was not intended to be taken seriously; sorry if my sarcasm was not apparent at first.
Posted Dec 2, 2018 17:21 UTC (Sun)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Dec 2, 2018 17:43 UTC (Sun)
by marcH (subscriber, #57642)
[Link]
Whereas younger people love using hex editors.
> Of course what they actually want is a textual *interpretation* of the output (and the ability to put textual input in and have it translated the other way), but that's rarely what they ask for. :)
They don't ask that because they know they never get it.
Posted Dec 2, 2018 17:33 UTC (Sun)
by marcH (subscriber, #57642)
[Link] (2 responses)
That confusion is because they're often the same in practice. Key/value means a parser is required, and the parser is where the compatibility comes from. Protocols are often binary *because* designers want to just copy from/to memory with as little parsing as possible (just some sanity checks), for instance for performance reasons.
> Other changes (deleting key, renaming key) break backward-compatibility of key/value-based formats.
That's why newer versions rarely ever delete a key, and only after a long period of deprecation and warnings. And why would anyone rename a key?
> One can formulate another statement like "key/value-based formats have inherently better backward compatibility than sequence-based". Well this is only partially true.
Partially true... in theory.
Posted Dec 3, 2018 18:55 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
I know that IETF handled it well (basically by just saying "oops, sorry"), but some people would not have been able to restrain themselves from fixing "Referer".
Posted Dec 3, 2018 19:20 UTC (Mon)
by excors (subscriber, #95769)
[Link]
Posted Dec 2, 2018 14:21 UTC (Sun)
by rweikusat2 (subscriber, #117920)
[Link]
Well, people who program "web apps" write HTTP "by hand", just like all the other code. Text is also fairly easily generated with simple facilities: it's possible to write a fairly comprehensive HTTP library in less than 600 lines of code.
Posted Dec 1, 2018 18:50 UTC (Sat)
by Camto (guest, #128967)
[Link] (2 responses)
Oh yes like PDFs. Those are so unportable.
Posted Dec 2, 2018 4:25 UTC (Sun)
by k8to (guest, #15413)
[Link]
Posted Dec 2, 2018 8:43 UTC (Sun)
by matthias (subscriber, #94967)
[Link]
Fun fact: in the national German computer-science contest (Bundeswettbewerb Informatik), we once got a solution for an exercise in which a fractal image had to be computed. The code was written entirely in PostScript; it could be sent to a PostScript printer to compute and print the image. There was no restriction on the programming language, only that the code had to be documented and the language had to be somewhat reasonable.
Posted Dec 1, 2018 14:48 UTC (Sat)
by nix (subscriber, #2304)
[Link]
One sign that BPF is nicely designed: as someone who's been hand-writing BPF recently (rewriting a code generator that used to generate output for a much more, ah, *verbose* intermediate representation with many more opcodes), whenever I found I needed a particular opcode, it was there, and nothing I didn't need was there except the weird historical stuff to do packet content inspection.
I like BPF. I thought I'd hate it, because all such things are generally hateful, but it's not hateful at all: it has no horrible irregularities that make you scream and most of the annoying limits as a general-purpose-but-verified language (stack size, etc, lack of even constrained loops, etc) are being raised or fixed as we speak. If any language takes over its world like BPF is doing, I'm happy it's BPF that's doing so.