Binary portability for BPF programs
Many BPF programs are indeed portable, in that they will load and execute properly on any type of processor. Packet-filtering programs, in particular, usually just work. But there is a significant class of exceptions in the form of tracing programs, which are one of the biggest growth areas for BPF. Most tracing tools have two components: a user-space program invoked by the user, and a BPF program that is loaded into the kernel to filter, acquire, and possibly boil down the needed data. Both programs are normally written in C.
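For readers who have not seen one, the kernel half of such a tool can be quite small. Here is a minimal sketch in the BCC style, counting events per process; the map and function names are illustrative, and the user-space half (not shown) would attach it to a probe and read out the map:

    /* BPF half of a hypothetical tracing tool: count events per PID.
     * BCC rewrites the counts.lookup_or_init() call into the underlying
     * map helpers when the program is compiled. */
    BPF_HASH(counts, u32, u64);

    int trace_entry(struct pt_regs *ctx)
    {
        u32 pid = bpf_get_current_pid_tgid() >> 32;
        u64 zero = 0, *count;

        count = counts.lookup_or_init(&pid, &zero);
        if (count)
            (*count)++;
        return 0;
    }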
The BPF side of a tracing program may have to dig deeply into the guts of the kernel, and those guts can change significantly from one kernel to the next. The offsets of specific fields within structures are a particular problem; they can differ depending on architecture, kernel configuration options, and more. Tracing programs often need to use those offsets to get the data they are looking for. If the offsets built into a given BPF program do not match the current kernel, the program will not produce the correct results.
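As a hypothetical example of how such an offset gets baked in, consider a fragment that reads the pid field out of the current process's task_struct; the offsetof() value is fixed by whatever headers were present when the program was compiled:

    /* Sketch: read task->pid with the bpf_probe_read() helper. The
     * offset of "pid" within struct task_struct is computed at compile
     * time; if the running kernel lays the structure out differently,
     * this quietly reads the wrong bytes. */
    struct task_struct *task = (struct task_struct *)bpf_get_current_task();
    u32 pid;

    bpf_probe_read(&pid, sizeof(pid),
                   (char *)task + offsetof(struct task_struct, pid));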
This problem is "solved" now by compiling BPF programs on the fly, just
prior to loading them into the kernel. To do that, the BPF Compiler Collection (BCC)
bundles a
copy of the Clang compiler, which is a lot of code to haul around — and
much of that code has to be linked into the tracing program itself, where
it consumes RAM. This
toolchain, along with the kernel development headers, must be installed on
the system being traced, a painful task on embedded systems. Even then,
it's often necessary to paste specific structure definitions into BPF
programs to be able to access the needed fields.
The proposed solution is to add structure-field offset information to the BPF Type Format (BTF) section describing a compiled BPF program. The compiler already builds those offsets into BPF programs; what is needed is a set of pointers to where each offset is used, along with the associated field names. The libbpf library will then be enhanced to "relocate" those offsets, adjusting them to match the current kernel before a given program is loaded.
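The precise encoding had not been settled at the time of the talk; conceptually, though, each relocation record needs to carry something like the following (the structure and field names here are invented for illustration):

    /* Hypothetical shape of one field-offset relocation record: enough
     * for libbpf to locate the instruction to patch, find the structure
     * and field in the running kernel's type information, and write in
     * the correct offset. */
    struct btf_field_reloc {
        __u32 insn_off;    /* instruction whose immediate holds the offset */
        __u32 type_id;     /* BTF ID of the containing structure */
        __u32 name_off;    /* field name, as a string-table offset */
    };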
Parts of this problem are hard. In particular, getting the field-name information through LLVM's intermediate representation is difficult; there is "a lot of compiler work" to be done to support this feature. The information needed to perform relocation is more readily available from the vmlinux kernel image file on the target system. Ongoing work includes converting the data-type information stored in the DWARF format in the kernel image to BTF, a process that reduces the size of that information from 120MB to 2MB.
Offsets to structure fields are not the only problem that needs to be solved, though. Imagine a bit of code that looks like:
    #if KERNEL_VERSION == 406
        minrtt = ms.v1;
    #else
        minrtt = ms.v2;
    #endif
The branch that is pruned by the preprocessor never appears in the output, with the result that the generated BPF code is dependent on the kernel version. The planned solution here is to turn the preprocessor variable into a BPF variable, so that the above code could be written as:
    if (__bpf_kernel_version == 406)
        minrtt = ms.v1;
    else
        minrtt = ms.v2;
Both paths are now present in the generated BPF code, which will do the right thing regardless of the kernel version. Other cases are harder; imagine, for example, code that is dependent on whether the REQ_OP_SHIFT macro is defined. Once again, a global variable (__bpf_req_op_shift) is created to delay the decision until run time and keep all paths present in the generated code. Things get more complicated when it comes to types that may not exist at all depending on something like a configuration variable. Solutions here include a complex "fuzzy struct-type matching" mechanism, or just creating a massive file full of type information (in the BTF format) for a wide range of kernel versions.
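Applied to the REQ_OP_SHIFT case, the same transformation might yield something like the sketch below; the convention that a negative value means "not defined in this kernel", and the fallback expression, are assumptions for illustration:

    /* Sketch: the loader sets __bpf_req_op_shift from the running
     * kernel before the program starts; a negative value is taken to
     * mean that REQ_OP_SHIFT does not exist there. Both branches
     * survive into the generated BPF code. */
    if (__bpf_req_op_shift >= 0)
        op = cmd_flags >> __bpf_req_op_shift;
    else
        op = cmd_flags & REQ_OP_MASK;  /* newer kernels mask instead of shifting */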
The problem can be made arbitrarily complex, though; Jes Sorensen asked whether it would be possible to handle CPU masks, which are stored on the kernel stack — unless the system is too large, in which case they are pushed out to heap storage. The answer was that some things will just never be possible.
Other problems include calling static inline functions and preprocessor macros from BPF programs; there does not appear to be a better solution than just copying them into the program at this point. That will bloat the size of the program, of course, and getting some of those functions past the BPF verifier could prove to be a challenge.
Some related work has to do with adding global variables and read-only data to BPF programs. Globals, which are needed to support some of the techniques described above, can be added without any compiler changes, but the kernel API to support them still needs to be designed and implemented. That is also true of read-only data, which would be especially useful for the handling of strings in BPF programs.
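If that work lands, a BPF program should be able to declare such data in the ordinary C way and have the loader wire it up behind the scenes; a sketch of what that might look like:

    /* Sketch: a global filled in by the loader before the program runs,
     * and a read-only string. With the planned kernel API, both would
     * live in data sections mapped in on the program's behalf. */
    static volatile u32 __bpf_kernel_version;    /* set at load time */
    static const char msg[] = "minrtt updated";  /* read-only data */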
There are clearly a few things still to be worked out in this area, and it may never be possible to run an arbitrary BPF program on any system. But it seems likely that BPF users will see a solution that works for many of the commonly used tools in the BCC collection, which should make life easier for a wide range of use cases.
(The slides from this presentation [PDF] are available.)
[Thanks to the Linux Foundation, LWN's travel sponsor, for supporting my
travel to the event.]
Index entries for this article
Kernel: BPF
Conference: Linux Plumbers Conference/2018
Posted Nov 30, 2018 19:21 UTC (Fri)
by nix (subscriber, #2304)
[Link]
(One thing I *did* do was make it .config-independent to the extent that if you compile things as a module, or make them built in, the CTF is still placed in the same per-module location (really a CTF file member with a per-module name in the generated CTF archive), so that DTrace programs could reference e.g. ext4`ext4_inode_info and not have to change their scripts if users chose to build ext4 into the kernel.)
Posted Nov 30, 2018 20:26 UTC (Fri)
by Cyberax (✭ supporter ✭, #52523)
[Link] (19 responses)
Posted Nov 30, 2018 20:47 UTC (Fri)
by ibukanov (subscriber, #3942)
[Link] (17 responses)
The problem with WebAssembly is that it is binary. Binary means unportable and insecure. I would opt into a completely text-based kernel.
Posted Nov 30, 2018 21:57 UTC (Fri)
by sorokin (guest, #88478)
[Link] (16 responses)
I can imagine a micro-kernel where each service is a Node.js instance. IPC is done using simple text-based JSON. Kernel modules can be loaded from npm at boot time.
What about performance? Well you know JITs nowadays can be as fast as C and sometimes they are even faster. So I guess it can be even faster than Linux kernel, right?
Posted Nov 30, 2018 22:28 UTC (Fri)
by mfuzzey (subscriber, #57966)
[Link] (2 responses)
No not April yet :)
Posted Nov 30, 2018 23:04 UTC (Fri)
by rweikusat2 (subscriber, #117920)
[Link]
C isn't executed, hence, C code has no speed (if a compiler is being used). Machine code is executed. As "at the absolutely wrong time" (namely, while a user is waiting for something to happen) compilers generate machine code, there's obviously no reason why machine code generated "just in time" (when the program was compiled) would have a built-in speed difference compared to machine code generated by the "at the wrong time" compiler.
But that's a pretty meaningless statement: a typical Node.js module depends on every other typical Node.js module that was at least already planned by the time it was written. And a code path traversing all Node.js code on this planet will take longer to execute than a functionally equivalent, self-contained program. Not to mention that compiling all Node.js code on this planet "at the absolutely wrong time" will already take a lot of time on its own.
Hence, outside of doctored microbenchmarks, the "runs even faster" is not going to happen.
Posted Dec 1, 2018 13:58 UTC (Sat)
by meyert (subscriber, #32097)
[Link]
Posted Dec 1, 2018 14:52 UTC (Sat)
by nix (subscriber, #2304)
[Link] (7 responses)
The problem with binary protocols over textual ones is not that they are unportable and insecure, not if their properties are properly specified (as eBPF's has been). It is that they are opaque and hard to debug if you're looking at a raw packet dump. This is not usually considered a problem for assembly languages, which are not usually transmitted over the wire (if you want to debug it, you have a disassembler), and if you are throwing it over the network the ubiquity of tcpdump and/or Wireshark and its massive army of packet dissectors means that binary protocols are much less annoying than they used to be too. The only remaining advantage of textual protocols is that they are easy to write by hand... and who the hell writes major web apps by hand into a telnet session? (Or BPF programs, for that matter). Not even people doing early experimentation do any such thing.
There is a reason the successors to HTTP are all binary protocols. I like textual protocols but in some respects their benefits have declined to irrelevance. The tradeoff wheel has turned once more.
Posted Dec 2, 2018 4:12 UTC (Sun)
by sorokin (guest, #88478)
[Link] (5 responses)
No, I don't think it is a good idea to have a micro-kernel consisting of Node.js instances. My comment was intended to be humorous: I just took a few misconceptions I have heard from different people and mixed them all together in a single absurd comment.
One example of such misconception is "text based protocols/formats have inherently better backward compatibility than binary". The source of the misconception is comparing key/value-based text formats with sequence-based binary formats. People attribute the distinction to the difference between text and binary, instead of between key/value-based and sequence-based. One can formulate another statement like "key/value-based formats have inherently better backward compatibility than sequence-based". Well this is only partially true. This holds true for only one type of change of the format: adding a new key and assigning a default value if it is not present. Other changes (deleting key, renaming key) break backward-compatibility of key/value-based formats.
Above I've refuted only one misconception, but there are many others I used in my comment.
I completely agree with what you said. Thank you for the thoughtful answer to my comment. It was not intended to be taken seriously; sorry if my sarcasm was not apparent at first.
Posted Dec 2, 2018 17:21 UTC (Sun)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Dec 2, 2018 17:43 UTC (Sun)
by marcH (subscriber, #57642)
[Link]
Whereas younger people love using hex editors.
> Of course what they actually want is a textual *interpretation* of the output (and the ability to put textual input in and have it translated the other way), but that's rarely what they ask for. :)
They don't ask that because they know they never get it.
Posted Dec 2, 2018 17:33 UTC (Sun)
by marcH (subscriber, #57642)
[Link] (2 responses)
That confusion is because they're often the same in practice. Key/value means a parser is required, and the parser is where the compatibility comes from. Protocols are often binary *because* designers want to just copy from/to memory with as little parsing as possible (just some sanity checks), for instance for performance reasons.
> Other changes (deleting key, renaming key) break backward-compatibility of key/value-based formats.
That's why newer versions rarely ever delete a key, and only after a long period of deprecation and warnings. And why would anyone rename a key?
> One can formulate another statement like "key/value-based formats have inherently better backward compatibility than sequence-based". Well this is only partially true.
Partially true... in theory.
Posted Dec 3, 2018 18:55 UTC (Mon)
by mathstuf (subscriber, #69389)
[Link] (1 responses)
I know that IETF handled it well (basically by just saying "oops, sorry"), but some people would not have been able to restrain themselves from fixing "Referer".
Posted Dec 3, 2018 19:20 UTC (Mon)
by excors (subscriber, #95769)
[Link]
Posted Dec 2, 2018 14:21 UTC (Sun)
by rweikusat2 (subscriber, #117920)
[Link]
Well, people who program "web apps" write HTTP "by hand", just like all the other code. Text is also fairly easily generated with simple facilities: it's possible to write a fairly comprehensive HTTP library in less than 600 lines of code.
Posted Dec 1, 2018 18:50 UTC (Sat)
by Camto (guest, #128967)
[Link] (2 responses)
Oh yes like PDFs. Those are so unportable.
Posted Dec 2, 2018 4:25 UTC (Sun)
by k8to (guest, #15413)
[Link]
Posted Dec 2, 2018 8:43 UTC (Sun)
by matthias (subscriber, #94967)
[Link]
Fun fact: in the national German computer-science contest (Bundeswettbewerb Informatik), we once got a solution for an exercise in which a fractal image had to be computed. The code was written entirely in PostScript; it could be sent to a PostScript printer to compute and print the image. There was no restriction on the programming language, only that the code had to be documented and the language had to be somewhat reasonable.
Posted Dec 1, 2018 14:48 UTC (Sat)
by nix (subscriber, #2304)
[Link]
One sign that BPF is nicely designed: as someone who's been hand-writing BPF recently (rewriting a code generator that used to generate output for a much more, ah, *verbose* intermediate representation with many more opcodes), whenever I found I needed a particular opcode, it was there, and nothing I didn't need was there except the weird historical stuff to do packet content inspection.
I like BPF. I thought I'd hate it, because all such things are generally hateful, but it's not hateful at all: it has no horrible irregularities that make you scream and most of the annoying limits as a general-purpose-but-verified language (stack size, etc, lack of even constrained loops, etc) are being raised or fixed as we speak. If any language takes over its world like BPF is doing, I'm happy it's BPF that's doing so.