The challenge of compiling for verified architectures
Marchesi, who has been working on a BPF backend for GCC, started by saying that the problem is inherently unsolvable. The BPF verifier is trying to do something that, in the theoretical sense, cannot be done; to make the problem tractable it imposes a number of restrictions on BPF programs, with the result that many correct programs still fail to pass verification. The verifier is a moving target as it evolves with each kernel release, and it is not the only verifier out there (the Windows BPF implementation has its own verifier that imposes a different set of restrictions). It is a challenging task but, perhaps, something can be done in practice to at least improve the situation.
The "world domination plan" for BPF support in GCC has several stages,
Marchesi said. The initial goal was to achieve parity with LLVM, which is
currently used to build all BPF programs. The LLVM BPF backend, he noted,
was created by the same developers who were working on the BPF virtual
machine, at the same time; each side could be changed to accommodate the
other. The GCC developers, instead, have to deal with the virtual machine
as it exists. So that task took a few years, but it has been largely
achieved.
The next step is to be able to compile and run all of the kernel's BPF tests; that is what is being worked on now. Compiling the tests is mostly done, but running them is trickier for the reasons described above: although the programs are known to be correct, the code generated by GCC runs afoul of the verifier. Once that problem has been overcome, Marchesi said, the next objective will be to compile and run all existing BPF programs.
A verified target like the BPF virtual machine is not the same as a sandbox, he said. A sandbox involves putting a jail around code to block any attempted bad behavior. The verifier, instead, proves to itself that a program is not harmful, then runs it in a privileged environment. That verification imposes a number of "peculiarities" that a compiler must cope with. For example, BPF programs have disjoint stacks, where each frame is stored separately from the others. But the real problem is generating verifiable code. As an example here, he noted that the verifier will complain about backward jumps in BPF code; developers can work to avoid them, but optimizing compilers can do interesting things, and it's not always clear where a backward jump might have been introduced. There are many other ways to fail verification, and the list changes from one kernel release to the next.
Verification problems, he said, come in two broad categories. One is that certain types of source constructs are simply not verifiable; these include computed gotos and loops without known bounds. The other is transformations from optimization; a developer can avoid problematic constructs and generally write with verification in mind, but an optimizing compiler can mangle the code and introduce problems. To be useful, a compiler for the BPF target must take extra pains to produce verification-friendly code — not a simple objective.
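The loop-bound problem in the first category can be sketched in plain C; this is an illustrative example, not taken from the talk, and it deliberately stands apart from any particular kernel or verifier version. A loop whose trip count is a compile-time constant can be simulated exhaustively by a verifier, while one whose bound is only known at run time cannot be proved to terminate without extra help:

```c
#include <stddef.h>

/* Constant trip count: a verifier can walk all 16 iterations and
 * prove both that the loop terminates and that every access stays
 * in bounds. */
static int sum_fixed(const int *buf)
{
	int sum = 0;
	for (int i = 0; i < 16; i++)
		sum += buf[i];
	return sum;
}

/* Data-dependent trip count: without additional bounds information,
 * a verifier cannot prove termination, so a loop like this is the
 * kind of source construct that fails verification. */
static int sum_variable(const int *buf, size_t n)
{
	int sum = 0;
	for (size_t i = 0; i < n; i++)
		sum += buf[i];
	return sum;
}
```

Both functions are correct C, and both compute the same thing when `n` is 16; the difference that matters to a verifier is entirely in what can be proved about the bound before the program runs.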
Paths toward a solution
Marchesi then presented a long list of approaches that could be taken to this problem, starting with "approach zero": "do nothing". That, he said, is the current strategy with the GCC work. It is good enough to be able to compile the DTrace BPF support routines, but far from sufficient in the general case. About 90% of the BPF self-tests, he said, do not currently get past the verifier when compiled with GCC. So this approach clearly is not good enough even for existing programs, much less those that might appear in the future.
Approach one is to disable optimizations entirely. This comes at a cost to both performance and program size; size is important, he said, because of the size limit imposed by the verifier. Another problem, it seems, is that some programs simply will not work without -O2 optimization; they rely on the constant folding and propagation that the optimizer provides. So this does not appear to be a viable option for a real-world compiler.
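A common pattern behind that reliance on optimization can be sketched hypothetically (the function names here are invented stand-ins, not real BPF helpers): code is written against an interface that only verifies when a size argument collapses to a compile-time constant, and it is constant propagation at -O2 that reduces a source-level variable to the literal value the verifier needs to see.

```c
#include <string.h>

#define BUF_SIZE 8

/* Hypothetical stand-in for a BPF helper whose "size" argument must
 * be a known constant, within the bounds of dst, for the program to
 * pass verification. */
static inline int read_bytes(char *dst, const char *src,
			     unsigned long size)
{
	memcpy(dst, src, size);
	return (int)size;
}

int copy_record(char *dst, const char *src)
{
	unsigned long len = BUF_SIZE;	/* a variable in the source... */

	/* ...but constant propagation at -O2 folds "len" to 8 at the
	 * call site, which is what makes the generated code
	 * acceptable; without optimization, the helper would be
	 * handed an unknown value. */
	return read_bytes(dst, src, len);
}
```

At run time the program behaves identically at any optimization level; what changes is whether the constant is visible in the generated code, which is exactly the kind of property a verifier cares about and a language standard does not.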
Approach two is to disable optimizations more selectively, finding the ones that result in unverifiable code and disabling them. This is a task that would have to be handled automatically, he said. It could be addressed by formalizing the constraints imposed by the verifier, then testing the code produced after each optimization pass to ensure that the constraints are satisfied. If a problem is introduced, the changes from that pass could be discarded. Some passes, though, can introduce a lot of different optimizations; throwing them out would lose the good changes along with the bad ones. Some problems, he said, are also caused by combinations of passes.
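The check-and-discard idea behind approach two can be sketched abstractly. None of the types or functions below correspond to real GCC internals; this is only a model of the control flow being described, with a placeholder constraint check standing in for a formalized verifier model:

```c
#include <string.h>

#define IR_SIZE 64

/* Toy intermediate representation. */
struct ir { int insns[IR_SIZE]; };

typedef void (*pass_fn)(struct ir *);

/* Placeholder constraint check: here "verifiable" simply means no
 * negative opcodes, standing in for rules like "no backward jumps". */
static int satisfies_constraints(const struct ir *ir)
{
	for (int i = 0; i < IR_SIZE; i++)
		if (ir->insns[i] < 0)
			return 0;
	return 1;
}

/* Apply each pass tentatively on a snapshot; keep the result only if
 * it still satisfies the constraints. */
static void run_passes(struct ir *ir, pass_fn *passes, int npasses)
{
	for (int p = 0; p < npasses; p++) {
		struct ir candidate = *ir;	/* snapshot */
		passes[p](&candidate);
		if (satisfies_constraints(&candidate))
			*ir = candidate;	/* keep the pass */
		/* else: discard the whole pass, losing any good
		 * changes along with the bad ones */
	}
}

/* Two invented example passes for illustration. */
static void harmless_pass(struct ir *ir) { ir->insns[0] = 1; }
static void breaking_pass(struct ir *ir) { ir->insns[1] = -1; }
```

The weakness Marchesi described is visible in the model: the discard operates at the granularity of a whole pass, and it cannot catch a violation that only emerges from the interaction of several passes that each look fine in isolation.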
A different way, approach three, would be "antipasses" that would explicitly undo code transformations that create problems. This is the current approach used by LLVM; it has the advantage of only undoing problematic changes, with no effect on the rest of the compilation process. But antipasses are fragile and easily broken by changes elsewhere in the compiler; that leads to "maintenance hell". There was, he said, frustration expressed at this year's LSFMM+BPF Summit by developers who find that they can only successfully build their programs with specific LLVM versions. As a result, the LLVM developers are adding new features, but users are sticking with older releases that actually work and are not benefiting from those features.
Approach four would be target-driven pass tailoring, or the disabling of specific transformations by hooking directly into the optimization passes. LLVM is trying to move in this direction, he said, but other compiler developers are resisting that approach. It leads to entirely legal (from the language sense) transformations becoming illegal and unusable. This is, though, a more robust approach to the problem.
A variant would be approach five: generic pass tailoring. A new option (-Overifiable, perhaps) could be added to the compiler alongside existing options like -Os or -Ofast. With this option, optimizations would be restricted to those that make relatively unsurprising transformations to the code. This option would not be restricted to any one backend and could be useful in other contexts. Some companies, he said, have security teams that run static analyzers on binary code. A more predictable code stream created by an option like this (he even suggested that -Opredictable might be a better option name) would be easier for the analyzers to work with.
Approach six would be language-level support; this could take the form of, for example, #pragma statements providing loop bounds. It was quickly pointed out, though, that the verifier cannot trust source-level declarations of this kind, so this approach was discarded. Approach seven, adding support to the assembler, was likewise set aside as unworkable.
Marchesi concluded the talk by saying that, in the end, a combination of
the above approaches will probably be needed to get GCC to the point where
it routinely creates verifiable BPF code. He would like to build a wider
consensus on the details of what is to be done before proceeding much
further. To that end, he is working with the BPF standardization process,
and would like to get other GCC developers involved as well; he is also
coordinating with the LLVM developers. The problem may be unsolvable in
the general sense, but it should still be possible to make things better
for BPF developers.
| Index entries for this article | |
|---|---|
| Kernel | BPF/Compiler support |
| Conference | GNU Tools Cauldron/2023 |
