|
|

The challenge of compiling for verified architectures

By Jonathan Corbet
October 6, 2023

On its surface, the BPF virtual machine resembles many other computer architectures; it has registers and instructions to perform the usual operations. But there is a key difference: BPF programs must pass the kernel's verifier before they can be run. The verifier imposes a long list of additional restrictions so that it can prove to itself that any given program is safe to run; getting past those checks can be a source of frustration for BPF developers. At the 2023 GNU Tools Cauldron, José Marchesi looked at the problem of compiling for verified architectures and how the compiler can generate code that will pass verification.

Marchesi, who has been working on a BPF backend for GCC, started by saying that the problem is inherently unsolvable. The BPF verifier is trying to do something that, in the theoretical sense, cannot be done; to make the problem tractable it imposes a number of restrictions on BPF programs, with the result that many correct programs still fail to pass verification. The verifier is a moving target as it evolves with each kernel release, and it is not the only verifier out there (the Windows BPF implementation has its own verifier that imposes a different set of restrictions). It is a challenging task but, perhaps, something can be done in practice to at least improve the situation.

[José Marchesi] The "world domination plan" for BPF support in GCC has several stages, Marchesi said. The initial goal was to achieve parity with LLVM, which is currently used to build all BPF programs. The LLVM BPF backend, he noted, was created by the same developers who were working on the BPF virtual machine, and at the same time; each side could be changed to accommodate the other. The GCC developers, instead, have to deal with the virtual machine as it exists. That task took a few years, but it has been largely achieved.

The next step is to be able to compile and run all of the kernel's BPF tests; that is what is being worked on now. Compiling the tests is mostly done, but running them is trickier for the reasons described above: although the programs are known to be correct, the code generated by GCC runs afoul of the verifier. Once that problem has been overcome, Marchesi said, the next objective will be to compile and run all existing BPF programs.

A verified target like the BPF virtual machine is not the same as a sandbox, he said. A sandbox involves putting a jail around code to block any attempted bad behavior. The verifier, instead, proves to itself that a program is not harmful, then runs it in a privileged environment. That verification imposes a number of "peculiarities" that a compiler must cope with. For example, BPF programs have disjoint stacks, where each frame is stored separately from the others. But the real problem is generating verifiable code. As an example here, he noted that the verifier will complain about backward jumps in BPF code; developers can work to avoid them, but optimizing compilers can do interesting things, and it's not always clear where a backward jump might have been introduced. There are many other ways to fail verification, and the list changes from one kernel release to the next.

Verification problems, he said, come in two broad categories. One is that certain types of source constructs are simply not verifiable; these include computed gotos and loops without known bounds. The other is transformations from optimization; a developer can avoid problematic constructs and generally write with verification in mind, but an optimizing compiler can mangle the code and introduce problems. To be useful, a compiler for the BPF target must take extra pains to produce verification-friendly code — not a simple objective.
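
To make the first category concrete, consider the following sketch (an illustration, not code from the talk): a loop whose trip count depends only on runtime data is the classic construct the verifier cannot bound, while capping the same loop with a compile-time constant gives the verifier a finite number of states to explore.

#define MAX_ITER 64

/* Likely rejected: the verifier cannot bound a loop whose trip count
   depends only on runtime data, such as a length read from a map. */
static long sum_unbounded(const long *data, unsigned int n)
{
        long sum = 0;

        for (unsigned int i = 0; i < n; i++)
                sum += data[i];
        return sum;
}

/* Verification-friendly: the constant cap bounds the number of states
   the verifier must explore, at the cost of truncating the work. */
static long sum_bounded(const long *data, unsigned int n)
{
        long sum = 0;

        for (unsigned int i = 0; i < MAX_ITER && i < n; i++)
                sum += data[i];
        return sum;
}

The second category is harder to illustrate in source form, precisely because the problems only appear after the optimizer has rewritten code like the above.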

Paths toward a solution

Marchesi then presented a long list of approaches that could be taken to this problem, starting with "approach zero": "do nothing". That, he said, is the current strategy with the GCC work. It is good enough to be able to compile the DTrace BPF support routines, but far from sufficient in the general case. About 90% of the BPF self-tests, he said, do not currently get past the verifier when compiled with GCC. So this approach clearly is not good enough even for existing programs, much less those that might appear in the future.

Approach one is to disable optimizations entirely. This comes at a cost to both performance and program size; size is important, he said, because of the size limit imposed by the verifier. Another problem, it seems, is that some programs simply will not work without -O2 optimization; they rely on the constant folding and propagation that the optimizer provides. So this does not appear to be a viable option for a real-world compiler.
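
One common shape of that dependence is a length that must be provably bounded at the point of a helper call; the sketch below (a hypothetical example written for illustration, assuming libbpf's bpf_helpers.h) shows the pattern. With constant folding and propagation at -O2, the clamp stays visibly attached to the call; compiled without optimization, the value may be spilled and reloaded in ways that can leave the verifier unable to prove the same bound.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

#define MAX_COPY 64

struct {
        __uint(type, BPF_MAP_TYPE_ARRAY);
        __uint(max_entries, 1);
        __type(key, __u32);
        __type(value, __u32);
} len_map SEC(".maps");

char buf[MAX_COPY];
char LICENSE[] SEC("license") = "GPL";

SEC("kprobe/do_nanosleep")
int probe(void *ctx)
{
        __u32 key = 0;
        __u32 *lenp = bpf_map_lookup_elem(&len_map, &key);
        __u32 len;

        if (!lenp)
                return 0;

        len = *lenp;
        /* The verifier needs a proven bound on 'len' at the helper call;
           optimization keeps this clamp and the call visibly connected. */
        if (len > MAX_COPY)
                len = MAX_COPY;

        bpf_probe_read_kernel(buf, len, ctx);
        return 0;
}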

Approach two is to disable optimizations more selectively, finding the ones that result in unverifiable code and disabling them. This is a task that would have to be handled automatically, he said. It could be addressed by formalizing the constraints imposed by the verifier, then testing the code produced after each optimization pass to ensure that the constraints are satisfied. If a problem is introduced, the changes from that pass could be discarded. Some passes, though, can introduce a lot of different optimizations; throwing them out would lose the good changes along with the bad one. Some problems, he said, are also caused by combinations of passes.
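
As a sketch of what such a constraint check might look like (illustrative code written for this article, covering only the backward-jump rule mentioned earlier and assuming the kernel's struct bpf_insn layout), something like the following could be run over the generated instruction stream after each pass:

#include <stdbool.h>
#include <stddef.h>
#include <linux/bpf.h>   /* struct bpf_insn, BPF_CLASS(), BPF_OP() */

/* Return true if the instruction sequence contains a backward jump,
   i.e. a branch whose target precedes the branch itself.  A real
   implementation would encode the full set of verifier constraints. */
static bool has_backward_jump(const struct bpf_insn *insns, size_t len)
{
        for (size_t i = 0; i < len; i++) {
                __u8 class = BPF_CLASS(insns[i].code);
                __u8 op = BPF_OP(insns[i].code);

                if (class != BPF_JMP && class != BPF_JMP32)
                        continue;
                if (op == BPF_CALL || op == BPF_EXIT)
                        continue;

                /* Jump offsets are relative to the next instruction. */
                if ((long)i + 1 + insns[i].off <= (long)i)
                        return true;
        }
        return false;
}

The full set of such checks would, of course, have to track the verifier as it evolves from one kernel release to the next, which is part of the maintenance cost being weighed here.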

A different way, approach three, would be "antipasses" that would explicitly undo code transformations that create problems. This is the current approach used by LLVM; it has the advantage of only undoing problematic changes, with no effect on the rest of the compilation process. But antipasses are fragile and easily broken by changes elsewhere in the compiler; that leads to "maintenance hell". There was, he said, frustration expressed at this year's LSFMM+BPF Summit by developers who find that they can only successfully build their programs with specific LLVM versions. As a result, the LLVM developers are adding new features, but users are sticking with older releases that actually work and are not benefiting from those features.

Approach four would be target-driven pass tailoring, or the disabling of specific transformations by hooking directly into the optimization passes. LLVM is trying to move in this direction, he said, but other compiler developers are resisting that approach. It leads to transformations that are entirely legal (in the language sense) becoming illegal and unusable. This is, though, a more robust approach to the problem.

A variant would be approach five: generic pass tailoring. A new option (-Overifiable, perhaps) could be added to the compiler alongside existing options like -Os or -Ofast. With this option, optimizations would be restricted to those that make relatively unsurprising transformations to the code. This option would not be restricted to any one backend and could be useful in other contexts. Some companies, he said, have security teams that run static analyzers on binary code. A more predictable code stream created by an option like this (he even suggested that -Opredictable might be a better option name) would be easier for the analyzers to work with.

Approach six would be language-level support; this could take the form of, for example, #pragma statements providing loop bounds. It was quickly pointed out, though, that the verifier cannot trust source-level declarations of this kind, so this approach was quickly discarded. Approach seven, adding support to the assembler, was also quickly set aside as unworkable.

Marchesi concluded the talk by saying that, in the end, a combination of the above approaches will probably be needed to get GCC to the point where it routinely creates verifiable BPF code. He would like to build a wider consensus on the details of what is to be done before proceeding much further. To that end, he is working with the BPF standardization process, and would like to get other GCC developers involved as well; he is also coordinating with the LLVM developers. The problem may be unsolvable in the general sense, but it should still be possible to make things better for BPF developers.

Index entries for this article
Kernel: BPF/Compiler support
Conference: GNU Tools Cauldron/2023



approach six

Posted Oct 6, 2023 15:56 UTC (Fri) by jemarch (subscriber, #116773) [Link] (2 responses)

We thought a bit more about approach six and source level pragmas/annotations after the event. While it is true that a pragma like the following cannot be trusted by the verifier:

#pragma loop bounded 64
for (int i = 0; i < n; ++i) { ... }

Another kind of pragma may be useful to make compiling verified code more practical. Let's call them "must-pragmas". For example:

#pragma loop must bound 0, 64
for (int i = 0; i < n; ++i) { ... }

That pragma would make the compiler fail at compile time if it is not able to determine with 100% certainty, considering all the stages of compilation, that the bounds of the loop are indeed 0 and 64.

The idea is that if the compiler can figure out the loop bounds, then it may be possible to emit BPF code in a form from which the verifier can also determine the same bounds, and therefore there is a chance the compiler can produce a verifiable program.

On the other hand, if the compiler cannot determine the loop bounds using all the information it has about the program being compiled (from source code to the several intermediate representations at their respective abstraction levels), then there is little or no hope for a low-level verifier, which only has access to the compiled instructions, to do so. A compile-time error is in order in that case, pointing to the specific "non-verifiable" source code construct.

Another example of that kind of pragma could be `#pragma loop must unroll' and, in general, annotations that make the compiler fail if some particular optimization has not been successfully applied to some particular source code construct.

So approach six may still contribute to the solution... perhaps 8-)

We plan to continue and expand the discussion on how to make compiling to BPF and verified targets more practical at the LPC Toolchains Track in November, along with the clang/llvm and kernel chaps. It will be fun, interesting and hopefully useful.

approach six

Posted Oct 6, 2023 20:28 UTC (Fri) by ejr (subscriber, #51652) [Link] (1 responses)

I've seen references combining CompCert and BPF, but I certainly am not up to date.

Do you have any insights on how the certifiable-C-ish language infrastructures could help?

approach six

Posted Oct 7, 2023 18:16 UTC (Sat) by jemarch (subscriber, #116773) [Link]

Hmm, AFAIK the verification provided by CompCert is that the generated code implements the same semantics as the source language file. I don't think that is exactly relevant to the verified targets problem. However, the notion/possibility of defining a BPF C subset of the C programming language that makes it easier to compile into verifiable code has already been raised by some. How similar that would be to MISRA is an interesting question.

Approach Load-time-optimization

Posted Oct 6, 2023 22:16 UTC (Fri) by rywang014 (subscriber, #167182) [Link] (5 responses)

The problem is that verification is done on the optimized byte code, but it would be easier to do on the pre-optimized form. So we can imagine submitting pre-optimized byte code, verifying it, and then running it through a trusted, in-kernel optimizer before it is actually loaded.

Approach Load-time-optimization

Posted Oct 7, 2023 0:44 UTC (Sat) by khim (subscriber, #9252) [Link] (4 responses)

People tried it and it just doesn't work. This assumes that your optimizer doesn't have bugs, which is surprisingly hard, and almost impossible to achieve if you allow it to accept arbitrary input.

A trusted optimizer is just something that doesn't exist, period.

Java developers tried to achieve that state and, ultimately, failed, after spending many billions on that.

Approach Load-time-optimization

Posted Oct 7, 2023 1:46 UTC (Sat) by roc (subscriber, #30627) [Link] (3 responses)

There's a way around that: Proof-carrying code. https://en.wikipedia.org/wiki/Proof-carrying_code
It has never really been used AFAIK but the theory works. It's just hard work to write the compiler the right way.

Approach Load-time-optimization

Posted Oct 7, 2023 6:41 UTC (Sat) by Wol (subscriber, #4433) [Link] (2 responses)

> It has never really been used AFAIK but the theory works. It's just hard work to write the compiler the right way.

In theory, theory and practice are the same. In practice, they are not.

Which is why I swear BY relational theory (it makes data analysis oh so easy), but I swear AT RDBMSs, and don't touch them with a barge pole unless I'm forced to (which is most of the time, now, sadly) :-(

Cheers,
Wol

Approach Load-time-optimization

Posted Oct 7, 2023 9:59 UTC (Sat) by joib (subscriber, #8541) [Link] (1 responses)

I'm slightly in awe of your ability to twist every discussion into an argument about the merits of what was this thing called again (maybe Pick?) vs RDBMSs. Here, have a (virtual) diploma.

Approach Load-time-optimization

Posted Oct 7, 2023 17:08 UTC (Sat) by Wol (subscriber, #4433) [Link]

:-)

Unfortunately, the more I work with them, the more I curse them. I can't help being a fanatic :-)

Cheers,
Wol

The challenge of compiling for verified architectures

Posted Oct 7, 2023 1:51 UTC (Sat) by roc (subscriber, #30627) [Link] (16 responses)

> A sandbox involves putting a jail around code to block any attempted bad behavior. The verifier, instead, proves to itself that a program is not harmful, then runs it in a privileged environment.

This is a bit of a false dichotomy. WebAssembly combines static checks with dynamic checks, e.g. there is static checking that you don't use an uninitialized local variable, and dynamic checking that pointer accesses are in-bounds. Likewise BPF combines dynamic checks (e.g. array bounds) with static checks. WebAssembly leans more towards the dynamic side, that's all.

If someone really thinks there's a hard line between a sandbox and a verifier, I'd like to know how they draw that line.

The challenge of compiling for verified architectures

Posted Oct 7, 2023 18:51 UTC (Sat) by ibukanov (subscriber, #3942) [Link] (10 responses)

There is an important difference between BPF and WebAssembly that can be used to draw the line.

BPF in Linux allows untrusted code to run in the same address space as the trusted code. So far, all known speculative-execution bugs have been addressed. WebAssembly implementations, as I understand it, have mostly given up on that and assume that the security boundary is provided by address-space isolation.

The challenge of compiling for verified architectures

Posted Oct 7, 2023 21:46 UTC (Sat) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

This is incorrect. WebAssembly by construction can't access memory outside of its sandbox. Just like in eBPF, there's no way to create arbitrary pointers that can have unchecked access to memory.

It's really no different from eBPF these days. Except that it has actual flow control and much better debugged code.

For example: https://wasmer.io/posts

The challenge of compiling for verified architectures

Posted Oct 7, 2023 23:26 UTC (Sat) by helge.bahmann (subscriber, #56804) [Link] (8 responses)

You do not need to create arbitrary pointers, you just need to coax the CPU into speculating a confusion between a pointer and an integer.

Roughly, this works by calling a function typed (void*)(uint64_t) through an indirect call (function pointer). While the WASM security model prevents you from directly assigning a function fn with signature void fn(void*) to the function pointer in question, you can cause the CPU to speculate that the call through your function pointer may end up in fn (via BTB poisoning) and therefore cause speculated access to arbitrary memory locations from WASM into the host address space (this similarly applies to e.g. JavaScript). This can be exploited really reliably as long as the JIT ever emits an indirect call (and all JITs known to me do that, but notably researchers have also found ways to use the branch cascade resulting from switch/case-like constructs to cause the required BTB confusion).
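
For illustration, a minimal C sketch of the signature confusion being described (the function names are made up, and no branch-predictor training or exploitation is shown):

#include <stdint.h>

/* The signature the sandbox believes it is calling through: the
   argument is an integer index that has passed a bounds check. */
typedef uint64_t (*index_fn)(uint64_t index);

/* Same call ABI, but the argument is treated as an address.  If the
   branch predictor is trained so the indirect call below speculatively
   resolves here, the "index" is dereferenced as a host pointer. */
uint64_t gadget(void *addr)
{
        return *(volatile uint64_t *)addr;
}

uint64_t benign(uint64_t index)
{
        return index + 1;
}

uint64_t call_indirect(index_fn fn, uint64_t arg)
{
        /* Architecturally fn always points at benign(); the attack only
           needs the CPU to speculate that it might point at gadget(). */
        return fn(arg);
}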

From a high level point of view, you have to assume that any "protection" offered by a language's or runtime's type system to separate pointers from arbitrary integers can be circumvented by speculative confusion of branch targets.

The challenge of compiling for verified architectures

Posted Oct 7, 2023 23:41 UTC (Sat) by helge.bahmann (subscriber, #56804) [Link]

Should note: TTBOMK Wasm JITs (notably wasmtime) are not emitting retpolines, which would at least stop the "naive" Spectre v2 described above. I am doubtful that even doing so would close all loopholes; it is damn hard to guarantee that adversarial code is jitted without side channels (unless you completely give up on asymptotically approaching "native" performance). I am not convinced eBPF is succeeding against side-channel attacks either.

The challenge of compiling for verified architectures

Posted Oct 8, 2023 2:06 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

> Roughly, this works by calling a function typed (void*)(uint64_t) through an indirect call (function pointer).

Wasmtime does have a mitigation for this particular vulnerability: https://bytecodealliance.org/articles/security-and-correc...

It's fair to say that WASM does not try to guarantee the lack of side-channels in general. Implementations try to add mitigations where they feel that it's reasonable, but are not fixated on that. eBPF probably has some advantage in that area, but I doubt it's really insurmountable.

The challenge of compiling for verified architectures

Posted Oct 8, 2023 3:43 UTC (Sun) by helge.bahmann (subscriber, #56804) [Link]

Are you certain? To my knowledge, only "out-of-bounds" function-table accesses are protected (so you "probably" cannot speculatively jump out of JIT-generated code), and no retpolines are produced either. I know that in 2021 I managed, after a lot of trial and error, to get a speculated call to a type-confused function to happen on the wasmtime JIT (AMD Zen2, and no IBRS in user space). Maybe I have missed the situation changing since then, but I am not aware of it.

The challenge of compiling for verified architectures

Posted Oct 8, 2023 3:25 UTC (Sun) by roc (subscriber, #30627) [Link] (4 responses)

eBPF may have more comprehensive Spectre mitigations than (some implementations of) WebAssembly, but that would be a surprising way to define "sandbox" vs "verifier".

The challenge of compiling for verified architectures

Posted Oct 8, 2023 5:04 UTC (Sun) by ibukanov (subscriber, #3942) [Link] (3 responses)

Checks at runtime can be bypassed via speculation; checks in the verifier cannot. Some runtime checks can be protected against speculative leaks, but the V8 implementation of JS and Wasm, for example, gave up on that.

So the Wasm sandbox requires too many dynamic checks to hope for reliable practical protection against Spectre and friends. The eBPF implementation in Linux, by reducing the number of runtime checks and constraining the code sufficiently, still manages to provide protection against known attacks. Hence the ability to provide practical protection against speculative leaks is a useful classification criterion.

The challenge of compiling for verified architectures

Posted Oct 8, 2023 7:06 UTC (Sun) by Cyberax (✭ supporter ✭, #52523) [Link]

> Checks at runtime can be bypassed via a speculation, checks in the verifier cannot.

Does that include new and fancy features like iterators?

The challenge of compiling for verified architectures

Posted Oct 8, 2023 9:57 UTC (Sun) by roc (subscriber, #30627) [Link] (1 responses)

Looks to me like Wasmtime is still "hoping for reliable practical protection against SPECTRE and friends".
https://bytecodealliance.org/articles/security-and-correc...

I still think this is a poor way to draw a line between "sandbox" and "verifier". If Spectre vulnerabilities are found in eBPF will it cease to be a verifier and become a sandbox? That doesn't make sense to me.

The challenge of compiling for verified architectures

Posted Oct 8, 2023 12:57 UTC (Sun) by ibukanov (subscriber, #3942) [Link]

Presently all found speculation bugs have been promptly addressed in BPF, while Wasm implementations continue to hope. Yes, this difference is poorly defined and probably cannot be formalized. Yet it is clear that it is affected by the nature of the checks that have to be done at runtime.

The challenge of compiling for verified architectures

Posted Oct 9, 2023 2:24 UTC (Mon) by njs (subscriber, #40338) [Link] (4 responses)

I think the big difference is in the consequences for ffi. If you want to expose some gross low level unsafe operations to ebpf that temporarily make the kernel's state inconsistent, that's fine – you just need to teach the verifier how to recognize when consistency is restored, so it can reject any programs that fail to do that.

With a sandbox, you have to enter the inconsistent state before you can discover that the program is bad – which means your sandbox needs to be able to recover from arbitrarily complex inconsistent states after the fact, which is harder than recognizing them.

This is closely related to ebpf verifying termination up front, vs a sandbox that handles non-termination by enforcing a timeout. If you want to enforce a timeout, it means you need a plan to recover consistency from any state the program could reach, because you don't know when the timeout might fire.
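
The verifier's reference tracking is a concrete instance of this: a helper can hand the program a kernel object, and any path on which the matching release is missing gets the whole program rejected up front. A rough sketch of the pattern (assuming libbpf's bpf_helpers.h and bpf_endian.h):

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

char LICENSE[] SEC("license") = "GPL";

SEC("tc")
int track_ref(struct __sk_buff *skb)
{
        struct bpf_sock_tuple tuple = {};
        struct bpf_sock *sk;

        tuple.ipv4.dport = bpf_htons(80);

        /* The helper returns a referenced socket: kernel state the
           program is now responsible for. */
        sk = bpf_sk_lookup_tcp(skb, &tuple, sizeof(tuple.ipv4),
                               BPF_F_CURRENT_NETNS, 0);
        if (!sk)
                return 0;

        /* ... use sk ... */

        /* Omitting this release on any path makes the verifier refuse
           to load the program, rather than leaving the runtime to clean
           up after a misbehaving one. */
        bpf_sk_release(sk);
        return 0;
}

A sandbox, by contrast, would have to notice the leaked reference at runtime and then somehow unwind whatever state the program had built up around it.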

The challenge of compiling for verified architectures

Posted Oct 9, 2023 17:46 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (3 responses)

> – which means your sandbox needs to be able to recover from arbitrarily complex inconsistent states after the fact, which is harder than recognizing them.

Not necessarily. You just have to be able to safely crash the tab in which the wasm is running. That's perfectly secure, but maybe not the best UX in the world, so browser developers *prefer* to try and fix the sandbox. But if the sandbox is unfixable, crashing the tab is always a valid option. The real dangers are:

* Failing to recognize that the sandbox is broken in the first place.
* Convincing yourself that you have fixed the sandbox, but it is not actually fixed.
* Recognizing that the sandbox is broken, but too late (so the attacker has already had an opportunity to attack other parts of the browser and/or exfiltrate data).

The challenge of compiling for verified architectures

Posted Oct 9, 2023 20:20 UTC (Mon) by Wol (subscriber, #4433) [Link]

> * Recognizing that the sandbox is broken, but too late (so the attacker has already had an opportunity to attack other parts of the browser and/or exfiltrate data).

I thought we were talking about kernels, not browsers.

The user is not going to be inconvenienced too much by wasm crashing a browser window, but the company could easily go bust if bpf crashes the kernel and/or exfiltrates data.

Cheers,
Wol

The challenge of compiling for verified architectures

Posted Oct 10, 2023 7:32 UTC (Tue) by njs (subscriber, #40338) [Link] (1 responses)

> if the sandbox is unfixable, crashing the tab is always a valid option

Not if the tab has been granted FFI access to functions that manipulate state in the host. It's host state that I'm talking about, not sandbox state. eBPF programs regularly do things like write to long-lived data structures (maps, ring buffers), modify the reference counts of kernel objects, mutate packets in flight, etc.

The challenge of compiling for verified architectures

Posted Oct 12, 2023 19:13 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

"Crashing the tab" is something that only exists in browsers, and to the best of my knowledge, no browser runs eBPF, so I'm really not sure what you're getting at here.

The challenge of compiling for verified architectures

Posted Oct 7, 2023 11:57 UTC (Sat) by gray_-_wolf (subscriber, #131074) [Link] (3 responses)

> Another problem, it seems, is that some programs simply will not work without -O2 optimization; they rely on the constant folding and propagation that the optimizer provides.

Does this not mean the programs are broken and should be fixed?

The challenge of compiling for verified architectures

Posted Oct 7, 2023 12:37 UTC (Sat) by mathstuf (subscriber, #69389) [Link] (1 responses)

Could it be that programs need -O2 to get under the instruction budget that (IIRC) eBPF enforces? Either verifier time or execution time could be too much without some kind of tree shaking or trivial inlining.

The challenge of compiling for verified architectures

Posted Oct 7, 2023 12:49 UTC (Sat) by segher (subscriber, #109337) [Link]

Not just the instruction budget, but you need some optimisation to pass the verifiers at all (mostly -O1, but still).

The challenge of compiling for verified architectures

Posted Oct 8, 2023 5:21 UTC (Sun) by ibukanov (subscriber, #3942) [Link]

As the article explains, constant folding, for example, has to be done to a sufficient extent to pass the verifier. Something like consteval from the latest C++ may help here, but that is not in C. Plus, having to sprinkle consteval around results in ugly code.

The challenge of compiling for verified architectures

Posted Oct 9, 2023 16:44 UTC (Mon) by iabervon (subscriber, #722) [Link]

I'm surprised that they don't have "no backward jumps" as an attribute that targets can declare and that affects the optimization possibilities. I'd have thought that one input to the optimizer (or, really, to ordering in code generation, since assigning an ordering is always necessary even without doing transformations for optimization) would be the costs of jumps with different offsets, and a target could just say that any negative-offset jump costs more than any other code. You'd still want a hint that it should find a sequence with only forward jumps if the graph doesn't have cycles, rather than making it guess what would be relatively inexpensive, but it should be possible to arrange that any transformation that introduces cycles, or puts a block out of execution order, looks to the optimizer like it would make the code run slower.

In general, it seems like, if it were possible to get clear information on what will definitely pass verification, and it were possible to express that as costing less than other instruction sequences, then it would naturally do the right thing on the general principle that any transformation that turns out to increase the cost estimate rather than decreasing it should be skipped in this particular case.


Copyright © 2023, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds