LWN: Comments on "Bounded loops in BPF for the 5.3 kernel"

Bounded loops in BPF for the 5.3 kernel

johill — Thu, 08 Aug 2019 19:27:09 +0000

The verifier always halts, simply because it gives up after a while. It only processes a maximum of 1e6 instructions (BPF_COMPLEXITY_LIMIT_INSNS) before giving up:

                if (++env->insn_processed > BPF_COMPLEXITY_LIMIT_INSNS) {
                        verbose(env,
                                "BPF program is too large. Processed %d insn\n",
                                env->insn_processed);
                        return -E2BIG;
                }

Bounded loops in BPF for the 5.3 kernel

fuhchee — Wed, 07 Aug 2019 15:48:11 +0000

Yes, exactly - construing the language as the sequence of bpf bytecodes that are accepted as inputs.

Bounded loops in BPF for the 5.3 kernel

laarmen — Wed, 07 Aug 2019 07:39:37 +0000

What the halting problem basically says is that there are some programs for which one cannot tell if they will halt or not. Doesn't mean all programs, nor even most programs, just "some". I'm not aware of any proof that the verifier is one of those programs (nor of a proof that it halts, for that matter), and a quick Google search doesn't turn out any promising link.

Anyone aware of such work ?

Bounded loops in BPF for the 5.3 kernel

massimiliano — Tue, 06 Aug 2019 11:39:32 +0000

The way I see it, the semantics of the BPF is that a program needs to be verified, and then executed. The verification step is an integral part of the execution, because if affects the final result of the high level invocation of a BPF program (the invocation might fail because the verification fails).

My take is that the verifier, being implemented in a Turing complete language (C), is subject to the halting problem and in principle could never terminate.

It is like if every BPF program had an initial implicit "verify" instruction opcode. A BPF program without verification step is Turing complete (thanks to its loops), and one with the verification step... maybe is not Turing complete by itself (it number of states is bounded so in the end it is a a finite state machine), but it is still subject to the halting problems because of the verifier!

Of course, as said in other comments, nothing of this matters in practice. But it is one way of understanding how BPF programs are not violating the "halting problem" principle: it's just that kernel developers are trusting the verifier to always terminate :-)

Bounded loops in BPF for the 5.3 kernel

farnz — Tue, 06 Aug 2019 09:27:20 +0000

As an aside, examining the code in Matt Godbolt's Compiler Explorer shows that gcc 9.1 leaves an unnecessary MOV before UD2, while clang goes straight to RET. MSVC compiles to clear RAX, load EAX from *RAX, and return 0, while ICC generates code to set up SSE2 the way it wants it, loads AL from *NULL, and then calls an "abort" subroutine.

Bounded loops in BPF for the 5.3 kernel

ecree — Mon, 05 Aug 2019 17:21:29 +0000

So the verifier does not incur the cost of tracking all registers precisely; instead, when the need arises, it backtracks through the use of a register in the code to generate a precise value if possible.

Alexei can correct me, but I don't think this is quite accurate. AIUI, the verifier always calculates precise value bounds, but normally that precision is ignored for pruning. Only if the path from a pruning point to an exit uses the precise value does the precise value need to be checked for pruning safety. So the backtracking is marking parent states as "needs precision", not actually generating new values.

Should verification be done in the kernel?

ecree — Mon, 05 Aug 2019 17:16:54 +0000

The eBPF developers are aware of this work, and we looked into whether any of it could be of use. The following analysis is my own, but I believe the other devs broadly agree with it.
There are some problems which are non-trivial, at least, with trying to extend PREVAIL to cover the full feature set of the existing verifier, including:
* the zone domain can only support relational difference constraints _or_ relational integer congruence constraints (Miné 2004 §5.4.6), not both at once (the paper just states this is 'straightforward' without further explanation) which is needed for alignment checking.
* similarly, the paper doesn't actually explain how abstract interpretation can check loop termination; once you've got a fixed-point state set, how do you determine that the program doesn't go round in circles within that set? Just stating that it's possible doesn't guide implementors much.
* the verifier today has checks more complicated than simple 'safety of execution', for instance it can detect certain speculative-execution vulnerabilities (don't ask me how this works). It's not clear that an abstract semantics can be written that deals with this.
* sure your runtime complexity is good, but that memory usage!
Our conclusion was that the best use we could make of PREVAIL was to incorporate some of its detail ideas, while rejecting the overall architecture. In particular, I was intrigued by the concept (§7.3) of having userspace supply (essentially) a safety proof which the kernel merely validates; the equivalent of this in the existing verifier's architecture would be to specify explored_states at pruning points which are both sufficient to prove safety downstream and provable from the upstream code (which currently are discovered by the verifier and often overly precise, leading to path explosion); this would be particularly useful in the form of loop invariants as the kernel verifier would only need to prove (a) holds at loop entry, (b) maintained by loop and (c) sufficient to prove safety from loop exit. I haven't had the time to dig into this myself (and expect to be far too busy in the near future).

Bounded loops in BPF for the 5.3 kernel

excors — Mon, 05 Aug 2019 17:09:45 +0000

I guess that's because it's only undefined behaviour if you actually execute main(), and the compiler doesn't know if you're going to execute main(), so it can't reject the code at compile time. And in practice this pattern probably occurs frequently in dead code, so programmers would get annoyed if it generated a warning.

GCC still knows that function must never be executed, so it replaces it with the "ud2" instruction (at least with gcc -O2).

Bounded loops in BPF for the 5.3 kernel

shane — Mon, 05 Aug 2019 16:40:09 +0000

Hm... turning your code into a C program:

#include <stddef.h>

int
main ()
{
    int* p = NULL;
    int i = *p;
    return i;
}

We discover that clang doesn't warn or complain in any way:

$ clang --version
clang version 8.0.0-3 (tags/RELEASE_800/final)
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
$ clang -Weverything foo.c
$ ./a.out
Segmentation fault (core dumped)

Likewise by default GCC doesn't:

$ gcc --version
gcc (Ubuntu 8.3.0-6ubuntu1) 8.3.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ gcc -Wextra -Wall -Wpedantic foo.c
$ ./a.out
Segmentation fault (core dumped)

We can use a feature disabled by default:

$ gcc -Werror=null-dereference -O foo.c 
foo.c: In function ‘main’:
foo.c:7:9: error: null pointer dereference [-Werror=null-dereference]
     int i = *p;
         ^
cc1: some warnings being treated as errors

Enabling this warning apparently breaks the GCC self-bootstrapping build (as well as things like building the Linux kernel). The ticket which added the warning is here:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=16351

Bounded loops in BPF for the 5.3 kernel

sdalley — Sat, 03 Aug 2019 14:06:29 +0000

Oh, that's not the issue. With the back-in-time feature, It can easily arrange to terminate before it begins, so everyone should be happy...

Bounded loops in BPF for the 5.3 kernel

mathstuf — Fri, 02 Aug 2019 13:39:36 +0000

Seems like a philisophical question to me. Is this C code? Or is it invalid because a reasonable C implementation could just say "no" rather than outputting any object code for it?

int* p = NULL;
int i = *p;

Bounded loops in BPF for the 5.3 kernel

Karellen — Thu, 01 Aug 2019 19:48:46 +0000

Just because the verifier only outputs programs which are guaranteed to halt, does that mean that the language used to express them is necessarily Turing-incomplete?

Bounded loops in BPF for the 5.3 kernel

johill — Thu, 01 Aug 2019 19:27:10 +0000

Yeah, I was indeed thinking of "the programs that will be accepted by the kernel verifier" - because that's actually what the article discusses (changes/improvements in the verifier).

The language itself is fairly obviously Turing complete (not that I've written down an emulator), but of course that is a theoretical concept anyway, as you say.

And in any case, going back to the OP's comment, BPF is almost certainly already reading email somewhere, in the form of DPI :-)

Bounded loops in BPF for the 5.3 kernel

excors — Thu, 01 Aug 2019 17:27:08 +0000

I think there's some ambiguity on what the BPF language is (in the theoretical computer science sense of "language"). Is it all the programs that can be syntactically expressed with BPF bytecode? Or is it all the programs that can be expressed *and* that the verifier accepts? I think johill was using the latter, but the phrase "verifier for Turing-complete languages" seems to imply the former.

The pre-verification language may be Turing-complete (if it had unbounded loops or equivalent). The post-verification language cannot be, assuming the verifier only accepts programs that it can prove will halt (and rejects ones where it can't decide), because there will always exist a computation that halts but that cannot be implemented in a program that the verifier will accept. The post-verification language may still be useful in practice (as you say), but it's not Turing-complete.

In practice, Turing-completeness seems a pretty useless concept and a big distraction. Turing machines only care about the boundary between infinite and non-infinite, so a computation that requires more time than exists in the universe is just as good as one that completes almost immediately. A practical software engineer cares about a boundary measured in seconds or milliseconds, and a program that takes years is just as bad as one that never halts, so they need a totally different set of tools to analyse that problem.

Bounded loops in BPF for the 5.3 kernel

Karellen — Thu, 01 Aug 2019 15:29:52 +0000

Just because a verifier for Turing-complete languages cannot reliably classify *all* arbitrary programs into either "safe" or "unsafe", that does not mean they don't exist or are useless. Verifiers can correctly classify large numbers of useful programs as safe. They can also correctly classify large numbers of unsafe programs as such. Any program that a verifier cannot classify, due to being insufficiently powerful, or due to the Halting problem, can simply be declared as "not verified safe" and rejected.

Note that this means that some useful, safe programs could be rejected. That is unfortunate, but not an argument that verifiers are impossible, or even useless.

Bounded loops in BPF for the 5.3 kernel

mathstuf — Thu, 01 Aug 2019 14:22:46 +0000

Bounded loops are still push-down automata, not Turing machines. You need unbounded loops to be Turing-complete.

Bounded loops in BPF for the 5.3 kernel

abatters — Thu, 01 Aug 2019 13:56:01 +0000

I'm confused now. So you *don't* want your BPF programs to terminate?

Bounded loops in BPF for the 5.3 kernel

pizza — Thu, 01 Aug 2019 10:59:56 +0000

I'm more worried that, coupled with the kernel's new time travel features and all these new "AI" accelerators, a BPF program could gain sentience, go back in time and try to kill Sarah Conner.

Should verification be done in the kernel?

anadav — Thu, 01 Aug 2019 00:12:31 +0000

The results are quite impressive and loop support is very useful. However, there might be a question of whether the approach of using ad-hoc optimizations for compiled byte code (following verification failures, or llvm version changes) is a viable path, or whether it is a whack-a-mole. In addition, such optimizations have introduced in the past several security bugs.

Verifying byte code for safety guarantees is a known problem with practical and proven solutions. Since eBPF verification is not usually on the hot-path (or at least it should not be), it raises the question: why does eBPF verification have to be performed in the kernel? Wouldn’t it be better to perform it in userspace using standard tools instead?

It may be beneficial just to introduce this option to users, by separating the binary rewriting and verification stages and provide hooks for userspace to register an external verifier. IMA and other security measures can then be used to ensure the verifier itself is not compromised. Having such a verifier can also assists in the adoption of eBPF in other environments (e.g., DPDK).

As a reference [1], my colleagues have developed an eBPF verifier prototype that uses the well-known verification technique called abstract interpretation and shows promising results (good asymptotic scalability, very low rate of false positives). It verifies loop safety out of the box and can be extended to also check that programs with loops terminate.

[1] https://github.com/vbpf/ebpf-verifier

Bounded loops in BPF for the 5.3 kernel

johill — Wed, 31 Jul 2019 22:16:21 +0000

It can never be Turing complete (at least not intentionally, given the verifier), see the Halting problem.

Bounded loops in BPF for the 5.3 kernel

mtaht — Wed, 31 Jul 2019 21:51:26 +0000

And now bpf is turing complete. How long will it be until it can read mail?