|
|
Log in / Subscribe / Register

A different approach to BPF loops

By Jonathan Corbet
November 29, 2021
One of the key features of the extended BPF virtual machine is the verifier built into the kernel that ensures that all BPF programs are safe to run. BPF developers often see the verifier as a bit of a mixed blessing, though; while it can catch a lot of problems before they happen, it can also be hard to please. Comparisons with a well-meaning but rule-bound and picky bureaucracy would not be entirely misplaced. The bpf_loop() proposal from Joanne Koong is an attempt to make pleasing the BPF bureaucrats a bit easier for one type of loop construct.

To do its job, the verifier must simulate the execution of each BPF program loaded into the kernel. It makes sure that the program does not reference memory that should not be available to it, that it doesn't leak kernel memory to user space, and many other things — including that the program will actually terminate and not lock the kernel into an infinite loop. Proving that a program will terminate is, as any survivor of an algorithms class can attest, a difficult problem; indeed, it is impossible in the general case. So the BPF verifier has had to find ways to simplify the problem.

Initially, "simplifying the problem" meant forbidding loops altogether; when a program can only execute in a straight-through manner, with no backward jumps, it's clear that the program must terminate in finite time. Needless to say, BPF developers found this rule to be a bit constraining. To an extent, loops can be simulated by manually unrolling them, but that is tiresome for short loops and impractical for longer ones. So work soon began on finding a way to allow BPF programs to contain loops. Various approaches to the loop problem were tried over the years; eventually bounded loop support was added to the 5.3 kernel in 2019.

The problem is thus solved — to an extent. The verifier checks loops by simulating their execution for each combination of initial states and demonstrating that each loop terminates before executing the maximum number of allowed instructions. This verification can take some time and, for some programs, the verifier is simply unable to conclude that the loops will terminate, even though those programs may be correct and safe. There are simply too many possible states and iterations to work through.

The difficulty of verifying loops is complicated by the fact that, by necessity, the verifier works with BPF code, which is a low-level instruction set. The semantics of a loop encoded in a higher-level language are gone by this time. The code may just iterate over the elements of a short array, for example, but the verifier has to piece that together from the BPF code. If there were a way to code a bounded loop in a way that the verifier could see, life would be a lot easier.

That, in short, is the purpose of Koong's patch. It adds a new helper function that can be called from BPF code:

    long bpf_loop(u32 iterations, long (*loop_fn)(u32 index, void *ctx),
    		  void *ctx, u64 flags);

A call to bpf_loop() will result in iterations calls to loop_fn(), with the iteration number and the passed-in ctx as parameters. The flags value is currently unused and must be zero. The loop_fn() will normally return zero; a return value of one will end the iteration immediately. No other return values are allowed.

Essentially, bpf_loop() takes the mechanics of the loop itself out of the BPF code and embeds it within the kernel's BPF implementation instead. It allows the verifier to know immediately that the loop will terminate, since that is outside the control of the BPF program itself. It is also easy to calculate how many instructions may be executed within the loop in the worst case; that and the limit on stack depth will prevent programs that run nearly forever as the result of nested loops.

For BPF programmers, the benefit is that any loop that can be implemented using bpf_loop() becomes much easier to get past the verifier; whole layers of bureaucracy have been shorted out, as it were. Note that loops that, for example, follow a linked list are possible with bpf_loop(); the developer need only supply a maximum possible length as the number of iterations, then terminate early when the desired element has been found or the end of the list has been hit. The form of programs may shift a bit to fit the template, but it should be possible to make that change in many cases.

Another significant advantage is that the time required to verify BPF programs is greatly reduced, since the verifier does not need to actually simulate the execution of all those loops. Some benchmarks show what a difference that can make; one program that takes nearly 30 seconds to verify in current kernels can be verified in 0.15s instead. That significantly increases the practicality of many types of BPF program.

There are many reasons why Fortran remained dominant in numerical applications for so long; one of those is that do loops, by their predictable structure, are relatively easy to vectorize. The purpose of bpf_loop() is different, but it works by the same mechanism: constraining what can be expressed in the language to make it easier for the computer to understand what is really being done. That, in turn, should make it easier for developers to convince the computer that it can safely run their programs.

Index entries for this article
KernelBPF/Loops


to post comments

A different approach to BPF loops

Posted Nov 30, 2021 0:04 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (73 responses)

Why do we need a verifier, again? Just put an instruction counter in the runtime. Then throw away the BPF nonsense and just use something like LUA or JS.

A different approach to BPF loops

Posted Nov 30, 2021 1:33 UTC (Tue) by developer122 (guest, #152928) [Link] (34 responses)

There are two ways to do it.

1. The way you noted, with an interpreted program and a limit on the maximum number of instructions to interpret.

2. The current way, which runs faster *compiled* machine code in kernel space, but which requires verification first.

When the execution time of a BPF program is important (eg, inserted into a packet filter or system call) #2 makes more sense.

Use of a lua interpreter has been explored in the pas though: https://lwn.net/Articles/830154/

A different approach to BPF loops

Posted Nov 30, 2021 3:05 UTC (Tue) by roc (subscriber, #30627) [Link] (29 responses)

There's nothing stopping you compiling to native code, with type verification, but using an instruction counter to bound overall runtime.

A different approach to BPF loops

Posted Nov 30, 2021 6:57 UTC (Tue) by developer122 (guest, #152928) [Link] (28 responses)

Can't use an instruction counter with native code unless the CPU hardware supports it. Best you can do is rely on the scheduler but that defeats the purpose of a lot of BPF hooks.

A different approach to BPF loops

Posted Nov 30, 2021 6:58 UTC (Tue) by developer122 (guest, #152928) [Link]

Otherwise, you've essentially combined GCC with the BPF verifier. Hopefully(?) you're doing it in-kernel so that the kernel can trust that the output is safe.

A different approach to BPF loops

Posted Nov 30, 2021 8:00 UTC (Tue) by mb (subscriber, #50428) [Link] (6 responses)

>Can't use an instruction counter with native code

Why not?

A different approach to BPF loops

Posted Nov 30, 2021 8:49 UTC (Tue) by syoc (subscriber, #142594) [Link] (5 responses)

Take this with a grain of salt. An interpreted program can have it's instructions counted because there is some state keeping system (the interpreter) running each instruction on behalf of the program. Compiled languages have it's machine code scheduled to the CPU and off it goes. The only option is that the CPU itself needs to count the instructions, there is no other party with "between" the program and the CPU able to keep state.

A different approach to BPF loops

Posted Nov 30, 2021 13:46 UTC (Tue) by mb (subscriber, #50428) [Link] (4 responses)

>The only option is that the CPU itself needs to count the instructions

A trusted compiler backend can emit native code that does the counting and bounds checking.

A different approach to BPF loops

Posted Nov 30, 2021 17:30 UTC (Tue) by developer122 (guest, #152928) [Link] (3 responses)

Then you have simply moved the compiler into the kernel and combined it with the verifier. Now the only language BPF programs can be written in is Lua.

A different approach to BPF loops

Posted Nov 30, 2021 21:04 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (2 responses)

We already have a JIT compiler for BPF in the kernel.

A different approach to BPF loops

Posted Dec 2, 2021 19:20 UTC (Thu) by ncm (guest, #165) [Link] (1 responses)

It is very simple. It just transcribes instructions from one format to another.

Calling it a "JIT compiler" stretches the term to the breaking point. In particular, it does nothing like your typical JIT compiler that interprets code, counting runs until it sees that a section is executed frequently enough, then naively compiles it and counts execution again, noting argument types actually used, and then if warranted generates and substitutes optimized code, possibly several versions according to argument types already seen, reserving the unoptimized version for cases not seen yet.

It's not even a compiler as seen for GPU shaders, where the code is compiled and optimized immediately, targeting the GPU, with no runtime counting; it sees no source code.

So, not a JIT compiler as anybody uses the term anywhere else. Optimization in normal terms is expected to be done before the BPF code is seen by the kernel. All types are known beforehand. Certain commonly seen BPF instruction sequences might be mapped to a native version, but that all occurs long before any attempt at actual execution.

So it is really much closer to an ahead-of-time macro-assembler that runs, most usually, only once at program startup, with the output executed with no monitoring beyond the usual event stats counters.

A different approach to BPF loops

Posted Dec 3, 2021 1:26 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link]

> Calling it a "JIT compiler" stretches the term to the breaking point.

That's how it's called in the kernel (e.g. https://elixir.bootlin.com/linux/v5.15.6/source/arch/arm6... ).

> In particular, it does nothing like your typical JIT compiler that interprets code, counting runs until it sees that a section is executed frequently enough, then naively compiles it and counts execution again

For example, .NET JIT-compiler doesn't do that. It just compiles the code as-is, not even de-virtualizing calls.

> So, not a JIT compiler as anybody uses the term anywhere else.
As usual, it's you who's inventing strange meanings of commonly accepted terms.

A different approach to BPF loops

Posted Nov 30, 2021 8:51 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (15 responses)

You certainly can. Just use a register for it. You need to increment it at backbranches and returns.

A different approach to BPF loops

Posted Nov 30, 2021 17:32 UTC (Tue) by developer122 (guest, #152928) [Link] (14 responses)

Something needs to actually do that work. In native code it needs to have been compiled or patched in. If you're doing the analysis for patching, you've just re-implemented the verifier's logic and added counting overhead. If you are doing this with a "trusted compiler" then that compiler needs to be in-kernel.

A different approach to BPF loops

Posted Nov 30, 2021 18:10 UTC (Tue) by comex (subscriber, #71521) [Link] (12 responses)

> If you're doing the analysis for patching, you've just re-implemented the verifier's logic and added counting overhead.

The logic in question is *very* simple: where are the branch instructions, and how many other instructions are executed before each. It's much simpler than the verifier, but that's an apples to oranges comparison because the verifier would still be needed: even if you're not relying on the verifier to prove termination, you still need it for type checking.

Yes, adding dynamic checks would have overhead, but probably not very much. I suspect this bpf_loop patch would have higher overhead (though also not very much), since it involves the JITted code calling C which then calls back into the JITted code.

A different approach to BPF loops

Posted Nov 30, 2021 18:41 UTC (Tue) by comex (subscriber, #71521) [Link] (10 responses)

Replying to myself just to clarify.

The goal of runtime termination checking would *not* be to avoid the existing complexity of the verifier. Not only would the verifier still be required, it would get more complicated. That's not because of the work to patch in the termination check itself, which as I said is simple, but because the bounds checking code would have to know how to approximate how running through a loop an unknown number of times would affect register values. This approximation has to be conservative enough that it can be calculated without actually simulating the loop once for every possible iteration count (i.e. the current behavior), but not so conservative that it rejects code typically emitted by LLVM. Indeed, I think one major downside of runtime termination checking is that it might increase the number of situations where LLVM relies on an invariant too subtle for the verifier to understand, resulting in spurious verification failures. (Though this could potentially be solved by adding optional runtime verification of other things as well…)

Anyway:

Compared to the status quo, the advantage of runtime termination checking would be that you can have potentially-long loops without blowing up verification time (exponentially in the case of nested loops).

Compared to this patch, the advantage of runtime termination checking would be that you can use normal C loops rather than this ugly construct.

A different approach to BPF loops

Posted Nov 30, 2021 20:56 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (9 responses)

> That's not because of the work to patch in the termination check itself, which as I said is simple, but because the bounds checking code would have to know how to approximate how running through a loop an unknown number of times would affect register values.

The verifier won't have to follow the loop bounds anymore, since out-of-bounds access is now safe (checked in runtime). As an additional implementation detail, the bounds checking elision can be added later.

eBPF is also basically typeless, so there's no type checking required.

A different approach to BPF loops

Posted Nov 30, 2021 21:03 UTC (Tue) by comex (subscriber, #71521) [Link]

Yes, that is roughly what I meant by "optional runtime verification of other things". It's just that moving memory access bounds checking to runtime, in addition to termination checking, is a step beyond what you originally mentioned. But I think it would be a good idea if only from a binary compatibility perspective.

A different approach to BPF loops

Posted Dec 1, 2021 8:14 UTC (Wed) by matthias (subscriber, #94967) [Link] (7 responses)

> The verifier won't have to follow the loop bounds anymore, since out-of-bounds access is now safe (checked in runtime).

How do you want to check out-of-bounds access in runtime? Of course you can start up a full blown jvm (or sth. similar) in kernel space that is capable of doing such thing. First, you only wanted some counter in a register. Now, you want to verify every memory access of the program at runtime. Of course this can be done, but I doubt that you can sell this to kernel devs. They usually care about performance.

> As an additional implementation detail, the bounds checking elision can be added later.

You need this from the start. If you do not elide most of the bound checking, performance will be horrible.

> eBPF is also basically typeless, so there's no type checking required.

If you want to do out-of-bounds checking, you at least need to distinguish types by their size.

A different approach to BPF loops

Posted Dec 1, 2021 8:24 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link] (6 responses)

> How do you want to check out-of-bounds access in runtime?

The kernel already JIT-compiles eBPF code into machine code. eBPF is pretty simple, you can read its spec here: https://github.com/iovisor/bpf-docs/blob/master/eBPF.md

The out-of-bounds access can only happen when running array-access instructions (like ldxw), but all possible arrays _already_ carry the size information, so there's no problem whatsoever to compile-in a range check.

> You need this from the start. If you do not elide most of the bound checking, performance will be horrible.
> Now, you want to verify every memory access of the program at runtime. Of course this can be done, but I doubt that you can sell this to kernel devs. They usually care about performance.

eBPF is already pretty slow in most cases, due to marshaling of the input/output data into (essentially) hash maps. Along with super-inefficient instruction set. If anything eBPF is an example how you can make a Turing machine out of the most Brainfuck-type instruction set.

Replacing eBPF with WebAssembly (or something functionally similar) would likely result in a _speedup_. A certain subset of eBPF can indeed work pretty well, but it's even more limited than normal. And can be done for WASM as well.

A different approach to BPF loops

Posted Dec 2, 2021 19:38 UTC (Thu) by ncm (guest, #165) [Link] (5 responses)

> The kernel already JIT-compiles eBPF code into machine code

No, it does not. It simply transcribes the eBPF code the first time the code is presented, normally long before any event that would cause execution. That is, by definition, "ahead-of-time", not "just-in-time". It is also not compilation at all, by any usual definition. In particular, there is no parsing, no syntax tree, and no flow graph.

> eBPF is already pretty slow in most cases

Yet, it is the favored way to speed up essential kernel operations.

A different approach to BPF loops

Posted Dec 3, 2021 1:28 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

> That is, by definition, "ahead-of-time", not "just-in-time".

"Ahead of time" is "on a build farm machine". "Just-in-time" is "immediately as the code is loaded".

eBPF is JIT-ed.

A different approach to BPF loops

Posted Dec 3, 2021 4:13 UTC (Fri) by ncm (guest, #165) [Link] (2 responses)

No one can save you from yourself.

Let's back off a bit

Posted Dec 3, 2021 14:39 UTC (Fri) by corbet (editor, #1) [Link] (1 responses)

So, while I might agree that whether BPF's machinery can be called a "JIT" or not is one of the most crucial questions we need to answer in the free-software community, I still think we should be able to get there while being civil toward each other. To that end, perhaps we can let this burning question, as vitally important to the future of civilization as it is, rest for just a little bit?

Let's back off a bit

Posted Dec 10, 2021 13:31 UTC (Fri) by nix (subscriber, #2304) [Link]

When civilization rebuilds after the radioactive rubble of the JIT Wars stops bouncing, the historians of the future will look back at this thread and say "the antagonists could have settled the question *right there*, if only..."

:)

A different approach to BPF loops

Posted Dec 3, 2021 8:22 UTC (Fri) by excors (subscriber, #95769) [Link]

> "Ahead of time" is "on a build farm machine". "Just-in-time" is "immediately as the code is loaded".

Android's ART is called an ahead-of-time compiler, but that runs on the end user's phone whenever they install an app, not on a build farm. (...except for pre-installed system apps/libraries which are compiled by dexpreopt as part of the platform build, alongside all the native code)

Newer versions of ART also have a JIT compiler (https://source.android.com/devices/tech/dalvik/jit-compiler) which does run-time profile-guided recompiling of individual methods. If I'm reading the description correctly, the JIT also saves some profiling information so the AOT compiler can be run again while the phone is idle and charging, which means the AOT compiler can do a lot of traditionally JIT-exclusive profile-guided optimisations.

But eBPF does call itself a JIT too, and I think that's quite reasonable. The distinction between AOT and JIT is fuzzy and depends on where you draw the boundary in your architecture diagram - it might be build farm vs device, or installing an app vs running an app, or when the application is stored on disk vs in memory, or userspace vs kernel, etc. And then you might have something like ART that goes back and forth across the boundary. The terminology isn't really important since the design space is so wide and varied - anyone who cares about the details of a particular implementation will have to look at the system architecture to find exactly what it means in that context.

A different approach to BPF loops

Posted Nov 30, 2021 19:42 UTC (Tue) by matthias (subscriber, #94967) [Link]

> Yes, adding dynamic checks would have overhead, but probably not very much. I suspect this bpf_loop patch would have higher overhead (though also not very much), since it involves the JITted code calling C which then calls back into the JITted code.

It should not be too hard to inline that particular function. But on the other hand. If this should be inlined anyway, it would be more straightforward to add a special loop instruction to BPF.

A different approach to BPF loops

Posted Nov 30, 2021 22:40 UTC (Tue) by mb (subscriber, #50428) [Link]

>If you're doing the analysis for patching, you've just re-implemented the verifier's logic

No, that's not true.
In the static verifier you need to solve the halting problem
In a dynamic runtime check you don't have to solve the halting problem.

That's a major reduction in complexity. You pay for with runtime overhead.

A different approach to BPF loops

Posted Nov 30, 2021 9:29 UTC (Tue) by roc (subscriber, #30627) [Link] (3 responses)

You can add instrumentation to count the number of BPF instructions executed by your compiled code, or the number of BPF branches, or whatever it is you want to apply limits to. Whether you compile to machine code or not is simply an implementation detail.

A different approach to BPF loops

Posted Nov 30, 2021 17:34 UTC (Tue) by developer122 (guest, #152928) [Link] (1 responses)

And how do you add that "instrumentation"? It's either compiled or patched in. If patched, you've just re-implemented most of the verifier, just to add overhead to the program. If it's compiled in then your compiler needs to be in-kernel to ensure that the code actually came from it. That adds a new parsing-untrusted-strings attack surface to the kernel and limits what languages the BPF programs can be written in.

A different approach to BPF loops

Posted Nov 30, 2021 21:32 UTC (Tue) by roc (subscriber, #30627) [Link]

The JIT compiler would add the instrumentation and it's already in-kernel.

A different approach to BPF loops

Posted Dec 3, 2021 4:11 UTC (Fri) by ncm (guest, #165) [Link]

Adding instrumentation would defeat the purpose of the feature. As a reminder, the purpose of eBPF is to enable performance optimizations. Making your performance optimizations radically slower would make the feature useless.

A different approach to BPF loops

Posted Nov 30, 2021 8:17 UTC (Tue) by ibukanov (subscriber, #3942) [Link] (3 responses)

In the generated code one can add a verification instruction before each jump backwards that adds to a counter the number of the instructions since the last jump and check the counter.

For ultimate performance one can do what JS JITs do. Start a timer when JS is started. Then, when the timer expires, patch the generated code to turn NOP sequences placed before backward jumps into exit jumps.

A different approach to BPF loops

Posted Nov 30, 2021 17:39 UTC (Tue) by developer122 (guest, #152928) [Link] (2 responses)

Not only is live patching something that's being called during system calls going to add a lot of overhead, you're implying moving the compiler that's generating the code into the kernel. This limits what languages BPF can be written in and adds a new "parsing untrusted strings" attack surface. If you're doing the code generation JIT, you're also sacrificing the performance of native code for things like packet filters.

A different approach to BPF loops

Posted Nov 30, 2021 21:24 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

The kernel already has a JIT compiler that generates machine code from eBPF.

A different approach to BPF loops

Posted Dec 2, 2021 19:24 UTC (Thu) by ncm (guest, #165) [Link]

No, it does not. Not for any sensible definition of "JIT compiler".

A different approach to BPF loops

Posted Nov 30, 2021 6:47 UTC (Tue) by pwfxq (subscriber, #84695) [Link] (31 responses)

The article tells you:

the verifier [...] makes sure that the program does not reference memory that should not be available to it, that it doesn't leak kernel memory to user space, and many other things

How does an instruction counter handle those cases?

A different approach to BPF loops

Posted Nov 30, 2021 7:01 UTC (Tue) by epa (subscriber, #39769) [Link] (21 responses)

Perhaps "the verifier" and "the compiler" are being merged into one concept here. If we were talking about Rust we would say that the compiler makes sure the program doesn't reference memory not available to it. So you could have a compiled language where the compiler enforces most safety properties but not that of non-termination (which as we all know can't be enforced in the general case without limiting the power of the language). Then an instruction counter (or even a timer interrupt) at run time takes care of the rest.

A different approach to BPF loops

Posted Nov 30, 2021 7:40 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (20 responses)

Now you have two options, which are both bad:

1. The compiler is part of the kernel. It is written in C, has to consume arbitrary untrusted strings from userspace in a complicated Turing-complete language, and any buffer overrun is potentially a security exploit. Well, maybe you can write the compiler in Rust instead of C, but this is still not the sort of thing you normally want to run in kernel mode.
2. The compiler is not part of the kernel. Then how does the kernel know that a given binary was actually emitted by the compiler? What if the user compiles a program, then modifies the resulting bytecode (or machine code) such that it does something unsafe? Does the compiler have some sort of magical signature that only it can produce?

A different approach to BPF loops

Posted Nov 30, 2021 8:02 UTC (Tue) by developer122 (guest, #152928) [Link] (8 responses)

I think there are some languages for which #1 might be unpleasant but palatable. TCL with it's 11-12 very simple parsing rules comes to mind. But the point still stands. Keep complicated functionality handling untrusted input (like compilers) out of the kernel if at all possible.

A different approach to BPF loops

Posted Nov 30, 2021 17:18 UTC (Tue) by edeloget (subscriber, #88392) [Link] (7 responses)

> Keep complicated functionality handling untrusted input (like compilers) out of the kernel if at all possible.

And this is a case where it's not possible. So simply put, we *need* the verifier. Without the verifier, BPF cannot be trusted and thus has nothing to do in the kernel itself (and that would be a shame given how useful it is).

A different approach to BPF loops

Posted Nov 30, 2021 19:39 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (6 responses)

> > Keep complicated functionality handling untrusted input (like compilers) out of the kernel if at all possible.

> And this is a case where it's not possible.

It is possible to have the compiler and/or verifier run as user-mode helper programs, outside the kernel. What you can do with eBPF is basically a subset of what you could do by compiling and loading a native-code module. In principle a daemon could provide this service (validating eBPF, compiling it, and loading it) without any special kernel involvement other than providing the necessary hooks to attach the module to whatever the eBPF program should control.

A different approach to BPF loops

Posted Dec 2, 2021 16:28 UTC (Thu) by edeloget (subscriber, #88392) [Link] (5 responses)

> It is possible to have the compiler and/or verifier run as user-mode helper programs,
> outside the kernel. What you can do with eBPF is basically a subset of what you could
> do by compiling and loading a native-code module. In principle a daemon could
> provide this service (validating eBPF, compiling it, and loading it) without any special
> kernel involvement other than providing the necessary hooks to attach the module to
> whatever the eBPF program should control.

No, as this would still allow malicious or simply erroring BPF to be loaded in the kernel.

Replace "BPF" with "ioctl" and "verifier" with "input check" and you'll get the reason why a verifier is necessary within the kernel: it simply cannot trust the user space to give it input that will be devoid of error and non malicious. The kernel needs to verify every single input it gets, otherwise it open itself to abuse.

And frankly, I'm surprised that anybody argues with this: this is both Security 101 and Code Resilience 101.

A different approach to BPF loops

Posted Dec 2, 2021 20:22 UTC (Thu) by nybble41 (subscriber, #55106) [Link] (4 responses)

You can build and load a malicious or erroneous _module_ into the kernel right now which exports functions to be called from an eBPF filter. There is no verifier for arbitrary native-code modules. And while there are different capabilities involved, most programs with the necessary capabilities to attach eBPF filters are probably running as root and could also load modules. They can even be loaded in response to actions by unprivileged processes, e.g. Bumblebee loads and unloads drivers on demand for systems with hybrid graphics, and various subsystems load modules automatically when the related features are used. Those dynamically loaded modules often include ones which were build locally from source, e.g. with DKMS. It is a short jump from there to having some privileged process verify that a given program in eBPF or WASM or any other language is safe, compile it to a self-contained native-code module with an entry point which can be invoked from eBPF, and load it. The only real question is how much you trust the user-space verifier and compiler to ensure that only safe programs are loaded into the kernel.

A different approach to BPF loops

Posted Dec 3, 2021 0:10 UTC (Fri) by mjg59 (subscriber, #23239) [Link] (3 responses)

Which is why the kernel can be configured to permit only appropriately signed kernel modules. It's not necessary to restrict what can be loaded into the kernel under all circumstances, but it must be possible to configure it in such a way that what's loaded is verifiably trustworthy. Having the ability to load code that can bypass the verifier seems reasonable, as long as there's a mechanism for enforcing signatures on it.

A different approach to BPF loops

Posted Dec 10, 2021 13:50 UTC (Fri) by nix (subscriber, #2304) [Link] (2 responses)

Well... surely the solution is obvious, then? The verifier etc is moved into an external userspace program (invoked at need, passing info to and from the kernel via one of its multiplicity of existing mechanisms, though frankly just receiving BPF for verification on stdin and passing back verification results on stdout seems simplest), and there is a kernel option just like the option for signed modules which can be used to require the thing to pass signature verification before it is invoked. For extra security, make it so that killing it panics the kernel just like killing init does, and make it unptraceable.

(The protocol can be made as elaborate as you like, e.g. having the kernel describe its helpers to the userspace code, lobbing BTF at it to aid in verification etc.)

Voila, instant as-trusted-as-the-admin-wants verifier that doesn't need to be in kernel space.

(IMHO, the real reason why the thing is in kernel space is that the BPF people are kernel people so they prefer to do stuff in kernel space if at all possible.)

A different approach to BPF loops

Posted Dec 10, 2021 16:25 UTC (Fri) by farnz (subscriber, #17727) [Link] (1 responses)

Doesn't even need to be an external userspace program; it can be part of the kernel, just running as a userspace process not a kernelspace process. That way, you're guaranteed that the verifier and the kernel are in sync, but the verifier can do as much as it wants because it's acting as a normal userspace program, with virtual memory, preemptive scheduling and all the other loveliness of userspace.

A different approach to BPF loops

Posted Dec 11, 2021 1:35 UTC (Sat) by mathstuf (subscriber, #69389) [Link]

I can just forsee this wanting be namespaced as well. Is there a reason this would be exempt?

A different approach to BPF loops

Posted Nov 30, 2021 8:05 UTC (Tue) by mb (subscriber, #50428) [Link] (9 responses)

3. You compile the high level stuff in userspace and pass the IR (BPF?) to the kernel. Then inside the kernel you compile to native machine code with a much simpler compiler. Isn't that how it's currently actually done? Except that the compilation to machine code doesn't emit checks for memory bounds and runtime limits.

A different approach to BPF loops

Posted Nov 30, 2021 8:13 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (8 responses)

Well that's just semantics. You can call the BPF-to-machine-code backend a "compiler" or a "verifier" or even a "JITer" if you like. No matter what label you use, at some point, some component of the kernel has to actually verify that the BPF code follows all of the rules, so in contexts where we think this verification step is the bottleneck, you might as well refer to that component of the kernel as "the verifier."

A different approach to BPF loops

Posted Nov 30, 2021 13:48 UTC (Tue) by mb (subscriber, #50428) [Link] (7 responses)

>Well that's just semantics

Well, the claim was

>1. The compiler is part of the kernel. It is written in C, has to consume arbitrary untrusted strings from userspace in a complicated Turing-complete language

That's simply not true.

Just a subset of "the compiler" has to be in the kernel. That is the point I'm trying to make.

A different approach to BPF loops

Posted Nov 30, 2021 17:24 UTC (Tue) by edeloget (subscriber, #88392) [Link] (4 responses)

> Just a subset of "the compiler" has to be in the kernel. That is the point I'm trying to make.

So you would introduce a language (say, the mb packet filtering language) that would be compiled in the kernel to native instructions - and if this is too complex for a given architecture, you would implement a interpreter for this language. In other word, a VM with an integrated JIT compiler where it can be implemented.

Which is exactly what BPF is.

And this still requires you to validate the input, as you are still able to produce pre-compiled code that could be used for nefarious purposes. So you'll probably introduce a verifier for your intermediate language.

Which is exactly what the BPF subsystem do.

Now, I read this code from times to times. By all means, this is not an overly complex piece of code. I would not say that it's simple (the problem to solve is not) but it's clearly not an undecipherable code monster. So, in the end: what's the problem exactly?

A different approach to BPF loops

Posted Nov 30, 2021 22:27 UTC (Tue) by mb (subscriber, #50428) [Link] (3 responses)

>And this still requires you to validate the input

The difference is (1) statically verifying the program vs. (2) dynamically verifying it.

(1) is what the current verifier does, if I understand it correctly. Which is hard due to the halting problem.

(2) is simple w.r.t. halting. But it has some runtime overhead. But I don't see why this wouldn't be possible with JITed code, if the JIT compiler is trusted.

A different approach to BPF loops

Posted Dec 2, 2021 16:36 UTC (Thu) by edeloget (subscriber, #88392) [Link] (2 responses)

Neither static nor dynamic verification can detect if a program ever halts. This is impossible. And since there is no difference between "the program is X ins long" and "the program has executed X ins", why should we paying a runtime penalty when we can avoid it?

After all, the program is loaded once, but it can be executed billions of times -- especially if it respond to every single packet that comes through the network: in this case you definitely don't want run time verification : maintaining a 10Gbps bandwdith means you have to process 1 byte of data every 0.8 ns, and a typical 1500 bytes packet would be processed in 1.5 µs. Throw in run time verification in this process and you're killing your bandwidth.

A different approach to BPF loops

Posted Dec 2, 2021 17:42 UTC (Thu) by Wol (subscriber, #4433) [Link]

> Neither static nor dynamic verification can detect if a program ever halts. This is impossible. And since there is no difference between "the program is X ins long" and "the program has executed X ins", why should we paying a runtime penalty when we can avoid it?

Actually, it's dead easy to detect if SOME programs halt. So while in general what you say may be true, it is not a universal truth. The verifier simply asks the question "Is this program in the set that I can prove they halt", and if the answer is "no" the program is rejected.

That's the whole point of this loop construct - it MUST (and provably does) halt.

Cheers,
Wol

A different approach to BPF loops

Posted Dec 2, 2021 17:45 UTC (Thu) by farnz (subscriber, #17727) [Link]

You can detect if a program ever halts - it's not impossible to do this (look up "totality checking" in the functional programming world, for example). What you can't do is accurately classify programs into "definitely does not halt" and "definitely does halt" - there is always a category of "might not halt" involved, and real world engineering of program verification involves minimising the size of this category, making sure that nothing useful falls into "might not halt", and coming up with a good way to handle "might not halt" programs in your system.

In the case of BPF, for example, "might not halt" can just be treated in the same way as "definitely does not halt".

A different approach to BPF loops

Posted Nov 30, 2021 21:58 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)

> Well, the claim was [...]

No, it was not. The claim was that *if* you want to combine "the verifier" and "the compiler" into a single component, then you have to do either (1) or (2). Then you proposed splitting "the compiler" into two separate components, which is just going back to the point where we started but with a different naming scheme.

A different approach to BPF loops

Posted Nov 30, 2021 22:37 UTC (Tue) by mb (subscriber, #50428) [Link]

>No, it was not

yes, it was. It was a quote. You can read it on the discussion thread, if you forgot.

>but with a different naming scheme.

And with vastly smaller complexity.

A different approach to BPF loops

Posted Nov 30, 2021 8:14 UTC (Tue) by developer122 (guest, #152928) [Link]

side note: I recall long ago infrastructure was merged for packaging special userspace programs into the kernel that it could deploy for special tasks, complete with autogenerated communication channels with the kernel. Not sure what's become if it.

It also doesn't really help you if the program is exploitable and is taken over to emit malicious machine code to be run in kernel space.

A different approach to BPF loops

Posted Nov 30, 2021 8:56 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (8 responses)

This is a solved problem (JS, Java, LUA, ...). Basically, just use bounds-checked array access and accessor functions for kernel-level stuff.

And the total hilariousness is that BPF is step-by-step becoming yet-another-JS-like-JIT these days. But with really custom infrastructure and tooling.

A different approach to BPF loops

Posted Nov 30, 2021 9:27 UTC (Tue) by roc (subscriber, #30627) [Link] (7 responses)

More like WebAssembly than JS, but yeah. And it was obvious years ago that this was going to happen.

A different approach to BPF loops

Posted Nov 30, 2021 9:52 UTC (Tue) by wsy (subscriber, #121706) [Link] (6 responses)

And wasm has a loop instruction that is more elegant than this hacky loop call.

A different approach to BPF loops

Posted Nov 30, 2021 17:30 UTC (Tue) by edeloget (subscriber, #88392) [Link] (5 responses)

> And wasm has a loop instruction that is more elegant than this hacky loop call.

Good thing that the kernel is an open source project then, it should be easy for you to make it more elegant and you would have one less reason to complain :)

A different approach to BPF loops

Posted Nov 30, 2021 21:47 UTC (Tue) by roc (subscriber, #30627) [Link] (4 responses)

Open-source patch culture doesn't really help here. One does not simply submit a patch "since you're reimplementing a subset of WebAssembly slowly and poorly, here's a patch to rip out BPF and start over with WebAssembly". The problem is with the big picture.

To be fair, it's a common failure pattern and one that is very hard to avoid:
1) "We need a very small subset of the functionality of <big project X>"
2) "We can easily reimplement that small subset and tailor it to our needs, much more easily than importing or subsetting <X>"
3) ... time passes ...
4) "We need a little bit more of <X>, but it's definitely easier to just add that to our own thing instead of ripping out our own thing and using <X> (especially because our thing isn't compatible with <X>)"
5) GOTO 3

Each decision is locally optimal, but eventually you have reimplemented most of <X>. Maybe you've done a better job, but in many cases you haven't, e.g. when <X> was designed from scratch by people who understood its problem domain really well, but you only learned the problem domain as you added requirements one at a time.

I'm certainly not saying that "import the big project" is always the right thing to do. But you do need to plan to avoid this trap. (It needs a catchy name.)

A different approach to BPF loops

Posted Dec 1, 2021 8:41 UTC (Wed) by taladar (subscriber, #68407) [Link]

It is essentially https://en.wikipedia.org/wiki/Greenspun%27s_tenth_rule in a more general form.

A different approach to BPF loops

Posted Dec 2, 2021 16:48 UTC (Thu) by edeloget (subscriber, #88392) [Link] (2 responses)

The kernel did not reimplement a subset of WebAssembly. As far as I know (and I might be wrong, especially since I never followed the development of WebAssembly) eBPF was firstpresented in 2013 (JIT added in 2014 by Eric Dumazet), WebAssembly was first released in 2015. This is not really a case of "reinvent, but worse".

And the goal would not be to remove BPF and replace it with WebAssembly (that would not have much sense). The goal would be to take the design of the execution loop of WebAssembly and map that to BPF to make it more elegant -- and this would qualify for the open source patch culture.

(As for the catchy name, may I suggest "My One Was Better" or the MOWB effect ?).

A different approach to BPF loops

Posted Dec 3, 2021 1:38 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (1 responses)

The work on eBPF started around 2013, with the advanced features coming out around 2014-2015 timeframe. This was almost in parallel with WASM development, so it's fair to stay that while Linux was not strictly NIH, it was definitely on a parallel evolutionary branch.

And back then there were also other technologies, like NaCl and PNaCl ( https://developer.chrome.com/docs/native-client/nacl-and-... ) and even LUA (with LuaJIT!)

A different approach to BPF loops

Posted Dec 3, 2021 20:43 UTC (Fri) by ncm (guest, #165) [Link]

eBPF came from BPF, which was ancient tech that matured in the BSDs. By the time it got to Linux, it was well-baked.

How to deal with runtime exceptions in kernel context?

Posted Nov 30, 2021 10:49 UTC (Tue) by matthias (subscriber, #94967) [Link] (5 responses)

> Why do we need a verifier, again? Just put an instruction counter in the runtime. Then throw away the BPF nonsense and just use something like LUA or JS.

Apart from all the discussions about efficiency of these runtime checks in the thread above, nobody has asked the question whether runtime checks are even sensible in all the places where BPF is used nowadays. How should the kernel react if some BPF program is terminated because it did some illegal memory access or used up the allowed runtime?

BPF programs are used in all sorts of security related contexts. Should some access be granted or not granted if the responsible BPF program is terminated? Granting will surely lead to security exploits. And non-granting might lead to DOS. Of course the BPF verifier will never be able to check whether the program at hand does (semantically) what it is supposed to do. But having a proper verification that the program does not do unsafe things seems quite valuable to me.

How to deal with runtime exceptions in kernel context?

Posted Nov 30, 2021 12:26 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Current eBPF requires admin permissions anyway, and in practice you can easily slow down your system to a crawl by adding all kinds of BPF policies. Constant time is not zero time. Attempts to add finer-grained permissions (e.g. for seccomp) mostly fizzled.

For the classic BPF, the situation doesn't need to change.

How to deal with runtime exceptions in kernel context?

Posted Nov 30, 2021 12:53 UTC (Tue) by foom (subscriber, #14868) [Link] (3 responses)

It should quite clearly not grant access in that case. (I know there are more complex scenarios where the correct answer is less clear, but I'm sure those could be figured out)

The current state of BPF, where there is no spec for what code you're allowed to emit other than "whatever the kernel verifier is clever enough to reverse engineer today" is very unfortunate. How is an optimizing compiler supposed to know what optimizations are okay, and which will turn a verifiable program unverifiable? They cannot. It's effectively infeasible to write a C->BPF compiler (Clang's BPF backend notwithstanding).

It would be a very nice improvement to allow the BPF runtime to abort execution and eliminate "verifier failure" as a thing.

If a particular user wants to verify the correctness of their code before putting it into production in order to avoid the possibility of DOSing themselves, they may surely do so via writing/using some external program analysis tool -- one which is NOT built into the kernel.

How to deal with runtime exceptions in kernel context?

Posted Dec 3, 2021 20:55 UTC (Fri) by ncm (guest, #165) [Link] (2 responses)

> It's effectively infeasible to write a C->BPF compiler (Clang's BPF backend notwithstanding)

Why does Clang not withstand? It is absolutely the most common way to generate eBPF code. Asserting that something everyone who does any eBPF uses all the time is infeasible does nothing for your credibility.

It is inconvenient that code Clang emitted might be rejected by the verifier, but that probably happens to any person only once; then they take the time to learn the rules. Clang can also emit regular x86 machine code that will never complete, but that does not make emitting x86 infeasible. It is your responsibility to write your program correctly.

How to deal with runtime exceptions in kernel context?

Posted Dec 6, 2021 5:48 UTC (Mon) by foom (subscriber, #14868) [Link]

The code clang emits can be rejected by the verifier despite the original program following the rules, and despite the output being semantically correct.

This can happen due to optimizations that rearrange the code such that the verifier is no longer smart enough to understand it. That, then, results in proposals like https://lists.llvm.org/pipermail/cfe-dev/2020-June/065894... to try to hack around the problem by disabling various optimizations. Which is a hack, not a principled change -- even if it is generally effective.

It is useful that _in practice_ things often work, despite all that -- without clang understanding anything about which sorts of code transformations are permissible and which are not.

Nevertheless, this is a fundamentally unsound architecture.

How to deal with runtime exceptions in kernel context?

Posted Dec 10, 2021 13:55 UTC (Fri) by nix (subscriber, #2304) [Link]

> that probably happens to any person only once; then they take the time to learn the rules

Ha ha no. The rules are not especially clear and are not unchanging (though they aren't changing too fast any more, that's true). You basically have to read the verifier's source code from start to end at least once to get a handle on the reasons why it might reject stuff, and it is not an especially simple piece of code. (I've read it multiple times and still spend most of my time in any given BPF-using session cursing the verifier. It slows down development a *lot*, and about half the things it finds are not actually bugs in my code but limitations in the verifier.)

FORTRAN loops

Posted Nov 30, 2021 21:17 UTC (Tue) by Wol (subscriber, #4433) [Link] (1 responses)

I've said it before, but why not copy FORTRAN?

The DO LOOP variable is read-only, and/or trying to write to it hides the real variable. You could if you wish make the loop terminate based on either the real or the user version of the variable, so the user could terminate early by setting a value that kills the loop. But they can't stop the loop terminating when the real value hits that limit, regardless of what they write to their copy of it.

(FORTRAN doesn't enforce that, but it *explicitly* says that behaviour is permitted and must be allowed for.)

Cheers,
Wol

FORTRAN loops

Posted Nov 30, 2021 21:20 UTC (Tue) by Wol (subscriber, #4433) [Link]

Whoops - just realised that that is pretty much what this patch does :-)

Cheers,
Wol

A different approach to BPF loops

Posted Dec 2, 2021 14:20 UTC (Thu) by smitty_one_each (subscriber, #28989) [Link]

Maybe the loop is
Constrained, and so, no sonnets
But there's poetry


Copyright © 2021, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds