Writing network flow dissectors in BPF
Network packet headers contain a great deal of information, but the kernel often only needs a subset of that information to be able to perform filtering or associate any given packet with a flow. The piece of code that follows the different layers of packet encapsulation to find the important data is called a flow dissector. In current Linux kernels, the flow dissector is written in C. A patch set has been proposed recently to implement it in BPF with the clear goal of improving security, flexibility, and maybe even performance.
Flow dissection in the kernel
Flow dissector usage is common in high-speed networks, but it is not limited to that use case. Information from the flow dissector may be used by a firewall or a traffic filter, or in any other situation where complete packet parsing is not necessary. For example, just the source IP address or UDP port number may be enough for the kernel to decide on the action to perform on a packet. The flow dissector extracts just the values the caller has requested from the headers; those values are called "keys". The Linux implementation of the flow dissector is not the only one; Wireshark, for example, has its own version of packet dissection.
Let us show packet dissection on an example of extracting fields from a UDP packet; in this case, it has been asked to extract two keys: the IP addresses and the UDP port numbers. The dissector starts at the Ethernet layer to check whether the packet contains an IP header directly or if there are VLANs or other encapsulations to deal with. At the IP level it will save the source and destination addresses, along with the protocol field, which determines the type of the payload protocol. Assuming that the protocol is UDP, the dissector will look for the source and destination port fields, respectively, and save their values. At that point, the requested keys have been found and the dissector's work is done. Note that the real dissector will be more complicated; it will also follow other encapsulation cases (if the UDP header does not directly follow the IP header, for example), take into account the packet fragmentation, and possibly save more keys.
Linux currently includes one built-in flow dissector; the flower classifier [PDF] is the main user. The idea of using a flow dissector based on BPF was raised back in 2017 [PDF]. Petar Penkov and Willem de Bruijn have recently proposed to add a flow dissector written in BPF for the receive path.
Switching to BPF
Using BPF for flow dissection adds a number of interesting security features. Certain types of vulnerabilities become impossible because BPF programs are guaranteed to terminate and thus won't go into an infinite loop. The submission mentioned CVE-2013-4348 as an illustration of the type of problem that can be avoided. That vulnerability consisted of the flow dissector entering an infinite loop when using IPIP encapsulation in cases where it received a packet with a tiny header length value.
Additionally, in BPF, all memory accesses are checked, so it is not possible for a program to read outside of the packet bounds. Another possible feature of the new flow dissector is allowing the administrator to disable the dissection of packets containing unused protocols, reducing the number of possible attacks. If a bug is discovered, the administrator can remove the faulty code from the dissector without a kernel recompilation or even a reboot. Or, they can add their own specialty dissector if needed.
The proposed patch adds the BPF execution in __skb_flow_dissect(); each network namespace can have its own dissector. The submission includes an example program to illustrate the concept. It already handles most of the protocols needed, including multiple levels of IP encapsulation, VLAN, and IPv4/v6 extension headers.
The possible return values of the BPF flow dissector were the subject of a discussion. Currently the program returns BPF_OK (if dissection has completed successfully) or BPF_DROP (if the dissector came to a conclusion that it is necessary to drop the packet). Song Liu asked if there should be a separate value to allow fallback to the C implementation if the BPF program encounters a protocol it does not support. Penkov answered that fallback to C would defeat the security guarantees provided by BPF. De Bruijn explained further that this idea had been discussed, but they had decided against it. The goal of the BPF flow dissector is to replace the built-in version; falling back to the C dissector would make reaching this goal harder because users would quickly come to depend on this behavior.
The location of the hook for the BPF program was also discussed. It would be possible to add it to the XDP (eXpress Data Path) hook instead, for example. The authors of the proposal did not go that way for multiple reasons. The first reason is that it would be more expensive — the XDP hook executes before GRO (generic receive offload), so the dissector would run more often. The XDP hook also runs before the SKB structure to hold the packet is allocated, and there is no easy way to move the dissected keys to the SKB afterwards. It could be possible to implement both flow dissection and GRO in a single pass, but that would require much more flow state to work.
In the first version of the submission, the definitions of the various structures used to hold flow-dissector keys were copied into the BPF program itself, since they are otherwise not visible to user space. Alexei Starovoitov noted that everything used by a BPF program becomes part of the user-space API; he suggested three solutions: moving all the key structures to the user-space headers, wrapping all of them into a separate structure and modifying the code when the internal ones change, or waiting for BTF to solve the issue. The last option was eliminated since BTF is not ready for this kind of use yet. As networking maintainer David Miller also supported the wrapping option, this is the solution the second submission used: struct bpf_flow_keys contains all of the supported keys so that they are available to the BPF program.
The first version of this patch set ran into an interesting problem: since the offset at which to start dissection in a packet is supplied by the caller, the BPF verifier cannot ensure that accesses using that offset will remain within bounds. So relatively slow accessors had to be used to get at packet data. Daniel Borkmann suggested a simple trick to get around this issue: the BPF program need only check that the offset is in bounds at the beginning; after that, the verifier can prove that subsequent uses will remain within bounds. That change improved the performance of BPF dissectors to be comparable with the in-kernel dissector.
Current state
A performance evaluation is included with the submission; it compares the BPF flow dissector with the in-kernel dissector and the no-op dissector on a UDP flow. The speed of the dissectors is similar, with the BPF one performing slightly better than the in-kernel dissector. More evaluations will probably follow, but the results are already encouraging.
The BPF flow dissector already integrates with the flower classifier since it uses the same interface. The tests included with the patch set show this integration: the demonstration drops packets from a specified UDP source port in the scenario we covered at the beginning of the article. As flower uses the flow dissector to match flows, dropping the right ones shows the right dissection.
The BPF flow dissector is another part of an increasingly BPF-based network processing model that includes XDP (covered here previously). The goal of the BPF flow dissector is ambitious: to replace the built-in one. Time will tell if it will succeed. Before that, the patch set received positive comments and it seems likely that it will be included in the near future.
Index entries for this article | |
---|---|
Kernel | BPF/Networking |
Kernel | Networking |
GuestArticles | Rybczynska, Marta |
Posted Sep 6, 2018 22:15 UTC (Thu)
by mtaht (subscriber, #11087)
[Link] (11 responses)
I still have a hard time believing an ebpf version would be even close in speed to the C version... but... cool.
Posted Sep 7, 2018 9:26 UTC (Fri)
by grawity (subscriber, #80596)
[Link] (10 responses)
Posted Sep 7, 2018 17:45 UTC (Fri)
by mm7323 (subscriber, #87386)
[Link] (9 responses)
I'd be surprised if this could easily be beaten or quickly replicated by the eBPF compiler.
The only case I see BPF winning is if the BPF program is produced and compiled specifically for the task in hand and thus omits a lot of option checking and branches that are not used in the specific dissector. This could well be what's happening here.
Posted Sep 7, 2018 19:11 UTC (Fri)
by wahern (subscriber, #37304)
[Link] (5 responses)
A BPF compiler should have an easier time optimizing data access because of the stricter invariants baked into the design; and because these invariants demand greater discipline from the outset. C compilers are super sophisticated and the eBPF compiler quite naive, but we overestimate the ease of proving constraints in general purpose code and underestimate the dividends from code underpinned by strong, natural invariants.
I think this is one aspect of BPF that indisputably gives it a leg up, notwithstanding the abusive use of BPF. I've been critical of the shift to using BPF and eBPF as a green field for reinventing subsystems that are more maintainable and transparent using the normal C kernel interfaces and architecture. From a functional perspective I'm suspicious of heavy BPF usage, but I can't deny the technical potential.
[1] For example, using fixed-sized arrays...
Posted Sep 7, 2018 20:32 UTC (Fri)
by Paf (subscriber, #91811)
[Link]
In general, in high performance computing situations, one reason Fortran is sometimes preferred is that it does not have pointers (in practical terms - they do exist but are very little used). A pointer in C can point *anywhere*, so unless the compiler can trace its entire lifecycle, they can make lots of otherwise relatively straightforward optimizations very hard. Pointers can also point in to the middle of what you'd like to have as unitary data types, again preventing certain optimizations. So in general, Fortran code is more easily optimized than C. (There are modified dialects of C that implement some additional constraints for this same reason (C11 has some effort in this direction, I believe).)
That specific example might not apply here, but is hopefully interesting.
Posted Sep 7, 2018 22:25 UTC (Fri)
by mm7323 (subscriber, #87386)
[Link]
Often that's not the case as code can be a mess in any language- and though this may also be a situation where the C is aged and not well optimised.
The security benefits of eBPF make this initiative a good thing though. And if anything, further use of eBPF should help drive optimisation efforts in the compiler/assembler.
Posted Sep 8, 2018 15:08 UTC (Sat)
by mtaht (subscriber, #11087)
[Link] (1 responses)
Posted Sep 11, 2018 17:14 UTC (Tue)
by jezuch (subscriber, #52988)
[Link]
And the Graal compiler available in newer OpenJDK promises to show what a JIT is really capable of, because the current compiler is so old and complex that progress in it ceased a long time ago. Early adopters of Graal report even 20% improvements, if I remember correctly.
Posted Sep 11, 2018 9:09 UTC (Tue)
by miquels (guest, #59247)
[Link]
Found this posting by Walter Bright from 2009. Too bad no C compiler has implemented something like it (AFAIK):
C's Biggest Mistake.
Posted Sep 8, 2018 15:03 UTC (Sat)
by pbonzini (subscriber, #60935)
[Link] (1 responses)
Posted Sep 13, 2018 11:45 UTC (Thu)
by nix (subscriber, #2304)
[Link]
In the absence of loops I don't see that target-dependent loop optimizations are going to be particularly relevant (except, I suppose, for complete unrolling.)
Posted Sep 11, 2018 17:06 UTC (Tue)
by jezuch (subscriber, #52988)
[Link]
A C compiler is like a cow. It's big and complex (how many stomachs?) because it has to be if it wants to stand a chance extracting anything of value from the extremely low quality food it takes.
Assembler, on the other hand, is like a dog. It's much simpler because transition from meat to meat is kind of easy.
But C? Nope. It has to work *hard*.
Posted Sep 12, 2018 11:27 UTC (Wed)
by dbkm11 (guest, #125598)
[Link]
Writing network flow dissectors in BPF
As long as it's compiled to similar machine code (iirc, ebpf is compiled on load) then there's no reason it shouldn't run at a similar speed, is there?
Writing network flow dissectors in BPF
Writing network flow dissectors in BPF
Writing network flow dissectors in BPF
[2] ...and passing them in a way that access bounds remain easily statically provable despite C's pointer decay semantics. (E.g. keep pointer derivations in lexically close proximity to parameter declarations and variable definitions containing the array.) Relying on the compiler to hoist bounds checking out of loops has proven spotty at best. Like JIT optimizations and performance claims, it's theoretically workable but difficult for compilers to resolve consistently. Plus, such hoisting can be done more reliably in something like BPF because there are more invariants enforced from the outset.
Writing network flow dissectors in BPF
Writing network flow dissectors in BPF
Writing network flow dissectors in BPF
Writing network flow dissectors in BPF
Writing network flow dissectors in BPF
> [1] For example, using fixed-sized arrays...
> [2] ...and passing them in a way that access bounds remain easily statically provable despite C's pointer decay semantics.
Writing network flow dissectors in BPF
Writing network flow dissectors in BPF
Writing network flow dissectors in BPF
Writing network flow dissectors in BPF