BPF comes to firewalls
It may be tempting to think that iptables has been the kernel's packet-filtering implementation forever, but it is a relative newcomer, having been introduced in the 2.4.0 kernel in 2001. Its predecessors (ipchains, introduced in 2.2.10, and ipfwadm, which dates back to 1.2.1 in 1995) are mostly forgotten at this point. Iptables has served the Linux community well and remains the most widely used firewalling mechanism, but it does have some shortcomings; it has lasted longer than the implementations that came before it, but it is clearly not the best possible solution to the problem.
The newer nftables subsystem, merged for the 3.13 kernel release in early 2014, introduced an in-kernel virtual machine to implement firewall rules; users have been migrating over, but the process has been slow. For some strange reason, system administrators have proved reluctant to throw away their existing firewall configurations, which were painful to develop and which still function as well as they ever did, and start over with a new and different system.
Still, it was logical to assume that nftables would eventually take over, especially as the iptables compatibility layers improved. Some people started to doubt this story, though, when serious development started on the BPF virtual machine. There seemed to be a lot of overlap between the two virtual machines, and BPF was being quickly extended in ways that improved its performance, functionality, and security. Even so, nftables development has continued, and there has been little talk — until now — of pushing BPF into the core of the firewalling code.
Bringing in BPF
The announcement of bpfilter changes that situation, though. In short, bpfilter enables the creation of BPF programs that can be attached to points in the network packet path and make filtering decisions. In the proof-of-concept patches, those programs are attached at the express data path (XDP) layer, where they are run from the network-interface drivers. But, as Daniel Borkmann noted in the introduction to the patches, BPF programs could be just as easily attached at any other point in the path, allowing them to make decisions at the same points that iptables rules do.
There are a number of advantages claimed for the bpfilter approach. BPF programs can be just-in-time compiled on most popular architectures, so they should be quite fast. The work that has been done to enable the offloading of XDP-level programs to the network interface itself can come into play here, moving firewall processing off the host CPU entirely. The use of BPF also makes it possible to write firewall rules in C, which may appeal to developers who are starting from scratch. And firewall code would be subject to the BPF verifier, adding a layer of security to the whole system.
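As a rough illustration of what "firewall rules in C" could mean, here is a minimal sketch of the decision logic an XDP-style filter might contain: drop IPv4 TCP packets addressed to one port and pass everything else. It is written as plain user-space C so it can be compiled and tested anywhere; a real XDP program would be compiled for the BPF target, receive a struct xdp_md, and be loaded into the kernel, none of which is shown here. The port number, the pkt_ctx structure, and the function name are hypothetical; only the action values are taken from enum xdp_action in <linux/bpf.h>. The length check before every read mirrors what the BPF verifier insists on before it will accept a program.

```c
#include <stddef.h>
#include <stdint.h>

/* Values taken from enum xdp_action in <linux/bpf.h>, redefined here so the
 * sketch builds as ordinary user-space C. */
enum { XDP_ABORTED = 0, XDP_DROP = 1, XDP_PASS = 2 };

/* Hypothetical rule: drop TCP traffic to this destination port. */
#define BLOCKED_TCP_PORT 23

/* Hypothetical stand-in for struct xdp_md: bounds of the packet buffer. */
struct pkt_ctx {
    const uint8_t *data;
    const uint8_t *data_end;
};

static int firewall_filter(const struct pkt_ctx *ctx)
{
    const uint8_t *p = ctx->data;
    size_t len = (size_t)(ctx->data_end - ctx->data);

    /* Ethernet header: 14 bytes, EtherType at offset 12. */
    if (len < 14)
        return XDP_PASS;
    if ((p[12] << 8 | p[13]) != 0x0800)         /* not IPv4 */
        return XDP_PASS;

    /* IPv4 header: at least 20 bytes; IHL gives the real length. */
    size_t ip_off = 14;
    if (len < ip_off + 20)
        return XDP_PASS;
    size_t ihl = (size_t)(p[ip_off] & 0x0f) * 4;
    if (ihl < 20 || p[ip_off + 9] != 6)         /* protocol 6 = TCP */
        return XDP_PASS;

    /* TCP header: the destination port is in bytes 2-3. */
    size_t tcp_off = ip_off + ihl;
    if (len < tcp_off + 4)
        return XDP_PASS;
    unsigned dport = (unsigned)(p[tcp_off + 2] << 8 | p[tcp_off + 3]);

    return dport == BLOCKED_TCP_PORT ? XDP_DROP : XDP_PASS;
}
```

Returning XDP_PASS for anything the sketch cannot parse keeps it conservative: unrecognized traffic is handed to the normal stack rather than silently dropped.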
One of the core design features for bpfilter is the ability to translate existing iptables rules into BPF programs. This feature is intended to make it easy for existing firewall configurations to be moved over to the new scheme, perhaps without system administrators even knowing that it is happening. This translation is done in an interesting manner. Iptables rules are passed to the kernel, so the kernel must take responsibility for doing that work, but the task can be a complex one that would benefit from a user-space implementation.
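The translation idea can be sketched on a much smaller scale. In bpfilter the output is BPF code; the sketch below instead targets a made-up three-opcode virtual machine so that the shape of the problem stays visible: an ordered chain of rules with first-match semantics and a default policy becomes straight-line code in which each failed comparison skips past the rest of its rule. The rule format, opcodes, and function names are all hypothetical and cover only a protocol match and a destination-port match.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified iptables-style rule: an optional protocol
 * match and an optional destination-port match (0 means "any"). */
enum verdict { V_ACCEPT, V_DROP };
struct rule {
    uint8_t proto;
    uint16_t dport;
    enum verdict target;
};

/* Made-up instruction set standing in for BPF bytecode. "JNE" instructions
 * skip `skip` further instructions when the packet field differs from k. */
enum opcode { OP_JNE_PROTO, OP_JNE_DPORT, OP_RET };
struct insn {
    enum opcode op;
    uint16_t k;
    uint16_t skip;
};

struct pkt_meta {
    uint8_t proto;
    uint16_t dport;
};

/* Translate a chain (first match wins) plus a default policy.
 * `out` needs room for 3 * n + 1 instructions. Returns the count written. */
static size_t translate(const struct rule *rules, size_t n,
                        enum verdict policy, struct insn *out)
{
    size_t pc = 0;
    for (size_t i = 0; i < n; i++) {
        const struct rule *r = &rules[i];
        /* A failed check jumps past the remaining checks and the RET. */
        if (r->proto)
            out[pc++] = (struct insn){ OP_JNE_PROTO, r->proto,
                                       (uint16_t)(1 + (r->dport != 0)) };
        if (r->dport)
            out[pc++] = (struct insn){ OP_JNE_DPORT, r->dport, 1 };
        out[pc++] = (struct insn){ OP_RET, (uint16_t)r->target, 0 };
    }
    out[pc++] = (struct insn){ OP_RET, (uint16_t)policy, 0 };
    return pc;
}

/* Interpreter for the toy VM; in bpfilter this role falls to the kernel's
 * BPF interpreter or JIT-compiled code. */
static enum verdict run_filter(const struct insn *prog, size_t len,
                               const struct pkt_meta *m)
{
    size_t pc = 0;
    while (pc < len) {
        const struct insn *in = &prog[pc];
        switch (in->op) {
        case OP_JNE_PROTO:
            pc += 1 + (m->proto != in->k ? in->skip : 0);
            break;
        case OP_JNE_DPORT:
            pc += 1 + (m->dport != in->k ? in->skip : 0);
            break;
        case OP_RET:
            return (enum verdict)in->k;
        }
    }
    return V_DROP; /* not reached for translate() output; fail closed */
}
```

For example, the chain "-p tcp --dport 22 -j ACCEPT" followed by "-p udp -j DROP", with an ACCEPT policy, translates to six instructions; a UDP packet fails the first rule's protocol check, skips that rule's remaining two instructions, and lands on the DROP return of the second rule.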
To enable such an implementation, the bpfilter developers have created a new mechanism that supports the creation of a special type of kernel module to handle this kind of task. These modules would be part of the kernel and would be shipped by distributors as just another .ko file, but they would contain an ordinary ELF executable. After the module has been loaded, its code can be run in a separate user-space process; all that is required is a call to a special version of call_usermodehelper().
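The real mechanism lives in the kernel: the helper's ELF image is embedded in the module and started through the special call_usermodehelper() variant. The overall shape of the arrangement, a caller that starts a separate helper process and exchanges fixed-size messages with it over pipes, can be imitated in user space with ordinary POSIX fork(), pipe(), and waitpid(). This is a sketch of that pattern only; helper_work() and the message format are hypothetical stand-ins for the rule-translation protocol.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the helper's real job (rule translation):
 * it just doubles the request so the round trip is easy to check. */
static int32_t helper_work(int32_t request)
{
    return request * 2;
}

/* Helper side: answer fixed-size requests until the caller closes its end. */
static void helper_loop(int in_fd, int out_fd)
{
    int32_t req;
    while (read(in_fd, &req, sizeof req) == (ssize_t)sizeof req) {
        int32_t reply = helper_work(req);
        if (write(out_fd, &reply, sizeof reply) != (ssize_t)sizeof reply)
            break;
    }
}

/* Caller side: start the helper, send one request, collect one reply.
 * Returns 0 on success and -1 on any failure. */
static int call_helper(int32_t request, int32_t *reply)
{
    int to_helper[2], from_helper[2];
    if (pipe(to_helper) < 0)
        return -1;
    if (pipe(from_helper) < 0) {
        close(to_helper[0]);
        close(to_helper[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(to_helper[0]);
        close(to_helper[1]);
        close(from_helper[0]);
        close(from_helper[1]);
        return -1;
    }
    if (pid == 0) {                        /* child: the helper process */
        close(to_helper[1]);
        close(from_helper[0]);
        helper_loop(to_helper[0], from_helper[1]);
        _exit(0);
    }

    close(to_helper[0]);                   /* caller keeps the other ends */
    close(from_helper[1]);
    int rc = -1;
    if (write(to_helper[1], &request, sizeof request) == (ssize_t)sizeof request &&
        read(from_helper[0], reply, sizeof *reply) == (ssize_t)sizeof *reply)
        rc = 0;
    close(to_helper[1]);                   /* EOF makes the helper exit */
    close(from_helper[0]);
    waitpid(pid, NULL, 0);
    return rc;
}
```

Because the helper runs in its own address space, a bug in it can at worst produce a bad reply or a failed call; the caller still decides what to do with the answer, which is the isolation argument behind the design.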
This mechanism allows the translation code to be managed as if it were just another part of the kernel. That code can be developed in user space, though. When it runs, the translation code will be separated from the kernel, making it harder to attack the kernel via that path. If this mechanism catches on, one can imagine that a number of other tasks could eventually be pushed out of the kernel proper into one of these special user-space modules. Developers should be careful, though; this could prove to be a slippery slope leading toward something that starts to look like a microkernel architecture.
Early responses
There have not been a whole lot of comments thus far on the code itself. That may be partly because, in their haste to get a proof of concept out to illustrate the idea, the developers never quite got around to writing comments in the code — or even changelogs for the patches. The idea itself, though, has raised concerns for some developers.
Harald Welte, who is not often seen in this community these days, showed up with a number of questions. At the top of his list was the decision to emulate iptables rules with the new BPF mechanism. If the new subsystem is to ever replace the iptables implementation, it will need to implement exactly the same behavior; small and subtle differences could introduce security problems into deployed firewall configurations. Given the complexity of iptables, the chances of such differences happening are significant.
More fundamentally, the networking developers have wanted to phase out iptables and its user-space interfaces for some time. Iptables has not aged entirely well. For example, there is no way to add or replace a single rule (or small set of rules); iptables can only wipe out the entire configuration and start from scratch. That makes firewall changes expensive; it also makes it difficult to coordinate changes being made by multiple actors at once. The increasing use of containers has created just this kind of situation; addressing this problem requires moving away from the iptables API. The fact that iptables requires separate rule sets for IPv4 and IPv6 creates a pain point for administrators as well.
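The coordination problem with a replace-only interface is easy to reproduce in a toy model. The sketch below is not the real iptables kernel interface; it assumes only that the sole write operation is "swap in a complete new table". Two actors each fetch the table, append one rule to their copy, and write it back; because their read-modify-write cycles interleave, the second write silently discards the first actor's rule. All names are hypothetical.

```c
#include <stddef.h>

#define MAX_RULES 16

/* Simplified model of a replace-only interface. A rule is just an ID. */
struct table {
    size_t n;
    int rules[MAX_RULES];
};

/* State held by the "kernel"; the only operations are read-all and replace-all. */
static struct table kernel_table;

static struct table get_table(void) { return kernel_table; }
static void replace_table(const struct table *t) { kernel_table = *t; }

/* Adding one rule: modify a private copy, which must then be written back whole. */
static int append_rule(struct table *copy, int rule)
{
    if (copy->n == MAX_RULES)
        return -1;
    copy->rules[copy->n++] = rule;
    return 0;
}

static int table_has(const struct table *t, int rule)
{
    for (size_t i = 0; i < t->n; i++)
        if (t->rules[i] == rule)
            return 1;
    return 0;
}

/* Two actors each add one rule, but their read-modify-write cycles overlap. */
static struct table race_two_actors(void)
{
    kernel_table = (struct table){ .n = 0 };
    struct table a = get_table();   /* actor A reads the empty table */
    struct table b = get_table();   /* actor B reads the same empty table */
    append_rule(&a, 100);
    append_rule(&b, 200);
    replace_table(&a);              /* live table: {100} */
    replace_table(&b);              /* live table: {200}; rule 100 is lost */
    return get_table();
}
```

Avoiding this requires some form of external serialization, such as a lock held across the whole read-modify-write cycle, or an interface that can change a single rule in place.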
Implementing the iptables API with bpfilter, Welte said, will "risk perpetuating the design mistakes we made in iptables some 18 years ago for another decade or more". It will push back the (already distant) date when that API could be deprecated and removed.
Rather than focusing on iptables, Welte said, the developers should create an emulation of the newer nftables API, which was designed with the lessons from iptables in mind. That would support sites that have already migrated and encourage that migration to continue.
Networking maintainer David Miller (who authored some of the new code) replied that iptables is still far more widely used, so implementing that interface provides better test coverage in the near term. Welte answered, though, that most of the biggest users (Docker and Kubernetes, for example) rely on the command-line tools rather than the iptables API, so there is no need to emulate the API itself to test with those systems. Miller, however, disagreed with the idea that the iptables binaries could easily be replaced on deployed systems: "Like it or not iptables ABI based filtering is going to be in the data path for many years if not a decade or more to come".
Interestingly, while there was talk of implementing the nftables API, nobody has yet questioned the idea of applying the BPF virtual machine to firewalls, even though doing so would likely supplant nftables relatively quickly. Instead, Miller said in the discussion that nftables failed to address the performance problems in Linux's packet-filtering implementation, driving users toward user-space networking technologies. There is a real possibility that nftables could end up being one of those experiments that sheds some light on the problem space but never takes over in the real world.
Overall, bpfilter is an extremely young project, and there are a lot of questions yet to be answered about it. While much of the packet-filtering logic can likely be expressed in BPF code, there are more advanced features (such as connection tracking, as Florian Westphal pointed out) that are still likely to need a fair amount of kernel support. No performance numbers were posted with the patch set, so any performance gains are still theoretical at this point. And the code itself is quite young, lacking both features and documentation.
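A toy version of a stateful check shows why connection tracking asks for more than a simple match: the filter has to remember flows. In the sketch below, outbound (LAN to WAN) packets are allowed and their 5-tuple recorded in a small hash table, standing in for a BPF hash map; inbound packets are allowed only if they are the reverse direction of a recorded flow. The structure and function names are hypothetical, and the sketch deliberately omits timeouts, TCP state tracking, eviction, and helpers for related connections, which is exactly the part that needs substantial kernel support.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: the usual 5-tuple. */
struct flow {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint8_t proto;
};

#define FLOW_SLOTS 64

/* Tiny open-addressing table standing in for a BPF hash map.
 * No timeouts, TCP states, or eviction: entries live forever. */
static struct flow flows[FLOW_SLOTS];
static bool flow_used[FLOW_SLOTS];

/* Field-by-field comparison (memcmp would also compare padding bytes). */
static bool flow_eq(const struct flow *a, const struct flow *b)
{
    return a->saddr == b->saddr && a->daddr == b->daddr &&
           a->sport == b->sport && a->dport == b->dport &&
           a->proto == b->proto;
}

static size_t flow_slot(const struct flow *f)
{
    uint32_t h = f->saddr * 2654435761u;
    h ^= f->daddr ^ ((uint32_t)f->sport << 16 | f->dport) ^ f->proto;
    return h % FLOW_SLOTS;
}

/* Linear probing; returns true if f has been recorded. */
static bool flow_lookup(const struct flow *f)
{
    size_t j = flow_slot(f);
    for (size_t i = 0; i < FLOW_SLOTS; i++, j = (j + 1) % FLOW_SLOTS) {
        if (!flow_used[j])
            return false;
        if (flow_eq(&flows[j], f))
            return true;
    }
    return false;
}

/* Records f; returns false only if the table is full. */
static bool flow_insert(const struct flow *f)
{
    size_t j = flow_slot(f);
    for (size_t i = 0; i < FLOW_SLOTS; i++, j = (j + 1) % FLOW_SLOTS) {
        if (!flow_used[j]) {
            flows[j] = *f;
            flow_used[j] = true;
            return true;
        }
        if (flow_eq(&flows[j], f))
            return true;
    }
    return false;
}

/* LAN -> WAN: allowed and remembered; fails closed if the table is full. */
static bool allow_outbound(const struct flow *f)
{
    return flow_insert(f);
}

/* WAN -> LAN: allowed only as the reverse direction of a remembered flow. */
static bool allow_inbound(const struct flow *f)
{
    struct flow reverse = { .saddr = f->daddr, .daddr = f->saddr,
                            .sport = f->dport, .dport = f->sport,
                            .proto = f->proto };
    return flow_lookup(&reverse);
}
```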
The end result is that we'll probably not see bpfilter in the mainline kernel in the immediate future. Given the developers who have worked on it, though, bpfilter is clearly a serious initiative that is firmly aimed at getting into the mainline eventually. If it truly proves to be a better solution to the network packet-filtering problem, those developers seem likely to prevail in the end.
Index entries for this article | |
---|---|
Kernel | BPF/Networking |
Kernel | Modules/ELF modules |
Kernel | Networking/Packet filtering |
Posted Feb 20, 2018 4:35 UTC (Tue)
by eahay (guest, #110720)
Posted Feb 20, 2018 6:46 UTC (Tue)
by kay (subscriber, #1362)
Posted Feb 20, 2018 12:19 UTC (Tue)
by bernat (subscriber, #51658)
Posted Feb 24, 2018 20:07 UTC (Sat)
by kleptog (subscriber, #1183)
In any case, if we do firewall rules as BPF, surely we end up with the same problem? The performance improvement would be that you can pass your firewall through a compiler/optimiser to make it more efficient, but as a side effect you end up with the same problem, namely, that to update a single rule you need to replace the whole program. Only now you've added an optimisation step in between.
Unless you change your API to a transactional one, where you can send updates and get a confirmation asynchronously, and the backend is smart enough to avoid actually updating the kernel for every change.
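The transactional interface suggested in this comment could be modeled roughly as follows, as a minimal user-space sketch: changes are queued in a transaction, applied to a private copy on commit, and published with a single replacement of the live rule set, so N queued changes cost one reload rather than N. A batch containing an invalid change is rejected as a whole, leaving the live rule set untouched. The types and functions are hypothetical, and the "reload" is only a counter.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_RULES 32
#define MAX_OPS 32

struct ruleset { size_t n; int rules[MAX_RULES]; };

enum op_kind { OP_ADD, OP_DEL };
struct op { enum op_kind kind; int rule; };
struct txn { size_t n; struct op ops[MAX_OPS]; };

/* The "kernel" side: one live rule set and a count of how often it was
 * replaced (each replacement would mean regenerating and reloading code). */
static struct ruleset live;
static unsigned reloads;

static bool txn_add(struct txn *t, enum op_kind kind, int rule)
{
    if (t->n == MAX_OPS)
        return false;
    t->ops[t->n++] = (struct op){ kind, rule };
    return true;
}

/* Apply every queued operation to a private copy; publish only if all of
 * them succeed, so the live rule set never sees a half-applied batch. */
static bool txn_commit(struct txn *t)
{
    struct ruleset next = live;
    for (size_t i = 0; i < t->n; i++) {
        const struct op *o = &t->ops[i];
        if (o->kind == OP_ADD) {
            if (next.n == MAX_RULES)
                return false;
            next.rules[next.n++] = o->rule;
        } else {
            size_t j = 0;
            while (j < next.n && next.rules[j] != o->rule)
                j++;
            if (j == next.n)
                return false;               /* deleting a missing rule aborts */
            for (; j + 1 < next.n; j++)     /* close the gap, keeping order */
                next.rules[j] = next.rules[j + 1];
            next.n--;
        }
    }
    live = next;                            /* one publication per batch */
    reloads++;
    t->n = 0;
    return true;
}
```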
Posted Apr 19, 2018 2:26 UTC (Thu)
by manhnt (guest, #123784)
Posted Aug 13, 2018 4:07 UTC (Mon)
by fest3er (guest, #60379)
Posted Aug 13, 2018 16:37 UTC (Mon)
by antiphase (subscriber, #111993)
Posted Feb 20, 2018 6:42 UTC (Tue)
by valberg (guest, #83862)
Posted Feb 20, 2018 7:01 UTC (Tue)
by epa (subscriber, #39769)
Posted Feb 20, 2018 7:17 UTC (Tue)
by dgm (subscriber, #49227)
Posted Feb 20, 2018 16:17 UTC (Tue)
by josh (subscriber, #17465)
Posted Feb 20, 2018 7:51 UTC (Tue)
by vadim (subscriber, #35271)
nftables is quite a nice idea. I think the problem with it was that its developers were slow to implement the last few features that were actually quite important. For instance, nftables has only been able to do MSS clamping since kernel 4.14, which was released last November, even though nftables has been around since 2014, as this article says. MSS clamping is widely used in DSL and fiber setups, and it matters precisely to the kind of people who want to run their own firewall.
IMO, the other problem is that the documentation is still not great, and the syntax leaves a lot to be desired. For instance, nftables involves stringing words together in a way that closely resembles a run-on sentence; it's not immediately obvious how the syntax works and which words fit where in the hierarchy. The fact that "ppp0" is not quoted or delimited in any way also makes it hard to tell commands apart from data, though quoting is possible.
There's a C-ish config-file form that looks a bit nicer, but when you run into a command that starts with "nft add", it's not obvious how to put it into your config file, which uses nested "table ip filter { chain forward { ... } }" blocks instead. Note how subtly different it is: "ip filter" becomes "table ip filter", and "forward" becomes "chain forward", and for someone not familiar with the syntax it's not really apparent that "oifname" in the "nft add rule" command is the point where you'd want to start copying and pasting.
I hope that, besides the technical details, the makers of BPF also take care to produce a better syntax and good documentation.
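MSS clamping, mentioned in the comment above, rewrites the maximum-segment-size option carried in TCP SYN packets so that peers never send segments too large for a smaller-MTU link; on a PPPoE link with a 1492-byte MTU that works out to 1452 bytes once 40 bytes of IPv4 and TCP headers are subtracted. A minimal sketch of the option-rewriting step follows, assuming the caller has already located the TCP header. The function name is hypothetical, and the sketch leaves out the TCP checksum update that a real implementation must perform after changing the option.

```c
#include <stddef.h>
#include <stdint.h>

#define TCP_FLAG_SYN 0x02
#define TCPOPT_EOL 0
#define TCPOPT_NOP 1
#define TCPOPT_MSS 2   /* kind 2, length 4, 16-bit value */

/* Clamp the MSS option of a SYN segment to max_mss.
 * `tcp` points at the TCP header and `len` is the number of bytes available.
 * Returns 1 if the option was rewritten, 0 otherwise.
 * NOTE: the TCP checksum is NOT updated here; a real implementation must. */
static int clamp_mss(uint8_t *tcp, size_t len, uint16_t max_mss)
{
    if (len < 20 || !(tcp[13] & TCP_FLAG_SYN))
        return 0;
    size_t hdr_len = (size_t)(tcp[12] >> 4) * 4;    /* data offset, in bytes */
    if (hdr_len < 20 || hdr_len > len)
        return 0;

    size_t i = 20;                                   /* options start here */
    while (i < hdr_len) {
        uint8_t kind = tcp[i];
        if (kind == TCPOPT_EOL)
            break;
        if (kind == TCPOPT_NOP) {
            i++;
            continue;
        }
        if (i + 1 >= hdr_len)
            break;                                   /* truncated option */
        uint8_t olen = tcp[i + 1];
        if (olen < 2 || i + olen > hdr_len)
            break;                                   /* malformed option */
        if (kind == TCPOPT_MSS && olen == 4) {
            uint16_t mss = (uint16_t)(tcp[i + 2] << 8 | tcp[i + 3]);
            if (mss <= max_mss)
                return 0;                            /* already small enough */
            tcp[i + 2] = (uint8_t)(max_mss >> 8);
            tcp[i + 3] = (uint8_t)(max_mss & 0xff);
            return 1;
        }
        i += olen;
    }
    return 0;
}
```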
Posted Feb 20, 2018 15:25 UTC (Tue)
by ringerc (subscriber, #3071)
Posted Feb 20, 2018 16:17 UTC (Tue)
by flussence (guest, #85566)
Posted Feb 21, 2018 0:15 UTC (Wed)
by florianfainelli (subscriber, #61952)
Posted Mar 1, 2018 11:31 UTC (Thu)
by jengelh (guest, #33263)
This is where the iptables UI excels: the tokens for "options" and the tokens for "values" never ever overlap; I am tempted to say it is *context-free*. In nft, by contrast, "tcp" can mean either "-p tcp" or "--tcp-flags ..." depending on where it is located, and that is what makes the bpf/ip/tc/nft syntax so terrible.
Posted Feb 20, 2018 12:56 UTC (Tue)
by iq-0 (subscriber, #36655)
But the real challenges are often not the ruleset overhead; they are related to connection tracking, matching against advanced set data structures, and the interaction with the rest of the network stack. I feel like there is a basic conflict between calling kernel functions to get better access to advanced algorithms and data structures and the basic JIT and offloading story of bpfilter.
And didn't BPF programs have a size constraint? Or is that something that can be worked around using BPF_MAP_TYPE_PROG_ARRAY?
Posted Feb 20, 2018 13:40 UTC (Tue)
by kooky (subscriber, #92468)
I've been using nftables and find it just works, now that I've got the hang of it.
Tim
Posted Feb 21, 2018 0:20 UTC (Wed)
by florianfainelli (subscriber, #61952)
https://www.mail-archive.com/netdev@vger.kernel.org/msg21...
Posted Feb 21, 2018 13:26 UTC (Wed)
by pomac (subscriber, #94901)
Anyway, for those who want to follow the threads in an easier manner:
https://marc.info/?l=linux-netdev&m=151905824829539&... - [PATCH RFC PoC 0/3] nftables meets bpf
Posted Feb 23, 2018 23:01 UTC (Fri)
by ofranja (guest, #11084)
Or, even further, an exokernel architecture.
In the original MIT exokernel research (~1995), a packet filter language w/JIT compiler is mentioned as a way to filter and delegate network traffic to userspace with minimal [1] kernel support (although not necessarily using these terms, but the general idea is the same).
[1] https://pdos.csail.mit.edu/archive/exo/exo-slides/sld011.htm
Posted Jul 26, 2019 20:13 UTC (Fri)
by valentine (guest, #133435)
nft add rule ip filter forward oifname ppp0 tcp flags syn tcp option maxseg size set 1452
table ip filter {
    # allow packets from LAN to WAN, and WAN to LAN if LAN initiated the connection
    chain forward {
        iifname "lan0" oifname "wan0" accept
    }
}
>
> nft add rule ip filter forward oifname ppp0 tcp flags syn tcp option maxseg size set 1452
>
> It's not immediately obvious how the syntax works and what words fit in where in the hierarchy.
https://marc.info/?l=netfilter-devel&m=15187884440366... - [PATCH RFC 0/4] net: add bpfilter
I've some small questions about the post.
- What is the relationship between BPF and eBPF?
- I still don't understand how BPF works here: the programs are attached at the express data path (XDP) layer, so are they run from the network-interface drivers or not? If so, must the NIC drivers be rewritten?