Nftables reaches 1.0
The first public nftables release was made by Patrick McHardy in early 2009. At that time, the kernel already had a capable packet-filtering subsystem in the form of iptables, which was in widespread use, but a number of problems were driving a change. These included the fact that the kernel had (and still has) more than one packet-filtering mechanism: there is one for IPv4, another for IPv6, yet another for ARP, and so on. Each of those subsystems is mostly independent, with a lot of duplicated code. Beyond that, iptables contains an excessive amount of built-in protocol knowledge and suffers from a difficult API that, among other things, makes it impossible to update a single rule without replacing the entire set.
The core idea behind nftables was to throw away all of that protocol-aware machinery and replace it with a simple virtual machine that could be programmed from user space. Administrators would still write rules referring to specific packet-header fields and such, but user-space tooling would translate those rules into low-level fetch and compare operations, then load the result into the kernel. That resulted in a smaller packet-filtering engine that was also far more flexible; it also had the potential to perform better. It looked like a win, overall, once the minor problem of transitioning a vast number of users had been overcome.
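As a rough illustration of that split, the sketch below models a rule as a short list of fetch-and-compare instructions executed by a tiny, protocol-agnostic interpreter. Everything in it is an assumption made for illustration: the instruction names (`payload_load`, `cmp_eq`, `verdict`), the single-register layout, and the `compile_rule()`, `run()`, and `make_packet()` helpers are hypothetical and are not the real nftables bytecode or the output of the nft tool. It also assumes a 20-byte IPv4 header with no options.

```python
import struct

# Toy model only: these instruction names are hypothetical, not nftables bytecode.

def compile_rule(matches, verdict):
    """'User-space' side: turn (offset, expected bytes) matches into instructions."""
    program = []
    for offset, value in matches:
        # Fetch len(value) bytes at the given packet offset into register 0 ...
        program.append(("payload_load", 0, offset, len(value)))
        # ... then compare the register against the expected bytes.
        program.append(("cmp_eq", 0, value))
    program.append(("verdict", verdict))
    return program

def run(program, packet, default="continue"):
    """'Kernel' side: an interpreter with no built-in protocol knowledge."""
    regs = {}
    for insn in program:
        op = insn[0]
        if op == "payload_load":
            _, reg, offset, length = insn
            regs[reg] = packet[offset:offset + length]
        elif op == "cmp_eq":
            _, reg, value = insn
            if regs[reg] != value:
                return default  # rule does not match; fall through
        elif op == "verdict":
            return insn[1]
    return default

def make_packet(protocol, dport):
    """Build a minimal IPv4 header (no options) followed by TCP/UDP ports."""
    ip = bytearray(20)
    ip[0] = 0x45          # version 4, header length 5 words
    ip[9] = protocol      # protocol field lives at byte 9
    ports = struct.pack("!HH", 40000, dport)  # source port, destination port
    return bytes(ip) + ports

# "Drop TCP traffic to port 22": protocol byte == 6, destination port == 22.
drop_ssh = compile_rule([(9, bytes([6])), (22, struct.pack("!H", 22))], "drop")

print(run(drop_ssh, make_packet(6, 22)))   # matches both comparisons
print(run(drop_ssh, make_packet(6, 80)))   # destination port differs
```

In this toy, all of the knowledge about where the protocol and port fields live sits in the code that builds the rule; the interpreter only fetches bytes and compares them.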
Nftables made a bit of a splash when it was launched, but then bogged down and disappeared from view, perhaps because McHardy decided he had more interesting opportunities to pursue in courtrooms. In 2013, though, Pablo Neira Ayuso restarted the project with the idea of getting the code merged into the mainline as soon as possible. That part succeeded; nftables found its way into the 3.13 kernel release at the beginning of 2014.
The work since then has been a hard slog of filling in the gaps and making nftables sufficiently appealing that users would want to make the transition. The language used to write filtering rules has gained a long list of features for stateful tracking, address mapping, efficient handling of address intervals and large rule chains, and support for numerous protocols. There was also documentation to write, of course; the nftables wiki has a lot of information about how it all works.
There is, of course, one other significant impediment to transitioning away from iptables: the vast number of deployed, working firewalls using the latter. In many cases, rewriting the firewall rules may be the best course of action because many complex filtering setups can be expressed much more efficiently in the new scheme. But, for administrators who just want their painfully developed firewall to keep working, the benefits of nftables may be less appealing than one might expect. The nftables developers have written a set of scripts to translate iptables firewalls into the nftables equivalent, which should help, but it is still a big jump.
In some cases, users may eventually make that jump without even noticing, though. Linux distributions have carried support for nftables for some time now, and work is being done to port tools like Red Hat's firewalld to nftables. In cases like this, users may have never seen the iptables rules in the first place and, with luck, will not notice that the underlying mechanism has changed.
When will that change happen? It is still somewhat hard to say. The 2018 Netfilter Workshop decreed that iptables is "a legacy tool" whose days are numbered. Debian switched to nftables by default in the 2019 Debian 10 "buster" release, though Ubuntu didn't follow until the 21.04 release. While almost all distributions ship nftables, many of them have yet to make the switch to use it by default.
The release of nftables 1.0.0 can be seen as a signal that it is time for the laggards to get more serious about making the switch. While it is hard to imagine iptables support being removed anytime soon, it's rather easier to foresee that enthusiasm for maintaining it will continue to wane. New features will show up in nftables instead, and users will eventually need to migrate over to take advantage of them. It only took 12 years, but this transition finally appears to be heading into its final stage.
There is, however, one other interesting question. In 2018, the BPF developers announced bpfilter, a packet-filtering mechanism that runs on the BPF virtual machine. The announcement drew some attention at the time; BPF had (and has) a lot of momentum, and a lot of work has been done to optimize the virtual machine and make it safe to use. Arguably, it makes sense to use that rather than maintain yet another virtual machine just for packet filtering. That would allow the removal of a bunch of code and the focusing of maintenance effort on BPF.
The bpfilter code was merged for the 4.18 kernel release; it also brought in a "user-mode blobs" mechanism that was intended to facilitate the translation of firewall rules to the new machine. Since then, however, development on this code has come to a halt; there have been exactly two (trivial) commits to the code in net/bpfilter in 2021. The removal of this code was discussed in June 2020 but it survived at that time. Since then, the cobwebs have only gotten thicker; it seems fair to say that bpfilter is not an active area of development at this point, and that it seems unlikely to displace nftables anytime soon.
Whether that is the "right" outcome is hard to say. Perhaps the special-purpose virtual machine used by nftables is a better solution to this particular problem than the more general BPF. Or possibly nftables came out on top simply because the developers behind it continued to show up and push the project forward. One of the keys to success in kernel development is simple persistence; that is doubly true for a critical subsystem like packet filtering, where it is more than reassuring to know that the developers are in it for the long haul.
Index entries for this article | |
---|---|
Kernel | Networking/Packet filtering |
Kernel | Nftables |
Posted Aug 27, 2021 15:46 UTC (Fri)
by magfr (subscriber, #16052)
[Link] (1 responses)
Posted Aug 27, 2021 18:11 UTC (Fri)
by aszs (subscriber, #50252)
[Link]
Posted Aug 27, 2021 15:57 UTC (Fri)
by johill (subscriber, #25196)
[Link] (2 responses)
Today NFT has a whole bunch of 'eval' methods, so to compile to BPF you just need to have a function that returns a few BPF instructions instead. Where not implemented, provide a BPF helper function that calls the existing eval function from BPF.
It doesn't even seem that hard, and if you implement the most commonly used 'eval' methods directly and then send the program through the compiler you'll probably already win something?
Posted Aug 29, 2021 7:12 UTC (Sun)
by nilsmeyer (guest, #122604)
[Link] (1 responses)
Under: "Bringing in BPF"
Posted Aug 29, 2021 7:43 UTC (Sun)
by johill (subscriber, #25196)
[Link]
Posted Aug 27, 2021 22:54 UTC (Fri)
by jkingweb (subscriber, #113039)
[Link] (6 responses)
I'm not sure what documentation I was following at the time; it may have led me down suboptimal paths, or things may have improved in the years since. I'll have to give it another look!
Posted Aug 28, 2021 6:47 UTC (Sat)
by wtarreau (subscriber, #51152)
[Link] (1 responses)
The really nice thing compared to iptables is the instant and atomic load of the rules. No more situation where the nat table loads while the filter table fails etc. And the ability to define objects supporting lists about everywhere (ports, hosts etc) is great. I used to do that using scripts requiring a more complex language to automatically produce iterations. Now it is natural in the config language.
What still really annoys me is the lack of command-line help. I promised Pablo I would some day send him a patch for this but still failed to find sufficient time to work on it. Having to go to the wiki to figure out that you need to type "nft list ruleset" after not having used it for 2 months is pretty annoying, especially when you've been used to "iptables -h" providing very detailed syntax information. But this minor user-interface aspect aside, nftables is a great technology that is far closer to the spirit of traffic filtering than ipfwadm, ipchains or iptables could be, making it extremely user-friendly.
It's difficult to adopt it, but it's really worth it. Most of the effort is to convert the existing config. I would strongly encourage new firewall deployments to start with nftables, as it will be much easier than iptables for the first setup, and will not require any conversion.
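To make the atomic-load point above concrete, here is a minimal Python model of the difference between loading tables one at a time and staging a whole ruleset before committing it. It is only a sketch of the failure mode described: the `Firewall` class, its methods, the `validate()` helper, and the rule strings are all hypothetical and do not correspond to any iptables or nftables interface.

```python
class RulesetError(Exception):
    """Raised when a (hypothetical) rule cannot be parsed."""

def validate(table, rules):
    # Stand-in for real parsing: any rule containing "bogus" is rejected.
    for rule in rules:
        if "bogus" in rule:
            raise RulesetError(f"{table}: cannot parse {rule!r}")

class Firewall:
    def __init__(self, tables):
        self.tables = dict(tables)

    def load_per_table(self, new_tables):
        """Apply each table as soon as it validates; a later failure
        leaves the earlier tables already replaced."""
        for name, rules in new_tables.items():
            validate(name, rules)
            self.tables[name] = list(rules)

    def load_atomic(self, new_tables):
        """Stage everything first and commit only if all tables validate."""
        staged = dict(self.tables)
        for name, rules in new_tables.items():
            validate(name, rules)
            staged[name] = list(rules)
        self.tables = staged  # single switch-over point

old = {"nat": ["old-nat"], "filter": ["old-filter"]}
new = {"nat": ["new-nat"], "filter": ["bogus rule"]}

for loader in ("load_per_table", "load_atomic"):
    fw = Firewall(old)
    try:
        getattr(fw, loader)(new)
    except RulesetError as err:
        print(loader, "failed:", err, "->", fw.tables)
```

With the per-table loader, the model ends up with the new nat table next to the old filter table; with the staged loader, the failed load leaves the previous ruleset untouched.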
Posted Aug 29, 2021 3:59 UTC (Sun)
by josh (subscriber, #17465)
[Link]
But I do wish the documentation was much better, especially the documentation for the kernel-to-userspace interfaces.
Posted Aug 28, 2021 15:41 UTC (Sat)
by hailfinger (subscriber, #76962)
[Link] (3 responses)
Overall, I think nftables has a nice future ahead and I'm looking forward to testing nftables 1.0.
Posted Aug 28, 2021 20:25 UTC (Sat)
by pbonzini (subscriber, #60935)
[Link] (2 responses)
Ahah, that's actually a coincidence. GCC error messages for C were bad mostly due to the usage of yacc for the parser. When the parser was rewritten as recursive descent in 2004 by Joseph Myers that laid the foundation for improving error recovery. They then finally improved when GCC developers including myself got fed up with a few particularly egregious cases[1][2].
But competition with llvm wasn't particularly involved. In fact for C++ (which used recursive descent since before clang was started) error message quality has always been comparable to clang.
More recently (and long after I had stopped working on GCC), David Malcolm did a huge amount of work on caret diagnostics, where GCC's front ends were indeed lagging behind. But that's a different story.
[1] https://gcc.gnu.org/legacy-ml/gcc-patches/2010-10/msg0261...
[2] https://gcc.gnu.org/legacy-ml/gcc-patches/2010-11/msg0180...
Posted Aug 28, 2021 21:24 UTC (Sat)
by Paf (subscriber, #91811)
[Link] (1 responses)
Posted Aug 30, 2021 13:22 UTC (Mon)
by pbonzini (subscriber, #60935)
[Link]
In C, the problem was abysmal error recovery, causing dozens of cascaded errors for a single missing semicolon or fat-fingered type name (such as "intt" or "unsgined char"). With a recursive descent parser it's relatively easy and maintainable to add heuristics that look ahead and insert missing tokens or fix things up as necessary. For example if you see two consecutive unknown identifiers, it's likely that the first is a misspelled type and the second is a variable name. With some luck, that will remove a lot of errors involving that variable, because the compiler now knows about it and treats it as declared.
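The heuristic described above can be sketched with a toy hand-written parser. This is not GCC's code: the two-statement grammar (`TYPE NAME;` and `NAME = NUMBER;`), the `parse()` function, its `recover` flag, and the error strings are all made up for illustration. It only shows how treating "two consecutive unknown identifiers" as a misspelled type plus a variable name suppresses the follow-on errors.

```python
import re

KNOWN_TYPES = {"int", "char", "float"}
IDENT = re.compile(r"[A-Za-z_]\w*")

def tokenize(src):
    return re.findall(r"[A-Za-z_]\w*|\d+|[=;]", src)

def is_ident(tok):
    return tok is not None and IDENT.fullmatch(tok) is not None

def parse(src, recover=True):
    """Parse 'TYPE NAME;' and 'NAME = NUMBER;' statements.
    Returns (errors, declared_names)."""
    tokens = tokenize(src)
    pos = 0
    declared = set()
    errors = []

    def peek(k=0):
        return tokens[pos + k] if pos + k < len(tokens) else None

    while pos < len(tokens):
        tok, nxt = peek(), peek(1)
        if tok in KNOWN_TYPES and is_ident(nxt):
            declared.add(nxt)               # ordinary declaration
            pos += 2
        elif recover and is_ident(tok) and tok not in declared and is_ident(nxt):
            # Heuristic: two unknown identifiers in a row are probably a
            # misspelled type followed by a variable name. Report one error
            # and declare the variable anyway.
            errors.append(f"unknown type name '{tok}'")
            declared.add(nxt)
            pos += 2
        elif is_ident(tok):
            if tok not in declared:
                errors.append(f"'{tok}' undeclared")
            pos += 1
            if peek() == "=":
                pos += 2                    # skip '=' and the value
        else:
            errors.append(f"unexpected token '{tok}'")
            pos += 1
        if peek() == ";":
            pos += 1
        else:
            errors.append("expected ';'")
    return errors, declared

source = "intt x; x = 1; x = 2;"
print(parse(source, recover=True)[0])
print(parse(source, recover=False)[0])
```

With recovery enabled, the single typo produces a single diagnostic; with it disabled, every later use of `x` is reported as well.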
Posted Aug 29, 2021 13:49 UTC (Sun)
by moorray (guest, #54145)
[Link] (1 responses)
Is this timeline right? I remember wondering if Patrick got unhinged *because* Pablo’s implementation got picked over his. The quote makes it sound like Pablo came after. It was before my time tho.
Posted Aug 30, 2021 19:59 UTC (Mon)
by armijn (subscriber, #3653)
[Link]
Posted Aug 30, 2021 6:46 UTC (Mon)
by carORcdr (guest, #141301)
[Link]
Glad to see a substantive article on nftables and iptables. Do you have any numbers on the use in production firewalls?
In Rusty's words:
When your Linux box is the only thing between the chaos of the Internet and your nice, orderly network, it's nice to know you can restrict what comes tromping in your door.
Rusty Russell, Linux IPCHAINS-HOWTO, v1.0.8 (2000-07-04)
Posted Aug 30, 2021 9:13 UTC (Mon)
by taladar (subscriber, #68407)
[Link] (6 responses)
When I call nft --help I get
> Usage: nft [ options ] [ cmds... ]
but not a single command is listed in the help output, nor another command/option that would display that information.
When I try nft help I get
> Operation not permitted (you must be root)
which seems like a weird mix of errors and also "unexpected newline" is an odd error to emit for commandline parameters, not to mention that it is far too low level in general.
There is also no obvious option in the --help output to list the currently active ruleset.
On top of that, since firewalls are quite complex, we are unlikely to maintain both an iptables and an nftables version of our rulesets in our Puppet configuration management. A working, usable, and fully featured version will have to be part of the oldest distros we use before it is even something to consider, so I would imagine nothing will happen before about 2030, since the current version doesn't really look usable yet.
Posted Aug 30, 2021 14:35 UTC (Mon)
by nybble41 (subscriber, #55106)
[Link] (5 responses)
Posted Sep 1, 2021 18:58 UTC (Wed)
by Chousuke (subscriber, #54562)
[Link] (4 responses)
For example, if you wanted to know how to perform a 1:1 nat for an entire IP prefix, the manual page would not help because it doesn't even mention that you can use bitwise operators (&, |) with netmasks to perform calculations and modifications on packet fields.
I know there's a partial sentence somewhere on the wiki page that indirectly hints at this being possible because I found it some time ago when I had to do prefix translation, but I can't find it anymore.
nftables is capable, but its documentation makes me sad. It's unbelievably bad.
Posted Sep 1, 2021 19:27 UTC (Wed)
by Chousuke (subscriber, #54562)
[Link] (1 responses)
I tried finding the relevant documentation from the wiki page but I can't; I've forgotten where I found it the last time. The manual page says "Expressions can be combined using binary, logical, relational and other types of expressions", but *nowhere* does it detail what those "binary", "logical" or "relational" expressions are. It doesn't even contain the word "operator".
I did find out that man libnftables-json at least lists "binary operations", but there's no context.
Just in case someone ends up needing it, you can do stuff like this:
ip daddr 10.240.1.0/24 dnat to ip daddr & 0.0.0.255 | 10.140.7.0;
I don't even remember how I figured that out the first time, but it wasn't thanks to the documentation.
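For anyone checking the arithmetic in that rule, the same mask-and-or can be reproduced with Python's standard `ipaddress` module. This is only a model of the address calculation, assuming the expression is evaluated as `(daddr & 0.0.0.255) | 10.140.7.0`; the `translate()` helper and the sample addresses are made up, and nothing here talks to nftables.

```python
import ipaddress

def translate(daddr, host_mask="0.0.0.255", new_prefix="10.140.7.0"):
    """Keep the host bits of daddr and graft them onto new_prefix."""
    addr = int(ipaddress.IPv4Address(daddr))
    mask = int(ipaddress.IPv4Address(host_mask))
    prefix = int(ipaddress.IPv4Address(new_prefix))
    # (daddr & host_mask) isolates the last octet; OR-ing adds the new prefix.
    return str(ipaddress.IPv4Address((addr & mask) | prefix))

# Sample addresses inside 10.240.1.0/24, the range matched by the rule.
for sample in ("10.240.1.1", "10.240.1.37", "10.240.1.254"):
    print(sample, "->", translate(sample))
```

Each address in 10.240.1.0/24 maps to the address with the same final octet in 10.140.7.0/24, which is what makes the translation 1:1.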
Posted Sep 9, 2021 4:48 UTC (Thu)
by chaispaquichui (guest, #77035)
[Link]
Posted Sep 2, 2021 5:19 UTC (Thu)
by carORcdr (guest, #141301)
[Link] (1 responses)
There are many non-iproute2 programs, including significant ones, that have far fewer examples. Some have null.
My definition of an example in the context of a program is a command string--
$|# program argument[s] file|filepath
I realize some may limit the definition of string to alphabetic characters. I do not. My definition of string is a string of characters--alphabetic, numeric and/or symbolic.
Posted Sep 3, 2021 19:10 UTC (Fri)
by Chousuke (subscriber, #54562)
[Link]
Lately I've felt a bit spoiled by OpenBSD manual pages. If you want to know what good documentation with man pages can look like, you can take a look at some of them. If everything were documented to the same standard I would never need Google...
For example, if I want a quick overview on how OSPF works, I can just "man ospfd" on OpenBSD. The explanation may not strictly speaking have much to do with configuring ospfd itself, but well-placed context "fluff" is a huge quality-of-life improvement as it helps me understand the kinds of problems I can solve with the software.
Posted Aug 30, 2021 11:22 UTC (Mon)
by evgeny (guest, #774)
[Link]
Posted Sep 1, 2021 18:17 UTC (Wed)
by flussence (guest, #85566)
[Link] (1 responses)
I'd write out a laundry list of the snags I hit regularly but it turns out one already exists (https://bugzilla.netfilter.org/show_bug.cgi?id=1461). I've got a fail2ban setup that barely works; sometimes after a few hours of operation it refuses to add an address to a set that doesn't contain it (this smells like unhandled hash collision... bug 1392?) — much worse is that sometimes adding a /32 is randomly and silently corrupted into a range covering half the ipv4 internet (bug 1438 - note that it's happened to me even though I'm not setting auto-merge). I've found that appending a literal "/32" to the input prevents the latter, but I don't understand why.
In spite of that I'll continue to use it because it's easier to reason about rules that look like C instead of COBOL. The fundamental design at least seems sound and none of my gripes are unsolvable, I just wish I didn't have to handhold it so much.
Posted Sep 9, 2021 4:13 UTC (Thu)
by splitice (guest, #154172)
[Link]
Fingers crossed bpfilter will hit the mark better.
Posted Sep 2, 2021 14:01 UTC (Thu)
by ecree (guest, #95790)
[Link]
I heard nothing back, leading me to suspect that maybe the problem is that no-one *else* can remember all the corners of iptables either. 'The implementation is the spec' is fine until you want to replace the implementation.
Packet filtering.
https://lwn.net/Articles/747551/
> One of the core design features for bpfilter is the ability to translate existing iptables rules into BPF programs.
- The syntax is nice once you get used to it and I think most of it is more easily readable due to the structure
- The documentation was incomplete, especially for NAT
- If the documentation says that two ways to specify a rule are equivalent you should verify that instead of blindly rewriting working rules
- Concatenations are cool, but rarely work
- Order within a single rule matters sometimes
- Combining the same rule from "table ip nat" and "table ip6 nat" into "table inet nat" only works in some cases
- If your kernel and the nftables userspace are not the same age you will run into problems, so either upgrade both or none, this may be different now that 1.0 is released
- Kernel 5.10 is roughly where most of the interesting functions start working if your userspace is new enough and nftables 0.9.6 (Buster Backports) is similarly a point where things start working better
- On Debian Buster without backports the whole thing is really painful, it's manageable with backports
- Priorities as keywords (introduced in 0.9.6) instead of priorities as numbers helps a lot with readability compared to older versions
- Error messages exist, but in nftables 0.9.6 (from 2020) they were as helpful as gcc error messages ("error: expected ‘asm’ or ‘__attribute__’" instead of "missing semicolon") from the era before llvm, they are a bit better now
> [...]
> Error: syntax error, unexpected newline, expecting string
> help
> ^