The return of nftables
A firewall works by testing a packet against a chain of one or more rules. Any of those rules may decide that the packet is to be accepted or rejected, or it may defer judgment to subsequent rules. Rules may include tests like "which TCP port is this packet destined for?", "is the source IP address on a trusted network?", or "is this packet associated with a known, open connection?" Since the tests applied to packets are expressed in networking terms (ports, IP addresses, etc.), the code that implements the firewall subsystem ("netfilter") has traditionally contained a great deal of protocol awareness. In fact, this awareness is built so deeply into the code that it has had to be replicated four times — for IPv4, IPv6, ARP, and Ethernet bridging — because the firewall engines are too protocol-specific to be used in a generic manner.
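The chain-of-rules model described above can be sketched in a few lines of Python (a conceptual illustration only; the rule and field names here are invented, and real netfilter rules are far richer):

```python
# Conceptual sketch of a firewall rule chain: each rule examines a
# packet and either renders a verdict or defers to the next rule.
ACCEPT, DROP, CONTINUE = "accept", "drop", "continue"

def dst_port_is(port):
    """Accept packets destined for the given TCP port."""
    return lambda pkt: ACCEPT if pkt["dport"] == port else CONTINUE

def src_in(prefix):
    """Accept packets whose source address starts with a trusted prefix."""
    return lambda pkt: ACCEPT if pkt["saddr"].startswith(prefix) else CONTINUE

def evaluate(chain, pkt, policy=DROP):
    # Walk the chain; the first rule that does not defer decides the
    # packet's fate, and the chain's default policy applies otherwise.
    for rule in chain:
        verdict = rule(pkt)
        if verdict != CONTINUE:
            return verdict
    return policy

chain = [dst_port_is(22), src_in("10.0.")]
print(evaluate(chain, {"dport": 22, "saddr": "203.0.113.9"}))  # accept
print(evaluate(chain, {"dport": 80, "saddr": "203.0.113.9"}))  # drop
```

The "defer judgment" behavior is what makes the default policy matter: a packet that falls off the end of the chain without a verdict gets the policy verdict.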
That duplication of code is one of a number of shortcomings in netfilter that have long driven a desire for a replacement. In 2009, it appeared that such a replacement was in the works when Patrick McHardy announced his nftables project. Nftables replaces the multiple netfilter implementations with a single packet filtering engine built on an in-kernel virtual machine, unifying firewalling at the expense of putting (another) bytecode interpreter into the kernel. At the time, the reaction to the idea was mostly positive, but work stalled on nftables just the same. Patrick committed some changes in July 2010; after that, he made no more commits for more than two years.
Frustrations with the current firewalling code did not just go away, though. Over time, it also became clear that a general-purpose in-kernel packet classification engine could find uses beyond firewalls; packet scheduling is another fairly obvious possibility. So, in October 2012, current netfilter maintainer Pablo Neira Ayuso announced that he was resurrecting Patrick's nftables patches with an eye toward relatively quick merging into the mainline. Since then, development of the code has accelerated, with nftables discussion now generating much of the traffic on the netfilter mailing list.
Nftables as it exists today is still built on the core principles designed by Patrick. It adds a simple virtual machine to the kernel that is able to execute bytecode to inspect a network packet and make decisions on how that packet should be handled. The operations implemented by this machine are intentionally basic: it can get data from the packet itself, look at the associated metadata (which interface the packet arrived at, for example), and manage connection tracking data. Arithmetic, bitwise, and comparison operators can be used to make decisions based on that data. The virtual machine is capable of manipulating sets of data (typically IP addresses), allowing multiple comparison operations to be replaced with a single set lookup. There is also a "map" type that can be used to store packet decisions directly under a key of interest — again, usually an IP address. So, for example, a whitelist map could hold a set of known IP addresses, associating an "accept" verdict with each.
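The set and map ideas can be illustrated with an ordinary Python dictionary (a sketch of the concept only; the real virtual machine executes bytecode, and the names below are invented):

```python
# A "map" associates a lookup key (here, a source IP address) directly
# with a verdict, so one hash lookup replaces a long run of comparisons.
whitelist = {
    "192.0.2.10": "accept",
    "192.0.2.11": "accept",
}

def verdict(src_ip, default="drop"):
    # Addresses absent from the map fall through to the default policy.
    return whitelist.get(src_ip, default)

print(verdict("192.0.2.10"))   # accept
print(verdict("198.51.100.7")) # drop
```

A plain set works the same way minus the stored verdict: membership alone answers "is this address on the list?", with the rule supplying the action.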
Replacing the current, well-tuned firewalling code with a dumb virtual machine may seem like a step backward. As it happens, there are signs that the virtual machine may be faster than the code it replaces, but there are a number of other advantages independent of performance. At the top of the list is removing all of the protocol awareness from the decision engine, allowing a single implementation to serve everywhere a packet inspection engine is required. The protocol awareness and associated intelligence can, instead, be pushed out to user space.
Nftables also offers an improved user-space API that allows the atomic replacement of one or more rules with a single netlink transaction. That will speed up firewall changes for sites with large rulesets; it can also help to avoid race conditions while the rule change is being executed.
The code worked reasonably well in 2009, though there were a lot of loose ends to tie down. At the top of Pablo's list of needed improvements to nftables when he picked up the project was a bulletproof compatibility layer for existing netfilter-based firewalls. A new rule compiler will take existing firewall rules and compile them for the nftables virtual machine, allowing current firewall setups to migrate with no changes needed. This compatibility code should allow nftables to replace the current netfilter tables relatively quickly. Even so, chances are that both mechanisms will have to coexist in the kernel for years. One of the other design goals behind nftables — use of the existing netfilter hook points, connection-tracking infrastructure, and more — will make that coexistence relatively easy.
Since the work on nftables restarted, the repository has seen over 70 commits from a half-dozen developers; there has also been a lot of work going into the user-space nft tool and libnftables library. The kernel changes have added missing features (the ability to restore saved counter values, for example), compatibility hooks allowing existing netfilter extensions to be used until their nftables replacements are ready, many improvements to the rule update mechanism, IPv6 NAT support, packet tracing support, ARP filtering support, and more. The project appears to have picked up some momentum; it seems unlikely to fall into another multi-year period without activity before being merged.
As to when that merge will happen... it is still too early to say. The developers are closing in on their set of desired features, but the code has not yet been exposed to wide review beyond the netfilter list. All that can be said with certainty is that it appears to be getting closer and to have the development resources needed to finish the job.
See the nftables web page for more information. A terse but useful HOWTO document has been posted by Eric Leblond; it is probably required reading for anybody wanting to play with this code, but a quick, casual read will also answer a number of questions about what firewalling will look like in the nftables era.
| Index entries for this article | |
|---|---|
| Kernel | Networking/Packet filtering |
| Kernel | Nftables |
Posted Aug 21, 2013 5:48 UTC (Wed)
by kugel (subscriber, #70540)
[Link] (5 responses)
Posted Aug 21, 2013 15:55 UTC (Wed)
by johill (subscriber, #25196)
[Link] (4 responses)
Posted Aug 21, 2013 22:01 UTC (Wed)
by ncm (guest, #165)
[Link] (3 responses)
Posted Aug 21, 2013 23:50 UTC (Wed)
by wahern (subscriber, #37304)
[Link] (2 responses)
So, it's kind of a moot point. It would be one thing if the project had stalled before surpassing BPF in functionality. Then we could all jeer "I told you so". But this doesn't seem to be one of those occasions. nftables seemed to stall simply because too many people are comfortable with iptables, and are heavily invested in its arcane command-line syntax. And those who aren't can shift to using PF on OpenBSD or FreeBSD. Plus NetBSD has NPF now, which is pretty cool.
Posted Aug 22, 2013 16:57 UTC (Thu)
by intgr (subscriber, #39733)
[Link] (1 responses)
One of the advantages of BPF is that Linux already has a working BPF JIT compiler for many architectures (x86, ARM, SPARC, POWER and S/390). This is a non-trivial amount of code.
Posted Aug 22, 2013 18:25 UTC (Thu)
by raven667 (subscriber, #5198)
[Link]
Posted Aug 21, 2013 9:08 UTC (Wed)
by rvfh (guest, #31018)
[Link] (1 responses)
Posted Aug 22, 2013 8:07 UTC (Thu)
by tobur (guest, #89244)
[Link]
The idea, of course, is not to mess too much with users' habits; that makes nftables totally transparent for those who want to keep using iptables. And I bet most will, for a while.
Posted Aug 21, 2013 9:48 UTC (Wed)
by patrick_g (subscriber, #44470)
[Link] (1 responses)
Posted Aug 21, 2013 20:26 UTC (Wed)
by jengelh (guest, #33263)
[Link]
At first nftables chugged along (it was presented at the workshop as early as 2008, corbet!) until I pushed for early merging of a next-gen packet filter (which happened to be xt2 code, but that is not the point), in the style of btrfs. But the powers that be wanted rather complete implementations instead. Doable, tho...
With NFWS 2013, everybody (especially new contributors) sided with nftables by way of submitting many patches there. xt2 got no support and has been rendered uncompetitive too, with nft taking up and reimplementing ideas I had for xt2. That situation is very discouraging.
I temporarily work on more rewarding projects.
Posted Aug 21, 2013 14:03 UTC (Wed)
by dambacher (subscriber, #1710)
[Link] (12 responses)
Which ones am I missing?
Posted Aug 21, 2013 15:51 UTC (Wed)
by johill (subscriber, #25196)
[Link] (11 responses)
There's also an ASN.1 decoder, though I'm not really sure what that is.
Posted Aug 21, 2013 18:35 UTC (Wed)
by aliguori (subscriber, #30636)
[Link] (10 responses)
Many protocols, such as CIFS and X.509, use ASN.1.
Posted Aug 22, 2013 0:21 UTC (Thu)
by wahern (subscriber, #37304)
[Link] (9 responses)
http://www.itu.int/ITU-T/asn1/uses/
It's a darned shame that ASN.1 support isn't more widespread in the FOSS world. Even Perl modules are crappy. OpenSSL has a fairly complete library, but it's a gigantic PITA to use, like most FOSS ASN.1 tools. Really, ASN.1 was meant to be automatically compiled from the message description to code, with your application code manipulating a real data structure. This is the best open source ASN.1 project I'm aware of:
http://lionet.info/asn1c/compiler.html
It supports streaming parsing and composition without being tied to any I/O model. The only downside is that strings and arrays are always dynamically allocated, which makes constructing and destroying messages fairly verbose, especially if you care about malloc failure. Some proprietary ASN.1 compilers support fixed length arrays which make life a little easier when you're dealing with several simple string fields or lists with a small, finite limit on their length. That makes it easier to use message caches with simpler initialization.
Posted Aug 22, 2013 3:51 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (7 responses)
BTW, if something was initially designed by telecom guys, then that is a great reason to avoid it like the plague.
Posted Aug 22, 2013 8:52 UTC (Thu)
by gdt (subscriber, #6284)
[Link] (5 responses)
Your "telecom guys" comment serves no purpose. It references an issue of twenty years ago. These days at IETF meetings you're just as likely to see a "telecom guy" arguing against some hacked-together draft and asking that time be taken to do it better, with the IP equipment vendors opposing that for reasons of "time to market".
Posted Aug 22, 2013 9:17 UTC (Thu)
by dlang (guest, #313)
[Link] (3 responses)
This is like the claims that transparent disk compression was made obsolete by faster disks (including, but not limited to, SSDs). In some cases this is true and the system does have better things to spend its processor time on, but in other cases the system has more processing power than it needs while waiting for the I/O, so spending even a substantial amount of processing power to cut down on the amount of I/O needed can be a substantial win.
Posted Aug 22, 2013 10:25 UTC (Thu)
by khim (subscriber, #9252)
[Link] (2 responses)
Single-core CPU power has basically flatlined: nine years ago the top-of-the-line CPU was a Pentium 4 570J @ 3.8GHz, while today it's a Core i7 4770K @ 3.9GHz. Micro-architectural improvements mean that today's Core i7 is faster than an identically-clocked Pentium 4 from last decade, but the difference is not striking. Meanwhile Ethernet went from 10GbE to 100GbE, USB went from 480Mbit/s to 10Gbit/s, and even PCI Express went from 250MB/s to 985MB/s! Sure, if you include the number of cores in your analysis you'll find that CPUs cope more or less fine - but then latency is often the limiting factor in communication protocols, and SMP is not a big help there.
Posted Aug 22, 2013 12:57 UTC (Thu)
by intgr (subscriber, #39733)
[Link]
Depends on what you mean by "striking". I find that current server CPUs are 3-5 times faster at single-threaded workloads than 8-year-old single-core ones. Every generation of Intel processors still has performance gains of 10% or so while consuming less power.
I see your point about the overhead of complex protocol encodings. But if we go back to the original topic of firewalling: increasing network speeds would not be such a big problem if we weren't still stuck with packet sizes that were designed for 10 Mbps networks.
Posted Aug 22, 2013 22:05 UTC (Thu)
by dlang (guest, #313)
[Link]
It's also _extremely_ unlikely that you are going to need to limit your computation to a single core. Using multiple cores is trivial if you have multiple communication streams to process (put each stream on its own core, or processing of data from each interface on its own core, etc). But even if you have one stream to process, you can almost always find a way to split the workload across multiple cores (one core works on what you are sending now, the second works on what you will be sending in a few hundred ms, etc)
so the move to multiple cores does end up helping the processing of things like this.
Posted Aug 22, 2013 12:40 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
As for compactness, GZIP+XML is about as compact as ASN.1 BER for most use-cases. Google's protobufs are also pretty compact.
Posted Aug 29, 2013 14:24 UTC (Thu)
by moltonel (subscriber, #45207)
[Link]
Concerning CPU usage, while hard to compare (the codec gives you many data verifications for free, which you have to write yourself with the likes of protobuf and msgpack), it is nothing to be ashamed of. And while some people have 10Gbits to play with, others are more concerned with the per-MB data roaming fee that is drilling a hole through their wallet.
Posted Aug 22, 2013 14:43 UTC (Thu)
by dw (guest, #12017)
[Link]
Also, to my knowledge, nobody has yet shipped a Protocol Buffers implementation riddled with security holes due to specification complexity, although that might just be because nobody has thought to look there yet...
Posted Aug 21, 2013 17:26 UTC (Wed)
by luto (guest, #39314)
[Link] (1 responses)
Posted Aug 22, 2013 2:41 UTC (Thu)
by ras (subscriber, #33059)
[Link]
To give Patrick his due, he has implemented good academic proposals in the past. (The HFSC qdisc springs to mind).
I hope he has seen this one.
Posted Aug 31, 2013 20:15 UTC (Sat)
by compte (guest, #60316)
[Link] (3 responses)
Posted Aug 31, 2013 21:05 UTC (Sat)
by nybble41 (subscriber, #55106)
[Link] (2 responses)
That sounds like a prime use case for the new(-ish) IP set rules. You wouldn't want to do 150,000 separate tests on every packet anyway. IP set matches are much more efficient. Note that once you've created the set, you can use the "save" and "restore" ipset commands to avoid running 150,000 "add" commands every time you set up your firewall rules. You can also add a range of IPs with a single command, e.g. "ipset add peerguard 1.2.3.0/24". Requires Linux 2.6.39 or later with CONFIG_IP_SET_HASH_IP enabled.
Posted Aug 31, 2013 23:41 UTC (Sat)
by compte (guest, #60316)
[Link] (1 responses)
Posted Sep 1, 2013 1:51 UTC (Sun)
by nybble41 (subscriber, #55106)
[Link]
Not quite. This doesn't depend on any code other than ipset and iptables. The list of IP addresses/ranges to block is in the file "ip-list". In the command
> xargs -i ipset add peerguard {} < ip-list
the "{}" is an argument to "xargs -i" which serves as a placeholder. The xargs tool (with the "-i" option) runs the given command once for each line in the standard input (here redirected from the file ip-list), replacing any occurrences of "{}" with the data from the input. This is equivalent to a series of commands like:
> ipset add peerguard 12.23.34.45
The ipset tool adds each IP address to the IP set named "peerguard", which was created as a hash-based set of IP addresses by the previous "ipset create" command. The set is then referenced with the "-m set --match-set peerguard src" option to iptables, which searches it for the source IP address of each packet.
You'll probably want to change the "-j DROP" in my example to "-j REJECT", to match your previous rules. I wasn't sure which approach Peerguardian took. Also, if you have a large number of address ranges, like the /24 in your example, you may want to use "hash:net" rather than "hash:ip" when creating the IP set so that the ranges are stored more efficiently in the kernel. You can pass ranges to "ipset add" either way, but in the "hash:ip" case they're expanded to individual addresses in the table, whereas "hash:net" keeps separate tables for each prefix length and stores only the network address.
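The difference between the two set types can be illustrated with Python's ipaddress module (a sketch of the storage trade-off only, not of ipset's actual kernel data structures):

```python
import ipaddress

# A /24 contains 256 addresses. Conceptually, a hash:ip set stores
# each one as an individual entry...
net = ipaddress.ip_network("1.2.3.0/24")
expanded = list(net)
print(len(expanded))  # 256

# ...while a hash:net set stores just the prefix itself: one entry.
prefixes = {net}
print(len(prefixes))  # 1
```

For a list with many broad ranges, that factor adds up quickly, which is why hash:net is the better fit for range-heavy blocklists.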
See also:
Check this out: https://git.netfilter.org/iptables-nftables/
What about Xtables2 ?
another bytecode interpreter ?
Then there was (is?) a Java bytecode module somewhere
I wonder how this compares to the demultiplexing packet filter.
I tried to make rules with iptables but it could not read 150 000 (but managed 100 000), while Peerguardian read more than that.
ipset create peerguard hash:ip family inet maxelem 262144
xargs -i ipset add peerguard {} < ip-list
iptables -A INPUT -m set --match-set peerguard src -j DROP
ipset create peerguard6 hash:ip family inet6 maxelem 262144
xargs -i ipset add peerguard6 {} < ip6-list
ip6tables -A INPUT -m set --match-set peerguard6 src -j DROP
-A INPUT -s 1.2.3.4/24 -j REJECT
lines. So the trick is in "peerguard {} < ip-list"
Is peerguard{} an existing Peerguardian function pointing to a p2p file?
> Is peerguard{} an existing Peerguardian function pointing to a p2p file?
> ipset add peerguard 21.32.43.54
* http://ipset.netfilter.org/
* http://ipset.netfilter.org/ipset.man.html
* http://ipset.netfilter.org/iptables-extensions.man.html
