Posted Apr 14, 2011 6:57 UTC (Thu) by jengelh (subscriber, #33263)
Parent article: A JIT for packet filters
>An obvious question is: can this same approach be applied to iptables, which is more heavily used than BPF?
Since xtables modules are already handcrafted for a specific task, any interpreter module for arbitrary expressions (such as xt_u32 and nft) has a tendency to naturally run slower. But, if BPF can be JITed, it would seem it not being impossible to extend xt_u32.
Posted Apr 15, 2011 14:01 UTC (Fri) by Nelson (subscriber, #21712)
[Link]
Someone beat me to the punch.. I have been toying with code that does this for a few months. Congrats, I hope the community accepts the patch.
I did some research on this for work about 2 years ago, with something like BPF there is a dramatic gain due to the nature of the interpreter. You can cut a lot of cruft out with JIT. As for generic firewall rules? It's not as dramatic but you can get a pretty general across the board improvement and on some architectures it's definitely in the interesting range (maybe 30%, maybe more, I guess it depends how far you go) If you simply recode firewall rules as binary, think about it this way: the firewall has a linked list of instructions, all the linked list code goes away (it's not much, but some memory reads, a few instructions here and there) and then as you execute the instructions the JIT ones basically turn memory reads into literals so you've can dump the loads and some other machinery. It's not warp speed but a nice bump, maybe in the 30%ish generally, it depends on the architecture and there are a lot of variables. Like I said, it's interesting and noticeable improvement.
Now where it can be interesting is if you coded up a more advanced compiler to optimize the rules. (tcpdump does this, it's ugly but check it out, look at the optimized output some time) a typical stream of rules might have 10 rules that all apply to TCP packets and then check various IP ranges and ports. xtables currently would execute each "instruction" until it reached a result (is packet TCP? does packet src IP match range Y from rule.. okay go to the next one, is the packet TCP?... it would check TCP 10 times) A good compiler can invert that logic and figure out better ways to do it, (if packet TCP? yes then see if it's in the ranges of IPs from these 10 rules and maybe those rules can be compressed in to just checking a couple bits because all the IPs are similar... no it's not TCP, then skip all ten rules and look at the next batch.) I wouldn't hazard a guess as to how much faster this makes the firewall but the potential is HUGE. So we could maybe replace iptables with some sort of LLVM based compiler that generated a bytecode "program" that contained the whole firewall.
Whether it's worth the complexity, the difficulty in debugging and porting is a different question.
A JIT for packet filters
Posted Apr 16, 2011 2:55 UTC (Sat) by wahern (subscriber, #37304)
[Link]
The problem with the benchmark they did was that it's not a fair appraisal of interpretation versus compiling. The BPF switch interpreter isn't threaded. That is, at the end of every instruction it jumps back to the while loop, which does a conditional branch. Then there's the switch, which may or not may not do one or more conditional branches.
For fair comparison with a JIT compiler, the interpreter would instead jump directly from one instruction to the next using jump tables--indexing into a table of labels constructed using GCC's label address-of operator, &&.
On my own VM I can dramatically improve performance on many programs merely by threading the interpreter. If doing this gives the same performance, which it very well could given that BPF might be data bound and the ops are so simple, then it would be far preferable rather than adding hundreds of lines of new code for each architecture (or, conversely, having some architectures needlessly disadvantaged).
A JIT for packet filters
Posted Apr 18, 2011 18:33 UTC (Mon) by Nelson (subscriber, #21712)
[Link]
That's a fair criticism, you can make the BPF VM more efficient, it's still a comparison of whats there to a JIT though. Even with those improvements, you can get a fairly consistent boost with a JIT, just from turning the loads in to literals. It might not be worth the complexity but if there was a more generic JIT framework such that the platform support was there it is an interesting optimization if you rely upon BPF stuff a lot.