The problem with their benchmark is that it isn't a fair appraisal of interpretation versus compilation. The BPF switch interpreter isn't threaded. That is, at the end of every instruction it jumps back to the top of the while loop, which does a conditional branch. Then there's the switch, which may or may not do one or more conditional branches of its own.
For a fair comparison with a JIT compiler, the interpreter should instead jump directly from one instruction to the next using a jump table: index into a table of label addresses built with GCC's label address-of operator, &&.
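To make the difference concrete, here is a minimal sketch of both dispatch styles. It assumes a hypothetical four-opcode accumulator machine (OP_LDI, OP_ADD, OP_JEQ, OP_RET and the struct insn layout are made up for illustration, not the real BPF instruction set), and it needs GCC or Clang for the && extension. The threaded version also assumes the opcodes were validated before the program runs, since it indexes the table without a bounds check.

```c
#include <stdint.h>

/* Hypothetical toy instruction set, NOT the real BPF opcodes. */
enum { OP_LDI, OP_ADD, OP_JEQ, OP_RET };

struct insn {
    uint8_t  op;    /* opcode */
    uint8_t  skip;  /* OP_JEQ: instructions to skip when acc == k */
    uint32_t k;     /* immediate operand */
};

/* Switch dispatch: every instruction goes back through the loop and the switch. */
uint32_t run_switch(const struct insn *pc)
{
    uint32_t acc = 0;

    for (;;) {
        const struct insn *i = pc++;

        switch (i->op) {
        case OP_LDI: acc = i->k; break;
        case OP_ADD: acc += i->k; break;
        case OP_JEQ: if (acc == i->k) pc += i->skip; break;
        case OP_RET: return acc;
        default:     return 0;   /* unknown opcode */
        }
    }
}

/* Threaded dispatch: each handler jumps straight to the next handler. */
uint32_t run_threaded(const struct insn *pc)
{
    /* Table of label addresses, indexed by opcode (GCC "labels as values"). */
    static void *const dispatch[] = { &&do_ldi, &&do_add, &&do_jeq, &&do_ret };
    const struct insn *i;
    uint32_t acc = 0;

    /* Assumes every opcode was validated up front: no bounds check here. */
#define NEXT() do { i = pc++; goto *dispatch[i->op]; } while (0)

    NEXT();
do_ldi: acc = i->k;                        NEXT();
do_add: acc += i->k;                       NEXT();
do_jeq: if (acc == i->k) pc += i->skip;    NEXT();
do_ret: return acc;
#undef NEXT
}

/* Sample program: load 40, add 2, skip the clobbering load, return acc. */
const struct insn demo[] = {
    { OP_LDI, 0, 40 },
    { OP_ADD, 0, 2  },
    { OP_JEQ, 1, 42 },
    { OP_LDI, 0, 0  },   /* skipped because acc == 42 */
    { OP_RET, 0, 0  },
};
```

Both run_switch(demo) and run_threaded(demo) return 42. The only difference is how control gets from one instruction to the next: the threaded version ends each handler with its own indirect jump, so there is no shared loop branch and no switch range check on the hot path.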
On my own VM I can dramatically improve performance on many programs merely by threading the interpreter. If doing this gives the same performance as the JIT, which it very well could given that BPF might be data bound and the ops are so simple, then it would be far preferable to adding hundreds of lines of new code for each architecture (or, conversely, leaving some architectures needlessly disadvantaged).
Posted Apr 18, 2011 18:33 UTC (Mon) by Nelson (subscriber, #21712)
That's a fair criticism: you can make the BPF VM more efficient. It's still a comparison of what's there to a JIT, though. Even with those improvements, you can get a fairly consistent boost from a JIT, just from turning the loads into literals. It might not be worth the complexity, but if there were a more generic JIT framework, so that the platform support was already there, it would be an interesting optimization if you rely on BPF a lot.