Still no error bars on their charts. Yes, error bars would probably show that most of the comparisons found no real difference. That's a good thing!
I'd like to see more investigation. I think that would follow from narrowing the results down to only those that were significant. If you find 500 tiny differences between two things, most of which are just measurement noise, you have no reason to investigate further. But if you make one big, significant finding, you can do a whole article about what it means: why is the Frooqux significantly faster? Is it the same on an AMD machine? In OpenSolaris? With a different network card?
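
To make the "noise vs. real finding" point concrete, here is a rough sketch of how repeated runs could be turned into error bars and a crude significance check. Everything in it is an assumption for illustration: the timings are made up, the function names are hypothetical, and the 1.96 multiplier is the normal approximation for a 95% interval (with only a handful of runs a t-distribution would give somewhat wider bars).

```python
import math
import statistics

def mean_with_error(samples, z=1.96):
    """Return (mean, half_width) of an approximate 95% error bar."""
    mean = statistics.mean(samples)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * sem

def is_significant(a, b, z=1.96):
    """True if the means differ by more than z combined standard errors."""
    se_a = statistics.stdev(a) / math.sqrt(len(a))
    se_b = statistics.stdev(b) / math.sqrt(len(b))
    diff = abs(statistics.mean(a) - statistics.mean(b))
    return diff > z * math.sqrt(se_a ** 2 + se_b ** 2)

# Made-up timings in seconds, five runs each (illustrative only).
frooqux = [10.1, 10.3, 9.9, 10.2, 10.0]
other = [12.0, 12.4, 11.8, 12.1, 12.2]
noisy_a = [10.0, 12.5, 9.0, 11.5, 10.5]
noisy_b = [10.4, 9.5, 12.0, 11.0, 10.8]

for name, runs in [("frooqux", frooqux), ("other", other),
                   ("noisy_a", noisy_a), ("noisy_b", noisy_b)]:
    mean, err = mean_with_error(runs)
    print(f"{name}: {mean:.2f} +/- {err:.2f} s")

print("frooqux vs other:", is_significant(frooqux, other))
print("noisy_a vs noisy_b:", is_significant(noisy_a, noisy_b))
```

With these invented numbers the first pair differs by far more than the bars and the second pair does not, which is the kind of filter that would leave only the results worth a follow-up article.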