Posted Jul 23, 2009 14:51 UTC (Thu) by nix (subscriber, #2304)
Phoronix's benchmarks appear to consist of 'do random stuff, some of which is real-world, some of which isn't, and take an average to see which is best', apparently in the hope that if they pick enough benchmarks the good ones will exceed the crap ones in number.
i.e. they have iozone benchmarks in there... but then they have compression, and I've seen them look at things like 'how long it takes to boot' and even game frame rates (!?) in filesystem benchmarks before.
So I treat Phoronix benchmarks largely as a source of amusement these days. Sometimes (rarely) they might tell us things we don't already know...
A short history of btrfs
Posted Jul 23, 2009 15:26 UTC (Thu) by dlang (✭ supporter ✭, #313)
they just announced that they are about to start using a new version of their benchmarks, so this is the perfect time to jump in and try to improve things.
Posted Jul 23, 2009 22:11 UTC (Thu) by tialaramex (subscriber, #21167)
Still no error bars on their charts. Yes, the error bars would probably mean most results found nothing. That's a good thing!
I'd like to see more investigation. I think that would follow from narrowing results down to only those that were significant. If you find 500 tiny differences between two things, most of which are just measurement noise, you have no reason to investigate further. But if you make one big significant finding you can do a whole article about what it means: why is the Frooqux significantly faster? Is it the same on an AMD machine? In OpenSolaris? With a different network card?
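A rough illustration of the error-bar point, not anything Phoronix actually runs: given repeated timings for two configurations, compute a mean and an approximate 95% interval for each, and only treat the difference as a finding when the intervals do not overlap. The function names and sample timings here are invented, and the 1.96 multiplier assumes roughly normal noise, which is crude for only a handful of runs.

```python
import math
import statistics

def mean_with_error(samples, z=1.96):
    """Return (mean, half-width of an approximate 95% interval)."""
    mean = statistics.mean(samples)
    # Standard error of the mean; needs at least two samples.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, z * sem

def clearly_different(a, b):
    """True only if the two error bars do not overlap."""
    mean_a, err_a = mean_with_error(a)
    mean_b, err_b = mean_with_error(b)
    return abs(mean_a - mean_b) > err_a + err_b

# Hypothetical timings in seconds, invented for illustration.
fs_a = [10.1, 10.4, 9.8, 10.2, 10.0]
fs_b = [10.0, 10.3, 10.1, 9.9, 10.2]
fs_c = [7.9, 8.1, 8.0, 8.2, 7.8]

print("a vs b:", clearly_different(fs_a, fs_b))
print("a vs c:", clearly_different(fs_a, fs_c))
```

Non-overlapping intervals is a deliberately conservative test; a proper significance test would be the next step for any result that survives it.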
error bars
Posted Jul 23, 2009 22:31 UTC (Thu) by dlang (✭ supporter ✭, #313)
you need to tell them, not us ;-)
A short history of btrfs
Posted Jul 24, 2009 7:34 UTC (Fri) by nix (subscriber, #2304)
Hard: I still don't have email thanks to the implosion of Zetnet, sorry, Breathe, sorry, they went bust and cut all their *other* customers off from their IMAP mailservers, the new company is called Breathe now. Moving to a decent ISP as soon as BT get around to it... but that's a week away, plus another week for the new MX record to propagate around. Three weeks without properly-working email, sigh.
A short history of btrfs
Posted Jul 24, 2009 9:44 UTC (Fri) by dlang (✭ supporter ✭, #313)
they do have a comment section on their website
A short history of btrfs
Posted Jul 23, 2009 16:59 UTC (Thu) by kjp (subscriber, #39639)
Uh, having the cpu loaded will find problems if the FS code itself is too cpu hoggy....
A short history of btrfs
Posted Jul 23, 2009 17:04 UTC (Thu) by jengelh (subscriber, #33263)
Uh, extracting a tarball with lots of files and watching the %sy time is likely to do the same.
A short history of btrfs
Posted Jul 25, 2009 21:20 UTC (Sat) by bronson (subscriber, #4806)
So, you're going to arrange some way of recording average %sy time as a single number (there are lots of different ways of doing this), then somehow graph it in a way that makes sense to regular people?
Just timing a file decompression is a lot easier for all involved, no? It's not a great benchmark, true, but it will quickly and reliably tell you if one FS requires more CPU than another. And that's the most important thing.
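For what it's worth, the two suggestions are not far apart in practice. A minimal sketch, assuming an in-process extraction with Python's `tarfile`, can record wall-clock time and system CPU time in the same run. The helper names and the throwaway sample tarball are made up for illustration. Kernel work that happens later, such as deferred writeback, is not charged to the process, so the system figure understates the filesystem's total CPU cost.

```python
import os
import tarfile
import tempfile
import time

def timed_extract(tar_path, dest_dir):
    """Extract tar_path into dest_dir; return (wall, user, system) seconds."""
    cpu_before = os.times()
    wall_before = time.perf_counter()
    with tarfile.open(tar_path) as archive:
        archive.extractall(dest_dir)
    wall = time.perf_counter() - wall_before
    cpu_after = os.times()
    return (wall,
            cpu_after.user - cpu_before.user,
            cpu_after.system - cpu_before.system)

def make_sample_tarball(path, file_count=200):
    """Build a throwaway tarball of many small files (stand-in for a real one)."""
    with tempfile.TemporaryDirectory() as src:
        for i in range(file_count):
            with open(os.path.join(src, f"file{i}.txt"), "w") as f:
                f.write("x" * 1024)
        with tarfile.open(path, "w") as archive:
            archive.add(src, arcname="sample")

with tempfile.TemporaryDirectory() as work:
    tar_path = os.path.join(work, "sample.tar")
    make_sample_tarball(tar_path)
    dest = os.path.join(work, "out")
    os.mkdir(dest)
    wall, user, system = timed_extract(tar_path, dest)
    extracted = len(os.listdir(os.path.join(dest, "sample")))
    print(f"{extracted} files: wall={wall:.3f}s user={user:.3f}s sys={system:.3f}s")
```

To compare filesystems, `dest` would have to sit on the filesystem under test, and the run would need repeating enough times to separate a real difference from noise.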
The problem is not with the benchmark itself...
Posted Jul 23, 2009 17:50 UTC (Thu) by khim (subscriber, #9252)
"Uh, having the cpu loaded will find problems if the FS code itself is too cpu hoggy...."
The problem is not with the low-level CPU-bound benchmark but with the average taken from many different benchmarks without care or thought. In the end you get the average temperature of hospital patients: some have a high fever, some are already in the morgue, so the average temperature is useless.
If you plan to mix a lot of different benchmarks you must be ready to carefully study the results, separate expected results from unexpected ones, cluster them in groups (by relevance to this or that real-world task), etc. Otherwise it's just a pointless race where the winner is more or less random.
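A small sketch of what clustering by task might look like, with entirely invented numbers: each result is a time ratio between two filesystems, tagged with a hypothetical workload group, and the summary is reported per group next to the single overall figure. The benchmark names, group labels, and the choice of a geometric mean are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Hypothetical results: (benchmark, workload group, time on FS A / time on FS B).
# A ratio below 1.0 means FS A was faster. All numbers are invented.
results = [
    ("iozone-write", "raw I/O", 0.80),
    ("iozone-read", "raw I/O", 0.90),
    ("untar-kernel", "metadata", 1.25),
    ("file-create", "metadata", 1.60),
    ("gzip", "CPU-bound", 1.00),
    ("game-fps", "CPU-bound", 1.01),
]

def geometric_mean(values):
    """Geometric mean, the usual way to average ratios."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def summarize_by_group(rows):
    """Map each workload group to the geometric mean of its ratios."""
    groups = defaultdict(list)
    for _name, group, ratio in rows:
        groups[group].append(ratio)
    return {group: geometric_mean(ratios) for group, ratios in groups.items()}

overall = geometric_mean([ratio for _, _, ratio in results])
summary = summarize_by_group(results)

print(f"overall: {overall:.2f}")
for group, value in summary.items():
    print(f"{group}: {value:.2f}")
```

With these made-up inputs the overall figure lands within about 10% of parity, while the per-group figures point in opposite directions, which is exactly the information the single average throws away.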
And it's also pointless to try to fix the situation by adding more benchmarks to the mix: when you mix a lot of different kinds of food, you get a pile of garbage as a result, and if you add some more dishes, you just get a bigger pile of garbage.