
Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Phoronix has published the results of a long series of kernel benchmarks, generally concluding that 2.6.29 is faster than its predecessors. "When it came to the SQLite performance, a serious performance regression began with the Linux 2.6.26 kernel and ended with the Linux 2.6.29 release. Normally it required 27~28 seconds to perform 12,500 database insertions using SQLite, but with the Linux 2.6.26 through 2.6.28 kernel releases it took 109 seconds! Fortunately, this regression is now fixed." The article offers no explanation of why things might have changed, though.

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 25, 2009 14:18 UTC (Wed) by axboe (subscriber, #904) [Link]

That particular regression was caused by commit 18ce3751ccd488c78d3827e9f6bf54e6322676fb and then later fixed by 78f707bfc723552e8309b7c38a8d0cc51012e813.

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 25, 2009 15:15 UTC (Wed) by hmh (subscriber, #3838) [Link]

I assume the regression fix is going to be sent to -stable as well?

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 25, 2009 15:46 UTC (Wed) by bronson (subscriber, #4806) [Link]

Caused by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-...

Fixed by: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-...

> I assume the regression fix is going to be sent to -stable as well?

Why? It doesn't destabilize anything (right?). Personally, if I were running a -stable system right now, I'd argue vehemently against including optimization-only patches.

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 25, 2009 17:59 UTC (Wed) by iabervon (subscriber, #722) [Link]

The point of -stable is that there should be no reason to use 2.6.24.x instead of 2.6.29.y for sufficiently large y, so that it becomes unnecessary to maintain 2.6.24.x at some point, so that there's only a limited number of series that need to be maintained at any given time. So optimizations shouldn't be made in -stable, but significant performance regressions should be fixed.

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 25, 2009 20:22 UTC (Wed) by proski (subscriber, #104) [Link]

The patch is a one-liner, it reverts an old patch that caused a regression, it's well documented and there is a test case. Considering all that, I think it's a good candidate for backporting.

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 26, 2009 10:22 UTC (Thu) by axboe (subscriber, #904) [Link]

Yes, I have submitted it to -stable, now that 2.6.29 is released.

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 25, 2009 14:59 UTC (Wed) by tialaramex (subscriber, #21167) [Link]

All the benchmarks on Phoronix suffer from not showing significance.

Firstly, statistical significance - we can't tell if there's a 0.01% chance of the apparent difference we're looking at being non-existent, or a 10% chance. It should be easy enough to change the plots so that we can see a 95% confidence interval or something.

Secondly, practical significance - what would be "clinical significance" in a medical setting. That is, even if the difference is real, should I care? A large change in performance of a little-used feature may be unimportant.
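
As a rough illustration of the 95% confidence interval mentioned above, here is a minimal sketch. It assumes each benchmark was run several times and that the per-run timings are available. The function names and the timing values are hypothetical, invented for illustration only; they are not Phoronix's data. The interval uses a normal approximation, which is too narrow for a handful of runs (a Student's t interval would be wider).

```python
import math
import statistics


def confidence_interval(samples, level=0.95):
    """Return (mean, low, high) for the mean of `samples`.

    Normal approximation: reasonable for many runs, too narrow for a few.
    """
    if len(samples) < 2:
        raise ValueError("need at least two runs")
    mean = statistics.fmean(samples)
    # Standard error of the mean, from the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(len(samples))
    # Two-sided critical value, about 1.96 for level=0.95.
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    return mean, mean - z * sem, mean + z * sem


def intervals_overlap(a, b):
    """True if two (mean, low, high) intervals overlap."""
    return a[1] <= b[2] and b[1] <= a[2]


# Hypothetical timings in seconds, made up for this example.
old_kernel = [27.9, 28.3, 27.6, 28.1, 27.8]
new_kernel = [27.7, 28.4, 27.9, 28.0, 28.2]

if __name__ == "__main__":
    for name, runs in (("old", old_kernel), ("new", new_kernel)):
        mean, low, high = confidence_interval(runs)
        print(f"{name}: {mean:.2f} s (95% CI {low:.2f} to {high:.2f})")
```

Overlapping intervals are only a quick visual hint that a difference may be noise; they are not a substitute for a proper significance test.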

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 25, 2009 15:23 UTC (Wed) by job (guest, #670) [Link]

I agree wholeheartedly, it's a common problem. Measurements should really be a subject in school from early on. Error bars should always be present. When all you have is a number generally you can't tell if it's random or meaningful.

It's like in the news here: "this party's ratings are up, that one's down -- all changes within the margins of error". That borders on lying.

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 25, 2009 21:01 UTC (Wed) by daniel (subscriber, #3181) [Link]

> It's like in the news here: "this party's ratings are up, that one's down -- all changes within the margins of error". That borders on lying.

Don't shoot the messenger.

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Mar 26, 2009 11:48 UTC (Thu) by tbleher (guest, #48307) [Link]

> I agree wholeheartedly, it's a common problem. Measurements should
> really be a subject in school from early on. Error bars should always be
> present. When all you have is a number generally you can't tell if it's
> random or meaningful.

Anyone have any pointers to good material about how to correctly measure
such things? For all those that didn't learn about statistical
significance and all that stuff in school.

Benchmarking The Linux 2.6.24 Through 2.6.29 Kernels (Phoronix)

Posted Apr 5, 2009 22:21 UTC (Sun) by jengelh (subscriber, #33263) [Link]

Well it certainly can never be wrong to provide the raw result of every single run that was done ;-)
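
Keeping the raw result of every run costs very little. The sketch below shows one way to do it, assuming the benchmark can be wrapped in a callable; `time_runs` and `busy_loop` are hypothetical names, and `busy_loop` is only a stand-in for a real workload.

```python
import time


def time_runs(workload, runs=5):
    """Call `workload` `runs` times and return every wall-clock duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        durations.append(time.perf_counter() - start)
    return durations


def busy_loop(n=100_000):
    """Hypothetical CPU-bound stand-in for a real benchmark."""
    total = 0
    for i in range(n):
        total += i * i
    return total


if __name__ == "__main__":
    # Print each run rather than only an average, so readers can judge the spread.
    for i, seconds in enumerate(time_runs(busy_loop), start=1):
        print(f"run {i}: {seconds:.6f} s")
```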

Useless benchmarks?

Posted Mar 25, 2009 15:47 UTC (Wed) by niner (subscriber, #26151) [Link]

Why would one want to benchmark completely CPU-bound computation programs
compared by kernel version? Most of the programs used don't even use any
kernel functions, so why would the results vary? And indeed, they don't.

SQLite is an exception, since it's using the storage subsystem and so can
show a bug in the latter.

Why GraphicsMagick shows such a difference is interesting. I'd guess that
it uses multithreading on 2.6.29 but not on the others.

Why they would go to the effort of running such useless benchmarks but skip
areas where the kernel is actually exercised, like multitasking or networking,
is just beyond me.

Useless benchmarks?

Posted Mar 25, 2009 16:34 UTC (Wed) by dany (guest, #18902) [Link]

I found these tests quite interesting; there could be even more of them, for example ones that exercise the network subsystem as well.

Useless benchmarks?

Posted Mar 25, 2009 20:25 UTC (Wed) by lambda (subscriber, #40735) [Link]

Yeah, that's what I was thinking. Almost all of these are completely compute bound; I would expect
to see no difference between the kernels. There were no tests of networking, no tests of significant
I/O other than the SQLite test, no tests of graphics or anything memory-manager intensive, etc.

The 7-Zip case is interesting; I would expect it to be compute bound like the rest, so I would be
interested in knowing why it fluctuated so much. But Phoronix is simply presenting results with no
interesting commentary, which is really not very helpful.

Useless benchmarks?

Posted Mar 25, 2009 23:23 UTC (Wed) by Thalience (subscriber, #4217) [Link]

Indeed, the OpenSSL test shows an even more surprising variation on 2.6.29 (although at least one person on the Phoronix forum tried and failed to duplicate that result on another machine).

Phoronix has always come at their reporting from a "typical desktop user" perspective. This is both good and bad.

Good, because "will it make my single-threaded mp3 encoding faster?" is a question many people may well have. The answer is obvious if you know anything about kernel internals, but most people don't. Also, asking "obvious" questions sometimes leads to surprises (like the GraphicsMagick result).

Of course, it is also bad. The author himself doesn't know that CPU-bound programs "shouldn't" speed up with just a change in kernels, so he doesn't stop and ask "why?" when they do. He also doesn't appear to know what mix of programs might provide more insight into the changes between kernel versions.

In many ways, it makes me think of the sorry state of mainstream science journalism. The facts may be reported accurately, but without context or analysis the reader is left to guess about the significance of those facts.

Useless benchmarks?

Posted Mar 26, 2009 6:41 UTC (Thu) by jospoortvliet (subscriber, #33164) [Link]

I'm sure they would be incredibly happy with someone sending in an article
with good quality tests and decent comments on it... ;-)

Useless benchmarks?

Posted Mar 26, 2009 6:41 UTC (Thu) by lkundrak (subscriber, #43452) [Link]

Well, during CPU-bound tasks interrupts occur, rescheduling may happen, etc. It might be useful to measure the overhead of asynchronous events that just happen to take place.
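
One rough way to see how often a CPU-bound task gets rescheduled is to compare the process's context-switch counters before and after the work. This is only a sketch, assuming a Unix-like system where Python's `resource` module is available; `context_switches_during` and `spin` are hypothetical names, and the counters say nothing about how much time each switch cost.

```python
import resource


def context_switches_during(workload):
    """Run `workload` and return (voluntary, involuntary) context switches it incurred."""
    before = resource.getrusage(resource.RUSAGE_SELF)
    workload()
    after = resource.getrusage(resource.RUSAGE_SELF)
    return (after.ru_nvcsw - before.ru_nvcsw,
            after.ru_nivcsw - before.ru_nivcsw)


def spin(n=200_000):
    """Hypothetical CPU-bound loop that makes no system calls of its own."""
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    voluntary, involuntary = context_switches_during(spin)
    # Involuntary switches are the ones forced by the scheduler (preemption).
    print(f"voluntary: {voluntary}, involuntary: {involuntary}")
```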

Useless benchmarks?

Posted Mar 26, 2009 10:12 UTC (Thu) by bangert (subscriber, #28342) [Link]

my guess is that a change in the scheduler could have significant impact
on 'completely CPU-bound computation programs', no?

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds