KS2010: Performance regressions
Paul Turner, who works on Google's kernel team, started off. Over time, Google has had problems with a number of aspects of kernel performance. Out-of-memory handling is an ongoing source of pain. The CFS scheduler, he said, was "merged somewhat aggressively" and caused significant performance regressions for them. Google has a couple of difficulties when it comes to dealing with performance regressions, one of which is the company's habit of jumping about six kernel versions at a time. That makes it hard to catch problems in a timely way or to figure out how they were introduced. Google is trying harder to keep up with the mainline, though, so the latest jump was from 2.6.34 to 2.6.36. There are indeed some new performance regressions between the two. Google's other problem is that it has no way to demonstrate its workload to the kernel community, so kernel developers cannot see Google's performance regressions or know when they have been fixed.
Linus said that identifying that a problem was introduced between 2.6.34 and 2.6.36 is still too broad an interval. He requested that Google dedicate a couple of machines to running its workloads on daily snapshots. When a performance regression can be narrowed down to a single day's patches, it is a lot easier to find. Google's Mike Rubin agreed with all of this, saying that he would like to set up a group of machines running normal hardware (instead of Google's special systems) and well-known benchmarks, with the information publicly available. Arjan van de Ven noted that Intel is already doing that kind of work.
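The kind of daily tracking Linus asked for does not require much machinery. As a purely hypothetical sketch (the kernel is assumed to have been updated and built by a separate job, and the benchmark command, run count, and results file below are invented placeholders rather than anything Google or Intel actually runs), a small script could run a benchmark several times against each day's snapshot and log the median next to the commit ID, so that a drop stands out against the previous days' numbers:

    #!/usr/bin/env python3
    """Hypothetical nightly tracking sketch: run a benchmark several times
    against an already-built daily snapshot and append the median score,
    tagged with the commit ID, to a CSV file."""

    import csv
    import datetime
    import statistics
    import subprocess

    BENCHMARK_CMD = ["./run-benchmark"]   # placeholder: prints a single ops/sec number
    RESULTS_FILE = "nightly-results.csv"  # placeholder results log
    RUNS = 5                              # repeat runs to smooth out noise

    def run_once():
        out = subprocess.run(BENCHMARK_CMD, capture_output=True, text=True, check=True)
        return float(out.stdout.strip())

    def main():
        median = statistics.median(run_once() for _ in range(RUNS))
        commit = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True, check=True).stdout.strip()
        with open(RESULTS_FILE, "a", newline="") as f:
            csv.writer(f).writerow([datetime.date.today().isoformat(), commit, median])
        print(f"{commit[:12]}: median {median:.1f} over {RUNS} runs")

    if __name__ == "__main__":
        main()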
Mike also said that a lot of performance regressions tend to come in through the virtual filesystem (VFS) layer, where Google is seeing some serious scalability problems as well; he would like to see Nick Piggin's VFS scalability work merged soon.
How should performance regressions be reported? The best thing, of course, is a bisection which fingers the guilty commit. Doing that requires highly repeatable tests, though; if a performance benchmark has a 5% variation between runs, it cannot be used for bisection. Paul said that Google has had good results using linsched for performance testing.
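When the benchmark is repeatable enough, the narrowing itself can be handed to "git bisect run", which marks a commit bad whenever the test script exits with a non-zero status. The sketch below is only an illustration of that mechanism, not part of linsched or any other tool mentioned here; the benchmark command, the known-good baseline, and the 5% margin are assumptions made up for the example:

    #!/usr/bin/env python3
    """Illustrative test script for `git bisect run`: exit 0 if the current
    commit's benchmark score is close to a known-good baseline, exit 1 if it
    has regressed by more than the chosen margin."""

    import statistics
    import subprocess
    import sys

    BENCHMARK_CMD = ["./run-benchmark"]   # placeholder: prints a single ops/sec number
    BASELINE = 1000.0                     # score measured on the known-good kernel (invented)
    MARGIN = 0.05                         # treat a drop of more than 5% as a regression
    RUNS = 5                              # repeat runs to smooth out noise

    def run_once():
        out = subprocess.run(BENCHMARK_CMD, capture_output=True, text=True, check=True)
        return float(out.stdout.strip())

    def main():
        median = statistics.median(run_once() for _ in range(RUNS))
        regressed = median < BASELINE * (1.0 - MARGIN)
        print(f"median {median:.1f} vs baseline {BASELINE:.1f}: "
              f"{'regressed' if regressed else 'ok'}")
        sys.exit(1 if regressed else 0)

    if __name__ == "__main__":
        main()

Invoked as "git bisect run ./perf-check.py" (a hypothetical file name) between a known-good and a known-bad kernel, this walks the history automatically; as noted above, the approach falls apart once run-to-run variation approaches the margin being tested.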
Mike wondered: what do maintainers use to spot performance regressions? Linus responded: "users." Steve Rostedt chimed in with a plug for his recently-posted ktest script. The real answer, though, appears to be that much of the serious performance testing and fixing is done by distributors when they are working on a new enterprise kernel release.
It was noted that tracking down performance regressions can be a problem. There is rarely a single bug which slows things by 5-15%; instead, there is a series of 0.5% regressions which all add up. They can be hard to find, especially given that little things like the size or layout of the kernel image can affect things on that scale. Paul noted that, in one case, adding three no-op instructions to the end of one function increased performance by 1.5%.
As a result, James Bottomley said, kernel developers tend to let a lot of minor regressions pile up over time. Then the distributors need to get an enterprise kernel out, so they put considerable resources into fixing these regressions, one at a time. There is no real pooling of information; each distributor works independently to make things faster. Ted Ts'o said that each distributor tends to have a collection of customer workloads obtained under non-disclosure agreements; these workloads are run late in the process, and any resulting regressions are fixed then. Those workloads - and information about them - cannot be shared.
Other kinds of testing include The Well Known Database Benchmark Which Cannot Be Named. It can yield useful results, but it can also take a week to run. That, it was dryly noted, can make bisection an even more painful process than usual.
James asked: should the kernel community care about small performance regressions? After all, there are people out there with big machines, the resources to run benchmarks on them, and the motivation to submit fixes. Mike Rubin said that, as long as there is no credible competitor to Linux, the kernel community maybe doesn't have to care. Ted said that, if the community did care more, it might help to get these big users to update their kernels more often.
Is there a need for a benchmark summit, a place where kernel maintainers can share performance data? Ted said a good start might be to just post results which can be shared. Such a summit might be scheduled; if so, it will probably be associated with the Linux Foundation's Collaboration Summit in April.
Next: Big out-of-tree projects.
Index entries for this article: Kernel/Performance regressions
Posted Nov 2, 2010 13:12 UTC (Tue) by Cyberax
The Phoromatic tracker allows you to vary only one variable (the kernel version) while leaving everything else frozen. They even have support for btrfs snapshots to quickly revert the system to a known state.
But seriously, the Phoronix tracker is quite useful now.
Posted Nov 2, 2010 13:54 UTC (Tue) by jbh
But the arbitrary mix of operations-per-second and seconds-to-complete is very annoying; it means I have to read the fine print on every graph to parse it. Gah!
Posted Nov 2, 2010 15:15 UTC (Tue) by Cyberax
Single-threaded benchmarks are not pointless. I had regressions in single-thread workloads caused by 'too clever' locking which had higher overhead than good old lock_kernel.
Anyway, it's certainly possible to disable uninteresting benchmarks in Phoromatic.
Posted Nov 3, 2010 5:32 UTC (Wed) by mtippett
The Phoronix Test Suite is just a system for running tests in a repeatable manner. If you keep the compiler consistent between kernels, you are only testing the kernel. People usually raise issues when there are multiple variables changing between the systems under test (some say the kernel, some say the compiler, some say the filesystem).
Posted Nov 2, 2010 13:47 UTC (Tue) by ajb
There is a bisection algorithm for intermittent bugs: http://github.com/ealdwulf/bbchop/. It probably wouldn't be hard to adapt it for performance regressions.
Posted Nov 3, 2010 11:38 UTC (Wed) by jengelh
Given Google's tendency to use low-cost systems (but a huge array of them), the "special" systems don't seem all that special.