KS2010: Performance regressions
Paul Turner, who works on Google's kernel team, started off. Over time, Google has had problems with a number of aspects of kernel performance. Out-of-memory handling is an ongoing source of pain. The CFS scheduler, he said, was "merged somewhat aggressively" and caused significant performance regressions for them. Google has a couple of difficulties when it comes to dealing with performance regressions, one of which is the company's habit of jumping about six kernel versions at a time. That makes it hard to catch problems in a timely way or to figure out how they were introduced. Google is trying harder to keep up with the mainline, though, so the latest jump was from 2.6.34 to 2.6.36. There are indeed some new performance regressions between the two. Google's other problem is that it has no way to demonstrate its workload to the kernel community, so kernel developers cannot see Google's performance regressions or know when they have been fixed.
Linus said that identifying that a problem was introduced between 2.6.34 and 2.6.36 is still too broad an interval. He requested that Google dedicate a couple of machines to running its workloads on daily snapshots. When a performance regression can be narrowed down to a single day's patches, it is a lot easier to find. Google's Mike Rubin agreed with all of this, saying that he would like to set up a group of machines running normal hardware (instead of Google's special systems) and well-known benchmarks, with the information publicly available. Arjan van de Ven noted that Intel is already doing that kind of work.
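The kind of daily tracking Linus asked for does not require much machinery. As a purely hypothetical sketch (the kernel is assumed to have been updated and built by a separate job, and the benchmark command, run count, and results file below are invented placeholders rather than anything Google or Intel actually runs), a small script could run a benchmark several times against each day's snapshot and log the median next to the commit ID, so that a drop stands out against the previous days' numbers:

    #!/usr/bin/env python3
    """Hypothetical nightly tracking sketch: run a benchmark several times
    against an already-built daily snapshot and append the median score,
    tagged with the commit ID, to a CSV file."""

    import csv
    import datetime
    import statistics
    import subprocess

    BENCHMARK_CMD = ["./run-benchmark"]   # placeholder: prints a single ops/sec number
    RESULTS_FILE = "nightly-results.csv"  # placeholder results log
    RUNS = 5                              # repeat runs to smooth out noise

    def run_once():
        out = subprocess.run(BENCHMARK_CMD, capture_output=True, text=True, check=True)
        return float(out.stdout.strip())

    def main():
        median = statistics.median(run_once() for _ in range(RUNS))
        commit = subprocess.run(["git", "rev-parse", "HEAD"],
                                capture_output=True, text=True, check=True).stdout.strip()
        with open(RESULTS_FILE, "a", newline="") as f:
            csv.writer(f).writerow([datetime.date.today().isoformat(), commit, median])
        print(f"{commit[:12]}: median {median:.1f} over {RUNS} runs")

    if __name__ == "__main__":
        main()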
Mike also said that a lot of performance regressions tend to come in through the virtual filesystem (VFS) layer, where Google is seeing some serious scalability problems as well; he would like to see Nick Piggin's VFS scalability work merged soon.
How should performance regressions be reported? The best thing, of course, is a bisection which fingers the guilty commit. Doing that requires highly repeatable tests, though; if a performance benchmark has a 5% variation between runs, it cannot be used for bisection. Paul said that Google has had good results using linsched for performance testing.
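When the benchmark is repeatable enough, the narrowing itself can be handed to "git bisect run", which marks a commit bad whenever the test script exits with a non-zero status. The sketch below is only an illustration of that mechanism, not part of linsched or any other tool mentioned here; the benchmark command, the known-good baseline, and the 5% margin are assumptions made up for the example:

    #!/usr/bin/env python3
    """Illustrative test script for `git bisect run`: exit 0 if the current
    commit's benchmark score is close to a known-good baseline, exit 1 if it
    has regressed by more than the chosen margin."""

    import statistics
    import subprocess
    import sys

    BENCHMARK_CMD = ["./run-benchmark"]   # placeholder: prints a single ops/sec number
    BASELINE = 1000.0                     # score measured on the known-good kernel (invented)
    MARGIN = 0.05                         # treat a drop of more than 5% as a regression
    RUNS = 5                              # repeat runs to smooth out noise

    def run_once():
        out = subprocess.run(BENCHMARK_CMD, capture_output=True, text=True, check=True)
        return float(out.stdout.strip())

    def main():
        median = statistics.median(run_once() for _ in range(RUNS))
        regressed = median < BASELINE * (1.0 - MARGIN)
        print(f"median {median:.1f} vs baseline {BASELINE:.1f}: "
              f"{'regressed' if regressed else 'ok'}")
        sys.exit(1 if regressed else 0)

    if __name__ == "__main__":
        main()

Invoked as "git bisect run ./perf-check.py" (a hypothetical file name) between a known-good and a known-bad kernel, this walks the history automatically; as noted above, the approach falls apart once run-to-run variation approaches the margin being tested.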
Mike wondered: what do maintainers use to spot performance regressions? Linus responded: "users." Steve Rostedt chimed in with a plug for his recently-posted ktest script. The real answer, though, appears to be that much of the serious performance testing and fixing is done by distributors when they are working on a new enterprise kernel release.
It was noted that tracking down performance regressions can be a problem. There is rarely a single bug which slows things by 5-15%; instead, there is a series of 0.5% regressions which all add up. They can be hard to find, especially given that little things like the size or layout of the kernel image can affect things on that scale. Paul noted that, in one case, adding three no-op instructions to the end of one function increased performance by 1.5%.
As a result, James Bottomley said, kernel developers tend to let a lot of minor regressions pile up over time. Then the distributors need to get an enterprise kernel out, so they put considerable resources into fixing these regressions, one at a time. There is no real pooling of information; each distributor works independently to make things faster. Ted Ts'o said that each distributor tends to have a collection of customer workloads obtained under non-disclosure agreements; these workloads are run late in the process, and any resulting regressions are fixed then. Those workloads - and information about them - cannot be shared.
Other kinds of testing include The Well Known Database Benchmark Which Cannot Be Named. It can yield useful results, but it can also take a week to run. That, it was dryly noted, can make bisection an even more painful process than usual.
James asked: should the kernel community care about small performance regressions? After all, there are people out there with big machines, the resources to run benchmarks on them, and the motivation to submit fixes. Mike Rubin said that, as long as there is no credible competitor to Linux, the kernel community maybe doesn't have to care. Ted said that, if the community did care more, it might help to get these big users to update their kernels more often.
Is there a need for a benchmark summit, a place where kernel maintainers can share performance data? Ted said a good start might be to just post results which can be shared. Such a summit might be scheduled; if so, it will probably be associated with the Linux Foundation's Collaboration Summit in April.
Next: Big out-of-tree projects.
Index entries for this article: Kernel/Performance regressions
Posted Nov 2, 2010 13:12 UTC (Tue) by Cyberax
The Phoromatic tracker allows you to vary only one variable (the kernel version) while leaving everything else frozen. They even have support for btrfs snapshots to quickly revert the system to a known state.
But seriously, the Phoronix tracker is quite useful now.
Posted Nov 2, 2010 13:54 UTC (Tue) by jbh
But the arbitrary mix of operations-per-second and seconds-to-complete is very annoying; it means I have to read the fine print on every graph to parse it. Gah!
Posted Nov 2, 2010 15:15 UTC (Tue) by Cyberax
Single-threaded benchmarks are not pointless. I had regressions in single-thread workloads caused by 'too clever' locking which had higher overhead than good old lock_kernel.
Anyway, it's certainly possible to disable uninteresting benchmarks in Phoromatic.
Posted Nov 3, 2010 5:32 UTC (Wed) by mtippett
The Phoronix Test Suite is just a system for running tests in a repeatable manner. If you keep the compiler consistent between kernels, you are only testing the kernel. People usually raise issues when there are multiple variables changing between the systems under test (some say the kernel, some say the compiler, some say the filesystem).
Posted Nov 2, 2010 13:47 UTC (Tue) by ajb
There is a bisection algorithm for intermittent bugs: http://github.com/ealdwulf/bbchop/. It probably wouldn't be hard to adapt it for performance regressions.
Posted Nov 3, 2010 11:38 UTC (Wed) by jengelh
Given Google's tendency to use low-cost systems (but a huge array of them), the "special" systems don't seem all that special.