KS2012: The future of kernel regression tracking
For several years, Rafael Wysocki tracked regressions in the kernel, producing lists and statistical analysis of regressions in each kernel release. This task, which provided an (imperfect) measure of the increase or decrease in the quality of successive kernel releases, was considered so valuable by other kernel developers that Rafael was regularly invited to present his observations at the Kernel Summit (2008, 2009, 2010, and 2011). However, his presentation on this topic on the first day of the 2012 Kernel Summit had a rather different flavor, asking his peers what might be the future of regression tracking.
Over time, Rafael has steadily moved to working on other tasks in the kernel, and has had less time for regression tracking. Fortunately, for a time, a couple of other people stepped in to assist with the task of creating and maintaining the kernel Bugzilla reports that were used to track regressions. However, this work did not run all that smoothly. Rafael had already noted on previous occasions that the kernel Bugzilla was not well suited to the task of generating lists of kernel regressions. In addition, as Rafael stepped still further back from regression tracking, there seemed to be some differences of understanding between his successors and various kernel developers about how the Bugzilla should be used to track regressions. (Of note is the fact that Rafael was using Bugzilla bugs merely as a tool to track and measure regressions; whether kernel maintainers made use of those bugs as part of their work in fixing those regressions was a matter left to the maintainers.) These differences in understanding appear to be one of the reasons that Rafael's successors also stepped back from the task of regression tracking.
Which brings us to where we are today: for nearly half a year, there has been no tracking of kernel regressions. Furthermore, Rafael noted that his other commitments meant that he would not have time to return to this task in the future. This led him to ask a simple question: do we want to track kernel regressions?
At this point, many kernel developers spoke up to emphasize how valuable they had found Rafael's work. H. Peter Anvin, for example, noted that he is not a fan of Bugzilla, "But, I found the lists of regressions useful. It made me do things I didn't want to do." Linus Torvalds also noted that he loved the kind of overview that Rafael's work provided to him.
The session digressed into a variety of other topics. Rafael wondered whether the Bugzilla is even very useful as a tool for tracking and resolving kernel bugs. Responses from various developers showed that use of Bugzilla varies greatly across subsystems: some rely on it heavily, while others avoid it in favor of mechanisms such as email. James Bottomley made the point that Bugzilla allows people unfamiliar with mailing lists to file a bug report that then (automatically) appears on the relevant mailing list; Bugzilla thus provides those users with a way into the kernel developers' workflow. Later in the session, the topic of Bugzilla versus mailing lists led Rafael to raise another concern: when some subsystems use Bugzilla while others use mailing lists or other mechanisms, what should the kernel developer community tell bug reporters about how to report bugs? That question often forms a difficult first hurdle for bug reporters. Unfortunately, there was little time to delve into that topic.
There was some general discussion about how Bugzilla should be used to track regressions, and whether there might be better tools than Bugzilla for the task, but no concrete alternatives were proposed. In the end, it was agreed that the question of tooling is secondary and that the choice of tool might best be left to whoever takes on the job of regression tracking. The main point was the widespread consensus in the room that developers would like to see the regression-tracking lists return; the top priority is to find a person (or, possibly, several, so as to avoid overloading one individual and to ensure continuity when people are away on vacation and so on) willing to take on this task.
At this point then, there's a vacancy for one or more kernel regression trackers. Although the work is unpaid, regression tracking is clearly a task that is highly valued by many kernel developers, and, as Rafael's experience shows, when the work is done in a way that matches the development community's needs, the role has a high profile. (Interested volunteers should contact Rafael.)
Posted Aug 29, 2012 13:50 UTC (Wed)
by sdalley (subscriber, #18550)
[Link] (9 responses)
The problem of regression tracking is inseparable from the problem of regression testing. Testing requires (a) that someone is able, diligent, and motivated enough to test; (b) that they can easily re-run the tests with the latest kernel; (c) that the results can be compared easily between the two; (d) that regressions are routed to the current maintainer; (e) that the maintainer can fix the problem; (f) that the original user can get a timely update to their (probably distro-based stable) kernel release.
Fancy graphs can be plotted showing average lifetime of regressions, how they are changing with time, etc. But the results are of questionable value unless you know there is reasonably comprehensive test coverage. Unless someone falls over them, many regressions will otherwise pass unnoticed for long periods.
If you can't measure the quality of something, at least approximately, you can't control it either. For an active software project, this implies insidious and increasing rust.
Successful projects require (i) good people; (ii) good technology; (iii) good process. The Linux kernel has the first two in spades, but has gaps in the third. Maybe the time is coming for a few-month digression into testing and tracking infrastructure, like the one we had in the past with version control, which ended up giving us all the superb git tool.
Wouldn't it be amazing if one fine morning Linus said: "OK, you guys. Great work you're doing. But we do have a problem in actually seeing how great your work is, because it's so hard to test. Well, <joke>Lennart and I have been talking, and</joke> I've come up with this first version of a kernel unit-test plugin interface. All the unit-tests you write for your subsystem will be run at startup or module load, if you specify "test" on the kernel command line. My new test framework will log the results to syslog in this standard self-documenting text-processor-friendly format. If you care enough to add that cute and marvellous feature of yours, you care enough to write some unit tests for it too. It's great, they'll make your job easier. From now on, no changes without tests. Chop-chop."
Or something like that.
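Purely as an illustration of the idea (nothing here is an existing kernel interface), such a per-module self-test hook might look roughly like the sketch below. It assumes a hypothetical run_tests module parameter rather than a global "test" boot option, and it logs one fixed-format result line per test to the kernel log:

    /*
     * Hypothetical sketch only: this is not an existing kernel interface.
     * The module registers a self-test that runs at load time when a
     * (made-up) "run_tests" parameter is set, and reports the result to
     * the kernel log in a fixed, grep-friendly format.
     */
    #include <linux/module.h>
    #include <linux/kernel.h>
    #include <linux/init.h>
    #include <linux/errno.h>

    static bool run_tests;
    module_param(run_tests, bool, 0444);
    MODULE_PARM_DESC(run_tests, "Run this module's self-tests at load time");

    /* One self-test: returns 0 on pass, a negative errno on failure. */
    static int test_example_arithmetic(void)
    {
        return (2 + 2 == 4) ? 0 : -EINVAL;
    }

    static int __init selftest_init(void)
    {
        int ret;

        if (!run_tests)
            return 0;

        ret = test_example_arithmetic();
        /* One fixed-format line per test, easy for a log scraper to parse. */
        printk(KERN_INFO "selftest: example_arithmetic: %s\n",
               ret ? "FAIL" : "PASS");
        return 0;
    }

    static void __exit selftest_exit(void)
    {
    }

    module_init(selftest_init);
    module_exit(selftest_exit);
    MODULE_LICENSE("GPL");

Loading such a module with run_tests=1 would then emit result lines that a log scraper could pick up mechanically.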
A standard opt-in userspace tool could then munge the test results, collect the system configuration, and submit them anonymised to a new automated kernel-bugzilla gateway which would do all the tedious correlation and tracking.
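The collection side could start out equally small. The following hypothetical sketch assumes the fixed "selftest:" log format from the module sketch above and leaves out the anonymisation and submission steps entirely; it only does the local munging:

    /*
     * Hypothetical collector sketch: reads the kernel log via "dmesg" and
     * summarizes self-test results in the "selftest:" format assumed above.
     * Anonymisation and submission to any central gateway are left out.
     */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *log = popen("dmesg", "r");
        char line[1024];
        int passed = 0, failed = 0;

        if (!log) {
            perror("dmesg");
            return 1;
        }

        while (fgets(line, sizeof(line), log)) {
            if (!strstr(line, "selftest:"))
                continue;
            if (strstr(line, "FAIL")) {
                failed++;
                fputs(line, stdout);    /* show failures verbatim */
            } else if (strstr(line, "PASS")) {
                passed++;
            }
        }
        pclose(log);

        printf("self-tests: %d passed, %d failed\n", passed, failed);
        return failed ? 1 : 0;
    }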
This approach would scale well, as it delegates the test writing to those who know most about the relevant code, and the test running to the users in the field who have all the weird hardware and workloads.
Posted Aug 29, 2012 19:27 UTC (Wed)
by dlang (guest, #313)
[Link] (8 responses)
The problem is that most kernel regressions are not things a generic test suite will catch: they are regressions under some specific workload that a user has, or with specific hardware (or combinations of hardware) that a user has.
As a result, it's impossible for any single test project to have comprehensive coverage.
The kernel regression tracker (and assistants) are not people running the tests; they are people working with the users who run into problems: helping those users identify the relevant factors of their environment and workload, identifying where the problem started, helping get the report in front of the appropriate maintainer, and then tracking it to keep it from getting lost (and hopefully the user doesn't disappear in the middle of all this). An additional task is trying to combine duplicate reports, which adds reliability to what the users are reporting.
If it's a workload-related problem, then once the problem workload can be simulated, the maintainer or developer may be able to go off and work on it without needing the user to test all the time, but the user is still needed to test the resulting fix, because the simulated workload may not match the real workload as closely as everyone thinks.
If it's a hardware-related problem, it requires someone with the appropriate hardware to test the result (and if it's a combination of hardware, it's even worse). The maintainers and developers cannot have every variant of hardware, so it's impossible for them to test; when they think they have the fix, they will again need to go back to the user to validate it.
Posted Aug 29, 2012 23:22 UTC (Wed)
by raven667 (subscriber, #5198)
[Link] (1 response)
Posted Aug 29, 2012 23:27 UTC (Wed)
by dlang (guest, #313)
[Link]
Plus, you need to remember that the kernel is multi-threaded, so timing of different things happening matters as well.
There are very few "obvious functionality" type regressions.
Posted Aug 30, 2012 0:17 UTC (Thu)
by sdalley (subscriber, #18550)
[Link] (5 responses)
Of course, 99999 out of 100000 tests are going to pass every time. But processing power is so cheap, why not let it work for you? You never know what might have broken if there's no test.
And, once you have found an unexpected regression, you can then write a test for it if there wasn't one before. Having that test available, as part of a loadable "test" module, say, in a generic kernel release then means it can be triggered in the field on request by anyone at all with the latest update and whatever oddball hardware they have. This would greatly increase the result data and illuminate the circumstances and manner in which the test passes or fails.
And there's nothing like having to write unit tests to thrash out the idiocies and dark corner cases of a new interface. You don't even have to run them initially, the mere mental questioning debugs the design before its stupidities get coagulated into something you're going to have to maintain for years afterward.
I have myself grumbled about having to write tests. But I have never regretted the payoff in quality. And it's very satisfying to see the new release of one's library run its test suite in the blink of an eye and know that you didn't break anything important with your last changes. Or maybe you did, and you get to save yourself a wasted release and maybe a brown paper bag too.
And of course, the test results are gold dust to anyone who wants to document the interfaces. I hadn't heard that the Linux kernel's documentation has been a howling success story. A more formal requirement to write unit tests as part of the kernel development process would go far to improving things. And, dammit, it's just satisfying to know that what you wrote definitely works.
I fully accept that there is no getting around the need for skill and interaction in tracking down the more devious regressions. It's just that we should work towards an environment where that is made as easy as possible.
I hope that new kernel regression trackers are soon appointed and get the support and remuneration their important job deserves. If not, one is saying in the loudest possible language that, words aside, quality is actually only for wimps and doesn't really matter.
Posted Aug 30, 2012 0:49 UTC (Thu)
by dlang (guest, #313)
[Link]
Maintaining the tests has overhead as well; it's not free. If they fail, is it because the system is broken, or because the test didn't get changed to match the new way the kernel works? (And is the new way the kernel works actually going to work in the real world?)
Posted Oct 1, 2012 18:20 UTC (Mon)
by oak (guest, #2786)
[Link] (3 responses)
No, whether that's true depends a lot on the tests and also on what you're testing.
I have been in a situation where analyzing the results from tests took more time than actually finding the bugs manually *and* fixing them. Eventually we got rid of those tests; they were quality tests at the wrong level in the stack.
Tests are mostly useful only if:
* they're (mostly) automated
* they produce statistically reliable and non-ambivalent results
* writing and maintaining them, and analyzing their results, saves time in the long run
Preferably they should also be mostly auto-generated, so that they get automatically updated with the code, there's less code to maintain, and issues with it are more apparent.
Does test code need tests?
Posted Oct 1, 2012 22:49 UTC (Mon)
by sdalley (subscriber, #18550)
[Link]
My impression is that there is a relative dearth of formal testing, in which case more testing will obviously be better. I was assuming testing at the appropriate levels in the stack.
It'd already be great to pick lower-hanging fruit like automatic tests for library/programmer/kernel/userspace/filesystem/device interfaces, which must never change without good reason; when they do break, you jolly well want to know as soon as possible. Regressions Are Bad. If these tests were put into an installable package, then anyone who wanted to help with the testing effort could run them in their own peculiar environment and optionally have the failures forwarded to a central clearinghouse, much as Microsoft does with its system crash reports.
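As a toy example of the kind of low-level interface check such a package might carry (the specific checks here are chosen arbitrarily for illustration), a test can be an ordinary program that exercises a few long-stable kernel/userspace and filesystem behaviours and exits non-zero if any of them change:

    /*
     * Toy interface-regression check, for illustration only: exercises a
     * few long-stable kernel/userspace and filesystem behaviours and exits
     * non-zero if any of them change.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <fcntl.h>
    #include <errno.h>
    #include <sys/stat.h>

    static int failures;

    static void check(int cond, const char *what)
    {
        printf("%s: %s\n", cond ? "PASS" : "FAIL", what);
        if (!cond)
            failures++;
    }

    int main(void)
    {
        char path[] = "/tmp/iface-test-XXXXXX";
        int fd = mkstemp(path);
        struct stat st;

        check(fd >= 0, "mkstemp() creates a temporary file");
        check(write(fd, "abc", 3) == 3, "write() returns the byte count");
        check(fstat(fd, &st) == 0 && st.st_size == 3,
              "fstat() reports the expected size");
        check(lseek(fd, 0, SEEK_END) == 3, "lseek(SEEK_END) matches the size");

        /* Opening a nonexistent path must keep failing with ENOENT. */
        errno = 0;
        check(open("/nonexistent/definitely-not-here", O_RDONLY) == -1 &&
              errno == ENOENT, "open() of a missing path sets ENOENT");

        close(fd);
        unlink(path);
        return failures ? EXIT_FAILURE : EXIT_SUCCESS;
    }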
Harder tests, such as those for response latency under varying loads and configurations or memory-management fragmentation problems, necessarily have a symbiotic relationship with the code they are testing and have to be maintained together with it.
Posted Oct 4, 2012 0:02 UTC (Thu)
by nix (subscriber, #2304)
[Link]
That depends on its complexity. Testsuite engines are often complex enough to merit it, but it's hard to figure out a way to test most tests except to test the same thing again in a different way and make sure the results of both tests agree. If you know of a less tiresome way that doesn't require doing the same work more than twice (because thinking of a second way to test something is often harder than thinking of the first), I'm all ears.
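A trivial, made-up illustration of that "two independent ways, then compare" approach is to compute the same quantity with unrelated methods and require agreement:

    /*
     * Made-up example of checking a computation two independent ways and
     * requiring the answers to agree: sum 1..n by a loop and by the
     * closed-form formula n*(n+1)/2.
     */
    #include <assert.h>
    #include <stdio.h>

    static unsigned long sum_by_loop(unsigned long n)
    {
        unsigned long i, sum = 0;

        for (i = 1; i <= n; i++)
            sum += i;
        return sum;
    }

    static unsigned long sum_by_formula(unsigned long n)
    {
        return n * (n + 1) / 2;
    }

    int main(void)
    {
        unsigned long n;

        for (n = 0; n <= 1000; n++)
            assert(sum_by_loop(n) == sum_by_formula(n));
        printf("both methods agree for n = 0..1000\n");
        return 0;
    }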
Posted Oct 4, 2012 0:10 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Aug 30, 2012 21:11 UTC (Thu)
by apollock (subscriber, #14629)
[Link]
Your comment got me thinking: what a virtuous loop we'd have if a pissing match broke out between subsystem maintainers about who had the highest test coverage, or the fewest regressions, or something like that.
Of course in order to ascertain that, you'd need data, and that's sort of what this article was all about :-(
Posted Sep 1, 2012 10:35 UTC (Sat)
by mstefani (guest, #31644)
[Link]