KS2012: The future of kernel regression tracking

Posted Aug 29, 2012 13:50 UTC (Wed) by sdalley (subscriber, #18550)
Parent article: KS2012: The future of kernel regression tracking

So, there is now a vacancy for an unpaid kernel regression tracker, or maybe two or three unpaid kernel regression trackers. The kernel community doesn't know what to tell people who would like to report bugs. It might be just a personal impression, but this is all beginning to sound rather amateurish.

The problem of regression tracking is inseperable from the problem of regression testing. Testing requires (a) that someone is able, diligent, and motivated enough to test; (b) that they can easily re-run the tests with the latest kernel; (c) that the results can be compared easily between the two; (d) that regressions are routed to the current maintainer; (e) that the maintainer can fix the problem; (f) that the original user can get a timely update to their (probably distro-based stable) kernel release.

Fancy graphs can be plotted showing average lifetime of regressions, how they are changing with time, etc. But the results are of questionable value unless you know there is reasonably comprehensive test coverage. Unless someone falls over them, many regressions will otherwise pass unnoticed for long periods.

If you can't measure the quality of something, at least approximately, you can't control it either. For an active software project, this implies insidious and increasing rust.

Successful projects require (i) good people; (ii) good technology; (iii) good process. The Linux kernel has the first two in spades, but has gaps in the third. Maybe the time is coming for a few-month digression onto testing and tracking infrastructure, like there was in the past with version control, that ended in giving us all the superb git tool.

Wouldn't it be amazing if one fine morning Linus said: "OK, you guys. Great work you're doing. But we do have a problem in actually seeing how great your work is, because it's so hard to test. Well, <joke>Lennart and I have been talking, and</joke> I've come up with this first version of a kernel unit-test plugin interface. All the unit-tests you write for your subsystem will be run at startup or module load, if you specify "test" on the kernel command line. My new test framework will log the results to syslog in this standard self-documenting text-processor-friendly format. If you care enough to add that cute and marvellous feature of yours, you care enough to write some unit tests for it too. It's great, they'll make your job easier. From now on, no changes without tests. Chop-chop."

Or something like that.

A standard opt-in userspace tool could then munge the test results, collect the system configuration, and submit them anonymised to a new automated kernel-bugzilla gateway which would do all the tedious correlation and tracking.

This approach would scale well, as it delegates the test writing to those who know most about the relevant code, and the test running to the users in the field who have all the weird hardware and workloads.

KS2012: The future of kernel regression tracking

Posted Aug 29, 2012 19:27 UTC (Wed) by dlang (guest, #313) [Link] (8 responses)

The problem is that the vast majority of the regressions are not ones that will be found by tests like you describe.

They are regressions under some specific workload that a user has, or with specific hardware (or combinations of hardware) that a user has.

As a result, it's impossible for any single test project to have comprehensive coverage.

The kernel regression tracker (and assistants) are not people running the tests, they are people working with the users who run into problems, helping those users identify the relevant factors of their environment/workload, identifying where the problem started, helping get the report in front of the appropriate maintainer, and then tracking it to keep it from getting lost (and hopefully the user doesn't disappear in the middle of all this), with an additional task of trying to combine duplicate reports (which gains reliability in terms of the user reporting)

If it's a workload related problem, once a problem workload can be simulated, the maintainer/developer may be able to go off and work on it without needing the user to test it all the time, but the user is still needed to test the resulting fix because the simulated workload may not match the real workload as closely as everyone thinks.

If it's a hardware related problem, it requires someone with the appropriate hardware to test the result (and if it's a combination of hardware it's even worse), the maintainers and developers cannot have every variant of hardware, so it's impossible for them to test, and so when they think they have the fix, they will again need to go back to the user to validate the fix.

KS2012: The future of kernel regression tracking

Posted Aug 29, 2012 23:22 UTC (Wed) by raven667 (subscriber, #5198) [Link] (1 responses)

Maybe this is just a different usage of the term "regression". In kernel-land a regression as you describe is where a change negatively affects performance in some unique scenario, Linux kernel developers take this seriously. For obvious functionality type regressions it would seem a test suite could, over time, test every code path.

KS2012: The future of kernel regression tracking

Posted Aug 29, 2012 23:27 UTC (Wed) by dlang (guest, #313) [Link]

The problem is that many code paths will only be used with specific hardware, or in other specific conditions.

Plus, you need to remember that the kernel is multi-threaded, so timing of different things happening matters as well.

There are very few "obvious functionality" type regressions.

More testing is better than less testing

Posted Aug 30, 2012 0:17 UTC (Thu) by sdalley (subscriber, #18550) [Link] (5 responses)

More testing is better than less testing.

Of course, 99999 out of 100000 tests are going to pass every time. But processing power is so cheap, why not let it work for you? You never know what might have broken if there's no test.

And, once you have found an unexpected regression, you can then write a test for it if there wasn't one before. Having that test available, as part of a loadable "test" module, say, in a generic kernel release then means it can be triggered in the field on request by anyone at all with the latest update and whatever oddball hardware they have. This would greatly increase the result data and illuminate the circumstances and manner in which the test passes or fails.

And there's nothing like having to write unit tests to thrash out the idiocies and dark corner cases of a new interface. You don't even have to run them initially, the mere mental questioning debugs the design before its stupidities get coagulated into something you're going to have to maintain for years afterward.

I have myself grumbled about having to write tests. But I have never regretted the payoff in quality. And it's very satisfying to see the new release of one's library run its test suite in the blink of an eye and know that you didn't break anything important with your last changes. Or maybe you did, and you get to save yourself a wasted release and maybe a brown paper bag too.

And of course, the test results are gold dust to anyone who wants to document the interfaces. I hadn't heard that the Linux kernel's documentation has been a howling success story. A more formal requirement to write unit tests as part of the kernel development process would go far to improving things. And, dammit, it's just satisfying to know that what you wrote definitely works.

I fully accept that there is no getting around the need for skill and interaction in tracking down the more devious regressions. It's just that we should work towards an environment where that is made as easy as possible.

I hope that new kernel regression trackers are soon appointed and get the support and remuneration their important job deserves. If not, one is saying in the loudest possible language that, words aside, quality is actually only for wimps and doesn't really matter.

More testing is better than less testing

Posted Aug 30, 2012 0:49 UTC (Thu) by dlang (guest, #313) [Link]

the choice is not "do test driven development" vs "you obviously don't care about quality in the slightest". Taking this attitude is just insulting the people you are asking to do more work.

maintaining the tests has overhead as well, it's not free. If they fail is it because the system is broken? or because the test didn't get changed to match the new way the kernel works? (and is the new way the kernel works actually going to work in the real world)

More testing is better than less testing

Posted Oct 1, 2012 18:20 UTC (Mon) by oak (guest, #2786) [Link] (3 responses)

> More testing is better than less testing.

No, whether that's true depends a lot on the tests and also what you're testing.

I have been in a situation where analyzing results from tests took more time than actually manually finding the bugs *and* fixing them. Eventually we got rid of them. They were quality tests and at wrong level in the stack.

Tests are mostly useful only if:
* they're (mostly) automated
* they produce statistically reliable and non-ambivalent results
* writing, maintaining and analyzing their results save time in the long run

Preferably they should also be mostly auto-generated so that they get automatically updated with the code, there's less code to maintain and issues with it are more apparent.

Does test code need tests?

More testing is better than less testing

Posted Oct 1, 2012 22:49 UTC (Mon) by sdalley (subscriber, #18550) [Link]

I agree with what you say.

Under the impression there is a relative dearth of formal testing, more testing will obviously be better. I was assuming testing at the appropriate levels in the stack.

It'd already be great to pick lower-hanging fruit like automatic tests for library/programmer/kernel/userspace/filesystem/device interfaces, which must never change without good reason and when they do break, you jolly-well want to know as soon as possible. Regressions Are Bad. If these tests were put into an installable package, then anyone who wanted to help in the testing effort could run them in their own peculiar environment and have the failures forwarded optionally to a central clearinghouse, like Microsoft does with their system crashes.

Harder tests like response latency under varying loads and configurations, memory management fragmentation problems, etc, necessarily have a symbiotic relationship to the code they are testing, and have to be maintained together with them.

More testing is better than less testing

Posted Oct 4, 2012 0:02 UTC (Thu) by nix (subscriber, #2304) [Link]

Does test code need tests?

That depends on its complexity. Testsuite engines are often complex enough to merit it, but it's hard to figure out a way to test most tests except to test the same thing again in a different way and make sure the results of both tests agree. If you know of a less tiresome way that doesn't require doing the same work more than twice (because thinking of a second way to test something is often harder than thinking of the first), I'm all ears.

More testing is better than less testing

Posted Oct 4, 2012 0:10 UTC (Thu) by Cyberax (✭ supporter ✭, #52523) [Link]

Yes, test code needs tests too. There are projects to do things like finding if tests are efficient. For example, if you comment out (or somehow disable) a part of unit test setup - then the test should fail.