
Two sessions on kernel testing

By Jonathan Corbet
October 29, 2013

2013 Kernel Summit
Over the last couple of years, the amount of testing applied to pre-release kernels has quietly been increased in a big way; this work has had a significant impact on kernel release quality. Two of the developers behind that work — Dave Jones and Fengguang Wu — ran sessions to talk about what they are doing and their plans for the future.

Trinity

Dave's "Trinity" fuzz-testing tool has been around for some time, but the pace of development has increased in the last year or two. Dave introduced himself as the guy who "has broken lots of people's stuff" and who plans to continue doing so; Trinity, he said, is getting better and growing in scope. From the beginning, Trinity has tried to perform intelligent system call fuzz testing by avoiding calls that will obviously get an EINVAL error from the kernel. So, for example, system calls expecting a file descriptor will get a file descriptor rather than a random number. Work is continuing in that direction; the idea is to get Trinity to do things that real programs would do.
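The idea of handing a file descriptor to calls that expect one can be illustrated with a small sketch. This is not Trinity's code (Trinity is written in C and its tables are far richer); the argument kinds, the `SYSCALL_ARGS` table, and the helper names below are all hypothetical, and the sketch only builds argument lists without invoking any system call.

```python
import os
import random

# Hypothetical argument kinds, invented for this illustration.
ARG_FD = "fd"
ARG_LEN = "len"

# Hypothetical per-call descriptions: call name -> kinds of its arguments.
SYSCALL_ARGS = {
    "fsync": [ARG_FD],
    "ftruncate": [ARG_FD, ARG_LEN],
}


def open_fd_pool(count=4):
    """Open a few real descriptors (on os.devnull) to hand to fd arguments."""
    return [os.open(os.devnull, os.O_RDWR) for _ in range(count)]


def generate_call(rng, fd_pool):
    """Pick a call and build arguments that match each argument's kind."""
    name = rng.choice(sorted(SYSCALL_ARGS))
    args = []
    for kind in SYSCALL_ARGS[name]:
        if kind == ARG_FD:
            # A valid descriptor, so the call is not rejected immediately.
            args.append(rng.choice(fd_pool))
        else:
            # Other kinds still get random values within a plausible range.
            args.append(rng.randrange(0, 1 << 20))
    return name, args
```

A purely random generator would almost always supply an integer that is not an open descriptor, and the call would fail before reaching any interesting code; drawing from a pool of open descriptors lets the fuzzed call get further.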

One of the targets for the future is to add more subsystem-specific testing. There will also be more use of features like control groups. Among other things, these additional tests will require that Trinity be run as root — something that has been discouraged until now. He wants the ability to fuzz things that only root can do, he said, expressing confidence that there will be "all kinds of horrors" waiting to be found.

Dave was asked about using the fault injection framework for testing; he responded that, every time he tries, he feels like he is the first to use it. "Things blow up everywhere." Dave Airlie asked about fuzz-testing in 32-bit mode on a 64-bit kernel; the answer was that this mode was broken for a while, but should work now. When asked about testing user namespaces, Dave noted that a lot of problems have been found in that area. Trinity does not run within them now; it would be nice if somebody would submit a wrapper to make that work.

Ted Ts'o remarked on the difficulty of finding the real cause of a lot of Trinity-caused crashes. Quite a few of them, he suspects, are really the result of memory corruption left behind by a previous test; the place where the crash actually happens may have nothing to do with the real problem. Dave agreed that reproducibility is a problem. There is a lot that changes between runs, even after recent work that is careful to save random seeds so that the random number sequence used will be the same. It is, he said, "the number-one thing that sucks" about Trinity, but fixing it has proved to be far harder than he thought it would be.
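Saving the seed is what makes the random number sequence repeatable. The following is a minimal sketch of that one piece, with hypothetical function names; it says nothing about how Trinity stores its seeds. As Dave noted, replaying a seed reproduces only the sequence of random values, not the kernel state left behind by earlier tests.

```python
import random


def run_sequence(seed, length=5):
    """Produce the random values a run would use, driven entirely by `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(0, 1 << 32) for _ in range(length)]


def new_run():
    """Start a fresh run: pick a seed, record it, and return it with the values."""
    seed = random.SystemRandom().randrange(0, 1 << 32)
    return seed, run_sequence(seed)
```

Calling `run_sequence()` again with the recorded seed returns the same list of values as the original run.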

The build-and-boot robot

Fengguang Wu has 63 Reported-by credits in the 3.12 kernel — over 12% of the total. These bug reports are the result of the extensive testing setup that he has been building; he ran a session at the Kernel Summit to describe his work.

Essentially, Fengguang's system works by pulling and merging a large number of git trees, building the resulting kernel, then booting it. There are a number of tests that are then run, looking for bugs and performance regressions. When a problem comes up, Fengguang's (large) systems can run up to 1000 KVM instances to quickly bisect the history and determine which patch caused the problem. The result is an automated email message, of which he sends about ten each day. Fengguang noted that a lot of developers send apologetic emails in response, but, he said, "it's a robot, you don't have to reply." Linus jibed that most of that mail was probably an automated "thank you" script run by Greg Kroah-Hartman.
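Bisection itself is a binary search over the commit history. The sketch below shows the serial form of that search under simplifying assumptions: the history is linear, the first commit is known good, the last is known bad, and once the problem appears it stays. The function name and the `is_bad` callback are hypothetical; the robot described here spreads the work across many KVM instances, which this sketch does not attempt.

```python
def bisect_first_bad(commits, is_bad):
    """Return the first commit for which is_bad(commit) is true.

    Assumes `commits` is ordered oldest to newest, commits[0] is good,
    commits[-1] is bad, and every commit after the first bad one is bad.
    """
    lo, hi = 0, len(commits) - 1  # lo is known good, hi is known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the culprit is at mid or earlier
        else:
            lo = mid  # the culprit is after mid
    return commits[hi]
```

Each step halves the remaining range, so a history of a thousand commits needs about ten build-and-boot cycles to isolate the offending patch.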

Of the problems reported by Fengguang's system, about 10% are build errors, 20% are build warnings and documentation issues, 60% are generated by the sparse utility, and 10% come from static checkers like smatch and Coccinelle. The number of error reports going out has been dropping over time, he said; it seems that more developers are running their own tests before making their code public.

There were various questions, starting with: which compiler does he use? Fengguang said that it's gcc from the Debian "sid" distribution. Are any branches excluded from testing? Those which hold only ancient commits or which are based on old upstream releases are not tested; any branch that has "experimental" in its name will also not be tested. Otherwise, once Fengguang's system finds your repository, no branch will go untested. How does he find trees to test? Mostly from mailing lists and git logs; as Ted put it, "you can run, but you can't hide."

One of the more recent changes is the running of performance tests. These tests are time consuming, though; Fengguang would like more tests that can run quickly. The best performance tests, he said, have a --runtime flag to control how long they run; that leads to predictable behavior on both fast and slow systems. He also noted that both the size of the kernel and the time required to boot are increasing over time.
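A test with a runtime flag runs for a fixed amount of time and reports how much work was done, instead of doing a fixed amount of work and reporting how long it took. Here is a minimal sketch of that pattern; the workload and the function names are invented for illustration and do not come from any particular benchmark.

```python
import argparse
import time


def run_for(runtime, work):
    """Call work() until `runtime` seconds have passed.

    Returns (iterations, elapsed_seconds).
    """
    start = time.monotonic()
    deadline = start + runtime
    iterations = 0
    while time.monotonic() < deadline:
        work()
        iterations += 1
    return iterations, time.monotonic() - start


def main(argv=None):
    parser = argparse.ArgumentParser(description="toy time-bounded benchmark")
    parser.add_argument("--runtime", type=float, default=1.0,
                        help="seconds to run the workload")
    args = parser.parse_args(argv)
    # Placeholder workload; a real test would exercise the code being measured.
    iterations, elapsed = run_for(args.runtime, lambda: sum(range(1000)))
    rate = iterations / elapsed if elapsed > 0 else 0.0
    print(f"{rate:.0f} iterations/sec over {elapsed:.2f}s")
    return iterations
```

Calling `main(["--runtime", "0.2"])` takes roughly the same wall-clock time on a fast or a slow machine; only the reported rate differs, which is what makes scheduling such tests predictable.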

The session ended with general agreement in the room that this work is helpful and welcome.


Index entries for this article
Kernel: Development tools/Trinity
Conference: Kernel Summit/2013



Two sessions on kernel testing

Posted Oct 31, 2013 18:05 UTC (Thu) by gnacux (guest, #91402)

Really glad to see that Fengguang's system works so efficiently and is so useful. This is an idea that lots of people want to implement, but doing it for the kernel involves so many details and requires a strong understanding of the whole system.
Good job, dude.

Two sessions on kernel testing

Posted Nov 2, 2013 20:56 UTC (Sat) by roblucid (guest, #48964)

Well said!

Two sessions on kernel testing

Posted Nov 7, 2013 23:15 UTC (Thu) by BenHutchings (subscriber, #37955)

Some more points I noted from Fengguang Wu's talk:

The robot runs checkpatch.pl but only enables a subset of its warnings/errors.

It is currently building a total of 200 configurations covering 30 different architectures (aside from randconfig testing).

Repository owners can opt in to an email reporting that tests have completed after a git push.

There is a database of recently detected errors and this is used to de-dupe reports. He plans to provide a way to view all errors introduced on a particular branch.


Copyright © 2013, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds