Two sessions on kernel testing
Trinity
Dave Jones's "Trinity" fuzz-testing tool has been around for some time, but the
pace of development has increased in the last year or two. Dave introduced
himself as the guy who "has broken lots of people's stuff" and who plans to
continue doing so; Trinity, he said, is getting better and growing in
scope. From the beginning, Trinity has tried to perform intelligent system
call fuzz testing by avoiding calls that will obviously get an
EINVAL error from the kernel. So, for example, system calls expecting a
file descriptor will get a file descriptor rather than a random number.
Work is continuing in that direction; the idea is to get Trinity to do
things that real programs would do.
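The file descriptor example can be illustrated with a minimal sketch. This is not Trinity's code; the argument kinds (`FD`, `INT`) and the helper names `open_fd_pool` and `make_args` are hypothetical, chosen only to show the idea of filling descriptor slots from a pool of descriptors that are actually open while leaving other slots random.

```python
import os
import random

# Hypothetical argument kinds; the article does not show Trinity's real annotations.
FD, INT = "fd", "int"


def open_fd_pool():
    """Open a few real file descriptors that fd-typed arguments can draw from."""
    read_end, write_end = os.pipe()
    return [read_end, write_end, os.open(os.devnull, os.O_RDONLY)]


def make_args(spec, fd_pool, rng):
    """Build one argument list: valid fds for FD slots, random 32-bit values elsewhere."""
    args = []
    for kind in spec:
        if kind == FD:
            # A descriptor that is really open, so the call gets past basic validation.
            args.append(rng.choice(fd_pool))
        else:
            args.append(rng.getrandbits(32))
    return args


if __name__ == "__main__":
    rng = random.Random(1)
    pool = open_fd_pool()
    # An imaginary call taking (fd, int, int).
    print(make_args([FD, INT, INT], pool, rng))
    for fd in pool:
        os.close(fd)
```

A purely random generator would almost always hand the kernel a bad descriptor and get an immediate error; drawing from a pool lets the fuzzed call reach deeper code paths.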
One of the targets for the future is to add more subsystem-specific testing. There will also be more use of features like control groups. Among other things, these additional tests will require that Trinity be run as root — something that has been discouraged until now. He wants the ability to fuzz things that only root can do, he said, expressing confidence that there will be "all kinds of horrors" waiting to be found.
Dave was asked about using the fault injection framework for testing; he responded that, every time he tries, he feels like he is the first to use it. "Things blow up everywhere." Dave Airlie asked about fuzz-testing in 32-bit mode on a 64-bit kernel; the answer was that this mode was broken for a while, but should work now. When asked about testing user namespaces, Dave Jones noted that a lot of problems have been found in that area. Trinity does not run within them now; it would be nice if somebody would submit a wrapper to make that work.
Ted Ts'o remarked on the difficulty of finding the real cause of a lot of Trinity-caused crashes. Quite a few of them, he suspects, are really the result of memory corruption left behind by a previous test; the place where the crash actually happens may have nothing to do with the real problem. Dave agreed that reproducibility is a problem. There is a lot that changes between runs, even after recent work that is careful to save random seeds so that the random number sequence used will be the same. It is, he said, "the number-one thing that sucks" about Trinity, but fixing it has proved to be far harder than he thought it would be.
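The seed-saving idea can be sketched as follows. This is an illustration under stated assumptions, not Trinity's implementation: `run_session` is a hypothetical function, and the "call number" and "argument" it produces are just random integers standing in for a fuzzing plan.

```python
import random


def run_session(seed, n_calls=5):
    """Generate the sequence of (call number, argument) pairs for one seed."""
    rng = random.Random(seed)
    return [(rng.randrange(400), rng.getrandbits(32)) for _ in range(n_calls)]


if __name__ == "__main__":
    # Pick a fresh seed and record it so the run can be replayed later.
    seed = random.SystemRandom().getrandbits(32)
    print("seed:", seed)
    first = run_session(seed)
    replay = run_session(seed)
    print("same inputs on replay:", first == replay)
```

Replaying a seed reproduces only the generated inputs. Kernel state left over from earlier tests, timing, and anything else that changes between runs is not captured, which is consistent with Dave's point that saved seeds alone have not solved the problem.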
The build-and-boot robot
Fengguang Wu has 63 Reported-by credits in the 3.12 kernel — over 12% of the total. These bug reports are the result of the extensive testing setup that he has been building; he ran a session at the Kernel Summit to describe his work.
Essentially, Fengguang's system works by pulling and merging a large number
of git trees, building the resulting kernel, then booting it. There are a
number of tests that are then run, looking for bugs and performance
regressions.
When a problem comes up, Fengguang's (large) systems can run up to 1000 KVM
instances to quickly bisect the history and determine which patch caused
the problem. The result is an automated email message, of which he sends
about ten each day. Fengguang noted that a lot of developers send
apologetic emails in response, but, he said, "it's a robot, you don't have
to reply." Linus jibed that most of that mail was probably an automated
"thank you" script run by Greg Kroah-Hartman.
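The bisection step can be sketched as a binary search over an ordered list of commits. This is a simplified model, not Fengguang's tooling: it assumes a linear history in which the first commit is known good, the last is known bad, and every commit after the offending one also fails. The `is_bad` callback is a hypothetical stand-in for "build, boot, and test this commit".

```python
def first_bad(commits, is_bad):
    """Return the first failing commit.

    Assumes commits[0] is good, commits[-1] is bad, and that once a
    commit is bad every later commit is bad too.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid    # the culprit is at mid or earlier
        else:
            good = mid   # the culprit is after mid
    return commits[bad]


if __name__ == "__main__":
    history = [f"commit-{i}" for i in range(100)]
    culprit = first_bad(history, lambda c: int(c.split("-")[1]) >= 37)
    print(culprit)
```

Each round halves the candidate range, so about log2(N) build-and-boot cycles suffice for N commits. A system with many KVM instances could also test several candidates at once; that parallelism is not modeled here.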
Of the problems reported by Fengguang's system, about 10% are build errors, 20% are build warnings and documentation issues, 60% are generated by the sparse utility, and 10% come from static checkers like smatch and Coccinelle. The number of error reports going out has been dropping over time, he said; it seems that more developers are running their own tests before making their code public.
There were various questions, starting with: which compiler does he use? Fengguang said that it's gcc from the Debian "sid" distribution. Are any branches excluded from testing? Those which hold only ancient commits or which are based on old upstream releases are not tested; any branch that has "experimental" in its name will also not be tested. Otherwise, once Fengguang's system finds your repository, no branch will go untested. How does he find trees to test? Mostly from mailing lists and git logs; as Ted put it, "you can run, but you can't hide."
One of the more recent changes is the running of performance tests. These tests are time-consuming, though; Fengguang would like more tests that can run quickly. The best performance tests, he said, have a --runtime flag to control how long they run; that leads to predictable behavior on both fast and slow systems. He also noted that both the size of the kernel and the time required to boot are increasing over time.
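A fixed-duration benchmark of the kind described might look like the sketch below. Only the `--runtime` flag name comes from the session; the workload, the function names, and the choice of seconds as the unit are assumptions made for illustration.

```python
import argparse
import time


def run_benchmark(runtime, work=lambda: sum(range(1000))):
    """Repeat `work` until `runtime` seconds have passed; return (ops, elapsed)."""
    start = time.monotonic()
    ops = 0
    while time.monotonic() - start < runtime:
        work()
        ops += 1
    return ops, time.monotonic() - start


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Fixed-duration benchmark sketch")
    parser.add_argument("--runtime", type=float, default=1.0,
                        help="how long to run, in seconds (unit is an assumption)")
    # parse_known_args ignores unrelated arguments instead of exiting.
    return parser.parse_known_args(argv)[0]


if __name__ == "__main__":
    args = parse_args()
    ops, elapsed = run_benchmark(args.runtime)
    print(f"{ops} operations in {elapsed:.2f}s ({ops / elapsed:.0f} ops/s)")
```

Because the loop is bounded by time rather than by iteration count, a slow machine simply completes fewer operations in the same wall-clock window, so the test's duration stays predictable across hardware.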
The session ended with general agreement in the room that this work is helpful and welcome.
| Index entries for this article | |
|---|---|
| Kernel | Development tools/Trinity |
| Conference | Kernel Summit/2013 |
