The presentation given by Fengguang Wu on day 1 of the 2012 Kernel
Summit was about testing for build and boot regressions in the Linux
kernel. In the presentation, Fengguang described the test framework that he
has established to detect and report these regressions in a more timely
fashion.
To summarize the problem that Fengguang is trying to resolve, it's
simplest to look at things from the perspective of a maintainer making
periodic kernel releases. The most obvious example is of course the
mainline tree maintained by Linus, which goes through a series of release
candidates on the way to the release of a stable kernel. The
linux-next tree maintained by Stephen Rothwell is another
example. Many other developers depend on these releases. If, for some
reason, those kernel releases don't successfully build and boot, then the
daily work of other kernel developers is impaired while they resolve the
problem.
Of course, Linus and Stephen strive to ensure that these kinds of build
and boot errors don't occur: before making kernel releases, they do local
testing on their development systems, and ensure that the kernel builds,
boots, and runs for them. The problem comes in when one considers the
variety of hardware architectures and configuration options that Linux
provides. No single developer can test all combinations of architectures
and options, which means that, for some combinations, there are inevitably
build and boot errors in the mainline -rc and linux-next
releases. These sorts of regressions appear even in the final releases
performed by Linus; Fengguang noted the results found by Geert
Uytterhoeven, who reported that
(for example) in the Linux 3.4 release, his testing found around 100 build
error messages resulting from regressions. (That figure is somewhat inflated,
because some of the errors occur on obscure platforms that see less
attention. But they include a number of regressions on mainstream platforms
that have the potential to disrupt the work of many kernel developers.)
Furthermore, even when a build problem appears in a series of kernel
commits but is later fixed before a mainline -rc release, this
still creates a problem: developers performing bisects to discover the
causes of other kernel bugs will encounter the build failures during the
bisection process.
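Broken-build commits mid-series are something `git bisect` can route around: a test script run via `git bisect run` can exit with status 125 to mark a commit as untestable, so build failures don't derail the search. The following sketch (purely illustrative, not part of Fengguang's system) demonstrates the mechanism in a throwaway repository where one commit "fails to build":

```shell
#!/bin/sh
# Sketch: skipping unbuildable commits during bisection with exit code 125.
# We simulate a history in a throwaway repository: a good baseline, a commit
# that introduces a bug, a commit that "fails to build", and further work.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.com
git config user.name "Bisect Demo"

echo ok > state;           git add state;       git commit -qm "good: baseline"
echo bad > state;          git add state;       git commit -qm "introduces bug"
echo broken > buildmarker; git add buildmarker; git commit -qm "unbuildable commit"
git rm -q buildmarker;                          git commit -qm "fixes the build"
echo more > other;         git add other;       git commit -qm "more work"

# The per-commit test: exit 125 if the tree doesn't "build", so bisect
# skips it; otherwise exit 0 for good, 1 for bad.
cat > bisect-test.sh <<'EOF'
#!/bin/sh
[ -f buildmarker ] && exit 125   # "build failure": tell bisect to skip
grep -q '^ok$' state             # 0 = good, 1 = bad
EOF
chmod +x bisect-test.sh

git bisect start HEAD HEAD~4 >/dev/null
git bisect run ./bisect-test.sh >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

Here the unbuildable commit is skipped rather than mislabeled, and bisection still converges on the commit that actually introduced the bug.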
As Fengguang noted, the problem is that it takes some time for these
regressions to be detected. By that time, it may be difficult to determine
which kernel change caused the problem and to whom it should be
reported. Many such reports on the kernel mailing list get no response, since it
can be hard to diagnose user-reported problems. Furthermore, the developer
responsible for the problem may have moved on to other activities and may
no longer be "hot" on the details of work that they did quite some time
ago. As a result, there is duplicated effort and lost time as the affected
developers resolve the problems themselves.
According to Fengguang, these sorts of regressions are an inevitable
part of the development process. Even the best of kernel developers may
sometimes fail to test for regressions. When such regressions occur,
the best way to ensure they are resolved is to determine the cause
quickly and accurately, and to promptly notify the responsible
developer.
Fengguang's solution to this problem is an automated system that
detects these regressions and then informs kernel developers by email that
their commit X triggered bug Y. Crucially, the email reports are generated
nearly immediately (1-hour response time) after commits are merged into the
tested repositories. (For this reason, Fengguang calls his system a "0-day
kernel test" system.) Since the relevant developer is informed quickly,
it's more likely they'll be "hot" on the technical details, and able to fix
the problem quickly.
Fengguang's test framework at the Intel Open Source Technology Center
consists of a server farm that includes five build servers (three Sandy
Bridge and two Itanium systems). On these systems, kernels are built inside
chroot jails. The built kernel images are then boot tested inside over 100
KVM instances on another eight test boxes. The system builds and boots
each tested kernel configuration, on a commit-by-commit basis for a range
of kernel configurations. (The system reuses build outputs from previous
commits so as to expedite the build testing. Thus, the build time for the
first commit of an allmodconfig build is typically ten minutes,
but subsequent commits require two minutes to build on average.)
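As a rough illustration (a hypothetical sketch, not Fengguang's actual scripts), the commit-by-commit testing described above amounts to enumerating what a branch gained since the last tested revision and checking out each new commit in turn:

```shell
#!/bin/sh
# Hypothetical sketch: enumerate the commits a branch gained since the
# last tested revision, oldest first, as a per-commit test loop would.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Demo"
for n in 1 2 3 4; do
    echo "$n" > file
    git add file
    git commit -qm "commit $n"
done
last_tested=$(git rev-parse HEAD~3)   # pretend commit 1 was tested already

tested=0
for rev in $(git rev-list --reverse "$last_tested..HEAD"); do
    git checkout -q "$rev"
    # ... here the real system would build and boot-test this revision,
    # reusing the object tree from the previous commit to save time ...
    tested=$((tested + 1))
done
echo "commits tested: $tested"
```

The `--reverse` ordering matters for the build-output reuse the article mentions: building oldest-to-newest means each build is an incremental step from the previous commit rather than a from-scratch rebuild.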
Tests are currently run against Linus's tree, linux-next, and
more than 180 trees owned by individual kernel maintainers and
developers. (Running tests against individual maintainers' trees helps
ensure that problems are fixed before they taint Linus's tree and
linux-next.) Together, these trees produce 40 new branch heads and
400 new commits on an average working day. Each day, the system build
tests 200 of the new commits. (The system allows trees to be categorized as
"rebasable" or "non-rebasable". The latter are usually big subsystem trees
for which the maintainers take responsibility to do bisectability tests
before publishing commits. Rebasable trees are tested on a
commit-by-commit basis. For non-rebasable trees, only the branch head is
built; only if that fails does the system go through the intervening commits
to locate the source of the error. This is why not all 400 of the daily
commits are tested.)
The current machine capacity allows the build test system to test 140
kernel configurations for each commit, as well as to run the sparse and
Coccinelle static checkers. Around
half of these configurations are randconfig, which are regenerated
each day in order to increase test coverage over time.
(randconfig builds the kernel with randomized configuration
options, so as to test unusual kernel configurations.) Most of the
built kernels are boot tested, including the randconfig ones.
Boot tests for the head commits are repeated multiple times to increase the
chance of catching less-reproducible regressions. In the end, 30,000
kernels are boot tested each day. In the process, the system catches 4
new static errors or warnings per day, and 1 boot error every second day.
The response from the kernel developers in the room to this new system
was extremely positive. Andrew Morton noted that he'd received a number of
useful reports from the tool. "All contained good information, and
all corresponded to issues I felt should be fixed." Others echoed
Andrew's comments.
One developer in the room asked what he should do if he has a scratch
branch that is simply too broken to be tested. Fengguang replied that his
build system maintains a blacklist, and specific branches can be added to
that blacklist on request. In addition, a developer can include a line
containing the string Dont-Auto-Build in a commit message; this
causes the build system to skip testing of the whole branch.
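The article doesn't show how the marker is detected; one plausible approach (the detection logic below is an assumption, not Fengguang's code) is for the harness to inspect the branch head's commit message before queueing any builds:

```shell
#!/bin/sh
# Sketch (assumed detection logic): check whether a branch head's commit
# message carries the Dont-Auto-Build marker before queueing builds.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email demo@example.com
git config user.name "Demo"
echo wip > scratch
git add scratch
git commit -qm "WIP: scratch experiments

Dont-Auto-Build"

# A harness would run this against the branch head it is about to test.
if git log -1 --format=%B HEAD | grep -q 'Dont-Auto-Build'; then
    decision=skip
else
    decision=build
fi
echo "decision: $decision"
```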
Many problems in the system have already been fixed as a consequence of
developer feedback: the build test system is fairly mature; the boot test
system is already reasonably usable, but has room for further
improvement. Fengguang is seeking further input from kernel developers on
how his system could be improved. In particular, he is asking kernel
developers for runtime stress and functional test scripts for their
subsystems. (Currently the boot test system runs a limited set of
tools—trinity, xfstests,
and a handful of memory management tests—for catching runtime
regressions.)
Fengguang's system has already clearly had a strong positive impact on
the day-to-day life of kernel developers. With further feedback, the system
is likely to provide even more benefit.