Brief items
The current development kernel is 3.8-rc6, released on February 1. "I have
a CleverPlan(tm) to make *sure* that rc7 will be better and much smaller.
That plan largely depends on me being unreachable for the next week due to
the fact that there is no internet under water." Once he returns from
diving, Linus plans to be very aggressive about accepting only patches that
"fix major security issues, big user-reported regressions, or nasty
oopses". The code name for the release has changed; it is now "Unicycling
Gorilla".
Stable updates:
3.0.62, 3.4.29, and 3.7.6 were released on February 3;
3.2.38 was released on February 6.
Paraphrasing the Alien films: "Under water, nobody can read your
email".
— Linus Torvalds
Tonight’s mainline Linux kernel contains about 100,000 instances of
the keyword “goto”. The most deeply nested use of goto that I could
find is
here,
with a depth of 12. Unfortunately this function is kind of
hideous.
Here’s
a much cleaner example with depth 10.
Here are the goto targets that appear more than 200 times:
out (23228 times)
error (4240 times)
err (4184 times)
fail (3250 times)
done (3179 times)
exit (1825 times)
bail (1539 times)
out_unlock (1219 times)
err_out (1165 times)
out_free (1053 times)
[...]
— John Regehr
diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -93,7 +93,9 @@ includes updates for subsystem X. Please apply."
The maintainer will thank you if you write your patch description in a
form which can be easily pulled into Linux's source code management
-system, git, as a "commit log". See #15, below.
+system, git, as a "commit log". See #15, below. If the maintainer has
+to hand-edit your patch, you owe them the beverage of their choice the
+next time you see them.
— Greg Kroah-Hartman
"a beverage".
Pilsener, please.
— Andrew Morton
At long last, the code implementing RAID 5 and 6 has been merged into an
experimental branch in the Btrfs repository; this is an important step
toward its eventual arrival in the mainline kernel. The initial benchmark
results look good, but there are a few issues yet to be ironed out before
this code can be considered stable. Click below for the announcement,
benchmark information, and some discussion of how higher-level RAID works
in Btrfs. "This does sound quite a lot like MD raid, and that's because it is. By
doing the raid inside of Btrfs, we're able to use different raid levels
for metadata vs data, and we're able to force parity rebuilds when crcs
don't match. Also management operations such as restriping and
adding/removing drives are able to hook into the filesystem
transactions. Longer term we'll be able to skip reads on blocks that
aren't allocated and do other connections between raid56 and the FS
metadata."
Kernel development news
By Jonathan Corbet
February 6, 2013
The kernel's
locking validator (often known
as "lockdep") is one of the community's most useful pro-active debugging
tools. Since its introduction in 2006, it has eliminated most
deadlock-causing bugs
from the system. Given that deadlocks can be extremely difficult
to reproduce and diagnose, the result is a far more reliable kernel and
happier users. There
is a shortage of equivalent tools for user-space programming, despite the
fact that deadlock issues can happen there as well. As it happens, making
lockdep available in user space may be far easier than almost anybody might
have thought.
Lockdep works by adding wrappers around the locking calls in the kernel.
Every time a
specific type of lock is taken or released, that fact is noted, along with
ancillary details like whether the processor was servicing an interrupt at
the time. Lockdep also notes which other locks were already held when the
new lock is taken; that is the key to much of the checking that lockdep is
able to perform.
To illustrate this point, imagine that two threads each need to acquire two
locks, called A and B. If one thread acquires A first while the other grabs
B first, each thread ends up holding one lock while needing the other. When
each thread then goes for the lock it lacks, the system is in trouble: each
thread will wait forever for the other to release the lock it holds, and
the system is deadlocked. Things may not come to this point
often at all; this deadlock requires each thread to acquire its lock at
exactly the wrong time. But, with computers, even highly unlikely events
will come to pass sooner or later, usually at a highly inopportune time.
This situation can be avoided: if both threads adhere to a rule
stating that A must always be acquired before B, this
particular deadlock (called an "AB-BA deadlock" for obvious reasons) cannot
happen. But, in a system with a large number of locks, it is not always
clear what the rules for locking are, much less that they are consistently
followed. Mistakes are easy to make. That is where lockdep comes in:
by tracking the order of lock acquisition, lockdep can raise the
alarm anytime it sees a thread acquire A while already holding
B. No actual deadlock is required to get a "splat" (a report of a
locking problem) out of lockdep,
meaning that even highly unlikely deadlock situations can be found before
they ruin somebody's day. There is no need to wait for that one time when
the timing is exactly wrong to see that there is a problem.
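For the curious, the pattern looks like the following when written with
POSIX mutexes (a standalone illustration, not code from the kernel or from
the patches discussed below): thread_one() takes A then B, while
thread_two() takes B then A. A lockdep-style checker would complain about
the inconsistent ordering even on runs where no deadlock actually occurs.

/* Minimal, self-contained sketch of an AB-BA ordering violation. */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *thread_one(void *arg)
{
    pthread_mutex_lock(&lock_a);        /* A, then B */
    pthread_mutex_lock(&lock_b);
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

static void *thread_two(void *arg)
{
    pthread_mutex_lock(&lock_b);        /* B, then A: inconsistent with thread_one() */
    pthread_mutex_lock(&lock_a);
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, thread_one, NULL);
    pthread_create(&t2, NULL, thread_two, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    puts("finished without deadlocking (this time)");
    return 0;
}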
Lockdep is able to detect more complicated deadlock scenarios than the one
described above. It can also detect related problems, such as locks that
are not interrupt-safe being acquired in interrupt context. As one might
expect, running a kernel with lockdep enabled tends to slow things down
considerably; it is not an option that one would enable on a production
system. But enough developers test with lockdep enabled that most problems
are found before they make their way into a stable kernel release. As a
result, reports of deadlocks on deployed systems are now quite rare.
Kernel-based tools often do not move readily to user space; the kernel's
programming environment differs markedly from a normal C environment, so
kernel code can normally only be expected to run in the kernel itself. In
this case, though, Sasha Levin noticed that there is not much in the
lockdep subsystem that is truly kernel-specific. Lockdep collects data and
builds graphs describing observed lock acquisition patterns; it is code
that could be run in a non-kernel context relatively easily.
So Sasha proceeded to put
together a patch set creating a lockdep
library that is available to programs in user space.
Lockdep does, naturally, call a number of kernel functions, so a big part
of Sasha's patch set is a long list of stub implementations shorting out
calls to functions like local_irq_enable() that have no meaning in
user space. An abbreviated version of struct task_struct is
provided to track threads in user space, and functions like
print_stack_trace() are substituted with user-space equivalents
(backtrace_symbols_fd() in this case). The kernel's internal locks
(those used by lockdep itself) are reimplemented using POSIX thread ("pthread")
mutexes. Stub versions of
the include files used by the lockdep code are provided in a special
directory. And so on. Once all that is
done, the lockdep code can be built directly out of the kernel tree and
turned into a library.
User-space code wanting to take advantage of the lockdep library needs to
start by including <liblockdep/mutex.h>, which, among other
things, adds a set of wrappers around the pthread_mutex_t and
pthread_rwlock_t types and
the functions that work with them. A call to liblockdep_init() is
required; each thread should also make a call to
liblockdep_set_thread() to set up information for any problem
reports. That is about all that is required; programs that are
instrumented in this way will have their pthreads mutex and
rwlock usage checked by lockdep.
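A minimal sketch of an instrumented program might look like the following.
The call names come from the description above, but their exact signatures
(both are shown here taking no arguments) and the build details are
assumptions rather than something spelled out in the patch posting:

/* Sketch only: ordinary pthread calls, with the liblockdep wrappers
 * feeding lock acquisition data to the lockdep engine. */
#include <pthread.h>
#include <liblockdep/mutex.h>   /* wraps pthread_mutex_t/pthread_rwlock_t and friends */

static pthread_mutex_t lock;

static void *worker(void *arg)
{
    liblockdep_set_thread();    /* per-thread setup for problem reports (signature assumed) */
    pthread_mutex_lock(&lock);  /* normal pthread usage; ordering is now checked */
    /* ... critical section ... */
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t;

    liblockdep_init();          /* one-time library initialization (signature assumed) */
    pthread_mutex_init(&lock, NULL);
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);
    return 0;
}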
As a proof of concept, the patch adds instrumentation to the (thread-based)
perf tool contained within the kernel source tree.
One of the key aspects of Sasha's patch is that it requires no changes to
the in-kernel lockdep code at all. The user-space lockdep library can be
built directly out of the kernel tree. Among other things, that means that
any future lockdep fixes and enhancements will automatically become
available to user space with no additional effort required on the
kernel developers' part.
In summary, this patch looks like a significant win for everybody involved;
it is thus not surprising that opposition to its inclusion has been hard to
find. There has been a call for some
better documentation, explicit mention that the resulting user-space
library is GPL-licensed, and a runtime toggle for lock validation (so that
the library could be built into applications but not actually track locking
unless requested). Such
details should not be hard to fill in, though. So, with luck, user space
should have access to lockdep in the near future, resulting in more
reliable lock usage.
By Jonathan Corbet
February 6, 2013
The kernel's "IDR" layer is a curious beast. Its job is conceptually
simple: it is charged with the allocation of integer ID numbers used with
device names, POSIX timers, and more. The implementation is somewhat less
than simple, though, for a straightforward reason: IDR functions are often
called from performance-critical code paths and must be able to work in
atomic context. These constraints, plus some creative programming, have
led to one of the stranger subsystem APIs in the kernel. If Tejun Heo has
his way, though, things will become rather less strange in the future —
though at least one reviewer disagrees with that conclusion.
Strangeness notwithstanding, the IDR API has changed little since it was documented here in 2004. One includes
<linux/idr.h>, allocates an idr structure, and
initializes it with idr_init(). Thereafter, allocating a new
integer ID and binding it to an internal structure is a matter of calling
these two functions:
int idr_pre_get(struct idr *idp, gfp_t gfp_mask);
int idr_get_new(struct idr *idp, void *ptr, int *id);
The call to idr_pre_get() should happen outside of atomic context;
its purpose is to perform all the memory allocations necessary to ensure
that the following call to idr_get_new() (which returns the newly
allocated ID number and associates it with the given ptr) is able
to succeed. The
latter call can then happen in atomic context, a feature needed by many IDR
users.
There is just one little problem with this interface, as Tejun points out
in the introduction to his patch set: the
call to idr_get_new() can still fail. So code using the IDR layer
cannot just ask for a new ID; it must, instead, execute a loop that retries
the allocation until it either succeeds or returns a failure code other than
-EAGAIN. That leads to the inclusion of a lot of
error-prone boilerplate code in well over 100 call sites in the kernel; the
2004 article and Tejun's patch both contain
examples of what this code looks like.
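The pattern typically looks something like this sketch, with my_lock
standing in for whatever lock a given caller uses to protect its idr
structure:

    /* representative sketch of the old allocation pattern */
    int get_new_id(struct idr *idp, void *ptr, int *id)
    {
        int ret;

    again:
        if (idr_pre_get(idp, GFP_KERNEL) == 0)
            return -ENOMEM;     /* the preallocation itself failed */

        spin_lock(&my_lock);
        ret = idr_get_new(idp, ptr, id);
        spin_unlock(&my_lock);

        if (ret == -EAGAIN)
            goto again;         /* preallocated memory was consumed elsewhere */
        return ret;
    }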
Failure can happen for a number of reasons, but the most likely cause is
tied to the fact that the memory preallocated by idr_pre_get() is
a global resource. A call to idr_pre_get() simply ensures that a
minimal amount of memory is available; calling it twice will not increase
the amount of preallocated memory. So, if two processors simultaneously call
idr_pre_get(), the amount of memory allocated will be the same as
if only one processor had made that call. The first processor to call
idr_get_new() may then consume all of that memory, leaving nothing
for the second caller. That second caller will then be forced to drop out
of atomic context and execute
the retry loop — a code path that is unlikely to have been well tested by
the original developer.
Tejun's response is to change the API, basing it on three new functions:
void idr_preload(gfp_t gfp_mask);
int idr_alloc(struct idr *idp, void *ptr, int start, int end, gfp_t gfp_mask);
void idr_preload_end(void);
As with idr_pre_get(), the new idr_preload() function is
charged with allocating the memory necessary to satisfy the next allocation
request. There are some interesting differences, though. The attentive
reader will note that there is no struct idr argument to
idr_preload(),
suggesting that the preallocated memory is not associated with any
particular ID number space. It is, instead, stored in a single per-CPU
array. Since this memory is allocated for the current CPU, it is not
possible for any other processor to slip in and steal it — at least, not if
the current thread is not preempted. For that reason,
idr_preload() also disables preemption. Given that, the existence
of the new idr_preload_end() function is easy to explain: it is
there to re-enable preemption once the allocation has been performed.
A call to idr_alloc() will actually allocate an integer ID. It
accepts upper and lower bounds for that ID to accommodate code that can
only cope with
a given range of numbers — code that uses the ID as an array index, for
example. If need be, it will attempt to allocate memory using the given
gfp_mask. Such allocations will be unnecessary if
idr_preload() has been called but, with the new interface,
preallocation is now optional rather than required. So code that can call
idr_alloc() from process context can dispense with the
idr_preload() and idr_preload_end() calls altogether.
Either way, the only way
idr_alloc() will fail is with a hard memory allocation failure;
there is no longer any need to put a loop around allocation attempts. As a
result, Tejun's 62-part patch set, touching 78 files, results in the net
deletion of a few hundred lines of code.
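With the new interface, the retry loop above collapses into something like
the following sketch. Once again my_lock stands in for the caller's own
locking; the start and end values of zero are assumed here to mean "any
available ID", and idr_alloc() is assumed to return the new ID on success
or a negative error code on failure:

    /* sketch of an allocation using the proposed interface */
    int get_new_id(struct idr *idp, void *ptr)
    {
        int id;

        idr_preload(GFP_KERNEL);    /* preallocate per-CPU memory; disables preemption */
        spin_lock(&my_lock);
        id = idr_alloc(idp, ptr, 0, 0, GFP_NOWAIT);
        spin_unlock(&my_lock);
        idr_preload_end();          /* re-enable preemption */

        return id;
    }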
Most of the developers whose code was changed by Tejun's patch set
responded with simple Acked-by lines. Eric Biederman, though, didn't like the API; he said "When
reading code with idr_preload I get this deep down creepy feeling. What is
this magic that is going on?" As can be seen in Tejun's response, one developer's magic is
another's straightforward per-CPU technique. As of this writing, that
particular discussion has not reached any sort of public resolution. Your
editor would predict, though, that the simplification of this heavily-used
API will be sufficiently compelling that most developers will be able to
get past any resulting creepy feelings. So the IDR API may be changing in
a mainline kernel in the not-too-distant future.
By Michael Kerrisk
February 6, 2013
The Linux kernel developers have long been aware of the need for better
testing of the kernel. That testing can take many forms, including testing for performance regressions and testing
for build and boot regressions.
As the term suggests, regression testing is concerned with detecting cases
where a new kernel version causes problems in code or
features that already existed in previous versions of the kernel.
Of course, each new kernel release also adds new features. The Trinity fuzz tester
is a tool that aims to improve testing of one class of new (and existing)
features: the system call interfaces that the kernel presents to user
space.
Insufficient testing of new user-space interfaces is a long-standing issue in kernel
development. Historically, it has been quite common that significant bugs
are found in new interfaces only a considerable time after those interfaces
appear in a stable kernel—examples include epoll_ctl(),
kill(),
signalfd(),
and utimensat().
The problem is that, typically, a new interface is tested
by only one person (the developer of the feature) or at most a handful
of people who have a close interest in the interface. A common problem that
occurs when developers write their own tests is a bias toward tests which
confirm that expected inputs produce expected results. Often, of
course, bugs are found when software is used in unexpected ways that test
little-used code paths.
Fuzz testing is
a technique that aims to reverse this testing bias. The general idea is to
provide unexpected inputs to the software being tested, in the form of
random (or semi-random) values. Fuzz testing has two obvious
benefits. First, employing unexpected inputs means that rarely used code
paths are tested. Second, the generation of random inputs and the tests
themselves can be fully automated, so that a large number of tests can be
quickly performed.
History
Fuzz testing has a
history that stretches back to at least the 1980s, when fuzz testers
were used to test command-line utilities. The history of system call fuzz
testing is nearly as long.
During his talk at linux.conf.au 2013 [ogv video, mp4 video], Dave Jones, the developer of
Trinity, noted that the earliest
system call fuzz tester that he had heard of was Tsys, which was
created around 1991 for System V Release 4. Another early example was a fuzz
tester [postscript] developed at the University of Wisconsin in the
mid-1990s that was run against a variety of kernels, including Linux.
Tsys was an example of a "naïve" fuzz tester: it simply generated random
bit patterns, placed them in appropriate registers, and then executed a
system call. About a decade later, the kg_crashme tool was developed to
perform fuzz testing on Linux. Like Tsys, kg_crashme was a naïve fuzz
tester.
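The essence of such a tool fits in a few lines; this sketch (not the actual
Tsys or kg_crashme code) simply invokes a random system call number with
random arguments, forever:

/* minimal sketch of a naive system call fuzzer
 * (obviously, do not run this on a machine you care about) */
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

int main(void)
{
    srand(time(NULL));
    for (;;) {
        long nr = rand() % 512;     /* arbitrary upper bound on system call numbers */
        syscall(nr, rand(), rand(), rand(), rand(), rand(), rand());
    }
}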
Naïve fuzz testers are capable of finding some kernel bugs, but the use of purely
random inputs greatly limits their efficacy. To see why this is, we can
take the example of the madvise() system call, which allows a
process to advise the kernel about how it expects to use a region of
memory. This system call has the following prototype:
int madvise(void *addr, size_t length, int advice);
madvise() places certain constraints on its arguments:
addr must be a page-aligned memory address, length must
be non-negative, and advice
must be one of a limited set of small integer values. When any of these
constraints is violated, madvise() fails with the error
EINVAL. Many other system calls impose analogous checks on their
arguments.
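Those checks are easy to see in action; in this small illustrative program,
both madvise() calls fail with EINVAL:

/* demonstrating the EINVAL constraints described above */
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return 1;

    if (madvise((char *)p + 1, 64, MADV_NORMAL) == -1)  /* unaligned address */
        printf("unaligned address: %s\n", strerror(errno));
    if (madvise(p, 64, 12345) == -1)                    /* out-of-range advice */
        printf("bogus advice:      %s\n", strerror(errno));
    return 0;
}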
A naïve fuzz tester that simply passes random bit patterns to
the arguments of madvise() will,
almost always, perform uninteresting tests that fail with the (expected)
error EINVAL. As well as wasting time, such naïve testing reduces
the chances of generating a more interesting test input that reveals an
unexpected error.
Thus, a few projects started in the mid-2000s with the aim of bringing
more sophistication to the fuzz-testing process. One of these projects,
Dave's scrashme, was started in 2006. Work on that project languished for a
few years, and only picked up momentum starting in late 2010, when Dave
began to devote significantly more time to its development. In December
2010, scrashme was renamed Trinity. At around the same time, another quite
similar tool, iknowthis,
was also developed at Google.
Intelligent fuzz testing
Trinity performs intelligent fuzz testing by incorporating specific
knowledge about each system call that is tested. The idea is to reduce the
time spent running "useless" tests, thereby reaching deeper into the tested
code and increasing the chances of testing a more interesting case that may
result in an unexpected error. Thus, for example, rather than passing
random values to the advice argument of madvise(),
Trinity will pass one of the values expected for that argument.
Likewise, rather than passing random bit patterns to address arguments,
Trinity will restrict the bit pattern so that, much of the
time, the supplied address is page aligned. However, some system
calls that accept address arguments don't require page-aligned
addresses. Thus, when generating a random address for testing, Trinity will
also favor the creation of "interesting" addresses, for example, an address
that is off a page boundary by the value of sizeof(char) or
sizeof(long). Addresses such as these are likely candidates for
"off by one" errors in the kernel code.
In addition, many system calls that expect a
memory address require that address to point to memory that is actually
mapped. If there is no mapping at the given address, then these system
calls fail (the typical error is ENOMEM or EFAULT). Of
course, in the large address space available on modern 64-bit
architectures, most of the address space is unmapped, so that even if a
fuzz tester always generated page-aligned addresses, most of the resulting
tests would be wasted on producing the same uninteresting error. Thus,
when supplying a memory address to a system call, Trinity will favor
addresses for existing mappings. Again, in the interests of triggering
unexpected errors, Trinity will pass the addresses of "interesting"
mappings, for example, the address of a page containing all zeros or all
ones, or the starting address at which the kernel is mapped.
In order to bring intelligence to its tests, Trinity must have some
understanding of the arguments for each system call. This is accomplished
by defining structures that annotate each system call. For example, the
annotation file for madvise() includes the following lines:
struct syscall syscall_madvise = {
    .name = "madvise",
    .num_args = 3,
    .arg1name = "start",
    .arg1type = ARG_NON_NULL_ADDRESS,
    .arg2name = "len_in",
    .arg2type = ARG_LEN,
    .arg3name = "advice",
    .arg3type = ARG_OP,
    .arg3list = {
        .num = 12,
        .values = { MADV_NORMAL, MADV_RANDOM, MADV_SEQUENTIAL, MADV_WILLNEED,
                    MADV_DONTNEED, MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK,
                    MADV_MERGEABLE, MADV_UNMERGEABLE, MADV_HUGEPAGE, MADV_NOHUGEPAGE },
    },
    ...
};
This annotation describes the names and types of each of the three
arguments that the system call accepts. For example, the first argument is
annotated as ARG_NON_NULL_ADDRESS, meaning that Trinity should
provide an intelligently selected, semi-random, nonzero address for this
argument. The last argument is annotated as ARG_OP, meaning that
Trinity should randomly select one of the values in the corresponding list
(the MADV_* values above).
The second madvise() argument is annotated ARG_LEN,
meaning that it is the length of a memory buffer. Again, rather than
passing purely random values to such arguments, Trinity attempts to
generate "interesting" numbers that are more likely to trigger errors—for
example, a value whose least significant bits are
0xfff might find an off-by-one error in the logic of some system call.
Trinity also understands a range of other annotations, including
ARG_RANDOM_INT, ARG_ADDRESS (an address that can be
zero), ARG_PID (a process ID), ARG_LIST (for bit masks
composed by logically ORing values randomly selected from a specified
list), ARG_PATHNAME, and ARG_IOV (a
struct iovec of the kind passed to system calls such as
readv()). In each case, Trinity uses the annotation to generate a
better-than-random test value that is more likely to trigger an unexpected
error. Another interesting annotation is ARG_FD, which causes
Trinity to pass an open file descriptor to the tested system call. For this
purpose, Trinity opens a variety of file descriptors, including descriptors
for pipes, network sockets, and files in locations such as /dev,
/proc, and /sys. The open file descriptors are randomly
passed to system calls that expect descriptors. By now, it might start to
become clear that you don't want to run Trinity on a system that has the
only copy of your family photo albums.
In addition to annotations, each system call can optionally have a
sanitise routine (Dave's code employs the British
spelling) that performs further fine-tuning of the arguments for the
system call. The sanitise routine can be used to construct arguments that
require special values (e.g., structures) or to correctly initialize the
values in arguments that are interdependent. It can also be
used to ensure that an argument has a value that won't cause an expected
error. For example, the sanitise routine for the madvise() system
call is as follows:
static void sanitise_madvise(int childno)
{
    shm->a2[childno] = rand() % page_size;
}
This ensures that the second (length) argument given to
madvise() will be no larger than the page size, preventing the
ENOMEM error that would commonly result when a large length
value causes madvise() to touch an unmapped area of
memory. Obviously, this means that the tests will never exercise the case where
madvise() is applied to regions larger than one page. This
particular sanitise routine could be improved by sometimes
allowing length values that are larger than the page size.
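A hypothetical variant along those lines, reusing the shm->a2[childno] and
page_size names from the routine above, might read:

    /* hypothetical improvement: usually stay within one page, but
       occasionally allow a length spanning several pages */
    static void sanitise_madvise(int childno)
    {
        if (rand() % 4)
            shm->a2[childno] = rand() % page_size;
        else
            shm->a2[childno] = (1 + rand() % 16) * page_size + rand() % page_size;
    }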
Running trinity
The Trinity home
page has links to the Git repository as well as to the latest stable
release (Trinity 1.1, which was released in
January 2013). Compilation from source is straightforward; then Trinity can
be invoked with a command line as simple as:
$ ./trinity
With no arguments, the program repeatedly tests
randomly chosen system calls. It is also possible to test selected system
calls using one or more instances of the -c command-line
option. This can be especially useful when testing new system calls.
Thus, for example, one could test just the madvise() system call
using the following command:
$ ./trinity -c madvise
In order to perform its work, the trinity program creates a
number of processes.
The main process performs various initializations (e.g.,
opening the file descriptors and creating the memory mappings used for
testing) and then kicks off a number (default: four) of child processes
that perform the system call tests. A shared memory region (created by the
initial trinity process) is used to record various pieces of
global information, such as open file descriptor numbers, total system
calls performed, and number of system calls that succeeded and failed. The
shared memory region also records various information about each of the
child processes, including the PID, and the system call number and
arguments for the system call that is currently being executed as well as
the system call that was previously executed.
The watchdog process ensures that the test system is still
working correctly. It checks that the children are progressing (they may be
blocked in a system call), and kills them if they are not; when the
main process detects that one of its children has terminated
(because the watchdog killed it, or for some other reason), it
starts a new child process to replace it. The watchdog also
monitors the integrity of the memory region that is shared between the
processes, in case some operation performed by one of the children has
corrupted the region.
Each of the child processes writes to a separate log file, recording
the system calls that it performs and the return values of those system
calls. The file is synced just before each system call is performed, so
that if the system panics, it should be possible to determine the cause of
the panic by looking at the last recorded system call in each of the log
files. The log file contains lines such as the following, which show the
PID of the child process, a sequential test number, and the system call
arguments and result:
[17913] [0] mmap(addr=0, len=4096, prot=4, flags=0x40031, fd=-1, off=0) = -1 (Invalid argument)
[17913] [1] mmap(addr=0, len=4096, prot=1, flags=0x25821, fd=-1, off=0x80000000) = -541937664
[17913] [2] madvise(start=0x7f59dff7b000, len_in=3505, advice=10) = 0
...
[17913] [6] mmap(addr=0, len=4096, prot=12, flags=0x20031, fd=-1, off=0) = -1 (Permission denied)
...
[17913] [21] mmap(addr=0, len=4096, prot=8, flags=0x5001, fd=181, off=0) = -1 (No such device)
Trinity can be used in a number of ways. One possibility is simply to
leave it running until it triggers a kernel panic and then look at the
child logs and the system log in order to discover the cause of the
panic. Dave has sometimes left systems running for hours or days in order
to discover such failures. New system calls can be exercised using the
-c command-line option described above. Another possible use is to
discover unexpected (or undocumented) failure modes of existing system
calls: suitable scripting on the log files can be used to obtain summaries
of the various failures of a particular system call.
Yet another way of using the trinity program is with the
-V (victim files) option. This option takes a directory argument:
the program will randomly open files in that directory and pass the
resulting file descriptors to system calls. This can be useful for
discovering failure modes in a particular filesystem type. For example,
specifying an NFS mount point as the directory argument would exercise
NFS. The -V flag can also be used to perform a limited kind of
testing of user-space programs. During his linux.conf.au
presentation, Dave demonstrated the use of the following command:
$ ./trinity -V /bin -c execve
This command has the effect of executing random programs in /bin with random
string arguments. Looking at the system log revealed a large number of
programs that crashed with a segmentation fault when given unexpected arguments.
Results
Trinity has been rather successful at finding bugs. Dave
reports that he himself found more than 150 bugs in 2012, and many more were
found by other people using Trinity. Trinity usually finds bugs in
new code quite quickly. It tends to find the same bugs repeatedly, so that
in order to find other bugs, it is probably necessary to fix the already
discovered bugs first.
Interestingly, Trinity has found bugs not just in system call code. Bugs have
been discovered in many other parts of the kernel, including the networking
stack, virtual memory code, and drivers. Trinity has found many error-path
memory leaks and cases where system call error paths failed to release kernel locks. In
addition, it has discovered a number of pieces of kernel code that had poor
test coverage or indeed no testing at all. The oldest bug that Trinity has
so far found dates back to 1996.
Limitations and future work
Although Trinity is already quite an effective tool for finding bugs,
there is scope for a lot more work to make it even better. An ongoing task
is to add support for new system calls and new system call flags as they
are added to the kernel. Only about ten percent of system calls currently
have sanitise routines. Probably many other system calls could do with
sanitise routines so that tests would get deeper into the code of those
system calls without triggering the same common and expected errors.
Trinity supports many network protocols, but that support could be further
improved and there are other networking protocols for which support could
be added.
Some system calls are annotated with an AVOID_SYSCALL flag,
which tells Trinity to avoid testing that system call. (The --list
option causes Trinity to display a list of the system calls that it knows
about, and indicates those system calls that are annotated with
AVOID_SYSCALL.) In some cases, a system call is avoided because it
is uninteresting to test—for example, system calls such as
fork() have no arguments to fuzz and exit() would simply
terminate the testing process. Some other system calls would interfere with
the operation of Trinity itself—examples include close(),
which would randomly close test file descriptors used by child processes,
and nanosleep(), which might put a child process to sleep for a
long time.
However, there are other system calls such as ptrace() and
munmap() that are currently marked with AVOID_SYSCALL,
but which probably could be candidates for testing by adding more
intelligence to Trinity. For example, munmap() is avoided because
it can easily unmap mappings that are needed for the child to
execute. However, if Trinity added some bookkeeping code that recorded
better information about the test mappings that it creates, then (only)
those mappings could be supplied in tests of munmap(), without
interfering with other mappings needed by the child processes.
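One hypothetical shape for that bookkeeping (none of these names exist in
Trinity as described; this is only a sketch of the idea):

    /* hypothetical table of mappings created by Trinity itself, so that
       munmap() tests could be restricted to them */
    struct test_mapping {
        void *addr;
        size_t len;
    };

    #define MAX_TEST_MAPPINGS 64
    static struct test_mapping test_mappings[MAX_TEST_MAPPINGS];
    static int num_test_mappings;

    /* called whenever Trinity creates a mapping intended for testing */
    static void record_test_mapping(void *addr, size_t len)
    {
        if (num_test_mappings < MAX_TEST_MAPPINGS) {
            test_mappings[num_test_mappings].addr = addr;
            test_mappings[num_test_mappings].len = len;
            num_test_mappings++;
        }
    }

A sanitise routine for munmap() could then pick an entry from this table at
random, leaving the mappings that the child process itself depends on
untouched.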
Currently, Trinity randomly invokes system calls. Real programs demonstrate
common patterns for making system calls—for example,
opening, reading, and closing a file. Dave would like to add test support for
these sorts of commonly occurring patterns.
An area where Trinity currently provides poor coverage is the
multiplexing ioctl() system call, "the worst interface known
to man". The problem is that ioctl() is really a mass of
system calls masquerading as a single API. The first argument is a file
descriptor referring to a device or another file type, the second argument is
a request type that depends on the type of file or device referred to by
the first argument, and the data type of the third argument depends on the
request type. To achieve good test support for ioctl() would
require annotating each of the request types to ensure that it is
associated with the right type of file descriptor and the right data type
for the third argument. There is an almost limitless supply of work here,
since there are hundreds of request types; thus, in the first instance, this
work would probably be limited to supporting a subset of more interesting
request types.
There are a number of other improvements that Dave would like to see in
Trinity; the source code tarball contains a lengthy TODO
file. Among these improvements are better support for "destructors" in the
system call handling code, so that Trinity does not leak memory, and
support for invoking (some) system calls as root. More generally,
Trinity's ability to find further kernel bugs is virtually limitless: it
simply requires adding ever more intelligence to each of its tests.