
Kernel development

Brief items

Kernel release status

The current development kernel is 3.8-rc6, released on February 1. "I have a CleverPlan(tm) to make *sure* that rc7 will be better and much smaller. That plan largely depends on me being unreachable for the next week due to the fact that there is no internet under water." Once he returns from diving, Linus plans to be very aggressive about accepting only patches that "fix major security issues, big user-reported regressions, or nasty oopses". The code name for the release has changed; it is now "Unicycling Gorilla".

Stable updates: 3.0.62, 3.4.29, and 3.7.6 were released on February 3; 3.2.38 was released on February 6.


Quotes of the week

Paraphrasing the Alien films: "Under water, nobody can read your email".
Linus Torvalds

Tonight’s mainline Linux kernel contains about 100,000 instances of the keyword “goto”. The most deeply nested use of goto that I could find is here, with a depth of 12. Unfortunately this function is kind of hideous. Here’s a much cleaner example with depth 10.

Here are the goto targets that appear more than 200 times:

out (23228 times)
error (4240 times)
err (4184 times)
fail (3250 times)
done (3179 times)
exit (1825 times)
bail (1539 times)
out_unlock (1219 times)
err_out (1165 times)
out_free (1053 times)
[...]

John Regehr

diff --git a/Documentation/SubmittingPatches b/Documentation/SubmittingPatches
--- a/Documentation/SubmittingPatches
+++ b/Documentation/SubmittingPatches
@@ -93,7 +93,9 @@ includes updates for subsystem X.  Please apply."
 
 The maintainer will thank you if you write your patch description in a
 form which can be easily pulled into Linux's source code management
-system, git, as a "commit log".  See #15, below.
+system, git, as a "commit log".  See #15, below.  If the maintainer has
+to hand-edit your patch, you owe them the beverage of their choice the
+next time you see them.
Greg Kroah-Hartman

"a beverage".

Pilsener, please.

Andrew Morton


RAID 5/6 code merged into Btrfs

At long last, the code implementing RAID 5 and 6 has been merged into an experimental branch in the Btrfs repository; this is an important step toward its eventual arrival in the mainline kernel. The initial benchmark results look good, but there are a few issues yet to be ironed out before this code can be considered stable. Click below for the announcement, benchmark information, and some discussion of how higher-level RAID works in Btrfs. "This does sound quite a lot like MD raid, and that's because it is. By doing the raid inside of Btrfs, we're able to use different raid levels for metadata vs data, and we're able to force parity rebuilds when crcs don't match. Also management operations such as restriping and adding/removing drives are able to hook into the filesystem transactions. Longer term we'll be able to skip reads on blocks that aren't allocated and do other connections between raid56 and the FS metadata."


Kernel development news

User-space lockdep

By Jonathan Corbet
February 6, 2013
The kernel's locking validator (often known as "lockdep") is one of the community's most useful pro-active debugging tools. Since its introduction in 2006, it has eliminated most deadlock-causing bugs from the system. Given that deadlocks can be extremely difficult to reproduce and diagnose, the result is a far more reliable kernel and happier users. There is a shortage of equivalent tools for user-space programming, despite the fact that deadlock issues can happen there as well. As it happens, making lockdep available in user space may be far easier than almost anybody might have thought.

Lockdep works by adding wrappers around the locking calls in the kernel. Every time a specific type of lock is taken or released, that fact is noted, along with ancillary details like whether the processor was servicing an interrupt at the time. Lockdep also notes which other locks were already held when the new lock is taken; that is the key to much of the checking that lockdep is able to perform.

To illustrate this point, imagine that two threads each need to acquire two locks, called A and B:

[Cheesy lock diagram]

If one thread acquires A first while the other grabs B first, the situation might look something like this:

[Cheesy lock diagram]

Now, when each thread goes for the lock it lacks, the system is in trouble:

[Cheesy lock diagram]

Each thread will now wait forever for the other to release the lock it holds; the system is now deadlocked. Things may not come to this point often at all; this deadlock requires each thread to acquire its lock at exactly the wrong time. But, with computers, even highly unlikely events will come to pass sooner or later, usually at a highly inopportune time.

This situation can be avoided: if both threads adhere to a rule stating that A must always be acquired before B, this particular deadlock (called an "AB-BA deadlock" for obvious reasons) cannot happen. But, in a system with a large number of locks, it is not always clear what the rules for locking are, much less that they are consistently followed. Mistakes are easy to make. That is where lockdep comes in: by tracking the order of lock acquisition, lockdep can raise the alarm anytime it sees a thread acquire A while already holding B. No actual deadlock is required to get a "splat" (a report of a locking problem) out of lockdep, meaning that even highly unlikely deadlock situations can be found before they ruin somebody's day. There is no need to wait for that one time when the timing is exactly wrong to see that there is a problem.

Lockdep is able to detect more complicated deadlock scenarios than the one described above. It can also detect related problems, such as locks that are not interrupt-safe being acquired in interrupt context. As one might expect, running a kernel with lockdep enabled tends to slow things down considerably; it is not an option that one would enable on a production system. But enough developers test with lockdep enabled that most problems are found before they make their way into a stable kernel release. As a result, reports of deadlocks on deployed systems are now quite rare.

Kernel-based tools often do not move readily to user space; the kernel's programming environment differs markedly from a normal C environment, so kernel code can normally only be expected to run in the kernel itself. In this case, though, Sasha Levin noticed that there is not much in the lockdep subsystem that is truly kernel-specific. Lockdep collects data and builds graphs describing observed lock acquisition patterns; it is code that could be run in a non-kernel context relatively easily. So Sasha proceeded to put together a patch set creating a lockdep library that is available to programs in user space.

Lockdep does, naturally, call a number of kernel functions, so a big part of Sasha's patch set is a long list of stub implementations shorting out calls to functions like local_irq_enable() that have no meaning in user space. An abbreviated version of struct task_struct is provided to track threads in user space, and functions like print_stack_trace() are substituted with user-space equivalents (backtrace_symbols_fd() in this case). The locks that lockdep uses internally are reimplemented using POSIX thread ("pthread") mutexes. Stub versions of the include files used by the lockdep code are provided in a special directory. And so on. Once all that is done, the lockdep code can be built directly out of the kernel tree and turned into a library.

User-space code wanting to take advantage of the lockdep library needs to start by including <liblockdep/mutex.h>, which, among other things, adds a set of wrappers around the pthread_mutex_t and pthread_rwlock_t types and the functions that work with them. A call to liblockdep_init() is required; each thread should also make a call to liblockdep_set_thread() to set up information for any problem reports. That is about all that is required; programs that are instrumented in this way will have their pthreads mutex and rwlock usage checked by lockdep.
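Based only on the calls named above, an instrumented program might look roughly like the following sketch. It will not build without Sasha's library, and everything beyond the <liblockdep/mutex.h> include, liblockdep_init(), and liblockdep_set_thread() is ordinary pthreads code; the inverted locking in worker() is the sort of thing the wrapped mutexes should report.

```c
#include <liblockdep/mutex.h>	/* wraps pthread_mutex_t and friends */
#include <pthread.h>

static pthread_mutex_t a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t b = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
	liblockdep_set_thread();	/* per-thread setup for problem reports */
	pthread_mutex_lock(&b);
	pthread_mutex_lock(&a);		/* inverted order: lockdep should splat */
	pthread_mutex_unlock(&a);
	pthread_mutex_unlock(&b);
	return NULL;
}

int main(void)
{
	pthread_t t;

	liblockdep_init();		/* one-time library initialization */
	liblockdep_set_thread();

	pthread_mutex_lock(&a);
	pthread_mutex_lock(&b);		/* establishes the order: a before b */
	pthread_mutex_unlock(&b);
	pthread_mutex_unlock(&a);

	pthread_create(&t, NULL, worker, NULL);
	pthread_join(t, NULL);
	return 0;
}
```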

As a proof of concept, the patch adds instrumentation to the (thread-based) perf tool contained within the kernel source tree.

One of the key aspects of Sasha's patch is that it requires no changes to the in-kernel lockdep code at all. The user-space lockdep library can be built directly out of the kernel tree. Among other things, that means that any future lockdep fixes and enhancements will automatically become available to user space with no additional effort required on the kernel developers' part.

In summary, this patch looks like a significant win for everybody involved; it is thus not surprising that opposition to its inclusion has been hard to find. There has been a call for some better documentation, explicit mention that the resulting user-space library is GPL-licensed, and a runtime toggle for lock validation (so that the library could be built into applications but not actually track locking unless requested). Such details should not be hard to fill in, though. So, with luck, user space should have access to lockdep in the near future, resulting in more reliable lock usage.


A simplified IDR API

By Jonathan Corbet
February 6, 2013
The kernel's "IDR" layer is a curious beast. Its job is conceptually simple: it is charged with the allocation of integer ID numbers used with device names, POSIX timers, and more. The implementation is somewhat less than simple, though, for a straightforward reason: IDR functions are often called from performance-critical code paths and must be able to work in atomic context. These constraints, plus some creative programming, have led to one of the stranger subsystem APIs in the kernel. If Tejun Heo has his way, though, things will become rather less strange in the future — though at least one reviewer disagrees with that conclusion.

Strangeness notwithstanding, the IDR API has changed little since it was documented here in 2004. One includes <linux/idr.h>, allocates an idr structure, and initializes it with idr_init(). Thereafter, allocating a new integer ID and binding it to an internal structure is a matter of calling these two functions:

    int idr_pre_get(struct idr *idp, gfp_t gfp_mask);
    int idr_get_new(struct idr *idp, void *ptr, int *id);

The call to idr_pre_get() should happen outside of atomic context; its purpose is to perform all the memory allocations necessary to ensure that the following call to idr_get_new() (which returns the newly allocated ID number and associates it with the given ptr) is able to succeed. The latter call can then happen in atomic context, a feature needed by many IDR users.

There is just one little problem with this interface, as Tejun points out in the introduction to his patch set: the call to idr_get_new() can still fail. So code using the IDR layer cannot just ask for a new ID; it must, instead, execute a loop that retries the allocation until it either succeeds or returns a failure code other than -EAGAIN. That leads to the inclusion of a lot of error-prone boilerplate code in well over 100 call sites in the kernel; the 2004 article and Tejun's patch both contain examples of what this code looks like.

Failure can happen for a number of reasons, but the most likely cause is tied to the fact that the memory preallocated by idr_pre_get() is a global resource. A call to idr_pre_get() simply ensures that a minimal amount of memory is available; calling it twice will not increase the amount of preallocated memory. So, if two processors simultaneously call idr_pre_get(), the amount of memory allocated will be the same as if only one processor had made that call. The first processor to call idr_get_new() may then consume all of that memory, leaving nothing for the second caller. That second caller will then be forced to drop out of atomic context and execute the retry loop — a code path that is unlikely to have been well tested by the original developer.

Tejun's response is to change the API, basing it on three new functions:

    void idr_preload(gfp_t gfp_mask);
    int idr_alloc(struct idr *idp, void *ptr, int start, int end, gfp_t gfp_mask);
    void idr_preload_end(void);

As with idr_pre_get(), the new idr_preload() function is charged with allocating the memory necessary to satisfy the next allocation request. There are some interesting differences, though. The attentive reader will note that there is no struct idr argument to idr_preload(), suggesting that the preallocated memory is not associated with any particular ID number space. It is, instead, stored in a single per-CPU array. Since this memory is allocated for the current CPU, it is not possible for any other processor to slip in and steal it — at least, not if the current thread is not preempted. For that reason, idr_preload() also disables preemption. Given that, the existence of the new idr_preload_end() function is easy to explain: it is there to re-enable preemption once the allocation has been performed.

A call to idr_alloc() will actually allocate an integer ID. It accepts upper and lower bounds for that ID to accommodate code that can only cope with a given range of numbers — code that uses the ID as an array index, for example. If need be, it will attempt to allocate memory using the given gfp_mask. Allocations will be unnecessary if idr_preload() has been called, but, with the new interface, preallocation is no longer necessary. So code that can call idr_alloc() from process context can dispense with the idr_preload() and idr_preload_end() calls altogether. Either way, the only way idr_alloc() will fail is with a hard memory allocation failure; there is no longer any need to put a loop around allocation attempts. As a result, Tejun's 62-part patch set, touching 78 files, results in the net deletion of a few hundred lines of code.
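Per the prototypes above, a call site that must allocate in atomic context reduces to something like the following kernel-style sketch (my_idr, my_ptr, and the surrounding error handling are illustrative, and this fragment is not compilable outside the kernel):

```c
	int id;

	idr_preload(GFP_KERNEL);	/* may sleep; disables preemption */
	/* ... enter atomic context, e.g. take a spinlock ... */
	id = idr_alloc(&my_idr, my_ptr, 0, 0, GFP_NOWAIT);
	/* ... leave atomic context ... */
	idr_preload_end();		/* re-enables preemption */

	if (id < 0)
		return id;	/* hard failure: no retry loop needed */
```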

Most of the developers whose code was changed by Tejun's patch set responded with simple Acked-by lines. Eric Biederman, though, didn't like the API; he said "When reading code with idr_preload I get this deep down creepy feeling. What is this magic that is going on?" As can be seen in Tejun's response, one developer's magic is another's straightforward per-CPU technique. As of this writing, that particular discussion has not reached any sort of public resolution. Your editor would predict, though, that the simplification of this heavily-used API will be sufficiently compelling that most developers will be able to get past any resulting creepy feelings. So the IDR API may be changing in a mainline kernel in the not-too-distant future.


LCA: The Trinity fuzz tester

By Michael Kerrisk
February 6, 2013

The Linux kernel developers have long been aware of the need for better testing of the kernel. That testing can take many forms, including testing for performance regressions and testing for build and boot regressions. As the term suggests, regression testing is concerned with detecting cases where a new kernel version causes problems in code or features that already existed in previous versions of the kernel. Of course, each new kernel release also adds new features. The Trinity fuzz tester is a tool that aims to improve testing of one class of new (and existing) features: the system call interfaces that the kernel presents to user space.

Insufficient testing of new user-space interfaces is a long-standing issue in kernel development. Historically, it has been quite common that significant bugs are found in new interfaces only a considerable time after those interfaces appear in a stable kernel—examples include epoll_ctl(), kill(), signalfd(), and utimensat(). The problem is that, typically, a new interface is tested by only one person (the developer of the feature) or at most a handful of people who have a close interest in the interface. A common problem that occurs when developers write their own tests is a bias toward tests which confirm that expected inputs produce expected results. Often, of course, bugs are found when software is used in unexpected ways that test little-used code paths.

Fuzz testing is a technique that aims to reverse this testing bias. The general idea is to provide unexpected inputs to the software being tested, in the form of random (or semi-random) values. Fuzz testing has two obvious benefits. First, employing unexpected inputs means that rarely used code paths are tested. Second, the generation of random inputs and the tests themselves can be fully automated, so that a large number of tests can be quickly performed.

History

Fuzz testing has a history that stretches back to at least the 1980s, when fuzz testers were used to test command-line utilities. The history of system call fuzz testing is nearly as long. During his talk at linux.conf.au 2013 [ogv video, mp4 video], Dave Jones, the developer of Trinity, noted that the earliest system call fuzz tester that he had heard of was Tsys, which was created around 1991 for System V Release 4. Another early example was a fuzz tester [postscript] developed at the University of Wisconsin in the mid-1990s that was run against a variety of kernels, including Linux.

Tsys was an example of a "naïve" fuzz tester: it simply generated random bit patterns, placed them in appropriate registers, and then executed a system call. About a decade later, the kg_crashme tool was developed to perform fuzz testing on Linux. Like Tsys, kg_crashme was a naïve fuzz tester.

Naïve fuzz testers are capable of finding some kernel bugs, but the use of purely random inputs greatly limits their efficacy. To see why this is, we can take the example of the madvise() system call, which allows a process to advise the kernel about how it expects to use a region of memory. This system call has the following prototype:

    int madvise(void *addr, size_t length, int advice);

madvise() places certain constraints on its arguments: addr must be a page-aligned memory address, length must be non-negative, and advice must be one of a limited set of small integer values. When any of these constraints is violated, madvise() fails with the error EINVAL. Many other system calls impose analogous checks on their arguments.

A naïve fuzz tester that simply passes random bit patterns to the arguments of madvise() will, almost always, perform uninteresting tests that fail with the (expected) error EINVAL. As well as wasting time, such naïve testing reduces the chances of generating a more interesting test input that reveals an unexpected error.

Thus, a few projects started in the mid-2000s with the aim of bringing more sophistication to the fuzz-testing process. One of these projects, Dave's scrashme, was started in 2006. Work on that project languished for a few years, and only picked up momentum starting in late 2010, when Dave began to devote significantly more time to its development. In December 2010, scrashme was renamed Trinity. At around the same time, another quite similar tool, iknowthis, was also developed at Google.

Intelligent fuzz testing

Trinity performs intelligent fuzz testing by incorporating specific knowledge about each system call that is tested. The idea is to reduce the time spent running "useless" tests, thereby reaching deeper into the tested code and increasing the chances of testing a more interesting case that may result in an unexpected error. Thus, for example, rather than passing random values to the advice argument of madvise(), Trinity will pass one of the values expected for that argument.

Likewise, rather than passing random bit patterns to address arguments, Trinity will restrict the bit pattern so that, much of the time, the supplied address is page aligned. However, some system calls that accept address arguments don't require page-aligned addresses. Thus, when generating a random address for testing, Trinity will also favor the creation of "interesting" addresses, for example, an address that is off a page boundary by the value of sizeof(char) or sizeof(long). Addresses such as these are likely candidates for "off by one" errors in the kernel code.

In addition, many system calls that expect a memory address require that address to point to memory that is actually mapped. If there is no mapping at the given address, then these system calls fail (the typical error is ENOMEM or EFAULT). Of course, in the large address space available on modern 64-bit architectures, most of the address space is unmapped, so that even if a fuzz tester always generated page-aligned addresses, most of the resulting tests would be wasted on producing the same uninteresting error. Thus, when supplying a memory address to a system call, Trinity will favor addresses for existing mappings. Again, in the interests of triggering unexpected errors, Trinity will pass the addresses of "interesting" mappings, for example, the address of a page containing all zeros or all ones, or the starting address at which the kernel is mapped.

In order to bring intelligence to its tests, Trinity must have some understanding of the arguments for each system call. This is accomplished by defining structures that annotate each system call. For example, the annotation file for madvise() includes the following lines:

    struct syscall syscall_madvise = {
        .name = "madvise",
        .num_args = 3,
        .arg1name = "start",
        .arg1type = ARG_NON_NULL_ADDRESS,
        .arg2name = "len_in",
        .arg2type = ARG_LEN,
        .arg3name = "advice",
        .arg3type = ARG_OP,
        .arg3list = {
            .num = 12,
            .values = { MADV_NORMAL, MADV_RANDOM, MADV_SEQUENTIAL, MADV_WILLNEED,
                    MADV_DONTNEED, MADV_REMOVE, MADV_DONTFORK, MADV_DOFORK,
                    MADV_MERGEABLE, MADV_UNMERGEABLE, MADV_HUGEPAGE, MADV_NOHUGEPAGE },
        },
        ...
    }; 

This annotation describes the names and types of each of the three arguments that the system call accepts. For example, the first argument is annotated as ARG_NON_NULL_ADDRESS, meaning that Trinity should provide an intelligently selected, semi-random, nonzero address for this argument. The last argument is annotated as ARG_OP, meaning that Trinity should randomly select one of the values in the corresponding list (the MADV_* values above).

The second madvise() argument is annotated ARG_LEN, meaning that it is the length of a memory buffer. Again, rather than passing purely random values to such arguments, Trinity attempts to generate "interesting" numbers that are more likely to trigger errors—for example, a value whose least significant bits are 0xfff might find an off-by-one error in the logic of some system call.

Trinity also understands a range of other annotations, including ARG_RANDOM_INT, ARG_ADDRESS (an address that can be zero), ARG_PID (a process ID), ARG_LIST (for bit masks composed by logically ORing values randomly selected from a specified list), ARG_PATHNAME, and ARG_IOV (a struct iovec of the kind passed to system calls such as readv()). In each case, Trinity uses the annotation to generate a better-than-random test value that is more likely to trigger an unexpected error. Another interesting annotation is ARG_FD, which causes Trinity to pass an open file descriptor to the tested system call. For this purpose, Trinity opens a variety of file descriptors, including descriptors for pipes, network sockets, and files in locations such as /dev, /proc, and /sys. The open file descriptors are randomly passed to system calls that expect descriptors. By now, it might start to become clear that you don't want to run Trinity on a system that has the only copy of your family photo albums.

In addition to annotations, each system call can optionally have a sanitise routine (Dave's code employs the British spelling) that performs further fine-tuning of the arguments for the system call. The sanitise routine can be used to construct arguments that require special values (e.g., structures) or to correctly initialize the values in arguments that are interdependent. It can also be used to ensure that an argument has a value that won't cause an expected error. For example, the sanitise routine for the madvise() system call is as follows:

    static void sanitise_madvise(int childno)
    {
        shm->a2[childno] = rand() % page_size;
    } 

This ensures that the second (length) argument given to madvise() will be no larger than the page size, preventing the ENOMEM error that would commonly result when a large length value causes madvise() to touch an unmapped area of memory. Obviously, this means that the tests will never exercise the case where madvise() is applied to regions larger than one page. This particular sanitise routine could be improved by sometimes allowing length values that are larger than the page size.

Running trinity

The Trinity home page has links to the Git repository as well as to the latest stable release (Trinity 1.1, which was released in January 2013). Compilation from source is straightforward; then Trinity can be invoked with a command line as simple as:

     $ ./trinity

With no arguments, the program repeatedly tests randomly chosen system calls. It is also possible to test selected system calls using one or more instances of the -c command-line option. This can be especially useful when testing new system calls. Thus, for example, one could test just the madvise() system call using the following command:

     $ ./trinity -c madvise

In order to perform its work, the trinity program creates a number of processes, as shown in the following diagram:

[Relationship of Trinity processes]

The main process performs various initializations (e.g., opening the file descriptors and creating the memory mappings used for testing) and then kicks off a number (default: four) of child processes that perform the system call tests. A shared memory region (created by the initial trinity process) is used to record various pieces of global information, such as open file descriptor numbers, total system calls performed, and number of system calls that succeeded and failed. The shared memory region also records various information about each of the child processes, including the PID, and the system call number and arguments for the system call that is currently being executed as well as the system call that was previously executed.

The watchdog process ensures that the test system is still working correctly. It checks that the children are progressing (they may be blocked in a system call), and kills them if they are not; when the main process detects that one of its children has terminated (because the watchdog killed it, or for some other reason), it starts a new child process to replace it. The watchdog also monitors the integrity of the memory region that is shared between the processes, in case some operation performed by one of the children has corrupted the region.

Each of the child processes writes to a separate log file, recording the system calls that it performs and the return values of those system calls. The file is synced just before each system call is performed, so that if the system panics, it should be possible to determine the cause of the panic by looking at the last recorded system call in each of the log files. The log file contains lines such as the following, which show the PID of the child process, a sequential test number, and the system call arguments and result:

    [17913] [0] mmap(addr=0, len=4096, prot=4, flags=0x40031, fd=-1, off=0) = -1 (Invalid argument)
    [17913] [1] mmap(addr=0, len=4096, prot=1, flags=0x25821, fd=-1, off=0x80000000) = -541937664
    [17913] [2] madvise(start=0x7f59dff7b000, len_in=3505, advice=10) = 0
    ...
    [17913] [6] mmap(addr=0, len=4096, prot=12, flags=0x20031, fd=-1, off=0) = -1 (Permission denied)
    ...
    [17913] [21] mmap(addr=0, len=4096, prot=8, flags=0x5001, fd=181, off=0) = -1 (No such device)

Trinity can be used in a number of ways. One possibility is simply to leave it running until it triggers a kernel panic and then look at the child logs and the system log in order to discover the cause of the panic. Dave has sometimes left systems running for hours or days in order to discover such failures. New system calls can be exercised using the -c command-line option described above. Another possible use is to discover unexpected (or undocumented) failure modes of existing system calls: suitable scripting on the log files can be used to obtain summaries of the various failures of a particular system call.

Yet another way of using the trinity program is with the -V (victim files) option. This option takes a directory argument: the program will randomly open files in that directory and pass the resulting file descriptors to system calls. This can be useful for discovering failure modes in a particular filesystem type. For example, specifying an NFS mount point as the directory argument would exercise NFS. The -V flag can also be used to perform a limited kind of testing of user-space programs. During his linux.conf.au presentation, Dave demonstrated the use of the following command:

    $ ./trinity -V /bin -c execve

This command has the effect of executing random programs in /bin with random string arguments. Looking at the system log revealed a large number of programs that crashed with a segmentation fault when given unexpected arguments.

Results

Trinity has been rather successful at finding bugs. Dave reports that he himself found more than 150 bugs in 2012, and many more were found by other people using Trinity. Trinity usually finds bugs in new code quite quickly. It tends to find the same bugs repeatedly, so that in order to find other bugs, it is probably necessary to fix the already discovered bugs first.

Interestingly, Trinity has found bugs not just in system call code. Bugs have been discovered in many other parts of the kernel, including the networking stack, virtual memory code, and drivers. Trinity has found many error-path memory leaks and cases where system call error paths failed to release kernel locks. In addition, it has discovered a number of pieces of kernel code that had poor test coverage or indeed no testing at all. The oldest bug that Trinity has found so far dates back to 1996.

Limitations and future work

Although Trinity is already quite an effective tool for finding bugs, there is scope for a lot more work to make it even better. An ongoing task is to add support for new system calls and new system call flags as they are added to the kernel. Only about ten percent of system calls currently have sanitise routines. Probably many other system calls could do with sanitise routines so that tests would get deeper into the code of those system calls without triggering the same common and expected errors. Trinity supports many network protocols, but that support could be further improved and there are other networking protocols for which support could be added.

Some system calls are annotated with an AVOID_SYSCALL flag, which tells Trinity to avoid testing that system call. (The --list option causes Trinity to display a list of the system calls that it knows about, and indicates those system calls that are annotated with AVOID_SYSCALL.) In some cases, a system call is avoided because it is uninteresting to test—for example, system calls such as fork() have no arguments to fuzz and exit() would simply terminate the testing process. Some other system calls would interfere with the operation of Trinity itself—examples include close(), which would randomly close test file descriptors used by child processes, and nanosleep(), which might put a child process to sleep for a long time.

However, there are other system calls such as ptrace() and munmap() that are currently marked with AVOID_SYSCALL, but which probably could be candidates for testing by adding more intelligence to Trinity. For example, munmap() is avoided because it can easily unmap mappings that are needed for the child to execute. However, if Trinity added some bookkeeping code that recorded better information about the test mappings that it creates, then (only) those mappings could be supplied in tests of munmap(), without interfering with other mappings needed by the child processes.

Currently, Trinity randomly invokes system calls. Real programs demonstrate common patterns for making system calls—for example, opening, reading, and closing a file. Dave would like to add test support for these sorts of commonly occurring patterns.

An area where Trinity currently provides poor coverage is the multiplexing ioctl() system call, "the worst interface known to man". The problem is that ioctl() is really a mass of system calls masquerading as a single API. The first argument is a file descriptor referring to a device or another file type, the second argument is a request type that depends on the type of file or device referred to by the first argument, and the data type of the third argument depends on the request type. To achieve good test support for ioctl() would require annotating each of the request types to ensure that it is associated with the right type of file descriptor and the right data type for the third argument. There is an almost limitless supply of work here, since there are hundreds of request types; thus, in the first instance, this work would probably be limited to supporting a subset of more interesting request types.

There are a number of other improvements that Dave would like to see in Trinity; the source code tarball contains a lengthy TODO file. Among these improvements are better support for "destructors" in the system call handling code, so that Trinity does not leak memory, and support for invoking (some) system calls as root. More generally, Trinity's ability to find further kernel bugs is virtually limitless: it simply requires adding ever more intelligence to each of its tests.



Page editor: Jonathan Corbet

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds