Kernel development [LWN.net]

Kernel release status

The current development kernel is 4.5-rc6, released on February 28. Linus said: "I'd like to say that things are on track for the usual release timing, but let's see how things look next week. If rc7 hasn't started to shrink, I may end up deciding that this is one of the releases when we do an rc8 too. Too early to tell. There's nothing particularly scary going on, but I'd have liked it even calmer this week."

Stable updates: 4.4.3, 3.14.62, and 3.10.98 were released on February 26.

The 4.4.4 (342 changes), 3.14.63 (130 changes), and 3.10.99 (80 changes) updates are in the review process as of this writing; they can be expected on or after March 3. With these kernels, Greg Kroah-Hartman says, the queue of patches waiting to go into stable updates has finally been emptied.


Welte: Report from the VMware GPL court hearing

On his blog, Harald Welte has a report on a hearing in Germany regarding VMware's alleged GPL violations. Welte is a former kernel developer as well as the founder of gpl-violations.org, so he has quite an interest in the case, which was brought by Christoph Hellwig and is being funded by the Software Freedom Conservancy. To Welte's eye, it seems that there are two questions at issue: whether vmklinux and vmkernel are considered to be one or separate works (in a copyright sense) and whether Hellwig has the standing to sue: "This situation is used by the VMware defense in claiming that overall, they could only find very few functions that could be attributed to Christoph, and that this may altogether be only 1% of the Linux code they use in VMware ESXi. The court recognized this as difficult, as in German copyright law there is the concept of fading. If the original work by one author has been edited to an extent that it is barely recognizable, his original work has faded and so have his rights. The court did not state whether it believed that this has happened. To the contrary, the indicated that it may very well be that only very few lines of code can actually make a significant impact on the work as a whole. However, it is problematic for them to decide, as they don't understand source code and software development. So if (after further briefs from both sides and deliberation of the court) this is still an open question, it might very well be the case that the court would request a [technical] expert report to clarify this to the court."


The persistent memory "I know what I'm doing" flag

By Jonathan Corbet
March 2, 2016

As was described in Neil Brown's article last week, developers working on persistent memory appear to be converging on a solution for the fsync() system call. A working fsync() will enable applications to ensure that the data they have written is safely stored to persistent memory; importantly, applications that have been written correctly for POSIX filesystems in general will work correctly on persistent memory without the need to be aware of the difference. But some developers want to write code that is specific to persistent memory as a way of maximizing performance. A patch catering to the needs of those developers inspired a lengthy conversation on how to best ensure that data written to persistent memory is not lost, and how development in this area should proceed in general.

The problem with the emerging fsync() solution, according to Boaz Harrosh, is that it requires the kernel to maintain a radix tree of all pages that might have dirty lines of data in the CPU caches. If an application has been written with persistent memory in mind, though, it can avoid leaving data in the caches. That data can be explicitly flushed by the application or, as an alternative, non-temporal writes can be used to bypass the CPU caches entirely. If the application is using these techniques, Boaz said, there is no need for the kernel to flush cache lines for the relevant persistent memory, so it can avoid the wasted overhead of maintaining the radix tree.

The kernel currently has no way of knowing that an application is taking care of its own cache-management needs, though. Fixing that is the goal of this patch set posted by Boaz in February. It adds a new flag for the mmap() system call named MAP_PMEM_AWARE. If an application maps a file stored in persistent memory with this flag, and the filesystem supports the DAX direct-access mechanism, the kernel can assume that the application will deal with cache management and, as a result, the kernel need not track pages with potentially dirty cache lines. Boaz claims considerably improved performance when running with this patch.

Some concerns

It is fair to say that this patch was not universally acclaimed. There were a number of objections to providing this kind of functionality, the first being that an application that does its own cache management will still have to call fsync() (or msync()) to ensure that its data is truly persistent. That is because this data does not stand alone; it is stored within a filesystem, and the application has no knowledge of whether there is any filesystem metadata that must also be flushed out to be sure that the data can be accessed. The only way to be sure that the metadata is consistent on disk is to call fsync(), just as applications dealing with data on more traditional storage media must.

In theory, an application can allocate and write an entire file, then call fsync() to get it all to persistent storage with the goal that, afterward, it can rewrite the data within the file without causing any further metadata changes (other than timestamps, which are not important for retrieving that data). But filesystems can be performing actions like data deduplication, delayed allocation, or, as Christoph Hellwig pointed out, copy-on-write operations. So it is true that the only way to be sure that data is truly, safely persistent is to call fsync(); the MAP_PMEM_AWARE flag would not eliminate that requirement.
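
As a minimal sketch of the portable pattern discussed here, the following C function overwrites part of a file through a shared mapping and then calls msync() and fsync(), so that both the data and any related filesystem metadata are flushed. The helper name persist_update() and its parameters are illustrative assumptions; nothing here comes from the patch set, and the hypothetical MAP_PMEM_AWARE flag is deliberately not used.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: overwrite len bytes at offset off of an existing
 * file through a shared mapping, then ask the kernel to make the data and
 * any filesystem metadata durable. Returns 0 on success, -1 on error.
 */
int persist_update(const char *path, off_t off, const void *data, size_t len)
{
    struct stat st;
    char *map;
    int ret = -1;
    int fd = open(path, O_RDWR);

    if (fd < 0)
        return -1;

    /* The update must fit inside the file; touching pages past EOF faults. */
    if (fstat(fd, &st) < 0 || off < 0 || off > st.st_size ||
        (off_t)len > st.st_size - off)
        goto out_close;

    map = mmap(NULL, st.st_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        goto out_close;

    memcpy(map + off, data, len);

    /* msync() writes back the mapped data; fsync() covers the file as a whole. */
    if (msync(map, st.st_size, MS_SYNC) == 0 && fsync(fd) == 0)
        ret = 0;

    munmap(map, st.st_size);
out_close:
    close(fd);
    return ret;
}
```

A caller would use something like persist_update("/mnt/pmem/data", 0, buf, len), where the path is only an example. The same code should behave correctly on a DAX filesystem or on a conventional disk, which is the point of getting fsync() right.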

Boaz protested that eliminating the need to call fsync() was never the purpose of the patch set. Instead, it aims to make those calls much faster; other overhead, especially associated with page faults in areas backed by persistent memory, would also be significantly reduced. Unfortunately, the worries about MAP_PMEM_AWARE didn't end there.

For example, consider the interaction between applications using this flag and others that are not aware of persistent memory. Such applications (which might be something as simple as mv or a backup utility) may also create metadata changes needing flushing, and they may create dirty cache lines in the persistent-memory area that the "aware" application knows nothing about. Experience with direct I/O has shown that such interactions can be subtle, difficult to notice, and impossible to fix.

Perhaps the biggest worry, though, is that application developers will rush out and proclaim that their code is "aware" without actually understanding everything they need to do to guarantee the integrity of their data. As Dave Chinner put it: "Almost any app developer that says they understand how filesystems provide data integrity is almost always completely wrong." If the kernel provides these developers with an "I know what I'm doing" flag, the reasoning goes, they will soon write code that demonstrates the lack of that knowledge — to their users' detriment.

One might just say that any such applications are buggy; they will either be fixed or replaced with something better. But Dave made it clear that he didn't see things happening that way:

"History tells us otherwise. Users always blame the filesystem first, and then app developers will refuse to fix their applications because it would either make their app slow or they think it's a filesystem problem to solve because they tested on some other filesystem and it didn't display that behaviour. The result is we end up working around such problems in the filesystem so that users don't end up losing data due to shit applications.

"The same will happen here - filesystems will end up ignoring this special "I know what I'm doing" flag because the vast majority of app developers don't know enough to even realise that they don't know what they are doing."

That last point is key: filesystem developers, in their own defense, will end up ignoring this new flag because the alternative is to face the wrath of users who blame them for their lost data. The ext4 data-loss wars in 2009 have left some lasting scars; filesystem developers do not wish to find themselves in that position again.

Data integrity first

Developers had one more reason to oppose this patch — one that had little to do with the specifics of the patch itself. DAX and its associated persistent-memory functionality are still new, and problems are still being found with them. Dave made the claim that the core problem of safely storing data via DAX has not yet been solved, so it is not appropriate to be looking at optimizations. For now, the focus has to be on making things reliable; after that, there will be time to look at where the performance issues lie and do some optimization work.

Failure to solve the correctness issues first, he said, will just lead to more problems as more features are added. He drew a parallel with Btrfs which, he said, didn't solve the "known hard problems" early and, as a result, is stuck with "entrenched deficiencies" that are nearly impossible to fix. If those known hard problems are not solved first with DAX, it may well end up in the same situation.

He would also like to see optimization work focused on the general case, instead of on providing opt-out mechanisms for a few programs. Fixing performance issues rather than bypassing them will provide benefits for everybody, a better outcome than just enabling a few applications to implement their own optimized solutions. If, instead, those applications opt out, they will not benefit from core-code improvements and, consequently, those improvements will be less likely to happen.

Pushing back on and delaying work that kernel developers would like to see merged is never a pleasant experience. That work was done for a reason; rejecting it often means that at least some of that work was done in vain, and hard feelings can often result. But experience has shown that resisting work that seems premature or not consistent with long-term goals leads to a better, more maintainable kernel in the long run. The DAX infrastructure is going to have to serve as an important kernel-supported approach to persistent memory for a long time; the community cannot afford to get this one wrong. So there may well be a solid case to be made for conservatism in this area for now.


Airplane mode and rfkill

March 2, 2016

This article was contributed by Neil Brown

The closest that many get to the kernel's rfkill subsystem is when they press a button on their laptop (e.g. WiFi off, airplane mode) to save power, board a plane, or reduce interruptions. The plumbing to link that key press to a light that goes off, or maybe on, should be fairly straightforward but, as some recent patches show, there is still room for improvement. To understand the nature of these improvements, a little introduction to rfkill will be helpful.

The rfkill subsystem supports the creation of rfkill devices. When a driver registers a device capable of transmitting RF (e.g. WiFi or Bluetooth adapters), it will also register an rfkill device associated with the transmitter. Each such device will have an index number, a name such as "eeepc-wlan" or "phy1", and two state flags: "hard blocked" (RFKILL_STATE_HARD_BLOCKED) is read-only and is expected to reflect some physical disablement while "soft blocked" (RFKILL_STATE_SOFT_BLOCKED) is read/write and can be used to enable or disable transmission.


Each device also has a type (RFKILL_TYPE_*) from the list WLAN, BLUETOOTH, UWB (ultra-wideband), WIMAX, WWAN (wireless WAN), GPS, FM, and NFC. GPS is an interesting addition to the list as GPS transmitters are rare. GPS receivers do have powered antennas, though, and powering these down is sometimes appropriate; that could be seen as fitting the role of rfkill.

For linking with an input key, the rfkill subsystem registers an input handler that is automatically attached to any input device that can report one of the keys KEY_WLAN, KEY_BLUETOOTH, KEY_UWB, KEY_WIMAX, or KEY_RFKILL — or one that can report a change to the SW_RFKILL_ALL switch. The distinction between a key and a switch is that a key reports an "off" event when released, while a switch has two equally stable states. The key events toggle an internal rfkill state and cause all rfkill devices of the relevant type to be either blocked or unblocked, where KEY_RFKILL applies to all types. The switch is a little more heavy-handed as will be described later.

On the output side, each rfkill device registers an LED trigger that can be assigned to any LED that Linux controls. This assignment can be effected with a kernel driver, by using device tree (the "linux,default-trigger" attribute), or by writing to /sys/class/leds/$LED/trigger. There are usually a large number of triggers available, from simple states like "none" and "default-on" to the more complex "BAT0-charging-blink-full-solid" that might be registered by a battery controller. The "rfkillNN" trigger (where "NN" is the index number of the relevant rfkill) turns the LED on if transmission is not blocked, and off if it is either hard- or soft-blocked. This makes it suitable for an LED marked with a transmitter symbol, but not so suitable for one marked with an airplane: in that case one would expect the light to be on when transmission is blocked, rather than off.
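
Reading an LED's trigger file lists every available trigger, with the currently active one enclosed in square brackets. A small sketch of how a program might pick out the active trigger from that text follows; the function name active_trigger() is hypothetical, and the example string in the usage note is made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: given the text read from an LED's "trigger" file,
 * copy the name of the active trigger (the one in square brackets) into
 * out. Returns 0 on success, -1 if no bracketed name is present or if it
 * does not fit in the buffer.
 */
int active_trigger(const char *list, char *out, size_t outlen)
{
    const char *lb = strchr(list, '[');
    const char *rb = lb ? strchr(lb, ']') : NULL;
    size_t len;

    if (!lb || !rb)
        return -1;

    len = (size_t)(rb - lb - 1);
    if (len + 1 > outlen)
        return -1;

    memcpy(out, lb + 1, len);
    out[len] = '\0';
    return 0;
}
```

Given the contents "none [rfkill0] default-on", the helper stores "rfkill0" in the output buffer. Selecting a different trigger is a matter of writing its name back to the same file.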

This is where the patches from João Paulo Rechi Vita come in. Rechi Vita is putting together support for some new ASUS laptops and wanted to enable the LED next to the airplane symbol. To support this he created a new LED trigger called "rfkill-airplane-mode" that causes any associated LED to light up when in airplane mode. This might seem simple enough, but first you need to be sure you have a clear understanding of what airplane mode means, and agreement on whether the kernel should even know about such a thing.

To see what it could mean in the context of the rfkill subsystem, it is important to understand that there are some more soft-block flags beyond the one per device. The flags exist at three different levels. At the top level is a global flag. When it is toggled using the KEY_RFKILL key or set using, for example, "rfkill block all", all of the flags at all levels are set to match the new value of the global flag. The middle level has one flag for each type of rfkill device; one for WLAN, one for BLUETOOTH, etc. When these are toggled using a relevant key or set with a command like "rfkill unblock bluetooth", the soft flag for all devices of that type is set to match the new value of the per-type flag. The third level is the per-device soft-blocked flags that we have already met. When these are set, the corresponding per-type flag and the global flag are left unchanged, so the settings can become inconsistent. It is quite possible for a specific WLAN device to be unblocked while both the global setting and the mid-level WLAN setting are blocked. This could be achieved with commands like:

    # rfkill block all
    # rfkill block wlan
    # rfkill unblock phy0

That understanding is enough to fill in the blanks concerning the SW_RFKILL_ALL switch. When that switch is activated all of the soft-block flags, per-device, per-type, and global, are set to "blocked" after first saving a copy of the per-type and global flags. While the switch is active none of the toggle keys will work, though settings can still be changed using the rfkill command-line tool. When the switch is deactivated, the toggle keys are re-enabled and the various flags can either be left unchanged, restored to their previous setting, or forced on, depending on the "master_switch_mode" module parameter.
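
The three levels of soft-block flags can be captured in a toy model. The sketch below is not kernel code; the structure and function names are invented for illustration and only two device types are included. It shows how setting a per-device flag leaves the upper levels untouched, which is how the settings can become inconsistent.

```c
#include <stdbool.h>
#include <stddef.h>

/* Only two types, for brevity; the real list is longer. */
enum rf_type { RF_WLAN, RF_BLUETOOTH, RF_NUM_TYPES };

struct rf_device {
    const char *name;
    enum rf_type type;
    bool soft_blocked;                  /* bottom level: per-device flag */
};

struct rf_state {
    bool global_blocked;                /* top level */
    bool type_blocked[RF_NUM_TYPES];    /* middle level */
    struct rf_device *devs;
    size_t ndevs;
};

/* Top level, like "rfkill block all": every lower-level flag follows. */
void rf_set_all(struct rf_state *s, bool blocked)
{
    s->global_blocked = blocked;
    for (int t = 0; t < RF_NUM_TYPES; t++)
        s->type_blocked[t] = blocked;
    for (size_t i = 0; i < s->ndevs; i++)
        s->devs[i].soft_blocked = blocked;
}

/* Middle level, like "rfkill block wlan": devices of that type follow. */
void rf_set_type(struct rf_state *s, enum rf_type type, bool blocked)
{
    s->type_blocked[type] = blocked;
    for (size_t i = 0; i < s->ndevs; i++)
        if (s->devs[i].type == type)
            s->devs[i].soft_blocked = blocked;
}

/* Bottom level, like "rfkill unblock phy0": upper levels are not updated. */
void rf_set_device(struct rf_device *d, bool blocked)
{
    d->soft_blocked = blocked;
}
```

Calling rf_set_all() and rf_set_type() with "blocked", then rf_set_device() on one WLAN device with "unblocked", reproduces the inconsistent state produced by the three commands shown earlier: the global and WLAN flags say "blocked" while that one device transmits.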

Since both the SW_RFKILL_ALL switch and the KEY_RFKILL key affect the global blocked flag, it makes some sense for that value to drive the airplane-mode LED. Had Rechi Vita created a trigger called "rfkill-all-inverted" that might have been the end of the story. The proposed "rfkill-airplane-mode" raised questions though. Marcel Holtmann wondered if the concept of airplane mode had any place in the kernel at all, since it is really a regulatory concept rather than a technical concept, and it is subject to change with place and time. Another concern, which seemed to be implied but never quite stated, was that, since transmitters can be turned back on individually without changing the global blocked status, it wasn't clear that the global soft-blocked status meant anything more than "the next toggle will turn everything back on", which is rather indirect.

It was generally felt that these concerns were more theoretical than practical and didn't need much attention. Provided that a user-space daemon could implement a more nuanced behavior if it chose, the simple answer is probably the best. A user-space daemon can always take complete control of any LED by simply setting the trigger to "none" and controlling the brightness directly, but Rechi Vita provided something a little better. With his latest patches, a daemon can take control of the airplane-mode setting and set it explicitly the way it wants. It is fairly easy to receive notifications of changes to rfkill devices, so any policy for the LED that can be imagined can be implemented. Having this option removes the need for the daemon to discover which LED it needs to control, and means that if the daemon dies the behavior will revert to the default, which may not be perfect behavior, but isn't that bad.

It is not immediately clear that these benefits justify giving the kernel a concept of airplane mode, even a user-modifiable one. Restarting daemons that die is a solved problem, so that aspect provides no real benefit. Not needing to discover the appropriate LED is more interesting. If a daemon wanted to discover input keys and switches that relate to RF transmission, there is a well-defined set of event names that can be searched for; you could probably even script something using evtest.

Discovering LEDs is not so easy. All that can be used is the names of the LEDs and, while these are supposed to be of the form "devicename:color:function", there is no standard list of functions. The mac80211 module uses "radio" for a function name, while the ASUS platform driver defines "asus::wlan". The Intel iwl driver defines LEDs with names like "phy0-led", which don't match the pattern at all. Rechi Vita has a separate patch that creates an LED called "asus-wireless::airplane_mode", which is pleasingly unambiguous, but only really helpful if other developers follow this lead. Teaching the kernel about airplane mode might not be the most elegant response to this lack of standardization, but it should work; Linux is nothing if not pragmatic.

Once these patches land, we will be a little closer to being able to have a light that comes on in airplane mode. Only two steps remain on my notebook: getting events when the airplane-mode key is pressed, and being able to control the airplane-mode LED. This functionality is often controlled through ACPI; the details seem to vary unpredictably from model to model. Whether Rechi Vita's other patches will help on a model which is a full six months old is an open question.


Coverage-guided kernel fuzzing with syzkaller

March 2, 2016

This article was contributed by David Drysdale

If your software deals with untrusted user input, it's a good idea to run a fuzzer against the program. For the Linux kernel, the most effective fuzzer of recent years has been Dave Jones's Trinity system call tester. But there's a new system call fuzzer in town, Dmitry Vyukov's syzkaller, and early results from it look promising — over 150 bugs uncovered in the mainline kernel (plus several dozen in Google's internal kernels) in a few months of operation.

Fuzzing in user space

The basic idea of fuzzing (feeding huge numbers of random inputs into a program and watching for crashes) has been around for a long time, but a naive implementation that just blindly emits random data is too inefficient to find any but the most shallow bugs. One technique for finding deeper bugs is to use a "template-based" fuzzer, which generates input variations from built-in knowledge about the possible/valid patterns (i.e. templates) for the program under test; that information needs to be manually created for each particular target (or class of targets).

However, more recently, "coverage-guided" fuzzers have appeared, notably Michał Zalewski's american fuzzy lop (which LWN covered back in September) and Clang's LibFuzzer, which operate without target-specific templates. Instead, these fuzzers work with an instrumented build of the binary under test, so that code coverage information is exposed. The fuzzer tries to maximize the amount of code covered (building an ever-expanding corpus of test inputs along the way), by mutating existing inputs and saving anything that hits new code.
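
The feedback loop at the heart of a coverage-guided fuzzer can be shown with a toy example. The sketch below is not how american fuzzy lop or LibFuzzer is implemented; the target function, the coverage bitmap, and all names are made up for illustration. The target only reaches its deepest branch for the four-byte input "FUZZ", which a blind fuzzer would hit about once in 2^32 tries; saving every input that reaches a new branch lets the search advance one byte at a time instead.

```c
#include <stdint.h>
#include <string.h>

#define INPUT_LEN  4
#define MAX_CORPUS 8

/* Toy target: each nested branch taken sets one bit of the coverage map. */
static unsigned target(const uint8_t in[INPUT_LEN])
{
    unsigned cov = 0;

    if (in[0] == 'F') {
        cov |= 1;
        if (in[1] == 'U') {
            cov |= 2;
            if (in[2] == 'Z') {
                cov |= 4;
                if (in[3] == 'Z')
                    cov |= 8;   /* the "deep" code that is hard to reach */
            }
        }
    }
    return cov;
}

/* Small deterministic PRNG (xorshift32) so that runs are repeatable. */
static uint32_t rng_state = 2463534242u;

static uint32_t rng(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/*
 * Mutate one byte of a randomly chosen corpus entry per iteration and keep
 * any input that reaches a branch not seen before. Returns the union of the
 * coverage bits reached; best receives the most recently saved input.
 */
unsigned fuzz(unsigned iterations, uint8_t best[INPUT_LEN])
{
    uint8_t corpus[MAX_CORPUS][INPUT_LEN] = { { 0 } };
    unsigned ncorpus = 1, seen = 0;

    memcpy(best, corpus[0], INPUT_LEN);
    for (unsigned i = 0; i < iterations; i++) {
        uint32_t r = rng();
        uint8_t cand[INPUT_LEN];
        unsigned cov;

        /* Pick a saved input, then overwrite one byte with a random value. */
        memcpy(cand, corpus[(r >> 24) % ncorpus], INPUT_LEN);
        cand[(r >> 8) % INPUT_LEN] = (uint8_t)r;

        cov = target(cand);
        if ((cov & ~seen) && ncorpus < MAX_CORPUS) {
            seen |= cov;                        /* new code was hit */
            memcpy(corpus[ncorpus++], cand, INPUT_LEN);
            memcpy(best, cand, INPUT_LEN);
        }
    }
    return seen;
}
```

The corpus here only ever grows, and mutations are limited to single-byte replacement; real fuzzers use richer mutation strategies and trim their corpora, but the save-on-new-coverage rule is the same idea.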

As well as detecting out-and-out crashes, fuzzers also work well in combination with tools that expose latent bugs, such as Clang's sanitizers — compiler options that add instrumentation to the generated code so that incorrect behavior generates an error at run-time:

AddressSanitizer (ASAN), which detects memory access errors.
ThreadSanitizer (TSAN), which detects data races between different threads.
MemorySanitizer (MSAN), which detects uninitialized reads: code whose behavior relies on memory contents that have not been initialized to a specific value.
UndefinedBehaviorSanitizer (UBSAN), which detects the use of various features of C/C++ that are explicitly listed as resulting in undefined behavior.

(Most, but not all, of the sanitizers have been ported from Clang to GCC; however, it remains the case that the most useful tools appear first, or even exclusively, for Clang/LLVM — another reason to hope for the complete success of the LLVMLinux project.)

Fuzzing the kernel

The Linux kernel is certainly a piece of software that is exposed to untrusted user input, so it is an important target for fuzzing. The kernel is also sufficiently high-profile that it has been worth writing specific, template-based fuzzers for different areas of it, such as the filesystem or the perf_event subsystem. For the system call interface in general, the Trinity fuzz tester is the main tool that is currently used. It fuzzes the kernel in an intelligent way that is driven by per-system call templates.

In recent months, Vyukov and a team from Google have brought coverage-guided fuzz testing to the kernel with syzkaller, which uses a hybrid approach. As with Trinity, syzkaller relies on templates that indicate the argument domains for each system call, but it also uses feedback from code coverage information to guide the fuzzing.

The need for instrumentation does make syzkaller more complicated to set up than Trinity. To start with, the compiler option to generate the needed coverage data has only recently been added to GCC (as -fsanitize-coverage=trace-pc), so the kernel needs to be built with a fresh-from-tip version of GCC.

It is worth noting that Jones has considered feedback-guided fuzzing for Trinity in the past, but found the coverage tools that were available at the time to be too slow. The Google team behind syzkaller is primarily made up of compiler developers rather than kernel developers, so they may have an easier job of upgrading the tools to match the task in hand.

Another complication is that the coverage data needs to be tracked on a per-task basis and exported from the kernel to the outside world (via a debugfs entry at /sys/kernel/debug/kcov). The kernel patch to do this, and to invoke the relevant compiler options (all under CONFIG_KCOV), is currently under discussion but looks likely to be merged soon.
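
To give a feel for how a test process might consume this per-task coverage data, here is a sketch that counts the coverage entries recorded during a single system call. It assumes the ioctl()-and-mmap() interface described in the kernel's kcov documentation; the interface in the patch under discussion may differ, and the function name is made up. On a kernel without kcov support the function simply returns -1.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>

/* ioctl numbers as listed in the kcov documentation (assumed here). */
#define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
#define KCOV_ENABLE     _IO('c', 100)
#define KCOV_DISABLE    _IO('c', 101)
#define COVER_SIZE      (64 << 10)      /* buffer size, in words */

/*
 * Hypothetical helper: returns the number of coverage entries recorded for
 * the calling task during one (deliberately failing) read() call, or -1 if
 * kcov is not available.
 */
long kcov_count_for_read(void)
{
    unsigned long *cover;
    long n = -1;
    int fd = open("/sys/kernel/debug/kcov", O_RDWR);

    if (fd < 0)
        return -1;
    if (ioctl(fd, KCOV_INIT_TRACE, (unsigned long)COVER_SIZE) != 0)
        goto out;

    /* The kernel and the task share this buffer; word 0 is the entry count. */
    cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
                 PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (cover == MAP_FAILED)
        goto out;

    if (ioctl(fd, KCOV_ENABLE, 0) == 0) {
        __atomic_store_n(&cover[0], 0, __ATOMIC_RELAXED);
        if (read(-1, NULL, 0) < 0) {
            /* Expected: the call fails, but kernel code still ran. */
        }
        n = (long)__atomic_load_n(&cover[0], __ATOMIC_RELAXED);
        ioctl(fd, KCOV_DISABLE, 0);
    }
    munmap(cover, COVER_SIZE * sizeof(unsigned long));
out:
    close(fd);
    return n;
}
```

The entries following the count are kernel code addresses, which is why tools like addr2line are needed to map them back to source lines.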

As mentioned above, the most effective bug-hunting occurs when the system call fuzzing is combined with tools that make latent bugs more visible. The kernel version of AddressSanitizer, KASAN, is the most straightforward of the sanitizers to enable (it is already included in the kernel as the CONFIG_KASAN build option), and it's also helpful to turn on various kernel debug features that expose incorrect use of internal kernel APIs, such as:

CONFIG_PROVE_LOCKING to catch potential deadlocks.
CONFIG_PROVE_RCU to catch potential bugs in RCU-using code.
CONFIG_DEBUG_ATOMIC_SLEEP to find code that calls potentially-sleeping functions in an atomic section.

Using these options means that errors get emitted for bugs that might otherwise pass unnoticed ninety-nine times out of a hundred (but which are correspondingly harder to find and fix on the hundredth roll of the dice).

With these preliminaries in place, syzkaller can then be run over a set of QEMU virtual machines running the instrumented kernel under test. The structure of the various syzkaller processes is described by the diagram below, which was taken from the project's documentation (and where red text indicates configuration entries).

The results

To see the results of syzkaller in action, we attempt to reproduce a null-dereference bug in System V shared-memory processing that was first reported in October 2015. We speed up the process by narrowing the range of system calls tested to just those mentioned in that email thread, via the enable_syscalls parameter in syzkaller's configuration file. We also make sure our test kernel is built with full namespace support; this allows the fuzzer to run its tests in individual sandboxes that do not interfere with each other (using the dropprivs configuration flag). This is particularly useful when dealing with an interprocess resource like shared memory.

While the fuzzer is running, it provides a minimal web server to allow the user to see progress. The main status page displays fuzzing statistics and a list of the tested system calls; each of the latter provides links to further pages:

A corpus page showing the sequences of system calls that have been run that include the given system call. For example, the page for remap_file_pages() might include "shmget-shmat-remap_file_pages" as a summary of a particular sequence of system calls that has been run by the fuzzer.
A coverage page that shows which parts of the kernel source code were hit (provided that the kernel was configured with CONFIG_DEBUG_INFO and addr2line is in the PATH), either during the processing of a specific corpus input or during all corpus inputs that include the given system call.
A priority page that shows the biases used when randomly generating other system calls to run in combination with the given system call. These priorities are partly based on compatible argument types (for example, syzkaller is more likely to combine two system calls that both take socket file descriptor arguments), and partly based on the frequency with which particular pairs of system calls appear in the current corpus (indicating that the pair has been effective in hitting new code in the past).

After running for a while, syzkaller generates a report file that includes a kernel oops; this file includes a log of the sequences of system calls that were being run, together with the log output for a null pointer dereference. Feeding the main fault address from the oops output into the addr2line tool reveals that the problem is in shm_lock(), which is being called from shm_open() as part of processing a remap_file_pages() system call.

However, we still have to narrow down the precise sequence that causes the problem, as the report file includes 204 distinct sequences of system calls. The syz-repro tool helps with this process; starting from the configuration file and the crash report file, it first narrows down to the particular sequence that triggers the crash — usually one of the few immediately preceding the log output. Next, it repeatedly attempts to minimize that particular sequence of system calls, by generating simpler versions of the sequence and checking that they still induce a crash.
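
The minimization step can be illustrated with a simple greedy loop: drop one call at a time and keep the shorter sequence whenever it still crashes. This is a sketch of the general technique, not the algorithm syz-repro actually uses; the call identifiers and the toy_crashes() rule are arbitrary stand-ins for "run the program on the test kernel and see whether it oopses".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative call identifiers; not syzkaller's representation. */
enum { CALL_MMAP, CALL_SHMGET, CALL_SHMAT, CALL_SHMCTL, CALL_REMAP };

/* Stand-in for running a sequence on the test kernel. */
typedef bool (*crash_fn)(const int *calls, size_t n);

/* Toy rule: "crash" iff shmget, shmat, and remap appear in that order. */
bool toy_crashes(const int *calls, size_t n)
{
    static const int needed[] = { CALL_SHMGET, CALL_SHMAT, CALL_REMAP };
    size_t matched = 0;

    for (size_t i = 0; i < n && matched < 3; i++)
        if (calls[i] == needed[matched])
            matched++;
    return matched == 3;
}

/*
 * Greedy minimization: try removing each call in turn and keep the shorter
 * sequence whenever it still crashes. Works in place; returns the new length.
 */
size_t minimize(int *calls, size_t n, crash_fn crashes)
{
    size_t i = 0;

    while (i < n) {
        int removed = calls[i];

        /* Close the gap left by call i. */
        memmove(&calls[i], &calls[i + 1], (n - i - 1) * sizeof(calls[0]));
        if (crashes(calls, n - 1)) {
            n--;                /* not needed; retry the same index */
        } else {
            /* Needed: reopen the gap, put the call back, and move on. */
            memmove(&calls[i + 1], &calls[i], (n - i - 1) * sizeof(calls[0]));
            calls[i] = removed;
            i++;
        }
    }
    return n;
}
```

Each candidate costs one run of the test, so a sequence of n calls needs at most n runs in this scheme. Simplifying the arguments of the calls that remain is a separate step that this sketch does not attempt.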

In our example, after a few iterations of syz-repro, a fairly short sequence of system calls pops out:

    mmap(&(0x7f0000000000)=nil, (0x2000), 0x3, 0x32, \
         0xffffffffffffffff, 0x0)
    r0 = shmget(0x5, (0x2000), 0x200, &(0x7f0000b03000)=nil)
    shmat(r0, &(0x7f0000b03000)=nil, 0x6000)
    shmctl(r0, 0x3, &(0x7f0000000000+0xe4b)={ \
           0x3, <r1=>0xffffffffffffffff, 0x0, 0xffffffffffffffff, \
           0xffffffffffffffff, 0x1, 0xfa, 0x3, 0xee, 0x10000, 0x6520, \
           0x5, 0xffffffffffffffff, 0x0, 0x0})
    shmctl(r0, 0xe, &(0x7f0000000000+0x28f)={ \
           0x1000, <r2=>0xffffffffffffffff, \
           <r3=>0xffffffffffffffff, 0x0, <r4=>0x0, 0x7, \
           0x100000000, 0x5, 0x6, 0x0, 0x2, 0x4, <r5=>0x0, \
           0xffffffffffffffff, 0xef0})
    shmctl(r0, 0xc, &(0x7f0000002000-0x50)={ \
           0x80, r1, r4, r2, r3, 0x7, 0x10000, 0x5, 0xff, 0x80000000, \
           0x9, 0x3, r5, 0xffffffffffffffff, 0x2})
    shmctl(r0, 0x0, &(0x7f0000001000-0x50)={ \
           0x1, 0x0, 0x0, 0xffffffffffffffff, 0x0, 0x1, 0x5, 0x5059, \
           0x3, 0x6301, 0x8001, 0xfffffffffffffffd, 0xffffffffffffffff, \
           0x0, 0x6})
    remap_file_pages(&(0x7f0000b03000)=nil, (0x2000), 0x0, 0x7, \
                     0x21dd964cfba54855)

To confirm that this is a reproducible bug scenario, we feed this system call script into syzkaller's syz-prog2c utility, which generates a 100-line program that reproduces the problem on the test kernel.

At this point, a bit of human intervention helps to reduce the size of the program further. Looking at the shmctl() invocations, we notice that the first two calls are for IPC_INFO and SHM_INFO, both of which read values from the kernel rather than modifying anything. Next, we might also suspect that SHM_UNLOCK is a no-op, as nothing has been locked. After removing those calls and their data setup, we are left with an extremely short program that does indeed reproduce our null dereference (at least for now — a fix is on its way):

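    #include <unistd.h>
    #include <sys/syscall.h>
    #include <string.h>
    
    long r[5];    /* return value of each system call */
    
    int main(void)
    {
        memset(r, -1, sizeof(r));
        /* Map two anonymous pages, read/write, at a fixed address. */
        r[0] = syscall(SYS_mmap, 0x20000000ul, 0x2000ul, 0x3ul, 0x32ul,
                       0xfffffffffffffffful, 0x0ul);
        /* Create a two-page System V shared-memory segment (0x200 is IPC_CREAT). */
        r[1] = syscall(SYS_shmget, 0x5ul, 0x2000ul, 0x200ul, 0x20b03000ul, 0, 0);
        /* Attach the segment at a fixed address. */
        r[2] = syscall(SYS_shmat, r[1], 0x20b03000ul, 0x6000ul, 0, 0, 0);
        /* Command 0x0 is IPC_RMID: mark the segment for destruction. */
        r[3] = syscall(SYS_shmctl, r[1], 0x0ul, 0x20000fb0ul, 0, 0, 0);
        /* Remapping the attached region is the call that hits the oops. */
        r[4] = syscall(SYS_remap_file_pages, 0x20b03000ul, 0x2000ul,
                       0x0ul, 0x7ul, 0x21dd964cfba54855ul, 0);
        return 0;
    }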

Unfortunately, not all problems are as straightforward to reproduce and isolate as this one. Bugs may only be triggered by interactions between multiple test programs (when the procs configuration option is greater than one) if persistent or global resources are involved. More commonly, bugs may only be triggered by interactions between different threads in the same program; the fuzzing process deliberately executes system calls in parallel across multiple threads — which increases the chances of finding bugs at the cost of making it harder to narrow down the reproduction scenario. (Building the kernel with KTSAN enabled is particularly helpful for finding multithreaded problems, as it makes latent data races explicitly visible.)

To help with reproduction, syzkaller includes a tool (syz-execprog) for re-running a crash script under various options. The -threaded option governs whether the system call script is run across multiple threads, and (if it is) the -collide option forces the threads to explicitly execute system calls in parallel. To catch heisenbugs, the -repeat option also allows the script to be re-run arbitrarily many times.

Although these tools don't guarantee a simple reproduction scenario, they seem to be effective in practice — the majority of the syzkaller-generated bug reports have included a short reproducer program, greatly simplifying the process of finding and fixing the underlying bug. The corpus of test inputs can be a helpful resource for quick regression testing of new kernel versions.

What's next

The syzkaller project is under active development, so things are moving fast. As mentioned above, the necessary patches for GCC have gone upstream and should appear in the next version; the concomitant kernel patch is being discussed. Once both are available by default, running syzkaller will be only slightly less convenient than running Trinity.

Because syzkaller is a hybrid of a template-based and a coverage-guided fuzzer, it works best when provided with information about the usage patterns of system calls. To that end, the syzkaller developers are keen to work with kernel developers so that support for particular kernel subsystems can be reviewed and extended (which may well involve making the system call template mechanisms more sophisticated). They would also like to extend architecture support beyond the current, somewhat x86_64-specific state, and to further automate the process of extracting a reproducer program and minimizing its size.

But overall, syzkaller appears to be a worthy addition to the battery of kernel test tools, and its successes reinforce the idea that fuzzing should be considered a best practice for any software project that takes user input.


Linus Torvalds Linux 4.5-rc6

Greg KH Linux 4.4.3

Thomas Gleixner v4.4.3-rt9

Steven Rostedt 3.18.27-rt26

Luis Henriques Linux 3.16.7-ckt25

Greg KH Linux 3.14.62

Steven Rostedt 3.14.61-rt63

Jiri Slaby Linux 3.12.55

Steven Rostedt 3.12.54-rt73

Greg KH Linux 3.10.98

Steven Rostedt 3.10.97-rt106

Steven Rostedt 3.4.110-rt139

Ben Hutchings Linux 3.2.78

Steven Rostedt 3.2.77-rt111

David Long arm64: Add kernel probes (kprobes) support

Khalid Aziz sparc64: Add support for Application Data Integrity (ADI)

Josh Poimboeuf Compile-time stack metadata validation

Emese Revfy Introduce GCC plugin infrastructure

Thomas Gleixner cpu/hotplug: Core infrastructure for cpu hotplug rework

Parav Pandit rdmacg: IB/core: rdma controller support

NeilBrown RFC improvements to radix-tree related to DAX

Michal Hocko introduce down_write_killable for rw_semaphore

Chris Metcalf support "task_isolation" mode

Pankaj Dubey Add support for Exynos SROM Controller driver.

Joshua Henderson PIC32MZDA Clock Driver

Xinliang Liu Add DRM Driver for HiSilicon Kirin hi6220 SoC

lijianhua Add DRM driver for Hisilicon hi1710

Archit Taneja drm/msm/hdmi: HDMI support on MSM8996

Philipp Zabel MT8173 DRM support

John Crispin net-next: mediatek: add ethernet driver

Joachim Eastwood PINT irqchip driver for NXP LPC18xx family

Eric Anholt bcm2835 SDHOST controller

fu.wei@linaro.org Watchdog: introduce ARM SBSA watchdog driver

Ramesh Shanmugasundaram can: rcar_canfd: Add Renesas R-Car CAN FD driver

Andreas Werner introduce MEN 16Z127 GPIO controller driver

Jung Zhao Add Rockchip VP8 Video Decoder Driver

Chunyan Zhang Introduce CoreSight STM support

Harry Wentland [PATCH v3 00/26] Enabling new DAL display driver for amdgpu on Carrizo and Tonga

Alan Tull Device Tree support for FPGA Programming

Guenter Roeck watchdog: Add support for keepalives triggered by infrastructure

Lionel Landwerlin Pipe level color management V9

Lee Jones pwm: Add support for PWM Capture

Jaegeuk Kim File-level Encryption Support by VFS

Gang He Add online file check feature

Andreas Gruenbacher Richacls (Core and Ext4)

Anand Jain Experimental btrfs encryption

Deepa Dinamani Add infrastructure to support vfs 64 bit times

js1304@gmail.com mm/slab: introduce new freed objects management way, OBJFREELIST_SLAB

Pablo Neira Ayuso intermediate representation for jit and cls_u32 conversion

Michal Kazior mac80211: implement fq_codel for software queuing

Jiri Pirko Introduce devlink interface and first drivers to use it

Jamal Hadi Salim net_sched: Add support for IFE action

Emmanuel Grumbach Add support for Neighbor Awareness Networking

Andi Kleen Make /dev/urandom scalable
