Kernel development

Brief items

Kernel release status

The current development kernel is 3.8-rc3, released on January 9. "Anyway, another week, another -rc. A fairly normal-sized one." Changesets continue to flow into the mainline repository; along with the usual fixes, they include a new driver for Wilocity wil6210-based WiFi cards.

Stable updates: 3.0.58, 3.4.25, and 3.7.2 were released on January 11; 3.2.37 came out on January 16. Massive updates set to become 3.0.59, 3.4.26, 3.5.7.3, and 3.7.3 are all in the review process as of this writing; they can be expected on or after January 17.

Quotes of the week

IMHO we ought to have a policy that anything of any relevance has *two* maintainers or more. That way there is always someone to help pick a new maintainer if one drops out and we are mostly not at the mercy of real world happenings. We also need to be much more active in giving maintainers the boot if they vanish (and having two maintainers will balance the effect nicely).
Alan Cox

We do not say "user mode shouldn't". Seriously. EVER. User mode *does*, and we deal with it.
Linus Torvalds

Kernel development news

GPIO in the kernel: an introduction

By Jonathan Corbet
January 16, 2013
A GPIO (general-purpose I/O) device looks like the most boring sort of peripheral that a computer might offer. It is a single electrical signal that the CPU can either set to one of two values — zero or one, naturally — or read one of those values from (or both). Either way, a GPIO does not seem like a particularly expressive device. But, at their simplest, GPIOs can be used to control LEDs, reset lines, or pod-bay door locks. With additional "bit-banging" logic, GPIOs can be combined to implement higher-level protocols like i2c or DDC — a frequent occurrence on contemporary systems. GPIOs are thus useful in a lot of contexts.

GPIO lines seem to be especially prevalent in embedded systems; even so, there never seem to be enough of them. As one might expect, a system with dozens (or even hundreds) of GPIOs needs some sort of rational abstraction for managing them. The kernel has had such a mechanism since 2.6.21 (it was initially added by David Brownell). The API has changed surprisingly little since then, but that period of relative stasis may be about to come to an end. The intended changes are best understood in the context of the existing API, though, so that is what this article will cover. Subsequent installments will look at how the GPIO API may evolve in the near future.

Naturally, there is an include file for working with GPIOs:

    #include <linux/gpio.h>

In current kernels, every GPIO in the system is represented by a simple unsigned integer. There is no provision for somehow mapping a desired function ("the sensor power line for the first camera device," say) onto a GPIO number; the code must come by that knowledge by other means. Often that is done through a long series of macro definitions; it is also possible to pass GPIO numbers through platform data or a device tree.

GPIOs must be allocated before use, though the current implementation does not enforce this requirement. The basic allocation function is:

    int gpio_request(unsigned int gpio, const char *label);

The gpio parameter indicates which GPIO is required, while label associates a string with it that can later appear in sysfs. The usual convention applies: a zero return code indicates success; otherwise the return value will be a negative error number. A GPIO can be returned to the system with:

    void gpio_free(unsigned int gpio);

There are some variants of these functions; gpio_request_one() can be used to set the initial configuration of the GPIO, and gpio_request_array() can request and configure a whole set of GPIOs with a single call. There are also "managed" versions (devm_gpio_request(), for example) that automatically handle cleanup if the developer forgets.
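As a rough illustration, a driver controlling a sensor's power line could allocate and configure it in one step; the GPIO number and label here are invented for the example:

    /* Hypothetical GPIO number; real code would normally get this from
       platform data or the device tree. */
    #define SENSOR_POWER_GPIO  42

    int ret = gpio_request_one(SENSOR_POWER_GPIO, GPIOF_OUT_INIT_LOW,
                               "sensor-power");
    if (ret)
        return ret;             /* negative error number on failure */
    /* ... drive the line ... */
    gpio_free(SENSOR_POWER_GPIO);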

Some GPIOs are used for output, others for input. A suitably-wired GPIO can be used in either mode, though only one direction is active at any given time. Kernel code must inform the GPIO core of how a line is to be used; that is done with these functions:

    int gpio_direction_input(unsigned int gpio);
    int gpio_direction_output(unsigned int gpio, int value);

In either case, gpio is the GPIO number. In the output case, the value of the GPIO (zero or one) must also be specified; the GPIO will be set accordingly as part of the call. For both functions, the return value is again zero or a negative error number. The direction of (suitably capable) GPIOs can be changed at any time.

For input GPIOs, the current value can be read with:

    int gpio_get_value(unsigned int gpio);

This function returns the value of the provided gpio; it has no provision for returning an error code. It is assumed (correctly in almost all cases) that any errors will be found when gpio_direction_input() is called, so checking the return value from that function is important.

Setting the value of output GPIOs can always be done using gpio_direction_output(), but, if the GPIO is known to be in output mode already, gpio_set_value() may be a bit more efficient:

    void gpio_set_value(unsigned int gpio, int value);
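Putting those calls together, a minimal sketch — with invented BUTTON_GPIO and LED_GPIO numbers, both assumed to have been requested already — might mirror an input line onto an output line:

    static int mirror_button_to_led(void)
    {
        int ret, value;

        ret = gpio_direction_input(BUTTON_GPIO);
        if (ret)
            return ret;                 /* errors are reported here */
        value = gpio_get_value(BUTTON_GPIO);

        ret = gpio_direction_output(LED_GPIO, value);
        if (ret)
            return ret;
        /* The LED line is now in output mode; later changes can use: */
        gpio_set_value(LED_GPIO, !value);
        return 0;
    }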

Some GPIO controllers can generate interrupts when an input GPIO changes value. In such cases, code wishing to handle such interrupts should start by determining which IRQ number is associated with a given GPIO line:

    int gpio_to_irq(unsigned int gpio);

The given gpio must have been obtained with gpio_request() and put into the input mode first. If there is an associated interrupt number, it will be passed back as the return value from gpio_to_irq(); otherwise a negative error number will be returned. Once obtained in this manner, the interrupt number can be passed to request_irq() to set up the handling of the interrupt.
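A brief sketch of that sequence (the handler and BUTTON_GPIO names are invented, and the line is assumed to have been requested and put into input mode already) could look like:

    #include <linux/interrupt.h>

    static irqreturn_t button_handler(int irq, void *data)
    {
        /* react to the edge seen on the GPIO line */
        return IRQ_HANDLED;
    }

    /* ... in the setup code ... */
    int irq = gpio_to_irq(BUTTON_GPIO);

    if (irq < 0)
        return irq;                     /* no interrupt available */
    return request_irq(irq, button_handler, IRQF_TRIGGER_FALLING,
                       "button", NULL);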

Finally, the GPIO subsystem is able to represent GPIO lines via a sysfs hierarchy, allowing user space to query (and possibly modify) them. Kernel code can cause a specific GPIO to appear in sysfs with:

    int gpio_export(unsigned int gpio, bool direction_may_change);

The direction_may_change parameter controls whether user space is allowed to change the direction of the GPIO; in many cases, allowing that control would be asking for bad things to happen to the system as a whole. A GPIO can be removed from sysfs with gpio_unexport() or given another name with gpio_export_link().
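As a short, hypothetical usage sketch (STATUS_GPIO and the dev pointer are stand-ins for a driver's real GPIO number and struct device):

    /* Export the line, but do not let user space change its direction */
    ret = gpio_export(STATUS_GPIO, false);
    /* Optionally give it a friendlier name under the driver's device */
    if (!ret)
        ret = gpio_export_link(dev, "status-led", STATUS_GPIO);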

And that is an overview of the kernel's low-level GPIO interface. A number of details have naturally been left out; see Documentation/gpio.txt for a more thorough description. Also omitted is the low-level driver's side of the API, by which GPIO lines can be made available to the GPIO subsystem; covering that API may be the subject of a future article. The next installment, though, will look at a couple of perceived deficiencies in the above-described API and how they might be remedied.

Signing ELF binaries

By Jake Edge
January 16, 2013

As part of the effort to support UEFI secure boot on Linux, Matthew Garrett proposed a number of restrictions on kernel features so that signed kernels could not be used to circumvent secure boot. Many of those restrictions were fairly uncontroversial, but disabling kexec() was not one of them, so it was dropped in a later patch set. At the time, there was discussion of how to support kexec() in a secure boot world; Vivek Goyal recently posted an RFC patch set to start down that path.

The kexec() system call is used to replace the running kernel with a different program. It can be used to boot a new kernel without going through the BIOS or other firmware, which is exactly what gets it into trouble for secure boot. A running kernel that has been verified by the secure boot mechanism (and thus is trusted) could boot any unsigned, unverified kernel by way of kexec(). The concern is that it would be used to boot Windows in an insecure environment while making it believe it was running under secure boot—exactly what secure boot is meant to prevent. That, in turn, could lead to Linux bootloaders getting blacklisted, which would make it more difficult to boot Linux on hardware certified for Windows 8.

Goyal's patches add the ability to cryptographically sign ELF executables, then have the kernel verify those signatures. If the binary is signed and the signature verifies, it will be executed. While the patch does not yet implement this, the idea is that a signed binary could be given additional capabilities if it verifies—capabilities that would enable kexec(), for example. If the binary is unsigned, it will always be executed. Only if a signed binary fails to verify does it get blocked from execution.

The patches contain a signelf utility that puts a signature based on the private key argument into a .signature ELF section. The signature is calculated by hashing the contents of the PT_LOAD ELF segments, then cryptographically signing the result. It is based on the module signing code that was recently added to the kernel, but instead of just tacking the signature on at the end of the binary, it puts it into the .signature section.
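The article does not include the tool's code, but the hashing step it describes can be sketched in plain C. The sketch below is not the actual signelf implementation: it handles only 64-bit ELF files, and hash_update() is a trivial stand-in for the real cryptographic digest, whose result signelf would then sign with the private key.

    /* Sketch: walk a 64-bit ELF file's PT_LOAD segments and hash their
       file contents.  hash_update() is a placeholder for a real digest. */
    #include <elf.h>
    #include <stdio.h>
    #include <stdint.h>

    static uint64_t digest;

    static void hash_update(const unsigned char *buf, size_t len)
    {
        while (len--)                   /* not a real hash */
            digest = digest * 31 + *buf++;
    }

    int main(int argc, char **argv)
    {
        Elf64_Ehdr ehdr;
        Elf64_Phdr phdr;
        unsigned char buf[4096];
        FILE *f;
        int i;

        if (argc < 2 || !(f = fopen(argv[1], "rb")))
            return 1;
        if (fread(&ehdr, sizeof(ehdr), 1, f) != 1)
            return 1;
        for (i = 0; i < ehdr.e_phnum; i++) {
            fseek(f, ehdr.e_phoff + (uint64_t)i * ehdr.e_phentsize, SEEK_SET);
            if (fread(&phdr, sizeof(phdr), 1, f) != 1)
                return 1;
            if (phdr.p_type != PT_LOAD)
                continue;
            fseek(f, phdr.p_offset, SEEK_SET);
            for (uint64_t left = phdr.p_filesz; left > 0; ) {
                size_t n = left > sizeof(buf) ? sizeof(buf) : left;
                if (fread(buf, 1, n, f) != n)
                    return 1;
                hash_update(buf, n);
                left -= n;
            }
        }
        printf("digest: %016llx\n", (unsigned long long)digest);
        return 0;
    }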

Since any shared libraries used by an executable cannot be trusted (so far, at least, there is no mechanism to verify those libraries), only statically linked executables can be signed and verified. The patches do not stop binaries from using dlopen() directly, however, so Goyal said binaries that do so should not be signed. He is targeting the /sbin/kexec binary that is used to launch kdump, so that users can still get crash dumps, even in a secure-boot-enabled system, but there are other possible uses as well.

When the binfmt_elf loader in the kernel detects a binary with the .signature section, it locks the pages of the executable into memory and verifies the signature. Goyal is trying to avoid situations where the binary is modified after the verification has been done, which is why the executable is locked into memory. If the signature does not verify, the process is killed; unsigned binaries are simply executed as usual.

Beyond just adding the capability for kexec(), there are some other pieces of the puzzle that aren't addressed in the patches. The biggest is the need to disable ptrace() on signed binaries. Otherwise, the signed binary could be subverted in various ways—changing the binary passed to kexec(), for example. In addition, the "to do" list has some key and keyring related issues that need to be sorted out.

There is already a mechanism in the kernel to verify the signature of various kinds of files, though. The Integrity Measurement Architecture (IMA) appraisal extension that was added in Linux 3.7 does much of what Goyal needs, as was pointed out by IMA maintainer Mimi Zohar. While the integrity subsystem targets measuring and verifying the whole system, it already does most of the kinds of signature operations Goyal is looking to add. On the other hand, features like disabling ptrace(), locking the binary into memory, and setting capabilities based on signature verification are well beyond the scope of the integrity subsystem. Goyal is currently looking into using the integrity features and adding secure-boot-specific features on top.

Losing the ability to use kexec() on secure boot systems would be rather painful. While Garrett's patches do not actually make that change (because of the outcry from other kernel developers), any distribution that is trying to enable secure boot is likely to do so. Finding a way to support that use case, without unduly risking the blacklist wrath of Microsoft, would be good.

Deadlocking the system with asynchronous functions

By Jonathan Corbet
January 16, 2013
Deadlocks in the kernel have become a relatively rare occurrence in recent years. The credit largely belongs to the "lockdep" subsystem, which watches locking activity and points out patterns that could lead to deadlocks when the timing goes wrong. But locking is not the source of all deadlock problems, as was shown by a long-standing deadlock bug that was only recently found and fixed.

In early January, Alex Riesen reported some difficulties with USB devices on recent kernels; among other things, it was easy to simply lock up the system altogether. A fair amount of discussion followed before Ming Lei identified the problem. It comes down to the block layer's use of the asynchronous function call infrastructure, which exists to increase parallelism in the kernel.

The asynchronous code is relatively simple in concept: a function that is to be run asynchronously can be called via async_schedule(); it will then run in its own thread at some future time. There are various ways of waiting until asynchronously called functions have completed; the most thorough is async_synchronize_full(), which waits until all outstanding asynchronous function calls anywhere in the kernel have completed. There are ways of waiting for specific functions to complete, but, if the caller does not know how many asynchronous function calls may be outstanding, async_synchronize_full() is the only way to be sure that they are all done.
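Schematically — the function and structure names below are invented for the illustration — the interface is used like this:

    #include <linux/async.h>

    /* Hypothetical slow initialization step, run in its own thread */
    static void my_async_scan(void *data, async_cookie_t cookie)
    {
        /* "data" is the device being brought up; do the slow work here */
    }

    static int my_probe(struct my_device *dev)
    {
        /* queue the slow work and return without waiting for it */
        async_schedule(my_async_scan, dev);
        return 0;
    }

    static void my_late_setup(void)
    {
        /* wait for every outstanding asynchronous function, system-wide */
        async_synchronize_full();
    }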

The block layer in the kernel makes use of I/O schedulers to organize and optimize I/O operations. There are several I/O schedulers available; they can be switched at run time and can be loaded as modules. When the block layer finds that it needs an I/O scheduler that is not currently present in the system, it will call request_module() to ask user space to load it. The module loader, in turn, will call async_synchronize_full() at the end of the loading process; it needs to ensure that any asynchronous functions called by the newly loaded module have completed so that the module will be fully ready by the time control returns to user space.

So far so good, but there is a catch. When a new block device is discovered, the block layer will do its initial work (partition probing and such) in an asynchronous function of its own. That work requires performing I/O to the device; that, in turn, requires an I/O scheduler. So the block layer may well call request_module() from code that is already running as an asynchronous function. And that is where things turn bad.

The problem is that the (asynchronous) block code must wait for request_module() to complete before it can continue with its work. As described above, the module loading process involves a call to async_synchronize_full(). That call will wait for all asynchronous functions, including the one that called request_module() in the first place, and which is still waiting for request_module() to complete. Expressed more concisely, the sequence looks like this:

  1. sd_probe() calls async_schedule() to scan a device asynchronously.

  2. The scanning process tries to read data from the device.

  3. The block layer realizes it needs an I/O scheduler, so, in elevator_get(), it calls request_module() to load the relevant kernel module.

  4. The module is loaded and initializes itself.

  5. do_init_module() calls async_synchronize_full() to wait for any asynchronous functions called by the just-loaded module.

  6. async_synchronize_full() waits for all asynchronous functions, including the one called back in step 1, which is waiting for the async_synchronize_full() call to complete.

That, of course, is a classic deadlock.

Fixing that deadlock turns out not to be as easy as one would like. Ming suggested that the call to async_synchronize_full() in the module loader should just be removed, and that user space should be taught that devices might not be ready immediately when the modprobe binary completes. Linus was not impressed with this approach, however, and it was quickly discarded.

The optimal solution would be for the module loader to wait only for asynchronous functions that were called by the loaded module itself. But the kernel does not currently have the infrastructure to allow that to happen; adding it as an urgent bug fix is not really an option. So something else needed to be worked out. To that end, Tejun Heo was brought into the discussion and asked to help come up with a solution. Tejun originally thought that the problem could be solved by detecting deadlock situations and proceeding without waiting in that case, but the problem of figuring out when it would be safe to proceed turned out not to be tractable.

The solution that emerged instead is regarded as a bit of a hack by just about everybody involved. Tejun added a new process flag (PF_USED_ASYNC) to mark when a process has called asynchronous functions. The module loader then tests this flag; if no asynchronous functions are called as the module is loaded, the call to async_synchronize_full() is skipped. Since the I/O scheduler modules make no such calls, that check avoids the deadlock in this particular case. Obviously, the problem remains in any case where an asynchronously-loaded module calls asynchronous functions of its own, but no other such cases have come to light at the moment. So it seems like a workable solution.
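In simplified form (the actual patch differs in detail), the module loader's wait becomes conditional:

    /* Roughly: only wait if this task actually scheduled asynchronous
       work while the module was being loaded and initialized. */
    if (current->flags & PF_USED_ASYNC)
        async_synchronize_full();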

Even so, Tejun remarked "It makes me feel dirty but makes the problem go away and I can't think of anything better." The patch has found its way into the mainline and will be present in the 3.8 final release. Beyond that, though, it would not be entirely surprising if somebody else were to take up the task of finding a more elegant solution in a future development cycle.

Page editor: Jonathan Corbet

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds