The current development kernel is 3.8-rc3, released
on January 9. "Anyway,
another week, another -rc. A fairly normal-sized one." Changes
continue to flow into the mainline repository; along with the usual fixes,
they include a new driver for Wilocity wil6210-based WiFi cards.
3.0.58, 3.4.25, and 3.7.2 were released on January 11; 3.2.37 came out on January 16. Massive
updates set to become
3.7.3 are all in the review process as of
this writing; they can be expected on or after January 17.
IMHO we ought to have a policy that anything of any relevance has
*two* maintainers or more. That way there is always someone to help
pick a new maintainer if one drops out and we are mostly not at the
mercy of real world happenings. We also need to be much more active
in giving maintainers the boot if they vanish (and having two
maintainers will balance the effect nicely).
— Alan Cox
We do not say "user mode shouldn't". Seriously. EVER. User mode
*does*, and we deal with it.
— Linus Torvalds
Kernel development news
A GPIO (general-purpose I/O) device looks like the most boring sort of
peripheral that a computer might offer. It is a single electrical signal
that the CPU can either set to one of two values — zero or one,
naturally — or read one of those values from (or both).
Either way, a GPIO does not seem like a particularly expressive device.
But, at their simplest, GPIOs can be used to control LEDs, reset lines, or
pod-bay door locks. With additional "bit-banging" logic, GPIOs can be
combined to implement higher-level protocols like i2c
— a frequent
occurrence on contemporary systems. GPIOs are thus useful in a lot of contexts.
GPIO lines seem to be especially prevalent in embedded systems; even so,
there never seems to be enough of them. As one might expect, a system with
dozens (or even hundreds) of GPIOs needs some sort of rational abstraction
for managing them. The kernel has had such a mechanism
since 2.6.21 (it was initially added by David Brownell). The API has
changed surprisingly little since then, but that period of relative stasis
may be about to come to an end. The intended changes are best
understood in the context of the existing API, though, so that is what this
article will cover. Subsequent installments will look at how the GPIO API
may evolve in the near future.
Naturally, there is an include file for working with GPIOs:
#include <linux/gpio.h>
In current kernels, every GPIO in the system is represented by a simple
unsigned integer. There is no provision for somehow mapping a desired
function ("the sensor power line for the first camera device," say) onto a
GPIO number; the code must come by that knowledge by other means. Often
that is done through a long series of macro definitions; it is also
possible to pass GPIO numbers through platform data or a device tree.
GPIOs must be allocated before use, though the current implementation does
not enforce this requirement. The basic allocation function is:
int gpio_request(unsigned int gpio, const char *label);
The gpio parameter indicates which GPIO is required, while label
associates a string with it that can later appear in sysfs. The usual
convention applies: a zero return code indicates success; otherwise the
return value will be a
negative error number. A GPIO can be returned to the system with:
void gpio_free(unsigned int gpio);
There are some variants of these functions; gpio_request_one() can
be used to set the initial configuration of the GPIO, and
gpio_request_array() can request and configure a whole set of
GPIOs with a single call. There are also "managed" versions
(devm_gpio_request(), for example) that automatically handle
cleanup if the developer forgets.
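The request-and-release pattern described above can be sketched in user space. The gpio_request() and gpio_free() bodies below are toy stand-ins written for this example so that it compiles outside the kernel; the table size, the error codes the stand-ins return, and the camera_* helpers are all assumptions, not kernel code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Userspace stand-ins for gpio_request()/gpio_free(). The table size
 * and the -EINVAL/-EBUSY choices are assumptions of this sketch.
 */
#define MOCK_NR_GPIOS 8
static const char *mock_owner[MOCK_NR_GPIOS];

static int gpio_request(unsigned int gpio, const char *label)
{
	if (gpio >= MOCK_NR_GPIOS)
		return -EINVAL;		/* no such line */
	if (mock_owner[gpio])
		return -EBUSY;		/* already claimed by somebody else */
	mock_owner[gpio] = label ? label : "?";
	return 0;
}

static void gpio_free(unsigned int gpio)
{
	if (gpio < MOCK_NR_GPIOS)
		mock_owner[gpio] = NULL;
}

/* Hypothetical driver setup: claim both lines or neither. */
static int camera_claim_gpios(unsigned int power, unsigned int reset)
{
	int ret;

	ret = gpio_request(power, "camera-power");
	if (ret)
		return ret;

	ret = gpio_request(reset, "camera-reset");
	if (ret) {
		gpio_free(power);	/* undo the first request */
		return ret;
	}
	return 0;
}

/* Release in the reverse order of acquisition. */
static void camera_release_gpios(unsigned int power, unsigned int reset)
{
	gpio_free(reset);
	gpio_free(power);
}
```

The point of the sketch is the unwinding: a failure on the second request releases the first, so the caller never ends up holding half of its lines.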
Some GPIOs are used for output, others for input. A suitably-wired GPIO
can be used in either mode, though only one direction is active at any
given time. Kernel code must inform the GPIO core of how a line is
to be used; that is done with these functions:
int gpio_direction_input(unsigned int gpio);
int gpio_direction_output(unsigned int gpio, int value);
In either case, gpio is the GPIO number. In the output case, the
value of the GPIO (zero or one) must also be specified; the GPIO will be set accordingly
as part of the call. For both functions, the return value is again zero or
a negative error number. The direction of (suitably capable) GPIOs can be
changed at any time.
For input GPIOs, the current value can be read with:
int gpio_get_value(unsigned int gpio);
This function returns the value of the provided gpio; it has no
provision for returning an error code. It is assumed (correctly in almost
all cases) that any errors will be found when
gpio_direction_input() is called, so checking the return value
from that function is important.
Setting the value of output GPIOs can always be done using
gpio_direction_output(), but, if the GPIO is known to be in
output mode already, gpio_set_value() may be a bit more efficient:
void gpio_set_value(unsigned int gpio, int value);
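As a rough illustration of how the direction and value calls fit together, here is a userspace toy model. The four gpio_*() functions below only mimic the calling conventions described in the text; their bodies, the eight-line bank, and the pulse_reset() helper are assumptions made for this sketch.

```c
#include <errno.h>

/*
 * Toy userspace model of a bank of GPIO lines. It mimics only the
 * calling conventions of the kernel functions; it is not kernel code.
 */
#define MOCK_NR_GPIOS 8
enum mock_dir { MOCK_DIR_IN, MOCK_DIR_OUT };
static enum mock_dir mock_direction[MOCK_NR_GPIOS];
static int mock_level[MOCK_NR_GPIOS];

static int gpio_direction_input(unsigned int gpio)
{
	if (gpio >= MOCK_NR_GPIOS)
		return -EINVAL;
	mock_direction[gpio] = MOCK_DIR_IN;
	return 0;
}

static int gpio_direction_output(unsigned int gpio, int value)
{
	if (gpio >= MOCK_NR_GPIOS)
		return -EINVAL;
	mock_direction[gpio] = MOCK_DIR_OUT;
	mock_level[gpio] = !!value;	/* level is set as part of the call */
	return 0;
}

static int gpio_get_value(unsigned int gpio)
{
	return gpio < MOCK_NR_GPIOS ? mock_level[gpio] : 0;
}

static void gpio_set_value(unsigned int gpio, int value)
{
	/* The model ignores writes to lines that are not outputs. */
	if (gpio < MOCK_NR_GPIOS && mock_direction[gpio] == MOCK_DIR_OUT)
		mock_level[gpio] = !!value;
}

/* Hypothetical helper: pulse an active-low reset line. */
static int pulse_reset(unsigned int gpio)
{
	int ret = gpio_direction_output(gpio, 1);	/* start deasserted */

	if (ret)
		return ret;	/* the only call here that can report failure */
	gpio_set_value(gpio, 0);	/* assert reset */
	/* a real driver would delay here */
	gpio_set_value(gpio, 1);	/* release reset */
	return 0;
}
```

Since gpio_set_value() returns nothing, the return value of the direction call is the only place the helper can notice a bad GPIO number.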
Some GPIO controllers can generate interrupts when an input GPIO changes
value. In such cases, code wishing to handle such interrupts should start
by determining which IRQ number is associated with a given GPIO line:
int gpio_to_irq(unsigned int gpio);
The given gpio must have been obtained with
gpio_request() and put into the input mode first. If there is an
associated interrupt number, it will be passed back as the return value
from gpio_to_irq(); otherwise a negative error number will be
returned. Once obtained in this manner, the interrupt number can be passed
to request_irq() to set up the handling of the interrupt.
Finally, the GPIO subsystem is able to represent GPIO lines via a sysfs
hierarchy, allowing user space to query (and possibly modify) them. Kernel
code can cause a specific GPIO to appear in sysfs with:
int gpio_export(unsigned int gpio, bool direction_may_change);
The direction_may_change parameter controls whether user space is
allowed to change the direction of the GPIO; in many cases, allowing that
control would be asking for bad things to happen to the system as a whole.
A GPIO can be removed from sysfs with gpio_unexport() or given
another name with gpio_export_link().
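From the user-space side, an exported GPIO shows up as a directory under /sys/class/gpio. The following is a minimal sketch of reading a line's value from there; the helper names are invented for this example, and it assumes the line was exported under its default gpioN name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "/sys/class/gpio/gpioN/<attr>" into buf.
 * Returns 0 on success or -1 if the buffer is too small.
 */
static int gpio_sysfs_path(char *buf, size_t len, unsigned int gpio,
			   const char *attr)
{
	int n = snprintf(buf, len, "/sys/class/gpio/gpio%u/%s", gpio, attr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/*
 * Read a "value" attribute file.
 * Returns 0 or 1 for the line level, or -1 on any error.
 */
static int gpio_sysfs_read_value(const char *path)
{
	FILE *f = fopen(path, "r");
	int c;

	if (!f)
		return -1;	/* not exported, or no permission */
	c = fgetc(f);
	fclose(f);
	if (c == '0')
		return 0;
	if (c == '1')
		return 1;
	return -1;
}
```

A caller would build the path for the "value" attribute and pass it to gpio_sysfs_read_value(); the same path helper works for the "direction" attribute when the kernel side allowed the direction to change.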
And that is an overview of the kernel's low-level GPIO interface. A number
of details have naturally been left out; see Documentation/gpio.txt for a more thorough
description. Also omitted is the low-level driver's side of the API, by
which GPIO lines can be made available to the GPIO subsystem; covering that
API may be the subject of a future article. The next installment, though,
will look at a couple of perceived deficiencies in the above-described API
and how they might be remedied.
As part of the effort to support UEFI secure boot on Linux, Matthew Garrett
proposed a number of restrictions on kernel features so
that signed kernels could not be used to circumvent secure boot. Many of
those restrictions were fairly uncontroversial, but disabling
kexec() was not one of them, so it was dropped in a later patch set. At the time,
there was discussion of how to support kexec() in a secure boot
world; Vivek Goyal recently posted an RFC patch
set to start down that path.
The kexec() system call is used to replace the running kernel with
a different program. It can be used to boot a new kernel without going
through the BIOS or other firmware, which is exactly what gets it into
trouble for secure boot. A running kernel that has been verified by the
secure boot mechanism (and thus is trusted) could boot any unsigned, unverified
kernel by way of kexec(). The concern is that it would be used to
boot Windows in an insecure environment while making it believe it was
running under secure boot—exactly what secure boot is
meant to prevent. That, in turn, could lead to Linux bootloaders getting
blacklisted, which would make it more difficult to boot Linux on hardware
certified for Windows 8.
Goyal's patches add the ability to cryptographically sign ELF executables,
then have the
kernel verify those signatures. If the binary is signed and the signature
verifies, it will be executed. While the patch does not yet implement
this, the idea is that a signed
binary could be given additional capabilities if it
verifies—capabilities that would enable kexec(), for
example. If the binary is unsigned, it will always be
executed. Only if a signed binary fails to verify does it get blocked from
running.
The patches contain a signelf utility that puts a signature
based on the private key argument into a .signature ELF section. The
signature is calculated by hashing the contents of the PT_LOAD ELF segments,
then cryptographically signing the result. It
is based on the module signing code
that was recently added to the kernel, but instead of just tacking the
signature on at the end of the binary, it puts it into the
.signature section.
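To make the "hash the PT_LOAD segments" step concrete, here is a small userspace sketch that walks the program headers of a 64-bit ELF image held in memory and digests only the loadable segments. It is not taken from the signelf utility: the function names are invented, and FNV-1a stands in for the cryptographic hash that a real signing tool would use.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, used here only as a placeholder for a cryptographic hash. */
static uint64_t fnv1a_update(uint64_t h, const unsigned char *p, size_t n)
{
	while (n--) {
		h ^= *p++;
		h *= 0x100000001b3ULL;	/* 64-bit FNV prime */
	}
	return h;
}

/*
 * Digest the file contents of every PT_LOAD segment of a 64-bit ELF
 * image. Returns 0 and stores the digest in *out, or -1 if the image
 * is malformed.
 */
static int digest_pt_load(const unsigned char *img, size_t len, uint64_t *out)
{
	Elf64_Ehdr eh;
	Elf64_Phdr ph;
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */
	size_t i;

	if (len < sizeof(eh))
		return -1;
	memcpy(&eh, img, sizeof(eh));
	if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
	    eh.e_ident[EI_CLASS] != ELFCLASS64 ||
	    eh.e_phentsize != sizeof(ph) ||
	    eh.e_phoff > len)
		return -1;

	for (i = 0; i < eh.e_phnum; i++) {
		uint64_t off = eh.e_phoff + (uint64_t)i * sizeof(ph);

		if (off > len || len - off < sizeof(ph))
			return -1;	/* header table runs past the image */
		memcpy(&ph, img + off, sizeof(ph));
		if (ph.p_type != PT_LOAD)
			continue;	/* only loadable segments are covered */
		if (ph.p_offset > len || len - ph.p_offset < ph.p_filesz)
			return -1;	/* segment runs past the image */
		h = fnv1a_update(h, img + ph.p_offset, ph.p_filesz);
	}
	*out = h;
	return 0;
}
```

Because only PT_LOAD contents feed the digest, a change elsewhere in the file leaves it untouched, while any change to the loaded code or data alters it.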
Since any shared libraries used by an executable cannot be trusted (so
far, at least, there is no mechanism to verify those libraries), only
statically linked executables can be signed and verified. The patches do
not stop binaries from using
dlopen() directly, however, so Goyal said binaries that do so
should not be
signed. He is
targeting the /sbin/kexec binary that is used to launch
kdump, so that users can still get crash dumps, even in a
secure-boot-enabled system, but there are other possible uses as well.
When the binfmt_elf loader in the kernel detects a binary with
the .signature section, it locks the pages of the executable into
memory and verifies the signature. Goyal is trying to avoid situations
where the binary is modified after the verification has been done, which is
why the executable is locked into memory.
If the signature does not verify, the
process is killed; unsigned binaries are simply executed as usual.
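The three outcomes described above amount to a small decision table, sketched here in plain C. The enum and function names are invented for illustration; in particular, the "trusted" outcome reflects the planned extra capabilities, which the text says the patches do not yet implement.

```c
#include <stdbool.h>

enum exec_verdict {
	EXEC_RUN,		/* run with ordinary privileges */
	EXEC_RUN_TRUSTED,	/* run; could later be given extra capabilities */
	EXEC_KILL,		/* signature present but invalid */
};

/* Hypothetical summary of the policy applied when a binary is executed. */
static enum exec_verdict elf_sig_policy(bool has_signature, bool sig_valid)
{
	if (!has_signature)
		return EXEC_RUN;	/* unsigned binaries run as usual */
	return sig_valid ? EXEC_RUN_TRUSTED : EXEC_KILL;
}
```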
Beyond just adding the capability for kexec(), there are some
other pieces of the puzzle that aren't addressed in the patches. The
biggest is the need to disable ptrace() on signed binaries.
Otherwise, the signed binary could be subverted in various ways—changing the binary passed to kexec(), for example. In addition,
the "to do" list has
some key and keyring related issues that need to be sorted out.
There is already a mechanism in the kernel to verify the signature of
various kinds of files, though. The Integrity Measurement Architecture
(IMA) appraisal extension that was added in Linux 3.7 does much of what
Goyal needs, as was pointed out by IMA maintainer
Mimi Zohar. While the integrity subsystem targets measuring and verifying
the whole system, it already does most of the kinds of signature operations
Goyal is looking to add. On the other hand, features like disabling
ptrace(), locking the binary into memory, and setting capabilities
based on signature verification are well beyond the scope
of the integrity subsystem. Goyal is currently looking into using the
integrity features and adding secure-boot-specific features on top.
Losing the ability to use kexec() on secure boot systems would be
rather painful. While Garrett's patches do not actually make that change
(because of the outcry from other kernel developers), any distribution that
is trying to enable secure boot is likely to do so. Finding a way to
support that use case, without unduly risking the blacklist wrath of
Microsoft, would be good.
Deadlocks in the kernel are a relatively rare occurrence in recent years.
The credit largely belongs to the "lockdep"
subsystem, which watches
locking activity and points out patterns that could lead to deadlocks when
the timing goes wrong. But locking is not the source of all deadlock
problems, as was shown by an old deadlock bug that was only
recently found and fixed.
In early January, Alex Riesen reported some
difficulties with USB devices on recent kernels; among other things, it was
easy to simply lock up the system altogether. A fair amount of discussion
followed before Ming Lei identified the
problem. It comes down to the block layer's use of the asynchronous function call infrastructure used
to increase parallelism in the kernel.
The asynchronous code is relatively simple in concept: a function that is to be run
asynchronously can be called via async_schedule(); it will then
run in its own thread at some future time. There are various ways of
waiting until asynchronously called functions have completed; the most
thorough is async_synchronize_full(), which waits until all
outstanding asynchronous function calls anywhere in the kernel have
completed. There are ways of
waiting for specific functions to complete, but, if the caller does not
know how many asynchronous function calls may be outstanding,
async_synchronize_full() is the only way to be sure that they are
all done.
The block layer in the kernel makes use of I/O schedulers to organize and
optimize I/O operations. There are several I/O schedulers available; they
can be switched at run time and can be loaded as modules. When the block
layer finds that it needs an I/O scheduler that is not currently present in
the system, it will call request_module() to ask user space to
load it. The module loader, in turn, will call
async_synchronize_full() at the end of the loading process; it
needs to ensure that any asynchronous functions called by the newly loaded
module have completed so that the module will be fully ready by the time
control returns to user space.
So far so good, but there is a catch. When a new block device is
discovered, the block layer will do its initial work (partition probing and
such) in an asynchronous function of its own. That work requires
performing I/O to the device; that, in turn, requires an I/O scheduler. So
the block layer may well call request_module() from code that is
already running as an asynchronous function. And that is where things turn
bad.
The problem is that the (asynchronous) block code must wait for
request_module() to complete before it can continue with its
work. As described above, the module loading process involves a call to
async_synchronize_full(). That call will wait for all
asynchronous functions, including the one that called
request_module() in the first place, and which is still waiting
for request_module() to complete. Expressed more concisely, the
sequence looks like this:
1. sd_probe() calls async_schedule() to scan a device.
2. The scanning process tries to read data from the device.
3. The block layer realizes it needs an I/O scheduler, so, in
   elevator_get(), it calls request_module() to load
   the relevant kernel module.
4. The module is loaded and initializes itself.
5. do_module_init() calls async_synchronize_full() to
   wait for any asynchronous functions called by the just-loaded module.
6. async_synchronize_full() waits for all asynchronous
   functions, including the one
   called back in step 1, which is waiting for the
   async_synchronize_full() call to complete.
That, of course, is a classic deadlock.
Fixing that deadlock turns out not to be as easy as one would like. Ming suggested
that the call to async_synchronize_full() in the module loader
should just be removed, and that user space should be taught that devices
might not be ready immediately when the modprobe binary
completes. Linus was not impressed with
this approach, however, and it was quickly discarded.
The optimal solution would be for the module loader to wait only for
asynchronous functions that were called by the loaded module itself. But
the kernel does not currently have the infrastructure to allow that to
happen; adding it as an urgent bug fix is not really an option. So
something else needed to be worked out. To that end, Tejun Heo was brought
into the discussion and asked to help come up with a solution. Tejun
originally thought that the problem could
be solved by detecting deadlock situations and proceeding without waiting
in that case, but the problem of figuring out when it would be safe to
proceed turned out not to be tractable.
The solution that emerged instead is
regarded as a bit of a hack by just about everybody involved. Tejun added
a new process flag (PF_USED_ASYNC) to mark when a process has
called asynchronous functions. The module loader then tests this flag; if
no asynchronous functions are called as the module is loaded, the call to
async_synchronize_full() is skipped. Since the I/O scheduler
modules make no such calls, that check avoids the deadlock in this
particular case. Obviously, the problem remains in any case where an
asynchronously-loaded module calls asynchronous functions of its own, but
no other such cases have come to light at the moment. So it seems like a
safe enough fix for now.
Even so, Tejun remarked "It makes me feel dirty but makes the problem
go away and I can't think of anything better." The patch has found
its way into the mainline and will be present in the 3.8 final
release. By then, though, it would not be entirely surprising if somebody
else were to take up the task of finding a more elegant solution for a
future development cycle.
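The PF_USED_ASYNC check described above can be modeled in a few lines of userspace C. Everything here is a simplified stand-in: the flag's bit value is arbitrary, struct task replaces the kernel's process structure, the asynchronous function is never actually run, and the mock async_synchronize_full() merely counts how often it would have been called.

```c
#define PF_USED_ASYNC 0x1u	/* arbitrary bit chosen for this model */

struct task {
	unsigned int flags;
};

static int full_syncs;	/* calls to the mock async_synchronize_full() */

/* Mock: record that this task has queued asynchronous work. */
static void async_schedule(struct task *tsk, void (*fn)(void))
{
	tsk->flags |= PF_USED_ASYNC;
	(void)fn;	/* the model never runs the function */
}

/* Mock: the real call waits for all outstanding asynchronous work. */
static void async_synchronize_full(void)
{
	full_syncs++;
}

/* Model of the loader: wait only if the init function used async calls. */
static void load_module(struct task *tsk, void (*init)(struct task *))
{
	tsk->flags &= ~PF_USED_ASYNC;	/* start with a clean slate */
	init(tsk);
	if (tsk->flags & PF_USED_ASYNC)
		async_synchronize_full();
}

/* An I/O-scheduler-like module: its init queues no async work. */
static void iosched_init(struct task *tsk)
{
	(void)tsk;
}

static void probe_work(void)
{
}

/* A driver-like module: its init queues async work. */
static void driver_init(struct task *tsk)
{
	async_schedule(tsk, probe_work);
}
```

Loading the scheduler-like module skips the full synchronization, which is what breaks the cycle; loading the driver-like module still waits, which is why the problem remains for any module of that kind loaded from asynchronous context.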
Page editor: Jonathan Corbet