The current stable 2.6 kernel is 18.104.22.168, released
on October 16. It
contains a rather long list of fixes for problems which have been
encountered in 2.6.18.
The stable team has also released 22.214.171.124 with a smaller set of
fixes. This will probably be the final 2.6.17.x release.
Adrian Bunk has released 126.96.36.199-rc1 with several new
The current 2.6 prepatch is 2.6.19-rc2, released by Linus on
October 13. There's a bunch of fixes here, but also the big interrupt handler prototype
change and the initial merge of the developmental ext4 filesystem with
a few enhancements. See the
long-format changelog for the details.
Around 250 post-rc2 patches - almost all fixes - have gone into the
mainline git repository as of this writing.
The current -mm tree is 2.6.19-rc2-mm1. Recent changes
to -mm include generic backlight device support, some changes to how
per-CPU data works on i386, and a FUSE update. There is also a new
round_jiffies() function which rounds a time value up to the next
whole second. The idea is to cause recurring timers to go off at the same
time, reducing the number of timer interrupts needed.
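The rounding idea can be illustrated in user space. What follows is a minimal sketch, not the -mm implementation: the name sketch_round_up_to_second() and the tick rate SKETCH_HZ are assumptions made for the example, and the real round_jiffies() may well do more than this.

```c
/* Assumed tick rate for this sketch; the kernel's HZ is set at build time. */
#define SKETCH_HZ 250UL

/*
 * Round a tick count up to the next whole-second boundary.  A value
 * which already sits on a boundary is returned unchanged.
 */
unsigned long sketch_round_up_to_second(unsigned long ticks)
{
    unsigned long rem = ticks % SKETCH_HZ;

    if (rem == 0)
        return ticks;
    return ticks + (SKETCH_HZ - rem);
}
```

Two timers set for ticks 1001 and 1100 would both be moved to tick 1250 here, so a single wakeup can service both of them.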
Kernel development news
Wow, who'd have thought that loading 6 megabytes of unauditable
code into your kernel and X server might be a bad idea? It's almost
like code running as root was some sort of potential security
issue, or something.
-- Matthew Garrett
The function pci_set_mwi() enables the "memory write and
invalidate" (MWI) mode on the PCI bus. If the device on the other end can
work with MWI, a small optimization results. The MWI mode might not be
enabled, however, even if a device driver requests it; the bus hardware
itself might not support it. A failure to set MWI is not generally a
problem; things just go a bit slower than they would have otherwise. The
calling driver might still want to know if the call succeeded, however, so
Matthew Wilcox recently fixed the function to return -EINVAL
if the attempt fails.
It turns out that this is one of the many patches which have recently
sabotaged Andrew Morton's heavily abused Vaio laptop. Some code was
checking the result of pci_set_mwi(); once that function actually
returned the result of the operation, the calling code failed on an error
path. But, as noted above, a failure to set MWI is almost never a fatal
problem. So, in response to this series of events, Alan Cox asserted:
The underlying bug is that someone marked pci_set_mwi must-check,
that's wrong for most of the drivers that use it. If you remove the
must check annotation from it then the problem and a thousand other
spurious warnings go away.
One suspects Alan is also behind code like the following, from
compiler_warning_pointless_fix = pci_set_mwi(cs5530_0);
The __must_check annotation makes use of the gcc
warn_unused_result attribute; it first found its way into the
mainline in 2.6.8. If a function is marked __must_check, the
compiler will issue a strong warning whenever the function is called and
its return code is unused.
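The effect of the attribute is easy to see outside the kernel. The sketch below is a user-space illustration only: sketch_must_check is a stand-in macro defined here on top of gcc's warn_unused_result, and sketch_copy() and sketch_caller() are hypothetical helpers, not kernel functions.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the kernel's annotation, built on the gcc attribute. */
#define sketch_must_check __attribute__((warn_unused_result))

/*
 * Hypothetical helper: copy src into dst, returning 0 on success or
 * -1 if dst is too small to hold the string.
 */
sketch_must_check
int sketch_copy(char *dst, size_t dst_len, const char *src)
{
    size_t need = strlen(src) + 1;

    if (need > dst_len)
        return -1;
    memcpy(dst, src, need);
    return 0;
}

/* A well-behaved caller: the return value is examined and acted upon. */
int sketch_caller(char *dst, size_t dst_len, const char *src)
{
    /*
     * Writing just "sketch_copy(dst, dst_len, src);" here would draw
     * an unused-result warning from gcc.
     */
    if (sketch_copy(dst, dst_len, src) != 0)
        return -1;
    return 0;
}
```

Assigning the result to a throwaway variable, as in the workaround quoted above, silences the warning without actually checking anything.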
The use of __must_check is another step in the long path toward
automatic detection of potential bugs. It is intended for functions whose
return value really does require checking - copy_from_user() is a
good example. If that function fails, and the calling code does not
notice, it will proceed using essentially random data. Similar issues come
up in user space; witness the recent vulnerabilities resulting from
privileged applications which fail to check the result of a
setuid() call. In some cases, there clearly is no excuse for not
looking at the return value, and __must_check is a good way to
find incorrect function usage before it creates real problems.
In current kernels, however, the list of __must_check functions
has grown rather long: it includes most of the sysfs, PCI, kobject, and
driver core APIs. In some cases, as with pci_set_mwi(), it now
includes functions whose return values are often of no interest to the
calling code. The result, in this case, is snide workarounds in the code,
added warning noise, and an actual bug where code which need not fail does
so in response to an error return code.
Still, according to Andrew Morton, it is a
mistake to ignore an error return from a function like
pci_set_mwi():
You, the driver author _do not know_ what pci_set_mwi() does at
present, on all platforms, nor do you know what it does in the
future. For you the driver author to make assumptions about what's
happening inside pci_set_mwi() is a layering violation. Maybe the
bridge got hot-unplugged. Maybe the attempt to set MWI caused some
synchronous PCI error. For example, take a look at the various
implementations of pci_ops.read() around the place - various of
them can fail for various reasons.
This discussion led, eventually, to what might be the real issue: how
should in-kernel APIs be designed to properly return status information? A
suggestion which has been made is that pci_set_mwi() should return
zero or one, depending on whether MWI is a possible operating mode. Only
if something goes drastically wrong on the PCI bus should a negative error
code be returned. No such patch has yet been merged, but that seems like
the way this particular issue is likely to be resolved.
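The suggested convention might look something like the following. This is a user-space sketch of the idea only, since no such patch has been merged: struct sketch_dev, sketch_set_mwi(), and sketch_probe() are hypothetical names, and -EIO is simply one plausible choice of error code.

```c
#include <errno.h>

/* Hypothetical stand-in for a device and the bus it sits on. */
struct sketch_dev {
    int bus_ok;       /* zero if the bus itself has failed */
    int mwi_capable;  /* nonzero if MWI is a possible operating mode */
};

/*
 * Returns 1 if MWI was enabled, 0 if it is simply not available, or
 * a negative error code if something went drastically wrong.
 */
int sketch_set_mwi(const struct sketch_dev *dev)
{
    if (!dev->bus_ok)
        return -EIO;
    return dev->mwi_capable ? 1 : 0;
}

/* A caller only needs to treat negative values as failure. */
int sketch_probe(const struct sketch_dev *dev)
{
    int ret = sketch_set_mwi(dev);

    if (ret < 0)
        return ret;  /* a real error: pass it up the call chain */
    /* ret == 0 just means running without the optimization */
    return 0;
}
```

With this arrangement a driver can check every return value, as __must_check demands, without failing when MWI is merely unavailable.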
The larger discussion of how errors should be handled may just be beginning,
however. There are a number of de-facto conventions for kernel APIs which
have evolved over time, but no overall policy on error handling. So Andrew
would like to talk about guidelines on how
different kinds of errors should be handled. In particular, he suggests a
rule that a negative error code should never be ignored in any situation.
Cases where this kind of result is not relevant (pci_set_mwi()
being an example) are an indication of an API in need of a redesign.
So over time, it would not be surprising to see a number of kernel
interfaces shift such that a number of error conditions are handled further
down the call chain and with the goal of not returning error codes for
non-error situations. There is also likely to be a continued effort to cut
down on the warning noise, which, at times, threatens to drown out the real
errors. With luck, all of this work will lead to safer interfaces and a more
robust kernel in the future.
The sysctl() system call has had a rough life. It began as an
idea imported from BSD; it allows a user-space process to tweak various
kernel parameters using a set of integer indexes. People quickly
discovered, however, that a text and filesystem-based interface (as seen
in /proc/sys) is much easier to deal with. The /proc/sys
hierarchy can be adjusted from the shell and manipulated
by scripts - and nobody has to worry about sysctl numbers. So there are
very few users of sysctl(), which has been considered deprecated
for a long time. Recent kernels have issued warnings when
sysctl() is called.
The 2.6.19-rc kernels take things one step further: for most
configurations, sysctl() disappears altogether. In a strange sort
of turnaround, only configurations with the "embedded" option set can
enable sysctl() at all. This is all in accordance with the
feature removal schedule, which calls for sysctl() to go away in
But sysctl() is part of the user-space API, which is never
supposed to be broken for any reason. The removal of this function would
appear to be a violation of the oft-repeated promise to keep this interface
stable. So some developers have started to
complain about the API change. There have been calls to back it out again,
and to restore sysctl() to normal configurations. As Alan Cox put it: "We added it, we supported it, we
get to keep it. We just stick notes in the docs saying 'please use /proc
Patches which restore sysctl() are circulating, though none
have been merged. There appears to be some disagreement over whether
removing sysctl() would truly break user-space applications or
not. There are some uses of it in older C libraries, but, apparently,
those libraries do the right thing when the attempt to use
sysctl() fails, and applications operate normally. Linus has asked for an example of an application which
truly breaks in the absence of sysctl(); none have been posted as
of this writing. Interfaces
which are not actually used on real systems are fair game for removal, so,
unless somebody comes up with a real-world problem soon,
sysctl() will likely continue on its path out of the kernel.
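For comparison, here is roughly what using the text-based interface looks like from C. It is a minimal sketch: sketch_read_param() is a hypothetical helper, and the path is passed in by the caller rather than assumed.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line of a parameter file into buf, stripping the
 * trailing newline.  Returns 0 on success, -1 on any failure.
 */
int sketch_read_param(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}
```

Called with a path like /proc/sys/kernel/ostype, this fills the buffer with the kernel's answer as plain text; no integer index is needed, and the same file can be read from the shell with cat.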
This is the second article in the LWN series on writing drivers for the
Video4Linux2 kernel interface; those who have not yet seen the introductory article
may wish to start there. This installment will look at the overall structure
of a Video4Linux driver and the device registration process.
Before starting, it is worth noting that there are two resources which will
prove invaluable for anybody working with video drivers:
- The V4L2 API
Specification. This document covers the API from the user-space
point of view, but, to a great extent, V4L2 drivers implement that API
directly. So most of the structures are the same, and the semantics
of the V4L2 calls are clearly laid out. Print a copy (consider
cutting out the Free Documentation License text to save trees) and
keep it somewhere within easy reach.
- The "vivi" driver found in the kernel source as
drivers/media/video/vivi.c. It is a virtual driver, in that
it generates test patterns and does not actually interface to any
hardware. As such, it serves as a relatively clear illustration of
how V4L2 drivers should be written.
To start, every V4L2 driver must include the requisite header file:
#include <linux/videodev2.h>
Much of the needed information is there. When digging through the headers
as a driver author, however, you'll also want to have a look at
include/media/v4l2-dev.h, which defines many of the structures you'll
be working with.
A video driver will probably have sections which deal with the PCI or USB
bus (for example); we'll not spend much time on that part of the driver
here. There is often an internal i2c interface, which will be
examined later on in this article series. Then, there is the interface to
the V4L2 subsystem. That interface is built around struct
video_device, which represents a V4L2 device. Covering everything
that goes into this structure will be the topic of several articles; here
we'll just have an overview.
The name field of struct video_device is a name for the
type of device; it will appear in kernel log messages and in sysfs. The
name usually matches the name of the driver.
There are two fields to describe what type of device is being represented.
The first (type) looks like a holdover from the Video4Linux1 API;
it can have one of four values:
- VFL_TYPE_GRABBER indicates a frame grabber device - including
cameras, tuners, and such.
- VFL_TYPE_VBI is for devices which pull information
transmitted during the video blanking interval.
- VFL_TYPE_RADIO for radio devices.
- VFL_TYPE_VTX for videotext devices.
If your device can perform more than one of the above functions, a separate
V4L2 device should be registered for each of the supported functions. In
V4L2, however, any of the registered devices can be called upon to function
in any of the supported modes. What it comes down to is that, for V4L2,
there is really only need for a single device, but compatibility with the
older Video4Linux API requires that individual devices be registered for
The second field, called type2, is a bitmask describing the
device's capabilities in more detail. It can contain any of the following
values:
- VID_TYPE_CAPTURE: the device can capture video data.
- VID_TYPE_TUNER: it can tune to different frequencies.
- VID_TYPE_TELETEXT: it can grab teletext data.
- VID_TYPE_OVERLAY: it can overlay video data directly
into the frame buffer.
- VID_TYPE_CHROMAKEY: a special form of overlay capability
where the video data is only displayed where the underlying
frame buffer contains pixels of a specific color.
- VID_TYPE_CLIPPING: it can clip overlay data.
- VID_TYPE_FRAMERAM: it uses memory located in the frame buffer
- VID_TYPE_SCALES: it can scale video data.
- VID_TYPE_MONOCHROME: it is a monochrome-only device.
- VID_TYPE_SUBCAPTURE: it can capture sub-areas of the image.
- VID_TYPE_MPEG_DECODER: it can decode MPEG streams.
- VID_TYPE_MPEG_ENCODER: it can encode MPEG streams.
- VID_TYPE_MJPEG_DECODER: it can decode MJPEG streams.
- VID_TYPE_MJPEG_ENCODER: it can encode MJPEG streams.
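Since type2 is a bitmask, capabilities are combined with a bitwise OR and tested with a bitwise AND. The sketch below shows the pattern in user space; the SKETCH_TYPE_* constants and their bit positions are assumptions made for this example, and a real driver would use the VID_TYPE_* values from the kernel headers.

```c
/* Illustrative flags only; bit positions are assumed for the sketch. */
#define SKETCH_TYPE_CAPTURE  (1 << 0)
#define SKETCH_TYPE_TUNER    (1 << 1)
#define SKETCH_TYPE_OVERLAY  (1 << 3)

/* Capability mask for a hypothetical TV card: capture plus tuner. */
int sketch_tv_card_caps(void)
{
    return SKETCH_TYPE_CAPTURE | SKETCH_TYPE_TUNER;
}

/* Nonzero if every bit in capability is set in type2. */
int sketch_can(int type2, int capability)
{
    return (type2 & capability) == capability;
}
```

Here sketch_can(sketch_tv_card_caps(), SKETCH_TYPE_TUNER) is true, while the same test for SKETCH_TYPE_OVERLAY is false.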
Another field initialized by all V4L2 drivers is minor, which is
the desired minor number for the device. Usually this field will be set to
-1, which causes the Video4Linux subsystem to allocate a minor number at
registration time.
There are also three distinct sets of function pointers found within
struct video_device. The first, consisting of a single function,
is the release() method. If a device lacks a release()
function, the kernel will complain (your editor was amused to note that it
refers offending programmers to an LWN article). The release()
function is important: for various reasons, references to a
video_device structure can remain long after the last video
application has closed its file descriptor. Those references can remain
after the device has been unregistered. For this reason, it is not safe to
free the structure until the release() method has been called.
So, often, this function consists of a simple kfree() call.
The video_device structure contains within it a
file_operations structure with the usual function pointers. Video
drivers will always need open() and release() operations;
note that this release() is called whenever the device is
closed, not when it can be freed as with the other function with the same
name described above. There will often be a read() or
write() method, depending on whether the device performs input or
output; note, however, that for streaming video devices, there are other
ways of transferring data. Most devices which handle streaming video data
will need to implement poll() and mmap(). And
every V4L2 device needs an ioctl() method - but they can
use video_ioctl2(), which is provided by the V4L2 subsystem.
The third set of methods, stored in the video_device structure
itself, makes up the core of the V4L2 API. There are several dozen of
them, handling various device configuration operations, streaming I/O, and
Finally, a useful field to know from the beginning is debug.
Setting it to either (or both - it's a bitmask) of V4L2_DEBUG_IOCTL and
V4L2_DEBUG_IOCTL_ARG will yield a fair amount of debugging output
which can help a befuddled programmer figure out why a driver and an
application are failing to understand each other.
Video device registration
Once the video_device structure has been set up, it should be
registered with:
int video_register_device(struct video_device *vfd, int type, int nr);
Here, vfd is the device structure, type is the same value
found in its type field, and nr is, again, the desired
minor number (or -1 for dynamic allocation). The return value should be
zero; a negative error code indicates that something went badly wrong. As
always, one should be aware that the device's methods can be called
immediately once the device is registered; do not call
video_register_device() until everything is ready to go.
A device can be unregistered with:
void video_unregister_device(struct video_device *vfd);
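The ordering involved (fill in everything, then register, and free only from release()) can be sketched with user-space stand-ins. Nothing below is the kernel's code: struct sketch_video_device carries only fields named in this article, sketch_register() is a stub playing the role of video_register_device(), and the device name "sketchcam" is made up.

```c
#include <stdlib.h>
#include <string.h>

#define SKETCH_TYPE_GRABBER 0  /* placeholder value for this sketch */

/* Cut-down stand-in holding only fields discussed in the text. */
struct sketch_video_device {
    char name[32];
    int type;
    int minor;
    void (*release)(struct sketch_video_device *vfd);
};

int sketch_released;  /* counts release() calls, for demonstration */

/* As noted above, release() is often little more than a free. */
void sketch_vfd_release(struct sketch_video_device *vfd)
{
    free(vfd);
    sketch_released++;
}

/* Stub: hands out the next minor number when nr is -1. */
int sketch_register(struct sketch_video_device *vfd, int type, int nr)
{
    static int next_minor;

    vfd->type = type;
    vfd->minor = (nr == -1) ? next_minor++ : nr;
    return 0;
}

/* Fill in every field first; register only when ready to be called. */
struct sketch_video_device *sketch_setup(void)
{
    struct sketch_video_device *vfd = calloc(1, sizeof(*vfd));

    if (!vfd)
        return NULL;
    strncpy(vfd->name, "sketchcam", sizeof(vfd->name) - 1);
    vfd->minor = -1;
    vfd->release = sketch_vfd_release;
    if (sketch_register(vfd, SKETCH_TYPE_GRABBER, -1) < 0) {
        free(vfd);
        return NULL;
    }
    return vfd;
}
```

In a real driver the structure would be handed to video_unregister_device() at teardown, with the memory freed only once the subsystem calls release().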
Stay tuned for the next article in this series, which will begin to look at
the implementation of some of these methods.
open() and release()
Every V4L2 device will need an open() method, which will have the
following prototype:
int (*open)(struct inode *inode, struct file *filp);
The first thing an open() method will normally do is to locate an
internal device corresponding to the given inode; this is done by
keying on the minor number stored in inode. A certain amount of
initialization can be performed; this can also be a good time to power up
the hardware if it has a power-down option.
The V4L2 specification defines some conventions which are relevant here.
One is that, by design, all V4L2 devices can have multiple open file
descriptors at any given time. The purpose here is to allow one
application to display (or generate) video data while another one, perhaps,
tweaks control values. So, while certain V4L2 operations (actually reading
and writing video data, in particular) can be made
exclusive to a single file descriptor, the device as a whole should support
multiple open descriptors.
Another convention worth mentioning is that the open() method
should not, in general, make changes to the operating parameters currently
set in the hardware. It should be possible to run a command-line program
which configures a camera according to a certain set of desires
(resolution, video format, etc.), then run an entirely separate application
to, for example, capture a frame from the camera. This mode would not work
if the camera's settings were reset in the middle, so a V4L2 driver should
endeavor to keep existing settings until an application explicitly resets
them.
The release() method performs any needed cleanup. Since video
devices can have multiple open file descriptors, release() will
need to decrement a counter and check before doing anything radical. If
the just-closed file descriptor was being used to transfer data, it may be
necessary to shut down the DMA engine and perform other cleanups.
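The bookkeeping described in this section can be modeled in a few lines of user-space C. This is a sketch under assumptions, not driver code: struct sketch_cam and the sketch_cam_*() functions are hypothetical, an opaque handle pointer stands in for the struct file, and -EBUSY is one reasonable answer for a second would-be streamer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device state for the sketch. */
struct sketch_cam {
    int users;             /* number of open file descriptors */
    int powered;           /* nonzero while the hardware is powered up */
    int dma_running;       /* nonzero while data is being transferred */
    const void *streamer;  /* handle which owns data transfer, or NULL */
};

/* Multiple opens are allowed; existing settings are left alone. */
int sketch_cam_open(struct sketch_cam *cam)
{
    if (cam->users++ == 0)
        cam->powered = 1;  /* first open: power up the hardware */
    return 0;
}

/* Data transfer is exclusive to a single open handle. */
int sketch_cam_start_stream(struct sketch_cam *cam, const void *handle)
{
    if (cam->streamer && cam->streamer != handle)
        return -EBUSY;
    cam->streamer = handle;
    cam->dma_running = 1;
    return 0;
}

/* Called on every close; radical cleanup waits for the last one. */
void sketch_cam_release(struct sketch_cam *cam, const void *handle)
{
    if (cam->streamer == handle) {
        cam->dma_running = 0;  /* the streaming handle went away */
        cam->streamer = NULL;
    }
    if (--cam->users == 0)
        cam->powered = 0;  /* last close: power down */
}
```

A second handle can still open the device and adjust controls while the first one streams; only a second attempt to transfer data is refused.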
The next installment in this series will start into the long process of
querying device capabilities and configuring operating modes. Stay tuned.
Page editor: Jonathan Corbet