Brief items
The current ultra-stable 2.6 kernel is 2.6.11.4, which was
released on March 15; it contains two
security fixes. Previously,
2.6.11.3 was
released on March 12 with a larger set of fixes. The form of the
2.6.11.x patches has changed slightly: they now apply directly to the
2.6.11 root, rather than to the previous .x release.
There have still been no 2.6.12 prepatches, though it looks like one
should appear soon.
When that prepatch shows up, it will include over 2000 patches currently
sitting in Linus's BitKeeper repository. These include a driver for the
"trusted computing" TPM chip (see the Trusted Computing
Group site for more information on TPM), SuperHyway bus
support, a new multi-level security implementation for SELinux, a user-mode
Linux update, support for hot-pluggable parallel ports, the "cpuset" patch
(see cpusets.txt for information on cpusets),
a new nVidia framebuffer driver, the device
mapper multipath patches, a big set of input driver patches, an ALSA
update, an IPv6 update (including a patch removing the "experimental"
designation for IPv6), a rearrangement of the net_device structure
(which will break binary-only drivers), a 21,000-line DVB whitespace
cleanup patch, a rework of the page table access functions (which is still
causing some trouble on ia-64), a patch allowing an administrator to enable
a subset of the "magic SysRq" functions,
numerous driver updates, the address space randomization patches, a new
packet classifier mechanism for the networking layer, a new workqueue API
function, a
Tiger digest algorithm implementation, the restoration of the Philips
webcam driver, some software suspend improvements, some readahead
improvements, a big block I/O barrier rewrite (which enables full barrier
support on serial ATA drives), a set of patches to shrink the kernel for
embedded use, a generic sort() function, high-resolution POSIX
CPU clock support (not the full high-resolution timers patch), a USB API
change (usb_control_msg() and usb_bulk_msg() now take a
timeout in milliseconds rather than in jiffies), and lots of fixes.
Also to be found in BitKeeper is an (almost) direct merge of the first three
2.6.11.x releases.
The current -mm patch is 2.6.11-mm4.
Recent changes to -mm include a big CFQ I/O scheduler update, a new and
smaller relayfs patch, a set of sparse memory support patches, a
performance counter API update, a reiser4 update, and various fixes.
The current 2.4 prepatch remains 2.4.30-pre3; there have been no 2.4
prepatches since March 9.
Kernel development news
This patch causes a CONFIG_PREEMPT=y, CONFIG_PREEMPT_BKL=y,
CONFIG_DEBUG_PREEMPT=y kernel on a ppc64 G5 to hang immediately after
displaying the penguins, but apparently not before having set the hardware
clock backwards 101 years.
After having carefully reviewed the above description and having decided
that these effects were not a part of the patch's design intent I have
temporarily set it aside, thanks.
-- Andrew Morton
LWN is happy to host
an online version of Linux
Device Drivers, Third Edition by Jonathan Corbet, Alessandro
Rubini, and Greg Kroah-Hartman. As of this writing, only the PDF version
of the book is available; it will eventually be released in HTML and
DocBook form as well. The book has been released under the
Creative Commons
Attribution-ShareAlike license, but you're going to want to run out and
buy a copy or three anyway.
It is a nice thing when hardware vendors provide Linux drivers for their
products. Since these drivers are written by the vendor, there is usually
no trouble getting information on how the hardware is controlled. With luck, that
hardware will "just work" for Linux users, and all will be as it should
be. In the real world, however, things are not always that simple.
Hardware companies often take interesting approaches to coding drivers,
and the people involved are not always well tied into the Linux kernel
development community. The result can be conflicts between the vendors,
who simply want to get things done, and the kernel developers, who are
increasingly unwilling to accept code which does not meet their standards.
For a current example, consider the proposed
new Neterion/S2io 10GbE network driver. This driver has been rewritten
from the beginning; it supports many of the hardware's advanced features
and provides high performance. It looks like just the thing for high-end
Linux-based networking uses.
The problem is that the driver does not deal directly with the Linux kernel
API. It is, instead, based on a "hardware abstraction layer" (HAL) which
glues the driver to the kernel. So, for example, the driver builds lists
with a structure like:
    typedef struct xge_list_t {
        struct xge_list_t* prev;
        struct xge_list_t* next;
    } xge_list_t;
Such lists are accessed with functions like xge_list_insert() and
even xge_list_for_each(). Similarly, the driver uses
xge_os_spin_lock() to acquire a lock, xge_os_malloc() to
allocate memory, and xge_os_pio_mem_read8() to read a byte from
I/O memory. This approach helps Neterion support a variety of systems with
the same core driver code, but it does not sit well with the kernel
hackers. Networking maintainer David Miller responded this way:
I totally reject this driver, HAL is unacceptable for in-tree
drivers. We've been over this a thousand times.
One problem with the HAL approach is that there can be a performance cost.
A 10G network adaptor can handle hundreds of thousands of packets per second; at that
sort of load, even the minimal overhead of a simple wrapper function can
make a significant difference. The extra memory taken by the glue code,
parallel linked list implementation, etc. also hurts. A developer
community which is dedicated to obtaining the best possible performance
from the hardware will be unwilling to swallow even a small cost in the
name of portability.
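The layering described above can be sketched in user-space C. This is only an illustration of the pattern, not the Neterion code: every hal_ name below is invented for the example, and malloc() stands in for whatever allocator a real operating system would provide. The point is that the "core" function at the bottom never calls the host system's API directly.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical OS-abstraction layer: every name beginning with hal_ is
 * invented for this sketch and is not taken from the Neterion driver. */
typedef struct hal_list {
    struct hal_list *prev;
    struct hal_list *next;
} hal_list_t;

/* Per-OS glue: on another system only these wrappers would change. */
static void *hal_os_malloc(size_t size) { return malloc(size); }
static void hal_os_free(void *p) { free(p); }

/* Initialize an empty circular list whose head points at itself. */
static void hal_list_init(hal_list_t *head)
{
    head->prev = head;
    head->next = head;
}

/* Insert item immediately after head. */
static void hal_list_insert(hal_list_t *item, hal_list_t *head)
{
    item->next = head->next;
    item->prev = head;
    head->next->prev = item;
    head->next = item;
}

/* Count the entries on the list, not including the head itself. */
static size_t hal_list_length(const hal_list_t *head)
{
    size_t n = 0;
    for (const hal_list_t *p = head->next; p != head; p = p->next)
        n++;
    return n;
}

/* "Core driver" code: written purely against the hal_ API, so it never
 * mentions malloc() or any other host-system call. */
static hal_list_t *core_add_entry(hal_list_t *head)
{
    hal_list_t *item = hal_os_malloc(sizeof(*item));
    if (item != NULL)
        hal_list_insert(item, head);
    return item;
}
```

A native Linux driver would instead use struct list_head, list_add(), and kmalloc() directly, which is what kernel developers expect to see when they read or modify the code.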
The bigger issue, however, is in the maintainability of the driver. A
driver written for a HAL layer has its own idioms and conventions; it works
with a completely different API. It simply does not look like a Linux
driver; Linux developers will have a harder time understanding and
modifying it.
One might think that this is not a big issue, since Neterion has said that
it plans to maintain the driver, but there are a couple of problems that
come up:
- When a kernel developer changes an internal function, he or she will
usually go through and fix all of the in-tree users of that function.
So developers who are not employed by the hardware vendor will almost
certainly have to work with the driver code at some point.
- Hardware vendors have a short attention span. Product cycles
tend to be short, and the vendor will, before too long, move on to new
products requiring new and different drivers. Once a given driver no
longer applies to the products which are currently in the vendor's
catalog, the vendor will, most likely, see little reason to continue
maintaining that driver. The Linux community, however, will have an
interest in keeping that driver working for several more years.
Additionally, the vendor may resist patches which affect the HAL layer
itself, making it harder for the community to work on the driver. Overall,
the Linux kernel developers plan to maintain the kernel for many years into
the future; they tend to be concerned about taking on code which will make
that maintenance task harder in the future.
So the kernel hackers have some solid reasons for resisting HAL-based
drivers. The vendors also have good reasons for wanting to write such
drivers. To them, the resistance to HAL looks like a "Linux is the only
important system" attitude, and it forces them to incur extra costs when
writing their code. In this case, Neterion has reluctantly
said that it will produce a non-HAL driver if that is the only way to
get into the tree; other vendors may not bother.
Peter Chubb has long been working on a project to move device drivers into
user space. Getting drivers out of the kernel, he points out, would have a
number of benefits. Faults in drivers (the source of a large percentage of
kernel bugs) would be less likely to destabilize the entire system.
Drivers could be easily restarted and upgraded. And a user-space
implementation would make it possible to provide a relatively stable driver
API, which would appeal to many vendors.
Much of the support needed for user-space drivers is already in place. A
process can communicate with hardware by mapping the relevant I/O memory
directly into its address space, for example; that is how the X server
works with video adaptors. One piece, however, is missing:
user-space drivers cannot handle device interrupts. In many cases, a
proper driver cannot be written without using interrupts, so a user-space
implementation is not possible.
Peter has now posted his user-space interrupts
patch for review and possible inclusion. The mechanism that he ended
up with is simple and easy to work with, but it suffers from an important
limitation.
The mechanism is this: a process wishing to respond to interrupts opens a
new /proc file; for IRQ 10, the file would be
/proc/irq/10/irq. A read on that file will yield the number of
interrupts which have occurred since the last read. If no interrupts have
occurred, the read() call will block until the next interrupt
happens. The select() and poll() system calls are
properly supported, so it is possible to include interrupt handling as just
another thing to do in an event loop.
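A user-space driver built on this mechanism might wait for interrupts roughly as follows. This is a minimal sketch, not code from the patch: the helper name wait_for_irq() is invented here, and it assumes that each read() delivers the count as a native unsigned int, a detail the description above does not specify.

```c
#include <poll.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for the descriptor to become readable, then read
 * one interrupt count from it.
 * Returns 1 and stores the count on success, 0 on timeout, -1 on error.
 *
 * ASSUMPTION: the count arrives as a native-endian unsigned int; the
 * format returned by read() is not stated in the article.
 */
static int wait_for_irq(int fd, int timeout_ms, unsigned int *count)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready < 0)
        return -1;              /* poll() failed */
    if (ready == 0)
        return 0;               /* no interrupt within the timeout */

    ssize_t n = read(fd, count, sizeof(*count));
    if (n != (ssize_t)sizeof(*count))
        return -1;              /* short read or error */
    return 1;
}
```

For IRQ 10, the driver would open /proc/irq/10/irq read-only and call this helper from its event loop, servicing the device each time a nonzero count comes back.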
On the kernel side, the real interrupt handler looks like this:
    static irqreturn_t irq_proc_irq_handler(int irq, void *vidp,
                                            struct pt_regs *regs)
    {
        struct irq_proc *idp = (struct irq_proc *)vidp;

        BUG_ON(idp->irq != irq);
        disable_irq_nosync(irq);
        atomic_inc(&idp->count);
        wake_up(&idp->q);
        return IRQ_HANDLED;
    }
In other words, all it does is count the interrupt and wake up any process
that might be waiting to handle it.
The handler also disables the interrupt before returning. There is an
important reason for this action: since the
handler knows nothing of the device which is actually interrupting, it is
unable to acknowledge or turn off the interrupt. So, when the handler
returns, the device will still be signalling an interrupt. If the
interrupt were not disabled in the processor (or the APIC), the processor
would be interrupted (and the handler called) all over again, repeatedly -
at least, when level-triggered interrupts are in use. Disabling the
interrupt allows life to go on until the user-space process gets scheduled
and is able to tend to the interrupting device.
There is a problem here, however: interrupt lines are often shared between
devices. Disabling a shared interrupt shuts it off for all devices using
that line, not just the one being handled by a user-space driver. It is
entirely possible that masking that interrupt will block a device which is
needed by the user-space handler - a disk controller, perhaps. In that
case, the system may well deadlock. For this reason, the patch does not
allow user-space drivers to work with shared interrupts. This restriction
avoids problems, but it also reduces the utility of the whole thing.
One possible solution was posted by Alan
Cox. He would require user-space processes to pass a small structure into
the kernel describing the hardware's IRQ interface. It would be just
enough for the kernel to tell if a particular device is interrupting,
acknowledge that interrupt, and tell the device to shut
up. With that in place, the kernel could let user space deal with what the
device really needs while leaving the interrupt enabled. It has been pointed out that this simple scheme would not
work with some of the more complicated hardware, but it would be a step in
the right direction regardless.
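To make the idea concrete, here is one way such a descriptor could look. Everything in this sketch is hypothetical: the structure, its field names, and the helper function are invented for illustration and are not taken from Alan Cox's posting. It assumes a device whose pending interrupt can be detected with a single masked register read and silenced with a single register write, which is exactly the simple case the scheme is meant to cover.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * HYPOTHETICAL descriptor: field names and layout are invented for this
 * sketch. Registers are modelled as an array of 32-bit words.
 */
struct user_irq_info {
    size_t   status_reg;   /* index of the interrupt status register */
    uint32_t pending_mask; /* bits set there while the device interrupts */
    size_t   ack_reg;      /* index of the register that quiets the device */
    uint32_t ack_value;    /* value to write to ack_reg */
};

/*
 * What a kernel-side handler could do with the descriptor: returns 1 if
 * this device raised the interrupt (and has now been silenced), or 0 if
 * the interrupt belongs to some other device sharing the line.
 */
static int irq_check_and_ack(volatile uint32_t *regs,
                             const struct user_irq_info *info)
{
    if ((regs[info->status_reg] & info->pending_mask) == 0)
        return 0;
    regs[info->ack_reg] = info->ack_value;
    return 1;
}
```

Because the device stops asserting the line once acknowledged, the interrupt would not need to be disabled, and other devices sharing the line could keep working while the user-space driver waits to be scheduled.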
Meanwhile, Michael Raymond described a
different user-space interrupt implementation (called "User Level
Interrupt" or ULI) done at SGI. This patch is significantly more
complicated. In this scheme, a user-space driver would register an
interrupt handler function directly with the kernel. When an interrupt
happens, the ULI code performs some assembly-code black magic so that its
"return from interrupt" instruction jumps directly into the user-space
handler, in user mode. Once that handler returns, the ULI library writes a
code to a magic device which causes the kernel stack and related data
structures to be restored to their pre-interrupt state. The implementation
is more complex, and it currently only works on the ia-64 architecture, but
it could conceivably offer better performance than the /proc
method.
A few more changes to the 2.6 internal kernel API have been merged since
last week's summary.
The driver model API has seen a couple of small changes.
kref_put() no longer returns void:
    int kref_put(struct kref *kref, void (*release)(struct kref *kref));
The (new) return value is normally zero, but will be nonzero if the kref
was actually removed. Note that a zero return does not imply that the kref
is still valid; somebody else may have done the last kref_put()
call in the meantime.
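The return convention can be modelled in a few lines of user-space C. This is a single-threaded sketch, not kernel code: the kref_model names are invented, and a plain int stands in for the atomic counter the kernel uses.

```c
/* User-space model only: the kernel's struct kref uses an atomic counter. */
struct kref_model {
    int refcount;
};

static void kref_model_init(struct kref_model *k) { k->refcount = 1; }
static void kref_model_get(struct kref_model *k)  { k->refcount++; }

/* Mirrors the new return convention: nonzero only when this call dropped
 * the last reference and release() was invoked. */
static int kref_model_put(struct kref_model *k,
                          void (*release)(struct kref_model *k))
{
    if (--k->refcount == 0) {
        release(k);
        return 1;
    }
    return 0;
}

/* Example release callback that just counts how often it has run. */
static int released_count;
static void count_release(struct kref_model *k)
{
    (void)k;
    released_count++;
}
```

With two references held, the first put returns zero and the second returns one. In the real, concurrent kernel a zero return only says that this particular call did not free the object, which is why it cannot be used as proof that the kref is still valid.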
The kset type now has its own internal spinlock. That means that
a kset is no longer required to be part of a subsystem.
Greg Kroah-Hartman has proposed a rather wider
set of changes to the driver model class code. Essentially, he is
pushing all users over to a form of the "class_simple" interface, and
getting away from the original class implementation, which was hard to use
correctly. These changes have not yet been merged, however.
The kernel has long held a variety of special-purpose sorting functions.
These have now been replaced by a generic heap sort utility written by Matt
Mackall. Its interface is:
    void sort(void *base, size_t num, size_t size,
              int (*compare)(const void *a, const void *b),
              void (*swap)(void *a, void *b, int size));
Here, base is the array of items to sort; it contains num
items of size bytes each. The compare() function returns the
integer equivalent of a-b: negative if a sorts before
b, zero if they are equal, and positive otherwise. sort() will
sort the array in ascending order as dictated by compare(). The
swap() function is optional; it can be provided if the caller knows
a faster way to exchange two elements in the array.
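The calling convention can be tried out in user space with a stand-in that takes the same parameters. The sketch below is not Matt Mackall's code: sort_sketch() is a simple heap sort written for this example, and it assumes that passing NULL for swap() selects a default byte-by-byte exchange. The compare_ints() helper shows one way to get the sign of a-b without risking integer overflow.

```c
#include <stddef.h>

/* Default byte-by-byte exchange used when the caller passes no swap(). */
static void generic_swap(void *a, void *b, int size)
{
    char *x = a, *y = b;
    while (size-- > 0) {
        char t = *x;
        *x++ = *y;
        *y++ = t;
    }
}

/* Restore the max-heap property for the subtree rooted at index root,
 * considering only the first num elements. */
static void sift_down(char *base, size_t root, size_t num, size_t size,
                      int (*compare)(const void *, const void *),
                      void (*swap)(void *, void *, int))
{
    for (;;) {
        size_t child = 2 * root + 1;
        if (child >= num)
            break;
        if (child + 1 < num &&
            compare(base + child * size, base + (child + 1) * size) < 0)
            child++;                    /* pick the larger child */
        if (compare(base + root * size, base + child * size) >= 0)
            break;
        swap(base + root * size, base + child * size, (int)size);
        root = child;
    }
}

/* User-space stand-in with the same parameter list as the kernel's sort(). */
static void sort_sketch(void *base, size_t num, size_t size,
                        int (*compare)(const void *a, const void *b),
                        void (*swap)(void *a, void *b, int size))
{
    char *b = base;
    if (!swap)
        swap = generic_swap;
    for (size_t i = num / 2; i-- > 0; )     /* build the heap */
        sift_down(b, i, num, size, compare, swap);
    for (size_t n = num; n-- > 1; ) {       /* move the maximum to the end */
        swap(b, b + n * size, (int)size);
        sift_down(b, 0, n, size, compare, swap);
    }
}

/* Comparison for ints: same sign as a-b, without the overflow risk. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}
```

Sorting an int array v of n elements is then a matter of calling sort_sketch(v, n, sizeof(v[0]), compare_ints, NULL).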
Patches and updates
Filesystems and block I/O
- Phillip Lougher: SquashFS.
(March 14, 2005)
Page editor: Jonathan Corbet