The current 2.6 prepatch is 2.6.21-rc1, released on February 20.
"There's a lot of changes, as is usual for an -rc1
thing, but at least so far it would seem that 2.6.20 has been a good base,
and I don't think we have anything *really* scary here.
include the long-awaited dynamic tick patch, better high-resolution timer
support, the VMI virtualization
(now built on top of paravirt_ops), the ALSA "system on chip"
layer, lots of new drivers, and more. See the
for details, or the
for lots of details.
As of this writing, a few hundred patches have found their way into the
mainline git repository since -rc1 was released. Most of them are in the
Video4Linux subsystem, adding ASUS P7131 remote control support, BTTV
cropping support, a big update to the pvrusb2 WinTV driver, a new MSI Mega
Sky 580 driver, and quite a bit more.
The current -mm tree is 2.6.20-mm2. Recent changes to
-mm include Xen DomU
support, lguest, Blackfin architecture support,
more workqueue changes, POSIX listio completion support for asynchronous
I/O, utrace (a new
tracing mechanism meant to replace ptrace()), and more.
Stable kernel updates: 2.6.18.7, 2.6.19.4, and 2.6.20.1 were all released on
February 20 with a single patch: a fix for the NFS ACL denial of
service vulnerability. Larger updates for 2.6.18 and 2.6.19 (probably the
last stable updates for both of those kernels) are currently in the works,
with a likely release around the 23rd or 24th.
2.6.16.41 was released on
February 18 with about a dozen fixes.
Kernel development news
With the release of 2.6.21-rc1, the merge window for this kernel
development cycle is now closed. Most of the major 2.6.21 changes were
covered here last week, but a
number of significant changes did get into the mainline between then and
the closing of the window. They are:
- The VMI virtualization
  interface has been merged. VMI is a generic hypervisor interface;
  it is (now) built on top of paravirt_ops and provides a higher-level
  interface to the hypervisor.
- The clocksource and dynamic tick patches have been merged.
- Various improvements to the kernel's support for Sony laptops.
- The deprecated ACPI "hotkey" driver has been removed.
- Version 1 of the JFFS filesystem has been removed.
- The audit subsystem has a "lockdown" mode where further configuration
changes cannot be made.
- A simple driver allowing Blackberry devices to be charged from a Linux
system's USB port has been merged.
- A big ARM update has been merged with oprofile support for ARMv6
processors, kexec() support, support for a number of new
board and processor variants, and more.
- The v9fs (Plan 9) filesystem has seen a number of improvements, mostly
in the form of better caching.
- The SYSV shared memory code has been reworked for more sane internal
  file usage and easier integration into the ongoing containers /
  namespaces work.
- A driver for the Silicon Motion SM501 "multimedia companion" chip has
  been merged.
Now the stabilization period begins, with the final 2.6.21 release due
around the beginning of May.
The device resource management patch
was discussed here in January. That patch has now been merged
for the 2.6.21 kernel. Since the API is now set - at least, as firmly as
any in-kernel API is - it seems like a good time for a closer look at this
interface.
The core idea behind the resource management interface is that remembering to free
allocated resources is hard. It appears to be especially hard for driver
writers who, justly or not, have a reputation for adding more than their
fair share of bugs to the kernel. And even the best driver writers can run
into trouble in situations where device probing fails halfway through; the
recovery paths may be there in the code, but they tend not to be well
tested. The result of all this is a fair number of resource leaks in
the kernel.
To address this problem, Tejun Heo created a new set of resource allocation
functions which track allocations made by the driver. These allocations
are associated with the device structure; when the driver detaches
from the device, any left-over allocations are cleaned up. The resource
management interface is thus similar to the talloc()
API used by the Samba hackers, but it is adapted to the kernel
environment and covers more than just memory allocations.
Starting with memory allocations, though, the new API is:
void *devm_kzalloc(struct device *dev, size_t size, gfp_t gfp);
void devm_kfree(struct device *dev, void *p);
In a pattern we'll see repeated below, the new functions are similar to
kzalloc() and kfree() except for the new names and the
addition of the dev argument. That argument is necessary for the
resource management code to know when the memory can be freed. If any
memory allocations are still outstanding when the associated device is
removed, they will all be freed at that time.
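To make the payoff concrete, here is a sketch of how a probe function might
use the managed allocator; the my_dev_state structure and my_dev_hw_init()
helper are invented for illustration, and only the devm_kzalloc() call comes
from the API described above:

    /* Hypothetical probe routine using managed memory allocation */
    static int my_probe(struct device *dev)
    {
	struct my_dev_state *state;	/* invented per-device structure */

	/* Freed automatically when the driver detaches from dev */
	state = devm_kzalloc(dev, sizeof(*state), GFP_KERNEL);
	if (!state)
		return -ENOMEM;

	if (my_dev_hw_init(state))	/* invented hardware setup step */
		return -EIO;		/* note: no kfree() needed here */

	dev_set_drvdata(dev, state);
	return 0;
    }

The interesting line is the early return on hardware-initialization failure:
with the managed allocator, that error path needs no cleanup code at all.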
Note that there is no managed equivalent to kmalloc(); if driver
writers cannot be trusted to free memory, it seems, they cannot be trusted
to initialize it either. There are also no managed versions of the
page-level or slab allocation functions.
Managed versions of a subset of the DMA allocation functions have been
added:
    void *dmam_alloc_coherent(struct device *dev, size_t size,
			      dma_addr_t *dma_handle, gfp_t gfp);
    void dmam_free_coherent(struct device *dev, size_t size, void *vaddr,
			    dma_addr_t dma_handle);
    void *dmam_alloc_noncoherent(struct device *dev, size_t size,
				 dma_addr_t *dma_handle, gfp_t gfp);
    void dmam_free_noncoherent(struct device *dev, size_t size, void *vaddr,
			       dma_addr_t dma_handle);
    int dmam_declare_coherent_memory(struct device *dev, dma_addr_t bus_addr,
				     dma_addr_t device_addr, size_t size,
				     int flags);
    void dmam_release_declared_memory(struct device *dev);
    struct dma_pool *dmam_pool_create(const char *name, struct device *dev,
				      size_t size, size_t align,
				      size_t allocation);
    void dmam_pool_destroy(struct dma_pool *pool);
All of these functions have the same arguments and functionality as their
dma_* equivalents, but they will clean up the DMA areas on device
shutdown. One still has to hope that the driver has ensured
that no DMA remains active on those areas, or unpleasant things could
happen.
There is a managed version of pci_enable_device():
int pcim_enable_device(struct pci_dev *pdev);
There is no pcim_disable_device(), however; code should just use
pci_disable_device() as usual. A new function:
void pcim_pin_device(struct pci_dev *pdev);
will cause the given pdev to be left enabled even after the driver
detaches from it.
The patch makes the allocation of I/O memory regions with
pci_request_region() managed by default - there is no
pcim_ version of that interface. The higher-level allocation and
mapping interfaces do have managed versions:
void __iomem *pcim_iomap(struct pci_dev *pdev, int bar,
unsigned long maxlen);
void pcim_iounmap(struct pci_dev *pdev, void __iomem *addr);
For the allocation of interrupts, the managed API is:
int devm_request_irq(struct device *dev, unsigned int irq,
irq_handler_t handler, unsigned long irqflags,
const char *devname, void *dev_id);
void devm_free_irq(struct device *dev, unsigned int irq, void *dev_id);
For these functions, the addition of a struct device argument was
required.
There is a new set of functions for the mapping of I/O ports and memory:
void __iomem *devm_ioport_map(struct device *dev, unsigned long port,
unsigned int nr);
void devm_ioport_unmap(struct device *dev, void __iomem *addr);
void __iomem *devm_ioremap(struct device *dev, unsigned long offset,
unsigned long size);
void __iomem *devm_ioremap_nocache(struct device *dev,
unsigned long offset,
unsigned long size);
void devm_iounmap(struct device *dev, void __iomem *addr);
Once again, these functions required the addition of a struct
device argument for the managed form.
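Combining a couple of these calls shows how the matching cleanup code
disappears; in this sketch the my_irq_handler function is invented, and the
resource and irq values are assumed to have come from the bus layer:

    /* Hypothetical setup path: both the mapping and the IRQ are released
     * automatically if probing fails or when the driver detaches. */
    static int my_setup(struct device *dev, struct resource *res, int irq)
    {
	void __iomem *regs;

	regs = devm_ioremap(dev, res->start, res->end - res->start + 1);
	if (!regs)
		return -ENOMEM;

	return devm_request_irq(dev, irq, my_irq_handler, 0,
				"my_dev", regs);
    }

If devm_request_irq() fails here, the earlier mapping is simply left for the
resource management code to clean up.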
Finally, for those using the low-level resource allocation functions, the
managed versions are:
    struct resource *devm_request_region(struct device *dev,
					 resource_size_t start,
					 resource_size_t n,
					 const char *name);
    void devm_release_region(resource_size_t start, resource_size_t n);
    struct resource *devm_request_mem_region(struct device *dev,
					     resource_size_t start,
					     resource_size_t n,
					     const char *name);
    void devm_release_mem_region(resource_size_t start, resource_size_t n);
The resource management layer includes a "group" mechanism, accessed via:
void *devres_open_group(struct device *dev, void *id, gfp_t gfp);
void devres_close_group(struct device *dev, void *id);
void devres_remove_group(struct device *dev, void *id);
int devres_release_group(struct device *dev, void *id);
A group can be thought of as a marker in the list of allocations associated
with a given device. Groups are created with devres_open_group(),
which can be passed an id value to identify the group or
NULL to have the ID generated on the fly; either way, the
resulting group ID is returned. A call to devres_close_group()
marks the end of a given group. Calling devres_remove_group()
causes the system to forget about the given group, but does nothing with
the resources allocated within the group. To remove the group and
immediately free all resources allocated within that group,
devres_release_group() should be used.
The group functions seem to be primarily aimed at mid-level code - the bus
layers, for example. When bus code tries to attach a driver to a device,
it can open a group; should the driver attach fail, the group
can be used to free up any resources allocated by the driver.
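That bus-level pattern might look like this sketch, where the drv->probe()
call stands in for whatever attach logic the bus layer actually runs:

    /* Open a group before calling the driver's probe routine... */
    id = devres_open_group(dev, NULL, GFP_KERNEL);
    if (!id)
	return -ENOMEM;

    ret = drv->probe(dev);
    if (ret)
	/* ...and release everything the failed probe allocated */
	devres_release_group(dev, id);
    else
	/* probe succeeded; just mark the end of the group */
	devres_close_group(dev, id);

The driver itself need not know that its allocations were made inside a
group; the bookkeeping is entirely the bus layer's affair.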
There are not many users of this new API in the kernel now. That may
change over time as driver writers become aware of these functions, and,
perhaps, as the list of managed allocation types grows. The reward for
switching over to managed allocations should be more robust and simpler
code as current failure and cleanup paths are removed.
Almost exactly one year ago, Intel announced the ipw3945 project
- a free driver
for its 3945ABG wireless adapters. This move was welcomed as a refreshing
change from the usual mode of operation in the wireless area, which usually
involves binary-only drivers. Even so, this driver was greeted with some
complaints; in particular, the binary-only "regulatory daemon" was not a
popular idea, despite the fact that it ran entirely in user space. The ipw3945
driver was never merged into the mainline kernel.
In many cases, just getting free drivers from companies seems like a lot to
ask. Getting them to go back and start over is often out of the question.
That is just what Intel has done, however, and, on February 9, the new version of the driver was
announced, complete with a
shiny new web site. The new driver should prove more popular than the
old one was.
The user-space regulatory daemon is no more. Intel's engineers, it seems,
have found a way to move the regulatory function into the device's
firmware, getting the host processor out of the regulatory compliance
business altogether. That is probably a more robust solution in general,
even though, strictly speaking, the flexibility of the hardware has been
reduced. Most users will likely look at the tradeoff - better regulatory
compliance and no binary-only daemon - and like what they see. Of course,
those who see binary-only device firmware as an infringement of their
freedom will not feel that the situation has improved much.
Another significant change is that the new driver works with the
Devicescape 802.11 stack. Devicescape remains the intended direction for
wireless networking in the Linux kernel, so the new driver should be more
easily integrated. At least, that will be the case once Devicescape gets
into the mainline. For now, Linux users wanting to try out the new driver
will also have to get a version of the d80211 module (available from the
Intel site) and build that for their kernels as well.
That leads to the obvious question: when will Devicescape make it into the
mainline kernel? The process of getting that code ready for merging has
taken rather longer than desired, but it is still moving forward. The current plan, it seems, is to rebase the
Devicescape code to 2.6.21-rc1, once that's released, and get the result
included in the -mm kernel. If all goes well, the Devicescape stack might
just find its way into 2.6.22. That would be a major step forward for
wireless networking in Linux.
Back to the Intel driver: one thing that is still lacking is any sort of
hardware documentation. Anybody not working for Intel will be limited in
what they can do with this driver by what they can learn from the code
itself. Your editor asked Intel about hardware documentation; we were
told:
The reality is the driver sources are the programming information
for the hardware. As time goes forward we spend some time trying
to improve the comments in the headers for the source files to make
it more clear what they do and to provide some overviews of
theory-of-operation, but there isn't any self-contained accurate
document that covers everything you need to know to program and
operate the device.
Given the choice between developing code and writing documentation, the
Intel hackers went for the code.
One of the last patch sets to be merged before the 2.6.21 window closed
was the clockevents and dyntick work from the real-time tree. These
patches have been in the works for some time, and were originally targeted
for merging in 2.6.19. In the process, the developers (primarily Ingo
Molnar and Thomas Gleixner) discovered one of the fundamental laws of
kernel development: if your patches break Andrew Morton's laptop, they are
unlikely to make it into the mainline. That little difficulty has now been
overcome, with the result that 2.6.21 will include some interesting core
timer changes.
Clock devices have traditionally been handled in the
kernel's architecture-specific code. The result has been a lot of
duplicated code between
architectures (there are more architectures than common timer devices) and
no uniform interface for the core kernel to make use of these devices.
John Stultz's generic time of day infrastructure resolved a number of those
problems, at least for the timekeeping task, but anybody who wanted to
program timer devices in a more general way still ended up dealing with
architecture-specific code.
The "clockevents" patch set finishes this job. At its core, clockevents
creates a driver API for devices which can deliver interrupts at a specific
time in the future. The API tracks the capabilities of each timer
(resolution and whether it can do one-shot or periodic interrupts, for
example) and provides a simple
interface for arming the timer. This API is defined in the core kernel,
with only a low-level driver remaining in the architecture-specific code.
The end result is that the kernel now has the means to query and use timer
capabilities in an architecture-independent manner.
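On the driver side, this comes down to filling in a clock_event_device
structure and registering it. The following is only a sketch - the my_hw_*
helpers and the numbers are invented - but the structure fields and
clockevents_register_device() follow the merged API:

    /* Arm the (hypothetical) hardware to interrupt after "delta" cycles */
    static int my_hw_set_next_event(unsigned long delta,
				    struct clock_event_device *evt)
    {
	my_hw_arm_timer(delta);		/* invented hardware access */
	return 0;
    }

    /* Switch the hardware between periodic, one-shot, and off */
    static void my_hw_set_mode(enum clock_event_mode mode,
			       struct clock_event_device *evt)
    {
	/* invented hardware access goes here */
    }

    static struct clock_event_device my_clockevent = {
	.name		= "my_timer",
	.features	= CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
	.rating		= 300,	/* quality relative to other timer devices */
	.set_next_event	= my_hw_set_next_event,
	.set_mode	= my_hw_set_mode,
    };

    clockevents_register_device(&my_clockevent);

The core consults the features and rating fields when deciding which device
to use for a given job; the architecture code no longer has to make that
decision itself.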
With the clockevents mechanism in place, it becomes possible to support
truly high-resolution timers. When such a timer is requested, all that
is required is to pick a suitable clockevent device and arm it for the
desired time. These devices can deliver interrupts with a high degree of
precision, with the result that kernel timers, too, can offer high
precision - a feature which is of clear utility to real-time users (among
others).
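From the kernel programmer's point of view, the hrtimer API (which predates
these patches, but gains real precision from them) is used roughly as in
this sketch; my_timeout_fn is an invented name:

    /* Callback invoked when the timer expires */
    static enum hrtimer_restart my_timeout_fn(struct hrtimer *timer)
    {
	/* do the work, then don't rearm the timer */
	return HRTIMER_NORESTART;
    }

    static struct hrtimer my_timer;

    hrtimer_init(&my_timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
    my_timer.function = my_timeout_fn;
    /* fire 100 microseconds from now */
    hrtimer_start(&my_timer, ktime_set(0, 100000), HRTIMER_MODE_REL);

With clockevents in place, a timeout like this can be backed by a one-shot
timer interrupt rather than being rounded up to the next jiffy.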
The periodic timer tick is now implemented with a clockevent as well. It
does all of the things the old timer-based interrupt did - updating
jiffies, accounting CPU time, etc. - but it is run out of the new
clockevent infrastructure.
All of this is an improvement, but there is still one thing which could be
better: there is no real need for a periodic tick in the system. That is
especially true when the processor is idle. An idle CPU can save quite a
bit of power, but waking that CPU up 100 times (or more) per second will
hurt those power savings considerably. With a flexible timer
infrastructure, there is no point in turning the CPU back on until it has
something to do. So, when the (i386) kernel goes into its idle loop, it
checks the next pending timer event. If that event is further away than
the next tick, the periodic tick is turned off altogether; instead, the
timer is programmed to fire when the next event comes due. The CPU can
then rest unharassed until that time - unless an interrupt comes in
first. Once the processor goes out of the idle state, the periodic tick is
restored.
What's in 2.6.21 is, thus, not a full dynamic tick implementation.
Eliminating the tick during idle times is a good step forward, but there is
value in getting rid of the tick while the system is running as well -
especially on virtualized systems which may be sharing a host with quite a
few other clients. The dynamic tick documentation file suggests that the
developers have this goal in mind:
The implementation leaves room for further development like full
tickless systems, where the time slice is controlled by the
scheduler, variable frequency profiling, and a complete removal of
jiffies in the future.
So expect some interesting work in the future - the removal of
jiffies alone has a number of interesting implications. The
developers also have support for the x86_64 and ARM architectures, though
that support has not been merged for 2.6.21; MIPS and PowerPC support is in
the works as well.
Patches and updates
Core kernel code
Filesystems and block I/O
Virtualization and containers
Page editor: Jonathan Corbet