Kernel release status
The current 2.6 prepatch is 2.6.15-rc5,
released by Linus on
December 3. It consists mostly of fixes, but also includes some
changes for drivers which map memory into user space (see below). The
long-format changelog has the
details.
2.6.15-rc4 was released on
November 30; details in the
long-format changelog.
The current -mm tree is 2.6.15-rc5-mm1. Recent changes
to -mm include some memory management tweaks, a special test which taints
the kernel when ndiswrapper or driverloader is loaded, a new set of ktimer
patches, and various architecture updates.
Kernel development news
Linux in a binary world... a doomsday scenario
Arjan van de Ven has contributed to the debate on proprietary kernel
modules by putting together a scenario based on one crucial event: "
On December 6th, 2005 the kernel developers en mass decide that binary
modules are legally fine and also essential for the progress of linux,
and are as such a desirable thing." The full posting shows how the
story plays out.
Xen 3.0 released
Version 3.0 of the Xen hypervisor - a virtualization system - has been
released.
Xen 3.0 includes support for Intel's hardware virtualization
mechanism, SMP guest systems (with hot-pluggable virtual CPUs), large
memory support, trusted platform module support, ports to the ia-64 and
(soon) PowerPC architectures, and more.
The first stable OpenVZ release
The
OpenVZ project has
announced its existence and its first stable
release. OpenVZ is yet another virtualization approach for Linux, based on
SWsoft's "Virtuozzo" product. The OpenVZ approach differs from others,
however, in that it creates its virtualized environments within a single
kernel; the result, it is claimed, is better performance. Unfortunately,
the released patch is for the ancient 2.6.8 kernel.
The evolution of driver page remapping
Two weeks ago, this page
looked
at the new VM_UNPAGED flag, introduced in 2.6.15-rc2 to mark
virtual memory areas (VMAs) which are not made up of "normal" pages. These
areas are usually created by device drivers which map special memory areas
(which may or may not be device I/O memory) into user space. Your editor
now humbly suggests that readers ignore that article; things have changed
significantly since then.
As it turns out, Linus didn't like the VM_UNPAGED idea, so he
rewrote the code for 2.6.15-rc4. The VM_UNPAGED VMA flag is gone,
replaced by VM_PFNMAP. The new flag has a very similar meaning:
it marks the VMA as containing special page table entries which should not
be touched by the VM subsystem. In particular, it states that there is no
page structure associated with any page in that VMA, so the VM
subsystem should not go looking for one. Even in cases where that
structure does exist (such as remappings of real memory), the VM code will
pretend that it does not.
The advantage of the reworked code is that it takes out a number of special
cases; the VM_PFNMAP VMAs can be treated just like normal VMAs in
more places. Things quickly got a bit more complicated, however. The
initial VM_PFNMAP code assumed that a linear range of addresses
was being mapped into user space. In fact, some drivers piece together
memory in more complicated ways.
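To make the "linear" assumption concrete, here is a minimal user-space sketch of what it means for a mapping to be linear: each page's frame number equals a base frame number plus that page's offset into the mapping. This is not the kernel's actual check; the mapping_entry structure, the mapping_is_linear() helper, and the 4KB page size are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define PAGE_SHIFT 12   /* assumption: 4KB pages */

/* Hypothetical record of one mapped page: user address and page frame number. */
struct mapping_entry {
    unsigned long address;
    unsigned long pfn;
};

/*
 * Return true if every entry sits at base_pfn plus its page offset from
 * the start of the mapping, which is what a linear mapping looks like.
 */
bool mapping_is_linear(const struct mapping_entry *entries, size_t count,
                       unsigned long start, unsigned long base_pfn)
{
    for (size_t i = 0; i < count; i++) {
        unsigned long page_offset = (entries[i].address - start) >> PAGE_SHIFT;

        if (entries[i].pfn != base_pfn + page_offset)
            return false;   /* this page was pieced in from elsewhere */
    }
    return true;
}
```

A driver which remaps one contiguous region passes this test; one which assembles its mapping from scattered pages does not.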
So a subsequent patch added explicit support for "incomplete" VMAs, marked
with yet another flag: VM_INCOMPLETE. When the kernel detects
that a driver is creating something other than a straightforward, linear
mapping, it sets that flag and emits a warning. It also requires, in this
case, that the pages being remapped carry the PG_reserved flag -
even though this flag is being phased out. Remapping RAM in this way
always required that flag in the past, so this requirement is not a change
as far as drivers are concerned.
The patch adding VM_INCOMPLETE notes that "In the long
run we almost certainly want to export a totally different interface for
that, though." In this case, "in the long run" meant about one day,
when yet another patch was merged adding a new function:
    int vm_insert_page(struct vm_area_struct *vma,
                       unsigned long address,
                       struct page *page);
This function inserts the given page into vma, mapped at
the given address. It does not put out warnings, and does not
require that PG_reserved be set. What it does require is
that the page be an order-zero allocation obtained for this purpose; it is
not possible to remap arbitrary RAM pages with vm_insert_page().
Since a page structure is required, the new function is also
unsuitable for remapping I/O memory. But it is useful for drivers which
wish to map a set of pages into a user-space address range.
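As a rough illustration of that use case, the sketch below models a driver filling a VMA one page at a time. It is user-space code, not kernel code: the two structures are cut-down stand-ins, the order field and the slots array are invented for the model, the vm_insert_page() body is a toy with simplified checks (returning -1 rather than a real error code), and example_map_pages() is a hypothetical helper.

```c
#include <stddef.h>

#define PAGE_SIZE 4096UL   /* assumption: 4KB pages */
#define MAX_SLOTS 16       /* hypothetical limit of the toy "page table" */

/* Stand-ins for the kernel structures; only what this sketch needs. */
struct page {
    int order;                       /* invented field: 0 means a single page */
};

struct vm_area_struct {
    unsigned long vm_start, vm_end;
    struct page *slots[MAX_SLOTS];   /* toy replacement for page table entries */
};

/* Toy vm_insert_page(): same prototype as above, greatly simplified checks. */
int vm_insert_page(struct vm_area_struct *vma, unsigned long address,
                   struct page *page)
{
    if (address < vma->vm_start || address >= vma->vm_end)
        return -1;                   /* address lies outside the VMA */
    if (page == NULL || page->order != 0)
        return -1;                   /* only order-zero pages are accepted */

    size_t slot = (address - vma->vm_start) / PAGE_SIZE;
    if (slot >= MAX_SLOTS)
        return -1;
    vma->slots[slot] = page;
    return 0;
}

/* Hypothetical driver helper: map npages pages, one call per page. */
int example_map_pages(struct vm_area_struct *vma, struct page *pages,
                      size_t npages)
{
    unsigned long address = vma->vm_start;

    for (size_t i = 0; i < npages; i++, address += PAGE_SIZE) {
        int ret = vm_insert_page(vma, address, &pages[i]);

        if (ret)
            return ret;              /* stop at the first failure */
    }
    return 0;
}
```

The point of the per-page interface is that the pages need not be contiguous; each one is placed at its own user-space address.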
Just which driver might want to do something like that became clear when
another patch was merged for 2.6.15-rc5. It removed the GPL-only export
for vm_insert_page() and included this commit message:
    Make vm_insert_page() available to NVidia module. It used to use
    remap_pfn_range(), which wasn't GPL-only either, and the new
    interface is actually simpler and does more checking, so we
    shouldn't unnecessarily discourage people from switching over.
Some developers objected to this change, seeing it as an explicit
endorsement of the proprietary NVidia drivers. Others, however, saw it as
a simple attempt to avoid breaking drivers without a good reason. The
kernel developers may well be working toward taking a stronger stand
against proprietary modules, but this particular interface will not be the
place where that battle is fought.
bcm43xx and the 802.11 stack
The Broadcom 43xx family is yet another wireless network chipset without
free driver support. There is, however, a proprietary Linux driver
available; for example, the LinkSys WRT54G router has a Broadcom module. A reverse
engineering team has been busily looking at that driver with the idea of
writing a document describing how this chipset works; the resulting
free bcm43xx specification is
in a reasonably complete state.
Independently, the bcm43xx driver
team has been writing a driver from this specification. The authors
have never worked with the original, proprietary driver, so they should be
unable to infringe any copyrights which cover that driver. This project
has been moving along quietly for a while, but the quiet period is over: the free bcm43xx driver is now working. It
is not for the faint of heart at this point, but it is able to transmit and
receive packets. Adventurous souls with suitable hardware are encouraged
to start testing the new driver.
While almost everybody is happy to see a free driver for this hardware,
there have been some complaints about it. In particular, some developers
are unhappy about the "softmac"
layer used by the bcm43xx driver. This layer handles many media access
tasks - scanning, management frames, etc. - for the driver. This
functionality is not currently a part of the Linux 802.11 stack because the
chipset for which that stack was initially developed - Intel's ipw chips -
performs those tasks in hardware. Most other chipsets rely on the host for
this functionality, so some sort of "software MAC" must be provided.
The problem is not that there is no softmac implementation for Linux;
instead, there are too many of them. The softmac layer used by the bcm43xx
driver, which is meant to integrate with the current kernel 802.11 stack,
is one. The MadWifi project
includes its own 802.11 stack, including a software MAC implementation.
There is also a complete
802.11 stack from Devicescape available. Both the MadWifi and
Devicescape stacks are said - by their supporters - to be more capable than
the in-kernel stack, with or without the softmac layer. So why, they ask,
should yet another software MAC be written using the in-tree 802.11 stack
when better alternatives exist?
Your editor will not attempt to draw any conclusions about which
implementation is the best. The simple fact, however, is that the in-tree
802.11 code is what developers have to work with now. Efforts to work with
and improve that code will be better received by the networking maintainers
than pointing at out-of-tree parallel implementations. So the softmac code
used by the bcm43xx driver would appear to have an advantage going forward:
it builds on the existing, in-tree code, and makes new capabilities
available for all drivers.
Meanwhile, those who are interested in playing with the bcm43xx driver may
want to avail themselves of the daily snapshots posted by the
project.
Memory copies in hardware
Upcoming versions of Intel processors will include a feature called an
"asynchronous DMA engine." Essentially, it is a hardware peripheral which
can be used to quickly copy data from one memory location to another. The
"I/OAT" ("I/O acceleration technology") is expected to improve performance
by offloading copy operations, enabling quick in-memory scatter/gather
operations, and keeping copy operations from pushing useful data out of the
processor's cache.
Hardware with an I/OAT is not yet available, but a patch for I/OAT support has
recently been posted. It lacks the hardware-level interface, but does
demonstrate the API that the folks at Intel have come up with for this sort
of device.
Code which wishes to make use of the I/OAT must first register itself as a
"DMA client." The registration interface looks like:
    #include <linux/dmaengine.h>

    typedef void (*dma_event_callback)(struct dma_client *client,
                                       struct dma_chan *chan,
                                       enum dma_event_t event);

    struct dma_client *dma_async_client_register(dma_event_callback event_callback);
    void dma_async_client_unregister(struct dma_client *client);
The client must provide a callback function which will be invoked when DMA
channels come and go. If all goes well, registration results in a
dma_client structure which can be used with subsequent operations.
Before anything can be done, the client must request one or more
"channels." Every channel on the I/OAT can be used for one copy operation
at a time; all channels can be operating simultaneously. The function to
request channels is:
    dma_async_client_chan_request(struct dma_client *client,
                                  unsigned int number);
The client's callback function will be called once for each allocated
channel. The number of channels actually allocated may be fewer than were
requested. There is no real guidance on the optimal number of
channels to ask for; the example patch for the networking subsystem
requests one channel for each processor on the system. The number of
channels can be changed later on if need be.
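The registration and channel-request steps might fit together as in the following user-space model. It is only a sketch: the hardware is replaced by a toy with four channels, the event names DMA_RESOURCE_ADDED and DMA_RESOURCE_REMOVED are assumed, the fields of the two structures are invented, and the void return type on the request function is a guess, since the prototype above does not show one.

```c
#include <stddef.h>

#define HW_CHANNELS 4   /* hypothetical: channels the toy engine provides */

/* Assumed event names; the text above does not list them. */
enum dma_event_t { DMA_RESOURCE_ADDED, DMA_RESOURCE_REMOVED };

struct dma_chan { int id; };          /* invented field */
struct dma_client;

typedef void (*dma_event_callback)(struct dma_client *client,
                                   struct dma_chan *chan,
                                   enum dma_event_t event);

struct dma_client {                   /* invented fields */
    dma_event_callback event_callback;
    unsigned int chan_count;
};

static struct dma_chan toy_channels[HW_CHANNELS];
static struct dma_client toy_client;  /* the toy supports a single client */

struct dma_client *dma_async_client_register(dma_event_callback event_callback)
{
    toy_client.event_callback = event_callback;
    toy_client.chan_count = 0;
    return &toy_client;
}

void dma_async_client_chan_request(struct dma_client *client,
                                   unsigned int number)
{
    /* Grant up to what the toy engine has; that may be fewer than asked for. */
    while (client->chan_count < number && client->chan_count < HW_CHANNELS) {
        struct dma_chan *chan = &toy_channels[client->chan_count];

        chan->id = (int)client->chan_count;
        client->chan_count++;
        /* One callback per channel actually allocated. */
        client->event_callback(client, chan, DMA_RESOURCE_ADDED);
    }
}

/* Example client: remember every channel it is handed. */
struct dma_chan *my_channels[HW_CHANNELS];
unsigned int my_channel_count;

void my_dma_event(struct dma_client *client, struct dma_chan *chan,
                  enum dma_event_t event)
{
    (void)client;
    if (event == DMA_RESOURCE_ADDED)
        my_channels[my_channel_count++] = chan;
}
```

A client which registers my_dma_event() and then asks for eight channels ends up with four in this model, which is why the callback, rather than the request, is the place to count them.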
There are three functions for actually starting a copy operation:
    dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
                                             void *dest, void *src,
                                             size_t len);
    dma_cookie_t dma_async_memcpy_buf_to_pg(struct dma_chan *chan,
                                            struct page *page,
                                            unsigned int offset,
                                            void *kdata, size_t len);
    dma_cookie_t dma_async_memcpy_pg_to_pg(struct dma_chan *chan,
                                           struct page *dest_pg,
                                           unsigned int dest_off,
                                           struct page *src_pg,
                                           unsigned int src_off,
                                           size_t len);
All three functions do the same thing: they request an asynchronous copy
operation from one memory location to another. The only difference is
whether kernel addresses or page structures are used to specify
the locations. For some reason, it appears to be necessary to issue a call
to:
    void dma_async_memcpy_issue_pending(struct dma_chan *chan);
before the operation will actually happen.
Since copy operations are asynchronous, they may not have completed when
the request functions return, so the caller should not mess with the
affected buffers in the meantime. There are two functions for querying
and waiting for completion:
    dma_async_memcpy_complete(struct dma_chan *chan, dma_cookie_t cookie,
                              dma_cookie_t *last, dma_cookie_t *used);
    dma_async_wait_for_completion(struct dma_chan *chan,
                                  dma_cookie_t cookie);
dma_async_memcpy_complete() will return one of
DMA_SUCCESS, DMA_IN_PROGRESS, or DMA_ERROR,
depending on the status of the copy operation indicated by cookie
(the last and used arguments can be passed as
NULL; their purpose is not entirely clear to your slow editor). A
call to dma_async_wait_for_completion() will wait until the given
operation finishes. In the current implementation, that wait is
accomplished via a busy loop calling schedule(). There is no
function for canceling an outstanding operation.
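The life cycle of a single copy (queue it, issue it, then ask about its cookie) can be modeled in user space as below. This is a toy, not the posted patch: memcpy() stands in for the hardware, all copies finish at the moment they are issued, the cookie is assumed to be a small integer, the enum name toy_status and the dma_chan fields are invented, and the last and used arguments are simply ignored.

```c
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 8                 /* hypothetical queue depth */

typedef int dma_cookie_t;             /* assumption: a small integer handle */
enum toy_status { DMA_SUCCESS, DMA_IN_PROGRESS, DMA_ERROR };

struct toy_copy { void *dest; const void *src; size_t len; };

struct dma_chan {                     /* invented fields */
    struct toy_copy pending[MAX_PENDING];   /* queued but not yet issued */
    size_t npending;
    dma_cookie_t last_queued;         /* cookie of the newest request */
    dma_cookie_t last_done;           /* cookie of the newest finished copy */
};

/* Queue a copy; nothing moves until the pending requests are issued. */
dma_cookie_t dma_async_memcpy_buf_to_buf(struct dma_chan *chan,
                                         void *dest, void *src, size_t len)
{
    if (chan->npending == MAX_PENDING)
        return -1;
    chan->pending[chan->npending++] = (struct toy_copy){ dest, src, len };
    return ++chan->last_queued;
}

/* Toy "hardware": perform every queued copy right away. */
void dma_async_memcpy_issue_pending(struct dma_chan *chan)
{
    for (size_t i = 0; i < chan->npending; i++)
        memcpy(chan->pending[i].dest, chan->pending[i].src,
               chan->pending[i].len);
    chan->npending = 0;
    chan->last_done = chan->last_queued;
}

enum toy_status dma_async_memcpy_complete(struct dma_chan *chan,
                                          dma_cookie_t cookie,
                                          dma_cookie_t *last,
                                          dma_cookie_t *used)
{
    (void)last;                       /* ignored in this toy */
    (void)used;
    if (cookie <= 0 || cookie > chan->last_queued)
        return DMA_ERROR;             /* no such request on this channel */
    return cookie <= chan->last_done ? DMA_SUCCESS : DMA_IN_PROGRESS;
}
```

In this model a cookie reports DMA_IN_PROGRESS, and the destination buffer is untouched, until dma_async_memcpy_issue_pending() has been called on the channel.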
The initial reaction to the patch was cautiously positive. There is some
concern that invoking an external device to perform copies may be
sufficiently expensive that it will only be worthwhile for very large
operations. There were also some requests to extend the interface to
include a transformation to be performed on the data as it is copied. The
current hardware does not look like it will support anything beyond a
direct copy (though, since the hardware is not yet available, it is hard to
be sure), but it would be nice to be able to make use of any such
capabilities as they arrive. Transformations could be simple (zeroing a
buffer, say) or complex (cryptographic operations). But they
will only be available if the interface supports them.
The hardware is due in "early 2006," so more information will become
available then. Until that time, there probably will not be any serious
discussion of merging the I/OAT interface.
Page editor: Jonathan Corbet