The current development kernel is 2.5.29, which was released
on July 26. It includes another set
of IDE patches, a new LDM (Windows dynamic disks) driver, a number of
driverfs changes, lots of fixups for the new serial driver, and, of course,
lots of fixes for things that broke in the big 2.5.28 IRQ handling changes
(see the July 25 LWN Kernel Page). The long format changelog
is also available.
Linus's BitKeeper tree (for 2.5.30) contains quite a few patches at this
point. There is a change to the fork() code which allows things
to be done to the child process (e.g. migration to another CPU) before it
starts running. Also included is a big pile of IDE updates, more IRQ
fixes, some direct I/O changes from Andrew Morton ("This code is
wickedly quick"), the "strict overcommit" patch which prevents
surprise "out of memory" conditions, some serial driver fixes, and an ARM
update. This patch also removes the "khttpd" in-kernel web server.
There is no current prepatch from Dave Jones; this posting explains why. In short: he has
been busy, the current development kernels are too unstable to make patches
against, and he has been getting going with BitKeeper.
The current 2.5 status summary from Guillaume
Boissiere came out on July 31.
The current stable kernel is 2.4.18; Marcelo tried to catch us by
releasing the fourth 2.4.19 release candidate
just before this page went to "press," but we've learned to watch out for
that kind of maneuver. -rc4 contains a relatively small set of fixes for
the few remaining problems that have come up; with luck, this one will turn
into the real 2.4.19.
The latest prepatch from Alan Cox is 2.4.19-rc3-ac5.
Kernel development news
When Andrea Arcangeli released his 2.4.19-rc3-aa4
tree, he included
an old version of Ben LaHaise's asynchronous I/O code. This led to a
discussion of some features of the AIO interface, and a note
from Linus wondering what had happened to
the AIO project:
Note that something needs to get moving on this rsn, I'm not
interested in getting aio patches on Oct 30th. The feature freeze
may be on Halloween, but if I get some big feature just days before
I'm likely to just say "screw it".
Ben responded with a patch implementing the
core part of the AIO subsystem. It is far from a full implementation -
there are no device driver or filesystem changes in the patch. But it is
enough to get a sense for where the AIO development is going.
This patch does not, at this time, make all I/O asynchronous within the
kernel (as had been discussed at the Kernel
Summit). Instead, devices and filesystems must implement the new
aio_read, aio_write, and aio_fsync operations in
the file_operations structure to be able to support asynchronous
operations. This patch can thus, at this point, go into the system without
actually breaking anything.
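The opt-in arrangement can be sketched in user-space C. This is only a model: the demo_ names, the trimmed-down structure, and the choice of -EINVAL for a missing method are assumptions made for illustration, not code from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_file;

/* Hypothetical, heavily simplified stand-in for file_operations. */
struct demo_file_operations {
    long (*read)(struct demo_file *f, char *buf, size_t len);
    /* Optional: left NULL by drivers without asynchronous support. */
    long (*aio_read)(struct demo_file *f, char *buf, size_t len);
};

struct demo_file {
    const struct demo_file_operations *f_op;
};

/* Dispatch an asynchronous read, refusing cleanly when the driver
 * does not provide the method. The error code is an assumption. */
long demo_submit_aio_read(struct demo_file *f, char *buf, size_t len)
{
    if (f == NULL || f->f_op == NULL || f->f_op->aio_read == NULL)
        return -EINVAL;
    return f->f_op->aio_read(f, buf, len);
}

/* Synchronous read used by both example drivers; pretends that
 * every requested byte was transferred. */
static long legacy_read(struct demo_file *f, char *buf, size_t len)
{
    (void)f;
    (void)buf;
    return (long)len;
}

/* Asynchronous read: returns 0 to mean "request accepted". */
static long async_read(struct demo_file *f, char *buf, size_t len)
{
    (void)f;
    (void)buf;
    (void)len;
    return 0;
}

/* A driver that was never touched: aio_read is implicitly NULL. */
const struct demo_file_operations legacy_fops = {
    .read = legacy_read,
};

/* A driver that has opted in to asynchronous reads. */
const struct demo_file_operations async_fops = {
    .read = legacy_read,
    .aio_read = async_read,
};
```

An untouched driver keeps working for synchronous callers, and an asynchronous request against it simply fails; that is why adding the new methods need not break existing code.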
That may change when the rest of the AIO code is posted. This patch
provides the mechanism for submitting, tracking, and cancelling
asynchronous I/O operations - actually executing those operations
will come later. A new io_submit system call provides for the
initiation of asynchronous I/O requests; it takes an array of structures
describing what is to be done. Whenever an application wants to fire off
an asynchronous read or write, it fills in an iocb structure with
an "opcode," information on the buffer, etc. and passes it to
io_submit. (Of course, the application will likely call a library
function like aio_read which handles these details.)
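As a rough illustration of batch submission, here is a user-space model in C. The structure layout, the opcode names, and the validation rules are all hypothetical; the real iocb structure and io_submit are defined by the patch and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes; the real values live in the patch. */
enum demo_opcode { DEMO_CMD_READ, DEMO_CMD_WRITE, DEMO_CMD_FSYNC };

/* Hypothetical request descriptor, loosely modeled on the idea of
 * "an opcode plus information on the buffer". */
struct demo_iocb {
    enum demo_opcode opcode;
    int fd;
    void *buf;
    size_t nbytes;
    long long offset;
};

/* Return 1 if the request looks sane, 0 otherwise. */
static int demo_iocb_valid(const struct demo_iocb *cb)
{
    if (cb == NULL || cb->fd < 0)
        return 0;
    switch (cb->opcode) {
    case DEMO_CMD_READ:
    case DEMO_CMD_WRITE:
        return cb->buf != NULL && cb->nbytes > 0 && cb->offset >= 0;
    case DEMO_CMD_FSYNC:
        return 1;   /* no buffer needed */
    }
    return 0;       /* unknown opcode */
}

/* Model of batch submission: walk the array, stop at the first
 * invalid request, and report how many were accepted. Returns -1
 * if the arguments themselves are unusable. */
long demo_submit(const struct demo_iocb *cbs, long nr)
{
    long i;

    if (cbs == NULL || nr < 0)
        return -1;
    for (i = 0; i < nr; i++) {
        if (!demo_iocb_valid(&cbs[i]))
            break;
        /* A real implementation would hand the request to the
         * file's asynchronous method at this point. */
    }
    return i;
}
```

Returning the count of accepted requests lets the caller see exactly where a batch went wrong, which is one plausible design; whether the patch behaves this way is not stated in the posting.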
io_submit does some validation and bookkeeping, then passes the
requests on to the new file_operations methods. For now, they
disappear into a cloud of missing code for execution. When the operation
has completed, successfully or not, the internal function
aio_complete is called with the final status. That status (and
associated information) is stored in a circular buffer; applications can
extract this information from the buffer with the new io_getevents system call.
Interestingly, some of the structure is there to allow this circular buffer
to be mapped into user space. Then applications could obtain their I/O
completion information without the need for a system call. The
implementation of this feature is not yet complete, however.
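The circular buffer of completion events can be modeled in a few lines of C. This is a single-threaded sketch with hypothetical demo_ names; it ignores the locking and memory-ordering problems that a real kernel/user shared ring would have to solve, and its layout is not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_RING_SIZE 4    /* holds at most DEMO_RING_SIZE - 1 events */

/* Hypothetical completion record. */
struct demo_event {
    unsigned long id;   /* which request finished */
    long result;        /* final status: byte count or negative error */
};

struct demo_ring {
    unsigned head;      /* next slot to read */
    unsigned tail;      /* next slot to write */
    struct demo_event events[DEMO_RING_SIZE];
};

/* Producer side, playing the role of aio_complete.
 * Returns 0 on success, -1 if the ring is full. */
int demo_complete(struct demo_ring *r, unsigned long id, long result)
{
    unsigned next = (r->tail + 1) % DEMO_RING_SIZE;

    if (next == r->head)
        return -1;
    r->events[r->tail].id = id;
    r->events[r->tail].result = result;
    r->tail = next;
    return 0;
}

/* Consumer side, playing the role of io_getevents: copy out up to
 * max events in completion order and return how many were copied. */
long demo_getevents(struct demo_ring *r, struct demo_event *out, long max)
{
    long n = 0;

    while (n < max && r->head != r->tail) {
        out[n++] = r->events[r->head];
        r->head = (r->head + 1) % DEMO_RING_SIZE;
    }
    return n;
}
```

If the ring were mapped into user space, the consumer loop could run in the application itself, reading head and tail directly; that is the system-call-free path the unfinished feature is aiming for.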
Much of the rest of the code posted at this point concerns itself with
cancellation of asynchronous I/O requests - either by application request,
or when the application exits.
What is missing is the implementation of the AIO operations
themselves. Previous versions of this patch provided generic versions of
the aio_read and aio_write operations that handled much
of the low-level work. They would start by calling the standard
read or write operations, but with a twist: those
operations were changed to take an extra flags argument. If
flags contains F_ATOMIC, the I/O operation must be
completed without sleeping, or not at all. In the first case, the
operation is done and the application can be notified.
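The "complete without sleeping, or not at all" rule can be illustrated with a toy read function. Only the name F_ATOMIC comes from the patch description; the flag value, the demo_source type, and the use of -EAGAIN to signal "would have to sleep" are assumptions for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define DEMO_F_ATOMIC 0x1   /* hypothetical value for illustration */

/* Toy data source: 'ready' bytes can be copied immediately;
 * anything beyond that would require waiting. */
struct demo_source {
    const char *data;
    size_t ready;
};

/* Read with a flags argument. With DEMO_F_ATOMIC the request is
 * all-or-nothing and never waits. Returns the number of bytes
 * copied, or -EAGAIN if the atomic request cannot be satisfied. */
long demo_read(struct demo_source *s, char *buf, size_t len, int flags)
{
    if (len > s->ready) {
        if (flags & DEMO_F_ATOMIC)
            return -EAGAIN;     /* would sleep: touch nothing */
        /* A blocking caller would wait for more data here. This
         * sketch cannot wait, so it returns what is ready. */
        len = s->ready;
    }
    memcpy(buf, s->data, len);
    s->data += len;
    s->ready -= len;
    return (long)len;
}
```

A generic aio_read built on this pattern would try the atomic call first, report completion immediately on success, and fall back to a deferred path only when the atomic attempt is refused.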
Life is often not that easy, though - usually it is necessary to wait for
I/O operations. The application does not want to wait, of course, or it
would not be using asynchronous I/O. The older AIO patch would create a
kvec structure describing the operation - it contains a pointer to
the physical page holding the user buffer, a length, and an offset. Then
one of the new kvec_read or kvec_write operations would be called
to start the work. These operations also need to be atomic (no sleeping),
and must arrange to call aio_complete when the job is done.
This is the part of the patch which breaks everything, of course - even
devices and filesystems which have no intention of supporting asynchronous
I/O must take the new flags argument on read and
write. It will be interesting to see how this part of the AIO
patch has changed over the last few months. If the kernel is really going
to shift to asynchronous operations as the default way of doing things
internally, there could be some fun surprises there.
The interface between the kernel and user space is a complicated thing.
There are over 200 system calls, many of which take task-specific
structures or other types as arguments. And then there is ioctl,
which can be different for every driver or filesystem, and which, according
to many, should be seen as hundreds of independent system calls in its own
right.
In the good old days, before glibc, applications included kernel header
files directly to get the definitions of the structures needed for system
calls. The good old days were not all that good, though; keeping the
kernel header files suitable for user space use was not easy, the kernel
headers brought in a lot of stuff that applications did not need, and it
was not uncommon to encounter mismatches between the headers used to
compile an application and the actual kernel it was running under. As a
result, the rule with glibc has been to never, ever include kernel header
files into application programs.
The problem with this approach is that there is no longer a single
definition of the interface between kernel and user space. People working
on library interfaces must go hunting for structure definitions through the
tangled mess of kernel header files; that is not an easy job.
H. Peter Anvin, as it turns out, is working on a
library interface - a small C library for the initramfs mechanism. He
has come up with a relatively simple suggestion: create a new include
directory (linux/abi/) for include files which encapsulate the
interface between the two worlds. These files would be written so that
they could be included in either kernel or user space, and they would
contain only the minimal declarations needed to define the kernel ABI.
The idea makes a lot of sense. It would make life easier for library
writers, but it would help on the kernel side as well. It is not always
obvious, when editing kernel headers, that a particular structure forms
part of the interface with user space. Putting the user space interface
into special header files will make it harder to change that interface by
mistake. Creating the abi/ directory seems like a logical part of
the larger task of cleaning up the kernel's include files.
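To make the idea concrete, here is a guess at what a header under such a directory might look like. Everything in it is hypothetical: the structure, the constant names, and the decision to use fixed-width types are illustrative choices, not part of the proposal as posted. The kernel would use its own fixed-width types; stdint.h appears here only so the sketch compiles in user space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifndef DEMO_ABI_EXAMPLE_H
#define DEMO_ABI_EXAMPLE_H

/* Only what crosses the kernel/user boundary belongs here: no
 * internal helpers, no locking primitives, no kernel-only types. */

/* Hypothetical request structure. Explicitly sized members keep
 * the layout identical for 32-bit and 64-bit callers. */
struct demo_abi_request {
    uint32_t opcode;
    uint32_t flags;
    uint64_t buffer;    /* user address carried as an integer */
    uint64_t length;
};

/* Hypothetical opcode values shared by both sides. */
#define DEMO_ABI_OP_READ  1u
#define DEMO_ABI_OP_WRITE 2u

#endif /* DEMO_ABI_EXAMPLE_H */
```

With the layout pinned down in one place, a library writer can check sizes and offsets at build time instead of hunting through internal kernel headers.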
Patches and updates
- Andrea Arcangeli: 2.4.19rc3aa4. "Merged async-io from Benjamin LaHaise after purifying it from the
/proc/libredhat.so mess that made it not binary compatible with 2.5."
(July 30, 2002)
Core kernel code
- Benjamin LaHaise: aio-core for 2.5.29. "This drop is untested, but I'd
like it if people could provide comments on it."
(July 30, 2002)
- Marcin Dalecki: IDE 104.
(July 26, 2002)
- Marcin Dalecki: IDE 105.
(July 30, 2002)
- Marcin Dalecki: IDE 106.
(July 26, 2002)
- Marcin Dalecki: IDE 107.
(July 26, 2002)
- Russell King: Various updates. (...to the new serial driver...)
(July 26, 2002)
Filesystems and block I/O
Page editor: Jonathan Corbet