Kernel development

Brief items

Current release status

The current development kernel is 2.5.29, which was released on July 26. It includes another set of IDE patches, a new LDM (Windows dynamic disks) driver, a number of driverfs changes, lots of fixups for the new serial driver, and, of course, lots of fixes for things that broke in the big 2.5.28 IRQ handling changes (see the July 25 LWN Kernel Page). The long format changelog is also available.

Linus's BitKeeper tree (for 2.5.30) contains quite a few patches at this point. There is a change to the fork() code which allows things to be done to the child process (e.g. migration to another CPU) before it starts running. Also included are a big pile of IDE updates, more IRQ fixes, some direct I/O changes from Andrew Morton ("This code is wickedly quick"), the "strict overcommit" patch, which prevents surprise "out of memory" conditions, some serial driver fixes, and an ARM update. The tree also removes the "khttpd" in-kernel web server.

There is no current prepatch from Dave Jones; this posting explains why. In short: he has been busy, the current development kernels are too unstable to make patches against, and he has been getting started with BitKeeper.

The current 2.5 status summary from Guillaume Boissiere came out on July 31.

The current stable kernel is 2.4.18; Marcelo tried to catch us by releasing the fourth 2.4.19 release candidate just before this page went to "press," but we've learned to watch out for that kind of maneuver. The -rc4 release contains a relatively small set of fixes for the few remaining problems that have turned up; with luck, it will become the real 2.4.19.

The latest prepatch from Alan Cox is 2.4.19-rc3-ac5.

Kernel development news

The asynchronous I/O core

When Andrea Arcangeli released his 2.4.19-rc3-aa4 tree, he included an old version of Ben LaHaise's asynchronous I/O code. This led to a discussion of some features of the AIO interface, and a note from Linus wondering what had happened to the AIO project:

Note that something needs to get moving on this rsn, I'm not interested in getting aio patches on Oct 30th. The feature freeze may be on Halloween, but if I get some big feature just days before I'm likely to just say "screw it".

Ben responded with a patch implementing the core part of the AIO subsystem. It is far from a full implementation: the patch contains no device driver or filesystem changes. But it is enough to get a sense for where the AIO development is going.

This patch does not, at this time, make all I/O asynchronous within the kernel (as had been discussed at the Kernel Summit). Instead, devices and filesystems must implement the new aio_read, aio_write, and aio_fsync operations in the file_operations structure to support asynchronous operations. Code that does not provide those methods is unaffected, so this patch can go into the kernel now without actually breaking anything.
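To illustrate why optional methods leave unconverted code untouched, here is a minimal user-space model of an operations table in which the asynchronous entries may simply be NULL. The names fops_model, kiocb_model, submit_async_read, and toy_aio_read are hypothetical, as is the choice of -EINVAL for "not supported"; this is not the patch's actual code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins; these are NOT the kernel's
 * struct file_operations or its AIO request structure. */
struct kiocb_model;

struct fops_model {
    long (*read)(char *buf, size_t len);   /* ordinary synchronous path */
    long (*aio_read)(struct kiocb_model *iocb, char *buf, size_t len);
    long (*aio_write)(struct kiocb_model *iocb, const char *buf, size_t len);
    int  (*aio_fsync)(struct kiocb_model *iocb);
};

/* Dispatch an asynchronous read.  A file whose operations table leaves
 * aio_read NULL does not support AIO; -EINVAL is an assumed error code. */
static long submit_async_read(const struct fops_model *f,
                              struct kiocb_model *iocb, char *buf, size_t len)
{
    assert(f != NULL);
    if (f->aio_read == NULL)
        return -EINVAL;
    return f->aio_read(iocb, buf, len);
}

/* A toy method for a "converted" driver: pretend the request was queued
 * and report its length. */
static long toy_aio_read(struct kiocb_model *iocb, char *buf, size_t len)
{
    (void)iocb;
    (void)buf;
    return (long)len;
}
```

A driver that never heard of AIO keeps a zero-filled table and gets an error back; one that fills in aio_read takes the asynchronous path.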

That may change when the rest of the AIO code is posted. This patch provides the mechanism for submitting, tracking, and cancelling asynchronous I/O operations; actually executing those operations will come later. A new io_submit system call initiates asynchronous I/O requests; it takes an array of structures describing what is to be done. Whenever an application wants to fire off an asynchronous read or write, it fills in an iocb structure with an "opcode," information on the buffer, etc., and passes it to io_submit. (Of course, the application will likely call a library function like aio_read which handles these details.)
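The batch-submission idea can be sketched as follows. The structure, opcodes, and validation rules here are hypothetical and only loosely follow the description above. The return convention (the count of requests queued, or the first error if nothing was queued) is an assumption for this model, not a statement about the 2.5.29 patch.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical opcodes and request layout; not the patch's real ABI. */
enum { OP_PREAD = 0, OP_PWRITE = 1 };

struct iocb_model {
    int       opcode;   /* what to do */
    int       fd;       /* which file */
    void     *buf;      /* application buffer */
    size_t    nbytes;   /* transfer length */
    long long offset;   /* file position */
};

/* Stand-in for validating and queueing a single request. */
static int queue_one(const struct iocb_model *cb)
{
    if (cb->opcode != OP_PREAD && cb->opcode != OP_PWRITE)
        return -EINVAL;
    if (cb->fd < 0)
        return -EBADF;
    if (cb->buf == NULL && cb->nbytes != 0)
        return -EFAULT;
    return 0;
}

/* Submit nr requests in order.  Stop at the first bad one; return how many
 * were queued, or that request's error if it was the very first. */
static long submit_model(struct iocb_model *const *cbs, long nr)
{
    long i;
    for (i = 0; i < nr; i++) {
        int err = queue_one(cbs[i]);
        if (err)
            return i ? i : err;
    }
    return i;
}
```

Passing an array lets one system call start many operations, which matters to applications that keep a lot of I/O in flight.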

io_submit does some validation and bookkeeping, then passes the requests on to the new file_operations methods. For now, they disappear into a cloud of missing code for execution. When the operation has completed, successfully or not, the internal function aio_complete is called with the final status. That status (and associated information) is stored in a circular buffer; applications can extract this information from the buffer with the new io_getevents system call.
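The completion path amounts to a producer/consumer ring: aio_complete adds an entry, and io_getevents removes entries. Here is a minimal single-threaded model under the assumption of a fixed power-of-two ring size. complete_model and getevents_model are hypothetical names, and the blocking behavior of the real system call (waiting for a minimum number of events) is left out.

```c
#include <stddef.h>

#define RING_SLOTS 8   /* hypothetical size; must be a power of two */

struct event_model {
    unsigned long long data;   /* cookie copied from the original request */
    long long          res;    /* final status: bytes moved, or -errno */
};

struct ring_model {
    unsigned head;             /* free-running index of next slot to read */
    unsigned tail;             /* free-running index of next slot to fill */
    struct event_model ev[RING_SLOTS];
};

/* In the role of aio_complete(): record one result.
 * Returns 0 on success, -1 if the ring is full. */
static int complete_model(struct ring_model *r,
                          unsigned long long data, long long res)
{
    if (r->tail - r->head == RING_SLOTS)
        return -1;
    r->ev[r->tail % RING_SLOTS] = (struct event_model){ data, res };
    r->tail++;
    return 0;
}

/* In the role of io_getevents(): copy out up to max pending events. */
static size_t getevents_model(struct ring_model *r,
                              struct event_model *out, size_t max)
{
    size_t n = 0;
    while (n < max && r->head != r->tail) {
        out[n++] = r->ev[r->head % RING_SLOTS];
        r->head++;
    }
    return n;
}
```

Because the indices run freely and wrap with unsigned arithmetic, `tail - head` is always the number of unread events; the power-of-two size keeps `% RING_SLOTS` consistent across that wraparound.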

Interestingly, some of the structure is there to allow this circular buffer to be mapped into user space. Then applications could obtain their I/O completion information without the need for a system call. The implementation of this feature is not yet complete, however.
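If the ring is shared with user space, the application and the kernel touch the indices concurrently, so memory ordering matters. The sketch below uses C11 atomics to show one way an application could reap a completion without a system call. The layout and function names are hypothetical; the patch's own mapped-ring format was not finished.

```c
#include <stdatomic.h>

#define SHARED_SLOTS 8   /* hypothetical; power of two */

/* Hypothetical ring visible to both sides: the producer (kernel's role)
 * advances tail, the consumer (application) advances head. */
struct shared_ring_model {
    _Atomic unsigned head;
    _Atomic unsigned tail;
    long long res[SHARED_SLOTS];
};

/* Application side: take one completion without entering the kernel.
 * Returns 1 and stores the status if one was available, else 0. */
static int try_reap_model(struct shared_ring_model *r, long long *res)
{
    unsigned head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* Acquire pairs with the producer's release store of tail, so the
     * slot contents are visible before they are read. */
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head == tail)
        return 0;   /* nothing ready: a real program would fall back to io_getevents */
    *res = r->res[head % SHARED_SLOTS];
    /* Release so the producer cannot reuse the slot before it was copied. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 1;
}

/* Producer side, shown so the pairing is complete. */
static int post_model(struct shared_ring_model *r, long long res)
{
    unsigned tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (tail - head == SHARED_SLOTS)
        return -1;   /* full */
    r->res[tail % SHARED_SLOTS] = res;
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return 0;
}
```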

Much of the rest of the code posted at this point concerns itself with cancellation of asynchronous I/O requests - either by application request, or when the application exits.
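Cancellation at exit time can be modeled as a walk over the list of outstanding requests. This sketch assumes each request optionally carries a cancel hook, and that a cancelled request reports -ECANCELED. Both the structure and that status value are assumptions for illustration; anything that cannot be stopped must still be waited for before the context goes away.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical in-flight request: final status plus an optional cancel hook. */
struct req_model {
    int  done;                           /* set once a final status exists */
    long status;                         /* bytes moved, or -errno */
    int (*cancel)(struct req_model *);   /* NULL if it cannot be stopped */
    struct req_model *next;              /* list of outstanding requests */
};

/* A cancel hook that always succeeds, for illustration. */
static int cancel_ok(struct req_model *r)
{
    (void)r;
    return 0;
}

/* On exit: cancel what can be cancelled and return how many requests
 * must still run to completion before cleanup can finish. */
static int cancel_all_model(struct req_model *list)
{
    int must_wait = 0;
    for (struct req_model *r = list; r != NULL; r = r->next) {
        if (r->done)
            continue;   /* already finished; leave its status alone */
        if (r->cancel != NULL && r->cancel(r) == 0) {
            r->status = -ECANCELED;
            r->done = 1;
        } else {
            must_wait++;
        }
    }
    return must_wait;
}
```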

What is missing is the implementation of the AIO operations themselves. Previous versions of this patch provided generic versions of the aio_read and aio_write operations that handled much of the low-level work. They would start by calling the standard read or write operations, but with a twist: those operations were changed to take an extra flags argument. If flags contains F_ATOMIC, the I/O operation must be completed without sleeping, or not at all. If it can be completed without sleeping, the work is done immediately and the application can be notified.
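The F_ATOMIC contract can be shown with a toy read method backed by a one-buffer cache. The numeric value of F_ATOMIC below is a placeholder, and returning -EAGAIN for "would have to sleep" is an assumption of this model; cache_model and read_model are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define F_ATOMIC 0x1   /* placeholder value; the patch's real constant may differ */

struct cache_model {
    int  valid;        /* is the data already in memory? */
    char data[64];
};

/* A read method with the extra flags argument.  With F_ATOMIC it must not
 * sleep: either serve the request from memory at once or refuse. */
static long read_model(struct cache_model *c, char *buf, size_t len, int flags)
{
    assert(len <= sizeof c->data);
    if (!c->valid) {
        if (flags & F_ATOMIC)
            return -EAGAIN;   /* would have to wait for the device */
        /* Synchronous path: stands in for sleeping until the device
         * fills the cache. */
        memset(c->data, 'x', sizeof c->data);
        c->valid = 1;
    }
    memcpy(buf, c->data, len);
    return (long)len;
}
```

A refusal is not an error from the application's point of view; it just tells the generic AIO code that the slower, queued path is needed.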

Life is often not that easy, though - usually it is necessary to wait for I/O operations. The application does not want to wait, of course, or it would not be using asynchronous I/O. The older AIO patch would create a kvec structure describing the operation - it contains a pointer to the physical page holding the user buffer, a length, and an offset. Then one of the new kvec_read or kvec_write operations would be called to start the work. These operations also need to be atomic (no sleeping), and must arrange to call aio_complete when the job is done.
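The kvec path can be sketched as "record the request without sleeping, finish it later, then call the completion hook." In this model a kvec holds a page pointer, a length, and an offset, as described above; kvec_read_model, device_finish_model, record_result, and the one-slot "device" are hypothetical, and the completion callback merely plays the role of aio_complete.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kvec: the page holding the user buffer, a length, and the
 * offset of the data within that page. */
struct kvec_model {
    void    *page;
    unsigned len;
    unsigned offset;
};

typedef void (*complete_fn)(void *cookie, long res);

/* A one-slot toy device with at most one pending operation. */
struct pending_model {
    struct kvec_model kv;
    complete_fn       done;
    void             *cookie;
    int               busy;
};

/* Start the transfer without sleeping: just record it and return. */
static int kvec_read_model(struct pending_model *p, struct kvec_model kv,
                           complete_fn done, void *cookie)
{
    if (p->busy)
        return -1;   /* slot in use; the caller must retry later */
    p->kv = kv;
    p->done = done;
    p->cookie = cookie;
    p->busy = 1;
    return 0;
}

/* Later (say, from an interrupt): fill the buffer, then report completion. */
static void device_finish_model(struct pending_model *p)
{
    assert(p->busy);
    char *dst = (char *)p->kv.page + p->kv.offset;
    for (unsigned i = 0; i < p->kv.len; i++)
        dst[i] = 'd';
    p->busy = 0;
    p->done(p->cookie, (long)p->kv.len);   /* the aio_complete() role */
}

/* Example completion hook: store the result where the submitter can see it. */
static void record_result(void *cookie, long res)
{
    *(long *)cookie = res;
}
```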

This is the part of the patch which breaks everything, of course - even devices and filesystems which have no intention of supporting asynchronous I/O must take the new flags argument on read and write. It will be interesting to see how this part of the AIO patch has changed over the last few months. If the kernel is really going to shift to asynchronous operations as the default way of doing things internally, there could be some fun surprises there.

Organizing the kernel binary interface

The interface between the kernel and user space is a complicated thing. There are over 200 system calls, many of which take task-specific structures or other types as arguments. And then there is ioctl, which can be different for every driver or filesystem, and which, according to many, should be seen as hundreds of independent system calls in its own right.

In the good old days, before glibc, applications included kernel header files directly to get the definitions of the structures needed for system calls. The good old days were not all that good, though; keeping the kernel header files suitable for user space use was not easy, the kernel headers brought in a lot of stuff that applications did not need, and it was not uncommon to encounter mismatches between the headers used to compile an application and the actual kernel it was running under. As a result, the rule with glibc has been to never, ever include kernel header files into application programs.

The problem with this approach is that there is no longer a single definition of the interface between kernel and user space. People working on library interfaces must go hunting for structure definitions through the tangled mess of kernel header files; that is not an easy job.

H. Peter Anvin, as it turns out, is working on a library interface - a small C library for the initramfs mechanism. He has come up with a relatively simple suggestion: create a new include directory (linux/abi/) for include files which encapsulate the interface between the two worlds. These files would be written so that they could be included in either kernel or user space, and they would contain only the minimal declarations needed to define the kernel interface.

The idea makes a lot of sense. It would make life easier for library writers, but it would help on the kernel side as well. It is not always obvious, when editing kernel headers, that a particular structure forms part of the interface with user space. Putting the user space interface into special header files will make it harder to change that interface by mistake. Creating the abi/ directory seems like a logical part of the larger task of cleaning up the kernel's include files.
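Here is a sketch of what such a dual-use header might look like, assuming the usual convention that kernel builds define `__KERNEL__`. The abi_ type names and struct abi_example_req are hypothetical and are not taken from Anvin's proposal. The compile-time checks (written here with C11 `_Static_assert`) show one way to make an accidental layout change break the build instead of breaking binaries.

```c
/* Hypothetical abi/ header: the same text compiles in the kernel and in
 * user space, and declares only what the interface needs. */
#ifdef __KERNEL__
#include <linux/types.h>
#include <linux/stddef.h>   /* offsetof */
typedef __u32 abi_u32;
typedef __u64 abi_u64;
#else
#include <stddef.h>         /* offsetof */
#include <stdint.h>
typedef uint32_t abi_u32;
typedef uint64_t abi_u64;
#endif

/* A made-up request passed across the system call boundary.  Fixed-width
 * fields and explicit padding keep the layout identical for 32-bit and
 * 64-bit user space. */
struct abi_example_req {
    abi_u32 cmd;
    abi_u32 pad;      /* explicit padding so 'offset' sits at byte 8 everywhere */
    abi_u64 offset;
    abi_u64 buf;      /* user pointer carried as a 64-bit integer */
};

/* Layout checks: editing the structure by mistake fails the build. */
_Static_assert(sizeof(struct abi_example_req) == 24, "ABI size changed");
_Static_assert(offsetof(struct abi_example_req, offset) == 8, "ABI layout changed");
```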

Patches and updates

Kernel trees

  • Andrea Arcangeli: 2.4.19rc3aa4. "Merged async-io from Benjamin LaHaise after purifying it from the /proc/libredhat.so mess that made it not binary compatible with 2.5." (July 30, 2002)

Core kernel code

  • Benjamin LaHaise: aio-core for 2.5.29. "This drop is untested, but I'd like it if people could provide comments on it." (July 30, 2002)

Device drivers

  • Marcin Dalecki: IDE 104. (July 26, 2002)
  • Marcin Dalecki: IDE 105. (July 30, 2002)
  • Marcin Dalecki: IDE 106. (July 26, 2002)
  • Marcin Dalecki: IDE 107. (July 26, 2002)
  • Russell King: Various updates. (...to the new serial driver...) (July 26, 2002)

Page editor: Jonathan Corbet

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds