Kernel development
Brief items
Kernel release status
The current 2.6 kernel remains 2.6.18; patches continue to flow into the mainline repository for the upcoming 2.6.19-rc1 release.
The current -mm tree is 2.6.18-mm3. Recent changes to -mm include a patch to silence a lot of useless compiler warnings, a new attempt to get the swap token code working properly, swapfile support for software suspend, and the kevent subsystem.
The current 2.4 prepatch is 2.4.34-pre4, released on October 2. A small number of fixes went in this time around; 2.4.34 looks like it is about to go into the final stabilization phase.
Kernel development news
Quotes of the week
-- Alan Cox
More stuff for 2.6.19
The flow of patches into the mainline repository continues at a high rate, with a few thousand of them having been merged since last week's summary. The most significant of these are (starting with the user-visible changes):
- The GFS2 cluster
filesystem has been merged at last; it includes its own
distributed lock manager implementation.
- New drivers: MCS7840 USB port devices, ELAN U132 USB controllers,
ELAN Uxxx USB-to-PCMCIA adapters, Playstation 2 "Trance" vibrator
devices, the VIA VT1211 Super-I/O chip, AMD K8 CPU temperature
monitors, Philips TDA10086 and TDA826x tuner devices, DiBcom
DiB0700-based USB bridges, Hauppauge Nova-T 500 tuners, TI Flash Media
PCI74xx and PCI76xx host adapters, QUICC Engine communications
coprocessors, and HP Quicksilver AGP GARTs.
- The NFS server code has a number of improvements, including the
ability to do I/O in much larger chunks over TCP connections.
- eCryptfs, an encrypting
filesystem, has gone in.
- Bound
End-to-End Tunnel (BEET) mode support has been added to the IPSec
code.
- A USB gadget driver which connects the gadget interface to the ALSA
MIDI subsystem. The purpose is to allow a system to appear as a
USB-connected MIDI streaming device.
- POSIX access control lists are now available in the tmpfs filesystem.
- If a string with the form |program is written to
/proc/sys/kernel/core_pattern, all core dumps will be piped
to the given program instead of being written to disk.
- Some of the early containers patches have gone in, including separate
namespaces for utsname information and SYSV IPC objects.
- The BSD secure level security module has been removed.
- The "floppy tape" subsystem has been marked for removal in 2.6.20; it is unmaintained, probably has no active users, and its 1.6GB storage capacity looks rather quaint in current times. Anybody who actually has worthwhile data on this medium probably should have copied it to something newer some time ago.
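The core_pattern pipe feature listed above delivers the dump on the handler program's standard input. What follows is a minimal sketch of the heart of such a handler, not code from the kernel tree: drain_core() is a made-up name, and a real handler would wrap it in a main() which opens its output file and calls drain_core(stdin, out). It would be enabled by writing something like |/usr/local/sbin/core-handler (a hypothetical path) to /proc/sys/kernel/core_pattern.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper for a core_pattern pipe handler: copy the dump
 * arriving on 'in' (stdin in a real handler) to 'out'.
 * Returns the number of bytes copied, or -1 on a read or write error.
 */
long drain_core(FILE *in, FILE *out)
{
    char buf[4096];
    long total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}
```

A handler built this way could just as easily compress the stream or ship it over the network instead of writing it to disk.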
Changes visible to kernel developers include:
- SRCU - a version of read-copy-update which allows read-side blocking -
has been merged.
- Much improved suspend and resume support for the USB layer.
- A new set of functions has been added to allow USB drivers to quickly
check the direction and transfer mode of an endpoint.
- A somewhat reduced version of Wireless Extensions version 21. Most of
the original functionality has been removed with the idea that the
wireless extensions will soon be superseded by something else.
- Vast numbers of annotations enabling the sparse utility to
detect big/little endian errors.
- A number of Video4Linux drivers have been converted to the V4L2 API.
- The flags field of struct request has been split
into two new fields: cmd_type and cmd_flags. The
former contains a value describing the type of request (filesystem
request, sense, power management, etc.) while the latter has the flags
which modify the way the command works (read/write, barriers, etc.).
- The block layer can be disabled entirely at kernel configuration
time; this option can be useful in some embedded situations.
- The kernel now has a generic boolean type, called bool; it
replaces a number of homebrewed boolean types found in various parts
of the kernel.
- There is a new function for allocating a copy of a block of memory:
void *kmemdup(const void *src, size_t len, gfp_t gfp);
A number of allocate-then-copy code sequences have been updated to use kmemdup() instead.
- The latency tracking infrastructure patch has been merged.
- The readv() and writev() methods in the file_operations structure have been removed in favor of aio_readv() and aio_writev() (whose prototypes have been changed). See this article for more information.
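The allocate-then-copy pattern which kmemdup() replaces, mentioned in the list above, is simple enough to show in a few lines. This is a user-space analogue only, under the assumption that malloc() stands in for the kernel allocator; memdup_sketch() is a made-up name, and the gfp_t argument of the real function has no equivalent here.

```c
#include <stdlib.h>
#include <string.h>

/*
 * User-space stand-in for the kmemdup() pattern: allocate len bytes and
 * copy src into them. The kernel function also takes a gfp_t describing
 * how the allocation may be performed; malloc() has no such flags.
 * Returns NULL if the allocation fails.
 */
void *memdup_sketch(const void *src, size_t len)
{
    void *p = malloc(len);

    if (p)
        memcpy(p, src, len);
    return p;
}
```

As with the open-coded sequences being replaced, the caller must check for a NULL return and free the copy when done.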
As of this writing the merge window has not yet closed, so more significant changes could yet find their way into 2.6.19.
API changes: interrupt handlers and vectored I/O
Normally, the release of 2.6.19-rc1 would be the signal that the release cycle process would begin to slow down and focus on bug fixes. Things might be just a little different this time around, however, as a large and disruptive (almost 1100 files changed) API change is likely to go in between -rc1 and -rc2. The reasoning is this: a patch which hits so many files will inevitably conflict with a number of the other patches currently flooding into the mainline. Holding this patch until the flood subsides should make life easier all around.
So what is this patch? Consider that interrupt handlers currently have the following prototype:
irqreturn_t handler(int irq, void *data, struct pt_regs *regs);
The regs structure holds the state of the processor's registers at the time of the interrupt. It is passed to every interrupt handler, but it is almost never used; for the purposes of most handlers, the pre-interrupt register state is just a bunch of random bits. There is a cost to passing this pointer around, however. According to David Howells:
So David has put together a patch which removes the regs argument to interrupt handlers. Any code which actually needs the registers - seemingly only the timer interrupt handler - can get the pointer with a call to the new get_irq_regs() function. Since this change obviously requires fixing every interrupt handler in the system - and there are a lot of them in the mainline kernel - the patch is large and touches a lot of files.
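The mechanism described above can be modeled in user space. The following is a sketch under stated assumptions rather than the patch itself: struct pt_regs is reduced to a single made-up field, one global pointer stands in for what would be per-CPU state, and set_irq_regs() and dispatch_irq() are illustrative guesses at how the entry code might save and restore that pointer around a handler call. Only get_irq_regs() and the two-argument handler prototype come from the description above.

```c
#include <stddef.h>

/* Stand-in for the architecture's saved-register structure. */
struct pt_regs {
    unsigned long ip;   /* made-up field for the demonstration */
};

typedef int irqreturn_t;
#define IRQ_HANDLED 1

/* Per-CPU in a real kernel; a single global models one CPU here. */
static struct pt_regs *current_irq_regs;

/* Assumed helper: install a new pointer, return the previous one. */
struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
    struct pt_regs *old = current_irq_regs;

    current_irq_regs = new_regs;
    return old;
}

struct pt_regs *get_irq_regs(void)
{
    return current_irq_regs;
}

/* The common case: a handler which never looks at the registers. */
irqreturn_t typical_handler(int irq, void *data)
{
    int *count = (int *)data;

    (void)irq;
    (*count)++;
    return IRQ_HANDLED;
}

unsigned long last_ip;

/* The rare case: a handler which asks for the registers explicitly. */
irqreturn_t timer_like_handler(int irq, void *data)
{
    struct pt_regs *regs = get_irq_regs();

    (void)irq;
    (void)data;
    if (regs)
        last_ip = regs->ip;
    return IRQ_HANDLED;
}

/* Hypothetical entry path: save, call the handler, restore (nesting-safe). */
irqreturn_t dispatch_irq(irqreturn_t (*handler)(int, void *),
                         int irq, void *data, struct pt_regs *regs)
{
    struct pt_regs *old = set_irq_regs(regs);
    irqreturn_t ret = handler(irq, data);

    set_irq_regs(old);
    return ret;
}
```

Saving and restoring the old pointer, rather than simply clearing it, keeps the model correct if one interrupt arrives while another is being handled.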
This patch has just now come along, meaning that, by normal standards, it is a bit late for the 2.6.19 party. So it would normally sit in -mm for this cycle, and be merged into 2.6.20. But, Andrew Morton says:
Nobody else seems to object to the change, though Linus did spare a moment to feel the pain of people maintaining drivers out of the mainline tree. The writing on the wall all points to a near-term inclusion, perhaps with a special defined symbol to help out-of-tree maintainers write code which works with both handler prototypes.
Meanwhile, the file_operations structure can be found at the core of just about any subsystem which does I/O. Char device drivers create file_operations structures directly, while most other parts of the system (filesystems, network protocols and drivers, block drivers) bury them in higher-level logic. Two of the members of this structure are:
    ssize_t (*aio_read) (struct kiocb *iocb, char __user *buf,
                         size_t len, loff_t pos);
    ssize_t (*aio_write) (struct kiocb *iocb, const char __user *buf,
                          size_t len, loff_t pos);
These methods implement asynchronous reads and writes - operations which may be completed sometime after the original call returns to user space. One longstanding shortcoming of the Linux asynchronous I/O implementation is its lack of vectored operations; each AIO call can only operate on a single buffer. The 2.6.19 kernel will fill in that gap, at the cost of changing the above two prototypes to:
    ssize_t (*aio_read) (struct kiocb *iocb, const struct iovec *iov,
                         unsigned long niov, loff_t pos);
    ssize_t (*aio_write) (struct kiocb *iocb, const struct iovec *iov,
                          unsigned long niov, loff_t pos);
The single buffer has been replaced by an array of iovec structures:
    struct iovec {
        void __user *iov_base;
        __kernel_size_t iov_len;
    };
Single-buffer calls are now wrapped in a single iovec structure and passed to the new, vectorized versions of the AIO operations. All code which provides aio_read() and aio_write() will need to be updated to the new API - and the possibility of being requested to perform vectored operations.
The changes actually go beyond that, however, in that the readv() and writev() file_operations methods have been removed. The associated system calls are now, instead, implemented with calls to aio_read() and aio_write(). Converting older readv() and writev() methods is not particularly difficult, since there is no requirement that aio_read() and aio_write() must be asynchronous (in fact, in this case, they will be passed a "synchronous KIOCB" which indicates that the operation must be performed synchronously). In most cases, it is simply a matter of adopting the new prototype, then looking in iocb->ki_filp for the struct file pointer, should it be needed.
(See this article from last February for more background on this change.)
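To make the shape of the vectored interface described above concrete, here is a user-space model; it is an illustration under several assumptions, not kernel code. The struct kiocb argument is omitted, the user-space struct iovec from <sys/uio.h> stands in for the kernel's, memcpy() takes the place of the copy to user space, and device_data, sketch_read_vec() and sketch_read() are all made-up names.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical "device": a fixed block of bytes to read from. */
static const char device_data[] = "0123456789abcdef";

/*
 * Model of a vectored read method: fill each segment in turn, starting
 * at offset pos. Returns the total number of bytes copied (0 at or past
 * end of data), or -1 for a negative offset.
 */
ssize_t sketch_read_vec(const struct iovec *iov, unsigned long niov, off_t pos)
{
    size_t avail = sizeof(device_data) - 1;   /* exclude the trailing NUL */
    ssize_t done = 0;
    unsigned long i;

    if (pos < 0)
        return -1;

    for (i = 0; i < niov; i++) {
        size_t n = iov[i].iov_len;

        if ((size_t)pos >= avail)
            break;
        if (n > avail - (size_t)pos)
            n = avail - (size_t)pos;
        /* A kernel implementation would copy to user space here. */
        memcpy(iov[i].iov_base, device_data + pos, n);
        pos += (off_t)n;
        done += (ssize_t)n;
    }
    return done;
}

/* A single-buffer call becomes a one-element vector. */
ssize_t sketch_read(char *buf, size_t len, off_t pos)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    return sketch_read_vec(&iov, 1, pos);
}
```

The point of the exercise is that code written for one buffer needs little more than a loop over the segments, with the file position carried from one segment to the next.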
The final wireless extension?
"Wireless extensions" is an ioctl()-based API which allows user space to control parameters specific to wireless network interfaces - ESSID, encryption passwords, channels, etc. This API has long been maintained by Jean Tourrilhes; the last few kernel releases have had version 20 of this API. As of this writing, version 21 has been merged into the pre-2.6.19 mainline, but at least some of it may be on its way back out again.The problem is that version 21 is a real API change, in that sufficiently old tools will no longer operate properly. In particular, the formatting of the ESSID passed into the kernel has changed, so configurations which associated with a given network under version 20 will not do so under version 21. There is a workaround (add a space to the ESSID string), but many users will not know that, and, in any case, will only discover the need after upgrading their kernel and finding that the network is no longer there.
Since this problem came to light, many kernel developers (including Linus) have made it clear that they see this sort of API breakage as unacceptable. So they want the ESSID change backed out. There are, of course, real reasons for that change - the way those strings are handled in the protocols has evolved over time. But the right solution is to add a new ioctl() which can handle the new string format; the older version would continue to be supported indefinitely. Done in this way, the format change would be acceptable.
That seems like a good solution, except for one little hitch. It seems that Jean has foreseen this problem for some time. To help minimize the pain, he has been shipping versions of the wireless tools which understand the version 21 API for about six months. A number of distributors have picked up - and shipped - these new tools; affected distributions include Slackware 11 and Mandriva 2007. If those tools see a wireless extensions version greater than 20, they expect to use the new ESSID string format; if that change is backed out, those tools will break.
So wireless extensions 21 is now guaranteed to break some systems whether the ESSID change is included or not. At this point, the only way to avoid breaking deployed systems is to keep the wireless extensions version at 20 indefinitely. The wireless extensions, it seems, may be extended no more.
If that is how things work out, there will be some short-term pain, since needed enhancements will not find their way into the API. The long-term plan, however, is to replace the wireless extensions anyway; to that end, a new, netlink-based API called nl80211 is under development. That API, however, is tightly tied to the Devicescape 802.11 stack, which has been taking rather longer than expected to reach a state where it can be considered for merging. So the Linux wireless API may be stuck for a little while.
Slides and photos from Kernel Netconf 2006
David Miller has posted slides and photos from the 2006 Linux kernel network developers' conference. If you are interested in hardcore details on where the Linux networking layer is going, there are plenty to be found on that page.
Page editor: Jonathan Corbet