
Kernel development

Brief items

Kernel release status

The current 2.6 kernel remains 2.6.18; patches continue to flow into the mainline repository for the upcoming 2.6.19-rc1 release.

The current -mm tree is 2.6.18-mm3. Recent changes to -mm include a patch to silence a lot of useless compiler warnings, a new attempt to get the swap token code working properly, swapfile support for software suspend, and the kevent subsystem.

The current 2.4 prepatch is 2.4.34-pre4, released on October 2. A small number of fixes went in this time around; 2.4.34 looks like it is about to go into the final stabilization phase.


Kernel development news

Quotes of the week

I reserve the right some day to attempt to sue the ass of people who tivo-ise my code. Hey I might lose but I reserve the right to.

-- Alan Cox

The Mexicans have the Chupacabra. We have Al Viro. If you hear him roar, just _pray_ he's about to dissect somebody elses code than yours.. There is no point in running.

-- Linus Torvalds

Seems that the entire kernel effort is an ongoing plot to make my poor little Vaio stop working.

-- Andrew Morton


More stuff for 2.6.19

The flow of patches into the mainline repository continues at a high rate, with a few thousand of them having been merged since last week's summary. The most significant of these are (starting with the user-visible changes):

  • The GFS2 cluster filesystem has been merged at last; it includes its own distributed lock manager implementation.

  • New drivers: MCS7840 USB port devices, ELAN U132 USB controllers, ELAN Uxxx USB-to-PCMCIA adapters, Playstation 2 "Trance" vibrator devices, the VIA VT1211 Super-I/O chip, AMD K8 CPU temperature monitors, Philips TDA10086 and TDA826x tuner devices, DiBcom DiB0700-based USB bridges, Hauppauge Nova-T 500 tuners, TI Flash Media PCI74xx and PCI76xx host adapters, QUICC Engine communications coprocessors, and HP Quicksilver AGP GARTs.

  • The NFS server code has a number of improvements, including the ability to do I/O in much larger chunks over TCP connections.

  • eCryptfs, an encrypting filesystem, has gone in.

  • Bound End-to-End Tunnel (BEET) mode support has been added to the IPSec code.

  • A USB gadget driver which connects the gadget interface to the ALSA MIDI subsystem. The purpose is to allow a system to appear as a USB-connected MIDI streaming device.

  • POSIX access control lists are now available in the tmpfs filesystem.

  • If a string with the form |program is written to /proc/sys/kernel/core_pattern, all core dumps will be piped to the given program instead of being written to disk.

  • Some of the early containers patches have gone in, including separate namespaces for utsname information and SYSV IPC objects.

  • The BSD secure level security module has been removed.

  • The "floppy tape" subsystem has been marked for removal in 2.6.20; it is unmaintained, probably has no active users, and its 1.6GB storage capacity looks rather quaint in current times. Anybody who actually has worthwhile data on this medium probably should have copied it to something newer some time ago.
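To make the core_pattern item above a little more concrete: a program named after the | character would find the core image on its standard input. The following is a minimal user-space sketch of the copy loop such a program might use; the function name drain_stream() is made up for this illustration, and nothing here is taken from the kernel patch itself.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper for a core_pattern pipe program: copy everything
 * from 'in' (the core image arriving on a stream) to 'out'.
 * Returns the number of bytes copied, or -1 on a read or write error.
 */
long drain_stream(FILE *in, FILE *out)
{
    char buf[4096];
    long total = 0;
    size_t n;

    assert(in != NULL && out != NULL);

    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        if (fwrite(buf, 1, n, out) != n)
            return -1;          /* short write: give up */
        total += (long)n;
    }
    return ferror(in) ? -1 : total;
}
```

A real helper would presumably call something like drain_stream(stdin, out) from its main() function, with out opened on whatever destination the administrator prefers; the choice of destination is left open here.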

Changes visible to kernel developers include:

  • SRCU - a version of read-copy-update which allows read-side blocking - has been merged.

  • Much improved suspend and resume support for the USB layer.

  • A new set of functions has been added to allow USB drivers to quickly check the direction and transfer mode of an endpoint.

  • A somewhat reduced version of Wireless Extensions version 21. Most of the original functionality has been removed with the idea that the wireless extensions will soon be superseded by something else.

  • Vast numbers of annotations enabling the sparse utility to detect big/little endian errors.

  • A number of Video4Linux drivers have been converted to the V4L2 API.

  • The flags field of struct request has been split into two new fields: cmd_type and cmd_flags. The former contains a value describing the type of request (filesystem request, sense, power management, etc.) while the latter has the flags which modify the way the command works (read/write, barriers, etc.).

  • The block layer can be disabled entirely at kernel configuration time; this option can be useful in some embedded situations.

  • The kernel now has a generic boolean type, called bool; it replaces a number of homebrewed boolean types found in various parts of the kernel.

  • There is a new function for allocating a copy of a block of memory:

        void *kmemdup(const void *src, size_t len, gfp_t gfp);
    
    A number of allocate-then-copy code sequences have been updated to use kmemdup() instead.

  • The latency tracking infrastructure patch has been merged.

  • The readv() and writev() methods in the file_operations structure have been removed in favor of aio_read() and aio_write() (whose prototypes have been changed). See this article for more information.
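The allocate-then-copy pattern which kmemdup() replaces can be illustrated in user space. This is only a sketch of the idea, not the kernel's implementation: memdup_sketch() is a made-up name, malloc() stands in for the kernel allocator, and there is no equivalent of the gfp argument.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * User-space analogue of kmemdup(): allocate len bytes and copy the
 * block at src into the new allocation. Returns NULL if the allocation
 * fails; the caller owns (and must free) the result.
 */
void *memdup_sketch(const void *src, size_t len)
{
    void *p;

    assert(src != NULL);

    p = malloc(len);
    if (p)
        memcpy(p, src, len);
    return p;
}
```

The benefit is mostly one of tidiness: a two-step sequence with its own error check collapses into a single call whose return value is checked once.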

As of this writing the merge window has not yet closed, so more significant changes could yet find their way into 2.6.19.


API changes: interrupt handlers and vectored I/O

Normally, the release of 2.6.19-rc1 would be the signal that the release cycle process would begin to slow down and focus on bug fixes. Things might be just a little different this time around, however, as a large and disruptive (almost 1100 files changed) API change is likely to go in between -rc1 and -rc2. The reasoning is this: a patch which hits so many files will inevitably conflict with a number of the other patches currently flooding into the mainline. Holding this patch until the flood subsides should make life easier all around.

So what is this patch? Consider that interrupt handlers currently have the following prototype:

   irqreturn_t handler(int irq, void *data, struct pt_regs *regs);

The regs structure holds the state of the processor's registers at the time of the interrupt. It is passed to every interrupt handler, but it is almost never used; for the purposes of most handlers, the pre-interrupt register state is just a bunch of random bits. There is a cost to passing this pointer around, however. According to David Howells:

The regs pointer is used in few places, but it potentially costs both stack space and code to pass it around. On the FRV arch, removing the regs parameter from all the genirq function results in a 20% speed up of the IRQ exit path (ie: from leaving timer_interrupt() to leaving do_IRQ()).

So David has put together a patch which removes the regs argument to interrupt handlers. Any code which actually needs the registers - seemingly only the timer interrupt handler - can get the pointer with a call to the new get_irq_regs() function. Since this change obviously requires fixing every interrupt handler in the system - and there are a lot of them in the mainline kernel - the patch is large and touches a lot of files.
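The idea can be modeled in user space: the dispatch code stashes the register pointer somewhere well known before calling the handler, and the rare handler which cares fetches it from there. The sketch below is a simplified model, not David's patch. Only get_irq_regs() is named in the discussion above; set_irq_regs(), dispatch_irq(), the one-field struct pt_regs, and the single global slot (a real kernel would need one per CPU) are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for kernel types; the real pt_regs is architecture-specific. */
struct pt_regs { unsigned long pc; };
typedef int irqreturn_t;
#define IRQ_HANDLED 1

/* New-style handler type: no regs argument. */
typedef irqreturn_t (*irq_handler_fn)(int irq, void *data);

/* One slot for this model; a real kernel would keep one per CPU. */
static struct pt_regs *current_irq_regs;

/* Assumed helper: install a new regs pointer, return the previous one. */
static struct pt_regs *set_irq_regs(struct pt_regs *new_regs)
{
    struct pt_regs *old = current_irq_regs;

    current_irq_regs = new_regs;
    return old;
}

struct pt_regs *get_irq_regs(void)
{
    return current_irq_regs;
}

/* Model of the dispatch path: publish regs, run the handler, restore. */
irqreturn_t dispatch_irq(int irq, struct pt_regs *regs,
                         irq_handler_fn handler, void *data)
{
    struct pt_regs *old = set_irq_regs(regs);
    irqreturn_t ret = handler(irq, data);

    set_irq_regs(old);      /* restore, so nested interrupts behave */
    return ret;
}

/* A timer-like handler: one of the few which wants the registers. */
irqreturn_t sample_pc_handler(int irq, void *data)
{
    unsigned long *sampled_pc = data;
    struct pt_regs *regs = get_irq_regs();

    (void)irq;
    assert(regs != NULL);
    *sampled_pc = regs->pc;
    return IRQ_HANDLED;
}
```

Handlers which never look at the registers simply drop the third parameter and are otherwise unchanged, which is why the conversion is large but mostly mechanical.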

This patch has just now come along, meaning that, by normal standards, it is a bit late for the 2.6.19 party. So it would normally sit in -mm for this cycle, and be merged into 2.6.20. But, Andrew Morton says:

I think the change is good. But I don't want to maintain this whopper out-of-tree for two months! If we want to do this, we should just smash it in and grit our teeth

Nobody else seems to object to the change, though Linus did spare a moment to feel the pain of people maintaining drivers out of the mainline tree. The writing on the wall all points to a near-term inclusion, perhaps with a special defined symbol to help out-of-tree maintainers write code which works with both handler prototypes.

Meanwhile, the file_operations structure can be found at the core of just about any subsystem which does I/O. Char device drivers create file_operations structures directly, while most other parts of the system (filesystems, network protocols and drivers, block drivers) bury them in higher-level logic. Two of the members of this structure are:

    ssize_t (*aio_read) (struct kiocb *iocb, char __user *buf, 
                         size_t len, loff_t pos);
    ssize_t (*aio_write) (struct kiocb *iocb, const char __user *buf, 
                          size_t len, loff_t pos);

These methods implement asynchronous reads and writes - operations which may be completed sometime after the original call returns to user space. One longstanding shortcoming of the Linux asynchronous I/O implementation is its lack of vectored operations; each AIO call can only operate on a single buffer. The 2.6.19 kernel will fill in that gap, at the cost of changing the above two prototypes to:

    ssize_t (*aio_read) (struct kiocb *iocb, const struct iovec *iov, 
             unsigned long niov, loff_t pos);
    ssize_t (*aio_write) (struct kiocb *iocb, const struct iovec *iov, 
             unsigned long niov, loff_t pos);

The single buffer has been replaced by an array of iovec structures:

    struct iovec
    {
        void __user *iov_base;
        __kernel_size_t iov_len;
    };

Single-buffer calls are now wrapped in a single iovec structure and passed to the new, vectorized versions of the AIO operations. All code which provides aio_read() and aio_write() will need to be updated to the new API - and the possibility of being requested to perform vectored operations.
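What "the possibility of being requested to perform vectored operations" means in practice is a loop over the segments. The following user-space sketch shows the shape of such a loop, along with the wrapping of a single buffer in a one-element vector. It uses the user-space struct iovec from <sys/uio.h>; struct mem_source, vec_read(), and single_read() are invented for this example and do not correspond to any kernel interface.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>    /* user-space struct iovec */

/* Hypothetical data source: a block of memory standing in for a device. */
struct mem_source {
    const char *data;
    size_t size;
};

/*
 * Vectored read: fill each segment in turn from src, starting at pos.
 * Returns the total number of bytes copied; stops early at end of data.
 */
ssize_t vec_read(const struct mem_source *src, const struct iovec *iov,
                 unsigned long niov, off_t pos)
{
    ssize_t total = 0;
    unsigned long i;

    assert(src != NULL && pos >= 0);

    for (i = 0; i < niov; i++) {
        size_t off = (size_t)pos + (size_t)total;
        size_t left, n;

        if (off >= src->size)
            break;                      /* nothing left to read */
        left = src->size - off;
        n = iov[i].iov_len < left ? iov[i].iov_len : left;
        memcpy(iov[i].iov_base, src->data + off, n);
        total += (ssize_t)n;
        if (n < iov[i].iov_len)
            break;                      /* short segment: data exhausted */
    }
    return total;
}

/* A single-buffer call becomes a one-element vector. */
ssize_t single_read(const struct mem_source *src, void *buf,
                    size_t len, off_t pos)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };

    return vec_read(src, &iov, 1, pos);
}
```

A driver which only ever expects one buffer could, in principle, loop over the segments and handle each as it would have handled the single buffer before; whether that is good enough depends on whether the operation needs to be atomic across segments.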

The changes actually go beyond that, however, in that the readv() and writev() file_operations methods have been removed. The associated system calls are now, instead, implemented with calls to aio_read() and aio_write(). Converting older readv() and writev() methods is not particularly difficult, since there is no requirement that aio_read() and aio_write() must be asynchronous (in fact, in this case, they will be passed a "synchronous KIOCB" which indicates that the operation must be performed synchronously). In most cases, it is simply a matter of adopting the new prototype, then looking in iocb->ki_filp for the struct file pointer, should it be needed.

(See this article from last February for more background on this change.)


The final wireless extension?

"Wireless extensions" is an ioctl()-based API which allows user space to control parameters specific to wireless network interfaces - ESSID, encryption passwords, channels, etc. This API has long been maintained by Jean Tourrilhes; the last few kernel releases have had version 20 of this API. As of this writing, version 21 has been merged into the pre-2.6.19 mainline, but at least some of it may be on its way back out again.

The problem is that version 21 is a real API change, in that sufficiently old tools will no longer operate properly. In particular, the formatting of the ESSID passed into the kernel has changed, so configurations which associated with a given network under version 20 will not do so under version 21. There is a workaround (add a space to the ESSID string), but many users will not know that, and, in any case, will only discover the need after upgrading their kernel and finding that the network is no longer there.

Since this problem came to light, many kernel developers (including Linus) have made it clear that they see this sort of API breakage as unacceptable. So they want the ESSID change backed out. There are, of course, real reasons for that change - the way those strings are handled in the protocols has evolved over time. But the right solution is to add a new ioctl() which can handle the new string format; the older version would continue to be supported indefinitely. Done in this way, the format change would be acceptable.

That seems like a good solution, except for one little hitch. It seems that Jean has foreseen this problem for some time. To help minimize the pain, he has been shipping versions of the wireless tools which understand the version 21 API for about six months. A number of distributors have picked up - and shipped - these new tools; affected distributions include Slackware 11 and Mandriva 2007. If those tools see a wireless extensions version greater than 20, they expect to use the new ESSID string format; if that change is backed out, those tools will break.
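The version test which those tools perform might look something like the sketch below. The threshold of 20 comes from the description above; everything else is an assumption. In particular, this sketch assumes that the format change is in how the ESSID length is reported (counting the trailing NUL under the old format, not counting it under the new one), a detail which is not spelled out here. The function name is invented, and this is not code from the wireless tools.

```c
#include <assert.h>
#include <string.h>

/* From the text: tools switch formats when the version exceeds 20. */
#define WE_LAST_OLD_FORMAT_VERSION 20

/*
 * Length a tool would report to the kernel for the given ESSID.
 * ASSUMPTION: the old format counts the terminating NUL and the new
 * format does not; the real difference may involve more than this.
 */
size_t essid_reported_length(int we_version, const char *essid)
{
    size_t len;

    assert(essid != NULL);

    len = strlen(essid);
    if (we_version > WE_LAST_OLD_FORMAT_VERSION)
        return len;         /* new format */
    return len + 1;         /* old format */
}
```

The point of the sketch is that the decision is keyed entirely to the advertised version number, so a kernel which claims version 21 while expecting the old format gets the wrong answer from tools which are already deployed.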

So wireless extensions 21 is now guaranteed to break some systems whether the ESSID change is included or not. At this point, the only way to avoid breaking deployed systems is to keep the wireless extensions version at 20 indefinitely. The wireless extensions, it seems, may be extended no more.

If that is how things work out, there will be some short-term pain, since needed enhancements will not find their way into the API. The long-term plan, however, is to replace the wireless extensions anyway; to that end, a new, netlink-based API called nl80211 is under development. That API, however, is tightly tied to the Devicescape 802.11 stack, which has been taking rather longer than expected to reach a state where it can be considered for merging. So the Linux wireless API may be stuck for a little while.


Slides and photos from Kernel Netconf 2006

David Miller has posted slides and photos from the 2006 Linux kernel network developers' conference. If you are interested in hardcore details on where the Linux networking layer is going, there are plenty to be found on that page.


Patches and updates

Kernel trees

Andrew Morton 2.6.18-mm2
Andrew Morton 2.6.18-mm3
Willy Tarreau Linux 2.4.34-pre4

Development tools

Petr Baudis Cogito-0.18
Junio C Hamano GIT 1.4.2.2
Junio C Hamano GIT 1.4.2.3

Janitorial

Jeff Garzik schedule ftape removal

Networking

Stephen Hemminger let mortals use ethtool
Samir Bellabes Network Events Connector
Johannes Berg cfg80211 and nl80211

Miscellaneous

Kay Sievers udev 101 release
Netfilter Core Team Release of iptables-1.3.6
Stephen Hemminger iproute2-2.6.18-061002

Page editor: Jonathan Corbet


Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds