
Kernel development

Brief items

Kernel release status

The current development kernel is 2.5.70, which was released, at long last, on May 26. This massive patch includes the beginning of Alexander Viro's character device rework for a larger dev_t type (see below), some NFS fixes, sysfs support for network devices, an XFS update, some scheduler fixes, a change to the request_module() prototype, some framebuffer fixes, more annotations of user-space pointers and makefile support for Linus's kernel source analyzer, 48-bit IDE addressing support, a (hopefully) working IDE tagged command queueing implementation, the BIO "walking" and splitting APIs, more devfs cleanups (devfs_register() is gone), the USB "gadget" subsystem, a wireless networking update (and quite a bit of networking work in general), dynamic block I/O request allocation, a fair amount of SCSI cleanup work, a generic x86 subarchitecture, a number of TTY layer cleanups, a USB update, several architecture updates, and a vast number of other fixes. See the announcement from Linus for the details, or the long-format changelog for lots of really gory details.

As of this writing, Linus's BitKeeper repository contains a FAT filesystem rework (if you have been waiting to be able to create FAT partitions greater than 128GB, this patch is for you), a v850 subarchitecture merge, a RAID update, the removal of the long-deprecated callout TTY device (/dev/cua) support, and several other fixes and updates.

Andrew Morton's -mm tree is currently even more interesting than usual in that it contains a major rework of the ext3 filesystem and generic journaling code. ext3 now uses fine-grained locking - the big kernel lock is no longer used there. "These are major changes to a major filesystem. I would ask that interested parties now subject these patches to stresstesting and to performance testing. The performance gains on SMP will be significant."

For those who are curious about the source checking program that Linus has been working on, a preliminary version is now available via BitKeeper. "It's unfinished enough that I'm a bit embarrassed about some of it, but I've gotten the permission from Transmeta to make it open source".

The current stable kernel is 2.4.20, though 2.4.21 may be out by the time you read this. As of this writing, the fifth 2.4.21 release candidate is available with a small set of fixes. This release has an issue with pauses related to the block subsystem; a small patch exists (and is merged into 2.4.21-rc5-ac1) which fixes this problem.


Kernel development news

Release management issues

Is the 2.5 kernel ready to move to the next stage? Linus, in the 2.5.70 announcement, talked about his plans to start the pre-2.6 series of releases. That remark drew a complaint: with all that remains broken in 2.5, how could any plan to create a pre-2.6 release be taken seriously? Linus is unsympathetic, however:

Would I prefer to have everything fixed by 2.6.0 (or even the pre-2.6 kernels)? Sure, everybody would. But it's just a fact of life that we won't see people who care about the issues before that happens. In fact, judging by past performance, a lot of things won't get fixed before the actual vendors have made _releases_ that use 2.6.x ...

This issue comes up over and over again in free software development, of course. Truly getting the bugs fixed requires a very broad base of testers. But most of those testers will not show up until you present them with something billed as "stable" or close to it. Of course, there are dangers in presenting an "almost stable" release too soon; a kernel with too many problems could simply drive those testers away for a long time.

The decision on when to jump into the pre-2.6 series will be a hard one. Quite a few kernel developers seem to think that the time has not yet come. Linus may be ready to make his move sooner rather than later, however. (It is worth noting, incidentally, that the various bureaucratic obstacles to having Andrew Morton work with Linus on the 2.6 release, and eventually take it over, appear to have been overcome. That bodes well for the whole process.)

On the 2.4 front, the official 2.4.21 kernel may be out by the time you read this. No doubt many will be happy to see this long-delayed kernel; 2.4.20 was released on November 28 - a full six months ago. Even so, there are a few complaints, particularly about the omission of a new set of driver fixes. David Miller was one of a few who spoke out:

I really think 2.4.x development is becoming almost non-existent lately... If Conectiva needs to task Marcelo to so much work that he can only really put 1 or 2 days a week into 2.4.x, this needs be rethought at either one end (Conectiva finding a way to give him more 2.4.x time) or another (Marcelo splits up the work with someone else or we simply find another 2.4.x maintainer).

A few developers seconded this complaint, with one or two, perhaps somewhat prematurely, throwing their hats into the ring to be Marcelo's replacement. Marcelo has responded by saying that things will change - 2.4.22 will come out much more quickly. He has also offered to pass on the 2.4.x responsibility should the community think he is not up to the job. There have not been a whole lot of complaints about the kernels that Marcelo has released, however; the only problem is the frequency with which they are produced. Nobody really wants to see him hand the job off to somebody else. But there will be a lot of eyes on the 2.4.22 release process.


How should interrupts be balanced?

The advanced programmable interrupt controller (APIC) on modern SMP hardware can be set up to route different interrupts to different processors. When properly programmed, the APIC can help system performance by having each interrupt be handled by the processor which is best suited to the job. At the moment, however, there is not much agreement on how the kernel should be programming the APIC.

The 2.5 kernel contains (for the x86 architecture, at least) an in-kernel interrupt balancing routine. It runs as a separate kernel thread ("kirqd") which wakes up every so often and tries to arrange things so that each processor handles approximately the same interrupt rate. If that can't be done (if, for example, most interrupts come from a single source), interrupts are slowly rotated between the processors. This approach works reasonably well much of the time, but it can fail badly for certain loads.
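To make the "equal interrupt rate" goal concrete, here is a minimal user-space sketch of count-based balancing. It is not kirqd's actual algorithm: it assumes a simple greedy policy (busiest IRQ first, each one handed to the CPU with the lowest summed rate so far), and the irq_sample structure and balance_by_count() function are hypothetical names invented for illustration. The slow rotation that kirqd falls back on is not modeled.

```c
#include <stddef.h>

/* Hypothetical record: one interrupt source and the rate (interrupts per
   second) it generated during the last sampling interval. */
struct irq_sample {
    int irq;
    unsigned long rate;
    int cpu;            /* output: CPU this IRQ is assigned to */
};

/* Greedy count-based balancing (an assumed stand-in, not kirqd's code).
   samples[] must already be sorted by descending rate; each IRQ goes to
   the CPU whose accumulated interrupt rate is currently the lowest.
   cpu_load[] receives the resulting per-CPU interrupt rate. */
void balance_by_count(struct irq_sample *samples, size_t n,
                      unsigned long *cpu_load, int ncpus)
{
    for (int c = 0; c < ncpus; c++)
        cpu_load[c] = 0;

    for (size_t i = 0; i < n; i++) {
        int best = 0;
        for (int c = 1; c < ncpus; c++)
            if (cpu_load[c] < cpu_load[best])
                best = c;
        samples[i].cpu = best;
        cpu_load[best] += samples[i].rate;
    }
}
```

With rates of 900, 500 and 400 interrupts per second on a two-CPU system, the sketch ends with both processors at 900 per second. Note that it only ever looks at interrupt counts, which is exactly the weakness described next.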

In particular, the interrupt balancer has trouble with networking loads. The networking code goes out of its way to avoid hardware interrupts - when thousands of packets per second are passing through the system, you don't want the network interface bugging you for every one of them. So a great deal of kernel work may result from a single network interface interrupt. To a simple interrupt balancer, which tries to equalize interrupt counts across a system, a processor handling a heavy networking load may look relatively idle. That processor may find that it gets to deal with a SCSI interface as well, even though it is already overloaded. Even worse, a router could end up with multiple interfaces being handled by a single processor, which still looks lightly loaded.
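The mismatch can be shown with two tiny selection functions. This is a hypothetical sketch: cpu_stats, its work_units field (assumed here to be something like packets processed per interval), and both functions are illustrative names, not kernel interfaces. A CPU whose network interface raises few interrupts but processes thousands of packets per interrupt looks idlest by count and busiest by work.

```c
#include <stddef.h>

/* Hypothetical per-CPU statistics for one sampling interval. */
struct cpu_stats {
    unsigned long irq_count;   /* hardware interrupts taken */
    unsigned long work_units;  /* assumed work measure, e.g. packets processed */
};

/* The CPU a count-only balancer would consider least loaded. */
int idlest_by_count(const struct cpu_stats *cpus, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (cpus[i].irq_count < cpus[best].irq_count)
            best = i;
    return (int)best;
}

/* The same question answered using the work actually performed. */
int idlest_by_work(const struct cpu_stats *cpus, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++)
        if (cpus[i].work_units < cpus[best].work_units)
            best = i;
    return (int)best;
}
```

For example, a CPU taking 50 interrupts but handling 40,000 packets is chosen by idlest_by_count() over a CPU taking 2,000 interrupts for 2,000 units of work; idlest_by_work() picks the other one.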

One can certainly imagine ways to tweak the in-kernel interrupt balancer to make it deal properly with the networking case. But many developers believe that IRQ balancing belongs in user space. A user-space solution can contain whatever complexity is needed to make the right sort of decisions; it also, of course, allows site administrators to set their own policies.
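A user-space balancer needs a way to push its decisions into the kernel; on Linux that is done by writing a hexadecimal CPU bitmask to /proc/irq/<N>/smp_affinity. The sketch below assumes that interface and a machine with no more CPUs than bits in an unsigned long; format_affinity() and apply_affinity() are hypothetical helper names, and actually writing the file requires root.

```c
#include <stdio.h>
#include <string.h>

/* Build the /proc path and the hex CPU mask for one IRQ.
   Returns 0 on success, -1 if either buffer is too small.
   Assumes all CPUs fit into a single unsigned long mask. */
int format_affinity(unsigned int irq, unsigned long cpumask,
                    char *path, size_t plen, char *val, size_t vlen)
{
    int n = snprintf(path, plen, "/proc/irq/%u/smp_affinity", irq);
    if (n < 0 || (size_t)n >= plen)
        return -1;
    n = snprintf(val, vlen, "%lx\n", cpumask);
    if (n < 0 || (size_t)n >= vlen)
        return -1;
    return 0;
}

/* Write the mask to /proc (needs root). Returns 0 on success, -1 on error. */
int apply_affinity(unsigned int irq, unsigned long cpumask)
{
    char path[64], val[32];
    if (format_affinity(irq, cpumask, path, sizeof path, val, sizeof val) < 0)
        return -1;

    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    size_t len = strlen(val);
    int ok = fwrite(val, 1, len, f) == len;
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

Keeping the policy in a daemon means the mechanism stays this small; everything interesting happens in deciding which mask to write.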

A user-space interrupt balancing daemon exists now; it can be downloaded from Arjan van de Ven's web site. The current implementation is relatively simple, depending mostly on interrupt counts like the in-kernel balancer. It does, however, take pains to distribute interrupts from each type of device across processors. That technique will help network routers, since it will at least keep different interfaces on different processors. But the real point is that this policy can be enhanced and customized as needed.
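The "spread each device type across processors" policy can be sketched as a per-class round-robin. This is an assumed illustration, not the daemon's code: the device classes, irq_dev structure, and spread_by_class() are hypothetical, and each class's starting CPU is staggered so that different classes do not all begin on CPU 0.

```c
#include <stddef.h>

/* Hypothetical device classes a balancer might distinguish. */
enum dev_kind { KIND_NET, KIND_STORAGE, KIND_OTHER, NR_KINDS };

struct irq_dev {
    int irq;
    enum dev_kind kind;
    int cpu;            /* output: CPU chosen for this IRQ */
};

/* Assign IRQs round-robin within each device class, so that (for
   example) two network interfaces land on different CPUs. Class k
   starts its rotation at CPU k % ncpus. */
void spread_by_class(struct irq_dev *devs, size_t n, int ncpus)
{
    int next[NR_KINDS];
    for (int k = 0; k < NR_KINDS; k++)
        next[k] = k % ncpus;

    for (size_t i = 0; i < n; i++) {
        enum dev_kind k = devs[i].kind;
        devs[i].cpu = next[k];
        next[k] = (next[k] + 1) % ncpus;
    }
}
```

On a two-CPU router with network interfaces on IRQs 16, 17 and 19 and a disk controller on IRQ 18, the first two interfaces end up on different processors regardless of their interrupt counts.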

There is some disagreement about moving interrupt balancing to user space. According to some, only the kernel has the knowledge and the ability to react quickly enough to create optimal interrupt routings. But chances are that user space will be the eventual home for this task. The real question may be whether the in-kernel interrupt balancer is removed before 2.6.0 comes out.


Another new character device infrastructure

Alexander Viro is definitely back, and he has made good on his promises to rework the character device infrastructure to pave the way for the dev_t transition. A set of patches merged into 2.5.70 shows where things are headed.

Character devices are now represented by their own structure:

	struct cdev {
		struct kobject kobj;
		struct module *owner;
		struct file_operations *ops;
		struct list_head list;
	};

It is expected that a cdev structure will be embedded within larger, subsystem-specific structures. An infrastructure has been set up which lets drivers register character devices with a CIDR-like scheme - any range of device numbers, starting with an arbitrary major and minor number, can be allocated, with more specific allocations overriding wider ranges. It is, in other words, the same scheme that was implemented some time ago for block devices (and which is described in this Driver Porting Series article).
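Embedding works through the usual container_of() idiom: code that is handed a pointer to the embedded cdev computes the address of the enclosing structure. The user-space sketch below uses a simplified stand-in for struct cdev and a simplified container_of() (the kernel's version also type-checks via typeof); my_serial_port and port_from_cdev() are hypothetical names.

```c
#include <stddef.h>

/* Simplified user-space stand-in for struct cdev; the real structure
   holds a kobject, owner, file_operations pointer and list head. */
struct cdev {
    const void *ops;
    int registered;
};

/* Same idea as the kernel's container_of(), minus the type checking. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical driver-private structure with an embedded cdev. */
struct my_serial_port {
    int base_addr;
    struct cdev cdev;
    int irq;
};

/* Recover the driver's structure from the embedded cdev pointer. */
struct my_serial_port *port_from_cdev(struct cdev *cd)
{
    return container_of(cd, struct my_serial_port, cdev);
}
```

Because the cdev lives inside the driver's own structure, no separate allocation or back-pointer is needed.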

In this scheme, the classic register_chrdev() function is unchanged; it allocates a cdev structure and registers it with minor numbers 0-255. So unmodified char drivers will continue to work, and will not be presented with larger device numbers than before. It is expected that, over time, drivers will move away from the register_chrdev() interface and toward working with cdev structures directly.
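The "most specific range wins" rule is easy to model. In this sketch a registration like the one register_chrdev() makes (minors 0-255 of a major) coexists with a narrower override, and a lookup picks the smallest range containing the device number. The 20-bit minor split, MKDEVNUM(), chr_region, and lookup_region() are all assumptions for illustration; the linear scan is chosen for clarity and is not the kernel's data structure.

```c
#include <stddef.h>

/* Hypothetical device-number encoding: 20 minor bits below the major. */
#define MINOR_BITS 20
#define MKDEVNUM(ma, mi) \
    (((unsigned long)(ma) << MINOR_BITS) | (unsigned long)(mi))

/* One registered range of consecutive device numbers. */
struct chr_region {
    unsigned long first;   /* first device number in the range */
    unsigned long count;   /* number of device numbers covered */
    const char *name;
};

/* Return the most specific (smallest) region containing dev, or NULL
   if no registered region covers it. */
const struct chr_region *lookup_region(const struct chr_region *tab,
                                       size_t n, unsigned long dev)
{
    const struct chr_region *best = NULL;
    for (size_t i = 0; i < n; i++) {
        const struct chr_region *r = &tab[i];
        if (dev >= r->first && dev - r->first < r->count &&
            (best == NULL || r->count < best->count))
            best = r;
    }
    return best;
}
```

With a 256-minor registration on major 4 and a four-minor override starting at minor 64, a lookup of (4, 65) finds the override, (4, 10) falls back to the wide range, and (4, 256) matches nothing.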

We'll put out a detailed description of the new interface (as part of the Driver Porting series) once it has had a chance to stabilize a bit.


strlcpy()

Years of buffer overflow problems have made it clear that the classic C string functions, strcpy() and friends, are unsafe. Functions like strncpy(), which take a length argument, have been presented as the safe alternatives. But strncpy() has always been poorly suited to the task; it wastes time by zero-filling the unused remainder of the destination buffer, and, if the string to be copied must be truncated, the result is not null-terminated. A non-terminated string can lead to overflows and bugs in its own right. So Linus finally got fed up and put together a new copy_string() function which does what most strncpy() users really wanted in the first place.
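Both strncpy() problems can be observed directly with standard C. The two hypothetical helpers below copy into fixed-size local buffers: one reports whether the result contains a terminator at all, the other counts the bytes that strncpy() filled with zeros.

```c
#include <stddef.h>
#include <string.h>

/* Copy src into an 8-byte buffer with strncpy() and report whether the
   result is a terminated C string (1) or not (0). */
int strncpy_terminates(const char *src)
{
    char buf[8];
    strncpy(buf, src, sizeof buf);   /* always writes all 8 bytes */
    return memchr(buf, '\0', sizeof buf) != NULL;
}

/* Copy src into a 64-byte buffer and count the '\0' bytes strncpy()
   wrote, i.e. the terminator plus the zero padding after it. */
size_t strncpy_zero_fill(const char *src)
{
    char buf[64];
    size_t zeros = 0;
    strncpy(buf, src, sizeof buf);
    for (size_t i = 0; i < sizeof buf; i++)
        if (buf[i] == '\0')
            zeros++;
    return zeros;
}
```

Copying "much too long" leaves buf with no terminator at all, while copying "ab" into the 64-byte buffer writes 62 zero bytes that nobody asked for.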

As is often the case with this sort of security-related improvement, OpenBSD got there first. In fact, back in 1996, the OpenBSD team came up with a new string API which avoids the problems of both strcpy() and strncpy(). The resulting functions, with names like strlcpy(), have been spreading beyond OpenBSD. The basic function is simple:

    size_t strlcpy(char *dest, const char *src, size_t size);

The source string is copied to the destination (truncated if necessary to fit in size bytes) and properly terminated; the return value is the length of the source string. If that length is greater than or equal to size, the caller knows that the string has been truncated.
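A minimal sketch of those semantics in portable C follows. It is named sketch_strlcpy() to avoid clashing with C libraries that now ship their own strlcpy(); it is an illustration of the documented behavior, not the kernel's or OpenBSD's implementation.

```c
#include <stddef.h>
#include <string.h>

/* strlcpy()-style copy: copies at most size - 1 bytes, always
   terminates dest when size > 0, and returns strlen(src) so the
   caller can detect truncation with (ret >= size). */
size_t sketch_strlcpy(char *dest, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = len < size - 1 ? len : size - 1;
        memcpy(dest, src, n);
        dest[n] = '\0';
    }
    return len;
}
```

Typical use is a single check after the copy: `if (sketch_strlcpy(buf, name, sizeof buf) >= sizeof buf)` means the name did not fit. Unlike strncpy(), no bytes past the terminator are touched.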

Linus agreed that following OpenBSD's lead was the right way forward, and strlcpy() is in his BitKeeper repository, waiting for 2.5.71. There has also been a flurry of activity to convert kernel code over to the new function. By the time 2.6.0 comes out, strncpy() may no longer have a place in the Linux kernel.


Patches and updates

Kernel trees

Linus Torvalds Linux 2.5.70
Andrew Morton 2.5.70-mm1
Stephen Hemminger 2.5.70-osdl1
Andrew Morton 2.5.69-mm8
Andrew Morton 2.5.69-mm9 "2.5.69-mm9 is not for the timid."
Marcelo Tosatti Linux 2.4.21-rc5
Marcelo Tosatti Linux 2.4.21-rc4
Andrea Arcangeli 2.4.21rc4aa1
Marcelo Tosatti Linux 2.4.21-rc3
Alan Cox Linux 2.4.21rc2-ac3
Con Kolivas 2.4.20-ck7

Architecture-specific

Core kernel code

Development tools

Device drivers

Filesystems and block I/O

Memory management

William Lee Irwin III pgcl-2.5.70-1

Networking

Jeff Garzik irda merges

Miscellaneous

Page editor: Jonathan Corbet


Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds