Brief items
The current development kernel is 2.5.70, which was released, at
long last, on May 26. This massive patch includes the beginning of
Alexander Viro's character device rework for a larger dev_t type
(see below), some NFS fixes, sysfs support for network devices, an XFS
update, some scheduler fixes, a change to the request_module()
prototype, some framebuffer fixes, more annotations of user-space pointers
and makefile support for Linus's kernel source analyzer,
48-bit IDE addressing support, a (hopefully) working IDE tagged command
queueing implementation, the BIO "walking" and splitting APIs, more devfs
cleanups (devfs_register() is gone), the USB "gadget" subsystem, a
wireless networking update (and quite a bit of networking work in general),
dynamic block I/O request allocation, a fair amount of SCSI cleanup work, a
generic x86 subarchitecture, a number of TTY layer cleanups, a USB update,
several architecture updates, and a vast number of other fixes. See
the announcement from Linus for the details, or the
long-format changelog for lots of really gory details.
As of this writing, Linus's BitKeeper repository contains a FAT filesystem
rework (if you have been waiting to be able to create FAT partitions
greater than 128GB, this patch is for you), a v850 subarchitecture merge, a
RAID update, the removal of the long-deprecated callout TTY device
(/dev/cua) support, and several other fixes and updates.
Andrew Morton's -mm tree is currently even
more interesting than usual in that it contains a major rework of the ext3
filesystem and generic journaling code. ext3 now uses fine-grained locking
- the big kernel lock is no longer used there. "These are major
changes to a major filesystem. I would ask that interested parties now
subject these patches to stresstesting and to performance testing. The
performance gains on SMP will be significant."
For those who are curious about the source checking program that Linus has
been working on, a preliminary version is
now available via BitKeeper. "It's unfinished enough that
I'm a bit embarrassed about some of it, but I've gotten the permission
from Transmeta to make it open source."
The current stable kernel is 2.4.20, though 2.4.21 may be out by the
time you read this. As of this writing, the
fifth 2.4.21 release candidate is available with a small set of fixes.
This release has an issue with pauses related to the block subsystem; a
small patch exists (and is merged into 2.4.21-rc5-ac1) which fixes this problem.
Kernel development news
Is the 2.5 kernel ready to move to the next stage? Linus, in the 2.5.70
announcement, talked about his plans to start the pre-2.6 series of
releases. That remark drew a complaint: with all that remains broken in
2.5, how could any plan to create a pre-2.6 release be taken seriously?
Linus is unsympathetic, however:
Would I prefer to have everything fixed by 2.6.0 (or even the
pre-2.6 kernels)? Sure, everybody would. But it's just a fact of
life that we won't see people who care about the issues before that
happens. In fact, judging by past performance, a lot of things
won't get fixed before the actual vendors have made _releases_ that
use 2.6.x ...
This issue comes up over and over again in free software development, of
course. Truly getting the bugs fixed requires a very broad base of
testers. But most of those testers will not show up until you present them
with something billed as "stable" or close to it. Of course, there are
dangers in presenting an "almost stable" release too soon; a kernel with
too many problems could simply drive those testers away for a long time.
The decision on when to jump into the pre-2.6 series will be a hard one.
Quite a few kernel developers seem to think that the time has not yet
come. Linus may be ready to make his move sooner rather than later,
however. (It is worth noting, incidentally, that the various bureaucratic
obstacles to having Andrew Morton work with Linus on the 2.6 release, and
eventually take it over, appear to have been overcome. That bodes well for
the whole process.)
On the 2.4 front, the official 2.4.21 kernel may be out by the time you
read this. No doubt many will be happy to see this long-delayed kernel;
2.4.20 was released on November 28 - a
full six months ago. Even so, there are a few complaints, particularly
about the omission of a new set of driver fixes. David Miller was one of a
few who spoke out:
I really think 2.4.x development is becoming almost non-existent
lately... If Conectiva needs to task Marcelo to so much work that
he can only really put 1 or 2 days a week into 2.4.x, this needs be
rethought at either one end (Conectiva finding a way to give him
more 2.4.x time) or another (Marcelo splits up the work with
someone else or we simply find another 2.4.x maintainer).
A few developers seconded this complaint, with one or two, perhaps
somewhat prematurely, throwing their hats into the ring to be Marcelo's
replacement. Marcelo has responded by saying that things will change -
2.4.22 will come out much more quickly. He has also offered to pass on the
2.4.x responsibility should the community think he is not up to the job.
There have not been a whole lot of complaints about the kernels that
Marcelo has released, however; the only problem is the frequency with which
they are produced. Nobody really wants to see him hand the job off to
somebody else. But there will be a lot of eyes on the 2.4.22 release
process.
The programmable interrupt controller on modern (SMP) hardware can be set
up to route different interrupts to different processors. When properly
programmed, the APIC can help system performance by having each interrupt
be handled by the processor which is best suited to the job. At the
moment, however, there is not much agreement on how the kernel should be
programming the APIC.
The 2.5 kernel contains (for the x86 architecture, at least) an in-kernel
interrupt balancing routine. It runs as a separate kernel thread ("kirqd")
which wakes up every so often and tries to arrange things so that each
processor handles approximately the same interrupt rate. If that
can't be done (if, for example, most interrupts come from a single source),
interrupts are slowly rotated between the processors. This approach works
reasonably well much of the time, but it can fail badly for certain loads.
In particular, the interrupt balancer has trouble with networking loads.
The networking code goes out of its way to avoid hardware interrupts - when
thousands of packets per second are passing through the system, you don't
want the network interface bugging you for every one of them. So a great
deal of kernel work may result from a single network interface interrupt.
To a simple interrupt balancer, which tries to equalize interrupt counts
across a system, a processor handling a heavy networking load may look
relatively idle. That processor may find that it gets to deal with a SCSI
interface as well, even though it is already overloaded. Even worse, a
router could end up with multiple interfaces being handled by a single
processor, which still looks lightly loaded.
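The blind spot described above can be shown with a small sketch. It is not the kernel's balancer; the structure and function names are hypothetical, and the only point is that a selection based purely on interrupt counts never looks at the work each interrupt generates.

```c
#include <stddef.h>

/* Hypothetical per-CPU bookkeeping; not the kernel's actual structures. */
struct cpu_load {
    unsigned long irq_count;   /* hardware interrupts seen in the last interval */
    unsigned long work_units;  /* real work done (e.g. packets handled);
                                  a count-only balancer never reads this */
};

/* Return the index of the CPU with the fewest interrupts, which is
 * where a count-only balancer would send the next interrupt source. */
size_t pick_cpu_by_irq_count(const struct cpu_load *cpus, size_t ncpus)
{
    size_t best = 0;

    for (size_t i = 1; i < ncpus; i++)
        if (cpus[i].irq_count < cpus[best].irq_count)
            best = i;
    return best;
}
```

Given one processor taking 5000 disk interrupts and another taking 200 network interrupts that each stand for hundreds of packets, this routine picks the second processor as the "idle" one, even though it is doing far more work.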
One can certainly imagine ways to tweak the in-kernel interrupt balancer to
make it deal properly with the networking case. But many developers
believe that IRQ balancing belongs in user space. A user-space solution
can contain whatever complexity is needed to make the right sort of
decisions; it also, of course, allows site administrators to set their own
policies.
A user-space interrupt balancing daemon exists now; it can be downloaded
from Arjan van de
Ven's web site. The current implementation is relatively simple,
depending mostly on interrupt counts like the in-kernel balancer. It does,
however, take pains to distribute interrupts from each type of device
across processors. That technique will help network routers, since it will
at least keep different interfaces on different processors. But the real
point is that this policy can be enhanced and customized as needed.
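For a sense of what a user-space policy has to do, here is a minimal sketch. It is not taken from the daemon mentioned above. It assumes the per-IRQ smp_affinity file under /proc/irq, which accepts a hexadecimal CPU bitmask; the helper names and the round-robin rule are illustrative only.

```c
#include <stdio.h>

/* Build the path of the affinity file for an IRQ. */
int irq_affinity_path(unsigned int irq, char *buf, size_t len)
{
    return snprintf(buf, len, "/proc/irq/%u/smp_affinity", irq);
}

/* Format a hex bitmask selecting a single CPU (bit n set for CPU n).
 * The caller must keep cpu below the number of bits in unsigned long. */
int single_cpu_mask(unsigned int cpu, char *buf, size_t len)
{
    return snprintf(buf, len, "%lx", 1UL << cpu);
}

/* Illustrative policy: the k-th device of a given class (the k-th
 * network interface, say) goes to CPU k mod ncpus. */
unsigned int cpu_for_device(unsigned int index_in_class, unsigned int ncpus)
{
    return index_in_class % ncpus;
}

/* Bind an IRQ to one CPU; returns 0 on success, -1 on failure.
 * Needs root privileges on a real system. */
int set_irq_affinity(unsigned int irq, unsigned int cpu)
{
    char path[64], mask[32];
    FILE *f;

    irq_affinity_path(irq, path, sizeof(path));
    single_cpu_mask(cpu, mask, sizeof(mask));
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(mask, f) == EOF) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

Because the policy lives in cpu_for_device(), a site administrator could replace it with anything from a static table to a load-aware heuristic without touching the kernel.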
There is some disagreement about moving interrupt balancing to user space.
According to some, only the kernel has the knowledge and the ability to
react quickly enough to create optimal interrupt routings. Chances are,
though, that user space will be the eventual home for this task. The real
question may be whether the in-kernel interrupt balancer is removed before
2.6.0 comes out.
Alexander Viro is definitely back, and he has made good on his promises to
rework the character device infrastructure to pave the way for the
dev_t transition. A set of patches merged into 2.5.70 shows where
things are headed.
Character devices are now represented by their own structure:
    struct cdev {
        struct kobject kobj;
        struct module *owner;
        struct file_operations *ops;
        struct list_head list;
    };
It is expected that a cdev structure will be embedded within
larger, subsystem-specific structures. An infrastructure has been set up
which lets drivers register character devices with a CIDR-like scheme - any
range of device numbers, starting with an arbitrary major and minor number,
can be allocated, with more specific allocations overriding wider ranges.
It is, in other words, the same scheme that was implemented some time ago for
block devices (and which is described in this Driver Porting Series
article).
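The "more specific allocations override wider ranges" rule can be illustrated with a user-space sketch. This is not the kernel's implementation: the dev_range structure and lookup_range() are hypothetical, and for simplicity each range is assumed to stay within a single major number.

```c
#include <stddef.h>

/* Hypothetical range record; the kernel's real bookkeeping differs. */
struct dev_range {
    unsigned int major;
    unsigned int first_minor;
    unsigned int count;        /* number of minors covered */
    const char *owner;         /* label for the registering driver */
};

/* Find the narrowest registered range containing (major, minor).
 * Returns NULL if no range matches. */
const struct dev_range *lookup_range(const struct dev_range *tab, size_t n,
                                     unsigned int major, unsigned int minor)
{
    const struct dev_range *best = NULL;

    for (size_t i = 0; i < n; i++) {
        const struct dev_range *r = &tab[i];

        if (r->major != major || minor < r->first_minor ||
            minor - r->first_minor >= r->count)
            continue;          /* this range does not cover the device */
        if (!best || r->count < best->count)
            best = r;          /* prefer the more specific allocation */
    }
    return best;
}
```

With one driver holding all 256 minors of a major and another holding four minors in the middle, a lookup inside the small window resolves to the second driver, and everything else falls through to the first.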
In this scheme, the classic register_chrdev() function is
unchanged; it allocates a cdev structure and registers it with
minor numbers 0-255. So unmodified char drivers will continue to work -
and will not be presented with larger device numbers than before. It is
expected that, over time, drivers will move away from the
register_chrdev() interface and toward working with cdev
structures directly.
We'll put out a detailed description of the new interface (as part of the
Driver Porting series) once it has
had a chance to stabilize a bit.
Years of buffer overflow problems have made it clear that the classic C
string functions - strcpy() and friends - are unsafe. Functions like
strncpy(), which take a length argument, have been presented as the safe
alternatives. But strncpy() has always been poorly suited to the task; it
wastes time by zero-filling the remainder of the destination buffer, and,
if the string to be copied must be truncated, the result is no longer
NUL-terminated. A non-terminated string can lead to overflows and bugs in
its own right. So Linus finally got fed up and put together a new
copy_string() function which does what most strncpy() users really wanted
in the first place.
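Both strncpy() complaints are easy to demonstrate in user space. The helper below is a hypothetical test aid, not part of any library; it performs the copy with the standard strncpy() and then reports whether a terminating NUL ended up anywhere in the destination.

```c
#include <string.h>

/* Copy src into dest with strncpy() and report whether the result is
 * terminated: returns 1 if a NUL byte lies within dest[0..size-1],
 * 0 otherwise. */
int strncpy_terminates(char *dest, const char *src, size_t size)
{
    strncpy(dest, src, size);
    return memchr(dest, '\0', size) != NULL;
}
```

Copying "kernel" into a four-byte buffer leaves the bytes 'k', 'e', 'r', 'n' with no terminator, so the helper returns 0. Copying "ab" into an eight-byte buffer returns 1, but only after strncpy() has written six NUL bytes it did not strictly need to write.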
As is often the case with this sort of security-related improvement, OpenBSD got there
first. In fact, back in 1996, the OpenBSD team came up with a new
string API which avoids the problems of both strcpy() and
strncpy(). The resulting functions, with names like
strlcpy(), have been spreading beyond OpenBSD. The basic function
is simple:
size_t strlcpy(char *dest, const char *src, size_t size);
The source string is copied to the destination and properly terminated; the
return value is the length of the source. If that length is greater than
or equal to the size of the destination buffer, the caller knows that the
string has been truncated.
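The semantics are simple enough to sketch in a few lines. What follows is not the OpenBSD source or the version in Linus's tree; it is a minimal illustration under the name my_strlcpy(), chosen to avoid clashing with any strlcpy() the C library may already provide.

```c
#include <string.h>

/* Copy src into dest, writing at most size bytes including the
 * terminating NUL. Returns strlen(src), so a return value >= size
 * tells the caller that the copy was truncated. */
size_t my_strlcpy(char *dest, const char *src, size_t size)
{
    size_t len = strlen(src);

    if (size > 0) {
        size_t n = (len >= size) ? size - 1 : len;

        memcpy(dest, src, n);
        dest[n] = '\0';        /* always terminated when size > 0 */
    }
    return len;
}
```

A caller checks for truncation with a single comparison: if my_strlcpy(buf, name, sizeof(buf)) returns a value of sizeof(buf) or more, the name did not fit. Unlike strncpy(), nothing beyond the terminator is written, so no time is spent zero-filling.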
Linus agreed that following OpenBSD's lead was the right way forward, and
strlcpy() is in his BitKeeper repository, waiting for 2.5.71.
There has also been a flurry of activity to convert kernel code over to the
new function. By the time 2.6.0 comes out, strncpy() may no
longer have a place in the Linux kernel.
Patches and updates
Kernel trees
- Andrew Morton: 2.5.69-mm9. "2.5.69-mm9 is not for the timid."
(May 26, 2003)
Page editor: Jonathan Corbet