Kernel release status
The current development kernel is 2.6.0-test4, which was
released by Linus on August 22. This
large patch includes several hundred changesets, including numerous
networking fixes, a new
free_netdev() method for networking
drivers (see below), a new
cpumask_t type for systems with more
processors than bits in a long integer, a
CONFIG_BROKEN option to
control access to drivers known to be broken, a
magic, fast new strncpy()
implementation, the addition of wireless statistics to sysfs, Twofish
and Serpent support for IPSec, a bunch of power management code, new sysfs
attributes to control scanning of SCSI devices, a number of IDE patches, a
new sysfs "attribute group" mechanism which enables the addition of
attributes in a safer way and with less boilerplate code, an ALSA update,
and a mind-numbing array of other fixes and updates. See
the long-format changelog for the details.
As of this writing, Linus's BitKeeper tree contains only a handful of
fixes. Linus is on vacation, so patches are not being merged for the
time being.
The current stable kernel is 2.4.22, released by Marcelo on August 25. Marcelo
is not resting, however; he has already put out 2.4.23-pre1, which includes a merge of the IP
virtual server code, an LVM update, various driver updates, a possible
first step toward the eventual inclusion of XFS, and a number of fixes.
Kernel development news
dev_t expansion status
The expansion of the
dev_t type to 64 bits has been stalled for a
few months now. Most of the work, it seems, has been done, but the patches
have yet to find their way into the mainline kernel. Among other things,
the dev_t expansion has been waiting on another set of
patches from the elusive Alexander Viro. Mr. Viro still surfaces only
rarely on the mailing lists, but it seems he has been busy; a set of large
dev_t patches has turned up in 2.6.0-test4-mm2.
Many of the patches are essentially cleanups, such as the removal of the
remaining uses of the kdev_t type in code that can get by with something
else. After all, if a piece of code does not use device numbers at all, it
should not run into trouble if the size of those numbers changes. Others
begin to address more problematic code. For example, the JFFS filesystem
incorporates device numbers directly into its on-media data structures, so
a change in the device number size would make older filesystems
unreadable. In this case, for now, the (16-bit) size of this field has
been made explicit.
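The reason for pinning the field width can be sketched in C. In this minimal userspace example, a hypothetical on-media record (struct media_inode, not JFFS's real layout) stores a device number in an explicit 16-bit field using the traditional 8-bit major / 8-bit minor packing, while the in-memory number is assumed to be a wider 32-bit value with a 12-bit major and a 20-bit minor. Numbers that do not fit are rejected rather than silently truncated.

```c
#include <stdint.h>

/* Assumed in-memory layout for this sketch: 12-bit major, 20-bit minor. */
typedef uint32_t wide_dev_t;
#define WIDE_MINORBITS 20
#define WIDE_MKDEV(ma, mi) (((wide_dev_t)(ma) << WIDE_MINORBITS) | (wide_dev_t)(mi))
#define WIDE_MAJOR(d) ((unsigned)((d) >> WIDE_MINORBITS))
#define WIDE_MINOR(d) ((unsigned)((d) & ((1u << WIDE_MINORBITS) - 1)))

/* Hypothetical on-media inode record. The device field is pinned at
 * 16 bits, so its on-disk size no longer follows the in-memory type. */
struct media_inode {
    uint16_t rdev; /* major in the high byte, minor in the low byte */
};

/* Pack a device number into the 16-bit on-media field.
 * Returns 0 on success, or -1 if the major or minor exceeds 8 bits. */
int encode_media_rdev(wide_dev_t dev, uint16_t *out)
{
    unsigned ma = WIDE_MAJOR(dev);
    unsigned mi = WIDE_MINOR(dev);

    if (ma > 0xff || mi > 0xff)
        return -1;
    *out = (uint16_t)((ma << 8) | mi);
    return 0;
}

/* Expand the 16-bit on-media value back into the in-memory form. */
wide_dev_t decode_media_rdev(uint16_t raw)
{
    return WIDE_MKDEV(raw >> 8, raw & 0xff);
}
```

With this layout, device 8:1 is stored as 0x0801 and decodes back to 8:1, while 8:300 cannot be encoded at all; that is exactly the kind of number a wider dev_t makes possible.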
Some of the patches take care of some (seemingly) unrelated block device
layer cleanups. A few things, it seems, didn't work quite as well as
expected once Al went back and took another serious look at the code.
Then, there is a simple addition to <linux/fs.h>:
static inline unsigned iminor(struct inode *inode)
{
	return minor(inode->i_rdev);
}
The patch that puts this little function to use is the largest in the
series: it replaces direct references to inode->i_rdev with iminor() calls
in a vast number of drivers, and in a few filesystems as well. The purpose,
of course, is to allow
access to the minor number of the device behind an inode without requiring
any knowledge of how that number is actually stored within the inode. Not
surprisingly, there is also an imajor() helper function.
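A minimal userspace sketch of the accessor idea follows. The struct sketch_inode type, the helper unit_for_inode(), and the 12-bit major / 20-bit minor layout are all hypothetical; the kernel's struct inode and its minor() macro look different. What matters is that the caller only ever goes through iminor(), so changing how i_rdev is stored would touch the two accessors and nothing else.

```c
#include <stdint.h>

/* Assumed device-number layout for this sketch: 12-bit major, 20-bit minor. */
typedef uint32_t sketch_dev_t;
#define SKETCH_MINORBITS 20
#define SKETCH_MKDEV(ma, mi) \
    (((sketch_dev_t)(ma) << SKETCH_MINORBITS) | (sketch_dev_t)(mi))

/* Hypothetical, stripped-down inode: only the field the accessors need. */
struct sketch_inode {
    sketch_dev_t i_rdev;
};

/* Accessors in the style of iminor()/imajor(): callers never decode
 * i_rdev themselves, so only these two functions know how it is stored. */
static inline unsigned iminor(const struct sketch_inode *inode)
{
    return (unsigned)(inode->i_rdev & ((1u << SKETCH_MINORBITS) - 1));
}

static inline unsigned imajor(const struct sketch_inode *inode)
{
    return (unsigned)(inode->i_rdev >> SKETCH_MINORBITS);
}

/* Hypothetical driver helper: map the minor number to a unit index,
 * returning -1 if the driver has no such unit. */
int unit_for_inode(const struct sketch_inode *inode, unsigned nr_units)
{
    unsigned unit = iminor(inode);
    return unit < nr_units ? (int)unit : -1;
}
```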
Al mentions another series of patches which have not yet made an appearance.
They will include a change to the inode structure, turning the
i_rdev field into a dev_t type (from kdev_t).
At that point, the addition of all those iminor() and
imajor() calls will make sense; code using those calls will be
unaffected by the inode structure change. There will also be
patches to ensure that the conversion of device numbers between the
internal representation and that used on-disk by filesystems is done
properly.
So the expanded dev_t project is moving forward once again. Since this
is an important feature to have in 2.6, that is a good thing. There is,
however, a large set of fairly invasive patches coming, and it may bring a
surprise or two when it hits the 2.6.0-test mainline.
(The actual patches can be seen in the 2.6.0-test4-mm2 patch, or separately
on kernel.org; a good place to start is Al's overview of the patch series.)
The ongoing interactive scheduling effort
The interactive scheduling response of the 2.6.0-test kernels is a
controversial topic. Some (including your editor) find the recent kernels
to be noticeably more responsive than the 2.4 series; others complain
loudly. It does seem that, despite the fact that some users are happy, the
job is not yet entirely finished.
Con Kolivas has continued to produce his scheduler patches, which
concentrate mostly on tweaking the interactivity estimation code. The
basic idea remains that, if the system can get a good handle on which tasks
are truly interactive, it can then be made to do the right thing, and in
many cases that approach appears to work. Andrew Morton has, however, recently called for Con to take a step back and rethink
things after being
made aware of some significant performance regressions that appear to have
been caused by the scheduler patches:
I suggest that what we need to do is to await some more complete
testing of the CPU scheduler patch alone from Steve and co. If it
is fully confirmed that the CPU scheduler changes are the culprit
we need to either fix it or go back to square one and start again
with more careful testing and a less ambitious set of changes.
Con did some quick testing and narrowed the problem down to Ingo Molnar's
latest interactivity patch. There does not, as yet, appear to be a real
understanding of what is going on, however.
Con has also recently posted a lengthy
document on how the scheduler works and what changes his patches have
made.
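Con's work centers on estimating interactivity. One common approach, and roughly the one taken by the 2.6 scheduler, is to credit a task for time spent sleeping, debit it for time spent running, and turn the resulting "sleep average" into a priority bonus. The toy estimator below illustrates that idea only: the constants and the toy_* names are assumptions for this sketch, and it is not Con's or Ingo's code. Priorities mimic the 2.6 range in which normal tasks run at 100 to 139, with 120 corresponding to nice 0; lower numbers are better.

```c
/* Toy constants, assumed for this sketch. */
#define TOY_MIN_PRIO      100
#define TOY_MAX_PRIO      139
#define TOY_MAX_BONUS     10   /* bonus spans -5..+5 */
#define TOY_MAX_SLEEP_AVG 1000 /* ms of credited sleep earning the full bonus */

struct toy_task {
    int static_prio; /* derived from the nice value */
    int sleep_avg;   /* ms, kept within 0..TOY_MAX_SLEEP_AVG */
};

/* Credit time spent sleeping, debit time spent running, and clamp. */
void toy_account(struct toy_task *t, int slept_ms, int ran_ms)
{
    t->sleep_avg += slept_ms - ran_ms;
    if (t->sleep_avg < 0)
        t->sleep_avg = 0;
    if (t->sleep_avg > TOY_MAX_SLEEP_AVG)
        t->sleep_avg = TOY_MAX_SLEEP_AVG;
}

/* Turn sleep_avg into a bonus and apply it to the static priority:
 * habitual sleepers (likely interactive) move toward TOY_MIN_PRIO. */
int toy_effective_prio(const struct toy_task *t)
{
    int bonus = t->sleep_avg * TOY_MAX_BONUS / TOY_MAX_SLEEP_AVG
                - TOY_MAX_BONUS / 2;
    int prio = t->static_prio - bonus;

    if (prio < TOY_MIN_PRIO)
        prio = TOY_MIN_PRIO;
    if (prio > TOY_MAX_PRIO)
        prio = TOY_MAX_PRIO;
    return prio;
}
```

The hard part, and the subject of all the tuning described above, is choosing how quickly sleep_avg reacts: too slowly and a newly interactive task stays sluggish, too quickly and a CPU hog that briefly sleeps is mistaken for an interactive one.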
Nick Piggin is, perhaps, best known for scheduling disks: he is the author
of the anticipatory I/O scheduler in 2.6.0-test. Nick recently decided to
get into the CPU scheduler tuning game, and has started posting patches;
his most recent is "Nick's scheduler policy v7." These patches take a different approach, starting by hacking out
almost all of the code that tries to calculate interactivity. They remove
almost as much code as they add.
The key part of Nick's policy seems to be the manipulation of time slices.
Processes at different priority levels get very different time slices -
much more so than with the current scheduler. Time slices also depend on
what else is running; if there aren't any higher-priority processes waiting
to run, lower-priority processes will get larger slices.
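To make the description concrete, here is a toy time-slice function in the spirit of the paragraph above: slices shrink steeply as priority worsens, and a task gets a larger slice when nothing of higher priority is waiting. The 0 to 39 priority scale, the halving every five levels, and the millisecond limits are all assumptions for this sketch; none of it is taken from Nick's patch.

```c
/* Toy constants, assumed for this sketch. */
#define TOY_BASE_SLICE_MS 100
#define TOY_MIN_SLICE_MS  2
#define TOY_MAX_SLICE_MS  200

/* prio: 0 (best) .. 39 (worst).
 * best_waiting_prio: priority of the best other runnable task,
 * or -1 if nothing else is waiting to run. */
int toy_timeslice(int prio, int best_waiting_prio)
{
    /* Halve the slice for every five priority levels: steep differences. */
    int slice = TOY_BASE_SLICE_MS >> (prio / 5);

    /* No higher-priority task waiting: let this one run longer. */
    if (best_waiting_prio < 0 || best_waiting_prio >= prio)
        slice *= 2;

    if (slice < TOY_MIN_SLICE_MS)
        slice = TOY_MIN_SLICE_MS;
    if (slice > TOY_MAX_SLICE_MS)
        slice = TOY_MAX_SLICE_MS;
    return slice;
}
```

For instance, a task at priority 10 gets 25 ms while a priority-0 task is waiting, but 50 ms when nothing better is runnable.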
Process priorities also change more rapidly, allowing processes which sleep
a lot to get back onto the CPU quickly. Finally, this patch restores the
"priority transfer" idea: when
one process wakes another, a portion of the waking process's priority (and
time slice) is given over to the process being awakened. This feature
helps to keep the X server responsive. With Nick's patch, the X server
benefits from being given a higher priority; this is not the case with
Con's scheduler patches.
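The priority transfer idea can be sketched as a wakeup hook that moves part of the waker's priority bonus and remaining time slice to the task it wakes, so that, say, an X client waking the X server lends the server some of its standing. The struct toy_proc type and the choice to transfer half of each quantity are assumptions made for illustration, not details of Nick's patch.

```c
/* Hypothetical per-process scheduling state for this sketch. */
struct toy_proc {
    int bonus;    /* priority bonus; higher is better */
    int slice_ms; /* remaining time slice */
};

/* When waker wakes wakee, move half of the waker's bonus and half of
 * its remaining slice over to the wakee (the fraction is an assumption). */
void toy_wake_transfer(struct toy_proc *waker, struct toy_proc *wakee)
{
    int bonus_share = waker->bonus / 2;
    int slice_share = waker->slice_ms / 2;

    waker->bonus -= bonus_share;
    wakee->bonus += bonus_share;
    waker->slice_ms -= slice_share;
    wakee->slice_ms += slice_share;
}
```

Because the shares are moved rather than copied, the pair's total bonus and total slice stay the same, so a chain of wakeups cannot manufacture extra CPU time.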
Getting scheduling right is hard, as the amount of effort being devoted to
the problem shows. By many accounts, 2.6 will be better than
earlier kernels in this regard. But it would not be surprising if
developers were still trying to improve it long after 2.6.0 is released.
Freeing network devices safely
Recent development kernels include a great deal of networking information
under
/sys/class. For the moment, it is mostly physical layer
stuff, but one should expect more information to show up there over time,
as it migrates out of
/proc/sys. The current networking sysfs
files draw their information from the interface's associated
net_device structure. That scheme works nicely, in that network
drivers need not concern themselves with providing the sysfs
infrastructure; it just sort of happens. But consider what happens if a
suitably privileged user executes something like:
rmmod e100 < /sys/class/net/eth0/statistics/tx_bytes
This command will keep the indicated sysfs file open past the time when the
module containing the net_device structure behind that file is
removed from the system. Unless special care is taken, the open file will
be left pointing to structures which no longer exist, leading to all kinds
of potential trouble. Most
drivers do not take that care.
Until 2.6.0-test4, that is. After a series of patches by Stephen
Hemminger, drivers are expected to use kmalloc() to create
net_device structures dynamically. Most drivers already worked that
way; the difference now is that drivers can no longer simply free those
structures with kfree() when they are no longer needed. Instead,
there is a new function which is used to get rid of a net_device
structure:
void free_netdev(struct net_device *dev);
This function lets the networking subsystem maintain reference
counts for net_device structures and avoid freeing them until
they are truly unused. The scheme is relatively simple, but it
demonstrates, once again, the higher level of care required to avoid
race conditions in the 2.6 kernel.
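The reference-counting pattern behind free_netdev() can be illustrated with a small, single-threaded userspace sketch. Everything here is hypothetical (struct toy_netdev, toy_dev_hold(), toy_dev_put(), toy_free_netdev()); the kernel's actual mechanism is not shown, and real code would need atomic reference counts. The driver's "free" merely drops its own reference; the memory goes away only when the last holder, such as an open sysfs file, lets go.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted device structure. */
struct toy_netdev {
    int refcnt;             /* holders: the driver plus any open files */
    unsigned long tx_bytes; /* example statistic exposed through a file */
};

int toy_netdevs_freed; /* counts actual frees, for demonstration */

struct toy_netdev *toy_alloc_netdev(void)
{
    struct toy_netdev *dev = calloc(1, sizeof(*dev));
    if (dev)
        dev->refcnt = 1; /* the driver's own reference */
    return dev;
}

/* Taken, for example, when a statistics file is opened. */
void toy_dev_hold(struct toy_netdev *dev)
{
    dev->refcnt++;
}

/* Drop a reference; the memory is released only when the last one goes. */
void toy_dev_put(struct toy_netdev *dev)
{
    if (--dev->refcnt == 0) {
        free(dev);
        toy_netdevs_freed++;
    }
}

/* What the driver calls instead of free(): give up its own reference. */
void toy_free_netdev(struct toy_netdev *dev)
{
    toy_dev_put(dev);
}
```

In the rmmod scenario above, the open file holds one reference, so unloading the driver leaves the structure alive until the file is closed.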
Patches and updates
Core kernel code
- Con Kolivas: O18int. (August 22, 2003)
- Con Kolivas: O18.1int. (August 24, 2003)
Page editor: Jonathan Corbet