The current development kernel is still 2.6.0-test5, which was
released back on September 8.
The pile of patches in Linus's BitKeeper repository continues to grow.
The most notable change is probably the dev_t expansion
(see below); other patches which have been merged include a device mapper
update, some NFS updates, a big I2C update, Con Kolivas's and Ingo Molnar's
scheduler interactivity patches,
a Coda filesystem update, some initramfs tweaks, improvements in random
driver locking, the removal of some ext3 debugging hooks, direct I/O
support for reiserfs, some CPU frequency work, an Intel SpeedStep-SMI
driver, a substantial amount of janitorial work, and various fixes.
The current stable kernel is 2.4.22. Marcelo continues to work on
2.4.23; he released 2.4.23-pre5 on
September 21. This prepatch adds some ACPI fixes, an omitted piece of
the VM patch set that went into -pre4, and various other fixes.
It remains a relatively slow period in
kernel development, so this is not the longest LWN Kernel Page we have ever
produced. It was hard, but we have resisted the urge to fill it out with
coverage of the latest BitKeeper flame war.
Kernel development news
The expansion of the dev_t device number type has been on the list
of goals for 2.6 since the beginning. The only problem is that it has
stayed on that list through the entire 2.5 development process; for various
reasons, work on that project stalled for a long time. As of
September 24, however, the dev_t expansion can be checked off
the list; Linus has merged the required changes into his BitKeeper tree.
They will appear in the 2.6.0-test6 release.
For some time, it had appeared that dev_t would expand to 64 bits,
with 32 bits each for the major and minor numbers. The actual change,
however, is to 32 bits, with a 12-bit major number and 20 bits for the
minor. That should be adequate for some time, especially given that the
new registration mechanisms and sysfs make it much easier for the system to
use device numbers more effectively.
Internally, the new kernel dev_t type uses the encoding one would
expect: the major number sits in the top twelve bits of a 32-bit value,
with the minor number in the bottom 20 bits. The encoding seen by user
space is different, however, as shown in the diagram to the right. Here,
the major number sits in bits 8-19, while the minor number is split across
bits 20-31 and 0-7. This representation may seem strange, but it has one
very nice property: old 16-bit device numbers are still valid in the new
scheme. Encoding device numbers this way helps keep no end of applications
from breaking with the new device number type. One might wonder why this
workaround is necessary, given that the C library can convert device
numbers as needed for the few system calls (mknod(),
stat(), etc.) that actually need them. The problem is that device
numbers pop up in a number of other contexts, such as in filesystems and
ioctl() calls, where the C library is unable to help.
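
The two layouts described above can be sketched in a few lines of C. This is only an illustration of the bit arithmetic, not the kernel's own code; the function names (kernel_encode(), user_encode(), user_major(), user_minor()) are hypothetical and chosen for this example.

```c
#include <assert.h>
#include <stdint.h>

#define MAJOR_BITS 12
#define MINOR_BITS 20

/* Internal layout: major in the top 12 bits, minor in the bottom 20.
 * (Hypothetical helper name, for illustration only.) */
static uint32_t kernel_encode(uint32_t major, uint32_t minor)
{
    assert(major < (1u << MAJOR_BITS) && minor < (1u << MINOR_BITS));
    return (major << MINOR_BITS) | minor;
}

/* User-space layout: major in bits 8-19, the low 8 bits of the minor
 * in bits 0-7, and the remaining 12 minor bits in bits 20-31. */
static uint32_t user_encode(uint32_t major, uint32_t minor)
{
    assert(major < (1u << MAJOR_BITS) && minor < (1u << MINOR_BITS));
    return (minor & 0xffu) | (major << 8) | ((minor & ~0xffu) << 12);
}

/* Recover the major number from the user-space layout. */
static uint32_t user_major(uint32_t dev)
{
    return (dev >> 8) & 0xfffu;
}

/* Recover the minor number by rejoining its two halves. */
static uint32_t user_minor(uint32_t dev)
{
    return (dev & 0xffu) | ((dev >> 12) & 0xfff00u);
}
```

With these helpers, a device with major 8 and minor 1 encodes to 0x0801 in the user-space layout, which is exactly the old 16-bit number; the internal layout for the same device would be 0x00800001.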
There are places, however, where an explicitly 16-bit value is passed.
There is no way to change that without breaking applications. In such
cases, the kernel checks whether 16 bits is sufficient; if not, the system
call has no choice but to fail with an EOVERFLOW error.
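
A minimal sketch of that check, assuming the traditional 16-bit layout of eight bits each for major and minor; encode_dev16() is a hypothetical name used only for this example, not a kernel function.

```c
#include <errno.h>
#include <stdint.h>

/* Pack a device number into the old 16-bit format (8-bit major,
 * 8-bit minor). Returns 0 on success, or -EOVERFLOW when either
 * number does not fit. Hypothetical helper, for illustration only. */
static int encode_dev16(uint32_t major, uint32_t minor, uint16_t *out)
{
    if (major > 0xff || minor > 0xff)
        return -EOVERFLOW;
    *out = (uint16_t)((major << 8) | minor);
    return 0;
}
```

Any device whose major or minor number exceeds 255 simply cannot be represented through such an interface, so the caller sees the error instead of a silently truncated number.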
Beyond that, most of the groundwork for the new dev_t had already
been laid over the last few months. There are, however, certain to be a
few surprises left after such a fundamental change. The next couple of
kernels could be interesting to use while the remaining issues get ironed
out.
The 2.5 development series saw the creation of a few different I/O
schedulers ("elevators") for the block I/O subsystem. I/O schedulers
attempt to perform requested block I/O operations in an order that
maximizes performance. Given that different people (and applications)
measure performance differently, it is not surprising that more than one
I/O scheduler exists. So, for example, the "deadline" scheduler attempts
to minimize seeks while ensuring that no request waits for more than a
certain period
of time. The anticipatory scheduler pauses after completing read
operations on the assumption that another nearby read will show up
quickly. The CFQ ("completely fair queueing") scheduler tries to divide up
the available I/O bandwidth equally among processes. And there is a "noop"
scheduler for devices (such as memory-based devices) which do not benefit
from I/O scheduling logic at all.
What has been lacking is any sort of way for a system administrator to
choose between these schedulers. A system I/O scheduler
can be designated with the elevator= boot parameter, but that
choice applies to all drives on the system, and it cannot be changed. This
restriction makes experimenting with the various schedulers difficult; in
the real world, it may also be appropriate to use different schedulers for
different drives.
So Nick Piggin has released a patch which
makes I/O schedulers selectable at run time. With the patch, a new
io_scheduler sysfs attribute appears under
/sys/block/<device>/queue; changing a scheduler is simply a
matter of writing the name of the new scheduler into that attribute. So,
for example, to go to CFQ on the first SCSI drive:
echo cfq >/sys/block/sda/queue/io_scheduler
Changing schedulers requires pausing and emptying the I/O queue, so it
might not be advisable in the middle of writing a CD or controlling a
nuclear power plant shutdown. But it certainly can be a useful thing to do
at system initialization time, or while experimenting with scheduler
performance under a certain kind of load.
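
For a program that wants to do the same thing as the echo command above, a small C helper might look like the following. It is a sketch only: set_io_scheduler() is a hypothetical name, and the attribute path is whatever the patch exposes (the io_scheduler file under /sys/block/<device>/queue as described here).

```c
#include <stdio.h>

/* Write a scheduler name into the given sysfs attribute file.
 * Returns 0 on success, -1 if the file cannot be opened or written.
 * Hypothetical helper, for illustration only. */
static int set_io_scheduler(const char *attr_path, const char *name)
{
    FILE *f = fopen(attr_path, "w");
    if (f == NULL)
        return -1;

    int ok = fprintf(f, "%s\n", name) > 0;

    /* The write may only reach the kernel at close time, so a
     * failing fclose() counts as a failed switch. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

A call such as set_io_scheduler("/sys/block/sda/queue/io_scheduler", "cfq") would then correspond to the shell example, with the return value telling the caller whether the write was accepted.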
One of the patches that will appear in 2.6.0-test6 is one marking the devfs
subsystem as being obsolete. The patch from Christoph Hellwig reads:
Richard [Gooch] hasn't touched it for about a year and since then
only bugfixes and my changes to the kernel interface went in. No
one has stepped up to maintain it and with udev we have a proper
Devfs was the subject of countless heated linux-kernel battles in the years
leading up to its inclusion in 2.3. It made rather less of a splash
afterwards; none of the major distributors have enabled devfs in their
kernels, with the (arguable) exception of Gentoo. When a subsystem does
not get used, and especially when its maintainer stops working on it, that
subsystem's future tends to be dim. Such is the case with devfs.
Christoph has said he will continue to fix a few problems, but will do no
more with it. 2.6 may be the last major kernel series that includes the
devfs subsystem.
Patches and updates
- Andrew Morton: 2.6.0-test5-mm4. "A series of patches from Al Viro which introduce 32-bit dev_t support."
(September 22, 2003)
Page editor: Jonathan Corbet