Brief items
The current development kernel is 2.6.0-test9,
released by Linus on October 25. It
consists almost entirely of important fixes, of course, but Linus also
threw in Jeff Garzik's "libata" driver. As always,
the long-format changelog has the details.
It seems a real 2.6.0 release
could be getting close:
If this works out, then I'll submit -test10 to Andrew Morton, and
if he takes it we'll probably have a real 2.6.0 after a final
shakedown.
Linus's approach of restricting patches to the most important fixes should
help to stabilize the kernel. It also is likely to mean, however, that
there will be a substantial pile of patches waiting to go in after the
2.6.0 release.
2.6.0-test9 is, perhaps, unique in having its
own press release, something that is not normally done for development
kernels. OSDL, it seems, wants to be sure that the world knows where Linus
and Andrew work these days.
Linus's BitKeeper tree, as of this writing, contains a relatively small
number of fixes.
The current stable kernel is 2.4.22; Marcelo released 2.4.23-pre8 on October 22. Along
with the usual fixes, this patch also includes an ACPI update, some driver
updates, and a set of tmpfs fixes.
Joe Pranevich has updated the
Wonderful World of Linux 2.6
to cover the -test9 release. This is likely to be the last update until
the official 2.6 release. A
rough list of changes to the
document is
also available.
Kernel development news
Upgrading to a new version of an operating system is always a bit of a
mixed experience. The promise of new features, new applications, and
better performance (one hopes) contends with the fear that the upgrade will
break something that used to work. Even the most cautious among us,
however, do not normally worry about an upgrade causing hardware to
self-destruct. Those who have recently attempted to install Mandrake
Linux 9.2 on a system containing an LG CD drive (shipped by Dell and
numerous others) have gotten just that sort of surprise. An
unpatched 9.2 system, it seems, can cause those drives to wipe out their
firmware and cease to function.
This problem has been the centerpiece of a small flood of complaints about
the stability of the 9.2 release - over 250MB of updates have already been
issued by MandrakeSoft. The simple fact of the matter, however, is that it
is hard to blame MandrakeSoft for this problem.
The code which toasts LG drives was added to the Mandrake Linux kernel back
in August, as part of a general packet writing support patch. It issues a
standard ATAPI FLUSH_CACHE command to the drive at times, in order
to ensure that all outbound data reaches its intended destination. A
CD-ROM is a read-only device, so the FLUSH_CACHE command does not
make any particular sense in this context. But, for the purpose of the
packet-writing code, it was easier to simply issue that command
unconditionally.
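To make the distinction concrete, here is a minimal user-space sketch of a conditional flush. It is an illustration only, not the Mandrake patch or its eventual fix: the `drive_info` structure, its `can_write` flag, and the `prepare_flush()` helper are all hypothetical. The opcode value is the one that Linux's <linux/cdrom.h> calls GPCMD_FLUSH_CACHE.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Opcode of the flush (synchronize cache) packet command; Linux's
 * <linux/cdrom.h> defines it as GPCMD_FLUSH_CACHE. */
#define FLUSH_CACHE_OPCODE 0x35

/* Hypothetical, simplified description of an optical drive. */
struct drive_info {
    bool can_write;   /* true for a writer, false for a plain CD-ROM */
};

/* Fill a 12-byte packet with a flush command, but only when the drive
 * could actually be holding unwritten data. Returns true if a command
 * was prepared and should be sent to the drive. */
bool prepare_flush(const struct drive_info *drive, uint8_t packet[12])
{
    if (!drive->can_write)
        return false;          /* read-only drive: nothing to flush */

    memset(packet, 0, 12);
    packet[0] = FLUSH_CACHE_OPCODE;
    return true;
}
```

A check of this kind costs one extra test per flush, which is presumably why the unconditional version looked attractive.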
The ATAPI specification is clear on what should happen in this situation;
the drive should either simply ignore the command, or it should fail it
with an error code. The designer of the LG drive firmware, however, had a
different idea. Since FLUSH_CACHE is not a command that is
applicable in this situation,
why not reuse it to overwrite the firmware in some (undocumented)
way? It must have, in some twisted way, seemed like a good idea at the
time. But standard commands should never be repurposed in this way, and
they especially should not be turned into a self-destruct operation. The
LG drives are non-compliant and mis-designed, and nobody can blame
MandrakeSoft for having been the first distributor to get burned by this
poor product.
Some people have tried to lay the blame there anyway, of course. According
to the critics, if MandrakeSoft would only test its releases more
thoroughly and avoid including non-standard kernel patches, this sort of
episode would not occur. These charges do not hold water, however.
Mandrake Linux has, arguably, the most open development process of any
commercial distributor; anybody who is interested can follow the evolution
of each release from one day to the next and, yes, test those releases.
The code in question was included in two 9.2 release candidates,
but nobody pointed out the problem. It is hard to see how much better
MandrakeSoft could do on the testing front.
With regard to patches: for better or worse, shipping patched kernels is
standard practice for distributors. Some distributors ship kernels which
are hard to recognize as being derived from any mainline release; Red Hat's
kernels are called 2.4.x, but, at the moment, are packed with 2.6 code and
features. Even Debian has just been through a lengthy (and somewhat
inconclusive) debate on just how heavily its kernels should be patched.
For many patches, use in distributor kernels is a prerequisite to inclusion
in the mainline. The use of patched kernels in distributions is not only
standard practice, but it's a part of the wider development process.
New code will bring surprises, though, hopefully, not often of this
magnitude. The only real way to be sure of the stability of code is to see
it in wide use, in many different situations. Unfortunately, in the
software world, the only way to achieve that degree of testing is to have
the end users do it. This is true for both free and proprietary software.
Such is life in this industry. MandrakeSoft got unlucky this time; the
next such incident could just as easily happen to anybody else.
(Mandrake users may want to see the errata page
for the LG drive problem).
Mark Bellon recently
announced the first
release of a tool called "User-Space Device Enumeration," or "uSDE". uSDE
maintains a directory full of device nodes based on hotplug events and
information found in sysfs. It is thus intended to be a user-space
replacement for the devfs filesystem.
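The core of any such tool is small: on a hotplug event, read the device number from sysfs and create a matching node. The sketch below shows only the parsing step. It is taken from neither uSDE nor udev; `parse_dev_attr()` is a hypothetical helper, and it assumes that the sysfs "dev" attribute holds the number in decimal "major:minor" form.

```c
#include <stdio.h>

/* Parse the contents of a sysfs "dev" attribute, assumed to hold the
 * device number as "major:minor" (for example "8:0").
 * Returns 0 on success, -1 if the text is not in that form. */
int parse_dev_attr(const char *text, unsigned int *major, unsigned int *minor)
{
    if (sscanf(text, "%u:%u", major, minor) != 2)
        return -1;
    return 0;
}
```

With the two numbers in hand, a tool of this sort would go on to create the node (with mknod(), say) under whatever name its policy dictates.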
Few doubt that the objectives for uSDE make sense. But quite a few
developers have asked why the uSDE developers went off and created their
own system, rather than working on udev (which recently released version 005). Given that the two projects
appear to be trying to do exactly the same thing, it seems strange that the
work is being done twice.
According to Mr. Bellon, uSDE was developed because udev wasn't up to the
needs of Carrier Grade Linux. What needs they were trying to meet are not
entirely clear; his posting is full
of language like "Aggressive device enumeration. Multiple concurrent
policy execution and management." In fact, the actual requirements
imposed by the CGL specification are minimal; as posted by Greg Kroah-Hartman:
OSDL CGL specifies that carrier grade Linux shall provide
functionality such that a device's identity shall be maintained
when it is removed and reinstalled even if it is plugged into a
different bus, slot, or adapter. "Device identity" is the name
of the device presented to user space, and this identity is
assigned based on policies set by the administrator, e.g., based
on location or hardware identification information.
Meeting this requirement with existing tools is not that hard to do.
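As a rough illustration of how little is needed, the following sketch keys the name on a hardware identifier and ignores the device's location entirely. Everything in it is invented for the example (the rule table, the identifiers, the names, and `lookup_name()`); it is not code from udev, uSDE, or the CGL specification.

```c
#include <stddef.h>
#include <string.h>

/* One administrator-defined rule: a hardware identifier and the name
 * the device should always receive. All values here are made up. */
struct name_rule {
    const char *hw_id;   /* e.g. a serial number read from the device */
    const char *name;    /* name to present to user space */
};

static const struct name_rule rules[] = {
    { "SERIAL-0001", "disk_database" },
    { "SERIAL-0002", "disk_logs" },
};

/* Return the configured name for a device, or NULL if no rule matches.
 * The bus, slot, or adapter the device is attached to plays no part. */
const char *lookup_name(const char *hw_id)
{
    for (size_t i = 0; i < sizeof(rules) / sizeof(rules[0]); i++) {
        if (strcmp(rules[i].hw_id, hw_id) == 0)
            return rules[i].name;
    }
    return NULL;
}
```

A location-based policy would look much the same, with the lookup keyed on a bus path rather than a serial number.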
uSDE appears to be the result of a different design approach. It uses a
complicated plugin architecture to implement different device naming
policies. As a whole, it is rather larger and more complex than udev. It
does provide some functionality that udev is still lacking, including a
devfs emulation module. In general, it shows the signs of having had more
developer time put into it than udev.
But, while uSDE may be a little further developed than udev, it looks set to
lose the fight for developer support and mindshare. The development of
udev has followed the informal rules of kernel hacking: it has been done in
the open, with feedback received along the way. It also doesn't hurt that
udev is the project of a core kernel developer. uSDE, instead, has been
developed in isolation, in competition with an established project,
and was late to enter the public arena. Whether or
not uSDE is, in fact, a better solution, the way in which it has been
developed has put it at a disadvantage relative to its competition.
Driver porting
The Driver Porting Series now includes several articles on how kobjects
work as a way of tying together data structures and managing reference
counts. Experience shows, however, that truly envisioning how
kobject-linked data structures tie together is a difficult task. In the
hope of shedding a bit more light in this direction, and as a way for your
editor to exercise his minimal skills with the "dia" diagram editor, this
article will show how some of the crucial data structures in the block
layer are connected.
The core data structure in this investigation is the kobject. In the
diagrams that follow, kobjects will be represented by the small symbol you
see to the right. The upper rectangle represents the kobject's parent
field, while the other two are its entries in the doubly-linked list that
implements a kset. Not all kobjects belong to a kset, so those links will
often be empty.
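For readers who prefer code to pictures, the three links can be modeled in user space as shown below. This is a simplified stand-in, not the kernel's definition: the `_model` names are invented, and the real struct kobject carries more fields (a reference count and a type pointer among them).

```c
#include <stddef.h>

/* Minimal doubly-linked list node, in the style of the kernel's list_head. */
struct list_node {
    struct list_node *next, *prev;
};

struct kset_model;

/* Simplified stand-in for a kobject: the three links drawn in the
 * diagrams plus a back-pointer to the containing kset. */
struct kobject_model {
    const char *name;
    struct kobject_model *parent;   /* the upper rectangle */
    struct list_node entry;         /* the two kset list links */
    struct kset_model *kset;        /* NULL if not a member of any kset */
};

/* Simplified kset: a list head plus the kset's own embedded kobject. */
struct kset_model {
    struct list_node list;
    struct kobject_model kobj;
};

void kset_model_init(struct kset_model *ks, const char *name)
{
    ks->list.next = ks->list.prev = &ks->list;   /* empty circular list */
    ks->kobj.name = name;
    ks->kobj.parent = NULL;
    ks->kobj.entry.next = ks->kobj.entry.prev = &ks->kobj.entry;
    ks->kobj.kset = NULL;
}

/* Add a kobject to the tail of a kset's list and record its membership. */
void kset_model_add(struct kset_model *ks, struct kobject_model *kobj)
{
    kobj->kset = ks;
    kobj->entry.prev = ks->list.prev;
    kobj->entry.next = &ks->list;
    ks->list.prev->next = &kobj->entry;
    ks->list.prev = &kobj->entry;
}
```

A kobject which is never passed to `kset_model_add()` keeps a NULL kset pointer, corresponding to the empty list links mentioned above.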
At the root of the block subsystem hierarchy is a subsystem called
block_subsys; it is defined in drivers/block/genhd.c. As
you'll recall from The Zen of Kobjects, a
subsystem is a very simple structure, consisting of a semaphore and a
kset. The kset will define, in its ktype field, what type of
kobjects it will contain; for block_subsys, this field is set to
ktype_block. Pictorially, we can show this structure as seen on
the right.
Each kset contains its own kobject, and block_subsys is no
exception. In this case, the kobject's parent field is explicitly set to
NULL (indicated by the ground symbol in the picture). As a
result, this kobject will be represented in the top level of the sysfs
hierarchy; it is the kobject which lurks behind /sys/block.
A block subsystem is not very interesting without disks. In the block
hierarchy, disks are defined by a struct gendisk, which can be
found in <linux/genhd.h>. The gendisk interface is
described in this article. For our
purposes, we will represent a gendisk as seen on the left; note that it has
the inevitable embedded kobject inside it. A gendisk's kobject does not
have an explicit type pointer; its membership in the block_subsys
kset takes care of that. But its parent and kset
pointers both point to the kobject within block_subsys, and the
kset pointers are there too. The result, for a system with two disks,
would be a structure that looks like this:
Things do not end there, however; a gendisk structure is a complicated
thing. It contains, among other things, an array of partition entries (of
type struct hd_struct),
each of which has embedded within it, yes, a kobject. The parent of each
partition is the disk which contains it. It would have been possible to
implement the list of partitions as a kset, but things weren't done that
way. Partitions are a relatively static item, and their ordering matters,
so they were done as a simple array. We depict that array as seen on the
right.
As you can see, the kobject type of a partition is ktype_part.
This type implements the attributes you will see in the sysfs
entries for each partition, including the starting block number and size.
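The parent links are what give each partition its place in the sysfs tree. The user-space sketch below models a disk with an embedded kobject and a fixed array of partitions, then builds a sysfs-style path by walking up the parent pointers. The structures are heavily simplified stand-ins for struct gendisk and struct hd_struct, and every name in the code is an assumption made for the example. The `container_of_model()` macro imitates the kernel's container_of(), which gets from an embedded kobject back to the structure holding it.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define MAX_PARTS 4   /* arbitrary size for this sketch */

/* Simplified kobject: just a name and a parent link. */
struct kobj {
    const char *name;
    struct kobj *parent;
};

/* Stand-ins for struct hd_struct and struct gendisk. */
struct part_model {
    unsigned long start_sect, nr_sects;
    struct kobj kobj;
};

struct disk_model {
    struct kobj kobj;
    struct part_model part[MAX_PARTS];   /* fixed array, not a kset */
};

/* Recover the enclosing structure from a pointer to its embedded member,
 * in the manner of the kernel's container_of() macro. */
#define container_of_model(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Build the path of a kobject by walking up its parents first, so that
 * the topmost ancestor's name comes first. len must be at least 1. */
void kobj_path(const struct kobj *k, char *buf, size_t len)
{
    if (k->parent)
        kobj_path(k->parent, buf, len);
    else
        buf[0] = '\0';
    strncat(buf, "/", len - strlen(buf) - 1);
    strncat(buf, k->name, len - strlen(buf) - 1);
}
```

Given a "block" kobject with no parent, a disk named "sda" beneath it, and a first partition named "sda1", `kobj_path()` on the partition's kobject yields "/block/sda/sda1", which is the path seen relative to the sysfs mount point.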
Another item associated with each gendisk is its I/O request queue. The
queue, too, contains a kobject (of type queue_ktype) whose parent
is the associated gendisk. The I/O scheduler ("elevator") in use with an
I/O request queue is also represented in the hierarchy. The scheduler's
kobject's type depends on which scheduler is being used; the (default)
anticipatory scheduler uses as_ktype. The resulting piece of the
puzzle looks as portrayed on the left.
The request queue and I/O scheduler information in sysfs is currently
read-only. There is no reason, however, why sysfs attributes could not be
used to change I/O scheduling parameters on the fly. The selectable I/O scheduler patch uses sysfs
attributes to change I/O schedulers completely, for example.
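A writable attribute boils down to a pair of callbacks: one to format the current value, and one to parse a new value. The user-space model below shows that shape. It is loosely patterned after sysfs show and store operations but uses none of the kernel's actual types; the `sched_params` structure and the "read_expire" tunable are hypothetical stand-ins for a scheduler parameter.

```c
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical tunable standing in for an I/O scheduler parameter. */
struct sched_params {
    unsigned int read_expire_ms;
};

/* An attribute pairs a file name with read ("show") and write ("store")
 * callbacks. */
struct attr_model {
    const char *name;
    ssize_t (*show)(struct sched_params *p, char *buf, size_t len);
    ssize_t (*store)(struct sched_params *p, const char *buf, size_t len);
};

/* Format the current value as text, as a read of the file would. */
static ssize_t read_expire_show(struct sched_params *p, char *buf, size_t len)
{
    return snprintf(buf, len, "%u\n", p->read_expire_ms);
}

/* Parse a decimal value written to the file and apply it. */
static ssize_t read_expire_store(struct sched_params *p, const char *buf,
                                 size_t len)
{
    char *end;
    unsigned long val = strtoul(buf, &end, 10);

    if (end == buf)
        return -1;              /* no digits: reject the write */
    p->read_expire_ms = (unsigned int)val;
    return (ssize_t)len;        /* report the whole buffer as consumed */
}

const struct attr_model read_expire_attr = {
    .name  = "read_expire",
    .show  = read_expire_show,
    .store = read_expire_store,
};
```

Making a read-only attribute writable is thus mostly a matter of supplying the store side and validating what user space hands in.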
Putting it all together
![[The full diagram]](/images/ns/dp/block-kobj-sm.png)
So far, we have seen a number of disconnected pieces. The full diagram can
be found on
this page; it
is a bit wide to be placed inline with the text (a small, illegible
version appears to the right). Also on that page, you'll
find a corresponding diagram showing the sysfs names that correspond to each
kobject.
The data structure as described is the full implementation of the
/sys/block subtree of sysfs. The full sysfs tree contains rather
more than this, of course. For each gendisk which shows up under
/sys/block, there will be a separate entry under
/sys/devices which describes the underlying hardware. Internally,
the link between the two is contained in the driverfs_dev field of
the gendisk structure. In sysfs, that link is represented as a symbolic
link between the two sub-trees.
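From user space, following that link is a matter of a single readlink() call. The helper below is a small sketch; `read_link_target()` is a hypothetical name, and the assumption that the link appears as a "device" entry in the disk's directory should be checked against the running kernel.

```c
#include <unistd.h>

/* Read the target of a symbolic link into buf, NUL-terminated.
 * Returns 0 on success, -1 on error or if the target did not fit. */
int read_link_target(const char *path, char *buf, size_t len)
{
    ssize_t n = readlink(path, buf, len);

    if (n < 0 || (size_t)n >= len)
        return -1;
    buf[n] = '\0';     /* readlink() does not terminate the string */
    return 0;
}
```

Calling it on a path like /sys/block/hda/device (assuming such a disk exists) should produce a relative path leading into the /sys/devices subtree.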
Hopefully this series of pictures helps in the visualization of a portion
of the sysfs tree and the device model data structure that implements it.
The device model brings a great deal of apparent complexity, but, once the
underlying concepts are grasped, the whole thing is approachable.
Page editor: Jonathan Corbet