Kernel development
Brief items
Kernel release status
The current development kernel is 2.6.0-test9, released by Linus on October 25. It consists almost entirely of important fixes, of course, but Linus also threw in Jeff Garzik's "libata" driver. As always, the long-format changelog has the details.
It seems a real 2.6.0 release could be getting close:
Linus's approach of restricting patches to the most important fixes should help to stabilize the kernel. It also is likely to mean, however, that there will be a substantial pile of patches waiting to go in after the 2.6.0 release.
2.6.0-test9 is, perhaps, unique in having its own press release, something that is not normally done for development kernels. OSDL, it seems, wants to be sure that the world knows where Linus and Andrew work these days.
Linus's BitKeeper tree, as of this writing, contains a relatively small number of fixes.
The current stable kernel is 2.4.22; Marcelo released 2.4.23-pre8 on October 22. Along with the usual fixes, this patch also includes an ACPI update, some driver updates, and a set of tmpfs fixes.
The Wonderful World of Linux 2.6
Joe Pranevich has updated the Wonderful World of Linux 2.6 to cover the -test9 release. This is likely to be the last update until the official 2.6 release. A rough list of changes to the document is also available.
Kernel development news
Mandrake Linux 9.2 and self-destructing CD-ROM drives
Upgrading to a new version of an operating system is always a bit of a mixed experience. The promise of new features, new applications, and better performance (one hopes) contends with the fear that the upgrade will break something that used to work. Even the most worried among us, however, do not normally worry about an upgrade causing hardware to self-destruct. Those who have recently attempted to install Mandrake Linux 9.2 on a system containing an LG CD drive (shipped by Dell and numerous others) have gotten just that sort of surprise, however. An unpatched 9.2 system, it seems, can cause those drives to wipe out their firmware and cease to function.
This problem has been the centerpiece of a small flood of complaints about the stability of the 9.2 release - over 250MB of updates have already been issued by MandrakeSoft. The simple fact of the matter, however, is that it is hard to blame MandrakeSoft for this problem.
The code which toasts LG drives was added to the Mandrake Linux kernel back in August, as part of a general packet writing support patch. It issues a standard ATAPI FLUSH_CACHE command to the drive at times, in order to ensure that all outbound data reaches its intended destination. A CD-ROM is a read-only device, so the FLUSH_CACHE command does not make any particular sense in this context. But, for the purpose of the packet-writing code, it was easier to simply issue that command unconditionally.
The ATAPI specification is clear on what should happen in this situation; the drive should either simply ignore the command, or it should fail it with an error code. The designer of the LG drive firmware, however, had a different idea. Since FLUSH_CACHE is not a command that is applicable in this situation, why not reuse it to overwrite the firmware in some (undocumented) way? It must have, in some twisted way, seemed like a good idea at the time. But standard commands should never be re-purposed in this way; and they especially should not be turned into a self-destruct operation. The LG drives are non-compliant and mis-designed, and nobody can blame MandrakeSoft for having been the first distributor to get burned by this poor product.
Some people have tried to lay the blame there anyway, of course. According to the critics, if MandrakeSoft would only test its releases more thoroughly and avoid including non-standard kernel patches, this sort of episode would not occur. These charges do not hold water, however. Mandrake Linux has, arguably, the most open development process of any commercial distributor; anybody who is interested can follow the evolution of each release from one day to the next and, yes, test those releases. The code in question was included in two 9.2 release candidates, but nobody pointed out the problem. It is hard to see how much better MandrakeSoft could do on the testing front.
With regard to patches: for better or worse, shipping patched kernels is standard practice for distributors. Some distributors ship kernels which are hard to recognize as being derived from any mainline release; Red Hat's kernels are called 2.4.x, but, at the moment, are packed with 2.6 code and features. Even Debian has just been through a lengthy (and somewhat inconclusive) debate on just how heavily its kernels should be patched. For many patches, use in distributor kernels is a prerequisite to inclusion in the mainline. The use of patched kernels in distributions is not only standard practice, but it's a part of the wider development process.
New code will bring surprises, though, hopefully, not often of this magnitude. The only real way to be sure of the stability of code is to see it in wide use, in many different situations. Unfortunately, in the software world, the only way to achieve that degree of testing is to have the end users do it. This is true for both free and proprietary software. Such is life in this industry. MandrakeSoft got unlucky this time; the next such incident could just as easily happen to anybody else.
(Mandrake users may want to see the errata page for the LG drive problem).
User-space device enumeration
Mark Bellon recently announced the first release of a tool called "User-Space Device Enumeration," or "uSDE". uSDE maintains a directory full of device nodes based on hotplug events and information found in sysfs. It is thus intended to be a user-space replacement for the devfs filesystem.
Few doubt that the objectives for uSDE make sense. But quite a few developers have asked why the uSDE developers went off and created their own system, rather than working on udev (which recently released version 005). Given that the two projects appear to be trying to do exactly the same thing, it seems strange that the work is being done twice.
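To make the "device nodes from sysfs" idea concrete, here is a minimal sketch of one step any such tool needs: turning the text of a sysfs "dev" attribute into a major/minor pair that could then be handed to mknod(). The helper name parse_sysfs_dev is invented for this example, and the "major:minor" text format is an assumption; this is not code from uSDE or udev.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse the text of a sysfs "dev" attribute,
 * assumed here to look like "8:0\n".
 * Returns 0 on success, -1 if the text does not match. */
int parse_sysfs_dev(const char *text, unsigned int *maj, unsigned int *mnr)
{
    char trailing;
    int n;

    assert(text != NULL && maj != NULL && mnr != NULL);
    n = sscanf(text, "%u:%u%c", maj, mnr, &trailing);
    if (n == 2)
        return 0;                  /* "8:0" with nothing after it */
    if (n == 3 && trailing == '\n')
        return 0;                  /* "8:0\n" as read from a file */
    return -1;
}
```

A real tool would follow a successful parse by creating the node with the chosen name and permissions; that policy decision is where the two projects differ.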
According to Mr. Bellon, uSDE was developed because udev wasn't up to the
needs of Carrier Grade Linux. What needs they were trying to meet are not
entirely clear; his posting is full
of language like "Aggressive device enumeration. Multiple concurrent
policy execution and management." In fact, the actual requirements
imposed by the CGL specification are minimal; as posted by Greg Kroah-Hartman:
Meeting this requirement with existing tools is not that hard to do.
uSDE appears to be the result of a different design approach. It uses a complicated plugin architecture to implement different device naming policies. As a whole, it is rather larger and more complex than udev. It does provide some functionality that udev is still lacking, including a devfs emulation module. In general, it shows the signs of having had more developer time put into it than udev.
But, while uSDE may be a little further developed than udev, it looks set to lose the fight for developer support and mindshare. The development of udev has followed the informal rules of kernel hacking: it has been done in the open, with feedback received along the way. It also doesn't hurt that udev is the project of a core kernel developer. uSDE, instead, has been developed in isolation, in competition to an established project, and was late to enter the public arena. Whether or not uSDE is, in fact, a better solution, the way in which it has been developed has put it at a disadvantage relative to its competition.
Driver porting
Examining a kobject hierarchy
This article is part of the LWN Porting Drivers to 2.5 series.
The core data structure in this investigation is the kobject. In the
diagrams that follow, kobjects will be represented by the small symbol you
see to the right. The upper rectangle represents the kobject's parent
field, while the other two are its entries in the doubly-linked list that
implements a kset. Not all kobjects belong to a kset, so those links will
often be empty.
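As a rough user-space model of that symbol (not the kernel's actual definition), a kobject can be thought of as a parent pointer plus a two-pointer list entry. The names kobject_sketch, kobj_init, kobj_join, and kobj_in_kset are invented for illustration; the real struct kobject carries more fields than are shown here.

```c
#include <assert.h>
#include <stddef.h>

/* Doubly-linked list node, in the style of the kernel's struct list_head. */
struct list_head {
    struct list_head *next;
    struct list_head *prev;
};

/* Simplified stand-in for struct kobject: only the fields discussed above. */
struct kobject_sketch {
    const char *name;
    struct kobject_sketch *parent;  /* the upper rectangle */
    struct list_head entry;         /* the two kset list links */
};

/* Initialize a kobject that has no parent and belongs to no kset. */
void kobj_init(struct kobject_sketch *k, const char *name)
{
    assert(k != NULL);
    k->name = name;
    k->parent = NULL;
    k->entry.next = &k->entry;  /* an entry pointing at itself is unlinked */
    k->entry.prev = &k->entry;
}

/* Link a kobject at the tail of a kset's member list. */
void kobj_join(struct kobject_sketch *k, struct list_head *kset_list)
{
    k->entry.prev = kset_list->prev;
    k->entry.next = kset_list;
    kset_list->prev->next = &k->entry;
    kset_list->prev = &k->entry;
}

/* Nonzero if the kobject is currently linked into a list. */
int kobj_in_kset(const struct kobject_sketch *k)
{
    return k->entry.next != &k->entry;
}
```

In this model an "empty" pair of links is an entry that points back at itself, which is the usual convention for circular lists of this kind.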
At the root of the block subsystem hierarchy is a subsystem called
block_subsys; it is defined in drivers/block/genhd.c. As
you'll recall from The Zen of Kobjects, a
subsystem is a very simple structure, consisting of a semaphore and a
kset. The kset will define, in its ktype field, what type of
kobjects it will contain; for block_subsys, this field is set to
ktype_block. Pictorially, we can show this structure as seen on
the right.
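The relationship between a subsystem, its kset, and the type of the kobjects it holds can be sketched in user-space C as follows. All of the structure and function names here are hypothetical simplifications, and the lookup order (the kset's type first, then the kobject's own) is an assumption made for the sketch rather than a statement about the kernel source.

```c
#include <assert.h>
#include <stddef.h>

struct ktype_sketch {
    const char *name;
};

struct kset_sketch {
    struct ktype_sketch *ktype;   /* type shared by the kset's members */
};

struct kobject_sketch {
    struct kset_sketch *kset;     /* NULL if not a member of any kset */
    struct ktype_sketch *ktype;   /* NULL if no explicit type is set */
};

/* Stand-in for the semaphore-plus-kset pair that makes up a subsystem. */
struct subsystem_sketch {
    struct kset_sketch kset;
    int lock;                     /* placeholder for the semaphore */
};

/* Prefer the containing kset's type; fall back to the kobject's own. */
struct ktype_sketch *kobj_type(const struct kobject_sketch *k)
{
    assert(k != NULL);
    if (k->kset != NULL && k->kset->ktype != NULL)
        return k->kset->ktype;
    return k->ktype;
}
```

With this arrangement, a kobject that joins a kset does not need a type pointer of its own; membership alone is enough to determine its type.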
Each kset contains its own kobject, and block_subsys is no exception. In this case, the kobject's parent field is explicitly set to NULL (indicated by the ground symbol in the picture). As a result, this kobject will be represented in the top level of the sysfs hierarchy; it is the kobject which lurks behind /sys/block.
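The effect of a NULL parent can be illustrated by a small user-space sketch that derives a sysfs path by walking parent pointers until it runs out of them. The structure and the kobj_path helper are invented for this example, and names such as "hda" are only sample values.

```c
#include <assert.h>
#include <string.h>

struct kobject_sketch {
    const char *name;
    struct kobject_sketch *parent;  /* NULL means "top level of sysfs" */
};

/* Write the sysfs path of k into buf by walking parent pointers up to NULL.
 * Returns 0 on success, -1 if buf is too small. */
int kobj_path(const struct kobject_sketch *k, char *buf, size_t len)
{
    const struct kobject_sketch *p;
    size_t need = strlen("/sys");
    size_t pos;

    assert(k != NULL && buf != NULL);
    for (p = k; p != NULL; p = p->parent)
        need += 1 + strlen(p->name);    /* "/" plus the name */
    if (need + 1 > len)
        return -1;

    /* Fill in from the end so that the root-most name comes first. */
    pos = need;
    buf[pos] = '\0';
    for (p = k; p != NULL; p = p->parent) {
        size_t n = strlen(p->name);
        pos -= n;
        memcpy(buf + pos, p->name, n);
        buf[--pos] = '/';
    }
    memcpy(buf, "/sys", 4);
    return 0;
}
```

A kobject named "block" with no parent yields "/sys/block"; a child named "hda" whose parent is that kobject yields "/sys/block/hda".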
A block subsystem is not very interesting without disks. In the block
hierarchy, disks are defined by a struct gendisk, which can be
found in <linux/genhd.h>. The gendisk interface is
described in this article. For our
purposes, we will represent a gendisk as seen on the left; note that it has
the inevitable embedded kobject inside it. A gendisk's kobject does not
have an explicit type pointer; its membership in the block_subsys
kset takes care of that. But its parent and kset
pointers both point to the kobject within block_subsys, and the
kset list links are filled in as well. The result, for a system with two disks,
would be a structure that looks like this:
Things do not end there, however; a gendisk structure is a complicated
thing. It contains, among other things, an array of partition entries (of
type struct hd_struct),
each of which has embedded within it, yes, a kobject. The parent of each
partition is the disk which contains it. It would have been possible to
implement the list of partitions as a kset, but things weren't done that
way. Partitions are a relatively static item, and their ordering matters,
so they were done as a simple array. We depict that array as seen on the
right.
As you can see, the kobject type of a partition is ktype_part. This type implements the attributes you will see in the sysfs entries for each partition, including the starting block number and size.
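The embedding pattern described here, where each partition in the array carries its own kobject whose parent is the disk's kobject, can be sketched in user-space C. Everything below is a simplified model with invented names (disk_sketch, part_sketch, disk_set_part, part_to_disk, MAX_PARTS); the macro imitates the usual container-of idiom for getting from an embedded member back to the structure that holds it.

```c
#include <assert.h>
#include <stddef.h>

struct kobject_sketch {
    const char *name;
    struct kobject_sketch *parent;
};

/* Recover the enclosing structure from a pointer to an embedded member. */
#define container_of_sketch(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

#define MAX_PARTS 4  /* arbitrary size chosen for the sketch */

struct part_sketch {
    unsigned long start_sect;       /* starting block number */
    unsigned long nr_sects;         /* size */
    struct kobject_sketch kobj;     /* embedded kobject */
};

struct disk_sketch {
    struct part_sketch part[MAX_PARTS];  /* ordered, so a plain array */
    struct kobject_sketch kobj;          /* embedded kobject */
};

/* Fill in partition slot idx and parent it to the containing disk. */
void disk_set_part(struct disk_sketch *d, int idx, const char *name,
                   unsigned long start, unsigned long size)
{
    assert(d != NULL && idx >= 0 && idx < MAX_PARTS);
    d->part[idx].start_sect = start;
    d->part[idx].nr_sects = size;
    d->part[idx].kobj.name = name;
    d->part[idx].kobj.parent = &d->kobj;
}

/* Walk from a partition's kobject back to the disk that holds it. */
struct disk_sketch *part_to_disk(struct kobject_sketch *part_kobj)
{
    return container_of_sketch(part_kobj->parent, struct disk_sketch, kobj);
}
```

Because the parent pointer refers to the kobject embedded in the disk, the pointer arithmetic in the macro is all that is needed to get from a partition back to its disk.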
Another item associated with each gendisk is its I/O request queue. The
queue, too, contains a kobject (of type queue_ktype) whose parent
is the associated gendisk. The I/O scheduler ("elevator") in use with an
I/O request queue is also represented in the hierarchy. The scheduler's
kobject's type depends on which scheduler is being used; the (default)
anticipatory scheduler uses as_ktype. The resulting piece of the
puzzle looks as portrayed on the left.
The request queue and I/O scheduler information in sysfs is currently
read-only. There is no reason, however, why sysfs attributes could not be
used to change I/O scheduling parameters on the fly. The selectable I/O scheduler patch uses sysfs
attributes to change I/O schedulers completely, for example.
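The difference between a read-only attribute and a tunable one comes down to whether a "store" operation is provided alongside "show". The following user-space sketch models that idea; the attr_sketch structure, the attr_write dispatcher, and the antic_expire tunable are all made up for illustration and do not correspond to actual kernel attributes or to the patch mentioned above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical attribute: a name plus optional show and store callbacks. */
struct attr_sketch {
    const char *name;
    int (*show)(char *buf, size_t len);
    int (*store)(const char *buf);
};

int antic_expire = 10;  /* made-up tunable, for illustration only */

/* Format the current value as text, as a read of the attribute would. */
int antic_show(char *buf, size_t len)
{
    return snprintf(buf, len, "%d\n", antic_expire);
}

/* Parse a new value from text, as a write to the attribute would. */
int antic_store(const char *buf)
{
    int value;

    if (sscanf(buf, "%d", &value) != 1)
        return -EINVAL;
    antic_expire = value;
    return 0;
}

/* Dispatch a write; an attribute without a store callback is read-only. */
int attr_write(const struct attr_sketch *a, const char *buf)
{
    assert(a != NULL);
    if (a->store == NULL)
        return -EACCES;
    return a->store(buf);
}
```

Making a parameter adjustable on the fly is then a matter of supplying the store callback; nothing about the attribute's position in the hierarchy has to change.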
Putting it all together
So far, we have seen a number of disconnected pieces. The full diagram can
be found on this page; it
is a bit wide to be placed inline with the text (a small, illegible
version appears to the right). Also on that page, you'll
find a corresponding diagram showing the sysfs names that correspond to each
kobject.
The data structure as described is the full implementation of the /sys/block subtree of sysfs. The full sysfs tree contains rather more than this, of course. For each gendisk which shows up under /sys/block, there will be a separate entry under /sys/devices which describes the underlying hardware. Internally, the link between the two is contained in the driverfs_dev field of the gendisk structure. In sysfs, that link is represented as a symbolic link between the two sub-trees.
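From user space, the symbolic link between the two sub-trees can be followed with readlink(). The sketch below wraps that call; the helper name read_link_target is invented, and a path such as /sys/block/hda/device is only an assumed example of where such a link might live on a given system.

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/* Read the target of a symbolic link into buf as a NUL-terminated string.
 * Returns the target length, or -1 on error or if buf may be too small. */
long read_link_target(const char *path, char *buf, size_t len)
{
    ssize_t n;

    assert(path != NULL && buf != NULL && len > 0);
    n = readlink(path, buf, len - 1);
    if (n < 0)
        return -1;
    if ((size_t)n >= len - 1)
        return -1;      /* readlink() truncates silently; treat as failure */
    buf[n] = '\0';      /* readlink() does not terminate the string */
    return (long)n;
}
```

Called on the link inside a disk's /sys/block directory, a helper like this would return a relative path leading into /sys/devices.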
Hopefully this series of pictures helps in the visualization of a portion of the sysfs tree and the device model data structure that implements it. The device model brings a great deal of apparent complexity, but, once the underlying concepts are grasped, the whole thing is approachable.
Patches and updates
Kernel trees
Architecture-specific
Core kernel code
Device drivers
Memory management
Networking
Security-related
Benchmarks and bugs
Miscellaneous
Page editor: Jonathan Corbet
