Kernel development [LWN.net]

Kernel release status

The current development kernel is 2.6.0-test3, released by Linus on August 8. Changes this time around include a merge of the SELinux security module, a new print_dev_t() function which is portable across architectures (and dev_t size changes), some power management and software suspend fixups, an ALSA update, a bunch of CPU frequency work, some disk readahead changes (avoiding work if the drive is too busy to do readahead anyway), and, of course, a vast number of fixes. There has also been an API change for block drivers; the Driver Porting series has been updated accordingly. The long-format changelog has the details, as usual.

Linus's BitKeeper tree contains only a small number of fixes as of this writing.

The current stable kernel is 2.4.21; Marcelo released the second 2.4.22 release candidate on August 8 with another set of fixes.


Coming soon: MSI support

Modern hardware manufacturers have a problem: too many pins. Often, one of the most expensive parts of a chip (or bus card) is simply connecting all of the wires. A chip that should be small and take up little board space can expand to several square centimeters to make room for the large number of leads required. So the hardware folks are very interested in anything that reduces pin counts; this is part of the motivation behind serial technologies like USB and serial ATA.

One target for pin-chopping engineers is interrupt lines. As a way of eliminating interrupt lines and moving further toward a "legacy free" environment, a (relatively) new PCI bus feature called "message signaled interrupts" (MSI) has been introduced. Essentially, MSI works by moving interrupts onto the data bus with the rest of the data traffic. An MSI-capable device signals an interrupt by writing a specific data value to a special address. The operating system can then trap that write and dispatch the interrupt accordingly.

Someday, in the future, all devices will do MSI and separate interrupt lines will no longer be necessary. Until then, there is one other advantage to the MSI scheme: devices can be assigned multiple message types, which can function as entirely separate interrupts. Thus a complicated device can indicate different situations with different messages, and each will be quickly routed to the appropriate service routine in the driver.
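The mechanism described above can be modeled in a few lines of user-space C. This is only a toy sketch, not code from the MSI patch: the names (toy_msi_register(), toy_bus_write(), toy_count_event()) and the address constant are hypothetical, and a real implementation programs the address and data into the device's PCI capability registers rather than keeping a table in software. The point is simply that an "interrupt" is a write of a particular data value to a particular address, and that different data values can be routed to different handlers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for this model only. */
#define TOY_MSI_ADDRESS  0xFEE00000u  /* the "special address" devices write to */
#define TOY_MAX_MESSAGES 16u          /* message data values 0..15 */

typedef void (*toy_msi_handler)(void *ctx);

struct toy_msi_slot {
    toy_msi_handler handler;
    void *ctx;
};

/* One slot per message data value; zero-initialized, so all start empty. */
static struct toy_msi_slot toy_msi_table[TOY_MAX_MESSAGES];

/* Bind a message data value to a service routine. Returns 0 or -1. */
int toy_msi_register(uint32_t data, toy_msi_handler handler, void *ctx)
{
    if (data >= TOY_MAX_MESSAGES || handler == NULL)
        return -1;
    toy_msi_table[data].handler = handler;
    toy_msi_table[data].ctx = ctx;
    return 0;
}

/*
 * Model a write arriving on the bus. Returns 1 if it was trapped and
 * dispatched as an interrupt, 0 if it was an ordinary write to some
 * other address, and -1 if it hit the MSI address with an unknown message.
 */
int toy_bus_write(uint32_t addr, uint32_t data)
{
    if (addr != TOY_MSI_ADDRESS)
        return 0;
    if (data >= TOY_MAX_MESSAGES || toy_msi_table[data].handler == NULL)
        return -1;
    toy_msi_table[data].handler(toy_msi_table[data].ctx);
    return 1;
}

/* Example service routine: count how often this message was signaled. */
void toy_count_event(void *ctx)
{
    assert(ctx != NULL);
    ++*(int *)ctx;
}
```

A device with two message types (say, "packet received" and "error") would register toy_count_event() twice with separate counters, then signal each condition with toy_bus_write(TOY_MSI_ADDRESS, 0) or toy_bus_write(TOY_MSI_ADDRESS, 1); each write reaches only its own routine.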

MSI is relatively new, and hardware support for MSI is just beginning to appear. The Linux kernel does not have support for MSI - yet. Tom Nguyen (of Intel) has posted a patch designed to change that state of affairs. His MSI patch is broken into two big chunks. The first adds a layer of indirection ("vector indexing") to the interrupt management code. The second then uses vector indexing to implement full MSI (and MSI-X, an extended version of MSI) support. Included in the patch is a documentation file (MSI-HOWTO.txt) describing MSI and the Linux implementation.

The MSI patch is far from inclusion into the mainline kernel; review on the linux-kernel list has brought out a lot of things that people would like to see changed first. But once things are ironed out, MSI could go in fairly quickly. It's late in the game to be reworking the 2.6 interrupt handling code, but MSI should be ready for an early 2.7 inclusion.


A different ATA driver

Much work has been done through 2.5 to improve the ATA/IDE layer. The work of Bartlomiej Zolnierkiewicz, Alan Cox, and others has brought a great deal of order and correctness to this code. Not everything that might have been hoped for at the beginning of 2.5 has been done, but things have clearly moved in the right direction.

Meanwhile, Jeff Garzik has been quietly developing a completely different driver for ATA drives; he posted libata 0.70 this week. Jeff's driver concentrates on newer hardware, with an emphasis on serial ATA drives. The interesting aspect of libata, however, is the approach it takes: it essentially functions as a translation layer which makes ATA drives appear to be SCSI devices. They are managed by the SCSI layer, and do not appear as IDE drives to the user at all.

This is not as strange a thing to do as one might think. The ATA protocol is heavily influenced by SCSI, so many SCSI commands can be passed through with little processing. But the real advantage of this approach seems to be that it can take advantage of the existing SCSI mid-layer. The SCSI code takes care of a lot of the work, and already supports a number of needed features (such as hotplugging). For a developer who wants to make a new, "legacy free" driver for modern ATA hardware, plugging into the SCSI layer offers a lot of advantages. This is especially true for serial ATA, which presents a lot of SCSI-like handling issues.
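To give a feel for what such a translation layer does, here is a minimal user-space sketch that turns a SCSI READ(10) command block into the register values for an ATA READ DMA command with 28-bit LBA addressing. It is not taken from libata: struct toy_taskfile and toy_read10_to_ata() are hypothetical names, and a real driver handles many more commands, 48-bit addressing, and error reporting. The opcodes and field layouts are the standard ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCSI_READ_10     0x28  /* SCSI READ(10) opcode */
#define ATA_CMD_READ_DMA 0xC8  /* ATA READ DMA opcode (28-bit LBA) */
#define ATA_DEV_LBA      0x40  /* device register bit: LBA addressing */

/* Hypothetical, simplified view of the ATA command registers. */
struct toy_taskfile {
    uint8_t command;
    uint8_t nsect;   /* sector count; 0 means 256 */
    uint8_t lbal;    /* LBA bits 7:0 */
    uint8_t lbam;    /* LBA bits 15:8 */
    uint8_t lbah;    /* LBA bits 23:16 */
    uint8_t device;  /* LBA mode bit plus LBA bits 27:24 */
};

/*
 * Translate a 10-byte READ(10) CDB into a taskfile.
 * Returns 0 on success, -1 if the command cannot be expressed as a
 * single 28-bit READ DMA (wrong opcode, empty or oversized transfer,
 * or an address beyond the 28-bit limit).
 */
int toy_read10_to_ata(const uint8_t cdb[10], struct toy_taskfile *tf)
{
    uint32_t lba;
    uint32_t count;

    assert(cdb != NULL && tf != NULL);

    if (cdb[0] != SCSI_READ_10)
        return -1;

    /* READ(10) carries a big-endian 32-bit LBA and 16-bit length. */
    lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
          ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    count = ((uint32_t)cdb[7] << 8) | (uint32_t)cdb[8];

    if (count == 0 || count > 256)
        return -1;
    if (lba > 0x0FFFFFFFu || lba + count > 0x10000000u)
        return -1;

    memset(tf, 0, sizeof(*tf));
    tf->command = ATA_CMD_READ_DMA;
    tf->nsect   = (uint8_t)(count & 0xFF);  /* 256 wraps to 0 by design */
    tf->lbal    = (uint8_t)(lba & 0xFF);
    tf->lbam    = (uint8_t)((lba >> 8) & 0xFF);
    tf->lbah    = (uint8_t)((lba >> 16) & 0xFF);
    tf->device  = (uint8_t)(ATA_DEV_LBA | ((lba >> 24) & 0x0F));
    return 0;
}
```

Everything above the translation function (queueing, timeouts, error handling, hotplug) is the sort of work that the SCSI mid-layer already provides, which is why a driver built this way can stay comparatively small.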

libata is not intended to replace the current IDE layer; it makes no attempt to handle the wide range of hardware that the IDE code copes with. It could be, however, the driver that many of us end up using in a couple of years or so. Sometimes you have to leave the old stuff behind and look to the future.


Too many threads?

In a discussion of problems in the current request_firmware() interface (discussed here last May), it was noted that firmware loads sometimes happen too slowly as a result of latency in the workqueue mechanism. The firmware interface uses the default workqueue, meaning that its tasks can wait an unknown amount of time behind other users of that queue. In some situations, at least, it seems that this delay can be too long. So a patch was posted which sets up a dedicated workqueue for firmware loading.
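The latency problem is easy to see with a simple model. The sketch below assumes a single worker that runs queued items strictly in order; toy_fifo_wait_ms() is a hypothetical helper, not a kernel function, and the run times fed to it are made-up numbers. A new item on a shared queue waits for everything already queued, while an item on an otherwise idle dedicated queue starts immediately.

```c
#include <assert.h>
#include <stddef.h>

/*
 * How long a newly queued item waits before it starts running, given the
 * run times (in milliseconds) of the items already ahead of it. Models one
 * worker processing the queue in FIFO order.
 */
unsigned long toy_fifo_wait_ms(const unsigned long *ahead_ms, size_t n)
{
    unsigned long wait = 0;
    size_t i;

    assert(ahead_ms != NULL || n == 0);

    for (i = 0; i < n; i++)
        wait += ahead_ms[i];
    return wait;
}
```

With three unrelated tasks of 5, 120, and 40 milliseconds ahead of it on the shared queue, a firmware load would not start for 165 milliseconds; with an empty dedicated queue the wait is zero. The cost of that improvement is the extra thread, as discussed next.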

Creating one's own workqueue can help with the latency problems, but it also loads the system with another kernel thread for each processor. And some people are starting to get a little unhappy with the number of such threads in 2.6. They are proliferating a bit; a quick check on your editor's mighty dual Pentium 450 system (running -test3) shows some 21 of them:

    2 ?        SW     0:00 [migration/0]
    3 ?        SWN    0:00 [ksoftirqd/0]
    4 ?        SW     0:00 [migration/1]
    5 ?        SWN    0:02 [ksoftirqd/1]
    6 ?        SW<    0:00 [events/0]
    7 ?        SW<    0:00 [events/1]
    8 ?        SW<    0:00 [kblockd/0]
    9 ?        SW<    0:01 [kblockd/1]
   10 ?        SW     0:00 [khubd]
   11 ?        SW     0:00 [kirqd]
   12 ?        SW     0:00 [pdflush]
   13 ?        SW     0:07 [pdflush]
   14 ?        SW     0:17 [kswapd0]
   15 ?        SW<    0:00 [aio/0]
   16 ?        SW<    0:00 [aio/1]
   17 ?        SW     0:00 [scsi_eh_0]
   18 ?        SW     0:00 [ahc_dv_0]
   19 ?        SW     0:00 [kseriod]
  142 ?        SW     0:01 [kjournald]
  143 ?        SW     0:00 [kjournald]
  144 ?        SW     0:05 [kjournald]

Kernel threads are not that expensive, but they do take up some kernel memory and clutter up ps listings. Imagine what the listing would look like on a system with a large number of processors. More to the point, many of these threads are likely to be unnecessary, and that bugs kernel hackers.

As a result, there will probably be a rework of the workqueue mechanism at some point, when somebody feels motivated to do it. One possible change would be to turn the default workqueue into a thread pool of sorts; if no thread is available when schedule_work() is called, a new one is created to handle the task. Some sort of timeout mechanism would trim the threads down when the load drops. It has also been noted that many users of workqueues don't really need a thread for every processor; a single thread would be adequate for the job. An interface change allowing the creator to specify whether per-CPU threads are needed could cut down on the number of threads considerably.
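The effect of letting a workqueue's creator opt out of per-CPU threads can be estimated with some simple arithmetic. The following sketch is illustrative only: enum toy_wq_mode, struct toy_wq_desc, and toy_wq_thread_count() are hypothetical and do not correspond to any existing or proposed kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical creation-time choice: one thread per CPU, or just one. */
enum toy_wq_mode { TOY_WQ_PER_CPU, TOY_WQ_SINGLE_THREAD };

struct toy_wq_desc {
    const char *name;
    enum toy_wq_mode mode;
};

/* Total kernel threads needed to back the given workqueues. */
unsigned int toy_wq_thread_count(const struct toy_wq_desc *queues, size_t n,
                                 unsigned int ncpus)
{
    unsigned int total = 0;
    size_t i;

    assert(queues != NULL || n == 0);

    for (i = 0; i < n; i++)
        total += (queues[i].mode == TOY_WQ_PER_CPU) ? ncpus : 1;
    return total;
}
```

The three workqueues visible in the listing above (events, kblockd, and aio) account for six threads on the two-processor system; on a 64-processor machine the same three queues would need 192. If, purely for the sake of the example, two of them could get by with a single thread each, the 64-processor count would drop to 66.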

Implementing changes of this nature would not be particularly difficult. Whether a rework of something as fundamental as the workqueue interface is appropriate at this stage of development is another question, however.


Any flavour you like

Just when you thought that we were safely done with the "spelling fixes" phase for this development series, out comes this patch changing all occurrences of "flavour" in the kernel to "flavor." The patch, of course, drew the usual complaints: spelling fixes are seen by many as useless code churn which breaks things and makes it hard for developers to keep their patches in sync with the mainline. There also seems to be a special animosity aimed at anybody who suggests that there should be a preference in the kernel between British and American spelling.

Linus actually jumped into this conversation. He agreed that, perhaps, a variable of type rpc_authflavor_t named authflavour could be confusing, but that was the extent of it:

"I think you guys who care should have a huge free-for-all, an electronic mud-wrestling thing if you will. But not on linux-kernel... Tell me when it's over."

For the most part, it would appear that kernel developers can continue to use whichever flavour of spelling they prefer.


Patches and updates

Linus Torvalds Linux 2.6.0-test3

Andrew Morton 2.6.0-test3-mm1

Andrew Morton 2.6.0-test3-mm2

Randy.Dunlap 2.6.0-test3-kj1 patchset

Andrew Morton 2.6.0-test2-mm5

Randy.Dunlap 2.6.0-test2-bk6-KJ patchset

Marcelo Tosatti Linux 2.4.22-rc2

Alan Cox Linux 2.4.22-rc2-ac1

Alan Cox Linux 2.4.21-rc1-ac1

Jeff Dike UML filesystems

Con Kolivas O14int

Con Kolivas O14.1int

Con Kolivas O15int for interactivity

Nigel Cunningham Announce: swsusp 1.1-rc2

Nigel Cunningham Announce: swsusp 1.1-pre3

Roland McGrath read_trylock for i386

Robert Williamson Linux Test Project August Release Announcement

Matt Mackall Netconsole debugging tool for 2.6

Gerd Knorr v4l: sysfs'ify videodev

long Updated MSI Patches

Jeff Garzik 2.6.x net driver updates

Jeff Garzik RFR: new SiS gige driver

Jeff Garzik libata update posted

Domen Puncer Plustek scanner driver (pt_drv) port to 2.6

Bartlomiej Zolnierkiewicz kill HDIO_GETGEO_BIG_RAW ioctl

Greg KH More PCI fixes for 2.6.0-test2

Greg KH More USB fixes for 2.6.0-test2

Suparna Bhattacharya Readahead issues and AIO read speedup

Alex Tomas file extents for EXT3

Matt Mackall Make cryptoapi non-optional?

kartikey bhatt CAST5 Cipher Algorithm for Kernel Cryptographic API.

Cliff White linux-2.6.0-test2 reaim results (flat text)

Paul Larson Ltp regression test results for 2.6.0-test3, bk1, mm1, mm2

Cyril Bortolato gmodconfig 0.4 is released

Kurt Garloff scsidev-2.30 released
