
Kernel development

Brief items

Kernel release status

The current stable 2.6 kernel is, released on June 5. This one contains several fixes for serious problems; none of them look immediately security-related, however.

The current 2.6 prepatch is 2.6.17-rc6, released on June 5. There are enough fixes here that Linus decided to do one more -rc release. Details can be found in the long-format changelog.

No patches have been merged into the mainline repository since -rc6, as of this writing.

The current -mm tree is 2.6.17-rc6-mm1. Recent changes to -mm include improved force feedback support in the input driver and a large number of patches related to the locking validator.


Kernel development news

Quote of the week

The older policy was to get stuff roughly right, merge it into a tree then beat on it. Now everyone is blocking anything that is the slightest imperfect which makes it impossible to add anything large to the tree because it will *never* be perfect before a merge and hack session and it will never be perfect in everyone's eyes...

Perfection is the enemy of progress and of success. We risk moving back to the case we got into in 2.4 when merging got so hard that most vendors shipped kernels bearing no relationship to the "upstream" tree. Probably worse this time as there is no common "unofficial" tree like -ac so they will all ship different variants and combinations.

-- Alan Cox


What's not going into 2.6.18

The 2.6.17 development cycle is coming to an end, with the final release likely to happen before the middle of June. So, naturally, the attention of the kernel developers is turning toward the 2.6.18 cycle. As a way of encouraging thought on what should happen then, Andrew Morton has posted a 2.6.18 merge plan summary describing how he expects to dispose of the patches currently sitting in the -mm tree. There has been occasional talk of doing a bugfix-only kernel cycle, but it's clear that 2.6.18 won't be that cycle - there are a lot of patches tagged for merging.

The features which are expected to be merged are interesting, but they are best discussed once they hit the mainline repository; until then, their fate remains uncertain. So, for now, suffice it to say that 2.6.18 will likely include an S/390 hypervisor filesystem, a number of memory management patches, some software suspend improvements, a new i386 hardware clock subsystem, some SMP scheduler improvements, the swap prefetch patches (maybe), priority-inheriting futexes, a rework of the /proc/pid code, a number of MD (RAID) improvements, a new kernel-space inotify API, and a bunch of code from subsystem trees which does not appear in -mm directly. As is usual, a great deal of code will be flowing into the mainline for the next release.

It can also be interesting to look at what will not be merged. From Andrew's posting, the following big patch sets are likely to be held back:

  • There is a great deal of code which requires action by various subsystem maintainers. But, says Andrew, "I continue to have some difficulty getting this material processed." He will step up his efforts to get responses from maintainers, but some patches will likely continue to languish.

    In particular, some dismay has been expressed regarding how long it can take to get drivers into the mainline. It seems that, perhaps, the quality bar is being set too high. It is always possible to find things to criticize in a body of code, but sometimes the best thing to do is to proceed with the code one has and improve it as part of an ongoing process. There is concern that reviewers are insisting on perfection and keeping out code which is good enough, and which could be of value to Linux users.

  • The acx100 driver supports a useful range of wireless chipsets. Unfortunately, there are some concerns about how this driver was developed and whether its inclusion could cause legal problems for Linux. Until that issue is resolved, this driver is likely to remain out in the cold.

  • The per-task delay accounting patches are sitting on the edge. The main concern here appears to be that these patches create a new interface for getting per-task information from the kernel. Any other new code which exports that sort of information (and a number of patches exist) will be expected to use this new API. So more review and discussion may be called for here. There is also a separate patch set for non-task-oriented statistics which will probably not be merged this time around for the same reason.

  • eCryptfs is uncertain as well. This filesystem implements its own mechanism for stacking on top of a base filesystem, but the primary reviewer would rather see the creation of a generic stacking layer for all to use. This is an issue which is often encountered by people trying to do new things; they are asked to make their infrastructure more generic. The intent is good, but it can cause delays and extra work for developers trying to add new features.

  • The UTS namespaces patch. This patch, which implements a small part of the container concept, is not particularly useful on its own. So it will probably wait until more of the container infrastructure is in place.

  • The adaptive readahead patches are deemed to be too young for now. Some benchmark results show significant performance improvements from these patches, but others are less clear.

  • Reiser4. Says Andrew: "We need to do something about this. It does need an intensive review and there aren't many people who have the experience to do that right, and there are fewer who have the time. Uptake by a vendor or two would be good." This filesystem has been waiting on the sidelines for a very long time, and no prospective merge is yet in sight.

  • The generic IRQ code is said to be "still stabilizing" and more likely to be merged in 2.6.19. That is also the case for the lock validator.

All of this is subject to change when the merge window actually opens. Developers are making cases for specific patches; Ingo Molnar is asking for reconsideration of the generic IRQ and lock validator patches, for example. Watch this space in the coming weeks to see what really happens.


Putting a lid on USB power

Kernel bugs are bad news. Among the worst bugs are regressions - situations where a once-working system breaks after a kernel upgrade. The kernel developers have been taking an increasingly hard line against regressions; patches which break working systems will usually be reverted, even if those patches fix other problems. The idea, as pushed by Linus, is that once a system works, it should continue to work into the future.

As it happens, a number of USB users have found that, on upgrading to 2.6.16, their systems do not work anymore. But, in this case, this "regression" is not seen as such by the developers and is not likely to change. This issue is a good demonstration of the sort of tradeoffs which operating systems developers must make.

USB ports can supply power to the devices plugged into them; this power is sufficient to drive many devices, as well as totally unrelated items (such as USB-powered LED lamps). There are limits to the amount of power which can be supplied, however. USB devices will communicate their maximum current draw to the host, which can then decide whether it has the capacity available or not. If sufficient power is not available, the device will not be allowed to configure itself and operate.

There are many rules in the USB specification on how power configuration should work. One of those applies to unpowered USB hubs - the ones which lack a power supply of their own. The total current drawn by an unpowered hub cannot be allowed to exceed what the host can supply; in particular, the USB specification limits devices on unpowered USB hubs to 100 mA of current. Even if only one hub port is in use, that single port is limited to that value, despite the fact that a larger draw should work in that situation.

Prior to 2.6.16, the Linux kernel did not actually check power requirements before configuring devices. With 2.6.16, however, any device whose stated maximum power requirement exceeds 100 mA will not be allowed to configure itself on an unpowered hub. Thus, devices which worked in that mode in earlier kernels now fail to operate; not all users are entirely pleased.

The argument has been made that, since these configurations almost always work in the real world, the kernel should not be shutting them down now. The fact is, however, that running hardware outside of its specifications is always a dangerous thing to do. Often one will get away with it, but sometimes things can fail badly. A fairly large class of USB devices are mass storage devices; the consequences of power-related problems with these devices could include corrupted data and damaged hardware. These are not consequences which the USB developers wish to inflict on their users, so, instead, they refuse to operate devices out of their specifications.

To the developers, the fact that some previously-working hardware now fails to operate is not a regression. It is a bug fix, with the kernel finally performing some due diligence which should have been happening all along. They do not intend to change this behavior.

As it happens, it is possible to convince the kernel to override its good sense and configure the device anyway. It is not easy, however. Essentially, the steps are these:

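  • Run lsusb -v and find the entry for the device of interest. Your editor's USB mouse, for example, is described by an entry starting "Bus 001 Device 003: ID 046d:c01b Logitech, Inc. MX310 Optical Mouse". This mouse is plugged into a hub listed previously as being "Bus 001, Device 002". In this case, these numbers turn into the device path "1-2.3"; this path is important. (Strictly, sysfs builds that path from the bus number and the chain of hub port numbers, which need not match the device numbers lsusb prints; lsusb -t shows which ports are in use.)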

  • Under that same device entry will be found one or more possible device configurations, along with their associated power requirements. Each of these configurations includes a bConfigurationValue number describing it. The number associated with the desired configuration must be found; in many cases it is one.

  • Force the device configuration with a line like:

        echo -n 1 > /sys/bus/usb/devices/1-2.3/bConfigurationValue

    The configuration value and path number must be replaced with the actual values determined from the lsusb output, and the command must be run as root.

Needless to say, this sequence of steps is not entirely easy - and it must be repeated each time the device is plugged in. For those who are comfortable writing udev rules, this configuration change can be automated without too much trouble. Perhaps the desktop environments will eventually be made smart enough to detect this situation and offer (with suitable scary warnings) to override the kernel for specific devices. But it might just be better to buy a powered hub or plug the device directly into the host.



A great deal of work has gone into making the Linux scheduler work well on multiprocessor systems. Whenever it appears to make sense, the scheduler will shift processes from one CPU to another in order to keep all CPUs equally busy (in an approximate sense), but, since moving a process is expensive, the scheduler tries to avoid unnecessary moves. SMP performance was problematic on early 2.6 releases, but it has been reasonably solid for the last couple of years.

There is one situation, however, where the current scheduler does not work as well as one would like. Imagine a simple system with two processors. If two CPU-bound processes, each running at normal priority, are started on this system, the scheduler will eventually run one process on each CPU. If two niced (low-priority) processes (also CPU-bound) are then started, one would normally expect the scheduler to ensure that those processes get less CPU time than the normal-priority processes.

If the processes are distributed such that one normal-priority and one low-priority process end up on each CPU, that expectation will be met; the low-priority processes will get a relatively small amount of CPU time. It is just as likely, however, that both normal-priority processes will end up on the same CPU, with the two low-priority processes on the other. In this case, the two normal-priority processes will be contending for the same CPU, while the low-priority processes fight for the other. As a result, the low-priority processes will get as much CPU time as the others, their reduced priority notwithstanding. That is almost certainly not what the user had in mind when the process priorities were set.

The problem is that the scheduler looks only at the length of the run queue on each CPU, without taking priorities into account. So, in either case above, the CPUs appear to be equally busy, and no redistribution of processes will occur. To fix this problem, the load balancing code must be made to understand that not all running processes are created equal.

A solution can be found in the "smpnice" patch set, implemented by Peter Williams with input from a number of other developers. The smpnice code changes the load balancer so that it does not just look at run queue lengths. Instead, each process is assigned a "load weight," which is derived from its priority. When load balancing decisions are made, the scheduler compares total load weights rather than the length of the run queues. If a load weight imbalance is detected, the scheduler will move a process to bring things back into line. If the imbalance is large, high-priority processes will be moved; when the imbalance is small, however, a low-priority process will be moved instead.

The basic idea makes sense, but this set of patches has been a long time in development. The scheduling code is full of subtle heuristics which are easily upset. So early versions of the smpnice patches caused benchmark regressions and ran into a number of difficulties. For example, a processor running a very high-priority process will tend to appear to be the most heavily loaded, with the result that load balancing no longer occurs between other processors on the system. This problem was fixed by ignoring processors which have no processes which can be moved. Some load balancing heuristics which would move high-priority processes were broken, resulting in suboptimal scheduling decisions; now, if a process would have the highest priority on the new CPU, it is considered first for moving. Various stability problems, where processes would oscillate between processors, have also been ironed out.

With all of these fixes applied, the smpnice code appears to be stabilizing, with the result that it might just make it into the 2.6.18 kernel. That should improve life for people running multiple-priority workloads on SMP systems.


Patches and updates

Development tools

  • Marco Costalba: qgit 1.3. (June 5, 2006)
  • Jonas Fonseca: tig 0.4. (June 6, 2006)


Page editor: Jonathan Corbet

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds