Brief items
The current stable 2.6 kernel is 2.6.16.20, released on June 5. This
one contains several fixes for serious problems; none of them looks
immediately security-related, however.
The current 2.6 prepatch is 2.6.17-rc6, released on June 5. There
are enough fixes here that Linus decided to do one more -rc release.
Details can be found in the
long-format changelog.
No patches have been merged into the mainline repository since
-rc6, as of this writing.
The current -mm tree is 2.6.17-rc6-mm1. Recent changes
to -mm include improved force feedback support in the input driver and a
large number of patches related to the locking validator.
Kernel development news
The older policy was to get stuff roughly right, merge it into a
tree then beat on it. Now everyone is blocking anything that is the
slightest imperfect which makes it impossible to add anything large
to the tree because it will *never* be perfect before a merge and
hack session and it will never be perfect in everyones eyes...
Perfection is the enemy of progress and of success. We risk moving
back to the case we got into in 2.4 when merging got so hard that
most vendors shipped kernels bearing no relationship to the
"upstream" tree. Probably worse this time as there is no common
"unofficial" tree like -ac so they will all ship different variants
and combinations.
-- Alan Cox
The 2.6.17 development cycle is coming to an end, with the final release
likely to happen before the middle of June. So, naturally, the attention
of the kernel developers is turning toward the 2.6.18 cycle. As a way of
encouraging thought on what should happen then, Andrew Morton has posted
a 2.6.18 merge plan summary
describing how he expects to dispose of the patches currently sitting in the
-mm tree. There has been occasional talk of doing a bugfix-only kernel
cycle, but it's clear that 2.6.18 won't be that cycle - there are
a lot of patches tagged for merging.
The features which are expected to be merged are interesting, but they are
best discussed once they hit the mainline repository; until then, their
fate remains uncertain. So, for now, suffice it to say that 2.6.18 will
likely include an S/390 hypervisor filesystem, a number of memory
management patches, some software suspend improvements, a new i386 hardware
clock subsystem, some SMP scheduler improvements, the swap prefetch patches (maybe),
priority-inheriting futexes,
a rework of the /proc/pid code, a number of MD (RAID)
improvements, a new kernel-space inotify API, and a bunch of code from
subsystem trees which does not appear in -mm directly. As is usual, a
great deal of code will be flowing into the mainline for the next release.
It can also be interesting to look at what will not be merged. From
Andrew's posting, the following big patch sets are likely to be held back:
- There is a great deal of code which requires action by various
subsystem maintainers. But, says Andrew, "I continue to have
some difficulty getting this material processed." He will step
up his efforts to get responses from maintainers, but some patches
will likely continue to languish.
In particular, some dismay has been expressed regarding how long it
can take to get drivers into the mainline. It seems that, perhaps,
the quality bar is being set too high. It is always possible to find
things to criticize in a body of code, but sometimes the best thing to
do is to proceed with the code one has and improve it as part of an
ongoing process. There is concern that reviewers are insisting on
perfection and keeping out code which is good enough, and which could
be of value to Linux users.
- The acx100
driver supports a useful range of wireless chipsets.
Unfortunately, there are some concerns about how this driver was
developed and whether its inclusion could cause legal problems for
Linux. Until that issue is resolved, this driver is likely to remain
out in the cold.
- The per-task delay accounting patches are sitting on the edge. The
main concern here appears to be that these patches create a new
interface for getting per-task information from the kernel. Any other
new code which exports that sort of information (and a number of
patches exist) will be expected to use this new API. So more review
and discussion may be called for here. There is also a separate patch
set for non-task-oriented statistics which will probably not be merged
this time around for the same reason.
- eCryptfs is uncertain as
well. This filesystem implements its own mechanism for stacking on
top of a base filesystem, but the primary reviewer would rather see
the creation of a generic stacking layer for all to use. This is an
issue which is often encountered by people trying to do new things;
they are asked to make their infrastructure more generic. The intent
is good, but it can cause delays and extra work for developers trying to
add new features.
- The UTS namespaces patch. This patch, which implements a small part
of the container concept, is not particularly useful on its own. So
it will probably wait until more of the container infrastructure is in
place.
- The adaptive readahead
patches are deemed to be too young for now. Some benchmark
results show significant performance improvements from these patches,
but others are less clear.
- Reiser4. Says Andrew: "We need to do something about this. It
does need an intensive review and there aren't many people who have
the experience to do that right, and there are fewer who have the
time. Uptake by a vendor or two would be good." This
filesystem has been waiting on the sidelines for a very long time, and
no prospective merge is yet in sight.
- The generic IRQ code is
said to be "still stabilizing" and more likely to be merged in
2.6.19. That is also the case for the lock validator.
All of this is subject to change when the merge window actually opens.
Developers are making cases for specific patches; Ingo Molnar is asking for
reconsideration of the generic IRQ and lock validator patches, for
example. Watch this space in the coming weeks to see what really happens.
Kernel bugs are bad news. Among the worst bugs are regressions -
situations where a once-working system breaks after a kernel upgrade. The
kernel developers have been taking an increasingly hard line against
regressions; patches which break working systems will usually be reverted,
even if those patches fix other problems. The idea, as pushed by Linus, is
that once a system works, it should
continue to work into the future.
As it happens, a number of USB users have found that, on upgrading to
2.6.16, their systems do not work anymore. But, in this case, this
"regression" is not seen as such by the developers and is not likely to
change. This issue is a good demonstration of the sort of tradeoffs which
operating system developers must make.
USB ports can supply power to the devices plugged into them; this power is
sufficient to drive many devices, as well as totally unrelated items (such
as USB-powered LED lamps). There are limits to the amount of power which
can be supplied, however. USB devices will communicate their maximum
current draw to the host, which can then decide whether it has the capacity
available or not. If sufficient power is not available, the device will
not be allowed to configure itself and operate.
There are many rules in the USB specification on how power configuration
should work. One of those applies to unpowered USB hubs - the ones which
lack a power supply of their own. The total current drawn by an unpowered
hub cannot be allowed to exceed what the host can supply; in particular,
the USB specification limits devices on unpowered USB hubs to 100 mA of
current. Even if only one hub port is in use, that single port is limited
to that value, despite the fact that a larger draw should work in that situation.
Prior to 2.6.16, the Linux kernel did not actually check power requirements
before configuring devices. With 2.6.16, however, any device whose stated
maximum power requirement exceeds 100 mA will not be allowed to
configure itself on an unpowered hub. Thus, devices which worked in that
mode in earlier kernels now fail to operate; not all users are entirely
pleased.
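The check described above can be sketched in a few lines. This is only an illustration of the rule, not the kernel's code: the function names are hypothetical, the 500 mA figure for a host or powered-hub port is an assumption taken from the USB 2.0 limit for high-power ports (the text above only gives the 100 mA figure), and the conversion helper assumes the USB 2.0 convention that the bMaxPower descriptor field counts in units of 2 mA.

```python
# Illustrative sketch only; names and structure are hypothetical, not kernel code.

UNPOWERED_HUB_PORT_MA = 100    # per-port limit behind an unpowered hub (from the text)
POWERED_PORT_MA = 500          # assumption: USB 2.0 budget for a host or powered-hub port


def bmaxpower_to_ma(bmaxpower: int) -> int:
    """Convert a USB 2.0 bMaxPower descriptor value (2 mA units) to milliamps."""
    return bmaxpower * 2


def port_budget_ma(behind_unpowered_hub: bool) -> int:
    """Return the current budget available to a single port."""
    return UNPOWERED_HUB_PORT_MA if behind_unpowered_hub else POWERED_PORT_MA


def may_configure(max_power_ma: int, behind_unpowered_hub: bool) -> bool:
    """Allow configuration only if the stated maximum draw fits the port budget."""
    return max_power_ma <= port_budget_ma(behind_unpowered_hub)
```

Under these assumptions, a device declaring a 200 mA maximum would be refused behind an unpowered hub, yet accepted when plugged directly into the host. Note that the decision is based on the device's stated maximum, not on what it actually draws, which is why devices that happened to work before can now be refused.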
The argument has been made that, since these configurations almost always
work in the real world, the kernel should not be shutting them down now.
The fact is, however, that running hardware outside of its specifications
is always a dangerous thing to do. Often one will get away with it, but
sometimes things can fail badly. Many USB devices are mass storage
devices; the consequences of power-related problems with these devices
could include corrupted data and damaged hardware. These are not consequences
which the USB developers wish to inflict on their users, so, instead, they
refuse to operate devices out of their specifications.
To the developers, the fact that some previously-working hardware now fails
to operate is not a regression. It is a bug fix, with the kernel finally
performing some due diligence which should have been happening all along.
They do not intend to change this behavior.
As it happens, it is possible to convince the kernel to override its
good sense and configure the device anyway. It is not easy, however.
Essentially, the steps are these:
Needless to say, this sequence of steps is not entirely easy - and it must
be repeated each time the device is plugged in. For those who are
comfortable writing udev rules, this configuration change can be
automated without too much trouble. Perhaps the desktop environments will
eventually be made smart enough to detect this situation and offer (with
suitable scary warnings) to override the kernel for specific devices. But
it might just be better to buy a powered hub or plug the device directly
into the host.
A great deal of work has gone into making the Linux scheduler work well on
multiprocessor systems. Whenever it appears to make sense, the scheduler
will shift processes from one CPU to another in order to keep all CPUs
equally busy (in an approximate sense), but, since moving a process is
expensive, the scheduler tries to avoid unnecessary moves. SMP performance
was problematic on early 2.6 releases, but it has been reasonably solid for
the last couple of years.
There is one situation, however, where the current scheduler does not work
as well as one would like. Imagine a simple system with two processors.
If two CPU-bound processes, each running at normal priority, are started on
this system, the scheduler will eventually run one process on each CPU. If
two niced (low-priority) processes (also CPU-bound) are then started, one
would normally expect the scheduler to ensure that those processes get less
CPU time than the normal-priority processes.
If the processes are distributed such that one normal-priority and one
low-priority process end up on each CPU, that expectation will be met; the
low-priority processes will get a relatively small amount of CPU time. It
is just as likely, however, that both normal-priority processes will end up
on the same CPU, with the two low-priority processes on the other. In this
case, the two normal-priority processes will be contending for the same
CPU, while the low-priority processes fight for the other. As a result,
the low-priority processes will get as much CPU time as the others, their
reduced priority notwithstanding. That is almost certainly not what the
user had in mind when the process priorities were set.
The problem is that the scheduler looks only at the length of the run queue
on each CPU, without taking priorities into account. So, in either case
above, the CPUs appear to be equally busy, and no redistribution of
processes will occur. To fix this problem, the load balancing code must be
made to understand that not all running processes are created equal.
A solution can be found in the "smpnice" patch set, implemented by Peter
Williams with input from a number of other developers. The smpnice code
changes the load balancer so that it does not just look at run queue
lengths. Instead, each process is assigned a "load weight," which is
derived from its priority. When load balancing decisions are made, the
scheduler compares total load weights rather than the length of the run
queues. If a load weight imbalance is detected, the scheduler will move a
process to bring things back into line. If the imbalance is large,
high-priority processes will be moved; when the imbalance is small,
however, a low-priority process will be moved instead.
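The difference between the two balancing criteria can be shown with a toy model. This is a sketch under stated assumptions, not the smpnice implementation: the linear mapping from nice value to load weight, the threshold separating "large" from "small" imbalances, and all function names are hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    nice: int  # -20 (highest priority) through 19 (lowest priority)


def load_weight(task):
    """Hypothetical mapping: nice 0 -> 20, nice 19 -> 1. Not the real smpnice formula."""
    return 20 - task.nice


def queue_length_imbalance(cpu_a, cpu_b):
    """What a length-only balancer sees: the difference in run queue lengths."""
    return len(cpu_a) - len(cpu_b)


def weight_imbalance(cpu_a, cpu_b):
    """What a weight-aware balancer sees: the difference in total load weight."""
    return sum(map(load_weight, cpu_a)) - sum(map(load_weight, cpu_b))


def pick_task_to_move(busiest, imbalance, large_threshold=20):
    """Large imbalance: move a high-priority task; small: move a low-priority one."""
    if not busiest:
        return None
    chooser = max if imbalance >= large_threshold else min
    return chooser(busiest, key=load_weight)


def balance(cpu_a, cpu_b, large_threshold=20, max_moves=10):
    """Move tasks between two run queues until the weights stop improving."""
    for _ in range(max_moves):
        diff = weight_imbalance(cpu_a, cpu_b)
        if diff == 0:
            break
        busiest, other = (cpu_a, cpu_b) if diff > 0 else (cpu_b, cpu_a)
        task = pick_task_to_move(busiest, abs(diff), large_threshold)
        # Moving a task of weight w changes the difference by 2*w;
        # stop rather than make a move that would not reduce the imbalance.
        if task is None or abs(abs(diff) - 2 * load_weight(task)) >= abs(diff):
            break
        busiest.remove(task)
        other.append(task)


def example_cpus():
    """The problematic placement: both normal tasks on one CPU, both niced on the other."""
    cpu0 = [Task("normal1", 0), Task("normal2", 0)]
    cpu1 = [Task("low1", 19), Task("low2", 19)]
    return cpu0, cpu1
```

With the placement returned by `example_cpus()`, `queue_length_imbalance()` reports no difference between the two CPUs, so a length-only balancer does nothing. `weight_imbalance()` reports a large difference, and `balance()` first moves a normal-priority task, then a low-priority one, leaving one task of each kind on each CPU.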
The basic idea makes sense, but this set of patches has been a long time in
development. The scheduling code is full of subtle heuristics which are
easily upset. So early versions of the smpnice patches caused benchmark
regressions and ran into a number of difficulties. For example, a
processor running a very high-priority process will tend to appear to be
the most heavily loaded, with the result that load balancing no longer
occurs between other processors on the system. This problem was fixed by
ignoring processors which have no movable processes. Some load
balancing heuristics which would move high-priority processes were broken,
resulting in suboptimal scheduling decisions; now, if a process would have
the highest priority on the new CPU, it is considered first for moving.
Various stability problems, where processes would oscillate between
processors, have also been ironed out.
With all of these fixes applied, the smpnice code appears to be
stabilizing, with the result that it might just make it into the 2.6.18
kernel. That should improve life for people running multiple-priority
workloads on SMP systems.
Patches and updates
Development tools
- Marco Costalba: qgit 1.3. (June 5, 2006)
- Jonas Fonseca: tig 0.4. (June 6, 2006)
Page editor: Jonathan Corbet