The current 2.6 prepatch remains 2.6.19-rc5; no prepatches have been
released in the last week. Enough patches have found their way into the
mainline git repository that a 2.6.19-rc6 release will probably happen
before this kernel cycle runs its course.
The current -mm tree is 2.6.19-rc5-mm2. Recent changes
to -mm include the fault injection capability (see below), file-based
capabilities, and a backport of the ext3 reservation code to ext2.
For 2.6.16 users, Adrian Bunk has released a new 2.6.16.y stable update with a number of fixes.
Kernel development news
70% hit a bug
1/7th think it's deteriorating
1/4th think lkml response is inadequate
3/5ths think bugzilla response is inadequate
2/5ths think we have features-vs-stability wrong
2/3rds hit a bug. Of those, 1/3rd remain unfixed
1/5th of users are presently impacted by a kernel bug
Happy with that?
-- Andrew Morton
The time stamp counter (TSC) is a hardware feature found on a number of
contemporary processors. The TSC is a special register which is simply
incremented every clock cycle. Since the clock is the fundamental unit of
time as seen by the processor, the TSC provides the highest-resolution
timing information available for that processor. It can thus be used for a
number of applications, such as measuring the exact time cost of specific
instructions or operations.
The TSC can also be read quickly (it is just a CPU register, after all),
making it of interest for system timekeeping. There are a lot of
applications which check the current time frequently, to the point that
gettimeofday() is one of the most performance-critical system
calls in Linux. By using the TSC to interpolate within the resolution of a coarser
clock, the system can give accurate, high-resolution time without taking a
lot of time in the process.
That is the idea, anyway. In practice, the TSC turns out to be hard to use
in this way. If the CPU frequency changes (as it will on CPUs which can
vary their power consumption), the TSC rate will change as well. If the
processor is halted (as can happen when it goes idle), the TSC may stop
altogether. On multiprocessor systems, the TSCs on different processors
may drift away from each other over time - leading to a situation where a
process could read a time on one CPU, move to a second processor, and
encounter a time earlier than the one it read on the first processor.
These challenges notwithstanding, the Linux kernel tries to make the best
use of the TSC possible. The code which deals with the TSC contains a
number of checks to try to detect situations where TSC-based time might not
be reliable. One of those checks, in particular, compares TSC time against
the jiffies count, which is incremented by way of the timer tick. If,
after ten seconds' worth of ticks, the number of TSC cycles seen differs
from what would have been expected, the kernel concludes that the TSC is
not stable and stops using it for time information.
Interesting things happen when the dynamic tick patch is thrown into the
mix. With dynamic ticks, the periodic timer interrupt is turned off
whenever there's nothing to be done in the near future, allowing the
processor to remain idle for longer and consume less power. Once something
happens, however, the jiffies count must be updated to reflect the
timer ticks which were missed - something which is generally done by
obtaining the time from another source. At best, this series of events
defeats the test which ensures that the TSC is operating in a stable
manner; at worst, it can lead to corrupted system time. Not a good state of affairs.
For this reason, the recently-updated high-resolution timers and dynamic
tick patch set includes a change which disables use of the TSC. It
seems that the high-resolution timers and dynamic tick features are
incompatible with the TSC - and that people configuring kernels must choose
between the two. Since the TSC does have real performance benefits,
disabling it has predictably made some people unhappy, to the point that
some would prefer to see the timer patches remain out of the kernel for now.
In response to the objections, Ingo Molnar has explained things this way:
We just observed that in the past 10 years no generally working
TSC-based gettimeofday was written (and i wrote the first version
of it for the Pentium, so the blame is on me too), and that we
might be better off without it. If someone can pull off a working
TSC-based gettimeofday() implementation then there's no objection.
Ingo has also posted a test program which
demonstrates that time inconsistencies on TSC-based systems are common - at
least, when multiple processors are in use.
Arjan van de Ven has suggested a "duct
tape" solution which might work well enough "to keep the illusion alive."
It involves setting up offsets and multipliers for each processor's TSC.
Between the offsets (which could compensate for TSC drift between
processors) and the multipliers (which adjust for frequency changes), some
semblance of synchronized and accurate TSC-based time could be maintained -
as long as the kernel is able to detect TSC-related events and adjust those
values accordingly. No code which implements this idea has yet been posted, however.
The conversation faded out with no real conclusion, though, near the end,
Thomas Gleixner did note that the complete
disabling of the TSC was "overkill." The preferred solution, which he is
working on, is to keep the system from going into the dynamic tick mode if
there is no other reliable timer available. Once that code has been
posted, it may be possible to have the full set: high-resolution timers,
dynamic ticks, and fast clocks using the TSC.
Some kernel developers, doubtless, feel that their systems fail too often
as it is; they certainly would not go out looking for ways to make more
trouble. Others, however, are most interested in how their code behaves
when things go wrong. As your editor recently discovered
to his chagrin, error paths tend to be debugged rather less well than the
"normal" code. One can try to anticipate possible failures and try to code
the right response, but it can be hard to actually test that code. So
error-handling paths can be incorrect (or missing) but the code will appear
to work - until something blows up.
In an attempt to help test kernel error handling, Akinobu Mita has been
working for some time on a framework for injecting faults into a running
kernel. By causing things to go wrong occasionally, the fault injection
code should help to ensure that error situations are handled - and handled
correctly. This mechanism has found its way into 2.6.19-rc5-mm2 where, hopefully,
it will be employed by developers to make sure that their code is robust in the face of failures.
The framework can cause memory allocation failures at two levels: in the
slab allocator (where it affects kmalloc() and most other
small-object allocations) and at the page allocator level (where it affects
everything, eventually). There are also hooks to cause occasional disk I/O
operations to fail, which should be useful for filesystem developers. In
both cases, there is a flexible runtime configuration infrastructure, based
on debugfs, which will let developers focus fault injections into a
specific part of the kernel.
Your editor built a version of 2.6.19-rc5-mm2 with the fault injection
capability turned on. For whatever reason, the configuration system
insisted that the locking validator be enabled too; perhaps somebody
injected a fault into the config scripts. In any case, the resulting
kernel exports a directory (in debugfs) for each of the available fault injection capabilities.
So, for example, the slab allocation capability has a directory
failslab. At system boot, failure injection is turned off; slab
failures can be enabled by writing an integer value to the
failslab/probability file. The value
written there will be interpreted as the percent probability that any given allocation
will fail; so writing "5" will cause a 5% failure rate. For
situations where a failure rate of less than 1% (but greater than zero) is
needed, there is a separate interval value which further filters
the result. So a 0.1% failure rate could be had by setting
interval to 1000 and probability to 100 - preferably in
that order. There is also a times variable which puts an upper
limit on the number of failures which will be simulated.
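Putting those knobs together, a session might look like the following. This assumes debugfs is mounted in the usual place and a kernel built with the failslab option; the specific limits chosen are arbitrary:

```shell
# Assumes a fault-injection kernel with debugfs at /sys/kernel/debug.
cd /sys/kernel/debug/failslab

# Limit the damage before turning anything on:
echo 100 > times           # simulate at most 100 failures

# A 0.1% failure rate: set interval first, then probability,
# so injection does not briefly run at the full 100% rate.
echo 1000 > interval
echo 100  > probability

# ... exercise the code under test ...

# Turn injection back off:
echo 0 > probability
```

Setting times first is the same cautious ordering the interval/probability advice suggests: bound the experiment before arming it.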
As it happens, randomly injecting failures into the kernel as a whole does
not necessarily lead to a lot of useful information for a developer, who is
probably interested in the behavior of a specific subsystem. There is only
so long that one can put up with basic shell commands failing while trying
to make something happen in one particular driver. So there are a number
of options which can be used to focus the faults on a particular part of
the kernel. These include:
- task-filter: if this variable is set to a positive value, faults will
only be injected when specially-marked processes are running. To
enable this marking, each process has a new flag
(make-it-fail) in its /proc directory; setting that
value to one will cause faults to be injected into that process.
- address-start and address-stop: if these values are
set, fault injection will be concentrated on the code found within the
address range specified. As long as any entry within the call chain
is inside that address range, the fault injection code will consider
causing a failure.
- ignore-gfp-wait: if this value is set to one, only
non-waiting (GFP_ATOMIC) allocations will potentially fail.
There is also an ignore-gfp-highmem option which will cause
failures not to be injected into high-memory allocations.
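A focused session using these filters might look like this; my-test-program is a hypothetical program under test, and the file names are those described above:

```shell
# Focus slab failures on a single marked process.  Assumes the
# debugfs and /proc interfaces described above; my-test-program
# is a hypothetical target.
cd /sys/kernel/debug/failslab
echo 1 > task-filter           # only marked processes see failures
echo 1 > ignore-gfp-wait       # only GFP_ATOMIC allocations may fail

./my-test-program &            # start the target in the background
echo 1 > /proc/$!/make-it-fail # mark it ($! is its PID)
wait
```

With task-filter enabled, the rest of the system runs normally; only the marked process has to cope with failing allocations, which keeps basic shell commands working while the test runs.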
Various other options exist; there is also a set of boot options
for turning on injection which might be useful for debugging early system
initialization. The documentation file has
the details. Also found in the documentation directory are a couple of
scripts for concentrating faults on a specific command or module.
The end result of all this is a useful tool. One need no longer simply
hope that the error recovery paths in a piece of kernel code work properly;
it is now possible to actually run them and see what happens. This should
lead to a better tested, more robust kernel in the near future, and that
can only be a good thing.
The Atheros family of wireless chipsets finds its way into a number of
network adapters and laptop systems. It is a flexible and capable device,
with one little limitation: there is no free Linux driver available. Linux
support can be had via the freely-downloadable MadWifi driver, but, at the core of that
driver, there is a binary-only "hardware access layer" (HAL) module which
does much of the real work. This module has all of the problems associated
with proprietary drivers: it cannot be audited or fixed, it cannot be
improved, it is only available for the kernel versions and architectures
supported by the manufacturer, etc. But, for Linux users, the choices are
MadWifi or nothing.
A free Atheros HAL module called "ar5k," written by Reyk Floeter, has been
in circulation for a couple of years; OpenBSD uses it. But this code has
long been followed by allegations that it was improperly developed and
potentially subject to copyright claims by Atheros. In the current
climate, nobody wants to risk bringing possibly tainted code into the
kernel; the potential consequences are just too severe. So, while the
desire to support Atheros devices in Linux remains strong, the existing free
HAL has not been considered for merging, and little has been done to bring that support about.
Except that, as it turns out, work has been quietly happening in an
unexpected place. The Software Freedom Law Center was asked by the ar5k
developers to look at the development history of the code and come up with
a pronouncement on whether it was legitimate (from a copyright law
perspective) or not. On November 14, the SFLC produced its answer:
SFLC has made independent inquiries with the OpenBSD team regarding
the development history of ar5k source. The responses received
provide a reasonable basis for SFLC to believe that the OpenBSD
developers who worked on ar5k did not misappropriate code, and that
the ar5k implementation is OpenBSD's original copyrighted work.
This finding should clear the way for the entry of the free Atheros HAL
into the Linux kernel - eventually. But there are a couple of problems
which need to be overcome first.
One of those is the general level of upheaval in the Linux wireless subsystem.
The developers still intend to move over to the Devicescape stack and to
get that code into the mainline, but there is still work to be done in that
area. But a new wireless driver which does not work with Devicescape will
have a harder path into the kernel. There is an effort to move MadWifi
over to Devicescape (it's called "DadWifi"), so that might be the quickest
path for Atheros support to get into the kernel.
The other problem, however, is that code based on the HAL concept tends to
be unpopular at best. A HAL is typically seen as an unnecessary
abstraction layer between the driver and the hardware which serves to
obscure what's really going on while adding no real value of its own. So
developers who propose HAL-based drivers are usually told to go away and
come back once the HAL is gone. There is no real reason to expect things
to happen differently this time around.
But, even if it can't be used directly, the ar5k code is now fair game for
reference and eventual adaptation into a Linux driver. There are enough
developers out there with an interest in making Atheros adapters work
that the chances of this work getting done in the (relatively) near future
are relatively good. The list of devices which are not supported by Linux
is about to get shorter.
Patches and updates
Core kernel code
- Junio C Hamano: GIT 1.4.4 (November 15, 2006)
Filesystems and block I/O
Virtualization and containers
Page editor: Jonathan Corbet