Brief items
The current stable 2.6 release is 2.6.13.4,
released on October 10. It
contains a small number of security-related fixes, a fix for the elusive
Sparc FPU bug, and a few other patches.
The current 2.6 prepatch is 2.6.14-rc4, announced by Linus on
October 10. This will be, he says, the last -rc release before 2.6.14
comes out. It contains mostly fixes, but there are also some driver updates, a new
Megaraid SAS driver, and a new gfp_t type which has caused a
prototype change for many internal functions which perform memory
allocations (see below). The details may be found in the
long-format changelog.
There have been no -mm releases since 2.6.14-rc2-mm2 came out on
September 29.
Kernel development news
In general, if you act like I've got all the attention span of a slightly
retarded golden retriever, you'll be pretty close to the mark.
-- Linus Torvalds
Those of you who were watching in the early days of Linux kernel
development will remember a series of web sites which consisted of a list
of kernel releases and the changes to be found in each. Maintaining such a
site is a considerable amount of work, though, and no such site has been
operating for some time now. That has just changed with Diego
Calleja's
announcement of his
LinuxChanges page,
hosted on the KernelNewbies site. The entries go all the way back to
2.5.1 (released almost four years ago) and provide a list of relevant
changes for each release. It is a useful site which, one hopes, will
be kept current for a long time to come.
For those who are interested in the many projects underway in the
networking subsystem, a visit to the new linux-net wiki may
be in order. Visitors cannot help being struck by the amount of work which
is going on in this area.
Most kernel functions which deal with memory allocation take a set of "GFP
flags" as an argument. These flags describe the allocation and how it
should be satisfied; among other things, they control whether it is
possible to sleep while waiting for memory, whether high memory can be
used, and whether it is possible to call into the filesystem code. The
flags are a simple integer value, and that leads to a potential problem:
coding errors could result in functions being called with incorrect
arguments. An occasional error has turned up where function arguments
have gotten confused (usually through ordering mistakes). The resulting
bugs can be strange and hard to track down.
A while back, the __nocast attribute was added to catch these
mistakes. This attribute simply says that automatic type coercion should
not be applied; it is used by the sparse utility. A more complete
solution is on the way, now, in the form of a new gfp_t type. The
patch defining this type, and changing
several kernel interfaces, was posted by Al Viro and merged just before
2.6.14-rc4 came out. There are several more patches in the series, but
they have evidently been put on hold for now.
The patches are surprisingly large and intrusive; it turns out that quite a
few kernel functions accept GFP flags as arguments. For all that, the
actual code generated does not change, and the code, as seen by
gcc, changes very little. Once the patch set is complete,
however, it will allow comprehensive type checking of GFP flag arguments,
catching a whole class of potential bugs before they bite anybody.
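The idea behind a dedicated flag type can be sketched in plain C outside the kernel. Everything below is illustrative rather than taken from the patch: the demo_ names and the flag values are hypothetical, and the use of a checker-only attribute on the typedef is an assumption about how such a type might be declared. The point is that gcc sees an ordinary unsigned int (so the generated code does not change), while a checker such as sparse gets enough information to complain when a plain integer turns up where flags are expected.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * sparse defines __CHECKER__ when it runs; gcc does not, so the
 * attribute disappears and the type is a plain unsigned int.
 */
#ifdef __CHECKER__
# define demo_nocast __attribute__((nocast))
#else
# define demo_nocast
#endif

/* Hypothetical stand-in for a dedicated allocation-flags type. */
typedef unsigned int demo_nocast demo_gfp_t;

/* Arbitrary flag values, chosen only for this sketch. */
#define DEMO_GFP_WAIT ((demo_gfp_t)0x10u)  /* allocation may sleep */
#define DEMO_GFP_FS   ((demo_gfp_t)0x80u)  /* may call into filesystems */

/* Report whether the given flags allow the caller to sleep. */
static int demo_may_sleep(demo_gfp_t flags)
{
    return (flags & DEMO_GFP_WAIT) != 0;
}

/* Hypothetical allocator taking a size and a set of flags. */
static void *demo_alloc(size_t size, demo_gfp_t flags)
{
    (void)flags;  /* a real allocator would act on the flags */
    return malloc(size);
}
```

A call with the arguments swapped, such as demo_alloc(DEMO_GFP_WAIT, 16), is accepted silently by gcc, since both parameters are just integers; that is the sort of ordering mistake a checked flag type is meant to expose.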
One of the many features which will be shipped with the 2.6.14 kernel will
be a driver for the "hard drive active protection system" found in some
ThinkPad laptops. This system provides a set of sensors, and, in
particular, an accelerometer which can report on the position of the laptop
- and how quickly that position is changing. There are a number of
applications for such a device, including
a version of neverball
played by tipping the laptop. The real purpose, however, is to enable the
system to react to a fall and attempt to protect the hard drive.
The next step in the implementation of that purpose is the hard drive protection patch
recently posted by Jon Escombe. This patch adds two new callbacks to the
block request queue which drivers can provide:
    typedef int (issue_protect_fn) (request_queue_t *);
    typedef int (issue_unprotect_fn) (request_queue_t *);
If the driver provides these functions, the request queue, as seen in
sysfs, will contain a new protect attribute. If a value is
written to that attribute, the block system will interpret it as an integer
number of seconds. The issue_protect_fn() will be called, and the
request queue will be plugged for the indicated number of seconds. When
that time expires, issue_unprotect_fn() will be called and the
queue will be restarted.
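The sequence described above can be modeled in a few lines of user-space C. This is a minimal sketch and not the code from the patch: struct demo_queue, demo_protect_store() and demo_tick() are hypothetical names, time is passed in as a plain number of seconds, and a polling function stands in for whatever timer mechanism the kernel code actually uses.

```c
#include <stdlib.h>

struct demo_queue;

/* Same shape as the callbacks in the patch, with a stand-in queue type. */
typedef int (demo_protect_fn)(struct demo_queue *);

struct demo_queue {
    demo_protect_fn *issue_protect;    /* provided by the driver */
    demo_protect_fn *issue_unprotect;  /* provided by the driver */
    int plugged;                       /* nonzero while requests are held */
    long unplug_at;                    /* time at which protection ends */
    int parked;                        /* driver state, for the demo only */
};

/* Hypothetical driver callbacks: just record the state change. */
static int demo_park(struct demo_queue *q)   { q->parked = 1; return 0; }
static int demo_unpark(struct demo_queue *q) { q->parked = 0; return 0; }

/* Model of a write to the "protect" attribute at time now (seconds). */
static int demo_protect_store(struct demo_queue *q, const char *buf, long now)
{
    char *end;
    long secs = strtol(buf, &end, 10);

    if (end == buf || secs < 0)
        return -1;                     /* not a usable number of seconds */
    if (!q->issue_protect || !q->issue_unprotect)
        return -1;                     /* driver offers no protection */
    if (q->issue_protect(q) != 0)
        return -1;
    q->plugged = 1;
    q->unplug_at = now + secs;
    return 0;
}

/* Stand-in for the timer: unprotect and restart once the time is up. */
static void demo_tick(struct demo_queue *q, long now)
{
    if (q->plugged && now >= q->unplug_at) {
        q->issue_unprotect(q);
        q->plugged = 0;
    }
}
```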
The theory of operation here is that a user-space daemon will be monitoring
the status of the system, as reported by the accelerometer. Should this
daemon note that the laptop has begun to accelerate, it will quickly write
a value to the protect attribute for each drive in the system.
The drives will respond by parking the disk heads, and, in any other
possible way, telling the drive to crawl into its shell and prepare for
impact. Once the event has transpired, the shattered remains of the laptop
can attempt to resume normal operation.
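The user-space side might look something like the following sketch. No such daemon is described in detail here, so all of this is assumption: the function names are hypothetical, the trigger is a crude threshold on the change between two accelerometer readings, and the attribute path is left to the caller (something like /sys/block/hda/queue/protect would be a plausible location, but that exact path is a guess).

```c
#include <stdio.h>

/*
 * Write a protection time, in seconds, to the given attribute file.
 * Returns 0 on success and -1 on failure.
 */
static int demo_write_protect(const char *attr_path, unsigned int seconds)
{
    FILE *f = fopen(attr_path, "w");

    if (!f)
        return -1;
    if (fprintf(f, "%u\n", seconds) < 0) {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}

/*
 * Hypothetical trigger: dx and dy are the changes in the two axis
 * readings since the last sample; react when the magnitude of the
 * change exceeds the threshold (compared as squares to avoid sqrt).
 */
static int demo_should_protect(int dx, int dy, int threshold)
{
    long mag2 = (long)dx * dx + (long)dy * dy;

    return mag2 > (long)threshold * threshold;
}
```

A daemon built this way would poll the sensor, call demo_should_protect() on each sample, and write to the attribute of every drive when it returns true.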
The idea seems reasonable, but block maintainer Jens Axboe has turned down the patch for now. Says Jens:
We have far too many queue hooks already, adding two more for a
relatively obscure use such as this one is not a good idea.
The number of request queue callbacks is indeed large. Some of them have
little to do with drivers (there's one which is called whenever disk
activity happens, for example; it can be used to flash a keyboard LED in
the absence of a hardware disk activity light), but others, such as the
ones discussed here, are direct requests to the underlying block driver.
The use of callbacks seems a little redundant in this situation, given that
the request queue is, fundamentally, a mechanism for conveying commands to
block drivers. The right solution might thus be to use the request queue
to carry commands beyond those requesting the movement of blocks to and
from the drive.
To an extent, the request queue is already used this way. Packet commands,
ATA task file commands, and power management commands can be fed to drivers
through the queue. In each case, the flags field of struct
request is used to indicate that something special is being
requested. The use of flags in this way is getting a little
unwieldy, however, leading to the consideration of a new approach.
That approach, as seen in a patch held by Jens, is to add a new field
(cmd_type) to struct request which indicates the type of
command embodied by each request. Currently-anticipated types include
packet commands, sense requests, power management commands, flush requests,
driver-specific special requests, and Linux-specific, generic requests.
Oh, and the occasional request to move a disk block in one direction or the
other. The addition of cmd_type turns struct request
into a generic carrier of commands to a disk drive.
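As a rough illustration of what a typed request might look like, here is a user-space sketch. The enum values simply follow the list of anticipated types given above; their names, the cut-down struct demo_request, and the dispatch function are all hypothetical and do not come from the patch itself.

```c
/* One value per anticipated command type; names are made up. */
enum demo_cmd_type {
    DEMO_CMD_FS,       /* ordinary movement of disk blocks */
    DEMO_CMD_PACKET,   /* packet command */
    DEMO_CMD_SENSE,    /* sense request */
    DEMO_CMD_PM,       /* power management command */
    DEMO_CMD_FLUSH,    /* flush request */
    DEMO_CMD_SPECIAL,  /* driver-specific special request */
    DEMO_CMD_LINUX     /* Linux-specific, generic request */
};

/* Drastically reduced stand-in for struct request. */
struct demo_request {
    enum demo_cmd_type cmd_type;
    unsigned long sector;
};

/* A driver can switch on the type rather than test bits in flags. */
static const char *demo_dispatch(const struct demo_request *rq)
{
    switch (rq->cmd_type) {
    case DEMO_CMD_FS:      return "transfer";
    case DEMO_CMD_PACKET:  return "packet";
    case DEMO_CMD_SENSE:   return "sense";
    case DEMO_CMD_PM:      return "power";
    case DEMO_CMD_FLUSH:   return "flush";
    case DEMO_CMD_SPECIAL: return "special";
    case DEMO_CMD_LINUX:   return "linux";
    }
    return "unknown";
}
```

With a single type field, each request carries exactly one kind of command, which is easier to reason about than a growing set of independent flag bits.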
With this mechanism in place, the "brace yourself, we're falling!" message
becomes just another Linux-specific block request type. When such an event
happens, the kernel need only place one of those messages on the queue -
preferably at the head of the queue - and call the driver's
request() function. The driver can then prepare the drive for the
coming catastrophe and plug the queue itself. No additional callbacks
required.
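The head-of-queue idea can be modeled with a simple linked list. This is a toy under stated assumptions, not block-layer code: the queue, the request structure, and every demo_ name are invented for the example, and "parking the heads" is reduced to setting a flag.

```c
#include <stddef.h>

enum demo_cmd { DEMO_TRANSFER, DEMO_PROTECT };

struct demo_req {
    enum demo_cmd cmd;
    struct demo_req *next;
};

struct demo_rq_queue {
    struct demo_req *head;
    struct demo_req *tail;
    int plugged;         /* set by the driver once it is protected */
    int heads_parked;
    int transfers_done;
};

/* Normal submission: requests go to the tail of the queue. */
static void demo_add_tail(struct demo_rq_queue *q, struct demo_req *rq)
{
    rq->next = NULL;
    if (q->tail)
        q->tail->next = rq;
    else
        q->head = rq;
    q->tail = rq;
}

/* Urgent submission: the request jumps ahead of everything pending. */
static void demo_add_head(struct demo_rq_queue *q, struct demo_req *rq)
{
    rq->next = q->head;
    q->head = rq;
    if (!q->tail)
        q->tail = rq;
}

/* Toy driver request function: work through the queue until plugged. */
static void demo_request_fn(struct demo_rq_queue *q)
{
    while (q->head && !q->plugged) {
        struct demo_req *rq = q->head;

        q->head = rq->next;
        if (!q->head)
            q->tail = NULL;

        if (rq->cmd == DEMO_PROTECT) {
            q->heads_parked = 1;  /* prepare the drive for impact */
            q->plugged = 1;       /* the driver plugs the queue itself */
        } else {
            q->transfers_done++;
        }
    }
}
```

Because the protect request is placed at the head, the driver sees it before any pending transfers, and those transfers stay queued until the queue is unplugged again.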
This approach does involve some significant changes to the block layer,
however, and would include a driver API change. So it is not likely to
take a quick path into the kernel. The hard drive protection mechanism,
which will require the new API, thus looks likely to wait in line for a
while yet.
Readahead is a technique employed by the kernel in an attempt to improve
file reading performance. If the kernel has reason to believe that a
particular file is being read sequentially, it will attempt to read blocks
from the file into memory before the application requests them. When
readahead works, it speeds up the system's throughput, since the reading
application does not have to wait for its requests. When readahead fails,
however, it generates useless I/O and occupies memory pages which are
needed for other purposes.
The current kernel readahead implementation uses a window 128KB in length.
When readahead seems appropriate, the kernel will speculatively bring in
the next 128KB of file data. If the application continues to read
sequentially through that data, the next 128KB chunk will be brought in
when the application is part-way through the first one. This
implementation works, but Wu Fengguang thinks that it can be made better.
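The fixed-window behavior can be approximated with a small model. This sketch is not the kernel's readahead code: the structure and function names are hypothetical, offsets are plain byte counts, and "part-way through" is assumed here to mean the halfway point of the most recently requested chunk.

```c
#define DEMO_RA_WINDOW (128UL * 1024)  /* fixed 128KB window */

struct demo_ra {
    unsigned long ahead_end;  /* end of the data requested so far */
};

/*
 * Called for each sequential read at byte offset pos. Returns the
 * number of bytes of new readahead to start, or 0 if none is needed.
 */
static unsigned long demo_fixed_readahead(struct demo_ra *ra,
                                          unsigned long pos)
{
    if (ra->ahead_end == 0) {
        /* First sequential read: bring in the initial window. */
        ra->ahead_end = pos + DEMO_RA_WINDOW;
        return DEMO_RA_WINDOW;
    }
    if (pos >= ra->ahead_end - DEMO_RA_WINDOW / 2) {
        /* Halfway through the last chunk (assumed): fetch the next. */
        ra->ahead_end += DEMO_RA_WINDOW;
        return DEMO_RA_WINDOW;
    }
    return 0;
}
```

Whatever the application does, each step in this model requests exactly 128KB; that fixed size is what the adaptive approach sets out to change.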
In particular, Wu thinks that the fixed readahead window size should,
instead, adapt to both the application's behavior and the global state of
the system. His adaptive readahead patch
is an implementation of this thought. It is a work of daunting complexity,
but the core ideas are reasonably straightforward.
The adaptive readahead patch tries to balance two constraints: readahead
should be performed aggressively, but not to the point that the system
starts thrashing or readahead pages get recycled before the application
uses them. Every time a readahead decision is to be made for a specific
file, the adaptive code looks at how much memory is available for
readahead and how quickly the application has been working through the
file. If memory is tight, or if the disk holding the file is congested,
readahead will not be performed at all.
The code also looks at the pressure on the inactive page lists and tries to
figure out whether any readahead pages are in danger of falling off that
list and being reclaimed. In that situation, the readahead pages will be
moved back up the list, keeping them in memory for a bit longer. This
"rescue" operation helps to keep previous readahead work from being wasted;
since it is only performed when the application consumes data from the
file, it will not happen if the reading process has stalled entirely. But,
when the application is working through the data, it will get
another chance to benefit from readahead which has already been performed.
No more readahead will be started in that situation, however.
If, instead, the application is making use of its readahead pages and the
memory is available, the readahead window can grow up to 1MB. For
streaming media or data processing applications which work their way
sequentially through large files, this enlarged window can lead to
significant performance gains.
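The decision logic described above can be summarized in a short sketch. It should not be mistaken for the patch, which is far more involved: the names are hypothetical, the system state is reduced to three flags, and both the 128KB starting size and the doubling growth policy are assumptions made for the example. Only the 1MB upper limit and the conditions under which no readahead is started come from the description.

```c
#define DEMO_RA_START (128UL * 1024)   /* assumed initial window */
#define DEMO_RA_MAX   (1024UL * 1024)  /* 1MB ceiling */

struct demo_sys_state {
    int memory_tight;     /* little memory available for readahead */
    int disk_congested;   /* the disk holding the file is congested */
    int pages_at_risk;    /* earlier readahead pages needed rescuing */
};

/*
 * Given the current window size in bytes (0 if there is none yet),
 * return the size of the next readahead; 0 means "do not read ahead".
 */
static unsigned long demo_adaptive_window(unsigned long cur,
                                          const struct demo_sys_state *s)
{
    unsigned long next;

    if (s->memory_tight || s->disk_congested)
        return 0;   /* no readahead at all under pressure */
    if (s->pages_at_risk)
        return 0;   /* pages were rescued; start nothing new */

    /* Application is keeping up and memory is available: grow. */
    next = cur ? cur * 2 : DEMO_RA_START;
    if (next > DEMO_RA_MAX)
        next = DEMO_RA_MAX;
    return next;
}
```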
In fact, Wu claims results which are "pretty optimistic." They include a
20-100% improvement for applications doing parallel reads, and the ability
to run 800 simultaneous 1KB/sec streams on a 64MB system without
thrashing. The page cache hit rate is claimed to be 91%, which is quite
good.
The adaptive readahead patch might, thus, be a worthwhile addition to the
Linux memory management subsystem. There has been little discussion (none,
actually) of the patch on the list, however. Complicated patches working
in an obscure corner of memory management do not receive the same level of
review as, say, new filesystems, it would seem. In any case, a patch of
this nature will require a good deal of testing before it can be considered
for any sort of merge. So, while adaptive readahead may indeed make its
way into the mainline, it's not something to expect to see in the very near
future.
Page editor: Jonathan Corbet