Kernel release status
The current development kernel is 2.5.20, which was
released by Linus on June 2. Big changes this time
include a large ACPI merge, a bunch more buffer/VM work, a PowerPC64 merge,
the usual set of IDE patches, various merges from the -dj series, some
device model work, and numerous other fixes and updates. The
long format changelog is also available.
Other releases from Linus since the last LWN Kernel Page include:
- 2.5.19 (short, long). Changes include more block, buffer,
and IDE layer work, some enhancements to the driver model code, more
kbuild tweaks, and many other fixes and updates.
- 2.5.18 (short, long). This one included the software
suspend patch (as covered in the May 23 LWN
Kernel Page), a bunch of kbuild fixes (which are not Keith Owens's
new kbuild system - see below), more IDE reworking, more VFS changes,
and a bunch of other fixes and improvements.
The current prepatch from Dave Jones is 2.5.20-dj3. The most significant feature of
this patch, perhaps, is the merging of some small pieces of the
kbuild 2.5 code.
The latest 2.5 status summary from Guillaume
Boissiere came out on June 5.
The current stable kernel release is 2.4.18. Marcelo's plan had
been to create a 2.4.19 release candidate, but some problems turned up. So
he released 2.4.19-pre10 instead. A very
long list of fixes got into this release. With luck, the next prepatch
from Marcelo will be the first 2.4.19 release candidate.
Alan Cox has released 2.4.19-pre10-ac2; arguably the most interesting change in this prepatch is the inclusion of the "speakup" console module for blind users.
A new way of block queue plugging
Jens Axboe has posted
a patch which, once
again, changes some of the main assumptions underlying the block I/O
subsystem. It is worth a look at what is going on.
A longstanding feature of the block layer has been "queue plugging." If
the request queue for a particular block device has been plugged, that
device's driver will not be invoked to execute the operations in the
queue. The main reason for plugging has been to allow the block layer to
build up a backlog of requests, so that adjacent operations can be merged.
By sometimes waiting a little longer to start an operation, the block layer
can often achieve better performance overall.
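The effect of plugging on merging can be seen in a toy model. The sketch below is not kernel code: the ToyQueue class, its method names, and the (start_sector, sector_count) request format are all invented for illustration, and merging is limited to the simplest case of a new request that begins exactly where a pending one ends.

```python
# Toy model of a pluggable request queue; not kernel code.
# Requests are (start_sector, sector_count) tuples.

class ToyQueue:
    def __init__(self):
        self.plugged = False
        self.pending = []      # requests waiting for the driver
        self.dispatched = []   # requests handed to the (imaginary) driver

    def plug(self):
        self.plugged = True

    def submit(self, start, count):
        # Try to merge with a pending request that ends where this one begins.
        for i, (s, c) in enumerate(self.pending):
            if s + c == start:
                self.pending[i] = (s, c + count)
                break
        else:
            self.pending.append((start, count))
        # An unplugged queue hands work to the driver immediately.
        if not self.plugged:
            self.unplug()

    def unplug(self):
        self.plugged = False
        self.dispatched.extend(self.pending)
        self.pending.clear()


def run(plugged):
    """Submit four adjacent 8-sector requests, with or without plugging."""
    q = ToyQueue()
    if plugged:
        q.plug()
    for start in (0, 8, 16, 24):
        q.submit(start, 8)
    q.unplug()
    return q.dispatched
```

With the queue plugged, the four adjacent requests reach the imaginary driver as one 32-sector operation; without plugging, each is dispatched on its own before the next arrives.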
With the 2.5 block layer, however, there is less need for this sort of
plugging. The code works harder at not splitting large requests in the
first place, so it is not necessary to merge them again. The new plugging
code actually serves a different purpose: it is a mechanism by which a
block driver can indicate that it is busy and cannot handle any more
requests at the moment.
As Jens points out in his patch, the block code is starting to look a
little (a little!) bit more like the networking subsystem. Like network
interfaces, block devices can have multiple requests outstanding. When the
device has been given all the simultaneous requests that it can handle,
there is no point in further troubling the driver until some of those
requests complete. Thus the new plugging code: block devices, too, can ask
to be allowed to work in peace for a while.
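The "busy device" use of plugging can also be sketched as a toy model. Everything here is hypothetical (the ToyDevice class, its depth parameter, and the driver_calls counter are not taken from Jens's patch); the point is only that, once the device holds as many requests as it can handle, further submissions queue up without invoking the driver until a completion comes in.

```python
from collections import deque

class ToyDevice:
    """Hypothetical device that can hold `depth` requests in flight."""

    def __init__(self, depth):
        self.depth = depth
        self.in_flight = 0
        self.queue = deque()
        self.plugged = False
        self.driver_calls = 0   # how often the driver was invoked

    def submit(self, req):
        self.queue.append(req)
        if not self.plugged:
            self._run_driver()

    def _run_driver(self):
        self.driver_calls += 1
        while self.queue and self.in_flight < self.depth:
            self.queue.popleft()
            self.in_flight += 1
        # Device is full: plug so that later submissions leave the driver alone.
        if self.in_flight == self.depth:
            self.plugged = True

    def complete(self):
        # A request finished; unplug and let the driver pick up queued work.
        self.in_flight -= 1
        self.plugged = False
        self._run_driver()
```

Submitting five requests to a ToyDevice with a depth of two invokes the driver only twice; the remaining three requests wait in the queue until complete() is called.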
There are a couple of other, incidental changes in this patch. One is that
the venerable tq_disk task queue has been removed. Slowly, the
set of standard task queues is shrinking. A block driver's
request ("strategy") function is also now called out of a
tasklet. The block layer that shows up in 2.6 will be vastly different
from what has been seen in previous stable kernels.
Splitting the kernel stack
The Linux kernel has, for years, run with an 8KB (two page) stack in each
process's address space (at least, on i386 systems). That stack holds the
"task structure" (the kernel's information about the process) and provides
space for automatic variables and call frames when the system is running in
kernel mode. The 8KB stack works, of course, but it is not optimal. The
biggest problem, perhaps, is the need to find two adjacent pages for a new
stack every time a new process is created. On a busy system memory can get
badly fragmented, and allocating two pages together can be a challenge.
So Ben LaHaise has posted a patch which
splits the kernel stack into two 4KB stacks. One of them holds the task
structure and is used by normal kernel code (i.e. handling system calls).
The other stack is set aside and is used only when the kernel is handling
interrupts.
A separate interrupt stack is not a particularly new idea - many operating
systems have had interrupt stacks for decades. There are numerous
advantages to doing things this way. Only one interrupt stack (per CPU) is
needed, so one page of memory per process is freed up. The interrupt stack
is also more likely to stay in the processor cache, improving performance.
Interrupt handlers need not worry about other kernel code having consumed
most of the stack when they get invoked. And, of course, it is no longer
necessary to perform a two-page allocation to set up the regular kernel
stack.
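Two of these points, the memory savings and the trouble with two-page allocations, can be made concrete with a little toy accounting. This is a sketch under simplifying assumptions, not a description of Ben's patch: the function names are invented, memory is modeled as a flat list of free/used flags, and the alignment rules of the real page allocator are ignored.

```python
PAGE = 4096  # bytes per page on i386

def stack_memory(nprocs, ncpus, split):
    """Bytes of kernel stack needed under each scheme (toy accounting)."""
    if split:
        # One 4KB stack per process plus one interrupt stack per CPU.
        return nprocs * PAGE + ncpus * PAGE
    # One 8KB (two-page) stack per process.
    return nprocs * 2 * PAGE

def find_free_run(free, length):
    """Return the index of the first run of `length` adjacent free pages, or None."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == length:
            return i - length + 1
    return None

# Hypothetical 16-page memory in which every other page is in use:
# half of memory is free, but no two free pages are adjacent.
fragmented = [i % 2 == 0 for i in range(16)]
```

With 1000 processes on a two-CPU machine, stack_memory() gives 8,192,000 bytes for the old scheme and 4,104,192 bytes for the split one. In the fragmented example, a one-page request succeeds at once while a two-page request fails, even though eight pages are free.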
The biggest downside, perhaps, is that non-interrupt kernel code must now
fit into much less stack space. Some
kernel code is not particularly careful about the size of its automatic
variables, and risks overflowing the new, smaller stack. As a way of
tracking down such code, Ben has also posted a
stack checker (followed by a brown paper bag
fix) which monitors stack usage and raises the alarm when
available space on the stack gets too low. The two patches are probably
best used together.
The continuing saga of kbuild 2.5
The discussion over whether to merge kbuild 2.5 has been covered in this
space before. It is one of those conversations that persists, however. A
few things have happened over the last few weeks.
Keith Owens, the author of kbuild 2.5, has posted a new set of timing comparisons meant to show
the advantages of the new code. The full build process Keith performed
took a bit less than 14 minutes
with kbuild 2.5, and a little over 20 minutes with the existing
kbuild. He also points out that the result is sometimes incorrect with the
existing code.
Daniel Phillips also tried it out and
obtained similar results. For good measure, Daniel took a look at the code
itself: "There is no Python anywhere to be seen in kbuild 2.5, for
those who worry about that. It is coded in C, about 10,000 lines it seems.
It has a simple built in database which I suppose accounts for some of
that. For what it does, it seems quite reasonable."
In general, most (but not all) developers who express an opinion on the
matter seem to feel that kbuild 2.5 is worthwhile and should be
merged. So it has surprised a number of people to see numerous patches to
the existing kbuild system, written by Kai Germaschewski, being merged by
Linus. These patches do worthwhile things, but they are not
kbuild 2.5. Why bother, one might ask, if the whole thing is going to
be replaced?
The answer seems to be that Linus, for now, wants Kai to be the kbuild
maintainer. Kai is willing to do things in small pieces, which has always
been Linus's preferred method; Keith has, so far, refused to break his
kbuild work up in this way. Also, says Linus:
Kai isn't an enthusiastic kbuild-2.5 supporter. In fact, he tends
to be a bit down on some of it. Which is a plus in my book: it
means that whatever Kai tries to push my way I'll feel just that
much more comfortable with as having had critical review.
Meanwhile, a couple of different developers (Sam Ravnborg and "Lightweight
patch manager") have started submitting broken up versions of
kbuild 2.5. Kai has stated that he will look them over and integrate
those which make sense. Some of these patches also found their way into 2.5.20-dj3. It seems like at least a partial victory for the
new kbuild.
So one has to wonder why, after all this, Keith felt the need to post his
call for an email campaign entitled "If you
want kbuild 2.5, tell Linus." It's a full-scale polemic that takes one
back to the old devfs wars. It is also, seemingly, counterproductive. One
would think that it would be better to work with the people who are trying
to make kbuild acceptable to Linus than to call for a pressure campaign.
The value of negative dentries
A "directory entry" (dentry) is an internal data structure used to hold the
results of looking up a file in the filesystem. The Linux "dentry cache"
keeps a number of recently used dentries around; they tend to be useful,
since files are often accessed more than once over a short period of time.
Finding a file in the dentry cache can save a lot of time by avoiding a
full filesystem lookup.
The kernel also hangs on to "negative dentries," which indicate that the
given file does not exist. Andrea Arcangeli recently noted that these negative dentries can take up
quite a bit of memory, and wondered what possible use they could be. His
message included a patch to force negative dentries out of memory quickly.
It turns out, though, that "this file does not exist" can be useful
information. A quick strace run on a GNOME application, for
example, turns up dozens of lookups on nonexistent files as the application
gropes around looking for the unbelievable number of libraries it needs.
Similarly, apache is continually looking for .htaccess files,
shells look for executables, etc. It is more than worthwhile to be able to
determine that a file doesn't exist without an expensive filesystem call -
especially for file names that are often looked up. So negative dentries
will stay.
There is one optimization that can be made, though. In Andrea's case, the
negative dentries were created by deleting a large directory full of
files. When a file is deleted, it is relatively unlikely that it will be
looked up again soon, and keeping a negative dentry around is less useful.
In this case, perhaps, it is better to just forget about the file name
altogether.
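Both ideas, remembering that a name does not exist and forgetting a name once its file has been deleted, fit in a short toy model. This is only a sketch: ToyDcache is a made-up class, a dictionary stands in for the filesystem, and nothing here reflects how the kernel's dentry cache is actually implemented.

```python
class ToyDcache:
    """Toy name-lookup cache with negative entries; not the kernel's dcache."""

    def __init__(self, filesystem):
        self.fs = filesystem     # dict: path -> inode number (stands in for the disk)
        self.cache = {}          # path -> inode number, or None for a negative entry
        self.fs_lookups = 0      # count of "expensive" filesystem lookups

    def lookup(self, path):
        if path in self.cache:
            return self.cache[path]      # cache hit, positive or negative
        self.fs_lookups += 1
        inode = self.fs.get(path)        # None when the file does not exist
        self.cache[path] = inode         # remember the answer either way
        return inode

    def unlink(self, path):
        self.fs.pop(path, None)
        # Forget the name instead of keeping a negative entry for it.
        self.cache.pop(path, None)
```

Repeated lookups of a missing .htaccess file cost one filesystem lookup in this model, however many times the name is tried. A deleted file, by contrast, leaves no entry behind in the cache.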
The return of /dev/port
A few weeks ago, LWN reported on the removal of support for
/dev/port from the 2.5 kernel. Since then, a few users have
reported real uses for /dev/port and a desire that it stay in the
kernel. Martin Dalecki, who created the patch removing /dev/port,
suggested that users who really need it can patch it back in
themselves. Linus disagreed, saying:
So when simplifying, it's not just important to say "we could do
without this". You have to also say "and nobody can reasonably
expect to need it".
Which doesn't seem to be the case with /dev/ports. So it stays.
That is, of course, the definitive end to the discussion.
Resources
A few other worthwhile notes:
- Kernel Traffic issues 168 and 169 are available.
- The Linux Security Module web
site has been overhauled in a big way. "It's no longer an
endless dribble of old patches. It contains some information about
the project, more navigable patch listing, links to the BK
repositories, and links to all the documentation that I am aware
of."
- Late last April, we mentioned that Pacific Northwest National
Laboratory was seeking an experienced kernel programmer to work on its
new, 1400-node
Linux cluster. The position is still open, so go check
it out if you think you might be interested.
Patches and updates
Kernel trees
- Lightweight patch manager: linux-2.5.20-ct1. Adds a number of "trivial patches" to 2.5.20.
(June 4, 2002)
- Andrea Arcangeli: 2.4.19pre9aa1. Included the integration of the O(1) scheduler - "highly experimental."
(June 4, 2002)
- Paul P Komkoff Jr: 2.4.19-pre9-ac1-s1. kbuild 2.5, EVMS, and a number of fixes.
(June 4, 2002)
- Marc-Christian Petersen: 2.2.21-3-secure. Many goodies for 2.2: OpenWall, ext3, ReiserFS, CryptoAPI, ACLs, USAGI, FreeS/Wan, 2.4 IDE, etc. "The intended purpose is for production/servers."
(June 4, 2002)
Core kernel code
- Robert Love: scheduler hints. Allow applications to give hints to the scheduler on how they will behave.
(June 4, 2002)
- Russell King: cpufreq core for 2.5. A common (across architectures) interface to CPU clock speed.
(June 4, 2002)
- William Lee Irwin III: lazy_buddy-2.5.19-3. A "bugfix and cleanup release" of the new, deferred coalescing memory allocator.
(June 4, 2002)
- Andrew Morton: direct-to-BIO writeback. Perform filesystem writeouts direct to the block layer via BIO requests - no more buffer heads. At least in simple cases.
(June 5, 2002)
- Andrew Morton: direct-to-BIO readahead. Make the readahead code work without buffer heads. "CPU load for `cat large_file > /dev/null' is reduced by approximately 15%."
(June 5, 2002)
Development tools
- Randy.Dunlap: kerneltop. A "top"-like display generated from kernel profiling data.
(June 4, 2002)
Device drivers
- Martin Dalecki: 2.5.18 IDE 71. "Scary big patch this time".
(June 4, 2002)
Documentation
- Patrick Mochel: device model documentation 1/3. Documentation of the device model code - part 1 covers the bus_type structure.
(June 4, 2002)
Filesystems and block I/O
- Andreas Gruenbacher: Status of 2.5.x port. An initial port of the extended attribute/access control list code to 2.5.
(June 5, 2002)
Janitorial
- Robert Love: remove suser(). The venerable suser() call is gone at last.
(June 5, 2002)
Kernel building
Networking
Architecture-specific
- James Bottomley: i386 arch subdivision into machine types for 2.5.18. "This code rearranges the arch/i386 directory structure to allow for sliding additional non-pc hardware in here in an easily separable (and thus easily maintainable) fashion."
(June 4, 2002)
- Thomas Capricelli: linux zeta-0.2 released. Zeta is a virtual platform to which the group is porting Linux.
(June 4, 2002)
Security-related
- Chris Wright: 2.5.20-lsm1. New version of the Linux Security Module patch.
(June 5, 2002)
- Chris Wright: 2.4.18-lsm3. Linux Security Module patch for 2.4.18.
(June 5, 2002)
- Amon Ott: RSBAC v1.2.0. Rule Set Based Access Control.
(June 4, 2002)
Miscellaneous
- Andrew Morton: "laptop mode". Optimizations for laptop use - mostly minimizing disk spinups.
(June 5, 2002)
- Bartlomiej Zolnierkiewicz: atapci 0.50. Reads information from ATA PCI chipsets.
(June 4, 2002)
- Bartlomiej Zolnierkiewicz: atapci 0.51. Fixes a problem with 0.50.
(June 5, 2002)
Page editor: Jonathan Corbet