Brief items
The current 2.6 release remains 2.6.6; no 2.6.7 prepatches have been
released as of this writing.
Linus's BitKeeper repository contains over 650 changesets, however,
indicating that work is proceeding even in the absence of formal releases.
These patches include a generic msleep() function for
millisecond-scale waits, a CPU frequency control update, a set of autofs4
patches, del_singleshot_timer() (covered here last week), a set of patches to shrink the
heavily-used dentry structure, the "filtered wakeup" mechanism
(see the May 5 Kernel Page), a
libata update, some architecture updates, the scheduling domains patch set
(covered here last month), the removal of
the InterMezzo filesystem due to lack of use and support (see below), a
sysctl variable
giving "huge page" access to an administrator-specified group,
the ability to re-enable interrupts while waiting in
spin_lock_irqsave() (for all architectures now), support in
reiserfs for quotas and extended attributes (added over Hans Reiser's objections), and lots of
fixes.
The current kernel prepatch from Andrew Morton is 2.6.6-mm4. Recent additions to -mm include the
anon-vma reverse mapping code (see below), a fix for the
"phenomenally broken" ramdisk driver, the reservation of a system call
number for the "kexec" functionality, and lots of fixes.
The current 2.4 prepatch is 2.4.27-pre3, which was released by Marcelo on May 18. Changes
this time around include a JFS update, some driver updates, a big serial
ATA update, and a number of fixes.
Kernel development news
The discussion has been quiet in recent times, but work on replacing the
low-level reverse-mapping virtual memory code in the 2.6 kernel continues.
When we last
looked at the new, object-based reverse mapping ("objrmap") approach, there
were two competing implementations:
- Andrea Arcangeli's anon-vma, which adds
a data structure creating a connection between each physical page and
the virtual memory area (VMA) structures which reference it.
- Hugh Dickins's anonmm, which associates
pages with the top-level memory management ("mm") structure instead.
The two approaches are conceptually similar, but each has its strong and
weak points. Their performance is essentially equivalent. Thus far, there
has not been any sort of spirited debate over which should be included;
most kernel developers, if they have a preference, have kept it to
themselves.
Hugh has been busy over the last few weeks, however, creating a series of
40 patches aimed at slowly moving the reverse mapping code over to the
object-based approach. The first five of those patches, which are
restricted to cleanup and preparatory work, have been merged into the 2.6
mainline. "rmap-10" added anonmm; it was promptly merged into the -mm
tree. This action did not imply that anonmm had been chosen over anon-vma,
however; it was simply the first step in the testing process which would
lead to a final decision.
Hugh's final series of patches (rmap-34 to rmap-40) completes the process
by replacing anonmm with anon-vma; these patches are present in 2.6.6-mm4.
Hugh introduces the patch set by saying:
Judge for yourselves which you prefer. I do think I was wrong to
call anon_vma more complex than anonmm (its lists are easier to
understand than my refcounting), and I'm happy with its vma merging
after the last patch. It just comes down to whether we can spare
the extra 24 bytes (maximum, on 32-bit) per vma for its advantages
in swapout and mremap.
As Hugh notes, anon-vma should have better swapping performance, since its
structures make it easier to find the VMA for a given page. Additionally,
the anonmm code works best when shared anonymous pages have the same
virtual address in each address space that uses them; if a process moves
pages with mremap(), some relatively complicated work must be
performed to make things work. The anon-vma solution does not have that
particular problem.
On the other hand, expanding the VMA
structure is not something which should be done lightly; some loads can use
huge numbers of VMAs, and they must all be located in low memory. That
said, either reverse mapping scheme should free far more low memory than it
consumes; that is, after all, one of the main points behind this entire
exercise.
There still has been no public word on which scheme will be chosen, or when
it might be merged. The current state of affairs suggests, however, that
anon-vma will be the one that goes in unless some sort of major problem
turns up. As for timing: enough major work has already gone into 2.6.7
that it's hard to imagine throwing major VM surgery into the mix. So 2.6.8
is the earliest such a merge could possibly happen. A couple of 2.6
releases after that, the forking of the 2.7 tree might just become a
possibility.
Last week's Kernel Page talked about the
push toward 4K stacks on the i386 architecture. While most of the problems
with the smaller stack size have been worked out, a few remain. Witness,
for example,
this problem report; it would
appear that the 2.6.6 Radeon framebuffer driver is overflowing the 4K
stack.
The problem was quickly narrowed down to a
couple of new fields added to the radeon_regs structure:
    struct radeon_regs {
        ....
        u32 palette[256];
        u32 palette2[256];
    };
If one of these structures is placed on the kernel stack (as happens in the
radeonfb driver), those two arrays, by themselves, take half of the
available space. If that weren't sufficiently annoying, there is the
little fact that those arrays are part of an ongoing development and are
not actually used for anything in 2.6.6.
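The "half of the available space" figure is easy to check. The following is a minimal userspace sketch, not driver code: struct palette_fields is a hypothetical reduction holding only the two new arrays (the other radeon_regs members are omitted), u32 is a stand-in typedef, and the function names are invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t u32;            /* stand-in for the kernel's u32 type */

#define STACK_SIZE_4K 4096u      /* whole kernel stack with the 4K option */

/* Hypothetical reduction of radeon_regs: only the two new fields are
 * present; the real structure's other members are omitted. */
struct palette_fields {
	u32 palette[256];
	u32 palette2[256];
};

/* Bytes consumed by the two palette arrays alone. */
size_t palette_bytes(void)
{
	return sizeof(struct palette_fields);
}

/* Share of a stack of stack_size bytes taken by those arrays, in percent. */
unsigned palette_stack_percent(size_t stack_size)
{
	assert(stack_size > 0);      /* guard against division by zero */
	return (unsigned)(palette_bytes() * 100 / stack_size);
}
```

Two arrays of 256 four-byte entries come to 2048 bytes, so palette_stack_percent(STACK_SIZE_4K) works out to 50 before any other local variable or caller's frame is counted.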
Fixing this particular problem is relatively easy, but this episode has
reawakened interest in finding large stack users automatically. One never
knows when a developer will expand a data structure without realizing that
it is used on the stack in some other place; rather than letting users find
this sort of mistake the hard way, it would be better to look for them
explicitly earlier in the development process. To that end, several
scripts have been posted which seek out large stack users in a compiled
Linux kernel. A quick look at these scripts makes it clear that kernel
code is, by no means, the scariest code out there:
objdump --disassemble "$@" | \
sed -ne '/>:/{s/[<>:]*//g; h; }
/subl\?.*\$0x[^,][^,][^,].*,%esp/{
s/.*\$0x\([^,]*\).*/\1/; /^[89a-f].......$/d; G; s/\(.*\)\n.* \(.*\)/\1 \2/; p; };
/subl\?.*%.*,%esp/{ G; s/\(.*\)\n\(.*\)/Dynamic \2 \1/; p; }; ' | \
sort | \
perl -e 'while (<>) { if (/^([0-9a-f]+)(.*)/) { $decn = hex("0x" . $1);\
if ($decn > 400) { print "$decn $2\n";} } }'
(from a script by Keith Owens and Arjan van
de Ven). Several variants have been posted, most of which are trying to
support multiple architectures. None has yet solved the full problem,
however: finding full call chains whose cumulative stack usage exceeds the
space available. With or without that feature, some sort of stack usage
checker is likely to be merged into the kernel build system before too
long. That should help the developers to trap the most obvious problems
before they find their way into a released kernel.
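To make the unsolved "full call chain" problem concrete, here is a minimal sketch of what such a checker would have to compute. It is not taken from any of the posted scripts; the call graph, function names, and frame sizes are all invented, and the walk assumes there is no recursion.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4
#define NO_CALLEE   (-1)

/* One function in a made-up call graph: its own frame size in bytes
 * and the indices of the functions it calls (NO_CALLEE ends the list). */
struct func_node {
	const char *name;
	unsigned frame_bytes;
	int callees[MAX_CALLEES];
};

/* Worst-case stack use, in bytes, of any call chain starting at fn.
 * Assumes the graph is acyclic (no recursion), so the walk terminates. */
unsigned worst_chain_bytes(const struct func_node *graph, size_t count, int fn)
{
	unsigned deepest = 0;

	assert(fn >= 0 && (size_t)fn < count);
	for (int i = 0; i < MAX_CALLEES && graph[fn].callees[i] != NO_CALLEE; i++) {
		unsigned below = worst_chain_bytes(graph, count, graph[fn].callees[i]);
		if (below > deepest)
			deepest = below;
	}
	return graph[fn].frame_bytes + deepest;
}

/* Invented example graph; names and sizes are illustrative only. */
const struct func_node example_graph[] = {
	{ "ioctl_entry",  300, { 1, 2, NO_CALLEE, NO_CALLEE } },
	{ "set_mode",    2200, { 3, NO_CALLEE, NO_CALLEE, NO_CALLEE } },
	{ "blank",        120, { NO_CALLEE, NO_CALLEE, NO_CALLEE, NO_CALLEE } },
	{ "write_regs",  1800, { NO_CALLEE, NO_CALLEE, NO_CALLEE, NO_CALLEE } },
};
```

In this invented graph no single frame exceeds 4096 bytes, yet the chain ioctl_entry, set_mode, write_regs adds up to 4300. A per-function scan cannot see that; only a walk over the call graph can, and building that graph from a compiled kernel is the hard part.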
In the 2.6 kernel, parameters to loadable modules are set up with the
module_param() macro:
module_param(name, type, perm);
The perm parameter was set aside for the sysfs representation of
this parameter but has, until now, been unused; almost every declared
parameter simply sets it to zero in the 2.6.6 kernel. A new patch has been posted, however, which
makes module parameters in sysfs a reality.
This patch creates a new /sys/module directory; a subdirectory
will be created for each module loaded into the system. For modules which
can be unloaded, a read-only attribute (called refcnt) will be set up
which contains the module's current reference count. There will also be
attributes for every module parameter whose perm value is not
zero; that value will, as expected, set the permissions mask for that
parameter.
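The rule described above can be modeled in a few lines of userspace C. This is only a sketch of the visibility logic as the text states it, not code from the patch; struct param_decl, the helper functions, and the example parameter names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model of a parameter declaration; in the kernel this
 * information comes from module_param(name, type, perm). */
struct param_decl {
	const char *name;
	unsigned perm;       /* permissions mask; 0 means no sysfs attribute */
};

/* A parameter gets a sysfs attribute only if its perm value is non-zero. */
bool param_visible(const struct param_decl *p)
{
	assert(p != NULL);
	return p->perm != 0;
}

/* A visible parameter is writable if any write bit (0222) is set. */
bool param_writable(const struct param_decl *p)
{
	return param_visible(p) && (p->perm & 0222) != 0;
}

/* Number of parameters that would show up in the module's directory. */
size_t count_visible(const struct param_decl *params, size_t n)
{
	size_t visible = 0;

	for (size_t i = 0; i < n; i++)
		if (param_visible(&params[i]))
			visible++;
	return visible;
}

/* Invented parameters for illustration. */
const struct param_decl example_params[] = {
	{ "debug",   0    },     /* perm of zero: stays out of sysfs */
	{ "irq",     0444 },     /* read-only for everybody */
	{ "timeout", 0644 },     /* readable by all, writable by root */
};
```

Under this model, "debug" never appears, "irq" appears read-only, and "timeout" can be changed by root while the module is loaded.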
If the permissions mask allows, module parameters will be writable. In
theory, this will give module authors an easy way to export
administrator-tweakable knobs to user space. It is worth noting, however,
that there is no mechanism for notifying a module that one of its
parameters has been changed. Module authors, thus, will have to be careful
to ensure that their modules will properly detect and respond to changes to
parameters at any time before exporting those parameters in a writable
mode. Even so, this patch represents the tying-up of yet another 2.6 loose
end.
One of the most important tasks in kernel maintenance is not the addition
of new code, but removal of old code that is no longer useful. Unused code
bloats the kernel and, potentially, becomes a breeding ground for bugs and
security problems. Getting that code out of the way helps keep the kernel
cruft level down.
In recent times, the ax has fallen on two subsystems. The first is the InterMezzo filesystem, which has
been removed for 2.6.7. InterMezzo is a distributed filesystem from Peter
Braam and company with a number of interesting ideas, but, apparently, few
users. Maintenance has been lacking, and Mr. Braam finally agreed that it should be removed, noting
"In the past 4 years nobody has supported InterMezzo sufficiently for
it to become successful." The Lustre
filesystem, which is Mr. Braam's current project, appears to be headed for
greater success.
The second is the PC9800 architecture; a patch has been posted which removes
support for it. There have been a few small
objections to this removal, drawing this
response from Alexander Viro:
So are you volunteering to maintain the port? Maintainers are MIA;
the damn thing doesn't compile; all patches it gets are basically
blind ones ("we have that API change, this ought to take care of
those drivers and let's hope that possible mistakes will be caught
by testers"). Considering the lack of testers (kinda hard to test
something that refuses to build), the above actually spells in one
word: "bitrot".
There has been a rather conspicuous shortage of people stepping up to
maintain the PC9800 port, so chances are that it will be going away
soon.
Page editor: Jonathan Corbet