The current 2.6 prepatch is 2.6.5-rc3, which was announced by Linus on March 29.
Additions this time around include lots of architecture
updates, an AGPGART update, a few networking tweaks, an ACPI update, and
various fixes. "Nothing earth-shattering," says Linus; things
seem to be slowly settling down toward a real 2.6.5 release. See the
long-format changelog for the details.
Linus's BitKeeper repository, as of this writing, contains an ALSA update,
some PowerPC updates, and various other fixes.
The current tree from Andrew Morton is 2.6.5-rc3-mm2. Recent additions to -mm include
some architecture updates, more scheduler work, a reworked laptop mode
patch, support for huge serial ATA requests (see below), and lots of fixes.
The current 2.4 prepatch is 2.4.26-rc1, announced by Marcelo on March 28.
Previously, 2.4.26-pre6 had come out on
March 25. Recent changes include lots of fixes and support for
Intel's AMD64-like IA32e architecture.
Kernel development news
The LWN Kernel Page has included several articles over the last month on the work to
improve the scalability of the virtual memory subsystem by eliminating the
reverse mapping chains currently used by the 2.6 kernel. That work
reached a milestone on March 26, when Andrea Arcangeli released
2.6.5-rc2-aa3 with more virtual memory changes and a comment:
Ok, this seems feature complete. Both nonlinear swapping and
prio_tree are available now. I believe
objrmap-core+anon-vma+prio_tree can be merged into mainline after a
bit more of testing, certainly they looks good enough for -mm.
Andrea raised the issue again when he released 2.6.5-rc3-aa1. Andrew Morton finally replied at that point:
It's a bit early for that, I feel. I'd like to see thing settle
down a little more at your end first, then see that Rajesh, Hugh
and if possible Ingo have had a good go through everything.
And then there are the mechanics of swallowing a
largely-undocumented 4,600-line patch which touches 60 files and
tosses 30-odd rejects across 16 files.
It is not surprising that Andrew would hesitate to rush into merging
major virtual memory changes in the middle of a stable kernel series. Most
2.6 users will, one imagines, be relieved to see that some caution is being
applied here - regardless of the eventual value of this work. Andrea,
however, is in more of a hurry: "Keep
in mind this whole thing is going in production in a matter of a week, so
please test and review now." Those words suggest that SUSE
Linux 9.1 will include the new VM code. One can only hope that
Andrea's high level of confidence in that code is justified.
Free software hackers often find themselves cloning a large tree full of
source files; with a duplicate tree, it is easy to see which files have
been changed and to generate patch files. Creating such a tree can be easy,
since cp can populate the new tree with hard links rather than copies:
cp -rl old-tree new-tree
This technique works well if you use a tool (emacs, say) which moves files
aside before rewriting them. By moving the file, emacs breaks the link and
leaves the original copy (in the old tree) unchanged. If, however, the
tool rewrites the file in place (as vi tends to do), the file, as seen in
both trees, will be changed.
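The difference between the two editing styles can be reproduced in a few lines. What follows is a minimal sketch, not a model of any particular editor: the function names are made up for illustration, and "move aside" is reduced to a rename followed by writing a fresh file.

```python
import os
import tempfile

def edit_move_aside(path, text):
    """Move the old file aside, then write a new file under the old name."""
    os.replace(path, path + "~")   # the old inode now lives under "path~"
    with open(path, "w") as f:     # this creates a brand-new inode
        f.write(text)

def edit_in_place(path, text):
    """Rewrite the existing inode; every hard link sees the new contents."""
    with open(path, "w") as f:
        f.write(text)

def read(path):
    with open(path) as f:
        return f.read()

def demo():
    with tempfile.TemporaryDirectory() as d:
        old = os.path.join(d, "old.c")
        new = os.path.join(d, "new.c")
        with open(old, "w") as f:
            f.write("original\n")

        os.link(old, new)                  # what "cp -l" does for each file
        edit_move_aside(new, "patched\n")
        after_move_aside = read(old)       # the old tree is untouched

        os.remove(new)
        os.link(old, new)                  # link the two names again
        edit_in_place(new, "clobbered\n")
        after_in_place = read(old)         # the old tree changed as well

        return after_move_aside, after_in_place
```

Calling demo() returns the contents of the "old tree" file after each style of edit; only the in-place rewrite leaks through the link.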
As a solution to this problem, Jörn Engel has been working on a patch which implements "cowlinks." The idea
behind a COW (copy-on-write) link is that, if the file linked to is written
to, a copy will be made (thus breaking the link) and the write will be
performed on the copy. With this capability, somebody wishing to duplicate
and modify a tree of files could use COW links; the duplicate files would
share the same blocks on disk until one was modified. And it would all
work regardless of the tool being used to perform the modifications.
In fact, COW links could be used for any copy operations within the same
filesystem. The result would be faster copies and, perhaps, substantial
savings of disk space.
The current cowlink patch does not actually implement this behavior,
however. It implements a COW bit in the inode structure, but, rather than
actually perform the copy, it simply fails any attempt to write a file with
more than one link. User space is then expected to notice the error and do
the right thing. This is not the long-term planned behavior; from a
comment in the code:
Yes, this breaks the kernel interface and is simply wrong. This is
intended behaviour, so Linus will not merge the code before it is
complete. Or will he?
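What "the right thing" might look like in user space can be sketched as follows. This is an assumption-laden illustration: the article does not say which error the patch returns, so the sketch does not wait for a failed write and instead checks the link count up front. The names break_link() and cow_write() are hypothetical.

```python
import os
import shutil
import tempfile

def break_link(path):
    """Give `path` a private copy of its data if the inode has other links.

    Returns True if a copy was made, False if the file was not shared.
    """
    if os.stat(path).st_nlink < 2:
        return False
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)  # same filesystem as path
    os.close(fd)
    shutil.copy2(path, tmp)    # copy data and metadata into the new inode
    os.replace(tmp, path)      # the name now points at the private copy
    return True

def cow_write(path, text):
    """Write to `path` without disturbing other links to the same inode."""
    break_link(path)
    with open(path, "w") as f:
        f.write(text)
```

Note that the file gets a new inode number when the link is broken, which is exactly the behavior at issue in the discussion of how real COW links should work.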
The full behavior has not yet been implemented because it requires some
tricky filesystem-level programming. There is also the issue that the
right behavior for COW links has not yet been worked out. One obvious
implementation would have COW links behave just like regular, "hard" links,
with the file being truly copied when the first write is done. With that
approach, however, the file will change its inode number after the writing
application has opened it. That is just the sort of anomalous,
nonstandard behavior that can break applications in strange and unexpected
ways.
An alternative would be for two COW-linked files to have separate inode
numbers from the beginning, even though they share the same on-disk data.
If COW links are implemented this way, no application will notice when the
link is broken. What will break, however, is any application which
depends on inode numbers to detect identical files. Recursive diffs will
be much slower, "du" will give wrong numbers, and tar could do the wrong
thing. Fixing all of these applications would require the addition of a
nonstandard system call and fixing the programs to use it.
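The inode-number test that such tools rely on looks roughly like the sketch below. It is an illustration only: the helper names are invented, and the accounting uses st_size for simplicity where a real "du" counts allocated blocks.

```python
import os
import tempfile

def same_file(path_a, path_b):
    """Two names refer to one file if device and inode numbers both match."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)

def apparent_usage(paths):
    """Sum file sizes, counting each (device, inode) pair only once."""
    seen = set()
    total = 0
    for p in paths:
        st = os.stat(p)
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue           # already counted through another link
        seen.add(key)
        total += st.st_size
    return total
```

If two COW-linked files carried distinct inode numbers while sharing their blocks, same_file() would report them as different and apparent_usage() would count the shared data twice.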
Linus has made his opinion known:
I think the correct thing to do is to just admit that cowlinks
aren't POSIX, and instead see the inode number as a way to see
whether the link has been broken or not. Ie just accept the inode
number potentially changing.
That opinion makes it likely that development will go in that direction,
but, until the code shows up, nobody knows for sure.
Users of serial ATA drives on Linux will be familiar with Jeff Garzik's
"libata" driver, which provides solid support for those drives with several
controllers. Jeff recently posted a patch
which has the potential to make SATA users happier; with this patch, libata
will use the "LBA48" mode, which can perform transfers of up to 32MB in
length. Says Jeff:
With this simple patch, the max request size goes from 128K to
32MB... so you can imagine this will definitely help performance.
Throughput goes up. Interrupts go down. Fun for the whole family.
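The 128K and 32MB figures can be checked with a little arithmetic. The sketch below assumes 512-byte sectors and the ATA convention that a sector count of zero means the largest possible count: 8 bits of sector count for the older addressing mode, 16 bits for LBA48. The function name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: traditional 512-byte sectors

def max_transfer_bytes(count_bits, sector_bytes=SECTOR_BYTES):
    """Largest single transfer for a sector-count field of the given width.

    A count of zero is taken to mean 2**count_bits sectors.
    """
    return (2 ** count_bits) * sector_bytes

old_limit = max_transfer_bytes(8)     # 256 sectors
lba48_limit = max_transfer_bytes(16)  # 65536 sectors

print(old_limit // 1024, "KB")
print(lba48_limit // (1024 * 1024), "MB")
```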
Interestingly, the whole family was not entirely thrilled by the idea. The
problem is latency: most SATA drives will take the better part of a second
to perform a 32MB transfer, during which no other requests are being
processed. Several people complained, saying that a 32MB limit is far too
high, and that, in any case, the performance benefits of transfers above
around 1MB are minimal at best. Jeff's explanation that, in reality, transfers would
be limited to 8MB with the current libata driver did little to slow the
debate.
The issue being debated is not whether 32MB transfers could create latency
problems; everybody agrees on that point. The difference of opinion is
over where the decision on transfer sizes should be made. A device
driver's job, according to Jeff, is to make the full capabilities of the
device available to the kernel without imposing arbitrary limits. He would
rather see the block layer deal with maximum transfer size issues. Jens
Axboe, the maintainer of the block layer, responds that the block layer has no idea of
the performance characteristics of any individual device, while the driver
does. The driver, thus, is in the best position to make decisions about
maximum transfer sizes.
In truth, the driver doesn't know the right number, either; it can depend
on individual drives, the controller being used, etc. As a result,
the final outcome looks like it will involve some sort of adaptive, dynamic
tuning. The block layer will track the execution time of requests and
note when that time gets to be too long; at that point, it will have the
information needed to put a lid on request size. The same timing
information could also be used to tweak the maximum tagged command queueing
depth (the number of requests which can be fed simultaneously to the
drive), since a number of similar issues come up there.
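No code for this tuning has been posted, so the following is only a guess at the general shape of such a scheme. The class name, the halve-and-double policy, the 100ms latency target, and the 50MB/s drive are all assumptions chosen for illustration, not anything proposed on the list.

```python
MB = 1024 * 1024

class RequestSizeTuner:
    """Cap request size based on how long completed requests took."""

    def __init__(self, hw_max_bytes, target_seconds, min_bytes=128 * 1024):
        self.hw_max = hw_max_bytes    # what the driver says the device can do
        self.target = target_seconds  # longest acceptable request time
        self.min = min_bytes          # never shrink below this floor
        self.limit = hw_max_bytes     # start with the full capability

    def record(self, nbytes, seconds):
        """Feed back one completed request; return the new size limit."""
        if seconds > self.target:
            # Too slow: halve the cap, but stay above the floor.
            self.limit = max(self.min, self.limit // 2)
        elif nbytes >= self.limit and seconds < self.target / 2:
            # A full-size request finished comfortably: allow larger ones.
            self.limit = min(self.hw_max, self.limit * 2)
        return self.limit

def simulate(throughput_bytes_per_s, target_seconds, rounds=10):
    """Issue maximum-size requests against a drive of constant throughput."""
    tuner = RequestSizeTuner(32 * MB, target_seconds)
    for _ in range(rounds):
        size = tuner.limit
        tuner.record(size, size / throughput_bytes_per_s)
    return tuner.limit
```

With a simulated 50MB/s drive, a 32MB request takes 0.64 seconds; simulate(50 * MB, 0.1) steps the limit down through 16MB and 8MB before settling at 4MB, where each request takes 0.08 seconds. A much faster device would keep the full 32MB limit.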
Page editor: Jonathan Corbet