Brief items
The current 2.6 development kernel is 2.6.26-rc8, released by Linus on
June 24. "It hasn't been a week, I know, and this is a pretty small set
of changes since -rc7, but I'm going to be mostly incommunicado for the
next week or so, so I just released what will hopefully be the last
-rc." See the long-format changelog for all the details.
2.6.26-rc7 was released on
June 20; it contains another set of fixes and support for some new
graphics cards.
As of this writing, no patches have been merged into the mainline git
repository since 2.6.26-rc8.
The current stable 2.6 kernel is 2.6.25.9, released on June 24. It
contains a small set of fixes, a couple of which have security
implications. 2.6.25.8 was
released on June 21 with about a dozen fixes.
Kernel development news
The problem with leaving everything tweakable is that you're asking
users to make choices about things but not giving them the
information they need to make those choices. Whether you get a
power saving from hard drive spindown depends on whether the drive
is idle for long enough to save the power you'll spend spinning it
back up. Get it wrong and you'll be putting your drive under extra
load, reducing performance and consuming more power than you were
to begin with.
-- Matthew Garrett
If somebody wants to play with it, go wild. I didn't do
"change_bit()", because nobody sane uses that thing anyway. I
guarantee nothing. And if it breaks, nobody saw me do anything.
You can't prove this email wasn't sent by somebody who is good at
forging smtp.
-- Linus Torvalds
Look at it this way: there is no way in which the reviewer of this
patch (ie: me) can work out why this function exists. Hence there
will be no way in which future readers of this code will be able to
work out why this function exists either. This is bad. These
things should be described in code comments and in the changelog
(whichever is most appropriate).
-- Andrew Morton
A position statement on the distribution of closed-source kernel modules
has been issued and signed by a long list of developers. "We, the
undersigned Linux kernel developers, consider any closed-source
Linux kernel module or driver to be harmful and undesirable. We have
repeatedly found them to be detrimental to Linux users, businesses, and
the greater Linux ecosystem. Such modules negate the openness,
stability, flexibility, and maintainability of the Linux development
model and shut their users off from the expertise of the Linux
community. Vendors that provide closed-source kernel modules force
their customers to give up key Linux advantages or choose new vendors.
Therefore, in order to take full advantage of the cost savings and
shared support benefits open source has to offer, we urge vendors to
adopt a policy of supporting their customers on Linux with open-source
kernel code." Click below for associated information and the names
of the signatories.
Full Story (comments: 47)
James Bottomley has posted an essay on graphics drivers on the Linux
Foundation site. "For Linux, the best way of demonstrating user
satisfaction objectively is with the kerneloops project, which tracks
reported problems with various kernels (an oops is something that's
equivalent to a panic on Unix or blue screen on windows). For instance
looking at the recently released 2.6.25 kernel one can see that both the
binary Nvidia driver and binary ATI firegl driver account for positions
in the top 15 oopses. If one follows the history, one finds that the
binary drivers are always significant contributors to this list, whereas
open source drivers appear and disappear (corresponding to people
actually seeing the bugs and fixing them). This provides objective
support for a significant kernel developer contention that it's harder to
get fixes for binary drivers. The other bright spot is that the Intel
graphics drivers rarely figure at all in the list also showing that if
you want graphics to 'just work' then Intel is the one to choose."
By Jonathan Corbet
June 23, 2008
The merge window phase of the kernel development cycle is a hectic time.
Over a period of about two weeks, between 5,000 and 10,000
changesets find their way into the mainline git repository. Simply
managing that many patches would be hard enough, but the job is made more
complicated by the fact that these changesets are not all independent of
each other. The
first changes to be merged can change the code base in ways that cause
later patches to fail to apply. So merge windows have traditionally
required maintainers to rework their queued patches to resolve
conflicts which arise as other trees are merged. Given the tight time constraints (patches which aren't ready
when the merge window closes generally sit out until the next cycle
starts), this integration process has been known to put a fair amount of
pressure on subsystem maintainers.
Another person feeling the stress was Andrew Morton; one of his many jobs
was to bash subsystem trees together in his -mm releases. That took a lot
of his time and didn't really solve the problem in the end; much of the
work which shows up in -mm isn't necessarily intended for the next
development cycle. The end result of all this is that each merge window
brought together large amounts of code which had never been integrated
before.
Back in February, the linux-next tree was announced as a way to help ease
some of these problems. We are now nearing the end of the first full
development cycle to use linux-next, so it's worth taking a look to see how
it is working out.
The idea behind this tree is relatively simple. Linux-next maintainer
Stephen Rothwell keeps a list
of trees (maintained with git or quilt) which
are intended to be merged in the next development cycle. As of this
writing, that list contains 95 trees, all full of patches aimed at 2.6.27.
Once a day, Stephen goes through the process of applying these trees to the
mainline, one at a time. With each merge, he looks for merge conflicts and
build failures. The original
plan for linux-next stated that trees causing conflicts or build
failures would simply be dropped. In reality, so far, Stephen usually
takes the time to figure out the problem; he'll then fix up or drop an
individual patch to make everything fit again.
When this process is done, he releases the result as the linux-next tree
for the day. Others then grab it and perform build testing on it; some
people even boot and run the daily linux-next releases. All this results
in a steady stream of problem reports, small fixes, patches moving from one
tree to another, and so on - various bits of integration work required to
make all of the pieces fit together nicely.
There is an interesting sort of implicit hierarchy in the ordering of the
trees. Subsystem trees which are merged early in the process are less
likely to run into conflicts than those which come later. When two trees
do come into conflict, it's the owner of the later tree - the one which
actually shows the conflict - who feels the most pressure to fix things
up. The history so far, though, shows that there has been very little in
the way of finger-pointing when conflicts arise, as they do almost every
day. All of the developers understand that they are working on the same
kernel, and they share a common interest in solving problems.
So, thus far, linux-next appears to be functioning as intended. It is
serving as an integration point for the next kernel and helping to get
many of the merging problems out of the way ahead of time. One aspect of
this whole system remains untested, though: the movement of patches from
linux-next into the mainline. As things stand now, there is no automatic
movement between the trees; instead, maintainers will send their pull
requests directly to Linus as always. If Linus refuses to merge certain
trees, or if he merges them in an order different from their ordering in
linux-next, integration problems could return. In the end, it seems like
linux-next will have to drive the final integration process more than is
anticipated now, but it will probably take a few development cycles to
figure out how to make it all work.
Meanwhile, anybody who is interested in 2.6.27 can, to a great extent, run
it now by grabbing linux-next. This tree has clarified one aspect of the
development process: the 2-3 month "development cycle" run by Linus
is, in fact, just the tip of the kernel development iceberg. It is the
final integration and stabilization stage. Linux-next nearly doubles the
length of the visible development cycle by assembling the next kernel long
before Linus starts working on it. And even linux-next only comes into
play toward the end of a patch's life.
In the past, Linus has pointedly worked to avoid overlapping the
development and stabilization phases of the development cycle. There was
no development tree at all for almost a year while 2.4 was beaten into
reasonable shape. This separation was maintained out of a simple fear that
an open development tree would distract developers from the more important
task of finding and fixing bugs in the current stable release.
That separation is a thing of the past now; there are literally dozens of
development trees which are open for business at all times. That can only
be worrisome to those who are concerned about the quality of kernel
releases; why should developers concern themselves with 2.6.26 bugs when 2.6.27 is being
assembled and 2.6.28 is already on the radar? Whether such concerns are
valid is likely to be a matter of ongoing debate.
Meanwhile, however, linux-next appears to have settled in as a long-term
feature of the kernel development landscape. It is serving its purpose as
a place to find and resolve integration problems; it has also had the
effect of taking much of that integration work off of Andrew Morton's
shoulders. And that, in turn, should free him to spend more time trying to
get developers to fix all those bugs.
(See the linux-next
wiki for more information on how to work with this tree.)
By Jonathan Corbet
June 25, 2008
On June 23, HP
announced that
it was releasing the source for the "Tru64
Advanced Filesystem" (or AdvFS) under version 2 of the GPL. This is,
clearly, a large release of code from HP. What is a bit less clear
is what the value of this release will be for Linux. In the end, that
value is likely to be significant, but it will probably be realized in
relatively indirect and difficult-to-measure ways.
AdvFS was originally developed by Digital Equipment Corporation for its
version of Unix; HP picked it up when it acquired Compaq, which had
acquired DEC in 1998. This filesystem offers a number of the usual
features. It is intended to be a high-performance filesystem, naturally.
Extent-based block management and directory indexes are provided.
It does journaling for fast crash recovery. There is an undelete feature.
AdvFS is also designed to work in clustered environments.
Much of the thought that went into AdvFS was concerned with avoiding the
need to take the system down. There is a snapshot feature which
can be used to make consistent backups of running systems. Defragmentation
can be done online. There is a built-in volume management layer which
allows storage devices to be added to (or removed from) a running
filesystem; files can also be relocated across devices. The internal
volume manager can perform striping of files across devices, but nothing
more advanced than that; AdvFS will happily work on top of a more capable
volume manager, though.
There are a few things which AdvFS does not have. There is no checksumming
of data, and, thus, no ability to catch corruption. Online filesystem
integrity checking does not appear to be supported. The maximum filesystem
size (16TB) probably seemed infinite in the early 1990s, but it's starting
to look a little tight now.
In general, AdvFS looks like a filesystem which was very nice ten or
fifteen years ago, but it has little that is not either available in
Linux now, or
in the works for the near future. And AdvFS doesn't even work with Linux -
no porting effort has been made, and it's not clear that one will be made.
So is this release just another dump of code being abandoned by its
corporate owner?
One could make a first answer by saying that, even if this were true, it
would still be welcome. If a company gives up on a piece of code, it's far
preferable to put it out for adoption under the GPL than to let it rot
until nobody can find it anymore. But there may well be value in this
release.
Even if there is no point in trying to make it work under Linux, the AdvFS
code is the repository of more than a decade of experience of making a
high-end filesystem work in a commercial environment. Your editor had
stopped working with DEC systems by the time AdvFS came out, but the word
he heard from others is that the early releases were, shall we say,
something that taught
administrators about the value of frequent backups. But after a few major
releases, AdvFS had stabilized into a fast, solid, and reliable
filesystem. The current code will embody all of the hard lessons that were
learned in the process of getting to that point.
Chris Mason, who is currently working on the Btrfs filesystem, puts it this way:
The idea is that well established filesystems can teach us quite a
lot about layout, and about the optimizations that were added in
response to customer demand. Having the code to these
optimizations is very useful.
Having that code licensed under the GPL is especially useful: any code
which is useful in its current form can be pulled quickly into Linux. And,
even when the code itself cannot be used, the ideas that it embodies can be
borrowed without fear. And that is exactly
what HP was hoping to encourage with this release:
In case its not clear, this is a GPLv2 technology release, not an
actual port to Linux. We're hoping that the code and documentation
will be helpful in the development of new file systems for Linux
that will provide similar capabilities, and perhaps used to make
tweaks to existing file systems.
And that would appear to be likely to happen. Over time, the best ideas
and experience from AdvFS should find their way into the filesystems
supported by Linux, even if AdvFS, itself, never becomes one of those
filesystems. So HP has made a significant contribution to the kernel
development process, one which will probably never show up in the changeset
counts and other easily-obtained metrics.
(Those interested in learning more about AdvFS would be well advised to
grab the documentation tarball from the AdvFS sourceforge page. The
"Hitchhiker's guide" is a good starting place, though, at 229 pages, it's
not for hitchhikers who prefer to travel light.)
By Jake Edge
June 25, 2008
Freezing seems to be on the minds of some kernel hackers these days;
whether it is the northern summer or the southern winter that is causing
it is unclear. Two recent patches posted to linux-kernel look at freezing
- essentially suspending - two different pieces of the kernel: filesystems
and containers. For containers, it is a step along the path to being able
to migrate running processes elsewhere, whereas for filesystems it will
allow backup systems to snapshot a consistent filesystem state. Other than
conceptually, the patches have little to do with each other, but each is
fairly small and self-contained, so a combined look seemed in order.
Takashi Sato proposes taking
an XFS-specific feature and moving it into the generic filesystem code.
The patch would provide an ioctl() for suspending write access to a
filesystem (freezing it), along with a thawing operation to resume writes.
For backups that snapshot the state of a filesystem or otherwise operate
directly on the block device, this can ensure that the filesystem is in a
consistent state.
Essentially, the patch just exports the freeze_bdev() kernel
function in a user-accessible way. freeze_bdev() locks a
filesystem into a consistent state by flushing the superblock and syncing
the device. The patch also adds tracking of the frozen
state to the struct block_device state field. In its simplest
form, freezing or thawing a filesystem would be done as follows:
ioctl(fd, FIFREEZE, 0);
ioctl(fd, FITHAW, 0);
Here, fd is a file descriptor for the mount point, and the final argument is ignored.
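For the curious, here is a slightly fuller sketch of how a backup
application might use this interface. It is a minimal sketch only: it
assumes the FIFREEZE and FITHAW definitions from Sato's patch end up in
&lt;linux/fs.h&gt;, and the mount point path is invented for the example.

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FIFREEZE and FITHAW, assuming Sato's patch */

int main(void)
{
    /* "/mnt/data" is a hypothetical mount point for the filesystem */
    int fd = open("/mnt/data", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    if (ioctl(fd, FIFREEZE, 0) < 0) {   /* block writers, sync the device */
        perror("FIFREEZE");
        return 1;
    }

    /* ... snapshot the underlying block device here ... */

    if (ioctl(fd, FITHAW, 0) < 0)       /* let writes resume */
        perror("FITHAW");

    close(fd);
    return 0;
}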
In another part of the patchset, Sato adds a timeout value as the argument
to the ioctl(). For XFS compatibility - though, courtesy of a
patch by David Chinner, the XFS-specific ioctl() is
removed - an argument value of 1 means that no timeout
is set. A value of 0 for the argument also means there is no timeout,
but any other value is treated as a pointer to a timeout value in seconds.
It would seem that removing the XFS-specific ioctl() would break
any applications that currently use it anyway, so preserving the special
meaning of the argument value 1 is of somewhat dubious value.
If the timeout occurs, the filesystem will be automatically thawed. This
is to protect against some kind of problem with the backup system. Another
ioctl() flag, FIFREEZE_RESET_TIMEOUT, has been added so
that an application can periodically reset its timeout while it is
working. If it deadlocks, or otherwise fails to reset the timeout, the
filesystem will be thawed. Another FIFREEZE_RESET_TIMEOUT after
that occurs will return EINVAL so that the application can
recognize that it has happened.
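Putting the pieces together, a backup application might drive the timeout
as in the following sketch. Note that FIFREEZE_RESET_TIMEOUT and the
argument conventions come from the posted patch, not from any released
kernel, and copy_next_chunk() is an invented stand-in for the real backup
work.

/* fd is an open descriptor for the mount point, as above */
long timeout = 30;               /* auto-thaw after 30 seconds if we stall */

ioctl(fd, FIFREEZE, &timeout);   /* values other than 0 and 1 are treated
                                    as a pointer to a timeout in seconds */

while (copy_next_chunk() > 0) {  /* copy_next_chunk() is hypothetical */
    /* Push the deadline back while the backup is making progress; this
       fails with EINVAL if the timeout has already thawed the filesystem. */
    if (ioctl(fd, FIFREEZE_RESET_TIMEOUT, &timeout) < 0)
        break;
}

ioctl(fd, FITHAW, 0);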
Moving on to containers,
Matt Helsley posted a patch
which reuses
the software suspend (swsusp) infrastructure to implement freezing of all
the processes in a control group (i.e. cgroup).
This could be used now to
checkpoint and restart tasks, but eventually could be used to migrate tasks
elsewhere entirely
for load balancing or other reasons. Helsley's patch set is a forward port
of work originally done by Cedric Le Goater.
The first step is to make the freeze option, in the form of the
TIF_FREEZE flag, available to all architectures. Once that is
done, moving two functions, refrigerator() and
freeze_task(), from the power management subsystem to the new
kernel/freezer.c file makes freezing tasks available even to
architectures that don't support power management.
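To see where refrigerator() fits in, it may help to look at the pattern
used by freezable kernel threads; this is a minimal sketch of the usual
2.6.26-era API, independent of Helsley's patch.

#include <linux/kthread.h>
#include <linux/freezer.h>
#include <linux/sched.h>

static int example_thread(void *unused)
{
    set_freezable();    /* kernel threads are unfreezable by default */

    while (!kthread_should_stop()) {
        /* If TIF_FREEZE has been set on this task, try_to_freeze()
           enters refrigerator() and returns only after thawing. */
        try_to_freeze();

        /* ... do one unit of work, then sleep ... */
        schedule_timeout_interruptible(HZ);
    }
    return 0;
}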
As is usual for cgroups, controlling the freezing and thawing is done
through the
cgroup filesystem. Adding the freezer option when mounting will
allow access to each container's freezer.state file. This can be
read to get the current freezer state or written to change it as follows:
# cat /containers/0/freezer.state
RUNNING
# echo FROZEN > /containers/0/freezer.state
# cat /containers/0/freezer.state
FROZEN
It should be noted that it is possible for tasks in a cgroup to be busy
doing something that will not allow them to be frozen. In that case, the
state would be FREEZING. Freezing can then be retried by writing FROZEN
again, or canceled by writing RUNNING. Moving the offending tasks out of
the cgroup will also allow the cgroup to be frozen. If the state does
reach FROZEN, the cgroup can be thawed by writing RUNNING.
In order for swsusp and cgroups to share the refrigerator(), it is
necessary to ensure that frozen cgroups do not get thawed when swsusp is
waking up the system after a suspend.
The last patch in the set ensures that thaw_tasks() checks for a
frozen cgroup before thawing, skipping over any that it finds.
There has not been much in the way of discussion about the patches on
linux-kernel, but an ACK from Pavel Machek would seem to be a good sign.
Some comments by Paul Menage, who developed
cgroups, also indicate interest in seeing this feature merged.
Patches and updates
Kernel trees
Core kernel code
Development tools
Device drivers
Documentation
Filesystems and block I/O
Janitorial
Memory management
Networking
Security-related
Virtualization and containers
Benchmarks and bugs
Page editor: Jonathan Corbet