The current 2.6 kernel is 2.6.7, which was announced
by Linus on June 15. Changes
since the last release candidate include a fix for the latest denial of
service vulnerability (see below), an NTFS update, some more CPU frequency
controller work, and lots of fixes. The biggest changes since 2.6.6
include scheduling domains, a big rework of
the reverse-mapping VM code, filtered waitqueues, the removal of the
InterMezzo filesystem, quota and extended attribute support in reiserfs, a
new API for NUMA systems, the removal of IDE tagged command queueing
support, and the usual pile of fixes. See the changelog
for the details.
Linus's BitKeeper repository contains no patches beyond 2.6.7 as of this
writing.
The current tree from Andrew Morton is 2.6.7-rc3-mm2. Recent additions to -mm include
ext3 resizing support (see below), an O_NOATIME option to
open(), and various fixes.
The current 2.4 prepatch is 2.4.27-pre6, which was released on June 15. It
includes the FPU denial of service fix, of course, along with some
architecture updates, DVD-RW write support, and a fair number of fixes.
Kernel development news
This is all part of what responsible release management is about.
I was the junior whiz kid in professional release management teams
before starting Namesys. I listened to my elders and learned from
them. My standards for professional conduct in this arena are
higher than yours as a result of that. You are a bunch of young
kids who lack professional experience in release management. That
is ok, but don't get aggressive about it.
-- Hans Reiser
The problem was initially reported as a gcc bug. If you execute this code:
    static void Handler(int ignore)
    {
        __asm__ __volatile__ ("fsave %0\n" : : "m"(fpubuf));
        __asm__ __volatile__ ("frstor %0\n" : : "m"(fpubuf));
    }
in a signal handler, the system (or, at least, the CPU that was running the
code) will freeze up hard. Ways of locking up the system from an
unprivileged user-space program are generally considered to be bad news;
they also, in general, are not seen as compiler bugs. A bit of digging
turned up the real problem, and the latest kernel denial of service
vulnerability was found.
In theory, the fsave instruction above saves the floating-point unit
(FPU) status into the fpubuf array; the subsequent frstor
should simply restore the same state back into the FPU. Unfortunately, the
above code is incorrect; the assembly instructions should read
"m"(*fpubuf) to actually store the state into the fpubuf
array. The code, as written, restores from the wrong address, corrupting
the state of the FPU and, in particular, setting some exception flags.
FPU exceptions do not result in immediate kernel traps; instead, the trap
happens when the next floating-point instruction is executed. As it happens,
the kernel checks when a signal handler returns and, if that handler has
used any floating-point instructions, the kernel performs an fwait
instruction to ensure that the last operation is complete. That fwait
causes the floating-point exception left pending by the corrupt restore to be
delivered as a kernel trap.
The kernel has a way of dealing with floating-point traps; it saves the FPU
state and queues up a floating-point exception signal for the current
process. It also sets the TS ("task switched") processor flag to indicate
that the FPU state may be other than expected. At that point, it returns
to the place where the exception occurred.
Normally, as part of returning from the trap, the kernel would simply
deliver the floating-point exception signal to user space and get on with life. But, in
this case, the kernel is returning back to kernel space, and back to the
same fwait instruction that caused the problem in the first
place. That instruction sees the TS flag and generates another trap. The
handler for this trap knows just what to do in response to a TS flag; it
restores the saved FPU state and returns. The saved FPU state is, however,
the corrupted state which was in effect before the first attempt to execute
fwait. So, at this point, the loop is closed and a new
floating-point trap will be generated. This will go on for a while.
The fix is relatively straightforward, once
the problem is understood. The kernel simply clears any pending exceptions
before executing fwait, and the problem goes away. All that is
left is the updating and rebooting of large numbers of vulnerable systems.
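The trap loop and its fix can be illustrated with a small state machine. What follows is a toy user-space model in C, not kernel code: the structure, its fields, and the run_fwait() function are all invented for illustration, and the model assumes that the TS flag is examined before any pending exception, as the description above suggests.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the FPU-related state discussed above (all names hypothetical). */
struct toy_cpu {
    bool pending_exception; /* exception flag set in the live FPU state */
    bool saved_exception;   /* exception flag in the state saved by the kernel */
    bool ts;                /* TS ("task switched") processor flag */
};

/*
 * Model the kernel executing fwait after a signal handler returns.
 * Returns the number of traps taken before fwait completes, or -1 if
 * max_traps traps were taken without fwait ever completing.
 */
int run_fwait(struct toy_cpu *cpu, bool clear_first, int max_traps)
{
    int traps = 0;

    assert(max_traps > 0);
    if (clear_first)
        cpu->pending_exception = false; /* the fix: clear pending exceptions */

    while (traps < max_traps) {
        if (cpu->ts) {
            /* TS trap: restore the saved (corrupt) FPU state, clear TS. */
            cpu->pending_exception = cpu->saved_exception;
            cpu->ts = false;
            traps++;
        } else if (cpu->pending_exception) {
            /* FP exception trap: save the state, queue the signal, set TS. */
            cpu->saved_exception = true;
            cpu->pending_exception = false;
            cpu->ts = true;
            traps++;
        } else {
            return traps; /* nothing pending: fwait completes */
        }
    }
    return -1; /* still trapping when the limit was reached */
}
```

Starting from a state with pending_exception set, calling run_fwait() with clear_first false alternates between the two traps until the limit is hit, while calling it with clear_first true completes without taking any trap at all.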
(Thanks to Sergey Vlasov, whose analysis of
the problem made this article much easier to write.)
One of the patches which slipped into 2.6.7-rc3-mm2 is one by Andreas
Dilger and others which makes it possible to resize a running ext3
filesystem on the fly. This patch has been shipped with Fedora kernels for
a little while, but has not seen a lot of wider use. That could change, of
course, if the resize patch finds its way into the mainline.
The resize patch is conceptually quite simple. It simply adds one or more
block groups which make use of extra space which, one hopes, is sitting
there idle at the end of the existing filesystem. Once the block groups
are hooked into the filesystem data structures, a simple ioctl()
call or remount will make the space available. Behind this apparent
simplicity, of course, is a significant amount of code which makes the
resize operation happen on a modern, complex filesystem in a robust
manner.
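As a rough illustration of what "one or more block groups" means in terms of space, the following C sketch estimates how many groups a given expansion would add. It assumes the common layout in which each group's block bitmap occupies a single block, so that a group covers eight times the block size in blocks; it ignores the first data block offset, any minimum size for the last group, and space needed for group descriptors. The function names are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks covered by one group: one bit per block in a one-block bitmap. */
uint64_t toy_blocks_per_group(uint32_t block_size)
{
    assert(block_size > 0);
    return 8ULL * block_size;
}

/* Number of block groups needed to cover the given number of blocks. */
uint64_t toy_group_count(uint64_t blocks, uint32_t block_size)
{
    uint64_t per_group = toy_blocks_per_group(block_size);

    return (blocks + per_group - 1) / per_group; /* round up */
}

/* Number of groups that growing from old_blocks to new_blocks would add. */
uint64_t toy_groups_added(uint64_t old_blocks, uint64_t new_blocks,
                          uint32_t block_size)
{
    assert(new_blocks >= old_blocks);
    return toy_group_count(new_blocks, block_size) -
           toy_group_count(old_blocks, block_size);
}
```

Under these assumptions, a filesystem with 4096-byte blocks has 32768 blocks (128MB) per group, so growing it from 1GB to 2GB would add eight groups.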
People wanting to try out resizing will need a few things:
- A kernel (such as 2.6.7-rc3-mm2) with the online resize patch.
- A patch to e2fsprogs to make use of the resize capability; it is
available from the ext2resize SourceForge download area.
- Free disk space into which the filesystem can expand. Usually this
means that the filesystem should live in a device mapper partition which
can be expanded as well.
- A very good backup of your filesystem.
This patch and its associated documentation (or lack thereof) still require
some work before being ready for widespread deployment. Once they get
there, however, life should get easier for system administrators who,
throughout history, have routinely found out that all that "extra space"
they figured into their filesystems is never enough.
Device drivers for network interfaces must allocate a "socket buffer"
("skb") for each incoming packet. A standard idiom in the skb allocation
code is a line like this:
    skb_reserve(skb, 2);
This call tells the socket buffer code to set aside the first two bytes of
the data buffer. The reason why this is done can be seen by looking at the
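resulting layout of an IP packet in the buffer: two bytes of padding, then the
14-byte Ethernet header, then the IP header, which starts at offset 16.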
The network stack makes frequent use of the IP addresses stored in the
packet. By padding the beginning of an ethernet-style packet by two bytes,
a network driver can cause those addresses to be aligned on a four-byte
boundary. On some architectures, at least, that alignment will speed
access to the addresses and make the networking system faster.
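The arithmetic behind that alignment can be checked with a few lines of C. This is only a sketch: it assumes a plain 14-byte Ethernet header (no VLAN tag) followed by an IPv4 header, in which the source and destination addresses sit at offsets 12 and 16. The macro and function names are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

#define ETH_HEADER_LEN   14 /* Ethernet header: 6 + 6 + 2 bytes */
#define IP_HEADER_MIN    20 /* IPv4 header without options */
#define IP_SADDR_OFFSET  12 /* source address, within the IPv4 header */
#define IP_DADDR_OFFSET  16 /* destination address, within the IPv4 header */

/* Offset of an IPv4 header field from the start of the data buffer. */
size_t ip_field_offset(size_t pad, size_t field_offset)
{
    assert(field_offset < IP_HEADER_MIN);
    return pad + ETH_HEADER_LEN + field_offset;
}

/* Nonzero if the given buffer offset falls on a four-byte boundary. */
int is_word_aligned(size_t offset)
{
    return offset % 4 == 0;
}
```

Without padding, the two addresses land at offsets 26 and 30, neither of which is a multiple of four; with the two-byte pad they move to 28 and 32, both of which are.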
Or so it might seem. As Anton Blanchard recently figured out, this padding is not always
helpful. A number of modern architectures (Anton works with PPC64, but
Intel-style architectures qualify too) have no real problem with unaligned
memory accesses, so the two-byte offset on IP packets does not necessarily
help them.
Unfortunately, the DMA engines in a number of systems do have
trouble working with unaligned addresses. A padded packet buffer does not
start on an aligned address, with the result that DMA operations to that
buffer can be slower than they should be. As network adapters get faster,
the DMA performance penalty becomes increasingly significant.
Anton's proposal was to change the skb_reserve() calls into calls to a
new skb_align() function, which could, depending on the
architecture, decide whether to insert the padding or not. David Miller pointed out, however, that the magic constant
"2" appears in quite a few places, and simply removing the padding could
create bugs elsewhere in the driver code.
The real solution is likely to be the
addition of a defined constant called
something like NET_IP_ALIGN; this constant would be the amount of
padding needed for packet alignment on the current architecture. Yes,
things probably should have been done that way from the beginning, but life
is like that. In any case, once the constant is in, each individual driver
can be looked over and fixed up as need be. And one small obstacle to top
performance on high-end network adapters will have been removed.
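A driver using such a constant might look something like the following. This is a user-space toy, not kernel code: the toy_skb structure and the toy_ functions are simplified stand-ins invented for this sketch, and the default value of 2 for NET_IP_ALIGN is an assumption; an architecture that prefers aligned DMA buffers would define it as 0.

```c
#include <assert.h>
#include <stddef.h>

/* Per-architecture padding; assumed default of 2, overridable at build time. */
#ifndef NET_IP_ALIGN
#define NET_IP_ALIGN 2
#endif

#define TOY_ETH_HLEN 14 /* length of a plain Ethernet header */

/* Simplified stand-in for a socket buffer (hypothetical). */
struct toy_skb {
    unsigned char *head; /* start of the allocated buffer */
    unsigned char *data; /* start of the packet data */
    unsigned char *tail; /* end of the packet data */
    unsigned char *end;  /* end of the allocated buffer */
};

/* Set up an empty buffer: no headroom reserved, no data yet. */
void toy_skb_init(struct toy_skb *skb, unsigned char *buf, size_t size)
{
    skb->head = skb->data = skb->tail = buf;
    skb->end = buf + size;
}

/* Reserve headroom in an empty buffer by moving data and tail forward. */
void toy_skb_reserve(struct toy_skb *skb, size_t len)
{
    assert(len <= (size_t)(skb->end - skb->tail));
    skb->data += len;
    skb->tail += len;
}

/* Offset of the IP header from the start of the buffer. */
size_t toy_ip_header_offset(const struct toy_skb *skb)
{
    return (size_t)(skb->data - skb->head) + TOY_ETH_HLEN;
}
```

A driver's receive path would then call toy_skb_reserve(skb, NET_IP_ALIGN) where it once passed a bare 2, leaving the choice of padding to the architecture rather than to each driver.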
Patches and updates
- Sam Ravnborg: kbuild.
(June 15, 2004)
Page editor: Jonathan Corbet