Brief items
The current development kernel is 2.5.48, which was
released by Linus on November 17. This
one includes the new module loader - so expect surprises if you compile
with modules, and note that you need a new set of module utilities
(available as a source tarball or source RPM). Other
changes include boot process cleanups (part of the initramfs effort), more
IPSec fixes, high-resolution times in the
stat64() system call,
some SCSI cleanups, a bunch of include file cleanup work, and lots of other
fixes. And, of course, the fix for the denial of service vulnerability.
The
long-format changelog has the details.
Linus's pre-2.5.49 BitKeeper tree includes a number of module fixes,
nanosecond time support for the NFS filesystem, an S/390 update, and a
large number of other fixes.
The current development kernel prepatch from Alan Cox is 2.5.47-ac6. Alan continues to issue patches
against 2.5.47 because "the 2.5.48 tree is a little bit too broken to run
IDE development against."
The current stable kernel is 2.4.19. The second 2.4.20 release
candidate was released by Marcelo on
November 15; it includes a fix for the denial of service vulnerability
and several other updates.
Alan Cox's latest 2.4.20 prepatch is 2.4.20-rc2-ac2, which adds a number of fixes to
the second release candidate.
Alan has also released 2.2.23-rc2, which is
primarily motivated by the denial of service fix.
Kernel development news
The 2.5 feature freeze is now three weeks old. At this point, it mostly
appears to be working as intended. The biggest exception (the new module
loader) will be looked at in a separate article.
One of the goals of the freeze was to give developers a well-known target
date so they would not flood Linus with last-minute patches. There
was a big wave of patches that came through in October, but it was
small and well organized compared to the deluges that came after previous
(surprise) feature freezes. These patches were, for the most part, in
reasonably good shape. With few exceptions, the post-freeze
kernel is in relatively stable condition.
The freeze is holding reasonably well. The only really
new features that have gone in recently are the new module loader and
high-resolution times in the stat64() system call. Linus has put
his foot down when faced with a number of destabilizing changes, such as
some overzealous header file "cleanup" work. He is still considering a few
new features (kexec, kernel probes, and POSIX timers), but they are
relatively small and went into the queue well ahead of the freeze date.
Of course, it is far too early to conclude that the freeze will actually
hold - we have to wait to see what happens in 2003 for that.
The 2.5 stabilization process will, hopefully, be helped by the bugzilla database that has been set up by
OSDL. Proper tracking of 2.5 bugs is clearly necessary if they are to be
dealt with before the stable release. Whether this database will really
fill that need remains to be seen; after a week of operation, it only lists
sixty bugs. The 2.5 kernel clearly must have more problems than that; now
is the time for people who have encountered problems to put them into
bugzilla so they do not get overlooked.
Not all of the kernel developers have shown great enthusiasm for working
with the bugzilla system; to some of them, it looks like a lot of
bureaucratic work that distracts from the real job of fixing bugs. This
should not be a problem as long as people who are interested maintain the
bug database and keep it current.
Back at the kernel summit, there seemed to be a consensus that, at this
stage, an assistant to Linus would be named to help with stabilization.
Linus, by his own admission, does not always do a great job of the release
management task. The assistant would help review
patches and might also, eventually, become the maintainer of
the stable release. That prospect, of course, would help motivate the
assistant to look hard at proposed changes and exclude anything that was
not really necessary.
This idea was well received at the summit, even by Linus. But this person
has not been named, and there has not really even been any discussion of
the subject. Following through soon on the appointment of somebody to help
stabilize the kernel is probably one of the best things the development
community could do to ensure that the freeze (and stable release) are
successful.
The current 2.5 and 2.4.20-rc releases both contain a patch for a
newly-discovered vulnerability in the Linux kernel. Simply put, anybody
who can run an arbitrary program on a Linux system can bring it down in flames.
Your editor, who is not an expert on x86 assembly (but who can still
describe the difference between CDC 6xxx A, B, and X registers), has made
an effort to figure out just what is going on here, for those who are
curious.
The x86 processor contains many flags which affect its operation. Two of
these flags are abused in this exploit:
- The trap flag (TF) causes a processor trap to happen after
execution of every instruction. It is used primarily for debugging
purposes.
- The nested task (NT) flag indicates that the current task is
executing via an interrupt (or other task-switching operation) that
causes another task to be suspended. It is part of the hardware task
switching mechanism, which Linux makes only limited use of. When the
NT flag is set, the iret instruction performs a hardware task
switch via the "backlink" field in the task state segment (TSS).
Without NT, iret looks much like a normal return.
The DoS attack works, essentially, by setting both flags (TF and NT), then
jumping into the kernel with an lcall instruction. The kernel
code did not clear those flags when entered via that path. Thus, the
setting of TF would cause an immediate processor trap within the kernel
code. That, by itself, is relatively harmless, except that the trap
handler returns via iret. That instruction, seeing that the NT
flag is set, attempts to perform a task switch via the TSS - an operation
the kernel was not expecting and had not prepared for. So the
kernel switches into a nonexistent task, and everything comes to a stop.
It is at this point that one begins to appreciate the virtues of journaling
filesystems.
The solution, as coded up by
Linus, is simply to clear those flags when the kernel is entered via a call
gate. End of problem - once you get the patch installed.
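The idea behind the fix can be modeled in a few lines of C. This is only a userspace sketch, not the actual patch, which operates in the kernel's low-level entry code. The bit positions for TF (bit 8) and NT (bit 14) are those defined by the x86 architecture; the helper names sanitize_entry_eflags() and iret_would_task_switch() are invented here for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* EFLAGS bit masks as defined by the x86 architecture. */
#define X86_EFLAGS_TF 0x00000100u /* trap flag, bit 8 */
#define X86_EFLAGS_NT 0x00004000u /* nested task flag, bit 14 */

/*
 * Hypothetical helper: given the flags a user program had set when it
 * entered the kernel through a call gate, return the flags the kernel
 * should keep running with. Both dangerous bits are masked off.
 */
static uint32_t sanitize_entry_eflags(uint32_t user_eflags)
{
    uint32_t cleaned = user_eflags & ~(X86_EFLAGS_TF | X86_EFLAGS_NT);

    /* Postcondition: neither flag survives the entry path. */
    assert((cleaned & (X86_EFLAGS_TF | X86_EFLAGS_NT)) == 0);
    return cleaned;
}

/*
 * Hypothetical model of the choice iret makes: with NT set it attempts
 * a hardware task switch; otherwise it behaves like a normal return.
 */
static int iret_would_task_switch(uint32_t eflags)
{
    return (eflags & X86_EFLAGS_NT) != 0;
}
```

With the flags masked on entry, a trap handler's iret no longer sees NT, so the unexpected task switch never happens.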
The call entry code has not changed in a long time, so even very old
kernels are affected. The current 2.4.20 release candidate includes a fix,
and the distributors are beginning (slowly) to release updates which fix
the problem. 2.2 kernels are also vulnerable; if you have a 2.2-based
system running with untrusted users, you may want to rebuild the kernel
with this patch from Matthew Grant applied.
So... The feature freeze is in effect, the 2.5 kernel appears to be
relatively stable (for this stage of development), and all seems well with
the world. Then Rusty Russell's new module loader patch goes in, and all
hell breaks loose. What's going on?
The inclusion of the module patch is consistent with the policy Linus laid
out toward the end of October: the freeze date would be considered the
deadline for submission to him. Linus would, when it seemed appropriate,
merge new features after the deadline. He has done very little of that
sort of merging, but the new module code was one of the exceptions.
There are a few problems with the new module subsystem, most of which have
to do with the facts that the job is not complete (i.e. features are
missing), and that many of the changes had not been seriously tested out and
reviewed prior to being merged. The work is not complete because Rusty
never knew whether the patch would go in or not, and was busy enough just
keeping it up to date with kernel releases. The lack of testing and review
is explained by Rusty in this way:
Think back: who in their right mind would compile and test patches
to a rapidly-changing kernel, when those changes required userspace
tool changes and you didn't know if it was going to go in or not?
If you care about modules in 2.5, you're probably a developer who
needs modules to do their job, so why rock the boat?
In other words, the nature of the patch was such that the people who most
needed to test it out were uninclined to do so. Many of those people are
the ones who are upset by the current state of affairs.
The initial module patch did, indeed, lack some features. Little things
like module parameters, device table support (needed for hotplug support),
unloading of modules, a working modprobe, modversions, etc. In
other words, when the module patch first went in, loadable modules stopped
working for almost everybody. Broken features are not that unusual for a
development kernel, but this is a much-used feature in a kernel that was
supposed to be in a feature freeze, so people complained.
The situation was not helped by the fact that the first module patches were
merged just as Rusty got on a plane to the other side of the world. Even
so, he has been working frantically to fix up his patches and get them off
to Linus. By the time 2.5.48 came out (the first actual kernel release
with the new code), some of the worst omissions had been taken care of, and
the rest are being addressed quickly. The level of complaints over missing
features has dropped significantly.
Other sorts of complaints remain, however, as people try to
actually make things work with the new scheme. The biggest controversy has
related to Rusty's attempts to eliminate some of the race conditions that
tend to crop up during module loading and unloading. A common bug found in
module initialization routines is to make resources (e.g. a /proc
file or a registered device) available to the kernel, then to fail module
loading later on. If some other process has accessed that resource in the
meantime, it could find itself trying to execute within a module that was
never fully loaded.
Rusty's solution is to add a "live" flag to each module. Any code
which calls into a module must first increase that module's reference count
with the new try_module_get() function. This function will return
a failure status if the live flag is not set. This flag remains
cleared until the module initialization function has finished its work.
This mechanism guarantees that a module's code will not be called until the
module is ready, and it is clear that the module load process will succeed.
(It is also used to unload modules safely; see Rusty's FAQ for more information on how this
all works).
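The reference-counting rule described above can be sketched as a small userspace model. This is not the kernel's implementation: struct module_model and the model_ functions are invented for illustration, the real try_module_get() operates on the kernel's own module structure, and this sketch ignores locking and SMP concerns entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's per-module bookkeeping. */
struct module_model {
    bool live;    /* set only after the init function has finished */
    int refcount; /* number of callers currently using the module */
};

/*
 * Modeled on the described behavior of try_module_get(): a reference
 * is handed out only if the module has been marked live.
 */
static bool model_try_module_get(struct module_model *mod)
{
    assert(mod != NULL);
    if (!mod->live)
        return false; /* not ready: the caller must not enter the module */
    mod->refcount++;
    return true;
}

/* Drop a reference previously obtained with model_try_module_get(). */
static void model_module_put(struct module_model *mod)
{
    assert(mod != NULL && mod->refcount > 0);
    mod->refcount--;
}
```

A caller that gets false back simply treats the module as absent; only a successful get is ever paired with a put.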
The problem is that, sometimes, there are legitimate reasons for wanting to
call into a module before that module has finished initialization. For
example, when a disk driver registers a disk, the upper layers immediately
want to have a look at the partition table. Under the new scheme, that
look would fail (since the module was not yet marked as being alive) and
the drive's partitions would not be registered. Thus, a patch which was
intended to fix theoretical problems (very few people have actually been
bitten by module load race conditions) ended up creating real problems with
drivers that, previously, had been working just fine. That did not go over
particularly well.
This problem has been fixed by marking a module as being alive while its
initialization function runs. In other words, initialization is, once
again, unprotected, and driver authors need to be very careful to not
export any interface to the rest of the kernel until they are ready for
that interface to be used. Which makes basic sense.
Driver code also needs, in many cases, to be more fault tolerant. Rusty asked a related question: how does one register
two /proc files? If the registration of the second file fails,
there is no way to safely unregister the first one and fail the module
load. Linus's answer makes basic sense once
you look at it: the module simply cannot fail to load at that point. Once
the module has exported an interface, it must be there to handle uses of
that interface. It is better to simply do without the failed
/proc file than fail the whole load and risk race conditions. The
complexity required to allow failing at any time is not justified by the
benefits.
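A minimal sketch of that rule, written as a userspace model rather than real driver code: fake_register_proc(), struct demo_driver, and demo_init() are all hypothetical names, and the fail arguments exist only so that the failure paths can be exercised.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a /proc registration call that may fail. */
static bool fake_register_proc(const char *name, bool should_fail)
{
    if (should_fail) {
        fprintf(stderr, "could not register /proc/%s\n", name);
        return false;
    }
    return true;
}

/* Records which interfaces the imaginary driver managed to export. */
struct demo_driver {
    bool have_status;
    bool have_stats;
};

/* Returns 0 on success and a negative value on failure. */
static int demo_init(struct demo_driver *drv, bool fail_first, bool fail_second)
{
    assert(drv != NULL);
    drv->have_status = false;
    drv->have_stats = false;

    /* Nothing is visible to the rest of the system yet: failing is safe. */
    if (!fake_register_proc("demo_status", fail_first))
        return -1;
    drv->have_status = true;

    /*
     * The first file is now exported and may already be in use, so
     * the load can no longer fail. Do without the second file instead.
     */
    drv->have_stats = fake_register_proc("demo_stats", fail_second);
    return 0;
}
```

The point of no return is the first successful registration: anything that can fail outright should happen before it, and anything after it has to degrade gracefully.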
Various other problems (such as the requirement that every module have an
initialization function, or explicitly include a no_module_init
line) are being worked out. Before too long, with luck, modules will just
work again (better than before), and the kernel developers will be arguing
about something else.
Page editor: Jonathan Corbet