Kernel development

Brief items

Kernel release status

The current development kernel is 2.5.48, which was released by Linus on November 17. This one includes the new module loader - so expect surprises if you compile with modules, and note that you need a new set of module utilities (available as a source tarball or source RPM). Other changes include boot process cleanups (part of the initramfs effort), more IPSec fixes, high-resolution times in the stat64() system call, some SCSI cleanups, a bunch of include file cleanup work, and lots of other fixes. And, of course, the fix for the denial of service vulnerability. The long-format changelog has the details.

Linus's pre-2.5.49 BitKeeper tree includes a number of module fixes, nanosecond time support for the NFS filesystem, an S/390 update, and a large number of other fixes.

The current development kernel prepatch from Alan Cox is 2.5.47-ac6. Alan continues to issue patches against 2.5.47 because "the 2.5.48 tree is a little bit too broken to run IDE development against."

The current stable kernel is 2.4.19. The second 2.4.20 release candidate was released by Marcelo on November 15; it includes a fix for the denial of service vulnerability and several other updates.

Alan Cox's latest 2.4.20 prepatch is 2.4.20-rc2-ac2, which adds a number of fixes to the second release candidate.

Alan has also released 2.2.23-rc2, which is primarily motivated by the denial of service fix.

Kernel development news

The state of the feature freeze

The 2.5 feature freeze is now three weeks old. At this point, it mostly appears to be working as intended. The biggest exception (the new module loader) will be looked at in a separate article.

One of the goals of the freeze was to give developers a well-known target date so they would not flood Linus with last-minute patches. A big wave of patches did come through in October, but it was small and well organized compared to the deluges that came after previous (surprise) feature freezes. These patches were, for the most part, in reasonably good shape. With few exceptions, the post-freeze kernel is in relatively stable condition.

The freeze is holding reasonably well. The only really new features that have gone in recently are the new module loader and high-resolution times in the stat64() system call. Linus has put his foot down when faced with a number of destabilizing changes, such as some overzealous header file "cleanup" work. He is still considering a few new features (kexec, kernel probes, and POSIX timers), but they are relatively small and went into the queue well ahead of the freeze date.

Of course, it is far too early to conclude that the freeze will actually hold - we have to wait to see what happens in 2003 for that.

The 2.5 stabilization process will, hopefully, be helped by the bugzilla database that has been set up by OSDL. Proper tracking of 2.5 bugs is clearly necessary if they are to be dealt with before the stable release. Whether this database will really fill that need remains to be seen; after a week of operation, it only lists sixty bugs. The 2.5 kernel clearly must have more problems than that; now is the time for people who have encountered problems to put them into bugzilla so they do not get overlooked.

Not all of the kernel developers have shown great enthusiasm for working with the bugzilla system; to some of them, it looks like a lot of bureaucratic work that distracts from the real job of fixing bugs. This should not be a problem as long as people who are interested maintain the bug database and keep it current.

Back at the kernel summit, there seemed to be a consensus that, at this stage, an assistant to Linus would be named to help with stabilization. Linus, by his own admission, does not always do a great job of the release management task. The assistant would help review patches and might also, eventually, become the maintainer of the stable release. That prospect, of course, would help motivate the assistant to look hard at proposed changes and exclude anything that was not really necessary.

This idea was well received at the summit, even by Linus. But this person has not been named, and there has not really even been any discussion of the subject. Following through soon on the appointment of somebody to help stabilize the kernel is probably one of the best things the development community could do to ensure that the freeze (and stable release) are successful.

The x86 denial of service bug

The current 2.5 and 2.4.20-rc releases both contain a patch for a newly-discovered vulnerability in the Linux kernel. Simply put, anybody who can run an arbitrary program on a Linux system can bring it down in flames. Your editor, who is not an expert on x86 assembly (but who can still describe the difference between CDC 6xxx A, B, and X registers), has made an effort to figure out just what is going on here, for those who are curious.

The x86 processor contains many flags which affect its operation. Two of these flags are abused in this exploit:

  • The trap flag (TF) causes a processor trap to happen after execution of every instruction. It is used primarily for debugging purposes.

  • The nested task (NT) flag indicates that the current task is executing via an interrupt (or other task-switching operation) that causes another task to be suspended. It is part of the hardware task switching mechanism, which Linux makes only limited use of. When the NT flag is set, the iret instruction performs a hardware task switch via the "backlink" field in the task state segment (TSS). Without NT, iret looks much like a normal return.

The DoS attack works, essentially, by setting both flags (TF and NT), then jumping into the kernel with an lcall instruction. The kernel code did not clear those flags when entered via that path. Thus, the setting of TF would cause an immediate processor trap within the kernel code. That, by itself, is relatively harmless, except that the trap handler returns via iret. That instruction, seeing that the NT flag is set, attempts to perform a task switch via the TSS - an operation the kernel was not expecting and had not prepared for. So the kernel switches into a nonexistent task, and everything comes to a stop. It is at this point that one begins to appreciate the virtues of journaling filesystems.

The solution, as coded up by Linus, is simply to clear those flags when the kernel is entered via a call gate. End of problem - once you get the patch installed.
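For the curious, the flag handling behind that fix can be sketched in a few lines of C. This is only an illustration of the bit arithmetic, not the kernel's actual entry code (which is written in assembly): the bit positions for TF (bit 8) and NT (bit 14) are those defined by the x86 architecture, while the macro and function names below are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* EFLAGS bit positions defined by the x86 architecture. */
#define EFLAGS_TF (1u << 8)   /* trap flag: trap after every instruction */
#define EFLAGS_NT (1u << 14)  /* nested task: iret switches tasks via the TSS backlink */

/* Hypothetical helper: true if both flags abused by the exploit are set. */
static int has_dangerous_flags(uint32_t eflags)
{
    uint32_t mask = EFLAGS_TF | EFLAGS_NT;
    return (eflags & mask) == mask;
}

/*
 * Hypothetical helper: the flags value one would want in effect after
 * entering the kernel through a call gate. All other bits are left alone.
 */
static uint32_t sanitize_flags_on_entry(uint32_t eflags)
{
    uint32_t cleaned = eflags & ~(EFLAGS_TF | EFLAGS_NT);

    /* Postcondition: neither TF nor NT survives the entry path. */
    assert((cleaned & (EFLAGS_TF | EFLAGS_NT)) == 0);
    return cleaned;
}
```

With TF cleared there is no trap inside the kernel, and with NT cleared any later iret behaves like an ordinary return rather than a hardware task switch.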

The call entry code has not changed in a long time, so even very old kernels are affected. The current 2.4.20 release candidate includes a fix, and the distributors are beginning (slowly) to release updates which fix the problem. 2.2 kernels are also vulnerable; if you have a 2.2-based system running with untrusted users, you may want to rebuild the kernel with this patch from Matthew Grant applied.

Fun with modules

So... The feature freeze is in effect, the 2.5 kernel appears to be relatively stable (for this stage of development), and all seems well with the world. Then Rusty Russell's new module loader patch goes in, and all hell breaks loose. What's going on?

The inclusion of the module patch is consistent with the policy Linus laid out toward the end of October: the freeze date would be considered the deadline for submission to him. Linus would, when it seemed appropriate, merge new features after the deadline. He has done very little of that sort of merging, but the new module code was one of the exceptions.

There are a few problems with the new module subsystem, most of which have to do with the facts that the job is not complete (i.e. features are missing), and that many of the changes had not been seriously tested out and reviewed prior to being merged. The work is not complete because Rusty never knew whether the patch would go in or not, and was busy enough just keeping it up to date with kernel releases. The lack of testing and review is explained by Rusty in this way:

Think back: who in their right mind would compile and test patches to a rapidly-changing kernel, when those changes required userspace tool changes and you didn't know if it was going to go in or not? If you care about modules in 2.5, you're probably a developer who needs modules to do their job, so why rock the boat?

In other words, the nature of the patch was such that the people who most needed to test it out were disinclined to do so. Many of those people are the ones who are upset by the current state of affairs.

The initial module patch did, indeed, lack some features. Little things like module parameters, device table support (needed for hotplug support), unloading of modules, a working modprobe, modversions, etc. In other words, when the module patch first went in, loadable modules stopped working for almost everybody. Broken features are not that unusual for a development kernel, but this is a much-used feature in a kernel that was supposed to be in a feature freeze, so people complained.

The situation was not helped by the fact that the first module patches were merged just as Rusty got on a plane to the other side of the world. Even so, he has been working frantically to fix up his patches and get them off to Linus. By the time 2.5.48 came out (the first actual kernel release with the new code), some of the worst omissions had been taken care of, and the rest are being addressed quickly. The level of complaints over missing features has dropped significantly.

Other sorts of complaints remain, however, as people try to actually make things work with the new scheme. The biggest controversy has related to Rusty's attempts to eliminate some of the race conditions that tend to crop up during module loading and unloading. A common bug found in module initialization routines is to make resources (e.g. a /proc file or a registered device) available to the kernel, then to fail module loading later on. If some other process has accessed that resource in the meantime, it could find itself trying to execute within a module that was never fully loaded.

Rusty's solution is to add a "live" flag to each module. Any code which calls into a module must first increase that module's reference count with the new try_module_get() function. This function will return a failure status if the live flag is not set. This flag remains cleared until the module initialization function has finished its work. This mechanism guarantees that a module's code will not be called until the module is ready and it is clear that the module load process will succeed. (It is also used to unload modules safely; see Rusty's FAQ for more information on how this all works.)
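A toy, single-threaded model of the scheme as described here might look like the following. It is a sketch only: the structure, its fields, and the toy_ function names are invented for illustration and are not the kernel's data structures, and real kernel code would need atomic operations and locking that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-in for per-module bookkeeping. */
struct toy_module {
    bool live;          /* set only once initialization has finished */
    unsigned refcount;  /* callers currently holding a reference */
};

/* Take a reference, but only if the module has been marked live. */
static bool toy_try_module_get(struct toy_module *m)
{
    assert(m != NULL);
    if (!m->live)
        return false;   /* caller must not jump into the module */
    m->refcount++;
    return true;
}

/* Drop a reference previously obtained with toy_try_module_get(). */
static void toy_module_put(struct toy_module *m)
{
    assert(m != NULL && m->refcount > 0);
    m->refcount--;
}

/* Called once the init function has returned; 0 means success. */
static void toy_init_done(struct toy_module *m, int init_result)
{
    assert(m != NULL);
    if (init_result == 0)
        m->live = true;
}
```

In this model, any attempt to take a reference before toy_init_done() reports success simply fails, which is exactly the property that protects a half-initialized module from being called.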

The problem is that, sometimes, there are legitimate reasons for wanting to call into a module before that module has finished initialization. For example, when a disk driver registers a disk, the upper layers immediately want to have a look at the partition table. Under the new scheme, that look would fail (since the module was not yet marked as being alive) and the drive's partitions would not be registered. Thus, a patch which was intended to fix theoretical problems (very few people have actually been bitten by module load race conditions) ended up creating real problems with drivers that, previously, had been working just fine. That did not go over particularly well.

This problem has been fixed by marking a module as being alive while its initialization function runs. In other words, initialization is, once again, unprotected, and driver authors need to be very careful to not export any interface to the rest of the kernel until they are ready for that interface to be used. Which makes basic sense.

Driver code also needs, in many cases, to be more fault tolerant. Rusty asked a related question: how does one register two /proc files? If the registration of the second file fails, there is no way to safely unregister the first one and fail the module load. Linus's answer is straightforward once you look at it: the module simply cannot fail to load at that point. Once the module has exported an interface, it must be there to handle uses of that interface. It is better to simply do without the failed /proc file than fail the whole load and risk race conditions. The complexity required to allow failing at any time is not justified by the benefits.
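That rule can be illustrated with a small userspace sketch of an initialization function. Everything in it is hypothetical: the toy_driver structure, the register_fn callback type, and the reg_ helpers stand in for real registration calls, and the file names "status" and "stats" are invented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical registration callback: returns 0 on success. */
typedef int (*register_fn)(const char *name);

struct toy_driver {
    bool have_status;  /* first file was registered */
    bool have_stats;   /* second, optional file was registered */
};

/* Sample registrars used to exercise the init function. */
static int reg_ok(const char *name) { (void)name; return 0; }
static int reg_fail_all(const char *name) { (void)name; return -1; }
static int reg_fail_stats(const char *name)
{
    return strcmp(name, "stats") == 0 ? -1 : 0;
}

static int toy_driver_init(struct toy_driver *d, register_fn reg)
{
    assert(d != NULL && reg != NULL);

    /* Nothing has been exported yet, so failing here is still safe. */
    if (reg("status") != 0)
        return -1;
    d->have_status = true;

    /* "status" is now visible to others; the load must not fail anymore. */
    d->have_stats = (reg("stats") == 0);
    if (!d->have_stats)
        fprintf(stderr, "toy_driver: no stats file, carrying on without it\n");

    return 0;
}
```

The design choice is the point of no return after the first successful registration: from there on, errors are logged and tolerated rather than unwound.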

Various other problems (such as the requirement that every module have an initialization function, or explicitly include a no_module_init line) are being worked out. Before too long, with luck, modules will just work again (better than before), and the kernel developers will be arguing about something else.

Page editor: Jonathan Corbet

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds