Brief items
2.6.5-mc4, the last "merge candidate" tree from Andrew Morton. A great deal of new stuff is going into 2.6.6; see the separate article below for more information.
The current -mm tree is 2.6.5-mm5; recent additions to -mm include more CPU scheduler work, some of Hugh Dickins's "prepare for object-based reverse mapping" patches (see below), a new memory binding API for NUMA systems, and lots of fixes.
The current 2.4 kernel is 2.4.26, which was released on April 14. Among other things, this release includes the fix for the iso9660 filesystem buffer overflow vulnerability. Overall, changes in 2.4.26 include the "forcedeth" nVidia Ethernet driver, a big bonding network driver rework, a lot of XFS work, various architecture updates (including Intel "IA32e" support), TCP Westwood support, an ACPI update, and lots of fixes.
Users of x86_64 systems may want to note that, as of 2.4.26, no more development will be done for that architecture in 2.4.
Kernel development news
Overall, one might be forgiven for thinking that 2.6.6 looks much like a development kernel release. In fact, most of the more intrusive patches listed above have been around and tested for some time now; they have just finally made their escape from the -mm tree. With the exception of the CPU scheduler patches (which we hope to cover here next week) and, perhaps, the reverse mapping VM changes, 2.6.6 looks likely to contain the bulk of the work that most developers are still hoping to see added to 2.6. 2.6.6 contains enough big changes that its chances of containing an unpleasant surprise or two are fairly high. Within a few more releases, however, 2.6 may well have stabilized to the point that it can be more widely deployed and the bulk of developer attention can move on to 2.7.
In response to the reverse mapping VM discussions over the last month or so, Hugh Dickins has posted a series of patches which prepare the kernel for a full object-based reverse-mapping scheme and the removal of the per-page PTE chains. Hugh's patches carefully leave room for the inclusion of either his anonmm patches or Andrea Arcangeli's anon_vma work, though he seems to expect that anon_vma will win out. The full set of patches posted so far can be found in the "memory management" part of the "patches and updates" section, below.
Of those patches, the first three have been merged as of this writing. rmap 1 simply creates a new include file (linux/rmap.h) and moves many of the reverse-mapping declarations there. The second patch (rmap 2) changes the way the swap subsystem keeps track of swap cache pages; this change is needed to free up a couple of struct page fields for reverse mapping tasks. Finally, rmap 3 finishes out the struct page work for various architectures.
Later patches in Hugh's series get more ambitious; rmap 7 adds object-based reverse mapping for file-backed memory. Those patches have not been merged as of this writing, however.
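The idea behind object-based reverse mapping for file-backed memory can be illustrated in user space. The sketch below is not taken from Hugh's patches; every name in it (sketch_vma, sketch_object, sketch_find_mappings() and so on) is hypothetical, and it assumes 4KB pages. Rather than keeping a chain of PTE pointers for each page, it starts from the object backing the page, walks the list of regions which map that object, and computes the address at which each region would map the page in question.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12   /* assumption: 4KB pages */

/* Hypothetical stand-in for one mapping of a file into an address space. */
struct sketch_vma {
    unsigned long start;       /* first virtual address of the mapping */
    unsigned long end;         /* one past the last virtual address */
    unsigned long pgoff;       /* file offset of 'start', in pages */
    struct sketch_vma *next;   /* next mapping of the same object */
};

/* Hypothetical stand-in for the per-file object that lists its mappings. */
struct sketch_object {
    struct sketch_vma *mappings;
};

/*
 * Return the virtual address at which 'vma' maps file page 'index',
 * or 0 if the mapping does not cover that page. (Using 0 as "not
 * mapped" assumes that no mapping starts at address zero.)
 */
static unsigned long sketch_vma_address(const struct sketch_vma *vma,
                                        unsigned long index)
{
    unsigned long addr;

    if (index < vma->pgoff)
        return 0;
    addr = vma->start + ((index - vma->pgoff) << SKETCH_PAGE_SHIFT);
    return addr < vma->end ? addr : 0;
}

/*
 * Find every place where page 'index' of the object is mapped by
 * walking the object's mappings; no per-page chain is consulted.
 * Stores up to 'max' addresses in 'out' and returns how many it found.
 */
static size_t sketch_find_mappings(const struct sketch_object *obj,
                                   unsigned long index,
                                   unsigned long *out, size_t max)
{
    size_t n = 0;

    assert(out != NULL);
    for (const struct sketch_vma *v = obj->mappings; v && n < max; v = v->next) {
        unsigned long addr = sketch_vma_address(v, index);
        if (addr)
            out[n++] = addr;
    }
    return n;
}
```

The cost of this approach is that finding a page's mappings requires a walk over every mapping of the object; the benefit is that nothing needs to be stored per page.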
A completely different set of patches which changes how the page cache works has been merged. The description of this work, as written by Andrew Morton, reads:
This work made some fundamental changes in how page cache pages are tracked. The struct page structure has long included a field called "list", being a list_head structure used to track the state of the page. When the page is marked dirty, or placed under I/O, it is put on a list with other such pages. Unfortunately, managing those lists as the state of the page changes proves to be difficult; hence the juggling analogy.
In response, the page lists have been removed altogether; as a side-benefit, this change shrinks struct page by eight bytes - a significant savings, considering that there is one such structure for every physical page in the system. The lists have been replaced with an enhanced radix tree which supports "tagging" of pages. When a page is dirtied, it is simply marked dirty in the radix tree, rather than being added to a list. Similarly, pages which are currently being written back to disk are marked. A new set of radix tree operations allows the kernel to find these pages when the need arises. Searching the tree is not as fast as following a dedicated list, but the radix tree implementation appears to be fast enough that few people will notice the difference.
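For those who would like to see how tagging can work, here is a minimal user-space sketch. It is not the kernel's radix tree code; the fixed two-level layout, the 64-slot nodes, and all of the names (struct tree, tree_tag_set(), tree_gang_lookup_tag(), and so on) are assumptions made for illustration. Each node carries one bitmap per tag, and a bit set in the upper level means that something below it carries the tag, so a search can skip untagged subtrees entirely.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAP_SHIFT 6
#define MAP_SIZE  (1u << MAP_SHIFT)       /* 64 slots per node */
#define MAX_INDEX (MAP_SIZE * MAP_SIZE)   /* two levels: 4096 pages */

enum sketch_tag { TAG_DIRTY, TAG_WRITEBACK, TAG_COUNT };

struct leaf {
    void *pages[MAP_SIZE];
    uint64_t tags[TAG_COUNT];   /* bit i set: pages[i] carries the tag */
};

struct tree {
    struct leaf *leaves[MAP_SIZE];
    uint64_t tags[TAG_COUNT];   /* bit i set: leaves[i] holds a tagged page */
};

/* Store 'page' at 'index', allocating the leaf on first use. */
static int tree_insert(struct tree *t, unsigned index, void *page)
{
    assert(index < MAX_INDEX);
    struct leaf **slot = &t->leaves[index >> MAP_SHIFT];
    if (!*slot && !(*slot = calloc(1, sizeof(**slot))))
        return -1;
    (*slot)->pages[index & (MAP_SIZE - 1)] = page;
    return 0;
}

/* Tag an existing page, and note the tag in the upper level as well. */
static void tree_tag_set(struct tree *t, unsigned index, enum sketch_tag tag)
{
    assert(index < MAX_INDEX);
    struct leaf *l = t->leaves[index >> MAP_SHIFT];
    assert(l && l->pages[index & (MAP_SIZE - 1)]);
    l->tags[tag] |= UINT64_C(1) << (index & (MAP_SIZE - 1));
    t->tags[tag] |= UINT64_C(1) << (index >> MAP_SHIFT);
}

/* Clear a tag; the upper-level bit goes away with the last tagged page. */
static void tree_tag_clear(struct tree *t, unsigned index, enum sketch_tag tag)
{
    assert(index < MAX_INDEX);
    struct leaf *l = t->leaves[index >> MAP_SHIFT];
    if (!l)
        return;
    l->tags[tag] &= ~(UINT64_C(1) << (index & (MAP_SIZE - 1)));
    if (l->tags[tag] == 0)
        t->tags[tag] &= ~(UINT64_C(1) << (index >> MAP_SHIFT));
}

/*
 * Collect up to 'max' pages carrying 'tag' whose index is at least
 * 'start', in ascending index order. Returns the number found.
 */
static unsigned tree_gang_lookup_tag(const struct tree *t, unsigned start,
                                     enum sketch_tag tag, void **out,
                                     unsigned max)
{
    unsigned n = 0;

    for (unsigned i = start >> MAP_SHIFT; i < MAP_SIZE && n < max; i++) {
        if (!(t->tags[tag] & (UINT64_C(1) << i)))
            continue;           /* nothing tagged below: skip the subtree */
        unsigned j = (i == start >> MAP_SHIFT) ? (start & (MAP_SIZE - 1)) : 0;
        for (; j < MAP_SIZE && n < max; j++)
            if (t->leaves[i]->tags[tag] & (UINT64_C(1) << j))
                out[n++] = t->leaves[i]->pages[j];
    }
    return n;
}
```

Note that a lookup by tag naturally returns pages in index order, whatever order they were dirtied in; a list, by contrast, hands pages back in the order in which they were added.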
These changes required touching a lot of VM and page cache code; every user of the page->list field had to be fixed. As a result of the changes, the order in which dirty pages are written to disk has changed; writing always happens in file-offset order now. This change appears to be an improvement for many applications; Andrew reports as much as 30% faster benchmark results. I/O can slow down for some situations involving parallel writes on SMP systems, however.

required procedure is a bit inelegant, forces the user to ignore warnings from the build code ("you messed with SUBDIRS, do not complain if something goes wrong"), and does not support modversions. It also requires the presence of a configured and built kernel source tree, something which was not necessary with previous kernels, and a build of an external module will often try to rebuild things in the main tree as well. Fixing up the external module build process has been on the "to do" list for some time.
Finally, somebody has done it. Sam Ravnborg has posted a patch which improves the external module build process in a number of ways.
The basic form of a makefile for an external module will not change much. It should still look something like:
    ifneq ($(KERNELRELEASE),)
    obj-m := module.o
    else
    KDIR := /lib/modules/$(shell uname -r)/build
    PWD := $(shell pwd)
    default:
    	$(MAKE) -C $(KDIR) M=$(PWD)
    endif
The only change is on the $(MAKE) line: the parameter that once read SUBDIRS=$(PWD) is now M=$(PWD). The older SUBDIRS= format will still work, however. It is also no longer necessary to specify the modules target when invoking the kernel build system.
When the kernel build system is invoked with the M= parameter, it does a number of things differently. It will make no effort to ensure that the built files in the kernel source tree are current; if a developer makes a change to the main tree, it is his or her responsibility to rebuild it before trying to make any external modules. Only a few targets (modules, clean, modules_install) are supported when building external modules. And the modpost program now maintains a file (Module.symvers) containing the symbol version information if modversions is in use; this file is used when postprocessing an external module to note the symbol versions expected by that module.
Among other things, the new scheme will allow distributors to package sufficient information for the building of external modules without the user having to actually configure and build the full kernel source tree. That information can be stored under /lib/modules by replacing the build symbolic link (which currently points back to the source tree) with a directory containing just the required information. That should make life simpler for everybody involved.

scheduled to ship in just over one month. This distribution will be a high-profile deployment of the 2.6 kernel. Red Hat has often shipped highly-patched kernels, and there have been occasional criticisms that the company's kernels are so divergent from the mainline that they are incompatible with other Linux systems. Since we have been messing with the second Fedora Core 2 test release anyway, it seemed like a good time to look and see what sort of kernel it includes. To that end, we pulled down a copy of 2.6.5-1-321 from Arjan van de Ven's directory.
As it turns out, the number of patches contained in this kernel is relatively small. That is not entirely surprising; vendor kernel patch lists tend to get longer as the current development kernel progresses, since some vendors, at least, have a tendency to backport features from the development tree. There is no development tree currently, so there is nothing to backport.
That said, the first patch is a big one: it's the full 2.6.5-mc1 tree from Andrew Morton. Now that the merge candidate patches are finding their way into 2.6.6-pre, Red Hat will not need to apply that particular patch itself.
The 2.6.6 kernel will feature an option (on by default) to use 4KB kernel stacks on the i386 architecture. The Fedora kernel has that patch, of course; it also includes a separate patch which takes away the option of using the traditional 8KB stacks. This change has upset some Fedora test users; the 4KB stacks break certain proprietary device drivers (e.g. nVidia) and some users of those drivers would prefer to have the ability to build a kernel that supports them. Red Hat seems determined to follow this path, however, on the assumption that nVidia will fix its drivers (and the general attitude that breaking binary modules is a low-priority problem at best).
Then, there are patches which are true Red Hat stuff. These include "exec shield," which makes buffer overflow attacks harder by enforcing no-execute permissions; the 4G/4G patch which provides expanded 32-bit virtual address spaces to both user space and the kernel; and TUX, the kernel-based high-performance web server. There is also an SELinux/security module patch which allows the kernel to bypass permission checks when creating sockets internally; this one changes the security module interface.
Then, there are various cleanup and safety patches. For example, gcc 3.4 supports a "warn_unused_result" attribute on functions; the compiler will complain when code calls a function marked with this attribute and fails to check the return value. The Red Hat kernel applies that attribute to a few functions (copy_from_user(), pci_enable_device(), etc.) to trap places where the proper checks are not made. Various functions which use too much kernel stack space have been fixed up. There is a patch which fixes some remaining sleep_on() calls and warns about others. The driver for /dev/mem has been fixed to disallow access to most of main memory. And there is a driver for a "crash" device which provides direct read access to main memory, seemingly for use by a crash dump utility.
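The effect of the warn_unused_result attribute is easy to try in user space. The sketch below is not from the Red Hat patch; sketch_must_check, sketch_copy(), and sketch_fill() are hypothetical names. sketch_copy() follows the copy_from_user() convention of returning the number of bytes it could not copy, so a return value of zero means success.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shorthand for the attribute, which needs gcc 3.4 or later. */
#define sketch_must_check __attribute__((warn_unused_result))

/*
 * Hypothetical stand-in for a copy routine which can stop short.
 * Copies at most 'dst_len' bytes and returns the number of bytes
 * NOT copied; zero means that everything fit.
 */
static sketch_must_check size_t sketch_copy(void *dst, size_t dst_len,
                                            const void *src, size_t n)
{
    size_t ok = n < dst_len ? n : dst_len;

    assert(dst && src);
    memcpy(dst, src, ok);
    return n - ok;
}

/* A caller which checks the result, so the compiler has no complaint. */
static int sketch_fill(char *buf, size_t len, const char *msg)
{
    if (sketch_copy(buf, len, msg, strlen(msg) + 1) != 0)
        return -1;   /* the message (with its terminator) did not fit */
    return 0;
}
```

Had sketch_fill() called sketch_copy() as a bare statement and discarded the result, gcc would have issued a warning at that line; the check costs nothing at run time.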
Finally, there is a small set of bug fixes and patches to ease the build process on various architectures. Overall, the Fedora kernel suggests that, in Red Hat's view, not a whole lot needs to be added to the 2.6 kernel (the upcoming 2.6.6 version, at least) for it to be ready for wide use.
Patches and updates
Filesystems and block I/O
Page editor: Jonathan Corbet
Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds