Kernel development [LWN.net]

Kernel release status

The current development kernel is 2.5.52, which was released by Linus on December 15. It consists mostly of fixes and updates, of course, but there's also a bunch of changes from Andrew Morton's "-mm" tree (including the long-term fix for the ext3 data=journal corruption bug), XFS and JFS updates, more module fixes, and a kconfig update. See the long-format changelog for the gory details.

The current stable kernel is 2.4.20; Marcelo released the second 2.4.21 prepatch on December 18. This large patch is mostly made up of IA-64 updates, but it also includes some NFS fixes, a couple of ext3 fixes, a bunch of stuff from the "-ac" tree, a new megaraid driver, and various other fixes and updates.

For those using very stable kernels: Alan Cox has announced the first 2.2.24 release candidate. It contains a handful of bug fixes, including one for a new denial-of-service vulnerability that can be triggered by calling mmap() on a /proc/pid/mem file.

How to speed up system calls

It all started with an observation that system calls on a modern Pentium 4 processor are far slower than on older CPUs. It seems that, for whatever reason, software interrupts generated with the int instruction are very slow with the P4 processor. Since x86 Linux invokes system calls with "int $0x80", that slowness makes itself felt - especially with system calls (like getpid()) that would, otherwise, be very fast.
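
To make the mechanism concrete, here is a minimal sketch of invoking getpid() by hand. On 32-bit x86 it issues the "int $0x80" instruction directly; elsewhere it falls back to the C library's generic syscall() function. The function name raw_getpid() is made up for this illustration, and the inline assembly assumes a GCC-compatible compiler.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for the syscall() declaration */
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: call getpid() without the usual library wrapper. */
long raw_getpid(void)
{
#if defined(__i386__)
    long ret;
    /* i386 convention: system call number goes in %eax, result returns in %eax. */
    __asm__ volatile ("int $0x80"
                      : "=a"(ret)
                      : "0"((long)SYS_getpid)
                      : "memory");
    return ret;
#else
    /* Other architectures: let syscall() use the native entry method. */
    return syscall(SYS_getpid);
#endif
}

/* Sanity check: the raw call must agree with the library wrapper. */
void check_raw_getpid(void)
{
    assert(raw_getpid() == (long)getpid());
}
```

Since getpid() does almost no work in the kernel, nearly all of the time spent in a call like this is the cost of getting into and out of kernel mode, which is why the slow int instruction shows up so clearly there.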

There is an obvious solution to this problem: use the sysenter instruction instead. sysenter is quite a bit faster on modern Pentium processors. There are just a couple of problems: not all x86 processors support sysenter, and sysenter steps on registers in ways that can be hard to work around.

The lack of across-the-board support for sysenter is a problem. The kernel maintains a set of flags telling it what capabilities a given processor has; other processor-specific options are set at configuration time. System calls, however, are not invoked from the kernel - that is the C library's job. The last thing glibc needs is to be trying to figure out, at run time, the right way to invoke system calls.
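
For a sense of what such a run-time check would involve, the sketch below asks the processor directly, using the CPUID instruction, whether it advertises sysenter. This is only an illustration of the kind of probe a user-space library would have to carry, not anything glibc is known to do; the function names are invented, and <cpuid.h> is a helper header shipped with GCC and Clang.

```c
#include <assert.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>              /* GCC/Clang wrapper for the CPUID instruction */
#endif

/* CPUID leaf 1, EDX bit 11 ("SEP"): sysenter/sysexit are advertised. */
#define CPUID_EDX_SEP (1u << 11)

/* Returns 1 if the CPU advertises sysenter, 0 if not, -1 if unknown. */
int cpu_advertises_sysenter(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;              /* CPUID leaf 1 is not available */
    return (edx & CPUID_EDX_SEP) ? 1 : 0;
#else
    return -1;                  /* not an x86 processor */
#endif
}

/* Hypothetical policy: use sysenter only when it is clearly advertised. */
const char *preferred_entry_instruction(void)
{
    int sep = cpu_advertises_sysenter();

    assert(sep >= -1 && sep <= 1);
    return sep == 1 ? "sysenter" : "int $0x80";
}
```

Even this is not the whole story: early Pentium Pro processors set the SEP bit without providing a usable sysenter, so a real check also has to look at the processor family, model, and stepping. That is exactly the sort of detail that is better kept in one place.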

Linus's solution to this problem is a patch which brings back a variant of an old idea. As of 2.5.53, the kernel will map a global, read-only page at the top of every process's address space. That page contains the optimal code for executing a system call on the current processor. Whenever glibc needs to call into the system, it simply sets up the registers and, rather than doing the old int $0x80, it jumps into the new page. The C library still needs to do a runtime test (since older kernels will lack this "vsyscall" page), but it need not concern itself with the detailed capabilities of different processors.
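
The article does not say how the C library's runtime test works. As a rough sketch of the idea, the code below uses an interface from later glibc and kernel versions: getauxval() with the AT_SYSINFO_EHDR key, which reports whether the kernel mapped a system call helper object into the process. Both names come from those later versions, not from the patch discussed here, and syscall_path() is a made-up dispatcher.

```c
#include <assert.h>
#include <elf.h>                /* AT_SYSINFO_EHDR */
#include <sys/auxv.h>           /* getauxval() */

/* Returns 1 if the kernel advertised a mapped entry object, 0 otherwise. */
int kernel_provides_entry_page(void)
{
    /* getauxval() returns 0 when the key is absent from the auxiliary vector. */
    return getauxval(AT_SYSINFO_EHDR) != 0;
}

/* Hypothetical dispatcher: which path would a C library take? */
const char *syscall_path(int have_entry_page)
{
    assert(have_entry_page == 0 || have_entry_page == 1);
    return have_entry_page ? "call into kernel-provided page"
                           : "int $0x80 fallback";
}
```

The point of the design is visible in how little the library has to know: one yes-or-no question about the kernel, and nothing at all about the processor.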

Keeping the registers straight turned out to be a trickier problem. The way sysenter steps on registers makes it hard to invoke system calls with more than five parameters. Various schemes were looked at, including creating a new "extra argument block" or simply requiring that six-argument system calls be invoked the old way. Linus finally came up with a tricky solution that makes it all work, however; those of you who like digging through x86 assembly may want to peek at his "absolutely wonderfully disgusting solution" to the problem. "I'm a disgusting pig, and proud of it to boot."

The result of all this: the gettimeofday() system call runs in just over half the time on a P4 processor. The speedup on the Pentium 3 is smaller - a factor of 1.2 - but still worthwhile.
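
Numbers like these come from timing a large number of calls and dividing. The sketch below shows one way to do that, along with the arithmetic that turns a speedup factor into a fraction of the original time; it is not the benchmark that produced the figures above, and both function names are invented. On current systems the C library may be able to answer gettimeofday() without entering the kernel at all, so absolute results will not be comparable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for clock_gettime() under strict modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <time.h>

/* Average wall-clock cost, in nanoseconds, of one gettimeofday() call. */
double ns_per_gettimeofday(long calls)
{
    struct timespec start, end;
    struct timeval tv;

    assert(calls > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < calls; i++)
        gettimeofday(&tv, NULL);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double elapsed_ns = (end.tv_sec - start.tv_sec) * 1e9
                      + (end.tv_nsec - start.tv_nsec);
    return elapsed_ns / calls;
}

/* A speedup factor f means each call takes 1/f of its former time. */
double time_fraction(double speedup)
{
    return 1.0 / speedup;
}
```

By this arithmetic, the factor of 1.2 reported for the Pentium 3 means each call takes about 83% of the time it used to, while "just over half the time" on the P4 corresponds to a speedup factor of a little under 2.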

Now that the vsyscall page is in place, will it be used for other things, such as implementing gettimeofday() entirely in user space? The answer, for now, appears to be "no". Getting a user-space gettimeofday() right is harder than it looks; there are synchronization issues, especially on some SMP systems where the clocks may not be synchronized by the hardware. So a user-space gettimeofday() does not appear to be in the works, at least not yet.

Whatever happened to the feature freeze?

While most people seem to think that the new system call mechanism makes sense, the question has come up: what kind of feature freeze are we in if we're adding things like a whole new way of doing system calls? Alan Cox, perhaps, had the most direct comment:

Linus. you are doing the slow slide into a second round of development work again, just like mid 2.3, just like 1.3.60, ...

Given the high hopes that have been placed on this feature freeze actually working, this sort of remark is something to be concerned about.

Linus has acknowledged the concern, and started a discussion on how patches should be reviewed. Looking ahead:

I thought about the code freeze require buy-in from three of four people (me, Alan, Dave and Andrew come to mind) for a patch to go in, but that's probably too draconian for now. Or is it (maybe start with "needs approval by two" and switch it to three when going into code freeze)?

There seems to be fairly widespread agreement, however, that this approach could be overly bureaucratic for now. Each development kernel release still contains hundreds of patches (636 for 2.5.51; in 2.5.52 there were "only" 153); people are understandably nervous about having that many patches go through a committee. Or even worse, being on the committee. Of course, Larry McVoy has an elaborate approach involving BitKeeper all planned out, but, given that a couple of people on the short list don't use BitKeeper, things will probably not go that way.

Andrew Morton has suggested simply adopting a set of guidelines for what can be accepted. The suggested list:

Bug fixes
Speedups
In-progress features (or those Linus had already said would be merged)
New drivers or filesystems

Anything outside of that list would not be included at this point. As the freeze gets harder, items are dropped off the list, until only bug fixes are left.

Given everybody's time constraints, the relatively informal approach is the most likely one to be adopted at this point. The important thing, in the end, is that everybody agrees that the feature freeze is important and is keeping an eye out for violations. As long as that continues, things will hopefully not get too far out of control.

Supporting hardware crypto in the kernel

Now that the kernel has its own cryptographic API, James Morris is thinking about how to support cryptographic hardware. A number of cards which perform cryptographic functions exist, and it would be nice to be able to make full use of these cards with a Linux system. Quite a few issues need to be considered on the way there, however, including:

How should multiple cards be supported? This gets tricky, especially for session-oriented crypto operations.
How should card failures (and resource exhaustion) be handled? The current crypto API isn't designed around this sort of failure.
Some network cards can do their own IPsec processing; taking advantage of that capability may require a higher-level interface.
User space may want to be able to use cryptographic devices as well, meaning that some sort of interface needs to be designed.
Many devices lack useful programming documentation, which will make creating a Linux driver harder (or impossible).

And so on. Now is the time to get these decisions right; anybody who is interested in the interface to cryptographic hardware should probably have a look at James's posting and join the discussion.

Elks Distribution Edition 0.0.5 released

Don't throw away that old 80286 system yet - with the just-announced release of EDE (Elks Distribution Edition) 0.0.5, that system, too, can run Linux. EDE comes with a bleeding-edge 0.1.1 kernel and a new elkscmd package; see the release announcement for the details. (Thanks to Alan Cox.)

Patches and updates

William Lee Irwin III: 2.5.51-bk1-wli-1
William Lee Irwin III: 2.5.52-wli-1
Martin J. Bligh: 2.5.52-mjb1 (scalability / NUMA patchset)
Patricia Gaughen: [PATCH] (1/2) i386 discontigmem support against 2.4.21pre1: paddr_to_pfn
Patricia Gaughen: [PATCH] (2/2) i386 discontigmem support against 2.4.21pre1: discontigmem
Jeff Dike: Allow UML kernel to run in a separate host address space
Jeff Dike: Update UML to 2.5.52
Rusty Russell: Revert module directory hierarchy and depmod invocation
Rusty Russell: Module init reentry fix
Rusty Russell: module_param() primitive (1/3)
Rusty Russell: module_param() primitive (2/3)
Rusty Russell: module_param() primitive (3/3)
Eric W. Biederman: kexec for 2.5.51....
Eric W. Biederman: kexec for 2.5.52
Ingo Molnar: threaded coredumps, tcore-fixes-2.5.51-A0
Robert Love: updated scheduler tunables
Inaky Perez-Gonzalez: Priority-based real-time futexes v1.1 for 2.5.52
Joe Korty: An O1, nonrecursive ID allocator for Posix timers
Greg KH: add kobject to struct mapped_device
Rusty Lynch: Fault-Injection Test Harness Project
Vamsi Krishna S.: kprobes for 2.5.52
John Levon: oprofile update for 2.5.52
Grover, Andrew: ACPI releases updated (20021212)
Greg KH: USB changes for 2.5.51
Greg KH: USB changes for 2.5.52
Greg KH: [FYI] 2.5 changes in usb core
Ducrot Bruno: S4bios for 2.5.52.
James Bottomley: generic device DMA implementation
James Morris: Hardware support notes for the kernel crypto API (2.5+)
Denis Vlasenko: lk maintainers
Feldman, Scott: Intel PRO/100 software developer manual released
Jeff Dike: UML documentation updates
Greg KH: dmfs for 2.5.51
Olaf Dietsche: 2.5.52: access permission filesystem 0.13
Olaf Dietsche: 2.5.52: Filesystem capabilities 0.13
Steve Best: Journaled File System (JFS) release 1.1.1
Hugh Dickins: kill __GFP_HIGHIO
Andrew Morton: 2.5.51-mm2
Andrew Morton: 2.5.52-mm1
Rik van Riel: 2.4.20-rmap15b
Sowmya Adiga: AIM benchmark result for kernel 2.5.51 with mm2 patch.
Sowmya Adiga: unixbench result for kernel 2.5.49 and 2.5.50
Sowmya Adiga: Unixbench result for kernel 2.5.51 with mm2 patch.
Con Kolivas: 2.5.52 with contest
Albert D. Cahalan: procps 3.1.3
Kristian Peters: updated BadRAM-patch for linux-2.4.20
