Kernel development

Brief items

Kernel release status

The current development kernel is 2.5.49, released by Linus on November 22. "Architecture updates, threading improvements, shm fix (the cause of the Oracle problems), networking, scsi, modules, you name it, it's here." Details are in the long-format changelog.

Linus's (pre-2.5.50) BitKeeper tree has a great many patches, the bulk of which come from the -ac and -dj trees. It also has some latency reduction patches from Andrew Morton, real-time swap space accounting, a number of IDE enhancements, an LSM update, and a big ISDN update.

The current prepatch from Alan Cox is 2.5.49-ac1. It consists mostly of compile fixes and other small repairs.

The current stable kernel is 2.4.19. 2.4.20 is getting closer, though; 2.4.20-rc4 was released by Marcelo on November 26.

Alan Cox has released 2.4.20-rc4-ac1, which adds a few fixes to the 2.4.20 release candidate.

Kernel development news

A look at 2.5.49-mm1

Andrew Morton's -mm patch series continues to be the staging area for no end of interesting patches in the memory management area. As of this writing, Andrew's latest patch is 2.5.49-mm1. Here's a look at a few of the items in that patch that are (1) interesting, and (2) not so complicated as to give your editor severe brain strain.

The shared page table patch is an important part of -mm1. This work was originally done by Daniel Phillips, but the patch has been beaten into shape and turned into something useful by David McCracken. The standard Linux virtual memory implementation does not share page tables between processes; even if two processes are sharing a large chunk of memory, they access that memory through separate page tables. With this patch, processes that fork() share their page tables (on a copy-on-write basis) with their child processes; page tables can also be shared when processes use mmap() to create a large shared memory region.

This patch can speed up fork() significantly (by a factor of almost 20 for very large processes), since it is no longer necessary to copy page tables and set up the associated reverse mapping data structures. It also greatly reduces the memory used for page tables and rmap entries; the savings can be hundreds of megabytes in the "large Oracle server" scenario. Shared page tables currently only work on x86 systems with high memory. The patch appears stable (the last bug that had been biting people just got stomped), but merging it into 2.5 would push the feature freeze pretty hard at this point. On the other hand, if it does not go into 2.5, it would not be surprising to see this patch worked into various distributor kernels.

The asynchronous direct I/O patch extends the asynchronous I/O infrastructure into the direct (block) I/O subsystem. It is part of the stated goal of making all I/O within the kernel asynchronous.

Jens Axboe's rbtree I/O scheduler addresses a performance problem with the current block I/O scheduler: it has to scan through the list of pending requests every time it needs to add a new one. As the request queue grows (and a reasonably long queue yields better performance), this scan takes more and more time. So the new scheduler replaces the linear list of requests with a tree (using the generic red/black tree implementation in the 2.5 kernel).

The "currently untested and unused" page reservation API is meant to deal with situations where the kernel must be able to allocate pages without sleeping - and without failing. A call to reserve_local_pages() sets aside a given number of pages which are guaranteed to be available for a subsequent allocation (with the GFP_RESERVED allocation flag). There is also a new page walking API which simplifies the task of wandering through a process's address space. As a special case, this API includes support for the creation of scatter/gather lists for zero-copy I/O operations.

There's a lot of other work rolled into the 2.5.49-mm1 patch; see Andrew's posting for the full list.

Reworking User-Mode Linux

User-Mode Linux (UML) is Jeff Dike's "port" of the Linux kernel to itself; a UML instance runs as a set of processes on a "real" Linux system. UML has long been useful as a kernel development tool - it's nice to have a development environment which can be tweaked with normal debuggers, and which can crash without taking down the host system. In recent times, there has been a growing level of interest in UML for virtual hosting and honeypot applications as well. Users (or attackers) can be given root access to a UML instance without, one hopes, endangering the host system.

UML has traditionally worked by running every UML process as a process on the host system. The kernel lives up at the top of each process's address space; transitions to and from "kernel mode" are handled with signals. The problem with this mode of operation is that it is hard to make secure, since the UML kernel's memory range is accessible to the processes it is running. This mode is also slow, since it involves frequent memory protection changes and signals.

So Jeff has released a patch which fixes these problems by radically changing how UML works. In the new scheme, a UML instance runs as exactly two processes on the host system. One is the UML kernel, while the other takes turns running user-space processes. The result is more secure (kernel space, being in a separate process, is now completely inaccessible), and significantly faster as well. There is, according to Jeff, only one disadvantage to the new way of doing things: it can't actually be implemented on a stock Linux kernel. This is the sort of nagging little problem that has been the downfall of many a great development project.

The problem has to do with how the user-space process works. That process needs to run each UML process in its own address space. In other words, every time the UML kernel decides to switch to a new process, the host-system process running the UML processes needs a whole new memory management data structure. The Linux kernel does not currently have the ability to switch a process's memory environment in this manner.

Jeff's solution is to create a magic file called /proc/mm. Opening this file creates a new address space; that address space can be modified by writing to the file. When the file is closed, the address space is deleted. There is also a set of ptrace() extensions, one of which allows the caller to change the address space of the traced process. By using /proc/mm to create a separate address space for each UML process, the UML kernel can give each of its processes its own view of the world within a single host system process. Problem solved.

It all looks like it works well. The /proc/mm approach may run into some rough sailing on linux-kernel; a system call implementation (or even /dev) might be better received. However it is implemented, this new feature is exactly that: a new feature. Adding new features into the virtual memory and process management subsystems is exactly what is not supposed to happen during this phase of 2.5 development.

Patches and updates

Core kernel code

  • Andries.Brouwer@cwi.nl: kill i_dev. (November 22, 2002)

Memory management

  • Rik van Riel: rmap 15a. (November 26, 2002)

Page editor: Jonathan Corbet

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds