
A look at 2.5.49-mm1

Andrew Morton's -mm patch series continues to be the staging area for no end of interesting patches in the memory management area. As of this writing, Andrew's latest patch is 2.5.49-mm1. Here's a look at a few of the items in that patch that are (1) interesting, and (2) not so complicated as to give your editor severe brain strain.

The shared page table patch is an important part of -mm1. This work was originally done by Daniel Phillips, but the patch has been beaten into shape and turned into something useful by David McCracken. The standard Linux virtual memory implementation does not share page tables between processes; even if two processes are sharing a large chunk of memory, they access that memory through separate page tables. With this patch, processes that fork() share their page tables (on a copy-on-write basis) with their child processes; page tables can also be shared when processes use mmap() to create a large shared memory region.
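The sharing scheme described above can be illustrated with a toy user-space model. This is only a sketch of the idea and not the patch's code: the PageTable and AddressSpace classes, the refcount field, and map_page() are all names invented for the example, and the model ignores the multi-level layout of real page tables.

```python
class PageTable:
    """Toy page table: maps virtual page numbers to physical frame numbers."""

    def __init__(self, entries=None):
        self.entries = dict(entries or {})
        self.refcount = 1  # number of address spaces using this table


class AddressSpace:
    """Toy process address space holding a reference to one page table."""

    def __init__(self, table=None):
        self.table = table if table is not None else PageTable()

    def fork(self):
        # Share the parent's table instead of copying it entry by entry.
        self.table.refcount += 1
        return AddressSpace(self.table)

    def lookup(self, vpn):
        return self.table.entries.get(vpn)

    def map_page(self, vpn, frame):
        # A change to a shared table would be visible to every sharer,
        # so take a private copy first (copy-on-write).
        if self.table.refcount > 1:
            self.table.refcount -= 1
            self.table = PageTable(self.table.entries)
        self.table.entries[vpn] = frame
```

In this model fork() costs the same no matter how many entries the table holds; the copy is deferred until one side actually changes a mapping, and it never happens at all if neither side does.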

This patch can speed up fork() significantly (i.e. by a factor of almost 20 for very large processes) since it is no longer necessary to copy page tables and set up the associated reverse mapping data structures. It also greatly reduces the memory used for page tables and rmap entries; the savings can be hundreds of megabytes in the "large Oracle server" scenario. Shared page tables currently only work on x86 systems with high memory. The patch appears stable (the last bug that had been biting people just got stomped), but merging it into 2.5 would push the feature freeze pretty hard at this point. On the other hand, if it does not go into 2.5, it would not be surprising to see this patch worked into various distributor kernels.
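A back-of-the-envelope calculation shows where savings of that size can come from. The figures here are assumptions chosen for illustration, not measurements from the patch: 4096-byte pages, 4-byte page table entries (8 bytes when PAE is in use), and a hypothetical 1GB shared region mapped by 200 processes. Only the lowest-level page tables are counted; page directories and rmap entries are left out.

```python
PAGE_SIZE = 4096  # bytes per page on x86


def page_table_bytes(region_bytes, processes, shared, pte_size=4):
    """Bytes of lowest-level page tables needed to map one region.

    pte_size is 4 on non-PAE x86 and 8 with PAE (an input, not a measurement).
    """
    ptes = region_bytes // PAGE_SIZE
    per_copy = ptes * pte_size
    copies = 1 if shared else processes
    return per_copy * copies


MB = 1 << 20
separate = page_table_bytes(1 << 30, 200, shared=False)
together = page_table_bytes(1 << 30, 200, shared=True)
print(separate // MB, "MB unshared vs.", together // MB, "MB shared")
```

Under these assumptions each process needs 1MB of page tables for the region, so 200 separate copies come to 200MB, against 1MB when the tables are shared.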

The asynchronous direct I/O patch extends the asynchronous I/O infrastructure into the direct (block) I/O subsystem. It is part of the stated goal of making all I/O within the kernel be asynchronous.

Jens Axboe's rbtree I/O scheduler addresses a performance problem with the current block I/O scheduler: it has to scan through the list of pending requests every time it needs to add a new one. As the request queue gets long (and a certain length yields better performance), this scan takes time. So the new scheduler replaces the linear list of requests with a tree (using the generic red/black tree implementation in the 2.5 kernel).
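The cost difference is easy to see in a small user-space experiment. Python has no red/black tree in its standard library, so this sketch substitutes a binary search over a sorted list for the tree lookup; both find the insertion point in a logarithmic number of steps. The function names are invented for the example, requests are reduced to bare sector numbers, and nothing here reflects how the kernel's scheduler actually walks its queue.

```python
def linear_insert(queue, sector):
    """Insert sector into a sorted list by scanning from the front.

    Returns the number of queue entries examined.
    """
    examined = 0
    for i, existing in enumerate(queue):
        examined += 1
        if existing > sector:
            queue.insert(i, sector)
            return examined
    queue.append(sector)
    return examined


def binary_insert(queue, sector):
    """Insert sector into a sorted list, locating the slot by binary search.

    Returns the number of queue entries examined.
    """
    lo, hi = 0, len(queue)
    examined = 0
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if queue[mid] > sector:
            hi = mid
        else:
            lo = mid + 1
    queue.insert(lo, sector)
    return examined
```

With 1000 requests queued, a new request that sorts to the end makes the linear version look at every entry, while the binary search needs at most ten probes. (A Python list still shifts elements on insert; a tree avoids that as well, which the sketch does not model.)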

The "currently untested and unused" page reservation API is meant to deal with situations where the kernel must be able to allocate pages without sleeping - and without failing. A call to reserve_local_pages() sets aside a given number of pages which are guaranteed to be available for a subsequent allocation (with the GFP_RESERVED allocation flag). There is also a new page walking API which simplifies the task of wanding through a process's address space. As a special case, this API includes support for the creation of scatter/gather lists for zero-copy I/O operations.
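The reservation pattern itself is simple enough to model in a few lines. In this sketch only the name reserve_local_pages() comes from the patch; the PagePool class, alloc_page(), and the use_reserve argument (standing in for the reserved allocation flag) are invented for illustration, and the real interface may look quite different.

```python
class PagePool:
    """Toy allocator that counts pages instead of handing out real memory."""

    def __init__(self, free_pages):
        self.free = free_pages
        self.reserved = 0

    def reserve_local_pages(self, count):
        # Called ahead of time, in a context where failing is still acceptable.
        if count > self.free:
            raise MemoryError("not enough free pages to reserve")
        self.free -= count
        self.reserved += count

    def alloc_page(self, use_reserve=False):
        # Returns True if a page was handed out, False otherwise.
        if use_reserve and self.reserved > 0:
            self.reserved -= 1
            return True
        if self.free > 0:
            self.free -= 1
            return True
        return False
```

The point of the split is that any failure is moved to the reservation call, made before the critical section; once inside, allocations that draw on the reserve succeed even when the general pool has run dry.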

There's a lot of other work rolled into the 2.5.49-mm1 patch; see Andrew's posting for the full list.



A look at 2.5.49-mm1

Posted Dec 11, 2002 21:36 UTC (Wed) by roelofs (subscriber, #2599)

Is "wanding" some weird new jargon or merely a typo? (wending? winding? walking?)

Greg

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds