LWN Weekly Edition
Kernel development

Current kernel release status

The current development kernel is 2.5.30, which was released by Linus on August 1. It includes the usual IDE patches (through IDE 111), changes to the "generic disk" data structure, the "strict overcommit" VM patch, the removal of the "khttpd" in-kernel web server, a number of devfs changes (by Greg Kroah-Hartman, and not entirely to devfs author Richard Gooch's liking), a long list of driverfs changes, and many other fixes and updates. See the long-format changelog for all the details.

Linus's BitKeeper tree (which will become 2.5.31) includes an ISDN update, more driverfs work, a JFS update, a lot of Ethernet driver updates, and more. Interestingly, this tree also includes the "User-mode Linux preparation" patches, which make various changes to core code needed by UML. UML itself is not there yet, but the presence of these patches suggests that it is coming soon.

The current prepatch from Dave Jones is 2.5.30-dj1, which contains a small set of fixes and some rubble from his switch over to BitKeeper. "Chances are this won't even boot for many people (if any at all)."

The latest 2.5 kernel status summary from Guillaume Boissiere came out on August 7.

The current stable kernel is 2.4.19. The much-awaited final release was announced by Marcelo on August 2; it contained no changes after the -rc5 release candidate. The full list of changes in 2.4.19 is available - be warned that it is long.

Marcelo has already released 2.4.20-pre1, the first prepatch for the 2.4.20 kernel. The list of changes is long, but it consists mostly of fixes and driver updates. Marcelo did initially include a backport of NAPI (high performance networking; see the October 4, 2001 LWN Kernel Page), but backed parts of it out at the last minute; he is waiting for justification to include it for real. Says Marcelo: "2.4.20 will be a much faster release cycle than 2.4.19 was."

The current prepatch from Alan Cox is 2.4.20-pre1-ac1.
Kernel development news

Large page support in the Linux kernel

Most modern processors have the ability to work with "large pages" - single page table entries which cover large (up to several megabytes) ranges of contiguous physical memory. With one exception, this feature is not used in the Linux kernel, which works with a 4K or 8K page size (depending on architecture) in all situations. Smaller pages reduce internal fragmentation, are quick to swap in and out, don't require the virtual memory system to maintain large, contiguous chunks of memory, and help to ensure that exactly the virtual memory that is in use now is resident in physical memory. Small pages are the best choice for most situations. Due to the complication of supporting multiple page sizes in the Linux VM implementation, no such support has been merged so far.

There are advantages to working with large pages, however. 4MB of memory in 4KB pages requires 1024 page table entries (PTEs) - that is a lot of memory devoted to overhead, and significant processor time to set up, tear down, and maintain those PTEs. This overhead is multiplied when shared memory segments are in use, since Linux is currently unable to share page tables.

But the real savings with large pages have to do with the processor's translation lookaside buffer (TLB) - a small cache which remembers the results of virtual-to-physical address translations. An address lookup through the TLB is quick; one that has to actually go to the page table is slow. Large pages greatly extend the range of memory the TLB can cover, and simply make applications run faster; performance improvements of 30% have been claimed at times. The fact that Oracle uses lots of large, shared memory regions and would like to see large page support in the kernel is also helping to drive development in this area.

The most recent large page patch is this one by Rohit Seth.
It allows processes to explicitly request a chunk of large page memory with a new get_large_pages system call; there is also a share_large_pages call for creating shared memory regions. The patch avoids much of the complexity of supporting large pages in the VM by, well, avoiding it. Large pages are handled completely outside of the normal memory management mechanisms. When the system boots, a percentage of memory (25%, by default) is simply set aside to satisfy large page requests. These pages are handed out when requested (as long as they last) and are not swapped.

This patch is thus (relatively) simple. It gets the job done in certain situations - imagine a large box whose job is to run a relational database system; nailing down a quarter of memory to improve database performance is a reasonable thing to do. But this patch (intentionally) does not address the larger problem. In fact, as Linus points out, this isn't really a "large page" patch at all:
The current largepage patch is really nothing but an interface to
the TLB. Please view it as that - a direct TLB interface that has
zero impact on the VFS or VM layers, and that is meant _purely_ as
a way to expose hw capabilities to the few applications that really
really want them
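
The reservation scheme described above for Rohit Seth's patch can be modeled in a few lines of user-space Python. This is only a toy sketch, not kernel code and not the patch's interface: the class name, the `request` method, the use of `MemoryError` to report an empty pool, and the 4MB large page size (borrowed from the article's own example) are all assumptions made for illustration.

```python
# Toy model of a boot-time large page reservation. Everything here is
# hypothetical; it only mirrors the behavior described in the article.

LARGE_PAGE_SIZE = 4 * 1024 * 1024  # assumed large page size (4MB)


class LargePagePool:
    """A fixed pool of large pages set aside once, at 'boot'."""

    def __init__(self, total_memory_bytes, reserve_percent=25):
        # Set aside a fixed share of memory (25% by default, as in the article).
        reserved_bytes = total_memory_bytes * reserve_percent // 100
        # Each entry is simply the index of one reserved large page.
        self.free_pages = list(range(reserved_bytes // LARGE_PAGE_SIZE))

    def request(self, count):
        """Hand out `count` large pages for as long as the pool lasts.

        Raising MemoryError on exhaustion is an assumption of this model.
        The pages are never swapped, so there is no eviction path at all.
        """
        if count > len(self.free_pages):
            raise MemoryError("large page pool exhausted")
        return [self.free_pages.pop() for _ in range(count)]


if __name__ == "__main__":
    pool = LargePagePool(total_memory_bytes=1024 * 1024 * 1024)  # 1GB machine
    pages = pool.request(60)
    print(len(pages), "pages handed out,", len(pool.free_pages), "left")
```

The point of the model is how little it does: there is no interaction with swap or with the regular page allocator, which is why the real patch can stay small, and also why the reserved memory is unavailable for anything else.
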
So what might a real large page patch provide? Wishes that have been expressed include:
The automatic use of large pages would be helped by another suggestion from David: if it becomes necessary to swap out a large page, simply split it back into a long list of regular pages and proceed as usual. Then most of the swap complexity would go away.

Of course, the October deadline is getting closer. So all of these ideas are almost certainly destined to wait until after the next stable series. But one of the variants of the simpler "TLB interface" patches may yet get in this time around and make the database vendors (and others) happy.

(What, you may ask, is the "one exception" where the kernel uses large pages now? The mapping of the kernel image itself - a single, large chunk of non-swappable memory - is handled with a large page PTE.)
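
The idea of splitting a large page back into regular pages, mentioned above, can be illustrated with a small Python sketch. It is not taken from any posted patch: the function name and the representation of a mapping as a (virtual, physical) address pair are assumptions, as are the 4KB and 4MB page sizes, which follow the figures used earlier in the article.

```python
# Hypothetical illustration of replacing one large-page mapping with
# the equivalent run of regular-page mappings.

SMALL_PAGE_SIZE = 4 * 1024         # regular page: 4KB
LARGE_PAGE_SIZE = 4 * 1024 * 1024  # large page: 4MB (assumed)


def split_large_page(virt_base, phys_base):
    """Return (virtual, physical) pairs for the regular pages that
    cover the same range as a single large page."""
    # A large page has to start on a boundary of its own size.
    if virt_base % LARGE_PAGE_SIZE or phys_base % LARGE_PAGE_SIZE:
        raise ValueError("large pages must be aligned to their own size")
    return [
        (virt_base + offset, phys_base + offset)
        for offset in range(0, LARGE_PAGE_SIZE, SMALL_PAGE_SIZE)
    ]


if __name__ == "__main__":
    entries = split_large_page(0x40000000, 0x10000000)
    print(len(entries))  # 1024 regular-page entries replace one large-page entry
```

Once the mapping has been split this way, each 4KB piece could be swapped by the existing code paths, at the cost of the 1024 page table entries that the large page had been avoiding.
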
Page editor: Jonathan Corbet
Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds