
The end of the hugetlb system calls

The hugetlb (or "large page") patch was covered here last August. This patch added a couple of new system calls allowing a suitably privileged process to create anonymous memory using the large page capability of most modern processors. Using large pages cuts down on page table overhead and, crucially, makes better use of the processor's address translation cache (the translation lookaside buffer, or TLB). The result is that applications using large memory arrays (Oracle, in particular) run faster.
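To put rough numbers on the savings, here is a small sketch of the arithmetic. The page sizes (4 KiB and 4 MiB) and the 64-entry translation cache used in the note below are illustrative assumptions, not figures from the patch; the helper names are hypothetical.

```c
#include <assert.h>

/*
 * Number of pages (and so page table entries) needed to map `bytes`
 * of memory with pages of `page_size` bytes, rounding up.
 */
unsigned long long pages_needed(unsigned long long bytes,
                                unsigned long long page_size)
{
    assert(page_size != 0);
    return (bytes + page_size - 1) / page_size;
}

/*
 * Amount of memory that a translation cache with `entries` slots can
 * cover before the processor has to walk the page tables again.
 */
unsigned long long tlb_reach(unsigned long long entries,
                             unsigned long long page_size)
{
    return entries * page_size;
}
```

Under those assumptions, a 1 GiB array needs 262144 page table entries with 4 KiB pages but only 256 with 4 MiB pages, and a 64-entry translation cache covers 256 KiB in the first case against 256 MiB in the second.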

The large page capability is seen as useful by most developers, but there has been a long series of complaints about the system call interface. The system calls do pretty much what one would expect: allocate a large page region, free it, share it with others. But not everybody sees the need for a new set of system calls for performing what are (mostly) standard memory operations. Then, there is the issue of permissions. The ability to allocate huge pages cannot be handed out to just anybody, since it is a good vehicle for the creation of denial of service attacks. That means that root access is required to make use of the large page capability. Call them superstitious, but many users are reluctant to run Oracle with root access.

Meanwhile, William Lee Irwin added hugetlbfs - a RAM-based filesystem which uses large pages. An application wishing to create a memory region with large pages can create a file in a hugetlbfs directory, then use mmap() to map it into its address space. Sharing is nicely handled by the filesystem itself, and need no longer be done with a separate system call. And the permissions problem is solved by allowing a system administrator to set protections on the hugetlbfs filesystem which fit the site's needs. In short, the filesystem provides a more flexible interface to the large page facility. So, as of 2.5.54, the system call interface will be removed.
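The create-then-mmap() sequence might look something like the following minimal sketch. The function name map_region() is hypothetical, and the ftruncate() call reflects how files are commonly sized on current kernels rather than anything confirmed for 2.5.54. The sketch also assumes that the administrator has already mounted hugetlbfs somewhere (for example with "mount -t hugetlbfs none /mnt/huge", where the mount point is an arbitrary choice) and that the length is a multiple of the large page size.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create (or open) the file `name` in directory `dir` and map `len`
 * bytes of it, shared, into the caller's address space.  If `dir` is
 * a hugetlbfs mount, the mapping is backed by large pages; on any
 * other filesystem the same calls yield an ordinary shared mapping.
 * Returns NULL on failure.
 */
void *map_region(const char *dir, const char *name, size_t len)
{
    char path[4096];
    void *addr;
    int n, fd;

    assert(dir != NULL && name != NULL);

    n = snprintf(path, sizeof(path), "%s/%s", dir, name);
    if (n < 0 || (size_t)n >= sizeof(path))
        return NULL;

    /* Ordinary file permissions decide who may use the region. */
    fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    /* Assumption: the file is sized explicitly before mapping. */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return NULL;
    }

    /* MAP_SHARED lets other processes that map the file see the data. */
    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */

    return addr == MAP_FAILED ? NULL : addr;
}
```

A second process calling map_region() with the same directory and file name sees the same memory, which is the sharing that previously required its own system call. The region is released with munmap(), and the file with unlink().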

All this could lead one to wonder why the hugetlb patch wasn't done this way in the first place. The whole point of the kernel peer review process, after all, is to keep poor interfaces out of the kernel. Linus's answer is simple: the patch was not much discussed prior to merging because the companies behind it are still unaccustomed to open code development. In fact, some companies have rules which forbid the sorts of conversations needed to develop in an open source environment.

So not only did you have a feature that is mostly useful only to a smallish group of people - you had that group of people not used to open communication in the first place, AND you had rules that made some of the important part of the communication illegal in the first place.

Still wonder why it wasn't widely discussed during development? Intel engineers would basically take people aside in private at conferences talking about what kinds of improvements Oracle was seeing.

To many developers, developing code in the open seems like the only way to work. This episode is a good reminder that not everybody has yet come to understand how the free software development process works.



Copyright © 2003, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds