The end of the hugetlb system calls
[Posted January 1, 2003 by corbet]
The hugetlb (or "large page") patch was covered here
last August. This patch added a
couple of new system calls allowing a suitably privileged process to create
anonymous memory using the large page capability of most modern
processors. Using large pages cuts down on page table overhead, and,
crucially, optimizes the use of the processor's address translation cache.
The result is that applications using large memory arrays (Oracle, in
particular) run faster.
The large page capability is seen as useful by most developers, but there
has been a long series of complaints about the system call interface. The
system calls do pretty much what one would expect: allocate a large page
region, free it, share it with others. But not everybody sees the need for
a new set of system calls for performing what is (mostly) standard memory
operations. Then there is the issue of permissions. The ability to
allocate huge pages cannot be handed out to just anybody, since it is a
good vehicle for denial-of-service attacks. That means
that root access is required to make use of the large page capability.
Call them superstitious, but many users are reluctant to run Oracle with
root access.
Meanwhile, William Lee Irwin added hugetlbfs - a RAM-based filesystem which
uses large pages. An application wishing to create a memory region with
large pages can create a file in a hugetlbfs directory, then use
mmap() to map it into its address space. Sharing is nicely
handled by the filesystem itself, and need no longer be done with a
separate system call. And the permissions problem is solved by allowing a
system administrator to set protections on the hugetlbfs filesystem which
fit the site's needs. The filesystem thus provides a more flexible
interface to the large page facility. So, as of 2.5.54, the system call
interface will be removed.
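As a rough illustration of the create-a-file-then-mmap() sequence described above, here is a minimal sketch in C. It is not taken from the patch: the helper names (round_up_huge(), map_huge_file()) are hypothetical, and the caller is assumed to know where hugetlbfs is mounted and what the huge page size is on the machine at hand, since both vary from system to system.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: round len up to a multiple of the huge page
 * size, which is assumed to be a power of two. */
static size_t round_up_huge(size_t len, size_t huge_size)
{
    assert(huge_size != 0 && (huge_size & (huge_size - 1)) == 0);
    return (len + huge_size - 1) & ~(huge_size - 1);
}

/*
 * Hypothetical helper: create (or open) a file at `path`, which is
 * assumed to live on a hugetlbfs mount, and map it shared into the
 * caller's address space. Returns NULL on failure, for example when
 * the directory does not exist or no huge pages are available.
 */
static void *map_huge_file(const char *path, size_t len, size_t huge_size)
{
    size_t maplen = round_up_huge(len, huge_size);

    /* File permissions, together with those on the mount itself,
     * decide who may use or share the region. */
    int fd = open(path, O_CREAT | O_RDWR, 0600);
    if (fd < 0)
        return NULL;

    void *addr = mmap(NULL, maplen, PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */

    return addr == MAP_FAILED ? NULL : addr;
}
```

A caller might use something like map_huge_file("/mnt/huge/buffer", size, 2UL << 20), where both the mount point and the 2MB page size are assumptions about the local setup. A second process with permission to open the same file could map it too, which is how sharing falls out of the filesystem without a dedicated system call.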
All this could lead one to wonder why the hugetlb patch wasn't done this
way in the first place. The whole point of the kernel peer review process,
after all, is to keep poor interfaces out of the kernel. Linus's answer
is simple: the patch was not much discussed prior to merging because the
companies behind it are still unused to developing code in the open. In
fact, some companies have rules which forbid the sorts of conversations
needed to develop in an open source environment. In his words:
So not only did you have a feature that is mostly useful only to a
smallish group of people - you had that group of people not used to
open communication in the first place, AND you had rules that made
some of the important part of the communication illegal in the
first place.
Still wonder why it wasn't widely discussed during development?
Intel engineers would basically take people aside in private at
conferences talking about what kinds of improvements Oracle was
seeing.
Developing code in the open seems like the only way to work for many
developers. This episode is a good reminder that not everybody, yet, has
really come to understand how the free software development process works.