This article serves mostly as background to help understand why the kernel
developers are considering making fundamental virtual memory changes at
this point in the development cycle. It can probably be skipped by readers
who understand how high and low memory work on 32-bit systems.
A 32-bit processor can address a maximum of 4GB of memory. One could, in
theory, extend the instruction set to allow for larger pointers, but, in
practice, nobody does that; the effects on performance and compatibility
would be too strong. So the limitation remains: no process on a 32-bit
system can have an address space larger than 4GB, and the kernel cannot
directly address more than 4GB.
In fact, the limitations are more severe than that. Linux kernels split
the 4GB address space between user processes and the kernel; under the most common
configuration, the first 3GB of the 32-bit range are given over to user
space, and the kernel gets the final 1GB starting at 0xc0000000.
Sharing the address space gives a number of performance benefits; in
particular, the hardware's address translation buffer can be shared between
the kernel and user space.
If the kernel wishes to be able to access the system's physical memory
directly, however, it must set up page tables which map that memory into
the kernel's part of the address space. With the default 3GB/1GB mapping,
the amount of physical memory which can be addressed in this way is
somewhat less than 1GB - part of the kernel's space must be set aside for
the kernel itself, for memory allocated with vmalloc(), and
various other purposes. That is why, until a few years ago, Linux could
not even fully handle 1GB of memory on 32-bit systems. In fact, back in
1999, Linus decreed
that 32-bit Linux would never, ever support more than 2GB of memory.
"This is not negotiable."
Linus's views notwithstanding, the rest of the world continued on with the
strange notion that 32-bit
systems should be able to support massive amounts of memory. The processor
vendors added paging modes which could use physical addresses which exceed
32 bits in length, thus ending the 4GB limit for physical memory. The
internal addressing limitations in the Linux kernel remained, however.
Happily for users of large systems, Linus can acknowledge an error and
change his mind; he did eventually allow large memory support into the 2.3
kernel. That support came with its own costs and limitations, however.
On 32-bit systems, memory is now divided into "high" and "low" memory. Low
memory continues to be mapped directly into the kernel's address space, and
is thus always reachable via a kernel-space pointer. High memory, instead,
has no direct kernel mapping. When the kernel needs to work with a page in
high memory, it must explicitly set up a special page table to map it into
the kernel's address space first. This operation can be expensive, and
there are limits on the number of high-memory pages which can be mapped at
any particular time.
For the most part, the kernel's own data structures must live in low
memory. Memory which is not permanently mapped cannot appear in linked
lists (because its virtual address is transient and variable), and the
performance costs of mapping and unmapping kernel memory are too high.
High memory is useful for process pages and some kernel tasks (I/O buffers,
for example), but the core of the kernel stays in low memory.
Some 32-bit processors can now address 64GB of physical memory, but the
Linux kernel is still not able to deal effectively with that much; the
current limit is around 8GB to 16GB, depending on the load. The problem
now is that larger systems simply run out of low memory. As the system
gets larger, it requires more kernel data structures to manage, and
eventually room for those structures can run out. On a very large system,
the system memory map (an array of struct page structures which
represents physical memory) alone can occupy half of the available low
There are users out there wanting to scale 32-bit Linux systems up to 32GB
or more of main memory, so the enterprise-oriented Linux distributors have
been scrambling to make that possible. One approach is the 4G/4G patch written by Ingo
Molnar. This patch separates the kernel and user address spaces, allowing
user processes to have 4GB of virtual memory while simultaneously expanding
the kernel's low memory to 4GB. There is a cost, however: the translation
buffer is no longer shared and must be flushed for every transition between
kernel and user space. Estimates of the magnitude of the performance hit
vary greatly, but numbers as high as 30% have been thrown around. This
option makes some systems work, however, so Red Hat ships a 4G/4G kernel
with its enterprise offerings.
The 4G/4G patch extends the capabilities of the Linux kernel, but it
remains unpopular. It is widely seen as an ugly solution, and nobody likes
the performance cost. So there are efforts afoot to extend the scalability
of the Linux kernel via other means. Some of these efforts will likely go
forward - in 2.6, even - but the kernel developers seem increasingly unwilling to distort
the kernel's memory management systems to meet the needs of a small number
of users who are trying to stretch 32-bit systems far beyond where they
should go. There will come a time where they will all answer as Linus did
back in 1999: go get a 64-bit system.
to post comments)