From: Andi Kleen <ak-AT-suse.de>
Subject: [discuss] 4level page tables merged into mainline
Date: Tue, 4 Jan 2005 03:05:57 +0100
The current BitKeeper Linux 2.6 mainline tree has 4level page tables merged.
It ended up as a collaboration between me and Nick Piggin, who
made some changes.
If you want to test it, apply the patch
(or the latest bk* patch at the time of your download)
on top of a 2.6.10 tree.
A bit of background:
The x86-64 architecture always uses 4 level page tables. Previously
Linux supported only 3 levels of page tables in the generic VM.
This meant the kernel could address all physical memory, but a single
process was limited to 39 bits of virtual address space (=512GB),
while the machine as a whole could use more memory than that.
The rest of the 48 bit address space was used for kernel mappings.
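The 39-bit figure follows from the page table geometry. The sketch below assumes 4 KB pages (12 offset bits) and 512-entry tables (9 index bits per level), so each level adds 9 bits of virtual address; the helper names `va_bits` and `human` are illustrative only, not kernel identifiers.

```python
# Sketch: virtual address bits covered by N levels of x86-64 page tables.
# Assumes 4 KB pages and 512-entry (9-bit index) tables at every level.
PAGE_SHIFT = 12      # byte offset within a 4 KB page
BITS_PER_LEVEL = 9   # 512 eight-byte entries per 4 KB table


def va_bits(levels: int) -> int:
    """Number of virtual address bits translated by `levels` table levels."""
    return PAGE_SHIFT + BITS_PER_LEVEL * levels


def human(nbytes: int) -> str:
    """Format an exact power-of-two byte count using binary units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    i = 0
    while nbytes >= 1024 and i < len(units) - 1:
        nbytes //= 1024
        i += 1
    return f"{nbytes}{units[i]}"


for levels in (3, 4):
    bits = va_bits(levels)
    print(f"{levels} levels: {bits} bits = {human(1 << bits)}")
# The user half of the 4-level space (bit 47 clear) is 47 bits.
print(f"user half: {human(1 << (va_bits(4) - 1))}")
```

Run as is, this prints 512GB for three levels, 256TB for the full 48-bit space with four levels, and 128TB for the user half.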
With the 4level patch kit this limitation is gone, and user processes
can use much more virtual memory. This is currently mainly useful
for mmapping very large files, since existing x86-64 machines are limited
to much less than 512GB of memory right now.
The new layout gives a 47 bit (=128TB) virtual address space to each process.
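To see where the 47 bits come from, here is a minimal sketch that splits a canonical 48-bit virtual address into its four table indices. It uses the hardware table names (PML4, PDPT, PD, PT) rather than any kernel identifiers, and `split_va` is a hypothetical helper. User space occupies PML4 slots 0 to 255, and each slot covers 512GB.

```python
# Sketch: split a canonical x86-64 virtual address into its four table
# indices plus the page offset (4 KB pages, 9 index bits per level).
def split_va(va: int) -> dict:
    return {
        "pml4": (va >> 39) & 0x1FF,   # bits 47..39
        "pdpt": (va >> 30) & 0x1FF,   # bits 38..30
        "pd": (va >> 21) & 0x1FF,     # bits 29..21
        "pt": (va >> 12) & 0x1FF,     # bits 20..12
        "offset": va & 0xFFF,         # bits 11..0
    }


top_of_user = 0x00007FFFFFFFFFFF
kernel_start = 0xFFFF800000000000

print(split_va(top_of_user))
# Highest user address uses PML4 slot 255; the kernel half starts at slot 256.
print(split_va(kernel_start)["pml4"])
# 256 PML4 slots x 512GB each = 2**47 bytes = 128TB of user space.
print(256 * (1 << 39) == 1 << 47)
```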
The kernel uses 46 bits for mapping physical memory, which limits the physical
memory you can plug into a machine to 64TB. Current x86-64 CPUs have
much lower physical memory limits than that (40 bits for AMD, 36 bits
for Intel), so for now this limit is only theoretical.
The x86-64 architecture would in theory support 52 bits of physical
address space before the page tables would need to be enlarged.
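The physical limits quoted above are plain powers of two. A quick sketch converting each bit width mentioned in the text into a byte count (the labels are just descriptions of the figures in the post):

```python
# Sketch: physical address widths from the text, converted to byte counts.
LIMITS = {
    "kernel direct mapping (4level layout)": 46,
    "AMD x86-64 CPUs (at the time)": 40,
    "Intel x86-64 CPUs (at the time)": 36,
    "x86-64 architectural maximum": 52,
}


def human(nbytes: int) -> str:
    """Format an exact power-of-two byte count using binary units."""
    units = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    i = 0
    while nbytes >= 1024 and i < len(units) - 1:
        nbytes //= 1024
        i += 1
    return f"{nbytes}{units[i]}"


for name, bits in LIMITS.items():
    print(f"{bits} bits -> {human(1 << bits):>5}  {name}")
```

This gives 64TB, 1TB, 64GB and 4PB respectively.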
x86-64 is currently the only port that uses 4level page tables.
The user address space layout has changed slightly: the main process stack
has moved up, and mmaps and shared libraries will start at a much higher
address. This is visible to application programs. Since no existing software
should rely on these addresses, no problems are expected from
this change, and so far no compatibility problems are known.
If you notice any strange problems with 64bit programs that only started
with this patch, please report them to the list.
The new layout causes most processes to generate more page tables:
in particular, each process has its own fourth level, and the stack
and the shared libraries are more spread out. This will cost some memory.
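A toy model of why spreading mappings out costs more page-table pages: each table at a given level covers one aligned region (a PT covers 2MB, a PD 1GB, a PDPT 512GB), so the count per level is the number of distinct regions touched. The addresses below are hypothetical stand-ins for an executable, a shared library and the stack, not the actual layouts used by any kernel.

```python
# Toy model: count the page-table pages needed to map a set of 4 KB pages.
# One PT covers 2MB (2**21), one PD covers 1GB (2**30), one PDPT covers
# 512GB (2**39); each process has exactly one PML4.
def tables_needed(page_addrs):
    return {
        "pml4": 1,
        "pdpt": len({va >> 39 for va in page_addrs}),
        "pd": len({va >> 30 for va in page_addrs}),
        "pt": len({va >> 21 for va in page_addrs}),
    }


# Hypothetical addresses for an executable, a shared library and the stack.
compact = [0x400000, 0x2A95556000, 0x7FBFFFF000]     # all below 512GB
spread = [0x400000, 0x2AAAAAAAB000, 0x7FFFFFFFF000]  # spread over 128TB

for name, layout in (("compact", compact), ("spread", spread)):
    counts = tables_needed(layout)
    print(name, counts, "total:", sum(counts.values()))
```

The spread layout needs two extra PDPT pages here (10 pages instead of 8), and every extra table is also something fork and exit have to walk.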
One current drawback is that fork and exit (as measured by lmbench) are
somewhat slower. This is because the new layout generates
more page tables by default, and it takes longer to walk them
on fork or exit. Nick also made some changes to page table freeing
which seem to have a negative performance impact.
There are plans to improve the performance of these operations again,
so the slowdown will hopefully only be temporary.
Here is the updated Documentation/x86-64/mm.txt describing the new
kernel VM layout:
Virtual memory map with 4 level page tables:
0000000000000000 - 00007fffffffffff (=47bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff80ffffffffff (=40bits) guard hole
ffff810000000000 - ffffc0ffffffffff (=46bits) direct mapping of phys. memory
ffffc10000000000 - ffffc1ffffffffff (=40bits) hole
ffffc20000000000 - ffffe1ffffffffff (=45bits) vmalloc/ioremap space
... unused hole ...
ffffffff80000000 - ffffffff82800000 (=40MB) kernel text mapping, from phys 0
... unused hole ...
ffffffff88000000 - fffffffffff00000 (=1919MB) module mapping space
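The map above can be read mechanically. A minimal sketch, assuming the region boundaries exactly as listed; the kernel text and module end addresses are treated as exclusive (their sizes, 40MB and 1919MB, only match that way) and converted to inclusive last-byte values. `LAYOUT`, `is_canonical` and `classify` are hypothetical names, not kernel code.

```python
# Sketch: classify an x86-64 virtual address against the 4level layout above.
# (name, first byte, last byte inclusive)
LAYOUT = [
    ("user space", 0x0000000000000000, 0x00007FFFFFFFFFFF),
    ("guard hole", 0xFFFF800000000000, 0xFFFF80FFFFFFFFFF),
    ("direct mapping", 0xFFFF810000000000, 0xFFFFC0FFFFFFFFFF),
    ("hole", 0xFFFFC10000000000, 0xFFFFC1FFFFFFFFFF),
    ("vmalloc/ioremap", 0xFFFFC20000000000, 0xFFFFE1FFFFFFFFFF),
    ("kernel text", 0xFFFFFFFF80000000, 0xFFFFFFFF827FFFFF),
    ("modules", 0xFFFFFFFF88000000, 0xFFFFFFFFFFEFFFFF),
]


def is_canonical(va: int) -> bool:
    """Bits 63..47 must all equal bit 47 (sign extension of a 48-bit address)."""
    top = va >> 47
    return top == 0 or top == 0x1FFFF


def classify(va: int) -> str:
    if not 0 <= va < 1 << 64:
        raise ValueError("not a 64-bit address")
    if not is_canonical(va):
        return "non-canonical (sign-extension hole)"
    for name, first, last in LAYOUT:
        if first <= va <= last:
            return name
    return "unused hole"


for va in (0x7FFFFFFFF000, 0x800000000000, 0xFFFF810000000000, 0xFFFFFFFF80100000):
    print(f"{va:#018x}: {classify(va)}")
```

Checking the region sizes against the table is a useful sanity test: the guard hole is 2**40 bytes, the direct mapping 2**46 and the vmalloc area 2**45.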
vmalloc space is lazily synchronized into the different PML4 pages of
the processes using the page fault handler, with init_level4_pgt as
reference.
Current x86-64 implementations only support 40 bits of physical address space,
but we support up to 46 bits. This expands into MBZ (must-be-zero) space in the page tables.
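The lazy vmalloc synchronization in the layout notes can be pictured with a toy model. This is not kernel code: apart from init_level4_pgt, which the text names, every class and method here is hypothetical, and a "table" is just a label. The reference PML4 is updated eagerly, while a process's own PML4 copy picks up a new kernel entry only when an access to that slot "faults".

```python
# Toy model of lazy vmalloc synchronization: the reference PML4
# (init_level4_pgt) is updated eagerly; per-process PML4 copies pick up
# new kernel entries only when a "fault" on that slot occurs.
def pml4_index(va: int) -> int:
    return (va >> 39) & 0x1FF


class ToyKernel:
    def __init__(self):
        self.init_level4_pgt = {}  # PML4 slot -> lower-level table (a label here)

    def vmalloc_map(self, va: int, table: str) -> None:
        # Only a brand-new slot changes the reference PML4; an existing
        # slot keeps its lower-level table.
        self.init_level4_pgt.setdefault(pml4_index(va), table)


class ToyProcess:
    def __init__(self, kernel: ToyKernel):
        self.kernel = kernel
        self.pml4 = dict(kernel.init_level4_pgt)  # copy taken at creation
        self.syncs = 0

    def access(self, va: int) -> str:
        idx = pml4_index(va)
        if idx not in self.pml4:
            # "Page fault": copy the missing entry from the reference table.
            ref = self.kernel.init_level4_pgt.get(idx)
            if ref is None:
                raise LookupError(f"no mapping for {va:#x}")  # a real fault
            self.pml4[idx] = ref
            self.syncs += 1
        return self.pml4[idx]


kernel = ToyKernel()
proc = ToyProcess(kernel)                 # created before the vmalloc below
kernel.vmalloc_map(0xFFFFC20000000000, "vmalloc-table")
print(proc.access(0xFFFFC20000000000), proc.syncs)  # synced on first touch
print(proc.access(0xFFFFC20000000000), proc.syncs)  # no further sync
```

The design point the model illustrates is that vmalloc never has to visit every process's PML4; the cost is one extra fault per process per newly used kernel slot.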