Memory: the flat, the discontiguous, and the sparse
The physical memory in a computer system is a precious resource, so a lot of effort has been put into managing it effectively. This task is made more difficult by the complexity of the memory architecture on contemporary systems. There are several layers of abstraction that deal with the details of how physical memory is laid out; one of those is simply called the "memory model". There are three models supported in the kernel, but one of them is on its way out. As a way of understanding this change, this article will take a closer look at the evolution of the kernel's memory models, their current state, and their possible future.
FLATMEM
Back in the beginning of Linux, memory was flat: it was a simple linear sequence with physical addresses starting at zero and ending at several megabytes. Each physical page frame had an entry in the kernel's mem_map array which, at that time, contained a single unsigned short entry for each page that counted the number of references that page had. Soon enough, though, the mem_map entries grew to also include age and dirty counters for the management of swapping. In Linux 1.3.50 the elements of mem_map were finally named struct page.
The flat memory map provided easy and fast conversion between a physical page-frame number (PFN) and its corresponding struct page; it was a simple matter of calculating an array index. There were changes in the layout of struct page to accommodate new usages (the page cache, for example) and to optimize cache performance for the struct page accesses. The memory map remained a flat array that was neat and efficient, but it had a major drawback: it couldn't deal well with large holes in the physical address space. Either the part of the memory map corresponding to a hole would be wasted or, as was done on ARM, the memory map would also have holes.
DISCONTIGMEM
Support for the efficient handling of widely discontiguous physical memory was introduced into the memory-management subsystem in 1999 as a part of the effort to make Linux work well on NUMA machines. This code was dependent on the CONFIG_DISCONTIGMEM configuration option, so the first memory model that had a name was DISCONTIGMEM.
The DISCONTIGMEM model introduced the notion of a memory node, which remains the basis of NUMA memory management. Each node carries an independent (well, mostly) memory-management subsystem with its own free-page lists, in-use page lists, least-recently-used (LRU) information, and usage statistics. Among all these goodies, the node data represented by struct pglist_data (or pg_data_t for short) contains a node-specific memory map. Assuming that each node has contiguous physical memory, having an array of page structures per node solves the problem of large holes in the flat memory map.
But this doesn't come for free. With DISCONTIGMEM, it's necessary to determine which node holds a given page in memory to turn its PFN into the associated struct page, for example. Similarly, one must determine which node's memory map contains a struct page to calculate its PFN. After a long evolution, starting with the mips64 architecture defining the KVADDR_TO_NID(), LOCAL_MAP_BASE(), ADDR_TO_MAPBASE(), and LOCAL_BASE_ADDR() macros for the first time, the conversion of a PFN to the struct page and vice versa came to rely on the relatively simple pfn_to_page() and page_to_pfn() macros defined in include/asm-generic/memory_model.h.
DISCONTIGMEM, however, had a weak point: memory hotplug and hot remove. The actual NUMA node granularity was too coarse for proper hotplug support, and splitting the node would have created a lot of unnecessary fragmentation and overhead. Remember that each node has an independent memory management structure with all the associated costs; splitting nodes further would increase those costs considerably.
SPARSEMEM
This limitation was resolved with the introduction of
SPARSEMEM. This model abstracted the memory map as a collection of
sections of arbitrary size defined by the architectures. Each section,
represented by struct
mem_section, is (as described in the code): "logically, a
pointer to an array of struct pages. However, it is stored with some other
magic
". The array of these sections is a meta-memory map which can
be efficiently chopped at SECTION_SIZE granularity. For efficient
conversion between a PFN and struct page, several high bits of the
PFN are used to index into the sections array. For the other direction, the
section number was encoded in the page flags.
A few months after its introduction into the Linux kernel, SPARSEMEM was extended with SPARSEMEM_EXTREME, which is suitable for systems with an especially sparse physical address space. SPARSEMEM_EXTREME added a second dimension to the sections array and made this array, well, sparse. With SPARSEMEM_EXTREME, the first level became pointers to mem_section structures, and the actual mem_section objects were dynamically allocated based on the actually populated physical memory.
Another enhancement to SPARSEMEM was added in 2007; it was called generic virtual memmap support for SPARSEMEM, or SPARSEMEM_VMEMMAP. The idea behind SPARSEMEM_VMEMMAP is that the entire memory map is mapped into a virtually contiguous area, but only the active sections are backed with physical pages. This model wouldn't work well with 32-bit systems, where the physical memory size might approach or even exceed the virtual address space. However, for 64-bit systems SPARSEMEM_VMEMMAP is a clear win. At the cost of additional page table entries, page_to_pfn(), and pfn_to_page() became as simple as with the flat model.
The last extension of the SPARSEMEM memory model is more recent (2016); it was driven by the increased use of persistent-memory devices. To support storing the memory map directly on those devices rather than in main memory, the virtual memory map can use struct vmem_altmap, which will provide page structures in persistent memory.
Back in 2005, SPARSEMEM was described as a "newer, and
more experimental alternative to 'discontiguous memory'
". The
commit that added SPARSEMEM_VMEMMAP noted that it "has the potential
to allow us to make SPARSEMEM the default (and even the only) option for
most systems
". And indeed, several architectures have switched over
from DISCONTIGMEM to SPARSEMEM. In 2008,
SPARSEMEM_VMEMMAP
became the only supported memory model for x86-64, as it was only
slightly more expensive than FLATMEM but more efficient than
DISCONTIGMEM.
Recent memory-management developments, such as memory hotplug,
persistent-memory support, and various performance optimizations, all target
the SPARSEMEM model. But the older models still exist, which comes with the
cost of numerous #ifdef blocks in the architecture and
memory-management code, and a peculiar maze of related configuration
options. There is an ongoing work to completely switch the remaining users
of DISCONTIGMEM to SPARSEMEM, but making the change for
such architectures as ia64 and mips64 won't be an easy task. And the ARC
architecture's
use of DISCONTIGMEM to represent a "high memory" area that
resides below the "normal" memory will definitely be challenging to change.
| Index entries for this article | |
|---|---|
| Kernel | Memory management/Physical memory model |
| GuestArticles | Rapoport, Mike |
Posted May 28, 2019 6:18 UTC (Tue)
by jpfrancois (subscriber, #65948)
[Link]
Posted May 28, 2019 7:48 UTC (Tue)
by bgoglin (subscriber, #7800)
[Link] (1 responses)
Does anybody really still use ia64? except maybe HP-UX servers?
Posted May 28, 2019 12:22 UTC (Tue)
by ncm (guest, #165)
[Link]
One can imagine HP-UX users migrating to Itanic emulations running Linux with an HP-UX compatibility layer, hosted on Linux amd64.
It is the norm, nowadays, for enterprise software to be running on several layers of emulation, yet still running faster than on the original hardware. Emulators tend to be, themselves, the least portable of code, so they are, ironically, the first candidates to be emulated instead of ported.
There are only two outcomes from any migration or upgrade of legacy code: it still works, or it doesn't work. So, people responsible for legacy systems become risk-averse. Emulation is less risky than porting, even with a wobbly tower of emulations. Emulating big-endian platforms on little-endian hardware is surprisingly practical.
Posted May 28, 2019 15:25 UTC (Tue)
by scientes (guest, #83068)
[Link]
Posted May 29, 2019 16:38 UTC (Wed)
by pj (subscriber, #4506)
[Link] (1 responses)
Posted May 30, 2019 5:02 UTC (Thu)
by rppt (subscriber, #125478)
[Link]
Posted May 21, 2021 13:43 UTC (Fri)
by pc (guest, #152335)
[Link]
A section is represented with struct mem_section that contains `section_mem_map` that is, logically, a pointer to an array of struct pages.
Memory: the flat, the discontiguous, and the sparse
Memory: the flat, the discontiguous, and the sparse
Memory: the flat, the discontiguous, and the sparse
Memory: the flat, the discontiguous, and the sparse
Memory: the flat, the discontiguous, and the sparse
...in what year? The year of introduction is notes for the other subheaded methods, bit not for SPARSEMEM.
Memory: the flat, the discontiguous, and the sparse
Memory: the flat, the discontiguous, and the sparse
