Clarifying memory management with page folios
At the lowest level, pages are a concept implemented by the hardware; the tracking of memory and whether it is present in RAM or not is done at page granularity. Any given CPU architecture may offer a limited selection of page sizes, but one "base" page size must be chosen, and the most common choice remains 4,096 bytes — the same as it was when the first Linux kernels were released 30 years ago.
The kernel, though, often has reason to work with memory in larger chunks. One example is the management of "huge pages" which, once again, are implemented by the hardware. The x86 architecture, for example, can work with 2MB huge pages, and there are performance advantages to using them where they are applicable. The kernel will also allocate groups of pages in other sizes, though, typically for DMA buffers or other uses where a set of physically contiguous pages is needed. This sort of grouping of pages is known as a "compound page" in the kernel.
Every base page of memory managed by the kernel is represented by a page structure in the system memory map. When a compound page is created out of a set of base pages, the page structure for the first page in the set (the "head page") is specially marked to make its compound nature explicit. The other information in that structure refers to the compound page as a whole. All of the other pages (the "tail pages") are marked as such, with a pointer to the page structure for the associated head page. See this article for details on how compound pages are organized.
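The head/tail marking described above can be sketched in user space. This is a toy model, not the kernel's struct page: the names `toy_page`, `toy_make_compound()` and `toy_compound_head()` are invented for illustration, and the real structure encodes the same information quite differently.

```c
/* Userspace toy model of compound pages; NOT the kernel's struct page. */
#include <assert.h>
#include <stddef.h>

enum toy_kind { TOY_BASE, TOY_HEAD, TOY_TAIL };

struct toy_page {
    enum toy_kind kind;     /* base page, head page, or tail page */
    unsigned int order;     /* head pages only: spans (1 << order) base pages */
    struct toy_page *head;  /* tail pages only: points back at the head page */
};

/* Mark the (1 << order) consecutive entries starting at first as one compound page. */
static void toy_make_compound(struct toy_page *first, unsigned int order)
{
    size_t n = (size_t)1 << order;

    first->kind = TOY_HEAD;
    first->order = order;
    first->head = NULL;
    for (size_t i = 1; i < n; i++) {
        first[i].kind = TOY_TAIL;
        first[i].order = 0;
        first[i].head = first;
    }
}

/* Map any page to the page that describes the whole unit it belongs to. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
    return page->kind == TOY_TAIL ? page->head : page;
}
```

With an eight-entry array standing in for the memory map, calling `toy_make_compound(&map[4], 2)` turns entries 4 through 7 into one compound page; the lookup helper then returns `&map[4]` for any of them, and returns a base page unchanged.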
This mechanism makes it easy to go from the page structure of a tail page to the head page for the compound page. Many interfaces within the kernel make use of that feature, but it creates a fundamental ambiguity: if a function is passed a pointer to a page structure for a tail page, is it expected to act on that tail page or on the compound page as a whole? Or, as Wilcox put it in the first posting of the folio series in December:
A function which has a struct page argument might be expecting a head or base page and will BUG if given a tail page. It might work with any kind of page and operate on PAGE_SIZE bytes. It might work with any kind of page and operate on page_size() bytes if given a head page but PAGE_SIZE bytes if given a base or tail page. It might operate on page_size() bytes if passed a head or tail page. We have examples of all of these today.
(PAGE_SIZE is the size of a base page, while page_size() returns the full size of a — possibly compound — page.) There does not seem to be an extensive history of bugs resulting from this particular API, but an interface that is this poorly defined seems likely to encourage problems sooner or later.
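To make the ambiguity concrete, here is a small user-space sketch of three of the readings Wilcox lists. Everything in it is hypothetical: `TOY_PAGE_SIZE` stands in for PAGE_SIZE, `toy_page_size()` imitates page_size() only as far as the quote describes it, and the three `bytes_*` functions are invented to show how the same pointer can imply different amounts of memory.

```c
/* Toy illustration of the struct page ambiguity; not kernel code. */
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL  /* stand-in for PAGE_SIZE */

struct toy_page {
    int is_tail;              /* nonzero for tail pages */
    unsigned int order;       /* head pages: log2 of base pages spanned; else 0 */
    struct toy_page *head;    /* tail pages: pointer to the head page */
};

/* Stand-in for page_size(): full size for a head page, base size otherwise. */
static unsigned long toy_page_size(const struct toy_page *page)
{
    return TOY_PAGE_SIZE << (page->is_tail ? 0 : page->order);
}

/* Reading 1: any kind of page, always operate on one base page. */
static unsigned long bytes_base_only(const struct toy_page *page)
{
    (void)page;
    return TOY_PAGE_SIZE;
}

/* Reading 2: operate on whatever size the pointer passed in reports. */
static unsigned long bytes_as_given(const struct toy_page *page)
{
    return toy_page_size(page);
}

/* Reading 3: resolve a tail page to its head, then operate on the whole unit. */
static unsigned long bytes_whole(const struct toy_page *page)
{
    const struct toy_page *head = page->is_tail ? page->head : page;

    return toy_page_size(head);
}
```

Given a tail page of a 2MB compound page (order 9 with 4,096-byte base pages), readings 1 and 2 touch 4,096 bytes while reading 3 touches 2,097,152; nothing in the `struct page *` argument type tells the caller which one a given function implements.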
In an attempt to clarify the situation, Wilcox has come up with the concept of a "page folio", which is really just a page structure that is guaranteed not to be a tail page. Any function accepting a folio will operate on the full compound page (if, indeed, it is a compound page) with no ambiguity. The result is greater clarity in the kernel's memory-management subsystem; as functions are converted to take folios as arguments, it will become clear that they are not meant to operate on tail pages.
When Wilcox first posted this patch series, though, he emphasized a different benefit from the change. Any function that might be passed a tail page, but which must operate on the full compound page containing that tail page, must exchange any pointers to tail-page page structures for pointers to the head page instead. That is typically done with a call to:
struct page *compound_head(struct page *page);
This function is relatively cheap, but it may be called many times over the course of a single operation on a page. That makes the kernel bigger (since it's an inline function) and slows things down. A function that accepts a folio, instead, knows that it is not dealing with a tail page; thus it need not call compound_head(). That saves both time and memory.
The folio type itself is defined as a simple wrapper structure:
struct folio { struct page page; };
From there, a new set of infrastructure is built up. For example, get_folio() and put_folio() will manage references to the folio much like get_page() and put_page(), but without the unneeded calls to compound_head(). A whole set of higher-level functions follows from there. Much of the real work, though, will be in converting various kernel subsystems to use the new type; Wilcox didn't sugarcoat the nature of that task:
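The saving can be sketched in user space as follows. This is a toy model under stated assumptions: the `toy_` types and functions are invented here, the reference count is a plain integer rather than the kernel's atomic counter, and `toy_page_folio()` is a hypothetical conversion helper rather than a function taken from the patch set.

```c
/* Toy model of the folio wrapper; not the kernel implementation. */
#include <assert.h>
#include <stddef.h>

struct toy_page {
    int is_tail;              /* nonzero for tail pages */
    int refcount;             /* meaningful on head and base pages only */
    struct toy_page *head;    /* tail pages: pointer to the head page */
};

/* Same shape as the wrapper in the article: a page that is never a tail. */
struct toy_folio { struct toy_page page; };

static struct toy_page *toy_compound_head(struct toy_page *page)
{
    return page->is_tail ? page->head : page;
}

/* Page-based helper: has to resolve the head on every call. */
static void toy_get_page(struct toy_page *page)
{
    toy_compound_head(page)->refcount++;
}

/*
 * Hypothetical conversion point: resolve the head once, then hand out a folio.
 * The cast relies on the head page being the first member of a toy_folio.
 */
static struct toy_folio *toy_page_folio(struct toy_page *page)
{
    return (struct toy_folio *)toy_compound_head(page);
}

/* Folio-based helpers: the type already rules out tail pages, so no lookup. */
static void toy_get_folio(struct toy_folio *folio)
{
    folio->page.refcount++;
}

static void toy_put_folio(struct toy_folio *folio)
{
    assert(folio->page.refcount > 0);
    folio->page.refcount--;
}
```

A caller that starts from a tail page pays for the head lookup once, in `toy_page_folio()`, and every later folio call skips it. The wrapper adds no storage; it only moves the "never a tail page" guarantee into the type.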
This is going to be a ton of work, and massively disruptive. It'll touch every filesystem, and a good few device drivers! But I think it's worth it.
By the time the fourth version of this patch set was posted on March 5, the core patches and the conversions (which Wilcox didn't post) added up to about 100 commits, which is a fair amount to review.
Perhaps as a result of the size of the patch series, the previous postings did not elicit that much discussion. In response to the latest one, though, Andrew Morton took a look and was worried by what he saw:
Geeze it's a lot of noise. More things to remember and we'll forever have a mismash of `page' and `folio' and code everywhere converting from one to the other. Ongoing addition of folio accessors/manipulators to overlay the existing page accessors/manipulators, etc. It's unclear to me that it's all really worth it.
Hugh Dickins, too, expressed a lack of enthusiasm for this work. On the other hand, Kirill Shutemov and Michal Hocko both expressed support for it, in concept at least. Dave Chinner said that "this abstraction is absolutely necessary" for filesystem developers, especially if and when the page cache gains the ability to manage compound pages of multiple sizes.
So, in other words, there is currently no consensus among the core developers regarding whether this work improves the kernel or not. That may change over time as more people look at it and its advantages (or the lack thereof) become more clear. But change tends to happen slowly in the memory-management subsystem in general, even when the patch set in question is not so large and messy. One should also bear in mind that there is an inevitable discussion on naming to be had; it is already clear that "folio" is not popular, though alternatives are currently thin on the ground. One conclusion is thus clear: the kernel may well get folios or something like them, but it seems unlikely to happen soon.
Index entries for this article | |
---|---|
Kernel | Memory management/Folios |
Posted Mar 18, 2021 17:24 UTC (Thu)
by logang (subscriber, #127618)
[Link] (7 responses)
In any case, if it gets in as named, it's only a matter of time before we can start describing compound pages as foliolate (having compound leaves) and someone is sure to come up with a case for a 'struct portfolio'. ;-)
Posted Mar 18, 2021 20:14 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link] (6 responses)
Posted Mar 19, 2021 3:17 UTC (Fri)
by willy (subscriber, #9762)
[Link] (5 responses)
https://lore.kernel.org/linux-fsdevel/20201113174409.GH17...
Criteria: Must be easily greppable (book is bad), must be short, shouldn't be too cutesy (banqyet by analogy with byte was not under consideration).
Online thesauri are your friends, but at the end of the day it's always a matter of taste.
Posted Mar 19, 2021 4:38 UTC (Fri)
by jonas.bonn (subscriber, #47561)
[Link]
Normally, pages are created by folding a 'sheet'... so there you go!
https://en.wikipedia.org/wiki/Paper_size#/media/File:A_si...
Posted Mar 19, 2021 9:32 UTC (Fri)
by geert (subscriber, #98403)
[Link] (3 responses)
BTW, "aigle" is not known by "dict", nor by my paper dictionary.
Posted Mar 19, 2021 11:10 UTC (Fri)
by willy (subscriber, #9762)
[Link] (2 responses)
https://en.wikipedia.org/wiki/Units_of_paper_quantity is also a good source of names.
Honestly, I'm 120 patches in at this point. Someone's going to have to be really convincing to have a better name than folio.
Posted Mar 19, 2021 11:17 UTC (Fri)
by geert (subscriber, #98403)
[Link]
Posted Apr 2, 2021 11:13 UTC (Fri)
by Hi-Angel (guest, #110915)
[Link]
A little trick: doing a rename over all of the 120 patches might be done in just under a minute ;) What I'd do here is:
```
git format-patch -120 --stdout > 1.patch
sp folio my_better_name
git am -3 1.patch
```
Read `sp` as `sed`.
For the sake of completeness: sp is my alias to sed_perl, which in turn is a wrapper over perl to replace text in files https://github.com/Hi-Angel/dotfiles/blob/140c78951502754... I was at some point annoyed by discrepancies in behavior between grep, sed, awk, and what not, and migrated to using perl + ack (a perl version of grep). Never looked back.
So… hopefully this will help.
Posted Mar 18, 2021 21:47 UTC (Thu)
by unixbhaskar (guest, #44758)
[Link]
"One should also bear in mind that there is an inevitable discussion on naming to be had; it is already clear that "folio" is not popular, though alternatives are currently thin on the ground. One conclusion is thus clear: the kernel may well get folios or something like them, but it seems unlikely to happen soon."
Matthew and Jon, how about a simple name (well, the kernel is a bloody complex thing, but that doesn't mean it has to have a complex or artistic name, does it?) like "page_access"? (I am sure that I missed certain things; a plethora of kernel API/ABI should have been checked before preaching ... :) which I haven't done.) But ......
Stop fretting at my naivety .... :) ..please...
Posted Mar 19, 2021 3:21 UTC (Fri)
by guillemj (subscriber, #49706)
[Link]
Posted Mar 19, 2021 4:10 UTC (Fri)
by willy (subscriber, #9762)
[Link] (3 responses)
https://git.infradead.org/users/willy/pagecache.git/short...
I'll do the changelog / cover letter / ... in the morning.
BTW, I do want to emphasize that real workloads see a performance improvement. With the previous work, based on using Transparent Huge Pages, we saw a 7% performance improvement on kernel compiles, and that was with a very naive untuned algorithm for scaling up the THP size.
Posted Mar 19, 2021 9:12 UTC (Fri)
by wahern (subscriber, #37304)
[Link] (1 responses)
Posted Mar 19, 2021 11:29 UTC (Fri)
by willy (subscriber, #9762)
[Link]
If you have a folio and want the n'th page, that's nth_page(&folio->page, n). Nobody's needed that one yet (and only people with really weird physical memory layouts need to do that ... alloc_folio() won't return a folio that you need to do that to). Others are working on maybe disallowing those from existing entirely, in which case (&folio->page + n) will do fine.
The performance improvements do not come from a small subset of the changes. You have to make the entire filesystem safe to handle memory in folios (no more references to, eg, PAGE_SIZE, unless you can prove they're safe; calls to kmap() have to be scrutinised; copy_(to|from)_iter() calls need care and attention, etc, etc). Once the filesystem declares itself safe by setting a bit in the fs_flags then the page cache can start handing it folios instead of pages.
I think what you're suggesting is essentially what I did here:
https://git.infradead.org/users/willy/pagecache.git/short...
I've given up on that approach because it's hard to find all the bugs. "Oh this interface takes a struct page. Does it take any struct page, or do I need to call it once for each tail page in the compound page?" I invite you to consider the various implementations of flush_dcache_page() ... and if you can figure out the answer, please let me know.
Posted Mar 25, 2021 23:00 UTC (Thu)
by flussence (guest, #85566)
[Link]
I vaguely remember getting excited over the original THP patchset because I'd measured a consistent 3-4% improvement in memory-heavy workloads…
Posted Mar 19, 2021 14:08 UTC (Fri)
by clugstj (subscriber, #4020)
[Link] (8 responses)
Posted Mar 19, 2021 16:57 UTC (Fri)
by willy (subscriber, #9762)
[Link] (5 responses)
Posted Mar 19, 2021 17:01 UTC (Fri)
by clugstj (subscriber, #4020)
[Link] (4 responses)
Posted Mar 19, 2021 17:02 UTC (Fri)
by willy (subscriber, #9762)
[Link] (3 responses)
Posted Mar 19, 2021 17:16 UTC (Fri)
by clugstj (subscriber, #4020)
[Link] (2 responses)
Posted Mar 22, 2021 23:48 UTC (Mon)
by milesrout (subscriber, #126894)
[Link] (1 responses)
Posted Sep 16, 2021 8:10 UTC (Thu)
by ncm (guest, #165)
[Link]
How long that name needs to be depends on the scope of the names.
In C, lacking any mechanism for namespacing, a practical name *often* must be unpleasantly long. That is a fault of the language, not (usually) of the person choosing the name; although some people confuse names with specifications, and so invent stupidly long names.
"Compound_page" is not, in any universe, stupidly long for a C struct tag.
Posted Mar 21, 2021 20:38 UTC (Sun)
by kiryl (subscriber, #41516)
[Link] (1 responses)
Posted Jun 15, 2021 12:23 UTC (Tue)
by DavideRepetto (guest, #152795)
[Link]
Posted Aug 3, 2021 8:40 UTC (Tue)
by taladar (subscriber, #68407)
[Link] (4 responses)
Posted Sep 12, 2021 9:37 UTC (Sun)
by deepfire (guest, #26138)
[Link] (1 responses)
...and it's surprising how much resistance this encounters, given the improvements.
Posted Mar 26, 2025 15:42 UTC (Wed)
by fest3er (guest, #60379)
[Link]
'Bucket-o-bytes', which shortens to 'bob'.
And Bob's yer uncle.
OK. Maybe not.
Posted Oct 16, 2021 9:05 UTC (Sat)
by hidave (guest, #18406)
[Link] (1 responses)
Posted Oct 16, 2021 10:24 UTC (Sat)
by mpr22 (subscriber, #60784)
[Link]
If anything comes to mind at all, it's most likely to be the Spanish form of the Hebrew name יוֹחָנָן (Yôḥānān), equivalent to English John, German Johann, Russian Иван, French Jean, etc.
https://en.wikipedia.org/wiki/Juan