Measuring memory fragmentation
Transparent huge pages, Luis Chamberlain said, never reached wide adoption, partly as a result of fragmentation fears. But now the kernel supports large folios, and transparent huge pages are "the stone age". Large folios are being used in a number of places, and multi-size transparent huge pages (mTHPs) are on the rise as well — and "the world hasn't ended". Still, worries abound, so he wondered how the fragmentation problem could actually be measured.
The discussion immediately wandered. David Hildenbrand said that there are
people who have been looking into allocation failures and running into the
fragmentation problem. SeongJae Park pointed out that, long ago, Mel
Gorman had proposed
a fragmentation index that was since merged as a debugfs feature, and
that some of Gorman's team are using it. Michal Hocko said that it is a
question of proactive or reactive responses; at what level should people
care about fragmentation? Hildenbrand said that, currently, most
allocations will fall back to a base page if larger pages are not
available; in the future, if users need the larger allocations, that
fallback will no longer be an option. There will be a need to measure the
availability of specific allocation sizes to understand the fragmentation
problem, he said.
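Both measurements exist in some form today: Gorman's index is exposed, on kernels with debugfs enabled, as /sys/kernel/debug/extfrag/extfrag_index, and the per-order availability Hildenbrand mentioned can be read from /proc/buddyinfo, which lists the number of free blocks at each order in every zone. As a rough, illustrative sketch (the formula is modeled on the one in mm/vmstat.c; treat it as a reading aid, not a tuned tool):

    #!/usr/bin/env python3
    # Approximate Mel Gorman's fragmentation index from /proc/buddyinfo.
    # A value near 1 means an allocation of the given order would fail
    # because of fragmentation; near 0, because memory is exhausted;
    # -1 means a suitable free block exists, so the index does not apply.

    def fragmentation_index(counts, order):
        # counts[i] is the number of free blocks of order i in the zone
        free_pages = sum(n << i for i, n in enumerate(counts))
        free_blocks_total = sum(counts)
        if free_blocks_total == 0:
            return 0.0
        if sum(counts[order:]) > 0:
            return -1.0
        return 1.0 - (1 + free_pages / (1 << order)) / free_blocks_total

    with open("/proc/buddyinfo") as f:
        for line in f:
            fields = line.split()
            node, zone = fields[1].rstrip(","), fields[3]
            counts = [int(n) for n in fields[4:]]
            print(f"node {node}, zone {zone:>8}:",
                  " ".join(f"{fragmentation_index(counts, o):6.3f}"
                           for o in range(len(counts))))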
In response to a question from Hocko on the objective for this measurement, Chamberlain said that he wanted to know whether the introduction of large block sizes was making fragmentation worse; and, if the fragmentation problem is eventually solved, how would that be measured? Hocko suggested relying on the pressure-stall information provided by the kernel, which measures the amount of work that is needed to successfully allocate memory. But he conceded that it is "a ballpark measure" of the problem.
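The pressure-stall information lives under /proc/pressure on kernels built with CONFIG_PSI. A minimal sampler, assuming the field layout described in Documentation/accounting/psi.rst, might look like:

    #!/usr/bin/env python3
    # Sample the memory pressure-stall information Hocko referred to.
    # "some" is the share of wall-clock time in which at least one
    # runnable task was stalled on memory; "full" is the share in which
    # all non-idle tasks were stalled at once.

    def read_psi(path="/proc/pressure/memory"):
        psi = {}
        with open(path) as f:
            for line in f:
                kind, *fields = line.split()
                psi[kind] = dict(field.split("=") for field in fields)
        return psi

    psi = read_psi()
    print(f"some avg10: {psi['some']['avg10']}%  "
          f"full avg10: {psi['full']['avg10']}%")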
Yu Zhao said that kernel developers cannot improve what they cannot measure; Paul McKenney answered that they can always improve things accidentally. That led Zhao to rephrase his point: fragmentation, he said, is a form of entropy, which is typically measured by temperature. But fragmentation is a two-dimensional problem that cannot be described by a single number. Any proper description of fragmentation, he said, will need to be multidimensional. Jan Kara said that a useful measurement would be the amount of effort that is going into memory compaction, but Zhao repeated that a single number will never suffice.
John Hubbard disagreed, saying that it should be possible to come up with a single number quantifying fragmentation; Zhao asked how that number would be interpreted. Hocko said that there is an important detail that would be lost in a single-number measurement: the view of fragmentation depends on a specific allocation request. Movable allocations are different from GFP_KERNEL allocations, for example. He said that, in any case, a precise number is not needed; he repeated that the pressure-stall information shows how much nonproductive time is being put into memory allocations, and thus provides a good measurement of how well things are going.
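The request-dependent view that Hocko described is visible in /proc/pagetypeinfo (readable only by root on current kernels), which splits the buddy allocator's free lists by migration type; a movable allocation may have plenty of high-order blocks available while unmovable, GFP_KERNEL-style allocations have none. A small, illustrative parse:

    #!/usr/bin/env python3
    # Show, for each zone and migration type, the largest order at
    # which a free block is currently available, per /proc/pagetypeinfo.

    with open("/proc/pagetypeinfo") as f:
        for line in f:
            if line.startswith("Node") and "type" in line:
                fields = line.replace(",", " ").split()
                # Node <n> zone <name> type <migratetype> <count/order>...
                zone, mtype = fields[3], fields[5]
                counts = [int(n) for n in fields[6:]]
                largest = max((o for o, n in enumerate(counts) if n > 0),
                              default=None)
                print(f"zone {zone:>8}, {mtype:>12}: "
                      f"largest free order {largest}")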
As the session wound down, Chamberlain tried to summarize the results,
which he described as being "not a strong argument" for any given
measure. Zhao raised a specific scenario: an Android system running three
apps, one in the foreground and two in the background. There is a single
number describing fragmentation, and allocations are failing; what should
be done? Possible responses include memory compaction, reclaim, or
summoning the out-of-memory (OOM) killer; how will this number help to make
this decision? Chamberlain said that he is focused on the measurement, not
the reactions at this point. Zhao went on for a while about how
multidimensional measurements are needed to address this problem before
Hocko said that the topic could be discussed forever without making much
progress; the session then came to a close.
Index entries for this article:
    Kernel: Memory management/Huge pages
    Conference: Storage, Filesystem, Memory-Management and BPF Summit/2024
Posted Jun 4, 2024 19:59 UTC (Tue) by willy (subscriber, #9762):
It might help a little if we got rid of the LRU list and scanned memory in physical order. But that's a supposition that would need data.
What definitely does help is increasing the number of allocations that use higher orders, because then you can find something to reclaim that will satisfy your need.
Linked lists are evil and must die, but that is not relevant to this discussion.
Posted Jun 4, 2024 23:56 UTC (Tue) by willy (subscriber, #9762):
It's probably slab pages and page tables getting intermingled with page cache / anon allocations. But that's just my opinion.
Worth noting, perhaps, that the Oppo measurements were being done with large anon folios but small filesystem folios, because they were using one of the filesystems that haven't yet been converted to large folios.
But what do I know..
Posted Jun 18, 2024 21:20 UTC (Tue) by yuzhao@google.com (guest, #132005):
In the context of this discussion, a single number can't work, period.