Measuring memory fragmentation
Transparent huge pages, Luis Chamberlain said, never reached wide adoption, partly as a result of fragmentation fears. But now the kernel supports large folios, and transparent huge pages are "the stone age". Large folios are being used in a number of places, and multi-size transparent huge pages (mTHPs) are on the rise as well — and "the world hasn't ended". Still, worries abound, so he wondered how the fragmentation problem could actually be measured.
The discussion immediately wandered. David Hildenbrand said that there are
people who have been looking into allocation failures and running into the
fragmentation problem. SeongJae Park pointed out that, long ago, Mel
Gorman had proposed
a fragmentation index that was since merged as a debugfs feature, and
that some of Gorman's team are using it. Michal Hocko said that it is a
question of proactive or reactive responses; at what level should people
care about fragmentation? Hildenbrand said that, currently, most
allocations will fall back to a base page if larger pages are not
available; in the future, if users need the larger allocations, that
fallback will no longer be an option. There will be a need to measure the
availability of specific allocation sizes to understand the fragmentation
problem, he said.
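Both measurements exist in some form today: Gorman's index is exposed, on kernels with debugfs enabled, as /sys/kernel/debug/extfrag/extfrag_index, and the per-order availability Hildenbrand mentioned can be read from /proc/buddyinfo, which lists the number of free blocks at each order in every zone. As a rough, illustrative sketch (the formula is modeled on the one in mm/vmstat.c; treat it as a reading aid, not a tuned tool):

    #!/usr/bin/env python3
    # Approximate Mel Gorman's fragmentation index from /proc/buddyinfo.
    # A value near 1 means an allocation of the given order would fail
    # because of fragmentation; near 0, because memory is exhausted;
    # -1 means a suitable free block exists, so the index does not apply.

    def fragmentation_index(counts, order):
        # counts[i] is the number of free blocks of order i in the zone
        free_pages = sum(n << i for i, n in enumerate(counts))
        free_blocks_total = sum(counts)
        if free_blocks_total == 0:
            return 0.0
        if sum(counts[order:]) > 0:
            return -1.0
        return 1.0 - (1 + free_pages / (1 << order)) / free_blocks_total

    with open("/proc/buddyinfo") as f:
        for line in f:
            fields = line.split()
            node, zone = fields[1].rstrip(","), fields[3]
            counts = [int(n) for n in fields[4:]]
            print(f"node {node}, zone {zone:>8}:",
                  " ".join(f"{fragmentation_index(counts, o):6.3f}"
                           for o in range(len(counts))))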
In response to a question from Hocko on the objective for this measurement, Chamberlain said that he wanted to know whether the introduction of large block sizes was making fragmentation worse; and, if the fragmentation problem is eventually solved, how would that be measured? Hocko suggested relying on the pressure-stall information provided by the kernel, which measures the amount of work that is needed to successfully allocate memory. But he conceded that it is "a ballpark measure" of the problem.
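The pressure-stall information lives under /proc/pressure on kernels built with CONFIG_PSI. A minimal sampler, assuming the field layout described in Documentation/accounting/psi.rst, might look like:

    #!/usr/bin/env python3
    # Sample the memory pressure-stall information Hocko referred to.
    # "some" is the share of wall-clock time in which at least one
    # runnable task was stalled on memory; "full" is the share in which
    # all non-idle tasks were stalled at once.

    def read_psi(path="/proc/pressure/memory"):
        psi = {}
        with open(path) as f:
            for line in f:
                kind, *fields = line.split()
                psi[kind] = dict(field.split("=") for field in fields)
        return psi

    psi = read_psi()
    print(f"some avg10: {psi['some']['avg10']}%  "
          f"full avg10: {psi['full']['avg10']}%")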
Yu Zhao said that kernel developers cannot improve what they cannot measure; Paul McKenney answered that they can always improve things accidentally. That led Zhao to rephrase his point: fragmentation, he said, is a form of entropy, which is typically measured by temperature. But fragmentation is a two-dimensional problem that cannot be described by a single number. Any proper description of fragmentation, he said, will need to be multidimensional. Jan Kara said that a useful measurement would be the amount of effort that is going into memory compaction, but Zhao repeated that a single number will never suffice.
John Hubbard disagreed, saying that it should be possible to come up with a single number quantifying fragmentation; Zhao asked how that number would be interpreted. Hocko said that there is an important detail that would be lost in a single-number measurement: the view of fragmentation depends on a specific allocation request. Movable allocations are different from GFP_KERNEL allocations, for example. He said that, in any case, a precise number is not needed; he repeated that the pressure-stall information shows how much nonproductive time is being put into memory allocations, and thus provides a good measurement of how well things are going.
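The request-dependent view that Hocko described is visible in /proc/pagetypeinfo (readable only by root on current kernels), which splits the buddy allocator's free lists by migration type; a movable allocation may have plenty of high-order blocks available while unmovable, GFP_KERNEL-style allocations have none. A small, illustrative parse:

    #!/usr/bin/env python3
    # Show, for each zone and migration type, the largest order at
    # which a free block is currently available, per /proc/pagetypeinfo.

    with open("/proc/pagetypeinfo") as f:
        for line in f:
            if line.startswith("Node") and "type" in line:
                fields = line.replace(",", " ").split()
                # Node <n> zone <name> type <migratetype> <count/order>...
                zone, mtype = fields[3], fields[5]
                counts = [int(n) for n in fields[6:]]
                largest = max((o for o, n in enumerate(counts) if n > 0),
                              default=None)
                print(f"zone {zone:>8}, {mtype:>12}: "
                      f"largest free order {largest}")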
As the session wound down, Chamberlain tried to summarize the results,
which he described as being "not a strong argument" for any given
measure. Zhao raised a specific scenario: an Android system running three
apps, one in the foreground and two in the background. There is a single
number describing fragmentation, and allocations are failing; what should
be done? Possible responses include memory compaction, reclaim, or
summoning the out-of-memory (OOM) killer; how will this number help to make
this decision? Chamberlain said that he is focused on the measurement, not
the reactions at this point. Zhao went on for a while about how
multidimensional measurements are needed to address this problem before
Hocko said that the topic could be discussed forever without making much
progress; the session then came to a close.
Index entries for this article:
    Kernel: Memory management/Huge pages
    Conference: Storage, Filesystem, Memory-Management and BPF Summit/2024
Posted Jun 4, 2024 19:59 UTC (Tue) by willy (subscriber, #9762):
It might help a little if we got rid of the LRU list and scanned memory in physical order. But that's a supposition that would need data.
What definitely does help is increasing the number of allocations that use higher orders, because then you can find something to reclaim that will satisfy your need.
Linked lists are evil and must die, but that is not relevant to this discussion.
Posted Jun 4, 2024 23:56 UTC (Tue) by willy (subscriber, #9762):
It's probably slab pages and page tables getting intermingled with page cache / anon allocations. But that's just my opinion.
Worth noting, perhaps, that the Oppo measurements were being done with large anon folios but small filesystem folios, because they were using one of the filesystems that haven't yet been converted to large folios.
But what do I know..
Posted Jun 18, 2024 21:20 UTC (Tue) by yuzhao@google.com (guest, #132005):
In the context of this discussion, a single number can't work, period.