The trouble with MAX_ORDER
Like everything else in the earliest releases of the Linux kernel, memory management was simple; it was mostly concerned with the tracking and management of individual pages. Around 1994, though, as memory sizes increased and the kernel became more complex, it became clear that the ability to deal with pages in larger chunks would be useful. The 1.1.0 release in April 1994 included the beginnings of what was to become the kernel's "buddy allocator", which tracks and allocates pages in blocks sized in powers of two.
With this change came the concept of an "order", which is just the base-two logarithm of the number of pages in a block. An order-0 block is a single page, an order-1 block holds two pages, and so on. Blocks of different orders were kept in separate lists in 1.1.0, which included a macro (NR_MEM_LISTS) describing how many of those lists there were. It was given a value of six initially, meaning that the largest block that could be managed was of order five — 32 pages. The 2.1.23 release (January 1997) saw a special case added for the AP1000 CPU, which needed to be able to allocate 8MB (order 11) chunks of memory; NR_MEM_LISTS was set to 12 if the kernel was being built for that architecture.
In 2.3.27 (November 1999), NR_MEM_LISTS got a new name — MAX_ORDER — and was set to ten for all but the AP1000. Support for AP1000 was eventually removed in 2.3.42 in February 2000, but the practice of adjusting MAX_ORDER for specific architectures has only grown more prevalent. The default value for MAX_ORDER can now vary widely between architectures; it defaults to 11 on most systems, though. As a result, on such systems, pages can be allocated in blocks of order zero through ten.
One interesting aspect of MAX_ORDER, as described so far, is that it is not actually the maximum order; if MAX_ORDER is 11, then the maximum order that can be allocated is ten. In March 2023, Kirill Shutemov decided to address this inconsistency after noticing some bugs in the code resulting from a misunderstanding of MAX_ORDER; the result was this patch set that fixed such bugs in nine different places, including a longstanding bug in the floppy driver.
The work did not end there, though; at the end of this series, Shutemov redefined MAX_ORDER to mean what its name suggests: the maximum order an allocation can be. As a result, MAX_ORDER often defaults to ten now, with the result that the kernel (still) maintains 11 lists of free blocks. At the time, the reception for the change was mostly positive, though Mel Gorman did worry that this change could cause problems for stable backports. David Laight suggested that changing the name of MAX_ORDER would help prevent such problems, but that suggestion was ignored and Shutemov's patch series was merged for the 6.4 release in June.
At the time, it seemed that the problem was solved. At the end of September, though, Paolo Bonzini pointed out three changes that, having been developed in parallel with the MAX_ORDER change, were still using the older meaning. The changes were correct when written, but had been broken by the MAX_ORDER change but nobody had noticed. Nearly two months later, Mike Snitzer sent a pull request containing a fix for one of those bugs, introduced by a change to the dm-crypt target merged for 6.5. It turns out that out-of-tree code, like backports, can be silently broken by this change — a concern that nobody had raised when the MAX_ORDER change was being considered.
That bug caused Linus Torvalds to question whether the change should have been made at all. He suggested that he might revert it in the absence of a compelling reason to keep it around; he later added:
Bah. I looked at the commits leading up to it, and I really think that "if that's the fallout from 24 years of history, then it wasn't a huge deal".Compared to the inevitable problems that the semantic change will cause for backports (and already caused for this merge window), I still think this was all likely a mistake.
He also said that bugs related to MAX_ORDER tend not to be found by testing because it is rare to allocate blocks of the maximum size.
Shutemov answered
that the bugs he had fixed in that series were only those that made it into
the mainline and had never been found; "I personally spend some time
debugging due MAX_ORDER-1 thing
".
The real mistake, Shutemov continued, was (as Laight had pointed out) in
preserving the MAX_ORDER name while changing its semantics:
"Just renaming it to MAX_PAGE_ORDER or something would give a heads up
to not-yet-upstream code
". He followed up with another
patch adding a new symbol, NR_ORDERS, defined as
MAX_ORDER+1. Torvalds agreed
with that change, but insisted that the MAX_ORDER name needed to
change in order to avoid future problems.
At the end of December, Shutemov posted a pair of new patches making the changes. The first introduces the new name (now NR_PAGE_ORDERS) that describes the number of allocation orders supported by the page allocator. The second then renames MAX_ORDER to MAX_PAGE_ORDER, with the result that any out-of-tree code that uses MAX_ORDER will now fail to compile. That change should prevent a repeat of the dm-crypt bug; it should also keep MAX_ORDER bugs from being introduced into the stable kernels.
These patches, seemingly likely to be applied once the 6.8 merge window
opens, should bring an end to the long history of MAX_ORDER in the
kernel. The moral of the story is that names are indeed important; the
wrong name can lead to the creation of incorrect code. Just as important,
though, is the point that the meaning of a name should not be changed
without raising a flag in potentially affected code. If a name is not
right, it should probably be left behind entirely.
Posted Jan 1, 2024 20:27 UTC (Mon)
by koverstreet (✭ supporter ✭, #4296)
[Link] (11 responses)
Posted Jan 1, 2024 21:31 UTC (Mon)
by NYKevin (subscriber, #129325)
[Link] (10 responses)
Posted Jan 2, 2024 7:10 UTC (Tue)
by willy (subscriber, #9762)
[Link] (2 responses)
Kent's solution would have been to rename the excluded end definition. You ... don't seem to actually be proposing a solution.
Posted Jan 2, 2024 18:52 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Posted Jan 2, 2024 18:57 UTC (Tue)
by willy (subscriber, #9762)
[Link]
Posted Jan 2, 2024 7:37 UTC (Tue)
by donald.buczek (subscriber, #112892)
[Link] (6 responses)
Code evolution should aim for clarity, conciseness, comprehensibility, and consistency, not the contrary. Misnomers should be eradicated, not perpetuated. Duplications should be eliminated, not introduced.
Why not modify the old name to trigger a compilation error with a clear message indicating the sole correct replacement?
Posted Jan 2, 2024 13:35 UTC (Tue)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Getting the solution into the error message is hard, but this would probably be sufficient:
#define MAX_ORDER __error_you_should_use_MAX_PAGE_ORDER_instead_see_above_docs
Posted Jan 2, 2024 14:09 UTC (Tue)
by donald.buczek (subscriber, #112892)
[Link]
Posted Jan 2, 2024 14:59 UTC (Tue)
by danielthompson (subscriber, #97243)
[Link]
Let's hope not! If the out-of-tree code is correct the this replacement is wrong.
Posted Jan 2, 2024 15:05 UTC (Tue)
by danielthompson (subscriber, #97243)
[Link]
There is no correct value to replace MAX_ORDER with since MAX_ORDER has had two contradictory definitions at different points in history. Any advice to mechanically update MAX_ORDER is wrong. You cannot make a mechanical transform unless you know which kernel the out-of-tree code was written for *and* you are sure that the author did not misunderstand what MAX_ORDER meant.
Posted Jan 2, 2024 18:56 UTC (Tue)
by NYKevin (subscriber, #129325)
[Link] (1 responses)
Because if it were that easy, they wouldn't delete the macro in the first place. They would instead protect it with some sort of #ifdef so that it could only be used by out-of-tree code. That took me about five minutes to think of, and I don't even hack on the kernel, so surely they were aware of it as an option.
As others have explained, the macro has had two different definitions at different times. There is no single correct replacement. You have to look at the surrounding code and figure out which one was meant.
Posted Jan 3, 2024 5:52 UTC (Wed)
by donald.buczek (subscriber, #112892)
[Link]
Upon reevaluation, I realize I was mistaken. The new macros, NR_PAGE_ORDERS and MAX_PAGE_ORDER, are aptly named. Considering their actual usage, it seems justified to have two macros: one for the idiomatic C for loop with a "< NR_PAGE_ORDERS" condition, and another for "min(something, MAX_PAGE_ORDERS)" constructs, where "NR_PAGE_ORDERS-1" would be distracting.
My original comment was mistakenly posted as a reply to your comment, not the article, which might have caused some confusion.
The trouble with MAX_ORDER
The trouble with MAX_ORDER
The trouble with MAX_ORDER
The trouble with MAX_ORDER
The trouble with MAX_ORDER
The trouble with MAX_ORDER
The trouble with MAX_ORDER
The trouble with MAX_ORDER
The trouble with MAX_ORDER
> with MAX_PAGE_ORDER.
The trouble with MAX_ORDER
The trouble with MAX_ORDER
The trouble with MAX_ORDER