The trouble with MAX_ORDER

By Jonathan Corbet
January 1, 2024

One might not think that much could be said about a simple macro defining a constant integer value. But the kernel is special, it seems. A change to the definition of MAX_ORDER has had a number of follow-on effects, and the task of cleaning up after this change is not done yet. So perhaps a look at MAX_ORDER is in order.

Like everything else in the earliest releases of the Linux kernel, memory management was simple; it was mostly concerned with the tracking and management of individual pages. Around 1994, though, as memory sizes increased and the kernel became more complex, it became clear that the ability to deal with pages in larger chunks would be useful. The 1.1.0 release in April 1994 included the beginnings of what was to become the kernel's "buddy allocator", which tracks and allocates pages in blocks sized in powers of two.

With this change came the concept of an "order", which is just the base-two logarithm of the number of pages in a block. An order-0 block is a single page, an order-1 block holds two pages, and so on. Blocks of different orders were kept in separate lists in 1.1.0, which included a macro (NR_MEM_LISTS) describing how many of those lists there were. It was given a value of six initially, meaning that the largest block that could be managed was of order five — 32 pages. The 2.1.23 release (January 1997) saw a special case added for the AP1000 CPU, which needed to be able to allocate 8MB (order 11) chunks of memory; NR_MEM_LISTS was set to 12 if the kernel was being built for that architecture.

In 2.3.27 (November 1999), NR_MEM_LISTS got a new name — MAX_ORDER — and was set to ten for all but the AP1000. Support for AP1000 was eventually removed in 2.3.42 in February 2000, but the practice of adjusting MAX_ORDER for specific architectures has only grown more prevalent. The default value for MAX_ORDER can now vary widely between architectures; it defaults to 11 on most systems, though. As a result, on such systems, pages can be allocated in blocks of order zero through ten.

One interesting aspect of MAX_ORDER, as described so far, is that it is not actually the maximum order; if MAX_ORDER is 11, then the maximum order that can be allocated is ten. In March 2023, Kirill Shutemov decided to address this inconsistency after noticing some bugs in the code resulting from a misunderstanding of MAX_ORDER; the result was this patch set that fixed such bugs in nine different places, including a longstanding bug in the floppy driver.

The work did not end there, though; at the end of this series, Shutemov redefined MAX_ORDER to mean what its name suggests: the maximum order an allocation can be. As a result, MAX_ORDER often defaults to ten now, with the result that the kernel (still) maintains 11 lists of free blocks. At the time, the reception for the change was mostly positive, though Mel Gorman did worry that this change could cause problems for stable backports. David Laight suggested that changing the name of MAX_ORDER would help prevent such problems, but that suggestion was ignored and Shutemov's patch series was merged for the 6.4 release in June.

At the time, it seemed that the problem was solved. At the end of September, though, Paolo Bonzini pointed out three changes that, having been developed in parallel with the MAX_ORDER change, were still using the older meaning. The changes were correct when written, but had been broken by the MAX_ORDER change but nobody had noticed. Nearly two months later, Mike Snitzer sent a pull request containing a fix for one of those bugs, introduced by a change to the dm-crypt target merged for 6.5. It turns out that out-of-tree code, like backports, can be silently broken by this change — a concern that nobody had raised when the MAX_ORDER change was being considered.

That bug caused Linus Torvalds to question whether the change should have been made at all. He suggested that he might revert it in the absence of a compelling reason to keep it around; he later added:

Bah. I looked at the commits leading up to it, and I really think that "if that's the fallout from 24 years of history, then it wasn't a huge deal".
Compared to the inevitable problems that the semantic change will cause for backports (and already caused for this merge window), I still think this was all likely a mistake.

He also said that bugs related to MAX_ORDER tend not to be found by testing because it is rare to allocate blocks of the maximum size.

Shutemov answered that the bugs he had fixed in that series were only those that made it into the mainline and had never been found; "I personally spend some time debugging due MAX_ORDER-1 thing". The real mistake, Shutemov continued, was (as Laight had pointed out) in preserving the MAX_ORDER name while changing its semantics: "Just renaming it to MAX_PAGE_ORDER or something would give a heads up to not-yet-upstream code". He followed up with another patch adding a new symbol, NR_ORDERS, defined as MAX_ORDER+1. Torvalds agreed with that change, but insisted that the MAX_ORDER name needed to change in order to avoid future problems.

At the end of December, Shutemov posted a pair of new patches making the changes. The first introduces the new name (now NR_PAGE_ORDERS) that describes the number of allocation orders supported by the page allocator. The second then renames MAX_ORDER to MAX_PAGE_ORDER, with the result that any out-of-tree code that uses MAX_ORDER will now fail to compile. That change should prevent a repeat of the dm-crypt bug; it should also keep MAX_ORDER bugs from being introduced into the stable kernels.

These patches, seemingly likely to be applied once the 6.8 merge window opens, should bring an end to the long history of MAX_ORDER in the kernel. The moral of the story is that names are indeed important; the wrong name can lead to the creation of incorrect code. Just as important, though, is the point that the meaning of a name should not be changed without raising a flag in potentially affected code. If a name is not right, it should probably be left behind entirely.

The trouble with MAX_ORDER

Posted Jan 1, 2024 20:27 UTC (Mon) by koverstreet (✭ supporter ✭, #4296) [Link] (11 responses)

We C programmers deal with half open intervals when we're talking about the sizes arrays - the proper change (if any was required!) would have been to rename it to NR_PAGE_ALLOC_ORDERS, and not change it otherwise.

The trouble with MAX_ORDER

Posted Jan 1, 2024 21:31 UTC (Mon) by NYKevin (subscriber, #129325) [Link] (10 responses)

Half open intervals are used when iterating over things (e.g. arrays, file descriptors w/ OPEN_MAX, etc.). Closed intervals are used when sizing or limiting things (see all of the macros in limits.h, except for OPEN_MAX: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs...).

The trouble with MAX_ORDER

Posted Jan 2, 2024 7:10 UTC (Tue) by willy (subscriber, #9762) [Link] (2 responses)

You're saying the same thing. MAX is customarily used to denote an inclusive end and in this case it wasn't.

Kent's solution would have been to rename the excluded end definition. You ... don't seem to actually be proposing a solution.

The trouble with MAX_ORDER

Posted Jan 2, 2024 18:52 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)

I'm not proposing a solution at all. I'm responding to the assertion that C uses half-open intervals.

The trouble with MAX_ORDER

Posted Jan 2, 2024 18:57 UTC (Tue) by willy (subscriber, #9762) [Link]

Kent qualified that statement. You really still are a lawyer.

The trouble with MAX_ORDER

Posted Jan 2, 2024 7:37 UTC (Tue) by donald.buczek (subscriber, #112892) [Link] (6 responses)

Now, when out-of-tree code encounters a compilation failure due to the removal of MAX_ORDER, it will be replaced with MAX_PAGE_ORDER. This results in the kernel having two macros for the same purpose, including a misnomer, indefinitely.

Code evolution should aim for clarity, conciseness, comprehensibility, and consistency, not the contrary. Misnomers should be eradicated, not perpetuated. Duplications should be eliminated, not introduced.

Why not modify the old name to trigger a compilation error with a clear message indicating the sole correct replacement?

The trouble with MAX_ORDER

Posted Jan 2, 2024 13:35 UTC (Tue) by mathstuf (subscriber, #69389) [Link] (2 responses)

> Why not modify the old name to trigger a compilation error with a clear message indicating the sole correct replacement?

Getting the solution into the error message is hard, but this would probably be sufficient:

#define MAX_ORDER __error_you_should_use_MAX_PAGE_ORDER_instead_see_above_docs

The trouble with MAX_ORDER

Posted Jan 2, 2024 14:09 UTC (Tue) by donald.buczek (subscriber, #112892) [Link]

#define MAX_ORDER ({_Static_assert(0, "MAX_ORDER is obsolete, please use NR_PAGE_ORDERS-1 instead.");0;})

The trouble with MAX_ORDER

Posted Jan 2, 2024 14:59 UTC (Tue) by danielthompson (subscriber, #97243) [Link]

> Now, when out-of-tree code encounters a compilation failure due to the removal of MAX_ORDER, it will be replaced
> with MAX_PAGE_ORDER.

Let's hope not! If the out-of-tree code is correct the this replacement is wrong.

The trouble with MAX_ORDER

Posted Jan 2, 2024 15:05 UTC (Tue) by danielthompson (subscriber, #97243) [Link]

Sorry I might have been a big vague here.

There is no correct value to replace MAX_ORDER with since MAX_ORDER has had two contradictory definitions at different points in history. Any advice to mechanically update MAX_ORDER is wrong. You cannot make a mechanical transform unless you know which kernel the out-of-tree code was written for *and* you are sure that the author did not misunderstand what MAX_ORDER meant.

The trouble with MAX_ORDER

Posted Jan 2, 2024 18:56 UTC (Tue) by NYKevin (subscriber, #129325) [Link] (1 responses)

> Why not modify the old name to trigger a compilation error with a clear message indicating the sole correct replacement?

Because if it were that easy, they wouldn't delete the macro in the first place. They would instead protect it with some sort of #ifdef so that it could only be used by out-of-tree code. That took me about five minutes to think of, and I don't even hack on the kernel, so surely they were aware of it as an option.

As others have explained, the macro has had two different definitions at different times. There is no single correct replacement. You have to look at the surrounding code and figure out which one was meant.

The trouble with MAX_ORDER

Posted Jan 3, 2024 5:52 UTC (Wed) by donald.buczek (subscriber, #112892) [Link]

Maybe my original comment was unclear. I didn't mean to imply that current usages shouldn't be reviewed. I was merely questioning if having two macros for the same purpose is optimal.

Upon reevaluation, I realize I was mistaken. The new macros, NR_PAGE_ORDERS and MAX_PAGE_ORDER, are aptly named. Considering their actual usage, it seems justified to have two macros: one for the idiomatic C for loop with a "< NR_PAGE_ORDERS" condition, and another for "min(something, MAX_PAGE_ORDERS)" constructs, where "NR_PAGE_ORDERS-1" would be distracting.

My original comment was mistakenly posted as a reply to your comment, not the article, which might have caused some confusion.