
Removing GFP_NOFS

By Jake Edge
June 5, 2024

LSFMM+BPF

The GFP_NOFS flag is meant for kernel memory allocations that should not cause a call into the filesystems to reclaim memory because there are already locks held that can potentially cause a deadlock. The "scoped allocation" API is a better choice for filesystems to indicate that they are holding a lock, so GFP_NOFS has long been on the chopping block, though progress has been slow. In a filesystem-track session at the 2024 Linux Storage, Filesystem, Memory Management, and BPF Summit, Matthew Wilcox wanted to discuss how to move kernel filesystems away from the flag with the eventual goal of removing it completely.

He began the session by saying that there are several changes that people would like with regard to the GFP flags, but that the scoped-allocation API (i.e. memalloc_nofs_save() and memalloc_nofs_restore() as mentioned in the LSFMM+BPF topic discussion) for GFP_NOFS went in long ago, while the conversion to it is far from complete. He also wanted to talk a bit about Rust, he said. There is a desire to bring in Rust code from outside the kernel for, say, a hash table, but that requires the ability to allocate memory, which means making GFP flags available. "Why the hell would we want to add GFP flags to every Rust thing that we bring into the kernel? That's crazy."
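
For reference, the scoped API is used roughly as in the sketch below. Only memalloc_nofs_save() and memalloc_nofs_restore() (from <linux/sched/mm.h>) are real; the surrounding function, structure, and lock names are invented for illustration.

#include <linux/mutex.h>
#include <linux/sched/mm.h>   /* memalloc_nofs_save()/restore() */
#include <linux/slab.h>

/* Hypothetical filesystem state with a lock that is also taken during reclaim. */
struct example_fs_info {
    struct mutex lock;
};

static int example_fs_operation(struct example_fs_info *fsi)
{
    unsigned int nofs_flags;
    void *buf;
    int ret = 0;

    mutex_lock(&fsi->lock);

    /*
     * Every allocation in this scope behaves as if GFP_NOFS had been
     * passed, without plumbing the flag through every callee.
     */
    nofs_flags = memalloc_nofs_save();
    buf = kzalloc(256, GFP_KERNEL);
    if (!buf)
        ret = -ENOMEM;
    else
        kfree(buf);    /* real code would use the buffer here */
    memalloc_nofs_restore(nofs_flags);

    mutex_unlock(&fsi->lock);
    return ret;
}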

So, he asked, what interface would the filesystem developers like to have to indicate that an allocation should not recurse into the filesystem code; the existing interface is, seemingly, not the right one, since filesystems are generally not using it. Josef Bacik said that from the Btrfs perspective, there is no real advantage to switching away from GFP_NOFS because there are other GFP flags that it still needs to use. The flags are already plumbed through the Btrfs code, so it is just easier to add another GFP_NOFS use when that situation arises.

[Matthew Wilcox]

But the Rust use case is not one he had considered before; it is the first example he has seen where the save and restore interface makes a difference that makes the switch worth doing. Bacik said that he would be willing to change the code to eliminate GFP_NOFS if that was requested, but it has not really "seemed like a pressing need", so far, to him. The Rust need for the switch changes that equation in his mind.

Jan Kara said that he has been working on removing the flag from ext4 and other places, but has found it inconvenient to do in some parts of the code. Passing the cookie that gets returned from the save operation through, so that it is available to pass to the restore, is ugly and makes the conversion harder, he said. Similarly, there are locks that are taken in some paths such that recursing back into the filesystem needs to be prevented (which is what GFP_NOFS protects against). Rather than having code that manually does the save and restore while managing the cookie, marking those locks in a way that causes the system to automatically handle the allocation scope would make things easier for conversions.
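
A hedged sketch of the pattern Kara was describing: the cookie returned by the save has to be stashed somewhere (here in an invented transaction structure) so that a different function can eventually perform the restore.

#include <linux/sched/mm.h>

/* Hypothetical transaction structure that carries the scope cookie around. */
struct example_trans {
    unsigned int nofs_flags;
    /* ... other transaction state ... */
};

static void example_trans_start(struct example_trans *t)
{
    /* The scope is opened here ... */
    t->nofs_flags = memalloc_nofs_save();
}

static void example_trans_commit(struct example_trans *t)
{
    /*
     * ... but closed in a different function, possibly in another file,
     * so the cookie must travel with the transaction the whole way.
     */
    memalloc_nofs_restore(t->nofs_flags);
}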

Rust for Linux developer Wedson Almeida Filho said that it was painful to have to manage and think about all of the different GFP flags in the code. He wondered if there were some way to automatically set the scope by detecting the areas where one of them is needed. Ideally, that detection would happen at compile time; there could perhaps be support in Rust for that, he suggested.

Wilcox said that it depends on the specific filesystem, because they work in different ways with regard to their locking. The detection would have to know that a particular lock is taken during the reclaim path, for example. Kara suggested that lockdep might help, but Wilcox seemed skeptical, noting that lockdep cannot say that a particular lock is never taken in the reclaim path, for example. Dave Chinner agreed that it would be difficult to detect that situation since the filesystem code, its locking, and the interaction with the reclaim path are extremely complex.

There was some discussion between Chinner, Almeida, and Kent Overstreet about the difficulty of automatically detecting the proper context. There is also some interaction with lockdep, which emits false-positive warnings; that has led the XFS developers to use the __GFP_NOLOCKDEP flag (some of which is described in a patch from January). It was all rather fast-moving and technically deep.
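
For context, __GFP_NOLOCKDEP is simply OR-ed into the normal flags at the call sites that trigger the false positives; a minimal sketch, with an invented call site:

#include <linux/slab.h>

struct example_log_item {
    int id;    /* placeholder payload */
};

static struct example_log_item *example_alloc_log_item(void)
{
    /*
     * GFP_NOFS keeps the allocation from recursing into the filesystem;
     * __GFP_NOLOCKDEP additionally tells lockdep not to track it,
     * silencing a known false-positive reclaim-ordering report.
     */
    return kmalloc(sizeof(struct example_log_item),
                   GFP_NOFS | __GFP_NOLOCKDEP);
}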

Overstreet raised the idea of preventing kernel allocations from failing (effectively making GFP_NOFAIL more widespread). He is opposed to that effort because he thinks that it is important to ensure that the error paths are well-tested both for allocation failure and other errors. But he wondered if making that change might eliminate the need for memory pools (or "mempools").

Chinner said that there is a difference between mempools and the no-fail case: mempools provide a guarantee of forward progress that no-fail does not. In particular, when memory is being reclaimed, there is no guarantee that whatever needs the memory will actually get it. That reclaimed memory could be allocated to some other task, unlike with mempools. He thought it would be difficult to be sure that making that switch would work in all cases.
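
The guarantee Chinner referred to comes from the pool's private reserve; the following is a minimal sketch using the real mempool API, with the cache, structure, and function names invented:

#include <linux/mempool.h>
#include <linux/slab.h>

/* Hypothetical object needed to make forward progress during writeback. */
struct example_io_unit {
    int id;
};

static struct kmem_cache *example_cache;
static mempool_t *example_pool;

static int example_init(void)
{
    example_cache = KMEM_CACHE(example_io_unit, 0);
    if (!example_cache)
        return -ENOMEM;

    /* Reserve two elements that only this pool's users can consume. */
    example_pool = mempool_create_slab_pool(2, example_cache);
    if (!example_pool) {
        kmem_cache_destroy(example_cache);
        return -ENOMEM;
    }
    return 0;
}

static struct example_io_unit *example_get_unit(void)
{
    /*
     * When a normal allocation fails, mempool_alloc() falls back to the
     * pool's private reserve and then waits for an element to be
     * returned to this pool, rather than competing with unrelated
     * allocations for whatever memory reclaim happens to free.
     */
    return mempool_alloc(example_pool, GFP_NOIO);
}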

But what a no-fail policy does do, he continued, is remove the possibility of dereferencing null pointers when there is an allocation failure. Those kinds of bugs generally have security implications, so eliminating the possibility of allocation failure can remove a whole class of security-sensitive bugs.

Overstreet said that making the error paths easier to test is another approach. He plans to post some patches for that, including ways to inject errors at any memory-allocation site; those patches rely on his recently merged memory-allocation profiler. Bacik said that Btrfs has ways to inject errors to test its error paths using BPF scripts. Overstreet said that it is important to be able to target the error injection for code that is under your control; simply randomly failing memory allocations for the kernel as a whole is not viable. Bacik said that the Btrfs error paths are systematically tested using the BPF code.

The session ran out of time without coming to any conclusions on the path forward, which is unfortunate, Wilcox said. Everyone seemed interested in removing GFP_NOFS allocations, but there is no concrete proposal for how to get there; he will try to work on one. Now that he realizes there is a major push to get rid of those allocations, Bacik said that he will work with the other Btrfs developers to not add any more and to start removing the ones that are there.


Index entries for this article
Kernel: Filesystems
Kernel: Memory management/GFP flags
Conference: Storage, Filesystem, Memory-Management and BPF Summit/2024



Difficulty pushing restore flags through the code

Posted Jun 5, 2024 18:09 UTC (Wed) by riking (guest, #95706) [Link]

Scoped flag restore gets a lot easier when you can make closures easily (you can pass the to-be-restored flags into the closure), but oops, the most generic definition of a closure in C (void* data + void fn(void*)) requires allocation to add data past the first pointer!
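
A sketch of that kind of scoped helper, for illustration: only memalloc_nofs_save() and memalloc_nofs_restore() are real kernel functions, the wrapper itself is hypothetical.

#include <linux/sched/mm.h>

/* Hypothetical wrapper: run fn(data) with a GFP_NOFS scope in effect. */
static void with_nofs_scope(void (*fn)(void *data), void *data)
{
    unsigned int flags = memalloc_nofs_save();

    fn(data);
    memalloc_nofs_restore(flags);
}

As the comment notes, anything beyond that single data pointer has to be bundled into a structure, which in the worst case means another allocation.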

Null pointers and error paths

Posted Jun 5, 2024 18:48 UTC (Wed) by comex (subscriber, #71521) [Link] (22 responses)

In 2024 it's just not worth worrying about the security impact of null pointer dereferences. On the hardware side, both x86 and arm64 have long since gained hardware functionality to block the kernel from accidentally dereferencing userspace pointers. Not sure about other arches, but those two account for the vast majority of real-world attack targets. Meanwhile, on the software side, mmap_min_addr has been around for a long time and can be enforced by LSMs.

However…

It *is* worth worrying about the security consequences of error paths. Just two days ago Google published a blog post about yet another kernel vulnerability involving an error path freeing something that had already been freed. [1] I've seen a bunch of those.

So I agree that thoroughly testing error paths is a great idea. At the same time, of course, from a security perspective, what's better than well-tested code is no code. If there are error paths that exist *only* to handle allocation failure (as opposed to handling allocation failure as one of multiple error conditions), then it would definitely be better for security if allocations were infallible and the error paths could be removed entirely.

Note that I'm opining only on the security aspect, not on the other pros and cons of allocator fallibility.

[1] https://androidoffsec.withgoogle.com/posts/attacking-andr...

Null pointers and error paths

Posted Jun 6, 2024 3:40 UTC (Thu) by azumanga (subscriber, #90158) [Link] (21 responses)

Yes, as time goes by I think most memory allocations should be treated as infallible -- it's fine to have a special "I'm fine if this memory allocation fails" flag, particularly for large allocations (I tend to not bother unless I expect it to be at least 64MB, but that's not a hard rule).

But, experience tells us writing software which can survive many small failing memory allocations is REALLY hard. The only thing I know which does this correctly is sqlite, and that's a fairly small program with a crazy-large test suite, full of special tests to catch just this case.

I'd go as far as saying unless you've fuzz tested extensively with a sometimes-failing allocator, there is no chance your code for dealing with failing small allocations works correctly.

Null pointers and error paths

Posted Jun 6, 2024 12:07 UTC (Thu) by Wol (subscriber, #4433) [Link] (20 responses)

> But, experience tells us writing software which can survive many small failing memory allocations is REALLY hard. The only thing I know which does this correctly is sqlite, and that's a fairly small program with a crazy-large test suite, full of special tests to catch just this case.

Doesn't the malloc spec actually help here? Of course, it's not something I did myself either, but shouldn't frees be written

ptr = free(ptr); ?

Iirc free always returns null (does it always succeed?), and free(null) is defined as a no-op, so provided you don't make multiple copies on the assumption you're only going to free one, at least this tactic means that double frees can't happen ...

Cheers,
Wol

Null pointers and error paths

Posted Jun 6, 2024 12:29 UTC (Thu) by pm215 (subscriber, #98099) [Link] (19 responses)

free() doesn't return any value at all. You can defensively code a free as "free(ptr); ptr = NULL;" but this is not the common idiom in most C code I've seen. Also typically the malloc-failed codepath has to handle unwinding a lot more than merely freeing some other memory allocations.

Null pointers and error paths

Posted Jun 6, 2024 13:37 UTC (Thu) by Wol (subscriber, #4433) [Link] (18 responses)

Or am I thinking of something like mmalloc (or remalloc or whatever it's called)?

I'm sure I remember something about being able to do that, whether resizing it to zero actually does a free instead, or something. It was 20 years ago, but I'm sure I remember thinking back in the day that you could achieve a free and assign null to the pointer in one operation. And I'm sure it was actually documented as working, even if nobody ever did it :-)

Cheers,
Wol

Null pointers and error paths

Posted Jun 6, 2024 15:36 UTC (Thu) by paulj (subscriber, #341) [Link] (17 responses)

Maybe you were thinking of realloc(ptr, 0)? Although, it can still return a non-NULL pointer value.

Null pointers and error paths

Posted Jun 6, 2024 16:15 UTC (Thu) by Wol (subscriber, #4433) [Link] (16 responses)

Almost certainly. But I did think the spec said "free and return null if asked for 0 bytes".

Just looked at the man page - I think it's a safe bet the version I used explicitly said it "frees and returns null", but the reality is it's implementation-dependent :-( Shame, would have been a nice fix for preventing double-frees if you could rely on it.

Cheers,
Wol

Null pointers and error paths

Posted Jun 6, 2024 17:22 UTC (Thu) by intelfx (subscriber, #130118) [Link] (12 responses)

realloc(ptr, 0) has just been made UB in the latest C standard :-)

Null pointers and error paths

Posted Jun 6, 2024 22:04 UTC (Thu) by Wol (subscriber, #4433) [Link] (11 responses)

Brilliant. A whole load of previously VALID C code will now silently screw up when fed through the "latest and greatest" compilers.

And they wonder why programmers are switching to other languages like Rust ...

Cheers,
Wol

Null pointers and error paths

Posted Jun 7, 2024 18:01 UTC (Fri) by riking (guest, #95706) [Link] (10 responses)

It was made UB because implementations differ in their behavior incompatibly, in particular "returning NULL and freeing the pointer" and "returning NULL without freeing the pointer" are the critical two that make it impossible to code correctly.

Null pointers and error paths

Posted Jun 7, 2024 18:13 UTC (Fri) by Wol (subscriber, #4433) [Link]

In which case you have a compiler flag or something, and it's implementation defined - "you get what you ask for, or either documented option at random".

Not "your previous implementation-defined code can now be optimised away as undefined".

Cheers,
Wol

Undefined, implementation defined, and unspecified behaviour

Posted Jun 8, 2024 8:47 UTC (Sat) by farnz (subscriber, #17727) [Link] (8 responses)

That sounds like a perfect case for implementation defined behaviour, without a set of constrained behaviours; the implementation has to document how it chooses to behave to be compliant, but has to stick to whatever behaviour it documents. I can see why you wouldn't want it to be unspecified behaviour, since (while you can constrain that to a set of allowed options), you generally want unspecified behaviour to be cases where there's a consistent compatible use, and room to do better if the implementation chooses a specific option.

Is there any discussion you can point me to that explains why not implementation defined here?

Undefined, implementation defined, and unspecified behaviour

Posted Jun 8, 2024 10:25 UTC (Sat) by excors (subscriber, #95769) [Link] (7 responses)

Looks like the relevant document is https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2464.pdf ("Zero-size Reallocations are Undefined Behavior").

The key part is:

> Classifying a call to realloc with a size of 0 as undefined behavior would allow POSIX to define the otherwise undefined behavior however they please.

It won't be undefined behaviour to call realloc(ptr, 0) from C23 on a POSIX system, because POSIX already defines it. ("Undefined behaviour" is a recessive trait - if another document wants to define it, then that definition takes priority). Platform-independent code can't rely on the POSIX definition (since it may run on a non-POSIX platform), but it can't rely on any implementation-defined behaviour either, so that's no worse than how it's been since at least C99.

POSIX says realloc(ptr, 0) can either return NULL, or return a pointer to some allocated space of unknown size, with some extra rules about errno (which were never in the C standard). So you can't use it as an alternative to free() anyway - it may perform a new allocation, which you will leak.
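
A sketch of why that is (illustrative only, not taken from either standard):

#include <stdlib.h>

void not_a_free(char **pp)
{
    /*
     * Under the POSIX definition, realloc(*pp, 0) may return a new,
     * non-NULL allocation of unspecified size; if the caller then
     * treats *pp as "freed", that allocation is leaked.
     */
    *pp = realloc(*pp, 0);
}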

Pushing for downstream standards to define more

Posted Jun 8, 2024 13:38 UTC (Sat) by farnz (subscriber, #17727) [Link] (6 responses)

Ah, so the rationale is that they want POSIX and other downstream standards to add definitions for more things that are UB in Standard C; effectively reducing Standard C to "the things that are portable across all implementations of C", and expecting POSIX C to be an extension of Standard C to "the things that are portable across all reasonable UNIX-like implementations of C".

Pushing for downstream standards to define more

Posted Jun 9, 2024 9:03 UTC (Sun) by Wol (subscriber, #4433) [Link] (5 responses)

In which case, surely that is "implementation defined" !?!?

If the C standard says "the compiler can assume undefined behaviour cannot happen", then if it's defined as undefined surely the compiler can just delete the code as "can't happen"? And isn't that exactly the behaviour we've been moaning about for ages?

In which case any alternative definition never gets considered?

Cheers,
Wol

Pushing for downstream standards to define more

Posted Jun 9, 2024 10:07 UTC (Sun) by excors (subscriber, #95769) [Link] (4 responses)

> If the C standard says "the compiler can assume undefined behaviour cannot happen" [...]

It doesn't say that. It says:

> undefined behavior: behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this document imposes no requirements
>
> Note 1 to entry: Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

A compiler that assumes realloc(ptr, 0) cannot happen will still be a conforming implementation of the C standard, but it won't be a conforming implementation of POSIX. A compiler that implements the POSIX behaviour will be a conforming implementation of both. Any C compiler that targets POSIX will aim to conform with both standards, so it doesn't really matter which one defines the behaviour.

Incidentally, on non-POSIX systems, Microsoft says "realloc hasn't been updated to implement C17 behavior because the new behavior isn't compatible with the Windows operating system" (https://learn.microsoft.com/en-us/cpp/c-runtime-library/r...). N2464 says C17 changed the definition of realloc "to allow for the existing range of implementations", but evidently they failed since Microsoft believes it doesn't allow their behaviour.

I'm guessing C23 made it undefined because they couldn't work out a useful, unambiguous definition that would allow both the POSIX and Windows behaviours, and neither of those are going to change, so it was a waste of time - better to simply defer to the platform documentation.

Pushing for downstream standards to define more

Posted Jun 9, 2024 16:19 UTC (Sun) by Wol (subscriber, #4433) [Link] (1 responses)

So I'm guessing the Windows behaviour is to treat realloc(ptr,0) as free(ptr), going by my memories of Microsoft C v5 (and v6).

But that then leaves us wondering what on earth any cross-platform compiler such as gcc does, seeing as it can produce both Posix and Windows binaries ... and given that linux makes no claim whatsoever to support Posix, that's giving the gcc guys carte blanche to just eliminate a realloc(ptr,0) as "does nothing". BAAADDDD ...

Cheers,
Wol

Compiler choices when supporting two conflicting C standards

Posted Jun 10, 2024 8:18 UTC (Mon) by farnz (subscriber, #17727) [Link]

A compiler like GCC can know what platform it's compiling code for (it has to know which CPU, and which ABI, after all, so extending that to which platform standard applies and allowing an override is a non-issue). It can thus support POSIX C when compiling for POSIX platforms (like glibc-based Linux), Windows C when compiling for Windows, and merely ISO C when compiling a freestanding binary that doesn't depend on a platform.

Note that GCC already has the mechanism you'd need for this in place as part of its support for multiple C dialects - it could have a POSIX/Windows/neither switch in there, too.

Pushing for downstream standards to define more

Posted Jun 9, 2024 16:25 UTC (Sun) by Wol (subscriber, #4433) [Link]

following that link to the Microsoft definition, yes that is exactly what I remember.

So doing a "ptr = realloc(ptr, 0)" instead of a free would prevent double frees or access-after-free, so long as (a) you're using a Windows-compliant compiler, and (b) you don't make copies of ptr. Surely that would make masses of sense as a simple way of defensive coding!

Cheers,
Wol

Downstream standards should define more behaviours (changing the split between US, ID, and UB)

Posted Jun 10, 2024 10:40 UTC (Mon) by farnz (subscriber, #17727) [Link]

So, reading the discussions a bit more, I get the sense that the ISO committee's goal is to move the meanings of undefined behaviour, unspecified behaviour and implementation-defined behaviour around a bit (in a way that's always been valid, but has been under-utilized by downstream standards like POSIX).

The ISO standard sets a common definition of C that all implementations of C must agree on, but allows a lot of latitude for downstream standards to tighten up the ISO definitions; for example, ISO says that "double" is a floating point number, so a downstream standard cannot repurpose "double" for integers represented using a pair of registers, but while ISO C says that the size of "char" in bits is implementation-defined and at least 8 bits, POSIX says that the size of "char" is always exactly 8 bits.

This is then an attempt to push downstream standards to tighten up ISO definitions when it comes to behaviours; it's already obvious to downstream standards that where something is implementation defined, a downstream standard can say "the implementation must define it this way", but it's not so obvious that where something is unspecified behaviour (one of a set of choices, no need to be consistent or to document which one as long as you choose from the set every time you encounter this construct) or undefined behaviour (the program can take any meaning if this construct is encountered) in ISO C, a downstream standard can make that defined, implementation-defined (choose and document a behaviour), or unspecified if it so desires, without conflicting with ISO C.

Taking an example that upsets a lot of people, ISO says that signed integer overflow is undefined behaviour; but they'd be very happy for POSIX to say that signed integer overflow is unspecified behaviour, and must be either saturating, twos complement wrap-around, wraps around to zero (skipping the negative numbers completely), or results in the program receiving a SIGFPE signal. It'd then be on implementations to choose the behaviour that results in the most optimal program from those choices, assuming they claimed POSIX support.

Null pointers and error paths

Posted Jun 10, 2024 9:51 UTC (Mon) by paulj (subscriber, #341) [Link] (2 responses)

I've never seen anyone use realloc() in that way myself. I guess, not least, cause it doesn't guarantee what you want. The other subthread on UB/ID is interesting there. ;) (And I agree, why push to make it UB when clearly it's ID, as implementations do have their own definitions? but... sibling subthread ;) ).

What I do is:

1. At the lower levels that need to deal directly with *alloc and free(), I have a wrapper around free() (possibly a macro) which takes a double-pointer to the caller's pointer. It can then null out the caller's pointer directly.

foo_t *foo_free(foo_t **foo) {
    __whatever_house_and_book_keeping ();
    free (*foo);
    *foo = NULL;
    return NULL;
}
#define attr_cleanup(X) __attribute__ ((__cleanup__(X)))
....
{
    foo_t *foo attr_cleanup (foo_free) = foo_new(ctxt);
    ...
}

2. At a higher level, you need to encapsulate the low-level memory allocations into some coherent strategy to manage the lifetime of objects. Often some combination of:
a) Allocating a pre-determined number of objects, suitable for the problem being tackled. (Good for deterministic behaviour).
b) Careful alignment of entity lifetime with the structure of the algorithm being run
c) Reference counting
d) Hierarchical allocation management, in combination with one of the previous
e) Liveness checking [either at a low level by scanning for pointers (v rare in C/C++), or some more abstract, problem-domain + type specific check] and GC

Probably forgetting some strategies.

This is one of the most important and hardest parts to get right. Some languages have features to make it harder, even impossible, to free used objects. But that leads to many programmers in such languages not understanding the importance of lifetime management - which remains important even in such languages for performance.

Null pointers and error paths

Posted Jun 10, 2024 16:00 UTC (Mon) by Wol (subscriber, #4433) [Link] (1 responses)

> I've never seen anyone use realloc() in that way myself. I guess, not least, cause it doesn't guarantee what you want.

How come? It sounds like it typically doesn't work that way on Unix, but if MS define "realloc( ptr, 0)" to free the memory and return null, then "ptr = realloc( ptr, 0)" achieves exactly what I would like by that definition - it destroys the pointer at the same time as destroying the memory.

If you have something that then traps de-referencing a null pointer (or simply your code checks for a null pointer), then the likelihood of double-free, use-after-free, etc goes down.

Cheers,
Wol

Null pointers and error paths

Posted Jun 17, 2024 17:24 UTC (Mon) by mrugiero (guest, #153040) [Link]

Why not just write a little macro?
#define xfree(ptr) do { \
    free(ptr);          \
    (ptr) = NULL;       \
} while (0)

