LWN.net Logo

When memory allocation failure is not an option

By Jonathan Corbet
August 25, 2010
One of the darker corners of the kernel's memory management subsystem is the __GFP_NOFAIL flag. That flag says that an allocation request cannot be allowed to fail regardless of whether memory is actually available or not; if the request cannot be satisfied, the allocator will loop continuously in the hope that, somehow, something will eventually change. Needless to say, kernel developers are not encouraged to use this option. More recently, David Rientjes has been trying to get rid of it by pushing the (possibly) infinite looping back to callers.

Andrew Morton was not convinced by the patch:

The reason for adding GFP_NOFAIL in the first place was my observation that the kernel contained lots of open-coded retry-for-ever loops.

All of these are wrong, bad, buggy and mustfix. So we consolidated the wrongbadbuggymustfix concept into the core MM so that miscreants could be easily identified and hopefully fixed.

David's response is that said miscreants have not been fixed over the course of many years, and that __GFP_NOFAIL imposes complexity on the page allocator which slows things down for all users. Andrew came back with a suggestion for special versions of the allocation functions which would perform the looping; that would move the implementation out of the core allocator, but still make it possible to search for code needing to fix; David obliged with a patch adding kmalloc_nofail() and friends.

This kind of patch is guaranteed to bring out comments from those who feel that it is far better to just fix code which is not prepared to deal with memory allocation failures. But, as Ted Ts'o pointed out, that is not always an easy thing to do:

So we can mark the retry loop helper function as deprecated, and that will make some of these cases go away, but ultimately if we're going to fail the memory allocation, something bad is going to happen, and the only question is whether we want to have something bad happen by looping in the memory allocator, or to force the file system to panic/oops the system, or have random application die and/or lose user data because they don't expect write() to return ENOMEM.

Ted's point is that there are always going to be places where recovery from a memory allocation failure is quite hard, if it's possible at all. So the kernel can provide some means by which looping on failure can be done centrally, or see it done in various ad hoc ways in random places in the kernel. Bad code is not improved by being swept under the rug, so it seems likely that some sort of central loop-on-failure mechanism will continue to exist indefinitely.


(Log in to post comments)

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds