Detecting kernel memory leaks

[Posted June 19, 2006 by corbet]

The implementation language for the Linux kernel is C. That choice makes a great deal of sense; C does a good job of staying out of the way and letting programmers control exactly what is happening. Anybody who does any significant amount of C programming, however, eventually ends up chasing down memory leaks. Since C forces programmers to track every block of allocated memory and clean up their own messes, things occasionally slip through the cracks. Memory leaks can be a problem in applications, especially those which run for a long time - ask any Firefox user. But kernel memory leaks are worse; every time the kernel drops a piece of memory, it is gone until the next boot. A system with a serious kernel memory leak will quickly become unusable.

Tracking down memory leaks can be painful work. When a proprietary memory allocation tracking tool became available for SunOS many years ago, your editor had no qualms about spending thousands of his employer's dollars to license it; the payback time was quite short. In current times, Linux users can employ a free tool like valgrind (version 3.2.0 was released on June 8) to track down user-space memory leaks. But valgrind does not work on a running kernel. (Some work has been done on running User-mode Linux under valgrind, but sometimes one simply has to debug the host system).

As the kernel developers rely more heavily on automated tools for finding bugs, the creation of a kernel memory leak detector is an obvious next step. Catalin Marinas has taken that step with a kernel memory leak detector patch series. This code, if accepted into the kernel, should help to eliminate another big class of errors.

Catalin's patch functions much like a scan-and-mark garbage collector. The first step is to track every memory allocation in the system; to that end, the patch instruments the slab allocator. Every block allocated from a slab (which will include allocations from kmalloc()) is stored in a radix tree; along with a pointer to the block, the stored information includes the block size and a stack trace identifying where the block was allocated. When blocks are freed, their corresponding entries are removed from the radix tree.

During normal system operation, this radix tree just sits there. Should somebody ask about memory leaks (by reading /sys/kernel/debug/memleak), the detection algorithm swings into action. The steps performed are:

A big list is created holding every outstanding memory allocation in the system. This list is called the "white" list; everything on it is considered to be a possible memory leak.
Various parts of memory are scanned for pointers which match the allocated blocks; every time such a pointer is found, the block is moved to the "gray" list of memory which is still reachable, and thus not leaked. The initial scan includes the kernel's static data areas, each process's kernel stack, and each processor's per-CPU variable data area.
The first scan finds all memory referenced directly from static memory, but kernel data structures are more complicated than that. So, each block which has been put onto the gray list is scanned as well. Most of these blocks will be structures allocated from a slab cache, and they may contain pointers to other structures. So each block is queried, paying attention to that block's remembered size. Any pointers found within the block are moved over to the gray list, and scanned in turn. There is, of course, a provision for remembering which blocks have been scanned and avoiding infinite loops.
Once all pointers on the gray list have been scanned, every block of memory reachable by the kernel has been located. Anything remaining on the white list is considered to be leaked, and the relevant information is sent back to user space.

In the real world, things get complicated, so the leak detector is not quite as simple as described above. One situation which had to be addressed is cases where the kernel keeps a pointer to the interior of a block of memory, rather than to the beginning. This happens frequently; many kernel structures are located by way of an embedded list_head structure or kobject, for example. As a way of locating these blocks, the memory leak detector records uses of the container_of() macro; in particular, it remembers the size of the block and the offset to the embedded structure. When a block of a given size is allocated, the detector records "alias" addresses for any possible embedded structures. A pointer to one of those aliases is considered to be equivalent to a pointer to the beginning of the block.

There are various other special cases which must be handled. For example, memory obtained from vmalloc() will be pointed to by the memory allocation code itself, but might still be leaked. In other cases, memory is allocated which cannot be found by the scanning algorithm; a number of special annotations are added to the kernel to suppress the resulting false positive reports. The detector can also be fooled by pointers which are left behind in disused memory, or by random data which happens to look like a pointer to an allocated block; in these cases, false-negatives will result.

Even with these problems, the situation is better than before - a lot of memory leak situations can be found. Ingo Molnar, however, has a vision of a more ambitious scheme wherein type information for every allocated block would be retained. Among other things, this information would allow the scanning to be restricted to parts of the block known to contain pointers; that should speed the process and reduce false negatives. Since type information is available, each scanned pointer could be checked to ensure that it points to a block of the correct type, adding another level of checking to the kernel. Implementing all of this looks like a big task, however; even Ingo may need a couple of days to get it done.

Detecting kernel memory leaks

Posted Jun 22, 2006 8:58 UTC (Thu) by kleptog (subscriber, #1183) [Link] (2 responses)

The question ofcourse is: has it found anything yet?

Detecting kernel memory leaks

Posted Jun 22, 2006 9:39 UTC (Thu) by mingo (guest, #31122) [Link] (1 responses)

yep, even in its current (very) limited testing kmemleak found a real ACPI memory leak, with some more suspected ACPI leaks being analyzed.

We should also note that prior Coverity checking clean-sweeped some of the 'easy' memory-leaks, artificially reducing the visible efficiency of this GPL-ed tool.

(It could be tested by intentionally re-introducing some of the Coverity fixes but that's hard and laborous because not all Coverity fixes are identified as such and the code might have changed meanwhile.)

Detecting kernel memory leaks

Posted Jul 27, 2006 8:41 UTC (Thu) by cmarinas (subscriber, #39468) [Link]

(a bit late with replying but I only saw the article now)

There is no need to re-introduce the bugs found by Coverity as kmemleak managed to find some real leaks (I think about 4-5 at the moment - ACPI, i386 setup, SE Linux and a case of vfree used instead of vunmap in the ARM code) which Coverity wouldn't have been able to see because of the code complexity.

Coverity, however, can find potential leaks before they actually happen since it does a static analysis of the code.

Detecting kernel memory leaks

Posted Jun 22, 2006 9:34 UTC (Thu) by flewellyn (subscriber, #5047) [Link] (11 responses)

That algorithm is strikingly similar to tricolor marking, an incremental garbage collection
algorithm invented by Djkstra.

Which brings to mind the question: should the kernel folks just bite the bullet and implement an
in-kernel GC? This patch is halfway there already.

Detecting kernel memory leaks

Posted Jun 23, 2006 4:40 UTC (Fri) by cpeterso (guest, #305) [Link] (1 responses)

I remember reading a LKML comment by Linus that he would never add a garbage collector to the kernel. Maybe one day, he'll eat those words. :-) I definitely think it's worth investigating, especially given these new memleak patches.

Detecting kernel memory leaks

Posted Jun 24, 2006 11:59 UTC (Sat) by nix (subscriber, #2304) [Link]

net/unix/garbage.c already contains a mark-and-sweep garbage collector: garbage collection is a requirement for AF_UNIX socket handling unless you like DoSes (which are easy: a program sets up an AF_UNIX socket, opens it itself but does not go into accept(), sends the socket's filehandle along the socket, closes both ends, and quits. Oops. One handle leaked, unless you detect the filehandles in transit along the the AF_UNIX socket, which means mark-and-sweep collection given that these fhes may be the fhes of other sockets which may themselves have filehandles in transit along them...)

Detecting kernel memory leaks

Posted Jul 27, 2006 8:49 UTC (Thu) by cmarinas (subscriber, #39468) [Link] (8 responses)

Indeed, the algorithm is pretty close to a garbage collection one, only that it doesn't try to free the orphan blocks.

Apart from the technical difficulties of a complete garbage collector in the kernel (big overhead, SMP systems, "stop the world scanning", pointer aliasing - see the container_of workaround in kmemleak), I am personally against it as it would make people lazier in dealing with memory resources.

Detecting kernel memory leaks

Posted Jun 18, 2009 16:55 UTC (Thu) by proski (subscriber, #104) [Link]

Actually, I'd rather see driver developers worry about aspects specific to the drivers rather than about memory. If the kernel starts consuming too much memory, the biggest users can be found and modified to use less memory.

Detecting kernel memory leaks

Posted Jun 25, 2009 12:18 UTC (Thu) by epa (subscriber, #39769) [Link] (6 responses)

The trouble with garbage collection in languages such as C, which allow casting pointers into other data types, is that you can't be sure what is a pointer and what isn't. A block of memory might contain values which appear to be pointers and stop the garbage collector freeing some blocks. This is just bad luck, and on a 64-bit system would have to be very bad luck, except for the fact that the kernel accepts data from untrusted user space. There would be all sorts of interesting DoS attacks based on getting the kernel to store some binary data which looks like pointers and exhausting the system memory.

Detecting kernel memory leaks

Posted Jun 27, 2009 0:24 UTC (Sat) by dlang (guest, #313) [Link] (5 responses)

given that a pointer is a 32 bit or 64 bit value, please explain what 'looking like a pointer' means.

Detecting kernel memory leaks

Posted Jun 27, 2009 13:08 UTC (Sat) by nix (subscriber, #2304) [Link]

'Looking like a pointer' = 'has a value which makes it likely to be a
pointer and not a number with some other meaning'.

There are many tricks you can use: value pointing to a writably-mapped
region of the application's address space (very effective on x86-64, of
course), alignment on platforms where that matters...

Detecting kernel memory leaks

Posted Jun 28, 2009 23:29 UTC (Sun) by epa (subscriber, #39769) [Link] (3 responses)

By 'looking like a pointer' I mean 'likely to confuse the garbage collector into thinking it is a pointer to addressable memory'. That then causes any allocated space at that address not to be freed.

Detecting kernel memory leaks

Posted Jun 29, 2009 0:10 UTC (Mon) by dlang (guest, #313) [Link] (2 responses)

if a garbage collector is looking at random bytes and trying to guess if they are pointers or not it's going to guess wrong a lot of the time.

I don't think it's even a requirement that pointers be word aligned in memory, so you would have to look at every 4 byte sequence and calculate if it could be a pointer or not.

that's a _lot_ of calculations to waste your time on, even if you had a chance of being right 100% of the time.

Detecting kernel memory leaks

Posted Jun 29, 2009 6:56 UTC (Mon) by nix (subscriber, #2304) [Link]

The garbage collector doesn't have to do that everywhere, but on the C
stack, it does. It can often rely on alignment requirements because humans
cannot dictate the layout of the stack: only the compiler can. (On many
architectures, e.g. SPARC, they're hard requirements anyway.)

Detecting kernel memory leaks

Posted Jun 29, 2009 7:40 UTC (Mon) by epa (subscriber, #39769) [Link]

if a garbage collector is looking at random bytes and trying to guess if they are pointers or not it's going to guess wrong a lot of the time.

Exactly. Which is why C is not an ideal language for a garbage-collection runtime. More high-level languages provide guarantees of what's a pointer and what isn't, so the GC doesn't have to guess.

Garbage collection in C and C++ can still be useful and fast, but it can never be 100% correct. This may not matter for many applications, but for programs like the kernel which need to be highly secure and must cope with untrusted user input (which could easily contain 32-bit or 64-bit ints carefully chosen to look like pointers), I suspect it does.

I wonder if there is a subset of C which provides the minimal guarantees needed to make garbage collection safe and complete. The main thing would be not to cast pointers to ints and back, nor to allow casting between pointers to things of different sizes, nor arbitrary pointer arithmetic (though arrays could still be provided).

Detecting kernel memory leaks

Posted Jun 22, 2006 9:56 UTC (Thu) by simlo (guest, #10866) [Link]

Hmm, why not just "port" the Boehm-Demers-Weiser conservative garbage collector (http://www.hpl.hp.com/personal/Hans_Boehm/gc/) to kernel space?
I have previously used it - as a garbage collector, not only as leak detector - in userspace with great success.

Detecting kernel memory leaks

Posted Jun 22, 2006 13:11 UTC (Thu) by liljencrantz (guest, #28458) [Link] (2 responses)

One way to avoid some memory leaks is to have better ways to specify memory lifetimes.

The traditional memory allocation lifetimes are:

* 'the life of the current function' for allocations on the stack
* 'until I manually free this memory' for allocations on the heap
* 'when the number of users hits zero' for reference counting

All of these are very usable in and taken together they can be quite versatile.

There is however one other allocation model that I often find very useful; namely to say 'this piece of memory here is needed as long as that piece of memory over there is needed'. The Samba people use this allocation strategy using a function called talloc. I have written my own implementation (halloc) which is a bit different from talloc:

* You can register functions to be called when a specific piece of memory is free'd. That means you can call free on memory not allocated using halloc when a halloc'd piece of memory is freed, which makes using halloc together with libraries and functions not specifically written for halloc much easier.

* Halloc overallocates memory, and uses the extra memory to fullfill smaller memory allocations. That means that doing lots and lots of very small memory allocations very close to free, both in memory usage and CPU time. In fish (http://roo.no-ip.org/fish) only one quarter of all calls to halloc actually result in an underlying call to malloc.

Most of the time, the scope of a halloc allocation is still a single functions lifetime. The major difference, however, is that you can specify _which_ function. You can send along a context as a parameter to another function, and all the memory allocations performed by the called function will use the supplied context as allocation scope. This takes C a huge step closer to the ease of use of automatically garbage collected languages when it comes to memory handling.

At least that has been my experience.

Detecting kernel memory leaks

Posted Jun 24, 2006 12:01 UTC (Sat) by nix (subscriber, #2304) [Link] (1 responses)

The need for the registration function is mostly obviated by notifiers, AIUI. The halloc() function is rendered unnecessary by the slab allocator (that's what slabs are, blocks of memory containing many potential objects of the same type, initialized from a template and allocatable at lightning speed.)

Detecting kernel memory leaks

Posted Jul 19, 2006 15:55 UTC (Wed) by liljencrantz (guest, #28458) [Link]

Slabs are a performance hack to make some allocation patterns faster. This is completely different from halloc, which is a method for making memory deallocation significantly less cumbersome for the programmer. Both have the potential to waste less time and memory, but that is mainly a possible byproduct in the case of halloc.

Notifiers would work as a replacement for the halloc callback functions, but the result would be wordier and need more changes on existing code.

The LWN kernel page leaps from strength to strength :-)

Posted Jun 23, 2006 1:37 UTC (Fri) by xoddam (subscriber, #2322) [Link] (1 responses)

Many thanks to all contributors. A stack of interesting leads from these
articles (not to mention the comments). More, more, don't stop!

The LWN kernel page leaps from strength to strength :-)

Posted Jun 9, 2010 7:46 UTC (Wed) by alexs (guest, #13637) [Link]

whilst the article is about an analysis tool for fixing the sources
a bigger part of the comments talk about "fixing" in a live system.
these are two different shoes.

if you are already pretty close having a 100% sane system
then close the gap and make yourselves happy with the result
instead of "inventing" risky and costly nearly-random effect generators.

having no clue about the root problem wont buy you much benefit in successfully resolving it.

Detecting kernel memory leaks

Posted Jun 5, 2012 14:21 UTC (Tue) by daynix (guest, #84976) [Link]

Nice article, thanks. I'm curious is there any plan to track other kernel objects allocations, like skbuffs, devices, etc. Currently we use LeakMon library for this purpose (http://www.daynix.com/index.php?option=com_content&view=article&id=62&Itemid=71), but it would be nice to have something similar in stock kernel...