Detecting kernel memory leaks
Tracking down memory leaks can be painful work. When a proprietary memory allocation tracking tool became available for SunOS many years ago, your editor had no qualms about spending thousands of his employer's dollars to license it; the payback time was quite short. In current times, Linux users can employ a free tool like valgrind (version 3.2.0 was released on June 8) to track down user-space memory leaks. But valgrind does not work on a running kernel. (Some work has been done on running User-mode Linux under valgrind, but sometimes one simply has to debug the host system).
As the kernel developers rely more heavily on automated tools for finding bugs, the creation of a kernel memory leak detector is an obvious next step. Catalin Marinas has taken that step with a kernel memory leak detector patch series. This code, if accepted into the kernel, should help to eliminate another big class of errors.
Catalin's patch functions much like a scan-and-mark garbage collector. The first step is to track every memory allocation in the system; to that end, the patch instruments the slab allocator. Every block allocated from a slab (which will include allocations from kmalloc()) is stored in a radix tree; along with a pointer to the block, the stored information includes the block size and a stack trace identifying where the block was allocated. When blocks are freed, their corresponding entries are removed from the radix tree.
During normal system operation, this radix tree just sits there. Should somebody ask about memory leaks (by reading /sys/kernel/debug/memleak), the detection algorithm swings into action. The steps performed are:
- A big list is created holding every outstanding memory allocation in
the system. This list is called the "white" list; everything on it is
considered to be a possible memory leak.
- Various parts of memory are scanned for pointers which match the
allocated blocks; every time such a pointer is found, the block is
moved to the "gray" list of memory which is still reachable, and thus
not leaked. The initial scan includes the kernel's static data areas,
each process's kernel stack, and each processor's per-CPU variable
data area.
- The first scan finds all memory referenced directly from static
memory, but kernel data structures are more complicated than that.
So, each block which has been put onto the gray list is scanned as
well. Most of these blocks will be structures allocated from a slab
cache, and they may contain pointers to other structures. So each
block is queried, paying attention to that block's remembered size.
Any pointers found within the block are moved over to the gray list,
and scanned in turn.
There is, of course, a provision for remembering which blocks have
been scanned and avoiding infinite loops.
- Once all pointers on the gray list have been scanned, every block of memory reachable by the kernel has been located. Anything remaining on the white list is considered to be leaked, and the relevant information is sent back to user space.
In the real world, things get complicated, so the leak detector is not quite as simple as described above. One situation which had to be addressed is cases where the kernel keeps a pointer to the interior of a block of memory, rather than to the beginning. This happens frequently; many kernel structures are located by way of an embedded list_head structure or kobject, for example. As a way of locating these blocks, the memory leak detector records uses of the container_of() macro; in particular, it remembers the size of the block and the offset to the embedded structure. When a block of a given size is allocated, the detector records "alias" addresses for any possible embedded structures. A pointer to one of those aliases is considered to be equivalent to a pointer to the beginning of the block.
There are various other special cases which must be handled. For example, memory obtained from vmalloc() will be pointed to by the memory allocation code itself, but might still be leaked. In other cases, memory is allocated which cannot be found by the scanning algorithm; a number of special annotations are added to the kernel to suppress the resulting false positive reports. The detector can also be fooled by pointers which are left behind in disused memory, or by random data which happens to look like a pointer to an allocated block; in these cases, false-negatives will result.
Even with these problems, the situation is better than before - a lot of
memory leak situations can be found. Ingo Molnar, however, has a vision of a more ambitious scheme wherein
type information for every allocated block would be retained. Among other
things, this information would allow the scanning to be restricted to parts
of the block known to contain pointers; that should speed the process and
reduce false negatives. Since type information is available, each scanned
pointer could be checked to ensure that it points to a block of the correct
type, adding another level of checking to the kernel. Implementing all of
this looks like a big task, however; even Ingo may need a couple of days to
get it done.
Posted Jun 22, 2006 8:58 UTC (Thu)
by kleptog (subscriber, #1183)
[Link] (2 responses)
Posted Jun 22, 2006 9:39 UTC (Thu)
by mingo (guest, #31122)
[Link] (1 responses)
We should also note that prior Coverity checking clean-sweeped some of the 'easy' memory-leaks, artificially reducing the visible efficiency of this GPL-ed tool.
(It could be tested by intentionally re-introducing some of the Coverity fixes but that's hard and laborous because not all Coverity fixes are identified as such and the code might have changed meanwhile.)
Posted Jul 27, 2006 8:41 UTC (Thu)
by cmarinas (subscriber, #39468)
[Link]
There is no need to re-introduce the bugs found by Coverity as kmemleak managed to find some real leaks (I think about 4-5 at the moment - ACPI, i386 setup, SE Linux and a case of vfree used instead of vunmap in the ARM code) which Coverity wouldn't have been able to see because of the code complexity.
Coverity, however, can find potential leaks before they actually happen since it does a static analysis of the code.
Posted Jun 22, 2006 9:34 UTC (Thu)
by flewellyn (subscriber, #5047)
[Link] (11 responses)
Which brings to mind the question: should the kernel folks just bite the bullet and implement an
Posted Jun 23, 2006 4:40 UTC (Fri)
by cpeterso (guest, #305)
[Link] (1 responses)
Posted Jun 24, 2006 11:59 UTC (Sat)
by nix (subscriber, #2304)
[Link]
Posted Jul 27, 2006 8:49 UTC (Thu)
by cmarinas (subscriber, #39468)
[Link] (8 responses)
Apart from the technical difficulties of a complete garbage collector in the kernel (big overhead, SMP systems, "stop the world scanning", pointer aliasing - see the container_of workaround in kmemleak), I am personally against it as it would make people lazier in dealing with memory resources.
Posted Jun 18, 2009 16:55 UTC (Thu)
by proski (subscriber, #104)
[Link]
Posted Jun 25, 2009 12:18 UTC (Thu)
by epa (subscriber, #39769)
[Link] (6 responses)
Posted Jun 27, 2009 0:24 UTC (Sat)
by dlang (guest, #313)
[Link] (5 responses)
Posted Jun 27, 2009 13:08 UTC (Sat)
by nix (subscriber, #2304)
[Link]
There are many tricks you can use: value pointing to a writably-mapped
Posted Jun 28, 2009 23:29 UTC (Sun)
by epa (subscriber, #39769)
[Link] (3 responses)
Posted Jun 29, 2009 0:10 UTC (Mon)
by dlang (guest, #313)
[Link] (2 responses)
I don't think it's even a requirement that pointers be word aligned in memory, so you would have to look at every 4 byte sequence and calculate if it could be a pointer or not.
that's a _lot_ of calculations to waste your time on, even if you had a chance of being right 100% of the time.
Posted Jun 29, 2009 6:56 UTC (Mon)
by nix (subscriber, #2304)
[Link]
Posted Jun 29, 2009 7:40 UTC (Mon)
by epa (subscriber, #39769)
[Link]
Garbage collection in C and C++ can still be useful and fast, but it can never be 100% correct. This may not matter for many applications, but for programs like the kernel which need to be highly secure and must cope with untrusted user input (which could easily contain 32-bit or 64-bit ints carefully chosen to look like pointers), I suspect it does.
I wonder if there is a subset of C which provides the minimal guarantees needed to make garbage collection safe and complete. The main thing would be not to cast pointers to ints and back, nor to allow casting between pointers to things of different sizes, nor arbitrary pointer arithmetic (though arrays could still be provided).
Posted Jun 22, 2006 9:56 UTC (Thu)
by simlo (guest, #10866)
[Link]
Posted Jun 22, 2006 13:11 UTC (Thu)
by liljencrantz (guest, #28458)
[Link] (2 responses)
The traditional memory allocation lifetimes are:
* 'the life of the current function' for allocations on the stack
All of these are very usable in and taken together they can be quite versatile.
There is however one other allocation model that I often find very useful; namely to say 'this piece of memory here is needed as long as that piece of memory over there is needed'. The Samba people use this allocation strategy using a function called talloc. I have written my own implementation (halloc) which is a bit different from talloc:
* You can register functions to be called when a specific piece of memory is free'd. That means you can call free on memory not allocated using halloc when a halloc'd piece of memory is freed, which makes using halloc together with libraries and functions not specifically written for halloc much easier.
* Halloc overallocates memory, and uses the extra memory to fullfill smaller memory allocations. That means that doing lots and lots of very small memory allocations very close to free, both in memory usage and CPU time. In fish (http://roo.no-ip.org/fish) only one quarter of all calls to halloc actually result in an underlying call to malloc.
Most of the time, the scope of a halloc allocation is still a single functions lifetime. The major difference, however, is that you can specify _which_ function. You can send along a context as a parameter to another function, and all the memory allocations performed by the called function will use the supplied context as allocation scope. This takes C a huge step closer to the ease of use of automatically garbage collected languages when it comes to memory handling.
At least that has been my experience.
Posted Jun 24, 2006 12:01 UTC (Sat)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Jul 19, 2006 15:55 UTC (Wed)
by liljencrantz (guest, #28458)
[Link]
Notifiers would work as a replacement for the halloc callback functions, but the result would be wordier and need more changes on existing code.
Posted Jun 23, 2006 1:37 UTC (Fri)
by xoddam (guest, #2322)
[Link] (1 responses)
Posted Jun 9, 2010 7:46 UTC (Wed)
by alexs (guest, #13637)
[Link]
if you are already pretty close having a 100% sane system
having no clue about the root problem wont buy you much benefit in successfully resolving it.
Posted Jun 5, 2012 14:21 UTC (Tue)
by daynix (guest, #84976)
[Link]
The question ofcourse is: has it found anything yet?Detecting kernel memory leaks
yep, even in its current (very) limited testing kmemleak found a real ACPI memory leak, with some more suspected ACPI leaks being analyzed.
Detecting kernel memory leaks
(a bit late with replying but I only saw the article now)Detecting kernel memory leaks
That algorithm is strikingly similar to tricolor marking, an incremental garbage collection Detecting kernel memory leaks
algorithm invented by Djkstra.
in-kernel GC? This patch is halfway there already.
I remember reading a LKML comment by Linus that he would never add a garbage collector to the kernel. Maybe one day, he'll eat those words. :-) I definitely think it's worth investigating, especially given these new memleak patches.Detecting kernel memory leaks
net/unix/garbage.c already contains a mark-and-sweep garbage collector: garbage collection is a requirement for AF_UNIX socket handling unless you like DoSes (which are easy: a program sets up an AF_UNIX socket, opens it itself but does not go into accept(), sends the socket's filehandle along the socket, closes both ends, and quits. Oops. One handle leaked, unless you detect the filehandles in transit along the the AF_UNIX socket, which means mark-and-sweep collection given that these fhes may be the fhes of other sockets which may themselves have filehandles in transit along them...)Detecting kernel memory leaks
Indeed, the algorithm is pretty close to a garbage collection one, only that it doesn't try to free the orphan blocks.Detecting kernel memory leaks
Actually, I'd rather see driver developers worry about aspects specific to the drivers rather than about memory. If the kernel starts consuming too much memory, the biggest users can be found and modified to use less memory.
Detecting kernel memory leaks
Detecting kernel memory leaks
Detecting kernel memory leaks
Detecting kernel memory leaks
pointer and not a number with some other meaning'.
region of the application's address space (very effective on x86-64, of
course), alignment on platforms where that matters...
Detecting kernel memory leaks
Detecting kernel memory leaks
Detecting kernel memory leaks
stack, it does. It can often rely on alignment requirements because humans
cannot dictate the layout of the stack: only the compiler can. (On many
architectures, e.g. SPARC, they're hard requirements anyway.)
Detecting kernel memory leaks
if a garbage collector is looking at random bytes and trying to guess if they are pointers or not it's going to guess wrong a lot of the time.
Exactly. Which is why C is not an ideal language for a garbage-collection runtime. More high-level languages provide guarantees of what's a pointer and what isn't, so the GC doesn't have to guess.
Hmm, why not just "port" the Boehm-Demers-Weiser conservative garbage collector (http://www.hpl.hp.com/personal/Hans_Boehm/gc/) to kernel space?Detecting kernel memory leaks
I have previously used it - as a garbage collector, not only as leak detector - in userspace with great success.
One way to avoid some memory leaks is to have better ways to specify memory lifetimes. Detecting kernel memory leaks
* 'until I manually free this memory' for allocations on the heap
* 'when the number of users hits zero' for reference counting
The need for the registration function is mostly obviated by notifiers, AIUI. The halloc() function is rendered unnecessary by the slab allocator (that's what slabs are, blocks of memory containing many potential objects of the same type, initialized from a template and allocatable at lightning speed.)Detecting kernel memory leaks
Slabs are a performance hack to make some allocation patterns faster. This is completely different from halloc, which is a method for making memory deallocation significantly less cumbersome for the programmer. Both have the potential to waste less time and memory, but that is mainly a possible byproduct in the case of halloc.Detecting kernel memory leaks
Many thanks to all contributors. A stack of interesting leads from these The LWN kernel page leaps from strength to strength :-)
articles (not to mention the comments). More, more, don't stop!
The LWN kernel page leaps from strength to strength :-)
a bigger part of the comments talk about "fixing" in a live system.
these are two different shoes.
then close the gap and make yourselves happy with the result
instead of "inventing" risky and costly nearly-random effect generators.
Nice article, thanks.
I'm curious is there any plan to track other kernel objects allocations, like skbuffs, devices, etc.
Currently we use LeakMon library for this purpose (http://www.daynix.com/index.php?option=com_content&view=article&id=62&Itemid=71), but it would be nice to have something similar in stock kernel...
Detecting kernel memory leaks