LWN.net Logo

Slab defragmentation

Memory defragmentation is a subject which has appeared often on this page - even if no solutions have yet found their way into the mainline kernel. Most of the defragmentation approaches out there work at the page level with the idea of being able to satisfy multi-page allocations reliably. There is another type of fragmentation problem, however, which also has the ability to complicate the kernel's memory management: fragmentation within slab pages.

The slab allocator grabs full pages and divides them into allocations of the same size. For example, kernel code which will often allocate a specific structure type will create a slab for that type, allowing those allocations to be satisfied quickly and efficiently. The slab allocator can release pages back to the kernel when all of the objects within those pages have been freed. In real use, however, objects tend to get spread across many pages, leaving the allocator with a pile of partially-used pages and no way to return memory to the system. This sort of internal fragmentation can lead to inefficient memory usage and the inability to reclaim memory when it is needed.

Christoph Lameter's slab defragmentation patch aims to solve this problem by getting slab users to cooperate in freeing specific slab pages. A defragmentation-aware slab user will start by creating a structure of the new kmem_cache_ops type:

    struct kmem_cache_ops {
	void *(*get)(struct kmem_cache *cache, int nr, void **objects);
	void (*kick)(struct kmem_cache *cache, int nr, void **objects, 
                     void *private);
    };

In this structure are two methods which the slab user must define. When the slab code picks a specific page to try to free (typically a page with a relatively small number of allocated objects), it will make an array of those objects and pass it to the get() method. That method has a guarantee that all of the objects are allocated at the time of the call; its job is to increase the reference count of each object to prevent it from being freed while other things are happening. The return value is a private pointer which will be used later.

Note that the get() method is called in something like interrupt context with slab locks held. So it cannot do a whole lot, and, in particular, it cannot call any slab operations.

After get() returns, the slab code will pass the same parameters into kick(), along with whatever value get() returned. Depending on the situation, the private value could be a pointer to internal housekeeping or simply a flag saying that it will not be possible to free all of the objects. Assuming it is possible, kick() should attempt to free every object in the objects array. Slab operations are permissible in kick(), and the function is welcome to reallocate and move the objects. Reallocation will have the effect of freeing the target page and coalescing objects into a smaller number of fully-used pages.

There is no return value from kick(); the slab code simply checks to see if there are any remaining objects on the page to decide whether the operation succeeded or not. It is perfectly acceptable for the operation to fail; that will happen, for example, if code in other parts of the kernel holds references to the target objects.

The slab creation function has had its API changed to allow the association of a set of operations with a given cache:

    struct kmem_cache *kmem_cache_create(const char *name, size_t size, 
                      size_t align, unsigned long flags,
 		      void (*ctor)(void *, struct kmem_cache *, unsigned long),
		      const struct kmem_cache_ops *ops);

The destructor is no longer used, so it has been removed from the list of kmem_cache_create() parameters and replaced by the ops structure.

The patch includes code to add defragmentation support for the inode and dentry caches - often the two largest slab caches in a running system. There is also a new function:

    int kmem_cache_vacate(struct page *page);

This function will attempt to move all slab objects out of page, which really should be a page managed by the slab allocator; a non-zero return value indicates success. Among other things, this function can be used to clear specific pages which would help complete a higher-order allocation.

There has been relatively little discussion of this patch set; the core concept appears not to be overly controversial. It looks like a relatively low-overhead way to improve how the kernel uses memory; even the most critical reviewer can have a hard time getting upset about that.


(Log in to post comments)

No clue about kernels, but why not garbage collection?

Posted Jun 11, 2007 5:17 UTC (Mon) by shapr (subscriber, #9077) [Link]

I'm aware that I have no clue about kernels, but ...

Moving slab objects from one page to another makes me think of 'moving' garbage collectors.

Has anyone considered using a garbage collector in the Linux kernel? That would be one easy way to defragment memory automatically.

Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds