Slab defragmentation
[Posted May 29, 2007 by corbet]
Memory defragmentation is a subject which has appeared often on this page -
even if no solutions have yet found their way into the mainline kernel.
Most of the defragmentation approaches out there work at the page level
with the idea of being able to satisfy multi-page allocations reliably.
There is another type of fragmentation problem, however, which also has the
ability to complicate the kernel's memory management: fragmentation within
slab pages.
The slab allocator grabs full pages and divides them into allocations of
the same size. For example, kernel code which will often allocate a
specific structure type will create a slab for that type, allowing those
allocations to be satisfied quickly and efficiently. The slab allocator
can release pages back to the kernel when all of the objects within those
pages have been freed. In real use, however, objects tend to get spread
across many pages, leaving the allocator with a pile of partially-used
pages and no way to return memory to the system. This sort of internal
fragmentation can lead to inefficient memory usage and the inability to
reclaim memory when it is needed.
Christoph Lameter's slab
defragmentation patch aims to solve this problem by getting slab users
to cooperate in freeing specific slab pages. A defragmentation-aware slab
user will start by creating a structure of the new kmem_cache_ops
type:
struct kmem_cache_ops {
void *(*get)(struct kmem_cache *cache, int nr, void **objects);
void (*kick)(struct kmem_cache *cache, int nr, void **objects,
void *private);
};
In this structure are two methods which the slab user must define. When
the slab code picks a specific page to try to free (typically a page with a
relatively small number of allocated objects), it will make an array of
those objects and pass it to the get() method. That method has a
guarantee that all of the objects are allocated at the time of the call;
its job is to increase the reference count of each object to prevent it
from being freed while other things are happening. The return value is a
private pointer which will be used later.
Note that the get() method is called in something like interrupt
context with slab locks held. So it cannot do a whole lot, and, in
particular, it cannot call any slab operations.
After get() returns, the slab code will pass the same parameters
into kick(), along with whatever value get() returned.
Depending on the situation, the private value could be a pointer
to internal housekeeping or simply a flag saying that it will not be
possible to free all of the objects. Assuming it is possible,
kick() should attempt to free every object in the objects
array. Slab operations are permissible in kick(), and the
function is welcome to reallocate and move the objects. Reallocation will
have the effect of freeing the target page and coalescing objects into a
smaller number of fully-used pages.
There is no return value from kick(); the slab code simply checks
to see if there are any remaining objects on the page to decide whether the
operation succeeded or not. It is perfectly acceptable for the operation
to fail; that will happen, for example, if code in other parts of the
kernel holds references to the target objects.
The slab creation function has had its API changed to allow the association
of a set of operations with a given cache:
struct kmem_cache *kmem_cache_create(const char *name, size_t size,
size_t align, unsigned long flags,
void (*ctor)(void *, struct kmem_cache *, unsigned long),
const struct kmem_cache_ops *ops);
The destructor is no longer used, so it has been removed from the list of
kmem_cache_create() parameters and replaced by the ops
structure.
The patch includes code to add defragmentation support for the inode and
dentry caches - often the two largest slab caches in a running system.
There is also a new function:
int kmem_cache_vacate(struct page *page);
This function will attempt to move all slab objects out of page,
which really should be a page managed by the slab allocator; a non-zero
return value indicates success. Among other things, this function can be
used to clear specific pages which would help complete a higher-order
allocation.
There has been relatively little discussion of this patch set; the core
concept appears not to be overly controversial. It looks like a relatively
low-overhead way to improve how the kernel uses memory; even the most
critical reviewer can have a hard time getting upset about that.
(
Log in to post comments)