Making slab-allocated objects movable

By Jonathan Corbet
April 8, 2019

Memory fragmentation is a constant problem for memory-management subsystems. Over the years, considerable effort has been put into reducing fragmentation in the Linux kernel, but almost all of that work has been focused on memory management at the page level. The slab allocators, which (mostly) manage memory in chunks of less than the page size, have seen less attention, but fragmentation at this level can create problems throughout the system. The slab movable objects patch set posted by Tobin Harding is an attempt to improve this situation by making it possible for the kernel to actively defragment slab pages by moving objects around.

Over the course of normal operation, the kernel allocates (and sometimes frees) vast numbers of small objects in memory. The slab allocators are designed to make these allocation operations efficient; they allocate whole pages, then parcel out the smaller objects from there. Sets of pages ("slabs") are set aside for objects of a fixed size, allowing them to be efficiently packed with a minimum of overhead and waste. Linux users can choose between three slab allocators: the original allocator (simply called "slab"), SLUB (the newer allocator used on most systems), and SLOB (a minimal allocator for the smallest systems).

For a window into how the allocator on a given system is working, one can look at /proc/slabinfo. On your editor's system, for example, there are currently 338,860 active dentry cache entries, each of which requires an object from the slab allocator. A dentry structure is 192 bytes on this system, so 21 of them can be allocated from each full page. Thus, a minimum of 16,137 pages are needed to hold these objects; on the system in question, 16,461 are actually used for that purpose. There are thus over 300 pages allocated beyond what is strictly needed, which is actually a relatively low level of overhead; it can get a lot worse.

When the system runs low on memory, it will call back to owners of various slabs to request that they free objects to make memory available for other use. The recipients of these calls will dutifully free some objects if they can, but this often is not as useful as one would like. The slab allocator can only return a page to the system if all of the objects on that page have been freed. If the dentry cache, for example, frees 100,000 of those 338,860 objects, it will have certainly made the cache hit rate lower, but since those objects may be scattered throughout those 16,461 pages, the number of pages actually freed might be quite small. That can be a significant performance hit for little memory gain, but that is how things work now. A small number of objects can pin a lot of pages that are mostly unused, wasting a lot of memory.

It would be better if the slab allocator could move allocated objects out of slab pages that are mostly empty, freeing those pages while making better use of other pages with free space in them. The problem, of course, is that any such mechanism requires cooperation from whoever is allocating objects from the slab. The owners of those objects access them with direct pointers; if an object is to be moved, any pointers to it will have to be located and changed. That complicates the picture considerably and, for slabs that allocate objects for many callers (those that handle kmalloc(), for example), making objects movable is not really feasible. There are other cases, though, where a single owner exists; for those, getting the owner to move things might just be possible. That is the idea behind the slab movable objects patches, which add object mobility to the SLUB allocator.

To support movable objects, the owner of a slab cache must provide two new functions. The first, called isolate(), has this prototype:

    typedef void *kmem_cache_isolate_func(struct kmem_cache *s, void **objs, int nr);

A call to this function tells the owner that the slab cache would like to relocate nr objects in the cache s, the addresses of which are stored in objs. The objects should not actually be moved at this time, but they should be locked or otherwise frozen so that no other changes are made to them while the process is underway. If it is known that any of the objects cannot be moved, the corresponding pointers can be set to NULL in objs. Should it be necessary to retain any data about this relocation, the function can return a pointer to that data.

The next step is to actually move the objects with the migrate() function:

    typedef void kmem_cache_migrate_func(struct kmem_cache *s, void **objs,
                                         int nr, int node, void *private);

The s, objs, and nr parameters are as with isolate(). The node argument indicates a NUMA node where the objects should be moved to, and private is the pointer that was returned from isolate(). The function should actually move the objects, typically by calling kmem_cache_alloc() to allocate new objects, copying the data over, and updating any internal pointers. The old objects should then be freed. Any locking that was applied to these objects in the isolate() function should, of course, be undone here.

To enable object mobility for a given slab cache, the above functions should be passed to:

    void kmem_cache_setup_mobility(struct kmem_cache *s,
                                   kmem_cache_isolate_func isolate,
                                   kmem_cache_migrate_func migrate);

There is one other requirement for mobility to work: the slab cache must have a constructor associated with it. That is because the kernel might try to relocate objects at any time, and that can require dealing with the data in those objects. If they are not all properly initialized and consistent from the outset, bad things could happen.

The patch set enables object relocation in two subsystems: the dentry cache and the XArray data structure. The dentry cache implementation is relatively simple; rather than try to relocate cache entries, it simply frees those that have been targeted. One might argue that the functionality is similar to how the cache shrinker works now, but there is a difference: the objects to be freed can be chosen to free up specific pages in the slab cache. It should, thus, be rather more efficient. That said, some problems with the current dentry cache implementation were pointed out by Al Viro; some work will need to be done there before this code is ready for the mainline.

The XArray implementation is simpler; there is no need for an isolate() function, so none is provided. The migrate() function is able to take locks and reallocate objects relatively easily without any advance preparation.

Making slab objects movable will not solve the problem of slab-cache fragmentation on its own. But, if applied to the largest caches in the system, it should be able to improve the situation considerably. That, in turn, will make progress on a problem that has affected the memory-management subsystem for years.

Posted Apr 8, 2019 18:52 UTC (Mon) by ikm (guest, #493)

> The dentry cache implementation is relatively simple; rather than try to relocate cache entries, it simply frees those that have been targeted

This looks dubious. This hardcodes the decision that the cost of relocation multiplied by the probability that the node is going to be read later is smaller than a re-read of the entry from disk. I doubt that this is the place to make this sort of decision. Furthermore, this just looks semantically incorrect - if what is asked is relocation, then that is what should be done. Otherwise this could backfire in other possible use cases. For instance, suppose that for some reason the whole allocated space gets defragmented. Then, instead of the actual defragmentation, the whole dentry cache would be flushed, which is a completely different operation.

Posted Apr 9, 2019 7:17 UTC (Tue) by matthias (subscriber, #94967)

Maybe the intended meaning really is: "I need some space, please free these objects. If you cannot, then at least move them out of my view."

These functions will typically be called under memory pressure. Then reallocation cost can be quite high. Also, it should be easy to change the implementation in the future. It does not belong to the userspace interface. If one wants to implement a complete defrag at some time in the future, this should probably be changed. However, I do not really see a reason for a full defrag here. All objects in the dentry cache have the same size, so newly allocated entries can always go into existing holes.

Posted Apr 9, 2019 14:21 UTC (Tue) by Paf (subscriber, #91811)

Reading through the associated email chain, Al makes *very* clear he views dentries and inodes as entirely unrelocatable. As a file system developer, I am inclined to agree. It would be quite difficult to put the necessary hooks to relocate these things into any complex FS, especially when considering the VFS layer accesses that can happen in parallel. Freeing them is pretty tough too, but we’ve got that working because there’s no choice. Adding another complex thing like that doesn’t feel fun or especially worth it...

Posted Apr 11, 2019 18:47 UTC (Thu) by clameter (subscriber, #17005)

The decision to relocate or to just zap an object is up to the slab cache and therefore to the subsystem. So it's up to the one familiar with the subsystem to decide which strategy to get the object out of the way should be implemented. Thus there is no general scheme that could run into problems with one or the other slab cache.

Posted Apr 10, 2019 16:08 UTC (Wed) by jhoblitt (subscriber, #77733)

On my systems, `slabtop` often shows that dentry has a lot of objects but a pretty small size relative to buffer_head and ext4_inode_cache, as in an order of magnitude difference. I didn't read through the entire LKML thread... are there any "benchmark" numbers showing significantly improved reclaim? I could see it going either way as dentry represents a lot of small objects but I have no idea how interleaved those objects are in practice.

Posted Apr 10, 2019 21:06 UTC (Wed) by tcharding (guest, #118945)

I've got the suggestions from Al implemented and working but doing the benchmarking is what I'm stuck on right now. Since the dentry cache is just one piece in this I'm not sure what exactly I'm trying to benchmark to make the FS folks happy. My thought so far is just to show that this set *does not* make the dcache worse and leave the benefit as theoretical (to be seen when more pieces of the puzzle fall into place). Any ideas?

Posted Apr 11, 2019 18:51 UTC (Thu) by clameter (subscriber, #17005)

The typical load that generates a huge amount of sparsely populated inode and dentry slab pages is some kind of scan for files in a huge directory tree followed by an app that makes intensive use of memory and thus puts pressure onto the dentry and inode caches that have grown huge. Continued reclaim from the slab shrinkers will then gradually produce a situation where individual dentries hold back the freeing of a whole 4KB page. Which in turn could prohibit the creation of another 2M page or cause exhaustion of higher order pages that are useful to optimize various kernel subsystems.

Posted Jun 14, 2019 16:27 UTC (Fri) by skc (guest, #51349)

To reclaim more pages, is it possible to cheat with the memory mapping?

If we have a 4k page with only 1 object at offset 0 and another with only one object at offset 1, we can put both on the same physical page, and map this page at both addresses. Users will not notice any difference and we win a page.