Making slab-allocated objects movable
Over the course of normal operation, the kernel allocates (and sometimes frees) vast numbers of small objects in memory. The slab allocators are designed to make these allocation operations efficient; they allocate whole pages, then parcel out the smaller objects from there. Sets of pages ("slabs") are set aside for objects of a fixed size, allowing them to be efficiently packed with a minimum of overhead and waste. Linux users can choose between three slab allocators: the original allocator (simply called "slab"), SLUB (the newer allocator used on most systems), and SLOB (a minimal allocator for the smallest systems).
For a window into how the allocator on a given system is working, one can look at /proc/slabinfo. On your editor's system, for example, there are currently 338,860 active dentry cache entries, each of which requires an object from the slab allocator. A dentry structure is 192 bytes on this system, so 21 of them can be allocated from each full page. Thus, a minimum of 16,137 pages is needed to hold these objects; on the system in question, 16,461 are actually used for that purpose. There are thus over 300 pages allocated beyond what is strictly needed, which is actually a relatively low level of overhead; it can get a lot worse.
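The arithmetic in the paragraph above can be checked with a few lines of C. This is only a sketch: the 4096-byte page size is an assumption (it matches the 21-objects-per-page figure in the text), and the names SLAB_PAGE_BYTES, objs_per_page(), and min_pages() are invented for illustration rather than taken from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB pages, as on most x86 systems. */
#define SLAB_PAGE_BYTES 4096ul

/* Whole objects that fit in one page; leftover bytes are simply wasted. */
unsigned long objs_per_page(unsigned long obj_size)
{
    return SLAB_PAGE_BYTES / obj_size;
}

/* Fewest pages that can hold nr_objs objects, rounding up so that a
 * partially filled last page is counted. */
unsigned long min_pages(unsigned long nr_objs, unsigned long obj_size)
{
    unsigned long per_page = objs_per_page(obj_size);

    return (nr_objs + per_page - 1) / per_page;
}

/* Pages in use beyond the theoretical minimum. */
unsigned long excess_pages(unsigned long pages_in_use,
                           unsigned long nr_objs,
                           unsigned long obj_size)
{
    return pages_in_use - min_pages(nr_objs, obj_size);
}
```

Calling objs_per_page(192) gives the 21 objects per page mentioned above, and excess_pages(16461, 338860, 192) gives the "over 300" surplus pages.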
When the system runs low on memory, it will call back to owners of various slabs to request that they free objects to make memory available for other use. The recipients of these calls will dutifully free some objects if they can, but this often is not as useful as one would like. The slab allocator can only return a page to the system if all of the objects on that page have been freed. If the dentry cache, for example, frees 100,000 of those 338,860 objects, it will have certainly made the cache hit rate lower, but since those objects may be scattered throughout those 16,461 pages, the number of pages actually freed might be quite small. That can be a significant performance hit for little memory gain, but that is how things work now. A small number of objects can pin a lot of pages that are mostly unused, wasting a lot of memory.
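The effect described above can be illustrated with a small user-space simulation. It is a toy model, not kernel code: it assumes every page starts out full, uses a simple linear congruential generator in place of real cache-eviction behavior, and all of the names here are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

#define OBJS_PER_PAGE 21ul   /* matches the dentry example in the text */

/* Tiny deterministic generator so that runs are repeatable. */
static unsigned int lcg_next(unsigned int *state)
{
    *state = *state * 1664525u + 1013904223u;
    return *state >> 8;      /* drop the weakest low bits */
}

/* Free nr_free objects picked pseudo-randomly from nr_pages full pages,
 * then count the pages that ended up completely empty. */
unsigned long pages_freed_scattered(unsigned long nr_pages,
                                    unsigned long nr_free,
                                    unsigned int seed)
{
    unsigned long total = nr_pages * OBJS_PER_PAGE;
    unsigned long done = 0, empty = 0;
    unsigned char *freed;

    /* The generator yields 24-bit values, so keep the model small. */
    if (total == 0 || total > (1ul << 24) || nr_free > total)
        return 0;
    freed = calloc(total, 1);
    if (!freed)
        return 0;

    while (done < nr_free) {
        unsigned long victim = lcg_next(&seed) % total;

        if (!freed[victim]) {
            freed[victim] = 1;
            done++;
        }
    }

    for (unsigned long page = 0; page < nr_pages; page++) {
        unsigned long gone = 0;

        for (unsigned long i = 0; i < OBJS_PER_PAGE; i++)
            gone += freed[page * OBJS_PER_PAGE + i];
        if (gone == OBJS_PER_PAGE)
            empty++;
    }
    free(freed);
    return empty;
}

/* Best case: the same number of frees aimed at whole pages. */
unsigned long pages_freed_targeted(unsigned long nr_free)
{
    return nr_free / OBJS_PER_PAGE;
}
```

In this model, freeing 6,000 objects from 1,000 full pages can release at most 285 pages when the frees are aimed at whole pages; the scattered case can never do better than that, and in practice it should release far fewer.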
It would be better if the slab allocator could move allocated objects out of slab pages that are mostly empty, freeing those pages while making better use of other pages with free space in them. The problem, of course, is that any such mechanism requires cooperation from whoever is allocating objects from the slab. The owners of those objects access them with direct pointers; if an object is to be moved, any pointers to it will have to be located and changed. That complicates the picture considerably and, for slabs that allocate objects for many callers (those that handle kmalloc(), for example), making objects movable is not really feasible. There are other cases, though, where a single owner exists; for those, getting the owner to move things might just be possible. That is the idea behind the slab movable objects patch set, which adds object mobility to the SLUB allocator.
To support movable objects, the owner of a slab cache must provide two new functions. The first, called isolate(), has this prototype:
typedef void *kmem_cache_isolate_func(struct kmem_cache *s, void **objs, int nr);
A call to this function tells the owner that the slab cache would like to relocate nr objects in the cache s, the addresses of which are stored in objs. The objects should not actually be moved at this time, but they should be locked or otherwise frozen so that no other changes are made to them while the process is underway. If it is known that any of the objects cannot be moved, their pointers can be zeroed out in objs. Should it be necessary to retain any data about this relocation, the function can return a pointer to that data.
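As a rough illustration of that contract, here is a user-space mock of an isolate() callback. Only the prototype comes from the patch set; struct toy_obj, its refcount and frozen fields, and the rule that a referenced object cannot be moved are all assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct kmem_cache;  /* opaque stand-in; the real type lives in the kernel */

typedef void *kmem_cache_isolate_func(struct kmem_cache *s, void **objs, int nr);

/* Hypothetical object type, invented for this example. */
struct toy_obj {
    int  refcount;  /* assumed rule: > 0 means the object is in use */
    bool frozen;    /* stands in for whatever locking the owner uses */
    int  payload;
};

/* Declaring through the typedef makes the compiler check the signature. */
kmem_cache_isolate_func toy_isolate;

void *toy_isolate(struct kmem_cache *s, void **objs, int nr)
{
    (void)s;  /* unused in this mock */

    for (int i = 0; i < nr; i++) {
        struct toy_obj *obj = objs[i];

        if (!obj)
            continue;
        if (obj->refcount > 0) {
            objs[i] = NULL;     /* cannot be moved: zero its slot */
            continue;
        }
        obj->frozen = true;     /* freeze it, but do not move it yet */
    }
    return NULL;  /* this mock keeps no private data for migrate() */
}
```

A real callback would take actual locks rather than set a flag, and might return a pointer to bookkeeping data that is later handed to migrate().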
The next step is to actually move the objects with the migrate() function:
typedef void kmem_cache_migrate_func(struct kmem_cache *s, void **objs, int nr, int node, void *private);
The s, objs, and nr parameters are as with isolate(). The node argument indicates a NUMA node where the objects should be moved to, and private is the pointer that was returned from isolate(). The function should actually move the objects, typically by calling kmem_cache_alloc() to allocate new objects, copying the data over, and updating any internal pointers. The old objects should then be freed. Any locking that was applied to these objects in the isolate() function should, of course, be undone here.
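The allocate, copy, update, free sequence described above might look roughly like the following user-space mock. Only the prototype is from the patch set; struct toy_obj and its owner_slot back-pointer are hypothetical, and malloc() and free() stand in for kmem_cache_alloc() and kmem_cache_free().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct kmem_cache;  /* opaque stand-in; the real type lives in the kernel */

typedef void kmem_cache_migrate_func(struct kmem_cache *s, void **objs,
                                     int nr, int node, void *private);

/* Hypothetical object type: exactly one pointer refers to each object,
 * and the object records where that pointer is stored. */
struct toy_obj {
    struct toy_obj **owner_slot;
    bool frozen;    /* assumed to have been set by isolate() */
    int payload;
};

kmem_cache_migrate_func toy_migrate;

void toy_migrate(struct kmem_cache *s, void **objs, int nr, int node,
                 void *private)
{
    /* A real implementation would presumably use node for a node-aware
     * allocation; this mock ignores it. */
    (void)s; (void)node; (void)private;

    for (int i = 0; i < nr; i++) {
        struct toy_obj *old = objs[i];
        struct toy_obj *copy;

        if (!old)
            continue;               /* marked unmovable by isolate() */

        copy = malloc(sizeof(*copy));   /* stand-in for kmem_cache_alloc() */
        if (!copy) {
            old->frozen = false;    /* leave it in place, but unfreeze */
            continue;
        }
        *copy = *old;               /* copy the data over */
        copy->frozen = false;       /* undo the isolate() step */
        *copy->owner_slot = copy;   /* redirect the owner's pointer */
        free(old);                  /* stand-in for kmem_cache_free() */
    }
}
```

The single back-pointer is what makes this mock easy; it mirrors the point made earlier that relocation is only practical when the owner can find and update every pointer to an object.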
To enable object mobility for a given slab cache, the above functions should be passed to:
void kmem_cache_setup_mobility(struct kmem_cache *s, kmem_cache_isolate_func isolate, kmem_cache_migrate_func migrate);
There is one other requirement for mobility to work: the slab cache must have a constructor associated with it. That is because the kernel might try to relocate objects at any time, and that can require dealing with the data in those objects. If they are not all properly initialized and consistent from the outset, bad things could happen.
The patch set enables object relocation in two subsystems: the dentry cache and the XArray data structure. The dentry cache implementation is relatively simple; rather than try to relocate cache entries, it simply frees those that have been targeted. One might argue that the functionality is similar to how the cache shrinker works now, but there is a difference: the objects to be freed can be chosen to free up specific pages in the slab cache. It should, thus, be rather more efficient. That said, some problems with the current dentry cache implementation were pointed out by Al Viro; some work will need to be done there before this code is ready for the mainline.
The XArray implementation is simpler; there is no need for an isolate() function, so none is provided. The migrate() function is able to take locks and reallocate objects relatively easily without any advance preparation.
Making slab objects movable will not solve the problem of slab-cache fragmentation on its own. But, if applied to the largest caches in the system, it should be able to improve the situation considerably. That, in turn, will make progress on a problem that has affected the memory-management subsystem for years.
Index entries for this article | |
---|---|
Kernel | Memory management/Slab allocators |
Posted Apr 8, 2019 18:52 UTC (Mon)
by ikm (guest, #493)
This looks dubious. This hardcodes the decision that the cost of relocation multiplied by the probability that the node is going to be read later is smaller than a re-read of the entry from disk. I doubt that this is the place to make this sort of decision. Furthermore, this just looks semantically incorrect: if what is asked is relocation, then that is what should be done. Otherwise this could backfire in other possible use cases. For instance, suppose that for some reason the whole allocated space gets defragmented. Then, instead of the actual defragmentation, the whole dentry cache would be flushed, which is a completely different operation.
Posted Apr 9, 2019 7:17 UTC (Tue)
by matthias (subscriber, #94967)
These functions will typically be called under memory pressure, when the cost of reallocation can be quite high. Also, it should be easy to change the implementation in the future, since it is not part of the userspace interface. If one wants to implement a complete defrag at some time in the future, this should probably be changed. However, I do not really see a reason for a full defrag here. All objects in the dentry cache have the same size, so newly allocated entries can always go into existing holes.
Posted Apr 9, 2019 14:21 UTC (Tue)
by Paf (subscriber, #91811)
Posted Apr 11, 2019 18:47 UTC (Thu)
by clameter (subscriber, #17005)
Posted Apr 10, 2019 16:08 UTC (Wed)
by jhoblitt (subscriber, #77733)
Posted Apr 10, 2019 21:06 UTC (Wed)
by tcharding (guest, #118945)
Posted Apr 11, 2019 18:51 UTC (Thu)
by clameter (subscriber, #17005)
Posted Jun 14, 2019 16:27 UTC (Fri)
by skc (guest, #51349)
If we have a 4k page with only one object at offset 0 and another with only one object at offset 1, we can put both on the same physical page and map this page at both addresses. Users will not notice any difference, and we save a page.