
Slabs, sheaves, and barns

By Jonathan Corbet
February 24, 2025
The kernel's slab allocator is responsible for the allocation of small (usually sub-page) chunks of memory. For many workloads, the speed of object allocation and freeing is one of the key factors in overall performance, so it is not surprising that a lot of effort has gone into optimizing the slab allocator over time. Now that the kernel is down to a single slab allocator, the memory-management developers have free rein to add complexity to it; the latest move in that direction is the per-CPU sheaves patch set from slab maintainer Vlastimil Babka.

Many kernel developers interact with the slab allocator using functions like kmalloc(), which can allocate objects of any (reasonable) size. There is a lower level to the slab allocator, though, that deals with fixed-size objects; it is used heavily by subsystems that frequently allocate and free objects of the same size. The curious can see all of the special-purpose slabs in their system by looking at /proc/slabinfo. There are many core-kernel operations that involve allocating objects from these slabs and returning them, so the slab allocator has gained a number of features, including NUMA awareness and bulk operations, to accelerate allocation and freeing.
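As a minimal sketch of that lower level (struct my_cred_info and the surrounding names are invented for the example), a subsystem might create and use its own fixed-size cache like this:

    #include <linux/init.h>
    #include <linux/slab.h>

    /* A hypothetical fixed-size structure used only to illustrate the API. */
    struct my_cred_info {
            unsigned int uid;
            unsigned int flags;
            char label[64];
    };

    static struct kmem_cache *my_cred_cache;

    static int __init my_cred_cache_init(void)
    {
            /* Create a dedicated cache; it will appear in /proc/slabinfo. */
            my_cred_cache = kmem_cache_create("my_cred_info",
                                              sizeof(struct my_cred_info),
                                              0, SLAB_HWCACHE_ALIGN, NULL);
            return my_cred_cache ? 0 : -ENOMEM;
    }

    static struct my_cred_info *my_cred_alloc(void)
    {
            /* Allocate one zeroed object from the dedicated cache. */
            return kmem_cache_zalloc(my_cred_cache, GFP_KERNEL);
    }

    static void my_cred_free(struct my_cred_info *info)
    {
            /* Return the object to the cache for reuse. */
            kmem_cache_free(my_cred_cache, info);
    }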

But, naturally, it's never fast enough.

One of the keys to improved performance on today's multiprocessor systems is to avoid interaction between the CPUs whenever possible. Interactions lead to contention, locking delays, and cache-line bouncing, all of which hurt performance, but a CPU that can act as if the rest of the system were not there can go forward at full speed. This understanding has driven the adoption of per-CPU data structures across the kernel. The slab allocator already makes use of per-CPU data, but it still has enough cross-CPU interactions to slow things down.

Sheaves are a concept introduced by Babka's patch set; in essence, they are a per-CPU cache of objects that can be handed out in response to allocation requests without the need to interact with any other CPUs in the system. By default, sheaves are disabled, but they can be enabled for a specific slab by setting a non-zero value in the new field sheaf_capacity in the kmem_cache_args structure passed to kmem_cache_create(). The value is the number of objects that should be cached in a single sheaf; the patch adding sheaf usage to the maple-tree data structure sets it to 32.
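For illustration only, creating a sheaf-enabled cache might look like the following sketch; it assumes the kmem_cache_args form of kmem_cache_create() found in current kernels, the sheaf_capacity field comes from the patch set, and struct my_node is an invented stand-in for a real structure like the maple node:

    #include <linux/slab.h>

    struct my_node {                /* invented stand-in for a real structure */
            unsigned long key;
            void *slots[16];
    };

    static struct kmem_cache *my_node_cache;

    static int my_node_cache_init(void)
    {
            struct kmem_cache_args args = {
                    /* Cache up to 32 objects per sheaf, matching the value
                     * used by the maple-tree conversion. */
                    .sheaf_capacity = 32,
            };

            my_node_cache = kmem_cache_create("my_node",
                                              sizeof(struct my_node),
                                              &args, SLAB_HWCACHE_ALIGN);
            return my_node_cache ? 0 : -ENOMEM;
    }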

When sheaves are enabled, the allocator will maintain a sheaf with the given number of objects for each CPU. An allocation request will be satisfied from this sheaf whenever possible, and freed objects will be placed back into the sheaf if there is room. That turns allocation and free operations into purely local assignments that can be executed quickly; no locking (or even atomic operations) required. There is a second (backup) sheaf maintained for each CPU as well; when the main sheaf is found to be empty, an object will be allocated from the backup sheaf instead. If the main sheaf is full when an object is freed, that object will be placed into the backup sheaf if possible.
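In rough, illustrative C (this is not the patch set's actual code, and the structure and function names here are invented; the real implementation must also keep the task pinned to its CPU and handle NUMA placement), the fast paths look something like this:

    /* One sheaf: a bounded array of cached object pointers. */
    struct slab_sheaf {
            unsigned int size;          /* objects currently held */
            unsigned int capacity;      /* sheaf_capacity given at cache creation */
            void *objects[];
    };

    /* The per-CPU pair of sheaves described above. */
    struct pcpu_sheaves {
            struct slab_sheaf *main;
            struct slab_sheaf *spare;   /* the backup sheaf */
    };

    static void *sheaf_alloc_fast(struct pcpu_sheaves *pcs)
    {
            if (pcs->main->size)
                    return pcs->main->objects[--pcs->main->size];
            if (pcs->spare && pcs->spare->size)
                    return pcs->spare->objects[--pcs->spare->size];
            return NULL;    /* both sheaves empty; take a slower path */
    }

    static bool sheaf_free_fast(struct pcpu_sheaves *pcs, void *object)
    {
            if (pcs->main->size < pcs->main->capacity) {
                    pcs->main->objects[pcs->main->size++] = object;
                    return true;
            }
            if (pcs->spare && pcs->spare->size < pcs->spare->capacity) {
                    pcs->spare->objects[pcs->spare->size++] = object;
                    return true;
            }
            return false;   /* both sheaves full; take a slower path */
    }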

When both sheaves are full, there will no longer be a place to stash a freed object with a simple assignment; that is where the "barn" comes in. The barn is simply a place to keep sheaves that are not currently being used for caching by any CPU; there is one barn for each NUMA node in the system. Once a CPU has filled its sheaves, it will try to place one in the barn; if a CPU's sheaves are empty, it will try to obtain a full one from the barn. In either case, this operation is slower, since locking is required to safely access the shared barn, but it is still faster than going all the way into the slab allocator.

The barn holds two sets of sheaves — one for full sheaves, and one for empty sheaves. If a CPU is freeing a lot of objects, it can place its full sheaves into the barn and obtain empty ones to replace them. There is a limit to the number of sheaves the barn can hold; it is wired to ten each for full and empty sheaves in the current patch set. If a CPU tries to place a sheaf into a full barn, that sheaf will be freed, along with any objects it contains, back into the slabs they came from.
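Continuing the illustrative sketch above (the names and locking here are simplified inventions, and a struct list_head barn_list member is assumed to have been added to the sheaf), the per-node barn amounts to two bounded lists behind a lock:

    #define MAX_FULL_SHEAVES   10   /* matching the current patch set's limits */
    #define MAX_EMPTY_SHEAVES  10

    struct node_barn {
            spinlock_t lock;
            struct list_head full_sheaves;
            struct list_head empty_sheaves;
            unsigned int nr_full;
            unsigned int nr_empty;
    };

    /*
     * Called when a CPU's sheaves are both full.  Returns false if the barn
     * already holds its limit of full sheaves; the caller must then flush the
     * sheaf's objects back to the slabs they came from.
     */
    static bool barn_put_full_sheaf(struct node_barn *barn, struct slab_sheaf *sheaf)
    {
            bool stored = false;

            spin_lock(&barn->lock);
            if (barn->nr_full < MAX_FULL_SHEAVES) {
                    list_add(&sheaf->barn_list, &barn->full_sheaves);
                    barn->nr_full++;
                    stored = true;
            }
            spin_unlock(&barn->lock);

            return stored;
    }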

As described so far, this series has the potential to greatly accelerate memory-allocation operations for workloads that allocate and free a lot of slab objects. But there are a couple of other features that are added later in the series to make sheaves more useful.

One of those is an enhancement to kfree_rcu(), which will delay the freeing of an object until after a read-copy-update (RCU) grace period has passed, ensuring that the object no longer has any active references. A third per-CPU sheaf is maintained to hold objects freed with kfree_rcu(); once the sheaf fills, it is passed to the RCU subsystem for the grace-period wait. Once that has happened, an attempt will be made to put the sheaf back into the barn.
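A rough sketch of that flow, again with invented names built on the earlier fragments (an rcu_head member in the sheaf, an rcu_free sheaf pointer in the per-CPU structure, and the sheaf_return_to_barn() helper are all assumed for illustration), might be:

    static void rcu_sheaf_done(struct rcu_head *head)
    {
            struct slab_sheaf *sheaf = container_of(head, struct slab_sheaf, rcu_head);

            /* The grace period has elapsed, so the objects may be reused;
             * offer the full sheaf back to the node's barn. */
            sheaf_return_to_barn(sheaf);
    }

    static void sheaf_kfree_rcu(struct pcpu_sheaves *pcs, void *object)
    {
            struct slab_sheaf *rs = pcs->rcu_free;  /* the third, RCU-free sheaf */

            rs->objects[rs->size++] = object;
            if (rs->size == rs->capacity) {
                    /* Hand the full sheaf to RCU to wait out a grace period. */
                    call_rcu(&rs->rcu_head, rcu_sheaf_done);
                    pcs->rcu_free = NULL;   /* an empty sheaf is installed later */
            }
    }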

The other addition is preallocation. There are many places in the kernel where memory must be allocated without delay, and certainly without blocking the allocating thread. There are also numerous code paths that cannot deal with an allocation failure in any reasonable way. In most of these cases, there is an opportunity to preallocate the needed memory before going into the more constrained code. The kernel has long maintained subsystems like mempools to meet this need.
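For comparison, the existing mempool approach looks roughly like this; my_object_cache and the surrounding names are invented, but the mempool calls are the real API:

    #include <linux/mempool.h>
    #include <linux/slab.h>

    static mempool_t *my_pool;

    static int my_pool_init(struct kmem_cache *my_object_cache)
    {
            /* Reserve at least 16 objects backed by the given slab cache. */
            my_pool = mempool_create_slab_pool(16, my_object_cache);
            return my_pool ? 0 : -ENOMEM;
    }

    static void *my_reserved_alloc(void)
    {
            /* May sleep, but will not fail: if the reserve is exhausted, it
             * waits for an object to be returned to the pool. */
            return mempool_alloc(my_pool, GFP_NOIO);
    }

    static void my_reserved_free(void *obj)
    {
            mempool_free(obj, my_pool);
    }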

But, if the kernel can go into critical code in the knowledge that it has a per-CPU sheaf full of objects available to it, a lot of these problems (and the need for mempools) go away. To that end, the series provides a set of functions for working with special-purpose sheaves. A call to kmem_cache_prefill_sheaf() will return a sheaf containing at least the requested number of objects, grabbing it out of the barn if possible. Then, kmem_cache_alloc_from_sheaf() can be used to allocate objects from the sheaf with guaranteed success for at least the requested number of objects. Other functions can be used to return a sheaf to the barn or to place more objects into a sheaf. These special-purpose sheaves act similarly to mempools, but they are intended to be short-lived, unlike mempools which usually exist for the life of the system.
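A usage sketch might look like the following; the prototypes, the kmem_cache_return_sheaf() name, the error-handling convention, and my_lock are all assumptions made for illustration rather than details taken from the patch set:

    static DEFINE_SPINLOCK(my_lock);    /* hypothetical lock protecting critical state */

    static int my_critical_update(struct kmem_cache *cache)
    {
            struct slab_sheaf *sheaf;
            void *obj;

            /* Prefill a sheaf with at least four objects before entering the
             * section that cannot tolerate allocation failures. */
            sheaf = kmem_cache_prefill_sheaf(cache, GFP_KERNEL, 4);
            if (!sheaf)
                    return -ENOMEM;

            spin_lock(&my_lock);
            /* Guaranteed to succeed for up to the prefilled number of objects. */
            obj = kmem_cache_alloc_from_sheaf(cache, GFP_ATOMIC, sheaf);
            /* ... link obj into the protected data structure ... */
            spin_unlock(&my_lock);

            /* Hand the (possibly partially used) sheaf back to the allocator. */
            kmem_cache_return_sheaf(cache, GFP_KERNEL, sheaf);
            return 0;
    }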

This series appears to be relatively uncontroversial, though perhaps developers are reserving their comments for the upcoming Linux Storage, Filesystem, Memory-Management, and BPF Summit, to be held in late March. Given the appetite for faster memory allocation and freeing, though, sheaves and barns seem likely to be added to the mix at some point.

Index entries for this article
Kernel: Memory management/Slab allocators



What exactly is an object?

Posted Feb 24, 2025 17:06 UTC (Mon) by dmv (subscriber, #168800) [Link] (7 responses)

Super basic question, sorry it’s so stupid, but what exactly is an “object” in this context? As far as I can tell, it’s a chunk of memory that’s a certain size (smaller than a page, of course). So if you need 100 bytes for some common purpose P (say, tracking info about access credentials or whatever), the “object” is just that 100 byte chunk of memory. Is that right or are objects more granular than that—e.g., a certain-sized chunk of memory plus a constructor/destructor pair or whatever?

What exactly is an object?

Posted Feb 24, 2025 17:10 UTC (Mon) by corbet (editor, #1) [Link] (5 responses)

An "object" is just a piece of memory, yes. No methods or other class-like stuff, this is C we're talking about :)

Sorry if that wasn't clear.

What exactly is an object?

Posted Feb 24, 2025 17:15 UTC (Mon) by dmv (subscriber, #168800) [Link]

No, not on you at all! I have learned that trying to track down the actual meanings of a fair bit of memory management terminology can lead to hours trawling through 1950s-1970s texts with no ultimate enlightenment at the end, so I figured I’d just ask here since I had been wondering about slab “objects”. Thanks!

What exactly is an object?

Posted Feb 24, 2025 22:18 UTC (Mon) by tialaramex (subscriber, #21167) [Link] (3 responses)

C is at least - in this respect - consistent. C just doesn't have methods, full stop.

What's weird is that C++ and Java for some reason give methods to some things but not others. There's no particular reason 'x'.is_lowercase() shouldn't work, which is why it does in Rust; there's no real "extra" work to deliver this compared with my_goose.honk(). In both cases the compiler just has to figure out what type the object is, find the function to call (a "method") based on that type, and then fill in that object as a parameter to the function. In 1972, in a few hundred kilobytes of RAM, I can well believe that was an unaffordable luxury for Dennis, so its absence from C is not a great surprise; but why implement this and then switch it off for the built-in types in a newer language? Search me.

While writing this comment and looking at the Rust docs for Kmalloc, I found a doc bug - for a conventional Rust crate I'm pretty clear about how I can report or fix this, but it's not as obvious to me for the kernel code. Please tell me I don't need to brave the LKML, Jonathan?

The text "if `old_layout` is zero-sized `p` does not need to be a pointer returned by this [`Allocator`]." is unnecessary for free() as we do not provide an old_layout parameter. I can't tell (but hopefully the authors know) whether it is appropriate to simply remove this text, or whether in addition some other text should be provided about zero-size things being freed, or not freed as the case may be.

What exactly is an object?

Posted Feb 25, 2025 10:11 UTC (Tue) by Sesse (subscriber, #53779) [Link] (2 responses)

C++ does, in a sense, have this; it's just that the syntax is different. The philosophy is simple: something can be part of the class's interface without being a method (or member function, in C++ parlance). E.g., operator==(const Foo&, const Foo&) can be defined well outside of Foo, and then follows the normal rules for free functions. Something as simple as void whatever(const Foo&) can be seen as part of Foo's interface; in fact, implementing it without making it a friend is commonly preferred if you can.

Of course, adding _data members_ to another class from the outside would be nearly impossible in C++'s compilation model (which it, of course, inherited from C).

What exactly is an object?

Posted Feb 25, 2025 20:09 UTC (Tue) by tialaramex (subscriber, #21167) [Link] (1 responses)

It doesn't seem useful to say "It's like method syntax except without the same syntax". That's just not method syntax. Some languages have Unified Method Call Syntax, so _all_ functions in the language which take at least one argument can be used as method calls. Rust only has the other side of that coin: all the methods in the language can also be treated as free functions with an extra argument, but not vice versa.

C++ ADL, which maybe you were also gesturing at, is just a mess. Your whatever function is indeed treated by C++ as if it's "part of Foo's interface", but only if it was defined in the same namespace, so that ADL will find whatever when it is looking up Foo. This causes weird, hard-to-understand issues in real-world C++ software, because what foo(bar) means will depend on the context in which you wrote it in non-obvious ways; as a result, "turning off ADL" via arcane tricks is often needed, and there have been attempts to reform ADL or to add explicit "no ADL please" features to the language.

I was serious about that doc bug: who do I raise this with? I'd rather file it as a bug than send a patch, because I'm certain those answering the bug ticket will know more than I do about kmalloc.

What exactly is an object?

Posted Feb 27, 2025 9:28 UTC (Thu) by aviallon (subscriber, #157205) [Link]

You can simply open a bug on the kernel Bugzilla.

What exactly is an object?

Posted Feb 25, 2025 16:01 UTC (Tue) by rweikusat2 (subscriber, #117920) [Link]

That's true for kmalloc allocations. For actual slab allocations, an object is the pair (<size>, <type>). Bonwick's (SunOS) original slab allocator had constructors and destructors, as the basic idea was to cache constructed objects. This doesn't seem to be the case for Linux, though (at least, I couldn't find a reference to something like this while looking). The type is an important bit of meta-information about usage patterns.

LSF/MM/BPF

Posted Feb 24, 2025 17:23 UTC (Mon) by vbabka (subscriber, #91706) [Link]

Indeed, I've just sent out the related topic for LSF/MM/BPF, including where I expect things to be controversial - https://lore.kernel.org/all/14422cf1-4a63-4115-87cb-92685...
Thanks for the write-up :)

mixed metaphor

Posted Feb 25, 2025 0:41 UTC (Tue) by robert.cohen@anu.edu.au (subscriber, #6281) [Link] (4 responses)

I think you're mixing your metaphors in the naming.

Sheaves can refer either to collections of pages in bookbinding, or to collections of wheat stalks.
However, as far as I know, the term barn isn't used in bookbinding.
In bookbinding, a collection of sheaves would be a book.
If that term is considered too confusing, I guess you could keep your sheaves on a shelf and still be in metaphor.
Obviously, other bookish terms like stack or library would be a bad idea :-)

Récoltes et semailles

Posted Feb 25, 2025 4:18 UTC (Tue) by tux3 (subscriber, #101245) [Link]

If we end up with a couple less than perfectly intuitive metaphors, we'll at least be in good company!
We might be running out of viable book-related words, but other fields show you can do fine even with much more tortured metaphors than we have.
So one might start to wonder whether the problem measures up to the... reams of discussions about it :)

mixed metaphor

Posted Feb 25, 2025 16:15 UTC (Tue) by paulj (subscriber, #341) [Link] (2 responses)

Might as well stick to the original terminology from kmem/umem, as published in the USENIX paper - slabs go into magazines, magazines into depots.

mixed metaphor

Posted Feb 25, 2025 19:14 UTC (Tue) by JoeBuck (subscriber, #2330) [Link] (1 responses)

Perhaps "shed" should go in here somewhere, since we're clearly into bikeshedding territory.

mixed metaphor

Posted Feb 26, 2025 9:56 UTC (Wed) by paulj (subscriber, #341) [Link]

"Just stick to the existing terms, already described in the literature" is not bike-shedding.

Sheaves

Posted Feb 25, 2025 5:14 UTC (Tue) by kmeyer (subscriber, #50720) [Link] (1 responses)

Seems to be another rehashing of Bonwick's "magazines" from 2001.

Sheaves

Posted Feb 25, 2025 16:12 UTC (Tue) by paulj (subscriber, #341) [Link]

Indeed. See the 2001 USENIX paper: https://www.usenix.org/legacy/publications/library/procee...

The allocator is available in userspace in libumem (which should port relatively easily to other Unixes, and indeed some have put such ports on github).

Not much like mempools

Posted Feb 26, 2025 0:43 UTC (Wed) by neilbrown (subscriber, #359) [Link]

> These special-purpose sheaves act similarly to mempools,

I guess it depends on what you mean by "similarly" but I think the above is misleading.

A vital property of mempools is that if memory cannot be allocated (because it is all in use), then the mempool allocation will wait for something to be returned to the mempool - and it can be certain that something *will* be returned to the mempool in a reasonable time.

If a mempool provides "struct bio" that can be used to write a page to a device, then you can be sure that waiting on the mempool will only wait until a currently outstanding write to the device completes - then that bio will become available.

So a mempool is not at all about being able to allocate memory without blocking. It is precisely about blocking for at most a predictable amount of time for memory to become available.

In contrast, sheaves seem to be about maximising the chance of allocating without blocking. Preallocating before taking a spinlock but while pinned to a given CPU - so an allocation under the spinlock must succeed - is great, but there is no guarantee that the preallocation will succeed - so code would need a fallback or an indefinite wait. mempools provide a definite wait.

Sheaves improve throughput and simplify code; mempools avoid deadlocks.


Copyright © 2025, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds