
Per-call-site slab caches for heap-spraying protection

By Jonathan Corbet
August 20, 2024
One tactic often used by attackers set on compromising a system is heap spraying; in short, the attacker fills as much of the heap as possible with crafted data in the hope of getting the target system to use that data in a bad way. If heap spraying can be blocked, attackers will lose an important tool. The kernel has some heap-spraying defenses now, including the dedicated bucket allocator merged for the upcoming 6.11 release, but its author, Kees Cook, thinks that more can be done.

A heap-spraying attack can be carried out by allocating as many objects as possible and filling each with data of the attacker's choosing. If the kernel can be convinced to use that data, perhaps as the address of a function to call, then the attacker can gain control. Heap spraying is not a vulnerability itself, but it can ease the exploitation of an actual vulnerability, such as a use-after-free bug or the ability to overwrite a pointer. The kernel's kmalloc() function (along with several variants) allocates memory from the heap. Since kmalloc() is used heavily throughout the kernel, any call site that can be used for heap spraying can potentially be used to exploit a vulnerability in a distant, unrelated part of the kernel. That makes the kmalloc() heap a tempting target for attackers.

kmalloc() makes its allocations from a set of "buckets" of fixed-sized objects; most (but not all) of those sizes are powers of two. So, for example, a 48-byte allocation request will result in the allocation of a 64-byte object. The structure behind kmalloc() is, in a sense, an array of heaps, each of which is used for allocations of a given size range. This separation can make heap spraying attacks easier, since it is not necessary to overwrite the entire heap to target an object of a given size.

The dedicated bucket allocator creates a separate set of buckets for allocation sites that are deemed to present an especially high heap-spraying risk. For example, any allocation that can be instigated from user space and filled with user-supplied data would be a candidate for a dedicated set of buckets. Then, even if the attacker manages to thoroughly spray that heap, it will not affect any other allocations; the attacker's carefully selected data cannot be used to attack any other part of the kernel.

The way to get the most complete protection from heap spraying would be to create a set of dedicated buckets for every kmalloc() call site. That would be expensive, though; each set of buckets occupies a fair amount of memory. Inefficiency at that level is the sort of tradeoff that kernel developers tend to view with extreme skepticism; creating a set of buckets for every call site simply is not going to happen.

This new patch series from Cook is built around one of those observations that is obvious in retrospect: most kmalloc() call sites request objects of a fixed size that will never change. Often that size (the size of a specific structure, for example) is known at compile time. In such cases, providing the call site with a single dedicated slab for the size that is needed would give an equivalent level of protection against heap-spraying attacks. There is no need to provide buckets for all of the other sizes; they would never be used.

The only problem with that idea is that there are thousands of kmalloc() call sites in the kernel. Going through and examining each one would be a tedious and possibly error-prone task that would result in a lot of code churn. But the compiler knows whether the size parameter passed to any given kmalloc() call is a compile-time constant or not; all that is needed is a way to communicate that information to the call itself. If that information were accompanied by something that identified the call site, the slab allocator could set up dedicated slabs for the call sites where it makes sense.

So the problem comes down to getting that information to kmalloc() in an efficient way. Cook's approach is an interesting adaptation of the code-tagging framework that was merged for the 6.10 release. Code tagging is part of the memory-allocation profiling subsystem, which is meant to help find allocation-related bugs; it ties allocations to the call site that requested them, so developers can find, for example, the source of a memory leak.

Code tagging was not really meant as a kernel-hardening technology, but it does provide the call-site information needed here. Cook's series starts by augmenting the tag information stored for each call site with an indicator of whether the allocation size is constant and, if so, what that size is. That information will be available to the slab allocator when the kmalloc() call is made.

If a given allocation request is at the GFP_ATOMIC level, it will be handled in the usual way to avoid adding any extra allocations to that path. Otherwise, though, the allocator will check whether that call site uses a constant size; if so, a dedicated slab will be created for that site and used to satisfy the allocation request (and all that follow). If the size is not constant, then a full set of buckets will be created instead. Either way, the decision will be stored in the code tag to speed future calls. It is worth noting that this setup is not done for any given call site until the first call is made, meaning that it is not performed for the many kmalloc() call sites that will never execute in any given kernel.

If this series is merged, the kernel will have three levels of defense against heap-spraying attacks. The randomized slab option, merged for 6.6, creates 16 sets of slab buckets, then assigns each call site to one set randomly. Its memory overhead is relatively low, but the protection is probabilistic — it reduces the chance that an attacker can spray the target heap, but does not eliminate it. The dedicated-buckets option provides stronger protection, but is limited by the need to explicitly identify risky call sites and isolate them manually. This new option, instead, provides strong protection against heap spraying, but it will inevitably increase the memory overhead of the slab allocator.

The amount of that overhead will depend on the workload being run. For an unspecified distribution kernel, Cook reported that the number of slabs reported in /proc/slabinfo grew by a factor of five or so. Should the series land in the mainline, it will be up to distributors to decide whether to enable this option or not. When a kernel is going to run on a system that is at high risk of heap-spraying attacks, though, that may prove to be an easy decision to make.

Index entries for this article
Kernel: Memory management/Slab allocators
Kernel: Security/Kernel hardening



AUTOSLAB ?

Posted Aug 20, 2024 14:51 UTC (Tue) by Lionel_Debroux (subscriber, #30014) [Link] (4 responses)

This reminds me of https://grsecurity.net/how_autoslab_changes_the_memory_un... , from 2021:
"
Different from quarantining freed kernel heap objects, grsecurity developed an isolation-based approach where each generic allocation site (calling to k*alloc*) has its own dedicated memory caches. As such, two different object types will be isolated from each other since they are allocated from their own dedicated memory caches.
"
The article lists vulnerabilities and benchmarks which can be interesting for evaluating implementations.

AUTOSLAB ?

Posted Aug 21, 2024 18:41 UTC (Wed) by kees (subscriber, #27264) [Link] (3 responses)

All I know is what they've written in their blog posts since "Grsecurity is a commercial product and is distributed only to paying customers."[1]

The idea of separating allocation by type is not new[2] (though doing it per call site is easier). Getting Linux to a safer position to defend against heap UAF is going to take a lot of steps, and this series is just one of many needed steps (see my other comment further down).

[1] https://perens.com/2017/06/28/warning-grsecurity-potentia...
[2] https://chromium.googlesource.com/chromium/src/+/master/b...

AUTOSLAB ?

Posted Aug 21, 2024 22:52 UTC (Wed) by Lionel_Debroux (subscriber, #30014) [Link] (2 responses)

Since you're here: I look forward to reading your analysis and especially benchmarking (I'd expect the latter to be requested as a prerequisite to integrating the patches anyway?) of your implementation, inspired by the blog post I linked ;)

This sentence you quoted from that questionable post by Bruce Perens, published several weeks after PaX+grsecurity went commercial-only, _might_ have been correct at the time _if_ nobody had redistributed the patches yet. However, I can only regard it as factually incorrect since, at the latest, December 2018, when one version was redistributed to the general public, showcasing the improved defenses and highlighting the sizable number of important fixes missed by mainline's stable backporting process (FTR, I did help with the latter).
Further making that sentence factually incorrect in 2024 is the fact that some grsecurity versions more than two years newer than that one, and four years (!) newer than the latest non-commercial ones, have been publicly available for download and use - under the terms of the GPLv2, obviously - for years. AUTOSLAB is there, as is e.g. RESPECTRE.

AUTOSLAB ?

Posted Aug 22, 2024 23:49 UTC (Thu) by kees (subscriber, #27264) [Link] (1 responses)

Once the new kmalloc_obj() API is finalized I'll make time to get some benchmarking done on this series for the next version. If you or anyone else would like to participate in this effort, I would welcome such an analysis!

As far as inspiration, this series is not trying to implement what AUTOSLAB claims to do. The implementation goals come from all over the place, including MTE, kCTF patches, PartitionAlloc, the XNU kmalloc_type allocator, the GrapheneOS hardened_malloc, etc. Heap defense research is hardly unique to grsecurity. :)

As for the Perens article quote being "factually incorrect", is grsecurity no longer a commercial product? Regardless, random monolithic source leaks are hardly useful for making robust upstream improvements. Besides, Linux has moved away from compiler plugins -- we've been driving language extensions directly in Clang and GCC so the entire Open Source ecosystem can benefit, and then refactoring Linux itself to gain better language robustness and hardening coverage.

AUTOSLAB ?

Posted Aug 23, 2024 8:11 UTC (Fri) by Lionel_Debroux (subscriber, #30014) [Link]

From years of browsing through the PaX & grsecurity patches before 2017, less so afterwards, I was usually able to quickly understand which protection any given hunk dealt with. Some of the protections, for instance KERNEXEC and MEMORY_UDEREF (both among the 5-6 defenses which foiled the implementations of most Linux exploits mentioned on LWN back in the day, without requiring SMEP/SMAP/PXN/PAN/equivalent hardware capabilities, which were still scarce in 2017), are largely all-or-nothing. But many bits related to e.g. constification and staticification of ops structs, structure layout randomization, or CFI can be used to make robust upstream improvements. If people do it, that is - preferably persons actually paid for that task, rather than unpaid volunteers working in their spare time...

You're describing a move away from infrastructure (compiler plugins) that has made it possible to provide ongoing support for a wide range of compiler versions for a decade or so, replacing it with built-in implementations of a subset of the plugins' capabilities in only the newest and future compiler versions, while producing crippled kernel builds on the compiler versions that don't support these newfangled extensions - i.e. most of them.
It's an interesting approach, which certainly has upsides beyond Linux (just like the GCC Rust efforts, to which Open Source Security Inc. has been one of the very few entities providing actual funding), if people start to use these language extensions. Their approach is arguably more practical, though. And they can still pull it off as a tiny company, with nowhere remotely near the resources any of the large Linux companies has access to.

Optimization opportunity?

Posted Aug 20, 2024 20:10 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link] (4 responses)

This could also be an optimization opportunity: the slab search can be skipped. Maybe the slabs could even be allocated statically during kernel compilation.

Optimization opportunity?

Posted Aug 21, 2024 0:56 UTC (Wed) by willy (subscriber, #9762) [Link] (3 responses)

A further optimisation is that if the kmalloc size is, say, 576 bytes, you can get 7 objects per 4KiB page instead of rounding up to the 1KiB size and getting only 4 objects per page.

Optimization opportunity?

Posted Aug 21, 2024 7:33 UTC (Wed) by taladar (subscriber, #68407) [Link] (2 responses)

Wouldn't you need more information on possible alignment constraints to see if that is possible?

Optimization opportunity?

Posted Aug 21, 2024 8:20 UTC (Wed) by johill (subscriber, #25196) [Link]

Not generally, I think; more than ARCH_KMALLOC_MINALIGN alignment isn't guaranteed in the first place. That might be high, but 576 is 9*64, so packing them preserves 64-byte alignment already. I don't think many architectures have more than that?

Optimization opportunity?

Posted Aug 21, 2024 18:24 UTC (Wed) by kees (subscriber, #27264) [Link]

Yes, and I'm hoping to solve _that_ problem with another series to get type information (and therefore alignment needs) by replacing the existing code pattern of "void *" assignment with an allocation wrapper macro:

https://lore.kernel.org/lkml/20240807235433.work.317-kees...

kalloc_type

Posted Aug 22, 2024 1:09 UTC (Thu) by comex (subscriber, #71521) [Link] (1 responses)

Compare and contrast with Apple's approach to kernel heap partitioning, which is more like the randomized slab approach, but using struct layout information to group together types that are less dangerous if confused with each other:

https://security.apple.com/blog/towards-the-next-generati...

kalloc_type

Posted Aug 22, 2024 23:19 UTC (Thu) by kees (subscriber, #27264) [Link]

Yeah, this is why I was excited about the codetag infrastructure. Much like Xnu's _KALLOC_TYPE_DEFINE, Linux can start recording much more information about allocation sites and start building a lot more logic into choosing how to do additional hardening. Beyond just the size info in my PoC series, we can record alignment, type signatures (if we want to go that way), etc.


Copyright © 2024, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds