Finer-grained kernel address-space layout randomization
The idea behind kernel address-space layout randomization (KASLR) is to make it harder for attackers to find code and data of interest to use in their attacks by loading the kernel at a random location. But a single random offset is used for the placement of the kernel text, which presents a weakness: if the offset can be determined for anything within the kernel, the addresses of other parts of the kernel are readily calculable. A new "finer-grained" KASLR patch set seeks to remedy that weakness for the text section of the kernel by randomly reordering the functions within the kernel code at boot time.
Kristen Carlson Accardi posted an RFC patch set that implemented a proof-of-concept for finer-grained KASLR in early February. She identified three weaknesses of the existing KASLR:
- low entropy in the randomness that can be applied to the kernel as a whole
- the leak of a single address can reveal the random offset applied to the kernel, thus revealing the rest of the addresses
- the kinds of information leaks needed to reveal the offset abound
In her words: "This patch set rearranges your kernel code at load time on a per-function level granularity, with only around a second added to boot time."
The changes required are in two main areas. When the kernel is built, a GCC option (-ffunction-sections) is used to place each function in its own .text section. The relocation addresses can then be used to shuffle those sections as the kernel is loaded, just after it is decompressed. There are, she noted, tables of addresses in the kernel for things like exception handling and kernel probes (kprobes), but those can be fixed up to match the new layout as well.
The second area of changes is in the loading of the kernel into memory; the boot process was changed to parse the vmlinux ELF file to retrieve the key symbols and to collect a list of .text.* sections to be reordered. The function order is then randomized and any tables of addresses are updated as needed.
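In outline, the load-time shuffle looks something like the following Python sketch. The real code is C running in the decompression stub; the section names, sizes, and the exception table here are invented for illustration:

```python
import random

def shuffle_sections(sections, seed):
    """Shuffle (name, size) sections and assign new load addresses.

    Returns the new layout and an old->new address mapping that can be
    used to fix up tables of function addresses (exception tables,
    kprobes, and so on).
    """
    # Original (pre-shuffle) start address of each section.
    old_addr, addr = {}, 0
    for name, size in sections:
        old_addr[name] = addr
        addr += size

    shuffled = sections[:]
    random.Random(seed).shuffle(shuffled)   # Fisher-Yates under the hood

    # Lay the sections back down in their new order.
    new_addr, addr = {}, 0
    for name, size in shuffled:
        new_addr[name] = addr
        addr += size

    remap = {old_addr[n]: new_addr[n] for n, _ in sections}
    return shuffled, remap

# Three hypothetical per-function sections produced by -ffunction-sections.
sections = [(".text.foo", 0x40), (".text.bar", 0x20), (".text.baz", 0x80)]
layout, remap = shuffle_sections(sections, seed=42)

# A table of function start addresses must be updated to the new layout.
extable = [0x0, 0x40, 0x60]          # old addresses of foo, bar, baz
extable = [remap[a] for a in extable]
```

The same old-to-new mapping serves every address table, which is why the tables Accardi mentioned can all be handled with one pass after the shuffle.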
For debugging the proof-of-concept, a pseudo-random-number generator (PRNG) was used so that the same order could be generated by giving it the identical seed. The patch adding the PRNG, which was authored by Kees Cook, might provide some performance benefits, but Andy Lutomirski objected to using a new, unproven algorithm; he suggested using a deterministic random bit generator (DRBG), such as ChaCha20. Similarly, Jason A. Donenfeld was concerned that the random-number sequence could be predicted from just a few leaked address values, which might defeat the purpose of the feature. Cook said that using ChaCha20 instead was a better idea moving forward.
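Donenfeld's worry is easy to demonstrate with a deliberately weak generator. The linear congruential generator below is a stand-in, not the algorithm from the patch: a single leaked output is the generator's entire internal state, so everything after it can be predicted, which is exactly what a cryptographic DRBG like ChaCha20 prevents.

```python
def lcg(state, a=6364136223846793005, c=1442695040888963407, m=2**64):
    """One step of a linear congruential generator (Knuth's MMIX constants)."""
    return (a * state + c) % m

# The defender seeds the generator secretly...
secret = 0xdeadbeef
stream, s = [], secret
for _ in range(5):
    s = lcg(s)
    stream.append(s)

# ...but one leaked output *is* the full internal state: an attacker can
# simply continue the sequence and predict every later value.
leaked = stream[1]
predicted = [lcg(leaked)]
predicted.append(lcg(predicted[-1]))
assert predicted == stream[2:4]
```

With a DRBG, by contrast, recovering the internal state from outputs is designed to be computationally infeasible, so a few leaked values reveal nothing about the rest of the shuffle.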
The patch set removes access to the /proc/kallsyms file, which lists the addresses of kernel symbols, for non-root users. Currently kallsyms simply gives addresses of all zeroes when non-root users read it, but the list of symbols is given in the order they appear in the kernel text; that order alone would give away the randomized layout of the kernel, so access was disabled. Cook pointed out that making the kallsyms file unreadable has, in the past, "seemed to break weird stuff in userspace". He suggested either sorting the symbol names alphabetically in the output, or perhaps just waiting to see if there were any complaints.
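Cook's sorting suggestion amounts to something like this sketch (Python for illustration; the symbols are made up, and the field layout only loosely mirrors /proc/kallsyms):

```python
def render_kallsyms(symbols, privileged=False):
    """Render kallsyms-style lines without leaking the layout.

    `symbols` is a list of (address, type, name) tuples in text order.
    Sorting by name hides the order in which the functions appear in the
    (randomized) kernel text; unprivileged readers additionally see
    zeroed addresses, as they do today.
    """
    lines = []
    for addr, kind, name in sorted(symbols, key=lambda s: s[2]):
        shown = addr if privileged else 0
        lines.append(f"{shown:016x} {kind} {name}")
    return "\n".join(lines)

# Two hypothetical symbols, listed here in kernel-text order.
syms = [(0xffffffff81002000, "T", "zebra_fn"),
        (0xffffffff81005000, "T", "alpha_fn")]
print(render_kallsyms(syms))
```

The output order no longer correlates with the text layout, so even the zero-address unprivileged view stops being an oracle for the shuffle.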
Impacts
Accardi measured the impact on boot time in a VM and found that it took roughly one second longer to boot, which is fairly negligible for many use cases. The run-time performance is harder to characterize; the all-important kernel-build benchmark was about 1% slower than the same build on a kernel with just the existing KASLR enabled. Some other workloads performed much worse, "while others stayed the same or were mysteriously better". The impact probably depends greatly on the code flow of the workload, which might make for an area of future research; optimizing the function layout for a given workload has been shown [PDF] to have a positive effect on performance.
Adding the extra information to the vmlinux ELF file to support finer-grained KASLR increases its size, but there is a much bigger effect from the need to increase the boot heap size. Randomizing the addresses of the sections requires a much bigger heap, 64MB, than current boot heaps (64KB for all compressors except bzip2, which needs 4MB). The problem is that a larger boot heap ends up increasing the size of the kernel image by adding a zero-filled section to accommodate the heap.
One of Cook's patches, which was included in Accardi's patch set, seeks to remedy that problem, but it turned out that the underlying problem was a bug in how the sections in the kernel object are laid out. Arvind Sankar pointed to his patch set from January that would fix the problem, which Cook thought was a much better solution to the problem.
Lutomirski also suggested that the sort mechanism being used on the symbol names was too expensive; the swap function passed to the sort() call did quite a bit of unneeded work that could be avoided if a bit more memory were available.
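The memory-for-speed idea can be sketched as follows: sort a small array of indices rather than repeatedly swapping the large records themselves, then apply the permutation once. This is an illustration in Python, not the kernel's C code, and the record layout is invented:

```python
def sort_by_key(records, key):
    """Sort large records by sorting small indices instead.

    Swapping whole records inside the sort moves a lot of bytes per
    comparison; sorting an index array only moves machine words, at the
    cost of one extra array and a final pass to apply the permutation.
    """
    order = sorted(range(len(records)), key=lambda i: key(records[i]))
    return [records[i] for i in order]

# Hypothetical symbol records: (name, address, bulky payload).
records = [("zeta", 0x30, b"x" * 64), ("alpha", 0x10, b"y" * 64)]
by_name = sort_by_key(records, key=lambda r: r[0])
assert [r[0] for r in by_name] == ["alpha", "zeta"]
```

The tradeoff is precisely the one Cook raised: a faster boot-time sort in exchange for more scratch memory in the decompression environment.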
Cook said that he thought there were a number of areas where the tradeoff of memory versus speed needs to be considered. The amount of memory being used by the proof-of-concept is much greater than he expected (58MB in his tests). One of the problems there is that the version of free() used when decompressing the kernel image does not actually free any memory. But Accardi thought that the boot latency of a second or so was not likely to deter those who are interested in having the protection; boot-time minimalists are not likely to use finer-grained KASLR anyway, she said.
Security and alignment
In the cover letter, Accardi analyzed the security properties of the patch set, noting that information leaks are often considered to require local access to the system, but that CVE-2019-0688 demonstrated a remote address leak for Windows. The patch set assumes that information leaks are plentiful, so it tries to make attacks harder even in the presence of those leaks. The added difficulty is hard to quantify, since it depends on a number of factors.
She suggested that other alignments could be considered down the road and that execute-only memory (XOM), if it lands, would make the finer-grained technique more effective against certain kinds of attacks. Function sections could perhaps simply be byte-aligned and padded with INT3 instructions, so that a wrong guess would trigger a trap. But the required alignment of functions on Intel processors is somewhat more complicated. Cook said that the kernel's current 16-byte function alignment wastes some space (and some entropy in the function start addresses) when finer-grained KASLR is in use.
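The entropy cost of alignment is simple arithmetic; the sketch below assumes a hypothetical 16MB text region just to make the numbers concrete:

```python
import math

def placement_entropy_bits(region_size, alignment):
    """Bits of entropy in one function's start address, assuming it can
    land at any properly aligned offset within the region."""
    return math.log2(region_size // alignment)

# Relaxing alignment from 16 bytes to 1 byte in a 16MB region gains
# four bits of entropy per function start address.
region = 16 * 1024 * 1024
gain = placement_entropy_bits(region, 1) - placement_entropy_bits(region, 16)
assert gain == 4.0

# The padding between byte-aligned functions would be INT3 (0xcc) bytes,
# so a wrong guess that lands in a gap traps instead of executing code.
padding = bytes([0xcc]) * 7
```

That four-bit gain per function is the entropy Cook described as being wasted by the current 16-byte alignment.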
Jann Horn pointed out that Intel recommends 16-byte alignment for branch targets; other alignments might result in less efficient calls. Sankar noted that the current alignment is not that detrimental to the entropy, but Lutomirski said there was another consideration as well.
He also said that there is an erratum affecting some Intel processors, which cannot correctly handle certain kinds of jump instructions that span a cache-line boundary. Peter Zijlstra looked at the erratum document [PDF] and thought it implied a need for 32-byte alignment for functions. Handling that may actually require a change to the kernel overall, Cook thought.
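Assuming the problematic pattern is a jump that crosses, or ends on, a 32-byte boundary (per Zijlstra's reading of the document), the check is mechanical; this sketch is illustrative only:

```python
def jump_is_risky(addr, length, boundary=32):
    """True if an instruction occupying [addr, addr+length) crosses or
    ends on a `boundary`-byte aligned address -- the pattern the erratum
    is assumed to describe here."""
    last = addr + length - 1
    crosses = (addr // boundary) != (last // boundary)
    ends_on = (last + 1) % boundary == 0
    return crosses or ends_on

# A 6-byte jump starting at offset 30 straddles the boundary at 32...
assert jump_is_risky(30, 6)
# ...while the same jump at a 32-byte-aligned start address is fine.
assert not jump_is_risky(32, 6)
```

Aligning function starts to 32 bytes keeps short jumps near the top of a function away from the boundary, which is why the erratum pushes toward coarser alignment even as the entropy argument pushes the other way.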
The reaction to the idea of finer-grained KASLR was generally positive. No objections to the goals or the techniques used (at a high level) were heard, anyway. It seems like a nice incremental improvement to KASLR. It can also coexist with various control-flow integrity (CFI) measures that are working their way upstream. As Accardi noted, the idea is not new and there has been quite a bit of research into it. OpenBSD uses a similar technique to randomize its kernel at boot time, for example. There is more work to do, of course, but it would not be a surprise to see finer-grained KASLR in the mainline sometime this year.
