Addressing Meltdown and Spectre in the kernel
First, a couple of notes with regard to Meltdown. KPTI has been merged for the 4.15 release, followed by a steady trickle of fixes that is undoubtedly not yet finished. The X86_BUG_CPU_INSECURE processor bit is being renamed to X86_BUG_CPU_MELTDOWN now that the details are public; there will be bug flags for the other two variants added in the near future. 4.9.75 and 4.4.110 have been released with their own KPTI variants. The older kernels do not have mainline KPTI, though; instead, they have a backport of the older KAISER patches that more closely matches what distributors shipped. Those backports have not fully stabilized yet either. KPTI patches for ARM are circulating, but have not yet been merged.
Variant 1
The first Spectre vulnerability, known as "variant 1", "bounds-check bypass", or CVE-2017-5753, takes advantage of speculative execution to circumvent bounds checks. If given the following pseudocode sequence:
if (within_bounds(index)) {
value = array[index];
if (some_function_of(value))
execute_externally_visible_action();
}
The body of the outer if statement should only be executed if index is within bounds. But it is possible that this body will be executed speculatively before the bounds check completes. If index is controlled by an attacker, the result could be a reference far beyond the end of array. The resulting value will never be directly visible to the attacker, but if the target code performs some action based on the value, it may leave traces somewhere where the attacker can find them — by timing memory accesses to determine the state of the memory cache, for example.
The best solution here (and for the other variants too) would be for the processor to completely clean up the results of a failed speculation, but that's not in the cards anytime soon. So the approach being taken is to prevent speculative execution after important bounds tests in the kernel. An early patch, never posted for public review, created a new barrier macro called osb() and sprinkled calls to it in places where they appeared to be necessary. In the pseudocode above, the osb() call would be placed immediately after the first if statement.
It would appear that this is not the approach that will be taken in the mainline, though, judging from this patch set from Mark Rutland. Rather than place barriers after tests, this series creates a set of helper macros applied to the pointer and array references instead. The documentation describes them in detail. For the example above, the second line would become:
int *element = nospec_array_ptr(array, index, array_size);
if (element)
value = *element;
else
/* Handle out-of-bounds index */
If the index is less than the given array_size, a pointer to the indicated value — &array[index] — will be returned; otherwise a null pointer is returned. The macro contains whatever architecture-specific magic is needed to prevent speculative execution of pointer dereferencing operation. This magic is supported by new directives being added to the GCC and LLVM compilers.
Earlier efforts had included a separate if_nospec macro that would replace the if statement directly. After discussion, though, its author (Dan Williams) decided to drop it and use the dereferencing macros instead.
These macros can protect against variant 1 — if they are placed in the correct locations. As Linus Torvalds noted, that is where things get a bit sticky:
Is there such a sane model right now, or are we talking "people will randomly add these based on strong feelings"?
Finding exploitable code sequences in the kernel is not an easy task; the
kernel is large and makes use of a lot of values supplied by user space.
It appears that speculative execution can
proceed for sequences as long as "180 or so simple
instructions
", which means that the vulnerable test and subsequent
reference can be far apart — even in different functions. Identifying such
sequences is hard, and preventing the introduction of new ones in the
future may even be harder.
It seems that the proprietary Coverity checker was used to find the spots for which there are patches to date. That is less than ideal going forward, since most developers do not have access to Coverity. The situation may not improve anytime soon, though. Some developers have suggested using Coccinelle, but Julia Lawall, the creator of Coccinelle, has concluded that the task is too complex for that tool.
One final area of concern regarding variant 1 is the BPF virtual machine. Since BPF allows user space to load (and execute) code in kernel space, it can be used to create vulnerable code patterns. The early patches added speculation barriers to the BPF interpreter and JIT compiler, but it appears that they are not enough to solve the problem. Instead, changes to BPF are being considered to prevent possibilities for speculative execution from being created.
Variant 2
Attacks using variant 1 depend on the existence of a vulnerable code sequence that is conveniently accessible from user space. Variant 2, (or "branch target injection", CVE-2017-5715) instead, depends on poisoning the processor's branch-prediction mechanism so that indirect jumps (calls via a function pointer, for example) will, under speculative execution, be redirected to an attacker-chosen location. As a result, a useful sequence of code (a "gadget") anywhere in the kernel can be made to run speculatively on demand. This attack can also be performed across processes in user space, meaning that it can be used to access data outside of a JavaScript sandbox in a web browser, for example.
There are two different variant-2 defenses in circulation, in multiple versions. Complete protection of systems will likely involve some combination of both, at least in the near future.
The first of those is a processor microcode update giving the operating system more control over the use of the branch-prediction buffer. The new feature is called IBRS, standing for "indirect branch restricted speculation". It takes the form of a new bit in a model-specific register (MSR) that, when written, effectively clears the buffer, preventing the poisoning attack. A patch set enabling IBRS usage in the kernel has been posted but, in an example of the rushed nature of much of this work, the patches did not compile and had clearly not been run in their posted form.
The alternative approach is a hackaround termed a "return trampoline" or
"retpoline"; this mechanism is well described in this Google page
(which also suggests that we should "imagine speculative execution as
an overly energetic 7-year old that we must now build a warehouse of
trampolines around
"). A retpoline replaces an indirect jump or indirect
function call with a sequence of operations that, in short, puts the
target address onto the call stack, then uses a return instruction
to "return" to the function to be called. This dance prevents speculative
execution of the call; it's essentially a return-oriented
programming
attack against the branch predictor. The performance cost of using this
mechanism is estimated at 0-1.5%.
Naturally, these retpolines must be deployed to every indirect call in any program (the kernel or something else) that is to be protected. That is not a task that can reasonably be done by hand in non-trivial programs. But it is something that can be given over to a compiler to handle. LLVM patches have been posted to automate retpoline generation, but that is not particularly helpful for the kernel. GCC patches have not yet been circulated, but they can be found in this repository.
Several variants of the retpoline patches for the kernel have been posted by different authors who clearly were not always communicating as well as they could be. The current version, as of this writing, was posted by David Woodhouse. This series changes the kernel build system to use the new GCC option and includes manual conversions for indirect jumps made by assembly-language code. There is also a noretpoline command-line option which will patch out the retpolines entirely.
The retpoline implementation seems to be nearly stable and imposes a relatively small overhead overall. But there is still a lot of uncertainty around whether any given system should be using retpolines or IBRS — or a combination of the two. One might think that a hardware-based mechanism would be preferable, but the performance cost of IBRS is evidently quite high. So it seems that, as a general rule, retpolines are preferable to IBRS. But there are some exceptions.
One of those is that, it would seem, retpolines don't work on Skylake-generation Intel CPUs, which perform more aggressive speculative execution around return operations. Nobody has publicly demonstrated that this speculation can be exploited on Skylake processors, but some developers, at least, are nervous about leaving a possible vulnerability open. As Woodhouse said:
When looking at optimisations, it is rare for us to say "oh, well it opens up only a *small* theoretical security hole, but it's faster so that's OK".
So the more cautious administrators, at least, will probably want to stick with IBRS on Skylake processors. The good news is that IBRS performs better on those CPUs than it does on the earlier ones.
The other problem is that, even if the kernel can be built with retpolines, other code, such as system firmware cannot be. Concerns about firmware surprised some developers, but it would seem that they are warranted. Quoting Woodhouse again:
The firmware that runs in response to those calls is unlikely to be rebuilt with retpolines in the near future, so it may well contain vulnerabilities to variant-2 attacks. Thus the IBRS bit needs to be set before any such calls are made, regardless of whether IBRS is used by the kernel as a whole.
In summary
From all of the above, it's clear that the development community has not yet come close to settling on the best way to address the Spectre vulnerabilities. Much of what we have at the moment was the result of fire-drill development so that there would be something to ship when the disclosure happened. Moving the disclosure forward by six days at the last minute did not help the situation either.
It is going to take some time for everything to settle down — even if no other vulnerabilities crop up, which is not something that would be wise to count on. It's worth noting that, in the IBRS discussion, Tim Chen said that there are more speculation-related CPU features in the works at Intel. They may just provide better defenses against the publicly known attacks — maybe. But even if no other vulnerabilities are about to jump out at us, it seems almost certain that others will be discovered at some point in the future.
Meanwhile, there is enough work to do just to get a proper handle on the
current set of problems and to get acceptable solutions into the mainline
kernel. It seems fair to say that these issues are going
to distract the development community (for the kernel and beyond) for some
time yet.
| Index entries for this article | |
|---|---|
| Kernel | Retpoline |
| Kernel | Security/Meltdown and Spectre |
| Security | Linux kernel |
| Security | Meltdown and Spectre |
