Stuffing the return stack buffer
If a CPU is to speculate past a return instruction, it must have some idea of where the code will return to. In recent Intel processors, there is a special hidden data structure called the "return stack buffer" (RSB) that caches return addresses for speculation. The RSB can hold 16 entries, so it must drop the oldest entries if a call chain goes deeper than that. As that deep call chain returns, the RSB can underflow. One might think that speculation would just stop at that point but, instead, the CPU resorts to other heuristics, including predicting from the branch history buffer. Alas, techniques for mistraining the branch history buffer are well understood at this point.
As a result, long call chains in the kernel are susceptible to speculative-execution attacks. On Intel processors starting with the Skylake generation, the only way to prevent such attacks is to turn on the indirect branch restricted speculation (IBRS) CPU "feature", which was added by Intel early in the Spectre era. IBRS works, but it has the unwelcome side effect of reducing performance by as much as 30%. For some reason, users lack enthusiasm for this solution.
Another way
Gleixner and Zijlstra decided to try a different approach. Speculative execution of return instructions on these processors can only be abused if the RSB underflows. So, if RSB underflow can be prevented, this particular problem will go away. And that, it seems, can be achieved by "stuffing" the RSB whenever it is at risk of running out of entries.
That immediately leads to two new challenges: knowing when the RSB is running low, and finding a way to fill it back up. The first piece is handled by tracking the current call-chain depth — in an approximate way. The build system is modified to create a couple of new sections in the executable kernel image to hold entry and exit thunks for kernel functions and to track them. When RSB stuffing is enabled, the entry thunk will be invoked on entry to each function, and the exit thunk will be run on the way out.
The state of the RSB is tracked with a per-CPU, 64-bit value that is initially set to:
0x8000 0000 0000 0000
The function entry thunk "increments" this counter by right-shifting it by five bits. The processor will sign-extend the value, so the counter will, after the first call, look like:
0xfc00 0000 0000 0000
If twelve more calls happen in succession, the sign bit will have been extended all the way to the right and the counter will contain all ones, with bits beginning to fall off the right end; this counter thus cannot reliably count above twelve. In this way it mimics the RSB, which cannot hold more than 16 entries, with a safety margin of four calls; the use of shifts achieves that behavior without the need to introduce a branch. Whenever a return thunk is executed, the opposite happens: the counter is left-shifted by five bits. After twelve returns, the next shift will clear the remaining bits, and the counter will have a value of zero, which is the indication that something must be done to prevent the RSB from underflowing.
That "something" is a quick series of function calls (coded in assembly and found at the end of this patch) that adds 16 entries to the call stack, and thus to the RSB as well. Each of those calls, if ever returned from, will immediately execute an int3 instruction, which stops speculation if one of those returns is ever executed speculatively. The actual kernel does not want to execute those instructions (or all of those returns), of course, so the RSB-stuffing code advances the real stack pointer past the just-added return addresses.
The end result is an RSB that no longer matches the actual call stack, but which is full of entries that will do no harm if speculated into. At this point, the call-depth counter can be set to -1 (all ones in the two's complement representation) to reflect the fact that the RSB is full. The kernel is now safe against Retbleed exploitation — until and unless another chain of twelve returns happens, in which case the RSB will need to be stuffed again.
Costs
Quite a bit of work has been put into minimizing the overhead of this
solution, especially on systems where it is not needed. The kernel is
built with direct calls to its functions as usual; at boot time, if the
retbleed=stuff option is selected, all of those calls will be
patched to go through the accounting thunks instead. The thunks themselves
are placed in a huge-page mapping to minimize the translation lookaside
buffer overhead. Even so, as the cover letter comments, there are costs:
"We both unsurprisingly hate the result with a passion".
Those costs come in a few forms. An "impressive" amount of memory
is required to hold the thunks and associated housekeeping. The bloating
of the kernel has a performance impact of its own, even on systems where
RSB stuffing is not enabled. The extra instructions add to pressure on the
instruction cache, slowing execution. That last problem could be mitigated
somewhat, the cover letter says, by allocating the thunks at the beginning
of each function rather than in a separate section. Gleixner has prepared
a GCC patch to make that possible, and reports that some of the performance
loss is gained back when it is used.
The cover letter contains a long list of benchmark results comparing the performance of RSB stuffing against that of disabling mitigations entirely and of using IBRS. The numbers for RSB stuffing are eye-opening, including a 382% performance regression for one microbenchmark. In all cases, though, RSB stuffing performs better than IBRS.
Better performance than IBRS is only interesting, though, if the primary goal of blocking Retbleed attacks has been achieved. The cover letter says this:
The assumption is that stuffing at the 12th return is sufficient to break the speculation before it hits the underflow and the fallback to the other predictors. Testing confirms that it works. Johannes [Wikner], one of the retbleed researchers, tried to attack this approach and confirmed that it brings the signal to noise ratio down to the crystal ball level. There is obviously no scientific proof that this will withstand future research progress, but all we can do right now is to speculate about that.
So RSB stuffing seems to work — for now, at least. That should make it
attractive in situations where defending against Retbleed attacks is
considered to be necessary; hosting providers with untrusted users would be
one obvious example. But nobody will be happy with the overhead, even if
it is better than IBRS. For a lot of users, RSB stuffing will be seen as a
clever hack that, happily, they do not need to actually use.
| Index entries for this article | |
|---|---|
| Kernel | Security/Meltdown and Spectre |
Posted Jul 22, 2022 18:27 UTC (Fri) by alonz (subscriber, #815)

I like that last sentence ("all we can do right now is to speculate about that"). Yeah, look where speculation has brought us…

Posted Jul 29, 2022 13:48 UTC (Fri) by smitty_one_each (subscriber, #28989)

And that is disquieting.

Posted Jul 23, 2022 13:23 UTC (Sat) by mss (subscriber, #138799)

And these mitigations can always be disabled if not applicable to one's threat model.

Posted Jul 24, 2022 11:45 UTC (Sun) by kenmoffat (subscriber, #4807)

I've been told that Ice Lake and later, except Alder Lake are vulnerable, and that the mitigation will also run on Alder Lake until new firmware is applied. See INTEL-SA-00702.

Posted Jul 24, 2022 14:13 UTC (Sun) by mss (subscriber, #138799)

The Intel affected CPU model page says that Ice Lake models are not affected by INTEL-SA-00702.

Posted Jul 23, 2022 0:40 UTC (Sat) by scientes (guest, #83068)

The real another way is to have the time dimension part of the ISA: "Prevention of Microarchitectural Covert Channels on an Open-Source 64-bit RISC-V Core".

Posted Jul 23, 2022 14:41 UTC (Sat) by Paf (subscriber, #91811)

That doesn’t sound like including the time domain…? It’s just flushing all state at certain transitions?

Posted Jul 23, 2022 16:26 UTC (Sat) by epa (subscriber, #39769)

Such as with this suggested new instruction, part of seL4 development.

Posted Jul 23, 2022 3:39 UTC (Sat) by felixfix (subscriber, #242)

Waaaay back when, late 1970s, I worked on Datapoint 2200/5500/6600 8 bit computers, Datapoint's extension of the basic 8008 into Z80 level, but different: no IX IY regs, had system/user modes, other differences.
It also had a 16 level hardware stack with no overflow or underflow detection or warning.
Its only interrupt was every millisecond whether you wanted it or not. Everything was polled.
We threw in a push, push, pop, pop, to force always using 3 levels of stack, because none of our interrupt code was supposed to use more than two levels, and user code was expected to never use more than 13 levels.
Deja vu all over again!

Posted Jul 23, 2022 19:13 UTC (Sat) by jhoblitt (subscriber, #77733)

I genuinely appreciate the effort and wizardry going into this problem. I suspect many are in the position of considering risk and evaluating which hosts need retbleed mitigation... and which subset of those can't afford a 1/3 loss of capacity.
It is looking like the ultimate solution is either to buy more hardware to compensate for increased kernel overhead or to upgrade to Intel "12th" gen or newer cpus with eIBRS support? Either option is difficult with the current unprecedented lead times on IT equipment.
Of course, I accepted delivery of 5 pallets of zen3 based servers immediately prior to the public retbleed disclosure.

Posted Jul 27, 2022 15:01 UTC (Wed) by anton (subscriber, #25547)

Actually it is somewhat surprising. CPU designers often put in "chicken bits" for disabling new microarchitectural features, in case they turn out to be buggy. You can then still sell the CPU instead of having to scrap it. And some of these chicken bits have been known to be used over the years (and probably many more were used before the CPUs were released, and the public never heard of them).
But when they designed this fallback from the return stack buffer to the history-based indirect branch predictor into Skylake, they apparently did not put in a chicken bit for that, probably because history-based indirect branch prediction had been present in Intel CPUs for many generations.

Posted Jul 24, 2022 10:56 UTC (Sun) by fw (subscriber, #26023)

Huh. Why doesn't the CPU execute the conditional branch in "shlq $5, PER_CPU_VAR(__x86_call_depth); jz 1f" speculatively, defeating the mitigation?

Posted Jul 24, 2022 16:06 UTC (Sun) by mss (subscriber, #138799)

jz 1f is not an indirect branch.

Posted Jul 24, 2022 17:12 UTC (Sun) by izbyshev (guest, #107996)

But it's directly followed by ret on the fallthrough path. If the speculation window could be arbitrarily large, I don't see what would prevent the CPU from simply bypassing the RSB stuffing code by taking the fallthrough path N times, where N is the size of the RSB, and then still using the attacker-controlled indirect branch predictor. So it seems that this mitigation relies on a certain upper bound on the size of the speculation window.
And indeed, quoting the patch:

+ * The shift count might cause this to be off by one in either direction,
+ * but there is still a cushion vs. the RSB depth. The algorithm does not
+ * claim to be perfect and it can be speculated around by the CPU, but it
+ * is considered that it obfuscates the problem enough to make exploitation
+ * extremly difficult.

Posted Jul 24, 2022 14:36 UTC (Sun) by izbyshev (guest, #107996)

eIBRS parts had their own vulnerability in March (the relevant paper is here), which apparently can also be used to mount Retbleed-style attacks.

Posted Jul 26, 2022 13:47 UTC (Tue) by mss (subscriber, #138799)

Some knowledgeable people already say that:

Retpoline is not safe on Skylake-era CPUs, and we knew this before the Spectre/Meltdown embargo broke in Jan '18.

RSB stuffing relies on retpolines for Spectre v2 mitigation.

Posted Jul 26, 2022 14:47 UTC (Tue) by izbyshev (guest, #107996)

FWIW, it's vice versa: retpolines rely on RSB stuffing to make them less broken on Skylake.
But yeah, the general sentiment of that email is that apparently retpolines would be unsafe on Skylake even if RSB stuffing were added in all cases when the RSB might become empty.