Taming STIBP

By Jonathan Corbet
November 29, 2018

The Spectre class of hardware vulnerabilities was apparently so named because it can be expected to haunt us for some time. One aspect of that haunting can be seen in the fact that, nearly one year after Spectre was disclosed, the kernel is still unable to prevent one user-space process from attacking another in some situations. An attempt to provide that protection using a new x86 microcode feature called STIBP ran into trouble once its performance impact was understood; now a more nuanced approach may succeed in providing protection where it is needed without slowing down everybody else.

The Spectre variant 2 vulnerability works by polluting the CPU's branch-prediction buffer (BPB), which is used during speculative execution to make a guess about which branch(es) the code will take; see this article for a refresher on the Spectre vulnerabilities if needed. Closing this hole requires changes at a number of levels, but a fundamental part of the problem is preventing any code that may be targeted from running with a BPB that has been trained by an attacker.

There are a few ways in which this can be accomplished; in many cases the appropriate tool is a new instruction called IBPB, which flushes the BPB. Developers have been discussing the right times to execute IBPB instructions for some time, but the overall strategy is relatively straightforward: an IBPB instruction should be run whenever the CPU switches between tasks that do not trust each other. A few modes for determining when IBPB should be used have been implemented and can be selected with command-line options.

IBPB leaves one part of the problem unsolved, though. When simultaneous multithreading (SMT, or "hyperthreading") is in use, two threads of execution are, for all practical purposes, executing on the same CPU simultaneously. Those threads will share the same BPB; if one thread populates the BPB with hostile entries, the other thread will be affected by them until the next IBPB instruction is executed. In other words, SMT processors create an ongoing series of time windows in which one thread may attack another, even when IBPB is in use. Some security-sensitive users have disabled SMT entirely in response to this problem (and others), but not everybody wants to pay that cost.
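
The window described above can be illustrated with a deliberately simplified toy model. This is only a sketch: the SharedBPB class, the branch address 0x40, and the "attacker_gadget" target are hypothetical names invented for illustration, and a dictionary is nothing like a real branch predictor. It shows only the ordering problem: a flush protects the victim from what was trained before the flush, not from a sibling thread that keeps training afterward.

```python
class SharedBPB:
    """Toy stand-in for a branch-prediction buffer shared by two SMT threads."""

    def __init__(self):
        self.entries = {}  # branch address -> predicted target

    def train(self, branch, target):
        self.entries[branch] = target

    def predict(self, branch):
        return self.entries.get(branch)

    def ibpb(self):
        # Models the IBPB flush: every trained entry is discarded.
        self.entries.clear()


def victim_prediction(sibling_attacks_after_flush):
    """Return what the victim's branch at 0x40 is predicted to jump to."""
    bpb = SharedBPB()
    bpb.train(0x40, "attacker_gadget")  # attacker ran earlier on this CPU
    bpb.ibpb()                          # kernel flushes on the switch to the victim
    if sibling_attacks_after_flush:
        # The attacker now runs on the sibling thread and retrains the
        # shared buffer while the victim is executing.
        bpb.train(0x40, "attacker_gadget")
    return bpb.predict(0x40)


if __name__ == "__main__":
    print("no sibling attacker:", victim_prediction(False))
    print("sibling attacker:   ", victim_prediction(True))
```

In the first case the flush leaves nothing for the victim to inherit; in the second, the hostile entry is back before the victim's branch executes.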

That is where STIBP comes in. It is a processor mode (rather than an instruction) that, according to Intel's press materials [PDF], "prevents indirect branch predictions from being controlled by the sibling Hyperthread". This sounds like just what is needed to keep threads from attacking each other. After some discussion, STIBP support was added to the kernel during the 4.20 merge window. At that time, the decision was made to enable STIBP by default and to leave it on, so that systems would automatically be protected. This patch was subsequently backported to the 4.19.2, 4.18.19, 4.14.81, and 4.9.137 stable updates.

It turns out, however, that there is a problem with STIBP: it slows the system down significantly for many workloads. Linus Torvalds managed to keep his promise to be more polite when he described what is going on, but it must have been a strain:

Yes, Intel calls it "STIBP" and tries to make it out to be about the indirect branch predictor being per-SMT thread.

But the reason it is unacceptable is apparently because in reality it just disables indirect branch prediction entirely. So yes, *technically* it's true that that limits indirect branch prediction to just a single SMT core, but in reality it is just a "go really slow" mode.

As reports of performance regressions started rolling in, it became clear that the decision to enable STIBP by default would have to be revisited. In the resulting discussion, Torvalds said that STIBP needed to be made an optional feature that could be enabled by "crazy people" who are willing to pay the performance cost it brings. Arjan van de Ven said that both Intel and AMD recommend against enabling it by default (though Intel has apparently not actually documented that recommendation anywhere). Ingo Molnar promised to require performance measurements for any future mitigations before they can be merged. The STIBP patch was reverted in the 4.19.4, 4.14.83, and 4.9.140 stable updates; it remains in 4.18 since that series is no longer receiving updates.

As of this writing, the STIBP patch is also still in the mainline kernel, pending the finalization of a better solution. That solution is likely to take the form of this patch set posted by Thomas Gleixner, containing the work of a number of developers. STIBP is disabled on any system that does not actually have running processors with SMT enabled, even if such processors could materialize in the future. It is also disabled by default for most processes on the system, but it can be globally enabled with the spectre_v2_user=on command-line option.
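
On a running system, whether SMT is active and what the kernel reports for Spectre variant 2 can be read from sysfs. What follows is a minimal Python sketch, assuming a kernel recent enough to expose /sys/devices/system/cpu/smt/active and /sys/devices/system/cpu/vulnerabilities/spectre_v2. Neither path is mentioned in the article. The wording of the status line varies between kernel versions, so the parser only looks for a "STIBP:" field and quietly returns nothing if it is absent. The helper names are made up for this example.

```python
import re
from pathlib import Path

# Assumed sysfs locations; they may be missing on older kernels or other architectures.
SMT_ACTIVE = Path("/sys/devices/system/cpu/smt/active")
SPECTRE_V2 = Path("/sys/devices/system/cpu/vulnerabilities/spectre_v2")


def read_or_none(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def stibp_state(status):
    """Extract the value of the 'STIBP:' field from a spectre_v2 status line."""
    # Fields have been separated by commas or semicolons depending on the kernel.
    for field in re.split(r"[;,]", status):
        name, sep, value = field.strip().partition(":")
        if sep and name.strip() == "STIBP":
            return value.strip()
    return None


if __name__ == "__main__":
    smt = read_or_none(SMT_ACTIVE)
    status = read_or_none(SPECTRE_V2)
    print("SMT active:", {"1": "yes", "0": "no"}.get(smt, "unknown"))
    print("spectre_v2:", status if status is not None else "unknown")
    if status is not None:
        print("STIBP:", stibp_state(status) or "not reported")
```

If SMT is reported as inactive, the absence of STIBP in the status line is expected rather than a sign of missing protection.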

There is also a new set of values for the spectre_v2_user= command-line option that can be used to enable more control over branch prediction:

spectre_v2_user=prctl leaves both IBPB and STIBP disabled by default, but allows them to be enabled for individual processes via a new prctl() operation. In this mode, the system can generally run without the extra overhead of the Spectre mitigations, but those mitigations can be turned on for specific processes that need extra protection.
spectre_v2_user=seccomp is the same as the prctl mode, with the exception that any processes running under seccomp() will have the mitigations enabled unconditionally.
spectre_v2_user=prctl,ibpb enables IBPB globally in the system, but only enables STIBP for processes that have turned it on with prctl().
spectre_v2_user=seccomp,ibpb enables IBPB globally, and STIBP for all seccomp() processes and those that have enabled it explicitly.
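
For a process that wants to opt in under one of the prctl modes, the request might look like the following Python sketch. It assumes the speculation-control interface in the kernel's <linux/prctl.h>: the PR_GET_SPECULATION_CTRL and PR_SET_SPECULATION_CTRL operations with the PR_SPEC_INDIRECT_BRANCH selector. Those constant names and values come from the kernel headers rather than from this article and should be checked against the headers of the kernel in use. The describe() and speculation_ctrl() helpers are hypothetical. Note that, in this interface, "disable" refers to the speculation feature, so PR_SPEC_DISABLE turns the mitigation on.

```python
import ctypes
import os

# Values assumed from <linux/prctl.h>; verify against your kernel headers.
PR_GET_SPECULATION_CTRL = 52
PR_SET_SPECULATION_CTRL = 53
PR_SPEC_INDIRECT_BRANCH = 1     # selects indirect-branch (Spectre v2) control
PR_SPEC_PRCTL = 1 << 0          # per-task control is available
PR_SPEC_ENABLE = 1 << 1         # speculation enabled: mitigation off
PR_SPEC_DISABLE = 1 << 2        # speculation disabled: mitigation on
PR_SPEC_FORCE_DISABLE = 1 << 3  # as above, and cannot be undone


def describe(state):
    """Turn a PR_GET_SPECULATION_CTRL return value into a short label."""
    if state == 0:
        return "not affected"
    if state & PR_SPEC_FORCE_DISABLE:
        setting = "mitigation forced on"
    elif state & PR_SPEC_DISABLE:
        setting = "mitigation on"
    elif state & PR_SPEC_ENABLE:
        setting = "mitigation off"
    else:
        setting = "unknown"
    control = "per-task control" if state & PR_SPEC_PRCTL else "fixed at boot"
    return f"{setting} ({control})"


def speculation_ctrl(option, *args):
    """Call prctl() through libc, padding unused arguments with zeros."""
    libc = ctypes.CDLL(None, use_errno=True)
    padded = list(args) + [0] * (4 - len(args))
    ret = libc.prctl(ctypes.c_int(option), *(ctypes.c_ulong(a) for a in padded))
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


if __name__ == "__main__":
    try:
        before = speculation_ctrl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH)
        print("before:", describe(before))
        speculation_ctrl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH, PR_SPEC_DISABLE)
        after = speculation_ctrl(PR_GET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH)
        print("after: ", describe(after))
    except (OSError, AttributeError) as exc:
        # Expected on kernels without the interface or when per-task
        # control is not enabled by the boot-time mode.
        print("speculation control unavailable:", exc)
```

The set request is expected to fail when the boot-time mode does not permit per-task control, which is why the sketch reports the error instead of treating it as fatal.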

This set contains 28 individual patches; it is not a trivial thing to merge this late in the development cycle (or into a stable kernel update). That appears to be the plan, though; the patches have been pulled into the tip tree and are likely to hit the mainline in the near future. Invasive changes like this are just part of the deal in the post-Spectre world, it seems. Once the dust settles, though, Linux systems will have more complete protection against Spectre variant 2, but the cost of that protection will only need to be paid by those who feel that they need it.

"Go really slow"

Posted Nov 29, 2018 16:17 UTC (Thu) by epa (subscriber, #39769) [Link] (4 responses)

Why would anyone want to enable this "go really slow" mode (disabling branch prediction) rather than just getting the "go slightly slower" effect of disabling hyperthreading? What exactly is Intel's envisaged use for it?

Are there some obscure workloads where hyperthreading gives a big speedup, but branch prediction doesn't really matter, and moreover the hyperthreaded tasks on the same CPU don't trust each other?

"Go really slow"

Posted Nov 29, 2018 17:03 UTC (Thu) by hansendc (subscriber, #7363) [Link] (1 responses)

Remember, this is all about the *indirect* branch predictor (The I in STIBP is "Indirect"). STIBP does not disable all branch prediction. Also, the impact can vary drastically based on the microarchitecture. A processor that has hardware mitigations for Spectre v2 might enumerate support for STIBP and allow it to be enabled, but have negligible additional overhead.

Also, remember that we have very limited support for comprehending the trust relationship between any software threads running on a CPU core. We largely don't know if they trust each other or not.

So, no this isn't about obscure workloads. It's about mixing normal workloads with sensitive ones that we want to protect.

Trust...

Posted Dec 20, 2018 0:26 UTC (Thu) by john.carter (guest, #123615) [Link]

>Also, remember that we have very limited support for comprehending the trust relationship between any software threads running on a CPU core. We largely don't know if they trust each other or not.

Hmm.

I would sort of expect if ThreadA has access to /proc/pidThreadB/mem

It's trusted. (ie. it needn't rely on fancy attacks)

If it hasn't, it's not trusted.

"Go really slow"

Posted Nov 29, 2018 17:37 UTC (Thu) by iabervon (subscriber, #722) [Link] (1 responses)

I assume this is like most ISA additions: new CPUs implement them efficiently, so they're faster than the alternatives, while old CPUs implement them correctly, so programs function. It's just that the new CPUs that can split the BPB by thread and run fast with STIBP don't exist at all yet.

"Go really slow"

Posted Dec 3, 2018 15:09 UTC (Mon) by ncm (guest, #165) [Link]

... and, since older CPUs are slowed, people are motivated to replace them with new. If you can't make new CPUs faster than the old ones, sometimes it suffices to make the old CPUs slower than the new ones.

The cynicism is dizzying.

Taming STIBP

Posted Nov 29, 2018 19:38 UTC (Thu) by pbonzini (subscriber, #60935) [Link]

The fact that this patch made it through the maintainers is probably a late effect of the secrecy that covered all the Spectre/Meltdown work one year ago. It was not documented, but still known, that IBRS and STIBP were, at least for some processors, a bigger hammer than what the documentation suggested.

Taming STIBP

Posted Nov 29, 2018 23:43 UTC (Thu) by Sesse (subscriber, #53779) [Link] (6 responses)

So now we have a (sort-of) solution to the slowdowns, and it did not involve Linus calling anyone an idiot.

I will say the kernel is progressing on two fronts. Two thumbs up.

Taming STIBP

Posted Nov 30, 2018 2:22 UTC (Fri) by mangix (guest, #126006) [Link] (2 responses)

Unfortunately I miss Linus calling people idiots.

Taming STIBP

Posted Dec 4, 2018 17:24 UTC (Tue) by KaiRo (subscriber, #1987) [Link]

I do not, as IMHO calling anyone an idiot (or stupid, or similar) is simply unacceptable between decent human beings. Calling a product of their work idiotic, or saying they behave stupidly, is still bad behavior but somewhat better; it usually can and should be said with equally precise but less strong words. People themselves should never be attacked; it's enough to criticize (hopefully constructively) the actual work that you take issue with.

Taming STIBP

Posted Dec 5, 2018 21:41 UTC (Wed) by flussence (guest, #85566) [Link]

Linus was not in the habit of calling people idiots. Bad ideas and corporations, sure.

He had to stop because, among other reasons, actual idiots were misinterpreting it as celebrity endorsement of their misanthropic, diseased attitudes toward other people.

Taming STIBP

Posted Nov 30, 2018 5:47 UTC (Fri) by unixbhaskar (guest, #44758) [Link] (2 responses)

" Linus calling anyone an idiot"

I am sorry, but I think you are judging in hindsight and missing the context of those remarks in the past. There is no point in spreading FUD; please don't.

No, I am not in favor of abusing people in any form, or of anybody doing it publicly, but there was context that led to those outbursts; please take the time to read it.

Taming STIBP

Posted Nov 30, 2018 8:40 UTC (Fri) by dgm (subscriber, #49227) [Link] (1 responses)

Absolutely. The parent comment insinuated that Linus was in fact calling everybody an idiot, all the time. This is, of course, false.

The question really is, is it permissible to call someone "idiot" when you think it is deserved? If it is not, then what we are effectively doing is censoring the word "idiot". Censoring words is an idiotic (sorry, I mean much less than adequate) way to proceed, because people will find ways to "politely" be just as offensive.

Better than censoring, we should promote communication styles that add to the *content* of the conversation.

Taming STIBP

Posted Nov 30, 2018 11:12 UTC (Fri) by anselm (subscriber, #2796) [Link]

>The question really is, is it permissible to call someone "idiot" when you think it is deserved?

I don't think people actually deserve being called “idiots” if they write ill-advised code. I've done that for sure, you've probably done it on occasion, even Linus Torvalds has probably done it once or twice. If it happens, by all means call our code bad, or even idiotic if you must. But don't call us idiots. Certainly not over the Internet. If you want to call me an idiot, do it to my face or don't do it at all.

Taming STIBP

Posted Nov 30, 2018 17:40 UTC (Fri) by hmh (subscriber, #3838) [Link] (4 responses)

I am probably worrying over nothing, but when you have one SMT running in kernel mode (ring 0), and a sibling SMT running userspace code (ring 3), wouldn't STIBP be required to avoid the side-channel, since the BPB is shared?

Or is the BPB engineered in such a way that one ring cannot pollute/train/alias BPB entries for a different ring?

Taming STIBP

Posted Dec 1, 2018 9:55 UTC (Sat) by pbonzini (subscriber, #60935) [Link] (3 responses)

Ring 0 is not using the indirect branch predictor on affected processors; all indirect branches are patched at runtime to use retpolines instead.

Taming STIBP

Posted Dec 1, 2018 19:29 UTC (Sat) by hmh (subscriber, #3838) [Link]

Thanks, that explains everything!

Taming STIBP

Posted Dec 5, 2018 18:37 UTC (Wed) by raistlin (guest, #37586) [Link] (1 responses)

Yep. Or, if one does not use retpoline and uses IBRS instead (e.g., on future hardware where that may be faster, or as Xen does already in some cases), then setting IBRS when entering ring 0 prevents BTB updates done in ring 3 from affecting branches in contexts with more privilege (like ring 0). Or so I've understood. :-D

Taming STIBP

Posted Dec 6, 2018 9:59 UTC (Thu) by hmh (subscriber, #3838) [Link]

That's how I understood it as well, but...

While it is likely to be true for "enhanced IBRS" (the one you leave always on, and which doesn't exist quite yet), for the current crop of processors that are way too prone to leak fleeting images of a future past, IMHO it is an IBRS property better tested before being trusted to exist.

After all, it is all about ghosts, and ghosts are tricky ;-)

Merged

Posted Dec 1, 2018 22:28 UTC (Sat) by corbet (editor, #1) [Link]

This work has now been merged for the 4.20-rc5 release.