Meltdown/Spectre mitigation for 4.15 and beyond

By Jonathan Corbet
January 15, 2018

While some aspects of the kernel's defenses against the Meltdown and Spectre vulnerabilities were more-or-less in place when the problems were disclosed on January 3, others were less fully formed. Additionally, many of the mitigations (especially for the two Spectre variants) had not been seen in public prior to the disclosure, meaning that there was a lot of scope for discussion once they came out. Many of those discussions are slowing down, and the kernel's initial response has mostly come into focus. The 4.15 kernel will include a broad set of mitigations, while some others will have to wait for later; read on for details on where things stand.

This article from January 5 gives an overview of the defenses for all three vulnerability variants. That material will not be repeated here, so those who have not read it may want to take a quick look before proceeding.

Variant 1

On its surface, the mitigation for Spectre variant 1 (speculative bounds-check bypass) hasn't changed much. In the latest patch set from Dan Williams, the proposed nospec_array_ptr() macro has been renamed to just array_ptr():

    array_ptr(array, index, size)

Its function remains the same: it returns a pointer that is either within the given array or NULL and prevents the processor from speculating with values that are outside the array. The implementation of this macro has been the subject of some debate, though.

The initial implementation used the Intel-blessed mechanism of inserting an lfence instruction as a barrier to prevent speculation past the bounds check. But barriers are relatively expensive, so this approach generated a fair amount of concern about its performance impacts, though few actual measurements were posted. In response, another approach, which appears to have originated with Alexei Starovoitov, is being explored. Rather than disabling speculation, it tries to ensure that any speculation that does occur remains within the bounds of the array being accessed.
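As a rough illustration of the barrier-based approach, here is a minimal sketch for an array of int. The name array_ptr_lfence() is hypothetical, not the kernel's macro; the sketch assumes an x86-64 build where the _mm_lfence() intrinsic is available, and on other architectures it falls back to an ordinary memory fence that does not stop speculation at all.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__x86_64__)
#include <emmintrin.h>                 /* _mm_lfence() */
#define SPECULATION_BARRIER() _mm_lfence()
#else
/* Placeholder: a memory fence orders memory accesses but is NOT a
 * speculation barrier; it only keeps the sketch compiling elsewhere. */
#define SPECULATION_BARRIER() __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

/* Hypothetical helper: return &array[index] if index < size, else NULL. */
static int *array_ptr_lfence(int *array, size_t index, size_t size)
{
    if (index >= size)
        return NULL;
    /* Later instructions, including the caller's load through the
     * returned pointer, do not execute until everything before the
     * lfence (the bounds comparison included) has completed. */
    SPECULATION_BARRIER();
    return &array[index];
}
```

The barrier sits on the in-bounds path, after the comparison, so a mispredicted bounds check cannot feed an out-of-range index into a speculative load; that serialization is also exactly where the performance cost comes from.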

The trick is to AND the pointer value with a mask which is generated in the following way, given a constant size and a possibly hostile index:

    mask = ~(long)(index | (size - 1 - index)) >> (BITS_PER_LONG - 1);
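Wrapped in a C function, the expression looks like the sketch below. The function name index_mask() is an illustrative assumption; the sketch also assumes that right-shifting a negative long is an arithmetic shift (true for GCC and Clang, though implementation-defined in ISO C) and that size is nonzero and no larger than LONG_MAX.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Hypothetical helper: all ones if index < size, all zeros otherwise.
 * Assumes 0 < size <= LONG_MAX and an arithmetic >> on negative longs. */
static unsigned long index_mask(unsigned long index, unsigned long size)
{
    return ~(long)(index | (size - 1 - index)) >> (BITS_PER_LONG - 1);
}
```

With size 16, for example, the mask is ~0UL for indexes 0 through 15 and 0 for any index of 16 or more.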

If index is greater than or equal to size, the subtraction at the core of the macro will generate a negative number. ORing in the index as well ensures that the sign bit will be set for the largest index values, which might otherwise cause the subtraction to wrap around to a positive number. The subsequent right-shift by BITS_PER_LONG-1 has the effect of replicating the sign bit through the entire mask, yielding a mask that is either all zeros or all ones; the latter case happens when the index is too large. Finally, the "~" at the beginning inverts all the bits (since the shift is arithmetic, inverting before or after it gives the same result). The result: a mask that is all ones for a valid index, all zeros otherwise. (Note that there is an x86 implementation of this computation that comes down to two instructions.)
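The role of the OR is easiest to see with a concrete value. For size 16 and index ULONG_MAX, size - 1 - index wraps around to 16, a small positive number, so a version without the OR would wrongly accept the index. The sketch below compares the two forms; both function names are hypothetical, and it makes the same arithmetic-shift assumption as before.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* The full computation, as in the text. */
static unsigned long mask_with_or(unsigned long index, unsigned long size)
{
    return ~(long)(index | (size - 1 - index)) >> (BITS_PER_LONG - 1);
}

/* Deliberately broken variant: the OR with index has been removed. */
static unsigned long mask_without_or(unsigned long index, unsigned long size)
{
    return ~(long)(size - 1 - index) >> (BITS_PER_LONG - 1);
}
```

Because ULONG_MAX has its sign bit set, ORing it in forces the sign bit on no matter what the wrapped subtraction produced.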

The key point here is that, if the processor speculates a load from the array with a given index value, it will speculate the mask generation from the same value. That should ensure that the mask is appropriate to the index used and cause the right thing to happen when the mask is ANDed with the pointer value, heading off any attempts to force speculative loads outside of the bounds of the array. There seems to be a high level of confidence that the processor will not speculate on any of the data values used in the masking operation — speculation is almost entirely limited to control decisions, not data values. Normal speculation and reordering can continue, though, retaining the performance of the code overall.
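Putting the pieces together, a masked lookup might look like the following sketch for int arrays. array_ptr_masked() is a hypothetical name; the sketch assumes an LP64 system (so that a pointer fits in an unsigned long) and a nonzero size. The index is clamped with the mask before the pointer is formed, and the pointer is then ANDed with the same mask, so the only values it can yield are an in-bounds element address or NULL, with no conditional branch for the processor to mispredict.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

static unsigned long index_mask(unsigned long index, unsigned long size)
{
    return ~(long)(index | (size - 1 - index)) >> (BITS_PER_LONG - 1);
}

/* Hypothetical masked variant of array_ptr() for int arrays (size > 0). */
static int *array_ptr_masked(int *array, unsigned long index, unsigned long size)
{
    unsigned long mask = index_mask(index, size);
    int *p = array + (index & mask);        /* out-of-range index becomes 0 */
    return (int *)((uintptr_t)p & mask);    /* and the pointer becomes NULL */
}
```

Callers still test the result for NULL, but whichever way that branch is predicted, the pointer being speculated with is already either inside the array or NULL.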

This would appear to be an optimal solution to the problem. It seems that some developers are not yet fully comfortable with this approach, though; they worry that there is still room for the processor to mis-speculate the calculation of the mask, perhaps abetted by optimizations done by the compiler. The fact that the processor vendors have not given any assurances to the contrary gives weight to those concerns. Linus Torvalds, instead, believes that the masking approach is actually safer than using barriers. Even so, some would like to stick with the barrier-based approach. The current patches, as posted, offer both approaches, controlled by a configuration option.
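One way to address the compiler half of that worry is to make the mask opaque to the optimizer, so that it cannot prove the mask is always zero or all ones and turn the computation back into a compare-and-branch. The sketch below does this with an empty GCC/Clang inline-asm statement, the same trick used by the kernel's OPTIMIZER_HIDE_VAR() macro; whether the posted array_ptr() patches do anything similar is not covered here, and the helper names are hypothetical. It does nothing about the separate question of whether the processor itself could mis-speculate the mask.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_LONG (sizeof(long) * CHAR_BIT)

/* Empty asm that claims to modify var: the compiler must assume any
 * value afterward, so it cannot reason its way back to a branch. */
#define HIDE_FROM_OPTIMIZER(var) __asm__("" : "+r"(var))

static unsigned long index_mask_opaque(unsigned long index, unsigned long size)
{
    unsigned long mask = ~(long)(index | (size - 1 - index)) >> (BITS_PER_LONG - 1);
    HIDE_FROM_OPTIMIZER(mask);
    return mask;
}

/* Out-of-range indexes are clamped to 0, so the load stays in bounds. */
static int load_masked(const int *array, unsigned long index, unsigned long size)
{
    return array[index & index_mask_opaque(index, size)];
}
```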

The other significant problem — finding the places where this macro needs to be used — remains unsolved. The current patch set leaves out most of the locations that had been protected in previous versions, since a number of them proved to be controversial. As of this writing, the variant-1 defenses have not yet found their way into the mainline, but that could yet change in this rather atypical development cycle.

Variant 2

Variant 2 (poisoning of the branch prediction buffer) is primarily protected against using the "retpoline" mechanism, which replaces indirect jumps and calls with a dance that defeats speculation. This mechanism was merged into the mainline for the 4.15-rc8 release — a late date indeed for a change of this magnitude — and there are still a few small pieces missing. Given the short time involved and the number of questions needing answers, though, it would not have been possible to get this work done any sooner.
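To make the "dance" concrete, here is a user-space sketch of a register-based retpoline thunk for x86-64 ELF targets, written as top-level assembly in a C file. The symbol names retpoline_thunk_rax and call_via_retpoline are hypothetical (GCC's own thunk naming was still being settled at the time); on other targets the sketch falls back to a plain, unprotected indirect call. Note that overwriting the return address this way is incompatible with a hardware shadow stack.

```c
#include <assert.h>

/* Call fn() through a retpoline instead of an indirect call/jmp. */
int call_via_retpoline(int (*fn)(void));

#if defined(__x86_64__) && defined(__ELF__)
__asm__(
    ".pushsection .text\n"
    "retpoline_thunk_rax:\n"           /* jump to *%rax without jmp *%rax */
    "    call 2f\n"                    /* pushes the address of label 1 */
    "1:  pause\n"                      /* the RSB predicts a return here, */
    "    lfence\n"                     /* so speculation just spins */
    "    jmp 1b\n"
    "2:  mov %rax, (%rsp)\n"           /* replace return address with target */
    "    ret\n"                        /* architecturally goes to *%rax */
    ".globl call_via_retpoline\n"
    ".type call_via_retpoline, @function\n"
    "call_via_retpoline:\n"
    "    mov %rdi, %rax\n"             /* first argument = target */
    "    jmp retpoline_thunk_rax\n"    /* tail call: fn returns to our caller */
    ".size call_via_retpoline, .-call_via_retpoline\n"
    ".popsection\n");
#else
/* Other targets: plain indirect call, no speculation protection. */
int call_via_retpoline(int (*fn)(void)) { return fn(); }
#endif

static int answer(void) { return 42; }
```

The return predictor's guess sends speculation into the harmless pause/lfence loop, while the architectural ret consumes the overwritten stack slot and lands on the target. Because call_via_retpoline jumps into the thunk rather than calling it, answer() returns directly to whoever called call_via_retpoline.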

There were various discussions about implementation details, and ongoing uncertainty over whether retpolines are a sufficient protection against variant 2 on Intel Skylake processors. The problem on Skylake has to do with another data structure internal to the processor: the return stack buffer (RSB). Normally, this buffer is used to predict the address used in a "return" instruction, but there are situations where this buffer can run out of entries. That generally happens when the call stack is made deeper without the processor knowing about it; just about any sort of context switch can cause that to happen, for example. On a Skylake processor, an RSB underflow will cause a fallback to the branch prediction buffer instead, turning any "return" into a possible attack point.

It may also be possible, on some other processors, for user space to populate the RSB with hostile values, once again enabling the wrong kind of speculation. The answer in either case is the same: stuff the RSB full of well-known values in places (like context switches) where things could go wrong. The RSB-stuffing patches have been circulating for a while; they have not yet been merged but that should happen in the near future.
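RSB stuffing can be sketched the same way. The routine below, with the hypothetical name stuff_rsb(), is loosely modeled on the approach taken by the RSB-stuffing patches: it performs 32 calls whose return addresses each point at a pause/lfence trap, then discards those addresses from the stack with a single add, leaving the RSB full of benign entries. The count of 32 is an assumption intended to exceed the RSB depth of the processors involved; the sketch is x86-64/ELF only and, like the retpoline, would trip a hardware shadow stack.

```c
#include <assert.h>

/* Fill the return stack buffer with harmless entries. */
void stuff_rsb(void);

#if defined(__x86_64__) && defined(__ELF__)
__asm__(
    ".pushsection .text\n"
    ".globl stuff_rsb\n"
    ".type stuff_rsb, @function\n"
    "stuff_rsb:\n"
    "    mov $16, %ecx\n"              /* 16 loops x 2 calls = 32 entries */
    "1:  call 2f\n"
    "3:  pause\n"                      /* trap for any speculated return */
    "    lfence\n"
    "    jmp 3b\n"
    "2:  call 4f\n"
    "5:  pause\n"
    "    lfence\n"
    "    jmp 5b\n"
    "4:  dec %ecx\n"
    "    jnz 1b\n"
    "    add $256, %rsp\n"             /* drop the 32 pushed return addresses */
    "    ret\n"
    ".size stuff_rsb, .-stuff_rsb\n"
    ".popsection\n");
#else
void stuff_rsb(void) { /* nothing to do in this sketch */ }
#endif
```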

One other issue with retpolines remains somewhat unresolved, though: using them requires support from the compiler, and almost nobody has a compiler with that support available. Support for GCC was only posted by H.J. Lu on January 7; those patches were then subjected to a fair amount of ... discussion ... on the details that threatened to delay their merging indefinitely. Richard Biener finally jumped in to request that the process be expedited a bit:

And I'd also like people not to bikeshed too much on this given we're in the situation of having exploitable kernels around for which we need a cooperating compiler. So during the time we bikeshed this (rather than reviewing the actual patches) we have to "backport" the current non-upstream state anyway to deliver fixed kernels to our customer.

That seems to have been enough to at least bring about agreement that this feature would be requested with the -mindirect-branch=thunk-extern compiler option. The GCC developers did force a name change for the retpoline thunk itself, though, breaking the existing kernel patches and making the compiler (when released) incompatible with the version that distributors have been using to create fixed kernels thus far. If that change sticks, it will require more 4.15 patches in the immediate future.

Meanwhile, the IBRS feature is being added to the microcode for some processors to defend against variant-2 attacks, but the degree to which the kernel will use it is still unclear. Setting the IBRS bit in a model-specific register acts as a sort of barrier, preventing bad values placed in the branch prediction buffer from being used when speculating the execution of code in the kernel. IBRS is generally considered inferior to retpolines because it has a much higher performance impact, though that cost is lower on the newest CPUs. An extensive mailing-list discussion made it clear that few people truly understand how IBRS is meant to work or when it should be used. A rather frustrated series of questions from Thomas Gleixner elicited some answers, but only after a considerable amount of contradictory information had been passed around.
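Whether a processor's microcode provides these controls can be checked from user space. The sketch below relies on the documented enumeration: CPUID leaf 7, subleaf 0, EDX bit 26 indicates support for IBRS and IBPB, which are bit 0 of the IA32_SPEC_CTRL MSR (0x48) and bit 0 of the IA32_PRED_CMD MSR (0x49), respectively. Actually setting those bits requires privileged MSR writes and is the kernel's job; the function name is hypothetical, and the sketch simply reports false on non-x86 builds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Hypothetical helper: does CPUID advertise the IBRS/IBPB controls? */
static bool cpu_has_ibrs_ibpb(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (__get_cpuid_max(0, NULL) < 7)   /* leaf 7 not available */
        return false;
    __cpuid_count(7, 0, eax, ebx, ecx, edx);
    (void)eax; (void)ebx; (void)ecx;
    return (edx >> 26) & 1;             /* EDX bit 26: IBRS and IBPB */
}
#else
static bool cpu_has_ibrs_ibpb(void) { return false; }
#endif
```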

Work on IBRS seems to have slowed for now, though, perhaps because retpolines are now seen as being good enough for Skylake processors — for now, at least. As Gleixner put it, the IBRS question can now be resolved in a non-emergency mode:

The further RSB vs. IBRS discussion has to be settled in the way we normally work. We need full documentation, proper working micro code and actual comparisons of the two approaches vs. performance, coverage of attack vectors and code complexity/ugliness.

The remaining concern on Skylake processors would appear to be system-management interrupts (SMIs), which can cause unprotected code to be run in kernel context. There does not appear to be a consensus that SMIs are exploitable in the real world, though, and there are no known proofs of concept. Still, David Woodhouse has stated his intent to eventually have Skylake processors use IBRS by default, with retpolines as a boot-time option. But, as he pointed out, this outcome has been slowed by the lack of anybody pushing the IBRS patches forward at an acceptable rate. 4.15 looks set to release without IBRS support, but it will almost certainly show up in the relatively near future.

Variant 3

Variant 3 (the "Meltdown" vulnerability) allows a user-space process to read the contents of kernel memory on a vulnerable system. The defense against this problem is kernel page-table isolation (KPTI), which has been developed in public since early November. It was merged for 4.15-rc5 and has remained mostly unchanged since then, if one doesn't count a rather large number of bug fixes. Such a fundamental memory-management change was never going to be without glitches, but they are being found and dealt with, one at a time.

The biggest upcoming change to KPTI is certainly the ability to control its use on a per-process basis. KPTI is an expensive mitigation, with overheads of 30% or more reported for some specific workloads (though most workloads will not see an impact of that magnitude). The nopti command-line option can be used to disable KPTI entirely, but there are likely to be settings where an administrator wishes to exempt specific performance-critical processes from KPTI while retaining that protection for the system as a whole. Willy Tarreau has been working on a patch set to provide that capability, but there are some remaining differences of opinion on how it should work.
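To see whether KPTI is actually in effect on a given system (after booting with nopti, for example), a program can read the kernel's CPU-vulnerability status file, assuming the running kernel provides /sys/devices/system/cpu/vulnerabilities/meltdown, an interface added during this same round of work. The helper name is hypothetical; on an affected machine with KPTI enabled the file typically reads "Mitigation: PTI", while "Vulnerable" and "Not affected" are other possible values.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the Meltdown status line into buf.
 * Returns 0 on success, -1 if the file is missing (older kernels,
 * non-Linux systems) or unreadable. */
static int meltdown_status(char *buf, size_t len)
{
    FILE *f = fopen("/sys/devices/system/cpu/vulnerabilities/meltdown", "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return 0;
}
```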

Tarreau's patch set adds a couple of new operations to the arch_prctl() system call: ARCH_DISABLE_PTI_NOW and ARCH_DISABLE_PTI_NEXT. The former immediately disables KPTI for the calling process, while the latter merely sets a flag that causes KPTI to be disabled for the process after it makes a call to execve(). The CAP_SYS_RAWIO capability is required to be able to disable KPTI. There is also a sysctl knob (/proc/sys/vm/pti_adjust) that can be used to disable these operations, either temporarily or permanently.

Many aspects of this interface have been discussed without a whole lot of conclusions. The current proposal works at the process level, for example; it is not possible for different threads within a process to have a different KPTI state. Some developers, though, think that thread-level control makes more sense. Another point of discussion was whether both the "now" and "next" modes are needed, but there is, naturally, disagreement over which of the two should go. Linus Torvalds was adamant that the "next" mode is the right one, because the natural place to disable KPTI is in an external wrapper program:

Processes should never say "I'm so important that I'm disabling PTI". That's crazy talk, and wrong. It's wrong for all the usual reasons - everybody always thinks that _their_ own work is so important and bug-free, and that things like PTI are about protecting all those other incompetent people.

Instead, he said, the decision to disable KPTI should be made by an external program run by the administrator. As might be expected for this group, the first use case for such a wrapper would be a nopti wrapper that could be used to run kernel builds without KPTI.
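A nopti-style wrapper of the kind Torvalds describes might look roughly like this. It is a sketch only: the per-process KPTI patches were not merged, the numeric value given for ARCH_DISABLE_PTI_NEXT here is invented, and the assumption that the operation is reached through arch_prctl() on x86-64 follows from its ARCH_ prefix. On a mainline kernel the request simply fails (typically with EINVAL), and this version then execs the target with KPTI still enabled rather than refusing to run.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

/* HYPOTHETICAL value: the proposed operation never reached mainline
 * headers, so this number is invented purely for illustration. */
#define ARCH_DISABLE_PTI_NEXT_HYPOTHETICAL 0x7fff0001

/* Ask for KPTI to be dropped at this process's next execve().
 * Returns 0 on success, or -1 with errno set (e.g. EINVAL on kernels
 * without the patches, ENOSYS where arch_prctl() is unavailable). */
static int request_nopti_after_exec(void)
{
#if defined(__x86_64__) && defined(SYS_arch_prctl)
    return (int)syscall(SYS_arch_prctl, ARCH_DISABLE_PTI_NEXT_HYPOTHETICAL, 0UL);
#else
    errno = ENOSYS;
    return -1;
#endif
}

/* Body of a "nopti" wrapper: set the flag, then exec the real program. */
static int run_without_pti(char *const argv[])
{
    if (request_nopti_after_exec() != 0)
        perror("nopti: request failed, running with KPTI");
    execvp(argv[0], argv);
    perror("nopti: execvp");
    return 127;
}
```

A real wrapper's main() would just return run_without_pti(argv + 1), so that the administrator, not the program being run, decides which processes go without KPTI.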

Andy Lutomirski has proposed that a new capability (CAP_DISABLE_PTI) should control access to this functionality rather than CAP_SYS_RAWIO. That would make a lot of the existing privilege checks just work without the need to add a bunch of new infrastructure. The idea is somewhat controversial, though, and it's not clear whether it will make it into the final version of this feature.

All told, there are a number of unresolved issues around how per-process KPTI control should work, even though everybody involved seems to agree that the feature itself should exist. The 4.15 kernel will be released without the per-process KPTI feature, and it would be surprising to see it get into 4.16 as well.

In conclusion

After all of this work, it would appear that the 4.15 kernel will be released with fairly complete Meltdown and Spectre protection, though a number of sharp edges are sure to remain. But, quoting Gleixner again, the time has come to slow down a bit:

Surely we all know there is room for improvements, but we also have reached a state where the remaining issues are not longer to be treated in full emergency and panic mode. We're good now, but not perfect. [...]

We all are exhausted and at our limits and I think we can agree that having the most problematic stuff covered is the right point to calm down and put the heads back on the chickens. Take a break and have a few drinks at least over the weekend!

Those of us who know Gleixner can be fairly well assured that he will have taken his own advice.

All told, this set of vulnerabilities has been an intense death march for a number of kernel developers, most (or all) of whom were not informed of the problems until months after their discovery. Many of them were doing this work as part of their normal job, but others jumped in just because the work needed to be done. All of them were working to address issues that were not of their making in any way. As a result of their effort, Linux systems are reasonably well protected from these problems. We are all very much in their debt.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 17:24 UTC (Mon) by cesarb (subscriber, #6266)

> A rather frustrated series of questions from Thomas Gleixner elicited some answers, but only after a considerable amount of contradictory information had been passed around.

This comment from Arjan Van De Ven on that subthread is worthy of a Quote of the Week:

> I spent the better part of the last 6 months in dungeons with CPU designers trying to to figure out what we could and could not do. I'm pretty darn sure I know the details.

https://www.mail-archive.com/linux-kernel@vger.kernel.org...

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 2:41 UTC (Tue) by jcm (subscriber, #18262)

That's funny. Whenever I've got a problem to discuss on arm servers, I just text the architects involved at the companies building them because we're one big happy family. There's no need to go to a dungeon or through layers of corporate abstraction.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 9:02 UTC (Tue) by marcH (subscriber, #57642)

Good thing you weren't aware of this issue then and didn't leak it with random texts.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 17:47 UTC (Mon) by mb (subscriber, #50428)

Huge thanks to everyone involved!
That also includes the LWN team.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 19:02 UTC (Mon) by prometheanfire (subscriber, #65683)

It was my understanding that the Skylake problem was also hitting Kaby Lake and Coffee Lake; is this not true?

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 19:13 UTC (Mon) by joey (guest, #328)

Worth noting that kernel mitigations to Spectre don't protect user-space processes from dumping memory belonging to other users' processes. We're only at the very beginning of an exceedingly long road to getting user-space Linux fixed.

Perhaps this won't be a big deal in practice; after all, once an attacker has gotten into the machine as one user, they generally only need to wait to exploit a future security hole to fully own the machine.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 20:09 UTC (Mon) by jcm (subscriber, #18262)

This is why a complete fix requires either turning off the indirect predictor in all of userspace (we support this in RHEL, for that reason - we thought about this) or retpolining the world, which requires a mass rebuild. In our case, this will happen in Fedora, though the exact timing of that is subject to upstream gcc work.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 20:29 UTC (Mon) by Otus (subscriber, #67685)

> turning off the indirect predictor in all of userspace

Is there something in current processors that allows turning off the branch predictor or what does this mean?

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 20:41 UTC (Mon) by jcm (subscriber, #18262)

This (disabling the indirect predictor logic) is what IBRS achieves (restricts indirect branch prediction). But the current patches only do this for entry into the kernel by default. The other half is IBPB, which you can think of as a flush of the predictor state so that when you do a world switch you can flush the state. But there are possible gaps. I'll leave it to the reader to figure them out.

Those IBRS/IBPB interfaces are special MSRs because they're not real MSRs. They're hacks in the microcode to make what look like new MSRs (hence the always-write logic requirement). On other arches, we have more direct control over CPU chicken bits that control indirect predictors and we can whack those directly. But we still have to account for userspace-to-userspace attacks in general. The example in the Spectre papers focuses on the latter anyway.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 20:49 UTC (Mon) by Otus (subscriber, #67685)

Ah, so prediction is not disabled, just prevented from interacting between processes?

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 5:29 UTC (Wed) by paulj (subscriber, #341)

So what does IBRS do exactly (as was also the question in the linux-kernel thread)?

Is it /disabling/ IBP CPU logic? In which case, Andrea Arcangeli's belief that setting it once is sufficient surely must be true?

Or have Intel, with the microcode update, managed to add some bits of context (privilege level, address space?) to the branch-prediction table, and setting this IBRS pseudo-MSR is needed to get the CPU to update its view of the context in some way, so that IBRS must be set on every security relevant context change? Which would be more in-line with David and Arjan's views in that thread?

The lack of documentation and explanation is less than ideal. The security issues are now public. It doesn't make sense to try to 'manage' what information is made public about any mitigation features - it can only hamper the speed at which any flaws/issues with those mitigations are uncovered.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 11:08 UTC (Wed) by dwmw2 (subscriber, #2063)

Intel documentation is here. I haven't seen public AMD documentation yet (they have IBPB but not IBRS).

No, it isn't just disabling branch prediction completely. I think that what they could achieve in the microcode hacks was fairly limited. So in some ways setting IBRS is a partial barrier, and flushes certain predictions from the store. But leaving IBRS set also makes things go slow, which implies that it's doing some checking at all times. The details are opaque and will vary from generation to generation.

Thankfully we don't really need IBRS except on Skylake (where it doesn't suck quite so much anyway).

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 19, 2018 12:43 UTC (Fri) by anton (subscriber, #25547)

Unfortunately, the Intel documentation is quite abstract. It does not tell us what these things actually do (probably because that's different for different generations); instead it tries to specify how to use them and/or what guarantees these things give (but even that is not very clear).

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 21:20 UTC (Mon) by epa (subscriber, #39769)

Can't it be fixed in the same way the attack on the kernel was fixed: by unmapping the memory? So each user space process would get its own memory mappings.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 23:47 UTC (Mon) by andresfreund (subscriber, #69562)

That's the fix for Meltdown, not Spectre, which is the danger talked about in the subthread.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 13:31 UTC (Tue) by epa (subscriber, #39769)

I was thinking of the "Attacks using native code" section of the Spectre paper. As I understand it, that variant of Spectre relies on the victim process being in the same address space as the attacker. You can speculatively execute arbitrary instruction sequences from the victim process and so read its memory space. This attack is prevented if different userspace programs don't share an address space.

There are other Spectre variants but I didn't think they were quite as powerful; they could not usually be used for "dumping memory belonging to other users' processes" as the first post in this thread talks about.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 14:52 UTC (Tue) by excors (subscriber, #95769)

As I understand it, the Spectre paper's "Example Implementation on Windows" has the victim process and attacker process *not* sharing an address space. The attacker process puts its own code at virtual addresses that coincide with interesting code in the victim, and uses that to train the branch predictor. The branch predictor only cares about virtual addresses (and maybe not even all their bits), and doesn't care about address spaces or physical addresses. The CPU switches to the victim's address space when running the victim process, but it keeps the attacker's branch predictor state, so the attacker can control what code is run (speculatively) by the victim.

Then the attacker chooses to make the victim run code that transmits the victim's memory through some covert channel - mainly timing of reads to cache lines that are shared between the two processes, but there are other variations of covert channels that don't rely on shared cache lines at all (e.g. a read will evict unrelated data at different addresses that happen to map onto the same cache set).

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 14:59 UTC (Tue) by epa (subscriber, #39769)

Ah, I see. So the only way to prevent that by changing the address space would be to put each user process at its own set of virtual addresses that do not overlap at all with virtual addresses used by other processes. Even on a 64-bit system this would limit the number of user processes.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 15:26 UTC (Tue) by matthias (subscriber, #94967)

Would it be enough to put the text segments into non-overlapping regions? Or is it possible to train the branch predictor to jump to virtual addresses that are marked non-executable in the attacker's context? After all, such jumps can never really succeed.

If the attacker can only train the branch prediction to jump to addresses that are not mapped by the victim, there should be no information leak.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 15:44 UTC (Tue) by excors (subscriber, #95769)

Haswell reportedly only uses the lower ~31 bits of addresses for branch prediction, so you'd have to make sure processes' address spaces don't overlap modulo 2^31, which is impractical. (And other CPUs could use even fewer bits.)
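To make that arithmetic concrete, here is a toy model, not a description of any real predictor's indexing, in which entries are keyed on the low 31 bits of the branch address, as reported for Haswell above. The names are hypothetical. Two branches alias whenever their addresses are congruent modulo 2^31, regardless of which process they belong to, which is why separating address spaces alone does not prevent cross-process training.

```c
#include <assert.h>
#include <stdint.h>

/* Toy assumption: the predictor keys entries on the low 31 address bits. */
#define PREDICTOR_ADDR_BITS 31

static uint64_t predictor_key(uint64_t branch_addr)
{
    return branch_addr & ((UINT64_C(1) << PREDICTOR_ADDR_BITS) - 1);
}

/* Two branches share a predictor entry if their keys match, even when
 * their full virtual addresses (or their processes) differ. */
static int predictor_aliases(uint64_t a, uint64_t b)
{
    return predictor_key(a) == predictor_key(b);
}
```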

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 20:14 UTC (Tue) by mjthayer (guest, #39183)

> Ah, I see. So the only way to prevent that by changing the address space would be to put each user process at its own set of virtual addresses that do not overlap at all with virtual addresses used by other processes.

Or could one empty the branch predictor between process switches, possibly using a similar training mechanism?

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 20:18 UTC (Tue) by mjthayer (guest, #39183)

And yes, I know that is roughly what IBPB does where it is available.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 20:23 UTC (Tue) by mjthayer (guest, #39183)

Regarding the fix for Meltdown, I was wondering whether emptying the TLB on context switches and rewriting the page tables in memory so that the kernel parts were not accessible to user processes would also have worked. Seems potentially slightly less invasive than the duplicate page table thing, assuming the page table structure makes it doable.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 7:12 UTC (Wed) by mjthayer (guest, #39183)

Though it would probably break PCIDs on systems which have them. And on second thought, I don't see much benefit over the duplicate page tables.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 5:36 UTC (Wed) by foom (subscriber, #14868)

Given the discussion about retpoline being somewhat ineffective on Skylake, it seems like it may not actually even be sufficient to mass-rebuild the entire userspace with retpoline?

Reading https://software.intel.com/sites/default/files/managed/c5... it looks like there's not really a "turn off the indirect predictor in all of userspace" option, either. (Although I could imagine IBRS might have that effect on some CPUs.)

And even though IBRS ("Indirect Branch Restricted Speculation") is initially described like a mode switch, the intel doc also says it actually needs to be used as a command at every privilege transition for some of the CPUs (those without the "Enhanced IBRS" feature), and even then it doesn't protect code running at a given privilege level from other code on the same core at the same privilege level but in a different process. So you gotta also use IBPB.

And I have to imagine that IBRS and IBPB must be pretty slow given all the work being put into retpoline.

But if we need retpoline on old processors, and can't depend on retpoline on new processors...does userspace code running on skylake need to take the hit of IBRS-always, and IBPB, *and* retpoline (because presumably there won't be multiple variants of all the distros, built with and without retpoline)? How slow is that gonna be?

(Too bad user code can't use the kernel boot-time alternatives patching trick to eliminate the retpoline manipulation where it's not useful. Well, I guess it could via the vsyscall stuff, but you'd need to do an indirect jump through PLT to get there...which kinda ruins the point.)

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 19:59 UTC (Mon) by jcm (subscriber, #18262)

On your comments around variant 1...

Value prediction is still relatively new to many cores, and often limited to “guess zero” or “guess one” and so on. There is a lot of research in this area, however, and I would not think it safe to assume prediction is limited to control flow going forward.

Which is where arm did the sensible thing in predefining CSDB. On arm, a conditional select (which was thought of ahead of the Intel version you document) is followed by a new instruction that limits speculation, but isn’t a giant serializing fence.

The correct thing for x86 to do is to add an instruction to future ISAs that limits speculation. Others already realized this is the best way forward.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 20:05 UTC (Mon) by jcm (subscriber, #18262)

On variant 2, it’s probably also worth noting...

Various other arches will do branch predictor restriction/hardening. An example of work in progress is the generic interface work posted by Will for arm.

On x86, they have a new feature (CET) which aims to manage the control flow through the RSB and it will break with retpolines. CET should be delayed until there is no chance of needing retpolines and not being able to deploy them. Calls for that to happen are on the list.

CET

Posted Jan 15, 2018 20:13 UTC (Mon) by corbet (editor, #1)

My understanding is that the issues with CET and retpolines have been worked out, which is why I've not even mentioned it in the articles.

CET

Posted Jan 16, 2018 3:42 UTC (Tue) by roc (subscriber, #30627)

I read that on LKML, but I wish someone would explain it in detail, since it seems like retpolines are pretty much the exact thing CET should be protecting against.

CET

Posted Jan 16, 2018 11:36 UTC (Tue) by dwmw2 (subscriber, #2063)

Any processor with CET will always have the new IBRS_ALL feature too, where you just set the IBRS bit once and forget about it, and it's a "do the right thing" option that is faster than any of the options right now, even retpoline.

(I don't really understand why Intel want the MSR write, in fact, and why they don't just advertise it with a single CPUID bit and leave it at that.)

So, given that with IBRS_ALL we'll ALTERNATIVE away the retpoline thunks to a simple 'jmp *%reg', that means that CET will always be OK.

FWIW this is precisely why we changed the retpoline thunks from being ret-equivalent (with the target address on the stack), to taking their argument in a register. When it was on the stack, there was no way to make it CET-compatible.

CET

Posted Jan 16, 2018 22:47 UTC (Tue) by roc (subscriber, #30627)

Thanks! That explanation is very helpful. This has some interesting implications.

This implies that a kernel must only enable CET in userspace if it also enables IBRS_ALL successfully.

ld.so, or something else, must reliably detect IBRS_ALL or CET and use the correct thunk.

More worrying, all JITs and handwritten assembly must be modified to detect IBRS_ALL or CET and dynamically switch retpolines on/off. Hardcoded retpolines will not work with CET.

How is userspace going to detect whether to use retpolines? Are there going to be syscalls to detect IBRS_ALL and/or CET? Or some other technique?

CET

Posted Jan 17, 2018 7:33 UTC (Wed) by dwmw2 (subscriber, #2063)

Userspace? What's that? Er... yes, I confess I hadn't got much past getting the kernel to do the right thing.

I suspect we'll find JITs and handwritten assembly that are going to need fixing for CET anyway. But this one is *conditional*. Maybe a flag in the auxvec to say we have CET or IBRS_ALL.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 21:02 UTC (Mon) by shawn.webb (guest, #118686)

> One other issue with retpolines remains somewhat unresolved, though: using them requires support from the compiler, and almost nobody has a compiler with that support available.

FreeBSD's decision, years back, to migrate to the LLVM toolchain for its compiler (and, eventually, its linker on amd64) gives it this support. In HardenedBSD, we've already switched amd64 to use ld.lld as the default linker. We're testing a full OS (world + kernel + packages) with retpoline right now, and will likely debut retpoline in HardenedBSD 12-CURRENT/amd64 late this week or early next.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 21:22 UTC (Mon) by smoogen (subscriber, #97)

The website for that is https://hardenedbsd.org/. I was not aware of this version of BSD until this post, so good luck, and good going.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 21:21 UTC (Mon) by sthibaul (✭ supporter ✭, #54477)

> speculation is almost entirely limited to control decisions, not data values.

Well, there *are* some people working on value speculation, with very interesting speedups; see e.g. http://ieeexplore.ieee.org/abstract/document/6835952/ (PDF available at https://hal-univ-rennes1.archives-ouvertes.fr/docs/00/90/...). But I do hope that the speculated read would be coherent with the speculated value, so that the mask approach remains correct.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 22:55 UTC (Mon) by roc (subscriber, #30627)

> After all of this work, it would appear that the 4.15 kernel will be released with fairly complete Meltdown and Spectre protection

Isn't this an overstatement considering that thoroughly protecting the kernel against Spectre variant 1 requires using the new array-index macro everywhere it's needed, and no-one actually knows yet how to determine where it's needed?

Another source of confusion is that some people will interpret the above statement to mean that userspace is protected when running on the right kernel, when that is definitely not the case. This is similar to how the cloud providers quickly announced "we've fixed everything in our cloud!" when in fact they only fixed specific hypervisor-related issues and customers still have a ton of work to do.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 15, 2018 23:39 UTC (Mon) by pbonzini (subscriber, #60935)

There is work on using static analysis to place the new macro. But yes, at this point it's a whack-a-mole game.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 0:18 UTC (Tue) by corbet (editor, #1)

Yeah, OK, that probably wasn't the best thing I ever wrote. In my poor defense I'll say that I was awfully tired by the time I got to the end of all that stuff...

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 2:20 UTC (Tue) by roc (subscriber, #30627)

Sorry! Amidst all the confusion, overall you're doing a great job. Thanks!

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 11:41 UTC (Tue) by Sesse (subscriber, #53779)

What are those two x86 instructions? I can't figure it out offhand, and neither can GCC or Clang.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 16, 2018 11:48 UTC (Tue) by Sesse (subscriber, #53779)

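Ah, to answer my own question, it's in the link: cmp rax, rbx ; sbb rax, rax. CMP computes rax - rbx and sets the carry flag if that subtraction borrows, i.e. if rax < rbx as unsigned values (CMP and SUB are closely related instructions on x86). SBB rax, rax then computes rax - rax - CF, leaving 0 if the carry flag is clear and all-ones (-1) if it is set: a mask that can be ANDed with the index without a branch.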

Skylake

Posted Jan 16, 2018 11:44 UTC (Tue) by dwmw2 (subscriber, #2063)

"The remaining concern on Skylake processors would appear to be system-management interrupts (SMIs), which can cause unprotected code to be run in kernel context. There does not appear to be a consensus that SMIs are exploitable in the real world, though, and no known proofs of the concept."

The SMI thing is just one of a litany of conditions which, on Skylake+, may cause the RSB to underflow, so that dangerous predictions are taken from the branch predictor instead. Many of those conditions would, individually, have caused me to throw my toys out of the pram and say "Skylake+ gets IBRS". That includes the one about deep call stacks, 16 or more calls in depth. How long did we spend playing whack-a-mole trying to make 4KiB stacks work? And that was when we got a nice clean crash with a stack overflow, not a silent vulnerability...

Thunk names

Posted Jan 16, 2018 12:28 UTC (Tue) by dwmw2 (subscriber, #2063)

"The GCC developers did force a name change for the retpoline thunk itself, though, breaking the existing kernel patches and making the compiler (when released) incompatible with the version that distributors have been using to create fixed kernels thus far. If that change sticks, it will require more 4.15 patches in the immediate future."

Yeah, that was a fun game on Sunday night. We did manage to avoid that though, in the end: https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01300.html

As a user, I would like a cgroup-wide way to disable KPTI

Posted Jan 16, 2018 12:44 UTC (Tue) by giggls (subscriber, #48434)

That way one could, e.g., disable it for a PostgreSQL server using its systemd service file.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 1:23 UTC (Wed) by roc (subscriber, #30627)

Microsoft has released a post about MSVC mitigation for Spectre variant 1: https://blogs.msdn.microsoft.com/vcblog/2018/01/15/spectr...
Apparently they are not going to recommend or support retpolines, in user space at least. I wonder why they've diverged from the Linux community here.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 1:55 UTC (Wed) by corbet (editor, #1)

That's not too surprising, since retpolines are a defense against variant 2... Am I missing something here?

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 8:59 UTC (Wed) by roc (subscriber, #30627)

For mitigations of variant 2, the post links to another post: https://cloudblogs.microsoft.com/microsoftsecure/2018/01/... which is all about system updates, from which we can conclude that user-space code changes for variant 2 are not forthcoming for MSVC users. At least for now.

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 11:13 UTC (Wed) by cesarb (subscriber, #6266)

That might mean Microsoft believes that the microcode and kernel changes are enough to protect against variant 2 (they probably consider systems where the microcode won't be upgraded a lost cause).

If Microsoft isn't going to use retpolines in user space, won't that lead to a performance disadvantage for Linux distributions that decide to recompile everything with retpolines? And what about the variant 1 mitigation they mentioned: will GCC and LLVM implement something like it?

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 16:43 UTC (Wed) by foom (subscriber, #14868)

The currently-proposed equivalent to this on the Linux side is the GCC and LLVM patches adding a new "__builtin_speculation_safe_load" intrinsic (e.g. https://reviews.llvm.org/D41760, https://reviews.llvm.org/D41761, https://gcc.gnu.org/ml/gcc-patches/2018-01/msg01546.html).

Meltdown/Spectre mitigation for 4.15 and beyond

Posted Jan 17, 2018 21:28 UTC (Wed) by roc (subscriber, #30627)

I don't think those are equivalent at all, since they require manual changes and the Microsoft approach is automatic (with pros and cons accordingly, as I wrote here: http://robert.ocallahan.org/2018/01/long-term-consequence...).