On that Spectre mitigations discussion

[Posted January 23, 2018 by corbet]

By now, almost everybody has probably seen the press coverage of Linus Torvalds's remarks about one of the patches addressing Spectre variant 2. Less noted, but much more informative, is David Woodhouse's response on why those patches are the way they are. "That's why my initial idea, as implemented in this RFC patchset, was to stick with IBRS on Skylake, and use retpoline everywhere else. I'll give you 'garbage patches', but they weren't being 'just mindlessly sent around'. If we're going to drop IBRS support and accept the caveats, then let's do it as a conscious decision having seen what it would look like, not just drop it quietly because poor Davey is too scared that Linus might shout at him again."

On that Spectre mitigations discussion

Posted Jan 23, 2018 20:52 UTC (Tue) by obrakmann (subscriber, #38108) [Link] (22 responses)

Hang on, I missed this. What's special about Skylake CPUs?

On that Spectre mitigations discussion

Posted Jan 23, 2018 21:04 UTC (Tue) by nix (subscriber, #2304) [Link] (7 responses)

AIUI, they have machinery to predict RETs for use in speculation (which are, after all, a specialized sort of indirect branch) using a shortish in-CPU stack. If that stack is exhausted they fall back to predicting the RET target using, uh, the indirect branch buffer that is already known to be compromizable. Whoops. This means that retpolines are sometimes ineffective on Skylake :(

On that Spectre mitigations discussion

Posted Jan 23, 2018 21:11 UTC (Tue) by welinder (guest, #4699) [Link]

I think the conclusion pretty much is that all current Intel cpus are screwed. We know about
interaction problems with indirect branches (and returns on Skylake), but it would be naive
to believe that there aren't other interactions.

Pressure on floating point units?

Pressure on integer multiplication/division units?

On that Spectre mitigations discussion

Posted Jan 23, 2018 21:31 UTC (Tue) by prometheanfire (subscriber, #65683) [Link] (2 responses)

Is it just skylake or also kabylake and coffeelake?

On that Spectre mitigations discussion

Posted Jan 24, 2018 6:53 UTC (Wed) by tlamp (subscriber, #108540) [Link] (1 responses)

AFAICT all of them: http://lkml.iu.edu/hypermail/linux/kernel/1801.2/06349.html

On that Spectre mitigations discussion

Posted Jan 24, 2018 15:30 UTC (Wed) by dwmw2 (subscriber, #2063) [Link]

Yep, all of them.

On that Spectre mitigations discussion

Posted Jan 24, 2018 0:20 UTC (Wed) by Sesse (subscriber, #53779) [Link]

All relevant CPUs have the ability to speculate on RET using the return stack buffer (RSB)—that's what makes retpolines work in the first place. The problem with Skylake (and presumably also Kaby Lake and later) is that if the RSB is empty, it can speculate RETs using the normal branch predictor mechanisms, which are vulnerable to Spectre.

Forcing an empty RSB is not trivial—it can happen on IRQs (including SMM interrupts), or if the call stack gets more than 16 entries deep (old entries get popped off on CALL, and then on the 17th RET, you've forgotten where you originally came from). The question is how to weigh the risk of such nontrivial attacks versus the cost of enabling IBRS.

On that Spectre mitigations discussion

Posted Jan 24, 2018 8:10 UTC (Wed) by mjthayer (guest, #39183) [Link] (1 responses)

What about Ingo Molnar's suggestion[1] in answer to David's post?

[1] https://www.mail-archive.com/linux-kernel@vger.kernel.org...

On that Spectre mitigations discussion

Posted Jan 24, 2018 16:45 UTC (Wed) by Hattifnattar (subscriber, #93737) [Link]

Andi Kleen doubts this solution would work, at least not without some significant and probably costly modifications:
https://www.mail-archive.com/linux-kernel@vger.kernel.org...

On that Spectre mitigations discussion

Posted Jan 23, 2018 21:07 UTC (Tue) by Lionel_Debroux (subscriber, #30014) [Link] (13 responses)

AFAICR, retpolines are not enough to defeat all side effects of speculative execution on Skylake CPUs, unfortunately.

On that Spectre mitigations discussion

Posted Jan 25, 2018 10:46 UTC (Thu) by Wol (subscriber, #4433) [Link] (10 responses)

Unfortunately, there is no way to defeat all side effects of speculative execution on ANY processor that implements it.

It seems to be accepted that speculative execution will leak information. There's no way to stop it. Hence the name Spectre, it's going to be haunting us for a long time, probably for ever.

The only question is, how easy is it to exploit, and how can we as white-hats drive that cost up.

Cheers,
Wol

On that Spectre mitigations discussion

Posted Jan 27, 2018 15:49 UTC (Sat) by anton (subscriber, #25547) [Link] (9 responses)

There certainly is a way to stop leaking information through cache-based side channels: Don't change the cache until the instructions are no longer speculative. The CPU already has a structure for exactly that purpose, for speculative stores: The store buffer. So one way to deal with the cache side-channel is to enhance the store buffer to also keep the results of loads while they are speculative.

One can think of other information leaks, e.g., DRAM open bank timing attacks. And ways to deal with them (always close the page when the security context id accessing the DRAM bank is different from the the one that accessed it last).

So I think one can close all the leaks coming from speculative execution, but of course one first has to think about them; and the problem with side channels is that they are the things that you normally do not think about. And in that sense you are right; they may haunt us forever, because we can never be sure that we have thought about all of them.

On that Spectre mitigations discussion

Posted Jan 29, 2018 16:32 UTC (Mon) by nybble41 (subscriber, #55106) [Link] (8 responses)

> One can think of other information leaks, e.g., DRAM open bank timing attacks. And ways to deal with them (always close the page when the security context id accessing the DRAM bank is different from the the one that accessed it last).

Unfortunately that just flips the situation around. Instead of observing whether access to the bank becomes faster because it was pre-loaded in the other context, you pre-load it yourself and observe whether access becomes slower because it was accessed from the other context. It also requires the DRAM controller (and all intermediaries) to be aware of the security context of each access, which is generally something which is only tracked in the CPU core.

AIUI the general solution for Specter is to avoid speculatively issuing memory accesses until both the load or store instruction itself *and its dependencies* are certain to be retired (barring asynchronous exceptions). This still permits some degree of reordering, since you don't need to wait until all prior instructions are actually retired—you just need to know that the address *will* be accessed. In particular, this means not speculating loads past conditional branches before the condition is known, or from uncertain memory addresses—treating loads, like stores, as actions with relevant and irreversible side-effects on the state of the system.

On that Spectre mitigations discussion

Posted Jan 29, 2018 17:58 UTC (Mon) by anton (subscriber, #25547) [Link] (7 responses)

Instead of observing whether access to the bank becomes faster because it was pre-loaded in the other context, you pre-load it yourself and observe whether access becomes slower because it was accessed from the other context.

The attacker would then just see that the other channel has accessed the DRAM channel (any access there would close the bank), i.e., that it missed all the caches. Ok, that's still a side channel. This channel can be closed by letting the DRAM access wait until the load is no longer speculative. The performance cost of that would be relatively small, because last-level cache misses are not very frequent, and DRAM accesses take relatively long (the additional cost due to the wait is not that big in relation).

AIUI the general solution for Specter is to avoid speculatively issuing memory accesses until both the load or store instruction itself *and its dependencies* are certain to be retired (barring asynchronous exceptions).

That's throwing out the baby with the bathwater. Keeping the speculation inside the CPU, in a structure specific to the thread, e.g., an enhanced store buffer, is enough to protect against these memory-access-based attacks.

In particular, this means not speculating loads past conditional branches before the condition is known, or from uncertain memory addresses—treating loads, like stores, as actions with relevant and irreversible side-effects on the state of the system.

Stores are speculatively executed, too, using the store buffer.

On that Spectre mitigations discussion

Posted Jan 29, 2018 23:21 UTC (Mon) by nybble41 (subscriber, #55106) [Link] (6 responses)

> The attacker would then just see that the other channel has accessed the DRAM channel (any access there would close the bank), i.e., that it missed all the caches.

I agree that this side-channel seems like it would be very difficult to exploit reliably. Still, it might be simpler to just explicitly close all open banks as part of the context switch rather than messing around tagging accesses with context IDs. Bonus, no hardware changes needed.

> That's throwing out the baby with the bathwater. Keeping the speculation inside the CPU, in a structure specific to the thread, e.g., an enhanced store buffer, is enough to protect against these memory-access-based attacks.

That sounds more or less like what I was proposing...? Keep the access inside the CPU and don't let it affect cache or other memory subsystems until you're sure that location is going to be accessed non-speculatively. However, while stores can be delayed using a store buffer, loads introduce data dependencies which will generally prevent at least some further speculative execution until the load is actually issued outside the CPU.

> Stores are speculatively executed, too, using the store buffer.

"Executed" in a very limited sense; the store can't actually be issued as long as it remains speculative, as that would lose data. The store buffer keeps a record of planned and/or speculative stores inside the CPU, which may be used as input for speculated loads and then either discarded or issued as the situation demands. I'm simplifying this a bit; there are some side-channels related to stores aside from the actual memory updates, such as acquiring and prefetching cache lines, which have the same issues as loads and would also need to be deferred to avoid timing side-channels.

On that Spectre mitigations discussion

Posted Jan 30, 2018 9:30 UTC (Tue) by anton (subscriber, #25547) [Link] (5 responses)

Still, it might be simpler to just explicitly close all open banks as part of the context switch rather than messing around tagging accesses with context IDs. Bonus, no hardware changes needed.

On single-core single-thread processors, yes. But on current hardware, multiple processes use the memory without intervening context switch.

However, while stores can be delayed using a store buffer, loads introduce data dependencies which will generally prevent at least some further speculative execution until the load is actually issued outside the CPU.

Loads can pick up results of earlier speculative loads from a suitably enhanced store buffer just like they now pick up the results from speculative stores. Speculating beyond loads and stores is important for performance, because otherwise speculation would be restricted to a few instructions, resulting in a performance penalty per branch of not much less than a branch misprediction (around 20 cycles per executed branch with current microarchitectures).

Stores are issued speculatively, but like all other instructions, are retired in-order (or discarded), i.e., after any branches they speculated on, and only then is their result propagated outside the store buffer. My suggestion is that loads that have visible effects in the cache hierarchy are handled in a similar way.

On that Spectre mitigations discussion

Posted Jan 30, 2018 18:46 UTC (Tue) by nybble41 (subscriber, #55106) [Link] (4 responses)

> On single-core single-thread processors, yes. But on current hardware, multiple processes use the memory without intervening context switch.

Yeah, that was a major oversight on my part. I was only thinking of hyperthreading and dismissed it because there probably isn't any way to avoid timing side-channels when the same core is running two different threads simultaneously, sharing the same compute units, but only closing banks on context-switches also breaks down on ordinary multi-core systems. Unfortunately I think it would require changes to the DRAM controller, and all the buses and intermediate devices along that path, to close the bank based on the security context.

> Loads can pick up results of earlier speculative loads from a suitably enhanced store buffer just like they now pick up the results from speculative stores.

You'd still have to *execute* the earlier speculative load to get the data into the "load buffer", and that's the core of the problem. Just issuing a load speculatively causes persistent changes which leak data to an attacker. To avoid side-channels, the load must stay within the CPU, where it can be reverted without affecting the rest of the system, until it is no longer speculative.

> Stores are issued speculatively, but like all other instructions, are retired in-order (or discarded), i.e., after any branches they speculated on, and only then is their result propagated outside the store buffer. My suggestion is that loads that have visible effects in the cache hierarchy are handled in a similar way.

So you want the store buffer to encapsulate the entire state of the memory subsystem, including all levels of caches, open DRAM banks, etc., and anything else which they might interact with (caches for other cores), so that all of this can be put back the way it was when the speculated access is discarded? That seems like a huge effort, but anything less would leave side-channels for data to leak through—a partial mitigation, perhaps, but not a final solution. I understand the performance impact, but I don't see a better way to *completely prevent* these sorts of timing side-channels than keeping all speculation well-contained within the CPU core.

On that Spectre mitigations discussion

Posted Jan 31, 2018 17:57 UTC (Wed) by anton (subscriber, #25547) [Link] (3 responses)

There are programming practices intended to avoid timing side-channels for critical pieces of code (e.g., handling of private keys), and some of these practices should even avoid revealing these secrets in the presence of hyperthreading or other SMT. But yes, for security reasons it's better to use SMT only for threads in the same security context.

You'd still have to *execute* the earlier speculative load to get the data into the "load buffer", and that's the core of the problem. Just issuing a load speculatively causes persistent changes which leak data to an attacker.

Which ones? My idea is to avoid all such changes until the load is no longer speculative. You can get the data from the caches into the "load buffer" without making any changes to the caches.

So you want the store buffer to encapsulate the entire state of the memory subsystem, including all levels of caches, open DRAM banks, etc., and anything else which they might interact with (caches for other cores), so that all of this can be put back the way it was when the speculated access is discarded?

No, just load the data when possible without visible changes, and delay until non-speculative when not (closed DRAM banks, cache lines in other cores that are in the wrong state).

On that Spectre mitigations discussion

Posted Jan 31, 2018 19:47 UTC (Wed) by excors (subscriber, #95769) [Link] (1 responses)

> No, just load the data when possible without visible changes, and delay until non-speculative when not (closed DRAM banks, cache lines in other cores that are in the wrong state).

Isn't that likely to be very limiting in practice? I think "wrong state" would have to include the read-only Exclusive state (since a read would transition it to Shared, which is probably observable) - if you e.g. read some data structure on one core, then context-switch and read it again on another core, you'll lose the ability to speculatively fetch it and may get a huge performance drop. And you couldn't speculatively read data that wasn't currently in any cache, because a second core might try to read the same data and would have to decide whether to consider it Exclusive or Shared depending on the supposedly-secret speculative load. I guess it's only safe when the data is already Shared in at least one core, because (hopefully) nobody can observe exactly how many cores it's shared by.

DRAM accesses are presumably where you get the most benefit from speculatively fetching, since it's maybe 6x higher latency than LLC (per http://www.7-cpu.com/cpu/Skylake.html), so prohibiting that sounds like it would hurt a lot. Still better than nothing in terms of performance, but it sounds like a lot of complexity for a relatively small benefit.

On that Spectre mitigations discussion

Posted Feb 1, 2018 17:45 UTC (Thu) by anton (subscriber, #25547) [Link]

I think that these limitations will not have a big performance impact. In most applications most accesses hit the local caches, so the slowdown would only occur in a minority of accesses. Core migrations are rare in a competent OS, because they are slow even without this limitation. a local cache miss typically costs >200 cycles and determining non-speculation typically takes much less, so the slowdown from waiting until such loads are no longer speculative is not that bad even for local-cache misses.

Concerning remote cache accesses, it may be possible to do that speculatively, as long as you only read the remote cache line; changing the state is only necessary when you want to have a copy in the local cache, and you don't do that as long as the access is speculative. Whether doing the remote access twice (once speculatively to get the data, and once non-speculatively to copy it in the local cache) rather than waiting is worthwhile remains to be seen.

On that Spectre mitigations discussion

Posted Jan 31, 2018 22:35 UTC (Wed) by nybble41 (subscriber, #55106) [Link]

> Which ones? My idea is to avoid all such changes until the load is no longer speculative. You can get the data from the caches into the "load buffer" without making any changes to the caches.

OK, in that case I think I can agree that this sort of speculative load would be "safe"—where you're only reading data which already exists in the caches, in a coherency state which permits this core to read the data without involving other cores or otherwise changing any part of the system state. However, as excors already pointed out that seems likely to be of very limited benefit given the likelihood that even reads from L1 will require changes in the coherency state, and the relative latency of L1 vs. DRAM.

On that Spectre mitigations discussion

Posted Jan 26, 2018 0:07 UTC (Fri) by klempner (subscriber, #69940) [Link] (1 responses)

I would think of it slightly differently.

The point of retpoline is to provide a relatively cheap implementation of a primitive that ought to exist: "don't speculatively execute the destination of this indirect branch". It isn't about defeating side effects but rather preventing a particular class of speculative execution from happening in the first place.

(Yes, technically speculative execution happens when you retpoline, but the speculation does nothing.)

The problem is, of course, CPUs weren't designed realizing this primitive needs to exist. retpoline is a hack that relies on CPUs being sufficiently stupid about predicting where ret instructions go. Skylake is just slightly too clever, so retpoline doesn't (always) work there.

Note that retpoline only works against a specific type of speculative execution -- there may be other types of speculative execution side channel exploits we haven't yet figured out. In fact, the Spectre paper describes one other such attack which doesn't involve indirect branches.

On that Spectre mitigations discussion

Posted Feb 7, 2018 4:06 UTC (Wed) by immibis (subscriber, #105511) [Link]

That sounds like something that should be a process-wide setting, at least. I can't think of a good use case for speculatively executing some indirect branches, but not others.

As far as I'm aware, *any* indirect branch can result in information leakage through Spectre. But then there might be workloads or machines where you don't care about information leakage - e.g. high-performance computing.