Avoiding retpolines with static calls

Posted Apr 2, 2020 12:27 UTC (Thu) by anton (subscriber, #25547)
In reply to: Avoiding retpolines with static calls by davecb
Parent article: Avoiding retpolines with static calls

Generally, in-order implementations are not affected, and existing out-of-order implementations are affected; not because it's not possible to make Spectre-immune OoO implementations, but because they did not think about how to do it when they designed these CPUs (Spectre was discovered later).

UltraSPARC T1-SPARC T3 are in-order, but SPARC T4, T5, M7, M8 are OoO. Performing other threads on a cache miss does not protect against Spectre: The cache or other microarchitectural state can still be updated from a speculative load.

Recent in-order CPUs include Cortex A53 and A55, and Bonnell, the first generation of Intel's Atom line.

Avoiding retpolines with static calls

Posted Apr 2, 2020 12:47 UTC (Thu) by davecb (subscriber, #1574) [Link]

Many thanks!

Interestingly, the kind of side-channel attack that Spectre and friends are variants of were studied back in the mainframe days, well before anyone got around to inventing them (;-))

--dave

Avoiding retpolines with static calls

Posted Apr 2, 2020 13:27 UTC (Thu) by farnz (subscriber, #17727) [Link] (6 responses)

Note, though, that even in-order CPUs can be affected. The core of Spectre family attacks is that the CPU speculatively executes code that changes micro-architectural state, and thus any CPU with speculation can be affected. There exist demonstrations that suggest that the Cortex A53, for example, is partially affected by Spectre, but the resulting side channel is too limited to be practical with the currently known exploits.

In theory, even a branch predictor can carry enough state to be attacked!

Avoiding retpolines with static calls

Posted Apr 2, 2020 15:12 UTC (Thu) by davecb (subscriber, #1574) [Link]

Yes: in an experiment at Sun, we found branch prediction could cause differing numbers of cache-line fills. Logically you could use it to signal single bits, for a lowish-speed side channel.

Avoiding retpolines with static calls

Posted Apr 2, 2020 15:37 UTC (Thu) by anton (subscriber, #25547) [Link] (4 responses)

Yes, in-order CPUs perform branch prediction and use that to perform several stages of the pipeline of the first predicted instructions, but normally they do not do anything that depends on the result of one of the predicted instructions; they would need some kind of register renaming functionality for that, and if as a hardware designer you go there, you can just as well do a full OoO design.

A Spectre-style attack consists of a load of the secret, and some action that provides a side channel, with a data flow from the load to the action (otherwise you don't side-channel the secret). An OoO processor can do that, but an in-order processor as outlined above cannot AFAICS. If you can provide a reference to the A53 demonstration, I would be grateful.

Avoiding retpolines with static calls

Posted Apr 2, 2020 18:18 UTC (Thu) by farnz (subscriber, #17727) [Link] (1 responses)

Even something as apparently trivial as using the branch predictor to decide which instructions to fetch is enough to make a timing difference that Spectre-type exploits can predict. "All" you need to do is ensure that the branch predictor correctly predicts for a single value of the secret and mispredicts for all others (or vice-versa), and that I have a timer that lets me determine whether or not the branch predictor correctly predicted (made easier if I can control L1I$ contents), and I have a side-channel. Not a very good side channel, but enough that I can extract secrets slowly.

And there's no point having a branch predictor if you then insert a pipeline bubble that means that there's no timing difference between a correct prediction and a mispredict; it's that timing difference that creates the side-channel, though, so you need to somehow prevent the attacker from measuring a timing difference there to be completely Spectre-free.

The demonstration was something in-house, using exactly the components I've described above; it was a slow enough channel to be effectively useless, but it existed.

Avoiding retpolines with static calls

Posted Apr 3, 2020 7:59 UTC (Fri) by anton (subscriber, #25547) [Link]

Yes, the branch predictor can be used as the side channel; in a Spectre-type attack the action part would then be a branch.

But for a Spectre-type attack the branch predictor needs to be updated based on the speculatively loaded value. This does not happen in in-order processors (because the execution does not get that far), and it will not happen in a Spectre-resistant OoO CPU (e.g., by only updating the branch predictor based on committed branches).

Note that not all side-channel attacks are Spectre attacks. We can prevent some side channels through hardware design. E.g., there are Rowhammer-immune DRAMs; I also dimly remember some CPU-resource side channel (IIRC a limitation on the cache ports) that was present in one generation of Intel CPUs, but not in the next generation, which had more resources. Closing other side channels in the non-speculative case appears to be too expensive, e.g., cache and branch predictor side channels, as you explain.

There are established software mitigations for protecting critical secrets (branchless code, data-independent memory accesses for code that handles these secrets) such as encryption keys from non-Spectre/Meltdown side channels. But these do not help against Spectre, because Spectre can use all code in the process to extract the secret.

From your description, you read about a non-Spectre side-channel attack through the branch predictor of the A53.

Avoiding retpolines with static calls

Posted Apr 3, 2020 14:30 UTC (Fri) by excors (subscriber, #95769) [Link] (1 responses)

> A Spectre-style attack consists of a load of the secret, and some action that provides a side channel, with a data flow from the load to the action (otherwise you don't side-channel the secret). An OoO processor can do that, but an in-order processor as outlined above cannot AFAICS.

I don't see why an in-order processor couldn't do that, in principle. They're still going to do branch prediction, and speculatively push some instructions from the predicted target into the start of the pipeline. Once they detect a mispredict they'll flush the pipeline and start again with the correct target. As long as the flush happens before the mispredicted instructions reach a pipeline stage that writes to memory or to registers, that should be safe (ignoring Spectre). But if the mispredicted instructions read from a memory address that depends on a register (which may contain secret data), and the read modifies the TLB state or cache state, then it would be vulnerable to Spectre-like attacks.

(I'm thinking of an artificial case like "insecure_debug_mode = 0; int x = secret; if (insecure_debug_mode) x = array[x];", where mispredicting the 'if' means the very next instruction will leak the secret data via the cache. It doesn't need a long sequence of speculative instructions. That could be a case where the programmer has carefully analysed the expected code path to avoid data-dependent memory accesses etc so their code is safe from side-channels attacks according to the advertised behaviour of the processor, but the processor is speculatively executing instructions that the programmer did not expect to be executed, so I think that counts as a Spectre vulnerability. As I see it, the fundamental issue with Spectre is that it doesn't matter how careful you are about side-channels when writing code, because the processor can (speculatively) execute an arbitrarily different piece of code that is almost impossible for you to analyse.)

In practice, in-order processors seem to typically have short enough pipelines that the misprediction is detected within a few cycles, before the mispredicted instructions have got far enough through the pipeline to have any side effects. But that seems more by luck than by design, and maybe some aren't quite lucky enough. OoO processors aren't more dangerous because they're OoO, they're more dangerous because they typically have much longer pipelines and mispredicted instructions can progress far enough to have many side effects.

Avoiding retpolines with static calls

Posted Apr 3, 2020 16:10 UTC (Fri) by anton (subscriber, #25547) [Link]

So you are thinking about an architecturally visible (i.e., non-speculative) secret, and using speculation only for the side channel. One difference from the classical Spectre attacks is that, like for non-speculative side-channel attacks, software developers only need to inspect the code that deals with the secret and avoid such ifs there; but yes, it's an additional rule they have to observe in such code.

The length of the pipeline of an OoO processor is not the decisive factor. Unlike an in-order processor, speculatively executed instructions on an OoO do not just enter the pipeline, they produce results that other speculatively executed instructions can use as inputs. Essentially the in-order front end (instruction fetch and decoding) is decoupled from the in-order commit part (where the results become architecturally visible) by the OoO engine, and the front end can run far ahead (by hundreds of instructions and many cycles) of the commit part.