Spectre V1 defense in GCC
Spectre V1 defense in GCC
Posted Jul 11, 2018 14:15 UTC (Wed) by nathan (subscriber, #3559)
Parent article: Spectre V1 defense in GCC
    if (index < structure->array_size) {
        correct = (index >= structure->array_size) ? all_zeroes : correct;
This requires that the compiler's value-range-propagation optimization not function here. After all, because we're inside the if body, C abstract machine semantics tells us that index is indeed less than the array size, so a test for it to be greater-or-equal must be false. Thus C language semantics tells us we can reduce that conditional assignment to 'correct = correct' (and then eliminate it entirely). That, of course, would defeat the whole point.
That's one of the horrible bits of these vulnerabilities. Not only do they confuse human programmers, but you often can't fix them without turning off optimizations, and you only want to do that as locally as possible. Hence the need for a compiler builtin that hides these semantics from the optimizers.
[The above deduction presumes the absence of volatile objects.]
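To make the folding concrete, here is a minimal sketch (the struct layout and the load_entry() helper are hypothetical, with names borrowed from the article's illustrative example):

    #include <stddef.h>

    struct table {
        size_t array_size;
        unsigned long array[64];
    };

    static const unsigned long all_zeroes = 0;

    /* Hypothetical helper showing the pattern under discussion. */
    unsigned long load_entry(const struct table *structure, size_t index)
    {
        unsigned long correct = all_zeroes;

        if (index < structure->array_size) {
            correct = structure->array[index];
            /*
             * On this path, value-range propagation knows that
             * index < structure->array_size, so the test below is
             * provably false; the statement folds to "correct = correct"
             * and is then deleted, removing the intended mitigation.
             */
            correct = (index >= structure->array_size) ? all_zeroes : correct;
        }
        return correct;
    }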

Compiler optimization
Posted Jul 11, 2018 15:13 UTC (Wed) by corbet (editor, #1)

That would indeed be the case if the defense were done in C code, but that code is there for illustrative purposes. The actual implementation is inserted by the compiler, as described further down in the article.
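For illustration only, a source-level use of such a builtin might look roughly like this (a sketch assuming the __builtin_speculation_safe_value() interface discussed in the article and reusing the hypothetical struct table from above; the exact name, signature, and semantics are subject to the final patches):

    /*
     * Sketch: the builtin returns its first argument on the architecturally
     * correct path, forces the second (safe) value when reached under
     * misspeculation, and is opaque to optimizations such as value-range
     * propagation, so it cannot be folded away.
     */
    unsigned long load_entry_safe(const struct table *structure, size_t index)
    {
        if (index < structure->array_size) {
            size_t safe_index = __builtin_speculation_safe_value(index, (size_t)0);
            return structure->array[safe_index];
        }
        return 0;
    }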

Compiler optimization
Posted Jul 12, 2018 7:06 UTC (Thu) by epa (subscriber, #39769)

Hardware-level micro-op optimization
Posted Jul 12, 2018 13:33 UTC (Thu) by ncm (guest, #165)

It is a strange world we live in, now, where we cannot have any confidence that the machine instructions we see correctly describe the machine behavior they will evoke.

Hardware-level micro-op optimization
Posted Jul 12, 2018 14:51 UTC (Thu) by corbet (editor, #1)

Instructions like CSEL are defined by the architecture to not execute speculatively. That is, as I understand it, a requirement to be able to do things like constant-time crypto operations. So its use of the condition code is different from the test immediately above, which can be speculated. Assuming the processor behaves as specified, the result should be correct.
Or that's how I understand it, at least.
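For reference, a rough sketch of the kind of conditional-select clamp being discussed (the clamp_index() helper is hypothetical, a compiler is not guaranteed to emit CSEL for this exact C, and the commented assembly is only an approximation):

    #include <stddef.h>

    /*
     * Clamp an index with a conditional select rather than a conditional
     * branch.  On AArch64 this can be lowered to roughly:
     *
     *     cmp   x0, x1            // compare index against size
     *     csel  x0, x0, xzr, lo   // keep index if lower, else zero
     *
     * CSEL consumes the condition flags directly; the point above is that
     * its result is not chosen by branch prediction the way a conditional
     * branch's target is.
     */
    size_t clamp_index(size_t index, size_t size)
    {
        return (index < size) ? index : 0;
    }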

Hardware-level micro-op optimization
Posted Jul 12, 2018 18:49 UTC (Thu) by ncm (guest, #165)

Some background, for those catching up... In prehistory, each instruction mapped to a specific series of machine states, and you knew everything about the machine just from the instructions you could see. When we got microcode, at first each instruction mapped to a specific sequence of microcode operations. With various caches, register renaming, and out-of-order execution scheduling "functional units" opportunistically, the sequence of machine states is a matter of speculation. With speculative execution, we got even less determinism, because now operations not even asked for ("yet") happen.
Early on, the translation from instructions to microcode sequences lost its direct mapping. Now, that mapping results in micro-ops for various nearby instructions interleaved, operating on physical registers chosen by the scheduler according to data flow dependencies it tracks. The translation to micro-ops can take into account knowledge of the actual run-time state of the machine, invisible to programmer and compiler. For example, the chip can know a divisor in a register is a power of two, and is not updated during a loop, and so substitute a shift or mask operation for the division. memcpy is a frequent bottleneck in real programs, so the chip may watch for instruction sequences that compilers emit for it, and substitute something smarter, instead, maybe based on the actual number of bytes and the actual alignment of the pointers.
At issue here is that the micro-op optimizer also knows which micro-ops change status bits, and so could know that the micro-op sequence following a status-bit-controlled branch can be shortened. There's nothing speculative about this. Chip vendors don't typically reveal this sort of detail, so the best we can do is measure whether the move and the conditional move seem always to happen at the same speed, and suppose that, therefore, the hardware has no reason to perform such an optimization. Of course, measurements don't tell us about the next release.
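A sketch of the kind of measurement being suggested (everything here, including the csel_u64() wrapper built on x86 CMOV, the loop shape, and the iteration count, is an assumption for illustration; a serious measurement would need far more care and inspection of the generated code):

    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    #define ITERS (1ULL << 27)

    /* Hypothetical wrapper: conditional select via x86 CMOV. */
    static inline uint64_t csel_u64(uint64_t a, uint64_t b, uint64_t cond)
    {
        __asm__("test %[c], %[c]\n\t"
                "cmovz %[b], %[a]"          /* if cond == 0, a = b */
                : [a] "+r" (a)
                : [b] "r" (b), [c] "r" (cond)
                : "cc");
        return a;
    }

    static double time_loop(int use_csel)
    {
        struct timespec t0, t1;
        uint64_t acc = 0;

        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (uint64_t i = 0; i < ITERS; i++) {
            if (use_csel)
                acc = csel_u64(acc + 1, acc, i & 1);  /* conditional move */
            else
                acc = acc + 1;                        /* plain data path  */
        }
        clock_gettime(CLOCK_MONOTONIC, &t1);

        /* Keep acc live so the loop is not optimized away. */
        volatile uint64_t sink = acc;
        (void)sink;
        return (double)(t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    }

    int main(void)
    {
        printf("plain: %.3f s\n", time_loop(0));
        printf("cmov:  %.3f s\n", time_loop(1));
        return 0;
    }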

Hardware-level micro-op optimization
Posted Jul 12, 2018 18:57 UTC (Thu) by corbet (editor, #1)

> the optimizer knows nothing has changed the status bit since its last use is not speculation.

If said "last use" was speculative, and thus the state of the condition code is speculative, then using that code for optimization *is* speculation, instead. The whole point is what happens during speculative execution; the instruction is a no-op in the real world. But an instruction that is defined as not being executed speculatively cannot be elided as the result of a speculative branch prediction.

Hardware-level micro-op optimization
Posted Jul 12, 2018 23:27 UTC (Thu) by jcm (subscriber, #18262)

Jon is right in his summary. But the point about uop caching and optimization is still a good one. Multiple efforts are underway in the industry to analyze this part of the front end in more detail for side channels. There are quite a few interesting possibilities I can think of, in particular with abuse of value prediction. I've asked a few research teams to consider looking at how badly people screwed up value predictors.

Hardware-level micro-op optimization
Posted Jul 12, 2018 23:44 UTC (Thu) by ncm (guest, #165)

Ultimately we will need assurances from vendors that the conditional nature of the move is not, and won't ever be, optimized away. Later, we will want another version of conditional move that we specifically allow to be micro-optimized; but first things first.

(*Speculation may pile upon speculation, up to the limit of microarchitectural resources.)