Grand Schemozzle: Spectre continues to haunt

Posted Aug 13, 2019 22:04 UTC (Tue) by farnz (subscriber, #17727)
In reply to: Grand Schemozzle: Spectre continues to haunt by magnus
Parent article: Grand Schemozzle: Spectre continues to haunt

The trouble is that changing microarchitectural state (like caches and TLB entries) is part of the point of speculative execution.

For a modern high-performance CPU, registers are zero-latency (data is available as soon as it's asked for). An L1 cache hit costs 3 to 6 CPU cycles, an L2 cache hit 10 to 20 cycles, and an L3 cache hit around 40 cycles. Memory itself is even further away: at 2 GHz, a RAM access is on the order of 100 to 200 CPU cycles. Arguably, virtually all the gains from speculation come from the changes it causes to the caches; if you have to spend hundreds of CPU cycles undoing any failed speculation, you're going to lose a lot of performance on each failure.
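To make the arithmetic concrete, here is a rough back-of-the-envelope sketch. The latencies are picked from the ranges above; the hit fractions, the misprediction rate and the undo cost are made-up numbers for illustration, not measurements of any real CPU.

```python
# Illustrative latencies in CPU cycles, picked from the ranges quoted above.
LATENCY = {"L1": 4, "L2": 15, "L3": 40, "RAM": 150}

def average_access_cycles(hit_fractions):
    """Weighted average latency; hit_fractions maps level -> fraction of accesses served there."""
    assert abs(sum(hit_fractions.values()) - 1.0) < 1e-9
    return sum(LATENCY[level] * frac for level, frac in hit_fractions.items())

def rollback_overhead(mispredicts_per_1000_instr, undo_cycles):
    """Extra cycles per 1000 instructions spent undoing failed speculation."""
    return mispredicts_per_1000_instr * undo_cycles

# Hypothetical hit mixes (assumptions): caches warmed by speculation vs. not warmed.
warm = average_access_cycles({"L1": 0.90, "L2": 0.06, "L3": 0.03, "RAM": 0.01})
cold = average_access_cycles({"L1": 0.80, "L2": 0.08, "L3": 0.06, "RAM": 0.06})

print(f"average access, warm caches: {warm:.1f} cycles")
print(f"average access, cold caches: {cold:.1f} cycles")
# Hypothetical: 5 mispredictions per 1000 instructions, 200 cycles to undo each.
print(f"rollback cost per 1000 instructions: {rollback_overhead(5, 200)} cycles")
```

Even with these invented numbers, a small shift of accesses from L1 towards RAM roughly doubles the average access time, which is why any scheme that throws away or laboriously undoes cache changes gets expensive quickly.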

Worse, I suspect that a reasonable number of speculation failures gain during recovery from the modified µarch state: the right data is in cache, the right TLB entries are hot, etc., so that what would have been a 100-cycle delay re-reading from main memory becomes a 20-cycle delay recovering from L2 cache instead of L1 cache.

Grand Schemozzle: Spectre continues to haunt

Posted Aug 13, 2019 22:34 UTC (Tue) by magnus (subscriber, #34778)

For sure you don't want to go all the way back to external RAM to restore the caches and TLB when rolling back the speculation; you would need some form of local buffer holding the evicted entries so they can be restored quickly as part of the rollback, possibly an undo buffer integrated directly into the L1 cache RAM in some clever way. I'm not saying this is easy to implement efficiently; if it were, it would have been fixed already.
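As a software analogy only, the undo-buffer idea might look something like the sketch below. The class `UndoCache` and its methods are hypothetical names invented here; a real design would live in hardware, and this toy ignores timing entirely and does not track LRU updates on hits.

```python
from collections import OrderedDict

class UndoCache:
    """Toy fully associative cache (oldest-first eviction) with an undo log."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> data, oldest entry first
        self.undo_log = []          # (filled_addr, evicted (addr, data) or None)

    def fill(self, addr, data, speculative=False):
        if addr in self.lines:
            return  # already cached; the toy does not reorder on hits
        evicted = None
        if len(self.lines) >= self.capacity:
            evicted = self.lines.popitem(last=False)  # evict the oldest line
        self.lines[addr] = data
        if speculative:
            self.undo_log.append((addr, evicted))

    def commit(self):
        """Speculation confirmed: the fills become permanent."""
        self.undo_log.clear()

    def rollback(self):
        """Speculation failed: undo the fills in reverse order."""
        while self.undo_log:
            addr, evicted = self.undo_log.pop()
            self.lines.pop(addr, None)
            if evicted is not None:
                old_addr, old_data = evicted
                self.lines[old_addr] = old_data
                self.lines.move_to_end(old_addr, last=False)  # back to the oldest slot

cache = UndoCache(capacity=2)
cache.fill(1, "a")
cache.fill(2, "b")
cache.fill(3, "c", speculative=True)  # evicts line 1
cache.fill(4, "d", speculative=True)  # evicts line 2
cache.rollback()
print(list(cache.lines.items()))
```

After the rollback the cache holds lines 1 and 2 again in their original order. Note that the sketch restores contents only; it says nothing about how long the restore takes.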

As for your last point, that you may benefit from cache updates even when mis-speculating: that's interesting, and it would be good to have real-world measurements of this effect. I don't see how you could allow this without also leaving the side channel exploitable.

Grand Schemozzle: Spectre continues to haunt

Posted Aug 20, 2019 13:45 UTC (Tue) by anton (subscriber, #25547)

The way to go is not to restore evicted cache lines, but to keep the speculatively fetched cache lines in separate buffers and only put them in the cache (and evict other lines) when the load is committed (at that point it is no longer speculative, because the speculation has been confirmed).
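A minimal software sketch of that idea, under heavy simplification: `BufferedCache`, `speculative_load`, `commit` and `squash` are names made up for this illustration, memory is a plain dict, and eviction from the visible cache is not modelled at all.

```python
class BufferedCache:
    """Toy model: speculative fills wait in a side buffer until the load commits."""

    def __init__(self):
        self.cache = {}        # lines visible to later accesses
        self.spec_buffer = {}  # load_id -> (addr, data), invisible until commit

    def speculative_load(self, load_id, addr, memory):
        if addr in self.cache:
            return self.cache[addr]           # ordinary hit, no state change
        data = memory[addr]                   # fetch from (toy) memory
        self.spec_buffer[load_id] = (addr, data)
        return data

    def commit(self, load_id):
        """Speculation confirmed: move the line into the visible cache."""
        entry = self.spec_buffer.pop(load_id, None)
        if entry is not None:
            addr, data = entry
            self.cache[addr] = data

    def squash(self, load_id):
        """Speculation failed: drop the line; the visible cache never changed."""
        self.spec_buffer.pop(load_id, None)

memory = {0x10: "secret", 0x20: "public"}
c = BufferedCache()

c.speculative_load(1, 0x10, memory)
c.squash(1)                  # mispredicted path: nothing reaches the cache
c.speculative_load(2, 0x20, memory)
c.commit(2)                  # confirmed path: the line becomes visible
print(c.cache)
```

The point of the sketch is that a squashed load leaves `cache` exactly as it was, so there is nothing to undo and no eviction for an attacker to time.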

The same has to be done for all other state that can be seen from the outside, or, if that is impractical (e.g., open pages for main memory accesses), wait until the instruction is no longer speculative. The latter costs a bit of performance, but given that a branch prediction can usually be confirmed in tens of cycles or less, while a main memory access takes hundreds, it should not be a big issue.

As for the benefit of reusing the speculatively fetched stuff in the correct path, branch mispredictions are rare so I don't expect that to have much of an effect. Still, maybe by having a per-thread buffer for such cache lines (or extending the buffer I mentioned above with such capabilities), one could preserve that benefit, but I would be very wary of potential side-channel attacks with such an approach.

Grand Schemozzle: Spectre continues to haunt

Posted Aug 14, 2019 9:50 UTC (Wed) by james (subscriber, #1325)

I have this horrible feeling that even if you did spend hundreds of cycles undoing failed speculation, that could be a side-channel in itself.

For example,

1. Time how long it takes on this system to get data from RAM.
2. Read an accurate clock.
3. Force a misprediction. The CPU needs data from RAM to resolve this.
4. Force another misprediction on a bounds check.
5. Carry out a compare based on data in memory you shouldn't have access to.
6. If the compare is true, read in lots of data from level 3 cache. That should have evicted data in level 1 and level 2 cache by the time the misprediction at stage 3 is resolved.
7. Eventually, the misprediction at stage 3 will be resolved, and everything will be rolled back -- including the evictions at stage 6, if they happened.
8. Read the clock again.
9. If the whole thing has taken little longer than a single read from RAM, the comparison at stage 5 was false. If it took as long as a read from RAM plus a number of reads from level 3 cache, the comparison was true.
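The decision in the final step is just a threshold on the elapsed time. A sketch of only that arithmetic follows; it does not perform any speculation or measurement, and the function names, the 40-cycle L3 figure and the count of 16 reads are assumptions chosen for illustration.

```python
def expected_times(ram_cycles, l3_cycles, l3_reads):
    """Expected elapsed cycles when the secret comparison was false / true."""
    fast = ram_cycles                         # just the RAM read that resolves stage 3
    slow = ram_cycles + l3_reads * l3_cycles  # plus the level 3 reads from stage 6
    return fast, slow

def infer_comparison(elapsed, ram_cycles, l3_cycles=40, l3_reads=16):
    """Guess the outcome of the comparison from the measured elapsed cycles."""
    fast, slow = expected_times(ram_cycles, l3_cycles, l3_reads)
    threshold = (fast + slow) / 2             # midpoint between the two cases
    return elapsed > threshold

print(expected_times(150, 40, 16))
print(infer_comparison(160, ram_cycles=150))  # little more than one RAM read
print(infer_comparison(800, ram_cycles=150))  # RAM read plus many L3 reads
```

With a calibrated RAM latency of 150 cycles the two cases sit at 150 and 790 cycles, so even a noisy clock would separate them; the more level 3 reads are squeezed into the speculation window, the wider the gap.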

And that's without having another friendly thread on another core watching what happened to the level 3 cache, or a network card that's reading this data...

Grand Schemozzle: Spectre continues to haunt

Posted Aug 14, 2019 10:34 UTC (Wed) by farnz (subscriber, #17727)

Indeed. And on top of that, most speculation side-channels don't matter - it only matters when a side channel can be used to read across a security boundary. So, for example, if a side channel lets JavaScript in my web browser read the DOM of the page it's part of, that's not an issue - the JavaScript has a direct route to getting that data, so a side channel that lets it get the data slowly is not a problem. The issue kicks in when the side channel lets you cross a software security boundary - e.g. by reading the DOM of the active tab, regardless of whether you're part of that page.

So, if you roll back all speculation, you're wasting effort most of the time; it's only when a security boundary is crossed while speculating that we need to worry. This isn't, however, just about userspace to/from kernelspace crossings; it's also about userspace to/from userspace crossings in VMs and anything else that handles untrusted data.