The "Retbleed" speculative execution vulnerabilities
The "Retbleed" speculative execution vulnerabilities
Posted Jul 19, 2022 22:55 UTC (Tue) by atnot (guest, #124910)
In reply to: The "Retbleed" speculative execution vulnerabilities by deater
Parent article: The "Retbleed" speculative execution vulnerabilities
Absolutely, targeting C-like languages at VLIW is very difficult and requires advanced scheduling that was not really available at the time. This was a huge factor. VLIW fared much better with GPUs, which were targeted with more easily parallelizable languages. Even those would move away eventually, though, coincidentally around the time CUDA and GPGPU came about.
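To make the scheduling problem concrete, here is a minimal C sketch (a hypothetical illustration of my own, not from the parent comment) of the pointer aliasing that prevents a static scheduler from bundling operations:

    /* Without alias information, the compiler must assume dst and src may
     * overlap, so it cannot safely bundle loads and stores from different
     * iterations into one VLIW issue group. */
    void scale(float *dst, const float *src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * 2.0f;  /* each store may feed a later load */
    }

    /* C99 'restrict' asserts that the arrays don't overlap; only then can
     * a static scheduler (or vectorizer) treat iterations as independent. */
    void scale_restrict(float *restrict dst, const float *restrict src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * 2.0f;  /* iterations are provably independent */
    }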
Itanium was definitely far from perfect. The initial implementation was terrible, and the decision to encode many implementation details of the first CPUs directly into the ISA was a mistake that was quickly recognized. But the same is true of x86; we've just gotten used to it. Certainly, today's 12-wide CPUs would have a much easier time emulating a mediocre 2000s explicitly parallel VLIW CPU than a mediocre 80s microprocessor. Even with its flaws, Itanium is still significantly closer to what a modern CPU actually looks like.
Posted Jul 20, 2022 10:24 UTC (Wed) by farnz (subscriber, #17727)
The other issue with advanced scheduling is that an out-of-order execution design also benefits from a well-scheduled program. An out-of-order processor has a limited instruction window within which it can reschedule dynamically, and a well-scheduled program is arranged so that the only rescheduling left to do within that window is whatever depends on the data the program is processing.
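As a hypothetical sketch (mine, not from the comment above) of what "well-scheduled" means here, written in C as a stand-in for the generated instruction stream:

    /* The (possibly cache-missing) load is issued as early as possible and
     * independent ALU work sits between it and its first use, so the
     * overlap does not depend on the core having a large reorder window. */
    int lookup_and_mix(const int *table, int idx, int a, int b)
    {
        int t = table[idx];                              /* start long-latency load early */
        unsigned mix = (unsigned)(a ^ b) * 2654435761u;  /* independent work */
        mix ^= mix >> 13;                                /* ...more independent work */
        return t + (int)mix;                             /* first use of t, as late as possible */
    }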
GPUs are a different case because they're designed for a world where single-threaded performance is not particularly interesting - as long as all threads complete their work in a millisecond or so, we don't care how long each individual thread took. It's thus possible to avoid OoOE in favour of having more threads available to the hardware, and better hardware for switching between threads when one thread gets blocked. In contrast, the whole point of CPUs in a modern system (with GPUs as well as CPUs) is to deal with the code where the time for one thread to complete its work sets the time for the whole operation.
I suspect that, for the subset of compute where the performance of a single thread is the most important factor, an out-of-order CPU is the best possible option. The wide-open question is whether we can design an ISA that allows us to avoid unwanted speculation completely; Itanium had that, because it was designed around making all the possible parallelism explicit, but Itanium wasn't a good ISA for out-of-order execution, and had low instruction density.
The other issue that Itanium's explicit speculation didn't account for is that we're starting to see uses of value prediction, not just memory access prediction; do we want to be explicit about all the possible speculative paths (e.g. "you can speculate that the value in r2 is less than the value in r3", or "you can speculate if you believe that r2 is between -16 and +96"), or do we instead want to find a good way to block speculation completely where it's potentially dangerous?
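As a sketch of that second option - blocking speculation where it's dangerous - here is a simplified, hypothetical version of the branchless index-masking pattern (loosely modelled on the Linux kernel's array_index_nospec(), not its actual code):

    #include <stddef.h>
    #include <stdint.h>

    /* Clamp the index with a branchless mask so that even if the bounds
     * check below is mispredicted, the speculative load stays in bounds. */
    static inline size_t clamp_index(size_t idx, size_t size)
    {
        /* all-ones when idx < size, all-zeros otherwise; no branch to predict */
        size_t mask = (size_t)0 - (size_t)(idx < size);
        return idx & mask;
    }

    uint8_t read_byte(const uint8_t *array, size_t idx, size_t size)
    {
        if (idx >= size)
            return 0;
        return array[clamp_index(idx, size)];  /* safe even under misprediction */
    }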
Posted Jul 20, 2022 18:56 UTC (Wed) by Cyberax (✭ supporter ✭, #52523)
Targeting ANY language with VLIW is difficult. The fundamental issue is that scheduling depends on input data, and no language can change that.
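A classic illustration (a hypothetical sketch of my own) is pointer chasing, where neither the next address nor the load latency can be known at compile time in any language:

    struct node {
        struct node *next;
        int value;
    };

    /* Each load's address comes from the previous load, and its latency
     * depends on where that node happens to sit in the memory hierarchy
     * (L1 hit vs. DRAM miss). No static schedule can know either. */
    int sum_list(const struct node *n)
    {
        int sum = 0;
        while (n) {
            sum += n->value;  /* stalls until the previous load resolves */
            n = n->next;      /* next address unknown until this load completes */
        }
        return sum;
    }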
> Even those would move away eventually, though, coincidentally around the time CUDA and GPGPU came about.
Yup. It's just not efficient to use VLIW for anything, even when OOO is not needed. 