
Another round of speculative-execution vulnerabilities

Posted Aug 9, 2023 9:16 UTC (Wed) by paulj (subscriber, #341)
In reply to: Another round of speculative-execution vulnerabilities by willy
Parent article: Another round of speculative-execution vulnerabilities

One issue is that instruction-level parallelism in common codes tops out at about 4 to 10 instructions per cycle (IPC), according to a '93 DEC WRL tech report by David Wall. Also, even if you increase IPC and avoid pipeline stalls, memory speed (latency in particular) isn't keeping up.
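A toy illustration of what such a limit measures: on an idealised machine (unlimited functional units, perfect branch prediction, perfect register renaming, one-cycle latency for everything), the best achievable IPC is the instruction count divided by the length of the longest chain of true data dependencies. This is a minimal sketch under those stated assumptions, not Wall's methodology; the function name `dataflow_ipc` and the toy programs are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical toy instruction format: (destination register, source registers).
Instr = Tuple[str, List[str]]

def dataflow_ipc(program: List[Instr]) -> float:
    """Best-case IPC when only read-after-write dependencies limit issue.
    Assumes unlimited units, perfect prediction/renaming, 1-cycle latency."""
    ready_at: Dict[str, int] = {}  # cycle at which each register's latest value exists
    finish = 0
    for dest, srcs in program:
        # An instruction can issue as soon as all of its inputs are available.
        start = max((ready_at.get(s, 0) for s in srcs), default=0)
        done = start + 1  # unit latency
        ready_at[dest] = done
        finish = max(finish, done)
    return len(program) / finish

# Each add depends on the previous one: a serial chain, so IPC is 1.
chain = [("r1", ["r0"]), ("r1", ["r1"]), ("r1", ["r1"]), ("r1", ["r1"])]
# Four independent adds can all issue in the same cycle: IPC is 4.
independent = [("r1", ["r0"]), ("r2", ["r0"]), ("r3", ["r0"]), ("r4", ["r0"])]
print(dataflow_ipc(chain), dataflow_ipc(independent))
```

Real programs sit somewhere between these extremes, and the length of their dependency chains is what caps the parallelism no matter how wide the core is made.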

There is a good argument to be made that the growing transistor budgets could be better spent on adding more, simpler compute elements ("cores") rather than on adding ever more complex speculative-execution logic to ever more complex compute elements, and that this would be more efficient overall.

I.e., rather than trying to make one (or a very small number of) parallel paths of execution very fast with speculative execution, we should just provide many more paths of execution with simpler cores. The simpler cores might each stall more often waiting on memory latency, but if you have many of them you can get more throughput, and they will not waste cycles or energy on mis-speculated execution.
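A back-of-the-envelope model of that trade-off, as a sketch only: every number below is hypothetical, and it assumes each core completes one work item as a fixed stretch of compute followed by a fixed memory stall, with memory bandwidth never saturating (a big assumption).

```python
# Toy throughput-vs-latency model; all figures are hypothetical.
# Assumes one work item = `compute` cycles of execution then `stall` cycles
# waiting on memory, and that memory bandwidth never becomes the bottleneck.

def items_per_cycle(cores: int, compute: int, stall: int) -> float:
    """Aggregate throughput: each core finishes one item every compute + stall cycles."""
    return cores / (compute + stall)

def item_latency(compute: int, stall: int) -> int:
    """Cycles from start to finish of a single item on one core."""
    return compute + stall

# One big core whose speculation hides part of the memory latency...
big = items_per_cycle(cores=1, compute=10, stall=20)
# ...versus eight simple in-order cores assumed to fit in the same area,
# each slower and fully exposed to memory latency.
small = items_per_cycle(cores=8, compute=20, stall=100)

print(f"big:   {big:.4f} items/cycle, latency {item_latency(10, 20)} cycles")
print(f"small: {small:.4f} items/cycle, latency {item_latency(20, 100)} cycles")
```

With these made-up figures the simple cores deliver twice the aggregate throughput at four times the per-item latency, which is exactly the single-thread versus throughput tension being described.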

These are not new ideas; they go back a long way, and it seems we're slowly going down that path. GPUs are kind of part of that vision, and CPUs have gone many-core, but still with very complex speculative logic to fulfil the desire for good single-thread benchmark results. It's an old blog post of mine, but the references are still good to read: https://paul.jakma.org/2009/12/07/thread-level-parallelis...


Using all of those cores

Posted Aug 10, 2023 16:05 UTC (Thu) by DemiMarie (subscriber, #164188)

How can one beat the parallelism limit you mentioned?

Using all of those cores

Posted Aug 10, 2023 17:19 UTC (Thu) by farnz (subscriber, #17727)

You can't, easily. Much of the parallelism limit is inherent to the way we perceive the problem domain, and it's simply not possible to have more parallelism without radical new understandings of the problems we're trying to solve.

Some problems, such as graphics rendering and neural network modelling, do have higher inherent parallelism. For those we have an alternative type of processor, called a GPU for historical reasons, which is designed to be faster than a CPU on problems with lots of parallelism. It achieves this by sacrificing single-thread performance in favour of running a large number of concurrent threads, complete with hardware support for launching a very large number of threads and multiplexing them onto a smaller number of simultaneously executing threads.
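Why so many threads? A rough sketch in the spirit of Little's law (required concurrency = throughput × latency): if each thread alternates a little compute with a long memory access, an execution unit stays busy only if enough threads are resident to cover the wait. The function name, the zero-cost thread switch, and the 4-cycle/400-cycle figures are all assumptions for illustration.

```python
import math

def threads_needed(compute_cycles: int, memory_latency_cycles: int) -> int:
    """Minimum resident threads one execution unit must rotate among to stay
    fully busy, assuming each thread alternates `compute_cycles` of work with
    one memory access of `memory_latency_cycles`, and switching is free."""
    # Over one compute+wait period a single thread keeps the unit busy for
    # only compute_cycles, so enough threads are needed to fill the period.
    period = compute_cycles + memory_latency_cycles
    return math.ceil(period / compute_cycles)

# Hypothetical figures: 4 cycles of arithmetic per load, 400-cycle memory latency.
print(threads_needed(4, 400))  # 101
```

The lower the ratio of compute to memory latency, the more threads the hardware has to keep in flight, which is why GPUs are built to hold so many.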

Not everything with parallelism is suitable for GPUs

Posted Aug 10, 2023 22:01 UTC (Thu) by DemiMarie (subscriber, #164188)

GPUs have other limitations, though. For instance, the SIMT model means that GPUs are terrible at workloads with lots of non-uniform control flow. That isn’t a huge limitation for math or graphics, but it is a serious limitation for what I call “business logic” workloads, where a significant part of the problem is figuring out what to do next. This includes e.g. web applications, which have a huge amount of parallelism but lots of conditional branches and non-uniform memory accesses.

Not everything with parallelism is suitable for GPUs

Posted Aug 10, 2023 22:08 UTC (Thu) by farnz (subscriber, #17727)

They're no more terrible at non-uniform control flow than CPUs are: in the worst case, you just use one SIMD lane per GPU core and get much lower throughput, but you still have the large number of threads. It's just that we look at GPUs differently from CPUs. We see the slowdown from using only one SIMD lane as a big deal on a GPU, but we don't see it as a big deal that we only use scalar instructions on CPU cores that can process 8 (AVX2) or 16 (AVX-512) 32-bit values in parallel, even though this is the same class of slowdown.
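A simplified model of the divergence cost both comments are describing: lanes that run in lockstep execute each distinct control-flow path one after another, with the lanes not on that path masked off. This sketch ignores reconvergence details and newer scheduling schemes; the 32-lane width, the path costs, and the function name are assumptions for illustration.

```python
from typing import Hashable, Mapping, Sequence

def divergence_utilization(paths: Sequence[Hashable], cost: Mapping[Hashable, int]) -> float:
    """paths[i] is the control-flow path lane i takes; cost[p] is the number of
    steps path p needs. Assumes distinct paths are serialised with the other
    lanes masked off. Returns the fraction of lane-steps doing useful work."""
    steps = sum(cost[p] for p in set(paths))  # the group runs each distinct path once
    useful = sum(cost[p] for p in paths)      # lane-steps each lane actually needed
    return useful / (len(paths) * steps)

WIDTH = 32  # assumed lockstep group width
uniform = divergence_utilization(["a"] * WIDTH, {"a": 5})
two_way = divergence_utilization(["a"] * 16 + ["b"] * 16, {"a": 5, "b": 5})
# Worst case: every lane takes a different path, so only one lane is ever active.
worst = divergence_utilization(list(range(WIDTH)), {i: 5 for i in range(WIDTH)})
print(uniform, two_way, worst)
```

The worst case degenerates to one active lane out of 32, which is the same kind of loss as running scalar code on an 8-lane AVX2 or 16-lane AVX-512 unit (1/8 or 1/16 of the available lanes).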

Using all of those cores

Posted Aug 11, 2023 9:48 UTC (Fri) by paulj (subscriber, #341)

More parallelism in the code for a specific problem? You have to find a more parallel algorithm, if one even exists.
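A classic example of finding a more parallel algorithm is the prefix sum. The obvious loop is a chain of n dependent additions, while the Hillis-Steele scan (swapped in here as a standard textbook example, not something from the thread) needs only ceil(log2 n) rounds in which every update is independent, at the price of more total additions. The sketch below runs the rounds sequentially but builds each round only from the previous one, so the independence is visible; function names are illustrative.

```python
from typing import List, Tuple

def serial_prefix_sum(xs: List[int]) -> List[int]:
    """Inclusive prefix sum: each step depends on the previous total (n dependent adds)."""
    out, total = [], 0
    for x in xs:
        total += x
        out.append(total)
    return out

def hillis_steele_scan(xs: List[int]) -> Tuple[List[int], int]:
    """Inclusive prefix sum in ceil(log2 n) rounds. Within a round every element
    is computed only from the previous round's array, so all updates in a round
    could run in parallel. Total work is O(n log n) instead of O(n)."""
    ys = list(xs)
    rounds, offset = 0, 1
    while offset < len(ys):
        ys = [ys[i] + ys[i - offset] if i >= offset else ys[i] for i in range(len(ys))]
        offset *= 2
        rounds += 1
    return ys, rounds

data = list(range(1, 9))
print(serial_prefix_sum(data))
print(hillis_steele_scan(data))
```

For eight elements the dependent chain shrinks from 8 steps to 3 rounds, but the scan does more additions overall: extra parallelism is often bought with extra work.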

Making efficient use of compute resources, in a world where the codes you want to run have limited parallelism? Run many different codes together on the same compute elements, and switch between them to keep memory bandwidth and compute occupied. No single code will run faster, but at least you maintain throughput in the aggregate.

This is kind of where computers have gone anyway. From your phone, to your desktop, to servers running containers running jobs in the cloud, they've all got many dozens of jobs to run at any given time. If one stalls, switch to another.
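A small cycle-by-cycle simulation of "if one stalls, switch to another", as a sketch under idealised assumptions: a single core, jobs that alternate a fixed compute burst with a fixed memory stall, zero-cost switching, and unlimited memory bandwidth. The function name and the 10/20-cycle figures are hypothetical.

```python
import heapq
from collections import deque

def core_utilization(jobs: int, compute: int, stall: int, cycles: int) -> float:
    """Fraction of cycles one core does useful work when it switches to another
    ready job whenever the running job stalls on memory (switching is free)."""
    ready = deque(range(jobs))  # jobs that can run right now, FIFO order
    sleeping = []               # min-heap of (wake_cycle, job) for stalled jobs
    current, left = None, 0     # running job and cycles left in its compute burst
    busy = 0
    for t in range(cycles):
        # Jobs whose memory access has completed become runnable again.
        while sleeping and sleeping[0][0] <= t:
            ready.append(heapq.heappop(sleeping)[1])
        # If the core is idle, pick up the next ready job.
        if current is None and ready:
            current, left = ready.popleft(), compute
        if current is not None:
            busy += 1
            left -= 1
            if left == 0:
                # Burst done: the job issues a memory access and sleeps for `stall` cycles.
                heapq.heappush(sleeping, (t + 1 + stall, current))
                current = None
    return busy / cycles

for n in range(1, 5):
    print(n, round(core_utilization(n, compute=10, stall=20, cycles=3000), 3))
```

No individual job finishes any sooner, but utilisation climbs from about 0.33 with one job to 0.67 with two and 1.0 from three jobs on, i.e. roughly jobs × compute / (compute + stall) until it saturates.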


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds