
Using all of those cores

Posted Aug 10, 2023 17:19 UTC (Thu) by farnz (subscriber, #17727)
In reply to: Using all of those cores by DemiMarie
Parent article: Another round of speculative-execution vulnerabilities

You can't, easily. Much of the parallelism limit is inherent to the way we perceive the problem domain, and it's simply not possible to have more parallelism without radical new understandings of the problems we're trying to solve.

Some problems, such as graphics rendering and neural network modelling, do have higher inherent parallelism. For those, we have an alternative type of processor, called a GPU for historical reasons, which is designed to be faster than a CPU on problems with lots of parallelism. It achieves this by sacrificing single-thread performance in favour of running a large number of concurrent threads, with hardware support for launching a very large number of threads and multiplexing them onto a smaller number of executing threads.



Not everything with parallelism is suitable for GPUs

Posted Aug 10, 2023 22:01 UTC (Thu) by DemiMarie (subscriber, #164188)

GPUs have other limitations, though. For instance, the SIMT model means that GPUs are terrible at workloads with lots of non-uniform control flow. That isn’t a huge limitation for math or graphics, but it is a serious limitation for what I call “business logic” workloads, where a significant part of the problem is figuring out what to do next. This includes e.g. web applications, which have a huge amount of parallelism but lots of conditional branches and non-uniform memory accesses.
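To make the cost of non-uniform control flow concrete, here is a rough toy model in Python, not a description of any real GPU. It assumes a 32-lane warp (`WARP_WIDTH`, an assumption; real widths vary by vendor) and assumes that when lanes disagree on a branch, the hardware issues each distinct path once for the whole warp, with the lanes that did not take that path sitting idle. The function name `simt_utilization` and the path labels are purely illustrative.

```python
from collections import Counter

WARP_WIDTH = 32  # assumed lanes per warp; real hardware varies


def simt_utilization(lane_paths, path_cost, warp_width=WARP_WIDTH):
    """Fraction of lane-slots doing useful work in a toy SIMT model.

    lane_paths: one control-flow path label per active lane
    path_cost:  dict mapping path label -> instruction count for that path
    """
    if not lane_paths or len(lane_paths) > warp_width:
        raise ValueError("need between 1 and warp_width lanes")
    lanes_on_path = Counter(lane_paths)
    # Each distinct path is issued once for the whole warp, serially.
    issued = sum(path_cost[p] for p in lanes_on_path)
    # Only the lanes that took a path do useful work while it is issued.
    useful = sum(path_cost[p] * n for p, n in lanes_on_path.items())
    return useful / (issued * warp_width)


# Hypothetical workloads: every path costs 10 instructions.
two_paths = {"a": 10, "b": 10}
uniform = simt_utilization(["a"] * 32, two_paths)
split = simt_utilization(["a"] * 16 + ["b"] * 16, two_paths)

many_paths = {i: 10 for i in range(32)}
fully_divergent = simt_utilization(list(range(32)), many_paths)

print(uniform, split, fully_divergent)
```

In this model a two-way split halves utilization, and a warp where every lane takes its own path does no better than a single lane would, which is the "business logic" case described above.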

Not everything with parallelism is suitable for GPUs

Posted Aug 10, 2023 22:08 UTC (Thu) by farnz (subscriber, #17727)

They're no more terrible at non-uniform control flow than CPUs are - in the worst case, you just use one SIMD lane per GPU core, get a much lower throughput, but still have the large number of threads. It's just that we look at GPUs differently to CPUs, so we see the slowdown from using only one SIMD lane as a big deal on a GPU, but we don't see it as a big deal that we only use scalar instructions on CPU cores with the ability to process 8 (AVX2) or 16 (AVX-512) 32-bit values in parallel, despite the fact that this is the same class of slowdown.
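The "same class of slowdown" point is just arithmetic on lane counts, which a few lines of Python can spell out. The AVX2 and AVX-512 register widths (256 and 512 bits) give the 8 and 16 lanes of 32-bit values mentioned above; the 32-lane GPU figure is an assumption for illustration, since the comment does not give one and real widths differ between vendors.

```python
def lanes(vector_bits, element_bits=32):
    """Number of elements of the given size that fit in one vector."""
    return vector_bits // element_bits


def one_lane_fraction(width):
    """Share of peak throughput left when only one lane does work."""
    return 1 / width


units = {
    "AVX2 (256-bit)": lanes(256),
    "AVX-512 (512-bit)": lanes(512),
    "GPU SIMD unit (assumed 32 lanes)": 32,
}

for name, width in units.items():
    # Scalar code on a vector-capable core uses 1 of `width` lanes.
    print(f"{name}: {width} lanes, single-lane use is 1/{width} of peak")
```

Seen this way, scalar code on an AVX-512 core leaves 15 of 16 lanes idle, much as fully divergent code leaves most of a GPU's lanes idle; the difference is largely in which case we are used to treating as normal.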


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds