
Using all of those cores

Posted Aug 10, 2023 16:05 UTC (Thu) by DemiMarie (subscriber, #164188)
In reply to: Another round of speculative-execution vulnerabilities by paulj
Parent article: Another round of speculative-execution vulnerabilities

How can one beat the parallelism limit you mentioned?



Using all of those cores

Posted Aug 10, 2023 17:19 UTC (Thu) by farnz (subscriber, #17727) [Link] (2 responses)

You can't, easily. Much of the parallelism limit is inherent to the way we perceive the problem domain, and it's simply not possible to have more parallelism without radical new understandings of the problems we're trying to solve.

Some problems, such as graphics rendering and neural network modelling, do have a higher inherent parallelism, and we have an alternative type of processor, called a GPU for historical reasons, which is designed to be faster than a CPU on problems with lots of parallelism. It achieves this by sacrificing single-thread performance in favour of running a large number of concurrent threads, complete with hardware support for launching a very large number of threads and multiplexing them onto a smaller number of executing threads.

Not everything with parallelism is suitable for GPUs

Posted Aug 10, 2023 22:01 UTC (Thu) by DemiMarie (subscriber, #164188) [Link] (1 responses)

GPUs have other limitations, though. For instance, the SIMT model means that GPUs are terrible at workloads with lots of non-uniform control flow. That isn’t a huge limitation for math or graphics, but it is a serious limitation for what I call “business logic” workloads, where a significant part of the problem is figuring out what to do next. This includes e.g. web applications, which have a huge amount of parallelism but lots of conditional branches and non-uniform memory accesses.

Not everything with parallelism is suitable for GPUs

Posted Aug 10, 2023 22:08 UTC (Thu) by farnz (subscriber, #17727) [Link]

They're no more terrible at non-uniform control flow than CPUs are - in the worst case, you just use one SIMD lane per GPU core, get a much lower throughput, but still have the large number of threads. It's just that we look at GPUs differently to CPUs, so we see the slowdown from using only one SIMD lane as a big deal on a GPU, but we don't see it as a big deal that we only use scalar instructions on CPU cores with the ability to process 8 (AVX2) or 16 (AVX-512) 32-bit values in parallel, despite the fact that this is the same class of slowdown.
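To put rough numbers on "the same class of slowdown", here is a minimal sketch of a toy model, not a description of any real GPU or CPU. It assumes that a group of `width` lanes runs in lockstep and that each distinct control-flow path taken within a group costs one full pass with the other lanes masked off. The names `simt_utilization`, `branch_ids` and `width` are made up for this illustration.

```python
def simt_utilization(branch_ids, width):
    """Fraction of lane slots doing useful work in a toy lockstep model.

    branch_ids[i] names the control-flow path taken by thread i
    (hypothetical labels). Threads are packed into groups of `width`
    lanes; each distinct path within a group is assumed to cost one
    pass over the whole group, with non-participating lanes idle.
    """
    useful = 0  # lane slots that did real work
    issued = 0  # lane slots that were paid for
    for start in range(0, len(branch_ids), width):
        group = branch_ids[start:start + width]
        distinct_paths = len(set(group))
        issued += distinct_paths * width
        useful += len(group)
    return useful / issued


# Uniform control flow: every lane is busy.
uniform = simt_utilization([0] * 32, width=8)

# Fully divergent control flow: one useful lane per pass, which is the
# same fraction as scalar code on a CPU with 8 lanes of 32-bit values.
divergent_8 = simt_utilization(list(range(32)), width=8)
divergent_16 = simt_utilization(list(range(32)), width=16)
```

Under these assumptions the fully divergent case uses 1/8 of the lanes at width 8 and 1/16 at width 16, whether you call the hardware a GPU or a CPU with AVX2 or AVX-512 running scalar code.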

Using all of those cores

Posted Aug 11, 2023 9:48 UTC (Fri) by paulj (subscriber, #341) [Link]

More parallelism in the code for a specific problem? You have to find a more parallel algorithm, if that is even possible.

Making efficient use of compute resources, in a world where the codes you want to run have limited parallelism? Run many different codes together on the same compute elements, and switch between them to keep memory bandwidth and compute occupied. No single code will run faster, but at least you maintain throughput in the aggregate.
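A minimal sketch of that idea, under loudly simplified assumptions: every job alternates between `compute` cycles of work and `stall` cycles of waiting (say, on memory), switching between jobs is free, and there is a single compute element. The function name `utilization` and all its parameters are hypothetical, chosen only for this illustration.

```python
def utilization(num_jobs, compute, stall, total_cycles):
    """Fraction of cycles one compute element spends doing work.

    Toy model: each job runs for `compute` cycles, then cannot run
    again for `stall` cycles. When the running job stalls, we switch
    (at zero cost, by assumption) to whichever job has been ready
    the longest.
    """
    ready_at = [0] * num_jobs  # first cycle at which each job can run
    busy = 0
    cycle = 0
    while cycle < total_cycles:
        runnable = [j for j in range(num_jobs) if ready_at[j] <= cycle]
        if not runnable:
            # Everything is stalled: idle until the first job wakes up.
            cycle = min(ready_at)
            continue
        job = min(runnable, key=lambda j: ready_at[j])
        run = min(compute, total_cycles - cycle)
        busy += run
        cycle += run
        ready_at[job] = cycle + stall
    return busy / total_cycles


one_job = utilization(num_jobs=1, compute=1, stall=3, total_cycles=400)
two_jobs = utilization(num_jobs=2, compute=1, stall=3, total_cycles=400)
four_jobs = utilization(num_jobs=4, compute=1, stall=3, total_cycles=400)
```

In this model each individual job still takes just as long to finish, but the compute element goes from being busy a quarter of the time with one job to fully busy with four.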

This is kind of where computers have gone anyway. From your phone, to your desktop, to servers running containers running jobs in the cloud - they've all got many many dozens of jobs to run at any given time. If one stalls, switch to another.


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds