AVX-512

AVX-512

Posted Oct 1, 2022 11:31 UTC (Sat) by atnot (subscriber, #124910)
In reply to: AVX-512 by drago01
Parent article: Hybrid scheduling gets more complicated

It is slower in most practical scenarios because Intel's current implementations impose a high frequency penalty. You need enough AVX-512 instructions lined up in a row to make up for the latency of clocking down into, and back out of, AVX-512 mode. This gets worse in the common cases where backend bottlenecks mean that AVX-512 is only slightly faster to begin with. As a result, AVX-512 is usually going to be slower outside of select HPC workloads.
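The break-even argument above can be put into a rough model. This is only a sketch under stated assumptions: the burst costs a fixed transition time on top of its run time, the clock is scaled by a constant ratio while the burst runs, and every name and number here (`speedup`, `clock_ratio`, `transition_s`) is hypothetical rather than a measurement of any real CPU.

```python
import math


def avx512_time(work_s, speedup, clock_ratio, transition_s):
    """Time for a burst that takes work_s seconds with AVX2 at full clock.

    speedup      -- hypothetical AVX-512 gain at equal clock (1.0 = none)
    clock_ratio  -- hypothetical reduced clock / normal clock (below 1.0)
    transition_s -- hypothetical fixed cost of clocking down and back up
    """
    return transition_s + work_s / (speedup * clock_ratio)


def break_even_work(speedup, clock_ratio, transition_s):
    """Smallest AVX2 burst length (seconds) at which AVX-512 stops losing.

    Solves work = transition + work / (speedup * clock_ratio) for work.
    Returns infinity when the reduced clock eats the whole speedup.
    """
    effective = speedup * clock_ratio
    if effective <= 1.0:
        return math.inf
    return transition_s / (1.0 - 1.0 / effective)


# A backend-bound kernel that is only slightly faster never breaks even.
slightly_faster = break_even_work(speedup=1.1, clock_ratio=0.9, transition_s=20e-6)

# A kernel that really doubles its throughput needs a long enough burst.
much_faster = break_even_work(speedup=2.0, clock_ratio=0.9, transition_s=20e-6)
```

With these made-up inputs, the slightly faster kernel has no break-even point at all, while the 2x kernel only wins once the burst is longer than the returned threshold.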



AVX-512

Posted Oct 2, 2022 7:30 UTC (Sun) by drago01 (subscriber, #50715) [Link] (5 responses)

If the clock offset hurts you more than the gains from AVX-512 then your workload does not really benefit from AVX-512 in a meaningful way.

AVX-512

Posted Oct 2, 2022 12:06 UTC (Sun) by khim (subscriber, #9252) [Link]

The biggest problem there is that the decision to use (or not use) SSE, AVX, or AVX-512 is local (you pick these at the level of tiny, elementary functions), while the question of whether AVX-512 is beneficial or not is global.

Essentially the same dilemma which killed Itanic, just not as acute.

AVX-512

Posted Oct 3, 2022 9:32 UTC (Mon) by farnz (subscriber, #17727) [Link] (3 responses)

The difficulty comes in when my workload is scheduled on a single server with other workloads. The right decision for my workload, if it has a machine to itself, is AVX-512 at the lower clocks; however, depending on what the scheduler does, the right decision might become AVX2 if other workloads are more important than mine and are adversely affected by the core's AVX-512 downclocking.

This is the problem with using local state ("does this OS thread make use of AVX-512") to drive a global decision ("what clock speed should this core run at"). The correct answer depends not only on my workload, but also on all other workloads sharing this CPU core - which is fine for HPC type workloads, where there are no other workloads sharing a CPU core, but more of a problem with general deployment of AVX-512.
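To make the local-versus-global point concrete, here is a toy model of one core shared between "my" workload and another, scalar-only workload. It is a sketch, not a description of any real scheduler: it assumes the reduced clock applies to everything on the core whenever my thread uses AVX-512, and the shares, weights, speedup, and clock ratio are all hypothetical.

```python
def weighted_throughput(use_avx512, speedup, clock_ratio,
                        my_share, my_weight, other_weight):
    """Importance-weighted work done per unit time on one shared core.

    my_share     -- fraction of core time given to my workload (0.0 to 1.0)
    my_weight    -- hypothetical importance of my workload
    other_weight -- hypothetical importance of the other workload
    Assumption: if I use AVX-512, the whole core runs at clock_ratio.
    """
    clock = clock_ratio if use_avx512 else 1.0
    mine = my_share * clock * (speedup if use_avx512 else 1.0)
    other = (1.0 - my_share) * clock
    return my_weight * mine + other_weight * other


def best_choice(speedup, clock_ratio, my_share, my_weight, other_weight):
    """Pick the instruction set that maximises the weighted total."""
    args = (speedup, clock_ratio, my_share, my_weight, other_weight)
    with_512 = weighted_throughput(True, *args)
    with_avx2 = weighted_throughput(False, *args)
    return "AVX-512" if with_512 > with_avx2 else "AVX2"


# Same code, same speedup, different neighbours on the core.
alone = best_choice(speedup=1.3, clock_ratio=0.85,
                    my_share=1.0, my_weight=1.0, other_weight=0.0)
shared = best_choice(speedup=1.3, clock_ratio=0.85,
                     my_share=0.5, my_weight=1.0, other_weight=3.0)
```

The function being vectorised is identical in both calls; only the global context changes, and with it the answer.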

As a side note, as Intel moves on with process from the 14nm of original AVX-512 CPUs, the downclock becomes less severe, and it's nearly non-existent on the latest designs. This, to me, suggests that the downclock is a consequence of backporting AVX-512 to Skylake on 14nm, and thus will become a historic artefact over time.

AVX-512

Posted Oct 3, 2022 15:15 UTC (Mon) by drago01 (subscriber, #50715) [Link] (2 responses)

Well there is also no downclock on Zen4.
But given how CPUs work nowadays, that's not entirely true either, because a lighter workload will result in higher clocks and vice versa. CPUs try to maximize performance within the power budget.

Clocks don't matter much, though; what matters is the performance you are getting. And if your workload benefits from wide vectors, that will offset any clock changes.

AVX-512

Posted Oct 3, 2022 15:33 UTC (Mon) by farnz (subscriber, #17727) [Link]

The critical difference is that with SKX, the maximum permitted clock assuming that thermals allowed was massively reduced for "heavy" AVX-512, because it caused thermal hot-spots on the chip that weren't properly accounted for by "normal" thermal monitoring. With ICL and with RKL there's no longer a huge limit - instead of the SKX thing (where a chip could drop from 3.0 GHz "base" to 2.8 GHz "max turbo" if you used AVX-512), you now can always sustain the same "base", but the max turbo is reduced by 100 MHz or so.
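As a quick piece of arithmetic on those figures: in steady state, and ignoring transition costs, AVX-512 only has to beat the ratio of the two clocks. The 3.0 and 2.8 GHz values are the SKX example from the comment above; the 5.0 GHz turbo figure is a made-up stand-in, since no actual turbo clock is given for ICL or RKL.

```python
def required_speedup(normal_ghz, avx512_ghz):
    """Per-clock speedup AVX-512 needs just to match the non-AVX-512 clock."""
    return normal_ghz / avx512_ghz


# SKX-style example from the comment: 3.0 GHz dropping to 2.8 GHz.
skx = required_speedup(3.0, 2.8)        # a little over 7% needed

# Newer-style example: a hypothetical 5.0 GHz turbo reduced by 100 MHz.
newer = required_speedup(5.0, 4.9)      # roughly 2% needed
```

Under these assumptions, the bar AVX-512 has to clear falls from about 7% to about 2%.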

AVX-512

Posted Oct 13, 2022 13:54 UTC (Thu) by roblucid (guest, #48964) [Link]

Dr Ian Cutress tested various AVX usages on Zen4, and both power and performance benefited from AVX512, with one exception that lost about 5%. He was impressed with the lack of downsides.


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds