AVX-512

Posted Oct 1, 2022 6:23 UTC (Sat) by epa (subscriber, #39769)
In reply to: Hybrid scheduling gets more complicated by jhoblitt
Parent article: Hybrid scheduling gets more complicated

The article says “Both types of CPU implement the same instruction set”. But that’s not true, surely? The AVX-512 instructions and perhaps others are not available on E-cores.

AVX-512

Posted Oct 1, 2022 7:38 UTC (Sat) by drago01 (subscriber, #50715)

Yes, but they are not available on P-cores either. The P-cores could technically support AVX-512, but it's disabled.

AVX-512

Posted Oct 1, 2022 8:59 UTC (Sat) by ballombe (subscriber, #9523)

...which makes AVX-512 even less dependable.
Why write AVX-512 code when almost nobody can run it and when, for half of those who can, it is slower than AVX2?

AVX-512

Posted Oct 1, 2022 10:24 UTC (Sat) by drago01 (subscriber, #50715)

Why would it be slower than AVX2?
If your workload benefits from AVX-512 it will be significantly faster than AVX2.

AVX-512

Posted Oct 1, 2022 11:31 UTC (Sat) by atnot (subscriber, #124910)

It is slower in most practical scenarios because it imposes a high frequency penalty in Intel's current implementations. You need enough AVX-512 instructions lined up in a row to make up for the latency of clocking down into and back out of AVX-512 mode. This gets worse in the common cases where backend bottlenecks mean that AVX-512 is only slightly faster. As a result, AVX-512 is usually going to be slower outside of select HPC workloads.
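
A rough way to see the break-even point described above is a toy cost model. This is a sketch, not a measurement: the clock speeds, the transition latency and the per-cycle speedup are all made-up parameters, and the function names are hypothetical.

```python
def run_time_s(n_ops, ops_per_cycle, freq_hz, transition_s=0.0):
    """Seconds to retire n_ops at a fixed clock, plus a one-off transition cost."""
    return transition_s + n_ops / (ops_per_cycle * freq_hz)


def break_even_ops(f_avx2_hz, f_avx512_hz, speedup_per_cycle, transition_s):
    """Smallest burst size (in AVX2-equivalent ops) at which AVX-512 wins.

    Assumes AVX2 retires 1 op per cycle and AVX-512 retires
    speedup_per_cycle ops per cycle at the lower clock.
    Returns None if the per-cycle gain does not even cover the lower clock,
    in which case AVX-512 never wins in this model.
    """
    rate_avx2 = 1.0 * f_avx2_hz
    rate_avx512 = speedup_per_cycle * f_avx512_hz
    if rate_avx512 <= rate_avx2:
        return None
    # Solve n / rate_avx2 == transition_s + n / rate_avx512 for n.
    return transition_s / (1.0 / rate_avx2 - 1.0 / rate_avx512)
```

With assumed inputs such as 3.0 GHz for AVX2, 2.5 GHz for AVX-512, a 1.3x per-cycle gain and a 20 µs transition, the burst has to be several hundred thousand operations long before AVX-512 comes out ahead; with only a 1.1x per-cycle gain it never does.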

AVX-512

Posted Oct 2, 2022 7:30 UTC (Sun) by drago01 (subscriber, #50715)

If the clock offset hurts you more than the gains from AVX-512 then your workload does not really benefit from AVX-512 in a meaningful way.

AVX-512

Posted Oct 2, 2022 12:06 UTC (Sun) by khim (subscriber, #9252)

The biggest problem there is that the decision to use (or not use) SSE, AVX, or AVX-512 is local (you pick these at the level of tiny, elementary functions), while the question of whether AVX-512 is beneficial or not is global.

Essentially the same dilemma which killed Itanic, just not as acute.
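
To make the local-versus-global point concrete, here is a minimal sketch of how a small function typically picks its code path. The feature names follow the style of the flags in Linux's /proc/cpuinfo (avx512f, avx2, sse2); the function and kernel names are hypothetical.

```python
def pick_kernel(cpu_flags):
    """Choose the widest vector kernel the CPU advertises.

    cpu_flags is a set of feature names. This is the *local* decision:
    it has no information about other threads sharing the core, or about
    whether the call site is hot enough to amortise a clock change.
    """
    for flag, kernel in (("avx512f", "sum_avx512"),
                         ("avx2", "sum_avx2"),
                         ("sse2", "sum_sse2")):
        if flag in cpu_flags:
            return kernel
    return "sum_scalar"
```

Nothing in the inputs to such a dispatcher says whether taking the AVX-512 path is a net win for the machine as a whole, which is exactly the global question.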

AVX-512

Posted Oct 3, 2022 9:32 UTC (Mon) by farnz (subscriber, #17727)

The difficulty comes in when my workload is scheduled on a single server with other workloads. The right decision for my workload if on a machine by itself is AVX-512 at the lower clocks; however, depending on what the scheduler does, the right decision might become AVX2 if other workloads are more important than mine, and are adversely affected by the core doing AVX-512 downclocking.

This is the problem with using local state ("does this OS thread make use of AVX-512") to drive a global decision ("what clock speed should this core run at"). The correct answer depends not only on my workload, but also on all other workloads sharing this CPU core - which is fine for HPC type workloads, where there are no other workloads sharing a CPU core, but more of a problem with general deployment of AVX-512.
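
A toy model of that trade-off, assuming (purely for illustration) that the other workloads' throughput scales linearly with the core clock and that each workload's importance can be expressed as a single weight. All names and numbers here are hypothetical.

```python
def core_value(my_weight, my_speedup, other_weight, clock_ratio):
    """Weighted throughput of one core relative to the all-AVX2 baseline.

    my_speedup  -- net speedup of my workload from AVX-512, clock drop included
    clock_ratio -- downclocked frequency / normal frequency; the other
                   workloads are assumed to slow down by exactly this factor
    """
    baseline = my_weight + other_weight
    with_avx512 = my_weight * my_speedup + other_weight * clock_ratio
    return with_avx512 / baseline


def should_use_avx512(my_weight, my_speedup, other_weight, clock_ratio):
    """True if the core as a whole does better when my thread uses AVX-512."""
    return core_value(my_weight, my_speedup, other_weight, clock_ratio) > 1.0
```

In this model, a workload that gains 20% from AVX-512 should use it when it has the core to itself (other_weight of 0), but not when it shares the core with work that is four times as important and loses 15% to the downclock.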

As a side note, as Intel moves on with process from the 14nm of original AVX-512 CPUs, the downclock becomes less severe, and it's nearly non-existent on the latest designs. This, to me, suggests that the downclock is a consequence of backporting AVX-512 to Skylake on 14nm, and thus will become a historic artefact over time.

AVX-512

Posted Oct 3, 2022 15:15 UTC (Mon) by drago01 (subscriber, #50715)

Well there is also no downclock on Zen4.
But given how CPUs work nowadays, that's not entirely true either, because a lighter workload will result in higher clocks and vice versa. CPUs try to maximize performance within the power budget.

Clocks don't matter much, though; what matters is the performance you are getting. And if your workload benefits from wide vectors, that will offset any clock changes.

AVX-512

Posted Oct 3, 2022 15:33 UTC (Mon) by farnz (subscriber, #17727)

The critical difference is that with SKX, the maximum permitted clock (assuming thermals allowed it) was massively reduced for "heavy" AVX-512, because it caused thermal hot-spots on the chip that weren't properly accounted for by "normal" thermal monitoring. With ICL and with RKL there's no longer a huge limit: instead of the SKX thing (where a chip could drop from 3.0 GHz "base" to 2.8 GHz "max turbo" if you used AVX-512), you can now always sustain the same "base", but the max turbo is reduced by 100 MHz or so.

AVX-512

Posted Oct 13, 2022 13:54 UTC (Thu) by roblucid (guest, #48964)

Dr Ian Cutress tested various AVX usages on Zen4, and both power and performance benefited from AVX-512, with one exception that lost about 5%. He was impressed with the lack of downsides.

AVX-512

Posted Oct 13, 2022 13:48 UTC (Thu) by roblucid (guest, #48964)

AMD Zen4 *cough cough*

Dr Ian Cutress's program uses AVX2 or AVX-512, and it had a 6x speedup on Zen3.

AVX-512

Posted Oct 1, 2022 7:42 UTC (Sat) by Sesse (subscriber, #53779)

That's true, and Intel has “solved” that by permanently disabling AVX-512 on the P-cores. The silicon is physically present, but that's just a manufacturing detail (similar to how GPUs often have some defective cores that are disabled). So yes, they do run the same instruction set.