
Intel AMX support in 5.16


Posted Nov 10, 2021 22:25 UTC (Wed) by bartoc (guest, #124262)
In reply to: Intel AMX support in 5.16 by jak90
Parent article: Intel AMX support in 5.16

Both options are kind of bad, though. You don't just want "I'm going to use AVX-512, please pin me"; you also need to tell the scheduler when you're done using it. Similarly, if the kernel just pins on the first illegal instruction, it would need to occasionally unpin if it wants to be able to run the non-AVX-512 work on the cores without AVX-512.

Also, most apps using AVX-512 are not doing it unconditionally, but rather call cpuid and check the results. Because cpuid completely serializes execution, it is slow, so apps tend to call it just once, in a static initializer or similar. Calling it before each portion of code that uses AVX-512 would be far too slow, so you'd want to make "can I do AVX-512" part of the per-thread state, and have the scheduler change it for you when it decides to schedule you on a CPU with different features.
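The "check once and cache" pattern described above can be sketched roughly as follows. This is only an illustration: the names `probe_avx512f`, `have_avx512f`, `avx512_cached` and `probe_count` are made up for this example, and the GCC/Clang builtins `__builtin_cpu_init()` and `__builtin_cpu_supports()` stand in for a hand-written cpuid call. The sketch is single-threaded and makes no attempt to notice a migration to a different core.

```c
#include <assert.h>
#include <stdbool.h>

static int avx512_cached = -1;  /* -1: not probed yet, otherwise 0 or 1 */
static int probe_count;         /* how often the slow probe has run */

/* The slow part: asks about the CPU we happen to be running on right now. */
static bool probe_avx512f(void)
{
    probe_count++;
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx512f") != 0;
#else
    return false;               /* non-x86 or other compiler: assume "no" */
#endif
}

/* Cheap check for hot paths: probes once, then reuses the answer. */
bool have_avx512f(void)
{
    if (avx512_cached < 0)
        avx512_cached = probe_avx512f() ? 1 : 0;
    assert(probe_count == 1);   /* single-threaded sketch: probed exactly once */
    return avx512_cached == 1;
}
```

The cached answer is exactly what goes stale on a heterogeneous CPU: it reflects the core the thread was on at the first call, not the core it is on later.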

For AVX-512 on something like Alder Lake I think you'd need a system call that's essentially "do I have AVX-512", and the kernel could say yes or no (even if there are some cores with AVX-512), but if it said yes then it would promise not to schedule you on any cores without AVX-512 until you were done. Hopefully this would be implementable without actually making a real transition to kernel mode, by setting some per-thread flag the scheduler could look at when needed. Even this (pretty complicated) mechanism poses some problems, because apps might not tell the kernel when they are done, either because they forget, or because the kernel told them they could use the fancy instructions and they don't want to give the core back. This would be a particular problem on laptops, where I'd imagine the kernel might want to get everyone off the P cores so they could be completely powered off. Unfortunately, once the app has started doing its fancy AVX-512 things, the kernel can't unilaterally decide to take back the permission to do AVX-512: even if it handled the illegal instructions after moving the thread to an E core, it can't go back in time to have the process take the non-AVX-512 branch. So you might get situations (a little like with switchable graphics) where you have long-running apps that ask for AVX-512, don't tell the kernel when they are done with it, and then cause pretty dramatic reductions in battery life for no reason.
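To make the claim/release idea concrete, here is a toy model of it in C. Nothing here is a real kernel interface: `thread_state`, `avx512_claim`, `avx512_release` and `may_run_on` are hypothetical names, and the model assumes that P cores have AVX-512 and E cores do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum core_kind { CORE_P, CORE_E };  /* assumption: only P cores have AVX-512 */

struct thread_state {
    bool avx512_claimed;  /* hypothetical per-thread flag the scheduler reads */
};

/* "Do I have AVX-512?" The kernel may say no even if P cores exist. */
bool avx512_claim(struct thread_state *t, bool kernel_allows)
{
    assert(t != NULL);
    t->avx512_claimed = kernel_allows;
    return kernel_allows;
}

/* The thread tells the kernel it has finished its AVX-512 work. */
void avx512_release(struct thread_state *t)
{
    assert(t != NULL);
    t->avx512_claimed = false;
}

/* Scheduler-side check: may this thread be placed on this kind of core? */
bool may_run_on(const struct thread_state *t, enum core_kind core)
{
    assert(t != NULL);
    return core == CORE_P || !t->avx512_claimed;
}
```

In this model a thread that claims AVX-512 and never calls `avx512_release()` keeps `may_run_on(t, CORE_E)` false forever, which is the battery-life problem described above.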

I suppose the kernel _could_ forcibly reschedule the process by somehow implementing a software version of the AVX-512 instructions, that way you'd just get somewhat extreme slowness.

Another option would be for Intel themselves to implement such software versions of AVX-512 instructions, and use their execution as input into their new hardware scheduler thingy to indicate that maybe the thread should be scheduled on a P core.



Intel AMX support in 5.16

Posted Nov 11, 2021 10:05 UTC (Thu) by wtarreau (subscriber, #51152)

No, the situation is much worse: applications use it by accident during a memcpy() or similar calls, where they most often do not even need the tiny savings brought by the instruction set. Such calls may happen far more often than one would be willing to migrate tasks, so the real result is that any task using a given libc would end up running exclusively on the AVX-512-enabled cores. What we really need is to make such features opt-in at the libc level so that this trouble is not inflicted on us without consent. And by the way, there are plenty of cases where using these instructions significantly lowers the CPU's frequency and dramatically slows down the useful workload, which is another reason some people explicitly disable AVX-512 on their machines.

Intel AMX support in 5.16

Posted Nov 11, 2021 18:07 UTC (Thu) by anton (subscriber, #25547)

AVX slowdown, and even AVX-512 slowdown, does not seem to be bad in recent Intel CPUs.

I think that AVX-512 support on heterogeneous CPUs where some cores don't support AVX-512 is not a big problem. There are several ways to deal with the situation. Sure you can come up with a scenario for every one of them where you would prefer a different result, but even in these scenarios the disadvantage of the not-preferred result is not that big, certainly not worse than outright disabling AVX-512 or outright disabling E-cores.

E.g., if you automatically reduce the cpu-list of a thread to the P-cores once an AVX-512 instruction is used, the worst case is that the E-cores won't be used. I guess many threads don't use AVX-512, so there is enough left for the E-cores; as for memcpy() and friends, the code for selecting the actual routine could be made more CPU-specific (rather than just checking the AVX-512 flags in cpuinfo).

Alternatively, only report the AVX-512 flags on threads where the cpu-list is limited to the cores that have AVX-512. So you won't get AVX-512 on ordinary threads. Given that relatively little code actually makes significant use of AVX-512, it's not a big problem that the user then has to call such code with taskset or somesuch.

In any case, in order to have such problems at all, we need CPUs that enable AVX-512 at the same time as E-cores. From what I read, Intel wanted to give us no AVX-512 at all, and board manufacturers give us either AVX-512 or E-cores, but not both.

Intel AMX support in 5.16

Posted Nov 12, 2021 11:44 UTC (Fri) by wtarreau (subscriber, #51152)

I also noticed newer cores are less impacted by this; they're making progress.

For memcpy(), ideally the solution would be to only consider features that are common to all CPUs the task may run on, and not just the starting one. It's not very complicated after all; the most painful part is already done (except if it's relying on a cpuid instruction).
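The intersection idea could look something like the sketch below. It is not how any libc does it today: the `FEAT_*` bits, the per-CPU feature table and `common_features()` are all invented for illustration, and the allowed-CPU set is simplified to a 64-bit mask.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define FEAT_SSE2    (1u << 0)
#define FEAT_AVX2    (1u << 1)
#define FEAT_AVX512F (1u << 2)

/*
 * Return the features shared by every CPU whose bit is set in allowed_mask.
 * cpu_features[i] holds the feature bits of CPU i. An empty mask yields 0.
 */
uint32_t common_features(const uint32_t *cpu_features, unsigned ncpus,
                         uint64_t allowed_mask)
{
    assert(ncpus <= 64);        /* the mask only has room for 64 CPUs */

    uint32_t common = ~0u;
    int seen = 0;

    for (unsigned cpu = 0; cpu < ncpus; cpu++) {
        if (allowed_mask & (UINT64_C(1) << cpu)) {
            common &= cpu_features[cpu];
            seen = 1;
        }
    }
    return seen ? common : 0;
}
```

With two AVX-512 cores (CPUs 0 and 1) and two cores without it (CPUs 2 and 3), a task allowed on all four would see no `FEAT_AVX512F`, while a task restricted to CPUs 0 and 1 would, which also covers the "only report the flag on restricted threads" variant suggested earlier in the thread.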


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds