Intel AMX support in 5.16
Posted Nov 10, 2021 22:25 UTC (Wed) by bartoc (guest, #124262)
In reply to: Intel AMX support in 5.16 by jak90
Parent article: Intel AMX support in 5.16
Also, most apps using AVX-512 don't use it unconditionally; they call CPUID and check the result. Because CPUID is a fully serializing instruction it is slow, so apps tend to call it just once, in a static initializer or similar. Calling it before each piece of code that uses AVX-512 would be far too slow, so you'd want to make "can I use AVX-512" part of the per-thread state and have the scheduler update it for you whenever it decides to run you on a CPU with a different feature set.
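A minimal sketch of the "call CPUID once" pattern, assuming GCC or Clang on x86 (it relies on `<cpuid.h>`, inline `xgetbv`, and the `constructor` attribute). It checks the AVX512F bit (CPUID leaf 7, EBX bit 16) and that the OS saves the opmask and ZMM register state (XCR0 bits 5 to 7). The names `detect_avx512f` and `have_avx512f` are made up for illustration. Note that the cached flag only describes the core that happened to run the probe, which is the assumption that breaks when cores differ in features.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Read XCR0 (which register state the OS saves on context switch).
   Only valid when CPUID reports OSXSAVE. */
static uint64_t read_xcr0(void) {
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* Probe AVX512F; CPUID is serializing, so keep this off hot paths. */
static bool detect_avx512f(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return false;                 /* no leaf 1, or OSXSAVE not set */
    if ((read_xcr0() & 0xE6) != 0xE6)
        return false;                 /* OS doesn't save SSE/AVX/opmask/ZMM state */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;                 /* no leaf 7 */
    return (ebx & (1u << 16)) != 0;   /* CPUID.(EAX=7,ECX=0):EBX[16] = AVX512F */
}
#else
static bool detect_avx512f(void) { return false; }  /* not x86 */
#endif

/* Cached answer, filled in before main() runs (GCC/Clang constructor). */
static bool avx512f_cached;

static void __attribute__((constructor)) probe_cpu_features(void) {
    avx512f_cached = detect_avx512f();
}

/* Cheap check for code deciding whether to take the AVX-512 path. */
bool have_avx512f(void) { return avx512f_cached; }
```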
For AVX-512 on something like Alder Lake, I think you'd need a system call that essentially asks "do I have AVX-512?". The kernel could answer yes or no (it might say no even if some cores have AVX-512), but if it says yes, it promises not to schedule you on any core without AVX-512 until you are done. Hopefully this could be implemented without an actual transition to kernel mode, by setting a per-thread flag that the scheduler looks at when needed.

Even this (fairly complicated) mechanism has problems, because apps might not tell the kernel when they are done, either because they forget or because the kernel granted them the fancy instructions and they don't want to give the core back. This would be a particular problem on laptops, where I'd expect the kernel to want to move everyone off the P-cores so they can be powered off completely. Unfortunately, once the app has started doing its fancy AVX-512 work, the kernel can't unilaterally revoke the permission: even if it handled the illegal-instruction faults after moving the thread to an E-core, it can't go back in time and make the process take the non-AVX branch. So you might get situations (a little like with switchable graphics) where long-running apps ask for AVX-512, never tell the kernel they are done with it, and cause dramatic reductions in battery life for no reason.
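To make the proposed protocol concrete, here is a toy user-space model of it. Everything in it is hypothetical: no such system call exists, and `avx512_request`, `avx512_release`, `pick_core` and the per-thread flag are invented names. It shows the scheduler honoring the reservation, and also why a thread that never calls `avx512_release` keeps a P-core in use.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a hybrid CPU: which cores implement AVX-512. */
struct core {
    int id;
    bool has_avx512;
    bool powered;
};

/* Hypothetical per-thread flag the scheduler could read cheaply. */
struct thread {
    bool avx512_reserved;
};

/* "Do I have AVX-512?": grant only if some powered core implements it. */
bool avx512_request(struct thread *t, const struct core *cores, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (cores[i].powered && cores[i].has_avx512) {
            t->avx512_reserved = true;   /* promise: AVX-512 cores only */
            return true;
        }
    }
    return false;
}

/* The app says it is done; without this call the promise never ends. */
void avx512_release(struct thread *t) { t->avx512_reserved = false; }

/* Scheduler side: a reserved thread may only land on AVX-512 cores.
   Returns the chosen core id, or -1 if no eligible core is powered. */
int pick_core(const struct thread *t, const struct core *cores, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (!cores[i].powered)
            continue;
        if (t->avx512_reserved && !cores[i].has_avx512)
            continue;
        return cores[i].id;
    }
    return -1;
}
```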
I suppose the kernel _could_ forcibly reschedule the process by somehow implementing a software version of the AVX-512 instructions; that way you'd just get somewhat extreme slowness.
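The arithmetic half of such an emulator is the easy part; much of the work (and of the slowness) would be in trapping the invalid-opcode fault and decoding the EVEX-encoded instruction, which this sketch leaves out entirely. As an illustration, here is a scalar C version of what `VPADDD zmm1{k1}, zmm2, zmm3` with merge-masking computes; the `zmm_t` type and `emulate_vpaddd_masked` name are made up.

```c
#include <stdint.h>

/* A 512-bit ZMM register viewed as 16 lanes of 32-bit integers. */
typedef struct {
    uint32_t lane[16];
} zmm_t;

/* Scalar emulation of VPADDD zmm1{k1}, zmm2, zmm3 with merge-masking:
   lanes whose mask bit is set get a + b (wrapping modulo 2^32),
   all other lanes keep the destination's previous value. */
void emulate_vpaddd_masked(zmm_t *dst, const zmm_t *a, const zmm_t *b,
                           uint16_t k) {
    for (int j = 0; j < 16; j++)
        if (k & (1u << j))
            dst->lane[j] = a->lane[j] + b->lane[j];
}
```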
Another option would be for Intel themselves to implement such software versions of the AVX-512 instructions, and to use their execution as an input to their new hardware scheduler thingy, as a hint that the thread should perhaps be scheduled on a P-core.
Posted Nov 11, 2021 10:05 UTC (Thu) by wtarreau (subscriber, #51152)
Posted Nov 11, 2021 18:07 UTC (Thu) by anton (subscriber, #25547)
I think that AVX-512 support on heterogeneous CPUs where some cores don't support AVX-512 is not a big problem. There are several ways to deal with the situation. Sure, for each of them you can come up with a scenario where you would prefer a different result, but even in those scenarios the disadvantage of the less-preferred result is not that big, and certainly not worse than outright disabling AVX-512 or outright disabling the E-cores.
E.g., if you automatically reduce a thread's CPU list to the P-cores once it uses an AVX-512 instruction, the worst case is that the E-cores won't be used. I guess many threads don't use AVX-512, so there is enough work left for the E-cores. As for memcpy() and friends, the code that selects the actual routine could be made more CPU-specific (rather than just checking the AVX-512 flags in cpuinfo).
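From user space, the effect of restricting a thread to the P-cores can be had with `sched_setaffinity(2)`, the same call `taskset` uses. A minimal Linux sketch, assuming the caller already knows which CPU numbers are P-cores (discovering that is outside this example); `restrict_to_cpus` is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>

/* Restrict the calling thread to the given CPUs (e.g. the P-cores).
   The CPU numbers come from the caller; this sketch does not work out
   which cores are P-cores. Returns 0 on success, -1 on error (errno set). */
int restrict_to_cpus(const int *cpus, size_t n) {
    cpu_set_t set;
    CPU_ZERO(&set);
    for (size_t i = 0; i < n; i++)
        CPU_SET(cpus[i], &set);
    /* pid 0 means "the calling thread" for sched_setaffinity on Linux. */
    return sched_setaffinity(0, sizeof(set), &set);
}
```

A kernel-side version would apply the same restriction automatically on the first AVX-512 instruction instead of relying on the application to ask.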
Alternatively, only report the AVX-512 flags to threads whose CPU list is limited to the cores that have AVX-512, so ordinary threads won't get AVX-512. Given that relatively little code makes significant use of AVX-512, it's not a big problem that the user then has to run such code with taskset or some such.
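The "report AVX-512 only when the CPU list allows it" policy boils down to a subset test on the affinity mask. A sketch with glibc's `cpu_set_t` macros, assuming a hypothetical `avx512_cpus` set that describes which CPUs implement AVX-512 and is supplied by the caller; `should_report_avx512` is an invented name.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

/* Hypothetical policy: advertise AVX-512 to a thread only if every CPU
   in its affinity mask implements AVX-512 (allowed is a subset of
   avx512_cpus). */
bool should_report_avx512(const cpu_set_t *allowed,
                          const cpu_set_t *avx512_cpus) {
    cpu_set_t both;
    CPU_AND(&both, allowed, avx512_cpus);  /* allowed CPUs that have AVX-512 */
    return CPU_EQUAL(&both, allowed);      /* no allowed CPU lacks it */
}
```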
In any case, in order to have such problems at all, we need CPUs that enable AVX-512 at the same time as E-cores. From what I read, Intel wanted to give us no AVX-512 at all, and board manufacturers give us either AVX-512 or E-cores, but not both.
Posted Nov 12, 2021 11:44 UTC (Fri) by wtarreau (subscriber, #51152)
For memcpy(), the ideal solution would be to consider only the features common to all CPUs the task may run on, not just the one it started on. It's not that complicated after all; the most painful part is already done (unless it relies on a cpuid instruction).
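A sketch of that intersection-based dispatch, assuming a hypothetical per-CPU table of feature bits (`FEAT_AVX2` and `FEAT_AVX512F` are invented constants, and how the table is filled is left open). The three `memcpy_*` variants are placeholders that all call `memcpy()`; a real library would plug in vectorized routines. A single CPUID call cannot fill such a table, since it only reports the core it happens to run on.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical feature bits for a per-CPU capability table. */
enum { FEAT_AVX2 = 1u << 0, FEAT_AVX512F = 1u << 1 };

/* AND together the feature masks of every CPU in `allowed`.
   Assumes ncpus <= CPU_SETSIZE. Returns 0 if no CPU is allowed. */
uint32_t common_features(const cpu_set_t *allowed, const uint32_t *cpu_feats,
                         int ncpus) {
    uint32_t acc = UINT32_MAX;
    int seen = 0;
    for (int c = 0; c < ncpus; c++) {
        if (CPU_ISSET(c, allowed)) {
            acc &= cpu_feats[c];
            seen = 1;
        }
    }
    return seen ? acc : 0;
}

typedef void *(*memcpy_fn)(void *, const void *, size_t);

/* Placeholders standing in for SIMD-specific implementations. */
static void *memcpy_generic(void *d, const void *s, size_t n) { return memcpy(d, s, n); }
static void *memcpy_avx2(void *d, const void *s, size_t n)    { return memcpy(d, s, n); }
static void *memcpy_avx512(void *d, const void *s, size_t n)  { return memcpy(d, s, n); }

/* Pick the widest variant that every allowed CPU supports. */
memcpy_fn select_memcpy(uint32_t feats) {
    if (feats & FEAT_AVX512F)
        return memcpy_avx512;
    if (feats & FEAT_AVX2)
        return memcpy_avx2;
    return memcpy_generic;
}
```

If the task's affinity later widens to include a CPU without AVX-512, the selection would have to be redone, which is the trade-off of binding the choice to the CPU list.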
AVX slowdown, and even AVX-512 slowdown, does not seem to be bad on recent Intel CPUs.