
NOHZ_FULL, isolated CPUs and reading CPU MSR

Posted Apr 4, 2020 1:57 UTC (Sat) by vstinner (subscriber, #42675)
Parent article: Frequency-invariant utilization tracking for x86

Using NOHZ_FULL with isolated CPUs reduces system jitter when running benchmarks. But it is incompatible with CPU frequency drivers that rely on the scheduler callback to read CPU MSRs frequently, at each scheduler interrupt.

If an isolated CPU never gets the scheduler interrupt, its workload is ignored when the P-state of the CPU is decided. As a consequence, the performance of the isolated CPUs depends only on the workload of the non-isolated CPUs. In practice, a benchmark can suddenly become 2x faster or slower...
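A minimal sketch for checking whether a machine is in the configuration described above. It assumes the kernel exposes the CPU lists in `/sys/devices/system/cpu/isolated` and `/sys/devices/system/cpu/nohz_full`, and the active driver in `cpu0/cpufreq/scaling_driver`. The helper names are made up for illustration.

```python
from pathlib import Path

# Assumption: standard sysfs location of the per-CPU control files.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "1-3,5" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means "no CPUs"
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpu_list(name, root=SYSFS_CPU):
    """Read a CPU list file; a missing or unparsable file counts as empty."""
    try:
        return parse_cpu_list((Path(root) / name).read_text())
    except (OSError, ValueError):
        return []


def affected_cpus(root=SYSFS_CPU):
    """CPUs that are both isolated and tickless (hypothetical helper)."""
    isolated = set(read_cpu_list("isolated", root))
    tickless = set(read_cpu_list("nohz_full", root))
    return sorted(isolated & tickless)


if __name__ == "__main__":
    try:
        driver_file = SYSFS_CPU / "cpu0" / "cpufreq" / "scaling_driver"
        driver = driver_file.read_text().strip()
    except OSError:
        driver = "unknown"
    print("cpufreq driver:", driver)
    print("isolated + nohz_full CPUs:", affected_cpus())
```

If the driver is intel_pstate and the second list is not empty, the machine matches the setup in which the problem was observed. The script only reports the configuration; it does not detect the frequency swings themselves.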

How I found this issue in practice: https://vstinner.github.io/intel-cpus-part2.html

The maintainer of the intel_pstate driver just told me that he never tested isolated CPUs with NOHZ_FULL. Kernel realtime developers told me that NOHZ_FULL cannot work with intel_pstate by design.

Workaround: don't use NOHZ_FULL, or use a fixed CPU frequency.
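One way to approximate a fixed frequency is to set the cpufreq minimum and maximum of a CPU to the same value. The sketch below assumes the generic cpufreq files `scaling_min_freq` and `scaling_max_freq` (values in kHz) under `/sys/devices/system/cpu/cpuN/cpufreq/`, and it needs root to write them. The function names are hypothetical, and this is not necessarily how the author fixed the frequency.

```python
from pathlib import Path

# Assumption: standard sysfs location of the per-CPU cpufreq policy files.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def plan_pin_writes(cur_max_khz, target_khz):
    """Return (file name, value) pairs that pin a CPU to target_khz.

    When raising the frequency the maximum is written first, otherwise the
    minimum, so that min never exceeds max between the two writes.
    """
    value = f"{target_khz}\n"
    names = ["scaling_max_freq", "scaling_min_freq"]
    if target_khz < cur_max_khz:
        names.reverse()
    return [(name, value) for name in names]


def pin_cpu(cpu, target_khz, root=SYSFS_CPU):
    """Pin one CPU's cpufreq range to a single frequency (needs root)."""
    policy = Path(root) / f"cpu{cpu}" / "cpufreq"
    cur_max = int((policy / "scaling_max_freq").read_text())
    for name, value in plan_pin_writes(cur_max, target_khz):
        (policy / name).write_text(value)
```

For example, `pin_cpu(2, 2_000_000)` would request 2 GHz on CPU 2. With intel_pstate the effective frequency may still vary with Turbo Boost, so disabling turbo (for instance through `/sys/devices/system/cpu/intel_pstate/no_turbo`) is probably needed as well for stable benchmark results.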



NOHZ_FULL, isolated CPUs and reading CPU MSR

Posted Apr 4, 2020 18:42 UTC (Sat) by jcm (subscriber, #18262)

...Or have another CPU core provide the information from the OS about the isolated cores. I'll be pushing some spec updates for CPPC, etc. that will allow for this scenario in the coming months.

NOHZ_FULL, isolated CPUs and reading CPU MSR

Posted Apr 6, 2020 4:06 UTC (Mon) by florianfainelli (subscriber, #61952)

I should really read up on CPPC, but in principle what vstinner describes is what we have encountered on systems that use TrustZone, where the trusted OS mandates a specific CPU P-state in order to complete its duty cycle within the realtime deadline imposed on it. In that case, though, the trusted OS "wins" outright, because the overall P-state decision is under the control of an EL3 monitor, which could be "lying" to Linux about the actual CPU cluster frequency. Our systems are multi-core (Cortex-A53), but the whole cluster has to be on the same frequency and voltage point at any given time.

NOHZ_FULL, isolated CPUs and reading CPU MSR

Posted Apr 8, 2020 14:20 UTC (Wed) by nix (subscriber, #2304)

> Kernel realtime developers told me that NOHZ_FULL cannot work with intel_pstate by design.

Really? This configuration is the common case for every distro kernel I checked. Sounds like we need better communication somewhere...

NOHZ_FULL, isolated CPUs and reading CPU MSR

Posted Apr 8, 2020 22:26 UTC (Wed) by vstinner (subscriber, #42675)

> Kernel realtime developers told me that NOHZ_FULL cannot work with intel_pstate by design.

Sorry, my sentence was wrong: the issue is not NOHZ_FULL alone, but NOHZ_FULL combined with isolated CPUs. My understanding is that intel_pstate is not compatible with isolated CPUs using NOHZ_FULL.

NOHZ_FULL, isolated CPUs and reading CPU MSR

Posted Apr 9, 2020 0:44 UTC (Thu) by nix (subscriber, #2304)

Oh right, that makes a lot more sense and explains why this hasn't caused more trouble (isolated CPUs are an exceedingly rare use case in the sort of generalist domains where enterprise kernels are used).

NOHZ_FULL, isolated CPUs and reading CPU MSR

Posted Apr 18, 2020 18:20 UTC (Sat) by zlynx (guest, #2285)

I'm not too sure about "exceedingly rare", because several enthusiast forums I read advise using isolated CPUs and NOHZ_FULL, and then explicitly assigning CPUs to KVM virtual machines, in order to get the very best Windows virtual machine performance.

Of course they also set performance to maximum, so this wouldn't affect power and frequency management.

Anyway, I believe this is more common than you may think.

NOHZ_FULL, isolated CPUs and reading CPU MSR

Posted Jun 2, 2020 1:36 UTC (Tue) by nix (subscriber, #2304)

Yeah: also enthusiast forums and enterprise kernels on stodgy old stability-first enterprise distros seem like things that won't be mixing very often. :)

(What they are presumably actually looking for here is CPU affinity with QEMU to try to keep a roughly 1:1 mapping between QEMU vCPU cores and real CPU cores. There were patches to do it inside QEMU but they never made it upstream and eventually bitrotted: it looks like libvirt does it from outside QEMU by brute force and cgroups.)
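A rough sketch of what such a 1:1 pinning could look like from outside QEMU, using plain CPU affinity rather than cgroups. It assumes the vCPU thread IDs are already known (how they are obtained is left out) and relies on `os.sched_setaffinity()`, which on Linux also accepts a thread ID. The function names are hypothetical, and this is not what libvirt actually does.

```python
import os


def one_to_one(vcpu_tids, host_cpus):
    """Pair each vCPU thread ID with exactly one host CPU, in order."""
    if len(vcpu_tids) > len(host_cpus):
        raise ValueError("not enough host CPUs for a 1:1 mapping")
    return dict(zip(vcpu_tids, host_cpus))


def pin_threads(mapping):
    """Restrict each thread to its single assigned CPU (Linux only)."""
    for tid, cpu in mapping.items():
        # The affinity mask is a set holding just one CPU number.
        os.sched_setaffinity(tid, {cpu})
```

With made-up thread IDs, `pin_threads(one_to_one([4101, 4102], [2, 3]))` would keep the first vCPU thread on CPU 2 and the second on CPU 3. Pinning alone does not keep other tasks off those CPUs; that part is what CPU isolation is for.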


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds