Frequency-invariant utilization tracking for x86

Posted Apr 3, 2020 8:24 UTC (Fri) by marcH (subscriber, #57642)
Parent article: Frequency-invariant utilization tracking for x86

> Reading those MSRs is relatively expensive, so this calculation cannot be made often, but once per clock tick (every 1-10ms) turns out to be enough.
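
For readers who have not seen the article, the per-tick calculation can be sketched roughly as below. This is only an illustration, not the kernel's code: it assumes the MSRs in question are APERF and MPERF, and the function name, counter deltas, and frequencies are all made up for the example.

```python
def freq_scale(delta_aperf, delta_mperf, base_khz, max_khz):
    """Return a frequency scale factor in [0, 1] for one tick interval.

    delta_aperf / delta_mperf approximates (average frequency / base
    frequency) while the CPU was running; multiplying by base/max
    expresses it relative to the highest frequency instead.
    """
    if delta_mperf == 0:
        # Assumption for this sketch: with no data, apply no scaling.
        return 1.0
    ratio = (delta_aperf / delta_mperf) * (base_khz / max_khz)
    # Cap at 1.0 so utilization is never scaled above full capacity.
    return min(ratio, 1.0)


# Hypothetical tick: the CPU ran at half its base frequency,
# and base is half of the maximum frequency.
scale = freq_scale(delta_aperf=1_500_000, delta_mperf=3_000_000,
                   base_khz=2_000_000, max_khz=4_000_000)
print(scale)
```

With these numbers, time spent running during that tick would count for a quarter of what it would count for at the maximum frequency.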

intel_pstate is confusingly two very different things in one: a mode that uses Hardware P-states (HWP) and a mode that does not. I think the main advantage claimed for HWP (a.k.a. "Speed Shift") is lower latency and better efficiency thanks to, among other things, very frequent load sampling, much more frequent than anything software could achieve.

It would have been interesting to prove that software can do better than a "hardware accelerated governor". Unfortunately, the benchmarks seem to treat HWP and software intel_pstate as if they were minor variants of the same thing...

I guess these comparisons can always be done later; it doesn't sound like this series removes anything. No big deal.

Frequency-invariant utilization tracking for x86

Posted Apr 3, 2020 10:13 UTC (Fri) by jan.kara (subscriber, #59161)

It depends on what you mean by "software can do better than a hardware accelerated governor". For example, for IO-bound workloads we have found some cases where HWP was worse than intel_pstate because it never considered the CPU load to be high enough to bump up the frequency, and so interrupt latency suffered...

Frequency-invariant utilization tracking for x86

Posted Apr 3, 2020 17:07 UTC (Fri) by marcH (subscriber, #57642)

Fascinating, thanks!

Considering that scheduling and frequency governing are all about predicting the future, it makes sense that a stream of randomly spaced packets is one of the toughest nuts to crack.

There are a gazillion throughput benchmarks; we really need more latency benchmarks, especially for something advertised as "Speed Shift".

I googled "Kolivas" for 5 seconds and instantly found this:
https://lwn.net/Articles/720227/
> The MuQSS scheduler has reportedly better Interbench benchmark scores than CFS. However, ultimately, it is hard to quantify "smoothness" and "responsiveness" and turn them into an automated benchmark, so the best way for interested users to evaluate MuQSS is to try it out themselves.

At $DAYJOB I've seen test reports bragging about video conferences "scoring" a 59.7 FPS average over 1h, much better than the previous 57.9 FPS average. As if the user cared. Zero consideration for freezes, drops, out-of-sync audio...

https://bravenewgeek.com/everything-you-know-about-latenc...
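
To make the FPS point above concrete, here is a small sketch of how an average can hide freezes. The frame times, the 50 ms "freeze" threshold, and the function name are invented for illustration; they are not taken from any real test report.

```python
def summarize(frame_times_ms, freeze_threshold_ms=50.0):
    """Return (average FPS, worst frame time in ms, number of freezes)."""
    total_seconds = sum(frame_times_ms) / 1000.0
    avg_fps = len(frame_times_ms) / total_seconds
    worst_ms = max(frame_times_ms)
    # A "freeze" here is any frame that took longer than the threshold.
    freezes = sum(1 for t in frame_times_ms if t > freeze_threshold_ms)
    return avg_fps, worst_ms, freezes


# Run A: perfectly steady 17 ms frames.
smooth = [17.0] * 1000
# Run B: slightly faster frames most of the time, plus ten 100 ms stalls.
choppy = [16.0] * 990 + [100.0] * 10

for name, run in (("smooth", smooth), ("choppy", choppy)):
    avg_fps, worst_ms, freezes = summarize(run)
    print(f"{name}: {avg_fps:.1f} FPS average, "
          f"worst frame {worst_ms:.0f} ms, {freezes} freezes")
```

In this made-up data the choppy run "scores" a higher average FPS than the smooth one, even though it is the only run with visible stalls. That is why a worst-case or tail figure seems more useful to report than the mean alone.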