Kernel support for processor undervolting

November 2, 2020

This article was contributed by Marta Rybczyńska

Overclocking the processor (running it above its specified maximum frequency to increase performance) is a familiar operation for many readers. Sometimes, however, it is necessary to go in the other direction and reduce a processor's power consumption by lowering its voltage to avoid overheating. Recently, Jason Donenfeld submitted a short patch removing a warning emitted by the kernel when user space writes to the special processor registers that allow this "undervolting" on x86 processors. The patch led to a long discussion that might result in a kernel interface allowing users to safely control their processor's voltage.

Voltage, frequency, and undervolting

Current processors can run with any of a number of combinations of frequency and voltage, which can change dynamically in a process called dynamic frequency scaling. Different combinations of frequency and voltage will naturally vary in terms of both the number of instructions executed per second and power consumption. It is possible to place a CPU into a configuration outside of its specified operational envelope; when this is done, the processor may malfunction in a number of ways, from occasional false results from some instructions to a complete crash.

For some users, lowering the operating voltage is a necessity. Their chips, especially recent Intel laptop models, can overheat while running under high load, for example when compiling a kernel. One solution is to undervolt the processor, making it run at a lower voltage to decrease power consumption (and thus heat generation). As the frequency does not change, the performance of the system stays about the same. Fortunately for those users, tools like intel-undervolt exist to help them in this task. However, they face two difficulties: the values to use are undocumented and vary from one processor to the next, and the kernel prints a worrisome warning every time the tool changes the configuration.
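To see why a small voltage change can matter so much, here is a minimal sketch using the common first-order model for CMOS dynamic power, in which power is proportional to capacitance times voltage squared times frequency. The model, the function name, and the 10% figure are illustrative assumptions rather than anything stated in the discussion; real processors also have static (leakage) power that this ignores.

```python
def dynamic_power_ratio(voltage_ratio: float, frequency_ratio: float = 1.0) -> float:
    """Return new/old dynamic power under the first-order P ~ C * V^2 * f model.

    Both arguments are ratios of new to old values; capacitance is assumed
    to be unchanged, so it cancels out.
    """
    return voltage_ratio ** 2 * frequency_ratio


# Hypothetical example: a 10% voltage reduction at an unchanged frequency.
ratio = dynamic_power_ratio(0.90)
print(f"dynamic power falls to about {ratio:.0%} of its original value")
```

Under this simplified model the savings grow with the square of the voltage reduction, which is consistent with undervolting lowering heat output without touching the clock frequency.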

In the case of Intel chips, the voltage settings are controlled by Model Specific Registers (MSRs), which do not just serve to change the voltage, as MSRs are an interface to many processor settings. On Linux, access to the MSRs from user space is possible using the /dev/cpu/N/msr special files, where N is the number of the CPU to address. Write access can be disabled, however, via the msr.allow_writes boot-time option or if the kernel is running in lockdown mode. Within the kernel, MSR access requires specific processor instructions and is handled by the msr platform-specific driver. This driver emits a warning when an attempt is made to write to an MSR that is not explicitly listed as being safe to change; it still allows the write to happen, however, if writes are enabled in general.
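As an illustration of how user space talks to the msr driver, the following sketch reads a single register. With this driver, the file offset selects the register number and each access transfers eight bytes. The function name and its parameters are hypothetical; the read needs root privileges and a loaded msr module, and the sketch deliberately performs no writes.

```python
import os
import struct
from typing import Optional


def read_msr(register: int, cpu: int = 0, path: Optional[str] = None) -> int:
    """Read one 64-bit MSR value through the msr character device.

    The file offset is the register number; the driver returns eight bytes.
    The optional path argument is an illustrative hook for pointing the
    function at another file, for example in a test.
    """
    if path is None:
        path = f"/dev/cpu/{cpu}/msr"
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, register)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read from {path} at offset {register:#x}")
    # x86 is little-endian, so the value is unpacked as a little-endian u64.
    return struct.unpack("<Q", data)[0]


# Example (requires root): read the time-stamp counter MSR on CPU 0.
# print(hex(read_msr(0x10)))
```

Tools that undervolt use the same device for writing, which is the operation that triggers the kernel warning described above.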

Donenfeld's patch silences that warning by adding an entry to the list of safe MSRs. That entry, named MSR_IA32_OC_MAILBOX by the patch, allows changing the processor voltage; it is the register used by intel-undervolt and other similar tools. Interested readers can refer to a background paper on how those registers are configured. Apparently, this work is based on partial documentation and a significant amount of reverse engineering with trial and error.
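For a sense of what such tools write, here is a sketch of the value encoding as described by community reverse-engineering efforts. Everything in it is an assumption, not Intel documentation: the register number 0x150, the command layout, the plane numbering, and the 1/1.024 mV step size all come from that unofficial work and may not hold for every processor. The sketch only computes a value and never writes it; writing an unsuitable value can crash the machine.

```python
# Register number used by undervolting tools (assumption from
# reverse-engineered sources; the article gives only the name).
MSR_OC_MAILBOX = 0x150


def encode_voltage_offset(millivolts: float, plane: int = 0) -> int:
    """Build the 64-bit mailbox value for a voltage offset (unofficial layout).

    Assumed layout: bit 63 set, plane index in bits 40-43, write command
    0x11 in bits 32-39, and the offset in steps of 1/1.024 mV as a
    two's-complement field in bits 21-31.
    """
    if not 0 <= plane <= 0xF:
        raise ValueError("plane index must fit in four bits")
    steps = round(millivolts * 1.024)
    offset_field = 0xFFE00000 & ((steps & 0xFFF) << 21)
    command = 0x8000001100000000 | (plane << 40)
    return command | offset_field


# A -100 mV offset on plane 0 (commonly reported to be the CPU core plane).
print(hex(encode_voltage_offset(-100)))
```

The need to know details like these, none of which are officially documented, is part of what made the warning-removal patch contentious.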

Undervolting as an essential feature

Donenfeld's patch sparked a discussion about why direct access to MSRs from user space is necessary. Borislav Petkov suggested that it would be better to provide controlled access to specific registers via sysfs and remove the ability to write directly to registers. He later went further, suggesting disabling user-space access to MSRs altogether by default. That provoked a number of reactions from users who feel that this capability is essential. Donenfeld explained that his system requires undervolting to remain usable and there are many other users in the same situation:

Well that's not cool. And it's sure to really upset the fairly sizable crowd of people who rely on undervolting and related things to make their laptops remotely usable, especially in light of the crazy thermal designs for late-era 14nm intel cpus. [...] I know that my laptop, at least, would suffer.

Another example came from Sultan Alsawaf, who described his experiences with a number of laptop processors. Undervolting is necessary on all of them when performing tasks like compiling the kernel; it results in a 22-30% power use reduction and improved performance. "I'd like to point out that on Intel's recent 14nm parts, undervolting is not so much for squeezing every last drop of performance out of the SoC as it is for necessity", he said. Petkov acknowledged this use case, saying that it should be better supported: "Sounds to me that this undervolting functionality should be part of the kernel and happen automatically". Donenfeld noted that doing it automatically could be hard, though, since the correct value varies from one chip to the next depending on the "silicon lottery".

If this functionality is to be properly supported by the kernel, there are some other questions to answer as well. Donenfeld asked where the right place to do such operations is: whether it belongs in the kernel or user space. Petkov then responded strongly in favor of the creation of "a proper interface" in the kernel. He also mentioned the in-tree x86_energy_perf_policy tool that uses a different MSR; that MSR too, he said, can be taken off the allowlist once a real kernel interface to that functionality exists. Donenfeld agreed with this goal, but said it might be hard to achieve in practice because the MSRs are not all publicly documented and differ in their semantics.

Srinivas Pandruvada, maintainer of Intel power-related drivers, responded that overclocking (along with undervolting, presumably) is not an architectural interface. There is also no public documentation of the commands to be passed to this specific MSR. He promised to look for that documentation internally. A proper sysfs interface, he said, would have to perform checks of the passed values to prevent users from crashing their systems.
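The kind of check Pandruvada describes could look roughly like the sketch below, written in Python for brevity even though a real implementation would live in kernel C. The limits and names are entirely hypothetical; as the discussion makes clear, nobody has published safe bounds, and they would differ from chip to chip.

```python
# Hypothetical bounds; real limits are undocumented and vary per chip.
MIN_OFFSET_MV = -150
MAX_OFFSET_MV = 0


def validate_offset_mv(requested: int) -> int:
    """Reject offsets outside the allowed window instead of passing them on."""
    if not MIN_OFFSET_MV <= requested <= MAX_OFFSET_MV:
        raise ValueError(
            f"offset {requested} mV outside [{MIN_OFFSET_MV}, {MAX_OFFSET_MV}] mV"
        )
    return requested


print(validate_offset_mv(-80))
```

Rejecting out-of-range requests, rather than silently clamping them, keeps the interface predictable for the tools that would sit on top of it.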

Toward a solution

At that point, Andy Lutomirski, maintainer of many x86-related subsystems, commented that MSR access and undervolting are two separate topics. According to him, MSR access should be allowed (with warnings emitted) only if restrictions are off, but the undervolting feature should be supported by the kernel. He did point out a potential problem with lockdown, though, noting that this feature could destabilize the system and perhaps enable privilege escalation. He proposed a separate lockdown bit for this feature. Matthew Garrett pointed out the Plundervolt attack, which allows the corruption of Software Guard Extensions (SGX) enclaves using undervolting. He also noted that a sysfs interface would allow adding an SELinux or AppArmor rule and thus protect the interface if needed.

About then, Pandruvada returned with the answers from Intel. It turns out that the correct values come from experimentation and Intel's guide warns about possible stability issues. There is kernel code that uses the MSR in question now (the intel_turbo_max_3 driver), so the operation of that MSR is public, but there is no way to validate the commands written to it, he said.

The discussion about where to put the functionality continued for some time until Dave Hansen proposed that Intel developers look into the documentation of the MSRs of as many models as possible and create a separate driver, perhaps for only one model at first. Petkov agreed, and the discussion stopped at that point.

Kernel developers have thus come to an agreement that the undervolting feature is essential for some users, who require it to keep their CPU in reasonable thermal conditions. The path toward providing this feature has also been laid out. One blocking point may be the lack of official documentation, but it looks like there is a will from Intel to solve this problem. The work still needs to be done, but we can hope that the new interface is going to appear soon.

Kernel support for processor undervolting

Posted Nov 2, 2020 18:08 UTC (Mon) by darwi (subscriber, #131202)

> I have an i9-9880H laptop which, at 4000 MHz on all CPUs while compiling a kernel, uses 107W with the stock configuration. Upon undervolting, this figure goes down to 88W, a ~22% reduction in power consumption at the maximum pstate. The point of this is not primarily to reduce the power footprint of my machine, but to get the thermals of this CPU under control. Without undervolting, my i9-9880H is quite useless, and quickly reaches 100 degC with the fans at max speed when placed under load.

> Donenfeld noted that doing it automatically could be hard, though, since the correct value varies from one chip to the next depending on the "silicon lottery".

TL;DR: late-era 14nm Intel CPUs, or at least the manufacturers putting them inside laptops without proper thermal packaging, suck.

Kernel support for processor undervolting

Posted Nov 3, 2020 12:22 UTC (Tue) by WolfWings (subscriber, #56790)

It's more that there is no proper thermal packaging for these chips at all. Their TDP ratings are almost all averages under some unspecified benchmarks instead of true peak TDPs. So for average browsing? The thermals are fine. But if you actually try to load down the CPU fully (and who would buy a top-flight CPU in a laptop unless they're going to load it down fully) the "rated" TDP gets defenestrated with extreme prejudice.

Some generations of Surface laptops are infamous for this: You get WORSE performance if you got the models with "upgraded" CPUs because the higher-end CPUs hit thermal throttling near-instantly under full load.

Kernel support for processor undervolting

Posted Nov 2, 2020 19:10 UTC (Mon) by intelfx (subscriber, #130118)

Is undervolting still relevant at all in light of Plundervolt, which is commonly “mitigated” by unconditionally disabling undervolting at the firmware level?

I assumed Plundervolt killed undervolting, at least en masse. Is this not true? Or are we talking strictly about old generations of laptops, carefully kept off the relevant firmware patches by a handful of enthusiasts?

Kernel support for processor undervolting

Posted Nov 3, 2020 12:24 UTC (Tue) by WolfWings (subscriber, #56790)

Plundervolt can be entirely ignored if you're not using SGX. It's only a way to circumvent the SGX mechanisms. Hint: You're very likely not using SGX.

Kernel support for processor undervolting

Posted Nov 3, 2020 13:37 UTC (Tue) by intelfx (subscriber, #130118)

Yes, but tell that to laptop vendors.

Kernel support for processor undervolting

Posted Nov 2, 2020 23:28 UTC (Mon) by flussence (guest, #85566)

I would greatly appreciate this on an old AMD (btver1) I have, so that I don't have to run it permanently underclocked. Chances of anything AMD getting decent Linux support these days seem to be nil though.

Kernel support for processor undervolting

Posted Nov 3, 2020 3:56 UTC (Tue) by magila (guest, #49627)

The proper (at least if you ask Intel) way to deal with an overheating processor is to lower the power limit settings to a level that the cooling solution can handle. You give up some frequency doing this of course, but you'll still be operating within the envelope which has been validated at the factory.

While undervolting doesn't carry the risk of degrading the silicon like overclocking, it does still involve operating outside of what the manufacturer has validated as stable. Most overclockers don't do adequate validation that their overclocks are stable (firing up Prime95 or IntelBurnTest is often enough to crash their systems) and I doubt these undervolters are either. There's a reason why Intel treats messing with these MSRs as "overclocking" regardless and doesn't support it.

Kernel support for processor undervolting

Posted Nov 3, 2020 11:37 UTC (Tue) by leromarinvit (subscriber, #56850)

While operating an electronic component out of spec is certainly something no manufacturer would ever support (so don't complain to Intel if your CPU crashes or produces incorrect results when undervolted), I don't think the kernel should actively try to prevent it. If it's *my* system, then *I* want to decide how I use it and what risks I'm willing to take. Perhaps a taint flag would be adequate?

> Most overclockers don't do adequate validation that their overclocks are stable (firing up Prime95 or IntelBurnTest is often enough to crash their systems) and I doubt these undervolters are either.

People who overclock or undervolt without adequate testing are idiots, probably looking for nothing more than (dubious) bragging rights. Of course idiots exist, but at the same time there's no shortage of people painstakingly documenting their (often very thorough) testing procedures when performing such experiments. I know I spent many hours running Prime95 and other tools when I undervolted my Pentium M laptop some 15 years or so ago, and (after finding the lowest stable voltage) I never had any problems whatsoever with it.

Obviously I didn't test at the extremes of the manufacturer supported environmental conditions, but for non-critical usage, "works for me" is fine by me. And for anything critical, relying on a single implementation of a single algorithm running on a single machine (an out of spec one at that) is stupid anyway.

Kernel support for processor undervolting

Posted Nov 3, 2020 12:37 UTC (Tue) by Wol (subscriber, #4433)

> While operating an electronic component out of spec is certainly something no manufacturer would ever support (so don't complain to Intel if your CPU crashes or produces incorrect results when undervolted)

The problem is, AS SUPPLIED TO THE CUSTOMER, normal operation results in the processor going out of spec! If the processor cannot be used within spec without problems, then the customer is bound to be upset and want to do something about it.

Cheers,
Wol

Kernel support for processor undervolting

Posted Nov 3, 2020 15:53 UTC (Tue) by luto (guest, #39314)

> I don't think the kernel should actively try to prevent it. If it's *my* system, then *I* want to decide how I use it and what risks I'm willing to take. Perhaps a taint flag would be adequate?

The kernel doesn’t particularly care if you run your CPU out of spec, although a taint flag might make sense. The kernel cares about using the wildly unsafe userspace MSR interface to do so. As it stands, poking at the MSRs used for undervolting may race against other uses of those MSRs, resulting in unpredictable behavior.

Kernel support for processor undervolting

Posted Nov 4, 2020 6:42 UTC (Wed) by fulke (guest, #140430)

Agreed. The proper solution should be setting the PL1/PL2 values to limit power usage. Of course the CPU is then downclocked, but that's the trade-off. (Blame Intel's process or the manufacturer's poor thermal design.)

BTW support for undervolting is great! It's a hacky feature, like overclocking, rather than a proper solution.

Kernel support for processor undervolting

Posted Nov 15, 2020 17:37 UTC (Sun) by roblucid (guest, #48964)

That's an extremely unfair characterisation; dies vary, and undervolting is merely reducing the voltage below the worst case required for the SKU to pass validation testing.
If the die functions correctly, it is wasting less power and running cooler.
It is the same binning that manufacturers do to provide the power-efficient SKUs they sell operating at lower voltage.

Kernel support for processor undervolting

Posted Nov 9, 2020 16:01 UTC (Mon) by abufrejoval (guest, #100159)

And I am doing this for i7 NUC8 and NUC10 variants, which have fine grained controls for PL1/2 and their timings in their BIOS, which I use mostly to ensure that their fans never get annoying.

The problem is that regrettably notebooks don't tend to expose these options and I'm afraid I've already overcooked a battery in an otherwise rather nice Lenovo Yoga S730 with an i7-8565U, because it's lost 30% of its capacity after a year, when that battery was practically never used.

But I had the machine run as a CentOS7/oVirt server when I wasn't on the road with it (not since the start of Covid-19), where the default power management profile is set to 'virtual host' by default. With the lid closed it did get rather warmer than it should for the sake of the battery, while the cooling fan actually stayed nicely unobtrusive.

From the sound of it, the ability to set the PL1/2 parameters and their timings via a kernel boot flag would cover a lot of ground already, while full control typically comes with plenty of opportunities for additional mistakes or loopholes.

Kernel support for processor undervolting

Posted Nov 9, 2020 20:38 UTC (Mon) by mjg59 (subscriber, #23239)

If the firmware hasn't locked them then PL1/PL2 can be modified using the intel_rapl driver and the nodes under /sys/class/powercap. If the firmware /has/ locked them, the kernel can't override them.

Kernel support for processor undervolting

Posted Nov 3, 2020 5:00 UTC (Tue) by alison (subscriber, #63752)

Indefensible:
"Srinivas Pandruvada, maintainer of Intel power-related drivers, responded that overclocking (along with undervolting, presumably) is not an architectural interface. There is also no public documentation of the commands to be passed to this specific MSR. He promised to look for that documentation internally."

In a sane world, the processor-specific MSR registers and their interface would be made visible via ACPI so that sysfs could be properly set up. It's hard to understand how Intel's customers let them get away with this lack of documentation.

Thanks to Marta Rybczyńska for another fine article.

Kernel support for processor undervolting

Posted Nov 3, 2020 17:54 UTC (Tue) by luto (guest, #39314)

I suspect that those of Intel’s big customers who care enough manage to get NDA docs.

Part of the problem here is that Intel takes “architectural” features quite seriously. When Intel commits to an architectural feature, they are promising to retain compatibility for a long long time. With things like the frequency and voltage control in recent CPUs, the underlying way that they’re managed changes dramatically on a regular basis. Intel probably doesn’t want to guarantee that the knob that undervolting currently turns will continue to make any sense on future CPUs.

Unfortunately, Intel seems to be pretty bad at documenting details like “this is how it works *now*; no promises that it will continue to work this way.” It’s also worth noting that documentation is expensive. For docs to be any good, not only do they need to be written, but someone needs to verify that the docs match the hardware and the software implementations. In contrast, it’s easier to hack together something that works under the exact parameters under which it’s shipped.

Skipping the engineering involved in making a real spec cuts both ways, of course. Take a look at the series of bugs and unfortunate behavior in the interactions between various flavors of AVX and thermal / power management. Building something that appears to work without exploring the nasty corners means that it could very well be possible that the corners are wrong in a way that actually matters.

Kernel support for processor undervolting

Posted Nov 3, 2020 19:55 UTC (Tue) by alison (subscriber, #63752)

> Building something that appears to work without exploring the nasty corners means that it could very well be possible that the corners are wrong in a way that actually matters.

Violently agree! Code review and unit tests are your friends. Intel would benefit too from providing this data via ACPI.

Kernel support for processor undervolting

Posted Nov 4, 2020 10:47 UTC (Wed) by nilsmeyer (guest, #122604)

It may be interesting to look at this for AVX2 and AVX512 workloads (though CPUs with AVX512 are rare), where the power draw is usually higher. I think most of those prime / CPU burn tools do not use any vector instructions.

Kernel support for processor undervolting

Posted Nov 4, 2020 20:53 UTC (Wed) by nivedita76 (subscriber, #121790)

prime95 does use AVX. Its core algorithm uses a multiplication implemented via fast Fourier transforms.