Kernel support for processor undervolting
Overclocking the processor (running it above its specified maximum frequency to increase performance) is a familiar operation for many readers. Sometimes, however, it is necessary to go in the other direction and lower a processor's operating voltage, and thus its power consumption, to avoid overheating. Recently, Jason Donenfeld submitted a short patch removing a warning emitted by the kernel when user space accesses the special processor registers that allow this "undervolting" on x86 processors. The patch prompted a long discussion that might result in a kernel interface allowing users to safely control their processor's voltage.
Voltage, frequency, and undervolting
Current processors can run with any of a number of combinations of frequency and voltage, which can change dynamically in a process called dynamic frequency scaling. Different combinations of frequency and voltage will naturally vary in terms of both the number of instructions executed per second and power consumption. It is possible to place a CPU into a configuration outside of its specified operational envelope; when this is done, the processor may malfunction in a number of ways, from occasional false results from some instructions to a complete crash.
For some users, lowering the operating voltage is a necessity. Their chips, especially recent Intel laptop models, can overheat while running under high load, for example when compiling a kernel. One solution is to undervolt the processor, making it run at a lower voltage to decrease power consumption (and thus heat generation). As the frequency does not change, the performance of the system stays about the same. Fortunately for those users, tools like intel-undervolt exist to help with this task. However, they face two difficulties: the values to use are undocumented and vary from one processor to the next, and the kernel prints a worrisome warning every time the tool changes the configuration.
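As a rough illustration of why lowering the voltage alone helps, the dynamic (switching) power of CMOS logic is commonly approximated as proportional to the square of the voltage times the frequency. The sketch below applies that textbook approximation; it ignores leakage power and is not a model of any particular chip, and the function name is made up for this example.

```python
def relative_dynamic_power(voltage_ratio: float, frequency_ratio: float = 1.0) -> float:
    """Return new/old dynamic power, assuming P is proportional to V^2 * f.

    This is the usual first-order approximation; static (leakage) power
    is ignored, so real-world savings will differ.
    """
    return voltage_ratio ** 2 * frequency_ratio


# Example: a 10% lower voltage at an unchanged frequency.
saving = 1.0 - relative_dynamic_power(0.90)
print(f"dynamic power reduced by about {saving:.0%}")
```

Under this approximation, a modest voltage reduction yields a disproportionately large power reduction while the frequency, and so the instruction throughput, stays the same.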
In the case of Intel chips, the voltage settings are controlled by Model Specific Registers (MSRs). MSRs do not just serve to change the voltage; they are an interface to many processor settings. On Linux, access to the MSRs from user space is possible using the /dev/cpu/CPUID/msr special files, where CPUID stands for the number of the CPU. Write access can be disabled, however, via the msr.allow_writes boot-time option or if the kernel is running in lockdown mode. Within the kernel, MSR access requires specific processor instructions and is handled by the platform-specific msr driver. This driver emits a warning when an attempt is made to write to an MSR that is not explicitly listed as being safe to change; it still allows the write to happen, however, if writes are enabled in general.
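A minimal sketch of reading an MSR through that device file is shown below. With the msr driver, the register number is used as the file offset and each register is transferred as eight bytes. The function name and the path_template parameter are inventions for this example; running it for real requires root privileges and a loaded msr driver.

```python
import os
import struct


def read_msr(reg: int, cpu: int = 0, path_template: str = "/dev/cpu/{cpu}/msr") -> int:
    """Read one 64-bit MSR; the register number is the file offset."""
    fd = os.open(path_template.format(cpu=cpu), os.O_RDONLY)
    try:
        data = os.pread(fd, 8, reg)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read from MSR {reg:#x}")
    # x86 is little-endian, so decode the bytes as a little-endian u64.
    return struct.unpack("<Q", data)[0]


if __name__ == "__main__":
    try:
        # 0x10 is the time-stamp counter MSR on x86.
        print(f"TSC on CPU 0: {read_msr(0x10):#x}")
    except OSError as err:
        print(f"cannot read MSR: {err}")
```

Writing works the same way with a write at the register's offset, which is the operation that triggers the warning discussed here.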
Donenfeld's patch silences that warning by adding an entry to the list of safe MSRs. That entry, named MSR_IA32_OC_MAILBOX by the patch, allows changing the processor voltage; it is the register used by intel-undervolt and other similar tools. Interested readers can refer to a background paper on how those registers are configured. Apparently, this work is based on partial documentation and a significant amount of reverse engineering with trial and error.
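The write policy described above, and the effect of the patch on it, can be modeled in a few lines. This is a toy model of the behavior as the article describes it, not the kernel's code; the function name, the set contents, and the register numbers (0x1b0 and 0x150) are assumptions made for illustration.

```python
import logging

log = logging.getLogger("msr_model")

# Illustrative register numbers (assumptions for this sketch).
ENERGY_PERF_BIAS = 0x1B0
OC_MAILBOX = 0x150


def filter_write(reg: int, writes_enabled: bool, safe_msrs: set) -> tuple:
    """Return (allowed, warned) for a user-space write to MSR `reg`."""
    if not writes_enabled:
        # Writes disabled at boot time or by lockdown: refuse outright.
        return (False, False)
    if reg not in safe_msrs:
        # Not on the safe list: complain, but let the write through.
        log.warning("write to unrecognized MSR %#x", reg)
        return (True, True)
    return (True, False)


before_patch = {ENERGY_PERF_BIAS}
after_patch = before_patch | {OC_MAILBOX}

print(filter_write(OC_MAILBOX, True, before_patch))  # allowed, with a warning
print(filter_write(OC_MAILBOX, True, after_patch))   # allowed, silently
```

In this model, the patch changes only the second element of the result; whether the write happens at all is decided elsewhere.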
Undervolting as an essential feature
Donenfeld's patch sparked a discussion about why direct access to MSRs from user space is necessary. Borislav Petkov suggested that it would be better to provide controlled access to specific registers via sysfs and remove the ability to write directly to registers. He later went further, suggesting disabling user-space access to MSRs altogether by default. That provoked a number of reactions from users who feel that this capability is essential. Donenfeld explained that his system requires undervolting to remain usable and that there are many other users in the same situation.
Another example came from Sultan Alsawaf, who described his experiences with a number of laptop processors. Undervolting is necessary on all of them when performing tasks like compiling the kernel; it results in a 22-30% power use reduction and improved performance. "I'd like to point out that on Intel's recent 14nm parts, undervolting is not so much for squeezing every last drop of performance out of the SoC as it is for necessity", he said. Petkov acknowledged this use case, saying that it should be better supported: "Sounds to me that this undervolting functionality should be part of the kernel and happen automatically". Donenfeld noted that doing it automatically could be hard, though, since the correct value varies from one chip to the next depending on the "silicon lottery".
If this functionality is to be properly supported by the kernel, there are some other questions to answer as well. Donenfeld asked where the right place to do such operations is: whether it belongs in the kernel or user space. Petkov then responded strongly in favor of the creation of "a proper interface" in the kernel. He also mentioned the in-tree x86_energy_perf_policy tool that uses a different MSR; that MSR too, he said, can be taken off the allowlist once a real kernel interface to that functionality exists. Donenfeld agreed with this goal, but said it might be hard to achieve in practice because the MSRs are not all publicly documented and differ in their semantics.
Srinivas Pandruvada, maintainer of Intel power-related drivers, responded that overclocking (along with undervolting, presumably) is not an architectural interface. There is also no public documentation of the commands to be passed to this specific MSR. He promised to look for that documentation internally. A proper sysfs interface, he said, would have to perform checks of the passed values to prevent users from crashing their systems.
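To illustrate the kind of checking Pandruvada has in mind, here is a hedged sketch of how a value written to such an interface might be validated before it reaches the hardware. Everything in it is hypothetical: no such interface exists in the discussion, the function name is made up, and the millivolt bounds are placeholders, since the safe range is undocumented and differs from chip to chip.

```python
def parse_voltage_offset(text: str, min_mv: int = -100, max_mv: int = 0) -> int:
    """Parse a voltage offset in millivolts and reject out-of-range values.

    The default bounds are placeholders for this sketch; real limits
    would have to come from per-model documentation.
    """
    try:
        value = int(text.strip(), 10)
    except ValueError:
        raise ValueError(f"not an integer: {text!r}") from None
    if not min_mv <= value <= max_mv:
        raise ValueError(f"offset {value} mV outside [{min_mv}, {max_mv}]")
    return value


print(parse_voltage_offset("-50\n"))  # accepted: within the placeholder range
```

The hard part is not the check itself but choosing the bounds, which is why the lack of documentation matters so much in this discussion.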
Toward a solution
At that point, Andy Lutomirski, maintainer of many x86-related subsystems, commented that MSR access and undervolting are two separate topics. According to him, MSR access should be allowed (with warnings emitted) only if restrictions are off, but the undervolting feature should be supported by the kernel. He did point out a potential problem with lockdown, though, noting that this feature could destabilize the system and perhaps enable privilege escalation. He proposed a separate lockdown bit for this feature. Matthew Garrett pointed out the Plundervolt attack, which allows the corruption of Software Guard Extensions (SGX) enclaves using undervolting. He also noted that a sysfs interface would allow adding an SELinux or AppArmor rule and thus protect the interface if needed.
About then, Pandruvada returned with the answers from Intel. It turns out that the correct values come from experimentation and Intel's guide warns about possible stability issues. There is kernel code that uses the MSR in question now (the intel_turbo_max_3 driver), so the operation of that MSR is public, but there is no way to validate the commands written to it, he said.
The discussion about where to put the functionality continued for some time until Dave Hansen proposed that Intel developers look into the documentation of the MSRs of as many models as possible and create a separate driver, perhaps for only one model at first. Petkov agreed, and the discussion stopped at that point.
Kernel developers have thus come to an agreement that the undervolting feature is essential for some users, who require it to keep their CPUs within reasonable thermal limits. The path toward providing this feature has also been laid out. One blocking point may be the lack of official documentation, but Intel appears willing to solve this problem. The work still needs to be done, but we can hope that the new interface will appear soon.
| Index entries for this article | |
|---|---|
| Kernel | Architectures/x86 |
| GuestArticles | Rybczynska, Marta |
