By Jonathan Corbet
December 5, 2012

The kernel's power management subsystem has become increasingly effective
over recent years, to the point that our CPU power management is said to be
second to none. But, while the kernel endeavors to minimize the power
consumed by a given workload, it lacks mechanisms to put an overall limit
on the amount of power consumed. The recently announced PowerClamp driver
by Jacob Pan and Arjan van de Ven is intended to change that situation on
Intel processors.

Most users will never want to use PowerClamp. As a general rule,
when one has purchased hardware with a given computational capability, one
wants that full capability to be available when needed. But there are
situations where it makes sense to run a system below its full speed. Data
centers have power-consumption and cooling constraints that can argue
against running all systems flat-out all the time. Even the owner of an
individual laptop or handheld system may wish to ensure that its operating
temperature does not exceed a given value; an overly hot laptop can be
uncomfortable to work with, even if it is still working within its
specified temperature range. So there can be value in telling the system
to run slower at times.

The PowerClamp driver allows the system administrator to set a desired idle
percentage by way of a sysfs attribute. That percentage is capped at 50%
in the current implementation. Once a percentage has been set, the kernel
monitors the actual idle time for each processor in the system. Should a
processor's idle time fall below the desired idle percentage, a special
kernel thread (called kidle_inject/N, where N is the number of the CPU to
which the thread is assigned) is created to take corrective action.
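
As a rough illustration of the check described above, the following Python
sketch caps a requested idle percentage at 50% and decides whether a CPU
needs corrective action. The function names are hypothetical and the logic
is a simplification for illustration; it is not taken from the driver.

```python
# Cap described in the article for the current implementation.
MAX_IDLE_PERCENT = 50


def clamp_target(requested_percent):
    """Limit a requested idle percentage to the range 0..MAX_IDLE_PERCENT."""
    return max(0, min(int(requested_percent), MAX_IDLE_PERCENT))


def needs_injection(measured_idle_percent, requested_percent):
    """Return True when a CPU is idle less often than the (capped) target."""
    return measured_idle_percent < clamp_target(requested_percent)
```

With a target of 30%, a CPU that is idle 10% of the time would qualify for
forced idle time, while one that is already idle 40% of the time would be
left alone.
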
That thread operates as a high-priority realtime process, so it is able to
respond quickly when needed. Its job is relatively simple: look at the amount
of idle time on its assigned CPU and calculate the difference from the
desired idle time. Then, periodically, the thread will run, disable the
clock tick, and force the CPU into a sleep state for the required amount
of time. Sleep durations are specified in whole jiffies, so each forced
sleep tends to be relatively long, which is a necessary condition for an
effective reduction in power usage.
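
The arithmetic behind that calculation might look something like the
following sketch. It assumes a fixed accounting period measured in jiffies
and rounds the shortfall up to a whole jiffy; both the period and the
rounding policy are assumptions made for illustration, not details of the
driver itself.

```python
import math


def forced_sleep_jiffies(target_idle_pct, measured_idle_pct, period_jiffies):
    """Whole jiffies of forced idle needed in one period to close the gap.

    period_jiffies is a hypothetical accounting window; the shortfall is
    rounded up so that any gap results in at least one jiffy of sleep.
    """
    shortfall_pct = max(0, target_idle_pct - measured_idle_pct)
    return math.ceil(shortfall_pct * period_jiffies / 100)
```

For example, a CPU that is idle 20% of the time against a 50% target is
short by 30 percentage points; over a ten-jiffy period, that works out to
three jiffies of forced sleep.
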
Naturally, the PowerClamp thread will continue to monitor actual idle time
as it operates, adjusting the amount of forced sleep time as needed. It
also monitors the amount of desired sleep time that is lost to interrupts.
Interrupts remain enabled during the forced sleep, so they can bring the
processor back to an operational state before the PowerClamp driver would
have otherwise done so. Over time, the amount of sleep time lost in this
manner is tracked; the driver will then attempt to compensate by increasing
the amount of forced sleep time to try to pull the CPU back to the original
idle time target.
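
One simple way to picture that compensation is as a running "debt" of lost
sleep that is paid back in later periods. The class below is a hypothetical
sketch of that idea, with an assumed per-period limit on the extra sleep;
the real driver's bookkeeping may well differ.

```python
class SleepCompensator:
    """Track forced-sleep time lost to interrupts and repay it later."""

    def __init__(self, max_extra_jiffies):
        # Assumed limit on how much extra sleep is added in any one period.
        self.max_extra = max_extra_jiffies
        self.debt = 0

    def record(self, requested_jiffies, achieved_jiffies):
        """Note how much of a requested sleep was cut short."""
        self.debt += max(0, requested_jiffies - achieved_jiffies)

    def next_request(self, base_jiffies):
        """Return the next sleep length, padded to repay some of the debt."""
        extra = min(self.debt, self.max_extra)
        self.debt -= extra
        return base_jiffies + extra
```

Limiting the extra sleep in each period spreads the repayment out over
time, so that a burst of interrupts does not result in a single, very long
forced sleep.
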
By itself, PowerClamp can come close to achieving the desired level of idle
time on a system with a changing workload. Often, though, the real goal is
not idle time as such; instead, the purpose is to keep the system within a
given level of power consumption or a set of thermal limits. Doing that
will require the implementation of additional logic in user space. By
monitoring the parameter of interest, a user-space process can implement a
control loop that adjusts the desired level of idle time as needed. The
PowerClamp driver can respond relatively quickly to those changes, giving the
control process an effective tool for the management of the amount of power
used by the system.
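
A minimal sketch of one step of such a control loop appears below. It uses
a simple proportional adjustment: the further the monitored value (a
temperature, say) is above its limit, the more idle time is requested. The
gain, the function names, and the idea of passing in the attribute's path
are all assumptions; the article does not name the sysfs file, so the
caller must supply it.

```python
def next_idle_percent(current_pct, measured, limit, gain=2.0, max_pct=50):
    """Compute a new idle percentage from the distance to the limit.

    A positive error (measured above limit) raises the idle percentage;
    a negative error lowers it. The result stays within 0..max_pct.
    """
    error = measured - limit
    proposed = current_pct + gain * error
    return int(max(0, min(max_pct, round(proposed))))


def write_idle_percent(attr_path, percent):
    """Write the percentage to the attribute file named by the caller."""
    with open(attr_path, "w") as attr:
        attr.write(f"{percent}\n")
```

A real controller would call these functions periodically with fresh sensor
readings, and would probably need some tuning of the gain to avoid
oscillating around the limit.
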
The driver has been through a couple of revisions with little in the way of
substantive comments. This patch poses a relatively small risk to the
system, since it does not do anything if the feature is not in use. It could
thus conceivably be ready for merging as soon as the 3.8 development cycle.
Some more information can be found in the documentation file included with
the patch.