February 10, 2010
This article was contributed by Oliver Neukum
Introduction
Linux has supported system suspend to RAM and to disk for several years now. This valuable feature has a major drawback, however: a system cannot be used while it is suspended. Reducing the power a system consumes while it is in active use is an even nicer feature, called "runtime power management." It can be achieved by clocking down or switching off components. The current kernel supports it mainly in the form of CPU frequency management and USB autosuspend.
The core kernel cannot do runtime power management without help from drivers, and that help goes beyond what drivers already do to support system suspend. Drivers need to tell the rest of the kernel when a device may be suspended without unduly impacting performance. Furthermore, drivers need to be able to suspend and resume a device in a live system, without the process freezer protecting them from races. During system suspend, a driver for an ordinary character device need not worry about suspend() and resume() racing against open(), read(), write(), or ioctl(), because user space has been frozen. This is no longer true if a driver uses runtime power management, but techniques to avoid such races are shown below.
USB was the first subsystem in the kernel to introduce runtime power management, in the form of the USB autosuspend feature; its success led to the generic runtime power management framework that has just been merged.
USB 2.0 devices are rather simple in terms of power management. They know just two states: active or suspended. They also retain all their internal state when suspended. In the ideal case this makes the driver's job easy: the driver stops I/O to the device and suspends it when it is no longer needed, and reverses the process when it is needed again.
Testing USB autosuspend on laptops with the average set of built-in USB devices, all of whose drivers supported autosuspend, I found power savings on the order of 1W. The six laptops I tested drew about 15W on average, so USB autosuspend can reduce power consumption by about 7% (1W out of 15W).
That said, USB autosuspend is not just for laptops. All those single watts saved in a company's desktops add up to serious power savings. Even the blades in a data center benefit a little, since their root hubs are suspended too.
API
The API for implementing USB autosuspend is based on drivers telling the core USB subsystem whenever a reason for not suspending a device arises or ceases to exist. The subsystem counts the reasons why a device must not be autosuspended; it may then suspend a device whose counters have all reached zero. "Counters" is not a typo: a USB device may consist of a multitude of interfaces, each of which may have its own driver and its own counter.
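The bookkeeping can be pictured with a small user-space model. This is a hypothetical sketch, not the kernel's implementation: it assumes a device is just an array of per-interface counters, and the names struct model_device, model_reason_arises(), model_reason_ceases() and model_can_autosuspend() are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INTERFACES 4

/* Hypothetical model: one "reasons not to suspend" counter per interface. */
struct model_device {
	size_t num_interfaces;
	int usage[MAX_INTERFACES];
};

/* A driver records that a reason to keep the device awake has arisen. */
void model_reason_arises(struct model_device *dev, size_t intf)
{
	assert(intf < dev->num_interfaces);
	dev->usage[intf]++;
}

/* A driver records that such a reason has ceased to exist. */
void model_reason_ceases(struct model_device *dev, size_t intf)
{
	assert(intf < dev->num_interfaces);
	assert(dev->usage[intf] > 0);	/* an unbalanced put is a driver bug */
	dev->usage[intf]--;
}

/* The device may be suspended only when every interface counter is zero. */
bool model_can_autosuspend(const struct model_device *dev)
{
	for (size_t i = 0; i < dev->num_interfaces; i++)
		if (dev->usage[i] != 0)
			return false;
	return true;
}
```

One busy interface is enough to keep the whole device awake, which is why each interface driver has to do its own counting.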
The counters are manipulated with "get" and "put" functions, which wake or suspend devices according to the state of the counters. They are provided in synchronous and asynchronous versions:
- usb_autopm_get_interface(struct usb_interface *): increment the counter and guarantee that the device has been resumed (may sleep).
- usb_autopm_put_interface(struct usb_interface *): decrement the counter (may sleep).
- usb_autopm_get_interface_async(struct usb_interface *): increment the counter; the device will be woken at a later time (safe in atomic context).
- usb_autopm_put_interface_async(struct usb_interface *): decrement the counter (safe in atomic context).
The asynchronous versions were fixed only recently, in commit ccf5b801 for the 2.6.32 release; in earlier kernels they are buggy, so anyone stuck with an older kernel should not use them.
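The practical difference between the synchronous and asynchronous "get" can be sketched with a toy model. In this hypothetical model (all model_* names are invented), the synchronous get returns with the device awake, while the asynchronous get only increments the counter and records that a resume must happen later; model_run_deferred_work() stands in for whatever later context performs that resume.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one interface's autosuspend state. */
enum model_state { MODEL_SUSPENDED, MODEL_ACTIVE };

struct model_intf {
	enum model_state state;
	int usage;		/* the counter */
	bool resume_queued;	/* an asynchronous resume is pending */
};

/* Like usb_autopm_get_interface(): may sleep, returns with the device awake. */
void model_get_sync(struct model_intf *intf)
{
	intf->usage++;
	if (intf->state == MODEL_SUSPENDED)
		intf->state = MODEL_ACTIVE;	/* the resume happens right here */
}

/* Like usb_autopm_get_interface_async(): never sleeps, only queues a resume. */
void model_get_async(struct model_intf *intf)
{
	intf->usage++;
	if (intf->state == MODEL_SUSPENDED)
		intf->resume_queued = true;
}

/* Decrement the counter; both put variants behave alike in this model. */
void model_put(struct model_intf *intf)
{
	assert(intf->usage > 0);
	intf->usage--;
}

/* Stands in for the deferred work that performs a queued resume later. */
void model_run_deferred_work(struct model_intf *intf)
{
	if (intf->resume_queued) {
		intf->state = MODEL_ACTIVE;
		intf->resume_queued = false;
	}
}
```

The key point the model captures is that, after the asynchronous get, the caller may not assume the device is awake yet.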
For these manipulations of the counters to have any effect, a driver must tell the USB subsystem that it supports USB autosuspend. It does so by setting a flag in its usb_driver structure. For example, the kaweth driver includes this initialization:
static struct usb_driver kaweth_driver = {
	/* ... */
	.supports_autosuspend = 1,
};
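A minimal sketch of how the flag gates the counters, assuming a simplified device in which each interface records whether its bound driver set .supports_autosuspend. The struct and function names are hypothetical; the point is that a single interface whose driver lacks the flag keeps the whole device awake, whatever the counters say.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-interface view: did the bound driver set
 * .supports_autosuspend, and what is its counter? */
struct model_binding {
	bool supports_autosuspend;
	int usage;
};

/* One driver without the flag vetoes autosuspend for the whole device. */
bool model_device_may_autosuspend(const struct model_binding *b, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!b[i].supports_autosuspend)
			return false;
		if (b[i].usage > 0)
			return false;
	}
	return true;
}
```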
The core USB subsystem guarantees drivers that whenever it calls one of the methods in struct usb_driver (except, of course, resume() and reset_resume()), the device in question has been resumed and will not be suspended while the call is in progress.
Sysfs
Two sysfs attributes pertaining to USB autosuspend are exported for each device:
- /sys/$DEVICE/power/level: "on" keeps the device active (autosuspend disabled); "auto" enables autosuspend.
- /sys/$DEVICE/power/autosuspend: the delay, in seconds, between the counters reaching zero and the autosuspend.
This delay serves two purposes. First, some devices consume a lot of energy when resuming; disks, for example, have to spin up. Suspending them for a very short time saves no energy, and the delay is a heuristic to avoid such situations. Second, some devices need time to process data even after the host has finished talking to them. So do not set this delay to zero unless you know what you are doing.
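As an illustration of using these attributes from a program, here is a small C helper that writes a value to one of them. It assumes the caller passes in the device directory (the $DEVICE part above; its exact location depends on the bus topology) and has permission to write sysfs files, which normally means root. set_power_attr() is a hypothetical helper, not a library function.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `value` to <devdir>/power/<attr>.
 * Returns 0 on success and -1 on any error. */
int set_power_attr(const char *devdir, const char *attr, const char *value)
{
	char path[4096];
	int n = snprintf(path, sizeof(path), "%s/power/%s", devdir, attr);

	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;	/* path did not fit */

	FILE *f = fopen(path, "w");
	if (!f)
		return -1;	/* no such device, or no permission */

	int ok = fputs(value, f) >= 0;
	if (fclose(f) != 0)	/* buffered write errors surface on close */
		ok = 0;
	return ok ? 0 : -1;
}
```

For example, set_power_attr(devdir, "level", "auto") enables autosuspend, and set_power_attr(devdir, "autosuspend", "2") sets a two-second delay.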
Detecting idleness
Most devices are, obviously, idle most of the time. Think about how often one uses the fingerprint sensor or the camera built into most modern laptops. Even an Ethernet adapter is almost always unused while the WLAN is active, and vice versa.
User space tells the kernel when it may require the services of a device: an application must open a device before it can use it. This is true for any device that maps to a character device node, and also for network devices, which are brought up and down. The notable exceptions to this rule are few, mainly framebuffers and input devices. These require considerable work to provide good runtime power savings.
Autosuspend based on open and close
With this pattern, a device is never autosuspended while a file descriptor referring to it is held open. The pattern also works for network devices, because they have an equivalent of open() and close() in the form of ifconfig up and ifconfig down.
Let us have a look at a driver that implements this simple form of autosuspend. From the kaweth driver:
static int kaweth_open(struct net_device *net)
{
	struct kaweth_device *kaweth = netdev_priv(net);
	int res;

	res = usb_autopm_get_interface(kaweth->intf);
	if (res) {
		err("Interface cannot be resumed.");
		return -EIO;
	}
	/* ... the device is now awake and stays awake: start I/O ... */
}
The driver calls usb_autopm_get_interface() at the very beginning. Once that call has returned without an error, the device is guaranteed not to be autosuspended. The driver may henceforth assume that the device is usable and may ignore power management until the device is closed again. The driver must only make sure that it does no I/O to the device before it calls usb_autopm_get_interface().
A similar pattern is followed when the device is closed:
static int kaweth_close(struct net_device *net)
{
	struct kaweth_device *kaweth = netdev_priv(net);

	netif_stop_queue(net);
	/* ... */
	kaweth_kill_urbs(kaweth);
	usb_autopm_put_interface(kaweth->intf);
	/* ... */
}
The driver finishes all I/O to the device, then calls usb_autopm_put_interface(). For a conventional driver, waiting for all I/O to finish is a very good idea; for a driver using this kind of autosuspend, it is mandatory.
Strictly speaking, one cannot be sure exactly when the hardware has finished processing transferred data. That is why the core USB subsystem introduces a small delay between the counters reaching zero and the first attempt to autosuspend the device.
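The open/close pattern can be reduced to a small model. It is a hypothetical sketch, not kaweth code: model_open() and model_close() only mirror the structure of kaweth_open() and kaweth_close(), the counter is a plain integer, and resume_fails exists only so that the error path can be exercised.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical driver state for the open/close pattern. */
struct model_drv {
	int usage;		/* autosuspend counter */
	bool awake;
	bool resume_fails;	/* lets a test simulate a failed resume */
	int urbs_in_flight;	/* outstanding I/O */
};

/* In this model a failed resume leaves the counter untouched. */
static int model_autopm_get(struct model_drv *d)
{
	if (!d->awake && d->resume_fails)
		return -EIO;
	d->awake = true;
	d->usage++;
	return 0;
}

static void model_autopm_put(struct model_drv *d)
{
	assert(d->usage > 0);
	d->usage--;
}

/* open(): take the reference before any I/O touches the device. */
int model_open(struct model_drv *d)
{
	if (model_autopm_get(d))
		return -EIO;
	d->urbs_in_flight = 1;	/* e.g. start a receive URB */
	return 0;
}

/* close(): stop all I/O first, then drop the reference. */
void model_close(struct model_drv *d)
{
	d->urbs_in_flight = 0;	/* stands in for kaweth_kill_urbs() */
	model_autopm_put(d);
}
```

Because the reference is held from open() to close(), every I/O path in between can ignore power management entirely.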
The normal implementations of suspend() and resume() needed to support system sleep need not be altered much, if at all. If they do need changes, the reason is usually locking: resume() can be called directly from usb_autopm_get_interface(). Thus, resume() must not attempt to take a lock that the caller already holds when it calls usb_autopm_get_interface(). In theory this restriction is obvious; in practice, violating it is the most common bug in resume().
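The locking rule is easier to see with a concrete reproduction. The following user-space analogy (all names hypothetical) lets a synchronous model_autopm_get() call model_resume() directly, just as usb_autopm_get_interface() may call resume(). It uses a POSIX error-checking mutex so that the self-deadlock is reported as EDEADLK instead of hanging; in the kernel, the buggy ordering simply deadlocks. The fix shown takes the reference before the lock; the alternative is to make sure resume() never needs a lock that its callers may hold.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical driver with one lock; the error-checking mutex turns a
 * self-deadlock into an EDEADLK return instead of a hang. */
struct model_drv {
	pthread_mutex_t lock;
	int usage;
	int suspended;
};

void model_init(struct model_drv *d)
{
	pthread_mutexattr_t attr;

	pthread_mutexattr_init(&attr);
	pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
	pthread_mutex_init(&d->lock, &attr);
	pthread_mutexattr_destroy(&attr);
	d->usage = 0;
	d->suspended = 1;
}

/* resume() takes the driver lock, as many resume() methods do. */
static int model_resume(struct model_drv *d)
{
	int err = pthread_mutex_lock(&d->lock);

	if (err)
		return err;	/* EDEADLK: our caller already holds it */
	d->suspended = 0;
	pthread_mutex_unlock(&d->lock);
	return 0;
}

/* Like usb_autopm_get_interface(): may call resume() directly. */
static int model_autopm_get(struct model_drv *d)
{
	if (d->suspended) {
		int err = model_resume(d);

		if (err)
			return err;
	}
	d->usage++;
	return 0;
}

/* Buggy ordering: lock first, so the nested resume() cannot take it. */
int model_write_buggy(struct model_drv *d)
{
	pthread_mutex_lock(&d->lock);
	int err = model_autopm_get(d);
	if (!err)
		d->usage--;	/* simplified put */
	pthread_mutex_unlock(&d->lock);
	return err;
}

/* Fixed ordering: take the autosuspend reference, then the lock. */
int model_write_fixed(struct model_drv *d)
{
	int err = model_autopm_get(d);

	if (err)
		return err;
	pthread_mutex_lock(&d->lock);
	/* ... I/O under the lock ... */
	pthread_mutex_unlock(&d->lock);
	d->usage--;	/* simplified put */
	return 0;
}
```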
The resume() function also operates under some restrictions concerning memory allocation: it may use only GFP_NOIO or GFP_ATOMIC. This restriction arises because the kernel might otherwise try to resume another device in order to launder pages, and that I/O could depend on the very device that is still being resumed. Take care to get this right; otherwise the bug will show itself as very rare, spurious deadlocks that are almost impossible to debug.
A driver's little helpers
For some types of devices there is a generic driver for which subdrivers are written; USB serial devices are in that category. For such devices, this simple form of autosuspend is already supported in generic code. A subdriver only needs to set supports_autosuspend.
Autosuspend for devices that user space has opened
Some devices are open for most of the time the system is running. For such devices, power saving measures that are active only while the device is closed are futile. The canonical example is the keyboard, which is literally always open. To get significant power savings, the detection of idleness must be refined to the point that periods of actual idleness can be detected after user space has informed the kernel that the services of a device may be required.
For output, this is a comparatively easy task. As user space requests that the kernel perform output to a device, the device ceases to be idle. It becomes idle again when the output has been completed.
Let us look at how output is done in the simple case. As open() is no longer a fine-grained enough instrument to determine idleness, the detection is pushed down into the write() code path.
From the cdc-wdm driver (unrelated code has been removed):
static ssize_t wdm_write(struct file *file, const char __user *buffer,
			 size_t count, loff_t *ppos)
{
	u8 *buf;
	int rv = -EMSGSIZE, r, we;
	struct wdm_device *desc = file->private_data;
	struct usb_ctrlrequest *req;
	/* ... */
	r = mutex_lock_interruptible(&desc->wlock); /* concurrent writes */
	if (r) {
		kfree(buf);
		return r;
	}
	r = usb_autopm_get_interface(desc->intf);
	if (r < 0) {
		kfree(buf);
		mutex_unlock(&desc->wlock);
		return r;
	}
	/* ... */
	set_bit(WDM_IN_USE, &desc->flags);
	rv = usb_submit_urb(desc->command, GFP_KERNEL);
	if (rv < 0) {
		kfree(buf);
		clear_bit(WDM_IN_USE, &desc->flags);
		/* balance the usb_autopm_get_interface() above */
		usb_autopm_put_interface(desc->intf);
	}
	/* ... */
After some preliminaries, a lock is taken and usb_autopm_get_interface() is called. Thereafter, the driver knows that the device is and will remain active. I/O can be started just as if the driver didn't do runtime power management. However, care must be taken to balance the counters in the error case by calling usb_autopm_put_interface().
As I/O finishes, the counter must be decremented again. This is done in the completion handler using usb_autopm_put_interface_async(), as this example from usbnet shows:
static void tx_complete(struct urb *urb)
{
	/* ... */
	usb_autopm_put_interface_async(dev->intf);
	urb->dev = NULL;
	entry->state = tx_done;
	defer_bh(dev, skb, &dev->txq);
}
It is literally a one-liner.
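The counter discipline for output (taken in write(), dropped either on a submission error or in the completion handler) can be checked with a small model. The names are hypothetical, and submit_fails simply simulates usb_submit_urb() failing. In the kernel, the completion-side put has to be the _async variant because URB completion handlers run in atomic context; the model does not distinguish the two.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the output path. */
struct model_out {
	int usage;		/* autosuspend counter */
	bool submit_fails;	/* simulate usb_submit_urb() failing */
	bool urb_pending;
};

static void model_get(struct model_out *o)
{
	o->usage++;
}

static void model_put(struct model_out *o)
{
	assert(o->usage > 0);
	o->usage--;
}

int model_write(struct model_out *o)
{
	model_get(o);			/* device is now active and stays active */
	if (o->submit_fails) {
		model_put(o);		/* error case: balance the counter here */
		return -1;
	}
	o->urb_pending = true;		/* the completion will drop the reference */
	return 0;
}

/* The completion handler releases the reference taken in model_write(). */
void model_tx_complete(struct model_out *o)
{
	assert(o->urb_pending);
	o->urb_pending = false;
	model_put(o);
}
```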
The PM message and using the return value of the suspend() method
There is another facet of autosuspend that deserves to be mentioned. If the counters described here are not enough, a driver can benignly fail an autosuspend by returning -EBUSY from suspend(). If this is done during a full system suspend, however, the whole suspend operation is aborted. Therefore, returning -EBUSY should be limited to autosuspend, and even then to rare cases. An automatic suspend can be detected by testing the PM_EVENT_AUTO bit in the event field of the message parameter passed to suspend(). When an autosuspend is aborted in this way, the core USB subsystem retries it after the delay mentioned above.
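A hypothetical suspend() following this rule is sketched below. The MODEL_PM_EVENT_* constants are stand-ins chosen for the sketch (real drivers test PM_EVENT_AUTO from the kernel headers and should not rely on the values used here), and transfer_in_progress represents whatever driver-specific condition makes an autosuspend undesirable.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Stand-in event bits for this sketch only. */
#define MODEL_PM_EVENT_SUSPEND 0x0002
#define MODEL_PM_EVENT_AUTO    0x0400

struct model_pm_message {
	int event;
};

struct model_dev {
	bool transfer_in_progress;	/* driver-specific reason to stay awake */
	bool suspended;
};

/* Veto only automatic suspends; vetoing a system suspend would abort it. */
int model_suspend(struct model_dev *d, struct model_pm_message msg)
{
	if (d->transfer_in_progress && (msg.event & MODEL_PM_EVENT_AUTO))
		return -EBUSY;	/* the USB core retries after the delay */
	/* ... stop I/O here, as for system sleep ... */
	d->suspended = true;
	return 0;
}
```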
Remote wakeup and spontaneous input
Handling input in the same manner as output hits a fundamental obstacle. The usual semantics of input operations are that input data a device generates is stored in a buffer and handed to user space when the read() system call is executed. A driver cannot normally predict when a device will volunteer input data.
To overcome this obstacle, USB has a feature called "remote wakeup". The feature is optional, but it is generally supported by the devices for which it makes sense.
A suspended device using remote wakeup can tell the system that it would like to transfer input data; the system is then required to resume the device. The feature can best be thought of as an analog of interrupts: like interrupts on PCI devices, remote wakeup with a USB device has to be explicitly enabled.
A driver requests that remote wakeup be enabled by setting the aptly named needs_remote_wakeup flag in struct usb_interface. The core USB subsystem will never autosuspend a device that does not support remote wakeup if any of its interfaces' drivers request that remote wakeup be enabled.
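The rule can be written as a predicate. This is a hypothetical model of that one decision only: it ignores the counters, and model_wakeup_allows_suspend() is not a kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* May the core autosuspend a device, given whether the device supports
 * remote wakeup and which of its interfaces' drivers ask for it? */
bool model_wakeup_allows_suspend(bool device_can_wakeup,
				 const bool *needs_remote_wakeup, size_t n)
{
	if (device_can_wakeup)
		return true;
	for (size_t i = 0; i < n; i++)
		if (needs_remote_wakeup[i])
			return false;	/* input could be lost while asleep */
	return true;
}
```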
Let us look at an example of how a driver requests that remote wakeup be enabled. From cdc-acm:
static int acm_tty_open(struct tty_struct *tty, struct file *filp)
{
	struct acm *acm;
	/* ... */
	if (usb_autopm_get_interface(acm->control) < 0)
		goto early_bail;
	else
		acm->control->needs_remote_wakeup = 1;
	/* ... */
	usb_autopm_put_interface(acm->control);
	/* ... */
Note that a driver has to make sure its device is active when it requests that remote wakeup be enabled. The device will then automatically be resumed whenever input data becomes ready to be transferred. The driver must take care that remote wakeup is disabled when the device is closed again.
Marking a device busy
Waking up a device has a cost in time and power; it takes about 40ms. Therefore, staying suspended for less than a few seconds is not sensible. As already mentioned, there is a configurable delay between the time the counters reach zero and the attempt to autosuspend. When remote wakeup is used, however, the counters stay at zero all the time unless they are incremented for output. Yet a delay between the last time a device was busy (that is, did I/O) and the next attempt to autosuspend it is highly desirable.
An API is provided for that purpose:
- usb_mark_last_busy(struct usb_device *): restart the autosuspend delay, counting from now; every call restarts it (safe in atomic context).
Let us look at an example from cdc-acm:
static void acm_read_bulk(struct urb *urb)
{
	struct acm_ru *rcv = urb->context;
	struct acm *acm = rcv->instance;
	/* ... */
	if (!ACM_READY(acm)) {
		dev_dbg(&acm->data->dev, "Aborting, acm not ready");
		return;
	}
	usb_mark_last_busy(acm->dev);
	/* ... process the received data ... */
}
The driver marks the device busy as it receives data, and then processes the received data. This way, autosuspend is attempted only if no input or output was performed for the duration of the configurable delay.
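The effect of usb_mark_last_busy() can be modeled with a timestamp. In this hypothetical sketch (names invented, times in milliseconds), every call restarts the delay, and an autosuspend becomes possible only once the device has been idle for the full delay.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model. A device relying on remote wakeup keeps its counter
 * at zero, so idleness is measured from the last time it was marked busy. */
struct model_busy {
	int usage;
	long last_busy;
	long delay;
};

/* Analogue of usb_mark_last_busy(): restart the delay from `now`. */
void model_mark_last_busy(struct model_busy *b, long now)
{
	b->last_busy = now;
}

/* An autosuspend may be attempted once the full delay has elapsed. */
bool model_may_autosuspend(const struct model_busy *b, long now)
{
	return b->usage == 0 && now - b->last_busy >= b->delay;
}
```

With a 2000ms delay, input at 1000ms and again at 2500ms pushes the earliest autosuspend attempt out to 4500ms.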
Sleepless in the kernel
What is to be done if a driver cannot sleep in its write path? In that case there is no simple solution. The driver needs to call usb_autopm_get_interface_async() on every pass through the write path, just as the cdc-wdm example above calls usb_autopm_get_interface(). The difference is that the driver cannot be sure that the device is active after the call. Since it cannot wait for the device to become active, it must queue the I/O.
From usbnet's usbnet_start_xmit():
spin_lock_irqsave(&dev->txq.lock, flags);
retval = usb_autopm_get_interface_async(dev->intf);
if (retval < 0) {
	spin_unlock_irqrestore(&dev->txq.lock, flags);
	goto drop;
}

#ifdef CONFIG_PM
/* if this triggers the device is still asleep */
if (test_bit(EVENT_DEV_ASLEEP, &dev->flags)) {
	/* transmission will be done in resume */
	usb_anchor_urb(urb, &dev->deferred);
	/* no use to process more packets */
	netif_stop_queue(net);
	spin_unlock_irqrestore(&dev->txq.lock, flags);
	devdbg(dev, "Delaying transmission for resumption");
	goto deferred;
}
#endif
The asynchronous API is used and errors are handled. After that, if the device is still asleep, the I/O is queued. The queued I/O must actually be started in resume().
From usbnet's usbnet_resume():
spin_lock_irq(&dev->txq.lock);
while ((res = usb_get_from_anchor(&dev->deferred))) {
	skb = (struct sk_buff *)res->context;
	retval = usb_submit_urb(res, GFP_ATOMIC);
	if (retval < 0) {
		dev_kfree_skb_any(skb);
		usb_free_urb(res);
		usb_autopm_put_interface_async(dev->intf);
	} else {
		dev->net->trans_start = jiffies;
		__skb_queue_tail(&dev->txq, skb);
	}
}

smp_mb();
clear_bit(EVENT_DEV_ASLEEP, &dev->flags);
spin_unlock_irq(&dev->txq.lock);
Here, I/O requests are taken from the queue and handed to the hardware. Care must be taken to handle the counters correctly in the error case.
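The counter bookkeeping of this deferred-transmission scheme can be modeled in a few lines. The sketch is hypothetical and heavily simplified: packets are just counted, model_start_xmit() and model_resume() only mirror the structure of usbnet_start_xmit() and usbnet_resume(), and submit_fails simulates usb_submit_urb() failing in resume(). Each parked packet holds one counter reference until it is either completed or dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of deferred transmission. */
struct model_net {
	bool asleep;		/* like EVENT_DEV_ASLEEP */
	bool queue_stopped;	/* like netif_stop_queue() */
	bool submit_fails;	/* simulate usb_submit_urb() failing in resume */
	int usage;		/* autosuspend counter */
	int ndeferred;		/* packets parked, like the "deferred" anchor */
	int sent;		/* packets handed to the hardware */
};

/* Transmit path: may not sleep, so only an asynchronous get is possible.
 * Returns 1 if the packet was deferred, 0 if it was submitted. */
int model_start_xmit(struct model_net *n)
{
	assert(!n->queue_stopped);	/* a stopped queue gets no more packets */
	n->usage++;			/* like usb_autopm_get_interface_async() */
	if (n->asleep) {
		n->ndeferred++;		/* transmission will be done in resume */
		n->queue_stopped = true;
		return 1;
	}
	n->sent++;
	return 0;
}

/* resume(): submit everything that was parked; on failure drop the packet
 * and give back the reference it was holding. */
void model_resume(struct model_net *n)
{
	while (n->ndeferred > 0) {
		n->ndeferred--;
		if (n->submit_fails)
			n->usage--;	/* like usb_autopm_put_interface_async() */
		else
			n->sent++;	/* the completion will drop the reference */
	}
	n->asleep = false;
	n->queue_stopped = false;	/* simplified: reopen the queue */
}

/* Completion of a submitted packet releases its reference. */
void model_tx_complete(struct model_net *n)
{
	assert(n->usage > 0);
	n->usage--;
}
```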
A driver's not so little helpers
Usbnet implements both forms of autosuspend for its subdrivers. If a subdriver sets supports_autosuspend, it gets the simple form of autosuspend. If, instead, it defines the method manage_power(struct usbnet *dev, int on), which is supposed to set needs_remote_wakeup according to on (and may sleep), it also gets runtime power management while the interface is up.
Conclusion
I have tried to show how, in most cases, significant power savings can be had with little effort. I hope that many coders will find this useful in their work. In runtime power management, the whole is more than the sum of its parts. Remember that all of a device's interfaces must support autosuspend for the device to be autosuspended, and all of a hub's children must be suspended for the hub to be suspended. The chain breaks at its weakest link. Thus I hope every driver developer makes at least a small effort to consider runtime power management.
[ The author would like to thank B1-Systems for their support. ]