Brief items
The current development kernel is 2.6.33-rc7, released on February 6. "I have to admit
that I wish we had way fewer regressions listed by this time... But we've
certainly fixed a few things, and it's been a week, so here's -rc7. I wish
I could say that it's the last -rc, but I strongly doubt that, and we'll
almost certainly have at least one more." See the
full changelog for the details.
Stable updates: 2.6.32.8 was released on February 9. "Sorry for the delay in releasing it, but there were a few crashes that
people had reported, combined with verifying that a security problem
really was fixed and backported properly, along with travel to and from
FOSDEM, all [of] which caused delays." 2.6.27.45 remains as the latest
stable update for 2.6.27.
You know, I'm -><- that close to posting a highly unprintable
rant about hooks in general, associated style of development and
resulting problems. With names named and *many* examples given.
LSM is essentially a trashcan and just about everything icky gets
swept over there. That's fine, as long as one doesn't care whether
their code makes sense and just wants to keep it away from
unfriendly eyes.
--
Al Viro
Kernel development news
By Jonathan Corbet
February 9, 2010
The release of the 2.6.33-rc7 prepatch indicates that this development
cycle is headed toward a close,
even if Linus thinks that a -rc8 will be necessary. As has become
traditional, LWN has taken a look at some statistics related to this cycle
and where the code came from.
As of this writing, 10,500 non-merge commits have found their way into
2.6.33 - fairly normal by recent standards. These changes added almost
900,000 lines while deleting almost 520,000 others; as a result, the kernel
grew by a mere 380,000 lines this time around. According to the most recent regression list,
97 regressions have been reported in 2.6.33, of which 20 remain
unresolved.
Some 1,152 developers contributed code to 2.6.33. The most active of those
were:
| Most active 2.6.33 developers |
| By changesets |
| Ben Hutchings | 145 | 1.4% |
| Frederic Weisbecker | 145 | 1.4% |
| Arnaldo Carvalho de Melo | 138 | 1.3% |
| Luis R. Rodriguez | 130 | 1.2% |
| Masami Hiramatsu | 128 | 1.2% |
| Bartlomiej Zolnierkiewicz | 124 | 1.2% |
| Eric Dumazet | 108 | 1.0% |
| Alan Cox | 105 | 1.0% |
| Manu Abraham | 102 | 1.0% |
| Thomas Gleixner | 101 | 1.0% |
| Eric W. Biederman | 97 | 0.9% |
| Roel Kluin | 91 | 0.9% |
| Alexander Duyck | 88 | 0.8% |
| Paul Mundt | 87 | 0.8% |
| Johannes Berg | 80 | 0.8% |
| Wey-Yi Guy | 77 | 0.7% |
| Alex Deucher | 76 | 0.7% |
| Jean Delvare | 73 | 0.7% |
| Al Viro | 72 | 0.7% |
| By changed lines |
| Bartlomiej Zolnierkiewicz | 206468 | 18.1% |
| Henk de Groot | 50355 | 4.4% |
| Jerry Chuang | 49627 | 4.3% |
| Ben Skeggs | 37555 | 3.3% |
| Philipp Reisner | 23182 | 2.0% |
| Eilon Greenstein | 23123 | 2.0% |
| Tomi Valkeinen | 22508 | 2.0% |
| Mike Frysinger | 13116 | 1.1% |
| Ben Hutchings | 12680 | 1.1% |
| Jakob Bornecrantz | 11613 | 1.0% |
| Wu Zhangjin | 11325 | 1.0% |
| Greg Kroah-Hartman | 10468 | 0.9% |
| Rajendra Nayak | 9978 | 0.9% |
| Manu Abraham | 9625 | 0.8% |
| jack wang | 9171 | 0.8% |
| Masami Hiramatsu | 8973 | 0.8% |
| Alan Cox | 7672 | 0.7% |
| David VomLehn | 7331 | 0.6% |
| Arnaldo Carvalho de Melo | 7217 | 0.6% |
While some of the usual names appear at the top of this list, there are
some newcomers as well. Ben Hutchings did a lot of work with network
drivers, including the addition of the Solarflare SFC9000 driver (which has
several co-authors). Frederic Weisbecker has been active in a number of
areas, adding the hardware breakpoints code, removing the big kernel lock
from the reiserfs filesystem, and working with tracing and the perf tool.
Arnaldo Carvalho de Melo's work is almost all with the perf events subsystem and
the perf tool in particular. Luis Rodriguez continues to work all over the
wireless driver subsystem, and with the Atheros drivers in particular, and
Masami Hiramatsu's largest contribution is the dynamic probing work.
In the "lines changed" column, Bartlomiej Zolnierkiewicz continues to work
on fixing up some wireless drivers in the staging tree, deleting a lot of
code in the process; he also continues his IDE driver work. Henk de Groot
added the Agere driver for HERMES II chipsets, Jerry Chuang added the
Realtek rtl8192u driver, and Ben Skeggs added much of the Nouveau driver.
Contributions to 2.6.33 came from 182 employers that your editor was able
to identify. The most active of those are:
| Most active 2.6.33 employers |
| By changesets |
| (None) | 1535 | 14.6% |
| Red Hat | 1223 | 11.6% |
| Intel | 1011 | 9.6% |
| (Unknown) | 868 | 8.3% |
| IBM | 500 | 4.8% |
| Novell | 390 | 3.7% |
| Nokia | 319 | 3.0% |
| (Consultant) | 316 | 3.0% |
| Fujitsu | 204 | 1.9% |
| Texas Instruments | 199 | 1.9% |
| Atheros Communications | 169 | 1.6% |
| (Academia) | 166 | 1.6% |
| AMD | 165 | 1.6% |
| Oracle | 136 | 1.3% |
| Analog Devices | 130 | 1.2% |
| Renesas Technology | 126 | 1.2% |
| Pengutronix | 125 | 1.2% |
| HP | 124 | 1.2% |
| Solarflare Communications | 123 | 1.2% |
| By lines changed |
| (None) | 304895 | 26.7% |
| (Unknown) | 109716 | 9.6% |
| Red Hat | 92991 | 8.1% |
| Broadcom | 54272 | 4.8% |
| Realtek | 49951 | 4.4% |
| Intel | 46302 | 4.1% |
| Nokia | 37505 | 3.3% |
| Novell | 27235 | 2.4% |
| IBM | 26783 | 2.3% |
| (Consultant) | 25845 | 2.3% |
| Texas Instruments | 24232 | 2.1% |
| LINBIT | 23247 | 2.0% |
| Analog Devices | 19677 | 1.7% |
| VMware | 16045 | 1.4% |
| Samsung | 15707 | 1.4% |
| Solarflare Communications | 15054 | 1.3% |
| JiangSu Lemote Corp. | 11439 | 1.0% |
| AMD | 9218 | 0.8% |
| Universal Scientific Industrial Co. | 9194 | 0.8% |
As usual, Red Hat maintains its position at the top of the list, but others
are gaining; we may yet see a day when Red Hat is just one of several major
contributors. Some readers may be surprised to see Broadcom near the top
of the list, given that this company's reputation for contribution is not
the best. The truth of the matter is that Broadcom has several developers
contributing to various drivers in the networking and SCSI subsystems; it's
only in the wireless realm that the trouble starts.
For the fun of it, your editor typed the "changeset percent"
numbers for the last ten releases into a spreadsheet and got this plot:
[Plot: changeset percentages by employer over the last ten releases]
The percentages are surprisingly stable over the course of almost three
years. The most obviously identifiable trends, perhaps, are the steady
increases in the contributions from Intel and Nokia.
All told, the process continues to function smoothly. The occasional
complaint about certain companies not fully participating in the process
notwithstanding, the picture is one of hundreds of companies cooperating to
a high degree to create the Linux kernel despite their fierce competition
elsewhere. The significant percentage of code coming from developers
working on their own time shows that Linux is not just a corporate
phenomenon, though. We have built a development community which is able to
incorporate the interests and work of an astonishingly wide variety of
people into a single kernel.
As always, thanks are due to Greg Kroah-Hartman, who has done a great deal of work to reduce the size of the "(Unknown)" entries in the tables above.
By Jake Edge
February 10, 2010
The perf tool for performance analysis is adding functionality quickly.
Since being added to the
mainline in 2.6.31, primarily as a means to access various CPU
performance counters, it has expanded its scope. Support for treating
kernel tracepoint events like performance counter events came into the
kernel at around the same time. More recently, though, Tom Zanussi has added
support for using perl and python scripts with the perf tool, making it
even easier to do sophisticated processing of perf events.
The perl support is already in the mainline, but Zanussi added a
python scripting engine more recently. Interpreters for both perl and
python can be embedded into the
perf executable, which allows processing the raw perf trace data stream in
either of those languages.
The perl scripting can be used from the 2.6.33-rc series, but the python
support is only available by applying Zanussi's patches to the tip tree.
Building perf in the tools/perf directory, which requires
development versions of various libraries and tools (glibc, elfutils, libdwarf,
perl, python, etc.), then gives access to the new functionality.
Several example scripts are provided with perf; they can be
listed from perf itself:
    # perf trace -l
    List of available trace scripts:
      syscall-counts [comm]                system-wide syscall counts
      syscall-counts-by-pid [comm]         system-wide syscall counts, by pid
      failed-syscalls-by-pid [comm]        system-wide failed syscalls, by pid
      workqueue-stats                      workqueue stats (ins/exe/create/destroy)
      check-perf-trace                     useless but exhaustive test script
      failed-syscalls [comm]               system-wide failed syscalls
      wakeup-latency                       system-wide min/max/avg wakeup latency
      rw-by-file <comm>                    r/w activity for a program, by file
      rw-by-pid                            system-wide r/w activity
This list is a mix of perl and python scripts that live in the
tools/perf/scripts/{perl,python} directories and get installed in
the proper location (/root/libexec by default) after a make install.
The scripts themselves are largely generated by the perf trace
command. Zanussi's documentation for perf-trace-perl and perf-trace-python explains the
process of using perf trace to create the skeleton scripts, which
can then be edited to add the required functionality. Adding two helper
shell scripts (for recording and reporting) to the appropriate directory
will add new scripts to the list produced by perf trace described
above.
The installed scripts can then be used as follows:
    # perf trace record failed-syscalls
    ^C[ perf record: Woken up 11 times to write data ]
    [ perf record: Captured and wrote 1.939 MB perf.data (~84709 samples) ]
This captures the perf data into the appropriately named perf.data
file, which can then be processed by:
    # perf trace report failed-syscalls
    perf trace started with Perl script \
        /root/libexec/perf-core/scripts/perl/failed-syscalls.pl

    failed syscalls, by comm:

    comm                    # errors
    --------------------  ----------
    firefox                     1721
    claws-mail                   149
    konsole                       99
    X                             77
    emacs                         56
    [...]

    failed syscalls, by syscall:

    syscall                           # errors
    ------------------------------  ----------
    sys_read                              2042
    sys_futex                              130
    sys_mmap_pgoff                          71
    sys_access                              33
    sys_stat64                               5
    sys_inotify_add_watch                    4
    [...]

    # perf trace report failed-syscalls-by-pid
    perf trace started with Python script \
        /root/libexec/perf-core/scripts/python/failed-syscalls-by-pid

    syscall errors:

    comm [pid]                           count
    ------------------------------  ----------
    firefox [10144]
      syscall: sys_read
        err = -11                         1589
      syscall: sys_inotify_add_watch
        err = -2                             4
    firefox [10147]
      syscall: sys_futex
        err = -110                           7
    [...]
This simple example shows using the failed-syscalls script to
gather the data, then processing it with the corresponding perl script as
well as a compatible python script (failed-syscalls-by-pid) that slices
the same data somewhat differently. The first report shows a count of each
system call that failed during the few seconds while the trace was active.
It shows the number of errors by process, as well as by system call.
The second report combines the two and shows each process along with
which system calls failed for it, and how many times. There are also
corresponding scripts that count all system calls, not just those that
failed, and report
on them similarly. Wakeup latency, file read/write activity, and workqueue
statistics are the focus of some of the other provided scripts.
These scripting features will make it that much easier for kernel
hackers—or possibly those who aren't—to access the perf
functionality. The state of tracing and instrumentation in the kernel has
been quick to develop over the last few development cycles. It doesn't
look to be slowing down anytime soon.
February 10, 2010
This article was contributed by Oliver Neukum
Introduction
Linux has
supported system suspend to RAM and disk for several years now. This
valuable feature
has a major drawback, however: a system cannot be used while it is
suspended. Reducing the power a system consumes while in active use is
an even nicer feature. It is called "runtime power management."
This can be done by clocking down or switching off components. The
current kernel supports this mainly in the form of CPU frequency
management and USB autosuspend.
The core kernel needs drivers to help
it in order to do runtime power management; some support beyond what
drivers need to do to support system suspension is necessary. Drivers
need to tell the rest of the kernel when a device may be suspended
without unduly impacting performance. Furthermore, drivers need to be
able to suspend and resume a device in a live system without the
process freezer protecting them from races. During a system suspend,
user space is frozen, so a driver for an ordinary character
device need not worry about suspend() and resume() racing
against open(), read(), write() or ioctl().
This is no longer true if a driver uses
runtime power management, but techniques to avoid such races will
be shown later.
USB was the first subsystem in the kernel to
introduce runtime power management in the form of the USB autosuspend
feature; its success has led to the generic framework just being
merged.
USB 2.0 devices are rather simple in
terms of power management. They know just two modes with respect to
power management: active or suspended. They also retain all their
internal state when suspended. This makes the job of drivers easy in
the ideal case. The driver ceases IO to the device and suspends the
device when it is no longer needed and reverses the process when it
is needed again.
Testing USB autosuspend on a laptop
with the average set of built-in USB devices whose drivers all
supported autosuspend, I found power savings on the order of 1W.
The six laptops I tested drew about 15W of power on average, so USB
autosuspend can reduce power consumption by about 7%.
That said, USB autosuspend is not just
for laptops. All those single watts saved in a company's desktops
will add up to serious power savings. Even the blades in a data center
profit a bit as the root hubs are suspended, too.
API
The API for implementing USB
autosuspend is based on drivers telling the core USB subsystem
whenever a reason for not suspending a device arises or ceases to exist.
The subsystem counts the reasons why a device must not be
autosuspended; the core USB subsystem may then suspend a device whose
counters have reached zero. "Counters" is not a typo: a USB device may
consist of a multitude of interfaces, each of which may have its own
driver.
The counters are manipulated with "get"
and "put" functions which wake or suspend devices according to the
state of the counters. They are provided in synchronous and
asynchronous versions.
- usb_autopm_get_interface(struct usb_interface *);
  Increment the counter and guarantee the device has been resumed
  (may sleep).
- usb_autopm_put_interface(struct usb_interface *);
  Decrement the counter (may sleep).
- usb_autopm_get_interface_async(struct usb_interface *);
  Increment the counter, which will wake the device at a later
  time (safe in atomic contexts).
- usb_autopm_put_interface_async(struct usb_interface *);
  Decrement the counter (safe in atomic contexts).
The asynchronous versions were recently fixed in commit
ccf5b801 for the 2.6.32 release; earlier
kernels were buggy.
Those stuck with an older kernel for some reason cannot use these
functions.
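The counting scheme can be illustrated with a small user-space model. This is only a sketch, not the USB core's implementation: the names model_device, model_get() and model_put() are invented for illustration, and the model suspends immediately when the counters reach zero, whereas the real core waits for the autosuspend delay first.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_INTF 4    /* arbitrary limit for this sketch */

/* Hypothetical stand-in for a USB device with several interfaces. */
struct model_device {
    int pm_usage[MODEL_MAX_INTF];   /* one counter per interface */
    int num_intf;
    bool suspended;
};

/* True if no interface has a reason to keep the device awake. */
static bool model_all_idle(const struct model_device *dev)
{
    for (int i = 0; i < dev->num_intf; i++)
        if (dev->pm_usage[i] > 0)
            return false;
    return true;
}

/* Like a "get": record a reason to stay awake and resume the device. */
static void model_get(struct model_device *dev, int intf)
{
    assert(intf >= 0 && intf < dev->num_intf);
    dev->pm_usage[intf]++;
    dev->suspended = false;
}

/* Like a "put": drop a reason; suspend once every counter is zero. */
static void model_put(struct model_device *dev, int intf)
{
    assert(intf >= 0 && intf < dev->num_intf);
    assert(dev->pm_usage[intf] > 0);    /* gets and puts must balance */
    dev->pm_usage[intf]--;
    if (model_all_idle(dev))
        dev->suspended = true;
}
```

In this model, a device with two interfaces stays awake until both interfaces have dropped their last reference, which is why the text speaks of counters in the plural.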
For these manipulations of the counters
to have any effect, a driver must tell the USB subsystem that it
supports USB autosuspend. It does so by setting a flag in its
usb_driver structure. For example, the kaweth driver includes
this initialization:
    static struct usb_driver kaweth_driver = {
        /* ... */
        .supports_autosuspend = 1,
    };
The core USB subsystem guarantees that, for all of its calls to methods
of struct usb_driver (except, of course, resume() and reset_resume()),
the device in question has been resumed and will not be suspended while
the call is in progress.
Sysfs
Two sysfs attributes are
exported pertaining to USB autosuspend for each device.
- /sys/$DEVICE/power/level
  "on" disables autosuspend; "auto" enables it.
- /sys/$DEVICE/power/autosuspend
  The delay, in seconds, between the counters reaching zero and
  autosuspend.
The delay mentioned above serves a double function.
Firstly, some devices have a large energy consumption when resuming;
disks, for example, have to spin up. Suspending them for a very
short time saves no energy. The delay is a heuristic to avoid such
situations.
Secondly, some devices need time to process data even after the host
has finished talking to them. So do not set this delay to zero unless you know
what you are doing.
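For those who would rather set these attributes from a program than by hand, the following user-space sketch may serve as a starting point. The helper names write_attr() and enable_autosuspend() are invented here, and devdir stands in for the /sys/$DEVICE directory of the device in question; the attribute names and values are the ones listed above.

```c
#include <assert.h>
#include <stdio.h>

/* Write a short string to a sysfs-style attribute file.
 * Returns 0 on success, -1 on failure. */
static int write_attr(const char *path, const char *value)
{
    FILE *f;

    assert(path && value);
    f = fopen(path, "w");
    if (!f)
        return -1;
    if (fputs(value, f) == EOF) {
        fclose(f);
        return -1;
    }
    /* Write errors on sysfs files often show up only at flush time. */
    return fclose(f) == 0 ? 0 : -1;
}

/* Set the delay, then switch the device to "auto".
 * devdir is a placeholder for the device's sysfs directory. */
static int enable_autosuspend(const char *devdir, int delay_secs)
{
    char path[512], buf[32];
    int n;

    n = snprintf(path, sizeof(path), "%s/power/autosuspend", devdir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    snprintf(buf, sizeof(buf), "%d", delay_secs);
    if (write_attr(path, buf) < 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/power/level", devdir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return write_attr(path, "auto");
}
```

The delay is written before the level so that the device never runs with autosuspend enabled and a delay the caller did not choose.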
Detecting idleness
Most devices are, obviously, idle most of
the time. Think about how often one uses the fingerprint sensor or
the camera built into most modern laptops. Even an Ethernet adapter is
almost always unused while the WLAN is active and vice versa.
User space tells the kernel when it may
require services of a device; an application must open a device before it can
use it. This is true for any device that maps to a character device
node and also for network devices, which are upped and downed. The notable
exceptions to this rule are few, mainly framebuffers and input
devices. These require considerable work to provide good runtime
power savings.
Autosuspend based on open and close
With code which follows this pattern, the kernel
will not autosuspend a device for which a file descriptor is held
open. The pattern can also be used for network devices because they have an
equivalent to open() and close() in the form of ifconfig up
and ifconfig down.
Let us have a look at a driver that
implements this simple form of autosuspend:
From the kaweth driver:
    static int kaweth_open(struct net_device *net)
    {
        struct kaweth_device *kaweth = netdev_priv(net);
        int res;

        res = usb_autopm_get_interface(kaweth->intf);
        if (res) {
            err("Interface cannot be resumed.");
            return -EIO;
        }
        /* ... */
    }
The driver calls
usb_autopm_get_interface() at the very beginning. This ensures that
the device will not be autosuspended after it has returned without an
error. The driver may henceforth assume that the device is usable and
may ignore the issue of power management until the device is closed
again. The driver must just make sure that it does no IO to the device before it
calls usb_autopm_get_interface().
A similar pattern is followed when the device is closed:
    static int kaweth_close(struct net_device *net)
    {
        struct kaweth_device *kaweth = netdev_priv(net);

        netif_stop_queue(net);
        /* ... */
        kaweth_kill_urbs(kaweth);
        usb_autopm_put_interface(kaweth->intf);
        /* ... */
    }
The driver finishes all IO to the
device, then calls usb_autopm_put_interface().
For a conventional driver, waiting for
all IO to finish is a very good idea; for a driver using this kind of
autosuspend, it is mandatory.
Strictly speaking one cannot be sure exactly when
transferred data has been processed by the hardware. That's why the core
USB subsystem
introduces a small delay between the counters reaching zero and the first
attempt to autosuspend the device.
The normal implementations of suspend()
and resume() needed to support system sleep need not be altered much,
if at all. The reason they may need to be changed is locking, because
resume() can be called directly
from usb_autopm_get_interface(). Thus,
resume() must not attempt to retake a lock that is already held when
usb_autopm_get_interface() is called. In theory this restriction is obvious; in
practice this is the most common bug in resume().
The resume() function also operates under some
restrictions concerning memory allocations. It may use only GFP_NOIO
or GFP_ATOMIC to allocate memory. This restriction arises because the
kernel might otherwise try to resume another device to
launder pages. One should take care to get this right; otherwise this bug
will show
itself in very rare spurious deadlocks almost impossible to debug.
A driver's little helpers
For some types of devices there's a
generic driver for which subdrivers are written; USB serial devices are in
that category. For such devices this simple form of autosuspend is
already supported in generic code. A subdriver needs only to set
supports_autosuspend.
Autosuspend for devices that user space has opened
Some devices are open for most of the
running time of the system. For such devices, power saving measures which are
active only in the closed mode are futile. The canonical example is
the keyboard which is literally always open. To get significant power
savings, the detection of idleness must be refined to the point that
periods of actual idleness can be detected after user space has
informed the kernel that services of a device may be required.
For output this is a comparatively easy
task. As user space requests that the kernel perform output to a
device, the device ceases to be idle. It becomes idle again when the
output has been completed.
Let us look at an example of how
output in the simple case is done.
As the open() method is no longer fine-grained enough an
instrument to determine idleness, the detection is
pushed down into the write() code path.
From the cdc-wdm driver (unrelated code has been removed):
    static ssize_t wdm_write(struct file *file, const char __user *buffer,
                             size_t count, loff_t *ppos)
    {
        u8 *buf;
        int rv = -EMSGSIZE, r, we;
        struct wdm_device *desc = file->private_data;
        struct usb_ctrlrequest *req;

        /* ... */
        r = mutex_lock_interruptible(&desc->wlock); /* concurrent writes */
        /* ... */
        r = usb_autopm_get_interface(desc->intf);
        /* ... */
        set_bit(WDM_IN_USE, &desc->flags);
        rv = usb_submit_urb(desc->command, GFP_KERNEL);
        if (rv < 0) {
            kfree(buf);
            clear_bit(WDM_IN_USE, &desc->flags);
        }
        /* ... */
    }
After some preliminaries a lock is
taken and usb_autopm_get_interface() is called.
Thereafter the
driver knows that the device is and will remain active. I/O can be
started just as if the driver didn't do runtime power management.
However, care must be taken to balance
the counters in the error case by calling
usb_autopm_put_interface().
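The balancing rule is easy to get wrong, so here is a user-space sketch of it. Everything in it is a stand-in invented for illustration (model_intf, intf_get(), intf_put(), model_submit(), model_write(), model_io_done()); it models only the bookkeeping, not the real USB calls.

```c
#include <assert.h>

/* Hypothetical stand-in for an interface and its usage counter. */
struct model_intf {
    int pm_usage;       /* outstanding reasons to stay awake */
    int fail_submit;    /* set to make the model "submit" fail */
};

static int intf_get(struct model_intf *intf)
{
    intf->pm_usage++;
    return 0;
}

static void intf_put(struct model_intf *intf)
{
    assert(intf->pm_usage > 0);     /* a put without a get is a bug */
    intf->pm_usage--;
}

/* Pretend to hand a request to the hardware. */
static int model_submit(struct model_intf *intf)
{
    return intf->fail_submit ? -1 : 0;
}

/* Called from the completion handler when the I/O has finished. */
static void model_io_done(struct model_intf *intf)
{
    intf_put(intf);
}

/* Write path: on success the reference is held until model_io_done(). */
static int model_write(struct model_intf *intf)
{
    int rv = intf_get(intf);

    if (rv < 0)
        return rv;          /* nothing to undo: the get failed */

    rv = model_submit(intf);
    if (rv < 0)
        intf_put(intf);     /* no completion will run: balance here */
    return rv;
}
```

The point of the sketch is that exactly one path drops each reference: the completion handler when the submission succeeded, the write path itself when it did not.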
As I/O
finishes, the counter must be decremented again. This is done in the
completion handler using usb_autopm_put_interface_async().
This example from usbnet shows how to do it.
    static void tx_complete (struct urb *urb)
    {
        /* ... */
        usb_autopm_put_interface_async(dev->intf);
        urb->dev = NULL;
        entry->state = tx_done;
        defer_bh(dev, skb, &dev->txq);
    }
It is literally a one-liner.
The PM message and using the return value of the suspend() method
There's another facet of autosuspend that deserves to be mentioned. In
case all the counters mentioned here don't help, one can benignly fail an
autosuspend by returning -EBUSY from suspend(). If this is
done during a full system suspend, the whole suspend operation will be
aborted. Therefore this should really be limited to autosuspend in rare
cases. Automatic suspend can be detected by testing
the PM_EVENT_AUTO bit in the event field of
the message parameter to suspend().
When suspend is aborted in this way, the core USB subsystem will retry the
autosuspension after the above-mentioned delay.
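A user-space sketch of such a suspend() method follows. The types and the flag values are stand-ins defined locally for illustration (a real driver would use the kernel's pm_message_t and PM_EVENT_AUTO), and the io_pending field is a hypothetical way of saying "the counters did not catch this".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local stand-ins for the kernel's event flags; values are illustrative. */
#define MODEL_PM_EVENT_SUSPEND 0x0002
#define MODEL_PM_EVENT_AUTO    0x0400

struct model_pm_message {
    int event;
};

struct model_dev {
    bool io_pending;    /* hypothetical: work the counters do not cover */
    bool suspended;
};

static int model_suspend(struct model_dev *dev, struct model_pm_message msg)
{
    assert(dev);

    /* Refuse only autosuspend; a system suspend must not be aborted. */
    if ((msg.event & MODEL_PM_EVENT_AUTO) && dev->io_pending)
        return -EBUSY;

    /* A real driver would stop or wait for outstanding I/O here. */
    dev->io_pending = false;
    dev->suspended = true;
    return 0;
}
```

Testing the flag before looking at the device state keeps the refusal confined to autosuspend, so a full system suspend is never aborted by this check.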
Remote wakeup and spontaneous input
Handling input in the same manner as
output hits a fundamental obstacle. The usual semantics of input
operations are that input data a device generates is stored in a
buffer and handed to user space as the read() system call is
executed. A driver cannot normally predict when a device will volunteer input
data.
To overcome this obstacle, USB has a
feature called "remote wakeup". The feature is optional,
but generally supported by devices it makes sense for.
A suspended device using remote wakeup
can tell the system that it would like to transfer input data. The
system is then required to resume the device. The feature can best be
thought of as an analog of interrupts: like interrupts on PCI
devices, remote wakeup with a USB device has to be explicitly enabled.
A driver requests that remote wakeup be
enabled by setting the aptly-named
needs_remote_wakeup flag
in
struct usb_interface. The core USB subsystem will never
autosuspend a device that does not support remote wakeup if any
of its interfaces' drivers request that remote wakeup be enabled.
Let us look at an example of how a
driver requests that remote wakeup be enabled:
From cdc-acm:
    static int acm_tty_open(struct tty_struct *tty, struct file *filp)
    {
        struct acm *acm;

        /* ... */
        if (usb_autopm_get_interface(acm->control) < 0)
            goto early_bail;
        else
            acm->control->needs_remote_wakeup = 1;
        /* ... */
        usb_autopm_put_interface(acm->control);
        /* ... */
    }
Note that a driver has to make sure its
device is active when it requests that remote wakeup be enabled. The
device will automatically be resumed as input data becomes ready
to be transferred. The driver must take care that remote wakeup is
disabled when the device is closed again.
Marking a device busy
Waking up a device has some cost in
time and power; it takes about 40ms to wake up the device. Therefore
staying in the suspended mode for less than a few seconds is not
sensible. As already mentioned, there's a configurable delay between
the time the counters reach zero and the time autosuspend is attempted.
When using remote wakeup, however, the counters remain at zero all the
time unless they are incremented due to output. Yet a delay between the
last time a device is busy (that is, does I/O) and the next attempt to
autosuspend the device is highly desirable.
An API is provided for that purpose:
- usb_mark_last_busy(struct usb_device *);
  Start the delay for the autosuspend anew from now on (safe in atomic
  context).
This function restarts the delay every time it is
called.
Let us look at an example - from
cdc-acm:
    static void acm_read_bulk(struct urb *urb)
    {
        struct acm_ru *rcv = urb->context;
        struct acm *acm = rcv->instance;

        /* ... */
        if (!ACM_READY(acm)) {
            dev_dbg(&acm->data->dev, "Aborting, acm not ready");
            return;
        }
        usb_mark_last_busy(acm->dev);
    }
The driver marks the device busy as it
receives data and then processes the received data.
This way,
autosuspend is attempted only if no input or output was performed
for the duration of the configurable delay.
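The interplay between the counters and the delay can be modeled in a few lines. This is a user-space sketch under simplifying assumptions: time is passed in explicitly as seconds, and the names model_dev, model_mark_last_busy() and model_may_autosuspend() are invented for illustration rather than taken from the USB core.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the state the core consults. */
struct model_dev {
    int pm_usage;       /* sum of the "get"/"put" counters */
    long last_busy;     /* time of the last I/O, in seconds */
    long delay;         /* autosuspend delay, in seconds */
};

/* Like usb_mark_last_busy(): restart the delay from "now". */
static void model_mark_last_busy(struct model_dev *dev, long now)
{
    dev->last_busy = now;
}

/* Would an autosuspend attempt at time "now" be allowed to go ahead? */
static bool model_may_autosuspend(const struct model_dev *dev, long now)
{
    assert(dev->delay >= 0);
    return dev->pm_usage == 0 && now - dev->last_busy >= dev->delay;
}
```

With a delay of two seconds, input arriving once per second keeps the device awake indefinitely even though its counters never leave zero, which is the behavior wanted for a device that relies on remote wakeup.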
Sleepless in the kernel
What is to be done if a driver cannot sleep in its write path? In that case
a simple solution can no longer be given. The driver needs to call
usb_autopm_get_interface_async() for every call to the write path,
just as
in the above example. The difference is that the driver cannot be sure that
the device is active after the call. Obviously, since it cannot wait for the
device to become active, I/O must be queued.
From usbnet's usbnet_start_xmit():
    spin_lock_irqsave(&dev->txq.lock, flags);
    retval = usb_autopm_get_interface_async(dev->intf);
    if (retval < 0) {
        spin_unlock_irqrestore(&dev->txq.lock, flags);
        goto drop;
    }

    #ifdef CONFIG_PM
    /* if this triggers the device is still asleep */
    if (test_bit(EVENT_DEV_ASLEEP, &dev->flags)) {
        /* transmission will be done in resume */
        usb_anchor_urb(urb, &dev->deferred);
        /* no use to process more packets */
        netif_stop_queue(net);
        spin_unlock_irqrestore(&dev->txq.lock, flags);
        devdbg(dev, "Delaying transmission for resumption");
        goto deferred;
    }
    #endif
The
asynchronous API is used and errors handled. After that, if the
device is still asleep, I/O is queued. The
queued I/O must be actually started in resume().
From usbnet's usbnet_resume():
    spin_lock_irq(&dev->txq.lock);
    while ((res = usb_get_from_anchor(&dev->deferred))) {
        skb = (struct sk_buff *)res->context;
        retval = usb_submit_urb(res, GFP_ATOMIC);
        if (retval < 0) {
            dev_kfree_skb_any(skb);
            usb_free_urb(res);
            usb_autopm_put_interface_async(dev->intf);
        } else {
            dev->net->trans_start = jiffies;
            __skb_queue_tail(&dev->txq, skb);
        }
    }

    smp_mb();
    clear_bit(EVENT_DEV_ASLEEP, &dev->flags);
    spin_unlock_irq(&dev->txq.lock);
Here, I/O
requests are taken from the queue and given to the hardware. Care
must be taken to handle the counters correctly in the error case.
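Stripped of locking and of everything specific to networking, the queue-and-flush pattern looks roughly like the user-space model below. It is a sketch only: the names (model_dev, model_xmit(), model_resume(), model_complete()) and the fixed-size queue are invented for illustration, and unlike usbnet the model keeps accepting requests while the device is asleep until its queue is full.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_QUEUE_LEN 8   /* arbitrary bound for this sketch */

struct model_dev {
    bool asleep;                    /* device has not been resumed yet */
    int deferred[MODEL_QUEUE_LEN];  /* requests waiting for resume */
    int num_deferred;
    int pm_usage;                   /* one reference per live request */
    int submitted;                  /* requests handed to the "hardware" */
};

/* Non-sleeping write path: 0 if submitted or queued, -1 if dropped. */
static int model_xmit(struct model_dev *dev, int request)
{
    dev->pm_usage++;                /* the asynchronous "get" */
    if (dev->asleep) {
        if (dev->num_deferred == MODEL_QUEUE_LEN) {
            dev->pm_usage--;        /* dropped: balance the counter */
            return -1;
        }
        dev->deferred[dev->num_deferred++] = request;
        return 0;                   /* will be started from resume */
    }
    dev->submitted++;
    return 0;
}

/* Resume: start everything that was queued while the device slept. */
static void model_resume(struct model_dev *dev)
{
    for (int i = 0; i < dev->num_deferred; i++)
        dev->submitted++;
    dev->num_deferred = 0;
    dev->asleep = false;
}

/* Completion of one request: the asynchronous "put". */
static void model_complete(struct model_dev *dev)
{
    assert(dev->pm_usage > 0);
    dev->pm_usage--;
}
```

Each request holds one reference from the moment it enters the write path until it completes or is dropped, whether or not it spent time in the deferred queue.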
A driver's not so little helpers
Usbnet implements both forms of
autosuspend for its subdrivers. If a subdriver sets
supports_autosuspend, it gets the simple form of autosuspend.
If, instead, it defines the following method, it also gets
runtime power management while the interface is up:
- manage_power(struct usbnet *dev, int on);
  Manage remote wakeup according to on (may sleep).
This function is supposed to set
needs_remote_wakeup based on the value of "on".
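What such a method might look like is sketched below as a user-space model. The structures are minimal stand-ins invented for illustration, and the int return type with zero meaning success is an assumption, since the prototype above does not show one.

```c
#include <assert.h>

/* Minimal stand-ins for struct usb_interface and struct usbnet. */
struct model_interface {
    unsigned int needs_remote_wakeup:1;
};

struct model_usbnet {
    struct model_interface *intf;
};

/* Request remote wakeup when "on" is nonzero, release it otherwise. */
static int model_manage_power(struct model_usbnet *dev, int on)
{
    assert(dev && dev->intf);
    dev->intf->needs_remote_wakeup = on ? 1 : 0;
    return 0;   /* assumed convention: zero on success */
}
```

A subdriver whose hardware needs nothing beyond remote wakeup would have little more to do than this.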
Conclusion
I've
tried to show how, in most cases, significant power savings can be had
with little effort. I hope that many coders will find this useful in
their work. In runtime power management the whole is more than the
sum of the parts. Remember that all of a device's interfaces must
support autosuspend for a device to be autosuspended and all of a hub's
children must be suspended for the hub to be suspended. In this case
the chain breaks at the weakest link. Thus I hope every driver developer makes
at least a small effort to consider runtime power management.
[ The author would like to thank B1-Systems for their support. ]
Page editor: Jonathan Corbet