
Kernel Summit 2005: The power management summit

July 20, 2005

This article was contributed by Patrick Mochel.

On Sunday, 17 July 2005, several kernel developers met to discuss power management, with the goal of sorting out some of the details that have been causing much disagreement and confusion in the last few years. In Kernel Land these days, such a meeting is called a "Summit," and so, for 8 hours this week, the first Linux Power Management Summit took place.

Power Management is a big, complicated topic with many things working against it. Instead of being contained in a single subsystem or being relevant on a single architecture, it has the potential to affect users of nearly every type of computer. Furthermore, it can mean one of a number of things to different people, depending on the platform most familiar to them: system suspend states, CPU performance scaling, runtime power management, or general efficiency. And many of those things can behave very differently depending on the CPU architecture and platform. Discussions can get lively. Our goal on Sunday was to sit down and determine what we could agree upon.

The attendees of the summit were:

  • Pavel Machek (Novell)
  • Vojtech Pavlik (Novell)
  • Nigel Cunningham (Cyclades)
  • Benjamin Herrenschmidt (IBM)
  • Len Brown (Intel)
  • Alexey Starikovskiy (Intel)
  • Greg Kroah-Hartman
  • Patrick Mochel

Even though there are many more people with a vested interest in power management, and some of the interested parties maintain more embedded systems than one can shake a USB memory stick at, the goal for this initial meeting was to keep the group small, focused, and restricted to those most active on general PM infrastructure. As such, the group was most concerned with x86 systems, especially notebook computers.

Because of our expertise, we wanted to focus on the two main concerns of users of those systems: system power management (where the entire system goes to a low power state, e.g. suspend-to-RAM and suspend-to-Disk) and runtime power management (where individual devices selectively or automatically enter low power states when not in use). The two other main topics in most people's minds, CPU performance scaling and embedded power management, were touched upon briefly.

System power management

System power management is well known to users of all notebook computers. For a long time, it was known as those great features that worked more or less flawlessly on other operating systems, and not at all on Linux. That has changed quite a bit, especially in the last year. At least one major distribution enables suspend-to-disk by default and allows users to use suspend-to-RAM (though with the caveat that it may not work).

Perception

We still have some big problems with system power management, the largest of which is perception. Many people believe, based on past experiences, that it's unstable, that it has a tendency to corrupt users' data, and that the code is unmanageable. The happy users will tell you otherwise. It works reliably on many systems, and has even been ported to the PowerPC by Ben. Both Pavel and Nigel assured the group that they've received no reports of data corruption in a long time.

Many kernel developers have a reluctance to test or audit Linux power management code, which many believe is holding it back. Even after this author implored Kernel Summit attendees last year to at least try it, it's unlikely that many people have. It's unclear how to change people's perception, but the PM summit attendees realize that the key to its success is wider adoption and acceptance.

Drivers

The majority of issues that arise with system suspend states are related to drivers. The most serious issue today is with video drivers when resuming from a suspend-to-RAM state. On many systems, Linux is responsible for reinitializing the video hardware and restoring it to its previous state. Unfortunately, this is a very difficult task, considering the complexity of the video chipsets, and the documents necessary to do so are rarely, if ever, distributed by the hardware vendors.

Len Brown assured the group that Intel is putting pressure on BIOS writers and system vendors with Intel chipsets to support Linux especially with regard to power management. If this works out as well as planned, it means that the BIOS will eventually reinitialize the video chipset when resuming, so Linux won't have to worry about it. However, this will only be true for platforms with Intel video hardware.

For everything else, PM summit attendees came to the conclusion that there is little the PM core can, or should, do. It is the video driver's responsibility to restore the device to a usable state. Just because there are competing video drivers in the kernel, and still more reside outside of the kernel, they shouldn't be treated specially. Since there seems to be a general trend towards moving video drivers out of the kernel (and into e.g. X), there was some discussion about the proper way to support that using an in-kernel video driver stub (since the kernel can't safely access the video hardware even to print a character, it is better done early in the process rather than waiting for the switch back to userspace and trying to suppress all console access).

When entering suspend-to-RAM, a video driver should disable the console. If it can reinitialize the card when resuming from RAM, then it should do so. If there is an application or library in userspace that can, or will, do so, it should create a kernel thread to run the program. This userspace helper should be self-contained, do its job quickly, and return to kernel space, where the kernel thread should exit and the driver should re-enable the console.
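
The ordering described above can be sketched as a small decision function. This is only a toy model in Python, not kernel code: the function name, the step strings, and the two boolean parameters are hypothetical, and the behavior when neither the driver nor a userspace helper can reinitialize the card is an assumption, since the summit notes do not cover that case.

```python
from typing import List


def suspend_plan() -> List[str]:
    """Steps a toy video driver takes when entering suspend-to-RAM."""
    return ["disable console"]


def resume_plan(driver_can_reinit: bool, helper_available: bool) -> List[str]:
    """Ordered steps a toy video driver takes when resuming from RAM."""
    if driver_can_reinit:
        # The driver restores the card itself; no userspace involvement.
        return ["driver reinitializes card", "re-enable console"]
    if helper_available:
        # A kernel thread runs a short, self-contained userspace program
        # and exits before the console is switched back on.
        return [
            "create kernel thread",
            "run userspace helper",
            "kernel thread exits",
            "re-enable console",
        ]
    # Assumption: nothing can restore the card, so the model leaves the
    # console disabled rather than touching uninitialized hardware.
    return []
```

In both working paths the console is re-enabled last, which matches the requirement that nothing prints to the video hardware before it has been restored.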

Greg Kroah-Hartman mentioned that he had already volunteered to implement the correct support for an ATI Radeon chipset. Most likely this will serve as a positive example for other developers to follow.

Suspend2 and Software Suspend

There was agreement among the attendees that Nigel Cunningham's suspend-to-disk patches ("Suspend2") are stable and worthwhile to many users. It was suggested that he begin the process of merging his patches with Pavel Machek's in-kernel software suspend implementation. A lengthy discussion followed about strategies for doing so and the philosophy of gradual kernel development.

To briefly recap: Suspend2 is very robust and feature rich. Not only does it include a reliable process freezer, it has the ability to compress and encrypt the suspended image and includes a graphical status bar. Although it apparently does receive positive reviews from users, most kernel developers do not care about such eye candy. It was suggested and agreed that Nigel will split the patches (all 69 of them so far) into functional groups, and push them separately. We agreed that the process freezer patches would come first, which should also benefit the existing suspend implementation. Next will most likely be the new algorithmic core and eventually the plugin architecture and graphical features. It was heavily stressed that Nigel and Pavel must work together and that the more effort that is put into making the patches smaller and simpler, the easier it will be to merge this work.

Other Issues

There were three other issues related to system power management that were discussed at the PM Summit.

  • Suspend flags. It was agreed that we need to pass different flags via the pm_message_t argument to individual drivers' suspend and resume methods.

  • The 2.6.13 kernel will impose greater requirements on the suspend and resume methods of PCI drivers. They must now release their IRQ on suspend and reacquire it on resume. This requirement is documented in Documentation/power/pci.txt, and is based on the recent ACPI changes to not save/restore the PCI IRQ Link objects from the ACPI namespace.

  • There was a potential issue brought up about BIOS reserved pages. Pavel suspects that the suspend code should not save them because there have been some odd interactions with regard to ACPI when restoring them (since they may contain shared data which seems to be changing between the time that the system is turned on and the image is restored).
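
The new requirement on PCI drivers (release the IRQ on suspend, reacquire it on resume) can be illustrated with a toy model. This is a Python sketch, not kernel code: `ToyIrqTable`, `ToyPciDriver`, and their methods are hypothetical names, the table ignores IRQ sharing for simplicity, and passing the IRQ number in again at resume is an assumption made to show that the driver does not rely on state cached before the suspend.

```python
class ToyIrqTable:
    """Tracks which driver owns each IRQ line (illustrative, no sharing)."""

    def __init__(self):
        self.owners = {}

    def request(self, irq, owner):
        if irq in self.owners:
            raise RuntimeError(f"IRQ {irq} already owned by {self.owners[irq]}")
        self.owners[irq] = owner

    def free(self, irq, owner):
        if self.owners.get(irq) != owner:
            raise RuntimeError(f"{owner} does not own IRQ {irq}")
        del self.owners[irq]


class ToyPciDriver:
    """A driver that holds its IRQ only while the device is awake."""

    def __init__(self, name, irq_table):
        self.name = name
        self.irq_table = irq_table
        self.irq = None

    def probe(self, irq):
        self.irq_table.request(irq, self.name)
        self.irq = irq

    def suspend(self):
        # Release the IRQ before the device goes to sleep.
        self.irq_table.free(self.irq, self.name)
        self.irq = None

    def resume(self, irq):
        # Reacquire the IRQ; the number is supplied afresh (assumption)
        # instead of being remembered from before the suspend.
        self.irq_table.request(irq, self.name)
        self.irq = irq
```

A driver written this way holds no interrupt line while suspended, so it does not depend on interrupt routing state being saved and restored across the sleep.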

Runtime power management

The PM summit attendees had hoped to spend a considerable amount of time discussing runtime power management. For better or worse, the discussions had to be completed within just a couple of hours. This left less time for brainstorming, but we managed to condense the discussion down to a list of commonly agreed upon items.

  • The driver model needs a "bus instance" data type.

    This would be an object that is created for each bus present on the system, regardless of type of bus (PCI, USB, SCSI, etc). This will be used for a number of reasons, in this context for keeping track of the power states of each device.

  • Drivers are responsible for knowing and tracking when a device is idle.

    How this happens is up to the driver, and it will probably be common across a device class (e.g. sound, networking). We need some good examples of this working to a) show others how to do it, and b) define the requirements for some common infrastructure (via struct device or struct class_device) to help this effort.

    When a driver detects "idleness," it can transition the device to a low-power state automatically after a certain amount of time. The amount of time and the exact power state to enter should be controlled via files in sysfs. We need a framework (some helpers) to export these attributes via sysfs, but it will be the responsibility of some early adopters to implement these things on their own.

    When a device is automatically powered down, the driver must resume it when requests come in. Whether this happens on open(), read() or socket() is up to the driver and most likely going to be common to the class.

  • Drivers need to bubble their "idleness" up the device tree.

    When a device automatically suspends, it must somehow notify the bus it resides on (using the bus instance mentioned above). When all the devices on the bus are put into a low power state, the bus must go into a low-power state and notify its parent bus.

    This feature can save a lot of power on many laptop systems. USB is the "Holy Grail" of this area. It causes a lot of power to be consumed even when there are no USB devices being used (by raising IRQs and keeping the CPU from staying in a low-power state). However, USB is going to be difficult to convert to this model.

  • We need an interface for userspace to power down a specific device and a sub-tree of devices.

    We also need an attribute exported for at least some devices that will specify whether or not the device should wake up automatically when a request comes in (or whether it should wait until userspace specifically wakes it up).

  • We want a separate hierarchy for power management dependencies.

    This would be represented via a distinct object type and exported via sysfs. It would allow both runtime and system power management to accurately and easily traverse the electrical hierarchy, without having to have the drivers make a lot of special case checks to determine what device is the next to power down (which is impossible most of the time because the core cannot discern the power hierarchy).
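
Several of the items above (idle timeouts, waking on request, and bubbling idleness up to the bus) fit together in a small model. The following Python sketch is a toy illustration under stated assumptions, not a proposed kernel interface: `PowerNode` and all of its methods are hypothetical, `idle_timeout` stands in for a sysfs-controlled attribute, and time is passed in explicitly as a plain number.

```python
from typing import List, Optional


class PowerNode:
    """A device or a bus instance in a toy device tree."""

    def __init__(self, name: str, parent: Optional["PowerNode"] = None,
                 idle_timeout: float = 5.0) -> None:
        self.name = name
        self.parent = parent
        self.children: List["PowerNode"] = []
        self.low_power = False
        self.idle_timeout = idle_timeout  # stand-in for a sysfs attribute
        self.last_used = 0.0
        if parent is not None:
            parent.children.append(self)

    def suspend(self) -> None:
        """Enter the low-power state and notify the parent bus."""
        self.low_power = True
        if self.parent is not None:
            self.parent.child_suspended()

    def child_suspended(self) -> None:
        """A bus powers down once every device on it has powered down."""
        if all(child.low_power for child in self.children):
            self.suspend()

    def wake(self) -> None:
        """Power up ancestors first (top-down), then this node."""
        if self.parent is not None and self.parent.low_power:
            self.parent.wake()
        self.low_power = False

    def request(self, now: float) -> None:
        """An incoming request resumes the device and resets its idle timer."""
        self.wake()
        self.last_used = now

    def poll(self, now: float) -> None:
        """Suspend a leaf device that has been idle for idle_timeout."""
        idle_for = now - self.last_used
        if not self.children and not self.low_power and idle_for >= self.idle_timeout:
            self.suspend()
```

For example, with a `usb1` bus under `pci0` and two USB devices on it, the bus (and then `pci0`) only drops to low power after the second device times out, and a later request to either device wakes `pci0`, then `usb1`, then the device, while the other device stays asleep.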

In short, there's a lot to do. A lot of this work is in the power management and driver model core code. This means that once it's written, it should be correct and stable. However, this also means it will take some time to get right and will require some heavy lifting by a small number of individuals. The general sentiment of the summit was that everyone would like to see this work done but all of the individuals present are already oversubscribed. It may be some time before this work could even be started.

Embedded Systems and power management

Since there were no summit attendees that currently work full time on embedded systems, the attendees did not want to make assertions about the different systems and power management schemes. However, the summit attendees chose to come to agreement on what they knew about the embedded state of things (even if it was very little).

  • The maintainers of the driver model and power management cores need the different embedded camps to work together and come up with some common framework among themselves.

    There are several different power management infrastructures for embedded systems (CELF, DPM from MontaVista, etc). They each support a number of systems and have happy users. But, it's unclear whether they are compatible or conflict with one another. The maintainers cannot determine this on their own and cannot merge all of the competing schemes.

  • The embedded camps need to review the changes for runtime power management as they happen and suggest changes that can be made to better facilitate their effort.

    It is unreasonable to expect the runtime power management implementors to accommodate every unique PM scheme. However, it is their responsibility to not implement code that will prevent some platform port from realizing its fullest potential by enforcing poor policy on the platform.

    It is the responsibility of people like embedded developers to notify the implementors of these potential issues.

Conclusion

The attendees of the power management summit agreed that the session was valuable to the progress of the project. It was the first time they had all sat down in a room together and talked about the project. There were many power management topics that were left untouched, including many that are in the forefront of many other developers' and vendors' minds. Most agree that it will take many days, if not weeks, to discuss all of the issues, let alone implement all of the necessary infrastructure and features. More than anything, the PM summit set the stage for many future face-to-face interactions on the topic.



For me it is a downhill....

Posted Jul 20, 2005 13:09 UTC (Wed) by nhasan (guest, #1699) [Link]

I have a Dell I8K and I have been using SUSE on it since I bought it. My experience with power management on this laptop is going from bad to worse in the suspend area. At first I was able to Suspend-to-RAM and restore successfully even while running X. After upgrading to SUSE 9.1, I could no longer successfully restore the system if I suspended while in X. I had to exit to console to suspend. Recently, I upgraded to SUSE 9.3 and now that does not work either and I am forced to shutdown every time. What a waste of time.

For me it is a downhill....

Posted Jul 31, 2005 9:04 UTC (Sun) by marineam (subscriber, #28387) [Link]

I also have a Dell I8K, and have managed the reverse, although I run Gentoo, not SuSE. Using Suspend2's hibernate script, which unloads modules for troublesome hardware, clears up most problems. Both suspend to RAM and suspend to disk work using that script. The only tricky part is doing suspend to RAM with the framebuffer enabled. With no framebuffer, Gentoo's Xorg can take care of reinitializing the video card (ATI), but to suspend with the console framebuffer enabled a kernel patch is required. I have no idea what the status of the Nvidia video cards is though.

For me it is a downhill....

Posted Aug 8, 2005 21:18 UTC (Mon) by barrygould (guest, #4774) [Link]

I have a Dell D800, which I believe is about the same as the I8000.

On it, enabling the power-on password in the BIOS solves the video-reset problem when using the NV driver.

However, it does not work if using NVidia's driver.

Barry

For me it is a downhill....

Posted Aug 16, 2005 18:15 UTC (Tue) by EricBackus (guest, #2816) [Link]

I have a Dell I5000e, which I believe is one generation older than the
I8000. I've had mixed results with power management, but I can't really
say it's been downhill, more sideways.

Originally, I used APM, which allowed me to suspend-to-RAM successfully
and also allowed some kind of "standby" mode. The kernel didn't support
suspend-to-disk, but the BIOS actually has a built-in suspend-to-disk that
can be used (but don't use it, it's very slow and somewhat buggy).

Over time, the ACPI code in the kernel got to where it works fairly well
with this laptop, and today that's what I use. I can suspend-to-disk with
this. However, suspend-to-RAM doesn't work at all, nor does standby, and
ACPI goes crazy (dumping messages to /var/log/messages fast enough to fill
the disk quickly) if I close the laptop lid when the laptop is powered up.
I'm running SuSE 9.3 now, so I can verify that suspend-to-disk works on
the I5000e.

In spite of Linus's comment, I'd have to say that suspend-to-disk is the
most important power management mode, so I'm actually reasonably happy
today.

The power management summit

Posted Jul 20, 2005 13:50 UTC (Wed) by riteshsarraf (subscriber, #11138) [Link]

I don't know how Suspend2 can be termed stable.
For me, so far, only the in-kernel suspend code has been working stably, and it has done so for months.
With every new release of Suspend2 I try, it fails. It probably needs more testing before a merge.

The power management summit

Posted Jul 21, 2005 2:24 UTC (Thu) by NCunningham (guest, #6457) [Link]

There is plenty of documentation on the Suspend2 web site (http://suspend2.net), and also a mailing list available (also accessible through the site). If you ask on there, I'm sure we'll be able to help you.

Problems are almost always related to a complete or partial lack of driver support in some area. The most common areas are USB and DRI/DRM. If USB is built as modules and unloaded while suspending, and DRI is disabled, you can work around the issue. I know and agree that these are not the best solutions and you shouldn't have to do them, but I can't fix the drivers myself.

If your issue turns out to be a real suspend2 bug, let me know and I'll seek to fix it. I can't do so, however, if I don't know there's a problem.

Hope this helps.

Nigel (suspend2 kernel patch author).

The power management summit

Posted Jul 20, 2005 16:21 UTC (Wed) by josh_stern (guest, #4868) [Link]

Resuming after suspension on WinXP doesn't work all that reliably either, especially in the video area, even though the people who wrote the modules presumably had access to all the available card specs.

Regarding the issue with virtual consoles and video cards with lots of RAM - 1 GB of video ram *is* a power user who could be expected to afford a few dollars extra for a few GB more of swap space, so it seems to me that the preferred approach would be to treat the video memory as a special part of the virtual memory system that gets pre-paged to swap at a very low priority and perhaps with some built in latency - in that case there would usually be little extra delay when switching.

The power management summit

Posted Jul 30, 2005 13:27 UTC (Sat) by niner (subscriber, #26151) [Link]

> Regarding the issue with virtual consoles and video cards with lots of RAM - 1 GB of video ram *is* a power user who could be expected to afford a few dollars extra for a few GB more of swap space

Even cheap consumer cards (starting at 46 Euro here in Austria) ship with 256MB RAM and more, and this will only rise. So maybe 1GB is a power user now, but in one or two years it could be pretty much standard. And even 256MB is something to worry about.

The power management summit

Posted Jul 20, 2005 19:59 UTC (Wed) by tjw.org (guest, #20716) [Link]

I count myself lucky that my laptop is too old to support ACPI.

What was so bad about APM anyway? It sure works nice for me:

1) open lid and push space bar (wakes up)

2) my X session is restored in about 1-2 seconds (even before the disk finishes spinning up)

3) close lid on laptop and the BIOS puts it into APM suspend-to-RAM. The equiv of running the 'apm --suspend' command.

Now compare this to my "modern" ACPI desktop machine. Until recently Linux wouldn't even handle suspend-to-RAM (I'm not even sure if it does now). Windows, which does support it, takes about 30-45 seconds to wake up!

That's progress I guess.

The power management summit

Posted Jul 26, 2005 16:19 UTC (Tue) by shane (subscriber, #3335) [Link]

I tend to agree. I have an old Dell Pentium 133 laptop that suspends to
RAM and restores just fine. Although the battery only lasts a few
minutes away from AC, it is enough to move the machine around, and I use
suspend to RAM rather than powering the machine off since a reboot is so
painful.

My new Dell Latitude D400 has a known-bad ACPI, and no APM. Restore from
suspend to disk fails when it is suspended in the docking station, and gets
confused about wireless networking at all times.

Of course, I haven't tried suspend to anything in 6 months. I'll give it
a shot tonight. :)

The power management summit

Posted Jul 31, 2005 2:14 UTC (Sun) by lotso (guest, #31409) [Link]

For those uninitiated with suspend2, it is by far more stable and faster than what's currently available with swsusp which is in mainline. You really have to try it out to check out how cool it is. On my Dell D600, suspend and resume has been flawless since I installed it a few months ago.

It's also quick: resuming from 1.5GB of swap (I have 1.5GB of RAM) takes 30 seconds from powerup to X.

If you really are having issues with it, I suggest you do what Nigel suggested, subscribe to the suspend2 mailing list and ask. We're all very helpful (as all FLOSS users are).

Also, check out the article on suspend2 on the MyOSS Magazine. It's in Edition 1 of the Magazine.

http://mag.my-opensource.org is the address.

(yeah, the last bit there was promotion of the mag.)

Copyright © 2005, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds