Major suspend changes

June 28, 2006

This article was contributed by Greg Kroah-Hartman.

The past two weeks have seen a very long email thread about the future of suspend in Linux. No, not that other type of suspend; this one is all about what users really want: a working suspend to RAM.

It all started out with a few simple patches from Linus that implemented a framework for allowing a way to debug problems during suspend, but quickly spiraled out of control into rants about how badly the kernel handles suspend issues today:

> I think you are trying to change a model that is not broken...

Bzzt. Thank you for playing.

The fact is, this thing has been broken for years. At some point,
we have to just accept the fact that it's not just "drivers".
There's something else that is broken, and I bet it's the model.

To how wrong everyone has been over the years in how suspend should really work:

See? WE DO NOT DO THIS. I told people we needed to do this _years_
ago. I tried to push through the two-phase suspend. I tried to
explain why. I clearly failed, because we do _nothing_of_the_sort_
right now.

Instead, the "please suspend" thing to the devices is a
single-phase "put yourself into D3", with no support for a
separate "please save your state" call. Crap.

After arguing this last point over many emails, Linus did what anyone who wants to prove a point should do: he wrote a working patch that implements his proposed changes.

To fully understand the problem, let us look at the interface that the kernel provides to drivers today for handling suspend. When the kernel wants to shut devices down (for some kind of suspend action), the whole device tree is walked, and the suspend callback is called for each device.

For PCI devices, this callback looks like:

int (*suspend) (struct pci_dev *dev, pm_message_t state);

The pointer to the PCI device that is about to be suspended is passed to the driver, along with the state that the kernel wants to go into. Within this single function, the driver is responsible for doing all suspend tasks needed for the device.

The first problem is that a device that cannot be suspended at that moment has to go to great lengths to tell the core that it should be called again (it returns -EAGAIN and hopes that the core will call it back). The bigger problem is that the driver is responsible for shutting the device down entirely within this one function. That makes it hard for the kernel to do things like system snapshots, and it leaves no good answer for a driver that cannot get enough memory to save the device state before suspending.
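To make the problem concrete, here is a minimal user-space sketch of a single-phase callback. It is not kernel code: struct mock_pci_dev, mock_pm_message_t, mock_alloc() and the busy flag are hypothetical stand-ins invented for illustration, and only the shape of the callback (a device pointer plus a state) comes from the prototype above.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for kernel types; not the real definitions. */
struct mock_pci_dev {
    unsigned char regs[64];   /* pretend device registers */
    unsigned char *saved;     /* state buffer, allocated at suspend time */
    int powered;              /* 1 = running, 0 = powered down (D3) */
    int busy;                 /* 1 = device cannot stop right now */
};

typedef int mock_pm_message_t;

/* Test knob: set to nonzero to simulate an out-of-memory condition. */
static int mock_alloc_should_fail;

static void *mock_alloc(size_t n)
{
    return mock_alloc_should_fail ? NULL : malloc(n);
}

/*
 * Single-phase suspend: one call has to check readiness, allocate
 * memory, save state, and power the device down.
 */
static int mock_suspend(struct mock_pci_dev *dev, mock_pm_message_t state)
{
    (void)state;

    if (dev->busy)
        return -EAGAIN;       /* hope the core calls us again */

    dev->saved = mock_alloc(sizeof(dev->regs));
    if (!dev->saved)
        return -ENOMEM;       /* failure discovered in the middle of suspend */

    memcpy(dev->saved, dev->regs, sizeof(dev->regs));
    dev->powered = 0;         /* saving state and powering down are inseparable */
    return 0;
}
```

In this sketch the allocation failure only surfaces once the system-wide suspend is already under way, and there is no point at which the state has been saved but the device is still running.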

Another big issue is that the "class" cores should be handling most of the suspend process, instead of the individual drivers. For example, the network core should shut down the transmit queues and quiesce traffic on behalf of the drivers, so that each driver does not need to do this itself. This last point is the biggest change in Linus's model and (in this author's opinion) the most important one.
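The idea of a class core doing the common work can be sketched in user space as follows. All names here (mock_netdev, mock_netdev_ops, mock_net_class_suspend, mock_nic_hw_suspend) are hypothetical and chosen only for illustration; they are not the network core's real interfaces, and the patch itself may divide the work differently.

```c
#include <stddef.h>

struct mock_netdev;

/* Per-driver hook: only the hardware-specific part is left to the driver. */
struct mock_netdev_ops {
    int (*hw_suspend)(struct mock_netdev *nd);
};

struct mock_netdev {
    const struct mock_netdev_ops *ops;
    int tx_queue_running;     /* 1 = transmit queue active */
    int hw_on;                /* 1 = hardware powered */
};

/* Class-level suspend: the common work is written once, for every driver. */
static int mock_net_class_suspend(struct mock_netdev *devs[], size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        int err;

        devs[i]->tx_queue_running = 0;            /* common: stop the queue */
        err = devs[i]->ops->hw_suspend(devs[i]);  /* driver-specific part */
        if (err)
            return err;
    }
    return 0;
}

/* An example driver: it no longer touches the transmit queue itself. */
static int mock_nic_hw_suspend(struct mock_netdev *nd)
{
    nd->hw_on = 0;
    return 0;
}

static const struct mock_netdev_ops mock_nic_ops = {
    .hw_suspend = mock_nic_hw_suspend,
};
```

The point of the design is that the queue handling lives in one place, so a bug there is fixed once rather than in each and every driver.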

So Linus's patch changes the suspend process into a series of distinct steps:

All devices start out on a list called dpm_active and are, as indicated, "active" and up and running.
A new callback is called for every device in the global device tree. This callback is called suspend_prepare and has the same arguments that the current suspend callback has for each individual bus type. In this function, the devices are not allowed to disconnect themselves from the kernel (like USB devices disconnecting themselves to shut down), and the drivers for these devices need to do everything necessary to be ready to suspend the device some time in the future. This usually entails allocating any needed memory to save the device state, or other kinds of housekeeping. Anything that might possibly fail should be done here, and if something bad happens, the error should be reported. Drivers can call functions that might sleep here, as interrupts are not disabled.
The kernel then iterates over all of the devices on the dpm_active list, moving each one to the dpm_off list and calling the suspend callback for its subsystem (which is new). After the subsystem suspend, the bus suspend callback is called.
Interrupts are now disabled in the system.
Then the kernel iterates over all of the devices on the dpm_off list and moves them to the dpm_off_irq list, while calling a new callback called suspend_late().
After this is complete, the system can be suspended by putting the CPU into whatever sleep level is desired.
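The suspend sequence above can be sketched in user space roughly as follows. This is an illustration, not the patch: the list names and the suspend_prepare and suspend_late callbacks come from the description above, while struct mock_device, mock_cb, mock_irqs_enabled and mock_prepare_nomem are hypothetical, and list membership is modeled as a simple per-device tag rather than real linked lists. Unwinding after a failure is deliberately left out to keep the happy path visible.

```c
#include <errno.h>
#include <stddef.h>

enum mock_list { DPM_ACTIVE, DPM_OFF, DPM_OFF_IRQ };

struct mock_device;
typedef int (*mock_cb)(struct mock_device *dev);

struct mock_device {
    enum mock_list list;      /* which list the device is currently on */
    mock_cb suspend_prepare;  /* may sleep, may fail */
    mock_cb class_suspend;    /* subsystem-level suspend */
    mock_cb bus_suspend;      /* bus-level suspend */
    mock_cb suspend_late;     /* runs with interrupts disabled */
};

static int mock_irqs_enabled = 1;

/* A missing callback counts as success. */
static int mock_call(mock_cb cb, struct mock_device *dev)
{
    return cb ? cb(dev) : 0;
}

/* Example prepare callback that fails, as if an allocation had failed. */
static int mock_prepare_nomem(struct mock_device *dev)
{
    (void)dev;
    return -ENOMEM;
}

static int mock_suspend_all(struct mock_device *devs, size_t n)
{
    size_t i;
    int err;

    /* Prepare phase: every device stays on dpm_active. */
    for (i = 0; i < n; i++) {
        err = mock_call(devs[i].suspend_prepare, &devs[i]);
        if (err)
            return err;       /* nothing has been shut down yet */
    }

    /* Suspend phase: subsystem first, then bus; dpm_active -> dpm_off. */
    for (i = 0; i < n; i++) {
        err = mock_call(devs[i].class_suspend, &devs[i]);
        if (!err)
            err = mock_call(devs[i].bus_suspend, &devs[i]);
        if (err)
            return err;       /* unwinding omitted in this sketch */
        devs[i].list = DPM_OFF;
    }

    mock_irqs_enabled = 0;    /* interrupts are now disabled */

    /* Late phase: dpm_off -> dpm_off_irq. */
    for (i = 0; i < n; i++) {
        err = mock_call(devs[i].suspend_late, &devs[i]);
        if (err)
            return err;       /* unwinding omitted in this sketch */
        devs[i].list = DPM_OFF_IRQ;
    }
    return 0;
}
```

A device whose suspend_prepare fails stops the whole sequence while every device is still on dpm_active and interrupts are still enabled, which is why anything that might fail belongs in that phase.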

To resume the system, the kernel reverses the order of manipulating the device lists and does the following steps:

The kernel iterates over the dpm_off_irq list and moves the devices to the dpm_off list while calling a new callback called resume_early.
Interrupts are enabled.
The kernel iterates over all of the devices on the dpm_off list and moves them to the dpm_active list, while calling the resume callback (first the bus specific resume function, followed by the class specific resume.)

This new scheme allows the kernel to handle errors properly if anything goes wrong during the suspend process. For example, if an error occurs during the suspend_late phase, then only the devices already on the dpm_off_irq list will have their resume_early callback called, so the system is resumed in the proper order and recovers from the error cleanly.
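The recovery case can be sketched the same way. Again, this is a user-space illustration under stated assumptions, not the patch: struct mock_device, the fail_late knob and the call counters are hypothetical, the earlier phases are assumed to have succeeded, and walking the devices in reverse order during resume is an assumption of this sketch.

```c
#include <errno.h>
#include <stddef.h>

enum mock_list { DPM_ACTIVE, DPM_OFF, DPM_OFF_IRQ };

struct mock_device {
    enum mock_list list;      /* which list the device is currently on */
    int fail_late;            /* test knob: make suspend_late fail */
    int resume_early_calls;   /* how often resume_early ran */
    int resume_calls;         /* how often resume ran */
};

static int mock_irqs_enabled = 1;

static int mock_suspend_late(struct mock_device *dev)
{
    return dev->fail_late ? -EIO : 0;
}

/* Normal resume path; also used to unwind a partly completed suspend. */
static void mock_resume_all(struct mock_device *devs, size_t n)
{
    size_t i;

    /* resume_early: only devices that actually reached dpm_off_irq. */
    for (i = n; i-- > 0; ) {
        if (devs[i].list == DPM_OFF_IRQ) {
            devs[i].resume_early_calls++;
            devs[i].list = DPM_OFF;
        }
    }

    mock_irqs_enabled = 1;    /* interrupts are enabled again */

    /* resume: everything on dpm_off goes back to dpm_active. */
    for (i = n; i-- > 0; ) {
        if (devs[i].list == DPM_OFF) {
            devs[i].resume_calls++;
            devs[i].list = DPM_ACTIVE;
        }
    }
}

static int mock_suspend_all(struct mock_device *devs, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        devs[i].list = DPM_OFF;   /* earlier phases assumed to succeed */

    mock_irqs_enabled = 0;

    for (i = 0; i < n; i++) {
        int err = mock_suspend_late(&devs[i]);

        if (err) {
            mock_resume_all(devs, n);   /* undo exactly what was done */
            return err;
        }
        devs[i].list = DPM_OFF_IRQ;
    }
    return 0;
}
```

Because each device's list membership records how far it got, the unwind path needs no extra bookkeeping: a device that never reached dpm_off_irq simply never sees resume_early.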

Linus's patch is small, at no more than 400 lines, and it has generated good feedback from other kernel developers, who seem to be coming around to the new scheme. The patch has not shown up in any public kernel tree yet, but hopefully Linux will soon be able to handle suspend in a much more robust and correct manner.

Major suspend changes

Posted Jun 29, 2006 4:11 UTC (Thu) by kirkengaard (guest, #15022)

You know, this is the first time I've seen this concept implemented in a simple and sensible way, and explained so. Thank you.

I like that it seems to be more rule than exception, and also that there seems to be less external code required, and more kernel action for core functionality. We'll see what happens as the rubber hits the road, but this article was full of smack-yourself-in-the-head moments for me. I'm enthusiastic about suspend, instead of apprehensive; I had gotten used to not dealing with it.

Major suspend changes

Posted Jun 29, 2006 8:51 UTC (Thu) by job (guest, #670)

How are userspace apps notified of the change? There is lots of stuff they may want to do on suspend, such as flagging 'away' in jabber etc.

Major suspend changes

Posted Jul 4, 2006 15:06 UTC (Tue) by gdt (subscriber, #6284)

The kernel doesn't notify user space apps. There's no need. Something asked the kernel to go into suspend-to-RAM. That 'something' is in user space and can notify the applications.

As a concrete example, the 'something' might be an ACPI lid button. The action script for that can clean up user space before calling the kernel. I wouldn't worry about Jabber so much as saving and restoring the system time to/from the hardware clock, asking any DHCP client daemons to retry leases, etc.

Of course, it would be very nice if all the 'something's (ACPI, command line, GNOME/KDE) all used the same mechanism. And that's been a long-time area of failure in Linux.

Major suspend changes

Posted Jun 29, 2006 15:00 UTC (Thu) by sbishop (guest, #33061)

Gmane is down, and I haven't been able to find the discussion talked about on MARC. Does anyone have another link to it?

Also, what are the pros and cons of suspend-to-RAM and suspend-to-disk? I'm sure suspend-to-RAM is faster, and you don't have to worry about the hardware changing from under you without notification. But suspend-to-disk is actually shutting the machine off, right? So in that case you don't have to worry about the batteries being sufficient until the next restart.

Thanks,
Sam

Major suspend changes

Posted Jun 30, 2006 15:32 UTC (Fri) by tetromino (guest, #33846)

Suspend-to-RAM simply puts your hardware in a low-power state, using ACPI or whatever mechanism ppc machines use. It's fairly fast, but the machine will continue using (a little) power when suspended. The major downside is that in Linux, it's horrendously unreliable: there is a very good chance of your machine unsuspending to an unresponsive state, and since modern machines typically lack a serial port (for attaching a serial console), your only choice then is to reboot and lose data.

Suspend-to-disk writes your memory to your disk (typically, in a compressed form to your swap partition), then powers down. When you boot the machine back up, you pass the kernel a parameter telling it where to look for the suspend image; if it finds it, early in the boot process it will load the image back into memory. This is slower than suspend-to-ram, but doesn't use any power when suspended, and most importantly, WORKS. I.e. Suspend2, when properly configured, reliably works on pretty much any hardware I throw at it. Suspend-to-ram doesn't. Hopefully, Linus's patch will change the situation.

Major suspend changes

Posted Jul 4, 2006 19:56 UTC (Tue) by vegge (guest, #6926)

Yes, working suspend-to-ram would be wonderful!

Suspend-to-ram is the reason I got an iBook, running OS X. Close the lid and it goes to sleep, open the lid and it wakes up. I've never had a failure, and the sleep mode uses very little power (the battery will last for days in sleep mode).

OS X has other drawbacks, however...

Major suspend changes

Posted Jul 6, 2006 6:12 UTC (Thu) by pimlott (guest, #1535)

> The major downside is that in Linux, it's horrendously unreliable

I kept hearing this, so I never bothered with suspend to RAM. One day, I idly tried it on my Thinkpad X40--and to my amazement, it worked perfectly! And it's never failed, making it much more reliable than APM suspend. I wish I could say why this is, but I sure am happy about it.

Major suspend changes

Posted Jul 6, 2006 18:20 UTC (Thu) by hazelsct (guest, #3659)

You are very fortunate. Try as I may, I have never had it work on my Fujitsu Lifebook. I really hope the new code will work, but when will it be merged? 2.6.19?

Major suspend changes

Posted Jul 7, 2006 23:15 UTC (Fri) by NCunningham (guest, #6457)

It will be in 2.6.18.

Major suspend changes

Posted Jun 30, 2006 16:58 UTC (Fri) by arafel (subscriber, #18557)

> this is all about what users really want, a working suspend to RAM.

Eh? I'd rather have suspend to disk, personally...

Major suspend changes

Posted Jul 1, 2006 7:41 UTC (Sat) by patha (subscriber, #6986)

Great read.

The posts cited above can be found at:
http://thread.gmane.org/gmane.linux.power-management.gene...
http://thread.gmane.org/gmane.linux.power-management.gene...

The other suspend..

Posted Jul 7, 2006 0:01 UTC (Fri) by huaz (guest, #10168)

I use Linux for development, but not on my laptop. Still, I've watched the "war" about suspend on lkml countless times. Each time it's the same: suspend2 couldn't get in because the current maintainer doesn't like it.

It has been years, yet Linux lacks a top-quality implementation. It's not that there is a lack of talent or engineering time; it looks more and more like "legacy" and "politics".

I am sure Nigel would like to work it out with other developers if they offered honest help, but when it comes to "competition" and "you want to replace my code", it is not just a technical issue anymore, and it does no good to anyone. I understand Linus/Andrew want the people involved to work it out, but sometimes it just can't happen.

I envy that Linus is in a position to go ahead and make his own decisions. Most others aren't. I wish that one day Linus himself would get tired of non-working suspend and just replace the in-kernel swsusp code wholesale (yeah, he did it once, around 2.4.10).