
Yet another approach to software suspend

Back in early 2006, there was an ongoing, energetic debate over the future of the software suspend (to disk) code - a situation which remains true to this day. In the middle of it all, Andrew Morton had jumped in with a suggestion for a different approach:

If you want my cheerfully uninformed opinion, we should toss both of them out and implement suspend3, which is based on the kexec/kdump infrastructure. There's so much duplication of intent here that it's not funny. And having them separate like this weakens both in the area where the real problems are: drivers.

Eighteen months later, it looks like we might just get that "suspend3" in the form of the kexec jump patch, posted by Ying Huang.

Ying's patch builds on the existing kdump facility. The purpose of kdump is to provide safe and useful crash dumps in situations where the state of the operating system is uncertain. If the system panics it is nice to be able to save its current state for post-mortem debugging. It is important, however, that the buggy kernel - which is now in an untrustworthy state - not be used to do dangerous things like write crash dump data to disk. To avoid that situation, a small "dump kernel" is placed in a reserved area of memory where, most of the time, it lurks unnoticed and unneeded. Should a panic occur, a kexec() call is made to transfer control to the dump kernel, which will be able to start up in a known state. As long as the dump kernel stays within its reserved area of memory, it will be able to write the rest of the system state to disk (or wherever) in a relatively safe way.
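The reserved-area idea can be illustrated with a toy model. This is a minimal Python sketch, not kernel code: memory is a flat byte array, the dump kernel owns a fixed slice of it, and the dump copies everything outside that slice. The names `MEMORY_SIZE`, `RESERVED`, and `dump_outside_reserved`, and the sizes used, are invented for illustration.

```python
# Toy model of the kdump memory layout; every name and size here is
# an assumption made for illustration, not something from the kernel.
MEMORY_SIZE = 64
RESERVED = range(48, 64)  # hypothetical region set aside for the dump kernel


def dump_outside_reserved(memory: bytes) -> dict[int, int]:
    """Return {address: byte} for every byte outside the reserved region.

    The dump kernel only reads the crashed kernel's memory and never
    writes to it, so the state being saved cannot change underneath it.
    """
    return {
        addr: memory[addr]
        for addr in range(len(memory))
        if addr not in RESERVED
    }


memory = bytearray(MEMORY_SIZE)
memory[0:5] = b"panic"    # state left behind by the crashed kernel
memory[48:52] = b"dump"   # the dump kernel's own working memory

image = dump_outside_reserved(bytes(memory))
```

The point of the sketch is only the separation: the code doing the saving lives in memory that is not part of what gets saved.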

What Andrew recognized last year is that suspend-to-disk (which is slowly being rebranded "hibernation") does essentially the same thing: system activity is stopped and the current system state is written to disk. If the dump kernel could read that state back into memory and return to the original kernel, it would be able to hibernate (and resume) the system. An implementation along these lines would have the advantage of unifying much of the kdump and hibernation code, thus concentrating development effort and generally simplifying things. Plus it would be a way to eliminate the current code, which, despite many years' tenure in the mainline, remains somewhat unloved.

The current patch does not do all of that; it is really just the first step: making it possible to jump from the secondary kernel back into the original kernel. The code is relatively simple, though it does rely on much of the existing infrastructure to properly suspend and power down all devices in the system for the jump in either direction. So if device drivers are interfering with hibernation now, that problem will still exist in a kexec-based implementation. But much of the other hibernation code, including the much-maligned process freezer, would be unneeded and could be removed.

There are a few little details to take care of before one can take a hatchet to the current hibernation code, though. Powering down devices between the two kernels is not really necessary or desirable; they just need to go into a quiet "hibernate" state. A kdump kernel needs to be placed in reserved memory from the beginning; trying to load it at panic time would be far too late. A kernel used for hibernation, instead, need not occupy system memory all the time, so some sort of on-demand secondary kernel loading is needed. The actual task of saving and restoring the system image is yet to be implemented - that can all be done easily in user space, however, with very little in the way of kernel support. Making the resume process fast enough will take some work - users might take a dim view of having to wait for two kernels to boot before getting their system back. And so on.

So, in other words, nobody should be holding their breath for kexec-based hibernation in the near future. But the initial response to this approach was mostly positive; there seems to be a lot of interest in simply starting over in this area. Some of that enthusiasm might fade as work progresses and it turns out that, even with a new approach, hibernation is still a difficult and somewhat grungy problem. So only time will tell if this code will develop into a better hibernation implementation.



Yet another approach to software suspend

Posted Jul 19, 2007 13:58 UTC (Thu) by i3839 (guest, #31386) [Link]

> making it possible to jump from the secondary kernel back into the original kernel.

Which is totally useless for hibernate resume, because there there's no need to do any kexec stuff at resume.

The speed problem is only at suspend/hibernate time, because then the new kernel needs to be started and needs to detect enough hardware to be able to write the image.

This kexec approach seems silly: The old kernel already knows which state needs to be saved and is ready to use the hardware needed for the dump. But the smart thing is that it solves the problem of "how to save the userspace state from userspace without changing that state".

The worrying part is that some people seem to want to use this approach also for the suspend to ram case, where it makes no sense and just complicates and slows down everything.

Yet another approach to software suspend

Posted Jul 19, 2007 16:38 UTC (Thu) by intgr (subscriber, #39733) [Link]

> there there's no need to do any kexec stuff at resume.
As far as I understand it, the "kexec" kernel (which is a misleading name for
it at this point) will load the original kernel state into memory and resume
that.

> some people seem to want to use this approach also for the suspend to ram
> case
Last time I saw Linus speaking about this, it sounded like he'd kill anyone
submitting "disk and RAM suspend" unification patches with an axe.

Yet another approach to software suspend

Posted Jul 19, 2007 23:12 UTC (Thu) by i3839 (guest, #31386) [Link]

But the kexec kernel will be the one writing the image and shutting down the machine, after which point it's gone, assuming hibernation. So at "resume" time, the machine just booted up, more or less, and the first kernel that's loaded can be the final one. No place for kexec here.

For the suspend to ram case, switching back is very fast because nothing needs to be done by the "kexec" kernel (in which case it's indeed more a "kvfork" than a "kexec" kernel). But worse, there isn't anything useful to do for the "kexec" kernel in the s2ram case anyway.

> Last time I saw Linus speaking about this, it sounded like he'd kill anyone
> submitting "disk and RAM suspend" unification patches with an axe.

Yeah, I noticed that too, so I'm not overly worried about it happening. ;-)

Sort of ironic that the kexec approach is like the freezer thread on steroids, taking everything much further. The problem is that the stored state should be consistent, there are two ways to achieve that:

1) Prevent the state from changing.

2) Prevent any changes to the state that would cause problems.

The freezer and kexec take the first approach, the latter achieving it arguably much better. Using kexec is tempting because no infrastructure changes are needed or much else. (Maybe it's better to call it the kdump approach instead.) The hard part is to have the required hardware drivers working in the kexec kernel.

To do 2) well, more or less all IO needs to be stopped, except for the IO doing the dump image writing. This doesn't seem to be that hard either, but it may add runtime overhead, and if you overlook one IO device, corruption or less nasty problems can crop up. The advantage is that it should be faster and simpler in the end, though getting there seems like much more work, depending on the level at which the IO is blocked (if it's done at the driver level then every driver needs to be updated).
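To make the consistency requirement concrete, here is a small Python sketch of what goes wrong when state changes while it is being saved. It is a toy: the "pages" are two integers whose sum should always be 100, and `snapshot` and `move_one_unit` are names invented for this example, not anything from the kernel.

```python
def snapshot(pages, step_between_copies=None):
    """Copy pages one at a time, optionally letting the system run in between."""
    image = []
    for index in range(len(pages)):
        image.append(pages[index])
        if step_between_copies is not None:
            step_between_copies(pages)
    return image


def move_one_unit(pages):
    """A still-running system: moves one unit from the last page to the first."""
    pages[-1] -= 1
    pages[0] += 1


# Invariant of this toy system: the pages always sum to 100.
frozen = snapshot([50, 50])                                     # nothing runs during the copy
racing = snapshot([50, 50], step_between_copies=move_one_unit)  # system keeps running
```

The frozen copy still sums to 100, while the racing copy captures the first page before the transfer and the second page after it, so the saved image describes a state the system was never in.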

Yet another approach to software suspend

Posted Jul 20, 2007 8:36 UTC (Fri) by khim (subscriber, #9252) [Link]

> So at "resume" time, the machine just booted up, more or less, and the first kernel that's loaded can be the final one.

Are you an idiot or just play one on TV ? When the kernel boots it detects all devices anew. The structures will be put in different places in memory, modules will be loaded in a different order, etc. A lot of userspace-visible changes. So you must either move all kernel structures into their proper positions (that is 100 times harder than implementing any other hibernate scheme, because 99% of kernel code is not ready to see movable structures) or fix all userspace programs which interact with the kernel in any way (i.e. essentially all userspace programs: if a program stores its own PID somewhere and that PID is used by some kernel-level process in the new kernel, you are screwed, for example).

Thus the only sensible way to implement wake-up is to restore old kernel in the same state - and after that you need some way to return to that restored kernel. From the perspective of this restored kernel currently active kernel was kexec'ed (even if it's totally different kernel in reality)... Thus you need a way to jump from the secondary kernel back into the original kernel. Hibernate without wake-up is not very useful, really...

Yet another approach to software suspend

Posted Jul 20, 2007 11:23 UTC (Fri) by Klavs (guest, #10563) [Link]

> Are you an idiot or just play one on TV

That is an unnecessary comment, and if we're all lucky, your ridiculous remark will just be ignored.

Pls. do refrain from unnecessary "person attacks" like this in the future - no one wants the LWN comments to become a troll playground.

Yet another approach to software suspend

Posted Jul 21, 2007 0:42 UTC (Sat) by i3839 (guest, #31386) [Link]

I think you misunderstand me. With "machine bootup" I meant the hardware, I didn't mean that the kernel does a regular bootup.

The topic was returning from the kexec kernel, and I said it's not needed. Maybe you're right that doing a kexec is needed to cleanly restore the old kernel state, as doing it from the first kernel loaded is too ugly, but I still don't see any reason why there's a need to return from the kexeced kernel back to the original one. Why not just kexec to the restored, original kernel? (No matter to which point exactly.)

Yet another approach to software suspend

Posted Jul 21, 2007 0:49 UTC (Sat) by i3839 (guest, #31386) [Link]

(It makes going to s2ram after the image dump possible, then jumping back is required indeed. But not for plain hibernate.)

Yet another approach to software suspend

Posted Jul 21, 2007 18:48 UTC (Sat) by dlang (subscriber, #313) [Link]

there is contradictory information about how much work needs to be done for the restore

the current restore functions assume that all the hardware has been put into ACPI low-power mode or the restore image may not work

so the two options right now seem to be
- enhance this mode
- use kexec to do a kernel shuffle (boot one kernel, kexec to a second kernel that's in the reserved space, then restore the image to the original location, overwriting the boot kernel, and then kexec into it)

the second approach isn't elegant, but doesn't depend on the state of the hardware.

Yet another approach to software suspend

Posted Jul 22, 2007 10:17 UTC (Sun) by i3839 (guest, #31386) [Link]

Why let the kexec kernel do the restore instead of the boot kernel? As I see it there are two cases:

- The normal kernel and the kexec kernels share a kernel image. In this case the boot kernel just needs to allocate mem in the reserved area and kexec to the restored kernel when it's done restoring the image.

- They have two different kernel images. Now the bootup kernel needs to be loaded in the reserved area. Though this can be hard as the bootloader loads the kernel, so yes, in this case doing two kexecs seems simpler, though that would slow down the restore.

It's unclear what the ACPI hibernate mode does, but I guess it lets the BIOS boot up the machine quicker because it skips some hardware init stuff or something. So you'd want to support this as well as possible anyway, assuming it makes sense, no matter what else you do.

And the kexec approach also depends on the hardware state because the restored image depends on it. Only thing that kexec does well is making a good snapshot, but it doesn't solve the hardware side of it.

(Which could cause complications, because that state might be changed by loading the image.)

So no matter what approach is used (kexec, freezer, ...), 90% of the work will always be the suspend/hibernate/resume/wake-up functions in the drivers. Same for most problems caused.

If the current restore method works fine, why not continue using that?

Yet another approach to software suspend

Posted Jul 22, 2007 22:06 UTC (Sun) by dlang (subscriber, #313) [Link]

Quote:
If the current restore method works fine, why not continue using that?

because Rafael is telling everyone that if you completely power off the system the current restore method will not properly restore the box (most things should come up, but things will not be right)

One sane unification case

Posted Jul 20, 2007 11:54 UTC (Fri) by farnz (subscriber, #17727) [Link]

There is one interesting unification setup (which can be implemented later, once we've got S2RAM and hibernation working well): hibernate, then (without damaging the hibernate state, so that we can resume again if needed), resume into S2RAM. If we restart running from S2RAM, great, we've just had a nice fast resume. If we lose power, so that we can't resume from S2RAM, ah well, time to restore from hibernate.

It's a limited use case, but one that some people could benefit from; I can think of the following times when I'd want it:

  1. When leaving my desktop alone for an extended period; if power is stable, I want a fast resume. If power goes on me, it'd be nice to still get a resume.
  2. When my server's UPS tells it that it's low on battery and lacking mains input. If the outage lasts only another 10 minutes, resuming from S2RAM (via USB signalling or similar) is quicker than coming back from hibernate.
  3. When I'm closing my laptop at the end of the day to travel home; if I've got enough battery to hold me in S2RAM, I'd like the fast resume when I'm next working. If I don't, I'd still prefer resume from disk to rebooting.

As you can see, all of these are unusual use cases, and S2RAM is just an optimization; I'd cope with having to do hibernate for every case.
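The fallback order described above can be written down as a tiny decision function. This is only a Python sketch of the logic: in a real system the firmware and boot path make this choice, and `Resume`, `choose_resume_path`, and both parameters are hypothetical names introduced here.

```python
from enum import Enum


class Resume(Enum):
    FROM_RAM = "resume from S2RAM"
    FROM_DISK = "restore hibernation image"
    COLD_BOOT = "normal boot"


def choose_resume_path(ram_state_intact: bool, disk_image_valid: bool) -> Resume:
    """Prefer the fast path, fall back to the safe one.

    ram_state_intact: power held, so memory contents survived (assumed input).
    disk_image_valid: a usable hibernation image exists on disk (assumed input).
    """
    if ram_state_intact:
        return Resume.FROM_RAM    # fastest: RAM still holds the running system
    if disk_image_valid:
        return Resume.FROM_DISK   # power was lost, but the image was written first
    return Resume.COLD_BOOT       # nothing to resume from


after_power_cut = choose_resume_path(ram_state_intact=False, disk_image_valid=True)
```

Because the image is written before the machine enters S2RAM, losing power costs only resume time, not the session.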

One sane unification case

Posted Jul 20, 2007 15:00 UTC (Fri) by jond (subscriber, #37669) [Link]

"hibernate, then (without damaging the hibernate state, so that we can resume again if needed), resume into S2RAM."

I guess you mean "suspend into S2RAM" there -- there's a user-space tool "s2both" that does this at the moment, using the existing disk and RAM suspension techniques.

Linux is behind there too.

Posted Jul 20, 2007 20:05 UTC (Fri) by khim (subscriber, #9252) [Link]

What you've described is the standard modus operandi for both Mac OS and Vista... Quite useful approach, yes...

this isn't a new observation...

Posted Jul 27, 2007 3:48 UTC (Fri) by HalfMoon (guest, #3211) [Link]

> What Andrew recognized last year...

For the record, the observation wasn't new to Andrew, or even new as of last year.

In fact, high availability systems have long supported "checkpoint" and "restart" capabilities. That's important for enterprise systems: if the job takes a few days to run and the system power vanished an hour before it would have finished, you want to restart from a checkpoint and just redo the last chunk of work. Suspend-to-disk is exactly such a checkpoint; the main difference between that and the enterprise stuff is that laptops have much more varied hardware, and the configurations are more likely to change after the checkpoint.
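For readers unfamiliar with the checkpoint/restart pattern, a minimal Python sketch follows. It models an application-level checkpoint of a long-running job, which is far simpler than checkpointing a whole system; `run_job` and its parameters are invented for this example.

```python
def run_job(total_steps, checkpoint=None, fail_at=None, checkpoint_every=10):
    """Sum 0..total_steps-1, saving a checkpoint every few steps.

    Returns (result, checkpoint); result is None if the job was interrupted.
    fail_at simulates losing power just before that step runs.
    """
    step, acc = checkpoint if checkpoint is not None else (0, 0)
    saved = (step, acc)
    while step < total_steps:
        if fail_at is not None and step == fail_at:
            return None, saved    # power vanished: only the checkpoint survives
        acc += step
        step += 1
        if step % checkpoint_every == 0:
            saved = (step, acc)   # record progress made so far
    return acc, saved


result, saved = run_job(100, fail_at=95)      # interrupted near the end
resumed, _ = run_job(100, checkpoint=saved)   # redo only the last chunk of work
```

The restarted run picks up at step 90 rather than step 0, which is the property the comment describes: only the work done since the last checkpoint is repeated.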

What's interesting now is that kdump and friends are starting to seem ready enough that it's time to take advantage of that. There's still a lot of new work to be done of course. But now that people realize this, maybe we can finally start to address some of the nasty cruft in these areas!


Copyright © 2007, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds