Software suspend - again

Last week's Kernel Page looked at one small piece of the software suspend debate. Meanwhile, the wider discussion has flared up yet again, and looks unlikely to slow down. Developers of the in-kernel suspend-to-disk code are working on moving parts of it to user space and generally tweaking the existing structure. Nigel Cunningham and other supporters of the Suspend2 patches, instead, still hope to see that work merged, eventually replacing much of the existing implementation. The discussion does not appear to be nearing any sort of resolution.

One thing has become clear, though: Pavel Machek has a firm grip on the current in-tree swsusp code, and that puts Suspend2 at a significant disadvantage. Pavel has taken a strong position against many aspects of the Suspend2 code, and seems determined that it will never be merged. One gets the sense, sometimes, that he just wishes Nigel and his code would go away. Nigel is somewhat more persistent than that, however.

At one point, the two suggested that Linus and Andrew should make a decision between the two implementations and settle the debate. Andrew, however, does not want to do that:

You're unlikely to hear anything dispositive from either of us on this... What we hope and expect is that you'll come up with an agreed path in accordance with general kernel coding and development principles. Linus and I don't want to have to make tiebreak decisions - if we have to do that, the system has failed.

So much for the easy solution. Since then, the relevant parties have been talking, but without a whole lot of apparent progress.

Perhaps the more interesting part of Andrew's note, however, was this:

If you want my cheerfully uninformed opinion, we should toss both of them out and implement suspend3, which is based on the kexec/kdump infrastructure. There's so much duplication of intent here that it's not funny.

kexec(), remember, is a relatively new system call used to boot from one kernel directly into another without going through the whole BIOS startup ritual. The kdump code uses kexec() to perform safe crash dumps. When the kernel panics, it uses kexec() to boot into a small, special-purpose kernel which has been lurking in a reserved part of memory for just this occasion. The new kernel restricts itself to the reserved memory, so the entire memory image of the old, crashed kernel remains intact. That image can then be written to disk in a relatively safe manner.
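The reserved-memory arrangement described above can be pictured with a toy model. This is only an illustrative sketch, not kernel code: memory is a plain Python list, and the page count, the `RESERVED` range and the function names are all assumptions made up for the example.

```python
# Toy model only: "memory" is a list of page labels, not real RAM.
TOTAL_PAGES = 16
RESERVED = range(12, 16)  # hypothetical region set aside for the dump kernel


def boot_system():
    """Fill memory with the running kernel's pages; the reserved
    region holds the idle dump kernel."""
    memory = [f"main-kernel-page-{i}" for i in range(TOTAL_PAGES)]
    for i in RESERVED:
        memory[i] = "dump-kernel"
    return memory


def kexec_into_dump_kernel(memory):
    """Model the jump: the dump kernel runs but touches only its
    reserved pages, then collects everything else as the old image."""
    for i in RESERVED:
        memory[i] = "dump-kernel-running"
    # Everything outside the reserved region is the old kernel's image,
    # left exactly as it was at the moment of the jump.
    return [page for i, page in enumerate(memory) if i not in RESERVED]


memory = boot_system()
image = kexec_into_dump_kernel(memory)
print(len(image), "pages preserved for writing to disk")
```

Because the second kernel never writes outside its own region in this model, the 12 remaining pages are an unmodified snapshot of the first kernel, which is the property that makes the dump safe.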

It is true that suspend-to-disk can be thought of as a sort of kernel dump; the only difference is this little desire to be able to restart the kernel from the dump image at a future time. Using kdump for suspend-to-disk has some obvious appeal. A great deal of effort now goes into freezing most processes on the system - but not the ones needed to complete the suspend process. The suspend code also must be very careful about what kernel state it changes as it goes about its work. Simply jumping into a separate dump kernel has the potential to make many of those problems go away. It might almost be like the Good Old Days, when BIOS-based suspend code simply worked most of the time.

A kdump-based suspend would not be without its costs. In particular, some people might balk at reserving a substantial chunk of memory for the suspend kernel. And, of course, the entire idea remains vaporware for now.

Andrew's suggestion generated little discussion on the mailing list. But, just maybe, it will have ignited a gleam in some hacker's eye. A simpler, more robust suspend mechanism based on kdump which appeared out of left field might just solve this problem - and put the whole tiresome debate in the past - for good.



Software suspend - again

Posted Feb 9, 2006 4:41 UTC (Thu) by pivot (guest, #588) [Link]

It's sad it is taking so long to get a useable suspend implementation in the linux kernel. One of the reasons my next laptop is a mac is the nice way it handles suspend on closing the lid. (The other reason is the graphical interface, but that is moot soon in any case, with the introduction of Xgl..)

Works for me and for a lot of others

Posted Feb 9, 2006 15:01 UTC (Thu) by hummassa (subscriber, #307) [Link]

And (besides the uncomfortable fact that ATM we can't interrupt the
suspension process) it works really well. My work machine only goes to
hibernate at night, never shuts down. (in our building, fire regulations
mandate that every electrical appliance is turned off at nights -- except
of course inside the datacenter, that is especially protected)

The decision procedure

Posted Feb 9, 2006 12:30 UTC (Thu) by NAR (subscriber, #1313) [Link]

It's interesting to see that in another comment thread, the Linux kernel decision procedure was shown as a working model, but here we see that this process also has flaws.

Bye,NAR

Software suspend - again

Posted Feb 9, 2006 13:42 UTC (Thu) by job (guest, #670) [Link]

What Andrew says seems reasonable. kexec/kdump seems like the proper way to do this.

Software suspend - again

Posted Feb 9, 2006 16:44 UTC (Thu) by bronson (subscriber, #4806) [Link]

"For every complex problem, there is a solution that is simple, neat, and wrong." -- H. L. Mencken

There are a number of nontrivial problems with using kexec to suspend a generic kernel. I'm not saying it's impossible, just that the amount of effort required makes it pretty unlikely. Of course, I'd be as happy as anybody to see the suspend problem licked once and for all... This situation is getting embarrassing!

Software suspend - again

Posted Feb 11, 2006 3:40 UTC (Sat) by bk (guest, #25617) [Link]

Please explain these nontrivial problems (seriously). I'd like to know, since otherwise kexec looks like the obvious way to go.

Software suspend - again

Posted Feb 11, 2006 22:14 UTC (Sat) by zblaxell (subscriber, #26385) [Link]

It would seem to me that the problems that need to be solved to adapt kexec to do suspend (suspend-kexec) are all problems that suspend{1,2} have already solved.

One big hunk of the changes made by the suspend2 patches deal with special cases on work queues, process, and RAM page flags, because suspend{1,2} require the kernel to stay half-alive. Suspend-kexec would stop the entire kernel, with no need to distinguish between one process and another. Crash dumps would already have to do this.

Another big hunk is the suspended kernel image file reader/writer code. In suspend-kexec this would be handled by the newly booted kernel. There would need to be a mechanism where the suspend-kexec kernel can access some of the data structures of the kernel being suspended. OTOH, this is only required for efficiency, to find free pages and avoid writing them to swap, and to know which pages of swap are free so that swap partitions can be reused for suspend image storage. A simpler implementation could get away with using a separate partition and require little interaction (other than read/write page) with the kernel to be suspended at all.

Another big hunk of suspend2 patches is the documentation. 'Nuff said.

The rest of the suspend2 patches are related to restoring time on the CMOS clock, the userspace UI, and various sanity checks during booting.
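The trade-off described above between the simple and the efficient image writer can be sketched roughly as follows. This is a toy illustration, not anything from the Suspend2 patches: `pages_to_write` and `free_map` are hypothetical names, and `free_map` stands in for whatever data structure the suspended kernel would have to expose.

```python
def pages_to_write(pages, free_map=None):
    """Return the indices of the pages a suspend kernel would write out.

    free_map is a hypothetical set of page indices that the suspended
    kernel considers free. Without it (the simple implementation),
    every page is written.
    """
    if free_map is None:
        return list(range(len(pages)))
    return [i for i in range(len(pages)) if i not in free_map]


pages = ["data"] * 8

# Simple variant: no knowledge of the suspended kernel, write everything.
simple = pages_to_write(pages)

# Efficient variant: skip pages the suspended kernel reports as free.
efficient = pages_to_write(pages, free_map={2, 3, 7})

print(len(simple), "pages written without a free map")
print(len(efficient), "pages written with a free map")
```

The simple variant needs nothing from the suspended kernel beyond reading pages, at the price of a larger image; the efficient variant only works if the two kernels share some view of the free-page bookkeeping.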

Software suspend - again

Posted Feb 9, 2006 14:27 UTC (Thu) by jzbiciak (✭ supporter ✭, #5246) [Link]

While kdump requires a reserved bit of memory for dumping crashed kernels, I don't see why the same must be true for suspend. Can't this space be dynamically allocated, or is it a problem to guarantee sufficient contiguous memory on a whim?

My words.

Posted Feb 9, 2006 15:01 UTC (Thu) by hummassa (subscriber, #307) [Link]

The suspend kernel won't be big enough IMHO that contiguous memory will be
hard to get -- come on, we can implement it in less than 16k (4 pages)
can't we?

My words.

Posted Feb 9, 2006 16:52 UTC (Thu) by bronson (subscriber, #4806) [Link]

Only if you want to seriously restrict where the suspend image can be stored. Sure, custom-coding a non-DMA ATA-only suspend block can fit inside 16K.

But the minute you add SCSI, USB, FireWire, SATA, network filesystems, that code balloons out of control. That's not even counting LVM and partitioning issues. And, if the kernel has been running a long time, even 16K of contiguous memory becomes nontrivial to find.

So, yes, 16K is theoretically possible. But would it be useful? I doubt it.

My words.

Posted Feb 9, 2006 17:53 UTC (Thu) by jzbiciak (✭ supporter ✭, #5246) [Link]

Indeed. And if you could fit the required code into 16K, it'd make perfect sense to preallocate it anyway. My understanding is that this would be more on the order of a couple-hundred kilobytes, since the suspend-to-disk kernel could omit most things, including the network stack. (I don't *quite* see the benefit of suspend-to-NFS.)

My words.

Posted Feb 9, 2006 21:31 UTC (Thu) by bronson (subscriber, #4806) [Link]

Diskless workstation? I admit, suspend-to-network is probably more trouble than it's worth.

My words.

Posted Feb 9, 2006 21:37 UTC (Thu) by jzbiciak (✭ supporter ✭, #5246) [Link]

I had given that a brief thought, and it seems pretty worthless to me, IMHO, but then so do diskless workstations most of the time. If you did suspend-to-network on a diskless workstation, you could dump all the filesystem drivers (other than NFS) and disk drivers, trading one code lump for the other. You could probably even pare back most of the networking code if you were real spartan about it. (But you would need a reliable protocol like TCP to ensure it all works.)

I imagine suspend-to-network would be greeted with the same enthusiasm as swap-to-network. :-)

suspend to network storage

Posted Feb 9, 2006 22:21 UTC (Thu) by pspinler (subscriber, #2922) [Link]

Is it worthless ? Hmm, maybe, but, what if you could restore the image to a different piece of (identical) hardware elsewhere ?

-- Pat

Software suspend - again

Posted Feb 9, 2006 17:08 UTC (Thu) by dambacher (subscriber, #1710) [Link]

I am really fed up with these discussions. Why can't the kernel developers think of the kernel users' fortune and just let _them_ decide?

It would be so simple to just do two kconfig options for either suspend code and a 3rd for kexec/kdump/(kresume?) The better one wins.
Or all win for special purposes.
We have choice in many other places of the kernel (see schedulers), why not with the suspend code?

Sometimes the discussion is on a level where kids throw mud at each other .-)

Software suspend - again

Posted Feb 9, 2006 18:53 UTC (Thu) by khim (subscriber, #9252) [Link]

It would be so simple to just do two kconfig options for either suspend code and a 3rd for kexec/kdump/(kresume?)

If that's "so simple" then why did you never submit such a patch to LKML? Or have I missed it? How big and intrusive was your patch?

We have choice in many other places of the kernel (see schedulers), why not with the suspend code?

Guess. You know the basics:
1. If kernel developers can offer end-users choice without huge hassle they tend to do this.
2. They never offered such choice in regard to suspend code.
That's more than enough information to deduce the correct answer...

Software suspend - again

Posted Feb 9, 2006 22:07 UTC (Thu) by NCunningham (guest, #6457) [Link]

>Sometimes the discussion is on a niveau where kids throw each other with
mud .-)

Yeah. I tried hard to avoid that, but don't think I was always successful.
Sorry for my contributions that were less than best in that regard.

Software suspend - again

Posted Feb 27, 2006 19:05 UTC (Mon) by Tv (subscriber, #7109) [Link]

> Why can't the kernel developers think of the kernel users fortune and just
> let _them_ decide?

Because kernel users won't maintain the code.

Software suspend - again

Posted Feb 9, 2006 22:00 UTC (Thu) by NCunningham (guest, #6457) [Link]

I spent some time with a gleam in my eye yesterday, but the more I thought
about it, the dimmer the gleam got. Here are some of the issues I came up
with, which should be considered in addition to the ones mentioned above.

- At the moment, Suspend2 supports file-backed suspend to ram - we enter
S3 instead of powering down after writing the image. From that state, if
your battery runs out, you do a normal resume, if it doesn't, we just
reread the small portion of memory that was overwritten for the atomic
copy, and then you're resumed. With kexec, this wouldn't be possible.
- It makes things a lot more complicated for users. One of the reasons I
don't like the userspace suspend idea is because it makes it much easier
to break the whole set up. Requiring another kernel would have the same
problems.
- It makes things more complicated programmatically. I doubt that 8MB or
16MB would be enough. We need to be able to load all the data to be
atomically restored. That's normally 20-30% of the image, so for a 1GB
image, we need to load up to 512MB, but normally 200-300MB. Unless kexec
does some magic that allows us to access memory outside of the mem= limit
(and I won't deny it's possible!), I'm not sure it can work. Presumably
we'd also need to do some interesting things to get access to the
information in the kernel we're writing, to figure out which pages are LRU
and which aren't (for doing the full image of memory).
- I don't understand kexec much at all yet, but doesn't the switch to the
new kernel take place in real mode? In that case, getting the resume to
happen would require adding extra real mode code to be invoked in place of
the initial boot code, implementing the return to where we left off when
suspending. Not impossible, but it's more complication.

In short, while it sounds nice, and it would be possible, I don't think it
is feasible. As always, I'm willing to be educated that this isn't the
case...
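The memory concern in the third point above comes down to simple arithmetic. The sketch below just redoes that calculation with the figures quoted in the comment (a 1GB image, 20-30% of it restored atomically, 8MB or 16MB reserved); the variable names are made up, and it assumes the suspend kernel could only use its reserved region, which is exactly the assumption the comment leaves open.

```python
# Figures taken from the comment above; everything is in megabytes.
image_mb = 1024                  # a 1GB suspend image
atomic_low_pct, atomic_high_pct = 20, 30

# Data that must be loaded before the atomic restore can happen.
atomic_low_mb = image_mb * atomic_low_pct // 100
atomic_high_mb = image_mb * atomic_high_pct // 100

# Would a reserved region of the suggested size hold even the low estimate?
fits = {reserved_mb: reserved_mb >= atomic_low_mb for reserved_mb in (8, 16)}

print(f"atomic data: {atomic_low_mb}-{atomic_high_mb} MB")
for reserved_mb, ok in fits.items():
    shortfall = atomic_low_mb - reserved_mb
    print(f"{reserved_mb} MB reserved: fits={ok}, short by at least {shortfall} MB")
```

With binary megabytes the range works out to 204-307 MB, which the comment rounds to 200-300MB. Either way it is more than an order of magnitude larger than a 16MB reservation, so the scheme depends on the second kernel being able to reach memory outside its own region.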

Software suspend - again

Posted Feb 11, 2006 12:16 UTC (Sat) by ebiederm (subscriber, #35028) [Link]

Actually the file-backed suspend to ram would work with kexec,
you do the suspend in user space (of the other kernel) then you
suspend the dump kernel to ram. We might want to do a little
trickery so the restore goes back to the primary kernel.

I don't think it is fundamentally more complicated for users but it
would be something that needs looking at.

8M/16M is more than you need; that is the kind of size where
you can put a kernel and a glibc based user space in, so it is easy.

Well it wouldn't be kexec that lets you access something outside
of mem=limit but /dev/mem.

kexec happens in whatever the kernel's native mode is.

The LRU and that information could be an issue. I am not sufficiently
familiar with the swap suspend process to understand what needs to happen.
I think what happens is that you stop all hardware devices and processes
to get a consistent system state. Then you wake up just enough of the
system to save that state?

If that is the case kexec may be useful. Especially when used in the kdump
way kexec is just a nice wrapper around goto. :)

suspend, kdump, swap, ...

Posted Feb 10, 2006 21:40 UTC (Fri) by utoddl (subscriber, #1232) [Link]

Just tossing an idea out in ignorance (I don't know much about how the competing suspend models work), but it seems we've got another mechanism we could use besides kdump. We already page data and programs out to swap. Could we not page out everything including the running kernel too (without freeing the RAM of course) so that an early step in rebooting would be to examine swap and see if it looked like it contained a viable "kernel+user space image" and do the Right Thing with it?

Another thought: kdump only requires a private kernel to do the dump because the "real" kernel is presumably injured. Seems that if the kdump technique were used for suspend, wouldn't it be safe to assume the main kernel is okay and let it do the dump? No need for a special dump kernel.

Wish I knew what I was talking about. -- Cheers

suspend, kdump, swap, ...

Posted Feb 11, 2006 0:40 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

The challenge of suspend to disk isn't really the task of saving a memory image and restoring it later. The state of the system lives in various places besides main memory that dissolve when the power goes off. For example, an Ethernet adapter has plenty of state in its own memory. Plus, some parts of the system can't tolerate having hours pass in the blink of an eye. So they have to know about the sleep and actively go to sleep and wake up.

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds