Suspending the system
[Posted April 30, 2003 by corbet]
The software suspend patch was first merged in 2.5.18. It offers the
ability to suspend any Linux system to disk, whether that system has
hardware suspend support or not. It works by doing the following:
- Each process in the system is given (what looks like) a special
"freeze" signal. The process responds by going into the STOPPED
state.
- As much memory as possible is freed up within the system. Caches
are shrunk, user pages are forced out, etc.
- Pending disk writes are flushed out. Sort of.
- Each device on the system is put into the suspend state - at least,
those which support power management functions are.
- Control goes off into an uncommented assembly routine called
do_magic(). It arranges to find a swap partition to use,
creates a "page directory" containing a copy of each in-use page on
the system, writes the whole mess to the system swap partition (which
requires unsuspending the devices, then suspending them again), and
finally powers down.
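The ordering of those steps, including the detour where devices are woken up again just long enough to write the image, can be sketched as a toy model. This is only an illustration of the sequence described above: every name here (toy_step, toy_trace, toy_suspend()) is hypothetical, nothing corresponds to a real kernel symbol, and no actual suspending takes place.

```c
#include <stddef.h>

/* Hypothetical step labels; none of these are real kernel symbols. */
enum toy_step {
    TOY_FREEZE_PROCESSES,
    TOY_SHRINK_MEMORY,
    TOY_FLUSH_WRITES,
    TOY_DEVICES_SUSPEND,
    TOY_DEVICES_RESUME,
    TOY_WRITE_IMAGE,
    TOY_POWER_DOWN
};

#define TOY_MAX_STEPS 16

/* A simple record of the steps performed, in order. */
struct toy_trace {
    enum toy_step steps[TOY_MAX_STEPS];
    size_t count;
};

/* Append one step to the trace, silently dropping it if the trace is full. */
static void toy_record(struct toy_trace *t, enum toy_step s)
{
    if (t->count < TOY_MAX_STEPS)
        t->steps[t->count++] = s;
}

/* Record the suspend sequence as the article describes it. */
void toy_suspend(struct toy_trace *t)
{
    t->count = 0;
    toy_record(t, TOY_FREEZE_PROCESSES);  /* "freeze" every process */
    toy_record(t, TOY_SHRINK_MEMORY);     /* shrink caches, push out pages */
    toy_record(t, TOY_FLUSH_WRITES);      /* flush pending disk writes */
    toy_record(t, TOY_DEVICES_SUSPEND);   /* suspend the devices */
    toy_record(t, TOY_DEVICES_RESUME);    /* wake them to reach the disk */
    toy_record(t, TOY_WRITE_IMAGE);       /* write the image to swap */
    toy_record(t, TOY_DEVICES_SUSPEND);   /* suspend the devices again */
    toy_record(t, TOY_POWER_DOWN);        /* finally, power off */
}
```

The point of the model is simply that the devices are suspended twice: once before the image is built, and once more after it has been written out.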
When the system is next booted, it detects the saved image in the swap
partition and reverses the above process. If all goes well, the system
comes back to life looking mostly as it did before being suspended. It all
seems like a reasonable system if you don't mind that it does not work on
SMP boxes, it does not work with high memory, it only works on the x86
architecture, and it requires an adequately-sized swap partition (a regular
swap file can lead to corruption on some filesystems). It also fails badly
if it cannot find enough swap space to save the system image.
Work is in progress to address some of these issues. The swap space
problem, for example, could be solved simply by setting aside a
dedicated partition for saving the system image. Many other systems work
that way now. Given the size of modern disks, setting aside a partition
with enough room to hold the system's RAM should not be that big of a
deal.
Saving to a swap file is a harder problem. Before the system can be
resumed, the host filesystem must be mounted so that the swap file can be
accessed. If a journaling filesystem is involved, mounting it will replay
the journal, which changes the filesystem on disk. Once the system image
is restored, however, the kernel will expect the filesystem to be in its
previous state, as it was before the journal was replayed. And that leads
to filesystem corruption.
Possible solutions include remembering block numbers for the swap file (as
lilo does for kernel images) or setting up a way to mount the filesystem
without replaying the journal.
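The first of those ideas, remembering where the swap file's blocks live on disk, might look something like the following. This is a minimal sketch under stated assumptions: the extent-based layout, the image_extent structure, and image_block_lookup() are all invented for illustration and do not describe lilo's actual map format or any code in the suspend patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map entry: one contiguous run of on-disk blocks. */
struct image_extent {
    uint64_t start_block;  /* first on-disk block of this run */
    uint64_t nr_blocks;    /* number of contiguous blocks in the run */
};

/*
 * Translate a logical block number within the saved image into an
 * on-disk block number by walking the extent list in order.
 * Returns 0 on success, or -1 if the logical block lies beyond the map.
 */
int image_block_lookup(const struct image_extent *map, size_t nr_extents,
                       uint64_t logical, uint64_t *disk_block)
{
    for (size_t i = 0; i < nr_extents; i++) {
        if (logical < map[i].nr_blocks) {
            *disk_block = map[i].start_block + logical;
            return 0;
        }
        /* Not in this run; skip past it and try the next one. */
        logical -= map[i].nr_blocks;
    }
    return -1;
}
```

With a map like this recorded at suspend time, the resume code could read the image straight from the block device, never mounting the filesystem and thus never touching the journal.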
In the end, however, what may really happen is that most of the current
suspend code will be replaced. Patrick Mochel is working on a general power
management framework for Linux (that was, after all, the original purpose
of all that driver model work he has been doing). Included therein is a
flexible suspend implementation that can be tuned to the needs of the user
and the abilities of the hardware; if the hardware can save and restore
memory itself, there's little point in having the kernel duplicate that
ability.
So, in the new scheme, suspending (and resuming) the system becomes another
set of operations that can be hidden behind a structure full of function
pointers. Systems which can handle power management entirely through ACPI
calls run with one set of operations, while those requiring the software
suspend capability can have it. As part of this work, the software suspend
code has been substantially reworked and cleaned up. At this point,
though, the basic technique used by the code is the same, and it will
suffer from many of the same problems.
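A "structure full of function pointers" of this sort could be sketched as follows. To be clear, this is not Patrick's interface: the suspend_ops structure, its prepare/enter/finish members, the two backends, and the selection helper are all hypothetical names chosen for illustration, and the backend functions are stubs that do nothing beyond counting calls.

```c
#include <stddef.h>

/* Hypothetical operations table; not the actual framework's API. */
struct suspend_ops {
    const char *name;
    int (*prepare)(void);  /* optional: get ready to suspend */
    int (*enter)(void);    /* required: actually suspend */
    void (*finish)(void);  /* optional: clean up after enter() returns */
};

/* Counts how often the software backend "wrote" an image (stub only). */
int sw_image_writes;

static int fw_enter(void) { return 0; }  /* firmware saves memory itself */
static int sw_prepare(void) { return 0; }
static int sw_enter(void) { sw_image_writes++; return 0; }

const struct suspend_ops firmware_ops = { "firmware", NULL, fw_enter, NULL };
const struct suspend_ops software_ops = { "software", sw_prepare, sw_enter, NULL };

/* Pick a backend based on what the hardware can do on its own. */
const struct suspend_ops *select_suspend_ops(int hw_saves_memory)
{
    return hw_saves_memory ? &firmware_ops : &software_ops;
}

/* Drive a suspend through whichever table was selected. */
int do_suspend(const struct suspend_ops *ops)
{
    int err;

    if (ops == NULL || ops->enter == NULL)
        return -1;
    if (ops->prepare != NULL) {
        err = ops->prepare();
        if (err != 0)
            return err;
    }
    err = ops->enter();
    if (ops->finish != NULL)
        ops->finish();
    return err;
}
```

The caller never needs to know which mechanism is in use; a call like do_suspend(select_suspend_ops(0)) takes the software path, while passing 1 skips the image-writing code entirely.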
This work is not yet complete, however; expect it to be improved further
before heading toward the mainline 2.5 kernel. Those wanting to look at
Patrick's work can get it with BitKeeper at
ldm.bkbits.net/linux-2.5-power; your editor is not aware of a
non-BK copy available at this time.