The addition of checkpoint/restore functionality to Linux has been an ongoing topic of discussion and development for some years now. After the poor reception given to the in-kernel C/R implementation at the end of 2010, that particular project seems to have faded into the background. Instead, most of the interest seems to be in solutions that operate mostly in user space. Depending on the approach taken, most or all of the support needed to implement this functionality in user space already exists. But a complete solution is not yet there.
Cyrill Gorcunov has been working to fill in some of the gaps with a preparatory patch set for user-space checkpointing/restore with the "CRIU" tool set. There are a number of small additions to the kernel ABI to be found here:
Perhaps the most significant new feature, though, is the addition of a new system call:
long kcmp(pid_t pid1, pid_t pid2, int type, unsigned long idx1, unsigned long idx2);
Checkpoint/restore is meant to work as well on a tree of processes as on a single process. One challenge in the way of meeting that goal is that some of those processes may share resources - files, say, or, perhaps, a whole lot more. Replicating that sharing at restore time is relatively easy; the clone() system call provides a nice set of flags controlling the sharing of resources. The harder part is knowing, at checkpoint time, whether that sharing is taking place.
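The restore-time half of that claim can be illustrated with a small C sketch. It is only an illustration of clone() sharing flags, not code from CRIU; the helper names are hypothetical. The child closes a descriptor, and the parent then checks whether its own copy disappeared, which happens only when the descriptor table is shared with CLONE_FILES.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (256 * 1024)

/* Child body: close the descriptor whose number the parent passed in. */
static int close_fd_in_child(void *arg)
{
    close(*(int *)arg);
    return 0;
}

/*
 * Hypothetical helper: create a child with the given clone() sharing
 * flags, let it close a descriptor, and report whether the parent lost
 * that descriptor as well.
 * Returns 1 if the close was visible in the parent, 0 if not, -1 on error.
 */
int child_close_is_visible(int share_flags)
{
    int fd = open("/dev/null", O_RDONLY);
    char *stack = malloc(CHILD_STACK_SIZE);
    int visible = -1;

    if (fd >= 0 && stack != NULL) {
        /* The stack grows down on common architectures, so pass its top. */
        pid_t pid = clone(close_fd_in_child, stack + CHILD_STACK_SIZE,
                          share_flags | SIGCHLD, &fd);
        if (pid > 0 && waitpid(pid, NULL, 0) == pid)
            visible = (fcntl(fd, F_GETFD) == -1);
    }
    if (fd >= 0)
        close(fd);   /* harmless if the child already closed it for us */
    free(stack);
    return visible;
}
```

A restorer that knows two tasks shared a descriptor table at checkpoint time would pass CLONE_FILES when recreating the second task; without the flag, each task gets a private copy of the table.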
One way for user space to determine whether, for example, two processes are sharing the same open file would be to query the kernel for the address of the associated struct file and see if they are the same in both processes. That kind of functionality sets off alarms among those concerned about security, though; learning where data structures live in kernel space is often an important precondition to an attack. There was talk for a while of "obfuscating" the pointers - through an exclusive-OR with a random value, for example - but the risk was still seen as being too high. So the compromise is kcmp(), which simply answers the question of whether resources found in two processes are the same or not.
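As a rough sketch of how such a query might look from user space, the C fragment below asks whether two descriptors in the calling process refer to the same open file. It assumes the call is reachable through syscall() as SYS_kcmp and that a KCMP_FILE type constant is provided by <linux/kcmp.h>, as in kernels that later shipped this interface; the patch set under discussion could differ in detail. The same_open_file() helper is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/kcmp.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper: the C library is not assumed to export kcmp() itself. */
static long kcmp(pid_t pid1, pid_t pid2, int type,
                 unsigned long idx1, unsigned long idx2)
{
    return syscall(SYS_kcmp, pid1, pid2, type, idx1, idx2);
}

/*
 * Hypothetical helper: compare two descriptors of the calling process.
 * Returns 0 if both refer to the same open file, a positive value if
 * they do not, and -1 (with errno set) if the call failed, for example
 * because the running kernel lacks kcmp().
 */
long same_open_file(int fd_a, int fd_b)
{
    pid_t self = getpid();

    return kcmp(self, self, KCMP_FILE,
                (unsigned long)fd_a, (unsigned long)fd_b);
}
```

A descriptor and its dup() copy should compare as the same open file, while two independent open() calls on the same path should not, since each open() creates its own file structure in the kernel.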
kcmp() takes two process ID parameters, indicating the processes of interest; both processes must be in the same PID namespace as the calling process. The type parameter tells the kernel the specific item that is being compared:
The return value from kcmp() is zero if the two items are equal, one if the first item is "less" than the second, or two if the first is "greater" than the second. The ordered comparison may seem a little strange, especially when one looks at the implementation and sees that the pointers are obfuscated before comparison within the kernel. The result is, thus, an ordering that (by design) does not match the ordering of the relevant data structures in kernel space. It turns out that even a reshuffled (but consistent) "ordering" is useful for optimizing comparisons in user space when large numbers of open files are present.
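The value of a consistent ordering can be seen in a small simulation. The C sketch below does not call the kernel: fake_kcmp_file() is a hypothetical stand-in that compares made-up object identifiers after XOR-ing them with a made-up cookie, returning 0, 1, or 2 as described above. Because the result is a consistent ordering, descriptors can be sorted with qsort() and duplicates found by looking only at neighbors, instead of comparing every pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Made-up "kernel objects": descriptors 0, 2, 5 share one; 1 and 4 another. */
static const uintptr_t fake_file_of_fd[] = {
    0x1000, 0x2000, 0x1000, 0x3000, 0x2000, 0x1000
};

/* Made-up obfuscation cookie; a real kernel would pick this at random. */
static const uintptr_t fake_cookie = 0x5bd1e995u;

/* Stand-in for a file comparison: 0 = same, 1 = "less", 2 = "greater". */
static int fake_kcmp_file(int fd1, int fd2)
{
    uintptr_t a = fake_file_of_fd[fd1] ^ fake_cookie;
    uintptr_t b = fake_file_of_fd[fd2] ^ fake_cookie;

    if (a == b)
        return 0;
    return a < b ? 1 : 2;
}

/* Adapt the 0/1/2 convention to the negative/zero/positive one of qsort(). */
static int by_kcmp(const void *pa, const void *pb)
{
    int r = fake_kcmp_file(*(const int *)pa, *(const int *)pb);

    return r == 0 ? 0 : (r == 1 ? -1 : 1);
}

/* Sort the descriptors, then count groups of equal neighbors. */
size_t count_distinct_files(int *fds, size_t n)
{
    size_t distinct = n ? 1 : 0;

    qsort(fds, n, sizeof fds[0], by_kcmp);
    for (size_t i = 1; i < n; i++)
        if (fake_kcmp_file(fds[i - 1], fds[i]) != 0)
            distinct++;
    return distinct;
}
```

With only an equal/not-equal answer, grouping n descriptors takes on the order of n squared comparisons; with an ordering, sorting brings that down to roughly n log n.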
This patch set has been through a few cycles of review and seems to have addressed most of the concerns raised by reviewers. It may just find its way in through the next merge window. Meanwhile, people who want to see how the user-space side works can find the relevant code at criu.org.
CRIU is not the only user-space checkpoint/restore implementation out there; the DMTCP (Distributed MultiThreaded CheckPointing) project has been active since about the 2.6.9 kernel. DMTCP differs somewhat from CRIU, though; in particular, it is able to checkpoint groups of processes connected by sockets - even across different machines - and it requires no changes to the kernel at all. These features come with a couple of limitations, though.
Checkpoint/restore with DMTCP requires that the target process(es) be started with a special script; it is not possible to checkpoint arbitrary processes on the system. That script uses the LD_PRELOAD mechanism to place wrappers around a number of libc and (especially) system call implementations. As a result, DMTCP has no need to ask the kernel whether two processes are sharing a specific resource; it has been watching the relevant system calls and knows how the processes were created. The disadvantage of this approach - beyond having to run checkpointable processes in a special environment - is that, as can be seen in the table of supported applications, not all programs can be checkpointed.
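The wrapper idea can be sketched in a few lines of C. This is not DMTCP's code; the table and the known_to_share() helper are hypothetical, and only dup() is wrapped. The sketch defines its own dup(), which takes precedence over the libc version when built into a preloaded shared object, and records which descriptor each copy came from. To stay free of extra link flags it invokes the raw system call; an interposer would more commonly look up the next definition with dlsym(RTLD_NEXT, "dup").

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <unistd.h>

#define MAX_TRACKED 1024

/* origin[fd] is 0 if untracked, else (original descriptor + 1). */
static int origin[MAX_TRACKED];

/* Wrapper with the same signature as libc's dup(); records the copy. */
int dup(int oldfd)
{
    int newfd = (int)syscall(SYS_dup, oldfd);

    if (newfd >= 0 && newfd < MAX_TRACKED && oldfd < MAX_TRACKED)
        origin[newfd] = origin[oldfd] ? origin[oldfd] : oldfd + 1;
    return newfd;
}

/* Map a descriptor back to the one it was (transitively) copied from. */
static int root_of(int fd)
{
    if (fd < 0 || fd >= MAX_TRACKED || origin[fd] == 0)
        return fd;
    return origin[fd] - 1;
}

/* Hypothetical query: did the wrappers see these two being shared? */
int known_to_share(int fd_a, int fd_b)
{
    return root_of(fd_a) == root_of(fd_b);
}
```

A complete tracker would also have to wrap close(), dup2(), fork(), and similar calls so that stale entries are dropped and sharing across processes is recorded; anything that bypasses the wrappers goes unnoticed, which is one reason this approach cannot handle every program.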
The recent 1.2.4 release improves support, though, to the point that everything a wide range of users care about should be checkpointable. The system has been integrated with Open MPI and is able to respond to MPI-generated checkpoint and restore requests. DMTCP is available with the openSUSE, Debian Testing, and Ubuntu distributions. DMTCP may offer something good enough today for many users, who may not need to wait for one of the other projects to be ready sometime in the future.
Preparing for user-space checkpoint/restore
Posted Feb 2, 2012 4:57 UTC (Thu) by thedevil (guest, #32913) [Link]
http://www.cl.cam.ac.uk/~jrh13/hol-light/
Does anyone know the answer, or do I have to just try it? :-P
Preparing for user-space checkpoint/restore
Posted Feb 9, 2012 2:51 UTC (Thu) by karya (guest, #71446) [Link]
Preparing for user-space checkpoint/restore
Posted Feb 2, 2012 12:13 UTC (Thu) by misiu_mp (guest, #41936) [Link]
Preparing for user-space checkpoint/restore
Posted Feb 2, 2012 12:35 UTC (Thu) by gidoca (subscriber, #62438) [Link]
This article has a brief explanation.
Preparing for user-space checkpoint/restore
Posted Feb 2, 2012 13:00 UTC (Thu) by misiu_mp (guest, #41936) [Link]
kcmp vs strcmp convention
Posted Feb 2, 2012 15:16 UTC (Thu) by jnareb (subscriber, #46500) [Link]
kcmp vs strcmp convention
Posted Feb 2, 2012 16:05 UTC (Thu) by cesarb (subscriber, #6266) [Link]
In a system call, values between -1 and -4095 (inclusive) are reserved for errors, with values from errno.h. In particular, -1 means EPERM on x86.
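The convention described above can be written as a one-line check in C. This is a sketch with a hypothetical function name, mirroring the usual way a raw return value is classified: anything in the range -4095 to -1 is a negated errno code, and everything else is an ordinary result. It suggests why a raw system call has no room for a strcmp-style -1 result.

```c
#include <assert.h>
#include <errno.h>

#define MAX_ERRNO 4095

/* Nonzero if a raw system call return value encodes an error. */
int is_raw_syscall_error(long ret)
{
    /* As unsigned values, -4095..-1 are the 4095 largest numbers. */
    return (unsigned long)ret >= (unsigned long)-MAX_ERRNO;
}
```

Under this rule the values 0, 1, and 2 are all ordinary results, while a raw -1 would be read as the error code EPERM.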
Preparing for user-space checkpoint/restore
Posted Feb 8, 2012 10:04 UTC (Wed) by ebirdie (guest, #512) [Link]
LWN.net: Checkpoint/restart (mostly) in user space
https://lwn.net/Articles/452184/
Just a minor interesting info bit.
Relationship with Android?
Posted Feb 9, 2012 13:36 UTC (Thu) by renox (subscriber, #23785) [Link]
Relationship with Android?
Posted Feb 9, 2012 14:37 UTC (Thu) by mfedyk (guest, #55303) [Link]
In android, apps are expected to store state themselves so that when started again they will continue with that state.
As you can imagine, app implementation of this is spotty.
Relationship with Android?
Posted Feb 9, 2012 18:48 UTC (Thu) by raven667 (subscriber, #5198) [Link]
Relationship with Android?
Posted Feb 9, 2012 19:27 UTC (Thu) by mjg59 (subscriber, #23239) [Link]
Preparing for user-space checkpoint/restore
Posted Jul 23, 2012 3:09 UTC (Mon) by bergwolf (subscriber, #55931) [Link]
Preparing for user-space checkpoint/restore
Posted Jul 23, 2012 7:17 UTC (Mon) by dlang (guest, #313) [Link]
Doing this inside an app is fairly easy as long as there is no problem re-doing work since the last checkpoint, or you can send the app a signal meaning "stop working and save a checkpoint now".
Doing this at the OS level, so that it works with arbitrary apps without the app (or other systems the app is communicating with) even knowing that it has taken place, is very hard. It's this problem that you are seeing worked on.
Copyright © 2012, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds