LCE: Checkpoint/restore in user space: are we there yet?

Posted Nov 30, 2012 2:35 UTC (Fri) by karya (guest, #71446)
Parent article: LCE: Checkpoint/restore in user space: are we there yet?

We'd like to point to DMTCP, another user-space approach to checkpoint-restart, which complements the approach of CRIU. Whereas CRIU restores the precise state of the kernel, DMTCP tries to stay close to standard POSIX system calls, augmenting those calls with certain heuristics and limited use of such things as /proc/PID/maps.
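To make the /proc/PID/maps reference concrete, here is a minimal sketch of reading a process's memory map from user space. It is not DMTCP's code; the names MapRegion, parse_maps_line, and read_maps are hypothetical and chosen only for illustration. It assumes the usual Linux line layout of "start-end perms offset dev inode [pathname]".

```python
from typing import List, NamedTuple, Optional


class MapRegion(NamedTuple):
    """One mapped region, as described by a line of /proc/PID/maps."""
    start: int
    end: int
    perms: str
    offset: int
    path: str


def parse_maps_line(line: str) -> Optional[MapRegion]:
    """Parse one maps line; return None if the line is too short to parse.

    Hypothetical helper for illustration, not part of DMTCP or CRIU.
    """
    # Split into at most six fields so a pathname containing spaces stays whole.
    fields = line.split(None, 5)
    if len(fields) < 5:
        return None
    start_s, _, end_s = fields[0].partition("-")
    # Anonymous mappings have no pathname field.
    path = fields[5].strip() if len(fields) > 5 else ""
    return MapRegion(
        start=int(start_s, 16),
        end=int(end_s, 16),
        perms=fields[1],
        offset=int(fields[2], 16),
        path=path,
    )


def read_maps(pid: str = "self") -> List[MapRegion]:
    """Read and parse /proc/<pid>/maps (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [r for r in map(parse_maps_line, f) if r is not None]
```

A checkpointer working at this level could walk the list returned by read_maps() to decide which regions to save, for example by inspecting perms and path; how DMTCP actually makes that decision is not covered here.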

DMTCP is licensed under the LGPL. It is currently packaged in Debian, Ubuntu, and openSUSE, and is under review by Fedora. DMTCP handles both multithreaded and distributed processes (including many dialects of MPI). Instead of restoring the precise kernel state, DMTCP applies heuristics for the most common cases involving external resources, including files that no longer exist at restart, communication with daemons like NSCD, and checkpointing a GNU screen application that has hardwired its terminal name. For further details, see http://dmtcp.sourceforge.net/supportedApps.html.

- Gene and Kapil (for the DMTCP team)



Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds