Kernel-based checkpoint and restart

Posted Aug 14, 2008 12:31 UTC (Thu) by csamuel (✭ supporter ✭, #2624)
Parent article: Kernel-based checkpoint and restart

I think us HPC folks need more than this, and the project with the head start in
this area is BLCR (Berkeley Lab Checkpoint/Restart), a hybrid kernel/user space
solution.

http://ftg.lbl.gov/CheckpointRestart/CheckpointRestart.shtml

You need more than O/S support for this; you need support in the MPI stacks too,
and BLCR is already supported by OpenMPI.  You also want support in the queueing
systems, and Torque (derived from OpenPBS) now has initial BLCR support.

There's a nice presentation on BLCR from GlobusWorld earlier this year:

http://www.globusworld.org/E.Roman-BLCROverview080515.pdf...


Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds