Preparing for user-space checkpoint/restore

Posted Jul 23, 2012 7:17 UTC (Mon) by dlang (subscriber, #313)
In reply to: Preparing for user-space checkpoint/restore by bergwolf
Parent article: Preparing for user-space checkpoint/restore

HPC applications periodically save their state to a file so that the app can be killed, the state file moved to another machine, and the app started again there, picking up where it left off.

Doing this inside an app is fairly easy, as long as either there is no problem with redoing the work done since the last checkpoint, or you can send the app a signal that means "stop working and save a checkpoint now".
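
As a rough illustration of the in-app approach, here is a minimal sketch in Python. Everything in it is an assumption made for the example: the names `save_checkpoint`, `load_checkpoint`, `run` and `install_handler`, the JSON state file, the choice of SIGUSR1, and the trivial summing loop that stands in for real work. It is not how any particular HPC application does it.

```python
import json
import os
import signal
import tempfile

stop_requested = False  # set by the signal handler, read by the work loop


def request_stop(signum, frame):
    """Signal handler: ask the work loop to checkpoint and exit."""
    global stop_requested
    stop_requested = True


def install_handler():
    """Hypothetical convention: SIGUSR1 means 'save a checkpoint and stop'."""
    signal.signal(signal.SIGUSR1, request_stop)


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so a crash never leaves half a file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if there is no checkpoint."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {"step": 0, "total": 0}


def run(path, n_steps, checkpoint_every=100):
    """Sum 0..n_steps-1, resuming from the checkpoint at `path` if present.

    Returns the total when finished, or None if stopped early by a signal.
    """
    global stop_requested
    state = load_checkpoint(path)
    while state["step"] < n_steps:
        state["total"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        if stop_requested or state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
            if stop_requested:
                stop_requested = False
                return None  # state is on disk; restart later to continue
    save_checkpoint(path, state)
    return state["total"]
```

To use it, the app would call `install_handler()` once at startup and then `run("state.json", n)`. After the process is stopped, copying `state.json` to another machine and calling `run` with the same arguments there continues from the saved step. Without the signal, at most `checkpoint_every` steps of work get redone after a kill.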

Doing this at the OS level, so that it works with arbitrary apps without the app (or the other systems it is communicating with) even knowing that it has taken place, is very hard. That is the problem you are seeing worked on here.



Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds