Checkpoint/restart: it's complicated
Posted Nov 12, 2010 19:55 UTC (Fri) by daglwn
In reply to: Checkpoint/restart: it's complicated
Parent article: Checkpoint/restart: it's complicated
"One of the primary users for checkpoint/restart is high-performance computing."
Not for much longer. Checkpoint/restart (CR) does not scale. DoD wants an exascale computer by 2018 with millions of cores, and system MTBF shrinks as the component count grows, so there's no way a CR system could possibly keep up. At best it would completely saturate the network.
We're going to have to get a lot smarter about resiliency.
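To put rough numbers on the scaling problem, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption chosen for illustration: node failures are independent (so system MTBF is node MTBF divided by node count), each node has a 5-year MTBF, a full checkpoint takes 15 minutes, and the checkpoint interval follows Young's first-order approximation, sqrt(2 * checkpoint_time * MTBF). The function names are made up for this example.

```python
import math


def system_mtbf_hours(node_mtbf_hours: float, nodes: int) -> float:
    """System MTBF, assuming independent node failures (an assumption)."""
    return node_mtbf_hours / nodes


def young_interval_hours(checkpoint_hours: float, mtbf_hours: float) -> float:
    """Young's first-order estimate of the best time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_hours * mtbf_hours)


def overhead_fraction(checkpoint_hours: float, mtbf_hours: float) -> float:
    """Approximate fraction of machine time lost to checkpoints plus rework.

    First-order model: time spent writing checkpoints (delta / tau) plus
    expected lost work per failure (tau / (2 * MTBF)). The model is only
    meaningful when the checkpoint time is much smaller than the MTBF, so
    the result is capped at 1.0 (no forward progress at all).
    """
    tau = young_interval_hours(checkpoint_hours, mtbf_hours)
    waste = checkpoint_hours / tau + tau / (2.0 * mtbf_hours)
    return min(waste, 1.0)


if __name__ == "__main__":
    NODE_MTBF_HOURS = 5 * 365 * 24  # hypothetical: 5 years per node
    CHECKPOINT_HOURS = 0.25         # hypothetical: 15 minutes per checkpoint

    for nodes in (1_000, 10_000, 100_000):
        mtbf = system_mtbf_hours(NODE_MTBF_HOURS, nodes)
        waste = overhead_fraction(CHECKPOINT_HOURS, mtbf)
        print(f"{nodes:>7} nodes: system MTBF {mtbf:7.2f} h, "
              f"overhead ~{waste:.0%}")
```

Under these assumed numbers the overhead grows with the square root of the node count, and once the system MTBF approaches the checkpoint time the machine does nothing but checkpoint and recover. Real checkpoint times also grow with memory size and are bounded by network and I/O bandwidth, which this sketch ignores.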