Checkpoint/restart: it's complicated
Posted Nov 12, 2010 19:55 UTC (Fri) by
daglwn (subscriber, #65432)
In reply to:
Checkpoint/restart: it's complicated by Np237
Parent article:
Checkpoint/restart: it's complicated
> One of the primary users for checkpoint/restart is high-performance computing.
Not for much longer. Checkpoint/restart (CR) does not scale. The DoD wants an exascale computer by 2018 with millions of cores, and the system-wide MTBF shrinks as the core count grows, so there's no way a CR system could possibly keep up. At best, the checkpoint traffic would completely saturate the network.
We're going to have to get a lot smarter about resiliency.
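To put rough numbers on the scaling argument, here is a back-of-the-envelope sketch using Young's first-order approximation for the optimal checkpoint interval, tau = sqrt(2 * delta * M), where delta is the time to write one checkpoint and M is the system MTBF. Everything specific here is an assumption for illustration only: the 5-year per-node MTBF, the 15-minute checkpoint cost, the node counts, independent failures (so system MTBF is node MTBF divided by node count), and the function names. The approximation itself only holds while delta is much smaller than M, so the result is capped at 100%.

```python
import math


def system_mtbf_hours(node_mtbf_hours, nodes):
    """System MTBF, assuming independent node failures (an assumption)."""
    return node_mtbf_hours / nodes


def young_interval(checkpoint_cost, mtbf):
    """Young's first-order optimal time between checkpoints."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def overhead_fraction(checkpoint_cost, mtbf):
    """Rough fraction of machine time lost to writing checkpoints and
    redoing work after a failure, when checkpointing at Young's interval.
    Capped at 1.0 because the estimate breaks down once the checkpoint
    cost approaches the MTBF."""
    tau = young_interval(checkpoint_cost, mtbf)
    lost = checkpoint_cost / tau + tau / (2.0 * mtbf)
    return min(lost, 1.0)


if __name__ == "__main__":
    NODE_MTBF_HOURS = 5 * 365 * 24   # hypothetical: 5 years per node
    CHECKPOINT_HOURS = 0.25          # hypothetical: 15 minutes per checkpoint
    for nodes in (1_000, 10_000, 100_000):
        mtbf = system_mtbf_hours(NODE_MTBF_HOURS, nodes)
        lost = overhead_fraction(CHECKPOINT_HOURS, mtbf)
        print(f"{nodes:>7} nodes: system MTBF {mtbf:7.2f} h, "
              f"estimated overhead {lost:.0%}")
```

Under these made-up inputs the estimated overhead climbs steadily with node count, and it reaches the cap once the system MTBF gets close to the time it takes to write a single checkpoint. The sketch also assumes the checkpoint cost stays fixed as the machine grows, which is generous if the network or filesystem is the bottleneck.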