One billion files on Linux
Posted Aug 19, 2010 20:32 UTC (Thu) by mhelsley (guest, #11324)
Parent article: One billion files on Linux
Was that exactly Ric's point -- that the applications had to checkpoint themselves? Or did he just say that being able to checkpoint applications was necessary? I ask because there's a big difference. Expecting every application that might run in these environments to explicitly checkpoint itself just isn't practical. Look at how many non-HPC applications use BLCR, for example.
The alternative is to enable "external" checkpointing: checkpoints that don't require rewriting the application, ld preloads, etc. There is already an effort underway to push this to mainline:
Posted Aug 20, 2010 18:12 UTC (Fri)
by ricwheeler (subscriber, #4980)
How you checkpoint/restart is less critical to me. I would expect some applications (like rsync itself) to be checkpoint-aware and restartable by design. Others would certainly benefit from external checkpointing.
Posted Aug 20, 2010 21:54 UTC (Fri)
by mhelsley (guest, #11324)
This use of rsync presents an interesting case for the userspace portion of checkpoint/restart.
During checkpoint we often need to checkpoint the contents of the filesystems. One way to do that is with a frozen filesystem and rsync. Obviously if we're rsync'ing to mirror the filesystem in the first place then we shouldn't attempt to checkpoint the rsync task's filesystem(s) with rsync -- we'd want to do a "local" snapshot if possible.
Since the kernel does not force userspace to save the filesystem contents, userspace can choose if and how it will do so. In other words, this case requires no special changes to the checkpoint syscall.
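To make that userspace choice concrete, here is a rough sketch of what the policy could look like. It only builds the command lines and does not run anything. The function name, its parameters, the snapshot name and the 1G snapshot size are all made up for illustration, and using an LVM snapshot as the "local" snapshot is my assumption; nothing here is part of the checkpoint/restart patches.

```python
from typing import List, Optional


def plan_fs_checkpoint(
    mountpoint: str,
    remote_dest: Optional[str] = None,
    task_is_rsync: bool = False,
    snapshot_origin: Optional[str] = None,
) -> List[List[str]]:
    """Return the commands userspace might run to save filesystem contents.

    Hypothetical helper: the kernel does not require any of this, so an
    empty plan (save nothing) is a valid outcome.
    """
    if task_is_rsync:
        # The task being checkpointed is itself an rsync mirror job, so do
        # not rsync its filesystem again; prefer a local snapshot.
        if snapshot_origin is None:
            return []  # no local snapshot available: skip filesystem contents
        # Assumption: the filesystem sits on an LVM logical volume.
        return [[
            "lvcreate", "--snapshot", "--size", "1G",
            "--name", "ckpt-snap", snapshot_origin,
        ]]

    if remote_dest is None:
        return []  # userspace chose not to save filesystem contents

    # Ordinary case: freeze the filesystem, copy it, then thaw it.
    source = mountpoint.rstrip("/") + "/"  # trailing slash: copy contents
    return [
        ["fsfreeze", "--freeze", mountpoint],
        ["rsync", "-a", source, remote_dest],
        ["fsfreeze", "--unfreeze", mountpoint],
    ]
```

A real tool would also have to make sure the unfreeze runs even if the copy fails, and would size or place the snapshot according to local policy. The point is only that all of this lives outside the kernel.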