LWN.net Logo

A Checkpoint/restart update

A Checkpoint/restart update

Posted Feb 25, 2010 22:29 UTC (Thu) by mhelsley (subscriber, #11324)
In reply to: A Checkpoint/restart update by k3ninho
Parent article: A Checkpoint/restart update

Hi K3n,

That actually gets us suprisingly little. Yes, there's a good amount of code that looks like:

save_field = struct_field;

but that's not most of what we do. Locking, error handling, and reference counting account for much of the rest of it. We don't assume that the freezer protects kernel data structures for checkpoint/restart. The freezer is there to ensure the checkpointed tasks themselves aren't doing anything - inside or outside the kernel. That doesn't prevent non-frozen tasks from using things inside the kernel (shared struct files for example). Even if we limited ourselves to freezing "whole containers" this would be a problem. So we need to take the appropriate locks in the right order, hold references while we checkpoint, and avoid sleeping sometimes.

More extreme solutions using lists could be made to work but the code, memory, and performance of that approach is actually much worse. We could stop all other cpus on the machine while we checkpoint. Then we couldn't use most of the existing kernel code though because the functions we want to reused wouldn't be allowed to grab locks, sleep, or reliably allocate memory. That means even more new code and more fragile code too. We'd also need to know beforehand exactly how much memory is needed for the checkpoint image. Finally, it would all have to fit in kernel memory. Given that userspace tasks can use large amounts of memory that's not very practical.


(Log in to post comments)

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds