
A Checkpoint/restart update

Posted Feb 25, 2010 11:36 UTC (Thu) by k3ninho (subscriber, #50375)
In reply to: A Checkpoint/restart update by mhelsley
Parent article: A Checkpoint/restart update

Can you not have a build-time reader which examines the kernel's interface definitions in the source tree, allowing you to learn the kernel's own definitions of the data structure without having to maintain more than a list of the files/functions to scan?



A Checkpoint/restart update

Posted Feb 25, 2010 15:47 UTC (Thu) by hallyn (subscriber, #22558)

Matt wasn't talking (I don't think) about the checkpoint image format,
which (IIUC) is what you would be addressing with your suggestion.

I personally think the main maintainability concern is that changes to
object creation/destruction/update code should not require maintainers
to know to look in other random places like checkpoint/file.c to update
the corresponding checkpoint and restart code. That is why we have made
it a point to re-use existing helpers (or create new ones) like
cred_setresuid() in the checkpoint and restart paths, so that updates to
the core helpers will automatically update checkpoint/restart code as
well.

In addition to this, Matt is working on moving everything (or nearly
everything) that is under checkpoint/ into the right files in the core
code, i.e. checkpoint/file.c helpers likely belong in fs/namei.c,
fs/namespace.c etc.

Now, what you're talking about with auto-generation of headers has also
been discussed, and specifically suggested by Andrew Morgan
June/018289.html ). But I think it's still an open question whether
that will just obfuscate what is really going on, and whether it is
addressing a real problem. If it turns out to be a real problem, then
we're certainly open to it!

A Checkpoint/restart update

Posted Feb 25, 2010 22:29 UTC (Thu) by mhelsley (guest, #11324)

Hi K3n,

That actually gets us surprisingly little. Yes, there's a good amount of code that looks like:

save_field = struct_field;

but that's not most of what we do. Locking, error handling, and reference counting account for much of the rest of it. We don't assume that the freezer protects kernel data structures for checkpoint/restart. The freezer is there to ensure the checkpointed tasks themselves aren't doing anything - inside or outside the kernel. That doesn't prevent non-frozen tasks from using things inside the kernel (shared struct files for example). Even if we limited ourselves to freezing "whole containers" this would be a problem. So we need to take the appropriate locks in the right order, hold references while we checkpoint, and avoid sleeping sometimes.
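
To make that concrete, here is a small userspace analogy in C, using a pthread mutex and a C11 atomic counter in place of kernel locks and reference counts. All names (struct demo_file, demo_checkpoint_file() and so on) are hypothetical and this is not the actual checkpoint/restart code. It only shows how little of a checkpoint routine is the plain field copy that a generator could produce.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical shared object, standing in for something like a shared file. */
struct demo_file {
    pthread_mutex_t lock;   /* protects pos and flags */
    atomic_int refs;        /* object must stay alive while refs > 0 */
    long pos;
    unsigned int flags;
};

/* Hypothetical record of the fields saved into the checkpoint image. */
struct demo_file_image {
    long pos;
    unsigned int flags;
};

void demo_file_init(struct demo_file *f, long pos, unsigned int flags)
{
    pthread_mutex_init(&f->lock, NULL);
    atomic_init(&f->refs, 1);   /* the creator holds the first reference */
    f->pos = pos;
    f->flags = flags;
}

/* Take an extra reference; the caller must already hold one. */
static void demo_file_get(struct demo_file *f)
{
    int old = atomic_fetch_add(&f->refs, 1);
    assert(old > 0);
    (void)old;
}

/* Drop a reference; the last one out tears the object down. */
static void demo_file_put(struct demo_file *f)
{
    int old = atomic_fetch_sub(&f->refs, 1);
    assert(old > 0);
    if (old == 1)
        pthread_mutex_destroy(&f->lock);
}

int demo_checkpoint_file(struct demo_file *f, struct demo_file_image *img)
{
    demo_file_get(f);                 /* pin the object while we read it */
    pthread_mutex_lock(&f->lock);     /* other threads may still use it */

    /* The "save_field = struct_field" part. */
    img->pos = f->pos;
    img->flags = f->flags;

    pthread_mutex_unlock(&f->lock);
    demo_file_put(f);                 /* release our temporary reference */
    return 0;
}
```

Only the two assignments in the middle could be derived from the structure definition. The reference, the lock, and the order in which they are taken and released have to be written by someone who knows how the object is used elsewhere.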

More extreme solutions using lists could be made to work, but the code, memory, and performance of that approach are actually much worse. We could stop all other cpus on the machine while we checkpoint. Then we couldn't use most of the existing kernel code, though, because the functions we want to reuse wouldn't be allowed to grab locks, sleep, or reliably allocate memory. That means even more new code, and more fragile code too. We'd also need to know beforehand exactly how much memory is needed for the checkpoint image. Finally, it would all have to fit in kernel memory. Given that userspace tasks can use large amounts of memory, that's not very practical.
