LWN.net

OpenVZ's live checkpointing

The OpenVZ project is a GPL-licensed subset of SWSoft's proprietary Virtuozzo offering. With OpenVZ, a Linux system can implement multiple "virtual environments", each of which appears, to the processes running within it, to be a separate, standalone system. Virtual environments can have their own IP addresses and be subjected to specific resource limits. They are, in other words, an implementation of the container concept, one of several for Linux. In recent times the various virtualization and container projects have shown a higher level of interest in getting at least some of their code merged into the mainline kernel, and OpenVZ is no exception. So the OpenVZ developers have been maintaining a higher profile on the kernel mailing lists.

The latest news from OpenVZ is this announcement of a new release with a major feature addition: live checkpointing and migration of virtual environments. An environment (being a container full of Linux processes) can be checkpointed to a file, allowing it to be restarted at some later time. But it is also possible to checkpoint a running virtual environment and move it to another system, with no interruption in service. This feature, clearly meant to be competitive with Xen's live migration capabilities, enables run-time load balancing across systems.

The OpenVZ patch, weighing in at 2.2MB, is not for the faint of heart; it makes the price to be paid for these features quite clear. Much of what is contained within the patch has been discussed here before; for example, it contains the PID virtualization patches, and every bit of code within the kernel must be aware of whether it is working with "real" or "virtual" process IDs. A number of other kernel interfaces must be changed to support OpenVZ's virtualization features; among other things, many device drivers and filesystems require tweaks.

As might be expected, the checkpointing code is on the long and complicated side. The checkpoint process starts by putting the target process(es) on hold, in a manner similar to what the software suspend code does. Then it comes down to a long series of routines which serialize and write out every data structure and bit of memory associated with a virtual environment. The obvious things are saved: process memory, open files, etc. But the code must also save the full state of each TCP socket (including the backlog of sk_buff structures waiting to be processed), connection tracking information, signal handling status, SYSV IPC information, file descriptors obtained via Unix-domain sockets, asynchronous I/O operations, memory mappings, filesystem namespaces, data in tmpfs files, tty settings, file locks, epoll() file descriptors, accounting information, and more.
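The overall shape of that sequence (freeze, dump, restore later) can be illustrated with a toy model. What follows is a minimal user-space sketch in Python, not OpenVZ code: the `ToyProcess` class, its fields, and the JSON image format are all invented for illustration, and a real checkpoint must capture far more state than this.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class ToyProcess:
    """Hypothetical stand-in for a process inside a virtual environment."""
    pid: int
    memory: str                      # pretend address-space contents
    open_files: list = field(default_factory=list)
    frozen: bool = False


def checkpoint(processes):
    """Freeze every process, then serialize its state into one image."""
    # Step 1: put all target processes on hold so that their state
    # cannot change while it is being written out.
    for proc in processes:
        proc.frozen = True

    # Step 2: dump each process into a plain record.
    records = []
    for proc in processes:
        state = asdict(proc)
        state.pop("frozen")          # freezing is transient, not saved state
        records.append(state)
    return json.dumps({"processes": records}).encode("utf-8")


def restore(image):
    """Rebuild the processes from an image; they come back unfrozen."""
    data = json.loads(image.decode("utf-8"))
    return [ToyProcess(**record) for record in data["processes"]]


env = [
    ToyProcess(pid=1, memory="init heap", open_files=["/dev/console"]),
    ToyProcess(pid=42, memory="httpd heap", open_files=["/var/log/httpd.log"]),
]
image = checkpoint(env)      # could be written to a file or sent elsewhere
restored = restore(image)
```

The important property is the ordering: nothing is dumped until everything is frozen, so the image describes one consistent moment in time.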

For each of the objects to be saved, an in-file version of the kernel data structure must be created. Each dump routine then serializes one or more data structures into the proper format for writing to the checkpoint file. It all apparently works, but it has the look of a highly brittle system - almost any change to the kernel's data structures seems guaranteed to break the checkpoint and restore code. Even if the checkpoint and restore code were merged into the mainline, getting kernel developers to understand (and care about) that code would be a challenge. Keeping it working must be an ongoing hassle, whether or not the code is in the mainline tree.
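The brittleness is easy to see in miniature. The sketch below is not taken from the OpenVZ patch; the magic number, the version numbers, and the field layouts are assumptions made up for this example. It packs a task record into a fixed in-file layout and shows what happens when the reader's idea of that layout moves on.

```python
import struct

MAGIC = 0x43505431   # arbitrary marker chosen for this sketch
HEADER = "<IH"       # magic (u32), format version (u16)

# Hypothetical in-file layouts, keyed by format version.
# v1: pid (u32), uid (u32); v2 adds a signal mask (u64).
LAYOUTS = {1: "<II", 2: "<IIQ"}


def dump_task(version, *fields):
    """Serialize one task record in the given in-file format version."""
    header = struct.pack(HEADER, MAGIC, version)
    return header + struct.pack(LAYOUTS[version], *fields)


def load_task(blob, supported_version):
    """Parse a record, refusing images written in another format version."""
    magic, version = struct.unpack_from(HEADER, blob, 0)
    if magic != MAGIC:
        raise ValueError("not a checkpoint record")
    if version != supported_version:
        raise ValueError(
            f"image is format v{version}, reader expects v{supported_version}"
        )
    return struct.unpack_from(LAYOUTS[version], blob, struct.calcsize(HEADER))


old_image = dump_task(1, 1234, 500)   # written by a "v1" kernel
print(load_task(old_image, 1))        # a v1 reader gets the fields back

try:
    load_task(old_image, 2)           # a "v2" kernel added a field
except ValueError as err:
    print("restore failed:", err)
```

Here the mismatch is at least detected. Every structure that gains, loses, or reorders a field needs the same treatment in both the dump and the restore path, which is where the maintenance cost comes from.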

None of the above should be interpreted to say that OpenVZ's features are not worth the cost. Virtual environments, checkpointing, and live migration are powerful and useful features. But the virtualization of everything within the kernel will lead to a higher level of internal complexity and higher maintenance costs. The decision process which draws the line determining which features are merged and which are not will be interesting to watch.


OpenVZ's live checkpointing

Posted Apr 27, 2006 20:16 UTC (Thu) by oak (guest, #2786)

Is there enough overlap between what is needed for SW suspend and
live checkpointing / migration that they could share code?

OpenVZ's live checkpointing

Posted May 6, 2006 9:54 UTC (Sat) by dev (guest, #34359)

Only process freezing is the common code.

OpenVZ's live checkpointing

Posted Apr 28, 2006 16:27 UTC (Fri) by dowdle (subscriber, #659)

Thanks for covering this!

It seems to me that the "process container" type of virtualization isn't getting as much respect or press coverage as I would expect... as a user of OpenVZ and Linux-VServer who is amazed with how well they work and what can be done with them.

Not trying to whine here... but I think if more people were aware of OpenVZ and Linux-VServer... and how much lighter weight yet functional they are... they'd be doing more with their Linux boxes.

Xen is fantastic too but requires boatloads more resources. Since they offer completely different virtualization approaches... they don't really serve the same needs although there is quite a bit of feature overlap.

I am reminded of the AppArmor vs. SELinux debate mentioned in the LWN Weekly Edition this week. Not directly related... but two different approaches to a goal having different pluses and minuses... targeted at similar but quite different goals.

Process container virtualization is so easy to use (especially with OpenVZ) and functional I can only hope it is added as a mainstream kernel feature someday... although I do understand the long process and the debate that goes into getting somewhat disruptive changes put into the mainline.

I work in an Educational environment and with OpenVZ I can quickly and easily give a student their own machine where they can be "god"... without really noticing it... and I can do it over and over... without really seeing much of a negative impact. That's freaking handy. :)

Copyright © 2006, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds