Friday was virtualization day at the 2005 Ottawa Linux
Symposium; the large room was devoted to that topic all day long. Your
editor can only handle so much virtualization at once, and so failed to
attend the full set of sessions. Two talks, however, gave a good overview
of where a couple of the most important Linux virtualization projects are
and what they see in the future.
A full house turned out to hear Xen hacker Ian Pratt discuss his project.
Xen is riding high; the software is cool and getting cooler, the venture
money is flowing in, and there is no lack of buzz. Ian's talk, while
mostly technical in nature, showed the signs of an up-and-coming business:
slick, animated slides, and a good marketing pitch ("virtualization in the
enterprise") on why virtualization is a useful thing in the first place.
This was worth seeing; it is easy to understand why something like Xen is
cool technology, but it can be harder to get a handle on why investors are
lining up to throw money at it.
Virtualization is not a particularly new idea. Your editor first
experienced it on an IBM mainframe over twenty years ago; we shared files
by sending them out our virtual card punch into a co-worker's virtual card
reader. Given that the alternative, in that particular time and place, was
a real card reader, this looked pretty good. Every now and then
things would go weird, and we would have to reboot CMS on our virtual CPU.
Not only have things changed little since then, but that was all old stuff
even in those days.
In the Linux world, virtualization takes one of three forms. In the
"single operating system image" mode, as used by the Linux-vserver project (or a simple
chroot() setup, for that matter), instances are run within
resource containers. Getting strong isolation is hard with this approach.
Full virtualization runs an unmodified operating system in a complete
virtual machine; systems like VMware and QEMU work this way. The
problem with full virtualization is that it can be hard to do in a way
which is both secure and efficient, especially on current x86 hardware.
Finally, there is para-virtualization, where the guest operating system
kernel is explicitly ported to a virtual machine architecture; both Xen and
user-mode Linux are para-virtualized systems.
So why bother with all of this? One reason is server consolidation: move all of
those servers onto fewer actual boxes, with the resulting savings in floor
space, power, air conditioning, and hardware maintenance. If you can move
virtual machines between physical hosts, you can shift them around to avoid
down time; when the disk drive starts to squeal, the administrator can
evacuate the virtual systems to working hardware and deal with the
problem. Migration also allows workload balancing; it is easier to put
more virtual systems on each physical host if they can be shifted around to
keep the load on all of those hosts about the same.
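
As a rough illustration of the balancing idea (nothing like this was shown in the talk), the following Python sketch greedily migrates one virtual machine at a time from the busiest host to the idlest one. The host names, VM names, load figures, and the rebalance() function are all invented for the example; a real scheduler would also have to weigh memory, I/O, and the cost of the migration itself.

```python
def rebalance(hosts, tolerance=5):
    """Greedily even out load. `hosts` maps host name -> {vm name: load}.

    Mutates `hosts` in place and returns the list of migrations performed
    as (vm, source_host, destination_host) tuples. All names and numbers
    are hypothetical.
    """
    def total(host):
        return sum(hosts[host].values())

    moves = []
    while True:
        src = max(hosts, key=total)   # busiest host
        dst = min(hosts, key=total)   # idlest host
        gap = total(src) - total(dst)
        if gap <= tolerance:
            return moves
        # Moving a VM with load l changes the gap to |gap - 2*l|. Only VMs
        # with 0 < l < gap are considered, so every move strictly improves
        # the balance and the loop is guaranteed to terminate.
        candidates = [(abs(gap - 2 * load), vm)
                      for vm, load in hosts[src].items() if 0 < load < gap]
        if not candidates:
            return moves
        _, vm = min(candidates)
        hosts[dst][vm] = hosts[src].pop(vm)
        moves.append((vm, src, dst))


hosts = {
    "host-a": {"vm1": 40, "vm2": 30, "vm3": 20},
    "host-b": {"vm4": 10},
    "host-c": {},
}
moves = rebalance(hosts)
for vm, src, dst in moves:
    print(f"migrate {vm}: {src} -> {dst}")
print({h: sum(vms.values()) for h, vms in hosts.items()})
```

With these made-up numbers, two migrations take the hosts from loads of 90, 10, and 0 to 30, 30, and 40; no further single move would improve matters, so the loop stops there.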
One other use for virtualization is security: putting a process within a
virtual machine encapsulates it nicely. Even if that process is
compromised, there are limits to the damage it can do - as long as it
remains trapped within its virtual host. It is also possible to monitor
the behavior of the virtual hosts themselves; if one starts doing unusual
things, there is a good chance it has been compromised. In this sense,
virtualization achieves the same broad goal as SELinux: it puts walls
between applications running on the same host. The virtualization approach
has the advantage of relative simplicity for situations where all users of
a host are to be completely isolated from each other.
Xen, currently, is at version 2.0.6. It provides secure isolation,
resource control, quality of service guarantees, live migration of virtual
machines, and an execution speed which is "close to native" on the x86
architecture. As a para-virtualization system, Xen requires that the guest
kernel be ported to its virtual architecture; ports exist for NetBSD,
FreeBSD, Plan9, Solaris, and, of course, Linux. The first virtual machine
("domain 0") is special; it is used for a number of Xen configuration
tasks and often provides services to other virtual hosts.
Xen itself runs as a thin layer between the guest operating systems and the
hardware. Guests normally run autonomously, as separate processes; they call
into the hypervisor for privileged operations. The number of modifications
to the guest kernel is relatively small; beyond the privileged calls, the
guest must be aware that there is a difference between the time it spends
running and how much time passes in the real world. There is also an
interface for the guest to find out what resources (memory and such) have
been allocated to it, so that it can optimize its behavior accordingly.
There is an interface which allows guest systems to access devices on the
host. This interface provides virtualized access to the PCI configuration space,
intermediated by the hypervisor; guests can also map device MMIO space into
their address spaces. Interrupts are delivered by way of the hypervisor.
Virtual systems can perform DMA; this can be a security problem if the host
system (like most x86 systems) lacks an I/O memory management unit. For
this reason, and others, devices are often handled by the "domain 0"
guest and exported to other guests.
The Xen developers are clearly proud of the virtual machine migration
feature. The migration code has been carefully written to minimize the
impact on the host system and to avoid creating downtime for the guest.
When the decision is made to move a virtual system, Xen will start copying
the guest's memory over to the new host while the guest continues to run.
The guest will thus continue to create dirty pages, and some pages will be
changed after they are copied. So an iterative technique is used; each
pass copies (hopefully) fewer pages, and gets closer to creating a full,
current copy on the new host. The final stage is to stop the guest, copy
any remaining memory and other state, then start the guest on the new
system. The actual downtime can be far less than one second: Ian showed
traces from the migration of a running Quake server, which was stopped for
some 50ms; the players never noticed.
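
The iterative copying scheme described above can be sketched as a small simulation. This is only a toy model written for illustration, not Xen's code: the precopy_migrate() function, the dirty_ratio parameter (page writes by the guest per page copied), and the stopping threshold are all assumptions made for the example.

```python
import random


def precopy_migrate(num_pages, dirty_ratio, stop_threshold=50,
                    max_rounds=30, seed=0):
    """Simulate iterative pre-copy migration of a guest with `num_pages` pages.

    `dirty_ratio` is an assumed number of page writes the guest performs per
    page copied. Returns (rounds, pages_copied_live, pages_copied_stopped).
    """
    rng = random.Random(seed)
    pending = set(range(num_pages))   # first pass: every page must be sent
    copied_live = 0
    rounds = 0
    while len(pending) > stop_threshold and rounds < max_rounds:
        rounds += 1
        copied_live += len(pending)
        # The guest kept running while this pass was copied; the pages it
        # wrote to in the meantime must be sent again in the next pass.
        writes = int(len(pending) * dirty_ratio)
        pending = {rng.randrange(num_pages) for _ in range(writes)}
    # Final stage: stop the guest and copy whatever is still dirty.
    return rounds, copied_live, len(pending)


rounds, live, stopped = precopy_migrate(num_pages=10_000, dirty_ratio=0.2)
print(f"{rounds} live passes, {live} pages copied while running, "
      f"{stopped} pages copied while stopped")
```

As long as the guest dirties pages more slowly than they can be copied (a ratio below one in this model), each pass is smaller than the last, and only a handful of pages remain for the brief stop-and-copy stage. If the guest writes faster than that, the passes never shrink; the max_rounds cap is there so that the simulation gives up and stops the guest anyway.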
A 3.0 release is in the works. The architecture is being reworked somewhat
to move much of the platform initialization code into domain 0, making
the hypervisor smaller (and easier to audit). Things like PCI and ACPI
initialization will move in this way; that work has already been done in
Linux, after all. There will be support for access to video devices from
guest systems; this is apparently a plot to force the Xen developers to run
it on their desktops and fix bugs more quickly. There will be ports to a
number of new platforms, including x86-64, ia64, and (a little later) the
PowerPC. Support will be added for the x86 architecture running in the PAE
mode, allowing Xen to be run on systems with large amounts of memory. Xen
will allow the creation of SMP guest operating systems; in fact, it will be
possible to add and remove virtual CPUs on the fly. Migration support will
be enhanced for tasks like cluster load balancing.
The 3.0 release is going into a stabilization period now, so the
developers are already looking toward 3.1. For this release, work is being
done to support Intel's VT (and AMD's "Pacifica") architecture, which
will enable full virtualization of unmodified guest operating systems. The
control tools will be enhanced, and a great deal of performance tuning will
be done. Ian notes that it is currently quite easy to configure Xen for
bad performance; it would be better if it could configure itself to perform
well. 3.1 will have at least some support for NUMA systems, for direct
access to InfiniBand devices, and more.
Looking further ahead, the Xen developers are contemplating whole-system
debugging, with an eye toward finding problems in large, distributed
applications. "Virtual machine forking" would be useful for the creation
of honeypots or quickly sandboxing untrusted software. "Multi-level Xen"
as a secure virtualization technique is also on the list.
The user-mode Linux project predates Xen, but, seemingly, has been eclipsed
by the publicity Xen has received in the last year. Certainly UML is on
the Xen radar; Ian Pratt took pains to mention a few places where Xen was
able to claim better performance than UML. Jeff Dike's UML talk, instead,
looked at where that project was going without a single mention of the
competition. UML is alive, well, and currently undergoing significant development.
UML is adding support for the Intel VT mechanism. Jeff figures that the
work should apply well to AMD's Pacifica offering, but VT is the main
priority now. (That is not entirely surprising, once one realizes that this
work is being done by Intel engineers.) The VT extension allows the creation of a
complete virtual processor within the hardware. The virtual system is
essentially indistinguishable from the "real" host, but certain privileged
operations trap back to the host system, rather than being executed in the guest.
User-mode Linux will, when running under VT, run in ring 0, just
like a real kernel. Most system calls made by processes running inside the
guest will trap directly into the guest kernel; the host will not be
involved at all. When the guest itself must make a call to the host
system, it forces a trap with the VMCALL instruction. Despite the fact
that UML now runs in ring 0, it is still a user-mode process, and thus
still deserves its name.
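
The division of labor described here can be pictured with a toy model. The sketch below is not UML or VT code; the Host and GuestKernel classes, their methods, and the "block-io" request are hypothetical names chosen only to show which events stay inside the guest and which ones reach the host.

```python
class Host:
    """Stands in for the host kernel's hypervisor; it only records traps."""

    def __init__(self):
        self.traps = []

    def handle_trap(self, reason, request):
        self.traps.append((reason, request))
        return f"host:{request}"


class GuestKernel:
    """Stands in for UML running as a guest under VT."""

    def __init__(self, host):
        self.host = host
        self.syscalls_handled = 0

    def syscall(self, name):
        # A system call from a guest process lands directly in the guest
        # kernel; the host is not involved.
        self.syscalls_handled += 1
        return f"guest:{name}"

    def vmcall(self, request):
        # The guest kernel needs something from the host, so it forces a
        # trap (the VMCALL instruction on real hardware).
        return self.host.handle_trap("VMCALL", request)


host = Host()
guest = GuestKernel(host)
for name in ("getpid", "read", "write"):
    guest.syscall(name)
guest.vmcall("block-io")
print(guest.syscalls_handled, "syscalls stayed in the guest;",
      len(host.traps), "trap reached the host")
```

The point of the model is simply that the common case (system calls from guest processes) never touches the host object, while the rarer requests from the guest kernel itself do.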
The big benefit to this mode of operation is performance. A number of the
things which currently hurt UML, such as the cost of implementing system
calls in the guest, just go away. Further work, such as the adoption of
some variant of the dynamic tick patch, should also help improve performance.
Actually making this work requires the incorporation of a simple hypervisor
into the host system kernel. The hypervisor will handle getting UML
started as a guest system, and will be invoked when the guest makes a
system call or springs some other sort of trap. This work is essentially
complete (Jeff credited Asit Mallick, Suresh Siddha, Gennady Sharapov, and
Mikhail Kharitonov for the actual work). By the time systems with VT are
available, UML should be close to being in a position to make full use of them.
A virtual conclusion
Virtualization is clearly a hot topic at the moment; no other subject was
covered by so many talks at OLS. Money is being spent, companies have been
formed, and people clearly expect this stuff to go somewhere. Computers
are clearly valuable, as witnessed by the fact that we have created so many
of them. So it makes sense that people will want to create even more
computers in software. When the hype settles and the technology
stabilizes, we'll probably find that, while virtualization has not changed
the world, it has added a tool which proves to be useful in a number of situations.