An introduction to lguest
[Posted January 23, 2007 by corbet]
Linux cannot be said to suffer from a shortage of virtualization
solutions. What is harder to come by, however, is a paravirtualization
system which is amenable to relatively easy understanding. A relatively
recent entrant into the field, however, changes that situation
significantly. With just 6,000 lines (including the user-space code),
Rusty Russell's hypervisor
implementation,
lguest
(pronounced
rʌs.ti'vai.zər), provides a
full, if spartan paravirtualization mechanism for Linux.
The core of lguest is the lg loadable module. At initialization
time, this module allocates a chunk of memory and maps it into the kernel's
address space just above the vmalloc area - at the top, in other words. A
small hypervisor is loaded into this area; it's a bit of assembly code
which mainly concerns itself with switching between the kernel and the
virtualized guest. Switching involves playing with the page tables - what
looks like virtual memory to the host kernel is physical memory to the
guest - and managing register contents.
The hypervisor will be present in the guest systems' virtual address spaces
as well. Allowing a guest to modify the hypervisor would be bad news,
however, as that would enable the guest to escape its virtual sandbox.
Since the guest kernel will run in ring 1, normal i386 page protection
won't keep it from messing with the hypervisor code. So, instead, the
venerable segmentation mechanism is used to keep that code out of reach.
The lg module also implements the basics for a virtualized I/O
subsystem. At the lowest level, there is a "DMA" mechanism which really
just copies memory between buffers. A DMA buffer can be bound to a given
address; an attempt to perform DMA to that address then copies the memory
into the buffer. The DMA areas can be in memory which is shared between
guests, in which case the data will be copied from one guest to another and
the receiving guest will get an interrupt; this is how inter-guest
networking is implemented. If no shared DMA area is found, DMA transfers
are, instead, referred to the user-space hypervisor (described below) for
execution. Simple disk and console drivers exist as well.
Finally, the lg module implements a controlling interface accessed
via /proc/lguest - a feature which might just have to be changed
before lguest goes into the mainline. The user-space hypervisor creates a
guest by writing an "initialize" command to this file, specifying the
memory range to use, where to find the kernel, etc. This interface can
also be used to receive and execute DMA operations and send interrupts to
the guest system. Interestingly, the way to actually cause the guest to
run is to read from the control file; execution will continue until the
guest blocks on something requiring user-space attention.
Also on the kernel side is a paravirt_ops implementation
for working with the lguest hypervisor; it must be built into any
kernel which will be run as a guest. At system initialization time, this
code looks for a special signature left by the hypervisor at guest startup;
if the signature is present, it means the kernel is running under lguest.
In that situation, the lguest-specific paravirt_ops will be
installed, enabling the kernel to run properly as a guest.
The last component of the system is the user-mode hypervisor client. Its job is
to allocate a range of memory which will become the guest's "physical"
memory; the guest's kernel image is then mapped into that memory range.
The client code itself has been specially linked to sit high in the virtual
address space, leaving room for the guest system below. Once that guest
system is in place, the user-mode client performs its read on the control
file, causing the guest to boot.
A
file on the host system can become a disk image for the guest, with the
user-mode client handling the "DMA" requests to move blocks back and forth.
Network devices can be set up to perform communication between guests. The
lg network driver can also work in a loopback mode, connecting an
internal network device to a TAP device configured on the host; in this
way, guests can bind to ports and run servers.
With sufficient imagination, how all of this comes together can be seen in
the diagram to the right. The lguest client starts the process, running in
user space on the host. It allocates the memory indicated by the blue box,
which is to become the guest's virtualized physical memory, then maps in
the guest kernel. Once the user-mode client reads from
/proc/lguest, the page tables and segment descriptors are tweaked
to make the blue box seem like the entire system, and control is passed to
the guest kernel. The guest can request some services via the kernel-space
hypervisor code; for everything else, control is returned to the user-mode
client.
That is a fairly complete description of what lguest can do. There is no
Xen-style live migration, no UML-style copy-on-write disk devices, no
resource usage management beyond what the kernel already provides, etc. As
Rusty put it at linux.conf.au, lguest eschews fancy features in favor of
cute pictures of puppies. The simplicity of this code is certainly one of
its most attractive qualities; it is easy to understand and to play with.
It should have a rather easier path into the kernel than some of the other
hypervisor implementations out there. Whether it can stay simple once
people start trying to do real work with it remains to be seen.
(
Log in to post comments)