September 21, 2011
This article was contributed by Koen Vervloesem
On August 15, at the KVM Forum 2011,
Bryan Cantrill, VP Engineering at Joyent, gave a presentation entitled "Experiences
Porting KVM to SmartOS." The SmartOS in the
title is Joyent's illumos-based operating system that is the foundation of
its public cloud and its SmartDataCenter
product. With this talk, Cantrill essentially announced
that Joyent has ported KVM to the
illumos (Solaris) kernel.
Thanks to its illumos base, Joyent's SmartOS already had several key
features for a cloud operating system, such as the ZFS file system, the
dynamic tracing possibilities of DTrace, network virtualization with
Crossbow, and operating system-level virtualization (Zones) to isolate
virtual operating systems, all running on the same kernel. However, one
essential piece was missing in this puzzle of enterprise technologies:
hardware virtualization. Granted, a few years ago OpenSolaris had Xen Dom0
support (called xVM), even with hardware virtualization, but the project
was abandoned even before Oracle walked away from OpenSolaris.
Joyent (which is a member of the Open Virtualization
Alliance dedicated to the awareness and adoption of KVM) believes in
the thesis that the best hypervisor is the host operating system itself,
because anyone attempting to implement a thin hypervisor would end up
retracing the history of operating systems. This is exactly the vision of
KVM, so when Joyent decided in the fall of last year that it needed to port
KVM to SmartOS, this was a natural (but not trivial) choice.
Because its resources were constrained, Joyent decided to focus
exclusively on KVM support for Intel processors. More specifically, a
machine running KVM on illumos needs an Intel processor with VT-x and EPT
(Extended Page Tables), such as the Nehalem Core i3/i5/i7. However, the
developers made sure that they didn't make decisions that would impede
later AMD support. Also, only x86-64 hosts and x86 and x86-64 guests are
supported. Apart from these constraints, one of the design goals was that
the KVM port to illumos would maintain compatibility with the QEMU/KVM
interface as much as possible.
Porting an unportable component
At first sight, it seems
impossible to port a component that is essentially not designed to be
portable: KVM is very specific to Linux. Some of the Linux-specific
facilities that KVM uses could be emulated in the illumos kernel, but,
because of the big differences between the Linux and the illumos kernel,
this would not be a clean solution. Joyent engineer Max Bruning started
working on the port in the fall of 2010 by copying the KVM bits from the
stable Linux 2.6.34 source and getting it to compile on illumos; in
April 2011 he was joined by Robert Mustacchi and Bryan Cantrill. As
Cantrill explained in his presentation, DTrace (which Cantrill co-invented
while working at Sun) was essential in the porting process: it let them
understand how heavily virtual machines actually exercised the code that
had not yet been ported.
In his blog post "KVM on illumos", Cantrill explains why the porting
effort was so grueling:
Because KVM is so tightly integrated into the Linux
kernel, it was difficult to determine dependencies - and hard to know when
to pull in the Linux implementation of a particular facility or function
versus writing our own or making a call to the illumos equivalent (if
any).
According to Joyent's measurements, the illumos KVM port performs as
well as Linux KVM, at bare-metal speeds for entirely CPU-bound
workloads. For other workloads, such as a MySQL benchmark, the performance
is obviously not at full bare-metal speed, but the Linux and illumos
implementations of KVM don't diverge much from each other. Tested guest operating
systems include SmartOS, 64-bit Linux, Windows Server 2008 R2, FreeBSD,
OpenBSD, QNX, Plan 9 and Haiku. As for VM resources, the same limitations
as for Linux KVM apply: up to 256 virtual CPUs per virtual machine and up
to 2 TB of virtual memory.
Limitations and enhancements
Illumos KVM diverges from Linux
KVM in some areas. For starters, apart from the focus on Intel and
x86/x86-64 there are some limitations in functionality. As a cloud
provider, Joyent doesn't believe in overselling memory, so it locks down
guest memory in KVM: if the host doesn't have enough free memory to lock
down all the memory the virtual machine needs, the guest fails to start. By
the same reasoning, Joyent also didn't implement kernel same-page merging
(KSM), the Linux functionality for memory deduplication. According to
Cantrill's presentation, it's technically possible to implement this in
illumos, but Joyent doesn't see an acute need for it. Another limitation
is that illumos KVM doesn't support nested virtualization.
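The lock-down policy described above amounts to a simple admission check at guest start. The following Python fragment is only a toy model of that idea, not Joyent's implementation: the Host class, its method names, and the megabyte accounting are all hypothetical and exist purely for illustration.

```python
class InsufficientMemory(Exception):
    """Raised when the host cannot lock down all of a guest's memory."""


class Host:
    """Toy model of a host that never oversells memory (hypothetical names)."""

    def __init__(self, total_mb):
        self.total_mb = total_mb
        self.guests = {}  # guest name -> memory locked for that guest, in MB

    @property
    def free_mb(self):
        # Memory not yet locked down for any running guest.
        return self.total_mb - sum(self.guests.values())

    def start_guest(self, name, memory_mb):
        # All guest memory is locked up front. There is no fallback to
        # swapping or page deduplication, so the start simply fails.
        if memory_mb > self.free_mb:
            raise InsufficientMemory(
                f"{name} needs {memory_mb} MB, only {self.free_mb} MB free"
            )
        self.guests[name] = memory_mb

    def stop_guest(self, name):
        # Stopping a guest releases the memory that was locked for it.
        self.guests.pop(name)
```

In this model, a host with 8192 MB that is already running a 6144 MB guest refuses to start a 4096 MB guest instead of overcommitting. That is the behavior the article describes, and it also explains why a deduplication feature such as KSM adds little under this policy.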
However, tying KVM to illumos instead of Linux also makes some
interesting enhancements possible. For instance, you can create ZFS volumes
for your virtual machine images. At first sight, this looks like just a
convenient way to store your VM images, but it really improves the
virtualization experience. Because ZFS can clone volumes in constant time,
you can provision new KVM guests nearly instantly if
you already have a reference image. Moreover, ZFS remote replication with
the zfs send and zfs receive commands makes an
efficient
foundation for remote cloning and migration of virtual machines over the
network. To streamline this, Cantrill intends
to integrate QEMU migration with ZFS remote replication. Also, because
ZFS has a unified adaptive replacement cache (ARC), guest I/O will be
efficiently cached in the host, resulting in improved random I/O
operations.
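To make the cloning and replication workflow a bit more concrete, here is a small Python sketch that only builds the relevant command lines and does not execute anything. The subcommands (zfs snapshot, zfs clone, zfs send, zfs receive) are standard ZFS ones, but the dataset names, the snapshot name, the remote host, and the use of ssh as transport are assumptions made up for this example. It is not a description of how SmartOS provisions guests.

```python
import shlex


def clone_guest_disk(reference_volume, snapshot, new_volume):
    """Return the zfs commands (as argv lists) that clone a reference image.

    A clone is created from a snapshot, so the reference volume is
    snapshotted first. All names are hypothetical placeholders.
    """
    snap = f"{reference_volume}@{snapshot}"
    return [
        ["zfs", "snapshot", snap],
        ["zfs", "clone", snap, new_volume],
    ]


def replication_pipeline(volume, snapshot, remote_host, remote_volume):
    """Return a shell pipeline that streams an existing snapshot to another host.

    Using ssh as the transport is an assumption of this sketch.
    """
    send = ["zfs", "send", f"{volume}@{snapshot}"]
    receive = ["ssh", remote_host, "zfs", "receive", remote_volume]
    return f"{shlex.join(send)} | {shlex.join(receive)}"


if __name__ == "__main__":
    for argv in clone_guest_disk("zones/ref-image", "gold", "zones/guest1-disk0"):
        print(shlex.join(argv))
    print(replication_pipeline("zones/guest1-disk0", "migrate",
                               "host2", "zones/guest1-disk0"))
```

Because the clone shares its blocks with the snapshot until one of them is modified, the two commands in the first function take about the same time regardless of the size of the reference image.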
Another improvement that illumos makes possible is the use of Zones, the
OS virtualization feature. While the illumos KVM implementation doesn't
require it, SmartOS runs KVM guests in a local zone, with the QEMU process
as the only process running in the zone. While zones were originally used
for resource management, quality of service, I/O throttling, and
so on, containing QEMU in its own zone also improves security by
reducing the attack surface for QEMU exploits. If an exploit 'breaks out of
the virtual machine', it's still contained in the local zone and has no
access to the global zone of the virtualization host or the local zones of
the other virtualization guests.
Another interesting feature of illumos is network virtualization, also
known as Crossbow. With a few commands, you can create a virtual network
interface card (VNIC) per KVM guest. The SmartOS developers have written
some glue code to connect this feature to virtio and have been able to
attain data rates of 1 Gbit/second to and from a KVM guest. VNICs also make
it easier to manage the virtual machine's network usage. Thanks to flow
management, guests can be capped at specified levels of bandwidth, and
guests can be confined to specified IP addresses, thereby making IP
spoofing impossible.
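As a rough illustration of what "a few commands" might look like, the Python sketch below assembles dladm command lines for one guest without running them. Note the assumptions: it uses dladm link properties (maxbw, protection, allowed-ips) as documented in the dladm man page rather than the flow management mentioned above, and the link name, VNIC name, bandwidth value, and IP address are invented for the example. It does not claim to reproduce what Joyent's own tooling does, so check the man page on your system before relying on any of it.

```python
def vnic_commands(physical_link, vnic, max_bandwidth, allowed_ips):
    """Return dladm command lines (as argv lists) for one guest's VNIC.

    physical_link, vnic, max_bandwidth and allowed_ips are hypothetical
    example values supplied by the caller.
    """
    return [
        # Create the virtual NIC on top of a physical link.
        ["dladm", "create-vnic", "-l", physical_link, vnic],
        # Cap the bandwidth available to the guest behind this VNIC.
        ["dladm", "set-linkprop", "-p", f"maxbw={max_bandwidth}", vnic],
        # Enable IP anti-spoofing and restrict the VNIC to the given addresses.
        ["dladm", "set-linkprop", "-p", "protection=ip-nospoof", vnic],
        ["dladm", "set-linkprop", "-p",
         "allowed-ips=" + ",".join(allowed_ips), vnic],
    ]


if __name__ == "__main__":
    for argv in vnic_commands("e1000g0", "vnic0", "100M", ["10.0.0.5"]):
        print(" ".join(argv))
```

Returning argument lists instead of running the commands keeps the sketch safe to try on any machine. On a real illumos host the lists could be passed to subprocess.run one by one.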
And last but not least, the dynamic tracing possibilities of the DTrace
framework let system administrators understand the workload
characteristics of their virtual machines and help with
troubleshooting. For this purpose, Joyent has added some DTrace probes to
QEMU and KVM to examine the behavior of KVM guests. They have even
integrated several of these metrics into their Cloud Analytics tool to
visualize KVM guest behavior in graphs. In his presentation, Cantrill even
suggests that, thanks to the better visibility into guest behavior, DTrace
will help in finding performance improvements for KVM, which will likely
carry over from illumos KVM to the original Linux implementation.
The source
Joyent was already a big contributor to illumos, the
successor to the OpenSolaris community. However, their KVM port is the
first major addition of functionality to the illumos source since Oracle
let the OpenSolaris community die. The source code is published in two
parts: a GitHub repository for KVM itself (illumos-kvm) and one for
QEMU 0.14.1 with some minor patches (illumos-kvm-command), all of which
they intend to upstream. Other
KVM-specific tooling (such as the kvmstat command for
monitoring of KVM statistics) has already been upstreamed to illumos
itself. According to Joyent, this port is at or near production
quality.
Joyent hasn't pushed any bug fixes back to KVM, but the reason for this
is quite simple: they didn't find any bugs in KVM. Cantrill explains this
in an email interview:
I was actually surprised
by this: while I knew that KVM broadly worked, I would have assumed that we
would have found some bug at some point in KVM – but all of the bugs
we found in the course of the project were essentially self-inflicted
wounds. The high quality of KVM is a tribute to both Avi Kivity and to the
KVM engineering team – but also to the folks who have put together
the automated testing for KVM: after having met Lucas Rodrigues and the KVM
autotest team, it's clear that the quality in KVM is due at least in part
to a superlative verification effort.
As for the enhancements Joyent's team made to illumos KVM: all of them
are specific to illumos features like ZFS, DTrace, Zones, Crossbow, kstats,
and the like. As these features do not exist on a Linux host, it doesn't
make any sense to upstream them.
But of course there have been a lot of changes to Linux KVM since
2.6.34, the version on which illumos KVM is based. Cantrill is not very
concerned about this, as he explains:
In part
because KVM is so rock-solid, we are less concerned about being based on
Linux 2.6.34: we are monitoring patches against 2.6.34, and will
incorporate those patches into our implementation as appropriate, but we
don't feel a desire to track Linux KVM any more tightly than that. The
features that have been implemented since 2.6.34 are not ones that we feel
strongly about integrating. For example, nested virtualization adds a
tremendous amount of complexity but brings essentially no value for us
– we are using KVM in a datacenter environment where nested
virtualization is of dubious utility. All of this is not to say that we
won't revisit this in the future – but for now we are
2.6.34-based.
The license
Of course some questions arose about the
license: Joyent has copied the GPL-ed KVM code from Linux, while the
illumos kernel uses the CDDL (Common Development and Distribution
License). However, according to Cantrill this doesn't pose any problems. On
his blog he answers
a question from a reader about the issue:
Our
KVM port remains GPL and its own work (and lives in its own repo) - the
illumos kernel is CDDL but is in no way a derived work of our KVM
port.
And on Hacker News he clarifies that their
KVM port doesn't use the hooks that Linux KVM has into the Linux kernel
(which are marked as EXPORT_SYMBOL_GPL in the Linux kernel):
"Actually, our port does not use these hooks –
there were zero mods to the illumos kernel to support KVM per se."
So, although there seem to be some questions about the legality of the
KVM module in illumos, the developers are fairly confident that the
problems don't apply because the illumos kernel (CDDL-licensed) is not a
derived work of the illumos KVM module (GPL-licensed).
SmartOS and OpenIndiana
SmartOS, whose source is available on GitHub (smartos-live), can be
downloaded as an ISO or USB image. It is a minimal live distribution meant
to run as a virtualization server: it simply boots from a USB stick or
CD-ROM and runs from RAM. It's not meant to be installable, and Joyent
doesn't intend
to develop an installer, but with
some elaborate commands it's possible to install SmartOS on a hard
disk.
When running SmartOS from RAM, changes made on the
running system naturally don't persist across boots. This doesn't have to
be a big issue, as long as you don't want to change your operating system's
configuration. Just create one or more ZFS pools with zpool create
to initialize your hard disks and to be able to store data on
them. However, because of the transient nature of SmartOS, you have to
manually import all pools with the zpool import command after each
boot.
The SmartOS wiki has some sparse and fragmented documentation, with help
on creating zones, creating virtual machines, and other tasks. If you're
not comfortable with the Solaris commands, you can also read the topic
"Finding Your Way Around a SmartMachine" on the wiki of SmartMachines,
Joyent's commercial cloud offering based on SmartOS.
As SmartOS is stripped down to the minimal bits needed to work as a
virtualization host, many tools for other purposes are missing out of the
box. To install extra software, you can use pkgsrc, NetBSD's portable
package manager, by downloading the pkgsrc bootstrap image and unpacking it.
If you really want to have illumos KVM installed on hard disk instead
and don't want to get your hands dirty with the manual installation of
SmartOS, there's another option. OpenIndiana, the spiritual successor of
the OpenSolaris distribution, recently released their
development build 151a exactly one year after the initial release of
the distribution. It's the first build based on the illumos core, and it
also integrates Joyent's KVM support. Installing OpenIndiana oi_151a
gives you a GNOME desktop for a workstation or a text-based installation
for headless servers, and the KVM bits can be installed with the pkg
install command of the package manager IPS (Image Packaging
System). The OpenIndiana wiki shows you the needed commands.
Conclusion
If anyone doubted that illumos would be able to
build enough momentum, Joyent's KVM port to illumos and the subsequent
illumos-based OpenIndiana development release have surely answered these
doubts. Illumos appears to be here to stay, and it offers a lot of interesting
technology, such as ZFS, DTrace, Crossbow, Zones, and now KVM. For Linux
users who were interested in these Solaris technologies but didn't want
to lose KVM as their hypervisor of choice, SmartOS and OpenIndiana are now
able to offer the best of both worlds.