On August 15, at the KVM Forum 2011, Bryan Cantrill, VP Engineering at Joyent, gave a presentation entitled "Experiences Porting KVM to SmartOS." The SmartOS in the title is Joyent's illumos-based operating system that is the foundation of its public cloud and its SmartDataCenter product. With this talk, Cantrill essentially announced that Joyent has ported KVM to the illumos (Solaris) kernel.
Thanks to its illumos base, Joyent's SmartOS already had several key features for a cloud operating system, such as the ZFS file system, the dynamic tracing possibilities of DTrace, network virtualization with Crossbow, and operating system-level virtualization (Zones) to isolate virtual operating systems, all running on the same kernel. However, one essential piece was missing in this puzzle of enterprise technologies: hardware virtualization. Granted, a few years ago OpenSolaris had Xen Dom0 support (called xVM), even with hardware virtualization, but the project was abandoned even before Oracle walked away from OpenSolaris.
Joyent (which is a member of the Open Virtualization Alliance dedicated to the awareness and adoption of KVM) believes in the thesis that the best hypervisor is the host operating system itself, because anyone attempting to implement a thin hypervisor would end up retracing the history of operating systems. This is exactly the vision of KVM, so when Joyent decided in the fall of last year that it needed to port KVM to SmartOS, this was a natural (but not trivial) choice.
Because its resources were constrained, Joyent decided to focus exclusively on KVM support for Intel processors. More specifically, a machine running KVM on illumos needs an Intel processor with VT-x and EPT (Extended Page Tables), such as the Nehalem Core i3/i5/i7. However, the developers made sure that they didn't make decisions that would impede later AMD support. Also, only x86-64 hosts and x86 and x86-64 guests are supported. Apart from these constraints, one of the design goals was that the KVM port to illumos would maintain compatibility with the QEMU/KVM interface as much as possible.
At first sight, it seems impossible to port a component that is essentially not designed to be portable: KVM is very specific to Linux. Some of the Linux-specific facilities that KVM uses could be emulated in the illumos kernel, but, because of the big differences between the Linux and the illumos kernel, this would not be a clean solution. Joyent engineer Max Bruning started working on the port in the fall of 2010 by copying the KVM bits from the stable Linux 2.6.34 source and getting them to compile on illumos; in April 2011 he was joined by Robert Mustacchi and Bryan Cantrill. As Cantrill explained in his presentation, DTrace (co-invented by Cantrill when he was working at Sun) was essential in the porting process: it let them understand how much the still unported code was used by virtual machines.
In his blog post KVM on illumos, Cantrill explains why the porting effort was so grueling:
According to Joyent's measurements, the illumos KVM port performs as well as Linux KVM, at bare-metal speeds for entirely CPU-bound workloads. For other workloads, such as a MySQL benchmark, the performance is obviously not at full bare-metal speed, but the Linux and illumos implementations of KVM don't diverge much from each other. Tested guest operating systems include SmartOS, 64-bit Linux, Windows Server 2008 R2, FreeBSD, OpenBSD, QNX, Plan 9 and Haiku. As for VM resources, the same limitations as for Linux KVM apply: up to 256 virtual CPUs per virtual machine and up to 2 TB virtual memory.
Illumos KVM diverges from Linux KVM in some areas. For starters, apart from the focus on Intel and x86/x86-64, there are some limitations in functionality. As a cloud provider, Joyent doesn't believe in overselling memory, so it locks down guest memory in KVM: if the host doesn't have enough free memory to lock down all the memory the virtual machine needs, the guest fails to start. Using the same argument, Joyent also didn't implement kernel same-page merging (KSM), the Linux functionality for memory deduplication. According to Cantrill's presentation, it's technically possible to implement this in illumos, but Joyent doesn't see an acute need for it. Another limitation is that illumos KVM doesn't support nested virtualization.
However, tying KVM to illumos instead of Linux also makes some interesting enhancements possible. For instance, you can create ZFS volumes for your virtual machine images. At first sight, this looks like just a convenient way to store your VM images, but it really improves the virtualization experience. Because ZFS can clone volumes in constant time, you can provision new KVM guests nearly instantly if you already have a reference image. Moreover, ZFS remote replication with the zfs send and zfs receive commands makes an efficient foundation for remote cloning and migration of virtual machines over the network. To streamline this, Cantrill intends to integrate QEMU migration with ZFS remote replication. Also, because ZFS has a unified adaptive replacement cache (ARC), guest I/O will be efficiently cached in the host, resulting in improved random I/O operations.
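The cloning and replication workflow described above can be sketched as a small Python helper that only assembles the relevant command lines without running them. This is a minimal illustration, not Joyent's provisioning code: the dataset names (`zones/ref-image`, `zones/guest1-disk0`), the snapshot name, and the remote host are hypothetical, and sending over `ssh` is just one assumed transport.

```python
import shlex


def clone_commands(reference_volume, snapshot_name, guest_volume):
    """Return the zfs commands that provision a guest disk from a reference image.

    A clone is always made from a snapshot, so the reference volume is
    snapshotted first and the guest volume is cloned from that snapshot.
    """
    snapshot = f"{reference_volume}@{snapshot_name}"
    return [
        ["zfs", "snapshot", snapshot],
        ["zfs", "clone", snapshot, guest_volume],
    ]


def replicate_pipeline(snapshot, remote_host, remote_volume):
    """Return a shell pipeline that streams a snapshot to another host.

    Piping through ssh is an assumption made for this sketch; any byte
    stream between 'zfs send' and 'zfs receive' would do.
    """
    send = ["zfs", "send", snapshot]
    receive = ["ssh", remote_host, "zfs", "receive", remote_volume]
    return " | ".join(
        " ".join(shlex.quote(arg) for arg in cmd) for cmd in (send, receive)
    )


if __name__ == "__main__":
    # Hypothetical dataset and host names, printed rather than executed.
    for cmd in clone_commands("zones/ref-image", "golden", "zones/guest1-disk0"):
        print(" ".join(cmd))
    print(replicate_pipeline("zones/ref-image@golden", "host2", "zones/ref-image"))
```

Because the clone shares its blocks with the snapshot until the guest writes to them, the two commands complete quickly regardless of the size of the image, which is what makes near-instant provisioning plausible.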
Another improvement that illumos makes possible is the use of Zones, the OS virtualization feature. While the illumos KVM implementation doesn't require it, SmartOS runs KVM guests in a local zone, with the QEMU process as the only process running in the zone. While zones were originally used for resource management, quality of service, I/O throttling, and so on, containing QEMU in its own zone also improves security by reducing the attack surface for QEMU exploits. If an exploit 'breaks out of the virtual machine', it's still contained in the local zone and has no access to the global zone of the virtualization host or the local zones of the other virtualization guests.
Another interesting feature of illumos is network virtualization, also known as Crossbow. With a few commands, you can create a virtual network interface card (VNIC) per KVM guest. The SmartOS developers have written some glue code to connect this feature to virtio and have been able to attain 1Gbit/second data rates to/from a KVM guest. VNICs also make managing the virtual machine's network usage easier. Thanks to flow management, guests can be capped at specified levels of bandwidth, and guests can be confined to specified IP addresses, thereby making IP spoofing impossible.
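As a rough illustration of the "few commands" involved, the following Python sketch builds the `dladm` invocations for one guest VNIC, again without executing anything. It assumes the `maxbw`, `protection`, and `allowed-ips` link properties documented for illumos `dladm`; the link name `e1000g0`, the VNIC name, the bandwidth value, and the IP address are hypothetical, and this is not necessarily how SmartOS itself configures guest networking.

```python
def vnic_commands(physical_link, vnic_name, max_bandwidth=None, allowed_ips=None):
    """Return dladm commands that create a VNIC and optionally restrict it.

    max_bandwidth is passed through as a maxbw value (for example "100M").
    allowed_ips is a list of addresses the guest may use as a source; it is
    combined with ip-nospoof link protection so other addresses are dropped.
    """
    commands = [["dladm", "create-vnic", "-l", physical_link, vnic_name]]
    if max_bandwidth:
        commands.append(
            ["dladm", "set-linkprop", "-p", f"maxbw={max_bandwidth}", vnic_name]
        )
    if allowed_ips:
        commands.append(
            ["dladm", "set-linkprop", "-p", "protection=ip-nospoof", vnic_name]
        )
        commands.append(
            ["dladm", "set-linkprop", "-p",
             "allowed-ips=" + ",".join(allowed_ips), vnic_name]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical link, VNIC name, cap, and address for a single guest.
    for cmd in vnic_commands("e1000g0", "vnic0", "100M", ["10.0.0.5"]):
        print(" ".join(cmd))
```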
And last but not least, the dynamic tracing possibilities of the DTrace framework let system administrators understand the workload characteristics of their virtual machines and help with troubleshooting. For this purpose, Joyent has added some DTrace probes to QEMU and KVM to examine the behavior of KVM guests. They have even integrated several of these metrics into their Cloud Analytics tool to visualize the KVM guest behavior in graphs. In his presentation, Cantrill suggests that, thanks to the better visibility into guest behavior, DTrace will help in finding performance improvements for KVM, which will likely carry over from illumos KVM to the original Linux implementation.
Joyent was already a big contributor to illumos, the successor to the OpenSolaris community. However, their KVM port is the first major addition of functionality to the illumos source since Oracle let the OpenSolaris community die. The source code is published in two parts: a GitHub repository for KVM itself (illumos-kvm) and one with some minor patches to QEMU 0.14.1, all of which they intend to upstream (illumos-kvm-command). Other KVM-specific tooling (such as the kvmstat command for monitoring of KVM statistics) has already been upstreamed to illumos itself. According to Joyent, this port is at or near production quality.
Joyent hasn't pushed any bug fixes back to KVM, but the reason for this is quite simple: they didn't find any bugs in KVM. Cantrill explains this in an email interview:
As for the enhancements Joyent's team made to illumos KVM: all of them are specific to illumos features like ZFS, DTrace, Zones, Crossbow, kstats, and the like. As these features do not exist on a Linux host, it doesn't make any sense to upstream them.
But of course there have been a lot of changes to Linux KVM since 2.6.34, the version on which illumos KVM is based. Cantrill is not very concerned about this, as he explains:
Of course some questions arose about the license: Joyent has copied the GPL-ed KVM code from Linux, while the illumos kernel uses the CDDL (Common Development and Distribution License). However, according to Cantrill this doesn't pose any problems. On his blog he answers a question from a reader about the issue:
And on Hacker News he clarifies that their KVM port doesn't use the hooks that Linux KVM has into the Linux kernel (which are marked as EXPORT_SYMBOL_GPL in the Linux kernel): "Actually, our port does not use these hooks – there were zero mods to the illumos kernel to support KVM per se." So, although there seem to be some questions about the legality of the KVM module in illumos, the developers are fairly confident that the problems don't apply because the illumos kernel (CDDL-licensed) is not a derived work of the illumos KVM module (GPL-licensed).
SmartOS, for which you can find the source on GitHub (smartos-live), can be downloaded as an ISO or USB image, and it's a minimal live distribution meant to run as a virtualization server. SmartOS just boots from a USB stick or CD-ROM and runs from RAM. It's not meant to be installable, and Joyent doesn't intend to develop an installer, but with some elaborate commands it's possible to install SmartOS on a hard disk.
When running SmartOS from RAM, changes made on the running system naturally don't persist across boots. This doesn't have to be a big issue, as long as you don't want to change your operating system's configuration. Just create one or more ZFS pools with zpool create to initialize your hard disks and to be able to store data on them. However, because of the transient nature of SmartOS, you have to manually import all pools with the zpool import command after each boot.
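The pool handling described above boils down to one command run once and one command run after every boot. The Python sketch below only builds those command lines; the pool name `tank` and the device names are hypothetical placeholders, and real disk names will differ per machine.

```python
def pool_create_command(pool_name, disks):
    """Return the one-time command that creates a pool on the given disks."""
    if not disks:
        raise ValueError("at least one disk is required to create a pool")
    return ["zpool", "create", pool_name, *disks]


def pool_import_command(pool_name):
    """Return the command to re-import an existing pool after a reboot."""
    return ["zpool", "import", pool_name]


if __name__ == "__main__":
    # Hypothetical pool and device names, printed rather than executed.
    print(" ".join(pool_create_command("tank", ["c0t1d0"])))
    print(" ".join(pool_import_command("tank")))
```

Running `zpool import` without a pool name lists the pools that are available for import, which is a useful first step when the pool names are not known.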
There's some scarce and fragmented documentation on the SmartOS wiki, with some help about creating zones, creating virtual machines, and other tasks. If you're not comfortable with the Solaris commands, you can also read the topic Finding Your Way Around a SmartMachine on the wiki of SmartMachines, Joyent's commercial cloud offering based on SmartOS.
As SmartOS is really stripped down to only the minimal bits needed to work as a virtualization host, many tools for other purposes are missing out of the box. To install extra software, you can use pkgsrc, NetBSD's portable package manager, by downloading the pkgsrc bootstrap image and unpacking it.
If you really want to have illumos KVM installed on hard disk instead and don't want to get your hands dirty with the manual installation of SmartOS, there's another option. OpenIndiana, the spiritual successor of the OpenSolaris distribution, recently released its development build 151a exactly one year after the initial release of the distribution. It's the first build based on the illumos core and it also integrates Joyent's KVM support. Installing OpenIndiana oi_151a gives you a GNOME desktop for a workstation or a text-based installation for headless servers, and the KVM bits can be installed with the pkg install command of the package manager IPS (Image Packaging System). The OpenIndiana wiki shows you the needed commands.
If anyone doubted that illumos would be able to build enough momentum, Joyent's KVM port to illumos and the subsequent illumos-based OpenIndiana development release have surely answered these doubts. Illumos appears to be here to stay, and it offers a lot of interesting technology, such as ZFS, DTrace, Crossbow, Zones, and now KVM. For Linux users who were interested in these Solaris technologies but wouldn't want to lose their favorite hypervisor KVM, SmartOS and OpenIndiana are now able to offer the best of both worlds.
| Index entries for this article | |
|---|---|
| GuestArticles | Vervloesem, Koen |
SmartOS: virtualization with ZFS and KVM
Posted Sep 22, 2011 7:11 UTC (Thu) by kragilkragil2 (guest, #76172) [Link]
What would be wrong with MS porting btrfs to Windows?
Posted Sep 22, 2011 12:29 UTC (Thu) by NRArnot (subscriber, #3033) [Link]
The difference I can see is that a filesystem operates down at the kernel level, and Microsoft would have to prove that they had not turned Windows into a derivation of btrfs. But if Windows booted from FAT or NTFS and then loaded a btrfs module from its boot partition, without any btrfs code or derived code being compiled into Windows, then why not? (Obviously, they'd have to comply with the GPL and offer the source of their btrfs module).
Whether that's do-able, I have no idea. Windows source is secret, btrfs source I haven't read. Related - there's the Linux network module which encapsulates and executes an NDIS driver written for Windows under Linux. A kluge, but no legal challenges that I've heard of.
In passing, I'd love to see a port of Linux LVM to Windows - probably less of a technical challenge, though I'd guess only MS could do it and MS probably has many reasons not to.
SmartOS: virtualization with ZFS and KVM
Posted Sep 22, 2011 15:28 UTC (Thu) by ballombe (subscriber, #9523) [Link]
SmartOS: virtualization with ZFS and KVM
Posted Sep 22, 2011 17:32 UTC (Thu) by zooko (guest, #2589) [Link]
SmartOS: virtualization with ZFS and KVM
Posted Sep 25, 2011 1:09 UTC (Sun) by Rudd-O (subscriber, #61155) [Link]
(I use this on my laptop and NAS. It's brutally good.)
SmartOS: virtualization with ZFS and KVM
Posted Sep 22, 2011 15:43 UTC (Thu) by aliguori (guest, #30636) [Link]
The KVM kernel module is actually an exceptionally high quality piece of the kernel IMHO.
SmartOS: virtualization with ZFS and KVM
Posted Sep 23, 2011 21:59 UTC (Fri) by ewen (subscriber, #4772) [Link]
In case anyone else is interested, there's a copy of the slides (PDF) which doesn't involve dealing with Slideshare (Slideshare needs Flash and/or a login, and historically seemed to display slides at a tiny size; it's always struck me as the least sharing way to "share" slides :-(). Also the talk appears to be on YouTube now (40 minutes long), apparently uploaded by Joyent themselves.
Ewen
Derived works
Posted Sep 27, 2011 21:06 UTC (Tue) by robbe (subscriber, #16131) [Link]
Derived works
Posted Oct 3, 2011 17:13 UTC (Mon) by filipjoelsson (guest, #2622) [Link]
Nested virtualisation
Posted Sep 27, 2011 21:11 UTC (Tue) by robbe (subscriber, #16131) [Link]
Every problem in the datacenter can be solved via another layer of virtualisation.
Why not just use Linux?
Posted Sep 29, 2011 11:52 UTC (Thu) by slashdot (guest, #22014) [Link]
Why not just use Linux?
Posted Sep 29, 2011 13:06 UTC (Thu) by anselm (subscriber, #2796) [Link]
Judging from their web site they are big fans of ZFS. If ZFS, and in particular the speed of ZFS, is your major selling point, it makes sense to hang on to Solaris, especially if you manage to port KVM in a way that doesn't violate its license. Consider that the license on ZFS precludes putting it into the Linux kernel, so running Linux is currently not a viable proposition for ZFS speed freaks.
Why not just use Linux?
Posted Sep 29, 2011 17:32 UTC (Thu) by dlang (guest, #313) [Link]
If they can't beat the license incompatibility, they can't distribute the result anyway.
Why not just use Linux?
Posted Sep 30, 2011 0:23 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
Videos of presentation available
Posted Oct 5, 2011 22:29 UTC (Wed) by dowdle (subscriber, #659) [Link]
http://www.youtube.com/watch?v=cwAfJywzk8o (flv and mp4)
http://www.archive.org/details/KvmForum2011Presentations (webm)
Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds