
SmartOS: virtualization with ZFS and KVM

September 21, 2011

This article was contributed by Koen Vervloesem

On August 15, at the KVM Forum 2011, Bryan Cantrill, VP Engineering at Joyent, gave a presentation entitled "Experiences Porting KVM to SmartOS." The SmartOS in the title is Joyent's illumos-based operating system that is the foundation of its public cloud and its SmartDataCenter product. With this talk, Cantrill essentially announced that Joyent has ported KVM to the illumos (Solaris) kernel.

Thanks to its illumos base, Joyent's SmartOS already had several key features for a cloud operating system, such as the ZFS file system, the dynamic tracing possibilities of DTrace, network virtualization with Crossbow, and operating system-level virtualization (Zones) to isolate virtual operating systems, all running on the same kernel. However, one essential piece was missing in this puzzle of enterprise technologies: hardware virtualization. Granted, a few years ago OpenSolaris had Xen Dom0 support (called xVM), even with hardware virtualization, but the project was abandoned even before Oracle walked away from OpenSolaris.

Joyent (which is a member of the Open Virtualization Alliance dedicated to the awareness and adoption of KVM) believes in the thesis that the best hypervisor is the host operating system itself, because anyone attempting to implement a thin hypervisor would end up retracing the history of operating systems. This is exactly the vision of KVM, so when Joyent decided in the fall of last year that it needed to port KVM to SmartOS, this was a natural (but not trivial) choice.

Because its resources were constrained, Joyent decided to focus exclusively on KVM support for Intel processors. More specifically, a machine running KVM on illumos needs an Intel processor with VT-x and EPT (Extended Page Tables), such as the Nehalem Core i3/i5/i7. However, the developers made sure that they didn't make decisions that would impede later AMD support. Also, only x86-64 hosts and x86 and x86-64 guests are supported. Apart from these constraints, one of the design goals was that the KVM port to illumos would maintain compatibility with the QEMU/KVM interface as much as possible.

Porting an unportable component

At first sight, it seems impossible to port a component that is essentially not designed to be portable: KVM is very specific to Linux. Some of the Linux-specific facilities that KVM uses could be emulated in the illumos kernel, but, because of the big differences between the Linux and illumos kernels, this would not be a clean solution. Joyent engineer Max Bruning started working on the port in the fall of 2010 by copying the KVM bits from the stable Linux 2.6.34 source and getting them to compile on illumos; in April 2011 he was joined by Robert Mustacchi and Bryan Cantrill. As Cantrill explained in his presentation, DTrace (co-invented by Cantrill when he was working at Sun) was essential in the porting process: it let them understand how much the still unported code was used by virtual machines.

In his blog post KVM on illumos, Cantrill explains why the porting effort was so grueling:

Because KVM is so tightly integrated into the Linux kernel, it was difficult to determine dependencies - and hard to know when to pull in the Linux implementation of a particular facility or function versus writing our own or making a call to the illumos equivalent (if any).

According to Joyent's measurements, the illumos KVM port performs as well as Linux KVM, at bare-metal speeds for entirely CPU-bound workloads. For other workloads, such as a MySQL benchmark, the performance is obviously not at full bare-metal speed, but the Linux and illumos implementations of KVM don't diverge much from each other. Tested guest operating systems include SmartOS, 64-bit Linux, Windows Server 2008 R2, FreeBSD, OpenBSD, QNX, Plan 9 and Haiku. As for VM resources, the same limitations as for Linux KVM apply: up to 256 virtual CPUs per virtual machine and up to 2 TB virtual memory.

Limitations and enhancements

Illumos KVM diverges from Linux KVM in some areas. For starters, apart from the focus on Intel and x86/x86-64 there are some limitations in functionality. As a cloud provider, Joyent doesn't believe in overselling memory, so it locks down guest memory in KVM: if the host hasn't enough free memory to lock down all the needed memory for the virtual machine, the guest fails to start. Using the same argument, Joyent also didn't implement kernel same-page merging (KSM), the Linux functionality for memory deduplication. According to Cantrill's presentation, it's technically possible to implement this in illumos, but Joyent doesn't see an acute need for it. Another limitation is that illumos KVM doesn't support nested virtualization.
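The lock-down policy described above amounts to a simple admission check at guest start. The following is only a toy model of that policy, not Joyent's implementation: the Host class, its fields, and the megabyte figures are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Toy model (hypothetical) of a host that never oversells memory."""
    total_mb: int
    locked_mb: int = 0  # memory already locked down for running guests

    def free_mb(self) -> int:
        """Memory that can still be locked down for a new guest."""
        return self.total_mb - self.locked_mb

    def start_guest(self, guest_mb: int) -> bool:
        """Lock the guest's whole memory up front, or refuse to start it."""
        if guest_mb > self.free_mb():
            return False  # not enough lockable memory: the guest fails to start
        self.locked_mb += guest_mb
        return True


host = Host(total_mb=16384)
print(host.start_guest(8192))  # first guest fits
print(host.start_guest(8192))  # second guest uses up the remainder
print(host.start_guest(1024))  # nothing left to lock, so this one is refused
```

The point of the sketch is that a guest is either given all of its memory or not started at all; there is no state in which guests compete for overcommitted memory.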

However, tying KVM to illumos instead of Linux also makes some interesting enhancements possible. For instance, you can create ZFS volumes for your virtual machine images. At first sight, this looks like just a convenient way to store your VM images, but it really improves the virtualization experience. Because ZFS can clone volumes in constant time, you can provision new KVM guests nearly instantly if you already have a reference image. Moreover, ZFS remote replication with the zfs send and zfs receive commands makes an efficient foundation for remote cloning and migration of virtual machines over the network. To streamline this, Cantrill intends to integrate QEMU migration with ZFS remote replication. Also, because ZFS has a unified adaptive replacement cache (ARC), guest I/O will be efficiently cached in the host, resulting in improved random I/O operations.
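As a rough sketch of the workflow described above, the Python helpers below only assemble the zfs command lines involved; they do not run anything. The dataset names, snapshot names, and remote host are hypothetical, and the use of ssh as the transport for zfs send is an assumption, not something the article specifies.

```python
import shlex
from typing import List


def clone_commands(reference: str, snapshot: str, guest: str) -> List[List[str]]:
    """Commands to provision a guest volume from a reference image.

    A ZFS clone is always created from a snapshot, so the reference
    volume is snapshotted first.
    """
    snap = f"{reference}@{snapshot}"
    return [
        ["zfs", "snapshot", snap],
        ["zfs", "clone", snap, guest],
    ]


def replicate_pipeline(dataset: str, snapshot: str, remote: str, target: str) -> str:
    """Shell pipeline that streams a snapshot to another host.

    Assumes ssh as the transport between the two machines.
    """
    send = ["zfs", "send", f"{dataset}@{snapshot}"]
    receive = ["ssh", remote, "zfs", "receive", target]
    return f"{shlex.join(send)} | {shlex.join(receive)}"


# Hypothetical names: a reference image and a new guest volume in pool "zones".
for cmd in clone_commands("zones/ref-image", "gold", "zones/guest1-disk0"):
    print(shlex.join(cmd))
print(replicate_pipeline("zones/guest1-disk0", "migrate", "host2", "zones/guest1-disk0"))
```

Because the clone shares its blocks with the snapshot it was created from, the cost of provisioning does not depend on the size of the reference image.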

Another improvement that illumos makes possible is the use of Zones, the OS virtualization feature. While the illumos KVM implementation doesn't require it, SmartOS runs KVM guests in a local zone, with the QEMU process as the only process running in the zone. While zones were originally used for resource management, quality of service, I/O throttling, and so on, containing QEMU in its own zone also improves security by reducing the attack surface for QEMU exploits. If an exploit 'breaks out of the virtual machine', it's still contained in the local zone and has no access to the global zone of the virtualization host or the local zones of the other virtualization guests.

Another interesting feature of illumos is network virtualization, also known as Crossbow. With a few commands, you can create a virtual network interface card (VNIC) per KVM guest. The SmartOS developers have written some glue code to connect this feature to virtio and have been able to attain 1Gbit/second data rates to/from a KVM guest. VNICs also make it easier to manage the virtual machine's network usage. Thanks to flow management, guests can be capped at specified levels of bandwidth, and guests can be confined to specified IP addresses, thereby making IP spoofing impossible.
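One possible way to express the per-guest VNIC setup is with the dladm command and its link properties. The sketch below only builds the command lines and assumes that the maxbw, allowed-ips, and protection link properties behave as described in the illumos dladm man page; the link name, VNIC name, bandwidth cap, and IP address are hypothetical, and this is not necessarily how SmartOS itself configures its guests.

```python
import shlex
from typing import List


def vnic_commands(physical_link: str, vnic: str, max_bw: str, allowed_ip: str) -> List[List[str]]:
    """Commands to create a VNIC, cap its bandwidth, and pin its IP address."""
    return [
        # Create the VNIC on top of a physical link.
        ["dladm", "create-vnic", "-l", physical_link, vnic],
        # Cap the bandwidth available to the VNIC.
        ["dladm", "set-linkprop", "-p", f"maxbw={max_bw}", vnic],
        # Restrict the VNIC to one IP address and enable anti-spoofing.
        ["dladm", "set-linkprop", "-p", f"allowed-ips={allowed_ip}", vnic],
        ["dladm", "set-linkprop", "-p", "protection=ip-nospoof", vnic],
    ]


# Hypothetical values for a single guest.
for cmd in vnic_commands("e1000g0", "vnic0", "100M", "10.0.0.5"):
    print(shlex.join(cmd))
```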

And last but not least, the dynamic tracing possibilities of the DTrace framework let system administrators understand the workload characteristics of their virtual machines and help with troubleshooting. For this purpose, Joyent has added some DTrace probes to QEMU and KVM to examine the behavior of KVM guests. They have even integrated several of these metrics into their Cloud Analytics tool to visualize the KVM guest behavior in graphs. In his presentation, Cantrill even suggests that, thanks to the better visibility into guest behavior, DTrace will help in finding performance improvements for KVM, which will likely carry over from illumos KVM to the original Linux implementation.

The source

Joyent was already a big contributor to illumos, the successor to the OpenSolaris community. However, their KVM port is the first major addition of functionality to the illumos source since Oracle let the OpenSolaris community die. The source code is published in two parts: a GitHub repository for KVM itself (illumos-kvm) and one with some minor patches to QEMU 0.14.1, all of which they intend to upstream (illumos-kvm-command). Other KVM-specific tooling (such as the kvmstat command for monitoring of KVM statistics) has already been upstreamed to illumos itself. According to Joyent, this port is at or near production quality.

Joyent hasn't pushed any bug fixes back to KVM, but the reason for this is quite simple: they didn't find any bugs in KVM. Cantrill explains this in an email interview:

I was actually surprised by this: while I knew that KVM broadly worked, I would have assumed that we would have found some bug at some point in KVM – but all of the bugs we found in the course of the project were essentially self-inflicted wounds. The high quality of KVM is a tribute to both Avi Kivity and to the KVM engineering team – but also to the folks who have put together the automated testing for KVM: after having met Lucas Rodrigues and the KVM autotest team, it's clear that the quality in KVM is due at least in part to a superlative verification effort.

As for the enhancements Joyent's team made to illumos KVM: all of them are specific to illumos features like ZFS, DTrace, Zones, Crossbow, kstats, and the like. As these features do not exist on a Linux host, it doesn't make any sense to upstream them.

But of course there have been a lot of changes to Linux KVM since 2.6.34, the version on which illumos KVM is based. Cantrill is not very concerned about this, as he explains:

In part because KVM is so rock-solid, we are less concerned about being based on Linux 2.6.34: we are monitoring patches against 2.6.34, and will incorporate those patches into our implementation as appropriate, but we don't feel a desire to track Linux KVM any more tightly than that. The features that have been implemented since 2.6.34 are not ones that we feel strongly about integrating. For example, nested virtualization adds a tremendous amount of complexity but brings essentially no value for us – we are using KVM in a datacenter environment where nested virtualization is of dubious utility. All of this is not to say that we won't revisit this in the future – but for now we are 2.6.34-based.

The license

Of course some questions arose about the license: Joyent has copied the GPL-ed KVM code from Linux, while the illumos kernel uses the CDDL (Common Development and Distribution License). However, according to Cantrill this doesn't pose any problems. On his blog he answers a question from a reader about the issue:

Our KVM port remains GPL and its own work (and lives in its own repo) - the illumos kernel is CDDL but is in no way a derived work of our KVM port.

And on Hacker News he clarifies that their KVM port doesn't use the hooks that Linux KVM has into the Linux kernel (which are marked as EXPORT_SYMBOL_GPL in the Linux kernel): "Actually, our port does not use these hooks – there were zero mods to the illumos kernel to support KVM per se." So, although there seem to be some questions about the legality of the KVM module in illumos, the developers are fairly confident that the problems don't apply because the illumos kernel (CDDL-licensed) is not a derived work of the illumos KVM module (GPL-licensed).

SmartOS and OpenIndiana

SmartOS, for which you can find the source on GitHub (smartos-live), can be downloaded as an ISO or USB image, and it's a minimal live distribution meant to run as a virtualization server. SmartOS just boots from a USB stick or CD-ROM and runs from RAM. It's not meant to be installable, and Joyent doesn't intend to develop an installer, but with some elaborate commands it's possible to install SmartOS on a hard disk.

When running SmartOS from RAM, changes made on the running system naturally don't persist across boots. This doesn't have to be a big issue, as long as you don't want to change your operating system's configuration. Just create one or more ZFS pools with zpool create to initialize your hard disks and to be able to store data on them. However, because of the transient nature of SmartOS, you have to manually import all pools with the zpool import command after each boot.
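Pool creation and the import after each boot both go through the zpool command. A minimal sketch, again only assembling the command lines; the pool name and disk names are hypothetical:

```python
import shlex
from typing import List


def create_pool(pool: str, disks: List[str]) -> List[str]:
    """One-time step: create a pool on the given disks."""
    return ["zpool", "create", pool, *disks]


def import_all_pools() -> List[str]:
    """After each boot: import every pool found on the attached disks."""
    return ["zpool", "import", "-a"]


# Hypothetical pool and disk names.
print(shlex.join(create_pool("zones", ["c0t0d0", "c0t1d0"])))
print(shlex.join(import_all_pools()))
```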

There's some scarce and fragmented documentation on the SmartOS wiki, with some help about creating zones, creating virtual machines, and other tasks. If you're not comfortable with the Solaris commands, you can also read the topic Finding Your Way Around a SmartMachine on the wiki of SmartMachines, Joyent's commercial cloud offering based on SmartOS.

As SmartOS is stripped down to the minimal bits needed to work as a virtualization host, many tools for other purposes are lacking out of the box. To install extra software, you can use pkgsrc, NetBSD's portable package manager, by downloading the pkgsrc bootstrap image and unpacking it.

If you really want to have illumos KVM installed on a hard disk instead and don't want to get your hands dirty with the manual installation of SmartOS, there's another option. OpenIndiana, the spiritual successor of the OpenSolaris distribution, recently released its development build 151a exactly one year after the initial release of the distribution. It's the first build based on the illumos core and it also integrates Joyent's KVM support. Installing OpenIndiana oi_151a gives you a GNOME desktop for a workstation or a text-based installation for headless servers, and the KVM bits can be installed with the pkg install command of the package manager IPS (Image Packaging System). The OpenIndiana wiki shows you the needed commands.

Conclusion

If anyone doubted that illumos would be able to build enough momentum, Joyent's KVM port to illumos and the subsequent illumos-based OpenIndiana development release have surely answered these doubts. Illumos appears to be here to stay, and it offers a lot of interesting technology, such as ZFS, DTrace, Crossbow, Zones, and now KVM. For Linux users who were interested in these Solaris technologies but wouldn't want to lose their favorite hypervisor KVM, SmartOS and OpenIndiana are now able to offer the best of both worlds.



SmartOS: virtualization with ZFS and KVM

Posted Sep 22, 2011 7:11 UTC (Thu) by kragilkragil2 (guest, #76172) [Link]

Great interesting project, thanks for the article. I didn't think the KVM code is totally flawless(tm) (Linus seems to complain a lot about Avi).
My guess is that RMS wouldn't concur with their licensing logic. If their reasoning is true MS could incorporate BtrFS into Windows as long as they release their BtrFS changes. That is probably not what the FSF envisioned for the GPL.

What would be wrong with MS porting btrfs to Windows?

Posted Sep 22, 2011 12:29 UTC (Thu) by NRArnot (subscriber, #3033) [Link]

What would be the difference between a port of btrfs to Windows and the port of (say) Ghostscript to Windows? Lots of open-source projects do have Windows ports these days.

The difference I can see is that a filesystem operates down at the kernel level, and Microsoft would have to prove that they had not turned Windows into a derivation of btrfs. But if Windows booted from FAT or NTFS and then loaded a btrfs module from its boot partition, without any btrfs code or derived code being compiled into Windows, then why not? (Obviously, they'd have to comply with the GPL and offer the source of their btrfs module).

Whether that's do-able, I have no idea. Windows source is secret, btrfs source I haven't read. Related - there's the Linux network module which encapsulates and executes an NDIS driver written for Windows under Linux. A kluge, but no legal challenges that I've heard of.

In passing, I'd love to see a port of Linux LVM to Windows - probably less of a technical challenge, though I'd guess only MS could do it and MS probably has many reasons not to.

SmartOS: virtualization with ZFS and KVM

Posted Sep 22, 2011 15:28 UTC (Thu) by ballombe (subscriber, #9523) [Link]

More to the point: someone could port ZFS to Linux and make the same 'not a derived work' claim.

SmartOS: virtualization with ZFS and KVM

Posted Sep 22, 2011 17:32 UTC (Thu) by zooko (subscriber, #2589) [Link]

SmartOS: virtualization with ZFS and KVM

Posted Sep 25, 2011 1:09 UTC (Sun) by Rudd-O (subscriber, #61155) [Link]

WOOHOO!

(I use this on my laptop and NAS. It's brutally good.)

SmartOS: virtualization with ZFS and KVM

Posted Sep 22, 2011 15:43 UTC (Thu) by aliguori (guest, #30636) [Link]

Linus complains about a lot :-) I don't think he has complained about Avi more than anyone else really. To my knowledge, it's always been process related too (form-based pull request was the most recent one).

The KVM kernel module is actually an exceptionally high quality piece of the kernel IMHO.

SmartOS: virtualization with ZFS and KVM

Posted Sep 23, 2011 21:59 UTC (Fri) by ewen (subscriber, #4772) [Link]

In case anyone else is interested, there's a copy of the slides (PDF) which doesn't involve dealing with Slideshare (Slideshare needs Flash and/or a login, and historically seemed to display slides at a tiny size; it's always struck me as the least sharing way to "share" slides :-(). Also the talk appears to be on YouTube now (40 minutes long), apparently uploaded by Joyent themselves.

Ewen

Derived works

Posted Sep 27, 2011 21:06 UTC (Tue) by robbe (subscriber, #16131) [Link]

I think Cantrill goes against strawmen here. Obviously, Illumos is not a work derived from KVM. But loading the KVM module into it may very well generate a derived work. If you buy into that, they are giving their customers a copyright-infringement kit. Either you put them on the hook for that (it works for bittorrent trackers...) or just go directly after their customers.

Derived works

Posted Oct 3, 2011 17:13 UTC (Mon) by filipjoelsson (subscriber, #2622) [Link]

The GPL2 is a distribution license, not a EULA. What you can do with the source is not regulated at all, unless you distribute the result. So could you please tell us what part of the copyright a user infringes upon by inserting a module?
Now, giving away a running laptop with the KVM module loaded may well be infringement, but it seems like a marginal case...

Nested virtualisation

Posted Sep 27, 2011 21:11 UTC (Tue) by robbe (subscriber, #16131) [Link]

Nested virtualisation is quite useful in our datacenter, where we use it to test the hypervisor itself and train people on it. Doing that in a VM that can be cloned, snapshotted, etc. beats diddling about with bare-metal anytime.

Every problem in the datacenter can be solved via another layer of virtualisation.

Why not just use Linux?

Posted Sep 29, 2011 11:52 UTC (Thu) by slashdot (guest, #22014) [Link]

Why not just use Linux instead of wasting time on porting KVM to Solaris?

Why not just use Linux?

Posted Sep 29, 2011 13:06 UTC (Thu) by anselm (subscriber, #2796) [Link]

Judging from their web site they are big fans of ZFS. If ZFS, and in particular the speed of ZFS, is your major selling point, it makes sense to hang on to Solaris, especially if you manage to port KVM in a way that doesn't violate its license. Consider that the license on ZFS precludes putting it into the Linux kernel, so running Linux is currently not a viable proposition for ZFS speed freaks.

Why not just use Linux?

Posted Sep 29, 2011 17:32 UTC (Thu) by dlang (✭ supporter ✭, #313) [Link]

if they can beat the license incompatibility of putting Linux code into Solaris, why wouldn't the exact same approach work for putting Solaris code into Linux?

If they can't beat the license incompatibility, they can't distribute the result anyway.

Why not just use Linux?

Posted Sep 30, 2011 0:23 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

Hasn't Linus also stated that even if ZFS were GPL he wouldn't accept it? Something about it doing too much in an fs layer or something.

Videos of presentation available

Posted Oct 5, 2011 22:29 UTC (Wed) by dowdle (subscriber, #659) [Link]

Just wanted to mention that the video of this presentation has been posted on youtube and on archive.org:

http://www.youtube.com/watch?v=cwAfJywzk8o (flv and mp4)

http://www.archive.org/details/KvmForum2011Presentations (webm)

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds