AlacrityVM

By Jake Edge
August 5, 2009

While virtualization has been a boon for many users and data centers, it tends to suffer from performance problems, particularly with I/O. Addressing that problem is the goal of a newly announced project, AlacrityVM, which has created a hypervisor based on KVM. By shortening the I/O path for guests, AlacrityVM seeks to provide I/O performance near that of "bare metal" hardware.

The project is in a "pre-alpha" stage, according to the web page, but it is already reporting some fairly impressive results from a proof-of-concept network driver. For both throughput and latency, the AlacrityVM guest's performance compared favorably to that of 2.6.28 and 2.6.29-rc8 hosts. It also clearly outperformed the virtio drivers in a KVM guest.

The major change that allows AlacrityVM to achieve those gains comes from a new kernel-based virtual I/O scheme known as Virtual-Bus (or vbus). Currently, KVM guests use emulated devices—implemented in user space by QEMU—in order to handle I/O requests. That leads to multiple kernel-to-user-space transitions for each I/O operation. The idea behind vbus is to allow guests to access the host kernel driver directly, thus reducing the overhead for I/O.

Using vbus, a host administrator can define a virtual bus that contains virtual devices—closely patterned on the Linux device model—which allow access to the underlying kernel driver. The guest accesses the bus through vbus guest drivers and will only be able to use those devices that the administrator explicitly instantiates on that vbus. The vbus interface supports only two "verbs": call() for synchronous requests, and shm() for asynchronous communication using shared memory.
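
To make that interface concrete, here is a minimal user-space C sketch of a two-verb device; the structure and function names are assumptions made for the example, not the actual vbus API.

    /*
     * Illustrative model only -- the structure and function names below
     * are assumptions for this sketch, not the actual vbus API.
     */
    #include <stdio.h>

    /* A hypothetical two-verb device interface: call() for synchronous
     * requests and shm() for registering shared memory used for
     * asynchronous communication. */
    struct vbus_device_ops {
        int (*call)(void *dev, unsigned int op, void *data, size_t len);
        int (*shm)(void *dev, int shm_id, void *addr, size_t len);
    };

    /* Toy "device" that just logs what it was asked to do. */
    static int toy_call(void *dev, unsigned int op, void *data, size_t len)
    {
        (void)dev; (void)data;
        printf("call: op=%u len=%zu\n", op, len);
        return 0;
    }

    static int toy_shm(void *dev, int shm_id, void *addr, size_t len)
    {
        (void)dev; (void)addr;
        printf("shm: id=%d len=%zu registered\n", shm_id, len);
        return 0;
    }

    int main(void)
    {
        struct vbus_device_ops ops = { .call = toy_call, .shm = toy_shm };
        static char ring[4096];

        /* A guest driver would first publish its ring buffer ... */
        ops.shm(NULL, 0, ring, sizeof(ring));
        /* ... and then issue synchronous requests as needed. */
        ops.call(NULL, 1 /* hypothetical "enable" operation */, NULL, 0);
        return 0;
    }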

A document [PDF] by AlacrityVM developer Gregory Haskins describes how to configure and use vbus. Vbus provides a sysfs interface that an administrator can use to create container-like objects that will constrain guests so that they can only access those devices specifically configured for their use. That helps alleviate one of the potential problems with guests accessing kernel drivers more or less directly: security.
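
As a rough idea of what such sysfs-driven configuration could look like from a management tool, consider the following small C sketch; the paths and attribute names are placeholders rather than the layout documented by the project.

    /*
     * A sketch of driving a sysfs-style administrative interface from C.
     * The paths and attribute names are placeholders invented for this
     * example, not the layout documented by the project.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Write a string to a sysfs attribute file; returns 0 on success. */
    static int sysfs_write(const char *path, const char *value)
    {
        int fd = open(path, O_WRONLY);
        if (fd < 0) {
            perror(path);
            return -1;
        }
        ssize_t n = write(fd, value, strlen(value));
        close(fd);
        return (n == (ssize_t)strlen(value)) ? 0 : -1;
    }

    int main(void)
    {
        /* Create a new bus container for one guest (hypothetical path). */
        if (sysfs_write("/sys/vbus/instances/create", "guest0-bus"))
            return 1;
        /* Instantiate a virtual ethernet device on that bus. */
        if (sysfs_write("/sys/vbus/instances/guest0-bus/create", "venet"))
            return 1;
        printf("bus configured\n");
        return 0;
    }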

The vbus web page has a look at the security issues and how they are handled. The main concerns are ensuring that guests cannot use the vbus mechanism to escape their isolation from other guests and processes, as well as making sure that guests cannot cause a denial of service on the host. The bus can only be created and populated on the host side, and each bus lives in an isolated namespace, which reduces or eliminates the risk that a cross-bus exploit could violate that isolation. In addition, each task can only be associated with one vbus—enforced by putting a vbus reference in the task struct—so that a guest can only see the device ids specified for that bus.
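
The following self-contained C model illustrates that "one bus per task" visibility rule; its structures are invented for illustration and do not reflect the kernel's actual data structures.

    /*
     * Userspace model of the "one vbus per task" rule; the structures
     * here are invented for illustration and are not the kernel's own.
     */
    #include <stdio.h>
    #include <string.h>

    #define MAX_DEVICES 4

    struct vbus {
        const char *name;
        const char *device_ids[MAX_DEVICES]; /* devices placed on this bus */
    };

    struct task {
        const char *comm;
        struct vbus *vbus;   /* the single bus this task may see */
    };

    /* A task can only resolve device ids that exist on its own bus. */
    static int task_can_see(const struct task *t, const char *dev_id)
    {
        if (!t->vbus)
            return 0;
        for (int i = 0; i < MAX_DEVICES && t->vbus->device_ids[i]; i++)
            if (strcmp(t->vbus->device_ids[i], dev_id) == 0)
                return 1;
        return 0;
    }

    int main(void)
    {
        struct vbus bus = { "guest0-bus", { "venet0" } };
        struct task guest = { "kvm-guest0", &bus };

        printf("venet0: %s\n", task_can_see(&guest, "venet0") ? "visible" : "hidden");
        printf("venet1: %s\n", task_can_see(&guest, "venet1") ? "visible" : "hidden");
        return 0;
    }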

Care was taken in the vbus implementation to ensure that guests, rather than the host, pay the price for any misbehavior. The two areas mentioned are guests that, maliciously or otherwise, mangle data structures in the shared memory or fail to service their ring buffer. A naïve implementation could allow these conditions to cause a denial of service by stalling host OS threads or by creating a condition that might normally be handled by a BUG_ON(). Vbus takes steps to ensure that the host-to-guest path is resistant to stalling, while also aborting guests that write garbage to the ring buffer data structures.
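
A host-side ring consumer written in that defensive style might look something like the sketch below, which validates guest-written descriptors and aborts the offending guest instead of crashing the host; the descriptor layout is invented for the example.

    /*
     * Sketch of defensive ring-buffer consumption on the host side,
     * assuming the guest may write garbage; the descriptor layout is
     * invented for illustration.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define RING_SIZE 16
    #define BUF_BYTES 4096u

    struct desc {
        uint32_t offset;  /* offset into the shared data area */
        uint32_t len;     /* length of the payload */
    };

    struct ring {
        struct desc desc[RING_SIZE];
        uint32_t head, tail;
    };

    /* Validate a guest-supplied descriptor before touching the data. */
    static int desc_valid(const struct desc *d)
    {
        return d->len <= BUF_BYTES && d->offset <= BUF_BYTES - d->len;
    }

    /* Consume one entry; a negative return signals that the guest should
     * be torn down rather than crashing the host with a BUG_ON(). */
    static int ring_consume(struct ring *r)
    {
        if (r->tail == r->head)
            return 0;                     /* nothing to do */
        struct desc *d = &r->desc[r->tail % RING_SIZE];
        if (!desc_valid(d)) {
            fprintf(stderr, "bad descriptor, aborting guest\n");
            return -1;                    /* punish the guest, not the host */
        }
        /* ... copy or map d->len bytes at d->offset here ... */
        r->tail++;
        return 1;
    }

    int main(void)
    {
        struct ring r = { .head = 1 };
        r.desc[0] = (struct desc){ .offset = 5000, .len = 100 }; /* bogus */
        if (ring_consume(&r) < 0)
            printf("guest terminated\n");
        return 0;
    }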

Haskins has posted a series of patches to add the vbus infrastructure, along with a driver for accelerated ethernet. So far, the patches seem to be fairly well-received, though there are not yet many comments. The web page makes it clear that the project's goal is "to work towards upstream acceptance of the project on a timeline that suits the community". The flexibility shown in that goal should serve the project well in getting mainline acceptance down the road.

The project sums up its status and future plans on the web page as well: "we have a working design which includes the basic hypervisor, linux-guest support, and accelerated networking. We will be expanding this to include other areas of importance, such as accelerated disk-io, IPC, real-time extensions, and accelerated MS Windows guest support." As one might guess, the web page also has mailing lists for users and developers as well as kernel and user-space git trees available for interested folks.

AlacrityVM and vbus both look to be interesting projects that are probably worth investigating as potential virtualization solutions in the future. The performance gains that come with vbus make it likely to be useful to other projects as well.



AlacrityVM

Posted Aug 6, 2009 7:56 UTC (Thu) by rahulsundaram (subscriber, #21946) [Link]

Is this a series of patches on top of KVM or a fork of some sort? Seems to be missing some history or context.

paravirtualised drivers?

Posted Aug 6, 2009 10:21 UTC (Thu) by alex (subscriber, #1355) [Link]

I seem to remember a virtualisation session at OLS 2008 where the discussion was about both Xen and KVM moving to common paravirtualised drivers for guests to speed up I/O. Isn't this basically the same thing?

paravirtualised drivers?

Posted Aug 7, 2009 13:09 UTC (Fri) by ghaskins (guest, #49425) [Link]

> I seem to remember a virtualisation session at OLS 2008 where the discussion
> was about both Xen and KVM moving to common paravirtualised drivers for guests
> to speed up I/O. Isn't this basically the same thing?

It's the same in that these are also PV drivers. It's different in that this is a much faster PV
infrastructure than the currently deployed versions. Ideally we will be able to use the same
drivers and just swap out the inefficient part on the hypervisor side. As of right now, the drivers
are also different (venet vs virtio-net), but this may change in the future.

In addition, it is also infrastructure that allows us to do new kinds of PV operations, such as
supporting real-time guests.

(Note: the graphs posted are against the virtio-net based PV drivers that were probably the
result of that presentation you saw at OLS)

-Greg

AlacrityVM

Posted Aug 6, 2009 13:12 UTC (Thu) by ghaskins (guest, #49425) [Link]

The patches apply on top of KVM.

AlacrityVM

Posted Aug 6, 2009 10:04 UTC (Thu) by dunlapg (subscriber, #57764) [Link]

And what kind of support does this require in the guest? Is this another paravirtualized interface (along the lines of Xen's frontend/backend interface)?

AlacrityVM

Posted Aug 6, 2009 13:18 UTC (Thu) by ghaskins (guest, #49425) [Link]

> what kind of support does this require in the guest?

You load drivers in the guest for various I/O subsystems (network, disk, etc.).

> Is this another paravirtualized interface

Yes, though it is not entirely orthogonal. For instance, it is possible to tunnel existing PV protocols
over it (e.g. virtio-net). This means you swap out the low-level protocol (virtio-pci is exchanged
for virtio-vbus) but the higher layer PV drivers (virtio-net, virtio-blk) remain unchanged.

-Greg

Virtualization and InfiniBand

Posted Aug 7, 2009 17:46 UTC (Fri) by abacus (guest, #49001) [Link]

I'm surprised that, although several virtualization implementations are looking at high-speed I/O, none of them is using the InfiniBand (IB) stack. With current IB hardware, data rates of up to 2.4 GB/s are possible -- between different systems. This is because the IB stack has been designed for high throughput and low latency. Linux's IB stack already has implementations of networking and storage drivers. So implementing a single IB driver would allow virtualization software to reuse the drivers in the IB stack.

Virtualization and InfiniBand

Posted Aug 7, 2009 18:04 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

This is because the IB stack has been designed for high throughput and low latency.

As opposed to what? Doesn't every protocol seek to be fast? Does IB make different tradeoffs than the alternatives?

Virtualization and InfiniBand

Posted Aug 7, 2009 18:37 UTC (Fri) by abacus (guest, #49001) [Link]

> > This is because the IB stack has been designed for high throughput and low latency.

> As opposed to what? Doesn't every protocol seek to be fast? Does IB make different tradeoffs than the alternatives?

IB is a technology that comes from the supercomputing world and that has a higher throughput and a lower latency than any other popular storage or networking technology (IDE, SATA, 10 GbE, ...). Key features of IB are support for zero-copy I/O (RDMA) and the possibility of performing I/O without having to invoke any system call, even for user-space processes.

Some impressive graphs can be found in this paper: Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms.

Note: I'm not affiliated with any vendor of IB equipment.

Virtualization and InfiniBand

Posted Aug 7, 2009 20:31 UTC (Fri) by giraffedata (subscriber, #1954) [Link]

has a higher throughput and a lower latency than any other popular storage or networking technology (IDE [ATA], SATA, 10 GbE, ...)

Nonetheless, ATA, SATA, and 10 GbE were all designed to have high throughput and low latency. So one can't say that being designed for high throughput and low latency sets IB apart from them.

So what does? Were the IB engineers just smarter? Did they design for higher cost of implementation? Did they design for implementation technology that wasn't available when the alternatives were designed?

Virtualization and InfiniBand

Posted Aug 7, 2009 21:05 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

IB is faster and has lower latency, but it is significantly more expensive, has shorter cable-length restrictions, and (IIRC) many more wires in the cables (which makes them more expensive and more fragile)

IB was designed for a system interconnect within a rack (or a couple of nearby racks)

ATA and SATA aren't general interconnects, they are drive interfaces

10 GbE is a fair comparison for IB, but it was designed to allow longer cable runs, with fewer wires in the cable (being fairly compatible with existing cat5 type cabling)

Virtualization and InfiniBand

Posted Aug 8, 2009 7:24 UTC (Sat) by abacus (guest, #49001) [Link]

What you wrote above about cabling is correct but completely irrelevant in this discussion. What I proposed is to use the IB APIs (RDMA) and software stack (IPoIB, SDP, iSER, SRP, ...) for communication between a virtual machine and the host system. In such a setup no physical cables are necessary. However, an additional kernel driver that implements the RDMA API and allows communication between guest and host will be necessary in the virtual machine.

Virtualization and InfiniBand

Posted Aug 8, 2009 9:27 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

if you are talking about a virtual interface, why would you use either?

define a driver that does page allocation tricks to move data between the client and the host for zero-copy communication. at that point you beat anything that's designed for a real network.

then you can pick what driver to run on top of this interface, SCSI, IP, custom depending on what you are trying to talk to on the other side.

Virtualization and InfiniBand

Posted Aug 8, 2009 10:52 UTC (Sat) by abacus (guest, #49001) [Link]

As I wrote above, implementing an IB driver would allow reuse of a whole software stack (called OFED) and the implementations of several communication protocols. Yes, it is possible to develop all this from scratch, but that is more or less like reinventing the wheel.

Virtualization and InfiniBand

Posted Aug 13, 2009 4:54 UTC (Thu) by jgg (guest, #55211) [Link]

10GBASE-T is not compatible with Cat5e; it needs Cat6 cabling. It is also still a pipe dream; who knows what process node will be necessary to get acceptable cost+power. All 10GIGE stuff currently is deployed with CX-4 (identical to SDR IB) or XFP/SFP+ (still surprisingly expensive).

The big 10GIGE vendors are desperately pushing the insane FCoE stuff to try and get everyone to re-buy all their FC switches and HBAs since 10GIGe has otherwise been a flop. IB is 4x as fast, and 1/4th the cost of 10GIGE stuff from CISCO :|

Virtualization and InfiniBand

Posted Aug 8, 2009 7:42 UTC (Sat) by abacus (guest, #49001) [Link]

Yes, 10 GbE has been designed for low latency. But look at the numbers: the best performing 10 GbE interface today (Chelsio) has a latency of 3.8 us [1] while recent IB interfaces have a latency of 1.8 us [2]. The difference is small but it matters when communicating messages that are less than 64 KB in size. And IB interfaces do not cost more than 10 GbE interfaces that support iWARP.
IB interfaces have a lower latency than 10 GbE interfaces because the whole IB stack has been designed for low latency while 10 GbE had to remain compatible with Ethernet.

References:
  1. Chelsio about The Cybermedia Center at Osaka University.
  2. Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms.

Virtualization and InfiniBand

Posted Aug 8, 2009 9:24 UTC (Sat) by dlang (✭ supporter ✭, #313) [Link]

yes, IB has lower latencies than 10G ethernet

10G ethernet isn't low latency at all costs. it benefits/suffers from backwards compatibility issues.

it also allows for longer cable runs than IB

it's not that one is always better than the other, it's that each has its use

if you are wiring a cluster of computers and price is not an issue, then IB clearly wins

if you are wiring a building, then IB _can't_ do the job, but 10G Ethernet can, so it clearly wins

Virtualization and InfiniBand

Posted Aug 8, 2009 11:00 UTC (Sat) by abacus (guest, #49001) [Link]

As I wrote above, the physical limitations of IB are not relevant in the context of the AlacrityVM project. These limitations only apply to the physical layer of the IB protocol and not to the higher communication layers. By the way, the InfiniBand Architecture Specification is available online.

Virtualization and InfiniBand

Posted Aug 29, 2009 12:18 UTC (Sat) by abacus (guest, #49001) [Link]

Note: there are already kernel drivers in the Linux kernel that use this concept for communication between a virtual machine and the hypervisor or another virtual machine. These drivers are ibmvscsi (initiator, runs in the virtual machine) and ibmvstgt (target, runs in the entity exporting the SCSI device). See also Virtual SCSI adapters for more information.

Virtualization and InfiniBand

Posted Aug 29, 2009 17:13 UTC (Sat) by giraffedata (subscriber, #1954) [Link]

You've given an example I actually know something about, so I can comment further. You're talking about the mechanism used on IBM's System P processors (which come standard with virtual machines) to allow a server in virtual machine S to present a SCSI disk device to client virtual machine C.

The server in S bases its disk devices on real SCSI disk devices (e.g. it splits a 10G real SCSI disk into 5 2G disks, one for each of 5 client virtual machines), and the actual data transfer is conventional DMA done by the real SCSI HBA to the memory of C, using hypervisor facilities specifically designed for this I/O server VM application.

AFAICS the only infiniband-related part of this is SRP (SCSI RDMA (Remote DMA) Protocol). SRP is how the program running in S initiates (by communicating with C) that DMA into memory C owns, much as a server at the end of an IB cable might set up to transmit data down the IB wire into the client's memory.

And as I recall, SRP is simple and not especially fast or low-latency -- just what anybody would design if he needed to communicate DMA parameters. A person could be forgiven for just reinventing SRP for a particular application instead of learning SRP and reusing SRP code.

Virtualization and InfiniBand

Posted Sep 2, 2009 8:02 UTC (Wed) by xoddam (subscriber, #2322) [Link]

Wish *I* got a System P processor standard with *my* virtual machine!

High-performance I/O in Virtual Machines

Posted Aug 8, 2009 11:31 UTC (Sat) by abacus (guest, #49001) [Link]

In the past, research has been carried out on high-performance I/O in virtual machines running on the Xen hypervisor in a system equipped with an InfiniBand HCA. See also Jiuxing Liu et al., High Performance VMM-Bypass I/O in Virtual Machines, USENIX 2006, Boston.

High-performance I/O in Virtual Machines

Posted Aug 13, 2009 4:43 UTC (Thu) by jgg (guest, #55211) [Link]

The new IB HCAs when combined with the snazzy PCI express virtualization stuff let the guest safely talk directly to the hardware and this whole issue becomes fairly moot. I've heard some of the FCoE chips will be able to do the same thing too. Any serious deployment with shared storage will want to go that way.

High-performance I/O in Virtual Machines

Posted Aug 13, 2009 13:38 UTC (Thu) by mcmanus (subscriber, #4569) [Link]

"the snazzy PCI express virtualization stuff let the guest safely talk directly to the hardware"

SR-IOV et al is really cool, but it still leaves open the problems of hairpin routing, firewall enforcement, etc. where the alacrity approach really helps. There is talk of hardware answers to that too, but it's further down the pipe. This kind of hardware is going to be substantially more expensive in the short term as well, so having more efficient answers in software is a plus for the ecosystem overall.

Also interesting, not long after this article was published there was a patch made available to virtio (for pure kvm) that reduces the trips through userspace for that code too: https://lists.linux-foundation.org/pipermail/virtualizati...
