LWN: Comments on "AlacrityVM"
http://lwn.net/Articles/345296/
This is a special feed containing comments posted
to the individual LWN article titled "AlacrityVM".
Virtualization and InfiniBand
http://lwn.net/Articles/350289/rss
2009-09-02T08:02:56+00:00 - xoddam
<div class="FormattedComment">
Wish *I* got a System P processor standard with *my* virtual machine!<br>
</div>
Virtualization and InfiniBand
http://lwn.net/Articles/349773/rss
2009-08-29T17:13:25+00:00 - giraffedata
<p>
You've given an example I actually know something about, so I can comment further. You're talking about the mechanism used on IBM's System P processors (which come standard with virtual machines) to allow a server in virtual machine S to present a SCSI disk device to client virtual machine C.
<p>
The server in S bases its disk devices on real SCSI disk devices (e.g. it splits a 10 GB real SCSI disk into five 2 GB disks, one for each of five client virtual machines), and the actual data transfer is conventional DMA done by the real SCSI HBA to the memory of C, using hypervisor facilities specifically designed for this I/O server VM application.
<p>
AFAICS the only InfiniBand-related part of this is SRP (SCSI RDMA (Remote DMA) Protocol). SRP is how the program running in S initiates (by communicating with C) that DMA into memory C owns, much as a server at the end of an IB cable might set up to transmit data down the IB wire into the client's memory.
<p>
And as I recall, SRP is simple and not especially fast or low-latency -- just what anybody would design if he needed to communicate DMA parameters. A person could be forgiven for just reinventing SRP for a particular application instead of learning SRP and reusing SRP code.
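For illustration, the kind of request descriptor one might invent to communicate those DMA parameters could look like the sketch below; the names and layout are made up for this sketch and are not the real SRP wire format (Linux's include/scsi/srp.h has the genuine definitions).
<pre>
/* Illustrative only: a made-up request/response pair of the kind a server VM
 * and a client VM might exchange to set up a DMA transfer. Real SRP defines
 * its own wire format; the names below are invented for this sketch. */
#include <stdint.h>

struct fake_dma_request {
        uint64_t tag;          /* matches the eventual completion to this request */
        uint8_t  cdb[16];      /* the SCSI command itself */
        uint64_t buf_addr;     /* I/O address of the client's data buffer */
        uint32_t buf_len;      /* length of that buffer in bytes */
        uint32_t buf_key;      /* memory handle/key validating access to the buffer */
};

struct fake_dma_response {
        uint64_t tag;          /* same tag as the request */
        uint8_t  scsi_status;  /* SCSI status byte */
        uint32_t resid;        /* bytes not transferred, if any */
};
</pre>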
Virtualization and InfiniBand
http://lwn.net/Articles/349766/rss
2009-08-29T12:18:21+00:00 - abacus
Note: there are already kernel drivers in the Linux kernel that use this concept for communication between a virtual machine and the hypervisor or another virtual machine. These drivers are ibmvscsi (initiator, runs in the virtual machine) and ibmvstgt (target, runs in the entity exporting the SCSI device). See also <a href="http://publib.boulder.ibm.com/infocenter/systems/scope/hw/index.jsp?topic=/iphat/iphatvirtualscsi.htm">Virtual SCSI adapters</a> for more information.
High-performance I/O in Virtual Machines
http://lwn.net/Articles/346913/rss
2009-08-13T13:38:08+00:00 - mcmanus
<div class="FormattedComment">
"the snazzy PCI express virtualization stuff let the guest safely talk directly to the hardware"<br>
<p>
SR-IOV et al. are really cool, but they still leave open the problems of hairpin routing, firewall enforcement, etc., where the alacrity approach really helps. There is talk of hardware answers to that too, but it's further down the pipe. This kind of hardware is going to be substantially more expensive in the short term as well, so having more efficient answers in software is a plus for the ecosystem overall.<br>
<p>
Also interesting, not long after this article was published there was a patch made available to virtio (for pure kvm) that reduces the trips through userspace for that code too: <a href="https://lists.linux-foundation.org/pipermail/virtualization/2009-August/013525.html">https://lists.linux-foundation.org/pipermail/virtualizati...</a><br>
</div>
Virtualization and InfiniBand
http://lwn.net/Articles/346812/rss
2009-08-13T04:54:14+00:00 - jgg
<div class="FormattedComment">
10GBASE-T is not compatible with Cat5e; it needs Cat6 cabling. It is also still a pipe dream: who knows what process node will be necessary to get acceptable cost and power. All 10GigE gear is currently deployed with CX-4 (identical to SDR IB) or XFP/SFP+ (still surprisingly expensive).<br>
<p>
The big 10GigE vendors are desperately pushing the insane FCoE stuff to try to get everyone to re-buy all their FC switches and HBAs, since 10GigE has otherwise been a flop. IB is 4x as fast and a quarter of the cost of 10GigE gear from Cisco :|<br>
</div>
High-performance I/O in Virtual Machines
http://lwn.net/Articles/346811/rss
2009-08-13T04:43:42+00:00 - jgg
<div class="FormattedComment">
The new IB HCAs, when combined with the snazzy PCI Express virtualization stuff, let the guest safely talk directly to the hardware, and this whole issue becomes fairly moot. I've heard some of the FCoE chips will be able to do the same thing too. Any serious deployment with shared storage will want to go that way.<br>
</div>
High-performance I/O in Virtual Machines
http://lwn.net/Articles/346027/rss
2009-08-08T11:31:14+00:00 - abacus
In the past, research has been carried out on high-performance I/O in virtual machines running on the Xen hypervisor on a system equipped with an InfiniBand HCA. See also Jiuxing Liu et al., <a href="http://www.cc.gatech.edu/classes/AY2007/cs8803hpc_fall/papers/dk-vmmio.pdf"><em>High Performance VMM-Bypass I/O in Virtual Machines</em></a>, USENIX 2006, Boston.
Virtualization and InfiniBand
http://lwn.net/Articles/346022/rss
2009-08-08T11:00:11+00:00 - abacus
As I wrote above, the physical limitations of IB are not relevant in the context of the AlacrityVM project. These limitations only apply to the physical layer of the IB protocol, not to the higher communication layers. By the way, the <a href="http://www.infinibandta.org/content/pages.php?pg=technology_download">InfiniBand Architecture Specification</a> is available online.
Virtualization and InfiniBand
http://lwn.net/Articles/346021/rss
2009-08-08T10:52:28+00:00 - abacus
As I wrote above, implementing an IB driver would allow reuse of a whole software stack (called <a href="http://www.openfabrics.org/downloads/OFED/">OFED</a>) and of the implementations of several communication protocols. Yes, it is possible to develop all this from scratch, but that is more or less reinventing the wheel.
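As a rough sketch of what reusing that stack buys: guest code would keep programming against the libibverbs API that OFED ships, so a data path like the one below would not need to change. Connection setup (QP state transitions, exchanging the peer's address and rkey), error handling and cleanup are omitted; remote_addr and remote_rkey stand in for values learned out of band.
<pre>
/* Sketch of the libibverbs data path a guest-side IB driver would have to
 * support; remote_addr/remote_rkey are placeholders for values exchanged
 * out of band with the peer. */
#include <infiniband/verbs.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

int rdma_write_example(struct ibv_qp *qp, struct ibv_pd *pd,
                       uint64_t remote_addr, uint32_t remote_rkey)
{
        char *buf = malloc(4096);
        struct ibv_mr *mr;
        struct ibv_sge sge;
        struct ibv_send_wr wr, *bad_wr;

        /* Register (pin) the local buffer so the HCA may DMA from it. */
        mr = ibv_reg_mr(pd, buf, 4096, IBV_ACCESS_LOCAL_WRITE);
        if (!mr)
                return -1;

        strcpy(buf, "hello from the guest");

        sge.addr   = (uintptr_t)buf;
        sge.length = 4096;
        sge.lkey   = mr->lkey;

        memset(&wr, 0, sizeof(wr));
        wr.wr_id               = 1;
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.opcode              = IBV_WR_RDMA_WRITE;   /* zero-copy write into peer memory */
        wr.send_flags          = IBV_SEND_SIGNALED;
        wr.wr.rdma.remote_addr = remote_addr;
        wr.wr.rdma.rkey        = remote_rkey;

        /* On real HCAs posting the work request involves no system call:
         * the doorbell is a memory-mapped device register. */
        return ibv_post_send(qp, &wr, &bad_wr);
}
</pre>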
Virtualization and InfiniBand
http://lwn.net/Articles/346010/rss
2009-08-08T09:27:16+00:00 - dlang
<div class="FormattedComment">
If you are talking about a virtual interface, why would you use either?<br>
<p>
Define a driver that does page-allocation tricks to move data between the client and the host for zero-copy communication. At that point you beat anything that's designed for a real network.<br>
<p>
Then you can pick which driver to run on top of this interface (SCSI, IP, or something custom) depending on what you are trying to talk to on the other side.<br>
</div>
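A minimal sketch, assuming a made-up layout, of the kind of shared descriptor ring such a zero-copy guest/host interface might use; none of the names below correspond to an existing kernel ABI.
<pre>
/* Hypothetical guest/host ring for zero-copy transfers: instead of copying
 * payloads, the guest publishes page frame numbers and the host maps (or
 * flips) those pages directly. All names and the layout are invented for
 * this sketch. */
#include <stdint.h>

#define RING_SIZE 256

struct xfer_desc {
        uint64_t pfn;       /* guest page frame number for the host to map */
        uint32_t offset;    /* offset of the payload within that page */
        uint32_t len;       /* payload length in bytes */
        uint32_t id;        /* cookie echoed back in the completion */
        uint32_t flags;     /* e.g. "more descriptors follow" for large transfers */
};

struct xfer_ring {
        volatile uint32_t producer;         /* advanced by the guest */
        volatile uint32_t consumer;         /* advanced by the host */
        struct xfer_desc  desc[RING_SIZE];  /* lives in memory shared by both sides */
};

/* The transport stays dumb: SCSI, IP or a custom protocol is simply the
 * interpretation the two sides agree to put on the payload pages. */
</pre>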
Virtualization and InfiniBand
http://lwn.net/Articles/346007/rss
2009-08-08T09:24:25+00:00 - dlang
<div class="FormattedComment">
Yes, IB has lower latencies than 10Gb Ethernet.<br>
<p>
10G Ethernet isn't low latency at all costs; it benefits/suffers from backwards-compatibility issues.<br>
<p>
It also allows for longer cable runs than IB.<br>
<p>
It's not that one is always better than the other; it's that each has its use.<br>
<p>
If you are wiring a cluster of computers and price is not an issue, then IB clearly wins.<br>
<p>
If you are wiring a building, then IB _can't_ do the job, but 10G Ethernet can, so it clearly wins.<br>
</div>
Virtualization and InfiniBand
http://lwn.net/Articles/345988/rss
2009-08-08T07:42:02+00:00 - abacus
Yes, 10 GbE has been designed for low latency. But look at the numbers: the best performing 10 GbE interface today (Chelsio) has a latency of 3.8 us [1] while recent IB interfaces have a latency of 1.8 us [2]. The difference is small but it matters when communicating messages that are less than 64 KB in size. And IB interfaces do not cost more than 10 GbE interfaces that support iWARP.<br>
IB interfaces have a lower latency than 10 GbE interfaces because the whole IB stack has been designed for low latency while 10 GbE had to remain compatible with Ethernet.
<br>
<br>
References:
<ol>
<li><a href="http://www.chelsio.com/poster.html">Chelsio about The Cybermedia Center at Osaka University</a>.</li>
<li><a href="http://www.cse.ohio-state.edu/~koop/pub/surs-hoti07.pdf">Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms</a>.</li>
</ol>
Virtualization and InfiniBand
http://lwn.net/Articles/345986/rss
2009-08-08T07:24:59+00:00 - abacus
<div class="FormattedComment">
What you wrote above about cabling is correct but completely irrelevant to this discussion. What I proposed is to use the IB APIs (RDMA) and software stack (IPoIB, SDP, iSER, SRP, ...) for communication between a virtual machine and the host system. In such a setup no physical cables are necessary. An additional kernel driver that implements the RDMA API and allows communication between guest and host will be necessary in the virtual machine, however.<br>
</div>
Virtualization and InfiniBand
http://lwn.net/Articles/345942/rss
2009-08-07T21:05:17+00:00 - dlang
<div class="FormattedComment">
IB is faster and lower latency, but it is significantly more expensive, has shorter maximum cable lengths, and (IIRC) many more wires in the cables (which makes them more expensive and more fragile).<br>
<p>
IB was designed as a system interconnect within a rack (or a couple of nearby racks).<br>
<p>
ATA and SATA aren't general interconnects; they are drive interfaces.<br>
<p>
10 GbE is a fair comparison for IB, but it was designed to allow longer cable runs with fewer wires in the cable (being fairly compatible with existing Cat5-type cabling).<br>
</div>
Virtualization and InfiniBand
http://lwn.net/Articles/345936/rss
2009-08-07T20:31:46+00:00 - giraffedata
<blockquote>
has a higher throughput and a lower latency than any other popular storage or networking technology (IDE [ATA], SATA, 10 GbE, ...)
</blockquote>
<p>
Nonetheless, ATA, SATA, and 10 GbE were all designed to have high throughput and low latency. So one can't say that being designed for high throughput and low latency sets IB apart from them.
<P>
So what <em>does</em>? Were the IB engineers just smarter? Did they design for higher cost of implementation? Did they design for implementation technology that wasn't available when the alternatives were designed?
Virtualization and InfiniBand
http://lwn.net/Articles/345915/rss
2009-08-07T18:37:28+00:00 - abacus
> > This is because the IB stack has been designed for high throughput and low latency.<br><br>
> As opposed to what? Doesn't every protocol seek to be fast? Does IB make different tradeoffs than the alternatives?<br><br>
IB is a technology that comes from the supercomputing world and that has higher throughput and lower latency than any other popular storage or networking technology (IDE, SATA, 10 GbE, ...). Key features of IB are support for zero-copy I/O (RDMA) and the ability to perform I/O without invoking any system call, even from user-space processes.<br><br>
Some impressive graphs can be found in this paper: <a href="http://www.cse.ohio-state.edu/~koop/pub/surs-hoti07.pdf">Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms</a>.<br><br>
Note: I'm not affiliated with any vendor of IB equipment.
Virtualization and InfiniBand
http://lwn.net/Articles/345912/rss
2009-08-07T18:04:15+00:00 - giraffedata
<blockquote>
This is because the IB stack has been designed for high throughput and low latency.
</blockquote>
<p>
As opposed to what? Doesn't every protocol seek to be fast? Does IB make different tradeoffs than the alternatives?
Virtualization and InfiniBand
http://lwn.net/Articles/345902/rss
2009-08-07T17:46:55+00:00 - abacus
<div class="FormattedComment">
I'm surprised that, although several virtualization implementations are looking at high-speed I/O, none of them is using the InfiniBand (IB) stack. With current IB hardware, data rates of up to 2.4 GB/s are possible -- between different systems. This is because the IB stack has been designed for high throughput and low latency. Linux's IB stack already has implementations of networking and storage drivers, so implementing a single IB driver would allow virtualization software to reuse the drivers in the IB stack.<br>
</div>
paravirtualised drivers?
http://lwn.net/Articles/345838/rss
2009-08-07T13:09:24+00:00 - ghaskins
<div class="FormattedComment">
<font class="QuotedText">> I seem to remember a virtulisation session at OLS 2008 where the discussion was</font><br>
<font class="QuotedText">> both Xen and KVM moving to common paravirtualised drivers for guests to speed</font><br>
<font class="QuotedText">> up I/O. Isn't this basically the same thing?</font><br>
<p>
It's the same in that these are also PV drivers. It's different in that this is much faster PV <br>
infrastructure than the currently deployed versions. Ideally we will be able to use the same <br>
drivers and just swap out the inefficient part on the hypervisor side. As of right now, the drivers <br>
are also different (venet vs. virtio-net), but this may change in the future.<br>
<p>
In addition, it is also infrastructure that allows us to do new kinds of PV operations, such as <br>
supporting real-time guests.<br>
<p>
(Note: the graphs posted are against the virtio-net based PV drivers that were probably the <br>
result of that presentation you saw at OLS)<br>
<p>
-Greg<br>
</div>
AlacrityVM
http://lwn.net/Articles/345644/rss
2009-08-06T13:18:32+00:00 - ghaskins
<div class="FormattedComment">
<font class="QuotedText">> what kind of support does this require in the guest?</font><br>
<p>
You load drivers in the guest for various IO subsystems (network, disk, etc)<br>
<p>
<font class="QuotedText">> Is this another paravirtualized interface</font><br>
<p>
Yes, though it is not entirely orthogonal. For instance, it is possible to tunnel existing PV protocols <br>
over it (e.g. virtio-net). This means you swap out the low-level protocol (virtio-pci is exchanged <br>
for virtio-vbus) but the higher layer PV drivers (virtio-net, virtio-blk) remain unchanged.<br>
<p>
-Greg<br>
<p>
</div>
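Schematically, the layering described above might be pictured as follows; the ops table in this sketch is a simplification invented for illustration, not the kernel's actual virtio transport interface.
<pre>
/* Simplified illustration of the layering being described; this ops table is
 * invented for the sketch and is not the kernel's real virtio transport API. */
#include <stdint.h>

struct fake_virtio_transport_ops {
        /* read/write the device's configuration space */
        void     (*get_config)(void *dev, unsigned offset, void *buf, unsigned len);
        void     (*set_config)(void *dev, unsigned offset, const void *buf, unsigned len);
        /* feature negotiation */
        uint64_t (*get_features)(void *dev);
        /* set up the virtqueues the device driver will use */
        int      (*find_vqs)(void *dev, unsigned nvqs, void **vqs);
        /* tell the host there is new work in a virtqueue */
        void     (*kick)(void *dev, unsigned vq_index);
};

/* virtio-net and virtio-blk only ever call through ops like these, so
 * swapping virtio-pci for virtio-vbus means supplying a different ops
 * implementation (PCI config cycles vs. vbus shared-memory signalling)
 * while the net/block drivers above remain unchanged. */
</pre>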
AlacrityVM
http://lwn.net/Articles/345643/rss
2009-08-06T13:12:37+00:00 - ghaskins
<div class="FormattedComment">
The patches apply on top of KVM<br>
</div>
paravirtualised drivers?
http://lwn.net/Articles/345624/rss
2009-08-06T10:21:58+00:00 - alex
<div class="FormattedComment">
I seem to remember a virtualisation session at OLS 2008 where the discussion was about both Xen and KVM moving to common paravirtualised drivers for guests to speed up I/O. Isn't this basically the same thing?<br>
</div>
AlacrityVM
http://lwn.net/Articles/345621/rss
2009-08-06T10:04:31+00:00 - dunlapg
<div class="FormattedComment">
And what kind of support does this require in the guest? Is this another paravirtualized interface (along the lines of Xen's frontend/backend interface)?<br>
</div>
AlacrityVM
http://lwn.net/Articles/345592/rss
2009-08-06T07:56:36+00:00 - rahulsundaram
<div class="FormattedComment">
Is this a series of patches on top of KVM or a fork of some sort? It seems to be missing some history or context.<br>
</div>