The next KS2012 ARM minisummit session discussed the virtualization work
that has been going on for ARM. Both KVM and Xen are under development for
ARM, but neither has gotten to the point of being merged. Marc Zyngier gave
an overview of the KVM status, while Stefano Stabellini reported on Xen.
Zyngier began by noting that virtualization extensions were added to the
most recent revisions of the ARMv7 architecture. There is now a hypervisor
mode in the processor, which runs at a higher privilege level than the OS.
For KVM, physical interrupts are handled by the host, with guests only
seeing virtual interrupts. That stands in contrast to Xen, where certain
physical interrupts are delivered to the guests, as Stabellini reported.
According to Olof Johansson, the virtualization model provided by ARM fits
the Xen hypervisor-based virtualization better than KVM's kernel-based model.
Paul Walmsley asked about vendors who were using the hypervisor mode for
doing cluster switch operations, and wondered how well that would work with
KVM. Zyngier said that it would work "badly", because KVM and the cluster
code would "fight" over hypervisor mode; whoever got there first would
win. Will Deacon noted that those who wanted to run KVM on their systems
would need to move the cluster code to a higher level.
In answer to a question from Magnus Damm, Zyngier said that KVM on ARM
would not support virtual machine nesting. It also would not support the
emulation of other CPUs, so the guest CPU must match that of the underlying
hardware. The QEMU developers have decided that the work necessary to do
that emulation was not worth the trouble, as one of the participants reported.
The KVM guests run at privilege level 1 (PL1), which is the level used for
normal kernels, but the host kernel runs at PL2. That means that switching
between guests requires lots of transitions, from PL1 to PL2, then back to
PL1 for the switched-to guest (and possibly to a lower privilege level
depending on what the guest is running).
Guests get preempted whenever a physical interrupt occurs, but the guests
never see those, Zyngier said. A stage 2 page table is populated by the
host for each of the guests, and the host has a stage 1 page table. There
are no shadow page tables. Guests can also be preempted when pages need to
be faulted in via the stage 2 page tables.
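The two translation stages can be illustrated with a small model. This is only a sketch, not how KVM implements it: it uses flat dictionaries instead of multi-level tables, and the names `translate` and `guest_access` are invented for the example. It also assumes the usual ARM arrangement, not spelled out in the session, in which the guest's own stage 1 table maps guest virtual addresses to intermediate physical addresses and the host-populated stage 2 table maps those to real physical addresses.

```python
PAGE_SIZE = 4096  # assumed page size for the illustration


def translate(table, addr):
    """Translate addr through a flat {page: page} table; raise on a miss."""
    page, offset = divmod(addr, PAGE_SIZE)
    if page not in table:
        # Stands in for a translation fault taken by whoever owns the table.
        raise LookupError(f"fault on page {page:#x}")
    return table[page] * PAGE_SIZE + offset


def guest_access(stage1, stage2, virt_addr):
    """Apply stage 1 (guest-owned), then stage 2 (host-owned)."""
    intermediate = translate(stage1, virt_addr)
    return translate(stage2, intermediate)


# Hypothetical mappings: guest page 0x10 -> intermediate page 0x2 -> physical page 0x80.
stage1 = {0x10: 0x2}
stage2 = {0x2: 0x80}
physical = guest_access(stage1, stage2, 0x10 * PAGE_SIZE + 0x34)
```

In this model, a missing stage 2 entry corresponds to the case where the guest is preempted so that the host can fault the page in; no shadow copy of the guest's table is needed because the hardware applies both stages itself.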
Devices are mapped into the guests. The virtual CPU interface—part
of the ARM generic interrupt controller (GIC)—is mapped in as well.
It is believed that all devices can be mapped into the guests, but that has
yet to be tried. Because of that, the same kernel can be used for both
host and guests. Stabellini noted that the same is true for Xen, which is
unlike the x86 situation.
Caches and TLB entries are tagged with an 8-bit virtual machine ID (VMID).
Guests are not aware that there are no physical devices; they just poke
what they think are hardware registers, a stage 2 translation is done, and the data
is forwarded on to the hardware. These memory-mapped IO devices are
emulated by QEMU.
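The effect of the VMID tagging mentioned above can be sketched with a toy TLB. The class and method names here are hypothetical, and real hardware does this in the lookup logic, not in software; the point is only that an entry cached for one guest can never match a lookup made on behalf of another, so the TLB need not be flushed on every guest switch.

```python
class TaggedTLB:
    """Toy TLB whose entries are keyed by (VMID, virtual page)."""

    VMID_BITS = 8  # the session described an 8-bit VMID

    def __init__(self):
        self.entries = {}

    def insert(self, vmid, vpage, ppage):
        """Cache a translation for one guest, identified by its VMID."""
        if not 0 <= vmid < (1 << self.VMID_BITS):
            raise ValueError("VMID must fit in 8 bits")
        self.entries[(vmid, vpage)] = ppage

    def lookup(self, vmid, vpage):
        """Return the cached physical page, or None on a miss."""
        return self.entries.get((vmid, vpage))


tlb = TaggedTLB()
tlb.insert(vmid=1, vpage=0x10, ppage=0x80)  # guest 1
tlb.insert(vmid=2, vpage=0x10, ppage=0x90)  # guest 2, same virtual page
```

Both guests use the same virtual page number here, yet each lookup returns only the translation that belongs to the VMID making the request.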
Interrupts are injected into the guest by manipulating bits on the guest
stack to indicate an interrupt. Xen, on the other hand, uses a "spare"
interrupt to signal events to the guest.
One concern is that nothing guarantees that a free interrupt number will
always be available. Right now, Xen uses a fixed interrupt number, but that
will likely change.
In order to boot a KVM host, the kernel must be started in hypervisor
mode. That requires a KVM-compliant bootloader. When booting, a very
small hypervisor is loaded, whose "only purpose in life is to be
replaced". It has a simple API with just two calls, one to return a
pointer to the stub itself, and one to query whether hypervisor mode is
available. Zyngier said that he believes Xen could also use that hypervisor
stub if desired. One possible problem area is that some "other payloads"
(alternate operating systems) may not be able to handle being started with
hypervisor mode on, so there may need to be a way to turn it off in the
bootloader, Johansson said.
In contrast to KVM, Xen is a hypervisor that sits directly on the
hardware, Stabellini said. Everything else is a guest, including Linux.
All of the guests are fully aware that they are running on a hypervisor.
Xen for ARM assumes that the full virtualization extensions are present and
that nested page tables are available. Zyngier noted that KVM makes the
same assumptions.
The Xen ARM guest is based on the Versatile Express board, but with far
fewer devices defined in the device tree. The Xenbus virtualized bus is
used to add paravirtualized devices into the guest. QEMU is not used, so
there is no emulated hardware.
Xen ARM is "completely reliant" on device tree, Stabellini said. His
biggest worry is that device tree might go away for ARM as he has heard
that ACPI may be coming to ARM. The problem there is that the ACPI parser
is too large to go into the Xen hypervisor (it roughly doubles the code
size). Parsing device trees is much easier, and requires much less code, so
trying to do the same things with ACPI "would be a nightmare".
Johansson pointed out that the decision about ACPI would not be made by
Linux developers or ARM; there is a large company in Washington that will
determine that. For power management on some devices, ACPI handling may be
required. But, as Zyngier said, adding ACPI to ARM does not mean the death
of device tree.
The governance of ACPI is closed now, and that needs to change so that the
ARM community can participate, one participant said. According to Arnd
Bergmann, embedded systems will not be moving to ACPI any time soon, but
there is a real danger that it will be present on server systems. ARM
devices that are targeted at booting other OSes will be using UEFI, which
can pass the device tree to the kernel in the right format, he said.
The ARM Xen hypervisor is almost fully upstream in the Xen tree at this
point. The Linux kernel side has been posted, and is not very intrusive,
Stabellini said. The patches to the kernel are mostly self-contained, with
only small changes to the core.
Another concern was the stabilization of the device tree format. If that
changes between kernel releases, there can be a mismatch between the device
tree and the kernel. Bergmann said that kernel developers are being asked
to ensure that anything they add to the device tree formats continues to
work in the future, while firmware developers are being warned not to
assume a given device tree works with any earlier kernels. Once all main
platforms have been described with device trees, there will be an effort to
ensure that those don't break in the future, he said.