
Accessing QEMU storage features without a VM

October 25, 2022

This article was contributed by Stefan Hajnoczi



The QEMU emulator has a sizable set of storage features, including disk-image file formats like qcow2, snapshots, incremental backup, and storage migration, which are available to virtual machines. This software-defined storage functionality that is available inside QEMU has not been easily accessible outside of it, however. Kevin Wolf and Stefano Garzarella presented at KVM Forum 2022 on the new qemu-storage-daemon program and the libblkio library that make QEMU's storage functionality available even when the goal is not to run a virtual machine (VM).

[Kevin Wolf & Stefano Garzarella]

Like the Linux kernel, QEMU has a block layer that supports disk I/O, which it performs on behalf of the VM and supports additional features like throttling while doing so. The virtual disks that VMs see are backed by disk images. Typically they are files or block devices, but they can also be network storage. Numerous disk-image file formats exist for VMs and QEMU supports them, with its native qcow2 format being one of the most widely used. The QEMU block layer also includes long-running background operations called blockjobs for migrating, mirroring, and merging disk images.
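
To get a feel for how these images are managed, QEMU's qemu-img tool can create a qcow2 image and then an overlay on top of it; the overlay records only the changes made relative to the base image. This is a minimal sketch with example file names:

    $ qemu-img create -f qcow2 base.qcow2 10G
    $ qemu-img create -f qcow2 -b base.qcow2 -F qcow2 overlay.qcow2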

The QEMU process model

Wolf began by describing QEMU's process model where each VM is a separate QEMU process, complete with its own block layer, which can be seen in the diagram below. A JSON-RPC-like management interface called QEMU Monitor Protocol (QMP) provides a multitude of commands for manipulating disk images while the QEMU process is running. QMP commands allow storage migration, incremental backups, and so on. One catch is that the QEMU process must be running and that makes it difficult to use the QMP commands while the VM is shut down.

[QEMU process model]

Another limitation of the one-VM-per-QEMU-process model is that disk images can only be shared read-only between VMs, to avoid the data corruption that occurs when multiple QEMU processes update a shared disk image without coordination. This problem is relevant when several VMs were created from the same template and therefore share "backing files". Those backing files must remain unmodified as long as two or more VMs are sharing them.

For these reasons, disk-image functionality in QEMU has been largely limited to active VMs and a few specific tools (qemu-img and qemu-nbd) until now.

qemu-storage-daemon

The new qemu-storage-daemon program makes disk-image functionality available outside the confines of the one-VM-per-QEMU-process model. qemu-storage-daemon runs as a separate process without any VM at all and offers the same QMP commands for manipulating disk images as previously found only in QEMU. qemu-storage-daemon can act as a server to export disk images for clients including, but not limited to, QEMU VMs.

Wolf described two ways of thinking about qemu-storage-daemon. It can be seen as an advanced qemu-nbd that supports QMP monitor commands and additional export types. Alternatively, it can be seen as QEMU without the ability to run a VM. Both the command line and the available QMP commands closely resemble those of QEMU.

The following qemu-storage-daemon command serves a Network Block Device (NBD) export of the raw image file test.raw so that the disk image can be read over the network:

    $ qemu-storage-daemon \
      --nbd-server addr.type=inet,addr.host=0,addr.port=10809 \
      --blockdev file,filename=test.raw,node-name=disk \
      --export nbd,id=exp0,node-name=disk
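
The export can then be inspected from a client with qemu-img, for example. This sketch assumes that the NBD export name defaults to the node name disk; if not, an explicit name= option can be added to the --export line:

    $ qemu-img info nbd://localhost:10809/disk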

Several use cases for qemu-storage-daemon were presented. Separating storage from the actual running of VMs makes sense when the two are managed independently. A storage-management tool should not need access to a VM's QMP interface and a VM-management tool should not need access to qemu-storage-daemon's QMP interface. Furthermore, separating storage makes it possible to apply tighter sandboxing to both the QEMU VM and qemu-storage-daemon so that a security compromise in one of these programs is less likely to affect other parts of the system.

[qemu-storage-daemon model]

Running qemu-storage-daemon as the sole process on the system that accesses a disk image unlocks use cases that were impossible with the one-VM-per-QEMU-process model. As seen in the diagram above, QEMU processes can be connected to qemu-storage-daemon so that VMs access the disk image through the daemon instead of directly from the QEMU process. It then becomes possible to modify backing files used by multiple VMs, because qemu-storage-daemon is effectively the only process in the system with write access to the shared backing files.
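
A hypothetical arrangement along these lines (file and socket names are made up) gives the daemon a read-only backing image plus one qcow2 overlay per VM, each exported over vhost-user-blk (an export type covered below):

    $ qemu-storage-daemon \
      --blockdev file,filename=base.qcow2,node-name=base-file \
      --blockdev qcow2,file=base-file,node-name=base,read-only=on \
      --blockdev file,filename=vm1.qcow2,node-name=vm1-file \
      --blockdev qcow2,file=vm1-file,backing=base,node-name=vm1 \
      --blockdev file,filename=vm2.qcow2,node-name=vm2-file \
      --blockdev qcow2,file=vm2-file,backing=base,node-name=vm2 \
      --export vhost-user-blk,id=exp-vm1,node-name=vm1,addr.type=unix,addr.path=/run/vm1-blk.sock,writable=on \
      --export vhost-user-blk,id=exp-vm2,node-name=vm2,addr.type=unix,addr.path=/run/vm2-blk.sock,writable=on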

Users who were unable to perform certain disk-image operations while the VM was shut down can now launch qemu-storage-daemon to cover that situation. While the VM is running, it accesses the disk image through qemu-storage-daemon so that there is no conflict between the running VM and activity taking place inside qemu-storage-daemon. When the VM is shut down, qemu-storage-daemon can still service requests to manipulate the disk image. This makes long-running operations like committing backing files safe across VM shutdown, which is useful because the commit operation might be performed by the cloud provider while the VM shutdown is performed independently by an end user. Should the VM be started again before the commit operation finishes, it still accesses its disk through qemu-storage-daemon and the commit operation continues to make progress.

The one-VM-per-QEMU-process model also has limitations when polling is enabled to increase performance. On a machine with many QEMU processes, each one performs its own polling and this consumes CPU time. qemu-storage-daemon can consolidate I/O processing into a single process that polls for multiple VMs, leaving more CPUs available for running VMs.

Another use case arises when QEMU's user-space NVMe PCI driver is used to squeeze the most performance out of a device. The NVMe device can only be accessed by one process, so normally only one running VM can have it open. If qemu-storage-daemon runs the user-space NVMe PCI driver instead of QEMU, then multiple VMs can connect to it and a single NVMe device can be sliced up into smaller virtual devices for the VMs.
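
A sketch of that setup might look like the following; the PCI address, slice sizes, and socket paths are placeholders, and the NVMe device is assumed to be bound to vfio-pci so that QEMU's user-space driver can take it over:

    $ qemu-storage-daemon \
      --blockdev nvme,node-name=nvme0,device=0000:01:00.0,namespace=1 \
      --blockdev raw,file=nvme0,offset=0,size=107374182400,node-name=slice0 \
      --blockdev raw,file=nvme0,offset=107374182400,size=107374182400,node-name=slice1 \
      --export vhost-user-blk,id=exp0,node-name=slice0,addr.type=unix,addr.path=/run/slice0.sock,writable=on \
      --export vhost-user-blk,id=exp1,node-name=slice1,addr.type=unix,addr.path=/run/slice1.sock,writable=on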

Perhaps the most interesting use case is that qemu-storage-daemon makes QEMU's storage functionality available to other applications besides just QEMU. Backup applications, forensics tools, and other programs can use qemu-storage-daemon to access disk images, manipulate them, and take snapshots. Initially released in QEMU 5.0.0, qemu-storage-daemon can be found in the qemu-system-common package in Debian-based distributions and the qemu-img package in Fedora-based distributions.

Commands for taking snapshots, adding and removing exports at run time, managing dirty bitmaps for incremental backups, and more can be sent to the daemon over a Unix domain socket using QMP. QMP is a control channel and is not suitable for actually accessing the contents of disk images or dirty bitmaps. Instead, qemu-storage-daemon offers several ways to connect to disk images through its export types.
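
For example, the daemon can be started with a QMP monitor socket and then driven by hand with a tool like socat; the node and file names here are placeholders:

    $ qemu-storage-daemon \
      --blockdev file,filename=test.qcow2,node-name=file0 \
      --blockdev qcow2,file=file0,node-name=disk \
      --chardev socket,id=qmp0,path=/run/qsd-qmp.sock,server=on,wait=off \
      --monitor chardev=qmp0 &
    $ socat - UNIX-CONNECT:/run/qsd-qmp.sock
    {"execute": "qmp_capabilities"}
    {"execute": "blockdev-snapshot-sync", "arguments": {"node-name": "disk", "snapshot-file": "snap1.qcow2", "format": "qcow2"}}

The second QMP command takes an external snapshot by creating snap1.qcow2 as a new overlay on top of the existing image.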

Block export types

The Network Block Device (NBD) protocol has a long history in Linux as a fairly simple way to access block devices over the network. Given that QEMU already contains an NBD server and qemu-nbd tool, it's no surprise that qemu-storage-daemon can export disk images via NBD. Programs can connect directly and there is also a Linux kernel driver that attaches NBD exports as block devices.
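
With the kernel nbd module loaded, the nbd-client utility can attach such an export as a local block device, roughly as follows (again assuming the export is named disk):

    $ sudo modprobe nbd
    $ sudo nbd-client -N disk localhost /dev/nbd0
    $ lsblk /dev/nbd0
    $ sudo nbd-client -d /dev/nbd0    # detach again when finished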

A Linux Filesystem in Userspace (FUSE) export type is also available in qemu-storage-daemon. The mounted FUSE filesystem looks like a regular file, but the underlying storage is actually a qcow2 file. qemu-storage-daemon handles the qcow2 file-format specifics so that it appears like a raw file that programs like fdisk, dd, and others know how to access. At this point, the implementation is synchronous and therefore it does not perform as well as other export types. Wolf mentioned that the FUSE export type offers an easy way to present a disk image as a raw file to programs that can only access regular files.
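
A sketch of a FUSE export follows (file names are placeholders); note that the mount point must be an existing regular file, which the daemon then mounts over:

    $ touch mnt-disk.img
    $ qemu-storage-daemon \
      --blockdev file,filename=test.qcow2,node-name=file0 \
      --blockdev qcow2,file=file0,node-name=disk \
      --export fuse,id=exp0,node-name=disk,mountpoint=mnt-disk.img,writable=on &
    $ fdisk -l mnt-disk.img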

The vhost-user-blk export type is a Unix domain socket protocol that QEMU supports. Unlike NBD, it does not work over the network, but it takes advantage of shared memory, allowing qemu-storage-daemon to transfer data directly between the disk and guest RAM. This makes vhost-user-blk the natural choice for connecting QEMU VMs to qemu-storage-daemon as it is the most efficient export type. Other applications can also use this export type through the new libblkio library that was introduced later in the talk.
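
On the QEMU side, connecting to a vhost-user-blk export takes a character device pointing at the daemon's socket, a vhost-user-blk-pci device, and guest memory backed by shared memory; a sketch, with the socket path and sizes as placeholders:

    $ qemu-system-x86_64 \
      -machine q35,accel=kvm,memory-backend=mem0 \
      -object memory-backend-memfd,id=mem0,size=4G,share=on \
      -chardev socket,id=char0,path=/run/vm1-blk.sock \
      -device vhost-user-blk-pci,chardev=char0

The shared memory backend is what allows the daemon to access guest RAM directly.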

The vDPA Device in Userspace (VDUSE) export type processes I/O requests from the relatively new vDPA driver framework in the kernel. When the virtio_vdpa kernel module is loaded, the export appears as a virtio_blk device that can be used like any other Linux block device. qemu-storage-daemon acts as the user-space server for the vdpa-blk device, similar to the way it can act as a FUSE filesystem server. Alternatively, the export appears as a Linux vhost device that can be added to QEMU VMs as virtio-blk devices when the vhost_vdpa kernel module is loaded on the host. The VDUSE export type therefore serves the dual purposes of exposing storage both to the host and to VMs. Note that there is some overlap in functionality with the other export types here; those who need VDUSE will know they need it, while others are likely to stick to the more traditional export types.
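
A hedged sketch of the host-side flow, using the vdpa tool from iproute2 (device and file names are placeholders):

    $ sudo qemu-storage-daemon \
      --blockdev file,filename=test.raw,node-name=disk \
      --export vduse-blk,id=exp0,node-name=disk,name=vduse0,writable=on &
    $ sudo modprobe virtio_vdpa
    $ sudo vdpa dev add name vduse0 mgmtdev vduse
    $ lsblk    # the export now appears as a virtio-blk disk (e.g. /dev/vda) on the host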

libblkio

While qemu-storage-daemon provides the server, the libblkio library offers a client API for efficiently accessing disk images. Since implementing vhost-user-blk and other protocols for accessing qemu-storage-daemon exports is involved, it's handy to have a library that provides this functionality and saves applications from having to duplicate it.

The libblkio 1.0 release uses Linux io_uring for file I/O, NVMe io_uring command passthrough (primarily for NVMe benchmarking), and virtio-blk (vhost-user and vhost-vdpa) for connecting to qemu-storage-daemon and accessing vdpa-blk devices. This selection of drivers allows the library to be used both for connecting to qemu-storage-daemon and for directly accessing files or NVMe devices.

A full overview of libblkio was left for another KVM Forum talk; YouTube video and slides are available for those wishing to learn more. The main message, however, was that applications wishing to use qemu-storage-daemon can use libblkio to connect via vhost-user-blk. Packages of the library are not yet as widely available as qemu-storage-daemon, but that situation should improve over time.

Conclusion

QEMU's process model has made certain configurations hard to achieve, but qemu-storage-daemon offers a dedicated process for storage functionality that augments the traditional QEMU process model and greatly reduces those problems. Furthermore, qemu-storage-daemon exposes QEMU's array of storage features to any program wishing to use them, even where VMs are not involved. libblkio offers the client side of the qemu-storage-daemon picture and allows programs to connect to its exports. Like qemu-storage-daemon, libblkio is used by QEMU but is designed for general use by other programs unrelated to QEMU.

Where qemu-storage-daemon and libblkio will be used besides QEMU remains to be seen, but extracting functionality from QEMU and making it available for external consumption has opened the door to new developments in this area.

A YouTube video of the presentation, as well as the slides, is available for those looking for further information.



Accessing QEMU storage features without a VM

Posted Oct 26, 2022 3:21 UTC (Wed) by xecycle (subscriber, #140261) [Link] (4 responses)

Have found a good use of this, and have been using it for a while: thinly-provisioned block from loop files. Neither loop.ko nor target core fileio is able to translate TRIM to FALLOC_FL_PUNCH_HOLE, thus I had been using LVM thin pool and ZVOL (super slow), and both are more clumsy to manage than normal files. The new VDUSE export eliminated the loopback network, making it even easier to configure without security surprises. I can now point target core iblock to a VDUSE export and serve it over iSCSI! But it may be even easier if QEMU introduces a userspace iSCSI target.

Accessing QEMU storage features without a VM

Posted Oct 26, 2022 10:03 UTC (Wed) by grawity (subscriber, #80596) [Link] (3 responses)

Neither loop.ko nor target core fileio is able to translate TRIM to FALLOC_FL_PUNCH_HOLE

loop.ko gained this ability in Linux 3.1.

Accessing QEMU storage features without a VM

Posted Oct 26, 2022 12:50 UTC (Wed) by xecycle (subscriber, #140261) [Link] (2 responses)

Ohhhhhhhhhhh I see how I made this mistake. I had set up a loop based off another block device (an LVM thin LV), and the loop device had lost discard ability. I've mistaken it that loop cannot discard...

Accessing QEMU storage features without a VM

Posted Oct 27, 2022 9:50 UTC (Thu) by grawity (subscriber, #80596) [Link] (1 responses)

Discard for loop-on-blockdev was broken at some point, although I believe it is fixed in all versions now.

Accessing QEMU storage features without a VM

Posted Oct 28, 2022 1:58 UTC (Fri) by xecycle (subscriber, #140261) [Link]

I did a fresh test when I was replying you above, off an LVM thin LV, on 6.0.2-arch1-1, and lsblk --discard still says 0. Did not test sending discards, though.

