Private loop devices with loopfs
A loop device is a kernel abstraction that allows a file to be presented as if it were a physical block device. The typical use for a loop device is to mount a filesystem image stored in a file. Loop devices are global and shared between users, which causes a number of problems for container workloads where the instances are expected to be isolated from each other. Christian Brauner has been working on this problem; he has posted a patch set solving it by adding a small virtual filesystem called loopfs.
Loop devices typically appear under /dev with names like /dev/loopN. The special /dev/loop-control file can be used to create and destroy loop devices or to find the first available loop device. Associating a file with a specific device, or setting other parameters like offsets or block sizes, is done with ioctl() calls on the device itself. The loop(4) man page has the details on how it all works.
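In practice these ioctl() calls are usually issued by losetup(8) from util-linux rather than written by hand. The sketch below assumes a reasonably recent util-linux losetup and root privileges. It wraps losetup in a hypothetical helper, attach_image. `--find` asks /dev/loop-control for a free device, and the offset and logical sector size become parameters of the new binding. Exactly which ioctl()s losetup issues varies with the util-linux and kernel versions.

```shell
#!/bin/sh
# Sketch: attach_image is a hypothetical helper around util-linux losetup(8).
# The losetup call itself needs root (CAP_SYS_ADMIN).

# attach_image IMAGE [OFFSET] [SECTOR_SIZE]
# Asks loop-control for a free device (losetup --find), binds IMAGE to it
# starting OFFSET bytes into the file with the given logical sector size,
# and prints the chosen device, e.g. /dev/loop0.
attach_image() {
    img=$1
    offset=${2:-0}
    sector=${3:-512}
    if [ ! -f "$img" ]; then
        echo "attach_image: no such file: $img" >&2
        return 1
    fi
    case $offset$sector in
        *[!0-9]*)
            echo "attach_image: offset and sector size must be numeric" >&2
            return 1
            ;;
    esac
    losetup --find --show --offset "$offset" --sector-size "$sector" "$img"
}
```

For example, `dev=$(attach_image /tmp/myimage.img 1048576 4096)` would bind the image starting 1MiB into the file, with 4096-byte sectors. `losetup --detach "$dev"` releases the device again.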
Users generally need not deal with specific devices, though; the devices can be managed behind the scenes with a special form of the mount command:
```shell
mount /tmp/myimage.img /mnt/disk -o loop
```
This causes mount to locate an available loop device, associate it with /tmp/myimage.img, then mount that loop device onto /mnt/disk. Some administrators may prefer a different form of the same mount command that gives more control:
```shell
mount /tmp/myimage.img /mnt/disk -o loop=/dev/loop1
```
In this mode, the administrator specifies the exact loop device to use. An administrator who needs more control over loop devices may also use the losetup command to query and set up loop-device properties.
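When spelled out by hand, the first form of the mount command corresponds roughly to the sketch below. The helper names loop_mount and loop_umount are hypothetical. The sketch assumes util-linux losetup, mount, and findmnt, plus root privileges.

There is one difference from `mount -o loop`. util-linux mount sets the autoclear flag on devices it attaches that way, so they are released automatically at unmount time. A device attached with plain losetup has to be detached explicitly.

```shell
#!/bin/sh
# Sketch: a manual equivalent of "mount IMAGE MOUNTPOINT -o loop".
# loop_mount and loop_umount are hypothetical helper names.
# Requires root and util-linux losetup, mount, umount, and findmnt.

# loop_mount IMAGE MOUNTPOINT: bind IMAGE to a free loop device, mount it,
# and print the device name.
loop_mount() {
    img=$1
    mnt=$2
    [ -f "$img" ] || { echo "loop_mount: no such file: $img" >&2; return 1; }
    [ -d "$mnt" ] || { echo "loop_mount: not a directory: $mnt" >&2; return 1; }
    dev=$(losetup --find --show "$img") || return 1
    if ! mount "$dev" "$mnt"; then
        # Do not leave a stale binding behind if the mount fails.
        losetup --detach "$dev"
        return 1
    fi
    echo "$dev"
}

# loop_umount MOUNTPOINT: unmount, then release the loop device that was
# mounted there (a device attached by plain losetup is not auto-cleared).
loop_umount() {
    mnt=$1
    dev=$(findmnt --noheadings --output SOURCE --mountpoint "$mnt") || return 1
    umount "$mnt" && losetup --detach "$dev"
}
```

To pin a specific device, as with `-o loop=/dev/loop1`, the `losetup --find --show "$img"` call could be replaced by `losetup /dev/loop1 "$img"`, with `dev=/dev/loop1` set directly.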
As noted above, loop devices are global and shared between users; /dev/loop3 is the same device in all namespaces. If an application needs a private device, it has no way to request one. Loop devices are also, obviously, shared between containers, so one container can monitor the operations — or access the data — of the others.
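The sharing is easy to observe. The kernel exposes the backing file of each bound loop device in sysfs, as /sys/block/loopN/loop/backing_file, and those attributes are world-readable. The minimal sketch below prints every binding visible from wherever it runs; the function name list_loop_bindings is hypothetical, and `losetup --list` reports the same information. Unless /sys is masked, that output includes bindings made by other containers. The sysfs root is a parameter only so that the function can be exercised against a fake directory tree.

```shell
#!/bin/sh
# Sketch: list loop-device bindings the way any local user can see them.
# /sys/block/loopN/loop/backing_file exists only while loopN is bound.
# list_loop_bindings is a hypothetical name; SYSFS_ROOT defaults to /sys.

list_loop_bindings() {
    sysfs=${1:-/sys}
    for f in "$sysfs"/block/loop*/loop/backing_file; do
        [ -r "$f" ] || continue      # unmatched glob stays literal; skip it
        dev=${f#"$sysfs"/block/}     # -> loopN/loop/backing_file
        dev=${dev%%/*}               # -> loopN
        printf '/dev/%s %s\n' "$dev" "$(cat "$f")"
    done
}
```

Run without arguments, `list_loop_bindings` prints one line per bound device, such as `/dev/loop0 /srv/images/a.img`.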
A number of different use cases for loop devices were raised in the discussion of this patch set. Dmitry Vyukov gave one example: separating test processes from each other when they are using loop devices. He described the problems he has run into:
Brauner gave a number of examples from the container world. For example, systemd-nspawn does not support loop devices because they cannot be discovered dynamically or owned by a container. Chromium OS does not allow the use of loop devices. Kubernetes has also run into problems resulting from the global nature of loop devices: a file can remain bound to a device after its user has exited.
loopfs
Loopfs is a new, in-kernel, virtual filesystem that implements the loop devices and the loop-control file. This filesystem can be mounted multiple times; the loop devices in each instance are independent of all other loop devices in all other instances. This allows private loop devices for applications and containers. Both the loop devices and the loop-control file in loopfs accept the same operations as the legacy ones.
One use of loopfs is to provide compatibility with old-style applications while virtualizing the loop-device files. In this case, the administrator mounts the filesystem and then replaces the default loop-control and loop-device files with those from loopfs. Consider the following example, adapted from the patch cover letter:
```shell
# Mount a new loopfs instance in /dev/loopfs/
mkdir -p /dev/loopfs
mount -t loop loop /dev/loopfs/
# Replace the standard loop-control file with the one from loopfs
ln -sf /dev/loopfs/loop-control /dev/loop-control
# Find the first available loop device (losetup now asks the loopfs loop-control)
loopdev=$(losetup -f)           # will be something like /dev/loop0
deventry=$(basename "$loopdev") # now just "loop0"
# Redirect that loop device to loopfs
ln -sf "/dev/loopfs/$deventry" "/dev/$deventry"
# Mount an image
mount -o loop /image.img /mnt/disk
```
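The recipe above redirects only the one device that `losetup -f` reported. A second loop mount would therefore get its device number from the loopfs loop-control but still open the host's /dev/loopN. The sketch below shows one way to redirect a fixed range up front. It assumes a kernel carrying the loopfs patches, with an instance mounted as shown. link_loopfs_devices is a hypothetical helper, not part of the patch set. Symlinks to devices that the instance has not created yet simply dangle until the loopfs loop-control allocates them.

```shell
#!/bin/sh
# Sketch: point DEV_DIR/loop-control and DEV_DIR/loop0..loop(COUNT-1) at a
# loopfs instance. link_loopfs_devices is a hypothetical helper; dangling
# links are expected for devices the instance has not allocated yet.

link_loopfs_devices() {
    loopfs=$1
    devdir=$2
    count=$3
    case $count in
        ''|*[!0-9]*)
            echo "link_loopfs_devices: bad count: $count" >&2
            return 1
            ;;
    esac
    ln -sf "$loopfs/loop-control" "$devdir/loop-control" || return 1
    i=0
    while [ "$i" -lt "$count" ]; do
        ln -sf "$loopfs/loop$i" "$devdir/loop$i" || return 1
        i=$((i + 1))
    done
}
```

On the host from the example, `link_loopfs_devices /dev/loopfs /dev 8` would redirect /dev/loop-control and /dev/loop0 through /dev/loop7 in one step.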
A knob controls the maximum number of loop devices that can be created in any given loopfs instance; it can be found at /proc/sys/user/max_loop_devices.
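One rough way to see how close an instance is to that limit is to compare the knob with the number of devices present in the instance. The sketch below makes two assumptions: the limit applies per loopfs instance, as described above, and every device the instance has created shows up in it as a loopN entry. loopfs_headroom is a hypothetical helper, and the knob exists only on kernels carrying the loopfs patches. The limit file is a parameter so that the function can also be run against a plain file.

```shell
#!/bin/sh
# Sketch: report how many more loop devices a loopfs instance may create.
# Assumes (hypothetically) that each device in the instance appears as
# LOOPFS_DIR/loopN and that the limit applies per instance.
# LIMIT_FILE defaults to the knob added by the loopfs patch set.

loopfs_headroom() {
    loopfs=$1
    limit_file=${2:-/proc/sys/user/max_loop_devices}
    limit=$(cat "$limit_file") || return 1
    used=0
    for d in "$loopfs"/loop[0-9]*; do
        # An unmatched glob stays literal, so check that the entry exists;
        # the pattern also skips loop-control.
        if [ -e "$d" ]; then
            used=$((used + 1))
        fi
    done
    echo $((limit - used))
}
```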
Christoph Hellwig disagreed with the loopfs approach, stating that the code is too big for the benefit it provides. Brauner explained the additional use cases it allows, but the discussion stopped there. There have not been other substantial complaints about this proposal.
Loopfs does not just allow an independent loop-device pool; it also opens a way to allow unprivileged users to mount loop devices. This can be enabled by combining loopfs with Brauner's earlier work on system-call interception, which uses seccomp to hand the decision about which operations to allow to a separate, privileged process. In such a setup, the unprivileged user can run mount as usual; the privileged process intercepting the system call performs the actual operation.
Jann Horn outlined one possible problem with loop-device usage by unprivileged applications: most filesystem implementations are not prepared to deal with malicious filesystem images. While some work has been done, filesystem images are still generally treated as trusted data; that is why earlier attempts to allow unprivileged filesystem mounting have run into opposition. If an attacker has the ability to modify the image on the fly (as they would if they had access to the loop device providing that image), the problem would be compounded.
Stéphane Graber pointed out that an implementation based on system-call interception does not have to mount filesystems directly; a FUSE-based mount could be used instead. That would prevent any filesystem-level vulnerabilities from turning into kernel vulnerabilities. The LXD implementation allows both types of mount.
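For ext2/3/4 images, one FUSE implementation of this kind is fuse2fs from e2fsprogs. It reads the image file directly, so no loop device and no in-kernel filesystem parser are involved. The minimal sketch below assumes that fuse2fs and FUSE are installed; fuse_mount_image is a hypothetical helper. A supervisor based on system-call interception could run something similar on the container's behalf instead of performing a kernel mount.

```shell
#!/bin/sh
# Sketch: mount an ext2/3/4 image through FUSE (fuse2fs from e2fsprogs)
# instead of attaching it to a loop device and using the kernel's ext4 code.
# fuse_mount_image is a hypothetical helper name.

fuse_mount_image() {
    img=$1
    mnt=$2
    [ -f "$img" ] || { echo "fuse_mount_image: no such file: $img" >&2; return 1; }
    [ -d "$mnt" ] || { echo "fuse_mount_image: not a directory: $mnt" >&2; return 1; }
    # Read-only keeps the example conservative; parsing of the image happens
    # in the userspace fuse2fs process, not in the kernel.
    fuse2fs "$img" "$mnt" -o ro
}
```

Such a mount is undone with `fusermount -u MOUNTPOINT` (`fusermount3 -u` with libfuse 3).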
Next steps
Loopfs seems to solve a problem that users experience in practice. It has had three iterations in a week's time, addressing the comments given during the review. It may still take some time to find its way into the mainline kernel, but it is clear that there are numerous users waiting for a solution to the loop-device sharing issue.
| Index entries for this article | |
|---|---|
| Kernel | Block layer/Loopback device |
| Kernel | Loopback device |
| GuestArticles | Rybczynska, Marta |
