
Private loop devices with loopfs

May 7, 2020

This article was contributed by Marta Rybczyńska

A loop device is a kernel abstraction that allows a file to be presented as if it were a physical block device. The typical use for a loop device is to mount a filesystem image stored in a file. Loop devices are global and shared between users, which causes a number of problems for container workloads where the instances are expected to be isolated from each other. Christian Brauner has been working on this problem; he has posted a patch set solving it by adding a small virtual filesystem called loopfs.

Loop devices typically appear under /dev with names like /dev/loopN. The special /dev/loop-control file can be used to create and destroy loop devices or to find the first available loop device. Associating a file with a specific device, or setting other parameters like offsets or block sizes, is done with ioctl() calls on the device itself. The loop(4) man page has the details on how it all works.
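From the shell, these ioctl() calls are normally issued through the losetup command rather than directly. As a minimal sketch of the offset parameter mentioned above, the helper below converts a partition's starting sector into the byte offset that losetup's --offset option takes. The function name, the 512-byte default, and the example numbers are assumptions made for illustration.

```shell
# Hypothetical helper: convert a partition's starting sector into the
# byte offset expected by "losetup --offset".
# Usage: partition_offset_bytes START_SECTOR [SECTOR_SIZE]
partition_offset_bytes() (
    start_sector=$1
    sector_size=${2:-512}    # assumed default sector size
    echo $((start_sector * sector_size))
)
```

With root privileges, the result could then be handed to losetup, for example as losetup --find --show --offset "$(partition_offset_bytes 2048)" /tmp/myimage.img.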

Users generally need not deal with specific devices, though; they can be managed behind the scenes with a special form of the mount command:

    mount /tmp/myimage.img /mnt/disk -o loop

This causes mount to locate an available loop device, associate it with /tmp/myimage.img, then mount that loop device onto /mnt/disk. Some administrators may prefer a different form of the same mount command that gives more control:

    mount /tmp/myimage.img /mnt/disk -o loop=/dev/loop1

In this mode, the administrator specifies the exact loop device to use. An administrator who needs more control over loop devices may also use the losetup command to query and set up loop-device properties.
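As a rough sketch of what mount does behind the scenes with -o loop, the same steps can be reproduced by hand with losetup. The function name mount_image and the DRY_RUN switch are illustrative assumptions, not part of mount or losetup; a real run needs root privileges.

```shell
# Hypothetical helper reproducing "mount -o loop" by hand.
# With DRY_RUN=1 the steps are only printed, so no root access is needed.
mount_image() (
    image=$1
    mountpoint=$2

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "losetup --find --show $image"
        echo "mount <loop device> $mountpoint"
        exit 0
    fi

    # Attach the image to the first free loop device and capture its name.
    device=$(losetup --find --show "$image") || exit 1

    # Mount the device; detach it again if the mount fails.
    if ! mount "$device" "$mountpoint"; then
        losetup --detach "$device"
        exit 1
    fi
    echo "$device"
)
```

Running DRY_RUN=1 mount_image /tmp/myimage.img /mnt/disk prints the two steps. A device attached this way is released with losetup --detach once the filesystem has been unmounted.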

As noted above, loop devices are global and shared between users; /dev/loop3 is the same device in all namespaces. If an application needs a private device, it has no way to request one. Loop devices are also, obviously, shared between containers, so one container can monitor the operations — or access the data — of the others.
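One way to see this global visibility is through sysfs, where each bound loop device exposes the path of its backing file. The sketch below lists those bindings; the function name and the optional directory argument (added so the function can be tried against a test directory) are assumptions, and the result depends on sysfs being visible to the caller.

```shell
# Hypothetical helper: list bound loop devices and their backing files.
# Usage: list_loop_bindings [SYSFS_BLOCK_DIR]   (default: /sys/block)
list_loop_bindings() (
    sysfs_block=${1:-/sys/block}
    for dir in "$sysfs_block"/loop*; do
        # The loop/backing_file attribute is readable while a file is attached.
        if [ -r "$dir/loop/backing_file" ]; then
            echo "$(basename "$dir") $(cat "$dir/loop/backing_file")"
        fi
    done
)
```

Each output line pairs a device name with the file behind it, regardless of which user or container set the device up.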

A number of different use cases for loop devices were raised in the discussion of this patch set. Dmitry Vyukov gave one example: separating test processes from each other when they are using loop devices. He described the problems he has run into:

"Currently all loop devices and loop-control are global and cause test processes to collide, which in turn causes non-reproducible coverage and non-reproducible crashes."

Brauner gave a number of examples from the container world. For example, systemd-nspawn does not support loop devices as they cannot be discovered dynamically and owned by a container. Chromium OS does not allow the use of loop devices. Kubernetes has also run into problems resulting from the global nature of loop devices: a file can remain bound to a device after its user has exited.

loopfs

Loopfs is a new, in-kernel, virtual filesystem that implements the loop devices and the loop-control file. This filesystem can be mounted multiple times; the loop devices in each instance are independent from all other loop devices in all other instances. This allows private loop devices for applications and containers. Both the loop devices and the loop-control file in loopfs accept the same operations as the legacy ones.

One use of loopfs is to provide compatibility with old-style applications, but with virtualized loop-device files. In this case, the administrator can mount the filesystem and then replace the default loop control files with those from loopfs. Consider the following example, adapted from the patch cover letter:

    # Mount a new loopfs instance in /dev/loopfs/
    mount -t loop loop /dev/loopfs/

    # Replace the standard loop control file with the one from loopfs
    ln -sf /dev/loopfs/loop-control /dev/loop-control

    # Find the first available loop device
    loopdev=$(losetup -f)              # will be something like /dev/loop0
    deventry=$(basename "$loopdev")    # now just "loop0"

    # Redirect that loop device to loopfs
    ln -sf "/dev/loopfs/$deventry" "/dev/$deventry"

    # mount an image
    mount -o loop /image.img /mnt/disk

There is a knob provided to control the maximum number of loop devices that can be created in any given loopfs instance; it can be found as /proc/sys/user/max_loop_devices.
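A minimal sketch of checking that limit from a script follows. The file exists only on a kernel carrying the loopfs patches, so the helper reports when it is missing; the function name and the optional path argument are assumptions made for illustration.

```shell
# Hypothetical helper: print the loop-device limit from the loopfs knob.
# Usage: loopfs_device_limit [KNOB_PATH]
loopfs_device_limit() (
    knob=${1:-/proc/sys/user/max_loop_devices}
    if [ -r "$knob" ]; then
        cat "$knob"
    else
        # Kernel without the loopfs patches, or knob not readable.
        echo "unavailable"
        exit 1
    fi
)
```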

Christoph Hellwig disagreed with the loopfs approach, stating that the code is too big for the benefit it provides. Brauner explained the additional use cases it allows, but the discussion stopped there. There have not been other substantial complaints about this proposal.

Loopfs does more than provide an independent loop-device pool; it also opens a way for unprivileged users to mount loop devices. This can be enabled by combining loopfs with Brauner's earlier work on system-call interception, which uses seccomp to hand decisions about which operations to allow to a separate process. In such a setup, the unprivileged user can run mount as usual; the privileged process intercepting the system call will perform the actual operation.

Jann Horn outlined one possible problem with loop-device usage by unprivileged applications: most filesystem implementations are not prepared to deal with malicious filesystem images. While some work has been done, filesystem images are still generally treated as trusted data; that is why previous attempts to allow unprivileged filesystem mounting have run into opposition. If an attacker has the ability to modify the image on the fly (as they would if they had access to the loop device providing that image), the problem would be compounded.

Stéphane Graber pointed out that an implementation based on system-call interception does not have to mount filesystems directly; a FUSE-based mount could be used instead. That would prevent any filesystem-level vulnerabilities from turning into kernel vulnerabilities. The LXD implementation allows both types of mount.

Next steps

Loopfs seems to solve a problem that users experience in practice. It has had three iterations in a week's time, addressing the comments given during the review. It may still take some time to find its way into the mainline kernel, but it is clear that there are numerous users waiting for a solution to the loop-device sharing issue.





Private loop devices with loopfs

Posted May 7, 2020 18:12 UTC (Thu) by josh (subscriber, #17465) [Link] (2 responses)

Legacy applications aside, I wonder if loop-control could just have an ioctl that returns a new file descriptor for a loopback device, and then the new mount API could support passing that file descriptor in to mount it, without having to give it a "loopN" device name.

Hidden devices

Posted May 7, 2020 18:19 UTC (Thu) by corbet (editor, #1) [Link] (1 responses)

There is a provision for creating detached mounts directly with the new API, yes. We sort of skipped over that in the article, but probably should not have. The changelog to this patch describes it briefly.

Hidden devices

Posted May 10, 2020 22:20 UTC (Sun) by simcop2387 (subscriber, #101710) [Link]

A quick glance didn't seem to indicate it right away, but does anyone know if this mode works with raw disk images, in particular ones that are partitioned (GPT, MBR, etc.)? With the normal loopback devices you can tell the kernel to read the partition table from the image (losetup -P in particular) and then mount a given partition inside the image without having to do any offset calculations or other manual surgery.

Private loop devices with loopfs

Posted May 7, 2020 19:35 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (4 responses)

The Chromium OS docs have this line:

> If you have a use case that wouldn't be solved by supporting FUSE, please file a bug for us.

This does somewhat raise the question of whether FUSE is already an adequate replacement for loop devices in the context of containers. I imagine the answer is "No, because performance," but I would be very interested in seeing actual load tests and experimental results, if any exist.

Private loop devices with loopfs

Posted May 8, 2020 8:41 UTC (Fri) by scientes (guest, #83068) [Link] (2 responses)

Nobody has managed to make a performant secure multi-user filesystem yet. seL4 people made a first attempt which was not successful.

Private loop devices with loopfs

Posted May 8, 2020 12:17 UTC (Fri) by theonewolf (guest, #118690) [Link] (1 responses)

Could you elaborate on what you mean by "multi user" and "performant"?

Also, do you have a link describing the seL4 work?

Private loop devices with loopfs

Posted May 9, 2020 6:01 UTC (Sat) by edomaur (subscriber, #14520) [Link]

Multi user, probably as in "each layer of the fs can be freely read from any running container without security problems" ?

Private loop devices with loopfs

Posted May 11, 2020 6:53 UTC (Mon) by maxfragg (subscriber, #122266) [Link]

My answer would rather be "no, because of security/availability". By design, FUSE makes it possible to block the kernel, which is a good reason not to use it in any true multi-user environment.

Private loop devices with loopfs

Posted May 8, 2020 4:44 UTC (Fri) by pabs (subscriber, #43278) [Link] (4 responses)

Are loop devices part of the device mapper infrastructure or separate to it?

Private loop devices with loopfs

Posted May 8, 2020 13:47 UTC (Fri) by corbet (editor, #1) [Link]

Loop devices predate the device mapper by a long time and are a separate thing.

Private loop devices with loopfs

Posted May 8, 2020 14:36 UTC (Fri) by Sesse (subscriber, #53779) [Link] (2 responses)

dm devices are virtual block devices where the blocks are mapped to other block devices (well, not necessarily directly; they typically go through a module of some sort to add logic). Loopback devices are virtual block devices where the blocks are mapped to a file; think mounting an ISO file as if it were a CD-ROM.

I guess in theory, you could create a loop backend for dm? But they are separate.

Private loop devices with loopfs

Posted May 8, 2020 19:07 UTC (Fri) by dtlin (subscriber, #36537) [Link]

https://www.sourceware.org/lvm2/wiki/DMLoop existed, but it doesn't look like it was ever completed or upstreamed.

Private loop devices with loopfs

Posted May 14, 2020 4:35 UTC (Thu) by samuelkarp (subscriber, #131165) [Link]

Docker's devicemapper-based storage driver can be backed by a loopback device.

Private loop devices with loopfs

Posted May 11, 2020 16:29 UTC (Mon) by ncultra (✭ supporter ✭, #121511) [Link]

I feel Brauner's response to Hellwig's un-artful and wrong criticism was positive, to-the-point, and convincing. It is a great example of how to deal with what is essentially bullying on lkml or its component lists. I also agree with Brauner that this patch set fulfills more than one needed use case.

Private loop devices with loopfs

Posted May 12, 2020 5:36 UTC (Tue) by jezuch (subscriber, #52988) [Link]

It may be a silly question, seeing how nobody mentioned it before, which probably means this is very obvious, but why not a loop device namespace? Wouldn't it be more consistent with the rest of the system?

Private loop devices with loopfs

Posted May 14, 2020 10:28 UTC (Thu) by rwmj (subscriber, #5474) [Link] (1 responses)

This problem of separate "/dev spaces" must arise in other contexts too surely. TTYs? Non-loop block devices? I'm surprised that this problem hasn't already been solved more generically for any /dev subset.

Private loop devices with loopfs

Posted Oct 4, 2022 18:50 UTC (Tue) by hallyn (subscriber, #22558) [Link]

(years late, but referencing even longer timelines)

Indeed, Cellrox implemented device namespaces for Android phones which, for instance, virtualized the display so that at any time one Android container would have the real display while the other had a null display. See https://lwn.net/Articles/564854/ for instance from 2013 :)

Private loop devices with loopfs

Posted May 15, 2020 19:37 UTC (Fri) by Kamilion (guest, #42576) [Link]

Is it possible to loopback mount a file that grows yet?
I still can't believe supporting the DMG-like behavior of OSX is so difficult.


Copyright © 2020, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds