The mismatched mount mess

Posted Aug 11, 2018 1:10 UTC (Sat) by zblaxell (subscriber, #26385)
Parent article: The mismatched mount mess

> The kernel is able to create new mount points that look like independent mounts, but it's all a single mounted filesystem underneath the cover.

...except for all the strange, undocumented, and probably unintentional places where it's not, and the mount points behave like separate filesystems (or at least separate VFS layers with different parameters running as an overlay on top of a single filesystem). If it works accidentally some of the time, it could be made to work all of the time, given sufficient debugging.

It seems to me that the way to resolve this issue properly is to "simply" make the example with simultaneous noacl and acl mounts *work*. Add a context pointer in VFS somewhere so the filesystem can tell which mount point the request is coming from, and behave accordingly.

Posted Aug 11, 2018 6:37 UTC (Sat) by smurf (subscriber, #17840)

… and then gradually move all the other potentially-context-sensitive options over.

Posted Aug 11, 2018 7:17 UTC (Sat) by TheJH (subscriber, #101155)

Maaaaybe you could make it work for "acl". But then what about "sb="? "journal_path="? "data="?

Posted Aug 11, 2018 16:22 UTC (Sat) by nybble41 (subscriber, #55106)

To me it seems like the ideal state would be to have two distinct sets of options: filesystem options which can only be set once per filesystem and are common across all shared mounts, like "sb" and "journal_path", and mount point options which may be set differently for each mount point, like "acl". Opening a filesystem and mounting it at a particular path would be two separate actions, so you could pass the appropriate sets of options to each of them. If a filesystem is already mounted then you wouldn't open it again, and attempting to do so should fail; instead, you would use a different call to obtain a reference to the already-open filesystem, which you can then mount. This call would not take any filesystem options.
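
To make the proposed split concrete, here is a rough sketch of sorting a single -o string into the two sets. The function name split_opts and the list of per-mount flags are assumptions made for illustration; the list covers only generic flags that the VFS tracks per mount point today, so an option like "acl" still lands on the filesystem side, which is exactly the case this comment would like to change.

```shell
#!/bin/sh
# split_opts (hypothetical helper): sort a comma-separated option string
# into flags the VFS keeps per mount point and options left to the filesystem.
# The per-mount list below is an assumption for illustration, not a kernel table.
# Note that ro/rw also exists at the superblock level; it is treated as
# per-mount here for simplicity.
split_opts() {
    per_mount=""
    per_fs=""
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case "$opt" in
            ro|rw|suid|nosuid|dev|nodev|exec|noexec|atime|noatime|diratime|nodiratime|relatime|norelatime|strictatime|nostrictatime)
                per_mount="${per_mount:+$per_mount,}$opt" ;;
            *)
                per_fs="${per_fs:+$per_fs,}$opt" ;;
        esac
    done
    IFS=$old_ifs
    printf 'mount: %s\nfs: %s\n' "$per_mount" "$per_fs"
}
```

Calling split_opts nosuid,acl,errors=remount-ro prints "mount: nosuid" followed by "fs: acl,errors=remount-ro". Under the two-step scheme described above, the first set would go to the mounting step and the second to the step that opens the filesystem.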

Posted Aug 11, 2018 17:55 UTC (Sat) by josh (subscriber, #17465)

That seems plausible, and with some slight tweaks to the new API, userspace could provide those two sets of options to the fsopen fd and the fsmount fd, respectively.

Posted Aug 11, 2018 18:42 UTC (Sat) by zblaxell (subscriber, #26385)

That ends up propagating down to individual system calls and also has impact on caching. What happens when someone does a stat() through a mount point with the vfat 'uid=1000' option, then someone else does a stat() of the same file on a different mount point with the vfat 'uid=9999' option?

This opens up philosophical questions like "is the ownership of a file exclusively a property of its inode, or a property of something else? e.g. directory entry, parent directory, root of the filesystem, mount point" and "are inodes sufficient as file identifiers" and "who caches all this anyway?" VFS isn't good at dealing with filesystems where the filesystem's answer to this kind of question is different from ext4's answer.

So maybe it's only possible for cases where the VFS layer can implement the option without asking the filesystem (e.g. 'ro' or 'nosuid') or where the result doesn't have any effect VFS cares about (e.g. 'compress' or 'nodatasum' which affect only default parameters for new objects and data but don't create multiple views of the same object from different mount points).

But then what happens when someone mounts a filesystem once with nosuid, once with suid, and once with neither option? "nosuid" and "suid" clearly conflict, but does neither option mean "keep the value from the previous mount (but _which_ previous mount)" or does it mean "suid"?

Posted Aug 12, 2018 4:44 UTC (Sun) by viro (subscriber, #7872)

You know, what annoys me about that fsdevel thread is that presumably clued people do not bother to RTFM. Or RTFS. Or directly experiment. FYI: suid/nosuid is a mount property. There is nothing to "conflict" - you either pass MS_NOSUID to mount(2), or you do not. Either way, it affects that mount and nothing else. The filesystem itself doesn't know and doesn't care.

All of the above could be found in a couple of minutes by reading through mount(2) or mount(8) and experimenting.
As in,
root@kvm1:~# dd if=/dev/zero of=/tmp/foo bs=1M count=2
2+0 records in
2+0 records out
2097152 bytes (2.1 MB, 2.0 MiB) copied, 0.00372987 s, 562 MB/s
root@kvm1:~# mkfs /tmp/foo
mke2fs 1.44.3 (10-July-2018)
Discarding device blocks: done
Creating filesystem with 2048 1k blocks and 256 inodes

Allocating group tables: done
Writing inode tables: done
Writing superblocks and filesystem accounting information: done

root@kvm1:~# losetup /dev/loop0 /tmp/foo
root@kvm1:~# mkdir /tmp/a /tmp/b
root@kvm1:~# mount /dev/loop0 /tmp/a
root@kvm1:~# mount /dev/loop0 /tmp/b -o nosuid
root@kvm1:~# cp /usr/bin/whoami /tmp/a/
root@kvm1:~# chown lp /tmp/a/whoami
root@kvm1:~# chmod +s /tmp/a/whoami
root@kvm1:~# /tmp/a/whoami
lp
root@kvm1:~# /tmp/b/whoami
root
root@kvm1:~# umount /tmp/a /tmp/b
root@kvm1:~# losetup -d /dev/loop0
root@kvm1:~# rm -rf /tmp/foo /tmp/a /tmp/b

Two minutes, beginning to end. Sigh...

Posted Aug 12, 2018 5:01 UTC (Sun) by viro (subscriber, #7872)

... and if you are asking if previous mounts would for some reason affect the suid/nosuid state on subsequent ones, the answer is (a) no; (b) check and see.

Incidentally, the worst part of cross-namespace sharing of mounts is not just the fs-level mount options being ignored rather than having mount(2) fail on their mismatch. The real nastiness comes from mount -o remount done in one place and affecting every mount of that sucker. The mount-level options (nosuid among them) are not an issue - they won't do anything to other mounts. Per-fs ones bloody well will. And no, we really can't make each option per-mount - I dare you to try and handle the things like -o errors=panic or -o errors=remount-ro on per-mountpoint basis; good luck propagating the "originating" mount towards each ext4_error() in there. Especially fun when it comes to errors on e.g. writeback. Or -o data=... for the same ext4...
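
The distinction between mount-level and per-fs options can be observed from userspace without root. The sketch below reads /proc/self/mountinfo, which per proc(5) lists the per-mount options in field 6 and the per-superblock options as the last field after the "-" separator. The function name show_opts is made up for this example.

```shell
#!/bin/sh
# show_opts (hypothetical helper): for each mount, print the mount point,
# the per-mount options, and the per-superblock options.
# Reads /proc/self/mountinfo unless another file is given as $1.
show_opts() {
    awk '{
        # Optional fields start at field 7; skip them up to the "-" separator.
        for (i = 7; i <= NF; i++) if ($i == "-") break
        # After "-": fs type, mount source, then the superblock options.
        printf "%s mount=%s super=%s\n", $5, $6, $(i + 3)
    }' "${1:-/proc/self/mountinfo}"
}
```

Run against a setup like the /tmp/a and /tmp/b experiment in the parent comment, one would expect the two lines to differ in the mount= column (nosuid on one of them) while sharing the same super= column, since both mounts sit on one superblock.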

Posted Aug 12, 2018 5:28 UTC (Sun) by bof (subscriber, #110741)

wrt. the fs-level shared mount options - does the shared data for that "know" how often the thing is mounted? If that is the case, maybe a general approach of only permitting change (on remount or additional mount) if it's not mounted more than once already, would work?
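
As a userspace approximation of "how often is the thing mounted", the sketch below counts mountinfo entries that share a major:minor device number (field 3 in proc(5)). This is an assumption-laden stand-in: it only sees the current mount namespace, it counts bind mounts too, and some filesystems report different device numbers for what is internally one filesystem. It says nothing about how the kernel itself tracks this. The name count_mounts is hypothetical.

```shell
#!/bin/sh
# count_mounts (hypothetical helper): list device numbers that appear in
# more than one mountinfo entry, with the number of entries for each.
# Reads /proc/self/mountinfo unless another file is given as $1.
count_mounts() {
    awk '{ n[$3]++ }
         END { for (d in n) if (n[d] > 1) print d, n[d] }' \
        "${1:-/proc/self/mountinfo}" | sort
}
```

A policy along the lines suggested here would then refuse an fs-level option change whenever the count for the device is greater than one.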

Posted Aug 13, 2018 13:39 UTC (Mon) by bandrami (guest, #94229)

At that point I feel the need to imagine a lecture by Jeff Goldblum about the difference between what we can do and what we should do. Wouldn't it make more sense to design a new userspace layer that simulates all this instead?