LWN: Comments on "configfd() and shifting bind mounts" https://lwn.net/Articles/809125/ This is a special feed containing comments posted to the individual LWN article titled "configfd() and shifting bind mounts". en-us Sun, 14 Sep 2025 09:36:54 +0000 Sun, 14 Sep 2025 09:36:54 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net configfd() and shifting bind mounts https://lwn.net/Articles/809553/ https://lwn.net/Articles/809553/ Cyberax <div class="FormattedComment"> Yeah, security is a problem.<br> <p> Probably at this point creating something like procfs2 and then mandating it would be the best approach. But then there's a question of what exactly is an "unsafe file"...<br> </div> Tue, 14 Jan 2020 21:09:31 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809551/ https://lwn.net/Articles/809551/ cyphar <div class="FormattedComment"> The new mount API (in particular fsopen(2)) could work for this.<br> <p> The problem is that there is a security issue with giving a program access to a /proc without any over-mounts if the /proc they already have access to has locked mounts on top of it (container runtimes use this technique to mask certain dangerous procfs files from containers). 
If we want to have a simple API that gives us a /proc handle, we'll need to make some kind of procfs2 (which has been suggested several times in the past) which removes all of the patently unsafe files so that untrusted programs can get access to all of it.<br> </div> Tue, 14 Jan 2020 21:06:58 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809414/ https://lwn.net/Articles/809414/ smurf <div class="FormattedComment"> New special FD arguments to "openat()" and friends should be sufficient, no need for a new syscall.<br> Alternately, the new "mount" syscalls can give you a handle to /proc or /sys without actually mounting them.<br> Alternately, just acknowledge that not mounting /dev, /proc and /sys is not supported and is going to cause problems, and leave it at that.<br> </div> Tue, 14 Jan 2020 06:44:49 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809358/ https://lwn.net/Articles/809358/ Cyberax <div class="FormattedComment"> sysctl() was broken; it had only a casual connection with /proc.<br> <p> Perhaps it would be better to add a new syscall like 'open_special(fs_type)' to open '/proc', '/sys', '/sys/fs/...' directories without them being mounted.<br> </div> Mon, 13 Jan 2020 21:45:44 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809357/ https://lwn.net/Articles/809357/ mirabilos <div class="FormattedComment"> Yeah, bad decision, that.<br> </div> Mon, 13 Jan 2020 21:40:57 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809263/ https://lwn.net/Articles/809263/ roc <div class="FormattedComment"> ... 
which has been removed<br> </div> Mon, 13 Jan 2020 03:34:25 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809244/ https://lwn.net/Articles/809244/ mirabilos <div class="FormattedComment"> it’s called sysctl…<br> </div> Sun, 12 Jan 2020 14:57:16 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809243/ https://lwn.net/Articles/809243/ mirabilos <div class="FormattedComment"> *so* a problem for /proc, which may not exist in your chroot<br> </div> Sun, 12 Jan 2020 14:56:52 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809232/ https://lwn.net/Articles/809232/ pbonzini <div class="FormattedComment"> Thanks James, that makes sense.<br> </div> Sat, 11 Jan 2020 20:09:17 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809229/ https://lwn.net/Articles/809229/ smurf <div class="FormattedComment"> Some more special negative pseudo-file-descriptors for openat() and friends?<br> </div> Sat, 11 Jan 2020 17:58:20 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809228/ https://lwn.net/Articles/809228/ jejb <div class="FormattedComment"> <font class="QuotedText">&gt; However, it does seem to me that passing an O_PATH file descriptor to fspick, plus a new flag for fspick that says "create a bind mount", would be a good API. The article hints that "fsconfig() is designed to work with superblocks" but it's not clear why.</font><br> <p> I did explain that problem in the original email: all the hooks for fsconfig actions are in sb-&gt;fs_type-&gt;init_fs_context() which the fs_context allocation uses. Now it is possible to special case this for bind mounts, but you also have to special case fsmount and fsconfig/reconfigure. 
By the time you've done all that, you've effectively got two separate paths through the same code, which isn't really such a good idea, which is why I asked the question "what would the generalisation of fsconfig look like".<br> </div> Sat, 11 Jan 2020 16:51:06 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809218/ https://lwn.net/Articles/809218/ roc <div class="FormattedComment"> OK, maybe it wouldn't. I have no experience with fds for paths in filesystems that aren't actually mounted anywhere.<br> </div> Sat, 11 Jan 2020 01:12:16 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809217/ https://lwn.net/Articles/809217/ quotemstr <div class="FormattedComment"> <font class="QuotedText">&gt; As a solution to the "what if procfs isn't mounted?" problem, that seems far more elegant than the alternative of creating new syscall APIs for every single feature in procfs that someone might need to use without procfs mounted</font><br> <p> Agreed. We don't need duplicate APIs. We just need some way to get a directory FD for /proc, /sys, whatever without going through the mount table.<br> </div> Fri, 10 Jan 2020 23:37:02 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809216/ https://lwn.net/Articles/809216/ quotemstr <div class="FormattedComment"> <font class="QuotedText">&gt; which might cause issues if the filesystem is not in fact mounted anywhere,</font><br> <p> Why would it cause problems? The actual FD would refer to a magical internal non-rooted mount, e.g., like the one the kernel sets up for pipefs on boot.<br> </div> Fri, 10 Jan 2020 23:36:16 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809215/ https://lwn.net/Articles/809215/ quotemstr <div class="FormattedComment"> I'm not saying that we should add stuff to proc. 
I'm saying that putting a virtual filesystem in the rooted filesystem namespace works fine, whether that virtual filesystem is proc or something else.<br> </div> Fri, 10 Jan 2020 23:35:25 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809214/ https://lwn.net/Articles/809214/ roc <div class="FormattedComment"> I like that idea, but rather than a directory file descriptor, which might cause issues if the filesystem is not in fact mounted anywhere, it might be better to have new open() flags or AT_ values that select specific magic filesystems --- e.g. procfs.<br> <p> As a solution to the "what if procfs isn't mounted?" problem, that seems far more elegant than the alternative of creating new syscall APIs for every single feature in procfs that someone might need to use without procfs mounted. (Same goes for other magic filesystems.)<br> </div> Fri, 10 Jan 2020 23:26:49 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809212/ https://lwn.net/Articles/809212/ cyphar <div class="FormattedComment"> Adding more things to /proc isn't a great idea (it's already full of lots of other crap that arguably shouldn't be there), and there are lots of problems with safely resolving paths in /proc. 
Any new kernel interfaces (*especially* ones that will be implemented through magic-links) should have a non-procfs counterpart.<br> </div> Fri, 10 Jan 2020 22:41:37 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809211/ https://lwn.net/Articles/809211/ pbonzini <div class="FormattedComment"> <font class="QuotedText">&gt; why there isn't a way to change the fd you get from fsopen of the existing filesystem into a separate filesystem with separate options for "bind"?</font><br> <p> Bind mounts can point to any file, even one that is not a mount point--or even one that isn't a directory.<br> <p> However, it does seem to me that passing an O_PATH file descriptor to fspick, plus a new flag for fspick that says "create a bind mount", would be a good API. The article hints that "fsconfig() is designed to work with superblocks" but it's not clear why.<br> </div> Fri, 10 Jan 2020 22:38:44 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809209/ https://lwn.net/Articles/809209/ quotemstr <div class="FormattedComment"> <font class="QuotedText">&gt; Because /dev or /dev/config/fs might not exist in your namespace.</font><br> <p> Not a problem for /proc<br> <p> And if it *is* a problem, the right approach isn't some random new twist on open(2), but a system call that retrieves a directory file descriptor for /dev/config or whatever, one that you could then use with openat:<br> <p> openat(get_configfs_fd(), "fs/tmpfs/create", O_CLOEXEC | O_RDWR)<br> <p> </div> Fri, 10 Jan 2020 22:24:20 +0000 configfd() and shifting bind mounts https://lwn.net/Articles/809207/ https://lwn.net/Articles/809207/ josh <div class="FormattedComment"> Because /dev or /dev/config/fs might not exist in your namespace.<br> <p> If you're binding a filesystem, though, I wonder why there isn't a way to change the fd you get from fsopen of the existing filesystem into a separate filesystem with separate options for "bind"?<br> </div> Fri, 10 Jan 2020 21:53:21 +0000 
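For readers following the directory-file-descriptor suggestion in the comments above, here is a minimal sketch of the existing pattern it builds on: obtain a descriptor for a directory, then resolve a relative path against it with openat(2). The get_configfs_fd() call mentioned in the thread is hypothetical and does not exist in any kernel, so this sketch assumes an ordinary open() of an already-mounted directory as a stand-in. The helper name read_relative is invented for illustration.

```c
#define _GNU_SOURCE          /* needed for O_PATH */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Open `relpath` relative to the directory `dirpath` and copy up to
 * buflen - 1 bytes of it into `buf`, NUL-terminating the result.
 * Returns the number of bytes read, or -1 on any error.
 *
 * Assumption: `dirpath` is reachable through the normal mount table.
 * The proposals in the thread would replace this first open() with
 * some other way of obtaining the directory descriptor.
 */
static ssize_t read_relative(const char *dirpath, const char *relpath,
                             char *buf, size_t buflen)
{
    if (buflen == 0)
        return -1;

    /* O_PATH: a handle used only for path resolution, not for I/O. */
    int dirfd = open(dirpath, O_PATH | O_DIRECTORY | O_CLOEXEC);
    if (dirfd < 0)
        return -1;

    /* Resolve relpath against dirfd rather than the current directory. */
    int fd = openat(dirfd, relpath, O_RDONLY | O_CLOEXEC);
    close(dirfd);
    if (fd < 0)
        return -1;

    ssize_t n = read(fd, buf, buflen - 1);
    close(fd);
    if (n < 0)
        return -1;

    buf[n] = '\0';
    return n;
}
```

On a system where procfs is mounted, a call such as read_relative("/proc", "self/status", buf, sizeof buf) would read the start of the caller's status file. Once the directory descriptor exists, the second step no longer depends on where (or whether) the filesystem appears in the caller's path namespace, which is the property the thread is after.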
configfd() and shifting bind mounts https://lwn.net/Articles/809205/ https://lwn.net/Articles/809205/ quotemstr <div class="FormattedComment"> I don't like the special configfd_open system call. Why not just use a regular open with a special file?<br> <p> Instead of <br> <p> configfd_open("tmpfs", O_CLOEXEC, CONFIGFD_CMD_CREATE)<br> <p> write<br> <p> open("/dev/config/fs/tmpfs/create", O_CLOEXEC | O_RDWR)<br> <p> <p> </div> Fri, 10 Jan 2020 21:30:59 +0000