Mounting images inside a user namespace

Posted Jun 14, 2023 21:56 UTC (Wed) by geofft (subscriber, #59789)
In reply to: Mounting images inside a user namespace by mezcalero
Parent article: Mounting images inside a user namespace

Yeah, but in practice if a common filesystem is marked FS_USERNS_MOUNT then I can use it on both GKE (containerd on Google COS) and GitHub Actions (Docker on Ubuntu) within basically a year.

In other words, there is a subset of fun kernel features available to users - including unprivileged user namespaces, but also including stuff like seccomp - that is in practice made available to container users simply by virtue of Linux in normal configurations making them available. It helps a lot that the hosting userspace doesn't have to do anything to control unprivileged kernel features; they're just there. Very rarely I'd have to file a feature request with the COS team or the GitHub Actions team to add a certain kernel module or maybe enable a configuration option, but it would just be build configuration - they wouldn't be changing any code or writing anything to plumb things through from the host - so they'd be likely to say yes.
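As a rough illustration of "they're just there": a program can probe for one of these features at runtime without any help from the hosting userspace. The sketch below is an assumption-laden example, not anything COS or GitHub Actions ships. The helper name can_create_userns is made up for illustration, and it uses the raw unshare system call so that it does not depend on glibc feature-test macros. It forks so that the probe does not change the caller's own namespace.

```c
#include <linux/sched.h>   /* CLONE_NEWUSER */
#include <sys/syscall.h>   /* SYS_unshare */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if an unprivileged user namespace could be
 * created, 0 if the kernel or a sandbox policy refused, and -1 if the probe
 * itself could not run (fork/wait failed or the child was killed). */
int can_create_userns(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: attempt to enter a new user namespace and report the
         * result through the exit status. */
        long rc = syscall(SYS_unshare, CLONE_NEWUSER);
        _exit(rc == 0 ? 0 : 1);
    }

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    if (!WIFEXITED(status))
        return -1;   /* e.g. killed by a seccomp filter */

    return WEXITSTATUS(status) == 0 ? 1 : 0;
}
```

Whether this returns 1 depends on the host: some distributions and container runtimes disable or filter unprivileged user namespaces, which is exactly the kind of variation being discussed here.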

You could imagine a world where the kernel didn't implement, say, timerfd for unprivileged users, and it said "Only root can create kernel timers, but root is free to run a daemon to configure timers requested by other processes and send them stuff over pipes when the timers expire." (Obviously there's no technical reason for this, but bear with me.) In practice that would mean that writing unprivileged programs that use timers would be difficult. You could do it on certain OSes, and you could maybe reconfigure certain environments to make it work, but it would be a much less reliable experience than having timerfd in its current form.

(And it also doesn't require making any assumptions about what the guest container looks like and whether it follows a normal filesystem layout, so you don't have to figure out where that UNIX socket gets bind-mounted to. You can have one-file containers with a statically linked binary and no directories - and I think a lot of Go folks do exactly this - that use random kernel features, because a container keeps the kernel the same but changes out the userspace.)

The whole container ecosystem more or less relies on the idea (wildly technically invalid, but remarkably true in practice) that there is indeed a common Linux ABI available to userspace, and you can go from distro to distro or provider to provider and expect the same things to be available. And I think there is a sense on the kernel side that this should indeed be true - see e.g. the pushback to the "optional patches" in https://lwn.net/ml/linux-kernel/CAG_fn=WR3s3UMh76+bibN0nU... or the objection to mutually-exclusive major features in https://lwn.net/Articles/858023/ , both of which would have risked breaking the commonality of "Linux" as seen from userspace.

(Again I want to be clear that I'm not arguing against the work you're describing here. I'm excited for it! But I think we should _also_ have FS_USERNS_MOUNT or something like it.)

Mounting images inside a user namespace

Posted Jun 15, 2023 10:14 UTC (Thu) by SLi (subscriber, #53131)

Thanks, this was insightful! At first I was baffled about why you want what you want, but I think you make an important point. It's kind of a mixed social (or standardization) and technical problem, and it makes sense to say that there is a de facto base standard (the kernel ABI), whether it's good or not.