
Progress for unprivileged containers

Posted Sep 29, 2022 14:31 UTC (Thu) by cortana (subscriber, #24596)
Parent article: Progress for unprivileged containers

I wonder why it's always necessary to map 2^16 UIDs into each container. Most containers will be running processes all as the same UID, and most container images will only include files owned by a handful of UIDs.

$ podman run --rm quay.io/centos/centos:7 find / -xdev -printf '%U\n' | sort | uniq -c | sort -n
      2 65534
      3 192
  10759 0

So quay.io/centos/centos:7 only requires 3 UIDs to be allocated to be unpacked. So couldn't the container runtime map 1023 host UIDs into the first 1023 UIDs within the container, plus one more for 65534? This would be compatible with the majority of container images, which take a base image, add a user, and then create files owned by that user.

In such a container, attempting to create a file or changing a process' UID to one of the unmapped UIDs could return an error.

We'd then be able to fit many more than 64k containers into our precious limited UID space. I expect there's some bit of POSIX that makes this more difficult than it seems?
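As a rough illustration of the layout proposed above, here is a minimal Python sketch that builds such a sparse map: the first 1023 container UIDs plus a single entry for 65534. The entries use the three-column layout of /proc/PID/uid_map (ID inside the namespace, ID outside, length). The function names and the host base of 100000 are hypothetical, chosen only for the example; this is not how any particular runtime does it.

```python
def sparse_uid_map(host_base, low_count=1023, extra_uids=(65534,)):
    """Build (container_start, host_start, length) entries.

    Maps container UIDs 0..low_count-1 onto a contiguous host range,
    then gives each extra UID (e.g. 65534, "nobody") one host UID.
    host_base is a hypothetical start of the host range.
    """
    entries = [(0, host_base, low_count)]
    next_host = host_base + low_count
    for uid in sorted(set(extra_uids)):
        if uid < low_count:
            continue  # already covered by the contiguous low range
        entries.append((uid, next_host, 1))
        next_host += 1
    return entries


def format_uid_map(entries):
    """Render entries as uid_map-style lines: inside outside length."""
    return "\n".join(f"{c} {h} {n}" for c, h, n in entries)


entries = sparse_uid_map(host_base=100000)
print(format_uid_map(entries))
# 0 100000 1023
# 65534 101023 1
print(sum(n for _, _, n in entries))  # 1024 host UIDs in total
```

With this layout a container consumes 1024 host UIDs rather than 65536, and any UID outside the two ranges is simply unmapped.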



Progress for unprivileged containers

Posted Sep 29, 2022 17:55 UTC (Thu) by stgraber (subscriber, #57367) [Link]

You're correct, you can definitely get away with a map full of holes to reduce the number of IDs needed. This is definitely much easier with stateless containers, using a mostly immutable base image. You can scan the image, figure out what uid/gid you need and map those.

In many cases, you can get away with pretty much just 0 (root), 1000 (user), 65534 (nobody).
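To make the "scan the image, then map only what you need" idea concrete, here is a minimal Python sketch. It assumes the set of UIDs has already been collected from the image (for example with a find command like the one in the parent comment) and turns it into a map full of holes, merging adjacent UIDs into one range. The function name and the host base of 100000 are hypothetical; this is an illustration, not LXD's or any runtime's actual logic.

```python
def uids_to_map(uids, host_base):
    """Turn a set of container UIDs into (container, host, length) entries.

    Host UIDs are handed out densely starting at host_base (a hypothetical
    value), so only len(set(uids)) host UIDs are consumed. Consecutive
    container UIDs are merged into a single range.
    """
    entries = []
    host_next = host_base
    for uid in sorted(set(uids)):
        if entries and entries[-1][0] + entries[-1][2] == uid:
            # uid directly follows the previous range: extend it
            c, h, n = entries[-1]
            entries[-1] = (c, h, n + 1)
        else:
            entries.append((uid, host_next, 1))
        host_next += 1
    return entries


# The common case from above: root, one user, nobody.
print(uids_to_map({0, 1000, 65534}, host_base=100000))
# [(0, 100000, 1), (1000, 100001, 1), (65534, 100002, 1)]
```

The same set of UIDs always yields the same entries for a given host base, but two images with different UID sets get different maps, which matters once volumes are shared between them.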

It's a bit trickier for what we do with LXD as we run full distribution images. Those commonly have anywhere between 20 and 50 users in place already for a variety of system services, shared paths, ... and more importantly as such containers are effectively used like physical systems or VMs, we have no idea what extra packages may do, or what some Ansible playbook may be adding later on.

For those, we've found our default of mapping a contiguous 65536 per container to work pretty well as it's somewhat rare for anything to need uid/gid above that range. Exceptions to that rule are systems using network authentication (often in the 200000-500000 range) and more recently, tools like systemd and snapd allocating ephemeral uid/gid for services, often using very very high uid/gid for that.

Progress for unprivileged containers

Posted Oct 1, 2022 1:35 UTC (Sat) by rcampos (subscriber, #59737) [Link] (1 responses)

Well, it is not really a problem to use 64k (2^16) IDs per container either, right?

You know they will work, and the limit of unique mappings is 65k containers per node (the UID space is 32 bits, so 2^32 = 2^16 * 2^16), which doesn't seem like a problem.

Sure, you can squeeze it later too if you need it. But if you want as many people as possible to adopt it, why would you create possible barriers to optimize something you don't really need (more than 64k containers per node)?

If you don't use a fixed mapping width, then you have to deal with fragmentation of UIDs, which is not fun.

And if you use a fixed but shorter mapping, then you have to analyze images and guess a mapping that works for most of them (this is a heuristic, and by definition you possibly leave people out). You also have to take into account that if the mapping doesn't use the SAME container IDs for ALL the images, then sharing volumes will be problematic. It can be more problematic if people use LDAP to build some container images too, etc.

So, in the end, doesn't seem worth it to optimize this at this point.

It might be needed in the future? It MIGHT, if we need more than 64k pods in a node. When that comes, we can easily create a new mode that only maps a shorter mapping and apps can migrate to that.

For now, IMHO, it is not worth it.

Progress for unprivileged containers

Posted Oct 2, 2022 12:52 UTC (Sun) by cortana (subscriber, #24596) [Link]

It's not really about wanting to run more than 65k containers per node at once. In a large organization with thousands of users in its directory, you start to get uncomfortably squeezed if you want to assign each one 65k subids for use with rootless containers. Which is a shame because all most of their containers will need is a handful of ids...
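The squeeze can be put into numbers with a short Python sketch. It treats the full 32-bit UID space as available, which is an idealized upper bound: real systems lose some of it to the host's own accounts and to reserved values. The function name, the `reserved` parameter, and the 1024-ID alternative width are hypothetical, used only for comparison.

```python
UID_SPACE = 2**32  # 32-bit UID space, as discussed in this thread


def max_allocations(ids_per_allocation, reserved=0):
    """How many non-overlapping subid ranges of the given width fit.

    reserved is a hypothetical count of UIDs held back for the host itself.
    """
    return (UID_SPACE - reserved) // ids_per_allocation


print(max_allocations(65536))                  # 65536
print(max_allocations(65536, reserved=65536))  # 65535
print(max_allocations(1024))                   # 4194304
```

So at 65536 subids per user the ceiling is about 65k users across the whole directory, while a narrower 1024-ID range would leave room for about 4 million.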


Copyright © 2025, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds