I'd like to point out that we had (or still have) the uid namespace, which went in around 2.6.27. Given the existence of the user namespace, it apparently didn't cover capabilities within the uid namespace (i.e., it left capabilities global to all namespaces).
I haven't been following this part of the namespace progression so I can only surmise what happened to the uid namespace.
Are there two namespaces now, uid and user, with the user ns being a superset of the uid ns? Or was the uid ns extended to cover capabilities within it and renamed to user ns?
> The user that creates the namespace will have all capabilities in that namespace, not just the set of capabilities they have in the parent. Essentially, the creator has the privileges of the root user in any namespace he or she creates.
LXC doesn't have anything like OpenVZ's simfs, which lets you create a mountpoint based on any arbitrary directory. So if you run an unmodified distro in a container and haven't given it a separate filesystem (or btrfs subvolume), that distro can remount the host's filesystem read-only, which typically happens right at the end of a halt or reboot inside the container.
One current workaround is to drop the CAP_SYS_ADMIN capability (or VXC_SECURE_MOUNT in Linux VServer). But since the allowed capabilities are reset to fully open when a new user namespace is created, how do you keep child namespaces from causing trouble on the host while still sharing a filesystem with LXC?
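To make the CAP_SYS_ADMIN workaround concrete, here is a minimal Python sketch of what I mean, not how LXC itself does it. It checks whether CAP_SYS_ADMIN is still in a process's capability bounding set (the CapBnd line of /proc/<pid>/status) and drops it with prctl(PR_CAPBSET_DROP). The helper names are made up for illustration; the numeric constants are the values from linux/capability.h and linux/prctl.h. Dropping only works if the caller holds CAP_SETPCAP, and it says nothing about what a newly created user namespace gets.

```python
import ctypes
import os

CAP_SYS_ADMIN = 21     # from <linux/capability.h>
PR_CAPBSET_DROP = 24   # from <linux/prctl.h>


def bounding_set(status_text: str) -> str:
    """Return the hex CapBnd mask from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("CapBnd:"):
            return line.split()[1]
    raise ValueError("no CapBnd line found")


def has_cap(mask_hex: str, cap: int) -> bool:
    """True if capability number `cap` is set in the hex mask."""
    return bool((int(mask_hex, 16) >> cap) & 1)


def drop_sys_admin() -> None:
    """Remove CAP_SYS_ADMIN from the calling thread's bounding set.

    Hypothetical helper: needs CAP_SETPCAP, and the drop is inherited
    by children, so run it just before exec'ing the container's init.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    if libc.prctl(PR_CAPBSET_DROP, CAP_SYS_ADMIN, 0, 0, 0) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

On a Linux box, `has_cap(bounding_set(open("/proc/self/status").read()), CAP_SYS_ADMIN)` tells you whether the current process could still regain the capability through exec.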
OpenVZ is great in this respect because you can have one filesystem with many containers on it without needing image files and loop mounts, or LVM.