Making containers safer

Posted Aug 22, 2019 8:36 UTC (Thu) by cyphar (subscriber, #110703)
In reply to: Making containers safer by corsac
Parent article: Making containers safer

User namespaces are an incredibly important security feature, and disabling them is unequivocally a bad idea. There are hundreds of userns-related security checks in the kernel that you simply cannot emulate without using user namespaces. Dropping CAP_SYS_ADMIN is nowhere near sufficient to protect you. To give just a taste of the problem:

* You need to use seccomp (or drop CAP_DAC_READ_SEARCH) to stop the open_by_handle_at(2) attacks that allow a container to open the root filesystem of the host (don't forget that you're running code as kuid=0).
* There are a bunch of attacks against processes attaching to a container if you don't drop CAP_SYS_PTRACE (CVE-2016-9962 is a good example, but there are many more, such as variations on CVE-2019-5736). Even after dropping CAP_SYS_PTRACE, there are userns-related security checks (in ptrace_may_access()) that completely eliminate a bunch of container-attach attacks; these are protections you don't get without user namespaces.
* If (for whatever reason) the container gets access to a file descriptor from the host's mount namespace, it's game over without user namespaces (SELinux can protect you here too, but given most people run on Ubuntu that doesn't help much).
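
A quick way to see which of the capabilities mentioned above a process still holds is to decode the CapEff mask from /proc/<pid>/status. The sketch below is a minimal illustration, not a complete audit tool: it only checks CAP_DAC_READ_SEARCH, CAP_SYS_PTRACE, and CAP_SYS_ADMIN (bit numbers 2, 19, and 21 from linux/capability.h), and the helper names are hypothetical.

```python
from __future__ import annotations

# Bit numbers from linux/capability.h, limited to the capabilities discussed above.
CAP_BITS = {
    "CAP_DAC_READ_SEARCH": 2,  # required by open_by_handle_at(2)
    "CAP_SYS_PTRACE": 19,      # needed for the ptrace-based attach attacks
    "CAP_SYS_ADMIN": 21,
}


def risky_caps(cap_hex: str) -> set[str]:
    """Return the capabilities from CAP_BITS that are set in a hex mask,
    such as the CapEff field of /proc/<pid>/status."""
    mask = int(cap_hex, 16)
    return {name for name, bit in CAP_BITS.items() if (mask >> bit) & 1}


def effective_caps(status_path: str = "/proc/self/status") -> str:
    """Return the CapEff hex string from a /proc/<pid>/status file."""
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise ValueError(f"no CapEff line in {status_path}")
```

Running risky_caps(effective_caps()) inside a container shows whether the runtime actually dropped these; a mask like 00000000a80425fb, for example, contains none of them.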

There are plenty of other examples, but those are the ones that immediately came to mind.

Making containers safer

Posted Aug 22, 2019 8:41 UTC (Thu) by corsac (subscriber, #49696)

Fair points, but I think you missed the “et al” part. And yes, I'm aware that capabilities are not perfect (far from it) and that a lot of them are equivalent to SYS_ADMIN / full root. But dropping the relevant caps still seems more reasonable to me than exposing the kernel. There's still a lot of stuff that isn't namespace-aware, and thus a large attack surface that is reachable when you're uid=0 in a user namespace.

Making containers safer

Posted Aug 22, 2019 12:55 UTC (Thu) by cyphar (subscriber, #110703)

Unless you set CONFIG_USER_NS=n in your kernels (which basically no distribution does these days), you aren't reducing the attack surface by not using user namespaces, since the code is still in your kernel. You're just choosing not to use an additional security feature. Any unprivileged user on your host can call unshare(CLONE_NEWUSER) and start exploiting user namespace 0days. In containers, however, we block unshare(CLONE_NEWUSER), so the runtime can put the container in a user namespace while the container process itself cannot create new ones. In addition, user namespaces are used *alongside* capability dropping, seccomp, the devices cgroup, AppArmor/SELinux, no_new_privs, and so on. Using user namespaces doesn't make any of those other security features stop working; it complements them.
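
To check whether an unprivileged process on a given host (or inside a given container) can actually create a user namespace, you can simply try it. This is a minimal sketch assuming Linux with glibc: it calls unshare(2) through ctypes in a forked child so the caller never leaves its own namespace, and the function name is hypothetical. For a normal user on a host with unprivileged user namespaces enabled it typically returns True; inside a container whose seccomp profile blocks CLONE_NEWUSER it should return False.

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from linux/sched.h


def can_create_userns() -> bool:
    """Try unshare(CLONE_NEWUSER) in a forked child and report whether it succeeded."""
    pid = os.fork()
    if pid == 0:  # child: single-threaded, so unshare() is allowed to proceed
        ok = False
        try:
            libc = ctypes.CDLL(None, use_errno=True)
            ok = libc.unshare(CLONE_NEWUSER) == 0
        finally:
            # Never return into the parent's code path from the child.
            os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```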

As for uid=0, I would suggest that it's always a Very Bad Idea™ to run code as uid=0 unless it's absolutely necessary, even if you're doing it with user namespaces. But if you are going to do it, then using user namespaces is still much better than not using them (assuming the capability set is the same in both cases).
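
The difference is visible in /proc/<pid>/uid_map, where each line reads "ID-inside-namespace ID-outside-namespace length". The sketch below translates a container uid into the host (kernel) uid it corresponds to. The mapping 0 100000 65536 is only an illustrative assumption, in the style of a typical /etc/subuid allocation, and the helper names are hypothetical.

```python
from __future__ import annotations


def parse_uid_map(text: str) -> list[tuple[int, int, int]]:
    """Parse /proc/<pid>/uid_map contents into (inside, outside, length) tuples."""
    entries = []
    for line in text.splitlines():
        if line.strip():
            inside, outside, length = (int(field) for field in line.split())
            entries.append((inside, outside, length))
    return entries


def to_host_uid(ns_uid: int, mapping: list[tuple[int, int, int]]) -> int | None:
    """Translate a uid inside the namespace to the uid the host kernel sees.

    Returns None for unmapped uids (the kernel reports those as the overflow uid).
    """
    for inside, outside, length in mapping:
        if inside <= ns_uid < inside + length:
            return outside + (ns_uid - inside)
    return None


# Without a user namespace, the initial namespace's identity map applies,
# so container root is host root.
INIT_NS = parse_uid_map("0 0 4294967295")
# With a user namespace (illustrative values), container root is an ordinary host uid.
USERNS = parse_uid_map("0 100000 65536")
```

Here to_host_uid(0, INIT_NS) is 0 while to_host_uid(0, USERNS) is 100000, which is why uid=0 inside a user namespace is far less dangerous, even with the same capability set.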

Making containers safer

Posted Aug 30, 2019 10:26 UTC (Fri) by Margaret48 (guest, #129042)

Security-focused distributions patch the kernel so that user namespaces are restricted to root by default, which blocks unprivileged use. Debian, Linux-hardened, and Grsecurity all do this. Disabling user namespaces is also an official KSPP recommendation.
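
Whether such a restriction is in effect can be read from sysctls. This sketch assumes the following knobs: kernel.unprivileged_userns_clone comes from the Debian/linux-hardened patch and is absent on mainline kernels (reported here as None), while user.max_user_namespaces is upstream, and setting it to 0 blocks creation of new user namespaces. The function name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path

# Relative paths under /proc/sys for the knobs discussed above.
KNOBS = {
    # Distribution patch (Debian, linux-hardened); missing on mainline kernels.
    "unprivileged_userns_clone": "kernel/unprivileged_userns_clone",
    # Upstream limit; 0 means no new user namespaces can be created.
    "max_user_namespaces": "user/max_user_namespaces",
}


def userns_knobs(sysctl_root: str = "/proc/sys") -> dict[str, int | None]:
    """Read the user-namespace sysctls, using None for missing or unreadable ones."""
    values: dict[str, int | None] = {}
    for name, rel in KNOBS.items():
        try:
            values[name] = int((Path(sysctl_root) / rel).read_text().strip())
        except (OSError, ValueError):
            values[name] = None
    return values
```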

It's also worth noting that granting a user membership in the lxd group is equivalent to granting root[1], the same as for the docker group. That makes the term "unprivileged" meaningless.
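
Since membership in these groups is effectively root, it can be worth auditing who has it. Below is a minimal sketch using Python's standard pwd and grp modules that checks both supplementary and primary group membership; the group list and function name are assumptions for illustration.

```python
from __future__ import annotations

import grp
import pwd

# Groups whose membership is described above as root-equivalent.
ROOT_EQUIVALENT_GROUPS = {"docker", "lxd"}


def root_equivalent_groups(user: str) -> set[str]:
    """Return the root-equivalent groups that `user` belongs to (empty if no such user)."""
    try:
        primary_gid = pwd.getpwnam(user).pw_gid
    except KeyError:
        return set()
    found = set()
    for g in grp.getgrall():
        if g.gr_name in ROOT_EQUIVALENT_GROUPS and (user in g.gr_mem or g.gr_gid == primary_gid):
            found.add(g.gr_name)
    return found
```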

Systemd maintainers rejected userns support for systemd-nspawn, saying that they always rely on some privileged process running behind the curtain.

[1] https://bugs.launchpad.net/ubuntu/+source/lxd/+bug/1829071


Copyright © 2026, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds