LWN: Comments on "Easier container security with entitlements"

Nesting (was: Easier container security with entitlements)

abufrejoval — Thu, 31 May 2018 09:25:19 +0000

I couldn't agree more that complexity kills the purpose, not just for security but also for resource management.

It's one of the reasons I have always preferred running Docker containers inside OpenVZ containers, because I really want to separate the two conflicting angles: The developer specifying what he needs via Docker and the operator specifying what he's willing to give via OpenVZ.

Security and resources should be negotiated, especially since they may be dynamic and de-coupled in terms of life-cycle. And of course they should also be understandable, but that's unlikely to become easier going forward, because differentiation of security and resources can only get worse (more complex) in these days of special function units, storage and fabric classes.

Entitlements or 'credits' also make sense when it comes to resources: you give workloads credits to spend on resources such as CPU, accelerators, network, storage or memory, which they can then spend according to the value of what they are computing and the current cost of those resources. Those costs are sure to become ever more dynamic as well in these days of Lambda and clouds.

In both cases nesting allows a top-down budget or entitlement approach which is as detailed as it needs to be and as abstract as it can be for the current nesting level, instead of trying to nail everything down at one flat layer, where its complexity overwhelms both the developer and the operator.
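To make the nesting idea concrete, here is a minimal sketch of a top-down budget tree. Everything in it is an assumption for illustration: the `Budget` class, the abstract "credits" unit and the level names are hypothetical and do not correspond to any OpenVZ or Docker interface. The only point it shows is that each level sees just its own limit and can never hand down more than it was given.

```python
from dataclasses import dataclass, field


@dataclass
class Budget:
    """One level of a hypothetical budget tree; units are abstract credits."""
    name: str
    limit: int                                   # credits granted by the level above
    spent: int = 0                               # credits used directly at this level
    children: list = field(default_factory=list)

    def allocated(self) -> int:
        # Credits already handed down to nested levels.
        return sum(child.limit for child in self.children)

    def available(self) -> int:
        return self.limit - self.spent - self.allocated()

    def delegate(self, name: str, limit: int) -> "Budget":
        # A nested level can never be granted more than its parent has left.
        if limit > self.available():
            raise ValueError(f"{self.name}: cannot delegate {limit} credits")
        child = Budget(name, limit)
        self.children.append(child)
        return child

    def spend(self, amount: int) -> None:
        if amount > self.available():
            raise ValueError(f"{self.name}: cannot spend {amount} credits")
        self.spent += amount


# The operator sets the outer limit; the developer works inside the inner one.
operator = Budget("outer-container", limit=100)
developer = operator.delegate("inner-container", limit=60)
developer.spend(45)
```

After this runs, the inner level has 15 credits left and the outer level has 40 that it has not delegated. Neither side needs to know how the other arrived at its numbers.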

Easier container security with entitlements

zyga — Sat, 26 May 2018 06:50:14 +0000

Have a look at github.com/snapcore/snapd; the most interesting part inside is probably cmd/snap-confine/*.[ch]. This is the code that arranges the sandbox. It works in tandem with other tools: specifically, it consumes the output of cmd/snap-seccomp (a seccomp profile compiler) and of the whole interfaces/* tree, where the code creates profiles for apparmor, seccomp, and device cgroups. One last interesting tool is cmd/snap-update-ns/*, which can modify a mount namespace in place, figuring out what needs to change versus what is there already. Let me know if you find any issues or have questions about the design.

Easier container security with entitlements

simcop2387 — Fri, 25 May 2018 20:31:03 +0000

Definitely going to look at that. I've taken some inspiration from docker and a few other places, but their tools aren't completely applicable, since for my use-case I want quick-to-build ephemeral containers (every command gets a new container/sandbox and they're all completely discarded after execution).

I hadn't thought to look at what snapd and such were doing, since they've got a similar use-case (though maybe not in the complete discarding of all data/records of execution).

Easier container security with entitlements

droundy — Fri, 25 May 2018 16:55:12 +0000

I find myself doubting that a "high level" entitlement is going to work around the random crashes caused by security policy arbitrarily disabling system calls. What is the high level entitlement that would let me use ptrace? As long as "security" means disabling parts of the Linux ABI it's hard to see something like this fixing the problems with Docker's security defaults being unusable.

Easier container security with entitlements

zyga — Fri, 25 May 2018 12:53:28 +0000

Snap sandbox construction / entering is pretty complex and the solution is very much tailored to snapd. There's an interesting interplay of apparmor, seccomp, cgroups and mount namespaces (several layers of them) that makes this somewhat less than likely to be replaced by a generic tool.

What is generic are some of the libraries (libapparmor, libcap, libseccomp) and certainly the corresponding kernel features.

Easier container security with entitlements

zyga — Fri, 25 May 2018 12:50:37 +0000

Snapd ships a complement of tools that (while tailored to snapd) should be useful as a base for other tools or as inspiration. We have a stand-alone seccomp profile compiler, support for argument filtering and loading.

Easier container security with entitlements

bof — Fri, 25 May 2018 06:51:53 +0000

Hmm. Isn't all of that applicable to

1) systemd unit security config in general
2) flatpak / snap isolation of desktop apps

At least all the kernel stuff is valid for any of the use cases. WIBNI a "standard high-level approach" would cover them all, too?

Easier container security with entitlements

simcop2387 — Thu, 24 May 2018 23:12:05 +0000

For seccomp I've actually been writing my own sandbox. It's still in progress but is pretty usable (by me). It's definitely more complicated than the JSON files I've seen from Kubernetes and Docker. It's using YAML and some custom stuff to handle constant values (things like O_APPEND, etc.).

You can get a high level overview of it from https://metacpan.org/pod/App::EvalServerAdvanced::Seccomp

It ends up setting up several namespaces (PID, SHM, mount, etc.), drops all capabilities, and then sets up seccomp as a whitelist for allowed syscalls. There's still more I could do with apparmor or selinux but they haven't seemed necessary for my particular use.
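For readers who have not seen a seccomp whitelist with symbolic constants, the following is a rough model of the two ideas, written in Python purely for illustration. It is not the format or code of App::EvalServerAdvanced::Seccomp (which is a Perl module), and it does not install a real filter: an actual seccomp policy is a BPF program evaluated by the kernel. The `POLICY` table, `resolve_flags` and `decide` are hypothetical names. The sketch only shows how names like O_APPEND can be resolved to numbers, and how default-deny with argument checks behaves.

```python
import os

ALLOW, KILL = "allow", "kill"


def resolve_flags(expr):
    """Resolve a symbolic expression such as 'O_WRONLY|O_APPEND' to an integer."""
    value = 0
    for name in (part.strip() for part in expr.split("|")):
        # Only accept open(2)-style flag names that the os module knows about.
        if not name.startswith("O_") or not hasattr(os, name):
            raise ValueError(f"unknown constant: {name!r}")
        value |= getattr(os, name)
    return value


# Flags a sandboxed open() may carry (O_RDONLY is 0 on Linux, so it is implied).
OPEN_FLAGS_OK = resolve_flags("O_WRONLY|O_APPEND|O_CLOEXEC")

# Hypothetical policy: syscall name -> {argument index: predicate}.
# Anything not listed is denied, which is what makes it a whitelist.
POLICY = {
    "read": {},
    "exit_group": {},
    "write": {0: lambda fd: fd in (1, 2)},
    "open": {1: lambda flags: (flags & ~OPEN_FLAGS_OK) == 0},
}


def decide(policy, syscall, args=()):
    """Return ALLOW only if the syscall is listed and every argument check passes."""
    checks = policy.get(syscall)
    if checks is None:
        return KILL
    for index, predicate in checks.items():
        if index >= len(args) or not predicate(args[index]):
            return KILL
    return ALLOW
```

With this table, `decide(POLICY, "write", (1, 0, 10))` is allowed, while a write to any other descriptor, an open() carrying O_CREAT, or any unlisted call such as ptrace is denied.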