Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

[Posted August 28, 2014 by jake]

Russell Pavlicek looks at the rivalry between containers and hypervisors over at Linux.com. He outlines the arguments for and against each, and follows it up with a description of a new contender for a "cloud operating system": unikernels. "Unikernel systems create tiny VMs. Mirage OS from the Xen Project incubator, for example, has created several network devices that run kilobytes in size (yes, that's “kilobytes” – when was the last time you heard of any VM under a megabyte?). They can get that small because the VM itself does not contain a general-purpose operating system per se, but rather a specially built piece of code that exposes only those operating system functions required by the application. There is no multi-user operating environment, no shell scripts, and no massive library of utilities to take up room – or to subvert in some nefarious exploit. There is just enough code to make the application run, and precious little for a malefactor to leverage. And in unikernels like Mirage OS, all the code that is present is statically type-safe, from the applications stack all the way down to the device drivers themselves. It's not the “end-all be-all” of security, but it is certainly heading in the right direction."

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 28, 2014 22:07 UTC (Thu) by ibukanov (subscriber, #3942) [Link] (10 responses)

The article compares apples to oranges. The goal of Docker is to pack *existing* application code into a lightweight container, while Mirage OS focuses on writing new code in a type-safe language (OCaml) that runs directly on a hypervisor.

If anything, one can try to compare Mirage OS with Google Native Client, as both projects target writing new code for a safe VM with a rather limited API. Similarly, one can compare Docker with, say, Vagrant [1], as both projects use the idea of shared image files.

[1] - https://www.vagrantup.com/

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 28, 2014 22:38 UTC (Thu) by edomaur (subscriber, #14520) [Link] (9 responses)

Unikernels in general aren't focused on rewriting things; that's the case with Mirage only because it's the way that project works. Ideally, unikernels should work as "hyper containers". And I think we will go that way, the way of containers exported to purpose-built virtual machines.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 6:13 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (8 responses)

If a hyper container host can provide kernel support and libraries that existing application binaries need while still keeping general overhead low (like restarting containers in milliseconds), that indeed can compete with containers directly.

But if running in a lightweight hyper container requires the code to be recompiled against some special library like uClibc or nacl_io, which involves developers and not just system administrators, that is not comparable with what Docker and friends offer.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 7:03 UTC (Fri) by Cyberax (✭ supporter ✭, #52523) [Link] (7 responses)

Why? You can run uClibc in Docker just fine. You can run anything that doesn't need kernel services that are unavailable within containers.

I'm playing with a small musl-based system running under Docker in my spare time.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 7:34 UTC (Fri) by ibukanov (subscriber, #3942) [Link] (6 responses)

I was talking about a hypothetical hypervisor-based container, or an extremely lightweight hardware VM running under a hypervisor with a minimal attack surface. If those require compiling applications against special libraries (effectively porting to yet another architecture), that makes them a very different product from Docker, which can run *existing* binaries as is with whatever system libraries they use.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 15:20 UTC (Fri) by justincormack (subscriber, #70439) [Link] (5 responses)

It is very hard to make something that's fully compatible with Linux applications that is not just Linux. We are gradually getting to the point with the NetBSD rump kernel (http://rumpkernel.org) where you can run many POSIX apps on it, on bare metal or a VM. But Docker is in many respects a packaging solution, not a VM.

Applying the unikernel concept to more applications

Posted Aug 30, 2014 14:25 UTC (Sat) by dmarti (subscriber, #11625) [Link] (4 responses)

If you want to run an arbitrary Linux application on a unikernel, you can try OSv.

Simple: toy HTTP server

Real application: Redis on OSv.

There are a few limitations--the main one is that it's a single address space, so there is no fork(2). However, many non-forking Linux applications will build and run on OSv with just a Makefile change (see the HTTP server example article for how to do that).

Detailed info is in the OSv paper from USENIX Annual Technical Conference: OSv—Optimizing the Operating System for Virtual Machines

(I work for the company behind OSv.)

Applying the unikernel concept to more applications

Posted Sep 1, 2014 19:36 UTC (Mon) by ibukanov (subscriber, #3942) [Link] (3 responses)

This makes OSv rather similar to Google Native Client in porting effort and isolation. However, with NaCl one can still update the system without touching the executable while, as far as I can see, OSv requires rebuilding all executables after a security update in, say, networking code.

Applying the unikernel concept to more applications

Posted Sep 1, 2014 23:37 UTC (Mon) by dmarti (subscriber, #11625) [Link] (2 responses)

Yes, you do have to rebuild your OSv VMs if you update one of the third-party libraries you use.

OSv will use some libraries built for a Linux host (such as libevent in the HTTP server example above), so you may not have to do a separate build just for your OSv systems and can simply use the library from your Linux environment of choice.

I don't know if it's meaningful to say that OSv isolation level is similar to that of NaCl. Both of them definitely have the goal of strict isolation, but they approach it in totally different ways: NaCl by forcing you to use a safe subset of valid x86_64 code, and OSv by using the hypervisor/guest kernel barrier.

Applying the unikernel concept to more applications

Posted Sep 2, 2014 5:45 UTC (Tue) by ibukanov (subscriber, #3942) [Link] (1 responses)

By making networking and other parts of the OS available as an application library without memory protection, OSv increases the attack surface compared with a heavyweight solution like running both the kernel and the application under the hypervisor. Due to the shared memory space, a bug in the application could lead to an exploit affecting both the low-level storage and networking implementations. In many cases this would be enough for the attacker.

However, this situation is still much better than a typical setup for Linux containers, where a bug in a big and fat Linux kernel allows an attacker to take over the whole system. And I suppose OSv can achieve the same if not better performance than container solutions.

What is interesting about NaCl is that it provides the same level of isolation as one gets using memory protection under a normal OS, with much cheaper system calls. They are still more expensive than function calls, but the performance toll should be small enough not to worry about. So it would be interesting to port NaCl to OSv to get both the performance of a lightweight VM and the isolation one gets using a memory-protected kernel for system services.

Applying the unikernel concept to more applications

Posted Sep 5, 2014 9:35 UTC (Fri) by justincormack (subscriber, #70439) [Link]

I think part of the OSv model is to use the JVM bytecode validation as the "NaCl" validation layer, from memory.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 1:43 UTC (Fri) by allesfresser (guest, #216) [Link] (7 responses)

Except Google is Google. They can afford to hire thousands of the best and brightest to do intelligent things that few others can do. After 30 years in this industry, and two decades dealing with customers on site, I doubt that most organizations could readily do what Google has done. If they could, they'd be Google, too.

Except, isn't that supposed to be one of the benefits of free/open source software--that a giant like Google can make the investment to research something like this, and then the whole ecosystem benefits from the techniques? Sure, maybe I won't use millions of servers like Google, but that doesn't mean I can't use the same containerizing sort of techniques they do, on a small scale. If the techniques are well-enough known for us to point to as a good example, then we should be able to use them as an example, no?

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 11:19 UTC (Fri) by dgm (subscriber, #49227) [Link] (6 responses)

Google Docs is not free/open AFAIK.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 12:52 UTC (Fri) by Cato (guest, #7643) [Link] (5 responses)

No, but Google has open sourced some interesting cloud components recently, such as http://www.wired.com/2014/06/google-kubernetes/ - and of course there are Go, Dart, Chrome and many other technologies.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 18:54 UTC (Fri) by Lennie (subscriber, #49641) [Link] (4 responses)

They didn't open source what they were using internally.

They created new code to do things similar to what they are doing internally and open sourced that, to be able to create a community around the project.

Just like Docker wasn't used internally by dotCloud, they created something new they wanted to create a community around.

Google also open sourced Let Me Contain That For You (lmctfy) and cadvisor in the same way. It's all new code.

In all cases their internal code must be years old and maybe not so pretty to look at; they probably wanted to start with a clean slate anyway, without any legacy baggage.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 30, 2014 0:22 UTC (Sat) by Sesse (subscriber, #53779) [Link] (3 responses)

cgroups, which Docker relies heavily on, is an open-sourcing of exactly the same kernel technology Google is using internally. But you are right, the userspace is different.

(Disclaimer: I work at Google, but not with anything related to this.)

/* Steinar */

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 30, 2014 9:00 UTC (Sat) by Lennie (subscriber, #49641) [Link]

Yes, of course cgroups was code that was used internally for, I believe, a year and then released as open source.

But it was clearly not developed in isolation. It is based on code and design from BULL/SGI and IBM. A lot of it was developed in the mainline Linux kernel too.

cgroups is a generalization of cpusets, which was already in mainline and originally came from BULL SA. cpusets was later rewritten by SGI. That all happened before it was used by cgroups.

Another example is, I believe, the memory controller accounting (design ?), which came from IBM. The memory controller had a number of competing implementations, including beancounters by the OpenVZ guys.

Here is the original commit of cgroups:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/lin...

Also have a look at the end of this email from 2 years before cgroups was created:

For example, the following sequence of commands will setup a cpuset
named "Charlie", containing just CPUs 2 and 3, and Memory Node 1,
and then move the current shell to that cpuset:

mount -t cpuset none /dev/cpuset
cd /dev/cpuset/top_cpuset
mkdir Charlie
cd Charlie
/bin/echo 2-3 > cpus
/bin/echo 1 > mems
/bin/echo $$ > tasks

http://lwn.net/Articles/91637/

Looks kind of familiar, right ? ;-)
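
For comparison, here is a rough sketch of the same setup expressed through the cgroup v1 cpuset controller. The mount point /sys/fs/cgroup/cpuset and the helper name cpuset_cmds are assumptions for illustration, not something taken from the email or the commit. The function only prints the commands, so they can be reviewed before being run as root.

```shell
#!/bin/sh
# Print (not run) cgroup v1 commands that mirror the cpuset example above.
# Assumptions: the cpuset controller is mounted at $root (hypothetical
# default below) without the "noprefix" option, so files carry the
# "cpuset." prefix.
cpuset_cmds() {
    name=$1 cpus=$2 mems=$3
    root=${4:-/sys/fs/cgroup/cpuset}
    printf 'mkdir -p %s/%s\n' "$root" "$name"
    # cpus and mems have to be set before any task can join the group.
    printf 'echo %s > %s/%s/cpuset.cpus\n' "$cpus" "$root" "$name"
    printf 'echo %s > %s/%s/cpuset.mems\n' "$mems" "$root" "$name"
    # The literal $$ is kept so that the shell which finally runs this
    # line moves itself into the group.
    printf 'echo $$ > %s/%s/tasks\n' "$root" "$name"
}

cpuset_cmds Charlie 2-3 1
```

Apart from the mount point and the cpuset. file prefix, the sequence is the same as in the 2004 email: create a directory, write the CPU and memory node lists, then write a PID into tasks.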

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Sep 8, 2014 11:00 UTC (Mon) by dunlapg (guest, #57764) [Link] (1 responses)

The point is, a general purpose container -- one which gives you access to the full Linux system interface -- is only marginally more secure than not having any containers at all. Exploitable bugs, which allow you to execute code in the kernel (and thus bypass SELinux or any other security checks) are discovered in this interface basically every month.

The only way to make containers reasonably secure is to tailor the container to the exact program you're using, and then also to reduce the number of system calls required by that program. This can't be shared between applications; it needs to be done over again from scratch for *every new application*. The fact that Google has done this work for Google Docs doesn't benefit you at all when you're running Apache.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Sep 8, 2014 15:36 UTC (Mon) by Lennie (subscriber, #49641) [Link]

I don't know if every new application needs it.

I do know there is a long list of things you should do which could allow you to be somewhat secure.

If they implement all of the long list, would it be enough to run code from an untrusted source ? Maybe.

So far Google said they run customer supplied untrusted code for Docker in a Docker-container in a VM in a container.

Here is what Red Hat's Dan Walsh is working on:
https://opensource.com/business/14/9/security-for-docker

Here is his list of tips at the end:

- Only run applications from a trusted source
- Run applications on an enterprise-quality host
- Install updates regularly
- Drop privileges as quickly as possible
- Run as non-root whenever possible
- Watch your logs
- setenforce 1

He doesn't even mention the whole list, but as far as I can see, there is:
- cgroups to prevent a container from DoSing CPU/memory/disk for other containers
- seccomp to only allow certain syscalls:
https://github.com/docker/docker/blob/master/contrib/mkse...
- SELinux to only allow access to certain SELinux types and SELinux categories
- capabilities whitelist to only allow certain capabilities
- read-only mounts to allow only a few entries from /sys and /proc
- user namespaces to make sure root in the container is not root to the kernel / outside the container - you'll need a fairly new kernel to be able to use them securely
- pid namespace to let the container only see its own processes
- hostname namespace so the container has its own hostname
- networking namespace to give the container its own network stack

So far SELinux, capabilities, mounting /proc and /sys read-only, namespaces (except for user) and, I assume, seccomp are implemented in Docker. User namespaces haven't been implemented because not a lot of kernels running in the wild support them properly.
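
Whether a seccomp filter is really applied to a given process can be checked rather than assumed. The following is a minimal sketch, assuming a kernel that exposes the "Seccomp:" field in /proc/<pid>/status; the helper name seccomp_mode is made up for this example. It reads a status dump on stdin so it can be tried on saved output as well as on a live system.

```shell
#!/bin/sh
# Read a /proc/<pid>/status dump on stdin and name the seccomp mode.
# The kernel reports 0 (disabled), 1 (strict) or 2 (filter) in the
# "Seccomp:" field; kernels without the field yield "not reported".
seccomp_mode() {
    awk '$1 == "Seccomp:" {
             if ($2 == 0) print "disabled"
             else if ($2 == 1) print "strict"
             else if ($2 == 2) print "filter"
             else print "unknown"
             found = 1
         }
         END { if (!found) print "not reported" }'
}

# Usage on a live system:
#   seccomp_mode < /proc/self/status
printf 'Name:\tsh\nSeccomp:\t2\n' | seccomp_mode
```

Run inside a container, "filter" would suggest that a syscall whitelist is in place, while "disabled" means the process can reach the full system call interface.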

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 11:13 UTC (Fri) by azilian (guest, #47340) [Link] (3 responses)

We are already providing Linux container hosting at http://www.getclouder.com
And I have to point out that Docker is not the only container technology! LXC is out there, and is doing an awesome job at providing you with full OS containers.

As far as security, if you configure your containers properly and give them their own physical storage, most of the security concerns disappear.

I'm not saying that containers are completely secure, but I'm trying to point out that they are reasonably secure if they are reasonably setup.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 11:31 UTC (Fri) by dag- (guest, #30207) [Link]

The security benefits are a result of the decreasing number of attack vectors. However those "hyper containers" now have the (questionable) security benefit that they are not standardized.

But as soon as there is uptake on the idea, things will get standardized, and that opens the door to abusing standardized APIs or standardized setups. And if the storage layer is replaced with cloud storage APIs, you have to include the attack vectors against the cloud storage as well.

Things do not become necessarily less complex, but it might help to reduce the number of (currently used) attack vectors.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 13:15 UTC (Fri) by ewan (guest, #5533) [Link]

"reasonably secure if they are reasonably setup"

And full virt VMs are reasonably fast if they are reasonably set up.

This is probably one of those things not worth having a war over - some times containers will be better, some times VMs will be better, but most times either one will do just fine, and pretty much interchangeably.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Sep 2, 2014 2:22 UTC (Tue) by raven667 (subscriber, #5198) [Link]

> As far as security, if you configure your containers properly and give them their own physical storage, most of the security concerns disappear.

I don't think that is true at all, what I've heard from the security people and the container people is that containers are not useful for hostile multi-tenant environments, in the way that full VMs are useful. There are too many design holes which need to be plugged with SELinux or seccomp_bpf or whatever, the kernel attack surface is large and there are _always_ 0days floating around which break the kernel, especially when you support fully-featured guest images. Of course this doesn't mean that hosting services aren't being offered, but what you believe is "reasonably secure" may differ from others.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 17:57 UTC (Fri) by NightMonkey (subscriber, #23051) [Link] (2 responses)

Can someone knowledgeable in both compare the BSD "Jails" concept vs. Containers and the Unikernel concept? Are they really very different in the functionality and isolation they provide?

Thanks.

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 29, 2014 23:47 UTC (Fri) by azilian (guest, #47340) [Link] (1 responses)

First I have to say that containers under any OS are not simply chroot/jails. They are a combination of the chroot/jail functionality and the limits that can be imposed on the container (or at least the first processes of the container).

BSD Jails offer:
- chroot
- network isolation
- shared memory isolation (I'm not sure if it is a real isolation or simple permission)
- security level support per-container
- limit the network memory used by each process
- hostname/domainname per-container
- fine grained capabilities per-container can be achieved but not with the default tools
- Tools: jail/ezjail

Linux containers offer:
- chroot
- network isolation
- Shared memory isolation
- hostname/domainname per-container
- PID isolation (many containers can each see PID number 314, however these are different PIDs on the host machine)
- user mappings (map UID 433 on the host as UID 15 or UID 0 in the container)
- resource limit isolation
- CPU, Memory and I/O limits per-group
- Device isolation per-group
- there is a mechanism in the kernel to freeze/unfreeze all processes inside a container with a single command
- Fine grained capabilities per-container
- Tools: Docker, LXC and a lot of others.
- Live migration - CRIU. If certain rules are met, you can even live migrate a linux container to another physical machine.

Disclaimer: I'm not an expert in BSD and may have missed something.
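
To make the "user mappings" item in the Linux list more concrete, here is a small sketch of how such a mapping is read. It assumes the three-column layout of /proc/<pid>/uid_map (first ID inside the namespace, first ID outside, range length); the helper name map_uid is hypothetical. It translates a container UID to the host UID it corresponds to.

```shell
#!/bin/sh
# Translate a container UID to the host UID using uid_map-style lines
# on stdin. Each line is "<first id inside> <first id outside> <length>".
# Prints nothing if the UID is not covered by any mapping line.
map_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 {
            printf "%d\n", $2 + (uid - $1)
            exit
        }'
}

# The mapping from the list above: UID 15 in the container is UID 433 on the host.
printf '15 433 1\n' | map_uid 15
```

On a live system the input would come from the container's init process, for example map_uid 0 < /proc/<pid>/uid_map, to see which host user root in the container really is.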

Containers vs Hypervisors: The Battle Has Just Begun (Linux.com)

Posted Aug 30, 2014 5:17 UTC (Sat) by NightMonkey (subscriber, #23051) [Link]

Thank you!