
IncludeOS: a unikernel for C++ applications

July 25, 2017

This article was contributed by Nur Hussein

Is it truly an efficient use of cloud computing resources to run traditional operating systems inside virtual machines? In many cases, it isn't. An interesting alternative is to bundle a program into a unikernel, which is a single-tasking library operating system made specifically for running a single application in the cloud. A unikernel packs everything needed to run an application into a tiny bundle and, in theory, this approach would save disk space, memory, and processor time compared to running a full traditional operating system. IncludeOS is such a unikernel; it was created to support C++ applications. Like other unikernels, it is designed for resource-efficiency on shared infrastructure, and is primarily meant to run on a hypervisor.

Frequently, virtual machines end up running a full server operating system, though the entire instance is devoted to running only a few applications or even just one. However, every running instance on a physical machine means a full set of services and binaries that's unnecessarily replicated. Unikernel developers take the opportunity to aggressively pare down the operating system to a bare minimum. Unikernels are at the extreme end of the possible answers to the question "how small can you make an operating system?" A unikernel is an instance of a single program "baked together" with a small library that provides the operating system and acts as an interface to the (virtual) hardware.

A history of unikernels

The idea of shrinking the operating system has its roots in microkernel research, which was spurred by monolithic kernels that were growing in size and complexity to unwieldy levels. A microkernel implements only a tiny amount of necessary functionality in privileged mode (such as interrupt handling, low-level memory management, and scheduling), with the rest being implemented as servers in user space. Exokernels, which were proposed by systems researchers at MIT in the 1990s, take the concept further by implementing most of the operating system as custom libraries linked to applications. The library operating system approach proved popular, and a number of projects were created around it, such as Nemesis from the University of Cambridge, and Drawbridge from Microsoft Research.

The term unikernel was proposed by a group of operating systems researchers in a paper [PDF] from 2013 that described their MirageOS project. While early projects included various drivers to support a multitude of hardware much like a traditional operating system, unikernels were designed to primarily run on virtual hardware, so they do not need as much driver support. Unikernels are also compiled with just enough of the library to support the application contained within them, and nothing more. The idea is that unikernels could be deployed side by side on a hypervisor, much like regular programs are run on a traditional operating system.

Unikernels address the use case of needing strong isolation for a user's application on shared infrastructure. Multi-tenancy on clouds requires that every user's application be completely separated from those of others, but requiring each user to run a full operating system is wasteful. Unlike Linux containers, which share a single instance of the kernel that partitions users' applications using namespaces, control groups, and security policies, unikernels benefit from the stronger resource isolation of hypervisors. They get that isolation while being nearly as lightweight as a container. The drawback to unikernels is that users are constrained by what the unikernel library provides in terms of operating system interfaces.

The choice of programming language to write a unikernel application in is also dependent on the underlying library support for it. IncludeOS supports C++, while MirageOS uses OCaml as its target programming language; other unikernel projects have been created that support languages like Haskell (HaLVM) and Erlang (LING). There is a collection of links to active unikernel projects found here.

IncludeOS

IncludeOS is a project to create a C++ API for the development of unikernel-based applications. When an application is built using IncludeOS, the development toolchain will link in the parts of the IncludeOS library required to run it and create a disk image with a bootloader attached. An IncludeOS image can be hundreds of times smaller than the Ubuntu system image for running an equivalent program. Start times for the images run in the hundreds of milliseconds, making it possible to spin up many such virtual machine images quickly.

When an IncludeOS image boots, it initializes the operating system by setting up memory, running global constructors, and registering drivers and interrupt handlers. In an IncludeOS unikernel, virtual memory is not enabled, and a single address space is used by both the application and the unikernel library. Therefore there is no concept of system calls or user space; all operating system services are called with a simple function call to the library and all run in privileged mode.

The unikernel is also single-threaded, and there is no preemption. Interrupts are deferred when they happen, and attended to at every iteration of the event loop. The design suggests user programs also be written to follow the asynchronous programming model, with callbacks installed to respond to operating system events. For example, a TCP socket can be set up in a user program and a callback inside the application handles the connection when a third party attempts to connect.
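The deferred-interrupt model described above can be illustrated with a small, self-contained C++ sketch. It is not the IncludeOS API: the class `EventLoop`, its methods `subscribe()`, `raise()`, and `run_once()`, the IRQ numbers, and the helper `deferred_interrupt_demo()` are all hypothetical names chosen for this example. The only point it tries to show is that the interrupt path merely records that something happened, while the callbacks run later, from a single thread, when the loop comes around.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical single-threaded event loop (illustrative only, not IncludeOS code).
class EventLoop {
public:
    static constexpr std::size_t kMaxIrq = 8;
    using Handler = std::function<void()>;

    // Install the callback to run when interrupt `irq` is serviced.
    void subscribe(std::size_t irq, Handler handler) {
        handlers_.at(irq) = std::move(handler);
    }

    // Stand-in for an interrupt stub: only record that the IRQ fired.
    void raise(std::size_t irq) { pending_.at(irq) = true; }

    // One iteration of the loop: service every interrupt recorded so far.
    // Returns the number of handlers that ran.
    std::size_t run_once() {
        std::size_t serviced = 0;
        for (std::size_t irq = 0; irq < kMaxIrq; ++irq) {
            if (!pending_[irq]) continue;
            pending_[irq] = false;
            if (handlers_[irq]) {
                handlers_[irq]();
                ++serviced;
            }
        }
        return serviced;
    }

private:
    std::array<bool, kMaxIrq> pending_{};
    std::array<Handler, kMaxIrq> handlers_{};
};

// Usage sketch: two interrupts fire while the application is busy; their
// callbacks do not run until the loop iterates.
std::vector<std::string> deferred_interrupt_demo() {
    EventLoop loop;
    std::vector<std::string> log;
    loop.subscribe(1, [&log] { log.push_back("timer"); });
    loop.subscribe(3, [&log] { log.push_back("network"); });

    loop.raise(3);                      // recorded, not handled yet
    loop.raise(1);                      // recorded, not handled yet
    log.push_back("application work");  // runs without being preempted
    loop.run_once();                    // handlers run now, in IRQ order
    return log;
}
```

In this sketch the order in which interrupts arrive does not matter; handlers run in IRQ-number order on the next iteration, and application code between iterations is never interrupted by a handler.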

An advantage of IncludeOS's minimalist design is the reduction of the attack surface for the application. With a self-contained application appliance, there are no shells or other tools that would be helpful to an attacker if they manage to compromise the application. Additionally, the stack and heap locations are randomized to discourage attackers.

IncludeOS does not implement all of POSIX. The developers intend to implement only those parts of POSIX that are needed, as needs arise; full POSIX compliance is unlikely to ever be pursued as a goal. Currently, there are no blocking calls implemented in IncludeOS, as the event-loop model is the favored way to use it. IncludeOS also lacks a writable filesystem at this point.

There are plans in the pipeline to implement threads as fibers, which are a cooperative form of threading. Since there is no preemption in IncludeOS, fibers yield voluntarily to give other fibers a chance to run. Apart from some standard C++ library calls, a special IncludeOS API is used to help construct applications as unikernels.
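To make the idea of cooperative scheduling concrete, here is a minimal C++ sketch. It does not reflect how IncludeOS plans to implement fibers; it swaps in a much simpler technique in which each "fiber" is a callable that performs one slice of work and then returns, with the return standing in for a voluntary yield. The names `Fiber`, `Scheduler`, `make_counter()`, and `fiber_demo()` are assumptions made for this example.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A "fiber" here is a callable that runs one slice of work and returns,
// which models a voluntary yield. It returns true while work remains.
// (Hypothetical model; real fibers would save and restore a stack.)
using Fiber = std::function<bool()>;

class Scheduler {
public:
    void spawn(Fiber fiber) { ready_.push_back(std::move(fiber)); }

    // Round-robin until every fiber reports completion. Nothing ever
    // interrupts a slice: a fiber that never returns starves the others.
    void run() {
        while (!ready_.empty()) {
            Fiber fiber = std::move(ready_.front());
            ready_.pop_front();
            if (fiber()) ready_.push_back(std::move(fiber));
        }
    }

private:
    std::deque<Fiber> ready_;
};

// Build a fiber that appends "<name><step>" to the trace once per slice,
// for `steps` slices in total.
Fiber make_counter(std::string name, int steps, std::vector<std::string>& trace) {
    return [name = std::move(name), steps, step = 0, &trace]() mutable {
        trace.push_back(name + std::to_string(++step));
        return step < steps;
    };
}

// Usage sketch: two fibers of different lengths interleave by yielding.
std::vector<std::string> fiber_demo() {
    std::vector<std::string> trace;
    Scheduler scheduler;
    scheduler.spawn(make_counter("a", 2, trace));
    scheduler.spawn(make_counter("b", 3, trace));
    scheduler.run();
    return trace;
}
```

The trade-off this illustrates is the one the text describes: without preemption, fairness depends entirely on each fiber giving up control on its own.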

Business model

IncludeOS started off as a university research project at Oslo and Akershus University College of Applied Sciences; it was developed by Alfred Bratterud and his associates. The project spun off into a startup, founded by Bratterud together with Per Buer. IncludeOS is distributed under the Apache 2.0 license, with the code available on GitHub. Outside of the company, there is a small community of voluntary contributors that numbers around a dozen people. Although most contributions from volunteers are small bug fixes, there have been some considerable contributions by IBM, which added support for running IncludeOS on ukvm.

As a company, IncludeOS is still in the early stages. According to Buer, most of the funding it has received is in the form of grants from the Norwegian government. The code for the IncludeOS unikernel is open source, but there is a plan to create proprietary enterprise management tools for running unikernels in large deployments in data centers and in the cloud. The company has acquired a customer that it is adding features for, such as network load balancing, a firewall, and additional hardening of the codebase. Other missing features will be added as needed, which will primarily be driven by the business needs of customers.

Trying it out

Currently, there are no IncludeOS packages for Linux, but there are instructions on how to create a unikernel from the source code. IncludeOS works on KVM/QEMU and VirtualBox; in theory it could also boot on bare-metal hardware, but this has not been verified.

Since the code is not yet meant for production, the results of following the instructions may vary. I tried multiple installations in different versions of Ubuntu and got as far as compiling a unikernel image and running the sample application, which is an HTTP server. However, the network bridging between the unikernel and its host was not set up right, and thus I could not connect to it from a web browser. Despite the helpful support from members of the developer community in the IncludeOS development chat room, something in my setup caused problems that could not be reproduced. The compilation and installation scripts are rough around the edges, so any user trying them out may also face problems. Ubuntu users will need at least version 16.04 to build the latest version of IncludeOS.

Conclusion

Despite the popularity of cloud computing and virtualization, we are still trying to figure out the best ways to take advantage of the technology. Containers grew out of the desire for lightweight partitioning of guest applications, but unikernels appear to provide an even better option with stronger isolation. The downside of a completely new operating system and programming paradigm is that most legacy software will not work on it without significant modification. However, lightweight, virtualized, and isolated software appliances are a logical way to run applications in the cloud; as IncludeOS and other unikernels become more sophisticated, they may become the primary method of deploying such services. With several different competing unikernel projects taking off, it will be interesting to see how IncludeOS (and the unikernel paradigm itself) fares against more traditional operating systems. Unikernels are highly specialized, and it remains to be seen if the lightweight virtualization aspect of deployment is enough of an incentive for developers to invest time and resources into building applications in this manner.

[I would like to thank Per Buer and the rest of the IncludeOS development community for their feedback when writing this article.]





IncludeOS: a unikernel for C++ applications

Posted Jul 26, 2017 9:45 UTC (Wed) by Sesse (subscriber, #53779) [Link] (9 responses)

So, a question I never really see answered in this kind of context is: Is it really worth it? Sure, the OS is 1/100 the size of a full Linux installation, but so? Is really 64 MB extra of RAM per instance (and some disk space) an issue if your software uses gigabytes of it anyway, and is it worth giving up significant amounts of debuggability and compatibility for? Booting up in 100 ms instead of two seconds is surely nice, but is really VM startup time an important metric in a cloud deployment? How do the TCP stacks of these things stack up against modern OSes anyway, and what happens if you actually need to use multiple cores in a non-trivial way?

/* Steinar */

IncludeOS: a unikernel for C++ applications

Posted Jul 26, 2017 10:05 UTC (Wed) by mnowak@suse.com (guest, #105589) [Link] (2 responses)

Agreed. Maybe if Linux (SELinux-less) containers actually contained, we would not have to bother with this concept at all. Every such article should be required to answer the concerns from http://dtrace.org/blogs/bmc/2016/01/22/unikernels-are-unf..., I believe.

IncludeOS: a unikernel for C++ applications

Posted Jul 26, 2017 15:36 UTC (Wed) by drag (guest, #31333) [Link] (1 responses)

I think that sometimes people have a wrong perspective on virtual machines vs containers vs applications.

The virtual machine itself really is just a special case application. The virtual machine is not 'emulated PC' anymore. There is no interpretation of cpu instructions going on. The application code running in a virtual machine is executing on the CPU just like any normal application. Just has a couple extra layers of memory address abstraction and whatnot.

Modern VMs depend heavily on paravirtualization to be efficient. They are aware of the cpu types and make all sorts of efforts to reduce software layers between applications and physical network and physical storage to as minimal as possible. As time goes on the VM becomes more and more aware that it is a VM.

So what we end up with is really much closer to Java virtual machines than anything else. Major difference is that instead of executing Java bytecode they use x86_64 machine code.

After all this really isn't that much different than the same sort of transition that happened when you went from 'single user' application environments like a DOS.. which was little more than a fancy program loader.. to multi-user multi-process environments. Throw in some memory address abstractions and there you go.

And if an attacker is able to seize control of a virtual machine, which shouldn't be very difficult, then it's not much more difficult to attack the hosting system than it is if you took over any other userland process like Firefox. If you want substantial improvements in security you are still going to need to depend on available DAC and MAC controls in the host OS to aid in the natural sort of 'sandboxing' that VMs offer.

All these things... running virtual machines, containers, or executing applications... really should be thought of as pretty much the same things. There isn't a really good reason, other than historical happenstance and tradition, why they are treated so differently.

IncludeOS: a unikernel for C++ applications

Posted Jul 28, 2017 20:31 UTC (Fri) by epa (subscriber, #39769) [Link]

Moreover, remember that a Linux process's execution environment can also be considered a virtual machine. It certainly isn't running on the physical hardware with physical addresses...

IncludeOS: a unikernel for C++ applications

Posted Jul 26, 2017 10:47 UTC (Wed) by nix (subscriber, #2304) [Link] (1 responses)

You also lose multiple major benefits of normal OSes. The unikernel is almost certainly not as scalable, tested, or featureful as the kernel it replaces (definitely true here!), and you are essentially taking yourself back to the 1960s, when all OSes were distinct monsters tied to specific preferred languages with unique APIs, and if you wanted to introduce something in a different language you were suddenly writing lots of bridging layers: as for switching away from IncludeOS if you found you needed something a real kernel can provide, forget it.

IncludeOS: a unikernel for C++ applications

Posted Jul 26, 2017 12:59 UTC (Wed) by mato (guest, #964) [Link]

> You also lose multiple major benefits of normal OSes. The unikernel is almost certainly not as scalable, tested, or featureful as the kernel it replaces (definitely true here!)

That's only partially true. Modern green-field unikernels can provide better implementations of existing features (e.g. MirageOS' type- and memory-safe TCP/IP and TLS stacks). And the implementations will mature with time and use, as does all software.

> and you are essentially taking yourself back to the 1960s, when all OSes were distinct monsters tied to specific preferred languages with unique APIs, and if you wanted to introduce something in a different language you were suddenly writing lots of bridging layers

I'd argue that you'd face the same problem (bridging language "worlds" and competing runtimes) in any conventional application where you use multiple languages in the same *process*.

IncludeOS: a unikernel for C++ applications

Posted Jul 26, 2017 14:14 UTC (Wed) by josh (subscriber, #17465) [Link] (1 responses)

> Booting up in 100 ms instead of two seconds is surely nice, but is really VM startup time an important metric in a cloud deployment?

VM startup time is important, but these days, you can boot a Linux VM in around 150ms. See the Clear Containers work.

I do see value in these kinds of unikernels for other purposes, such as experiments in reduced overhead between the application and network. (On the other hand, you can also do that with Linux, and bypass its network stack. One of these days I hope we improve the Linux network stack to the point that nobody feels the need to.)

But for boot time, no, I don't think this helps.

IncludeOS: a unikernel for C++ applications

Posted Aug 4, 2017 7:01 UTC (Fri) by perbu (guest, #14372) [Link]

It is worth noting that the port of IncludeOS to ukvm/solo5 that IBM has done boots, executes and exits in 4-5 ms. So it is definitely possible for a unikernel to go way faster than Linux.

IncludeOS: a unikernel for C++ applications

Posted Jul 26, 2017 14:45 UTC (Wed) by jhoblitt (subscriber, #77733) [Link]

There are some novel uses for ultra-fast boot times. Imagine a service that follows a pre-fork style pattern but instead of fork()ing, spawns a new unikernel to handle each incoming request. That probably earns the service some extra tin-foil points.

My continuing ignorant and naive concern about the basic concept is debugging. In particular, some sort of valgrind analog...

IncludeOS: a unikernel for C++ applications

Posted Jul 26, 2017 15:06 UTC (Wed) by Tara_Li (guest, #26706) [Link]

The biggest advantage I see is the reduced attack surface. You can turn off services, you can block them from connecting to the outside world - but if they're never compiled into existence in the first place, they *cannot* be turned back on, or unblocked. And if it were running on bare metal instead of in a VM, your responsiveness would likely be a lot more predictable.

IncludeOS: a unikernel for C++ applications

Posted Jul 27, 2017 12:07 UTC (Thu) by clugstj (subscriber, #4020) [Link] (2 responses)

An OS that only supports a single CPU core?

IncludeOS: a unikernel for C++ applications

Posted Jul 28, 2017 16:22 UTC (Fri) by flussence (guest, #85566) [Link]

It sounds like there's some logic to the design choice: not bothering to support multiple cores neatly sidesteps having to think about or write code for whole classes of problems - no need to handle scheduling, synchronisation or IPIs in every program, just pure computation and bit-banging virtio devices.

The downside is that ignoring the problems won't make most of them go away; to avoid side-channel attacks you'll need a whole air gap between instances, not a hypervisor.

IncludeOS: a unikernel for C++ applications

Posted Aug 3, 2017 14:43 UTC (Thu) by perbu (guest, #14372) [Link]

The dev branch of IncludeOS does support SMP. For an upcoming delivery we need to do TLS at scale and we're using SMP to handle a large amount of incoming TLS connections.

So multicore is supported. Sort of, at least.

Do Unikernels even have real value?

Posted Jul 28, 2017 7:22 UTC (Fri) by alonz (subscriber, #815) [Link] (3 responses)

I'm in the camp that considers the entire idea of unikernels to be a net regression in almost all meaningful metrics.

When you ignore the hype, a hypervisor is no different than a run-of-the-mill kernel. It has precisely the same tools at its disposal for dividing resources among tasks (sorry, “domains” is the new buzzword for those) and protecting them from interfering with each other. The only true difference is that hypervisors expose a lower-level API (virtual hardware) vs. kernels' richer interfaces, which actually gives the hypervisor less power to regulate behavior.

When this is combined with unikernels, we have userspace code running in the CPU's protected mode—which means vulnerabilities in this code give attackers even more powerful tools to play with. And this code talks to hardware via minimally-protected interfaces. Nothing to worry about, right?

In short: I hope this remains as an academic exercise. And I doubt any serious cloud providers will allow such magical-thinking solutions near their hardware.

Do Unikernels even have real value?

Posted Jul 30, 2017 10:05 UTC (Sun) by paulj (subscriber, #341) [Link] (1 responses)

The one difference is hardware protection. You now have the hypervisor in ring -1, the guest kernel(s) in ring 0 and userspace(s) in respective ring 3s. Each with different hardware privilege levels.

Do Unikernels even have real value?

Posted Jul 31, 2017 14:29 UTC (Mon) by robbe (guest, #16131) [Link]

For hypervisors in general, you are right. But unikernels eschew user space, so you are only left with rings -1 and 0…

Do Unikernels even have real value?

Posted Aug 3, 2017 14:59 UTC (Thu) by perbu (guest, #14372) [Link]

The amount of legacy shit we currently have to deal with in order to boot virtual machines is incredible. If virtual machines didn't have to pretend it's 1982 development would be a lot quicker.

The reason unikernels need to run in ring 0 is only because of legacy. At some point I expect IncludeOS to start in ring 0, set up page tables and hardware and then chain-load a second IncludeOS unikernel that is running in ring 3. As they are single-process, there is no need to restrict access to things like virtual network adaptors.

But even if we're running in ring 0 trying to compromise a virtual machine with an unknown memory layout and no system calls is ... challenging. And contrary to what you're indicating there is nothing magical about running in ring 0. Code doesn't automatically get insecure by escalating its privileges. If your application is running on ring 3 on a Linux server that application has a lot more control over the VM than a unikernel has. If I compromise your Linux application I can execute the shell, write files, execute processes, call home and do all sorts of crazy stuff. If I compromise a unikernel I can ... well, there isn't really much you can do as everything that isn't explicitly used by the application gets left out by the linker.

Even if there is functionality to say, connect home, how would you call that function? Trying to guess 64 bit addresses?

IncludeOS: a unikernel for C++ applications

Posted Aug 1, 2017 19:26 UTC (Tue) by viesti (subscriber, #47763) [Link] (1 responses)

I might be totally off, but to me services like AWS Lambda provide some of the unikernel idea (narrowly scoped services), but with a very conventional infrastructure. I'd think that these services would be tough competition for unikernels.

IncludeOS: a unikernel for C++ applications

Posted Aug 2, 2017 2:14 UTC (Wed) by smckay (guest, #103253) [Link]

That's my take as well. If what you want is to ignore OS administration as much as possible, there are multiple cloud vendors working very hard on that. You even get logging!


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds