By Jake Edge
February 22, 2012
The Capsicum capabilities framework has been around for a couple of years
now, and support for it was added
to the recent FreeBSD 9.0 release. Capsicum takes a very different
approach from other capabilities systems (like Linux capabilities or POSIX
capabilities), and is geared toward sandboxing applications to limit the
damage that can be caused by buggy or misbehaving programs. While the
FreeBSD support is "experimental", it is available for researchers and
others to try out.
Capsicum came out of a collaboration between the University of Cambridge's
Computer Laboratory and Google. That resulted in a prototype
implementation for FreeBSD along with modification of several different
programs to take advantage of Capsicum. One of the main applications of
interest is the Chromium web browser, but several FreeBSD utilities
(tcpdump, dhclient, and gzip) were also converted, as described in the Capsicum
paper [PDF].
The idea behind Capsicum is to extend the standard Unix APIs by adding ways
that applications can "self-compartmentalize". Essentially, applications
can choose to restrict themselves to a sandbox that will disallow many
"dangerous" operations, while still allowing them to get their job done via
the capabilities they allow for themselves or those that are passed in
using special file descriptors (which are also, perhaps unfortunately,
called capabilities). It is, in some ways, conceptually similar to
programs that drop their privileges using the setuid() call but,
instead of being restricted to what a particular user is allowed to do
(which is often far more than the application needs), Capsicum allows much
finer-grained control over what restrictions are in place.
The starting point for a Capsicum-enabled process is the new
cap_enter() system call. This is a one-way gate that puts a
process and any subsequent children into "capability mode". It turns off
"ambient authority", which is a term for the normal Unix process model where a process has all of the
permissions of the UID it is running as. Capability mode restricts access to any of the global
namespaces, like the filesystem namespace, PID namespace, network
namespace, and others. Any system calls that operate on these global
namespaces are either disallowed entirely, or their arguments are constrained.
For example, the sysctl() call is constrained so that only around
30 of the roughly 3000 system parameters can be examined
via that call. The shared memory creation call, shm_open(), is
only allowed to create anonymous memory objects, while the
openat() family is restricted to allow access to files at or below
the directory file descriptor passed in (by essentially disallowing "/" or
".." at the start of the path). There are some other miscellaneous
restrictions that come with capability mode including disallowing the
loading of kernel modules or the execution of setuid and setgid binaries.
Capsicum wraps normal file descriptors with additional capability
information that restricts what can be done with the file. If
a capability file descriptor has only the CAP_READ capability, reading is all
that can be done with it,
unlike a file descriptor for a file that is opened read-only, which can still be used to
make metadata changes (via fchmod() for example). In order to
change positions in the file, the CAP_SEEK capability is
required. A capability file descriptor can also wrap a directory file descriptor, which allows the
capability set to be applied to all members of that directory. That would
allow Apache to set up workers that only have access to a certain subset of
the web directory hierarchy, or for a sandboxed application to access a
library path, for example.
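The rights check can be pictured with a small userspace model in C. This is only a sketch: in Capsicum the kernel enforces the rights, and the names used here (struct model_cap, MODEL_CAP_READ, MODEL_CAP_SEEK, model_cap_read, model_cap_lseek) are hypothetical stand-ins rather than the real API. EPERM is used as a portable placeholder for whatever error the kernel would report.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical rights bits for the model; not the kernel's values. */
enum {
    MODEL_CAP_READ  = 1u << 0,
    MODEL_CAP_WRITE = 1u << 1,
    MODEL_CAP_SEEK  = 1u << 2
};

/* A descriptor paired with the set of operations allowed on it. */
struct model_cap {
    int      fd;
    uint32_t rights;
};

/* Read only if the wrapper carries the read right. */
ssize_t model_cap_read(const struct model_cap *c, void *buf, size_t len)
{
    if (!(c->rights & MODEL_CAP_READ)) {
        errno = EPERM;
        return -1;
    }
    return read(c->fd, buf, len);
}

/* Changing the file position needs its own right, separate from reading. */
off_t model_cap_lseek(const struct model_cap *c, off_t off, int whence)
{
    if (!(c->rights & MODEL_CAP_SEEK)) {
        errno = EPERM;
        return -1;
    }
    return lseek(c->fd, off, whence);
}
```

With a wrapper holding only MODEL_CAP_READ, model_cap_read() succeeds while model_cap_lseek() is refused, which mirrors the CAP_READ and CAP_SEEK split described above.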
The capability file descriptors can be already open at the time that
cap_enter() is called (and wrapped by a set of capabilities
specified in an earlier cap_new() call) or passed to the process
using Unix sockets. That means that a fairly simple program can decrease
its ability to cause harm by setting up the file descriptors it needs and
then calling cap_enter() before performing more "dangerous"
operations. The tcpdump example given in the paper is
instructive, as it simply enters capability mode after setting up the packet
filter (which is a privileged operation), but before entering the
processing loop. That way, errors in the packet decoding code are very
limited in the kind of damage they can cause.
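Passing a descriptor over a Unix socket uses the standard SCM_RIGHTS control message, which is not specific to Capsicum. The sketch below shows one common way to do it in C. The function names send_fd and recv_fd are hypothetical, and the code assumes a connected AF_UNIX socket and sends a single descriptor alongside one dummy byte.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a connected Unix socket; 0 on success, -1 on error. */
int send_fd(int sock, int fd)
{
    char byte = 0;   /* at least one byte of ordinary data must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;                    /* forces correct alignment */
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&ctrl, 0, sizeof ctrl);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof fd);

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new descriptor or -1 on error. */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        struct cmsghdr hdr;
        char buf[CMSG_SPACE(sizeof(int))];
    } ctrl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof ctrl.buf;

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET ||
        c->cmsg_type != SCM_RIGHTS || c->cmsg_len != CMSG_LEN(sizeof(int)))
        return -1;

    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}
```

A privileged helper could open a file on behalf of a sandboxed process and hand it over with send_fd(); the sandboxed side then gets a working descriptor from recv_fd() without touching the filesystem namespace itself.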
The simple two-line change to tcpdump did expose a few problems,
however. For example, the C library's DNS resolver code requires access to the
filesystem (/etc/resolv.conf) and to the network namespace (to
talk to the DNS server), which led to reduced functionality. Switching
tcpdump to use a lightweight local resolver restored that feature.
In addition to the "raw" Capsicum interface using cap_enter(), the
framework provides a libcapsicum that can be used to more
thoroughly isolate the sandboxed processes without each application having
to do its own start-up management of a sandboxed process. It handles
closing all undelegated file descriptors (those that are not meant for the
sandbox), forking the new sandboxed
process, flushing the address space using fexecve(), and setting
up a Unix socket that can be used for communication between the privileged
and unprivileged processes. None of the examples in the paper use
libcapsicum as it generally requires major changes to the
application in order to be used, so it may be more suitable for new
development.
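The start-up steps listed above can be approximated by hand with ordinary POSIX calls. The C sketch below is not the libcapsicum API; spawn_sandbox is a hypothetical helper, and the scan that closes undelegated descriptors is capped at an arbitrary 4096 to keep the example short. A real implementation would also enter capability mode in the child and tell the new program which descriptor is its socket.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

extern char **environ;

/*
 * Hypothetical helper: fork a child that keeps only stdio, one end of a
 * Unix socket, and the descriptor of the binary to run, then replaces its
 * address space with fexecve(). Returns the child's PID and stores the
 * parent's end of the socket in *parent_sock, or returns -1 on error.
 */
pid_t spawn_sandbox(int exec_fd, char *const argv[], int *parent_sock)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: close everything that was not delegated to the sandbox. */
        long max_fd = sysconf(_SC_OPEN_MAX);
        if (max_fd < 0 || max_fd > 4096)
            max_fd = 4096;   /* sketch only: bounded scan */
        for (int fd = 3; fd < max_fd; fd++)
            if (fd != sv[1] && fd != exec_fd)
                close(fd);

        /* On FreeBSD, the call to cap_enter() would go here. */

        fexecve(exec_fd, argv, environ);   /* flushes the address space */
        _exit(127);                        /* reached only if exec failed */
    }

    /* Parent: keep one end of the socket for talking to the sandbox. */
    close(sv[1]);
    *parent_sock = sv[0];
    return pid;
}
```

The amount of plumbing even in this reduced form suggests why retrofitting an existing program is more work than adding a cap_enter() call.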
The examples do show that substantial
improvements in the security of programs can be had with minimal code
changes, though. Roughly 100 new lines of code were all that was required
to use Capsicum in Chromium on FreeBSD, largely because the browser was
written with privilege separation in mind. Chromium already uses various
techniques, depending on the OS, to separate the rendering process from other renderers and the
rest of the browser. That made it fairly straightforward to adapt Chromium
and the paper says that switching to a libcapsicum-based
implementation should not be significantly harder.
Capsicum is an interesting idea that bears watching as it rolls out in
FreeBSD. The 9.0 release only contains the kernel changes required for
Capsicum but doesn't ship any applications that use the facility. 9.1 is
slated to have some of those, presumably starting with Chromium. Beyond
this brief introduction, those interested should take a look at the paper, this
article [PDF] from ;login: magazine, as well as the documentation page.