Inheriting capabilities
Linux capabilities have long been seen as a way to avoid setuid programs, though the reality hasn't really lived up to the hope. While partitioning root privileges into distinct, fine-grained permissions has a lot of conceptual merit, the implementation suffers from a number of shortcomings that make it difficult to, say, use capabilities to allow ping and other utilities to run without being setuid root. Actually using capabilities has always been complex, and there are some fundamental limitations that are sometimes worked around in out-of-tree patches. A recent discussion on the linux-kernel mailing list looks at one of the limitations of the capabilities model, with an eye toward eliminating it.
Under the current model, processes cannot easily pass their capabilities to another program through an execve() call in the same way that other privileges are inherited. Christoph Lameter posted a patch that would allow administrators to specify which capabilities should be inherited by programs started by execve(). The idea is to allow privileged, non-setuid-root programs to spawn other programs with a limited set of privileges—exactly the kind of facility that capabilities are supposed to provide.
We have looked at Linux capabilities multiple times over the years, often in the context of a new attempt to fix some of the deficiencies in the existing Linux capabilities model (which is derived from a defunct POSIX effort). For example, file-based capabilities were added to use the extended attributes (xattrs) on program files to store capability information so that those capabilities were granted to programs when they were executed. Adding that allowed the CAP_SETPCAP capability to return to its original meaning and provided a way for administrators to grant capabilities to individual programs (rather than running processes). There are other problems we have noted; in some sense, this article is a continuation of that trend.
Lameter described his use case in the patch. He runs a network stack in user space that requires privileges for raw network access, but also may run arbitrary binary programs. He would like to restrict what those programs can do, but needs them to have a certain subset of root privileges. That could be done using setuid programs, but he would like to restrict those programs so that they don't get all of root's privileges.
A 2006 LWN article that Lameter referenced in his posting describes the underpinnings of capabilities. In particular, it looks at the different capability sets and how they are combined to determine which capabilities a process has.
This first patch from Lameter provides a way for an administrator to globally specify which capabilities should be inherited across execve(). Capability numbers could be written to a sysfs file, which would cause those capabilities to be inherited by any new program if the caller of execve() possessed them. He has been using a form of the patch in production for the last six years, but would like it (or something that solves the same problem) to go upstream, so that he doesn't need to continue carrying the patch in his kernels.
The usual way to make a binary run with some capability is to put that capability in the "permitted" and "effective" sets of the binary using the file capabilities xattrs. Doing that makes those binaries always have those capabilities, however, which may not be desired—especially for system binaries. Without putting file capabilities on many different binaries throughout the system, there is no way to pass CAP_NET_RAW (or others, such as CAP_NET_ADMIN and CAP_SYS_NICE) down to child processes or new programs started with execve(). Also, given that there is scripting (and LD_PRELOAD) involved in Lameter's use case, it would be necessary to give script interpreters (e.g. Bash, Python) whichever capabilities the scripts need—something that is not likely to lead to a more secure system.
This looks like a fundamental limitation of the Linux capabilities model, but it is hardly the first. Capability bits have been chosen arbitrarily by kernel feature developers over the years, without much in the way of coordination. That has led to grab-bag capabilities like CAP_SYS_ADMIN that are effectively equivalent to root (though there are others where that is also true). Whatever can be said about the Linux capabilities feature, coherence in its design and implementation is not particularly evident.
But people who want to use the feature keep running up against barriers to doing so. As Serge Hallyn, one of the developers of capabilities, lamented, it is still not possible to make ping use capabilities (rather than setuid) by default. That's because some filesystems don't support xattrs; the same is true of some tools, such as older versions of tar and current versions of cpio (though work is being done to change that).
But the global nature of the inheritance setting in Lameter's patch set was not popular. Both Hallyn and Andy Lutomirski suggested ways to make it a per-process or per-user-namespace setting. Hallyn suggested adding an "ambient inheritable" set that would be combined with the inheritable set in a way that is somewhat analogous to the capability bounding set. Lutomirski thought that a single bit could be used to say that all files should be considered to have a full set of capabilities in their inheritable set, which would basically have the same effect.
But Casey Schaufler objected to the idea that Lameter's use case was reasonable from a security perspective: "You're getting into pretty sketchy territory using that kind of a programming model in a security enforcing environment." He did, however, also respond to Lameter's claim that there is a "capabilities mess" that needs to be cleaned up:
There was some discussion of ways that capabilities could be improved but, as Lutomirski noted, none of that addressed the problem at hand: "If I hold a capability and I want to pass that capability to an exec'd helper, I shouldn't need the fs's help to do this." There was general agreement, though Hallyn pointed out that using the filesystem to place capabilities on binaries is how POSIX capabilities work.
One interesting side note from the discussion is that Lameter is not the only one to use inheritable capabilities in production: the MeeGo-based N9 phone from Nokia did as well.
After that discussion, Lameter shifted gears, posting an RFC patch to implement Hallyn's suggestion of an ambient capability set. That was met with a couple of objections on some of the specifics. To start with, adding capabilities to the ambient set should not enable capabilities that are not in the permitted set of the process, Lutomirski said. He also suggested that adding capabilities to the ambient set should require more than just having the CAP_SETPCAP capability. Requiring PR_SET_NO_NEW_PRIVS (see this article and Documentation/prctl/no_new_privs.txt for more information), so that execve() could not add any additional privileges, would make him more comfortable. But that "would make the patch pointless" because programs that require more privileges (e.g. setuid root programs) need to be run sometimes, Lameter said.
Given Lameter's use case, requiring PR_SET_NO_NEW_PRIVS would seem to be a non-starter, but he has addressed the other complaints from Lutomirski and Hallyn in his V1 ambient capability set patch. In that version, only capabilities that are permitted can be added to the ambient set. In addition, processes can clear their permitted set and be sure that no new capabilities will be granted to children or programs run via execve() based on the contents of the ambient set. That was one of the main concerns expressed about the RFC patch.
As was noted in the threads, there are a number of barriers in the way of using capabilities on Linux systems. They are complex to reason about, have an API that is difficult to use, and have been inconsistently applied over the years. All of that is unfortunate, but any movement to remove capabilities and start over, as was suggested, is pretty unlikely to gain any traction. For good or ill, capabilities are part of the kernel ABI and are thus likely to be with us "forever". Changes like the ambient set may not reduce the complexity at all, but may help provide a more usable capabilities system going forward.
Posted Feb 12, 2015 5:08 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
Posted Feb 12, 2015 11:22 UTC (Thu)
by ibukanov (subscriber, #3942)
Posted Feb 12, 2015 12:08 UTC (Thu)
by vonbrand (subscriber, #4458)
<cough>systemd</cough>
Posted Feb 12, 2015 12:34 UTC (Thu)
by ibukanov (subscriber, #3942)
Instead, it would be better if services allowed one to specify an external command that returns the bound socket, so one can use whatever mechanism to bind the port (dynamic port numbers, etc.) and integration with systemd becomes trivial.
Posted Feb 12, 2015 14:15 UTC (Thu)
by fishface60 (subscriber, #88700)
By default ProxyCommand requires the process to proxy the messages between stdin and stdout, but if you set ProxyUseFdpass, the proxy command is passed a socket pair instead, and should send a file descriptor over its stdout. So rather than having your proxy command needing to constantly process data, it can pass the connection back to ssh.
I'd guess the reason why more services don't let you do stuff like this is that it's awkward to do in C, and the networking abstractions in most of the programming languages I use don't support it.
Posted Feb 12, 2015 14:00 UTC (Thu)
by fishface60 (subscriber, #88700)
I usually end up with a wrapper program that binds to an ephemeral port and writes out which port was chosen to a named pipe before launching the program in inetd mode.
Posted Feb 12, 2015 18:35 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
Posted Feb 12, 2015 11:26 UTC (Thu)
by justincormack (subscriber, #70439)
The ping example is also odd, as we have specific ping sockets to solve this problem, but as far as I can see they have not reached mainstream distros. No idea why not.
The capsicum capability model seems nicer to use than Linux capabilities; it is in FreeBSD but is being ported to Linux. You do need to be able to pass around file descriptors though still, or have other programs do operations on your behalf.
Posted Feb 13, 2015 21:36 UTC (Fri)
by wahern (subscriber, #37304)
Except in the most common case of fork and exec, where the descriptors are simply inherited. As in the cases being considered here.
The problem with solutions like POSIX capabilities and SELinux is that, all things being equal, the person most capable of judging the correct security policy is the software developer, especially when you're talking about modes of execution instead of simply access to user data. The developer needs tools which work well from a programmatic perspective; from his perspective it's not a configuration problem, but primarily a problem of mechanism and logic. But typical security solutions divorce policy from mechanism in a way that cripples and burdens the ability of the software developer to easily implement the most correct policy.
It would be _really_ nice if Capsicum gained momentum. It's a simple concept that has an astoundingly strong ability to solve privilege problems. And it integrates beautifully into the Unix process and object model. It's basically the most obvious and correct approach, at least in hindsight. But Linux has always suffered from a very bad case of NIH syndrome, and the fact that FreeBSD has incorporated Capsicum is probably a negative in terms of the likelihood of Linux hackers being excited and motivated to seriously consider Capsicum. More likely if they turn in the general direction of Capsicum it will be as a hack on existing Linux mechanisms with none of the simplicity, conciseness, or portability of the Capsicum model.
There are developers porting Capsicum to Linux. Actually, more like ported (past tense) at this point, because the bulk of the work has been done. But there are a million groups who have ported a million different things to Linux. But if it's not in mainline, it's irrelevant.
Posted Feb 19, 2015 5:32 UTC (Thu)
by kevinm (guest, #69913)
The initial application would then add those capabilities to its Inheritable set before execve(), which would mean that they end up in the new process's Permitted set. On the other hand, if those files are run by a process with an empty Inheritable set (i.e. a normal unprivileged process), then the new process would end up with an empty Permitted set.
Am I missing something?
