Hardening the "file" utility for Debian
The file command would seem to be an ideal candidate for sandboxing; it routinely handles untrusted input. But an effort to add seccomp() filtering to file for Debian has run aground. The upstream file project has added support for sandboxing via seccomp() but it does not play well with other parts of the Debian world, package building in particular. This situation provides further evidence that seccomp() filtering is brittle and difficult to use.
The discussion began with a post to the debian-devel mailing list where Christoph Biedl announced that he had enabled the file sandbox feature for the unstable repository. He was asking that other Debian developers keep an eye out for problems. He noted that the feature has some drawbacks:
In addition, he had already encountered problems with file running in environments with non-standard libraries that were loaded using the LD_PRELOAD environment variable. Those libraries can (and do) make system calls that the regular file binary does not make; the system calls were disallowed by the seccomp() filter.
Building a Debian package often uses FakeRoot (or fakeroot) to run commands in a way that makes it appear that they have root privileges for filesystem operations, without actually granting any extra privileges. That is done so that tarballs and the like can be created containing files with owners other than the user ID running the Debian packaging tools, for example. Fakeroot maintains a mapping of the "changes" made to owners, groups, and permissions for files so that it can report those to other tools that access them. It does so by interposing a library ahead of the GNU C library (glibc) to intercept file operations.
In order to do its job, fakeroot spawns a daemon (faked) that is used to maintain the state of the changes that programs make inside of the fakeroot. The libfakeroot library that is loaded with LD_PRELOAD will then communicate with the daemon either via System V (sysv) interprocess communication (IPC) calls or by using TCP/IP. Biedl referred to a bug report in his message, where Helmut Grohne had reported a problem with running file inside a fakeroot. The msgget() system call was the cause in that case; Biedl changed the Debian file whitelist to specifically allow that call before his announcement:
There is a workaround for such situations which is disabling seccomp, command line parameter --no-sandbox.
As it turns out, though, his fix was specific to the sysv IPC mechanism; in order to make it work with TCP/IP, more whitelisting of system calls will be needed, as Grohne pointed out. Furthermore, blocking mechanisms like IPC and networking is just what the filter should be doing; those are the kinds of calls you don't want to make if file is compromised, he said. Instead of playing whack-a-mole with system calls, he suggested checking for the presence of LD_PRELOAD libraries and turning off the sandbox for those cases.
That idea did not sit entirely well with Biedl, who was concerned with "silently disabling this security feature in a production system". He thought that perhaps disabling the filter for build environments might be a way forward. Meanwhile, on debian-devel, several people thanked Biedl for enabling the filter, seeing it as a good step toward helping to secure the system. Russ Allbery said:
But Biedl eventually had to deliver some bad news in the thread. He disabled the system-call filtering in file because of the problems it caused:
However, he did point out that Grohne had suggested some ideas for ways to make the sandboxing of file more workable. In the bug report, Grohne said:
Of course, getting there is essentially rewriting the seccomp feature in file. You cannot easily bolt it onto file in the way it currently is.
That is something that will need to be worked out with the upstream project and Biedl said that he plans to do so. There were several suggestions on how to approach the problem in the mailing list thread as well. Colin Watson commiserated with Biedl, reporting on the problems he encountered when adding seccomp() filtering:
At the moment my compromise solution is to reluctantly open up the minimum possible set of syscalls I could find that stopped people sending me bug reports that were in fact caused by something injected from outside my software, and to limit most of that to only those cases where I've detected the relevant LD_PRELOAD wrappers as being present.
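The compromise described above (open up extra system calls only when a known LD_PRELOAD wrapper is detected) can be sketched roughly as follows. This is a minimal illustration in Python, not code from file or from any Debian package: the function names and the `WRAPPER_EXTRAS` table are hypothetical, and the only entry in the table is the msgget() call that the article mentions for fakeroot's sysv mode.

```python
import os

# Hypothetical table: fragment of a preload library's file name -> extra
# system calls to allow when it is present. Only msgget is taken from the
# article; a real policy would need to be worked out and reviewed separately.
WRAPPER_EXTRAS = {
    "libfakeroot": {"msgget"},
}


def preloaded_libraries(environ):
    """Split LD_PRELOAD into entries; the dynamic loader accepts spaces or colons."""
    raw = environ.get("LD_PRELOAD", "")
    return [name for name in raw.replace(":", " ").split() if name]


def extra_syscalls(environ):
    """Return (extras, unknown).

    extras  -- system calls to add to the allowlist for recognised wrappers
    unknown -- preloaded libraries that this sketch does not recognise
    """
    extras, unknown = set(), []
    for lib in preloaded_libraries(environ):
        base = os.path.basename(lib)
        for fragment, calls in WRAPPER_EXTRAS.items():
            if fragment in base:
                extras |= calls
                break
        else:
            # No known wrapper matched this library.
            unknown.append(lib)
    return extras, unknown
```

A caller could pass `os.environ` and then decide what to do with the `unknown` list: refuse to run, warn, or fall back to running without the filter. Which of those is appropriate is exactly the policy question debated in the thread.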
The fragility of the seccomp() solution extends to glibc and kernel versions, as Vincent Bernat pointed out. Those kinds of problems could be detected through automated testing, Philipp Kern suggested. Biedl said that it is something he is working on.
In file, we have a strong candidate for hardening, as it parses and handles file data that often has unknown origins—textbook untrusted input, in other words. But actually using seccomp() filtering to reduce its attack surface has not been successful for Debian. In truth, hardening programs that are often used in conjunction with LD_PRELOAD is always going to be difficult to impossible. But even just changing the version of glibc (which can potentially change the system calls it makes) or which kernel the tool is running on can invalidate the carefully crafted whitelist.
The OpenBSD pledge() system call provides a different path. Developers can specify which system calls are allowed, but only in broad categories like stdio (file operations, mostly), inet (IPv4 and IPv6 calls), or proc (process calls, such as fork(), but not including execve(), which is governed by the exec category). By not tying the filtering directly to individual system calls, some of the problems that Linux seccomp() users have encountered can be avoided. It also doesn't hurt that the OpenBSD user space is released in lockstep with the kernel.
For its file utility, OpenBSD systematically reduces the privileges that the tool has with multiple pledge() calls. It starts by disallowing all but a handful of categories after processing the command-line arguments. It then forks a process that executes the child() function, which reduces privileges further, eventually to only have stdio and recvfd. The child reads messages from the parent, each of which includes a file descriptor for a file to be tested. In that way, the code that is most at risk for compromise is only able to perform fairly minimal operations.
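The descriptor-passing half of that design can be illustrated on any Unix-like system. The sketch below is in Python (3.9 or later, for `socket.send_fds()` and `socket.recv_fds()`) and is not OpenBSD's code: there is no pledge() call here, the function names are hypothetical, and the "format detection" is a stand-in that checks two well-known magic numbers. It only shows how a parent can open a file and hand the already-open descriptor to a worker that never needs to open anything itself.

```python
import os
import socket


def parent_send_file(sock, path):
    """Privileged side: open the file and pass only its descriptor along."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # One small message per file, with the descriptor attached (SCM_RIGHTS).
        socket.send_fds(sock, [b"F"], [fd])
    finally:
        # The receiver gets its own reference, so the sender can close its copy.
        os.close(fd)


def child_classify(sock):
    """Restricted side: receive a descriptor and look at the first bytes."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 16, 1)
    fd = fds[0]
    try:
        head = os.read(fd, 8)
    finally:
        os.close(fd)
    # Stand-in for real file-format parsing.
    if head.startswith(b"\x7fELF"):
        return "ELF"
    if head.startswith(b"\x89PNG\r\n\x1a\n"):
        return "PNG image"
    return "data"
```

In a real privilege-separated program the two functions would run in different processes connected by a `socket.socketpair()` created before the fork, and the receiving process would drop its privileges before reading the first message.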
For Linux, it may well be that seccomp() filtering just isn't suitable for retrofitting onto existing projects. Completely separating the "worrisome" code (file-format parsing for file, for example) from the unavoidable code (e.g. opening files) may provide a path, but also probably means the existing code will have to be rewritten or at least majorly thrashed. The calls that LD_PRELOAD libraries are targeting for interception will likely be in that unavoidable part. Perhaps that could even lead hardened subprocesses to simply use the older, simpler seccomp() mode, as suggested by Grohne. That seems preferable to playing a never-ending game of whack-a-mole.
Index entries for this article | |
---|---|
Security | Hardening |
Posted Aug 14, 2019 19:19 UTC (Wed)
by clugstj (subscriber, #4020)
[Link] (20 responses)
Posted Aug 14, 2019 19:23 UTC (Wed)
by josh (subscriber, #17465)
[Link] (2 responses)
Posted Aug 14, 2019 21:40 UTC (Wed)
by clugstj (subscriber, #4020)
[Link]
Posted Aug 15, 2019 11:31 UTC (Thu)
by pbonzini (subscriber, #60935)
[Link]
Posted Aug 14, 2019 19:30 UTC (Wed)
by juliank (guest, #45896)
[Link] (8 responses)
Posted Aug 14, 2019 19:53 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link]
Posted Aug 14, 2019 20:09 UTC (Wed)
by epa (subscriber, #39769)
[Link] (6 responses)
Posted Aug 14, 2019 20:26 UTC (Wed)
by juliank (guest, #45896)
[Link] (5 responses)
Posted Aug 14, 2019 23:14 UTC (Wed)
by roc (subscriber, #30627)
[Link] (2 responses)
You can argue it still wouldn't be "safe" for some meaning of the word, but seccomp filters aren't "safe" in those terms either.
Having said that, extra layers of protection are still good and grappling with the issues in this post is still important. In particular, if `file` was written in Rust but users' systems inject C code into it via LD_PRELOAD, then savvy attackers would target that C code. Witness the security vulnerabilities introduced by AV filters over the years.
Posted Aug 18, 2019 3:02 UTC (Sun)
by k8to (guest, #15413)
[Link] (1 responses)
Posted Aug 18, 2019 5:59 UTC (Sun)
by dvdeug (guest, #10998)
[Link]
Posted Aug 15, 2019 13:01 UTC (Thu)
by epa (subscriber, #39769)
[Link]
While there are bugs in the Java or .NET runtimes, or other language runtimes, getting an exploit through one of those is usually much harder than the swarm of exploits a C program will contain unless written with exceptional discipline by a highly skilled programmer.
But actually I wasn't really suggesting one of these heavyweight managed languages that pulls along a runtime environment. Rust doesn't have a runtime, for example. The Cyclone programming language is a safer dialect of C which also doesn't have any special run time requirements.
Posted Aug 17, 2019 11:21 UTC (Sat)
by dvdeug (guest, #10998)
[Link]
Posted Aug 14, 2019 20:05 UTC (Wed)
by Deleted user 129183 (guest, #129183)
[Link] (7 responses)
Well, I guess that’s exactly the reason that OpenBSD has their own, seemingly NIH, implementation of it…
Posted Aug 14, 2019 22:55 UTC (Wed)
by wahern (subscriber, #37304)
[Link] (6 responses)
The original contributors of seccomp were quite familiar with systrace. seccomp only permits filtering scalars because it was through systrace that it was proven how easy it was to bypass string path filtering unless the kernel copied the string. (Later releases of systrace came with a huge warning that string path filtering wasn't secure.) As I recall, seccomp was originally intended for sandboxing Chrome NaCl, which by design only required read and write from the sandboxed process. seccomp was the minimal amount necessary to put into the kernel to make NaCl work. I don't think Google ever intended seccomp to grow into a general purpose syscall filtering or sandboxing mechanism as it was already obvious from the history of systrace that the low-level semantics don't lend themselves to more sophisticated use cases.
So, no, pledge isn't NIH. seccomp is basically a *worse* systrace, and everybody knew systrace was a dead end.
Another alternative is Capsicum. OpenBSD rejected this for the same reasons the file(1) maintainer hasn't yet refactored their codebase: Capsicum, like the original seccomp, is premised on using a multi-process privilege separation model, which requires a lot of work, *especially* for preexisting codebases. Capsicum is a great model, but it doesn't prove to be a viable solution for preexisting code.
Posted Aug 14, 2019 23:46 UTC (Wed)
by roc (subscriber, #30627)
[Link] (3 responses)
It's very important to keep in mind that implementing a sandbox with just seccomp is usually a terrible idea. In Linux, kernel namespaces are the best way to construct a sandbox. Then you apply a seccomp filter as an extra layer of defense, to reduce the kernel API attack surface exposed to sandboxed code. This is what Chrome and Firefox do. OpenBSD of course doesn't have kernel namespaces. Comparing pledge() to seccomp-bpf for constructing sandboxes is really a mistake; you should compare pledge() to kernel namespaces (with or without an additional seccomp-bpf layer).
> I don't think Google ever intended seccomp to grow into a general purpose syscall filtering or sandboxing mechanism
Perhaps Andrea Arcangeli didn't, but Google certainly did, otherwise their decision to use BPF to express arbitrary predicates is unfathomable.
Personally I'm pretty glad they did. We use seccomp-bpf for selective syscall interception in rr in a way that a dedicated sandbox API like pledge() would never have supported. That feature is critical for low overhead in rr recording.
Dead-end or not, seccomp-bpf is working in practice for Firefox, Chrome, rr, and others.
Posted Aug 15, 2019 0:34 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link] (2 responses)
Posted Aug 15, 2019 1:52 UTC (Thu)
by roc (subscriber, #30627)
[Link] (1 responses)
Posted Aug 15, 2019 2:29 UTC (Thu)
by Cyberax (✭ supporter ✭, #52523)
[Link]
Posted Aug 15, 2019 14:41 UTC (Thu)
by Deleted user 129183 (guest, #129183)
[Link] (1 responses)
I was talking about the OpenBSD reimplementation of `file`, not about `pledge`.
Posted Aug 16, 2019 0:52 UTC (Fri)
by flussence (guest, #85566)
[Link]
Posted Aug 14, 2019 19:38 UTC (Wed)
by juliank (guest, #45896)
[Link] (8 responses)
Posted Aug 14, 2019 19:44 UTC (Wed)
by josh (subscriber, #17465)
[Link] (7 responses)
(There are complications here, such as chroots and similar, but this would be a reasonable default configuration.)
Posted Aug 14, 2019 20:28 UTC (Wed)
by juliank (guest, #45896)
[Link]
Posted Aug 14, 2019 20:52 UTC (Wed)
by dezgeg (subscriber, #92243)
[Link] (3 responses)
Posted Aug 14, 2019 21:28 UTC (Wed)
by mezcalero (subscriber, #45103)
[Link] (2 responses)
Posted Aug 14, 2019 22:39 UTC (Wed)
by dezgeg (subscriber, #92243)
[Link] (1 responses)
Having the NSS libraries loaded into the caller's address space just needs to die. Just this week I had to debug an issue with a (proprietary, but not relevant to discussion) software distributed as binaries with all the libraries bundled. And this broke since some NSS module from the system (with new glibc) needed to be loaded but was using some symbols that didn't exist in the bundled libc. Of course, installing nscd was the solution.
Posted Aug 14, 2019 22:43 UTC (Wed)
by juliank (guest, #45896)
[Link]
Posted Aug 27, 2019 7:57 UTC (Tue)
by cortana (subscriber, #24596)
[Link] (1 responses)
Posted Aug 27, 2019 13:25 UTC (Tue)
by Jonno (subscriber, #49613)
[Link]
Actually it can. By using the "proxy" id_provider sssd will use a specified nss library as a backend.
Unfortunately the sssd nss service does not support all NSS databases, so using sssd is not a complete solution (sssd_nss can provide passwd, shadow, group, netgroup and services; but not hosts, networks, protocols, ethers, or rpc).
Posted Aug 14, 2019 20:41 UTC (Wed)
by zblaxell (subscriber, #26385)
[Link] (4 responses)
Allowing LD_PRELOAD to propagate from a low-privileged context to a
fakeroot is a fun hack and all, but maybe there are better ways to solve
NSS is a real annoyance: you try to map a UID to a name, and
Posted Aug 15, 2019 9:58 UTC (Thu)
by cjwatson (subscriber, #7322)
[Link]
Posted Aug 15, 2019 17:01 UTC (Thu)
by rwmj (subscriber, #5474)
[Link] (2 responses)
Posted Aug 15, 2019 18:14 UTC (Thu)
by mathstuf (subscriber, #69389)
[Link]
Posted Aug 16, 2019 7:06 UTC (Fri)
by smcv (subscriber, #53363)
[Link]
My understanding is that for the minority of packages that contain files with different ownership (for example audit and shadow), there are plans for some sort of declarative file-ownership metadata (analogous to how RPM does it), but that isn't available yet.
Posted Aug 14, 2019 21:44 UTC (Wed)
by mezcalero (subscriber, #45103)
[Link] (5 responses)
While I generally do agree that maintaining seccomp policies is nastier than people might think, I also think it's manageable if you are careful. Specifically, seccomp policies that trigger SIGSYS are a really bad idea, as are syscall blacklists. If you stick to whitelists and stick to returning EPERM for unlisted syscalls you should mostly be fine, as most code that runs in environments it doesn't know well (i.e. NSS module code, library code, or even LD_PRELOAD hacks) tends to be written carefully enough to handle EPERM in a graceful way. Moreover, new syscalls added to the kernel this way also return EPERM and most code using such new syscalls tends to have fallbacks in place anyway to support slightly older kernels, and these codepaths are triggered then. In addition, in a world of SELinux and AppArmor, apps are vaguely prepared for getting EPERM/EACCES from various places already, thus getting them from some syscalls is fine too.
In systemd we started out with blacklisting and our logic defaults to triggering SIGSYS. Today we know we probably should not even have bothered with blacklisting at all, nor with SIGSYS because it's unmanageable, but we didn't know that when we first added support. It appears the folks who started this work in the 'file' tool made the same mistakes...
(Oh, and grouping syscalls is kinda important too: ideally libseccomp would even do that on its own. Policies shouldn't need to spell all 4 syscalls for sending a datagram individually nor the 7 syscalls for changing ownership of a file. In systemd we defined our own grouping to make this manageable, but this sounds like a concept to have in libseccomp itself. With such grouping you get the coarseness that pledge() provides.)
Lennart
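To make the shape of that suggestion concrete, here is a toy model in plain Python of an allowlist with named groups and an EPERM default. It is not a seccomp filter and does not use libseccomp or systemd; the group names and their membership are assumptions chosen to match the "4 syscalls for sending a datagram" and "7 syscalls for changing ownership" mentioned above, and `expand()` and `decide()` are hypothetical helpers.

```python
import errno

# Illustrative groups for this sketch only; the names are made up and the
# membership is an assumption, not a definition taken from any real tool.
SYSCALL_GROUPS = {
    "@datagram-send": {"send", "sendto", "sendmsg", "sendmmsg"},
    "@chown": {
        "chown", "chown32", "fchown", "fchown32",
        "fchownat", "lchown", "lchown32",
    },
}


def expand(policy):
    """Expand '@group' entries in a policy into individual syscall names."""
    allowed = set()
    for entry in policy:
        if entry.startswith("@"):
            allowed |= SYSCALL_GROUPS[entry]
        else:
            allowed.add(entry)
    return allowed


def decide(allowed, syscall):
    """Model the verdict: 0 means allow; anything unlisted gets EPERM.

    Returning an error code (rather than killing the process) is what lets
    callers with fallback paths keep working.
    """
    return 0 if syscall in allowed else errno.EPERM
```

With a policy such as `expand(["read", "write", "@chown"])`, every ownership-changing variant is allowed by naming one group, while an unlisted call like `sendto` is answered with `errno.EPERM`.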
Posted Aug 14, 2019 22:46 UTC (Wed)
by juliank (guest, #45896)
[Link] (1 responses)
Grouping is a bit tricky maybe. It's likely that there are different bugs in different variants of the same syscall, so it might make sense to only allow the latest one.
Posted Aug 15, 2019 17:01 UTC (Thu)
by luto (guest, #39314)
[Link]
I once wrote a library for this, but I wrote it as a patch to libseccomp that was a bit out of place. I should just release it standalone.
Posted Aug 14, 2019 23:01 UTC (Wed)
by roc (subscriber, #30627)
[Link] (2 responses)
It would be nice to have a composable, fast, reliable user-space syscall interception mechanism. LD_PRELOAD isn't it.
Posted Aug 19, 2019 14:59 UTC (Mon)
by bpearlmutter (subscriber, #14693)
[Link] (1 responses)
Posted Aug 24, 2019 12:41 UTC (Sat)
by xilun (guest, #50638)
[Link]
Posted Aug 14, 2019 22:47 UTC (Wed)
by jamesmorris (subscriber, #82698)
[Link]
Posted Aug 15, 2019 10:42 UTC (Thu)
by rbranco (subscriber, #129813)
[Link] (2 responses)
alias ldd='podman run --rm -v /:/:ro --net=none --env-file <(env) scratch ldd'
Posted Aug 22, 2019 6:24 UTC (Thu)
by Siosm (subscriber, #86882)
[Link] (1 responses)
Posted Aug 22, 2019 8:55 UTC (Thu)
by rbranco (subscriber, #129813)
[Link]
Posted Aug 15, 2019 14:59 UTC (Thu)
by joey (guest, #328)
[Link]
Hardening libmagic will benefit all of them, while hardening `file` does not.
They do have unveil()-based file "namespaces".
something, file doesn't need to participate in fakeroot stat mangling
(or NSS for that matter--file doesn't translate uids to names or
interact with hostnames or URLs). Detect LD_PRELOAD in file's
main function, unset it, and re-exec file so seccomp works properly.
high-privileged one is obviously a bad idea--which is why it's disabled
for setuid programs. Why does allowing LD_PRELOAD to propagate from a
high-privileged context to a low-privileged one seem like a good idea?
Sounds like it's just asking for exactly the kind of problems listed
above. If you're going to sandbox a process, that should include
deleting most of its environment variables, so that you can predict what
environment it's going to run in. That conflicts with the fundamental
ideas behind fakeroot and NSS, but...well, maybe they weren't particularly
good ideas anyway.
the underlying problem? e.g. run packaging tasks on a FUSE filesystem
that acts like the user is doing everything as root, and patch the two
or three packages that still do euid == 0 checks during build. That
approach is going to work for things that aren't running on top of the
C library, too.
suddenly your thread blocks for network IO, or crashes, or gets RCE
vulnerabilities, because someone configured NSS to do something dumb, and
unprivileged users don't get an easy way to turn it off. There should be
an easy way to turn NSS off per process--users can already LD_PRELOAD in
a sane implementation of getpwnam(), getpwuid(), and so on, so it does
not introduce any new vulnerabilities to have an environment variable
(ignored for setuid programs) that lets users override nsswitch.conf
more conveniently.
alias file='podman run --rm -v /:/:ro --net=none --env-file <(env) scratch file'
Nice one! Here is a similar one with Bubblewrap:
alias file='bwrap --unshare-all --ro-bind / / --dev /dev --tmpfs /tmp --tmpfs /proc --tmpfs /sys -- file'
alias ldd='bwrap --unshare-all --ro-bind / / --dev /dev --tmpfs /tmp --tmpfs /proc --tmpfs /sys -- ldd'
My trick doesn't work anymore; I don't know why. I just can't mount the root filesystem anymore with podman.
Also, the -v /:/:ro option is problematic because recursive bind mounts don't mount other filesystems on / read-only.
Code would have to detect the filesystems (ideally from /etc/fstab, skip pseudo-filesystems and network filesystems), and mount them all readonly as a volume using the option bind-nonrecursive.