LWN: Comments on "Hardening the "file" utility for Debian" https://lwn.net/Articles/796108/ This is a special feed containing comments posted to the individual LWN article titled "Hardening the "file" utility for Debian". en-us Sat, 04 Oct 2025 13:41:26 +0000 Sat, 04 Oct 2025 13:41:26 +0000 https://www.rssboard.org/rss-specification lwn@lwn.net Hardening the "file" utility for Debian https://lwn.net/Articles/797353/ https://lwn.net/Articles/797353/ Jonno <div class="FormattedComment"> <font class="QuotedText">&gt; Although I don't think it can make use of arbitrary NSS modules itself,</font><br> <p> Actually it can. By using the "proxy" id_provider sssd will use a specified nss library as a backend.<br> <p> Unfortunately the sssd nss service does not support all NSS databases, so using sssd is not a complete solution (sssd_nss can provide passwd, shadow, group, netgroup and services; but not hosts, networks, protocols, ethers, or rpc).<br> </div> Tue, 27 Aug 2019 13:25:03 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/797350/ https://lwn.net/Articles/797350/ cortana <div class="FormattedComment"> See also sssd. Although I don't think it can make use of arbitrary NSS modules itself, rather it just provides a daemon that knows how to talk to IPA, AD, generic LDAP and so on.<br> </div> Tue, 27 Aug 2019 07:57:05 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/797158/ https://lwn.net/Articles/797158/ xilun <div class="FormattedComment"> Can ptrace have multiple users targeting a process now? If not I don't see how it is composable. And it is not even debuggable...<br> </div> Sat, 24 Aug 2019 12:41:08 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796915/ https://lwn.net/Articles/796915/ rbranco My trick doesn't work anymore, and I don't know why. I just can't mount the root filesystem anymore with podman. 
Also, the <code> -v /:/:ro</code> option is problematic because a recursive bind mount does not make the other filesystems mounted under / read-only. Code would have to detect the filesystems (ideally from /etc/fstab, skipping pseudo-filesystems and network filesystems) and mount each of them read-only as a volume using the bind-nonrecursive option. Thu, 22 Aug 2019 08:55:25 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796909/ https://lwn.net/Articles/796909/ Siosm Nice one! Here is a similar one with <a href="https://github.com/projectatomic/bubblewrap">Bubblewrap</a>: <br/><br/> <code> alias file='bwrap --unshare-all --ro-bind / / --dev /dev --tmpfs /tmp --tmpfs /proc --tmpfs /sys -- file' </code><br/> <code> alias ldd='bwrap --unshare-all --ro-bind / / --dev /dev --tmpfs /tmp --tmpfs /proc --tmpfs /sys -- ldd' </code> Thu, 22 Aug 2019 06:24:07 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796647/ https://lwn.net/Articles/796647/ bpearlmutter <div class="FormattedComment"> The fakeroot-ng program uses ptrace instead of library hacking, and might meet all your desiderata.<br> </div> Mon, 19 Aug 2019 14:59:52 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796549/ https://lwn.net/Articles/796549/ dvdeug <div class="FormattedComment"> Okay? You can give up on worrying about potential attacks on software, but it seems bizarre to worry about potential attacks on software and ignore the ability to ignore memory overruns, use after free, etc. type problems.<br> </div> Sun, 18 Aug 2019 05:59:33 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796547/ https://lwn.net/Articles/796547/ k8to <div class="FormattedComment"> Your code will not be exploitable by memory overruns, use after free, etc type problems. 
There are other potential attacks on software.<br> </div> Sun, 18 Aug 2019 03:02:15 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796517/ https://lwn.net/Articles/796517/ dvdeug <div class="FormattedComment"> There are a wide variety of languages wherein buffer overflows and similar tricks can not run arbitrary code. A Scheme program, for example, can crash due to being out of memory, but will never allow arbitrary code execution. Do Scheme interpreters and compilers, and the libraries they use, have bugs? Sure, but it's like driving a rusted-out car across country instead of flying because "there's no such thing as a safe vehicle".<br> </div> Sat, 17 Aug 2019 11:21:19 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796385/ https://lwn.net/Articles/796385/ smcv <div class="FormattedComment"> Yes, fakeroot is the problem here, and Debian is moving away from it. Packages with the "Rules-Requires-Root: no" field are built without fakeroot or similar tricks, which works for packages where every file in the .deb is owned by root:root (including those that use dpkg-statoverride to chown a file during installation, like dbus). 
This is opt-in because it's potentially a backwards-incompatible change.<br> <p> My understanding is that for the minority of packages that contain files with different ownership (for example audit and shadow), there are plans for some sort of declarative file-ownership metadata (analogous to how RPM does it), but that isn't available yet.<br> </div> Fri, 16 Aug 2019 07:06:28 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796377/ https://lwn.net/Articles/796377/ flussence <div class="FormattedComment"> I think it's incorrect to say OpenBSD's the one suffering from a Not Invented Here problem here.<br> </div> Fri, 16 Aug 2019 00:52:02 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796353/ https://lwn.net/Articles/796353/ mathstuf <div class="FormattedComment"> Indeed. How do Go programs (which don't call libc) expect to be affected by `fakeroot`?<br> </div> Thu, 15 Aug 2019 18:14:03 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796342/ https://lwn.net/Articles/796342/ rwmj <div class="FormattedComment"> I'm also inclined to think fakeroot is the root cause of the problem here, rather than file or seccomp. Other distros manage to package broadly the same set of packages as Debian and they don't use fakeroot. Instead the package builder uses a combination of DESTDIR and added metadata marking the desired ownership and permission on installed files.<br> </div> Thu, 15 Aug 2019 17:01:48 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796341/ https://lwn.net/Articles/796341/ luto <div class="FormattedComment"> One could use SIGSYS but catch the SIGSYS, log it, and emulate -ENOSYS.<br> <p> I once wrote a library for this, but I wrote it as a patch to libseccomp that was a bit out of place. 
I should just release it standalone.<br> </div> Thu, 15 Aug 2019 17:01:16 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796335/ https://lwn.net/Articles/796335/ joey <div class="FormattedComment"> There are quite a few programs besides `file` that use libmagic. It seems likely that some of them are easier to exploit than `file` because they accept untrusted input from the network and pass it directly to libmagic.<br> <p> Hardening libmagic will benefit all of them, while hardening `file` does not.<br> </div> Thu, 15 Aug 2019 14:59:43 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796331/ https://lwn.net/Articles/796331/ Deleted user 129183 <div class="FormattedComment"> <font class="QuotedText">&gt; So, no, pledge isn't NIH.</font><br> <p> I was talking about OpenBSD's reimplementation of `file`, not about `pledge`.<br> </div> Thu, 15 Aug 2019 14:41:26 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796303/ https://lwn.net/Articles/796303/ epa <div class="FormattedComment"> There may not be any safe languages but there are certainly dangerous ones.<br> <p> While there are bugs in the Java or .NET runtimes, or other language runtimes, getting an exploit through one of those is usually much harder than the swarm of exploits a C program will contain unless written with exceptional discipline by a highly skilled programmer.<br> <p> But actually I wasn't really suggesting one of these heavyweight managed languages that pulls along a runtime environment. Rust doesn't have a runtime, for example. The Cyclone programming language is a safer dialect of C which also doesn't have any special run time requirements.<br> </div> Thu, 15 Aug 2019 13:01:17 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796302/ https://lwn.net/Articles/796302/ pbonzini <div class="FormattedComment"> But it seems to me that the same issues would happen with pledge and OpenBSD has fixed them. 
The article itself hints that you could even use seccomp v1 if the architecture of file is changed to split restricted code into a separate process.<br> </div> Thu, 15 Aug 2019 11:31:21 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796300/ https://lwn.net/Articles/796300/ rbranco <div class="FormattedComment"> echo FROM scratch | podman build -t scratch -f - .<br> <p> alias ldd='podman run --rm -v /:/:ro --net=none --env-file &lt;(env) scratch ldd'<br> alias file='podman run --rm -v /:/:ro --net=none --env-file &lt;(env) scratch file'<br> <p> <p> </div> Thu, 15 Aug 2019 10:42:15 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796298/ https://lwn.net/Articles/796298/ cjwatson <div class="FormattedComment"> Unsetting LD_PRELOAD doesn't even solve all the preloading problems: for better or worse, people are apparently using antivirus tools and VPNs that inject themselves using /etc/ld.so.preload. Convincing ld.so to enter "secure mode" would help, but as far as I know all the methods for doing that involve being privileged in some way.<br> </div> Thu, 15 Aug 2019 09:58:02 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796283/ https://lwn.net/Articles/796283/ Cyberax <div class="FormattedComment"> That's why I put "namespaces" in scare quotes, because in practice it functions similarly to the unshare()-then-bind-mount trick that systemd and other software use on Linux.<br> </div> Thu, 15 Aug 2019 02:29:56 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796281/ https://lwn.net/Articles/796281/ roc <div class="FormattedComment"> unveil() lets you whitelist filesystem paths. I think it's confusing to call that "namespaces". 
chroot() is more namespace-like.<br> </div> Thu, 15 Aug 2019 01:52:48 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796278/ https://lwn.net/Articles/796278/ Cyberax <div class="FormattedComment"> <font class="QuotedText">&gt; OpenBSD of course doesn't have kernel namespaces. </font><br> They do have unveil()-based file "namespaces".<br> </div> Thu, 15 Aug 2019 00:34:50 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796273/ https://lwn.net/Articles/796273/ roc <div class="FormattedComment"> You're mixing up the original seccomp with seccomp-bpf. The original seccomp was designed by Andrea Arcangeli, not Google, and only allowed read/write on open file descriptors. Later Google added seccomp-bpf to reduce exposed kernel attack surface from sandboxed Chrome processes, not just NaCl but also Web content processes.<br> <p> It's very important to keep in mind that implementing a sandbox with just seccomp is usually a terrible idea. In Linux, kernel namespaces are the best way to construct a sandbox. Then you apply a seccomp filter as an extra layer of defense, to reduce the kernel API attack surface exposed to sandboxed code. This is what Chrome and Firefox do. OpenBSD of course doesn't have kernel namespaces. Comparing pledge() to seccomp-bpf for constructing sandboxes is really a mistake, you should compare pledge() to kernel namespaces (with or without an additional seccomp-bpf layer).<br> <p> <font class="QuotedText">&gt; I don't think Google ever intended seccomp to grow into a general purpose syscall filtering or sandboxing mechanism</font><br> <p> Perhaps Andrea Arcangeli didn't, but Google certainly did, otherwise their decision to use BPF to express arbitrary predicates is unfathomable.<br> <p> Personally I'm pretty glad they did. We use seccomp-bpf for selective syscall interception in rr in a way that a dedicated sandbox API like pledge() would never have supported. 
That feature is critical for low overhead in rr recording.<br> <p> Dead-end or not, seccomp-bpf is working in practice for Firefox, Chrome, rr, and others.<br> </div> Wed, 14 Aug 2019 23:46:19 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796270/ https://lwn.net/Articles/796270/ roc <div class="FormattedComment"> In practice, if you write the parsing code in Rust or Go and avoid doing something exceptionally stupid like using Rust's "unsafe" keyword, your code will not be exploitable. For evidence, take a look at <a href="https://github.com/rust-fuzz/trophy-case">https://github.com/rust-fuzz/trophy-case</a> and observe how few security bugs there are, and how they involved explicit use of "unsafe".<br> <p> You can argue it still wouldn't be "safe" for some meaning of the word, but seccomp filters aren't "safe" in those terms either.<br> <p> Having said that, extra layers of protection are still good and grappling with the issues in this post is still important. In particular, if `file` was written in Rust but users' systems inject C code into it via LD_PRELOAD, then savvy attackers would target that C code. Witness the security vulnerabilities introduced by AV filters over the years.<br> </div> Wed, 14 Aug 2019 23:14:19 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796269/ https://lwn.net/Articles/796269/ roc <div class="FormattedComment"> Hear hear! LD_PRELOAD is simply not a reliable tool for intercepting syscalls in a production system. Using LD_PRELOAD for syscall interception completely fails if the application does raw syscalls in its own code, and composition is terrible so trying to compose multiple LD_PRELOAD interceptors together invariably fails.<br> <p> It would be nice to have a composable, fast, reliable user-space syscall interception mechanism. 
LD_PRELOAD isn't it.<br> </div> Wed, 14 Aug 2019 23:01:40 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796261/ https://lwn.net/Articles/796261/ wahern <div class="FormattedComment"> OpenBSD had a syscall filtering mechanism, systrace, years before seccomp. Unlike the ptrace-based version of systrace on Linux, BSD systrace was incorporated into the kernel. The problem was that nobody used it--it was too low-level. So after several years systrace was ripped out. pledge and unveil are what came about after people chewed on the problem for a few more years.<br> <p> The original contributors of seccomp were quite familiar with systrace. seccomp only permits filtering scalars because it was through systrace that it was proven how easy it was to bypass string path filtering unless the kernel copied the string. (Later releases of systrace came with a huge warning that string path filtering wasn't secure.) As I recall, seccomp was originally intended for sandboxing Chrome NaCl, which by design only required read and write from the sandboxed process. seccomp was the minimal amount necessary to put into the kernel to make NaCl work. I don't think Google ever intended seccomp to grow into a general purpose syscall filtering or sandboxing mechanism as it was already obvious from the history of systrace that the low-level semantics don't lend themselves to more sophisticated use cases.<br> <p> So, no, pledge isn't NIH. seccomp is basically a *worse* systrace, and everybody knew systrace was a dead end.<br> <p> Another alternative is Capsicum. OpenBSD rejected this for the same reasons the file(1) maintainer hasn't yet refactored their codebase: Capsicum, like the original seccomp, is premised on using a multi-process privilege separation model, which requires a lot of work, *especially* for preexisting codebases. 
Capsicum is a great model, but it doesn't prove to be a viable solution for preexisting code.<br> <p> </div> Wed, 14 Aug 2019 22:55:50 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796265/ https://lwn.net/Articles/796265/ jamesmorris <div class="FormattedComment"> seccomp is useful for reducing the attack surface of the kernel (i.e. restrict access to only required syscalls), but it's not intended as a least privilege mechanism or sandboxing on its own. seccomp operates at the wrong abstraction level, as evidenced by comments about having to specify 4 syscalls for one type of operation. The LSM API with higher level policies is a better fit for least priv, as you can utilize all security relevant information at an operation-focused granularity for policy enforcement.<br> <p> <p> <p> </div> Wed, 14 Aug 2019 22:47:53 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796267/ https://lwn.net/Articles/796267/ juliank <div class="FormattedComment"> My idea was to return ENOSYS for blocked syscalls. This should have one advantage that it does not break when libc migrates to a newer syscall, as it silently falls back to the old one.<br> <p> Grouping is a bit tricky maybe. It's likely that there are different bugs in different variants of the same syscall, so it might make sense to only allow the latest one.<br> </div> Wed, 14 Aug 2019 22:46:40 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796266/ https://lwn.net/Articles/796266/ juliank <div class="FormattedComment"> Wow that's terrible<br> </div> Wed, 14 Aug 2019 22:43:19 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796260/ https://lwn.net/Articles/796260/ dezgeg <div class="FormattedComment"> Seriously? That is... terrible. Is this actually documented somewhere? 
Who knows how many setups are (not necessarily intentionally) relying on lookups always going through nscd, not just for sandboxes that might lack the nss libraries but for example not having the 32-bit equivalents of the nss libraries installed. For one, the NixOS distribution relies on nscd for all other nss queries except for the ones included in glibc since there is no global /lib.<br> <p> Having the NSS libraries loaded into the caller's address space just needs to die. Just this week I had to debug an issue with a piece of (proprietary, but not relevant to the discussion) software distributed as binaries with all the libraries bundled. And this broke since some NSS module from the system (with new glibc) needed to be loaded but was using some symbols that didn't exist in the bundled libc. Of course, installing nscd was the solution.<br> </div> Wed, 14 Aug 2019 22:39:17 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796253/ https://lwn.net/Articles/796253/ mezcalero <div class="FormattedComment"> Well, I think the lesson to learn here is that LD_PRELOAD is a crutch, a hacker tool and should not be used for clean codepaths. It's not just incompatible with seccomp style stuff, but totally incompatible with anything involving suid binaries or fcaps or such. I am not sure why anyone would bother with making seccomp work with LD_PRELOAD if not even suid works with it...<br> <p> While I generally do agree that maintaining seccomp policies is nastier than people might think I also think it's manageable if you are careful. Specifically, seccomp policies that trigger SIGSYS are a really bad idea, as are syscall blacklists. If you stick to whitelists and stick to returning EPERM for unlisted syscalls you should mostly be fine, as most code that runs in environments it doesn't know well (i.e. NSS module code, library code, or even LD_PRELOAD hacks) tends to be written carefully enough to handle EPERM in a graceful way. 
Moreover, new syscalls added to the kernel this way also return EPERM and most code using such new syscalls tends to have fallbacks in place anyway to support slightly older kernels, and these codepaths are triggered then. In addition in a world of SELinux and AppArmor apps are vaguely prepared for getting EPERM/EACCES from various places already, thus getting them from some syscalls is fine too.<br> <p> In systemd we started out with blacklisting and our logic defaults to triggering SIGSYS. Today we know we probably should not even have bothered with blacklisting at all, nor with SIGSYS because it's unmanageable, but we didn't know that when we first added support. It appears the folks who started this work in the 'file' tool made the same mistakes...<br> <p> (Oh, and grouping syscalls is kinda important too: ideally libseccomp would even do that on its own. Policies shouldn't need to spell all 4 syscalls for sending a datagram individually nor the 7 syscalls for changing ownership of a file. In systemd we defined our own grouping to make this manageable, but this sounds like a concept to have in libseccomp itself. With such grouping you get the coarseness that pledge() provides.)<br> <p> Lennart<br> </div> Wed, 14 Aug 2019 21:44:46 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796255/ https://lwn.net/Articles/796255/ clugstj <div class="FormattedComment"> No, the problem is that one group of people want functionality (using LD_PRELOAD to do kewl things) and another group want to use "seccomp()" for security. These two "wants" don't play nice together.<br> </div> Wed, 14 Aug 2019 21:40:40 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796252/ https://lwn.net/Articles/796252/ mezcalero <div class="FormattedComment"> nscd is not what you appear to think it is: the nscd client in glibc has a very short time-out, in which case it falls back to traditional, non-nscd client side NSS. 
It is thus not suitable as a sandboxing solution, and only and exclusively as a cache for speeding things up following the theory that such a daemon makes no sense to block on for a longer time when its purpose is to make sure lookups only take a shorter time. The short time-out is also an effective method to make deadlocks due to local IPC less penalizing.<br> </div> Wed, 14 Aug 2019 21:28:56 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796249/ https://lwn.net/Articles/796249/ dezgeg <div class="FormattedComment"> The solution already exists in glibc: nscd<br> </div> Wed, 14 Aug 2019 20:52:29 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796241/ https://lwn.net/Articles/796241/ zblaxell <div class="FormattedComment"> Why not just unset LD_PRELOAD before calling file? Unless I missed<br> something, file doesn't need to participate in fakeroot stat mangling<br> (or NSS for that matter--file doesn't translate uids to names or<br> interact with hostnames or URLs). Detect LD_PRELOAD in file's<br> main function, unset it, and re-exec file so seccomp works properly.<br> <p> Allowing LD_PRELOAD to propagate from a low-privileged context to a<br> high-privileged one is obviously a bad idea--which is why it's disabled<br> for setuid programs. Why does allowing LD_PRELOAD to propagate from a<br> high-privileged context to a low-privileged one seem like a good idea?<br> Sounds like it's just asking for exactly the kind of problems listed<br> above. If you're going to sandbox a process, that should include<br> deleting most of its environment variables, so that you can predict what<br> environment it's going to run in. That conflicts with the fundamental<br> ideas behind fakeroot and NSS, but...well, maybe they weren't particularly<br> good ideas anyway.<br> <p> fakeroot is a fun hack and all, but maybe there are better ways to solve<br> the underlying problem? e.g. 
run packaging tasks on a FUSE filesystem<br> that acts like the user is doing everything as root, and patch the two<br> or three packages that still do euid == 0 checks during build. That<br> approach is going to work for things that aren't running on top of the<br> C library, too.<br> <p> NSS is a real annoyance: you try to map a UID to a name, and<br> suddenly your thread blocks for network IO, or crashes, or gets RCE<br> vulnerabilities, because someone configured NSS to do something dumb, and<br> unprivileged users don't get an easy way to turn it off. There should be<br> an easy way to turn NSS off per process--users can already LD_PRELOAD in<br> a sane implementation of getpwnam(), getpwuid(), and so on, so it does<br> not introduce any new vulnerabilities to have an environment variable<br> (ignored for setuid programs) that lets users override nsswitch.conf<br> more conveniently.<br> <p> </div> Wed, 14 Aug 2019 20:41:50 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796246/ https://lwn.net/Articles/796246/ juliank <div class="FormattedComment"> Hmm, I don't know, I think that's an upstream libc/systemd question. I could imagine a systemd-nssd that provides nss stuff over dbus.<br> </div> Wed, 14 Aug 2019 20:28:14 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796245/ https://lwn.net/Articles/796245/ juliank <div class="FormattedComment"> I don't think there are safe languages. Runtime bugs add a lot of unsafety; having another layer of protection is important.<br> </div> Wed, 14 Aug 2019 20:26:56 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796243/ https://lwn.net/Articles/796243/ epa <div class="FormattedComment"> You could write the parsing code in a safe language. 
Then, if there isn't a call to exec() literally appearing in the source code, there's no way the code can be tricked into calling exec() by overwriting the stack due to a missing bounds check, integer overflow or whatever. There are safe dialects of C which are probably compatible enough for the parsing code to work.<br> </div> Wed, 14 Aug 2019 20:09:32 +0000 Hardening the "file" utility for Debian https://lwn.net/Articles/796240/ https://lwn.net/Articles/796240/ Deleted user 129183 <div class="FormattedComment"> <font class="QuotedText">&gt; At this point, wouldn't it be easier to rewrite "file" from scratch with security in mind instead of trying to use "seccomp()"?</font><br> <p> Well, I guess that’s exactly the reason that OpenBSD has their own, seemingly NIH, implementation of it…<br> </div> Wed, 14 Aug 2019 20:05:26 +0000