Unprivileged chroot()
Typically, chroot() is used for tasks like "jailing" a network daemon process; should that process be compromised, its ability to access the filesystem will be limited to the directory tree below the new root. The resulting security boundary is not the strongest (there are a number of ways to break out of chroot() jails), but it can still present a barrier to attackers. chroot() can also be used to create a different view of the filesystem in which to run containers, for example.
This system call is not available to just anybody; the CAP_SYS_CHROOT capability is required to be able to call chroot(). This restriction is in place to thwart attackers who would otherwise try to confuse (and exploit) setuid programs by running them inside a specially crafted filesystem tree. As a simple example, consider the sort of mayhem that might be possible if setuid programs saw a version of /etc/passwd or /etc/sudoers that was created by an attacker.
The weaknesses of chroot() have long limited its applicability; in recent years it has fallen even further out of favor. Mount namespaces are a much more flexible mechanism for creating new views of the filesystem; they can also be harder to break out of. As a result, relatively few developers see a reason to use chroot() for anything new.
Thus, some folks were a bit surprised when Salaün showed up with his chroot() patch. Once it is applied, unprivileged processes are able to call chroot(), but only if a few conditions apply:
- The process in question must have done a prctl() call with the PR_SET_NO_NEW_PRIVS option. That prevents the process from gaining any new privileges; setuid and setgid programs run by it will no longer gain the privileges of the owner of the executable file, for example. Since no program can gain privileges in that mode, there are no elevated privileges to exploit.
- The process cannot be sharing its filesystem context (struct fs_struct, which contains the root and current working directories) with any other processes; otherwise the chroot() call would affect both processes, and the other one may not be expecting its filesystem environment to change abruptly.
- The new root must be underneath the current root in the filesystem hierarchy. This prevents trickery that could otherwise facilitate escape from an existing jail or mount namespace.
If these conditions are met, it is argued, it is safe to allow a process to call chroot().
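Seen from the calling process, the three conditions might look like the sketch below. It is an illustration, not code from the patch set: the helper name try_unprivileged_chroot() is hypothetical, and unshare(CLONE_FS) is used here as one way to make sure the filesystem context is not shared. On a kernel without the patch, the chroot() call is expected to fail with EPERM unless the caller holds CAP_SYS_CHROOT.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/prctl.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success or a negative errno value. */
int try_unprivileged_chroot(const char *new_root)
{
    assert(new_root != NULL);

    /* Condition 1: give up the ability to gain privileges through execve(). */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -errno;

    /* Condition 2: stop sharing the root and working directory
     * (struct fs_struct) with any other task. */
    if (unshare(CLONE_FS) != 0)
        return -errno;

    /* Condition 3 is checked by the kernel: new_root must lie below the
     * current root. Without the patch, expect EPERM here for a caller
     * that lacks CAP_SYS_CHROOT. */
    if (chroot(new_root) != 0)
        return -errno;

    /* Do not leave the working directory pointing outside the new root. */
    if (chdir("/") != 0)
        return -errno;

    return 0;
}
```

A caller would load whatever it needs from the wider filesystem first and then invoke the helper with the directory it wants to be confined to.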
There is still the question of why one might want to do that. Among other things, a functioning chroot() environment normally needs to have a minimally populated /dev directory; creating device nodes remains a privileged operation. And, as noted above, Linux has had better options than chroot() for some time now. But Salaün says that there are use cases where a process might want to sandbox itself after the things it needs from the wider environment (libraries, for example) have been loaded, and device files can often be done without.
The initial reception for this patch has been a bit chilly at best. Eric Biederman worried about the security implications of unprivileged chroot() when mixed with other mechanisms:
Still allowing chroot after the sandbox has been built, a seccomp filter has been installed and no_new_privs has been enabled seems like it is asking for trouble and may weaken existing sandboxes.
Casey Schaufler argued that chroot() is obsolete and also worried about interactions: "We're still finding edge cases (e.g. ptrace) where no_new_privs is imperfect". He also pointed out that access to chroot() is already finely controlled with the CAP_SYS_CHROOT capability:
CAP_SYS_CHROOT is specific to chroot. It doesn't give you privilege beyond what you expect, unlike CAP_CHOWN or CAP_SYS_ADMIN. Making chroot unprivileged is silly when it's possibly the best example of how the capability mechanism is supposed to work.
Salaün has not answered all of these points, but seems undeterred; he posted a second version of the patch set after that discussion had occurred. Without a stronger answer, though, upstreaming this change is likely to be difficult. Security-oriented developers will need some convincing that chroot() merits any improvements at all; the bar for changes that raise worries about unexpected interactions with other security mechanisms will be higher.
The discussion is likely to come down to use cases in the end; is there truly a need for unprivileged chroot() in 2021? If there are users out there who could benefit from this feature, now would probably be a good time for them to come forward and explain why they need it. In the absence of that information, unprivileged chroot() seems likely to be one of those ideas that didn't quite make it.
Posted Mar 15, 2021 19:04 UTC (Mon)
by nickodell (subscriber, #125165)
[Link] (12 responses)
Outrun lets you execute a local command using the processing power of another Linux machine. In order to do this, it runs the process on the remote system, and redirects all filesystem calls back to the local system. It does this through two systems: FUSE and chroot. FUSE can be done in userspace with no extra permissions. chroot, however, requires root. For that reason, Outrun requires root privileges, even if the application you're running doesn't.
There doesn't seem to be a great way to solve this problem under the current permission scheme. Sure, there's a chroot capability. But how do you give that chroot capability to processes running in Outrun? Outrun spawns processes from a normal login shell. If you give all login shells chroot capability, then that opens a security hole, due to setuid programs which can't be allowed to run inside chroots.
One solution which Outrun discussed was to write a setuid helper, which could run the chroot syscall on behalf of Outrun. However, those carry their own security risks. (See also: calibre's setuid helper.)
For these reasons, I think this patchset would be useful.
Posted Mar 15, 2021 19:20 UTC (Mon)
by mcon147 (subscriber, #56569)
[Link] (11 responses)
Posted Mar 15, 2021 20:59 UTC (Mon)
by floppus (guest, #137245)
[Link] (1 responses)
Posted Mar 15, 2021 22:30 UTC (Mon)
by josh (subscriber, #17465)
[Link]
Posted Mar 16, 2021 7:52 UTC (Tue)
by smurf (subscriber, #17840)
[Link] (8 responses)
There's no way to do that without chroot.
Posted Mar 16, 2021 19:07 UTC (Tue)
by floppus (guest, #137245)
[Link] (7 responses)
logfile = fopen("foo.log", "a");
Posted Mar 17, 2021 10:09 UTC (Wed)
by smurf (subscriber, #17840)
[Link] (5 responses)
Posted Mar 17, 2021 17:39 UTC (Wed)
by floppus (guest, #137245)
[Link] (4 responses)
Posted Mar 18, 2021 1:03 UTC (Thu)
by pabs (subscriber, #43278)
[Link] (3 responses)
Posted Mar 18, 2021 12:54 UTC (Thu)
by domenpk (guest, #12382)
[Link] (2 responses)
Posted Mar 27, 2021 18:54 UTC (Sat)
by l0kod (subscriber, #111864)
[Link] (1 responses)
Posted Apr 6, 2021 19:28 UTC (Tue)
by immibis (subscriber, #105511)
[Link]
Posted Mar 17, 2021 22:39 UTC (Wed)
by NYKevin (subscriber, #129325)
[Link]
1. You have to set up uid_map and gid_map if you want to interact with the filesystem. Since you are using chroot(), you almost certainly do want to interact with the filesystem, so this is an obvious source of friction. Not impossible, just annoying.
Posted Mar 16, 2021 0:11 UTC (Tue)
by roc (subscriber, #30627)
[Link] (5 responses)
Our application runs in a container. It needs access to subtrees of the host filesystem. We mount each subtree /a/b/c under /host/a/b/c. Unfortunately this breaks because absolute symbolic links in the host filesystem (e.g. /a/b/c -> /foo/bar) don't exist in the container's mount namespace (it would need to be interpreted as /host/foo/bar). I ended up writing an implementation of `realpath` that manually resolves symbolic links and knows how to rebase absolute symbolic links to the /host directory. It's probably not nearly as efficient as doing it in the kernel though. I would have thought a lot of people ran into a need for this.
Obviously unprivileged chroot() would provide a solution. Though, maybe unprivileged chroot alone wouldn't be that great for us; we'd have to fork a helper process to do the chroot and pass fds back to the main process, which would be fairly complicated and maybe not faster than manual symlink resolution.
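The rebasing step described above can be sketched in a few lines. This is not the implementation from the comment, only an illustration of the idea: the function name rebase_link_target() and its parameters are assumptions. A full realpath replacement would also have to walk the path component by component and resolve the result again, since the rebased target may itself contain symbolic links.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: given the text of a symbolic link as stored on the
 * host filesystem, produce the path to follow inside the container, where
 * the host tree is mounted under host_prefix (for example "/host").
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int rebase_link_target(const char *target, const char *host_prefix,
                       char *out, size_t out_len)
{
    int n;

    assert(target != NULL && host_prefix != NULL && out != NULL);

    if (target[0] == '/')
        /* Absolute link: it referred to the host's root, so add the prefix. */
        n = snprintf(out, out_len, "%s%s", host_prefix, target);
    else
        /* Relative link: resolved against the link's own directory, unchanged. */
        n = snprintf(out, out_len, "%s", target);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With a prefix of "/host", the link target "/foo/bar" from the example becomes "/host/foo/bar", while a relative target such as "../x" is passed through untouched.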
Posted Mar 16, 2021 6:30 UTC (Tue)
by dbnichol (subscriber, #39622)
[Link] (4 responses)
Posted Mar 16, 2021 10:05 UTC (Tue)
by roc (subscriber, #30627)
[Link] (3 responses)
Posted Mar 16, 2021 13:45 UTC (Tue)
by gscrivano (subscriber, #74830)
[Link] (1 responses)
Posted Mar 17, 2021 2:07 UTC (Wed)
by roc (subscriber, #30627)
[Link]
Unfortunately I can't use it yet because I can't guarantee we're running on 5.6 or above, but this is the right API for me.
Posted Mar 17, 2021 0:28 UTC (Wed)
by dbnichol (subscriber, #39622)
[Link]
Posted Mar 16, 2021 0:50 UTC (Tue)
by geofft (subscriber, #59789)
[Link] (4 responses)
Anyway, there's one advantage of direct unprivileged chroot over making an unprivileged user + mount namespace and calling chroot inside there: you retain a full UID map of the outside system. If you use "unshare -cm --keep-caps," you get to map a single UID, your own, and so things like "ls -l /bin" don't display the results you'd expect. Since unprivileged chroot wouldn't create a user namespace, things would still look normal.
Maybe this could be worked around by saying something like, if you're inside a user namespace, you have no capabilities, and you map to the same UID outside the namespace, and you call PR_SET_NO_NEW_PRIVS, you get the ability to write an identity map to uid_map and gid_map, even if they've already been written to. After all, in no new privs mode, you can't switch users or gain any capabilities, so it doesn't matter what UID mapping you see. But it seems extremely tricky to get the detail of that right and you'd probably introduce exploitable bugs the first few times you try.
(I don't believe CAP_SYS_CHROOT is a meaningful alternative here. How would you grant it? Would you give filesystem capabilities to the chroot command? It won't be enforcing the NO_NEW_PRIVS requirement, then, and will turn into an immediate local root escalation. It _can't_ enforce that requirement, in fact: if you run a setcap program under NO_NEW_PRIVS, those capabilities are ignored, specifically because you asked for no new privs! So it wouldn't work if run from a no-new-privs parent process. If you somehow could avoid that constraint and run a setcap chroot, you could call it twice, the second time with a modified /lib thanks to being able to modify your chroot, and you could use that to escape the first chroot and hold onto chrooting privileges. It is technically true that CAP_SYS_CHROOT is the best example of how the capability mechanism is supposed to work - it's a fantastic demonstration of how useless that mechanism is.)
Posted Mar 18, 2021 9:14 UTC (Thu)
by matthias (subscriber, #94967)
[Link] (3 responses)
Posted Mar 18, 2021 12:26 UTC (Thu)
by winstonx86 (subscriber, #138536)
[Link]
Posted Mar 18, 2021 16:54 UTC (Thu)
by floppus (guest, #137245)
[Link] (1 responses)
For that reason (I think), unprivileged processes can't create user namespaces when they're already chrooted, and the proposed unprivileged chroot would likewise be forbidden.
Posted Mar 18, 2021 17:05 UTC (Thu)
by matthias (subscriber, #94967)
[Link]
And of course, if someone chroots a process without NO_NEW_PRIVS in a classic way, there should be no enchanted chroot command that gets capabilities from the filesystem lying around inside the new root.
Posted Mar 16, 2021 3:47 UTC (Tue)
by rsidd (subscriber, #2582)
[Link] (1 responses)
Posted Mar 16, 2021 4:32 UTC (Tue)
by pabs (subscriber, #43278)
[Link]
Posted Mar 16, 2021 8:21 UTC (Tue)
by l0kod (subscriber, #111864)
[Link] (3 responses)
They are answered, especially with the third version: https://lore.kernel.org/lkml/20210311105242.874506-2-mic@...
Posted Mar 17, 2021 12:49 UTC (Wed)
by walters (subscriber, #7396)
[Link] (2 responses)
And yes, there's now a syscall instead of `/dev/urandom` but still.
Posted Mar 17, 2021 13:15 UTC (Wed)
by l0kod (subscriber, #111864)
[Link]
Posted Mar 26, 2021 23:13 UTC (Fri)
by jrincayc (guest, #29129)
[Link]
Posted Mar 17, 2021 17:58 UTC (Wed)
by metalheart (guest, #89328)
[Link] (2 responses)
Posted Mar 18, 2021 1:48 UTC (Thu)
by NYKevin (subscriber, #129325)
[Link]
* If it explicitly checks geteuid() == 0, then it will continue to fail for non-root. This is probably a bad design decision, but not impossible if the application writer was trying to be "helpful" and provide a more explicit error message. On non-Linux systems, it would not be wrong to insert such a check, and some of these tools are written for "any random Unix-like" rather than Linux specifically.
TL;DR: You probably still need to be root to profitably use chroot(8), even with this patch.
Posted Mar 18, 2021 11:01 UTC (Thu)
by l0kod (subscriber, #111864)
[Link]
Posted Mar 18, 2021 16:45 UTC (Thu)
by jcpunk (subscriber, #95796)
[Link]
Posted Mar 18, 2021 22:11 UTC (Thu)
by kentonv (subscriber, #92073)
[Link] (5 responses)
In the latter case, I think allowing unprivileged chroot() ironically makes it possible to escape a preexisting chroot jail by the following means:
1. chdir("/foo")
Step 3 opens the parent of the previous root! Because ".." is no longer recognized as being the current root, the kernel doesn't prevent traversing up past it.
Verifying that the current directory is under the new root is not enough... Instead of chdir() in step 1 you could also open a file descriptor to "/foo" and then openat() in step 3.
Verifying that all open file descriptors are under the new root still isn't enough, because file descriptors could be transmitted via SCM_RIGHTS over a unix socket from an accomplice program that isn't inside the new chroot.
I think it only works if chroots stack, but my understanding is that they don't.
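Written out in C, the sequence is chdir() into one directory, chroot() into another, then open("../.."). The following is a sketch only: the helper name open_above_root() and its parameters are hypothetical. On a current kernel, chroot() fails with EPERM for a process without CAP_SYS_CHROOT, in which case the function returns before the last step.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical demonstration of the three steps. Returns an open file
 * descriptor on success or a negative errno value on failure.
 * outside_dir plays the role of "/foo", new_root the role of "/bar".
 */
int open_above_root(const char *outside_dir, const char *new_root)
{
    int fd;

    assert(outside_dir != NULL && new_root != NULL);

    /* Step 1: park the working directory where it will end up outside
     * the new root. */
    if (chdir(outside_dir) != 0)
        return -errno;

    /* Step 2: change the root; chroot() leaves the working directory alone. */
    if (chroot(new_root) != 0)
        return -errno;

    /* Step 3: walk upward from the old working directory. The kernel only
     * stops ".." at the current root, which this walk never passes through. */
    fd = open("../..", O_RDONLY | O_DIRECTORY);
    return fd >= 0 ? fd : -errno;
}
```

The variant mentioned above would replace the chdir() with an open() of the outside directory before the chroot(), followed by an openat() relative to that descriptor afterwards.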
Posted Mar 19, 2021 0:12 UTC (Fri)
by flussence (guest, #85566)
[Link] (4 responses)
Posted Mar 19, 2021 0:22 UTC (Fri)
by kentonv (subscriber, #92073)
[Link] (3 responses)
Posted Mar 19, 2021 18:32 UTC (Fri)
by l0kod (subscriber, #111864)
[Link] (2 responses)
Posted Mar 19, 2021 21:56 UTC (Fri)
by kentonv (subscriber, #92073)
[Link] (1 responses)
That seems like a disappointing limitation though... any program that uses this feature will mysteriously break when run in a chroot.
Posted Mar 21, 2021 10:50 UTC (Sun)
by smurf (subscriber, #17840)
[Link]
Much better to use systemd-nspawn or some other tool that sets up a "real" file system namespace. The unprivileged chroot(2) will work there.
Unprivileged chroot() and Outrun
It seems like you can do
$ unshare -mr chroot os-tree-dir bash
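sqlite3_open("foo.db", &db);            /* open outside resources before entering the jail */
sprintf(rootdir, "/run/user/%d/my-jail", getuid());
chdir(rootdir);
unshare(CLONE_NEWUSER);                 /* full capabilities inside the new user namespace */
chroot(".");                            /* permitted by CAP_SYS_CHROOT in that namespace */
caps = cap_get_proc();                  /* then drop all capabilities again */
cap_clear(caps);
cap_set_proc(caps);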
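A fuller version of that sequence, with error checking, might look like the sketch below. Some assumptions: the function name enter_jail() is hypothetical, the jail directory is passed in rather than built from the UID, and the capabilities are dropped with a raw capset() system call instead of libcap's cap_get_proc()/cap_set_proc() so that the example builds without linking against libcap. unshare(CLONE_NEWUSER) can fail, for instance with EPERM, where unprivileged user namespaces are disabled or the caller is already chrooted.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/capability.h>
#include <sched.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 on success or a negative errno value.
 * Open any files needed from outside (logs, databases) before calling it. */
int enter_jail(const char *rootdir)
{
    struct __user_cap_header_struct hdr;
    struct __user_cap_data_struct data[_LINUX_CAPABILITY_U32S_3];

    assert(rootdir != NULL);

    if (chdir(rootdir) != 0)
        return -errno;

    /* A new user namespace grants a full capability set inside that
     * namespace, including CAP_SYS_CHROOT. */
    if (unshare(CLONE_NEWUSER) != 0)
        return -errno;

    if (chroot(".") != 0)
        return -errno;

    /* Drop every capability again: all-zero effective, permitted and
     * inheritable sets. */
    memset(&hdr, 0, sizeof(hdr));
    memset(data, 0, sizeof(data));
    hdr.version = _LINUX_CAPABILITY_VERSION_3;
    hdr.pid = 0;   /* 0 means the calling thread */
    if (syscall(SYS_capset, &hdr, data) != 0)
        return -errno;

    return 0;
}
```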
2. Assuming you don't have CAP_SETUID/GID (in the parent user namespace), which is a safe assumption because otherwise you wouldn't be asking for "unprivileged chroot" in the first place, then the man page appears to say that you can only map your own UID/GID. That certainly makes logical sense (the whole point of this operation is to give you a "containerized" or unprivileged CAP_SETUID, so we need to constrain it somehow), but it also means that stat(2) will lie to you about the ownership of any file you don't own (the UID/GID is unmapped, so it gets converted to a generic "don't know" value in the child namespace).
3. SCM_CREDENTIALS will also produce unmapped garbage, as will plenty of other UID/GID-related interfaces. If you want to IPC with any process owned by a different user (e.g. a daemon running under a role account), you basically can't confirm its identity, although it can confirm yours (which may be sufficient in some cases).
4. Pervasively fixing all of the above, testing it, and maintaining everything, is likely harder than just granting CAP_SYS_CHROOT in the first place.
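For reference, setting up uid_map and gid_map from an unprivileged process amounts to a few writes under /proc once the namespace exists. The following is a sketch with hypothetical helper names (format_id_map(), write_file(), map_self_as_root()); it maps only the caller's own UID and GID, which is exactly the limitation described in the list above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Build one map line: "<ID inside ns> <ID outside ns> <count>\n".
 * Returns 0 on success, -1 if buf is too small. */
int format_id_map(char *buf, size_t len, unsigned inside, unsigned outside)
{
    int n;

    assert(buf != NULL);
    n = snprintf(buf, len, "%u %u 1\n", inside, outside);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write a whole string to a file with a single write(); returns 0 or -errno. */
static int write_file(const char *path, const char *text)
{
    int fd = open(path, O_WRONLY);
    ssize_t n;
    int err;

    if (fd < 0)
        return -errno;
    n = write(fd, text, strlen(text));
    err = (n == (ssize_t)strlen(text)) ? 0 : (n < 0 ? -errno : -EIO);
    close(fd);
    return err;
}

/* Create a user namespace in which the caller appears as UID/GID 0. */
int map_self_as_root(void)
{
    char line[64];
    unsigned uid = (unsigned)getuid();   /* read before unshare(): unmapped afterwards */
    unsigned gid = (unsigned)getgid();
    int err;

    if (unshare(CLONE_NEWUSER) != 0)
        return -errno;

    if (format_id_map(line, sizeof(line), 0, uid) != 0)
        return -EOVERFLOW;
    if ((err = write_file("/proc/self/uid_map", line)) != 0)
        return err;

    /* An unprivileged writer must disable setgroups() before writing gid_map. */
    if ((err = write_file("/proc/self/setgroups", "deny")) != 0)
        return err;
    if (format_id_map(line, sizeof(line), 0, gid) != 0)
        return -EOVERFLOW;
    return write_file("/proc/self/gid_map", line);
}
```

Every file owned by some other user still shows up with the overflow UID/GID after this, so the friction listed above remains.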
The last time I used chroot was to run a Linux subsystem on an Android tab. I had a full Xfce-based desktop on Android, and did actual work on it. It required a rooted tab and, even so, later Android releases made it hard. I haven't tried it recently, but it seems these days they use proot for this purpose, and root is not needed.
Unprivileged mknod() or use FUSE?
* Unless it calls prctl() with PR_SET_NO_NEW_PRIVS, it will continue to fail for non-root. I see nothing about this in the man page for the GNU version, but it's possible a vendor might ship a version of chroot which does this. If this patch does get implemented, future versions of the GNU tool might grow a command-line argument to enable this functionality (or they might not; I can't read the GNU people's collective mind).
* Because chroot(8) runs a separate executable after doing the chroot, shared libraries etc. need to be accessible from within the chroot environment. It is complicated (but not categorically impossible) for a non-privileged user to set this up.
2. chroot("/bar")
3. open("../..")