Hmm, if the linked source code is indeed the sandbox they use in Chrome, then this is really weird code.
- They appear the create a file system namespace, but then use old chroot() instead of overmounting the root dir inside it. This is a weird combination. Either you chroot() or you use fs namespaces+bind mounts but mixing both half-way is weird.
- The try to dynamically allocate an "unused" UID. In Linux we really don't have any sane infrastructure for that and trying to do that independently of any low-level infrastructure that can ensure we don't end up in UID clashes is just risky business.
- They are not setting the capability bounding set but the other sets even though the bounding set is probably the most interesting one.
Tinnes: Introducing Chrome's next-generation Linux sandbox
Posted Sep 8, 2012 21:04 UTC (Sat) by hibiscus (subscriber, #86633)
[Link]
You have been looking at the setuid sandbox (the old sandbox), not at the new one (seccomp-bpf based).
It is weird code, no questions, but you have been misreading it in parts:
- No file system namespace is created. It's much worse that than ;)
Instead, a privileged process shares its filesystem view (CLONE_FS) with the future sandboxed process. The future sandboxed process can start and initialize normally, with filesystem access. When it's ready to be sandboxed, it sends a message over an IPC channel asking to be chrooted. The privileged process will then chroot to /proc/self/fdinfo and exit. The sandboxed process is now chrooted to a private inode that doesn't even appear on any file system.
Sharing the FS view between a privileged and untrusted process is scary.
- The uid thing only uses "safe uid" that need to be administratively defined for this purpose. To prevent uid collisions, it relies on setuid() failing with RLIMIT_NPROC to 1 (something that is true since Linux 2.6, as many security guys know).
Yep, it's still a bit too clunky / weird, and it's dropped in favor of PID namespaces.
That's why this code is included inside:
/* this is not implemented yet (PoC code below) */
#ifdef POC_CODE_DONT_RUN
(also FYI, Chrome has an independent implementation in its code base, because Chrome needs to do even more weird stuff, like closing specific magic file descriptors in the helper process so that they stay unique in the system).
- Capabilities bounding sets were not widely available when this was written. At first, many kernels didn't even support CLONE_NEWPID (2.6.24), which was a much bigger problem.
Also using the capabilities bounding set is ignoring the forest for the tree. The privileged process has uid 0 (so it can rewrite your kernel files on disk!). Capabilities limitation won't bring much security. Instead it was thought that capabilities, as well as RLIMIT_NOFILE to 0 could help in "confused deputy" situation brought by the scary practice of sharing the filesystem view with an untrusted process.
But yeah, in this modern age, it could be updated to use the new capabilities bounding set, and even more importantly PR_SET_NO_NEW_PRIVS.
Tinnes: Introducing Chrome's next-generation Linux sandbox
Posted Sep 10, 2012 6:16 UTC (Mon) by mezcalero (subscriber, #45103)
[Link]
Well, it's a good idea for modern code not to use chroot() anymore for these kind of things, but simply overmount /.
And the RLIMIT_NPROC thing is completely useless as it will only protect you from UID clashes with running processes. But there are many services that run only temporarily or are started stopped at any time. The RLIMIT_NPROC check is entirely mislead and wrong.
Tinnes: Introducing Chrome's next-generation Linux sandbox
Posted Sep 10, 2012 17:42 UTC (Mon) by hibiscus (subscriber, #86633)
[Link]
- Yeah, CLONE_NEWNS + mount should work too, but I think there were unwanted side effects. Perhaps problems with clean-up ? There shouldn't, it should be cleaned up when the last process dies. Maybe we didn't want to rely on namespaces, which were quite new for the main functionality.
- As I tried to explain before, the UID ranges need to be administratively defined, exclusively for the purpose of the sandbox, in the same way that uids are allocated to given users or daemon. I've always been adamant about that.
Also note that uid collisions (from sandboxed processes) are not a big issue, the ptrace check won't pass on uid-only match, and the processes are marked non dumpable. Signals would be the biggest issue with collisions.
But anyway, nowadays setuid() doesn't fail anymore, and uid ranges have stayed at the level of POC, deprecated in favor of PID namespaces.
Tinnes: Introducing Chrome's next-generation Linux sandbox
Posted Sep 10, 2012 21:40 UTC (Mon) by hibiscus (subscriber, #86633)
[Link]