Google's Chromium sandbox

Posted Aug 19, 2009 15:37 UTC (Wed) by johill (subscriber, #25196)
Parent article: Google's Chromium sandbox

You once said 'process' rather than 'thread'; I think that was an error.

Also -- I first wondered why they weren't using processes to start with to get the secure/insecure boundary more defined, but once you think about it more it doesn't seem like you could then do the disasm stuff ... might be worth mentioning that :)

Either way, interesting method, and nice article!

Google's Chromium sandbox

Posted Aug 19, 2009 16:23 UTC (Wed) by jake (editor, #205)

I should have been more clear about why a thread is needed. Certain operations, memory allocation for example, cannot be done in one process on behalf of another because they don't share address space.

I don't think (though I don't know for sure) that a thread is required to do the disassembly. I believe that is done by the untrusted thread before it handles any user input, and before it enters seccomp mode.

jake
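For context on "before it enters seccomp mode": in the original (strict) seccomp mode, a thread may only call read(), write(), _exit() and sigreturn(). Any other system call gets it killed with SIGKILL, which is why setup work has to happen before the switch. The minimal sketch below assumes a Linux kernel built with CONFIG_SECCOMP. It runs the restricted code in a forked child so the parent can observe the outcome. run_strict_child() is a hypothetical name, and this is not Chromium's code.

```c
#define _GNU_SOURCE
#include <linux/seccomp.h>     /* SECCOMP_MODE_STRICT */
#include <signal.h>            /* SIGKILL, for interpreting the status */
#include <sys/prctl.h>         /* prctl(), PR_SET_SECCOMP */
#include <sys/syscall.h>       /* SYS_exit, SYS_getpid */
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that enters strict seccomp mode, writes a line, and then, if
 * try_forbidden is nonzero, calls getpid(), which strict mode does not allow.
 * Returns the child's raw wait status, or -1 if fork()/waitpid() failed. */
int run_strict_child(int try_forbidden)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT) != 0)
            _exit(2);                              /* seccomp unavailable: not yet restricted */
        static const char msg[] = "inside strict seccomp\n";
        write(STDOUT_FILENO, msg, sizeof msg - 1); /* write() is permitted */
        if (try_forbidden)
            syscall(SYS_getpid);                   /* not permitted: the kernel sends SIGKILL */
        /* glibc's _exit() issues exit_group(), which strict mode also forbids,
         * so use the plain exit system call. It does not return. */
        syscall(SYS_exit, 0);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

A child that stays within the four permitted calls exits normally. One that calls getpid() is terminated by SIGKILL.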

Google's Chromium sandbox

Posted Aug 20, 2009 0:43 UTC (Thu) by cventers (guest, #31465)

> I should have been more clear about why a thread is needed. Certain operations, memory allocation for example, cannot be done in one process on behalf of another because they don't share address space.

On the contrary, I experimented with a technique to do just that. This may not be the perfect solution for Chrome's needs, but I played around with the idea of open()ing a shared memory segment on the vfs, using ftruncate() to resize it, and then sending the fd via a UNIX-domain socket to the untrusted process and allowing it to mmap() the pages.
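The approach cventers describes can be sketched in C as follows, assuming Linux:

- An unlinked temporary file under /tmp stands in for the shared memory segment (a file on /dev/shm would work the same way).
- ftruncate() sizes it.
- The descriptor travels over a UNIX-domain socketpair as SCM_RIGHTS ancillary data.
- The receiving process mmap()s it.

The helpers send_fd(), recv_fd() and share_demo() are hypothetical names for illustration, not code from the comment.

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

#define MSG "hello from the child"

/* Send one descriptor over a UNIX-domain socket as SCM_RIGHTS ancillary data. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';                               /* at least one data byte must accompany it */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    memset(&ctl, 0, sizeof ctl);
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf };
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof(int));
    return sendmsg(sock, &msg, MSG_NOSIGNAL) == 1 ? 0 : -1;
}

/* Receive one descriptor; the kernel installs a new fd in this process. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union { struct cmsghdr align; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
    struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1,
                          .msg_control = ctl.buf, .msg_controllen = sizeof ctl.buf };
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;
    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    int fd;
    memcpy(&fd, CMSG_DATA(c), sizeof(int));
    return fd;
}

/* Create and size a segment, hand it to a child over a socketpair, and let
 * the child mmap() and write it. Returns 0 if the parent sees the write. */
int share_demo(size_t len)
{
    char path[] = "/tmp/shm-demo-XXXXXX";          /* stand-in for a file on /dev/shm */
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                                  /* only the descriptor keeps it alive */
    int sv[2];
    if (ftruncate(fd, (off_t)len) != 0 || socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                                /* receiving process */
        close(sv[0]);
        close(fd);                                 /* rely only on the passed descriptor */
        int got = recv_fd(sv[1]);
        char *p = got < 0 ? MAP_FAILED
                          : mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, got, 0);
        if (p == MAP_FAILED)
            _exit(1);
        snprintf(p, len, "%s", MSG);
        _exit(0);
    }

    close(sv[1]);
    int rc = send_fd(sv[0], fd);
    close(sv[0]);                                  /* child sees EOF if the send failed */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0 || rc != 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    char *q = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                                     /* the mapping outlives the descriptor */
    if (q == MAP_FAILED)
        return -1;
    int ok = strcmp(q, MSG) == 0 ? 0 : -1;
    munmap(q, len);
    return ok;
}
```

The child closes its inherited copy of the descriptor before receiving. This shows that it works purely from the fd passed over the socket. Growing the segment later is another ftruncate() followed by a fresh or extended mapping on each side.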

Now, in my case, I was using this technique to allow dynamically grown, runtime-allocated shared memory segments between untrusted processes. There are still complications (such as the need to install a SIGBUS handler, since the untrusted process might ftruncate() the mmap()ed fd to 0, causing the trusted process to fault when it tries to access its own mapping), and perhaps the requirements for this kind of implementation are not easy to satisfy for desktop applications. But it's Linux, and there's more than one way to do it. My implementation had the advantage of being architecture-agnostic, as well-behaved user-space code should be.
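The SIGBUS complication can be reproduced in one process. This is a sketch assuming Linux file-backed MAP_SHARED mappings. Once the file is truncated, touching a page that now lies past end-of-file raises SIGBUS. A handler using sigsetjmp()/siglongjmp() lets the code recover instead of dying. probe_after_truncate() is a hypothetical name, and a single process plays both the truncating peer and the faulting owner.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf bus_env;

/* SIGBUS handler: jump back out of the faulting access. */
static void on_sigbus(int sig)
{
    (void)sig;
    siglongjmp(bus_env, 1);
}

/* Map a one-page file, shrink it to zero bytes (what a misbehaving peer
 * holding the same fd could do), then touch the mapping again.
 * Returns 1 if SIGBUS was caught, 0 if not, -1 on setup failure. */
int probe_after_truncate(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);
    if (ftruncate(fd, (off_t)page) != 0) {
        close(fd);
        return -1;
    }
    char *map = mmap(NULL, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }
    volatile char *p = map;                        /* volatile: keep the accesses */
    p[0] = 'x';                                    /* backed by the file: fine */

    struct sigaction sa, old;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    int caught = -1;
    if (ftruncate(fd, 0) == 0) {                   /* the "peer" truncates the file */
        if (sigsetjmp(bus_env, 1) == 0) {
            char c = p[0];                         /* page now lies past EOF: SIGBUS */
            (void)c;
            caught = 0;
        } else {
            caught = 1;                            /* arrived here via siglongjmp */
        }
    }
    sigaction(SIGBUS, &old, NULL);
    munmap(map, page);
    close(fd);
    return caught;
}
```

Passing a nonzero second argument to sigsetjmp() saves the signal mask. The jump therefore unblocks SIGBUS again, so the handler keeps working for later faults.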

Google's Chromium sandbox

Posted Aug 20, 2009 0:58 UTC (Thu) by agl (guest, #4541)

That seems like a perfectly reasonable way to allocate memory for another
process. However, we would still need non-seccomp processes to receive the
file descriptor from the socket (recvmsg) and to do the mmap. The first
process need only share the descriptor table with the untrusted process, but
the second needs to share an address space for mmap to be effective. We
merge these two processes into one and, since it shares an address space, we
call it the 'trusted thread'.

Google's Chromium sandbox

Posted Aug 20, 2009 8:59 UTC (Thu) by mingo (subscriber, #31122)

Btw. (and I raised this on lkml in the past, when the code I referred to was not upstream yet), there's a way you could further tighten the restrictions (and hence increase the security) of the untrusted seccomp thread: by using the C-expression filter engine that is included in the upstream kernel (currently used by ftrace, and also going to be used by perfcounters).

The engine accepts an ASCII, C-ish expression at runtime, such as:

 "fd <= 2 && addr == 0x1234000 && len == 4096" 

... and parses that into a cached list of safe predicates that the kernel executes atomically on syscall arguments. Once the expression has been parsed (by the kernel), executing the filter is very fast.
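To make "a cached list of predicates" concrete, here is a toy sketch in C of the idea. It parses the expression once into (argument, operator, constant) triples, then evaluates them against syscall arguments. This is not the kernel's filter engine, whose grammar and internals are richer. The names struct filter, filter_compile() and filter_match() are hypothetical, and the toy accepts only clauses of the form `name op number` joined by "&&".

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PREDS 8

enum op { OP_EQ, OP_NE, OP_LT, OP_LE, OP_GT, OP_GE };

struct pred { int arg; enum op op; uint64_t value; };   /* one "name op number" clause */
struct filter { int n; struct pred preds[MAX_PREDS]; }; /* clauses are ANDed together */

/* Parse an expression such as "fd <= 2 && buf == 0x120000 && count == 1024"
 * once, resolving each name against 'names' (the syscall's parameters in
 * order). Returns 0 on success, -1 on anything this toy grammar rejects. */
int filter_compile(const char *expr, const char *const *names, int nnames,
                   struct filter *f)
{
    const char *s = expr;
    f->n = 0;
    for (;;) {
        while (*s == ' ')
            s++;
        const char *start = s;                       /* parameter name */
        while (isalnum((unsigned char)*s) || *s == '_')
            s++;
        size_t len = (size_t)(s - start);
        int arg = -1;
        for (int i = 0; i < nnames; i++)
            if (strlen(names[i]) == len && strncmp(names[i], start, len) == 0)
                arg = i;
        if (arg < 0 || f->n == MAX_PREDS)
            return -1;
        while (*s == ' ')
            s++;
        enum op op;                                  /* two-char operators first */
        if (strncmp(s, "==", 2) == 0)      { op = OP_EQ; s += 2; }
        else if (strncmp(s, "!=", 2) == 0) { op = OP_NE; s += 2; }
        else if (strncmp(s, "<=", 2) == 0) { op = OP_LE; s += 2; }
        else if (strncmp(s, ">=", 2) == 0) { op = OP_GE; s += 2; }
        else if (*s == '<')                { op = OP_LT; s += 1; }
        else if (*s == '>')                { op = OP_GT; s += 1; }
        else
            return -1;
        char *end;                                   /* base 0: accepts 1024 and 0x120000 */
        uint64_t value = strtoull(s, &end, 0);
        if (end == s)
            return -1;
        s = end;
        f->preds[f->n++] = (struct pred){ arg, op, value };
        while (*s == ' ')
            s++;
        if (*s == '\0')
            return 0;
        if (strncmp(s, "&&", 2) != 0)
            return -1;
        s += 2;
    }
}

/* Evaluate the compiled filter against one call's arguments; no parsing here. */
int filter_match(const struct filter *f, const uint64_t *args)
{
    for (int i = 0; i < f->n; i++) {
        uint64_t a = args[f->preds[i].arg], v = f->preds[i].value;
        int ok;
        switch (f->preds[i].op) {
        case OP_EQ: ok = a == v; break;
        case OP_NE: ok = a != v; break;
        case OP_LT: ok = a < v;  break;
        case OP_LE: ok = a <= v; break;
        case OP_GT: ok = a > v;  break;
        default:    ok = a >= v; break;
        }
        if (!ok)
            return 0;                                /* first false clause decides */
    }
    return 1;
}
```

The expensive string handling happens once, in filter_compile(). The per-call cost in filter_match() is then a short loop of integer comparisons.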

Although it is currently used for tracing, the filter engine is generic: it could be reused not just to limit the trace entries of syscalls, but also to restrict the execution of syscalls.

This is real, working code very close to what you need. With the latest -tip tree you can use the filter engine on a per-syscall basis, and the kernel knows the parameter names of system calls. So on a testbox I can do this:

  # cd /debug/tracing/events/syscalls/sys_enter_read

  # echo "fd <= 2 && buf == 0x120000 && count == 1024" > filter

  # cat filter 
  fd <= 2 && buf == 0x120000 && count == 1024

... and from that point on the kernel can execute that filter expression to limit trace entries that match the expression.

All you need is a small extension to seccomp to allow the installation of such expressions from user-space, by passing in the ASCII string. The filter engine can be used by unprivileged user-space as well. (but obviously the untrusted sandboxed thread should not be allowed to modify it.)

The filter engine has no deep dependence on tracing (other than being used by it currently) - it is a safe parser and atomic script execution engine that can be utilized by unprivileged tasks too and so it could be reused in seccomp and could be reused by other Linux security frameworks as well, such as selinux or netfilter.

Google's Chromium sandbox

Posted Aug 20, 2009 14:41 UTC (Thu) by paragw (guest, #45306)

Does this approach work on a per process basis? I.e. do the restrictions
apply to a particular process/thread while others are not impacted?

How would one control which process may specify what syscalls, with what
arguments, another process or thread is allowed to make, and is the change
permanent and localized to the target thread? How does one go about safely
modifying the restrictions dynamically, e.g. when the restricted thread needs
to open an FD, with the user's permission, that wasn't in the originally
specified restrictions list?

From what you described there seem to be some significant usability problems
(need to have tracing enabled, debug file system mounted, user-space access
to the filtering mechanism and per PID operation etc.) that need to be
addressed before it can become generally usable?

Google's Chromium sandbox

Posted Aug 20, 2009 19:33 UTC (Thu) by mingo (subscriber, #31122)

> Does this approach work on a per process basis? I.e. do the restrictions apply to a particular process/thread while others are not impacted?

It's an engine: it takes ASCII strings and turns them into what is, in essence, a 'filter object', which you can then attach to anything and pass values to for evaluation.

Note that there's nothing 'tracing' about that concept.

Right now we attach such filters to tracepoints - such as syscall tracepoints.

It could be attached via seccomp and to an untrusted process as well, with minimal amount of code, if there's interest to share this facility for such purposes.


Copyright © 2017, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds