Google's Chromium sandbox

Posted Aug 20, 2009 0:58 UTC (Thu) by agl (guest, #4541)
In reply to: Google's Chromium sandbox by cventers
Parent article: Google's Chromium sandbox

That seems like a perfectly reasonable way to allocate memory for another
process. However, we would still need non-seccomp processes to receive the
file descriptor from the socket (recvmsg) and to do the mmap. The first
process need only share the descriptor table with the untrusted process, but
the second needs to share an address space for mmap to be effective. We
merge these two processes into one and, since it shares an address space, we
call it the 'trusted thread'.
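
To illustrate the two steps described above (receiving a descriptor with recvmsg, then mapping it), here is a minimal sketch. It is not Chromium's code: it runs in a single process, uses a socket pair to stand in for the two sides, and the names send_fd, recv_fd and share_page are made up for this example.

```python
import array
import mmap
import os
import socket
import tempfile


def send_fd(sock, fd):
    """Send one descriptor as SCM_RIGHTS ancillary data with a 1-byte payload."""
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", [fd]))]
    sock.sendmsg([b"F"], ancillary)


def recv_fd(sock):
    """Receive one descriptor via recvmsg(); this is the step seccomp forbids."""
    fds = array.array("i")
    _msg, ancdata, _flags, _addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, kind, data in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            fds.frombytes(data[: fds.itemsize])
            return fds[0]
    raise RuntimeError("no descriptor received")


def share_page(size=4096):
    """Pass a file descriptor over a socket and map it on the receiving side."""
    sender, receiver = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    with sender, receiver, tempfile.TemporaryFile() as backing:
        backing.truncate(size)
        send_fd(sender, backing.fileno())
        received = recv_fd(receiver)
        # Both mappings are shared (the Unix default), so they alias the same pages.
        owner_view = mmap.mmap(backing.fileno(), size)
        trusted_view = mmap.mmap(received, size)
        os.close(received)  # the mapping outlives the descriptor
        owner_view[:5] = b"hello"
        seen = bytes(trusted_view[:5])
        owner_view.close()
        trusted_view.close()
        return seen
```

In the real sandbox the receiving side must share an address space with the untrusted code for the mapping to be of any use to it, which is why the two helpers end up merged into one trusted thread.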


Posted Aug 20, 2009 8:59 UTC (Thu) by mingo (subscriber, #31122)

By the way (I raised this on lkml in the past, when the code I referred to was not yet upstream), there is a way to further tighten the restrictions on the untrusted seccomp thread, and hence improve its security: use the C expression filter engine that is included in the upstream kernel. (It is currently used by ftrace and will also be used by perfcounters.)

The engine accepts an ASCII, C-like expression at runtime, such as:

 "fd <= 2 && addr == 0x1234000 && len == 4096" 

... and parses that into a cached list of safe predicates that the kernel will execute atomically on syscall arguments. Once parsed (by the kernel), the execution of the filter expression is very fast.
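
As a rough user-space illustration of "parse once, then evaluate a cached predicate list", here is a sketch. It is not the kernel's implementation: compile_filter and matches are hypothetical names, and the sketch handles only integer comparisons joined by &&.

```python
import operator
import re

# Comparison operators understood by this sketch.
_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<=": operator.le,
    ">=": operator.ge,
    "<": operator.lt,
    ">": operator.gt,
}

# One clause: <name> <op> <decimal or 0x-prefixed hex literal>
_CLAUSE = re.compile(
    r"^\s*([A-Za-z_]\w*)\s*(==|!=|<=|>=|<|>)\s*(0[xX][0-9a-fA-F]+|\d+)\s*$"
)


def compile_filter(expression):
    """Parse the expression once into a list of (field, compare, value) predicates."""
    predicates = []
    for clause in expression.split("&&"):
        match = _CLAUSE.match(clause)
        if match is None:
            raise ValueError(f"cannot parse clause: {clause!r}")
        field, op, literal = match.groups()
        base = 16 if literal.lower().startswith("0x") else 10
        predicates.append((field, _OPS[op], int(literal, base)))
    return predicates


def matches(predicates, args):
    """Evaluate the cached predicates against a dict of named syscall arguments."""
    return all(compare(args[field], value) for field, compare, value in predicates)


predicates = compile_filter("fd <= 2 && addr == 0x1234000 && len == 4096")
allowed = matches(predicates, {"fd": 1, "addr": 0x1234000, "len": 4096})
denied = matches(predicates, {"fd": 5, "addr": 0x1234000, "len": 4096})
```

The parsing cost is paid once in compile_filter; each later check is a short loop over the predicate list with no string handling, which is the property that makes per-syscall evaluation cheap.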

Despite it being used for tracing currently, the filter engine is generic and can be reused not just to limit trace entries of syscalls, but also to restrict execution on syscalls.

This is real, working code very close to what you need. With the latest -tip tree you can use the filter engine on a per-syscall basis, and the kernel knows the parameter names of system calls. So on a test box I can do this:

  # cd /debug/tracing/events/syscalls/sys_enter_read

  # echo "fd <= 2 && buf == 0x120000 && count == 1024" > filter

  # cat filter 
  fd <= 2 && buf == 0x120000 && count == 1024

... and from that point on the kernel can execute that filter expression to limit trace entries that match the expression.

All you need is a small extension to seccomp that allows such expressions to be installed from user space by passing in the ASCII string. The filter engine can be used by unprivileged user space as well. (Obviously, the untrusted sandboxed thread should not be allowed to modify it.)

The filter engine has no deep dependence on tracing, other than currently being used by it. It is a safe parser and atomic script execution engine that unprivileged tasks can use too, so it could be reused in seccomp and by other Linux security frameworks as well, such as SELinux or netfilter.

Posted Aug 20, 2009 14:41 UTC (Thu) by paragw (guest, #45306)

Does this approach work on a per process basis? I.e. do the restrictions
apply to a particular process/thread while others are not impacted?

How would one control which process may specify what syscalls, with what
arguments, another process or thread can make? Is the change permanent, and
is it localized to the target thread? How does one safely modify the
restrictions dynamically, for example when the restricted thread needs to
open an FD, with user permission, that wasn't in the originally specified
restrictions list?

From what you described there seem to be some significant usability problems
(need to have tracing enabled, debug file system mounted, user-space access
to the filtering mechanism and per PID operation etc.) that need to be
addressed before it can become generally usable?

Posted Aug 20, 2009 19:33 UTC (Thu) by mingo (subscriber, #31122)

> Does this approach work on a per process basis? I.e. do the restrictions apply to a particular process/thread while others are not impacted?

It's an engine: it takes ASCII strings and turns them into what is in essence a 'filter object', which you can then attach to anything and pass values into for evaluation.

Note that there's nothing 'tracing' about that concept.

Right now we attach such filters to tracepoints - such as syscall tracepoints.

It could be attached via seccomp to an untrusted process as well, with a minimal amount of code, if there is interest in sharing this facility for such purposes.
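
To make the per-process question concrete, here is a purely hypothetical sketch of filter objects attached per task. No such seccomp interface is described in this thread. SyscallPolicy, its methods, and the default-deny rule for sandboxed tasks are all assumptions made for illustration.

```python
import operator


class SyscallPolicy:
    """Hypothetical table of per-task, per-syscall predicate lists."""

    def __init__(self):
        self._tasks = {}  # task id -> {syscall name -> list of predicates}

    def attach(self, task_id, syscall, predicates):
        """Attach a filter to one syscall of one task; other tasks are untouched."""
        self._tasks.setdefault(task_id, {})[syscall] = list(predicates)

    def allows(self, task_id, syscall, args):
        filters = self._tasks.get(task_id)
        if filters is None:
            return True  # task was never sandboxed: unaffected
        predicates = filters.get(syscall)
        if predicates is None:
            return False  # assumption: sandboxed tasks default to deny
        return all(compare(args[name], value) for name, compare, value in predicates)


policy = SyscallPolicy()
# Equivalent of "fd <= 2 && count == 1024" for read(), attached to task 1234 only.
policy.attach(1234, "read", [("fd", operator.le, 2), ("count", operator.eq, 1024)])

sandboxed_ok = policy.allows(1234, "read", {"fd": 0, "count": 1024})
sandboxed_bad = policy.allows(1234, "read", {"fd": 7, "count": 1024})
sandboxed_other = policy.allows(1234, "open", {})
unrelated_task = policy.allows(4321, "read", {"fd": 7, "count": 1024})
```

In this sketch only task 1234 is restricted; whether filters could be changed after installation, and by whom, is a policy question the sketch does not try to answer.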
