> It seems to me that this kind of sandboxing is required by many (all?)
> programs dealing with potentially hostile data....
> If the kernel would provide a flexible mechanism for an application to
> limit what it can do, the threat of hostile data could be reduced.
I thought that this was what selinux was all about.
The basic idea behind selinux is that rather than using identity-based security, you use capability-based security.
Identity-based security works like this: I am a process started by bob, therefore I can do everything bob can do. Capability-based security works like this: bob starts a process and gives it only the capabilities it needs to do the work it's supposed to do.
So when bob runs a spell-checker program (aspell or whatever), it shouldn't have the capability to open network sockets and send messages to evilhackers.com. It's the difference between giving the application a few keys, to open the doors it needs, and giving it the whole keyring, which is what we do with traditional uid / gid based security.
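To make the keys-versus-keyring distinction concrete, here is a toy model in Python. It is only a sketch: the rights table, the `Process` class and both check functions are names I made up for illustration, and none of it corresponds to how selinux or the kernel actually represents permissions.

```python
from dataclasses import dataclass, field

# Toy model only: hypothetical names, not a real kernel or selinux interface.
USER_RIGHTS = {"bob": {"read_file", "write_file", "open_socket"}}


@dataclass
class Process:
    owner: str
    # Explicit grants; only the capability-style check looks at these.
    granted: frozenset = field(default_factory=frozenset)


def identity_allows(proc, action):
    """Whole keyring: the process may do anything its owner may do."""
    return action in USER_RIGHTS.get(proc.owner, set())


def capability_allows(proc, action):
    """A few keys: the process may do only what it was explicitly handed."""
    return action in proc.granted


# bob starts a spell checker and hands it just the right to read files.
spell_checker = Process(owner="bob", granted=frozenset({"read_file"}))
```

Under the identity check, `identity_allows(spell_checker, "open_socket")` is true because bob himself may open sockets. Under the capability check the same request is refused, while `"read_file"` still goes through.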
It seems like what the Google people are trying to do here is to reinvent the selinux concept with seccomp. I'm curious as to why. I guess selinux is difficult to set up and configure, and a lot of distributions have been slow to adopt it. Perhaps they are also trying to be cross-platform?
I'm also curious why Google is using threads rather than processes here. If you don't want to share your memory with the untrusted guy, processes are the obvious solution. As others have noted, you can always use POSIX shared memory if you feel the need to directly access the memory of the untrusted guy. As a bonus, you could run the untrusted processes as "nobody," and prevent them from doing a lot of nasty things -- even on a system like openBSD, where seccomp and selinux are unheard of.
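Roughly what I have in mind with processes, as a minimal Python sketch rather than anything Google actually does: fork a child, let it handle the hostile data in its own copy of the address space, and read the result back over a pipe. The names `run_untrusted`, `STATE` and `drop_to` are mine, the `upper()` call is a stand-in for the real parsing work, and switching to another user only works if the parent was started as root.

```python
import os
import pwd

# Parent-side state, used to show that the child cannot change it.
STATE = {"touched_by": "parent"}


def run_untrusted(data, drop_to=None):
    """Handle `data` in a forked child; return the bytes the child sends back."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the untrusted side, with its own copy of the address space.
        status = 1
        try:
            os.close(read_fd)
            if drop_to is not None:
                # Requires root. Order matters: groups, then gid, then uid.
                user = pwd.getpwnam(drop_to)
                os.setgroups([])
                os.setgid(user.pw_gid)
                os.setuid(user.pw_uid)
            STATE["touched_by"] = "child"  # changes only the child's copy
            os.write(write_fd, data.upper())  # placeholder for risky parsing
            status = 0
        finally:
            os._exit(status)  # never fall back into the parent's code path
    # Parent: the trusted side. Read until the child closes its end.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

Calling `run_untrusted(b"hello")` gives back `b"HELLO"`, and `STATE` in the parent still says `"parent"` afterwards, since the child only scribbled on its own copy. When started as root, `run_untrusted(data, drop_to="nobody")` would do the same work as "nobody". If the privilege drop fails, the child exits without doing anything and the parent gets an empty result.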
I seem to remember that the openBSD ssh daemon was written in a similar way. There was a trusted part which ran as root, and an untrusted part which ran as a regular user.