> Why not just use shared memory for anything performance critical, such as data uploads to the GPU?
> Hardware 3D already usually communicates to a remote GPU via a DMA-based FIFO and uploads, so having an additional mechanism (faster due to using shared memory instead of DMA) shouldn't be the end of the world.
The FIFO is for the command queue, not large chunks of data like VBO uploads. There is no 'additional shared memory' mechanism, because such a thing doesn't even make sense, nor would it be remotely safe even if it did exist. The kernel DRI/DRM interfaces exist for a reason.
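To make the distinction concrete, here's a purely illustrative sketch (the command layout, handle type, and helper names are all made up, not any real driver's ABI): the FIFO carries small fixed-size commands that merely *reference* buffers, while the bulk vertex data lives in a buffer obtained from the kernel driver and pulled over DMA.

```c
/* Toy illustration only -- not any real driver's command format or ABI.  */
#include <stdint.h>
#include <string.h>

/* A FIFO entry is small and fixed-size: an opcode plus a buffer handle.  */
/* The bulk vertex data never travels through the FIFO itself.            */
struct gpu_cmd {
    uint32_t opcode;        /* hypothetical CMD_DRAW                       */
    uint32_t vbo_handle;    /* refers to a DMA buffer set up separately    */
    uint32_t vertex_count;
};

enum { CMD_DRAW = 1 };

/* Stand-ins for what the kernel DRM/DRI interfaces actually provide:     */
/* allocating a buffer object, mapping it, and queueing commands.         */
static uint32_t drm_create_buffer(size_t size)     { (void)size; return 42; }
static void    *drm_map_buffer(uint32_t handle)    { static float backing[1024]; (void)handle; return backing; }
static void     fifo_push(const struct gpu_cmd *c) { (void)c; }

static void upload_and_draw(const float *verts, uint32_t count)
{
    /* Bulk data goes into a kernel-managed buffer the GPU reads via DMA. */
    uint32_t handle = drm_create_buffer(count * 3 * sizeof(float));
    memcpy(drm_map_buffer(handle), verts, count * 3 * sizeof(float));

    /* Only a tiny command referencing that buffer goes into the FIFO.    */
    fifo_push(&(struct gpu_cmd){ CMD_DRAW, handle, count });
}

int main(void)
{
    float tri[9] = { 0 };
    upload_and_draw(tri, 3);
    return 0;
}
```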
> As for context switches, most modern CPUs are multicore, so you might not need any actual context switches at all (just some cacheline bouncing).
You don't appear to understand how multi-core CPUs or multi-tasking operating systems work. Of course there is going to be a context switch involved. What you're suggesting implies that the other core will have a process sitting there busy-waiting on an atomic, eating up 100% of the processing time on that core, just in case the sandboxed process might want to do something. That would be a ridiculously bad idea.
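For concreteness, the scheme you're describing amounts to something like the following (a sketch using C11 atomics; the flag and loop are hypothetical). Note what the inner loop spends its time doing: nothing, at full speed.

```c
/* The busy-wait scheme described above, sketched with C11 atomics.       */
/* This is the anti-pattern: the loop pegs a core at 100% even when the   */
/* sandboxed process never asks for anything.                             */
#include <stdatomic.h>

atomic_int request_ready = 0;   /* set by the sandboxed process           */

void privileged_service_loop(void)
{
    for (;;) {
        /* Spin until the other side raises the flag -- no syscall, no    */
        /* context switch, but the core is doing nothing useful.          */
        while (!atomic_load_explicit(&request_ready, memory_order_acquire))
            ;                    /* burn cycles */
        atomic_store_explicit(&request_ready, 0, memory_order_release);
        /* ... handle the request from shared memory ... */
    }
}
```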
Any privileged process -- on another core or not -- is going to be blocked in a syscall waiting for an IPC message of some form, and calling a remote method on that privileged process from the sandboxed one will require OS context switches -- a minimum of four in total, in fact (switching out of the sandboxed process and into the privileged one for the request, then back out and in again for the reply). It would actually be faster to _not_ have the privileged process on another core, due to the additional overhead of sharing data between cores, and if such a scheme were used, the processor-affinity facilities should be used to coerce both processes onto the same core.
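Here's a rough sketch of what that blocking path looks like, assuming Linux and a made-up one-byte "protocol": the privileged helper sleeps in read() using no CPU until a request arrives, and sched_setaffinity pins both ends to the same core so the round trip doesn't also pay for cross-core data movement.

```c
/* Sketch of a synchronous request/reply over a socketpair (Linux-only;   */
/* the one-byte "protocol" is made up).                                    */
#define _GNU_SOURCE
#include <sched.h>
#include <sys/socket.h>
#include <unistd.h>

/* Pin the calling process to one core so both ends share it and the      */
/* request data stays in that core's cache.                                */
static void pin_to_cpu(int cpu)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    sched_setaffinity(0, sizeof(set), &set);
}

int main(void)
{
    int sv[2];
    socketpair(AF_UNIX, SOCK_STREAM, 0, sv);

    if (fork() == 0) {                     /* privileged helper             */
        close(sv[0]);
        pin_to_cpu(0);
        char req;
        while (read(sv[1], &req, 1) == 1)  /* blocked in a syscall, 0% CPU  */
            write(sv[1], "!", 1);          /* reply wakes the caller        */
        _exit(0);
    }

    close(sv[1]);
    pin_to_cpu(0);                         /* same core as the helper       */
    char reply;
    write(sv[0], "?", 1);                  /* switch away from this process */
    read(sv[0], &reply, 1);                /* ...and back in for the reply  */
    close(sv[0]);                          /* lets the helper exit, too     */
    return 0;
}
```

The write()/read() pair on each side is where those context switches happen; the difference from the spin loop above is that an idle helper costs nothing.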
> I'm not sure whether this additional IPC overhead would be actually higher than the performance degradation imposed by limiting the instruction set (for example, memory accesses seem to have extra overhead due to that).
Memory accesses do not have extra overhead in the NaCl implementation. The segmented memory model is a core part of the x86 instruction set and is always active, even if normally all segments are set up to 'contain' the entire address space. Using it to isolate memory is effectively free. The only reason it's not used normally to isolate processes is that the CPU by itself doesn't stop a program from changing which segments it uses, so without a software arbiter that rejects programs containing those instructions before they ever run, it would not be effective protection.
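To illustrate what the 'software arbiter' part means in practice, here's a toy load-time check (emphatically not NaCl's actual validator, which fully decodes the instruction stream and enforces alignment and control-flow rules on top of this): scan the module's code before it ever runs and refuse to load it if it contains instructions that could reload a segment register.

```c
/* Toy illustration of a load-time code check -- NOT NaCl's real          */
/* validator, which fully decodes x86 and enforces far more rules.        */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reject a few one-byte opcodes that load segment registers. A real      */
/* decoder has to handle prefixes, multi-byte opcodes (e.g. 0F B2 LSS),   */
/* and instruction boundaries properly; this naive byte scan does not.    */
static bool looks_forbidden(uint8_t op)
{
    switch (op) {
    case 0x8E:              /* MOV Sreg, r/m      */
    case 0xC4:              /* LES r, m (32-bit)  */
    case 0xC5:              /* LDS r, m (32-bit)  */
    case 0x07:              /* POP ES             */
    case 0x17:              /* POP SS             */
    case 0x1F:              /* POP DS             */
        return true;
    default:
        return false;
    }
}

/* Returns true only if no forbidden byte appears; only then would the    */
/* loader map the code as executable.                                     */
static bool validate_module(const uint8_t *code, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (looks_forbidden(code[i]))
            return false;
    return true;
}
```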
> Of course, you could also in principle trust the OS to be secure, and run arbitrary code in a security context with limited privileges, but with access to the GPU and other useful stuff
Most operating systems do not actually let you set up a sandbox like this, Linux included (unless you make something like SELinux mandatory for your browser to work, which won't go over well with anyone but Fedora/RHEL users). Sandboxing processes is a relatively recent addition to the security toolbox (despite how obviously powerful it is), and most OSes haven't yet caught up to the needs of these techniques, making frameworks like NaCl necessary for now.
Again, Google's engineers know what they're talking about, and you seem to have some holes in your knowledge of these topics. Please just go read their documentation. It's very easy to find and quite easy to understand.
> the history of local root holes on all OSes (not to mention the graphics drivers...) makes this probably an unwise choice.
That logic implies that all security is worthless and we should just stop trying to protect anything, because all OSes have local root holes and hence cannot be protected at all. A more useful way to look at it is that holes will be found, they will get fixed, life will move on, and people will still be more secure (no, not absolutely secure, but 'more' is still better than 'less') with sandboxed processes than they were without them.