You can't easily analyze a binary to make sure there are no invalid opcodes in it. The problem is that you can jump into the middle of data, or to an address that is not the start of an instruction, and the chip will start executing from there.
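To make the concern concrete, here is a toy sketch in Python. It models only two real x86 encodings (`B8 imm32` is `mov eax, imm32` and `CD ib` is `int ib`); the `decode` helper is made up for illustration and is nowhere near a real disassembler.

```python
def decode(code: bytes, start: int) -> list[str]:
    """Linearly decode `code` from byte offset `start` (toy model, two opcodes)."""
    out = []
    pc = start
    while pc < len(code):
        op = code[pc]
        if op == 0xB8 and pc + 5 <= len(code):
            # B8 id: mov eax, imm32 (little-endian immediate)
            imm = int.from_bytes(code[pc + 1:pc + 5], "little")
            out.append(f"mov eax, {imm:#x}")
            pc += 5
        elif op == 0xCD and pc + 2 <= len(code):
            # CD ib: int imm8 (software interrupt)
            out.append(f"int {code[pc + 1]:#x}")
            pc += 2
        else:
            # Anything else is outside this toy model.
            out.append(f"(byte {op:#04x} not modeled)")
            pc += 1
    return out


code = bytes([0xB8, 0xCD, 0x80, 0x00, 0x00])
print(decode(code, 0))  # one harmless-looking mov
print(decode(code, 1))  # same bytes, entered one byte later, start with an int 0x80
```

The same five bytes give a different instruction stream depending on where execution enters, which is why scanning from the top of the binary alone proves nothing.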
Posted Aug 6, 2011 10:28 UTC (Sat) by elanthis (guest, #6227)
That is why they allow only certain opcode patterns and disallow any pattern that cannot be easily verified. They do require a special compiler, which generates machine code that will pass the verifier and implements the tricks needed to actually work in the sandboxed environment. All of this applies significantly more knowledge of the various hardware architectures than you apparently think a team of Google's top engineers is capable of. They have written several in-depth papers on these tricks, have implemented them fully in completely open source code, and have had them working in real environments for quite a while now.
On x86 in particular, they rely on several features of the architecture. One is the segmented memory model; another is the ability to ban any code containing instructions that change segments; yet another is very tight control over where branches can appear and what they can target. Non-writable code pages, along with non-executable data pages, ensure that the untrusted code cannot subvert the machine code verifier by modifying or creating machine code after it has been checked. Simple trampolines handle the code segment changes and stack pointer swaps necessary to call into and return from the trusted code.
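A rough sketch of the kind of checks such a verifier makes, in Python, over an already-decoded toy instruction list. This is not their validator: the `Insn` type, the mnemonics, and the `BANNED` set are hypothetical, and the 32-byte bundle size with a masked indirect jump is how I remember the papers describing the branch control, so treat both as assumptions.

```python
from dataclasses import dataclass
from typing import Optional

BUNDLE = 32  # assumed bundle size in bytes
BANNED = {"mov_to_seg", "far_call", "syscall", "int"}  # hypothetical mnemonics


@dataclass(frozen=True)
class Insn:
    offset: int                   # byte offset where the instruction starts
    length: int                   # encoded length in bytes
    op: str                       # toy mnemonic, not a real x86 decoding
    target: Optional[int] = None  # destination offset of a direct branch


def verify(insns: list[Insn]) -> list[str]:
    """Return a list of violations; an empty list means the toy code passes."""
    errors = []
    starts = {i.offset for i in insns}
    expected = 0
    for i in insns:
        # Code must decode as one contiguous stream starting at offset 0.
        if i.offset != expected:
            errors.append(f"{i.offset:#x}: gap or overlap")
        expected = i.offset + i.length
        # Segment-changing and similar instructions are rejected outright.
        if i.op in BANNED:
            errors.append(f"{i.offset:#x}: banned instruction {i.op}")
        # No instruction may straddle a bundle boundary, so every bundle
        # start is also an instruction start.
        if i.offset // BUNDLE != (i.offset + i.length - 1) // BUNDLE:
            errors.append(f"{i.offset:#x}: straddles a bundle boundary")
        # Indirect jumps are only allowed in the masked form ("jmp_masked"),
        # which forces the destination onto a bundle start.
        if i.op == "jmp_indirect":
            errors.append(f"{i.offset:#x}: unmasked indirect jump")
        # Direct branches must land on the start of a known instruction.
        if i.target is not None and i.target not in starts:
            errors.append(f"{i.offset:#x}: branch into the middle of an instruction")
    return errors


good = [Insn(0, 5, "mov"), Insn(5, 2, "jmp", target=0),
        Insn(7, 25, "nop_pad"), Insn(32, 3, "jmp_masked")]
bad = [Insn(0, 5, "mov"), Insn(5, 2, "jmp", target=3)]
print(verify(good))
print(verify(bad))
```

The point of the design is that every place control can land, whether through a direct branch or a masked indirect one, is an offset the verifier has already decoded as an instruction start, so the "jump into the middle of data" trick has nowhere to go.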
If you want more information, just go read their documentation and papers. It's all very accessible and easy to grok.