Google's Native Client forges ahead

July 27, 2011

This article was contributed by Nathan Willis

In June, Google released an update to Native Client (NaCl), its open source framework that lets web developers deploy faster applications by running native binary code in a sandboxed environment within the browser. The new release incorporates API changes and updates to the SDK and toolchain, but the technology remains disabled by default in the Chrome browser. NaCl has been listed as "experimental" since its inception, but the company is beginning to shift its message, trying to attract developers to the platform and other browser makers to the framework.

NaCl is essentially a plugin in which "untrusted" native code can be executed in a secure, sandboxed environment within the browser. Native code in this context means machine language: compiled binaries, delivered as self-contained modules. They do not have access to OS subsystems or toolkits, but only to a minimal support library provided by NaCl. Most other browser plugins (Java, Flash, etc.) are already native code, of course, and like them NaCl modules can only interact with the containing page's contents through JavaScript and a restrictive API. The mere mention of Java and Flash raises warning flags about security and performance, to which Google is doing its best to respond.

The project has been in development since 2008, and originally ran only on 32-bit x86 architectures, although ARM and 64-bit x86 implementations are now under development as well. Google describes the goal of NaCl as enabling developers to leverage existing software components and legacy applications, and to develop more compute-intensive web applications that would run too slowly in JavaScript or HTML5 — all without compromising security.

Shaking it out

The NaCl plugin isolates code in the sandbox by using the memory segmentation available in processes, thus providing a contiguous, private address space for each component — currently 256MB in size. It also attempts to detect insecure code (and refuses to run it), by restricting each component to a set of "safe" instructions, and enforcing structural rules to prevent code obfuscation techniques — such as jumping to a location in the middle of an instruction. Loaded modules are also read-only in memory, to prevent self-modifying code.
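The two structural checks described above (an allow-list of "safe" instructions, and no jumps into the middle of an instruction) can be sketched as a toy validator. This is a minimal illustration in Python, not NaCl's actual verifier: the `Insn` record, the `SAFE_MNEMONICS` allow-list, and the `validate` function are hypothetical names, and the instruction stream is assumed to be already decoded into offsets and lengths.

```python
from typing import List, NamedTuple, Optional


class Insn(NamedTuple):
    offset: int                    # byte offset of the instruction's first byte
    length: int                    # encoded length in bytes
    mnemonic: str
    target: Optional[int] = None   # destination offset, for direct jumps only


# Hypothetical allow-list for illustration; real rules are far more detailed.
SAFE_MNEMONICS = {"mov", "add", "cmp", "jmp", "jne", "nop"}
JUMPS = {"jmp", "jne"}


def validate(code: List[Insn]) -> List[str]:
    """Return the rule violations found; an empty list means the module passes."""
    errors = []
    starts = {insn.offset for insn in code}   # every legal jump destination
    expected = 0
    for insn in code:
        # Instructions must tile the code region with no gaps or overlaps.
        if insn.offset != expected:
            errors.append(f"gap or overlap at offset {insn.offset}")
        expected = insn.offset + insn.length
        # Only allow-listed instructions may appear.
        if insn.mnemonic not in SAFE_MNEMONICS:
            errors.append(f"unsafe instruction '{insn.mnemonic}' at offset {insn.offset}")
        # A direct jump must land on the first byte of some instruction.
        if insn.mnemonic in JUMPS and insn.target not in starts:
            errors.append(f"jump at offset {insn.offset} does not target an instruction start")
    return errors


good = [Insn(0, 5, "mov"), Insn(5, 2, "cmp"), Insn(7, 2, "jne", 0), Insn(9, 1, "nop")]
bad = [Insn(0, 5, "mov"), Insn(5, 2, "jmp", 2), Insn(7, 1, "int")]

print(validate(good))
print(validate(bad))
```

The second module is refused for two reasons: its jump lands at offset 2, inside the five-byte `mov`, and it contains an instruction that is not on the allow-list.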

In addition to the "inner sandbox" dedicated to isolating native code modules, NaCl also implements an "outer sandbox" that intercepts any system calls. Furthermore, code modules are isolated from each other. They can only communicate by calling NaCl's inter-module communication (IMC) mechanism. IMC is a bi-directional datagram service designed to resemble Unix domain sockets. IMC is also used to facilitate communication between modules and the document object model (DOM) object that created them (e.g. a web page or JavaScript application). The DOM object, of course, can pass messages between native modules or provide them access to shared storage.
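Since IMC is described as resembling Unix domain sockets, the messaging model can be approximated with ordinary Unix domain datagram sockets. The sketch below uses Python's standard `socket` module on a POSIX system; it does not use NaCl's IMC API, and the names `module_a` and `module_b` are purely illustrative.

```python
import socket

# A connected pair of Unix domain datagram sockets, one end per "module".
module_a, module_b = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)

try:
    # Datagrams preserve message boundaries: two sends arrive as two messages.
    module_a.send(b"request: resize image")
    module_a.send(b"request: apply filter")
    first = module_b.recv(4096)
    second = module_b.recv(4096)

    # The channel is bi-directional, so the receiver replies on the same socket.
    module_b.send(b"reply: done")
    reply = module_a.recv(4096)
finally:
    module_a.close()
    module_b.close()

print(first, second, reply)
```

Unlike a byte stream, each `recv` here returns exactly one message, which is the property that makes a datagram service a convenient base for higher-level RPC layers.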

NaCl also provides two higher-level mechanisms built on top of IMC: the Simple Remote Procedure Call (SRPC) facility, and an implementation of the traditional Netscape Plugin API (NPAPI). SRPC can be used to access native module routines from other modules or directly from JavaScript, while the NPAPI implementation provides access to the same browser facilities and information open to other browser plugins.

Each NaCl module also runs as its own OS process (although at the moment, the NaCl plugin itself is run in the browser's process). NaCl cannot provide secure, cross-platform exception handling for modules to recover from hardware exceptions. As a result, a module that triggers a hardware exception will be shut down by the OS, but, by running each module in its own process, other modules should be unaffected.
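The benefit of one process per module can be seen with plain OS processes. In this sketch, which assumes a POSIX system and uses only Python's standard library, `os.abort()` stands in for a hardware exception and `run_module` is a hypothetical helper; nothing here is NaCl code.

```python
import subprocess
import sys


def run_module(source: str) -> int:
    """Run a snippet in its own OS process and return that process's exit status."""
    return subprocess.run([sys.executable, "-c", source]).returncode


# One "module" dies abnormally and is shut down by the OS ...
crashed = run_module("import os; os.abort()")
# ... while a second module, in a separate process, still runs to completion.
healthy = run_module("print('still running')")

print("crashed module status:", crashed)
print("healthy module status:", healthy)
```

The parent only observes a non-zero exit status for the failed child; its own state, and that of the other child, is untouched.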

Developing NaCl modules

For application developers, the project is also introducing a native code API named Pepper, which is currently provided in C and C++ form. Pepper evolved out of Google's earlier efforts to expand on NPAPI, and is thus sometimes referred to in NaCl documentation as the Pepper Plugin API (PPAPI).

Pepper includes interfaces for NaCl's messaging systems and the existing NPAPI functionality, but also provides interfaces for image handling, 2D drawing, and audio, plus memory management, timing, threads, strongly typed variables, and managing module instances.

June's 0.4 release of the NaCl SDK includes minor changes to the C interfaces, and introduces a new method for including an NaCl module in an HTML page: by linking to it with the src= attribute inside of an <embed> tag. However, there are more substantial changes in the build system. It has migrated to the Python-based SCons build tool in place of GNU make, Cygwin has been removed from the Windows toolchain, and experimental support for Valgrind on 64-bit Linux has been added.

The toolchain itself is built on top of a customized version of GCC and GNU binutils that implement the constraints of the NaCl sandbox. Thus re-compilation is necessary, even for the "existing software components" and "legacy applications" use cases. The NaCl plugin provides a C library customized from NewLib.

As discussed earlier, the current SDK can build binary modules for x86-32, x86-64, and ARM, and there are mechanisms for web developers to provide all three varieties of their module within an application. Google is intent on expanding the processor support offerings, however, by adapting the build tools to produce a "portable" binary instead of the processor-specific code. Portable NaCl (PNaCl) compiles source to an intermediate LLVM bytecode format, which is then translated at runtime into the relevant machine code.
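The idea behind a portable binary, one architecture-neutral representation that is translated for the local processor only when it is loaded, can be shown with a toy translator. This is only a sketch of the concept: the two-operation "portable program", the `TARGETS` table, and the `translate` function are made up for illustration and have nothing to do with the real LLVM bitcode format or the PNaCl translator.

```python
# A made-up "portable" program: load the constant 1, then add 2 to it.
PORTABLE_PROGRAM = [("load", 1), ("add", 2)]

# Per-target templates (Intel-syntax x86 and ARM); "{}" is the operand slot.
TARGETS = {
    "x86-32": {"load": "mov eax, {}", "add": "add eax, {}"},
    "arm": {"load": "mov r0, #{}", "add": "add r0, r0, #{}"},
}


def translate(program, target):
    """Lower each portable operation to one instruction for the given target."""
    templates = TARGETS[target]
    return [templates[op].format(operand) for op, operand in program]


for target in TARGETS:
    print(target, translate(PORTABLE_PROGRAM, target))
```

The developer ships `PORTABLE_PROGRAM` once; supporting a new processor means adding an entry to the translator rather than rebuilding every application.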

Google maintains a gallery of NaCl examples, including a Monte Carlo pi calculator, an audio synthesizer, and Conway's Game of Life. The NaCl white papers also describe internal efforts to port Quake, Bullet, and an H.264 decoder to NaCl, and claim the performance to be "indistinguishable" from normal executables, although that code has evidently not been released to the public.

The view outside the Googleplex

From a security standpoint, most of the ideas implemented by NaCl are not new. Rather than using code signing to provide a measure of security as ActiveX does for its binary modules, NaCl uses a static verifier to check all modules before they are allowed to run, and terminates any that pass that check and still manage to make an unsafe system call. The fault-isolation methods used by the code verifier are also well-known. On the development side, the modified GCC and binutils act as a "trusted" compiler, in theory ensuring that no unsafe code gets executed in the first place. Code that does not conform to the structural and alignment requirements that the toolchain enforces will be rejected by the verifier.

Reaction from other browser vendors has been decidedly negative, however. Although NaCl is marketed as an open source project open to any browser developer, both Mozilla and Opera have said they have no interest in the technology, and view it as conflicting with the goal of promoting open standards like HTML5 as the unified, cross-platform target platform for web application developers.

In addition, both browser vendors have focused attention on refuting Google's claim that NaCl enables substantially faster applications in the first place, citing the increased performance of modern JavaScript engines. Last year, Mozilla's Chris Blizzard demonstrated a JavaScript version of Google's own NaCl photo-editing demo running at comparable speeds — although video of the session does not appear to be online, so it is unclear on which version of Firefox the demo ran.

The specific version could make a difference; Mozilla introduced TraceMonkey (a JavaScript optimizer that compiles certain JavaScript loops down to native code) with the release of Firefox 3.5 in 2009. Firefox 4.0 then introduced a second optimizer named JaegerMonkey, further improving performance. JaegerMonkey is a "just in time" (JIT) compiler that also compiles JavaScript to machine code, and is similar to the optimizer employed by Chrome. Mozilla claims that Firefox achieves better JavaScript performance through the fail-over combination of TraceMonkey and JaegerMonkey than JIT-only solutions. Its successor IonMonkey is projected to perform better still.

Of course, NaCl lines up with Google's interest in promoting the ChromeOS platform. If NaCl can squeeze additional performance out of netbook CPUs with code delivered in the browser, the need for locally-installed applications is reduced. But that concern may not line up with increasing the performance of standards-based web applications that run in every browser. The NaCl project itself is not on a standardization path, although the FAQ hints at interest in pursuing it.

If Google remains unsuccessful at persuading the other browsers to include support for NaCl, it might attempt to build NaCl plugins for the other browsers (which it did in years past, although that approach has been deprecated due to the limitations of having only the NPAPI interface). But it may have a harder time convincing a significant number of developers to re-engineer their applications for NaCl. As tantalizing as "native speed" sounds from afar, the double sandbox security restrictions, limited execution environment, and current need to develop for three separate processor architectures do not sound as exciting up close. As for PNaCl's promise to eliminate the architecture problem by targeting an intermediate byte-code representation instead, that platform starts to sound more and more like client-side Java. Perhaps it does hold the key for a performance increase, but it is not going to be an easy sales pitch.



The part I don't like...

Posted Jul 28, 2011 4:33 UTC (Thu) by tstover (subscriber, #56283) [Link]

is that it's in a browser. Sounds like it would actually make some sense otherwise. We have yet to see a portable, sandbox-able, target come out of llvm.

You can pull it out of browser if you want...

Posted Jul 28, 2011 9:08 UTC (Thu) by khim (subscriber, #9252) [Link]

Actually it's quite possible to use NaCl independently from browser. You need to implement SRPC to talk to sel_ldr, but SRPC itself is completely independent from browser. The tricky part is to have code usable both in browser and outside of browser.

Google's Native Client forges ahead

Posted Jul 28, 2011 7:55 UTC (Thu) by ikm (subscriber, #493) [Link]

> As for PNaCl's promise to eliminate the architecture problem by targeting an intermediate byte-code representation instead — that platform starts to sound more and more like client-side Java

This comparison is incorrect. LLVM uses bytecode too, but that doesn't make it resemble Java.

Google's Native Client forges ahead

Posted Jul 28, 2011 8:46 UTC (Thu) by kragilkragil2 (guest, #76172) [Link]

AFAIK they use what they call bitcode, and it translates much more directly to native machine language. I can't be bothered to really check, but I guess it looks more like a machine-independent assembly with more primitive idioms.
More like for 1+2 = move.l 1,d0; move.l 2,d1; add.l d0,d1; (my 68k assembly days are gone 20 years, not sure that is still correct code or if they changed it ;-)

But I guess someone else around here knows better.

Google's Native Client forges ahead

Posted Jul 28, 2011 18:18 UTC (Thu) by n8willis (editor, #43041) [Link]

It doesn't say that they are the same, just that the need to compile to an intermediate representation which is then translated again is going to make the sales pitch more difficult thanks to Java's storied history.

Nate

Small corrections...

Posted Jul 28, 2011 9:05 UTC (Thu) by khim (subscriber, #9252) [Link]

Disclaimer: I'm a NaCl developer and we KNOW our documentation sucks. We are working on it.

> The NaCl plugin isolates code in the sandbox by using the memory segmentation available in processes, thus providing a contiguous, private address space for each component — currently 256MB in size.

This is old information (year or so old). Today we provide 1GB on x86 and ARM and 4GB on x86-64. 256MB are reserved for code and 768MB are available for data.

> Loaded modules are also read-only in memory, to prevent self-modifying code.

We support some very limited modifications using specialized "syscalls". Enough to support V8 and Mono.

> NaCl also provides two higher-level mechanisms built on top of IMC: the Simple Remote Procedure Call (SRPC) facility, and an implementation of the traditional Netscape Plugin API (NPAPI).

Direct SRPC access is deprecated and NPAPI was completely replaced with PPAPI. This is a change I personally don't like all that much, but it was the price to pay to be accepted by Chrome :-(

Small corrections...

Posted Jul 28, 2011 17:27 UTC (Thu) by loevborg (guest, #51779) [Link]

As you're involved in the project, do you have an estimate when we will see, say, quake working in a stable chrome version?

Good question...

Posted Jul 29, 2011 12:46 UTC (Fri) by khim (subscriber, #9252) [Link]

> As you're involved in the project, do you have an estimate when we will see, say, quake working in a stable chrome version?

Who knows? There are two problems:
1. NaCl will not be enabled for general web sites until PNaCl is stable (which may take a long time yet).
2. Quake source is free, but the artwork is not, so you cannot put it in the Chrome Store.

So the answer is, sadly: not for a long time yet. With M14 (should reach developer channel in about six weeks and stable channel in about three months) you should be able to put Quake with some kind of original artwork in Chrome Store if you would like it.

Google's Native Client forges ahead

Posted Jul 29, 2011 11:04 UTC (Fri) by appie (subscriber, #34002) [Link]

Any information on what effect these JIT etc. optimizations and the Native Client stuff have on battery life? With mobile devices becoming more and more important, speed is nice but longer battery life is king.

I'm not so sure...

Posted Jul 29, 2011 12:52 UTC (Fri) by khim (subscriber, #9252) [Link]

This is not really true, if you'll think about it: the most frugal OS of them all (Symbian) is dying.

But in general... I think NaCl is comparable to a frugal JIT (like Dalvik's one) and way more power-effective than other Web technologies. The reason is simple: JavaScript (and ActionScript) are highly dynamic languages and it's hard to avoid doing lots of additional work. But all that potential can be easily wasted if the app is written by a sub-par programmer.

Google's Native Client forges ahead

Posted Jul 31, 2011 1:08 UTC (Sun) by mgedmin (subscriber, #34497) [Link]

Palm V could survive four weeks on one battery charge. A modern PDA/smartphone is lucky if it can survive eight hours. The market has spoken, and it values features more than battery life.

Google's Native Client forges ahead

Posted Aug 5, 2011 9:28 UTC (Fri) by slashdot (guest, #22014) [Link]

What NaCl seems to do is to restrict the instruction set, so that it can be verifiable, at the expense of requiring a special compiler.

However, honestly, that seems an absolutely unnecessary complication, since it's possible to just run any arbitrary binary code in a separate process, isolated by OS functionality that only allows a very limited set of system calls (e.g. seccomp in Linux).

Windows might not provide a similar system call limitation feature, but it should be very easy to implement in a Windows kernel driver.

That would increase performance, allow to avoid having to use special compilers, as well as allowing trivial portability to any CPU with an MMU and built-in protection features.

In other words, IMHO this technology is horribly complicated without any real reason for that (since hardware offers the same security, much better), although it's somewhat academically interesting to see how instruction sets can be restricted to be made verifiable.

Google's Native Client forges ahead

Posted Aug 5, 2011 19:56 UTC (Fri) by elanthis (guest, #6227) [Link]

> However, honestly, that seems an absolutely unnecessary complication, since it's possible to just run any arbitrary binary code in a separate process, isolated by OS functionality that only allows a very limited set of system calls (e.g. seccomp in Linux).

Which is exactly what Chrome already does for its individual tab processes, so the Google folks are quite aware of this facility. They're going for more than just sandboxing of an individual process with NaCl, though.

> Windows might not provide a similar system call limitation feature, but it should be very easy to implement in a Windows kernel driver.

Actually, from all reports I've heard, Windows' built-in mechanisms for sandboxing are actually better than Linux's. (That is, at least when comparing Vista/7 to the least common denominator of what the popular Linux distributions provide; SELinux and such likely blow Windows out of the water in this area.)

> That would increase performance, allow to avoid having to use special compilers, as well as allowing trivial portability to any CPU with an MMU and built-in protection features.

I'm not sure at all that it would help performance compared to what Google is trying to do. One of the goals of what NaCl does with its opcode sanity checking is to allow trusted and untrusted code to exist inside the same process. That allows for trusted library code to be called by the untrusted application code without context switches, without IPC, without anything other than a simple function call.

For example, for something like a game, you probably don't want your D3D/GL calls to be going over an IPC mechanism. Especially when dealing with streaming buffer uploads and the like as that would just make NaCl useless for high-end games (or other graphically rich real-time interactive applications). So you want the actual NaCl process to directly communicate with the GPU driver, but you don't want the untrusted code to be able to do that itself. You also don't want the untrusted code to be able to access/subvert the memory owned by trusted parts of the same process, as that would subvert the sandbox. These goals require the opcode verification (and the accompanying x86 machine code restrictions) that NaCl enforces.

Google's Native Client forges ahead

Posted Aug 6, 2011 14:05 UTC (Sat) by slashdot (guest, #22014) [Link]

Why not just use shared memory for anything performance critical, such as data uploads to the GPU?

As for context switches, most modern CPUs are multicore, so you might not need any actual context switches at all (just some cacheline bouncing).

Hardware 3D already usually communicates to a remote GPU via a DMA-based FIFO and uploads, so having an additional mechanism (faster due to using shared memory instead of DMA) shouldn't be the end of the world.

I'm not sure whether this additional IPC overhead would be actually higher than the performance degradation imposed by limiting the instruction set (for example, memory accesses seem to have extra overhead due to that).

Of course, you could also in principle trust the OS to be secure, and run arbitrary code in a security context with limited privileges, but with access to the GPU and other useful stuff; unfortunately, the history of local root holes on all OSes (not to mention the graphics drivers...) makes this probably an unwise choice.

Google's Native Client forges ahead

Posted Aug 6, 2011 21:38 UTC (Sat) by elanthis (guest, #6227) [Link]

> Why not just use shared memory for anything performance critical, such as data uploads to the GPU?
> Hardware 3D already usually communicates to a remote GPU via a DMA-based FIFO and uploads, so having an additional mechanism (faster due to using shared memory instead of DMA) shouldn't be the end of the world.

The FIFO is for the command queue, not large chunks of data like VBO uploads. There is no 'additional shared memory' mechanism, because such a thing doesn't even make sense, nor is it even remotely safe even if it did exist. The kernel DRI/DRM interfaces exist for a reason.

> As for context switches, most modern CPUs are multicore, so you might not need any actual context switches at all (just some cacheline bouncing).

You don't appear to understand how multi-core CPUs or multi-tasking operating systems work. Of course there is going to be a context-switch involved. What you're suggesting implies that the other core will have a process sitting there busy-waiting on an atomic, eating up 100% of the processing time on that core, just in case the sandboxed process possibly maybe wants to do something. That would be a ridiculously bad idea.

Any privileged process -- on another core or not -- is going to be blocked in a syscall waiting for an IPC message of some form, and calling a remote method on that privileged process from the sandboxed one will require OS context switches. A minimum of four of them in total, in fact. It would actually be faster to _not_ have the privileged process on another core due to the additional overhead of sharing data between cores, and if such a scheme were used the processor affinity facilities should be used to coerce both processes to be on the same core.

> I'm not sure whether this additional IPC overhead would be actually higher than the performance degradation imposed by limiting the instruction set (for example, memory accesses seem to have extra overhead due to that).

Memory accesses do not have extra overhead in the NaCl implementation. The segmented memory model is a core part of the x86 instruction set and is always active, even if generally all segments are set to 'contain' all of system memory. Using it to isolate memory is effectively free. The only reason it's not used normally to isolate processes is because the CPU by itself doesn't stop a process from changing the segmentation configuration, so without a software arbiter to ban programs using those instructions before they even start it would not have been effective protection.

> Of course, you could also in principle trust the OS to be secure, and run arbitrary code in a security context with limited privileges, but with access to the GPU and other useful stuff

Most operating systems do not actually allow you to set up a sandbox like this, Linux included (unless you make something like SELinux mandatory for your browser to work, which won't fly well with anyone but Fedora/RHEL users). Sandboxing processes is a relatively recent addition to the security toolbox (despite how obviously powerful it is) and most OSes haven't caught up to the needs of these techniques, yet, making frameworks like NaCl mandatory for now.

Again, Google's engineers know what they're talking about, and you seem to have some holes in your knowledge of these topics. Please just go read their documentation. It's very easy to find and quite easy to understand.

> the history of local root holes on all OSes (not to mention the graphics drivers...) makes this probably an unwise choice.

That logic implies that all security is worthless and we should just stop trying to protect anything, because all OSes have local root holes and hence cannot be protected at all. A more useful way to look at things would be that holes are likely going to be found, and they will get fixed, and life will move on and people will still be more secure (no, not absolutely secure, but 'more' is still better than 'less') by having sandboxed processes than they were without.

Google's Native Client forges ahead

Posted Aug 7, 2011 7:30 UTC (Sun) by viro (subscriber, #7872) [Link]

a) there is a very good reason why everyone sets segments to maximal size and it's exactly the fact that this crap is *not* free. It's turned off as an optimisation when processor sees that limit is set to maximum.

b) on amd64 segment limits are not verified in 64bit mode. End of story.

c) segments can be changed only when you are running in ring 0, at which point the game is really over. You can switch between the segments present in GDT + your LDT, but that's it. Said that, on anything that runs Linux kernel you will have segments spanning the entire user address space in GDT, making the segment-based protection only as good as your code sanitizer. And x86 instruction set is not well-suited for analysis, to put it mildly; it's not RISC. Prohibiting jumps into the middle of instruction is nice, but how do you prohibit return into the same? And with that added into the mix, you can construct far ret as part of the immediate constant, bugger the stack frame, hit normal ret (which is going to be in the allowed set), "return" to that far ret and there you are - %cs:%eip is set to your data. Arbitrary jump to other code segment... You are still within the same process, of course, but the sandbox boundary is broken through. At the very least you can read any data anywhere in your process' address space, segmentation be damned.

Google's Native Client forges ahead

Posted Aug 8, 2011 3:04 UTC (Mon) by elanthis (guest, #6227) [Link]

> a) there is a very good reason why everyone sets segments to maximal size and it's exactly the fact that this crap is *not* free. It's turned off as an optimisation when processor sees that limit is set to maximum.

Have any references? All I can find when searching for performance of segmented memory in protected mode are a few papers on using it for efficient array bounds checking. :/ Not saying you're wrong, I'd just like to read more about it and I can't find anything useful.

> b) on amd64 segment limits are not verified in 64bit mode. End of story.

NaCl is 32-bit only, even on OSes/machines that support 64-bit mode, in no small part because the tricks employed on x86 depend on such details. The ARM port uses a different set of tricks, naturally.

> And x86 instruction set is not well-suited for analysis, to put it mildly; it's not RISC. Prohibiting jumps into the middle of instruction is nice, but how do you prohibit return into the same?

Since even kernel developers are apparently too lazy to even try to look this stuff up, let me answer your particular attack scenario: the RET instruction is also banned by the NaCl verifier (you are more than free to read the paper on how returns from functions are implemented, if you're wondering how it works). This is one of the reasons why a modified compiler is needed to produce binaries that work inside the NaCl sandbox.

Here is their original paper on their x86 sandboxing; there is more information available to anyone who can bother to spend 30 seconds looking for it:

http://src.chromium.org/viewvc/native_client/data/docs_ta...

NaCl isn't for regular desktop apps. It's for smaller, more contained apps. It's for the kinds of things you can already do on the Web or in Flash, except that it allows native speed (or very very close to native, depending on whether you consider hardware-executed but notably non-optimal instructions to be "native", I suppose) and allows for the use of C/C++ code and libraries (I can have a 3D math library that doesn't suck donkey nuts like every last single vector library in every single language other than C, C++, and D does due to the overwhelming limitations of the academia-designed high-level languages; yay!). NaCl isn't intended to be used outside of a browser or for complex applications that couldn't reasonably be implemented and deployed on top of something like Flash (save for the speed).

Google's Native Client forges ahead

Posted Aug 5, 2011 22:23 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

you can't easily analyze a binary to make sure there are not invalid opcodes in it. the problem is that you can jump into the middle of data, or jump to an address that is not the start of the instruction and the chip will start executing from there.

Google's Native Client forges ahead

Posted Aug 6, 2011 10:28 UTC (Sat) by elanthis (guest, #6227) [Link]

That is why they only allow certain opcode patterns, and disallow any patterns that cannot be easily verified, and do require a special compiler to generate compatible machine code that will pass the verifier's requirements and implement the tricks needed to actually work in the sandboxed environment, along with applying significantly more knowledge of the various hardware architectures than you apparently think a team of Google's top engineers are capable of doing. Tricks they have written several in-depth papers on, have implemented fully in completely open source code, and have had working in real environments for quite a while now.

In particular on x86, they are using several different features of the architecture. One is the segmented memory model of x86, another is the ability to ban any code that calls the instructions to change segments, and yet another is a very tight control on where branches can be and where they can target. Non-writable code pages along with non-executable data pages ensure that the untrusted code cannot subvert the machine code verifier by modifying or creating machine code. Simple trampolines handle the code segment changes and stack pointer swaps necessary to call into and return from the trusted code.

If you want more information, just go read their documentation and papers. It's all very accessible and easy to grok.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds