
Google's Native Client

By Jake Edge
June 3, 2009

Allowing browsers to run native code downloaded from a web site has some attractions, at least at first blush. But, once some thought is put into it—or the serious security problems with Microsoft's ActiveX are recalled—the security flaws of the scheme become readily apparent. Google is resurrecting the idea in their Native Client (NaCl) research project, but rather than rely on trust, as ActiveX does, NaCl takes steps to verify the code before running it. As a weblog posting by Matasano Security describes, there are rather substantial technical barriers to overcome, but, even then, there are still some fairly serious repercussions to running native code from an untrusted site.

Native code is attractive because it allows for much better performance, along with access to graphics and a user interface that isn't HTML-based. One of the NaCl demos is a port of Quake that runs in the browser. Games are certainly one place where NaCl is attractive, but it also means that existing programs—at least those not already written in Java, Flash, or Silverlight—do not need to be rewritten in another language. For those who think that essentially all applications will eventually be delivered via the web, NaCl (or something like it) seems required.

But, as malware developers know, the x86 architecture has lots of ways to obscure the operation of a program in order to try to elude any kind of automatic vetting. The instructions are of variable length and malicious programs can jump anywhere in the stream, not just at the instruction boundaries found by a disassembler. In addition, x86 programs can execute from data, so that malicious programs can write some code to memory and jump there. These kinds of things cannot be determined by just examining the program binary, so Google leveraged some earlier work [PDF] to restrict the kinds of programs NaCl will execute.
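
To make that concrete, here is a small sketch using the Python bindings of the Capstone disassembler (a third-party tool chosen purely for illustration; it is not part of NaCl). The same six bytes decode as two completely different instruction streams depending on where decoding starts:

    # Overlapping x86 instructions: decode the same bytes from two offsets.
    # Requires the third-party Capstone library (pip install capstone).
    from capstone import Cs, CS_ARCH_X86, CS_MODE_32

    code = b"\xb8\x31\xc0\xc3\x90\xc3"
    md = Cs(CS_ARCH_X86, CS_MODE_32)

    for start in (0, 1):
        print("decoding from offset %d:" % start)
        for insn in md.disasm(code[start:], start):
            print("  %#04x  %s %s" % (insn.address, insn.mnemonic, insn.op_str))

    # Offset 0: mov eax, 0x90c3c031 ; ret
    # Offset 1: xor eax, eax ; ret ; nop ; ret
    # A jump one byte into the mov executes code that a disassembler
    # starting at offset 0 never reports.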

Basically, NaCl requires that the code be structured such that it can be verified automatically. That means that disassembling the code must produce a stream of recognizable instructions and that jumps must land at the beginning of one of those instructions. In addition, self-modifying code is disallowed. With those restrictions in place, NaCl can verify that the code doesn't do anything that is disallowed.
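
A toy version of such a verifier (illustrative only, far simpler than Google's actual validator, and again leaning on Capstone for decoding) shows the shape of the check: decode linearly, record every instruction boundary, and reject the module if decoding fails or any direct jump targets a non-boundary. Indirect jumps cannot be checked statically this way; the real system has to constrain them at run time.

    # Toy verifier in the spirit of the rules above (not NaCl's validator).
    from capstone import Cs, CS_ARCH_X86, CS_MODE_32

    def verify(code):
        md = Cs(CS_ARCH_X86, CS_MODE_32)
        starts = set()       # offsets where an instruction begins
        targets = []         # direct jump/call destinations
        end = 0
        for insn in md.disasm(code, 0):
            starts.add(insn.address)
            end = insn.address + insn.size
            # Direct jumps and calls carry a literal target we can check.
            if insn.mnemonic[0] == "j" or insn.mnemonic == "call":
                if insn.op_str.startswith("0x"):
                    targets.append(int(insn.op_str, 16))
        if end != len(code):
            return False     # hit undecodable bytes: reject the module
        return all(t in starts for t in targets)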

NaCl then enforces some additional rules, disallowing memory management hacks that could fool the verifier and requiring that all system calls go through a "gate" in the first 64K of the code. Only certain calls are allowed through the gate, which is how NaCl protects against arbitrary code being executed. Google has created a patched version of GCC that will create an ELF-format file which follows the rules.
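
The gate amounts to a whitelist dispatcher. A minimal sketch of the concept (the service names and handlers below are invented for illustration and bear no relation to NaCl's actual interface):

    # Hypothetical syscall-gate sketch: untrusted code reaches the host
    # only through this dispatcher, which knows a fixed set of services.
    ALLOWED_SERVICES = {
        "log":   lambda msg: print("guest:", msg),
        "alloc": lambda n: bytearray(min(n, 1 << 20)),  # cap allocations
    }

    def gate(service, *args):
        handler = ALLOWED_SERVICES.get(service)
        if handler is None:
            raise PermissionError("service %r not allowed" % service)
        return handler(*args)

    gate("log", "hello")                  # permitted
    # gate("open", "/etc/passwd")         # would raise PermissionError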

All of that may sound enticing, but Matasano puts a definite damper on enthusiasm for the technique. In some ways, it is similar to what Java applet sandboxes do, but Java has been around for quite some time, so many of the problems with its implementation have been found and fixed. Google sponsored a contest to try to shake out some of the problems with NaCl. Matasano participated and the blog post is essentially a report of what they and others found.

The basic problem is that bugs in the verifier, loader, or trusted system call gate can generally be immediately turned into exploits to run arbitrary code. The posting outlines a number of problems that they or other contest teams found. Until the NaCl components reach a level of maturity similar to—or even beyond—that of the Java applet sandbox, running native code in the browser is going to be a dicey proposition. A particular area of concern is that the system call gate must do its job based on what call is being made and the contents of the memory being passed, which is a much harder job than the equivalent for the Java sandbox (which is expressed in terms of Java classes and data structures).

But, even if all of the bugs with NaCl itself were found and fixed—an impossible task—there is still an architectural hole that was specifically removed from consideration in the contest: side-channel attacks. There are a number of attacks against keys and other sensitive information that can be made using timing analysis. By timing repeated executions of the code of interest, cache effects as well as branch prediction information can be extracted, which can then be used to recover keys or other information.

While the side-channel attacks are probabilistic in nature, they get better with repetition. If an attacker were able to add that kind of analysis to a popular game, for example, it would have ample opportunity to run. Since the abilities required by the side-channel program are not very different from those of other, legitimate programs that NaCl would want to run, there is little that can be done to stop this kind of abuse. Whether it is a practical attack is hard to judge, but undoubtedly some attackers are already looking at it.
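
For a feel of how timing attacks work, consider the classic early-exit comparison below (a deliberately simple illustration, not NaCl-specific; real attacks on keys exploit cache and branch-predictor effects, but the statistical approach of repeating, timing, and picking the outlier is the same):

    # Classic timing side channel: an early-exit comparison runs longer
    # the more leading bytes match, so repeated timing recovers a secret
    # one byte at a time.
    import time

    SECRET = b"hunt"

    def naive_equal(a, b):
        for x, y in zip(a, b):
            if x != y:
                return False     # early exit leaks the match length
        return len(a) == len(b)

    def measure(guess, reps=200000):
        t0 = time.perf_counter_ns()
        for _ in range(reps):
            naive_equal(SECRET, guess)
        return time.perf_counter_ns() - t0

    # Guesses sharing more leading bytes with SECRET take measurably longer.
    for guess in (b"xxxx", b"hxxx", b"huxx"):
        print(guess, measure(guess))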

It seems likely that any security-conscious user is not going to be too interested in running code in NaCl anytime soon—if ever. Unfortunately, the same folks who are willing to run ActiveX programs from random internet sites might be quite willing to do the same with NaCl. That could lead to an ugly security breach of some kind, but one could argue that it is not really any worse than things are today. Running untrusted code is dangerous and there aren't many ways around that.



Google's Native Client

Posted Jun 4, 2009 2:33 UTC (Thu) by zooko (subscriber, #2589) [Link]

I disagree with your overall summary, Jake. You're right that there are basically two approaches to running code securely -- making sure the author of the code was honest and competent (what you refer to in this article as "relying on trust"), and making sure that the code can't do things that you don't want it to do. I think in the long run the latter is a much more promising strategy.

Put it this way: making sure there are no security holes in your interpreter/sandbox/operating system is really hard, to be sure, but it is much easier than making sure that nobody who ever touched the code was malicious or error-prone.

In unix, for example, we rely on the operating system to run some code (a process owned by a user) without letting that code do absolutely anything it wants (permissions, etc.). There are plenty of ways that the operating system can fail at this, accidentally allowing the code to escape from the constraints and take over added authority. We have a long history of those sorts of holes, and we may never get it perfectly right. But on the other hand it is awfully useful for some things to be able to run code under a separate user account without thereby letting it gain access to all users' authorities. Much *more* useful, for those cases, than the alternative of having a human inspect the code before allowing it to run, or of trying to ensure that all patches to the code come from trusted authors.
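
As a concrete miniature of the separate-account approach (a sketch only: it assumes it is started as root, and that uid/gid 65534 is the unprivileged "nobody" account):

    # Run a child under a throwaway uid so the kernel, not the author's
    # honesty, limits what the code can do.
    import os

    pid = os.fork()
    if pid == 0:
        os.setgid(65534)                   # drop group privileges first
        os.setuid(65534)                   # then user privileges
        os.execv("/usr/bin/id", ["id"])    # now runs as nobody
    os.waitpid(pid, 0)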

Another way to think of this is that the scale of authors of code has changed dramatically in the last 30 years. When the multics and unix security paradigms were developed, there were probably hundreds or at most thousands of people who typically authored code that you might actually want to use.

Today there are probably millions, tens of millions, or perhaps even hundreds of millions of people who write code that you (or someone) might find useful. If we include javascript on web pages, macros in spreadsheets, and so on, there may soon be a billion people (if everything goes well) who occasionally write some code that someone else may occasionally find to be useful.

The "trusted authors" approach might have been useful if there were only a few hundred or a few thousand people who typically generated source code that you wanted to run, and you could be suspicious and cautious if a stranger posted a patch. Today, that approach seems extremely limiting.

(Hierarchical-trusted-authors approaches such as Microsoft code-signing or Debian gpg-keys don't really scale up to modern-day needs either, in my opinion -- they err on both sides, by excluding good code from distribution and by allowing malicious or buggy code into distribution. The bigger the scale, the larger both kinds of errors will be. Sure, you can "try harder" to reduce one or both kinds of error, and this can help a little, but the whole approach is just inherently non-scalable.)

Fortunately, other people have realized the inherent limitations of the relying-on-trust approach and are now actively pursuing the alternative of running-code-safely. Google NaCl is a big, exciting step forward on that axis. (Google caja is another.)

Frankly, the side-channel issue seemed like Matasano grasping at straws, to me. Side-channels surely exist, and can be important in cases where you are juggling secrets, but there are plenty of uses where they don't matter, and for those uses Google NaCl seems to be coming along nicely.

For comparison, those same side-channel attacks would also mean that a user account on a multi-user unix system might be able to steal secrets such as passwords or private keys from another user. Cryptographers have been developing some defenses against that sort of thing, but if you have extremely valuable secrets then you should indeed probably not allow those secrets to be used on physical hardware which is shared by processes owned by other people.

Oh, I just realized that the same side-channel argument probably applies to virtualization. *Probably* similar attacks can extract your secrets from your Amazon EC2 instance, if that instance finds itself sharing a CPU with an instance owned by an attacker. Or maybe not -- such attacks are inherently a noisy, probabilistic situation and I don't recall any report of such a thing being exploited in the wild.

Anyway, the fact that NaCl is susceptible to side-channel attacks is rather unremarkable -- so are Linux, Amazon EC2, the JVM, the JavaScript implementation in your web browser, and probably every access-control system actually deployed.

Google's Native Client

Posted Jun 4, 2009 3:11 UTC (Thu) by jake (editor, #205) [Link]

> I disagree with your overall summary, Jake.

Hmm, interesting. I don't find much that I disagree with in your message, so either I didn't communicate well (likely) or your disagreement is not in an area that I considered to be central to the article.

I think it is a promising strategy to try to confine programs to doing "what we want", but that is a horribly difficult and error-prone process.

I guess you are more optimistic than I about removing the parser/loader/system call gate bugs in any kind of near-term timeframe. The side-channel attacks exist, and could be problematic, but that is just a demonstration of an inherent, architectural weakness of the scheme. The real problems are likely to come from all of the rest of it.

Bottom line, for me, is that I think I am about as likely to run NaCl binaries from untrusted sources anytime soon as I am to run ActiveX controls. Maybe I am behind the times, though.

jake

Google's Native Client

Posted Jun 4, 2009 4:02 UTC (Thu) by elanthis (guest, #6227) [Link]

I see no reason not to have a hybrid approach. I like signed binaries not because it tells me that everyone who touched it was a Good Guy, but because it lets me know who (or which organization, at least) is vouching for the code. I trust the Linux kernel sources to be the foundation of my security-sensitive systems. That doesn't mean I trust every person who's touched the code, but it does mean that I trust the people reviewing and signing off on the code. And I want to know that my system is running an unmodified version of official Linux (or an unmodified distribution kernel, at least) and not some random hacker-supplied version.

That's where code signing comes in. It tells me that yes, this is the version of FooApp that I meant to download and run, and that yes, it really is from FooSoft.

Code verification adds an extra step to protect me in cases where I find I need to run code from some specific (but not yet fully trusted) software provider. I've never run anything from FooSoft before, but I find myself needing to run FooApp because it is the only software that does what I need. I'm not going to do a full source analysis (be honest, you almost certainly haven't even looked at the source for 99% of the software you run) because I have better things to do with my life. So I rely on code verification (like Google's), along with the positive reputation of FooSoft (and the fact that I know the FooApp I'm about to run really is the real deal from FooSoft), to keep me safe.

I can't guarantee I'm 100% safe. You can never do that. I might even have a box with no Internet access, but my assistant/wife/janitor/whoever might decide to abuse his or her physical access.

ALL security -- be it computer security, a deadbolt, or whatever -- is about risk management. You can never completely remove security holes, but you can reduce the ease of finding them, increase the complexity to utilize them, and decrease the potential damage of abusing them.

Sandboxing, code verification, and code signing are all tools to help manage risk. No one method is foolproof, and no single method is better than all of them combined.

Google's Native Client

Posted Jun 4, 2009 8:29 UTC (Thu) by tzafrir (subscriber, #11501) [Link]

Yeah. But configuring your computer to automatically trust anybody that Microsoft (or Google, or whoever) trusts well enough to grant a certificate is a few too many levels of indirection of trust.

Google's Native Client

Posted Jun 4, 2009 5:50 UTC (Thu) by nikanth (guest, #50093) [Link]

Who needs that extra bit of performance? Even then it can't match native-code performance, because of the restrictions imposed on the generated code anyway. Java is good enough for me.

A standard for making use of the GPU from a web page is what we need.

Google's Native Client

Posted Jun 4, 2009 12:41 UTC (Thu) by nix (subscriber, #2304) [Link]

Those restrictions really don't slow the code down much. They only limit its *theoretical* maximum speed: a compiler written by an ultraintelligent being could make use of overlapping instructions and that sort of thing.

But code generated by compilers written by human beings never does that; only attacks do. (And it is very rare for legitimate code to rely on self-modifying code or data execution, either: SELinux and other systems have blocked those for ages, and programs relying on them have long been considered buggy.)

Actually slowdown is noticeable

Posted Jun 4, 2009 19:34 UTC (Thu) by khim (subscriber, #9252) [Link]

You lose 5-10% of speed with NaCl. But the biggest win is not even speed, it's memory use: the NaCl sandbox takes half a megabyte of memory, while Java needs a few megabytes just for "Hello, World". Not even close to being equal.

Google's Native Client

Posted Jun 4, 2009 7:52 UTC (Thu) by Thue (subscriber, #14277) [Link]

> even if all of the bugs with NaCl itself were found and fixed—an impossible task

A bit too categorically pessimistic, IMO. What if a formal proof were written that all programs which pass the validator are safe, and the formal proof itself were run through a proof validator?

That is not science fiction. Computer scientists are working on similar projects right now :). See for example http://www.springerlink.com/content/fw2u6488576w68j4/

Google's Native Client

Posted Jun 4, 2009 14:03 UTC (Thu) by NAR (subscriber, #1313) [Link]

Scientists are also working on power plants based on nuclear fusion :-) I think those will be ready earlier than an automatic prover.

Google's Native Client

Posted Jun 5, 2009 13:26 UTC (Fri) by Thue (subscriber, #14277) [Link]

Google already wrote the automatic prover. Now we just need to write the proof that the automatic prover is correct. At the current frontier of the science, such a proof is written by hand, and the proof is then checked by an automated proof validator.

I don't see anything here which is not already done today, just on a smaller scale.

Google's Native Client

Posted Jun 10, 2009 15:26 UTC (Wed) by amarjan (guest, #25108) [Link]

What about implementation defects -- say in a given stepping of a particular CPU?

I'm not a low-level programmer so I can't cite relevant examples from memory, but I know there have been numerous such defects with security implications over the years.

Why do all this? Shouldn't the OS do it?

Posted Jun 4, 2009 9:00 UTC (Thu) by epa (subscriber, #39769) [Link]

I am not sure why running 'untrusted' native code is considered so dangerous or novel. Since the seventies or earlier, time-sharing systems have allowed different users to run their own code on the system, with each user or process isolated from the others. Modern hardware such as the 386 family was specifically designed to support this. Each process runs in a virtual machine set up by the operating system and cannot access memory belonging to other processes or the kernel. The only access granted to it is what the operating system explicitly provides through its system call interface.

Why, then, is it necessary to go to all this trouble of verifying binaries? Surely it would be far simpler for the operating system to provide a bit of help, setting up a new process with its own memory space, CPU quota, and a limited set of system calls (perhaps just read() and write() to a pair of pipes that already exist). Then you could execute any native code you want, and if it tries to do something naughty, the CPU's built-in mechanisms will trigger a fault and the OS kills the process.
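
On Linux, an embryonic version of exactly this already exists: seccomp "strict mode", merged back in 2005, confines a process to read(), write(), _exit(), and sigreturn(). A rough sketch via ctypes (the constants come from the kernel headers; note the caveat in the comments):

    # Enter seccomp strict mode: afterwards only read(), write(), _exit()
    # and sigreturn() are permitted; any other system call is fatal.
    import ctypes, os, resource

    PR_SET_SECCOMP = 22        # from <linux/prctl.h>
    SECCOMP_MODE_STRICT = 1    # from <linux/seccomp.h>

    resource.setrlimit(resource.RLIMIT_CPU, (5, 5))   # crude CPU quota

    libc = ctypes.CDLL("libc.so.6", use_errno=True)
    if libc.prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_SET_SECCOMP) failed")

    os.write(1, b"still allowed to write\n")
    # Opening a file here would kill the process.  (CPython itself needs
    # more syscalls than the whitelist -- even exiting cleanly trips it --
    # so a real sandbox would enter strict mode in a child process that
    # runs only plain computation.)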

We only think this is exotic because popular OSes of today do not provide a lot of control over what resources a process can have. Typically access to files is set with access control bits, but any process can open TCP/IP connections. And where an OS does provide capabilities, jails, masking out of system calls, and so on, there isn't a single dominant API and model, nor the necessary base of knowledgeable people to make good use of it.

Why do all this? Shouldn't the OS do it?

Posted Jun 4, 2009 10:32 UTC (Thu) by dgm (subscriber, #49227) [Link]

What is lacking is the capability to let applications act as kernels by themselves. It would be much easier if the browser could have its own private filesystem, system calls, and users exported to child processes.
AFAIK, Plan 9 does some of this (at least the filesystem part).
Not an easy task, but maybe worth some more thought.

Yet more lock-in to X86

Posted Jun 4, 2009 12:33 UTC (Thu) by addw (guest, #1771) [Link]

The first thought that I had was: what about people who want to use other hardware? Most of the desktop world seems to be Intel-compatible at the moment, but I can see that this will change - ARM looks like it will take some space.

I don't like the idea at all.

Yet more lock-in to X86

Posted Jun 4, 2009 21:04 UTC (Thu) by jimparis (subscriber, #38647) [Link]

Agreed, people should take all this NaCl stuff with a grain of salt :)
I thought the "this plugin works only on x86" mentality was beaten to death years ago. Glad to see Google's picking up the slack and making the web suck again...

Yet more lock-in to X86

Posted Jun 5, 2009 23:54 UTC (Fri) by vomlehn (subscriber, #45588) [Link]

Not only no to NaCl, but hell, no. We're just on the cusp of cracking the Microsoft operating system monopoly and we're going to shore up Intel's processor monopoly? We are definitely getting into the evil zone here. It's pretty hard to compete on price with a company that dominates the market but, from a technical standpoint, Intel's power consumption and die size are way greater than those of comparable RISC processors. Hardly green technology.

Java was developed from the ground up to be verified and it does the job darn well. I'm starting to see benchmarks where it goes toe to toe with C, once you are done with JITting the code. It's well proven and platform-independent. Please, Google, let's stick with something that is already working. And, of course, if you want to spend time and money making Java run faster, that would not be evil.

Paradigm has run its course

Posted Jun 4, 2009 15:03 UTC (Thu) by tstover (subscriber, #56283) [Link]

To me, this is just one more sign that the browser's decade is coming to an end. Browser tech no longer provides a solid hyperlinked document system, nor a client-server application framework. Just layer after layer of directionless, nebulous standards, legacy ideas, compatibility issues, and limitations. Is it really that hard to see the downside of an "everything you can think of in a single application" architecture? All for what? Kids don't want to learn native programming anymore? Ok, rant over.

Kinda ironic

Posted Jun 4, 2009 19:44 UTC (Thu) by khim (subscriber, #9252) [Link]

> Browser tech no longer provides a solid hyperlinked document system, nor a client-server application framework.

What other alternative is there? The requirement is "simple": I want to run random programs from different (often hostile) sources and still have all my documents alive and well and my system under my control. The browser is not a good solution for that, but... it's still the best solution available.

> All for what? Kids don't want to learn native programming anymore?

Hmm... if you actually read the article you'll find out that NaCl tries to bring the ability to program the browser in "native code". And this is not about kids - this is about distribution. The browser offered one thing to users: the ability to safely run programs written by anyone, without installation and its dangers. Sure, security holes exist, but if you compare the result of installing 100 random "native programs" (downloaded from hostile sites) with opening 100 random "hostile sites" in a browser... it's not even a contest.

The browser is becoming an OS - like Netscape Communications promised. Only a decade later than initially planned :-)

Kinda ironic

Posted Jun 5, 2009 4:37 UTC (Fri) by tstover (subscriber, #56283) [Link]

No, there is not a real alternative; things sure could have gone better, though. I was being somewhat facetious. I'm obviously using a browser for lwn.net. I've actually been following NaCl for some time, and anxiously await some means to achieve some of its goals.

Being a little silly, yes, but I was alluding to the under-discussed culture clash between the ideas of the web generation and the system & application programming generation - one that has nothing to do with age (I'm 29, for instance, while some of the biggest web advocates I know are in their late 50s). There is no way a browser will ever replace a multiuser, protected-memory OS. Even if one were to - it would be Windows. I may be in a dying breed, but I'm here to the end. I'll turn out the light on the last copy of *nix just before I retire, and let you kids with your browser-in-BIOS, iPhone, or whatever wonder what it was like when computers were fun.

:)

Why not?

Posted Jun 5, 2009 6:13 UTC (Fri) by khim (subscriber, #9252) [Link]

> There is no way a browser will ever replace a multiuser, protected-memory OS.

What exactly is lacking, and why can it not be added to the browser?

I can name two programs that I use regularly and cannot see embedded in a browser:
1. A DVD burner app (the exact kind is unimportant)
2. The USB Gecko toolset.

The vast majority of applications I use can be embedded in a browser. Whether that is a good idea or not is debatable, but I see the trend going this way... because it's convenient. I can work with the same documents on my desktop, my laptop, and even my phone! And important things like regular backups are done by professionals.

So while I cannot see a situation where all applications are embedded in the browser, I can easily see a situation where most applications are embedded in the browser...

Why not?

Posted Jun 5, 2009 17:21 UTC (Fri) by yokem_55 (subscriber, #10498) [Link]

Actually, an embedded CD/DVD burner would be a cool app to help people burn ISOs. I don't know how many times I've seen people download a Linux ISO and end up with a CD containing the ISO file. Sigh.

This is not even the worst outcome!

Posted Jun 6, 2009 7:52 UTC (Sat) by khim (subscriber, #9252) [Link]

If they have installed something like WinRAR, then said ISO is shown as an archive and can be opened as one - thus they end up with the files extracted from the "archive" and then saved to CD. Of course the resulting CD is unbootable, but it looks genuine enough at first glance...

Google's Native Client

Posted Jun 10, 2009 18:36 UTC (Wed) by Ford_Prefect (subscriber, #36934) [Link]

"In addition, x86 programs can execute from data"

I don't understand why things like the NX bit [1] have not yet become on by default. Surely the performance impact is either non-existent or fixable?

[1] http://en.wikipedia.org/wiki/NX_bit
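
For illustration, here is the sort of thing the NX bit prevents, as a minimal x86-64 Linux sketch (Python with ctypes and mmap): map a page writable and executable, copy machine code into it, and call the data as code. Drop PROT_EXEC from the mapping and the call faults instead of returning -- that is the NX bit doing its job.

    # Executing from data (x86-64 Linux only; illustrative sketch).
    import ctypes, mmap

    CODE = b"\xb8\x2a\x00\x00\x00\xc3"    # mov eax, 42 ; ret

    buf = mmap.mmap(-1, mmap.PAGESIZE,
                    prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
    buf.write(CODE)

    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    func = ctypes.CFUNCTYPE(ctypes.c_int)(addr)
    print(func())    # prints 42; without PROT_EXEC this would crash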

Google's Native Client

Posted Jun 15, 2009 9:28 UTC (Mon) by forthy (guest, #1525) [Link]

Remember: One reason to actually have native code is to implement other languages. E.g. Lisp or Forth or Ocaml. These languages are interactive, so they can download additional code, compile and run it - to native code on fast implementations. Game engines have an AI subsystem, which is quite often written in such a language.

Of course it is impossible to prove such a thing "safe" (or rather the reverse: it is easy to prove it unsafe). Therefore I think the right thing to do for untrusted native code is indeed to sandbox it in a VM, and not to check the code itself - and to rely on something like the NX bit and disable "self-modifying code" (generated code counts as self-modifying).

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds