Leading items
The case of the fraudulent SSL certificates
Certificate authorities (CAs) exist to issue certificates that protect users' encrypted traffic when they use SSL/TLS (i.e. HTTPS) for web browsing—at least ostensibly—but the CA system has been a source of general unhappiness for quite some time. The discovery of fraudulent certificates in the wild can only lead to more calls for changes to the CA model, and for some, that model is irretrievably broken.
A Tor project blog posting by Jacob Appelbaum (aka ioerror) has the most detailed look at this particular incident. Basically, sometime around March 15, a CA (evidently UserTrust, which is part of Comodo) noticed that certificates signed by the CA, but not properly issued by it, were floating around the internet. On March 15, the CA issued certificate revocations for nine different certificates.
Man in the middle
Part of the problem, though, is that certificate
revocation doesn't really work, as browsers are generally not checking
the revocation status of certificates. [Update: As pointed out in the comments, browsers do check the revocation status, but if that check fails, most do not give the user any indication of that.] In addition, many browsers do not
keep track of the certificates that they have received in order to alert users when they change. As a result, a fraudulent, but correctly signed, certificate offered
by a man-in-the-middle (MITM)
attacker will be accepted by most browsers with no indication to the user
of any problem.
MITM attacks have traditionally been considered difficult to pull off because they require control of some intermediate node in the path between the user and the web site. The pervasiveness of wireless networking has reduced that barrier considerably. Any access point, even one using encrypted communications (e.g. WPA), could be subverted—or intentionally configured—to perform MITM attacks. Public WiFi hotspots would be a perfect location for an attacker to set up a fraudulent certificate for, say, Paypal, and sniff the credentials of any user that connected to the service. Offering "Free Public WiFi" in crowded places like airports might be another route to setting up an MITM attack.
So far, only addons.mozilla.org
(the site for Firefox extensions) has been identified as a victim site, one
for which a fraudulent certificate has been issued. According to Appelbaum,
there are seven uniquely named certificates floating around (one of which
is issued to an invalid hostname: "global trustee"). He
speculates that "Facebook, Skype, Google, Microsoft, Mozilla, and
others are worthy of targeting". But Comodo has not released any information about which hostnames were targeted, at least not yet. It was Mozilla that disclosed that addons.mozilla.org was a target.
As pointed out by LWN reader sumC, Microsoft has put out an advisory that lists the affected domains: login.live.com, mail.google.com, www.google.com, login.yahoo.com (3 certificates), login.skype.com, and addons.mozilla.org.
Alerted by browser code changes
Since browsers do not generally check the revocation status of
certificates, some other mechanism must be used to reject these bad
certificates. A change
to the Chromium browser code on March 16 is how Appelbaum was first
alerted to the problem. That change added a new X509Certificate::IsBlacklisted() function for Chromium, which listed the serial numbers of multiple certificates, any of which would cause the function to return "true". Some twelve hours later, Google issued a Chrome update that "blacklists a small number of HTTPS certificates".
At around the same time, Mozilla also patched Firefox with a similar, but not exactly the same, list of serial numbers. This was clearly an indication that major browser vendors had been alerted to a problem. Appelbaum started digging further, looking at the published certificate revocation lists (CRLs) using his crlwatch program. Crlwatch used the EFF's SSL Observatory to find a canonical list of CRLs, then fetched them to look for a match to any of the serial numbers blacklisted by Chromium or Mozilla.
He found matches for all but the two test certificates that were listed, all pointing back to UserTrust. In addition, the Mozilla patch also points to UserTrust as the compromised CA. That means that somehow an attacker was able to get UserTrust to issue certificates for domain names for which they never should have been issued. That could happen if the signing key were compromised or if UserTrust were somehow tricked into issuing those certificates. So far, there are no details on how the fraudulent certificates were created.
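The heart of the matching step a tool like crlwatch performs is simple enough to sketch with OpenSSL; what follows is a hedged illustration (the "crl.pem" file name and the serial value are hypothetical), not crlwatch's actual code:

    #include <stdio.h>
    #include <openssl/pem.h>
    #include <openssl/x509.h>

    /* Hedged sketch of the core of CRL matching: parse a fetched CRL
     * and check whether a given certificate serial number appears on
     * it.  "crl.pem" and the serial value are hypothetical. */
    int main(void)
    {
        FILE *fp = fopen("crl.pem", "r");
        X509_CRL *crl;
        X509_REVOKED *entry = NULL;
        ASN1_INTEGER *serial;

        if (fp == NULL)
            return 1;
        crl = PEM_read_X509_CRL(fp, NULL, NULL, NULL);
        fclose(fp);
        if (crl == NULL)
            return 1;

        /* Real serial numbers are large values set from hex strings;
         * a small constant stands in here. */
        serial = ASN1_INTEGER_new();
        ASN1_INTEGER_set(serial, 0x123456L);

        /* X509_CRL_get0_by_serial() returns > 0 if the serial is listed. */
        if (X509_CRL_get0_by_serial(crl, &entry, serial) > 0)
            printf("serial is revoked by this CRL\n");
        else
            printf("serial not found in this CRL\n");

        ASN1_INTEGER_free(serial);
        X509_CRL_free(crl);
        return 0;
    }

The hard part of what crlwatch did was not this lookup, but assembling a reasonably complete list of CRLs to fetch in the first place, which is where the EFF's SSL Observatory came in.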
Disclosure issues
As Appelbaum points out, it is clear that at least some browser makers were alerted to the problem, but certainly not all. The Tor project releases a "Tor Browser Bundle" but was not advised of the problem, so it is left scrambling to update its browsers (though one would guess that Appelbaum's alertness in spotting the issue gave the project a head start). Other, smaller browsers are likely affected by the problem as well, but are just now hearing about it.
Appelbaum agreed to an embargo on releasing information about the problem until the Firefox 4 release on March 22. That embargo was extended to March 23 to ensure that Microsoft could release an IE update, but Mozilla put out its posting on the issue on March 22, at which point Appelbaum considered the embargo lifted and posted his own information.
The embargo is very troubling since these certificates were evidently already
out there potentially causing trouble for users; hiding their existence
doesn't help users at all. Also worrisome is that
both Google and Mozilla told Appelbaum that the CA "had done something quite remarkable by disclosing this compromise". It seems
that other CAs may have fallen prey to the same kinds of attacks and not
disclosed that fact. So it is possible that there are other known
fraudulent certificates in the wild, presumably just listed on CRLs without
any special browser blacklisting. An attacker holding one of those
certificates must be cackling with glee.
Even for the certificates that UserTrust/Comodo alerted about, it took some time for browsers to be updated and will take even more time before users get and install those updates. Since we don't know how the certificate-issuing process was subverted, we have to hope that the CA is taking proper steps to either invalidate the signing key if it was compromised, or to change its process so things like this can't happen again. It would also be good if UserTrust itself disclosed exactly which domain names were affected, though we may already have that information via Microsoft's advisory.
Browser certificate handling
It's easy to see that there is a problem with certificate handling in browsers, but it is less clear what the proper solution is. Keeping track of the certificate that is sent when a domain is first encountered, and comparing it on subsequent visits, would be useful, but browser makers are likely to be uncomfortable with how to present any changes to users. Certificates expire and are changed for other legitimate reasons, so it may be difficult for users to distinguish a legitimate certificate change from one that is happening due to an MITM attack.
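A hedged sketch of the comparison step such a scheme would need, using OpenSSL: compute a fingerprint of the certificate being presented and compare it against one remembered from an earlier visit. The function name is invented for this sketch, and the hard parts (storage, and deciding what to tell the user on a mismatch) are exactly what is omitted:

    #include <string.h>
    #include <openssl/evp.h>
    #include <openssl/x509.h>

    /* Sketch of certificate "pinning": compare the SHA-256 fingerprint
     * of the certificate presented now against one stored on a previous
     * visit.  Returns 1 on match, 0 on mismatch -- which could be an
     * MITM or simply a legitimate certificate change; telling those
     * apart is the hard part. */
    static int cert_unchanged(X509 *current,
                              const unsigned char *stored_fp,
                              unsigned int stored_len)
    {
        unsigned char fp[EVP_MAX_MD_SIZE];
        unsigned int fp_len;

        if (!X509_digest(current, EVP_sha256(), fp, &fp_len))
            return 0;   /* digest failure: treat as a mismatch */
        return fp_len == stored_len && memcmp(fp, stored_fp, fp_len) == 0;
    }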
The browsers could also default to using the CRL or online certificate status protocol (OCSP) queries to ensure that the certificates are still valid. That requires that the CA be available to answer those queries every time a certificate is sent by a site, as any downtime will result in web sites becoming unavailable. Adam Langley offers the idea of short-lived certificates in his "Revocation doesn't work" blog posting that was linked above. There are other possible solutions as well, some of which Appelbaum mentions toward the end of his post.
The real problem, though, may be that the CA model doesn't work well on a (largely) decentralized network like the internet. The problems range from incidents like this one to worries about possibly rogue CAs. There are other ideas for handling certificates in a non-CA world (or one where CAs have much reduced authority), but there is a rather large stumbling block to changing the current system: the enormous economic interest that the CAs have in keeping things more or less as they are. CAs derive a huge amount of money from their semi-monopoly on the issuing of certificates, and one could expect that any threat to that income stream would be met with strong resistance. But cases like this one may start to make it clear that changes are required.
Has Bionic stepped over the GPL line?
Way back in the early days of Linux, shortly after Linus Torvalds switched the kernel from his own "non-commercial" license to the GPL, he also added an important clarification to the kernel's license. In the COPYING file at the top of the kernel tree since mid-1993, there has been a clear statement that Torvalds, at least, does not consider user-space programs to be derived from the kernel, and thus that they are not subject to the kernel's license:
NOTE! This copyright does *not* cover user programs that use kernel services by normal system calls - this is merely considered normal use of the kernel, and does *not* fall under the heading of "derived work".
One could easily argue that this distinction is one of the reasons that Linux is so popular today, as programs written to run on Linux can be under whatever license the developer chooses. Some recent analyses of Google's Bionic libc implementation, which claim that Google may be violating the kernel's license, seem to be missing—or misunderstanding—that clarification.
A blog posting from Raymond T. Nimmer, who is a professor specializing in intellectual property (IP) law, was the starting point. That posting looks at the boundaries between copyleft and non-copyleft code. Nimmer specifically analyzes the question of whether header files that specify an API to a GPL-covered work can be incorporated into a program that is not released under the GPL. He points to Google's use of the kernel header files in the Bionic library as an example and concludes:
Nimmer's post was noticed by Edward J. Naughton, a practicing IP attorney, who then wrote briefly about it at the Huffington Post. Naughton also did a much longer analysis [PDF] as an advisory for his law firm, Brown Rudnick. That advisory concludes with a fairly ominous warning:
In turn, Naughton and Nimmer's analyses were picked up by Florian Mueller, who wrote a blog post about the serious threat that Google and Android face because of this supposed GPL violation. So, is Google really circumventing the GPL in a way that could threaten Linux? To answer that, we'll have to dig into what Bionic is, how it is built, and whether it violates the letter or spirit of the Linux COPYING file.
An interface for user space
The kernel exists to provide services to user space; one can do nothing useful from user space on a Linux system without invoking the kernel via a system call. That boundary is quite clear: invoking a system call requires a special instruction that puts the CPU into kernel mode. While programmers may see system calls as simple library calls, that's not what's happening under the covers.
In order to use Linux system calls, though, it is necessary to get information from the kernel header files. Various pieces of information are needed, including the system call numbers (which is how the calls are invoked), type information for various system call arguments, and the constants that are required to properly invoke those calls. That information is stored in the kernel headers, and any program that wants to run on Linux needs to get it somehow.
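To make that dependency concrete, here is a small example of invoking a system call by number rather than through a libc wrapper. The SYS_getpid constant (defined in terms of __NR_getpid) is exactly the kind of information that ultimately originates in the kernel headers:

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/syscall.h>   /* defines SYS_getpid in terms of
                                  __NR_getpid, which originates in the
                                  kernel headers */

    int main(void)
    {
        /* Invoke getpid() by its system call number instead of calling
         * the C library wrapper.  Without the kernel-derived constant
         * there would be no way to know which number to use. */
        long pid = syscall(SYS_getpid);

        printf("pid via raw system call: %ld\n", pid);
        return 0;
    }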
The most common way to invoke the kernel is by using GNU libc (glibc). Glibc has a set of "sanitized" kernel header files that are used to build the library, and distributions typically provide packages with those header files to be installed into /usr/include. Programs can then be built by using those header files and linking to glibc. While "sanitized" may sound like it refers to the removal of GPL-covered elements from those files, the main reason it is done is to remove kernel-specific elements from the files. The kernel headers have lots of kernel-internal types, constants, and functions that are not part of the kernel interface.
It isn't really correct to call the interface that the kernel provides to user space an API (i.e. application programming interface), as it is really an application binary interface (ABI), and one that the kernel hackers strive to maintain for each new kernel release. Removing something from the kernel ABI almost never happens, though new features expand that ABI frequently. The ABI is what allows binaries that were built on an earlier kernel to run on newer kernels. The API, on the other hand, is provided by glibc or some other library.
Using glibc is just one way for a program to be built to run on Linux. There are other libc implementations, including uClibc and dietlibc, which are targeted at embedded devices, as well as the embedded fork of glibc, EGLIBC. A program could also use assembly language instructions to make system calls more directly. Using any of those methods to get at the system call interface is perfectly reasonable, and will require information from the kernel headers. Glibc may be the most popular, but it certainly isn't the only way.
Android's Bionic libc is, at some level, just another alternative C library implementation. It is based on libc from the BSDs, with some Google additions like a simple pthread implementation, and has a BSD license. It's also a lot smaller than glibc—roughly half the size. The license satisfies one of the goals for Android: keeping the GPL out of user space. While glibc is not under the GPL (it is licensed under the LGPL, currently v2 with a plan to move to v3), even that license may concern Google and its partners, because LGPLv3 requires that users be able to replace the library—something that doesn't mesh well with locking down phones and other Android devices. In the end, it doesn't matter, as Google, like any other kernel user, can make Linux system calls any way it chooses.
Bionic's use of kernel headers
So what does Google do that causes Nimmer, Naughton, and Mueller to claim that it is circumventing the GPL to the detriment of the community? To create the header files used by Bionic and by applications, Google processes the kernel header files to remove all of the extra stuff that is either only there for the kernel or doesn't make sense in the Bionic environment. In short, with minor exceptions, Bionic is doing exactly what glibc does: taking the kernel header files and massaging them into a form that defines the interface so that they can be used by the library itself and by any applications that use the library. Nor has Google hidden what it's done; there is a README.TXT file that is quite clear on what it is doing and why.
Glibc and others may be using the kernel headers that can be generated from a kernel source tree by doing a "make headers_install". That Makefile target was added to help library developers and distributions create the header files that are required to use the kernel ABI. It is not a requirement, as there are other ways to generate (or create) the required headers, and various libraries have done it differently along the way. The Android developers intend to eventually use the headers that can be created from the kernel tree, but there are currently some technical barriers to doing so. The key piece to understand is that the information required to use the kernel ABI is contained in one and only one place: the kernel header files.
There are two things that Bionic does that are perhaps a bit questionable. The first is that, as part of munging the header files, it removes the comments from them, including the copyright notice at the top of each file. It replaces the copyright information with a generic "This header was automatically generated ..." message, which concludes with: "It contains only constants, structures, and macros generated from the original header, and thus, contains no copyrightable information." The latter part is likely what has the IP experts up in arms. Much of Naughton and Nimmer's postings indicate that they believe Google overreached in terms of copyright law by stating that the files do not contain elements eligible for copyright protection.
They may be right in a technical sense, but it still may not make any difference at all. Calling into the kernel requires constants and types (structures mostly) that can only come from the kernel headers. Those make up the functional definition of the ABI, and that ABI has been explicitly cleared for use by non-GPL code. One could argue that Google should keep the copyright information intact—one would guess lawyers were involved both in the decision not to and in the wording of that statement—but that is most likely only a nicety and not required once one understands that those files just contain the ABI information, nothing more.
Well, perhaps there is a bit more. The Bionic README notes that "the 'clean headers' only contain type and macro definitions, with the exception of a couple static inline functions used for performance reason (e.g. optimized CPU-specific byte-swapping routines)". Those inline functions might be considered elements worthy of copyright protection—and not part of the kernel ABI—but they might not be. They are written in assembly code, so they could well be considered the only way to write efficient byte-swapping routines for each architecture, and thus purely functional elements.
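As an illustration of the kind of routine in question, here is a hedged sketch, not the kernel's or Bionic's actual code, of a CPU-specific static inline byte swap; on x86 a single instruction does the job, which is why such helpers are architecture-specific in the first place:

    #include <stdint.h>

    /* Illustrative sketch, not the kernel's actual implementation: a
     * static inline 32-bit byte swap of the kind the Bionic README
     * describes.  On x86, one BSWAP instruction reverses the byte
     * order; other architectures fall back to shifts and masks. */
    static inline uint32_t swab32(uint32_t x)
    {
    #if defined(__i386__) || defined(__x86_64__)
        __asm__("bswap %0" : "+r" (x));
        return x;
    #else
        return (x << 24) | ((x << 8) & 0x00ff0000U) |
               ((x >> 8) & 0x0000ff00U) | (x >> 24);
    #endif
    }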
Misunderstanding Torvalds
Both Naughton and Mueller make a big deal about a posting from Torvalds in
2003 that ends with the shout: "BUT YOU CAN NOT USE THE KERNEL HEADER FILES TO CREATE NON-GPL'D BINARIES." While it would seem to be a statement from Torvalds damning exactly what Google is doing, that would be a misreading of what he is saying. One need look no further than the subject of the thread ("Linux GPL and binary module exception clause?") to see that the context is not about user-space
binaries, but instead about binary kernel modules. Torvalds may have been a
little loose with his terminology in that post, but stepping back through
the thread makes it clear he is talking about kernel modules. Furthermore,
in another post in that
same thread, he reiterates his stance on user-space programs:
So I agree with you from a technical standpoint, and I claim that the clarification in COPYING about user space usage through normal system calls covers that special case.
But at the same time I do want to say that I discourage use of the kernel header files for user programs for _other_ reasons (ie for the last 8 years or so, the suggestion has been to have a separate copy of the header files for the user space library). But that's due to technical issues (since I think the language of the COPYING file takes care of all copyright issues): trying to avoid version dependencies.
That's a pretty unambiguous statement about using the kernel headers for user-space programs. In fact, in the early days, the accepted practice was to symbolically link the kernel headers into /usr/include, and one might guess that any number of proprietary (and other non-GPL) programs were built that way. Torvalds is no lawyer (nor am I), but his (and the other kernel hackers') intent is likely to be very important in the extremely unlikely case this ever gets litigated.
It is almost amusing that Mueller argues that Google should switch to using glibc rather than Bionic, as if that would change anything: if the Nimmer/Naughton arguments are right, it's hard to see how glibc is any different. Their argument essentially boils down to there being no way to use the kernel headers without a requirement to apply the GPL to the resulting code.
It's certainly not impossible to imagine someone filing a lawsuit over Android's use of the kernel headers. It's also possible that a judge might rule against Android. But given that kernel hackers want user-space programs to use the ABI and that the COPYING file explicitly excludes those programs from being considered derived works of the kernel, one would guess that some kind of workaround would be found rather quickly. Other than the fear, uncertainty, and doubt that these arguments might engender, one would guess that Google isn't really losing much sleep over them.
Target: Android
One way to know that a product has become successful is to see others lining up to attack it in the courts. Imitation may be the sincerest form of flattery, but FUD and lawsuits also show, in a rather less sincere way, a certain form of respect. By this measure, the Android system, by virtue of being the target of both, must be riding high indeed.
At the top of the news is Microsoft's just-announced lawsuit against Barnes & Noble, an American bookstore chain. According to Microsoft, the Nook ebook reader, which is based on Android, violates five of Microsoft's software patents. By now, the fact that these patents cover trivial "innovations" should not come as a surprise. Here is a quick overview of what Microsoft claims to own:
- #5,778,372:
"Remote retrieval and display management of electronic document with
incorporated images." What's described here is displaying a document
on top of a background image with the brilliant feature that the
document is displayed while the image is still transferring and the
whole page is redrawn once the image shows up.
- #5,889,522:
"System provided child window controls." This patent covers an
application settings dialog with tabs.
- #6,339,780:
"Loading status in a hypermedia browser having a limited available
display area." This patent covers the idea of putting up a "loading"
image over a page while the page is loading.
- #6,891,551:
"Selection handles in editing electronic documents." Covered here is
the technique of putting little images around a selection area so that
the user can, by dragging those handles, resize the area.
- #6,957,233: "Method and apparatus for capturing and rendering annotations for non-modifiable electronic content" - a method for storing "annotations" outside of the text of a read-only document. Broadly read, this patent could be said to cover browser bookmarks, an idea that somebody might just have thought of before this patent's 1999 filing date.
It seems unlikely that any of these patents would stand up against a suitably determined challenge in court - though such an opinion does certainly rely on a possibly naïve view of the rationality of American patent courts. In the real world, challenging patents is a time-consuming and expensive affair. Barnes & Noble, which may well have been chosen because it looks like a weak target, may not have an appetite for that kind of fight. That said, the company has, evidently, refused offers to license these patents from Microsoft; its management had to know that a lawsuit would be the next step.
Perhaps we'll see a counterattack from the rest of the industry; the idea of paying rent to Microsoft for the privilege of using bookmarks is unlikely to have broad appeal. Or, perhaps, the entire mobile marketplace in the US will simply collapse under the weight of the steadily increasing pile of patent-related lawsuits.
The patent suit is not the only attack ongoing against Android currently; there is also, seemingly, a determined FUD campaign underway. Most recently, this campaign has taken the form of assertions that Android is violating the Linux kernel's license; LWN examined those claims in a separate article. It's worth noting that one of the proponents of these claims has represented Microsoft in the past - a fact which he chose to remove from his online biography shortly before beginning the attack.
The goal of these attacks seems clear - to inject fear, uncertainty, and doubt into the growing Android ecosystem. Hardware vendors have been served notice that they may well be sued for using Android. Software vendors are being told that writing for Android could expose them to copyright infringement claims. It's all aimed at getting these companies to reconsider investing in Android in favor of the "safer" alternatives.
The thing is, of course, we have seen this kind of FUD campaign before. As early as the 1990s, people were warning that use of Linux could cause companies to "lose their intellectual property." Fortunately, most companies have figured out that they face no such threat. So, when we see something like this ludicrous claim:
We can relax in the knowledge that we have seen such things before. Linux turns out to be surprisingly resilient to FUD; this campaign seems unlikely to even slow things down.
It is worth noting that, despite the recent claims that companies working with Android might be sued by free software developers, the actual lawsuit was filed by a company which is generally hostile to free software. There are indeed threats out there, but they do not come from our community; they come from companies which feel they are losing in the market and which, as such companies so often do, are turning to the courts instead. It is one of the less pleasant signs of success; our community should feel flattered indeed.