By Jake Edge
March 23, 2011
Certificate authorities (CAs) exist to issue certificates that protect
users' encrypted traffic
when they use SSL/TLS (i.e. HTTPS) for web browsing—at least
ostensibly—but
the CA system has been
a source of general unhappiness for quite some time. The discovery of fraudulent certificates in the
wild can only lead to more calls for changes to the CA model, and for some,
that model is irretrievably broken.
A Tor
project blog posting by Jacob Appelbaum (aka ioerror) has the most
detailed look at this particular incident. Basically, sometime around March
15, a CA (evidently UserTrust, which is part of Comodo) noticed that
certificates it had signed, but had never properly issued, were floating
around the internet. On March 15, the CA issued revocations for nine
different certificates.
Man in the middle
Part of the problem, though, is that certificate
revocation doesn't really work, as browsers are generally not checking
the revocation status of certificates. [Update: As pointed out in the comments, browsers do check the revocation status, but if that check fails, most do not give the user any indication of that.] In addition, many browsers do not
keep track of the certificates they have received, so they cannot alert
users when a site's certificate changes. So a fraudulent, but correctly signed, certificate offered
by a man-in-the-middle (MITM)
attacker will be accepted by most browsers with no indication to the user
of any problem.
MITM attacks have traditionally been considered difficult to pull off
because they require control of some intermediate node in the path between
the user and the web site. The pervasiveness of wireless networking has
reduced that barrier considerably. Any access point, even one using
encrypted communications (e.g. WPA), could be subverted—or
intentionally configured—to perform MITM attacks. Public WiFi
hotspots would be a perfect location for an attacker to set up a fraudulent
certificate for, say, PayPal, and sniff the credentials of any user who
connected to the service. Offering "Free Public
WiFi" in crowded places like airports might be another route to setting
up an MITM attack.
So far, only addons.mozilla.org
(the site for Firefox extensions) has been identified as a victim site, one
for which a fraudulent certificate has been issued. According to Appelbaum,
there are seven uniquely named certificates floating around (one of which
is issued to an invalid hostname: "global trustee"). He
speculates that "Facebook, Skype, Google, Microsoft, Mozilla, and
others are worthy of targeting". But Comodo has not yet released any
information about which hostnames were targeted; it was Mozilla that
disclosed that addons.mozilla.org was a target.
As pointed out by LWN reader sumC,
Microsoft has put out an advisory
that lists the affected domains: login.live.com, mail.google.com,
www.google.com, login.yahoo.com (3 certificates), login.skype.com, and
addons.mozilla.org.
Alerted by browser code changes
Since browsers do not generally check the revocation status of
certificates, some other mechanism must be used to reject these bad
certificates. A change
to the Chromium browser code on March 16 is how Appelbaum was first
alerted to the problem. That change added
a new X509Certificate::IsBlacklisted() function for Chromium,
which listed the serial numbers of multiple certificates, any of which
would cause the function to return "true". Some twelve hours later, Google
issued
a Chrome update that "blacklists a small number of HTTPS
certificates".
At around the same time, Mozilla also patched
Firefox with a similar, but not exactly the same, list of serial
numbers. This was clearly an indication that major browser vendors had been
alerted to a problem. Appelbaum started digging further, looking at the
published certificate revocation lists (CRLs) using his crlwatch program. Crlwatch
used the EFF's SSL
Observatory to find a canonical list of CRLs, then fetched them to look
for a match to any of the serial numbers blacklisted by Chromium or Mozilla.
He found matches for all but the two test certificates that were listed,
all pointing back to UserTrust. In addition, the Mozilla patch also points
to UserTrust as the compromised CA. That means an attacker was somehow able
to obtain UserTrust-signed certificates for domain names that should never
have been certified. That could happen if the signing key were
compromised or UserTrust was somehow tricked into issuing those
certificates. So far, there are no details as to how the fraudulent
certificates were created.
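For the curious, what a crlwatch-style check boils down to can be sketched with OpenSSL (this is a hedged illustration assuming the OpenSSL 1.1-era accessor functions, not how crlwatch itself is implemented): given a CRL that has already been downloaded to a file and a suspect serial number, see whether that serial appears among the revoked entries.

    /* Hedged sketch: does the given serial number (as a hex string) appear
     * in the revoked list of a DER-encoded CRL file? Returns 1 if revoked,
     * 0 if not listed, -1 on error. */
    #include <stdio.h>
    #include <openssl/x509.h>
    #include <openssl/bn.h>
    #include <openssl/asn1.h>

    int serial_in_crl(const char *crl_path, const char *serial_hex)
    {
        int found = -1;
        FILE *fp = fopen(crl_path, "rb");
        if (!fp)
            return -1;

        X509_CRL *crl = d2i_X509_CRL_fp(fp, NULL);  /* CRLs are usually DER */
        fclose(fp);
        if (!crl)
            return -1;

        /* Convert the hex serial into the ASN1_INTEGER form used by the CRL. */
        BIGNUM *bn = NULL;
        ASN1_INTEGER *target = NULL;
        if (BN_hex2bn(&bn, serial_hex) > 0)
            target = BN_to_ASN1_INTEGER(bn, NULL);

        if (target) {
            found = 0;
            STACK_OF(X509_REVOKED) *revoked = X509_CRL_get_REVOKED(crl);
            for (int i = 0; i < sk_X509_REVOKED_num(revoked); i++) {
                const X509_REVOKED *entry = sk_X509_REVOKED_value(revoked, i);
                if (ASN1_INTEGER_cmp(X509_REVOKED_get0_serialNumber(entry),
                                     target) == 0) {
                    found = 1;
                    break;
                }
            }
        }

        ASN1_INTEGER_free(target);
        BN_free(bn);
        X509_CRL_free(crl);
        return found;
    }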
Disclosure issues
As Appelbaum points out, it is clear that at least some browser makers were
alerted to the problem, but certainly not all. The Tor project releases a
"Tor Browser Bundle" but was not advised of the problem, so it is left
scrambling to update its browsers (though one would guess Appelbaum's
alertness in spotting the issue will have given the project a head start).
Other, smaller browsers are likely affected by the problem as well, but are just
now hearing about it.
Appelbaum agreed to an embargo on releasing information about the problem
until the Firefox 4
release on March 22. That embargo was extended to March 23 to ensure that
Microsoft could release an IE update, but Mozilla put out its posting
on the issue on March 22, at which point Appelbaum considered the
embargo lifted and posted his own information.
The embargo is very troubling since these certificates were evidently already
out there potentially causing trouble for users; hiding their existence
doesn't help users at all. Also worrisome is that
both Google and Mozilla told Appelbaum that the CA "had done
something quite remarkable by disclosing this compromise". It seems
that other CAs may have fallen prey to the same kinds of attacks and not
disclosed that fact. So it is possible that there are other known
fraudulent certificates in the wild, presumably just listed on CRLs without
any special browser blacklisting. An attacker holding one of those
certificates must be cackling with glee.
Even for the certificates that UserTrust/Comodo alerted about, it took some
time for browsers to be updated, and it will take even more time before users
get and install those updates. Since we don't know how the
certificate-issuing process was subverted, we have to hope that the CA is
taking proper steps to either invalidate the signing key if it was
compromised, or to change its process so things like this can't happen
again. It would also be good if UserTrust itself disclosed exactly which domain
names were affected, though we may already have that information via
Microsoft's advisory.
Browser certificate handling
It's easy to see that there is a problem with certificate handling in
browsers, but it is less clear what the proper solution is. Keeping track
of the certificate that is sent when a domain is first encountered and
comparing it on subsequent visits would be useful, but browser makers are
likely to struggle with how to present any certificate change to users.
Certificates expire and are changed for other legitimate reasons, so it may
be difficult for users to distinguish a legitimate certificate change from
one that may be happening due to an MITM attack.
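The core of such tracking is simple enough: compare a fingerprint of the certificate a site presents now against one recorded on an earlier visit; deciding what to tell the user when the two differ is the hard part. The sketch below assumes OpenSSL and a hypothetical storage layer for remembered fingerprints; it is not taken from any actual browser.

    /* Hedged sketch of trust-on-first-use certificate tracking: compute a
     * SHA-256 fingerprint of the presented certificate and compare it to
     * the fingerprint remembered from a previous visit. */
    #include <string.h>
    #include <openssl/x509.h>
    #include <openssl/evp.h>

    enum pin_result { PIN_FIRST_USE, PIN_MATCH, PIN_CHANGED };

    /* 'stored' is the 32-byte SHA-256 fingerprint remembered from an
     * earlier visit, or NULL if this host has never been seen before
     * (the storage layer is hypothetical and not shown). */
    enum pin_result check_pin(X509 *presented, const unsigned char *stored)
    {
        unsigned char fp[EVP_MAX_MD_SIZE];
        unsigned int fp_len = 0;

        if (!X509_digest(presented, EVP_sha256(), fp, &fp_len))
            return PIN_CHANGED;          /* treat digest failure as suspicious */

        if (!stored)
            return PIN_FIRST_USE;        /* record fp for the next visit */

        return memcmp(stored, fp, fp_len) == 0 ? PIN_MATCH : PIN_CHANGED;
    }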
The browsers could also default to using CRL or Online Certificate Status
Protocol (OCSP) queries to ensure that certificates are
still valid. That requires that the CA be available to answer those
queries every time a certificate is sent by a site, as any downtime will
result in web sites becoming unavailable. Adam Langley offers the idea of
short-lived certificates in his "Revocation doesn't work" blog posting that
was linked above. There are other possible solutions as well, some of
which Appelbaum mentions toward the end of his post.
The real problem, though, may be that the CA model doesn't work well on a
(largely) decentralized network like the internet. The problems range from
incidents like this one to worries about possibly rogue CAs. There are other ideas
for handling certificates in a non-CA world (or one where CAs have much
reduced authority), but there is a rather large stumbling block to changing
the current system: the enormous economic interest that the CAs have in
keeping things more or less as they are. CAs derive a huge amount of money
from their semi-monopoly on the issuing of certificates, and one could
expect that any threat to that income stream would be met with strong
resistance. But cases like this one may start to make it clear that changes
are required.
By Jake Edge
March 20, 2011
Way back in the early days of Linux, shortly after Linus Torvalds switched
the kernel from his own "non-commercial" license to the GPL, he also added
an important clarification to the kernel's license. In the
COPYING file at the
top of the kernel tree since mid-1993, there has been a clear statement
that Torvalds, at least, does not consider user-space programs to be derived
from the kernel, so they are not subject to the kernel's license:
This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived
work".
One could easily argue that this
distinction is one of the reasons that Linux is so popular today, as
programs written to run on Linux can be under whatever license the
developer chooses. Some recent analyses of Google's Bionic libc
implementation, which claim that Google may be violating the kernel's license,
seem to be missing—or misunderstanding—that clarification.
A blog
posting
from Raymond T. Nimmer, who is a professor specializing in intellectual
property (IP) law, was the starting point.
That posting looks at the boundaries between copyleft and
non-copyleft code. Nimmer specifically analyzes the question of whether
header files that specify an API to a GPL-covered work can be incorporated
into a program that is not released under the GPL. He points to
Google's use of the kernel header files in the Bionic library as an example
and concludes:
For entities that do not desire to disclose code or force their
customers to do so, or otherwise conform to copyleft obligations, working
with copyleft platforms and programs presents a very significant and
uncertain, risk-reward equation.
Nimmer's post was noticed by Edward J. Naughton, a practicing IP attorney,
who then wrote briefly
about it at the Huffington Post. Naughton also did a much
longer analysis [PDF] as an advisory for his law firm, Brown
Rudnick. That advisory concludes with a fairly ominous warning:
But if Google is right, if it has succeeded
in removing all copyrightable material from the Linux kernel headers, then it has unlocked the Linux kernel from the
restrictions of GPLv2. Google can now use the "clean" Bionic headers to create a non-GPL'd fork of the Linux kernel,
one that can be extended under proprietary license terms. Even if Google does not do this itself, it has enabled others
to do so. It also has provided a useful roadmap for those who might want to do the same thing with other GPLv2-licensed programs, such as databases.
In turn, Naughton and Nimmer's analyses were picked up by Florian Mueller,
who wrote a blog
post
about the serious threat that Google and Android face because of this
supposed GPL violation. So, is Google really circumventing
the GPL in a way that could threaten Linux? To answer that, we'll have to
dig into what Bionic is, how it is built, and whether it violates the
letter or spirit of the Linux COPYING file.
An interface for user space
The kernel exists to provide services to user space, and one can do nothing
useful from user space on a Linux system without invoking the kernel via a
system
call. That system call boundary is quite clear: invoking a system call
requires a special instruction that puts the CPU into kernel mode.
While programmers may see system calls as simple library calls, that's not
what's
happening under the covers.
In order to use Linux system calls, though, it is necessary to get
information from the kernel header files. Various pieces of information
are needed, including system call numbers (which are how the calls are
invoked), type information for system call arguments, and constants that
are required to invoke those calls properly. That information is
stored in the kernel headers and any program that wants to run on Linux
needs to get that information somehow.
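To make that concrete, the following small program (a minimal sketch, not taken from any particular project) invokes the write() system call by its number, which is exactly the kind of information that only the kernel headers can supply:

    /* Minimal illustration: invoke write(2) by its system call number.
     * __NR_write is the sort of constant that comes from the kernel's
     * exported headers; syscall() is glibc's generic trap into the kernel
     * (a program could go further still and use inline assembly). */
    #define _GNU_SOURCE
    #include <unistd.h>          /* syscall() */
    #include <sys/syscall.h>     /* __NR_write via <asm/unistd.h> */

    int main(void)
    {
        static const char msg[] = "invoked without the write() wrapper\n";

        syscall(__NR_write, 1, msg, sizeof(msg) - 1);
        return 0;
    }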
The most common way to invoke the kernel is by using GNU libc (glibc).
Glibc has a set of "sanitized" kernel header files that are used to build
the library, and distributions typically provide packages with those header
files to be installed into /usr/include. Programs can then be
built by using those header files and linking to glibc. While "sanitized"
may sound like it refers to the removal of GPL-covered elements, the main
reason for the sanitizing is to remove kernel-specific elements. The
kernel headers have lots of kernel-internal types,
constants, and functions that are not part of the kernel interface.
It isn't really correct to call the interface that the kernel
provides to user space an API (i.e. application programming interface), as
it is really an application binary
interface (ABI), and one that the kernel hackers strive to maintain for
each new kernel release. Removing something from the kernel ABI almost
never happens, though new features expand that ABI frequently. The ABI is
what allows binaries that were
built on an earlier kernel to run on newer kernels. The API, on the other
hand, is provided by
glibc or some other library.
Using glibc is just one way for a program to be built to run on Linux.
There are other libc implementations, including uClibc and dietlibc, which
are targeted at embedded devices, as well as the embedded fork of glibc,
EGLIBC. A program could also use assembly language instructions to make
system calls more directly. Using any of those methods to get at the
system call interface is perfectly reasonable, and will require information
from the kernel headers. Glibc may be the most popular, but it certainly
isn't the only way.
Android's Bionic libc is, at some level, just another alternative C library
implementation. It is based on libc from the BSDs with some Google
additions like a simple pthread implementation, and has a BSD license.
It's also a lot smaller than glibc—roughly half the size. The
license satisfies one of the goals
for Android: keeping the GPL out of user space. Glibc is not under the
GPL; it is licensed under the LGPL (v2 currently, with a plan to move to
v3). But even the LGPL may concern Google (and its partners), because
LGPLv3 requires that users be able to replace the
library—something that doesn't mesh well with locking down phones and
other Android devices. In the end,
it doesn't matter, as Google, like any other kernel user, can make Linux
system calls any way it chooses.
Bionic's use of kernel headers
So what does Google do that causes Nimmer, Naughton, and Mueller to claim
that it is circumventing the GPL to the detriment of the community? To
create the header files used by Bionic and by applications, Google processes
the kernel header files to remove all of the extra material that is either
only there for the kernel or doesn't make sense in the Bionic environment.
In short, with minor exceptions, Bionic is doing exactly what glibc does:
taking the kernel header files and massaging them into a form that defines
the interface, so that they can be used by the library itself and by any
applications that use it. Nor has Google hidden what it's done, as there is
a README.TXT
file that is quite clear on what it is doing and why it is doing it.
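As a purely hypothetical illustration of what survives that processing (this is not an actual Bionic or kernel header, and the values shown are illustrative, architecture-specific numbers), a cleaned header contains nothing beyond the functional interface information:

    /* Hypothetical example of post-processing header content: only the
     * functional ABI information remains -- system call numbers, flag
     * constants, and the structure layouts shared with the kernel. */
    #define __NR_gettimeofday   78          /* system call number (i386 value) */
    #define O_NONBLOCK          00004000    /* flag constant passed to open(2) */

    struct timeval {
        long tv_sec;                        /* seconds */
        long tv_usec;                       /* microseconds */
    };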
Glibc and others may be using the kernel headers that can be generated from
a kernel source tree by doing a "make headers_install". That
Makefile target was added to help library developers and distributions
create the header files that are required to use the kernel ABI. It is not
a requirement, as there are other ways to generate (or create) the required
headers, and various libraries have done it differently along the way.
The Android developers intend to eventually use the headers that can be
created from the kernel tree, but there are currently some technical
barriers to doing so. The
key piece to understand is that the information required to use the
kernel ABI is contained in one and only one place: the kernel header files.
There are two things that Bionic does that are perhaps a bit questionable.
The first is that, as part of munging the header files, it removes the
comments from them, including the copyright notice at the top of each file.
It replaces the copyright information with a generic "This header was
automatically generated ..." message, which concludes with:
"It contains only constants, structures, and macros generated from
the original header, and thus, contains no copyrightable
information." The latter part is likely what has the IP experts up
in arms. Much of Naughton and Nimmer's postings indicate that they believe
Google overreached in terms of copyright law by stating that the files do
not contain elements eligible for copyright protection.
They may be right in a technical sense, but it still may not make any
difference at all. Calling into the kernel requires constants and types
(structures mostly) that can only come from the kernel headers. Those make
up the functional definition of the ABI, and that ABI has been
explicitly cleared for use by non-GPL code. One could argue that
Google should keep the copyright information intact—one would guess
lawyers were involved in the decision not to and the wording of that
statement—but that is most likely only a nicety and not required once
one understands that those files just contain the ABI information, nothing
more.
Well, perhaps there is a bit more. The Bionic README notes that
"the 'clean headers' only contain type and macro definitions, with
the exception of a couple static inline functions used for performance
reason (e.g. optimized CPU-specific byte-swapping routines)". The
latter might be considered elements worthy of copyright
protection—and not part of the kernel ABI—but they might not be.
Those routines are written in assembly code, so they could well be
considered the only way to write efficient byte-swapping routines for each
architecture, and thus purely functional elements.
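To see why one might view them as purely functional (the kernel's actual routines differ; this is only an illustration), consider that a 32-bit byte swap can really only be expressed in a couple of ways: a generic shift-and-mask version, or the processor's dedicated instruction.

    /* Illustration only (not the kernel's code): two ways to reverse the
     * bytes of a 32-bit value. There is little room for creative
     * expression in either one. */
    #include <stdint.h>

    static inline uint32_t swab32_generic(uint32_t x)
    {
        return (x << 24) | ((x & 0xff00) << 8) |
               ((x >> 8) & 0xff00) | (x >> 24);
    }

    #if defined(__i386__) || defined(__x86_64__)
    static inline uint32_t swab32_x86(uint32_t x)
    {
        __asm__("bswap %0" : "+r" (x));     /* single-instruction byte swap */
        return x;
    }
    #endif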
Misunderstanding Torvalds
Both Naughton and Mueller make a big deal about a posting from Torvalds in
2003 that ends with the shout: "BUT YOU CAN NOT USE THE KERNEL HEADER
FILES TO CREATE NON-GPL'D BINARIES." While it would seem to be a
statement from Torvalds damning exactly what Google is doing, that would be
a misreading of what he is saying. One need look no further than the
subject of the thread ("Linux GPL and binary module exception
clause?") to see that the context is not about user-space
binaries, but instead about binary kernel modules. Torvalds may have been a
little loose with his terminology in that post, but stepping back through
the thread makes it clear he is talking about kernel modules. Furthermore,
in another post in that
same thread, he reiterates his stance on user-space programs:
This was indeed one of the worries that people had a long time ago, and is
one (but only one) of the reasons for the addition of the clarification to
the COPYING file for the kernel.
So I agree with you from a technical standpoint, and I claim that the
clarification in COPYING about user space usage through normal system
calls covers that special case.
But at the same time I do want to say that I discourage use of the kernel
header files for user programs for _other_ reasons (ie for the last 8
years or so, the suggestion has been to have a separate copy of the header
files for the user space library). But that's due to technical issues
(since I think the language of the COPYING file takes care of all
copyright issues): trying to avoid version dependencies.
That's a pretty unambiguous statement about using the kernel headers for
user-space programs. In fact, in the early days, the accepted practice was
to symbolically link the kernel headers into /usr/include, and one
might guess that any number of proprietary (and other non-GPL) programs
were built that way.
Torvalds is no lawyer (nor am I), but his (and the other kernel hackers')
intent is likely to be very important in the extremely unlikely case this
ever gets litigated.
It is almost amusing that Mueller argues that Google should switch to using
glibc, rather than Bionic. It reflects a grave misunderstanding of the
differences between the two libraries. If the Nimmer/Naughton arguments
are right, it's hard to see how glibc is any different. Their argument
essentially boils down to there being no way to use the kernel headers
without a requirement to apply the GPL to the resulting code.
It's certainly not impossible to imagine someone filing a lawsuit over
Android's use of the kernel headers. It's also possible that a judge might
rule
against Android. But given that kernel hackers want user-space programs to
use the ABI and that the COPYING file explicitly excludes those
programs from being considered derived works of the kernel, one would guess
that some kind of workaround would be found rather quickly. Other than the
fear, uncertainty, and doubt that these arguments might engender, one would
guess that
Google isn't really losing much sleep over them.
By Jonathan Corbet
March 23, 2011
One way to know that a product has become successful is to see others
lining up to attack it in the courts. Imitation may be the sincerest form
of flattery,
but FUD and lawsuits also show, in a rather less sincere way, a certain
form of respect. By this measure, the Android system, by virtue of being
the target of both, must be riding high indeed.
At the top of the news is Microsoft's just-announced lawsuit against Barnes
& Noble, an American bookstore chain. According to Microsoft, the Nook ebook reader, which is
based on Android, violates five of Microsoft's software patents. By now,
the fact that these patents cover trivial "innovations" should not come as
a surprise. Here is a quick overview of what Microsoft claims to own:
- #5,778,372:
"Remote retrieval and display management of electronic document with
incorporated images." What's described here is displaying a document
on top of a background image with the brilliant feature that the
document is displayed while the image is still transferring and the
whole page is redrawn once the image shows up.
- #5,889,522:
"System provided child window controls." This patent covers an
application settings dialog with tabs.
- #6,339,780:
"Loading status in a hypermedia browser having a limited available
display area." This patent covers the idea of putting up a "loading"
image over a page while the page is loading.
- #6,891,551:
"Selection handles in editing electronic documents." Covered here is
the technique of putting little images around a selection area so that
the user can, by dragging those handles, resize the area.
- #6,957,233:
"Method and apparatus for capturing and rendering annotations for
non-modifiable electronic content" - a method for storing
"annotations" outside of the text of a read-only document. Broadly
read, this patent could be said to cover browser bookmarks, an idea
that somebody might just have thought of before this patent's 1999
filing date.
It seems unlikely that any of these patents would stand up against a
suitably determined challenge in court - though such an opinion does
certainly rely on a possibly naïve view of the rationality of American
patent courts. In the real world, challenging patents is a time-consuming
and expensive affair. Barnes & Noble, which may well have been chosen
because it looks like a weak target, may not have an appetite for that kind
of fight. That said, the company has, evidently, refused offers to license
these patents from Microsoft; its management had to know that a lawsuit
would be the next step.
Perhaps we'll see a counterattack from the rest of the industry; the idea
of paying rent to Microsoft for the privilege of using bookmarks is
unlikely to have broad appeal. Or, perhaps, the entire mobile marketplace
in the US will simply collapse under the weight of the steadily increasing
pile of patent-related lawsuits.
The patent suit is not the only attack currently being mounted against
Android; there is also, seemingly, a determined FUD campaign underway. Most recently,
this campaign has taken the form of assertions that Android is violating
the Linux kernel's license; LWN examined those
claims in a separate article. It's worth noting that one of the
proponents of these claims has
represented Microsoft in the past - a fact which he chose to remove
from his online
biography shortly before beginning the attack.
The goal of these attacks seems clear - to inject fear, uncertainty, and
doubt into the growing Android ecosystem. Hardware vendors have been
served notice
that they may well be sued for using Android. Software vendors are being
told that writing for Android could expose them to copyright infringement
claims. It's
all aimed at getting these companies to reconsider investing in Android in
favor of the "safer" alternatives.
The thing is, of course, we have seen this kind of FUD campaign before. As
early as the 1990s, people were warning that use of Linux could cause
companies to "lose their intellectual property." Fortunately, most
companies have figured out that they face no such threat. So, when we see
something like this
ludicrous claim:
If Google is proven wrong, pretty much that entire software stack
-- and also many popular third-party closed-source components such
as the Angry Birds game and the Adobe Flash Player -- would
actually have to be published under the GPL.
we can relax in the knowledge that we have seen such things before. Linux
turns out to be surprisingly resilient to FUD; this campaign seems
unlikely to even slow things down.
It is worth noting that, despite the recent claims that companies working
with Android might be sued by free software developers, the actual lawsuit
was filed by a company which is generally hostile to free software. There
are indeed threats out there, but they do not come from our community; they
come from companies which feel they are losing in the market and which, as
such companies so often do, are turning to the courts instead. It is one
of the less pleasant signs of success; our community should feel flattered
indeed.