Has Bionic stepped over the GPL line?
Way back in the early days of Linux, shortly after Linus Torvalds switched the kernel from his own "non-commercial" license to the GPL, he also added an important clarification to the kernel's license. In the COPYING file at the top of the kernel tree since mid-1993, there has been a clear statement that Torvalds, at least, does not consider user-space programs to be derived from the kernel, and thus does not consider them subject to the kernel's license:
One could easily argue that this distinction is one of the reasons that Linux is so popular today, as programs written to run on Linux can be under whatever license the developer chooses. Some recent analyses of Google's Bionic libc implementation, which claim that Google may be violating the kernel's license, seem to be missing, or misunderstanding, that clarification.
A blog posting from Raymond T. Nimmer, who is a professor specializing in intellectual property (IP) law, was the starting point. That posting looks at the boundaries between copyleft and non-copyleft code. Nimmer specifically analyzes the question of whether header files that specify an API to a GPL-covered work can be incorporated into a program that is not released under the GPL. He points to Google's use of the kernel header files in the Bionic library as an example and concludes:
Nimmer's post was noticed by Edward J. Naughton, a practicing IP attorney, who then wrote briefly about it at the Huffington Post. Naughton also did a much longer analysis [PDF] as an advisory for his law firm, Brown Rudnick. That advisory concludes with a fairly ominous warning:
In turn, Naughton and Nimmer's analyses were picked up by Florian Mueller, who wrote a blog post about the serious threat that Google and Android face because of this supposed GPL violation. So, is Google really circumventing the GPL in a way that could threaten Linux? To answer that, we'll have to dig into what Bionic is, how it is built, and whether it violates the letter or spirit of the Linux COPYING file.
An interface for user space
The kernel exists to provide services to user space, and one can do nothing useful from user space on a Linux system without invoking the kernel via a system call. That system call boundary is quite clear: invoking a system call requires a special instruction that puts the CPU into kernel mode. While programmers may see system calls as simple library calls, that's not what's happening under the covers.
In order to use Linux system calls, though, it is necessary to get information from the kernel header files. Various pieces of information are needed including system call numbers (which is how they are invoked), type information for various system call arguments, as well as constants that are required to properly invoke those calls. That information is stored in the kernel headers and any program that wants to run on Linux needs to get that information somehow.
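To make the dependence on header-derived information concrete, here is a minimal sketch that invokes getpid by number rather than by name. The table of numbers is the sort of constant that ultimately comes from the kernel's headers; only two 64-bit architectures are listed, and the helper name raw_getpid is hypothetical. Note that the call still passes through the C library's generic syscall() entry point, which is what actually executes the trap into the kernel.

```python
import ctypes
import os
import platform
import sys

# System call numbers for getpid, as defined in the kernel's unistd headers.
# Assumption: only two 64-bit architectures are listed, for illustration.
GETPID_NR = {"x86_64": 39, "aarch64": 172}


def raw_getpid():
    """Invoke getpid by number through libc's generic syscall() function.

    Returns None on platforms that are not covered by the table above.
    """
    if not sys.platform.startswith("linux") or sys.maxsize <= 2**32:
        return None
    nr = GETPID_NR.get(platform.machine())
    if nr is None:
        return None
    libc = ctypes.CDLL(None, use_errno=True)  # the already-loaded C library
    libc.syscall.restype = ctypes.c_long
    return libc.syscall(ctypes.c_long(nr))


if __name__ == "__main__":
    # On a covered platform the two values should match.
    print(raw_getpid(), os.getpid())
```

Without the right number for the running architecture, the call would reach some other system call entirely, which is why every C library needs that information from somewhere.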
The most common way to invoke the kernel is by using GNU libc (glibc). Glibc has a set of "sanitized" kernel header files that are used to build the library, and distributions typically provide packages with those header files to be installed into /usr/include. Programs can then be built by using those header files and linking to glibc. While "sanitized" may sound like it refers to the removal of GPL-covered elements from those files, the main reason it is done is to remove kernel-specific elements from the files. The kernel headers have lots of kernel-internal types, constants, and functions that are not part of the kernel interface.
It isn't really correct to call the interface that the kernel provides to user space an API (i.e. application programming interface), as it is really an application binary interface (ABI), and one that the kernel hackers strive to maintain for each new kernel release. Removing something from the kernel ABI almost never happens, though new features expand that ABI frequently. The ABI is what allows binaries that were built on an earlier kernel to run on newer kernels. The API, on the other hand, is provided by glibc or some other library.
Using glibc is just one way for a program to be built to run on Linux. There are other libc implementations, including uClibc and dietlibc, which are targeted at embedded devices, as well as the embedded fork of glibc, EGLIBC. A program could also use assembly language instructions to make system calls more directly. Using any of those methods to get at the system call interface is perfectly reasonable, and will require information from the kernel headers. Glibc may be the most popular, but it certainly isn't the only way.
Android's Bionic libc is, at some level, just another alternative C library implementation. It is based on libc from the BSDs with some Google additions like a simple pthread implementation, and has a BSD license. It's also a lot smaller than glibc, roughly half the size. The license satisfies one of the goals for Android: keeping the GPL out of user space. Glibc is not under the GPL, as it is licensed under the LGPL (v2 currently, with a plan to move it to v3), but even that license may concern Google (and its partners) because LGPLv3 requires that users be able to replace the library, something that doesn't mesh well with locking down phones and other Android devices. In the end, it doesn't matter, as Google, like any other kernel user, can make Linux system calls any way it chooses.
Bionic's use of kernel headers
So what does Google do that causes Nimmer, Naughton, and Mueller to claim that it is circumventing the GPL to the detriment of the community? To create the header files used by Bionic and by applications, Google processes the kernel header files to remove all of the extra material that is either only there for the kernel or doesn't make sense in the Bionic environment. In short, with minor exceptions, Bionic is doing exactly what glibc is doing: taking the kernel header files and massaging them into a form that defines the interface so that they can be used by the library itself and any applications that use the library. Nor has Google hidden what it has done, as there is a README.TXT file that is quite clear on what it is doing and why it is doing it.
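As a rough illustration of that kind of processing, the sketch below drops C comments and anything guarded by #ifdef __KERNEL__ from a header's text, keeping the user-space side of an #else branch. It is not Bionic's actual tool and not what "sanitizing" does in glibc; the function name sanitize_header is hypothetical, and the handling is deliberately simplistic (it ignores #elif, // comments, and comment-like text inside string literals).

```python
import re


def sanitize_header(text: str) -> str:
    """Return header text without comments or #ifdef __KERNEL__ blocks."""
    # Strip /* ... */ comments, including multi-line ones.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.DOTALL)

    kept = []
    depth = 0           # current nesting depth of preprocessor conditionals
    guard_depth = None  # depth at which an #ifdef __KERNEL__ was opened
    skipping = False    # True while inside the kernel-only branch

    for line in text.splitlines():
        s = line.strip()
        if re.match(r"#\s*if", s):  # matches #if, #ifdef, and #ifndef
            depth += 1
            if guard_depth is None and re.match(r"#\s*ifdef\s+__KERNEL__\b", s):
                guard_depth, skipping = depth, True
                continue
        elif re.match(r"#\s*else\b", s) and depth == guard_depth:
            skipping = False  # the #else branch is the user-space side
            continue
        elif re.match(r"#\s*endif\b", s):
            closes_guard = depth == guard_depth
            depth -= 1
            if closes_guard:
                guard_depth, skipping = None, False
                continue
        if not skipping and s:
            kept.append(line)
    return "\n".join(kept) + "\n"
```

What survives such a pass is mostly constants, structure definitions, and macros, which is the material a library needs in order to describe the kernel interface.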
Glibc and others may be using the kernel headers that can be generated from a kernel source tree by doing a "make headers_install". That Makefile target was added to help library developers and distributions create the header files that are required to use the kernel ABI. It is not a requirement, as there are other ways to generate (or create) the required headers, and various libraries have done it differently along the way. The Android developers intend to eventually use the headers that can be created from the kernel tree, but there are currently some technical barriers to doing so. The key piece to understand is that the information required to use the kernel ABI is contained in one and only one place: the kernel header files.
There are two things that Bionic does that are perhaps a bit questionable. The first is that as part of munging the header files, it removes the comments from them, including the copyright notice at the top of the file. It replaces the copyright information with a generic "This header was automatically generated ..." message, which concludes with: "It contains only constants, structures, and macros generated from the original header, and thus, contains no copyrightable information." The latter part is likely what has the IP experts up in arms. Much of Naughton and Nimmer's postings indicates that they believe Google overreached in terms of copyright law by stating that the files do not contain elements eligible for copyright protection.
They may be right in a technical sense, but it still may not make any difference at all. Calling into the kernel requires constants and types (structures mostly) that can only come from the kernel headers. Those make up the functional definition of the ABI, and that ABI has been explicitly cleared for use by non-GPL code. One could argue that Google should keep the copyright information intact—one would guess lawyers were involved in the decision not to and the wording of that statement—but that is most likely only a nicety and not required once one understands that those files just contain the ABI information, nothing more.
Well, perhaps there is a bit more. The Bionic README notes that "the 'clean headers' only contain type and macro definitions, with the exception of a couple static inline functions used for performance reason (e.g. optimized CPU-specific byte-swapping routines)". The latter might be considered elements worthy of copyright protection, and not part of the kernel ABI, but they might not. Those routines are written in assembly code, so they might well be considered to be the only way to efficiently write byte-swapping routines for each of the architectures and thus might be considered purely functional elements.
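To give a sense of how little room for expression such a routine leaves, here is a portable sketch of a 32-bit byte swap. It is not the assembly code found in the kernel or Bionic headers, and the name bswap32 is used purely for illustration; the point is only that the operation is fully determined by what it has to compute.

```python
def bswap32(x: int) -> int:
    """Reverse the byte order of a 32-bit unsigned value."""
    x &= 0xFFFFFFFF  # confine the input to 32 bits
    return (
        ((x & 0x000000FF) << 24)    # lowest byte moves to the top
        | ((x & 0x0000FF00) << 8)   # second byte moves up one position
        | ((x >> 8) & 0x0000FF00)   # third byte moves down one position
        | ((x >> 24) & 0x000000FF)  # highest byte moves to the bottom
    )
```

The CPU-specific versions exist because some architectures can do the same job in a single instruction, which narrows the choices for an efficient implementation even further.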
Misunderstanding Torvalds
Both Naughton and Mueller make a big deal about a posting from Torvalds in 2003 that ends with the shout: "BUT YOU CAN NOT USE THE KERNEL HEADER FILES TO CREATE NON-GPL'D BINARIES." While it would seem to be a statement from Torvalds damning exactly what Google is doing, that would be a misreading of what he is saying. One need look no further than the subject of the thread ("Linux GPL and binary module exception clause?") to see that the context is not about user-space binaries, but instead about binary kernel modules. Torvalds may have been a little loose with his terminology in that post, but stepping back through the thread makes it clear he is talking about kernel modules. Furthermore, in another post in that same thread, he reiterates his stance on user-space programs:
So I agree with you from a technical standpoint, and I claim that the clarification in COPYING about user space usage through normal system calls covers that special case.
But at the same time I do want to say that I discourage use of the kernel header files for user programs for _other_ reasons (ie for the last 8 years or so, the suggestion has been to have a separate copy of the header files for the user space library). But that's due to technical issues (since I think the language of the COPYING file takes care of all copyright issues): trying to avoid version dependencies.
That's a pretty unambiguous statement about using the kernel headers for user-space programs. In fact, in the early days, the accepted practice was to symbolically link the kernel headers into /usr/include, and one might guess that any number of proprietary (and other non-GPL) programs were built that way. Torvalds is no lawyer (nor am I), but his (and the other kernel hackers') intent is likely to be very important in the extremely unlikely case this ever gets litigated.
It is almost amusing that Mueller argues that Google should switch to using glibc, rather than Bionic. It reflects a grave misunderstanding of the differences between the two libraries. If the Nimmer/Naughton arguments are right, it's hard to see how glibc is any different. Their argument essentially boils down to there being no way to use the kernel headers without a requirement to apply the GPL to the resulting code.
It's certainly not impossible to imagine someone filing a lawsuit over Android's use of the kernel headers. It's also possible that a judge might rule against Android. But given that kernel hackers want user-space programs to use the ABI and that the COPYING file explicitly excludes those programs from being considered derived works of the kernel, one would guess that some kind of workaround would be found rather quickly. Other than the fear, uncertainty, and doubt that these arguments might engender, one would guess that Google isn't really losing much sleep over them.
