By Jake Edge
March 20, 2011
Way back in the early days of Linux, shortly after Linus Torvalds switched
the kernel from his own "non-commercial" license to the GPL, he also added
an important clarification to the kernel's license. In the
COPYING file at the
top of the kernel tree since mid-1993, there has been a clear statement
that Torvalds, at least, does not consider user-space programs to be derived
from the kernel, so they are not subject to the kernel's license:
This copyright does *not* cover user programs that use kernel
services by normal system calls - this is merely considered normal use
of the kernel, and does *not* fall under the heading of "derived
work".
One could easily argue that this
distinction is one of the reasons that Linux is so popular today, as
programs written to run on Linux can be under whatever license the
developer chooses. Some recent analyses of Google's Bionic libc
implementation, which claim that Google may be violating the kernel's license,
seem to be missing—or misunderstanding—that clarification.
A blog
posting
from Raymond T. Nimmer, who is a professor specializing in intellectual
property (IP) law, was the starting point.
That posting looks at the boundaries between copyleft and
non-copyleft code. Nimmer specifically analyzes the question of whether
header files that specify an API to a GPL-covered work can be incorporated
into a program that is not released under the GPL. He points to
Google's use of the kernel header files in the Bionic library as an example
and concludes:
For entities that do not desire to disclose code or force their
customers to do so, or otherwise conform to copyleft obligations, working
with copyleft platforms and programs presents a very significant and
uncertain, risk-reward equation.
Nimmer's post was noticed by Edward J. Naughton, a practicing IP attorney,
who then wrote briefly
about it at the Huffington Post. Naughton also did a much
longer analysis [PDF] as an advisory for his law firm, Brown
Rudnick. That advisory concludes with a fairly ominous warning:
But if Google is right, if it has succeeded
in removing all copyrightable material from the Linux kernel headers, then it has unlocked the Linux kernel from the
restrictions of GPLv2. Google can now use the "clean" Bionic headers to create a non-GPL'd fork of the Linux kernel,
one that can be extended under proprietary license terms. Even if Google does not do this itself, it has enabled others
to do so. It also has provided a useful roadmap for those who might want to do the same thing with other GPLv2-licensed programs, such as databases.
In turn, Naughton's and Nimmer's analyses were picked up by Florian Mueller,
who wrote a blog
post
about the serious threat that Google and Android face because of this
supposed GPL violation. So, is Google really circumventing
the GPL in a way that could threaten Linux? To answer that, we'll have to
dig into what Bionic is, how it is built, and whether it violates the
letter or spirit of the Linux COPYING file.
An interface for user space
The kernel exists to provide services to user space, and one can do nothing
useful from user space on a Linux system without invoking the kernel via a
system
call. That system call boundary is quite clear: invoking a system call
requires a special instruction that puts the CPU into kernel mode.
While programmers may see system calls as simple library calls, that's not
what's
happening under the covers.
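As a rough illustration of that difference, the sketch below asks for the process ID twice: once through the ordinary C library wrapper, and once through the generic syscall() helper that glibc and other C libraries provide, passing the system call number explicitly. The function names pid_via_wrapper() and pid_via_syscall() are made up for this example; only getpid(), syscall(), and SYS_getpid come from the standard headers.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid: the system call number */
#include <sys/types.h>     /* pid_t */
#include <unistd.h>        /* getpid(), syscall() */

/* Library view: this looks like any other C function call. */
static pid_t pid_via_wrapper(void)
{
    return getpid();
}

/*
 * Closer to the boundary: hand the kernel a system call number and let
 * the C library execute the architecture's trap instruction for us.
 */
static pid_t pid_via_syscall(void)
{
    return (pid_t)syscall(SYS_getpid);
}
```

Both functions should return the same value on a Linux system; the only thing the second one needs beyond the C library is the number behind SYS_getpid, which ultimately comes from the kernel headers.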
In order to use Linux system calls, though, it is necessary to get
information from the kernel header files. Various pieces of information
are needed including system call numbers (which is how they are invoked),
type information for various system call arguments, as well as constants
that are required to properly invoke those calls. That information is
stored in the kernel headers and any program that wants to run on Linux
needs to get that information somehow.
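A minimal sketch of those three kinds of information in use might look like the following. It assumes a Linux system whose C library exposes syscall() and whose struct utsname matches the layout the kernel fills in, as is the case with glibc on common architectures; the helper names raw_uname(), raw_open_root(), and raw_close() are hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>         /* constants: AT_FDCWD, O_RDONLY, O_DIRECTORY */
#include <sys/syscall.h>   /* numbers: SYS_uname, SYS_openat, SYS_close */
#include <sys/utsname.h>   /* type: struct utsname */
#include <unistd.h>        /* syscall() */

/* A system call number plus a structure type the kernel fills in. */
static int raw_uname(struct utsname *u)
{
    return (int)syscall(SYS_uname, u);
}

/* A system call number plus flag constants the kernel interprets. */
static int raw_open_root(void)
{
    return (int)syscall(SYS_openat, AT_FDCWD, "/", O_RDONLY | O_DIRECTORY);
}

/* Release the descriptor returned by raw_open_root(). */
static int raw_close(int fd)
{
    return (int)syscall(SYS_close, fd);
}
```

None of the values involved (the call numbers, the flag bits, the field sizes in the structure) can be guessed; a library has to obtain them, directly or indirectly, from the kernel's own definitions.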
The most common way to invoke the kernel is by using GNU libc (glibc).
Glibc has a set of "sanitized" kernel header files that are used to build
the library, and distributions typically provide packages with those header
files to be installed into /usr/include. Programs can then be
built by using those header files and linking to glibc. While "sanitized"
may sound like it refers to the removal of GPL-covered elements from those
files, the main reason it is done is to remove kernel-specific elements
from the files. The kernel headers have lots of kernel-internal types,
constants, and functions that are not part of the kernel interface.
It isn't really correct to call the interface that the kernel
provides to user space an API (i.e. application programming interface), as
it is really an application binary
interface (ABI), and one that the kernel hackers strive to maintain for
each new kernel release. Removing something from the kernel ABI almost
never happens, though new features expand that ABI frequently. The ABI is
what allows binaries that were
built on an earlier kernel to run on newer kernels. The API, on the other
hand, is provided by
glibc or some other library.
Using glibc is just one way for a program to be built to run on Linux.
There are other libc implementations, including uClibc and dietlibc, which
are targeted at embedded devices, as well as the embedded fork of glibc,
EGLIBC. A program could also use assembly language instructions to make
system calls more directly. Using any of those methods to get at the
system call interface is perfectly reasonable, and will require information
from the kernel headers. Glibc may be the most popular, but it certainly
isn't the only way.
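To give a sense of what the assembly route looks like, here is a sketch for x86-64 only, using GCC-style inline assembly; on other architectures it simply falls back to the C library's syscall() helper. The function name raw_getpid() is invented for this example, and the register usage shown (call number and return value in rax, rcx and r11 clobbered by the syscall instruction) is specific to x86-64.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>   /* SYS_getpid */
#include <unistd.h>        /* syscall(), used for the fallback path */

static long raw_getpid(void)
{
#if defined(__x86_64__)
    long ret;

    /*
     * x86-64 convention: the call number goes in rax, the result comes
     * back in rax, and the "syscall" instruction overwrites rcx and r11.
     */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)
                      : "rcx", "r11", "memory");
    return ret;
#else
    /* Other architectures use different instructions and registers. */
    return syscall(SYS_getpid);
#endif
}
```

Even here, with no C library call in the x86-64 path, the program still depends on a system call number that is defined by the kernel.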
Android's Bionic libc is, at some level, just another alternative C library
implementation. It is based on libc from the BSDs with some Google
additions like a simple pthread implementation, and has a BSD license.
It's also a lot smaller than glibc—roughly half the size. The
license satisfies one of the goals
for Android: keeping the GPL out of user space. While glibc is not
under the GPL, as it is licensed under the LGPL (v2 currently, with a plan
to move it to v3), that planned move may concern Google
(and its partners) because LGPLv3 requires that users be able to replace the
library—something that doesn't mesh well with locking down phones and
other Android devices. In the end,
it doesn't matter, as Google, like any other kernel user, can make Linux
system calls any way it chooses.
Bionic's use of kernel headers
So what does Google do that causes Nimmer, Naughton, and Mueller to claim
that it is circumventing the GPL to the detriment of the community? To
create the header files used by Bionic and by applications, Google processes
the kernel header files to remove all of the extra stuff that is either
only there for the kernel, or doesn't make sense in the Bionic environment.
In short, with minor exceptions, Bionic is doing exactly what glibc is
doing, taking the kernel header files and massaging them into a form that
defines the interface so that they can be used by the library itself and
any applications
that use the library. Nor has Google hidden what it's done, as there is
a README.TXT
file that is quite clear on what it is doing and why it is doing it.
Glibc and others may be using the kernel headers that can be generated from
a kernel source tree by doing a "make headers_install". That
Makefile target was added to help library developers and distributions
create the header files that are required to use the kernel ABI. It is not
a requirement, as there are other ways to generate (or create) the required
headers, and various libraries have done it differently along the way.
The Android developers intend to eventually use the headers that can be
created from the kernel tree, but there are currently some technical
barriers to doing so. The
key piece to understand is that the information required to use the
kernel ABI is contained in one and only one place: the kernel header files.
There are two things that Bionic does that are perhaps a bit questionable.
The first is that as part of munging the header files, it removes the
comments from them, including the copyright notice at the top of the file.
It replaces the copyright information with a generic "This header was
automatically generated ..." message, which concludes with:
"It contains only constants, structures, and macros generated from
the original header, and thus, contains no copyrightable
information." The latter part is likely what has the IP experts up
in arms. Much of Naughton's and Nimmer's writing indicates that they believe
Google overreached in terms of copyright law by stating that the files do
not contain elements eligible for copyright protection.
They may be right in a technical sense, but it still may not make any
difference at all. Calling into the kernel requires constants and types
(structures mostly) that can only come from the kernel headers. Those make
up the functional definition of the ABI, and that ABI has been
explicitly cleared for use by non-GPL code. One could argue that
Google should keep the copyright information intact—one would guess
lawyers were involved in the decision not to and the wording of that
statement—but that is most likely only a nicety and not required once
one understands that those files just contain the ABI information, nothing
more.
Well, perhaps there is a bit more. The Bionic README notes that
"the 'clean headers' only contain type and macro definitions, with
the exception of a couple static inline functions used for performance
reason (e.g. optimized CPU-specific byte-swapping routines)". The
latter might be considered elements worthy of copyright
protection (and not part of the kernel ABI), but then again they might
not. Those routines are written in assembly code, so they might well be
considered to be the only way to efficiently write byte-swapping routines for
each of the architectures and thus might be considered purely functional
elements.
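For a sense of how little room for expression such a routine leaves, here is a portable C sketch of a 32-bit byte swap. It is not taken from the kernel or from Bionic, where the optimized versions use architecture-specific assembly instead, and the name swab32_sketch() is hypothetical.

```c
#include <stdint.h>

/* Reverse the order of the four bytes in a 32-bit value. */
static inline uint32_t swab32_sketch(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) <<  8) |
           ((x & 0x00ff0000u) >>  8) |
           ((x & 0xff000000u) >> 24);
}
```

Any correct implementation has to move the same four bytes to the same four places, which is roughly the intuition behind calling such code purely functional.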
Misunderstanding Torvalds
Both Naughton and Mueller make a big deal about a posting from Torvalds in
2003 that ends with the shout: "BUT YOU CAN NOT USE THE KERNEL HEADER
FILES TO CREATE NON-GPL'D BINARIES." While it would seem to be a
statement from Torvalds damning exactly what Google is doing, that would be
a misreading of what he is saying. One need look no further than the
subject of the thread ("Linux GPL and binary module exception
clause?") to see that the context is not about user-space
binaries, but instead about binary kernel modules. Torvalds may have been a
little loose with his terminology in that post, but stepping back through
the thread makes it clear he is talking about kernel modules. Furthermore,
in another post in that
same thread, he reiterates his stance on user-space programs:
This was indeed one of the worries that people had a long time ago, and is
one (but only one) of the reasons for the addition of the clarification to
the COPYING file for the kernel.
So I agree with you from a technical standpoint, and I claim that the
clarification in COPYING about user space usage through normal system
calls covers that special case.
But at the same time I do want to say that I discourage use of the kernel
header files for user programs for _other_ reasons (ie for the last 8
years or so, the suggestion has been to have a separate copy of the header
files for the user space library). But that's due to technical issues
(since I think the language of the COPYING file takes care of all
copyright issues): trying to avoid version dependencies.
That's a pretty unambiguous statement about using the kernel headers for
user-space programs. In fact, in the early days, the accepted practice was
to symbolically link the kernel headers into /usr/include, and one
might guess that any number of proprietary (and other non-GPL) programs
were built that way.
Torvalds is no lawyer (nor am I), but his (and the other kernel hackers')
intent is likely to be very important in the extremely unlikely case this
ever gets litigated.
It is almost amusing that Mueller argues that Google should switch to using
glibc, rather than Bionic. It reflects a grave misunderstanding of the
differences between the two libraries. If the Nimmer/Naughton arguments
are right, it's hard to see how glibc is any different. Their argument
essentially boils down to there being no way to use the kernel headers
without a requirement to apply the GPL to the resulting code.
It's certainly not impossible to imagine someone filing a lawsuit over
Android's use of the kernel headers. It's also possible that a judge might
rule
against Android. But given that kernel hackers want user-space programs to
use the ABI and that the COPYING file explicitly excludes those
programs from being considered derived works of the kernel, one would guess
that some kind of workaround would be found rather quickly. Other than the
fear, uncertainty, and doubt that these arguments might engender, one would
guess that
Google isn't really losing much sleep over them.