Where did that code come from?
[Posted June 4, 2003 by corbet]
SCO's lawsuit claims that code has been copied from its (or somebody's)
proprietary code base into the Linux kernel. Beyond that, SCO claims that
IBM, in particular, is responsible for that copying. These claims remain
hypothetical as long as SCO refuses to provide any proof. As an
intellectual exercise, let us imagine for a moment that SCO is actually
able to produce examples of code that appear to have been copied from one
system to the other. How then do they go about proving where it came from?
The kernel development process is nearly unique in a couple of ways. In
one sense, it is one of the most tightly controlled projects out there;
one - and exactly one - person can commit code to the mainline kernel
tree. If Linus does not merge a patch, it simply does not go in. On the
other hand, Linus did not use any sort of source code management system
until early 2002. He also did not maintain any sort of records of what he
merged, as far as anybody can tell; changelogs for kernel releases had to
be created by digging through the large release-to-release patches and
seeing what changed.
So, while Linus is the "choke point" through which all
patches pass, the record of what happened at that point is limited to his
official kernel releases. One can examine Linus's releases to determine,
down to the individual release, when a particular patch went in and,
importantly, the evolutionary steps it took to reach its present state.
Figuring out where it came from will be another story.
Since there is little information to be had from Linus on the provenance of
patches before the BitKeeper era (which is the time period SCO is
interested in), it will be necessary to try to trace any offending patches
backward. And that means looking at how code reaches Linus. The basic
nature of the submission process has not changed in a long time.
Some code is written by Linus himself. Linus's contributions have
become a very small part of the whole, but he does still have something to
add at times. It is probably safe to assume that Linus is not copying his
work from proprietary Unix.
Some patches get to him by way of the linux-kernel mailing list. It
is rare for Linus to pick up patches directly from linux-kernel, but it
does happen. If a particular piece of allegedly infringing code was posted
publicly, it should be possible to determine who sent it out. Chances are
that the SCO investigators, if they really have infringing code to show,
have been digging through the linux-kernel archives in the hopes of finding
this sort of "smoking gun." The thought of SCO lawyers wading through old
devfs flamewars is good for a smile or two.
Many patches go directly to Linus, often with no public posting at all.
For example, much of Alexander Viro's work - big changes to core parts of
the kernel - is first seen when it shows up in a kernel release. There
will be no record of these contributions other than Linus's memory and,
perhaps, any existing backups of his mail spool. Unless this code comes
with a comment like:
/*
* Copyright © Caldera International
*
* Ripped off in 2000 by plagiarist@ibm.com; I'm too lazy to
* do this myself and they'll never notice.
*/
it will be very hard for SCO (or anybody else) to prove where the code came
from.
The rest of the patches arrive by way of one of the "lieutenants" -
developers like Alan Cox, David Miller, Greg Kroah-Hartman, Andrew Morton,
and others. Some of these developers have used source management systems
at times; others have not. Again, much of this code goes into the kernel
without ever being posted on a public list. It can have two layers of
private communication obscuring its true origin.
The end result of all the above is that the kernel development process is
not quite as open as many people believe. A lot of code is posted
publicly and its authors duly flamed for anything that does not look quite
right. But a lot of code takes a quieter path and only sees the light of
day when it shows up mixed into a development kernel. Much of the code
that went into the 2.3 development series could be nearly impossible to
trace back to its contributor.
It is also worth bearing in mind that no sort of paperwork is required to
contribute code to the kernel. No copyright assignments, no warranties of
originality, no indemnification. So there is no paper trail behind
contributions to the Linux kernel - at least, not on the Linus side of
things. One can only assume that companies like IBM have rather more
rigorous procedures internally. But before that matters, a particular
chunk of code will have to be traced back to IBM, and that could prove
difficult.
The "low-ceremony" nature of kernel development is one of its attractions;
the only thing you need in order to play the game is some worthwhile
code. It would be a shame if legal pressures eventually forced Linus to
erect a wall of paperwork between himself and aspiring contributors.
For SCO's purposes, however, it is too late. Unless an IBM employee went
out of his or her way to attach their name to a code contribution via a
public posting or internal comments, it may never be possible to prove the
origin of that contribution. And that could be bad news for SCO, which has
gone out of its way to state that IBM, in particular, is responsible for
the copying that SCO claims has occurred.