Where did that code come from?

[Posted June 4, 2003 by corbet]

SCO's lawsuit claims that code has been copied from its (or somebody's) proprietary code base into the Linux kernel. Beyond that, SCO claims that IBM, in particular, is responsible for that copying. These claims remain hypothetical as long as SCO refuses to provide any proof. As an intellectual exercise, let us imagine for a moment that SCO is actually able to produce examples of code that appear to have been copied from one system to the other. How then do they go about proving where it came from?

The kernel development process is nearly unique in a couple of ways. In one sense, it is one of the most tightly controlled projects out there; one - and exactly one - person can commit code to the mainline kernel tree. If Linus does not merge a patch, it simply does not go in. On the other hand, Linus did not use any sort of source code management system until early 2002. He also did not maintain any sort of records of what he merged, as far as anybody can tell; changelogs for kernel releases had to be created by digging through the large release-to-release patches and seeing what changed. So, while Linus is the "choke point" through which all patches pass, the record of what happened at that point is limited to his official kernel releases. One can look at Linus's output to determine, with great precision, when a particular patch went in, and, importantly, the evolutionary steps it took to get to its present state. Figuring out where it came from will be another story.
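To make the "digging through release-to-release patches" step concrete, here is a minimal sketch of how one might summarize which files a unified diff touches and how many lines it adds or removes in each. The function name `summarize_patch` is hypothetical, and the sketch assumes the patch is in standard unified diff format with tab-separated timestamps after file names; it says nothing, of course, about who wrote the changes.

```python
import re

# Hunk header of a unified diff, e.g. "@@ -10,7 +10,8 @@".
# A missing count means the hunk covers exactly one line on that side.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def summarize_patch(patch_text):
    """Return {new_file_path: (lines_added, lines_removed)} for a unified diff."""
    stats = {}
    current = None
    old_left = new_left = 0  # lines still expected in the current hunk

    for line in patch_text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk: classify the line by its first character.
            if line.startswith("+"):
                stats[current][0] += 1
                new_left -= 1
            elif line.startswith("-"):
                stats[current][1] += 1
                old_left -= 1
            elif line.startswith("\\"):
                pass  # "\ No newline at end of file" marker
            else:
                # Context line: present in both old and new versions.
                old_left -= 1
                new_left -= 1
            continue

        if line.startswith("+++ "):
            # New-file header; drop the trailing timestamp, if any.
            current = line[4:].split("\t")[0].strip()
            stats.setdefault(current, [0, 0])
        else:
            match = HUNK_HEADER.match(line)
            if match and current is not None:
                old_left = int(match.group(1)) if match.group(1) is not None else 1
                new_left = int(match.group(2)) if match.group(2) is not None else 1

    return {path: tuple(counts) for path, counts in stats.items()}
```

Counting lines by hunk length, rather than just looking at the leading character, keeps a removed line whose content begins with "-- " from being mistaken for a file header. A summary like this can say when and where code changed between two releases, but not where it came from.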

Since there is little information to be had from Linus on the provenance of patches before the BitKeeper era (which is the time period SCO is interested in), it will be necessary to try to trace any offending patches backward. And that means looking at how code reaches Linus. The basic nature of the submission process has not changed in a long time.

Some code is written by Linus himself. Linus's contributions have become a very small part of the whole, but he does still have something to add at times. It is probably safe to assume that Linus is not copying his work from proprietary Unix.

Some patches get to him by way of the linux-kernel mailing list. It is rare for Linus to pick up patches directly from linux-kernel, but it does happen. If a particular piece of allegedly infringing code was posted publicly, it should be possible to determine who sent it out. Chances are that the SCO investigators, if they really have infringing code to show, have been digging through the linux-kernel archives in the hopes of finding this sort of "smoking gun." The thought of SCO lawyers wading through old devfs flamewars is good for a smile or two.

Many patches go directly to Linus, often with no public posting at all. For example, much of Alexander Viro's work - big changes to core parts of the kernel - is first seen when it shows up in a kernel release. There will be no record of these contributions other than Linus's memory and, perhaps, any existing backups of his mail spool. Unless this code comes with a comment like:

    /*
     * Copyright © Caldera International
     * 
     * Ripped off in 2000 by plagiarist@ibm.com; I'm too lazy to 
     * do this myself and they'll never notice.
     */

it will be very hard for SCO (or anybody else) to prove where the code came from.

The rest of the patches arrive by way of one of the "lieutenants" - developers like Alan Cox, David Miller, Greg Kroah-Hartman, Andrew Morton, and others. Some of these developers have used source management systems at times, others have not. Again, much of this code goes into the kernel without ever being posted on a public list. It can have two layers of private communication obscuring its true origin.

The end result of all the above is that the kernel development process is not quite as open as many people believe. A lot of code is posted publicly and its authors duly flamed for anything that does not look quite right. But a lot of code takes a quieter path and only sees the light of day when it shows up mixed into a development kernel. Much of the code that went into the 2.3 development series could be nearly impossible to trace back to its contributor.

It is also worth bearing in mind that no sort of paperwork is required to contribute code to the kernel. No copyright assignments, no warranties of originality, no indemnification. So there is no paper trail behind contributions to the Linux kernel - at least, not on the Linus side of things. One can only assume that companies like IBM have rather more rigorous procedures internally. But before that matters, a particular chunk of code will have to be traced back to IBM, and that could prove difficult.

The "low-ceremony" nature of kernel development is one of its attractions; the only thing you need in order to play the game is some worthwhile code. It would be a shame if legal pressures eventually forced Linus to erect a wall of paperwork between himself and aspiring contributors.

For SCO's purposes, however, it is too late. Unless an IBM employee went out of his or her way to attach their name to a code contribution via a public posting or internal comments, it may never be possible to prove the origin of that contribution. And that could be bad news for SCO, which has gone out of its way to state that IBM, in particular, is responsible for the copying that SCO claims has occurred.

Copyright notices

Posted Jun 4, 2003 20:57 UTC (Wed) by hensema (guest, #980)

You seem to have forgotten the possibility that the allegedly copied code has a header with a copyright notice claiming it was written by IBM.

Copyrights assignment policy sorely needed.

Posted Jun 4, 2003 21:37 UTC (Wed) by leandro (guest, #1460) (3 responses)

It is because of situations such as this that the FSF implemented its copyright assignment policy ages ago...

Copyrights assignment policy sorely needed.

Posted Jun 5, 2003 3:59 UTC (Thu) by jamesh (guest, #1159) (2 responses)

The only part of copyright assignment that would help here is having a record of where code came from, and having contributors warrant that they owned their contributions and were allowed to assign them. This is possible to do without assignments.

Mozilla is an example of such a project (all contributions need to be attributed, and people with CVS access sign an agreement).

The place where copyright assignment can help is when chasing up other parties who have violated the license on a package, since only the copyright holder can sue in such a situation.

Copyrights assignment policy sorely needed.

Posted Jun 5, 2003 7:48 UTC (Thu) by dd9jn (✭ supporter ✭, #4459)

BTW, GNU projects have to keep track of all contributions in an AUTHORS file. As this is a tedious task, most authors only write brief statements to AUTHORS but choose to write exact ChangeLog entries. The latter has the additional advantage that it also helps to track bugs.

Copyrights assignment policy sorely needed.

Posted Jun 5, 2003 18:42 UTC (Thu) by leandro (guest, #1460)

> The only part of copyright assignment that would help here is having a record of where code came from, and having contributors warrant that they owned their contributions and were allowed to assign them. This is possible to do without assignments.

I fail to see how. If one doesn't assign, he can much more easily pass off others' code as his own. Assignment clearly places the responsibility for any copyright violation on the contributor, away from the maintainer, in a neat way that simple records don't.

Please, anyone, make it clear to me...

Posted Jun 5, 2003 12:25 UTC (Thu) by hummassa (subscriber, #307) (1 response)

I thought Linus kept a Changelog, even before BK times.

Please, anyone, make it clear to me...

Posted Jun 5, 2003 12:27 UTC (Thu) by puetzk (guest, #3318)

He did, but it might not identify the source of a change, especially if the patch came to Linus via a lieutenant.

Does email last that long in some backup somewhere?

Posted Jun 5, 2003 18:22 UTC (Thu) by southey (guest, #9466)

I would have thought that email is used to pass the code from one source to another until it gets into the kernel or is tossed. So it may be possible to find a suitable backup that shows the code.

The date alone is important. If this code entered prior to the contract date or before IBM started with Linux, then SCO cannot claim it was IBM. The transcript of the meeting clearly shows that they have not checked when this code was entered.

Where did that code come from?

Posted Jun 6, 2003 2:08 UTC (Fri) by mbp (subscriber, #2737) (2 responses)

> And that could be bad news for SCO, which has gone out of its way to state that IBM, in particular, is responsible for the copying that SCO claims has occurred.

IANAL, but I would like to think that there is some kind of legal sanction for bringing a billion dollar lawsuit (one billion dollars!) without getting these kinds of facts sorted out.

Even if there was some other infringement, if IBM was in the clear you'd have to expect a helluva countersuit. SCO's future looks more and more like a smoking crater.

Where did that code come from?

Posted Jun 6, 2003 9:44 UTC (Fri) by jdthood (guest, #4157)

Don't take the dollar figure too seriously. That number is the maximum amount that the plaintiff would be willing to accept as an award. Any actual award decided by the court would be far lower. SCO could have asked for a trillion dollars.

IANAL, but that's my understanding.

Where did that code come from?

Posted Jun 10, 2003 3:31 UTC (Tue) by giraffedata (guest, #1954)

> I would like to think that there is some kind of legal sanction for bringing a billion dollar lawsuit (one billion dollars!) without getting these kinds of facts sorted out.

The system is designed to have the lawsuit filed before the facts get sorted out. After the lawsuit is filed, a process known as discovery can begin. That is the process of the parties sharing knowledge, which is usually essential for sorting out the facts.

Filing a lawsuit isn't the act of hostility that people make it out to be; sanctions aren't appropriate.

There are sanctions for lawsuits that are filed for the sheer purpose of harassing the defendant, though. That's not what lawsuits are for.

> you'd have to expect a helluva countersuit

On what grounds? You can't countersue just because someone accused you of something that you didn't do.

Linus doesn't control Linux

Posted Jun 10, 2003 3:36 UTC (Tue) by giraffedata (guest, #1954)

> In one sense, it is one of the most tightly controlled projects out there; one - and exactly one - person can commit code to the mainline kernel tree. If Linus does not merge a patch, it simply does not go in.

This is a common misconception about Linux. There are a ton of Linux operating systems out there. Linus controls just one version of just one part -- the kernel. And hardly anybody runs Linus' version of the kernel. The provenance of the kernel code that did not go through Linus, and of all the code that isn't even part of the kernel, is of equal concern to people such as SCO.