LWN.net

Documenting kernel code provenance

Linus's request for discussion made his motivation clear:

Some of you may have heard of this crazy company called SCO (aka "Smoking Crack Organization") who seem to have a hard time believing that open source works better than their five engineers do. They've apparently made a couple of outlandish claims about where our source code comes from, including claiming to own code that was clearly written by me over a decade ago.

He notes that the process of debunking these claims, while highly effective, has not been entirely fun. As a way of making life easier when the next SCO comes along, Linus is proposing a lightweight mechanism which would document how each patch finds its way into the kernel. In essence, this scheme would require each patch to contain at least one line like:

    Signed-off-by: Some kernel hacker <skh@some.host>

One such line would be added by each person who handles the patch on its way to the mainline kernel. Together, these lines would document the originator of the patch and the path it took before it was merged. Each developer, by "signing off" on the patch in this way, would indicate that he or she has the right to submit it to the kernel under a free license - either by virtue of having written the code, or by having obtained it from a source which allows this form of redistribution. Companies which require review of code contributed to external projects can designate a person who must sign off on patches before they go out.
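As a rough illustration of how these lines document a patch's path, the Python sketch below pulls every `Signed-off-by:` line out of a patch description. It reports the first signer as the originator and the rest as handlers. It assumes, as described above, that each handler appends one line in order. The function names and the second signer are hypothetical; this is not part of any kernel tool.

```python
import re
from typing import List, NamedTuple

# One sign-off line, in the form quoted above:
#   Signed-off-by: Some kernel hacker <skh@some.host>
SIGNOFF_RE = re.compile(
    r"^[ \t]*Signed-off-by:[ \t]*(?P<name>.+?)[ \t]*<(?P<email>[^<>]+)>[ \t]*$"
)


class Signoff(NamedTuple):
    name: str
    email: str


def parse_signoffs(description: str) -> List[Signoff]:
    """Return the sign-offs in the order they appear in a patch description."""
    chain = []
    for line in description.splitlines():
        match = SIGNOFF_RE.match(line)
        if match:
            chain.append(Signoff(match.group("name"), match.group("email")))
    return chain


def describe_path(chain: List[Signoff]) -> str:
    """Summarize a chain, assuming the first signer wrote the patch and
    each later signer handled it on the way to the mainline."""
    if not chain:
        return "no sign-off: provenance undocumented"
    handlers = " -> ".join(s.name for s in chain[1:]) or "(nobody else)"
    return f"originated by {chain[0].name}, then handled by: {handlers}"


# Hypothetical patch description; the second signer is made up.
patch = """Fix an off-by-one in the frobnicator.

Signed-off-by: Some kernel hacker <skh@some.host>
Signed-off-by: A subsystem maintainer <maint@example.org>
"""
print(describe_path(parse_signoffs(patch)))
```

A company review step, as mentioned above, would simply show up as one more name in the handler list.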

This procedure is a far cry from, for example, the full-blown copyright assignment required from contributors to GNU projects. Contributions to the kernel will still require no physical, signed papers, no assignment of copyright, no indemnification, and no documented permission from the contributor's employer. The Free Software Foundation, with its assignment policy, is trying to set itself up as the owner and custodian of the GNU system, with clear title to the code, the ability to specify the license under which that code will be released and to enforce the terms of that license. The kernel hackers, instead, seem to feel that they can get by without such a custodian, wish to retain ownership of their code, and, as the netfilter team has demonstrated, they feel entirely capable of enforcing their own licenses.

The kernel system, by contrast, is aimed entirely at documentation. The next time somebody questions the legitimacy of code in the kernel, it would be nice to be able to point out, quickly, exactly where the code came from. In this way, perhaps, people can spend less time digging through ancient mail archives and more time developing. For this reason, suggestions ranging from GPG-signing of patches to the (poorly received) idea of adopting an ISO-9000 process will probably not be implemented. Some tweaking will probably happen, but whatever system finally gets adopted will remain a simple, lightweight documentation mechanism.

While the new kernel contribution scheme is aimed at documenting future contributions, the just-launched Grokline project is trying to document the past. From the site:

This is an open, community-based, collaborative research project, a living history, designed to carefully trace the ownership history of UNIX and UNIX-like code with the goal of reducing, or eliminating, the amount of software subject to superficially plausible but ultimately invalid copyright, patent and trade secret claims against Linux or other free and open source software.

The project has put together a basic Unix timeline, and is soliciting input from anybody who can help document where all this code came from.

Grokline will, without doubt, yield no end of interesting historical information. One can't help wondering, however, if the community isn't gearing up to fight last year's war. The SCO Group has done us a tremendous favor by showing that (1) finding copyright infringements in free software (and the Linux kernel in particular) really is hard, and (2) the community will unite with devastating effect against anybody who seeks to profit from baseless attacks on free software. It is hard to imagine another company wanting to be the next SCO. The next time a copyright claim is raised against free software, the claimants will be well advised to have their evidence in place from the beginning - and to be right.

If there is another SCO-scale war in our near future, it will probably not involve copyrights. It will be, instead, a patent fight. Unless it serves to establish prior art, documentation of the provenance of code will not be helpful in a patent case. It is also worth noting that the SCO case has forced a remarkable alignment of interests between many large, deep-pocketed companies and the broader free software community. That alignment of interests may well be absent in a patent battle. Next year's patent war may not be fought off as easily as this year's copyright and (formerly) trade secret suit. By all means, we should be documenting where our code comes from, and, in general, doing our best to ensure that it has been contributed legitimately. But it would be a mistake to believe that this documentation alone will be sufficient to defend us from all "intellectual property" charges.



Documenting kernel code provenance

Posted May 27, 2004 3:35 UTC (Thu) by error27 (subscriber, #8346)

It used to be that people would keep a changelog at the top of kernel files to record who held copyright. Now that Linus uses BitKeeper, these changelogs are frowned on because the data is stored in the revision logs.

This new method is better than the old manual way of adding copyrights and it's more explicit than relying on bk log files. It's easy too.

Documenting kernel code provenance

Posted May 27, 2004 4:26 UTC (Thu) by smoogen (subscriber, #97)

One way it would help in a patent fight is by recording the history, so that prior art could be established more easily. Showing that the technique was implemented before a patent was awarded could help.

Documenting kernel code provenance

Posted May 27, 2004 8:45 UTC (Thu) by dwalters (subscriber, #4207)

This is very true. People can help to achieve this by contributing to the Grokline Project: http://www.grokline.net/

Documenting kernel code provenance

Posted May 27, 2004 23:30 UTC (Thu) by im14u2c (subscriber, #5246)

Another aim it could achieve is to show whether a given piece of presumably-patent-infringing code came in from a party associated with the patent, or came in from someone else. In other words, it could be used to deflect liability from Linux to some 3rd party.

For instance, suppose IBM (or some other company) licensed some patent, coded the technique into Linux, and their patch-signoff person accidentally signed off on the patch. They shouldn't have, because the patch contains IP that does not belong to IBM, only IP that IBM is licensed to use. Now the patent holder sues over infringement.

The lawsuit should (and hopefully would) get laser-focused: Linux would remove the offending code, and IBM (and only IBM) would be liable for damages. Alternatively, IBM could settle by broadening their license to include all GPL implementations, thereby freeing the patent for Linux and GPL derivatives.

A different circumstance (closer to the one described in the parent post) is when a submitter invented something independently of an inventor who patents the same idea. The patch may have arrived in Linux well after the patent holder's date of invention, but the submitter may have private prior art showing an earlier date of invention. Knowing who the submitter is is invaluable in identifying that possible source of prior art. When patches come from a large organization (IBM, Oracle, Novell), such prior art may be plentiful, and secret.

Basically, the Signed-off-by lines leave a trail showing whom to point at if something wrong slips through.
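A hypothetical sketch of following that trail: given patch descriptions keyed by an identifier, the Python function below lists the patches carrying at least one sign-off from an address in a given mail domain, for example a company's designated sign-off person. The identifiers, domains, and function names are illustrative assumptions. The code only reads the text of the sign-off lines.

```python
import re
from typing import Dict, List

# Captures the address inside the angle brackets of a sign-off line.
SIGNOFF_EMAIL_RE = re.compile(r"^[ \t]*Signed-off-by:.*<(?P<email>[^<>]+)>[ \t]*$")


def signoff_domains(description: str) -> List[str]:
    """Mail domains of everyone who signed off on one patch, lowercased."""
    domains = []
    for line in description.splitlines():
        match = SIGNOFF_EMAIL_RE.match(line)
        if match and "@" in match.group("email"):
            domains.append(match.group("email").rsplit("@", 1)[1].lower())
    return domains


def patches_signed_by(patches: Dict[str, str], domain: str) -> List[str]:
    """IDs of patches with a sign-off from `domain` or one of its subdomains."""
    domain = domain.lower()
    return [
        patch_id
        for patch_id, description in patches.items()
        if any(d == domain or d.endswith("." + domain)
               for d in signoff_domains(description))
    ]


# Hypothetical patch queue keyed by made-up IDs.
queue = {
    "patch-1": "Add feature.\n\nSigned-off-by: Dev <dev@us.example.com>",
    "patch-2": "Fix bug.",
}
print(patches_signed_by(queue, "example.com"))
```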

Documenting kernel code provenance

Posted May 27, 2004 9:26 UTC (Thu) by james (subscriber, #1325)

I don't think these efforts are aimed so much at last year's war as at a repeat performance in twenty or thirty years' time.

Most hackers tend to be relatively young. I don't think anyone has done a statistical survey of hackers' life expectancies, but I suspect they're more likely to look after themselves than Joe Average. So an average piece of code written by an individual for themselves will probably remain under copyright for the rest of the author's lifetime (likely over fifty years) plus seventy years (assuming the copyright laws don't change again).

It's hard to tell how useful current code will be then, but successful computer systems have a tendency to far outlive their creators' expectations, as we found out in the years before 2000.

On the other hand, statistically, some will die in the next twenty / thirty / N years (we've lost a couple of Debian developers in the past couple of weeks) and more will become difficult to trace.

It's probable that people will still want to be able to prove the provenance of code. These efforts will at least let them know the questions to ask.

James.
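James's estimate can be made concrete with a little arithmetic. The sketch below assumes a plain life-plus-70-years term that runs to the end of the calendar year, which matches the US and EU rules for an individual author at the time. Works made for hire, other jurisdictions, and future changes to the law are not modeled, and the function name is hypothetical.

```python
def last_protected_year(year_written: int, years_author_survives: int,
                        post_mortem_years: int = 70) -> int:
    """Last calendar year a work is protected under a life-plus-N rule.

    Assumes protection runs through December 31 of the year that falls
    N years after the author's death; works made for hire are not modeled.
    """
    death_year = year_written + years_author_survives
    return death_year + post_mortem_years


# Code written in 2004 by an author who lives another 50 years.
print(last_protected_year(2004, 50))  # 2124
```

With the "over fifty years" figure from the comment, code written in 2004 stays protected through at least the end of 2124.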

Documenting kernel code provenance

Posted May 27, 2004 16:10 UTC (Thu) by iabervon (subscriber, #722)

One of the things that Linus mentioned later in the thread is that this system should also help track down the people who worked on code that turns out to have technical rather than legal problems. Obviously, that's not why this was initially proposed, but it will provide exactly the list of people who should discuss a patch that turns out to have problems: from the person who had the problem the patch solved, through all of the people who looked at it, had no issues with it, and passed it on. There have been plenty of cases where a patch went in, people complained, and then nobody could figure out exactly who had had a problem and whether it could be fixed better.
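A minimal Python sketch of that use: it turns a patch's sign-off chain into the people to bring into a discussion when the patch misbehaves. Signers are listed in signing order with duplicates removed. Treating the first signer as the originator (addressed directly) and copying the rest is an assumption. The helper names and header layout are illustrative, not an existing kernel script.

```python
import re
from typing import List

# Everything after "Signed-off-by:" on a line, with surrounding blanks trimmed.
SIGNOFF_RE = re.compile(r"^[ \t]*Signed-off-by:[ \t]*(?P<who>.+?)[ \t]*$")


def people_to_contact(description: str) -> List[str]:
    """Everyone who signed off on a patch, in signing order, without duplicates."""
    seen = set()
    people = []
    for line in description.splitlines():
        match = SIGNOFF_RE.match(line)
        if match and match.group("who") not in seen:
            seen.add(match.group("who"))
            people.append(match.group("who"))
    return people


def problem_report_headers(description: str) -> str:
    """Address the originator directly and copy everyone who passed the patch on."""
    people = people_to_contact(description)
    if not people:
        return ""
    headers = [f"To: {people[0]}"]
    if len(people) > 1:
        headers.append("Cc: " + ", ".join(people[1:]))
    return "\n".join(headers)
```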

Documenting kernel code provenance

Posted May 27, 2004 16:15 UTC (Thu) by jeffg (guest, #20473)

For nasty legal/patent purposes, shouldn't this line:

    Signed-off-by: Some kernel hacker <skh@some.host>

also have a time-stamp? I know all my lab books have to be dated for that reason. [shrug]

Documenting kernel code provenance

Posted May 27, 2004 17:44 UTC (Thu) by Max.Hyre (subscriber, #1054)

Yes, and the time-stamp would be quite valuable in documenting prior art for patent fights.
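The `Signed-off-by:` line as proposed carries no date. As a purely hypothetical sketch of the time-stamp suggested in the two comments above, the Python below appends an ISO 8601 UTC timestamp to the line and parses it back. The format is invented for illustration. A timestamp written by the signer is only as reliable as the signer's clock.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple

# Hypothetical: the proposed Signed-off-by line has no timestamp field.
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # ISO 8601, always UTC


def dated_signoff(name: str, email: str, when: Optional[datetime] = None) -> str:
    """Build a sign-off line followed by a UTC timestamp."""
    if when is None:
        when = datetime.now(timezone.utc)
    stamp = when.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    return f"Signed-off-by: {name} <{email}> {stamp}"


def split_dated_signoff(line: str) -> Tuple[str, datetime]:
    """Separate the ordinary sign-off text from its trailing timestamp."""
    signoff, _, stamp = line.rpartition(" ")
    when = datetime.strptime(stamp, STAMP_FORMAT).replace(tzinfo=timezone.utc)
    return signoff, when


line = dated_signoff("Some kernel hacker", "skh@some.host",
                     datetime(2004, 5, 27, 16, 15, tzinfo=timezone.utc))
print(line)  # Signed-off-by: Some kernel hacker <skh@some.host> 2004-05-27T16:15:00Z
```

Stripping the trailing timestamp with `split_dated_signoff` recovers the ordinary sign-off line unchanged, so tools that expect the plain format could still be fed.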

Documenting kernel code provenance

Posted May 28, 2004 19:07 UTC (Fri) by stevef (subscriber, #7712)

Just to be clear: is the request that every BitKeeper changeset have such a line in the changeset comment, or in the description of every changed file in a changeset? When only one person has write authority to a project's BitKeeper tree (a child tree of linux.bkbits.net/linux-2.5), it would be nice to have the line added automatically by BitKeeper.

Now I need to figure out how to edit the comment on all pending changesets to add the missing line. I hope there is an easy way to do this in BitKeeper.

Documenting kernel code provenance RE: ISO 9000

Posted Jun 7, 2004 15:06 UTC (Mon) by jimwelch (guest, #178)

ISO 9000 is a little misunderstood. Just having a policy that this one line will be added, and enforcing that policy, is just about enough to meet ISO 9000.
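Enforcing such a policy can be as small as a pre-merge check. The Python sketch below flags every patch description that lacks a well-formed `Signed-off-by:` line. The accepted shape is an assumption modeled on the example line in the article: a name followed by an address in angle brackets. The patch identifiers are hypothetical.

```python
import re
from typing import Dict, List

# Assumed shape of a valid line: "Signed-off-by: Name <user@host>".
SIGNOFF_RE = re.compile(r"^Signed-off-by: \S.* <[^<>@\s]+@[^<>@\s]+>$")


def has_signoff(description: str) -> bool:
    """True if at least one line of the description is a well-formed sign-off."""
    return any(SIGNOFF_RE.match(line.strip()) for line in description.splitlines())


def policy_violations(patches: Dict[str, str]) -> List[str]:
    """IDs of patches that should be bounced back for a missing sign-off."""
    return [patch_id for patch_id, text in patches.items() if not has_signoff(text)]
```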

Copyright © 2004, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds