Linus on documenting patch provenance

[Posted May 23, 2004 by corbet]

From:		Linus Torvalds <torvalds-AT-osdl.org>
To:		Kernel Mailing List <linux-kernel-AT-vger.kernel.org>
Subject:		[RFD] Explicitly documenting patch submission
Date:		Sat, 22 May 2004 23:46:29 -0700 (PDT)


Hola!

This is a request for discussion..

Some of you may have heard of this crazy company called SCO (aka "Smoking
Crack Organization") who seem to have a hard time believing that open
source works better than their five engineers do. They've apparently made
a couple of outlandish claims about where our source code comes from,
including claiming to own code that was clearly written by me over a
decade ago.

People have been pretty good (understatement of the year) at debunking
those claims, but the fact is that part of that debunking involved
searching kernel mailing list archives from 1992 etc. Not much fun.

For example, in the case of "ctype.h", what made it so clear that it was
original work was the horrible bugs it contained originally, and since we
obviously don't do bugs any more (right?), we should probably plan on
having other ways to document the origin of the code.

So, to avoid these kinds of issues ten years from now, I'm suggesting that 
we put in more of a process to explicitly document not only where a patch 
comes from (which we do actually already document pretty well in the 
changelogs), but the path it came through. 

Why the full path, and not just originator?

These days, most of the patches in the kernel don't actually get sent
directly to me. That not just wouldn't scale, but the fact is, there's a
lot of subsystems I have no clue about, and thus no way of judging how
good the patch is. So I end up seeing mostly the maintainers of the
subsystem, and when a bug happens, what I want to see is the maintainer
name, not a random developer who I don't even know if he is active any
more. So at least for me, the _chain_ is actually mostly more important
than the actual originator.

There is also another issue, namely the fact that when I (or anybody else,
for that matter) get an emailed patch, the only thing I can see directly
is the sender information, and that's the part I trust. When Andrew sends
me a patch, I trust it because it comes from him - even if the original
author may be somebody I don't know. So the _path_ the patch came in
through actually documents that chain of trust - we all tend to know the
"next hop", but we do _not_ necessarily have direct knowledge of the full
chain.

So what I'm suggesting is that we start "signing off" on patches, to show 
the path it has come through, and to document that chain of trust.  It 
also allows middle parties to edit the patch without somehow "losing" 
their names - quite often the patch that reaches the final kernel is not 
exactly the same as the original one, as it has gone through a few layers 
of people.

The plan is to make this very light-weight, and to fit in with how we 
already pass patches around - just add the sign-off to the end of the 
explanation part of the patch. That sign-off would be just a single line 
at the end (possibly after _other_ people's sign-offs), saying:

	Signed-off-by: Random J Developer <random@developer.org>
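
A minimal sketch, in Python, of how such sign-off lines could be read back out of a patch description to recover the path it took. The regular expression, the function name signoff_chain, and the second signer in the sample are illustrative assumptions, not part of the proposal:

```python
import re

# Hypothetical pattern for one "Signed-off-by: Name <email>" line.
SIGNOFF_RE = re.compile(
    r"^\s*Signed-off-by:\s*(?P<name>.+?)\s*<(?P<email>[^<>]+)>\s*$"
)


def signoff_chain(description):
    """Return (name, email) pairs in the order the sign-offs appear."""
    chain = []
    for line in description.splitlines():
        match = SIGNOFF_RE.match(line)
        if match:
            chain.append((match.group("name"), match.group("email")))
    return chain


# Made-up patch description; the second signer is invented for illustration.
example = """\
Fix an off-by-one error in a made-up driver.

Signed-off-by: Random J Developer <random@developer.org>
Signed-off-by: Some Maintainer <maintainer@example.org>
"""

chain = signoff_chain(example)
# The first entry is the originator; the last is the most recent hop.
for name, email in chain:
    print(f"{name} <{email}>")
```

Because each person appends a line at the end, reading the lines top to bottom gives the path from the original author to the last person who passed the patch along.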

To keep the rules as simple as possible, and yet making it clear what it
means to sign off on the patch, I've been discussing a "Developer's
Certificate of Origin" with a random collection of other kernel
developers (mainly subsystem maintainers).  This would basically be what
a developer (or a maintainer that passes through a patch) signs up for
when he signs off, so that the downstream (upstream?) developers know
that it's all ok:

	Developer's Certificate of Origin 1.0

	By making a contribution to this project, I certify that:

	(a) The contribution was created in whole or in part by me and I
	    have the right to submit it under the open source license
	    indicated in the file; or

	(b) The contribution is based upon previous work that, to the best
	    of my knowledge, is covered under an appropriate open source
	    license and I have the right under that license to submit that
	    work with modifications, whether created in whole or in part
	    by me, under the same open source license (unless I am
	    permitted to submit under a different license), as indicated
	    in the file; or

	(c) The contribution was provided directly to me by some other
	    person who certified (a), (b) or (c) and I have not modified
	    it.

This basically allows people to sign off on other people's patches, as long
as they see that the previous entry in the chain has been signed off on.  
And at the same time it makes the "personal trust" explicit to people who
don't necessarily understand how these things work. 

The above also allows for companies that have "release criteria" to have
the company "release person" sign off on a patch, so that a company can
easily incorporate their own internal release procedures and see that all
the patches have gone through the right channel. At the same time it is
meant to _not_ cause anybody to have to change how they work (ie there is
no "extra paperwork" at any point).

Comments, improvements, ideas? And yes, I know about digital signatures
etc, and that is _not_ what this is about. This is not about proving
authorship - it's about documenting the process. This does not replace or
preclude things like PGP-signed emails, this is _documenting_ how we work,
so that we can show people who don't understand the open source process.

			Linus


Linus on documenting patch provenance

Posted May 23, 2004 15:08 UTC (Sun) by leandro (guest, #1460) [Link] (11 responses)

Wouldn't it be easier just to get copyright assignments, as per the FSF policy?

Not easier

Posted May 23, 2004 16:40 UTC (Sun) by rfunk (subscriber, #4054) [Link] (6 responses)

The FSF-style copyright assignment may (or may not) be better legally,
but it's certainly not easier, since that would require a LOT more
paperwork than a maintainer adding a single line to the patch
description.

Anyway, copyright assignment solves a different problem than Linus's
solution does. Linus is trying to track the path a patch takes, not who
owns it.

Not easier

Posted May 24, 2004 4:02 UTC (Mon) by leandro (guest, #1460) [Link] (1 responses)

> that would require a LOT more paperwork than a maintainer adding a single line to the patch description [...] Linus is trying to track the path a patch takes, not who owns it.

The copyright assignment thing automatically takes care of the path -- the person assigning the copyright implicitly states he owns the code, so any misappropriation responsibility is pushed over to the contributor. Who BTW also gets properly identified.

Not easier

Posted May 24, 2004 4:35 UTC (Mon) by rfunk (subscriber, #4054) [Link]

Sure. It's still not easier to do copyright assignment.

Not easier

Posted May 24, 2004 5:52 UTC (Mon) by JoeBuck (subscriber, #2330) [Link] (3 responses)

It's not really a lot more paperwork; under the GNU scheme, each contributor would do paperwork only one time, at least until changing jobs, and then a new employer disclaimer is needed. Getting that first paperwork done, and getting the employer to agree, can be a hassle.

What about the trivial submitters?

Posted May 24, 2004 12:00 UTC (Mon) by Duncan (guest, #6647) [Link] (2 responses)

> It's not really a lot more paperwork [because]
> each contributor would do paperwork only one time[.]

Perhaps it /would/ be done only once per person-job. That doesn't
eliminate the problem, however.

Why? Because what we've then done is make it impossible for an individual
to point to a problem and provide even a one-character patch (in
the event of a detected typo, say), for the first and possibly last time
in his life.

There are a lot of this type of "trivial" submitters around. People who
know C but don't really know the kernel. However, they use a driver that
doesn't work, or quits working, take a look at the code, and despite not
knowing much about the kernel, see something obvious and provide a bug
report together with a patch that "works for them." Then they go on about
their normal life, leaving the driver maintainer to check the patch, see
that it is indeed a typo affecting a corner case that nobody ever caught
before, and apply it.

What we are now suggesting is that said "trivial" submitters could no
longer submit anything, until they jumped thru a bunch of legal paperwork,
that isn't really worth it for that one character.

That's obviously an extreme example, but make it a 10-line change instead
of a single-character change, and it's a significant number of people over
the course of a year, I'd guess.

Of course, not having assigned copyright does therefore make it
essentially impossible to ever change the license to the kernel in
general, because it's going to be impossible to find all those <= 10-line
submitters from over the years, but that's actually part of the object --
a practical guarantee that the kernel license can never and will never
change.

Duncan

What about the trivial submitters?

Posted May 24, 2004 15:17 UTC (Mon) by bfields (subscriber, #19510) [Link]

> What we are now suggesting is that said "trivial" submitters could no longer submit anything, until they jumped thru a bunch of legal paperwork, that isn't really worth it for that one character.

I don't think GNU has ever required that: http://www.gnu.org/prep/maintain_5.html

In fact, it looks to me (I'm not really familiar with the process, just doing a google search on "site:www.gnu.org copyright assignment") like they have two levels short of full copyright assignment: for trivial contributions nothing may be required, for smaller contributions a disclaimer that's short of a copyright assignment may be sufficient.

--Bruce Fields

What about the trivial submitters?

Posted May 24, 2004 15:41 UTC (Mon) by madscientist (subscriber, #16861) [Link]

Trivial submissions on the scale you describe (one changed character, or even 10 lines) do *not* require approval under the GNU guidelines. The GNU guidelines for developers say:

> 4.2 Legally Significant Changes

> If a person contributes more than around 15 lines of code and/or text,
> that is legally significant for copyright purposes, which means we need
> copyright papers for it as described above.
>
> A change of just a few lines (less than 15 or so) is not legally
> significant for copyright. A regular series of repeated changes, such as
> renaming a symbol, is not legally significant even if the symbol has to
> be renamed in many places. Keep in mind, however, that a series of minor
> changes by the same person can add up to a significant contribution. What
> counts is the total contribution of the person; it is irrelevant which
> parts of it were contributed when.

Also, on copyright assignment: it's true that copyright is assigned to the FSF, but the FSF grants back to the contributor full unrestricted rights to the code they contributed. So, they can take the code they submitted (but only that code of course) and continue to use it in their proprietary applications if they would like--they have a license to use it outside the GPL.

Linus on documenting patch provenance

Posted May 23, 2004 16:44 UTC (Sun) by corbet (editor, #1) [Link] (3 responses)

I disagree; tracking the source of code submissions has little to do with copyright assignment. They are two separate issues. The kernel project has made a deliberate decision that contributors keep their copyright on their work; among other things, this approach ensures that nobody will ever attempt to force a different license on the kernel.

Whether you ask for assignment or not, you still want to be sure that the contributor has the right to contribute the code--and to be able to document that in the future. An assignment of copyright does not make that happen in some magic way.

Linus on documenting patch provenance

Posted May 24, 2004 16:20 UTC (Mon) by JoeBuck (subscriber, #2330) [Link] (2 responses)

Jon, you need to take a more detailed look at the FSF's process before you comment further. Find out exactly what the FSF requires from both the contributor and the contributor's employer, and you'll see more clearly the steps they take to make sure that the bases are covered. A legally binding contract is signed by three parties: the FSF, the contributor, and the contributor's employer. The contract does more than just assign copyright. It also has some pretty strong patent language (though the FSF has negotiated weaker language with some contributors, notably IBM).

Linus could start with that procedure and adapt it to his desire that the contributor retain copyright. The contract would grant Linus (or some designated body) the right to distribute, and forbid the contributor from coming back later and impeding the distribution (by, say, asserting a patent claim).

Linus on documenting patch provenance

Posted May 24, 2004 17:46 UTC (Mon) by josh_stern (guest, #4868) [Link]

Pointing out the obvious...To assume that the entity that the
contributor describes as her current employer is the only other
party that might have copyright claims on the contributed code
is, at best, a fairly weak heuristic, while going for something
much stronger than that is likely to involve substantial detective
work. Perhaps there is some close tie in between the FSF
standard and legal precedents for what constitutes reasonable
due diligence?? I'd be interested to hear Eben Moglen's
commentary on that.

Linus on documenting patch provenance

Posted May 25, 2004 0:21 UTC (Tue) by mbp (subscriber, #2737) [Link]

Another problem with the FSF system is that (when last I looked) the contributor indemnified the FSF from some legal liabilities arising from the contribution.

Much as I love the FSF, I am not going to personally stand between them and a psychopath like SCO. Would they hold me to it? Probably not. But the risk is there.

Taking on a potentially unbounded legal liability is a heavy disincentive to signing the FSF assignment.

Error in recursive (c)

Posted May 23, 2004 15:50 UTC (Sun) by ccyoung (guest, #16340) [Link] (1 responses)

(c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and any modifications I might have made are certified by (a) and (b).

Otherwise, upstream people are not allowed to insert their two cents.

Error in recursive (c)

Posted May 23, 2004 17:36 UTC (Sun) by iabervon (subscriber, #722) [Link]

Case (c) is exclusively for when the intermediary is merely acting as a
channel for the content and certification. If the upstream person inserts
their two cents, they have to certify that those two cents are actually
theirs to insert, in which case, they use (a) or (b); the certification
they got for the original patch suffices as a reason to believe that the
original patch may be modified, even though those clauses don't
specifically mention this system.

Linus on documenting patch provenance

Posted May 23, 2004 16:25 UTC (Sun) by stuart (subscriber, #623) [Link] (2 responses)

Yes, this does smell a bit of "maybe GNU has a point", doesn't it? But good for Linus for acknowledging that a problem exists and taking the first steps toward fixing it.

Stu.

Linus on documenting patch provenance

Posted May 23, 2004 22:11 UTC (Sun) by khim (subscriber, #9252) [Link] (1 responses)

The strange thing is that it's not about "GNU has a point" (of course GNU does, but that's another story). It's about whom to ask about this or that change when something goes wrong (I do not mean legally).

Linus on documenting patch provenance

Posted May 24, 2004 13:41 UTC (Mon) by stuart (subscriber, #623) [Link]

And there was me thinking it had everything to do with the legal issues... that'd be why Linus referred to SCO in his email, wouldn't it?

Stu.

Linus on documenting patch provenance

Posted May 24, 2004 0:16 UTC (Mon) by Eudyptes (guest, #15589) [Link] (1 responses)

It seems that Linus wants to track the "chain of custody" as it were. This
is standard with any org that needs to do investigations. From what I can
gather it seems to go something like this:

A. D. Veloper submits the original patch for kpatch.h

Then:

B. D. Veloper changes this patch (still read as) kpatch.h

And then:

C. D. Veloper and D. D. Veloper, etc..

Now some Company screams "foul" and states that this was ripped off from
their "proprietary" work. How do you prove or disprove this?

So it comes to light that "C. D. Veloper" worked for or had access to said
company's source code, got lazy, and folded in a piece of code from said
company's stuff. Well, now you have a fairly good idea where the "taint"
came from. This would afford you knowledge of who got lazy or careless
(you just didn't get it). Furthermore, should you need to take out the
tainted part of the code, you could conceivably do this without having to
rewrite the entire code for "kpatch.h".

Another thing to consider is moles. Yes, moles! Is it inconceivable
that some business/corp that takes considerable exception to the work and
success of Linux-F/OSS would want to see it derailed? Let's say there's a
particular piece of work that has been difficult to work with. Then
somebody (let's call them X. D. Veloper) submits a patch that solves this
problem, or moreover seems to submit several pieces of code to a number of
projects. Then in time it is contended that these several projects' code
bases are tainted with proprietary work/IP stuff. Well, with a chain of
custody a pattern could be seen, such as every project touched by X. D.
Veloper appearing to be tainted. This would call into question just who
this person is and where/why he/she has been able to provide so much code
to solve problems.

On the other hand, the positive aspect of this is that someone that has
"cleanly" provided several fixes can be recognized. Y. D. Veloper has
repeatedly submitted patch work that has indeed solved a good many
problems and provides very cogent and clean work. This person may be
someone that has a yet unrecognized talent that the Dev team may wish to
utilize.

When the whole SCO fiasco started, many, including myself, did
exhaustive searches to find who from SCO/Caldera had submitted work, as
well as to what and when. Given that people from almost every corner of
the globe and having varied backgrounds have submitted work to Linux and
F/OSS, I think it only prudent to have a clear "chain of custody" without
having a cumbersome and overbearing impact on the process.

Just MHO. :)

Linus on documenting patch provenance

Posted May 24, 2004 5:47 UTC (Mon) by lakeland (guest, #1157) [Link]

What you're suggesting would be an extremely dangerous game. As soon as
the company cried foul, the sources of the patch would come to light and
they'd all be pointing at one person -- an employee of the company.

At this point Linux is largely clear, I'm not even sure they have to
remove the code. And if they do, I would expect they would be given quite
generous timeframes given they'd shown due diligence, etc.

It might work, but odds are the damage will be minor. I would hazard a
guess that spreading unsubstantiated lies as PR gives better mileage with
lower risk.

Differences from GNU

Posted May 24, 2004 1:01 UTC (Mon) by mbp (subscriber, #2737) [Link] (6 responses)

I think these are the differences to the GNU system:

1. Copyrights are retained by the authors, not assigned to Linus. This has a lot of consequences: Linus is in a less strong position to sue for infringement; on the other hand every other contributor has the option of suing. No one need worry about Linus suddenly changing the licence. (In any case, trying to change policy on this now would be impractical.)

2. It's done over email, rather than in writing. I guess it's perhaps not quite so strong, but it's certainly a lot easier and faster.

3. It better captures the way patches may move along a chain. Does GNU require an assignment from everyone who submits a bug fix? I think they do in principle, but if I sent a patch directly to a maintainer would they ignore it?

The critical part is the employer disclaimer

Posted May 24, 2004 5:49 UTC (Mon) by JoeBuck (subscriber, #2330) [Link] (5 responses)

The most important paperwork that the FSF asks for from contributors is not the copyright assignment; it is the employer disclaimer. This prevents the company from later claiming that the employee's contribution was unauthorized, or that it is "work done for hire" and therefore the employee had no right to contribute it. Also, both the contributor and the employer promise not to bring a patent infringement claim later against the contributed-to program.

The critical part is the employer disclaimer

Posted May 24, 2004 6:00 UTC (Mon) by mbp (subscriber, #2737) [Link] (4 responses)

You're right, that is an issue too. I suppose under Linus's system, he can claim that he acted in good faith in trusting the employee though. I would expect that if it turned out there was a problem, Linus might have to remove the code but that would be the limit of his liability.

I was a little surprised that Linus didn't ask people to certify that the contribution is free of patent problems to the best of their knowledge.

I would guess that the reason for both of these is that they require the company's legal department to get involved in all contributions, which would tend to slow things down. Of course the legal department *should* be involved in at least setting policy, but that is mostly a matter between the company and their employees.

The critical part is the employer disclaimer

Posted May 24, 2004 7:38 UTC (Mon) by MathFox (guest, #6104) [Link] (3 responses)

> I was a little surprised that Linus didn't ask people to certify that the contribution is free of patent problems to the best of their knowledge.

Don't forget that Linus is a European and that software patents are still unenforceable there. For him patents weren't a problem; he could make and distribute the first versions of Linux without a risk of running into patent problems.

This also indicates that a declaration "I am unaware of patent problems" doesn't tell you a lot: For which countries is that declaration valid? How serious was the patent research? The only thing an "I'm unaware" declaration says with certainty is "I won't sue you for my own contribution". And the GPL implies that already.

The critical part is the employer disclaimer

Posted May 24, 2004 11:31 UTC (Mon) by clugstj (subscriber, #4020) [Link]

An "I am unaware of patent problems" declaration is not worth spit in ANY
country. Also, if you don't believe in software patents, making them
appear important is not a good idea.

The critical part is the employer disclaimer

Posted May 24, 2004 20:22 UTC (Mon) by erwbgy (subscriber, #4104) [Link]

I love this quote from Linus (in this month's Linux User and Developer):

"I do not look up any patents on principle, because (a) it's a horrible
waste of time and (b) I don't want to know. The fact is, technical people
are better off not looking at patents. If you don't know what they cover
and where they are, you won't be knowingly infringing on them. If
somebody sues you, you change the algorithm or you just hire a hitman to
whack the stupid git."

He later noted that this "may not be legally tenable advice" :-)

The critical part is the employer disclaimer

Posted May 25, 2004 0:27 UTC (Tue) by mbp (subscriber, #2737) [Link]

Linus may have been born in Europe but he lives and works in California.

Linus on documenting patch provenance

Posted May 24, 2004 2:45 UTC (Mon) by oconnorcjo (guest, #2605) [Link] (2 responses)

I am constantly surprised how much Linus and the kernel developers are learning to scale up.

First it was developers->Linus

Then developers->system_maintainers->Linus

Then developers->system_maintainers->kernel_maintainers->Linus

With BitKeeper, the kernel_maintainer->Linus transition became very smooth and simple. With the new sign-off procedure, I expect to be able to better follow the abstraction of Linux development (which is a hobby of mine).

If the "sign off" system had been introduced a few years earlier, it would have been easier to predict Marcelo Tosatti as the maintainer of the 2.4 branch. My guess is that Alan Cox was using Marcelo as his "pass through" guy for a lot of patches (like Linus uses Morton, Viro, Arcangeli, etc.). If people not intimately involved in the development had seen Tosatti as a "sign off" guy for a lot of patches, a lot fewer people would have said "who is he?" when Alan Cox first suggested him to be the maintainer of 2.4.

The new "sign off" process might even help Linus pick out new developers to promote up the chain if he sees a key developer's sign-off on a large number of patches.

Many people think of bureaucracy as a bad thing, but it is fascinating to see how Linux development introduces bureaucracy to help cope with the growing number of developers submitting patches and new code. At one time a developer could submit a patch to Linus and see it applied within a few days (if not sooner), but now a patch may have to go through three or four people's hands before it gets to Linus's tree. I can't even imagine the day when Linux has:

developers->system_maintainers->(branch_maintainers?)->kernel_maintainers->Linus

but I highly doubt that would need to happen anytime soon, since I guess that the way things are running now can scale up to over 300 full time Linux developers[***note at bottom] (without people feeling the system needs to change). This hypothesis is based on the idea that any given person only wants to trust/deal/discuss/collaborate in depth with a maximum of 8 other people on a regular basis (on any given project), which means that:

Linus->kernel_maintainers->system_maintainers->developers
Linus * 8 * 8 * 8 = 512 people.
But then one has to take into consideration overlap of the same people, such as a developer being in the "trust group" of several system_maintainers and a system_maintainer being in the "trust group" of several kernel_maintainers, so it is best to assume no more than "60% to 70% efficiency".
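
The back-of-the-envelope estimate above can be checked with a few lines of Python. The fan-out of 8, the three layers below Linus, and the 60% to 70% range are the comment's own assumptions; the variable names are made up for illustration:

```python
# Assumed figures from the comment: each person works closely with at most
# 8 others, there are three layers below Linus, and overlap between trust
# groups cuts the usable total to somewhere between 60% and 70%.
FAN_OUT = 8
LAYERS = 3

upper_bound = FAN_OUT ** LAYERS      # 8 * 8 * 8
low = upper_bound * 60 // 100        # 60% efficiency, rounded down
high = upper_bound * 70 // 100       # 70% efficiency, rounded down

print(upper_bound, low, high)        # prints: 512 307 358
```

Both ends of that range land above 300, which is consistent with the "over 300 full time Linux developers" guess.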

It is interesting to note that Linux is not only maturing in quality but in management as well.

[note at bottom]
***When I say full time developers, I don't mean people who wrote a device driver and update it every once in a while (they are on no one's "trust list" since nobody collaborates with them to any great degree); the emphasis should be on FULL in full time developers.

Linus on documenting patch provenance

Posted May 24, 2004 5:48 UTC (Mon) by tgape (guest, #21785) [Link] (1 responses)

This process is barely bureaucratic. Under a truly bureaucratic process, there would be
individuals who would need to be in the chain solely because the process says that they're in
the chain. That does not sound like what we're looking at here, with one occasional
exception. And last I heard, that one exception was working on getting himself as removed
as he could from that bit. Personally, this feels like a fairly slick implementation of what's
needed.

However, I think that it might need to have a bit of documentation about the occasional
contributors - If Joe "One Patch" Programmer[*] contributes one fairly major patch, and is
never heard from again, the only indication that it was a truly unencumbered patch was that
he said it was his original code. But if five years down the road, Incorporated Company,
Inc[**], claims that he was working for them at that time, and it's their code, things could still
be messy.

I'd think that people who wrote a device driver or two, but contribute less than once a month
will still probably be on the trust list of at least one individual. However, they will probably
not be on multiple people's trust list unless they are active elsewhere in open source.

Someone who contributes occasional patches, rarely to the same component, however, will
probably take quite a while to get on a trust list.

[*] any resemblance to any actual Joe Programmer is purely coincidental.

[**] Insert standard sample company name disclaimer here.

Linus on documenting patch provenance

Posted May 24, 2004 11:35 UTC (Mon) by jerven (guest, #21795) [Link]

All liability for damage in this case would lie with the original patch submitter, as he certified that he did not copy the code when in fact he did.
Weird example: someone says he owns a house and certifies it, and he then hires you to demolish the house so that he can build a garage or whatever. If he is in fact not the owner, your liability for damages would be limited, as you acted in good faith, which you established by making him certify he owned the house. If you did not make him certify he owned the house, you could be sued for carelessness even though you still acted in good faith that he was the owner.

So when this happens it is only messy for the employee who contributed illegally, and he is fully responsible for all the damages he caused to his company. So this does not only depend on trust but on simply making sure that the top developers are incapable of being held responsible for someone else's failing. He certified he was allowed to submit this patch; if he was not, then the damage is his, as he stated in writing that he was the owner, and the maintainer has shown due diligence by having him certify that he owns it. Therefore I think this is a simple but effective idea.

The only problem is that anyone can submit a fake id or tag and just insert someone else's name when supplying a patch. A digital signature should really be included.