Linus on documenting patch provenance
From: Linus Torvalds <torvalds-AT-osdl.org>
To: Kernel Mailing List <linux-kernel-AT-vger.kernel.org>
Subject: [RFD] Explicitly documenting patch submission
Date: Sat, 22 May 2004 23:46:29 -0700 (PDT)
Hola!

This is a request for discussion.

Some of you may have heard of this crazy company called SCO (aka "Smoking Crack Organization") who seem to have a hard time believing that open source works better than their five engineers do. They've apparently made a couple of outlandish claims about where our source code comes from, including claiming to own code that was clearly written by me over a decade ago.

People have been pretty good (understatement of the year) at debunking those claims, but the fact is that part of that debunking involved searching kernel mailing list archives from 1992 etc. Not much fun.

For example, in the case of "ctype.h", what made it so clear that it was original work was the horrible bugs it contained originally, and since we obviously don't do bugs any more (right?), we should probably plan on having other ways to document the origin of the code.

So, to avoid these kinds of issues ten years from now, I'm suggesting that we put in more of a process to explicitly document not only where a patch comes from (which we do actually already document pretty well in the changelogs), but the path it came through.

Why the full path, and not just originator?

These days, most of the patches in the kernel don't actually get sent directly to me. That not just wouldn't scale, but the fact is, there's a lot of subsystems I have no clue about, and thus no way of judging how good the patch is. So I end up seeing mostly the maintainers of the subsystem, and when a bug happens, what I want to see is the maintainer name, not a random developer who I don't even know if he is active any more. So at least for me, the _chain_ is actually mostly more important than the actual originator.

There is also another issue, namely the fact that when I (or anybody else, for that matter) get an emailed patch, the only thing I can see directly is the sender information, and that's the part I trust.
When Andrew sends me a patch, I trust it because it comes from him - even if the original author may be somebody I don't know. So the _path_ the patch came in through actually documents that chain of trust - we all tend to know the "next hop", but we do _not_ necessarily have direct knowledge of the full chain.

So what I'm suggesting is that we start "signing off" on patches, to show the path it has come through, and to document that chain of trust. It also allows middle parties to edit the patch without somehow "losing" their names - quite often the patch that reaches the final kernel is not exactly the same as the original one, as it has gone through a few layers of people.

The plan is to make this very light-weight, and to fit in with how we already pass patches around - just add the sign-off to the end of the explanation part of the patch. That sign-off would be just a single line at the end (possibly after _other_ people's sign-offs), saying:

    Signed-off-by: Random J Developer <random@developer.org>

To keep the rules as simple as possible, and yet making it clear what it means to sign off on the patch, I've been discussing a "Developer's Certificate of Origin" with a random collection of other kernel developers (mainly subsystem maintainers). This would basically be what a developer (or a maintainer that passes through a patch) signs up for when he signs off, so that the downstream (upstream?) developers know that it's all ok:

    Developer's Certificate of Origin 1.0

    By making a contribution to this project, I certify that:

    (a) The contribution was created in whole or in part by me and I
        have the right to submit it under the open source license
        indicated in the file; or

    (b) The contribution is based upon previous work that, to the best
        of my knowledge, is covered under an appropriate open source
        license and I have the right under that license to submit that
        work with modifications, whether created in whole or in part
        by me, under the same open source license (unless I am
        permitted to submit under a different license), as indicated
        in the file; or

    (c) The contribution was provided directly to me by some other
        person who certified (a), (b) or (c) and I have not modified
        it.

This basically allows people to sign off on other people's patches, as long as they see that the previous entry in the chain has been signed off on. And at the same time it makes the "personal trust" explicit to people who don't necessarily understand how these things work.

The above also allows for companies that have "release criteria" to have the company "release person" sign off on a patch, so that a company can easily incorporate their own internal release procedures and see that all the patches have gone through the right channel. At the same time it is meant to _not_ cause anybody to have to change how they work (ie there is no "extra paperwork" at any point).

Comments, improvements, ideas?

And yes, I know about digital signatures etc, and that is _not_ what this is about. This is not about proving authorship - it's about documenting the process. This does not replace or preclude things like PGP-signed emails, this is _documenting_ how we work, so that we can show people who don't understand the open source process.

        Linus
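Because a sign-off is just one fixed-format line appended to the patch explanation, the chain Linus describes can be read back mechanically. The following is a minimal Python sketch, assuming exactly the `Signed-off-by: Name <email>` line format shown above, one per line, in the order the patch travelled. The function name `signoff_chain`, the sample description and the second signer are invented for illustration and are not part of any kernel tool.

```python
import re

# One sign-off line: "Signed-off-by: Name <email>" (format assumed from the proposal).
SIGNOFF_RE = re.compile(r"^Signed-off-by:\s*(?P<name>.+?)\s*<(?P<email>[^>]+)>\s*$")


def signoff_chain(description: str) -> list[tuple[str, str]]:
    """Return (name, email) pairs in the order the sign-offs appear.

    The first entry is the originator; each later entry is one more hop
    on the way toward the final tree.
    """
    chain = []
    for line in description.splitlines():
        m = SIGNOFF_RE.match(line.strip())
        if m:
            chain.append((m.group("name"), m.group("email")))
    return chain


# Hypothetical patch description; "Some Maintainer" is an invented second hop.
example = """Fix off-by-one in the frobnicator.

Signed-off-by: Random J Developer <random@developer.org>
Signed-off-by: Some Maintainer <maint@example.org>
"""

for hop, (name, email) in enumerate(signoff_chain(example)):
    print(hop, name, email)
```

The last entry in the list is the "next hop" the receiving maintainer actually knows; the earlier entries are the part of the chain that would otherwise be invisible.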
Posted May 23, 2004 15:08 UTC (Sun)
by leandro (guest, #1460)
Posted May 23, 2004 16:40 UTC (Sun)
by rfunk (subscriber, #4054)
Posted May 24, 2004 4:02 UTC (Mon)
by leandro (guest, #1460)
The copyright assignment thing automatically takes care of the path: the person assigning the copyright implicitly states he owns the code, so any responsibility for misappropriation is pushed over to the contributor, who, BTW, also gets properly identified.
Posted May 24, 2004 4:35 UTC (Mon)
by rfunk (subscriber, #4054)
Posted May 24, 2004 5:52 UTC (Mon)
by JoeBuck (subscriber, #2330)
It's not really a lot more paperwork; under the GNU scheme, each contributor would do paperwork only one time, at least until changing jobs, and then a new employer disclaimer is needed. Getting that first paperwork done, and getting the employer to agree, can be a hassle.
Posted May 24, 2004 12:00 UTC (Mon)
by Duncan (guest, #6647)
Posted May 24, 2004 15:17 UTC (Mon)
by bfields (subscriber, #19510)
I don't think GNU has ever required that:
http://www.gnu.org/prep/maintain_5.html
In fact, it looks to me (I'm not really familiar with the process, just doing a Google search on "site:www.gnu.org copyright assignment") like they have two levels short of full copyright assignment: for trivial contributions nothing may be required; for smaller contributions a disclaimer that's short of a copyright assignment may be sufficient.
--Bruce Fields
Posted May 24, 2004 15:41 UTC (Mon)
by madscientist (subscriber, #16861)
> 4.2 Legally Significant Changes
>
> If a person contributes more than around 15 lines of code and/or text

Also, on copyright assignment: it's true that copyright is assigned to the FSF, but the FSF grants back to the contributor full unrestricted rights to the code they contributed. So they can take the code they submitted (but only that code, of course) and continue to use it in their proprietary applications if they would like; they have a license to use it outside the GPL.
Posted May 23, 2004 16:44 UTC (Sun)
by corbet (editor, #1)
Whether you ask for assignment or not, you still want to be sure that the contributor has the right to contribute the code--and to be able to document that in the future. An assignment of copyright does not make that happen in some magic way.
Posted May 24, 2004 16:20 UTC (Mon)
by JoeBuck (subscriber, #2330)
Jon, you need to take a more detailed look at the FSF's process before you comment further. Find out exactly what the FSF requires from both the contributor and the contributor's employer, and you'll see more clearly the steps they take to make sure that the bases are covered. A legally binding contract is signed by three parties: the FSF, the contributor, and the contributor's employer. The contract does more than just assign copyright. It also has some pretty strong patent language (though the FSF has negotiated weaker language with some contributors, notably IBM).
Linus could start with that procedure and adapt it to his desire that the contributor retain copyright. The contract would grant Linus (or some designated body) the right to distribute, and forbid the contributor from coming back later and impeding the distribution (by, say, asserting a patent claim).
Posted May 24, 2004 17:46 UTC (Mon)
by josh_stern (guest, #4868)
Posted May 25, 2004 0:21 UTC (Tue)
by mbp (subscriber, #2737)
Much as I love the FSF, I am not going to personally stand between them and a psychopath like SCO. Would they hold me to it? Probably not. But the risk is there. Taking on a potentially unbounded legal liability is a heavy disincentive to signing the FSF assignment.
Posted May 23, 2004 15:50 UTC (Sun)
by ccyoung (guest, #16340)
Posted May 23, 2004 17:36 UTC (Sun)
by iabervon (subscriber, #722)
Posted May 23, 2004 16:25 UTC (Sun)
by stuart (subscriber, #623)
Stu.
Posted May 23, 2004 22:11 UTC (Sun)
by khim (subscriber, #9252)
Posted May 24, 2004 13:41 UTC (Mon)
by stuart (subscriber, #623)
Stu.
Posted May 24, 2004 0:16 UTC (Mon)
by Eudyptes (guest, #15589)
Posted May 24, 2004 5:47 UTC (Mon)
by lakeland (guest, #1157)
Posted May 24, 2004 1:01 UTC (Mon)
by mbp (subscriber, #2737)
1. Copyrights are retained by the authors, not assigned to Linus. This has a lot of consequences: Linus is in a less strong position to sue for infringement; on the other hand every other contributor has the option of suing. No one need worry about Linus suddenly changing the licence. (In any case, trying to change policy on this now would be impractical.)
2. It's done over email, rather than in writing. I guess it's perhaps not quite so strong, but it's certainly a lot easier and faster.
3. It better captures the way patches may move along a chain.

Does GNU require an assignment from everyone who submits a bug fix? I think they do in principle, but if I sent a patch directly to a maintainer, would they ignore it?
Posted May 24, 2004 5:49 UTC (Mon)
by JoeBuck (subscriber, #2330)
Posted May 24, 2004 6:00 UTC (Mon)
by mbp (subscriber, #2737)
I was a little surprised that Linus didn't ask people to certify that the contribution is free of patent problems to the best of their knowledge. I would guess that the reason for both of these is that they require the company's legal department to get involved in all contributions, which would tend to slow things down. Of course the legal department *should* be involved in at least setting policy, but that is mostly a matter between the company and their employees.
Posted May 24, 2004 7:38 UTC (Mon)
by MathFox (guest, #6104)
This also indicates that a declaration "I am unaware of patent problems" doesn't tell you a lot: For which countries is that declaration valid? How serious was the patent research? The only thing an "I'm unaware" declaration says with certainty is "I won't sue you for my own contribution". And the GPL implies that already.
Posted May 24, 2004 11:31 UTC (Mon)
by clugstj (subscriber, #4020)
Posted May 24, 2004 20:22 UTC (Mon)
by erwbgy (subscriber, #4104)
Posted May 25, 2004 0:27 UTC (Tue)
by mbp (subscriber, #2737)
Posted May 24, 2004 2:45 UTC (Mon)
by oconnorcjo (guest, #2605)
First it was developers->Linus
Then developers->system_maintainers->Linus
Then developers->system_maintainers->kernel_maintainers->Linus

With BitKeeper, the kernel_maintainer->Linus transition became very smooth and simple. With the new sign-off procedure, I expect to be able to better follow the abstraction of Linux development (which is a hobby of mine).

If the "sign off" system had been introduced a few years earlier, it would have been easier to predict Marcelo Tosatti as the maintainer of the 2.4 branch. My guess is that Alan Cox was using Marcelo as his "pass through" guy for a lot of patches (like Linus uses Morton, Viro, Arcangeli, etc.). If people not intimately involved in the development had seen Tosatti as a "sign off" guy for a lot of patches, a lot fewer people would have said "who is he?" when Alan Cox first suggested him to be the maintainer of 2.4. The new "sign off" process might even help Linus pick out new developers to promote up the chain if he sees a key developer on a lot of sign-offs to a large number of patches.

Many people think of bureaucracy as a bad thing, but it is fascinating to see how Linux development introduces bureaucracy to help handle the growing number of developers submitting patches and new code. At one time a developer could submit a patch to Linus and see it applied within a few days (if not sooner), but now a patch may have to go through three or four people's hands before it gets to Linus's tree.

I can't even imagine the day when Linux has:

developers->system_maintainers->(branch_maintainers?)->kernel_maintainers->Linus

but I highly doubt that would need to happen anytime soon, since I guess that the way things are running now can scale up to over 300 full-time Linux developers [***note at bottom] (without people feeling the system needs to change).
This hypothesis is based on the idea that any given person only wants to trust/deal/discuss/collaborate in depth with a maximum of 8 other people on a regular basis (on any given project), which means that:

Linus->kernel_maintainers->system_maintainers->developers

It is interesting to note that Linux is not only maturing in quality but in management as well.
Posted May 24, 2004 5:48 UTC (Mon)
by tgape (guest, #21785)
However, I think that it might need to have a bit of documentation about the occasional contributors.

[*] Any resemblance to any actual Joe Programmer is purely coincidental.
[**] Insert standard sample company name disclaimer here.
Posted May 24, 2004 11:35 UTC (Mon)
by jerven (guest, #21795)
So when this happens, it is only messy for the employee who contributed illegally, and he is fully responsible for all the damages he caused to his company. So this does not only depend on trust but on simply making sure that the top developers cannot be held responsible for someone else's failing. He certified he was allowed to submit this patch; if he was not, then the damage is his, as he stated in writing that he was the owner, and the maintainer has shown due diligence by having him certify that he owns it. Therefore I think this is a simple but effective idea. The only problem is that anyone can submit a fake ID or tag and just insert someone else's name when supplying a patch. A digital signature should really be included.
Wouldn't it be easier just to get copyright assignments, as per the FSF policy?
The FSF-style copyright assignment may (or may not) be better legally,
but it's certainly not easier, since that would require a LOT more
paperwork than a maintainer adding a single line to the patch
description.
Anyway, copyright assignment solves a different problem than Linus's
solution does. Linus is trying to track the path a patch takes, not who
owns it.
Not easier
> that would require a LOT more
> paperwork than a maintainer adding a single line to the patch
> description [...] Linus is trying to track the path a patch takes, not who owns it.
Sure. It's still not easier to do copyright assignment.
What about the trivial submitters?
> It's not really a lot more paperwork [because]
> each contributor would do paperwork only one time[.]
Perhaps it /would/ be done only once per person-job. That doesn't
eliminate the problem, however.
Why? Because what we've then done is make it impossible for an individual
to point to a problem and provide even a one-character patch (in
the event of a detected typo, say), for the first and possibly last time
in his life.
There are a lot of this type of "trivial" submitters around. People who
know C but don't really know the kernel. However, they use a driver that
doesn't work, or quits working, take a look at the code, and despite not
knowing much about the kernel, see something obvious and provide a bug
report together with a patch that "works for them." Then they go on about
their normal life, leaving the driver maintainer to check the patch, see
that it is indeed a typo affecting a corner case that nobody ever caught
before, and apply it.
What we are now suggesting is that said "trivial" submitters could no
longer submit anything, until they jumped thru a bunch of legal paperwork,
that isn't really worth it for that one character.
That's obviously an extreme example, but make it a 10-line change instead
of a single-character change, and it's a significant number of people over
the course of a year, I'd guess.
Of course, not having assigned copyright does therefore make it
essentially impossible to ever change the license to the kernel in
general, because it's going to be impossible to find all those <= 10-line
submitters from over the years, but that's actually part of the object --
a practical guarantee that the kernel license can never and will never
change.
Duncan
> What we are now suggesting is that said "trivial" submitters could no
> longer submit anything, until they jumped thru a bunch of legal paperwork,
> that isn't really worth it for that one character.
Trivial submitters on the scale you describe (one changed character, or even 10 lines) do *not* require approval under the GNU guidelines. The GNU guidelines for developers say:
> that is legally significant for copyright purposes, which means we need
> copyright papers for it as described above.
>
> A change of just a few lines (less than 15 or so) is not legally
> significant for copyright. A regular series of repeated changes, such as
> renaming a symbol, is not legally significant even if the symbol has to
> be renamed in many places. Keep in mind, however, that a series of minor
> changes by the same person can add up to a significant contribution. What
> counts is the total contribution of the person; it is irrelevant which
> parts of it were contributed when.
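The key point of the quoted guideline is that significance is judged on a person's cumulative contribution, not patch by patch. A minimal Python sketch of that bookkeeping, assuming a fixed cut-off of 15 lines (the guideline itself only says "less than 15 or so") and a per-change flag for mechanical edits such as symbol renames; `Change`, `needs_papers` and the sample authors are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

# Assumed cut-off; the guideline only gives "less than 15 or so".
THRESHOLD = 15


@dataclass
class Change:
    author: str
    lines: int
    mechanical: bool = False  # e.g. renaming a symbol in many places


def needs_papers(changes: list[Change], threshold: int = THRESHOLD) -> set[str]:
    """Return authors whose cumulative non-mechanical lines reach the threshold."""
    totals: dict[str, int] = defaultdict(int)
    for c in changes:
        if not c.mechanical:
            totals[c.author] += c.lines
    return {author for author, total in totals.items() if total >= threshold}


history = [
    Change("alice", 1),                     # one-character typo fix
    Change("bob", 10), Change("bob", 8),    # small patches that add up
    Change("carol", 200, mechanical=True),  # mass rename, not counted
]
print(sorted(needs_papers(history)))
```

Here neither of bob's patches would need papers on its own, but together they cross the line, which is exactly the "series of minor changes" case the guideline warns about.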
I disagree; tracking the source of code submissions has little to do with copyright assignment. They are two separate issues. The kernel project has made a deliberate decision that contributors keep their copyright on their work; among other things, this approach ensures that nobody will ever attempt to force a different license on the kernel.
Pointing out the obvious... To assume that the entity that the
contributor describes as her current employer is the only other
party that might have copyright claims on the contributed code
is, at best, a fairly weak heuristic, while going for something
much stronger than that is likely to involve substantial detective
work. Perhaps there is some close tie between the FSF
standard and legal precedents for what constitutes reasonable
due diligence? I'd be interested to hear Eben Moglen's
commentary on that.
Another problem with the FSF system is that (when last I looked) the contributor indemnified the FSF against some legal liabilities arising from the contribution.
Error in recursive (c)
(c) The contribution was provided directly to me by some other person who certified (a), (b) or (c) and any modifications I might have made are certified by (a) and (b).
Otherwise, upstream people are not allowed to insert their two cents.
Case (c) is exclusively for when the intermediary is merely acting as a
channel for the content and certification. If the upstream person inserts
their two cents, they have to certify that those two cents are actually
theirs to insert, in which case, they use (a) or (b); the certification
they got for the original patch suffices as a reason to believe that the
original patch may be modified, even though those clauses don't
specifically mention this system.
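iabervon's reading can be stated as a small consistency check over the chain: (c) cannot start a chain, and anyone who changed the patch must fall back on (a) or (b). This Python sketch assumes that each signer's clause and whether they modified the patch are known; a plain Signed-off-by line records neither, so `Hop`, `check_chain` and the signer names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    signer: str
    clause: str     # "a", "b" or "c" from the Developer's Certificate of Origin
    modified: bool  # did this hop change the patch it received? (assumed known)


def check_chain(hops: list[Hop]) -> list[str]:
    """Return a list of problems; an empty list means the chain is consistent."""
    problems = []
    for i, hop in enumerate(hops):
        if hop.clause not in ("a", "b", "c"):
            problems.append(f"{hop.signer}: unknown clause {hop.clause!r}")
        elif hop.clause == "c" and i == 0:
            # (c) means someone else handed the patch over, so it cannot start a chain.
            problems.append(f"{hop.signer}: (c) needs an earlier signer")
        elif hop.clause == "c" and hop.modified:
            # (c) only covers passing the patch along unchanged.
            problems.append(f"{hop.signer}: modified the patch, so needs (a) or (b)")
    return problems


ok = [Hop("dev", "a", False), Hop("maint", "c", False), Hop("linus", "c", False)]
bad = [Hop("dev", "a", False), Hop("maint", "c", True)]
print(check_chain(ok))
print(check_chain(bad))
```

Under this reading the `bad` chain is fixed not by rewording (c) but by having `maint` certify (a) for the two cents he added.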
Yes, this does smell a bit of "maybe GNU has a point", doesn't it? But good for Linus for acknowledging that a problem exists and taking the first steps toward fixing it.
The strange thing is that it's not about "GNU has a point" (of course GNU does, but that's another story). It's about who to ask about this or that change when something goes wrong (I do not mean legally).
And there was me thinking it was everything to do with the legal issues... that'd be why Linus referred to SCO in his email, wouldn't it?
It seems that Linus wants to track the "chain of custody", as it were. This
is standard with any org that needs to do investigations. From what I can
gather, it seems to go something like this:
A. D. Veloper submits the original patch for kpatch.h
Then:
B. D. Veloper changes this patch (still read as) kpatch.h
And then:
C. D. Veloper and D. D. Veloper, etc..
Now some Company screams "foul" and states that this was ripped off from their
"proprietary" work. How do you prove or disprove this?
So it comes to light that "C. D. Veloper" worked for or had access to said
company's source code, got lazy, and folded in a piece of code from said
company's stuff. Well, now you have a fairly good idea where the "taint"
came from. This would afford you knowledge of who got lazy or careless
(you just didn't get it). Furthermore, should you need to take out the
tainted part of the code, you could conceivably do this without having to
rewrite the entire code for "kpatch.h".
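The "where did the taint come from" question amounts to finding the first hop whose version contains the disputed code. A rough Python sketch, assuming each hop's full copy of the file is available and comparing whole lines only; the `kpatch.h` contents and the helper names `introduced_by` and `who_introduced` are invented for illustration, and a real version-control system's annotate/blame feature does this far more carefully.

```python
from typing import Optional

# Each entry: (signer, full contents of the file after that signer's change).
History = list[tuple[str, list[str]]]


def introduced_by(history: History) -> list[tuple[str, set[str]]]:
    """Pair each hop with the lines that first appear in its version."""
    seen: set[str] = set()
    result = []
    for signer, lines in history:
        new = set(lines) - seen
        result.append((signer, new))
        seen |= new
    return result


def who_introduced(history: History, disputed: str) -> Optional[str]:
    """Return the signer whose version first contained the disputed line."""
    for signer, new_lines in introduced_by(history):
        if disputed in new_lines:
            return signer
    return None


# Hypothetical versions of kpatch.h as it passed along the chain.
history: History = [
    ("A. D. Veloper", ["int kp_init(void);"]),
    ("B. D. Veloper", ["int kp_init(void);", "int kp_reset(void);"]),
    ("C. D. Veloper", ["int kp_init(void);", "int kp_reset(void);",
                       "int kp_magic(void);"]),
]
print(who_introduced(history, "int kp_magic(void);"))
```

Knowing that only C's additions are suspect is what makes it possible to pull out just those lines instead of rewriting the whole file.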
Another thing to consider is moles. Yes, moles! Is it inconceivable
that some business/corp that takes considerable exception to the work and
success of Linux-F/OSS would want to see it derailed? Let's say there's a
particular piece of work that has been difficult to work with. Then
somebody (let's call them X. D. Veloper) submits a patch that solves this
problem, or moreover seems to submit several pieces of code to a number of
projects. Then in time it is contended that these several projects' code
bases are tainted with proprietary work/IP stuff. Well, with a chain of
custody a pattern could be seen, such as every project touched by X. D.
Veloper appearing to be tainted. This would call into question just who
this person is and where/why he/she has been able to provide so much code
to solve problems.
On the other hand, the positive aspect of this is that someone who has
"cleanly" provided several fixes can be recognized. Y. D. Veloper has
repeatedly submitted patch work that has indeed solved a good many
problems and provides very cogent and clean work. This person may be
someone that has a yet unrecognized talent that the Dev team may wish to
utilize.
When the whole SCO fiasco started, many, including myself, did
exhaustive searches to find who from SCO/Caldera had submitted work, as
well as to what and when. Given that people from almost every corner of
the globe and with varied backgrounds have submitted work to Linux and
F/OSS, I think it only prudent to have a clear "chain of custody" without
having a cumbersome and overbearing impact on the process.
Just MHO. :)
What you're suggesting would be an extremely dangerous game. As soon as
the company cried foul, the sources of the patch would come to light and
they'd all be pointing at one person: an employee of the company.
At this point Linux is largely clear; I'm not even sure they'd have to
remove the code. And if they do, I would expect they would be given quite
generous timeframes, given they'd shown due diligence, etc.
It might work, but odds are the damage will be minor. I would hazard a
guess that telling unsubstantiated lies as PR gives better mileage with
lower risk.
Differences from GNU
I think these are the differences from the GNU system:
The most important paperwork that the FSF asks for from contributors is not the copyright assignment; it is the employer disclaimer. This prevents the company from later claiming that the employee's contribution was unauthorized, or that it is "work done for hire" and therefore the employee had no right to contribute it. Also, both the contributor and the employer promise not to bring a patent infringement claim later against the contributed-to program.
The critical part is the employer disclaimer
You're right, that is an issue too. I suppose under Linus's system, he can claim that he acted in good faith in trusting the employee, though. I would expect that if it turned out there was a problem, Linus might have to remove the code, but that would be the limit of his liability.
> I was a little surprised that Linus didn't ask people to certify that the contribution is free of patent problems to the best of their knowledge.
Don't forget that Linus is a European and that software patents are still unenforceable there. For him, patents weren't a problem; he could make and distribute the first versions of Linux without a risk of running into patent problems.
An "I am unaware of patent problems" declaration is not worth spit in ANY
country. Also, if you don't believe in software patents, making them
appear important is not a good idea.
I love this quote from Linus (in this month's Linux User and Developer):
"I do not look up any patents on principle, because (a) it's a horrible
waste of time and (b) I don't want to know. The fact is, technical people
are better off not looking at patents. If you don't know what they cover
and where they are, you won't be knowingly infringing on them. If
somebody sues you, you change the algorithm or you just hire a hitman to
whack the stupid git."
He later noted that this "may not be legally tenable advice" :-)
Linus may have been born in Europe, but he lives and works in California.
I am constantly surprised how much Linus and the kernel developers are learning to scale up.
Linus * 8 * 8 * 8 = 512 people.
But then one has to take into consideration overlap of the same people, such as a developer being in the "trust group" of several system_maintainers, and overlap of system_maintainers in the "trust groups" of several kernel_maintainers, so it is best to assume no more than 60% to 70% efficiency.
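The estimate above is simple arithmetic on a tree of trust relationships. A small Python sketch, taking the commenter's assumptions at face value (a fan-out of 8 people per person, three levels below Linus, and a 60% to 70% discount for overlapping trust groups); the constant and function names are illustrative only.

```python
FANOUT = 8  # people each person works with closely (the commenter's assumption)
LEVELS = 3  # Linus -> kernel_maintainers -> system_maintainers -> developers


def reachable(fanout: int = FANOUT, levels: int = LEVELS) -> int:
    """Number of people at the bottom of a tree with this fan-out and depth."""
    return fanout ** levels


def effective_range(total: int, low: float = 0.60, high: float = 0.70) -> tuple[int, int]:
    """Discount for overlapping trust groups, rounded down to whole people."""
    return int(total * low), int(total * high)


total = reachable()
print(total, effective_range(total))
```

The discounted range comes out at roughly 307 to 358 people, which is consistent with the "over 300 full-time developers" figure in the original comment.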
[note at bottom]
***When I say full-time developers, I don't mean people who wrote a device driver and update it every once in a while (they are on no one's "trust list" since nobody collaborates with them to any great degree) - emphasis should be on FULL in full-time developers.
This process is barely bureaucratic. Under a truly bureaucratic process, there would be
individuals who would need to be in the chain solely because the process says that they're in
the chain. That does not sound like what we're looking at here, with one occasional
exception. And last I heard, that one exception was working on getting himself as removed
as he could from that bit. Personally, this feels like a fairly slick implementation of what's
needed.
As for occasional contributors: if Joe "One Patch" Programmer[*] contributes one fairly major patch, and is
never heard from again, the only indication that it was a truly unencumbered patch was that
he said it was his original code. But if five years down the road, Incorporated Company,
Inc[**], claims that he was working for them at that time, and it's their code, things could still
be messy.
I'd think that people who wrote a device driver or two, but contribute less than once a month,
will still probably be on the trust list of at least one individual. However, they will probably
not be on multiple people's trust lists unless they are active elsewhere in open source.
Someone who contributes occasional patches, rarely to the same component, however, will
probably take quite a while to get on a trust list.
All liability for damage in this case would fall on the original patch submitter, as he certified that he did not copy the code when in fact he did.
A weird example: someone says he owns a house, certifies it, and then hires you to demolish the house so that he can build a garage or whatever. If he is in fact not the owner, your liability for damages would be limited, as you acted in good faith, which you established by making him certify he owned the house. If you did not make him certify he owned the house, you could be sued for carelessness even though you still acted in good faith that he was the owner.