
Copyright notices (or the lack thereof) in kernel code

By Jonathan Corbet
October 27, 2022
The practice of requiring copyright assignments for contributions to free-software projects has been in decline for years; the GNU Binutils project may be the latest domino to fall in that regard. The Linux kernel project, unlike some others, has always allowed contributors to retain their copyrights, resulting in a code base that has widely distributed ownership. In such a project, who owns the copyright to a given piece of code is not always obvious. Some developers (or their employers) are insistent about the placement of copyright notices in the code to document their ownership of parts of the kernel. A series of recent discussions within the Btrfs subsystem, though, has made it clear that there is no project-wide policy on when these notices are warranted — or even acceptable.

In early September, a patch series implementing fscrypt integration for the Btrfs filesystem included this patch adding, among other things, a one-line Facebook copyright notice. Btrfs maintainer David Sterba replied with a request to limit copyright information to SPDX tags; he cited a page in the Btrfs wiki, asserting that these tags are a complete replacement for copyright notices. Christoph Hellwig disagreed, pointing out that SPDX describes licensing but not ownership:

It is not a replacement for the copyright notice in any way, and having been involved with Copyright enforcement I can tell you that at least in some jurisdictions Copyright notices absolutely do matter.
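To make the distinction concrete, here is a minimal Python sketch (not kernel tooling) that scans a file header and reports SPDX license tags separately from copyright notices. The sample header, the "Example Corp." holder, and the name `scan_header` are illustrative assumptions, and the regular expressions only cover simple single-line forms.

```python
import re

# Matches an SPDX tag and captures the license expression that follows it.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(?P<expr>.+?)\s*(?:\*/)?\s*$")
# Matches a simple "Copyright (C) ..." line and captures the remainder.
COPYRIGHT_RE = re.compile(
    r"copyright\s+(?:\(c\)|©)?\s*(?P<rest>.+?)\s*(?:\*/)?\s*$", re.IGNORECASE
)

# Hypothetical file header, made up for illustration.
SAMPLE_HEADER = """\
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (C) 2022 Example Corp.
 */
"""


def scan_header(text):
    """Return (license_expressions, copyright_statements) found in text."""
    licenses, holders = [], []
    for line in text.splitlines():
        match = SPDX_RE.search(line)
        if match:
            licenses.append(match.group("expr"))
            continue
        match = COPYRIGHT_RE.search(line)
        if match:
            holders.append(match.group("rest"))
    return licenses, holders


if __name__ == "__main__":
    print(scan_header(SAMPLE_HEADER))
```

A header that carries only the SPDX line yields a license expression and an empty list of holders, which is the gap Hellwig is pointing at.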

Hellwig, of course, was the initiator of a GPL-infringement lawsuit against VMware that was dismissed due to an inability to prove ownership of the code in question. It is thus unsurprising that he is sensitive to the placement of copyright notices in the code itself. When Hellwig submitted a patch of his own, also in September, that added a copyright notice to a newly created file, Sterba let it be known that he would refuse that change as well. Toward the end of October, in the discussion of yet another patch set, Hellwig eventually withdrew the work, saying:

FYI, I object to merging any of my code into btrfs without a proper copyright notice, and I also need to find some time to remove my previous significant changes given that the btrfs maintainer refuses to take the proper and legally required copyright notice.

Given that the kernel code has no shortage of copyright notices (nearly 79,000 lines contain the word "copyright"), it is natural to wonder why this policy is being applied in the Btrfs subsystem. The Btrfs wiki page describes the reasoning:

The copyright notices are not required and are discouraged for reasons that are practical rather than legal. The files do not track all individual contributors nor companies (this can be found in git), so the inaccurate and incomplete information gives a very skewed if not completely wrong idea about the copyright holders of changes in a given file. The code is usually heavily changed over time in smaller portions, slowly morphing into something that does not resemble the original code anymore though it shares a lot of the core ideas and implemented logic. A copyright notice by a company that does not exist anymore from 10 years ago is a clear example of uselessness for the developers.

The page also states that the Signed-off-by tags found in the kernel's Git history are sufficient to document the copyright status of the code. There are a few difficulties with this position, including the fact that those tags indicate that the submitter has the right to contribute the code to the kernel, but do not necessarily show who the copyright owner is. Another problem was pointed out by Bradley Kuhn: if the Git history serves as the copyright notices for the code, then it will be necessary to ship the entire Git repository to be in compliance with the GPL's source-code requirements. That makes complaints about copyright notices in the code being unwieldy lose some of their weight.

In the most recent discussion, Chris Mason said that "Christoph's request is well within the norms for the kernel". Sterba replied that he would consider changing the policy, but only as part of a wider policy decision by the kernel project:

I've asked for recommendations or best practice similar to the SPDX process. Something that TAB can acknowledge and that is perhaps also consulted with lawyers. And understood within the linux project, not just that some dudes have an argument because it's all clear as mud and people are used to do things differently.

It's not clear who Sterba has asked for recommendations at this point. Chances are that he will find, over time, that the Btrfs subsystem's position on copyright notices is not widely held across the project as a whole. Steve Rostedt arguably described the consensus view: "The policy is simple. If someone requires a copyright notice for their code, you simply add it, or do not take their code". In the absence of a decree from Linus Torvalds, though, the issue of copyright notices may continue to be a source of disagreement. Claiming copyright on a portion of a shared body of code can always be a touchy matter, but it's one that developers can care a lot about.

Index entries for this article
Kernel: Copyright issues


Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 15:53 UTC (Thu) by Wol (subscriber, #4433) [Link] (13 responses)

Why not make it an optional attachment to the authored tag? Would that work?

Authored by: J Random Developer: (c) Fred Bloggs Ltd 2022

That way it also shows up if somebody changes employer - the "author" line will change. Does git say "this line came from that patch"? So if they want the author, the copyright would show up at the same time.

Cheers,
Wol
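As a rough illustration of the tag proposed in the comment above, the following Python sketch parses such a line into an author and an optional copyright holder. The "Authored by:" format is purely hypothetical: it is taken from the example in the comment and is not a tag that git or the kernel process defines. The names `Attribution` and `parse_authored_by` are likewise made up here.

```python
import re
from typing import NamedTuple, Optional


class Attribution(NamedTuple):
    author: str
    copyright_holder: Optional[str]  # None when no "(c) ..." part is given


# Hypothetical format: "Authored by: <author>[: (c) <holder and year>]"
TAG_RE = re.compile(
    r"^Authored by:\s*(?P<author>[^:]+?)(?::\s*\(c\)\s*(?P<holder>.+?))?\s*$",
    re.IGNORECASE,
)


def parse_authored_by(line: str) -> Optional[Attribution]:
    """Parse one hypothetical "Authored by:" line; return None if it does not match."""
    match = TAG_RE.match(line)
    if match is None:
        return None
    return Attribution(match.group("author"), match.group("holder"))


if __name__ == "__main__":
    print(parse_authored_by(
        "Authored by: J Random Developer: (c) Fred Bloggs Ltd 2022"
    ))
```

Keeping the holder as a separate, optional field reflects the point of the proposal: the person who wrote a change and the entity that owns it need not be the same.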

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 16:37 UTC (Thu) by NYKevin (subscriber, #129325) [Link] (3 responses)

Git blame will tell you which patch last touched a line of code, but if you want to figure out who actually wrote something for copyright purposes, it's a bit more involved and messy. You don't want to see random code cleanup or other secondary activity - you want to know where the broad structure of a file or module came from, and that requires a human to review and understand the entire history of the file or files involved. Git does have the technical capability to show you that history, but it's labor-intensive and can't realistically be automated (at least, not without significant AI advancements, anyway).

OTOH, copyright notices are not necessarily going to make that any easier. If the code says it is copyright Fred Bloggs Ltd, that is not necessarily true. It may have started out as a work by that company, and then over time, other individuals may have contributed to such an extent that the original would be unrecognizable. I'm dubious that a court would recognize a claim of copyright in those circumstances, but you'd have to ask a lawyer.
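For readers who want to see what git blame can and cannot tell them, here is a small Python sketch that tallies lines per author from the output of `git blame --line-porcelain`. The sample input is abbreviated and invented (real output carries more header fields per line), and `lines_per_author` is a name chosen for this example. As the comment above says, the result only records who last touched each line, not who originated the design or who owns the copyright.

```python
from collections import Counter

# Abbreviated, made-up sample in the shape of `git blame --line-porcelain`
# output: per source line, a commit header, some metadata, then a TAB
# followed by the line's content.
SAMPLE_PORCELAIN = (
    "1111111111111111111111111111111111111111 1 1 2\n"
    "author Alice Example\n"
    "author-mail <alice@example.com>\n"
    "summary Add frobnicator\n"
    "filename frob.c\n"
    "\tint frob(void)\n"
    "1111111111111111111111111111111111111111 2 2\n"
    "author Alice Example\n"
    "author-mail <alice@example.com>\n"
    "summary Add frobnicator\n"
    "filename frob.c\n"
    "\t{\n"
    "2222222222222222222222222222222222222222 3 3 1\n"
    "author Bob Example\n"
    "author-mail <bob@example.com>\n"
    "summary Whitespace cleanup\n"
    "filename frob.c\n"
    "\t\treturn 0; /* author unknown */\n"
)


def lines_per_author(porcelain: str) -> Counter:
    """Count how many blamed lines are attributed to each author name."""
    counts = Counter()
    for line in porcelain.splitlines():
        # Content lines start with a TAB, so they never match this prefix;
        # "author-mail" and similar fields lack the trailing space.
        if line.startswith("author "):
            counts[line[len("author "):]] += 1
    return counts


if __name__ == "__main__":
    print(lines_per_author(SAMPLE_PORCELAIN))
```

In the sample, the whitespace cleanup shows up as a line "owned" by Bob Example, which is exactly the kind of secondary activity that makes the raw tally misleading for copyright purposes.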

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 16:58 UTC (Thu) by Wol (subscriber, #4433) [Link]

> OTOH, copyright notices are not necessarily going to make that any easier. If the code says it is copyright Fred Bloggs Ltd, that is not necessarily true. It may have started out as a work by that company, and then over time, other individuals may have contributed to such an extent that the original would be unrecognizable. I'm dubious that a court would recognize a claim of copyright in those circumstances, but you'd have to ask a lawyer.

Yup. If the lines have been re-written, mangled, diluted etc then there might be an argument over whether copyright has survived, but at least git shows clearly who contributed, what they contributed, and who owned the contribution.

It at least removes some uncertainty - if my employer owns my contributions, then I leave but continue contributing to the project, I can't claim the contributions on my employer's dime as my own (or vice versa - they can't claim mine from before I joined).

The thing is, it makes a clear claim (a) of ownership, and (b) of what is owned. There's still going to be argument over whether it was worthy of copyright, and of whether enough of it survived to keep a valid copyright claim.

But that would always be the case. This just reduces the amount of crap the lawyers can argue over.

Cheers,
Wol

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 11:50 UTC (Fri) by khim (subscriber, #9252) [Link] (1 responses)

> OTOH, copyright notices are not necessarily going to make that any easier.

They absolutely do make life much easier in cases like Hellwig's.

A copyright notice clearly shows that the code originates from something owned by the company or individual mentioned in the notice, and then it becomes a problem for the other party to prove that the copyright notice is a lie.

It's not impossible, it has been done, but it's very hard.

If there is no copyright notice, on the other hand, then there is no such presumption, and you have to prove that you actually wrote enough of the code to be entitled to copyright protection.

Yes, it's basically just the question of “who pays the lawyer”, not question of what court would, ultimately, decide… but it's still very important distinction in practice.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 29, 2022 3:48 UTC (Sat) by buck (subscriber, #55985) [Link]

> Yes, it's basically just the question of “who pays the lawyer”, not question of what court would, ultimately, decide… but it's still very important distinction in practice.

In a world where the cost of getting hauled into court is the long and short of some people's business model, you have a fiendishly sly sense of humor. [grin]

I don't know about this question of copyright and who's going to be in a position to defend it, but if there's any possibility it makes the lawyer tax fall more heavily on somebody trying to misappropriate the code or the "embodied" IP, that seems like a pretty persuasive argument against Copyright-comment minimalism. The rest of everything everybody does in this country I live in, anyway, is in large part guided by lawyer-tax-avoidance considerations. Not having that make a noticeable mark on every commitdiff is almost quaint. [wink]

Sorry; just being cynical. Please ignore if you're perturbed by me being so glib about such things, or if you're a lawyer.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 16:41 UTC (Thu) by dullfire (guest, #111432) [Link] (8 responses)

As the article briefly mentioned, the problem with relying on git is that it is only loosely tied to the source. I believe the GPL forbids (indirectly) stripping of copyright information on redistribution. US copyright laws also forbid this.

Using git as the sole source of copyright attribution would render it inadvisable (or maybe even illegal) to distribute source tarballs (that did not include the full git history).

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 17:08 UTC (Thu) by Wol (subscriber, #4433) [Link] (7 responses)

But if the copyright is in the "authored by" tag, then the (meta)data only ever exists in git. So shipping a source tarball isn't stripping data that was never there.

I hate to say it, but it's perfectly normal practice, when the reams of copyright headers get excessive, for them to be stripped from the live source and a note replaces them saying "look at the previous version for historic copyrights".

Is it a criminal offence to strip notices, or just civil? If it's a civil offence, then you'd have to prove damages, and if it's not done with the intention of breaking copyrights, but only with the intention of making working with the code easier, that would be very hard to do.

Cheers,
Wol

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 17:29 UTC (Thu) by dullfire (guest, #111432) [Link]

What I am saying is: if you have the git repo (that is acknowledged to have copyright attribution information) you are forbidden from shipping tarballs of just source off that (because that strips out the copyright information).

Assuming all source tarballs come from git sources, shipping of source-only tarballs would be illegal (used loosely) at some point (though maybe not for simply redistributing the tarball you got).

So let me clearly say "I am not a lawyer". With that out of the way, DMCA § 1204 provides for criminal charges (in some cases; I recommend reading it for yourself[1]). DMCA § 1203 provides for civil liabilities.

[1] https://www.law.cornell.edu/uscode/text/17/1204 see also the links to § 1202 b.2, and § 1202 c for definitions of terms

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 17:40 UTC (Thu) by matthias (subscriber, #94967) [Link]

> But if the copyright is in the "authored by" tag, then the (meta)data only ever exists in git. So shipping a source tarball isn't stripping data that was never there.

Where is the difference if you strip a COPYRIGHT file that contains the copyright information or the .git folder that contains the copyright information?

If I were to add parts of the source code as git attributes (a strange idea, but possible) and then use a build script to extract and compile them, would you say that it is perfectly valid to distribute only the parts of the source code that are in the files? Or does the mere fact that I put some of the code into git attributes force me to also include these when I distribute the code?

Is there a difference between code that is hidden in git attributes and copyright notices when it comes to what is allowed to be omitted and what is not allowed to be omitted when creating a source tarball?

Cheers,
Matthias

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 17:45 UTC (Thu) by mpr22 (subscriber, #60784) [Link] (1 responses)

Criminal vs. civil, for copyright, is dependent on context.

I understand that in some jurisdictions, copyright infringement is uniformly a matter of criminal law, while in others, whether copyright infringement constitutes a crime or a tort depends on things like "scale" and "commerciality". (And I dare say there's a jurisdiction somewhere out there where copyright infringement is purely a civil matter.)

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 23:30 UTC (Thu) by NYKevin (subscriber, #129325) [Link]

> (And I dare say there's a jurisdiction somewhere out there where copyright infringement is purely a civil matter.)

The US comes surprisingly close. Commercial copyright infringement is technically a criminal matter, but in practice the federal government has better things to do, so you would usually only get prosecuted if you make a nuisance of yourself and the (rather substantial) civil remedies are inadequate. See for example Kim Dotcom. But the vast majority of copyright infringement is either handled civilly or informally (i.e. without directly involving the court system, usually in the form of DMCA notices, as well as stuff like ContentID).

Copyright notices removal -- us law

Posted Oct 27, 2022 17:48 UTC (Thu) by stephen.pollei (subscriber, #125364) [Link] (1 responses)

USC title 17, chapter 5, section 506... (d) Fraudulent Removal of Copyright Notice.—Any person who, with fraudulent intent, removes or alters any notice of copyright appearing on a copy of a copyrighted work shall be fined not more than $2,500.

I'm not a lawyer, but I think it's criminal and not civil... however, the key words are "fraudulent intent". Perhaps, if the intent is to declutter the source code and you have a good-faith reason to think git history is sufficient, then there might be no issue. Maybe it is best not to put yourself in a situation where you have to explain intent in a court.

Copyright notices removal -- us law

Posted Oct 27, 2022 17:52 UTC (Thu) by stephen.pollei (subscriber, #125364) [Link]

I just noticed that section 506 is labeled "Criminal offenses"... So yes criminal not civil tort.

Copyright notices (or the lack thereof) in kernel code

Posted Jan 1, 2023 16:37 UTC (Sun) by agowa338 (guest, #162947) [Link]

Just btw. You don't only need to look at US copyright. But you would have to check worldwide copyright laws...

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 18:01 UTC (Thu) by IanKelling (subscriber, #89418) [Link]

When GCC decided to accept some DCO contributions, FSF identified this issue "the existing DCOs don't make it clear who is the author of the code in a contribution." https://www.fsf.org/blogs/licensing/FSF-copyright-handling .

Copyright notices (or the lack thereof) in kernel code

Posted Oct 27, 2022 23:12 UTC (Thu) by mtaht (subscriber, #11087) [Link] (2 responses)

I like the complicated hairball of the linux kernel copyrights, and I think everyone that has ever contributed deserves a copyright. Joint ownership of Linux has been the key to its staying power and growth.

I'm increasingly frustrated that any level of gpl enforcement against serial violators, particularly in the embedded market, has faded. Cambium and ubnt both stopped doing GPL drops a few years ago. So many "security" cams, so many other devices, so obviously based on linux, lacking GPL drops, also.

Lacking GPL enforcement, it would be best for the world if somehow those that are copying and going were strongly encouraged again to work within open source best practices.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 30, 2022 21:05 UTC (Sun) by andy_shev (subscriber, #75870) [Link] (1 responses)

If I put a copyright notice to each file I have changed during my contribution to the Linux kernel, it will probably cover 20-25% of the entire code base (by file). Would it mean that I share a copyright of 25% of the Linux kernel and can have a big voice to rule the direction the project is going to? (It's a rhetorical Q obviously)

Copyright notices (or the lack thereof) in kernel code

Posted Oct 31, 2022 6:05 UTC (Mon) by ssmith32 (subscriber, #72404) [Link]

No one is going to have a big voice in the project because of copyright. You may know more than me.. but that's just not how kernel development works, afaict.

You may help someone going after GPL infringement, though. If I understand the arguments being made here.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 4:56 UTC (Fri) by scientes (guest, #83068) [Link] (5 responses)

I would rather it read "The Linux kernel is collectively licensed under the GPLv2 with no 'copyright-owner' organization exception that can re-license it" or something to that effect. Ownership is entirely a matter of control, and if I have a git copy of Linux I own Linux, period, even if it was not licensed under GPLv2. All the self-righteous "property" talk only distracts from the political and social issues involved, and comes from corrupt and good-for-nothing lawyers that do nothing but sow discontent.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 4:59 UTC (Fri) by scientes (guest, #83068) [Link]

Monopolies are not property, but noble and moral rights. This is why distributing binary blobs is immoral, but distributing UNIX source code to another licensee of UNIX (and this continues today with ARM licensees) is noble.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 7:31 UTC (Fri) by LtWorf (subscriber, #124958) [Link] (2 responses)

Reading the article, it seems that without ownership, there is nobody that can complain when the GPL gets violated (which happens all the time).

In a world without copyright you'd be absolutely right.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 8:41 UTC (Fri) by leromarinvit (subscriber, #56850) [Link] (1 responses)

> it seems that without ownership, there is nobody that can complain when the GPL gets violated

Regarding this, I'm hoping something comes from SFC's enforcement suit against Vizio (https://lwn.net/Articles/895405/). They're suing as a buyer of an affected device, not as a copyright owner. If they win this, owners of violating devices would have credible power against manufacturers even in the face of copyright owners who don't care - which, in the case of Linux, seems to be at least a significant minority (or maybe even the majority).

Currently, all you can do as a user is say "pretty please" if the copyright owner doesn't care, and lots of companies get away with ignoring that. This would be a massive improvement in my book.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 30, 2022 19:15 UTC (Sun) by LtWorf (subscriber, #124958) [Link]

Yeah that would be a game changer. But for now it's not certain, so better rely on what we have and is known to work.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 7:43 UTC (Fri) by rsidd (subscriber, #2582) [Link]

Copyright is about copying/distribution not ownership. You can do what you like with the linux tree on your machine. If you want to build it with a GPL-incompatible driver (that you legally obtained) you can do it on your machine. The problem arises if you want to distribute the result (even if to just one other person).

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 8:17 UTC (Fri) by hverkuil (subscriber, #41056) [Link]

One of the rare cases I experienced where copyright mattered (outside of lawsuits) is if you want to change the license. For example, going from GPLv2-only to GPLv2+BSD. In that case you need to contact the copyright holders (where possible) to get approval for the change.

We had that situation at least once in the media subsystem.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 12:17 UTC (Fri) by karim (subscriber, #114) [Link] (5 responses)

Short of having git generate an automatic footer (sort of like CVS' $Id$) that tracks lines by contributor, and would possibly therefore automatically remove a *submitter* (not author) from the list if their lines go away, then there will always be missing copyright information from the kernel. When the GPL was first authored I don't think the current massively distributed and break-neck speed commit model was necessarily envisioned, and it's likely that there needs to be at some point legal recognition that the copyright information might be extraneous to the sources, with relevant pointers at the proper places.

FWIW, git offers the "--author" flag for "commit". Maybe that'd be useful here?

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 12:59 UTC (Fri) by geert (subscriber, #98403) [Link] (4 responses)

> FWIW, git offers the "--author" flag for "commit". Maybe that'd be useful here?

That just overrides user.name/user.email in git's configuration.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 13:14 UTC (Fri) by karim (subscriber, #114) [Link] (3 responses)

Hmm. Are you sure?

From https://git-scm.com/docs/git-commit :
"--author=<author>

Override the commit author. Specify an explicit author using the standard A U Thor <author@example.com> format. Otherwise <author> is assumed to be a pattern and is used to search for an existing commit by that author (i.e. rev-list --all -i --author=<author>); the commit author is then copied from the first such commit found."

Am I misreading what this does? Note: I'm not a regular user of this functionality, so I might be missing the mark.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 21:02 UTC (Fri) by mathstuf (subscriber, #69389) [Link] (2 responses)

If `--author` is not provided, it pulls the info from the `user.name` and `user.email` configurations.
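The documentation quoted earlier in this thread distinguishes two readings of the `--author` value: an explicit "A U Thor <author@example.com>" identity, or a pattern used to look up an existing commit. The Python sketch below mirrors only that documented distinction; it is an approximation for illustration, not git's actual parsing code, and the names `AuthorArg` and `classify_author_arg` are assumptions of this example.

```python
import re
from typing import NamedTuple, Optional


class AuthorArg(NamedTuple):
    kind: str             # "explicit" or "pattern"
    name: str             # author name, or the search pattern itself
    email: Optional[str]  # only set for an explicit identity


# Approximates the "A U Thor <author@example.com>" form from the docs.
IDENT_RE = re.compile(r"(?P<name>[^<>]+?)\s*<(?P<email>[^<>]+)>")


def classify_author_arg(value: str) -> AuthorArg:
    """Classify an --author value as an explicit identity or a search pattern."""
    value = value.strip()
    match = IDENT_RE.fullmatch(value)
    if match:
        return AuthorArg("explicit", match.group("name"), match.group("email"))
    return AuthorArg("pattern", value, None)


if __name__ == "__main__":
    print(classify_author_arg("A U Thor <author@example.com>"))
    print(classify_author_arg("Thor"))
```

Either way, the value only sets the author recorded on the commit; it carries no statement about who holds the copyright.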

Copyright notices (or the lack thereof) in kernel code

Posted Oct 29, 2022 20:20 UTC (Sat) by Wol (subscriber, #4433) [Link] (1 responses)

And what everyone here is missing (and I made clear in my comment): you can NOT use the commit's "author" field to identify the copyright holder. You MUST explicitly specify the copyright holder if you want to know who the copyright holder is.

A LOT of contributors do not own the copyright in their contributions. I've got a feeling I might soon need to sort out that mess in ScarletDME ...

Cheers,
Wol

Copyright notices (or the lack thereof) in kernel code

Posted Oct 30, 2022 5:18 UTC (Sun) by pabs (subscriber, #43278) [Link]

Obligatory reminder for those who don't hold copyright over the open source that they write for their employer to please renegotiate your employment contract to change that and other things:

https://sfconservancy.org/contractpatch/

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 15:08 UTC (Fri) by flussence (guest, #85566) [Link] (4 responses)

Is this office politics BS why Btrfs's RAID5/6 code has been about as reliable as a fake microSD card since its inception?

If this guy's causing a chilling effect on people trying to contribute then I'd say the real problem isn't copyright, but that the kernel's CoC is entirely toothless.

Btrfs RAID

Posted Oct 28, 2022 15:17 UTC (Fri) by corbet (editor, #1) [Link]

I think the RAID5/6 problems in Btrfs persist because nobody has put in the time to fix them. That is certainly a problem but a different one than the subject of this article; I wouldn't mix the two.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 16:16 UTC (Fri) by atnot (subscriber, #124910) [Link] (1 responses)

RAID 5/6 is broken because nobody with money or sufficient time cares enough about it anymore at this point.

The only place hard drives really live these days is in network storage devices, which do their own redundancy locally, often as some sort of cluster. If you're using local disks they're going to be high performance SSDs, in RAID 10 because the overhead of calculating parity at those speeds would be too high anyway.

So at this point the only one you've got left is enthusiasts building a DIY NAS at home, who have enough time on their hands to just deal with the inconveniences of dealing with ZFS anyway.

So unless some brand new use case for RAID 5/6 appears from somewhere, I don't think this will change even under the friendliest maintainership.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 28, 2022 21:45 UTC (Fri) by Conan_Kudo (subscriber, #103240) [Link]

There is work going on to fix RAID 5/6 modes right now.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 30, 2022 20:45 UTC (Sun) by jhoblitt (subscriber, #77733) [Link]

Storage stacks have always taken a shockingly long time to mature, often a decade or two, and require a massive number of users to chase out the bugs which cause data loss or corruption but only have a rate of incidence of once per century/host or less.

The reality is that neither the user base nor the commercial financial interest is present to mature Reed-Solomon codes for a single-node storage solution. Evidence of this is that Red Hat has dropped support in RHEL for btrfs completely. Mid to large scale organizations have either outsourced the problem to "the cloud" or they use a distributed storage system that has erasure codes and/or replicas spread across multiple nodes. Single-host storage solutions are simply too unreliable to be trusted with important data.

Copyright notices (or the lack thereof) in kernel code

Posted Oct 31, 2022 5:51 UTC (Mon) by mirabilos (subscriber, #84359) [Link]

Right, if someone contributes something meaningful and it has a copyright notice, accept that.

Except, I’d say, for those gazillion ones printk’d during boot. These are just ridiculous. Something-or-the-other was written for SuSE, and all that.


Copyright © 2022, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds