SPDX identifiers in the kernel
On its face, compliance with licenses like the GPL seems like a straightforward task. But it quickly becomes complicated for a company that is shipping a wide range of software, in various versions, in a whole set of different products. Compliance problems often come about not because a given company wants to flout a license, but instead because that company has lost track of which licenses it needs to comply with and for which versions of which software. SPDX has its roots in an effort that began in 2009 to help companies get a handle on what their compliance obligations actually are.
It can be surprisingly hard to determine which licenses apply to a given repository full of software. The kernel's COPYING file states that it can be distributed under the terms of version 2 of the GNU General Public License. But many of the source files within the kernel tell a different story; some are BSD licensed, and many are dual-licensed. Some carry an exception to make it clear that user-space programs are not a derived product of the kernel. Occasionally, files with GPL-incompatible licenses have been found (and fixed).
A great many files in the kernel source tree carry no license text at all. One might presume that these files are covered by GPLv2 but, as we'll see, the situation may not be quite that simple. No-license files are also problematic because the Developer Certificate of Origin, which governs contributions to the kernel, refers explicitly to "the open source license indicated in the file". If there is no license indicated in the file, the meaning of that phrase is not entirely clear.
Another complicating factor is that the license text in kernel source files, when it is present at all, is entirely free-form. There are hundreds of variants of the GPLv2 text alone. That can make it hard for human readers to figure out what's going on, but it is even more challenging for software. It is not currently possible to run a tool on the kernel repository (or that of many other projects) and get a definitive list of the operative licenses.
The Software Package Data Exchange (SPDX) standard is an attempt to address this aspect of the licensing problem. This effort, which has come under the umbrella of the Linux Foundation's compliance program, has defined a way to declare licensing information that is intended to be easily read by both humans and machines. At its core, SPDX defines a single-line string to specify the license governing a file. It looks something like:
SPDX-License-Identifier: GPL-2.0
There is a long list of known licenses and the ability to add extra conditions or exceptions where needed. If each file in a repository contains one of these strings, summing up the licensing information for the repository as a whole becomes a straightforward affair.
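To illustrate why per-file tags make a repository-wide summary straightforward, here is a minimal Python sketch that tallies the license expressions found in a directory tree. It is not a tool used by the kernel project; the function names, the `NO-TAG` label, and the choice to look only at the first two lines of each file are assumptions made for this example.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the tag inside either comment style, for example:
#   // SPDX-License-Identifier: GPL-2.0
#   /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def find_spdx_expression(text, max_lines=2):
    """Return the license expression from the first lines of a file, or None.

    Assumption: the tag sits on line one, or on line two when line one
    is a "#!" interpreter line, so only max_lines lines are inspected.
    """
    for line in text.splitlines()[:max_lines]:
        match = SPDX_RE.search(line)
        if match:
            return match.group(1)
    return None


def summarize_tree(root):
    """Count how many files under root declare each license expression."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            text = path.read_text(errors="replace")
            # "NO-TAG" is a hypothetical label for files without a tag.
            counts[find_spdx_expression(text) or "NO-TAG"] += 1
    return counts
```

Calling `summarize_tree(".")` in a source tree would return a counter keyed by license expression, with untagged files grouped under the hypothetical `NO-TAG` key.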
SPDX has been adopted in various parts of the industry in recent years. The effort to add SPDX identifiers to the kernel has been playing out, mostly in private, for at least a couple of years. It recently surfaced in the form of a huge patch set adding SPDX identifiers to over 12,000 kernel source files that did not have any license information at all, and as a brief discussion at the 2017 Maintainers Summit. Somewhat later, some documentation on the project surfaced. Fully documenting the kernel with SPDX tags will take a while, but the process is well underway at this point.
For kernel source files, the decision was made that the SPDX tag should appear as the first line in the file (or the second line for scripts where the first line must be the #! string). For normal C source files, the string will be a comment using the "//" syntax; header files, instead, use traditional (/* */) comments for reasons related to tooling. Thus, for example, if one looks at arch/alpha/include/uapi/asm/a.out.h, one will see at the top:
/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
The WITH string says that the kernel's user-space exception applies to this file, since it defines part of the system-call ABI.
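The placement rules above can be sketched in a few lines of Python. This is an illustration only, not the script that was used to tag the kernel; the function names are hypothetical, and treating every file other than `.c` and `.h` as a script with `#` comments is an assumption made to keep the example short.

```python
from pathlib import PurePath


def spdx_tag_line(filename, expression):
    """Format the SPDX tag as a comment suited to the file type."""
    tag = f"SPDX-License-Identifier: {expression}"
    suffix = PurePath(filename).suffix
    if suffix == ".h":
        return f"/* {tag} */"   # header files use traditional comments
    if suffix == ".c":
        return f"// {tag}"      # normal C source files use "//"
    # Assumption: anything else is a script that uses "#" comments.
    return f"# {tag}"


def insert_tag(filename, source, expression):
    """Place the tag on line one, or on line two after a "#!" line."""
    lines = source.splitlines(keepends=True)
    position = 1 if lines and lines[0].startswith("#!") else 0
    if position and not lines[0].endswith("\n"):
        lines[0] += "\n"
    tag = spdx_tag_line(filename, expression) + "\n"
    return "".join(lines[:position] + [tag] + lines[position:])


def split_exception(expression):
    """Split "LICENSE WITH EXCEPTION" into its two parts."""
    license_id, _, exception = expression.partition(" WITH ")
    return license_id, exception or None
```

For instance, `spdx_tag_line("arch/alpha/include/uapi/asm/a.out.h", "GPL-2.0 WITH Linux-syscall-note")` produces the header-style comment shown above, and `split_exception` separates the base license from the exception name.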
Kernel developers are often short of patience for things that look like bureaucratic exercises, so it would not have been surprising to see some opposition to this project. In truth, there has been little. The biggest issue would appear to be that some of the no-license files that were marked as GPLv2 should maybe carry a different license. One could argue that this kind of disagreement is a good thing, in that it points out a place where the license applying to a specific file was not what most people might expect. Once this kind of problem comes to light, it can be addressed.
The plan is to eventually have SPDX tags in all kernel source files, but that process could take some time. For each file that already carries a license text, somebody has to look and ensure that the SPDX tag matches that text exactly. Given that there are around 60,000 files in the kernel repository, that's a fair amount of work. An additional goal is to eventually get rid of the other license texts; the consensus seems to be that the SPDX identifier is a sufficient declaration of the license on its own. But removing license text from source files must be done with a great deal of care, so it may be a long time before anybody works up the courage to attempt that on any files that they do not themselves own the copyright for.
It would not be surprising to see the process of adding SPDX tags extend
over years. There will likely be an occasional flare-up as this work
uncovers files with ambiguous or uncertain licensing, but that should
result in more clarity around the licensing of the kernel as a whole once
things are worked out. At the end, perhaps we'll know what the kernel's
license story really is.
Posted Nov 16, 2017 19:14 UTC (Thu)
by compenguy (guest, #25359)
[Link] (10 responses)
Most open source licenses carry a requirement that you distribute the exact license text for the work with the work. Simply saying "this is under the MIT license" is inadequate. Distributing a template MIT license is inadequate. Distributing the exact text of the MIT license file provided by the copyright holders (generally containing one or more copyright statements) is what's required. Without the means to locate and collate the obligatory license files, compliance is difficult, expensive, and easy to get wrong.
tl;dr: I really hope that once they get these license identifiers into files, they don't say "yay, we did it!" and stop. In many instances they'll need to either annotate the source with the full text of each original license, or provide the path to each of those original licenses in the source tree.
Posted Nov 16, 2017 19:36 UTC (Thu)
by sharkhands (guest, #114731)
[Link] (7 responses)
Posted Nov 16, 2017 23:19 UTC (Thu)
by JdGordy (subscriber, #70103)
[Link] (6 responses)
Posted Nov 17, 2017 9:33 UTC (Fri)
by tglx (subscriber, #31301)
[Link] (5 responses)
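Any SPDX identifier used in the kernel must have a corresponding file under LICENSES/... The file name is the same as the SPDX identifier and it contains the full license text along with some machine parsable meta data.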
Posted Nov 17, 2017 18:59 UTC (Fri)
by compenguy (guest, #25359)
[Link] (4 responses)
I don't understand how that system will work.
There could be 12 files, each under the license with the SPDX identifier "BSD-2-Clause", but each of those 12 could (and likely would) have different copyright statements or other additional text. So if they all reference a file named "BSD-2-Clause", how will the correct variant of the data be included? "cat foo1/LICENSE.txt foo2/LICENSE.txt foo3/LICENSE.txt ... > BSD-2-Clause" probably follows the letter of the license terms, but not the spirit because there's no way to know which entry in that file corresponds with a specific source file.
Posted Nov 19, 2017 11:06 UTC (Sun)
by Jonno (subscriber, #49613)
[Link] (3 responses)
The SPDX identifier "BSD-2-Clause" refers *explicitly* to the BSD 2-clause "Simplified" License [1]. If the licence text differs, you use a different SPDX identifier. The SPDX Licence List [2] currently contains 15 BSD licence variants, and more can be added as needed.
[1] https://spdx.org/licenses/BSD-2-Clause.html
[2] https://spdx.org/licenses/
Posted Nov 19, 2017 16:54 UTC (Sun)
by compenguy (guest, #25359)
[Link] (2 responses)
The license variants cover the different *terms* of the license, not the variations in copyright statement, nor will they ever.
Look at the link you supplied for the SPDX BSD-2-Clause license. All the text in red is boilerplate that must be filled in with the actual values for the project whose license you're attempting to comply with:
> Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Templatized or boilerplate copies of the license are insufficient to meet this requirement.
Posted Nov 20, 2017 13:22 UTC (Mon)
by Jonno (subscriber, #49613)
[Link] (1 responses)
> The license variants cover the different *terms* of the license, not the variations in copyright statement, nor will they ever.
True, but the problem with the BSD licences is that the *terms* of the licence differ depending on the copyright holder (as the original BSD licence explicitly mentioned "University of California, Berkeley" and other copyright holders replaced that with their own name when they released code under an equivalent license).
For example, the *only* differences between the *terms* of BSD-2-Clause and BSD-2-Clause-NetBSD are these two phrases:
(1) "THE COPYRIGHT HOLDERS" <-> "THE NETBSD FOUNDATION, INC."
(2) "THE COPYRIGHT HOLDER" <-> "THE FOUNDATION"
>> Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
> Templatized or boilerplate copies of the license are insufficient to meet this requirement.
Obviously using SPDX to *identify* the licence does not absolve you of the responsibility to also *comply* with the license. But SPDX license identifiers *can* make it easier to do so.
Consider the following hypothetical project of 5 source files. Note that file01.c and file02.c have different copyright holders and thus different copyright notices, but both are under the same licence, so you only need one copy of the license *terms* (located in the file LICENSE.BSD-2-Clause). However, file03.c is copied from NetBSD and is thus under a slightly different license. Therefore it not only uses a different copyright notice, but also a different SPDX license identifier and a separate copy of the license terms (this time located in LICENSE.BSD-2-Clause-NetBSD). Note that LICENSE.BSD-2-Clause and LICENSE.BSD-2-Clause-NetBSD differ by only 4 words, but (as you said) some sort of template would not actually comply with either license, and you therefore need to carry both versions. Next is file04.c, which is copied from FreeBSD and is under another slightly different license, and thus needs a third SPDX license identifier and a third license file. And finally we have file05.c, which contains parts copied from NetBSD, parts copied from FreeBSD, and parts written by me, which means that to use this file you need to comply with both the NetBSD license and the FreeBSD license (which fortunately are compatible with each other; you just need to keep both license files around to comply with both of them at the same time). In total there are 5 source files with 6 copyright notices, but thanks to the SPDX license identifiers we only need one copy each of the 3 license terms.
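========== file01.c ==========
// Copyright (c) 2012-2016 Joe Random Hacker
// SPDX-License-Identifier: BSD-2-Clause
...
========== file02.c ==========
// Copyright (c) 2017 Jon Severinsson
// SPDX-License-Identifier: BSD-2-Clause
...
========== file03.c ==========
// Copyright (c) 2008 The NetBSD Foundation, Inc.
// SPDX-License-Identifier: BSD-2-Clause-NetBSD
...
========== file04.c ==========
// Copyright (c) 1992-2012 The FreeBSD Project
// SPDX-License-Identifier: BSD-2-Clause-FreeBSD
...
========== file05.c ==========
// Copyright (c) 2008 The NetBSD Foundation, Inc.
// Copyright (c) 1992-2012 The FreeBSD Project
// Copyright (c) 2017 Jon Severinsson
// SPDX-License-Identifier: BSD-2-Clause-NetBSD AND BSD-2-Clause-FreeBSD
...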
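==== LICENSE.BSD-2-Clause ====
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: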
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
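= LICENSE.BSD-2-Clause-NetBSD =
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: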
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
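= LICENSE.BSD-2-Clause-FreeBSD =
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: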
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE FREEBSD PROJECT ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FREEBSD PROJECT OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the FreeBSD Project.
Posted Nov 20, 2017 15:27 UTC (Mon)
by compenguy (guest, #25359)
[Link]
I will say that the policy we enacted based on that guidance is very similar to the policies of all the other corporations I've worked with, as well as a wide range of products I've used. I will also direct you to SPDX's own documentation for producing SPDX distribution notices: https://spdx.org/sites/cpstandard/files/pages/files/spdx-...
Posted Nov 30, 2017 14:34 UTC (Thu)
by pombredanne (guest, #113982)
[Link]
Posted Dec 8, 2017 10:22 UTC (Fri)
by hook (guest, #62261)
[Link]
As for general best practices on copyright and license info, have you checked <https://reuse.software/> yet?
Posted Nov 17, 2017 7:18 UTC (Fri)
by rakoenig (subscriber, #29855)
[Link] (1 responses)
If there were a valid SPDX tag everywhere, it would make automating our ugly process easier. Currently we sometimes invest more effort in the bureaucratic and legally required stuff than in actually developing software.
Posted Nov 30, 2017 14:36 UTC (Thu)
by pombredanne (guest, #113982)
[Link]
Posted Nov 17, 2017 12:21 UTC (Fri)
by corsac (subscriber, #49696)
[Link]
Posted Nov 17, 2017 19:17 UTC (Fri)
by bfields (subscriber, #19510)
[Link] (1 responses)
Anything in the kernel source tree should be available under GPLv2. So I guess the only reason you'd care about the license on individual files is if you want to incorporate some portion into a more liberally-licensed project? Are there that many of those projects? I must be missing something.
Posted Nov 17, 2017 19:55 UTC (Fri)
by lkundrak (subscriber, #43452)
[Link]
Posted Nov 21, 2017 14:56 UTC (Tue)
by timur (guest, #30718)
[Link] (1 responses)
Posted Nov 30, 2017 14:37 UTC (Thu)
by pombredanne (guest, #113982)
[Link]
Posted Nov 24, 2017 15:23 UTC (Fri)
by zuki (subscriber, #41808)
[Link] (1 responses)
Seems like a good idea. I hope this catches on, and that e.g. Fedora's licensecheck scripts can be made more reliable.
Posted Nov 25, 2017 18:16 UTC (Sat)
by flussence (guest, #85566)
[Link]
I'm not so much a fan of its standalone metadata file format, but I don't think I'm ever going to find a package metadata format that's good - they all gradually expand until they reimplement half of X.509.
Posted Nov 25, 2017 1:51 UTC (Sat)
by mtaht (subscriber, #11087)
[Link]
Posted Nov 30, 2017 14:40 UTC (Thu)
by pombredanne (guest, #113982)
[Link]
Posted Dec 7, 2017 11:51 UTC (Thu)
by cbcbcb (subscriber, #10350)
[Link] (2 responses)
Posted Dec 7, 2017 11:59 UTC (Thu)
by gregkh (subscriber, #8)
[Link] (1 responses)
Also, what happens when you combine multiple .c files into a single .o file?
It gets messy very quickly; we tried it. Hence the fall-back to a comment, which works out much better.
Posted Dec 7, 2017 13:40 UTC (Thu)
by nix (subscriber, #2304)
[Link]
Posted Dec 10, 2017 21:35 UTC (Sun)
by zack (subscriber, #7062)
[Link] (1 responses)
Posted Dec 10, 2017 21:36 UTC (Sun)
by zack (subscriber, #7062)
[Link]
LICENSES/... The file name is the same as the SPDX identifier and it contains
the full license text along with some machine parsable meta data.
[2] https://spdx.org/licenses/
> The license variants cover the different *terms* of the license, not the variations in copyright statement, nor will they ever.
(2) "THE COPYRIGHT HOLDER" <-> "THE FOUNDATION"
>Templatized or boilerplate copies of the license are insufficient to meet this requirement.
// Copyright (c) 2012-2016 Joe Random Hacker
// SPDX-License-Identifier: BSD-2-Clause
...
// Copyright (c) 2017 Jon Severinsson
// SPDX-License-Identifier: BSD-2-Clause
...
// Copyright (c) 2008 The NetBSD Foundation, Inc.
// SPDX-License-Identifier: BSD-2-Clause-NetBSD
...
// Copyright (c) 1992-2012 The FreeBSD Project
// SPDX-License-Identifier: BSD-2-Clause-FreeBSD
...
// Copyright (c) 2008 The NetBSD Foundation, Inc.
// Copyright (c) 1992-2012 The FreeBSD Project
// Copyright (c) 2017 Jon Severinsson
// SPDX-License-Identifier: BSD-2-Clause-NetBSD AND BSD-2-Clause-FreeBSD
...
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
Don't stop with the kernel, extend it to all software projects
SPDX identifiers in the kernel, systemd, casync
It seems a shame that these identifiers are written as comments. With appropriate macro definitions you could have
#include "spdx.h"
SPDX_LICENCE_IDENTIFIER("GPL-2.0")
And with appropriate definitions, the licence information could be propagated into object and executable files too. This would be useful, as it takes effort to establish which source files have been used in a build, and therefore which licences apply to your binaries.
It doesn't seem to be one of the officially supported SPDX exceptions: https://spdx.org/licenses/exceptions-index.html