SPDX identifiers in the kernel

By Jonathan Corbet
November 16, 2017
Observers of the kernel's commit stream or mailing lists will have seen a certain amount of traffic referring to the addition of SPDX license identifiers to kernel source files. For many, this may be their first encounter with SPDX. But the SPDX effort has been going on for some years; this article describes SPDX, along with why and how the kernel community intends to use it.

On its face, compliance with licenses like the GPL seems like a straightforward task. But it quickly becomes complicated for a company that is shipping a wide range of software, in various versions, in a whole set of different products. Compliance problems often come about not because a given company wants to flout a license, but instead because that company has lost track of which licenses it needs to comply with and for which versions of which software. SPDX has its roots in an effort that began in 2009 to help companies get a handle on what their compliance obligations actually are.

It can be surprisingly hard to determine which licenses apply to a given repository full of software. The kernel's COPYING file states that it can be distributed under the terms of version 2 of the GNU General Public License. But many of the source files within the kernel tell a different story; some are BSD licensed, and many are dual-licensed. Some carry an exception to make it clear that user-space programs are not a derived product of the kernel. Occasionally, files with GPL-incompatible licenses have been found (and fixed).

A great many files in the kernel source tree carry no license text at all. One might presume that these files are covered by GPLv2 but, as we'll see, the situation may not be quite that simple. No-license files are also problematic because the Developer Certificate of Origin, which governs contributions to the kernel, refers explicitly to "the open source license indicated in the file". If there is no license indicated in the file, the meaning of that phrase is not entirely clear.

Another complicating factor is that the license text in kernel source files, when it is present at all, is entirely free-form. There are hundreds of variants of the GPLv2 text alone. That can make it hard for human readers to figure out what's going on, but it is even more challenging for software. It is not currently possible to run a tool on the kernel repository (or that of many other projects) and get a definitive list of the operative licenses.

The Software Package Data Exchange (SPDX) standard is an attempt to address this aspect of the licensing problem. This effort, which has come under the umbrella of the Linux Foundation's compliance program, has defined a way to declare licensing information that is intended to be easily read by both humans and machines. At its core, SPDX defines a single-line string to specify the license governing a file. It looks something like:

    SPDX-License-Identifier: GPL-2.0

There is a long list of known licenses and the ability to add extra conditions or exceptions where needed. If each file in a repository contains one of these strings, summing up the licensing information for the repository as a whole becomes a straightforward affair.
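As a rough illustration of that "summing up", the sketch below walks a directory tree and counts the license expressions it finds. It is not the tooling used by the kernel community; the function names, the choice to look only at the first five lines, and the `<no tag>` bucket are all assumptions made for the example.

```python
import re
from collections import Counter
from pathlib import Path

# Matches the tag anywhere on a line and drops a trailing "*/" if present.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def find_spdx(text, max_lines=5):
    """Return the license expression found in the first few lines, or None."""
    for line in text.splitlines()[:max_lines]:
        match = SPDX_RE.search(line)
        if match:
            return match.group(1)
    return None


def tally(root):
    """Count license expressions across all regular files under root."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            # Files without a tag are counted under a hypothetical "<no tag>" key.
            counts[find_spdx(text) or "<no tag>"] += 1
    return counts
```

Calling `tally(".")` at the top of a source tree would then give a per-expression file count, with the untagged files showing up as their own bucket.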

SPDX has been adopted in various parts of the industry in recent years. The effort to add SPDX identifiers to the kernel has been playing out, mostly in private, for at least a couple of years. It recently surfaced in the form of a huge patch set adding SPDX identifiers to over 12,000 kernel source files that did not have any license information at all, and as a brief discussion at the 2017 Maintainers Summit. Somewhat later, some documentation on the project surfaced. Fully documenting the kernel with SPDX tags will take a while, but the process is well underway at this point.

For kernel source files, the decision was made that the SPDX tag should appear as the first line in the file (or the second line for scripts where the first line must be the #! string). For normal C source files, the string will be a comment using the "//" syntax; header files, instead, use traditional (/* */) comments for reasons related to tooling. Thus, for example, if one looks at arch/alpha/include/uapi/asm/a.out.h, one will see at the top:

    /* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */

The WITH string says that the kernel's user-space exception applies to this file, since it defines part of the system-call ABI.
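The placement and comment-style rules described above are simple enough to check mechanically. The following is a minimal sketch, assuming that files are classified by their `.c` or `.h` suffix and that everything else (scripts, for example) uses a `#` comment; that last rule and all of the function names are assumptions for illustration, not something stated in the text.

```python
import re

TAG = "SPDX-License-Identifier:"


def tag_line(text):
    """Return the line where the tag is expected: line 1, or line 2 after a #! line."""
    lines = text.splitlines()
    if lines and lines[0].startswith("#!"):
        lines = lines[1:]
    return lines[0] if lines else ""


def expected_pattern(filename):
    """Comment style per the article: // for C source files, /* */ for headers."""
    if filename.endswith(".c"):
        return re.compile(r"^// " + re.escape(TAG) + r" (.+)$")
    if filename.endswith(".h"):
        return re.compile(r"^/\* " + re.escape(TAG) + r" (.+) \*/$")
    # Assumption: anything else (scripts and the like) uses a '#' comment.
    return re.compile(r"^# " + re.escape(TAG) + r" (.+)$")


def parse_tag(filename, text):
    """Return (license, exception or None), or None if the tag is missing or misplaced."""
    match = expected_pattern(filename).match(tag_line(text))
    if not match:
        return None
    license_id, _, exception = match.group(1).partition(" WITH ")
    return license_id, (exception or None)
```

For the a.out.h header shown above, `parse_tag` would return the pair `("GPL-2.0", "Linux-syscall-note")`; a C source file using a `/* */` comment for its tag would be reported as misplaced.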

Kernel developers are often short of patience for things that look like bureaucratic exercises, so it would not have been surprising to see some opposition to this project. In truth, there has been little. The biggest issue would appear to be that some of the no-license files that were marked as GPLv2 should maybe carry a different license. One could argue that this kind of disagreement is a good thing, in that it points out a place where the license applying to a specific file was not what most people might expect. Once this kind of problem comes to light, it can be addressed.

The plan is to eventually have SPDX tags in all kernel source files, but that process could take some time. For each file that already carries a license text, somebody has to look and ensure that the SPDX tag matches that text exactly. Given that there are around 60,000 files in the kernel repository, that's a fair amount of work. An additional goal is to eventually get rid of the other license texts; the consensus seems to be that the SPDX identifier is a sufficient declaration of the license on its own. But removing license text from source files must be done with a great deal of care, so it may be a long time before anybody works up the courage to attempt that on any files that they do not themselves own the copyright for.

It would not be surprising to see the process of adding SPDX tags extend over years. There will likely be an occasional flare-up as this work uncovers files with ambiguous or uncertain licensing, but that should result in more clarity around the licensing of the kernel as a whole once things are worked out. At the end, perhaps we'll know what the kernel's license story really is.

SPDX identifiers in the kernel

Posted Nov 16, 2017 19:14 UTC (Thu) by compenguy (guest, #25359) [Link] (10 responses)

SPDX license identifiers are generally insufficient for license compliance. They are an important first step in the process of identifying whether compliance is possible, and categorizing what types of obligations compliance carries, but they don't help with the actual process of compliance.

Most open source licenses carry a requirement that you distribute the exact license text for the work with the work. Simply saying "this is under the MIT license" is inadequate. Distributing a template MIT license is inadequate. Distributing the exact text of the MIT license file provided by the copyright holders (generally containing one or more copyright statements) is what's required. Without the means to locate and collate the obligatory license files, compliance is difficult, expensive, and easy to get wrong.

tl;dr: I really hope that once they get these license identifiers into files, they don't say "yay, we did it!" and stop. In many instances they'll need to either annotate the source with the full text of each original license, or provide the path to each of those original licenses in the source tree.

SPDX identifiers in the kernel

Posted Nov 16, 2017 19:36 UTC (Thu) by sharkhands (guest, #114731) [Link] (7 responses)

How does this work with projects which have short license comments (i.e. "Distributed under the terms of the MIT license") in source files, and then a separate LICENSE.txt or the like which is intended to apply globally to the codebase?

SPDX identifiers in the kernel

Posted Nov 16, 2017 23:19 UTC (Thu) by JdGordy (subscriber, #70103) [Link] (6 responses)

Presumably there would be an implicit "see the LICENSE.TXT for the full text" associated with this tag?

SPDX identifiers in the kernel

Posted Nov 17, 2017 9:33 UTC (Fri) by tglx (subscriber, #31301) [Link] (5 responses)

Yes, that's in the documentation which is under review right now.

Any SPDX identifier used in the kernel must have a corresponding file under
LICENSES/... The file name is the same as the SPDX identifier and it contains
the full license text along with some machine parsable meta data.
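A rule like this lends itself to an automated check. The sketch below is only an illustration: it assumes the license files may sit in any subdirectory of the LICENSES directory (the exact layout is elided above), that expressions use only the AND, OR, and WITH operators, and the function names are made up for the example.

```python
import re
from pathlib import Path

OPERATORS = {"AND", "OR", "WITH"}


def identifiers(expression):
    """Split an expression into the license and exception identifiers it names."""
    tokens = re.findall(r"[A-Za-z0-9.+-]+", expression)
    return {token for token in tokens if token not in OPERATORS}


def missing_license_files(expressions, licenses_dir):
    """Return identifiers with no same-named file anywhere under licenses_dir."""
    # Assumption: a file counts if its name equals the identifier, at any depth.
    present = {p.name for p in Path(licenses_dir).rglob("*") if p.is_file()}
    needed = set()
    for expression in expressions:
        needed |= identifiers(expression)
    return sorted(needed - present)
```

An empty result means that every identifier used in the scanned expressions has a matching license text in the tree.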

SPDX identifiers in the kernel

Posted Nov 17, 2017 18:59 UTC (Fri) by compenguy (guest, #25359) [Link] (4 responses)

> The file name is the same as the SPDX identifier and it contains the full license text along with some machine parsable meta data.

I don't understand how that system will work.

There could be 12 files, each under the license with the SPDX identifier "BSD-2-Clause", but each of those 12 could (and likely would) have different copyright statements or other additional text. So if they all reference a file named "BSD-2-Clause", how will the correct variant of the data be included? "cat foo1/LICENSE.txt foo2/LICENSE.txt foo3/LICENSE.txt ... > BSD-2-Clause" probably follows the letter of the license terms, but not the spirit because there's no way to know which entry in that file corresponds with a specific source file.

SPDX identifiers in the kernel

Posted Nov 19, 2017 11:06 UTC (Sun) by Jonno (subscriber, #49613) [Link] (3 responses)

> There could be 12 files, each under the license with the SPDX identifier "BSD-2-Clause", but each of those 12 could (and likely would) have different copyright statements or other additional text. So if they all reference a file named "BSD-2-Clause", how will the correct variant of the data be included?

The SPDX identifier "BSD-2-Clause" refers *explicitly* to the BSD 2-clause "Simplified" License [1]. If the licence text differs, you use a different SPDX identifier. The SPDX Licence List [2] currently contains 15 BSD licence variants, and more can be added as needed.

[1] https://spdx.org/licenses/BSD-2-Clause.html
[2] https://spdx.org/licenses/

SPDX identifiers in the kernel

Posted Nov 19, 2017 16:54 UTC (Sun) by compenguy (guest, #25359) [Link] (2 responses)

> If the licence text differs, you use a different SPDX identifier. The SPDX Licence List [2] currently contains 15 BSD licence variants, and more can be added as needed.

The license variants cover the different *terms* of the license, not the variations in copyright statement, nor will they ever.

Look at the link you supplied for the SPDX BSD-2-Clause license. All the text in red is boilerplate that must be filled in with the actual values for the project whose license you're attempting to comply with:

> Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Templatized or boilerplate copies of the license are insufficient to meet this requirement.

SPDX identifiers in the kernel

Posted Nov 20, 2017 13:22 UTC (Mon) by Jonno (subscriber, #49613) [Link] (1 responses)

>> If the licence text differs, you use a different SPDX identifier.
> The license variants cover the different *terms* of the license, not the variations in copyright statement, nor will they ever.

True, but the problem with the BSD licences is that the *terms* of the licence differ depending on the copyright holder (as the original BSD licence explicitly mentioned "University of California, Berkeley" and other copyright holders replaced that with their own name when they released code under an equivalent license).

For example, the *only* difference between the *terms* of BSD-2-Clause and BSD-2-Clause-NetBSD are these two phrases:

(1) "THE COPYRIGHT HOLDERS" <-> "THE NETBSD FOUNDATION, INC."
(2) "THE COPYRIGHT HOLDER" <-> "THE FOUNDATION"

>> Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
>Templatized or boilerplate copies of the license are insufficient to meet this requirement.

Obviously using SPDX to *identify* the licence does not absolve you of the responsibility to also *comply* with the license. But SPDX license identifiers *can* make it easier to do so.

Consider the following hypothetical project of 5 source files. Note that file01.c and file02.c have different copyright holders and thus different copyright notices, but both are under the same licence, so you only need one copy of the license *terms* (located in the file LICENSE.BSD-2-Clause). However, file03.c is copied from NetBSD and is thus under a slightly different license. Therefore it not only uses a different copyright notice, but also a different SPDX license identifier and a separate copy of the license terms (this time located in LICENSE.BSD-2-Clause-NetBSD). Note that LICENSE.BSD-2-Clause and LICENSE.BSD-2-Clause-NetBSD differ by only 4 words, but (as you said) some sort of template would not actually comply with either license, and you therefore need to carry both versions. Next is file04.c, which is copied from FreeBSD and is under another slightly different license, and thus needs a third SPDX license identifier and a third license file. And finally we have file05.c, which contains parts copied from NetBSD, parts copied from FreeBSD, and parts written by me, which means that to use this file you need to comply with both the NetBSD license and the FreeBSD license (which fortunately are compatible with each other; you just need to keep both license files around to comply with both of them at the same time). In total there are 5 source files with a total of 7 copyright notices, but thanks to the SPDX license identifiers we only need one copy each of the 3 license terms.

========== file01.c ==========
// Copyright (c) 2012-2016 Joe Random Hacker
// SPDX-License-Identifier: BSD-2-Clause
...

========== file02.c ==========
// Copyright (c) 2017 Jon Severinsson
// SPDX-License-Identifier: BSD-2-Clause
...

========== file03.c ==========
// Copyright (c) 2008 The NetBSD Foundation, Inc.
// SPDX-License-Identifier: BSD-2-Clause-NetBSD
...

========== file04.c ==========
// Copyright (c) 1992-2012 The FreeBSD Project
// SPDX-License-Identifier: BSD-2-Clause-FreeBSD
...

========== file05.c ==========
// Copyright (c) 2008 The NetBSD Foundation, Inc.
// Copyright (c) 1992-2012 The FreeBSD Project
// Copyright (c) 2017 Jon Severinsson
// SPDX-License-Identifier: BSD-2-Clause-NetBSD AND BSD-2-Clause-FreeBSD
...

==== LICENSE.BSD-2-Clause ====
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

= LICENSE.BSD-2-Clause-NetBSD =
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

= LICENSE.BSD-2-Clause-FreeBSD =
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE FREEBSD PROJECT ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FREEBSD PROJECT OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

The views and conclusions contained in the software and documentation are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the FreeBSD Project.
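The bookkeeping in this example can be sketched in a few lines of Python. The sketch assumes the `//` header layout used in the listing, treats `AND` as the only operator, and conservatively attaches every copyright notice in a file to each license the file names; the `HEADERS` table and the function name are hypothetical, and none of this is a statement about what any license legally requires.

```python
import re
from collections import defaultdict

# Header comments of the five files in the hypothetical project above.
HEADERS = {
    "file01.c": "// Copyright (c) 2012-2016 Joe Random Hacker\n"
                "// SPDX-License-Identifier: BSD-2-Clause\n",
    "file02.c": "// Copyright (c) 2017 Jon Severinsson\n"
                "// SPDX-License-Identifier: BSD-2-Clause\n",
    "file03.c": "// Copyright (c) 2008 The NetBSD Foundation, Inc.\n"
                "// SPDX-License-Identifier: BSD-2-Clause-NetBSD\n",
    "file04.c": "// Copyright (c) 1992-2012 The FreeBSD Project\n"
                "// SPDX-License-Identifier: BSD-2-Clause-FreeBSD\n",
    "file05.c": "// Copyright (c) 2008 The NetBSD Foundation, Inc.\n"
                "// Copyright (c) 1992-2012 The FreeBSD Project\n"
                "// Copyright (c) 2017 Jon Severinsson\n"
                "// SPDX-License-Identifier: BSD-2-Clause-NetBSD AND BSD-2-Clause-FreeBSD\n",
}


def notices_by_license(headers):
    """Map each license identifier to the distinct copyright notices that go with it."""
    grouped = defaultdict(set)
    for text in headers.values():
        notices = re.findall(r"^// (Copyright .+)$", text, re.MULTILINE)
        tag = re.search(r"^// SPDX-License-Identifier: (.+)$", text, re.MULTILINE)
        if tag is None:
            continue  # untagged files are skipped in this sketch
        for license_id in tag.group(1).split(" AND "):
            grouped[license_id].update(notices)
    return {license_id: sorted(found) for license_id, found in grouped.items()}
```

The keys of the resulting dictionary are the license texts that have to be carried, and each value lists the notices that would accompany that text.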

SPDX identifiers in the kernel

Posted Nov 20, 2017 15:27 UTC (Mon) by compenguy (guest, #25359) [Link]

I'm just going to stop here, because we clearly got different guidance from our corporate copyright lawyers.

I will say that the policy we enacted based on that guidance is very similar to the policies of all the other corporations I've worked with, as well as a wide range of products I've used. I will also direct you to SPDX's own documentation for producing SPDX distribution notices: https://spdx.org/sites/cpstandard/files/pages/files/spdx-...

SPDX identifiers in the kernel

Posted Nov 30, 2017 14:34 UTC (Thu) by pombredanne (guest, #113982) [Link]

Did you read the docs? https://lwn.net/Articles/738809/ ?

SPDX identifiers in the kernel

Posted Dec 8, 2017 10:22 UTC (Fri) by hook (guest, #62261) [Link]

Of course adding the SPDX short identifiers into the code is not a cure-all, but it is a massive step forward for clarity. Especially in a code-base as huge as the Linux kernel.

As for general best practices on copyright and license info, have you checked <https://reuse.software/> yet?

Don't stop with the kernel, extend it to all software projects

Posted Nov 17, 2017 7:18 UTC (Fri) by rakoenig (subscriber, #29855) [Link] (1 responses)

Suffering from a corporate policy that requires a full assessment process for software that contains Open Source components, I'm glad to see those SPDX identifiers show up in the kernel. The next step would be to include them in all userland software as well. So far they are very rare there: just download the sources of some popular programs and go hunting for SPDX, and you won't find much.

If there were a valid SPDX tag everywhere, it would make automation of our ugly process easier. Currently we sometimes invest more effort in the bureaucratic and legally required stuff than in actually developing software.

Don't stop with the kernel, extend it to all software projects

Posted Nov 30, 2017 14:36 UTC (Thu) by pombredanne (guest, #113982) [Link]

I helped make this happen for the kernel and I am scratching my head over how to make this possible everywhere. I have tools, but that's only part of the story. What would be the way?

SPDX identifiers in the kernel

Posted Nov 17, 2017 12:21 UTC (Fri) by corsac (subscriber, #49696) [Link]

One could also take a look at the Debian DEP5 proposal (http://dep.debian.net/deps/dep5/) and the machine-readable debian/copyright format (https://www.debian.org/doc/packaging-manuals/copyright-fo...). There's also a differences analysis between SPDX and DEP5 (https://wiki.debian.org/Proposals/CopyrightFormat#Differe...)

SPDX identifiers in the kernel

Posted Nov 17, 2017 19:17 UTC (Fri) by bfields (subscriber, #19510) [Link] (1 responses)

I'm having trouble understanding who exactly this is useful to.

Anything in the kernel source tree should be available under GPLv2. So I guess the only reason you'd care about the license on individual files is if you want to incorporate some portion into a more liberally-licensed project? Are there that many of those projects? I must be missing something.

SPDX identifiers in the kernel

Posted Nov 17, 2017 19:55 UTC (Fri) by lkundrak (subscriber, #43452) [Link]

Unless I'm mistaken the BSDs regularly pull in the DRM subsystem and certain drivers from Linux that happen to be dual-licensed.

SPDX identifiers in the kernel

Posted Nov 21, 2017 14:56 UTC (Tue) by timur (guest, #30718) [Link] (1 responses)

My company requires a full GPLv2 (and v2 only) copyright statement on each file. I don't think that's going to change any time soon. Does this mean that, on new files, we should always also include the SPDX line?

SPDX identifiers in the kernel

Posted Nov 30, 2017 14:37 UTC (Thu) by pombredanne (guest, #113982) [Link]

Yes, this is exactly what this means.

SPDX identifiers in the kernel, systemd, casync

Posted Nov 24, 2017 15:23 UTC (Fri) by zuki (subscriber, #41808) [Link] (1 responses)

I now added SPDX tags in the same format to systemd and casync.

Seems like a good idea. I hope this catches on, and that e.g. Fedora's licensecheck scripts can be made more reliable.

SPDX identifiers in the kernel, systemd, casync

Posted Nov 25, 2017 18:16 UTC (Sat) by flussence (guest, #85566) [Link]

SPDX is one of the few universally good ideas I've seen for metadata. Perl 6 switched everything to it earlier this year too; turns out it's much easier to get everyone to agree on a common shorthand for licenses than it is to make everyone use, say, SemVer. In an ecosystem role like that it nudges people away from ad-hoc license proliferation, which is probably a good thing.

I'm not so much a fan of its standalone metadata file format, but I don't think I'm ever going to find a package metadata format that's good - they all gradually expand until they reimplement half of X.509.

SPDX identifiers in the kernel

Posted Nov 25, 2017 1:51 UTC (Sat) by mtaht (subscriber, #11087) [Link]

Sigh. Now I need to go update kill-a-lawyer.el.

SPDX identifiers in the kernel

Posted Nov 30, 2017 14:40 UTC (Thu) by pombredanne (guest, #113982) [Link]

FWIW, I helped a bit with this effort and with my FLOSS tool: https://github.com/nexB/scancode-toolkit which can be used anywhere and is not kernel-specific.

SPDX identifiers in the kernel

Posted Dec 7, 2017 11:51 UTC (Thu) by cbcbcb (subscriber, #10350) [Link] (2 responses)

It seems a shame that these identifiers are written as comments. With appropriate macro definitions you could have
    #include "spdx.h"

    SPDX_LICENCE_IDENTIFIER("GPL-2.0")
And with appropriate definitions, the licence information could be propagated into object and executable files too. This would be useful, as it takes effort to establish which source files have been used in a build, and therefore which licences apply to your binaries.

SPDX identifiers in the kernel

Posted Dec 7, 2017 11:59 UTC (Thu) by gregkh (subscriber, #8) [Link] (1 responses)

It's a common idea, but it doesn't really work well for .h files.

Also, what happens when you combine multiple .c files into a single .o file?

It gets messy very quickly, we tried it. Hence the fall-back to a comment, which works out much better

SPDX identifiers in the kernel

Posted Dec 7, 2017 13:40 UTC (Thu) by nix (subscriber, #2304) [Link]

It works even less well for languages other than C and languages that have nothing like the preprocessor -- but almost any language you can name has comments.

SPDX identifiers in the kernel

Posted Dec 10, 2017 21:35 UTC (Sun) by zack (subscriber, #7062) [Link] (1 responses)

So where is the actual meaning of "Linux-syscall-note" defined?
It doesn't seem to be one of the officially supported SPDX exceptions: https://spdx.org/licenses/exceptions-index.html

SPDX identifiers in the kernel

Posted Dec 10, 2017 21:36 UTC (Sun) by zack (subscriber, #7062) [Link]

To be clear, I'm aware of the note about syscall use in the kernel. What I'm wondering is where the SPDX notation makes the link between that exception identifier and its expansion.


Copyright © 2017, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds