What's new in SPDX 2.0
Version 2.0 of the Software Package Data Exchange (SPDX) specification was unveiled on May 13. SPDX is designed to facilitate license-compliance efforts for large projects (or projects that simply include a large number of upstream components), as well as those undertaken by component vendors and product manufacturers. The new revision adds some important flexibility to the format, enabling cross-references between packages and support for packages that are delivered via version-control systems.
The SPDX format is the product of the SPDX workgroup at the Linux Foundation (LF). The version 2.0 page includes the formal specification [PDF] as well as some background material created by the workgroup, such as the requirements document [PDF] that set out goals for the new revision.
Chief among those goals was adapting SPDX to work well as a metadata format that propagates easily through a "supply chain" from one vendor to another. In SPDX 1.x, all of the licensing information for a package had to be captured in a single, self-contained file. When such packages were incorporated into a larger combined work, the information from their files had to be copied into a single replacement file describing the derivative work, because the SPDX 1.x format could not reference external files. The most significant changes in version 2.0 are those that overcome this limitation.
SPDX uses RDF/XML to record several types of metadata about a software package. The focus is generally on license information, given the importance of license compliance in open-source software, but the SPDX format also includes sections for other kinds of metadata, such as a general-purpose package-information section (which records the version number, the original source of the package, the provider of this particular copy of the program, and so on).
If it is not clear how this information would be of use to a development team, a presentation [PPT] from the 2015 Collaboration Summit includes a real-world example for ActiveMQ. The official ActiveMQ packages released by the Apache Software Foundation bundle in Jetty (which itself is copyrighted by the Eclipse Foundation), while Jetty bundles in javax.servlet from Glassfish (which is copyrighted by Oracle). So the SPDX document for ActiveMQ denotes which of the many files in the package are under which license (in this case, Apache 2.0 for ActiveMQ, either Apache 2.0 or Eclipse Public License 1.0 for Jetty, and either CDDL or GPL for javax.servlet) as well as the respective copyright holders. It also concludes that the combined ActiveMQ release is under the Apache 2.0 license. Noting those per-component licenses is still important, however, because some downstream developers might be interested in using only part of the whole.
Packages, references, and relationships
The biggest change in SPDX 2.0 is that all of this information no longer has to be fused together into a single, massive document. Assuming that the individual components (i.e., javax.servlet and Jetty) ship with their own SPDX 2.0 documents, the SPDX 2.0 document for ActiveMQ can reference the licensing, copyright, and other metadata information from those other SPDX files. This is done with two additions to the format: a globally unique SPDX identifier for each document and an internal identifier for each XML element within the document. SPDX documents can thus reference individual elements inside of other SPDX documents unambiguously. In supply-chain terms, this means that the ActiveMQ project could preserve the SPDX documents that come with Jetty and javax.servlet, then create a shorter SPDX document for its combined release by simply including references to those existing files.
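To make the reference mechanism concrete, here is a minimal Python sketch of how a tool might follow such references. It is not taken from the specification: the `SpdxDocument` class, the `resolve()` helper, and the example.org namespace URIs are hypothetical, and the `DocumentRef-<name>:SPDXRef-<id>` reference form is modeled on the tag:value convention and should be treated as illustrative. The sketch also omits any integrity checking of the referenced document.

```python
from dataclasses import dataclass, field


@dataclass
class SpdxDocument:
    """Toy model of an SPDX document: a namespace plus identified elements."""
    namespace: str  # globally unique URI identifying this document
    elements: dict = field(default_factory=dict)       # "SPDXRef-..." -> description
    external_refs: dict = field(default_factory=dict)  # "DocumentRef-..." -> namespace


def resolve(ref: str, current: SpdxDocument, registry: dict) -> str:
    """Follow a local or cross-document reference to the element it names.

    Assumed reference forms: "SPDXRef-x" (same document) or
    "DocumentRef-y:SPDXRef-x" (element x in the document declared as y).
    """
    if ":" in ref:
        doc_ref, element_id = ref.split(":", 1)
        target = registry[current.external_refs[doc_ref]]
    else:
        element_id, target = ref, current
    return target.elements[element_id]


# Hypothetical documents; the example.org URIs are placeholders.
jetty = SpdxDocument(
    "https://example.org/spdxdocs/jetty",
    {"SPDXRef-Package": "Jetty: Apache-2.0 OR EPL-1.0"},
)
activemq = SpdxDocument(
    "https://example.org/spdxdocs/activemq",
    {"SPDXRef-Package": "ActiveMQ: Apache-2.0"},
    {"DocumentRef-jetty": jetty.namespace},
)
registry = {doc.namespace: doc for doc in (jetty, activemq)}
print(resolve("DocumentRef-jetty:SPDXRef-Package", activemq, registry))
```

Keying the registry by namespace rather than by file name mirrors the point that the document identifier is globally unique: the ActiveMQ document refers to Jetty's metadata without copying it.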
Perhaps a more subtle result of this change is that it is no longer necessary to create a separate SPDX file for each individual software package that a company or vendor releases. Thanks to the ability to unambiguously reference individual elements within a SPDX document, each document can contain more than one top-level "package" element. Of course, since each SPDX 2.0 document can contain multiple packages or can reference packages in other documents, parsing a SPDX file is not as simple as it was in the 1.x days. But such is the price of flexibility.
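The extra parsing work shows up even in a toy reader. The Python sketch below splits a tag:value document (SPDX's plain-text alternative to RDF/XML) into one record per package. The rule that each `PackageName:` tag opens a new package is a simplifying assumption; a real parser would also need to handle file sections, relationships, and multi-line `<text>` values.

```python
from typing import Dict, List, Optional


def parse_packages(text: str) -> List[Dict[str, str]]:
    """Group tag:value lines into one dict per package.

    Simplifying assumption: a "PackageName:" tag starts a new package and every
    following tag belongs to it until the next "PackageName:". Document-level
    tags before the first package are skipped.
    """
    packages: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        tag, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # not a tag:value line
        tag, value = tag.strip(), value.strip()
        if tag == "PackageName":
            current = {}
            packages.append(current)
        if current is not None:
            current[tag] = value
    return packages


sample = """\
SPDXVersion: SPDX-2.0
DataLicense: CC0-1.0
PackageName: jetty
SPDXID: SPDXRef-jetty
PackageDownloadLocation: NOASSERTION
PackageName: javax.servlet
SPDXID: SPDXRef-servlet
"""
for pkg in parse_packages(sample):
    print(pkg["PackageName"], pkg["SPDXID"])
```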
The format also now allows users to explicitly designate the relationships between various files, packages, and other elements. The relationships supported include simple dependencies, plus "generates" relationships (to, for example, designate that a particular binary file is generated from a specific source file), designations that a file is a test case or data file, notes that a file has been removed or altered from its upstream version, and so on.
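Relationships naturally form a graph. This hypothetical Python sketch reads tag:value lines of the assumed form `Relationship: <left> <TYPE> <right>` into an adjacency list and answers simple queries. The relationship names used (`GENERATED_FROM`, `TEST_CASE_OF`, `CONTAINS`) are meant to resemble the specification's vocabulary; the exact list should be checked against the 2.0 spec.

```python
from collections import defaultdict


def parse_relationships(lines):
    """Build {element: [(relationship_type, target), ...]} from tag:value lines.

    Assumed line form: "Relationship: <left-id> <TYPE> <right-id>".
    """
    graph = defaultdict(list)
    for line in lines:
        tag, sep, value = line.partition(":")  # first colon only, so IDs may contain ':'
        if not sep or tag.strip() != "Relationship":
            continue
        left, rel_type, right = value.split()  # raises if not exactly 3 fields
        graph[left].append((rel_type, right))
    return graph


def related(graph, element, rel_type):
    """Return the targets of `element` for one relationship type."""
    return [target for kind, target in graph.get(element, []) if kind == rel_type]


# Hypothetical element IDs, for illustration only.
doc = [
    "Relationship: SPDXRef-foo-binary GENERATED_FROM SPDXRef-foo-source",
    "Relationship: SPDXRef-foo-test TEST_CASE_OF SPDXRef-foo-package",
    "Relationship: SPDXRef-foo-package CONTAINS DocumentRef-jetty:SPDXRef-Package",
]
graph = parse_relationships(doc)
print(related(graph, "SPDXRef-foo-binary", "GENERATED_FROM"))
```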
It is also worth noting that some of these relationships capture what one might call time-sequential information—like the addition or removal of a file. This is a departure from the SPDX 1.x era, where only the current hierarchical state of the package was represented. The ability to record such time-based information is an aspect of the supply-chain model that the SPDX workgroup hopes to support with 2.0.
The files referenced in a SPDX document can also now include a variety of data formats, such as audio, video, images, documentation files, and generic plain text. Documentation is, of course, often text itself, but the semantic meaning may be valuable to downstream users of a package. A number of new hash algorithms are also supported, so that checksums for different file types can be recorded in the SPDX document, too. The upshot of all of these changes is that SPDX 2.0 can more completely capture the semantic meaning of the files and packages provided by a particular project.
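Recording several checksums does not require reading a file several times. This sketch feeds each chunk to every hash in a single pass. The SPDX algorithm labels and the `FileChecksum: <ALG>: <hex>` output format are assumptions about the tag:value syntax, and the three algorithms shown are illustrative rather than the specification's full list.

```python
import hashlib
import io

# Assumed mapping from SPDX algorithm labels to hashlib constructors.
ALGORITHMS = {"SHA1": hashlib.sha1, "SHA256": hashlib.sha256, "MD5": hashlib.md5}


def stream_checksums(stream, chunk_size: int = 65536) -> list:
    """Hash a binary stream once, feeding every algorithm in the same pass."""
    hashers = {label: ctor() for label, ctor in ALGORITHMS.items()}
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        for hasher in hashers.values():
            hasher.update(chunk)
    return [f"FileChecksum: {label}: {h.hexdigest()}" for label, h in hashers.items()]


# For a real file: with open(path, "rb") as fh: print(stream_checksums(fh))
for line in stream_checksums(io.BytesIO(b"hello, SPDX\n")):
    print(line)
```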
Licenses
SPDX documents contain a section named "Other Licensing Information Detected" for referencing licenses that are not on the official SPDX license list. As the Collaboration Summit presentation puts it, the SPDX license list aims to cover 90% of the world's FOSS code, which it says can be done with a subset of the total license ecosystem: about 20 out of the 2000-plus licenses used in the wild. Nevertheless, the format needs some way to account for the other 10%.
Off-list licenses are described with a set of XML attributes and RDF tag:value pairings. They include the license name, URL, and a text snippet (potentially the entire license). Despite this flexibility, previous SPDX revisions had no clearly defined way to express certain complex licensing situations. The most important of these is when a package is released under a choice of licenses (e.g., dual-licensed or tri-licensed). The 2.0 specification attempts to standardize how such licensing information is recorded by defining an augmented Backus-Naur Form (ABNF) syntax for expressing compound licensing relationships.
Like other elements in a SPDX 2.0 document, the off-list licenses described can be referenced elsewhere in the file by their internal ID, and cross-referenced by other SPDX documents.
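The compound-expression syntax is easiest to understand by parsing it. Below is a small recursive-descent parser in Python; it treats `LicenseRef-` identifiers (the internal IDs for off-list licenses) like any other license ID. The operator precedence used here (WITH binding tightest, then AND, then OR) is an assumption, so expressions that mix operators are safest with explicit parentheses. The `satisfiable()` helper is a hypothetical addition showing how a downstream user might check whether a set of acceptable licenses covers an expression.

```python
import re

# Operands: license IDs, "LicenseRef-..." and "DocumentRef-...:LicenseRef-..."
TOKEN = re.compile(r"\(|\)|[A-Za-z0-9.+:\-]+")
OPERATORS = {"AND", "OR", "WITH"}


class LicenseExpressionParser:
    """Recursive-descent parser for compound license expressions.

    Assumed precedence (tightest first): WITH, AND, OR; parentheses override.
    Results are nested tuples such as ("OR", "Apache-2.0", "EPL-1.0").
    """

    def __init__(self, text: str):
        self.tokens = TOKEN.findall(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        tok = self.peek()
        if tok is None:
            raise ValueError("unexpected end of expression")
        self.pos += 1
        return tok

    def parse(self):
        tree = self.or_expr()
        if self.peek() is not None:
            raise ValueError(f"unexpected token {self.peek()!r}")
        return tree

    def or_expr(self):
        node = self.and_expr()
        while self.peek() == "OR":
            self.take()
            node = ("OR", node, self.and_expr())
        return node

    def and_expr(self):
        node = self.with_expr()
        while self.peek() == "AND":
            self.take()
            node = ("AND", node, self.with_expr())
        return node

    def with_expr(self):
        node = self.atom()
        if self.peek() == "WITH":
            if not isinstance(node, str):
                raise ValueError("WITH must follow a single license ID")
            self.take()
            node = f"{node} WITH {self.take()}"  # keep ID + exception as one leaf
        return node

    def atom(self):
        tok = self.take()
        if tok == "(":
            node = self.or_expr()
            if self.take() != ")":
                raise ValueError("expected ')'")
            return node
        if tok == ")" or tok in OPERATORS:
            raise ValueError(f"unexpected token {tok!r}")
        return tok


def satisfiable(tree, accepted) -> bool:
    """Can the expression be honoured using only licenses in `accepted`?"""
    if isinstance(tree, str):
        return tree in accepted
    op, left, right = tree
    if op == "OR":
        return satisfiable(left, accepted) or satisfiable(right, accepted)
    return satisfiable(left, accepted) and satisfiable(right, accepted)  # AND


tree = LicenseExpressionParser("MIT AND (Apache-2.0 OR EPL-1.0)").parse()
print(tree)  # ('AND', 'MIT', ('OR', 'Apache-2.0', 'EPL-1.0'))
print(satisfiable(tree, {"MIT", "EPL-1.0"}))  # True
```

Returning nested tuples keeps the sketch dependency-free; a production tool would also validate each identifier against the SPDX license list.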
And more
Among the other noteworthy changes to SPDX 2.0 is an extension of the Package Download Location field within the package-information section. Starting with this version, the field can contain a reference to a version-control system (VCS), whereas earlier versions expected an HTTP URL. The VCS references supported include Git, Mercurial, Subversion, and Bazaar, using HTTPS or SSH transport URLs. The syntax is taken from that used by the Python Package Index, and includes support for denoting specific branch, sub-path, commit hash, and tag names.
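A download-location parser is short. The sketch below assumes the pip-style layout `<tool>+<transport>://<host>/<path>[@<revision>][#<sub_path>]` and the tool prefixes `git`, `hg`, `svn`, and `bzr`; both are assumptions that should be checked against the specification. It looks for the `@revision` marker only after the host, so SSH user information such as `git@host` is not mistaken for a revision.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed tool prefixes, borrowed from pip's VCS URL convention.
VCS_TOOLS = {"git", "hg", "svn", "bzr"}


@dataclass
class VcsLocation:
    tool: str                # e.g. "git"
    url: str                 # transport URL without revision or sub-path
    revision: Optional[str]  # branch, tag, or commit after "@", if any
    sub_path: Optional[str]  # path inside the repository after "#", if any


def parse_vcs_location(location: str) -> VcsLocation:
    """Split '<tool>+<transport>://<host>/<path>[@<rev>][#<sub_path>]'."""
    tool, plus, rest = location.partition("+")
    if not plus or tool not in VCS_TOOLS:
        raise ValueError(f"not a VCS location: {location!r}")
    rest, _, sub_path = rest.partition("#")
    transport, sep, remainder = rest.partition("://")
    if not sep:
        raise ValueError(f"missing transport in {location!r}")
    # Search for "@rev" only after the host, so "git@host" user info survives.
    host, slash, path = remainder.partition("/")
    revision = None
    if "@" in path:
        path, _, revision = path.rpartition("@")
    return VcsLocation(tool, f"{transport}://{host}{slash}{path}", revision, sub_path or None)


print(parse_vcs_location("git+ssh://git@git.example.org/proj.git@v1.0#docs"))
```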
Finally, one feature from SPDX 1.x has been deprecated in SPDX 2.0: a new "Annotation" section replaces the "Review Information" section of earlier versions. The now-deprecated section was used to record human review information: the reviewer, the review date, and an optional comment. All of that has been replaced by a more general-purpose annotation system. The format is essentially the same; an annotation has an "annotator" field, plus a date and room for comments. The new wrinkle is that annotations can reference any SPDX element by its identifier. Thus, they can note per-element changes, rather than being attached only to the whole package (as was the case in SPDX 1.x).
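Because annotations point at element identifiers, a tool can attach and query them per element. This Python sketch models that idea; the tag names emitted by `to_tag_value()` (`Annotator`, `AnnotationDate`, `AnnotationType`, `SPDXREF`, `AnnotationComment`), the `REVIEW`/`OTHER` types, and the sample annotators are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Annotation:
    annotator: str        # e.g. "Person: Jane Doe" (hypothetical)
    date: datetime        # should be timezone-aware
    element_id: str       # SPDX identifier of the annotated element
    comment: str
    kind: str = "REVIEW"  # assumed values: "REVIEW" or "OTHER"

    def to_tag_value(self) -> str:
        """Render using tag names assumed from the SPDX 2.0 tag:value format."""
        stamp = self.date.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        return "\n".join([
            f"Annotator: {self.annotator}",
            f"AnnotationDate: {stamp}",
            f"AnnotationType: {self.kind}",
            f"SPDXREF: {self.element_id}",
            f"AnnotationComment: <text>{self.comment}</text>",
        ])


def annotations_for(annotations, element_id):
    """Select the annotations attached to one element, oldest first."""
    return sorted((a for a in annotations if a.element_id == element_id), key=lambda a: a.date)


notes = [
    Annotation("Person: Jane Doe", datetime(2015, 5, 20, tzinfo=timezone.utc),
               "SPDXRef-jetty", "License headers re-checked."),
    Annotation("Tool: example-scanner", datetime(2015, 5, 14, tzinfo=timezone.utc),
               "SPDXRef-jetty", "Automated scan.", kind="OTHER"),
    Annotation("Person: Jane Doe", datetime(2015, 5, 20, tzinfo=timezone.utc),
               "SPDXRef-servlet", "Copyright holder confirmed."),
]
for note in annotations_for(notes, "SPDXRef-jetty"):
    print(note.to_tag_value(), end="\n\n")
```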
What's next
It will be interesting to see how SPDX 2.0 catches on with users, particularly in corporate software environments (where users tend to want far more detailed provenance information about the components they use than volunteer projects do). SPDX 2.0 offers a lot of additional flexibility, but that flexibility means that SPDX users will have to restructure the metadata files for their packages. For large and complex software projects, the complexity of the new format compounds the amount of work required.
It is also interesting to note what did not make it into SPDX 2.0. In particular, the requirements document noted earlier mentions a desire to increase the granularity with which metadata can be recorded. Per-file licensing is too coarse for some projects, so there was interest in accommodating finer-grained components (e.g., functions or classes). This does not appear to have made it into the 2.0 release. Nor did one other proposed requirement: a mechanism to verify the creator and reviewer information listed in the SPDX metadata for a package. Presumably, such a mechanism would resemble the existing means of verifying file integrity (perhaps by including GPG signatures), but the feature was tabled.
The SPDX workgroup indicates that the standard will not be sitting still, however, so perhaps these and other features will soon be implemented in a new update.
Posted May 21, 2015 11:45 UTC (Thu)
by mstone_ (subscriber, #66309)
Posted May 21, 2015 16:48 UTC (Thu)
by xnox (guest, #63320)
Not sure if SPDX would help Debian-wide. We do have https://www.debian.org/doc/packaging-manuals/copyright-fo...
Posted May 25, 2015 14:21 UTC (Mon)
by dr@jones.dk (subscriber, #7907)
Posted May 21, 2015 17:36 UTC (Thu)
by ewan (guest, #5533)
Posted May 25, 2015 17:44 UTC (Mon)
by nix (subscriber, #2304)
Posted May 27, 2015 0:55 UTC (Wed)
by katestewart (guest, #98525)
Posted May 22, 2015 23:05 UTC (Fri)
by louie (guest, #3285)
Posted May 26, 2015 10:28 UTC (Tue)
by etienne (guest, #25256)
It seems like a lot of corporations have not changed anything in their legal departments with regard to "free" software, compared to the time when they would buy subsystems from other companies (with a fully personalised contract).
Posted May 31, 2015 12:33 UTC (Sun)
by kleptog (subscriber, #1183)
You may be right that the companies don't care about fixing any problems, but the legal departments sure as hell would like all the risks quantified.
Posted May 30, 2015 19:34 UTC (Sat)
by paulj (subscriber, #341)
On the other side, there are also some in free software communities who are very cavalier about complying with the free software licences of other copyright holders of code they are using.
Posted May 27, 2015 16:43 UTC (Wed)
by hugoroy (guest, #60577)
I would be interested in reading this :-)
Thanks
Posted May 28, 2015 0:02 UTC (Thu)
by katestewart (guest, #98525)
Not quite. In the future embedded developers will have nicely structured XML licencing information to completely ignore and get horribly out of date, rather than the pain of having to ignore plain text files.
So basically, they wait for someone to complain, then send the complaint to their legal department, which wakes up (i.e. starts spending money) and checks whether the complaint is a threat to the company, i.e. whether the other side could go to court and win a substantial amount of money, which would ultimately reduce the corporation's profit.
At best, corporations ask their software engineers whether the company can use some piece of free software, and most of those software engineers have not even read the GPL carefully... and anyway the product had to be on the shop shelves yesterday.
Maybe someone should go to court and demand that some mobile phone stop being sold, and demand the recall of all phones sold during the last 3 or 4 years, because that company lost the right to use the GPLv2 at that date for this irreplaceable package... the request will probably go straight into the bin because that corporation participates in "massive data collection" to protect against terrorists, and so should be treated as an army supplier, which never bothers about licenses and patents.
"Dead" link