LCA: Cooperative management of package copyright and licensing data

By Jonathan Corbet
January 20, 2010
Kate Stewart is the manager of the PowerPC team at Freescale. As such, she has a basic customer service problem to solve: people who buy a board from Freescale would like to have some sort of operating system to run on it. That system, of course, will be Linux; satisfying this requirement means that Freescale must operate as a sort of Linux distributor. At her linux.conf.au talk, Kate talked about a new initiative aimed at helping distributors to ensure that they are compliant with the licenses of the software they are shipping.

Early GPL enforcement actions against companies like Cisco were, arguably, misplaced: Cisco was just gluing its nameplate onto hardware (and software) supplied to it by far-eastern manufacturing operations. The original GPL violation was committed by the original manufacturers who incorporated GPL-licensed software and failed to live up to the source distribution requirements. There was a clear purpose behind targeting companies like Cisco, though: the unpleasantness of dealing with GPL compliance problems was meant to get them to require compliance from their suppliers, which were otherwise harder to reach. Companies seem to have gotten the message; Kate noted that the supply chain is now routinely requiring certification of license compliance from suppliers. So Freescale needs to stay on top of license compliance in order to be able to sell its products; your editor suspects this may be a more powerful motivation than the mere need to avoid copyright infringement.

One common worry related to license compliance, of course, is that somebody might have somehow included proprietary code in a freely-licensed package. More common, though, are simple license compatibility issues, such as the inclusion of a GPL-licensed file in an ostensibly BSD-licensed package. Finding this kind of problem requires the examination of every file distributed with a package - and there are a lot of packages with a great many files out there. It's a lot of work.
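
As a rough illustration of the per-file check described above, the following Python sketch flags every file whose detected license differs from the package's stated license, so that a person can review it. The function name, the license labels, and the shape of the input mapping are assumptions made for illustration; deciding whether two licenses are actually compatible is a legal question that this sketch does not attempt to answer.

```python
def find_license_mismatches(declared: str,
                            file_licenses: dict[str, str]) -> list[tuple[str, str]]:
    """Return (path, license) pairs for files not under the declared license.

    `file_licenses` maps a file path to the license label found for that
    file; how those labels are obtained is outside the scope of this sketch.
    """
    return sorted(
        (path, lic)
        for path, lic in file_licenses.items()
        if lic != declared
    )
```

Given a package declared as "BSD-3-Clause" that contains one file labeled "GPL-2.0", the function returns just that file, which is the kind of finding a reviewer would then look at by hand.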

Freescale is certainly not the only Linux distributor, and it is not the only one facing this problem; anybody who is distributing software (free or otherwise) is (or at least should be) going through a similar process. That leads to a lot of duplicated work which really could be shared. At the first LinuxCon event in September 2009, a number of interested parties got together to try to figure out if there was a way that the license validation and compliance work could be carried out in a more community-oriented manner.

The problem may seem simple, but there are a lot of details to deal with, starting with the large number of ways of analyzing projects. At one end, commercial tools provided by companies like Black Duck and Palamida can automate the task of finding a number of common licensing problems. But there are also many homegrown tools and spreadsheets in use throughout the industry. The end result is predictable: lots of incompatible data, inconsistent work, and duplicated effort.

Given that, it's not surprising that this new (and, apparently, still unnamed) project is starting with an attempt to standardize the encoding of information about packages. This information comes at a number of levels:

  • The identification of the project as a whole, including metadata on the results of any analysis which has been done. Included here is a formal name for the package, its published location, the stated license (and any possible alternative licenses), how the package is used (is it a standalone program or a library?), the copyright holders and dates of copyright, etc.

  • Package-specific facts: the version that was analyzed, hashes for each of the included files, how the information about the package was generated, and so on. There will also be the equivalent of a "signed off by" tag whereby people doing analysis on a package would certify their results.

  • File-specific information for every file found in the package: its full path name, the type of the file, the license governing it, copyright information, and so on.
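
The three levels listed above could be modeled along the following lines. This is only a sketch: the project had not published a format at the time of the talk, so every class and field name here is hypothetical, and the choice of SHA-256 is an assumption (the talk only mentioned "hashes").

```python
import hashlib
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class FileRecord:
    path: str        # path relative to the package root
    file_type: str   # kind of file; "unknown" until someone classifies it
    license: str     # license governing this file
    copyright: str   # copyright information for this file
    sha256: str      # hash of the file contents (algorithm is an assumption)


@dataclass
class PackageRecord:
    name: str              # formal name of the package
    location: str          # published location
    declared_license: str  # the license the package states for itself
    version: str           # version that was analyzed
    generated_by: str      # how this information was produced
    reviewers: list[str] = field(default_factory=list)  # "signed off by" equivalents
    files: list[FileRecord] = field(default_factory=list)


def hash_file(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def scan_package(root: Path, record: PackageRecord) -> PackageRecord:
    """Append one FileRecord per regular file under `root`, in sorted order."""
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        record.files.append(FileRecord(
            path=path.relative_to(root).as_posix(),
            file_type="unknown",
            license="unknown",
            copyright="unknown",
            sha256=hash_file(path),
        ))
    return record
```

The license and copyright fields start out as "unknown" because filling them in is precisely the analysis work the project hopes to share; the hashes let a later reader confirm that a record describes the same files they have in hand.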

Once the process of standardizing the encoding of this information has been completed, the project can move on to the second phase, which is the creation of a common site to host information stored in that format. The idea here is to make it easy to look up and share information on specific packages, and to make any known problems publicly visible.

All of that, in turn, has a goal beyond the simple sharing of work: they would also like to improve the quality of the next generation of packages. By making public review of licensing information easier, it is hoped that problems will be found (and fixed) sooner. One gets the sense that companies like Freescale are getting tired of finding licensing issues in packages which are scheduled to ship in a few days. A related goal is to make package maintainers more aware of where their code is coming from. As licensing issues are found in a public review process, maintainers will, hopefully, begin to pay more attention and these issues will become less common.

The project is still in an early stage; there is a mailing list set up on the FOSSBazaar site, but not a whole lot else. The dreaded regular conference call will be established in the near future. The group hopes to create a proposed standard within the next few months; the Linux Foundation will be helping with legal review to ensure that all of the appropriate bases are covered. The current plan is to get the first version of the standard published in August, 2010.

During the question period, Andrew Bartlett expressed his dislike for the central database concept. Centrally-maintained information, he says, will soon go stale. It would be better to create a format for a license metadata file which could be maintained and shipped with the project itself; he said he would be glad to carry such information with the Samba distribution. That is an idea which will likely be carried back to the working group for consideration.

Licensing is an important component of the free software development process, and ensuring that our licenses are complied with is incumbent upon anybody engaged in software distribution. But all of the associated due diligence work really only has to be done once; like the development of the software itself, it can be managed in a community-oriented manner. The formalization and organization of the associated information is a logical first step toward bringing a community process to this important - if not necessarily fun - task.



LCA: Cooperative management of package copyright and licensing data

Posted Jan 20, 2010 16:16 UTC (Wed) by iabervon (subscriber, #722)

One thing to consider is that not all files in a project will be licensed under the same license, and, where the files are not derived from each other, may not be under compatible licenses (so the distributor can't say that they're using relicensing rights to standardize the license of files in the package, for the purposes of the package).

For example, it's extremely common for projects where all of the code is licensed under the GPL to contain a file (named "COPYING") under a "freely-distributable, no derivatives" license. Of course, packages will often also contain data files and documents under other licenses, as well as, in some cases, software which is separately licensed and meant to be used together across a protocol layer. (Including such things as software modem drivers that handle configuring the hardware as a phone-line audio device and directing data to a closed-source program which converts between character data and audio; while nobody's written a free replacement, it's entirely specified, independently of the driver, what such a program would do.)

LCA: Cooperative management of package copyright and licensing data

Posted Jan 20, 2010 17:10 UTC (Wed) by drag (subscriber, #31333)

This is one of the nice things about the 'viral' nature (or copyleft) of the GPL.

By having that as the 'standard' license and requiring that everything works with it, you make it the lowest common denominator. That will be the 'least free' license you have to deal with. Meaning that as long as you comply with the requirements of the GPL then you should be safe.

Also this is why it's very important for distributions like Debian to be very anal about licensing issues.

It's when you get into licenses that allow a mixing and matching of copyright restrictions, like the Mozilla license, that you start getting down to the point where you are requiring end users to start to examine individual files for different restrictions on use.

The legal overhead of having to do that for something as large as an operating system can be very significant. You can see that for major commercial software products, also, which is probably a much worse problem. Look at Java, for example: how many hundreds of thousands of dollars and man hours have gone into Sun examining and auditing its own source code in order to try to open source Java?

LCA: Cooperative management of package copyright and licensing data

Posted Jan 20, 2010 18:49 UTC (Wed) by JoeBuck (subscriber, #2330)

"Early GPL enforcement actions against companies like Cisco were, arguably, misplaced: Cisco was just gluing its nameplate onto hardware (and software) supplied to it by far-eastern manufacturing operations."

I disagree. Cisco was distributing GPL software without source, or any way to obtain source, so it was in violation of the copyright terms. Now that Cisco knows it will face legal costs for this kind of thing, it will require its contractors and offshore subdivisions to comply with the terms. This means that going directly after Cisco was the most efficient way to address the problem.

The only difference between this kind of unknowing infringement, and blatant, conscious infringement is that courts usually don't impose punitive damages on an unknowing infringer. Either way, the problem has to be corrected.

LCA: Cooperative management of package copyright and licensing data

Posted Jan 20, 2010 18:54 UTC (Wed) by corbet (editor, #1)

I agree with all of this - was trying to express that in the article and to say that it has worked. Clearly I didn't entirely succeed. I blame too much New Zealand wine at the LCA speakers' dinner.

LCA: Cooperative management of package copyright and licensing data

Posted Jan 20, 2010 19:07 UTC (Wed) by drag (subscriber, #31333)

"The only difference between this kind of unknowing infringement, and blatant, conscious infringement is that courts usually don't impose punitive damages on an unknowing infringer. Either way, the problem has to be corrected."

As far as the courts are concerned, this is true. However it's always important to keep in mind that you want friends, not enemies, and friends always try to forgive and work out compromises. It's just a matter of good ethics, sanity, and practicality that you avoid punishing people for mistakes and work with them to resolve the issues. If a compromise can't be found then the court system is the last resort and certainly a justifiable path to take.

Of course if somebody is being blatant or is knowingly violating the copyrights then that is another story.

Not that I think that you are advocating being a hard-ass about it or anything, I just like to point out things like this. Just for clarification. :)

LCA: Cooperative management of package copyright and licensing data

Posted Jan 20, 2010 21:52 UTC (Wed) by emk (subscriber, #1128)

It should be noted that the SFLC makes an _enormous_ effort to work things out quietly, without publicity or lawsuits.

LCA: Cooperative management of package copyright and licensing data

Posted Jan 20, 2010 20:34 UTC (Wed) by mp (subscriber, #5615)

Somewhat relevant to the data collection requirements is this Debian Enhancement Proposal: Machine-readable debian/copyright.

LCA: Cooperative management of package copyright and licensing data

Posted Jan 20, 2010 22:44 UTC (Wed) by davide.del.vento (guest, #59196)

Somewhat relevant? This is very relevant. Thanks for posting it!

LCA: Cooperative management of package copyright and licensing data

Posted Jan 21, 2010 12:25 UTC (Thu) by tcabot (subscriber, #6656)

"the second phase, which is the creation of a common site to host information stored in that format"
It might be more efficient to work with the FSF to enhance the existing Free Software Directory than to start a new site.

LCA: Cooperative management of package copyright and licensing data

Posted Jan 22, 2010 21:15 UTC (Fri) by philh (subscriber, #14797)

I think it's important for distributions to do their own due diligence when doing license checking, so if there is a central resource it should be used as an additional hurdle that needs to be jumped to get into a distribution.

At present in Debian, it's the Package Maintainer's responsibility to ensure that the package's Copyright file reflects the copyrights of the files contained. Even so, when a new package is uploaded, the ftp-masters also check the copyrights.

A central resource should be something that allows alerts about broken copyrights/licenses to be shared, so that people don't have to waste effort packaging something only to later discover that the license is toxic -- instead the first victim could report it, and future prospective packagers could save themselves the effort by checking it against the central database.

In Debian's case, the posting of an ITP (Intent To Package) could provoke a database lookup that would automatically alert the maintainer that issues had previously been reported against their package.

Instead of wasting the effort and enthusiasm in useless packaging, they could then perhaps put effort into working with the upstream to fix the problem.

So, it should not be a license approvals database, but a license warnings database.

Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds