|
|
Log in / Subscribe / Register

LCA: Cooperative management of package copyright and licensing data

By Jonathan Corbet
January 20, 2010
Kate Stewart is the manager of the PowerPC team at Freescale. As such, she has a basic customer service problem to solve: people who buy a board from Freescale would like to have some sort of operating system to run on it. That system, of course, will be Linux; satisfying this requirement means that Freescale must operate as a sort of Linux distributor. At her linux.conf.au talk, Kate talked about a new initiative aimed at helping distributors to ensure that they are compliant with the licenses of the software they are shipping.

Early GPL enforcement actions against companies like Cisco were, arguably, misplaced: Cisco was just gluing its nameplate onto hardware (and [Kate Stewart] software) supplied to it by far-eastern manufacturing operations. The original GPL violation was committed by the original manufacturers who incorporated GPL-licensed software and failed to live up to the source distribution requirements. There was a clear purpose behind targeting companies like Cisco, though: the unpleasantness of dealing with GPL compliance problems was meant to get them to require compliance from their suppliers, which were otherwise harder to reach. Companies seem to have gotten the message; Kate noted that the supply chain is now routinely requiring certification of license compliance from suppliers. So Freescale needs to stay on top of license compliance in order to be able to sell its products; your editor suspects this may be a more powerful motivation than the mere need to avoid copyright infringement.

One common worry related to license compliance, of course, is that somebody might have somehow included proprietary code into a freely-licensed package. More common, though, are simple license compatibility issues, such as the inclusion of a GPL-licensed file in an ostensibly BSD-licensed package. Finding this kind of problem requires the examination of every file distributed with a package - and there are a lot of packages with a great many files out there. It's a lot of work.

Freescale is certainly not the only Linux distributor, and it is not the only one facing this problem; anybody who is distributing software (free or otherwise) is (or at least should be) going through a similar process. That leads to a lot of duplicated work which really could be shared. At the first LinuxCon event in September 2009, a number of interested parties got together to try to figure out if there was a way that the license validation and compliance work could be carried out in a more community-oriented manner.

The problem may seem simple, but there are a lot of details to deal with, starting with the large number of ways of analyzing projects. At one end, commercial tools provided by companies like Black Duck and Palamida can automate the task of finding a number of common licensing problems. But there are also many homegrown tools and spreadsheets in use throughout the industry. The end result is predictable: lots of incompatible data, inconsistent work, and duplicated effort.

Given that, it's not surprising that this new (and, apparently, still unnamed) project is starting with an attempt to standardize the encoding of information about packages. This information comes at a number of levels:

  • The identification of the project as a whole, including metadata on the results of any analysis which has been done. Included here is a formal name for the package, its published location, the stated license (and any possible alternative licenses), how the package is used (is it a standalone program or a library?), the copyright holders and dates of copyright, etc.

  • Package-specific facts: the version that was analyzed, hashes for each of the included files, how the information about the package was generated, and so on. There will also be the equivalent of a "signed off by" tag whereby people doing analysis on a package would certify their results.

  • File-specific information for every file found in the package: its full path name, the type of the file, the license governing it, copyright information, and so on.

Once the process of standardizing the encoding of this information has been completed, the project can move on to the second phase, which is the creation of a common site to host information stored in that format. The idea here is to make it easy to look up and share information on specific packages, and to make any known problems publicly visible.

All of that, in turn, has a goal beyond the simple sharing of work: they would also like to improve the quality of the next generation of packages. By making public review of licensing information easier, it is hoped that problems will be found (and fixed) sooner. One gets the sense that companies like Freescale are getting tired of finding licensing issues in packages which are scheduled to ship in a few days. A related goal is to make package maintainers more aware of where their code is coming from. As licensing issues are found in a public review process, maintainers will, hopefully, begin to pay more attention and these issues will become less common.

The project is still in an early stage; there is a mailing list set up on the FOSSBazaar site, but not a whole lot else. The dreaded regular conference call will be established in the near future. The group hopes to create a proposed standard within the next few months; the Linux Foundation will be helping with legal review to ensure that all of the appropriate bases are covered. The current plan is to get the first version of the standard published in August, 2010.

During the question period, Andrew Bartlett expressed his dislike for the central database concept. Centrally-maintained information, he says, will soon go stale. It would be better to create a format for a license metadata file which could be maintained and shipped with the project itself; he said he would be glad to carry such information with the Samba distribution. That is an idea which will likely be carried back to the working group for consideration.

Licensing is an important component of the free software development process, and ensuring that our licenses are complied with is incumbent upon anybody engaged in software distribution. But all of the associated due diligence work really only has to be done once; like the development of the software itself, it can be managed in a community-oriented manner. The formalization and organization of the associated information is a logical first step toward bringing a community process to this important - if not necessarily fun - task.

Index entries for this article
Conferencelinux.conf.au/2010


The LWN site is currently under high scraper load, so comment display has been suppressed for anonymous users. If you are a human, you may read the comments by clicking the button below:

Note: you can avoid this step in the future by logging into your LWN account.


Copyright © 2010, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds