By Jonathan Corbet
January 20, 2010
Kate Stewart is the manager of the PowerPC team at Freescale. As such, she
has a basic customer service problem to solve: people who buy a board from
Freescale would like to have some sort of operating system to run on it.
That system, of course, will be Linux; satisfying this requirement means
that Freescale must operate as a sort of Linux distributor. In her
linux.conf.au talk, Kate described a new initiative aimed at helping
distributors to ensure that they are compliant with the licenses of the
software they are shipping.
Early GPL enforcement actions against companies like Cisco were, arguably,
misplaced: Cisco was just gluing its nameplate onto hardware (and
software) supplied to it by far-eastern manufacturing operations. The
GPL violation itself was committed by the
original manufacturers, who incorporated GPL-licensed software and failed to
live up to the source distribution requirements. There
was a clear purpose behind targeting companies like Cisco, though: the
unpleasantness of dealing with GPL compliance problems was meant to get
them to require compliance from their suppliers, which were otherwise
harder to reach. Companies seem to have gotten the message; Kate noted
that the supply chain is now routinely requiring certification of license
compliance from suppliers. So Freescale needs to stay on top of license
compliance in order to be able to sell its products; your editor suspects
this may be a more powerful motivation than the mere need to avoid
copyright infringement.
One common worry related to license compliance, of course, is that somebody
might have
somehow incorporated proprietary code into a freely-licensed package. More
common, though, are simple license compatibility issues, such as the
inclusion of a GPL-licensed file in an ostensibly BSD-licensed package.
Finding this kind of problem requires the examination of every file
distributed with a package - and there are a lot of packages with a great
many files out there. It's a lot of work.
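To give a sense of what the most basic form of that examination looks like, here is a minimal sketch that walks a package tree and flags files whose license header appears to disagree with the package's stated license. It is not any tool mentioned in the talk; the marker phrases, the `guess_license()` and `find_mismatches()` names, and the short license labels are all assumptions made for illustration, and real scanners do far more careful matching.

```python
from pathlib import Path

# Hypothetical marker phrases, chosen for illustration only; real license
# scanners use much more robust matching than a substring test.
LICENSE_MARKERS = {
    "GPL": "GNU General Public License",
    "BSD": "Redistribution and use in source and binary forms",
}


def guess_license(text):
    """Return the first license whose marker phrase appears in text, else None."""
    for name, marker in LICENSE_MARKERS.items():
        if marker in text:
            return name
    return None


def find_mismatches(package_dir, stated_license):
    """Return (relative path, detected license) pairs that disagree with stated_license."""
    root = Path(package_dir)
    mismatches = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Only the start of each file is inspected, since that is where
        # license headers are usually found.
        head = path.read_text(errors="replace")[:4096]
        detected = guess_license(head)
        if detected is not None and detected != stated_license:
            mismatches.append((path.relative_to(root).as_posix(), detected))
    return mismatches
```

Calling `find_mismatches("some-package", "BSD")` on an unpacked source tree would list the files carrying a GPL header. Files with no recognizable header are silently skipped here, which is exactly the sort of gap that makes the real job laborious.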
Freescale is certainly not the only Linux distributor, and it is not the
only one facing this problem; anybody
who is distributing software (free or otherwise) is (or at least should be)
going through a
similar process. That leads to a lot of duplicated work which really could
be shared. At the first LinuxCon event in September 2009, a number of
interested parties got together to try to figure out if there was a way
that the license validation and compliance work could be carried out in a
more community-oriented manner.
The problem may seem simple, but there are a lot of details to deal with,
starting with the large number of ways of analyzing projects. At one end,
commercial
tools provided by companies like Black Duck and Palamida can automate the
task of finding a number of common licensing problems. At the other end are the
many homegrown tools and spreadsheets in use throughout the industry. The
end result is predictable: lots of incompatible data, inconsistent work,
and duplicated effort.
Given that, it's not surprising that this new (and, apparently, still
unnamed) project is starting with an attempt to standardize the encoding of
information about packages. This information comes at a number of levels:
- The identification of the project as a whole, including metadata on
the results of any analysis which has been done. Included here is a
formal name for the package, its published location, the stated
license (and any possible alternative licenses), how the package is
used (is it a standalone program or a library?), the copyright holders
and dates of copyright, etc.
- Package-specific facts: the version that was analyzed, hashes for each
of the included files, how the information about the package was
generated, and so on. There will also be the equivalent of a "signed
off by" tag whereby people doing analysis on a package would certify
their results.
- File-specific information for every file found in the package: its
full path name, the type of the file, the license governing it,
copyright information, and so on.
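No format had been settled on at the time of the talk, so the following is only a sketch of how the three levels described above might be modeled. Every class name, field name, and the choice of SHA-256 as the hash are assumptions made for illustration, not part of any proposed standard.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    """File-specific information (third level)."""
    path: str        # full path name within the package
    file_type: str   # e.g. "source" or "documentation" (hypothetical values)
    license: str     # license governing this file
    copyright: str   # copyright notice found in the file
    sha256: str      # hash of the contents; the algorithm is an assumption


@dataclass
class PackageAnalysis:
    """Facts about one analyzed version of a package (second level)."""
    version: str
    generated_by: str                                # how the data was produced
    signed_off_by: list = field(default_factory=list)
    files: list = field(default_factory=list)        # FileRecord entries


@dataclass
class ProjectRecord:
    """Identification of the project as a whole (first level)."""
    name: str
    location: str
    stated_license: str
    usage: str                                       # "program" or "library"
    copyright_holders: list = field(default_factory=list)
    alternative_licenses: list = field(default_factory=list)
    analyses: list = field(default_factory=list)     # PackageAnalysis entries


def describe_file(path, data, file_type, license, copyright):
    """Build a FileRecord, hashing the file contents given as bytes."""
    digest = hashlib.sha256(data).hexdigest()
    return FileRecord(path, file_type, license, copyright, digest)


def licenses_in(analysis):
    """Return the sorted set of licenses seen across an analysis's files."""
    return sorted({f.license for f in analysis.files})
```

With records like these, a reviewer could compare `licenses_in(analysis)` against the project's `stated_license` and append a name to `signed_off_by` once the results have been checked.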
Once the process of standardizing the encoding of this information has been
completed, the project can move on to the second phase, which is the
creation of a common site to host information stored in that format. The
idea here is to make it easy to look up and share information on specific
packages, and to make any known problems publicly visible.
All of that, in turn, has a goal beyond the simple sharing of work: the
participants would also like to improve the quality of the next generation of packages.
By making public review of licensing information easier, it is hoped that
problems will be found (and fixed) sooner. One gets the sense that
companies like Freescale are getting tired of finding licensing issues in
packages which are scheduled to ship in a few days. A related goal is to
make package maintainers more aware of where their code is coming from. As
licensing issues are found in a public review process, maintainers will,
hopefully, begin to pay more attention and these issues will become less
common.
The project is still in an early stage; there is a mailing list set up on
the FOSSBazaar site, but not a whole
lot else. The dreaded regular
conference call will be established in the near future. The group hopes to
create a proposed standard within the next few months; the Linux Foundation will be
helping with legal review to ensure that all of the appropriate bases are
covered. The current plan is to get the first version of the standard
published in August, 2010.
During the question period, Andrew Bartlett expressed his dislike for the
central database concept. Centrally-maintained information, he said, will
soon go stale. It would be better to create a format for a license
metadata file which could be maintained and shipped with the project
itself; he said he would be glad to carry such information with the Samba
distribution. That is an idea which will likely be carried back to the
working group for consideration.
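As a rough illustration of why an in-tree file could help with staleness, here is a sketch that compares file hashes recorded in a metadata file against the files actually present. The file name `LICENSE-METADATA.json`, its JSON layout, and the use of SHA-256 are all hypothetical; nothing of the sort was specified in the talk.

```python
import hashlib
import json
from pathlib import Path

METADATA_NAME = "LICENSE-METADATA.json"  # hypothetical file name


def current_hashes(root):
    """Map each file's relative path to the SHA-256 of its current contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file() and p.name != METADATA_NAME
    }


def stale_entries(root):
    """Report where the recorded metadata no longer matches the tree."""
    # Assumed layout: {"files": {"<path>": {"sha256": "...", "license": "..."}}}
    recorded = json.loads((Path(root) / METADATA_NAME).read_text())["files"]
    actual = current_hashes(root)
    return {
        # Files reviewed earlier whose contents have since changed.
        "changed": sorted(p for p in recorded
                          if p in actual and recorded[p]["sha256"] != actual[p]),
        # Files listed in the metadata that no longer exist.
        "missing": sorted(p for p in recorded if p not in actual),
        # Files added since the last review.
        "unreviewed": sorted(p for p in actual if p not in recorded),
    }
```

A project could run a check like this before each release, so that any file changed or added since the last license review is noticed by the maintainers rather than by a downstream distributor.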
Licensing is an important component of the free software development
process, and ensuring that our licenses are complied with is incumbent upon
anybody engaged in software distribution. But all of the associated due
diligence work really only has to be done once; like the development of the
software itself, it can be managed in a community-oriented manner. The
formalization and organization of the associated information is a logical
first step toward bringing a community process to this important - if not
necessarily fun - task.