FOSSology gains SPDX support

By Nathan Willis
July 3, 2013

A new release of the FOSSology source-code analysis tool is out. Although there have been minor updates, this is the first update in 2013 to bring additional functionality. The 2.0 release in 2012 marked a major shift for the project, debuting a new, more modular design and paving the way for faster releases. The newest update, version 2.2.0, includes a new permissions scheme and some usability improvements, but in the long run, the most notable feature in this release may be the improved compatibility with the Software Package Data Exchange (SPDX) standard for tracking software components, licenses, and copyrights.

FOSSology is designed to be a flexible platform for analyzing source code, but it is best known for its ability to scan large collections of files and pick out licenses and copyright statements. The resulting license and copyright information is then used to help an organization stay in compliance with the licensing requirements it inherits from upstream open source projects. There are other use cases, however; for instance, at LinuxCon Japan, Armijn Hemel mentioned using FOSSology to help automate the process of finding license violations in the source code of software shipped in embedded Linux devices. It is not hard to imagine the tool being adapted for other source-scanning tasks, such as assembling a list of the contributors who need to sign off on a license change.

Users can upload source packages to FOSSology, then queue scanning jobs that analyze the packages for various types of information handled by scanning "agents." As new code is added, components are updated, and trees are rearranged, these scans can be run periodically, to help check for problematic license combinations or missing information. The basic agents available include a license recognizer, a copyright recognizer, a MIME-type analyzer, and a package header parser (which looks for the packaging information defined for RPM or .deb files). However, users can write their own agents to scan for arbitrary information.

All of the agents work by matching text patterns, which is a tricky business, considering all of the ways a licensing statement could be phrased and the wide assortment of licenses that may be encountered; FOSSology defines 600 or so licenses at the moment. Recognizing copyright statements, though sometimes less critical from a legal-compliance standpoint, is also a pattern-matching game; FOSSology looks for text blocks that resemble copyright statements, as well as for email addresses and URLs.
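To give a feel for what this kind of pattern matching involves, here is a minimal sketch in Python. The regular expressions, the scan_text() function, and the sample text are all illustrative assumptions; they are not FOSSology's actual rules, which have to cope with far more variation than this.

```python
import re

# Illustrative patterns only; these are NOT FOSSology's actual rules.
# A "copyright-like" line: a copyright marker followed, on the same line,
# by something that looks like a four-digit year.
COPYRIGHT_RE = re.compile(
    r"(?:copyright|\(c\)|©)[^\n]*\b(?:19|20)\d{2}\b[^\n]*", re.IGNORECASE
)
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"https?://[^\s<>)]+")


def scan_text(text):
    """Return the copyright-like lines, email addresses, and URLs found in text."""
    return {
        "copyrights": [m.group(0).strip() for m in COPYRIGHT_RE.finditer(text)],
        "emails": EMAIL_RE.findall(text),
        "urls": URL_RE.findall(text),
    }


# Hypothetical file header used as input.
SAMPLE = """\
/*
 * Copyright (C) 2013 Example Author <author@example.org>
 * See http://www.example.org/project for details.
 */
"""

result = scan_text(SAMPLE)
for kind, hits in result.items():
    print(kind, hits)
```

Even this toy version shows where the difficulty lies: a statement that omits the year, wraps across lines, or spells out "copyrighted by" in running prose would slip past it or produce a false positive.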

Historically, FOSSology has been deployed on a web server backed by a PostgreSQL database, with multiple users uploading source code bundles and performing scans through the web UI. In October 2012, version 2.1.0 added a pair of command-line utilities, fo_nomos_license_list and fo_copyright_list, with which users could query the FOSSology database for license or copyright information. The command-line utilities free users from the web UI, make the FOSSology repository more accessible to scripting, and are reported to run faster. Execution speed can be a major issue with large repositories, where a scan run in the web UI could time out if it took too long. In the 2.1.0 release, though, the tools were limited in scope, since both required scanning an entire upload (that is, one package or source archive). The 2.2.0 release updates the utilities to accept a sub-tree as the starting node from which to perform a scan.

Version 2.2.0 also introduces a new permissions scheme that allows administrators to limit access to specific files on a per-file and per-user basis. The system implements its own set of internal user groups (i.e., separate from the Unix groups that may be associated with accounts); each user in a group can be granted read permission, write permission, and user/group-administration permission. The ability to upload source packages to the application is governed by a separate permission table, perm_upload, which grants upload permission for each folder to specific groups; each user gets his or her own group by default, which enables per-user upload restrictions. It is a fairly straightforward system, but it replaces the permission system used in previous releases (which bound permissions to each individual application plugin), so administrators may have to do some work migrating existing installations.
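The model described above can be sketched in a few lines of Python. This is an in-memory illustration under stated assumptions, not FOSSology's schema or code: only the perm_upload name comes from the release, while the Perm flags, the PermissionStore class, its method names, and the rule that any group membership suffices for upload are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Flag, auto


class Perm(Flag):
    """The three permissions named above, modeled as independent flags (an assumption)."""
    READ = auto()
    WRITE = auto()
    ADMIN = auto()


@dataclass
class PermissionStore:
    # group name -> {user name -> permissions held within that group}
    groups: dict = field(default_factory=dict)
    # Stand-in for the perm_upload table: folder name -> set of group names
    perm_upload: dict = field(default_factory=dict)

    def add_user(self, user):
        """Create the per-user default group, named after the user."""
        self.groups.setdefault(user, {})[user] = Perm.READ | Perm.WRITE | Perm.ADMIN

    def add_to_group(self, group, user, perm):
        self.groups.setdefault(group, {})[user] = perm

    def allow_upload(self, folder, group):
        self.perm_upload.setdefault(folder, set()).add(group)

    def can_upload(self, user, folder):
        """Assumption: membership in any group allowed on the folder is enough."""
        allowed = self.perm_upload.get(folder, set())
        return any(user in self.groups.get(group, {}) for group in allowed)


store = PermissionStore()
store.add_user("alice")
store.add_user("bob")

# Granting upload rights to alice's personal group restricts uploads to her alone.
store.allow_upload("Software Repository", "alice")
print(store.can_upload("alice", "Software Repository"))
print(store.can_upload("bob", "Software Repository"))
```

Because every user has a personal group, a per-user restriction is simply a folder grant to a group with one member; no separate per-user table is needed.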

Licenses galore

There is, naturally, the usual collection of bugfixes and stability improvements in this release, plus the noteworthy addition of the ability to pull up the full text of a software license from within FOSSology itself (useful for those rare users who do not have the differences between GFDL v1.1 and GFDL v1.2 memorized, no doubt).

But the bigger news item on the license-presentation front is the fact that FOSSology has migrated its list of license names to be compatible with the canonical list supported by SPDX. The SPDX project is a relatively new effort (dating back to 2010); it defines a metadata format for describing the "bill of materials" of a software package, including everything from its creator and definitive name to its URL of origin and file checksums. In the list of mandatory items, as one might guess, is the "concluded license" that governs the package as a whole. SPDX is meant to be both human-readable and machine-parsable (RDF is the preferred file format), so the specification includes a list of open source licenses.
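As a rough illustration of the kind of record involved, the Python sketch below writes a few package-level fields in SPDX's plain-text tag/value style rather than the RDF form. The field names are given as they appear in the SPDX 1.x tag/value format to the best of our knowledge and should be checked against the specification; the spdx_package_stanza() function, the package name, the URL, and the archive contents are made up for the example, and the output is a fragment, not a conformant SPDX document.

```python
import hashlib


def spdx_package_stanza(name, download_url, archive_bytes, concluded_license):
    """Build a simplified tag/value package stanza (not a complete SPDX document)."""
    sha1 = hashlib.sha1(archive_bytes).hexdigest()
    lines = [
        f"PackageName: {name}",
        f"PackageDownloadLocation: {download_url}",
        f"PackageChecksum: SHA1: {sha1}",
        f"PackageLicenseConcluded: {concluded_license}",
    ]
    return "\n".join(lines)


# All values below are hypothetical placeholders.
stanza = spdx_package_stanza(
    name="example-pkg",
    download_url="http://www.example.org/example-pkg-1.0.tar.gz",
    archive_bytes=b"placeholder archive contents",
    concluded_license="GPL-2.0",
)
print(stanza)
```

The license value is where the shared list matters: two tools can only compare their conclusions mechanically if both write the same identifier for the same license.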

SPDX is also in use by a few other source analysis tools, such as the Ninka scanner and the commercial tools used by Black Duck Software. The specification is written by a Linux Foundation workgroup, which is currently drafting a new revision.

What SPDX support brings is the ability to use FOSSology data in conjunction with other tools that share a common file format. The license-compliance problem is no longer one that organizations can ignore. Last week, Harald Welte won a GPL infringement case in Germany in which the court held that the violator had to ascertain on its own that it was in compliance with the licensing requirements it inherited from upstream suppliers. In other words, even if a device maker contracts out the software to a third party, it is still required to verify that the source code it offers in compliance with the GPL actually corresponds to the software on the device. For a device maker that does not do development itself, that could be a tricky undertaking. But with independent tools able to report licensing information in a compatible format, the problem becomes easier (although still not trivial) to solve.

For its part, FOSSology has adopted SPDX's names for the licenses already on its list of recognized licenses, and the 2.2.0 release notes comment that the application also added support for a few SPDX licenses not previously recognized by its license agent. FOSSology is most certainly a specialist's tool at this stage, but the refactoring that went into the 2.x series may make it useful for a wider variety of applications, if developers write scanning modules of their own to look for interesting nuggets buried in the source code. There was a one-year wait between version 1.4 and 2.0, but in the year since, the project has picked up the pace and delivered two stable releases with functional additions. Hopefully, that signals a platform that more developers will wish to contribute to. After all, the free software community is (justifiably) nitpicky where licenses and copyrights are concerned, but there are far more potentially useful bits of information to glean from a corpus of source code, given the proper tool to find them.


FOSSology gains SPDX support

Posted Jul 5, 2013 20:27 UTC (Fri) by DavidS (guest, #84675) [Link]

I'm wondering what it would take to run this on something like the new http://code.debian.net ...


Copyright © 2013, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds