
Leading items

A binary analysis tool for GPL compliance investigations

By Jake Edge
May 5, 2010

There are thousands of embedded devices running Linux today, with more released hourly, it seems. Many of those are in full compliance with the licenses for the free software that they ship, but some, sadly, are not. In most cases, the non-compliance is probably due to ignorance, but sometimes arrogance or even malfeasance plays a role. A new Apache-licensed Binary Analysis Tool from Armijn Hemel and Shane Coughlan is meant to help developers and others interested in GPL compliance determine whether Linux or BusyBox is present in a particular device.

There are multiple levels to GPL compliance investigations. If the device is not shipped with source, nor an offer to provide it, one can assume that it should contain no GPL code. In that case, just detecting the presence of the Linux kernel or BusyBox is enough to identify a problem. For devices that do ship or offer source, there is another step: determining whether the source code and configuration that were provided correspond to the code on the device. That process was described by Hemel and Coughlan in a series of LWN articles (part 1, part 2, and part 3).

The first step is to extract any filesystems that exist in a firmware image, so that they can be investigated further. The Binary Analysis Tool provides the bruteforce.py script to detect various kinds of filesystems, including those that are compressed, and to extract them from the image. It then digs down inside the filesystem to find "interesting" files. Right now, the output is terse, but that is slated to change "in the near future", according to the README file.
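
The heart of that detection is scanning the image for well-known magic numbers. As a rough illustration of the idea — this is a hypothetical sketch, not code from bruteforce.py — a Python script could look for a handful of filesystem and compression signatures like so:

    # Hypothetical sketch of firmware signature scanning; the real
    # bruteforce.py handles many more formats and extracts what it finds.
    import sys

    SIGNATURES = {
        b"hsqs": "squashfs filesystem (little-endian)",
        b"sqsh": "squashfs filesystem (big-endian)",
        b"-rom1fs-": "romfs filesystem",
        b"UBI#": "UBI image",
        b"\x1f\x8b\x08": "gzip-compressed data",
    }

    def scan(path):
        """Print the offset and type of each known signature in the image."""
        with open(path, "rb") as image:
            data = image.read()
        for magic, description in SIGNATURES.items():
            offset = data.find(magic)
            while offset != -1:
                print("0x%08x  %s" % (offset, description))
                offset = data.find(magic, offset + 1)

    if __name__ == "__main__":
        scan(sys.argv[1])

A real scan also has to cope with false positives and then carve out and unpack each candidate region, which is presumably where much of the actual tool's complexity lies.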

Beyond that, there are scripts to look at BusyBox and kernel binaries to extract configuration information. Running:

    python busybox.py --binary=/path/to/busybox
on a BusyBox binary results in a list of configuration options that shows which of the applets were built into the binary:
    CONFIG_ADDGROUP=y
    CONFIG_ADDUSER=y
    CONFIG_ADJTIMEX=y
    ...
BusyBox configuration is important because it can be a clue as to whether or not the source corresponds to the binary. In fact, the tool provides an automated way to compare the configuration found in a binary with one that is included in the source: busybox-compare-configs.py.
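
Conceptually, the comparison boils down to parsing the CONFIG_ lines from both sides and reporting the differences. A minimal, hypothetical sketch of that idea — not the actual busybox-compare-configs.py, and with invented file names — might look like this:

    # Hypothetical sketch of a BusyBox configuration comparison; the file
    # names here are made up for illustration.

    def enabled_options(path):
        """Return the set of CONFIG_* options that are set to 'y'."""
        options = set()
        with open(path) as config:
            for line in config:
                line = line.strip()
                if line.startswith("CONFIG_") and line.endswith("=y"):
                    options.add(line[:-2])
        return options

    def compare_configs(from_binary_path, from_source_path):
        from_binary = enabled_options(from_binary_path)
        from_source = enabled_options(from_source_path)
        # Options enabled in the shipped binary but absent from the provided
        # source configuration suggest that the source does not match.
        for option in sorted(from_binary - from_source):
            print("enabled in binary only: %s" % option)
        for option in sorted(from_source - from_binary):
            print("enabled in source config only: %s" % option)

    compare_configs("config-from-binary", "config-from-source")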

The tool uses a database of sorts for BusyBox configurations going back to the 0.52 release. The busybox-version.py command can be used to manually determine the version of a binary, or the other tools will do so automatically—though it can be overridden on the command line. In addition, the busybox.py script can check for applets in a binary for which there is no configuration option in the official BusyBox sources, which would indicate that additional code (for which source must be released) has been added.
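
A hypothetical sketch of that last check — comparing applet names recovered from the binary against the applets known to the official sources for the detected version — can be as simple as a set difference (the list of official applets shown here is just a tiny sample):

    # Hypothetical sketch: flag applets that do not exist in the official
    # BusyBox sources. OFFICIAL_APPLETS is a tiny sample; the real tool
    # keeps per-version data going back to release 0.52.
    OFFICIAL_APPLETS = {"addgroup", "adduser", "adjtimex", "ash", "cat", "ls"}

    def unofficial_applets(names_from_binary):
        """Return applet names with no counterpart in the official sources."""
        return sorted(set(names_from_binary) - OFFICIAL_APPLETS)

    # A vendor-added applet stands out immediately:
    print(unofficial_applets(["adduser", "cat", "vendor_flashtool"]))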

There are also scripts to extract configuration and strings from a Linux kernel. extractkernelstrings.py is used on a provided kernel source tree and generates a database of strings that should be present in the kernel image. findkernelstrings.py then uses that database and the kernel image file to find matches, and, more importantly, things that do not match. Once again, this can lead to a determination that the source code and shipped binaries are either not the same, or not configured in the same way.
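
Reduced to its essence, the matching stage pulls printable strings out of the kernel image and checks them against the database built from the source tree; strings in the image that the provided source could never have produced are the interesting ones. A simplified, hypothetical sketch of that idea follows (the real scripts keep the source-derived strings in a Lucene-backed database, as described below, rather than a plain Python set):

    # Hypothetical sketch of kernel string matching; the real
    # findkernelstrings.py works against the database that
    # extractkernelstrings.py builds from the kernel source tree.
    import re
    import sys

    def printable_strings(path, min_length=8):
        """Extract printable ASCII strings from a binary, like strings(1)."""
        with open(path, "rb") as image:
            data = image.read()
        return set(re.findall(b"[\x20-\x7e]{%d,}" % min_length, data))

    def unexpected_strings(image_path, expected):
        """Return strings in the image that the source never produces."""
        return sorted(printable_strings(image_path) - expected)

    # 'expected' would come from the source-derived database; an empty set
    # here just keeps the sketch runnable.
    for s in unexpected_strings(sys.argv[1], expected=set()):
        print(s.decode("ascii", "replace"))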

Due to various reverse engineering laws worldwide, the Binary Analysis Tool does not do any kind of decompilation or disassembly of the code that it finds. It strictly looks at the symbol tables and strings stored in the binaries to do its work. For much the same reason, it does not try to "crack" any encryption or DRM that might be protecting the firmware image or its contents.

The tool is still a bit rough around the edges, but it does come with fairly extensive documentation, both as PDF Quick Start and User guides and as various documentation files in the source tree. It comes as a tarball or can be grabbed from an svn repository. The list of dependencies seems a bit large for a program of this type. For the kernel strings database, it includes the PyLucene Python library for accessing the Java-based Lucene text searching and indexing engine, which necessitates installing OpenJDK and Ant. More obvious dependencies are required as well, such as python-magic for identifying magic numbers, e2tools and squashfs tools for accessing filesystems, and various compression utilities.

The development of the Binary Analysis Tool was supported by the NLnet Foundation and the Linux Foundation, and it was created by Hemel as part of his work at Loohuis Consulting and by Coughlan at OpenDawn. It is still being actively developed with releases scheduled for May and July. Contributions of bug reports, development time, or money to continue development are welcome.

While the scripts will be useful as a starting point for those who are investigating GPL compliance, there is still quite a bit of work to be done. The tool provides a framework for looking at two of the most common GPL-licensed components appearing in embedded devices, but there are others. It's no coincidence that the tool focuses on BusyBox and the Linux kernel, which have been the most successful at enforcing license compliance in the last several years. As other projects are used more widely in embedded devices, there will be a need to expand the coverage of tools like this.

There are uses for the tool beyond those of developers trying to ensure that their code is used properly. Embedded device manufacturers will also find it useful. There have been numerous cases of OEMs getting code from their suppliers without the proper source files—or even any notice that it contains GPL code. Companies can also test their competitors' products for compliance to help level the playing field. Any tool that makes it easier to spot license compliance problems is a boon for developers, users, and device makers.

Comments (8 posted)

Koha community squares off against commercial fork

May 5, 2010

This article was contributed by Nathan Willis

Koha is the world's first open source system for managing libraries (the books and periodical variety, that is), and one of the most successful. In the ten years since its first release, Koha has expanded from serving as the integrated library system (ILS) at a single public library in New Zealand to more than 1000 academic, public, and private libraries across the globe. But the past twelve months have been divisive for the Koha community, due to a familiar source of argument in open source: tensions between community developers, end users, and for-profit businesses seeking to monetize the code base. As usual, copyrights and trademarks are the legal sticks, but the real issue is sharing code contributions.

Koha was originally written in 1999 by New Zealand's Katipo Communications, spearheaded by developer Chris Cormack. Katipo was contracted to build an ILS for the Horowhenua Library Trust (HLT) to replace its aging (and Y2K-bug-vulnerable) system, and to release the code under an open source license. The name Koha is a Māori word for a reciprocal gift-giving custom.

The first public release was made in 2000. Over the years, Koha usage grew, and several businesses popped up to provide support and customization services for Koha-using libraries; as with many infrastructure applications, the ongoing support of an ILS is the real expense. An ILS not only serves as an electronic "card catalog" system for library patrons, but handles acquisitions, circulation tracking, patron account management, checkout, search, and integration with other cataloging systems for inter-library loan. Libraries do not change ILS vendors quickly or lightly.

One of these support businesses was US-based LibLime, founded in 2005 by Koha developer Joshua Ferraro. In 2007, LibLime purchased Katipo Communications' assets in Koha, including its copyright on the Koha source code, and took over maintenance of the koha.org web site. For several years, life continued on as it had before; koha.org was the home of the project, and LibLime participated in Koha's ongoing development as did several other support-based businesses, many individuals, and many libraries.

The fork

The first signs of trouble began to appear in mid-2009, when LibLime announced that it would be providing its customers with a version of Koha built from a private Git repository, instead of the public source code maintained by the community as a whole. Many in the community regarded this as an announcement that LibLime was forking the project, a claim that Ferraro denied. The company cited several factors as its reasons for maintaining a separate code base, including the need to deliver on Koha contract work on its own deadlines, lack of quality control in community code contributions, and customer data it could not make public.

Ferraro stated that LibLime would publish its enhancements to Koha, that it was "100% committed to the open-source movement", and that its integration with the main code repository would be "seamless." However, no such publication took place; as of today, the most recent source code for LibLime's products that is available on the web site is from June of 2009, and the LibLime source code repository remains inaccessible to the public.

LibLime's enhanced version of Koha is named LibLime Enterprise Koha (LLEK), runs on Amazon's EC2 cloud platform, and sports a list of features not present in the 3.0.2 "community" release. Meanwhile, the community has continued to develop Koha, making point releases to the 3.0.x branch, and is readying a major update in version 3.2.

Enough people in the Koha community were concerned about the project's future and about practical matters like the web site and Git repository that they decided to migrate to a new domain, koha-community.org, to be managed by a committee and legally held by Koha's original sponsors, HLT. Those migrating included Cormack, many other core developers, and several of the other Koha support vendors.

2010 started off with a ray of hope for commercial and community reconciliation, when Progressive Technology Federal Systems, Inc. (PTFS), another Koha support vendor, announced in January that it was acquiring LibLime. PTFS was a relatively recent convert to the Koha community; it started out as a proprietary-only ILS vendor catering to government and military institutions. But it selected Koha as its open source product of choice in 2008, in part for its ability to integrate with PTFS's profitable digital content management products. PTFS engineers had been active on the mailing list and IRC channel, and had submitted patches back to the community, so the community was optimistic that PTFS would continue to participate and that the LLEK fork would be merged back into the main branch.

In April, PTFS asked the community — developers, documentation and translation teams, release managers — to return to the koha.org domain, and set up a new repository with the intent of merging the code. As community members explained in the thread, they did not like those terms and instead asked PTFS to either turn the koha.org domain over to the community or to bring its code and participants to the koha-community.org site.

Unfortunately, what could have been a simple disagreement over hosting and domain name relevance deteriorated further. PTFS asked HLT's Koha committee for a conference call under a non-disclosure agreement, but the committee asked for a public email or IRC discussion instead. PTFS then responded with a press release (copied to the Koha mailing list) publicly criticizing the committee, calling it "new to business matters," "one-sided," and "inaccurate," and touting its own version of Koha as superior. Judging by the responses on the list, that action served only to further alienate the already-suspicious Koha community at large.

Code, Trademarks, Copyrights, and Names

Koha is far from the first project to go through such a divisive conflict. In fact, forks of free software projects are not wrong in and of themselves, and can lead to improvements in the code. What caused the major split between the Koha community and LibLime was the company's decision to keep its fork private and not give back. It promised to do so, but instead withdrew from the Koha community altogether.

Naturally there is no way to prevent individuals or companies from acting with hostility, but the Koha project was vulnerable to LibLime's behavior on a couple of fronts. First, as the community came to recognize, LibLime controlled the ostensibly community-run koha.org site — prompting the re-launch of the project's content in a new location.

What is more troubling is that, based on its actions, LibLime evidently believed that its acquisition of Katipo Communications' Koha assets, including that company's copyrights, gave it the right to create a closed-source fork of Koha. But whether or not Katipo's copyrights covered the whole of Koha by 2009, when LibLime forked the project, is questionable. Cormack and other developers point to the Git repository's commit statistics, which show the percentage of commits made by individual authors. How to interpret those statistics is an open question, but there was no copyright assignment required to participate in Koha development. In the absence of such an agreement, Koha contributors retain the copyrights on their work; as a result, taking the code proprietary is not an easy option for anybody.

It is still unclear whether or not LibLime provided the full source code for its LLEK product to its paying customers, as is required by the upstream Koha project's GPLv2+ license. Koha is written mostly in Perl, which is presumably distributed in source form, but the GPL source requirement does include all of the source necessary to build the software, including supporting libraries and compilation scripts — a requirement that might extend to the libraries needed to support LLEK's EC2 environment.

Muddying the waters still further is the issue of who can legally call their code "Koha" at all. LibLime filed for a registered US trademark on the name in October 2008; it was granted in May of 2009. European support vendor BibLibre filed for an EU trademark on "Koha" in December of 2008; it is still undergoing review. Finally, LibLime filed for the Koha trademark in New Zealand itself in February of 2010; it too is still undergoing review. Yet "Koha" has been used as the name of the open source project itself, not a vendor package or support product, since 2000.

The Software Freedom Law Center's Karen Sandler said that such trademark-based disputes are common, enough so that SFLC has published a primer on the subject for projects. Without commenting on the specifics of the Koha situation, she noted that although registration constitutes "legal presumption of ownership," if another party can prove it was using the mark first, it retains the right to use the mark. In addition, she added,

Others can use a mark in a manner that does not imply an official relationship or sponsorship so long as there's no likelihood of confusion on the part of consumers. Factually referring to unmodified software by a particular name, for example, is likely to be considered clearly within permitted usage. This kind of use is called nominative use.

The community's unstructured approach to the project in past years does not excuse PTFS's very public missteps, however. The company may indeed have meant to put the community back together into a functioning whole when it initiated talks about the web site, but it clearly underestimated the ire that LibLime had earned through its actions over the previous year, and the derisive press release would be considered a mistake under any circumstances. If there was any hope of drawing the larger Koha community back to koha.org, it probably died when that message went out.

Cormack observed on his blog that any vendor has the right to try to turn its Koha offering into a superior product for customers in order to increase sales — the harm was inflicted because of the way LibLime chose to carry out that business decision. Whether you agree with that or not, however, it seems that the project would have been better equipped to cope with LibLime's withdrawal from the community had the domain name, trademarks, and perhaps even copyrights been held by a trusted entity such as HLT. Taking those legal steps is something few projects seem to consider when things are running smoothly. They are no doubt time-consuming and tedious, perhaps even expensive. But so is trying to do them in a hurry, ten years after the project launched, with hostile players going after your name.

[ Thanks to Lars Wirzenius for pointing us toward this topic. ]

Comments (16 posted)

A conference on software patents and free software

By Jonathan Corbet
April 30, 2010
On April 29, the University of Colorado held a conference on patents and free software. Your editor, having spent the morning getting some significant dental work done, figured that an afternoon devoted to software patents would appropriately continue the day in the same theme - only without the anesthetic. The following is not a comprehensive report of the event; instead, it focuses on a few of the more interesting moments.

Pamela Samuelson is a professor of law at the University of California at Berkeley; she also serves on the boards of organizations like the Electronic Frontier Foundation, the Electronic Privacy Information Center, and Public Knowledge. At the conference, she presented some results of her research into the idea of software patents as an incentive for innovation. A survey was done back in 2008, with 15,000 questionnaires sent out to a large number of firms; 1,333 of them - representing over 700 companies - came back. The numbers that came out were interesting, if arguably unsurprising.

According to this survey, 65% of software companies have no interest in software patents; they do not see patents as an important part of doing business. That compares with 82% of non-software companies which said they were working toward the acquisition of patents. It is worth noting that companies with venture capital backing had a higher level of interest in software patents than those without.

When companies do go for software patents, their motivations tend to be to enhance their reputation and make it easier to secure investments. Preventing litigation was also cited as a reason. But, when it comes to the question of what makes a software business successful, patents were at the very bottom of the list. Being first to market was the most important success factor. In summary: software patents are a weak incentive - at best - toward innovation.

So, do software patents matter for new companies? Lawyer Jason Haislmaier said that they can be important, especially with venture-backed companies, because they are relatively attractive to investors. Venture capitalist Jason Mendelson disagreed, though, saying that he didn't care about patents in the companies that he evaluates. In fact, if a company is focused on getting patents, he sees it as a reason not to invest: the company should be putting resources into its products instead.

Stormy Peters, director of the GNOME Foundation, noted that community developers tend to be strongly anti-patent; a company with a patent-heavy focus may find it hard to work with the community or hire developers. Stormy also worries that the current trend toward cloud computing may make the issue of open source software moot. The convenience of free web services has, she says, distracted the community from the issue of freedom. There needs to be a means by which truly free and open services can be defined.

Patent litigation was the subject of a different panel. Lucky Vidmar started with the observation that patent suits against open source software still tend to be rare, and that suits against individual developers are not really happening. In general, he says, the lawsuits which have come about have little to do with open source; they are just more in a long series of software patent suits. But suits against open-source companies do tend to get a lot of negative attention, something which potential plaintiffs may well keep in mind.

Julie DeCecco, a litigator for Oracle (by way of Sun), noted that patent litigation is very expensive. That alone makes it unlikely that open source projects will be sued; the exposure to legal action is proportional to the amount of money being made. "Follow the money," she says, and you'll see where the lawsuits are happening. Attorney David St. John-Larkin suggested that open source might be more vulnerable to these suits due to the public nature of its development.

Jason Schultz and Jennifer Urban are both from the Samuelson Law, Technology and Public Policy Clinic at Berkeley; Schultz previously did a stint at the EFF. They presented a concept they have been working on as a way of mitigating the software patent threat: the Defensive Patent License, or DPL. This work is in an early stage, and the DPL text is not yet available, but it should be forthcoming in the near future.

The core idea behind the DPL is that software patents can serve in a useful, defensive role. They can be used to negotiate cross-licensing agreements, and they can be used for countersuits if need be. But defensive patents are not as heavily used as they could be, especially in the open source area. There are a couple of possible reasons for this: defensive patents require a concentration of resources that doesn't always exist in our community, and there tends to be a certain amount of distrust toward the acquisition of patents for defensive purposes.

The DPL would promote the defensive use of software patents in a way which reinforces the free software community's norms; it is meant to be similar in spirit to the GPL. A company which buys into the DPL will put all of its patents under that license. Any other DPL licensee could then automatically obtain a royalty-free license for any of those patents. The license is irrevocable - unless the licensee sues another DPL licensee or withdraws from the pact. Withdrawal is possible with advance notice (six months was suggested), but any licenses granted to others would remain valid.

If this idea takes off, it will encourage the creation of a growing network of cross-licensed patents; eventually, the value of joining the pool will be far higher than remaining outside of it. Since patents in this scheme cannot be used to attack other participants, they will be limited to defensive uses only. Among other things, that should keep DPL-covered patents out of the hands of patent trolls.

There are a lot of details to be worked out yet, and it is far from clear that the idea will really take off. It is hard to imagine that large companies with extensive patent portfolios would be willing to commit the entire set to the DPL. The concept is interesting, though; we will see where it goes.

The discussion danced around a number of issues, including patent shakedowns that are settled without the filing of lawsuits, current litigation, and the general problem of low-quality patents. With regard to the last two, your editor asked about Apple's attack against HTC, which is using some highly dubious patents as a weapon against Linux. Nobody wanted to talk about the Apple case, but Julie DeCecco said that the best weapon against low-quality patents is reexamination actions in the patent office. They are relatively cheap (at a mere $20K or so) and are often at least partially successful.

Jason Schultz said that he participated in a number of these actions while at the EFF. They can be effective, but there are a lot of bad patents out there, and there's no way to challenge them all.

Your editor would note that, when talking with people more directly involved in the defense of free software, he has found the reexamination option to be held in relatively low repute. The actions are risky and might serve to make the patent stronger; this has happened with the VFAT patent. And, in the best of scenarios, it is still not possible to truly kill a patent this way; they can always come back after further rewriting by the patent holder.

There was a panel on the intersection of open source, patents, and standards; much of it was about as exciting as sitting on one of the standards committees themselves. The audience did hear an interesting presentation from Steve Mutkoski of Microsoft, who asserted that patent-encumbered standards are entirely compatible with most open source licenses. In fact, "only the GPL family of licenses" is truly problematic in this regard. It is, he suggested, more of a problem with the GPL than with patents.

Also, Steve made the claim that a lot of people who complain about patent-encumbered standards really just don't want to pay royalties. That may well be true, but it's not relevant to the larger discussion. Unfortunately, there did not seem to be anybody on the panel who understood free software well enough to try to correct that point of view.

There was an interesting suggestion that, perhaps, we need some concept of "fair use for patents." That is especially true in situations where the government has mandated the use of a patent-encumbered standard in some situation. Nobody tried to fill in the idea of how fair use might work in this setting, though.

In summary, your editor found the event to be somewhat frustrating. It was dominated by lawyers of the academic variety with a small venture capital presence; Stormy Peters was the only community representative on the panels. Even so, it is interesting to see how the problem is viewed by people who are a few steps removed from it.

Comments (16 posted)

Results from the LWN reader survey

By Jake Edge
May 5, 2010

As part of our "media kit" project, we put together a reader survey that ran for the last two weeks of April. Over 1800 readers filled out the survey—our thanks to all of them—and, as promised, here is a summary of the responses.

The vast majority (90%) of respondents were subscribers, and almost all of those folks intend to continue. Fewer than 5% of respondents either never planned to subscribe or may not resubscribe. Three-quarters of subscribers would be likely to continue at their current level if there were a subscription price increase, with 8% overall likely to drop to a lower subscription level and 16% less likely to subscribe or renew.

As for LWN content, the weekly edition front and kernel pages are by far the most popular, with 90% reading them frequently. The daily news page (71%), weekly development (70%), security (61%), and distributions (52%) pages were all fairly popular as well. Less so were the yearly timeline (33%), weekly announcements page (27%), and the events calendar (10%).

Pages and features that readers could live without had responses that, unsurprisingly, mirrored those above. No more than 25% of readers could live without any of the daily or weekly pages, with the exception of 45% who would be fine without the announcements page. The events calendar (57%) and timeline (34%) didn't fare as well.

The clear winner among areas where readers would like to see more coverage is "Languages and development tools" at 57%. Roughly 40% would like to see more system administration and desktop Linux coverage, while approximately one-third saw embedded systems and virtualization as areas for expanded coverage. "The business of Linux and free software" was chosen by only 25% of respondents, and it would seem that we, perhaps, have the right amount of coverage of legal issues and conferences, as only 20% thought those should increase.

Formatting LWN for mobile device display was the most popular choice among potential new formats, with 30% saying that they would personally use it. A PDF version of the weekly edition was next at 17%, but EPUB (7%) and Kindle (2%) formats were not particularly interesting to respondents.

The question about regularly used distributions led to some interesting results, with Ubuntu (54%) and Debian (44%) far ahead of any of the rest. The next tier was led by Fedora (24%), followed by Red Hat Enterprise Linux (21%), other OS (20%), CentOS (19%), and other Linux (15%). All of the rest came in at less than 10%: Gentoo, openSUSE, SUSE Linux Enterprise Server, Mandriva, and Oracle Unbreakable Linux (with 13 respondents) in that order.

In the single-choice "primary desktop" question, GNOME came out way ahead with 50%. KDE had a 23% share and the numbers drop off quickly from there. 8% use some Linux desktop environment that we didn't list and 7% use another OS entirely for their primary desktop. No desktop environment (5%) was just ahead of Xfce (4%), while LXDE is used by only ten of the readers who responded.

As we move forward, and look at changes we might make—for content, features, and coverage—we will definitely keep these answers in mind. There are some things, like the events calendar, that we do as a service to the community and are likely to stay, even if they are somewhat sparsely used. But when thinking about article assignments and where to focus our efforts, these answers will come in very handy. Thanks again to all who responded.

Comments (39 posted)

Page editor: Jonathan Corbet


Copyright © 2010, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds