When we last looked in on GCC, the 4.4 release branch was held up by licensing issues, specifically the exemption that would allow GCC plugins. Since that time, things have progressed (there is now a 4.4 release branch), but the license issue has not yet been completely resolved, though a new revision is available. But, since the license change is not needed for the 4.4 release, some GCC hackers are unhappy about the manner in which the release was held up. It has led to questions about the roles that the Free Software Foundation (FSF) and the GCC steering committee (SC) have in controlling GCC development.
Holding up a release for licensing concerns is seen by many as a reasonable request that the FSF might make. That organization is very concerned about licenses, in general, and they certainly think it is important to get the license right before releasing any code under the license. It is the concerns expressed by GCC developers about the wording of the runtime library exception that led to the license review. But Richard Stallman, acting for the FSF, did not ask that the release be delayed—at least directly—he asked that no release branch be created. The net effect on the release process is the same (i.e. no release), but the impact on development of new features destined for GCC 4.5 is a large part of what irked the GCC developers.
The other piece of the puzzle is that the issue is essentially moot: there is no plugin API in GCC 4.4 that would require the runtime library exemption changes. Stallman is adamant that proprietary GCC plugins be outlawed before such an API gets added. But, since the API isn't present (yet), the license changes aren't needed yet either. So, while it makes sense to take whatever time is required to get the license right, many seem very puzzled as to why that would need to hold up the release and new development.
There has never been a clear explanation of why the release branch needed to be delayed, at least publicly. Either Stallman relented, perhaps due to completing the license rewording, or the SC eventually decided to override his request, because the release branch was created on March 27. But there was clear discontent in the ranks that the SC could, evidently, be pushed around so easily by the FSF. This led to questions about the role of the SC, how much control over "technical" decisions the FSF has, and, in general, how the project is governed. As Daniel Berlin put it: "Where is the pushback by the SC onto the FSF?"
Berlin's complaint was that it has taken the FSF too long to resolve the issue, to the point where it is now (and has been for a bit) seriously impacting GCC development. Because the license change is not required for this release, there is no good reason to delay it. Ian Taylor was fairly blunt:
In response to Berlin's criticism of their response, SC member David Edelsohn noted that there were things going on behind the scenes:
But a lack of communication from the SC to the greater GCC community is part of the problem, according to Berlin:
The discussion then turned to the role that the FSF plays in GCC development. SC member Joe Buck points out that the FSF holds the cards: "The problem in this instance is that the SC has little power; it's the FSF that's holding things up and I don't know more than you do." But that doesn't sit well with some. Steven Bosscher asks:
Others believe that the FSF should be in complete control. Richard Kenner outlines how he sees the relationship:
As a practical matter, the FSF *delegates* most of their responsibilities to the maintainer of the package, but they can undo that delegation as to any matter any time they want.
Berlin would like to see the governance structure for GCC more clearly spelled out. The SC web page seems to indicate that its role is to prevent the project from being controlled by any party. But whether the SC is supposed to prevent the FSF from controlling its own project was disputed. Ultimately, the developers do have some power as Buck states: "There are checks on FSF control in the sense that the project can be forked and developers can leave."
That kind of talk inevitably leads some to mention the egcs/GCC split, and subsequent join, as an example of the power that the development community has. No one has said they are seriously considering a fork, but the GPL certainly allows such things if the FSF's hand gets too heavy. Jeff Law doesn't see it coming to that:
Clearly some developers are chafing under what they see as unnecessary interference in technical issues from Stallman. But as Buck points out, Stallman does not dictate technical details to GCC. Various decisions (bugtracker, version control, etc.) have gone against his express wishes. In addition, contrary to Stallman's aversion to C++, Taylor has used that language for the gold linker, and currently has a branch that implements some of GCC in C++. He sees things this way:
This discussion will hopefully lead to a clearer picture of the governing structure of GCC. With luck, it may also make Stallman and the FSF more cognizant of the perception that they are meddling in technical issues to the detriment of their relationship with at least some in the community. No one in the rather long thread could come up with any sensible reason that the release branch should have been held up. At best, that means that Stallman didn't communicate the why, which leads many to a sense that he is being rather arbitrary.
In the meantime, though, GCC is preparing for the 4.4 release. Release manager Mark Mitchell created the release branch, Bosscher is rounding up folks to update the 4.4 changes page, and work is proceeding towards a release. That also allows the changes for 4.5 to be added to the trunk, which puts that release back on track.
With GCC 4.5 there will likely be a new plugin API for which the license change is needed. On April 1, Edelsohn announced that the revised runtime library exception had been released. It explicitly allows for Java byte code to be used as input into GCC, making a "compilation process" using that input eligible for the runtime library exception. One of the other concerns, regarding independent modules, will be addressed in the FAQ, though it has not been at the time of this writing.
Assuming the new exception passes muster on the gcc-devel list, and no problems are found that would require adjustments, it will presumably end up in the 4.4 release. While that should conclude this particular issue, the overarching governance questions will remain.
Xiph.org achieved a milestone last week, unveiling the first public release of its new encoder for Theora video. The new encoder is codenamed Thusnelda to distinguish it from previous work, and makes several big improvements, including fixes to constant bitrate and variable bitrate encoding.
Theora is derived from a video codec called VP3 created by On2 Technologies. On2 released the VP3 code to the public under an open source license in 2001, and agreed to help Xiph.org develop Theora as its successor. The specification for the Theora codec's format was finalized in 2004, but the reference encoder itself (the actual binary that converts a video file into Theora format) only reached 1.0 in November of 2008. Work on Thusnelda began shortly thereafter, spearheaded by Xiph.org's Christopher Montgomery, and was bolstered by a grant from Mozilla and the Wikimedia Foundation that allowed lead Theora developer Tim Terriberry to focus on improving the encoder to coincide with the built-in Theora support slated for Firefox 3.5.
The Thusnelda encoder is denoted 1.1 alpha, and is available for download from Xiph.org in several formats: source code for the libtheora library, binaries of the ffmpeg2theora command-line conversion utility, and even a Mac OS X Quicktime component.
According to Xiph.org's Ralph Giles, the most noticeable improvement in 1.1 is proper rate control, particularly for fixed bit rate encoding, where the user specifies either the number of bits per second desired in the output (a common use case for streaming applications), or the desired file size. "The 1.0 encoder relies a lot on heuristics, instead of trying to optimize directly the trade-off between quality of the coded images and the number of bits used to represent them," he said, "More significantly, the fixed bitrate mode in the 1.0 reference encoder didn't really work; it just guessed how to meet its target and often missed the requested bitrate, sometimes by quite a bit, which was a problem for streaming and fixed-size encodes."
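The arithmetic behind a fixed-size or fixed-bitrate target can be sketched in a few lines. This is only an illustration of the relationship Giles describes, not anything taken from the Theora code: the function names are hypothetical, and the calculation ignores audio tracks and container overhead, which a real encode would have to budget for.

```python
def target_bitrate_bps(file_size_bytes, duration_seconds):
    """Average video bitrate (bits per second) needed to hit a file size.

    Assumption: the whole file is video payload; audio and container
    overhead are ignored in this sketch.
    """
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return file_size_bytes * 8 / duration_seconds


def relative_miss(target_bps, actual_bps):
    """Fraction by which an encode overshot (+) or undershot (-) its target."""
    return (actual_bps - target_bps) / target_bps


# Hypothetical example: fit a 90-minute video into 700 MB (700 * 10**6 bytes).
bitrate = target_bitrate_bps(700_000_000, 90 * 60)
print(f"target: {bitrate / 1000:.0f} kbit/s")

# An encoder that lands at 1.2 Mbit/s against a 1.0 Mbit/s target is 20% over.
print(f"miss: {relative_miss(1_000_000, 1_200_000):+.0%}")
```

For streaming, an overshoot of that kind means the stream no longer fits the link it was sized for, which is why the missed targets mattered in practice.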
But Montgomery's work — supported for a year by his employer Red Hat — also included extensive refactoring of the code, which should result in improvements today and allow for easier changes moving forward. "The older encoder was structured as a bunch of nearly independent passes," Giles said, "[it] made something like 8 passes over each frame. This made some forms of decision making hard, i.e. if an earlier decision caused you problems (higher bitrate) in a later stage you were out of luck. The new encoder collapses most of the passes."
The restructuring also allows Thusnelda to take advantage of features in the Theora specification that had never been implemented before, such as "4MV" macroblocks, a motion compensation scheme that adaptively chooses whether to encode motion information for an entire segment of the picture, for a sub-segment, or for none of the segment. "Theora always breaks each image up into square blocks," Giles explained, "one of those blocks then can be split into four motion vectors, or use an average, and if any of those four don't need to be coded, the alpha encoder can skip coding a corresponding motion vector. Making a change like that was too difficult with the 1.0 codebase."
Naturally, real-world performance and not a feature list is the primary means of assessing an encoder. Theora has been the object of criticism in years past, especially when compared against proprietary offerings such as H.264. Reader comments on news stories at Slashdot often dismissed Theora as a poor alternative, producing larger files than the competition for the same subjective quality.
Codec testers are always at the mercy of the encoder, however, and as noted above Theora's 1.0-series encoder had significant flaws, especially with respect to constant bitrate encoding. In the oft-cited doom9.org 2005 codec shootout, the Theora encoder performed poorly by failing to meet the target file size due to poor rate control, the very feature targeted in the 1.1 branch. Similarly, Eugenia Loli-Queru's 2007 Theora versus H.264 test for OSNews repeatedly cited problems with the encoder that made direct comparison close to impossible.
Both tests pre-date the 2008 release of the final 1.0 encoder, much less the 1.1 alpha. Shortly after the Thusnelda alpha, Jan Schmidt posted the results of his personal tests on his blog, indicating a 20% reduction in file size and 14% reduction in encoding time over the 1.0 encoder. Those are significant numbers, even without accounting for better rate control and other encoding parameter improvements. As commenters to the blog pointed out, Schmidt's test was not scientific, particularly as it involved re-encoding an H.264 file rather than a lossless original, and showed example still frames rather than video results.
Video quality is ultimately a subjective, human-centric measure. Although there are attempts to quantify video encoding quality, such as peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM), they rarely replace subjective evaluations of quality. Xiph's Gregory Maxwell said that Thusnelda improves on Theora's PSNR, but that it was a mistake to assume that that equated to a subjective improvement for any particular use case.
Terriberry concurred, noting that none of the simple objective metrics take any kind of temporal effects into account, and they are still less trustworthy than the processing done in the brain. "Like most things, it's a matter of knowing what the limitations of your tools are. PSNR and SSIM are useful for monitoring day-to-day changes in the code to identify regressions and optimize parameters. But for evaluating fundamentally different approaches, there's currently no substitute for using real humans."
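For readers unfamiliar with the metric, PSNR is derived from the mean squared error between the original and decoded samples, expressed in decibels relative to the largest possible sample value. The following is a minimal sketch using the standard textbook definition; it is not the Theora project's measurement tooling, the function name is hypothetical, and the sample values are made up for illustration.

```python
import math


def psnr(reference, decoded, max_value=255):
    """Peak signal-to-noise ratio, in dB, between two equal-length sample lists.

    Assumes 8-bit samples by default (max_value=255). Identical inputs
    have zero error, so the result is infinite.
    """
    if len(reference) != len(decoded) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)


# Made-up luma samples from one row of a frame, before and after encoding.
original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [54, 55, 60, 66, 69, 62, 64, 71]
print(f"{psnr(original, decoded):.2f} dB")
```

The calculation looks at one frame's samples in isolation, which makes Terriberry's caveat concrete: nothing in it accounts for how errors change from frame to frame or how a viewer perceives them.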
Theora took hits from critics on subjective quality in the 2005 and 2007 tests, too, points which Montgomery responded to in 2007 with a page on his personal Web site. Although some subjective quality issues like discernible blockiness are not the result of problems with the 1.0 encoder, he argued, many of the most visible problems are, and he urged readers to watch the progress made in the 1.1 series.
There are several improvements still to come before 1.1 is declared final, according to the Theora team. Giles said the next major feature will be per-block quantizers, the functions that simplify a block of input data into smaller values for output. "[Theora precursor] VP3 used a fixed set of quantizers, and the 'quality' knob was the only way you could change things. When VP3 became Theora, back in 2004, we added support for varying those quantizers both per video, and per frame type. The 1.0 encoder was able to support alternate quantizer matrices, because you just switch them out, but there were some tuning issues."
"1.1alpha1 is still using the same set, but we expect that to change soon," Giles said. The newly-restructured codebase makes it easy to vary the quantizer used, not just on a per-file or per-frame basis, but block-by-block. Terriberry added that the new code will support 4:2:2 and 4:4:4 pixel formats, which will allow higher color quality, and the ability to use different quantization matrices for different color channels and frames.
Giles and Terriberry agreed that 1.1 final will be significantly better than even the current alpha release once all of the changes are incorporated. Terriberry noted that many of the remaining improvements are "minor things" but that added together they will be substantial. "And that's not even mentioning things like speed optimizations, which also have real practical benefits."
"There are other things still on the docket as well — we're not done yet!" added Montgomery, "However, we're finally to the point of putting together a release solidly better than 1.0 in every way, along with a much higher future ceiling."
Between now and then, the team is soliciting user input from real-world encoding tests. "We put it out to show what we've been up to, and to make it easier to give it a try," said Giles. "We're interested in samples where it really does poorly, especially relative to 1.0, compatibility testing with current decoders, and general build and integration issues which of course can only be found through people trying your software in their own environments." He encouraged users to submit concrete issues through the bug tracker, but to share other experiences through the project mailing list, or simply to blog about them for all to read.
Web video is poised to start changing dramatically once Firefox 3.5 ships with a built-in Theora decoder underlying the HTML5 video element. That makes it all the more important to get the Theora encoder right. Xiph.org does not have the full-time staff or resources of larger activist groups like the Free Software Foundation or Creative Commons; it has only software developers. Consequently, without the support of Red Hat, Mozilla, and the Wikimedia Foundation, it might not have been able to get up to speed. It remains to be seen whether the final build of Thusnelda will beat Firefox 3.5 to release, but the progress made already is encouraging.
The Open Source Business Conference, held at San Francisco's Palace Hotel, draws a lot of lawyers, from both corporate legal departments and law firms. Continuing Legal Education (CLE) credit is available. Jeff Norman, a partner at the law firm of Kirkland & Ellis, delivered a talk on "Shims and Shams: Firewalling Proprietary Code in a Copyleft Context." This talk gives some insights into the current thinking on how difficult it can be to create a combined software product using both copyleft and proprietary code.
Most clients who want to combine GPL and proprietary code, Norman said, do not have an open source business model in mind. But creating a mixed GPL/proprietary software product is difficult and expensive. Step one for the lawyer is to explore the reasons behind the idea. The question is: "Why do you want to do something unusual instead of complying with open source disclosure requirements?" Only if the client says, "We can't open source this," does Norman recommend what he calls "shimming," which he defines as "programming practices and architectures that reduce the risk that independently created proprietary code might be deemed a derivative work based upon some other code that is intended to operate with such proprietary code." Shimming includes both procedural shims, which are development practices, and substantive shims, which are design decisions.
The GPL's reciprocity requirements rely not on any technical criterion, but on the legal definition of a "derivative work." The definition is actually consistent across countries. Norman surveyed the law in the US, Europe, and Asia, and found "little substantial difference." While the definition is consistent, it's also broad, often surprisingly broad. Case law shows that derivative works can come into existence easily, whenever pre-existing work is either incorporated into new work or modified.
Two cases show the wide reach of the derivative work concept. A.R.T. Company sold products based on unmodified postcards, and in one case was found to be creating a derivative work. (Another case found that A.R.T.'s product consisting of a postcard mounted on a tile was not infringing, creating a conflict between two U.S. circuit courts.) In another case, Midway Manufacturing Co. v. Artic International, Inc., the court found that selling a hardware speed-up kit for an existing arcade game was creating a derivative work of the game software, even though the defendant did not copy or modify the original game software. Courts use the test of "access and substantial similarity." If the alleged infringer had access to the copyrighted work, and even parts of the alleged derivative work are substantially similar, then, according to the test, it's a derivative work. Applying the idea to software, "proprietary code may incorporate non-proprietary code in non-obvious ways," Norman said.
"There are built in to most computer languages directives that cause code to be combined," he said. The C preprocessor's #include directive is one example. "I've seen hundreds of thousands of lines of code incorporated with one include." In one example, a proprietary program used a widget class's API, and the widget class, using a header file, incorporated code from a system library. "This whole thing becomes a derivative work," he said.
If a combination of GPL and proprietary code is only for in-house use, some clients decide simply not to redistribute it. The GPL's reciprocity requirements apply only in distribution. However, distribution could happen in a lot of ways. Does depositing source code in escrow count as distribution? How about an acquisition that's structured as a sale of assets from one company to another? "Relying on no distribution is very dangerous. There are a lot of situations where distribution can happen but you wouldn't think of it as distribution," Norman said.
The other approach is what Norman recommends. "Don't create a derivative work and then you won't have a problem." He said that some open source advocates say, "You're violating the spirit or the purpose of the GPL." But in the long run, allowing a license to reach out too far could enable proprietary vendors to apply unwanted terms to open source code. "If the end user is not creating a derivative work, not only are the license terms not being triggered but you don't want them to be triggered," he said.
In the years that developers have been using and discussing the GPL, some have developed a false sense of security about when they're creating a derivative work. Just using an API may or may not create a derivative work. The purely functional aspects of a function call are not copyrightable. However, Norman says, even minor non-functional aspects of an API, such as the sequence of fields in a structure, are probably copyrightable. And just using an API can result in bringing thousands of lines of code into your application. Another fallacy is the often-heard, "We can avoid any problems if we use dynamic rather than static links." However, dynamic linking by itself does not automatically avoid creating a derivative work.
Some US circuits ignore extremely small copying under the so-called "de minimis" exception. However, Norman said, "it would almost never cover code." There's no sure test for what is or isn't de minimis copying, and one module or section of a program could be found to be a derivative work. "Even if the whole project is not substantially similar, one module may be substantially similar," he added.
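How a single API use can pull in far more code than it appears to is easiest to see with C-style textual inclusion, where each include directive is replaced by the contents of the named file, which may itself include others. The sketch below is a toy model of that mechanism written in Python, not a real preprocessor; the file names and contents are hypothetical, and it illustrates only the mechanics, not any legal conclusion.

```python
import re

# Matches lines of the form: #include "name"
INCLUDE = re.compile(r'^\s*#include\s+"([^"]+)"\s*$')


def expand(name, files, seen=None):
    """Return the lines of `name` with each #include replaced by the
    (recursively expanded) lines of the included file."""
    seen = set() if seen is None else seen
    if name in seen:
        return []  # crude stand-in for an include guard
    seen.add(name)
    out = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            out.extend(expand(match.group(1), files, seen))
        else:
            out.append(line)
    return out


# Hypothetical sources: the application names only the widget header,
# but the widget header pulls in a system library header behind it.
files = {
    "app.c": '#include "widget.h"\nint main(void) { return widget_id(); }',
    "widget.h": '#include "syslib.h"\nstatic int widget_id(void) { return sys_id() + 1; }',
    "syslib.h": "static int sys_id(void) { return 41; }",
}

unit = expand("app.c", files)
print(len(unit), "lines in the combined unit")
for line in unit:
    print(line)
```

The application source names one file, yet the combined unit also contains code from a file the application's author never mentioned directly.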
With the broad standards of what is a derivative work, combined with the ways that software can mingle at build time, "derivative works practically create themselves," Norman said. In order for a combined product not to be a derivative work, developers need to take specialized and expensive measures. Avoiding creating a derivative work is not something to do in the ordinary course of business. Shimming requires the same clean-room techniques that software companies use to protect themselves when doing reverse engineering. Building a combined product cleanly "really has to be worth it," Norman said. "If you have someone who has never done a clean room before you're going to spend time getting them up to speed." Besides the development costs, there's some publicity cost. "In any shimming scenario you may get some negative PR," he said.
The wrong way to do shimming is simply to build a wrapper, implementing essentially the same interface as the GPL software, then communicate with the wrapper. It doesn't work because the wrapper becomes a derivative work, then the code that talks to the wrapper does too. In practice, two development teams need to work side by side, but not in direct communication with each other. One team handles GPL code, the other handles the proprietary code, and the two communicate only through the legal department, which acts as a filter.
They have to be kept separate, Norman says, because, "Programmers like to borrow just like lawyers. How often do you borrow from somebody else's brief?" The filtering has to block any knowledge about creative expression from making its way across the barrier. Since anyone could have seen the GPL code, "We have a set of programmers who verify under affidavit that they've never looked at this code before." The clean room developers' access to the Internet has to be monitored, or better yet, blocked. "It's a fairly cumbersome technique, but most software companies have done some kind of clean room before," Norman said.
NDISwrapper is an example of another safe approach, Norman said. A network device driver originally written and compiled for Microsoft Windows cannot be a derivative work of Linux. NDISwrapper itself is GPL-licensed, and probably a derivative work of Linux (it's very difficult to make a kernel module that isn't). The API used in the Windows driver clearly has nothing to do with the copyleft API.
Another approach is to split the GPL code into a server process and put the proprietary code in the client. However, this is unlikely to work if the server and client are distributed together on the same CD, relying on what the GPL calls "mere aggregation." Any "intimate communication" between the server and client could also create a derivative work, Norman said. The last approach is to time-shift the creation of a derivative work to the end user. An example is the NVIDIA proprietary device drivers for Linux, which the company distributes separately from the kernel. This only works for technical or hobbyist end users, not for an integrated product. There's also a potential patent problem: distributing two pieces that, when combined, infringe a patent constitutes contributory infringement, Norman said later.
A problematic shimming scenario is using it to attempt to undo a previous decision to combine software. It could be "admitting that what you did was problematic." If possible, try to buy an exception from the copyright holder instead, Norman said. Shimming is possible and might even be necessary, as in the case of third-party code that can't be relicensed. But the lesson is that companies will save time, use fewer developers, make a simpler product, and avoid legal bills by just sticking with the copyleft.
Page editor: Jonathan Corbet
Copyright © 2009, Eklektix, Inc.