
LWN.net Weekly Edition for November 10, 2011

Good fences make good projects?

By Jonathan Corbet
November 9, 2011
Back in August, there was a big fight over whether the user-space "native Linux KVM tool" should be merged into the mainline kernel repository. One development cycle later, we've had the same fight with many of the same arguments and roughly the same result. Sequels are rarely as good as the original; that applies to flame wars as well as to more creative works. But there is a core issue here that has relevance well beyond the kernel community: does the separation of projects help the Linux community more than it hurts it?

The proponents of merging the tool into the kernel make a number of points. Having the projects in the same repository makes development that crosses the boundary between the two easier; in particular, it helps in the creation of APIs that will stand the test of time. The project's overall standards help to keep the quality of the tools high and the release cycle predictable. Reuse of code between user-space and kernel projects gets easier. All told, they say, having the "perf" tool in the kernel tree has greatly helped its development; see this message from Ingo Molnar for a detailed description of the perceived advantages of this mode of development. Artificial separation of projects, instead, is said to have high costs; Ingo went so far as to claim that Linux lost the desktop market as the result of an ill-advised separation of projects.

Opponents, instead, say that putting the kernel and the tools in the same tree makes it easier to create API regressions for out-of-tree tools. The reason that perf has a relatively good record on this front, Ted Ts'o said, has more to do with the competence of the developers involved than its presence in the kernel tree. Adding user-space tools bloats the kernel source distribution, puts competing out-of-tree projects at a disadvantage, and, Ted said, creates a number of difficulties for distributors.

The one concrete end result of the discussion was that the pull request for the KVM tool was passed over by Linus who, feeling that he had enough stuff for this development cycle already, did not want to wander into this particular disagreement. It is not hard to imagine that he will get another chance in a future development cycle; it does not seem that any minds have been changed by the discussion so far.

In the middle of this discussion, it was asked whether it would make sense to bring other projects into the kernel - GNOME, for example. It was pointed out that BSD-based systems tend to be developed in this mode - an existence proof that operating system development can work that way. Ted responded (in the message linked above) as follows:

[T]here has speculation that this was one of many contributions to why they lost out in the popularity and adoption competition with Linux. (Specifically, the reasoning goes that the need to package up the kernel plus userspace meant that we had distributions in the Linux ecosystem, and the competition kept everyone honest. If one distribution started making insane decisions, whether it's forcing Unity on everyone, or forcing GNOME 3 on everyone, it's always possible to switch to another distribution. The *BSD systems didn't have that safety valve....)

One could note that BSD does have one safety valve: to fork the entire system. That has happened a number of times in the history of BSD; pointing this out, though, only serves to reinforce Ted's point.

Distributors play a crucial role in the Linux ecosystem; they function as the middleman between most development projects and their users. Most of us, most of the time, do not obtain the software we run directly from those who wrote it; it comes, instead, nicely packaged from our distributor. As they ponder each package, distributors (the successful ones, at least) will be keeping their users' needs in mind. If the package has obnoxious anti-social features or security problems, the distributors will either fix it or leave the package out altogether. The recent Calibre mess is a prime example; aware distributors had already eliminated the worst problems before they were generally known.

Distributors make it possible to change the source of your operating system without having to stop running Linux. Anybody who has been working with Linux long enough has almost certainly switched distributions at least once during that time; the process is not without its disruptions, but the amount of pain is usually surprisingly low. The lack of lock-in in the Linux world has improved life for users and, at the same time, given distributors an incentive to improve the Linux experience for everybody.

The role of the distributors is made possible by the boundaries between the projects. If the entire system were integrated into a single source tree, there would be little space for the distributors to do their own integration work. The lack of independent *BSD distributions makes this point clear. That suggests that too much integration at the project level might not be a good thing for Linux.

So one could make an argument that bringing GNOME into the kernel source tree is probably a bad idea for this reason alone; Linux as a whole may be better served by having the kernel and the desktop environments be separate components that can be combined (or not) at will. That makes it clear (if it wasn't before - your editor can be slow at times, please bear with him) that there is a line to be drawn somewhere; bringing some projects into the kernel source tree may be harmful for Linux even without considering the effects on the kernel itself. But separating the kernel from some user-space projects may have costs that are just as high. There is no consensus, currently, on what those costs are or where the line should be drawn.

All of this implies that the debate over the inclusion of the KVM tool has an importance that goes beyond the fate of that one project. Does (as some allege) the integration between perf and the kernel impede the development of alternatives and hurt the performance tooling ecosystem as a whole? Would the integration of the KVM tool put QEMU at the mercy of a fast-changing, regression-prone API over which its developers have no control? Are we better served by a fence between the kernel and user space that is as well defined at the project level as it is at the API level? Or, on the other hand, does keeping the KVM tool out of the kernel repository slow its growth and hurt the capability and usability of Linux tooling as a whole? And, importantly, what does the reasoning that leads to an answer to these questions tell us about which other projects should - or should not - find a home in the kernel tree?

These issues arise at a number of levels; some distributors, for example, are increasingly taking control of parts of the system through tightly-controlled in-house projects. Android is an extreme example of this approach, but it can be found in more traditional distributions as well. There are clear advantages to doing things that way, but it is worth asking whether that behavior is good for Linux in the long term and just where the line should be drawn. The fences between our projects may have played an important role in both the successes and failures of Linux; decisions on whether to strengthen them or tear them down need some serious thought.


Xiph.org's "Monty" on codecs and patents

By Jake Edge
November 9, 2011

While the talks at the 2011 GStreamer conference mostly focused on the multimedia framework itself—not surprising—there were also some that looked at the wider multimedia ecosystem. One of those was Christopher "Monty" Montgomery's presentation about Xiph.org, and its work to promote free and open source multimedia. Xiph is known for its work on the Ogg container format (and the Vorbis and Theora codecs), but the organization has worked on much more than just those. In addition, Montgomery outlined a new strategy that Xiph is trying out to combat one of the biggest problems in the free multimedia world: codec patents.


Xiph was founded in 1994, originally as a for-profit company (Xiph.com) that was set up to sell codecs. These days, it is a non-profit that consists of various "loosely grouped" codec projects. All of the members are volunteers, and various FOSS companies pay the salaries of some of the members as donations to Xiph.org. For example, Red Hat pays Montgomery's salary to allow him to work on Xiph projects. The organization is "like a coffee shop where skilled codec developers hang out", Montgomery said.

Beyond Ogg, Vorbis, and Theora, there are a number of different projects under the Xiph umbrella, Montgomery said. The cdparanoia compact disc ripper program and library was something he wrote as a student that is now part of Xiph. The Icecast streaming media server is another Xiph project, he said, as are various codecs including Speex, FLAC, the new Opus audio codec, and "a whole bunch of codecs that no one remembers".

Xiph does hold "intellectual property", Montgomery said, and that is one of the reasons it exists. Non-profits have an advantage when it comes to patents because the board gets to decide what happens to the patents if the organization goes out of business. That's different from for-profit companies that go bankrupt, he said, because whoever buys the assets gets the patents free of any promises or other entanglements (at least those that aren't legally binding, like licenses). If the original company promised not to assert some patents (e.g. for free software implementations or to implement a standard), a new owner may not be bound by that promise. A non-profit's board can ensure that any patents end up with a like-minded organization, he said.

Codec news

The biggest Xiph news in the recent past is that Google chose Vorbis as the audio codec for WebM. Montgomery said that he is very happy to see Vorbis included in WebM, but is also glad to see that Google is stepping up to help the cause of free codecs. Xiph has been trying to "hold the line on free codecs", mostly by itself, he said. He is hopeful that Google picking up some of that will allow Xiph to "go back to what we are actually good at", which is codec development.

Xiph will be continuing to do more codec development because the members enjoy doing so, Montgomery said. Revising the Ogg container format is one thing that's on the plate now. That is not something that Xiph wanted to do while Ogg was part of its effort to hold the free codec line. With the advent of WebM, which uses the Matroska container format, some of the "legitimate complaints" about Ogg can now be addressed.

FLAC is now finished, he said. It is stable and mature with good penetration; it is essentially the standard for lossless audio codecs, and one that Apple has been unable to overturn, Montgomery said. He also noted that there were plans for a Theora 1.2 release that never happened, partly because "everyone went to work on VP8 and Opus". He believes that the release will still happen at some point, but that the pressure is off because of the existence of WebM.

Opus is a new audio codec that incorporates pieces from Xiph's CELT codec and Skype's SILK codec. Opus is designed for streaming voice or other audio over the internet, and is the subject of an IETF Internet-draft. As is usual for such documents, Intellectual Property Rights (IPR) disclosures were made by various parties who believed they had IP (e.g. patents) that is required to implement the proposed standard. Qualcomm has filed such a disclosure for Opus, but, unlike the other disclosing organizations, Qualcomm has not offered its patents under a royalty-free license.

Patent strategy

Montgomery was clear that he wasn't singling out Qualcomm in his talk, because what it has done is "business as usual" in the industry, and Qualcomm is "not in any sense alone" in making these kinds of claims. But it has led Xiph to spend almost as much time on patent strategy as it has in writing code recently. Part of the problem is that these IPR disclosures are immediately assumed to be valid by everyone, whether they know something about patents in that space or not. The presumption is that Qualcomm would never have made the claims without doing a great deal of research.

But Montgomery is not convinced that there is much of substance to Qualcomm's claims. The patent game is essentially a protection racket, he said, and those who are trying to do things royalty-free are messing things up for those who want to collect tolls. "The industry is pissed at Google because they won't play the protection racket game", he said. Qualcomm and others just list some patents that look like they could plausibly read on a royalty-free codec, because it doesn't cost them anything.

That leaves Xiph with few options, though. There is the "thermonuclear option" of going to court and getting a declaratory judgement, but there are some major downsides to pursuing that strategy. It will take a lot of time and money to do so and "no one will use it while the litigation is going on". Montgomery's original inclination was to pursue a declaratory judgement, to "bash in some teeth" and "show that Xiph.org is not to be trifled with". But even if Xiph won, it would only impact those few patents listed by Qualcomm. What is needed is a way to "change 'business as usual'", he said.

Companies "have figured out how to fight 'free'", Montgomery said, by making it illegal. In order to fight back through the courts, there would be an endless series of cases that would have to be won, and each of those wins would not hurt the companies at all. There is a "presumption of credibility" when a patent holder makes a claim of infringement, and the press "plays along with that", he said. But Eben Moglen has pointed out that an accusation of infringement has no legal weight, so there is no real downside to making such a claim.

One way to combat that is to document why the patents don't apply. Basically, Xiph did enough research to show why the Qualcomm patents don't apply to Opus and it is planning to release that information. It is a dangerous strategy at some level because it gives away some of the defense strategy, he said, but Xiph has to try something. By publishing the results of the research, Xiph will be "giving away detailed knowledge of the patents" and may be called to testify if those patents ever do get litigated, but it should counter the belief that the Qualcomm patents cover Opus.

Qualcomm could respond to the research in several different ways. It could ignore it, respond to it, or come back with more patents. It could also formally abandon the claim. If Qualcomm doesn't respond, Montgomery said, that does have some legal weight. One advantage of this approach is that regardless of how Qualcomm responds, Xiph has something concrete (i.e. the research) for the money that it has spent, which is not really the case when taking the declaratory judgement route.

New codecs

Montgomery called Opus a "best in class codec" that Xiph would like to see widely used. Hardware implementations of Opus have been considered, but have not been done yet, he said. Finishing the Opus rollout and "responding to patent claims" have been higher on the list, but they will get to it eventually.

He mentioned two other codecs that Xiph will be working on, including Ghost, which splits audio into two components: strong tones and everything else. Each of the components will be processed separately, much like what the ears do, he said. Both can be represented compactly, but the same transforms don't work on them, so representing them separately may make sense. There was a need to "invent some amount of math for all of this", he said. In addition, Xiph will be working on a new video codec that is being done as part of a "friendly rivalry with On2" (makers of the VP8 codec in WebM).

Montgomery painted a picture of an organization that is doing a great deal to further the cause of free multimedia formats. There are lots of technical and political battles to fight, but Xiph.org seems to be up to the task. It will be interesting to see how Qualcomm responds to the Opus research, and generally how the codec patent landscape plays out over the next few years. The battle is truly just beginning ...

[ I'd like to thank the Linux Foundation for helping with travel expenses so that I could attend the GStreamer conference. ]


Authenticating Git pull requests

By Jake Edge
November 9, 2011

One of the outcomes from the kernel.org compromise is the increased use of GPG among kernel developers. GPG keys are now required to get write access to the kernel.org Git repositories, and folks are starting to think about how to use those keys for other things. Authenticating the pull requests that kernel hackers send to Linus Torvalds is one possible use. But, as the discussion on the linux-kernel mailing list shows, there are a few different use-cases that might benefit from cryptographic signing.

Most of the code that flows into the kernel these days comes from Git trees that various lieutenants or maintainers manage. During the merge window (and at other times), Torvalds is asked to "pull" changes from these trees via an email from the maintainer. In the past, Torvalds has used some ad hoc heuristics to determine whether to trust that the request (and the tree) are valid, but, these days, stronger assurances are needed. That's where GPG signing commits and tags may be able to help.

Conceptually the idea is simple: the basic information required to do a pull (location and branch of the Git tree along with the commit ID of its head) could be signed by the developer requesting the pull. Torvalds could then use GPG with his keyring of kernel developer public keys to verify that the signature is valid for the person who sent the request. That would ensure that the pull request is valid. It could all be done manually, of course, but it could also be automated by making some changes to Git.
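To make that idea concrete, here is a minimal sketch in Python. The payload layout, the function names, and the example URL used in the usage note are all assumptions made for illustration; they are not taken from Git or from any patch discussed on the list. The two helpers that call `gpg` rely only on its standard `--clearsign`, `--local-user`, and `--verify` options, and they need a working GPG installation with suitable keys.

```python
import subprocess


def pull_request_payload(repo_url: str, branch: str, head_commit: str) -> str:
    """Return the minimal text a requester would sign (hypothetical layout)."""
    # Insist on a full commit ID so the integrator knows exactly what to expect.
    if len(head_commit) != 40 or any(c not in "0123456789abcdef" for c in head_commit):
        raise ValueError("expected a full 40-character hex commit ID")
    return f"repository: {repo_url}\nbranch: {branch}\nhead: {head_commit}\n"


def clearsign(payload: str, key_id: str) -> str:
    """Sign the payload with the requester's secret key (requires gpg)."""
    result = subprocess.run(
        ["gpg", "--local-user", key_id, "--clearsign"],
        input=payload.encode(),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode()


def verify(signed_text: str) -> bool:
    """Check the signature against the public keys in the local keyring."""
    result = subprocess.run(
        ["gpg", "--verify"],
        input=signed_text.encode(),
        capture_output=True,
    )
    return result.returncode == 0
```

A requester would call something like `clearsign(pull_request_payload("git://example.org/tree.git", "for-linus", commit_id), key_id)` and paste the result into the mail; the integrator would run `verify()` on that block before pulling.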

The discussion on how to do that automation started after a signed pull request for libata updates was posted by Jeff Garzik. The entire pull request mail (more than 3200 lines including the diffs and diffstat) was GPG signed, which mangled the diff output as Garzik noted. Beyond that, though, it is unwieldy for Torvalds to check the signature, partly because he uses the GMail web interface. In order to check it, he has to cut and paste the entire message and feed it to GPG, which is labor intensive and risks the message being mangled along the way (by white-space or other changes), leading to a false-negative signature verification. As Torvalds noted: "We need to automate this some sane way, both for the sender and for the recipient."
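The fragility of the cut-and-paste approach follows from how signatures work: a signature covers a digest of the signed bytes, so a change that is invisible to the reader still produces a different digest. The sketch below illustrates this with SHA-256 from Python's standard library, which stands in here for whatever digest GPG actually uses; the diff lines are made up for the example.

```python
import hashlib


def digest(text: str) -> str:
    """Hex digest of the text, standing in for the hash a signature covers."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two lines of a made-up diff, indented with tabs as kernel code would be.
original = "-\treturn 0;\n+\treturn 1;\n"

# A mail client or paste buffer that expands tabs changes the bytes,
# even though the text looks the same on screen.
mangled = original.expandtabs(8)

print(digest(original) == digest(mangled))  # False
```

The larger the signed text, the more chances there are for this kind of silent alteration, which is one argument for signing as little as possible.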

The initial goal is just to find a way to ensure that Torvalds knows who the pull request is coming from and where to get it, all of which could be handled outside of Git. Rather than signing the entire pull request email, just a small, fixed-format piece of that mail could be signed. In fact, Torvalds posted a patch to git-request-pull to do just that. It still leaves the integrator (either Torvalds or a maintainer who is getting a pull request from another developer) doing a cut-and-paste into GPG for verification, however.
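If only a small block of the mail is signed, a script can pull that block out of the message body, removing the manual cut-and-paste step. The following is a rough sketch, not the behavior of Torvalds's patch: the function name is hypothetical, and it assumes the block uses the standard OpenPGP cleartext-signature armor lines.

```python
from typing import Optional

BEGIN = "-----BEGIN PGP SIGNED MESSAGE-----"
END = "-----END PGP SIGNATURE-----"


def extract_signed_block(mail_body: str) -> Optional[str]:
    """Return the first clearsigned block in the mail body, or None."""
    start = mail_body.find(BEGIN)
    if start == -1:
        return None
    end = mail_body.find(END, start)
    if end == -1:
        return None
    # Keep everything from the opening armor line through the closing one.
    return mail_body[start:end + len(END)] + "\n"
```

The returned text could then be handed to `gpg --verify` on standard input; everything outside the block (greeting, diffstat, diffs) is ignored and cannot affect the result.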

There are others who have an interest in a permanent trail of signatures that could be audited if the provenance of a particular part of the kernel needs to be traced. That would require storing the signatures inside the Git tree somehow, so that anyone with a copy of Torvalds's tree could see any of the commits that had been signed, either by Torvalds or by some other kernel hacker. But, as Torvalds pointed out, that information is only rarely useful:

Having thought about it, I'm also not convinced I really want to pollute the "git log" output with information that realistically almost nobody cares about. The primary use is just for the person who pulls things to verify it, after that the information is largely stale and almost certain to never be interesting to anybody ever again. It's *theoretically* useful if somebody wants to go back and re-verify, but at the same time that really isn't expected to be the common case.

Torvalds's idea is that the generation of the pull request is the proper time for a developer to sign something, rather than having it tied to a specific commit. His example is that a developer or maintainer may wish to push the tree out for testing (or to linux-next), which requires that it be committed, but then request a pull for that same commit if it passes the tests. Signing before testing has been done is likely to be a waste of time, but signing the commit later requires amending the commit or adding a new empty commit on top, neither of which is very palatable. Git maintainer Junio C. Hamano is not convinced that ephemeral signatures (i.e. those that only exist for the pull-request) are the right way to go, though: "But my gut feeling is that 'usually hidden not to disturb normal users, but is cast in stone in the history and cannot be lost' strikes the right balance."

The conversation then turned toward tags, which can already be signed with a GPG key. One of the problems is that creating a separate tag for each commit that gets signed rapidly becomes a logistical nightmare. If you just consider the number of trees that Torvalds pulls in a normal merge window (hundreds), the growth in the number of signed tags becomes unwieldy quickly. If you start considering all of the sub-trees that get pulled into the trees that Torvalds pulls, it becomes a combinatorial explosion of tags.
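For reference, signed tags are created with `git tag -s` and checked with `git tag -v`. The sketch below wraps those two commands in Python; the helper names and the tag name in the usage note are hypothetical, and actually running the commands requires a Git repository plus a configured GPG key.

```python
import subprocess
from typing import List


def signed_tag_command(tag: str, commit: str, message: str) -> List[str]:
    """Build the git command that creates a GPG-signed tag on a commit."""
    return ["git", "tag", "-s", tag, "-m", message, commit]


def verify_tag_command(tag: str) -> List[str]:
    """Build the git command that verifies a tag's GPG signature."""
    return ["git", "tag", "-v", tag]


def run(argv: List[str], repo_dir: str) -> subprocess.CompletedProcess:
    """Run a git command inside the given repository directory."""
    return subprocess.run(argv, cwd=repo_dir, capture_output=True, text=True)
```

Because each signed commit needs its own tag name, a maintainer following this scheme would end up calling something like `signed_tag_command("libata-for-3.2", commit_id, "libata updates")` for every pull request, which is where the growth in the tag namespace comes from.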

What's needed is an automated method of creating tag-like entries that live in a different namespace. That's more or less what Hamano proposed by adding a refs/audit hierarchy into the .git directory data structures. The audit objects would act much like tags, but instead carry along information about the signature verification status of the merges that result from pulls. In other words, a git-pull would verify the signature associated with the remote tag (often something like "for-linus" that gets reused over and over) and create an entry in the local audit hierarchy that records the verification. Since the audit objects wouldn't pollute the tag namespace, and would be pulled and created automatically, they would have much less of an impact on users and existing tools. In addition, the audit objects could then be pushed into Torvalds's public tree so that audits could be done.
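The namespace separation can be pictured with a toy model. This is only an illustration of the idea, not Hamano's design: apart from the refs/audit prefix itself, the entry naming (keyed by merge commit), the stored fields, and the class are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RefStore:
    """Toy stand-in for a repository's refs, keyed by full ref name."""
    refs: Dict[str, dict] = field(default_factory=dict)

    def record_audit(self, merge_commit: str, signer: str, verified: bool) -> str:
        """Record a verification result under refs/audit/ (hypothetical layout)."""
        name = f"refs/audit/{merge_commit}"
        self.refs[name] = {"signer": signer, "verified": verified}
        return name

    def tags(self) -> List[str]:
        """Ref names in the ordinary tag namespace."""
        return sorted(n for n in self.refs if n.startswith("refs/tags/"))

    def audits(self) -> List[str]:
        """Ref names in the separate audit namespace."""
        return sorted(n for n in self.refs if n.startswith("refs/audit/"))
```

In this model, recording any number of verifications leaves the list returned by `tags()` unchanged, which is the property that keeps audit entries out of the way of users and existing tools.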

So far, Hamano has posted a patch set that implements parts of his proposed solution. In particular, it allows for signing commits, verifying the signatures, and for pulling signed tags. Other pieces of the problem are still being worked on.

As is often the case in our communities, adversity results in pretty rapid improvements. For the kernel, the SCO case brought about the Developer's Certificate of Origin, the relicensing of BitKeeper gave us Git, the kernel.org break-in brought about a closer scrutiny of security practices, and the adoption of GPG keys because of that break-in will likely lead to even better assurances of the provenance of kernel code. While we don't want to court adversity, we certainly do take advantage of it when it happens.


Page editor: Jonathan Corbet

Inside this week's LWN.net Weekly Edition

  • Security: Password managers; New vulnerabilities in Firefox/Seamonkey, kernel, Perl, Xen, ...
  • Kernel: The second half of the 3.2 merge window; Better device power management for 3.2; Fast interprocess communication revisited.
  • Distributions: Two flavors of GNOME for Linux Mint 12; Fedora 16; GNOME Shell; ...
  • Development: GIMP 2.8; Transactional memory; recutils, spyder, ...
  • Announcements: New books, ELCE videos, and tablets falling from helicopters.

Copyright © 2011, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds