Leading items

Courgette meets a dangerous (Red) Bend

By Jonathan Corbet
November 2, 2009
Back in July, your editor stumbled across Google's Courgette announcement and promptly added it to the LWN topic slush pile. He then promptly let it sit for three months or so. The news that this software is now the subject of a patent suit brought Courgette back to the foreground; here we'll look at what Courgette is for, how it works, and how it relates to the patent being asserted.

As most LWN readers will know, Google is working on its own web browser, called Chrome. The Chrome developers seem to be focusing on speed, but they are also clearly putting significant thought into the security of the browser. That is a good thing: a web browser is a large, complex body of code which is directly exposed to whatever a web server might choose to throw at it. The complexity makes security-related bugs inevitable; the exposure makes them highly exploitable. Chrome's developers have come to the conclusion that, when security problems are found, they must be fixed as quickly as possible.

Prompt patching of bugs requires that they be identified and repaired as quickly as possible. But the repairs are not useful unless they get to the browser's users - all of them, or as close to that as possible. The Chrome developers worried that the sheer size of browser updates would make that goal harder to achieve. Massive updates take longer to download and install, are more likely to be interrupted in the middle, and greatly increase the strain on server bandwidth. Pushing out a fix for a severe zero-day problem might even tax the bandwidth resources of a company like Google, leaving users exposed for longer than they should be.

If the size of browser updates could be reduced significantly, it should become possible to update far more systems in less time. After looking at various ways to compress patches, the Chrome developers decided to create their own algorithm; the result was Courgette. This algorithm is based on the key observation that small changes at the source level tend to cascade into big changes in binary code; by taking a small step back toward the source, many of those changes can be abstracted back out.

In particular, Courgette tries to eliminate irrelevant changes to static pointers. Consider a simple example:

        if (some_condition)
            goto error_exit;

        /* ... */
      error_exit:
        return -EYOULOSE;

As the program is built, error_exit turns into a specific location in the code. An irrelevant change elsewhere in the file can cause the location of error_exit to change; that, in turn, will change the final compiled form of the goto line even though that line has not changed. That changed address looks like a difference in the binary file; when this happens thousands of times over, the binary patch will become severely bloated.

Courgette works by finding static pointers in the code and turning them back into something that looks like a symbolic identifier. The new identifiers are generated in a way that ensures that they do not change if the underlying code has not changed. New versions of the binary (both before and after patching) are built using the replaced pointers; these reworked binaries can then be compared with a utility like bsdiff. Since addresses with unimportant changes have been replaced with consistent identifiers, the two binaries should be a lot closer to each other and the resulting diff should be much smaller.
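To make the idea concrete, here is a toy sketch in C. It is not Courgette's code, and it rests on several simplifying assumptions: the "binary" is an array of words whose pointer slots are already known (Courgette has to find them by disassembling real executables), each distinct target is simply numbered by its rank in address order, and a longest-common-subsequence count stands in for bsdiff. All names here are hypothetical, made up for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_WORDS 64
#define PTR_TAG   0x80000000u  /* word holds a concrete target address */
#define LABEL_TAG 0x40000000u  /* word holds a symbolic label number   */

/* One word of a toy binary: an opcode, or a pointer to another word. */
typedef struct {
    int is_ptr;    /* nonzero if val is a target address */
    uint32_t val;  /* opcode, or target address when is_ptr is set */
} word_t;

static int cmp_u32(const void *a, const void *b) {
    uint32_t x = *(const uint32_t *)a, y = *(const uint32_t *)b;
    return (x > y) - (x < y);
}

/* Raw form: pointers keep their concrete addresses. */
static void encode_raw(const word_t *p, size_t n, uint32_t *out) {
    for (size_t i = 0; i < n; i++)
        out[i] = p[i].is_ptr ? (PTR_TAG | p[i].val) : p[i].val;
}

/* Symbolic form: each pointer becomes the rank of its target among the
 * distinct targets, so a uniform shift of addresses leaves labels alone. */
static void encode_symbolic(const word_t *p, size_t n, uint32_t *out) {
    uint32_t targets[MAX_WORDS];
    size_t nt = 0;
    for (size_t i = 0; i < n; i++)
        if (p[i].is_ptr) targets[nt++] = p[i].val;
    qsort(targets, nt, sizeof targets[0], cmp_u32);
    for (size_t i = 0; i < n; i++) {
        if (!p[i].is_ptr) { out[i] = p[i].val; continue; }
        uint32_t rank = 0;
        for (size_t j = 0; j < nt; j++)   /* count distinct smaller targets */
            if (targets[j] < p[i].val && (j == 0 || targets[j] != targets[j - 1]))
                rank++;
        out[i] = LABEL_TAG | rank;
    }
}

/* Words outside a longest common subsequence: a crude diff-size measure. */
static size_t edit_size(const uint32_t *a, size_t n, const uint32_t *b, size_t m) {
    static size_t dp[MAX_WORDS + 1][MAX_WORDS + 1];
    for (size_t i = 0; i <= n; i++)
        for (size_t j = 0; j <= m; j++) {
            if (i == 0 || j == 0)
                dp[i][j] = 0;
            else if (a[i - 1] == b[j - 1])
                dp[i][j] = dp[i - 1][j - 1] + 1;
            else
                dp[i][j] = dp[i - 1][j] > dp[i][j - 1] ? dp[i - 1][j] : dp[i][j - 1];
        }
    return (n - dp[n][m]) + (m - dp[n][m]);
}

/* Old and new versions differ by one inserted word, which shifts targets. */
static void courgette_demo(size_t *raw, size_t *symbolic) {
    const word_t old_prog[] = {
        {1, 6}, {0, 0x11}, {1, 7}, {0, 0x22},
        {0, 0x33}, {1, 6}, {0, 0x44}, {0, 0x55},   /* word 6 is "error_exit" */
    };
    const word_t new_prog[] = {
        {1, 7}, {0, 0x99}, {0, 0x11}, {1, 8}, {0, 0x22},
        {0, 0x33}, {1, 7}, {0, 0x44}, {0, 0x55},   /* now at word 7 */
    };
    size_t n = sizeof old_prog / sizeof old_prog[0];
    size_t m = sizeof new_prog / sizeof new_prog[0];
    uint32_t a[MAX_WORDS], b[MAX_WORDS];

    encode_raw(old_prog, n, a);
    encode_raw(new_prog, m, b);
    *raw = edit_size(a, n, b, m);

    encode_symbolic(old_prog, n, a);
    encode_symbolic(new_prog, m, b);
    *symbolic = edit_size(a, n, b, m);
}
```

In this toy case one inserted word shifts both jump targets, so the raw comparison needs an edit of seven words while the symbolic one needs only the inserted word. Numbering labels by address rank only works while the order of targets stays the same; a real tool would need a more robust way of matching labels between the two versions.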

How much smaller? In an example cited in the announcement, a full update weighed in at some 10MB. Using bsdiff (which already shrinks binary diffs considerably) yielded a 700KB change, already a significant improvement. With Courgette, though, the diff is 78,848 bytes. In other words, the size of the update has been reduced to less than that of the unpleasant Flash ad which probably decorates this article. That seems like an improvement worth having. It also seems like a technology that projects like deltarpm (which is bsdiff-based at its core) might want to take a look at.
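The numbers are easier to compare as ratios. Taking the reported sizes at face value and assuming binary units (1KB = 1,024 bytes, 1MB = 1,024KB), under which 78,848 bytes is exactly 77KB:

```latex
\begin{aligned}
\frac{\text{full}}{\text{bsdiff}} &= \frac{10240\ \text{KB}}{700\ \text{KB}} = \frac{10240}{700} \approx 14.6 \\
\frac{\text{bsdiff}}{\text{Courgette}} &= \frac{700\ \text{KB}}{77\ \text{KB}} = \frac{700}{77} \approx 9.1 \\
\frac{\text{full}}{\text{Courgette}} &= \frac{10240\ \text{KB}}{77\ \text{KB}} = \frac{10240}{77} \approx 133
\end{aligned}
```

Under those assumptions bsdiff alone cuts the download by a factor of about 15, and Courgette shrinks bsdiff's output by roughly another factor of nine, for an overall reduction on the order of 130 times. Decimal units change the figures slightly (about 127 times overall) but not the conclusion.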

Enter Red Bend Software and patent #6,546,552. For the curious, here is the first independent claim from that patent:

A method for generating a compact difference result between an old executable program and a new executable program; each program including reference entries that contain reference that refer to other entries in the program; the method comprising the steps of:
(a) scanning the old program and for substantially each reference entry perform steps that include:
(i) replacing the reference of said entry by a distinct label mark, whereby a modified old program is generated;
(b) scanning the new program and for substantially each reference entry perform steps that include:
(i) replacing the reference of said entry by a distinct label mark, whereby a modified new program is generated;
(c) generating said difference result utilizing directly or indirectly at least said modified old program and modified new program.

Even for patentese, this language tends toward the impenetrable. But once one realizes that "reference entries that contain reference that refer to other entries" means "addresses," it starts to become a little clearer. To your editor's overtly non-lawyerly, not-legal-advice reading, this claim does appear to describe what Courgette is doing.

Google is not dealing with a typical patent troll here; Red Bend is a company which manages over-the-air firmware updates for mobile carriers. The patent was applied for in 1999, and granted in 2003. This company may well be in a position to tell a sob story where its bread-and-butter patent is being stepped on by Google - a company which is now getting into the business of supplying firmware for mobile phones. On its face, this could certainly be made to look like just the sort of situation the patent system was created to deal with.

Of course, there may be prior art which invalidates this patent. But Google may well find that it's cheaper and easier to just settle with Red Bend, especially if, as Richard Cauley argues, the amount of the settlement could be quite small. Defeating a patent in court is a lengthy, expensive, and risky enterprise; it would not be surprising if Google decided that it had better things to do. The real question, in that case, is what sort of terms Google would negotiate. If Google takes a page from the Red Hat playbook, it will seek to get this patent licensed for all free software implementations. That outcome would remove this patent from consideration in the free software community and keep Courgette free software. A back-room deal with undisclosed terms, instead, could leave this useful technique unavailable for the next ten years.

AbiWord 2.8 features expanded collaboration

November 4, 2009

This article was contributed by Nathan Willis

The multi-platform, open source word processor AbiWord was updated to version 2.8 last week, debuting several new editing features, most notably expanded real-time collaboration support. AbiWord's collaboration capabilities are designed to work on top of a variety of underlying transport mechanisms, but the project is highlighting its web service, which allows not only peer-to-peer collaboration but also group membership and other social-networking features.

AbiWord is a standalone word processor, and thus has significantly smaller disk and memory footprints than an office suite that bundles a word processor, spreadsheet, presentation program, and several other office applications together. In fact, it is the word processor shipped by the One Laptop Per Child project on its modestly powered XO laptops. It is built using GTK but, like most modern applications, runs on all Linux desktop environments. The new release was made on October 27 for Linux, Windows, and Mac OS X. Linux users are encouraged to get binaries through their distribution's package manager, or to consult the project wiki to find third-party packages.

What's new: vector graphics, annotations, and punctuation education

Version 2.8 introduces annotation support, with which users can attach comments to portions of document text. The annotations are visible as pop-ups when the cursor moves over the annotated text, and can also be optionally displayed in the footer of each page. It also adds a flexible multi-page view, allowing the user to see as much of his or her document as fits on screen — not as a preview image, but as an open, editable session.

AbiWord also supports SVG and WMF graphics inside a document, and now uses the Cairo rendering engine for greatly improved output quality, both on screen and in print. Previous releases converted SVG images on import, which degraded their quality. Similarly, according to the release notes, previous versions of AbiWord had a broken implementation of "educating quotes" (the process of automatically converting straight "dumb quotes" into curved "smart quotes"), but the feature has finally been fixed for 2.8.
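The idea behind quote education fits in a few lines. The following C sketch is not AbiWord's implementation; it uses one common heuristic, assumed here: a quote opens if it begins the text or follows whitespace or an opening bracket, and closes otherwise, which also turns the apostrophe in a word like "don't" into a right single quote. The function names and the choice of UTF-8 output are made up for this illustration.

```c
#include <stddef.h>
#include <string.h>

/* UTF-8 encodings of the typographic quotation marks. */
#define LDQUO "\xE2\x80\x9C"  /* U+201C LEFT DOUBLE QUOTATION MARK  */
#define RDQUO "\xE2\x80\x9D"  /* U+201D RIGHT DOUBLE QUOTATION MARK */
#define LSQUO "\xE2\x80\x98"  /* U+2018 LEFT SINGLE QUOTATION MARK  */
#define RSQUO "\xE2\x80\x99"  /* U+2019 RIGHT SINGLE QUOTATION MARK */

/* A quote opens if it starts the text or follows whitespace or an
 * opening bracket; otherwise it is treated as a closing quote. */
static int opens_quote(const char *text, size_t i) {
    if (i == 0)
        return 1;
    char prev = text[i - 1];
    return prev == ' ' || prev == '\t' || prev == '\n' ||
           prev == '(' || prev == '[' || prev == '{';
}

/* Writes the "educated" version of src into dst (NUL-terminated).
 * dst must hold at least 3 * strlen(src) + 1 bytes, since each
 * one-byte quote may become a three-byte UTF-8 sequence. */
static void educate_quotes(const char *src, char *dst) {
    size_t out = 0;
    for (size_t i = 0; src[i] != '\0'; i++) {
        const char *rep = NULL;
        if (src[i] == '"')
            rep = opens_quote(src, i) ? LDQUO : RDQUO;
        else if (src[i] == '\'')
            rep = opens_quote(src, i) ? LSQUO : RSQUO;
        if (rep != NULL) {
            memcpy(dst + out, rep, 3);
            out += 3;
        } else {
            dst[out++] = src[i];
        }
    }
    dst[out] = '\0';
}
```

Heuristics like this one are easy to fool: a leading apostrophe, as in "'tis", comes out as an opening quote.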

Import and export of other file formats have also improved, including TeX, ODT, the S5 presentation format, and Microsoft DOCX, for which the AbiWord team mentored a student during this year's Google Summer of Code. The code clean-up that brought in Cairo support also replaced the now-deprecated gnome-print printing library with the preferred GTK Print.

In spite of its goal to remain a lean word processor, AbiWord does support some cross-application features common to full office suites. AbiWord documents can be embedded into other applications with the GTK AbiWidget, and AbiWord can now embed Gnumeric spreadsheets within its own documents. Both features received updates in this release.


Finally, the most talked-about change in 2.8 is the substantial update to AbiWord's collaborative editing feature. Collaborative editing was introduced in the 2.6.x code base, with the ability for two AbiWord instances to connect directly to each other over TCP for a shared editing session, or to connect through an XMPP server. Version 2.8 marks the debut of a free web service that functions as a connecting point for AbiWord sessions and as an online document storage service.

Collaborating through the web service

The service provides free user accounts that come with 25MB of document storage. In addition to storing the contents, the site retains a full version history that can roll a document back to a previous state. It also supports export to the AbiWord, ODT, RTF, PDF, HTML, plain text, and DOC formats, has a tagging system intended to help users find their documents more easily, and offers password-protected RSS feeds for monitoring changed files. Users can create a blank document on the site, upload an existing document, or activate sharing on an open document from AbiWord's Collaborate menu.

Those features amount to an online storage service, though; the site's real advantage is that it allows real-time collaborative editing without the hassle of directly connecting two applications by IP address. Site users can share documents with other users or make them globally accessible. Sharing can be read-only or full read-write, set on a document-by-document basis.

There are two ways to connect to other users on the site: individually, by adding them as friends in traditional social-networking style, or through groups. Users can set up their own groups at will, and group owners can manage group membership and set administration privileges for members. The site is still structured around the documents, however; there are no status updates, profile pages, or other social elements. Preserving privacy is also important: potential friends can only be found by searching while logged in, every user can mark their account as invisible to searches, and friend requests must be approved by both parties.

The server relays changes between the users of a shared document using its own synchronization protocol rather than the HTTP connection. Developer Martin Sevior described the protocol as very bandwidth-friendly, and said it was akin to a distributed version control system. Useful as it is, though, the service has some limitations: AbiWord cannot simultaneously share a document through the web service and over a peer-to-peer (TCP or XMPP) connection.


Sevior has said that online office suites like Google Docs and Zoho are the site's main competition, but he believes that integrating sharing into the local desktop application offers a far better work experience than an in-browser editor can provide. AbiWord offers advanced editing features not found in any web application, such as control over margins, tabs, table positioning, footnotes, outlines, and math, he said.

Also, its standard menus and dialogs offer a better user experience than the JavaScript-created menus and dialogs of a web editor, which are often modal, block user input, and can be difficult to activate with the mouse. Finally, he added, AbiWord can handle significantly larger documents without suffering from performance problems, while web browsers begin to struggle with documents of 20 pages or more.

Some free software advocates criticized the site launch last week because the source code to the site is not free. Sevior and fellow developer Marc Maurer acknowledged the concern, but pointed out that the service was new. The team would like to find a way to make the site code free, but they also want to investigate ways to use it to raise funds to help support further development. Ideas include offering larger storage space for a fee and building a custom server for business use, but all of the ideas are just brainstorming at present.

In the meantime, it is still possible to use AbiWord to collaboratively edit documents over a peer-to-peer TCP or XMPP connection. The application does not know or care what network transport mechanism is being used; in fact, work is well underway to use Telepathy as yet another editing-session transport in a future release.

AbiWord has long been a solid word processing choice on the desktop, while Google Docs and other web suites get away with offering fewer editing and formatting features by making document sharing simple. AbiWord 2.8, with built-in real-time collaboration through its web service, is an attempt to do both. Whether it will catch on to the degree that in-browser editors have is anybody's guess, but one must not forget that AbiWord has the advantage of being completely cross-platform, which makes it an option for every computer, just like the web browser.

Gerrit: Google-style code review meets git

October 30, 2009

This article was contributed by Don Marti

Gerrit, a Git-based system for managing code review, is helping to spread the popular distributed revision control system into Android-using companies, many of which have heavy quality assurance, management, and legal processes around software. HTC, Qualcomm, TI, Sony Ericsson, and Android originator Google are all running Gerrit, project leader Shawn Pearce said in a talk at the October 2009 GitTogether event, hosted at Google in Mountain View.

The Gerrit story starts with the progressive escape of an in-house Google process and tool. Google requires code review for any change to company code or configuration files; there are a few exceptions, but those changes are subject to review after deployment. The code review process started out using lots of email, but for the past several years it has been automated. When Guido van Rossum, creator of the Python language, began working at Google in 2005, he started developing a tool (in Python, naturally) to coordinate code reviews. The result, called Mondrian, lets users view a proposed change as a side-by-side comparison and participate in comment threads attached anywhere in the code under review. An overview page shows a to-do list of incoming changes to review and reviewers' comments. Van Rossum presented Mondrian in a public talk in 2006 (video).

Mondrian has been a huge success inside Google, Pearce said. "Almost every engineer uses this as their daily thing." But Mondrian is heavily dependent on Google's internal infrastructure, including the in-house Bigtable non-relational table store and the proprietary Perforce revision control system. Google is a huge Perforce shop, and has built its own highly-customized IT infrastructure, including Perforce-dependent tools.

The first step in making a Mondrian-style tool available to a wider audience was van Rossum's 2008 release of Rietveld, which uses Subversion instead of Perforce and the public interfaces of Google App Engine instead of Google internals. Rietveld is named for the modern architect Gerrit Rietveld. As Google began the Android project, though, developers demanded a Mondrian-like tool for their Git-tracked codebase. Google App Engine was a deal-breaker, because mobile hardware vendors working on Android-based products maintain internal repositories and won't rely on an outside service.

Shawn Pearce, who previously reimplemented git in Java as JGit, and is now at Google, took on the project; the result is Gerrit Code Review, now used to track public proposed changes to Android. Android's applications are written in Java, so writing the new tool in that language should make it more accessible to would-be contributors among Android developers.

Gerrit runs a copy of the Mina SSH daemon, along with JGit, which is now maintained as part of the Eclipse EGit project. Although the combination is slower than the original C git over OpenSSH, it's fast enough for the Android developers. "The entire Android team uses this as their interface to Git," Pearce said. The server-side dependencies are Tomcat and an SQL database, which so far can be MySQL, PostgreSQL, or H2. Gerrit uses OpenID for authentication by default, but can be configured to use HTTP basic (or digest) authentication, or SiteMinder, a single sign-on system from Computer Associates.

On the UI side, Gerrit uses Google Web Toolkit, an Apache-licensed project that compiles Java to JavaScript with AJAX functionality. The UI has a few tiny Flash widgets for convenience (to copy Git command lines to the clipboard, for example), but Flash is not required. A user who prefers not to use the web interface can also ssh to the Gerrit server to execute commands. Gerrit doesn't enforce any particular process to make git look more like the centralized revision control systems that spawned Mondrian and Rietveld. A Gerrit-using developer has a full git install and can still do distributed revision control tricks, such as cherry-picking from a newer upstream release. Gerrit just guards access to its own repository. A developer can set up a git repository with "origin" pointing back to an ssh:// URL on the Gerrit server and do something like centralized development, or have "drive-by" interactions with a Gerrit server as with any other Git repository.

To propose a change for approval through Gerrit, a developer must start a branch in git for that change. Each change, and each iteration of a reworked change, becomes a new branch. In order to preserve information among successive versions of the same work, Gerrit includes a git hook to apply a "Change-Id" line to commit messages. After doing a git push to the Gerrit server, the developer can come back to the web dashboard and see the status of the pending change, then request a code review. Alternatively, a wrapper called Repo lets the developer specify a reviewer on the command line when doing the push.
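The Change-Id line is what lets Gerrit recognize a reworked iteration as the same change. Gerrit's hook also generates the identifier itself; the C sketch below only shows the bookkeeping around it, under two assumptions: that an id is "I" followed by 40 hexadecimal digits, and that the caller supplies one. Both function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define CHANGE_ID_HEX 40  /* assumption: "I" followed by 40 hex digits */

/* If some line of msg is "Change-Id: I<40 hex digits>", copy the id
 * (including the leading 'I') into out and return 1; else return 0.
 * out must hold at least CHANGE_ID_HEX + 2 bytes. */
static int find_change_id(const char *msg, char *out) {
    const char *line = msg;
    while (line != NULL && *line != '\0') {
        if (strncmp(line, "Change-Id: I", 12) == 0) {
            const char *id = line + 11;  /* points at the leading 'I' */
            int ok = 1;
            for (int k = 1; k <= CHANGE_ID_HEX; k++)
                if (!isxdigit((unsigned char)id[k])) { ok = 0; break; }
            if (ok) {
                memcpy(out, id, CHANGE_ID_HEX + 1);
                out[CHANGE_ID_HEX + 1] = '\0';
                return 1;
            }
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return 0;
}

/* Copies msg into dst, appending a "Change-Id: <id>" footer paragraph
 * unless msg already has one. Returns snprintf's result, so a value
 * >= cap means the output was truncated. */
static int ensure_change_id(const char *msg, const char *id, char *dst, size_t cap) {
    char existing[CHANGE_ID_HEX + 2];
    if (find_change_id(msg, existing))
        return snprintf(dst, cap, "%s", msg);
    size_t len = strlen(msg);
    const char *sep = (len > 0 && msg[len - 1] == '\n') ? "\n" : "\n\n";
    return snprintf(dst, cap, "%s%sChange-Id: %s\n", msg, sep, id);
}
```

Because the function leaves a message alone once it carries an id, amending a commit and pushing it again keeps the same Change-Id, which is what allows the rework to be matched with the earlier version.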

Once a reviewer is lined up, Gerrit starts sending email, giving both the URL for the Gerrit page and a git command line for the reviewer to pull the change. On the change page, a reviewer can see the change side-by-side with the original or as a diff, and add review comments anywhere in the code along with a "cover sheet" message. Approval has multiple levels, and the range of levels each reviewer may apply is configurable. Typically, an individual developer would be able to apply -1 or +1, which mean "prefer you don't submit this" and "I like it," while some developers would also have access to the -2 ("do not submit") and +2 ("Approved") levels. The web interface is not required; a reviewer can ssh to the Gerrit server to approve or reject a change.
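The scoring levels lend themselves to a small model. The C sketch below is hypothetical and not taken from Gerrit's source: it assumes each reviewer has a permitted [min, max] range, and that a change may be submitted once it has at least one +2 and no -2, with -1 and +1 treated as advisory.

```c
#include <stddef.h>

/* Range of scores one reviewer is permitted to apply (hypothetical). */
typedef struct {
    int min;  /* e.g. -1 for an ordinary developer, -2 for an approver */
    int max;  /* e.g. +1 for an ordinary developer, +2 for an approver */
} score_range;

/* A vote is accepted only if it lies within the reviewer's range. */
static int vote_allowed(score_range r, int score) {
    return score >= r.min && score <= r.max;
}

/* Assumed submit rule: at least one +2 ("Approved") and no -2
 * ("do not submit"); -1 and +1 do not decide the outcome. */
static int can_submit(const int *scores, size_t n) {
    int approved = 0;
    for (size_t i = 0; i < n; i++) {
        if (scores[i] <= -2)
            return 0;
        if (scores[i] >= 2)
            approved = 1;
    }
    return approved;
}
```

Under this model a single -2 blocks a change no matter how many positive votes it has collected, while a pile of +1 votes alone is never enough.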

A rejected and reworked change with a proper "Change-Id" line preserves Gerrit metadata, and the reviewer can see his or her original comments and the submitter's replies, join an existing comment thread on the previous, rejected version, or start new comment threads anywhere in the new version. If the change is not accepted, the new version has to be a new branch.

Kernel developer David Brown, at the Qualcomm Innovation Center, uses Git and Gerrit with his team. "The biggest complaint people have so far about Gerrit is people have to be constantly rebasing their changes," he said. However, the company has an extensive review process in order to make anything available under a free software license, and Gerrit streamlines the process of approving changes for the people who are authorized to check outgoing code. "The biggest thing that's changed since last year is Gerrit. The second biggest thing that's changed since last year is Gerrit," Brown said. But, he added, doing things the Gerrit way does work. "Most people learn a really small subset of git, I mean a really really small subset of git," he said.

Gerrit can be set up to automatically enforce some policies. "There's a lot of different work models people want," Pearce said. For example, Gerrit can be set up to enforce a check for a signed contributor agreement. The public Gerrit instance for Android enforces the contributor agreement requirement for all modules except the kernel, where only a "Signed-off-by" line is required. Gerrit can be integrated with a bug tracking system (BTS), but the integration is still based on site-specific tricks, since everyone is on a different bug tracker and nobody seems to like theirs very much. Besides better BTS integration, Pearce is looking at ways to store Gerrit metadata in git. "We'd like to do all the things that Gerrit does, offline," he said. "The fact that it doesn't work offline is a bug."
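As an illustration of the kind of policy described here, the sketch below encodes the Android rule: a kernel change needs a Signed-off-by line, while a change to any other project needs a contributor agreement on file. The project naming (treating "kernel" and anything under "kernel/" as the kernel) and all function names are assumptions made for the example; in Gerrit this is a matter of setup rather than code.

```c
#include <string.h>

/* Returns 1 if any line of msg starts with "Signed-off-by: ". */
static int has_signed_off_by(const char *msg) {
    for (const char *line = msg; line != NULL; ) {
        if (strncmp(line, "Signed-off-by: ", 15) == 0)
            return 1;
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return 0;
}

/* Hypothetical naming: "kernel" and anything under "kernel/". */
static int is_kernel_project(const char *project) {
    return strcmp(project, "kernel") == 0 || strncmp(project, "kernel/", 7) == 0;
}

/* Kernel changes need a Signed-off-by line; all others need a signed
 * contributor agreement (has_cla) on file for the uploader. */
static int change_acceptable(const char *project, const char *msg, int has_cla) {
    if (is_kernel_project(project))
        return has_signed_off_by(msg);
    return has_cla;
}
```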

The Android developers are still figuring out how to connect with upstream. Staging maintainer Greg Kroah-Hartman plans to drop Android drivers from drivers/staging as of 2.6.33, as "no one wants to maintain them and help get them merged into the kernel," he said in email. Behind the apparent driver slowness are substantial corporate culture changes, though, with both Qualcomm and TI starting programs to manage outgoing code. Qualcomm is the lead sponsor of Code Aurora Forum, and TI is behind a similar community effort. In the potential minefield that is the mobile industry, with considerations such as not offending carrier partners, securely supporting third-party applications, deploying codecs and GUI code without patent troubles, and complying with radio regulations, Gerrit seems to be a needed focus for gatekeeping efforts.

Page editor: Jonathan Corbet

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds