Back in July, your editor stumbled across Google's Courgette
and promptly added it to the LWN topic slush
pile. He then promptly let it sit for three months or so. The news
that this software is now the subject of a patent suit brought Courgette
back to the foreground; here we'll look at what Courgette is for, how it
works, and how it relates to the patent being asserted.
As most LWN readers will know, Google is working on its own web browser,
called Chrome. The Chrome
developers seem to be focusing on speed, but they are also clearly putting
significant thought into the security of the browser. That is a good
thing: web browsers are large, complex bodies of code which are directly
exposed to whatever a web server might choose to throw at them. The
complexity makes security-related bugs inevitable; the exposure makes them
highly exploitable. Chrome's developers have come to the conclusion that,
when security problems are found, they must be fixed as quickly as
possible.
Prompt patching of bugs requires that they be identified and repaired as
quickly as possible. But the repairs are not useful unless they get to the
browser's users - all of them, or as close to that as possible. The Chrome
developers worried that the sheer size of browser updates would make that
goal harder to achieve. Massive updates take longer to download and
install, are more likely to be interrupted in the middle, and greatly
increase the strain on server bandwidth. Pushing out a fix for a severe
zero-day problem might even tax the bandwidth resources of a company like
Google, leaving users exposed for longer than they should be.
If the size of browser updates could be reduced significantly, it should
become possible to update far more systems in less time. After looking at
various ways to compress patches, the Chrome developers decided to create
their own algorithm; the result was Courgette.
This algorithm is based on the key observation that small changes at the
source level tend to cascade into big changes in binary code; by taking a
small step back toward the source, many of those changes can be abstracted
away.
In particular, Courgette tries to eliminate irrelevant changes to static
pointers. Consider a simple example:
/* ... */
As the program is built, error_exit turns into a specific location
in the code. An irrelevant change elsewhere in the file can cause the
location of error_exit to change; that, in turn, will change the
final compiled form of the goto line even though that line has not
changed. That changed address looks like a difference in the binary file;
when this happens thousands of times over, the binary patch will become
large.
Courgette works by finding static pointers in the code and turning them
back into something that looks like a symbolic identifier. The new
identifiers are generated in a way that ensures that they do not change if
the underlying code has not changed. New versions of the binary (both
before and after patching) are built using the replaced pointers; these
reworked binaries can then be compared with a utility like bsdiff. Since addresses with
unimportant changes have been replaced with consistent identifiers, the two
binaries should be a lot closer to each other and the resulting diff should
be much smaller.
How much smaller? In an example cited on chromium.org, a full update
weighed in at some 10MB. Using bsdiff (which already shrinks binary diffs
considerably) yielded a 700KB change, already a significant improvement.
With Courgette, though, the diff is 78,848 bytes. In other words, the size
of the update has been dropped to less than that of the unpleasant Flash ad
which probably decorates this article. That seems like an improvement
worth having. It also seems like a technology that projects like deltarpm (which is bsdiff-based at
its core) might want to take a look at.
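The pointer-replacement idea described above can be illustrated with a toy model. What follows is only a sketch under loud assumptions: the "binary" is a list of text instructions, the link() and abstract_pointers() helpers are hypothetical, naming labels by the rank of their target address is a simplification and not Courgette's actual scheme, and Python's difflib stands in for bsdiff.

```python
import difflib

def link(source):
    """Resolve symbolic labels to absolute addresses, as a toolchain would.

    source is a list of (op, arg) pairs; ("label", name) marks a location.
    This instruction format is invented for the example.
    """
    addresses, pc = {}, 0
    for op, arg in source:
        if op == "label":
            addresses[arg] = pc
        else:
            pc += 1
    binary = []
    for op, arg in source:
        if op == "label":
            continue
        target = addresses[arg] if op == "jmp" else arg
        binary.append(f"{op} {target}")
    return binary

def abstract_pointers(binary):
    """Replace absolute jump targets with stable names (L0, L1, ...).

    Assumption: names follow the sorted order of the target addresses,
    so shifting all the code by a few slots leaves the names unchanged.
    """
    targets = sorted({int(line.split()[1]) for line in binary
                      if line.startswith("jmp ")})
    names = {addr: f"L{i}" for i, addr in enumerate(targets)}
    return [f"jmp {names[int(line.split()[1])]}" if line.startswith("jmp ")
            else line for line in binary]

def count_changes(old, new):
    """Count changed instruction slots between two toy binaries."""
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")

old_src = [("load", "a"), ("jmp", "error_exit"),
           ("load", "b"), ("jmp", "error_exit"),
           ("load", "c"), ("jmp", "error_exit"),
           ("ret", "0"), ("label", "error_exit"), ("ret", "1")]
# One irrelevant instruction added at the top moves error_exit by one slot.
new_src = [("nop", "0")] + old_src

old_bin, new_bin = link(old_src), link(new_src)
print("raw diff:       ", count_changes(old_bin, new_bin))
print("abstracted diff:", count_changes(abstract_pointers(old_bin),
                                        abstract_pointers(new_bin)))
```

In this toy case every goto changes in the raw binaries even though only one instruction was added; after the addresses are replaced by names, only the added instruction shows up in the diff.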
Enter Red Bend Software and patent
#6,546,552. For the curious, here is the first independent claim from
the patent:
A method for generating a compact difference result between an old
executable program and a new executable program; each program
including reference entries that contain reference that refer to
other entries in the program; the method comprising the steps of:
(a) scanning the old program and for substantially each reference
entry perform steps that include:
(i) replacing the reference of said entry by a distinct label mark,
whereby a modified old program is generated;
(b) scanning the new program and for substantially each reference entry
perform steps that include:
(i) replacing the reference of said entry by a distinct label mark,
whereby a modified new program is generated;
(c) generating said difference result utilizing directly or
indirectly at least said modified old program and modified new
program.
Even for patentese, this language tends toward the impenetrable. But once
one realizes that "reference entries that contain reference that refer to
other entries" means "addresses," it starts to become a little clearer. To
your editor's overtly non-lawyerly, not-legal-advice reading, this claim
does appear to describe what Courgette is doing.
Google is not dealing with a typical patent troll here; Red Bend is a
company which manages over-the-air firmware updates for mobile carriers.
The patent was applied for in 1999, and granted in 2003. This company may
well be in a position to tell a sob story where its bread-and-butter patent
is being stepped on by Google - a company which is now getting into the
business of supplying firmware for mobile phones. On its face, this could
certainly be made to look like just the sort of situation the patent system
was created to deal with.
Of course, there may be prior art which invalidates this patent. But Google
may well find that it's cheaper and easier to just settle with Red Bend,
especially if, as Richard
Cauley argues, the amount of the settlement could be quite small.
Defeating a patent in court is a lengthy, expensive, and risky enterprise;
it would not be surprising if Google decided that it had better things to
do. The real question, in that case, is what sort of terms Google would
negotiate. If Google takes a
page from the Red Hat playbook, it will seek to get this patent
licensed for all free software implementations. That outcome would remove
this patent from consideration in the free software community and keep
Courgette free software. A back-room deal with undisclosed terms, instead,
could leave this useful technique unavailable for the next ten years.
The multi-platform, open source word processor AbiWord was updated to
version 2.8 last week, debuting several new editing features, most notably
expanded real-time collaboration support. AbiWord's collaboration
capabilities are designed to work on top of a variety of underlying
transport mechanisms, but the project is highlighting its AbiCollab.net web service, which
offers not only peer-to-peer collaboration, but also group membership and other
social networking features.
AbiWord is a standalone word processor, and thus has significantly lower
disk and memory footprints than OpenOffice.org, which bundles word
processor, spreadsheet, presenter, and several other office applications
together. In fact, it is the word processor shipped by the One Laptop Per
Child project on its modestly-powered XO laptops. It is built using GTK,
but, like most modern applications, runs on all Linux desktop environments.
The new release was made on October 27, for Linux, Windows and Mac OS X.
Linux users are encouraged
to get binaries through their distribution's package manager, or consult
the wiki for finding third-party packages.
What's new: vector graphics, annotations, and punctuation education
Version 2.8 introduces annotation support, with which users can attach
comments to portions of document text. The annotations are visible as
pop-ups when the cursor moves over the annotated text, and can also be
optionally displayed in the footer of each page. It also adds a flexible
multi-page view, allowing the user to see as much of his or her document as
fits on screen — not as a preview image, but as an open,
AbiWord also supports the use of SVG and WMF graphics inside a document,
and now uses the Cairo rendering engine for greatly increased quality
— on screen as well as printed. Previous releases converted SVG
images on import, resulting in quality degradation. Similarly, according
to the release notes, previous versions of AbiWord had a broken
implementation of "educating quotes" — the process to automatically convert
basic, straight "dumb quotes" into aesthetically curved "smart quotes" — but
the feature has finally been fixed for 2.8.
Import and export of other file formats has also improved, including
TeX, ODT, the S5
presentation format, and Microsoft DOCX, a project on which the AbiWord
team mentored a student during this year's Google Summer of Code. The code
clean-up that included the aforementioned Cairo support also replaced the
now deprecated gnome-print printing library with the preferred GTK
In spite of its goal to remain a lean word processor, AbiWord does
support some cross-application features common to full office suites.
AbiWord documents can be embedded into other applications with the GTK
AbiWidget, and AbiWord can now embed Gnumeric spreadsheets within its own
documents. Both features received updates in this release.
Finally, the most talked-about change in 2.8 is the substantial update
to AbiWord's collaborative editing feature. Collaborative editing was
introduced in the 2.6.x code base, with the ability for two AbiWord
instances to directly connect to each other over TCP for a shared editing
session, or to connect through an XMPP server. 2.8 marks the debut of a
free web service called AbiCollab.net, which functions as a connecting
point for AbiWord sessions, and as an online document storage service.
Collaborating with AbiCollab.net
AbiCollab.net provides free user accounts that come with 25MB of document
storage. In addition to storing the contents, the site retains a full
version history that can roll back the document to a previous state. It
also supports export to the AbiWord, ODT, RTF, PDF, HTML, plain text, and
DOC formats, has a tagging system intended to help users more easily find
their documents, and password-protected RSS feeds for monitoring changed
files. Users can create a blank document on the site, upload an existing
document, or activate AbiCollab.net sharing on an open document from
AbiWord's Collaborate menu.
Those features amount to an online storage service, though;
AbiCollab.net's real advantage is that it allows real-time collaborative
editing without the hassle of directly connecting two applications by IP
address. Site users can share documents with other users or make them
globally accessible. Sharing includes a read-only option as well as full
read-write permission, on a document-by-document basis.
There are two ways to connect to other users on the site: adding
them individually as friends, in traditional social networking style, or connecting
by group. Users can set up their own groups at will, and group owners can
manage group membership and set administration privileges for members. The
site is still structured around the documents, however — there are no
status updates, profile pages, or other social elements. Preserving
privacy is also important; potential friends can only be found through
searching as a logged-in user, and every user can mark their account as
invisible to searches. Friend requests must be approved by both
parties.
The AbiCollab.net server relays changes between two users of a shared
document using its own synchronization protocol, not the HTTP connection.
Developer Martin Sevior described the protocol as very bandwidth-friendly,
and said it was akin to a distributed version control system. As useful as
it is, though, there are some limitations. AbiWord cannot simultaneously
share a document via AbiCollab.net and over a peer-to-peer (TCP or XMPP)
connection.
Sevior has said that online office suites like Google Office and Zoho are AbiCollab.net's main competition,
but he believes that integrating sharing into the local desktop application
offers a work experience far superior to that provided by an in-browser
editor. AbiWord offers advanced editing features not found in any web
application, such as control over margins, tabs, table positioning,
footnotes, outlines, and math, he said.
Also, its standard menus and dialogs offer a better user experience than
— which are often modal, block user input, and can be difficult to
activate with the mouse. Finally, he added, AbiWord can handle
significantly larger documents without suffering from performance problems,
while web browsers begin to struggle with 20 pages or more.
Some free software advocates criticized the AbiCollab.net site launch
last week because the source code to the site is not free. Sevior and
fellow developer Marc Maurer acknowledged the concern, but pointed out that
the service was new. The team would like to find a way to make the site
code free, but they also want to investigate ways to use it to raise funds
to help support further
development. Ideas include offering larger storage space for a fee and
building a custom server for business use, but all of the ideas are just
brainstorming at present.
In the meantime, it is still possible to use AbiWord to collaboratively
edit documents with a peer-to-peer TCP or XMPP connection. The application
does not know or care what network transport mechanism is being used; in
fact, work is well underway to use Telepathy as yet another editing session
transport in a future release.
AbiWord has long been a solid word processing choice on the desktop,
while Google Docs and other web suites get away with offering fewer editing
and formatting features by making document sharing simple. AbiWord 2.8
with built-in real-time editing through AbiCollab.net is an attempt to do
both. Whether it will catch on to the degree that in-browser editors have is
anybody's guess, but one must not forget that AbiWord has the advantage of
being completely cross-platform, which makes it an option for every
computer, just like the web browser.
Gerrit, a Git-based system for managing code
review, is helping to spread the popular distributed
revision control system into Android-using companies,
many of which have heavy quality assurance, management, and legal processes
around software. HTC, Qualcomm, TI, Sony Ericsson,
and Android originator Google are all running Gerrit,
project leader Shawn Pearce said in a talk at the October 2009
hosted at Google in Mountain
View.
The Gerrit story starts with the progressive escape of
an in-house Google process and tool. Google requires
code review for any change to company code or
configuration files; there are a few exceptions, but those are
subject to review after deployment. The code review
process started out using lots of email, but for the
past several years it has been automated. When Guido van
Rossum, creator of the Python language, began working
at Google in 2005, he started developing a tool,
in Python naturally, to coordinate code reviews.
The result, called Mondrian, lets users view the
proposed change as a side-by-side comparison, and
participate in comment threads attached
anywhere in the code under review. An overview
page shows a to-do list of incoming changes
to review and reviewers' comments. Van Rossum presented
Mondrian at a public talk in 2006.
Mondrian has been a huge success inside
Google, Pearce said. "Almost every engineer
uses this as their daily thing." But
Mondrian is heavily dependent on Google's
internal infrastructure, including the in-house Bigtable
non-relational table store and the proprietary Perforce revision
control system. Google is a huge Perforce shop, and
has built its own highly-customized IT infrastructure,
including Perforce-dependent tools.
The first step in making a Mondrian-style
tool available to a wider audience was van
Rossum's 2008 release of Rietveld,
which uses Subversion instead of Perforce, and the
public interfaces of Google App Engine instead of
Google internals. It's named for modern architect Gerrit
Rietveld. As Google began the Android
project, though, developers demanded a Mondrian-like
tool for their codebase, tracked with Git. Google App
Engine was a deal-breaker, because mobile hardware
vendors working on Android-based products maintain
internal repositories, and won't rely on an outside
Shawn Pearce, who previously reimplemented
git in Java as JGit, and is now at Google,
took on the project; the result is Gerrit
Code Review, now used to track public proposed
changes to Android. Android's applications are written
in Java, so writing the new tool in that
language should make it more accessible to would-be
contributors among Android developers.
Gerrit runs a copy of the Mina
SSH daemon, along with JGit, which
is now maintained as part of the Eclipse EGit project.
Although the combination is slower than original
git over OpenSSH, it's fast enough for the Android
developers. "The entire Android team uses this as
their interface to Git," Pearce said. The server-side
dependencies are Tomcat and an SQL database, which
so far can be MySQL, PostgreSQL, or H2. Gerrit
uses OpenID for authentication by default, but can
be configured to use HTTP basic (or digest)
authentication, or Siteminder, a single-sign-on system
from Computer Associates.
On the UI side, Gerrit uses Google
Web Toolkit, an Apache-licensed project that
The UI has a few tiny Flash widgets for convenience
(to copy Git command lines to the clipboard, for example), but Flash
is not required. A user who prefers not to use the
web interface can also ssh to the Gerrit server to
execute commands. Gerrit doesn't enforce any particular processes
to make git look more like the centralized revision
control systems that spawned Mondrian and Rietveld. A
Gerrit-using developer has a full git install and
can still do distributed revision control tricks,
such as cherry-picking from a newer upstream release.
Gerrit just guards access to its own repository.
A developer can set up a git repository with "origin"
pointing back to an ssh:// URL on the Gerrit server,
and do something like centralized development, or do
"drive-by" interactions with a Gerrit server like
any other Git repository.
To propose a change for approval through Gerrit,
a developer must start a branch in git for that
change. Each change, and each iteration of a
reworked change, becomes a new branch. In order to
preserve information among successive versions of
the same work, Gerrit includes a git hook to apply a
"Change-Id" line to commit messages. After doing
git push to the Gerrit server, the
developer can come back to the web dashboard and
see the status of the pending change, then request
a code review. Alternatively, a wrapper called Repo
lets the developer specify a reviewer on the command
line when doing the push.
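The role of the "Change-Id" line can be sketched in a few lines of Python. This is not Gerrit's actual hook: the add_change_id() helper and its seed parameter are hypothetical, and the way the identifier is derived here (a SHA-1 over the seed and the message) is an assumption made purely for illustration. The point is only that the line is added once and then left alone, so a reworked commit carries the same identifier as the version it replaces.

```python
import hashlib
import re

# A Change-Id trailer: the letter "I" followed by 40 hex digits.
CHANGE_ID_RE = re.compile(r"^Change-Id: I[0-9a-f]{40}$", re.MULTILINE)

def add_change_id(message: str, seed: str) -> str:
    """Return the commit message with a Change-Id line appended.

    If the message already has a Change-Id line it is returned untouched,
    which is what lets successive versions of a change be tied together.
    The seed argument is a stand-in for whatever per-commit data a real
    hook would mix in (assumption).
    """
    if CHANGE_ID_RE.search(message):
        return message
    digest = hashlib.sha1((seed + "\n" + message).encode("utf-8")).hexdigest()
    return message.rstrip("\n") + "\n\nChange-Id: I" + digest + "\n"

first = add_change_id("Fix crash in parser\n", "seed-1")
# Amending the commit later runs the hook again; the line is preserved.
second = add_change_id(first, "seed-2")
print(first == second)
```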
Once a reviewer is lined up, Gerrit starts sending
email, giving both the URL for the Gerrit page and
a git command line for the reviewer to pull the
change. On the change page, a reviewer can see the
change side-by-side with the original or as a diff,
and add review comments anywhere in the code along
with a "cover sheet" message. Approval has multiple
levels, with configurable access to the range that
a reviewer can apply. Typically, an individual
developer would be able to apply -1 or +1, which
are "prefer you don't submit this" and "I like it,"
and some would have access to the -2 "do not submit"
and +2 "Approved" levels. The web interface is not
required; a reviewer can ssh to the Gerrit server to
approve or reject a change.
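The approval levels described above might be modeled as follows. This is a hedged sketch, not Gerrit's code: the Reviewer class, cast_vote(), and can_submit() are hypothetical names, and the submit rule used here (at least one +2 and no -2) is an assumption about how such levels would typically be combined, not something stated in the talk.

```python
from dataclasses import dataclass

# The four levels named in the talk; 0 is treated here as "no opinion".
LEVELS = {
    -2: "do not submit",
    -1: "prefer you don't submit this",
    0: "no opinion",
    1: "I like it",
    2: "Approved",
}

@dataclass
class Reviewer:
    name: str
    min_vote: int = -1   # ordinary developers: -1..+1
    max_vote: int = 1

def cast_vote(votes: dict, reviewer: Reviewer, value: int) -> None:
    """Record a vote, enforcing the range configured for this reviewer."""
    if value not in LEVELS:
        raise ValueError(f"unknown approval level: {value}")
    if not reviewer.min_vote <= value <= reviewer.max_vote:
        raise PermissionError(f"{reviewer.name} may not vote {value:+d}")
    votes[reviewer.name] = value

def can_submit(votes: dict) -> bool:
    """Assumed rule: some reviewer voted +2 and nobody voted -2."""
    values = list(votes.values())
    return 2 in values and -2 not in values

votes = {}
dev = Reviewer("dev")
lead = Reviewer("lead", min_vote=-2, max_vote=2)
cast_vote(votes, dev, 1)
cast_vote(votes, lead, 2)
print(can_submit(votes))
```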
A rejected and reworked change with a proper
"Change-Id" line preserves Gerrit metadata, and the
reviewer can see his or her original comments and
the submitter's replies, join an existing comment
thread on the previous, rejected version, or start
new comment threads anywhere in the new version.
If the change is not accepted, the new version has to
be a new branch.
Kernel developer David Brown, at the
Qualcomm Innovation Center, uses Git and Gerrit with
his team. "The biggest complaint people have so far
about Gerrit is people have to be constantly rebasing
their changes," he said. However, the company has an
extensive review process in order to make anything
available under a free software license, and Gerrit
streamlines the process of approving changes for the
people who are authorized to check outgoing code.
"The biggest thing that's changed since last year
is Gerrit. The second biggest thing that's changed
since last year is Gerrit," Brown said. But,
he added, doing things the Gerrit way does work.
"Most people learn a really small subset of git,
I mean a really really small subset of git," he said.
Gerrit can be set up to automatically enforce
some policies. "There's a lot of different work
models people want," Pearce said. For example,
Gerrit can be set up to enforce a check for a signed
contributor agreement. The public Gerrit instance for
Android enforces the contributor agreement requirement for all modules
except the kernel, where only a "Signed-off-by"
line is required. Gerrit can be integrated with a bug
tracking system (BTS), but the integration is still based on
site-specific tricks, since everyone is on a different
bug tracker and nobody seems to like theirs very much.
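A policy check of the kind described above could look something like this. It is only a sketch: the policy_ok() function, the agreements set, and the use of the string "kernel" to identify the module are all hypothetical, and the regular expression is a loose approximation of a "Signed-off-by" line rather than the check Gerrit actually performs.

```python
import re

# Loose match for a line like "Signed-off-by: Name <user@host>".
SIGNED_OFF_RE = re.compile(r"^Signed-off-by: .+ <\S+@\S+>$", re.MULTILINE)

def policy_ok(module: str, message: str, author_email: str,
              agreements: set) -> bool:
    """Decide whether a proposed change meets the contribution policy.

    Assumed policy, following the Android example in the text:
    the kernel only needs a Signed-off-by line, while every other
    module needs a signed contributor agreement on file.
    """
    if module == "kernel":
        return bool(SIGNED_OFF_RE.search(message))
    return author_email in agreements

agreements = {"alice@example.com"}
kernel_msg = "Fix driver\n\nSigned-off-by: Bob Dev <bob@example.com>\n"

print(policy_ok("kernel", kernel_msg, "bob@example.com", agreements))
print(policy_ok("framework", kernel_msg, "bob@example.com", agreements))
```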
Besides better BTS integration, Pearce is looking at
ways to store Gerrit metadata in git. "We'd like to
do all the things that Gerrit does, offline," he said.
"The fact that it doesn't work offline is a bug."
The Android developers are still figuring out how to
connect with upstream. Staging maintainer Greg
Kroah-Hartman plans to drop Android drivers
from drivers/staging as of 2.6.33, as "no one wants
to maintain them and help get them merged into the
kernel," he said in email. Behind the apparent driver
slowness are substantial corporate culture changes,
though, with both Qualcomm and TI starting programs to
manage outgoing code. Qualcomm is the lead sponsor of
Aurora Forum, and TI is behind OmapZoom.org. In the
potential minefield that is the mobile industry, with
considerations such as not offending carrier partners,
securely supporting third-party applications,
deploying codecs and GUI code without patent troubles,
and complying with radio regulations, Gerrit seems
to be a needed focus for gatekeeping efforts.
Page editor: Jonathan Corbet