By Jonathan Corbet
November 2, 2009
Back in July, your editor stumbled across
Google's
Courgette announcement and promptly added it to the LWN topic slush
pile. He then promptly let it sit for three months or so. The
news
that this software is now the subject of a patent suit brought Courgette
back to the foreground; here we'll look at what Courgette is for, how it
works, and how it relates to the patent being asserted.
As most LWN readers will know, Google is working on its own web browser,
called Chrome. The Chrome
developers seem to be focusing on speed, but they are also clearly putting
significant thought into the security of the browser. That is a good
thing: web browsers are a large, complex body of code which are directly
exposed to whatever a web server might choose to throw at them. The
complexity makes security-related bugs inevitable; the exposure makes them
highly exploitable. Chrome's developers have come to the conclusion that,
when security problems are found, they must be fixed as quickly as
possible.
Prompt patching of bugs requires that they be identified and repaired as
quickly as possible. But the repairs are not useful unless they get to the
browser's users - all of them, or as close to that as possible. The Chrome
developers worried that the sheer size of browser updates would make that
goal harder to achieve. Massive updates take longer to download and
install, are more likely to be interrupted in the middle, and greatly
increase the strain on server bandwidth. Pushing out a fix for a severe
zero-day problem might even tax the bandwidth resources of a company like
Google, leaving users exposed for longer than they should be.
If the size of browser updates could be reduced significantly, it should
become possible to update far more systems in less time. After looking at
various ways to compress patches, the Chrome developers decided to create
their own algorithm; the result was Courgette.
This algorithm is based on the key observation that small changes at the
source level tend to cascade into big changes in binary code; by taking a
small step back toward the source, many of those changes can be abstracted
back out.
In particular, Courgette tries to eliminate irrelevant changes to static
pointers. Consider a simple example:
if (some_condition)
goto error_exit;
/* ... */
error_exit:
return -EYOULOSE;
As the program is built, error_exit turns into a specific location
in the code. An irrelevant change elsewhere in the file can cause the
location of error_exit to change; that, in turn, will change the
final compiled form of the goto line even though that line has not
changed. That changed address looks like a difference in the binary file;
when this happens thousands of times over, the binary patch will become
severely bloated.
Courgette works by finding static pointers in the code and turning them
back into something that looks like a symbolic identifier. The new
identifiers are generated in a way that ensures that they do not change if
the underlying code has not changed. New versions of the binary (both
before and after patching) are built using the replaced pointers; these
reworked binaries can then be compared with a utility like bsdiff. Since addresses with
unimportant changes have been replaced with consistent identifiers, the two
binaries should be a lot closer to each other and the resulting diff should
be much smaller.
How much smaller? In an example cited on chromium.org, a full update
weighed in at some 10MB. Using bsdiff (which already shrinks binary diffs
considerably) yielded a 700KB change, already a significant improvement.
With Courgette, though, the diff is 78,848 bytes. In other words, the size
of the update has been dropped to less than that of the unpleasant flash ad
which probably decorates this article. That seems like an improvement
worth having. It also seems like a technology that projects like deltarpm (which is bsdiff-based at
its core) might want to take a look at.
Enter Red Bend Software and patent
#6,546,552. For the curious, here is the first independent claim from
that patent:
A method for generating a compact difference result between an old
executable program and a new executable program; each program
including reference entries that contain reference that refer to
other entries in the program; the method comprising the steps of:
(a) scanning the old program and for substantially each reference
entry perform steps that include:
(i) replacing the reference of said entry by a distinct label mark,
whereby a modified old program is generated;
(b) scanning the new program and for substantially each reference entry
perform steps that include:
(i) replacing the reference of said entry by a distinct label mark,
whereby a modified new program is generated;
(c) generating said difference result utilizing directly or
indirectly at least said modified old program and modified new
program.
Even for patentese, this language tends toward the impenetrable. But once
one realizes that "reference entries that contain reference that refer to
other entries" means "addresses," it starts to become a little clearer. To
your editor's overtly non-lawyerly, not-legal-advice reading, this claim
does appear to describe what Courgette is doing.
Google is not dealing with a typical patent troll here; Red Bend is a
company which manages over-the-air firmware updates for mobile carriers.
The patent was applied for in 1999, and granted in 2003. This company may
well be in a position to tell a sob story where its bread-and-butter patent
is being stepped on by Google - a company which is now getting into the
business of supplying firmware for mobile phones. On its face, this could
certainly be made to look like just the sort of situation the patent system
was created to deal with.
Of course, there may be prior art which invalidates this patent. But Google
may well find that it's cheaper and easier to just settle with Red Bend,
especially if, as Richard
Cauley argues, the amount of the settlement could be quite small.
Defeating a patent in court is a lengthy, expensive, and risky enterprise;
it would not be surprising if Google decided that it had better things to
do. The real question, in that case, is what sort of terms Google would
negotiate. If Google takes a
page from the Red Hat playbook, it will seek to get this patent
licensed for all free software implementations. That outcome would remove
this patent from consideration in the free software community and keep
Courgette free software. A back-room deal with undisclosed terms, instead,
could leave this useful technique unavailable for the next ten years.
(
Log in to post comments)