Courgette meets a dangerous (Red) Bend
As most LWN readers will know, Google is working on its own web browser, called Chrome. The Chrome developers seem to be focusing on speed, but they are also clearly putting significant thought into the security of the browser. That is a good thing: web browsers are a large, complex body of code which are directly exposed to whatever a web server might choose to throw at them. The complexity makes security-related bugs inevitable; the exposure makes them highly exploitable. Chrome's developers have come to the conclusion that, when security problems are found, they must be fixed as quickly as possible.
Prompt patching of bugs requires that they be identified and repaired as quickly as possible. But the repairs are not useful unless they get to the browser's users - all of them, or as close to that as possible. The Chrome developers worried that the sheer size of browser updates would make that goal harder to achieve. Massive updates take longer to download and install, are more likely to be interrupted in the middle, and greatly increase the strain on server bandwidth. Pushing out a fix for a severe zero-day problem might even tax the bandwidth resources of a company like Google, leaving users exposed for longer than they should be.
If the size of browser updates could be reduced significantly, it should become possible to update far more systems in less time. After looking at various ways to compress patches, the Chrome developers decided to create their own algorithm; the result was Courgette. This algorithm is based on the key observation that small changes at the source level tend to cascade into big changes in binary code; by taking a small step back toward the source, many of those changes can be abstracted back out.
In particular, Courgette tries to eliminate irrelevant changes to static pointers. Consider a simple example:
if (some_condition)
goto error_exit;
/* ... */
error_exit:
return -EYOULOSE;
As the program is built, error_exit turns into a specific location in the code. An irrelevant change elsewhere in the file can cause the location of error_exit to change; that, in turn, will change the final compiled form of the goto line even though that line has not changed. That changed address looks like a difference in the binary file; when this happens thousands of times over, the binary patch will become severely bloated.
Courgette works by finding static pointers in the code and turning them back into something that looks like a symbolic identifier. The new identifiers are generated in a way that ensures that they do not change if the underlying code has not changed. New versions of the binary (both before and after patching) are built using the replaced pointers; these reworked binaries can then be compared with a utility like bsdiff. Since addresses with unimportant changes have been replaced with consistent identifiers, the two binaries should be a lot closer to each other and the resulting diff should be much smaller.
How much smaller? In an example cited on chromium.org, a full update weighed in at some 10MB. Using bsdiff (which already shrinks binary diffs considerably) yielded a 700KB change, already a significant improvement. With Courgette, though, the diff is 78,848 bytes. In other words, the size of the update has been dropped to less than that of the unpleasant flash ad which probably decorates this article. That seems like an improvement worth having. It also seems like a technology that projects like deltarpm (which is bsdiff-based at its core) might want to take a look at.
Enter Red Bend Software and patent #6,546,552. For the curious, here is the first independent claim from that patent:
(a) scanning the old program and for substantially each reference entry perform steps that include:
(i) replacing the reference of said entry by a distinct label mark, whereby a modified old program is generated;
(b) scanning the new program and for substantially each reference entry perform steps that include:
(i) replacing the reference of said entry by a distinct label mark, whereby a modified new program is generated;
(c) generating said difference result utilizing directly or indirectly at least said modified old program and modified new program.
Even for patentese, this language tends toward the impenetrable. But once one realizes that "reference entries that contain reference that refer to other entries" means "addresses," it starts to become a little clearer. To your editor's overtly non-lawyerly, not-legal-advice reading, this claim does appear to describe what Courgette is doing.
Google is not dealing with a typical patent troll here; Red Bend is a company which manages over-the-air firmware updates for mobile carriers. The patent was applied for in 1999, and granted in 2003. This company may well be in a position to tell a sob story where its bread-and-butter patent is being stepped on by Google - a company which is now getting into the business of supplying firmware for mobile phones. On its face, this could certainly be made to look like just the sort of situation the patent system was created to deal with.
Of course, there may be prior art which invalidates this patent. But Google
may well find that it's cheaper and easier to just settle with Red Bend,
especially if, as Richard
Cauley argues, the amount of the settlement could be quite small.
Defeating a patent in court is a lengthy, expensive, and risky enterprise;
it would not be surprising if Google decided that it had better things to
do. The real question, in that case, is what sort of terms Google would
negotiate. If Google takes a
page from the Red Hat playbook, it will seek to get this patent
licensed for all free software implementations. That outcome would remove
this patent from consideration in the free software community and keep
Courgette free software. A back-room deal with undisclosed terms, instead,
could leave this useful technique unavailable for the next ten years.
