By Nathan Willis
August 21, 2013
Updates to existing packages can occasionally introduce regression
bugs, which cause considerable turmoil when they hit all of a large
distribution's users at the same time. Ubuntu quietly introduced a
new mechanism in its 13.04 release that progressively rolls out
package updates, pushing each update to a small subset of the total user base first, then steadily scaling up, rather
than publishing the update for everyone simultaneously. "Phased updates" (as they
are known) are designed to catch and
revert buggy package updates before they are propagated out to the
entire user community. On
the server side, the distribution monitors crash reports in order to
decide whether each roll-out should continue or be stopped for
repair. The client-side framework has been in place since the release
of 13.04, but updates themselves only started phasing in August when
all of the server-side components were ready.
Canonical's Brian Murray wrote an
introduction to the new roll-out mechanism on his blog shortly after the system went
live. The system applies to stable release
updates (SRUs) only. SRUs are updates from the main Ubuntu
repositories that by definition are supposed to ship with a
"high degree of stability" and fix critical bugs—in
contrast, for example, to backport
updates, which can introduce new features from upstream releases and
are not supported by Canonical.
On the client end, phased updates are implemented in the
update-manager tool, which is Ubuntu's graphical update
installation application. The other methods for updating a package,
such as apt-get, are not affected by the phased update plan.
The rationale is that a user who runs apt-get to update a
package is expressing a conscious intent to install the new version.
update-manager, in contrast, periodically checks the Ubuntu
package repositories in the background for new updates, so it is a
passive tool.
update-manager generates a random number between zero and
one for each package, then compares it to the
Phased-Update-Percentage value published on the server for
that package. If update-manager's
generated number is less than the published percentage, then the
package will be added to the list of available updates that the user can install. Dependencies for a
package are pulled in automatically; if users are in the update group
for foo, they do not also have to "re-roll the dice" (so to
speak) and wait for libfoo-common as well.
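The comparison described above can be sketched in a few lines of Python. This is only an illustration of the idea, not update-manager's actual code: the function name is_in_phase and its rng parameter are hypothetical, and the sketch makes no attempt to reproduce how update-manager seeds its random number generator or remembers a draw between runs.

```python
import random


def is_in_phase(phased_update_percentage, rng=random.random):
    """Decide whether this machine is offered a phased update.

    phased_update_percentage: the published value, from 0 to 100.
    rng: callable returning a float in [0.0, 1.0); hypothetical hook,
         included only so the draw can be controlled in tests.
    """
    draw = rng()  # random number between zero and one
    # The update is offered only if the draw falls below the
    # published percentage, expressed here as a fraction.
    return draw < phased_update_percentage / 100.0


if __name__ == "__main__":
    # At 10%, roughly one machine in ten should see the update.
    offered = sum(is_in_phase(10) for _ in range(10000))
    print("offered on", offered, "of 10000 simulated machines")
```

Because the draw is always strictly below one, a package at 100% is offered to everyone, and a package at 0% is offered to no one.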
As is probably obvious, controlling the value of
Phased-Update-Percentage throttles the speed at which an
update rolls out. Currently, whenever a new package update is
published, the Phased-Update-Percentage begins at 10%. The
update percentage is incremented by 10% every six hours if nothing
goes wrong, so a complete roll-out takes 54 hours to ramp up to 100%
availability.
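The 54-hour figure follows from nine increments of 10%, each six hours apart. A minimal sketch of that arithmetic, assuming the schedule runs uninterrupted (the function name and its parameters are hypothetical, chosen only for this illustration):

```python
def rollout_schedule(start=10, step=10, interval_hours=6):
    """Return a list of (hours_elapsed, percentage) pairs up to 100%.

    The defaults mirror the values reported in the article; the real
    server-side scheduler is not described there, so this is only
    the arithmetic.
    """
    hours, pct = 0, start
    schedule = [(hours, pct)]
    while pct < 100:
        hours += interval_hours
        pct = min(100, pct + step)
        schedule.append((hours, pct))
    return schedule


if __name__ == "__main__":
    for hours, pct in rollout_schedule():
        print(f"{hours:2d} h: {pct}%")
```

The last entry in the returned list is (54, 100): the update reaches full availability 54 hours after it is first published at 10%.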
Alternatively, if something does go wrong with an update, the
percentage can be dialed back all the way to zero, at which point the
update can be pulled from the repository and then debugged to find and
repair whatever regressions it introduced.
Regressions are counted based on the number of reports generated by
Ubuntu's crash reporter Apport. Apport gathers
system data for each crash (stack traces, core dumps, environment
variables, system metadata, etc.) and after getting the user's consent,
sends a report in to Launchpad. All reports are logged on the Ubuntu
error tracker; when a newly
released update triggers error reports that were not present with the
previous version of the package, the Ubuntu bug squad will pull the
update. When an update is pulled, both the package signer and the
package uploader (who may, of course, be the same person) are notified
via email.
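The underlying test, errors that appear with the new version but not with the old one, amounts to a set difference. The sketch below illustrates only that idea: the crash-signature strings and function names are hypothetical, the way the error tracker actually groups reports is not described here, and in practice the decision to pull an update rests with the bug squad rather than with a script.

```python
def new_crash_signatures(previous_reports, new_reports):
    """Return the signatures seen only with the new package version.

    Both arguments are iterables of crash-signature strings
    (a hypothetical stand-in for whatever key the error tracker uses).
    """
    return set(new_reports) - set(previous_reports)


def looks_like_regression(previous_reports, new_reports):
    """True if the new version produced any previously unseen crash."""
    return bool(new_crash_signatures(previous_reports, new_reports))


if __name__ == "__main__":
    old = ["foo:SIGSEGV:parse_config", "foo:SIGABRT:main"]
    new = ["foo:SIGSEGV:parse_config", "foo:SIGSEGV:render"]
    print(sorted(new_crash_signatures(old, new)))
```

A crash that was already being reported against the previous version does not count against the update, which matches the corner case Murray describes of not stopping an update for crashes it did not introduce.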
In addition to the error tracker, the phased update process is
exposed through several other Ubuntu services. The current update
percentage is tracked on the publishing-history page for each package
(a page that was already used to show publication data and status
information for each package). There is also a phased update overview
page where one can see the current status of every SRU in the
phasing process.
At the moment, the overview page only has data going back to
August 7 (two weeks ago as of press time), so naturally there are only
a handful of SRUs included. There are currently three updates at the
90% level, five at 80%, and two at 0%—indicating that they have
been pulled. Those packages are the BAMF support library for Unity
and—perhaps ironically—Apport. Ironic or not, the "Problems" column of the
overview page links to the error reports for the package in question.
For privacy reasons, the individual reports are only visible to
approved members of the bug-triaging team. In an email, Murray said
that the phased update system has caught five distinct regressions
since its launch on August 7, and that nine package updates have
progressed completely to the 100% distribution phase.
Five regressions caught may not seem like many, but in the
context of Ubuntu's large installed user base, catching them before
they are distributed to the entire community is likely to have averted
several thousand application crashes. In his blog post on phased
updates, Murray commented that the system supports some corner cases,
such as not stopping an update if the team knows that the crashes it
sees were not introduced by the update itself. He also pointed out
that the system is new, so the team is still experimenting with the
various parameters (such as the speed of roll-out itself and the
utilities used to detect regressions introduced by a package).
The other interesting dimension of the system is that the subset of
users who get access to the updated package at each phase is a random
sample. That should ensure that the error reports come from a more
statistically valid set of machines than, say, a self-selected "early
adopter" group or a set of customers paying for first access.
The notion of being the first person to test out an update may make
some users uncomfortable (at least some in the comments on Murray's
blog post suggested as much), but it is important to remember that the
updates being phased in are the SRUs, not experimental updates. SRUs are
already required to go through a testing and sign-off process, so they
should be stable; the fact that there are sometimes still errors and
regressions is simply a fact of life in the software world.
Nevertheless, Murray's post says it is possible to opt out of the
phased update system entirely by adding a directive to
/etc/apt/apt.conf. Opting out means that
update-manager will only report updates as available when
they reach the 100% phase, by which point they should be more
error-free. Alternatively, the impatient can simply use
apt-get and install all updates immediately.