By Jonathan Corbet
May 20, 2008
Developers in the Debian project had a busy week cleaning up after the
openssl vulnerability was disclosed. Once that was taken care of, they
moved on to process-related issues. Clearly, some shortcomings in how
Debian handles patches to the programs it ships have been revealed; now the
project would like to face those problems and make things work better in
the future. The resulting discussion shows Debian at its introspective
best, and may well have results that other distributors will want to pay
attention to. As a Fedora developer noted: "This bug could easily have
been us on the receiving end."
All distributors make changes to their packages, so all of them are
potentially exposed to this kind of failure.
Debian's packaging policy resembles that of most other distributions. A
Debian source package is supposed to contain a tarball of the upstream
source distribution, without changes. Any distribution-specific patches
are included separately and applied when the source package is prepared for
building. There are a couple of Debian-specific issues to be faced, though:
- From the discussion, it seems that the "pristine upstream
tarball" rule is occasionally bent by developers. Sometimes there is
no alternative: some upstream source distributions contain material
which, due to its licensing, cannot be shipped by Debian. The
justification for other cases is not always quite as clear.
- Debian's patches are all mashed together and included as a single diff
file. So there is no metadata describing the patches, and they are
difficult to separate from each other. In this regard, Debian differs
from RPM-based distributions, which generally keep each patch separate.
The end result of all this is that Debian's patches are hard for others to
review, hard for upstream projects to consider, and even hard for other
Debian developers to get a handle on.
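To see why a combined diff is hard to pick apart, consider the following
minimal Python sketch. It assumes a unified diff as input; the function
name split_by_file and the sample file names are hypothetical and chosen
only for illustration. The most it can do is group changes by the file
they touch. Nothing in the diff says which logical patch a hunk belongs
to, who wrote it, or why.

```python
from typing import Dict, List


def split_by_file(diff_text: str) -> Dict[str, List[str]]:
    """Group the lines of a unified diff by the file each section modifies.

    Sketch only: a removed line whose content begins with "-- " directly
    followed by an added line beginning with "++ " would be misread as a
    file header.
    """
    chunks: Dict[str, List[str]] = {}
    current = None
    lines = diff_text.splitlines()
    for index, line in enumerate(lines):
        next_line = lines[index + 1] if index + 1 < len(lines) else ""
        # A per-file section starts with '--- old' followed by '+++ new'.
        if line.startswith("--- ") and next_line.startswith("+++ "):
            # Drop the '+++ ' prefix and any tab-separated timestamp.
            target = next_line[4:].split("\t", 1)[0]
            current = chunks.setdefault(target, [])
        if current is not None:
            current.append(line)
    return chunks


# Made-up combined diff touching two files.
sample = """\
--- pkg-1.0.orig/src/rand.c
+++ pkg-1.0/src/rand.c
@@ -1,2 +1,1 @@
 keep();
-old_call();
--- pkg-1.0.orig/debian/rules
+++ pkg-1.0/debian/rules
@@ -1,1 +1,2 @@
 all:
+# local tweak
"""

chunks = split_by_file(sample)
for path, body in chunks.items():
    print(path, len(body), "lines")
```

If two unrelated fixes happen to touch the same file, this kind of
splitting cannot tell them apart, which is the heart of the complaint.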
Raphaël Hertzog started a discussion
on how to improve this situation. A key part of his approach (and an idea
which others have been pursuing as well) is to make changes to the Debian
source package format which would make the nature of each patch explicit.
At a minimum, packagers would include a debian/patches directory
with the source; that directory would contain each patch, broken out into a
separate file. Some Debian packages are built this way already, though the
practice is far from universal.
Beyond that, though, it would be nice to have the source package itself
understand the patch stream and its associated metadata. There are a few
proposals for this; Raphaël favors the "3.0 (quilt)" format, which
keeps the patches (in a separate tarball) as a quilt series. This format
seems to have a certain amount of support; among other things, its
simplicity would make it easy for Debian developers to create packages in
this format without having to learn new tools. The quilt series file -
like the spec file used with RPM packages - makes it clear which patches
must be applied, and in which order.
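As a rough illustration of how little machinery the series file needs,
here is a minimal Python sketch of a reader for it. It assumes the usual
quilt conventions: one patch per line, optional patch options (such as
-p1) after the name, lines starting with "#" ignored, and trailing
comments introduced by a space and a hash. The parse_series name and the
patch names in the example are hypothetical.

```python
from typing import List, Tuple


def parse_series(text: str) -> List[Tuple[str, List[str]]]:
    """Return (patch_name, options) pairs in the order they are applied."""
    patches = []
    for raw in text.splitlines():
        stripped = raw.strip()
        # Skip blank lines and whole-line comments.
        if not stripped or stripped.startswith("#"):
            continue
        # Drop a trailing comment introduced by " #".
        entry = stripped.split(" #", 1)[0]
        name, *options = entry.split()
        patches.append((name, options))
    return patches


# Hypothetical series file; the patch names are made up for illustration.
example = """\
# Patches carried by the distribution
01-fix-build.patch
02-remove-nonfree-docs.patch -p1

03-debian-paths.patch # local policy change
"""

series = parse_series(example)
for position, (name, options) in enumerate(series, start=1):
    print(position, name, " ".join(options))
```

The order of the lines is the order of application, so a reviewer can
read the file from top to bottom and know exactly what the package
builds from.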
There are other variants of the 3.0 source package format, though. The
"3.0 (git)" format contains a git repository containing the
upstream source and a series of patches to it. This approach has the
advantage of including the history of the patches along with the other
metadata; it could also, arguably, make it easier for other distributors
(and upstream) to cherry-pick patches of interest. On the other hand, a
git-based package format requires the availability of git and has the
potential to make those packages larger. The GitSrc FAQ has some more
information on this format; there's also a "3.0 (bzr)" format
variant out there.
Any of these new formats, if widely adopted, would bring a new level of
transparency to Debian's patching activities. It would enable the creation
of a "patches.debian.org" site (clearly inspired by patches.ubuntu.com) where anybody
could quickly look at the changes which have been made to any given
package. There are some developers who doubt the utility of this; they
worry that upstream developers won't want to poll a site to see what
changes have been made to their code. At least one developer (GNOME hacker
Vincent Untz) thinks that a patches.debian.org site would be a step in the
right direction, though.
Another quibble which has been heard is that Debian does not need any new
infrastructure for patch management. The right place for patch tracking,
it is said, is with the upstream project. Nobody seems to challenge the
claim that more patches need to go back upstream, but there is also the
fact that quite a few patches will never get there. The upstream
developers for a number of projects seem to have different goals and are
seen by the distribution maintainers as being overtly uncooperative. And
some patches - such as those removing non-free material - may not be
something that even cooperative upstream maintainers want. So there will
always be a need for distribution-specific patches; the "track it upstream"
approach will not solve the whole problem.
Meanwhile, Joey Hess brought a completely different
idea to the discussion: just treat every divergence from upstream as a
bug. Each patch would have a corresponding entry in the Debian bug
tracking system (BTS) with a special tag. Anybody could then query the
list of outstanding bugs, view the patches, and participate in the
associated discussion. Using the BTS brings some real technical
advantages, in that the system already exists. But, Joey says, the real benefit is elsewhere:
"The biggest reason for using the BTS is not technical. It's that,
if we decide that the project will treat divergence from upstream
as a bug, then we've effectively decided that maintainers will be
responsible for both minimising unnecessary divergence,
communicating about it to upstream, and for keeping track of what
divergence exists. Because developers are responsible for their
bugs."
A separate patch tracking mechanism, instead, would be a mostly automatic
subsystem on the side which might not bring the same sort of pressure to
bear on developers.
The BTS approach is not universally acclaimed either. Some developers
claim that most Debian-specific patches are not really Debian bugs - they
are, instead, upstream bugs. Regardless of whether that is really true,
distribution bug trackers generally carry a great many entries which, in
the end, describe bugs in upstream packages. Another complaint is that
creating and maintaining BTS entries would be just another bit of
bureaucratic work imposed on Debian developers. Beyond any doubt, some
developers would see it that way.
But this may be a place where a bit more bureaucracy makes some sense. The
Linux distributors of the world (certainly not just Debian) are carrying
thousands of patches against the free programs they distribute. Making the
nature and extent of those patches more readily apparent can only be
beneficial for users, reviewers, distributors, and upstream maintainers.
One clear conclusion from recent events is that all distributors could do
more to let the rest of the community know about the changes they are making.
A distributor's ability to patch a program is a crucial part of the whole
ecosystem - it's the distributors' way of balancing their users' needs
against the upstream maintainer's policies. But distributors should be
clear about the changes they are making, willing to merge those changes
upstream whenever possible, and open to feedback on those patches. Any
"bureaucracy" which helps to make that happen can only help our community
as a whole in ways that go far beyond the avoidance of another openssl
disaster.
One final note: the existence of source package formats which incorporate
distributed version control system repositories shows that developers have
been thinking about this problem for a while; it's not just a response to
recent events. There is an effort underway to think about what the
intersection of version control and packaging can really achieve for all
distributors; the folks working on this project can be found at
vcs-pkg.org. They are working on organizing a gathering this September in
Extremadura. Vcs-pkg is worth watching; it has the
potential to make things work better for developers and users of all
distributions.