By Nathan Willis
September 11, 2013
It is pretty easy to get a new project up and running with Git, but
integrating Git—or any other new version control
system—can be painful for an existing project with an
established code base. Such is the case for Debian, with its tens of
thousands of packages spread across multiple versions of the distribution. Migrating
Debian to a Git-based version-control system would be a herculean
ordeal, if that were even a task that the project was interested in
undertaking. But Ian Jackson recently unveiled a new tool that serves
as a bridge between the official Debian archives and a Git repository,
thus allowing developers to use a Git workflow while remaining fully
integrated with the archive.
The tool is called dgit; Jackson announced version 0.7, the first
"suitable for alpha and beta testers", on August 22. The
concept behind it was hashed out during DebConf13 in mid-August. As
Jackson explained it, the goal was to allow package maintainers and
developers to use "a gitish workflow" if they so desired,
including working with upstream Git repositories and preserving Git
histories, but without forcing a Git-based workflow on anyone who was
happier using the status quo.
The bird's-eye view of dgit is that it treats the Debian archive
(which contains all of the packages that make up a Debian release) as
if it were a remote Git repository. A developer can clone or fetch a
package from the archive, commit and merge changes, and push updates
to the package, all using dgit commands that mirror, in most
ways, the offerings of Git itself. But this functionality is on
demand; if no developer runs dgit clone on a particular package,
no Git view of it is created. Doing so automatically for
every Debian package would consume far too many resources.
Thus, there is quite a bit of work going on behind the scenes to
keep the archive and the dgit view of the package in sync. For example, when a
developer runs:
dgit clone foopackage sid
dgit initializes a Git repository on Debian's Alioth server, pulls in
the contents of foopackage from the sid
distribution, then constructs the local repository on the developer's
machine. The developer can then use normal Git tools (raw
command-line or otherwise) as desired. When it is time to upload
changes, dgit sbuild constructs the source package.
Then, a dgit push both pushes the current HEAD to
the remote Git repository on Alioth and uploads the source package to
the Debian archive.
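The clone, build, and push cycle described above can be sketched as a small shell script. It is only an illustration: the helper names run and dgit_cycle and the DRY_RUN switch are hypothetical, the script uses only the dgit subcommands mentioned in this article (with no options), and it assumes that dgit clone places the clone in a directory named after the package. By default it merely prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch of the dgit workflow described in the article.
# DRY_RUN=1 (the default) only prints the commands; DRY_RUN=0 would
# actually run them on a machine with dgit installed.
set -e
DRY_RUN=${DRY_RUN:-1}

# Print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

dgit_cycle() {
    pkg=$1
    suite=$2
    # Create the Git view of the package (if needed) and a local clone.
    run dgit clone "$pkg" "$suite"
    # Assumption: the clone lives in a directory named after the package.
    run cd "$pkg"
    # ... edit and commit here with ordinary Git tools ...
    # Construct the source package from the working tree.
    run dgit sbuild
    # Push HEAD to the dgit repository and upload to the archive.
    run dgit push
}

dgit_cycle foopackage sid
```

Run as-is, the script prints the four steps in order; the editing and committing that a developer would do between the clone and the build are left as a comment.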
Things get more difficult when a package is modified by something
other than changes made directly on the dgit local branch,
such as a set of patches. The tool includes a
dgit quilt-fixup command to integrate with the
quilt patch manager
(which lets maintainers keep track of a set of patches that need to be
applied before each upload). The quilt-fixup command creates a
"synthetic commit" which is then added to the Git history
before the package is pushed. However, as Jackson noted in the man
page and on the debian-devel mailing list, this is an imperfect
solution.
Jackson pointed out some peculiarities of quilt that make it
incompatible (at least for the time being) with dgit. For example,
if one uses dpkg-source to build a source package in Debian's
quilt-compatible format and then extracts the result (again using
dpkg-source), the extracted contents are not identical to the
original: dpkg-source generates extra metadata files. This makes it
difficult to use quilt to apply a set of
patches and push the results with dgit, so Jackson recommended
steering clear of quilt-formatted source packages altogether.
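The round-trip mismatch amounts to a difference between two directory trees, which can be checked mechanically. The sketch below is a hypothetical helper (extra_files is not part of dgit or dpkg-source) that lists files present in an extracted tree but absent from the original; the demo fakes both trees, and the .pc/applied-patches name is only an illustrative stand-in for the kind of metadata involved. It compares file names only, not contents; diff -r would also report files whose contents differ.

```shell
#!/bin/sh
# Hypothetical helper: list files that exist in the extracted tree
# ($2) but not in the original tree ($1), one relative path per line.
extra_files() {
    orig=$1
    extracted=$2
    list_a=$(mktemp)
    list_b=$(mktemp)
    # Sort both lists the same way so comm can compare them.
    (cd "$orig" && find . -type f) | LC_ALL=C sort > "$list_a"
    (cd "$extracted" && find . -type f) | LC_ALL=C sort > "$list_b"
    # -13: suppress lines unique to the original and lines in both.
    LC_ALL=C comm -13 "$list_a" "$list_b"
    rm -f "$list_a" "$list_b"
}

# Demo with fake trees; the metadata file name is illustrative only.
demo=$(mktemp -d)
mkdir -p "$demo/orig" "$demo/extracted/.pc"
echo 'int main(void) { return 0; }' > "$demo/orig/main.c"
cp "$demo/orig/main.c" "$demo/extracted/main.c"
touch "$demo/extracted/.pc/applied-patches"
extra_files "$demo/orig" "$demo/extracted"   # prints ./.pc/applied-patches
rm -rf "$demo"
```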
On the mailing list, Raphael Hertzog took some umbrage at Jackson's
description of this issue as "brain damage" on quilt's part. In the
ensuing discussion, Hertzog and Jackson eventually reached an
impasse. The disagreement boils down to what is considered the
"normal" workflow—specifically, how a developer should manage
both local changes and a set of quilt-managed patches. Hertzog
contends that developers should record their own local changes as a
separate patch in quilt, while Jackson believes local changes should
be orthogonal to those patches managed in quilt. But with
Jackson's workflow, quilt copes with the local changes by adding
metadata, in the form of the extra files seen by
dpkg-source.
In any case, Jackson
eventually decided to simply work around the oddities that result from
trying to use quilt and dgit together. It is certainly possible for a
developer to use dgit without worrying about the issue, merely by not
bringing quilt into the mix. Of course, asking a developer to start
using a different workflow is rarely a welcome suggestion, but there
is hope that the distinctions will eventually be smoothed over.
There are some other limitations, however. For now, dgit is only
usable by official Debian Developers (DDs); non-DDs cannot even
create a read-only view of a dgit repository. This is due to the
access control setup deployed on the Debian servers; it may be
resolved in the future when Jackson and the system administrators have
sufficient time.
Hertzog also inquired whether there
might be any lessons to learn from Ubuntu's Distributed
Development (UDD) project, which automatically imported all packages in
the Ubuntu and Debian archives into repositories for use with Bazaar.
"Automatic" import is in many ways wishful thinking; as several
participants reported, Ubuntu found that a variety of special cases
require manual intervention to repair an imported package, and it can
be problematic to get the full commit history of each
package, which can involve upstream changes, patches, and commits
made by individual developers. Ubuntu had it easier than Debian
because UDD was limited to a single, Bazaar-based workflow. Since
Debian is (at least for the foreseeable future) committed to giving
its developers and maintainers the freedom to use any workflow they
wish, deploying something like dgit for the entire Debian package
archive would probably require more people-power than the project has.
No doubt many interesting things could be done with a Git repository
containing the entire Debian archive and accessible to the world.
Dgit is not likely to reach that stage
any time soon, but, as Jackson pointed out, he wanted something that he
could deploy and use immediately. And it is clearly good news that
Debian developers can begin using dgit now; Git has proven itself to
be the version-control system of choice in free software at large, so
integrating it with one of the premier free software distributions is
sure to reap benefits for developers and Debian users alike.