Distributions
Bringing Git workflows to Debian with dgit
When introducing his talk at DebConf 2015 in Heidelberg, Ian Jackson said he was there to "plug" dgit, which is a system that lets users treat the Debian archive as a remote Git repository. In fact, of course, dgit has already proven itself popular among Debian Maintainers (DMs) and other users, since it allows Git-based workflows for patching and uploading Debian packages—and does so without disrupting Debian's existing infrastructure. But there are still quite a few DMs who do not use dgit for the packages they maintain, so Jackson made a case that the tool enriches Debian as a whole the more it is used.
Debian volunteers take on a variety of different roles, he said, which boil down to "package maintainer" (i.e., DM) and "everything else" (a category that covers those people doing bug squashing, downstream projects and derivatives, users, and those doing non-maintainer uploads or NMUs). Dgit offers different advantages for DMs than for others, he said, so he addressed the groups separately.
Dgit for non-maintainers
For people in the "everyone else" group, the point of dgit is that they can access the archive like a repository. A user can clone any package in any suite (e.g., "stable," "unstable," or "experimental") and will get a source tree that is identical to the output of dpkg-source -x. This result is the same regardless of any choices made by the package maintainer (e.g., preferred packaging format, Git workflow, etc.). As Jackson addressed from time to time, adhering to that principle of identical output affects how dgit behaves in a number of key situations.
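As a rough illustration of the clone step, the Python sketch below builds a dgit clone command line for a given package and suite. The helper name dgit_clone_command and the example package "hello" are hypothetical, and the argument order (package first, then suite) is an assumption based on dgit's usual usage. Nothing is executed; the caller decides whether to run the result.

```python
import shlex
from typing import List


def dgit_clone_command(package: str, suite: str = "unstable") -> List[str]:
    """Build (but do not run) a `dgit clone` command line.

    Hypothetical helper for illustration only; it assumes the
    `dgit clone PACKAGE SUITE` argument order.
    """
    if not package or package.startswith("-"):
        raise ValueError("package must be a non-empty name, not an option")
    if not suite or suite.startswith("-"):
        raise ValueError("suite must be a non-empty name, not an option")
    return ["dgit", "clone", package, suite]


if __name__ == "__main__":
    # Only prints the command. On a machine with dgit installed, it could
    # be run with subprocess.run(cmd, check=True).
    cmd = dgit_clone_command("hello", "stable")
    print(shlex.join(cmd))
```

Returning an argument list rather than a shell string keeps the package and suite names from being reinterpreted by a shell if the command is later passed to subprocess.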
![Ian Jackson at DebConf](https://static.lwn.net/images/2015/09-debconf-jackson-sm.jpg)
With package source fetched through dgit, Jackson said, the user can then work with the code exactly as they would with any other Git project: local commits, cherry-picking changes from other branches, resetting, cleaning, rebasing, and all of the other "gittish stuff" is supported. Actually, he added, git log and git blame are a bit different, but he would explain why during the talk.
For typical tasks, such as creating patches, users would use Git itself. Users can push their changes to any Git server that they have access to—although pushing to the dgit server does not automatically pass changes through to the Debian archive. Only a user with the proper permissions—namely, a DM or a Debian Developer (DD)—can do that. That rule guarantees that what is fetched with dgit is always identical to what resides in the archive.
But because a source tree fetched with dgit can then be pushed to any other remote Git server, downstream projects and derivative distributions can use dgit and do away with manually wrangling all of Debian's source packages—downloading and importing them into a version-control system in particular.
It is important to understand, he said, that dgit does not replace any build infrastructure (though it does provide wrappers for several common Debian package-building tools). Even DMs using dgit still have to perform builds before they upload a new binary package to the archive. As Debian improves support for source-only uploads, that may change, but for now there is a distinction between source uploads and binary uploads, and dgit does what it can to support the build process.
Behind the scenes, he explained, dgit provides a set of Git repositories that is parallel to the archive, although it runs on a different server. When a privileged user runs dgit push on a package, two things happen. First, dgit tags and pushes to the remote dgit server. Then it performs a traditional package upload to the archive, except that one additional field is added to the package's .dsc source-control file: the Git commit hash.
After that push, whenever any user does a dgit clone or dgit fetch, dgit looks for the hash field in the archive's source package. If there is one, dgit uses the corresponding commit from the Git history on the dgit server to complete the operation. If there is no commit-hash field, that means the package's most recent upload (and, perhaps, many or even all uploads) did not come through dgit, so dgit imports the package into Git. If necessary, it stitches the newly imported version into any existing Git history in the dgit server.
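The decision described above can be sketched in Python. This is a minimal illustration, not dgit's actual code: the field name "Dgit", the 40-character hexadecimal hash format, and the function names are all assumptions, since the talk only says that a commit-hash field is added to the .dsc file. The parser handles plain "Name: value" fields and stops at a PGP signature block.

```python
import re
from typing import Dict, Optional, Tuple


def parse_control_fields(text: str) -> Dict[str, str]:
    """Parse simple 'Name: value' fields from .dsc-style text.

    Field names are lowercased. Continuation lines (leading whitespace)
    are appended to the previous field. Parsing stops at a PGP signature.
    """
    fields: Dict[str, str] = {}
    current: Optional[str] = None
    for line in text.splitlines():
        if line.startswith("-----BEGIN PGP SIGNATURE"):
            break
        if not line.strip() or line.startswith("-----"):
            current = None
            continue
        if line[0] in " \t":
            if current is not None:
                fields[current] += "\n" + line.strip()
            continue
        name, sep, value = line.partition(":")
        if not sep:
            current = None
            continue
        current = name.strip().lower()
        fields[current] = value.strip()
    return fields


def dgit_commit(dsc_text: str) -> Optional[str]:
    """Return the commit hash from the (assumed) 'Dgit' field, if any."""
    value = parse_control_fields(dsc_text).get("dgit", "")
    match = re.match(r"[0-9a-f]{40}\b", value)
    return match.group(0) if match else None


def plan_fetch(dsc_text: str) -> Tuple[str, Optional[str]]:
    """Choose between existing Git history and a fresh import."""
    commit = dgit_commit(dsc_text)
    if commit is not None:
        return ("use-git-history", commit)
    return ("import-from-archive", None)
```

A source package uploaded without dgit simply lacks the field, so plan_fetch falls through to the import path, mirroring the fallback Jackson described.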
The dgit server only stores changes pushed to the server using dgit. As mentioned, when a DM does a dgit push, the altered package is uploaded to the archive. When an unprivileged user doing bug-fixing runs dgit push, something different happens: dgit takes the user's sequence of commits and turns it into an ordered sequence of patches that the package maintainer can use. That behavior, Jackson said, means that it always remains up to the DM's discretion whether or not to use dgit. Those who do not use dgit still get patches that they can incorporate into their workflow, and the history of other users' work is still available to the public. On the other hand, he cautioned, users creating patches with dgit must not also submit their patches some other way, lest the DM (and dgit) get confused.
Sadly, he said, when DMs do not use dgit for a package, dgit's history for the package in question will clearly not include everything in the DM's history, so some potentially useful information is lost. Dgit attempts to work around this choice; it looks in each source package for an X-Vcs-Git header, which a maintainer might use to indicate that they are working from some other Git server. If the header is found when a user clones a package, dgit adds the indicated server as a remote in the user's Git configuration.
But, even then, dgit still bases its repository contents on the source package in the archive. That preserves the principle that dgit mirrors the archive, and it covers those situations when the Git server listed in the header drops offline or simply does not exist. In addition, he said, there are quite a few maintainer trees specified in the X-Vcs-Git header that only contain packaging data (like the debian/ directory or a set of incoming patches).
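The header lookup can be illustrated with a short Python sketch. It is an assumption-laden example rather than dgit's implementation: the function name, the remote name "vcs-git", and the example URL are hypothetical, and the pattern accepts the header with or without the "X-" prefix. Only the first token of the value is treated as the URL, since such headers sometimes carry extra options after it.

```python
import re
from typing import List, Optional

# Matches "Vcs-Git: URL ..." or "X-Vcs-Git: URL ..." at the start of a line.
_VCS_GIT = re.compile(r"^(?:X-)?Vcs-Git:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE)


def vcs_git_remote_command(dsc_text: str,
                           remote_name: str = "vcs-git") -> Optional[List[str]]:
    """Build a `git remote add` command for the maintainer's repository.

    Returns None when the source package has no usable Vcs-Git header.
    The default remote name is an assumption made for this sketch.
    """
    match = _VCS_GIT.search(dsc_text)
    if match is None:
        return None
    return ["git", "remote", "add", remote_name, match.group(1)]


if __name__ == "__main__":
    sample = "Source: hello\nVcs-Git: https://example.org/hello.git -b debian/sid\n"
    print(vcs_git_remote_command(sample))
```

The command only registers the maintainer's server as an extra remote. It does not change which tree the clone is based on, consistent with the rule that the repository contents come from the archive.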
Dgit for maintainers
Quite a few users like dgit and would love for more DMs to start using it, too, Jackson said. It makes the maintainer's history visible to the users in a uniform fashion, whereas relying on X-Vcs-Git means that each user must learn each DM's "special snowflake workflow." But, he said, there are several other reasons why DMs could benefit from using dgit.
First, it has the potential to simplify maintenance tasks. Dgit makes users' branches and patches readily accessible for merging, and when both the user and the DM are using dgit, one can more readily count on their Git histories being in sync. Second, using dgit automatically publishes the DM's Git history online (at browse.dgit.debian.org), saving the DM the additional overhead of publishing their history.
Dgit also ensures that the source the maintainer uploads to the Debian archive is exactly the same as the contents of their Git HEAD. And it can spare DMs some additional tests and sanity checks on the .dsc and other package control files.
Jackson then explained how dgit integrates with the various Debian packaging workflow tools in use by DMs. The simplest case is for a DM using native or source-format 1.0 packages. Those packaging options require no changes to the source tree fetched with dgit, so DMs using them can adopt dgit immediately.
For packages in the 3.0 "quilt" format, things are more complicated, because the quilt source-tree format might differ from the Git tree. If the DM uses git-buildpackage, then what dgit produces is essentially what Jackson called a "patches-applied packaging branch without a .pc directory." In other words, all changes made against the upstream source have been applied in the source tree and are also included in a patch that is kept in the debian/patches/ directory. Jackson said he had been collaborating with git-buildpackage maintainer Guido Guenther to bridge the gap between what git-buildpackage expects and what dgit currently produces.
If the DM uses git-dpm (the other main tool for working with quilt), however, the outlook is less rosy. The big hurdle there is that git-dpm ignores .gitignore files in the package when it performs the build, which means that those files are then lost when the resulting package is uploaded to the archive. That breaks the cardinal "always be identical to the archive" rule of dgit, Jackson said, but so far he has not been able to convince git-dpm maintainer Bernhard Link to change git-dpm's behavior. If Link cannot be convinced, Jackson said, he may have to add a git-dpm–specific workaround.
Jackson closed out the session by listing a few items still on his to-do list for dgit. One outstanding problem, for example, is that many packages currently include files that are not in the DM's Git branch—such as autotools output. There is no one-size-fits-all solution to handling these extra files, since maintainers' workflows vary, but he thinks it is solvable. There are also new potential uses for dgit, he said. The server already manages access control for dgit users, for instance, so perhaps it could offload some of that responsibility from the Alioth project-hosting server. There is also a rumor that Ubuntu is interested in running its own dgit server, which could bring new developers and patches to the project.
From the outside, dgit may appear to go to a lot of trouble to unite two disparate ways of working with software: the Debian archive provides a central, world-readable store of source packages, while Git is aimed at enabling multiple remote developers to work in a distributed network. But Jackson reminded the audience what it gets out of the deal: anyone can download a Debian source package and see both the original product of the upstream developers and every patch applied by Debian. That is a valuable record to preserve; dgit simply makes it accessible from within the world's leading version-control software, too.
[The author would like to thank the Debian project for travel assistance to attend DebConf 2015.]
Brief items
Distribution quotes of the week
Distribution News
Debian GNU/Linux
Debian stable releases
Debian 8.2, the second update to the stable distribution v8 "jessie", has been released. The oldstable distribution, Debian 7 "wheezy", has been updated to v7.9.
These updates mainly add corrections for security problems and other serious issues.
Newsletters and articles of interest
Distribution newsletters
- Debian Project News (September 2)
- DistroWatch Weekly, Issue 626 (September 7)
- openSUSE weekly review (September 10)
- Ubuntu Weekly Newsletter, Issue 433 (September 6)
Page editor: Rebecca Sobol