Automating architecture bootstrapping in Debian
Debian supports a lengthy list of hardware architectures—twelve on the official list, plus twelve unofficial ports and a variety of other "port-like" projects such as distributions based on non-Linux kernels. Nevertheless, starting a new architecture-support effort involves a lot of repetitive work that Helmut Grohne (and others) think could be automated. Grohne presented the topic at DebConf 2015 in Heidelberg, discussing the issues involved when bootstrapping a new architecture and what needs to be improved. The good news is that progress is being made and that the work benefits the rest of the project, even those not interested in architecture bootstrapping.
In fact, Grohne started the session by discussing why everyone in Debian should care about automating the architecture-bootstrap process. "Bootstrapping," he said, just means the process of getting the initial, core suite of Debian packages up and running on the new platform. Roughly speaking, that means getting the new architecture to the point where the build-essential metapackage can be used; at that point most other Debian packages can be built on the target system.
The project averages about one new bootstrap per year, he said; ARM64 and PowerPC64-EL are the most recently added architectures, while MIPS64-EL, RISC-V, and OpenRISC are on the horizon. Improving the bootstrapping process will only make Debian a more inviting distribution in areas like embedded development, he said, where Debian may not be the OS of choice. But it also forces the project to re-examine much of its build-from-source tool set, which might otherwise languish, and improving the process could encourage new projects like bootstrapping sub-architectures (for example, creating an x32-optimized port of Debian, or a port that uses the musl C library).
Grohne is the author of rebootstrap, a QA tool for bootstrapping a new architecture. It currently runs on Debian's Jenkins server, testing 20 different architectures about once each week. Each test tries to cross-build about 100 packages, which is only a subset of the packages build-essential pulls in or depends on. Nevertheless, rebootstrap has caught 190 bugs so far (120 of which have been fixed). Grohne plans to expand the package set covered by rebootstrap, but said that one of the lasting benefits of the process is catching and fixing bugs in the core package set.
Cross toolchains and cross-building
He then turned his attention to outlining the steps involved in bootstrapping an architecture, beginning with a description of the cross toolchains used in Debian. Two options are in common usage; both include a version of GCC that can cross-compile for the target architecture, plus target-architecture versions of binutils, glibc, glibc headers, and gcc-defaults. The two toolchains differ in how dependencies are handled: one expects multi-architecture builds to be available on the build system for all dependencies, while the other expects target-architecture versions of all dependencies.
Both of the approaches work, Grohne said. The toolchain packages are now in Debian unstable (which was not true as recently as two years ago). As a result, most bootstrapping projects can begin with the back-and-forth GCC/glibc "dance." First the user cross-compiles a minimalist version of GCC for the new architecture, which is then used to build the glibc headers package. Then a bit more of GCC can be built, which in turn allows more of glibc to be built, and so forth.
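The alternation can be pictured as a chain of stages, each of which needs the output of the one before it. The Python sketch below is only an illustration of that ordering: the stage names are hypothetical labels rather than Debian package names, and the real number of rounds depends on the toolchain.

```python
from graphlib import TopologicalSorter

# Hypothetical stage labels (not real package names).
# Each stage maps to the set of stages that must be built first.
STAGES = {
    "gcc-minimal": set(),
    "glibc-headers": {"gcc-minimal"},
    "gcc-intermediate": {"glibc-headers"},
    "glibc-full": {"gcc-intermediate"},
    "gcc-full": {"glibc-full"},
}


def bootstrap_order(stages):
    """Return the stages in an order that respects every prerequisite."""
    return list(TopologicalSorter(stages).static_order())
```

Calling bootstrap_order(STAGES) yields the stages from the minimal compiler to the full one, alternating between GCC and glibc.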
There are, however, a few architectures where cross toolchain support is still problematic. Alpha and HPPA have glibc conflicts, while OpenRISC, RISC-V, armel, armhf, and SuperH have GCC bugs. Patches are available to fix each of these problems, but they have not yet been merged. Thus, anyone needing to bootstrap or cross-compile on those architectures will need to get the patches from the bug-tracking system and apply them before proceeding. Grohne encouraged anyone who saw their "favorite architecture" on the problematic list to get in touch after the talk.
He then described the process for cross-building an individual package. Thanks to the Emdebian team, some packages have supported cross-building for close to ten years. For the rest, most Debian packages can be cross-built using sbuild or dpkg-buildpackage, so long as the appropriate flags are set to build for the target architecture. What does cause problems, though, is satisfying a package's Build-Depends dependencies when cross-building.
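As a rough illustration of what "the appropriate flags" can look like, the Python sketch below assembles a command line for either tool. It assumes the -a (host architecture) option of dpkg-buildpackage and the --host option of sbuild; the helper name cross_build_command and the choice of an unsigned, binary-only build are assumptions made for the example, not something described in the talk.

```python
def cross_build_command(host_arch, tool="dpkg-buildpackage"):
    """Build an argument list for cross-building the package in the
    current source tree for host_arch (for example "arm64")."""
    if tool == "dpkg-buildpackage":
        # -a<arch> selects the host architecture; -b requests a
        # binary-only build; -uc and -us skip signing.
        return ["dpkg-buildpackage", "-a" + host_arch, "-b", "-uc", "-us"]
    if tool == "sbuild":
        # --host=<arch> asks sbuild to build for that architecture.
        return ["sbuild", "--host=" + host_arch]
    raise ValueError("unsupported tool: " + tool)
```

The resulting list could be handed to subprocess.run() from inside an unpacked source package; whether the build then succeeds depends on the Build-Depends issues described next.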
Problems and solutions
A lot of packages in the Debian archive are multi-architecture, which should allow the build system's version to satisfy Build-Depends for a cross-build. But, in reality, the long chains of transitive dependencies can break down if just one package without multi-architecture support is involved. Grohne said that out of Debian's 20,000 packages, Build-Depends problems mean that only about 3,000 can be automatically cross-built. There is a web page available that monitors the status of the dependency issues; interested developers can check there for packages that need attention.
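To see why a single package can spoil a long chain, consider this small Python sketch. It walks the transitive build dependencies of a package and reports the first one that lacks multi-architecture support. The package names and the multiarch_ready set are hypothetical, and the model ignores versions and alternatives.

```python
def find_blocker(package, depends, multiarch_ready):
    """Return a transitive dependency of package that lacks
    multi-architecture support, or None if the chain is clean."""
    stack = list(depends.get(package, ()))
    seen = set()
    while stack:
        dep = stack.pop()
        if dep in seen:
            continue  # already checked; also guards against cycles
        seen.add(dep)
        if dep not in multiarch_ready:
            return dep
        stack.extend(depends.get(dep, ()))
    return None


# Hypothetical chain: app -> liba -> libb -> libc
DEPENDS = {"app": ["liba"], "liba": ["libb"], "libb": ["libc"]}
```

With liba and libc marked as ready but libb not, find_blocker("app", DEPENDS, {"liba", "libc"}) returns "libb", even though every other link in the chain is fine.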
In many cases, he said, the fixes required to unstick a problematic Build-Depends chain are simple enough—such as rewriting dependency rules that inadvertently assume that the build architecture and host architecture are the same. For example, he said, the dependency rule:
Build-Depends: g++ (>= 4:5)
is probably meant to specify that the package should be built with a recent version of G++, but during a cross-build the rule is interpreted as requiring a g++ package for the target system, rather than a compiler that runs on the build machine. For now, bootstrappers usually solve these problems through a lot of manual effort. Better solutions have been proposed, such as special "compiler for host" packages, which could be specified in dependency rules:
Build-Depends: g++-for-host (>= 4:5)
A proof-of-concept package for this idea is in Debian experimental.
Interested Debian contributors can also make a significant difference by adding multi-architecture support to more and more packages in the archive. Most of the work required involves straightforward fixes, such as changing compiler references to use target triplets (which allow different build and host architectures).
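A minimal Python sketch of the triplet idea follows. It assumes the usual GNU convention of prefixing a tool name with the host triplet; the small lookup table covers only three architectures as examples, and the function name is hypothetical. On a real system the triplet would normally come from dpkg-architecture (the DEB_HOST_GNU_TYPE variable) rather than a hard-coded table.

```python
# Example Debian architecture names and their GNU triplets.
GNU_TRIPLETS = {
    "amd64": "x86_64-linux-gnu",
    "arm64": "aarch64-linux-gnu",
    "armhf": "arm-linux-gnueabihf",
}


def compiler_for_host(host_arch, tool="g++"):
    """Return the triplet-prefixed tool name for host_arch, so that
    the same build rule works for native and cross builds."""
    return GNU_TRIPLETS[host_arch] + "-" + tool
```

A build rule that calls compiler_for_host("arm64") gets "aarch64-linux-gnu-g++" instead of a bare "g++", which would silently produce code for the build machine.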
There are a few "funky issues" that arise when working on multi-architecture support, however. The most common is encountered in interpreted languages. For example, an "Architecture: any" Perl application may depend on an "Architecture: all" Perl module, which in turn depends on an "Architecture: any" Perl extension. But "all" and "any" are not the same to the dependency resolver. Whereas "all" usually designates a package that will work, unaltered, on any processor (such as a collection of Perl scripts), "any" means that the package can be built for any architecture.
Unfortunately, due to that minor distinction, passing through the "all" architecture rule in the middle of the chain breaks the chain, since the build system's version of the package satisfies that dependency. At that point, the dependency resolver stops looking for packages in the target architecture. The bootstrapping team has not yet decided on a solution to this problem, he said, although there is a workaround: manually changing "all" to "any" and adding another rule (Multi-Arch: same) to every dependency in the chain.
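The effect can be modeled in a few lines of Python. This is a deliberately simplified sketch of the behavior described above, not the real resolver: it assumes that an "Architecture: all" package is satisfied by the build system's copy and that everything below it is then resolved for the build architecture. The package names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    name: str
    architecture: str  # "any" or "all"


def resolve_chain(chain, build_arch, host_arch):
    """Return (name, architecture) pairs showing which architecture
    each package in a linear dependency chain is resolved for."""
    wanted = host_arch
    resolved = []
    for pkg in chain:
        if pkg.architecture == "all":
            # The build system's copy satisfies the dependency, so the
            # rest of the chain is no longer looked up for the host.
            wanted = build_arch
        resolved.append((pkg.name, wanted))
    return resolved


CHAIN = [
    Package("perl-app", "any"),
    Package("perl-module", "all"),
    Package("perl-extension", "any"),
]
```

For a build on amd64 targeting arm64, resolve_chain(CHAIN, "amd64", "arm64") resolves perl-extension for amd64, which is the wrong architecture. Marking perl-module as "any", as in the workaround, keeps the whole chain on arm64.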
There are, of course, quite a few other problems encountered when cross-building a large set of packages. Grohne gave multiple examples, some of which raise difficult-to-answer questions. For example, there are some packages that are their own build dependency (he noted cracklib2 and nss in this group) because they expect to access certain data files during the build process, and those files are shipped in the same package as the source code. Fixing that circular dependency without breaking native builds requires careful thought, he said.
Grohne closed the session with a brief status report and some ideas for future development. Bootstrapping a new architecture currently involves about 500 source packages. His rebootstrap tool tests only about 100 of those, which means it would require a lot of additional work to be comprehensive. Instead, he has proposed implementing the Build Profiles specification, which would essentially allow developers to define a separate set of build dependencies and compilation targets to be used for cross-builds. If widely implemented, it could reduce the amount of manual tweaking required. The architecture-bootstrapping team has added Build Profile support to a number of core packages already, but more remains to be done.
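In the Build Profiles syntax, as best understood here, a build dependency can carry restrictions in angle brackets, such as "libfoo-dev <!stage1>": terms inside one pair of brackets must all hold, and any one bracket group is enough. The Python sketch below evaluates such a restriction for a single dependency clause. The package names are hypothetical, the profile names (stage1, cross, nocheck) are used purely as examples, and the parser is a simplification that ignores architecture restrictions.

```python
import re


def _term_holds(term, active_profiles):
    """A term like "cross" needs the profile; "!cross" forbids it."""
    if term.startswith("!"):
        return term[1:] not in active_profiles
    return term in active_profiles


def dependency_enabled(clause, active_profiles):
    """Decide whether one Build-Depends clause applies under the
    given set of active build profiles."""
    # Drop version constraints such as "(>= 4:5)" so that their
    # "<" and ">" characters are not mistaken for restrictions.
    without_versions = re.sub(r"\([^)]*\)", "", clause)
    groups = re.findall(r"<([^<>]*)>", without_versions)
    if not groups:
        return True  # unrestricted dependency
    # Groups are alternatives; terms within a group must all hold.
    return any(
        all(_term_holds(term, active_profiles) for term in group.split())
        for group in groups
    )
```

For example, dependency_enabled("libfoo-dev <!stage1>", {"stage1"}) is False, so a first-stage bootstrap build could skip that dependency, while an ordinary build with no active profiles would still require it.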
At the conclusion of the talk, the audience had quite a few questions for Grohne, most of which focused on the particulars of cross-compilation or of specifying build dependencies. On the whole, it seems as though the Debian community is interested in doing what it can to make cross-building packages more reliable. For developers interested in bringing Debian up from scratch on a new processor architecture, the long-term outlook may be good, but there is considerable work to be done in the days ahead.
[The author would like to thank the Debian project for travel assistance to attend DebConf 2015.]
