October 21, 2011
This article was contributed by Jon Masters
[Editor's note: this is the second of a two-part series on the creation
of a Fedora distribution for the ARM architecture, contributed by Red Hat
developer Jon Masters. Part 1 covers
the history leading up to the
current effort; this part will look at how the Fedora ARM distribution
was bootstrapped and where things will go from here.]
Like most distributions, Fedora uses binary software packages (RPMs in this
case) to
manage installed software. These packages are built using complex sets of
build dependencies (other software packages), some of which are not
explicit dependencies but rather implied through their fixed presence in
the standard "buildroots" (chroot environments containing a basic set of
packages) used in the Fedora build infrastructure. All of these build
dependencies have to initially come from somewhere. Ordinarily, the build
system takes care of managing this, picking up dependencies as required, but
in the case of a complete bootstrap there are no pre-built dependencies, so
they must be created from scratch. Not only that, but build dependencies
must be built using a minimal "bootstrap" configuration that avoids circular
source dependencies on packages not built yet, and also provides enough basic
functionality to rebuild the full set of dependencies using standard tools.
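The circular-dependency problem can be illustrated with a minimal sketch. The package names and dependency sets below are hypothetical, and this is not how the Fedora build system represents dependencies; it only shows why a reduced "bootstrap" configuration makes a build order possible where the full configuration has none.

```python
from graphlib import CycleError, TopologicalSorter


def build_order(deps):
    """Return a valid build order for {package: set of build dependencies}.

    Raises CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Report whether the dependency mapping contains a circular dependency."""
    try:
        build_order(deps)
    except CycleError:
        return True
    return False


# Hypothetical full configuration: each package needs the other to build.
full = {"pkg-a": {"pkg-b"}, "pkg-b": {"pkg-a"}}

# Hypothetical bootstrap configuration: pkg-a is built with the optional
# feature that needs pkg-b switched off, which removes the cycle.
bootstrap = {"pkg-a": set(), "pkg-b": {"pkg-a"}}

if __name__ == "__main__":
    print(has_cycle(full))
    print(build_order(bootstrap))
```

Once the bootstrap versions exist, the full configuration can be rebuilt against them, which is the two-pass approach described above.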
Currently, there is no convenient way to automatically bootstrap Fedora
from scratch. Individual packages lack the level of data that would be
required, such as a special set of configuration options sufficient to
build a minimal "bootstrap" version of the core packages (free of circular
dependencies) that could then be used to rebuild the normal configuration.
Fedora is not alone in missing support for automated bootstrap, which is
currently being investigated by a number of distributions. One of the
problems has been that a full bootstrap remains such a rare event that it
is difficult to justify the level of effort required to maintain support
for automatically bootstrapping in the future (as opposed to making a
large number of package changes for one bootstrap effort). Fedora may well
eventually support automated bootstrap (a topic to be fully explored). In
the meantime, bootstrapping for ARMv7 involved many successive stages.
The first involved building a limited cross-compiling Fedora 15
gcc toolchain for use on an x86 host and targeting the "hard float" ARMv7 ABI.
This toolchain was then used to build a native ARMv7 "hard float" toolchain
and associated minimal dependencies necessary to run that toolchain within a
chroot environment on an ARM system. The output of stage 1 was a
minimal root filesystem, sufficient for use as a chroot (although not
bootable), along with a set of bootstrap scripts that can be used in the future
should it ever become necessary to repeat this exercise. DJ Delorie wrote the
majority of the stage 1 bootstrap scripts, which are available in Fedora's
ARM git repository at git://fedorapeople.org/~djdelorie/bootstrap.git. These bootstrap
scripts have since been used by other distributions and developers to
experiment with ideas of their own around bootstrap, including several
comparative studies of the different toolchain options that might be used.
Stage 2 of the bootstrap process used the minimal chroot environment from
stage 1 (which contained a native "hard float" gcc toolchain) to build up a
small set of packages sufficient to run a native "hard float"
version of rpmbuild, a tool used to build RPM packages from source. The
goal was not to get a version of rpmbuild sufficient to build every package
in the distribution, but rather sufficient to rebuild the packages from
this and the previous stage as "real" RPMs. The choice to build only a
minimally useful version of rpmbuild was important. It meant that it was
possible to reduce the number of build-time dependency packages that would
have to be built manually from source to a minimal set. Further, the
extensive use of GNU autotools in core distribution packages meant that
many of those dependencies would automatically configure away advanced
features not necessary at this stage. Stage 2 resulted in the creation of
a directory containing a number of "recipe" scripts, one per package
dependency, that could be run in order by means of a numeric prefix on each script name.
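As a rough sketch of how numbered recipe scripts might be ordered, the following sorts file names by their leading number. The file names and the exact prefix format are assumptions made for illustration; the actual stage 2 scripts may have been named differently.

```python
import re


def recipe_order(names):
    """Sort recipe script names by their leading numeric prefix.

    Names without a numeric prefix are placed last, in alphabetical order.
    """
    def key(name):
        match = re.match(r"(\d+)", name)
        if match:
            return (0, int(match.group(1)), name)
        return (1, 0, name)

    return sorted(names, key=key)


if __name__ == "__main__":
    # Hypothetical recipe names, not taken from the real repository.
    for script in recipe_order(["10-zlib.sh", "2-make.sh", "030-popt.sh"]):
        print(script)
```

Sorting on the integer value rather than the raw string means that "2-make.sh" runs before "10-zlib.sh" even without zero padding.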
Using git as a filesystem
Perhaps the most novel thing about stage 2 was the use of git early on. This
author had an idea that, since git supported distributed development, more
Fedora developers could collaborate to build a minimal root filesystem
sufficient for building further bootstrap packages if that filesystem were
itself a git repository, along with the scripts used to create it. A git
repository was created that initially contained the stage 1 filesystem
contents (the "hard float" toolchain components). To this was added a
"/stage2" directory, containing a "recipe.d" subdirectory. The latter housed
a number of individual scripts necessary to build particular packages from sources also
contained within the same filesystem. The use of git to manage the entire
work-in-progress chroot meant that all prospective developers would need to do
would be to clone the filesystem, and then make it available to an ARM target
system as a chroot (typically by having a second root-owned clone that was
NFS-exported to a suitable development system). It was ugly, but it worked.
Each time a developer added a new recipe for an RPM build requirement in
"/stage2/recipe.d", they were able to commit that recipe along with all of the
sources for a given package, its populated build directory after running
the build for the package, and the subsequent changes to the filesystem after
installation. Each git commit thus represented one more step toward having
a usable set of dependencies to build rpmbuild. In cases where there were not
direct build interdependencies between the basic set of packages, developers
working on stage 2 were able to parallelize the effort by building packages in
their own git branches prior to issuing a standard "pull" request to
the maintainer of the filesystem. This had the benefit of providing a
git history for general posterity. The output of stage 2 was a working
rpmbuild.
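The idea of parallelizing builds that have no direct interdependencies can be sketched as grouping packages into batches, where everything in one batch can be built at the same time (for example, in separate git branches). The dependency data here is illustrative only and is not the actual stage 2 package list.

```python
from graphlib import TopologicalSorter


def parallel_batches(deps):
    """Group packages into batches that can be built concurrently.

    deps maps each package to the set of packages it needs at build time.
    Every package in a batch depends only on packages from earlier batches.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    # Illustrative dependency data (an assumption for this sketch).
    deps = {"rpm": {"popt", "zlib"}, "popt": set(), "zlib": set()}
    for number, batch in enumerate(parallel_batches(deps), start=1):
        print(number, batch)
```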
Once an rpmbuild binary was available and it became possible to build RPMs,
it was on to the next stage. Stage 3 involved taking the filesystem from
stage 2 (which now contained enough binaries and dependencies to support
running minimal RPM builds of the software that had been built from source
up until that time) and rebuilding its contents as regular RPM packages,
then adding sufficient additional dependency packages to build the standard
"yum" and "mock" Fedora packaging tools. A new top-level directory, "/stage3",
was created and within it various builds were performed using rpmbuild, with
the output source and binary packages preserved. After each build successfully
completed, its build directory was removed (there was no need to preserve it,
since the source and binary RPMs contained enough data to reproduce it if
that became necessary). The resulting package was installed into the same
filesystem, overwriting any files that had previously been manually built
and installed during stage 2 with real RPM versions. The resulting changes
to the filesystem then formed yet another git commit for easy tracking.
During stage 3, developers could once again parallelize various builds that
were not inter-dependent, commit the result of those builds into their own git
branches, and then submit pull requests for their branches to be merged
into the filesystem. The whole git approach might at first appear to be
error-prone, but it did not prove so in practice, since the packages
being built were largely independent and did not modify the filesystem
in ways that could not be trivially resolved during a git merge. Eventually,
the total size of the git repository containing the root filesystem exceeded
4.6GB (which git handled with ease) and all of the necessary packages sufficient
to build and use the "yum" and "mock" utilities were available. This included
both perl and python as well as many plugins. Yum is a utility used to manage
installing software from repositories of packages. It can be driven by Mock,
which is a tool that creates chroots of installed packages and then uses
those to build other packages in a trusted and reproducible fashion.
The git approach to bootstrapping seemed to pay off. It was retired
following the completion of stage 3 because there were sufficient RPM
packages available and infrastructure to use yum and mock directly to build
other packages. In addition, at this stage it became possible for Dennis
Gilmore to use the mock support to build a few more packages and to then
create a minimal bootable Fedora 15 "hard float" image for installation
onto a few PandaBoards. Images were also created with alternative kernels
for other ARM systems, including the CompuLab TrimSlice (based on the Nvidia
Tegra-2). These images were made available to developers who then had a
choice of booting the images or using them as chroots on an existing
Fedora 13 installation (in which case mock builds ran in a chroot within a
chroot). Collectively, these images were then used for
stage 4 of the bootstrap.
The stage 4 images included a set of scripts written by Dennis Gilmore
and subsequently enhanced by many others, including Henrik Nordström.
These scripts implemented a poor man's distributed build system (for
various reasons, it was not possible to run a full Koji build system yet).
A dedicated virtual server machine was created to host various source and
successfully built packages in a "stage4" yum repo. Then, each developer
with a build system running the stage 4 image was able to automate the
process of retrieving a package from the server, building it in mock on
their ARM system, and contributing the results back to the overall repository
of built stage 4 packages. Using this approach, over 13,000 binary
"armv7hl" (Fedora's name for the ARMv7 hard-float, little-endian architecture)
packages were built over a period of a couple of weeks. These were combined
with the existing "noarch" (meaning architecture independent) packages from
the primary architecture (x86) to create an initial set of Fedora 15 ARM packages.
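The per-builder loop described above might look roughly like the following. This is a sketch under stated assumptions, not the actual stage 4 scripts: the function names are hypothetical, fetching and uploading are left out, and the mock configuration name is a placeholder. Only the mock options "-r", "--resultdir", and "--rebuild" are real.

```python
import subprocess


def mock_rebuild_command(srpm, config, resultdir):
    """Build the mock command line that rebuilds one source RPM."""
    return ["mock", "-r", config, "--resultdir", resultdir, "--rebuild", srpm]


def build_all(srpms, config, resultdir, run=subprocess.call):
    """Rebuild each source RPM in turn and sort the outcomes.

    run is injectable so the loop can be exercised without mock installed;
    by default it executes the command and returns its exit status.
    """
    built, failed = [], []
    for srpm in srpms:
        status = run(mock_rebuild_command(srpm, config, resultdir))
        if status == 0:
            built.append(srpm)
        else:
            failed.append(srpm)
    return built, failed


if __name__ == "__main__":
    # Dry run with a stand-in runner that prints instead of executing.
    def show(cmd):
        print(" ".join(cmd))
        return 0

    # "example-arm-config" is a placeholder, not a real mock configuration.
    build_all(["foo-1.0-1.fc15.src.rpm"], "example-arm-config", "/tmp/results", run=show)
```

A real builder would also need to retrieve the source package from the shared server beforehand and contribute the results back afterward, which this sketch omits.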
Stage 4 was the largest stage by far, since its target was to build essentially the
entire package set of the distribution. At this point, it was discovered that
many source packages required some tweaking. Some did not build for ARM at
first and needed fixes (usually trivial) or the occasional patch to be committed
into the Fedora git repositories and built first for the primary architecture.
In cases where ARM-specific changes were not yet in the official Fedora
git repositories, special package versions were built with an obvious and
compatible change to the release number (such as appending ".arm0"), chosen so
that updates later built from the fixed git repository would not conflict. A
tool was later written by DJ Delorie to automatically track down those packages
that required fixes yet to be committed in order to ensure all Fedora ARM
packages were fully in sync with the original primary architecture set.
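To see why a suffix such as ".arm0" does not conflict with later updates, consider a simplified version of RPM-style release comparison. This sketch is not RPM's real implementation (it ignores special cases such as the tilde), and the release strings are hypothetical examples.

```python
import re


def split_segments(release):
    """Split a release string into runs of digits and runs of letters."""
    return re.findall(r"\d+|[A-Za-z]+", release)


def compare_release(a, b):
    """Return 1 if a is newer than b, -1 if older, 0 if equal (simplified)."""
    seg_a, seg_b = split_segments(a), split_segments(b)
    for x, y in zip(seg_a, seg_b):
        if x.isdigit() and y.isdigit():
            # Numeric segments compare by value.
            if int(x) != int(y):
                return 1 if int(x) > int(y) else -1
        elif x.isdigit() != y.isdigit():
            # A numeric segment is considered newer than an alphabetic one.
            return 1 if x.isdigit() else -1
        elif x != y:
            # Alphabetic segments compare as strings.
            return 1 if x > y else -1
    if len(seg_a) == len(seg_b):
        return 0
    # With a common prefix, the string with segments left over is newer.
    return 1 if len(seg_a) > len(seg_b) else -1


if __name__ == "__main__":
    print(compare_release("1.fc15.arm0", "1.fc15"))
    print(compare_release("2.fc15", "1.fc15.arm0"))
```

Under these rules, a hypothetical "1.fc15.arm0" sorts after "1.fc15" but before "2.fc15", so the ARM-specific build supersedes the unfixed package and is in turn superseded by the next official build.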
During stage 4, it had been decided generally to build the Fedora 15 package
set as it was on the day Fedora 15 was released for the primary architecture,
not including all of the updates that might have been built in the interim
(unless those updates contained necessary ARM or non-ARM fixes). This is
because the intent was to first bootstrap the entire distribution to the
same point as Fedora 15 had been at release on the primary architecture.
This known combination of packages could then be used to rebuild the entire
package set in the stage 5 final Koji mass rebuild. Updates would then be
built from the primary architecture using the known good Fedora 15 base.
Henrik Nordström in particular assisted greatly at this stage in building
packages and fixing problems with stage 4, even though he did
not always agree with the pedantry of targeting non-update packages first.
And that brings the reader up to the present day. As of this writing, a
core set of Fedora 15 packages has been built and the distribution is
usable on ARMv7 systems (at least experimentally). The final stage is to
rebuild everything one more time using the standard build infrastructure
in a known good configuration, using the official ARM Koji system. Fedora
protocol is that each package be built exactly once (for official builds)
for a given release and at that time it should be built for all variants
of the particular architecture. This means that it is necessary to wait
for a corresponding initial set of ARMv5 packages to be built prior to
completing the stage 5 final mass rebuild of Fedora 15. Since ARMv5 is
not a new architecture, and is not changing ABI, all that is required
there is a stage 4-like rebuild using mock. This is currently being performed
using many of the same builders that bootstrapped ARMv7, under the control
of a new set of scripts named "moji", written by Jon Chiappetta of Seneca College.
Once a mass rebuild of the entire Fedora 15 package set has been completed
using Koji, an initial ARM release of Fedora 15 will be made available. It is
hoped that this will happen prior to the final release of Fedora 16 on the
primary architecture (in early November), or at least soon thereafter. It
should be noted that until Fedora ARM is a primary architecture, and while
it is playing a constant game of catch-up, no release will be entirely
finished. Instead, it will be good enough for the criteria established for
that particular release. This means that there will be some (non-essential)
packages that do not make the initial release of Fedora 15. These will
follow the release as there is time to work on making them available.
Following the Fedora 15 release, attention will turn to the next phase. It
has been proposed that the team skip Fedora 16 and target getting rawhide
(the development build of Fedora) built as the next step, using a tool
known as koji-shadow to automatically rebuild all rawhide package builds
within a reasonable period of their being built on the primary architecture.
Whether "reasonable" is one day or several days will have to be decided, but it
is clear that, by targeting rawhide as quickly as possible, Fedora ARM can
benefit from following along closely as Fedora 17 and 18 are released.
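The core bookkeeping behind shadowing primary-architecture builds can be sketched as a comparison of the latest build of each package on both sides. This is only an illustration of the idea, with made-up package data and a hypothetical function name; it is not koji-shadow's actual logic or interface.

```python
def pending_rebuilds(primary, secondary):
    """List packages whose latest primary build has no matching secondary build.

    Both arguments map a package name to the name-version-release string of
    its latest build; a missing or different entry means a rebuild is pending.
    """
    return sorted(
        name for name, nvr in primary.items() if secondary.get(name) != nvr
    )


if __name__ == "__main__":
    # Made-up build data for illustration.
    primary = {"bash": "bash-4.2.10-4", "zlib": "zlib-1.2.5-5", "vim": "vim-7.3.315-1"}
    secondary = {"bash": "bash-4.2.10-4", "zlib": "zlib-1.2.5-4"}
    print(pending_rebuilds(primary, secondary))
```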
The future of Fedora ARM
The Fedora ARM project has gained greater cohesion over the past six months as
greater participation from many of the parties involved has led to the solving
of a number of challenging problems around package building, architecture
bootstrap, and so on. Clearly, one of the primary (pun intended) goals of
Fedora ARM is to reach primary architecture status as effectively as possible.
That won't happen overnight, and, in fact, no secondary architecture has yet
been promoted to primary, so there is a lot left to determine around what
exactly the requirements will be and how that will be done.
Primary architecture status would afford many benefits to Fedora ARM, such as
requiring all Fedora packagers to support ARM (rather than relying on good
will in fixing bugs - although the assistance given so far has generally been
excellent), and being able to define blockers and targets for individual
Fedora releases. As a secondary architecture, Fedora ARM must play a constant
game of catch-up, rebuilding packages that have already been built on the
primary and hoping that not too many issues will arise to slow down the
overall release. As a primary, the entire distribution will wait if an
issue arises that causes a package somewhere to fail to build on ARM. At the
same time, primary architecture status brings benefits to Fedora in terms
of having more than one first-class architecture and will allow Fedora to
better compete for the growing numbers of users who have ARM-based systems.
Getting to primary architecture status will take some time. At a minimum
the process will likely require that Fedora ARM very closely track
the building of every primary architecture package and produce a build
shortly thereafter (and eventually in parallel). Tools such as koji-shadow
are already being deployed to automatically track primary architecture
builds and initiate secondary ones in this fashion. In addition, the
systems used to build packages will need to be more rugged than those that
are available today. The current builder hardware is sufficient for a lab
environment, but it is occasionally plagued with issues such as faulty
SD cards, overheating boards, and so forth. It is hoped that newer, more
suitable hardware will appear over the coming months to solve that issue.
Beyond a plan for becoming a primary architecture, a number of specific
recommendations are likely to feed out of the current efforts of the Fedora ARM
project back into the planning around general Fedora releases. They are
likely to include a call for mandatory mass rebuilds every one or two Fedora
releases. A mass rebuild means that every package in the entire distribution
is rebuilt from scratch, and it makes the life of secondary architectures (or
new architectures) much easier because those architectures need to
build only one release's worth of package dependencies.
More about Fedora ARM
To find out more about the Fedora ARM project and to get involved, visit the
Fedora ARM secondary architecture wiki pages.
Here you can find directions on joining the "arm" Fedora mailing list, and
joining the #fedora-arm IRC channel on the Freenode IRC network.
[ Jon Masters is a Principal Software Engineer at Red Hat, where he works on
the Fedora ARM project. Jon is co-author of Building Embedded Linux Systems,
and is currently writing a book on porting Linux to new architectures. ]