February 22, 2012
This article was contributed by Koen Vervloesem
In his talk at FOSDEM (Free and Open Source Software Developers' European Meeting) 2012 in Brussels, Wookey (who is working for Linaro on Linux for ARM and doesn't have a first name) talked about what multiarch is and why it's important. Multiarch is a general solution for installing libraries of more than one architecture on the same system.
By "general" we mean more than just the lib and lib64 directories for 32- and 64-bit x86 libraries. Currently, the Filesystem Hierarchy Standard (FHS) attempts to address the use of these libraries on the same system by requiring that /usr/lib be reserved for 32-bit libraries, while 64-bit libraries are located in /usr/lib64. This so-called "biarch" design was adopted by Red Hat and SUSE, but not by Debian and Ubuntu. A general solution should not only scale to other architectures, but it should also "remove all corresponding bodgery we have in Debian, such as ia32-libs and biarch packages," Wookey says. The Debian developers have been working on a multiarch solution for years and multiarch support is a release goal for the coming Debian 7 "Wheezy" release, expected in 2013.
The basic idea behind multiarch is to generalize the biarch design to arbitrary architectures, and the way it is done is actually quite simple, Wookey maintains: you put your libraries into architecture-specific paths. For instance, /usr/lib/libfoo goes into /usr/lib/x86_64-linux-gnu/libfoo if your machine has an x86_64 architecture, into /usr/lib/i386-linux-gnu/libfoo for an i386 architecture, into /usr/lib/powerpc64-linux-gnu/libfoo for a ppc64 architecture, and into /usr/lib/arm-linux-gnueabi/libfoo for an armel architecture.
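As a rough illustration of this path scheme, the following Python sketch maps the four architectures named above to their library directories. The function names and the lookup table are made up for this example and are not part of any Debian tool; the table covers only the architectures mentioned in the text.

```python
import posixpath

# Debian architecture name -> multiarch directory name, taken from the
# examples in the text. Illustrative only, not an exhaustive list.
MULTIARCH_DIRS = {
    "amd64": "x86_64-linux-gnu",   # Debian's name for x86_64
    "i386": "i386-linux-gnu",
    "ppc64": "powerpc64-linux-gnu",
    "armel": "arm-linux-gnueabi",
}


def multiarch_libdir(arch, prefix="/usr/lib"):
    """Return the architecture-specific library directory for `arch`."""
    try:
        return posixpath.join(prefix, MULTIARCH_DIRS[arch])
    except KeyError:
        raise ValueError(f"no multiarch directory known for {arch!r}") from None


def multiarch_path(arch, name, prefix="/usr/lib"):
    """Return where a library called `name` would live for `arch`."""
    return posixpath.join(multiarch_libdir(arch, prefix), name)


if __name__ == "__main__":
    for arch in MULTIARCH_DIRS:
        print(arch, "->", multiarch_path(arch, "libfoo"))
```

Because every architecture gets its own directory, the same library name can exist once per architecture without any of the copies colliding.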
The multiarch paths contain the GNU triplets used by GCC to describe architectures. For instance, in x86_64-linux-gnu "x86_64" stands for the processor type, "linux" designates the kernel, and "gnu" stands for the user-space ABI. However, multiarch adopts the GNU triplets with some adjustments. For instance, both the i486-linux-gnu and i586-linux-gnu GNU triplets will be translated to the /usr/lib/i386-linux-gnu/libfoo path because, according to Wookey, a few minor instruction set differences do not add up to a different ABI requiring its own triplet. The advantage of this partial rethinking of the file system hierarchy is that all libraries have a canonical path. There are no special cases for the locations of native, cross-built, or emulated (with QEMU) libraries: they are all the same.
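The triplet normalization described above can be sketched in a few lines of Python. This is only an illustration: the function names are invented, the code handles just the three-part cpu-kernel-abi form used in this article, and treating every 32-bit x86 variant from i386 to i686 as "i386" is an assumption (the talk named only i486 and i586).

```python
import re

# Assumption: all 32-bit x86 CPU names (i386 through i686) share one ABI.
# The text explicitly mentions only i486 and i586.
_X86_32 = re.compile(r"^i[3-6]86$")


def split_triplet(triplet):
    """Split 'cpu-kernel-abi' into its three components."""
    parts = triplet.split("-")
    if len(parts) != 3:
        raise ValueError(f"expected cpu-kernel-abi, got {triplet!r}")
    return tuple(parts)


def normalize_triplet(triplet):
    """Return the triplet as it would appear in a multiarch directory name."""
    cpu, kernel, abi = split_triplet(triplet)
    if _X86_32.match(cpu):
        cpu = "i386"
    return f"{cpu}-{kernel}-{abi}"


if __name__ == "__main__":
    for t in ("i486-linux-gnu", "i586-linux-gnu", "x86_64-linux-gnu"):
        print(t, "->", normalize_triplet(t))
```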
What can we do with it?
So what can we do with multiarch? As already mentioned, multiarch makes cross-compilation much simpler: it is no longer a special case and essentially you're getting it for free as a byproduct of multiarch. This is primarily because the library loader path is baked into every executable by the linker, and thanks to multiarch's canonical path based on the system's architecture, this path is the same whether the library is built natively or cross-built.
In the classical approach of cross-building for armel (with dpkg-cross), the build-time library path is, for instance, /usr/arm-linux-gnueabi/lib/libfoo, while the runtime library path is just /usr/lib/libfoo. With the multiarch approach for cross-compilation, the library path is /usr/lib/arm-linux-gnueabi both at build time and at run time, so "it's much harder for libtool to screw it up," Wookey said. Another advantage is that you can just run the build tools under QEMU via binfmt_misc for testing.
Multiarch also allows for better support of binary-only software, which tends to be available only for 32-bit systems. Thanks to multiarch, you can more easily install 32-bit versions of a 32-bit proprietary program's dependencies on a 64-bit system. Wookey gave the Flash plugin, Skype, and the Xilinx development tools as examples. Multiarch also allows cheap emulated environments: you can emulate only the parts you need.
The slow genesis of multiarch
Wookey quoted Tollef Fog Heen, who said in 2005: "ia32-libs [is now] the biggest source package in Debian." That is because any 32-bit software that has to run on an amd64 (which is the name Debian uses for x86_64) installation currently depends on the package ia32-libs, which contains i386 versions of all of the libraries, so its source package weighs in as a 555 MiB tarball. The ia32-libs package was always intended as a temporary solution for the i386/amd64 case, but unfortunately (as often happens with these things), developing the proper general replacement took a lot longer. There were talks about a solution at DebConf 4 and 5 (in 2004 and 2005, respectively), there was a multiarch meeting at FOSDEM 2006, and in June 2006 the first multiarch patches for dpkg were uploaded.
In May 2009, the apt and dpkg maintainers agreed on a package management specification for multiarch at the Ubuntu Developer Summit in Barcelona. To avoid further delays, they restricted the scope to multiarch libraries. In August 2010, the first proposal for multiarch directory names was drafted, and in February 2011 a dpkg multiarch implementation (sponsored by Linaro) landed in Ubuntu. A month later, the normalized GNU triplets were adopted for the multiarch directory names, and then the Ubuntu 11.04 release came with 83 libraries multiarched. Together with 14 multiarch libraries in a PPA (Personal Package Archive), this was already enough to cross-install the 32-bit Flash plugin on a 64-bit system.
Currently, the Ubuntu core is almost completely multiarch: at the time of Wookey's talk, 110 out of the 112 source libraries in the Ubuntu 12.04 main repository were multiarched, as well as 175 out of the 176 binary libraries. Obviously, all libraries have to be made multiarch-ready, but most -dev packages need converting as well, using a directory naming scheme similar to the one for libraries. That makes it possible to co-install include files that differ between architectures. On top of this, any tool that is aware of library paths had to be fixed, including libc, dpkg, apt, compilers, make, pkg-config, pmake, cmake, debhelper, lintian, libffi, OpenJDK's lib-jna, and dpkg-cross.
Wookey made it clear that the multiarch development is a classic example
of a significant distribution-wide change, which is generally very
difficult to do right. One of the factors in the success of the multiarch
development is that they used written specifications to record a shared
understanding. As can be seen from the project's history, another key
factor is that they split the work into bite-sized deliverables.
How does it work?
Normally, a package of the same name but a different architecture is not
co-installable. Multiarch-ready packages, though, are given an extra field
Multi-Arch in the package specification. This field has one of three possible values, depending on the type of package. A library has the value "same": it can be co-installed with the same package from another architecture, but it can only be used to satisfy the dependencies of a package within the same architecture. An executable has the value "foreign": it cannot be co-installed with the same package from another architecture, but it should be allowed to satisfy the dependencies for any architecture (of course, preference is given to a package for the native architecture if available). And a package that contains both libraries and executables has the value "allowed". An example of this is the python package. The depending packages specify how they use it.
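The rules for the three values can be written down as a small decision function. The Python sketch below is a simplified model, not dpkg's or apt's actual resolver: the function names are invented, architecture-independent packages are ignored, and the `any_qualifier` flag models a dependency written as "pkg:any", which is an assumption about how depending packages "specify how they use" an "allowed" package and is not described in the text.

```python
def coinstallable(multi_arch):
    """Can two architectures of this package be installed side by side?

    Only Multi-Arch: same packages (libraries) qualify.
    """
    return multi_arch == "same"


def satisfies(multi_arch, provider_arch, depender_arch, any_qualifier=False):
    """Can a package built for provider_arch satisfy a dependency of a
    package built for depender_arch?

    multi_arch is "same", "foreign", "allowed", or None (no field).
    any_qualifier is an assumed stand-in for a "pkg:any" dependency.
    """
    if multi_arch == "foreign":
        # Executables: usable by dependers of any architecture.
        return True
    if multi_arch == "allowed" and any_qualifier:
        # The depender has declared that any architecture will do.
        return True
    # "same", "allowed" without the qualifier, and packages without a
    # Multi-Arch field all require matching architectures.
    return provider_arch == depender_arch


if __name__ == "__main__":
    print(satisfies("same", "amd64", "i386"))      # library, wrong arch
    print(satisfies("foreign", "amd64", "i386"))   # executable, any arch
```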
The Debian wiki has some information about how package maintainers can convert their package to multiarch, as well as some general information about multiarch support in Debian. Note that a package for a foreign architecture is only installable if all of its (recursive) dependencies are either marked as multiarch or do not have corresponding packages installed for the native architecture.
An interesting implementation detail is that co-installability doesn't mean that documentation files from a package get installed twice when you install two architectural versions of it: according to Wookey, dpkg has support for reference-counting of documentation files from co-installable packages that overlap. So an identical documentation file in a 32- and a 64-bit x86 version of a library only gets installed once, and it doesn't get removed until both versions of the library are removed.
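The reference-counting idea can be modeled in memory as follows. This Python sketch is not how dpkg is implemented; the class, the package name libfoo1, and the file paths are hypothetical, and rejecting a shared file whose content differs between architectures is an assumption made to keep the model consistent with "identical documentation file".

```python
class SharedFileTable:
    """In-memory model of files shared by co-installed package instances."""

    def __init__(self):
        self._owners = {}    # path -> set of (package, arch) owners
        self._content = {}   # path -> digest of the installed file

    def install(self, package, arch, files):
        """Register one package instance; `files` maps path -> digest."""
        # Check everything first so a conflict leaves the table unchanged.
        for path, digest in files.items():
            if path in self._content and self._content[path] != digest:
                raise ValueError(f"{path} differs between architectures")
        for path, digest in files.items():
            self._content[path] = digest
            self._owners.setdefault(path, set()).add((package, arch))

    def remove(self, package, arch):
        """Drop one instance; return the paths that were actually deleted."""
        deleted = []
        for path in list(self._owners):
            owners = self._owners[path]
            owners.discard((package, arch))
            if not owners:
                # Last owner gone: the file itself can be removed.
                del self._owners[path]
                del self._content[path]
                deleted.append(path)
        return sorted(deleted)

    def refcount(self, path):
        """Number of installed package instances that own `path`."""
        return len(self._owners.get(path, ()))
```

Installing the amd64 and i386 instances of the same library leaves the shared documentation file with a count of two, and the file only appears in the result of `remove` once the second instance is gone.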
In practice, you can easily add a new architecture to your machine's
Debian or Ubuntu installation. For instance, when you have an amd64
installation and you want to install some i386 libraries, you can add the
latter architecture with a simple
"dpkg --add-architecture i386" command. Use
"dpkg --print-foreign-architectures" to get a list of the
foreign architectures you have added, and
"dpkg-architecture -qDEB_HOST_MULTIARCH" to see the multiarch
pathname for your system's native architecture. The entries in
/etc/apt/sources.list also get an extra arch field, for instance:
deb [arch=amd64,i386] http://archive.ubuntu.com/ubuntu precise main
After an "apt-get update" to refresh the package list, you can just install an available multiarch-ready library by specifying the architecture after a colon, for instance "apt-get install libattr1-dev:i386". This has been working in Ubuntu for nearly a year now, since 11.04.
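For scripting these steps, the sequence can be assembled programmatically. The Python sketch below only builds the sources.list entry and the command lines quoted above; it does not execute anything, since the real commands need root privileges. The helper names are invented for this example.

```python
import shlex


def sources_line(archs, uri, suite, components):
    """Format a one-line sources.list entry restricted to `archs`."""
    return f"deb [arch={','.join(archs)}] {uri} {suite} {' '.join(components)}"


def foreign_install_plan(arch, packages):
    """Return the commands (as argv lists) that add `arch` as a foreign
    architecture and install `packages` for it."""
    if not packages:
        raise ValueError("need at least one package")
    return [
        ["dpkg", "--add-architecture", arch],
        ["apt-get", "update"],
        ["apt-get", "install"] + [f"{pkg}:{arch}" for pkg in packages],
    ]


if __name__ == "__main__":
    print(sources_line(["amd64", "i386"],
                       "http://archive.ubuntu.com/ubuntu", "precise", ["main"]))
    for argv in foreign_install_plan("i386", ["libattr1-dev"]):
        print(shlex.join(argv))
```

Keeping the plan as data makes it easy to review before handing each argv list to something like subprocess.run on a test machine.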
Things multiarch (currently) doesn't do
Currently, the multiarch solution is limited to libraries. This means that you can't install executables from more than one architecture in /bin or /usr/bin with multiarch. Co-installable executables could be useful (for instance, to reuse a single network-mounted root partition on systems of multiple architectures with no modification), but they were deliberately left out of the initial implementation because they would complicate matters even further. Besides a multiarch path for executables, such a system would require kernel support or boot-time symlinking. Before implementing this, the multiarch developers need a detailed specification like the one they wrote for the library implementation, Wookey warned.
Another interesting feature that is not yet implemented is the ability to "cross-grade" a machine from one architecture to another. For instance, if you have installed a 32-bit x86 distribution on your 64-bit machine, you could convert it to a 64-bit distribution without having to reinstall it. This could be done by first manually installing the 64-bit versions of dpkg and apt and then changing the default architecture, after which you could reinstall all of the installed software from the 64-bit architecture. This should work the same way for a cross-grade from arm to armel and from armel to armhf.
Wookey ended his talk with the message that Debian and Ubuntu have now
done the hard work for multiarch and shown that it works. However, it could
be useful beyond Debian and its derivatives. The multiarch directory scheme
will be a target for FHS/LSB standardization in the future, but even if
that doesn't happen, it's a much more scalable solution than the current one.
(Those wanting more details can watch the
video of Wookey's talk posted by the conference.)