FOSDEM: Multiarch on Debian and Ubuntu
In his talk at FOSDEM (Free and Open Source Software Developers' European Meeting) 2012 in Brussels, Wookey (who works for Linaro on Linux for ARM and goes by a single name) explained what multiarch is and why it's important. Multiarch is a general solution for installing libraries of more than one architecture on the same system.
By "general" we mean more than just the lib and lib64 directories for 32 and 64-bit x86 libraries. Currently the Filesystem Hierarchy Standard (FHS) attempts to address the use of these libraries on the same system by requiring that /usr/lib be reserved for 32-bit libraries, while 64-bit libraries are located in /usr/lib64. This so-called "biarch" design was adopted by Red Hat and SUSE, but not by Debian and Ubuntu. A general solution should not only scale to other architectures, but it should also "remove all corresponding bodgery we have in Debian, such as ia32-libs and biarch packages,
" Wookey says. The Debian developers have been working on a multiarch solution for years and multiarch support is a release goal for the coming Debian 7 "Wheezy" release, expected in 2013.
The basic idea behind multiarch is to generalize the biarch design to arbitrary architectures, and the way it is done is actually quite simple, Wookey maintains: you put your libraries into architecture-specific paths. For instance, /usr/lib/libfoo goes into /usr/lib/x86_64-linux-gnu/libfoo if your machine has an x86_64 architecture, into /usr/lib/i386-linux-gnu/libfoo for an i386 architecture, into /usr/lib/powerpc64-linux-gnu/libfoo for a ppc64 architecture, and into /usr/lib/arm-linux-gnueabi/libfoo for an armel architecture.
The multiarch paths contain the GNU triplets used by GCC to describe architectures. For instance, in x86_64-linux-gnu "x86_64" stands for the processor type, "linux" designates the kernel, and "gnu" stands for the user-space ABI. However, multiarch adopts the GNU triplets with some adjustments. For instance, both the i486-linux-gnu and i586-linux-gnu GNU triplets will be translated to the /usr/lib/i386-linux-gnu/libfoo path because, according to Wookey, a few minor instruction set differences do not add up to a different ABI requiring its own triplet. The advantage of this partial rethinking of the file system hierarchy is that all libraries have a canonical path. There are no special cases for the locations of native, cross-built, or emulated (with QEMU) libraries: they are all the same.
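For example, on a hypothetical amd64 system with some i386 libraries co-installed, the scheme looks like this; dpkg-architecture (discussed below) reports the native multiarch triplet, and the library names shown are only illustrative:

    $ dpkg-architecture -qDEB_HOST_MULTIARCH
    x86_64-linux-gnu
    $ ls /usr/lib/x86_64-linux-gnu
    libattr.so.1  libc.so.6  ...
    $ ls /usr/lib/i386-linux-gnu
    libattr.so.1  libc.so.6  ...

Two builds of the same library can coexist because each architecture resolves its own directory.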
What can we do with it?
So what can we do with multiarch? As already mentioned, multiarch makes cross-compilation much simpler: it is no longer a special case and essentially you're getting it for free as a byproduct of multiarch. This is primarily because the library loader path is baked into every executable by the linker, and thanks to multiarch's canonical path based on the system's architecture, this path is the same whether the library is built or cross-built.
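To make that concrete, here is a hypothetical build of the article's libfoo example on an amd64 system, assuming a multiarch-aware cross-toolchain such as Ubuntu's arm-linux-gnueabi-gcc is installed; both invocations resolve -lfoo through the same canonical, architecture-qualified directories:

    $ gcc -o foo foo.c -lfoo                     (links against /usr/lib/x86_64-linux-gnu/libfoo.so)
    $ arm-linux-gnueabi-gcc -o foo foo.c -lfoo   (links against /usr/lib/arm-linux-gnueabi/libfoo.so)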
In the classical approach of cross-building for armel (with dpkg-cross), the build-time library path is, for instance, /usr/arm-linux-gnueabi/lib/libfoo, while the runtime library path is just /usr/lib/libfoo. With the multiarch approach to cross-compilation, the library path is /usr/lib/arm-linux-gnueabi both at build time and at run time, so "it's much harder for libtool to screw it up," Wookey said. Another advantage is that you can just run the build tools under QEMU via binfmt-misc for testing.
Multiarch also allows better support for binary-only software, which tends to be available only for 32-bit systems. Thanks to multiarch, you can more easily install the 32-bit dependencies of a 32-bit proprietary program on a 64-bit system. Wookey gave the Flash plugin, Skype, and the Xilinx development tools as examples. Multiarch also allows cheap emulated environments: you can emulate only the parts you need.
The slow genesis of multiarch
Wookey quotes Tollef Fog Heen, who said in 2005: "ia32-libs [is now] the biggest source package in Debian." That is because currently any 32-bit software that has to run on an amd64 (which is the name Debian uses for x86_64) installation depends on the ia32-libs package, which contains i386 versions of all of the libraries; its source package currently weighs in as a 555 MiB tarball. Ia32-libs was always intended as a temporary solution for the i386/amd64 case, but unfortunately (as often happens with these things), developing the proper general replacement took a lot longer. There were talks about a solution at Debconf 4 and 5 (in 2004 and 2005, respectively), there was a multiarch meeting at FOSDEM 2006, and in June 2006 the first multiarch patches for dpkg were uploaded.
In May 2009, the apt and dpkg maintainers agreed on a package management specification for multiarch at the Ubuntu Developer Summit in Barcelona. To avoid further delays, they restricted the scope to multiarch libraries. In August 2010, the first proposal for multiarch directory names was drafted, and in February 2011 a dpkg multiarch implementation (sponsored by Linaro) landed in Ubuntu. A month later, the normalized GNU triplets were adopted for the multiarch directory names, and then the Ubuntu 11.04 release came with 83 libraries multiarched. Together with 14 multiarch libraries in a PPA (Personal Package Archive), this was already enough to cross-install the 32-bit Flash plugin on a 64-bit system.
Currently, the Ubuntu core is almost completely multiarch: at the time of Wookey's talk, 110 out of the 112 source libraries in the Ubuntu 12.04 main repository were multiarched, as were 175 out of the 176 binary libraries. Obviously all libraries have to be made multiarch-ready, but most -dev packages need converting as well, using a directory naming scheme similar to the one for libraries. That makes it possible to co-install include files that differ between architectures. On top of this, any tool that is aware of library paths had to be fixed, including libc, dpkg, apt, compilers, make, pkg-config, pmake, cmake, debhelper, lintian, libffi, OpenJDK's lib-jna, and dpkg-cross.
Wookey made it clear that the multiarch development is a classic example of a significant distribution-wide change, which is generally very difficult to do right. One of the factors in the success of the multiarch development is that they used written specifications to record a shared understanding. As can be seen from the project's history, another key factor is that they split the work into bite-sized deliverables.
How does it work?
Normally, packages of the same name but different architectures are not co-installable. Multiarch-ready packages, though, are given an extra Multi-Arch field in the package specification. This field has one of three possible values, depending on the type of package. A library has the value "same": it can be co-installed with the same package from another architecture, but it can only be used to satisfy the dependencies of a package of the same architecture. An executable has the value "foreign": it cannot be co-installed with the same package from another architecture, but it is allowed to satisfy the dependencies of any architecture (of course, preference is given to a package of the native architecture if available). And a package that contains both libraries and executables has the value "allowed"; an example is the python package. The depending packages then specify how they use it.
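To make the three values concrete, here is a sketch of what the relevant stanzas in a debian/control file might look like; the packages are hypothetical, but the Multi-Arch field and its values are the real mechanism described above:

    # hypothetical shared-library package: co-installable, satisfies
    # dependencies only within its own architecture
    Package: libfoo1
    Architecture: any
    Multi-Arch: same
    Depends: ${shlibs:Depends}, ${misc:Depends}

    # hypothetical command-line tool: only one instance installed,
    # but it can satisfy dependencies of packages of any architecture
    Package: foo-utils
    Architecture: any
    Multi-Arch: foreign
    Depends: libfoo1 (= ${binary:Version}), ${misc:Depends}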
The Debian wiki has some information about how package maintainers can convert their package to multiarch, as well as some general information about multiarch support in Debian. Note that a package for a foreign architecture is only installable if all of its (recursive) dependencies are either marked as multiarch or do not have corresponding packages installed for the native architecture.
An interesting implementation detail is that documentation from a package does not get installed twice when you install two architectures' versions of it: according to Wookey, dpkg supports reference counting of files that overlap between co-installable packages. So an identical documentation file in the 32-bit and 64-bit x86 versions of a library only gets installed once, and it doesn't get removed until both versions of the library are removed.
In practice, you can easily add a new architecture to your machine's Debian or Ubuntu installation. For instance, when you have an amd64 installation and you want to install some i386 libraries, you can add the latter architecture with a simple "dpkg --add-architecture i386" command. Use "dpkg --print-foreign-architectures" to get a list of the foreign architectures you have added, and "dpkg-architecture -qDEB_HOST_MULTIARCH" to see the multiarch pathname for your system's native architecture. The entries in /etc/apt/sources.list also get an extra arch field, for instance:
deb [arch=amd64,i386] http://archive.ubuntu.com/ubuntu precise main

After an "apt-get update" to refresh the package list, you can just install an available multiarch-ready library by specifying the architecture after a colon, for instance "apt-get install libattr1-dev:i386". This has been working in Ubuntu since 11.04, nearly a year now.
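Putting the pieces together, a minimal session on an amd64 system might look like this, using the article's example package (output abbreviated):

    # dpkg --add-architecture i386
    # apt-get update
    # apt-get install libattr1-dev:i386
    # dpkg --print-foreign-architectures
    i386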
Things multiarch (currently) doesn't do
Currently the multiarch solution is limited to libraries. This means that you can't install executables from more than one architecture in /bin or /usr/bin with multiarch. Co-installable executables could be useful (for instance, to reuse a single network-mounted root partition on systems of multiple architectures without modification), but they were deliberately left out of the initial implementation because they would complicate matters further than they already are. Besides a multiarch path for executables, such a system would require kernel support or boot-time symlinking. Before implementing this, the multiarch developers need a detailed specification, as they wrote for the library implementation, Wookey warned.
Another interesting but currently unimplemented feature is "cross-grading" your machine from one architecture to another. For instance, if you have installed a 32-bit x86 distribution on your 64-bit machine, you could convert it to a 64-bit distribution without having to reinstall. This could work by first manually installing the 64-bit versions of dpkg and apt, then changing which architecture is used by default, and finally reinstalling all installed software from the 64-bit architecture. The same approach should work for a cross-grade from arm to armel or from armel to armhf.
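No tooling for such a cross-grade existed at the time of the talk, so the following is only a speculative sketch of the sequence Wookey described, not a supported procedure:

    # dpkg --add-architecture amd64
    # apt-get update
    # apt-get install dpkg:amd64 apt:amd64
    (then switch the default architecture and reinstall every package from amd64)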
Wookey ended his talk with the message that Debian and Ubuntu have now done the hard work for multiarch and shown that it works. However, it could be useful beyond Debian and its derivatives. The multiarch directory scheme will be a target for FHS/LSB standardization in the future, but even if that doesn't happen, it's a much more scalable solution than the current one.
(Those wanting more details can watch the video of Wookey's talk posted by the conference.)
Index entries for this article:
GuestArticles: Vervloesem, Koen
Conference: FOSDEM/2012
Posted Feb 23, 2012 11:28 UTC (Thu)
by aleXXX (subscriber, #2742)
[Link] (4 responses)
This is wrong. It probably helps with RPATH, but that's about it.

Cross compilation means that the configure process (autotools, CMake, SCons, whatever) must not use properties of the system it runs on to conclude what the target system looks like. E.g. it still cannot run executables built for the target system on the host system (try_run() in CMake); the build system must still know that it is cross compiling and handle try-run tests accordingly. It must still be aware of whether the executables it finds are for the host or for the target, and whether it can actually execute them or not.

Really, the one thing where I see it helping with cross compiling is RPATH.

Alex
Posted Feb 23, 2012 12:34 UTC (Thu)
by wookey (guest, #5501)
[Link] (3 responses)
The slides are here: http://wookware.org/talks/ (the ELC version given 10 days later has marginally improved slides)
Multiarch does make cross-building (on Debian/Ubuntu) much easier in several ways. I cover this in the talk, but to save you watching me blither for 40 minutes I'll write it down here.
1) Cross-dependencies.

Because multiarch specifies whether things are co-installable and only satisfy dependencies within an architecture (i.e. are like libraries), or are not co-installable and satisfy deps across arches (i.e. are like tools), this info provides a very good mapping to whether to install a BUILD-arch or HOST-arch version of a build-dependency, and there is a syntax to express the exceptions. So suddenly cross-build-deps is a properly defined, deterministic thing that just falls out of the packaging and tools: "apt-get -a<arch> build-dep <package>" should install the build tool packages and the cross-library/header packages for the build. There has been no proper solution to this in Debian to date. The closest were xapt (which simply installed every dependency in the system twice, once for the BUILD arch and once for the HOST arch) and xdeb (which has a big list of heuristics for package names that should be cross-installed, library packages which aren't named lib-something, etc.). Both of these are pretty cranky, and the huge pile of dpkg-crossed 'local' packages caused upgradeability problems too.
2) Libraries and headers now have a canonical path that doesn't change between build time, run time, and install time. Because of this, the whole cmake/autoconf/libtool/--libdir shenanigans gets a lot simpler. Files have a path - use it. Architecture-dependent files have an architecture-qualified path. It's beautiful in comparison to what went before.
3) It's easy to install runtimes of a different arch in the system (and their dependencies) and thus (if you have CPU or emulation support) expect them to run, so wrong-arch scripts run at build time can just work, or you can do your builds as fake-native where a lot of the tools have been replaced by BUILD-arch ones. None of this is new (see scratchbox, scratchbox2, OBS), but it all gets a whole lot more orthogonal, which ought to make it a lot easier to use and maintain. Using multiarch for OBS or scratchbox-style not-really-cross building is currently pretty much untested, so this is my enthusiasm showing through to some degree, but I don't see why it won't work.
Yes, I agree, packages still need to be constructed properly so that they can cross-build correctly, and preferably without running any wrong-arch binaries (the case where you simply don't _have_ qemu support for the HOST arch, so all qemu bodging is cheating, is still very important), but the whole multiarch co-installation infrastructure and canonical pathnames give us a lot more than just less hassle from RPATH (Debian removes RPATH in almost all cases anyway because it's wrong and bad for almost everything except plugins).
I've set up a multiarch cross buildd in order to track the current state of play in terms of what actually works (not a huge amount right now, but I expect that to improve quite rapidly) here: http://people.linaro.org/~wookey/buildd/
Posted Feb 23, 2012 19:31 UTC (Thu)
by aleXXX (subscriber, #2742)
[Link] (2 responses)
What do you mean with "Files have a path - use it"? And are there really no problems with overlapping files (headers)?

Regarding 3), I'm really not up-to-date with that. The last time I used scratchbox, I had to actually log in to it to do something. Has this changed?

Alex
Posted Feb 24, 2012 9:11 UTC (Fri)
by rvfh (guest, #31018)
[Link]
I think headers should be common, and kept in /usr/include. If a header has arch-specific sections it should be easy to fix with correct #defines.
Posted Feb 28, 2012 18:10 UTC (Tue)
by pboddie (guest, #50784)
[Link]
I imagine he means that files you're building and installing go in a location that won't differ from the deployed location on a foreign system. So whereas you'd previously have to make foreign libraries targeted for /usr/lib but which can't be installed there because your host's libraries already reside there - problematic if your tools expect those foreign libraries to be there when building other things which depend on those libraries - you can now give the tools a canonical path for each architecture and the host and foreign libraries won't conflict with each other.
Posted Feb 23, 2012 15:04 UTC (Thu)
by branden (guest, #7029)
[Link] (4 responses)
Having seen this work get started back in the day, it's gratifying to see it starting to bear fruit. :)
Posted Feb 23, 2012 17:36 UTC (Thu)
by zuki (subscriber, #41808)
[Link]
Posted Feb 23, 2012 19:00 UTC (Thu)
by dashesy (guest, #74652)
[Link] (1 responses)

I wish this was available for Fedora, but Debian seems to be the distribution of choice for multiarch development.
Posted Feb 25, 2012 0:57 UTC (Sat)
by wookey (guest, #5501)
[Link]
We are doing our best in Linaro to at least make sure we don't do incompatible things between distros, and there is some LSB work needed to make a distro-independent spec.
Posted Feb 25, 2012 0:38 UTC (Sat)
by wookey (guest, #5501)
[Link]
I'm just helping out and trying to get the cross-building aspects of this to fruition.
Posted Feb 24, 2012 6:29 UTC (Fri)
by ncm (guest, #165)
[Link]
The importance of this development cannot be overstated, but I'm going to have a go at it.
Posted Feb 24, 2012 9:18 UTC (Fri)
by rvfh (guest, #31018)
[Link]

Cross-grading a Ubuntu install
Now the next step is to cross-grade it so I can take advantage of the hardware it runs on, and I wish this could be made easy, or at least well documented. I have come across this [1] but have not tried it yet.
[1] http://askubuntu.com/questions/81824/how-can-i-switch-a-3...
Posted Feb 24, 2012 10:02 UTC (Fri)
by job (guest, #670)
[Link]
Posted Feb 28, 2012 15:48 UTC (Tue)
by pagerc (guest, #53182)
[Link] (15 responses)
Posted Feb 28, 2012 16:21 UTC (Tue)
by mjg59 (subscriber, #23239)
[Link] (14 responses)
Posted Feb 28, 2012 16:25 UTC (Tue)
by pagerc (guest, #53182)
[Link] (13 responses)
Posted Feb 28, 2012 16:33 UTC (Tue)
by mjg59 (subscriber, #23239)
[Link] (9 responses)
Fatelf and Multiarch solve different problems. Fatelf lets you distribute a single binary that works on multiple architectures. Multiarch lets you install binaries from multiple architectures on a single system. It's also more complicated than that - imagine a binary that drops some data in /usr/lib/whatever.
Posted Feb 28, 2012 17:43 UTC (Tue)
by khim (subscriber, #9252)
[Link] (8 responses)
Not entirely. Not just distribute: develop it, too. With fatelf you don't ever need to do that, which kind of leads to the initial question. As for a binary that drops some data in /usr/lib/whatever: that's the responsibility of the applications. In a lot of cases you can invent some architecture-specific solution. Often enough IA32 and x86-64 files can be shared (but ARM and IA32 can't), etc.
Posted Feb 28, 2012 17:47 UTC (Tue)
by mjg59 (subscriber, #23239)
[Link] (7 responses)

Of course it's the responsibility of the applications! But if you're changing all the applications anyway, there's no reason to use fatelf rather than multiarch.
Posted Feb 28, 2012 18:29 UTC (Tue)
by khim (subscriber, #9252)
[Link] (6 responses)
Sorry, but you've already named the reason: fatelf makes it easy to distribute binaries for multiple platforms. Multiarch does not solve this problem. And as I've pointed out above, with fatelf it's easy to solve all the problems which multiarch is supposed to solve, too. So... what exactly makes multiarch support so exciting? Modest space savings? In a world where a $100 mobile phone includes gigabytes of storage? Meh...
Posted Feb 28, 2012 18:34 UTC (Tue)
by mjg59 (subscriber, #23239)
[Link]
Posted Feb 28, 2012 18:59 UTC (Tue)
by jwakely (subscriber, #60262)
[Link] (4 responses)
What proportion of users want binaries for multiple platforms?
> In a world where $100 mobile phone includes gigabytes of storage? Meh...
My phone might have plenty of space (actually it's not _that_ much) but my netbook doesn't, I'm down to 300MB free space, and I still use that for real work. I'd really rather not have fatter binaries there.
Posted Feb 28, 2012 20:32 UTC (Tue)
by khim (subscriber, #9252)
[Link] (3 responses)
What does this have to do with anything? Users neither need nor want versions for different platforms. Developers do. There is a usable (for ISVs) solution (fatelf) and a half-usable one (multiarch). As was noted, Linux distributors are not interested in ISVs. OK, no problem; this explains why multiarch is exciting and fatelf is not. What is not explained is why I should care about either.
Posted Feb 29, 2012 5:57 UTC (Wed)
by mathstuf (subscriber, #69389)
[Link] (2 responses)
Posted Feb 29, 2012 8:00 UTC (Wed)
by khim (subscriber, #9252)
[Link] (1 responses)
Everything is possible (it's just software, after all), but is it feasible? Distributions rejected the fatelf idea because they are not interested in ISV support (yes, their hubris is that big), and without out-of-the-box support fatelf does not really make sense: it's much simpler to put a bunch of files in a single directory with appropriately named subdirectories and use some script to select the proper binary.
Posted Mar 7, 2012 0:18 UTC (Wed)
by fest3er (guest, #60379)
[Link]
> ... it's much simpler to put a bunch of files in a single directory with appropriately named subdirectories and use some script to select the proper binary.

I did this many years ago. Built binaries for SunOS (68k and SPARC), Solaris (SPARC), SysV68 (68k), SysV88 (88k), HPUX (PA-RISC) and Irix (MIPS). Put them in suitably-named subdirs and wrote a shell script that worked on all of them (though HPUX was problematic) that discovered the architecture and exec'ed the correct binary. NFS-mounted the dir everywhere; people then had the same programs at the same path regardless of which system they were using.

True multi-arch binary data files can be a pain to lay out, especially when structs are involved. It's hard, but it can be done. We had a DB program on 68k. Worked great. But on 88k it didn't work. Many assumptions were made about where members were put and how things were aligned. We ended up using 'fillers' to force alignment. After that, the data files were usable on 68k and 88k, both big-endian. But I would expect that hton*() and ntoh*() would solve most of the endianness problems.

But, first things first. Get the library locations standardized. The rest will probably pert near fall into place.
Posted Mar 1, 2012 12:11 UTC (Thu)
by elanthis (guest, #6227)
[Link] (2 responses)
The files created by a package should not be modified for any reason. I should be able to do a package verification and check the checksums of the installed components.
It would be possible to update the package database with modified checksums of binaries that are "patched" by a fatelf system, but then that reduces the overall safety. Then I would only be able to check a potentially compromised system's filesystem using data that only exists in the potentially compromised system's filesystem. Without modifying binaries, I can grab the upstream original verified out-of-band package and compare its checksums directly to those on the system's filesystem image.
Yes, I realize that prelink already screws up most of this. I'm not sure if prelink is still commonly used (faster linkers like gold and strict symbol visibility control can reduce the need for prelinking, and address space randomization should be part of the dynamic loader, but maybe Linux distros haven't caught up yet).
Posted Mar 1, 2012 16:02 UTC (Thu)
by nix (subscriber, #2304)
[Link] (1 responses)
Posted Mar 3, 2012 10:02 UTC (Sat)
by TRS-80 (guest, #1804)
[Link]
Posted Mar 1, 2012 0:38 UTC (Thu)
by filteredperception (guest, #5692)
[Link]

(better) Multiarch LiveUSB(/CD/DVD/BD)?