FatELF: universal binaries for Linux
One interesting feature of Mac OS X is the concept of a Universal Binary, a single binary file that runs natively on both PowerPC and Intel platforms. Professional game porter Ryan Gordon got sick of Mac developers pointing out that Linux doesn't have anything like that, so he did something about it and wrote FatELF. FatELF brings the idea of single binaries supporting multiple architectures to Linux.
Universal binaries in Mac OS X
Apple introduced the Universal Binary file format in 2005 to ease the transition of the Mac platform from the PowerPC architecture to the Intel architecture. The solution was to include both PowerPC and x86 versions of an application in one "fat binary". If a universal binary is run by Mac OS X, the operating system executes the appropriate section depending on the architecture in use. The big advantage was that Mac developers could distribute one executable of their software, so that end-users wouldn't have to worry about which version to download. Later, Apple went even further and allowed four-architecture binaries: 32 and 64 bit for both Intel and PowerPC.
This was not the first time Apple performed such a trick: in 1994 the company transitioned from Motorola 68k processors to PowerPC and introduced a "fat binary" which included executable code for both platforms. Moreover, NeXTSTEP, the predecessor of Mac OS X, had a fat binary file format (called "Multi-Architecture Binaries") which supported Motorola 68k, Intel x86, Sun SPARC, and HP PA-RISC. So Apple knew what needed to be done when they chose Intel as their new Mac platform. In fact, the Universal Binary format in Mac OS X is essentially the same as NeXTSTEP's Multi-Architecture Binaries. This was possible because Apple uses NeXTSTEP's Mach-O as the native object file format in Mac OS X.
A fat elf for Linux
Ryan Gordon is a well-known game porter: he has created ports of commercial games and other software to Linux and Mac OS X. Notable examples of his work are the Linux ports of the Unreal Tournament series, some of the Serious Sam series, the Postal series, Devastation and Prey, but also non-gaming software such as Google Earth and Second Life. With this experience, he knows a lot about both Mac OS X and Linux, so Ryan is well suited to implement the Mac OS X universal binary functionality in Linux.
His FatELF file format embeds multiple Linux binaries for different architectures in a single file. FatELF is actually a simple container format: it adds some accounting information at the start of the file and then appends all the ELF (Executable and Linking Format) binaries after it, adding padding for alignment. FatELF can be used for both executable files and shared libraries (.so files).
An obvious downside of FatELF is that the executable's size gets multiplied by the number of embedded ELF architectures. However, this only holds for the executable files and libraries; common non-executable resources such as images and data files are just shipped as they are without FatELF. For example, a game that ships with hundreds of megabytes of data will, relatively, become only slightly larger.
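The size argument can be checked with some quick arithmetic. The sketch below uses made-up numbers (a 10 MB executable, 500 MB of game data, two architectures) purely for illustration, and it assumes each embedded ELF binary is about the same size; `fat_package_growth` is a hypothetical helper, not part of FatELF.

```python
def fat_package_growth(binary_mb, data_mb, architectures):
    """Relative growth of a whole package when only its binary goes "fat".

    Assumes every embedded ELF binary is roughly binary_mb in size and that
    the data files are shipped once, unchanged.
    """
    thin_total = binary_mb + data_mb
    fat_total = binary_mb * architectures + data_mb
    return (fat_total - thin_total) / thin_total


# Illustrative numbers only: 10 MB binary, 500 MB of data, x86 + x86_64.
growth = fat_package_growth(binary_mb=10, data_mb=500, architectures=2)
print(f"package grows by {growth:.1%}")
```

With these assumed numbers the executable doubles in size, but the download as a whole grows by only about 2%.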
Moreover, a FatELF binary doesn't require more RAM to run than a regular ELF binary, because the operating system decides which chunk of the file is needed to run on the current system and ignores the ELF objects of the other architectures. This also means that the entire FatELF file does not have to be read (except for kernel modules), so the disk bandwidth overhead is minimal.
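To make the container idea concrete, here is a minimal sketch of a fat container in Python. The layout (a `TOYF` magic number, a 32-bit record count, and records holding a machine id, offset, and size) is a hypothetical toy format invented for this example; it is not the real FatELF on-disk format, and the functions `glue` and `load_for` are illustrative names only. What it tries to show is the general technique: a small table at the start, aligned payloads after it, and a loader that seeks straight to the one payload it needs.

```python
import io
import struct

# Hypothetical toy layout, NOT the real FatELF format.
MAGIC = b"TOYF"
HEADER = struct.Struct("<4sI")   # magic, number of records
RECORD = struct.Struct("<IQQ")   # machine id, payload offset, payload size
ALIGN = 4096                     # assumed alignment for each payload


def align_up(n, align=ALIGN):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def glue(binaries):
    """Pack a list of (machine_id, payload_bytes) into one container."""
    table_size = HEADER.size + RECORD.size * len(binaries)
    offset = align_up(table_size)
    records, body = [], bytearray()
    for machine, payload in binaries:
        records.append(RECORD.pack(machine, offset, len(payload)))
        next_offset = align_up(offset + len(payload))
        body += payload
        body += b"\0" * (next_offset - offset - len(payload))  # padding
        offset = next_offset
    table = HEADER.pack(MAGIC, len(binaries)) + b"".join(records)
    return table + b"\0" * (align_up(table_size) - table_size) + bytes(body)


def load_for(stream, machine):
    """Read only the table, then seek to the payload for one machine id."""
    magic, count = HEADER.unpack(stream.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not a toy fat container")
    for _ in range(count):
        rec_machine, offset, size = RECORD.unpack(stream.read(RECORD.size))
        if rec_machine == machine:
            stream.seek(offset)
            return stream.read(size)
    raise LookupError(f"no payload for machine id {machine}")


# The machine ids 3 and 62 are the ELF e_machine values for i386 and x86_64.
container = glue([(3, b"i386 code"), (62, b"x86_64 code")])
print(load_for(io.BytesIO(container), 62))
```

The loader touches only the table and one payload, which is why the other architectures cost disk space but neither memory nor much disk bandwidth.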
On the project's website, Ryan lists a lot of reasons why someone would use FatELF. Some of them are rather far-fetched, such as:
Another benefit in the same vein is that third party packages no longer have to publish multiple packages for different architectures. An obvious critique is that this multiplies the needed disk space and bandwidth if FatELF is used systematically.
However, there is something to be said for FatELF as a means to abstract away architecture differences for end-users. For example, install scripts for proprietary Linux software, such as the scripts for the graphics drivers by AMD and Nvidia, that select which driver to install based on the detected architecture, could be implemented as FatELF binaries. This seems like a cleaner solution than each software vendor implementing its own scripts and flaky logic to detect the right version. Web browser plug-ins are another type of binary that could be an interesting match for FatELF. In support of this idea, Ryan admits he made flaky shell script errors himself in the past:
Another use for FatELF is what Apple used its universal binary for: a transition to a new architecture. The 32-bit to 64-bit transition comes to mind, where FatELF makes it possible to no longer need separate /lib, /lib32 and /lib64 trees. It also makes it possible to get rid of IA-32 compatibility libraries: if you want to run a couple of 32-bit applications on a 64-bit system, you only need FatELF versions of the handful of packages needed by them. But more exotic transitions are also possible, for example when the ELF OSABI (Operating System Application Binary Interface) used by the system changes, or for CPUs that can handle different byte orders.
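The properties mentioned here (word size, byte order, OSABI) are all recorded in the first bytes of any ordinary ELF file, which is what makes it possible to tell embedded binaries apart. As a rough sketch, the Python function below reads those fields from a standard ELF header; `describe_elf` and the small lookup tables are names made up for this example, and only a few machine types are listed.

```python
import struct

# Values from the standard ELF header definition; the tables are deliberately
# incomplete and only cover the architectures discussed in this article.
ELF_CLASS = {1: "32-bit", 2: "64-bit"}
ELF_DATA = {1: "little-endian", 2: "big-endian"}
ELF_MACHINE = {3: "i386", 40: "arm", 62: "x86_64"}


def describe_elf(header):
    """Describe an ELF file given at least its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident bytes 4..8: class, data encoding, version, OSABI, ABI version
    ei_class, ei_data, _ei_version, ei_osabi, ei_abiversion = header[4:9]
    endian = "<" if ei_data == 1 else ">"
    # e_machine is a 16-bit field at offset 18, stored in the file's byte order
    (e_machine,) = struct.unpack_from(endian + "H", header, 18)
    return {
        "word_size": ELF_CLASS.get(ei_class, "unknown"),
        "byte_order": ELF_DATA.get(ei_data, "unknown"),
        "osabi": ei_osabi,
        "osabi_version": ei_abiversion,
        "machine": ELF_MACHINE.get(e_machine, str(e_machine)),
    }


# A hand-built header for a 64-bit little-endian x86_64 executable.
sample = b"\x7fELF" + bytes([2, 1, 1, 0, 0]) + bytes(7) + struct.pack("<HH", 2, 62)
print(describe_elf(sample))
```

On a Linux system the same function can be pointed at a real file, for example by passing it the first 20 bytes read from an installed binary.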
Status
At the moment, Ryan has written a file format specification and documentation for FatELF. To make the fat binary concept possible on Linux, he created patches for the Linux kernel to support FatELF, and he also adapted the file command to recognize FatELF files, the binutils commands to allow GCC to link against a FatELF shared library, and gdb to be able to debug FatELF binaries. The patches are stored in a Mercurial repository "until they have been merged into the upstream project". The repository also hosts some tools to manipulate FatELF binaries, which are zlib-licensed.
One of the FatELF tools is fatelf-extract, which lets the user extract a specific ELF binary from a FatELF file, e.g. the x86_64 one. The fatelf-split command extracts all embedded ELF binaries, ending up with files like my_fatelf_binary-i386 and my_fatelf_binary-x86_64. The fatelf-info command reports interesting information about a FatELF file. A tool for developers is fatelf-glue, which will glue ELF binaries together, because GCC currently can't build FatELF binaries. You just have to build each ELF binary separately and then create a FatELF file of them.
As a proof-of-concept, Ryan created a VMware virtual machine image of Ubuntu 9.04 where almost every binary and library is a FatELF file with x86 and x86_64 support. The image can be downloaded and run in VMware Workstation or VMware Player to try the FatELF functionality. But this is not the regular use case. When FatELF is used, it's probably only for a handful of applications. FatELF files also coexist fine with ELF binaries: a FatELF binary can load ELF shared libraries and vice versa.
Relatively simple implementation
Ryan recalls the real point of inspiration for FatELF, a thread on the mailing list of the installer program MojoSetup. On May 20 2007, he writes on this list:
Two years later, Ryan has implemented this idea:
So after a few weeks of work in his spare time, Ryan got a working fat binary implementation for Linux. In contrast, building the virtual machine proof-of-concept literally took days, because it took a lot of work to automate. Ryan also spent a lot of time preparing to post the kernel patches:
Reception
Overall, the patch that allows the Linux kernel to load a FatELF file was received quite positively, but with some questions. For example, Jeremy Fitzhardinge asked why Ryan made it ELF-specific:
Later in the discussion, Jeremy adds that a generic approach would allow the last executable in the file to be a shell script. If no other format was supported, this shell script would then be executed, doing something like displaying a useful message. Ryan seems unsure that the added flexibility is worth the extra complications, although he admitted that he would have chosen this route if other executable formats like a.out files "were still in widespread use and actively competed with ELF for mindshare." He also thinks it should be possible to support other executable formats in the existing FatELF format.
Some reactions to the patch that allows kernel modules to be FatELF binaries are less positive. For example, Jeremy objected to this because it would only encourage more binary modules. Ryan understands his concern, but answered: "I worry about refusing to take steps that would aid free software developers in case it might help the closed-source people, too." However, Jeremy didn't see it that way, casting doubt on the use case of FatELF kernel modules:
I don't see much upside in making it "easier" to distribute binary-only open source drivers separately. (It wouldn't help that much, in the end; the modules would still be compiled for some finite set of kernels, and if the user wants to use something else they're still stuck.)
Moreover, even for proprietary kernel modules the use case is not that compelling. Companies like Nvidia have to distribute modules for multiple kernel versions. If the OSABI version doesn't change, they can't use FatELF to pack together multiple drivers for this purpose. So, all in all, FatELF support for kernel modules seems a bit dubious.
In another discussion, Rayson Ho found that Apple (NeXT, actually) has patented the technologies behind universal binaries, as a "method and apparatus for architecture independent executable files" (#5432937 and #5604905). Something that may be considered prior art is the mix of 32-bit and 64-bit object files in a single archive on AIX, Rayson thinks. David Miller adds another possible piece of prior art: TILO, a variant of the Sparc SILO boot loader, that packs a 32-bit and a 64-bit Linux kernel into one file and figures out which one to actually boot depending on the machine it is running on. Rayson doubts this counts, however, because the project was started in 1995 or 1996, while NeXT's patent filing is from 1993. Ryan also entered the discussion and clarified that FatELF has a few fields that Apple's format doesn't, so the flow chart in the patent isn't the same. However, it's not clear yet if Ryan should be concerned and if so, which changes he should make to work around the patent.
The future
There are still a lot of things to do. Patches for module-init-tools, glibc (for loading shared FatELF libraries), and elfutils still have to be written. And the patches for binutils and gdb still have to be submitted, Ryan said:
Ryan even thinks about embedding binaries from other UNIXes into a FatELF file. He mentions FreeBSD, OpenBSD, NetBSD and OpenSolaris. In principle, each operating system using ELF files for its binaries could be supported. In addition to the ones mentioned, this also includes DragonFly BSD, IRIX, HP-UX, Haiku, and Syllable. The implementations should not be difficult, according to Ryan:
The support for other operating systems will make it possible to ship one file that works across Linux and FreeBSD, for example, without a platform compatibility layer. This could also be an interesting feature for hybrid Debian GNU/Linux and Debian GNU/kFreeBSD binaries.
The biggest hurdle that FatELF is facing now is adoption pains, Ryan explains:
Another disadvantage is the difficulty of creating fat binaries in build systems. For example, Erik de Castro Lopo writes about this on his blog. According to Ryan, making the build systems handle this situation cleanly still needs some work. He expects the most popular way to build FatELF files will be to do two totally independent builds and glue them together instead of rethinking autoconf and such.
Conclusion
While a universal binary seems much less interesting for Linux than for Mac OS X, because most software in Linux is installed from within a package manager that knows the architecture, the concept is interesting for proprietary Linux software such as games. For a non-expert user, it's not evident whether their processor is 32-bit or 64-bit. A FatELF download embedding both the x86 and x86_64 binaries may be a good solution for this problem. And if ARM-based smartbooks become more popular, an x86/x86_64/arm FatELF binary may be the perfect way to distribute a binary that works on 32-bit Intel Atom netbooks, 64-bit Intel computers and ARM smartbooks.
| Index entries for this article | |
|---|---|
| GuestArticles | Vervloesem, Koen |
