October 28, 2009
This article was contributed by Koen Vervloesem
One interesting feature of Mac OS X is the concept of a Universal
Binary, a single binary file that runs natively on both PowerPC and Intel
platforms. Professional game porter Ryan Gordon got sick of Mac developers
pointing out that Linux doesn't have anything like that, so he did
something about it and wrote FatELF.
FatELF brings the idea of single binaries supporting multiple architectures
to Linux.
Universal binaries in Mac OS X
Apple introduced the Universal Binary file format in 2005 to ease the
transition of the Mac platform from the PowerPC architecture to the Intel
architecture. The solution was to include both PowerPC and x86 versions of
an application in one "fat binary". If a universal binary is run by Mac OS
X, the operating system executes the appropriate section depending on the
architecture in use. The big advantage was that Mac developers could
distribute one executable of their software, so that end-users wouldn't
have to worry about which version to download. Later, Apple went even
further and allowed four-architecture binaries: 32 and 64 bit for both
Intel and PowerPC.
This was not the first time Apple performed such a trick: in 1994 the
company transitioned from Motorola 68k processors to PowerPC and introduced
a "fat binary" which included executable code for both platforms. Moreover,
NeXTSTEP, the predecessor of Mac OS X, had a fat binary file format (called
"Multi-Architecture Binaries") which supported Motorola 68k, Intel x86, Sun
SPARC, and HP PA-RISC. So Apple knew what needed to be done when they chose
Intel as their new Mac platform. In fact, the Universal Binary format in
Mac OS X is essentially the same as NeXTSTEP's Multi-Architecture
Binaries. This was possible because Apple uses NeXTSTEP's Mach-O as the
native object file format in Mac OS X.
A fat elf for Linux
Ryan Gordon is a well-known game porter: he has created ports of
commercial games and other software to Linux and Mac OS X. Notable examples
of his work are the Linux ports of the Unreal Tournament series, some of
the Serious Sam Series, the Postal Series, Devastation and Prey, but also
non-gaming software such as Google Earth and Second Life. With this
experience, he knows a lot about both Mac OS X and Linux, which makes him
well suited to implement the Mac OS X universal binary functionality in
Linux.
His FatELF file format embeds multiple Linux binaries for different
architectures in a single file. FatELF is actually a simple container
format: it adds some accounting information at the start of the file and
then appends all the ELF (Executable and Linking Format) binaries after it,
adding padding for alignment. FatELF can be used for both executable files
and shared libraries (.so files).
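The container idea can be sketched in a few lines of Python. The layout below is a simplified, hypothetical one invented for illustration (the magic value, field sizes, architecture names, and the 4096-byte alignment are all assumptions); it is not the actual FatELF specification. It only shows the general shape: a small table of records up front, followed by the padded binaries.

```python
import struct

# Hypothetical toy layout (NOT the real FatELF on-disk format):
#   header : 4-byte magic, 1-byte record count, 3 padding bytes
#   record : 16-byte architecture name (NUL padded), 8-byte offset, 8-byte size
MAGIC = b"TOYF"
HEADER = struct.Struct("<4sB3x")
RECORD = struct.Struct("<16sQQ")
ALIGN = 4096  # assumed alignment of each embedded binary


def align_up(n, alignment=ALIGN):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def glue(binaries):
    """Pack {arch_name: bytes} into one container: header, records, padded blobs."""
    table_size = HEADER.size + RECORD.size * len(binaries)
    out = bytearray(HEADER.pack(MAGIC, len(binaries)))
    blobs = bytearray()
    offset = align_up(table_size)  # first binary starts after the padded table
    for arch, data in binaries.items():
        out += RECORD.pack(arch.encode("ascii"), offset, len(data))
        padded = align_up(len(data))
        blobs += data + b"\0" * (padded - len(data))
        offset += padded
    out += b"\0" * (align_up(table_size) - len(out))  # pad the record table
    return bytes(out + blobs)


def extract(container, arch):
    """Return the embedded binary for arch, touching only its own slice."""
    magic, count = HEADER.unpack_from(container, 0)
    if magic != MAGIC:
        raise ValueError("not a toy fat container")
    for i in range(count):
        name, offset, size = RECORD.unpack_from(
            container, HEADER.size + i * RECORD.size)
        if name.rstrip(b"\0").decode("ascii") == arch:
            return container[offset:offset + size]
    raise KeyError(arch)
```

Because each record stores an offset and a size, a loader only needs to read the small table and the one slice that matches the running system; the binaries for other architectures are never touched.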
An obvious downside of FatELF is that the executable's size gets
multiplied by the number of embedded ELF architectures. However, this only
holds for the executable files and libraries; common non-executable
resources such as images and data files are just shipped as they are
without FatELF. For example, a game that ships with hundreds of megabytes
of data will, relatively, become only slightly larger.
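To put rough numbers on this, the following sketch computes the relative growth of a package. The figures are made up for illustration, and the calculation assumes that each architecture's build of the executable is about the same size and ignores the small header and padding overhead.

```python
def fat_growth(exe_mb, data_mb, n_arch):
    """Relative size increase of a package when its executable goes fat.

    Assumes every architecture's build of the executable is about exe_mb
    in size and that data files are shipped once, outside FatELF.
    """
    thin_total = exe_mb + data_mb
    fat_total = exe_mb * n_arch + data_mb
    return fat_total / thin_total - 1.0


# Assumed figures: a 10 MB executable shipped with 500 MB of game data.
growth = fat_growth(exe_mb=10, data_mb=500, n_arch=2)
print(f"{growth:.1%}")  # prints 2.0%
```

With these assumed sizes, doubling the executable adds 10 MB to a 510 MB package, or about 2%.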
Moreover, a FatELF binary doesn't require more RAM to run than a regular
ELF binary, because the operating system decides which chunk of the file is
needed to run on the current system and ignores the ELF objects of the
other architectures. This also means that the entire FatELF file does not
have to be read (except for kernel modules), so the disk bandwidth overhead
is minimal.
On the project's website, Ryan lists a lot of reasons why someone would
use FatELF. Some of them are rather far-fetched, such as:
Distributions no longer need to have separate
downloads for various platforms. Given enough disc space, there's no reason
you couldn't have one DVD ISO file that installs an x86-64, x86, PowerPC,
SPARC, and MIPS system, doing the right thing at boot time. You can remove
all the confusing text from your website about "which installer is right
for me?"
Another benefit in the same vein is that third-party packagers no longer
have to publish multiple packages for different architectures. An obvious
criticism is that systematic use of FatELF multiplies the disk space and
bandwidth needed.
However, there is something to be said for FatELF as a means to abstract
away architecture differences for end-users. For example, install scripts
for proprietary Linux software, such as the scripts for the graphics
drivers by AMD and Nvidia, that select which driver to install based on the
detected architecture, could be implemented as FatELF binaries. This seems
like a cleaner solution than each software vendor implementing his own
scripts and flaky logic to detect the right version. Web browser plug-ins
are another type of binary that could be an interesting match for
FatELF. In support of this idea, Ryan admits that he has made flaky
shell-script errors himself in the past:
Many years ago, I shipped a game that ran on i686 and
PowerPC Linux. I could not have predicted that one day people would be
running x86_64 systems that would be able to run the i686 version, so doing
something like: exec $(uname -m)/mygame would fail, and there's
really no good way to future-proof that sort of thing. As that game now
fails to start on x86_64 systems, it would have been better to just ship
for i686 and not try to select a CPU arch.
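The failure mode is easy to reproduce. The sketch below contrasts the naive lookup with one that consults a compatibility table; the table, the function names, and the architecture strings are illustrative assumptions, not anything shipped with the game or with FatELF.

```python
import platform

# Assumed compatibility table: which binary architectures a given machine
# type can run, in order of preference. Illustrative, not exhaustive.
COMPATIBLE = {
    "x86_64": ["x86_64", "i686"],
    "i686": ["i686"],
    "ppc64": ["ppc64", "ppc"],
    "ppc": ["ppc"],
}


def pick_naive(machine, shipped):
    """Equivalent of `exec $(uname -m)/mygame`: exact match or nothing."""
    return machine if machine in shipped else None


def pick_arch(machine, shipped):
    """Pick the first shipped architecture that the machine can run."""
    for candidate in COMPATIBLE.get(machine, [machine]):
        if candidate in shipped:
            return candidate
    return None


def pick_for_this_machine(shipped):
    """Apply pick_arch to the machine type reported by the running system."""
    return pick_arch(platform.machine(), shipped)


shipped = {"i686", "ppc"}
print(pick_naive("x86_64", shipped))  # None: the game fails to start
print(pick_arch("x86_64", shipped))   # i686
```

Even the table-driven version only knows about the architectures its author thought of when it was written, which is the future-proofing problem Ryan describes.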
Another use for FatELF is what Apple used its universal binary for: a
transition to a new architecture. The 32-bit to 64-bit transition comes to
mind, where FatELF removes the need for separate
/lib, /lib32 and /lib64 trees. It also makes it
possible to get rid of IA-32 compatibility libraries: if you want to run a
couple of 32-bit applications on a 64-bit system, you only need FatELF
versions of the handful of packages needed by them. But more exotic
transitions are also possible, for example when the ELF OSABI (Operating
System Application Binary Interface) used by the system changes, or for
CPUs that can handle different byte orders.
Status
At the moment, Ryan has written a file format
specification and documentation for FatELF. To make the fat binary concept
possible on Linux, he created patches for the Linux kernel to support
FatELF, and he also adapted the file command to recognize FatELF
files, the binutils commands to allow GCC to link against a FatELF
shared library, and gdb to be able to debug FatELF
binaries. The patches are stored in a Mercurial repository
"until they have been merged into the upstream project". The
repository also hosts some tools to manipulate FatELF binaries, which are
zlib-licensed.
One of the FatELF tools is fatelf-extract, which lets the user
extract a specific ELF binary from a FatELF file, e.g. the x86_64 one. The
fatelf-split command extracts all embedded ELF binaries, ending up
with files like my_fatelf_binary-i386 and
my_fatelf_binary-x86_64. The fatelf-info command reports
interesting information about a FatELF file. A tool for developers is
fatelf-glue, which glues ELF binaries together; it is needed because GCC
currently can't build FatELF binaries. Each ELF binary has to be built
separately, and the results are then combined into one FatELF file.
As a proof-of-concept, Ryan created a VMware virtual machine image of Ubuntu
9.04 where almost every binary and library is a FatELF file with x86 and
x86_64 support. The image can be downloaded and run in VMware Workstation
or VMware Player to try the FatELF functionality. But this is not the
regular use case. When FatELF is used, it's probably only for a handful of
applications. FatELF files also coexist fine with ELF binaries: a FatELF
binary can load ELF shared libraries and vice versa.
Relatively simple implementation
Ryan recalls the real point of inspiration for FatELF, a thread on the
mailing list of the installer program MojoSetup. On May 20, 2007, he wrote
on this list:
I'd love someone to extend the ELF format so that it
supports "fat" binaries, like Apple's Mach-O format does for the
PowerPC/Intel "Universal" binaries...but that would require coordination
and support at several points in the system software stack.
Two years later, Ryan has implemented this idea:
I have a long list of things that Linux should
blatantly steal from Mac OS X, and given infinite time, I'll implement them
all. FatELF happens to be something on that list that is directly useful to my
work as a game developer that also happens to be a simple project. I think
the changes required to the system are pretty small for what could be good
benefits to Unix as a whole.
So after a few weeks of work in his spare time, Ryan got a working fat
binary implementation for Linux. In contrast, building the virtual machine
proof-of-concept literally took days, because it took a lot of work to
automate. Ryan also spent a lot of time preparing to post the kernel
patches:
I was so intimidated by the kernel mailing list, that
I spent a disproportionate amount of time researching etiquette, culture,
procedure. I didn't want to offend anyone or waste their time.
Reception
Overall, the patch that allows the
Linux kernel to load a FatELF file was received quite positively, but
with some questions. For example, Jeremy Fitzhardinge asked why
Ryan made it ELF-specific:
The idea seem interesting, but does it need to be
ELF-specific? What about making the executable a simple archive file
format (possibly just an "ar" archive?) which contains other executables.
The archive file format would be implemented as its own binfmt,
and the internal executables could be arbitrary other executables. The
outer loader would just try executing each executable until one works (or
it runs out).
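Jeremy's proposal concerns an in-kernel binary format handler, but the "try each one until one works" logic can be approximated in user space. The following is a minimal sketch under that assumption; the function name and the injectable exec_fn parameter are hypothetical and exist only to make the example testable.

```python
import errno
import os


def exec_first_working(candidates, argv, exec_fn=os.execv):
    """Try each embedded executable in order until one can be executed.

    With the real os.execv, a successful call replaces the current process
    and never returns. The return statement is only reached when a stand-in
    exec_fn is supplied, for example in a test.
    """
    for path in candidates:
        try:
            exec_fn(path, argv)
            return path
        except OSError as err:
            # ENOEXEC ("Exec format error") means the system cannot run
            # this format, so move on to the next candidate.
            if err.errno != errno.ENOEXEC:
                raise
    raise OSError(errno.ENOEXEC, "no embedded executable could be run")
```

Placing a shell script last in the candidate list gives the fallback behavior described in the follow-up discussion: if nothing else can run, the script runs and can print a helpful message.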
Later in the discussion, Jeremy added that a generic approach would allow the
last executable in the file to be a shell script. If none of the other
embedded executables could run on the system, this shell script would then be
executed and could, for example, display a useful message. Ryan seems
unsure that the added flexibility is worth the extra complications, although
he admitted that he would have chosen this route if other executable
formats like a.out files "were still in widespread use and actively
competed with ELF for mindshare." He also thinks it should be
possible to support other executable formats in the existing FatELF
format.
Some reactions to the patch that allows kernel
modules to be FatELF binaries are less positive. For example, Jeremy
objected to this because it would only encourage
more binary modules. Ryan understands his concern, but answered:
"I worry about refusing to take steps that would aid free software
developers in case it might help the closed-source people, too."
However, Jeremy didn't see it
that way, casting doubt on the use case of FatELF kernel modules:
Any open source driver should be encouraged to be
merged with mainline Linux so there's no need to distribute them
separately. With the staging/ tree, that's easier than ever.
I don't see much upside in making it "easier" to distribute binary-only
open source drivers separately. (It wouldn't help that much, in the end;
the modules would still be compiled for some finite set of kernels, and if
the user wants to use something else they're still stuck.)
Moreover, even for proprietary kernel modules the use case is not that
compelling. Companies like Nvidia have to distribute modules for multiple
kernel versions. If the OSABI version doesn't change, they can't use FatELF
to pack together multiple drivers for this purpose. So, all in all, FatELF
support for kernel modules seems a bit dubious.
In another discussion, Rayson Ho found that Apple (NeXT, actually) has
patented the technologies behind universal binaries, as a "method and
apparatus for architecture independent executable files" (#5432937 and
#5604905). Rayson thinks that the mix of 32-bit and 64-bit object files in a
single archive on AIX may be considered prior art. David Miller adds another
possible piece of prior art: TILO, a variant of the Sparc SILO boot loader,
which packs a 32-bit and a 64-bit Linux kernel into one file and figures out
which one to actually boot depending on the machine it is running on. Rayson
doubts this counts, however, because the project was started in 1995
or 1996, while NeXT's patent filing is from 1993. Ryan also entered the
discussion and clarified that FatELF has a few fields that Apple's
format doesn't, so the flow chart in the patent isn't the same. However,
it's not clear yet if Ryan should be concerned and if so, which changes he
should make to work around the patent.
The future
There are still a lot of things to do. Patches for
module-init-tools, glibc (for loading shared FatELF
libraries), and elfutils still have to be written. And the patches
for binutils and gdb still have to be submitted, Ryan said:
I've only submitted the kernel patches. If the kernel
community is ultimately uninterested, there's not much point in bothering
the binutils people. The patches for all the other parts are sitting in my
Mercurial repository. If FatELF makes it into Linus's mainline, several
other mailing lists will get patches sent to them right away.
Ryan is even thinking about embedding binaries from other UNIXes into a
FatELF file. He mentions FreeBSD, OpenBSD, NetBSD and OpenSolaris. In
principle, any operating system that uses ELF files for its binaries could be
supported. In addition to the ones mentioned, this also includes DragonFly BSD,
IRIX, HP-UX, Haiku, and Syllable. The implementations should not be
difficult, according to Ryan:
You have to touch several parts of the system, but
the changes you have to make to them are reasonably straightforward, so
you'll probably spend more time getting comfortable with their code than
patching it. And then twice as long trying to figure out how to boot a
custom kernel and libc.
The support for other operating systems will make it possible to ship
one file that works across Linux and FreeBSD, for example, without a platform
compatibility layer. This could also be an interesting feature for hybrid
Debian GNU/Linux and Debian
GNU/kFreeBSD binaries.
The biggest hurdle that FatELF now faces is adoption, Ryan
explains:
If Linus applies it in the 2.6.33 merge window and
every other project puts the patches into revision control, too, we're
looking at maybe 6 to 12 months before distributions pick it up and some
time later before you can count on people running those
distributions.
Another difficulty is creating fat binaries in build
systems. For example, Erik de Castro Lopo writes
about this on his blog. According to Ryan, making the build systems
handle this situation cleanly still needs some work. He expects that the most
popular way to build FatELF files will be to do two totally independent
builds and glue them together, instead of rethinking autoconf and
such.
Conclusion
While a universal binary seems much less interesting for Linux than for
Mac OS X, because most software in Linux is installed from within a package
manager that knows the architecture, the concept is interesting for
proprietary Linux software such as games. For a non-expert user, it's not
obvious whether their processor is 32-bit or 64-bit. A FatELF download
embedding both the x86 and x86_64 binaries may be a good solution for this
problem. And if ARM-based smartbooks become more popular, an
x86/x86_64/arm FatELF binary may be the perfect way to distribute a binary
that works on 32-bit Intel Atom netbooks, 64-bit Intel computers and ARM
smartbooks.