May 1, 2013
This article was contributed by Andrew Shewmaker
In 2011, LWN reported on
the x32 system call ABI for the Linux kernel.
Now x32 is officially supported by a number of upstream projects and has had some time to
mature. As x32 begins finding its way into distributions, more people will be able
to try it out on real world applications. But what kind of applications have the
potential for the greatest benefit? And how well do the distributions support
x32? This article will address those questions and detail some ways to
experiment with the x32 ABI.
The x32 ABI uses the same registers and floating-point hardware as the
x86-64 ABI, but with 32-bit pointers and long variables as on ia32
(32-bit x86). On
the downside, fully supporting x32 means having a third set of libraries.
However, its assembly code is more efficient than ia32's. Additionally, if an
application uses many pointers and requires less than 4GB of RAM, then x32 can
be a big win over x86-64. It can theoretically double the number of pointers
held in cache, reducing misses, and halve the amount of system memory used to
store pointers, reducing page faults.
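The pointer-size argument can be made concrete with a small sketch. It uses Python's ctypes module to model a hypothetical doubly linked list node holding two pointers and a long, laid out first with the 8-byte pointers and long of x86-64 and then with the 4-byte ones of x32. The node type, the fixed-width integer stand-ins for pointers, and the 64-byte cache line are assumptions made for illustration; none of them come from the x32 project.

```python
import ctypes

CACHE_LINE_BYTES = 64  # assumed cache-line size, typical of x86-64 processors


class NodeLP64(ctypes.Structure):
    """Hypothetical list node as laid out under x86-64 (8-byte pointers and long)."""
    _fields_ = [
        ("next", ctypes.c_uint64),   # stand-in for a 64-bit pointer
        ("prev", ctypes.c_uint64),   # stand-in for a 64-bit pointer
        ("value", ctypes.c_int64),   # long is 64 bits under x86-64
    ]


class NodeX32(ctypes.Structure):
    """The same node as laid out under x32 (4-byte pointers and long)."""
    _fields_ = [
        ("next", ctypes.c_uint32),   # stand-in for a 32-bit pointer
        ("prev", ctypes.c_uint32),   # stand-in for a 32-bit pointer
        ("value", ctypes.c_int32),   # long is 32 bits under x32
    ]


def nodes_per_cache_line(node_type):
    """Return how many whole nodes of node_type fit in one cache line."""
    return CACHE_LINE_BYTES // ctypes.sizeof(node_type)


if __name__ == "__main__":
    for node_type in (NodeLP64, NodeX32):
        print(node_type.__name__,
              ctypes.sizeof(node_type), "bytes,",
              nodes_per_cache_line(node_type), "per cache line")
```

Under these assumptions the node shrinks from 24 bytes to 12, so a 64-byte cache line holds five whole nodes instead of two. Real structures that mix pointers with other data will see a smaller reduction.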
Benchmark results
The x32 project site lists
two SPEC benchmarks that show particularly good benefits.
MCF,
a network simplex algorithm making heavy use of pointers, shows x32 with a 40%
advantage over x86-64.
Crafty,
a chess program made up predominantly of integer code, shows x32 with a 40%
advantage over ia32. In H. Peter Anvin's 2011 Linux Plumbers Conference talk
x32
— a native 32-bit ABI for x86-64 [PPT],
he lists embedded devices as one of the primary use cases. He reports a 5-10%
improvement in the SPEC CPU 2K/2006 INT geomean over ia32 and x86-64, and a 5-11%
advantage in the SPEC CPU 2K/2006 FP geomean over ia32, but none over x86-64.
The MCF results have been called out as the most impressive, but the majority
of users may care more about video codec performance. The news is good there
too. H.264 performed
35% better on x32 than on ia32 and 15-20% better than on x86-64.
Intel gives additional SPEC and
Embedded Microprocessor Benchmark Consortium
(EEMBC) performance numbers in a
paper describing the work done to support x32
in GCC and their own compiler.
Parser,
a syntactic parser of English, shows an approximately 17% improvement for x32 over
both ia32 and x86-64. And
Eon,
a probabilistic ray tracer, performs almost 30% better than ia32. In general,
GCC's floating-point code generation isn't as good on x32 as on x86-64. However,
Intel's compiler does at least as well for x32 as x86-64. Take
Facerec,
a facial recognition algorithm which compares graph structures with vertices
containing information computed from fast Fourier transforms, where x32 performs
almost 40% better than x86-64.
Scientific simulations are often limited by memory latency and capacity.
Nathalie Rauschmayr has presented results showing how
x32 benefits some of CERN's applications [PDF]. These applications use millions of
pointers and, when compiled as x32, see a reduction in physical memory use of 10-30%.
While some ran no faster, a couple sped up by 10-15%. In those cases, the
underlying cause for the improvement is a 10-38% reduction in page faults.
Loop unrolling is an important optimization for scientific code and x32 could
use some work in that area, but H.J. Lu, the driving force behind x32, doesn't
plan to do that work himself.
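Readers who want to check whether page faults explain a speedup in their own application can compare fault counts for the x86-64 and x32 builds. The following is a minimal sketch using Python's standard resource module on a Unix-like system; the helper name is hypothetical, and this is not necessarily how the CERN numbers were gathered.

```python
import resource
import subprocess
import sys


def count_page_faults(command):
    """Run command to completion and return its (minor, major) page faults.

    The counts come from the difference in RUSAGE_CHILDREN totals, so any
    other child process reaped in the meantime would be counted as well.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    subprocess.run(command, check=True)
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return (after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)


if __name__ == "__main__":
    # Example workload: a child interpreter that touches about 50MB of memory.
    # Replace this with the x86-64 and x32 builds of the program under test.
    workload = [sys.executable, "-c", "data = bytearray(50_000_000)"]
    minor, major = count_page_faults(workload)
    print("minor faults:", minor, "major faults:", major)
```

Running the helper once per build on the same input gives a rough comparison; repeated runs are advisable because fault counts vary with system state.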
Distribution support
As Nathalie notes, a working x32 environment doesn't come out of the box in a
Red Hat– or Debian–based Linux distribution just yet. Currently, a person must
build glibc with an intermediate GCC build before building a full GCC. Fortunately,
distributions are beginning to provide an easier way to experiment with x32. Two
of the first projects to feature a release with official x32 support are
Gentoo and the Yocto
meta-distribution, and a couple of traditional distributions are
following to one degree or another.
Gentoo's first x32 release candidate came out last June;
support is now integrated into the main tree. You can download and install
it like any other port. When I installed it, I ran
into a problem creating an initramfs due to a BusyBox linking error. Configuring
the kernel with everything built-in (eliminating the need for an initramfs
altogether) worked around this problem, resulting in a working x32 environment. The remaining
x32 issues are largely in multimedia packages' hand-optimized assembly
routines. As soon as Gentoo's infrastructure team upgrades the kernel on a
development box, Mike Frysinger plans on announcing the availability of an x32
environment for upstream multimedia projects to port their optimized assembly to
the x32 ABI.
I have not tried the Linux Foundation's Yocto project. It is used for
creating custom embedded distributions and has
integrated x32 support
into its core and fixed most of its recipes.
x32 won't be an official ABI for Debian until after the next release,
Wheezy. Daniel Schepler, who oversees the new port, told me that it is in
reasonable shape. Grub2 doesn't currently build, so an x32-only install is not
possible yet. But Daniel provides directions to install into a chroot
environment from his private archive on the
Debian x32 Port wiki. Newer
packages are available from Debian Ports.
Simpler desktop environments like XFCE are installable, and KDE is almost ready.
However, more work needs to be done before GNOME is installable. Its biggest
blocker is that Iceweasel currently doesn't build. Also, LibreOffice and PHP
are not ready yet.
My experience with Debian went well. After a minimal installation
from the stable repository and upgrading to Wheezy, I installed one of Daniel's
kernels and used the debootstrap tool to quickly create an x32 chroot from
Daniel's personal archive. After that, I was able to point to the most recent
packages on the Debian Ports archive. Installing a chroot directly from those
involves learning how to use the multistrap tool, and I stopped short of that.
Ubuntu shipped with
limited x32 support in 13.04. However, it isn't planning on providing a full
port of x32 packages like Debian. Ubuntu is simply providing a kernel with the
x32 syscalls and a functional
tri-arch toolchain with GCC and eglibc.
x32 hasn't found as much acceptance in other open-source projects. Fedora has
no plans for x32,
which makes RHEL support unlikely as well. Arch Linux has some
unofficial packages.
LLVM users can apply
patches from June 2012, but merging x32 support upstream looks unlikely at this
point. A team from Google inquired about it earlier this year, and they were
told that the patch submitter had not addressed the maintainers' concerns yet.
Some experiments
After reading Diego Elio "Flameeyes" Pettenò's series of posts on
debunking x32 myths,
I decided someone ought to compare x32 and x86-64 performance of the
B-heap simulator
mentioned in Poul-Henning Kamp's ACM article
"You're Doing It Wrong."
This example is interesting because Kamp created a data structure called a
"B-heap" to take advantage of
the large address space provided by 64-bit pointers. That would seem to
argue against using x32's smaller address space, but in
his article he also says, "The order of magnitude of difference obviously
originates with the number of levels of heap inside each VM page, so the
ultimate speedup will be on machines with small pointer sizes and big page
sizes."
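Kamp's remark about pointer and page sizes can be illustrated with a little arithmetic. The sketch below assumes that each heap element occupies one pointer-sized slot and that a page holds a single complete binary subtree; Kamp's actual B-heap layout differs in its details, so the figures are only a rough estimate and the function name is hypothetical.

```python
def heap_levels_per_page(page_bytes, pointer_bytes):
    """Estimate how many complete binary-heap levels fit in one VM page.

    A complete binary tree with k levels has 2**k - 1 nodes, so the answer
    is the largest k for which 2**k - 1 <= slots, i.e. floor(log2(slots + 1)).
    """
    slots = page_bytes // pointer_bytes
    return (slots + 1).bit_length() - 1


if __name__ == "__main__":
    for page_bytes in (4096, 2 * 1024 * 1024):
        for pointer_bytes in (8, 4):
            print(page_bytes, "byte page,", pointer_bytes, "byte pointers:",
                  heap_levels_per_page(page_bytes, pointer_bytes), "levels")
```

With 4KB pages this estimate gives nine levels for 8-byte pointers and ten for 4-byte pointers, so a traversal from root to leaf touches fewer pages when pointers are smaller.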
Kamp's B-heap simulator compares the virtual memory performance of a traditional
binary heap to his B-heap. A port from BSD to Linux went smoothly, and the
results are identical. No changes were needed to compile the simulator in my
Debian x32 chroot. Its virtual memory footprint was reduced by 19% and its
resident set size was 88% smaller. The x32 build always outperformed the x86-64 build,
especially for larger problem sizes. In particular, the 500K case ran 15% faster. It's
important to note that the simulator itself doesn't use much memory, so it would
be interesting to see how a real-world use of a B-heap, as in the
Varnish Cache, would compare.
It will be interesting to see which other applications will benefit from x32. If
it shows as much promise for the embedded space as its creators suggest, high
performance computing as CERN shows, and web servers as my experiment indicates,
then desktop users in the middle of the computing spectrum might also benefit.
The key to that segment will likely be whether multimedia codecs (other than the
H.264 reference implementation tested by SPEC CPU 2006) see a substantial enough
improvement to justify developing and maintaining another assembly code
implementation of their core routines. I predict they will. Perhaps x32 will
prove its value in such a wide range of cases that it becomes the default for
many software packages. Perhaps we'll even see developers for other 64-bit
architectures such as IBM's
Power or upcoming ARM server chips propose similar ABIs.