x32 ABI support by distributions
In 2011, LWN reported on the x32 system call ABI for the Linux kernel. Now x32 is officially supported by a number of upstream projects and has had some time to mature. As x32 begins finding its way into distributions, more people will be able to try it out on real-world applications. But what kind of applications have the potential for the greatest benefit? And how well do the distributions support x32? This article will address those questions and detail some ways to experiment with the x32 ABI.
The x32 ABI uses the same registers and floating point hardware as the x86-64 ABI, but with 32-bit pointers and long variables as on ia32 (32-bit x86). On the downside, fully supporting x32 means having a third set of libraries. However, its assembly code is more efficient than that of ia32. Additionally, if an application uses many pointers and requires less than 4GB of RAM, then x32 can be a big win over x86-64. It can theoretically double the number of pointers held in cache, reducing misses, and halve the amount of system memory used to store pointers, reducing page faults.
Benchmark results
The x32 project site lists two SPEC benchmarks that show particularly good benefits. MCF, a network simplex algorithm making heavy use of pointers, shows x32 with a 40% advantage over x86-64. Crafty, a chess program made up predominantly of integer code, shows x32 with a 40% advantage over ia32. In H. Peter Anvin's 2011 Linux Plumbers Conference talk x32 — a native 32-bit ABI for x86-64 [PPT], he lists embedded devices as one of the primary use cases. He reports 5-10% improvement for the SPEC CPU 2K/2006 INT geomean over ia32 and x86-64, and 5-11% advantage for SPEC CPU 2K/2006 FP geomean over ia32, but none against x86-64. The MCF results have been called out as the most impressive, but the majority of users may care more about video codec performance. The news is good there too. H.264 performed 35% better on x32 than on ia32 and 15-20% better than on x86-64.
Intel gives additional SPEC and Embedded Microprocessor Benchmark Consortium (EEMBC) performance numbers in a paper describing the work done to support x32 in GCC and their own compiler. Parser, a syntactic parser of English, shows an approximately 17% improvement for x32 over both ia32 and x86-64. And Eon, a probabilistic ray tracer, performs almost 30% better than ia32. In general, GCC's floating-point code generation isn't as good on x32 as on x86-64. However, Intel's compiler does at least as well for x32 as x86-64. Take Facerec, a facial recognition algorithm which compares graph structures with vertices containing information computed from fast Fourier transforms, where x32 performs almost 40% better than x86-64.
Scientific simulations are often limited by memory latency and capacity. Nathalie Rauschmayr has presented results showing how x32 benefits some of CERN's applications [PDF]. These applications use millions of pointers and, when compiled as x32, see a reduction in physical memory use of 10-30%. While some ran no faster, a couple sped up by 10-15%. In those cases, the underlying cause for the improvement is a 10-38% reduction in page faults. Loop unrolling is an important optimization for scientific code and x32 could use some work in that area, but H.J. Lu, the driving force behind x32, doesn't plan to do that work himself.
Distribution support
As Nathalie notes, a working x32 environment doesn't come out of the box in a Red Hat– or Debian–based Linux distribution just yet. Currently, a person must build glibc with an intermediate GCC build before building a full GCC. Fortunately, distributions are beginning to provide an easier way to experiment with x32. Two of the first projects to feature a release with official x32 support are Gentoo and the Yocto meta-distribution, and a couple of traditional distributions are following to one degree or another.
Gentoo's first x32 release candidate came out last June; support is now integrated into the main tree. You can download and install it like any other port. When I installed it, I ran into a problem creating an initramfs due to a BusyBox linking error. Configuring the kernel with everything built-in (eliminating the need for an initramfs altogether) worked around this problem, resulting in a working x32 environment. The remaining x32 issues are largely in multimedia packages' hand-optimized assembly routines. As soon as Gentoo's infrastructure team upgrades the kernel on a development box, Mike Frysinger plans on announcing the availability of an x32 environment for upstream multimedia projects to port their optimized assembly to the x32 ABI.
I have not tried the Linux Foundation's Yocto project. It is used for creating custom embedded distributions and has integrated x32 support into its core, as well as fixing most of its recipes.
x32 won't be an official ABI for Debian until after the next release, Wheezy. Daniel Schepler, who oversees the new port, told me that it is in reasonable shape. Grub2 doesn't currently build, so an x32-only install is not possible yet. But Daniel provides directions to install into a chroot environment from his private archive on the Debian x32 Port wiki. Newer packages are available from Debian Ports. Simpler desktop environments like XFCE are installable, and KDE is almost ready. However, more work needs to be done before GNOME is installable. Its biggest blocker is that Iceweasel currently doesn't build. Also, LibreOffice and PHP are not ready yet.
My experience with Debian went well. After a minimal installation from the stable repository and upgrading to Wheezy, I installed one of Daniel's kernels and used the debootstrap tool to quickly create an x32 chroot from Daniel's personal archive. After that, I was able to point to the most recent packages on the Debian Ports archive. Installing a chroot directly from those involves learning how to use the multistrap tool, and I stopped short of that.
Ubuntu shipped with limited x32 support in 13.04. However, it isn't planning on providing a full port of x32 packages like Debian. Ubuntu is simply providing a kernel with the x32 syscalls and a functional tri-arch toolchain with GCC and eglibc.
x32 hasn't found as much acceptance in other open-source projects. Fedora has no plans for x32, which means it's unlikely for RHEL too. Arch Linux has some unofficial packages. LLVM users can apply patches from June 2012. But merging of x32 support looks unlikely at this point. A team from Google inquired about it earlier this year, and they were told that the patch submitter had not addressed the maintainers' concerns yet.
Some experiments
After reading Diego Elio "Flameeyes" Pettenò's series of posts on debunking x32 myths, I decided someone ought to compare x32 and x86-64 performance of the B-heap simulator mentioned in Poul-Henning Kamp's ACM article "You're Doing It Wrong." This example is interesting because Kamp created a data structure called a "B-heap" to take advantage of the large address space provided by 64-bit pointers. That would seem to argue against using x32's smaller address space, but in his article he also says, "The order of magnitude of difference obviously originates with the number of levels of heap inside each VM page, so the ultimate speedup will be on machines with small pointer sizes and big page sizes."
Kamp's B-heap simulator compares the virtual memory performance of a traditional binary heap to his B-heap. A port from BSD to Linux went smoothly, and the results are identical. No changes were needed to compile the simulator in my Debian x32 chroot. Its virtual memory footprint was reduced by 19% and its resident set size was 88% smaller. Performance of x32 always exceeded x86-64, especially for larger problem sizes. In particular, 500K ran 15% faster. It's important to note that the simulator itself doesn't use much memory, so it would be interesting to see how a real-world use of a B-heap, as in the Varnish Cache, would compare.
It will be interesting to see which other applications will benefit from x32. If it shows as much promise for the embedded space as its creators suggest, for high performance computing as CERN shows, and for web servers as my experiment indicates, then desktop users in the middle of the computing spectrum might also benefit. The key to that segment will likely be whether multimedia codecs (other than the H.264 reference implementation tested by SPEC CPU 2006) see a substantial enough improvement to justify developing and maintaining another assembly code implementation of their core routines. I predict they will. Perhaps x32 will prove its value in such a broad range of cases that it becomes the default for many software packages. Perhaps we'll even see developers for other 64-bit architectures such as IBM's Power or upcoming ARM server chips propose similar ABIs.
| Index entries for this article | |
|---|---|
| GuestArticles | Shewmaker, Andrew |
x32 ABI support by distributions
Posted May 2, 2013 13:27 UTC (Thu) by epa (subscriber, #39769) [Link]
x32 ABI support by distributions
Posted May 6, 2013 5:46 UTC (Mon) by butlerm (subscriber, #13312) [Link]
The issue with pointers is not that much different. For the majority of applications, 32 bit pointers are ample. Using 64 bit pointers where they aren't necessary is just pouring power and performance down the drain. A five percent average performance and efficiency gain across potentially billions of devices is nothing to be scoffed at. We could power whole cities with that kind of savings.
x32 ABI support by distributions
Posted May 2, 2013 14:55 UTC (Thu) by Flameeyes (guest, #51238) [Link]
Also, given that the performance of x32 depends vastly on the way the CPU has been designed (Intel makes it clear that one of the reasons why it performs so much better is that the 64-bit support in some models of Atom is terrible), the numbers make little to no sense without specifying on which CPU you're running them.
I'm still sceptical that all the work that we (including me, not at my desire) poured into getting this stuff to "work" is worth it. To give you an idea, Ruby still does not work on x32 because of assembly that does not work, either in 32- or 64-bit variants, when built on x32. You may call that minor, but I certainly don't.
x32 ABI support by distributions
Posted May 2, 2013 19:57 UTC (Thu) by deater (subscriber, #11746) [Link]
This is easy on those architectures because the 32bit to 64-bit ISA expansion really only increased register size, and didn't change other things (like the number of available registers) so didn't require some extra in-between architecture like x32.
Doing this with ARM will be a hassle, as ARM64 is a completely different architecture than ARM32. What documentation I have seen looks like they are assuming LP64 implementations, so a separate hack like x32 would likely be necessary.
It is odd ARM went from supporting a nice compact ISA like Thumb2 to a large 64-bit encoding in ARM64.
x32 ABI support by distributions
Posted May 24, 2013 8:52 UTC (Fri) by Matc (guest, #91112) [Link]
ARM has the same problem as x86: arm64 is completely different from arm (32-bit).
It has new instructions, different registers, ...
But because arm64 has instructions for working on 32-bit or 64-bit data/registers, it should be easy to do something like x32 on ARM.
In fact arm64 is not a traditional RISC instruction set like PPC, mips, sparc:
- the instruction size is 32 bits (and not 64 bits)
- it works on 32-bit or 64-bit data
- registers/instructions are different between native 32-bit mode and 64-bit mode.
x32 ABI support by distributions
Posted May 24, 2013 13:18 UTC (Fri) by deater (subscriber, #11746) [Link]
> mips, sparc :
> - the instruction size is 32 (and not 64 bits)
> - it works on 32 or 64 bits data
> - register/instruction are different between native 32 bits
> mode and 64 bits mode.
As far as I know there aren't any RISC chips with 64-bit long instructions (if that's what you mean). Usually they are fixed 4-byte (32bits).
Also the 64-bit RISC chips can operate on 32-bit (and smaller) values just fine. If you mean you can't operate on the bottom 32-bits with ALU instructions while ignoring the top 32-bit, that might be true.
I'll give you the last point though. ARM is unusual in that the instruction encoding is completely different between 32-bit and 64-bit. Even x86 didn't go to that full extreme.
x32 ABI support by distributions
Posted May 2, 2013 22:54 UTC (Thu) by Shewmaker (guest, #1126) [Link]
I'm not sure where I was going, exactly, with that last sentence. Please disregard it.

However, rather than say that it isn't a generally-applicable idea, I think that it's a little more accurate to say that other modern architectures already have good 32-bit ABIs. The Power architecture seems to be a bit of an exception in terms of not needing to rework their 32-bit ABI like others. MIPS had to create a new 32-bit ABI when it went to a 64-bit arch. And ARM has gone from T32 to A32 for their new 64-bit arch.
I apologize for not listing the processor I used in my experiment. It was a 2GHz Intel Core i7.
Daniel's x32 port of Debian includes ruby packages. I only executed a trivial script, but it appeared to work fine.
x32 ABI support by distributions
Posted May 3, 2013 7:18 UTC (Fri) by Flameeyes (guest, #51238) [Link]
One of the reasons why any other architecture has a decent (I wouldn't go as far as good especially for MIPS) 32-bit ABI is that the standard x86 ABI sucked, especially for multimedia programs, as you have too few registers. As I already said on my blog, x32 would have made sense _nine years ago_, then the work wouldn't have been wasted on twice the amount of porting.
And please don't bring up the reference H264 codec. The point of reference codecs _is_ to suck in terms of performance; that's why you use x264 and not the reference codec nowadays.
As for Ruby, last time I saw patches floating around for 1.9, they caused it to crash at runtime. I will check again what Debian/x32 is using; maybe they fixed it, but upstream still hasn't.
x32 ABI support by distributions
Posted May 2, 2013 15:13 UTC (Thu) by neilm (guest, #28422) [Link]
x32 ABI support by distributions
Posted May 2, 2013 17:21 UTC (Thu) by hmh (subscriber, #3838) [Link]
x32 ABI support by distributions
Posted May 2, 2013 17:33 UTC (Thu) by neilm (guest, #28422) [Link]
x32 ABI support by distributions
Posted May 2, 2013 22:57 UTC (Thu) by Shewmaker (guest, #1126) [Link]
Thank you for clarifying.

For those interested in trying Daniel's work, he told me today that he updated his instructions to use the multistrap tool.
x32 ABI support by distributions
Posted May 6, 2013 22:05 UTC (Mon) by juliank (guest, #45896) [Link]
x32 ABI support by distributions
Posted May 6, 2013 23:05 UTC (Mon) by dlang (guest, #313) [Link]
x32 requires a 64 bit CPU, the i386 port works on a 32 bit CPU.
So why should we require the elimination of support for a class of CPUs to get a (slightly) more efficient mode of operation for newer CPUs?
x32 ABI support by distributions
Posted May 7, 2013 1:31 UTC (Tue) by intgr (subscriber, #39733) [Link]
Funny you should mention 386, which is not supported by the Linux kernel any more (and probably not by most distros for a longer time). :)
32-bit x86 processors are quite rare these days and chances are they're too underpowered for a modern distro anyway (on the desktop at least). It won't be long until they're completely irrelevant. Definitely not 2038, perhaps 2018 at the latest.
x32 ABI support by distributions
Posted May 7, 2013 2:57 UTC (Tue) by dlang (guest, #313) [Link]
the i386 port supports all x86 processors that do not have 64 bit support. This includes a lot of low-power embedded processors that are still manufactured. It will be a lot longer than 5 years before they all disappear.
x32 ABI support by distributions
Posted May 7, 2013 9:23 UTC (Tue) by neilm (guest, #28422) [Link]
1) each new architecture requires significant space in the archive, adding to storage space and transfer bandwidth for mirrors, processing time for archive software and migration tools and buildd times
2) It's only slightly more efficient.
I've (personally) yet to be convinced that it has any major advantage over amd64.
x32 ABI support by distributions
Posted May 7, 2013 12:11 UTC (Tue) by juliank (guest, #45896) [Link]
x32 ABI support by distributions
Posted May 7, 2013 13:33 UTC (Tue) by mirabilos (subscriber, #84359) [Link]
I’m *for* an inclusion of x32 into Debian proper, as architecture of its own, side by side with i386 and amd64. I’d help.
x32 ABI support by distributions
Posted May 10, 2013 18:17 UTC (Fri) by BenHutchings (subscriber, #37955) [Link]
x32 ABI support by distributions
Posted May 10, 2013 18:26 UTC (Fri) by juliank (guest, #45896) [Link]
The question is: Do we want to migrate 32-bit architectures to a 64-bit time_t or not?
x32 ABI support by distributions
Posted May 10, 2013 18:44 UTC (Fri) by BenHutchings (subscriber, #37955) [Link]
x32 ABI support by distributions
Posted May 10, 2013 19:24 UTC (Fri) by dlang (guest, #313) [Link]
It provides access to the larger number of registers that a 64 bit binary can access, without paying the size overhead for the larger integers.
AMD64 is an oddity among 32/64 bit systems because the 64 bit version has more registers available than the 32 bit version. Since the performance of the x86 family of processors is frequently limited because of the small number of registers, this can make a very significant difference in the performance of your software. This is the biggest thing that makes x32 faster than i386 (the other being that the compiler can assume more functionality available than in the i386 port)
and the smaller pointer sizes are what can make x32 faster than AMD64 code.
This is why x32 can be better than either AMD64 or i386 in at least some cases.
The question is how widespread are these cases. In practice, it is looking like the answer is "most".
x32 ABI support by distributions
Posted May 10, 2013 19:43 UTC (Fri) by BenHutchings (subscriber, #37955) [Link]
I'm aware of the technical differences and why they can be expected to improve performance.
> The question is how widespread are these cases. In practice, it is looking like the answer is "most".
Well I would like to see some more evidence than a few examples which mostly show it can beat one or the other.
x32 ABI support by distributions
Posted May 10, 2013 19:49 UTC (Fri) by dlang (guest, #313) [Link]
remember that good benchmarks take a lot of runs to give good numbers, so it's not reasonable to say something like "test everything on the system"
the x32 people have posted quite a few tests over the time this has been discussed, and so it should be possible to find a program that's been tested that's similar to what you are interested in.
But if some more numbers would convince people, I'm sure that the x32 advocates would take the time to run the tests.
But if the answer to every set of tests is "well, that's only a few programs, show me more", it's not going to be worth their time to try and satisfy the doubters.
Also remember that it's _not_ necessary to satisfy all the doubters, all it takes is convincing a large enough minority that it's worth supporting. Remember all the doubt about the AMD64 when it was new? doubters were saying that the larger registers were good, but the larger pointers would make the software slower, we should just all stick with i386 (unless you were some esoteric use case that needed more than 4G of address space)
x32 ABI support by distributions
Posted May 10, 2013 20:28 UTC (Fri) by mirabilos (subscriber, #84359) [Link]
I’d choose x32 for simplicity and less overhead for most real-life situations, except big databases and stuff, just like the SPARC world is doing currently.
(Oh, and: x32 instead of M-A amd64+i386 gives you working dselect ☺)
B-Heap page sizes
Posted May 2, 2013 18:55 UTC (Thu) by bcopeland (subscriber, #51750) [Link]
x32 isn't changing the OS page size, though, is it? That seems a separate detail unrelated to the choice of x32 or x86_64 (and pages are generally 4kb everywhere anyway).
It seems to me that x32 will clearly perform better than x86_64 on the B-heap, up to the point that you exhaust the address space, making it only useful for smaller numbers of objects. You'll get twice the fanout in every page, but only be able to access up to 4 GB per instance. In the article, however, it seems the goal is to completely fill memory and use swap, relying on the tree layout to minimize disk transfers (cf. van Emde Boas-layout trees which take this idea even further).
B-Heap page sizes
Posted May 3, 2013 4:47 UTC (Fri) by Shewmaker (guest, #1126) [Link]
No, x32 doesn't change the page size. You are right. However, within the last couple of years Linux has gotten transparent huge pages and you can always use hugetlbfs, so 4k pages aren't the rule on x86-64 that they used to be.

I agree with your reading of the B-heap article. A web accelerator like Varnish can direct requests to different instances based on a hash of the URL, so you could use many independent x32 B-heaps in parallel to make sure you fully use a large system's memory.
x32 ABI support by distributions
Posted May 3, 2013 8:06 UTC (Fri) by gmaxwell (guest, #30048) [Link]
I would take a sizable bet that they will not. If your codec is moving around tons of pointers ... it's not a well optimized codec to begin with.
x32 ABI support by distributions
Posted May 3, 2013 13:04 UTC (Fri) by mathstuf (subscriber, #69389) [Link]
x32 ABI support by distributions
Posted May 5, 2013 1:54 UTC (Sun) by gmaxwell (guest, #30048) [Link]
x32 ABI support by distributions
Posted May 4, 2013 21:58 UTC (Sat) by jdulaney (subscriber, #83672) [Link]
