
x32 ABI support by distributions

May 1, 2013

This article was contributed by Andrew Shewmaker

In 2011, LWN reported on the x32 system call ABI for the Linux kernel. Now x32 is officially supported by a number of upstream projects and has had some time to mature. As x32 begins finding its way into distributions, more people will be able to try it out on real-world applications. But what kind of applications have the potential for the greatest benefit? And how well do the distributions support x32? This article will address those questions and detail some ways to experiment with the x32 ABI.

The x32 ABI uses the same registers and floating point hardware as the x86-64 ABI, but with 32-bit pointers and long variables as on ia32 (32-bit x86). On the downside, fully supporting x32 means having a third set of libraries. However, its assembly code is more efficient than ia32's. Additionally, if an application uses many pointers and requires less than 4GB of RAM, then x32 can be a big win over x86-64. It can theoretically double the number of pointers held in cache, reducing misses, and halve the amount of system memory used to store pointers, reducing page faults.

Benchmark results

The x32 project site lists two SPEC benchmarks that show particularly good benefits. MCF, a network simplex algorithm making heavy use of pointers, shows x32 with a 40% advantage over x86-64. Crafty, a chess program made up predominantly of integer code, shows x32 with a 40% advantage over ia32. In H. Peter Anvin's 2011 Linux Plumbers Conference talk x32 — a native 32-bit ABI for x86-64 [PPT], he lists embedded devices as one of the primary use cases. He reports 5-10% improvement for the SPEC CPU 2K/2006 INT geomean over ia32 and x86-64, and 5-11% advantage for SPEC CPU 2K/2006 FP geomean over ia32, but none against x86-64. The MCF results have been called out as the most impressive, but the majority of users may care more about video codec performance. The news is good there too. H.264 performed 35% better on x32 than on ia32 and 15-20% better than on x86-64.

Intel gives additional SPEC and Embedded Microprocessor Benchmark Consortium (EEMBC) performance numbers in a paper describing the work done to support x32 in GCC and their own compiler. Parser, a syntactic parser of English, shows an approximately 17% improvement for x32 over both ia32 and x86-64. And Eon, a probabilistic ray tracer, performs almost 30% better than ia32. In general, GCC's floating-point code generation isn't as good on x32 as on x86-64. However, Intel's compiler does at least as well for x32 as x86-64. Take Facerec, a facial recognition algorithm which compares graph structures with vertices containing information computed from fast Fourier transforms, where x32 performs almost 40% better than x86-64.

Scientific simulations are often limited by memory latency and capacity. Nathalie Rauschmayr has presented results showing how x32 benefits some of CERN's applications [PDF]. These applications use millions of pointers and, when compiled as x32, see a reduction in physical memory use of 10-30%. While some ran no faster, a couple sped up by 10-15%. In those cases, the underlying cause for the improvement is a 10-38% reduction in page faults. Loop unrolling is an important optimization for scientific code and x32 could use some work in that area, but H.J. Lu, the driving force behind x32, doesn't plan to do that work himself.

Distribution support

As Nathalie notes, a working x32 environment doesn't come out of the box in a Red Hat– or Debian–based Linux distribution just yet. Currently, a person must build glibc with an intermediate GCC build before building a full GCC. Fortunately, distributions are beginning to provide an easier way to experiment with x32. Two of the first projects to feature a release with official x32 support are Gentoo and the Yocto meta-distribution, and a couple of traditional distributions are following to one degree or another.

Gentoo's first x32 release candidate came out last June; support is now integrated into the main tree. You can download and install it like any other port. When I installed it, I ran into a problem creating an initramfs due to a BusyBox linking error. Configuring the kernel with everything built-in (eliminating the need for an initramfs altogether) worked around this problem, resulting in a working x32 environment. The remaining x32 issues are largely in multimedia packages' hand-optimized assembly routines. As soon as Gentoo's infrastructure team upgrades the kernel on a development box, Mike Frysinger plans on announcing the availability of an x32 environment for upstream multimedia projects to port their optimized assembly to the x32 ABI.

I have not tried the Linux Foundation's Yocto project. It is used for creating custom embedded distributions and has integrated x32 support into its core, as well as fixing most of its recipes.

x32 won't be an official ABI for Debian until after the next release, Wheezy. Daniel Schepler, who oversees the new port, told me that it is in reasonable shape. Grub2 doesn't currently build, so an x32-only install is not possible yet. But Daniel provides directions to install into a chroot environment from his private archive on the Debian x32 Port wiki. Newer packages are available from Debian Ports. Simpler desktop environments like XFCE are installable, and KDE is almost ready. However, more work needs to be done before GNOME is installable. Its biggest blocker is that Iceweasel currently doesn't build. Also, LibreOffice and PHP are not ready yet.

My experience with Debian went well. After a minimal installation from the stable repository and upgrading to Wheezy, I installed one of Daniel's kernels and used the debootstrap tool to quickly create an x32 chroot from Daniel's personal archive. After that, I was able to point to the most recent packages on the Debian Ports archive. Installing a chroot directly from those involves learning how to use the multistrap tool, and I stopped short of that.

Ubuntu shipped with limited x32 support in 13.04. However, it isn't planning on providing a full port of x32 packages like Debian. Ubuntu is simply providing a kernel with the x32 syscalls and a functional tri-arch toolchain with GCC and eglibc.

x32 hasn't found as much acceptance in other open-source projects. Fedora has no plans for x32, which means it's unlikely for RHEL too. Arch Linux has some unofficial packages. LLVM users can apply patches from June 2012. But merging of x32 support looks unlikely at this point. A team from Google inquired about it earlier this year, and they were told that the patch submitter had not addressed the maintainers' concerns yet.

Some experiments

After reading Diego Elio "Flameeyes" Pettenò's series of posts on debunking x32 myths, I decided someone ought to compare x32 and x86-64 performance of the B-heap simulator mentioned in Poul-Henning Kamp's ACM article "You're Doing It Wrong." This example is interesting because Kamp created a data structure called a "B-heap" to take advantage of the large address space provided by 64-bit pointers. That would seem to argue against using x32's smaller address space, but in his article he also says, "The order of magnitude of difference obviously originates with the number of levels of heap inside each VM page, so the ultimate speedup will be on machines with small pointer sizes and big page sizes."

Kamp's B-heap simulator compares the virtual memory performance of a traditional binary heap to his B-heap. A port from BSD to Linux went smoothly, and the results are identical. No changes were needed to compile the simulator in my Debian x32 chroot. Its virtual memory footprint was reduced by 19% and its resident set size was 88% smaller. Performance of x32 always exceeded x86-64, especially for larger problem sizes. In particular, 500K ran 15% faster. It's important to note that the simulator itself doesn't use much memory, so it would be interesting to see how a real-world use of a B-heap, as in the Varnish Cache, would compare.

It will be interesting to see which other applications will benefit from x32. If it shows as much promise for the embedded space as its creators suggest, high-performance computing as CERN shows, and web servers as my experiment indicates, then desktop users in the middle of the computing spectrum might also benefit. The key to that segment will likely be whether multimedia codecs (other than the H.264 reference implementation tested by SPEC CPU 2006) see a substantial enough improvement to justify developing and maintaining another assembly code implementation of their core routines. I predict they will. Perhaps x32 will prove its value in such a broad range of cases that it becomes the default for many software packages. Perhaps we'll even see developers for other 64-bit architectures such as IBM's Power or upcoming ARM server chips propose similar ABIs.



x32 ABI support by distributions

Posted May 2, 2013 13:27 UTC (Thu) by epa (subscriber, #39769) [Link]

Next up I'd like to see x16 so that programs using less than 64 kilobytes of memory can run more efficiently. The user space can be ported from Minix 1.0.

x32 ABI support by distributions

Posted May 6, 2013 5:46 UTC (Mon) by butlerm (subscriber, #13312) [Link]

On the 68000, code compiled using 16 bit integers and 32 bit pointers tended to run much faster than code compiled using a 32/32 model. Of course there were serious issues with using ints of that size, but it is essentially the same reason why approximately no one uses 64 bit ints in the modern world either - given contemporary hardware constraints 64 bit ints are an incredible waste of resources. People can use longs for that, where and when necessary.

The issue with pointers is not that much different. For the majority of applications, 32 bit pointers are ample. Using 64 bit pointers where they aren't necessary is just pouring power and performance down the drain. A five percent average performance and efficiency gain across potentially billions of devices is nothing to be scoffed at. We could power whole cities with that kind of savings.

x32 ABI support by distributions

Posted May 2, 2013 14:55 UTC (Thu) by Flameeyes (subscriber, #51238) [Link]

I'm surprised that you actually posted an article that brings up the (moot and stupid) point that ARM could look into this. The x32 ABI is vastly an exception created by the way x86-64 is built upon the old x86 code, and most certainly not a generally-applicable idea.

Also, given that the performance of x32 depends vastly on the way the CPU has been designed (Intel makes it clear that one of the reasons why it performs so much better is that the 64-bit support in some models of Atom is terrible), the numbers make little to no sense without specifying on which CPU you're running them.

I'm still sceptical that all the work that we (including me, not at my desire) poured into getting this stuff to "work" is worth it. By the way of idea, Ruby still does not work on x32 because of assembly that does not work, either in 32- or 64-bit variants, when built on x32. You may call that minor, but I certainly don't.

x32 ABI support by distributions

Posted May 2, 2013 19:57 UTC (Thu) by deater (subscriber, #11746) [Link]

what is strange is that I'm pretty sure Power, MIPS64, and SPARC64 have done this for years. Typically users run 32-bit userspaces with 64-bit kernels, for the various performance reasons mentioned.

This is easy on those architectures because the 32bit to 64-bit ISA expansion really only increased register size, and didn't change other things (like the number of available registers) so didn't require some extra in-between architecture like x32.

Doing this with ARM will be a hassle, as ARM64 is a completely different architecture than ARM32. What documentation I have seen looks like they are assuming LP64 implementations, so a separate hack like x32 would likely be necessary.

It is odd ARM went from supporting a nice compact ISA like Thumb2 to a large 64-bit encoding in ARM64.

x32 ABI support by distributions

Posted May 24, 2013 8:52 UTC (Fri) by Matc (guest, #91112) [Link]

> Doing this with ARM will be a hassle

ARM has the same problem as x86: arm64 is completely different from 32-bit ARM.
It has new instructions, different registers, ...

But because arm64 has instructions for working on 32-bit or 64-bit data/registers, it should be easy to do something like x32 on ARM.

In fact arm64 is not a traditional RISC instruction set like PPC, mips, sparc :
- the instruction size is 32 (and not 64 bits)
- it works on 32 or 64 bits data
- register/instruction are different between native 32 bits mode and 64 bits mode.

x32 ABI support by distributions

Posted May 24, 2013 13:18 UTC (Fri) by deater (subscriber, #11746) [Link]

> In fact arm64 is not a traditional RISC instruction set like PPC,
> mips, sparc :
> - the instruction size is 32 (and not 64 bits)
> - it works on 32 or 64 bits data
> - register/instruction are different between native 32 bits
> mode and 64 bits mode.

As far as I know there aren't any RISC chips with 64-bit long instructions (if that's what you mean). Usually they are fixed 4-byte (32bits).

Also the 64-bit RISC chips can operate on 32-bit (and smaller) values just fine. If you mean you can't operate on the bottom 32-bits with ALU instructions while ignoring the top 32-bit, that might be true.

I'll give you the last point though. ARM is unusual in that the instruction encoding is completely different between 32 and 64-bit. Even x86 didn't go to that full extreme

x32 ABI support by distributions

Posted May 2, 2013 22:54 UTC (Thu) by Shewmaker (subscriber, #1126) [Link]

I'm not sure where I was going, exactly, with that last sentence. Please disregard it.

However, rather than say that it isn't a generally-applicable idea, I think that it's a little more accurate to say that other modern architectures already have good 32-bit ABIs. The Power architecture seems to be a bit of an exception in terms of not needing to rework their 32-bit ABI like others. MIPS had to create a new 32-bit ABI when it went to a 64-bit arch. And ARM has gone from T32 to A32 for their new 64-bit arch.

I apologize for not listing the processor I used in my experiment. It was a 2GHz Intel Core i7.

Daniel's x32 port of Debian includes ruby packages. I only executed a trivial script, but it appeared to work fine.

x32 ABI support by distributions

Posted May 3, 2013 7:18 UTC (Fri) by Flameeyes (subscriber, #51238) [Link]

One thing I still haven't seen in a single posting is a comparison of x32 vs x86-64 on an Opteron system, tbh. I have reason to think the improvement is smaller there.

One of the reasons why any other architecture has a decent (I wouldn't go as far as good especially for MIPS) 32-bit ABI is that the standard x86 ABI sucked, especially for multimedia programs, as you have too few registers. As I already said on my blog, x32 would have made sense _nine years ago_, then the work wouldn't have been wasted on twice the amount of porting.

And please don't bring up the reference H264 codec. The point of reference codecs _is_ to suck in terms of performance, that's why you use x264 and not the reference codec nowadays.

As for Ruby, last time I saw patches floating around for 1.9, they caused it to crash at runtime, will check again what Debian/x32 is using, maybe they fixed it, upstream still hasn't.

x32 ABI support by distributions

Posted May 2, 2013 15:13 UTC (Thu) by neilm (subscriber, #28422) [Link]

Just for clarity, there are no plans at the moment for the Debian Project to support x32. Some work has been done by Daniel indeed, but that's a far cry from actually becoming an official port.

x32 ABI support by distributions

Posted May 2, 2013 17:21 UTC (Thu) by hmh (subscriber, #3838) [Link]

Indeed. However, it looks useful enough for certain packages that we might as well support it as a partial sub-arch in Debian, if nothing else. We shall see...

x32 ABI support by distributions

Posted May 2, 2013 17:33 UTC (Thu) by neilm (subscriber, #28422) [Link]

Yeah, but that requires sub-arch support in dak, which was being discussed in Bosnia... Perhaps this is becoming a little too specific for LWN though :)

x32 ABI support by distributions

Posted May 2, 2013 22:57 UTC (Thu) by Shewmaker (subscriber, #1126) [Link]

Thank you for clarifying.

For those interested in trying Daniel's work, he told me today that he updated his instructions to use the multistrap tool.

x32 ABI support by distributions

Posted May 6, 2013 22:05 UTC (Mon) by juliank (subscriber, #45896) [Link]

And I'd probably prefer not to have an official x32 port at all, at least until the i386 port is dropped. So, I say let's keep this in mind for 2038, but not earlier.

x32 ABI support by distributions

Posted May 6, 2013 23:05 UTC (Mon) by dlang (✭ supporter ✭, #313) [Link]

Why, the x32 port does not replace the i386 port.

x32 requires a 64 bit CPU, the i386 port works on a 32 bit CPU.

So why should we require the elimination of support for a class of CPUs to get a (slightly) more efficient mode of operation for newer CPUs?

x32 ABI support by distributions

Posted May 7, 2013 1:31 UTC (Tue) by intgr (subscriber, #39733) [Link]

> Why, the x32 port does not replace the i386 port.

Funny you should mention 386, which is not supported by the Linux kernel any more (and probably not by most distros for a longer time). :)

32-bit x86 processors are quite rare these days and chances are they're too underpowered for a modern distro anyway (on the desktop at least). It won't be long until they're completely irrelevant. Definitely not 2038, perhaps 2018 at the latest.

x32 ABI support by distributions

Posted May 7, 2013 2:57 UTC (Tue) by dlang (✭ supporter ✭, #313) [Link]

note the comment was about the i386 port, not supporting the 80386 CPU

the i386 port supports all x86 processors that do not have 64 bit support. This includes a lot of low-power embedded processors that are still manufactured. It will be a lot longer than 5 years before they all disappear.

x32 ABI support by distributions

Posted May 7, 2013 9:23 UTC (Tue) by neilm (subscriber, #28422) [Link]

Two reasons:
1) each new architecture requires significant space in the archive, adding to storage space and transfer bandwidth for mirrors, processing time for archive software and migration tools and buildd times
2) It's only slightly more efficient.
I've (personally) yet to be convinced that it has any major advantage over amd64.

x32 ABI support by distributions

Posted May 7, 2013 12:11 UTC (Tue) by juliank (subscriber, #45896) [Link]

I consider x32 mostly to be a replacement for i386. I don't want to have 2 32-bit-ish x86 ports in the archive, as they cause work, people, and require space.

x32 ABI support by distributions

Posted May 7, 2013 13:33 UTC (Tue) by mirabilos (subscriber, #84359) [Link]

It’s not. It’s a replacement for (a number between 50 and 75, percent of use cases of) amd64, actually.

I’m *for* an inclusion of x32 into Debian proper, as architecture of its own, side by side with i386 and amd64. I’d help.

x32 ABI support by distributions

Posted May 10, 2013 18:17 UTC (Fri) by BenHutchings (subscriber, #37955) [Link]

The question for Debian and other general-purpose distributions is: what does x32 provide that multiarch amd64+i386 does not? It is all very well showing benchmarks where x32 is better than one or the other, but it doesn't seem to be a useful replacement for i386 unless (1) it performs significantly better than *both* for some applications, and (2) it is not significantly worse than i386 for any applications where i386 is currently faster than amd64.

x32 ABI support by distributions

Posted May 10, 2013 18:26 UTC (Fri) by juliank (subscriber, #45896) [Link]

At least starting 2038 it will be important that i386 has 32-bit time_t, whereas x32 has 64-bit time_t.

The question is: Do we want to migrate 32-bit architectures to a 64-bit time_t or not?

x32 ABI support by distributions

Posted May 10, 2013 18:44 UTC (Fri) by BenHutchings (subscriber, #37955) [Link]

I expect that well before 2038 a 4 GB virtual memory space will be so laughably small that no-one would want to install either i386 or x32.

x32 ABI support by distributions

Posted May 10, 2013 19:24 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

> what does x32 provide that multiarch amd64+i386 does not?

It provides access to the larger number of registers that a 64 bit binary can access, without paying the size overhead for the larger integers.

AMD64 is an oddity among 32/64 bit systems because the 64 bit version has more registers available than the 32 bit version. Since the performance of the x86 family of processors is frequently limited because of the small number of registers, this can make a very significant difference in the performance of your software. This is the biggest thing that makes x32 faster than i386 (the other being that the compiler can assume more functionality available than in the i386 port)

and the smaller pointer sizes are what can make x32 faster than AMD64 code.

This is why x32 can be better than either AMD64 or i386 in at least some cases.

The question is how widespread are these cases. In practice, it is looking like the answer is "most".

x32 ABI support by distributions

Posted May 10, 2013 19:43 UTC (Fri) by BenHutchings (subscriber, #37955) [Link]

> It provides access to the larger number of registers that a 64 bit binary can access, without paying the size overhead for the larger integers.

I'm aware of the technical differences and why they can be expected to improve performance.

> The question is how widespread are these cases. In practice, it is looking like the answer is "most".

Well I would like to see some more evidence than a few examples which mostly show it can beat one or the other.

x32 ABI support by distributions

Posted May 10, 2013 19:49 UTC (Fri) by dlang (✭ supporter ✭, #313) [Link]

what would you suggest?

remember that good benchmarks take a lot of runs to give good numbers, so it's not reasonable to say something like "test everything on the system"

the x32 people have posted quite a few tests over the time this has been discussed, and so it should be possible to find a program that's been tested that's similar to what you are interested in.

But if some more numbers would convince people, I'm sure that the x32 advocates would take the time to run the tests.

But if the answer to every set of tests is "well, that's only a few programs, show me more", it's not going to be worth their time to try and satisfy the doubters.

Also remember that it's _not_ necessary to satisfy all the doubters, all it takes is convincing a large enough minority that it's worth supporting. Remember all the doubt about the AMD64 when it was new? Doubters were saying that the larger registers were good, but the larger pointers would make the software slower, we should just all stick with i386 (unless you were some esoteric use case that needed more than 4G of address space)

x32 ABI support by distributions

Posted May 10, 2013 20:28 UTC (Fri) by mirabilos (subscriber, #84359) [Link]

Actually, I’m not interested in any benchmark results.

I’d choose x32 for simplicity and less overhead for most real-life situations, except big databases and stuff, just like the SPARC world is doing currently.

(Oh, and: x32 instead of M-A amd64+i386 gives you working dselect ☺)

B-Heap page sizes

Posted May 2, 2013 18:55 UTC (Thu) by bcopeland (subscriber, #51750) [Link]

>That would seem to argue against using x32's smaller address space, but in his article he also says, "The order of magnitude of difference obviously originates with the number of levels of heap inside each VM page, so the ultimate speedup will be on machines with small pointer sizes and big page sizes."

x32 isn't changing the OS page size, though, is it? That seems a separate detail unrelated to the choice of x32 or x86_64 (and pages are generally 4kb everywhere anyway).

It seems to me that x32 will clearly perform better than x86_64 on the B-heap, up to the point that you exhaust the address space, making it only useful for smaller numbers of objects. You'll get twice the fanout in every page, but only be able to access up to 4 GB per instance. In the article, however, it seems the goal is to completely fill memory and use swap, relying on the tree layout to minimize disk transfers (cf. van Emde Boas-layout trees which take this idea even further).

B-Heap page sizes

Posted May 3, 2013 4:47 UTC (Fri) by Shewmaker (subscriber, #1126) [Link]

No, x32 doesn't change the page size. You are right. However, within the last couple of years Linux has gotten transparent huge pages and you can always use hugetlbfs, so 4k pages aren't the rule on x86-64 that they used to be.

I agree with your reading of the B-heap article. A web accelerator like Varnish can direct requests to different instances based on a hash of the URL, so you could use many independent x32 B-heaps in parallel to make sure you fully use a large system's memory.

x32 ABI support by distributions

Posted May 3, 2013 8:06 UTC (Fri) by gmaxwell (subscriber, #30048) [Link]

> The key to that segment will likely be whether multimedia codecs (other than the H.264 reference implementation tested by SPEC CPU 2006) see a substantial enough improvement to justify developing and maintaining another assembly code implementation of their core routines. I predict they will.

I would take a sizable bet that they will not. If your codec is moving around tons of pointers ... it's not a well optimized codec to begin with.

x32 ABI support by distributions

Posted May 3, 2013 13:04 UTC (Fri) by mathstuf (subscriber, #69389) [Link]

What about the number of registers?

x32 ABI support by distributions

Posted May 5, 2013 1:54 UTC (Sun) by gmaxwell (subscriber, #30048) [Link]

The performance should be the same for x86_64 and x32, both higher than x86. (See the article, it was hypothesizing increased performance for x32 over x86_64 which absolutely won't be the case in that application— at least not for a reasonably well optimized implementation)

x32 ABI support by distributions

Posted May 4, 2013 21:58 UTC (Sat) by jdulaney (subscriber, #83672) [Link]

For craps and giggles, I built enough of Fedora 19 on an Opteron system to run Firefox (Fluxbox as WM). I built an entirely native x32 environment in a VM and used it to create the set of packages, much as it would have been done to bootstrap a secondary arch in Fedora. For all this work, running on a 2GHz Opteron, I only saw about 15-20% improvement in terms of memory usage.

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds