
Pettenò: Debunking x32 myths

Earlier this month LWN covered the announcement of an initial x32 release candidate of Gentoo. The x32 ABI enables the running of processes in 64-bit mode while using 32-bit pointers. Gentoo developer Diego Elio "Flameeyes" Pettenò isn't convinced that x32 is the way to go, and debunks some common misconceptions about the x32 ABI. "The new x32 ABI has proven to be faster. Not really; what we have right now are a few benchmarks, published by those who actually created the ABI. Of course you’d expect that those who spent time to set it up found it interesting and actually faster, but I honestly have doubts about the results, for reasons that will be clearer by reading the next few entries."

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 19:57 UTC (Mon) by butlerm (guest, #13312) [Link]

The writer makes some interesting points, but ultimately his analysis fails because he neglects to address the main reason why anyone would use x32 instead of x86-64 in the first place - the reduction in size and cache impact of pointer heavy dynamically allocated data structures.

This makes a big difference on some applications, on the order of 40% on both Atom and Core i7, as demonstrated by non-trivial benchmarks run by the x32 ABI developers. Is that enough for a distribution to support x32? I don't know, but it is certainly not something to scoff at.
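For illustration, a minimal sketch of why pointer size matters for such structures (the struct layout is hypothetical, not taken from any benchmark):

    #include <stdio.h>

    /* A typical pointer-heavy node: two links plus a small payload. */
    struct node {
        struct node *next;
        struct node *prev;
        int key;
    };

    int main(void)
    {
        /* On LP64 (amd64) each pointer is 8 bytes, so the struct is 24 bytes
           after padding; on an ILP32 ABI such as x32 the pointers are 4 bytes
           and the struct shrinks to 12, so roughly twice as many nodes fit
           in each cache line. */
        printf("sizeof(struct node) = %zu\n", sizeof(struct node));
        return 0;
    }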

One worthwhile point he makes is that most x32 applications will run on an x86-64 kernel, so kernel performance will not be improved at all. I don't know how many embedded systems are kernel performance bound, but if it is important enough surely some way could be found to support an x32 native kernel as well.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 20:22 UTC (Mon) by scientes (guest, #83068) [Link]

> but if it is important enough surely some way could be found to support an x32 native kernel as well.

That won't, and should not happen.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 21:54 UTC (Mon) by slashdot (guest, #22014) [Link]

It would be a good thing for virtual machines and embedded devices with little RAM.

Not sure if the Linux x86 maintainers would accept it though, but it could be a fun project.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 2:52 UTC (Tue) by ringerc (subscriber, #3071) [Link]

These days "little RAM" is still often 256MB or more. x32 kernels would be a rather short-term proposition given that the comfortable limit of 32-bit is 2GB, and the maximum is 4GB, only 8-9 times less than that limit.

Even phones and tablets are pushing the 2GB mark.

Sure, truly small embedded Linux devices continue to exist and will for a long time to come. They're rarely x86 or x64, and aren't likely to be, so x32 is irrelevant for them.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 3:37 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

x32 can use almost 4Gb _per_ _process_, it's not limited to 2Gb since there's no user/kernel address space split (the kernel is completely 64-bit). And 4GB _per_ _process_ is still pretty big. Even most games use less RAM (thanks to RAM-starved consoles).

Right now the biggest process on my development machine is a Java process running IntelliJ IDEA with a large project (about 1MLOC) opened. It's a whopping 600Mb monster using 1096Mb of address space.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 5:04 UTC (Tue) by ringerc (subscriber, #3071) [Link]

The context probably got lost as it's several parent posts up. I wasn't referring to x32 in general being pointless, but to the development of an x32 kernel. I don't see the notion of a kernel that lives in the lower 4GB and uses mostly 32-bit pointers while using the native x64 mode to be particularly useful.

I can maybe see x32 with a 64-bit kernel, which is the only thing the x32 folks ever proposed, being useful.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 6:24 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Oh, yes, sorry.

Of course, I completely agree with you given the context of x32 kernel.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 7:26 UTC (Tue) by elanthis (guest, #6227) [Link]

The hilarious part here is that I'm pretty sure that poster was simply talking about the kernel maintainers accepting the x32 personality (necessary for an x86_64 kernel to run x32 programs). :)

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 16:05 UTC (Wed) by butlerm (guest, #13312) [Link]

No, I actually think an x32 native kernel (or the equivalent for ARM64) would be an excellent idea for a large class of embedded systems - routers and file servers in particular. It would also be promising for use with hosted virtual machines, where the impact of running dozens of kernels starts to add up.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 3:38 UTC (Tue) by dlang (subscriber, #313) [Link]

one thing you are missing, you use x32 with a 64 bit kernel, so the kernel doesn't have memory limits, your only limit is 4G of address space per process (and the kernel needs almost none of that)

As a result, any system up to 4G is perfectly happy as x32, and if the system is doing more than one thing, you could easily get 16G or larger systems without needing 64 bit binaries.

And if you are doing VMs, this is the size of the VM, not the size of the overall system.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 5:06 UTC (Tue) by ringerc (subscriber, #3071) [Link]

The context of the reply was someone proposing an "x32" kernel.

x32 userspace with an x64 kernel makes sense (ish) and that's all the x32 folks themselves ever proposed.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 20:33 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

I think the idea of this post was mainly "Hey, ricers, this is not for you" and it's just that it looks weird here on LWN out of that context.

I guess this because it's Gentoo (which is infested with ricers) and because the "debunking" seems to spend a lot of time on things nobody I know assumed had anything to do with x32, like shrinking binaries on disk.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 20:48 UTC (Mon) by mikemol (subscriber, #83507) [Link]

I've been reading Diego's blog for a few years. It's not generally about performance tuning or "ricing", but about low-level things like ABI, systemic testing, autotools and a bunch of stuff.

His focus is on systemic compile-time cleanliness and stability.

Incidentally, a <em>lot</em> of stuff breaks under x32.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 21:20 UTC (Mon) by Flameeyes (subscriber, #51238) [Link]

Uhm no, again, let's remember that Intel themselves are considering x32 a "closed system" ABI, an embedded ABI, something that Gentoo is very useful for (I know that from experience, having worked multiple times before on embedded Gentoo Linux devices).

Please see the other linked article as well, it's easier to see the two of them together.

Also, as far as "binaries on disk" are concerned, the comparison between the two libc files is actually done on the _allocated_ sizes, which is memory, not disk. The fact that it refers to files on disk is definitely not the point.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 22:47 UTC (Mon) by tialaramex (subscriber, #21167) [Link]

If my earlier interpretation was wrong then this is an even more disappointing use of LWN's space than I thought.

Putting the two articles together I am even more inclined to think it's supposed to be targeted at ricers. Who else could imagine that recompiling some assembly-heavy video codecs for x32 was somehow even remotely relevant? At no point in these articles does it appear that you've really grasped what the people who proposed, developed and shipped this ABI were trying to achieve. You almost touch on it in the second article, but only long enough to lurch onto the afore-mentioned tangent about the C standard library and files on disk.

A lot of the "myths" look more like strawmen. They aren't common misconceptions anywhere, some aren't even mentioned in the not-so-bright comments to the previous article.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 5:25 UTC (Tue) by cmccabe (guest, #60281) [Link]

I'm not sure why you're being so hard on Diego. The whole point of the x32 ABI was increased performance. Diego is questioning whether the ABI will actually achieve that. You may disagree with his conclusions, but there isn't any need for name-calling.

You don't have to be a "ricer" (whatever that is) to be interested in performance. You could just be an engineer who gets paid to make servers or embedded devices go faster and use less battery. Or someone who is interested in the topic in general.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 17:08 UTC (Tue) by mikemol (subscriber, #83507) [Link]

"ricer" is a pejorative term intended to describe (and be dismissive of) someone who is interested in speed, does things to get more speed, but has no actual understanding of what it is they're doing.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 19:06 UTC (Tue) by dlang (subscriber, #313) [Link]

from how I've seen the term thrown around, there's no attempt to know if the person has any understanding of what they are doing or not. Just the assumption that it's a waste of time.

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 17:34 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]

In any case, whether it's true or not, the perception that "ricers prefer Gentoo" is far from new. There's an entire website devoted to mocking them, and that website is ooooooollld. (Almost as old as Gentoo.)

There are plenty of folk who know what they're doing, I'm sure. At the same time, there is (or at least was) a visible crowd whose competence seems to barely rise above "script kiddie," applying the equivalent of "go faster stripes" to their computer. (20 years ago, I probably would have been no different, I must admit.) If the latter crowd thinks x32 is the latest "go faster stripes" that will magically make their computer a gazillion times faster, then they need to be told why that's not the case.

Anyway, I don't really have an opinion on whether Diego's post was aimed at that crowd. I saw some valid criticisms, and some odd attention to multiply latency. *shrug*

What I'd really like to see is some comprehensive benchmarks. Now *that* might be interesting. One I'd particularly like to see is "memory footprint of Firefox after loading these 100 tabs." ;-) These days, that seems to be the most common resource hog on my own machine, cycle or RAM-wise.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 0:30 UTC (Tue) by rich0 (guest, #55509) [Link]

You really should read more of Diego's blog. He could be wrong, but I wouldn't be so quick to just write him off, especially when talking about ABIs and such. He is VERY competent when it comes to C, ABIs, ELF, autotools, and a number of fairly low-level details. By all means feel free to disagree, but you should weigh his arguments carefully.

I never got the whole Gentoo ricers thing. I'd say that Gentoo isn't populated with ricers so much as with people who tend to do unusual things with their linux systems, including obsessing over performance, running Gentoo Prefix, coming up with x32, or designing embedded systems. Gentoo tends to be a very malleable distro, and it tends to appeal to those for whom the typical 99% solution just isn't good enough. If Ubuntu or whatever floats your boat by all means enjoy using it. I wouldn't be so quick to write off those who invest in Gentoo - if you ever run into an oddball problem one of them might be able to help you out...

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 18:38 UTC (Tue) by butlerm (guest, #13312) [Link]

I don't doubt that. The problem here is only that he apparently didn't actually benchmark anything, or seriously address the only issue likely to make a substantive difference - data cache impact.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 21:17 UTC (Mon) by Flameeyes (subscriber, #51238) [Link]

> This makes a big difference on some applications, on the order of 40% on both Atom and Core i7, as demonstrated by non-trivial benchmarks run by the x32 ABI developers.

Where are you reading that 40%? And what are you comparing that against? And how do you know it's non-trivial?

I happen to like being thorough in what I do, and to check whether I made mistakes, so if I missed a non-trivial benchmark that shows a 40% improvement against what was the best possible situation before x32 came to be, I'd be thrilled.

Unfortunately, the LPC talk by Intel's engineers from September 2011 lists a 5-11% increase in performance _against i386_ and a 5-8% against amd64 on the SPEC2k benchmark, which is far from what I'd call a "non-trivial benchmark". (Note: some previous papers do refer to a much bigger improvement, but that was against x86 as well, not amd64, _and_ even Intel is downplaying those numbers now.)

The problem is that the only ones touting benchmark numbers are the very same guys who're trying to "sell" the idea — which is never a good idea to listen to by default.

About the size of dynamically allocated pointer-heavy structures — it might make a substantial difference, but I honestly don't think so. I noted something about it in the post before this one; it's something that people seem to refer to, but nobody has numbers for.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 21:45 UTC (Mon) by mansr (guest, #85328) [Link]

>> This makes a big difference on some applications, on the order of 40% on both Atom and Core i7, as demonstrated by non-trivial benchmarks run by the x32 ABI developers.
>
> Where are you reading that 40%?

There is a brief mention on https://sites.google.com/site/x32abi/ of a 40% improvement in 181.mcf from spec2k.

This is a pointer chasing benchmark where the entire data set consists of pointers, which is hardly a realistic test case. It is furthermore an amazingly poorly written piece of software. In many places it needlessly uses global variables in inner loops, which can thwart compiler optimisations (particularly aliasing-based ones). It also uses 'long' exclusively where any sane programmer would use 'int', possibly leading to more expensive 64-bit operations being used where there really is no need.

Moreover, the website does not mention which compiler, let alone which compiler flags, were used, nor does it provide any raw numbers from the benchmark run. A lone percentage figure as presented there means absolutely nothing whatsoever.

Finally, out of all the spec2k modules, they chose to showcase only two, presumably because those two showed the most favourable results. The second one is 186.crafty, showing a meagre 3% improvement (4% on Atom). This leaves one wondering what the results of the remainder looked like. Something tells me they showed improvements of less than 3%, if any at all.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 22:05 UTC (Mon) by butlerm (guest, #13312) [Link]

181.mcf doesn't sound like a toy benchmark to me:

"For the considered single-depot case, the problem can be formulated as a large-scale minimum-cost flow problem that we solve with a network simplex algorithm accelerated with a column generation. The core of the benchmark 181.mcf is the network simplex code 'MCF Version 1.2 -- A network simplex implementation', For this benchmark, MCF is embedded in the column generation process."
http://www.spec.org/cpu2000/CINT2000/181.mcf/docs/181.mcf...

In any case, if 181.mcf is so poorly written, perhaps someone could do us the favor of benchmarking other pointer intensive code instead of dismissing x32 without bothering to conduct a single relevant benchmark.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 21:48 UTC (Mon) by JoeBuck (guest, #2330) [Link]

To give an example from one field (though one that matters very much to the proponents of this ABI, I suspect): the current state of affairs is that it is common for EDA users to use both a 32-bit and a 64-bit version of the same application (simulator, model checker, synthesis tool, etc) on the x86-64 architecture, preferring the former if the problem size will fit into 32 bits. That's because these applications are pointer-heavy, data access heavy and limited by what will fit into physical memory (it's not just the cache size, though of course that is a factor as well). The result is often that the 32-bit version is faster, at least for a certain range of problem size that is rather common, because it takes less memory. The 64-bit version is often reserved for test cases that require more than 4GB, because the performance of a server farm running lots of simulations is often constrained by how many jobs will fit into physical memory.

With x32 there would still be two different versions of the executable. For that reason, the relevant comparison is between the x32 version and a traditional x86 32-bit version. There are a number of wins: the larger register set means much less penalty for PIC code; 64 bit operations are available, there is much less register-spilling code. I would still recommend a 64-bit kernel, but I find x32 to be very interesting and don't think that the author of this piece really understands why the proponents' employer might be investing heavily in this.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 5:18 UTC (Tue) by alonz (subscriber, #815) [Link]

I wonder – are there any benchmarks based on Icarus Verilog? (It may not be a state-of-the-art EDA tool, but it is open-source, therefore accessible for a benchmark... And its operation is very pointer-heavy, esp. if you load a significant model)

Unfortunately I don't have access to an x32 system (nor the time to start tinkering with Gentoo), so I can't just run the benchmark myself :(

Big Advantage of x32 myths

Posted Jun 29, 2012 10:56 UTC (Fri) by brianomahoney (guest, #6206) [Link]

LLP64 causes a LOT of old software to break (P>I); now fixing that is (a) right, (b) good, but ALWAYS a pain. The x32 option will make this problem go away.

MFG, omb
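To make the P>I breakage concrete, here is a minimal sketch (a hypothetical example, not taken from any particular code base) of the pattern that had to be fixed for 64-bit ports and that happens to work again under x32:

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        int x = 42;
        void *p = &x;

        /* Pre-64-bit code often stashed pointers in plain int, assuming P == I.
           On LP64/LLP64 that silently drops the upper 32 bits; under x32 it
           happens to work again because pointers are back to 32 bits. */
        int cookie = (int)(intptr_t)p;
        void *back = (void *)(intptr_t)cookie;

        printf("%s\n", back == p ? "pointer round-trips" : "pointer was truncated");
        return 0;
    }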

Big Advantage of x32 myths

Posted Jun 29, 2012 12:46 UTC (Fri) by mansr (guest, #85328) [Link]

That was a problem when 64-bit CPUs first came out. Now, 20 years later, all remotely important software has been fixed, so this isn't really an argument any more.

Pettenò: Debunking x32 myths

Posted Jun 25, 2012 21:44 UTC (Mon) by slashdot (guest, #22014) [Link]

Well, the data cache utilization that he conveniently doesn't highlight can be huge for some software.

You can probably show a 3x speedup with an ad-hoc benchmark that traverses a randomly arranged linked list that fits exactly in the LLC when using 32-bit pointers, since on a 4 GHz Sandy Bridge accessing RAM apparently takes about 6x as long as accessing the LLC.
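A rough sketch of such an ad-hoc pointer-chasing benchmark, for illustration only (the node layout, list size and loop counts are made up, and the quoted speedup is not a measured result):

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Each node is nothing but a link, so the working set is
       N * sizeof(void *): half as large with 32-bit pointers. */
    struct node { struct node *next; };

    #define N (1u << 21)   /* ~16 MiB of nodes on LP64, ~8 MiB on an ILP32 ABI */

    int main(void)
    {
        struct node *nodes = malloc(N * sizeof *nodes);
        unsigned *perm = malloc(N * sizeof *perm);
        if (!nodes || !perm)
            return 1;

        /* Build one big random cycle so the prefetcher cannot guess
           where the next node lives. */
        for (unsigned i = 0; i < N; i++)
            perm[i] = i;
        srand(1);
        for (unsigned i = N - 1; i > 0; i--) {
            unsigned j = (unsigned)rand() % (i + 1);
            unsigned t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (unsigned i = 0; i < N; i++)
            nodes[perm[i]].next = &nodes[perm[(i + 1) % N]];

        /* Chase pointers: every step is a dependent load, so run time is
           dominated by whether the list fits in the LLC or spills to RAM. */
        clock_t t0 = clock();
        struct node *p = &nodes[perm[0]];
        for (unsigned i = 0; i < 10 * N; i++)
            p = p->next;
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;

        printf("ended at %p after %.2f s\n", (void *)p, secs);
        free(perm);
        free(nodes);
        return 0;
    }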

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 0:11 UTC (Tue) by ebiederm (subscriber, #35028) [Link]

Reading between the lines. x32 is primarily about Android on Atom.

x32 is primarily about code bases that are not 64bit clean so you need a 32bit userspace.

x32 is about the fact that Atom is an in-order processor, unlike the rest of x86, which is superscalar and out of order. To keep a processor like Atom busy you have to have your instructions in a good order in the instruction stream, and to keep your instructions that way you need more register names, to avoid spilling to memory.

Given the push towards power efficiency we may see more limited x86 cores like Atom targeting the embedded space. So I expect x32 may have a long life if x86 gets enough adoption in the embedded space.

x32 in general? Ridiculous. 32-bit pointers are just too small and fail right at the moment when you are doing something interesting in your application. A few percent better performance is not worth the risk of applications that don't work.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 5:31 UTC (Tue) by cmccabe (guest, #60281) [Link]

Yeah, I definitely got the sense that x32 was Intel's stab at the Android and mobile markets. People are already talking about 64-bit ARM, though. Intel needs to move fast if x32 is going to matter.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 7:36 UTC (Tue) by butlerm (guest, #13312) [Link]

> People are already talking about 64-bit ARM, though. Intel needs to move fast if x32 is going to matter.

That is a bit misleading. x32 is an ABI for 64 bit processors. ARM 64-bit, when it becomes a reality, is likely to perform much like x32 on all applications that do not make heavy use of pointers, and substantially slower than x32 on those.

If x32 is useful, something similar for ARM 64-bit is likely to be useful for exactly the same reason. It might even predominate over a pure 64 bit ARM ABI. How many portable devices are likely to need to address more than 4 GB per process?

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 7:58 UTC (Tue) by dlang (subscriber, #313) [Link]

I actually don't expect that an ARM version of x32 will ever make sense.

The only reason that it makes sense in the x86 world is the historic accidents that led to the creation of the AMD64 architecture.

As far as I know, x86 vs AMD64 is the only case where 32 bit vs 64 bit changes anything other than the size of the registers (and pointers). The fact that it doubles the number of registers on one of the most register-starved CPU platforms in existence makes a very significant difference. I don't expect that ARM64 is going to end up doing a similar thing. I don't believe ARM is nearly as register starved to begin with, and there's far less of a push for perfect backwards compatibility between ARM processor generations (in fact, there's very little push for backwards compatibility at the binary level at all; what little there is exists at the source code level)

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 9:52 UTC (Tue) by butlerm (guest, #13312) [Link]

Assuming you have a 64-bit processor, the improvement relative to the 32-bit processor (or the 64-bit processor in a pure 32 bit mode) is basically irrelevant. No surprise there.

The question would be decided in both cases rather by the performance advantages of running 64 bit code with either 32 bit or 64 bit pointers. My understanding is that many Java applications slow down by as much as 20% simply by switching from a 32 bit JVM to a 64 bit JVM on the same machine.

Considering that the JVM implementation is gaining registers and register width, that result is rather remarkable. The only explanation appears to be that the cache impact of larger pointers in a Java environment is severe. Take a look at this for example:

Why 64-bit Java is slow
http://asserttrue.blogspot.com/2008/11/why-64-bit-java-is...

Naturally, the reasonable expectation should be that an x32 JVM would be considerably faster than both current IA-32 and current x86-64 JVMs. An ARM64 JVM with 32 bit pointers should be considerably faster than ARM64 JVM with 64 bit pointers for the same reason.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 9:57 UTC (Tue) by dlang (subscriber, #313) [Link]

> An ARM64 JVM with 32 bit pointers should be considerably faster than ARM64 JVM with 64 bit pointers for the same reason.

that assumes that an ARM64 system has some advantage other than address space when compared to ARM32

none of the other processors that have both 32 bit and 64 bit modes (Sparc, Power, etc) have any advantage other than address space for their 64 bit modes, which is why it is still so common to see such systems running a 64 bit kernel with 32 bit userspace.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 11:49 UTC (Tue) by mansr (guest, #85328) [Link]

ARM64 has 31 general-purpose registers (ARM32 has 14).

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 13:37 UTC (Tue) by butlerm (guest, #13312) [Link]

>that assumes that an ARM64 system has some advantage other than address space when compared to ARM32

That is beside the point - the issue is the relative performance of _ARM64_ code when compiled to use different pointer sizes. The advantage relative to ARM32 is irrelevant. ARM64 with 32 bit pointers must outperform ARM64 with 64 bit pointers on important workloads to be worth supporting at all.

There is plenty of evidence to suggest that will indeed be the case, and the difference between LP64 and L64P32 ("x32") on x86-64 will make that even more clear than the current major performance _loss_ one experiences when going from a pure 32 bit to pure 64 bit JVM.

Take a look at this:

64-bit Performance Throughput/Memory Improvements in WAS7.0
http://webspherecommunity.blogspot.com/2008/10/64-bit-per...

The only way for them to get Websphere on a 64 bit JVM to approach the performance of Websphere on a 32 bit JVM was to use compressed references, i.e. smaller pointers. An L64P32 model on a 64 bit processor is essentially the same idea, and will make essentially the same improvement relative to LP64 without requiring developers to rewrite all the pertinent C code to use compressed pointers by hand.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 13:46 UTC (Tue) by Flameeyes (subscriber, #51238) [Link]

> There is plenty of evidence to suggest that will indeed be the case, and the difference between LP64 and L64P32 ("x32") on x86-64 will make that even more clear than the current major performance _loss_ one experiences when going from a pure 32 bit to pure 64 bit JVM.

You should probably try to make sure you understand how the ABI works before trying to discuss it in detail. It's ILP32, not L64P32 (i.e. "long" is also 32-bit).

As for the issue of compressed pointers in the JVM, it shows one thing very well: that you can easily solve the performance issue by making your code smarter, instead of breaking compatibility with what has been done up to now by making a new ABI.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 19:11 UTC (Tue) by butlerm (guest, #13312) [Link]

>It's ILP32, not L64P32 (i.e. "long" is also 32-bit).

Thanks for the correction. I am not sure why anyone would want 32 bit longs on a 64 bit processor, unless there is a large code base out there that is lazy about using longs where ints would suffice.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 19:28 UTC (Tue) by mansr (guest, #85328) [Link]

> unless there is a large code base out there that is lazy about using longs where ints would suffice.

Like SPEC2k.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 19:50 UTC (Tue) by slashdot (guest, #22014) [Link]

Because almost all software supports x86, and thus supports ILP32, while it may fail to work on L64P32 since no existing architecture uses that.

Programs can still use 64-bit integers via long long, int64_t or intmax_t (and I'd guess intfast_t too?).

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 20:31 UTC (Tue) by butlerm (guest, #13312) [Link]

Almost all Linux software supports LP64, no? L64P32 would be almost identical, and from what I can see so far would eliminate most of the porting issues.

The struct timespec / timeval issues would go away, for example. The number of x32 specific syscalls required would go down. Problems with ioctl structure differences would be greatly reduced, as would problems porting LP64 code in general.

32 bit longs are the wave of the past. IA-32 is rapidly becoming obsolete. Why any special effort would be made to retain compatibility with ILP32 rather than with LP64 (as much as possible) is a mystery to me. The whole thing is going to run on an LP64 kernel with a parallel LP64 userspace in many cases, after all. 32 bit pointers can be a major improvement. 32 bit longs on a 64 bit architecture on the other hand just make life difficult without any substantive gains, so far as I can tell.

In any case ILP32 was a mistake. It should have been L64P32 to begin with.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 1:50 UTC (Wed) by slashdot (guest, #22014) [Link]

The mistake was introducing the "char", "short", "int" and "long" keywords in C without specifying their meaning, despite the fact that there is no way to write portable, correct and fast programs in such a language.

Again, almost all software supports 32-bit compilation, and since all existing relevant 32-bit ABIs (i.e. x86 and arm) are ILP32, it would be insane to use a different size for "long" than the rest of the 32-bit world.

New programs should never use the "long" keyword anyway and should instead use the typedefs in <stdint.h> which actually have a defined useful meaning.
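As a small illustration of that advice (variable names are made up, just a sketch), the <stdint.h>/<inttypes.h> types keep their widths across ILP32, LP64 and x32, unlike 'long':

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        int32_t  count  = 100000;                  /* exactly 32 bits on every ABI */
        int64_t  total  = (int64_t)count * 86400;  /* exactly 64 bits, even on x32 */
        intptr_t cookie = (intptr_t)&count;        /* wide enough to hold a pointer */

        /* These widths do not change between ILP32, LP64 or x32. */
        printf("count=%" PRId32 " total=%" PRId64 " ptr bits=%zu\n",
               count, total, sizeof(cookie) * 8);
        return 0;
    }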

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 2:51 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

>The mistake was introducing the "char", "short", "int" and "long" keywords in C without specifying their meaning, despite the fact that there is no way to write a portable, correct and fast programs in such a language.

So would you like to have "int" to be 18 bits or 36 bits in length?

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 4:30 UTC (Wed) by cmccabe (guest, #60281) [Link]

The C standard DOES specify the meaning of "char," "short," "int," and so forth. char is "the smallest addressable unit" (in practice always 1 byte), short and int are at least 2 bytes, and long is at least 4 bytes. It may not seem important to you now, but saving a few bytes definitely mattered back in the 1970s, when C was designed.

It would be nice for the stdint.h types to be built-in, and more widely used in some cases, but it's really not a big deal. There are always higher-level languages you can use if you don't want to deal with this stuff. Some of them even have unlimited length integers! The 70s are over, you know.
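For what it's worth, those guarantees can be checked directly; a minimal sketch using C11's _Static_assert (the comments paraphrase the standard's minimum ranges, they are not quotations):

    #include <limits.h>

    /* The standard's guarantees are expressed as minimum ranges:
       char has at least 8 bits, short and int at least 16,
       long at least 32, long long at least 64.  Exact sizes are
       left to the implementation. */
    _Static_assert(CHAR_BIT >= 8,                  "char is at least 8 bits");
    _Static_assert(sizeof(short) * CHAR_BIT >= 16, "short is at least 16 bits");
    _Static_assert(sizeof(int)   * CHAR_BIT >= 16, "int is at least 16 bits");
    _Static_assert(sizeof(long)  * CHAR_BIT >= 32, "long is at least 32 bits");

    int main(void) { return 0; }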

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 13:49 UTC (Wed) by nix (subscriber, #2304) [Link]

char is "the smallest addressable unit" (in practice always 1 byte)
It's not a matter of 'in practice'. char is, by definition, one byte long. In closely adjacent paragraphs of every C Standard ever published, sizeof() is defined as yielding 'the size (in bytes) of its operand', and then sizeof (char) is defined as returning 1. If you change sizeof (char), you change the definition of a byte on that platform's C ABI.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 14:22 UTC (Wed) by mansr (guest, #85328) [Link]

While char is always and by definition one byte, a byte is not always one octet.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 21:27 UTC (Wed) by nix (subscriber, #2304) [Link]

Exactly. Of course, on any reasonable platform these days (other than tiny DSPs normally not programmed in C at all), a byte is one octet, but in olden days this was not true. Of course by the time C was standardized at all it was pretty widely true, and by the time of C99 it was universal. But it wasn't always so.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 21:45 UTC (Wed) by mansr (guest, #85328) [Link]

POSIX does mandate 8-bit bytes. As for those DSPs, a lot of code running on them is actually written in C. Only the critical parts are typically done in assembly.

Pettenò: Debunking x32 myths

Posted Jun 29, 2012 4:57 UTC (Fri) by sethml (guest, #8471) [Link]

Check out the TI C2000 / TMS320 architecture. I've written a lot of C++ with 16-bit chars on that architecture over the past few years. Yes, it's a bit of a brain-dead architecture, but there are definitely archs with non-8-bit chars still around.

Pettenò: Debunking x32 myths

Posted Jun 29, 2012 12:11 UTC (Fri) by nix (subscriber, #2304) [Link]

At least that's a multiple of 8. Are there any 9/18/36-bit bytes around these days? I hope not :}

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 8:38 UTC (Wed) by dgm (subscriber, #49227) [Link]

> The mistake was introducing the "char", "short", "int" and "long" keywords in C without specifying their meaning.

There was a reason for that. The intention was that they mapped cleanly to the word sizes of the underlying architecture.

> despite the fact that there is no way to write a portable, correct and fast programs in such a language.

Absurd. C was invented for exactly that. The original UNIX code was written in PDP-7 assembly (an 18-bit machine), and later rewritten in C and ported to the PDP-11 (a 16-bit one). The first C version of UNIX was portable, correct _and_ fast. And all that in a newborn language that would be considered "crude" compared to today's C.

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 17:45 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]

Of course you say "use <stdint.h>" *now*. It wasn't officially part of C until C99 as I recall, and many codebases have code that's noticeably older than that. Furthermore, just because it was in C99 didn't mean all compilers had it in 1999. (Although I'll grant many had it before then.)

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 20:04 UTC (Tue) by Flameeyes (subscriber, #51238) [Link]

> Thanks for the correction. I am not sure why anyone would want 32 bit longs on a 64 bit processor, unless there is a large code base out there that is lazy about using longs where ints would suffice.

It's mostly not to break the assumption that sizeof(long) == sizeof(void*), which is true for _most_ Unix software, although it's getting less common nowadays due to portability issues with Windows (Win64 being L32P64).
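A minimal sketch of the kind of code that leans on that assumption (a hypothetical example, not taken from any particular project):

    #include <stdio.h>

    int main(void)
    {
        char buf[64];

        /* A common Unix idiom: doing address arithmetic through 'long',
           which assumes sizeof(long) == sizeof(void *).  ILP32 (x32 included)
           and LP64 both preserve that equality, so code like this keeps
           working unchanged; LLP64 (Win64) is the model that breaks it. */
        long addr = (long)(buf + 3);
        char *aligned = (char *)((addr + 7) & ~7L);   /* round up to 8 bytes */

        printf("%p -> %p\n", (void *)(buf + 3), (void *)aligned);
        return 0;
    }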

As mansr already pointed out, Spec2k makes wide use of long where int (or properly sized stdints) should be used, which is one of the reasons why its benchmarks are getting better results than they should theoretically get.

I can discuss this more, and I'll probably do so, in a blog post with more detailed discussion on the cache issue.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 20:36 UTC (Tue) by butlerm (guest, #13312) [Link]

>Spec2k makes wide use of long where int (or properly sized stdints) should be used

It sounds like Spec2k should be rewritten to use int32_t instead of int, and int64_t instead of long, in any places where this might make a significant difference.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 20:58 UTC (Tue) by Flameeyes (subscriber, #51238) [Link]

Yes, too bad that it's a quite secretive and closed-source benchmark, which is my first problem with the benchmark's numbers — took quite a bit of investigative work to actually find somebody who could confirm my hunch about longs.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 3:11 UTC (Wed) by slashdot (guest, #22014) [Link]

You can apparently download an unlicensed copy for free from http://ks.tier2.hep.manchester.ac.uk/Mirrors/SPEC_CPU2006... and probably other places.

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 17:41 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]

I ran into plenty of it trying to port 32-bit software to a machine that had 40-bit 'long'. It breaks things in surprising ways. (Especially since sizeof(long) == 8 on said machine.)

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 10:43 UTC (Tue) by Cyberax (✭ supporter ✭, #52523) [Link]

Java objects are very pointer-heavy. In fact, there's a special mode (CompressedOops) for 64-bit JVMs which compresses some 64 bit pointers into 32 bits. It often makes a significant performance difference.

Why 64-Bit Java Is Slow

Posted Jun 27, 2012 1:28 UTC (Wed) by ldo (guest, #40946) [Link]

Maybe the reason for the slowdown going from x86 to AMD64 has to do with the fact that the Sun JVM is stack-based, not register-based. Maybe an interpreter with a register-based architecture, like Dalvik (as used in Android), will benefit more from 64-bit architectures.

Why 64-Bit Java Is Slow

Posted Jun 27, 2012 1:47 UTC (Wed) by alankila (guest, #47141) [Link]

Java is defined as a stack machine. I don't think people execute it as a stack machine.

Why 64-Bit Java Is Slow

Posted Jun 27, 2012 2:50 UTC (Wed) by Cyberax (✭ supporter ✭, #52523) [Link]

JVMs only use interpreter during startup. Then the code is JIT-compiled into native code and it doesn't matter at all whether the bytecode is stack-based or register-based. After all, they can be trivially interconverted.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 11:51 UTC (Tue) by mansr (guest, #85328) [Link]

The ARM64 ELF ABI is available right here: http://infocenter.arm.com/help/topic/com.arm.doc.ihi0056a...

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 11:07 UTC (Tue) by k3ninho (subscriber, #50375) [Link]

>If x32 is useful, something similar for ARM 64-bit is likely to be useful for exactly the same reason.

This is what Thumb[1] already does for 16-bit code in a 32-bit processor (well, it actually cuts down the instructions as well as the data to 16-bit symbols, and initially restricted the functionality to a subset of the ARMv4 IA). I have no idea what Thumb-like behaviour would be included in a 64-bit ARM IA.

1: http://en.wikipedia.org/wiki/ARM_architecture#Thumb

K3n.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 11:47 UTC (Tue) by mansr (guest, #85328) [Link]

> [Thumb] cuts down the instructions as well as the data to 16-bit symbols

Wrong. The original Thumb instruction set has 16-bit instructions with reduced functionality, most notably many instructions can access only the low 8 registers. The register size is still 32 bits. Thumb2 extends the instruction set with additional 32-bit instructions providing full access to the entire architecture.

The 64-bit ARM has no equivalent of Thumb mode, although 32-bit ARMv7 userspace is still supported.

See http://www.arm.com/files/downloads/ARMv8_Architecture.pdf for more information.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 6:45 UTC (Tue) by Otus (subscriber, #67685) [Link]

Is there a live CD I could run where I could benchmark amd64 vs. x32 (vs. x86) easily? (I only need gcc and glibc that support both/all.)

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 6:59 UTC (Tue) by yoshi314 (guest, #36190) [Link]

there is experimental gentoo stage3 for x32, so you can quickly test things out with it.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 16:12 UTC (Tue) by Otus (subscriber, #67685) [Link]

I was kind of hoping for a Debian or Fedora derivative, but maybe I'll have to learn Gentoo.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 7:51 UTC (Tue) by drag (subscriber, #31333) [Link]

Personally if I want to use 32bit on a 64bit processor I will just use a 64bit kernel and x86 32bit userspace.

That way I get pretty much the same benefit, but without even the least tiny bit of headache that I would get by trying to use x32 on anything.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 8:00 UTC (Tue) by dlang (subscriber, #313) [Link]

and by sticking with pure 32 bit code you are throwing away half of your possible registers, which can have a significant performance difference.

It's clear why x32 has significant performance advantages over plain 32 bit x86 mode, what's less clear are the benefits when compared to 64 bit mode.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 9:30 UTC (Tue) by tao (subscriber, #17563) [Link]

Compared to 64-bit, 32-bit (be it x32 or x86) uses less cache, memory and disk. The latter two are not all that important, but the decreased cache size can make a big difference in some cases.

The main use for x32 isn't for regular desktops. For regular desktops/servers the advantage of x64 is clear (being able to address more than 4GB/process).

On an embedded device, which probably doesn't even have 4GB of RAM, however, the question is rather: what benefits would x64 have compared to x32?

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 9:53 UTC (Tue) by dlang (subscriber, #313) [Link]

very few desktop apps need more than 4G/process (including the browser, at least with current versions),

however, you are missing significant benefits of x32 vs traditional 32 bit

added total address space (since you are using a 64 bit kernel)

added per-process memory (4G instead of 2G)

twice the number of registers.

ability to rely on advanced CPU features (compared to i[3456]86)

64 bit commands.

64 bit time_t

moving from x32 to full 64 bit adds

increased per-process address space (beyond 4G) with the drawback of increased pointer size (and the effects this can have on cache pressure)

The advantages of moving off of pure 32 bit are pretty clear, the only drawback being that it's new and has less testing.

however, when comparing x32 and 64 bit, it's less clear if the overall win is going to be in the address space or the cache efficiency.

unless, as you point out, the system just doesn't have enough ram to use the additional address space, and that point is probably higher than most people think. I expect it's probably at 8+GB of ram, not at 4G of ram. Remember that memory used for disk caches, the kernel, housekeeping apps, etc can easily use the ram beyond what the 'main' app of the system uses, even assuming that the 'main' app of the system is a single process.

If the 'main' app of the system is multiple processes, not a single process with multiple threads (which is a fairly good thing to do when faced with multi-core CPUs anyway), the amount of memory in a 'typical' system before it really needs 64 bit addressing in userspace can easily go beyond 16G

on the other hand, some people will have apps that really want more address space, even on low-memory machines (it may be more efficient to address the memory sparsely than to add the complication to use it more efficiently)

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 12:59 UTC (Tue) by tao (subscriber, #17563) [Link]

I think you must've misunderstood or misread my comment. You explicitly asked for advantages of x32 over x64. The advantages I mentioned (cache footprint, etc) were about that, nothing else.

Nowhere did I question the merits of x64 (or x32, for that matter) over x86.

As far as address space goes though, you can use a 64-bit kernel even with x86.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 16:00 UTC (Tue) by Otus (subscriber, #67685) [Link]

> however, you are missing significant benefits of x32 vs traditional 32 bit
> added total address space (since you are using a 64 bit kernel)

It's the same with normal 32-bit userspace on a 64-bit kernel.

> added per-process memory (4G instead of 2G)

Ditto?

> twice the number of registers.

Yes.

> ability to rely on advanced CPU features (compared to i[3456]86)

With amd64 you can only rely on SSE and SSE2 being available. I assume it's
the same with x32? You'll still need some sort of cpuid checks for more
recent features and you can do that on normal 32-bit as well.

(Or you can just tell the compiler to assume they are available.)

> 64 bit commands.

What do you mean?

> 64 bit time_t

Yes.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 19:05 UTC (Tue) by dlang (subscriber, #313) [Link]

>> 64 bit commands.

> What do you mean?

My understanding is that with x32 the compiler can use CPU instructions that operate on 64 bit items.
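A small sketch of what that buys (the function and constant below are made up for illustration): 64-bit integer arithmetic that maps to single instructions on x32 and amd64, but has to be synthesized from 32-bit operations on plain ia32.

    #include <stdint.h>
    #include <inttypes.h>
    #include <stdio.h>

    /* On x32 (as on amd64) the 64-bit multiply and shifts below map to
       single native instructions; a plain ia32 build has to build them
       out of 32-bit operations. */
    static uint64_t mix64(uint64_t x)
    {
        x ^= x >> 33;
        x *= 0xff51afd7ed558ccdULL;   /* constant from a common mix function */
        x ^= x >> 33;
        return x;
    }

    int main(void)
    {
        printf("%" PRIx64 "\n", mix64(42));
        return 0;
    }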

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 14:29 UTC (Wed) by hummassa (subscriber, #307) [Link]

>> twice the number of registers.

>Yes.

I couldn't parse your "yes"es...

Did you mean "yes, I know that using x32 instead of ia32 I have access to twice the number of registers"?

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 19:42 UTC (Wed) by Otus (subscriber, #67685) [Link]

"Yes, that's true." Basically, AFAICT, the *only* advantages of x32 over
32-bit user-space on 64-bit kernel are 1) twice the registers, and 2) 64-bit
time_t (and some other syscall things).

Listing the advantages of x32 over 32-bit/32-bit or 64-bit/64-bit
user-space/kernel is misleading, because 32-bit/64-bit is the one it needs
to really improve upon to be worth the effort.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 21:28 UTC (Wed) by nix (subscriber, #2304) [Link]

x32 also has a better floating-point ABI (using SSE rather than the x87).

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 21:51 UTC (Wed) by mansr (guest, #85328) [Link]

GCC has -mfpmath=sse for that.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 21:58 UTC (Wed) by dlang (subscriber, #313) [Link]

the problem is that on ia32 you can't assume that the CPU has sse, on x32 you can.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 22:20 UTC (Wed) by mansr (guest, #85328) [Link]

If you are considering x32, your hardware obviously has SSE.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 22:35 UTC (Wed) by dlang (subscriber, #313) [Link]

correct, but unless you are using a source-based distro where you customize your compiler flags, the distro is going to be building the ia32 binaries for a least-common-denominator, and so they will not be compiling with flags that will only work on some CPUs

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 23:02 UTC (Wed) by mansr (guest, #85328) [Link]

If they are not willing to build a version with compiler flags that only work on CPUs less than 10 years old, what makes you think they'll be willing to spend orders of magnitude more effort to rebuild for an entire new ABI?

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 23:50 UTC (Wed) by dlang (subscriber, #313) [Link]

because building multiple ia32 versions is still effectively doing an entire new ABI; binaries compiled for it won't work on the 'regular' distro.

In the past, some distros had kernels for i386, i486, i586, and i686. it was a major headache and not very effective. In many ways doing an entire new ABI is easier to deal with than dealing with a slight variation to an existing one.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 23:59 UTC (Wed) by nix (subscriber, #2304) [Link]

Yep. This is why -mfpmath=sse doesn't actually change the ABI. :)

(But with regard to the flag which *does*, everything you say is true. glibc's hwcaps mechanism will allow you to implement 'slight variation on instruction set', allowing some but not all libraries to have alternate versions for various hwcaps, plus 'tls' as a now-obsolete special case. On x86, this tends to get used to compile different x86-32 binaries for machines supporting versus not supporting the CMOV instruction; on e.g. SPARC64, it is (or was) used to provide alternate versions of libraries for the SPARCv9 32-bit instruction set, which is much like x32 except ABI-compatible with the usual SPARC 32-bit ABI -- all the SPARCv9 registers plus integer multiply and divide instructions, imagine that! You can't use the hwcaps mechanism to support different ABIs though, because nothing stops a hwcapped library calling a non-hwcapped one, or vice versa.)

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 5:33 UTC (Thu) by Otus (subscriber, #67685) [Link]

You could build a separate 32-bit userspace for 64-bit kernels that
assumed everything amd64 requires in compiler flags, including SSE1 and 2.
I don't see why this would be more work than supporting an x32 userspace.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 23:35 UTC (Wed) by nix (subscriber, #2304) [Link]

Not so. -mfpmath=sse only causes SSE to be used for math within a single function (including temporaries), and calls between functions with static linkage. All calls between functions with external linkage must still conform to the ABI, which means they must use the x87 registers or be spilled to memory. Thus, -mfpmath=sse can actually slow down code due to needless moves from SSE to x87 and back.

The option you're thinking of is -msseregparm, which elicits warnings whenever you use it because it breaks the ABI, meaning that you must link every single thing that you pass floating-point arguments to or receive floating-point return values from with the same option.

This includes libm, which you'll probably need to hack to expect its arguments in SSE registers, since a lot of its 32-bit code expects to receive them in x87 -- and sacrifice compatibility with everyone else's 32-bit x86 code, since nobody else uses that option. If you're doing that these days, you may as well use x32. :)
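A tiny compile-and-inspect sketch of the effect described above (the file name, compile line and function are hypothetical; look at the generated assembly to see the bounce through memory):

    /* fp.c (hypothetical); compile and inspect the code with:
     *   gcc -m32 -msse2 -mfpmath=sse -O2 -S fp.c
     */
    double scale(double x)
    {
        /* The multiply itself is done with mulsd in an XMM register, but
           because scale() has external linkage the i386 ABI still requires
           the result to be returned in the x87 register st(0), so the value
           is bounced through memory (a store plus an fld) on the way out. */
        return x * 1.0625;
    }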

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 0:42 UTC (Thu) by mansr (guest, #85328) [Link]

Thanks for the clarification on the flags.

While you are right that -mfpmath=sse still uses the x87 parameter passing, it is my experience that (well-written) software making heavy use of floating-point spends most of its time inside functions rather than in calls between them. Moreover, such software mostly passes around pointers to large arrays of data, not individual floating-point values.

Concerning libm, many compilers (gcc included) inline many of its functions, often using only one or a few instructions. For example, on x86 a call to sqrt() is turned into a single sqrtsd instruction.

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 14:23 UTC (Thu) by nix (subscriber, #2304) [Link]

You are, of course, correct that -mfpmath=sse provides most of the performance benefits of the -msseregparm-equivalent used by x32 -- however, it doesn't provide all of them, and in extreme circumstances can actually be slower than x87 (though it generally requires contrived benchmarks to do that).

Regarding inlined math operations, yes, quite a few can be inlined. A lot of the more complex stuff is just too large to usefully inline, though :( but I suppose the really common things generally are inlined (sqrt() being rather more commonly used than, e.g., y1f()).

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 10:11 UTC (Tue) by ebirdie (guest, #512) [Link]

Heh. Quite the opposite. On the laptop I'm typing this on I managed to gain performance by changing to pure 32 bit system code with a 64 bit kernel about a year ago. The laptop is equipped with 1G RAM. With pure 64 bit the system had frequent swap storms and hit 1G RAM usage while keeping my basic set of programs open on an Xfce desktop. With a 32 bit system, memory usage got under 512 meg. The change was made plain and simple: I got tired of the lockups, managed to monitor what the cause was while the system was nearly or totally unresponsive (kswapd and RAM usage, usually occurring when switching between applications), and finally reinstalled the system with 32 bit binaries. After the reinstall, RAM usage with normal apps loaded showed 470 megs, and I've been a happy user since. I do have pure 64 bit systems as well, but their minimum RAM is 2 Gigs.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 11:21 UTC (Tue) by drag (subscriber, #31333) [Link]

Any performance gains I get by using x32 are going to be more than wiped out by any sort of time and effort it takes to deal with software incompatibilities caused by having a different ABI.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 19:20 UTC (Tue) by butlerm (guest, #13312) [Link]

Those applications would probably break a lot less if the decision was made to use L64P32 instead of ILP32. Is that too late to fix?

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 21:00 UTC (Tue) by Flameeyes (subscriber, #51238) [Link]

It is. And it's one of my complaints, if you read the original article. They came up with this ABI too late in the game (most software has been soundly tested, or even entirely developed, on amd64), and with too few points in common with either x86 or amd64 to make it easy to port to.

So the (IMHO little) benefit on the data cache utilisation is made useless by the huge effort and compatibility issues brought in by having a full new ABI. Which is probably why the original presentation "sold" x32 as a closed-system ABI, not a generic one, like it seems people expect it to be right now.

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 17:49 UTC (Thu) by jzbiciak (subscriber, #5246) [Link]

Well, more correctly, half your possible register *names*. You still get full benefit of all the rename registers.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 9:56 UTC (Tue) by farnz (subscriber, #17727) [Link]

A question I've not seen answered elsewhere (I'll post a comment on Diego's blog as well, in case he doesn't read this). Does anyone have figures showing that MIPS n32 is faster than MIPS n64?

For background, there were (in the IRIX days) 3 MIPS ABIs. o32 was the original 32 bit ABI, for 32-bit MIPS CPUs. When MIPS got a 64-bit version, IRIX sprouted two new ABIs; n64 was a true 64-bit ABI, using 64 bit pointers. n32 was also a 64-bit ABI, but with 32 bit pointers; aside from the pointer size difference, n32 and n64 were the same.

This seems to me (naively) to parallel the x86 situation very nicely; i386 is the x86 equivalent of o32, AMD64 is the x86 equivalent of n64, and now people are trying to build an equivalent of n32 called x32.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 10:06 UTC (Tue) by gioele (subscriber, #61675) [Link]

The biggest supposed advantage of x32 is the reduction of cache problems thanks to smaller pointers. This advantage is relevant in code that deals with many pointers, for example languages that create many objects in the heap.

My question is: wouldn't it be better to modify such pointer-heavy code to work with base pointers + offsets or other similar schemes? With such modifications one could achieve better performance both on 64-bit and 32-bit pointers (let's use 16-bit offsets and save another 50% of cache lines). The JVM already does something similar [1]. Other VMs use similar tricks to avoid pointer dereferences for basic types such as integers, strings or booleans.

In general, if you manage many data structures of type X, wouldn't you optimise your code to make X more efficient? In this case X == pointers to objects you manage.

[1] https://wikis.oracle.com/display/HotSpotInternals/Compres...
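For illustration, a minimal sketch in C of the base-pointer-plus-offset scheme described above (the arena, the ref_t type and the helper names are made up; a real implementation would need bounds checks and alignment handling):

    #include <stdint.h>
    #include <stdlib.h>
    #include <stdio.h>

    /* One large arena; "pointers" inside it are 32-bit offsets from its base,
       so a link costs 4 bytes even in a 64-bit process (the same idea as the
       JVM's compressed oops, minus the scaling by object alignment). */
    static char  *arena;
    static size_t arena_used;

    typedef uint32_t ref_t;             /* 0 is reserved as the null reference */

    static ref_t arena_alloc(size_t size)
    {
        ref_t off = (ref_t)(arena_used + 1);   /* +1 keeps 0 free for "null" */
        arena_used += size;                    /* no bounds/alignment checks: sketch */
        return off;
    }

    static void *deref(ref_t r) { return r ? arena + r - 1 : NULL; }

    struct node {
        ref_t next;                     /* 4 bytes instead of 8 */
        int   key;
    };

    int main(void)
    {
        arena = malloc(1 << 20);
        ref_t a = arena_alloc(sizeof(struct node));
        ref_t b = arena_alloc(sizeof(struct node));

        struct node *na = deref(a), *nb = deref(b);
        na->key = 1; na->next = b;
        nb->key = 2; nb->next = 0;

        for (ref_t r = a; r; r = ((struct node *)deref(r))->next)
            printf("%d\n", ((struct node *)deref(r))->key);

        printf("sizeof(struct node) = %zu\n", sizeof(struct node));
        free(arena);
        return 0;
    }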

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 10:42 UTC (Tue) by marcH (subscriber, #57642) [Link]

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 14:10 UTC (Tue) by butlerm (guest, #13312) [Link]

>My question is: wouldn't it be better to modify such pointer-heavy code to work with base pointers + offsets or other similar schemes?

Changing the compiler to support smaller pointers is much more effective than making the code changes required in a large code base to use offsets instead, changes that would typically make the source code less readable, less portable, and less type safe than it was before.

I can see someone doing that in a VM or an interpreter; for more general purpose code, not so much. Compiler level support is much easier to deal with all around.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 17:40 UTC (Tue) by mansr (guest, #85328) [Link]

Your argument collapses when a substantial porting effort is required for the software to run on x32, as is frequently the case.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 15:49 UTC (Wed) by butlerm (guest, #13312) [Link]

The porting effort for software to run on x32 is a rounding error compared to what would typically be required to universally use offsets instead of pointers in dynamically allocated objects.

Something like a JVM can do this reasonably easily because the JVM bytecode itself is pointer size independent. Making a comparable change to Firefox or Chrome without compiler support would be essentially impossible. It would amount to translating C and C++ into an entirely new dialect. The chances of the upstreams for something like Webkit merging that are basically non-existent.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 16:03 UTC (Wed) by gioele (subscriber, #61675) [Link]

> The porting effort for software to run on x32 is a rounding error compared to what would typically be required to universally use offsets instead of pointers in dynamically allocated objects.

This kind of change can be shoehorned into compilers and VMs. Even if you do not make changes to Chrome, just using something similar to the JVM's compressed pointers in V8 may make it much less cache-hungry. The same argument can be made for any other JIT.

Also, please count in the total x32 effort the amount of changes, testing and support that compilers and distros will have to sustain to make x32 available to end users.

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 5:25 UTC (Thu) by butlerm (guest, #13312) [Link]

Any distribution that doesn't think x32 is worthwhile clearly isn't going to support it. If I were in charge of a general purpose distribution, I would be inclined to say that x32 might be worth supporting as a replacement for x86, but not in addition to it. The reason should be clear enough.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 17:56 UTC (Tue) by oak (guest, #2786) [Link]

> Changing the compiler to support smaller pointers is much more effective
> than making the code changes required in a large code base to use offsets
> instead, changes that would typically make the source code less readable,
> less portable, and less type safe than it was before.

Fontconfig uses offsets instead of pointers. That is really annoying in a shared library, because every program linking it (even indirectly, like all GUI apps do) then gets bogus memory leak reports etc. from Valgrind, as Valgrind cannot heuristically determine that an offset is a valid "pointer" to an allocation...

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 12:14 UTC (Wed) by gmaxwell (guest, #30048) [Link]

Huh? That shouldn't be the case at all. The memcheck tool should work fine regardless of what pointer you access the data through. Have an easily reproduced example? The fontconfig things I thought to try here are running clean in valgrind for me.

I googled around and found cases of people complaining about valgrind reports in older versions of it, but those cases sure appeared to be reads straddling distinct calls to malloc, which isn't valid (even if the memory happened to appear contiguous, IIRC you're not permitted to read across the boundary because it could be in separate segments).

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 13:51 UTC (Wed) by nix (subscriber, #2304) [Link]

> The memcheck tool should work fine regardless of what pointer you access the data through.

Quite. It's not like memcheck's libVEX CPU emulation knows or cares whether you use a pointer-plus-offset or a direct pointer. It just simulates and spies on the resulting memory accesses.

Pettenò: Debunking x32 myths

Posted Jul 1, 2012 20:47 UTC (Sun) by oak (guest, #2786) [Link]

I wasn't talking about the memory access issues memcheck reports, but about it detecting whether the allocated memory is lost or not (the --leak-check option).

AFAIK Valgrind scans at the end through the program's memory to see whether there are still valid pointers to the still allocated memory, before it reports what is lost.

According to Memcheck manual:
http://valgrind.org/docs/manual/mc-manual.html#mc-manual....

It considers only "pointers" that point to the start of an allocated block, or to the middle of it (as, for example, at least the earlier Firefox JS engine did, since it used some of the bits to store object info).

Because fontconfig stores allocations as offsets, not as pointers, they don't point to allocated memory and thus memcheck considers (at least some of them) lost.

Pettenò: Debunking x32 myths

Posted Jul 1, 2012 22:20 UTC (Sun) by nix (subscriber, #2304) [Link]

memcheck's algorithm should only cause problems if fontconfig is doing multiple allocations and then only storing a pointer to one of them, and referencing the rest via pointer subtraction rather than separate pointers. Since subtraction of pointers to distinct objects is undefined according to the C Standard I think that an incorrect leakage warning is the least you can expect from this.

However, what it actually appears to be doing is malloc()/realloc()ing an arena and then referencing objects *inside this arena* via offsets. This is perfectly valid (it's just a C array): the benefit is increased speed, the cost is increased memory usage if some of the elements are unused, and of course that valgrind cannot automatically detect leakage of elements within that array, only leakage of the array as a whole (if FcObjectSetAdd() gets called but then the array is never FcObjectSetDestroy()ed). This is of course entirely permissible: a whole lot of programs using a single fontset might create an FcObjectSet(), store it statically, and then never bother to destroy it, even on exit. I suspect that this behaviour of not freeing allocated storage on exit (particularly on abnormal exit) is near-universal among Unix programs: I know I've done it many times.

The moral is to only expect leak detectors to work on programs that have actually been designed in that expectation (often with a compilation flag to force them to free all storage on exit -- and even then it is unlikely they will bother to free anything on abnormal exit). Such freeing is a waste of time unless you're running a leak detector, is code that is useless if you're not running a leak detector, and if you mess it up you might cause a double-free(), and thus a security hole, where none was before. So it is not surprising that most people don't do it.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 12:26 UTC (Wed) by gioele (subscriber, #61675) [Link]

> Fontconfig uses offsets instead of pointers.

It would be much better if there was a shared library or just a set of macros in headers (CCAN?) that C/C++ projects could reuse without dealing with dangerous pointer manipulations.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 16:04 UTC (Tue) by basmevissen (subscriber, #54935) [Link]

I think the author missed one important argument for x32 mode: position independent code (PIC), commonly used in shared libraries, is quite slow in x86 mode. See slide 3 of http://linuxplumbersconf.net/2011/ocw//system/presentatio... The claim is that on x86 the performance penalty of PIC is > 20%. As most of an application's code comes from shared libraries, this affects application performance considerably.

Embedded systems with an Intel Atom that currently run only x86 code might benefit most from x32, as they are not likely to have an AMD64 user space. An x32 kernel might be an excellent idea for this class of systems.

Pettenò: Debunking x32 myths

Posted Jun 26, 2012 16:43 UTC (Tue) by Flameeyes (subscriber, #51238) [Link]

Yes because I haven't written enough on the topic ...

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 10:42 UTC (Thu) by basmevissen (subscriber, #54935) [Link]

Nevertheless an omission in your article about x32.

Pettenò: Debunking x32 myths

Posted Jun 27, 2012 15:35 UTC (Wed) by i3839 (guest, #31386) [Link]

When I first heard of x32, I thought it was a joke.
I still think it's a joke and I'm glad someone put
effort in writing such a good article on why it might
not be such a hot idea after all.

It's a crazy idea that will prove itself to be stupid.

No one will use x32, except Intel to run some hand picked
benchmark now and then. And then it's stuck in the kernel
for backward compatible reasons forever.


Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds