> This makes a big difference on some applications, on the order of 40% on both Atom and Core i7, as demonstrated by non-trivial benchmarks run by the x32 ABI developers.
Where are you reading that 40%? And what are you comparing that against? And how do you know it's non-trivial?
I like to be thorough in what I do and to check whether I made mistakes, so if I missed a non-trivial benchmark that shows a 40% improvement over the best possible situation before x32 came to be, I'd be thrilled.
Unfortunately, the LPC talk by Intel's engineers from September 2011 lists a 5-11% increase in performance _against i386_ and 5-8% against amd64 on the SPEC2k benchmark, which is far from what I'd call a "non-trivial benchmark". (Note: some previous papers do refer to a much bigger improvement, but that was against x86 as well, not amd64, _and_ even Intel is downplaying those numbers now.)
The problem is that the only ones touting benchmark numbers are the very same guys who're trying to "sell" the idea, and taking the seller's word by default is never a good idea.
As for the size of dynamically allocated pointer-heavy structures: it might make a substantial difference, but I honestly don't think so. I noted something about it in the post before that one; it's something people keep referring to, but nobody has numbers for it.
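For what it's worth, the upper bound on that saving can be sketched with simple arithmetic. The Python below is only a model under stated assumptions: the node layout (two pointers plus a payload), the function names, and the rule that the struct is padded to pointer alignment are all hypothetical, and allocator overhead is ignored. It is not a measurement of any real application.

```python
def node_size(pointer_bytes, n_pointers, payload_bytes):
    """Modelled size of a C-like struct holding pointers plus a payload.

    Assumption: the struct is padded to a multiple of the pointer size.
    malloc bookkeeping and wider-aligned payload members are ignored.
    """
    raw = pointer_bytes * n_pointers + payload_bytes
    # Round up to the next multiple of pointer_bytes (trailing padding).
    return -(-raw // pointer_bytes) * pointer_bytes


def shrink_fraction(n_pointers, payload_bytes):
    """Fraction of memory saved by going from 8-byte to 4-byte pointers."""
    wide = node_size(8, n_pointers, payload_bytes)
    narrow = node_size(4, n_pointers, payload_bytes)
    return 1 - narrow / wide


if __name__ == "__main__":
    # Hypothetical node with two pointers and a growing payload.
    for payload in (0, 4, 16, 64, 256):
        print(payload,
              node_size(8, 2, payload),
              node_size(4, 2, payload),
              round(shrink_fraction(2, payload), 2))
```

In this model the saving is 50% only when the node is essentially all pointers, and it drops quickly as the non-pointer payload grows, which is why real numbers from real applications would matter more than the best case.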
Posted Jun 25, 2012 21:45 UTC (Mon) by mansr (guest, #85328)
>> This makes a big difference on some applications, on the order of 40% on both Atom and Core i7, as demonstrated by non-trivial benchmarks run by the x32 ABI developers.
>
> Where are you reading that 40%?
This is a pointer-chasing benchmark where the entire data set consists of pointers, hardly a realistic test case. It is furthermore an amazingly poorly written piece of software. In many places it needlessly uses global variables in inner loops, which can thwart compiler optimisations (particularly aliasing-based ones). It also uses 'long' exclusively where any sane programmer would use 'int', possibly leading to more expensive 64-bit operations being used where there really is no need.
Moreover, the website does not mention which compiler, let alone which compiler flags, were used, nor does it provide any raw numbers from the benchmark run. A lone percentage figure as presented there means absolutely nothing whatsoever.
Finally, out of all the spec2k modules, they chose to showcase only two, presumably because those two showed the most favourable results. The second one is 186.crafty, showing a meagre 3% improvement (4% on Atom). This leaves one wondering what the results of the remainder looked like. Something tells me they showed improvements of less than 3%, if any at all.
Pettenò: Debunking x32 myths
Posted Jun 25, 2012 22:05 UTC (Mon) by butlerm (subscriber, #13312)
181.mcf doesn't sound like a toy benchmark to me:
"For the considered single-depot case, the problem can be formulated as a large-scale minimum-cost flow problem that we solve with a network simplex algorithm accelerated with a column generation. The core of the benchmark 181.mcf is the network simplex code 'MCF Version 1.2 -- A network simplex implementation', For this benchmark, MCF is embedded in the column generation process." http://www.spec.org/cpu2000/CINT2000/181.mcf/docs/181.mcf...
In any case, if 181.mcf is so poorly written, perhaps someone could do us the favor of benchmarking other pointer intensive code instead of dismissing x32 without bothering to conduct a single relevant benchmark.
Posted Jun 25, 2012 21:48 UTC (Mon) by JoeBuck (subscriber, #2330)
To give an example from one field (though one that matters very much to the proponents of this ABI, I suspect): the current state of affairs is that it is common for EDA users to use both a 32-bit and a 64-bit version of the same application (simulator, model checker, synthesis tool, etc.) on the x86-64 architecture, preferring the former if the problem size will fit into 32 bits. That's because these applications are pointer-heavy, data-access-heavy and limited by what will fit into physical memory (it's not just the cache size, though of course that is a factor as well). The result is often that the 32-bit version is faster, at least for a certain range of problem sizes that is rather common, because it takes less memory. The 64-bit version is often reserved for test cases that require more than 4GB, because the performance of a server farm running lots of simulations is often constrained by how many jobs will fit into physical memory.
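The "how many jobs fit" argument can be sketched as back-of-the-envelope arithmetic. Everything in the Python below is an illustrative assumption: the 64 GiB machine, the 3 GiB job, the 50% pointer share, and the function names are made up for the example and do not come from any real EDA workload.

```python
GIB = 2 ** 30


def narrowed_footprint(lp64_bytes, pointer_fraction, pointer_bytes):
    """Estimated footprint when pointers shrink from 8 bytes to pointer_bytes.

    pointer_fraction is the (assumed) share of the 64-bit footprint that is
    taken up by pointers; the rest of the data is left unchanged.
    """
    saved = pointer_fraction * (1 - pointer_bytes / 8)
    return int(lp64_bytes * (1 - saved))


def jobs_that_fit(physical_bytes, job_bytes):
    """Number of whole jobs that fit into physical memory at once."""
    return physical_bytes // job_bytes


if __name__ == "__main__":
    machine = 64 * GIB      # hypothetical server
    job64 = 3 * GIB         # hypothetical job built with 8-byte pointers
    job32 = narrowed_footprint(job64, 0.5, 4)
    print("64-bit pointers:", jobs_that_fit(machine, job64), "jobs")
    print("32-bit pointers:", jobs_that_fit(machine, job32), "jobs")
```

Under these assumed numbers the same machine goes from 21 to 28 concurrent jobs; with a smaller pointer share the gain shrinks accordingly.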
With x32 there would still be two different versions of the executable. For that reason, the relevant comparison is between the x32 version and a traditional x86 32-bit version. There are a number of wins: the larger register set means much less penalty for PIC code, 64-bit operations are available, and there is much less register-spilling code. I would still recommend a 64-bit kernel, but I find x32 to be very interesting and don't think that the author of this piece really understands why the proponents' employer might be investing heavily in this.
Posted Jun 26, 2012 5:18 UTC (Tue) by alonz (subscriber, #815)
I wonder – are there any benchmarks based on Icarus Verilog? (It may not be a state-of-the-art EDA tool, but it is open-source, therefore accessible for a benchmark... And its operation is very pointer-heavy, esp. if you load a significant model)
Unfortunately I don't have access to an x32 system (nor the time to start tinkering with Gentoo), so I can't just run the benchmark myself :(