"Average win of 1.6% to 3.8% does not look very compelling."
Perhaps, but the average win on typical GNU/Linux systems is greater than that: GCC does not do so well on register-starved platforms. Find a comparison done with GCC and you'll see that the difference is greater, because x86 is really quite register starved.
Additionally, x86_64 makes mandatory a number of micro-architectural improvements which were previously optional. For example, consider SSE2, which is not universally available on x86. Generally distributed binaries must therefore either offer alternative functions and detect the feature at run time or, more commonly, avoid using those instructions. x86_64 reset the baseline, so you can just assume that many of these features will be there. This difference won't show up in benchmarks like the one you cited, since it used builds targeted at the particular CPUs in use, but it does show up in the real world.
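To make the "offer alternative functions and detect at run time" option concrete, here is a minimal sketch in Python. It assumes a Linux system where /proc/cpuinfo exposes a "flags" line; the function names (parse_cpu_flags, sum_squares_sse2 and so on) are hypothetical, and the "SSE2" routine is only a stand-in, since real dispatch would happen in C or assembly.

```python
def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. non-x86 or non-Linux)


def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the cpuinfo text, or return an empty string if it is unavailable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def sum_squares_portable(values):
    """Baseline routine that must work on every CPU."""
    return sum(v * v for v in values)


def sum_squares_sse2(values):
    """Stand-in for an SSE2-optimized routine (hypothetical; same result here)."""
    return sum(v * v for v in values)


def pick_sum_squares(flags):
    """Choose the implementation once, based on the detected CPU flags."""
    return sum_squares_sse2 if "sse2" in flags else sum_squares_portable


if __name__ == "__main__":
    routine = pick_sum_squares(parse_cpu_flags(read_cpuinfo()))
    print(routine.__name__, routine([1, 2, 3]))
```

On x86_64 the check is unnecessary because SSE2 is part of the baseline, which is the point being made above.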
There are also a couple of other advantages of x86_64 mode which were not mentioned up-thread: much faster syscalls and improved security (no-execute pages are always available, the bigger address space means that addresses can always contain null bytes, etc.).
And I'm not sure why you say that ~3% by itself isn't all that compelling when the price difference between a 2.83GHz Core 2 Quad and a 3.0GHz Core 2 Quad CPU is $230 US. Based on that simplistic analysis, we might expect 3% to be worth $115 US, while if you use a modern x86_64-painless distro and don't use non-free software, the cost of 64-bit mode is pretty close to $0.
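The arithmetic behind the $115 figure can be spelled out as follows. This is a rough sketch of the same simplistic analysis: it assumes performance scales linearly with clock speed, and the function name cost_per_percent is made up for illustration.

```python
def cost_per_percent(price_delta_usd, slow_ghz, fast_ghz):
    """Return (speedup in percent, dollars paid per percent of speedup).

    Assumes performance is proportional to clock frequency, which is the
    simplification the comment itself makes.
    """
    speedup_percent = (fast_ghz / slow_ghz - 1.0) * 100.0
    return speedup_percent, price_delta_usd / speedup_percent


# Figures quoted in the comment: $230 between a 2.83GHz and a 3.0GHz part.
speedup, usd_per_percent = cost_per_percent(230.0, 2.83, 3.0)
value_of_3_percent = 3.0 * usd_per_percent

if __name__ == "__main__":
    print(f"clock speedup: {speedup:.1f}%")
    print(f"price per percent: ${usd_per_percent:.2f}")
    print(f"implied value of a 3% win: ${value_of_3_percent:.0f}")
```

The clock difference works out to about 6%, so a 3% win is worth roughly half of the $230 price gap.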
Posted Oct 18, 2008 18:18 UTC (Sat) by khim (subscriber, #9252)
Perhaps, but the average win on typical GNU/Linux systems is
greater than that: GCC does not do so well on register-starved platforms.
Find a comparison done with GCC and you'll see that the difference is greater,
because x86 is really quite register starved.
GCC was really, really bad on register-starved architectures, but since
IA32 was the most important architecture for many years, it gained many
tricks which helped. The difference is not so great with GCC either: 5.2%.
Additionally, x86_64 makes mandatory a number of micro-
architectural improvements which were previously optional. For example,
consider SSE2 which is not universally available on x86.
Very few systems today don't have SSE2. Athlon XP was discontinued three
years ago and even VIA C7 and Intel Atom support SSE2 today - but they
don't support x86-64!
So generally distributed binaries must either offer alternative
functions and detect at run time, or, more commonly, avoid using those
instructions.
1. There is yet another alternative: barf and refuse to run. Few
systems will be affected.
2. Flash was written long ago - so MMX/SSE/SSE2 versions of all routines
already exist.
And I'm not sure why you say that ~3% by itself isn't all that
compelling when the price difference between a 2.83GHz Core 2 Quad and a
3.0GHz Core 2 Quad CPU is $230 US. Based on that simplistic analysis, we
might expect 3% to be worth $115 US, while if you use a modern
x86_64-painless distro and don't use non-free software, the cost of 64-bit
mode is pretty close to $0.
Because an average difference of just 3%, with case-by-case differences
ranging from -40% to 30%, means an even bigger win would come from carefully
selecting which programs should be 32-bit and which 64-bit. And the win from
profile-based optimization and whole-program optimization can dwarf that 3-5%
in many cases - but nobody does that. It looks kind of hypocritical to press
Adobe to rewrite Flash for a mere 3-5% win when the Linux community does not
do its own homework.
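A toy illustration of the selection argument, as a sketch only. The program names and two of the four numbers are invented: only the -40% and +30% extremes come from the discussion, and the rest were picked so that the plain average lands on 3%. Averaging percentage speedups this way is itself a simplification.

```python
# Hypothetical per-program speedups, in percent, from building 64-bit
# instead of 32-bit. Only -40 and +30 are taken from the thread; the names
# and the other values are made up for illustration.
SPEEDUP_64BIT_PERCENT = {"prog_a": -40, "prog_b": 30, "prog_c": 5, "prog_d": 17}


def average_all_64bit(speedups):
    """Average win if every program is built 64-bit."""
    return sum(speedups.values()) / len(speedups)


def average_selective(speedups):
    """Average win if each program is built 64-bit only where that helps.

    A program left as 32-bit contributes a gain of 0 by definition.
    """
    return sum(max(0, s) for s in speedups.values()) / len(speedups)


if __name__ == "__main__":
    print("everything 64-bit:", average_all_64bit(SPEEDUP_64BIT_PERCENT))
    print("per-program choice:", average_selective(SPEEDUP_64BIT_PERCENT))
```

With these made-up numbers, building everything 64-bit averages 3%, while choosing per program averages 13%, which is the shape of the argument rather than a measurement.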
Whopping 5% for GCC...
Posted Oct 18, 2008 20:32 UTC (Sat) by gmaxwell (subscriber, #30048)
Don't make the mistake of assuming that I think Adobe should do a 64-bit Linux port of Flash. I don't. I don't use Flash, nor do I care what they do. I was responding specifically to your argument that x86_64 is not worthwhile.
You're still arguing from an unrealistic position: it's not reasonable for distributions to do profile-driven compilation because of the load it imposes on the build infrastructure. Likewise, they can't reasonably do whole-program optimization because of the enormous memory requirements. Today, in the real world, x86_64 does give real, measurable performance improvements. The floating-point improvement alone justifies it.
Likewise, refusing to support systems that are not P4-class or better isn't reasonable. There are still plenty of x86 CPUs sold today without full SSE2 support (Geode, for example). It makes a lot of sense to batch together all of the little non-backwards-compatible micro-architecture improvements and break compatibility once, offering x86_64 for new systems and i486-compatible code for everything else. Which is exactly what is being done.
It seems to me that you have missed the point I was making about 3%. People are paying $240/CPU for what is *at best* a 6% improvement. If you have any floating-point workloads, x86_64 is a big win, and even if you have purely integer workloads, x86_64 is still a worthwhile improvement in typical cases.
Your cited benchmark here is antiquated: would you really expect GCC from 2004 to take full advantage of a CPU architecture which had just become available in 2003? I do not have a copy of the SPEC CPU benchmarks, but I'm familiar with the min-cost optimization code (mcf) that performed so poorly on x86_64 in your cited benchmark. My own quick test here, built with GCC 4.3.0 on a Core 2, shows mcf on x86_64 performing only 3% slower than when compiled for x86. My test problem[1] may not be comparable to whatever is in SPECint, but this suggests to me that the 40% hit documented in your linked benchmark is probably no longer representative of x86_64 performance with GCC, even on one-off micro-benchmarks.