Posted Jun 27, 2012 19:42 UTC (Wed) by Otus (guest, #67685)
"Yes, that's true." Basically, AFAICT, the *only* advantages of x32 over
32-bit user-space on 64-bit kernel are 1) twice the registers, and 2) 64-bit
time_t (and some other syscall things).
Listing the advantages of x32 over 32-bit/32-bit or 64-bit/64-bit
user-space/kernel is misleading, because 32-bit/64-bit is the one it needs
to really improve upon to be worth the effort.
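A minimal C sketch of the comparison above, assuming GCC or Clang: the predefined macros `__x86_64__`, `__ILP32__` and `__i386__` identify the ABI at compile time, and the printed sizes show what actually differs between the three. The function names `abi_name` and `print_abi_summary` are made up for illustration. Built with `-m32` on a traditional glibc ia32 system one would expect a 4-byte `time_t`; with `-mx32`, 4-byte pointers and `long` but an 8-byte `time_t`; with `-m64`, 8 bytes for all three.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Name the ABI this file was compiled for. x32 defines both
   __x86_64__ and __ILP32__, so it has to be tested first. */
const char *abi_name(void)
{
#if defined(__x86_64__) && defined(__ILP32__)
    return "x32";
#elif defined(__x86_64__)
    return "x86-64";
#elif defined(__i386__)
    return "ia32";
#else
    return "other";
#endif
}

/* Print the type sizes that distinguish the three x86 ABIs. */
void print_abi_summary(void)
{
#if defined(__ILP32__) || defined(__i386__)
    /* Both 32-bit ABIs (ia32 and x32) use 4-byte pointers. */
    assert(sizeof(void *) == 4);
#endif
    printf("%s: pointer=%zu long=%zu time_t=%zu bytes\n",
           abi_name(), sizeof(void *), sizeof(long), sizeof(time_t));
}
```

Calling `print_abi_summary()` from `main` in each of the three builds gives a side-by-side view; the register count does not show up here, since it is a property of the generated code rather than of any C type.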
Pettenò: Debunking x32 myths
Posted Jun 27, 2012 21:28 UTC (Wed) by nix (subscriber, #2304)
x32 also has a better floating-point ABI (using SSE rather than the x87).
Posted Jun 27, 2012 21:51 UTC (Wed) by mansr (guest, #85328)
GCC has -mfpmath=sse for that.
Posted Jun 27, 2012 21:58 UTC (Wed) by dlang (✭ supporter ✭, #313)
The problem is that on ia32 you can't assume that the CPU has SSE; on x32 you can.
Posted Jun 27, 2012 22:20 UTC (Wed) by mansr (guest, #85328)
If you are considering x32, your hardware obviously has SSE.
Posted Jun 27, 2012 22:35 UTC (Wed) by dlang (✭ supporter ✭, #313)
Correct, but unless you are using a source-based distro where you customize your compiler flags, the distro is going to build its ia32 binaries for the least common denominator, so it will not compile them with flags that only work on some CPUs.
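For illustration, one workaround a least-common-denominator ia32 binary can use is a run-time check before taking an SSE-only path. The sketch below assumes GCC or Clang on x86, where `__builtin_cpu_init` and `__builtin_cpu_supports` are available; `cpu_has_sse2`, `sum_generic`, `sum_sse2_build` and `sum_floats` are hypothetical names. In a real program the SSE2 variant would live in a separate file built with `-msse2`; here it is a plain C stand-in so the example stays self-contained.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if the running CPU reports SSE2, 0 otherwise. */
int cpu_has_sse2(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0; /* not an x86 target: never take the SSE2 path */
#endif
}

/* Baseline version, safe on any CPU the binary targets. */
static float sum_generic(const float *v, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Stand-in for a variant compiled with -msse2 (assumption: in a real
   build this would be a separately compiled translation unit). */
static float sum_sse2_build(const float *v, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Choose the implementation at run time based on the CPU check. */
float sum_floats(const float *v, size_t n)
{
    assert(v != NULL || n == 0);
    return cpu_has_sse2() ? sum_sse2_build(v, n) : sum_generic(v, n);
}
```

This only helps inside a function body. It does nothing about how floating-point values cross function boundaries, which is fixed by the ABI.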
Posted Jun 27, 2012 23:02 UTC (Wed) by mansr (guest, #85328)
If they are not willing to build a version with compiler flags that only work on CPUs less than 10 years old, what makes you think they'll be willing to spend orders of magnitude more effort to rebuild for an entire new ABI?
Posted Jun 27, 2012 23:50 UTC (Wed) by dlang (✭ supporter ✭, #313)
Because building multiple ia32 versions is still doing an entire new ABI: binaries compiled for one of them won't work on the 'regular' distro.
In the past, some distros had kernels for i386, i486, i586, and i686. It was a major headache and not very effective. In many ways, doing an entire new ABI is easier to deal with than a slight variation on an existing one.
Posted Jun 27, 2012 23:59 UTC (Wed) by nix (subscriber, #2304)
Yep. This is why -mfpmath=sse doesn't actually change the ABI. :)
(But with regard to the flag which *does*, everything you say is true. glibc's hwcaps mechanism will allow you to implement 'slight variation on instruction set', allowing some but not all libraries to have alternate versions for various hwcaps, plus 'tls' as a now-obsolete special case. On x86, this tends to get used to compile different x86-32 binaries for machines supporting versus not supporting the CMOV instruction; on e.g. SPARC64, it is (or was) used to provide alternate versions of libraries for the SPARCv9 32-bit instruction set, which is much like x32 except ABI-compatible with the usual SPARC 32-bit ABI -- all the SPARCv9 registers plus integer multiply and divide instructions, imagine that! You can't use the hwcaps mechanism to support different ABIs though, because nothing stops a hwcapped library calling a non-hwcapped one, or vice versa.)
Posted Jun 28, 2012 5:33 UTC (Thu) by Otus (guest, #67685)
You could build a separate 32-bit userspace for 64-bit kernels that
assumed everything amd64 requires in compiler flags, including SSE1 and 2.
I don't see why this would be more work than supporting an x32 userspace.
Posted Jun 27, 2012 23:35 UTC (Wed) by nix (subscriber, #2304)
Not so. -mfpmath=sse only causes SSE to be used for math within a single function (including temporaries), and calls between functions with static linkage. All calls between functions with external linkage must still conform to the ABI, which means they must use the x87 registers or be spilled to memory. Thus, -mfpmath=sse can actually slow down code due to needless moves from SSE to x87 and back.
The option you're thinking of is -msseregparm, which elicits warnings whenever you use it because it breaks the ABI, meaning that you must link every single thing that you pass floating-point arguments to or receive floating-point return values from with the same option.
This includes libm, which you'll probably need to hack to expect its arguments in SSE registers, since a lot of its 32-bit code expects to receive them in x87 -- and sacrifice compatibility with everyone else's 32-bit x86 code, since nobody else uses that option. If you're doing that these days, you may as well use x32. :)
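A small C sketch of the distinction, assuming GCC or Clang: `__SSE2_MATH__` is predefined when SSE2 is used for floating-point math, and `FLT_EVAL_METHOD` from `<float.h>` is typically 2 with x87 math and 0 with SSE math. The function names are made up for illustration. Compiling with something like `gcc -m32 -msse2 -mfpmath=sse -O2 -S` and reading the assembly for `scale` should show the result still being handed back through the x87 stack, because `scale` has external linkage, while the static helper is free to stay in SSE registers.

```c
#include <assert.h>
#include <float.h>

/* Which unit the compiler uses for double math inside function bodies. */
const char *fp_math_unit(void)
{
#if defined(__SSE2_MATH__)
    return "sse";
#elif defined(__i386__) || defined(__x86_64__)
    return "x87";
#else
    return "other";
#endif
}

/* Evaluation method reported by the compiler for this build. */
int fp_eval_method(void)
{
    return FLT_EVAL_METHOD;
}

/* Static linkage: the compiler may pick its own calling convention,
   or inline the function entirely. */
static double mul(double x, double k)
{
    return x * k;
}

/* External linkage: must follow the platform ABI. On ia32 the double
   result comes back in x87 st(0) even under -mfpmath=sse. */
double scale(double x, double k)
{
    return mul(x, k);
}

/* Sanity check that the external entry point gives the expected value. */
void self_check(void)
{
    assert(scale(2.0, 3.0) == 6.0);
}
```

On x86-64 and x32 the same source needs no special flags, since those ABIs pass and return `double` in SSE registers to begin with.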
Posted Jun 28, 2012 0:42 UTC (Thu) by mansr (guest, #85328)
Thanks for the clarification on the flags.
While you are right that -mfpmath=sse still uses the x87 parameter passing, it is my experience that (well-written) software making heavy use of floating-point spends most of its time inside functions rather than in calls between them. Moreover, such software mostly passes around pointers to large arrays of data, not individual floating-point values.
Concerning libm, many compilers (gcc included) inline many of its functions, often using only one or a few instructions. For example, on x86 a call to sqrt() is turned into a single sqrtsd instruction.
Posted Jun 28, 2012 14:23 UTC (Thu) by nix (subscriber, #2304)
You are, of course, correct that -mfpmath=sse provides most of the performance benefits of the -msseregparm-equivalent used by x32 -- however, it doesn't provide all of them, and in extreme circumstances can actually be slower than x87 (though it generally requires contrived benchmarks to do that).
Regarding inlined math operations, yes, quite a few can be inlined. A lot of the more complex stuff is just too large to usefully inline, though :( but I suppose the really common things generally are inlined (sqrt() being rather more commonly used than, e.g., y1f()).