Compared to 64-bit, 32-bit (be it x32 or x86) uses less cache, memory and disk. The latter two are not all that important, but the smaller cache footprint can make a big difference in some cases.
The main use for x32 isn't for regular desktops. For regular desktops/servers the advantage of x64 is clear (being able to address more than 4GB/process).
On an embedded device, which probably doesn't even have 4GB of RAM, however, the question is rather: what benefits would x64 have compared to x32?
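To make the cache-footprint point concrete, here is a small Python model of C struct layout. It is an illustration, not actual compiler output. It assumes natural alignment, 64-byte cache lines, and a hypothetical node type `struct node { struct node *next; int value; }`, and compares 4-byte pointers (x32, i386) with 8-byte pointers (x86-64).

```python
# Simplified model of C struct layout under natural alignment.
# Assumptions: each field is aligned to its own size, and the struct's total
# size is rounded up to its largest field alignment. Not real compiler output.

CACHE_LINE = 64  # bytes; typical for current x86 CPUs (assumption)


def struct_size(field_sizes):
    """Return the padded size of a struct whose fields have the given sizes."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        align = size
        max_align = max(max_align, align)
        # Pad up to the field's alignment, then place the field.
        offset = (offset + align - 1) // align * align
        offset += size
    # Trailing padding so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


def node_layout(pointer_size, int_size=4):
    """Size of a hypothetical {next pointer, int value} node and nodes per cache line."""
    size = struct_size([pointer_size, int_size])
    return size, CACHE_LINE // size


for abi, ptr in [("x32 / i386", 4), ("x86-64", 8)]:
    size, per_line = node_layout(ptr)
    print(f"{abi}: node is {size} bytes, {per_line} nodes per cache line")
```

Under these assumptions the pointer-heavy node doubles from 8 to 16 bytes on x86-64, so only half as many nodes fit in each cache line. Structures made mostly of ints or doubles would see little change.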
Posted Jun 26, 2012 9:53 UTC (Tue) by dlang (✭ supporter ✭, #313)
very few desktop apps need more than 4G/process (including the browser, at least with current versions).
however, you are missing significant benefits of x32 vs traditional 32 bit:
added total address space (since you are using a 64 bit kernel)
added per-process memory (4G instead of 2G)
twice the number of registers.
ability to rely on advanced CPU features (compared to i[3456]86)
64 bit commands.
64 bit time_t
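The 64-bit time_t item matters because a signed 32-bit time_t, as used by traditional i386 userspace, can only count seconds since 1970 up to early 2038. A quick Python check of that arithmetic, assuming the usual signed seconds-since-the-epoch encoding and an average Gregorian year of 365.2425 days:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
SECONDS_PER_YEAR = 365.2425 * 86400  # average Gregorian year


def max_time_t(bits):
    """Largest value of a signed time_t with the given width."""
    return 2 ** (bits - 1) - 1


def rollover_instant(bits):
    """Last representable UTC instant. Only practical for small widths,
    since Python's datetime cannot reach the 64-bit limit."""
    return EPOCH + timedelta(seconds=max_time_t(bits))


def years_representable(bits):
    """Approximate number of years after 1970 that a signed time_t can cover."""
    return max_time_t(bits) / SECONDS_PER_YEAR


print(rollover_instant(32))                  # 2038-01-19 03:14:07+00:00
print(round(years_representable(64) / 1e9))  # 292 (billion years)
```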
moving from x32 to full 64 bit adds
increased per-process address space (beyond 4G) with the drawback of increased pointer size (and the effects this can have on cache pressure)
The advantages of moving off of pure 32 bit are pretty clear, the only drawback being that it's new and has less testing.
however, when comparing x32 and 64 bit, it's less clear whether the overall win is going to be in the address space or in the cache efficiency.
unless, as you point out, the system just doesn't have enough RAM to use the additional address space, and that point is probably higher than most people think. I expect it's at 8+GB of RAM, not at 4GB. Remember that disk caches, the kernel, housekeeping apps, etc. can easily use the RAM beyond what the 'main' app of the system uses, even assuming that the 'main' app of the system is a single process.
If the 'main' app of the system is multiple processes rather than a single process with multiple threads (which is a fairly good thing to do with multi-core CPUs anyway), the amount of memory a 'typical' system can have before it really needs 64-bit addressing in userspace can easily go beyond 16GB.
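The sizing argument above comes down to the 4G limit applying per process, not per system. A Python sketch with made-up workload numbers, assuming each x32 process can address at most 2**32 bytes (in practice slightly less):

```python
GiB = 2 ** 30
X32_PROCESS_LIMIT = 4 * GiB  # 32-bit pointers reach at most 2**32 bytes


def fits_in_x32(process_sizes_gib):
    """True if every process fits in a 32-bit address space.

    Pointer size limits each individual process, not the total
    memory used across all processes on the system.
    """
    return all(size * GiB <= X32_PROCESS_LIMIT for size in process_sizes_gib)


# Hypothetical 'main' app split into 4 worker processes of 3.5 GiB each;
# the kernel, page cache and housekeeping apps come on top of this.
workers = [3.5, 3.5, 3.5, 3.5]
print(fits_in_x32(workers), sum(workers))  # True 14.0
```

Here the hypothetical workers use 14 GiB of userspace memory between them without any single one needing 64-bit pointers.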
on the other hand, some people will have apps that really want more address space, even on low-memory machines (it may be more efficient to address the memory sparsely than to add the complexity needed to pack it densely).
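As an example of wanting address space more than RAM: a lookup table indexed directly by a 32-bit key needs the whole index range mapped, even though Linux typically backs only the pages that are actually touched. A back-of-the-envelope Python calculation, with the table shape purely hypothetical:

```python
GiB = 2 ** 30
ADDRESS_SPACE_32 = 2 ** 32  # upper bound for an x32 or i386 process


def direct_table_bytes(key_bits, entry_size):
    """Virtual address space needed to index a table directly by key."""
    return (2 ** key_bits) * entry_size


# Hypothetical table: indexed directly by a 32-bit ID, 8-byte entries.
# Sparse use keeps RAM low, but the whole range must fit in one address space.
needed = direct_table_bytes(32, 8)
print(needed // GiB, needed <= ADDRESS_SPACE_32)  # 32 False
```

A 64-bit process can map such a table and let untouched pages cost nothing; an x32 process would need a more compact, and more complicated, data structure.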
Pettenò: Debunking x32 myths
Posted Jun 26, 2012 12:59 UTC (Tue) by tao (subscriber, #17563)
I think you must've misunderstood or misread my comment. You explicitly asked for advantages of x32 over x64. The advantages I mentioned (cache footprint, etc) were about that, nothing else.
Nowhere did I question the merits of x64 (or x32, for that matter) over x86.
As far as address space goes though, you can use a 64-bit kernel even with x86.
Posted Jun 26, 2012 16:00 UTC (Tue) by Otus (guest, #67685)
> however, you are missing significant benefits of x32 vs traditional 32 bit
> added total address space (since you are using a 64 bit kernel)
It's the same with normal 32-bit userspace on a 64-bit kernel.
> added per-process memory (4G instead of 2G)
Ditto?
> twice the number of registers.
Yes.
> ability to rely on advanced CPU features (compared to i[3456]86)
With amd64 you can only rely on SSE and SSE2 being available. I assume it's
the same with x32? You'll still need some sort of cpuid checks for more
recent features and you can do that on normal 32-bit as well.
(Or you can just tell the compiler to assume they are available.)
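For the runtime-check route, C code would typically execute cpuid or use a compiler builtin such as GCC's `__builtin_cpu_supports`. Python cannot issue cpuid directly, so this Linux-only sketch reads the kernel's view of the same flags from /proc/cpuinfo instead. The helper names are hypothetical:

```python
from pathlib import Path

# Features every amd64 CPU provides, so x32 and x86-64 code may assume them.
AMD64_BASELINE = {"sse", "sse2"}


def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(cpuinfo_text, wanted):
    """Features in `wanted` that the CPU does not report."""
    return set(wanted) - cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux-specific
    if path.exists():
        text = path.read_text()
        print("missing amd64 baseline:", missing_features(text, AMD64_BASELINE))
        print("has avx2:", not missing_features(text, {"avx2"}))
```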
> 64 bit commands.
What do you mean?
> 64 bit time_t
Yes.
Posted Jun 26, 2012 19:05 UTC (Tue) by dlang (✭ supporter ✭, #313)
>> 64 bit commands.
> What do you mean?
My understanding is that with x32 the compiler can use CPU instructions that operate on 64 bit items.
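x32 code runs in 64-bit mode with the full 64-bit general-purpose registers, so 64-bit integer arithmetic that an i386 compiler must split into pairs of 32-bit operations (for addition, add followed by adc) can be a single instruction. A Python model of that split, illustrating the idea rather than any actual generated code:

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def add64_with_32bit_ops(a, b):
    """Add two unsigned 64-bit values using only 32-bit additions plus a carry,
    mirroring the add/adc pair needed on a 32-bit x86 target."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    lo_sum = a_lo + b_lo                      # first op: add the low halves
    carry = lo_sum >> 32                      # carry out of the low half
    hi_sum = (a_hi + b_hi + carry) & MASK32   # second op: add-with-carry
    return (hi_sum << 32) | (lo_sum & MASK32)


def add64_native(a, b):
    """What one 64-bit add instruction computes (available on x32 and x86-64)."""
    return (a + b) & MASK64
```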
Posted Jun 27, 2012 14:29 UTC (Wed) by hummassa (subscriber, #307)
>> twice the number of registers.
>Yes.
I couldn't parse your "yes"es...
Did you mean "yes, I know that using x32 instead of ia32 I have access to twice the number of registers"?
Posted Jun 27, 2012 19:42 UTC (Wed) by Otus (guest, #67685)
"Yes, that's true." Basically, AFAICT, the *only* advantages of x32 over
32-bit user-space on 64-bit kernel are 1) twice the registers, and 2) 64-bit
time_t (and some other syscall things).
Listing the advantages of x32 over 32-bit/32-bit or 64-bit/64-bit
user-space/kernel is misleading, because 32-bit/64-bit is the one it needs
to really improve upon to be worth the effort.
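This comparison can be laid out as a small table. The Python below records typical Linux values for the four userspace/kernel combinations as of this 2012 discussion. The numbers are approximate: the i386-on-32-bit-kernel figure assumes the default 3G/1G split, the x86-64 figure assumes 4-level paging, and the floating-point calling convention raised later in the thread is deliberately left out.

```python
from dataclasses import dataclass

GiB = 2 ** 30
TiB = 2 ** 40


@dataclass(frozen=True)
class UserspaceConfig:
    name: str
    pointer_bytes: int
    max_user_address_space: int  # approximate upper bound, in bytes
    general_purpose_registers: int
    time_t_bits: int


# Typical Linux values around 2012 (approximate).
CONFIGS = [
    UserspaceConfig("i386 on 32-bit kernel", 4, 3 * GiB, 8, 32),  # default 3G/1G split
    UserspaceConfig("i386 on 64-bit kernel", 4, 4 * GiB, 8, 32),
    UserspaceConfig("x32 on 64-bit kernel", 4, 4 * GiB, 16, 64),
    UserspaceConfig("x86-64 on 64-bit kernel", 8, 128 * TiB, 16, 64),
]


def gains_over(base, other):
    """List the fields where `other` improves on `base` (pointer size ignored)."""
    gains = []
    if other.max_user_address_space > base.max_user_address_space:
        gains.append("address space")
    if other.general_purpose_registers > base.general_purpose_registers:
        gains.append("registers")
    if other.time_t_bits > base.time_t_bits:
        gains.append("time_t")
    return gains
```

Comparing the two 4-byte-pointer configurations on a 64-bit kernel yields only registers and time_t, which is the point made above.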
Posted Jun 27, 2012 21:28 UTC (Wed) by nix (subscriber, #2304)
x32 also has a better floating-point ABI (using SSE rather than the x87).
Posted Jun 27, 2012 21:51 UTC (Wed) by mansr (guest, #85328)
GCC has -mfpmath=sse for that.
Posted Jun 27, 2012 21:58 UTC (Wed) by dlang (✭ supporter ✭, #313)
the problem is that on ia32 you can't assume that the CPU has SSE; on x32 you can.
Posted Jun 27, 2012 22:20 UTC (Wed) by mansr (guest, #85328)
If you are considering x32, your hardware obviously has SSE.
Posted Jun 27, 2012 22:35 UTC (Wed) by dlang (✭ supporter ✭, #313)
correct, but unless you are using a source-based distro where you customize your compiler flags, the distro is going to build its ia32 binaries for the lowest common denominator, so it will not be compiling with flags that only work on some CPUs.
Posted Jun 27, 2012 23:02 UTC (Wed) by mansr (guest, #85328)
If they are not willing to build a version with compiler flags that only work on CPUs from the last 10 years, what makes you think they'll be willing to spend orders of magnitude more effort to rebuild for an entire new ABI?
Posted Jun 27, 2012 23:50 UTC (Wed) by dlang (✭ supporter ✭, #313)
because building multiple ia32 versions is still effectively creating an entire new ABI: binaries compiled for it won't work on the 'regular' distro.
In the past, some distros had kernels for i386, i486, i586, and i686. It was a major headache and not very effective. In many ways doing an entire new ABI is easier to deal with than dealing with a slight variation on an existing one.
Posted Jun 27, 2012 23:59 UTC (Wed) by nix (subscriber, #2304)
Yep. This is why -mfpmath=sse doesn't actually change the ABI. :)
(But with regard to the flag which *does*, everything you say is true. glibc's hwcaps mechanism will allow you to implement 'slight variation on instruction set', allowing some but not all libraries to have alternate versions for various hwcaps, plus 'tls' as a now-obsolete special case. On x86, this tends to get used to compile different x86-32 binaries for machines supporting versus not supporting the CMOV instruction; on e.g. SPARC64, it is (or was) used to provide alternate versions of libraries for the SPARCv9 32-bit instruction set, which is much like x32 except ABI-compatible with the usual SPARC 32-bit ABI -- all the SPARCv9 registers plus integer multiply and divide instructions, imagine that! You can't use the hwcaps mechanism to support different ABIs though, because nothing stops a hwcapped library calling a non-hwcapped one, or vice versa.)
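A simplified Python model of the hwcaps idea: the loader prefers a library copy built for a capability the CPU has, and falls back to the baseline copy otherwise, independently for each library. The directory names and the function are hypothetical; this is not glibc's actual search algorithm.

```python
def pick_library_dir(library, cpu_hwcaps, preference, available, base_dir="lib"):
    """Choose which directory to load `library` from.

    `preference` lists capability names in priority order, `cpu_hwcaps` is
    the set the CPU supports, and `available` maps a directory to the set of
    libraries it contains. Each library is resolved on its own, so optimized
    and baseline copies can end up loaded side by side.
    """
    for cap in preference:
        if cap not in cpu_hwcaps:
            continue  # never pick a build the CPU cannot run
        subdir = f"{base_dir}/{cap}"
        if library in available.get(subdir, set()):
            return subdir
    return base_dir
```

Because a CMOV-optimized libc.so.6 can end up loaded next to a baseline libm.so.6, every copy must still use the same calling conventions, which is why this mechanism cannot switch ABIs.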
Posted Jun 28, 2012 5:33 UTC (Thu) by Otus (guest, #67685)
You could build a separate 32-bit userspace for 64-bit kernels that
assumed everything amd64 requires in compiler flags, including SSE1 and 2.
I don't see why this would be more work than supporting an x32 userspace.
Posted Jun 27, 2012 23:35 UTC (Wed) by nix (subscriber, #2304)
Not so. -mfpmath=sse only causes SSE to be used for math within a single function (including temporaries), and calls between functions with static linkage. All calls between functions with external linkage must still conform to the ABI, which means they must use the x87 registers or be spilled to memory. Thus, -mfpmath=sse can actually slow down code due to needless moves from SSE to x87 and back.
The option you're thinking of is -msseregparm, which elicits warnings whenever you use it because it breaks the ABI, meaning that you must link every single thing that you pass floating-point arguments to or receive floating-point return values from with the same option.
This includes libm, which you'll probably need to hack to expect its arguments in SSE registers, since a lot of its 32-bit code expects to receive them in x87 -- and sacrifice compatibility with everyone else's 32-bit x86 code, since nobody else uses that option. If you're doing that these days, you may as well use x32. :)
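To summarize which of the GCC options discussed here keep binary compatibility with ordinary i386 code, here is a small Python lookup. The option names are real GCC x86 flags; the variant labels and the helper are made up for illustration, and only -msseregparm is treated as ABI-breaking, following the explanation above.

```python
# Hypothetical build variants built from the GCC flags discussed in this thread.
VARIANTS = {
    "ia32 baseline": ["-m32"],
    "ia32 assuming amd64 baseline": ["-m32", "-msse2", "-mfpmath=sse"],
    "ia32 with SSE argument passing": ["-m32", "-msse2", "-msseregparm"],
    "x32": ["-mx32"],
}

# -msse2 and -mfpmath=sse only change code inside functions; calls between
# externally visible functions still follow the i386 ABI. -msseregparm moves
# floating-point arguments and return values into SSE registers.
ABI_BREAKING_I386_FLAGS = {"-msseregparm"}


def interoperates_with_ia32_baseline(flags):
    """True if code built with `flags` can be linked with ordinary i386 code."""
    if "-m32" not in flags:
        return False  # a different ABI entirely, e.g. -mx32
    return not any(flag in ABI_BREAKING_I386_FLAGS for flag in flags)
```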
Posted Jun 28, 2012 0:42 UTC (Thu) by mansr (guest, #85328)
Thanks for the clarification on the flags.
While you are right that -mfpmath=sse still uses the x87 parameter passing, it is my experience that (well-written) software making heavy use of floating-point spends most of its time inside functions rather than in calls between them. Moreover, such software mostly passes around pointers to large arrays of data, not individual floating-point values.
Concerning libm, many compilers (gcc included) inline many of its functions, often using only one or a few instructions. For example, on x86 a call to sqrt() is turned into a single sqrtsd instruction.
Posted Jun 28, 2012 14:23 UTC (Thu) by nix (subscriber, #2304)
You are, of course, correct that -mfpmath=sse provides most of the performance benefits of the -msseregparm-equivalent used by x32 -- however, it doesn't provide all of them, and in extreme circumstances can actually be slower than x87 (though it generally requires contrived benchmarks to do that).
Regarding inlined math operations, yes, quite a few can be inlined. A lot of the more complex stuff is just too large to usefully inline, though :( but I suppose the really common things generally are inlined (sqrt() being rather more commonly used than, e.g., y1f()).