
Pettenò: Debunking x32 myths


Posted Jun 27, 2012 23:35 UTC (Wed) by nix (subscriber, #2304)
In reply to: Pettenò: Debunking x32 myths by mansr
Parent article: Pettenò: Debunking x32 myths

Not so. -mfpmath=sse only causes SSE to be used for math within a single function (including temporaries) and for calls between functions with static linkage. All calls between functions with external linkage must still conform to the ABI, which means floating-point values crossing the call must go through the x87 registers or be spilled to memory. Thus, -mfpmath=sse can actually slow down code due to needless moves from SSE to x87 and back.
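To make the linkage distinction concrete, here is a minimal sketch. The function names are made up for illustration, and the exact code generated depends on the GCC version and optimisation level:

```c
/* Sketch only: scale_local and scale_external are hypothetical names. */

/* Static linkage: the compiler can see every caller, so when optimising
 * it is free to choose its own convention for this function (SSE
 * registers, for instance). noinline just keeps the call visible in the
 * generated assembly. */
static __attribute__((noinline)) double scale_local(double x, double k)
{
    return x * k;
}

/* External linkage: unknown callers may exist, so the 32-bit x86 ABI
 * applies. The arguments arrive on the stack and the result has to be
 * handed back in x87 st(0), even if the arithmetic was done in SSE. */
double scale_external(double x, double k)
{
    return scale_local(x, k) + 1.0;
}
```

Building this with something like `gcc -m32 -O2 -msse2 -mfpmath=sse -S` and reading the assembly should show scale_external taking its arguments from the stack and moving its SSE result over to st(0) before returning, while the call to scale_local is not bound by that convention.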

The option you're thinking of is -msseregparm, which elicits warnings whenever you use it because it breaks the ABI, meaning that you must link every single thing that you pass floating-point arguments to or receive floating-point return values from with the same option.

This includes libm, which you'll probably need to hack to expect its arguments in SSE registers, since a lot of its 32-bit code expects to receive them the x87 way. You would also sacrifice compatibility with everyone else's 32-bit x86 code, since nobody else uses that option. If you're doing that these days, you may as well use x32. :)
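For what it's worth, GCC also has a per-function form of this, the sseregparm attribute, which makes the ABI break visible in the declaration instead of hiding it in the build flags. A minimal sketch, assuming GCC on 32-bit x86 with SSE2 enabled (the SSE_ARGS macro and the lerp function are hypothetical names, and the guard simply drops the attribute everywhere else):

```c
/* Sketch only: SSE_ARGS and lerp are hypothetical names.
 * The attribute is applied only on 32-bit x86 with SSE2 enabled;
 * elsewhere the macro expands to nothing and the default ABI is used. */
#if defined(__i386__) && defined(__SSE2__)
#define SSE_ARGS __attribute__((sseregparm))
#else
#define SSE_ARGS
#endif

/* Every caller must see this same declaration. A caller compiled against
 * a declaration without the attribute would put the arguments on the
 * stack while the callee looks for them in SSE registers. */
SSE_ARGS double lerp(double a, double b, double t);

SSE_ARGS double lerp(double a, double b, double t)
{
    return a + (b - a) * t;
}
```

This only helps for functions you control. Anything that goes through an existing prototype, libm included, still uses the standard convention, which is the compatibility problem described above.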



Pettenò: Debunking x32 myths

Posted Jun 28, 2012 0:42 UTC (Thu) by mansr (guest, #85328)

Thanks for the clarification on the flags.

While you are right that -mfpmath=sse still uses the x87 parameter passing, it is my experience that (well-written) software making heavy use of floating-point spends most of its time inside functions rather than in calls between them. Moreover, such software mostly passes around pointers to large arrays of data, not individual floating-point values.

Concerning libm, many compilers (gcc included) inline many of its functions, often using only one or a few instructions. For example, on x86 a call to sqrt() is turned into a single sqrtsd instruction.

Pettenò: Debunking x32 myths

Posted Jun 28, 2012 14:23 UTC (Thu) by nix (subscriber, #2304)

You are, of course, correct that -mfpmath=sse provides most of the performance benefits of the -msseregparm-equivalent used by x32 -- however, it doesn't provide all of them, and in extreme circumstances can actually be slower than x87 (though it generally requires contrived benchmarks to do that).

Regarding inlined math operations, yes, quite a few can be inlined. A lot of the more complex stuff is just too large to usefully inline, though :( but I suppose the really common things generally are inlined (sqrt() being rather more commonly used than, e.g., y1f()).
