You are, of course, correct that -mfpmath=sse provides most of the performance benefits of the -msseregparm-equivalent used by x32. It doesn't provide all of them, though: without -msseregparm, floating-point return values still go through the x87 stack under the i386 calling convention, so in extreme circumstances -mfpmath=sse on its own can actually be slower than plain x87 (though it generally takes a contrived benchmark to show that).
Regarding inlined math operations: yes, quite a few can be inlined. A lot of the more complex stuff is just too large to usefully inline, though :( But I suppose the really common things generally are inlined (sqrt() being rather more commonly used than, e.g., y1f()).