Maybe your pulseaudio is resampling, or maybe it spends 90% of its time in gettimeofday(). It would be a good idea to capture an oprofile trace to find out where the time actually goes. Also, be sure to pin the CPU to a known, fixed frequency; otherwise percentage CPU usage figures are a very poor metric for comparing performance, because with frequency scaling the same amount of work shows up as a larger percentage when the clock runs slower.
I know that mobile systems may require integer arithmetic, although I am hoping that floating-point hardware will be added to every CPU in time. Software emulation of floating point exists, but in practice it is too wasteful to be acceptable. In a quick test on an HTC Hero, software-emulated float took about 5x as long as similar integer code.
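For illustration only, here is a minimal sketch of that kind of comparison. It is not the test code behind the 5x figure above. The function names (scale_float, scale_q15, time_float, time_q15) are made up for this example, and it assumes 16-bit samples and a gain between 0 and 2.0 so that the Q15 product fits in 32 bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Scale 16-bit samples by a float gain. On a CPU without an FPU every
 * multiply and conversion here goes through software emulation. */
void scale_float(int16_t *buf, size_t n, float gain)
{
    for (size_t i = 0; i < n; i++) {
        float v = (float)buf[i] * gain;
        if (v > 32767.0f)  v = 32767.0f;   /* saturate instead of wrapping */
        if (v < -32768.0f) v = -32768.0f;
        buf[i] = (int16_t)v;               /* truncates toward zero */
    }
}

/* Same operation in integer arithmetic with a Q15 gain
 * (gain_q15 = gain * 32768). Assumes 0 <= gain_q15 <= 65536. */
void scale_q15(int16_t *buf, size_t n, int32_t gain_q15)
{
    for (size_t i = 0; i < n; i++) {
        int32_t v = ((int32_t)buf[i] * gain_q15) / 32768; /* truncates toward zero */
        if (v > 32767)  v = 32767;
        if (v < -32768) v = -32768;
        buf[i] = (int16_t)v;
    }
}

/* CPU seconds for `reps` passes of the float variant. Unity gain leaves
 * the buffer unchanged, so every pass does the same work. */
double time_float(int16_t *buf, size_t n, int reps)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        scale_float(buf, n, 1.0f);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

/* CPU seconds for `reps` passes of the Q15 variant at unity gain. */
double time_q15(int16_t *buf, size_t n, int reps)
{
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        scale_q15(buf, n, 32768);
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}
```

Calling time_float() and time_q15() on the same buffer and dividing the two results gives a rough ratio. The number will depend heavily on the compiler, the optimization flags and the float ABI in use, so treat it as an indication, not a measurement.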
My practical experience suggests that the sort of things gstreamer needs to do (dithering, scaling, mixing, copying) take insignificant time compared to any other work that is going on at the same time, and that other work includes decoding any codec.