I do not think FatELF would solve the optimization problem. Take the more "modern" architectures (disregarding the need for "i686 without CMOV"): we have things like Atom(ipia) vs. i686, which behave vastly differently with respect to performance. Right now we don't even know which one will be dominant, but a fair guess is that Atom-like architectures will become more common, and I don't think the dynamic linker is a good place for this kind of logic.
Then again, I also believe in link time optimization for gcc and the tooth fairy.