It works great, especially after a profiling run of the code.
Is there ANYTHING that CPU silicon can do with an instruction stream that software cannot do? No, not really.
It's not rare at all. Most software improves quite a bit on any CPU when rebuilt with profile-guided (feedback-directed) optimization. After a profiling run, the compiler has real data about branch outcomes and memory access patterns to base its predictions on. CPU silicon without hints can only look a few steps ahead and guess.
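To make the profiling point concrete, here's a rough sketch (hypothetical names: `profile_hint`, `static_accuracy`, `two_bit_accuracy`) of the difference between a static branch hint picked from a training-run profile and a simple 2-bit saturating counter of the kind a basic hardware predictor keeps per branch. Real compilers (e.g. GCC/Clang with `-fprofile-generate` / `-fprofile-use`) and real CPU predictors are far more elaborate. This only shows the mechanism, assuming the "profile" is just a recorded list of one branch's outcomes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: a "profile" is a recorded trace of one branch's
   outcomes from a training run (true = taken). */

/* Profile-guided static hint: pick the direction the branch took most
   often during the training run. Ties resolve to "taken". */
bool profile_hint(const bool *training, size_t n)
{
    size_t taken = 0;
    for (size_t i = 0; i < n; i++)
        taken += training[i];
    return 2 * taken >= n;
}

/* Fraction of outcomes that a fixed static hint predicts correctly. */
double static_accuracy(bool hint, const bool *trace, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        hits += (trace[i] == hint);
    return n ? (double)hits / n : 0.0;
}

/* Simplified hardware-style predictor: a 2-bit saturating counter.
   States 0-1 predict not taken, 2-3 predict taken. */
double two_bit_accuracy(const bool *trace, size_t n)
{
    unsigned counter = 1;  /* start "weakly not taken" */
    size_t hits = 0;
    for (size_t i = 0; i < n; i++) {
        bool predict = counter >= 2;
        hits += (predict == trace[i]);
        /* Move toward the actual outcome, saturating at 0 and 3. */
        if (trace[i] && counter < 3)
            counter++;
        else if (!trace[i] && counter > 0)
            counter--;
    }
    return n ? (double)hits / n : 0.0;
}
```

For a loop-exit branch that is taken 9 times and then falls through once, the profile hint comes out "taken" and is right 90% of the time, while the counter gets 80% over that single 10-iteration pass because it pays a warm-up miss before it learns the direction.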
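And I'm not sure what you mean by tasks where SMP is unusable. Most Itanium systems are SMP. IA-64 SMP works even better than Xeon's because Intel fixed some of x86's short-sighted memory-ordering rules: IA-64 uses a weaker ordering model in which code explicitly marks the accesses that need ordering (acquire loads, release stores, and memory fences).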
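For anyone unfamiliar with what "explicitly marked ordering" looks like, here's a minimal C11 sketch of the classic publish/consume hand-off using release/acquire atomics. The names (`payload`, `ready`, `run_handoff`) are made up for illustration. As far as I know, the standard compiler mappings turn the release store and acquire load into `st.rel` / `ld.acq` on IA-64, whereas on x86 they compile to plain `mov`s because x86 already orders ordinary loads and stores that strictly.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical hand-off: plain data published through an atomic flag. */
static int payload;          /* ordinary, non-atomic data */
static atomic_int ready;     /* zero-initialized publication flag */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;                                            /* write data */
    atomic_store_explicit(&ready, 1, memory_order_release);  /* publish it */
    return NULL;
}

static void *consumer(void *arg)
{
    int *out = arg;
    /* Spin until the flag is set. The acquire load pairs with the release
       store, so the earlier write to payload is guaranteed to be visible. */
    while (!atomic_load_explicit(&ready, memory_order_acquire))
        ;
    *out = payload;
    return NULL;
}

/* Runs one hand-off between two threads; returns what the consumer saw. */
int run_handoff(void)
{
    pthread_t p, c;
    int seen = 0;

    payload = 0;
    atomic_store(&ready, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Only the two flag accesses carry ordering; everything else is free to be reordered, which is the whole point of the weaker model.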