I think that you may be confused about your chips.
The Itanium was nearly the complete OPPOSITE of a P4 design. In the Itanium (EPIC) design, the compiler was responsible for figuring out what memory to preload, which branches to predict and which instructions to run in parallel, and it encoded those decisions in the instruction stream. The Itanium core itself was a very RISCy, in-order design in its way, without much special logic for working any of that out at run time.
In the P4 and other modern IA-32 designs, the CPU has big piles of logic dedicated to branch prediction, instruction decoding, speculative execution and parallel instruction dispatch, plus in-order retirement at the end to make it all appear sequential.
Itanium dropped quite a lot of that, which I think was a very good decision.
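To make the "compiler does the scheduling" idea a bit more concrete, here is a toy sketch in Python of what a static scheduler does: it walks the instructions in order and packs them into groups that are safe to issue together, starting a new group whenever an instruction reads or rewrites a register written earlier in the same group. Everything in it is my own illustration, not the real IA-64 encoding: the `Instr` type, the `group_instructions` function, the register names and the `width=3` limit are all made up for the example.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction
    text: str     # human-readable form, for printing only


def group_instructions(instrs, width=3):
    """Greedy, in-order grouping (hypothetical, for illustration).

    An instruction joins the current group unless it reads (RAW) or
    rewrites (WAW) a register already written in that group, or the
    group is already `width` instructions wide.
    """
    groups = [[]]
    written = set()  # registers written so far in the current group
    for ins in instrs:
        conflict = ins.dest in written or any(s in written for s in ins.srcs)
        if conflict or len(groups[-1]) == width:
            groups.append([])
            written = set()
        groups[-1].append(ins)
        written.add(ins.dest)
    return groups


program = [
    Instr("r1", ("r10",), "load r1 = [r10]"),
    Instr("r2", ("r11",), "load r2 = [r11]"),
    Instr("r3", ("r1", "r2"), "add  r3 = r1, r2"),
    Instr("r4", ("r12",), "load r4 = [r12]"),
    Instr("r5", ("r3", "r4"), "mul  r5 = r3, r4"),
]

for n, group in enumerate(group_instructions(program)):
    print(f"group {n}: " + " || ".join(ins.text for ins in group))
```

A real compiler would also reorder instructions (hoisting that third load into the first group, for instance), which this sketch deliberately does not do. The point is just that the grouping is decided once, ahead of time, whereas an out-of-order IA-32 core rediscovers the same dependencies in hardware every time the code runs.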