Firstly, you're going on about deep pipelines, which cause processor stalls. That was, I believe, a major reason for abandoning the Pentium 4 architecture - it was so prone to massive stalls it wasn't true.
And secondly, while I can't remember / don't know an awful lot about the 50-series architecture, I don't understand why ring-switching should be slow. It was something to do with the memory segmentation, but the point was that the segmentation gave you fast AND SAFE switching.
The Intel architecture won. Intel architecture cannot do a fast ring-switch. Doesn't mean that other architectures can't, doesn't mean that Intel architecture is the best. It just happened to be the one that gained the market share needed for network effects to knock out the competition.
If Pr1me hadn't lost out in the market, and had continued development of their CPUs, I'm sure they could have taken advantage of all the same things as Intel, and we would expect fast ring-switching as a matter of course. IIRC, the difference in speed between a same-ring call and a ring-switch call (for the second invocation; the first was, I believe, somewhat slower) was pretty near nothing.
The Pentium 4 was the last gasp of the MegaHurtz wars - it wasn't a good architecture - it was a good marketecture which blew up badly in the real world.