EPIC failure (to cancel the project when it was first failing)

Posted Nov 10, 2023 14:01 UTC (Fri) by farnz (subscriber, #17727)
In reply to: EPIC failure by CChittleborough
Parent article: The push to save Itanium

It's worth noting that when the EPIC project began in 1994, it was not clear that OoOE would win out; the Pentium Pro project hadn't yet delivered a chip, and was promising a reorder window somewhere around the 40 instruction mark. There were hand-crafted sequences that showed that, compared to compiler output targeting the Pentium and earlier processors, EPIC could exploit more ILP than the Pentium Pro's reorder window could find in the compiler output; this led people to assume that compiler enhancements would allow EPIC to exploit all of that ILP, without significantly changing the amount of ILP a PPro derivative could find as compared to a 1994 x86 compiler.
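To make the ILP question concrete, here is a minimal C sketch (illustrative only, not from the original discussion): the first reduction is one long dependence chain, so neither a wide EPIC machine nor a 40-entry reorder window can overlap its additions; the second keeps four independent partial sums, which is exactly the kind of ILP that either a scheduling compiler (EPIC's bet) or a bigger out-of-order window can exploit.

    #include <stddef.h>

    /* One long dependence chain: each add needs the previous sum, so
       neither issue width nor reordering can overlap the additions. */
    double sum_serial(const double *a, size_t n) {
        double s = 0.0;
        for (size_t i = 0; i < n; i++)
            s += a[i];
        return s;
    }

    /* Four independent chains (assumes n is a multiple of 4, and that
       reassociating the sum is acceptable): the kind of compiler
       transformation EPIC relied on -- which, as it turned out, helps
       an out-of-order core just as much. */
    double sum_unrolled(const double *a, size_t n) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (size_t i = 0; i < n; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        return (s0 + s1) + (s2 + s3);
    }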

Additionally, there was still a strong possibility that growing the reorder window would not scale nicely, while we understood how to scale caches; it was plausible in 1994 that by 1998 (the intended delivery date of the first Itanium processor), Intel could build chips with megabytes of full-speed L2 cache (as opposed to the 512 KiB of 50% speed L2 cache they delivered with 1998's Pentium IIs), but with a reorder window still stuck around the 50 instruction mark, and that by 2004 (Prescott timeframe), they'd maybe have a reorder window around 60 instructions.

Three of the assumptions behind EPIC were proven wrong over time:

  1. Compiler improvements to support finding ILP for EPIC also allowed the compiler to bring more ILP into a PPro sized reorder window.
  2. Cache per dollar didn't grow as fast as needed to compensate for low code density compared to RISC or x86 CPUs.
  3. AMD grew the reorder window faster than Intel had assumed was possible for x86.

Under the initial assumptions, EPIC made a lot of sense; Intel's failure was to not recognise that EPIC was built on predictions that had since been proven false, and to bring it all the way to market three years late (2001 instead of 1998), when they should have been able to work out in 1996 (the year after the PPro was released) that at least one of their assumptions was completely false: compiler improvements benefited the PPro as much as they benefited simulated EPIC designs, instead of only the EPIC design benefiting.



EPIC failure (to cancel the project when it was first failing)

Posted Nov 10, 2023 23:26 UTC (Fri) by CChittleborough (subscriber, #60775)

This is an informative and insightful comment. Thank you.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 11, 2023 14:27 UTC (Sat) by pizza (subscriber, #46)

I'd argue that Itanium was actually an overwhelming success.

It got multiple RISC server vendors to scrap their in-house designs and hitch themselves to Intel's offerings.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 11, 2023 15:21 UTC (Sat) by joib (subscriber, #8541)

Arguably industry consolidation was inevitable anyway due to exponentially increasing chip and process R&D costs, and the clock was ticking for the Unix vendors with their high-margin, low-volume businesses. That Itanium delivered the coupe de grace to several of them was inconsequential; the ultimate winners were x86(-64), Windows and Linux.

One could even argue that without Itanium Intel would have introduced something x86-64-like sooner. Of course a butterfly scenario is what if in this case Intel would have refused to license x86-64 to AMD?

EPIC failure (to cancel the project when it was first failing)

Posted Nov 11, 2023 15:49 UTC (Sat) by pizza (subscriber, #46)

> Arguably industry consolidation was inevitable anyway

You're still looking at this from a big picture/industry-wide perspective.

The fact that Itanium was a technical failure doesn't mean it wasn't a massive strategic success for Intel. By getting the industry to consolidate around an *Intel* solution, they captured the mindshare, revenue, and economies of scale that would have otherwise gone elsewhere.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 11, 2023 17:01 UTC (Sat) by joib (subscriber, #8541)

All of them, Itanium included, faded away into near irrelevance. So whether Intel created Itanium or not, the industry would ultimately have consolidated around x86-64/windows/Linux.

Unclear whether Intel profited more from Itanium compared to the alternative scenario where they would have introduced x86-64 earlier.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 12, 2023 9:02 UTC (Sun) by ianmcc (subscriber, #88379)

If Intel had introduced x86-64 rather than AMD, would Intel have screwed it up?

EPIC failure (to cancel the project when it was first failing)

Posted Nov 13, 2023 12:41 UTC (Mon) by farnz (subscriber, #17727)

They'd have gone in one of two directions:

  1. Panic-implement x86, but 64-bit. This is basically what AMD did for AMD64, because they needed a 64-bit CPU, but didn't have the money to do a "clean-sheet" redesign; Intel could have done similar.
  2. A "new" ISA, based on IA-64 but built around OoOE instead of explicit compiler scheduling.

It'd be interesting to see what could have been if 1995 Intel had redesigned IA-64 around OoOE instead of EPIC; they'd still want compiler assistance in this case, because the goal of the ISA changes from "the compiler schedules everything, and we have a software-visible ALAT and speculation" to "the compiler stops us from being trapped when we're out of non-speculative work to do".

EPIC failure (to cancel the project when it was first failing)

Posted Dec 1, 2023 12:20 UTC (Fri) by sammythesnake (guest, #17693)

> Panic-implement x86, but 64-bit. This is basically what AMD did for AMD64, because they needed a 64-bit CPU, but didn't have the money to do a "clean-sheet" redesign

Although I'm sure the cost of a from-scratch design would have been prohibitive in itself for AMD, I think the decision probably had at least as much to do with a very pragmatic desire for backward compatibility with the already near-universal x86 ISA.

History seems to suggest that (through wisdom or luck) that was the right call, even with technical debt going back to the 4004 ISA which is now over half a century old(!) (https://en.m.wikipedia.org/wiki/Intel_4004)

EPIC failure (to cancel the project when it was first failing)

Posted Dec 1, 2023 13:17 UTC (Fri) by farnz (subscriber, #17727)

There's a lot about AMD64 that is done purely to reuse existing x86 decoders, rather than to provide backwards compatibility with x86 at the assembly level. There is no backwards compatibility at the machine-code level between AMD64 and x86, and they could have re-encoded AMD64 in a new format while keeping the same instruction set they chose to implement.

That's what I mean by "not having the money"; if they wanted assembly-level backwards compatibility, but weren't short on cash to implement the new design, they could have changed instruction encodings so that (e.g.) we didn't retain special encodings for "move to/from AL" (which exist for ease of porting to the 8086 from the 8085). Instead AMD reused the existing x86 encoding, with some small tweaks.
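To make that concrete, the AL-only form really does save a byte over the general form. A tiny C illustration (opcode bytes taken from the Intel opcode map; 16-bit-mode encodings of "mov al, [0x1234]"):

    #include <stdio.h>

    int main(void) {
        /* The same operation, encoded two ways: */
        unsigned char al_form[]      = { 0xA0, 0x34, 0x12 };       /* MOV AL, moffs8: special short form */
        unsigned char general_form[] = { 0x8A, 0x06, 0x34, 0x12 }; /* MOV r8, r/m8: ModRM selects AL + disp16 */
        printf("AL-only form: %zu bytes; general form: %zu bytes\n",
               sizeof al_form, sizeof general_form);
        return 0;
    }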

EPIC failure (to cancel the project when it was first failing)

Posted Nov 11, 2023 22:48 UTC (Sat) by Wol (subscriber, #4433)

> Of course a butterfly scenario is what if in this case Intel would have refused to license x86-64 to AMD?

A butterfly scenario? Don't you mean an alternate reality?

In THIS reality, what would have happened if AMD had refused to licence x86-64 to Intel?

In reality, I think that couldn't happen - I don't know the details of the intricate licencing deals (which I believe go back to the Cyrix 686 - yes, that long ago), but I think there were licence-sharing deals in place that meant Intel could use x86-64 without having to negotiate.

Cheers,
Wol

EPIC failure (to cancel the project when it was first failing)

Posted Nov 12, 2023 7:30 UTC (Sun) by joib (subscriber, #8541)

> A butterfly scenario? Don't you mean an alternate reality?

It was a reference to the "butterfly effect",

https://en.m.wikipedia.org/wiki/Butterfly_effect

, meaning that seemingly minor details can result in major unforeseen consequences.

(which is one reason why "alternate history" is seldom a usable tool for serious historical research)

> In THIS reality, what would have happened if AMD had refused to licence x86-64 to Intel?

IIRC Intel was making various threats towards AMD wrt licensing various aspects of the x86 ISA. AMD was definitely in a kind of legal underdog situation. Inventing x86-64 put AMD in a much stronger position and forced Intel into a cross licensing arrangement, guaranteeing a long lasting patent peace. Which was good for customers.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 14, 2023 10:12 UTC (Tue) by anselm (subscriber, #2796)

> IIRC Intel was making various threats towards AMD wrt licensing various aspects of the x86 ISA. AMD was definitely in a kind of legal underdog situation. Inventing x86-64 put AMD in a much stronger position and forced Intel into a cross licensing arrangement, guaranteeing a long lasting patent peace. Which was good for customers.

There would have had to be some sort of arrangement in any case, because large customers (think, e.g., US government) tend to insist on having two different suppliers for important stuff.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 15, 2023 13:27 UTC (Wed) by Wol (subscriber, #4433)

Which is why I mentioned the 686. I don't remember the details, but there was some sort of deal between Cyrix and (I think) IBM which meant the 686 was legally licenced, and I thought AMD had inherited that. Either way, I'm sure AMD had some sort of grandfathered licence deal.

Cheers,
Wol

EPIC failure (to cancel the project when it was first failing)

Posted Nov 15, 2023 14:22 UTC (Wed) by james (subscriber, #1325)

It largely goes back to the early IBM PC days, when both IBM and AMD acquired second-source licenses so they could make chips up to (and including) the 286 using Intel's designs, including patent cross-licenses.

They weren't the only ones.

When Intel and HP got together to create Merced (the original Itanium), they put the intellectual property into a company they both owned, but which didn't have any cross-license agreements in place, which is why AMD wouldn't have been able to make Itanium-compatible processors except on Intel's (and HP's) terms.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 13, 2023 1:01 UTC (Mon) by marcH (subscriber, #57642)

> That Itanium delivered the coupe de grace...

Coup: blow, strike, punch, kick, etc. Silent "p".
Coupe: cut (haircut, card deck, cross-section, clothes,...). Not silent "p" due to the following vowel.

So, some "coups de grâce" may have indeed involved some sort of... cut. Just letting you know about that involuntary, R-rated image of yours :-)

According to wiktionary, the two words have only one letter difference but totally different origin.

For completeness:
Grâce: mercy (killing) or thanks (before a meal or in "thanks to you")
Dropping the ^ accent on the â doesn't... hurt much.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 17, 2023 11:10 UTC (Fri) by lproven (guest, #110432)

> One could even argue that without Itanium Intel would have introduced something x86-64-like sooner.

That's a good point and it's almost certainly true.

> Of course a butterfly scenario is what if in this case Intel would have refused to license x86-64 to AMD?

There is an interesting flipside to this.

There *were* two competing x86-64 implementations: when Intel saw how successful AMD's was becoming, it invented its own, _post-haste,_ and presented it secretly to various industry partners.

Microsoft told it no, reportedly with a comment to the effect of "we are already supporting *one* dead-end 64-bit architecture of yours, and we are absolutely *not* going to support two of them. Yours offers no additional improvements, and AMD64 is clearly winning, and so you must be compatible with the new standard."

(This was reported in various forum comments at the time and I can't give any citations, I'm afraid.)

For clarity, the one dead-end arch I refer to is of course Itanium.

Intel was extremely unhappy about this, indeed furious, but it had contractual obligations with HP and others concerning Itanium, so it could not refuse. Allegedly it approached a delighted AMD and licensed its implementation, issuing a very quiet public announcement with some bafflegab about existing mutual agreements -- AMD was, after all, already an x86 licensee, had been making x86 chips for some 20 years, and had extended this as recently as the '386 ISA. Which Intel was *also* very unhappy about, but some US governmental and military deals insisted on second sources for x86-32 chips, so it had to.

My personal and entirely unsubstantiated notion is that UEFI was Intel's revenge on the x86-64 market for being forced to climb down on this. We'd all have been much better off with OpenFirmware (as used in the OLPC XO-1) or even LinuxBios.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 17, 2023 16:24 UTC (Fri) by james (subscriber, #1325)

> This was reported in various forum comments at the time and I can't give any citations, I'm afraid.
I can.

Obviously, neither Microsoft nor Intel have publicly confirmed this, so a quote from Charlie is as good as you're going to get.

(And I can quite see Microsoft's point: the last thing they wanted was FUD between three different 64-bit instruction sets, with no guarantee as to which was going to win, one of them requiring users to purchase new versions of commercial software to get any performance, and the prospect that you'd then have to buy new versions again if you guessed wrong.

It would have been enough to drive anyone to Open Source.)

EPIC failure (to cancel the project when it was first failing)

Posted Nov 17, 2023 17:56 UTC (Fri) by lproven (guest, #110432)

Oh well done!

I used to write for the Inq myself back then, too. Never got to meet Charlie, though.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 16, 2023 11:13 UTC (Thu) by anton (subscriber, #25547)

The promise of EPIC was also that the hardware would be simpler and faster and would allow wider issue, because:
  • The hardware would not have to check for dependences between registers in the same instruction group, while an ordinary n-wide superscalar RISC (even if in-order) has to check whether any of the next n instructions accesses a register that an earlier of those instructions writes. The argument was that this requires quadratic effort and does not scale (see the sketch after this list).
  • The hardware would not have to deal with scheduling and could therefore be built to run at higher clock speeds. In reality, IA-64 implementations were always at a clock-speed disadvantage compared to out-of-order AMD64 implementations. And looking at other instances of in-order vs. OoO, OoO was usually competitive or even had the upper hand in clock speed. We saw this from the start with the Pentium at 133MHz and the Pentium Pro at 200MHz in 1995.
  • Compilers have a better understanding of which instructions are on the critical path and therefore need to be reordered, whereas hardware scheduling just executes ready instructions. In practice, compiler knowledge is limited by compilation-unit boundaries and by things like indirect calls, cache misses, and, most importantly, the much lower accuracy of static branch prediction compared to dynamic (hardware) branch prediction. Admittedly, hardware branch predictors had not advanced as far in 1994, so at the time one might still have thought that the other expected advantages would compensate for that.

    Some people think that the Achilles heel of EPIC is cache misses, but the fact that IA-64 implementations preferred smaller (i.e., less predictable) L1 caches with lower latency over bigger, more predictable L1 caches shows that the compilers have more problems dealing with the latency than with the unpredictability.
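To illustrate the first point, here is a minimal C sketch (simplified; real issue logic checks more hazard types and port constraints) of the pairwise check an n-wide in-order superscalar must perform at issue time:

    #include <stdbool.h>
    #include <stddef.h>

    /* Hypothetical decoded instruction: one destination, two sources. */
    struct uop { int dst, src1, src2; };

    /* Can these n instructions issue together?  Every later instruction
       must be compared against every earlier one (RAW and WAW hazards;
       WAR omitted for simplicity), so the comparator count grows as
       n*(n-1)/2 -- the quadratic effort referred to above. */
    bool group_issues_together(const struct uop *g, size_t n) {
        for (size_t j = 1; j < n; j++)
            for (size_t i = 0; i < j; i++)
                if (g[i].dst == g[j].src1 ||
                    g[i].dst == g[j].src2 ||
                    g[i].dst == g[j].dst)
                    return false;
        return true;
    }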

These promises were so seductive that not just Intel and HP, but also Transmeta's investors burned a lot of money on them, and even in recent years I have seen people advocating EPIC-like ideas.

One aspect that is often overlooked in these discussions is the advent of SIMD instructions in general-purpose computers in the mid-1990s. The one thing where IA-64 shone was dealing with large arrays of regular data, but SIMD also works well for that, at lower hardware cost. So mainstream computing squeezed EPIC from both sides: SIMD ate its lunch on throughput computing, while OoO outperformed it in latency computing.
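For instance, the regular-array case IA-64 was built for maps directly onto SSE2 (a minimal sketch using the standard intrinsics; assumes n is a multiple of 2 and 16-byte-aligned pointers):

    #include <emmintrin.h>  /* SSE2 intrinsics */
    #include <stddef.h>

    /* Element-wise array addition, two doubles per instruction. */
    void add_arrays(double *dst, const double *a, const double *b, size_t n) {
        for (size_t i = 0; i < n; i += 2) {
            __m128d va = _mm_load_pd(&a[i]);
            __m128d vb = _mm_load_pd(&b[i]);
            _mm_store_pd(&dst[i], _mm_add_pd(va, vb));
        }
    }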

As for coping with reality: the Pentium 4 was released in November 2000 with a 128-instruction reorder window and SSE2, and this project and its parameters had been known inside Intel several years in advance, while Merced (the first Itanium) was released in May 2001 (at 800 MHz, while the Pentium 4 was available at 1700 MHz by then).

But of course, by that time Intel had been making promises about IA-64 for several years, and a lot of other companies had invested in Intel's roadmaps, so I guess Intel could not just say "Sorry, we were wrong"; they had to continue on the death march. The relatively low level of hardware development activity after McKinley (2002) indicates that Intel had mostly given up by then (but then, how do we explain Poulson?).

EPIC failure (to cancel the project when it was first failing)

Posted Nov 16, 2023 15:56 UTC (Thu) by farnz (subscriber, #17727)

A couple of things:

  1. Intel's failure of imagination with compiler technology was a failure to observe that compiler scheduling goes hand-in-glove with hardware scheduling; the original comparison between what a 1994 compiler could generate for the Pentium Pro and hand-optimized code for a hypothetical EPIC machine should have been redone as the compiler for the EPIC machine improved. Had they done this, they'd have noticed that the compiler improvements needed for EPIC also benefited the Pentium Pro/II/III, and they would have been a lot less bullish on IA-64.
  2. The implementations of IA-64 needed a lot of cache, but not low-latency cache, in order to perform adequately; the ISA made it possible to not really care much about latency of instruction fetch (at least in theory), but did require a decent throughput. So, where IA-32 had 1 MiB of L3 cache, Merced had 4 MiB of L3 in the same timescale, and this increased requirement for total cache stayed throughout IA-64's lifetime.

I stuck to what Intel should have known in 1995 for two reasons: first, this was before the hype train for IA-64 got properly into motion (and, as you've noted, once the hype train got moving, Intel couldn't easily back out of IA-64); second, by my read of things like the oral history of Bob Colwell's time at Intel, insiders with reason to be heard (they had worked on Multiflow, for a start) had already started sounding the alarm about Itanium promises by 1995, so I don't think it impossible that an alternate history would have had Intel reassessing the chances of success at that point in time.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 16, 2023 16:56 UTC (Thu) by anton (subscriber, #25547)

  1. I don't think that compiler technology that benefited IA-64 would have benefited OoO IA-32 implementations much, for the following reasons: techniques like basic-block instruction scheduling, superblock scheduling, and modulo scheduling (software pipelining) were already known at the time of the Pentium, and the in-order Pentium would have benefited more from them than the Pentium Pro. However, these techniques tend to increase register pressure (IA-64 has 128 integer registers for a reason), and IA-32 has only 8 registers, so applying such techniques could easily have resulted in a slowdown (see the sketch after this list).

    IA-64 also has architectural features for speculation and for dealing with aliases that IA-32 does not have, and that an IA-32 compiler therefore cannot use. But given the lack of registers, that's moot.

  2. Every CPU profits from more cache for applications that need it (e.g., see the Ryzen 5800X3D), and given main-memory latencies of 100 cycles and more, the benefit of OoO execution for that is limited (at that time, but also now). Itanium (II) got large caches because a) that's what HP's customers were used to, and b) even outside HP the initial target market was high-end stuff. Also, given all the promises about superior performance, cache is an easy way to get it on applications that benefit from cache (and if you skimp on it, it's an easy way to make your architecture look bad in certain benchmarks). So Itanium II (not sure about Itanium) could score at least a few benchmark wins, and its advocates could say: "See, that's what we promised. And when compilers improve, we will also win the rest."
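To put a rough number on the register-pressure point (my own back-of-envelope illustration, not from the original discussion):

    #include <stddef.h>

    /* Eight independent accumulator chains: s0..s7 plus the pointer and
       the index want ten values live at once.  IA-32 has eight GPRs
       (ESP reserved, EBP often taken by the frame), so this forces
       spills; IA-64's 128 integer registers absorb it easily. */
    long sum_unrolled8(const long *a, size_t n) {
        long s0 = 0, s1 = 0, s2 = 0, s3 = 0, s4 = 0, s5 = 0, s6 = 0, s7 = 0;
        for (size_t i = 0; i + 7 < n; i += 8) {
            s0 += a[i];     s1 += a[i + 1];
            s2 += a[i + 2]; s3 += a[i + 3];
            s4 += a[i + 4]; s5 += a[i + 5];
            s6 += a[i + 6]; s7 += a[i + 7];
        }
        /* tail elements (n % 8) omitted for brevity */
        return s0 + s1 + s2 + s3 + s4 + s5 + s6 + s7;
    }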

EPIC failure (to cancel the project when it was first failing)

Posted Nov 16, 2023 18:40 UTC (Thu) by farnz (subscriber, #17727)

I disagree, in large part because the benefits of EPIC were being touted by comparison of hand-crafted EPIC examples versus compiler output; in other words, Intel could reasonably (by 1995) have had an EPIC compiler, and be showing that it was a long way short of the needed quality to meet hand-crafted examples. I'd also note that the techniques that would be needed to make EPIC compilers meet the hand-crafted examples go well beyond today's state of the art.

And underlying this is the degree to which EPIC was focused on impractical "if only software would do better" situations, not on the real world.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 21, 2023 20:25 UTC (Tue) by JohnDallman (guest, #168141)

I spent 1999-2003 porting a mathematical modeller to Itanium on Windows and HP-UX. I was one of the first to ship commercial software on Itanium, but it never made any money. The only positives from the work were educational. I had a lot of contact with Intel, and an excellent engineering contact, and got some insight into their thinking.

They never had a real plan for how to make compilers discover the parallelization opportunities that they wanted to exist in single-threaded code. The intention was to have a horde of developers discover lots of heuristics that would add up to that. Essentially a fantasy, but it meant the compiler team got to add more people and its managers got career progression. This had started early in the project, when the compiler people claimed "we can handle that" for many of the difficulties in hardware design.

EPIC failure (to cancel the project when it was first failing)

Posted Nov 17, 2023 14:57 UTC (Fri) by foom (subscriber, #14868)

When I first read about itanium, I thought that surely Intel must be targeting it to JIT-compiled languages, and just didn't care that much about ahead-of-time compiled languages. (Java is the future, right?)

Because, in the context of a JIT, many of the impossible static scheduling problems of the in order EPIC architecture seem to go away.

You can have the CPU _measure_ the latencies, stalls, dependencies, etc, and then have software just reorder the function so the next executions are more optimally arranged to take advantage of that observed behavior. You should be able to make more complex and flexible decisions in your software JIT than e.g. a hardware branch predictor can do, while still being able to react to changing dynamic execution conditions unlike AOT compiled code.
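(As a toy sketch of that feedback loop, with invented names and thresholds: instrumented code bumps per-branch counters, and once a branch is clearly biased the JIT re-emits the function with the hot path laid out first.)

    #include <stdbool.h>

    /* Toy profile record kept per branch by instrumented code. */
    struct branch_profile { unsigned long taken, not_taken; };

    /* Recompile once we have enough samples and the branch is >95%
       biased one way; the re-emitted code would schedule the common
       path as the fall-through. */
    bool worth_recompiling(const struct branch_profile *p) {
        unsigned long total = p->taken + p->not_taken;
        return total > 10000 &&
               (p->taken * 100 > total * 95 ||
                p->not_taken * 100 > total * 95);
    }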

But I guess Transmeta thought so too; their architecture was to have a software JIT compiler translating from x86 to a private internal EPIC ISA. And that didn't really work either...

EPIC failure (to cancel the project when it was first failing)

Posted Nov 17, 2023 15:44 UTC (Fri) by paulj (subscriber, #341)

If hardware can apply an optimisation, a software JIT should be able to as well. The question must then be:

- Can the transistors (and pipeline depth) saved in hardware then be used to gain performance?

Transmeta did well on power, but was not fast. Which suggests the primary benefit is an increase in performance/watt from the transistor savings - not outright performance. Either that, or Transmeta simply didn't have the resources (design time, expertise) to use the extra transistor budget / simpler pipeline to improve outright performance.

