Supporting 64-bit ARM systems
Supporting 64-bit ARM systems
Posted Jul 13, 2012 8:55 UTC (Fri) by etienne (guest, #25256)In reply to: Supporting 64-bit ARM systems by zlynx
Parent article: Supporting 64-bit ARM systems
Seems that CPU silicon can adapt to memory access taking different times (in layer 1 cache / in layer 2 cache...) (so re-order instructions) better than compiler, mostly when the same code is executed at different times.
Posted Jul 14, 2012 22:58 UTC (Sat)
by jzbiciak (guest, #5246)
[Link] (2 responses)
For example, the compiler doesn't have sight of the other software on the system. Suppose process A is polluting the caches. When running unrelated process B, the CPU still could improve its performance through out-of-order scheduling and other tricks that hide the miss penalties. The compiler had no way of predicting those when it compiled process B's executable.
Statically scheduled architectures such as IA64 won't reorder the instruction scheme, although they will (especially once they got to McKinley) aggressively try to reorder memory requests and allow multiple misses in flight. As a result, to address concerns such as the pollution issue above, the compiler needs to try to schedule loads as early as possible. This is why IA64 has "check loads." they all you issue a speculative load even earlier -- perhaps before you even know the pointer is valid -- and then issue a "check load" to copy the loaded value into an architectural register at the load's original execution point.
The "check load" is where all exceptions get anchored in the case of a fault. If the speculative load got invalidated for some reason (it doesn't have to be a fault -- the hardware can drop a speculative load for any reason at all), the check-load will re-issue it. It allows the compiler to mimic the hardware's speculation to a certain extent.
It's not problem free. Speculative load / check load pairs tie up more issue slots than straight loads that hardware might speculate. If the check load stalls, it stalls all the code that follows it (statically scheduled, remember?), whereas with hardware speculation, only the dependent instructions stall while others can proceed.
The original promise of the IA-64 architecture was to be able to ramp up the processor clock given the simplified instruction pipeline thanks to static scheduling, to overcome any losses associated with lack of hardware speculation. Furthermore, it was supposed to be more energy efficient since it wasn't spending hardware tracking dependencies in large instruction windows. In the end, it didn't seem to live up to any of that.
I don't think IA64 failure is an indictment of VLIW principles, though. I work with a VLIW processor regularly that doesn't seem to have any of the same problems. When measured against the promises, EPIC (the official name for IA64's brand of VLIW), was an EPIC failure, IMHO.
Posted Jul 27, 2012 21:06 UTC (Fri)
by marcH (subscriber, #57642)
[Link] (1 responses)
Not just hardware but self-modifying software too. This how "Hotspot" got its name. This looks like a more general compile-time versus run-time question rather than a hardware versus software one.
At this stage this thread could probably use an informed post about Transmeta?
Posted Jul 27, 2012 21:48 UTC (Fri)
by jzbiciak (guest, #5246)
[Link]
JITs can use dynamic profile information (both block and path frequency) and specific details of the run environment to tailor the code, but they can't respond at the granularity of, say, re-ordering instructions due to cache misses, for example. If you have a purely statically scheduled instruction set like EPIC, no amount of JIT will help reorder loads if the miss patterns are data dependent and dynamically changing. (Speculate/Check loads can help, but only so much.)
Speaking of the ultimate JIT platform Transmeta: Even Transmeta has some hardware patents for hardware memory access speculation mechanisms. I recall one which was a speculative store buffer: The software would queue up stores for a predicted path, and then another instruction would either commit the queue or discard it. Or something like that... Ah, I think this might be the one: http://www.freepatentsonline.com/7225299.html
Supporting 64-bit ARM systems
Supporting 64-bit ARM systems
Supporting 64-bit ARM systems