Another perspective that several people have advocated (e.g. Burton Smith) is that, in the future, if we can have enough fine-grained parallelism, then latency can be hidden: you execute one instruction from thread 1, then one instruction from thread 2, and so on; by the time you get back to thread 1, many cycles have passed and any memory operation will have completed. This was the underlying idea behind things like Intel's Hyper-Threading (which does this on a small scale) and the Tera MTA supercomputer (on a large scale).
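To make the interleaving concrete, here is a minimal C sketch that models (rather than implements) this kind of round-robin latency hiding; the thread count, latency figure, and all names are made up for illustration, not taken from any real hardware or API:

	/* Model: each "thread" issues a load that takes MEM_LATENCY
	 * cycles; with at least MEM_LATENCY hardware thread contexts,
	 * the result is always ready by the time a thread's turn
	 * comes around again, so the pipeline never stalls. */
	#include <stdio.h>

	#define NTHREADS     8   /* hardware thread contexts      */
	#define MEM_LATENCY  8   /* cycles until a load completes */
	#define CYCLES       64  /* cycles to simulate            */

	int main(void)
	{
	    int ready_at[NTHREADS] = {0};  /* cycle at which each
	                                      thread's load completes */
	    int stalls = 0;

	    for (int cycle = 0; cycle < CYCLES; cycle++) {
	        int t = cycle % NTHREADS;  /* round-robin selection */
	        if (cycle < ready_at[t]) {
	            stalls++;              /* load still in flight  */
	            continue;
	        }
	        /* issue one instruction: here, another load */
	        ready_at[t] = cycle + MEM_LATENCY;
	    }
	    printf("stall cycles: %d of %d\n", stalls, CYCLES);
	    return 0;
	}

With NTHREADS >= MEM_LATENCY this reports zero stalls; drop NTHREADS to 4 and half the cycles are wasted waiting on memory, which is exactly the trade-off the MTA approach was built around.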
One problem, of course, is that extracting that much parallelism from most programs is still rather hard. There are very few programs out there that are ready to use 100 or 1000 processors.
(Parallelism is actually a huge problem on the computer-science horizon. Processors aren't getting much faster any more, and the only way to get increased performance is by exploiting parallelism, but writing parallel programs is still the province of an elite few.)