The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 19, 2008 17:02 UTC (Thu) by bobort (guest, #5019)
Parent article: The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

The number of SBOPs per FLOP has not been constant over time, and it isn't really obvious how
that's going to change in the future.  Remember you have to account for all of the control
infrastructure, which dwarfs the arithmetical logic on a modern CPU - thousands of
multiplexers, comparators, buffers, lookup tables, signal repeaters, etc. just to get the data
from memory to the ALU and back - and the trip gets longer every year.  I suspect there are
more like 100k SBOPs per FLOP in a typical CPU, but that's a guess.

Will things get more or less "efficient" over time?  It's very hard to say; there are powerful
forces pulling in both directions.  I'd say the biggest undetermined aspect is whether we can
weasel around Amdahl's law, or how much.  How parallelizable will the software running on the
ultimate laptop be?  If everything ends up being highly parallelized, microarchitecture is
likely to evolve towards simplicity and SBOPs per FLOP will probably stay roughly the same or
even come down a bit.  If nobody comes up with a way around Amdahl (which is a rigorous
theorem, unlike Moore's "law"), single-thread performance will need to be continually
increased, and that will increase SBOPs per FLOP over time.
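
To put rough numbers on the Amdahl point, here is a minimal sketch of the standard formula: if a fraction p of the work can be spread over n workers, the speedup is 1 / ((1 - p) + p / n).  The function names and the sample fractions are my own choices for illustration, not anything from the article.

```python
import math


def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Speedup predicted by Amdahl's law for a given parallel fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    # The serial part runs at full cost; only the parallel part is divided.
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction: float) -> float:
    """Upper bound on speedup as the number of workers grows without limit."""
    serial_fraction = 1.0 - parallel_fraction
    return math.inf if serial_fraction == 0.0 else 1.0 / serial_fraction


if __name__ == "__main__":
    # Illustrative fractions only; real workloads have to be measured.
    for p in (0.50, 0.95, 0.99):
        print(f"p={p:.2f}  1024 workers: {amdahl_speedup(p, 1024):8.2f}x"
              f"  limit: {amdahl_limit(p):8.2f}x")
```

The limit function is the part that matters for the argument above: however many compute elements you have, the serial fraction caps the gain, which is why single-thread performance keeps mattering unless the software itself changes.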



The Kernel Hacker's Bookshelf: Ultimate Physical Limits of Computation

Posted Jun 23, 2008 6:05 UTC (Mon) by jzbiciak (✭ supporter ✭, #5246)

I imagine an ultimate laptop would look more like a dataflow machine of some sort where
functional units are connected directly to neighboring functional units rather than time
multiplexing instructions through a smaller set of functional units with all the overhead you
describe.

After all, with that many compute elements, why wouldn't you?  I imagine data storage would
largely return to a delay element model too, just for its compactness.  Since compute elements
would be in contact with all the data elements at any given time, you could easily access all
of your data in parallel and do *something* with it, even if it's just moving it along.  The
sheer dynamics of such a dense system would mean that the data has to keep moving, though.
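
For anyone who hasn't run into delay-element storage, here is a toy sketch of the classic single-tap version: the words circulate in a loop and you can only read whichever one is passing the tap, so a read costs however many ticks it takes for the word to come around.  The DelayLine class and its methods are hypothetical names made up for this illustration, and the model deliberately ignores the case described above where every element has a compute unit next to it.

```python
from collections import deque


class DelayLine:
    """Toy recirculating store: only the word at the read tap is visible."""

    def __init__(self, words):
        if not words:
            raise ValueError("a delay line needs at least one word")
        self._line = deque(words)
        self._head = 0  # address of the word currently at the read tap

    def tick(self):
        """Advance one step: the word at the tap re-enters at the far end."""
        self._line.rotate(-1)
        self._head = (self._head + 1) % len(self._line)

    def read(self, address):
        """Wait until `address` reaches the tap; return (word, ticks waited)."""
        if not 0 <= address < len(self._line):
            raise IndexError("address out of range")
        waited = 0
        while self._head != address:
            self.tick()
            waited += 1
        return self._line[0], waited


if __name__ == "__main__":
    line = DelayLine(["a", "b", "c", "d"])
    print(line.read(2))  # has to wait for the word to circulate to the tap
    print(line.read(0))  # wraps around the loop
```

With a tap at every position, as in the machine imagined above, the wait disappears but the data still never sits still.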

Copyright © 2013, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds