Multicores are admission of defeat - and they are here to stay...

Posted Apr 27, 2008 21:14 UTC (Sun) by khim (subscriber, #9252)
Parent article: Interview with Donald Knuth (InformIT)

For the last quarter of a century, hardware designers have been fighting a supreme adversary. Basically, as CPUs get faster, the "hot spot" where computation can actually occur gets smaller and smaller. In a contemporary CPU it's a tiny pimple on the die - and there is no way to make it bigger! First caches were introduced, then a lot of speculation, etc. Finally we've reached the point where you cannot actually speed up typical programs by going from 50mm² to 100mm² - and mechanical and marketing limitations mean that a "normal" CPU must have a die of 100mm²-200mm². At this point there is nothing left except the introduction of multicore architecture.

So no, multicores are not a fad and they are not going away: if someone invents a clever new way to speed up a linear program by another 50% by adding a hundred million transistors, we'll go from four cores to two - and then back to four. The number of transistors on a die is growing, but the number of transistors in the "hot spot" is essentially fixed - multicores are an admission of defeat: hardware people cannot give us anything else. You can choose one core, two cores, four cores and so on - but they will all be of more-or-less the same speed! If you can use them - well, that's great; if you cannot - tough luck.


Multicores are admission of defeat - and they are here to stay...

Posted Apr 28, 2008 4:46 UTC (Mon) by nevyn (subscriber, #33129)

Yes, it's going to be really hard to make HW faster in the future. All the HW people are saying that the big obvious gains have been found, and there are no more to come (for serialized instructions). Indeed there hasn't been a must-have CPU upgrade in the last year or two, although CPUs have got faster and in some cases the ubiquity of dual cores has been a boon. However, that doesn't mean you can wave a magic wand and say "all software will be multi-threaded, and run correctly".

Now on the other hand we've had the possibility of doing multi-tasking for at least 20 years (via fork() + large Unix boxes); mmap with a unified cache might be a bit more recent. pthread_create() is more recent still, and having it all be accessible via Linux is even more recent. But it's fair to say that you could "fairly easily" get access to 2-CPU boxes 10 years ago.

But with 10 years' lead time the SW has basically made zero progress: it's still just as hard and just as error-prone to write C+pthread code. Now maybe, due to dual core by default, there will be some breakthrough in the next 10 years ... or maybe we'll all magically start (re-)writing in Erlang/whatever. But personally I doubt it. I find it much easier to believe that the answer from the SW people will be "128 core CPUs are irrelevant, start getting used to things not getting faster".

The combustion engine didn't get significantly faster forever, and the world didn't end ... I imagine the same will be true here.

128 core CPUs are irrelevant, start getting used to things not getting faster? Wrong answer!

Posted Apr 28, 2008 5:54 UTC (Mon) by khim (subscriber, #9252)

We had no progress for 50 years with multi-core algorithms because there was no need: hardware people did most of the work. Now they've stopped doing it. And of course there is a huge number of programs which can benefit from 128 cores - at least in theory. Which ones will be rewritten depends on the speed of the program in question: sure, you can rewrite ls to be multicore-aware, but in the real world few uses of ls would be accelerated, so there is nothing to gain; with convert (from ImageMagick)... it's a different story. There are tons of programs which can (and will) be multithreaded, and more still which are not really needed but will be used anyway (how many people need games? yet today's GPU is mostly the result of this pseudo-need).

It's just that for a long time software people had the luxury of faster and faster CPUs every few years and had no real pressing need to use SMP. Today they are forced to use SMP. It's a different situation and it'll lead to a different outcome.

128 core CPUs are irrelevant, start getting used to things not getting faster? Wrong answer!

Posted Apr 28, 2008 6:44 UTC (Mon) by drag (subscriber, #31333)

Some things, like graphics, are natural things to parallelize. You're dealing with multiple
tiles in an MPEG video, or some number of discrete mathematical functions being used over and
over again on a large number of 3D objects or whatnot.


Then there are other things, like the ImageMagick you mentioned, that can be set up to run in
batch mode, where you have lots of calculations to do over and over again on a number of
images or whatnot. With that there probably isn't much need to do any multithreading. Just
fork it for however many images you're working on. Batch programming at its best.
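
A minimal sketch of the "just fork it" approach, assuming a Unix system. The names `process_image` and `run_batch` are hypothetical, and the per-image work is only a stand-in for whatever converter would really be run:

```python
import os


def process_image(name):
    """Hypothetical stand-in for the per-image work (e.g. a conversion)."""
    # Pretend the conversion succeeds unless the file name is empty.
    return 0 if name else 1


def run_batch(names):
    """Fork one child per image and return {name: exit status}."""
    pids = {}
    for name in names:
        pid = os.fork()
        if pid == 0:
            # Child: handle exactly one image, then exit immediately so
            # it never falls back into the parent's loop.
            status = 1
            try:
                status = process_image(name)
            finally:
                os._exit(status)
        pids[pid] = name

    results = {}
    for pid, name in pids.items():
        # Parent: reap each child and record how it exited.
        _, wait_status = os.waitpid(pid, 0)
        if os.WIFEXITED(wait_status):
            results[name] = os.WEXITSTATUS(wait_status)
        else:
            results[name] = -1  # child was killed by a signal
    return results


if __name__ == "__main__":
    print(run_batch(["a.png", "b.png", "c.png"]))
```

The children share nothing, so there is no locking to get wrong. A real batch job would probably cap the number of simultaneous children at the number of cores instead of forking one per image.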

Even for static renderings with raytracing this is the way to go. The multithreaded portion
would be relatively small; then just fork it and stitch the images back together after
they've been rendered.

For lots of this stuff it would make sense to do multithreading if you're in a
Windows-only world, but with Linux the overhead of fork is much smaller, and forking leads to
simpler and less buggy programs.

And of course there is multitasking and all that. Even then, how many things do you have going
on at once? 3 things? 4 things? How is that going to translate to big savings when you're
dealing with an entry-level Dell desktop computer with 8-16 cores?

Do you really want to see something with the complexity of Firefox or OpenOffice.org have its
code base refactored to support multithreaded programming? They have a tough enough job
as it is trying to maintain their code base without all that extra overhead.

How many programs that you use on a daily basis could benefit from it?

I did see big benefits from moving from one CPU to two. I always wanted to do that, but I
couldn't justify the expense of an SMP board and such for home use.

Going from two CPUs to four will probably bring some benefits.

For rendering games and doing other stuff, integrating the GPU into the CPU die will help. So
you're going to take advantage of maybe another 4 cores.

If people start doing crazy stuff with raytracing and graphics then I can see taking advantage
of up to 16 cores... Plus it'll make video processing much less buggy since we can hopefully
get away from the proprietary hell that Nvidia and (historically) ATI have driven us into.

I donno. 

Intel and friends are talking about _EIGHTY_ cores. (Maybe they'll use 4-8 for normal tasks
and then the rest for graphics and a few specialized cores for giggles and grins?)


It seems that they want programmers to start using parallel processing on even basic tasks. I
think that this is a tall order given that, in general, software is in such a lousy state as
it is right now. I mean, it's not like even very advanced programmers have perfected
single-threaded application programming.

128 core CPUs are irrelevant, start getting used to things not getting faster? Wrong answer!

Posted Apr 28, 2008 7:29 UTC (Mon) by ekj (subscriber, #1524)

But the interesting question isn't what percentage of -all- loads can be parallelized. The interesting question is what percentage of the loads that MATTER can be parallelized.

The loads that -matter- are those where you would like to do more, but are limited by CPU. Or where you'd like to do what you do today quicker.

Of course in principle you -always- want to go quicker, but if "ls" already spends 0.042s waiting for I/O and 0.002s doing CPU-work, then it really is of no practical importance if those 0.002s could be efficiently 128-way parallelized.
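
To put a number on that, here is a rough sketch of the arithmetic (Amdahl's law). It assumes the I/O wait is strictly serial and that the CPU part splits perfectly across the cores, which is the best case; `overall_speedup` is just an illustrative name:

```python
def overall_speedup(serial_s, parallel_s, workers):
    """Speedup when only the parallel_s part is divided among workers."""
    before = serial_s + parallel_s
    after = serial_s + parallel_s / workers
    return before / after


# The "ls" figures from above: 0.042s of I/O wait, 0.002s of CPU work.
ls_speedup = overall_speedup(0.042, 0.002, 128)
```

Under those assumptions `ls_speedup` comes out a bit under 1.05, and no number of cores can push it past 0.044/0.042, since the I/O wait stays where it is.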

I think that -most- loads that matter can be parallelized easily. In some cases TRIVIALLY. Can you give a few examples of real-world cases where waiting for the CPU is a real concern, but the problem is not parallelizable?

I know that the things where I spend time waiting for my CPU are easily parallelizable:

  • Transcode a movie
  • Encrypt/decrypt large amounts of data
  • Pack/unpack images (PNG, JPG, Raw)
  • FLAC/Ogg-encode wav files
  • Graphics performance, particularly 3D

But really, mostly when I'm waiting for my computer, I'm waiting for it to do IO, not waiting for it to compute. Thus advances in IO, such as seek-less solid-state disks and faster internet links, are much more relevant to me than faster CPUs. I suspect this is true for most personal-computer users.

The lightspeed limit bites for IO too of course, particularly the type where I'm doing IO off some device in Australia.

Multicores are admission of defeat - and they are here to stay...

Posted Apr 28, 2008 11:28 UTC (Mon) by smitty_one_each (subscriber, #28989)

It struck me that the distinction between the hardware and the operating system is blurring a
bit.
If the kernel is judiciously sprinkling processes over multiple cores, then it seems hasty to
decry this industry direction, even if the human brain has trouble grasping how to span a
single process across multiple cores: so what?

Multicores with a powerful language rule

Posted Apr 28, 2008 16:08 UTC (Mon) by ncm (subscriber, #165)

You can say we've made zero progress only if, in fact, you pay no attention to the progress
that has been made.

VSIPL++ transparently parallelizes array and image processing operations, just about
completely hiding its use of MPI libraries underneath.  It scales linearly with the number of
processors.  It does this using the very powerful abstraction mechanisms that only became
widely usable in Standard C++, and are still available only in C++ and in a few researchy
languages like Haskell and OCaml.

The lesson here is that progress on fundamentally difficult problems comes not just from hard
work (and the developers of VSIPL++ implementations have certainly worked hard) but from
fundamentally powerful language mechanisms.  Most problems are easy, and popular scripting
languages make it quick to deal with easy problems, but hard problems fall only to powerful
languages and people skilled in using powerful languages.  (Reader should note that
"object-oriented" features were not especially helpful in solving this problem.)

Multicores are admission of defeat - and they are here to stay...

Posted Apr 28, 2008 7:17 UTC (Mon) by ekj (subscriber, #1524)

True. Lightspeed means that at 1GHz, no signal can travel more than about 30cm in one clock
cycle. At 3GHz, no signal can travel more than 10cm. And switching times aren't instantaneous,
so the practical limits on die size are smaller.
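
Those figures follow from dividing the speed of light by the clock frequency. A quick sketch of the calculation (the function name is only illustrative, and real signals on a die propagate well below the vacuum speed of light, so these are upper bounds):

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # vacuum value, exact by definition


def max_distance_cm(clock_hz):
    """Upper bound on how far a signal can get in one clock period, in cm."""
    return SPEED_OF_LIGHT_M_PER_S / clock_hz * 100


at_1ghz = max_distance_cm(1e9)  # just under 30cm
at_3ghz = max_distance_cm(3e9)  # just under 10cm
```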

I'd like to point out though, that the *physical* limit on serially dependent computation is a
*volume* and not an *area*.

If lightspeed allows a 100mm^2 die (10mm by 10mm), then lightspeed also allows a cube of 10mm
by 10mm by 10mm, or 1000mm^3.

There are reasons we don't do it that way: both construction and cooling would be a real
bitch. It's hard enough dealing with the multiple layers we have on today's CPUs; making an
awful LOT more layers would -not- simplify design.

Volume? Not yet...

Posted Apr 28, 2008 11:34 UTC (Mon) by khim (subscriber, #9252)

It's hard enough dealing with the multiple layers we have on todays CPUs, making an awful LOT more layers would -not- simplify design.

Before we do this we'll need the technology for at least two layers. Currently we use one and only one transistor layer in our CPUs. Sure, there is talk of 7, 9, even 15 layers - but these are metallization layers: all the transistors (which do the actual work) are in a single layer...

You can easily glue many dies together - but again, that's multicore, not a bigger hot spot...

Volume? Not yet...

Posted Apr 28, 2008 11:56 UTC (Mon) by ekj (subscriber, #1524)

True.

My point was just that if you're only concerned with PHYSICAL limitations, like lightspeed,
then the limit is a volume, and not an area.

I agree with you that we are nowhere near being technically capable of designing and
manufacturing a true 3D CPU (one that uses the entire volume of, say, 1000mm^3 to do
computation).

But this is a technical limitation, and one that can, at least in principle, be overcome. It
is not a physical one that, as far as we know, is final -- like lightspeed.

Multicores are admission of defeat - and they are here to stay...

Posted Apr 28, 2008 16:34 UTC (Mon) by iabervon (subscriber, #722)

Consumer systems aren't doing single linear tasks any more. There's generally a bunch of
intermittent stuff going on throughout the session, and a single task that the user's
attention is on, whose reaction time determines the quality of the user experience. For this
sort of workload, adding cores helps up to the maximum number of possible
simultaneously-running tasks. That is, people will watch movies while waiting for their
spreadsheets to recalculate, and they notice how long it takes and whether the movie is
smooth.

That means that, while O(N) cores isn't yet useful for most people, the small constant number
of useful cores is more than one. I expect that there will be demand for about 8 cores when
software developers get used to being able to do CPU-hungry background tasks without impacting
the interactive thread at all.
