LWN.net

Frankly I Don't See Either Side As the Future...

Posted Oct 21, 2005 1:26 UTC (Fri) by emkey (guest, #144)
Parent article: Ballmer: Microsoft to go after Linux strongholds (ZDNet)

What OS does the world's fastest supercomputer (BlueGene/L) run? Hint: it's not Linux or Microsoft Windows. Yes, some portion of the system does in fact run Linux, but the actual computation is done on a very stripped-down proprietary microkernel. And that, in my opinion, is the future.

Large, complex operating systems have far too much overhead and complexity to scale well to the huge CPU counts that we're moving towards. Why, you ask? Here's one example...

Imagine you have a problem where every CPU has to complete some bit of work in lockstep. You can't move from one timestep to the next until every single CPU has completed its work. In this scenario you are only as fast as your slowest CPU. Got 399 2.5GHz CPUs and one 1GHz CPU as part of your job? Congrats, you only get 400x1GHz of performance out of your job.
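The arithmetic above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: every CPU gets an equal share of work per timestep, and speed is proportional to clock rate. The function name is made up for this example.

```python
def lockstep_throughput_ghz(clock_rates_ghz):
    """Effective aggregate rate when every CPU must finish each
    timestep before any CPU may start the next one.

    Assumes equal work per CPU and speed proportional to clock rate,
    so every CPU is effectively held to the pace of the slowest one.
    """
    return len(clock_rates_ghz) * min(clock_rates_ghz)


# The job from the comment: 399 CPUs at 2.5GHz plus one at 1GHz.
cpus = [2.5] * 399 + [1.0]

effective = lockstep_throughput_ghz(cpus)  # what the lockstep job gets
ideal = sum(cpus)                          # what the hardware could deliver

print(f"effective: {effective} GHz, ideal: {ideal} GHz")
```

Under these assumptions, a single slow CPU costs the job well over half of what the hardware could otherwise deliver.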

Now imagine a complex operating system with cron and other daemons running. How many interrupts per second does your average Linux system have to deal with? And how closely in sync are hundreds of nodes likely to be, even if all their clocks are synchronized? And what if one of them has a failing hard drive or some other bit of failing hardware? All these things tend to introduce noise that reduces performance in hugely parallel computers and codes. As we steadily move to larger CPU counts, these issues become more and more of a problem.
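A toy simulation can show why this kind of noise hurts more as the node count grows. Everything here is an assumption chosen for illustration, not a measurement: each node has a 1% chance per timestep of being delayed by a daemon or interrupt, the delay is half a timestep, and the step ends only when the slowest node finishes.

```python
import random


def mean_step_time(nodes, steps=200, work=1.0, p_noise=0.01, delay=0.5, seed=0):
    """Average time per lockstep timestep across `nodes` nodes.

    Hypothetical model: every node needs `work` time units per step and,
    with probability `p_noise`, is held up by an extra `delay` (OS noise).
    A step finishes only when the slowest node has finished.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(steps):
        slowest = max(
            work + (delay if rng.random() < p_noise else 0.0)
            for _ in range(nodes)
        )
        total += slowest
    return total / steps


for n in (1, 10, 100, 1000):
    print(n, round(mean_step_time(n), 3))
```

With one node the noise is barely visible. With a thousand nodes, nearly every step has at least one delayed node, so nearly every step pays the full delay.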

The best performance and scalability will be found by way of minimizing the amount of hardware that can go bad and simplifying the OS down to the barest possible minimum to support running codes. Windows is horribly poorly suited to this, and Linux isn't nearly so far ahead as many of us would like to think.

There certainly is an important nitch for Linux in all this. You need a more full-featured OS on front-end nodes, where code compilation and data reduction go on. You need a more powerfull OS to deal with much of the bookkeeping and similar tasks.

Of course a very scaled down and simplified Linux kernel could be created that would be easily capable of handling this sort of task.


Frankly I Don't See Either Side As the Future...

Posted Oct 21, 2005 2:18 UTC (Fri) by Mithrandir (subscriber, #3031)

s/nitch/niche/
s/powerfull/powerful/

:)

Frankly I Don't See Either Side As the Future...

Posted Oct 21, 2005 6:07 UTC (Fri) by emkey (guest, #144)

Yeah, I should have spell checked before posting but I was in a hurry. I probably should have spent more time on my English homework and less time fooling around with computers when I was young. :-)

Frankly I Don't See Either Side As the Future...

Posted Oct 21, 2005 6:26 UTC (Fri) by joib (guest, #8541)

AFAIK the Cray XD1 Linux/Opteron "turnkey cluster" does contain some changes to the scheduler to ensure that parallel processes execute in lockstep. Or actually, it's implemented the other way around, i.e. the non-parallel housekeeping processes only get to run in a few specified time slots. See page 8 of http://www.cray.com/downloads/dhbrown_crayxd1_oct2004.pdf .

Frankly I Don't See Either Side As the Future...

Posted Oct 21, 2005 14:29 UTC (Fri) by emkey (guest, #144)

In my opinion, that solution just points out the fundamental flaws in using a complex OS to run highly parallel jobs.

There is a real tendency for clever people to want to solve problems. This isn't always a good thing, in that we often come up with "bag on the side" solutions to deal with issues that are really the result of a design that is poor for the context.

I always try to take a step back in these situations and look at the big picture. Often things appear in a whole new light when I do so, and more productive paths forward present themselves.

Frankly I Don't See Either Side As the Future...

Posted Oct 24, 2005 4:55 UTC (Mon) by dvdeug (subscriber, #10998)

In what sense are we heading towards huge CPU counts? Intel had a system with 1024 286s out when 286s were the going thing. At a smaller level, my old university replaced a 24 CPU Dynamix machine with a 2 CPU Sun machine some time ago.

Perhaps I'm missing something obvious, but what BlueGene does is esoteric, and I don't see any indication that the world is moving to systems where a large number of CPUs have to be kept in tight lockstep. I don't see that most problems are best solved by such low-level parallelism; a lot of parallel problems have a certain minimal chunk that can take hundreds, thousands or even millions or billions of clock cycles.

Frankly I Don't See Either Side As the Future...

Posted Oct 24, 2005 5:39 UTC (Mon) by emkey (guest, #144)

In the world of scientific computing, many problems lend themselves well to massive parallelization. And as you add more CPUs you can increase the problem size and/or decrease the scale at which you simulate. More CPUs means new problems can be solved. It is my understanding that scientists could easily make use of a system ten or even twenty times as fast as BlueGene/L.

You do touch on an important issue though. Some problems may take a second or less per clock step. Others might take many times as long. The quicker the clock cycle the more likely a particular application is to benefit from the sort of approach that IBM took with BlueGene/L, as random perturbations are going to have a much larger impact on performance in that scenario.
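To put rough numbers on that point, here is a minimal sketch of how the same perturbation weighs on steps of different lengths. The 10 ms delay and the step lengths are made-up values for illustration, and the function name is hypothetical.

```python
def slowdown_factor(step_seconds, delay_seconds):
    """How much longer a step takes when one fixed-size perturbation
    lands in it, relative to an undisturbed step."""
    return (step_seconds + delay_seconds) / step_seconds


delay = 0.01  # assumed 10 ms hiccup from a daemon or interrupt

for step in (10.0, 1.0, 0.1, 0.01):
    factor = slowdown_factor(step, delay)
    print(f"step {step:>5} s -> {factor:.3f}x")
```

A hiccup that is lost in the noise on a ten-second step doubles the length of a ten-millisecond one, which is why short steps gain the most from a quiet kernel.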

IBM has announced the sale of additional systems beyond the original sale. I doubt these systems are cheap, so at least some people see value in them. Though you are correct that not every problem is suitable for such an architecture.

Copyright © 2012, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds