LWN.net

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Royal Pingdom analyzes the results of a recent supercomputer survey. "Operating systems on supercomputers used to be custom-made affairs, but this has changed. These days, Linux has become a popular choice for supercomputers. But how popular? You may be surprised. Top500.org maintains a list of the fastest supercomputers in the world. A new list was published yesterday (it happens twice a year), so we took the opportunity to go through the list and find out what OS the top 20 supercomputers are using."

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 18:44 UTC (Wed) by jgg (guest, #55211) [Link]

This has been going on for a number of years. A lot of supercomputing-related software is pretty much developed exclusively on Linux.

MS has been trying to get into the smaller-scale side of this market with their HPC server, but it hasn't really got the uptake I think they would like. Several of their 'deployments' ran Windows long enough to get in the top 500 and then were converted to Linux to get actual work done.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 18:45 UTC (Wed) by tzafrir (subscriber, #11501) [Link]

references, please

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 20:05 UTC (Wed) by drag (subscriber, #31333) [Link]

Microsoft is trying to create an HPC OS that is business-friendly, so that clusters can be applied to business workloads.

They've been trying to do this for some years now.

Just google for it.
http://www.google.com/search?hl=en&safe=off&q=win...

It's very obvious that they have been trying to do this for a while now.

And Microsoft has been trying for years to prove that NT is the equal of Linux and Unix for these sorts of applications. They've been able to keep at least one Windows cluster in the 'Top500' list since 1995 or so. The NT 4 stuff was so poor that people could barely keep a cluster running long enough to do the benchmark, much less do any useful work. Nowadays I am sure that they are as stable as anything else.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 20:34 UTC (Wed) by jgg (guest, #55211) [Link]

Indeed. Take the 'magic cube' system in the top 20. This is a very strange configuration. 30k cores, InfiniBand and Windows?! From Dawning no less?

The largest 'experimental' Windows cluster I've heard of is not even that big. Further, IB drivers on Windows are hugely immature and much worse than their Linux counterparts.

The kicker is that Dawning makes their own (Chinese designed) CPUs for other systems in their lineup. Those systems run Linux. Other Dawning systems at the Shanghai Supercomputer Center use Linux. The Windows cluster is entirely out of character.

In Linux clothes this system would be largely Chinese-designed, and open. With Windows they'd cede a big chunk of control to Redmond and other US-based software providers. Scuttlebutt is that SSC demanded a Windows system. Why would they do this? All their existing systems are Linux. They have in-house Linux skill. They have Linux apps.

A similar story is seen in several other Windows Top 500 listings, if you look closely enough.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 19:09 UTC (Wed) by leoc (subscriber, #39773) [Link]

This is the year of Windows on the Supercomputer!

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 22:34 UTC (Wed) by sbergman27 (guest, #10767) [Link]

Imagine a Beowulf^WHPC cluster of blue-screens! :-P

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 19:12 UTC (Wed) by andrel (subscriber, #5166) [Link]

One can make a case that operating systems on supercomputers remain custom-made affairs. One reason Linux swept this market is the relative ease of customization. A famous early example is the original Beowulf project, which rewrote parts of the networking drivers to get better performance and implement channel bonding.
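To make the channel bonding idea concrete, here is a toy sketch, assuming a simple round-robin policy: outgoing frames are striped across several network links so that their bandwidth adds up. This is not the Beowulf driver code, and the interface names `eth0` and `eth1` and the function `bond_round_robin` are made up for illustration.

```python
from itertools import cycle

def bond_round_robin(frames, links):
    """Toy model of round-robin striping: hand each frame to the next link in turn."""
    assignment = {link: [] for link in links}
    for frame, link in zip(frames, cycle(links)):
        assignment[link].append(frame)
    return assignment

# Hypothetical example: six frames striped over two bonded interfaces.
frames = [f"frame{i}" for i in range(6)]
result = bond_round_robin(frames, ["eth0", "eth1"])
for link, sent in result.items():
    print(link, sent)
```

Each link carries half of the frames here, which is the whole point of the technique; a real driver also has to cope with link failure and out-of-order delivery, which this sketch ignores.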

(Another killer feature is the GPL. No need to pony up more licensing fees when you bring another thousand nodes online.)

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 19:58 UTC (Wed) by drag (subscriber, #31333) [Link]

Sorta...

With large clusters of computers the application is specifically designed to match the requirements of the users and the design of the cluster. They use their own IPC for each process, usually using some sort of MPI library, and that's about it. Maybe some sort of distributed file system like Lustre or whatever.

The best thing that the OS can do in a situation like that is just stay out of the way and let the application do its job. The load per node is going to be very simple and the OS is simply required to do nothing but provide efficient access to network I/O, memory, and such.

With Windows this is very difficult because when you set up Windows you always get the 'WHOLE' Windows. With Linux you can set up a simple system with a Linux kernel and not much else... it's very easy to get a system that 'stays out of the way'.

Microsoft is changing the way that they ship Windows in order to compete with Linux for things like this. They are going to provide the equivalent of a simple stripped-down NT environment and should end up being much more competitive in the future... but right now I don't see any reason why anybody would want to move away from Linux...

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 22:00 UTC (Wed) by ebiederm (subscriber, #35028) [Link]

Linux also has an advantage in large systems because when you have problems you have all of the source code, allowing you to track down what is going on and fix it.

In large systems like this you always see strange or weird problems. Things that on a small scale don't happen frequently enough to track down or reproduce are trivial to reproduce at 1000+ machines running in lockstep.

not really running Linux on many of these

Posted Jun 24, 2009 20:05 UTC (Wed) by stevenj (guest, #421) [Link]

Note that all of the ones listed as "Linux (CNK, ...)" and "Linux (UNICOS, ...)" are not running the Linux kernel on the compute nodes. CNK stands for Compute Node Kernel, and is IBM's own custom (non-Linux) kernel running a stripped down half-POSIX OS on the compute nodes. Linux on these IBM Blue Gene systems is only used for the front-ends and I/O nodes. Similarly for the ones listed as "Linux (UNICOS, ...)", which are running Cray's version of the same idea (their custom compute-node kernel is called Catamount).

Correspondingly, these systems are a PITA to use because all software that runs on them needs to be cross-compiled, and the target systems aren't even close to a full Unix/POSIX environment. This is especially problematic with newer scientific software that might use a Python front end for scripting, etcetera (as opposed to the old style of one monolithic Fortran program), because it can be challenging to cross-compile complex full-featured software for exotic non-POSIX environments. Not to mention that 99% of scientific-computing users had never heard of cross-compiling, in my experience, and don't understand its implications. (That's probably why Cray and IBM were able to sell this stupid idea.)

Of course, users of supercomputing have been accustomed to quirky systems for many years. Compared to the old 1980s and 1990s machines, systems running GNU/Linux on everything (front end and compute nodes) were a joy to use; recent crops of IBM and Cray supercomputers that forced you to cross-compile were a huge step backwards in my opinion. They've been getting a bit better with recent systems that at least use a customized Linux kernel on the compute nodes, but last I heard the compute nodes were still different enough from the front ends that you still need to cross-compile.

not really running Linux on many of these

Posted Jun 24, 2009 21:01 UTC (Wed) by joib (guest, #8541) [Link]

Similarly for the ones listed as "Linux (UNICOS, ...)", which are running Cray's version of the same idea (their custom compute-node kernel is called Catamount).

IIRC most Cray XT sites have upgraded from Catamount to the Linux compute kernel (CNL). Basically the dual core support on Catamount was a big hack, and for the quad cores, or 2xquad nodes nowadays it's just getting worse, and Catamount doesn't support threads (OpenMP). Also, CNL has a much more advanced virtual memory system.

They've been getting a bit better with recent systems that at least use a customized Linux kernel on the compute nodes, but last I heard the compute nodes were still different enough from the front ends that you still need to cross-compile.

Yes, you have to crosscompile, and no shared libraries either. Using python, as you mentioned, is possible, though tedious. Basically you have to crosscompile a custom static python interpreter containing all the C extensions you need (numpy etc.). See e.g. GPAW installation instructions (GPAW is a massively parallel electronic structure simulation program implemented in Python/C/MPI).
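As a small illustration of what "static" means here: CPython lists the modules compiled into the interpreter binary itself in `sys.builtin_module_names`. The sketch below checks a module name against that list. The helper name `is_builtin` is made up for this example, and the assumption (not confirmed by the GPAW instructions) is that extensions linked into a custom static interpreter would show up there.

```python
import sys

def is_builtin(module_name):
    """True if the module is compiled into the interpreter binary itself."""
    return module_name in sys.builtin_module_names

# On an ordinary desktop build, numpy (if installed) is loaded from a shared
# library on disk, so it is normally not in the built-in list.
for name in ("sys", "numpy"):
    status = "built in" if is_builtin(name) else "not built in"
    print(f"{name}: {status}")
```

On a compute node without shared library support, anything reported as not built in would have to be pure Python or be linked into the interpreter ahead of time.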

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 20:39 UTC (Wed) by lindahl (subscriber, #15266) [Link]

Do not be fooled by the one running Windows:

http://tyan.com/newsroom_pressroom_detail.aspx?id=1289

Microsoft has pulled stunts before where they paid people to advertise that their supercomputer runs Windows, when in fact it dual-booted and ran Linux almost all of the time. The press release above says that the top Windows system in the Top500 is dual-boot. Oddly enough, other press releases leave this detail out. What does it actually run most of the time?

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 25, 2009 16:57 UTC (Thu) by ballombe (subscriber, #9523) [Link]

Direct quote from your link:

" The SSC supercomputer system running on Linux and WHS 2008 operation system will achieve a theoretical peak performance of approximately 230 TFlop/s and beyond 70% of Linpack efficiency. "
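For scale, the two figures in that quote can be combined. Top500 lists report Linpack efficiency as measured performance (Rmax) divided by theoretical peak (Rpeak), so the press release implies a floor on Rmax. A quick sketch, using only the numbers quoted above; the variable names are mine:

```python
def linpack_efficiency(rmax_tflops, rpeak_tflops):
    """Fraction of theoretical peak actually achieved on LINPACK."""
    return rmax_tflops / rpeak_tflops

rpeak_tflops = 230.0    # "theoretical peak performance of approximately 230 TFlop/s"
min_efficiency = 0.70   # "beyond 70% of Linpack efficiency"

# Lowest Rmax that is consistent with both claims.
min_rmax_tflops = rpeak_tflops * min_efficiency
print(f"Implied Rmax of at least about {min_rmax_tflops:.0f} TFlop/s")
```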

Added to all the reasons why this deal is fishy: why would a communist government be the *only one* to choose an HPC OS from a big US software company? Especially when using in-house companies for everything else?

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 25, 2009 21:49 UTC (Thu) by drag (subscriber, #31333) [Link]

Yes.

This particular cluster was originally supposed to show off the Loongson processor, which is a 64-bit MIPS processor. It can't support Windows; it only supports x86 through hardware-accelerated emulation with a 30% performance hit.

So when the design started they all of a sudden decided MS was a requirement and switched from their own processor design to AMD.

From wikipedia, btw.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 20:54 UTC (Wed) by jengelh (subscriber, #33263) [Link]

I wonder why the global Google grid is not listed on Top500. It would displace #1 for years to come.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 24, 2009 21:01 UTC (Wed) by leoc (subscriber, #39773) [Link]

What is its LINPACK score?

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 25, 2009 0:34 UTC (Thu) by gdt (subscriber, #6284) [Link]

Google have never submitted an application to the Top 500. Given Google's habitual secrecy, I'd be surprised if they did provide the information requested in the application form.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 25, 2009 5:35 UTC (Thu) by jgg (guest, #55211) [Link]

More critically their LINPACK score wouldn't be very good (at least relative to the number of cores they have). By HPC standards Google does not have a sufficiently high performance network. It is no accident that InfiniBand commands a considerable share of the list. 30% overall, >50% of the top 10, and the *ONLY* open interconnect in the top 10 at all.

For instance, the first Ethernet system (well, actually an Ethernet/IB hybrid) is number 16, which gets only 168TF out of its 30240 2.5GHz Nehalem cores, while number 10, with QDR IB, gets 274TF out of its 26304 2.8GHz Nehalems.
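Putting those two systems on a per-core basis makes the gap easier to see. The sketch below uses only the figures quoted above; dividing by clock speed to get flops per cycle is my own normalization, added to show that the difference is not just down to the faster clock.

```python
# (Rmax in TFlop/s, core count, clock in GHz), as quoted in the comment above.
systems = {
    "number 16 (Ethernet/IB hybrid)": (168.0, 30240, 2.5),
    "number 10 (QDR IB)": (274.0, 26304, 2.8),
}

def gflops_per_core(rmax_tflops, cores):
    """Achieved LINPACK GFlop/s per core."""
    return rmax_tflops * 1000.0 / cores

per_core = {}
per_cycle = {}
for name, (rmax, cores, ghz) in systems.items():
    per_core[name] = gflops_per_core(rmax, cores)
    per_cycle[name] = per_core[name] / ghz  # achieved flops per core per clock cycle
    print(f"{name}: {per_core[name]:.2f} GFlop/s per core, "
          f"{per_cycle[name]:.2f} flops per cycle")
```

By these numbers the QDR IB system gets nearly twice as much LINPACK work out of each core, and still roughly two thirds more per clock cycle.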

10GbE is not so bad, really

Posted Jun 25, 2009 17:13 UTC (Thu) by khim (subscriber, #9252) [Link]

Google uses 10GbE in clusters and so can easily reach top500, but to top it? Unlikely. Their systems are designed for a completely different use case...

500,000 cores in a hundred datacenters with non-trivial topology is not very LINPACK-friendly material...

10GbE is not so bad, really

Posted Jun 25, 2009 17:26 UTC (Thu) by jengelh (subscriber, #33263) [Link]

LINPACK, well… I mean, if it were to run “properly segmented” programs that do not rely on “online operation”, such as BOINC and the well-known Seti@home, it would make top500 look different.

10GbE is not so bad, really

Posted Jun 25, 2009 18:30 UTC (Thu) by joib (guest, #8541) [Link]

Yes, and?

If anything, LINPACK is frequently criticized for nowadays being a rather poor representative of real supercomputer applications, which in most cases emphasize network and memory subsystem performance to a much higher degree than LINPACK does.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 26, 2009 10:56 UTC (Fri) by trasz (guest, #45786) [Link]

What a great success: leadership in a market that no one really cares about. Face it, there is no market for operating systems for TOP500-class hardware, so the only parties interested in developing operating systems for these machines are the hardware manufacturers, who need to run something. Also, the workload is very different from typical server workloads, so the required optimizations are different as well.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 26, 2009 16:35 UTC (Fri) by rriggs (subscriber, #11598) [Link]

I'm not sure why you wish to so easily dismiss this achievement.

And your comment doesn't really stand up to scrutiny. It doesn't explain why even Sun runs Linux on their supercomputers -- they have their own OS, Solaris, that they can customize. There is only one Solaris entry and it was fielded by Fujitsu. Nor does it explain why BSD, another OS easily customized by HPC manufacturers, has only one entry in the list.

This is a total domination of a field that used to have numerous Unix entries scattered throughout. It's a reflection of the complete domination that Linux has in the HPC market. It is a truly amazing feat.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jul 1, 2009 10:14 UTC (Wed) by trasz (guest, #45786) [Link]

Sun used Linux because there is no point in customizing Solaris to do this task: Linux has the necessary features, customers are used to it, and such an investment in Solaris wouldn't pay off anyway, because the modifications needed wouldn't benefit other tasks, like general-purpose serving. That's exactly what I'm talking about: there is no market for operating systems for TOP500, so nobody cares about creating competition for Linux.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 26, 2009 18:39 UTC (Fri) by mangoo (guest, #32602) [Link]

HPC, with its size of around 30 billion US dollars in 2008 (hardware, software, storage and support) - and you call it a "market no one really cares about"?

Don't be ridiculous.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jul 1, 2009 10:11 UTC (Wed) by trasz (guest, #45786) [Link]

I'm talking about the market for _operating systems for TOP500_, not about TOP500 itself.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 26, 2009 22:46 UTC (Fri) by jgg (guest, #55211) [Link]

That is really underselling what is happening here. This is fundamentally a complete success of the open source collaborative model. I.e., why did Sun choose Linux for Ranger? Simply because of the incredible work and effort that other companies, the Labs and so on have put into making Linux actually work at this scale. The pace of technology change in the HPC world is very fast, and quite simply no single vendor can supply a complete software stack and continually rev it every 2 years to keep up.

At the HPC level, you take the latest and greatest hardware platform and Linux is the first (sometimes even ONLY) OS to fully support all the hardware in the box. Many companies are involved in making this happen and it is driven to a large degree by demand for Linux based compute resources from the customers.

One unique thing about the HPC space is the major customer demand is very strongly aligned with open source principles. The major US Labs that run the biggest machines *demand* from their vendors open source solutions, period full stop. They do this entirely out of their own self interest, after decades of vendor abuse and neglect.

If this model can work in the HPC space, it can work in other spaces too. We see strong similarities in the embedded space as well.

The triumph of Linux as a supercomputer OS (Royal Pingdom)

Posted Jun 29, 2009 4:15 UTC (Mon) by dmag (subscriber, #17775) [Link]

Remember, today's supercomputer is tomorrow's PDA.

Copyright © 2009, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds