
The largest Linux cluster

Linux NetworX has sent out a press release proclaiming the sale of "the largest and most powerful Linux cluster" ever. This system has been sold to Lawrence Livermore National Laboratory, and should be operational this fall. This cluster, which will employ 1920 2.4-GHz Intel Xeon processors, is expected to be one of the five fastest supercomputers in the world.

LWN has long maintained that Linux-based clusters were going to take over the supercomputing field. The economics of clusters built with commodity hardware and free software are simply too good to ignore. The biggest impediment to cluster World Domination, perhaps, has been the "some assembly required" nature of Linux clusters. Supercomputers are, in general, not low-maintenance devices, but Linux clusters have tended to require even more than the usual amount of work. To be truly successful, Linux clusters must become polished, easy-to-manage products.

Linux NetworX, like other cluster vendors, has long understood the need for more refined cluster products. Some of the features of their current cluster offerings are worth a look as an indication of how far Linux clustering has come. Linux NetworX is certainly not the only vendor offering these sorts of features; in the context of this sale, however, they make a good example.

Early Linux clusters consisted of large numbers of beige boxes with even larger numbers of cables between them. Modern cluster vendors have long since moved past that mode, which is wasteful of energy, space, and system administrator time. In this case, Linux NetworX is employing its "Evolocity II" product, which crams two processors into a "sub 1U" rack space. Throw in easy interconnects and the basic job of plugging the cluster together becomes much easier.
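A quick back-of-the-envelope check puts the numbers in perspective. It assumes that every node is a dual-processor Evolocity II unit and that each 2.4 GHz Xeon can complete two double-precision floating-point operations per clock cycle. The second figure is our assumption, not a number from the press release, so the peak estimate is only a rough upper bound:

```latex
\begin{aligned}
N_{\text{nodes}} &= \frac{1920\ \text{processors}}{2\ \text{processors/node}} = 960\ \text{nodes} \\
R_{\text{peak}} &\approx 1920 \times \left(2.4 \times 10^{9}\ \text{cycles/s}\right) \times 2\ \text{FLOP/cycle} \approx 9.2 \times 10^{12}\ \text{FLOP/s}
\end{aligned}
```

Sustained performance on real applications will be well below that theoretical peak.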

Then, throw in the "ICE Box," a small, Linux-powered box which performs console management, power management, and temperature monitoring for a set of cluster nodes. Among other things, this box allows a (remote) administrator to power down sets of nodes when they are not in use; when your cluster has thousands of nodes, turning off unneeded nodes can yield significant power savings.
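The article does not describe the ICE Box's actual policy interface. What follows is only a hypothetical sketch of the kind of rule an administrator might automate: switch off nodes that have sat idle longer than some threshold, but keep a few idle nodes running as warm spares so that new jobs need not wait for a boot. The `Node` class, `nodes_to_power_down()` and its parameters are illustrative names, not part of any Linux NetworX product.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    idle_seconds: float      # time since the node last ran a job
    powered_on: bool = True


def nodes_to_power_down(nodes, idle_threshold, min_spares):
    """Return the names of nodes that a hypothetical policy would switch off.

    A node is a candidate if it is powered on and has been idle for longer
    than idle_threshold seconds. The min_spares least-idle candidates are
    left running as warm spares.
    """
    idle = [n for n in nodes if n.powered_on and n.idle_seconds > idle_threshold]
    # Longest-idle first: those are the least likely to be needed soon.
    idle.sort(key=lambda n: n.idle_seconds, reverse=True)
    keep = min(min_spares, len(idle))
    # Drop the last `keep` entries (the least idle) from the shutdown list.
    return [n.name for n in idle[:len(idle) - keep]]
```

For example, with a one-hour threshold and one spare, nodes idle for 9000 s and 7200 s would be switched off, while a node idle for 4000 s would stay up as the spare.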

What about when you want to bring those idle nodes back up to get some work done? One of the interesting things that Linux NetworX has done is to work with the LinuxBIOS project. LinuxBIOS replaces the regular BIOS on the motherboard, allowing a system to boot into a Linux kernel in as little as three seconds.

Finally, there is the issue of how one manages a cluster with almost 2000 processors. The Simple Linux Utility for Resource Management (SLURM) is a cooperative project between Linux NetworX and LLNL; it gives administrators an easy way to control access to groups of processors. SLURM appears to be at an early stage of development; the plan is to release it under the GPL at some point.
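This article does not cover SLURM's real scheduling logic. As a hypothetical illustration of the basic bookkeeping involved in handing out groups of processors, here is a toy allocator. It grants each job the lowest-numbered free processors and reclaims them when the job finishes. `ProcessorPool`, `allocate()` and `release()` are invented names, not SLURM's API.

```python
class ProcessorPool:
    """Toy allocator that hands out groups of processor IDs to jobs."""

    def __init__(self, num_processors):
        self.free = set(range(num_processors))
        self.jobs = {}  # job id -> set of processor IDs it currently holds

    def allocate(self, job_id, count):
        """Grant `count` processors to job_id.

        Returns the granted set, or None if the job already holds processors
        or not enough processors are free.
        """
        if job_id in self.jobs or count > len(self.free):
            return None
        granted = set(sorted(self.free)[:count])  # lowest-numbered free IDs
        self.free -= granted
        self.jobs[job_id] = granted
        return granted

    def release(self, job_id):
        """Return a finished job's processors to the free pool."""
        self.free |= self.jobs.pop(job_id, set())
```

A real resource manager also has to handle queuing, priorities, node failures and network topology; the sketch only shows the core grant-and-release accounting.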

All of this, of course, leaves out one crucial part of the problem: making the customer's applications work on a clustered system. Parallelizing a program so that it makes the best use of a cluster is a hard task. There is still no easy way around this one. A cluster-optimizing version of gcc remains the stuff of dreams at this point.
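To make the difficulty concrete, here is a minimal single-machine sketch of the easiest possible case: a computation that splits cleanly into independent pieces. It divides a range into chunks, hands each chunk to a separate worker process, and adds up the partial results. A real cluster code would distribute the chunks across nodes with a message-passing library such as MPI rather than Python's `concurrent.futures`, and the function names here are illustrative only. Most real applications are far harder to split, because their pieces must exchange data while they run.

```python
from concurrent.futures import ProcessPoolExecutor


def partial_sum_of_squares(bounds):
    """Work done by one worker: sum i*i over its half-open slice [lo, hi)."""
    lo, hi = bounds
    return sum(i * i for i in range(lo, hi))


def split_range(n, parts):
    """Divide [0, n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    chunks, start = [], 0
    for p in range(parts):
        end = start + base + (1 if p < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def parallel_sum_of_squares(n, workers=4):
    """Scatter the chunks to worker processes, then reduce the partial sums."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, split_range(n, workers)))
```

Calling `parallel_sum_of_squares(1_000_000)` from inside an `if __name__ == "__main__":` guard (process pools require one on platforms that spawn workers) should return the same value as the serial `sum(i * i for i in range(1_000_000))`.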

Even with the programming challenges, Linux clusters are earning an increasing amount of respect in the high performance computing world. They are getting steadily more powerful, easier to buy, and easier to run. Brad Rutledge of Linux NetworX tells us: "We anticipate this is the first of many Linux clusters that will measure as top supercomputers within the next year." Things look likely to turn out just that way.



The largest Linux cluster: What do we need for parallelization?

Posted Jul 18, 2002 18:31 UTC (Thu) by BogusUser (#2100)

I'm not the most facile compiler engineer in the world, but I understand well how compilers work. The problem of parallelizing compilers was addressed way back in VMS 5.0, where all the compilers had an option to specify the maximum number of processors in a VMS cluster. Then, at runtime, the executable honored a logical name which specified the number of processors to use for that instance of the program. It was a pretty good system.
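The scheme the commenter describes, a ceiling fixed when the program is built plus a per-run setting read at startup, maps naturally onto today's environment variables. The sketch below is a hypothetical Python analogue: `MAX_PROCESSORS` stands in for the compiler option, and the made-up `APP_NUM_PROCESSORS` variable stands in for the VMS logical name. Neither is a real VMS or Linux interface.

```python
import os

# Ceiling fixed in the program itself, standing in for the VMS compiler option.
MAX_PROCESSORS = 16


def processors_to_use(env_var="APP_NUM_PROCESSORS"):
    """Pick how many processors this run should use.

    Reads the requested count from an environment variable (standing in for
    the VMS logical name). Falls back to the ceiling when the variable is
    unset or not an integer, and clamps the result to 1..MAX_PROCESSORS.
    """
    raw = os.environ.get(env_var)
    try:
        requested = int(raw) if raw is not None else MAX_PROCESSORS
    except ValueError:
        requested = MAX_PROCESSORS
    return max(1, min(requested, MAX_PROCESSORS))
```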

Copyright © 2002, Eklektix, Inc.
Comments and public postings are copyrighted by their creators.