Predicting the weather with Linux - FSL's cluster
As is the nature of government programs, the "prototype" program shed its drop-dead date and grew into something larger and more permanent. For a while it became the "Program for Regional Observing and Forecasting Systems" before morphing into its current incarnation as the NOAA Forecast Systems Laboratory. FSL's mission covers a lot of ground, but, in the end, they remain a technology transfer group, dedicated to developing and evaluating technology in the weather forecasting arena.
The VAXen are long gone, of course, replaced by high-end SGI servers and such. FSL took a different turn, however, with its announcement last September that it was installing a new, $15 million supercomputing system provided by High Performance Technologies Inc., also known as HPTi. This isn't just any supercomputer, though: it's a Beowulf-style Linux cluster. It is, perhaps, the first system of its kind. Government agencies have been piecing together clusters for years, but this may be the first that was purchased as a supported commercial product.
Greg Lindahl, senior architect at HPTi and leader of the FSL cluster project, invited me over to have a look. It wouldn't be like me to turn down a chance to see one of the biggest Linux systems on the planet, especially since it's in my home town...
What Jet is made of
The FSL cluster (called "Jet") currently consists of 276 nodes, organized into three long banks. The nodes are unmodified, off-the-shelf Compaq Alpha systems with 667 MHz processors and 512 MB of memory. The current installation is simply the first phase of the system; the second phase, due to be deployed by late summer, will double the number of nodes. Then comes the third phase where, according to Mr. Lindahl, it "gets really big." The third phase also involves replacing the nodes currently being used, on the idea that they will be considered somewhat slow by then.
All of these nodes are tied together by a Myrinet interconnect, which is said to let every node talk to another one at full speed simultaneously. The Myrinet system, by virtue of its speed, also eliminates the need to set up complicated network topologies between the nodes. A simple topology means that users do not need to worry about which nodes their jobs are running on, which makes their lives easier. Run-time variance on this system is about 2%, a fraction of what can be encountered on clusters with complicated networking.
Computing in the atmospheric sciences deals in massive amounts of data. All the processing power in the world is of little use if the data cannot be shoveled in and out quickly enough. The Jet cluster has a subset of nodes which are charged with providing disk and tape access to keep the rest of the processors well-supplied with data to crank through. Files are stored using the proprietary CentraVision file system, which is able to sustain high bandwidths serving large files. There is also a tape store capable of holding 70 TB of data; it provides 20 drives, each of which is able to sustain 5 MB/sec. This system can move a lot of bits.
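To put the tape figures in perspective, here is a back-of-the-envelope sketch in Python. It assumes that all 20 drives stream in parallel at their sustained rate and that a terabyte is a million megabytes; neither assumption comes from FSL or HPTi, and the function names are made up for illustration. Under those assumptions the drives together move 100 MB/sec, and streaming the full 70 TB would take a little over eight days.

```python
# Rough arithmetic on the tape store figures quoted above.
# Assumptions (not from the article): all drives run in parallel at their
# sustained rate, and 1 TB = 1,000,000 MB (decimal units).

TAPE_DRIVES = 20            # drives in the tape store
MB_PER_SEC_PER_DRIVE = 5    # sustained rate per drive
TAPE_CAPACITY_TB = 70       # total capacity of the tape store

SECONDS_PER_DAY = 86_400


def aggregate_mb_per_sec(drives, rate_per_drive):
    """Combined sustained rate when every drive streams at once."""
    return drives * rate_per_drive


def days_to_stream(capacity_tb, mb_per_sec, mb_per_tb=1_000_000):
    """Days needed to read or write the whole store at the given rate."""
    seconds = capacity_tb * mb_per_tb / mb_per_sec
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    rate = aggregate_mb_per_sec(TAPE_DRIVES, MB_PER_SEC_PER_DRIVE)
    days = days_to_stream(TAPE_CAPACITY_TB, rate)
    print(f"aggregate rate: {rate} MB/sec")
    print(f"time to stream {TAPE_CAPACITY_TB} TB: {days:.1f} days")
```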
The software side
The nodes in the Jet cluster run Red Hat's Alpha distribution, almost straight out of the box. The NFSv3 patch has been applied and a module added for the Myrinet networking; it is otherwise a stock system. Low-level networking is done with MPI, using a version which has been hacked to work well with Myrinet.
The interesting software, of course, is at the higher levels. Numerical weather prediction involves dividing the world (or a subsection thereof) into many grid cells, then cranking through a number of really hairy partial differential equations on each cell. With suitably clever programming (to deal with interactions between cells), this is the sort of job that was just meant for clustered systems.
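The grid-cell idea can be sketched with a toy problem. The Python below is not MM5 or RUC, and no real model is this simple: it runs a one-dimensional diffusion step over a row of cells, first on the whole grid and then on the same grid split into chunks, one per imaginary node. All of the function names and the `alpha` coefficient are invented for the illustration. The "clever programming" shows up in `step_decomposed`, where each chunk needs only the edge cell of each neighbor. On a real cluster those edge values would travel between nodes as messages; here they are simply read from the neighboring list.

```python
# Toy illustration of splitting a grid across nodes (hypothetical names,
# not taken from any real weather model).

def step(cells, left, right, alpha=0.1):
    """Advance one diffusion time step.

    `left` and `right` are the values just outside the two ends of `cells`.
    """
    padded = [left] + cells + [right]
    return [
        padded[i] + alpha * (padded[i - 1] - 2 * padded[i] + padded[i + 1])
        for i in range(1, len(padded) - 1)
    ]


def split(cells, parts):
    """Divide the grid into `parts` contiguous chunks, one per node.

    Assumes parts <= len(cells), so that no chunk is empty.
    """
    size, extra = divmod(len(cells), parts)
    chunks, start = [], 0
    for p in range(parts):
        end = start + size + (1 if p < extra else 0)
        chunks.append(cells[start:end])
        start = end
    return chunks


def step_decomposed(chunks, alpha=0.1):
    """One time step in which each chunk sees only its neighbors' edge cells."""
    new_chunks = []
    for p, chunk in enumerate(chunks):
        # Boundary exchange: on a cluster these two values would be messages.
        # At the outer ends of the grid, reuse the chunk's own edge value.
        left = chunks[p - 1][-1] if p > 0 else chunk[0]
        right = chunks[p + 1][0] if p < len(chunks) - 1 else chunk[-1]
        new_chunks.append(step(chunk, left, right, alpha))
    return new_chunks


if __name__ == "__main__":
    grid = [0.0] * 12
    grid[5] = 100.0                      # a single warm cell
    whole, chunks = grid, split(grid, 4)
    for _ in range(10):
        whole = step(whole, whole[0], whole[-1])
        chunks = step_decomposed(chunks)
    flat = [value for chunk in chunks for value in chunk]
    print("results match:", flat == whole)
```

The split run performs the same arithmetic as the single-grid run, only in pieces, which is why the work spreads across nodes so naturally; the cost is the boundary traffic at every step, which is where a fast interconnect earns its keep.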
In the case of Jet, the software of interest includes the MM5 model produced by Pennsylvania State University and the National Center for Atmospheric Research, and the Rapid Update Cycle (RUC) model written by FSL and used by the National Centers for Environmental Prediction (NCEP). RUC is used daily for aviation weather forecasts; the Jet cluster has occasionally been called on to do RUC runs in a backup role when NCEP's systems are not functioning properly.
The business of Linux clusters
The Jet cluster is an important step toward the legitimization of commercial Beowulf clusters. The "big iron" business is a hard one to break into, and Linux-based clusters have not had the track record and high-profile deployments to be allowed to play on that field. After all, a manager who has finally gotten funded to buy a multi-million dollar system is not going to be inclined to take chances. Such people want security.
HPTi, with this deployment, has gone a long way toward providing that security. The Jet cluster is a successful system that is busy solving real problems. Linux clusters are increasingly a proven alternative for high-end computing problems; they are also a readily-available commercial product. Beowulf clusters are going to be a big business.