Donald Becker's keynote
Here is a brief summary of his talk.
Most people know me from my work on the Linux kernel, particularly in the area of network driver support. What not everyone knows is that the reason I did so much work for the Linux kernel was to support high performance clustering projects, first for NASA and then, in 1998, for Scyld Computing, the company that I started.
The slowdown of the US economy has been very sudden. I think it is a good thing that Linux is no longer the hot topic; it is just an accepted part of what people do. We'll move forward more successfully now.
Ten years ago, vector supercomputers ruled the supercomputing world; the idea of Linux challenging those platforms would have been laughable. Today it is an accepted fact, and I consider Linux one of the two most popular operating systems.
We succeed just by existing, by changing the commercial model of computing. We have created an Open Source infrastructure world.
Commodity supercomputers are numerous and growing. They are likely to be the primary supercomputing platform for the next five years. Most of them are Linux-based and most of them are Beowulfs. We hope that they will be Scyld Beowulfs.
Beowulf is a trademark controlled by Linux International. It is defined as scalable performance clusters based on commodity hardware, Open Source infrastructure software, and a private system network.
Linux has been dismissed by some as being a "Unix clone", as if that were a bad thing. I think it is a good thing. Unix was well designed and has strong academic roots. Unix has been around for thirty years and has lasted. Don't pick an unproven model to change the entire world. Model on a successful system.
Beowulf, while based on previous ideas, is clearly a significant departure from what has been done in the past. A reimplementation, a rethink, but one that presents the same architectural model.
Scyld Beowulf is a cluster operating system. It is a CD intended for deploying cluster applications on 32 to 64 nodes. We are past research and development; we are focused on deployment. It will work with smaller clusters or larger clusters, but this is the sweet spot in the market. We focus on integration and administration, to make it easy to use. We released the full product at LinuxWorld New York in January.
The model for Scyld Beowulf is install once, execute everywhere. It only requires an install on the master server of the cluster. This eliminates the risk of version skew, which was one of the largest problems, if not the largest, found in early clusters. With 32 copies of Linux, upgrading one machine or missing one machine during an upgrade could cause problems; the bigger the cluster, the bigger the problem. Workarounds existed, but they were clumsy. This was the number one administration problem.
Scyld Beowulf uses an installer based on Anaconda, which comes from Red Hat. We chose Red Hat 6.2 as the base because it is the most commonly used distribution out there.
The slaves don't get a full install; they only need about a 1MB install. We don't give them each a name; instead we use the MAC address, since it already exists as a unique identifier. A slave broadcasts its MAC, and the master chooses whether or not to include it.
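The talk did not give wire-level details, but the admission idea described above can be sketched. Everything below (the whitelist, the node-numbering scheme) is an illustrative assumption, not Scyld's actual protocol:

```python
# Illustrative sketch only, not Scyld's actual protocol: a slave
# announces its MAC address, and the master decides whether to admit
# it, using the MAC as the node's unique identifier.

# Hypothetical master-side whitelist of known slave MACs.
ACCEPTED = {"00:a0:c9:12:34:56", "00:a0:c9:12:34:57"}

def admit(mac: str) -> bool:
    """Master-side decision: admit a slave iff its MAC is known."""
    return mac.lower() in ACCEPTED

def assign_node_number(mac: str, roster: dict) -> int:
    """Map an admitted MAC to a small node number, first come, first served."""
    if mac not in roster:
        roster[mac] = len(roster)
    return roster[mac]

roster = {}
for mac in ["00:A0:C9:12:34:56", "00:a0:c9:12:34:57", "de:ad:be:ef:00:01"]:
    if admit(mac):
        print(mac, "-> node", assign_node_number(mac.lower(), roster))
    else:
        print(mac, "rejected")
```

The point of keying on the MAC rather than a hostname is that the identifier already exists in hardware; no per-slave configuration is needed.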
Network booting is ideal, but we need to support more than one kind of boot. We support booting over Myrinet and from SCSI disk, CDROM, floppy, hard drive, flash disk, LinuxBIOS, and PXE. These are all acceptable BeoBoot boot media. That may look like a tiny detail, but people cared strongly about how their cluster booted.
LinuxBIOS is a project from Ron Minnich at Los Alamos National Labs. It replaces the computer's BIOS with a tiny copy of Linux. He is working directly with hardware manufacturers, so someday we may buy machines with LinuxBIOS installed. People laughed at him initially, but at the Extreme Linux forum this year, LinuxBIOS was simply a given. Of course Linux would be in the BIOS. In a couple of years, you'll be able to ask anyone and they'll say, "Of course Linux makes an excellent BIOS; what is new about that?".
Ron adopted the Scyld stage 1 boot image for use inside the BIOS. It is available now from LinuxLabs. It can boot an x86 in less than 20 seconds, and an Alpha in 3 seconds. Of course, you have to be careful; disks don't spin up that fast, so there must be checks in place to prevent problems. You can get a reliable network boot without any moving parts. That is an example of how Scyld's work has influenced other areas of Linux, in this case the embedded market.
Two Kernel Monte
Scyld Beowulf also uses Two Kernel Monte. First one kernel boots, then another is loaded into memory; finally, the system swaps kernels while already running. This essentially allows a reboot without reading from the disk. It is a cute, clever mechanism.
Phase 2 Boot
Phase 2 of the Scyld Beowulf boot is just a RAM disk with cached libraries, a few modules, and a few binaries. It doesn't touch any peripherals.
BProc
BProc, the Beowulf Process Space, is the kernel mechanism at the core of our system. It is a modification to the Linux kernel: we add a little magic to make clustering work. The unified process space allows us to use basic tools like ps and top across the cluster, so the cluster looks like a large SMP. There is no overhead on the slave node to use BProc; it is just a kernel mechanism.
The hard work wasn't in creating BProc, but in continuing to simplify the slave systems so that everything is done from the front end and the cluster can look like one big SMP.
[Showed slide of running top across a cluster]. There are nine jobs displayed, each using 95% of a CPU on average, yet the total CPU usage reported is very low. The only process that reports memory usage is the one running on the master. This is very low cost: a 32-way SMP would cost a few hundred thousand dollars, while a comparable Beowulf costs ten to twenty thousand dollars, depending on the performance of the hardware chosen.
A few new commands have been added, including bpsh (like rsh/ssh) and bpcp (like scp). These provide a simple command interface for distributing work across the cluster. Aggressive library caching is used on the slave systems; this will eventually become dynamic.
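The fan-out pattern these commands enable — run one command on many nodes and collect the results — can be sketched as follows. Since bpsh is Scyld-specific, the runner below executes the command locally as a stand-in; on a real Scyld cluster each invocation would be something like "bpsh <node> <cmd>". All names here are hypothetical:

```python
# Rough sketch of bpsh-style fan-out (hypothetical names throughout).
# run_on_node() executes locally as a stand-in for per-node dispatch.
import subprocess

def run_on_node(node: int, cmd: list) -> str:
    """Stand-in for 'bpsh <node> <cmd>': run locally, tag with node id."""
    out = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return f"node {node}: {out.stdout.strip()}"

def fan_out(nodes, cmd):
    """Run the same command on every node and collect tagged output."""
    return [run_on_node(n, cmd) for n in nodes]

for line in fan_out(range(3), ["echo", "hello"]):
    print(line)
```

The appeal of the rsh-like interface is that existing shell habits carry over: the node number simply becomes another argument.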
Donald emphasized that Scyld Beowulf clusters do not solve the parallel programming problem, just the end user problem.
Users can choose any kind of filesystem they want. Some people assume Scyld uses NFS; that would be too limiting. A slave runs two processes: one to multicast its status and one to run the job assigned to it. You can use whatever parallel or network file system you wish: GFS, PVFS, Coda, InterMezzo, NFSv3, etc.
beosetup initially sets up the cluster. It is a GUI tool that can also do power management: it can power off unused machines and power machines back on when larger jobs are scheduled. This functionality will be integrated into the scheduler in the future, so that the scheduler can power systems on and off without manual intervention.
Future Development
Plans for future development include:
Scyld Beowulf supports both Intel and Alpha platforms. Intel platforms account for approximately 95% of the installed base; however, Alpha systems account for more than 30% of Scyld's revenue. Apparently the more money spent initially on hardware, the more money the customer is willing to spend on support. The Alpha is a 64-bit processor, which helps with large file support. Currently, they have about a thousand installs deployed, mostly 16-processor systems. They expect commercial installs to be 32 processors. All processors must have the same architecture. They have tested support for up to 250 nodes on Fast Ethernet, but recommend no more than 100 nodes.
Copyright Eklektix, Inc. All rights reserved.
Linux ® is a registered trademark of Linus Torvalds