Make Beowulf clusters more accessible by incorporating Beowulf software in standard Linux distributions. Make clustering software robust and efficient by implementing the Beowulf cluster software with appropriate modifications to the standard Linux kernel.
A Cluster Computer is not as tightly coupled, nor can it support as fine a grain of parallelism, as massively parallel processor (MPP) computers; however, a Cluster Computer provides the same unified system image and programming model. The original Beowulf was a Cluster Computer designed and built to satisfy the ESS requirement for a Gigaflop/s workstation. The Beowulf research group has since grown into a small community of researchers around the country who seek commodity-off-the-shelf (COTS) Cluster Computer solutions to their computational requirements. The activity at CESDIS centers on enhancements to the Beowulf software to provide the user with an MPP programming model.
A Cluster Computer is a dedicated resource that can be custom-designed to fit an individual's computational requirements. In addition to avoiding the obvious difficulties with a network of workstations (scheduling, robustness and availability), there are subtle differences that have a significant impact on performance. A network of workstations is designed to make a large number of interactive users more productive. In a Cluster Computer, the nodes lose their individuality, and the operating system parameters are tuned to make a single parallel program run efficiently across the entire cluster. Many of the enhancements to Linux developed at CESDIS can be thought of as contributing to making Beowulfs more MPP-like.
Linux is a POSIX-compliant operating system originally developed for the x86 architecture. It is publicly available and distributed with source for free on the World Wide Web. It is also provided at cost on CD-ROM, along with support and configuration services, by companies such as Red Hat Software, Inc., and in distributions such as Slackware. Linux has been ported to other popular commodity architectures (Alpha, MIPS, PA-RISC) and currently has an installed base of more than 3 million machines. Linux is the ideal choice for this project because it is the UNIX for PC hardware and provides an excellent technology-transfer medium.
Each of the Beowulf sites has its own particular research agenda, for example, application development, education, file system and operating system research. CESDIS has been and continues to be the center of networking activity. Providing Linux drivers for cost-effective networks has been a critical component of the widespread proliferation of Beowulf clusters. Recently, CESDIS has begun working more closely with vendors such as Myricom and Packet Engines that are developing high-performance networks compatible with PC-market hardware. Such networks will allow us to construct larger clusters and balanced clusters based on high-performance microprocessors.
CESDIS continues to maintain a leadership role in the Cluster Computing community. The Beowulf project was presented (by Donald Becker) as a keynote session at IEEE Aerospace '97, and a CESDIS/JPL/Caltech collaboration produced the "How to Build a Beowulf" tutorial, which has been given twice and is scheduled to be given twice more this fall.
In addition to these presentations and published articles, the Beowulf project has built and maintains a significant Web presence. CESDIS has started the Web-based Beowulf University Consortium to make it easier to collaborate with others in the academic Beowulf community and has established and moderates several majordomo mailing lists on Beowulf-related topics.
Recent accomplishments in the area of cluster development include remote signals, global process IDs, and the Beowulf unified system image. Beowulf global process IDs provide a cluster-wide process-ID namespace, which enables remote signals, process migration, and fast synchronization. Each node in the cluster still runs its own copy of the kernel, as does a node in a network of workstations; however, that image is obtained from a master node at boot time, unlike in a network of workstations. With respect to the kernel, the nodes are stateless and run the minimal kernel needed to support parallel activities. This unified system will make small clusters much easier to run "out of the box" and will increase the robustness of large clusters like the Beowulf Scalable Mass Storage system at Goddard Space Flight Center and other large clusters being constructed around the country.
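To make the global process ID idea concrete, the C fragment below sketches one way a cluster-wide PID could be split into a node number and a node-local PID, and how a remote signal could fall back to an ordinary kill() when the target is local. The bit layout and the names make_gpid, gpid_node, gpid_local, forward_signal and beowulf_kill are assumptions made purely for illustration; they are not the actual Beowulf kernel interface.

    /* Sketch only: a hypothetical global-PID encoding, not the real Beowulf code. */
    #include <sys/types.h>
    #include <signal.h>

    #define GPID_NODE_SHIFT 20                        /* assumed split point      */
    #define GPID_LOCAL_MASK ((1L << GPID_NODE_SHIFT) - 1)

    typedef long gpid_t;                              /* cluster-wide process ID  */

    static gpid_t make_gpid(int node, pid_t local)    /* pack node + local PID    */
    {
        return ((gpid_t)node << GPID_NODE_SHIFT) | (local & GPID_LOCAL_MASK);
    }

    static int   gpid_node(gpid_t g)  { return (int)(g >> GPID_NODE_SHIFT); }
    static pid_t gpid_local(gpid_t g) { return (pid_t)(g & GPID_LOCAL_MASK); }

    /* Placeholder: a real system would forward the request over the cluster's
     * control network to the kernel or daemon on the node owning the process. */
    static int forward_signal(int node, pid_t local, int sig)
    {
        (void)node; (void)local; (void)sig;
        return -1;                                    /* not implemented here     */
    }

    /* Deliver a signal anywhere in the cluster: locally with kill(), remotely
     * by forwarding the request to the node that owns the target process.     */
    static int beowulf_kill(gpid_t g, int sig, int my_node)
    {
        if (gpid_node(g) == my_node)
            return kill(gpid_local(g), sig);
        return forward_signal(gpid_node(g), gpid_local(g), sig);
    }

A single cluster-wide namespace of this kind is what lets a signal, a migration request, or a synchronization operation name any process on any node without first asking which node it runs on.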
Remote signals and global process IDs allow for tightly coupled implementations of the standard message-passing systems MPI and PVM. The Beowulf versions have the look and feel of the implementations on current supercomputers. The Beowulf Distributed Shared Memory system is now implemented as a runtime-loadable module. The system allows each node in the cluster to share a portion of its virtual address space.
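As an illustration of that look and feel, the short ring-passing program below uses only the standard MPI interface; code like this runs unchanged on a Beowulf cluster and on a conventional MPP, assuming nothing Beowulf-specific beyond an installed MPI library.

    /* Pass a token around a ring of MPI processes; standard MPI-1 calls only. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int rank, size, token = 0;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (size < 2) {
            if (rank == 0)
                printf("run with at least two processes\n");
            MPI_Finalize();
            return 0;
        }

        if (rank == 0) {
            /* Start the token and collect it after one full circuit. */
            MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&token, 1, MPI_INT, size - 1, 0, MPI_COMM_WORLD, &status);
            printf("token visited %d processes, final value %d\n", size, token);
        } else {
            /* Receive from the left neighbor, increment, send to the right. */
            MPI_Recv(&token, 1, MPI_INT, rank - 1, 0, MPI_COMM_WORLD, &status);
            token++;
            MPI_Send(&token, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }

With the MPICH implementation typically used on Beowulf systems, such a program would be launched across the cluster with a command along the lines of mpirun -np 16 ring.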
The success of the Beowulf workstation and the spontaneous proliferation of Beowulf clusters have demonstrated the potential of exploiting very inexpensive and widely available components for high-performance computing. The current activities, improving the unified system image and incorporating Beowulf software in the Red Hat distribution, will make the results of Beowulf software development accessible to a broader user community.
The current plan is to maintain a leadership role in the Beowulf community, encouraging the deployment of Beowulf systems through tutorials and personal collaborations. We plan to continue to be the focal point for networking. By working closely with vendors of high-performance networks, the Beowulf project will continue to ride the bow wave produced by these technologies. We plan to continue to work with Red Hat, Inc. so that the Beowulf software can be distributed as minimal modifications to the standard Linux distribution.
Dr. Phillip R. Merkey
Center of Excellence in Space Data and Information Sciences (CESDIS)
Goddard Space Flight Center
merk@cesdis.gsfc.nasa.gov
301-286-3805
http://cesdis.gsfc.nasa.gov/beowulf/