NASA
High Performance Computing
and Communications Program
Computational AeroSciences Project
LaRC/NAS IBM SP2 MetaCenter
Objective: The initial goal was to increase utilization of the IBM SP2 at NASA Langley. However, discussions led to the concept of a MetaCenter that would address the initial goal, improve throughput of the system at NAS, and leverage experience at both sites to reduce the time required for system upgrades.
Approach: The first step was to understand why the two systems differed so much in utilization. Utilization was found to be proportional to the size of each system's user base, which was 4-5 times larger at NAS than at LaRC. This imbalance arose because participants in the CRA did not request time on the LaRC system unless they were at LaRC. The second step was to examine the user programming environments on both systems and bring them closer together, so that a user could run on either system with little extra effort. LaRC was using the IBM LoadLeveler software to manage batch jobs because it also supported the interactive access required by the FIDO lab, while NAS used the Portable Batch System (PBS). Once PBS supported interactive access, LaRC switched to PBS. In addition, LaRC was current in system patches, whereas patches had not been applied uniformly at NAS. The third step was to define requirements and implement the MetaCenter in a manner that would be transparent to most users.
Accomplishments: Users of the NAS and LaRC SP2s were cross-validated. To bring both systems to the same level of system software, staff from both Centers installed a major upgrade to the system at LaRC, followed by an upgrade at NAS, thus maintaining some availability to users at both sites. A major rewrite of the PBS job scheduler lets users submit a job on either system; the job then runs on whichever system first makes the necessary resources available. Both systems were included in the call for participation for the new operational period.
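The "first system with the necessary resources" rule can be illustrated with a small sketch. This is not the PBS scheduler code. It is a minimal illustration that assumes a job needs only a node count and that each system knows when its running jobs will finish. The names System, earliest_start, and pick_system, and all node counts and times in the usage note, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class System:
    """Hypothetical view of one SP2: total nodes plus running jobs."""
    name: str
    total_nodes: int
    # Each running job is (finish_time, nodes_in_use); illustrative bookkeeping only.
    running: List[Tuple[float, int]] = field(default_factory=list)


def earliest_start(system: System, nodes_needed: int, now: float = 0.0) -> Optional[float]:
    """Earliest time at which nodes_needed nodes are free, or None if the job can never fit."""
    if nodes_needed > system.total_nodes:
        return None
    free = system.total_nodes - sum(nodes for _, nodes in system.running)
    if free >= nodes_needed:
        return now
    # Release nodes in order of job completion until the request fits.
    for finish, nodes in sorted(system.running):
        free += nodes
        if free >= nodes_needed:
            return max(now, finish)
    return None


def pick_system(systems: List[System], nodes_needed: int, now: float = 0.0) -> Optional[str]:
    """Name of the system that can start the job first (ties broken by name)."""
    candidates = []
    for system in systems:
        start = earliest_start(system, nodes_needed, now)
        if start is not None:
            candidates.append((start, system.name))
    return min(candidates)[1] if candidates else None
```

For example, with a hypothetical 48-node system at LaRC that has 8 nodes free now, and a hypothetical 160-node system at NAS that is full until time 5.0, pick_system sends an 8-node job to LaRC and a 16-node job to NAS. A production scheduler would also have to weigh queue order, wall-clock limits, and site policy, none of which this sketch models.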
Significance: This MetaCenter is the first practical example of remote cooperative computing between Centers. It has identified a number of issues that were solvable for the SP2s but that will be major problems for consolidated supercomputing in general.
Status/Plans: The MetaCenter will be operational shortly after the start of fiscal year 1997, using a common basis for validation and accounting. At that time, users will be able to access the new scheduler to improve throughput. An exhibit featuring the MetaCenter will be presented at Supercomputing '96 in Pittsburgh.
Points of Contact:
Geoffrey M. Tennille
NASA Langley Research Center
g.m.tennille@larc.nasa.gov
(757) 864-5786
Leigh Ann Tanner
NASA Ames Research Center
tanner@nas.nasa.gov
(415) 604-4306