
Top 5 ESS Project Accomplishments
1) The 9 ESS Round-2 Investigator Teams in collaboration
with SGI/Cray achieved aggressive performance milestones negotiated into
their Cooperative Agreements; SGI/Cray installed a 512 processor CRAY T3E
in support of the Investigators; ESS Guest Investigators were chosen and
their codes moved to the T3E
In Round-2 (FY96-99) ESS signed Cooperative Agreements worth $12.6M with
9 Grand Challenge Teams to achieve Project milestones of 10, 50, and 100
Gigaflop/s sustained on their scientific codes, and work began in August
1996. ESS also signed a $13.2M Cooperative Agreement with SGI/Cray to place
a large scalable parallel Testbed at GSFC, primarily to support the research needs
of the Round-2 Investigators, but also to assist in transitioning the broader
NASA science community to parallel computing and support research of the
HPCC Computational Aerosciences (CAS) Project. Both the Round-2 Investigators
and the Testbed were acquired through a single Cooperative Agreement Notice
(CAN-21425/041) structured to incentivize strong collaboration between the
Testbed vendor and the Round-2 Investigators to meet aggressive ESS performance
milestones using Investigator codes. All payments under the Round-2 Cooperative
Agreements are tied to achievement of 117 negotiated milestones. At the
end of FY97, all of the 9 ESS Round-2 Grand Challenge Teams had achieved
10 Gigaflop/s sustained performance on their code(s) as negotiated, 7 had
submitted these codes to the National HPCC Software Exchange, and 3 had
achieved 50 Gigaflop/s sustained. Extensive information about the ESS Project
is on the WWW at: http://sdcd.gsfc.nasa.gov/ESS/
a) By
March, all 9 Round-2 Teams had achieved 10 Gigaflop/s sustained performance
on their code(s), had their milestone submissions validated, and received
payment. Technical support from Cray Research played a key role in many instances,
and several Team achievements received national press coverage.
- By September, 7 of the 9 Round-2 Teams had submitted their documented
10 Gigaflop/s performance code to the National HPCC Software Exchange (NHSE).
These 7 are the teams of G.Carey, P.Olson, P.Lyster, A.Malagoli, J.Gardner,
T.Gombosi, and P.Saylor. Milestone progress can be followed at http://sdcd.gsfc.nasa.gov/ESS/can.milestones.htm.
b) By
September, three Round-2 Teams had achieved their 50 Gigaflop/s performance
milestone.
- MGFLO, the microgravity code developed by the PI Team led by G.Carey/University
of Texas at Austin, ran at 51.6 Gigaflop/s on the 512-processor T3E at
Goddard Space Flight Center (GSFC), even though the Cooperative Agreement
allowed use of a computer twice that size. Using time on the UKMET T3E-900
(450 MHz), the code was verified running at 71.3 Gigaflop/s on 680 nodes.
Both cases were run in full 64-bit precision.
- Demonstrated 60 Gigaflop/s sustained with FCTMHD3, a flux-corrected transport
MHD code from the J.Gardner/Naval Research Laboratory Team. This performance
was attained using 680 nodes of the UKMET T3E-900. A 32-bit version of
this code achieved 50 Gigaflop/s on the testbed at GSFC. Further study
will determine whether the reduced precision gives acceptable scientific results.
In both cases, significant performance gains were made by fusing loops
to increase cache re-use (a generic loop-fusion sketch appears at the end of this item).
- Achieved 86.8 Gigaflop/s with the P.Olson/Johns Hopkins University
Team's TERRA code on 1,024 nodes of the T3E-900. The code had not previously
been run on a processor partition of this size, which allowed a problem resolution
not previously envisioned by the developer.
- The accomplishments cited above enabled ESS to achieve two NASA
HPCC Level-1 Program Commitment Agreement (PCA) milestones, both in
the same month, June. These milestones are:
- GC4 "Demonstrate integrated, multidisciplinary applications on
TeraFLOPS scalable testbeds," which required that codes from two teams
achieve 50 Gigaflop/s; and
- CT4 "Install 50-100 GFLOPS sustained scalable testbed," which
required achievement of 50 Gigaflop/s sustained performance on the 512
processor CRAY T3E at GSFC.
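The loop fusion cited above is a standard cache optimization: two passes over the
same arrays are merged into one so that data already in cache is reused rather than
streamed from memory twice. The fragment below is a generic illustration in C only;
it is not taken from the MGFLO or FCTMHD3 sources, and the array names and operations
are hypothetical.

    #include <stddef.h>

    /* Before fusion: two separate sweeps over the same arrays.
       The arrays are streamed through the cache twice. */
    void unfused(const double *a, double *b, double *c, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            b[i] = 2.0 * a[i];
        for (size_t i = 0; i < n; i++)
            c[i] = b[i] + a[i];
    }

    /* After fusion: one sweep.  a[i] and b[i] are still in cache
       (or registers) when the second statement needs them, so each
       array is read from memory only once. */
    void fused(const double *a, double *b, double *c, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            b[i] = 2.0 * a[i];
            c[i] = b[i] + a[i];
        }
    }

Avoiding the second sweep through memory is typically where the gain comes from on
cache-based processors such as the T3E's Alpha nodes.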
c) In
April, ESS selected the first group of Guest Investigators to use the 20
percent of the CRAY T3E allotted to investigations drawn from the breadth
of NASA science, with the aim of preparing that community to use scalable parallel systems.
- Following a panel review, 14 large allocations were decided and announced
publicly April 21 for the four quarters beginning April '97, July '97,
October '97, and January '98. A total of 106,250 CPU hours was allocated
for the first 6-month period, leaving 124,750 CPU hours in reserve. Healthy
reserves were maintained to enable ESS to rapidly augment allocations to
those selected investigators who are achieving proposed goals and require
additional machine time. The work to be performed under these allocations
fell under the following NASA science programs: Code-M (1), Code-Y (4),
Code-SZ (4), Code-SS (4), and Code-SL (1). The 14 PIs and 11 Co-Is are
located at 7 universities, 3 NASA Centers, and 1 commercial firm. GSFC
has four Science Laboratories represented. Additional proposals will be
selected via a similar process every 6 months.
- The Guest Investigator proposals receiving large allocations were provided
to SGI/Cray staff, who identified 10 codes to bring over to the CRAY T3E.
By assisting these investigations, SGI/Cray subsequently met, and was paid
for, the milestone "Demonstrate 10 codes from the broader community on the
T3E."
- Through an Ames Research Center (ARC)-issued solicitation, initial allocations
were made from the 15 percent of ESS CAN Testbed resources reserved for
NASA's computational aerosciences community. ARC selected 16 proposals and
announced allocations in May totaling 240,000 processor hours. ESS created
accounts on the T3E for these Investigators.
- By the end of September, 23 proposals had been received in response
to the call for the second group of Guest Investigator proposals (the deadline
had been extended by a month from August 27 to September 29). Ten are from
universities and 13 are from NASA Centers. The allocation process is expected
to be completed by early November.
d) In
October, March, and June, SGI/Cray installed at GSFC a succession of three
increasingly capable Testbed systems in support of ESS Round-2 Investigators,
the most powerful being a 512-processor CRAY T3E
- In October, SGI/Cray installed a 512-processor CRAY T3D system at
GSFC. The system was integrated into NCCS user accounting, networking,
and operations. CAN PI teams received user accounts and began using the
T3D, named seymour (in memory of Seymour Cray). This system was listed
as #30 on the TOP500 List of supercomputers in the world.
- In December, SGI/Cray's new CAN Applications Support staff members
began work at GSFC.
- In March, SGI/Cray installed a 256 processor CRAY T3E system at GSFC.
This system was installed with 32 gigabytes of memory and 480 gigabytes
of disk, and had a peak performance level of 153 Gigaflop/s. Based on the
November 1996 list of the world's most powerful computing systems produced
by Netlib, which uses the LINPACK benchmark, this system ranked 5th in the
U.S. and 11th in the world. A formal dedication ceremony was held May 14.
- In June, SGI/Cray upgraded the GSFC CRAY T3E to 512 processors (300
MHz) in support of the ESS Grand Challenge applications. This system, which
has a peak performance level of 306 Gigaflop/s, contains 64 Gigabytes of
memory and 480 Gigabytes of disk. This configuration is 14 percent more
powerful and contains 33 percent more memory and memory bandwidth than
Cray's originally proposed 384-processor T3E (268 Gigaflop/s peak and 48
Gigabytes of memory). This T3E system must achieve 25 Gigaflop/s sustained
performance on each of 9 ESS Grand Challenge Team application codes by
early FY98 under the terms of the Cooperative Agreement GSFC signed with
SGI/Cray. This T3E ranks 1st in NASA and 6th in the world according to
the June 19, 1997, TOP500 list authored by Jack Dongarra and others, which
uses the LINPACK Benchmark for performance comparison.
- In early July, SGI/Cray staff ran MGFLO, the parallel microgravity
code developed by the ESS PI Team led by G.Carey/University of Texas at
Austin, on 1,024 nodes of a T3E-900 located at a Cray facility in Minnesota.
The code solved a steady state surface-tension driven flow at a rate of
112 Gigaflop/s. This is the same software package and problem that ran
at 51.6 Gigaflop/s on the 512 node T3E-600 at GSFC in June and at 16.5
Gigaflop/s on the 512 node CRAY T3D at GSFC in March. The MGFLO code had
a 90 percent scaling efficiency from 2 nodes of a T3E to 1,024 nodes of
a T3E-900. This achievement demonstrates the scalability of both the solver
and the hardware.
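Parallel scaling efficiency is conventionally defined as measured speedup divided
by ideal speedup. The sketch below applies that standard definition; the 2-node
baseline rate behind MGFLO's reported 90 percent figure, and how the T3E-600/T3E-900
clock difference was normalized, are not given in this report, so the numbers in the
example are hypothetical.

    #include <stdio.h>

    /* Scaling efficiency from two measured rates (e.g., in Gigaflop/s):
       efficiency = (rate_large / rate_small) / (nodes_large / nodes_small).
       A value of 1.0 means perfectly linear scaling. */
    static double scaling_efficiency(double rate_small, int nodes_small,
                                     double rate_large, int nodes_large)
    {
        double measured_speedup = rate_large / rate_small;
        double ideal_speedup    = (double)nodes_large / (double)nodes_small;
        return measured_speedup / ideal_speedup;
    }

    int main(void)
    {
        /* Hypothetical rates chosen only to illustrate a roughly 90 percent
           efficiency over a 2-node to 1,024-node range. */
        printf("%.2f\n", scaling_efficiency(0.25, 2, 115.0, 1024));
        return 0;
    }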
e) In June, the ESS Project was presented to the 1997 (4th) HPCC Independent
Annual Review (IAR) Panel, which, following the review, praised ESS for the
performance-based contracting aspects of the Cooperative Agreements, the
excellent progress being made toward the Program Commitment Agreement Level-1
milestones, and the science results from the Investigator Teams.
3) Beowulf Parallel Linux technology increased
in capability and popularity
- In October, a gravitational N-body simulation, using a tree-code developed
by J.Salmon/Caltech, was ported to the Caltech/JPL
16-processor Beowulf and achieved a sustained performance of 1.26 Gigaflop/s
on a 10-million-particle simulation, on hardware costing approximately
$50K.
- In November, ESS submitted Beowulf
as a potential nominee to Discover Magazine's Computer Innovation Awards
and the Awards organizer decided to nominate it.
- Beowulf systems from Los Alamos National Lab and Caltech,
brought to Supercomputing '96 in November, were joined together into a
32-processor Beowulf (worth around $100K) on the exhibit floor and ran
Warren/Salmon tree code problems at around 2.2 Gigaflop/s.
- The December 13 issue of Science carried an article entitled "Do-It-Yourself
Supercomputers" that presents the Beowulf
project and its recent price/performance breakthrough.
- In a joint effort with the Caltech
Beowulf team, a "How to build a Beowulf"
tutorial was developed. The 4-hour tutorial was given at the Cluster Computing
Conference (CCC97) held in March at Emory University in Atlanta and at
the Hot Interconnects Conference at Stanford University. This series of
lectures included demonstration of a complete eight node Beowulf Pile of
PCs running an N-body simulation. Hardware and software assembly, installation,
and use were discussed, with numerous demonstrations. These tutorials were
extremely well received, and similar tutorials are planned for Caltech
in October and again at SC97 in San Jose in November.
- D.Ridge and D.Becker/CESDIS worked with RedHat Software to produce
multi-platform CDs of Beowulf
Linux. This year's CD (RedHat 4.2) includes Global Process ID Space
system software, which brings tools for parallel computing to the mainstream
for the first time. RedHat is shipping 300,000 CD versions of this software
monthly. Other Parallel Linux enhancements are forthcoming as RedHat packages.
NASA and CESDIS are fully identified on the CD, which is available at CompUSA
and other retail stores.
- The OSU MPI homepage (the "official" homepage for the LAM
MPI implementation) points to CESDIS for the RedHat
Linux MPI package (a minimal example of the kind of MPI program these tools
support appears at the end of this section).
- Planned the first NASA-wide meeting on PC cluster computing, the NASA
Beowulf-class
Clustered Computing Workshop, to be held in Pasadena October 22-23, 1997.
The workshop involves workers from Goddard, JPL, and Ames. Tom Cwik/JPL
is the Workshop chair, and Thomas Sterling/JPL/Caltech is the Program Chair.
- Wiley Publishers has shown interest in a book about Beowulf-class
computing to provide a single source of documentation across the diverse
topic areas related to the implementation and application of these systems.
An outline and schedule for a proposal have been agreed upon.
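As referenced in the MPI item above, the fragment below is a minimal sketch of the
kind of message-passing program these Linux MPI packages support. It uses only core
MPI-1 calls and is not taken from any of the codes described in this report; on a
LAM installation it would typically be built with mpicc and launched across the
cluster nodes with mpirun.

    #include <mpi.h>
    #include <stdio.h>

    /* Minimal MPI program: every process reports its rank and the
       total number of processes in the parallel job. */
    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();
        return 0;
    }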
Points of Contact
James R. Fischer
Goddard Space Flight Center
James.R.Fischer.1@gsfc.nasa.gov
301-286-3465
Robert Ferraro
Jet Propulsion Laboratory
ferraro@zion.jpl.nasa.gov
818-354-1340