Approach: There are three main paradigms for solving computational fluid dynamics problems: spectral/pseudospectral, finite volume/finite difference, and particle methods. There are likewise two main paradigms for massively parallel processor (MPP) computer architectures: Single Instruction Multiple Data (SIMD) and Multiple Instruction Multiple Data (MIMD). Previously, we had developed a suite of codes to run on SIMD architectures, which favor codes in which many processors execute identical instructions in lockstep. Over the past year, we have converted and optimized these codes to run on MIMD computers such as the Cray T3D/T3E, which support codes with distinct blocks of instructions that can be executed independently and simultaneously on different processors. The goal is to develop codes whose execution speeds scale linearly with the number of processors, so as to take advantage of larger machines as they become available. Since each new generation of MPP machines can be expected to be larger and faster than its predecessor, this approach will allow us to become more ambitious in the scientific problems we address and to include increasingly complex physics packages in the codes as they are developed. Efficient use of numerical resolution is also an important part of large-scale numerical simulation. To that end, we have begun developing an adaptive mesh refinement code to extend the range of scales that we can simulate.
As the codes were developed and the machines became available, we concurrently began applying them to a variety of scientifically important questions concerning the dynamics of the solar atmosphere and the heliosphere. In particular, we are addressing the consequences of magnetic reconnection and its role in the formation of spicules, surges, and other explosive coronal events. We will also be investigating prominence formation and coronal mass ejections, coronal loop dynamics, coronal holes, the resonant absorption and phase mixing of coronal MHD waves, and the acceleration of the solar wind.
Accomplishments: The initial tasks undertaken by our team were to complete the conversion of the existing SIMD computer models to the T3D message-passing paradigm, and to optimize the codes sufficiently to attain 10 GFlops (sustained) on the 512-node T3D seymour at Goddard Space Flight Center. Our experience with both the pseudospectral and finite-volume codes demonstrated that managing the data structures and program flow to minimize data-cache misses was the most critical step in achieving optimal single-node performance on the T3D. A strategy of secondary but nontrivial importance was to minimize page faults while accessing data from main memory. Finally, the pseudospectral code needed to exploit the very fast FFTs (fast Fourier transforms) resident on each node of the machine. For this code, the strategy adopted was to keep the first and second spatial coordinates of the problem entirely resident on each processor and to distribute the third coordinate across processors. Performing the FFTs along the third coordinate then required a fast transpose to bring those data onto the processor, after which the FFTs were done and an inverse transpose returned the data to where they were needed. For the finite-volume code, we adopted a new data structure based on 4-dimensional arrays, with the second dimension indexing the working variables and the first, third, and fourth discretizing the spatial coordinates. We also minimized communication by exchanging three planes of data between neighboring processors at the beginning of each integration step; the remainder of each step was then calculated on processor with no further communication required.
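To make the finite-volume data layout and communication pattern concrete, the following sketch illustrates the idea in C. It is an illustration only, not an excerpt from FCTMHD3D: MPI is used as a generic stand-in for the Cray T3D/T3E message-passing libraries, and the routine name, variable count, and storage ordering are assumptions made for the example.

#include <mpi.h>

#define NVAR 8   /* hypothetical working variables: density, energy, velocity (3), B (3) */
#define NG   3   /* three ghost planes exchanged per integration step                    */

/* u(ix, iv, iy, iz): second index selects the working variable; the first,
   third, and fourth discretize space.  Stored Fortran-style (ix fastest,
   iz slowest), so a z-plane, and hence a block of NG ghost planes, is
   contiguous in memory.                                                    */
#define IDX(ix, iv, iy, iz, nx, ny) \
    ((((size_t)(iz) * (ny) + (iy)) * NVAR + (iv)) * (size_t)(nx) + (ix))

/* Exchange NG planes with the periodic neighbors along the distributed z
   direction.  nzloc is the number of interior z-planes on this processor;
   u holds nzloc + 2*NG planes.  After this call, the remainder of the
   integration step needs no further communication.                        */
void exchange_ghost_planes(double *u, int nx, int ny, int nzloc,
                           MPI_Comm comm)
{
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);
    int up   = (rank + 1) % size;            /* neighbor at larger z  */
    int down = (rank - 1 + size) % size;     /* neighbor at smaller z */

    size_t plane = (size_t)nx * NVAR * ny;   /* doubles per z-plane   */
    int    count = (int)(NG * plane);        /* three planes per send */

    /* lowest interior planes go to the down neighbor; the top ghost planes
       are filled from the up neighbor                                      */
    MPI_Sendrecv(u + NG * plane,                    count, MPI_DOUBLE, down, 0,
                 u + (size_t)(nzloc + NG) * plane,  count, MPI_DOUBLE, up,   0,
                 comm, MPI_STATUS_IGNORE);

    /* highest interior planes go to the up neighbor; the bottom ghost planes
       are filled from the down neighbor                                     */
    MPI_Sendrecv(u + (size_t)nzloc * plane,         count, MPI_DOUBLE, up,   1,
                 u,                                 count, MPI_DOUBLE, down, 1,
                 comm, MPI_STATUS_IGNORE);
}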
We met our 10 GFlops milestone for both of the codes on schedule, in December 1996. Further details, including scaling results, may be viewed here.
As part of the agreement negotiated between NASA and NRL for participating in this project, codes attaining the benchmarks are to be documented and posted on the World Wide Web for other researchers to access and use. Our CRUNCH3D pseudospectral and FCTMHD3D finite-volume codes were made available in February 1997.
Our third and most ambitious milestone of the year was to demonstrate 25 GFlops (sustained) on Goddard's replacement testbed computer, the 256-node Cray T3E jsimpson, and 50 GFlops on a T3E no more than twice as large. A substantial fraction of the required speedup was provided by the faster hardware, but additional changes in the software were necessary as well. In these tasks we were greatly assisted by the staff of Cray Research. For the finite-volume code, data-cache reuse was further enhanced by enlarging the first (spatial) array dimension on each processor, and data acquisition from memory was accelerated by exploiting the streaming and prefetching capabilities of the T3E processors. A radical rewrite of the fluid equation solver, the workhorse of the MHD code, provided a further substantial speedup. For the pseudospectral code, the transpose across processors used to perform the z-direction FFT was substantially improved: a new transpose was written that determines the optimal path for data transfer between processors, which in turn depends upon the number and configuration of the processors. The x and y FFTs also were improved by reducing the number of data copies, and the T3E e-registers were employed to speed up the on-processor data transpose required for the y FFT. Data acquisition from memory in this code was accelerated by using both the stream buffers and the e-registers, and fusing loops within the master timestep loop provided additional speedup. Finally, both codes now exploit the fast native SHMEM (shared memory) communications library on the T3E.
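As an illustration of the loop-fusion optimization mentioned above, the following minimal C sketch shows two sweeps over the grid inside the timestep loop being combined into one, so that each value is brought into the data cache once instead of twice. The array names and update formulas are hypothetical placeholders, not code from either production solver.

/* Before fusion: two sweeps over the grid, each streaming u and f from memory. */
void step_unfused(double *u, double *f, const double *flux, int n, double dt)
{
    for (int i = 0; i < n; i++)        /* pass 1: accumulate fluxes    */
        f[i] += flux[i];
    for (int i = 0; i < n; i++)        /* pass 2: advance the solution */
        u[i] += dt * f[i];
}

/* After fusion: one sweep; f[i] is still in cache (or a register) when it is
   used to update u[i], roughly halving the memory traffic for f.            */
void step_fused(double *u, double *f, const double *flux, int n, double dt)
{
    for (int i = 0; i < n; i++) {
        f[i] += flux[i];
        u[i] += dt * f[i];
    }
}

On a memory-bandwidth-limited processor, reducing the number of passes over the arrays in this way directly reduces the cache misses and main-memory accesses incurred per timestep, which is why fusing loops within the master timestep loop yielded additional speedup.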
We also met our 25/50 GFlops milestone for the codes on schedule, in June 1997. Further details of those results may be viewed here.
Significance: Modeling the solar corona involves many different scales of phenomena, from the very small resistive dissipation scales to the very large ideal MHD scales. This requires very high resolution codes in three dimensions. Present-day computers do not have the speed or memory to resolve all of the relevant scales. MPP architectures, potentially with teraflop speeds, will provide for the first time sufficient power to perform these simulations, and our present codes are being prepared to take advantage of those speeds. In the meantime, with the rapidly advancing computing power available, we are able to perform less ambitious but still relevant scientific studies. The research involves continuous comparison with observational data, which both helps validate the codes and provides valuable interpretation of the data. The goal is to provide sufficient understanding of solar activity that ultimately we will be able to make predictions of activity much like those routinely made for terrestrial climate and weather.
Status/Plans: We have successfully implemented our two principal codes on the Cray T3D/T3E and attained the performance milestones set for them. Our future efforts with these two powerful tools will be oriented principally toward exploring new science. We are extending our work on the reconnection of magnetic flux tubes using the improved resolution made available by the increased computing speed, and moving our previous 2- and 2.5-dimensional work on sheared coronal arcades into more realistic, 3-dimensional geometries. The two codes will also be applied to some new solar coronal problems of keen interest that previously could not be solved with the tools available to us. In the coming year we will also work on the development of a finite-volume, adaptive mesh refinement code that will allow us to extend the range of scales simulated by over an order of magnitude.
Further Information: More detailed information about our work is available through the following links: