Performance Analysis and Optimization on the UCLA Parallel Atmospheric General Circulation Modeling Code

Objective: Work with Professor Mechoso's team at UCLA to analyze and improve the computational performance of the UCLA atmospheric general circulation model (AGCM) code.

Approach: Detailed algorithmic and code-timing analyses were used to determine the parallel scalability of the code and its single-node performance bottlenecks on massively parallel systems. Efficient numerical and parallel algorithms were then implemented in place of the inefficient ones in the AGCM code. These include a parallel, load-balanced, Fast Fourier Transform (FFT)-based filter for stabilizing the finite-difference computations in the Dynamics and a load-balancing module for the Physics. Single-node performance optimizations were also pursued.
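To illustrate what an FFT-based filter of this kind does, here is a minimal Python sketch. It assumes a polar-type filter that acts on one latitude circle at a time: the row is transformed with an FFT, each zonal wavenumber is multiplied by a damping factor, and the result is transformed back. The damping function `damping`, the reference latitude `lat0`, and the round-robin helper `assign_rows` are hypothetical illustrations, not the AGCM's actual filter response or data distribution. The transform is a textbook radix-2 Cooley-Tukey FFT, so the row length must be a power of two.

```python
import cmath
import math
from typing import List


def fft(x: List[complex]) -> List[complex]:
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(X: List[complex]) -> List[complex]:
    """Inverse FFT via the identity ifft(X) = conj(fft(conj(X))) / n."""
    n = len(X)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in X])]


def damping(k: int, n: int, lat: float, lat0: float) -> float:
    """Hypothetical filter response for zonal wavenumber k on an n-point
    latitude circle (angles in radians): 1 equatorward of lat0, and < 1 for
    short waves poleward of lat0, where meridians converge."""
    if k == 0:
        return 1.0  # never touch the zonal mean
    s = math.sin(math.pi * k / n)  # same for k and n - k, so output stays real
    return min(1.0, math.cos(lat) / (math.cos(lat0) * s))


def filter_row(row: List[float], lat: float, lat0: float) -> List[float]:
    """Filter one latitude row: forward FFT, damp each wavenumber, inverse FFT."""
    n = len(row)
    spec = fft([complex(v) for v in row])
    spec = [c * damping(k, n, lat, lat0) for k, c in enumerate(spec)]
    return [c.real for c in ifft(spec)]


def assign_rows(filtered_rows: List[int], nprocs: int) -> List[List[int]]:
    """Hypothetical load-balanced assignment: deal the latitude rows that need
    filtering round-robin over all processors instead of leaving them on the
    processors that own the high latitudes."""
    return [filtered_rows[p::nprocs] for p in range(nprocs)]
```

Dealing the filtered rows out round-robin is one simple way to spread the filtering work; it balances the load only under the assumption that every filtered row costs the same, which holds when all rows have the same length.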

Accomplishments: Designed and implemented a load-balanced parallel FFT filter for the AGCM code. Using the new filter reduced the execution time of the AGCM code by 45% on 240 nodes of a Cray T3D. Developed and implemented an efficient load-balancing scheme for the Physics part of the AGCM code. Simulations of the scheme with two passes indicate that it reduces the Physics load imbalance from 48% to 6% on 252 nodes of the Cray T3D, which would reduce the AGCM execution time by a further 10 to 15%. Several strategies for improving single-node performance were tried on selected parts of the AGCM code; results obtained so far suggest that further reductions in execution time of 25 to 35% are possible.
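The load-imbalance figures above depend on how imbalance is measured and how work is moved between processors. The Python sketch below assumes that imbalance is defined as (max - mean) / max of the per-processor loads and that Physics work can be moved as whole, independent columns. Each pass pairs the i-th lightest processor with the i-th heaviest and ships columns from the heavier to the lighter while doing so narrows the gap; running it twice mirrors the idea of a two-pass scheme. The functions `imbalance`, `balance_pass`, and `balance` and the column costs are hypothetical, not taken from the AGCM code.

```python
from typing import List


def imbalance(loads: List[float]) -> float:
    """Load imbalance, defined here (by assumption) as (max - mean) / max."""
    peak = max(loads)
    mean = sum(loads) / len(loads)
    return (peak - mean) / peak if peak > 0 else 0.0


def balance_pass(columns: List[List[float]]) -> None:
    """One pass over per-processor column costs (modified in place): pair the
    i-th lightest processor with the i-th heaviest and move whole columns from
    heavy to light while each move narrows the gap between the two."""
    order = sorted(range(len(columns)), key=lambda p: sum(columns[p]))
    for i in range(len(order) // 2):
        light, heavy = order[i], order[-1 - i]
        while columns[heavy]:
            gap = sum(columns[heavy]) - sum(columns[light])
            # Moving a column of cost c changes the gap from gap to gap - 2c,
            # so the best single move is the column closest to gap / 2.
            best = min(columns[heavy], key=lambda c: abs(gap - 2 * c))
            if abs(gap - 2 * best) >= gap:
                break  # no single column move narrows the gap further
            columns[heavy].remove(best)
            columns[light].append(best)


def balance(columns: List[List[float]], passes: int = 2) -> List[float]:
    """Apply several pairwise passes and return the final per-processor loads."""
    for _ in range(passes):
        balance_pass(columns)
    return [sum(c) for c in columns]
```

For example, with per-processor column costs `[[5, 5, 5, 5, 4], [1, 1], [3, 3, 2], [2, 2, 2, 2]]` (loads 24, 2, 8, 8), the imbalance under this definition is 0.5625; one pass gives loads 14, 12, 8, 8 (imbalance 0.25), and two passes give 10, 10, 12, 10 (imbalance 0.125). These toy numbers are unrelated to the T3D results reported above.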

Significance: General circulation models (GCMs) of the atmosphere are among the most powerful tools available for studying the climate system. GCM simulations are also among the most computationally expensive scientific applications: at every time step, a large number of three-dimensional fields must be updated by solving systems of partial differential equations, and this must be repeated over a long simulation period. It is therefore important to make the AGCM code as computationally efficient as possible so that long-term simulations can be completed within acceptable time frames. The experience and lessons learned from this work are also useful for optimizing ocean and atmospheric/ocean chemistry codes.

Status/Plans: The implementation of a one-pass load-balancing scheme for the Physics is in its final stage. A two-pass scheme will be implemented in the near future. Work on single-node performance optimizations is still in progress.

Point of contact: John Z. Lou, Jet Propulsion Laboratory, (818) 354-4870, lou@acadia.jpl.nasa.gov