NASA
High Performance Computing
and Communications Program
Computational AeroSciences Project
An Efficient Parallelization Procedure for Multi-block CFD Codes
Objective: Develop an efficient procedure for parallelization of multi-block computational fluid dynamics (CFD) codes.
Approach: The CFD code Computational Fluids Lab 3 Dimensional (CFL3D) has been used extensively by research institutions, universities, and industry as a reliable computational tool for analyzing compressible fluid flow. Until recently, this sequential code was employed primarily on CRAY supercomputers to analyze complicated flow problems requiring millions of grid points. To utilize the available high performance resources, a parallel version of CFL3D was developed. Multi-block codes such as CFL3D are structured to readily incorporate coarse-grain parallelism, wherein one or more blocks of the global grid are assigned to each processor of a workstation cluster or parallel computer. More than one block can reside on a processor, but coarse granularity implies that a block cannot be split among multiple processors.
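The coarse-grain assignment described above can be illustrated with a minimal sketch. It assumes a simple greedy heuristic (largest block first, placed on the least-loaded processor) and takes the grid-point count of a block as a stand-in for its computational cost. The function names `assign_blocks` and `processor_loads` are hypothetical; the source does not say how CFL3D actually maps blocks to processors.

```python
import heapq


def assign_blocks(block_sizes, num_procs):
    """Assign whole blocks to processors (hypothetical greedy heuristic).

    block_sizes: grid-point count of each block, indexed by block number.
    Returns a dict mapping processor number to a list of block numbers.
    A block is never split, which is what coarse granularity requires.
    """
    # Min-heap of (current load, processor number).
    heap = [(0, p) for p in range(num_procs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(num_procs)}

    # Visit blocks from largest to smallest.
    order = sorted(range(len(block_sizes)),
                   key=lambda b: block_sizes[b], reverse=True)
    for b in order:
        load, p = heapq.heappop(heap)      # least-loaded processor so far
        assignment[p].append(b)
        heapq.heappush(heap, (load + block_sizes[b], p))
    return assignment


def processor_loads(block_sizes, assignment):
    """Total grid points held by each processor."""
    return {p: sum(block_sizes[b] for b in blocks)
            for p, blocks in assignment.items()}
```

For example, four blocks of 400, 300, 200, and 100 grid points on two processors are assigned as blocks 0 and 3 to processor 0 and blocks 1 and 2 to processor 1, giving each processor 500 grid points.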
Accomplishments: The parallel version of CFL3D retains almost all of the original features of the sequential code, yet yields significant speedups when used in a high performance environment. Linear speedups, as shown in figure 1, are approachable if the biggest blocks can be partitioned to allow better load (and data) balancing. Because data distribution is necessary and inevitable for realistic problem sizes, information is exchanged between processors via messages using the Message Passing Interface (MPI) protocols. The primary considerations of the parallelization task were: a) minimal code changes from the original sequential version, and b) the capability to generate sequential, distributed, and parallel versions from one code using simple compiler directives.
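The dependence of speedup on the biggest block can be made concrete with a rough estimate. The sketch below assumes that run time is proportional to the number of grid points on the most heavily loaded processor and that communication cost is negligible; both are simplifying assumptions, not measurements from CFL3D. The function name `coarse_grain_speedup_bound` is hypothetical.

```python
def coarse_grain_speedup_bound(block_sizes, num_procs):
    """Upper bound on speedup when blocks cannot be split.

    Assumes cost is proportional to grid points and ignores communication.
    The busiest processor holds at least the largest block and at least
    an equal share of the total work.
    """
    total = sum(block_sizes)
    ideal_load = total / num_procs
    bottleneck = max(max(block_sizes), ideal_load)
    return total / bottleneck
```

Under these assumptions, four equal blocks of 250 grid points on four processors allow a speedup of 4, whereas blocks of 600, 200, 100, and 100 grid points allow at most 1000/600, or about 1.67, because the 600-point block must stay on one processor.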
Significance: Coarse-grain parallelism makes it easier to simulate compressible flows with CFD when the problem requires multiple blocks totaling millions of grid points. This work demonstrates that linear speedups are approachable if the biggest blocks can be partitioned to allow good load (and data) balancing.
Status/Plans: Future work will include a study of the feasibility of incorporating fine-grain parallelism, wherein one block may be shared between two or more processors, into CFL3D. For a problem containing non-uniform blocks, such a procedure will split the biggest block between two or more processors, enabling better load balancing. A judicious combination of fine- and coarse-grain parallelism will be required to tackle the complex fluid-flow problems that will engage scientists and researchers in the years ahead.
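The effect of splitting the biggest block can be sketched as follows. This is an illustration only: it assumes a block can be halved by grid-point count and ignores the extra interface points and communication that a real split along a grid index would introduce. The function name `split_large_blocks` and the stopping rule (split until no block exceeds an equal share of the total) are hypothetical choices, not the planned CFL3D procedure.

```python
import heapq


def split_large_blocks(block_sizes, num_procs):
    """Halve the largest block until none exceeds total / num_procs.

    Returns the resulting block sizes, largest first.
    """
    target = sum(block_sizes) / num_procs
    # heapq is a min-heap, so store negated sizes to pop the largest.
    heap = [-s for s in block_sizes]
    heapq.heapify(heap)
    while heap and -heap[0] > target and -heap[0] > 1:
        largest = -heapq.heappop(heap)
        half = largest // 2
        heapq.heappush(heap, -half)
        heapq.heappush(heap, -(largest - half))
    return sorted((-s for s in heap), reverse=True)
```

With blocks of 600, 200, 100, and 100 grid points and four processors, the equal share is 250 grid points. The 600-point block is split into four 150-point pieces, leaving blocks of 200, 150, 150, 150, 150, 100, and 100 grid points, none of which exceeds the share.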
Contact:
Dr. Chen-Huei Liu
NASA Langley Research Center
Mail Stop 128
c.liu@larc.nasa.gov
(757) 864-2154