Objective: Our commercial customers have asked for lower wall clock times for CFD in a design environment. We are using the SP-2 to meet this need.
Approach: When Hans Mark (1979) predicted the demise of wind tunnels at the hands of Computational Fluid Dynamics (CFD), he probably did not foresee how long that would take. One area in which wind tunnels still excel is parameter studies. Once a model is in the tunnel, measurements can be taken very rapidly; for example, an angle of attack sweep might take only a minute. In contrast, even a 2-D CFD run for a multi-element airfoil (INS2D) might take 10 minutes on a single processor of a Cray C-90. An angle of attack sweep would then require about 2.5 hours of CPU time on the Cray, for which one might have to stand in line for most of a day. This limits the effectiveness of CFD in a design role.
By running each angle of attack on a separate set of processors, it is possible to get 16-fold parallelism with very little parallel overhead. Because a single SP-2 processor runs INS2D at about 25% of the speed of a C-90 processor, one case takes about 40 minutes, so this alone reduces the wall clock time for an entire angle of attack sweep to 40 minutes on 16 processors. With other applications, e.g. multipoint design optimization, the general idea can be extended to make full use of all 160 processors of the SP-2. Instead of parallelizing the flow solver, we have found it more profitable to parallelize an aspect of the design or optimization process (see figure).
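The idea of farming out independent cases can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `solve_case` is a hypothetical stand-in that returns the thin-airfoil lift coefficient 2*pi*alpha so that the sketch runs (it is not INS2D), `run_sweep` and the 16 angles are invented for the example, and a thread pool stands in for the separate SP-2 processors.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def solve_case(alpha_deg):
    """Hypothetical placeholder for one independent flow solution (NOT INS2D).

    Returns the thin-airfoil lift coefficient 2*pi*alpha (alpha in radians)
    purely so that the sketch produces a number.
    """
    return 2.0 * math.pi * math.radians(alpha_deg)


def run_sweep(angles_deg, max_workers=16):
    """Dispatch every angle of attack as an independent task.

    The cases never communicate, so the only coordination is gathering the
    results; pool.map returns them in the same order as the inputs.
    """
    angles = list(angles_deg)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        lift = list(pool.map(solve_case, angles))
    return dict(zip(angles, lift))


if __name__ == "__main__":
    # 16 hypothetical angles of attack, 0 to 15 degrees, one per worker.
    for alpha, cl in run_sweep(range(16)).items():
        print(f"alpha = {alpha:2d} deg   cl = {cl:.3f}")
```

In a real sweep each task would launch a full solver run on its own processor (for instance through `ProcessPoolExecutor` or a batch system) rather than evaluate a formula, but the structure of the dispatch would stay the same.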
Accomplishment: INS2D has been ported to a single processor of the SP-2, where it runs at about 31 Mflops, or 25% of the speed of a C-90 processor. Although the algorithm has not been modified, implementation changes made for efficiency on the SP-2 have also, unexpectedly, improved efficiency on the C-90 and on the IRIS Indigo. An angle of attack sweep has been performed on 16 processors of the SP-2, each sustaining the same 31 Mflops, for a gross speed of about 0.5 Gflops on this small configuration.
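The arithmetic behind these figures can be checked with a short sketch. The constants are taken from the text; the function names and the simple model (each case occupies one processor, and cases run in full waves when there are more cases than processors) are assumptions made for illustration, not measurements.

```python
import math

C90_CASE_MINUTES = 10.0     # one INS2D case on one C-90 processor (from the text)
SP2_RELATIVE_SPEED = 0.25   # one SP-2 processor relative to one C-90 processor
SP2_MFLOPS = 31.0           # sustained rate of one SP-2 processor


def sweep_wall_clock_minutes(n_cases, n_processors):
    """Estimated wall clock time for a sweep, one case per processor.

    Assumes perfectly independent cases and no parallel overhead.
    """
    case_minutes = C90_CASE_MINUTES / SP2_RELATIVE_SPEED
    waves = math.ceil(n_cases / n_processors)
    return waves * case_minutes


def aggregate_gflops(n_processors):
    """Gross rate when every processor sustains the single-processor rate."""
    return n_processors * SP2_MFLOPS / 1000.0


if __name__ == "__main__":
    print(f"sweep time: {sweep_wall_clock_minutes(16, 16):.0f} minutes")
    print(f"gross rate: {aggregate_gflops(16):.3f} Gflops")
```

With 16 cases on 16 processors the model reproduces the 40 minutes and roughly 0.5 Gflops quoted above.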
Significance: The coarse-grained parallelism offered by parameter studies is an efficient use of a parallel machine. It is also an efficient way to use networks of workstations, and this has been implemented as well. Coarse-grained parallelism is not difficult or academically glamorous, but it may be of immediate use to our commercial customers as a way to reduce wall clock time and make CFD more useful in a design environment.
This work applies directly to milestone 2.3.5 (multipoint optimization). Since a separate case is needed for each (design variable, design point) pair, the degree of parallelism can easily exceed the number of processors.
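How quickly the case count grows can be seen in a small sketch. The counts used here (20 design variables, 10 design points) are hypothetical and chosen only to exceed the 160 processors of the SP-2; `enumerate_cases` and `schedule_in_waves` are illustrative names, and the fixed-wave schedule is an assumption rather than the scheme actually used.

```python
from itertools import product


def enumerate_cases(design_variables, design_points):
    """One independent flow case per (design variable, design point) pair."""
    return list(product(design_variables, design_points))


def schedule_in_waves(cases, n_processors):
    """Split the case list into consecutive batches of at most n_processors."""
    return [cases[i:i + n_processors]
            for i in range(0, len(cases), n_processors)]


if __name__ == "__main__":
    variables = [f"x{i}" for i in range(20)]   # hypothetical design variables
    points = [f"p{j}" for j in range(10)]      # hypothetical design points
    cases = enumerate_cases(variables, points)
    waves = schedule_in_waves(cases, n_processors=160)
    print(f"{len(cases)} cases in {len(waves)} waves:",
          [len(wave) for wave in waves])
```

A dynamic queue that hands the next case to whichever processor finishes first would keep the machine busier than fixed waves when case run times differ.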
Status/Plans: The next major step is to parallelize a single case over several processors to further reduce the wall clock time for angle of attack sweeps. I believe it is realistic to expect a further factor of 4 reduction, to about 10 minutes. This would make it possible to run single cases on the SP-2 at speeds comparable to the C-90, using only 4 processors, or to run an angle of attack sweep in the same time as a single case, just by using more processors.
Points of Contact: