Structural Matrix Generator/Assembler for Massively-Parallel Supercomputers Demonstrated for High Speed Civil Transport

Objective: Develop a new method to reduce the time to generate and assemble stiffness and mass matrices of complex aerospace vehicles on massively-parallel supercomputers.

Approach: Structural matrix generation and assembly can account for a large fraction of finite element analysis time. This is especially true for design optimization, trade-off studies, and nonlinear analysis, where the matrices must be generated and assembled many times. Conventional finite element codes, executing on sequential computers, use an element-by-element algorithm to generate and assemble stiffness and mass matrices. Parallel algorithms have the potential to reduce the computer time for this step. One way to parallelize the conventional procedure for massively parallel computers is to distribute the element stiffness calculations among many processors. However, this approach performs poorly because the processors must synchronize their communication whenever stiffness contributions from many elements are added to the same node at the same time. To overcome this problem, a parallel node-by-node stiffness and mass matrix generation and assembly algorithm was developed that distributes nodal, rather than element, calculations to different processors. Because each processor computes the complete contributions for its own nodes, no communication between processors is required and synchronization is avoided.
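
The contrast between the two assembly orders can be illustrated with a minimal Python sketch. It is not the code described here: it assumes a hypothetical chain of two-node axial spring elements, and the function names (element_stiffness, assemble_by_element, assemble_node_row, assemble_by_node) are invented for illustration. In the element loop, several elements add into the same global row; in the node loop, each row is built entirely from the elements attached to that node, so rows could in principle be handed to different processors without exchanging data. The sketch assumes that elements shared between nodes are recomputed for each node.

```python
from collections import defaultdict


def element_stiffness(k):
    """2x2 stiffness matrix of a two-node axial spring (illustrative element)."""
    return [[k, -k], [-k, k]]


def assemble_by_element(n_nodes, elements):
    """Conventional order: loop over elements, scatter into shared global rows."""
    K = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i, j, k in elements:
        ke = element_stiffness(k)
        for a, ga in enumerate((i, j)):
            for b, gb in enumerate((i, j)):
                # Different elements may add into the same row here.
                K[ga][gb] += ke[a][b]
    return K


def assemble_node_row(node, n_nodes, elements_at_node):
    """Build one node's complete global row from its attached elements only."""
    row = [0.0] * n_nodes
    for i, j, k in elements_at_node:
        ke = element_stiffness(k)      # recomputed per node (assumption)
        local = (i, j).index(node)     # which end of the element this node is
        row[i] += ke[local][0]
        row[j] += ke[local][1]
    return row


def assemble_by_node(n_nodes, elements):
    """Node-by-node order: every row is an independent task."""
    attached = defaultdict(list)
    for e in elements:
        attached[e[0]].append(e)
        attached[e[1]].append(e)
    # Each call below touches only its own row, so the list comprehension
    # could be replaced by a parallel map over nodes.
    return [assemble_node_row(n, n_nodes, attached[n]) for n in range(n_nodes)]


# Hypothetical three-node chain: (node_i, node_j, spring stiffness).
example_elements = [(0, 1, 2.0), (1, 2, 3.0)]
K_elem = assemble_by_element(3, example_elements)
K_node = assemble_by_node(3, example_elements)
```

Both orders produce the same matrix for this example; the difference lies only in which work items are independent of one another.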

Accomplishment: Previously, an algorithm was developed for shared-memory parallel computers such as an eight-processor Cray. This algorithm has recently been extended to massively parallel, distributed-memory computers such as the Intel Delta. The attached figure compares the algorithm's performance on the 512-processor Intel Delta with its performance on an 8-processor Cray Y-MP for a Mach 2.4 High Speed Civil Transport. As shown in the figure, the algorithm's performance was found to be scalable: the computation time decreases in inverse proportion to the number of processors. Scalable performance is highly desirable but, in general, difficult to achieve on massively parallel computers.
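
The notion of scalability used above can be made concrete with a short Python sketch. The helper names (ideal_time, relative_efficiency) and the timing values are hypothetical placeholders, not measurements from the figure. Under ideal scaling, the product of run time and processor count stays constant, so the relative efficiency equals 1.

```python
def ideal_time(t_ref, p_ref, p):
    """Run time on p processors if time scales as 1/p from a reference run."""
    return t_ref * p_ref / p


def relative_efficiency(t_ref, p_ref, t_par, p_par):
    """Ratio of processor-time products; 1.0 means perfectly scalable."""
    return (t_ref * p_ref) / (t_par * p_par)


# Placeholder numbers only: a 32 s run on 16 processors scaled to 512.
t_512 = ideal_time(32.0, 16, 512)
eff = relative_efficiency(32.0, 16, t_512, 512)
```

A measured efficiency well below 1 would indicate that communication or load imbalance is consuming the added processors.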

Significance: This new parallel, node-by-node generation and assembly algorithm may be the first structural analysis method demonstrated to execute significantly faster on a "massively parallel" supercomputer than on a Cray. The new algorithm markedly improves computation speed for element generation and assembly as the number of processors increases. The algorithm is well suited for applications where the global stiffness and mass matrices are calculated repeatedly, such as structural optimization, nonlinear static and dynamic finite element analysis, panel flutter analysis, and frequency calculations for large flexible space structures. The algorithm continues to perform well as the size and complexity of the structural model increase.

Status/Plans: The versatility of the algorithm is being tested on both a 160,000-equation HSCT model and a 112,000-equation Space Station Freedom model.

Point of Contact:

curator: Larry Picha