Objective: The objective of this project is to investigate techniques for basing dense linear algebra algorithms on parallel versions of the Basic Linear Algebra Subprograms (BLAS), thereby allowing parallelism to be hidden within these subprograms.
Approach: High-performance dense linear algebra libraries like LAPACK are written in terms of a few computational kernels, the Basic Linear Algebra Subprograms, which make it possible to attain high performance in a portable fashion by tuning only those kernels for each platform. The ScaLAPACK project, a collaboration among the University of Tennessee, Oak Ridge National Laboratory, Rice University, UCLA, UC-Berkeley, and the University of Illinois, extends this effort to distributed memory parallel supercomputers. To achieve a high level of code reuse, the current approach is to extract as much parallelism as possible at the BLAS level.
To understand the contributions our NASA-funded effort makes to this and other projects, one must understand the hierarchy on which ScaLAPACK is built. At the lowest level are the BLAS and the Basic Linear Algebra Communication Subprograms (BLACS). The BLAS provide the compute kernels for each node of the parallel computer, while the BLACS provide the interface for communication between the processors. On top of these, a parallel version of the BLAS has been written, the Parallel Blocked BLAS (PBBLAS). Finally, the ScaLAPACK codes for dense linear algebra are written on top of the PBBLAS. A toy sketch of this layering appears below.
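The following minimal sketch, in Python with mpi4py and NumPy as illustrative stand-ins (none of which the project's actual codes use), shows the division of labor in the hierarchy: the local matrix product plays the role of the node-level BLAS, the Allgather collective plays the role of a BLACS operation, and the wrapper routine is the kind of primitive a parallel BLAS layer exports to the library codes above it.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    p, rank = comm.Get_size(), comm.Get_rank()
    n_local = 3                          # rows and vector elements per process (illustrative)

    def parallel_matvec(A_local, x_local):
        # "BLACS" role: gather the distributed pieces of x onto every process.
        x = np.empty(p * x_local.size)
        comm.Allgather(x_local, x)
        # "BLAS" role: each node applies its local row block of A.
        return A_local @ x

    A_local = np.full((n_local, p * n_local), float(rank))  # my row block of A
    x_local = np.arange(n_local, dtype=np.float64)          # my piece of x
    y_local = parallel_matvec(A_local, x_local)             # my piece of y = A x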
We are systematically evaluating improvements to the different levels of this hierarchy, ultimately providing better implementations of the BLACS, PBBLAS, and some of the ScaLAPACK codes. This effort builds on our experience with collective communication algorithms for parallel architectures and our experience with parallel dense linear algebra codes.
Accomplishments:
1. We have discovered a new, simpler, and more efficient approach to parallel matrix multiplication, which we call the Scalable Universal Matrix-Multiplication Algorithm (SUMMA); a sketch of the algorithm appears after this list. This approach has already been used by researchers at Pacific Northwest Laboratories for chemistry applications, by the ScaLAPACK project at the University of Tennessee, and by the PRISM project, a collaboration between Argonne National Laboratory, the Supercomputing Research Center, and a number of universities. This work has led to one technical report and a submitted journal paper.
2. We have used SUMMA to implement what we believe is the first truly high-performance parallel Strassen matrix-matrix multiplication (see the recursion sketch after this list). This effort has led to one technical report and a submitted journal paper.
3. Through collaboration with researchers at the Texas Institute for Computational and Applied Mathematics, we have discovered that the standard distributions of matrices to distributed memory parallel computers do not allow for easy interfaces between libraries and applications: while the library views the matrix as the central object, applications work much more naturally with the vectors that make up the linear systems. Through a series of case studies we have identified a possible solution, which we call Physically Based Matrix Distribution (illustrated in the last sketch after this list). This collaboration has led to one technical report.
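The following is a minimal SUMMA sketch in Python, with mpi4py and NumPy standing in for the BLACS and the node-level BLAS; the 2x2 process grid, the random local blocks, and the whole-block panel width are illustrative assumptions, not details from our reports.

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    pr, pc = 2, 2                        # process grid (assumes exactly 4 ranks)
    rank = comm.Get_rank()
    row, col = divmod(rank, pc)          # my grid coordinates
    row_comm = comm.Split(row, col)      # communicator along my grid row
    col_comm = comm.Split(col, row)      # communicator down my grid column

    n = 4                                # local block dimension (illustrative)
    rng = np.random.default_rng(rank)
    A = rng.standard_normal((n, n))      # my local block of A
    B = rng.standard_normal((n, n))      # my local block of B
    C = np.zeros((n, n))                 # my local block of C = A B

    # SUMMA: at step k, the owning grid column broadcasts its panel of A
    # across the rows, the owning grid row broadcasts its panel of B down
    # the columns, and every process accumulates a local GEMM update.
    for k in range(pc):                  # assumes pr == pc for brevity
        Apanel = A.copy() if col == k else np.empty((n, n))
        row_comm.Bcast(Apanel, root=k)   # broadcast A panel along my grid row
        Bpanel = B.copy() if row == k else np.empty((n, n))
        col_comm.Bcast(Bpanel, root=k)   # broadcast B panel down my grid column
        C += Apanel @ Bpanel             # local BLAS (GEMM) update

In the algorithm as actually used, the panels are a few columns or rows wide rather than a full local block, so the broadcasts can be pipelined while the local GEMM updates stay large.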
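To make the Strassen recursion concrete, here is a minimal sequential sketch in NumPy; our parallel code instead applies this recursion across the process grid, with a SUMMA-style distributed multiply at the base, and the power-of-two dimension and cutoff value below are illustrative assumptions.

    import numpy as np

    def strassen(A, B, cutoff=64):
        n = A.shape[0]                   # assumes square, power-of-two dimension
        if n <= cutoff:                  # below the cutoff, ordinary GEMM wins
            return A @ B
        h = n // 2
        A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
        B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
        # Strassen's seven half-size products replace the usual eight.
        M1 = strassen(A11 + A22, B11 + B22, cutoff)
        M2 = strassen(A21 + A22, B11, cutoff)
        M3 = strassen(A11, B12 - B22, cutoff)
        M4 = strassen(A22, B21 - B11, cutoff)
        M5 = strassen(A11 + A12, B22, cutoff)
        M6 = strassen(A21 - A11, B11 + B12, cutoff)
        M7 = strassen(A12 - A22, B21 + B22, cutoff)
        C = np.empty_like(A)
        C[:h, :h] = M1 + M4 - M5 + M7
        C[:h, h:] = M3 + M5
        C[h:, :h] = M2 + M4
        C[h:, h:] = M1 - M2 + M3 + M6
        return C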
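Finally, a toy Python sketch of the idea behind Physically Based Matrix Distribution, under assumptions of our own (the round-robin vector assignment and the grid shape are illustrative, not our specification): the application first assigns the vector elements to the process grid, and the matrix inherits its distribution from the vectors, so that for y = A x entry a_ij lands in the grid column owning x_j and the grid row owning y_i.

    pr, pc = 2, 3                        # process grid shape (illustrative)
    n = 12                               # problem size

    # Application's view: vector element i lives on process i mod (pr*pc);
    # process r sits at grid coordinates (r // pc, r % pc).
    vec_owner = [i % (pr * pc) for i in range(n)]
    grid_row = [r // pc for r in vec_owner]
    grid_col = [r % pc for r in vec_owner]

    def matrix_owner(i, j):
        # Induced matrix distribution: a_ij joins the grid row that owns y_i
        # and the grid column that owns x_j, so no redistribution is needed
        # when the library operates on the application's vectors.
        return (grid_row[i], grid_col[j])

    print(matrix_owner(5, 7))            # grid coordinates holding a_{5,7}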
Significance: The discovery of simpler, more efficient parallel matrix-matrix multiplication algorithms has opened the door to the efficient, systematic parallel implementation of all Basic Linear Algebra Subprograms, which are at the center of many high-performance linear algebra libraries. The discovery of alternative matrix distribution methods has far-reaching implications, allowing for a unified approach for parallelizing dense, sparse iterative, and sparse direct methods for solving linear systems.
Status/Plans: We have implemented and distributed codes for efficient parallel matrix-matrix multiplication and the Strassen variant. We are in the process of designing parallel BLAS implementations that use our Physically Based Matrix Distributions. This will allow us to examine methods for implementing a superset of ScaLAPACK. We are working with the ScaLAPACK project to bring these ideas into that library as early as possible.
Point of Contact:
Robert van de Geijn
Department of Computer Sciences
The University of Texas at Austin
rvdg@cs.utexas.edu
512-471-9720
URL: http://www.cs.utexas.edu/users/rvdg