ESS Project FY95 Annual Report: Applications Software

Four-Dimensional Data Assimilation

Objectives: Four-Dimensional Data Assimilation (4DDA) involves using models of the Earth System (atmosphere, land surface, and ocean, i.e., climate models) together with estimation-theoretical methods for melding observations into the model. 4DDA attempts to provide the best estimate of the evolving state of the Earth System by extracting the maximum amount of information from the available observations.

(1) Our primary objective is to use the computing load of 4DDA to explore the limits of high-performance computing: processor speed, main-memory volume, memory-access speed (including interprocessor as well as on-processor communication bandwidths), and I/O. In the coming years, the computing requirements of Goddard Space Flight Center's (GSFC) Data Assimilation Office (DAO) are estimated to be:

These numbers were obtained by using present-day performance figures and conservative projections based on realistic trends in Mission to Planet Earth (MTPE) and the Earth Observing System Data and Information System (EOSDIS) in a budget-constrained environment.

(2) In pursuing this primary objective, our scientific objective is to support the programmatic demands of the DAO: to provide accurate assimilated Earth science data sets to the scientific community; to incorporate the larger data volumes and data types that are becoming available in MTPE; and to perform research into advanced techniques of 4DDA. In 1998, the DAO will deliver an operational data assimilation system (DAS) to EOSDIS.

4DDA involves three distinct processes. First, the model performs a 6-hour forecast. Then, the observational data are subject to quality control (QC). Finally, a statistical analysis is performed to meld the observations with the forecast. The analysis part of the DAS, which is far more computationally demanding and less well-studied than the model, is one of the main foci of our work.
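
To make the melding step concrete, the sketch below applies a standard optimal-interpolation style analysis update, x_a = x_f + B H^T (H B H^T + R)^{-1} (y - H x_f), to a toy problem. It is only an illustration under simple assumptions (small dense covariance matrices B and R, a hypothetical linear observation operator H), not the DAO production analysis.

```python
import numpy as np

def oi_analysis(x_f, y, H, B, R):
    """One optimal-interpolation (OI) style analysis update.

    x_f : forecast (background) state vector, shape (n,)
    y   : observation vector, shape (p,)
    H   : linear observation operator, shape (p, n)
    B   : background error covariance, shape (n, n)
    R   : observation error covariance, shape (p, p)
    """
    d = y - H @ x_f                  # innovation (observation minus forecast)
    S = H @ B @ H.T + R              # innovation covariance
    w = np.linalg.solve(S, d)        # solve in observation space
    x_a = x_f + B @ H.T @ w          # map the analysis increment to model space
    return x_a

# Toy example with hypothetical sizes: 6 state variables, 3 observations.
rng = np.random.default_rng(0)
x_f = rng.normal(size=6)
H = np.eye(3, 6)                     # observe the first three state variables
B = 0.5 * np.eye(6)
R = 0.1 * np.eye(3)
y = H @ x_f + rng.normal(scale=0.3, size=3)
print(oi_analysis(x_f, y, H, B, R))
```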

Approach: We have formed an interdisciplinary team of Earth and computer scientists from GSFC, the Jet Propulsion Laboratory (JPL), the University of Maryland at College Park, and the Northeast Parallel Architectures Center at Syracuse University. In support of the HPCC and DAO goals, these scientists have studied identifiable segments of the DAS. We have also initiated a top-down study of the complete system, including I/O, floating-point, and memory-access issues.

Apart from I/O, the principal segments of the DAS are the models, the data QC modules, and the analysis-solve routines. For the models we have deferred to and consulted with the Grand Challenge teams of Suarez and Mechoso, who are using plug-compatible versions of our codes. We have also studied the parallel efficiency and implementation of semi-Lagrangian models. Work on the QC modules and the production Optimal Interpolation (OI) analysis-solve routines has been performed by G. von Laszewski, a PhD student at Syracuse, using message-passing software on IBM SP-2 and DEC Alpha clusters. M. Makivic has studied the QC modules using data-parallel software on the Thinking Machines Corp. CM-5.
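
As background for the QC work, a common ingredient of such modules is a gross-error (background) check that rejects observations whose departure from the 6-hour forecast is implausibly large relative to the combined forecast and observation error variances. The following Python sketch is a hypothetical illustration of that idea, not the DAO or Syracuse production QC; the variances and threshold are assumed values.

```python
import numpy as np

def background_check(y, Hx_f, sigma_f2, sigma_o2, tol=3.0):
    """Flag observations whose innovation is implausibly large.

    y        : observed values
    Hx_f     : forecast interpolated to the observation locations
    sigma_f2 : forecast error variance at each observation (assumed)
    sigma_o2 : observation error variance at each observation (assumed)
    tol      : rejection threshold in standard deviations (hypothetical value)
    Returns a boolean mask: True = accept, False = reject.
    """
    innovation = y - Hx_f
    return innovation**2 <= tol**2 * (sigma_f2 + sigma_o2)

# Example: the third observation departs too far from the forecast and is rejected.
y = np.array([1.0, 2.1, 9.0, 0.5])
Hx_f = np.array([1.1, 2.0, 2.0, 0.4])
mask = background_check(y, Hx_f, sigma_f2=np.full(4, 0.5), sigma_o2=np.full(4, 0.5))
print(mask)   # [ True  True False  True]
```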

In the last year, we have initiated a collaboration with JPL (R. Ferraro and H. Ding) on the newer production analysis-solve routine, the Physical-space Statistical Analysis System (PSAS). The JPL team uses message-passing software on Intel Paragon and CRAY T3D computers to study a memory-intensive parallel conjugate gradient solver. Work on the advanced Kalman filter (KF) assimilation system has focused on gas constituents in the stratosphere; we use message-passing software on the Paragon and T3D for this memory- and processor-speed-intensive application.
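
The PSAS analysis leads to a large symmetric positive-definite linear system in observation space, which the JPL team attacks with a parallel conjugate gradient method. As a hedged illustration, the sketch below shows the basic serial, unpreconditioned conjugate gradient iteration on a small hypothetical system standing in for (H B H^T + R); the production solver is a distributed message-passing implementation on the Paragon and T3D.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-8, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A by conjugate gradients."""
    x = np.zeros_like(b)
    r = b - A @ x                      # initial residual
    p = r.copy()                       # initial search direction
    rs_old = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs_old / (p @ Ap)      # step length along p
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p  # new conjugate search direction
        rs_old = rs_new
    return x

# Hypothetical small SPD system standing in for the analysis equations.
rng = np.random.default_rng(1)
M = rng.normal(size=(50, 50))
A = M @ M.T + 50 * np.eye(50)
b = rng.normal(size=50)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))       # residual norm near the 1e-8 tolerance
```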

Accomplishments: As of 1992, none of the DAO codes had been ported to massively parallel processors (MPP's), nor was there any in-house parallel computing expertise. The distributed management associated with this multi-institutional project has been conducted with considerable success.

A summary of significant achievements is:

(1) G. von Laszewski has achieved 400 MFLOPS for the analysis of 80,000 observations on 40 processors of an IBM SP-2. M. Makivic has achieved 0.5 GFLOPS (3.1 GFLOPS is peak) using a data-parallel code on the CM-5 for the QC. This exceeded the metric we set in 1992.

(2) In preliminary work on the advanced PSAS, JPL workers have achieved 18.3 GFLOPS for the key analysis-solve routines (matrix generation and solve) on 512 processors of the Intel Paragon. This performance equates to at least a 30-fold speedup over the same code on a single processor of the CRAY C90.

(3) For the van Leer-type transport codes, we achieved 2.5 GFLOPS sustained and 6.8 GFLOPS peak on a 256-node CM-5 at a horizontal resolution of 2 degrees, despite high communication latencies (a one-dimensional schematic of the van Leer scheme follows this list).

(4) The KF represents a rigorous approach to 4DDA that minimizes ad hoc approximations. The peak speed at horizontal resolution of 2 degrees on 512 processors of the Paragon is 1.3 GFLOPS (1 hour of wall-clock time per day of assimilation).

(5) We have ported the sequential PSAS code to a DEC Alpha 2100 in order to study aspects of the evolving computing environment. The DEC Alpha ran PSAS at 50 percent of the speed of a single processor of a CRAY C90.

(6) Several modules have been submitted to the HPCC Software Exchange, including a data decomposition for data-parallel transport codes, a message-passing version of the van Leer transport scheme, and a global matrix solve suitable for the KF. We also presented a talk (by P. Lyster) at the sixth European Centre for Medium-Range Weather Forecasts (ECMWF) workshop on the use of parallel processors in meteorology; it was subsequently published.
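
As a schematic of the transport scheme mentioned in item (3), the sketch below advects a tracer with a one-dimensional van Leer (monotone, slope-limited, finite-volume) scheme on a periodic grid. It is a simplified stand-in under assumed conditions (constant wind, Courant number below one), not the parallel CM-5 transport code.

```python
import numpy as np

def van_leer_advect_1d(q, c, nsteps):
    """Advect tracer q with a 1-D van Leer (MUSCL-type) scheme on a periodic grid.

    q      : cell-averaged tracer mixing ratio
    c      : Courant number u*dt/dx, assumed 0 < c <= 1 with u > 0
    nsteps : number of time steps to take
    """
    for _ in range(nsteps):
        qm = np.roll(q, 1)    # q_{i-1}
        qp = np.roll(q, -1)   # q_{i+1}
        # van Leer (monotone) slope: harmonic-mean form, zero at extrema
        num = 2.0 * (qp - q) * (q - qm)
        den = qp - qm
        slope = np.where(num > 0.0, num / np.where(den == 0.0, 1.0, den), 0.0)
        # upwind flux through each cell's right face (u > 0), scaled by dt/dx
        flux = c * (q + 0.5 * (1.0 - c) * slope)
        q = q - (flux - np.roll(flux, 1))
    return q

# Hypothetical example: advect a square pulse once around a periodic domain.
q0 = np.zeros(100)
q0[40:60] = 1.0
q1 = van_leer_advect_1d(q0, c=0.5, nsteps=200)
print(q1.min(), q1.max())   # stays within [0, 1]: the scheme is monotone
```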

Significance: The work on 4DDA (especially the PSAS module) is computationally intensive in terms of memory speed, memory volume, and data throughput; hence, this application is suited to test most aspects of high-performance computing in the coming years. Scientifically, the DAO has a programmatic requirement to provide consistent gridded data sets to the community for the study of global climate, in particular to provide operational code to EOSDIS by 1998, with continuing support beyond that. Ultimately, the data sets derived by 4DDA help to reduce our uncertainties in assessments and predictions of global change. We also regard the successful development of the KF algorithm as a significant achievement and one of the first of a new generation of codes enabled by large-memory MPP's.

Status/Plans: Our major focus in the coming years will be the development of the entire end-to-end system in a state-of-the-art computing environment by 1998 (refer to the table for the expected computing requirements). The DAO plans to implement a large part of its production codes on RISC-based systems such as Silicon Graphics (model physics packages, model dynamical core), the IBM SP-2 (model dynamical core), and the CRAY T3D (model dynamical core, model physics packages). Work at Syracuse will focus on data-parallel software aspects of the analysis, while at JPL the message-passing conjugate gradient algorithm will be advanced. We will also move to 1-degree horizontal resolution with 60 vertical levels while keeping the 30-days-per-day throughput metric intact.

We will also continue work on the KF. This effort will involve implementing the code on faster machines with larger memory. In doing so, we will collaborate with vendors on the most efficient algorithms and advance the science of Kalman filtering. This code is currently being used for studies of nitrous oxide transport in the middle stratosphere using data from the Upper Atmosphere Research Satellite (UARS).

The figure shows the nitrous oxide mixing ratio plotted on the 850 Kelvin isentropic surface on September 10, 1992, obtained by using the KF to assimilate UARS data over the previous 4 days. The corresponding relative error of the mixing ratio, an important by-product of the KF, is also contoured in the figure. Consistent and reliable estimation of errors is one of the major scientific motivations for pursuing the KF, and we feel it is an essential attribute of data sets that will be used to understand global change.
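
The error fields come directly from the covariance that the KF carries alongside the state. The following sketch advances a state estimate and its error covariance through one linear forecast/analysis cycle and reports the per-variable standard deviation as an error estimate; the model operator, covariances, and dimensions are hypothetical and far smaller than the constituent-assimilation system.

```python
import numpy as np

def kalman_step(x, P, M, Q, y, H, R):
    """One forecast/analysis cycle of a linear Kalman filter.

    x, P : current state estimate and its error covariance
    M, Q : linear forecast model and model error covariance
    y    : observations; H, R : observation operator and error covariance
    Returns the analysis state, its covariance, and per-variable error estimates.
    """
    # Forecast step: propagate the state and its error covariance.
    x_f = M @ x
    P_f = M @ P @ M.T + Q
    # Analysis step: Kalman gain and measurement update.
    S = H @ P_f @ H.T + R
    K = P_f @ H.T @ np.linalg.inv(S)
    x_a = x_f + K @ (y - H @ x_f)
    P_a = (np.eye(len(x)) - K @ H) @ P_f
    return x_a, P_a, np.sqrt(np.diag(P_a))   # std. dev. serves as an error estimate

# Hypothetical 4-variable transport-like system observed at two points.
n = 4
M = np.roll(np.eye(n), 1, axis=1)            # simple cyclic-shift "advection" operator
x, P = np.zeros(n), np.eye(n)
Q, R = 0.01 * np.eye(n), 0.1 * np.eye(2)
H = np.eye(2, n)
x, P, err = kalman_step(x, P, M, Q, np.array([1.0, 0.5]), H, R)
print(err)
```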


Investigator Progress Metric


Points of Contact:

Richard Rood
Peter Lyster
NASA/Goddard Space Flight Center
rood@dao.gsfc.nasa.gov, 301-286-8203
lys@dao.gsfc.nasa.gov, 301-805-6960
URL: http://dao.gsfc.nasa.gov/subpages/hpcc.html

