Scalable I/O Initiative

Visualizing synthetic aperture radar application I/O

Objective

The Scalable I/O Initiative systematically investigates a primary obstacle to effective use of current gigaflops-scale and future teraflops-scale computing systems: getting data into, around, and out of the system. To achieve balance between compute power and I/O capability, the Scalable I/O Initiative adopts a system-wide perspective to analyze and enhance the diverse, intertwined system software components influencing I/O performance.

Approach

Using the Pablo performance evaluation system, application developers and computer scientists are cooperating to determine the I/O characteristics of a comprehensive set of I/O-intensive applications. These characteristics are guiding the development of parallel I/O features for the related system software components: compilers, runtime libraries, parallel file systems, high-performance network interfaces, and operating system services. Proposed new features, built upon results from ongoing research projects sponsored by DARPA, DOE, NASA, and NSF, are evaluated by measuring the I/O performance of full-scale applications using prototype implementations of these features on full-scale massively parallel computer systems. The Scalable I/O Initiative, sponsored jointly by these agencies, is a collaborative effort among application developers, computer scientists, and scalable parallel processing (SPP) vendors to ensure not only the efficiency, scalability, and portability of the resulting techniques and software, but also the adoption of these results by vendors and users.

Accomplishments

The Pablo toolkit was enhanced to collect and analyze detailed application-level and physical I/O performance data: it was extended to new hardware platforms and augmented with a device driver instrumentation toolkit for capturing and processing physical I/O data. An extensive comparative I/O characterization of parallel applications was conducted to understand the interactions among disk hardware configurations, application structure, file system APIs, and file system policies. This study showed that achieved I/O performance is strongly sensitive to small changes in access patterns or hardware configurations. Based on this study, standard I/O benchmarks are being developed to capture common I/O patterns.

Checkpointing, a requirement for large-scale applications on scalable parallel systems, and 3-D rendering are two examples of I/O-intensive operations. A checkpoint and restart tool, called CLIP, has been developed for Fortran or C applications using either the NX or MPI message-passing library on the Intel Paragon. Tests show that CLIP is simple to use and that its performance matches that of application-specific tools. A multi-port frame buffer has also been completed for the Intel Paragon; pixel I/O bandwidth has been a bottleneck with the traditional HiPPI frame buffer approach to 3-D rendering on a multicomputer. The multi-port frame buffer can perform Z-buffering in hardware. Experiments using PGL (Parallel Graphics Library) show that the 4-port prototype frame buffer can deliver about 600 MBytes/sec to applications.
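
To give a sense of how little an application has to change, the sketch below shows the usage pattern such a checkpoint library presents to an MPI code in C. The function name clip_checkpoint() is an illustrative assumption, not CLIP's documented interface, and a stub is supplied only so the sketch compiles on its own.

    /* Minimal sketch of a CLIP-style coordinated checkpoint in an MPI code.
     * clip_checkpoint() is a hypothetical stand-in for the library call;
     * the stub below only makes the sketch self-contained. */
    #include <mpi.h>
    #include <stdio.h>

    /* Stub: in the real tool the library supplies this.  Assume it writes a
     * coordinated checkpoint and returns 0 on the original run, nonzero when
     * execution resumes here after a restart. */
    static int clip_checkpoint(void) { return 0; }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        double state = 0.0;
        for (int step = 0; step < 1000; step++) {
            state += 1.0;                          /* application work */

            if (step % 100 == 0) {
                MPI_Barrier(MPI_COMM_WORLD);       /* all ranks reach a safe point */
                if (clip_checkpoint() != 0)
                    fprintf(stderr, "resumed from checkpoint at step %d\n", step);
            }
        }

        printf("final state %.1f\n", state);
        MPI_Finalize();
        return 0;
    }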

A memory server, which allows applications to use remote physical memory as the backing store for their virtual memory, was completed and released for the Intel Paragon. Swapping a page with the memory server takes 1.4 msec, compared to about 27 msec using the traditional virtual memory system. A shared virtual memory system, which allows applications written for shared-memory multiprocessors to run on the Intel Paragon, was also completed. The system uses a novel coherence protocol called home-based lazy release consistency. All SPLASH-2 benchmark programs have shown speedups.
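
In a home-based protocol of this kind, each shared page has a designated home node: a writer saves a pristine twin of the page at its first write, computes a diff against that twin at lock release, and sends the diff to the home node, which applies it to the master copy. The single-process C sketch below illustrates only that twin/diff/apply data flow under those assumptions; it is not the Paragon implementation.

    /* Single-process sketch of the twin/diff/apply step used in home-based
     * lazy release consistency.  In the real system the diff is sent over
     * the network to the page's home node; here everything lives in one
     * address space purely to show the data flow. */
    #include <stdio.h>
    #include <string.h>

    #define PAGE_SIZE 4096

    static unsigned char home_page[PAGE_SIZE];   /* master copy at the home node */
    static unsigned char work_page[PAGE_SIZE];   /* writer's working copy */
    static unsigned char twin_page[PAGE_SIZE];   /* pristine copy saved at write fault */

    static void on_first_write_fault(void)
    {
        memcpy(twin_page, work_page, PAGE_SIZE); /* save the twin */
    }

    /* At lock release: compare the working copy against the twin and apply
     * only the changed bytes to the home copy (in a real implementation the
     * diff would be encoded and sent to the home node). */
    static void release_and_flush_diff(void)
    {
        for (int i = 0; i < PAGE_SIZE; i++)
            if (work_page[i] != twin_page[i])
                home_page[i] = work_page[i];
    }

    int main(void)
    {
        memset(home_page, 0, PAGE_SIZE);
        memcpy(work_page, home_page, PAGE_SIZE); /* fetch page from home */

        on_first_write_fault();
        work_page[10] = 42;                      /* local writes */
        work_page[200] = 7;

        release_and_flush_diff();                /* lock release */
        printf("home_page[10]=%d home_page[200]=%d\n",
               home_page[10], home_page[200]);
        return 0;
    }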

Experiments with a user-level TCP/IP that bypasses the centralized network server demonstrated a 2-3x speedup for a single TCP connection and scaling with multiple connections. The structure of the Intel Paragon OS (how it divides functionality between the Unix server and the emulation library) was determined to be incompatible with a complete and robust user-level TCP/IP. An implementation of decomposed TCP/IP and UDP network services was completed for the Intel Paragon. This implementation allows a client program to send or receive network packets directly, without passing the data through the Intel Paragon server. However, due to architectural deficiencies in the original single server, the current implementation of the decomposed server runs only on the Intel Paragon's server nodes.

The definition of the Scalable I/O Low-Level Parallel I/O API was completed and released for review by the community. This API is simple, yet provides enough mechanism to build efficient high-level APIs. A reference implementation of the Scalable I/O Low-Level API 1.0 has been completed, tested, and released for the Paragon. A parallel file system was then built on top of this reference implementation; it exposes the same interface as the Intel Paragon PFS, and performance experiments show it to be more efficient than the Intel Paragon PFS, demonstrating that the Scalable I/O Low-Level API is an efficient and effective interface. In addition, ADIO (see below) has been implemented using the Scalable I/O Low-Level API on the Intel Paragon.
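
One property a low-level interface of this kind typically relies on is stateless, explicit-offset access, where every request names the file region it touches and no shared file pointer has to be coordinated among processes. The C sketch below illustrates only that property, using POSIX pwrite as a stand-in; it is not the Scalable I/O Low-Level API itself.

    /* Illustration of explicit-offset, stateless file access of the kind a
     * low-level parallel I/O API builds on.  POSIX pwrite stands in for the
     * actual Scalable I/O Low-Level API calls, which this is not. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define BLOCK 65536   /* bytes written per request */

    int main(void)
    {
        int fd = open("striped.dat", O_CREAT | O_RDWR, 0644);
        if (fd < 0) { perror("open"); return 1; }

        char *buf = malloc(BLOCK);
        for (int i = 0; i < BLOCK; i++) buf[i] = (char)i;

        /* Each request carries its own offset; nothing depends on a shared
         * or per-process file pointer, so requests from different processes
         * could be issued concurrently and in any order. */
        int my_rank = 3, nprocs = 8;          /* e.g. compute-node identity */
        for (int chunk = 0; chunk < 4; chunk++) {
            off_t offset = (off_t)(chunk * nprocs + my_rank) * BLOCK;
            if (pwrite(fd, buf, BLOCK, offset) != BLOCK) {
                perror("pwrite");
                return 1;
            }
        }

        free(buf);
        close(fd);
        return 0;
    }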

The specification of the Abstract Device for I/O (ADIO) was completed and implemented on the Intel Paragon, IBM SP, NFS, and Unix file systems. A portable implementation of the Intel PFS and IBM PIOFS interfaces was completed using ADIO. A portable implementation of most features of the I/O section of the MPI-2 specification, including non-blocking and collective operations, was also completed. This system, called Romio, is based on ADIO and works with any implementation of MPI; it runs on the Intel Paragon, IBM SP, SGI Origin 2000, HP Exemplar, and networks of workstations. As high-speed networks make it easier to use distributed resources, it is increasingly common that applications and their data are not colocated, requiring manual staging of data to and from remote computers. An initial implementation of RIO, a prototype remote I/O library based on ADIO, was completed. Rather than introducing a new remote I/O paradigm, RIO allows programs to use the MPI-IO parallel I/O interface to access remote file systems, using the Nexus communication library and the configuration and security mechanisms provided by Globus. An application study shows that RIO provides an alternative to staging techniques and can deliver superior performance.
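
For readers unfamiliar with the MPI-2 I/O interface that Romio provides, the short C example below shows a typical collective write, with each rank writing its own block of a shared file at an explicit offset. It is a generic MPI-IO usage example, not code taken from Romio.

    /* A typical MPI-2 I/O (MPI-IO) collective write, the interface that
     * Romio implements on top of ADIO.  Each rank writes its own contiguous
     * block of a shared file at an explicit offset. */
    #include <mpi.h>

    #define COUNT 1024   /* doubles written per rank */

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double buf[COUNT];
        for (int i = 0; i < COUNT; i++)
            buf[i] = rank + i * 1e-6;

        MPI_File fh;
        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* Collective write: the implementation can merge the ranks' requests
         * into large, well-formed file accesses (Romio's collective I/O
         * optimization). */
        MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(double);
        MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_DOUBLE,
                              MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }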

A first implementation of end-to-end out-of-core arrays in the dHPF research compiler was completed. The current version works only for small kernel programs, but will be enhanced to accept more complex programs over the next year. The compiler produces calls to the PASSION I/O library to move data in and out of memory as needed. PASSION has been implemented on the Intel Paragon and the IBM SP-2 and was used to implement a Hartree-Fock code, in which I/O time was reduced to as little as 10 percent of the original I/O time on the Paragon. In another application at Sandia National Laboratories, users obtained application-level bandwidth of 110 MB/sec out of a maximum possible 180 MB/sec on 3 I/O nodes of the ASCI Red system.
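
The pattern behind this out-of-core support is to keep the full array in a file and stage one slab at a time through memory around each compute phase, with the compiler generating the surrounding reads and writes. The C sketch below shows that pattern with plain POSIX I/O; it does not use the PASSION interface itself.

    /* Pattern behind compiler-generated out-of-core array handling: the full
     * array lives in a file, and only one slab at a time is staged through
     * memory around each compute phase.  Plain POSIX I/O is used here as a
     * stand-in for the PASSION runtime calls. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define N_TOTAL  (1L << 22)   /* logical array length (elements) */
    #define SLAB     (1L << 18)   /* elements resident in memory at once */

    int main(void)
    {
        int fd = open("ooc_array.dat", O_CREAT | O_RDWR, 0644);
        if (fd < 0) { perror("open"); return 1; }
        if (ftruncate(fd, (off_t)N_TOTAL * sizeof(double)) != 0) {
            perror("ftruncate"); return 1;
        }

        double *slab = malloc(SLAB * sizeof(double));

        for (long start = 0; start < N_TOTAL; start += SLAB) {
            off_t off = (off_t)start * sizeof(double);

            /* Stage the slab in, compute on it, write it back. */
            if (pread(fd, slab, SLAB * sizeof(double), off) < 0) {
                perror("pread"); return 1;
            }
            for (long i = 0; i < SLAB; i++)
                slab[i] = slab[i] * 0.5 + 1.0;   /* compute phase */
            if (pwrite(fd, slab, SLAB * sizeof(double), off) < 0) {
                perror("pwrite"); return 1;
            }
        }

        free(slab);
        close(fd);
        return 0;
    }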

The development of a framework and a first prototype of the Active Data Repository were completed. The Active Data Repository framework, designed for the optimized storage, retrieval, and processing of sensor and simulation datasets, supports various types of processing, including data compositing, projection, and interpolation operations. This framework was used to prototype a sensor-data metacomputing application in which a parallelized vegetation classification algorithm, executing on an ATM-connected Digital Alpha workstation network, used the MetaChaos metacomputing library to access the Active Data Repository prototype running on an IBM SP-2.

Significance

The objective of the Scalable I/O Initiative is to help ensure that future scalable parallel systems are balanced with respect to I/O and computation. The HPCC program has demonstrated that parallel systems can be designed and constructed with arithmetic speed scaling effectively as the number of processors increases. The Scalable I/O Initiative seeks equivalent progress in techniques for scaling these systems' I/O capabilities to keep pace with continuing advances in computing power.

I/O is a critical problem because improvements in processing and memory capabilities continue to outpace improvements in I/O and storage devices. The performance of the off-the-shelf microprocessors used in current scalable parallel processors is improving at rates of 50% to 100% per year. In contrast, the I/O performance of a single storage device is limited by bit densities and mechanical constraints and is improving at only about 20% or less per year. Moreover, I/O on massively parallel systems is currently performed largely sequentially. Therefore, even application programs that execute in parallel can be severely limited in their performance by I/O bottlenecks. Parallelism or other mechanisms must be introduced into their I/O functions to obtain the levels of performance improvement required by large-scale applications and potentially available with scalable parallel systems.
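
A rough compounding of the rates quoted above (an illustration, not a measurement) makes the scale of the imbalance concrete:

    % Processor performance growing at 50%/year versus device I/O at 20%/year:
    \[
      \frac{1.5^{5}}{1.2^{5}} \approx \frac{7.6}{2.5} \approx 3
    \]
    % After five years, a system balanced today would need roughly three times
    % the aggregate device bandwidth per processor to stay balanced; at a
    % 100%/year processor growth rate the factor exceeds 12.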

The Scalable I/O Initiative involves major SPP vendors, including HP, IBM, Intel, and SGI/Cray, to promote transition of results to their current and future SPP products and to build consensus on cross-platform issues such as parallel I/O APIs. Products of the Scalable I/O Initiative are being used by other activities. For example, the Pablo I/O characterization tools, which are available for the Intel Paragon, IBM SP-2, HP Exemplar, and other UNIX-compatible systems, are the basis for analysis not only in the Scalable I/O Initiative, but also in the Department of Energy's Accelerated Strategic Computing Initiative and at the NSF PACI centers. Romio is being adopted by SGI as the initial implementation of MPI-IO on the Origin and Power Challenge machines, and there is cooperation with Hewlett-Packard's Convex Division to provide similar functionality for the HP Exemplar. Techniques developed in PASSION are being incorporated into software supported by vendors including HP, IBM, SGI/Cray, and the Portland Group.

Scalable I/O Initiative researchers are active participants in many other activities, including the MPI Forum, which has incorporated parallel I/O into its scope during the past year; DARPA programs such as Quorum; the DoD Modernization Program; the NSF Partnerships for Advanced Computational Infrastructure; and the DOE Accelerated Strategic Computing Initiative. In each case, results from the Scalable I/O Initiative are being applied in new contexts.

Status/Plans

Plans for the next year emphasize vertical integration of different software technologies developed in the Scalable I/O Initiative (e.g., the Scalable I/O Low-Level Parallel I/O API, the Romio implementation of MPI-2 I/O functionality, RIO for remote file access, out-of-core arrays in dHPF, PASSION, and the Active Data Repository) and continuing investigation of I/O scalability within a single SPP system and in both local and wide area network environments.

Points of Contact 

For additional information about the Scalable I/O Initiative, see http://www.cacr.caltech.edu/SIO or contact:

Paul Messina, Director
Center for Advanced Computing Research, M/C 158-79
California Institute of Technology
Pasadena, CA 91125
(626) 395-3907 VOICE
(626) 584-5917 FAX
messina@cacr.caltech.edu

James C. T. Pool, Deputy Director
Center for Advanced Computing Research, M/C 158-79
California Institute of Technology
Pasadena, CA 91125
(626) 395-6743 VOICE
(626) 584-5917 FAX
jpool@cacr.caltech.edu