Amdahl Universal Measurement Architecture (A+UMA)


The last few years have seen remarkable development in the commercialization of UNIX(R) systems-based computing. At the same time, the standardization, flexibility, and interoperability that Open Systems introduced in the application development arena have been sought in the area of performance management, a cornerstone of commercial computing. This requirement stems from the fact that UNIX systems do not have a well-defined interface for gathering and reporting information about system performance. A myriad of commands, such as sar and ps, are used, albeit inadequately, to provide such information. Applications that require metrics unavailable through these commands have to resort to delving into the kernel.

The Performance Management Working Group (PMWG), which is a working group within the Computer Measurement Group, and whose participating members include representatives from Amdahl, AT&T, DEC, HP, IBM, and others, was formed to address the need for the collection, management, and distribution of performance data. The resulting specification, called Universal Measurement Architecture (UMA), has been presented to X/Open(R) for approval. The Amdahl A+UMA(TM) Performance Data Manager is the first product implementation of UMA.

The first part of this document describes the UMA reference model and the associated Data Pool specification. The second part introduces A+UMA.


Part I - Addressing the Performance Problem

The objectives of UMA are to provide:

The UMA Reference Model

The UMA reference model defines four layers and two interfaces, as shown below.
(Figure not available in HTML at this time.)
Data Capture Layer

The Data Capture Layer is responsible for collecting the raw data. Its architecture allows data from multiple sources to be collected by a single layer, which in turn improves the synchronization of the data collection. (Compare this to data collected using programs such as sar and iostat, where the difference in collection methods can result in unpredictable time delays between data points collected by one program and data points collected by the other.)
Data Capture Interface

The Data Capture Interface is the interface between the Measurement Control Layer and the Data Capture Layer. It provides the means for dynamically extending data collection to new providers such as databases, without affecting existing programs. This means that collection is not restricted to the kernel, as in the case of commands such as ps.

The Data Capture Interface is not discussed further in this paper, which focuses on the Measurement Layer Interface.

Measurement Control Layer

The Measurement Control Layer schedules and synchronizes data capture, and manages data collection.
Data Services Layer

The Data Services Layer accepts measurement requests from a Measurement Application Program (MAP) through the Measurement Layer Interface, and distributes data to the destination requested by the MAP. A destination may be a private file, the MAP itself, or the UMA Data Storage (UMADS, explained later).
Measurement Layer Interface

The Measurement Layer Interface is the interface between the Measurement Application Layer and the Data Services Layer. It provides the medium for all interactions between a MAP and UMA, thus isolating the MAP from the implementation details of the rest of UMA. Separate instances of the Measurement Layer Interface exist as a library linked to each active MAP.

The Measurement Layer Interface allows transparent communication across networks, so a MAP running on one system can request and examine data from another system. Together with the Data Services Layer, it provides an infrastructure for the distribution of data over large numbers of heterogeneous sites and multiple platforms.

Measurement Application Layer

The Measurement Application Layer consists of the various MAPs providing services for technical support of management goals. These MAPs may be used for performance monitoring, capacity planning, providing tuning advice, and so on. Amdahl's OpenTune is an example of a MAP. Other examples include the A+UMA Basic Reporter (which provides information in a tabular format, similar to sar), and the A+UMA Scheduler/Manager (which maintains the UMADS).
Data Collection, Reporting, and Recording

UMA distinguishes between the reporting of data to a MAP and the collection of data. A MAP requests data from a specified source to be reported to a specified destination. The Measurement Layer Interface (MLI) transforms these requests into messages and sends them to the UMA facility. The UMA facility acts on behalf of the MAP to perform the actual data collection through the Data Capture Interface. Performance overhead is minimized by reusing collections already in progress for other MAPs that have requested the same performance measurement data.
Command Messages

Performance data is communicated between UMA and the MAP (through the Measurement Layer Interface) using messages. The organization of data into messages provides a well defined format suitable for postprocessing either locally or at another system. This format facilitates the writing of programs for data reduction and data display, and enables the introduction of new data types without requiring program modification.
Data Messages

Data messages (sent to the MAP) contain either interval data or event data.

Certain subclasses have both event and interval forms. This permits the MAP to select whether data is to be reported at each interval end, at an event (for example, the termination of a process), or both. Depending on the destination, a data message may be directed to the MAP itself, to UMADS, or to a private file for later processing.

Other Messages

A MAP may also receive condition (status) messages from UMA. These messages indicate the severity of the problem and include a textual description of the problem (for example, "End of session encountered").
Screening and Filtering Data

UMA provides three means by which the message traffic to the MAP can be reduced. These enable the user to:
  1. Restrict the reporting of classes, subclasses, and data segments (see The Data Pool section).
  2. Establish threshold settings, thereby preventing the transmission of data messages unless the threshold conditions are satisfied (for example, when the runqueue length reaches a particular value).
  3. Adjust the granularity of the collected data (for example, by restricting reporting to a particular process or user id).
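
The three reduction mechanisms listed above can be pictured as successive checks applied to each candidate message. The following Python fragment is only a sketch under stated assumptions: the DataMessage and ReportFilter types, the field names, and the example class and subclass names are all hypothetical and are not taken from the UMA specifications.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataMessage:
    data_class: str        # hypothetical class name, e.g. "processor"
    subclass: str          # hypothetical subclass name, e.g. "runqueue"
    pid: Optional[int]     # None for system-wide data
    values: dict


@dataclass
class ReportFilter:
    # 1. Only these (class, subclass) pairs are reported.
    selected: set
    # 2. Threshold predicates keyed by (class, subclass); no entry means "always report".
    thresholds: dict = field(default_factory=dict)
    # 3. Granularity: if set, only data for these process ids is reported.
    pids: Optional[set] = None

    def passes(self, msg: DataMessage) -> bool:
        key = (msg.data_class, msg.subclass)
        if key not in self.selected:
            return False
        check = self.thresholds.get(key)
        if check is not None and not check(msg.values):
            return False
        if self.pids is not None and msg.pid not in self.pids:
            return False
        return True


# Report run queue data only once the queue length reaches 5.
flt = ReportFilter(
    selected={("processor", "runqueue")},
    thresholds={("processor", "runqueue"): lambda v: v["length"] >= 5},
)
messages = [
    DataMessage("processor", "runqueue", None, {"length": 2}),
    DataMessage("processor", "runqueue", None, {"length": 7}),
    DataMessage("memory", "paging", None, {"faults": 90}),
]
reported = [m for m in messages if flt.passes(m)]
```

In this sketch only the second message survives: the first fails the threshold and the third belongs to a subclass that was not selected.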

UMA Data Storage

As mentioned earlier, UMA provides for the reading and writing of messages to and from conventional private files. In addition, UMA provides UMADS, a common public facility for access and maintenance of historical data.
Seamless Access

UMA provides seamless access between historical and recent data. This means that a MAP may be receiving UMADS historical data until the time reaches the present, at which time UMA automatically switches its source to provide live data (see below). UMA also provides a backwards seek mechanism, so that a MAP can seamlessly access UMADS data from the present time (see below).
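
The switch from historical to live data can be illustrated with a small Python generator. This is a sketch of the idea only, not the A+UMA mechanism: the record layout (interval end time, payload) and the function name seamless are assumptions made for the example.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[int, str]  # (interval end time, payload); layout is hypothetical


def seamless(historical: Iterable[Record],
             live: Iterable[Record],
             start: int) -> Iterator[Record]:
    """Yield stored records from `start` onward, then continue with live data."""
    last = start - 1
    for t, payload in historical:
        if t >= start:
            last = t
            yield t, payload
    for t, payload in live:
        # Skip intervals that were already delivered from storage.
        if t > last:
            last = t
            yield t, payload


stored = [(10, "stored"), (20, "stored"), (30, "stored")]
current = iter([(30, "live"), (40, "live"), (50, "live")])
stream = list(seamless(stored, current, start=20))
```

The consumer iterates over one stream and never has to know which source produced a given record, which is the property the text calls seamless.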

Programming Interface

A MAP accesses the services of the UMA facility by first establishing a session and then issuing Measurement Layer Interface (MLI) calls.
Sessions

A session is a channel of communication over which the MLI sends messages to UMA to set up and control the reporting of data and to receive status and data messages. Each session has an associated data source, a data destination, and property flags that specify certain fixed characteristics of the session; these constitute the session context. A session also has certain changeable attributes. These include session start time, session end time, reporting priority, reporting interval size, and search-limiting flags (for example, to limit reporting to UMADS only).
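
One way to picture the split between the fixed session context and the changeable attributes is as an immutable record paired with a mutable one. The Python sketch below is illustrative only; the type names, field names, and default values are assumptions and do not come from the MLI specification.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class SessionContext:
    """Fixed for the life of the session."""
    source: str
    destination: str
    property_flags: frozenset = frozenset()


@dataclass
class SessionAttributes:
    """May be changed while the session is open."""
    start_time: Optional[int] = None
    end_time: Optional[int] = None
    priority: int = 0            # hypothetical default
    interval_seconds: int = 60   # hypothetical default
    umads_only: bool = False     # example of a search-limiting flag


@dataclass
class Session:
    session_id: int
    context: SessionContext
    attrs: SessionAttributes = field(default_factory=SessionAttributes)


session = Session(1, SessionContext(source="hostA", destination="map"))
session.attrs.interval_seconds = 10   # allowed: attributes are changeable
```

Assigning to session.context.source would raise an error in this sketch, mirroring the rule that the context is fixed once the session exists.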
Communicating With the MLI

The following table presents a typical sequence of MLI function calls. It should be noted, however, that different MAPs have different purposes, and there can be many variations to this sequence. For more information on MLI, refer to the PMWG Specification for the UMA Measurement Layer Interface.

The Specification for the UMA Measurement Layer Interface document can be obtained from the document server archive amdahl.com via anonymous ftp under pub/uma or from tarpon.instrumental.com via anonymous ftp under pub/pmwg.

 Description                                         MLI function

 Open a session, and return a session id which       umaCreate
 is used in later calls to identify the session.

 Obtain system and UMA configuration                 umaRequestConfig
 information (this shows what classes and
 subclasses are available).

 Specify or change session attributes.               umaSetAttr

 Establish threshold values.                         umaSetThreshold

 Start collecting the required data.                 umaStart

 Release any held starts. By default, when a         umaRelease
 session is created, the data reporting is held
 until a umaRelease call.

 Return the next message (used when the              umaGetMsg
 destination is the MAP).

 Change the time at which the next data interval     umaSeek
 is reported.

 Stop reporting specified data. Data collection      umaStop
 continues if other sessions collect the same data.

 Close a session.                                    umaClose

 Establish the control connection to a previously    umaReconnect
 closed "non-terminating" session.
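
The ordering of these calls, and in particular the held-start behavior, can be illustrated with a toy model. The Python class below is NOT the MLI: it borrows the function names from the table for readability, but every argument list, return value, and the tick helper are invented for this sketch. Refer to the PMWG specification for the real interface.

```python
from collections import deque


class MockSession:
    """Toy stand-in for a UMA session; not the real MLI."""

    def __init__(self, session_id):
        self.session_id = session_id
        self.attrs = {}
        self.held = []        # starts issued but not yet released
        self.active = set()   # (class, subclass) pairs being reported
        self.queue = deque()  # messages waiting for the MAP


class MockUma:
    """Hypothetical model of the call sequence; signatures are assumptions."""

    def __init__(self):
        self._next_id = 1
        self.sessions = {}

    def umaCreate(self):
        s = MockSession(self._next_id)
        self.sessions[s.session_id] = s
        self._next_id += 1
        return s.session_id

    def umaSetAttr(self, sid, **attrs):
        self.sessions[sid].attrs.update(attrs)

    def umaStart(self, sid, data_class, subclass):
        # Starts are held until umaRelease is called.
        self.sessions[sid].held.append((data_class, subclass))

    def umaRelease(self, sid):
        s = self.sessions[sid]
        s.active.update(s.held)
        s.held.clear()

    def tick(self, t):
        # Invented helper: simulate the end of one reporting interval.
        for s in self.sessions.values():
            for key in sorted(s.active):
                s.queue.append((t, key))

    def umaGetMsg(self, sid):
        q = self.sessions[sid].queue
        return q.popleft() if q else None

    def umaStop(self, sid, data_class, subclass):
        self.sessions[sid].active.discard((data_class, subclass))

    def umaClose(self, sid):
        del self.sessions[sid]


uma = MockUma()
sid = uma.umaCreate()
uma.umaSetAttr(sid, interval_seconds=10)
uma.umaStart(sid, "processor", "runqueue")
uma.tick(10)
before_release = uma.umaGetMsg(sid)   # nothing yet: the start is still held
uma.umaRelease(sid)
uma.tick(20)
first = uma.umaGetMsg(sid)            # first reported interval
uma.umaStop(sid, "processor", "runqueue")
uma.umaClose(sid)
```

The point of the model is the ordering: a start has no visible effect until the release, after which each interval end produces a message that the MAP retrieves one at a time.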


The Data Pool

As already stated, messages are used to transport performance metrics to MAPs. The formats of these messages are defined in the Data Pool specification (Requirements for a Performance Measurement Data Pool).
(The Requirements for a Performance Measurement Data Pool document can be obtained from the document server archive amdahl.com via anonymous ftp under pub/uma or from tarpon.instrumental.com via anonymous ftp under pub/pmwg.)
The Data Pool groups the data into classes and subclasses. Each data class can have several subclasses. The class identifies the major grouping (memory, processor, and so on) and the subclass provides a specific grouping within the class (virtual memory usage, block I/O counters, and so on).
The Data Pool specification describes three conceptual data segments or groupings within a data subclass. These segments are the smallest selectable units of data and enable the MAP to limit data requests to subsets within a subclass.
  1. Basic - This is a segment of universally supplied data for the subclass as defined by the Data Pool. Every implementation of UMA must supply this.
  2. Optional - This is a segment of data whose structure and contents are defined by the Data Pool, but this segment may or may not be present in a particular implementation.
  3. Extension - This may be present. It is data specific to a vendor's hardware or software implementation.
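
The class, subclass, and segment hierarchy can be sketched as a small data structure. The Python below is an illustration under stated assumptions: the Segment and Subclass types and all metric names are hypothetical and do not reproduce any definition from the Data Pool specification.

```python
from dataclasses import dataclass
from enum import Enum


class Segment(Enum):
    BASIC = "basic"          # must be supplied by every implementation
    OPTIONAL = "optional"    # layout defined by the Data Pool, may be absent
    EXTENSION = "extension"  # vendor-specific, may be absent


@dataclass
class Subclass:
    name: str
    segments: dict  # Segment -> list of metric names (names are hypothetical)

    def __post_init__(self):
        if Segment.BASIC not in self.segments:
            raise ValueError("every subclass must supply the basic segment")

    def select(self, wanted):
        """Return only the requested segments that this implementation has."""
        return {seg: self.segments[seg] for seg in wanted if seg in self.segments}


vm_usage = Subclass(
    "virtual memory usage",
    {
        Segment.BASIC: ["pages_in", "pages_out"],
        Segment.EXTENSION: ["vendor_cache_hits"],
    },
)
chosen = vm_usage.select([Segment.BASIC, Segment.OPTIONAL])
```

Here the request for the optional segment is silently dropped because this hypothetical implementation does not provide it, while the basic segment is always available.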

The UMA Advantage

The fundamental design considerations of UMA facilities provide the benefits of
This design presents UMA as a viable standard in a distributed UNIX system environment which may include high-performance mainframes, minicomputers, and workstations.
Features                                               Benefits


Part II - A+UMA - The First Product Implementation

This section describes the structure of the Amdahl data collection and management product, A+UMA, the first product implementation of UMA. A+UMA is an application that runs in user space and gathers kernel data (see below). A+UMA provides:

A+UMA Structure

Data management and maintenance is performed by the Data Services Layer (DSL), which is composed of the following:
  • DSL Daemon, which establishes DCL collectors and a dedicated DSL session component for each active MAP session.
  • DSL session component, each instance of which provides data management for a single session.
  • UMADS, which maintains historical UMA data.

DSL Daemon

The DSL daemon is responsible for:
  • Session authorization and set-up
  • Detection of aborted DCL and DSL session processes
  • Error logging and recovery

DSL Session Components

The DSL session components manage and support sessions and the collection and reporting of recent and historical data. Once started, a DSL session component responds to calls from the associated MAP to initiate the collection and reporting of system performance data.
Session Management

The DSL session component supports the establishment and maintenance of sessions with MAPs. It performs the following functions:
  • Initializes the session.
  • Maintains MAP start/stop requests for the session.
  • Maintains the current state of the session (such as the session current time pointer).
  • Closes the session and terminates processing.
  • Determines if the requested data is in the recent data facility or in UMADS, and accesses it accordingly.
  • Selects the subset of classes/subclasses that have been requested by the MAP.
  • Sends the selected classes/subclasses to the destination specified by the MAP.
Recent Data Component

The DSL session keeps data for the most recent intervals. The capacity of the recent data facility can be controlled using parameters specified in a configuration file. Data for times prior to those found in recent data can be searched for in UMADS (refer to the earlier section, Seamless Access).
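
A bounded buffer with a fallback to long-term storage captures the behavior described above. The Python sketch below is an assumption-laden illustration, not the A+UMA implementation: RecentData, fetch, and the dictionary standing in for UMADS are all invented names.

```python
from collections import deque


class RecentData:
    """Keeps only the most recent `capacity` intervals (hypothetical model)."""

    def __init__(self, capacity):
        self._buf = deque(maxlen=capacity)  # oldest interval is dropped when full

    def add(self, t, payload):
        self._buf.append((t, payload))

    def lookup(self, t):
        for ts, payload in self._buf:
            if ts == t:
                return payload
        return None


def fetch(t, recent, umads):
    """Try recent data first, then fall back to the historical store."""
    payload = recent.lookup(t)
    if payload is not None:
        return "recent", payload
    return "umads", umads.get(t)


recent = RecentData(capacity=2)   # in A+UMA the capacity comes from a configuration file
umads = {}                        # a plain dict stands in for UMADS here
for t in (10, 20, 30):
    recent.add(t, f"data@{t}")
    umads[t] = f"data@{t}"

newest = fetch(30, recent, umads)
oldest = fetch(10, recent, umads)
```

With a capacity of two intervals, the request for time 30 is served from recent data, while the request for time 10 has aged out and is served from the historical store.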
Historical Data Management (UMADS)

The historical data management component allows read-only access to historical data files through the MLI. Only one specific, privileged MAP is allowed to write to these data files. This component locates data from UMADS and presents it to the DSL session component for further processing. The DSL session component then forwards the data to the requesting MAP.
Collecting the Data

Data collection is performed by the DCL, which consists of a set of collectors individually started and terminated by the DSL daemon. One DCL collector is started for each unique interval granularity requested. A collector can serve multiple sessions (those sessions with the same requested interval size). When the number of session DSLs requesting a specific interval becomes zero, the DSL daemon terminates the related DCL collector.
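
The collector lifecycle described above amounts to reference counting keyed by interval size. The Python sketch below shows that bookkeeping only; CollectorRegistry and its method names are hypothetical and real collectors are separate processes, which this model does not attempt to represent.

```python
class CollectorRegistry:
    """Tracks which sessions share a collector for each interval size."""

    def __init__(self):
        self._sessions_by_interval = {}  # interval in seconds -> set of session ids

    def attach(self, session_id, interval):
        """Register a session; return True if a new collector had to be started."""
        sessions = self._sessions_by_interval.get(interval)
        started = sessions is None
        if started:
            sessions = self._sessions_by_interval[interval] = set()
        sessions.add(session_id)
        return started

    def detach(self, session_id, interval):
        """Remove a session; return True if its collector was terminated."""
        sessions = self._sessions_by_interval[interval]
        sessions.discard(session_id)
        if not sessions:
            del self._sessions_by_interval[interval]
            return True
        return False

    def running(self):
        return sorted(self._sessions_by_interval)


registry = CollectorRegistry()
events = [
    registry.attach(1, 60),   # first 60-second session starts a collector
    registry.attach(2, 60),   # second one shares it
    registry.attach(3, 10),   # different interval needs its own collector
    registry.detach(1, 60),   # collector still has a user
    registry.detach(2, 60),   # last user gone, collector terminated
]
remaining = registry.running()
```

Sharing one collector per interval size is what keeps overhead flat as more sessions request the same granularity.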

Reliability, Availability, Serviceability

The A+UMA DSL brings reliability, availability, and serviceability to the process of data measurement through special features designed to ensure the continued collection of data even after a system crash.
Reliability
The UMADS component is able to reconstruct its table of contents if accidental deletion or corruption of the table occurs.
Availability
The DSL daemon writes information about its current internal state into a state file. When a watchdog process restarts the DSL daemon after an unexpected termination, this information is used to initiate "clean-up" actions.
Serviceability
Two features ensure a high level of serviceability:
  1. The DSL daemon maintains a log file containing all error messages. This file is available to system administration personnel for diagnostic purposes.
  2. A+UMA comes with a configuration file, which can be used by the system administrator to configure A+UMA according to the particular needs and constraints of the system.

Administration Features

A+UMA provides administrative commands for:
  • Starting and shutting down A+UMA
  • Displaying information about active sessions and collectors
  • Initiating or modifying UMADS collections
  • Removing a UMADS table of contents entry and the corresponding data from UMADS
  • Listing the current UMADS table of contents
  • Maintaining a UMADS schedule
  • Producing tabular reports on UMADS data.

Summary

What Does A+UMA Solve?

Users want tools to provide comprehensive performance management, both to optimize their systems, and to reduce costs. A+UMA provides:
  • Consolidation - the facilities available from the multiple tools in use today are consolidated by A+UMA into one measurement facility, providing statistics, alarm data, and event data. This simplifies use of the tools, and provides synchronized data for greater accuracy.
  • Single Collection Facility - providing data to other programs or users. This prevents resource wastage and data duplication, which occur when the same data is collected independently by several utilities.
  • Isolated Data Collection - the data collection process is isolated from the programs requesting the data. A standardized data collection interface encourages the development of additional data collectors for alternative platforms and capacity planning tools.
  • Historical Data - can be viewed over an extended period, determined by the user. Trend analysis requires the ability to adjust the observed period over a wide range, subject to the variables being measured.

What Does A+UMA Provide?

A+UMA provides the information necessary to run mission-critical applications at optimum performance. With this information, it is possible to:
  • Monitor the System - and so determine the performance levels being achieved and perform the detailed analysis to isolate and remove bottlenecks. It is important to be able to observe current and historical data, either separately, or at the same time.
  • Plan for System Upgrades - analyze the historical data for the current system and so extrapolate future growth requirements.

Amdahl and UTS are registered trademarks and the A+ logo and A+UMA are trademarks of Amdahl Corporation. UNIX is a registered trademark in the U.S. and other countries, licensed exclusively to X/Open Company Limited. All other trademarks are the property of their respective owners.
(c)1994 Amdahl Corporation. All rights reserved.