Guidelines: Metrics
- Metrics must be simple, objective, easy to collect, easy to interpret, and
hard to misinterpret.
- Metrics collection must be automated and non-intrusive, i.e., not
interfere with the activities of the developers.
- Metrics must contribute to quality assessment early in the lifecycle, when
efforts to improve software quality are effective.
- Absolute metric values and trends must be actively used by both management
and engineering personnel to communicate progress and quality in a
consistent format.
- The selection of a minimal or more extensive set of metrics depends on
the project's characteristics and context: if the project is large or has
stringent safety or reliability requirements, and the development and
assessment teams are knowledgeable about metrics, then it may be useful to
collect and analyze the technical metrics. The contract may require certain
metrics to be collected, or the organization may be trying to improve its
skills and processes in particular areas. There is no simple answer to fit
all circumstances; the Project Manager must select what is appropriate when
the Measurement Plan is produced. When introducing a metrics program for the
first time, though, it is sensible to err on the side of simplicity.
Metrics for certain aspects of the project include:
- Progress in terms of size and complexity.
- Stability in terms of rate of change in the requirements or
implementation, size, or complexity.
- Modularity in terms of the scope of change.
- Quality in terms of the number and type of errors.
- Maturity in terms of the frequency of errors.
- Resources in terms of project expenditure versus planned expenditure.
Trends are important: they are often more revealing to monitor than any
absolute value at a single point in time.
Each metric is listed below with its purpose in parentheses, followed by its
sample measures/perspectives:

Progress (iteration planning; completeness):
- Number of classes
- SLOC
- Function points
- Scenarios
- Test cases
  (these measures may also be collected by class and by package)
- Amount of rework per iteration (number of classes)

Stability (convergence):
- Number and type of changes (bug vs. enhancement; interface vs.
  implementation)
  (this measure may also be collected by iteration and by package)
- Amount of rework per iteration

Adaptability (convergence; software "rework"):
- Average person-hours per change
  (this measure may also be collected by iteration and by package)

Modularity (convergence; software "scrap"):
- Number of classes/categories modified per change
  (this measure may also be collected by iteration)

Quality (iteration planning; rework indicator; release criterion):
- Number of errors
- Defect discovery rate
- Defect density
- Depth of inheritance
- Class coupling
- Size of interface (number of operations)
- Number of methods overridden
- Method size
  (these measures may also be collected by class and by package)

Maturity (test coverage/adequacy; robustness for use):
- Test hours/failure and type of failure
  (this measure may also be collected by iteration and by package)

Expenditure profile (financial insight; planned vs. actual):
- Person-days/class
- Full-time staff per month
- % budget expended
This example is extracted from Software Project Management: A Unified
Framework [ROY98]. It represents the minimal set of metrics that are
necessary from a Project Management viewpoint. A more extensive set may be
found in A Complete Metrics Set, which also includes some OO-specific
technical metrics that are widely agreed to contribute to product quality.
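Because trends matter more than point values, it can help to reduce each per-iteration metric series to a slope. Below is a minimal sketch of that idea (not from [ROY98]); the function name and sample data are illustrative only.

```python
# Minimal sketch: least-squares slope of a metric sampled once per
# iteration, so that the direction of change, not the absolute value,
# drives the discussion.
def trend(values):
    """Return the least-squares slope of values over iterations 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# Example: defect discovery rate per iteration; a negative slope
# suggests the product is converging.
print(trend([14, 11, 9, 6, 4]))  # -2.5
```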
Metric Primitives
- Total SLOC: SLOCt = total size of the code
- SLOC under configuration control: SLOCc = current baseline
- Critical defects: SCO0 = number of type 0 SCOs
- Normal defects: SCO1 = number of type 1 SCOs
- Improvement requests: SCO2 = number of type 2 SCOs
- New features: SCO3 = number of type 3 SCOs
- Number of SCOs: N = SCO0 + SCO1 + SCO2
- Open rework (breakage): B = cumulative broken SLOC due to SCO1 and SCO2
- Closed rework (fixes): F = cumulative fixed SLOC
- Rework effort: E = cumulative effort expended fixing type 0/1/2 SCOs
- Usage time: UT = hours that a given baseline has been operating under
  realistic usage scenarios
Quality Metrics for the End-Product
From this small set of metrics, some more interesting metrics can be derived:
- Scrap ratio: B/SLOCt, percentage of product scrapped
- Rework ratio: E/(total effort), percentage of rework effort
- Modularity: B/N, average breakage per SCO
- Adaptability: E/N, average effort per SCO
- Maturity: UT/(SCO0 + SCO1), mean time between defects
- Maintainability: (scrap ratio)/(rework ratio), maintenance productivity
In-progress Indicators
- Rework stability: B - F, breakage versus fixes over time
- Rework backlog: (B - F)/SLOCc, currently open rework
- Modularity trend: modularity over time
- Adaptability trend: adaptability over time
- Maturity trend: maturity over time
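To make the definitions above concrete, here is a minimal sketch that computes the derived end-product metrics and in-progress indicators directly from the primitives. The record layout and names are assumptions for illustration, not part of any tool.

```python
from dataclasses import dataclass

@dataclass
class Primitives:
    sloc_total: int       # SLOCt, total size of the code
    sloc_baseline: int    # SLOCc, current baseline
    sco0: int             # critical defects
    sco1: int             # normal defects
    sco2: int             # improvement requests
    breakage: int         # B, cumulative broken SLOC
    fixes: int            # F, cumulative fixed SLOC
    rework_effort: float  # E, effort fixing type 0/1/2 SCOs
    total_effort: float
    usage_hours: float    # UT, baseline operating hours

def derived_metrics(p: Primitives) -> dict:
    n = p.sco0 + p.sco1 + p.sco2               # N, number of SCOs
    scrap_ratio = p.breakage / p.sloc_total
    rework_ratio = p.rework_effort / p.total_effort
    return {
        "scrap_ratio": scrap_ratio,
        "rework_ratio": rework_ratio,
        "modularity": p.breakage / n,          # average breakage per SCO
        "adaptability": p.rework_effort / n,   # average effort per SCO
        "maturity": p.usage_hours / (p.sco0 + p.sco1),
        "maintainability": scrap_ratio / rework_ratio,
        "rework_stability": p.breakage - p.fixes,
        "rework_backlog": (p.breakage - p.fixes) / p.sloc_baseline,
    }
```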
The things to be measured are:
- the Process - the sequence of activities invoked to produce the software
product (and other artifacts);
- the Product - the artifacts of the process, including software,
documents and models;
- the Project - the totality of project resources, activities and
artifacts;
- the Resources - the people, methods and tools, time, effort and budget,
available to the project.
To completely characterize the process, measurements should be made at the
lowest level of formally planned activity. Activities will be planned by the
Project Manager using an initial set of estimates. A record should then be kept
of actual values over time and any updated estimates that are made.
- Duration: elapsed time for the activity.
- Effort: staff effort units (staff-hours, staff-days, and so on).
- Output: artifacts and their size and quantity (note that this will include
  defects as an output of test activities).
- Software development environment usage: CPU, storage, software tools,
  equipment (workstations, PCs), disposables. Note that these may be
  collected for a project by the Software Engineering Environment Authority
  (SEEA).
- Defects, discovery rate, correction rate: total repair time/effort and
  total scrap/rework (where this can be measured) also need to be collected;
  these will probably come from information collected against the defects
  (considered as artifacts).
- Change requests, imposition rate, disposal rate: comments as above on
  time/effort.
- Other incidents that may have a bearing on these metrics (freeform text):
  this is a metric in that it is a record of an event that affected the
  process.
- Staff numbers, profile (over time) and characteristics.
- Staff turnover: a useful metric which may explain at a post-mortem review
  why a process went particularly well, or badly.
- Effort application: the way effort is spent during the performance of the
  planned activities (against which time is formally recorded for cost
  account management) may help explain variations in productivity. Some
  subclasses of effort application are, for example:
  - training
  - familiarization
  - management (by a team lead, for example)
  - administration
  - research
  - productive work: it is helpful to record this by artifact, and to
    attempt a separation of 'think' time and capture time, particularly for
    documents; this will tell the Project Manager how much of an imposition
    the documentation process is on the engineer's time
  - lost time
  - meetings
  - inspections, walkthroughs, reviews: preparation and meeting effort (some
    of these will be separate activities, and time and effort for them will
    be recorded against a specific review activity)
- Inspections, walkthroughs, reviews (during an activity, not separately
  scheduled reviews): record the numbers of these and their duration, and
  the numbers of issues raised.
- Process deviations (raised as non-compliances, requiring project change):
  record the numbers of these and their severity. This is an indicator that
  more education may be required, that the process is being misapplied, or
  that the process was configured incorrectly.
- Process problems (raised as process defects, requiring process change):
  record the number of these and their severity. This will be useful
  information at post-mortem reviews and is essential feedback for the
  Software Engineering Process Authority (SEPA).
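One possible way to capture these per-activity measures is a simple record, sketched below. The field names are hypothetical, not a RUP-mandated schema.

```python
from dataclasses import dataclass, field

@dataclass
class ActivityMeasures:
    activity: str
    duration_days: float        # elapsed time for the activity
    effort_staff_days: float    # staff effort units
    outputs: dict = field(default_factory=dict)  # artifact -> size/quantity
    defects_found: int = 0
    defects_corrected: int = 0
    change_requests_raised: int = 0
    change_requests_disposed: int = 0
    reviews: int = 0            # in-activity inspections/walkthroughs
    review_issues: int = 0
    incidents: list = field(default_factory=list)  # freeform notes
```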
The products in the Rational Unified Process are the artifacts,
which are documents, models or model elements. The models are collections of
like things (the model elements) so the recommended metrics are listed here with
the models to which they apply: it is usually obvious if a metric applies to the
model as a whole, or an element. Explanatory text is provided where this is not
clear.
Artifact Characteristics
In general, the characteristics we are interested in measuring are the
following:
- Size - a measure of the number of things in a model,
the length of something, the extent or mass of something
- Quality
- Defects - indications that an artifact does not perform as specified
or is not compliant with its specification, or has other undesirable
characteristics
- Complexity - a measure of the intricacy of a structure or algorithm:
the greater the complexity, the more difficult a structure is to
understand and modify, and there is evidence that complex structures
are more likely to fail
- Coupling - a measure of how extensively elements of a system are
interconnected
- Cohesion - a measure of how well an element or component meets the
requirement of having a single, well-defined purpose
- Primitiveness - the degree to which operations or methods of a class
can be composed from others offered by the class
- Completeness - a measure of the extent to which an
artifact meets all requirements (stated and implied - the Project Manager
should strive to make explicit as much as possible, to limit the risk of
unfulfilled expectations). We have not chosen here to distinguish between sufficient
and complete.
- Traceability - an indication that the requirements at
one level are being satisfied by artifacts at a lower level, and, looking
the other way, that an artifact at any level has a reason to exist
- Volatility - the degree of change or churn in an
artifact because of defects or changing requirements
- Effort - a measure of the work (staff-time units) that
is required to produce an artifact
Not all of these characteristics apply to all artifacts: the relevant ones
are elaborated with the particular artifact in the following tables. Where
several metrics are listed against a characteristic, all are potentially of
interest, because they give a complete description of the characteristic from
several viewpoints. For example, when considering the traceability of Use Cases,
ultimately all have to be traceable to a (tested) implementation model: in the
interim, it will still be of interest to the Project Manager to know how many
Use Cases can be traced to the Analysis Model, as a measure of progress.
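As an illustration of traceability as a progress measure, the sketch below reports the fraction of Use Cases that can be traced into each downstream model. The data layout is hypothetical.

```python
def trace_coverage(traces, models=("analysis", "design",
                                   "implementation", "test")):
    """traces: use-case name -> set of models in which it is realized."""
    total = len(traces)
    return {m: sum(1 for t in traces.values() if m in t) / total
            for m in models}

traces = {
    "Withdraw Cash":  {"analysis", "design"},
    "Check Balance":  {"analysis"},
    "Transfer Funds": set(),
}
print(trace_coverage(traces))
# analysis 0.67, design 0.33, implementation 0.0, test 0.0
```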
Documents
The recommended metrics apply to all the Rational Unified Process documents.
- Size: page count
- Effort: staff-time units for production, change and repair
- Volatility: numbers of changes and defects, opened and closed; changed pages
- Quality: measured directly through defect count
- Completeness: not measured directly; judgment made through review
- Traceability: not measured directly; judgment made through review
Models
Requirements
Requirements Attributes
This is actually a model element.
- Size:
  - Number of requirements in total (= Nu + Nd + Ni + Nt)
  - Number to be traced to use cases (= Nu)
  - Number to be traced to design, implementation and test only (= Nd)
  - Number to be traced to implementation and test only (= Ni)
  - Number to be traced to test only (= Nt)
  Note that this partitions the requirements into those that will be modeled
  by Use Cases and those that will not. The expectation is that Use Case
  traceability will account for those requirements assigned to Use Cases, to
  track design, implementation and test.
- Effort: staff-time units (with production, change and repair separated)
- Volatility: number of defects and change requests (open, closed)
- Quality: number of defects, by severity, open and closed
- Traceability: (none listed)
Use-Case Model
- Size:
  - Number of Use Cases
  - Number of Use Case Packages
  - Reported level of Use Case (see the white paper "The Estimation of
    Effort and Size based on Use Cases" from the Resource Center)
  - Number of scenarios, total and per Use Case
  - Number of actors
  - Length of Use Case (pages of event flow, for example)
- Effort: staff-time units (with production, change and repair separated)
- Volatility: number of defects and change requests (open, closed)
- Quality:
  - Reported complexity (0-5, by analogy with COCOMO [BOE81], at class
    level; the complexity range is narrower at higher levels of abstraction
    - see the white paper "The Estimation of Effort and Size based on Use
    Cases" from the Resource Center)
  - Number of defects, by severity, open and closed
- Completeness:
  - Use Cases completed (reviewed and under configuration management with no
    defects outstanding)/Use Cases identified (or the estimated number of
    Use Cases)
  - Requirements-to-Use-Case traceability (from Requirements Attributes)
- Traceability:
  - Analysis: scenarios realized in the analysis model/total scenarios
  - Design: scenarios realized in the design model/total scenarios
  - Implementation: scenarios realized in the implementation model/total
    scenarios
  - Test: scenarios realized in the test model (test cases)/total scenarios
Design
Analysis Model
- Size:
  - Number of classes
  - Number of subsystems
  - Number of subsystems of subsystems, and so on
  - Number of packages
  - Methods per class, internal and external
  - Attributes per class, internal and external
  - Depth of inheritance tree
  - Number of children
- Effort: staff-time units (with production, change and repair separated)
- Volatility: number of defects and change requests (open, closed)
- Quality:
  - Complexity: Response For a Class (RFC); this may be difficult to
    calculate because a complete set of interaction diagrams is needed
  - Coupling: number of children; coupling between objects (class fan-out)
  - Cohesion: (none listed)
  - Defects: number of defects, by severity, open and closed
- Completeness: number of classes completed/number of classes estimated
  (identified); analysis traceability (in the Use-Case Model)
- Traceability: not applicable - the analysis model becomes the design
  model.
Here we see some OO-specific technical metrics that may be unfamiliar - depth
of inheritance tree, number of children, response for a class, coupling between
objects, and so on. See [HEND96] for
details of the meaning and history of these metrics. Several of these metrics
were originally suggested by Chidamber and Kemerer (see "A metrics suite
for object oriented design", IEEE Transactions on Software Engineering,
20(6), 1994), but we have applied them here as suggested in [HEND96]
and have preferred the definition of LCOM (lack of cohesion in methods)
presented in that work.
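For illustration, here is a minimal sketch of two of these inheritance metrics, computed from a child-to-parent map extracted from a model. The input format is an assumption, and the exact definitions vary slightly between [HEND96] and the original paper.

```python
def depth_of_inheritance(cls, parent):
    """DIT: number of ancestors between cls and the root."""
    depth = 0
    while cls in parent:          # walk up to the root class
        cls, depth = parent[cls], depth + 1
    return depth

def number_of_children(cls, parent):
    """NOC: number of immediate subclasses of cls."""
    return sum(1 for p in parent.values() if p == cls)

parent = {"Savings": "Account", "Checking": "Account", "Account": "Object"}
print(depth_of_inheritance("Savings", parent))  # 2
print(number_of_children("Account", parent))    # 2
```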
Design Model
- Size:
  - Number of classes
  - Number of design subsystems
  - Number of subsystems of subsystems, and so on
  - Number of packages
  - Methods per class, internal and external
  - Attributes per class, internal and external
  - Depth of inheritance tree
  - Number of children
- Effort: staff-time units (with production, change and repair separated)
- Volatility: number of defects and change requests (open, closed)
- Quality:
  - Complexity: Response For a Class (RFC); this may be difficult to
    calculate because a complete set of interaction diagrams is needed
  - Coupling: number of children; coupling between objects (class fan-out)
  - Cohesion: (none listed)
  - Defects: number of defects, by severity, open and closed
- Completeness: (none listed)
- Traceability: number of classes in the Implementation Model/number of
  classes
Implementation
Implementation Model
- Size:
  - Number of classes
  - Number of components
  - Number of implementation subsystems
  - Number of subsystems of subsystems, and so on
  - Number of packages
  - Methods per class, internal and external
  - Attributes per class, internal and external
  - Size of methods*
  - Size of attributes*
  - Depth of inheritance tree
  - Number of children
  - Estimated size* at completion
- Effort: staff-time units (with production, change and repair separated)
- Volatility:
  - Number of defects and change requests (open, closed)
  - Breakage* for each corrective or perfective change, estimated (prior to
    fix) and actual (upon closure)
- Quality:
  - Complexity: Response For a Class (RFC); cyclomatic complexity of
    methods**
  - Coupling: number of children; coupling between objects (class fan-out);
    message passing coupling (MPC)***
  - Cohesion: number of children; lack of cohesion in methods (LCOM)
  - Defects: number of defects, by severity, open and closed
- Completeness: (none listed)
* Some method of measuring code size should be chosen and then consistently
applied, for example counting non-comment, non-blank lines. See [ROY98]
for a discussion of the merits and application of 'lines of code' as a metric.
Also see the same reference for the definition of 'breakage'.
** The use of cyclomatic complexity is not universally accepted -
particularly when applied to the methods of a class. See [HEND96]
for a discussion of this metric.
*** Originally from Li and Henry, "Object-oriented metrics that predict
maintainability", J. Systems and Software, 23(2), 1993, and also described
in [HEND96].
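As a rough illustration of the cyclomatic complexity footnote, the sketch below counts decision points in Python source using the standard ast module. Counting rules differ between tools (boolean operators, for instance, are only approximated here), so treat this as a sketch, not a reference implementation.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: decision points + 1."""
    branch_nodes = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)
    tree = ast.parse(source)
    return 1 + sum(isinstance(n, branch_nodes) for n in ast.walk(tree))

src = """
def classify(x):
    if x < 0:
        return "negative"
    for digit in str(x):
        if digit == "7":
            return "lucky"
    return "plain"
"""
print(cyclomatic_complexity(src))  # 4: three decisions + one path
```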
Test
Test Model
- Size: number of Test Cases, Test Procedures and Test Scripts
- Effort: staff-time units (with production, change and repair separated)
  for the production of test cases, etc.
- Volatility: number of defects and change requests (open, closed) against
  the test model
- Quality: number of defects, by severity, open and closed (these are
  defects raised against the test model itself, not defects raised by the
  test team against other software)
- Completeness: (none listed)
- Traceability: number of Test Cases reported as successful in the Test
  Evaluation Summary/number of Test Cases
Management
Change Model (this is a notional model for consistent presentation - the
metrics will be collected from whatever system is used to manage Change
Requests)
- Size: number of defects and change requests by severity and status, also
  categorized as the number of perfective, adaptive and corrective changes
- Effort: defect repair effort and change implementation effort, in
  staff-time units
- Volatility: breakage (estimated, actual) for the implementation model
  subset
- Completeness: number of defects discovered/number of defects predicted (if
  a reliability model is used)
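A small sketch of how the Change Model "size" measures might be tallied from exported change records; the record fields are hypothetical, since the actual source is whatever change-management system is in use.

```python
from collections import Counter

changes = [
    {"category": "corrective", "severity": "critical", "status": "open"},
    {"category": "perfective", "severity": "normal",   "status": "closed"},
    {"category": "adaptive",   "severity": "normal",   "status": "open"},
]

by_category = Counter(c["category"] for c in changes)
open_by_severity = Counter(c["severity"] for c in changes
                           if c["status"] == "open")
print(by_category)       # corrective: 1, perfective: 1, adaptive: 1
print(open_by_severity)  # critical: 1, normal: 1
```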
Project Plan (section 4.2 of the
Software Development Plan)
These are measures that come from the application of Earned Value
Techniques to project management; together they are called Cost/Schedule
Control Systems Criteria (C/SCSC). Included are:
- BCWS - Budgeted Cost for Work Scheduled
- BCWP - Budgeted Cost for Work Performed
- ACWP - Actual Cost of Work Performed
- BAC - Budget at Completion
- EAC - Estimate at Completion
- CBB - Contract Budget Base
- LRE - Latest Revised Estimate (EAC)
and derived factors for cost variance, schedule variance etc. See [ROY98]
for a discussion of the application of an earned value approach to software
project management.
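The usual derived factors are sketched below. The formulas (cost and schedule variance, the CPI and SPI indices, and a CPI-based estimate at completion) are the standard earned-value ones, shown only to make the acronyms concrete.

```python
def earned_value(bcws, bcwp, acwp, bac):
    cpi = bcwp / acwp                      # cost performance index
    return {
        "cost_variance": bcwp - acwp,      # negative: over cost
        "schedule_variance": bcwp - bcws,  # negative: behind schedule
        "cpi": cpi,
        "spi": bcwp / bcws,                # schedule performance index
        "eac": acwp + (bac - bcwp) / cpi,  # estimate at completion
    }

print(earned_value(bcws=100.0, bcwp=90.0, acwp=120.0, bac=500.0))
# CV = -30, SV = -10, CPI = 0.75, SPI = 0.9, EAC ~= 666.7
```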
The project needs to be characterized in terms of type, size, complexity and
formality (although type, size and complexity usually determine formality),
because these aspects will condition expectations about various thresholds for
lower level measures. Other constraints should be captured in the contract (or
specifications). Metrics derived from the process, product and resources will
capture all other project level metrics. Project type and domain can be recorded
using a text description, making sure that there is enough detail to accurately
characterize the project. Record the project size by cost, effort, duration,
size of code to be developed, function points to be delivered. The project's
complexity can be described - somewhat subjectively - by placing the project on
a chart showing technical and management complexity relative to other completed
projects. [ROY98], Figure 14-1 shows such a
diagram.
The derived metrics described in [ROY98],
which are the Project Manager's main indicators, can be obtained from the
metrics gathered for product and process. These are:
- Modularity = average breakage (NCNB*) per perfective or corrective change
on the implementation model
- Adaptability = average effort per perfective or corrective change on the
implementation model
- Maturity = active test time/number of corrective changes
- Maintainability = maintenance productivity/development productivity
= [actual cumulative fixes/cumulative effort for perfective and corrective
changes]/[estimated NCNB at completion/estimated production effort at
completion]
- Rework stability = cumulative breakage - cumulative fixes
- Rework backlog = (cumulative breakage - cumulative fixes)/NCNB unit
tested
* NCNB is non-comment, non-blank code size.
Progress should be reported from the project plan, whose status is assessed
using artifact completion metrics, with particular weight (from an earned
value perspective) given to the production of working software.
If an estimation model such as COCOMO (see [BOE81])
is used, the various scale factors and cost drivers should be recorded. These
form a quite detailed characterization of the project.
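For reference, a hedged sketch of the Intermediate COCOMO form from [BOE81]: effort in staff-months is a * KDSI^b, multiplied by the product of the cost-driver ratings (the effort adjustment factor). The coefficients below are the commonly quoted "organic mode" values; verify against [BOE81] before relying on them.

```python
import math

def cocomo_effort(kdsi, cost_drivers, a=3.2, b=1.05):
    """Intermediate COCOMO sketch: staff-months = a * KDSI**b * EAF."""
    eaf = math.prod(cost_drivers)  # effort adjustment factor
    return a * kdsi ** b * eaf

# 32 KDSI with two illustrative cost-driver ratings; the values are
# placeholders, not calibration data.
print(cocomo_effort(32.0, [1.15, 0.91]))
```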
The items to be measured include people (experience, skills, cost,
performance), methods and tools (in terms of their effect on productivity
and quality, and their cost), and time, effort and budget (resources
consumed and resources remaining).
The staffing profile should be recorded over time, showing type (analyst,
designer, etc.), grade (which implies cost) and team to which allocated. Both
actuals and plan should be recorded.
Again, the COCOMO model requires the characterization of personnel experience
and capability and software development environment, and is a good framework in
which to keep these metrics.
Expenditure, budget and schedule information will come from the Project Plan.
Copyright © 1987-2000 Rational Software Corporation