Guidelines: Metrics
- Metrics must be simple, objective, easy to collect, easy to interpret, and
hard to misinterpret.
- Metrics collection must be automated and non-intrusive, i.e., not
interfere with the activities of the developers.
- Metrics must contribute to quality assessment early in the lifecycle, when
efforts to improve software quality are effective.
- Absolute metric values and trends must be actively used by both management
and engineering personnel to communicate progress and quality in a
consistent format.
- The selection of a minimal or more extensive set of metrics depends on
the project's characteristics and context: if the project is large or has
stringent safety or reliability requirements, and the development and
assessment teams are knowledgeable about metrics, then it may be useful to
collect and analyze the technical metrics. The contract may require certain
metrics to be collected, or the organization may be trying to improve its
skills and processes in particular areas. There is no simple answer to fit
all circumstances; the Project Manager must select what is appropriate when
the Measurement Plan is produced. When introducing a metrics program for the
first time, though, it is sensible to err on the side of simplicity.
Metrics for certain aspects of the project include:
- Progress in terms of size and complexity.
- Stability in terms of rate of change in the requirements or
implementation, size, or complexity.
- Modularity in terms of the scope of change.
- Quality in terms of the number and type of errors.
- Maturity in terms of the frequency of errors.
- Resources in terms of project expenditure versus planned expenditure.
Trends are important: they are often more revealing to monitor than any
absolute value at a single point in time.
Each metric is listed below with its purpose in parentheses, followed by its
sample measures/perspectives:

Progress (iteration planning; completeness):
- Number of classes
- SLOC
- Function points
- Scenarios
- Test cases
  (these measures may also be collected by class and by package)
- Amount of rework per iteration (number of classes)

Stability (convergence):
- Number and type of changes (bug vs. enhancement; interface vs.
  implementation)
  (this measure may also be collected by iteration and by package)
- Amount of rework per iteration

Adaptability (convergence; software "rework"):
- Average person-hours per change
  (this measure may also be collected by iteration and by package)

Modularity (convergence; software "scrap"):
- Number of classes/categories modified per change
  (this measure may also be collected by iteration)

Quality (iteration planning; rework indicator; release criterion):
- Number of errors
- Defect discovery rate
- Defect density
- Depth of inheritance
- Class coupling
- Size of interface (number of operations)
- Number of methods overridden
- Method size
  (these measures may also be collected by class and by package)

Maturity (test coverage/adequacy; robustness for use):
- Test hours/failure and type of failure
  (this measure may also be collected by iteration and by package)

Expenditure profile (financial insight; planned vs. actual):
- Person-days/class
- Full-time staff per month
- % budget expended
This example is extracted from Software Project Management: A Unified
Framework [ROY98]. It represents the minimal set of metrics that are
necessary from a Project Management viewpoint. A more extensive set may be
found in A Complete Metrics Set, which also includes some OO-specific
technical metrics that are widely agreed to contribute to product quality.
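Because trends matter more than point values, it can help to reduce each per-iteration metric series to a slope. Below is a minimal sketch of that idea (not from [ROY98]); the function name and sample data are illustrative only.

```python
# Minimal sketch: least-squares slope of a metric sampled once per
# iteration, so that the direction of change, not the absolute value,
# drives the discussion.
def trend(values):
    """Return the least-squares slope of values over iterations 0..n-1."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var

# Example: defect discovery rate per iteration; a negative slope
# suggests the product is converging.
print(trend([14, 11, 9, 6, 4]))  # -2.5
```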
Metric Primitives
- Total SLOC: SLOCt = total size of the code
- SLOC under configuration control: SLOCc = current baseline
- Critical defects: SCO0 = number of type 0 SCOs
- Normal defects: SCO1 = number of type 1 SCOs
- Improvement requests: SCO2 = number of type 2 SCOs
- New features: SCO3 = number of type 3 SCOs
- Number of SCOs: N = SCO0 + SCO1 + SCO2
- Open rework (breakage): B = cumulative broken SLOC due to SCO1 and SCO2
- Closed rework (fixes): F = cumulative fixed SLOC
- Rework effort: E = cumulative effort expended fixing type 0/1/2 SCOs
- Usage time: UT = hours that a given baseline has been operating under
  realistic usage scenarios
Quality Metrics for the End-Product
From this small set of metrics, some more interesting metrics can be derived:
- Scrap ratio: B/SLOCt, percentage of product scrapped
- Rework ratio: E/(total effort), percentage of rework effort
- Modularity: B/N, average breakage per SCO
- Adaptability: E/N, average effort per SCO
- Maturity: UT/(SCO0 + SCO1), mean time between defects
- Maintainability: (scrap ratio)/(rework ratio), maintenance productivity
In-progress Indicators
- Rework stability: B - F, breakage versus fixes over time
- Rework backlog: (B - F)/SLOCc, currently open rework
- Modularity trend: modularity over time
- Adaptability trend: adaptability over time
- Maturity trend: maturity over time
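To make the definitions above concrete, here is a minimal sketch that computes the derived end-product metrics and in-progress indicators directly from the primitives. The record layout and names are assumptions for illustration, not part of any tool.

```python
from dataclasses import dataclass

@dataclass
class Primitives:
    sloc_total: int       # SLOCt, total size of the code
    sloc_baseline: int    # SLOCc, current baseline
    sco0: int             # critical defects
    sco1: int             # normal defects
    sco2: int             # improvement requests
    breakage: int         # B, cumulative broken SLOC
    fixes: int            # F, cumulative fixed SLOC
    rework_effort: float  # E, effort fixing type 0/1/2 SCOs
    total_effort: float
    usage_hours: float    # UT, baseline operating hours

def derived_metrics(p: Primitives) -> dict:
    n = p.sco0 + p.sco1 + p.sco2               # N, number of SCOs
    scrap_ratio = p.breakage / p.sloc_total
    rework_ratio = p.rework_effort / p.total_effort
    return {
        "scrap_ratio": scrap_ratio,
        "rework_ratio": rework_ratio,
        "modularity": p.breakage / n,          # average breakage per SCO
        "adaptability": p.rework_effort / n,   # average effort per SCO
        "maturity": p.usage_hours / (p.sco0 + p.sco1),
        "maintainability": scrap_ratio / rework_ratio,
        "rework_stability": p.breakage - p.fixes,
        "rework_backlog": (p.breakage - p.fixes) / p.sloc_baseline,
    }
```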
The things to be measured are:
- the Process - the sequence of activities invoked to produce the software
product (and other artifacts);
- the Product - the artifacts of the process, including software,
documents and models;
- the Project - the totality of project resources, activities and
artifacts;
- the Resources - the people, methods and tools, time, effort and budget,
available to the project.
To completely characterize the process, measurements should be made at the
lowest level of formally planned activity. Activities will be planned by the
Project Manager using an initial set of estimates. A record should then be kept
of actual values over time and any updated estimates that are made.
- Duration: elapsed time for the activity.
- Effort: staff effort units (staff-hours, staff-days, and so on).
- Output: artifacts and their size and quantity (note that this will include
  defects as an output of test activities).
- Software development environment usage: CPU, storage, software tools,
  equipment (workstations, PCs), disposables. Note that these may be
  collected for a project by the Software Engineering Environment Authority
  (SEEA).
- Defects, discovery rate, correction rate: total repair time/effort and
  total scrap/rework (where this can be measured) also need to be collected;
  these will probably come from information collected against the defects
  (considered as artifacts).
- Change requests, imposition rate, disposal rate: comments as above on
  time/effort.
- Other incidents that may have a bearing on these metrics (freeform text):
  this is a metric in that it is a record of an event that affected the
  process.
- Staff numbers, profile (over time) and characteristics.
- Staff turnover: a useful metric which may explain at a post-mortem review
  why a process went particularly well, or badly.
- Effort application: the way effort is spent during the performance of the
  planned activities (against which time is formally recorded for cost
  account management) may help explain variations in productivity. Some
  subclasses of effort application are, for example:
  - training
  - familiarization
  - management (by a team lead, for example)
  - administration
  - research
  - productive work: it is helpful to record this by artifact, and to
    attempt a separation of 'think' time and capture time, particularly for
    documents; this will tell the Project Manager how much of an imposition
    the documentation process is on the engineer's time
  - lost time
  - meetings
  - inspections, walkthroughs, reviews: preparation and meeting effort (some
    of these will be separate activities, and time and effort for them will
    be recorded against a specific review activity)
- Inspections, walkthroughs, reviews (during an activity, not separately
  scheduled reviews): record the numbers of these and their duration, and
  the numbers of issues raised.
- Process deviations (raised as non-compliances, requiring project change):
  record the numbers of these and their severity. This is an indicator that
  more education may be required, that the process is being misapplied, or
  that the process was configured incorrectly.
- Process problems (raised as process defects, requiring process change):
  record the number of these and their severity. This will be useful
  information at post-mortem reviews and is essential feedback for the
  Software Engineering Process Authority (SEPA).
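One possible way to capture these per-activity measures is a simple record, sketched below. The field names are hypothetical, not a RUP-mandated schema.

```python
from dataclasses import dataclass, field

@dataclass
class ActivityMeasures:
    activity: str
    duration_days: float        # elapsed time for the activity
    effort_staff_days: float    # staff effort units
    outputs: dict = field(default_factory=dict)  # artifact -> size/quantity
    defects_found: int = 0
    defects_corrected: int = 0
    change_requests_raised: int = 0
    change_requests_disposed: int = 0
    reviews: int = 0            # in-activity inspections/walkthroughs
    review_issues: int = 0
    incidents: list = field(default_factory=list)  # freeform notes
```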
The products in the Rational Unified Process are the artifacts,
which are documents, models or model elements. The models are collections of
like things (the model elements) so the recommended metrics are listed here with
the models to which they apply: it is usually obvious if a metric applies to the
model as a whole, or an element. Explanatory text is provided where this is not
clear.
Artifact Characteristics
In general, the characteristics we are interested in measuring are the
following:
- Size - a measure of the number of things in a model,
the length of something, the extent or mass of something
- Quality
- Defects - indications that an artifact does not perform as specified
or is not compliant with its specification, or has other undesirable
characteristics
- Complexity - a measure of the intricacy of a structure or algorithm:
the greater the complexity, the more difficult a structure is to
understand and modify, and there is evidence that complex structures
are more likely to fail
- Coupling - a measure of how extensively elements of a system are
interconnected
- Cohesion - a measure of how well an element or component meets the
requirement of having a single, well-defined purpose
- Primitiveness - the degree to which operations or methods of a class
can be composed from others offered by the class
- Completeness - a measure of the extent to which an
artifact meets all requirements (stated and implied - the Project Manager
should strive to make explicit as much as possible, to limit the risk of
unfulfilled expectations). We have not chosen here to distinguish between sufficient
and complete.
- Traceability - an indication that the requirements at
one level are being satisfied by artifacts at a lower level, and, looking
the other way, that an artifact at any level has a reason to exist
- Volatility - the degree of change or churn in an
artifact because of defects or changing requirements
- Effort - a measure of the work (staff-time units) that
is required to produce an artifact
Not all of these characteristics apply to all artifacts: the relevant ones
are elaborated with the particular artifact in the following tables. Where
several metrics are listed against a characteristic, all are potentially of
interest, because they give a complete description of the characteristic from
several viewpoints. For example, when considering the traceability of Use Cases,
ultimately all have to be traceable to a (tested) implementation model: in the
interim, it will still be of interest to the Project Manager to know how many
Use Cases can be traced to the Analysis Model, as a measure of progress.
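As an illustration of traceability as a progress measure, the sketch below reports the fraction of Use Cases that can be traced into each downstream model. The data layout is hypothetical.

```python
def trace_coverage(traces, models=("analysis", "design",
                                   "implementation", "test")):
    """traces: use-case name -> set of models in which it is realized."""
    total = len(traces)
    return {m: sum(1 for t in traces.values() if m in t) / total
            for m in models}

traces = {
    "Withdraw Cash":  {"analysis", "design"},
    "Check Balance":  {"analysis"},
    "Transfer Funds": set(),
}
print(trace_coverage(traces))
# analysis 0.67, design 0.33, implementation 0.0, test 0.0
```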
Documents
The recommended metrics apply to all the Rational Unified Process documents.
- Size: page count
- Effort: staff-time units for production, change and repair
- Volatility: numbers of changes and defects, opened and closed; changed pages
- Quality: measured directly through defect count
- Completeness: not measured directly; judgment made through review
- Traceability: not measured directly; judgment made through review
Models
Requirements
Requirements Attributes
This is actually a model element.
- Size:
  - Number of requirements in total (= Nu + Nd + Ni + Nt)
  - Number to be traced to use cases (= Nu)
  - Number to be traced to design, implementation and test only (= Nd)
  - Number to be traced to implementation and test only (= Ni)
  - Number to be traced to test only (= Nt)
  Note that this partitions the requirements into those that will be modeled
  by Use Cases and those that will not. The expectation is that Use Case
  traceability will account for those requirements assigned to Use Cases, to
  track design, implementation and test.
- Effort: staff-time units (with production, change and repair separated)
- Volatility: number of defects and change requests (open, closed)
- Quality: number of defects, by severity, open and closed
- Traceability: (none listed)
Use-Case Model
- Size:
  - Number of Use Cases
  - Number of Use Case Packages
  - Reported level of Use Case (see the white paper "The Estimation of
    Effort and Size based on Use Cases" from the Resource Center)
  - Number of scenarios, total and per Use Case
  - Number of actors
  - Length of Use Case (pages of event flow, for example)
- Effort: staff-time units (with production, change and repair separated)
- Volatility: number of defects and change requests (open, closed)
- Quality:
  - Reported complexity (0-5, by analogy with COCOMO [BOE81], at class
    level; the complexity range is narrower at higher levels of abstraction
    - see the white paper "The Estimation of Effort and Size based on Use
    Cases" from the Resource Center)
  - Number of defects, by severity, open and closed
- Completeness:
  - Use Cases completed (reviewed and under configuration management with no
    defects outstanding)/Use Cases identified (or the estimated number of
    Use Cases)
  - Requirements-to-Use-Case traceability (from Requirements Attributes)
- Traceability:
  - Analysis: scenarios realized in the analysis model/total scenarios
  - Design: scenarios realized in the design model/total scenarios
  - Implementation: scenarios realized in the implementation model/total
    scenarios
  - Test: scenarios realized in the test model (test cases)/total scenarios
Design
Analysis Model
- Size:
  - Number of classes
  - Number of subsystems
  - Number of subsystems of subsystems, and so on
  - Number of packages
  - Methods per class, internal and external
  - Attributes per class, internal and external
  - Depth of inheritance tree
  - Number of children
- Effort: staff-time units (with production, change and repair separated)
- Volatility: number of defects and change requests (open, closed)
- Quality:
  - Complexity: Response For a Class (RFC); this may be difficult to
    calculate because a complete set of interaction diagrams is needed
  - Coupling: number of children; coupling between objects (class fan-out)
  - Cohesion: (none listed)
  - Defects: number of defects, by severity, open and closed
- Completeness: number of classes completed/number of classes estimated
  (identified); analysis traceability (in the Use-Case Model)
- Traceability: not applicable - the analysis model becomes the design
  model.
Here we see some OO-specific technical metrics that may be unfamiliar - depth
of inheritance tree, number of children, response for a class, coupling between
objects, and so on. See [HEND96] for
details of the meaning and history of these metrics. Several of these metrics
were originally suggested by Chidamber and Kemerer (see "A metrics suite
for object oriented design", IEEE Transactions on Software Engineering,
20(6), 1994), but we have applied them here as suggested in [HEND96]
and have preferred the definition of LCOM (lack of cohesion in methods)
presented in that work.
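For illustration, here is a minimal sketch of two of these inheritance metrics, computed from a child-to-parent map extracted from a model. The input format is an assumption, and the exact definitions vary slightly between [HEND96] and the original paper.

```python
def depth_of_inheritance(cls, parent):
    """DIT: number of ancestors between cls and the root."""
    depth = 0
    while cls in parent:          # walk up to the root class
        cls, depth = parent[cls], depth + 1
    return depth

def number_of_children(cls, parent):
    """NOC: number of immediate subclasses of cls."""
    return sum(1 for p in parent.values() if p == cls)

parent = {"Savings": "Account", "Checking": "Account", "Account": "Object"}
print(depth_of_inheritance("Savings", parent))  # 2
print(number_of_children("Account", parent))    # 2
```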
Design Model
- Size:
  - Number of classes
  - Number of design subsystems
  - Number of subsystems of subsystems, and so on
  - Number of packages
  - Methods per class, internal and external
  - Attributes per class, internal and external
  - Depth of inheritance tree
  - Number of children
- Effort: staff-time units (with production, change and repair separated)
- Volatility: number of defects and change requests (open, closed)
- Quality:
  - Complexity: Response For a Class (RFC); this may be difficult to
    calculate because a complete set of interaction diagrams is needed
  - Coupling: number of children; coupling between objects (class fan-out)
  - Cohesion: (none listed)
  - Defects: number of defects, by severity, open and closed
- Completeness: (none listed)
- Traceability: number of classes in the Implementation Model/number of
  classes
Implementation
Implementation Model
- Size:
  - Number of classes
  - Number of components
  - Number of implementation subsystems
  - Number of subsystems of subsystems, and so on
  - Number of packages
  - Methods per class, internal and external
  - Attributes per class, internal and external
  - Size of methods*
  - Size of attributes*
  - Depth of inheritance tree
  - Number of children
  - Estimated size* at completion
- Effort: staff-time units (with production, change and repair separated)
- Volatility:
  - Number of defects and change requests (open, closed)
  - Breakage* for each corrective or perfective change, estimated (prior to
    fix) and actual (upon closure)
- Quality:
  - Complexity: Response For a Class (RFC); cyclomatic complexity of
    methods**
  - Coupling: number of children; coupling between objects (class fan-out);
    message passing coupling (MPC)***
  - Cohesion: number of children; lack of cohesion in methods (LCOM)
  - Defects: number of defects, by severity, open and closed
- Completeness: (none listed)
* Some method of measuring code size should be chosen and then consistently
applied, for example counting non-comment, non-blank lines. See [ROY98]
for a discussion of the merits and application of 'lines of code' as a metric.
Also see the same reference for the definition of 'breakage'.
** The use of cyclomatic complexity is not universally accepted -
particularly when applied to the methods of a class. See [HEND96]
for a discussion of this metric.
*** Originally from Li and Henry, "Object-oriented metrics that predict
maintainability", J. Systems and Software, 23(2), 1993, and also described
in [HEND96].
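As a rough illustration of the cyclomatic complexity footnote, the sketch below counts decision points in Python source using the standard ast module. Counting rules differ between tools (boolean operators, for instance, are only approximated here), so treat this as a sketch, not a reference implementation.

```python
import ast

def cyclomatic_complexity(source: str) -> int:
    """Approximate McCabe complexity: decision points + 1."""
    branch_nodes = (ast.If, ast.For, ast.While, ast.ExceptHandler, ast.BoolOp)
    tree = ast.parse(source)
    return 1 + sum(isinstance(n, branch_nodes) for n in ast.walk(tree))

src = """
def classify(x):
    if x < 0:
        return "negative"
    for digit in str(x):
        if digit == "7":
            return "lucky"
    return "plain"
"""
print(cyclomatic_complexity(src))  # 4: three decisions + one path
```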
Test
Test Model
- Size: number of Test Cases, Test Procedures and Test Scripts
- Effort: staff-time units (with production, change and repair separated)
  for the production of test cases, etc.
- Volatility: number of defects and change requests (open, closed) against
  the test model
- Quality: number of defects, by severity, open and closed (these are
  defects raised against the test model itself, not defects raised by the
  test team against other software)
- Completeness: (none listed)
- Traceability: number of Test Cases reported as successful in the Test
  Evaluation Summary/number of Test Cases
Management
Change Model (this is a notional model for consistent presentation - the
metrics will be collected from whatever system is used to manage Change
Requests)
- Size: number of defects and change requests by severity and status, also
  categorized as the number of perfective, adaptive and corrective changes
- Effort: defect repair effort and change implementation effort, in
  staff-time units
- Volatility: breakage (estimated, actual) for the implementation model
  subset
- Completeness: number of defects discovered/number of defects predicted (if
  a reliability model is used)
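A small sketch of how the Change Model "size" measures might be tallied from exported change records; the record fields are hypothetical, since the actual source is whatever change-management system is in use.

```python
from collections import Counter

changes = [
    {"category": "corrective", "severity": "critical", "status": "open"},
    {"category": "perfective", "severity": "normal",   "status": "closed"},
    {"category": "adaptive",   "severity": "normal",   "status": "open"},
]

by_category = Counter(c["category"] for c in changes)
open_by_severity = Counter(c["severity"] for c in changes
                           if c["status"] == "open")
print(by_category)       # corrective: 1, perfective: 1, adaptive: 1
print(open_by_severity)  # critical: 1, normal: 1
```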
Project Plan (section 4.2 of the
Software Development Plan)
These are measures that come from the application of Earned Value
Techniques to project management; together they are called Cost/Schedule
Control Systems Criteria (C/SCSC). Included are:
- BCWS - Budgeted Cost for Work Scheduled
- BCWP - Budgeted Cost for Work Performed
- ACWP - Actual Cost of Work Performed
- BAC - Budget at Completion
- EAC - Estimate at Completion
- CBB - Contract Budget Base
- LRE - Latest Revised Estimate (EAC)
and derived factors for cost variance, schedule variance etc. See [ROY98]
for a discussion of the application of an earned value approach to software
project management.
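The usual derived factors are sketched below. The formulas (cost and schedule variance, the CPI and SPI indices, and a CPI-based estimate at completion) are the standard earned-value ones, shown only to make the acronyms concrete.

```python
def earned_value(bcws, bcwp, acwp, bac):
    cpi = bcwp / acwp                      # cost performance index
    return {
        "cost_variance": bcwp - acwp,      # negative: over cost
        "schedule_variance": bcwp - bcws,  # negative: behind schedule
        "cpi": cpi,
        "spi": bcwp / bcws,                # schedule performance index
        "eac": acwp + (bac - bcwp) / cpi,  # estimate at completion
    }

print(earned_value(bcws=100.0, bcwp=90.0, acwp=120.0, bac=500.0))
# CV = -30, SV = -10, CPI = 0.75, SPI = 0.9, EAC ~= 666.7
```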
The project needs to be characterized in terms of type, size, complexity and
formality (although type, size and complexity usually determine formality),
because these aspects will condition expectations about various thresholds for
lower level measures. Other constraints should be captured in the contract (or
specifications). Metrics derived from the process, product and resources will
capture all other project level metrics. Project type and domain can be recorded
using a text description, making sure that there is enough detail to accurately
characterize the project. Record the project size by cost, effort, duration,
size of code to be developed, function points to be delivered. The project's
complexity can be described - somewhat subjectively - by placing the project on
a chart showing technical and management complexity relative to other completed
projects. [ROY98], Figure 14-1 shows such a
diagram.
The derived metrics described in [ROY98],
which are the Project Manager's main indicators, can be obtained from the
metrics gathered for product and process. These are:
- Modularity = average breakage (NCNB*) per perfective or corrective change
on the implementation model
- Adaptability = average effort per perfective or corrective change on the
implementation model
- Maturity = active test time/number of corrective changes
- Maintainability = maintenance productivity/development productivity
= [actual cumulative fixes/cumulative effort for perfective and corrective
changes]/[estimated NCNB at completion/estimated production effort at
completion]
- Rework stability = cumulative breakage - cumulative fixes
- Rework backlog = (cumulative breakage - cumulative fixes)/NCNB unit
tested
* NCNB is non-comment, non-blank code size.
Progress should be reported from the project plan, whose status is assessed
using artifact completion metrics, with particular weight (from an earned
value perspective) given to the production of working software.
If an estimation model such as COCOMO (see [BOE81])
is used, the various scale factors and cost drivers should be recorded. These
form a quite detailed characterization of the project.
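For reference, a hedged sketch of the Intermediate COCOMO form from [BOE81]: effort in staff-months is a * KDSI^b, multiplied by the product of the cost-driver ratings (the effort adjustment factor). The coefficients below are the commonly quoted "organic mode" values; verify against [BOE81] before relying on them.

```python
import math

def cocomo_effort(kdsi, cost_drivers, a=3.2, b=1.05):
    """Intermediate COCOMO sketch: staff-months = a * KDSI**b * EAF."""
    eaf = math.prod(cost_drivers)  # effort adjustment factor
    return a * kdsi ** b * eaf

# 32 KDSI with two illustrative cost-driver ratings; the values are
# placeholders, not calibration data.
print(cocomo_effort(32.0, [1.15, 0.91]))
```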
The items to be measured include people (experience, skills, cost,
performance), methods and tools (in terms of their effect on productivity
and quality, and their cost), and time, effort and budget (resources
consumed and resources remaining).
The staffing profile should be recorded over time, showing type (analyst,
designer, etc.), grade (which implies cost) and team to which allocated. Both
actuals and plan should be recorded.
Again, the COCOMO model requires the characterization of personnel experience
and capability and software development environment, and is a good framework in
which to keep these metrics.
Expenditure, budget and schedule information will come from the Project Plan.
Copyright © 1987-2000 Rational Software Corporation