Title: Software Measurement
1Software Measurement
- UCLA Computer Science Department
- CS130
- Winter, 2002
2Reference
- Material in this lecture is taken from chapters
1-3 of Software Metrics: A Rigorous and Practical
Approach (2nd ed.), Norman E. Fenton and Shari
Lawrence Pfleeger, 1997, PWS Publishing Company,
Boston, MA, ISBN 0534954251
3Overview
- Measurement: what is it and why do we do it?
- Measurement basics
- A goal-based software measurement framework
4Measurement: What Is It and Why Do We Do It?
- Measurement in Everyday Life
- Measurement in Software Engineering
- The Scope of Software Metrics
5Measurement in Everyday Life
- Measurement governs many aspects of everyday life
- Economic indicators determine prices and pay raises
- Medical system measurements enable diagnosis of specific illnesses
- Measurements in atmospheric systems are the basis of weather prediction
6Measurement in Everyday Life
- How do we use measurement in our lives?
- In a shop, price is a measure of the value of an item, and we calculate the bill to make sure we get the correct change.
- Height and size measurements ensure clothing will fit correctly.
- When traveling, we calculate distance, choose a route, measure speed, and predict when we'll arrive.
- Measurement helps us to
- Understand our world
- Interact with our surroundings
- Improve our lives
7Measurement in Everyday Life
- What is Measurement?
- Common thread in the previous examples: some aspect of a thing is assigned a descriptor that allows us to compare it with other things.
- More formally, measurement is the process by which numbers or symbols are assigned to attributes of entities in the real world in such a way as to describe them according to clearly defined rules.
8Measurement in Everyday Life
- The definition of the measurement process is far from clear-cut.
- To understand measurement, we must ask questions that are difficult to answer:
- In a room with blue walls, is "blue" a measure of the color of the room?
- A person's height is a commonly understood attribute that can be easily measured. What about other attributes of people, such as intelligence?
- Some measurements (e.g., intelligence, wine quality) may have wide error margins; is this a reason to reject them?
- How do we decide which error margins are acceptable and which are not?
- When is a measurement scale acceptable for the purpose to which it is put (e.g., is it appropriate to measure a person's height in kilometers)?
- What types of manipulations can we apply to the results of measurement?
- Material in the next section (Measurement Basics) will allow us to answer these questions.
9Measurement in Everyday Life
- Making Things Measurable
- "What is not measurable, make measurable." (Galileo Galilei)
- One aim of science is to find ways of measuring attributes of things we're interested in.
- Measurement makes concepts more visible, and therefore more understandable and controllable.
- Attributes previously thought to be unmeasurable now form the basis for decisions affecting our lives (e.g., air quality, inflation index).
- Measuring the unmeasurable improves understanding of particular entities and attributes.
- The act of proposing a particular measure can open a discussion that leads to greater understanding.
- Making a new measurement may require modifying the environment or practices (e.g., using a new tool, adding a step in a process).
10Measurement in Everyday Life
- Measurement in Software Engineering
- In many instances, measurement is considered a luxury. For many projects:
- Measurable targets are not set (e.g., products are supposed to be user-friendly, reliable, and maintainable, but we don't quantify what that means).
- The component costs of projects are not quantified or understood.
- Product quality is not quantified.
- There is too much reliance on anecdotal evidence (e.g., "try our product and you'll improve your productivity by 50%!"). Most of the time, there's no measurable basis for the claims.
11Measurement in Everyday Life
- Measurement in Software Engineering (contd)
- When measurements are made, they tend to be
- Incomplete
- Inconsistent
- Infrequent
- Most of the time, we're not told anything about
- How experiments were designed
- What was measured and how
- Realistic error margins
- Without this information, we can't decide whether to apply the results to a development effort, and we can't do an objective study to repeat the measurements.
- The lack of measurement in software engineering is compounded by the lack of a rigorous approach.
12Measurement in Everyday Life
- Software Measurement Objectives
- Assessing status
- Projects
- Products for a specific project or projects
- Processes
- Resources
- Identifying trends
- Need to be able to differentiate between a healthy project and one that's in trouble
- Determining corrective action
- Measurements should indicate the appropriate corrective action, if any is required.
13Measurement in Everyday Life
- Types of information required to understand, control, and improve projects
- Managers
- What does the process cost?
- How productive is the staff?
- How good is the code?
- Will the customer/user be satisfied?
- How can we improve?
- Engineers
- Are the requirements testable?
- Have all the faults been found?
- Have the product or process goals been met?
- What will happen in the future?
14Measurement in Everyday Life
- The Scope of Software Metrics
- Cost and effort estimation
- Productivity measures and models
- Data collection
- Quality models and measures
- Reliability models
- Performance evaluation and models
- Structural and complexity metrics
- Capability-maturity assessment
- Management by metrics
- Evaluation of methods and tools
15Measurement in Everyday Life
- The Scope of Software Metrics: some details
- Cost and effort estimation
- Motivation: accurately predict costs early in the development life cycle.
- Numerous empirical cost models have been developed:
- COCOMO, COCOMO 2
- Putnam's model (see Pressman, Ch. 3)
- ...
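As an illustration of the empirical cost models listed above, Basic COCOMO (the simplest form of Boehm's original 1981 COCOMO) estimates effort from size alone. The following is a minimal Python sketch, assuming size is already known in thousands of delivered source lines (KLOC). The coefficients are the published Basic COCOMO values; the dictionary and function names are our own.

```python
# Basic COCOMO (Boehm, 1981): effort in person-months = a * KLOC ** b.
# The (a, b) pairs below are the published Basic COCOMO coefficients.
BASIC_COCOMO = {
    "organic": (2.4, 1.05),
    "semi-detached": (3.0, 1.12),
    "embedded": (3.6, 1.20),
}


def basic_cocomo_effort(kloc: float, mode: str = "organic") -> float:
    """Estimated effort (person-months) for `kloc` thousand delivered source lines."""
    if kloc <= 0:
        raise ValueError("size must be positive")
    a, b = BASIC_COCOMO[mode]
    return a * kloc ** b


if __name__ == "__main__":
    for mode in BASIC_COCOMO:
        print(f"{mode:>13}: {basic_cocomo_effort(32, mode):7.1f} person-months for 32 KLOC")
```

Because every exponent b exceeds 1, estimated effort grows faster than size. The estimate is only as good as the size figure fed into it, and that figure is itself usually a prediction early in the life cycle.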
16Measurement in Everyday Life
- The Scope of Software Metrics: some details
- Productivity models and measures
- Estimate staff productivity to determine how much specified changes will cost
- Naive measure: size divided by effort. This doesn't take into account things like defects, functionality, or reliability.
- More comprehensive models have been developed; the next slide illustrates a possible model.
17Measurement in Everyday Life
- The Scope of Software Metrics: some details
- Possible productivity model
[Figure: a possible productivity model. Nodes: Productivity, Value, Cost, Quality, Quantity, Personnel, Resources, Complexity, Reliability, Defects, Functionality, Size, Time, Money, HW, SW, Problem difficulty, Env constraints]
18Measurement in Everyday Life
- The Scope of Software Metrics: some details
- Software quality model
[Figure: software quality model with columns Use, Factor, Criteria, Metrics. Uses: Product Operation, Product Revision. Factors: Usability, Reliability, Efficiency, Reusability, Maintainability, Portability, Testability. Criteria: Communicativeness, Accuracy, Consistency, Device Efficiency, Accessibility, Completeness, Structuredness, Conciseness, Device Independence, Legibility, Self-descriptiveness, Traceability]
19Overview
- Measurement: what is it and why do we do it?
- Measurement basics
- A goal-based software measurement framework
20Measurement Basics
- Overview
- The representational theory of measurement
- Measurement and models
- Measurement scales and scale types
- Meaningfulness in measurement
21Measurement Basics
- Overview
- Understanding of software attributes is not as deep as understanding of non-software entities (e.g., length, weight, temperature)
- Questions that are relatively easy to answer for non-software entities are difficult for software:
- How much must we know about an attribute before it's reasonable to consider measuring it (e.g., program complexity)?
- How do we know if we've really measured the attribute we want to measure? Does a count of the number of defects found in a system measure its quality, or does it measure something else?
- Using measurement, what meaningful statements can we make about an attribute and the entities that possess it (e.g., can we talk about doubling a design's quality)?
- What meaningful operations can we perform on measures (e.g., can we compute the average productivity of a group of developers, or the average quality of a set of modules)?
- Answering these questions requires developing a theory of measurement.
22Measurement Basics
- The representational theory of measurement
- Developed as a classical discipline from the physical sciences
- Provides rules for
- Making consistent measurements
- Interpreting data resulting from measurement
- The representational theory of measurement formalizes intuition about the way the world works.
23Measurement Basics
- Empirical relations
- Data obtained as measures should represent attributes of observed entities
- Manipulating data should preserve observed relationships
- Example: "taller than"
- A binary relation defined on the set of pairs of people. For any two people A and B of different heights, either
- A is taller than B, or
- B is taller than A
- Empirical relations are not restricted to binary relations; they can be unary (e.g., "A is tall"), ternary (e.g., "A sitting on B's shoulders is taller than C"), etc.
24Measurement Basics
- Empirical relations (contd)
- Measurement maps entities and their empirical relations from the empirical, real world to a formal mathematical world.
- Height maps a set of people to the set of real numbers
- Greater functionality (from survey results)
- x has greater functionality than y if the survey result for the pair (x, y) is greater than 60%. The relation is {(C,A), (C,B), (C,D), (A,B), (A,D)}.
- Surveys can help gain a preliminary understanding of relationships.
25Measurement Basics
- Empirical relations (contd)
- Definitions
- Measurement: a mapping from the empirical world to the formal, relational world.
- Measure: the number or symbol assigned to an entity by the mapping in order to characterize an attribute.
26Measurement Basics
- Rules of Mapping
- Measures must specify the domain and range as well as the rule for performing the mapping
- Domain: the real world is the domain of the mapping that defines the measurement
- Range: the mathematical world into which real-world attributes are mapped
- Examples
- Measuring height
- Is height measured in inches, centimeters, feet?
- Are people measured sitting or standing?
- Are shoes allowed to be worn during the measurement?
- Measuring lines of code
- Are reused lines of code counted if they are unchanged?
- Are non-executable lines counted?
- Declarations
- Compiler Directives
- Comments
- Blank lines
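The questions above mean that a lines-of-code count is only defined once its counting rules are. The following is a minimal Python sketch, assuming Python-style `#` full-line comments. The `LocRules` flags and the `count_loc` function are hypothetical. Only blank and comment lines are handled; declarations and compiler directives are counted like any other code line.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocRules:
    """Hypothetical counting rules; each flag answers one of the questions above."""
    count_blank: bool = False
    count_comments: bool = False


def count_loc(source: str, rules: LocRules, comment_prefix: str = "#") -> int:
    """Count the lines of `source` that `rules` says should be counted.

    Simplification: a line is a comment line only if its first non-blank
    characters are `comment_prefix`; a line with code and a trailing comment
    counts as code.
    """
    total = 0
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            total += int(rules.count_blank)
        elif stripped.startswith(comment_prefix):
            total += int(rules.count_comments)
        else:
            total += 1
    return total


SAMPLE = """\
# compute the mean
def mean(xs):

    return sum(xs) / len(xs)
"""

if __name__ == "__main__":
    for rules in (LocRules(), LocRules(count_comments=True),
                  LocRules(count_blank=True, count_comments=True)):
        print(rules, count_loc(SAMPLE, rules))
```

The same four-line fragment measures 2, 3, or 4 LOC depending on the rules chosen. This is why a LOC figure is meaningless unless the mapping rule is stated along with it.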
27Measurement Basics
- The representation condition
- The behavior of measures in the number system needs to be the same as that of the corresponding elements in the real world.
- Formally, a measurement mapping M must map entities into numbers and empirical relations into numerical relations in such a way that
- Empirical relations preserve numerical relations
- Empirical relations are preserved by numerical relations
28Measurement Basics
- The representation condition example
- Taller than
- A is taller than B iff $M(A) > M(B)$, where M is a mapping from the empirical world to the real numbers.
- Whenever Joe is taller than Frank, M(Joe) must be a bigger number than M(Frank).
- Jane can be mapped to a bigger number than John only if Jane is taller than John.
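For a finite set of entities, the representation condition can be checked mechanically. The following is a minimal Python sketch using the greater-functionality relation from the survey example (A to D are the products in that survey). The function name and both candidate mappings are our own illustrations. Both directions of the "iff" are tested, which is why the check runs over every ordered pair.

```python
from itertools import permutations

ENTITIES = ["A", "B", "C", "D"]
# Greater-functionality relation from the survey example: (x, y) means
# "x has greater functionality than y".
GREATER_FUNCTIONALITY = {("C", "A"), ("C", "B"), ("C", "D"), ("A", "B"), ("A", "D")}


def satisfies_representation(entities, relation, m):
    """Representation condition for a binary relation and the numerical '>':
    for every ordered pair (x, y) of distinct entities, (x, y) is in
    `relation` if and only if m[x] > m[y]."""
    return all(((x, y) in relation) == (m[x] > m[y])
               for x, y in permutations(entities, 2))


# B and D are unrelated, so a valid mapping must give them equal numbers.
VALID_M = {"C": 3, "A": 2, "B": 1, "D": 1}
# Assigns B a bigger number than D, asserting a relation nobody observed.
INVALID_M = {"C": 4, "A": 3, "B": 2, "D": 1}

if __name__ == "__main__":
    print(satisfies_representation(ENTITIES, GREATER_FUNCTIONALITY, VALID_M))    # True
    print(satisfies_representation(ENTITIES, GREATER_FUNCTIONALITY, INVALID_M))  # False
```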
29Measurement Basics
- The representation condition example 2
- Software failures criticality
- Three types of failures examined
- Delayed response
- Incorrect output
- Data loss
- At this point, we have a relation system consisting of 3 unary relations
- R1 for delayed response
- R2 for incorrect output
- R3 for data loss
- With this information, we can't yet judge the relative criticality of these types of failures.
30Measurement Basics
- The representation condition example 2 (contd)
- We can find a representation in the set of real numbers by choosing three distinct numbers:
- $M(\text{delayed response}) = 6$
- $M(\text{incorrect output}) = 4$
- $M(\text{data loss}) = 50$
- Further investigation of criticality reveals that data loss is more critical than incorrect output, which in turn is more critical than a delayed response.
- To develop a real-number representation for this enriched relation, we must be more careful in assigning numbers.
- Using $>$ to mean "more critical than", data-loss failures must be mapped to a higher number than incorrect-output failures, which in turn must be mapped to a higher number than delayed responses.
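The difference between the two relation systems can be made concrete. With only the unary relations, any three distinct numbers work, so 6, 4, 50 is a valid representation. Once "more critical than" is added, the same numbers fail because incorrect output (4) would sit below delayed response (6). The following is a minimal Python sketch. The `REORDERED` mapping and all function names are hypothetical.

```python
from itertools import permutations

FAILURE_TYPES = ["delayed response", "incorrect output", "data loss"]

# Enriched empirical relation: (x, y) means "x is more critical than y".
MORE_CRITICAL = {
    ("data loss", "incorrect output"),
    ("incorrect output", "delayed response"),
    ("data loss", "delayed response"),
}

# Mapping from the slide, and a hypothetical reordered alternative.
SLIDE_MAPPING = {"delayed response": 6, "incorrect output": 4, "data loss": 50}
REORDERED = {"delayed response": 1, "incorrect output": 4, "data loss": 50}


def represents_unary_only(m):
    """With only the unary relations R1-R3, any injective mapping is a representation."""
    return len(set(m.values())) == len(m)


def represents_criticality(m):
    """(x, y) is in MORE_CRITICAL exactly when m[x] > m[y]."""
    return all(((x, y) in MORE_CRITICAL) == (m[x] > m[y])
               for x, y in permutations(FAILURE_TYPES, 2))


if __name__ == "__main__":
    for name, m in (("slide mapping", SLIDE_MAPPING), ("reordered", REORDERED)):
        print(name, represents_unary_only(m), represents_criticality(m))
```

Both mappings pass the unary check, but only the reordered one passes the enriched check. This illustrates how a richer relation system leaves fewer valid measures.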
31Measurement Basics
- The representation condition (contd)
- There may be many different measures for a given attribute (e.g., inches, centimeters, furlongs).
- Any measure satisfying the representation condition is a valid measure.
- The richer the empirical relation system, the fewer the valid measures.
- Relational systems are rich if they have a large number of relations that can be defined.
- As the number of empirical relations increases, so does the number of conditions a measurement mapping must satisfy in its representation condition.
32Measurement Basics
- Measurement and models
- Model: an abstraction of reality allowing us to
- Strip away unnecessary detail
- View an entity or concept from a particular perspective
- The representation condition requires every measure to be associated with a model of how the measure maps real-world entities and attributes to elements of a numerical system. These models are essential in
- Understanding how the measure is derived
- Interpreting the behavior of numerical elements when we return to the real world.
33Measurement Basics
- Defining Attributes
- There is always a temptation to focus too much on the formal, mathematical system rather than on the empirical system.
- Before we set out to measure something (e.g., program complexity), we need to
- Identify a set of characteristics of the thing we're trying to measure
- Build a model that associates the characteristics
- We can then define measures for each characteristic, and use the representation condition to help understand the relationships.
34Measurement Basics
- Direct and Indirect Measurement
- Direct measure: relates an attribute to a number or symbol without reference to any other object or attribute (e.g., height).
- Indirect measure:
- Used when an attribute must be measured by combining several of its aspects (e.g., density)
- Requires a model of how measures are related to each other
35Measurement Basics
- Direct and indirect measures for software: examples
- Direct
- Length of source code (lines of code)
- Duration of testing process
- Number of defects discovered during test
- Time a developer spends on a project
- Indirect
- Programmer productivity (LOC / work-months of effort)
- Module defect density (number of defects / module size)
- Defect detection efficiency (number of defects detected / total number of defects)
- Requirements stability (number of initial requirements / total number of requirements)
- Test effectiveness ratio (number of items covered / total number of items)
- System spoilage (effort spent fixing faults / total project effort)
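Each indirect measure above is a ratio of direct measures, so the model relating them is just a formula. The following is a minimal Python sketch. The `DirectMeasures` fields and all example values are invented for illustration, and expressing module size in KLOC for defect density is our own choice of unit.

```python
from dataclasses import dataclass


@dataclass
class DirectMeasures:
    """Direct measures for one project (the example values below are invented)."""
    loc: int                    # length of source code
    effort_months: float        # total project effort, work-months
    defects_found: int          # defects discovered during test
    defects_total: int          # all defects eventually known (test + later)
    initial_requirements: int
    total_requirements: int
    items_covered: int          # e.g., statements or branches exercised by tests
    items_total: int
    fix_effort_months: float    # effort spent fixing faults


def indirect_measures(d: DirectMeasures) -> dict:
    """Combine direct measures into the indirect measures listed above."""
    return {
        "productivity (LOC per work-month)": d.loc / d.effort_months,
        "defect density (defects per KLOC)": d.defects_found / (d.loc / 1000),
        "defect detection efficiency": d.defects_found / d.defects_total,
        "requirements stability": d.initial_requirements / d.total_requirements,
        "test effectiveness ratio": d.items_covered / d.items_total,
        "system spoilage": d.fix_effort_months / d.effort_months,
    }


EXAMPLE = DirectMeasures(
    loc=12_000, effort_months=20, defects_found=90, defects_total=100,
    initial_requirements=80, total_requirements=100,
    items_covered=450, items_total=500, fix_effort_months=3,
)

if __name__ == "__main__":
    for name, value in indirect_measures(EXAMPLE).items():
        print(f"{name}: {value:.2f}")
```

Some inputs are only available late. For example, defect detection efficiency needs the total number of defects, which is known only after the system has been in use for some time.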
36Measurement Basics
- Measurement for prediction
- So far we've talked about measuring some entity that already exists
- Useful for assessing the current situation or understanding what has happened in the past
- In many cases, we want to predict an attribute of an entity that doesn't yet exist (e.g., project cost, reliability of a fielded system).
- Requires a model relating measurements that can be taken now to the attributes that will be predicted, e.g.:
- Empirical cost models
- Software reliability models
- A model is not sufficient by itself to perform the required prediction. We need a prediction system including
- A model relating the measurements to the desired attribute
- A procedure for determining the model parameters
- Procedures for interpreting the model results
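The three parts of a prediction system can be sketched as three functions. The following is a minimal Python sketch with several assumptions. The model form effort = a * size^b is assumed. Parameters are fitted by least squares on logarithms of hypothetical past-project data. The 25% relative-error acceptance threshold is a common rule of thumb, not something given here. All names are our own.

```python
import math


def fit_power_model(sizes, efforts):
    """Parameter-determination procedure: ordinary least squares on
    log(effort) = log(a) + b * log(size), using past-project data."""
    if len(sizes) != len(efforts) or len(sizes) < 2:
        raise ValueError("need matching data for at least two past projects")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("past projects must differ in size")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b


def predict_effort(a, b, size):
    """The model itself: effort = a * size ** b."""
    return a * size ** b


def acceptable(predicted, actual, max_relative_error=0.25):
    """Interpretation procedure: accept a prediction whose magnitude of
    relative error |predicted - actual| / actual is within the threshold."""
    return abs(predicted - actual) / actual <= max_relative_error


# Hypothetical past projects: size in KLOC, effort in person-months.
PAST_SIZES = [10, 20, 40, 80]
PAST_EFFORTS = [3 * s ** 1.1 for s in PAST_SIZES]

if __name__ == "__main__":
    a, b = fit_power_model(PAST_SIZES, PAST_EFFORTS)
    estimate = predict_effort(a, b, 50)
    print(f"a={a:.2f}, b={b:.2f}, predicted effort for 50 KLOC: {estimate:.1f}")
```

Without measurements of past projects there is nothing to fit, so the model alone predicts nothing.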
37Measurement Basics
- Measurement for prediction
- Accurate predictive measurement is always based on measurement in the assessment sense
- Everyone wants to predict key determinants of success (e.g., effort to build a new system, operational reliability), but...
- There are no magic models. They all depend on
- High-quality measurements of past projects
- High-quality measurements of current project
38Measurement Basics
- Measurement scales and scale types
- A measurement scale is our mapping, M, together with the empirical and numerical relation systems.
- If the relation systems (domain and range) are obvious from context, sometimes M alone is referred to as the scale.
- Three important questions concerning representations and scales:
- How do we determine when one numerical relation system is preferable to another?
- How do we know if a particular empirical relation system has a representation in a given numerical relation system?
- What do we do when we have several different possible representations (and hence many scales) in the same numerical relation system?
39Measurement Basics
- Measurement scales and scale types (contd)
- Three questions
- How do we determine when one numerical relation system is preferable to another?
- Answer: We can map the scale to a symbolic relational system. In practice, this can be unwieldy (symbolic vs. numerical manipulation). We try to use real numbers whenever possible.
- How do we know if a particular empirical relation system has a representation in a given numerical relation system?
- Answer: This is known as the representation problem, one of the basic problems of measurement theory. It has been solved for various types of relation systems characterized by specific axioms. Discussion is beyond the scope of this course, but solutions can be found in texts on measurement theory.
- What do we do when we have several different possible representations (and hence many scales) in the same numerical relation system?
- Answer: This is the uniqueness problem. The following slides address this question.
40Measurement Basics
- Measurement scale types
- Nominal
- Ordinal
- Interval
- Ratio
- Absolute
- One relational system is richer than another if
all relationships in the second system are
contained in the first. - Scale types above are listed in order of
increasing richness.
41Measurement Basics
- Measurement scale types (contd)
- Why is this important?
- If we have a satisfactory measure for an attribute with respect to an empirical relation system, we want to know what other measures exist that are acceptable.
- A mapping from one acceptable measure to another is called an admissible transformation.
- Example: when considering length, admissible transformations are of the form $M' = aM$ ($a > 0$). Transformations of the form $M' = b + aM$ with $b \neq 0$, or $M' = aM^b$ with $b \neq 1$, are not acceptable.
- The more restrictive the class of admissible transformations, the more sophisticated the measurement scale.
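A quick numerical check shows why only the first form is admissible for length. The test statement is "one length is twice the other". The following is a minimal Python sketch using two hypothetical lengths of 30 and 15 inches. The transformation and function names are our own.

```python
def ratio_preserved(m_a, m_b, transform, tol=1e-9):
    """True if the ratio m_a / m_b is unchanged after applying `transform`."""
    return abs(transform(m_a) / transform(m_b) - m_a / m_b) <= tol


def to_cm(m):          # M' = aM with a = 2.54: admissible
    return 2.54 * m


def shifted(m):        # M' = b + aM with b = 10: not admissible
    return 10 + 2.54 * m


def squared(m):        # M' = aM^b with b = 2: not admissible
    return 2.54 * m ** 2


LONG, SHORT = 30.0, 15.0   # two hypothetical lengths in inches

if __name__ == "__main__":
    for t in (to_cm, shifted, squared):
        print(t.__name__, ratio_preserved(LONG, SHORT, t))
```

All three transformations preserve order, but only $M' = aM$ keeps the ratio of 2, which is what distinguishes a ratio-scale transformation.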
42Measurement Basics
- Nominal scale
- The most primitive form of measurement: define classes or categories, and place each entity in a particular class or category
- Two major characteristics
- The empirical relation system consists only of different classes; there is no notion of ordering
- Any distinct numbering or symbolic representation is an acceptable measure; there is no notion of magnitude associated with the numbers or symbols.
- Any two mappings, M and M', will be related to each other in that M' can be obtained from M by a one-to-one mapping
- Example: software faults can belong to one of the following classes, according to where they were first introduced during development
- Specification
- Design
- Code
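For the fault-origin example, any one-to-one relabeling of the classes is an equally good nominal measure. What survives the relabeling is class membership, for example which faults share a class and which class is most frequent. Arithmetic on the labels does not survive: averaging the numeric labels would be meaningless. The following is a minimal Python sketch. The fault data and both label sets are hypothetical.

```python
from collections import Counter

# Hypothetical fault data: the class in which each fault was introduced.
FAULT_ORIGINS = ["code", "design", "code", "specification", "code", "design"]

# Two acceptable nominal measures: any one-to-one relabeling will do.
M1 = {"specification": 1, "design": 2, "code": 3}
M2 = {"specification": "S", "design": "D", "code": "C"}


def same_class_pairs(data, m):
    """Pairs of fault indices (i, j), i < j, that receive the same label under m."""
    labels = [m[x] for x in data]
    return {(i, j) for i in range(len(labels)) for j in range(i + 1, len(labels))
            if labels[i] == labels[j]}


def modal_class(data, m):
    """The most frequent label under m, translated back to its class."""
    inverse = {label: cls for cls, label in m.items()}
    label, _count = Counter(m[x] for x in data).most_common(1)[0]
    return inverse[label]


if __name__ == "__main__":
    print(same_class_pairs(FAULT_ORIGINS, M1) == same_class_pairs(FAULT_ORIGINS, M2))
    print(modal_class(FAULT_ORIGINS, M1), modal_class(FAULT_ORIGINS, M2))
```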
43Measurement Basics
- Measurement type and scale
- Ordinal scale
- Augments the nominal scale with ordering information.
- Three major characteristics
- The empirical relation system consists of classes that are ordered with respect to the attribute
- Any mapping preserving the ordering (i.e., a strictly increasing monotonic function) is acceptable
- Numbers represent ranking only, so arithmetic operations have no meaning
- The set of admissible transformations is the set of all strictly increasing monotonic mappings
- Example: software complexity, two valid measures
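The claim that arithmetic has no meaning on an ordinal scale can be demonstrated with two valid measures. The two measures below are our own hypothetical choices, not those from the original slide. The class names beyond "trivial", "simple", and "moderate" are also assumed. The following Python sketch shows that a comparison based on the mean flips when switching between two order-preserving measures, while a comparison based on the median does not.

```python
from statistics import mean, median

# Assumed ordered complexity classes; the last two names are hypothetical.
CLASSES = ["trivial", "simple", "moderate", "complex", "incomprehensible"]
M1 = dict(zip(CLASSES, [1, 2, 3, 4, 5]))
M2 = dict(zip(CLASSES, [1, 2, 3, 4, 10]))  # also strictly increasing: equally valid

# Two hypothetical sets of modules (odd sizes, so the median is a data value).
X = ["trivial", "trivial", "incomprehensible"]
Y = ["moderate", "moderate", "moderate"]


def exceeds(stat, a, b, m):
    """Does `stat` of module set a exceed that of module set b under measure m?"""
    return stat([m[c] for c in a]) > stat([m[c] for c in b])


if __name__ == "__main__":
    for stat in (mean, median):
        print(stat.__name__, exceeds(stat, X, Y, M1), exceeds(stat, X, Y, M2))
```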
44Measurement Basics
- Measurement type and scale
- Interval scale
- Captures information about the size of the intervals that separate classes.
- Three characteristics
- Preserves order
- Preserves differences, but not ratios
- Addition and subtraction are acceptable, but not multiplication and division
- The class of admissible transformations is the set of affine transformations $M' = aM + b$, where $a > 0$.
- Example: software complexity. Suppose the difference in complexity between a trivial and a simple system is the same as that between a simple and a moderate system. Where this equal step applies to each class, we have an attribute measurable on an interval scale.
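For the equal-step complexity example, an affine rescaling keeps ratios of differences but changes ratios of values. The following is a minimal Python sketch. The measure values 0, 1, 2 and the rescaling $M' = 3M + 10$ are hypothetical choices.

```python
def affine(a, b):
    """An admissible interval-scale transformation M' = a*M + b (requires a > 0)."""
    if a <= 0:
        raise ValueError("a must be positive")
    return lambda m: a * m + b


# Hypothetical equal-step measure for three complexity classes.
M = {"trivial": 0, "simple": 1, "moderate": 2}
M_PRIME = {cls: affine(3, 10)(v) for cls, v in M.items()}   # 10, 13, 16


def step_ratio(m):
    """Ratio of differences: (moderate - simple) / (simple - trivial)."""
    return (m["moderate"] - m["simple"]) / (m["simple"] - m["trivial"])


def value_ratio(m):
    """Ratio of raw values: moderate / simple."""
    return m["moderate"] / m["simple"]


if __name__ == "__main__":
    print(step_ratio(M), step_ratio(M_PRIME))    # equal
    print(value_ratio(M), value_ratio(M_PRIME))  # differ
```

Because "moderate is twice as complex as simple" does not survive an admissible transformation, it cannot be asserted on an interval scale.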
45Measurement Basics
- Measurement type and scale
- Ratio scale
- Most useful scale, common in the physical sciences; captures information about ratios
- 4 characteristics
- Preserves ordering, the size of intervals between entities, and ratios between entities
- There is a zero element, representing total lack of the attribute
- The measurement mapping must start at 0 and increase at equal intervals (units)
- All arithmetic can be meaningfully applied to the classes in the range of the mapping.
- Acceptable transformations are ratio transformations $M' = aM$, where $a$ is a positive scalar.
- Example: program length can be measured by lines of code, number of characters, etc. The number of characters may be obtained by multiplying the number of lines by the average number of characters per line.
46Measurement Basics
- Measurement type and scale
- Absolute scale
- Most restrictive in terms of admissible transformations
- For any two measures, M and M', there's only one admissible transformation (the identity transformation), since there's only one way to make the measurement.
- 4 characteristics
- Measurement is made simply by counting the number of elements in the entity set.
- The attribute always takes the form "number of occurrences of x in the entity"
- There is only one possible measurement mapping, namely the actual count
- All arithmetic analysis of the resulting count is meaningful.
- Example: the number of lines of code in a module is an absolute scale measure of the attribute "number of lines of code" (as a measure of program length, LOC is on a ratio scale).
47Measurement Basics
- Measurement type and scale - summary
48Measurement Basics
- Meaningfulness in measurement
- After making measurements, the key question is: can we deduce meaningful statements about the entities being measured?
- This is harder to answer than it first appears; consider these statements:
- The number of errors discovered during the integration testing of program X was at least 100
- The cost of fixing each error in program X is at least 100
- A semantic error takes twice as long to fix as a syntactic error
- A semantic error is twice as complex as a syntactic error
49Measurement Basics
- Meaningfulness in measurement (contd)
- The first statement seems to make sense
- The second statement doesn't make sense: the number of errors can be stated without reference to a particular scale, but the cost of fixing them cannot (100 of what?)
- Statement 3 seems sensible: the ratio of time taken is the same whether time is measured in seconds, hours, or fortnights
- Statement 4 does not appear to be meaningful and requires clarification
- If complexity means time to understand the error, then it makes sense
- Other definitions of complexity may not admit measurement on a ratio scale (e.g., the examples in previous slides), in which case statement 4 is meaningless.
50Measurement Basics
- Meaningfulness in measurement
- Definition: a statement involving measurement is meaningful if its truth value is invariant under the admissible transformations of the scale.
51Measurement Basics
- Meaningfulness in measurement examples
- "John is twice as tall as Fred"
- Implies the measures are on at least a ratio scale. It's meaningful because no matter what admissible transformation we use (and all we have are ratio transformations), the truth or falsity of the statement remains constant.
- "Temperature in Tokyo today is twice that in London"
- Presumes a ratio scale, but temperature in °F or °C is only on an interval scale, so the statement is not meaningful. If Tokyo is 40 °C and London is 20 °C, the statement is true, but in Fahrenheit Tokyo is 104 °F and London is 68 °F, and the statement is no longer true.
- "Failure x is twice as critical as failure y"
- Not meaningful if we only have an ordinal scale for criticality (a common scale for software failures is catastrophic, significant, moderate, minor, and insignificant).
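The definition of meaningfulness can be tested directly on one data set. The check applies admissible transformations and asks whether the statement's truth value changes. The following is a minimal Python sketch. The temperatures come from the Tokyo/London example, while the heights for John and Fred are hypothetical. A single change of truth value proves a statement is not meaningful. Passing is only evidence, since only finitely many transformations and one data set are tried.

```python
def is_meaningful(statement, values, transformations):
    """Check the definition on one data set: the statement's truth value must
    not change under any of the supplied admissible transformations."""
    truth = statement(values)
    return all(statement({k: t(v) for k, v in values.items()}) == truth
               for t in transformations)


def twice(x, y, tol=1e-9):
    """The statement 'x is twice y' as a predicate on a measurement."""
    return lambda m: abs(m[x] - 2 * m[y]) <= tol * max(1.0, abs(m[x]))


# Ratio-scale transformations for height (cm to inches, cm to metres).
HEIGHT_TRANSFORMS = [lambda v: v / 2.54, lambda v: v / 100]
# An interval-scale transformation for temperature (Celsius to Fahrenheit).
TEMPERATURE_TRANSFORMS = [lambda c: 9 * c / 5 + 32]

HEIGHTS_CM = {"John": 180.0, "Fred": 90.0}      # hypothetical
TEMPS_C = {"Tokyo": 40.0, "London": 20.0}

if __name__ == "__main__":
    print(is_meaningful(twice("John", "Fred"), HEIGHTS_CM, HEIGHT_TRANSFORMS))         # True
    print(is_meaningful(twice("Tokyo", "London"), TEMPS_C, TEMPERATURE_TRANSFORMS))    # False
```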
52Measurement Basics
- Meaningfulness in measurement
- Note that our notion of meaningfulness says nothing about
- Usefulness
- Practicality
- Worthwhileness
- Ease of measurement
53Measurement Basics
- Statistical operations on measures
- Analyses don't have to be sophisticated, but we want to know something about how a set of data is distributed.
- What types of statistical analysis are relevant to a given measurement scale?
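One common answer to this question is cumulative: each scale type admits the statistics of the weaker scales plus some new ones. Under the conventional correspondence, the mode is added at nominal, the median at ordinal, the arithmetic mean and standard deviation at interval, and the geometric mean at ratio. The following Python sketch encodes that correspondence using the standard `statistics` module (Python 3.8+ for `geometric_mean`). Using `median_low` so that an ordinal median is always an actual data value is our own design choice.

```python
from statistics import geometric_mean, mean, median_low, mode, pstdev

SCALES = ["nominal", "ordinal", "interval", "ratio", "absolute"]

# Summary statistics that first become meaningful at each scale type;
# every richer scale inherits the statistics of the weaker ones.
INTRODUCED_AT = {
    "nominal": {"mode": mode},
    "ordinal": {"median": median_low},   # median_low always returns a data value
    "interval": {"mean": mean, "standard deviation": pstdev},
    "ratio": {"geometric mean": geometric_mean},
}


def admissible_statistics(scale):
    """All statistics meaningful for data measured on `scale`."""
    allowed = {}
    for s in SCALES[: SCALES.index(scale) + 1]:
        allowed.update(INTRODUCED_AT.get(s, {}))
    return allowed


if __name__ == "__main__":
    ratings = [1, 2, 2, 3, 5]   # hypothetical ordinal complexity ratings
    for name, stat in admissible_statistics("ordinal").items():
        print(name, stat(ratings))
```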
54Measurement Basics
- Indirect measurement and meaningfulness
- Done when measuring a complex attribute in terms of simpler sub-attributes
- The scale type for an indirect measure M is generally no stronger than the weakest of the scale types of the sub-attributes
- Example: testing efficiency = defects / effort
- Defects is on the absolute scale, while effort is on the ratio scale. Therefore testing efficiency is on the ratio scale.
- What is $E = 2.7v + 121w + 26x + 12y + 22z - 497$, where
- v is the number of program instructions
- x and y are the numbers of internal and external documents
- z is the program size in words
- w is a subjective measure of complexity
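One way to see the problem with E is through w. Because w is a subjective rating, it is at best ordinal, so any strictly increasing relabeling of it is admissible. This sketch reads the formula as E = 2.7v + 121w + 26x + 12y + 22z - 497. Both rating scales and both projects are hypothetical. The following Python code shows that such a relabeling can reverse which of two projects E says needs more effort.

```python
def effort(v, w, x, y, z):
    """E = 2.7v + 121w + 26x + 12y + 22z - 497."""
    return 2.7 * v + 121 * w + 26 * x + 12 * y + 22 * z - 497


# w is subjective, so at best ordinal: any strictly increasing relabeling of
# the ratings is admissible. Both rating scales here are hypothetical.
W1 = {1: 1, 2: 2, 3: 3, 4: 4, 5: 5}
W2 = {1: 1, 2: 2, 3: 3, 4: 4, 5: 9}

# Two hypothetical projects: (v, complexity rating, x, y, z).
P = (1000, 5, 10, 5, 4000)
Q = (1100, 4, 10, 5, 4000)


def effort_under(project, scale):
    """E for `project`, with its complexity rating mapped through `scale`."""
    v, rating, x, y, z = project
    return effort(v, scale[rating], x, y, z)


if __name__ == "__main__":
    for name, scale in (("W1", W1), ("W2", W2)):
        print(name, effort_under(P, scale) > effort_under(Q, scale))
```

Under W1, Q gets the larger E; under W2, P does. So statements comparing projects by E are not meaningful, consistent with the rule that an indirect measure is no stronger than its weakest sub-attribute.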
55Overview
- Measurement: what is it and why do we do it?
- Measurement basics
- A goal-based software measurement framework
56A Goal-Based Software Measurement Framework
- Classifying software measures
- Determining what to measure
57A Goal-Based Software Measurement Framework
- Classifying software measures
- Three types of software entities to measure
- Processes: collections of software-related activities
- Products
- Resources: entities required by a process activity
- Within each class, we have
- Internal attributes: measured purely in terms of the entity itself
- External attributes: measured with respect to how the entity relates to its environment; the behavior of the entity is important
- Managers want to be able to measure and predict external attributes
- However, external attributes are more difficult to measure than internal ones, and are typically measured late in the development process
- The desire is to predict external attributes in terms of more easily measured internal attributes
58A Goal-Based Software Measurement Framework
- Determining what to measure
- Measurement is useful only if it helps us understand the underlying process or one of its resultant products
- Goal-Question-Metric (GQM) has proven effective in selecting and implementing metrics:
- List the major goals of the development project
- Derive from each goal the questions that must be answered to determine if the goals are being met
- Decide what must be measured in order to be able to answer the questions adequately
59A Goal-Based Software Measurement Framework
- GQM example: the goal is to evaluate the effectiveness of the coding standard
- Questions
- Who is using the standard?
- What is coder productivity?
- What is code quality?
- Metrics
- Proportion of coders
- Using standard
- Using language
- Experience of coders
- With standard
- With language
- With environment, etc.
- Effort
- Errors
- Code size (lines of code, function points, etc.)
60A Goal-Based Software Measurement Framework
- GQM example 2: AT&T goals, questions, metrics
61A Goal-Based Software Measurement Framework
- Templates for goal definition
- Purpose: To (characterize, evaluate, predict, motivate, etc.) the (process, product, model, metric, etc.) in order to (understand, assess, manage, engineer, learn, improve, etc.) it.
- Example: To evaluate the maintenance process in order to improve it.
- Perspective: Examine the (cost, effectiveness, correctness, defects, changes, product measures, etc.) from the viewpoint of the (developer, manager, customer, user, etc.)
- Example: Examine the cost from the viewpoint of the manager.
- Environment: The environment consists mainly of the following: process factors, people factors, problem factors, methods, tools, constraints, etc.
- Example: The maintenance staff are poorly motivated programmers who have limited access to tools.