Mutation testing

Provided by: disc8

1
Mutation testing
I am grateful to Kostas Adamopoulos for his
permission to use slides he prepared about
mutation testing as part of this week's slides
2
Tutorial 6 - solutions

1. Binkley and Harman found that the typical size
of a slice was about a third of the program from
which it is constructed. However, how can they be
sure that their slices are typical? Provide a
critique of their choice of slicing criteria for
their study.
Sample Answer
In a certain application only a certain kind of
slice will be constructed. For example, if a
program is to be decomposed to help with
comprehension, then slices need only be
constructed at the end of a procedure for
important variables. The slices in this
situation are only a small subset of the whole
space of all possible slices considered by
Binkley and Harman. Thus, in different
applications, the size of a typical slice may
be very different.
Harman and Binkley include all criteria. This
will include those which tend towards degenerate
criteria, such as backward slicing right at the
start of a procedure, which may produce a very
small slice. This might artificially reduce the
average size of a slice for the sample of all
possible criteria, compared to realistic
choices of sets of criteria.

3
Tutorial 6 - solutions

2. Why is an amorphous slice always as small and
possibly smaller than a syntax-preserving slice
constructed for the same criterion?
Sample Answer
The amorphous slice does not require that
transformation is used. All syntax-preserving
slices are also amorphous slices (though the
reverse is not true). Therefore, a valid (though
silly) amorphous slicing algorithm would be to
simply produce the syntax-preserving slice.
Therefore an amorphous slice need be no larger
than the corresponding syntax-preserving slice.
However, as we have seen, the ability to use
transformation can reduce the size of the slice,
so amorphous slices may be smaller than their
syntax-preserving counterparts.

4
Tutorial 6 - solutions

3. Give two situations in which syntax-preserving
slicing would be better than amorphous slicing
and two in which amorphous slicing would be
better than syntax-preserving slicing.
Sample Answer
In debugging, we want to find a bug in a program,
so it would not be helpful if the slice
transformed the program: syntax-preserving
slicing would be better.
In measuring cohesion using the overlap of
slices, we need to perform operations like union
and intersection on slices, and this requires
syntax preservation to be meaningful:
syntax-preserving slicing would be better.
In testing, we are concerned with generating test
cases. If we are trying to cover some
predicate-controlled branch, then we may as well
have an amorphous slice, which will be smaller
(especially if it is faster). The user does not
need to see the slice, so it does not matter if
the syntax is preserved: amorphous slicing will
be better.
In comprehension of an unfamiliar program,
slicing can be used to split the program into
smaller parts: the smaller the better. Since
amorphous slices will be no larger than
syntax-preserving slices, amorphous slicing will
be better.

5
Tutorial 6 - solutions

4. Consider the program below
D = 2 * r
FaceArea = Pi * r * r
C = Pi * D
SurfaceArea = 2 * FaceArea + C * h
slice = SurfaceArea
What is the smallest amorphous slice on the final
value of the variable slice?
It is
slice = 2 * (r^2 + h * r) * Pi

6
Tutorial 6 - solutions

The syntax-preserving slice for the same program
and criterion is the whole program
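
As an informal check of this answer, the sketch below writes the program and the one-line amorphous slice as two Python functions and compares them on a few inputs. It assumes the operators missing from the transcript are the usual cylinder formulas (D = 2r, FaceArea = Pi r r, C = Pi D, SurfaceArea = 2 FaceArea + C h); the function names are illustrative only.

```python
import math

def original_program(r, h):
    """The five-statement program, assuming the usual cylinder formulas."""
    d = 2 * r
    face_area = math.pi * r * r
    c = math.pi * d
    surface_area = 2 * face_area + c * h
    slice_ = surface_area
    return slice_

def amorphous_slice(r, h):
    """Single-assignment amorphous slice on the final value of slice."""
    return 2 * (r ** 2 + h * r) * math.pi

# Compare the two versions on a few arbitrary inputs.
samples = [(1.0, 2.0), (0.5, 3.0), (2.5, 0.0)]
agree = all(
    math.isclose(original_program(r, h), amorphous_slice(r, h))
    for r, h in samples
)
print(agree)
```

Agreement on sample inputs is only a sanity check; the equivalence itself follows from the algebra 2 Pi r^2 + 2 Pi r h = 2 (r^2 + h r) Pi.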

7
Faults and Failures

A fault is the group of incorrect statements in
the program that causes a failure
A failure is an external, incorrect behaviour of a
program (i.e. an incorrect output or a runtime
failure)

Faults generate failures
8
MT is based on two basic Assumptions

The Competent Programmer Hypothesis: In general,
programmers are competent. That is, the programs
they write are nearly correct. The program
differs from a correct version in only a few
small ways.
The Coupling Effect Hypothesis: Large program
faults, particularly those of a semantic nature,
are coupled with smaller syntactic faults that
can be detected with mutation testing
(hypothesised 1978, supported empirically 1992,
demonstrated theoretically 1995, but is it true?).

9
What is Mutation Testing

White-box, error-based testing technique
Built-in adequacy criteria
The quality of a test set is measured according
to its effectiveness or ability to detect faults
Tool support
Goal is generating good sets of tests rather than
finding faults

10
The Idea Underpinning Mutation Testing

Seeding the implementation with a fault (mutating
the original program) by applying a mutation
operator
Then determine whether testing identifies this
fault
Different result
the fault introduced has been identified
If the test case distinguishes between the mutant
and the original program
it is said to kill the mutant
Same result
the mutant and the original program produce
identical results
the mutant is still alive

11
How it works (again)
Program P
Apply mutation operator
Mutant P'
Test both P and P' using the same test case T

If P and P' give different results, then mutant
P' is killed by test case T, so T can detect the
difference between the correct and buggy program

If P and P' give the same results then either
Test case T is not good enough to detect the fault,
so we need to come up with a better test
Or P and P' are equivalent programs: P' is an
equivalent mutant, and no tests can be devised
that can distinguish them

So this is mainly a way to judge the
effectiveness of our test data, to improve them,
and by doing that to detect faults (syntactic and
semantic ones)
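
The procedure on this slide can be sketched as a small loop. This is a minimal illustration, not the implementation of any particular tool: programs are modelled as Python functions, and the unit under test (`largest`) and both mutants are hypothetical examples written by hand.

```python
def mutation_analysis(original, mutants, tests):
    """Run every test on the original and on each mutant.

    Returns (killed, live): the names of mutants whose result differs from
    the original on at least one test, and of those that never differ.
    """
    killed, live = set(), set()
    for name, mutant in mutants.items():
        if any(mutant(*t) != original(*t) for t in tests):
            killed.add(name)
        else:
            live.add(name)
    return killed, live

# Hypothetical program under test.
def largest(a, b):
    return a if a > b else b

# Two hand-written mutants of it.
mutants = {
    "swap_branches": lambda a, b: b if a > b else a,  # returns the smaller value
    "gt_to_ge": lambda a, b: a if a >= b else b,      # equivalent: same result when a == b
}

killed, live = mutation_analysis(largest, mutants, tests=[(1, 2), (2, 1), (3, 3)])
print(sorted(killed), sorted(live))
```

A live mutant is only a prompt for further work: either the tests need improving or, as with `gt_to_ge` here, the mutant is equivalent.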
12
Test Data Effectiveness

Mutation testing provides a way to judge the
effectiveness of the test data
The test set should kill all the mutants
If not, then we can improve the test set
Test generation may be based on mutation testing
Tests are generated to kill the mutants
In the same way, mutants should be of high
performance, i.e. difficult to kill

13
Mutation Score (MS)

The Mutation Score (Adequacy Score) of program P
and test set T is

MS(P, T) = (# of killed mutants) / (# of non-equivalent mutants)

where # of non-equivalent mutants = # of total
mutants - # of equivalent mutants.
0 <= MS <= 1, or 0% <= MS <= 100%.
Test data is mutation-adequate if its mutation
score is 100% (in this case it kills all
non-equivalent mutants).
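
A small worked example of the score may help. The function below is a sketch of the definition above; the counts used (50 mutants, 5 equivalent, 36 killed) are made up for illustration.

```python
def mutation_score(killed, total, equivalent):
    """MS(P, T) = killed / (total - equivalent), as a fraction in [0, 1]."""
    non_equivalent = total - equivalent
    if non_equivalent <= 0:
        raise ValueError("need at least one non-equivalent mutant")
    if not 0 <= killed <= non_equivalent:
        raise ValueError("killed must lie between 0 and the non-equivalent count")
    return killed / non_equivalent

# Hypothetical counts: 50 mutants, 5 judged equivalent, 36 killed.
score = mutation_score(killed=36, total=50, equivalent=5)
print(f"{score:.0%}")  # 36 / 45 = 80%
```

Note that the denominator depends on knowing which mutants are equivalent, which in practice has to be estimated or decided by hand.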
14
Mutants

Original Program P
Mutant P' of P
A program similar to P
P' differs from P by a single mutation
Each kind of mutation corresponds to a typical
error programmers usually make
Off-by-one, spelling, typos, etc.
i.e. imagine that P' was identical to P except
that exactly one operator was changed to a
different one

15
Mutation Operators (also called mutant operator,
mutagenic operator, mutagen, mutation
transformation, mutation rule)

It is a rule that is applied to a program to
create mutants
Replace each operand by every other syntactically
legal operand
Modify expressions by replacing operators and
inserting new operators
Delete entire statements, etc.
Categorised into mutation classes
Statement, Operator, Variable, Constant, etc.
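
To make the idea of a rule that creates mutants concrete, here is a minimal sketch of one operator-class mutation operator (arithmetic operator replacement) applied to Python source with the standard `ast` module. It is an illustration only and is not how Mothra or Proteum are implemented; `arithmetic_mutants` is a hypothetical helper name, and `ast.unparse` needs Python 3.9 or later.

```python
import ast

ARITHMETIC_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)

def _sites(tree):
    """All binary-operation nodes whose operator is one of ARITHMETIC_OPS."""
    return [n for n in ast.walk(tree)
            if isinstance(n, ast.BinOp) and isinstance(n.op, ARITHMETIC_OPS)]

def arithmetic_mutants(source):
    """Return the source of every mutant obtained by replacing exactly one
    arithmetic operator with a different operator from ARITHMETIC_OPS."""
    n_sites = len(_sites(ast.parse(source)))
    mutants = []
    for site in range(n_sites):
        for new_op in ARITHMETIC_OPS:
            tree = ast.parse(source)          # fresh tree for each mutant
            node = _sites(tree)[site]
            if isinstance(node.op, new_op):
                continue                      # same operator: not a mutation
            node.op = new_op()
            mutants.append(ast.unparse(tree))
    return mutants

for m in arithmetic_mutants("c = a + b"):
    print(m)
```

Each mutant differs from the original in exactly one place, so a statement with k arithmetic operators yields 3k mutants under this single rule, which hints at how quickly the number of mutants grows.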

16
Examples of Mutation Operators
Some Mothra Mutation Operators for Fortran (from
a total of 22 operators)
Proteum has 71 Mutation Operators for C
categorised as follows Statement 15, Operator
46, Variable 7, Constant 3
17
Creating a Mutant
Apply a Mutation Operator: change op to op'
Original Program P
.. .. ..
c = a op b
.. .. ..
Mutant P' of Original Program P
.. .. ..
c = a op' b
.. .. ..
(op is the original operator, op' its replacement)
18
Testing a Mutant
Original Program P: .. c = a op b ..
Mutant P': .. c = a op' b ..
(op' is the operator introduced by the mutation)
Apply Test Case T to both P and P'
Result R (from P)
Result R' (from P')
If R <> R' then mutant P' is killed by Test Case T
If R = R' then improve Test Case T and test again
Loop: if we continue improving test case T and
still get R = R', then P' is possibly an
Equivalent Mutant
19
An Example of a Mutant

Program P: x = y + z

A mutant P' of P: x = y * z

Test case 1 (y = 3, z = 1) kills the mutant

P gives x = 4
P' gives x = 3

Test case 2 (y = 2, z = 2): the mutant is still
live

P gives x = 4
P' gives x = 4
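
The two test cases can be replayed in a few lines. The sketch assumes the statements are x = y + z for P and x = y * z for the mutant, which is consistent with the values on the slide (4 versus 3 for the first test, 4 versus 4 for the second).

```python
def p(y, z):
    return y + z   # assumed original statement: x = y + z

def p_mutant(y, z):
    return y * z   # assumed mutant: '+' replaced by '*'

def kills(test_case):
    """A test case kills the mutant if the two programs give different results."""
    y, z = test_case
    return p(y, z) != p_mutant(y, z)

print(kills((3, 1)))  # True: 4 versus 3
print(kills((2, 2)))  # False: 4 versus 4
```

The second test case is not wrong, it is simply not sensitive to this mutation, which is why a live mutant does not by itself tell us anything about equivalence.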
20
Equivalent Mutant Problem

An equivalent mutant is syntactically different
from the original program, but has the same
behavior.
The general problem of deciding whether a mutant
is equivalent to the original program is
theoretically undecidable.
This is a hugely important obstacle which will
need to be overcome to facilitate mutation
testing in practice.

21
Example of an Equivalent Mutant

Original Program P: if (y == 2 && z == 2) x = y + z

Mutant P' of P: if (y == 2 && z == 2) x = y * z

P gives x = 4
P' gives x = 4
P' is an equivalent mutant of P because no
possible test can ever kill this mutant. If the
condition is true, the mutated statement returns
x = 4 for any possible test case, the same as the
original statement.
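
The following sketch compares the two versions over a small input range. It assumes the guard is y == 2 and z == 2 and that the mutation replaces + with *, and it adds a default initial value for x so that the functions always return something. The range is arbitrary.

```python
from itertools import product

def p(y, z, x=0):
    if y == 2 and z == 2:
        x = y + z
    return x

def p_mutant(y, z, x=0):
    if y == 2 and z == 2:
        x = y * z      # mutated statement, reachable only when y == z == 2
    return x

# Exhaustive comparison over a small, arbitrarily chosen input range.
domain = range(-5, 6)
survives = all(p(y, z) == p_mutant(y, z) for y, z in product(domain, repeat=2))
print(survives)
```

A finite check like this is only evidence. Here equivalence follows from reasoning about the guard (2 + 2 = 2 * 2), but in general no such check can decide equivalence.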
22
Equivalent Mutant Problem

Determining whether a mutant is equivalent is not
decidable
How do we know whether a mutant which remains
unkilled is simply hard to kill (stubborn) or
equivalent?
Could we avoid generating equivalent mutants?

23
Large Number of Mutants
[Figure: the original program P surrounded by its
mutants P1, P2, P3, P4, ..., Pn; each mutant
contains a single mutation M at a different point
in the program]
24
Large Number of Mutants

Even for simple programs n can be a very large
number
n depends on the size of P and on how many
mutation operators we apply on P
Reducing the number of mutants is the second
problem that needs to be addressed

25
Existing Methodologies to Reduce Large Number of
Mutants

Selective Mutation
Reduces the number of mutation operators applied

Mutant Sampling
Randomly selects a subset of mutants

The number of mutants under test is reduced
26
Advances in Mutation Testing

Reduction technique: Selective Mutation
Approximation technique: Weak Mutation
Algorithmic execution technique: Schema-based
Mutation
Heuristics for detecting equivalent mutants
Algorithms for automatic test data generation
Interface Mutation, Class Mutation
Distribution of computational expense
Avoiding human intensiveness

27
Selective Mutation (a "do fewer" approach)

Applying mutation with only the most critical
mutation operators being used: key operators
provide almost the same coverage as non-selective
mutation
Select only mutants that are truly distinct from
other mutants
decreases the number of mutants produced
reduces computational cost significantly
Getting as much testing strength as possible with
as few mutants as possible

28
Mutation Sampling (a "do fewer" approach)

Sampling only a randomly selected subset of the
mutants to run
Using samples of some a priori fixed size
Using samples without a priori fixed size
select mutants until sufficient evidence has been
collected to determine that a statistically
appropriate sample size has been reached
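
The fixed-size variant is simple enough to sketch. The code below picks a random subset of a given fraction of the mutants; the function name, the 10% fraction and the stand-in mutant names are all hypothetical, and the variant without an a priori size (sampling until a statistical stopping rule is met) is not shown.

```python
import random

def sample_mutants(mutants, fraction, seed=None):
    """Return a random subset containing roughly `fraction` of the mutants."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    if not mutants:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(mutants) * fraction))
    return rng.sample(mutants, k)   # sampling without replacement

all_mutants = [f"mutant_{i}" for i in range(200)]   # stand-ins for real mutants
subset = sample_mutants(all_mutants, fraction=0.1, seed=1)
print(len(subset))
```

Only the sampled mutants are then executed, so the cost falls in proportion to the fraction, at the price of a less precise mutation score.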

29
Weak Mutation (a "do smarter" approach)

An approximation technique that compares the
internal states of the mutant and the original
program immediately after execution of the
mutated portion of the program
Reduces the computational cost, but do we really
get what we want?

30
Weak Mutation Example
x = x op y;  x = z;  Print(x)
x = x op y;  /* inspect x */  x = z;  Print(x)
(op is the arithmetic operator that is mutated)
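
The difference between comparing internal state and comparing output can be sketched as follows. The operators are assumptions, since they are missing from the transcript: the original is taken to be x = x + y and the mutant x = x - y, followed in both cases by x = z and a print of x.

```python
def run(first_op, x, y, z):
    """Execute: x = x <op> y; x = z; print x.

    Returns (state right after the first statement, final output) so that
    both kinds of comparison are possible.
    """
    x = first_op(x, y)
    inspected = x          # the point marked /* inspect x */
    x = z
    return inspected, x

original = lambda x, y: x + y   # assumed original operator
mutant = lambda x, y: x - y     # assumed mutated operator

state_p, out_p = run(original, x=5, y=3, z=7)
state_m, out_m = run(mutant, x=5, y=3, z=7)

weakly_killed = state_p != state_m     # compare internal state
strongly_killed = out_p != out_m       # compare final output
print(weakly_killed, strongly_killed)
```

Under these assumptions the mutant is killed in the weak sense but not in the strong sense, because the later assignment x = z overwrites the difference. This is one way to read the question on the previous slide about whether weak mutation gives us what we want.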
31
Using Distributed Computational Resources (a "do
smarter" approach)

Using novel computer architectures to distribute
the computational expense over several machines

32
Using Intelligent Algorithms (a "do smarter"
approach)

By intelligently storing state information, this
technique factors the expense of running a mutant
over several related mutant executions and
thereby lowers the total computational cost

33
Schema-based Mutation (a "do faster" approach)

Not mutating an intermediate form
The Mutant Schema Generation (MSG) method encodes
all mutations into one source-level program, a
metamutant
This program is compiled (once), with the same
compiler used during development and is executed
in the same operational environment at
compiled-program speeds

34
Example
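x = x op y
switch (N) { case 1: x = x op1 y; case 2: x = x / y;
case 3: x = x op3 y; case 4: x = z op y; case 5: ... }
(op, op1 and op3 are distinct arithmetic operators;
N selects which mutant the metamutant executes)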
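
A metamutant can be sketched as a single function with the mutant number as an extra parameter. Only the division in case 2 and the replacement of x by z in case 4 survive in the transcript; the other operators below (+ for the original, - and * for cases 1 and 3) are assumptions made for illustration, and case 5 is left out because it is cut off in the slide.

```python
def metamutant(n, x, y, z):
    """Compute the new value of x under mutant number n (0 = original)."""
    if n == 0:
        return x + y      # assumed original statement
    if n == 1:
        return x - y      # assumed operator
    if n == 2:
        return x / y      # division, as on the slide
    if n == 3:
        return x * y      # assumed operator
    if n == 4:
        return z + y      # variable replacement: x replaced by z
    raise ValueError(f"unknown mutant number {n}")

# One compiled program serves every mutant: only n changes between runs.
inputs = dict(x=6, y=3, z=1)
expected = metamutant(0, **inputs)
killed = [n for n in range(1, 5) if metamutant(n, **inputs) != expected]
print(killed)
```

The point of the design is that the program is built once and every mutant runs at compiled speed, instead of compiling each mutant separately or interpreting it.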

35
Mutation Testing Tools

Mothra (for Fortran 77)
Downloadable: http://www.isse.gmu.edu/ofut/rsrch/mut.html
For UNIX systems
22 mutation operators
Interpretive approach
Proteum: PROgram TEsting Using Mutants (for C)
Proteum/IM, Proteum/IM 2.0, Proteum/FSM,
Proteum/ST, Proteum/PN
Downloadable?
For UNIX systems
71 operators
Separate compilation approach
Jester: the JUnit test tester (for Java)
Downloadable: http://jester.sourceforge.net/
Insure++ (for C/C++)
Commercial product

36
Mothra

Mothra is a suite of tools for performing
mutation testing for Fortran 77
Interpretive execution
Mutgen for generating mutants
A testing harness for running a test on a set of
mutant programs and recording the results
Godzilla for automatically generating test cases

37
Using Mothra

Select and generate a set of mutants
Generate an initial set of test cases and the
corresponding outputs that they generate
Confirm the outputs are correct
Repeat until all mutants are killed:
  Run the mutants on the test sets
  Identify equivalent mutants
  Generate and confirm new tests
When you are done, you have an adequate suite of
tests
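
The workflow above can be sketched as a loop. This is not Mothra's interface: the function and mutant names are hypothetical, tests are generated at random rather than by a tool such as Godzilla, the original program's output stands in for the manually confirmed expected output, and a round budget replaces the manual step of identifying equivalent mutants.

```python
import random

def build_adequate_suite(original, mutants, make_test, max_rounds=1000, seed=0):
    """Add tests until every mutant is killed or the round budget runs out.

    Returns (tests, survivors). Survivors are candidates for manual
    inspection: they may be equivalent or merely stubborn.
    """
    rng = random.Random(seed)
    tests, alive = [], dict(mutants)
    for _ in range(max_rounds):
        if not alive:
            break
        t = make_test(rng)
        dead = [name for name, m in alive.items() if m(*t) != original(*t)]
        if dead:                      # keep only tests that kill something
            tests.append(t)
            for name in dead:
                del alive[name]
    return tests, sorted(alive)

# Hypothetical unit under test and three hand-written mutants.
original = lambda a, b: a + b
mutants = {
    "add_to_sub": lambda a, b: a - b,
    "add_to_mul": lambda a, b: a * b,
    "swap_operands": lambda a, b: b + a,     # equivalent mutant
}
tests, survivors = build_adequate_suite(
    original, mutants,
    make_test=lambda rng: (rng.randint(-9, 9), rng.randint(-9, 9)))
print(len(tests), survivors)
```

The equivalent mutant never dies, so without the budget the loop would not terminate; that is the practical face of the equivalent mutant problem.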

38
Proteum Family Tools

Proteum is a suite of tools for performing
mutation testing for C programs
Unit testing (Proteum), integration testing
(Proteum/IM), both (Proteum/IM 2.0), Finite State
Machines (Proteum/FSM), Statecharts
specifications (Proteum/ST), Petri net
specifications (Proteum/PN)
Test case handling (execution, inclusion/exclusion,
etc.), mutant handling (creation, selection,
execution, analysis), adequacy analysis (mutation
score and reports)
Allows separate compilation: each mutant is
individually created, compiled, linked, and run
This approach can be significantly faster (15-20
times) than an interpretive system if mutant run
times greatly exceed individual compilation/link
times; otherwise a compilation bottleneck may result

39
Fundamental premise of Mutation Testing

In practice, if the software contains a fault,
there will usually be a set of mutants that can
only be killed by a test case that also detects
the fault.

40
Future Testing Systems

Programmer submits a program unit
System replies with a set of input/output pairs
that are guaranteed to form an effective test of
the unit by being close to mutation adequate

41
Current Research at King's

Kostas Adamopoulos is a PhD student working on
Mutation Testing
We are looking at search as a way of attacking
the twin problems of number of mutants and
equivalent mutants.
This part of the lecture is not examinable. Feel
free to leave now (quietly) if you are not
interested.
Make sure you come back for the tutorial though
Ok, for the two of you left, come to the front

42
Co-evolution?

GA for Mutants
Fitness is measured according to the ability to
avoid being killed
If this ability is too high then penalize the
fitness of this mutant because it is probably an
equivalent one

GA for Test Cases
Fitness is measured according to the ability of
killing mutants

Two competitive populations. Can this lead to
Co-evolution?
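
The two fitness measures can be sketched from a table of which test kills which mutant. The slide does not say how a mutant's ability to survive is scored or how large the penalty is, so the fraction-survived measure, the rule of penalising only mutants that survive every test, and the penalty factor of 0.5 are all assumptions made for illustration.

```python
def fitness_of_tests(kill_matrix):
    """kill_matrix[t][m] is True if test t kills mutant m.
    A test's fitness is the fraction of mutants it kills."""
    return [sum(row) / len(row) for row in kill_matrix]

def fitness_of_mutants(kill_matrix, penalty=0.5):
    """A mutant's fitness is the fraction of tests it survives; a mutant
    that survives every test is penalised as a suspected equivalent."""
    n_tests = len(kill_matrix)
    scores = []
    for m in range(len(kill_matrix[0])):
        survived = sum(not kill_matrix[t][m] for t in range(n_tests)) / n_tests
        scores.append(survived * penalty if survived == 1.0 else survived)
    return scores

# Hypothetical results: 3 tests (rows) against 4 mutants (columns).
kill_matrix = [
    [True,  False, False, False],
    [True,  True,  False, False],
    [False, False, True,  False],
]
print(fitness_of_tests(kill_matrix))
print(fitness_of_mutants(kill_matrix))
```

In a co-evolutionary setting both lists would be recomputed each generation, because the fitness of each population is defined only relative to the current members of the other.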
43
Co-evolution

Fitness of each individual of one population is
re-evaluated with respect to the other population
Achieves selective mutation
Mutation operators not selected a priori
Individual mutants selected
Tailored to the specific program under test,
based upon their fitness
Guarantees non-equivalent mutants
Stubborn mutants might also be eliminated (?)
The robustness of the algorithm probably will
rediscover eliminated stubborn mutants (?)

44
Work in Progress and Future Work

Mutation tool GAs for Co-evolution
Comparison of real results and simulation
Comparative analysis with other methodologies
(selective mutation, mutant sampling)

45
High-order Mutants

This methodology could be used to check the
validity of the Coupling Effect Hypothesis
Large program faults, particularly those of a
semantic nature are coupled with smaller
syntactic faults that can be detected with
mutation testing
Will a test set that is effective for simple,
first-order mutants be equally effective for
more complex, high-order mutants?

46
Mutation testing Tutorial

1. What is an equivalent mutant and why are
equivalent mutants a problem?
2. For the program fragment x xy give five
examples of mutants which are equivalent and five
which are not equivalent
3. Give examples of test cases which kill your
five non-equivalent mutants
4. Now make up some simple program fragments and
try to think of some more stubborn mutants of
these fragments. That is, mutants which are hard
to kill but which are not equivalent.