Title: Mutation testing
1Mutation testing
I am grateful to Kostas Adamopoulos for his
permission to use slides he prepared about
mutation testing as part of this week's slides
2Tutorial 6 - solutions
- 1. Binkley and Harman found that the typical size of a slice was about a third of the program from which it is constructed. However, how can they be sure that their slices are typical? Provide a critique of their choice of slicing criteria for their study.
- Sample Answer
- In a certain application only a certain kind of slice will be constructed. For example, if a program is to be decomposed to help with comprehension, then slices need only be constructed at the end of a procedure for important variables. The slices in this situation are only a small subset of the whole space of all possible slices considered by Binkley and Harman. Thus, in different applications, the size of a typical slice may be very different.
- Harman and Binkley include all criteria. This will include those which tend towards degenerate criteria, such as backward slicing right at the start of a procedure, which may produce a very small slice. This might artificially reduce the average size of a slice for the sample of all possible criteria, compared to realistic choices of sets of criteria.
3Tutorial 6 - solutions
- 2. Why is an amorphous slice always as small as, and possibly smaller than, a syntax-preserving slice constructed for the same criterion?
- Sample Answer
- Amorphous slicing permits transformation but does not require it. All syntax-preserving slices are also amorphous slices (though the reverse is not true). Therefore, a valid (though silly) amorphous slicing algorithm would be to simply produce the syntax-preserving slice. Therefore an amorphous slice need be no larger than the corresponding syntax-preserving slice. However, as we have seen, the ability to use transformation can reduce the size of the slice, so amorphous slices may be smaller than their syntax-preserving counterparts.
4Tutorial 6 - solutions
- 3. Give two situations in which syntax-preserving slicing would be better than amorphous slicing and two in which amorphous slicing would be better than syntax-preserving slicing
- Sample Answer
- In debugging, we want to find a bug in a program, so it would not be helpful if the slice transformed the program: syntax-preserving slicing would be better.
- In measuring cohesion using the overlap of slices, we need to perform operations like union and intersection on slices, and this requires syntax-preservation to be meaningful: syntax-preserving slicing would be better.
- In testing we are concerned with generating test cases. If we are trying to cover some predicate-controlled branch, then we may as well have an amorphous slice, which will be smaller (especially if it is faster). The user does not need to see the slice, so it does not matter if the syntax is preserved: amorphous slicing will be better.
- In comprehension of an unfamiliar program, slicing can be used to split the program into smaller parts: the smaller the better. Since amorphous slices tend to be smaller than syntax-preserving slices, amorphous slicing will be better.
5Tutorial 6 - solutions
- 4. Consider the program below
    D = 2 * r
    FaceArea = Pi * r * r
    C = Pi * D
    SurfaceArea = 2 * FaceArea + C * h
    slice = SurfaceArea
- What is the smallest amorphous slice on the final value of the variable slice?
- It is
    slice = 2 * (r*r + h*r) * Pi
6Tutorial 6 - solutions
- The syntax-preserving slice for the same program
and criterion is the whole program
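As a quick sanity check on the answer to question 4, the sketch below compares the five-statement program with the one-line amorphous slice. It assumes the program reads D = 2r, FaceArea = Pi r r, C = Pi D, SurfaceArea = 2 FaceArea + C h (the operators are a reconstruction, and the function names are made up for illustration).

```python
import math

def surface_area_original(r, h):
    # Assumed reconstruction of the five statements on the slide.
    d = 2 * r
    face_area = math.pi * r * r
    c = math.pi * d
    surface_area = 2 * face_area + c * h
    slice_ = surface_area
    return slice_

def surface_area_amorphous(r, h):
    # The single-statement amorphous slice.
    return 2 * (r * r + h * r) * math.pi

# Both versions should agree (up to rounding) on any input.
for r, h in [(1.0, 2.0), (3.5, 0.25), (0.0, 7.0)]:
    assert math.isclose(surface_area_original(r, h),
                        surface_area_amorphous(r, h))
```

Agreement on a few inputs is only evidence, not a proof; the equivalence itself follows from the algebra 2 Pi r r + Pi (2r) h = 2 Pi (r r + h r).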
7Faults and Failures
- A fault is the group of incorrect statements in the program that causes a failure
- A failure is an external, incorrect behaviour of a program (i.e. an incorrect output or a runtime failure)
Faults generate Failures
8MT is based on two basic Assumptions
- The Competent Programmer Hypothesis: In general programmers are competent. That is, the programs they write are nearly correct. The program differs from a correct version in only a few small ways.
- The Coupling Effect Hypothesis: Large program faults, particularly those of a semantic nature, are coupled with smaller syntactic faults that can be detected with mutation testing (hypothesised 1978, supported empirically 1992, demonstrated theoretically 1995, but is it true?).
9What is Mutation Testing
- White-box, error-based testing technique
- Built-in adequacy criteria
- The quality of a test set is measured according to its effectiveness or ability to detect faults
- Tool support
- Goal is generating good sets of tests rather than finding faults
10The Idea Underpinning Mutation Testing
- Seeding the implementation with a fault (mutating the original program) by applying a mutation operator
- Then determine whether testing identifies this fault
- Different result
  - the fault introduced has been identified
  - If the test case distinguishes between the mutant and the original program, it is said to kill the mutant
- Same result
  - the mutant and the original program produce identical results
  - the mutant is still alive
11How it works (again)
Program P -> apply mutation operator -> Mutant P'
Test both P and P' using the same test case T
- If P and P' give different results, then mutant P' is killed by test case T, so T can detect the difference between the correct and buggy program
- If P and P' give the same results then either
  - Test case T is not good enough to detect the fault, so we need to come up with a better test
  - Or P and P' are equivalent programs: P' is an equivalent mutant, and no tests can be devised that can distinguish them
So this is mainly a way to judge the effectiveness of our test data, to improve them, and by doing that to detect faults (syntactic and semantic ones)
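The kill check described above can be sketched in a few lines. The program, the mutant, and the function names below are hypothetical, chosen only to make the comparison concrete.

```python
def original(a, b):
    return a + b          # program P (hypothetical)

def mutant(a, b):
    return a - b          # mutant P': '+' replaced by '-'

def kills(test_input, p, p_mutant):
    """A test case kills the mutant if P and P' give different results."""
    return p(*test_input) != p_mutant(*test_input)

print(kills((2, 3), original, mutant))  # True: 5 vs -1, mutant killed
print(kills((2, 0), original, mutant))  # False: 2 vs 2, mutant still alive
```

The second test case is the "not good enough" situation: the mutant is not equivalent, the test simply fails to expose the difference.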
12Test Data Effectiveness
- Mutation testing provides a way to judge the effectiveness of the test data
- The test set should kill all the mutants
- If not, then we can improve the test set
- Test generation may be based on mutation testing
- Tests are generated to kill the mutants
- In the same way mutants should be of high performance
- Difficult to be eliminated
13Mutation Score (MS)
- Mutation Score (Adequacy Score) of Program P and test set T is

$$MS(P,T) = \frac{\text{number of killed mutants}}{\text{number of non-equivalent mutants}}$$

where

$$\text{non-equivalent mutants} = \text{total mutants} - \text{equivalent mutants}$$

and $0 \le MS \le 1$, or equivalently $0\% \le MS \le 100\%$.
- Test data is mutation-adequate if its mutation score is 100% (in this case they kill all non-equivalent mutants)
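A minimal sketch of the mutation score calculation follows. The function name and the counts in the usage line are made up for illustration.

```python
def mutation_score(killed, total, equivalent):
    """MS = killed mutants / (total mutants - equivalent mutants)."""
    non_equivalent = total - equivalent
    if non_equivalent <= 0:
        raise ValueError("no non-equivalent mutants to score against")
    return killed / non_equivalent

# Illustrative numbers: 60 mutants, 10 judged equivalent, 45 killed.
print(mutation_score(killed=45, total=60, equivalent=10))  # 0.9
```

A score of 1.0 would mean the test set is mutation-adequate for this set of mutants.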
14Mutants
- Original Program P
- Mutant P' of P
- A program similar to P
- P' differs from P by a single mutation
- Each kind of mutation corresponds to a typical error programmers usually make
- Off-by-one, spelling, typos, etc.
- i.e. imagine that P' was identical to P except that exactly one operator was changed to a different one
15Mutation Operators (also called mutant operator, mutagenic operator, mutagen, mutation transformation, mutation rule)
- It is a rule that is applied to a program to create mutants
- Replace each operand by every other syntactically legal operand
- Modify expressions by replacing operators and inserting new operators
- Delete entire statements, etc.
- Categorised into mutation classes
- Statement, Operator, Variable, Constant, etc.
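To make the idea of a mutation operator concrete, here is a small sketch of an arithmetic operator replacement rule written with Python's standard `ast` module (Python 3.9 or later, for `ast.unparse`). It is an illustration only, not the operator set of Mothra or Proteum, and the function name is hypothetical.

```python
import ast

ARITHMETIC_OPS = (ast.Add, ast.Sub, ast.Mult, ast.Div)

def arithmetic_mutants(source):
    """Return the source of every mutant obtained by replacing exactly one
    binary operator occurrence with a different arithmetic operator."""
    mutants = []
    tree = ast.parse(source)
    n_sites = sum(isinstance(node, ast.BinOp) for node in ast.walk(tree))
    for site in range(n_sites):
        for new_op in ARITHMETIC_OPS:
            tree = ast.parse(source)          # fresh tree for each mutant
            binops = [n for n in ast.walk(tree) if isinstance(n, ast.BinOp)]
            target = binops[site]
            if isinstance(target.op, new_op):
                continue                      # same operator: not a mutant
            target.op = new_op()
            mutants.append(ast.unparse(tree))
    return mutants

print(arithmetic_mutants("c = a + b"))
# ['c = a - b', 'c = a * b', 'c = a / b']
```

Each mutant differs from the original by a single change, which is why the number of mutants grows with both program size and the number of operators applied.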
16Examples of Mutation Operators
Some Mothra Mutation Operators for Fortran (from
a total of 22 operators)
Proteum has 71 Mutation Operators for C, categorised as follows: Statement 15, Operator 46, Variable 7, Constant 3
17Creating a Mutant
Apply a mutation operator: change one operator (op) in a statement to another (op')
Original Program P:
.. .. ..  c = a op b  .. .. ..
Mutant P' of Original Program P:
.. .. ..  c = a op' b  .. .. ..
18Testing a Mutant
Original Program P:  .. c = a op b ..
Mutant P':  .. c = a op' b ..
Apply test case T to both P and P', giving result R for P and result R' for P'
If R ≠ R' then mutant P' is killed by test case T
If R = R' then improve test case T and test again
Loop: if we continue improving test case T and still get R = R', then P' is possibly an equivalent mutant
19An Example of a Mutant
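- Test case 1 (y = 3, z = 1) kills the mutant: the two programs give different results
x = 4
x = 3
- Test case 2 (y = 2, z = 2): the mutant is still live, because both programs give the same result
x = 4
x = 4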
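The two test cases can be replayed with a short sketch. The slide's program text did not survive, so the statements below (x = y + z for the original and x = y * z for the mutant) are an assumption, chosen because they reproduce the values shown: 4 and 3 for the first test, 4 and 4 for the second.

```python
def p(y, z):
    return y + z   # assumed original statement: x = y + z

def p_mutant(y, z):
    return y * z   # assumed mutant: '+' replaced by '*'

for y, z in [(3, 1), (2, 2)]:
    r, r_mut = p(y, z), p_mutant(y, z)
    status = "killed" if r != r_mut else "still live"
    print(f"y={y}, z={z}: P gives {r}, P' gives {r_mut} -> mutant {status}")
# y=3, z=1: P gives 4, P' gives 3 -> mutant killed
# y=2, z=2: P gives 4, P' gives 4 -> mutant still live
```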
20Equivalent Mutant Problem
- An equivalent mutant is syntactically different from the original program, but has the same behavior.
- The general problem of deciding whether a mutant is equivalent to the original program is theoretically undecidable.
- This is a hugely important obstacle which will need to be overcome to facilitate mutation testing in practice.
21Example of an Equivalent Mutant
- Original Program P: ... if (y == 2 && z == 2) x = y + z ...
x = 4
- Mutant P' of P: ... if (y == 2 && z == 2) x = y * z ...
x = 4
P' is an equivalent mutant of P because no possible test can ever kill this mutant. If the condition is true the mutated statement returns x = 4 for any possible test case, same as the original statement.
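A brute-force check over a small input range illustrates why this mutant cannot be killed. The guard and the two statements are an assumed reconstruction (y == 2 and z == 2, with x = y + z mutated to x = y * z); the default value of x is also made up.

```python
def p(y, z, x=0):
    if y == 2 and z == 2:
        x = y + z          # assumed original statement
    return x

def p_mutant(y, z, x=0):
    if y == 2 and z == 2:
        x = y * z          # assumed mutated statement
    return x

# No input in this grid distinguishes the two programs.
indistinguishable = all(p(y, z) == p_mutant(y, z)
                        for y in range(-5, 6) for z in range(-5, 6))
print(indistinguishable)  # True
```

Exhaustive testing over a finite grid is not a proof in general; here the argument is that the mutated statement only runs when y and z are both 2, where 2 + 2 and 2 * 2 coincide.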
22Equivalent Mutant Problem
- Determining whether a mutant is equivalent is not decidable
- How do we know whether a mutant which remains unkilled is simply hard to kill (stubborn) or equivalent?
- Could we avoid generating equivalent mutants?
23Large Number of Mutants
[Figure: the original program P surrounded by its mutants P1, P2, P3, P4, ..., Pn; each mutant differs from P at a single mutated location, marked M]
24Large Number of Mutants
- Even for simple programs n can be a very large number
- n depends on the size of P and on how many mutation operators we apply on P
- Reducing the number of mutants is the second problem that needs to be addressed
25Existing Methodologies to Reduce Large Number of
Mutants
- Selective Mutation
- Reduces the number of mutation operators applied
- Mutant Sampling
- Randomly selects a subset of mutants
The number of mutants under test is reduced
26Advances in Mutation Testing
- Reduction technique: Selective Mutation
- Approximation technique: Weak Mutation
- Algorithmic execution technique: Schema-based Mutation
- Heuristics for detecting equivalent mutants
- Algorithms for automatic test data generation
- Interface Mutation, Class Mutation
- Distribution of computational expense
- Avoiding human intensiveness
27Selective Mutation (a "do fewer" approach)
- Applying mutation with only the most critical mutation operators being used (key operators)
  - provide almost the same coverage as non-selective mutation
- Select only mutants that are truly distinct from other mutants
  - decreases the number of mutants produced
  - reduces computational cost significantly
- Getting as much testing strength as possible with as few mutants as possible
28Mutation Sampling (a "do fewer" approach)
- Sampling only a randomly selected subset of the mutants to run
- Using samples of some a priori fixed size
- Using samples without a priori fixed size
  - select mutants until sufficient evidence has been collected to determine that a statistically appropriate sample size has been reached
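Fixed-size mutant sampling can be sketched with the standard `random` module. The 10% fraction, the seed, and the mutant names are arbitrary choices for illustration.

```python
import random

def sample_mutants(mutants, fraction, seed=0):
    """Pick a random subset of the mutants of an a priori fixed size."""
    if not mutants:
        return []
    rng = random.Random(seed)                 # seeded for repeatable runs
    k = min(len(mutants), max(1, round(len(mutants) * fraction)))
    return rng.sample(mutants, k)

all_mutants = [f"mutant_{i}" for i in range(200)]
subset = sample_mutants(all_mutants, fraction=0.10)
print(len(subset))  # 20
```

Only the sampled mutants are then executed against the test set, which cuts the cost roughly in proportion to the sampling fraction.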
29Weak Mutation (a "do smarter" approach)
- An approximation technique that compares the internal states of the mutant and the original program immediately after execution of the mutated portion of the program
- Reduces the computational cost, but do we really get what we want?
30Weak Mutation Example
Strong mutation:  x x y ; x z ; Print (x)
Weak mutation:  x x y ; /* inspect x */ x z ; Print (x)
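The difference between inspecting the internal state and inspecting the final output can be sketched as follows. The statements are hypothetical (x = x + y mutated to x = x - y, followed by x = x * z), since the operators on the slide were lost.

```python
def run(x, y, z, mutated):
    x = x - y if mutated else x + y   # the mutated statement
    state_after = x                   # weak mutation inspects x here
    x = x * z                         # later computation may mask the change
    return state_after, x             # (internal state, final output)

state_p, out_p = run(1, 2, 0, mutated=False)
state_m, out_m = run(1, 2, 0, mutated=True)
print(state_p != state_m)  # True: the mutant is weakly killed
print(out_p != out_m)      # False: the same test does not strongly kill it
```

This is one reading of the slide's question "do we really get what we want?": a test that distinguishes the internal states need not distinguish the observable outputs.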
31Using Distributed Computational Resources (a "do smarter" approach)
- Using novel computer architectures to distribute
the computational expense over several machines
32Using Intelligent Algorithms (a "do smarter" approach)
- By intelligently storing state information, this technique factors the expense of running a mutant over several related mutant executions and thereby lowers the total computational cost
33Schema-based Mutation (a "do faster" approach)
- Not mutating an intermediate form
- The Mutant Schema Generation (MSG) method encodes all mutations into one source-level program, a metamutant
- This program is compiled (once), with the same compiler used during development, and is executed in the same operational environment at compiled-program speeds
34Example
x x y
Switch (N)
  Case 1: x x y
  Case 2: x x / y
  Case 3: x x y
  Case 4: x z y
  Case 5: ...
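A metamutant can be sketched as one function that carries every mutation and selects among them at run time. The original statement and the particular mutants below are assumptions made for illustration; the real MSG method generates such a program at source level for the language under test.

```python
def metamutant(x, y, z, n):
    """n = 0 runs the (assumed) original statement; n >= 1 selects a mutant."""
    if n == 0:
        return x + y      # original statement
    if n == 1:
        return x - y      # operator replacement
    if n == 2:
        return x / y      # operator replacement
    if n == 3:
        return x * y      # operator replacement
    if n == 4:
        return z + y      # operand replacement: x replaced by z
    raise ValueError(f"unknown mutant id {n}")

print([metamutant(6, 3, 10, n) for n in range(5)])  # [9, 3, 2.0, 18, 13]
```

Because all mutants live in one program, it is built once and each mutant run only changes the selector n.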
35Mutation Testing Tools
- Mothra (for Fortran 77)
  - Downloadable: http://www.isse.gmu.edu/ofut/rsrch/mut.html
  - For UNIX systems
  - 22 mutation operators
  - Interpretive approach
- Proteum: PROgram TEsting Using Mutants (for C)
  - Proteum/IM, Proteum/IM 2.0, Proteum/FSM, Proteum/ST, Proteum/PN
  - Downloadable?
  - For UNIX systems
  - 71 operators
  - Separate compilation approach
- Jester: JUnit test tester (for Java)
  - Downloadable: http://jester.sourceforge.net/
- Insure (for C)
  - Commercial product
36Mothra
- Mothra is a suite of tools for performing mutation testing for Fortran 77
- Interpretive execution
- Mutgen for generating mutants
- A testing harness for running a test on a set of mutant programs and recording the results
- Godzilla for automatically generating test cases
37Using Mothra
- Select and generate a set of mutants
- Generate an initial set of test cases and the corresponding outputs that they generate
- Confirm the outputs are correct
- Repeat until all mutants are killed
  - Run the mutants on the test sets
  - Identify equivalent mutants
  - Generate and confirm new tests
- When you are done, you have an adequate suite of tests
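The iterative process above can be sketched generically. This is not Mothra's interface; the program, the mutants, and the function name are hypothetical, and the original program's output stands in for the confirmed-correct output.

```python
def analyse(original, mutants, tests):
    """Return the names of mutants left alive by the given tests."""
    alive = set(mutants)
    for t in tests:
        expected = original(*t)               # confirmed-correct output
        alive = {name for name in alive if mutants[name](*t) == expected}
    return alive

original = lambda a, b: a + b
mutants = {
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
    "first_operand": lambda a, b: a,
}

tests = [(2, 2)]
print(sorted(analyse(original, mutants, tests)))  # ['mul']

tests.append((3, 1))                              # new test aimed at 'mul'
print(sorted(analyse(original, mutants, tests)))  # []
```

The loop ends when the set of live mutants is empty, or when every survivor has been judged equivalent by hand.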
38Proteum Family Tools
- Proteum is a suite of tools for performing mutation testing for C programs
- Unit testing (Proteum), Integration testing (Proteum/IM), both (Proteum/IM 2.0), Finite State Machines (Proteum/FSM), Statecharts specifications (Proteum/ST), Petri Nets specifications (Proteum/PN)
- Test case handling (execution, inclusion/exclusion, etc.), mutant handling (creation, selection, execution, analysis), adequacy analysis (mutation score and reports)
- Allows separate compilation: each mutant is individually created, compiled, linked, and run
- This approach can be significantly faster (15-20 times) than an interpretive system if mutant run times greatly exceed individual compilation/link times; otherwise a compilation bottleneck may result
39Fundamental premise of Mutation Testing
- In practice, if the software contains a fault,
there will usually be a set of mutants that can
only be killed by a test case that also detects
the fault.
40Future Testing Systems
- Programmer submits a program unit
- System replies with a set of input/output pairs
that are guaranteed to form an effective test of
the unit by being close to mutation adequate
41Current Research at King's
- Kostas Adamopoulos is a PhD student working on Mutation Testing
- We are looking at search as a way of attacking the twin problems of number of mutants and equivalent mutants.
- This part of the lecture is not examinable. Feel free to leave now (quietly) if you are not interested.
- Make sure you come back for the tutorial though
- Ok, for the two of you left, come to the front
42Co-evolution?
- GA for Mutants
  - Fitness is measured according to the ability to avoid being killed
  - If this ability is too high then penalize the fitness of this mutant, because it is probably an equivalent one
- GA for Test Cases
  - Fitness is measured according to the ability to kill mutants
Two competitive populations. Can this lead to Co-evolution?
43Co-evolution
- Fitness of each individual of one population is re-evaluated with respect to the other population
- Achieves selective mutation
  - Mutation operators not selected a priori
  - Individual mutants selected
  - Tailored to the specific program under test, based upon their fitness
- Guarantees non-equivalent mutants
- Stubborn mutants might also be eliminated (?)
- The robustness of the algorithm probably will rediscover eliminated stubborn mutants (?)
44Work in Progress and Future Work
- Mutation tool GAs for Co-evolution
- Comparison of real results and simulation
- Comparative analysis with other methodologies
(selective mutation, mutant sampling)
45High-order Mutants
- This methodology could be used to check the validity of the Coupling Effect Hypothesis
- Large program faults, particularly those of a semantic nature, are coupled with smaller syntactic faults that can be detected with mutation testing
- Will an effective test set for simple, first-order mutants be at the same level of effectiveness for more complex, high-order mutants?
46Mutation testing Tutorial
- 1. What is an equivalent mutant and why are equivalent mutants a problem?
- 2. For the program fragment x xy give five examples of mutants which are equivalent and five which are not equivalent
- 3. Give examples of test cases which kill your five non-equivalent mutants
- 4. Now make up some simple program fragments and try to think of some more stubborn mutants of these fragments. That is, mutants which are hard to kill but which are not equivalent.