Title: Data Mutation Testing
1. Data Mutation Testing
   -- A Method for Automated Generation of Structurally Complex Test Cases
Hong Zhu, Dept. of Computing and Electronics, Oxford Brookes Univ., Oxford OX33 1HX, UK
Email: hzhu_at_brookes.ac.uk
2. Outline
- Motivation
- Overview of existing work on software test case generation
- The challenges to software testing
- The Data Mutation Testing Method
- Basic ideas
- Process
- Measurements
- A Case study
- Subject software under test
- The mutation operators
- Experiment process
- Main results
- Perspectives and future work
- Potential applications
- Integration with other black box testing methods
3. Motivation
- Test case generation
- Needs to meet multiple goals
- Reality: to represent the real operation of the system
- Coverage: of functions, program code, the input/output data space, and their combinations
- Efficiency: no overkill, easy to execute, etc.
- Effectiveness: capable of detecting faults, which implies the correctness of the program's output is easy to check
- External usefulness: helps with debugging, reliability estimation, etc.
- Has a huge impact on test effectiveness and efficiency
- Is one of the most labour-intensive tasks in practice
4. Existing Work
- Program-based test case generation
- Static: analysis of code without execution, e.g. symbolic execution
- Path oriented: Howden, W. E. (1975, 1977, 1978); Ramamoorthy, C., Ho, S. and Chen, W. (1976); King, J. (1975); Clarke, L. (1976); Xie, T., Marinov, D. and Notkin, D. (2004); Zhang, J. (2004); Xu, Z. and Zhang, J. (2006)
- Goal oriented: DeMillo, R. A., Guindi, D. S., McCracken, W. M., Offutt, A. J. and King, K. N. (1988); Pargas, R. P., Harrold, M. J. and Peck, R. R. (1999); Gupta, N., Mathur, A. P. and Soffa, M. L. (2000)
- Dynamic: through execution of the program
- Korel, B. (1990); Beydeda, S. and Gruhn, V. (2003)
- Hybrid: combination of dynamic execution with symbolic execution, e.g. concolic techniques
- Godefroid, P., Klarlund, N. and Sen, K. (2005)
- Techniques
- Constraint solvers; heuristic search, e.g. genetic algorithms
- McMinn, P. and Holcombe, M. (2003); survey: McMinn, P. (2004)
5. Specification-based test case generation
- Derive test cases from either formal or semi-formal specifications of the required functions and/or the designs
- Formal specification-based
- First-order logic, Z specifications and logic programs: Tai, K.-C. (1993); Stocks, P. A. and Carrington, D. A. (1993); Ammann, P. and Offutt, J. (1994); Denney, R. (1991)
- Algebraic specifications: Bouge, L., Choquet, N., Fribourg, L. and Gaudel, M.-C. (1986); Doong, R. K. and Frankl, P. G. (1994); Chen, H. Y., Tse, T. H. and Chen, T. Y. (2001); Zhu (2007)
- Finite state machines: Fujiwara, S., et al. (1991); Lee, D. and Yannakakis, M. (1996); Hierons, R. M. (2001); Zhu, H., Jin, L. and Diaper, D. (1999)
- Petri nets: Morasca, S. and Pezze, M. (eds) (1990); Zhu, H. and He, X. (2002)
- Model-based: derive from semi-formal graphic models
- SSADM models: Zhu, H., Jin, L. and Diaper, D. (1999, 2001)
- UML models: Offutt, J. and Abdurazik, A. (2000); Tahat, L. H., et al. (2001); Hartman, A. and Nagin, K. (2004); Li, S., Wang, J. and Qi, Z.-C. (2004)
- Techniques
- Constraint solving; theorem provers; model checkers
6. Random testing
- Random sampling over the input domain based on probabilistic models of the operation of the software under test
- Profile-based: sampling at random over an existing operational profile
- Stochastic-model-based: use a probabilistic model of software usage
- Markov chains: Avritzer, A. and Larson, B. (1993); Avritzer, A. and Weyuker, E. J. (1994); Whittaker, J. A. and Poore, J. H. (1993); Guen, H. L., Marie, R. and Thelin, T. (2004); Prowell, S. J. (2005)
- Stochastic automata networks: Farina, A. G., Fernandes, P. and Oliveira, F. M. (2002, 2004)
- Bayesian networks: Fine, S. and Ziv, A. (2003)
- Adaptive random testing (ART): even spread of random test cases (Chen, T. Y., Leung, H. and Mak, I. K. (2004))
- Variants: Mirror, Restricted, and Probabilistic ART
7. Domain-specific techniques
- Database applications
- Zhang, J., Xu, C. and Cheung, S. C. (2001)
- Spreadsheets
- Fisher, M., Cao, M., Rothermel, G., Cook, C. and Burnett, M. (2002)
- Erwig, M., Abraham, R., Cooperstein, I. and Kollmansberger, S. (2005)
- XML Schema
- Lee, S. C. and Offutt, J. (2001); Li, J. B. and Miller, J. (2005)
- Compilers
- See Boujarwah, A. S. and Saleh, K. (1997) for a survey.
8. The Challenge
- How can we generate adequate test cases of high reality for programs that process structurally complex inputs?
- Structural complexity
- A large number of elements
- A large number of possible, explicitly represented relationships between the elements
- A large number of constraints imposed on the relationships
- The meaning of the data depends not only on the values of the elements, but also on the relationships, and so does their processing
- Reality
- Likely to be, or close to, a correct real input in the operation of the system
- Likely to be, or close to, an input containing the kinds of errors that users make when using the system
- Examples
- CAD tools, word processors, web browsers, spreadsheets, PowerPoint, software modelling tools, language processors, theorem provers, model checkers, speech recognition, handwriting recognition, search engines, ...
9. Basic Ideas of Data Mutation Testing
- Prepare the seeds, i.e. a small set of test cases that
- contain various types of elements and relationships between them,
- are very close to real input data,
- are easy to check for correctness.
- Generate mutant test cases by modifying the seeds slightly
- Preserve the validity of the input
- Change one place at a time unless the constraints force further changes (second- or even higher-order mutants may be used)
- Make as many different mutants as possible
- Execute the software under test on both the seeds and their mutants
- What to observe:
- the program's correctness on both seeds and mutants
- the differences between the program's behaviours on a seed and on its mutants
- Use metrics and measurements to judge whether
- the seeds are sufficient
- the mutations are effective and/or sufficient
- Feed back to steps 1 and 2 if necessary, or improve the observation.
10. Illustrative Example
- Triangle classification
- Input: x, y, z (natural numbers, the lengths of the sides)
- Output: equilateral, isosceles, scalene, or non-triangle
- Seeds: a table of four seed test cases, giving the lengths of the sides and the type of triangle for each
11. Mutation operators
- IVP: increase the value of a parameter by 1
- DVP: decrease the value of a parameter by 1
- SPL: set the value of a parameter to a very large number, say 1000000
- SPZ: set the value of a parameter to 0
- SPN: set the value of a parameter to a negative number, say -2
- WXY: swap the values of parameters x and y
- WXZ: swap the values of parameters x and z
- WYZ: swap the values of parameters y and z
- RPL: rotate the values of the parameters to the left
- RPR: rotate the values of the parameters to the right
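These operators can be sketched directly as small functions over the input tuple (x, y, z). The following Python sketch is illustrative only: the seed values and all function names are assumptions made here, not part of the talk or any tool.

```python
# Illustrative sketch of the Triangle Classification mutation operators.
LARGE = 1_000_000   # "a very large number" used by SPL
NEGATIVE = -2       # "a negative number" used by SPN

# Parameter-wise operators: each applies to one parameter i in {0, 1, 2}.
def ivp(t, i):  # increase the value of a parameter by 1
    s = list(t); s[i] += 1; return tuple(s)

def dvp(t, i):  # decrease the value of a parameter by 1
    s = list(t); s[i] -= 1; return tuple(s)

def spl(t, i):  # set the value of a parameter to a very large number
    s = list(t); s[i] = LARGE; return tuple(s)

def spz(t, i):  # set the value of a parameter to 0
    s = list(t); s[i] = 0; return tuple(s)

def spn(t, i):  # set the value of a parameter to a negative number
    s = list(t); s[i] = NEGATIVE; return tuple(s)

# Whole-tuple operators.
def wxy(t): x, y, z = t; return (y, x, z)   # swap x and y
def wxz(t): x, y, z = t; return (z, y, x)   # swap x and z
def wyz(t): x, y, z = t; return (x, z, y)   # swap y and z
def rpl(t): x, y, z = t; return (y, z, x)   # rotate towards the left
def rpr(t): x, y, z = t; return (z, x, y)   # rotate towards the right

def mutants_of(seed):
    """First-order mutants of one seed: 5 operators x 3 parameters + 5 = 20."""
    ms = [op(seed, i) for op in (ivp, dvp, spl, spz, spn) for i in range(3)]
    ms += [op(seed) for op in (wxy, wxz, wyz, rpl, rpr)]
    return ms

# Hypothetical seeds: equilateral, isosceles, scalene, non-triangle.
seeds = [(5, 5, 5), (5, 5, 7), (4, 5, 6), (1, 2, 5)]
all_mutants = [m for s in seeds for m in mutants_of(s)]
print(len(all_mutants))   # (5 * 3 + 5) * 4 = 80, as on the next slide
```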
12. Generation of mutant test cases
- For example, by applying the mutation operator IVP to test case t1 on parameter x, we obtain the following test case t5:
- IVP(t1, x) = t5: Input (x=6, y=5, z=5)
- Total number of mutants: (5 × 3 + 5) × 4 = 80 (5 parameter-wise operators on 3 parameters, plus 5 value-exchange operators, for each of the 4 seeds)
- Covering all sorts of combinations of data elements
- Systematically produced from the four seeds
13. Execution of the program and classification of mutants
- A mutant is classified as dead if the execution of the software under test on the mutant differs from the execution on the seed test case; otherwise, the mutant is classified as alive.
- For example, for a correctly implemented Triangle Classification program, the execution on the mutant test case t5 outputs isosceles while the execution on its seed t1 outputs equilateral.
- TrC(t5) ≠ TrC(t1), therefore t5 is dead.
- Whether such a difference is detected depends on how you observe the behaviour!
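Under a black-box observation, "different execution" reduces to comparing the observable output on a mutant with the output on its seed. A minimal sketch, assuming a correct stand-in implementation of the Triangle Classification program (all names here are hypothetical):

```python
def classify_triangle(t):
    """Stand-in for the program under test TrC; assumed correct here."""
    x, y, z = sorted(t)
    if x <= 0 or x + y <= z:
        return "non-triangle"
    if x == y == z:
        return "equilateral"
    if x == y or y == z:
        return "isosceles"
    return "scalene"

def is_dead(program, seed, mutant):
    """Dead = the observed output on the mutant differs from that on the seed."""
    return program(mutant) != program(seed)

t1 = (5, 5, 5)              # seed: equilateral
t5 = (6, 5, 5)              # t5 = IVP(t1, x)
print(is_dead(classify_triangle, t1, t5))   # True: isosceles vs equilateral
```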
14. Analyse test effectiveness
- Reasons why a mutant can remain alive:
- The mutant is equivalent to the original with respect to the functionality or property of the software under test, e.g. RPL(t1) = t1.
- The observation of the behaviour and output of the software under test is not sufficient to detect the difference. For example, RPL(t2) = t6: Input (x=5, y=7, z=5) produces the same output as t2 for a correct program, but via a different execution path.
- The software is incorrectly designed and/or implemented, so that it cannot differentiate the mutant from the original.
15. Measurements of Data Mutation
- Equivalent mutant score: EMS = (number of equivalent mutants) / (total number of mutants)
- A high EMS indicates that the mutation operators have not been well designed to achieve variety in the test cases.
- Live mutant score: LMS = (number of live mutants) / (total number of mutants)
- A high LMS indicates that the observation of the behaviour and output of the software under test is insufficient.
- Typed live mutant score: LMS_F = (number of live mutants of type F) / (number of mutants of type F), where F is a type of mutation operator
- A high LMS_F reveals that the program is not sensitive to that type of mutation, probably because of a fault in the design or implementation.
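These scores are simple ratios over the classified mutants, so they can be computed mechanically. A sketch under the assumption that each mutant has been classified as dead, live, or (a special case of live) equivalent, and tagged with its operator type; the data here is invented for illustration:

```python
from collections import Counter

def mutation_scores(results):
    """results: list of (operator_type, status) pairs, where status is
    'dead', 'live' or 'equivalent' (equivalent mutants are live mutants
    that can never be killed; they are counted separately here).
    Returns EMS, LMS and the typed scores LMS_F for each operator type F."""
    total = len(results)
    ems = sum(1 for _, s in results if s == "equivalent") / total
    lms = sum(1 for _, s in results if s != "dead") / total
    per_type = Counter(op for op, _ in results)
    live_per_type = Counter(op for op, s in results if s != "dead")
    lms_f = {op: live_per_type[op] / n for op, n in per_type.items()}
    return ems, lms, lms_f

# Hypothetical classification of a handful of mutants:
results = [("IVP", "dead"), ("IVP", "dead"), ("RPL", "equivalent"),
           ("RPL", "live"), ("WXY", "dead"), ("WXY", "live")]
ems, lms, lms_f = mutation_scores(results)
print(ems, lms, lms_f["RPL"])   # 0.166..., 0.5, 1.0
```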
16. Process of Data Mutation Testing
17. Analysis of Program Correctness
- Can data mutation testing help with the analysis of program correctness?
- Consider the examples in Triangle Classification:
- Applying IVP or DVP to test case t1, we can expect the output to be isosceles.
- For the RPL, RPR, WXY, WXZ and WYZ mutation operators, we can expect the program to output the same classification on a seed and on its mutant test cases.
- If the software's behaviour on a mutant is not as expected, an error in the software under test has been detected.
18. Case Study
- The subject
- CAMLE: Caste-centric Agent-oriented Modelling Language and Environment
- An automated modelling tool for an agent-oriented methodology
- Developed at NUDT, China
- Potential threats to the validity of the case study
- The subject was developed by the tester
- The developer is not a professional software developer
- Validation of the case study against the potential threats
- The test method is black-box testing, so knowledge of the code and program structure should not affect the outcomes.
- The subject was developed before the case study, and no change at all was made during the study to enable it to be carried out.
- In software testing practice, systems are often tested by their developers.
- The developer is a capable master's student with training at least equivalent to that of an average programmer.
- The correctness of the program's output can be judged objectively.
19. Complexity of the Input Data
- Inputs are models in the CAMLE language
- Multiple views:
- a caste diagram that describes the static structure of a multi-agent system,
- a set of collaboration diagrams that describe how agents collaborate with each other,
- a set of scenario diagrams that describe typical scenarios, namely situations in the operation of the system, and
- a set of behaviour diagrams that define the behaviour rules of the agents in the context of various scenarios.
- Well-formedness constraints
- Each diagram has a number of different types of nodes and arcs, etc.
- Each diagram, and the model as a whole, must satisfy a set of well-formedness conditions to be considered a valid input (e.g. the types of nodes and arcs must match each other)
20. The Function to Be Tested
- The consistency checker
- Consistency constraints are formally defined in first-order logic
- Potential threat to validity
- The program may not be representative.
- Validation of the case study
- The program's input is structurally complex
- The program is non-trivial
21. Types of Data Mutation Operators
22. Types of Data Mutation Operators (continued)
13 Rename env node: rename an existing environment node in a sub-collaboration diagram
14 Delete node annotation: remove an annotation on an existing node
15 Replicate edge: replicate an existing non-interaction edge
16 Delete edge: delete an existing edge in a diagram
17 Change edge association: change the start or end node of an existing edge
18 Change edge direction: reverse the direction of an existing edge
19 Change edge type: replace an existing edge in a diagram with a new edge of another type
20 Replicate interaction edge: replicate an existing interaction edge without its action list
21 Replicate interaction: replicate an existing interaction edge with its action list
22 Change edge annotation: change the action list annotated on an existing interaction edge
23 Delete edge annotation: delete the action list of an existing interaction edge
24 Change edge end to env: change the start or end node of an existing edge to an environment node
23. The Seed Test Cases
- Models developed in previous case studies of the agent-oriented software development methodology:
- The evolutionary multi-agent Internet information retrieval system Amalthaea (originally developed at the MIT Media Lab)
- An online auction web service
- The agent-oriented model of the United Nations Security Council: its organisational structure and the procedure for passing resolutions
- All seeds passed the consistency check before the case study started
- No change was made to these seeds during the case study
24. The Seed Test Cases and Their Mutants
25. The Results: Fault-Detecting Ability

Fault type | Indigenous | Inserted | Detected by seeds | Detected by mutants
Domain: missing path | 2 | 12 | 5 (42%) | 12 (100%)
Domain: path selection | 2 | 17 | 8 (47%) | 17 (100%)
Computation: incorrect variable | 0 | 24 | 14 (58%) | 21 (88%)
Computation: omission of statements | 0 | 31 | 13 (42%) | 31 (100%)
Computation: incorrect expression | 1 | 15 | 9 (60%) | 14 (93%)
Computation: transposition of statements | 0 | 19 | 12 (63%) | 19 (100%)
Total | 5 | 118 | 61 (52%) | 114 (97%)

(Percentages are relative to the number of inserted faults of that type.)
26. Detecting Design Errors
- In the case study, we found that a large number of mutants remained alive
(Table: the numbers of alive and dead mutants)
- Review: three possible reasons
- improper design of the data mutation operators,
- insufficient observation of the behaviour and output,
- defects in the software under test.
27. Statistics on the Amalthaea Test Suite
- Some typed mutation scores are very low
- The design of the consistency checker has errors! In particular, the consistency constraints are weak.
28. Results: Detecting Design Errors
- Hypothesis
- The design of the tool is weak in detecting certain types of inconsistency or incompleteness
- Validation of the hypothesis
- Strengthen the well-formedness constraints
- Strengthen the consistency constraints: 3 constraints modified
- Introduce new completeness constraints: 13 new constraints introduced
- Test again using the same seeds and the same mutation operators
- A significant change in the statistics is observed.
29. Test Adequacy
- Our experiments show that high test adequacy can be achieved through data mutation.
- Coverage of the input data space
- Measured by the coverage of various kinds of mutants
- Coverage of program structure
- Measured by code coverage (here, the branches covered)
- Coverage of the functions of the requirements
- Measured by the consistency constraints exercised during checking
- Two factors determine the test adequacy:
- the seeds
- the mutation operators
30. Coverage of Scenario Diagram Variants
31. Coverage of Program Structure and Functions
- The test data achieved 100% coverage of the functions of the consistency checker and 100% of the branches in the code.
32. Test Cost
Table: Summary of the test cost spent in the case study

Source of cost | Amount in the case study
Design and implementation of data mutation operators | 1.5 man-months
Development of seed test cases | 0 man-months
Analysis of program correctness on each test case | 2 man-months (estimated)

The seeds were readily available from previous case studies of the tool.
33. Analysing the Program's Correctness
- The experiment took a black-box approach
- The output on a test case consists of:
- whether the input (a model) is consistent and complete
- the error message(s) and/or warning message(s), if any
- The expected output on a mutant is specified
34. Experiments
- The experiments
- Mutants are selected at random
- The program's correctness on each mutant is checked manually
- The time needed to check the correctness of the program on each test case is measured
- Two experiments were conducted
- Experiment 1
- 1 mutant selected at random from each set of mutants generated by one type of mutation operator (24 mutants in total)
- Detected 2 faults in the checker and 1 fault in other parts of the tool
- Experiment 2
- 22 live mutants from the Amalthaea suite selected at random
- Detected 2 faults in other parts of the tool
35. The Experiment Data
- Results
- Checking correctness on dead mutants: 3 minutes per mutant
- Checking correctness on live mutants: 1 minute per mutant
36. Related Work
- Mutation testing
- The program or specification is modified
- Used as a criterion to measure test adequacy
- Data mutation testing adopts the idea of mutation operators, but applies them to test cases to generate test cases, rather than to measure adequacy.
- Meek and Siu (1989)
- Randomised error seeding into programs to test a compiler
- Adaptive random testing (Chen, et al. 2003, 2004)
- Random test cases spread as far apart as possible
- Not yet applied to structurally complex input spaces
- Data perturbation testing (Offutt, 2001)
- Tests XML messages for web services
- An application-specific technique, applicable to XML files
- Metamorphic testing (Chen, Tse, et al. 2003)
- A test oracle automation technique; focuses on the metamorphic relations rather than on generating test cases
- Could be integrated with the data mutation method
37. Future Work
- More case studies with potential applications
- Security control software: Role-Based Access Control
- Input: a role model and user assignments (a possible encoding is sketched after this list)
- Role model: <Roles, Resources, Permissions, Role → Resources, Constraints ⊆ Roles × Resources × Permissions>
- User assignments: Users → P(Roles)
- Virus detection
- Input: files infected by a virus
- Viruses are programs in assembly/binary code format
- One virus may have many variants obtained by equivalence-preserving transformations of the code.
- Spreadsheet processing software and spreadsheet applications
- Input: spreadsheets <data cells, program cells>
38. Perspectives and Future Work
- Integration of the data mutation testing, metamorphic testing and algebraic testing methods
- Let P be the program under test.
- Data mutation testing generates test cases using a set of data mutation operators.
- Metamorphic testing uses a set of metamorphic relations to check output correctness.
- Each data mutation operator can be used to define a metamorphic relation, as the following example shows.
39. Example
- Consider the Triangle Classification program P
- The following is a metamorphic relation:
- P(t) = equilateral  =>  P(IVP(t)) = isosceles
- For each of the data mutation operators f in {WXY, WXZ, WYZ, RPL, RPR}, the following is a metamorphic relation:
- P(f(t)) = P(t)
- We observed in the case study that data mutation operators are very helpful for finding metamorphic relations.
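Relations of this kind can be checked mechanically by running the program on a seed and on its mutants. A minimal sketch, reusing the hypothetical operator and classifier functions from the earlier Triangle Classification sketches:

```python
def check_permutation_relation(program, t, ops=(wxy, wxz, wyz, rpl, rpr)):
    """P(f(t)) = P(t): permuting the side lengths must not change the class."""
    return all(program(op(t)) == program(t) for op in ops)

def check_ivp_relation(program, t):
    """P(t) = equilateral  =>  P(IVP(t)) = isosceles, for every parameter."""
    if program(t) != "equilateral":
        return True   # the premise does not hold, so the relation is satisfied
    return all(program(ivp(t, i)) == "isosceles" for i in range(3))

for seed in seeds:
    assert check_permutation_relation(classify_triangle, seed)
    assert check_ivp_relation(classify_triangle, seed)
```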
40. Integration with Algebraic Testing
- In algebraic software testing, axioms are written in the form
- T1 = T1' ∧ T2 = T2' ∧ ... ∧ Tn = Tn'  =>  T = T',
- where the Ti, Ti' are terms constructed from variables and the functions/procedures/methods of the program under test.
- Integrate data mutation testing, metamorphic testing and algebraic testing by developing
- a black-box software testing specification language
- an automated tool to check metamorphic relations
- using observation contexts to check whether a relation holds
- allowing user-defined data mutation operators to be invoked
- allowing metamorphic relations to be specified
41. Screen Snapshot of the Algebraic Testing Tool CASCAT
42. References
- Shan, L. and Zhu, H., Generating Structurally Complex Test Cases by Data Mutation: A Case Study of Testing an Automated Modelling Tool, Special Issue on Automation of Software Test, The Computer Journal (in press).
- Shan, L. and Zhu, H., Testing Software Modelling Tools Using Data Mutation, Proc. of AST'06, ACM Press, 2006, pp. 43-49.
- Zhu, H. and Shan, L., Caste-Centric Modelling of Multi-Agent Systems: The CAMLE Modelling Language and Automated Tools, in Beydeda, S. and Gruhn, V. (eds), Model-driven Software Development, Research and Practice in Software Engineering, Vol. II, Springer, 2005, pp. 57-89.
- Kong, L., Zhu, H. and Zhou, B., Automated Testing of EJB Components Based on Algebraic Specifications, Proc. of TEST'07, IEEE CS Press, 2007.