Title: Survey on Automatic Test Data Generation
2002. 4. 16 Han, Seung-hee
2. Contents
- Introduction
- Automation of testing
- Problems of test data generation
- Automatic test data generation techniques
- Conclusion
- Future challenges
3. Introduction
- Software testing accounts for 50% of the total cost of software development
- This cost could be reduced if the testing process were automated
- Testing techniques
- Functional testing
- Structural testing
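4. Testing Process
[Figure: testing process. Test case design derives test cases from the specification or the software according to the selection criteria; the test cases are run on the software under test, and test oracles turn the results into test verdicts.]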
5. Automation of Testing
- Test data generation tasks
- Mechanical tasks: running tests, monitoring for coverage analysis
- Administrative tasks: recording test outcomes, test report generation
6. Definition of Test Data Generation
- Process of identifying program inputs which satisfy the selected testing criterion
- Given a program $P$ and a path $u$, generate an input $x \in S$ so that $x$ traverses $u$
- $P : S \to R$
- $S$: the set of all possible inputs, i.e. the set of all vectors $x = (d_1, d_2, \ldots, d_n)$ such that $d_i \in D_{x_i}$
- $D_{x_i}$: the domain of input variable $x_i$
- $R$: the set of all possible outputs
- 1) Find the path predicate for the path
- 2) Solve the path predicate in terms of input variables
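7. Process of Test Data Generation
- Select a path for the testing criterion
- Extract the constraints (a path condition) of the path
- Solve the constraints to obtain test data (x = ?, y = ?)
[Figure: control flow graph of the example program, nodes 1 to 7; the branches at nodes 2 and 5 have T and F edges]
1: x = y * 2
2: if (x > 10)
3: x = x + 10
4: y = y + 1
5: if (x + y <= 110)
6: printf("OK")
7: (exit node)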
8. Complexity of Test Data Generation
- The problem of determining whether a solution exists to a system of inequalities is, in general, undecidable
- Path feasibility problem is undecidable
- Most work has been performed on toy programs, i.e. small and less complex programs
9. Problems of Test Data Generation
- Arrays, Pointers
- Ambiguity
- Complex-heap
- Objects
- Dynamically allocated
- Inheritance, Polymorphism
- Loops
- Not having a constant number of iterations
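Array ambiguity example:
input(i, j); a[j] = 2; a[i] = 0; a[j] = a[j] + 1;
if (a[j] == 3) ...
If i = j, then a[j] = 1; if i ≠ j, then a[j] = 3.
Pointer ambiguity example:
*x = 20; *y = 50; c = *x;
If x ≠ y, then c = 20; if x = y, then c = 50.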
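The array ambiguity can be made concrete with a minimal sketch. It assumes the fragment reads a[j] = 2; a[i] = 0; a[j] = a[j] + 1, followed by the test a[j] == 3. The function name `branch_taken` and the array size are hypothetical, chosen only for illustration.

```python
def branch_taken(i, j, size=4):
    """Run the fragment on a fresh array and report whether a[j] == 3 holds."""
    a = [0] * size
    a[j] = 2
    a[i] = 0          # overwrites a[j] when i == j (the two indices alias)
    a[j] = a[j] + 1
    return a[j] == 3

# The branch outcome depends only on whether the two index inputs alias.
outcomes = {(i, j): branch_taken(i, j) for i in range(4) for j in range(4)}
```

A generator that treats a[i] and a[j] as independent variables cannot tell which of the two outcomes applies, which is why array indices make constraint extraction ambiguous.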
10. Problems of Test Data Generation (cont.)
- Modules
- Source code is not accessible
- Infeasible paths
- Solving an arbitrary system of equations is undecidable
- Oracle
- The only way of achieving an oracle is to supply extra information
- Requirement/design spec, assertion
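11. Architecture of Test Data Generator
[Figure: architecture. The Program Analyzer passes the control flow graph to the Path Selector, and the control flow graph plus the data dependence graph to the Test Data Generator. The Path Selector passes test paths to the Test Data Generator, which outputs test data and returns path info concerning infeasible paths to the Path Selector.]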
12. Automatic Test Data Generation Techniques
- Structural testing (code-based)
- Functional testing (spec-based)
13. Random Test Data Generation
[Figure: the test data generator feeds a stream of bits (e.g. 01110011) to the program execution]
- It is used as a benchmark, since it is considered to have the lowest acceptance rate
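A minimal sketch of a random generator, assuming 8-bit signed integer inputs decoded from a stream of random bits. The names `random_inputs` and `acceptance_rate` are hypothetical; `accepts` stands for any predicate that says whether an input satisfies the testing criterion.

```python
import random

def random_inputs(n_args, bits=8, seed=0):
    """Yield tuples of signed integers decoded from a stream of random bits."""
    rng = random.Random(seed)
    offset = 1 << (bits - 1)
    while True:
        yield tuple(rng.getrandbits(bits) - offset for _ in range(n_args))

def acceptance_rate(accepts, n_args, trials=1000, seed=0):
    """Fraction of random inputs for which accepts(*inputs) is true."""
    gen = random_inputs(n_args, seed=seed)
    hits = sum(1 for _ in range(trials) if accepts(*next(gen)))
    return hits / trials

# Example: how often does a random pair satisfy a == b?
rate = acceptance_rate(lambda a, b: a == b, n_args=2)
```

The measured rate gives the baseline against which the other generators can be compared.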
14. Path-Oriented Test Data Generation
int triType(int a, int b, int c) {
          int type = PLAIN;
/* 1  */  if (a > b)
/* 2  */      swap(a, b);
/* 3  */  if (a > c)
/* 4  */      swap(a, c);
/* 5  */  if (b > c)
/* 6  */      swap(b, c);
/* 7  */  if (a == b) {
/* 8  */      if (b == c)
/* 9  */          type = EQUILATERAL;
              else
/* 10 */          type = ISOSCELES;
          }
/* 11 */  else if (b == c)
/* 12 */      type = ISOSCELES;
/* 13 */  return type;
}
[Figure: control flow graph of triType from entry s through nodes 1 to 13 to exit e, with edges labeled by the branch outcomes a > b / a ≤ b, a > c / a ≤ c, b > c / b ≤ c, a = b / a ≠ b, b = c / b ≠ c]
- Path condition: $P = (a > b) \land (a \le c) \land (b > c) \land (a = b) \land (b \ne c)$
- Valid path condition (after data dependency analysis): $P = (a > b) \land (b \le c) \land (a > c) \land (b = c) \land (c \ne a)$
- Solve using CLP, IRM, or MILP
- Test data: (a, b, c) = (5, 4, 4)
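The difference between the two path conditions can be checked with a minimal sketch. Exhaustive enumeration over a small domain stands in here for a real CLP, IRM, or MILP solver; the function names and the domain bounds are hypothetical.

```python
from itertools import product

def naive_condition(a, b, c):
    # Branch predicates copied as written, ignoring the effect of the swaps.
    return a > b and a <= c and b > c and a == b and b != c

def valid_condition(a, b, c):
    # Same path, with each predicate rewritten over the original inputs.
    return a > b and b <= c and a > c and b == c and c != a

def solve(condition, lo=0, hi=5):
    """Return every (a, b, c) in [lo, hi]^3 that satisfies the condition."""
    return [t for t in product(range(lo, hi + 1), repeat=3) if condition(*t)]

naive_solutions = solve(naive_condition)   # empty: a > b contradicts a == b
valid_solutions = solve(valid_condition)   # contains (5, 4, 4)
```

The naive condition has no solution at all, so the data dependency introduced by the swaps has to be resolved before the constraints are handed to a solver.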
15. Goal-Oriented Test Data Generation
[Figure: triType source and control flow graph (entry s, nodes 1 to 13, exit e), as on slide 14]
- Unspecific path: <3,10,13> = <3> and <10,13>
- Set of paths: <3,5,7,8,10,13>, <3,4,5,7,8,10,13>, <3,5,6,7,8,10,13>, <3,4,5,6,7,8,10,13>
- Choose one path
- Test data
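Expanding an unspecific path into the set of specific paths can be sketched as a graph search. The edge list below is an assumption read off the triType node numbering (each if node has one edge into its body and one past it), and the names `CFG`, `paths`, and `expand` are hypothetical.

```python
# Control flow graph of triType: node -> successor nodes (assumed edge list).
CFG = {
    1: [2, 3], 2: [3], 3: [4, 5], 4: [5], 5: [6, 7], 6: [7],
    7: [8, 11], 8: [9, 10], 9: [13], 10: [13], 11: [12, 13], 12: [13], 13: [],
}

def paths(graph, start, goal):
    """Enumerate all paths from start to goal (the graph must have no cycles)."""
    if start == goal:
        return [[goal]]
    return [[start] + rest
            for nxt in graph[start]
            for rest in paths(graph, nxt, goal)]

def expand(graph, unspecific):
    """Expand an ordered list of required nodes into all full paths."""
    full = [[unspecific[0]]]
    for prev, nxt in zip(unspecific, unspecific[1:]):
        # Replace the last node of each partial path by every segment prev..nxt.
        full = [p[:-1] + seg for p in full for seg in paths(graph, prev, nxt)]
    return full

candidates = expand(CFG, [3, 10, 13])
```

A goal-oriented generator is then free to pick whichever candidate turns out to be feasible, instead of committing to one path up front.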
16. Goal-Oriented Test Data Generation
- Find-any-path concept
- Related work (Bogdan Korel)
- Chaining approach (IEEE TSE, 1995)
- Data dependence analysis
- Assertion-oriented approach (IEEE TSE, 1996)
- Assertions are inserted
- Oracle is given in the code
- Tool
- TESTGEN system
17. Comparison of Approaches
- Random approach
- Low probability of finding semantically small faults
- Path-oriented approach
- Better prediction of coverage
- Harder to find test data
- Often selects infeasible paths
- Goal-oriented approach
- Hard to predict the coverage
- More flexible in finding test data
- Relatively reduces the selection of infeasible paths
Example:
void foo(int a, int b)
  if (a = b) then write(1)
  else write(2)
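Why the random approach struggles with foo can be shown by counting. This is a minimal sketch assuming both inputs are drawn uniformly from the same integer range; `branch_probability` is a hypothetical helper, and the return values 1 and 2 stand for write(1) and write(2).

```python
from fractions import Fraction

def foo(a, b):
    """Python rendering of the slide's foo."""
    return 1 if a == b else 2

def branch_probability(lo, hi):
    """Exact probability that a uniform random (a, b) in [lo, hi]^2 reaches write(1)."""
    values = range(lo, hi + 1)
    hits = sum(1 for a in values for b in values if foo(a, b) == 1)
    return Fraction(hits, len(values) ** 2)

p = branch_probability(-100, 100)   # equals 1/201 for a domain of 201 values
```

For a domain of n values the probability is 1/n, so with full-width machine integers the a = b branch is almost never reached by random inputs.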
18. Static Test Data Generation
- Symbolic execution is representative
- Proposed by James C. King in 1976
- Executing a program symbolically means that variable substitution is used instead of actual values
- Related work
- Test-Case Generation with IOGen (Timothy E. Lindquist, IEEE Software, 1988)
- Automatic Unit Test Data Generation Using Mixed-Integer Linear Programming and Execution Trees (S. Lapierre, ICSM, 1999)
- ATGen: Automatic Test Data Generation Using Constraint Logic Programming and Symbolic Execution (Christophe Meudec, STVR, 2001)
19. Merits & Demerits of Static Approach
- Merits
- No consistency checks of branch predicates, since all can be solved at once
- Demerits
- Difficulty with dynamic data structures, arrays, procedures, and loop conditions
- Overheads of repeated algebraic manipulation and simplification of variable and path expressions
- Not applicable to real-time software systems
20. Dynamic Test Data Generation
- Execution-based approach
- Automated Software Test Data Generation (Bogdan Korel, IEEE TSE, 1990)
- Function minimization search algorithm
- Dynamic data flow analysis
21. Related Work of Dynamic Approach
- ADTEST: A Test Data Generation Suite for Ada Software Systems (Matthew J. Gallagher, IEEE TSE, 1997)
- Program instrumentation
- Numerical optimization problem
- Generating Software Test Data by Evolution (Christoph Michael & Gary McGraw, (1997), IEEE TSE 2001)
- Automated Software Test Data Generation for Complex Programs (Christoph Michael & Gary McGraw, IEEE ASEC, 1998)
- Genetic algorithm
- GADGET system
22. Merits & Demerits of Dynamic Approach
- Merits
- Dynamic data structures (pointers, arrays) can be handled
- Function calls can be handled
- Demerits
- Expensive: many iterations are required before a suitable input is found
- Monitoring by instrumentation
- Actual execution is limited by the execution speed of the program
- Inefficient in handling infeasible paths
23. Hybrid of Two Forms
- Automated Test Data Generation Using an Iterative Relaxation Method (Gupta et al., ACM SIGSOFT, 1998)
- Techniques
- Relaxation technique
- Predicate slice and input dependency set
- Solved problems
- General approach for nonlinear expressions
- Dynamic data structure
- Effective handling of infeasible paths
24. Hybrid of Two Forms (cont.)
- The Dynamic Domain Reduction Procedure for Test Data Generation (A. Jefferson Offutt et al., SPE, 1999)
- Techniques
- Constraint-based testing: domain reduction procedure
- Symbolic evaluation
- Dynamic test data generation approach
- Solved problems
- Array, loop
- Only applicable to numerical software
- Unit level testing (not inter-procedural)
25. Conclusion
- Most research work has been performed on toy programs
- The programs used are very short, less complex, and use a subset of language features
- Test data generators handle only integers, Booleans, real numbers, arrays, and pointers, not general data objects
- Some work supports inter-procedural level testing, oracles, and effective detection of infeasible paths
- No papers concerning objects
- Combining static and dynamic methods is useful
- Static: cost reduction
- Dynamic: dynamic data structures
26. Future Challenges
- Constraint-satisfaction techniques
- Pointers and shapes
- Object-oriented programs
- Path selection
- Oracle problem
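27. Symbolic Execution
- Input variable: y = Y (symbolic value)
[Figure: control flow graph of the example program, nodes 1 to 7; the branches at nodes 2 and 5 have T and F edges]
1: x = y * 2
2: if (x > 10)
3: x = x + 10
4: y = y + 1
5: if (x + y <= 110)
6: printf("OK")
7: (exit node)
- Path condition: $Y > 5 \land Y > 33 \Rightarrow Y > 33$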
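The derivation of the path condition can be sketched with a tiny symbolic executor. It assumes the example program reads x = y * 2; if (x > 10); x = x + 10; y = y + 1; if (x + y <= 110), and that the path takes the true branch at node 2 and the false branch at node 5. Expressions are kept as pairs (coef, offset) meaning coef*Y + offset; all helper names are hypothetical.

```python
def add(p, q):
    """Sum of two linear expressions (coef, offset)."""
    return (p[0] + q[0], p[1] + q[1])

def scale(p, k):
    """Linear expression multiplied by the integer constant k."""
    return (p[0] * k, p[1] * k)

def lit(k):
    """Constant k as a linear expression."""
    return (0, k)

def path_condition():
    """Symbolically execute nodes 1, 2 (T), 3, 4, 5 (F); collect 'expr > bound'."""
    conds = []
    y = (1, 0)                        # input variable y holds the symbol Y
    x = scale(y, 2)                   # node 1: x = y * 2
    conds.append((x, 10))             # node 2, true branch: x > 10
    x = add(x, lit(10))               # node 3: x = x + 10
    y = add(y, lit(1))                # node 4: y = y + 1
    conds.append((add(x, y), 110))    # node 5, false branch: x + y > 110
    return conds

def smallest_solution(conds):
    """Smallest integer Y with coef*Y + offset > bound for all constraints (coef > 0)."""
    return max((bound - offset) // coef + 1 for (coef, offset), bound in conds)

conds = path_condition()              # 2Y > 10 and 3Y + 11 > 110
y_min = smallest_solution(conds)      # smallest integer with Y > 5 and Y > 33
```

The two collected constraints simplify to Y > 5 and Y > 33, which is the path condition stated on the slide.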
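28. Dynamic Test Data Generation
- Input variable: y
- Branch function for the predicate at node 2: $F(y) = 10 - x_2(y)$ if $x_2(y) < 10$, 0 otherwise, where $x_2(y)$ is the value of x at node 2
- Trial inputs: y = 3 gives F(y) = 4; y = 4 gives F(y) = 2; y = 10 gives F(y) = 0
[Figure: control flow graph of the example program, nodes 1 to 7; the branches at nodes 2 and 5 have T and F edges]
1: x = y * 2
2: if (x > 10)
3: x = x + 10
4: y = y + 1
5: if (x + y <= 110)
6: printf("OK")
7: (exit node)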
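The search on the branch function can be sketched as follows, assuming node 1 reads x = y * 2 and the target is the true branch of x > 10. The search strategy (an exploratory step followed by doubling steps) is a simplified stand-in for a function minimization search, not the exact algorithm of any cited paper, and the unclamped `distance` is an added helper so the search can tell when the branch is really taken.

```python
def x_at_node_2(y):
    """Value of x when execution reaches node 2 (node 1: x = y * 2)."""
    return y * 2

def distance(y):
    """Unclamped distance: negative exactly when x > 10 holds."""
    return 10 - x_at_node_2(y)

def branch_function(y):
    """Clamped form F(y) as shown on the slide."""
    return max(0, distance(y))

def search(y, max_steps=50):
    """Minimize distance(y) by local moves; return (input, visited inputs)."""
    step, trace = 1, [y]
    for _ in range(max_steps):
        if distance(y) < 0:                    # target branch is taken
            return y, trace
        candidate = min((y + step, y - step), key=distance)
        if distance(candidate) < distance(y):  # improvement: move and accelerate
            y, step = candidate, step * 2
            trace.append(y)
        elif step > 1:
            step = 1                           # fall back to exploratory steps
        else:
            break                              # stuck: no neighbor improves
    return (y, trace) if distance(y) < 0 else (None, trace)

found, visited = search(3)
```

Each evaluation of `distance` corresponds to one actual execution of the instrumented program, which is where the cost of the dynamic approach comes from.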
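29. Iterative Relaxation Method
[Figure: control flow graph of the example, nodes 1 to 5: read(x, y) at node 1, the branch if (x + y > 10) at node 2 with T and F edges, followed by updates of x and y at nodes 3 and 4]
- Initial input: $I_0 = (x_0, y_0)$
- Predicate function: $F = x + y - 10$
- Linear function: $f(x, y) = ax + by + c$, with target $ax + by + c = 0$
- Residual at the initial input: $a x_0 + b y_0 + c = r$
- Increments: $a \Delta x + b \Delta y = -r \Rightarrow \Delta x, \Delta y$
- New input: $a(x_0 + \Delta x) + b(y_0 + \Delta y) + c = 0$
- Test data: $(x_0 + \Delta x, y_0 + \Delta y)$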
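One relaxation step for a linear function can be sketched as below. The equation $a \Delta x + b \Delta y = -r$ has many solutions; choosing the minimum-norm one is an assumption made here for illustration, and `relax` is a hypothetical name. The sketch requires a and b not both zero.

```python
from fractions import Fraction

def relax(a, b, c, x0, y0):
    """One relaxation step for the linear function f(x, y) = a*x + b*y + c."""
    r = a * x0 + b * y0 + c                  # residual at the current input
    norm = Fraction(a * a + b * b)           # must be non-zero
    dx, dy = -r * a / norm, -r * b / norm    # satisfies a*dx + b*dy == -r
    return x0 + dx, y0 + dy

# Predicate function F = x + y - 10, i.e. a = 1, b = 1, c = -10.
x1, y1 = relax(1, 1, -10, x0=1, y0=2)
```

Because the function is linear, a single step lands exactly on f = 0; for nonlinear predicates the step is repeated with a fresh linear approximation.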
30. Dynamic Domain Reduction
[Figure: control flow graph fragment with nodes 2, 3, 4]
Find test data:
1. Assume initial input domains: X: <-10 .. 10>, Y: <-10 .. 10>, Z: <-10 .. 10>
2. Split at 0: y: <-10 .. 0>, z: <1 .. 10>
3. Split at -5: x: <-5 .. 0>, y: <-10 .. -5>
4. Split at 2: x: <-5 .. 2>, z: <3 .. 10>
A test case: x = 0, y = -10, z = 8
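A single domain split can be sketched as below. It assumes that step 2 enforces a constraint of the form y < z, which the slide does not state; `split_less_than` is a hypothetical helper working on integer domains given as (low, high) pairs.

```python
def split_less_than(left, right, split):
    """Reduce two integer domains so that every left value < every right value."""
    new_left = (left[0], min(left[1], split))
    new_right = (max(right[0], split + 1), right[1])
    if new_left[0] > new_left[1] or new_right[0] > new_right[1]:
        return None                      # a domain became empty: try another split
    return new_left, new_right

# Step 2 of the slide: both domains start as <-10 .. 10>, split at 0.
y_dom, z_dom = split_less_than((-10, 10), (-10, 10), split=0)
```

When a split empties a domain, the procedure backtracks and tries a different split point; once all constraints on the path are satisfied, any value from each remaining domain can be picked as test data.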