Title: Test Case Filtering and Prioritization Based on Coverage of Combinations of Program Elements
1. Test Case Filtering and Prioritization Based on Coverage of Combinations of Program Elements
- Wes Masri and Marwa El-Ghali
- American Univ. of Beirut, ECE Department
- Beirut, Lebanon
- wm13_at_aub.edu.lb
2. Test Case Filtering
- Test case filtering is concerned with selecting from a test suite T a subset T' that is capable of revealing most of the defects revealed by T
- Approach: select T' to cover all elements covered by T
3. Test Case Filtering: What to Cover?
- Existing techniques cover singular program elements of varying granularity
  - methods, statements, branches, def-use pairs, slice pairs, and information flow pairs
- Previous studies have shown that increasing the granularity leads to revealing more defects at the expense of larger subsets
4. Test Case Filtering
- This work explores covering suspicious combinations of simple program elements
- The number of possible combinations is exponential w.r.t. the number of singular elements → use an approximation algorithm
- We use a genetic algorithm
5. Test Case Filtering: Conjectures
- Combinations of program elements are more likely to characterize complex failures
- The percentage of failing tests is typically much smaller than that of the passing tests
- Each defect causes a small number of tests to fail
- Given groups of (structurally) similar tests, smaller ones are more likely to be failure-inducing than larger ones
6. Test Case Filtering: Steps
- Given a test suite T, generate execution profiles of simple program elements (statements, branches, and def-use pairs)
- Choose a threshold Mfail for the maximum number of tests that could fail due to a single defect
- Use the genetic algorithm to generate C, a set of combinations of simple program elements that were covered by fewer than Mfail tests → suspicious combinations
- Use a greedy algorithm to extract T', the smallest subset of T that covers all the combinations in C
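The greedy extraction in the last step is a classic greedy set-cover heuristic: repeatedly pick the test covering the most still-uncovered suspicious combinations. The sketch below is an illustration under that reading; the test names, combination names, and coverage mapping are hypothetical, not the authors' implementation.

```python
# Greedy set-cover sketch of step 4: extract a small subset T' of tests
# that covers every suspicious combination in C. Hypothetical data layout:
# `coverage` maps a test id to the set of suspicious combinations it covers.

def greedy_reduce(coverage):
    """Return a small list of tests whose covered combinations
    together equal the union of all covered combinations."""
    uncovered = set().union(*coverage.values())
    selected = []
    while uncovered:
        # Pick the test covering the most still-uncovered combinations.
        best = max(coverage, key=lambda t: len(coverage[t] & uncovered))
        if not coverage[best] & uncovered:
            break  # nothing left that any test can cover
        selected.append(best)
        uncovered -= coverage[best]
    return selected

coverage = {
    "t1": {"c1", "c2"},
    "t2": {"c2", "c3", "c4"},
    "t3": {"c4"},
}
print(greedy_reduce(coverage))  # ['t2', 't1'] — t3 is redundant
```

The exact minimum set cover is NP-hard, which is why a greedy approximation is the standard choice here.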
7. Genetic Algorithm
- A genetic algorithm solves a problem by
  - Operating on an initial population of candidate solutions, or chromosomes
  - Evaluating their quality using a fitness function
  - Using transformations to create new generations with improved quality
  - Ultimately evolving to a single solution
8. Fitness Function
- We use the following equation:
  - fitness(combination) = 1 − %tests
- where %tests is the percentage of test cases that exercised the combination
- The smaller the percentage, the higher the fitness
- The aim is to end up with a manageable set of combinations in which each combination occurred in at most Mfail tests
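The fitness equation above can be sketched directly; the profile representation (one set of elements per test, with a combination "exercised" when all its elements appear in a test's profile) is an assumption for illustration.

```python
# Sketch of fitness(combination) = 1 - %tests, where %tests is the
# fraction of test cases whose execution profile exercised the combination.

def fitness(combination, profiles):
    """combination: a set of program elements.
    profiles: one set of exercised elements per test case."""
    exercising = sum(1 for p in profiles if combination <= p)  # subset test
    return 1.0 - exercising / len(profiles)

profiles = [{"s1", "s2"}, {"s1"}, {"s2", "s3"}, {"s1", "s2", "s3"}]
print(fitness({"s1", "s2"}, profiles))  # 0.5: exercised by 2 of 4 tests
```

A combination exercised by few tests scores near 1, matching the conjecture that small, structurally distinct test groups are the likely failure-inducing ones.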
9. Initial Population Generation
- Generated from the union of all execution profiles
- Size 50 in our implementation
- A 0 bit stays 0 always; a 1 bit stays 1 with small probability P
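One plausible reading of this slide: a chromosome is a bit vector over the union of all profiled elements, and each element present in the union is kept in a chromosome only with small probability P (elements absent from every profile stay 0). The sketch below follows that reading; the parameter names and default P are assumptions.

```python
import random

# Hypothetical sketch of initial population generation: each chromosome
# is a sparse bit vector over the union of all execution profiles.
# A bit that is 0 in the union stays 0; a union bit of 1 is kept as 1
# only with small probability p_keep (the slides' P).

def initial_population(profiles, size=50, p_keep=0.05):
    """profiles: one set of exercised elements per test case."""
    union = sorted(set().union(*profiles))  # every element seen anywhere
    population = [
        [1 if random.random() < p_keep else 0 for _ in union]
        for _ in range(size)
    ]
    return union, population

union, pop = initial_population([{"a", "b"}, {"b", "c"}], size=10, p_keep=0.5)
print(union)  # ['a', 'b', 'c']
```

Keeping chromosomes sparse matters here: a combination with many elements is exercised by almost no test, so its fitness saturates without being informative.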
10. Transformation Operator
- Combines two parent chromosomes to produce a child
- Passes down properties from each, favoring the parent with the higher fitness
- Goal: the child should have a better fitness than its parents
- Replace the parent with the worse fitness with the child
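A biased uniform crossover is one simple way to realize this operator: each bit is taken from the fitter parent with some probability greater than one half. The `bias` value below is an assumption, not a figure from the slides.

```python
import random

# Sketch of the transformation operator: combine two parent bit vectors,
# taking each bit from the fitter parent with probability `bias`.
# The bias value 0.7 is an assumed illustration.

def transform(parent_a, fit_a, parent_b, fit_b, bias=0.7):
    if fit_b > fit_a:
        # Ensure parent_a refers to the fitter parent.
        parent_a, parent_b = parent_b, parent_a
    return [a if random.random() < bias else b
            for a, b in zip(parent_a, parent_b)]

child = transform([1, 0, 1, 0], 0.9, [0, 1, 0, 1], 0.2)
print(child)
```

Per the slide, the caller then evaluates the child's fitness and replaces the worse parent with the child, so population size stays constant while average fitness tends to rise.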
11. Solution Set
- The obtained solution set contains all the encountered combinations with high-enough fitness values → suspicious combinations
12. Experimental Work
- Our subject programs included
  - The JTidy HTML syntax checker and pretty printer: 1000 tests, 8 defects, 47 failures
  - The NanoXML XML parser: 140 tests, 4 defects, 20 failures
13. Experimental Work
- We profiled the following program elements
  - basic blocks or statements (BB)
  - basic-block edges or branches (BBE)
  - def-use pairs (DUP)
- Next we applied the genetic algorithm to generate the following
  - a pool of BBcomb
  - a pool of BBEcomb
  - a pool of DUPcomb
  - a pool of ALLcomb (combinations of BBs, BBEs, and DUPs)
- The values of Mfail we chose for JTidy and NanoXML were 100 and 20, respectively
14. JTidy Results

Profile Type   % Tests Selected   % Defects Revealed
BB                   5.3                55.0
BBcomb               9.6                65.6
BBE                  6.5                78.7
BBEcomb             10.2                87.5
DUP                 11.7                81.2
DUPcomb             14.1                87.5
ALL                 12.4                94.8
ALLcomb             14.1               100.0
SliceP              26.7               100.0

- In the case of ALLcomb, 14.1% of the original test suite was needed to exercise all of the combinations exercised by the original test suite, and these tests revealed all the defects revealed by the original test suite
- In previous work we showed that coverage of slice pairs (SliceP) performed better than coverage of BB, BBE, and DUP; this is why we include the results of SliceP here for comparison
15. Comparison with Random Sampling
- The figure (not reproduced here) compares the various techniques to random sampling
- All variations performed better than random sampling
- BBcomb revealed 10.6% more defects than BB but selected 4.2% more tests
- BBEcomb revealed 8.8% more defects than BBE but selected 3.7% more tests
- DUPcomb revealed 6.3% more defects than DUP but selected 2.4% more tests
- ALLcomb performed better than SliceP, since it revealed all defects, as SliceP did, but selected 12.6% fewer tests
16. Experimental Work
- Concerning BBcomb, BBEcomb, and DUPcomb, the additional cost due to the selection of more tests might not be well justified, since the rate of improvement is no better than it is for random sampling
- Concerning ALLcomb, not only did it perform better than SliceP, but it is considerably less costly
  - It took 90 seconds on average per test to generate its profiles (i.e., BBs, BBEs, and DUPs), whereas it took 1200 seconds per test to generate the SliceP profiles (1 day vs. 2 weeks)
17. NanoXML Observations
- BB, BBE, DUP, and ALL did not perform any better than random sampling, whereas BBcomb, BBEcomb, DUPcomb, and ALLcomb performed noticeably better
- BBcomb, BBEcomb, DUPcomb, and ALLcomb revealed all the defects, but at relatively high cost, since over 50% of the tests needed to be executed
- The cost of running the genetic algorithm and the greedy selection algorithm has to be factored in when comparing our techniques to others
18. Test Case Prioritization
- Test case prioritization aims at scheduling the tests in T so that the defects are revealed as early as possible
- Summary of our technique
  - Prioritize combinations in terms of their suspiciousness
  - Then assign the priority of a given combination to the tests that cover it
19. Test Case Prioritization: Steps
- Identify combinations that were exercised by 1 test; assign that test priority 1, and add it to T'
- Identify combinations that were exercised by 2 tests; assign those tests priority 2, and add them to T'
- And so on, until all tests are prioritized, Mfail is exceeded, or all combinations were explored
- Use the greedy algorithm to reduce T'
- Any remaining tests that were not prioritized will be scheduled to run randomly following the prioritized tests
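The priority-assignment loop in the steps above can be sketched as follows; the mapping from combinations to exercising tests, and the test names, are hypothetical illustrations.

```python
# Sketch of the prioritization steps: tests covering a combination that
# was exercised by exactly k tests get priority k (lower runs earlier),
# for k = 1 up to the Mfail threshold. Tests left without a priority
# are scheduled randomly after the prioritized ones.

def prioritize(comb_to_tests, m_fail):
    """comb_to_tests: maps each suspicious combination to the set of
    tests that exercised it. Returns {test: priority}."""
    priority = {}
    for k in range(1, m_fail + 1):
        for tests in comb_to_tests.values():
            if len(tests) == k:
                for t in tests:
                    priority.setdefault(t, k)  # keep best (lowest) priority
    return priority

print(prioritize({"c1": {"t3"}, "c2": {"t1", "t2"}}, m_fail=2))
```

Here `t3` gets priority 1 because it alone exercises `c1`, matching the conjecture that combinations exercised by very few tests are the most suspicious.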
20. JTidy Prioritization Results

Element   % Tests   % Defects
BBcomb      6.75      56.25
BBEcomb     7.55      81.25
DUPcomb    12.6       87.5
ALLcomb    13.05     100.0

- JTidy prioritization results when step 3 is satisfied, i.e., when all tests are prioritized, Mfail is exceeded, or all combinations were explored
- Observation: using BBcomb, BBEcomb, and DUPcomb, not all defects were revealed. Combinations of BBs, BBEs, and DUPs (ALLcomb) are needed to reveal all defects.
21. NanoXML Prioritization Results

Element   % Tests   % Defects
BBcomb     50.2      100.0
BBEcomb    50.8      100.0
DUPcomb    52.8      100.0
ALLcomb    53.5      100.0

- Observation: all defects were revealed using BBcomb, BBEcomb, DUPcomb, or ALLcomb, but at a high cost in selected tests.
22. Conclusion
- Our techniques performed better than similar coverage-based techniques that consider program elements of the same type and that do not take their combinations into account
- We will conduct a more thorough empirical study
- We will use the APFD (Average Percentage of Faults Detected) measure to evaluate prioritization