Title: Software Testing Seminar
1. Software Testing Seminar
- Mooly Sagiv
- http://www.math.tau.ac.il/sagiv/courses/testing.html
- Tel Aviv University
- 640-6706
- Sunday 16-18
- Monday 10-12, Schreiber 317
2. Bibliography
- Michael Young, University of Oregon, UW/MSR 1999
- Glenford Myers, The Art of Software Testing, 1978!
3. Outline
- Testing Questions
- Goals of Testing
- The Psychology of Testing
- Program Inspections and Reviews
- Test Case Design
- Achieving Reliability
- Other Techniques
- A Success Story
4. Standard Testing Questions
- Did this test execution succeed or fail?
- Oracles
- How shall we select test cases?
- Selection, generation
- How do we know when we've tested enough?
- Adequacy
- What do we know when we're done?
- Assessment
5. Possible Goals of Testing
- Find faults
- Glenford Myers, The Art of Software Testing
- Provide confidence
- of reliability
- of (probable) correctness
- of detection (therefore absence) of particular faults
6. Testing Theory (such as it is)
- Plenty of negative results
- Nothing guarantees correctness
- Statistical confidence is prohibitively expensive
- Being systematic may not improve fault detection
- as compared to simple random testing
- So what did you expect, decision procedures for undecidable problems?
7. What Information Can We Exploit?
- Specifications (formal or informal)
- in Oracles
- for Selection, Generation, Adequacy
- Designs
- Code
- for Selection, Generation, Adequacy
- Usage (historical or models)
- Organization experience
8. The Psychology of Testing
- Testing is the process of demonstrating that errors are not present
- Testing is the process of establishing confidence that the program does what it is intended to do
- Testing is the process of executing programs with the intent of finding errors
- A successful (positive) test is one that exposes an error
9. Black Box Testing
- View the program as a black box
- Exhaustive testing is infeasible even for tiny programs
- Can never guarantee correctness
- The fundamental question is economics:
- Maximize the return on a finite testing investment
- Example: partition testing
10. White Box Testing
- Investigate the internal structure of the program
- Exhaustive path testing is infeasible
- Does not even guarantee correctness
- Specification is needed
- Missing paths
- Data dependent paths
- Becomes an economic question
if (a - b < epsilon)
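To make the epsilon example concrete, here is a minimal C sketch (the function names and test values are illustrative, not from the slides). Every path through the buggy comparison can be exercised without revealing the error; only a test case suggested by the specification (one with a < b) exposes the missing fabs():

    #include <assert.h>
    #include <math.h>

    /* Buggy: compares a signed difference against epsilon. */
    int approx_equal_buggy(double a, double b, double eps) {
        return (a - b) < eps;            /* missing fabs() */
    }

    /* What the specification actually demands. */
    int approx_equal(double a, double b, double eps) {
        return fabs(a - b) < eps;
    }

    int main(void) {
        /* These two tests cover both branches of the buggy version ... */
        assert(approx_equal_buggy(1.0, 1.0, 0.001) == 1);
        assert(approx_equal_buggy(2.0, 1.0, 0.001) == 0);
        /* ... yet only a specification-derived case exposes the bug: */
        assert(approx_equal_buggy(1.0, 2.0, 0.001) == 1);   /* wrong result */
        assert(approx_equal(1.0, 2.0, 0.001) == 0);         /* correct */
        return 0;
    }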
11. Testing Principles
- A test case must include the definition of the expected results
- A programmer should not test his/her own code
- A programming organization should not test its own programs
- Thoroughly inspect the results of each test
- Test cases must also be written for invalid inputs
- Check that programs do not do unexpected things
- Test cases should not be thrown away
- Do not plan testing assuming that there are no errors
- The probability of errors in a piece of code is proportional to the number of errors found so far in that part of the code
12. Program Inspections and Reviews
- Conducted in groups/sessions
- A well-established process
- Modern programming languages eliminate many programming errors:
- Type errors
- Memory violations
- Uninitialized variables
- But many errors are still not found automatically:
- Division by zero
- Overflow
- Wrong precedence of logical operators
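As one concrete instance of the last item, here is a classic C precedence pitfall (the identifiers are hypothetical) of the kind an inspection checklist catches; == binds tighter than &, so the first condition never fires:

    #include <stdio.h>

    #define READY_MASK 0x04

    int main(void) {
        int status = 0x04;
        /* Parses as status & (READY_MASK == 0), i.e. status & 0:
           the condition is always false. */
        if (status & READY_MASK == 0)
            printf("not ready\n");              /* never executes */
        /* What was intended: */
        if ((status & READY_MASK) == 0)
            printf("not ready\n");
        else
            printf("ready\n");                  /* prints "ready" */
        return 0;
    }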
13. Test Case Design
- What subset of inputs has the highest probability of detecting the most errors?
- Combine black box and white box testing
14. Higher Order Testing
- Show that the program does not do what the end-user expects
- Different testing levels:
- Unit test (procedure boundaries)
- Module test
- Function test (discrepancies with external spec.)
- System test (discrepancies with original objectives)
- Facility test
- Volume test
- Usability test
- Security test
- Performance test
- ...
15. Black Box / White Box Techniques
- White box:
- Coverage/Adequacy
- Data flow analysis
- Cleanness
- Correctness
- Mutation
- Slicing
- Black box:
- Equivalence partitioning
- Boundary-value analysis
- Cause-effect graphing
- Error guessing
- State space exploration
16. Partition Testing (Equivalence Partitioning)
- Basic idea: Divide the program input space into (quasi-)equivalence classes
- Underlying idea of specification-based, structural, and fault-based testing
17. Specification-Based Partition Testing
- Divide the program input space according to identifiable cases in the specification
- May emphasize boundary cases
- May include combinations of features or values
- If all combinations are considered, the space is usually too large
- Systematically cover the categories
- May be driven by scripting tools or input generators
- Example: Category-Partition testing [Ostrand]
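A minimal C sketch of the idea, assuming a hypothetical specification "scores 0-100 are valid; 60 and above pass": one representative is chosen per (quasi-)equivalence class, plus the boundary cases the slide emphasizes:

    #include <assert.h>

    typedef enum { INVALID, FAIL, PASS } grade_t;

    /* Hypothetical function under test. */
    grade_t grade(int score) {
        if (score < 0 || score > 100) return INVALID;
        return (score >= 60) ? PASS : FAIL;
    }

    int main(void) {
        /* One representative per equivalence class: */
        assert(grade(-5)  == INVALID);    /* score < 0          */
        assert(grade(30)  == FAIL);       /* 0 <= score < 60    */
        assert(grade(75)  == PASS);       /* 60 <= score <= 100 */
        assert(grade(150) == INVALID);    /* score > 100        */
        /* Boundary cases between the classes: */
        assert(grade(0) == FAIL && grade(59) == FAIL);
        assert(grade(60) == PASS && grade(100) == PASS);
        return 0;
    }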
18. Adequate testing
- Ideally, adequate testing ensures some property (proof by cases)
- Origins in Goodenough & Gerhart, Weyuker & Ostrand
- In reality, as impractical as other program proofs
- Practical adequacy criteria are really inadequacy criteria:
- If no case from class XX has been chosen, surely more testing is needed ...
19. Structural Coverage Testing
- (In)adequacy criteria:
- If significant parts of the program structure are not tested, testing is surely inadequate
- Control flow coverage criteria:
- Statement (node, basic block) coverage
- Branch (edge) and condition coverage
- Data flow (syntactic dependency) coverage
- Various control-flow criteria
- Attempted compromise between the impossible and the inadequate
20. Basic structural criteria (ex.)
[Figure: control-flow graph with nodes a through f]
- Edge ac is required by all-edges coverage but not by all-nodes coverage
- A typical loop coverage criterion would require zero iterations (cdf), one iteration (cdedf), and multiple iterations (cdededed...df)
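The loop criterion above can be demonstrated with a minimal C sketch (the function and test values are illustrative, loosely mirroring the path names on the slide): three test cases force zero, one, and multiple iterations of the loop:

    #include <assert.h>

    /* Hypothetical function under test: sum of the first n elements. */
    int sum(const int *a, int n) {
        int s = 0;
        for (int i = 0; i < n; i++)    /* the loop to be covered */
            s += a[i];
        return s;
    }

    int main(void) {
        int a[] = {1, 2, 3};
        assert(sum(a, 0) == 0);    /* zero iterations  (cdf)       */
        assert(sum(a, 1) == 1);    /* one iteration    (cdedf)     */
        assert(sum(a, 3) == 6);    /* many iterations  (cded...df) */
        return 0;
    }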
21. Data flow coverage criteria (ex.)
[Figure: control-flow graph containing the statements x = 7; y = x; y = y + 1; z = x * y]
- Rationale: an untested def-use association could hide an erroneous computation
- The use of y in y = y + 1 has 2 reaching definitions (one is from self)
- The uses in z = x * y have 2 reaching definitions for x, and 2 reaching definitions for y
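A small C sketch of why def-use associations matter (the function and values are illustrative): the final use of fee is reached by two definitions, and only the test that exercises the second def-use association reveals the bug:

    #include <assert.h>

    /* Hypothetical function under test. */
    int shipping_fee(int weight, int express) {
        int fee = 5;               /* definition 1 */
        if (express)
            fee = weight;          /* definition 2 -- buggy; spec says 2 * weight */
        return fee + 1;            /* use: reached by both definitions */
    }

    int main(void) {
        /* Covering only the (definition 1, use) association misses the bug: */
        assert(shipping_fee(10, 0) == 6);
        /* The (definition 2, use) association exposes it: */
        assert(shipping_fee(10, 1) == 21);    /* fails: the buggy code returns 11 */
        return 0;
    }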
22. The Infeasibility Problem
- Syntactically indicated behaviors (paths, data flows, etc.) are often impossible
- Infeasible control flow, data flow, and data states
- Adequacy criteria are typically impossible to satisfy
- Unsatisfactory approaches:
- Manual justification for omitting each impossible test case (esp. for more demanding criteria)
- Adequacy scores based on coverage
- Example: 95% statement coverage, 80% def-use coverage
23. Challenges in Structural Coverage
- Interprocedural and gross-level coverage
- e.g., interprocedural data flow, call-graph coverage
- Regression testing
- Late binding (OO programming languages)
- coverage of actual and apparent polymorphism
- Fundamental challenge: infeasible behaviors
- underlies problems in interprocedural and polymorphic coverage, as well as obstacles to adoption of more sophisticated coverage criteria and dependence analysis
24. Structural Coverage in Practice
- Statement and sometimes edge or condition coverage is used in practice
- Simple lower bounds on adequate testing may even be harmful if inappropriately used for test selection
- Additional control flow heuristics are sometimes used:
- Loops (never, once, many), combinations of conditions
25. Testing for Reliability
- Reliability is statistical, and requires a statistically valid sampling scheme
- Programs are complex human artifacts with few useful statistical properties
- In some cases the environment (usage) of the program has useful statistical properties
- Usage profiles can be obtained for relatively stable, pre-existing systems (telephones), or systems with thoroughly modeled environments (avionics)
26. Arbitrary ≠ Random
- A common error in naïve attempts to obtain statistical confidence measures
- Arbitrary distributions may be modeled by adversary functions, not by uniform distributions
- Example:
- If failures were distributed randomly through the execution space of a database program, it would fail at a uniform rate over time.
- In reality, it may never fail until a critical table overflows, and then always fail thereafter.
27. Certifying Ultra-High Reliability
- Problem: How can I show that system X has an expected failure rate of 10^-9/hour?
- Example: the probability that software will ever bring down an Airbus A320
- Butler & Finelli estimate:
- for 10^-9 per 10-hour mission
- requires 10^10 hours of testing with 1 computer
- or 10^6 hours (114 years) of testing with 10,000 computers
- ACM SIGSOFT '91, Conf. on SW for Critical Systems
28. Glimmers of Hope for Measuring High Reliability
- Random distribution of faults or failures would enable statistical reasoning and classic redundancy techniques
- A whole more reliable than its parts
- Randomization approaches:
- Blum: self-checking programs
- Lipton: redundant computations
- Podgurski: Kolmogorov complexity
- Grail or illusion?
- Difficult to generalize beyond simple functions
29. Process-Based Reliability Testing
- Rather than relying only on properties of the program, we may use historical characteristics of the development process
- Reliability growth models (Musa, Littlewood, et al.) project reliability based on experience with the current system and previous similar systems
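As one concrete instance (not spelled out on the slide), Musa's basic execution-time model projects reliability by assuming that failure intensity decreases linearly in the expected number of failures experienced so far:

    λ(μ) = λ₀ · (1 − μ/ν₀)

where λ₀ is the initial failure intensity and ν₀ is the total number of failures expected over infinite execution time. Integrating over execution time τ gives λ(τ) = λ₀ · e^(−λ₀τ/ν₀), which lets one estimate how much additional testing is needed to reach a target failure intensity.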
30. Fault-based testing
- Given a fault model:
- a hypothesized set of deviations from the correct program
- typically simple syntactic mutations; relies on the coupling of simple faults with complex faults
- Coverage criterion: the test set should be adequate to reveal (all, or x%) faults generated by the model
- similar to hardware test coverage
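A minimal C sketch of the mutation idea (the function and the mutant are illustrative): the fault model replaces one relational operator, and a test set is adequate under this criterion only if some test distinguishes the mutant from the original:

    #include <assert.h>

    /* Original function under test. */
    int max(int a, int b) { return (a > b) ? a : b; }

    /* A syntactic mutant generated by the fault model: > replaced by <. */
    int max_mutant(int a, int b) { return (a < b) ? a : b; }

    int main(void) {
        /* This test does NOT kill the mutant (both return 5): */
        assert(max(5, 5) == 5 && max_mutant(5, 5) == 5);
        /* This test kills it (original: 2, mutant: 1): */
        assert(max(1, 2) == 2);
        assert(max_mutant(1, 2) != max(1, 2));
        return 0;
    }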
31. Fault Models
- Fault models are key to semiconductor testing
- Test vectors are graded by coverage of an accepted model of faults (e.g., stuck-at faults)
- What are fault models for software?
- What would a fault model look like?
- How general would it be?
- Across application domains?
- Across organizations?
- Across time?
- Defect tracking is a start
32. The Budget Coverage Criterion
- A common answer to "when is testing done?":
- When the money is used up
- When the deadline is reached
- This is sometimes a rational approach!
- Implication 1: Test selection is more important than stopping criteria per se.
- Implication 2: Practical comparison of approaches must consider the cost of test case selection
33. Test Selection: Standard Advice
- Specification coverage is good for selection as well as adequacy
- applicable to informal as well as formal specs
- Fault-based tests
- usually ad hoc, sometimes from check-lists
- Program coverage last
- to suggest uncovered cases, not just to achieve a coverage criterion
34. The Importance of Oracles
- Much testing research has concentrated on adequacy and ignored oracles
- Much testing practice has relied on the "eyeball oracle"
- Expensive, especially for regression testing
- makes large numbers of tests infeasible
- Not dependable
- Automated oracles are essential to cost-effective testing
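A minimal C sketch of an automated oracle replacing the eyeball oracle (my_sqrt and the tolerance are illustrative): a property of each result is checked mechanically, which is what makes large generated test sets feasible:

    #include <assert.h>
    #include <math.h>
    #include <stdlib.h>

    /* Hypothetical function under test (stand-in implementation). */
    double my_sqrt(double x) { return sqrt(x); }

    /* Automated oracle: check the defining property of the output. */
    void check_sqrt(double x) {
        double r = my_sqrt(x);
        assert(r >= 0.0 && fabs(r * r - x) < 1e-9 * (x + 1.0));
    }

    int main(void) {
        /* Thousands of tests, no eyeballs required: */
        for (int i = 0; i < 100000; i++)
            check_sqrt(rand() / (double)RAND_MAX * 1000.0);
        return 0;
    }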
35. Sources of Oracles
- Specifications
- sufficiently formal (e.g., Z spec)
- but possibly incomplete (e.g., assertions in Anna, ADL, APP, Nana)
- Design models
- treated as specifications, as in protocol conformance testing
- Prior runs (capture/replay)
- especially important for regression testing and GUIs; the hard problem is parameterization
36. What can be automated?
- Oracles
- assertions; replay; from some specifications
- Selection (Generation)
- scripting; specification-driven; replay variations
- selective regression testing
- Coverage
- statement, branch, dependence
- Management
37. Design for Test: 3 Principles
Adapted from circuit and chip design
- Observability
- Providing the right interfaces to observe the behavior of an individual unit or subsystem
- Controllability
- Providing interfaces to force behaviors of interest
- Partitioning
- Separating control and observation of one component from details of others
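A small C sketch of the three principles (all names hypothetical): the component's time source is injected so tests can force behaviors of interest (controllability), its state is queried through a narrow interface (observability), and the test drives it in isolation from the real system clock (partitioning):

    #include <assert.h>

    typedef long (*clock_fn)(void);     /* injectable dependency */

    typedef struct {
        clock_fn now;
        long     deadline;
    } Timer;

    /* Observability: state exposed through a narrow query interface. */
    int timer_expired(const Timer *t) { return t->now() >= t->deadline; }

    /* Controllability + partitioning: the test supplies a fake clock,
       so behavior is forced without touching the rest of the system. */
    static long fake_time;
    static long fake_clock(void) { return fake_time; }

    int main(void) {
        Timer t = { fake_clock, 100 };
        fake_time = 50;  assert(!timer_expired(&t));
        fake_time = 100; assert(timer_expired(&t));
        return 0;
    }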
38. Problems & Opportunities
- Compositionality
- for components; for regression
- Specifications
- low entry barrier, incremental payoff
- Synergy with Analysis
- conformance test w/ verified models
- backstop for unsafe assumptions
- (your idea here)
39. A recent success story
- The PREfix program analysis tool
- Analyzes C/C++ sources
- Scans for cleanness bugs, e.g., dereferences of NULL pointers
- Symbolically executes the program on some paths
- May miss some errors and generate false alarms
- Tried on Windows 2000
- Located 65,000 potential bugs, 28,000 of which are real bugs