Transcript and Presenter's Notes

Title: Software Testing Seminar


1
Software Testing Seminar
  • Mooly Sagiv
  • http://www.math.tau.ac.il/~sagiv/courses/testing.html
  • Tel Aviv University
  • 640-6706
  • Sunday 16-18
  • Monday 10-12 Schrieber 317

2
Bibliography
  • Michael Young, University of Oregon (UW/MSR, 1999)
  • Glenford Myers, The Art of Software Testing, 1978!

3
Outline
  • Testing Questions
  • Goals of Testing
  • The Psychology of Testing
  • Program Inspections and Reviews
  • Test Case Design
  • Achieving Reliability
  • Other Techniques
  • A Success Story

4
Standard Testing Questions
  • Did this test execution succeed or fail?
  • Oracles
  • How shall we select test cases?
  • Selection / generation
  • How do we know when we've tested enough?
  • Adequacy
  • What do we know when we're done?
  • Assessment

5
Possible Goals of Testing
  • Find faults
  • Glenford Myers, The Art of Software Testing
  • Provide confidence
  • of reliability
  • of (probable) correctness
  • of detection (therefore absence) of particular
    faults

6
Testing Theory (such as it is)
  • Plenty of negative results
  • Nothing guarantees correctness
  • Statistical confidence is prohibitively expensive
  • Being systematic may not improve fault detection
  • as compared to simple random testing
  • So what did you expect, decision procedures for
    undecidable problems?

7
What Information Can We Exploit?
  • Specifications (formal or informal)
  • in Oracles
  • for Selection, Generation, Adequacy
  • Designs
  • Code
  • for Selection, Generation, Adequacy
  • Usage (historical or models)
  • Organizational experience

8
The Psychology of Testing
  • Testing is the process of demonstrating that
    errors are not present
  • Testing is the process of establishing
    confidence that the program does what it is
    intended to do
  • Testing is the process of executing programs
    with the intent of finding errors
  • A successful (positive) test is one that
    exposes an error

9
Black Box Testing
  • View the program as a black box
  • Exhaustive testing is infeasible even for tiny
    programs
  • Can never guarantee correctness
  • The fundamental question is economic
  • Maximize the return on testing investment
  • Example: partition testing

10
White Box Testing
  • Investigate the internal structure of the program
  • Exhaustive path testing is infeasible
  • Does not even guarantee correctness
  • A specification is still needed
  • Missing paths
  • Data-dependent paths
  • Again an economic question

if (a - b < epsilon)
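A minimal sketch of the point above (the function name and test values are illustrative, not from the slides): both branches of the condition can be covered while the missing path, here a forgotten absolute value, stays invisible to structural coverage.

/* Spec (assumed): report whether a and b differ by less than epsilon.
 * Fault: the code compares a - b directly instead of fabs(a - b). */
int nearly_equal(double a, double b, double epsilon) {
    if (a - b < epsilon)      /* should be fabs(a - b) < epsilon */
        return 1;
    return 0;
}

/* The tests (1.0, 1.0) and (5.0, 1.0) with epsilon = 1e-9 cover both
 * branches, yet any input with a < b, e.g. (1.0, 5.0), still returns
 * 1 incorrectly: a missing path that no coverage criterion demands. */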
11
Testing Principles
  • A test case must include a definition of the
    expected results
  • A programmer should not test his or her own code
  • A programming organization should not test its
    own programs
  • Thoroughly inspect the results of each test
  • Test cases must also be written for invalid
    inputs (see the sketch after this list)
  • Check that programs do not do unexpected things
  • Test cases should not be thrown away
  • Do not plan testing assuming that there are no
    errors
  • The probability of further errors in a piece of
    code is proportional to the number of errors
    found so far in that part of the code
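A minimal sketch of the two principles flagged above, using a hypothetical unit parse_percent (not from the slides): each test case pairs an input with its expected result, and invalid inputs get cases of their own.

#include <assert.h>
#include <stdlib.h>

/* Hypothetical unit under test: parse a percentage in 0..100,
 * returning -1 on any invalid input. */
int parse_percent(const char *s) {
    char *end;
    long v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 0 || v > 100)
        return -1;
    return (int)v;
}

int main(void) {
    assert(parse_percent("42")  == 42);  /* valid input, expected result */
    assert(parse_percent("0")   == 0);   /* boundary of the valid range */
    assert(parse_percent("142") == -1);  /* invalid: out of range */
    assert(parse_percent("abc") == -1);  /* invalid: not a number */
    return 0;
}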

12
Program Inspections and Reviews
  • Conducted in groups/sessions
  • A well established process
  • Modern programming languages eliminate many
    programming errors
  • Type errors
  • Memory violations
  • Uninitialized variables
  • But many errors are still not caught
    automatically
  • Division by zero
  • Overflow
  • Wrong precedence of logical operators (example
    below)
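For the last item, a sketch of one classic operator-precedence error of the kind inspections catch (flag values are illustrative): == binds tighter than &, so the first test means flags & (MASK == 0), not (flags & MASK) == 0.

#include <stdio.h>

#define MASK 0x04

int main(void) {
    int flags = 0x01;                 /* the MASK bit is clear */
    if (flags & MASK == 0)            /* parses as flags & (MASK == 0),
                                       * i.e. flags & 0: always false */
        printf("never printed\n");
    if ((flags & MASK) == 0)          /* the intended test */
        printf("bit is clear\n");
    return 0;
}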

13
Test Case Design
  • What subset of inputs has the highest probability
    of detecting the most errors?
  • Combine black and white box testing

14
Higher Order Testing
  • Show that the program does not do what the
    end-user expects
  • Different testing levels
  • Unit test (procedure boundaries)
  • Module test
  • Function test (discrepancies with external
    spec.)
  • System test (discrepancies with original
    objectives)
  • Facility test
  • Volume test
  • Usability test
  • Security test
  • Performance test
  • ...

15
Black Box vs. White Box
  • White box
  • Coverage/adequacy
  • Data flow analysis
  • Cleanness
  • Correctness
  • Mutation
  • Slicing
  • Black box
  • Equivalence partitioning
  • Boundary-value analysis
  • Cause-effect graphing
  • Error guessing
  • State space exploration

16
Partition Testing (Equivalence Partitioning)
  • Basic idea: divide the program input space into
    (quasi-)equivalence classes
  • The underlying idea of specification-based,
    structural, and fault-based testing

17
Specification-Based Partition Testing
  • Divide the program input space according to
    identifiable cases in the specification
  • May emphasize boundary cases
  • May include combinations of features or values
  • If all combinations are considered, the space is
    usually too large
  • Systematically cover the categories
  • May be driven by scripting tools or input
    generators
  • Example: Category-Partition testing [Ostrand]
    (see the sketch below)
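As an illustrative sketch of partition testing, Myers's classic triangle example (the code itself is ours, not from the slides): the specification induces four equivalence classes, and one representative per class, plus a boundary case, covers the categories systematically.

#include <assert.h>

enum triangle { INVALID, SCALENE, ISOSCELES, EQUILATERAL };

/* Classify three side lengths; the spec's identifiable cases become
 * the equivalence classes of the input space. */
enum triangle classify(int a, int b, int c) {
    if (a <= 0 || b <= 0 || c <= 0 ||
        a + b <= c || b + c <= a || a + c <= b)
        return INVALID;                        /* class: not a triangle */
    if (a == b && b == c) return EQUILATERAL;  /* class: all sides equal */
    if (a == b || b == c || a == c) return ISOSCELES;
    return SCALENE;
}

int main(void) {
    assert(classify(3, 3, 3) == EQUILATERAL);  /* one case per class */
    assert(classify(3, 3, 4) == ISOSCELES);
    assert(classify(3, 4, 5) == SCALENE);
    assert(classify(1, 2, 3) == INVALID);      /* boundary: a + b == c */
    return 0;
}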

18
Adequate Testing
  • Ideally, adequate testing ensures some property
    (proof by cases)
  • Origins in Goodenough & Gerhart; Weyuker &
    Ostrand
  • In reality, as impractical as other program
    proofs
  • Practical adequacy criteria are really
    inadequacy criteria
  • If no case from class XX has been chosen, surely
    more testing is needed ...

19
Structural Coverage Testing
  • (In)adequacy criteria
  • If significant parts of program structure are not
    tested, testing is surely inadequate
  • Control flow coverage criteria
  • Statement (node, basic block) coverage
  • Branch (edge) and condition coverage
  • Data flow (syntactic dependency) coverage
  • Various control-flow criteria
  • An attempted compromise between the impossible
    and the inadequate (contrast sketched below)
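An illustrative contrast of the first two criteria (code and values are ours, not from the slides): a single test can satisfy statement coverage while leaving a branch unexercised.

/* With x = 5, every statement executes: statement coverage is met.
 * Branch coverage additionally requires the false branch (x <= 0),
 * the only way to observe what happens when y keeps its initial
 * value. */
int clamp_positive(int x) {
    int y = 0;
    if (x > 0)
        y = x;
    return y;
}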

20
Basic Structural Criteria (example)
[Figure: control-flow graph with nodes a-f; a reaches c both directly
and via b, and d-e form a loop]
  • Edge ac is required by all-edges but not by
    all-nodes coverage
  • A typical loop coverage criterion would require
    zero iterations (c-d-f), one iteration
    (c-d-e-d-f), and multiple iterations
    (c-d-e-d-e-d-...-d-f)
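A sketch of the loop criterion in code form (the function is illustrative, not from the slides): the three path classes above correspond to three concrete test inputs.

/* Loop coverage: zero iterations (n = 0), one iteration (n = 1),
 * and multiple iterations (e.g. n = 3) exercise the paths c-d-f,
 * c-d-e-d-f, and c-d-e-d-e-...-d-f from the graph above. */
int sum(const int *a, int n) {
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}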
21
Data Flow Coverage Criteria (example)
  • Rationale: an untested def-use association
    could hide an erroneous computation
[Figure: annotated code fragment, operators reconstructed: x = 7;
y = x; y = y + 1; z = x * y. The annotations note 2 reaching
definitions at y = y + 1 (one is from itself), and 2 reaching
definitions for x and 2 for y at z = x * y]
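A hedged reconstruction of the fragment as compilable code; the loop and the second definition of x are our guesses, added so the reaching-definition counts quoted above work out. Each def-use pair is a test obligation under a data flow criterion such as all-uses.

int f(int n) {
    int x = 7;             /* first definition of x */
    int y = x;             /* use of x; first definition of y */
    while (n-- > 0) {
        y = y + 1;         /* 2 reaching definitions of y:
                            * y = x and this line itself */
        x = y;             /* second definition of x (assumed) */
    }
    return x * y;          /* 2 reaching definitions for x,
                            * 2 reaching definitions for y */
}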
22
The Infeasibility Problem
  • Syntactically indicated behaviors (paths, data
    flows, etc.) are often impossible (see the
    fragment below)
  • Infeasible control flow, data flow, and data
    states
  • Adequacy criteria are typically impossible to
    satisfy
  • Unsatisfactory approaches
  • Manual justification for omitting each impossible
    test case (esp. for more demanding criteria)
  • Adequacy scores based on coverage
  • example: 95% statement coverage, 80% def-use
    coverage
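An illustrative fragment of the problem (helper names are hypothetical): no input can take both then-branches, so any all-paths criterion over it is unsatisfiable.

static void handle_positive(int x) { (void)x; }  /* hypothetical stubs */
static void handle_negative(int x) { (void)x; }

void f(int x) {
    if (x > 0) handle_positive(x);   /* branch A */
    if (x < 0) handle_negative(x);   /* branch B: the path taking both
                                      * then-branches is infeasible */
}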

23
Challenges in Structural Coverage
  • Interprocedural and gross-level coverage
  • e.g., interprocedural data flow, call-graph
    coverage
  • Regression testing
  • Late binding (OO programming languages)
  • coverage of actual and apparent polymorphism
  • Fundamental challenge: infeasible behaviors
  • underlies problems in inter-procedural and
    polymorphic coverage, as well as obstacles to
    adoption of more sophisticated coverage criteria
    and dependence analysis

24
Structural Coverage in Practice
  • Statement and sometimes edge or condition
    coverage is used in practice
  • Simple lower bounds on adequate testing may even
    be harmful if inappropriately used for test
    selection
  • Additional control flow heuristics sometimes used
  • Loops (never, once, many), combinations of
    conditions

25
Testing for Reliability
  • Reliability is statistical, and requires a
    statistically valid sampling scheme
  • Programs are complex human artifacts with few
    useful statistical properties
  • In some cases the environment (usage) of the
    program has useful statistical properties
  • Usage profiles can be obtained for relatively
    stable, pre-existing systems (telephones), or
    systems with thoroughly modeled environments
    (avionics)

26
Arbitrary ≠ Random
  • A common error in naïve attempts to obtain
    statistical confidence measures
  • Arbitrary distributions may be modeled by
    adversary functions, not by uniform distributions
  • Example
  • If failures were distributed randomly through the
    execution space of a database program, it would
    fail at a uniform rate over time.
  • In reality, it may never fail until a critical
    table overflows, and then always fail thereafter.

27
Certifying Ultra-High Reliability
  • Problem: how can I show that system X has an
    expected failure rate of 10^-9/hour?
  • example: the probability that software will ever
    bring down an Airbus A320
  • Butler & Finelli estimate
  • for 10^-9 failures per 10-hour mission
  • requires 10^10 hours of testing with 1 computer
  • or 10^6 hours (about 114 years) of testing with
    10,000 computers
  • ACM SIGSOFT '91, Conf. on SW for Critical Systems
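A quick check of the arithmetic (assuming perfect parallelism and independent runs): 10^10 hours spread over 10^4 computers is 10^6 hours per computer, and 10^6 / (24 × 365) ≈ 114 years.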

28
Glimmers of Hope for Measuring High Reliability
  • Random distribution of faults or failures would
    enable statistical reasoning and classic
    redundancy techniques
  • A whole more reliable than its parts
  • Randomization approaches
  • Blum: self-checking programs
  • Lipton: redundant computations
  • Podgurski: Kolmogorov complexity
  • Grail or illusion?
  • Difficult to generalize beyond simple functions

29
Process-Based Reliability Testing
  • Rather than relying only on properties of the
    program, we may use historical characteristics of
    the development process
  • Reliability growth models (Musa, Littlewood, et
    al) project reliability based on experience with
    the current system and previous similar systems

30
Fault-Based Testing
  • Given a fault model
  • a hypothesized set of deviations from the
    correct program
  • typically simple syntactic mutations; relies on
    the coupling of simple faults to complex faults
  • Coverage criterion: the test set should be
    adequate to reveal (all, or x%) faults generated
    by the model (sketch below)
  • similar to hardware test coverage
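A minimal mutation-testing sketch (the code and the specific mutation are illustrative, not from the slides): the criterion asks whether some test distinguishes each mutant from the original.

#include <assert.h>

static int max2(int a, int b)     { return a > b ? a : b; }

/* Mutant under a simple syntactic fault model: relational operator
 * replacement turns > into <. */
static int max2_mut(int a, int b) { return a < b ? a : b; }

int main(void) {
    /* The test (2, 2) does not kill the mutant: both return 2, so a
     * test set containing only it is mutation-inadequate. */
    assert(max2(2, 2) == max2_mut(2, 2));
    /* The test (1, 2) kills the mutant: 2 vs. 1. */
    assert(max2(1, 2) != max2_mut(1, 2));
    return 0;
}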

31
Fault Models
  • Fault models are key to semiconductor testing
  • Test vectors graded by coverage of accepted model
    of faults (e.g., stuck-at faults)
  • What are fault models for software?
  • What would a fault model look like?
  • How general would it be?
  • Across application domains?
  • Across organizations?
  • Across time?
  • Defect tracking is a start

32
The Budget Coverage Criterion
  • A common answer to "when is testing done?"
  • When the money is used up
  • When the deadline is reached
  • This is sometimes a rational approach!
  • Implication 1: test selection is more important
    than stopping criteria per se
  • Implication 2: practical comparison of approaches
    must consider the cost of test case selection

33
Test Selection: Standard Advice
  • Specification coverage is good for selection as
    well as adequacy
  • applicable to informal as well as formal specs
  • Fault-based tests
  • usually ad hoc, sometimes from check-lists
  • Program coverage last
  • to suggest uncovered cases, not just to achieve a
    coverage criterion

34
The Importance of Oracles
  • Much testing research has concentrated on
    adequacy, and ignored oracles
  • Much testing practice has relied on the eyeball
    oracle
  • Expensive, especially for regression testing
  • makes large numbers of tests infeasible
  • Not dependable
  • Automated oracles are essential to cost-effective
    testing

35
Sources of Oracles
  • Specifications
  • sufficiently formal (e.g., Z spec)
  • but possibly incomplete (e.g., assertions in
    Anna, ADL, APP, Nana; see the sketch after this
    list)
  • Design models
  • treated as specifications, as in protocol
    conformance testing
  • Prior runs (capture/replay)
  • especially important for regression testing and
    GUIs; the hard problem is parameterization
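A sketch of an assertion-based, deliberately incomplete oracle of the kind listed above (sort_ints is a hypothetical unit under test, assumed defined elsewhere): the postcondition checks sortedness without recomputing the answer, but says nothing about the output being a permutation of the input.

#include <assert.h>

void sort_ints(int *a, int n);   /* hypothetical unit under test */

void test_sort(int *a, int n) {
    sort_ints(a, n);
    for (int i = 1; i < n; i++)
        assert(a[i - 1] <= a[i]);   /* partial oracle: necessary,
                                     * not sufficient */
}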

36
What can be automated?
  • Oracles
  • assertions; replay; from some specifications
  • Selection (Generation)
  • scripting; specification-driven replay
    variations
  • selective regression test
  • Coverage
  • statement, branch, dependence
  • Management

37
Design for Test: 3 Principles
Adapted from circuit and chip design
  • Observability
  • Providing the right interfaces to observe the
    behavior of an individual unit or subsystem
  • Controllability
  • Providing interfaces to force behaviors of
    interest (sketched below)
  • Partitioning
  • Separating control and observation of one
    component from details of others
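A sketch of controllability through dependency injection (all names are hypothetical): the clock is passed in as an interface, so a test can force a timeout instead of waiting for one, and the result stays observable and deterministic.

typedef long (*clock_fn)(void);   /* injected time source */

int is_expired(long deadline, clock_fn now) {
    return now() >= deadline;     /* deterministic, observable result */
}

static long fake_now(void) { return 1000; }  /* test-controlled clock */

int main(void) {
    /* Force both behaviors of interest without any real waiting. */
    return is_expired(999, fake_now) == 1 &&
           is_expired(1001, fake_now) == 0 ? 0 : 1;
}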

38
Problems & Opportunities
  • Compositionality
  • for components; for regression
  • Specifications
  • low entry barrier, incremental payoff
  • Synergy with Analysis
  • conformance test w/ verified models
  • backstop for unsafe assumptions
  • (your idea here)

39
A Recent Success Story
  • The PREfix program analysis tool
  • Analyzes C/C++ sources
  • Scans for cleanness bugs, e.g., dereferences of
    NULL pointers
  • Symbolically executes the program on some paths
  • May miss some errors and generate false alarms
  • Tried on Windows 2000
  • Located 65,000 potential bugs, 28,000 of which
    are real bugs