Title: Software Testing
1. Software Testing

"There are only two hard problems in Computer Science: naming things, cache invalidation, and off-by-one errors." (Phil Haack)

"Program testing can be used to show the presence of bugs, but never to show their absence!" (Edsger Dijkstra)
2. Outline
- Foundations: motivations, terminology
- Principles and Concepts
- Levels of Testing
- Test Process
- Techniques
- Measures
- Deciding when to stop
3. Defects are Bad
- At a minimum, defects in software annoy users.
- Glitchy software reflects poorly on the company issuing the software.
- If defects aren't controlled during a software project, they increase the cost and duration of the project.
- For safety-critical systems the consequences can be even more severe.
4. Spectacular Failures
- Ariane 5, 1996: rocket and cargo worth $500M destroyed.
- Therac-25: software defects between 1985 and 1987 led to six accidents. Three patients died as a direct consequence.
- Patriot Missile, 1991: failed to destroy an Iraqi Scud missile, which hit a barracks.
5. Controlling Defects in Software
- There are two ways of dealing with the potential for defects in software.
- The most obvious is to work to identify and remove defects that make it into the software.
- Another approach, one that often goes unnoticed, is to stop making errors in the first place. In other words, take action to prevent defects from ever being injected. This second approach is called defect prevention.
- Testing is one method of uncovering defects in software. (Inspection is another.)
- Testing might not be the most efficient method of uncovering defects, but for many companies it is their primary means of ensuring quality.
6. What is Testing?
- Testing is the dynamic execution of the software for the purpose of uncovering defects.
- Testing is one technique for improving product quality. Don't confuse testing with other, distinct techniques for improving product quality:
  - Inspections and reviews (sometimes called static testing)
  - Debugging
  - Defect prevention
  - Quality assurance
  - Quality control
7. Testing and its Relationship to Other Activities
8. Benefits of Testing
- Testing improves product quality (at least when the defects that are revealed are fixed).
- The rate and number of defects found during testing give an indication of overall product quality. A high rate of defect detection suggests that product quality is low. Finding few errors after rigorous testing increases confidence in overall product quality. Such information can be used to decide the release date.
- Defect data from testing may suggest opportunities for process improvement, preventing certain types of defects from being introduced into future systems.
9. Errors, Faults, and Failures! Oh My!
- Error (or mistake): a human action or inaction that produces an incorrect result.
- Fault (or defect): the manifestation of an error in code or documentation.
- Failure: an incorrect result.
10. Software Bugs
- 1947 log book entry for the Harvard Mark II.
11. Verification and Validation
- Verification and validation are two complementary testing objectives.
- Verification: comparing program outcomes against a specification. Are we building the product right?
- Validation: comparing program outcomes against user expectations. Are we building the right product?
- Verification and validation are accomplished using both dynamic testing and static evaluation (peer review) techniques.
12. Review Questions
- What are the three benefits of testing? In other words, why test?
- Can you have a fault without a failure? Can you have a fault without an error?
13. Principles of Testing
- "Program testing can be used to show the presence of bugs, but never to show their absence!" (Edsger Dijkstra) He is speaking, of course, about non-trivial programs.
- Mindset is important. The goal of testing is to demonstrate that the system doesn't work correctly, not that the software meets its specification. You are trying to break it. If you approach testing with the attitude of trying to show that the software works correctly, you might unconsciously avoid difficult tests that threaten your assumption.
- Should programmers test their own code?
14. Organization: Who Should Do the Testing?
- Developers shouldn't system test their own code.
- There is no problem with developers unit testing their own code (they are probably the most qualified to do so), but experience shows programmers are too close to their code to do a good job of system testing it.
- Independent testers are more effective.
- Levels of independence: independent testers on a team; testers independent of the team; testers independent of the company.
15. The cost of finding and fixing a defect increases with the length of time the defect remains in the product.
16. Cost to Correct Late-Stage Defects
- For large projects, a requirements or design error is often 100 times more expensive to find and fix after the software is released than during the phase in which the error was injected.
17. Correspondence between Development and Different Opportunities for Verification and Validation
18. Two Dimensions to Testing
19. Levels of Testing
- Unit: testing individual cohesive units (modules). Usually white-box testing done by the programmer.
- Integration: verifying the interaction between software components. Integration testing is done on a regular basis during development (possibly once a day/week/month depending on the circumstances of the project). Architecture and design defects typically show up during integration.
- System: testing the behavior of the system as a whole. Testing against the requirements (system objectives and expected behavior). Also a good environment for testing non-functional software requirements such as usability, security, performance, etc.
- Acceptance: used to determine whether the system meets its acceptance criteria and is ready for release.
20. Other Types of Testing
- Regression testing
- Alpha and beta testing: limited release of a product to a few select customers for evaluation before the general release. The primary purpose of a beta test isn't to find defects but rather to assess how well the software works in the real world under a variety of conditions that are hard to simulate in the lab. Customers' impressions start to form during beta testing, so the product should have release-like quality.
- Stress testing, load testing, etc.
- Smoke test: a very brief test to determine whether or not there are obvious problems that would make more extensive testing futile.
21. Regression Testing
- Imagine adding a 24-inch lift kit and monster truck tires to your sensible sedan.
- After making the changes you would of course test the new and modified components, but is that all that should be tested? Not by a mile!
22. Regression Testing (cont.)
- When making changes to a complex system there is no reliable way of predicting which components might be affected. Therefore, it is imperative that at least a subset of tests be run on all components.
- In this analogy, that means testing the heater, air conditioner, radio, cup holders, speedometer... hmm, that's interesting: there seems to be a problem with the speedometer. It significantly understates the speed of the car.
- On closer inspection you discover the speedometer has a dependency on wheel size. The implementation of the speedometer makes an assumption about the wheel size and how far the car will move with each rotation of the tires. Larger wheels mean the car travels a greater distance with each revolution.
- Who would have predicted that? Good thing we performed regression testing.
23. Regression Testing (cont.)
- Making sure new code doesn't break old code.
- Regression testing is selective retesting. You want to ensure that changes or enhancements don't impair existing functionality.
- During regression testing you rerun a subset of all test cases on old code to make sure new code hasn't caused old code to regress or stop working properly.
- It's not uncommon for a change in one area of code to cause a problem in another area. Designs based on loose coupling can mitigate this tendency, but regression testing is still needed to increase assurance that there were no unintended consequences of a program change.
24. Testing Objectives
- Conformance testing (aka correctness or functional testing): does the observed behavior of the software conform to its specification (SRS)?
- Non-functional requirements testing: have non-functional requirements such as usability, performance, and reliability been met?
- Regression testing: does an addition or change break existing functionality?
- Stress testing: how well does the software hold up under heavy load and extreme circumstances?
- Installation testing: can the system be installed and configured with reasonable effort?
- Alpha/beta testing: how well does the software work under the myriad of real-world conditions?
- Acceptance testing: how well does the software work in the user's environment?
25. Integration Strategies
- What doesn't work?
  - All-at-once or "big bang": waiting until all of the components are ready before attempting to build the system for the first time. Not recommended.
- What does work?
  - Top-down: high-level components are integrated and tested before low-level components are complete. Example high-level components: the life-cycle methods of a component framework, the screen flow of a web application.
  - Bottom-up: low-level components are integrated and tested before top-level components. Example low-level components: an abstract interface onto a database, a component to display an animated image.
26. Advantages of Incremental/Continuous Integration
- Easier to find problems. If there is a problem during integration testing it is most likely related to the last component integrated; knowing this usually reduces the amount of code that has to be examined to find the source of the problem.
- Testing can begin sooner. Big bang testing postpones testing until the whole system is ready.
27. Top-Down Integration
- Stubs and mock objects are substituted for as-yet-unavailable lower-level components.
- Stubs: a stub is a unit of code that simulates the activity of a missing component. A stub has the same interface as the low-level component it emulates but is missing some or all of its full implementation. Stubs return minimal values that allow the top-level components to function.
- Mock objects: mock objects are stubs that simulate the behavior of real objects. The term "mock object" typically implies a bit more functionality than a stub. A stub may return pre-arranged responses; a mock object has more intelligence. It might simulate the behavior of the real object or make assertions of its own.
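The stub/mock distinction can be sketched in a few lines of Java. Everything here is invented for illustration: a high-level ReportService is integrated top-down against a MailSender component that does not exist yet, first with a do-nothing stub and then with a hand-rolled mock that records calls for later assertions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical low-level component that is not yet implemented.
interface MailSender {
    void send(String to, String body);
}

// Hypothetical high-level component under test.
class ReportService {
    private final MailSender sender;
    ReportService(MailSender sender) { this.sender = sender; }
    void emailReport(String to) { sender.send(to, "weekly report"); }
}

// Stub: same interface, minimal do-nothing body, just enough to let
// the top-level component run.
class MailSenderStub implements MailSender {
    public void send(String to, String body) { /* do nothing */ }
}

// Hand-rolled mock: records the calls it receives so a test can
// make assertions about how it was used.
class MailSenderMock implements MailSender {
    final List<String> recipients = new ArrayList<>();
    public void send(String to, String body) { recipients.add(to); }
}

public class TopDownIntegrationExample {
    public static void main(String[] args) {
        // The stub lets the high-level code execute at all.
        new ReportService(new MailSenderStub()).emailReport("a@example.com");

        // The mock additionally lets us verify the interaction.
        MailSenderMock mock = new MailSenderMock();
        new ReportService(mock).emailReport("b@example.com");
        System.out.println(mock.recipients);  // [b@example.com]
    }
}
```

In practice a mocking library would generate the mock class, but the idea is the same: the test asserts on the recorded interaction, not just the return value.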
28. Bottom-Up Integration
- Scaffolding code, or drivers, is used in place of high-level code.
- One advantage of bottom-up integration is that it can begin before the system architecture is in place.
- One disadvantage of bottom-up integration is that it postpones testing of the system architecture. This is risky because architecture is a critical aspect of a software system that needs to be verified early.
29. Continuous Integration
- Top-down and bottom-up describe how you are going to integrate.
- Continuous integration is about when, or how often, you are going to integrate.
- Continuous integration means frequent integration, where frequent means daily, maybe hourly, but not longer than weekly.
- You can't find integration problems early unless you integrate frequently.
30. Test Process
- Test planning
- Test case generation
- Test environment preparation
- Execution
- Test results evaluation
- Problem reporting
- Defect tracking
31. Testing Artifacts/Products
- Test plan: who is doing what, when.
- Test case specification: specification of actual test cases, including preconditions, inputs, and expected results.
- Test procedure specification: how to run test cases.
- Test log: the results of testing.
- Test incident report: records and tracks errors.
32. Test Plan
- "A document describing the scope, approach, resources, and schedule of intended test activities. It identifies test items, the features to be tested, the testing tasks, who will do each task, and any risks requiring contingency planning." (IEEE Std)
33. Test Case
- A test case consists of a set of input values, execution preconditions, expected results, and execution post-conditions, developed to cover a certain test condition.
34. Oracle
- When you run a test there has to be some way of determining whether the test failed.
- For every test there needs to be an oracle that compares expected output to actual output in order to determine whether the test failed.
- For tests that are executed manually, the tester is the oracle. For automated unit tests, actual and expected results are compared with code.
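A minimal sketch of a code-based oracle. The unit under test (add) and its expected value are invented for illustration; the point is only that in an automated test the comparison itself is code.

```java
public class OracleExample {
    // Hypothetical unit under test.
    static int add(int a, int b) { return a + b; }

    public static void main(String[] args) {
        // The oracle: the expected result is known in advance and
        // compared to the actual result by code, not by a person.
        int expected = 7;
        int actual = add(3, 4);
        System.out.println(actual == expected ? "PASS" : "FAIL");  // PASS
    }
}
```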
35. Test Procedure
- Detailed instructions for the setup, execution, and evaluation of results for a given test case.
36. Incident Reporting
- What you track depends on what you need to understand, control, and estimate.
- Example incident report
37. Testing Strategies
- Two very broad testing strategies are:
  - White-box (transparent): test cases are derived from knowledge of the design and/or implementation.
  - Black-box (opaque): test cases are derived from external software specifications.
38. Testing Strategies
39. Black-Box Techniques
- Equivalence partitioning: tests are divided into groups according to the criterion that two test cases are in the same group if both are likely to find the same error. Classes can be formed based on inputs or outputs.
- Boundary value analysis: create test cases with values that are on the edges of equivalence partitions.
40. Equivalence Partitioning
- What test cases would you use to test the following routine?

      // This routine returns true if score is >= 50%
      // of possiblePoints; else it returns false.
      // This routine throws an exception if either
      // input is negative or score > possiblePoints.
      boolean isPassing(int score, int possiblePoints)

      ID   Input    Expected Result
      --   ------   ---------------
      1    -1,-2    Exception
      2    50,100   true
      ...
41. Equivalence Classes
1. Score/Possible Pts >= 50% (valid)
2. Score/Possible Pts < 50% (valid)
3. Score > Possible Pts (invalid)
4. Score < 0 (invalid)
5. Possible Pts < 0 (invalid)
42. Test Cases

      Test Case   Test Case Data   Expected Outcome   Classes Covered
      1           5,10             true               1
      2           30,30            true               1
      3           19,40            false              2
      4           -1,10            Exception          4

- Write test cases covering all valid equivalence classes. Cover as many valid equivalence classes as you can with each test case. (Note: there are no overlapping equivalence classes in this example.)
- Write one and only one test case for each invalid equivalence class. When testing a value from an equivalence class that is expected to produce an invalid result, all other values should be valid. You want to isolate tests of invalid equivalence classes.
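These test cases can be made executable. The slide gives only the signature and comments of isPassing, so the body below is an assumption consistent with them (>= 50% passes; negative inputs or score > possiblePoints throw).

```java
public class IsPassingExample {
    // Sketch of the routine under test; only the signature and the
    // commented behavior come from the slide, the body is assumed.
    static boolean isPassing(int score, int possiblePoints) {
        if (score < 0 || possiblePoints < 0 || score > possiblePoints) {
            throw new IllegalArgumentException("invalid inputs");
        }
        // Multiply rather than divide to avoid integer-division trouble.
        return 2 * score >= possiblePoints;
    }

    public static void main(String[] args) {
        System.out.println(isPassing(5, 10));    // class 1: ratio >= 50% -> true
        System.out.println(isPassing(30, 30));   // class 1 again -> true
        System.out.println(isPassing(19, 40));   // class 2: ratio < 50% -> false
        try {
            isPassing(-1, 10);                   // class 4: score < 0
        } catch (IllegalArgumentException e) {
            System.out.println("Exception");
        }
    }
}
```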
43. Boundary Value Analysis
- Rather than selecting an arbitrary element within an equivalence class, select values at the edges of the equivalence class.
- For example, given the class 1 <= input <= 12, you would select the values 0, 1, 12, and 13.
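A minimal sketch of that example in Java. The isValidMonth routine is a hypothetical unit under test whose valid class is 1 <= input <= 12; the loop exercises the values just outside and on each edge.

```java
public class BoundaryValueExample {
    // Hypothetical unit under test for the class 1 <= input <= 12.
    static boolean isValidMonth(int input) {
        return input >= 1 && input <= 12;
    }

    public static void main(String[] args) {
        // Boundary values: just below, on the lower edge,
        // on the upper edge, just above.
        int[] boundaryValues = {0, 1, 12, 13};
        for (int v : boundaryValues) {
            System.out.println(v + " -> " + isValidMonth(v));
        }
    }
}
```

Off-by-one errors (using > instead of >=, for example) live exactly at these edges, which is why boundary values find them when a mid-class value like 6 would not.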
44. Experience-Based Techniques
- Error guessing: testers anticipate defects based on experience.
45. Testing Effectiveness Metrics
- Defect density
- Defect removal effectiveness (efficiency)
- Code coverage
46. Defect Density
- Software engineers often need to quantify how buggy a piece of software is. Defect counts alone are not very meaningful, though.
- Are 12 defects a lot to have in a program? It depends on the size of the product (as measured by features or LOC).
- 12 defects in a 200-line program is 60 defects/KLOC → low quality.
- 12 defects in a 20,000-line program is 0.6 defects/KLOC → high quality.
- Defect counts are more interesting (meaningful) when tracked relative to the size of the software.
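The normalization is a one-line calculation; a minimal sketch reproducing the two cases above:

```java
public class DefectDensityExample {
    // Defect density = total known defects / size in KLOC.
    static double defectDensity(int defects, int linesOfCode) {
        return defects * 1000.0 / linesOfCode;
    }

    public static void main(String[] args) {
        // Same defect count, very different quality signal:
        System.out.println(defectDensity(12, 200));    // 60.0 defects/KLOC
        System.out.println(defectDensity(12, 20000));  // 0.6 defects/KLOC
    }
}
```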
47. Defect Density (cont.)
- Defect density is an important measure of software quality.
- Defect density = total known defects / size.
- Defect density is often measured in defects/KLOC. (KLOC = thousand lines of code.)
- Dividing by size normalizes the measure, which allows comparison between modules of different sizes.
- Size is typically measured in LOC or FPs.
- Measurement is over a particular time period (e.g., from system test through one year after release).
- You might calculate defect density after inspections to decide which modules should be rewritten or given more focused testing.
- Be sure to define LOC. Also, consider weighting defects: a severe defect is worse than a trivial one.
- Used as a performance target, the measure gives the wrong incentive.
48. Defect Density (cont.)
- Defect density measures can be used to track product quality across multiple releases.
49. Defect Removal Effectiveness
- DRE tells you what percentage of the defects present are being found (at a certain point in time).
- Example: when you started system test there were 40 errors to be found. You found 30 of them. The defect removal effectiveness of system test is 30/40, or 75%.
- The trick, of course, is calculating the latent number of errors at any one point in the development process.
- Solution: to calculate the latent number of errors at time x, wait a certain period after time x to learn just how many errors were present at time x.
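The 30-out-of-40 example works out as follows; a minimal sketch:

```java
public class DefectRemovalExample {
    // DRE = (defects found in a phase) / (defects present entering it),
    // expressed as a percentage.
    static double removalEffectiveness(int found, int presentAtStart) {
        return 100.0 * found / presentAtStart;
    }

    public static void main(String[] args) {
        // The slide's example: 40 latent errors at the start of
        // system test, 30 of them found.
        System.out.println(removalEffectiveness(30, 40) + "%");  // 75.0%
    }
}
```

The denominator (defects present) is only knowable in hindsight, which is why DRE is computed after a waiting period, as the slide notes.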
50. Example Calculation of Defect Removal Effectiveness
51. Levels of White-Box Code Coverage
- Another important testing metric is code coverage: how thoroughly have paths through the code been tested?
- The options are:
  - Statement coverage
  - Decision coverage (aka branch coverage)
  - Condition coverage
  - Path coverage
52. Statement Coverage
- Each line of code is executed.

      if (a)
          stmt1
      if (b)
          stmt2

- a=T, b=T gives statement coverage.
- a=T, b=F doesn't give statement coverage.
53. Decision Coverage
- Decision coverage is also known as branch coverage.
- The boolean condition at every branch point (if, while, etc.) has been evaluated to both T and F.

      if (a and b)
          stmt1
      if (c)
          stmt2

- a=T, b=T, c=T and a=F, b=?, c=F gives decision coverage (b is a don't-care in the second case).
54. Does Statement Coverage Guarantee Decision Coverage?

      if (a)
          stmt1

- If no, give an example of input that gives statement coverage but not decision coverage.
55. Condition Coverage
- Each boolean sub-expression at a branch point has been evaluated to both true and false.

      if (a and b)
          stmt1

- a=T, b=T and a=F, b=F gives condition coverage.
56. Condition Coverage
- Does condition coverage guarantee decision coverage?

      if (a and b)
          stmt1

- If no, give example input that gives condition coverage but not decision coverage.
57. Path Coverage
- To achieve path coverage you need a set of test cases that executes every possible route through a unit of code.
- Path coverage is impractical for all but the most trivial units of code.
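One way to see why: n sequential, independent if-statements yield 2^n paths, since each condition is independently taken or skipped. A small sketch tabulating the growth:

```java
public class PathCountExample {
    public static void main(String[] args) {
        // Each of n sequential, independent ifs doubles the number of
        // routes through the unit, so the path count is 2^n.
        for (int n = 1; n <= 10; n++) {
            System.out.println(n + " ifs -> " + (1L << n) + " paths");
        }
    }
}
```

Loops make it worse still: a loop that may run 0..k times multiplies the path count by k+1, and an unbounded loop makes it infinite.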
58. Path Coverage
- How many paths are there in the following unit of code?

      if (a)
          stmt1
      if (b)
          stmt2
      if (c)
          stmt3
59. Path Coverage
- What inputs (test cases) are needed to achieve path coverage on the following code fragment?

      procedure AddTwoNumbers()
      top: print "Enter two numbers"
           read a
           read b
           print a+b
           if (a != -1) goto top
60. Deciding When to Stop Testing
- Stop when the marginal cost of finding another defect exceeds the expected loss from that defect.
- Both factors (the cost of finding another defect and the expected loss from that defect) can only be estimated.
- Stopping criteria should be determined at the start of a project. Why?
61. Peer Reviews
- Inspection
- Walkthrough
- Pair Programming
- Code Review
- Technical review vs. management review
62. Old Example
- Use equivalence partitioning to define test cases for the following function:

      // This function takes integer values for day,
      // month and year and returns the day of the
      // week in string format. The function returns
      // an empty string when given invalid input
      // values.
      // Year must be > 1752.
      // Example: DayOfWeek(12,31,2009) → "Thursday"
      // Example: DayOfWeek(13,13,2009) → ""
      String DayOfWeek(int month, int day, int year)
63. Equivalence Classes
1. Month < 1 (invalid)
2. Month > 12 (invalid)
3. Year > 1752 (valid)
4. Year < 1753 (invalid)
5. Month = 1, 0 < Day < 32 (valid)
6. Month = 1, Day >= 32 (invalid)
7. Month = 4, 0 < Day < 31 (valid)
8. Month = 4, Day >= 31 (invalid)
- Etc.
64. Test Cases

      Test Case   Test Case Data   Expected Outcome   Classes Covered
      1           1,1,2010         "Friday"           3, 5
      2           0,1,1999         ""                 1
      3           45,1,1999        ""                 2
      4           4,1,1752         ""                 4

- Write test cases covering all valid equivalence classes. Cover as many valid equivalence classes as you can with each test case.
- Write one and only one test case for each invalid equivalence class. When testing a value from an equivalence class that is expected to produce an invalid result, all other values should be valid. You want to isolate tests of invalid equivalence classes.
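A possible implementation of DayOfWeek built on java.time, given as a sketch; the body is an assumption, since only the signature and commented behavior come from the slide. The main method runs the test cases above.

```java
import java.time.LocalDate;
import java.time.format.TextStyle;
import java.util.Locale;

public class DayOfWeekExample {
    // Sketch of the slide's routine; java.time does the calendar work.
    static String dayOfWeek(int month, int day, int year) {
        if (year <= 1752) {
            return "";                       // year must be > 1752
        }
        try {
            return LocalDate.of(year, month, day)
                            .getDayOfWeek()
                            .getDisplayName(TextStyle.FULL, Locale.ENGLISH);
        } catch (java.time.DateTimeException e) {
            return "";                       // invalid month/day combination
        }
    }

    public static void main(String[] args) {
        System.out.println(dayOfWeek(1, 1, 2010));           // classes 3, 5: Friday
        System.out.println(dayOfWeek(0, 1, 1999).isEmpty()); // class 1: true
        System.out.println(dayOfWeek(4, 1, 1752).isEmpty()); // class 4: true
    }
}
```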