Title: Skoll: A System for Distributed Continuous Quality Assurance
1. Skoll: A System for Distributed Continuous Quality Assurance
- Atif Memon, Adam Porter
- University of Maryland
- {atif, aporter}@cs.umd.edu
2. Quality Assurance for Large-Scale Systems
- Modern systems are increasingly complex
- Run on numerous platform, compiler, and library combinations
- Have 10s, 100s, even 1000s of configuration options
- Are evolved incrementally by geographically-distributed teams
- Run atop other frequently changing systems
- Have multi-faceted quality objectives
- How do you QA systems like this?
3. Distributed Continuous Quality Assurance
- QA processes conducted around the world, around the clock on powerful, virtual computing grids
- Grids can be made up of end-user machines, project-wide resources, or dedicated computing clusters
- General Approach
- Divide QA processes into numerous tasks
- Intelligently distribute tasks to clients, who then execute them
- Merge and analyze incremental results to efficiently complete the desired QA process
- Expected benefits
- Massive parallelization allows more, better, faster QA
- Improved access to resources/environments not readily found in-house
- Carefully coordinated QA efforts enable more sophisticated analyses
4. Collaborators
- Doug Schmidt, Andy Gokhale
- Alex Orso
- Myra Cohen
- Murali Haran, Alan Karr, Mike Last, Ashish Sanil
- Sandro Fouché, Alan Sussman, Cemal Yilmaz (now at IBM TJ Watson), Il-Chul Yoon
5. Skoll DCQA Infrastructure Approach
Clients
See A. Porter, C. Yilmaz, A. Memon, A. Nagarajan, D. C. Schmidt, and B. Natarajan, "Skoll: A Process and Infrastructure for Distributed Continuous Quality Assurance," IEEE Transactions on Software Engineering, 33(8), pp. 510-525, August 2007.
6. Skoll DCQA Infrastructure Approach
Clients
1. Model
7. Skoll DCQA Infrastructure Approach
Clients
2. Reduce Model
8. Skoll DCQA Infrastructure Approach
Clients
3. Distribution
9. Skoll DCQA Infrastructure Approach
Clients
4. Feedback
10. Skoll DCQA Infrastructure Approach
Clients
5. Steering
11. The ACE+TAO+CIAO (ATC) System
- ATC characteristics
- 2M-line open-source CORBA implementation
- Maintained by 40 geographically-distributed developers
- 20,000 users worldwide
- Product-line architecture with 500 configuration options
- Runs on dozens of OS and compiler combinations
- Continuously evolving: 200 CVS commits per week
- Quality concerns include correctness, QoS, footprint, compilation time, and more
12. Define QA Space

Option                       | Type              | Settings
Operating System             | compile-time      | Linux, Windows XP, ...
TAO_HAS_MINIMUM_CORBA        | compile-time      | True, False
ORBCollocation               | runtime           | global, per-orb, no
ORBConnectionPurgingStrategy | runtime           | lru, lfu, fifo, null
ACE_version                  | component version | v5.4.3, v5.4.4, ...
TAO_version                  | component version | v1.4.3, v1.4.4, ...
run(ORT/run_test.pl)         | test case         | True, False

Constraints:
- (TAO_HAS_AMI) → ¬(TAO_HAS_MINIMUM_CORBA)
- run(ORT/run_test.pl) → ¬(TAO_HAS_MINIMUM_CORBA)
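As a minimal sketch, the QA-space model above can be encoded as options plus constraints and used to enumerate valid configurations. The option names mirror the table, but the option values listed here are a tiny illustrative subset, and reading the constraint glyph as "implies not" is an assumption.

```python
from itertools import product

# Hypothetical, tiny slice of the ATC QA space from the table above.
OPTIONS = {
    "TAO_HAS_AMI": [True, False],
    "TAO_HAS_MINIMUM_CORBA": [True, False],
    "ORBCollocation": ["global", "per-orb", "no"],
}

def violates_constraints(cfg):
    """Reading the slide's constraint as: AMI support rules out the
    minimum-CORBA build (an assumption about the garbled glyph)."""
    return cfg["TAO_HAS_AMI"] and cfg["TAO_HAS_MINIMUM_CORBA"]

def valid_configs():
    """Enumerate every constraint-satisfying configuration."""
    keys = list(OPTIONS)
    for combo in product(*(OPTIONS[k] for k in keys)):
        cfg = dict(zip(keys, combo))
        if not violates_constraints(cfg):
            yield cfg
```

Of the 2 x 2 x 3 = 12 raw combinations, the constraint removes the three with both AMI and minimum CORBA enabled, leaving nine valid configurations.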
13. Nearest Neighbor Search
17. Fault Characterization
- We used machine learning techniques (classification trees) to model the option-setting patterns that predict test failures
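The slide names classification trees; as a much-simplified sketch of the same idea, the snippet below ranks single option settings by failure rate over (configuration, pass/fail) observations, a one-level "stump" rather than a full tree. The option names and observations are invented for illustration.

```python
from collections import defaultdict

def characterize(observations):
    """observations: list of (config_dict, failed_bool) pairs.
    Return the (option, setting) with the highest failure rate,
    breaking ties by number of supporting observations."""
    stats = defaultdict(lambda: [0, 0])  # (opt, val) -> [fails, total]
    for cfg, failed in observations:
        for opt, val in cfg.items():
            stats[(opt, val)][1] += 1
            if failed:
                stats[(opt, val)][0] += 1
    return max(stats, key=lambda k: (stats[k][0] / stats[k][1], stats[k][1]))

# Hypothetical test outcomes: failures concentrate where
# ORBCollocation == "global".
obs = [
    ({"ORBCollocation": "global", "AMI": 1}, True),
    ({"ORBCollocation": "global", "AMI": 0}, True),
    ({"ORBCollocation": "per-orb", "AMI": 1}, False),
    ({"ORBCollocation": "no", "AMI": 0}, False),
]
```

A real classification tree would recurse on the remaining options to capture multi-option failure patterns; the ranking step above is only the root split.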
18. Applications: Feasibility Studies
- Compatibility testing of component-based systems
- Configuration-level fault characterization
- Test case generation and input space exploration
19. Compatibility Testing of Component-Based Systems
- Goal
- Given a component-based system, identify components and their specific versions that fail to build
- Solution Approach
- Sample the configuration space, efficiently test this sample, and identify subspaces in which compilation or installation fails
- Initial focus on building and installing components; later work will add functional and performance testing
See I. Yoon, A. Sussman, A. Memon, and A. Porter, "Direct-Dependency-based Software Compatibility Testing," International Conference on Automated Software Engineering, Nov. 2007 (to appear).
20. The InterComm (IC) Framework
- Middleware for coupling large scientific simulations
- Built from up to 14 other components (e.g., PVM, MPI, GCC, OS)
- Each component can have several actively maintained versions
- There are complex constraints between components, e.g.,
- Requires GCC version 2.96 or later
- When configured with multiple GNU compilers, all must have the same version number
- When configured with multiple components that use MPI, all must use the same implementation version
- http://www.cs.umd.edu/projects/hpsl/chaos/ResearchAreas/ic
- Developers need help to
- Identify working/broken configurations
- Broaden the working set (to increase the potential user base)
- Rationally manage support activities
21. Annotated Component Dependency Graph
- ACDG = (CDG, Ann)
- CDG: DAG capturing inter-component dependencies
- Ann: component versions and constraints
- Constraints for each cfg, e.g.,
- ver(gf) = x → ver(gcr) = x
- ver(gf) = 4.1.1 → ver(gmp) = 4.0
- Can generate cfgs from the ACDG
- 3552 total cfgs; takes up to 10,700 CPU hrs to build all
22. Improving Test Execution
- Cfgs often share common build subsequences; this build effort should be reusable across cfgs
- Combine all cfgs into a data structure called a prefix tree
- Execute the implied test plan across the grid by (1) assigning subpaths to clients, (2) building each subcfg in a VM, and (3) caching the VMs to enable reuse
- Example: with 8 machines, each able to cache up to 8 VMs, exhaustive testing takes up to 355 hours
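The prefix-tree idea above can be sketched in a few lines: if configurations are ordered component-version sequences, two configurations agreeing on their first k versions share k tree nodes, and each node corresponds to one build that can be VM-cached. The component and version names below are invented for illustration.

```python
def build_prefix_tree(cfgs):
    """Merge ordered version tuples into a nested-dict prefix tree."""
    root = {}
    for cfg in cfgs:
        node = root
        for ver in cfg:
            node = node.setdefault(ver, {})
    return root

def count_builds(tree):
    """Each tree node is one component build; the saving versus
    building every configuration from scratch is the reuse."""
    return sum(1 + count_builds(child) for child in tree.values())

# Three hypothetical IC configurations sharing a common gcc prefix.
cfgs = [("gcc-4.1", "mpich-1.2", "ic-1.5"),
        ("gcc-4.1", "mpich-1.2", "ic-1.6"),
        ("gcc-4.1", "lam-7.1", "ic-1.5")]
```

Building the three configurations independently costs 9 component builds; the prefix tree reduces this to 6, since the shared gcc and mpich prefixes are built once.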
23. Direct-Dependency (DD) Coverage
- Hypothesis: a component's build process is most likely to be affected by the components on which it directly depends
- A directly depends on B iff there is a path (in the CDG) from A to B containing no other component nodes
- Sampling approach
- Identify all DDs between every pair of components
- Identify all valid instantiations of these DDs (version combinations that violate no constraints)
- Select a (small) set of cfgs that cover all valid instantiations of the DDs
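The direct-dependency relation can be computed by a graph traversal that stops whenever it reaches a component node, so components only reachable through another component are excluded. The graph below is a made-up miniature; intermediate nodes stand in for whatever non-component nodes the real CDG contains.

```python
def direct_deps(edges, components, src):
    """Components reachable from src via paths containing no
    intermediate component nodes.
    edges: adjacency dict; components: set of component nodes."""
    found = set()
    stack = list(edges.get(src, []))
    seen = set(stack)
    while stack:
        n = stack.pop()
        if n in components:
            found.add(n)          # stop here: deeper comps are indirect
            continue
        for m in edges.get(n, []):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return found

# Hypothetical mini-CDG: IC reaches PVM and GCC through a
# non-component node; GCC is also reachable *through* PVM, but that
# path does not make it a direct dependency of PVM's dependents.
edges = {"IC": ["api1"], "api1": ["PVM", "GCC"],
         "PVM": ["api2"], "api2": ["GCC"]}
comps = {"IC", "PVM", "GCC"}
```

Here IC directly depends on both PVM and GCC (via api1), while PVM directly depends only on GCC; covering all valid version pairs for these DD edges is far cheaper than covering whole-system configurations.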
24. Executing the DD Coverage Test Suite
- DD test suite much smaller than exhaustive: 211 cfgs with 649 comps vs. 3552 cfgs with 9919 comps
- For IC, no loss of test effectiveness (same build failures exposed)
- Speedups achieved using 8 machines with an 8-VM cache
- Actual case: 2.54x (18 vs. 43 hrs)
- Best case: 14.69x (52 vs. 355 hrs)
25. Summary
- Infrastructure in place and working
- Complete client/server implementation using VMware
- Simulator for large-scale tests on limited resources
- Initial results promising, but lots of work remains
- Ongoing activities
- Alternative algorithms and test execution policies
- More theoretical study of sampling and test execution approaches
- Apply to more software systems
26. Configuration-Level Fault Characterization
- Goal
- Help developers localize configuration-related faults
- Current Solution Approach
- Use covering arrays to sample the cfg space, then test for subspaces in which (1) compilation fails or (2) regression tests fail
- Build models that characterize the configuration options and specific settings that define the failing subspace
See C. Yilmaz, M. Cohen, and A. Porter, "Covering Arrays for Efficient Fault Characterization in Complex Configuration Spaces," ISSTA '04; also IEEE TSE, 32(1).
27. Covering Arrays
- Compute test schedule from t-way covering arrays: a set of configurations in which all ordered t-tuples of option settings appear at least once
- 2-way covering array example

     C1 C2 C3 C4 C5 C6 C7 C8 C9
O1    0  0  0  1  1  1  2  2  2
O2    0  1  2  0  1  2  0  1  2
O3    0  1  2  1  2  0  2  0  1
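The covering property of the example array is easy to check mechanically: for every pair of options, collect the setting pairs appearing across the nine configurations and verify all nine possible pairs occur. The sketch below does exactly that.

```python
from itertools import combinations, product

# The 2-way covering array from the slide: option -> setting per
# configuration C1..C9.
array = {
    "O1": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "O2": [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "O3": [0, 1, 2, 1, 2, 0, 2, 0, 1],
}

def is_covering(array, t=2, levels=(0, 1, 2)):
    """True iff every t-tuple of settings appears for every
    t-subset of options."""
    for group in combinations(array, t):
        seen = set(zip(*(array[o] for o in group)))
        if not set(product(levels, repeat=t)) <= seen:
            return False
    return True
```

The array is 2-way covering, but not 3-way: covering all 27 triples of three 3-valued options needs at least 27 configurations, and this array has only 9, which is why strength must be chosen up front (the limitation the next slide discusses).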
28. Limitations
- Must choose the strength of the covering array before computing it
- No way to know, a priori, what the right value is
- Our experience suggests failure patterns can change over time
- Choose too high:
- Run more tests than necessary
- Testing might not finish before the next release
- Non-uniform sample negatively affects classification performance
- Choose too low:
- Non-uniform sample negatively affects classification techniques
- Must repeat the process at higher strength
29. Incremental Covering Arrays
- Start with traditional covering array(s) of low strength (usually 2)
- Execute the test schedule and classify observed failures
- If resources allow or classification performance requires:
- Increment the strength
- Build a new covering array using previously run array(s) as seeds
See S. Fouché, M. Cohen, and A. Porter, "Towards Incremental Adaptive Covering Arrays," ESEC/FSE 2007 (to appear).
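The seeding idea above, building a higher-strength array without discarding already-run configurations, can be sketched with a simple greedy constructor. This is a generic one-row-at-a-time heuristic, not the algorithm from the cited paper; the candidate count and parameters are arbitrary.

```python
from itertools import combinations, product
import random

def uncovered(rows, opts, levels, t):
    """All (option-group, setting-tuple) pairs not yet covered."""
    need = set()
    for group in combinations(range(opts), t):
        seen = {tuple(r[i] for i in group) for r in rows}
        for tup in product(range(levels), repeat=t):
            if tup not in seen:
                need.add((group, tup))
    return need

def greedy_ca(opts, levels, t, seeds=()):
    """Greedy t-way covering array; seed rows (e.g., a previously
    run lower-strength array) are kept and extended, so earlier
    test effort is not wasted."""
    rows = [list(s) for s in seeds]
    rng = random.Random(0)
    need = uncovered(rows, opts, levels, t)
    while need:
        best, best_gain = None, -1
        for _ in range(50):  # sample candidate rows, keep the best
            cand = [rng.randrange(levels) for _ in range(opts)]
            gain = sum(1 for g, tup in need
                       if tuple(cand[i] for i in g) == tup)
            if gain > best_gain:
                best, best_gain = cand, gain
        rows.append(best)
        need = uncovered(rows, opts, levels, t)
    return rows
```

Constructing a 2-way array first and then passing it as `seeds` to a 3-way construction mirrors the incremental process: the 3-way array contains every previously executed configuration plus only the top-up rows.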
30. Incremental Covering Arrays (cont.)
- Multiple CAs at each level of t
- Use t_1 as a seed for the first (t+1)-way array, (t+1)_1
- To create the i-th t-way array t_i, create a seed of size |(t-1)_1| using non-seeded cfgs from (t-1)_i
- If |seed| < |(t-1)_1|, complete the seed with cfgs from (t+1)_1
31. MySQL Case Study
- Project background
- Widely-used, 2M-line open-source database project
- Continuously evolving; maintained by geographically-distributed developers
- Dozens of cfg opts; runs on dozens of OS/compiler combos
- Case study using release 5.0.24
- Used 13 cfg opts with 2-12 settings each (> 110k unique cfgs)
- 460 tests per config across a grid of 50 machines
- Executed 50M tests total using 25 CPU years
32. Results
- Built 3 traditional and 3 incremental covering arrays for 2 ≤ t ≤ 4
- Traditional sizes: 108, 324, 870
- Incremental sizes: 113, 336 (223), 932 (596)
- Incremental approach exposed and classified the same failures as the traditional approach
- Costs depend on t and failure patterns
- Failures at level t: Inc > Trad (4-9)
- Failures at level < t: Inc < Trad (65-87)
- Failures at level > t: Inc < Trad (28-38)
33. Summary
- New application driving infrastructure improvements
- Initial results encouraging
- Applied process to a configuration space with over 110K cfgs
- Found many test failures corresponding to real bugs
- Incremental approach more flexible than the traditional approach; appears to offer substantial savings in the best case, while incurring minimal cost in the worst case
- Ongoing extensions
- MySQL continuous build process
- Community involvement starting
- Want to volunteer? Go to http://www.cs.umd.edu
34. GUI Test Cases Executable By A Robot
- JFCUnit
- Other interactions
- Exponential with length
- Capture/replay
- Tedious
- Test common sequences
- Bad Idea
- Model-based techniques
- GUITAR
- guitar.cs.umd.edu
35. Modeling The Event-Interaction Space: Sampling
- Event flow graph (EFG)
- Nodes: all GUI events
- Starting events
- Edges: "follows" relationship
- Reverse engineering: obtained automatically
- Test case generation: cover all edges
See Atif M. Memon and Qing Xie, "Studying the Fault-Detection Effectiveness of GUI Test Cases for Rapidly Evolving Software," IEEE Transactions on Software Engineering, 31(10), pp. 884-896, 2005.
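The EFG and its edge-coverage criterion can be sketched as an adjacency map where each "follows" edge <e1, e2> becomes one length-2 test case. The events and edges below are a made-up toy GUI, not GUITAR's actual model representation.

```python
# Toy event-flow graph: event -> events that may follow it.
efg = {
    "File":  ["Open", "Save"],
    "Open":  ["File"],
    "Save":  ["File"],
    "Edit":  ["Copy", "Paste"],
    "Copy":  ["Edit"],
    "Paste": ["Edit"],
}

def edge_covering_tests(efg):
    """One length-2 test case per EFG edge covers all edges."""
    return [(e1, e2) for e1, followers in efg.items() for e2 in followers]
```

For this toy graph the suite has exactly one test per edge (8 tests); covering longer interactions instead of edges is what explodes exponentially, motivating the sampling on the next slides.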
36. Let's See How It Works!
- Point to the CVS head
- Push the button
- Read the error report
- What happens
- Gets code from CVS head
- Builds
- Reverse engineers the event-flow graph
- Generates test cases to cover all the edges (2-way covering)
- Runs them
- SourceForge.net
- Four applications
37. Digging Deeper!
- Intuition
- Non-interacting events (e.g., Save, Find)
- Interacting events (e.g., Copy, Paste)
- Key Idea
- Identify interacting events
- Mark the EFG edges (annotated graph)
- Generate 3-way, 4-way, ... covering test cases for interacting events only
EFG
38. Identifying Interacting Events
- High-level overview of approach
- Observe how events execute on the GUI
- Events interact if they influence one another's execution
- Execute event e2; execute event sequence <e1, e2>
- Did e1 influence e2's execution?
- If YES, then they must be tested further: annotate the <e1, e2> edge in the graph
- Use feedback
- Generate seed suite: 2-way covering test cases
- Run test cases
- Need to obtain sets of GUI states
- Collect GUI run-time states as feedback
- Analyze feedback and obtain interacting event sets
- Generate new test cases: 3-way, 4-way, ... covering test cases
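The interaction test above, comparing the GUI state after e2 alone with the state after <e1, e2>, can be sketched as follows. States are plain dicts of widget properties and `execute` is a stand-in for the real GUI test harness; the Copy/Paste semantics below are invented for illustration.

```python
def interacts(execute, initial, e1, e2):
    """e1 and e2 interact if running e1 first changes what e2
    does to the GUI state. execute(state, event) -> new state."""
    alone = execute(dict(initial), e2)
    after_pair = execute(execute(dict(initial), e1), e2)
    return alone != after_pair

def execute(state, event):
    """Stand-in for the GUI harness: a toy clipboard model."""
    if event == "Copy":
        state["clipboard"] = state.get("selection", "")
    elif event == "Paste":
        state["text"] = state.get("text", "") + state.get("clipboard", "")
    return state

init = {"selection": "abc", "text": "", "clipboard": ""}
```

Here Paste alone appends the empty clipboard, while Copy-then-Paste appends "abc", so <Copy, Paste> is flagged as an interacting edge worth higher-strength testing; <Paste, Copy> leaves the final state unchanged and is not.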
39. Did We Do Better?
- Compare feedback-based approach to 2-way coverage
40. Summary
- Manually developed test cases (JFCUnit, capture/replay)
- Can be deployed and executed by a robot
- Too many interactions to test (exponential)
- The GUITAR Approach
- Develop a model of all possible interactions
- Use abstraction techniques to sample the model
- Develop adequacy criteria
- Generate an initial test suite; develop an "execute tests, collect feedback, annotate model, generate tests" cycle
- Feasibility study results
41. Future Work
- Need volunteers for the MySQL Build Farm Project
- http://skoll.cs.umd.edu
- Looking for more example systems (help!)
- Continue improving the Skoll system
- New problem classes
- Performance and robustness optimization
- Improved use of test data
- Test case ROI analysis
- Configuration advice
- Cost-aware testing (e.g., minimize power, network, disk)
- Use source code analysis to further reduce state spaces
- Extend test generation technology outside GUI applications
- QA for distributed systems
42. The End