Title: Skoll: A System for Distributed Continuous Quality Assurance
1. Skoll: A System for Distributed Continuous Quality Assurance
- Atif Memon, Adam Porter
- University of Maryland
- {atif, aporter}@cs.umd.edu
2. Quality Assurance for Large-Scale Systems
- Modern systems are increasingly complex
- Run on numerous platform, compiler, and library combinations
- Have 10s, 100s, even 1000s of configuration options
- Are evolved incrementally by geographically-distributed teams
- Run atop other frequently changing systems
- Have multi-faceted quality objectives
- How do you QA systems like this?
3. Distributed Continuous Quality Assurance
- QA processes conducted around the world, around the clock on powerful, virtual computing grids
- Grids can be made up of end-user machines, project-wide resources, or dedicated computing clusters
- General Approach
- Divide QA processes into numerous tasks
- Intelligently distribute tasks to clients, who then execute them
- Merge and analyze incremental results to efficiently complete the desired QA process
- Expected benefits
- Massive parallelization allows more, better, faster QA
- Improved access to resources/environments not readily found in-house
- Carefully coordinated QA efforts enable more sophisticated analyses
4. Collaborators
- Doug Schmidt, Andy Gokhale
- Alex Orso
- Myra Cohen
- Murali Haran, Alan Karr, Mike Last, Ashish Sanil
- Sandro Fouché, Alan Sussman, Cemal Yilmaz (now at IBM TJ Watson), Il-Chul Yoon
5. Skoll DCQA Infrastructure Approach
Clients
See A. Porter, C. Yilmaz, A. Memon, A. Nagarajan, D. C. Schmidt, and B. Natarajan, "Skoll: A Process and Infrastructure for Distributed Continuous Quality Assurance," IEEE Transactions on Software Engineering, 33(8), pp. 510-525, August 2007.
6. Skoll DCQA Infrastructure Approach
Clients
1. Model
7. Skoll DCQA Infrastructure Approach
Clients
2. Reduce Model
8. Skoll DCQA Infrastructure Approach
Clients
3. Distribution
9. Skoll DCQA Infrastructure Approach
Clients
4. Feedback
10. Skoll DCQA Infrastructure Approach
Clients
5. Steering
11. The ACE+TAO+CIAO (ATC) System
- ATC characteristics
- 2M-line open-source CORBA implementation
- Maintained by 40 geographically-distributed developers
- 20,000 users worldwide
- Product-line architecture with 500 configuration options
- Runs on dozens of OS and compiler combinations
- Continuously evolving: 200 CVS commits per week
- Quality concerns include correctness, QoS, footprint, compilation time, and more
12. Define QA Space

Option                       | Type              | Settings
Operating System             | compile-time      | Linux, Windows XP, ...
TAO_HAS_MINIMUM_CORBA        | compile-time      | True, False
ORBCollocation               | runtime           | global, per-orb, no
ORBConnectionPurgingStrategy | runtime           | lru, lfu, fifo, null
ACE_version                  | component version | v5.4.3, v5.4.4, ...
TAO_version                  | component version | v1.4.3, v1.4.4, ...
run(ORT/run_test.pl)         | test case         | True, False

Constraints:
- (TAO_HAS_AMI) → ¬(TAO_HAS_MINIMUM_CORBA)
- run(ORT/run_test.pl) → ¬(TAO_HAS_MINIMUM_CORBA)
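As a minimal sketch, the QA-space model above can be encoded as options plus constraints and used to enumerate valid configurations. The option names mirror the table, but the option values listed here are a tiny illustrative subset, and reading the constraint glyph as "implies not" is an assumption.

```python
from itertools import product

# Hypothetical, tiny slice of the ATC QA space from the table above.
OPTIONS = {
    "TAO_HAS_AMI": [True, False],
    "TAO_HAS_MINIMUM_CORBA": [True, False],
    "ORBCollocation": ["global", "per-orb", "no"],
}

def violates_constraints(cfg):
    """Reading the slide's constraint as: AMI support rules out the
    minimum-CORBA build (an assumption about the garbled glyph)."""
    return cfg["TAO_HAS_AMI"] and cfg["TAO_HAS_MINIMUM_CORBA"]

def valid_configs():
    """Enumerate every constraint-satisfying configuration."""
    keys = list(OPTIONS)
    for combo in product(*(OPTIONS[k] for k in keys)):
        cfg = dict(zip(keys, combo))
        if not violates_constraints(cfg):
            yield cfg
```

Of the 2 x 2 x 3 = 12 raw combinations, the constraint removes the three with both AMI and minimum CORBA enabled, leaving nine valid configurations.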
13. Nearest Neighbor Search
17. Fault Characterization
- We used machine learning techniques (classification trees) to model the option-setting patterns that predict test failures
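The slide names classification trees; as a much-simplified sketch of the same idea, the snippet below ranks single option settings by failure rate over (configuration, pass/fail) observations, a one-level "stump" rather than a full tree. The option names and observations are invented for illustration.

```python
from collections import defaultdict

def characterize(observations):
    """observations: list of (config_dict, failed_bool) pairs.
    Return the (option, setting) with the highest failure rate,
    breaking ties by number of supporting observations."""
    stats = defaultdict(lambda: [0, 0])  # (opt, val) -> [fails, total]
    for cfg, failed in observations:
        for opt, val in cfg.items():
            stats[(opt, val)][1] += 1
            if failed:
                stats[(opt, val)][0] += 1
    return max(stats, key=lambda k: (stats[k][0] / stats[k][1], stats[k][1]))

# Hypothetical test outcomes: failures concentrate where
# ORBCollocation == "global".
obs = [
    ({"ORBCollocation": "global", "AMI": 1}, True),
    ({"ORBCollocation": "global", "AMI": 0}, True),
    ({"ORBCollocation": "per-orb", "AMI": 1}, False),
    ({"ORBCollocation": "no", "AMI": 0}, False),
]
```

A real classification tree would recurse on the remaining options to capture multi-option failure patterns; the ranking step above is only the root split.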
18. Applications: Feasibility Studies
- Compatibility testing of component-based systems
- Configuration-level fault characterization
- Test case generation and input space exploration
19. Compatibility Testing of Component-Based Systems
- Goal
- Given a component-based system, identify components and their specific versions that fail to build
- Solution Approach
- Sample the configuration space, efficiently test this sample, and identify subspaces in which compilation or installation fails
- Initial focus on building and installing components; later work will add functional and performance testing
See I. Yoon, A. Sussman, A. Memon, and A. Porter, "Direct-Dependency-based Software Compatibility Testing," International Conference on Automated Software Engineering, Nov. 2007 (to appear).
20. The InterComm (IC) Framework
- Middleware for coupling large scientific simulations
- Built from up to 14 other components (e.g., PVM, MPI, GCC, OS)
- Each component can have several actively maintained versions
- There are complex constraints between components, e.g.,
- Requires GCC version 2.96 or later
- When configured with multiple GNU compilers, all must have the same version number
- When configured with multiple components that use MPI, all must use the same implementation version
- http://www.cs.umd.edu/projects/hpsl/chaos/ResearchAreas/ic
- Developers need help to
- Identify working/broken configurations
- Broaden the working set (to increase the potential user base)
- Rationally manage support activities
21. Annotated Component Dependency Graph
- ACDG = (CDG, Ann)
- CDG: DAG capturing inter-component dependencies
- Ann: component versions and constraints
- Constraints for each cfg, e.g.,
- ver(gf) = x → ver(gcr) = x
- ver(gf) = 4.1.1 → ver(gmp) = 4.0
- Can generate cfgs from the ACDG
- 3552 total cfgs; takes up to 10,700 CPU hrs to build all
22. Improving Test Execution
- Cfgs often share common build subsequences; this build effort should be reusable across cfgs
- Combine all cfgs into a data structure called a prefix tree
- Execute the implied test plan across the grid by (1) assigning subpaths to clients, (2) building each subcfg in a VM, and (3) caching the VMs to enable reuse
- Example: with 8 machines, each able to cache up to 8 VMs, exhaustive testing takes up to 355 hours
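The prefix-tree idea above can be sketched in a few lines: if configurations are ordered component-version sequences, two configurations agreeing on their first k versions share k tree nodes, and each node corresponds to one build that can be VM-cached. The component and version names below are invented for illustration.

```python
def build_prefix_tree(cfgs):
    """Merge ordered version tuples into a nested-dict prefix tree."""
    root = {}
    for cfg in cfgs:
        node = root
        for ver in cfg:
            node = node.setdefault(ver, {})
    return root

def count_builds(tree):
    """Each tree node is one component build; the saving versus
    building every configuration from scratch is the reuse."""
    return sum(1 + count_builds(child) for child in tree.values())

# Three hypothetical IC configurations sharing a common gcc prefix.
cfgs = [("gcc-4.1", "mpich-1.2", "ic-1.5"),
        ("gcc-4.1", "mpich-1.2", "ic-1.6"),
        ("gcc-4.1", "lam-7.1", "ic-1.5")]
```

Building the three configurations independently costs 9 component builds; the prefix tree reduces this to 6, since the shared gcc and mpich prefixes are built once.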
23. Direct-Dependency (DD) Coverage
- Hypothesis: a component's build process is most likely to be affected by the components on which it directly depends
- A directly depends on B iff there is a path (in the CDG) from A to B containing no other component nodes
- Sampling approach
- Identify all DDs between every pair of components
- Identify all valid instantiations of these DDs (version combinations that violate no constraints)
- Select a (small) set of cfgs that cover all valid instantiations of the DDs
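The direct-dependency relation can be computed by a graph traversal that stops whenever it reaches a component node, so components only reachable through another component are excluded. The graph below is a made-up miniature; intermediate nodes stand in for whatever non-component nodes the real CDG contains.

```python
def direct_deps(edges, components, src):
    """Components reachable from src via paths containing no
    intermediate component nodes.
    edges: adjacency dict; components: set of component nodes."""
    found = set()
    stack = list(edges.get(src, []))
    seen = set(stack)
    while stack:
        n = stack.pop()
        if n in components:
            found.add(n)          # stop here: deeper comps are indirect
            continue
        for m in edges.get(n, []):
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return found

# Hypothetical mini-CDG: IC reaches PVM and GCC through a
# non-component node; GCC is also reachable *through* PVM, but that
# path does not make it a direct dependency of PVM's dependents.
edges = {"IC": ["api1"], "api1": ["PVM", "GCC"],
         "PVM": ["api2"], "api2": ["GCC"]}
comps = {"IC", "PVM", "GCC"}
```

Here IC directly depends on both PVM and GCC (via api1), while PVM directly depends only on GCC; covering all valid version pairs for these DD edges is far cheaper than covering whole-system configurations.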
24. Executing the DD Coverage Test Suite
- DD test suite much smaller than exhaustive: 211 cfgs with 649 comps vs. 3552 cfgs with 9919 comps
- For IC, no loss of test effectiveness (same build failures exposed)
- Speedups achieved using 8 machines with an 8-VM cache
- Actual case: 2.54x (18 vs. 43 hrs)
- Best case: 14.69x (52 vs. 355 hrs)
25. Summary
- Infrastructure in place and working
- Complete client/server implementation using VMware
- Simulator for large-scale tests on limited resources
- Initial results promising, but lots of work remains
- Ongoing activities
- Alternative algorithms and test execution policies
- More theoretical study of sampling and test execution approaches
- Apply to more software systems
26. Configuration-Level Fault Characterization
- Goal
- Help developers localize configuration-related faults
- Current Solution Approach
- Use covering arrays to sample the cfg space, then test for subspaces in which (1) compilation fails or (2) regression tests fail
- Build models that characterize the configuration options and specific settings that define the failing subspace
See C. Yilmaz, M. Cohen, and A. Porter, "Covering Arrays for Efficient Fault Characterization in Complex Configuration Spaces," ISSTA '04; also IEEE TSE, 32(1).
27. Covering Arrays
- Compute test schedule from t-way covering arrays: a set of configurations in which all ordered t-tuples of option settings appear at least once
- 2-way covering array example

     C1 C2 C3 C4 C5 C6 C7 C8 C9
O1    0  0  0  1  1  1  2  2  2
O2    0  1  2  0  1  2  0  1  2
O3    0  1  2  1  2  0  2  0  1
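The covering property of the example array is easy to check mechanically: for every pair of options, collect the setting pairs appearing across the nine configurations and verify all nine possible pairs occur. The sketch below does exactly that.

```python
from itertools import combinations, product

# The 2-way covering array from the slide: option -> setting per
# configuration C1..C9.
array = {
    "O1": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "O2": [0, 1, 2, 0, 1, 2, 0, 1, 2],
    "O3": [0, 1, 2, 1, 2, 0, 2, 0, 1],
}

def is_covering(array, t=2, levels=(0, 1, 2)):
    """True iff every t-tuple of settings appears for every
    t-subset of options."""
    for group in combinations(array, t):
        seen = set(zip(*(array[o] for o in group)))
        if not set(product(levels, repeat=t)) <= seen:
            return False
    return True
```

The array is 2-way covering, but not 3-way: covering all 27 triples of three 3-valued options needs at least 27 configurations, and this array has only 9, which is why strength must be chosen up front (the limitation the next slide discusses).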
28. Limitations
- Must choose the strength of the covering array before computing it
- No way to know, a priori, what the right value is
- Our experience suggests failure patterns can change over time
- Choose too high:
- Run more tests than necessary
- Testing might not finish before the next release
- Non-uniform sample negatively affects classification performance
- Choose too low:
- Non-uniform sample negatively affects classification techniques
- Must repeat the process at higher strength
29. Incremental Covering Arrays
- Start with traditional covering array(s) of low strength (usually 2)
- Execute the test schedule and classify observed failures
- If resources allow or classification performance requires:
- Increment the strength
- Build a new covering array using previously run array(s) as seeds
See S. Fouché, M. Cohen, and A. Porter, "Towards Incremental Adaptive Covering Arrays," ESEC/FSE 2007 (to appear).
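The seeding idea above, building a higher-strength array without discarding already-run configurations, can be sketched with a simple greedy constructor. This is a generic one-row-at-a-time heuristic, not the algorithm from the cited paper; the candidate count and parameters are arbitrary.

```python
from itertools import combinations, product
import random

def uncovered(rows, opts, levels, t):
    """All (option-group, setting-tuple) pairs not yet covered."""
    need = set()
    for group in combinations(range(opts), t):
        seen = {tuple(r[i] for i in group) for r in rows}
        for tup in product(range(levels), repeat=t):
            if tup not in seen:
                need.add((group, tup))
    return need

def greedy_ca(opts, levels, t, seeds=()):
    """Greedy t-way covering array; seed rows (e.g., a previously
    run lower-strength array) are kept and extended, so earlier
    test effort is not wasted."""
    rows = [list(s) for s in seeds]
    rng = random.Random(0)
    need = uncovered(rows, opts, levels, t)
    while need:
        best, best_gain = None, -1
        for _ in range(50):  # sample candidate rows, keep the best
            cand = [rng.randrange(levels) for _ in range(opts)]
            gain = sum(1 for g, tup in need
                       if tuple(cand[i] for i in g) == tup)
            if gain > best_gain:
                best, best_gain = cand, gain
        rows.append(best)
        need = uncovered(rows, opts, levels, t)
    return rows
```

Constructing a 2-way array first and then passing it as `seeds` to a 3-way construction mirrors the incremental process: the 3-way array contains every previously executed configuration plus only the top-up rows.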
30. Incremental Covering Arrays (cont.)
- Multiple CAs at each level of t
- Use t_1 as a seed for the first (t+1)-way array, (t+1)_1
- To create the i-th t-way array t_i, create a seed of size |(t-1)_1| using non-seeded cfgs from (t-1)_i
- If |seed| < |(t-1)_1|, complete the seed with cfgs from (t+1)_1
31. MySQL Case Study
- Project background
- Widely-used, 2M-line open-source database project
- Continuously evolving; maintained by geographically-distributed developers
- Dozens of cfg opts; runs on dozens of OS/compiler combos
- Case study using release 5.0.24
- Used 13 cfg opts with 2-12 settings each (> 110k unique cfgs)
- 460 tests per config across a grid of 50 machines
- Executed 50M tests total using 25 CPU years
32. Results
- Built 3 traditional and 3 incremental covering arrays for 2 ≤ t ≤ 4
- Traditional sizes: 108, 324, 870
- Incremental sizes: 113, 336 (223), 932 (596)
- Incremental approach exposed and classified the same failures as the traditional approach
- Costs depend on t and failure patterns
- Failures at level t: Inc > Trad (4-9)
- Failures at level < t: Inc < Trad (65-87)
- Failures at level > t: Inc < Trad (28-38)
33. Summary
- New application driving infrastructure improvements
- Initial results encouraging
- Applied process to a configuration space with over 110K cfgs
- Found many test failures corresponding to real bugs
- Incremental approach more flexible than the traditional approach; appears to offer substantial savings in the best case, while incurring minimal cost in the worst case
- Ongoing extensions
- MySQL continuous build process
- Community involvement starting
- Want to volunteer? Go to http://www.cs.umd.edu
34. GUI Test Cases Executable By A Robot
- JFCUnit
- Other interactions
- Exponential with length
- Capture/replay
- Tedious
- Test common sequences
- Bad Idea
- Model-based techniques
- GUITAR
- guitar.cs.umd.edu
35. Modeling The Event-Interaction Space: Sampling
- Event flow graph (EFG)
- Nodes: all GUI events
- Starting events
- Edges: "follows" relationship
- Reverse engineering: obtained automatically
- Test case generation: cover all edges
See Atif M. Memon and Qing Xie, "Studying the Fault-Detection Effectiveness of GUI Test Cases for Rapidly Evolving Software," IEEE Transactions on Software Engineering, 31(10), pp. 884-896, 2005.
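The EFG and its edge-coverage criterion can be sketched as an adjacency map where each "follows" edge <e1, e2> becomes one length-2 test case. The events and edges below are a made-up toy GUI, not GUITAR's actual model representation.

```python
# Toy event-flow graph: event -> events that may follow it.
efg = {
    "File":  ["Open", "Save"],
    "Open":  ["File"],
    "Save":  ["File"],
    "Edit":  ["Copy", "Paste"],
    "Copy":  ["Edit"],
    "Paste": ["Edit"],
}

def edge_covering_tests(efg):
    """One length-2 test case per EFG edge covers all edges."""
    return [(e1, e2) for e1, followers in efg.items() for e2 in followers]
```

For this toy graph the suite has exactly one test per edge (8 tests); covering longer interactions instead of edges is what explodes exponentially, motivating the sampling on the next slides.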
36. Let's See How It Works!
- Point to the CVS head
- Push the button
- Read the error report
- What happens
- Gets code from CVS head
- Builds
- Reverse engineers the event-flow graph
- Generates test cases to cover all the edges (2-way covering)
- Runs them
- SourceForge.net
- Four applications
37. Digging Deeper!
- Intuition
- Non-interacting events (e.g., Save, Find)
- Interacting events (e.g., Copy, Paste)
- Key Idea
- Identify interacting events
- Mark the EFG edges (annotated graph)
- Generate 3-way, 4-way, ... covering test cases for interacting events only
EFG
38. Identifying Interacting Events
- High-level overview of approach
- Observe how events execute on the GUI
- Events interact if they influence one another's execution
- Execute event e2; execute event sequence <e1, e2>
- Did e1 influence e2's execution?
- If YES, then they must be tested further: annotate the <e1, e2> edge in the graph
- Use feedback
- Generate seed suite: 2-way covering test cases
- Run test cases
- Need to obtain sets of GUI states
- Collect GUI run-time states as feedback
- Analyze feedback and obtain interacting event sets
- Generate new test cases: 3-way, 4-way, ... covering test cases
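The interaction test above, comparing the GUI state after e2 alone with the state after <e1, e2>, can be sketched as follows. States are plain dicts of widget properties and `execute` is a stand-in for the real GUI test harness; the Copy/Paste semantics below are invented for illustration.

```python
def interacts(execute, initial, e1, e2):
    """e1 and e2 interact if running e1 first changes what e2
    does to the GUI state. execute(state, event) -> new state."""
    alone = execute(dict(initial), e2)
    after_pair = execute(execute(dict(initial), e1), e2)
    return alone != after_pair

def execute(state, event):
    """Stand-in for the GUI harness: a toy clipboard model."""
    if event == "Copy":
        state["clipboard"] = state.get("selection", "")
    elif event == "Paste":
        state["text"] = state.get("text", "") + state.get("clipboard", "")
    return state

init = {"selection": "abc", "text": "", "clipboard": ""}
```

Here Paste alone appends the empty clipboard, while Copy-then-Paste appends "abc", so <Copy, Paste> is flagged as an interacting edge worth higher-strength testing; <Paste, Copy> leaves the final state unchanged and is not.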
39. Did We Do Better?
- Compare feedback-based approach to 2-way coverage
40. Summary
- Manually developed test cases (JFCUnit, capture/replay)
- Can be deployed and executed by a robot
- Too many interactions to test (exponential)
- The GUITAR Approach
- Develop a model of all possible interactions
- Use abstraction techniques to sample the model
- Develop adequacy criteria
- Generate an initial test suite; develop an "execute tests, collect feedback, annotate model, generate tests" cycle
- Feasibility study results
41. Future Work
- Need volunteers for the MySQL Build Farm Project
- http://skoll.cs.umd.edu
- Looking for more example systems (help!)
- Continue improving the Skoll system
- New problem classes
- Performance and robustness optimization
- Improved use of test data
- Test case ROI analysis
- Configuration advice
- Cost-aware testing (e.g., minimize power, network, disk)
- Use source code analysis to further reduce state spaces
- Extend test generation technology outside GUI applications
- QA for distributed systems
42. The End