1
Skoll: A System for Distributed Continuous Quality Assurance
  • Atif Memon & Adam Porter
  • University of Maryland
  • {atif,aporter}@cs.umd.edu

2
Quality Assurance for Large-Scale Systems
  • Modern systems are increasingly complex
  • Run on numerous platform, compiler & library combinations
  • Have 10s, 100s, even 1000s of configuration options
  • Are evolved incrementally by geographically-distributed teams
  • Run atop other frequently changing systems
  • Have multi-faceted quality objectives
  • How do you QA systems like this?

3
Distributed Continuous Quality Assurance
  • QA processes conducted around-the-world, around-the-clock on powerful, virtual computing grids
  • Grids can be made up of end-user machines, project-wide resources or dedicated computing clusters
  • General approach (see the sketch after this list)
  • Divide QA processes into numerous tasks
  • Intelligently distribute tasks to clients who then execute them
  • Merge and analyze incremental results to efficiently complete the desired QA process
  • Expected benefits
  • Massive parallelization allows more, better & faster QA
  • Improved access to resources/environments not readily found in-house
  • Carefully coordinated QA efforts enable more sophisticated analyses
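A minimal sketch of this divide/distribute/merge loop, in Python. Everything here (make_tasks, run_on_client, the fake pass/fail result) is an illustrative assumption, not the actual Skoll implementation.

```python
# Illustrative DCQA loop: divide a QA process into per-configuration
# tasks, "distribute" them to clients, then merge the results.
# (Hypothetical names; not the Skoll API.)
import itertools
import random

def make_tasks(config_space):
    """One task per configuration; real tasks could also build, benchmark, etc."""
    options, settings = zip(*sorted(config_space.items()))
    for combo in itertools.product(*settings):
        yield dict(zip(options, combo))

def run_on_client(task):
    """Stand-in for shipping a task to a volunteer machine."""
    return {"config": task, "passed": random.random() > 0.1}  # fake result

def dcqa(config_space):
    results = [run_on_client(task) for task in make_tasks(config_space)]
    return [r["config"] for r in results if not r["passed"]]  # merge & analyze

if __name__ == "__main__":
    space = {"OS": ["linux", "win"], "OPT": ["on", "off"]}
    print(dcqa(space))
```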

4
Collaborators
Doug Schmidt & Andy Gokhale
Alex Orso
Myra Cohen
Murali Haran, Alan Karr, Mike Last, Ashish Sanil
Sandro Fouché, Alan Sussman, Cemal Yilmaz (now at IBM TJ Watson) & Il-Chul Yoon
5
Skoll DCQA Infrastructure Approach
(Figure: Skoll clients and server)
See A. Porter, C. Yilmaz, A. Memon, A. Nagarajan, D. C. Schmidt, and B. Natarajan, "Skoll: A Process and Infrastructure for Distributed Continuous Quality Assurance," IEEE Transactions on Software Engineering, 33(8), pp. 510-525, August 2007.
6
Skoll DCQA Infrastructure Approach
1. Model
7
Skoll DCQA Infrastructure Approach
2. Reduce Model
8
Skoll DCQA Infrastructure Approach
3. Distribution
9
Skoll DCQA Infrastructure Approach
4. Feedback
10
Skoll DCQA Infrastructure Approach
5. Steering
11
The ACE+TAO+CIAO (ATC) System
  • ATC characteristics
  • 2M-line open-source CORBA implementation
  • maintained by 40 geographically-distributed developers
  • 20,000 users worldwide
  • Product-line architecture with 500 configuration options
  • runs on dozens of OS and compiler combinations
  • Continuously evolving: 200 CVS commits per week
  • Quality concerns include correctness, QoS, footprint, compilation time & more

12
Define QA Space
Options                        Type                 Settings
Operating System               compile-time         Linux, Windows XP, ...
TAO_HAS_MINIMUM_CORBA          compile-time         True, False
ORBCollocation                 runtime              global, per-orb, no
ORBConnectionPurgingStrategy   runtime              lru, lfu, fifo, null
ACE_version                    component version    v5.4.3, v5.4.4, ...
TAO_version                    component version    v1.4.3, v1.4.4, ...
run(ORT/run_test.pl)           test case            True, False

Constraints:
(TAO_HAS_AMI) ⇒ ¬(TAO_HAS_MINIMUM_CORBA)
run(ORT/run_test.pl) ⇒ ¬(TAO_HAS_MINIMUM_CORBA)
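A minimal sketch of how such a QA space can be encoded and enumerated. The Python encoding below (OPTIONS, CONSTRAINTS, valid_configs) is an illustrative assumption rather than Skoll's model format, and it includes only a few of the options above.

```python
# Encode a small slice of the QA space: options with their settings,
# plus constraints that rule out invalid configurations.
import itertools

OPTIONS = {
    "TAO_HAS_AMI": [True, False],
    "TAO_HAS_MINIMUM_CORBA": [True, False],
    "ORBCollocation": ["global", "per-orb", "no"],
}

# A constraint returns True when a configuration satisfies it,
# e.g. TAO_HAS_AMI implies not TAO_HAS_MINIMUM_CORBA.
CONSTRAINTS = [
    lambda c: (not c["TAO_HAS_AMI"]) or (not c["TAO_HAS_MINIMUM_CORBA"]),
]

def valid_configs():
    names, settings = zip(*OPTIONS.items())
    for combo in itertools.product(*settings):
        cfg = dict(zip(names, combo))
        if all(check(cfg) for check in CONSTRAINTS):
            yield cfg

print(sum(1 for _ in valid_configs()))  # 9 of the 12 raw combinations are valid
```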
13
Nearest Neighbor Search
17
Fault Characterization
  • We used machine learning techniques (classification trees) to model option-setting patterns that predict test failures (see the sketch below)
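A minimal sketch of this idea using scikit-learn's DecisionTreeClassifier (a stand-in chosen for illustration; the slide does not say which tool was used). The data is fabricated for the example: each row is a tested configuration plus whether its tests failed.

```python
# Fit a classification tree on (configuration, pass/fail) records and
# print the option-setting rules that predict failure.
from sklearn.tree import DecisionTreeClassifier, export_text

# Features: [ORBCollocation (0=global,1=per-orb,2=no), MINIMUM_CORBA (0/1)]
X = [[0, 0], [0, 1], [1, 0], [1, 1], [2, 0], [2, 1]]
y = [0, 1, 0, 1, 0, 0]  # 1 = test failure observed in that configuration

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["ORBCollocation", "MINIMUM_CORBA"]))
# The printed rules are the failure-predicting option-setting patterns.
```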

18
Applications & Feasibility Studies
  • Compatibility testing of component-based systems
  • Configuration-level fault characterization
  • Test case generation & input space exploration

19
Compatibility Testing of Component-Based Systems
  • Goal
  • Given a component-based system, identify components & their specific versions that fail to build
  • Solution approach
  • Sample the configuration space, efficiently test this sample & identify subspaces in which compilation & installation fail
  • Initial focus on building & installing components; later work will add functional and performance testing

See I. Yoon, A. Sussman, A. Memon & A. Porter, "Direct-Dependency-based Software Compatibility Testing," International Conference on Automated Software Engineering, Nov. 2007 (to appear).
20
The InterComm (IC) Framework
  • Middleware for coupling large scientific simulations
  • Built from up to 14 other components (e.g., PVM, MPI, GCC, OS)
  • Each component can have several actively maintained versions
  • There are complex constraints between components, e.g.,
  • Requires GCC version 2.96 or later
  • When configured with multiple GNU compilers, all must have the same version number
  • When configured with multiple components that use MPI, all must use the same implementation & version
  • http://www.cs.umd.edu/projects/hpsl/chaos/ResearchAreas/ic
  • Developers need help to
  • Identify working/broken configurations
  • Broaden the working set (to increase the potential user base)
  • Rationally manage support activities

21
Annotated Component Dependency Graph
  • ACDG = (CDG, Ann)
  • CDG: a DAG capturing inter-component dependencies
  • Ann: component versions & constraints
  • Constraints for each cfg, e.g.,
  • ver(gf) = x ⇒ ver(gcr) = x
  • ver(gf) = 4.1.1 ⇒ ver(gmp) ≥ 4.0
  • Can generate cfgs from the ACDG
  • 3552 total cfgs; takes up to 10,700 CPU hrs to build all

22
Improving Test Execution
  • Cfgs often share common build subsequences; this build effort should be reusable across cfgs
  • Combine all cfgs into a data structure called a prefix tree (see the sketch after this list)
  • Execute the implied test plan across the grid by (1) assigning subpaths to clients, (2) building each subcfg in a VM & caching the VMs to enable reuse
  • Example: with 8 machines, each able to cache up to 8 VMs, exhaustive testing takes up to 355 hours
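A minimal sketch of the prefix-tree idea (illustrative code with made-up component versions, not the Skoll implementation): configurations that share a build prefix share a path in the tree, so the shared components are built, and their VM states cached, once per path instead of once per configuration.

```python
# Merge configurations (ordered component-build sequences) into a
# prefix tree; each tree node corresponds to one build step / cached VM.
def build_prefix_tree(configs):
    root = {}
    for cfg in configs:
        node = root
        for component in cfg:            # e.g. ("gcc-4.1", "mpich-1.2", ...)
            node = node.setdefault(component, {})
    return root

def count_builds(tree):
    return sum(1 + count_builds(child) for child in tree.values())

configs = [
    ("gcc-4.1", "mpich-1.2", "ic-1.5"),
    ("gcc-4.1", "mpich-1.2", "ic-1.6"),  # reuses the gcc+mpich prefix
    ("gcc-4.1", "lam-7.1", "ic-1.5"),    # reuses the gcc prefix
]
print(count_builds(build_prefix_tree(configs)))  # 6 builds instead of 9
```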

23
Direct-Dependency (DD) Coverage
  • Hypothesis: a component's build process is most likely to be affected by the components on which it directly depends
  • A directly depends on B iff there is a path (in the CDG) from A to B containing no intermediate component nodes
  • Sampling approach (see the sketch after this list)
  • Identify all DDs between every pair of components
  • Identify all valid instantiations of these DDs (version combinations that violate no constraints)
  • Select a (small) set of cfgs that cover all valid instantiations of the DDs
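A minimal sketch of the direct-dependency computation (the graph encoding and node names are assumptions for illustration): starting from a component, follow edges through non-component nodes and stop at the first component reached, since anything past it is an indirect dependency.

```python
# Find direct dependencies in a dependency DAG whose paths may pass
# through non-component nodes (e.g. interface nodes).
GRAPH = {  # adjacency list; a toy system, not the real InterComm ACDG
    "IC": ["iface1"], "iface1": ["PVM", "MPI"],
    "PVM": ["GCC"], "MPI": ["GCC"], "GCC": [],
}
COMPONENTS = {"IC", "PVM", "MPI", "GCC"}

def direct_deps(src):
    found, stack = set(), list(GRAPH[src])
    while stack:
        node = stack.pop()
        if node in COMPONENTS:
            found.add(node)             # stop here: deeper deps are indirect
        else:
            stack.extend(GRAPH[node])   # pass through non-component nodes
    return found

print(direct_deps("IC"))  # {'PVM', 'MPI'}; GCC is only an indirect dependency
```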

24
Executing the DD Coverage Test Suite
  • The DD test suite is much smaller than the exhaustive one
  • 211 cfgs with 649 comps vs. 3552 cfgs with 9919 comps
  • For IC, no loss of test effectiveness (same build failures exposed)
  • Speedups achieved using 8 machines with an 8-VM cache
  • Actual case: 2.54x (18 vs. 43 hrs)
  • Best case: 14.69x (52 vs. 355 hrs)

25
Summary
  • Infrastructure in place & working
  • Complete client/server implementation using VMware
  • Simulator for large-scale tests on limited resources
  • Initial results promising, but lots of work remains
  • Ongoing activities
  • Alternative algorithms & test execution policies
  • More theoretical study of sampling & test execution approaches
  • Apply to more software systems

26
Configuration-Level Fault Characterization
  • Goal
  • Help developers localize configuration-related faults
  • Current solution approach
  • Use covering arrays to sample the cfg space & test for subspaces in which (1) compilation fails or (2) regression tests fail
  • Build models that characterize the configuration options and specific settings that define the failing subspace

See C. Yilmaz, M. Cohen & A. Porter, "Covering Arrays for Efficient Fault Characterization in Complex Configuration Spaces," ISSTA '04 and TSE 32(1).
27
Covering Arrays
  • Compute test schedule from t-way covering arrays
  • a set of configurations in which all ordered
    t-tuples of option settings appear at least once
  • 2-way covering array example

           Configurations
      C1  C2  C3  C4  C5  C6  C7  C8  C9
  O1   0   0   0   1   1   1   2   2   2
  O2   0   1   2   0   1   2   0   1   2
  O3   0   1   2   1   2   0   2   0   1
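A minimal sketch (illustrative, not the authors' tooling) that verifies the 2-way property of the array above: for every pair of options, every pair of settings appears in at least one configuration.

```python
# Check that a set of configurations forms a t-way covering array.
from itertools import combinations, product

# Columns C1..C9 from the table, as (O1, O2, O3) triples.
CONFIGS = [(0, 0, 0), (0, 1, 1), (0, 2, 2), (1, 0, 1), (1, 1, 2),
           (1, 2, 0), (2, 0, 2), (2, 1, 0), (2, 2, 1)]
SETTINGS = [3, 3, 3]  # each option has settings 0..2

def is_covering(configs, settings, t=2):
    for opts in combinations(range(len(settings)), t):
        needed = set(product(*[range(settings[o]) for o in opts]))
        seen = {tuple(cfg[o] for o in opts) for cfg in configs}
        if needed - seen:
            return False
    return True

print(is_covering(CONFIGS, SETTINGS))  # True: all 2-tuples are covered
```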
28
Limitations
  • Must choose the strength of the covering array before computing it
  • No way to know, a priori, what the right value is
  • Our experience suggests failure patterns can change over time
  • Choose too high:
  • Run more tests than necessary
  • Testing might not finish before the next release
  • Non-uniform sample negatively affects classification performance
  • Choose too low:
  • Non-uniform sample negatively affects classification techniques
  • Must repeat the process at higher strength

29
Incremental Covering Arrays
  • Start with traditional covering array(s) of low strength (usually 2)
  • Execute test schedule & classify observed failures
  • If resources allow or classification performance requires:
  • Increment the strength
  • Build a new covering array using previously run array(s) as seeds (see the sketch below)

See S. Fouché, M. Cohen and A. Porter, "Towards Incremental Adaptive Covering Arrays," ESEC/FSE 2007 (to appear).
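A minimal sketch of the seeding idea, using a naive greedy construction as a stand-in for the authors' algorithm: previously run configurations form the seed, and new configurations are added only for t-tuples the seed leaves uncovered, so earlier test effort is never wasted.

```python
# Greedily extend a seed of already-run configurations to full t-way
# coverage. Real covering-array tools optimize far better than this.
from itertools import combinations, product

def missing_tuples(configs, settings, t):
    miss = set()
    for opts in combinations(range(len(settings)), t):
        needed = set(product(*[range(settings[o]) for o in opts]))
        seen = {tuple(c[o] for o in opts) for c in configs}
        miss |= {(opts, tup) for tup in needed - seen}
    return miss

def extend_to_strength(seed, settings, t):
    configs = [tuple(c) for c in seed]
    while True:
        miss = missing_tuples(configs, settings, t)
        if not miss:
            return configs
        opts, tup = min(miss)            # pick any uncovered tuple
        new = [0] * len(settings)
        for o, v in zip(opts, tup):
            new[o] = v                   # other options default to 0
        configs.append(tuple(new))

seed = [(0, 0, 0), (1, 1, 1), (2, 2, 2)]  # e.g. a previously run array
print(len(extend_to_strength(seed, [3, 3, 3], t=2)))  # seed kept, gaps filled
```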
30
Incremental Covering Arrays (cont.)
  • Multiple CAs at each level of t (t_i denotes the ith covering array of strength t)
  • Use t_1 as a seed for the first (t+1)-way array, (t+1)_1
  • To create the ith t-way array, t_i, create a seed of size |(t-1)_1| using non-seeded cfgs from (t-1)_i
  • If the seed is smaller than |(t-1)_1|, complete it with cfgs from (t+1)_1

31
MySQL Case Study
  • Project background
  • Widely-used, 2M-line open-source database project
  • Continuously evolving & maintained by geographically-distributed developers
  • Dozens of cfg opts; runs on dozens of OS/compiler combos
  • Case study using release 5.0.24
  • Used 13 cfg opts with 2-12 settings each (> 110k unique cfgs)
  • 460 tests per config across a grid of 50 machines
  • Executed 50M tests total using 25 CPU years

32
Results
  • Built 3 traditional and incremental covering arrays for 2 ≤ t ≤ 4
  • Traditional sizes: 108, 324, 870
  • Incremental sizes: 113, 336 (223), 932 (596)
  • The incremental approach exposed & classified the same failures as the traditional approach
  • Costs depend on t & failure patterns
  • Failures at level t: Inc > Trad (4-9)
  • Failures at level < t: Inc < Trad (65-87)
  • Failures at level > t: Inc < Trad (28-38)
33
Summary
  • New application driving infrastructure improvements
  • Initial results encouraging
  • Applied the process to a configuration space with over 110K cfgs
  • Found many test failures corresponding to real bugs
  • Incremental approach more flexible than the traditional approach; appears to offer substantial savings in the best case, while incurring minimal cost in the worst case
  • Ongoing extensions
  • MySQL continuous build process
  • Community involvement starting
  • Want to volunteer? Go to http://www.cs.umd.edu

34
GUI Test Cases Executable By A Robot
  • JFCUnit
  • Other interactions: exponential with length
  • Capture/replay
  • Tedious; tests only common sequences (a bad idea)
  • Model-based techniques
  • GUITAR
  • guitar.cs.umd.edu

35
Modeling & Sampling The Event-Interaction Space
  • Event-flow graph (EFG)
  • Nodes: all GUI events & starting events
  • Edges: the "follows" relationship
  • Reverse engineering: obtained automatically
  • Test case generation: cover all edges (see the sketch after the citation below)

See Atif M. Memon and Qing Xie, "Studying the Fault-Detection Effectiveness of GUI Test Cases for Rapidly Evolving Software," IEEE Transactions on Software Engineering, vol. 31, no. 10, 2005, pp. 884-896.
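A minimal sketch of edge-coverage test generation over an EFG (the graph below is a toy; GUITAR reverse-engineers the real one from the application): every edge e1 -> e2 yields one length-2 test case <e1, e2>, giving 2-way event coverage.

```python
# Generate one <e1, e2> test case per EFG edge. An edge (e1, e2) means
# event e2 can immediately follow event e1 in the GUI.
EFG = {  # toy event-flow graph
    "File.Open": ["Edit.Copy", "Edit.Paste"],
    "Edit.Copy": ["Edit.Paste", "File.Open"],
    "Edit.Paste": ["File.Open"],
}

def edge_covering_tests(efg):
    return [(e1, e2) for e1, follows in efg.items() for e2 in follows]

for test in edge_covering_tests(EFG):
    print(" -> ".join(test))  # each line is one executable test case
```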
36
Let's See How It Works!
  • Point to the CVS head
  • Push the button
  • Read the error report
  • What happens:
  • Gets code from the CVS head
  • Builds
  • Reverse engineers the event-flow graph
  • Generates test cases to cover all the edges (2-way covering)
  • Runs them
  • SourceForge.net: four applications
37
Digging Deeper!
  • Intuition
  • Non-interacting events (e.g., Save, Find)
  • Interacting events (e.g., Copy, Paste)
  • Key idea
  • Identify interacting events
  • Mark the EFG edges (annotated graph)
  • Generate 3-way, 4-way, ... covering test cases for interacting events only
(Figure: annotated EFG)
38
Identifying Interacting Events
  • High-level overview of the approach (see the sketch after this list)
  • Observe how events execute on the GUI
  • Events interact if they influence one another's execution
  • Execute event e2; then execute event sequence <e1, e2>
  • Did e1 influence e2's execution?
  • If YES, then they must be tested further; annotate the <e1, e2> edge in the graph
  • Use feedback
  • Generate a seed suite: 2-way covering test cases
  • Run test cases
  • Need to obtain sets of GUI states: collect GUI run-time states as feedback
  • Analyze feedback and obtain interacting event sets
  • Generate new test cases: 3-way, 4-way, ... covering test cases
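A minimal sketch of the interaction check. FakeGUI is a toy stand-in for real GUI instrumentation, and the check is deliberately coarse: any difference between the state after e2 alone and the state after <e1, e2> flags the pair as interacting.

```python
# Flag <e1, e2> as interacting when running e1 first changes the GUI
# state that results from e2.
class FakeGUI:
    """Toy GUI: 'Copy' fills a clipboard that 'Paste' consumes."""
    def reset(self):
        self.clip, self.doc = "", ""
    def run_event(self, event):
        if event == "Copy":
            self.clip = "X"
        elif event == "Paste":
            self.doc += self.clip
        # 'Save' and other events leave this toy state untouched
    def state(self):
        return (self.clip, self.doc)

def run_and_capture(events, gui):
    gui.reset()
    for e in events:
        gui.run_event(e)
    return gui.state()  # snapshot of the GUI's run-time state

def interacting(e1, e2, gui):
    return run_and_capture([e2], gui) != run_and_capture([e1, e2], gui)

gui = FakeGUI()
print(interacting("Copy", "Paste", gui))  # True: annotate the <Copy, Paste> edge
print(interacting("Save", "Paste", gui))  # False: no annotation needed
```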

39
Did We Do Better?
  • Compare the feedback-based approach to 2-way coverage

40
Summary
  • Manually developed test cases (JFCUnit, capture/replay) can be deployed and executed by a robot
  • Too many interactions to test: exponential
  • The GUITAR approach
  • Develop a model of all possible interactions
  • Use abstraction techniques to sample the model
  • Develop adequacy criteria
  • Generate an initial test suite; develop an "execute tests, collect feedback, annotate model, generate tests" cycle
  • Feasibility study results

41
Future Work
  • Need volunteers for the MySQL Build Farm Project
  • http://skoll.cs.umd.edu
  • Looking for more example systems (help!)
  • Continue improving the Skoll system
  • New problem classes
  • Performance and robustness optimization
  • Improved use of test data
  • Test case ROI analysis
  • Configuration advice
  • Cost-aware testing (e.g., minimize power, network, disk)
  • Use source code analysis to further reduce state spaces
  • Extend test generation technology outside GUI applications
  • QA for distributed systems

42
The End