Title: A Model-based Distributed Continuous QA Process to Enhance the QoS of Evolving Performance-intensive Software Systems
1. A Model-based Distributed Continuous QA Process to Enhance the QoS of Evolving Performance-intensive Software Systems
Cemal Yilmaz
2. Motivating Example: ACE+TAO
- ACE+TAO characteristics
  - QoS-enabled middleware
  - Large user community: 20,000+ users worldwide
  - Large code base: 2M+ lines of C++ code
  - Geographically distributed developers
  - Continuous evolution: 200+ CVS commits per week
  - Highly configurable program family
    - Over 500 configuration options
    - Dozens of OS, compiler, and hardware platform combinations
- Current QoS assurance process
  - Assess performance on very few configurations and extrapolate to the entire configuration space
  - Allows performance bottlenecks and sources of QoS degradation to escape detection until the system is fielded
3. Persistent Challenges and Emerging Opportunities
- Configuration space explosion
  - Context: One-size-fits-all solutions often have unacceptable QoS
  - Problem: Many potential system configurations to test
  - Solution approach: the Skoll DCQA environment
- Evaluating QoS
  - Context: Configuration space explosion
  - Problem: Handcrafted QA tasks are tedious and error-prone
  - Solution approach: the Benchmark Generation Modeling Language (BGML)
- Assessing QoS across large configuration spaces
  - Context: Identifying the effects of changes on QoS
  - Problem: Brute-force processes are still infeasible
  - Solution approach: main effects screening
4. Skoll Distributed Continuous QA
- Goal: Leverage remote computing resources and network ubiquity for distributed, continuous QA
- Vision: QA processes conducted around-the-world, around-the-clock on a powerful, virtual computing grid provided by thousands of user machines during off-peak hours
- Generic Skoll DCQA process (sketched below)
  - Distributed
  - Opportunistic
  - Adaptive
- We are currently building the infrastructure, tools, and algorithms for developing and executing thorough, transparent, managed, adaptive DCQA processes
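To make the generic process concrete, here is a minimal sketch of the loop a Skoll-style client might run on a volunteered machine. The server URL, endpoint names, job format, and benchmark invocation are all hypothetical illustrations under stated assumptions, not Skoll's actual protocol.

    import json
    import subprocess
    import urllib.request

    SERVER = "http://skoll.example.org"  # hypothetical coordination server

    def fetch_job():
        # Ask the server for the next QA task: a configuration to benchmark.
        with urllib.request.urlopen(f"{SERVER}/next-job") as resp:
            return json.load(resp)  # e.g. {"id": 42, "options": {"A": 1, "B": 0}}

    def run_benchmark(options):
        # Run the benchmark executable with the assigned configuration.
        args = [f"--{name}={value}" for name, value in options.items()]
        result = subprocess.run(["./benchmark"] + args,
                                capture_output=True, text=True)
        return result.stdout  # raw QoS measurements (latency, throughput, ...)

    def report(job_id, results):
        # Return the measurements so the server can adapt the running process.
        payload = json.dumps({"id": job_id, "results": results}).encode()
        req = urllib.request.Request(f"{SERVER}/report", data=payload,
                                     headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)

    # Opportunistic: volunteered machines run this loop during off-peak hours.
    while True:
        job = fetch_job()
        report(job["id"], run_benchmark(job["options"]))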
5. Persistent Challenges and Emerging Opportunities (roadmap recap; see Slide 3)
6. BGML
- Model-driven benchmarking tool
- Visually model interaction scenarios between configuration options and system components
- Automate benchmarking code generation and reuse of QA task code across configurations
- Generate scripts to distribute and execute the experiments and to monitor QoS behavior
- Enable evaluation of multiple performance metrics (e.g., throughput, latency, and jitter), as in the harness sketched below
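As a rough illustration of what the generated benchmarking code measures, the sketch below times a stream of requests and derives the three metrics named on the slide. It is not BGML output; send_request is a placeholder for the generated client/server interaction code.

    import statistics
    import time

    def run_benchmark(send_request, n_requests=10_000):
        # Time a stream of requests; send_request performs one
        # client-to-server round trip.
        latencies = []
        start = time.perf_counter()
        for _ in range(n_requests):
            t0 = time.perf_counter()
            send_request()
            latencies.append((time.perf_counter() - t0) * 1000.0)  # msec
        elapsed = time.perf_counter() - start
        return {
            "latency_msec": statistics.mean(latencies),
            "throughput_events_per_sec": n_requests / elapsed,
            "jitter_msec": statistics.stdev(latencies),  # latency variability
        }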
7. Persistent Challenges and Emerging Opportunities (roadmap recap; see Slide 3)
8. Main Effects Screening
- Goal: Efficiently improve developers' visibility into QoS across large configuration spaces
- Phase 1
  - Run benchmarks on a set of configurations selected using a class of experimental designs called screening designs
  - Reveal the options that significantly affect performance
  - Highly economical
- Phase 2
  - Focus only on the significant options (configuration space reduction)
  - Exhaustively benchmark all combinations of the significant options each time the system changes
- Idea: Focusing only on first-order options can greatly reduce the configuration space while at the same time capturing a much more complete picture of the system's QoS (the two phases are sketched below)
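A minimal sketch of the two phases, assuming placeholder helpers: benchmark runs one configuration and returns a QoS metric, pick_screening_design selects the screening configurations, and find_significant applies the statistical analysis.

    from itertools import product

    def main_effects_screening(all_options, benchmark,
                               pick_screening_design, find_significant):
        # Phase 1: benchmark only the configurations in a screening design,
        # then keep the options whose main effects are significant.
        design = pick_screening_design(all_options)   # e.g. 32 of 16,384 configs
        observations = [(cfg, benchmark(cfg)) for cfg in design]
        important = find_significant(observations)    # e.g. 2 of 14 options

        # Phase 2: exhaustively benchmark the small space of important
        # options, holding the rest at defaults; rerun on each code change.
        for values in product((0, 1), repeat=len(important)):
            cfg = dict(zip(important, values))        # only 2^k configurations
            yield cfg, benchmark(cfg)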
9. Example Screening Experiment
- Goal: Efficiently identify factors that have significant effects
- Assume 5 binary configuration options A, B, C, D, and E (2^5 = 32 configurations)
- Effort: only 8 configurations (2^3 = 8)
- We start with a 2^3 full-factorial design over A, B, and C
- Design generators: D = AB, E = BC
- Analysis methods (see the sketch below)
  - Informal: main effect ME(A) = z(A+) - z(A-), where z(A+) and z(A-) are the mean responses with A at its high and low settings
  - Formal: Wilcox analysis
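The design on this slide can be constructed directly: a 2^3 full factorial over A, B, and C, with D and E derived from the generators. In the sketch below, z is a placeholder that would look up the benchmark measurement for a given run; it is not computed here.

    from itertools import product

    # The slide's 2^(5-2) screening design: a full factorial over A, B, C
    # plus the generators D = AB and E = BC, with options coded as -1/+1.
    runs = []
    for a, b, c in product((-1, +1), repeat=3):   # 2^3 = 8 base runs
        runs.append({"A": a, "B": b, "C": c, "D": a * b, "E": b * c})

    def main_effect(option, runs, z):
        # Informal analysis: mean response at the option's high setting
        # minus the mean response at its low setting.
        hi = [z(r) for r in runs if r[option] == +1]
        lo = [z(r) for r in runs if r[option] == -1]
        return sum(hi) / len(hi) - sum(lo) / len(lo)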
10. Feasibility Study
- Subject applications: ACE v5.4, TAO v1.4, CIAO v0.4
- Application scenario: due to recent changes to the message queuing strategy, developers are concerned with measuring two performance criteria between the ACE+TAO+CIAO client and server
  - latency for each request
  - total message throughput (events/sec)
- 14 potentially effective binary options (2^14 = 16,384 configurations)
- Example options
11. Putting It All Together
12. Application of Main Effects Screening
- Given: 14 potentially effective binary configuration options
- Goal: Find the statistically significant options
- Result: a screening experiment that examines 14 options in only 2^5 = 32 runs (constructed in the sketch below)
  - Started with a 2^5 full-factorial design for 5 options
  - Used 9 design generators for the remaining options
- Observed
  - Latency for each request (msec)
  - Predictability of latency
  - Total message throughput (events/sec)
- Also obtained the performance variation for the entire configuration space (2^14 = 16,384 configurations)
- Wilcox analysis to identify the statistically significant options
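Extending the earlier sketch to this slide's 2^(14-9) design: a 2^5 full factorial over five base options plus nine generators. The generator words below are hypothetical placeholders, since the slide does not list the actual ones, and a Wilcoxon rank-sum test from scipy is used as a simple stand-in for the slide's Wilcox analysis.

    from itertools import product
    from scipy.stats import ranksums  # stand-in for the formal analysis

    base = ["A", "B", "C", "D", "E"]
    generators = {  # hypothetical words; each new option is a product of base options
        "F": "AB", "G": "AC", "H": "AD", "I": "AE", "J": "BC",
        "K": "BD", "L": "BE", "M": "CD", "N": "CE",
    }

    runs = []
    for levels in product((-1, +1), repeat=5):    # 2^5 = 32 runs for 14 options
        cfg = dict(zip(base, levels))
        for opt, word in generators.items():
            value = 1
            for factor in word:                   # e.g. "AB" -> A * B
                value *= cfg[factor]
            cfg[opt] = value
        runs.append(cfg)

    def significant(option, runs, z, alpha=0.05):
        # Flag an option whose high-setting and low-setting measurements
        # differ significantly; z looks up the measurement for a run.
        hi = [z(r) for r in runs if r[option] == +1]
        lo = [z(r) for r in runs if r[option] == -1]
        return ranksums(hi, lo).pvalue < alpha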
13. Results
- Phase 1
  - For each performance metric, both exhaustive testing and main effects screening revealed the same two statistically significant options
  - The screening design gave us the same information at a fraction of the cost (32 vs. 16,384 configurations)
- Phase 2
  - Exhaustively tested all possible combinations of these two options (2^2 = 4 configurations)
14. Results (cont.)
[Figure: range of performance metrics covered. Three panels compare the latency distribution (msec), the predictability-of-latency distribution (ln(s^2)), and the throughput distribution across all options, the screening configurations, and random configurations.]
15. Discussion
- Quickly generated the benchmarking experiment
- Automatically identified 2 statistically significant options out of 14 by examining only 32 of the 16K configurations
- Examining only 4 configurations exposed about 75% of the entire range of the system's performance across all 16K valid configurations
- Given the small number of important options, developers can run the benchmarks on the 4 configurations whenever they change the code, giving rapid feedback on the effects of their changes
- Helps track the performance distribution over time
- Periodic recalibration
- Defect detection aid: unexpected changes in the screened options (see the sketch below)
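A minimal sketch of the defect-detection idea, assuming the main effects of the screened options are recomputed after each change and compared against a calibrated baseline; the 20% tolerance is an arbitrary placeholder, not a value from the study.

    def detect_regressions(baseline_effects, current_effects, tolerance=0.20):
        # baseline_effects / current_effects: {option: main effect} computed
        # on the screened options; flag shifts beyond the tolerance fraction.
        alerts = []
        for option, old in baseline_effects.items():
            new = current_effects[option]
            if abs(new - old) > tolerance * abs(old):
                alerts.append((option, old, new))
        return alerts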