Title: Software Analysis via Data Analysis
Matthias F. Stallmann, 2006/04/04, DIMACS EAAISC planning meeting
Based on joint work with
- Xiao Yu Li, Seattle, WA, USA
- Franc Brglez, Raleigh, NC, USA
Software versus Algorithms
- Problems are NP-hard, so they require heuristics or (worst-case) exponential algorithms.
- Simple algorithms must be compared with
  - CPLEX and other ILP solvers
  - metaheuristics (SA, GA, particle swarm, ant colony, etc.) with lots of adjustable parameters
- We want black-box comparisons, in many cases with no prior understanding of (some of) the algorithms.
Data Presentations: based on CPLEX runs for (permutations of) a single instance
- Descriptive statistics
  - mean / median / stdev: 600.4 / 25.3 / 1767.3
- Histogram
- Percent solved
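The three presentations on this slide (descriptive statistics, histogram, percent solved) can all be derived from a list of per-run times. The sketch below is a minimal illustration, assuming each run is recorded as a runtime in seconds, that "solved" means finishing within a timeout, and that stdev is the sample standard deviation; the function names, the `timeout` and `bin_width` parameters, and the sample times are hypothetical, not taken from the CPLEX experiments.

```python
import statistics

def describe(runtimes):
    """Return (mean, median, sample standard deviation) of a list of runtimes."""
    return (statistics.mean(runtimes),
            statistics.median(runtimes),
            statistics.stdev(runtimes))

def histogram(runtimes, bin_width):
    """Map bin index k to the number of runtimes in [k*bin_width, (k+1)*bin_width)."""
    counts = {}
    for t in runtimes:
        k = int(t // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))

def percent_solved(runtimes, timeout):
    """Percentage of runs that finished within `timeout` seconds."""
    return 100.0 * sum(1 for t in runtimes if t <= timeout) / len(runtimes)

if __name__ == "__main__":
    # Hypothetical runtimes (seconds), not the data behind the slide.
    times = [3.0, 5.0, 8.0, 25.0, 40.0, 900.0]
    mean, median, stdev = describe(times)
    print(f"mean {mean:.1f} / median {median:.1f} / stdev {stdev:.1f}")
    print(f"solved within 60 s: {percent_solved(times, 60):.1f}%")
    print(histogram(times, 10))
```

A single slow run is enough to pull the mean far above the median, which is the pattern the slide's 600.4 / 25.3 figures show.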
Stretching the Truth, or making it clearer?
- More information here, but we need to look carefully
A more normal distribution: CPLEX with different settings under the same conditions
uf250..87 QT2/QT1 vs UW2/UW1 (1)
What about random 3-SAT instances?
[Figure: solvability vs. runtime (seconds); legend: exp. d. 16.7/17.2]
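A plot of solvability against runtime is read here as the fraction of runs solved by time t (the empirical cumulative distribution), and the "exp. d." legend entries as exponential distributions fitted to the runtimes; both readings are assumptions, since the deck does not define them. The sketch computes, for each distinct runtime, the observed solvability and the CDF of an exponential whose mean equals the sample mean (the maximum-likelihood fit). Function names and sample data are hypothetical.

```python
import math
import statistics

def exp_cdf(t, mean):
    """CDF of an exponential distribution with the given mean (0 for t <= 0)."""
    return 1.0 - math.exp(-t / mean) if t > 0 else 0.0

def solvability(runtimes, t):
    """Fraction of runs that finished by time t (empirical CDF)."""
    return sum(1 for r in runtimes if r <= t) / len(runtimes)

def solvability_curve(runtimes):
    """(t, observed solvability, fitted exponential CDF) at each distinct runtime.

    The exponential is fitted by setting its mean to the sample mean.
    """
    mean = statistics.mean(runtimes)
    return [(t, solvability(runtimes, t), exp_cdf(t, mean))
            for t in sorted(set(runtimes))]

if __name__ == "__main__":
    # Hypothetical runtimes in seconds, for illustration only.
    for t, observed, fitted in solvability_curve([0.5, 1.2, 2.0, 3.1, 7.4]):
        print(f"{t:5.1f} s  observed {observed:.2f}  exponential {fitted:.2f}")
```

Plotting both columns against t gives the kind of observed-versus-fitted comparison these slides show.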
uf250..87 QT2/QT1 vs UW2/UW1 (2)
UW1 performs the same as QT1 (t-test: t = 1.88 < 1.97)
[Figure: solvability vs. runtime (seconds); legend: exp. d. 16.7/17.2, exp. d. 12.3/12.5]
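The slides compare solvers by a t statistic against a critical value of 1.97. The deck does not say which t-test variant or sample sizes were used; the sketch below assumes Welch's two-sample (unequal-variance) t statistic on raw runtimes, and treats 1.97 as approximately the two-sided 5% critical value of Student's t for roughly 200 to 300 degrees of freedom. The function names and sample data are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's two-sample t statistic (unequal variances) for mean(a) - mean(b)."""
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se2)

def significantly_different(a, b, critical=1.97):
    """Two-sided check: |t| above the critical value means the means differ."""
    return abs(welch_t(a, b)) > critical

if __name__ == "__main__":
    # Hypothetical runtimes (seconds) for two solver variants.
    x = [10.0, 11.0, 12.0, 13.0, 14.0]
    y = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(welch_t(x, y), significantly_different(x, y))
```

With heavy-tailed runtimes like those elsewhere in this deck, the normality assumption behind a t-test is only a rough approximation.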
uf250..87 QT2/QT1 vs UW2/UW1 (3)
UW2 outperforms UW1 by a factor of 31 ...
[Figure legend: exp. d. 0.39/0.29]
uf250..87 QT2/QT1 vs UW2/UW1 (4)
QT2 outperforms UW2 slightly (t-test: t = 2.24 > 1.97)
[Figure legend: exp. d. 0.31/0.28, exp. d. 0.39/0.29, exp. d. 0.21/0.19]
Sources of Data Distributions: first, 100 random instances (SAT)
median 4.8
mean 21.2
stdev 42.9
Heavy tail
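One way to read the labels "Heavy tail" and "Exponential" is to compare the reported statistics with those of an exponential distribution with mean $\mu$, whose median $m$ and standard deviation $\sigma$ are fixed multiples of the mean. The derivation is a standard fact rather than something stated in the deck; the last line plugs in this slide's numbers.

```latex
\begin{aligned}
F(t) &= 1 - e^{-t/\mu}, \qquad t \ge 0 \\
F(m) = \tfrac{1}{2} &\;\Rightarrow\; m = \mu \ln 2 \approx 0.693\,\mu \\
\sigma^2 = \mu^2 &\;\Rightarrow\; \sigma = \mu \\
\text{observed: } m/\mu &= 4.8/21.2 \approx 0.23, \qquad \sigma/\mu = 42.9/21.2 \approx 2.02
\end{aligned}
```

The median sits far below, and the standard deviation far above, the exponential reference values, which is consistent with a tail heavier than exponential.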
Not all instances are equal: here's an easy one (128 permutations + original)
median 4.4
mean 6.7
stdev 6.5
Exponential
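The permutation experiments in this deck (e.g., 128 permutations of one instance) need instances that are logically equivalent to the original but presented in a different order. The sketch below assumes a CNF instance given as DIMACS-style clauses (lists of nonzero integers) and assumes a "permutation" means renaming variables, shuffling clause order, and shuffling literals within clauses; the function `permute_cnf` and its seed handling are hypothetical, and the authors' actual procedure may differ.

```python
import random

def permute_cnf(clauses, num_vars, seed):
    """Return an equivalent CNF: variables renamed, clauses and literals reordered.

    clauses: list of clauses, each a list of nonzero ints (DIMACS literals).
    Satisfiability is preserved because renaming is a bijection on variables.
    """
    rng = random.Random(seed)
    new_names = list(range(1, num_vars + 1))
    rng.shuffle(new_names)                # new_names[v - 1] is the new id of v
    permuted = []
    for clause in clauses:
        lits = [new_names[abs(l) - 1] * (1 if l > 0 else -1) for l in clause]
        rng.shuffle(lits)                 # reorder literals within the clause
        permuted.append(lits)
    rng.shuffle(permuted)                 # reorder the clauses themselves
    return permuted

if __name__ == "__main__":
    original = [[1, -2], [2, 3], [-1, -3]]
    # One permuted copy per (hypothetical) seed.
    for seed in range(1, 4):
        print(seed, permute_cnf(original, 3, seed))
```

Running the same solver on the original and each permuted copy yields a runtime sample for a single instance, as in the histograms on these slides.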
... and here's a harder one
median 84.8
mean 126.7
stdev 117.6
Exponential
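The shape labels can be roughly checked from the three reported statistics alone, using the fact that an exponential distribution has median/mean = ln 2 (about 0.693) and stdev/mean = 1. The sketch below is a hypothetical rule of thumb, not the criterion used in the talk: the thresholds `tol`, `heavy_stdev`, and `heavy_median` were chosen by hand for illustration.

```python
import math

def tail_shape(median, mean, stdev, tol=0.15, heavy_stdev=2.0, heavy_median=0.5):
    """Rough shape label from summary statistics, relative to an exponential.

    Exponential reference: median/mean = ln 2 (~0.693), stdev/mean = 1.
    Thresholds are arbitrary, hand-picked values for illustration.
    """
    m_ratio = median / mean
    s_ratio = stdev / mean
    if s_ratio > heavy_stdev or m_ratio < heavy_median:
        return "heavy tail"
    if abs(s_ratio - 1) <= tol and abs(m_ratio - math.log(2)) <= tol:
        return "close to exponential"
    if s_ratio > 1:
        return "heavier than exponential"
    return "lighter than exponential"

if __name__ == "__main__":
    # (median, mean, stdev) as reported in this deck.
    samples = {
        "100 random instances": (4.8, 21.2, 42.9),
        "easy instance": (4.4, 6.7, 6.5),
        "harder instance": (84.8, 126.7, 117.6),
    }
    for name, stats in samples.items():
        print(f"{name}: {tail_shape(*stats)}")
```

With these hand-picked thresholds the function agrees with the labels given for the statistics in this deck; that agreement reflects how the thresholds were chosen more than any validation of the rule.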
Another wrinkle: stochastic search. First, same seed and 32 permuted instances + original
number of flips
median 42484
mean 62457
stdev 86551
slightly worse than Exponential
... versus 33 different seeds: same distribution?
number of flips
median 36637
mean 50185
stdev 83226
slightly worse than Exponential
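Whether permuting the instance and changing the seed give the same distribution can be approached with a two-sample test. The deck does not say which test, if any, was used; the sketch below swaps in the two-sample Kolmogorov-Smirnov statistic D, the largest vertical gap between the two empirical CDFs. Converting D into a p-value (for example with SciPy's `scipy.stats.ks_2samp`) is left out; the function name is hypothetical.

```python
def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|."""
    xs, ys = sorted(a), sorted(b)
    n, m = len(xs), len(ys)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(xs[i], ys[j])
        # Step past every copy of x in both samples so ties are handled.
        while i < n and xs[i] == x:
            i += 1
        while j < m and ys[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d

if __name__ == "__main__":
    # Hypothetical flip counts, not the data behind the slides.
    permuted = [120, 340, 560, 910, 2400]
    seeded = [150, 300, 700, 1200, 5100]
    print(ks_statistic(permuted, seeded))
```

Once one sample is exhausted its CDF is 1 and the other CDF only grows, so the gap cannot increase further and the loop can stop early.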
Things get strange when the solver is not completely stochastic (e.g., branch and bound (BB) with stochastic search)
Bimodal: the stochastic search finds the optimum either at the root or at the first branch. The lower-bound method finds the optimum at the root. There is no randomness in the lower-bound (LB) method.
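A quick numerical check for a bimodal runtime sample like this one is the bimodality coefficient b = (skewness² + 1) / kurtosis; a common rule of thumb treats b > 5/9 (the value for a uniform distribution) as a hint of two modes. This technique is swapped in for illustration and is not from the deck; the sketch uses a simplified form with population moments and no small-sample corrections, and the function name is hypothetical.

```python
def bimodality_coefficient(xs):
    """Simplified bimodality coefficient (skew**2 + 1) / kurtosis.

    Uses population (uncorrected) moments and ordinary (not excess)
    kurtosis. xs must contain at least two distinct values.
    """
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n
    m3 = sum((x - mu) ** 3 for x in xs) / n
    m4 = sum((x - mu) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    kurt = m4 / m2 ** 2
    return (skew ** 2 + 1) / kurt

if __name__ == "__main__":
    # Hypothetical runtimes: "found at root" vs "found at first branch".
    runs = [1.0, 1.1, 0.9, 1.0, 9.8, 10.2, 10.0, 9.9]
    print(bimodality_coefficient(runs) > 5 / 9)
```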
Lower-bound method is extremely sensitive to input ordering
Heavy tail (and one instance times out). The lower-bound method finds the optimum at the root or in an early branch only if the input order is friendly.
Another Data Analysis Application: Instance Profiling
- sao2.b is very easy to solve (variables are easy to distinguish)
- e64.b is very difficult (lots of variables occur equally often)
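Instance profiling of the kind described here can be sketched by counting how often each variable occurs. The deck does not describe the .b file format or the profiling measure, so the sketch assumes an instance given as a list of constraints, each a list of signed variable indices, and summarizes occurrence counts by their coefficient of variation (stdev / mean). Reading a value near 0 as "lots of variables occur equally often" is an interpretation assumed here; both function names are hypothetical.

```python
import statistics
from collections import Counter

def occurrence_profile(constraints):
    """Count how often each variable occurs across all constraints."""
    return Counter(abs(lit) for c in constraints for lit in c)

def occurrence_spread(constraints):
    """Coefficient of variation (stdev / mean) of variable occurrence counts.

    Near 0: variables occur about equally often (hard to tell apart).
    Larger: a few variables dominate (easier to distinguish).
    """
    counts = list(occurrence_profile(constraints).values())
    return statistics.pstdev(counts) / statistics.mean(counts)

if __name__ == "__main__":
    uniform = [[1, 2], [2, 3], [3, 1]]   # every variable occurs twice
    skewed = [[1, 2], [1, 3], [1, 4]]    # variable 1 dominates
    print(occurrence_spread(uniform), occurrence_spread(skewed))
```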