More on Correlation Accuracy in Crystal Ball Simulations, or What We've Now Learned about Spearman's R


1
More on Correlation Accuracy in Crystal Ball®
Simulations (or What We've Now Learned about
Spearman's R in Cost Risk Analyses)

Mitch Robinson and Wayne Salls
Wyle Laboratories
June 15-18, 2004, Manhattan Beach, CA

2
Suspicion of Rank Correlation
Experts have questioned Monte Carlo tools that
simulate correlated variates using rank
correlation methods:
"Crystal Ball and @RISK use rank correlation
methods. Rank correlation is easier to simulate
than Pearson correlation; however, as we've seen,
rank correlation is not appropriate for cost risk
analyses."
Sources: 67th Military Operations Research
Symposium, 1999; 32nd Annual DoD Cost Analysis
Symposium, 1999; ISPA/SCEA Joint Meeting, 2001.
Why do they so doubt rank correlation?
3
Rank Correlation
Rank correlation measures how consistently one
variable changes with a second:
1 if one variable strictly increases in the other
-1 if one variable strictly decreases in the other
∈ (-1, 1) if one variable is constant in the
second or variably increases and decreases in it.
The Spearman r is one rank correlation measure.
4
Spearman Rank Correlation
Y strictly decreases in X: Spearman r = -1.00.
5
Pearson Correlation

However, our uncertainty tools typically require
linear association measures: how consistently do
two variables covary in a linear sense?
1 if two variables covary on a
positively-sloped line
-1 if two variables covary on a
negatively-sloped line
∈ (-1, 1) if the two variables don't covary on a
line.
The Pearson r addresses this linear covariation.
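
For reference, the two measures can be written side by side. These are the standard textbook definitions, not formulas taken from the slides; the rank notation is an assumption introduced here for illustration.

```latex
r_{\text{Pearson}}(x, y) =
  \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
       {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
        \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
r_{\text{Spearman}}(x, y) =
  r_{\text{Pearson}}\bigl(\operatorname{rank}(x), \operatorname{rank}(y)\bigr)
```

The Spearman r only sees the ordering of the values, so any strictly monotone relationship scores 1 or -1, whether or not it is linear.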

6
Pearson Correlation (2)
Same numbers. Pearson r = -0.18.
The regression line modeling linearity is about
Y = 142 - 19 × X.
7
Spearman vs. Pearson
Monotonicity does not imply linearity!
Spearman r = -1.00; Pearson r = -0.18
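
A minimal sketch of the same point, using made-up data rather than the numbers plotted on the slides: y strictly decreases in x, so the Spearman r is -1, yet the Pearson r is well short of -1 because the decrease is not linear. The helper names are hypothetical, and the rank helper assumes no tied values.

```python
import numpy as np

def ranks(values):
    """Return 0-based ranks; assumes there are no tied values."""
    return np.argsort(np.argsort(values))

def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    return float(np.corrcoef(a, b)[0, 1])

def spearman_r(a, b):
    """Spearman r computed as the Pearson r of the ranks."""
    return pearson_r(ranks(a), ranks(b))

# Illustrative data only: a strictly decreasing but curved relationship.
x = np.linspace(0.0, 10.0, 50)
y = np.exp(-x)

print(f"Spearman r = {spearman_r(x, y):.2f}")
print(f"Pearson r  = {pearson_r(x, y):.2f}")
```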
8
What's Up with Crystal Ball? (1)
Crystal Ball® implements Iman and Conover's
(1982) algorithm for inducing a specified rank
correlation between two sets of numbers.
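
The rank-reordering idea can be sketched as follows. This is a simplified illustration in the spirit of Iman and Conover (1982), not Crystal Ball's implementation: it draws fresh normal scores instead of using the published method's fixed score set, and it omits the published correction for the scores' own sample correlation. The function name and the two-variable setup are assumptions for illustration.

```python
import numpy as np

def induce_rank_correlation(samples, target_corr, rng):
    """Rearrange each column of `samples` so its ranks follow correlated
    normal scores. Values are only reordered, so marginals are unchanged."""
    n_trials, n_vars = samples.shape
    # Independent normal scores, mixed by the Cholesky factor of the
    # target matrix so the scores carry the target correlation.
    scores = rng.standard_normal((n_trials, n_vars))
    chol = np.linalg.cholesky(target_corr)
    correlated_scores = scores @ chol.T
    reordered = np.empty_like(samples)
    for j in range(n_vars):
        score_ranks = np.argsort(np.argsort(correlated_scores[:, j]))
        # The trial with the k-th smallest score gets the k-th smallest sample.
        reordered[:, j] = np.sort(samples[:, j])[score_ranks]
    return reordered

def rank_corr(a, b):
    """Spearman r as the Pearson r of the ranks (no ties expected)."""
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

rng = np.random.default_rng(1)
target = np.array([[1.0, 0.75], [0.75, 1.0]])
independent = rng.triangular(0.0, 0.25, 1.0, size=(10_000, 2))
correlated = induce_rank_correlation(independent, target, rng)

spearman = rank_corr(correlated[:, 0], correlated[:, 1])
pearson = float(np.corrcoef(correlated, rowvar=False)[0, 1])
print(f"Spearman r = {spearman:.3f}, Pearson r = {pearson:.3f}")
```

Note that the algorithm never looks at the Pearson r of the reordered values; any agreement between that number and the target is a by-product of the reordering.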
9
What's Up with Crystal Ball? (2)

However, we act as though Crystal Ball® uses the
Iman-Conover algorithm to simulate Pearson r's.
This practice does not follow from the
Iman-Conover logic and is thus sensibly suspect.
We've seen the potential for bad disconnects
between Spearman r's and Pearson r's for the same
sets of numbers.
We should thus want to know: how well do Crystal
Ball correlations match our intended
Pearson-sense correlations?

10
February-June 2002

Can Crystal Ball® accurately simulate Pearson
correlations?
Are there conditions or practices that contribute
to better or worse accuracy performance?

11
The General Approach (1)

Define 33 variates and their probability
distributions in an Excel® spreadsheet using
Crystal Ball® assumption cells.
Link 33 other cells one-to-one to the assumption
cells. Make them Crystal Ball forecast cells.
Crystal Ball will record the varying assumption
cell values and collect statistics on them via
the equated forecast cells.

12
The General Approach (2)

Define a target correlation matrix.

13
The General Approach (3)

Configure the simulation using the Crystal Ball®
Run Preferences menu.

14
The General Approach (4)

Run 10,000 simulation trials.

15
The General Approach (5)
Extract the forecast cell outputs
16
The General Approach (6)
Examine the simulated correlations using the
correlation tool under the Tools-Data Analysis
(add-in) menu or
17
The General Approach (7)
the MS Excel® CORREL() correlation function.
18
The General Approach (8)

Compare the simulated Pearson correlations with
their respective target correlations.
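
The comparison step might look like the sketch below outside of Excel. The trial data here are correlated normal draws standing in for the extracted forecast values; they are not Crystal Ball output, and the function name is hypothetical.

```python
import numpy as np

def pairwise_errors(trials, target_corr):
    """Simulated-minus-target Pearson correlation for every variable pair."""
    simulated = np.corrcoef(trials, rowvar=False)
    upper = np.triu_indices(trials.shape[1], k=1)  # each pair counted once
    return simulated[upper] - target_corr[upper]

# Stand-in data (assumption): 33 normal variates with a common 0.50 target.
n_vars, rho = 33, 0.50
target = np.full((n_vars, n_vars), rho)
np.fill_diagonal(target, 1.0)
rng = np.random.default_rng(1)
trials = rng.multivariate_normal(np.zeros(n_vars), target, size=10_000)

errors = pairwise_errors(trials, target)
print(errors.size, "pairs; largest absolute error:",
      round(float(np.abs(errors).max()), 4))
```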

19
The First Study (1)

Thirty-three variates allow 528 pairwise
correlations for accuracy tests.
Identical target correlations among the
variables.
Identical triangular (0, 0.25, 1.0) probability
distributions: slightly right skewed, with mode
0.25 and mean 1.25/3 ≈ 0.42.

20
The Tr (0, 0.25, 1.0) Distribution
21
The First Study (2)

Correlation sample = 10,000; that is, apply the
correlation algorithm to the entire set of
numbers.
If correlation sample = 1000, Crystal Ball
applies the algorithm 10 times, to batches
comprising 1000 trials per variable and 33
variables.
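
The batching described above can be sketched as follows, assuming a simplified rank-reordering step (fresh normal scores, no correction for their sample correlation) in place of Crystal Ball's actual routine. All function names are hypothetical, and the numbers this sketch prints need not match the study results.

```python
import numpy as np

def reorder_to_scores(samples, target_corr, rng):
    """Simplified rank reordering: sort each column of `samples` into the
    rank order of correlated normal scores (a stand-in, not Crystal Ball)."""
    n_trials, n_vars = samples.shape
    scores = rng.standard_normal((n_trials, n_vars))
    scores = scores @ np.linalg.cholesky(target_corr).T
    out = np.empty_like(samples)
    for j in range(n_vars):
        score_ranks = np.argsort(np.argsort(scores[:, j]))
        out[:, j] = np.sort(samples[:, j])[score_ranks]
    return out

def simulate_in_batches(n_trials, batch_size, target_corr, rng):
    """Correlate the trials one batch at a time, then stack the batches."""
    n_vars = target_corr.shape[0]
    batches = []
    for _ in range(n_trials // batch_size):
        draws = rng.triangular(0.0, 0.25, 1.0, size=(batch_size, n_vars))
        batches.append(reorder_to_scores(draws, target_corr, rng))
    return np.vstack(batches)

target = np.array([[1.0, 0.75], [0.75, 1.0]])
results = {}
for batch_size in (100, 1_000, 10_000):
    rng = np.random.default_rng(1)
    trials = simulate_in_batches(10_000, batch_size, target, rng)
    # Pearson r over all 10,000 trials, whatever the batch size was.
    results[batch_size] = float(np.corrcoef(trials, rowvar=False)[0, 1])
    print(f"correlation sample {batch_size:>6}: Pearson r = "
          f"{results[batch_size]:.3f}")
```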

22
The First Study (3)

3 x 4 study design: run the 10,000 simulation
trials under 12 separate conditions.
Target correlation: 0.25, 0.50, or 0.75.
Starting seed: 1; 2; 1,048,576; or 2,097,152.
The four number streams are nonoverlapping over
their first 2 million members.
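
The design arithmetic can be checked in a few lines: 3 targets by 4 seeds gives 12 conditions, and 33 variates give 528 pairs per condition. The variable names are illustrative only.

```python
from itertools import product
from math import comb

targets = (0.25, 0.50, 0.75)
seeds = (1, 2, 1_048_576, 2_097_152)

conditions = list(product(targets, seeds))   # one run per (target, seed)
pairs_per_run = comb(33, 2)                  # distinct variable pairs
total_correlations = len(conditions) * pairs_per_run

print(len(conditions), "conditions x", pairs_per_run, "pairs =",
      total_correlations, "simulated correlations")
```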

23
First Study Results (1)

More than 98% of the 6336 simulated correlations
were within 0.03 of the target; all but 5 were
within 0.05 of the target; all were within 0.06
of the target.
Nearly 75% of the simulated correlations were
less than their target; this varied only
negligibly over the three targets.

24
The Second Study (1)
The first study related every variable to every
other variable. Did this highly connected
correlation network (528 nonzero correlations
interconnecting all 33 variables) drive the
correlation accuracy results?
25
The Second Study (2)

Assigned nonzero correlations only to (x1, x2),
(x3, x4), ..., (x31, x32), reducing the correlation
yield from 528 to 16 in each replication.
Other second study design choices were identical
to their first study counterparts.

26
Second Study Results (1)

All of the 192 simulated correlations were within
0.02 of their target.
All of the simulated correlations were less than
the target.

27
The Third Study (1)

Are there conditions or practices that worsen
accuracy performance?
"Simulating correlations requires a large sample
of random values generated ahead of time. The
values in the samples are rearranged to create
the desired correlations. If the correlation
sample size is smaller than the total number of
trials a next group of samples is generated and
correlated."
Crystal Ball 2000 User Manual, pp. 246-7.

28
The Third Study (2)

Are there conditions or practices that worsen
accuracy performance?
"The sample size is initially set to 500. While
any sample size greater than 100 should produce
sufficiently acceptable results, you can set this
number higher to maximize accuracy. The increased
accuracy resulting from the use of larger
samples, however, requires additional memory and
reduces overall system responsiveness. If either
of these become an issue, reduce the sample size."
Crystal Ball 2000 User Manual, pp. 246-7.

29
The Third Study (3)

Correlation sample size = 100: configure
Crystal Ball to apply the correlation algorithm
100 times, to batches comprising 100 trials per
variable and 33 variables.
Other third study design choices were identical
to their first study counterparts.

30
Third Study Results (1) (First Study Results)

About 31% (98%) of the 6336 simulated
correlations were within 0.03 of the target; 2817
(5) were outside 0.05 of the target; all were
within 0.29 (0.06) of the target.
About 92% (74%) of the simulated correlations
were less than the target.
Correlation accuracy worsened with target
size, i.e., accuracy was best for the 0.25 target,
worse for 0.50, and worst for 0.75; see the next
slides.

31
Third Study Results (2)
32
Third Study Results (3) (First Study Results)
33
The Fourth Study (1)

Does increasing the correlation sample size from
100 to 500 improve accuracy?
500 is Crystal Ball's installation default for
the correlation sample: "The sample size is
initially set to 500." Source: Crystal Ball
2000 User Manual, pp. 246-7; also see the
Trials tab in the Crystal Ball® Run
Preferences menu.

34
The Fourth Study (2)

Correlation sample size of 500, i.e.,
configure Crystal Ball to apply the correlation
algorithm 20 times, to successive batches
comprising 500 trials per variable and 33
variables.
Examine only target correlation 0.75, for which
we catastrophically lost accuracy in the third
study.
Other fourth study design choices were identical
to their first and third study counterparts.

35
Fourth Study Results (1) (First/Third Study
Results for Target 0.75)

About 79% (99%/2%) of the 2112 simulated
correlations were within 0.03 of the target;
100% were within 0.09 (0.05/0.29) of the target.
About 93% (74%/92%) of the simulated correlations
were less than the target.

36
The Extended Fourth Study

Side-by-side examination of correlation sample
sizes 100, 500, 1000, and 10,000 for target
correlation 0.75 and accuracy bands 0.02,
0.04, and 0.06.
Other fourth study design choices were identical
to their first and third study counterparts.

37
Extended Fourth Study Results
38
FAST FORWARD TO THE PRESENT
39
The Fifth Study

Lognormal distributions with μ = 1 and σ = 0.50
or 1.00.
Other study design choices similar to the earlier
studies:
33 variates → 528 correlations
10,000 trials
Four seeds: 1; 2; 1,048,576; 2,097,152
Three correlation targets: 0.25, 0.50, 0.75
Two correlation sample sizes: 100; 10,000.
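
To see how much more skewed the second lognormal is, here is a small sketch. It assumes the two lognormal parameters on this slide are the arithmetic mean (1) and standard deviation (0.50 or 1.00) of the variate itself; the transcript does not say which parameterization was used, so treat that as an assumption. The function names are hypothetical.

```python
import math

def lognormal_log_params(mean, sd):
    """Convert an arithmetic mean and standard deviation (assumed
    parameterization) to the mean and sd of the underlying normal."""
    sigma_log_sq = math.log(1.0 + (sd / mean) ** 2)
    mu_log = math.log(mean) - sigma_log_sq / 2.0
    return mu_log, math.sqrt(sigma_log_sq)

def lognormal_skewness(mean, sd):
    """Skewness of a lognormal: (w + 2) * sqrt(w - 1), w = exp(sigma_log^2)."""
    _, sigma_log = lognormal_log_params(mean, sd)
    w = math.exp(sigma_log ** 2)
    return (w + 2.0) * math.sqrt(w - 1.0)

skew_by_sd = {sd: lognormal_skewness(1.0, sd) for sd in (0.50, 1.00)}
for sd, skew in skew_by_sd.items():
    print(f"mean 1, sd {sd:.2f}: skewness = {skew:.3f}")
```

Under this assumption the larger standard deviation gives the more skewed distribution. The same ordering holds if the parameters are instead read as log-scale values.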

40
Tails of Two Lognormals
μ = 1, σ = 0.5
μ = 1, σ = 1
41
Fifth Study Results (1)
Correlation error is worse when σ = 1 than when
σ = 0.50, with some divergence over the three
correlation targets.
42
Fifth Study Results (2)
Average correlation error is worse when σ = 1
than when σ = 0.5, with some divergence over the
three correlation targets. Correlation sample
N = 10,000. E.g., the average error in
simulating a 0.25 correlation was 0.062 for σ = 1.
43
Fifth Study Results (3)
Average correlation error is worse when σ = 1
than when σ = 0.5, with some divergence over the
three correlation targets. Correlation sample
N = 100.
44
Fifth Study Results (4)

We observed the correlation sample-size effect:
Correlation sample size has a negligible effect
on average correlation error for 0.25 targets.
Average correlation error increases for 0.50
targets, and more yet for 0.75 targets, as we
switch from the 10,000-trial correlation sample
size to 100.

45
Fifth Study Results (5)
Correlation sample size has a negligible effect
on average correlation error for 0.25 targets but
does affect it for 0.50 targets, and more yet for
0.75 targets. σ = 0.5
correl. sample N = 100
correl. sample N = 10,000
46
Fifth Study Results (6)
Correlation sample size has a negligible effect
on average correlation error for 0.25 targets but
does affect it for 0.50 targets, and more yet for
0.75 targets. σ = 1
correl. sample N = 100
correl. sample N = 10,000
47
Fifth Study Results (7)
σ = 1, N = 100
σ = 0.5, N = 100
σ = 1, N = 10K
σ = 0.5, N = 10K
48
Fifth Study Results (8)

We observed related accuracy loss results for
other performance measures as well:
Percentage of simulated correlations within X of
their target.
Average absolute correlation error.
Percentage of simulated correlations below the
target.
Maximum correlation error.
Minimum correlation error.
Simulated correlation sample variance.
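
The measures listed above could be computed from a set of simulated correlations roughly as follows. The function name, the dictionary keys, and the four example values are made up for illustration; they are not study data.

```python
import numpy as np

def accuracy_summary(simulated, target, band):
    """Summarize simulated correlations against a single target value."""
    simulated = np.asarray(simulated, dtype=float)
    errors = simulated - target  # negative means the target was undershot
    return {
        "pct_within_band": float(100.0 * np.mean(np.abs(errors) <= band)),
        "mean_abs_error": float(np.mean(np.abs(errors))),
        "pct_below_target": float(100.0 * np.mean(errors < 0)),
        "max_error": float(errors.max()),
        "min_error": float(errors.min()),
        "sample_variance": float(simulated.var(ddof=1)),
    }

# Hypothetical simulated correlations for a 0.75 target.
example = accuracy_summary([0.70, 0.74, 0.76, 0.80], target=0.75, band=0.03)
for name, value in example.items():
    print(f"{name}: {value:.4f}")
```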

49
Fifth Study Results (9)
50
Fifth Study Results (10)
51
Fifth Study Results (11)
52
Fifth Study Results (12)
53
The Sixth Study

Does increasing the number of trials improve
accuracy?
Are the accuracy differences between 100-trial
correlation samples and 10,000-trial correlation
samples due to batch size or batch percentage,
e.g., 10,000 vs. 100 or 100% vs. 1%?
10,000 trials with 100- vs. 10,000-trial batches
vs.
30,000 trials with 300- vs. 30,000-trial
batches
One correlation target: 0.75
33 variates → 528 correlations
Four seeds: 1; 2; 1,048,576; 2,097,152.

54
Sixth Study Results (1)
55
Sixth Study Results (2)
Correlation sample
N = 100 out of 10K trials
N = 300 out of 30K trials
N = 30K out of 30K trials
N = 10K out of 10K trials
56
Sixth Study Results (3)
57
Sixth Study Results (4)
58
Sixth Study Results (5)
59
Sixth Study Results (6)
60
Sixth Study Results (7)
61
Lesson Learned (1)

Don't fear rank order correlation as a general
principle: Crystal Ball® simulated reasonably
accurate Pearson correlations in the first,
second, and parts of the third, fourth, fifth,
and sixth studies.
This was surprising given the theory, only
showing that we really don't understand the
theory.
Accuracy error increased noticeably after
reducing the correlation sample size from 100% of
trials to lesser percentages.
Accuracy losses occurred asymmetrically in the
larger correlations, hardly appearing in the 0.25
correlation results. Interactions are always
interesting.

62
Lesson Learned (2)

There was a clear tendency to undershoot target
correlations.
This may have been predictable. However, we
still don't understand the underlying theory well
enough to confidently predict this result.
Accuracy losses are subject to some analyst
control via optional levels for the correlation
sample size.
The purpose of this option was to reduce
simulation demands on desktop computers running
late 1990s technology. We may no longer need this
option to obtain satisfactory run times. Set the
correlation sample size as large as is
practicable.

63
Lesson Learned (3)

Increasing the number of trials improved accuracy
in the less-than-100% correlation sample size
conditions but not in the 100% conditions.
Interactions are always interesting.
More skewed probability distributions were more
inaccuracy prone than less skewed probability
distributions.
This may have been predictable. However, we
still don't understand the underlying theory well
enough to confidently predict this result.

64
Acknowledgements
To Ed Miller for initially encouraging these
studies. To Eric Wainwright and Decisioneering,
Inc. for supplying us with Crystal Ball®.
65
References
Decisioneering, Inc. Crystal Ball 2000® User
Manual. 1998-2000.
Iman, R.L. and W.J. Conover. A distribution-free
approach to inducing rank correlation among input
variables. Communications in Statistics, B11 (3),
pp. 311-334, 1982.
Robinson, M. and S. Cole. Rank Correlation in
Crystal Ball® Simulations (or How We Overcame
Our Fear of Spearman's R in Cost Risk Analyses).
Presented at the 76th Space Systems Cost Analysis
Group meeting, San Pedro, CA, February 2002, and
at the 3rd Joint Annual ISPA-SCEA International
Conference and Educational Workshop, Scottsdale,
AZ, June 2002.
66
The End
