More on Correlation Accuracy in Crystal Ball Simulations, or What We've Now Learned about Spearman's R

1
More on Correlation Accuracy in Crystal Ball
Simulations (or What We've Now Learned about
Spearman's R in Cost Risk Analyses)
  • Mitch Robinson and Wayne Salls
  • Wyle Laboratories
  • June 15-18, 2004, Manhattan Beach, CA

2
Suspicion of Rank Correlation
Experts have questioned Monte Carlo tools that
simulate correlated variates using rank
correlation methods:
Crystal Ball and @Risk use rank correlation
methods. Rank correlation is easier to simulate
than Pearson correlation; however, as we've seen,
rank correlation is not appropriate for cost risk
analyses.
Sources: 67th Military Operations Research
Symposium, 1999; 32nd Annual DoD Cost Analysis
Symposium, 1999; ISPA/SCEA Joint Meeting, 2001.
Why do they so doubt rank correlation?
3
Rank Correlation
Rank correlation measures how consistently one
variable changes with a second:
  • 1 if one variable strictly increases in the
    other
  • -1 if one variable strictly decreases in the
    other
  • ∈ (-1, 1) if one variable is constant in the
    second or variably increases and decreases in
    it.
The Spearman r is one rank correlation measure.
4
Spearman Rank Correlation
Y strictly decreases in X: Spearman r = -1.00.
5
Pearson Correlation
  • However, our uncertainty tools typically require
    linear association measures: how consistently do
    two variables covary in a linear sense?
  • 1 if two variables covary on a
    positively-sloped line
  • -1 if two variables covary on a
    negatively-sloped line
  • ∈ (-1, 1) if the two variables don't covary on a
    line.
  • The Pearson r addresses this linear covariation.

6
Pearson Correlation (2)
Same numbers. Pearson r = -0.18.
The regression line modeling linearity is about
Y = 142 - 19 × X.
7
Spearman vs. Pearson
Monotonicity does not imply linearity!
Spearman r = -1.00; Pearson r = -0.18
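
A minimal sketch of the same contrast, using made-up data rather than the numbers behind the slides: y falls strictly with x but along a sharply curved path, so the rank measure reaches -1 while the linear measure stays much weaker. The helper names (average_ranks, pearson, spearman) are illustrative and use only the Python standard library.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman r: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Illustrative data (an assumption, not the slide's data): strictly
# decreasing in x, but far from a straight line.
x = list(range(1, 21))
y = [math.exp(-v) for v in x]

print("Spearman r:", round(spearman(x, y), 2))
print("Pearson r: ", round(pearson(x, y), 2))
```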
8
What's Up with Crystal Ball? (1)
Crystal Ball implements Iman and Conover's
(1982) algorithm for inducing a specified rank
correlation between two sets of numbers.
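
The rearrangement idea can be sketched for two variables as follows. This is a simplified illustration in the spirit of Iman-Conover, not the published algorithm and not Crystal Ball's implementation: it draws a correlated pair of normal score vectors and reorders each sample to share the rank order of its score vector. The function names, the seed, and the choice of random normal scores are assumptions made for the example.

```python
import math
import random

def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def reorder_like(sample, scores):
    """Rearrange sample so its rank order matches the rank order of scores."""
    by_score = sorted(range(len(scores)), key=lambda i: scores[i])
    sorted_sample = sorted(sample)
    out = [0.0] * len(sample)
    for rank, idx in enumerate(by_score):
        out[idx] = sorted_sample[rank]  # k-th smallest value goes where the k-th smallest score sits
    return out

def induce_rank_correlation(x, y, rho, rng):
    """Reorder x and y to follow a normal score pair with correlation rho."""
    z1 = [rng.gauss(0.0, 1.0) for _ in x]
    z2 = [rho * a + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0) for a in z1]
    return reorder_like(x, z1), reorder_like(y, z2)

rng = random.Random(1)  # arbitrary seed for repeatability
n = 10_000
# random.triangular takes (low, high, mode).
x = [rng.triangular(0.0, 1.0, 0.25) for _ in range(n)]
y = [rng.triangular(0.0, 1.0, 0.25) for _ in range(n)]

xc, yc = induce_rank_correlation(x, y, 0.75, rng)
print("target 0.75, simulated Pearson r =", round(pearson(xc, yc), 3))
```

The reordering leaves each marginal sample untouched (the same values, in a different order), which is why the method is distribution-free; whether the resulting Pearson correlation lands on the target is the empirical question the studies examine.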
9
What's Up with Crystal Ball? (2)
  • However, we act as though Crystal Ball uses the
    Iman-Conover algorithm to simulate Pearson r's.
  • This practice does not follow from the
    Iman-Conover logic and is thus sensibly suspect.
  • We've seen the potential for bad disconnects
    between Spearman r's and Pearson r's for the same
    sets of numbers.
  • We should thus want to know: how well do Crystal
    Ball correlations match our intended
    Pearson-sense correlations?

10
February-June 2002
  • Can Crystal Ball accurately simulate Pearson
    correlations?
  • Are there conditions or practices that contribute
    to better or worse accuracy performance?

11
The General Approach (1)
  • Define 33 variates and their probability
    distributions in an Excel spreadsheet using
    Crystal Ball assumption cells.
  • Link 33 other cells one-to-one to the assumption
    cells. Make them Crystal Ball forecast cells.
    Crystal Ball will record the varying assumption
    cell values and collect statistics on them via
    the equated forecast cells.

12
The General Approach (2)
  • Define a target correlation matrix.

13
The General Approach (3)
  • Configure the simulation using the Crystal Ball
    Run Preferences menu.

14
The General Approach (4)
  • Run 10,000 simulation trials.

15
The General Approach (5)
Extract the forecast cell outputs
16
The General Approach (6)
Examine the simulated correlations using the
correlation tool under the Tools-Data Analysis
(add-in) menu or
17
The General Approach (7)
the MS Excel correl() correlation function.
18
The General Approach (8)
  • Compare the simulated Pearson correlations with
    their respective target correlations.

19
The First Study (1)
  • Thirty-three variates allow 528 pairwise
    correlations for accuracy tests.
  • Identical target correlations among the
    variables.
  • Identical triangular (0, 0.25, 1.0) probability
    distributions: slightly right skewed, with mode
    0.25 and mean 1.25/3 ≈ 0.42.

20
The Tr (0, 0.25, 1.0) Distribution
21
The First Study (2)
  • Correlation sample = 10,000; that is, apply the
    correlation algorithm to the entire set of
    numbers.
  • If correlation sample = 1000, Crystal Ball
    applies the algorithm 10 times, to batches
    comprising 1000 trials per variable and 33
    variables.
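
The batching arithmetic can be sketched as below. batch_bounds is a hypothetical helper, not a Crystal Ball function; it only shows how a correlation sample size splits 10,000 trials into successive batches, each of which would be correlated separately.

```python
def batch_bounds(n_trials, sample_size):
    """Yield (start, stop) index pairs for successive correlation batches."""
    for start in range(0, n_trials, sample_size):
        yield start, min(start + sample_size, n_trials)

# Correlation sample sizes that appear in the studies.
for size in (10_000, 1_000, 500, 100):
    n_batches = len(list(batch_bounds(10_000, size)))
    print(f"correlation sample {size:>6}: {n_batches} batch(es)")
```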

22
The First Study (3)
  • 3 x 4 study design: run the 10,000 simulation
    trials under 12 separate conditions.
  • Target correlation: 0.25, 0.50, or 0.75.
  • Starting seed: 1; 2; 1,048,576; or 2,097,152.
    The four number streams are nonoverlapping over
    their first 2 million members.
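
As a quick check of the design arithmetic, the counts quoted in the first study follow directly from 33 variates, three targets, and four seeds. The variable names are illustrative.

```python
from math import comb

n_variates = 33
pairs = comb(n_variates, 2)       # distinct variate pairs per run
conditions = 3 * 4                # correlation targets x starting seeds
tri_mean = (0 + 0.25 + 1.0) / 3   # mean of a triangular (min, mode, max)

print(pairs, conditions, pairs * conditions, round(tri_mean, 2))
# → 528 12 6336 0.42
```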

23
First Study Results (1)
  • More than 98% of the 6336 simulated correlations
    were within 0.03 of the target; all but 5 were
    within 0.05 of the target; all were within 0.06
    of the target.
  • Nearly 75% of the simulated correlations were
    less than their target; this varied only
    negligibly over the three targets.

24
The Second Study (1)
The first study related every variable to every
other variable. Did this highly connected
correlation network (528 nonzero correlations
interconnecting all 33 variables) drive the
correlation accuracy results?
25
The Second Study (2)
  • Assigned nonzero correlations only to (x1, x2),
    (x3, x4), ..., (x31, x32), reducing the
    correlation yield from 528 to 16 in each
    replication.
  • Other second study design choices are identical
    to their first study counterparts.

26
Second Study Results (1)
  • All of the 192 simulated correlations were within
    0.02 of their target.
  • All of the simulated correlations were less than
    the target.

27
The Third Study (1)
  • Are there conditions or practices that worsen
    accuracy performance?
  • "Simulating correlations requires a large sample
    of random values generated ahead of time. The
    values in the samples are rearranged to create
    the desired correlations. If the correlation
    sample size is smaller than the total number of
    trials, a next group of samples is generated and
    correlated."
  • Crystal Ball 2000 User Manual, pp. 246-7.

28
The Third Study (2)
  • Are there conditions or practices that worsen
    accuracy performance?
  • "The sample size is initially set to 500. While
    any sample size greater than 100 should produce
    sufficiently acceptable results, you can set this
    number higher to maximize accuracy. The increased
    accuracy resulting from the use of larger
    samples, however, requires additional memory and
    reduces overall system responsiveness. If either
    of these become an issue, reduce the sample size."
  • Crystal Ball 2000 User Manual, pp. 246-7.

29
The Third Study (3)
  • Correlation sample size = 100: configure
    Crystal Ball to apply the correlation algorithm
    100 times, to batches comprising 100 trials per
    variable and 33 variables.
  • Other third study design choices were identical
    to their first study counterparts.

30
Third Study Results (1) (First Study Results)
  • About 31% (98%) of the 6336 simulated
    correlations were within 0.03 of the target; 2817
    (5) were outside 0.05 of the target; all were
    within 0.29 (0.06) of the target.
  • About 92% (74%) of the simulated correlations
    were less than the target.
  • Correlation accuracy worsened with target
    size, i.e., accuracy was best for the 0.25
    target, worse for 0.50, and worst for 0.75; see
    the next slides.

31
Third Study Results (2)
32
Third Study Results (3) (First Study Results)
33
The Fourth Study (1)
  • Does increasing the correlation sample size from
    100 to 500 improve accuracy?
  • 500 is Crystal Ball's installation default for
    the correlation sample: "The sample size is
    initially set to 500." Source: Crystal Ball
    2000 User Manual, pp. 246-7; also see the
    Trials tab in the Crystal Ball Run
    Preferences menu.

34
The Fourth Study (2)
  • Correlation sample size of 500, i.e.,
    configure Crystal Ball to apply the correlation
    algorithm 20 times, to successive batches
    comprising 500 trials per variable and 33
    variables.
  • Examine only target correlation 0.75, for which
    we catastrophically lost accuracy in the third
    study.
  • Other fourth study design choices were identical
    to their first and third study counterparts.

35
Fourth Study Results (1) (First/Third Study
Results for Target 0.75)
  • About 79% (99%/2%) of the 2112 simulated
    correlations were within 0.03 of the target.
  • 100% were within 0.09 (0.05/0.29) of the target.
  • About 93% (74%/92%) of the simulated correlations
    were less than the target.

36
The Extended Fourth Study
  • Side-by-side examination of correlation sample
    sizes 100, 500, 1000, and 10,000 for target
    correlation 0.75 and accuracy bands 0.02,
    0.04, and 0.06.
  • Other fourth study design choices were identical
    to their first and third study counterparts.

37
Extended Fourth Study Results
38
FAST FORWARD TO THE PRESENT
39
The Fifth Study
  • Lognormal distributions with μ = 1 and σ = 0.50
    or 1.00.
  • Other study design choices similar to the earlier
    studies.
  • 33 variates → 528 correlations
  • 10,000 trials
  • Four seeds: 1; 2; 1,048,576; 2,097,152
  • Three correlation targets: 0.25, 0.50, 0.75
  • Two correlation sample sizes: 100 and 10,000.

40
Tails of Two Lognormals
μ = 1, σ = 0.5
μ = 1, σ = 1
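
To put a number on how much heavier the second tail is, the sketch below computes lognormal skewness. It reads the two parameters on these slides as the mean (1) and standard deviation (0.50 or 1.00) of the lognormal variate itself; the slides do not state the parametrization, so that reading is an assumption. If the parameters instead describe the underlying normal, the values differ but the larger spread is still the more skewed case. The function name is illustrative.

```python
import math

def lognormal_skewness(mean, sd):
    """Skewness of a lognormal variate given its arithmetic mean and sd."""
    cv2 = (sd / mean) ** 2
    sigma_log2 = math.log(1.0 + cv2)  # variance of ln(X)
    w = math.exp(sigma_log2)          # equals 1 + cv2
    return (w + 2.0) * math.sqrt(w - 1.0)

for sd in (0.5, 1.0):
    print(f"mean 1, sd {sd}: skewness = {lognormal_skewness(1.0, sd):.3f}")
```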
41
Fifth Study Results (1)
Correlation error is worse when σ = 1 than when
σ = 0.50, with some divergence over the three
correlation targets.
42
Fifth Study Results (2)
Average correlation error is worse when σ = 1
than when σ = 0.5, with some divergence over the
three correlation targets. Correlation sample N =
10,000. E.g., the average error in simulating a
0.25 correlation was 0.062 for σ = 1.
43
Fifth Study Results (3)
Average correlation error is worse when σ = 1
than when σ = 0.5, with some divergence over the
three correlation targets. Correlation sample N =
100.
44
Fifth Study Results (4)
  • We observed the correlation sample-size effect:
  • Correlation sample size has a negligible effect
    on average correlation error for 0.25 targets.
  • Average correlation error increases for 0.50
    targets and more yet for 0.75 targets as we
    switch from the 10,000-trial correlation sample
    size to 100.

45
Fifth Study Results (5)
Correlation sample size has a negligible effect
on average correlation error for 0.25 targets but
does affect it for 0.50 targets, and more yet for
0.75 targets. σ = 0.5
correl. sample N = 100
correl. sample N = 10,000
46
Fifth Study Results (6)
Correlation sample size has a negligible effect
on average correlation error for 0.25 targets but
does affect it for 0.50 targets, and more yet for
0.75 targets. σ = 1
correl. sample N = 100
correl. sample N = 10,000
47
Fifth Study Results (7)
σ = 1, N = 100
σ = 0.5, N = 100
σ = 1, N = 10K
σ = 0.5, N = 10K
48
Fifth Study Results (8)
  • We observed related accuracy loss results for
    other performance measures as well:
  • Percentage of simulated correlations within X of
    their target.
  • Average absolute correlation error.
  • Percentage of simulated correlations below the
    target.
  • Maximum correlation error.
  • Minimum correlation error.
  • Simulated correlation sample variance.
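
These measures are straightforward to compute from a list of simulated correlations and their common target. The sketch below is one possible formulation; the function name, the dictionary keys, and the five demo values are made up for illustration and are not results from the studies.

```python
def accuracy_summary(simulated, target, band):
    """Summarize how simulated correlations compare with one target value."""
    errors = [r - target for r in simulated]
    n = len(errors)
    mean_r = sum(simulated) / n
    return {
        "pct_within_band": 100.0 * sum(abs(e) <= band for e in errors) / n,
        "avg_abs_error": sum(abs(e) for e in errors) / n,
        "pct_below_target": 100.0 * sum(e < 0 for e in errors) / n,
        "max_error": max(errors),
        "min_error": min(errors),
        "sample_variance": sum((r - mean_r) ** 2 for r in simulated) / (n - 1),
    }

# Hypothetical simulated correlations for a 0.75 target (made-up numbers).
demo = [0.74, 0.725, 0.76, 0.70, 0.73]
summary = accuracy_summary(demo, target=0.75, band=0.03)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```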

49
Fifth Study Results (9)
50
Fifth Study Results (10)
51
Fifth Study Results (11)
52
Fifth Study Results (12)
53
The Sixth Study
  • Does increasing the number of trials improve
    accuracy?
  • Are the accuracy differences between 100-trial
    correlation samples and 10,000-trial correlation
    samples due to batch size or batch percentage,
    e.g., 10,000 vs. 100 or 100% vs. 1%?
  • 10,000 trials with 100- vs. 10,000-trial batches
    vs.
  • 30,000 trials with 300- vs. 30,000-trial
    batches
  • One correlation target: 0.75
  • 33 variates → 528 correlations
  • Four seeds: 1; 2; 1,048,576; 2,097,152.

54
Sixth Study Results (1)
55
Sixth Study Results (2)
Correlation sample
N = 100 out of 10K trials
N = 300 out of 30K trials
N = 30K out of 30K trials
N = 10K out of 10K trials
56
Sixth Study Results (3)
57
Sixth Study Results (4)
58
Sixth Study Results (5)
59
Sixth Study Results (6)
60
Sixth Study Results (7)
61
Lesson Learned (1)
  • Don't fear rank order correlation as a general
    principle: Crystal Ball simulated reasonably
    accurate Pearson correlations in the first,
    second, and parts of the third, fourth, fifth,
    and sixth studies.
  • This was surprising given the theory, only
    showing that we really don't understand the
    theory.
  • Accuracy error increased noticeably after
    reducing the correlation sample size from 100% of
    trials to lesser percentages.
  • Accuracy losses occurred asymmetrically in the
    larger correlations, hardly appearing in the 0.25
    correlation results. Interactions are always
    interesting.

62
Lesson Learned (2)
  • There was a clear tendency to undershoot target
    correlations.
  • This may have been predictable. However, we
    still don't understand the underlying theory well
    enough to confidently predict this result.
  • Accuracy losses are subject to some analyst
    control via optional levels for the correlation
    sample size.
  • The purpose of this option was to reduce
    simulation demands on desktop computers running
    late 1990s technology. We may no longer need this
    option to obtain satisfactory run times. Set the
    correlation sample size as large as is
    practicable.

63
Lesson Learned (3)
  • Increasing the number of trials improved accuracy
    in the less-than-100% correlation sample size
    conditions but not in the 100% conditions.
  • Interactions are always interesting.
  • More skewed probability distributions were more
    inaccuracy prone than less skewed probability
    distributions.
  • This may have been predictable. However, we
    still don't understand the underlying theory well
    enough to confidently predict this result.

64
Acknowledgements
To Ed Miller for initially encouraging these
studies. To Eric Wainwright and Decisioneering,
Inc. for supplying us a Crystal Ball.
65
References
Decisioneering, Inc. Crystal Ball 2000 User
Manual. 1998-2000.
Iman, R.L. and W.J. Conover. A distribution-free
approach to inducing rank correlation among input
variables. Communications in Statistics, B11 (3),
pp. 311-334, 1982.
Robinson, M. and S. Cole. Rank Correlation in
Crystal Ball Simulations (or How We Overcame Our
Fear of Spearman's R in Cost Risk Analyses).
Presented at the 76th Space Systems Cost Analysis
Group meeting, San Pedro, CA, February 2002, and
at the 3rd Joint Annual ISPA-SCEA International
Conference and Educational Workshop, Scottsdale,
AZ, June 2002.
66
The End