1
An Implementation of the Lurie-Goldberg Algorithm
in Schedule Risk Analysis
  • Presented to the SCEA 2002 National Conference
  • Scottsdale, Arizona, 14 June 2002
  • Jim Price
  • C/S Solutions, Inc.

2
Introduction
  • Risk 2.0 is a schedule risk analysis add-in to
    Microsoft Project
  • The product implements the Lurie-Goldberg [1]
    algorithms for generating correlated variables
    and for adjusting inconsistent correlation
    matrices
  • Details follow

[1] An Approximate Method for Sampling Correlated
Random Variables from Partially-Specified
Distributions, Management Science, Vol. 44, No. 2,
February 1998, pp. 203-218.
3
Overview
  • How correlation enters a Monte-Carlo simulation
  • Lurie-Goldberg adjustment procedure for
    inconsistent correlation matrices
  • Non-iterated and iterated (Lurie-Goldberg)
    methods for generating correlated random
    variables
  • Implementation and performance factors
  • Why correlation matters in schedule risk analysis
  • Addendum What about rank correlation?

4
Schedule Risk Analysis
  • Instead of a single-point estimate for duration,
    the user supplies
  • A three-point estimate
  • minimum, most likely, and maximum
  • A distribution curve
  • uniform, triangular, normal, beta, other
  • Instead of a single completion date, output is a
    range of completion dates, with associated
    probabilities of occurrence

5
A Schedule Without Risk
Input
Output
6
A Schedule With Risk
Input
Output
7
Consider the Possibilities
  • In the presence of risk, each of Design-Code-Test
    can take on 7200 possible values (in minutes)
  • Leads to 3.7 x 10^11 possible outcomes
  • Not all are distinct, nor equally likely
  • We would like to know how they are distributed
  • Too many to enumerate, so the distribution of
    finish dates is determined by sampling
  • Literally, take samples until a clear picture of
    the underlying distribution emerges

8
Monte Carlo Simulation
  • Select and apply a random (remaining) duration
    from the specified distribution of each
    lowest-level task
  • Recalculate the network
  • Accumulate summary-level statistics
  • Repeat until the underlying distribution is
    revealed

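The four steps above can be sketched in a few lines of Python. The serial three-task network and its triangular three-point estimates below are illustrative assumptions, not the talk's data.

```python
import numpy as np

rng = np.random.default_rng(6)
n_trials = 20_000

# Hypothetical serial network: Design -> Code -> Test,
# each with a (min, most likely, max) three-point estimate
tasks = [(8, 10, 16), (15, 20, 40), (5, 8, 20)]

finishes = np.zeros(n_trials)
for i in range(n_trials):
    # 1. sample a random duration for each lowest-level task
    durations = [rng.triangular(a, m, b) for a, m, b in tasks]
    # 2. recalculate the network (serial, so just a sum)
    finishes[i] = sum(durations)

# 3. accumulate summary statistics over all trials
p80 = np.quantile(finishes, 0.80)
```

Step 4 (repeat until the distribution is revealed) corresponds to raising `n_trials` until the output histogram stabilizes.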
9
Random Number Generation
  • Sampling requires that we generate random numbers
    from specified distributions
  • General technique is known as the inversion
    method
  • Requires only the cumulative distribution
    function (CDF) of the desired distribution

10
The Inversion Method
  • Pick a random number between 0 and 1
  • Feed that value to the inverse CDF to get a
    random sample from the specified distribution
  • Events bunch where the slope of the CDF is
    large

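A minimal sketch of the inversion method for a triangular(min, mode, max) distribution; the closed-form inverse CDF below is standard, and the parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def triangular_inverse_cdf(u, a, m, b):
    """Inverse CDF of a triangular(a, m, b) distribution.

    Maps a uniform [0, 1] sample u to a triangular sample;
    samples bunch where the CDF is steep (near the mode m).
    """
    u = np.asarray(u, dtype=float)
    cut = (m - a) / (b - a)  # CDF value at the mode
    left = a + np.sqrt(u * (b - a) * (m - a))
    right = b - np.sqrt((1 - u) * (b - a) * (b - m))
    return np.where(u < cut, left, right)

# Pick uniforms, push them through the inverse CDF
u = rng.uniform(size=100_000)
samples = triangular_inverse_cdf(u, a=10, m=15, b=30)
```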
11
Demo
  • In which the inversion method is shown
  • Excel (SCEA1.xls)
  • Risk (CORR0.mpp)

12
Are We Done?
  • Yes, if individual tasks are independent
  • We know how to generate independent random
    durations for lowest-level tasks
  • No, if individual tasks are correlated
  • Durations at lowest level can no longer be
    generated in isolation

13
The Problem In Principle
  • Instead of N single-variable probability
    distributions, we are modeling a single
    N-variable probability distribution
  • Our three-point estimates and distribution curves
    define the marginal distributions of the
    underlying multivariate distribution
  • At least, we hope they do
  • To capture the interrelationships, we also need
    to specify the correlations among tasks

14
The Problem In Practice
  • Both plots below show Code vs. Test durations,
    for the distributions specified
  • Simple application of the inversion method
    produces the plot on the left: values are
    uncorrelated
  • We want the plot on the right: values are
    correlated

15
What Is Correlation?
  • Literally co-relation
  • Two or more variables share a relationship
  • Mathematically, correlation measures dependence
  • P(A, B) = P(A) P(B) + something
  • If independent, the something is zero: the joint
    distribution is just the product of the marginal
    distributions

16
How Is Correlation Measured?
  • Let me count the ways
  • Product-moment (Pearson)
  • Rank (Spearman, Kendall, ...)
  • Others
  • Product-moment correlation is the appropriate
    measure for cost-schedule risk analysis
  • Durations and costs are interval, not ordinal
    measures
  • It arises naturally in the consideration of the
    variance of sums of random variables, e.g.
    summary task durations and costs
  • Whether product-moment correlation can be well
    modeled by other measures is a matter of debate

17
Product-Moment Correlation
  • Covariance of X and Y
  • Cov(X, Y) = E[(X - μx)(Y - μy)]
  • = E[XY] - μx μy = Cov(Y, X)
  • Intimately related to variance
  • Var(X) = E[(X - μx)^2] = Cov(X, X)
  • Scale by the standard deviations σ to give the
    correlation coefficient, ρ
  • ρxy = Cov(X, Y) / (σx σy)

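These identities are easy to verify numerically on made-up data:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # made-up data
y = np.array([1.0, 3.0, 2.0, 6.0])

# Cov(X, Y) = E[(X - mu_x)(Y - mu_y)] = E[XY] - mu_x mu_y
cov_xy = ((x - x.mean()) * (y - y.mean())).mean()
# rho_xy = Cov(X, Y) / (sigma_x sigma_y)
rho = cov_xy / (x.std() * y.std())
```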
18
Correlation Matrices
  • Suppose M is a k x m matrix whose columns Mi are
    random variables
  • If we form Vi = (Mi - μi) / σi, it follows from
    the definitions above that
  • C = (1/k) V^T V
  • is a matrix of correlation coefficients
  • If the Mi are uncorrelated, then C = I, the
    identity matrix

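A quick numeric check of C = (1/k) V^T V, on a random made-up M:

```python
import numpy as np

rng = np.random.default_rng(7)
k = 500
M = rng.normal(size=(k, 3))               # columns Mi are random variables

V = (M - M.mean(axis=0)) / M.std(axis=0)  # Vi = (Mi - mu_i) / sigma_i
C = (V.T @ V) / k                         # C = (1/k) V^T V
```

The result agrees with NumPy's own correlation routine, and has the unit diagonal a correlation matrix requires.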
19
The Cholesky Decomposition
  • For any correlation matrix C, a lower triangular
    matrix L can be found such that
  • C = LL^T
  • L is called the Cholesky decomposition of the
    matrix C

20
Inconsistent Correlation Matrices
  • Correlations cannot be specified arbitrarily
  • If two tasks are each correlated with a third,
    they are necessarily correlated with each other
  • Mathematically, a consistent correlation matrix
    is positive semi-definite
  • Eigenvalues are non-negative
  • Determinants of principal minors are non-negative
  • Inconsistent matrices are easy to produce
  • Lurie and Goldberg describe an adjustment
    procedure for inconsistent matrices

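The positive semi-definite test is a one-liner. The matrix below is a hypothetical inconsistent specification: two tasks each correlated 0.9 with a third, but 0.0 with each other.

```python
import numpy as np

# Hypothetical specification: tasks 1 and 2 each correlate 0.9
# with task 3 but 0.0 with each other -- not jointly possible
C_bad = np.array([[1.0, 0.0, 0.9],
                  [0.0, 1.0, 0.9],
                  [0.9, 0.9, 1.0]])

eigvals = np.linalg.eigvalsh(C_bad)    # all >= 0 iff consistent
consistent = bool(eigvals.min() >= 0)
```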
21
Matrix Adjustment Procedure
  • Generate a correlation matrix R = LL^T from an
    initial lower triangular matrix L
  • Calculate the weighted difference between the
    terms of R and the original matrix C
  • D = ½ ΣΣ wij (cij - rij)^2
  • Minimize D by iterating over the terms in L,
    constrained to ensure unit diagonals in R
  • The weights W are supplied by the user
  • Weights on less certain terms should be much
    smaller than the default value of 1.0

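A rough sketch of the adjustment idea, assuming a hypothetical inconsistent target and using SciPy's general-purpose BFGS in place of the authors' custom routine. Parameterizing L with unit-norm rows enforces the unit-diagonal constraint on R automatically.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inconsistent target (not the talk's data)
C = np.array([[1.0, 0.0, 0.9],
              [0.0, 1.0, 0.9],
              [0.9, 0.9, 1.0]])
W = np.ones_like(C)              # user weights, all at the default 1.0

m = C.shape[0]
iu = np.triu_indices(m, k=1)     # super-diagonal index pairs

def build_L(x):
    # Unit-norm rows guarantee unit diagonals in R = L L^T
    L = np.zeros((m, m))
    L[np.tril_indices(m)] = x
    return L / np.linalg.norm(L, axis=1, keepdims=True)

def distance(x):
    R = build_L(x) @ build_L(x).T
    return 0.5 * np.sum(W[iu] * (C[iu] - R[iu]) ** 2)

x0 = np.eye(m)[np.tril_indices(m)]   # start from L = I
res = minimize(distance, x0, method="BFGS")
C_adj = build_L(res.x) @ build_L(res.x).T
```

Because C_adj is built as LL^T, it is positive semi-definite (consistent) by construction; the optimizer only makes it as close to C as the geometry allows.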
22
Demo
  • In which the Lurie-Goldberg matrix adjustment
    procedure is shown
  • Excel (SCEA6.xls)
  • Risk (SCEA.mpp)

23
Fun With Numbers
  • If N is any k x m matrix with standardized,
    uncorrelated columns, then V = NL^T is also in
    standard form, and N^T N = kI
  • If C is a correlation matrix and C = LL^T, it
    follows that
  • V^T V = (NL^T)^T (NL^T) = L N^T N L^T = L(kI)L^T
    = k LL^T = kC
  • Therefore, the columns of V = NL^T are correlated
    according to the matrix C

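The construction can be checked directly with NumPy (the target matrix C here is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 50_000
C = np.array([[1.0, 0.8, 0.6],   # made-up target correlations
              [0.8, 1.0, 0.7],
              [0.6, 0.7, 1.0]])

L = np.linalg.cholesky(C)        # C = L L^T, L lower triangular
N = rng.standard_normal((k, 3))  # standardized, uncorrelated columns
V = N @ L.T                      # columns now correlated per C

R = np.corrcoef(V, rowvar=False) # realized correlations, close to C
```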
24
Is V the Object of Our Desire?
  • V = NL^T has the desired correlations, by
    construction
  • V is in standard form, so mean and standard
    deviation are easily adjusted
  • But, unless the columns of N are standard
    normals, the distributions Vi will not match Ni
  • Vi = li1 N1 + li2 N2 + ... + lii Ni
  • A linear combination of normals is itself normal,
    but the same closure does not hold for other
    distributions

25
Demo
  • In which the desired correlations, but not
    necessarily the desired marginal distributions,
    are obtained
  • Excel (SCEA2.xls)

26
More Fun with Numbers
  • Well, we can generate correlated normals
  • Recall the inversion method: we might be able to
    do something with correlated uniforms
  • Easy to generate: just apply the standard normal
    CDF Φ to our correlated standard normals
  • U = Φ(NL^T)
  • Then apply the desired inverse CDF to U
  • V = F^-1(U) = F^-1(Φ(NL^T))

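A sketch of the version-inversion chain U = Φ(NL^T), V = F⁻¹(U), using SciPy's normal CDF and triangular inverse CDF; the marginal distributions and the target correlation are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, triang

rng = np.random.default_rng(2)
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
L = np.linalg.cholesky(C)

N = rng.standard_normal((50_000, 2))
U = norm.cdf(N @ L.T)            # correlated uniforms, U = Phi(N L^T)

# Apply the desired inverse CDFs: triangular durations here
# (scipy's c is the mode's relative position within [loc, loc+scale])
V = np.column_stack([
    triang.ppf(U[:, 0], c=0.25, loc=10, scale=20),  # tri(10, 15, 30)
    triang.ppf(U[:, 1], c=0.50, loc=5,  scale=10),  # tri(5, 10, 15)
])

R = np.corrcoef(V, rowvar=False)  # close to, but not exactly, 0.8
```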
27
The Version-Inversion Method
  • Correlated normals on x-axis become correlated
    uniforms on the y-axis, then correlated whatever
    back on the x-axis
  • At least one of these transformations is
    non-linear

28
Is V the Object of Our Desire?
  • V = F^-1(Φ(NL^T)) has the desired distributions,
    by construction
  • Any uniform distribution will work as well as any
    other as input to the inversion method
  • But, the transformations are non-linear, so
    correlations are not necessarily preserved
  • Once again, the method is exact only for normal
    distributions

29
Demo
  • In which marginal distributions are preserved,
    but correlations are not
  • Excel (SCEA3.xls)
  • Risk (CORR1.mpp)

30
Sampling Variability
  • In any Monte Carlo simulation, variations are
    expected from run to run
  • A different subset of the underlying distribution
    is sampled each time
  • Sampling variability can be greatly reduced by
    starting with a clean set of normals
  • Method due to R.C.H. Cheng [2] produces
    completely uncorrelated normals with exactly zero
    mean and unit variance

[2] Generation of Multivariate Normal Samples with
Given Sample Mean and Covariance Matrix, Journal of
Statistical Computation and Simulation, 1985, Vol.
21, pp. 39-49.
31
Sample Results: Un-iterated Method with Full
Sampling Variability (0)
32
Sample Results: Un-iterated Method with Reduced
Sampling Variability (3)
33
The Lurie-Goldberg Algorithm
  • Developed by Dr. Phillip Lurie and Dr. Matt
    Goldberg
  • Iteratively modifies the Cholesky decomposition
    matrix to achieve the best possible agreement
    with target correlations
  • Basic and Modified implementations

34
Steps of the Basic Method
  • Begin as before (version-inversion)
  • V = F^-1(Φ(NL^T))
  • Calculate the realized correlations R among the
    columns of V
  • Calculate a scalar distance D between the
    realized and target correlations C
  • D = ½ ΣΣ (cij - rij)^2
  • (sum taken over super-diagonal terms only)
  • Minimize D by iterating over the terms in L,
    constrained to ensure unit diagonals in R

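A toy 2x2 sketch of the basic iteration, with SciPy's Nelder-Mead standing in for the talk's custom optimizer and an assumed triangular marginal. The single free parameter is the sub-diagonal term of L; normalizing its row keeps the diagonal of R at unity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, triang

rng = np.random.default_rng(3)
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
N = rng.standard_normal((2_000, 2))  # normals held fixed across iterations

def realized(x):
    """Realized correlation matrix of V = F^-1(Phi(N L^T))."""
    # Unit-norm rows of L keep the diagonal of R at 1
    L = np.array([[1.0, 0.0],
                  [x[0], np.sqrt(max(1.0 - x[0] ** 2, 1e-12))]])
    U = norm.cdf(N @ L.T)
    V = np.column_stack([triang.ppf(U[:, j], c=0.25, loc=10, scale=20)
                         for j in range(2)])
    return np.corrcoef(V, rowvar=False)

def distance(x):
    # D = 1/2 * sum over super-diagonal terms of (c_ij - r_ij)^2
    return 0.5 * (C[0, 1] - realized(x)[0, 1]) ** 2

res = minimize(distance, x0=[0.8], method="Nelder-Mead")
rho_hat = realized(res.x)[0, 1]   # realized correlation after iteration
```

Because the normals N are held fixed, the optimizer drives the realized correlation onto the target for this particular sample, which is exactly what iterates out the sampling variability discussed on the next slides.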
35
Demo
  • In which the Lurie-Goldberg Basic algorithm is
    shown
  • Excel (SCEA4.xls)
  • Risk (CORR1.mpp)

36
Sample Results: Iterated Method without Sampling
Variability (1)
37
Modified Sampling Algorithm
  • As noted, a Monte Carlo simulation will
    experience run-to-run variations
  • The basic Lurie-Goldberg algorithm iterates out
    this natural variability
  • Variation appears in solution time, not result
  • To re-introduce sampling variability, LG suggest
    a modified algorithm
  • For schedule risk analysis, primary advantage is
    probably greater speed, not greater variability

38
Steps of the Modified Method
  • Run the basic algorithm with a limited number of
    samples (e.g. 100)
  • N is constructed to have exactly zero mean and
    unit variance (again, Cheng's method)
  • Use modified decomposition matrix L to generate
    the full set of samples
  • Once again, V = F^-1(Φ(NL^T)), but with the
    iterated matrix for L and a full set (e.g. 1000)
    of uncorrelated normals for N

39
Sample Results: Iterated Method with Full
Sampling Variability (2)
40
Comparison of Results by Method (Costs)
2500 Samples
  • Sample problem from presentation by Book and
    Young to 24th DODCAS, Sept 1990
  • 4 skewed triangular distributions
  • Apparently arbitrary correlations, 0.9-0.5
  • Cross check by analytic fit to beta distribution

41
Comparison of Results by Method (Completion Dates)
2500 Samples
  • Design-Code-Test sample
  • 3 skewed triangular distributions
  • All correlations 0.8

42
Implementation Details
  • Custom non-linear constrained optimization
    routine
  • Conjugate-gradient method with quadratic penalty
    terms, some domain-specific enhancements
  • Similar to Gauss-Newton method analyzed by Lurie
    and Goldberg
  • Both are successive line-minimization methods
    with quadratic convergence
  • Conjugate-gradient method requires less storage

43
Performance Factors
  • Number of correlations
  • Minimization routine searches O(m^2) dimensions
  • Number of samples
  • Distance function requires O(n m^2)
    multiplications and O(nm) marginal distribution
    calculations per evaluation
  • Unrealizable problem specification
  • Results in extended fruitless searching
  • More likely to be a problem when marginal
    distributions are far from normal

44
Solution Time is a Random Variable
Sampling variability is not eliminated; it
manifests as variation in solution time
45
Why All the Fuss?
  • In schedule analysis, the durations of summary
    tasks are sums of random variables
  • Variance of sums of random variables is affected
    by correlation
  • Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
  • If Cov(X, Y) > 0, neglecting correlation leads to
    an underestimate of the standard deviation of
    summary tasks

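A numeric confirmation with correlated standard normals (ρ = 0.8 assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
rho = 0.8
C = np.array([[1.0, rho],
              [rho, 1.0]])
XY = rng.standard_normal((200_000, 2)) @ np.linalg.cholesky(C).T

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) = 1 + 1 + 2 * 0.8 = 3.6;
# neglecting the correlation would predict only 2.0
var_sum = (XY[:, 0] + XY[:, 1]).var()
```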
46
Demo
  • In which the effect of correlation on
    summary-level standard deviation is shown
  • Risk (CORRDEMO.mpp)

47
Summary
  • How correlation enters a Monte-Carlo simulation
  • Lurie-Goldberg adjustment procedure for
    inconsistent correlation matrices
  • Non-iterated and iterated (Lurie-Goldberg)
    methods for generating correlated random
    variables
  • Implementation and performance factors
  • Why correlation matters in schedule risk analysis

48
Addendum -- What about Rank Correlation?
  • Commercial software with rank-correlation
    capability is often used to model product-moment
    correlation
  • Point: Rank correlation is not product-moment
    correlation, and using it can lead to misleading
    (if not meaningless) results.
  • Counter-point: So you say, but it seems to work
    pretty well. Go figure.

49
How Are These Rank Correlated Numbers Generated?
  • Almost certainly, by an algorithm due to R.L.
    Iman and W.J. Conover [3]
  • But ask your vendor, to be sure
  • Observation: The Iman-Conover algorithm uses
    product-moment correlation to approximate rank
    correlation
  • Yes, really

[3] A Distribution-Free Approach to Inducing Rank
Correlation Among Input Variables, Communications
in Statistics - Simulation and Computation, Vol.
11, 1982, pp. 311-334.
50
The Iman-Conover Algorithm
  • Generate a set of uncorrelated normal samples
  • Multiply by the Cholesky decomposition of the
    target correlation matrix to produce correlated
    normal samples
  • Determine the rank order of the correlated normal
    samples
  • Generate uncorrelated samples from any desired
    set of target distributions
  • Rearrange the target samples so they have the
    same rank-ordering as the correlated normal
    samples

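The five steps translate almost line-for-line into NumPy; the marginals and target correlation below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 10_000
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])

# Steps 1-2: uncorrelated normals, then Cholesky to correlate them
scores = rng.standard_normal((k, 2)) @ np.linalg.cholesky(C).T
# Step 3: rank order of the correlated normal "scores"
ranks = np.argsort(np.argsort(scores, axis=0), axis=0)

# Step 4: independent samples from the desired marginals (triangular here)
target = rng.triangular(left=10, mode=15, right=30, size=(k, 2))

# Step 5: rearrange each target column into the scores' rank order
V = np.take_along_axis(np.sort(target, axis=0), ranks, axis=0)
```

Each column of V is an exact permutation of the target samples, so the marginals are preserved; only the pairing between columns changes.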
51
Iman-Conover -- Commentary
  • The observed rank correlations will always be
    approximate
  • Rank correlation is not product-moment
    correlation
  • If the target distributions are normal, the
    product-moment correlations will be exact (within
    sampling error)
  • If the target distributions are non-normal, the
    resulting product-moment correlations will be
    inexact, in the now-familiar way
  • Reordering the target distributions is a
    non-linear transformation comparable to
    version-inversion

52
Sample Results: Rank Correlations from Iman-Conover
53
Sample Results: Product-Moment Correlations from
Iman-Conover
54
The Lurie-Goldberg-Iman-Conover Method (LGIC)
  • Iman-Conover is ultimately based on the Cholesky
    decomposition
  • Consequently, the Lurie-Goldberg algorithm is
    easily extended to incorporate Iman-Conover
  • Simply substitute the IC rank-ordering technique
    for the version-inversion method
  • Calculate observed product-moment correlations
    and iterate as before
  • Of course, we could iterate to improve rank
    correlation just as easily, if desired

55
Sample Results: Product-Moment Correlations from
LGIC
56
Sample Results: Rank Correlations from LGIC
57
Demo
  • In which the Lurie-Goldberg-Iman-Conover
    algorithm is exercised
  • Risk (CORR1.mpp)

58
Comparison LG vs LGIC
  • For an equal number of samples, Lurie-Goldberg
    (LG) is significantly faster than
    Lurie-Goldberg-Iman-Conover (LGIC)
  • For common distributions, it is faster to
    generate samples than to sort them
  • But, LGIC supports Latin Hypercube sampling,
    which can reduce the number of samples required
  • Not to mention producing prettier distributions
  • LGIC has an inherent limit in resolution,
    especially for very small sample sizes
  • Small refinements to Cholesky decomposition may
    not result in a new ordering
  • Effect is unnoticeable unless target correlations
    are already at the edge of feasibility

59
Summary (of Addendum)
  • The Iman-Conover algorithm for generating
    rank-correlated samples uses product-moment
    correlation under the hood
  • Hence its rather unexpected success in modeling
    product-moment correlations
  • As with all (Cholesky-based) algorithms, it will
    be particularly good for normal and near-normal
    distributions
  • The Lurie-Goldberg algorithm is easily extended
    to incorporate the Iman-Conover algorithm
  • Combined approach is slower than the original for
    common distributions, but capable of comparable
    accuracy
  • Combined approach supports Latin Hypercube
    sampling

60
Questions?
  • The Excel files used in this demonstration are
    available on request. Please email
    jprice@cs-solutions.com