1 An Implementation of the Lurie-Goldberg Algorithm in Schedule Risk Analysis
- Presented to the SCEA 2002 National Conference
- Scottsdale, Arizona, 14 June 2002
- Jim Price
- C/S Solutions, Inc.
2 Introduction
- Risk 2.0 is a schedule risk analysis add-in to Microsoft Project
- The product implements the Lurie-Goldberg¹ algorithms for generating correlated variables and for adjusting inconsistent correlation matrices
- Details follow
¹ "An Approximate Method for Sampling Correlated Random Variables from Partially-Specified Distributions," Management Science, Vol. 44, No. 2, February 1998, pp. 203-218.
3 Overview
- How correlation enters a Monte Carlo simulation
- Lurie-Goldberg adjustment procedure for inconsistent correlation matrices
- Non-iterated and iterated (Lurie-Goldberg) methods for generating correlated random variables
- Implementation and performance factors
- Why correlation matters in schedule risk analysis
- Addendum: What about rank correlation?
4 Schedule Risk Analysis
- Instead of a single-point estimate for duration, the user supplies
- A three-point estimate: minimum, most likely, and maximum
- A distribution curve: uniform, triangular, normal, beta, other
- Instead of a single completion date, output is a range of completion dates, with associated probabilities of occurrence
5 A Schedule Without Risk
(Figure: Input / Output)
6 A Schedule With Risk
(Figure: Input / Output)
7 Consider the Possibilities
- In the presence of risk, each of Design-Code-Test can take on 7200 possible values (in minutes)
- Leads to 3.7 × 10^11 possible outcomes
- Not all are distinct, nor equally likely
- We would like to know how they are distributed
- Too many to enumerate, so the distribution of finish dates is determined by sampling
- Literally, take samples until a clear picture of the underlying distribution emerges
8 Monte Carlo Simulation
- Select and apply a random (remaining) duration from the specified distribution of each lowest-level task
- Recalculate the network
- Accumulate summary-level statistics
- Repeat until the underlying distribution is revealed
9 Random Number Generation
- Sampling requires that we generate random numbers from specified distributions
- The general technique is known as the inversion method
- Requires only the cumulative distribution function (CDF) of the desired distribution
10 The Inversion Method
- Pick a random number between 0 and 1
- Feed that value to the inverse CDF to get a random sample from the specified distribution
- Events bunch where the slope of the CDF is large
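The inversion method described above can be sketched in a few lines. The triangular distribution and its parameters here are illustrative, not taken from the demo files.

```python
import random

def triangular_inverse_cdf(u, low, mode, high):
    """Inverse CDF of a triangular distribution: maps u in [0, 1]
    to a sample from Triangular(low, mode, high)."""
    cut = (mode - low) / (high - low)   # CDF value at the mode
    if u < cut:
        return low + ((high - low) * (mode - low) * u) ** 0.5
    return high - ((high - low) * (high - mode) * (1.0 - u)) ** 0.5

random.seed(42)
# Pick uniform random numbers and push each through the inverse CDF;
# samples bunch where the CDF is steep (near the mode).
samples = [triangular_inverse_cdf(random.random(), 10, 15, 30)
           for _ in range(10_000)]
```

Any uniform generator works as the input; only the inverse CDF changes from one target distribution to another.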
11 Demo
- In which the inversion method is shown
- Excel (SCEA1.xls)
- Risk (CORR0.mpp)
12 Are We Done?
- Yes, if individual tasks are independent
- We know how to generate independent random durations for lowest-level tasks
- No, if individual tasks are correlated
- Durations at the lowest level can no longer be generated in isolation
13 The Problem In Principle
- Instead of N single-variable probability distributions, we are modeling a single N-variable probability distribution
- Our three-point estimates and distribution curves define the marginal distributions of the underlying multivariate distribution
- At least, we hope they do
- To capture the interrelationships, we also need to specify the correlations among tasks
14 The Problem In Practice
- Both plots below show Code vs. Test durations, for the distributions specified
- Simple application of the inversion method produces the plot on the left: values are uncorrelated
- We want the plot on the right: values are correlated
15 What Is Correlation?
- Literally, co-relation
- Two or more variables share a relationship
- Mathematically, measures dependence
- P(A, B) = P(A) P(B) + something
- If independent, the something is zero: the joint distribution is just the product of the marginal distributions
16 How Is Correlation Measured?
- Let me count the ways
- Product-moment (Pearson)
- Rank (Spearman, Kendall, ...)
- Others
- Product-moment correlation is the appropriate measure for cost-schedule risk analysis
- Durations and costs are interval, not ordinal, measures
- It arises naturally in the consideration of the variance of sums of random variables, e.g. summary task durations and costs
- Whether product-moment correlation can be well modeled by other measures is a matter of debate
17 Product-Moment Correlation
- Covariance of X and Y
- Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]
- = E[XY] - μ_X μ_Y = Cov(Y, X)
- Intimately related to variance
- Var(X) = E[(X - μ_X)^2] = Cov(X, X)
- Scale by the standard deviations σ to give the correlation coefficient, ρ
- ρ_XY = Cov(X, Y) / (σ_X σ_Y)
18 Correlation Matrices
- Suppose M is a k × m matrix whose columns M_i are random variables
- If we form V_i = (M_i - μ_i) / σ_i, it follows from the definitions above that
- C = (1/k) V^T V
- is a matrix of correlation coefficients
- If the M_i are uncorrelated, then C = I, the identity matrix
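The construction above is easy to check numerically; the sample sizes and seed below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
k, m = 5000, 3
M = rng.normal(size=(k, m))   # k observations of m (here independent) variables

# Standardize each column: V_i = (M_i - mean_i) / std_i
V = (M - M.mean(axis=0)) / M.std(axis=0)

# C = (1/k) V^T V is then the matrix of sample correlation coefficients
C = (V.T @ V) / k
```

The diagonal of C is exactly 1; since the columns of M were generated independently, the off-diagonal terms are near zero, up to sampling error of order 1/sqrt(k).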
19 The Cholesky Decomposition
- For any correlation matrix C, a lower triangular matrix L can be found such that
- C = L L^T
- L is called the Cholesky decomposition of the matrix C
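For a consistent matrix, the decomposition is one library call; the example matrix here is illustrative.

```python
import numpy as np

# A consistent (positive semi-definite) correlation matrix
C = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])

# numpy returns the lower triangular factor L, with C = L @ L.T
L = np.linalg.cholesky(C)
```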
20 Inconsistent Correlation Matrices
- Correlations cannot be specified arbitrarily
- If two tasks are each correlated with a third, they are necessarily correlated with each other
- Mathematically, a consistent correlation matrix is positive semi-definite
- Eigenvalues are non-negative
- Determinants of principal minors are non-negative
- Inconsistent matrices are easy to produce
- Lurie and Goldberg describe an adjustment procedure for inconsistent matrices
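The eigenvalue test above translates directly to code; the two example matrices are illustrative.

```python
import numpy as np

def is_consistent(C, tol=1e-10):
    """A correlation matrix is consistent iff it is positive
    semi-definite, i.e. all of its eigenvalues are non-negative."""
    return bool(np.all(np.linalg.eigvalsh(C) >= -tol))

# Each entry is fine in isolation, but jointly impossible: A-B and
# A-C are strongly positive while B-C is strongly negative.
inconsistent = np.array([[ 1.0,  0.9,  0.9],
                         [ 0.9,  1.0, -0.9],
                         [ 0.9, -0.9,  1.0]])

consistent = np.array([[1.0, 0.9, 0.9],
                       [0.9, 1.0, 0.9],
                       [0.9, 0.9, 1.0]])
```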
21 Matrix Adjustment Procedure
- Generate a correlation matrix R = L L^T from an initial lower triangular matrix L
- Calculate the weighted difference between the terms of R and the original matrix C
- D = ½ ΣΣ w_ij (c_ij - r_ij)^2
- Minimize D by iterating over the terms in L, constrained to ensure unit diagonals in R
- The weights W are supplied by the user
- Less certain terms should be much smaller than the default value of 1.0
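A rough sketch of this adjustment, using scipy's general-purpose optimizer rather than the product's own routine; the constraint handling (normalizing each row of L to the unit sphere) and the example matrix are illustrative choices, not taken from the source.

```python
import numpy as np
from scipy.optimize import minimize

def adjust(C, W=None):
    """Find R = L L^T minimizing D = 1/2 * sum w_ij (c_ij - r_ij)^2.
    Rows of L are kept on the unit sphere, which enforces the
    unit-diagonal constraint on R automatically."""
    m = C.shape[0]
    W = np.ones_like(C) if W is None else W
    idx = np.tril_indices(m)               # free parameters: lower triangle of L

    def build_R(x):
        L = np.zeros((m, m))
        L[idx] = x
        L /= np.linalg.norm(L, axis=1, keepdims=True)   # unit rows -> unit diagonal
        return L @ L.T

    def D(x):
        return 0.5 * np.sum(W * (C - build_R(x)) ** 2)

    x0 = np.eye(m)[idx]                    # start from L = I, i.e. R = I
    return build_R(minimize(D, x0).x)

# An inconsistent target (it has a negative eigenvalue) and its adjustment
C_bad = np.array([[ 1.0,  0.9,  0.9],
                  [ 0.9,  1.0, -0.9],
                  [ 0.9, -0.9,  1.0]])
R_adj = adjust(C_bad)
```

Whatever the optimizer returns, R is a valid correlation matrix by construction: it is symmetric, positive semi-definite, and has a unit diagonal.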
22 Demo
- In which the Lurie-Goldberg matrix adjustment procedure is shown
- Excel (SCEA6.xls)
- Risk (SCEA.mpp)
23 Fun With Numbers
- If N is any k × m matrix with standardized, uncorrelated columns, then V = N L^T is also in standard form, and N^T N = kI
- If C is a correlation matrix and C = L L^T, it follows that
- V^T V = (N L^T)^T (N L^T) = L N^T N L^T = L (kI) L^T = k L L^T = kC
- Therefore, the columns of V = N L^T are correlated according to the matrix C
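The identity above is the whole generator; a two-variable check, with an arbitrary target correlation of 0.8:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 20_000
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
L = np.linalg.cholesky(C)

N = rng.standard_normal((k, 2))   # (approximately) standardized, uncorrelated columns
V = N @ L.T                       # columns of V are correlated according to C

sample_corr = np.corrcoef(V, rowvar=False)
```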
24 Is V the Object of Our Desire?
- V = N L^T has the desired correlations, by construction
- V is in standard form, so mean and standard deviation are easily adjusted
- But, unless the columns of N are standard normals, the distributions of the V_i will not match the N_i
- V_i = l_i1 N_1 + l_i2 N_2 + ... + l_ii N_i
- A linear combination of normals is itself normal, but the same is not generally true of other distributions
25 Demo
- In which the desired correlations, but not necessarily the desired marginal distributions, are obtained
- Excel (SCEA2.xls)
26 More Fun with Numbers
- Well, we can generate correlated normals
- Recall the inversion method: we might be able to do something with correlated uniforms
- Easy to generate: just apply the standard normal CDF to our correlated standard normals
- U = Φ(N L^T)
- Then apply the desired inverse CDF to U
- V = F^-1(U) = F^-1(Φ(N L^T))
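The chain V = F^-1(Φ(N L^T)) can be sketched with scipy's distribution objects; the triangular parameters and target correlation are illustrative.

```python
import numpy as np
from scipy.stats import norm, triang

rng = np.random.default_rng(2)
k = 20_000
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
L = np.linalg.cholesky(C)

N = rng.standard_normal((k, 2))
U = norm.cdf(N @ L.T)        # correlated uniforms on (0, 1)

# Desired inverse CDF: a triangular(low=10, mode=15, high=30) duration.
# scipy's triang uses c = (mode - low)/(high - low), loc = low, scale = high - low.
V = triang.ppf(U, c=0.25, loc=10, scale=20)

realized = np.corrcoef(V, rowvar=False)[0, 1]
```

The marginals are now exactly triangular, but because F^-1(Φ(·)) is non-linear, the realized product-moment correlation falls somewhat short of the 0.8 target.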
27 The Version-Inversion Method
- Correlated normals on the x-axis become correlated uniforms on the y-axis, then correlated whatever back on the x-axis
- At least one of these transformations is non-linear
28 Is V the Object of Our Desire?
- V = F^-1(Φ(N L^T)) has the desired distributions, by construction
- Any uniform distribution will work as well as any other as input to the inversion method
- But, the transformations are non-linear, so correlations are not necessarily preserved
- Once again, the method is exact only for normal distributions
29 Demo
- In which marginal distributions are preserved, but correlations are not
- Excel (SCEA3.xls)
- Risk (CORR1.mpp)
30 Sampling Variability
- In any Monte Carlo simulation, variations are expected from run to run
- A different subset of the underlying distribution is sampled each time
- Sampling variability can be greatly reduced by starting with a clean set of normals
- A method due to R.C.H. Cheng² produces completely uncorrelated normals with exactly zero mean and unit variance
² "Generation of Multivariate Normal Samples with Given Sample Mean and Covariance Matrix," Journal of Statistical Computation and Simulation, 1985, Vol. 21, pp. 39-49.
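Cheng's construction is not reproduced here; the sketch below achieves the same end state (exactly zero mean, unit variance, zero cross-correlation in the sample itself, not merely in expectation) by centering and then whitening with the Cholesky factor of the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(3)
k, m = 1000, 3
N = rng.standard_normal((k, m))

# Center each column, then whiten with the Cholesky factor A of the
# sample covariance S = A A^T, so that (1/k) N_clean^T N_clean = I exactly.
N = N - N.mean(axis=0)
A = np.linalg.cholesky((N.T @ N) / k)
N_clean = N @ np.linalg.inv(A).T
```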
31 Sample Results: Un-iterated Method with Full Sampling Variability (0)
32 Sample Results: Un-iterated Method with Reduced Sampling Variability (3)
33 The Lurie-Goldberg Algorithm
- Developed by Dr. Phillip Lurie and Dr. Matt Goldberg
- Iteratively modifies the Cholesky decomposition matrix to achieve the best possible agreement with target correlations
- Basic and Modified implementations
34 Steps of the Basic Method
- Begin as before (version-inversion)
- V = F^-1(Φ(N L^T))
- Calculate the realized correlations R among the columns of V
- Calculate a scalar distance D between the realized and target correlations C
- D = ½ ΣΣ (c_ij - r_ij)^2
- (sum taken over super-diagonal terms only)
- Minimize D by iterating over the terms in L, constrained to ensure unit diagonals in R
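A minimal sketch of the basic iteration, using scipy's Nelder-Mead simplex in place of the product's custom optimizer; the two-task triangular example, sample size, and row-normalization trick for the unit-diagonal constraint are all illustrative assumptions.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, triang

rng = np.random.default_rng(4)
k, m = 500, 2
target = np.array([[1.0, 0.8],
                   [0.8, 1.0]])
N = rng.standard_normal((k, m))           # held fixed for the whole iteration
idx = np.tril_indices(m)

def realized_corr(x):
    """Build L from the free parameters (rows normalized so R = L L^T
    has a unit diagonal), apply the version-inversion transform, and
    measure the realized correlations."""
    L = np.zeros((m, m))
    L[idx] = x
    L /= np.linalg.norm(L, axis=1, keepdims=True)
    V = triang.ppf(norm.cdf(N @ L.T), c=0.25, loc=10, scale=20)
    return np.corrcoef(V, rowvar=False)

def D(x):
    iu = np.triu_indices(m, 1)            # super-diagonal terms only
    return 0.5 * np.sum((target[iu] - realized_corr(x)[iu]) ** 2)

x0 = np.linalg.cholesky(target)[idx]      # start from the plain Cholesky factor
res = minimize(D, x0, method="Nelder-Mead")
R_final = realized_corr(res.x)
```

Because the same N is reused at every step, D is a deterministic function of L, and the iteration drives the realized correlation of the non-normal marginals toward the target.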
35 Demo
- In which the Lurie-Goldberg Basic algorithm is shown
- Excel (SCEA4.xls)
- Risk (CORR1.mpp)
36 Sample Results: Iterated Method without Sampling Variability (1)
37 Modified Sampling Algorithm
- As noted, a Monte Carlo simulation will experience run-to-run variations
- The basic Lurie-Goldberg algorithm iterates out this natural variability
- Variation appears in solution time, not in the result
- To re-introduce sampling variability, Lurie and Goldberg suggest a modified algorithm
- For schedule risk analysis, the primary advantage is probably greater speed, not greater variability
38 Steps of the Modified Method
- Run the basic algorithm with a limited number of samples (e.g. 100)
- N is constructed to have exactly zero mean and unit variance (again, Cheng's method)
- Use the modified decomposition matrix L to generate the full set of samples
- Once again, V = F^-1(Φ(N L^T)), but with the iterated matrix for L and a full set (e.g. 1000) of uncorrelated normals for N
39 Sample Results: Iterated Method with Full Sampling Variability (2)
40 Comparison of Results by Method (Costs), 2500 Samples
- Sample problem from a presentation by Book and Young to the 24th DODCAS, Sept 1990
- 4 skewed triangular distributions
- Apparently arbitrary correlations, 0.9-0.5
- Cross-check by analytic fit to a beta distribution
41 Comparison of Results by Method (Completion Dates), 2500 Samples
- Design-Code-Test sample
- 3 skewed triangular distributions
- All correlations 0.8
42 Implementation Details
- Custom non-linear constrained optimization routine
- Conjugate-gradient method with quadratic penalty terms, plus some domain-specific enhancements
- Similar to the Gauss-Newton method analyzed by Lurie and Goldberg
- Both are successive line-minimization methods with quadratic convergence
- The conjugate-gradient method requires less storage
43 Performance Factors
- Number of correlations
- Minimization routine searches O(m²) dimensions
- Number of samples
- Distance function requires O(nm²) multiplications and O(nm) marginal distribution calculations per evaluation
- Unrealizable problem specification
- Results in extended fruitless searching
- More likely to be a problem when marginal distributions are far from normal
44 Solution Time is a Random Variable
- Sampling variability is not eliminated; it manifests as variation in solution time
45 Why All the Fuss?
- In schedule analysis, the durations of summary tasks are sums of random variables
- The variance of a sum of random variables is affected by correlation
- Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
- If Cov(X, Y) > 0, neglecting correlation leads to an underestimate of the standard deviation of summary tasks
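The variance identity above is easy to verify numerically; the 0.8 correlation is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 50_000
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
V = rng.standard_normal((k, 2)) @ np.linalg.cholesky(C).T
X, Y = V[:, 0], V[:, 1]

var_sum = np.var(X + Y)                                    # what actually happens
naive = np.var(X) + np.var(Y)                              # estimate ignoring correlation
with_cov = np.var(X) + np.var(Y) + 2 * np.cov(X, Y, ddof=0)[0, 1]
```

With ρ = 0.8 and unit variances, the true variance of the sum is 3.6, nearly double the naive estimate of 2: exactly the underestimate the slide warns about.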
46 Demo
- In which the effect of correlation on summary-level standard deviation is shown
- Risk (CORRDEMO.mpp)
47 Summary
- How correlation enters a Monte Carlo simulation
- Lurie-Goldberg adjustment procedure for inconsistent correlation matrices
- Non-iterated and iterated (Lurie-Goldberg) methods for generating correlated random variables
- Implementation and performance factors
- Why correlation matters in schedule risk analysis
48 Addendum -- What about Rank Correlation?
- Commercial software with rank-correlation capability is often used to model product-moment correlation
- Point: Rank correlation is not product-moment correlation, and using it can lead to misleading (if not meaningless) results
- Counter-point: So you say, but it seems to work pretty well. Go figure.
49 How Are These Rank-Correlated Numbers Generated?
- Almost certainly, by an algorithm due to R.L. Iman and W.J. Conover³
- But ask your vendor, to be sure
- Observation: The Iman-Conover algorithm uses product-moment correlation to approximate rank correlation
- Yes, really
³ "A Distribution-Free Approach to Inducing Rank Correlation Among Input Variables," Communications in Statistics - Simulation and Computation, Vol. 11, 1982, pp. 311-334.
50 The Iman-Conover Algorithm
- Generate a set of uncorrelated normal samples
- Multiply by the Cholesky decomposition of the target correlation matrix to produce correlated normal samples
- Determine the rank order of the correlated normal samples
- Generate uncorrelated samples from any desired set of target distributions
- Rearrange the target samples so they have the same rank-ordering as the correlated normal samples
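The five steps above can be sketched directly; the target distributions (one triangular, one exponential) and the 0.7 correlation are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
k = 5000
C = np.array([[1.0, 0.7],
              [0.7, 1.0]])

# Steps 1-2: uncorrelated normals, correlated via the Cholesky factor of C
Z = rng.standard_normal((k, 2)) @ np.linalg.cholesky(C).T

# Step 3: rank order of each column of the correlated normals
ranks = Z.argsort(axis=0).argsort(axis=0)

# Step 4: uncorrelated samples from the desired target distributions
targets = np.column_stack([rng.triangular(10, 15, 30, k),
                           rng.exponential(5.0, k)])

# Step 5: reorder each target column to share the normals' rank ordering
V = np.take_along_axis(np.sort(targets, axis=0), ranks, axis=0)
```

The reordering leaves each marginal sample exactly intact while giving V the same rank structure as the correlated normals.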
51 Iman-Conover -- Commentary
- The observed rank correlations will always be approximate
- Rank correlation is not product-moment correlation
- If the target distributions are normal, the product-moment correlations will be exact (within sampling error)
- If the target distributions are non-normal, the resulting product-moment correlations will be inexact, in the now-familiar way
- Reordering the target distributions is a non-linear transformation comparable to version-inversion
52 Sample Results: Rank Correlations from Iman-Conover
53 Sample Results: Product-Moment Correlations from Iman-Conover
54 The Lurie-Goldberg-Iman-Conover Method (LGIC)
- Iman-Conover is ultimately based on the Cholesky decomposition
- Consequently, the Lurie-Goldberg algorithm is easily extended to incorporate Iman-Conover
- Simply substitute the IC rank-ordering technique for the version-inversion method
- Calculate observed product-moment correlations and iterate as before
- Of course, we could iterate to improve rank correlation just as easily, if desired
55 Sample Results: Product-Moment Correlations from LGIC
56 Sample Results: Rank Correlations from LGIC
57 Demo
- In which the Lurie-Goldberg-Iman-Conover algorithm is exercised
- Risk (CORR1.mpp)
58 Comparison: LG vs LGIC
- For an equal number of samples, Lurie-Goldberg (LG) is significantly faster than Lurie-Goldberg-Iman-Conover (LGIC)
- For common distributions, it is faster to generate samples than to sort them
- But, LGIC supports Latin Hypercube sampling, which can reduce the number of samples required
- Not to mention producing prettier distributions
- LGIC has an inherent limit in resolution, especially for very small sample sizes
- Small refinements to the Cholesky decomposition may not result in a new ordering
- Effect is unnoticeable unless target correlations are already at the edge of feasibility
59 Summary (of Addendum)
- The Iman-Conover algorithm for generating rank-correlated samples uses product-moment correlation under the hood
- Hence its rather unexpected success in modeling product-moment correlations
- As with all (Cholesky-based) algorithms, it will be particularly good for normal and near-normal distributions
- The Lurie-Goldberg algorithm is easily extended to incorporate the Iman-Conover algorithm
- The combined approach is slower than the original for common distributions, but capable of comparable accuracy
- The combined approach supports Latin Hypercube sampling
60 Questions?
- The Excel files used in this demonstration are available on request. Please email jprice_at_cs-solutions.com