1
An Implementation of the Lurie-Goldberg Algorithm
in Schedule Risk Analysis
  • Presented to the SCEA 2002 National Conference
  • Scottsdale, Arizona, 14 June 2002
  • Jim Price
  • C/S Solutions, Inc.

2
Introduction
  • Risk 2.0 is a schedule risk analysis add-in to
    Microsoft Project
  • The product implements the Lurie-Goldberg [1]
    algorithms for generating correlated variables
    and for adjusting inconsistent correlation
    matrices
  • Details follow

[1] An Approximate Method for Sampling Correlated
Random Variables from Partially-Specified
Distributions, Management Science, Vol. 44, No. 2,
February 1998, pp. 203-218.
3
Overview
  • How correlation enters a Monte-Carlo simulation
  • Lurie-Goldberg adjustment procedure for
    inconsistent correlation matrices
  • Non-iterated and iterated (Lurie-Goldberg)
    methods for generating correlated random
    variables
  • Implementation and performance factors
  • Why correlation matters in schedule risk analysis
  • Addendum What about rank correlation?

4
Schedule Risk Analysis
  • Instead of a single-point estimate for duration,
    the user supplies
  • A three-point estimate
  • minimum, most likely, and maximum
  • A distribution curve
  • uniform, triangular, normal, beta, other
  • Instead of a single completion date, output is a
    range of completion dates, with associated
    probabilities of occurrence

5
A Schedule Without Risk
Input
Output
6
A Schedule With Risk
Input
Output
7
Consider the Possibilities
  • In the presence of risk, each of Design-Code-Test
    can take on 7200 possible values (in minutes)
  • Leads to 3.7 x 10^11 possible outcomes
  • Not all are distinct, nor equally likely
  • We would like to know how they are distributed
  • Too many to enumerate, so the distribution of
    finish dates is determined by sampling
  • Literally, take samples until a clear picture of
    the underlying distribution emerges

8
Monte Carlo Simulation
  • Select and apply a random (remaining) duration
    from the specified distribution of each
    lowest-level task
  • Recalculate the network
  • Accumulate summary-level statistics
  • Repeat until the underlying distribution is
    revealed

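The four steps above can be sketched in a few lines of Python. The serial three-task network and its triangular three-point estimates below are illustrative assumptions, not the talk's data.

```python
import numpy as np

rng = np.random.default_rng(6)
n_trials = 20_000

# Hypothetical serial network: Design -> Code -> Test,
# each with a (min, most likely, max) three-point estimate
tasks = [(8, 10, 16), (15, 20, 40), (5, 8, 20)]

finishes = np.zeros(n_trials)
for i in range(n_trials):
    # 1. sample a random duration for each lowest-level task
    durations = [rng.triangular(a, m, b) for a, m, b in tasks]
    # 2. recalculate the network (serial, so just a sum)
    finishes[i] = sum(durations)

# 3. accumulate summary statistics over all trials
p80 = np.quantile(finishes, 0.80)
```

Step 4 (repeat until the distribution is revealed) corresponds to raising `n_trials` until the output histogram stabilizes.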
9
Random Number Generation
  • Sampling requires that we generate random numbers
    from specified distributions
  • General technique is known as the inversion
    method
  • Requires only the cumulative distribution
    function (CDF) of the desired distribution

10
The Inversion Method
  • Pick a random number between 0 and 1
  • Feed that value to the inverse CDF to get a
    random sample from the specified distribution
  • Events bunch where the slope of the CDF is
    large

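A minimal sketch of the inversion method for a triangular(min, mode, max) distribution; the closed-form inverse CDF below is standard, and the parameter values are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def triangular_inverse_cdf(u, a, m, b):
    """Inverse CDF of a triangular(a, m, b) distribution.

    Maps a uniform [0, 1] sample u to a triangular sample;
    samples bunch where the CDF is steep (near the mode m).
    """
    u = np.asarray(u, dtype=float)
    cut = (m - a) / (b - a)  # CDF value at the mode
    left = a + np.sqrt(u * (b - a) * (m - a))
    right = b - np.sqrt((1 - u) * (b - a) * (b - m))
    return np.where(u < cut, left, right)

# Pick uniforms, push them through the inverse CDF
u = rng.uniform(size=100_000)
samples = triangular_inverse_cdf(u, a=10, m=15, b=30)
```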
11
Demo
  • In which the inversion method is shown
  • Excel (SCEA1.xls)
  • Risk (CORR0.mpp)

12
Are We Done?
  • Yes, if individual tasks are independent
  • We know how to generate independent random
    durations for lowest-level tasks
  • No, if individual tasks are correlated
  • Durations at lowest level can no longer be
    generated in isolation

13
The Problem In Principle
  • Instead of N single-variable probability
    distributions, we are modeling a single
    N-variable probability distribution
  • Our three-point estimates and distribution curves
    define the marginal distributions of the
    underlying multivariate distribution
  • At least, we hope they do
  • To capture the interrelationships, we also need
    to specify the correlations among tasks

14
The Problem In Practice
  • Both plots below show Code vs. Test durations,
    for the distributions specified
  • Simple application of the inversion method
    produces the plot on the left: values are
    uncorrelated
  • We want the plot on the right: values are
    correlated

15
What Is Correlation?
  • Literally co-relation
  • Two or more variables share a relationship
  • Mathematically, correlation measures dependence
  • P(A, B) = P(A) P(B) + something
  • If independent, the something is zero: the joint
    distribution is just the product of the marginal
    distributions

16
How Is Correlation Measured?
  • Let me count the ways
  • Product-moment (Pearson)
  • Rank (Spearman, Kendall, ...)
  • Others
  • Product-moment correlation is the appropriate
    measure for cost-schedule risk analysis
  • Durations and costs are interval, not ordinal
    measures
  • It arises naturally in the consideration of the
    variance of sums of random variables, e.g.
    summary task durations and costs
  • Whether product-moment correlation can be well
    modeled by other measures is a matter of debate

17
Product-Moment Correlation
  • Covariance of X and Y
  • Cov(X, Y) = E[(X - μx)(Y - μy)]
  • = E[XY] - μx μy = Cov(Y, X)
  • Intimately related to variance
  • Var(X) = E[(X - μx)^2] = Cov(X, X)
  • Scale by the standard deviations σ to give the
    correlation coefficient, ρ
  • ρxy = Cov(X, Y) / (σx σy)

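These identities are easy to verify numerically on made-up data:

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])  # made-up data
y = np.array([1.0, 3.0, 2.0, 6.0])

# Cov(X, Y) = E[(X - mu_x)(Y - mu_y)] = E[XY] - mu_x mu_y
cov_xy = ((x - x.mean()) * (y - y.mean())).mean()
# rho_xy = Cov(X, Y) / (sigma_x sigma_y)
rho = cov_xy / (x.std() * y.std())
```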
18
Correlation Matrices
  • Suppose M is a k x m matrix whose columns Mi are
    random variables
  • If we form Vi = (Mi - μi) / σi, it follows from
    the definitions above that
  • C = (1/k) V^T V
  • is a matrix of correlation coefficients
  • If the Mi are uncorrelated, then C = I, the
    identity matrix

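A quick numeric check of C = (1/k) V^T V, on a random made-up M:

```python
import numpy as np

rng = np.random.default_rng(7)
k = 500
M = rng.normal(size=(k, 3))               # columns Mi are random variables

V = (M - M.mean(axis=0)) / M.std(axis=0)  # Vi = (Mi - mu_i) / sigma_i
C = (V.T @ V) / k                         # C = (1/k) V^T V
```

The result agrees with NumPy's own correlation routine, and has the unit diagonal a correlation matrix requires.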
19
The Cholesky Decomposition
  • For any correlation matrix C, a lower triangular
    matrix L can be found such that
  • C = LL^T
  • L is called the Cholesky decomposition of the
    matrix C

20
Inconsistent Correlation Matrices
  • Correlations cannot be specified arbitrarily
  • If two tasks are each correlated with a third,
    they are necessarily correlated with each other
  • Mathematically, a consistent correlation matrix
    is positive semi-definite
  • Eigenvalues are non-negative
  • Determinants of principal minors are non-negative
  • Inconsistent matrices are easy to produce
  • Lurie and Goldberg describe an adjustment
    procedure for inconsistent matrices

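The positive semi-definite test is a one-liner. The matrix below is a hypothetical inconsistent specification: two tasks each correlated 0.9 with a third, but 0.0 with each other.

```python
import numpy as np

# Hypothetical specification: tasks 1 and 2 each correlate 0.9
# with task 3 but 0.0 with each other -- not jointly possible
C_bad = np.array([[1.0, 0.0, 0.9],
                  [0.0, 1.0, 0.9],
                  [0.9, 0.9, 1.0]])

eigvals = np.linalg.eigvalsh(C_bad)    # all >= 0 iff consistent
consistent = bool(eigvals.min() >= 0)
```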
21
Matrix Adjustment Procedure
  • Generate a correlation matrix R = LL^T from an
    initial lower triangular matrix L
  • Calculate the weighted difference between the
    terms of R and the original matrix C
  • D = ½ ΣΣ wij (cij - rij)^2
  • Minimize D by iterating over the terms in L,
    constrained to ensure unit diagonals in R
  • The weights W are supplied by the user
  • Weights on less certain terms should be much
    smaller than the default value of 1.0

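A rough sketch of the adjustment idea, assuming a hypothetical inconsistent target and using SciPy's general-purpose BFGS in place of the authors' custom routine. Parameterizing L with unit-norm rows enforces the unit-diagonal constraint on R automatically.

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical inconsistent target (not the talk's data)
C = np.array([[1.0, 0.0, 0.9],
              [0.0, 1.0, 0.9],
              [0.9, 0.9, 1.0]])
W = np.ones_like(C)              # user weights, all at the default 1.0

m = C.shape[0]
iu = np.triu_indices(m, k=1)     # super-diagonal index pairs

def build_L(x):
    # Unit-norm rows guarantee unit diagonals in R = L L^T
    L = np.zeros((m, m))
    L[np.tril_indices(m)] = x
    return L / np.linalg.norm(L, axis=1, keepdims=True)

def distance(x):
    R = build_L(x) @ build_L(x).T
    return 0.5 * np.sum(W[iu] * (C[iu] - R[iu]) ** 2)

x0 = np.eye(m)[np.tril_indices(m)]   # start from L = I
res = minimize(distance, x0, method="BFGS")
C_adj = build_L(res.x) @ build_L(res.x).T
```

Because C_adj is built as LL^T, it is positive semi-definite (consistent) by construction; the optimizer only makes it as close to C as the geometry allows.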
22
Demo
  • In which the Lurie-Goldberg matrix adjustment
    procedure is shown
  • Excel (SCEA6.xls)
  • Risk (SCEA.mpp)

23
Fun With Numbers
  • If N is any k x m matrix with standardized,
    uncorrelated columns, then V = NL^T is also in
    standard form, and N^T N = kI
  • If C is a correlation matrix and C = LL^T, it
    follows that
  • V^T V = (NL^T)^T (NL^T) = L N^T N L^T = L(kI)L^T
    = k LL^T = kC
  • Therefore, the columns of V = NL^T are correlated
    according to the matrix C

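The construction can be checked directly with NumPy (the target matrix C here is made up):

```python
import numpy as np

rng = np.random.default_rng(1)
k = 50_000
C = np.array([[1.0, 0.8, 0.6],   # made-up target correlations
              [0.8, 1.0, 0.7],
              [0.6, 0.7, 1.0]])

L = np.linalg.cholesky(C)        # C = L L^T, L lower triangular
N = rng.standard_normal((k, 3))  # standardized, uncorrelated columns
V = N @ L.T                      # columns now correlated per C

R = np.corrcoef(V, rowvar=False) # realized correlations, close to C
```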
24
Is V the Object of Our Desire?
  • V = NL^T has the desired correlations, by
    construction
  • V is in standard form, so mean and standard
    deviation are easily adjusted
  • But, unless the columns of N are standard
    normals, the distributions Vi will not match Ni
  • Vi = li1 N1 + li2 N2 + ... + lii Ni
  • A linear combination of normals is itself normal,
    but the same closure does not hold for other
    distributions

25
Demo
  • In which the desired correlations, but not
    necessarily the desired marginal distributions,
    are obtained
  • Excel (SCEA2.xls)

26
More Fun with Numbers
  • Well, we can generate correlated normals
  • Recall the inversion method: we might be able to
    do something with correlated uniforms
  • Easy to generate: just apply the standard normal
    CDF Φ to our correlated standard normals
  • U = Φ(NL^T)
  • Then apply the desired inverse CDF to U
  • V = F^-1(U) = F^-1(Φ(NL^T))

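A sketch of the version-inversion chain U = Φ(NL^T), V = F⁻¹(U), using SciPy's normal CDF and triangular inverse CDF; the marginal distributions and the target correlation are illustrative assumptions.

```python
import numpy as np
from scipy.stats import norm, triang

rng = np.random.default_rng(2)
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
L = np.linalg.cholesky(C)

N = rng.standard_normal((50_000, 2))
U = norm.cdf(N @ L.T)            # correlated uniforms, U = Phi(N L^T)

# Apply the desired inverse CDFs: triangular durations here
# (scipy's c is the mode's relative position within [loc, loc+scale])
V = np.column_stack([
    triang.ppf(U[:, 0], c=0.25, loc=10, scale=20),  # tri(10, 15, 30)
    triang.ppf(U[:, 1], c=0.50, loc=5,  scale=10),  # tri(5, 10, 15)
])

R = np.corrcoef(V, rowvar=False)  # close to, but not exactly, 0.8
```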
27
The Version-Inversion Method
  • Correlated normals on x-axis become correlated
    uniforms on the y-axis, then correlated whatever
    back on the x-axis
  • At least one of these transformations is
    non-linear

28
Is V the Object of Our Desire?
  • V = F^-1(Φ(NL^T)) has the desired distributions,
    by construction
  • Any uniform distribution will work as well as any
    other as input to the inversion method
  • But, the transformations are non-linear, so
    correlations are not necessarily preserved
  • Once again, the method is exact only for normal
    distributions

29
Demo
  • In which marginal distributions are preserved,
    but correlations are not
  • Excel (SCEA3.xls)
  • Risk (CORR1.mpp)

30
Sampling Variability
  • In any Monte Carlo simulation, variations are
    expected from run to run
  • A different subset of the underlying distribution
    is sampled each time
  • Sampling variability can be greatly reduced by
    starting with a clean set of normals
  • Method due to R.C.H. Cheng [2] produces
    completely uncorrelated normals with exactly zero
    mean and unit variance

[2] Generation of Multivariate Normal Samples with
Given Sample Mean and Covariance Matrix, Journal of
Statistical Computation and Simulation, 1985, Vol.
21, pp. 39-49.
31
Sample Results: Un-iterated Method with Full
Sampling Variability (0)
32
Sample Results: Un-iterated Method with Reduced
Sampling Variability (3)
33
The Lurie-Goldberg Algorithm
  • Developed by Dr. Phillip Lurie and Dr. Matt
    Goldberg
  • Iteratively modifies the Cholesky decomposition
    matrix to achieve the best possible agreement
    with target correlations
  • Basic and Modified implementations

34
Steps of the Basic Method
  • Begin as before (version-inversion)
  • V = F^-1(Φ(NL^T))
  • Calculate the realized correlations R among the
    columns of V
  • Calculate a scalar distance D between the
    realized and target correlations C
  • D = ½ ΣΣ (cij - rij)^2
  • (sum taken over super-diagonal terms only)
  • Minimize D by iterating over the terms in L,
    constrained to ensure unit diagonals in R

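A toy 2x2 sketch of the basic iteration, with SciPy's Nelder-Mead standing in for the talk's custom optimizer and an assumed triangular marginal. The single free parameter is the sub-diagonal term of L; normalizing its row keeps the diagonal of R at unity.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm, triang

rng = np.random.default_rng(3)
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])
N = rng.standard_normal((2_000, 2))  # normals held fixed across iterations

def realized(x):
    """Realized correlation matrix of V = F^-1(Phi(N L^T))."""
    # Unit-norm rows of L keep the diagonal of R at 1
    L = np.array([[1.0, 0.0],
                  [x[0], np.sqrt(max(1.0 - x[0] ** 2, 1e-12))]])
    U = norm.cdf(N @ L.T)
    V = np.column_stack([triang.ppf(U[:, j], c=0.25, loc=10, scale=20)
                         for j in range(2)])
    return np.corrcoef(V, rowvar=False)

def distance(x):
    # D = 1/2 * sum over super-diagonal terms of (c_ij - r_ij)^2
    return 0.5 * (C[0, 1] - realized(x)[0, 1]) ** 2

res = minimize(distance, x0=[0.8], method="Nelder-Mead")
rho_hat = realized(res.x)[0, 1]   # realized correlation after iteration
```

Because the normals N are held fixed, the optimizer drives the realized correlation onto the target for this particular sample, which is exactly what iterates out the sampling variability discussed on the next slides.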
35
Demo
  • In which the Lurie-Goldberg Basic algorithm is
    shown
  • Excel (SCEA4.xls)
  • Risk (CORR1.mpp)

36
Sample Results: Iterated Method without Sampling
Variability (1)
37
Modified Sampling Algorithm
  • As noted, a Monte Carlo simulation will
    experience run-to-run variations
  • The basic Lurie-Goldberg algorithm iterates out
    this natural variability
  • Variation appears in solution time, not result
  • To re-introduce sampling variability, LG suggest
    a modified algorithm
  • For schedule risk analysis, primary advantage is
    probably greater speed, not greater variability

38
Steps of the Modified Method
  • Run the basic algorithm with a limited number of
    samples (e.g. 100)
  • N is constructed to have exactly zero mean and
    unit variance (again, Cheng's method)
  • Use modified decomposition matrix L to generate
    the full set of samples
  • Once again, V = F^-1(Φ(NL^T)), but with the
    iterated matrix for L and a full set (e.g. 1000)
    of uncorrelated normals for N

39
Sample Results: Iterated Method with Full
Sampling Variability (2)
40
Comparison of Results by Method (Costs)
2500 Samples
  • Sample problem from presentation by Book and
    Young to 24th DODCAS, Sept 1990
  • 4 skewed triangular distributions
  • Apparently arbitrary correlations, 0.9-0.5
  • Cross check by analytic fit to beta distribution

41
Comparison of Results by Method (Completion Dates)
2500 Samples
  • Design-Code-Test sample
  • 3 skewed triangular distributions
  • All correlations 0.8

42
Implementation Details
  • Custom non-linear constrained optimization
    routine
  • Conjugate-gradient method with quadratic penalty
    terms, some domain-specific enhancements
  • Similar to Gauss-Newton method analyzed by Lurie
    and Goldberg
  • Both are successive line-minimization methods
    with quadratic convergence
  • Conjugate-gradient method requires less storage

43
Performance Factors
  • Number of correlations
  • Minimization routine searches O(m^2) dimensions
  • Number of samples
  • Distance function requires O(n m^2)
    multiplications and O(nm) marginal distribution
    calculations per evaluation
  • Unrealizable problem specification
  • Results in extended fruitless searching
  • More likely to be a problem when marginal
    distributions are far from normal

44
Solution Time is a Random Variable
Sampling variability is not eliminated; it
manifests as variation in solution time
45
Why All the Fuss?
  • In schedule analysis, the durations of summary
    tasks are sums of random variables
  • Variance of sums of random variables is affected
    by correlation
  • Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y)
  • If Cov(X, Y) > 0, neglecting correlation leads to
    an underestimate of the standard deviation of
    summary tasks

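A numeric confirmation with correlated standard normals (ρ = 0.8 assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
rho = 0.8
C = np.array([[1.0, rho],
              [rho, 1.0]])
XY = rng.standard_normal((200_000, 2)) @ np.linalg.cholesky(C).T

# Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y) = 1 + 1 + 2 * 0.8 = 3.6;
# neglecting the correlation would predict only 2.0
var_sum = (XY[:, 0] + XY[:, 1]).var()
```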
46
Demo
  • In which the effect of correlation on
    summary-level standard deviation is shown
  • Risk (CORRDEMO.mpp)

47
Summary
  • How correlation enters a Monte-Carlo simulation
  • Lurie-Goldberg adjustment procedure for
    inconsistent correlation matrices
  • Non-iterated and iterated (Lurie-Goldberg)
    methods for generating correlated random
    variables
  • Implementation and performance factors
  • Why correlation matters in schedule risk analysis

48
Addendum -- What about Rank Correlation?
  • Commercial software with rank-correlation
    capability is often used to model product-moment
    correlation
  • Point: Rank correlation is not product-moment
    correlation, and using it can lead to misleading
    (if not meaningless) results.
  • Counter-point: So you say, but it seems to work
    pretty well. Go figure.

49
How Are These Rank Correlated Numbers Generated?
  • Almost certainly, by an algorithm due to R.L.
    Iman and W.J. Conover [3]
  • But ask your vendor, to be sure
  • Observation: The Iman-Conover algorithm uses
    product-moment correlation to approximate rank
    correlation
  • Yes, really

[3] A Distribution-Free Approach to Inducing Rank
Correlation Among Input Variables, Communications
in Statistics - Simulation and Computation, Vol.
11, 1982, pp. 311-334.
50
The Iman-Conover Algorithm
  • Generate a set of uncorrelated normal samples
  • Multiply by the Cholesky decomposition of the
    target correlation matrix to produce correlated
    normal samples
  • Determine the rank order of the correlated normal
    samples
  • Generate uncorrelated samples from any desired
    set of target distributions
  • Rearrange the target samples so they have the
    same rank-ordering as the correlated normal
    samples

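The five steps translate almost line-for-line into NumPy; the marginals and target correlation below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
k = 10_000
C = np.array([[1.0, 0.8],
              [0.8, 1.0]])

# Steps 1-2: uncorrelated normals, then Cholesky to correlate them
scores = rng.standard_normal((k, 2)) @ np.linalg.cholesky(C).T
# Step 3: rank order of the correlated normal "scores"
ranks = np.argsort(np.argsort(scores, axis=0), axis=0)

# Step 4: independent samples from the desired marginals (triangular here)
target = rng.triangular(left=10, mode=15, right=30, size=(k, 2))

# Step 5: rearrange each target column into the scores' rank order
V = np.take_along_axis(np.sort(target, axis=0), ranks, axis=0)
```

Each column of V is an exact permutation of the target samples, so the marginals are preserved; only the pairing between columns changes.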
51
Iman-Conover -- Commentary
  • The observed rank correlations will always be
    approximate
  • Rank correlation is not product-moment
    correlation
  • If the target distributions are normal, the
    product-moment correlations will be exact (within
    sampling error)
  • If the target distributions are non-normal, the
    resulting product-moment correlations will be
    inexact, in the now-familiar way
  • Reordering the target distributions is a
    non-linear transformation comparable to
    version-inversion

52
Sample Results: Rank Correlations from Iman-Conover
53
Sample Results: Product-Moment Correlations from
Iman-Conover
54
The Lurie-Goldberg-Iman-Conover Method (LGIC)
  • Iman-Conover is ultimately based on the Cholesky
    decomposition
  • Consequently, the Lurie-Goldberg algorithm is
    easily extended to incorporate Iman-Conover
  • Simply substitute the IC rank-ordering technique
    for the version-inversion method
  • Calculate observed product-moment correlations
    and iterate as before
  • Of course, we could iterate to improve rank
    correlation just as easily, if desired

55
Sample Results: Product-Moment Correlations from
LGIC
56
Sample Results: Rank Correlations from LGIC
57
Demo
  • In which the Lurie-Goldberg-Iman-Conover
    algorithm is exercised
  • Risk (CORR1.mpp)

58
Comparison LG vs LGIC
  • For an equal number of samples, Lurie-Goldberg
    (LG) is significantly faster than
    Lurie-Goldberg-Iman-Conover (LGIC)
  • For common distributions, it is faster to
    generate samples than to sort them
  • But, LGIC supports Latin Hypercube sampling,
    which can reduce the number of samples required
  • Not to mention producing prettier distributions
  • LGIC has an inherent limit in resolution,
    especially for very small sample sizes
  • Small refinements to Cholesky decomposition may
    not result in a new ordering
  • Effect is unnoticeable unless target correlations
    are already at the edge of feasibility

59
Summary (of Addendum)
  • The Iman-Conover algorithm for generating
    rank-correlated samples uses product-moment
    correlation under the hood
  • Hence its rather unexpected success in modeling
    product-moment correlations
  • As with all (Cholesky-based) algorithms, it will
    be particularly good for normal and near-normal
    distributions
  • The Lurie-Goldberg algorithm is easily extended
    to incorporate the Iman-Conover algorithm
  • Combined approach is slower than the original for
    common distributions, but capable of comparable
    accuracy
  • Combined approach supports Latin Hypercube
    sampling

60
Questions?
  • The Excel files used in this demonstration are
    available on request. Please email
    jprice@cs-solutions.com