Multivariate Statistics - PowerPoint PPT Presentation

1
Multivariate Statistics
  • Confirmatory Factor Analysis I
  • W. M. van der Veld
  • University of Amsterdam

2
Overview
  • Digression The expectation
  • Formal specification
  • Exercise 2
  • Estimation
  • ULS
  • WLS
  • ML
  • The c2-test
  • General confirmatory factor analysis approach

3
Digression: the expectation
  • If the variables are expressed in deviations from
    their means, E(x) = E(y) = 0, then cov(x, y) = E(xy).
  • If the variables are expressed as standard
    scores, E(x²) = E(y²) = 1, then cor(x, y) = E(xy).
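This is easy to check numerically; a minimal sketch with simulated data (the variables and coefficients are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = 0.6 * x + 0.8 * rng.normal(size=100_000)

# In deviations from the mean, E(x) = E(y) = 0, so cov(x, y) = E(xy).
xd, yd = x - x.mean(), y - y.mean()
cov_xy = np.mean(xd * yd)

# As standard scores, E(x^2) = E(y^2) = 1, so cor(x, y) = E(xy).
xs, ys = xd / xd.std(), yd / yd.std()
cor_xy = np.mean(xs * ys)
```

Here `E(xy)` of the standardized scores coincides with the ordinary product-moment correlation.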

4
Formal specification
  • The full model in matrix notation: x = Λξ + δ
  • The variables are expressed in deviations from
    their means, so E(x) = E(ξ) = E(δ) = 0.
  • The latent ξ-variables are uncorrelated with the
    unique components (δ), so E(ξδ') = E(δξ') = 0.
  • On the left side we need the covariance (or
    correlation) matrix. Hence:
  • E(xx') = Σ = E[(Λξ + δ)(Λξ + δ)']
  • Σ is the covariance matrix of the x variables.
  • Σ = E[(Λξ + δ)(Λξ + δ)']
  • Σ = E(Λξξ'Λ' + δξ'Λ' + Λξδ' + δδ')

5
Formal specification
  • The factor equation: x = Λξ + δ
  • E(x) = E(ξ) = E(δ) = 0 and E(ξδ') = E(δξ') = 0
  • Σ = E(Λξξ'Λ' + δξ'Λ' + Λξδ' + δδ')
  • Σ = E(Λξξ'Λ') + E(δξ'Λ') + E(Λξδ') + E(δδ')
  • Σ = ΛE(ξξ')Λ' + E(δξ')Λ' + ΛE(ξδ') + E(δδ')
  • Σ = ΛE(ξξ')Λ' + 0·Λ' + Λ·0 + E(δδ')
  • Σ = ΛΦΛ' + Θδ
  • E(ξξ') = Φ, the variance-covariance matrix of the
    factors.
  • E(δδ') = Θδ, the variance-covariance matrix of the
    unique components.
  • This is the covariance equation: Σ = ΛΦΛ' + Θδ
  • Now relax, and see the powerful possibilities of
    this equation.
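The covariance equation can be verified numerically. A minimal sketch with hypothetical loadings for four indicators of two standardized, correlated factors:

```python
import numpy as np

# Hypothetical model: x1, x2 load on xi1; x3, x4 load on xi2.
Lambda = np.array([[0.7, 0.0],
                   [0.6, 0.0],
                   [0.0, 0.8],
                   [0.0, 0.5]])
Phi = np.array([[1.0, 0.3],
                [0.3, 1.0]])  # E(xi xi'): factor correlation matrix

# Unique variances chosen so the x variables are standardized:
Theta = np.diag(1.0 - np.diag(Lambda @ Phi @ Lambda.T))

# Covariance equation: Sigma = Lambda Phi Lambda' + Theta_delta
Sigma = Lambda @ Phi @ Lambda.T + Theta
```

With standardized x variables, `Sigma` has ones on the diagonal and model-implied correlations off the diagonal.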

6
Exercise 2
  • Formulate expressions for the variances of, and
    the correlations between, the x variables in terms
    of the parameters of the model.
  • Now via the formal way.
  • It is assumed that E(xi) = E(ξi) = 0, and
    E(δiδj) = E(δx') = E(δξ') = 0 (i ≠ j).

7
Exercise 2
  • The factor equation is x = Λξ + δ
  • The covariance equation then is Σ = ΛΦΛ' + Θδ
  • This provides the required expressions.

8-9
Exercise 2
  • (The matrix working on these slides was shown as
    images and is not recoverable from the transcript.)
10
Exercise 2
  • Because both matrices are symmetric, we skip the
    elements above the diagonal.

11
Exercise 2
  • Let's list the variances and covariances.

12
Exercise 2
  • The variances of the x variables.
  • The covariances between the x variables.
  • We already assumed that E(xi) = E(ξi) = E(δi) = 0,
    and E(δiδj) = E(δx') = E(δξ') = 0.
  • If we standardize the variables x and ξ so that
    var(xi) = var(ξi) = 1,
  • then we can write:

13
Exercise 2
  • Written element-wise, the results of Exercise 1 become:
  • ρ12 = λ11λ21
  • ρ13 = λ11φ21λ32
  • ρ14 = λ11φ21λ42
  • ρ23 = λ21φ21λ32
  • ρ24 = λ21φ21λ42
  • ρ34 = λ32λ42

Which is the same result as in the intuitive
approach, but using a different notation:
φii = var(ξi) and φij = cov(ξi, ξj), or, when
standardized, cor(ξi, ξj).
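These decomposition rules can be checked against the matrix form of the covariance equation; the loading values below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical standardized two-factor model.
l11, l21 = 0.7, 0.6   # loadings of x1, x2 on xi1
l32, l42 = 0.8, 0.5   # loadings of x3, x4 on xi2
phi21 = 0.3           # cor(xi1, xi2)

Lambda = np.array([[l11, 0.0], [l21, 0.0], [0.0, l32], [0.0, l42]])
Phi = np.array([[1.0, phi21], [phi21, 1.0]])
Sigma = Lambda @ Phi @ Lambda.T  # off-diagonal: model-implied correlations

# Element-wise decomposition rules listed above:
assert np.isclose(Sigma[1, 0], l11 * l21)          # rho_12
assert np.isclose(Sigma[2, 0], l11 * phi21 * l32)  # rho_13
assert np.isclose(Sigma[3, 1], l21 * phi21 * l42)  # rho_24
assert np.isclose(Sigma[3, 2], l32 * l42)          # rho_34
```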
14
Estimation
  • The model parameters can normally be estimated if
    the model is identified.
  • Let's assume for the sake of simplicity that our
    variables are standardized, except for the unique
    components.
  • The decomposition rules only hold for the
    population correlations and not for the sample
    correlations.
  • Normally, we know only the sample correlations.
  • It is easily shown that the solution is different
    for different models.
  • So an efficient estimation procedure is needed.

15
Estimation
  • There are several general principles.
  • We will discuss
  • - the Unweighted Least Squares (ULS) procedure
  • - the Weighted Least Squares (WLS) procedure.
  • Both procedures are based on the residuals
    between the sample correlations (S) and the
    expected values of the correlations (Σ).
  • Thus estimation means minimizing the difference
    between S and Σ.
  • The expected values of the correlations are a
    function of the model parameters, which we found
    earlier: Σ = ΛΦΛ' + Θδ.

16
ULS Estimation
  • The ULS procedure looks for the parameter values
    that minimize the unweighted sum of squared
    residuals: F_ULS = Σ_i (s_i - σ_i)²,
  • where i runs over the unique elements of the
    correlation matrix.
  • Let's see what this does for the example used
    earlier with the four indicators.

     x1    x2    x3    x4
x1  1.00
x2  0.42  1.00
x3  0.56  0.48  1.00
x4  0.35  0.30  0.40  1.00
17
ULS Estimation
  • F_ULS =
  • (.42 - λ11λ21)² + (.56 - λ11λ31)² + (.35 - λ11λ41)²
  • + (.48 - λ21λ31)² + (.30 - λ21λ41)²
  • + (.40 - λ31λ41)²
  • + (1 - (λ11² + var(δ1)))² + (1 - (λ21² + var(δ2)))²
  • + (1 - (λ31² + var(δ3)))² + (1 - (λ41² + var(δ4)))²
  • The estimation procedure looks (iteratively) for
    the values of all the parameters that minimize
    the function F_ULS.
  • Advantages
  • Consistent estimates without distributional
    assumptions on the x's.
  • So for large samples ULS is approximately
    unbiased.
  • Disadvantages
  • There is no statistical test associated with this
    procedure (only the RMR).
  • The estimators are scale dependent.
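The iterative search can be sketched with a general-purpose optimizer standing in for the estimation procedure; the correlations are those from the slide (with r34 = .40, the value used inside F_ULS):

```python
import numpy as np
from scipy.optimize import minimize

# Sample correlations for the four indicators.
R = np.array([[1.00, 0.42, 0.56, 0.35],
              [0.42, 1.00, 0.48, 0.30],
              [0.56, 0.48, 1.00, 0.40],
              [0.35, 0.30, 0.40, 1.00]])

def f_uls(params):
    """Unweighted sum of squared residuals for a one-factor model."""
    lam, theta = params[:4], params[4:]
    Sigma = np.outer(lam, lam) + np.diag(theta)  # implied correlations
    resid = (R - Sigma)[np.triu_indices(4)]      # unique elements only
    return np.sum(resid ** 2)

res = minimize(f_uls, x0=np.full(8, 0.5))
lam_hat = res.x[:4]
if lam_hat[0] < 0:   # the sign of the factor is arbitrary
    lam_hat = -lam_hat
```

With these correlations the residuals vanish at λ = (.7, .6, .8, .5), so the minimum of F_ULS is zero.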

18
WLS Estimation
  • The WLS procedure looks for the parameter values
    that minimize the weighted sum of squared
    residuals: F_WLS = Σ_i w_i (s_i - σ_i)²,
  • where i runs over the unique elements of the
    correlation matrix.
  • These weights can be chosen in different ways.
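A minimal sketch of the WLS discrepancy function; the residuals and weights below are hypothetical placeholders (equal weights reduce F_WLS to F_ULS):

```python
import numpy as np

def f_wls(s, sigma, w):
    """Weighted sum of squared residuals over the unique correlations."""
    r = np.asarray(s) - np.asarray(sigma)
    return float(np.sum(np.asarray(w) * r ** 2))

s = [0.42, 0.56, 0.35]      # sample correlations (hypothetical subset)
sigma = [0.40, 0.58, 0.33]  # model-implied values (hypothetical)
w = [1.0, 2.0, 1.0]         # hypothetical weights
val = f_wls(s, sigma, w)
```

In practice the weights are often taken as inverse estimates of the sampling variances of the s_i, so precisely estimated correlations count more heavily.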

19
Maximum Likelihood Estimation
  • The most commonly used procedure, the Maximum
    Likelihood (ML) estimator, can be specified as a
    special case of the WLS estimator.
  • The ML estimator provides standard errors for the
    parameters and a test statistic for the fit of
    the model at much smaller sample sizes.
  • But this estimator is developed under the
    assumption that the observed variables have a
    multivariate normal distribution.
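The slide does not give the ML fitting function explicitly; the standard form for covariance structure models is F_ML = ln|Σ| + tr(SΣ⁻¹) - ln|S| - p, which is zero when the model reproduces S exactly. A sketch:

```python
import numpy as np

def f_ml(S, Sigma):
    """ML fitting function; zero when Sigma reproduces S exactly."""
    p = S.shape[0]
    _, logdet_Sigma = np.linalg.slogdet(Sigma)
    _, logdet_S = np.linalg.slogdet(S)
    return logdet_Sigma + np.trace(S @ np.linalg.inv(Sigma)) - logdet_S - p

S = np.array([[1.00, 0.42],
              [0.42, 1.00]])
perfect = f_ml(S, S)      # exact fit
off = f_ml(S, np.eye(2))  # misfit: a model that ignores the correlation
```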

20
The χ²-test
  • Without a statistical test we don't know whether
    our theory holds.
  • The test statistic t used is the value of the
    fitting function (F_ML) at its minimum.
  • If the model is correct, t is χ²(df) distributed.
  • Normally the model is rejected if t > C_α,
  • where C_α is the value of χ² for which
    Pr(χ²_df > C_α) = α.
  • See the appendices in many statistics books.
  • But the χ² should not always be trusted, like any
    other similar test statistic.
  • A more robust check is to look at
  • the residuals, and
  • the expected parameter change (EPC).
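The table lookup for C_α can be replaced by a one-liner; df and t below are hypothetical values for illustration:

```python
from scipy.stats import chi2

df, alpha = 2, 0.05
C_alpha = chi2.ppf(1 - alpha, df)  # critical value: Pr(chi2_df > C_alpha) = alpha
t = 7.3                            # hypothetical test-statistic value
reject = t > C_alpha               # reject the model when t > C_alpha
```

For df = 2 and α = .05 this gives C_α ≈ 5.99, matching the usual appendix tables.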

21
General confirmatory factor analysis approach
  • A model is specified with observed and latent
    variables.
  • Correlations (covariances) between the observed
    variables can be expressed in the parameters of
    the model (decomposition rules).
  • If the model is identified, the parameters can be
    estimated.
  • A test of the model can be performed if df > 0.
  • Possible misspecifications (an unacceptable χ²)
    can be detected.
  • Corrections to the model can be introduced by
    adjusting the theory.

22
(Closing diagram relating Theory, Model, Reality, the
data collection process, Data, and model modification.)