1. Common Factors versus Components: Principals and Principles, Errors and Misconceptions
Keith F. Widaman, University of California at Davis
- Presented at the conference "Factor Analysis at 100"
- L. L. Thurstone Psychometric Laboratory, University of North Carolina at Chapel Hill, May 2004
2. Goal of the Talk
- Flip rendition
- (With apologies to Will) I come not to praise principal components, but to bury them
- Thus, we might inter the procedure beside its creator
- More serious
- To outline several key assumptions, usually implicit, of the simpler principal components approach
- Compare and contrast common factor analysis and principal component analysis
3. Organization of the Talk
- Principals
- Major figures/events
- Important dimensions: factors/components
- Principles
- To organize our thinking
- Lead to methods to evaluate procedures
- Errors
- Structures of residuals
- Unclear presentations
- Misconceptions
4. Principal Individuals' Contributions
- Spearman (1904)
- First conceptualization of the nature of a common factor: the element in common to two or more indicators (preferably three or more)
- Stressed presence of two classes of factors:
- general (with one member), and
- specific (with a potentially infinite number)
- Key: Based evaluation of empirical evidence on the tetrad difference criterion (i.e., on patterns in correlations among manifest variables), with no consideration of the diagonal (see the sketch below)
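Not from the original slides, but a minimal numpy sketch (with hypothetical loadings) may make the tetrad criterion concrete: under a single common factor, every tetrad difference among the off-diagonal correlations vanishes, with no reference to the diagonal.

```python
import numpy as np

# Hypothetical loadings on g; under one common factor, r_ij = l_i * l_j for i != j.
loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(loadings, loadings)        # implied correlations
np.fill_diagonal(R, 1.0)                # the diagonal plays no role in the criterion

a, b, c, d = 0, 1, 2, 3
tetrad = R[a, b] * R[c, d] - R[a, c] * R[b, d]
print(round(tetrad, 12))                # ~0: Spearman's tetrad difference criterion holds
```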
5. Principal Individuals' Contributions
- Thomson (1916)
- Early recognition of the elusiveness of the theory-data connection
- A single common factor implies a hierarchical pattern of correlations, but so does an opposite conceptualization
- Key for this talk: Focus was still on the patterns displayed by the off-diagonal correlations. Diagonal elements were of no interest or importance
6. Principal Individuals' Contributions
- Thurstone (1931)
- First foray into factor analysis
- Devised a "center of gravity" method for estimation of loadings
- Led to the centroid method
- Key: Again, diagonal values explicitly disregarded
7. Principal Individuals' Contributions
- Hotelling (1933)
- Proposed the method of principal components
- Method of estimation: least squares
- Decomposition of all of the variance of the manifest variables into dimensions that are
- (a) orthogonal, and
- (b) conditionally variance maximized
- Key 1: Left unities on the diagonal
- Key 2: Interpreted the unrotated solution
8. Principal Individuals' Contributions
- Thurstone (1935), The Vectors of Mind
- "It is a fundamental criterion for a valid method of isolating primary abilities that the weights of the primary abilities for a test must remain invariant when it is moved from one test battery to another test battery."
- "If this criterion is not fulfilled, the psychological description of a test will evidently be as variable as the arbitrarily chosen batteries into which the test may be placed. Under such conditions no stable identification of primary mental abilities can be expected."
9. Principal Individuals' Contributions
- Thurstone (1935)
- This implies an invariant factorial description of a test (a) across batteries and (b) across populations
- Again, diagonal values explicitly disregarded
- Developed the rationale for the necessity of rotation
- Contra Hotelling:
- Unities on the diagonal imply manifest variables are perfectly reliable
- Need for fewer dimensions than manifest variables
- No rotation! This appears, to me, to be the most important criticism of Hotelling by Thurstone.
10. Principal Individuals' Contributions
- McCloy, Metheny, & Knott (1938)
- Published in Psychometrika
- Sought to compare common FA (Thurstone's method) vs. principal components analysis (Hotelling's method)
- Perhaps the first comparison of the two methods
11. Principal Individuals' Contributions
- Thomson (1939)
- Clear statement of the differing aims of the two methods:
- Common factor analysis: to explain the off-diagonal correlations among manifest variables
- Principal component analysis: to re-represent the manifest variables in a mathematically efficient manner
12. Principal Individuals' Contributions
- Guttman (1955, 1958)
- Developed lower bounds for the number of factors
- Weakest lower bound was the number of factors with eigenvalues greater than or equal to unity (illustrated in the sketch below)
- With unities on the diagonal
- With population data
- Other bounds used other diagonal elements (e.g., the strongest lower bound used SMCs), but these did not work as well
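As an illustration (not part of the talk), Guttman's weakest lower bound is simple to compute: count the eigenvalues of R, with unities on the diagonal, that are at least 1.0. The small equal-correlation matrix below is hypothetical and anticipates the examples used later in the talk.

```python
import numpy as np

# Population R with unities on the diagonal (hypothetical equal correlations of .64).
R = np.array([[1.00, 0.64, 0.64],
              [0.64, 1.00, 0.64],
              [0.64, 0.64, 1.00]])

eigvals = np.linalg.eigvalsh(R)[::-1]              # descending: [2.28, 0.36, 0.36]
weakest_lower_bound = int(np.sum(eigvals >= 1.0))  # number of eigenvalues >= 1
print(weakest_lower_bound)                         # 1: at least one common factor
```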
13. Principal Individuals' Contributions
- Kaiser (1960, 1971)
- Described the origin of the "Little Jiffy" (sketched below):
- Principal components
- Retain components with eigenvalues > 1.0
- Rotate using varimax
- Later modifications (Little Jiffy Mark IV) offered important improvements, but were not followed
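A hedged sketch of the recipe just described, in numpy; the SVD-based varimax routine is a standard implementation, and the function names are mine rather than Kaiser's.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Standard SVD-based varimax rotation of a loading matrix L (p x k)."""
    p, k = L.shape
    T, d = np.eye(k), 0.0
    for _ in range(max_iter):
        LT = L @ T
        B = L.T @ (LT ** 3 - LT @ np.diag(np.sum(LT ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return L @ T

def little_jiffy(R):
    """Kaiser's 'Little Jiffy' recipe: principal components of R, retain
    components with eigenvalues > 1.0, rotate the retained loadings by varimax."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    k = int(np.sum(vals > 1.0))                    # Kaiser-Guttman retention rule
    loadings = vecs[:, :k] * np.sqrt(vals[:k])     # unrotated component loadings
    return varimax(loadings) if k > 1 else loadings
```

Calling `little_jiffy(R)` on any correlation matrix returns the varimax-rotated component loadings the recipe would report.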
14. Principles Mislaid or Forgotten
- Principle 1: Common factor analysis and principal component analysis have different goals, à la Thomson (1939)
- Common factor analysis: to explain the off-diagonal correlations among manifest variables
- Principal component analysis: to re-represent the original variables in a mathematically efficient manner
- (a) in reduced dimensionality, or
- (b) in an orthogonal, conditionally variance-maximized form
15. Principles Mislaid or Forgotten
- Principle 2: Common factor analysis was as much a theory of manifest variables as a theory of latent variables
- Spearman: the doctrine of the indifference of the indicator, so any manifest variable was a more-or-less good indicator of g
- Thurstone: test one's theory by developing new variables as differing mixtures of factors and then attempting to verify those presumptions
- Today, the focus seems largely on the latent variables
- Forgetting about manifest variables can be problematic
16. Principles Mislaid or Forgotten
- Principle 3: Invariance of the psychological/mathematical description of manifest variables is a fundamental issue
- "It is a fundamental criterion for a valid method of isolating primary abilities that the weights of the primary abilities for a test must remain invariant when it is moved from one test battery to another test battery."
- Much work on measurement/factorial invariance
- But only similarities between common factors and principal components are stressed; differences are not emphasized
17. Principles Mislaid or Forgotten
- Principle 4: Know your data and your model
- Should know the relation between data and model
- Should know all assumptions (even implicit ones) of the model
- Frequently told:
- information in a correlation matrix is difficult to discern
- so, don't look at the data
- run it through FA or PCA
- interpret the results
- This is not justifiable!
18. Common FA and Principal CA Models
- Common Factor Analysis (sketched below)
- R = FF' + U2 = PΦP' + U2
- where
- R is the (p x p) correlation matrix among manifest variables
- F is a (p x k) unrotated factor matrix, with loadings of the p manifest variables on the k factors
- U2 is a (p x p) diagonal matrix of unique factor variances
- P is a (p x k) rotated factor matrix, with loadings of the p manifest variables on the k rotated factors
- Φ is a (k x k) matrix of covariances among factors (may be I; usually diag(Φ) = I)
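A small numpy sketch of the identity above (hypothetical P, Φ, and U2 values): the common part PΦP' reproduces the off-diagonal correlations, while the diagonal U2 absorbs the unique variances.

```python
import numpy as np

# Hypothetical rotated loadings (p x k) and factor correlations (k x k).
P = np.array([[0.8, 0.0],
              [0.8, 0.0],
              [0.0, 0.6],
              [0.0, 0.6]])
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])

common = P @ Phi @ P.T                       # reproduces only the common variance
U2 = np.diag(1.0 - np.diag(common))          # unique variances chosen so diag(R) = 1
R = common + U2                              # R = P Phi P' + U2
print(np.round(R, 3))
```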
19. Common FA and Principal CA Models
- Principal Component Analysis (sketched below)
- R ≈ FcFc' = PcΦcPc'
- R = FcFc' + GG' = PcΦcPc' + GG'
- R = FcFc' + Θ = PcΦcPc' + Θ
- where
- Fc, Pc, and Φc have the same order as the like-named matrices for CFA, with a c subscript to denote PCA
- G is a (p x (p - k)) matrix of loadings of the p manifest variables on the (p - k) discarded components
- Θ (= GG') is a (p x p) matrix of covariances among residuals
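A companion sketch for the component identities, splitting the full eigendecomposition of a hypothetical R into k retained components (Fc) and p - k discarded components (G); the reproduction is exact, but Θ = GG' is not diagonal.

```python
import numpy as np

R = np.array([[1.00, 0.64, 0.64],            # hypothetical correlation matrix
              [0.64, 1.00, 0.64],
              [0.64, 0.64, 1.00]])
vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
A = vecs * np.sqrt(vals)                     # all p component loadings: R = A A'
k = 1
Fc, G = A[:, :k], A[:, k:]                   # retained vs. discarded components
Theta = G @ G.T                              # residual covariances, Theta = G G'
print(np.allclose(R, Fc @ Fc.T + Theta))     # True: exact re-representation
print(np.round(Theta, 2))                    # non-diagonal: off-diagonal entries = -0.12
```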
20. Present-Day Advice to the Practicing Scientist
- Velicer & Jackson (1990): CFA vs. PCA
- Four practical issues:
- Similarity between solutions
- Issues related to the number of dimensions to retain
- Improper solutions in CFA
- Differences in computational efficiency
- Three theoretical issues:
- Factorial indeterminacy in CFA, not PCA
- CFA can be used in exploratory and confirmatory modes, PCA only in exploratory mode
- CFA is a latent-variable procedure, PCA a manifest-variable procedure
21. Present-Day Advice to the Practicing Scientist
- Goldberg & Digman (1994) and Goldberg & Velicer (in press): CFA vs. PCA
- Results from CFA and PCA are so similar that the differences are unimportant
- If differences are large, the data are not well-structured enough for either type of analysis
- Use "factor" to refer to both factors and components
- The aim is to explain correlations among manifest variables
22. Present-Day Quantitative Approaches
- Recent paper in Psychometrika (Ogasawara, 2003)
- Based on work with oblique factors/components with:
- Equal numbers of indicators per dimension
- Independent cluster solution
- Sphericity (equal error variances), hence equal factor loadings
- Derived expressions for SEs (standard errors) of factor and component loadings and intercorrelations
- SEs for PCA estimates were smaller than those for CFA estimates, implying greater stability of (i.e., lower variability around) population estimates
23. An Apocryphal Example
- A researcher wanted to develop a new inventory to assess three cognitive traits
- Knew to collect data in at least two initial, derivation samples
- Use exploratory procedures to verify initial, a priori hypotheses
- Then move on to confirmatory techniques
- So: Sample 1, N = 1600, and 8 manifest variables
- 3 components explain 51% of the total variance
24. Oblique Components, Sample 1
- Variable Fac 1 Fac 2 Fac 3 h2
- V1 .704 .002 .005 .496
- V2 .704 .002 .005 .496
- V3 .704 .002 .005 .496
- N1 .105 .715 .065 .575
- N2 .002 .725 .014 .538
- N3 .116 .670 .089 .417
- S1 .005 .002 .735 .540
- S2 .005 .002 .735 .540
-
- Fac 1 1.0
- Fac 2 .256 1.0
- Fac 3 .147 .147 1.0
25. Orthogonal Components, Sample 1
- Variable Fac 1 Fac 2 Fac 3 h2
- V1 .698 .079 .044 .496
- V2 .698 .079 .044 .496
- V3 .698 .079 .044 .496
- N1 .211 .717 .127 .575
- N2 .104 .716 .070 .538
- N3 .025 .643 .045 .417
- S1 .050 .046 .732 .540
- S2 .050 .046 .732 .540
-
- Fac 1 1.0
- Fac 2 .000 1.0
- Fac 3 .000 .000 1.0
26. An Apocryphal Example
- After confirming the a priori hypotheses in Sample 1, the researcher collected data from Sample 2
- Same manifest variables
- Sampled from the same general population
- Same mathematical approach: principal components followed by oblique and orthogonal rotation
- Got the same results!
- Decided to test the theory in Sample 3 using a replicate-and-extend approach
- Major change: switch to confirmatory factor analysis
27. Confirmatory Factor Analysis, Sample 3
- Variable Fac 1 Fac 2 Fac 3 θ2 (unique variance)
- V1 2.50 (.18) .0 .0 18.75
- V2 3.00 (.21) .0 .0 27.00
- V3 3.50 (.25) .0 .0 36.75
- N1 .0 2.10 (.13) .0 4.59
- N2 .0 2.00 (.14) .0 12.00
- N3 .0 1.50 (.16) .0 22.75
- S1 .0 .0 2.40 (.44) 58.24
- S2 .0 .0 2.70 (.50) 73.71
-
- Fac 1 1.0
- Fac 2 .50 (.04) 1.0
- Fac 3 .50 (.10) .50 (.10) 1.0
28. Fully Standardized Solution, Sample 3
- Variable Fac 1 Fac 2 Fac 3 h2
- V1 .50 .0 .0 .25
- V2 .50 .0 .0 .25
- V3 .50 .0 .0 .25
- N1 .0 .70 .0 .49
- N2 .0 .50 .0 .25
- N3 .0 .30 .0 .09
- S1 .0 .0 .30 .09
- S2 .0 .0 .30 .09
-
- Fac 1 1.0
- Fac 2 .50 1.0
- Fac 3 .50 .50 1.0
29. Oblique Component Solution, Sample 3
- Variable Fac 1 Fac 2 Fac 3 h2
- V1 .704 .002 .005 .496
- V2 .704 .002 .005 .496
- V3 .704 .002 .005 .496
- N1 .105 .715 .065 .575
- N2 .002 .725 .014 .538
- N3 .116 .670 .089 .417
- S1 .005 .002 .735 .540
- S2 .005 .002 .735 .540
-
- Fac 1 1.0
- Fac 2 .256 1.0
- Fac 3 .147 .147 1.0
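One way to see the apocryphal pattern is a sketch under the assumption that the population exactly follows the standardized CFA solution of Slide 28: build the implied R, then compare the CFA communalities with the "communalities" given by the first three principal components, which come out noticeably larger, echoing Slides 24 and 29.

```python
import numpy as np

# Standardized CFA parameters from Slide 28 (assumed to hold exactly in the population).
P = np.zeros((8, 3))
P[0:3, 0] = 0.50                   # V1-V3
P[3:6, 1] = [0.70, 0.50, 0.30]     # N1-N3
P[6:8, 2] = 0.30                   # S1-S2
Phi = np.full((3, 3), 0.50)
np.fill_diagonal(Phi, 1.0)

R = P @ Phi @ P.T                  # implied correlations
np.fill_diagonal(R, 1.0)

h2_cfa = np.sum((P @ Phi) * P, axis=1)        # CFA communalities: .25 ... .09
vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
A = vecs[:, order] * np.sqrt(vals[order])
h2_pca = np.sum(A[:, :3] ** 2, axis=1)        # "communalities" from 3 components
print(np.round(h2_cfa, 2))
print(np.round(h2_pca, 2))                    # noticeably larger throughout
```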
30. An Early Comparison
- McCloy, Metheny, & Knott (1938)
- Published in Psychometrika
- Sought to compare common FA (Thurstone's method) vs. principal components analysis (Hotelling's method)
- Stated that principal components can be rotated
- So, both techniques are different means to the same end
- Principal difference:
- Thurstone inserts the largest correlation in the row into the diagonal of each residual matrix
- Hotelling begins with unities and stays with the residual values in each residual matrix
31. Hypothetical Factor Matrix (McCloy et al.)
- Variable Fac 1 Fac 2 Fac 3 h2
- 1 .900 .0 .0 .810
- 2 .800 .0 .0 .640
- 3 .0 .700 .0 .490
- 4 .0 .800 .0 .640
- 5 .0 .0 .900 .810
- 6 .0 .0 .600 .360
- 7 .0 .424 .424 .360
- 8 .566 .566 .0 .640
- 9 .495 .0 .495 .490
- 10 .520 .520 .520 .810
32. Rotated Factor Matrix (McCloy et al.)
- Variable Fac 1 Fac 2 Fac 3 h2
- 1 .860 .033 .035 .742
- 2 .819 .025 .023 .672
- 3 .014 .726 .000 .527
- 4 .023 .766 .004 .587
- 5 .010 .004 .808 .653
- 6 .008 .029 .645 .417
- 7 .011 .434 .466 .406
- 8 .587 .548 .014 .645
- 9 .516 .038 .512 .530
- 10 .489 .471 .537 .749
33. Rotated Component Matrix (McCloy et al.)
- Variable Fac 1 Fac 2 Fac 3 h2
- 1 .906 .055 .063 .828
- 2 .874 .053 .046 .769
- 3 .034 .824 .021 .681
- 4 .050 .859 .006 .740
- 5 .060 .000 .885 .787
- 6 .094 .035 .773 .608
- 7 .054 .519 .525 .548
- 8 .653 .558 .029 .739
- 9 .527 .085 .605 .651
- 10 .527 .477 .552 .810
34. An Early Comparison
- McCloy, Metheny, & Knott (1938)
- Argued that
- both CFA and PCA were means to the same end
- both led to a similar pattern of loadings, but
- Thurstone's method was more accurate (Δh2 = .056) than Hotelling's (Δh2 = .125); but these were average absolute differences
- I averaged the signed differences, and Thurstone's method was much more accurate (Δh2 = -.013) than Hotelling's (Δh2 = .120)
35. An Early Comparison
- McCloy, Metheny, & Knott (1938)
- Found a similar pattern of high and low loadings from PCA and CFA
- But they found (and did not stress) that PCA led to decidedly higher loadings
- Tukey (1969)
- Amount, as well as direction, is vital
- For any science to advance, we must pay attention to quantitative variation, not just qualitative variation
36. Regularity Conditions or Phenomena
- Relations between population values of P and R
- Features of eigenvalues
- Covariances among residuals
- Need a theory of errors
- Recount my first exposure
- Should have to acknowledge (predict? live with?)
the patterns in residuals
37. Practicing Scientists vs. Statisticians
- An interesting dimension along which researchers fall:
- Practicing scientists: use CFA; use regression analysis
- Statisticians (the Dark side): prefer PCA; warn of problems of errors in variables
38. Practicing Scientists vs. Statisticians
- At first this seems odd
- The practicing scientist prefers
- CFA (which partials out errors of measurement and specific variance)
- Regression analysis, despite its implicit assumption of perfect measurement
- The statistician prefers
- To warn of the ill effects of errors in variables on the results of regression analysis
- PCA (despite its lack of attention to measurement error), perhaps due to its elegant, reduced-rank representation
39. Practicing Scientists vs. Statisticians
- On second thought, this is rational
- The practicing scientist prefers
- Assumptions that residuals (in CFA or regression analysis) are independent, uncorrelated, and normally distributed
- The statistician prefers
- To try to circumvent (or solve) the problem of errors in variables in regression
- To relegate errors-in-variables problems in PCA to that part of the solution (GG') that is orthogonal to the retained part, thereby circumventing (or solving) this problem
40. Regularity Conditions or Phenomena
- In common factor analysis,
- Characteristics of correlations → characteristics of variables: 1-to-1
- Characteristics of correlations ← characteristics of variables: 1-to-1
- In principal component analysis,
- Characteristics of correlations → characteristics of variables: 1-to-1 (??)
- Characteristics of correlations ← characteristics of variables: many-to-1
41. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .64 1.00
- V3 .64 .64 1.00
- V4 .64 .64 .64 1.00
- V5 .64 .64 .64 .64 1.00
- V6 .64 .64 .64 .64 .64 1.00
-
42. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.92 .80 .64 2.28 .87 .76
- V2 .0 .80 .64 .36 .87 .76
- V3 .0 .80 .64 .36 .87 .76
- V4
- V5
- V6
- P1 1.0 1.0
- P2
-
43. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .36 .00 .00
- V2 .00 .36 .00
- V3 .00 .00 .36
- V4
- V5
- V6
- Covs below diag., corrs above diag.
44. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .24 -.50 -.50
- V2 -.12 .24 -.50
- V3 -.12 -.12 .24
- V4
- V5
- V6
- Covs below diag., corrs above diag.
45. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 3.84 .80 .64 4.20 .84 .70
- V2 .0 .80 .64 .36 .84 .70
- V3 .0 .80 .64 .36 .84 .70
- V4 .0 .80 .64 .36 .84 .70
- V5 .0 .80 .64 .36 .84 .70
- V6 .0 .80 .64 .36 .84 .70
- P1 1.0 1.0
- P2
-
46. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .36 .00 .00 .00 .00 .00
- V2 .00 .36 .00 .00 .00 .00
- V3 .00 .00 .36 .00 .00 .00
- V4 .00 .00 .00 .36 .00 .00
- V5 .00 .00 .00 .00 .36 .00
- V6 .00 .00 .00 .00 .00 .36
- Covs below diag., corrs above diag.
47. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .30 -.20 -.20 -.20 -.20 -.20
- V2 -.06 .30 -.20 -.20 -.20 -.20
- V3 -.06 -.06 .30 -.20 -.20 -.20
- V4 -.06 -.06 -.06 .30 -.20 -.20
- V5 -.06 -.06 -.06 -.06 .30 -.20
- V6 -.06 -.06 -.06 -.06 -.06 .30
- Covs below diag., corrs above diag.
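The one-factor demonstrations of Slides 41-47 can be reproduced with a short sketch; the closed-form component loading below relies on the equicorrelation structure of these examples (the first eigenvector is constant), so it is not a general formula.

```python
import numpy as np

def cfa_vs_pca_one_factor(p, r):
    """Population comparison for p equally correlated indicators of one factor."""
    R = np.full((p, p), r)
    np.fill_diagonal(R, 1.0)
    f = np.sqrt(r)                               # common factor loading
    resid_cfa = R - np.full((p, p), r)           # diagonal matrix of uniquenesses, 1 - r
    pc = np.sqrt((1 + (p - 1) * r) / p)          # first PC loading (equicorrelated case)
    resid_pca = R - np.full((p, p), pc ** 2)     # non-diagonal residual matrix
    return f, pc, resid_cfa, resid_pca

for p in (3, 6):
    f, pc, _, resid_pca = cfa_vs_pca_one_factor(p, 0.64)
    print(p, round(f, 2), round(pc, 2), np.round(resid_pca[:2, :2], 2))
# p = 3: factor loading .80, component loading .87, PCA residuals .24 / -.12
# p = 6: factor loading .80, component loading .84, PCA residuals .30 / -.06
```

Running the same function with r = .36 reproduces the corresponding values on Slides 50-56.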
48. Regularity Conditions or Phenomena
- In common factor analysis,
- If (a) the model fits in the population, (b) there is one factor, and (c) communalities are estimated optimally, then:
- Single non-zero eigenvalue
- Factor loadings and residual variances for the first three variables are unaffected by the addition of 3 identical variables
- Residuals = specific + error variance
- The residual matrix is diagonal
49. Regularity Conditions or Phenomena
- In principal component analysis,
- If (a) the common factor model fits in the population, (b) there is one factor, and (c) unities are retained on the main diagonal, then:
- Single large eigenvalue, plus (p - 1) identical, smaller eigenvalues
- The residual component matrix G is independent of the space defined by Fc
- But the residual covariance matrix is clearly non-diagonal
- And (a) population component loadings and (b) residual variances and covariances vary as a function of the number of manifest variables!
50. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .36 1.00
- V3 .36 .36 1.00
- V4 .36 .36 .36 1.00
- V5 .36 .36 .36 .36 1.00
- V6 .36 .36 .36 .36 .36 1.00
-
51. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.08 .60 .36 1.72 .76 .57
- V2 .0 .60 .36 .64 .76 .57
- V3 .0 .60 .36 .64 .76 .57
- V4
- V5
- V6
- P1 1.0 1.0
- P2
-
52. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .64 .00 .00
- V2 .00 .64 .00
- V3 .00 .00 .64
- V4
- V5
- V6
- Covs below diag., corrs above diag.
53. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .43 -.50 -.50
- V2 -.21 .43 -.50
- V3 -.21 -.21 .43
- V4
- V5
- V6
- Covs below diag., corrs above diag.
54. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 2.16 .60 .36 2.80 .68 .47
- V2 .0 .60 .36 .64 .68 .47
- V3 .0 .60 .36 .64 .68 .47
- V4 .0 .60 .36 .64 .68 .47
- V5 .0 .60 .36 .64 .68 .47
- V6 .0 .60 .36 .64 .68 .47
- P1 1.0 1.0
- P2
-
55. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .64 .00 .00 .00 .00 .00
- V2 .00 .64 .00 .00 .00 .00
- V3 .00 .00 .64 .00 .00 .00
- V4 .00 .00 .00 .64 .00 .00
- V5 .00 .00 .00 .00 .64 .00
- V6 .00 .00 .00 .00 .00 .64
- Covs below diag., corrs above diag.
56. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .53 -.20 -.20 -.20 -.20 -.20
- V2 -.11 .53 -.20 -.20 -.20 -.20
- V3 -.11 -.11 .53 -.20 -.20 -.20
- V4 -.11 -.11 -.11 .53 -.20 -.20
- V5 -.11 -.11 -.11 -.11 .53 -.20
- V6 -.11 -.11 -.11 -.11 -.11 .53
- Covs below diag., corrs above diag.
57. Regularity Conditions or Phenomena
- So, the differences between population parameters from CFA and PCA diverge more:
- (a) the fewer the indicators per dimension, and
- (b) the lower the true communality
- But some regularities still seem to hold (although these vary with the number of indicators):
- regular estimates of loadings
- regular magnitude of residual variances
- regular magnitude of residual covariances
- regular form of the eigenvalue structure
58. Regularity Conditions or Phenomena
- But, what if we have variation in loadings?
59. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .64 1.00
- V3 .64 .64 1.00
- V4 .48 .48 .48 1.00
- V5 .48 .48 .48 .36 1.00
- V6 .48 .48 .48 .36 .36 1.00
-
60. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 3.00 .80 .64 3.47 .83 .69
- V2 .0 .80 .64 .64 .83 .69
- V3 .0 .80 .64 .64 .83 .69
- V4 .0 .60 .36 .53 .68 .47
- V5 .0 .60 .36 .36 .68 .47
- V6 .0 .60 .36 .36 .68 .47
- P1 1.0 1.0
- P2
-
61. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .36 .00 .00 .00 .00 .00
- V2 .00 .36 .00 .00 .00 .00
- V3 .00 .00 .36 .00 .00 .00
- V4 .00 .00 .00 .64 .00 .00
- V5 .00 .00 .00 .00 .64 .00
- V6 .00 .00 .00 .00 .00 .64
- Covs below diag., corrs above diag.
62. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .31 -.15 -.15 -.21 -.21 -.21
- V2 -.05 .31 -.15 -.21 -.21 -.21
- V3 -.05 -.05 .31 -.21 -.21 -.21
- V4 -.09 -.09 -.09 .53 -.20 -.20
- V5 -.09 -.09 -.09 -.11 .53 -.20
- V6 -.09 -.09 -.09 -.11 -.11 .53
- Covs below diag., corrs above diag.
63. Regularity Conditions or Phenomena
- So, with variation in loadings:
- One piece of approximate stability:
- regular estimates of loadings
- But we sacrifice:
- regular magnitude of residual variances
- regular magnitude of residual covariances
- regular form of the eigenvalue structure
64. Regularity Conditions or Phenomena
- But what if we have multiple factors?
- Let's start with
- (a) equal loadings
- (b) orthogonal factors
65. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.08 .60 .0 .64 1.72 .76 .0 .57
- V2 1.08 .60 .0 .64 1.72 .76 .0 .57
- V3 .0 .60 .0 .64 .64 .76 .0 .57
- V4 .0 .0 .60 .64 .64 .0 .76 .57
- V5 .0 .0 .60 .64 .64 .0 .76 .57
- V6 .0 .0 .60 .64 .64 .0 .76 .57
- P1 1.0 1.0
- P2 .0 1.0 .0 1.0
-
66. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .43 -.50 -.50 .00 .00 .00
- V2 -.21 .43 -.50 .00 .00 .00
- V3 -.21 -.21 .43 .00 .00 .00
- V4 .00 .00 .00 .43 -.50 -.50
- V5 .00 .00 .00 -.21 .43 -.50
- V6 .00 .00 .00 -.21 -.21 .43
- Covs below diag., corrs above diag.
67. Regularity Conditions or Phenomena
- So, a strange result:
- Same factor "inflation" as with 1 factor, 3 indicators
- Same within-factor residual covariances as for 1 factor, 3 indicators
- But between-factor residual covariances = 0!
- Let's go to
- (a) equal loadings, but
- (b) oblique factors
68. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.62 .60 .0 .64 2.26 .76 .0 .57
- V2 .54 .60 .0 .64 1.18 .76 .0 .57
- V3 .0 .60 .0 .64 .64 .76 .0 .57
- V4 .0 .0 .60 .64 .64 .0 .76 .57
- V5 .0 .0 .60 .64 .64 .0 .76 .57
- V6 .0 .0 .60 .64 .64 .0 .76 .57
- P1 1.0 1.0
- P2 .5 1.0 .31 1.0
-
69. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .43 -.50 -.50 .00 .00 .00
- V2 -.21 .43 -.50 .00 .00 .00
- V3 -.21 -.21 .43 .00 .00 .00
- V4 .00 .00 .00 .43 -.50 -.50
- V5 .00 .00 .00 -.21 .43 -.50
- V6 .00 .00 .00 -.21 -.21 .43
- Covs below diag., corrs above diag.
70. Regularity Conditions or Phenomena
- So, a strange result:
- Same factor "inflation" as with 1 factor, 3 indicators
- Reduced correlation between factors
- But the residual covariance matrix is identical!
- Let's go to
- (a) unequal loadings, and
- (b) orthogonal factors
71. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.16 .80 .0 .36 1.70 .83 .0 .68
- V2 1.16 .60 .0 .64 1.70 .78 .0 .61
- V3 .0 .40 .0 .84 .79 .64 .0 .41
- V4 .0 .0 .80 .36 .79 .0 .83 .68
- V5 .0 .0 .60 .64 .51 .0 .78 .61
- V6 .0 .0 .40 .84 .51 .0 .64 .41
- P1 1.0 1.0
- P2 .0 1.0 .0 1.0
-
72. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .32 -.47 -.48 .00 .00 .00
- V2 -.16 .39 -.55 .00 .00 .00
- V3 -.21 -.26 .59 .00 .00 .00
- V4 .00 .00 .00 .32 -.47 -.48
- V5 .00 .00 .00 -.16 .39 -.55
- V6 .00 .00 .00 -.21 -.26 .59
- Covs below diag., corrs above diag.
73. Regularity Conditions or Phenomena
- So, a strange result:
- Different factor "inflation" than with 1 factor, 3 indicators
- Reduced correlation between factors
- The residual covariance matrix has unequal covariances and correlations among residuals, but between-factor covariances = 0!
- Let's go to
- (a) unequal loadings, and
- (b) oblique factors
74. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.74 .80 .0 .36 2.27 .77 .11 .66
- V2 .58 .60 .0 .64 1.16 .77 .00 .59
- V3 .0 .40 .0 .84 .79 .71 -.12 .46
- V4 .0 .0 .80 .36 .77 .11 .77 .66
- V5 .0 .0 .60 .64 .52 .00 .77 .59
- V6 .0 .0 .40 .84 .49 -.12 .71 .46
- P1 1.0 1.0
- P2 .5 1.0 .32 1.0
-
75. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .34 -.38 -.49 -.13 -.11 .01
- V2 -.14 .41 -.59 -.11 -.04 .07
- V3 -.21 -.28 .54 .01 .07 .16
- V4 -.04 -.04 .00 .34 -.38 -.49
- V5 -.04 -.02 .03 -.14 .41 -.59
- V6 .00 .03 .08 -.21 -.28 .54
- Covs below diag., corrs above diag.
76. Regularity Conditions or Phenomena
- So, a strange result:
- Extremely different factor "inflation" than with 1 factor, 3 indicators
- The largest loading is now UNderrepresented
- Very different population factor loadings (.8, .6, .4) have very similar component loadings
- Now the between-factor covariances are not zero, and some are positive! (see the sketch below)
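A sketch of this final demonstration (unequal loadings .8/.6/.4 on each of two factors correlated .5, as on Slide 74): after retaining two components, the residual covariances between indicators of different factors are no longer zero.

```python
import numpy as np

# Common factor parameters used to build the population R (from Slide 74).
P = np.array([[0.8, 0.0], [0.6, 0.0], [0.4, 0.0],
              [0.0, 0.8], [0.0, 0.6], [0.0, 0.4]])
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])
R = P @ Phi @ P.T
np.fill_diagonal(R, 1.0)

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
A2 = (vecs[:, order] * np.sqrt(vals[order]))[:, :2]   # two retained components
Theta = R - A2 @ A2.T                                 # PCA residual covariance matrix
print(np.round(Theta, 2))                             # between-factor blocks are nonzero
```

Because Θ depends only on the retained subspace, rotating the two components obliquely (as on Slide 74) does not change it.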
77. R from Component Parameters
- All of the preceding took a common factor view:
- Develop parameters from a common factor model
- Analyze using CFA and PCA
- CFA procedures recover the parameters
- PCA procedures exhibit failings or anomalies
- So what? What else could you expect?
- Challenge (to me):
- Generate data from a principal component model
- Analyze using CFA and PCA
- PCA should recover the parameters, and CFA should exhibit problems and/or anomalies
78. R from Component Parameters
- This is difficult to do
- It leads to
- Impractical, unacceptable outcomes, from the point of view of the practicing scientist
- Crucial indeterminacies within the PCA model
79. R from Component Parameters
- Impractical, unacceptable outcomes, from the point of view of the practicing scientist
80. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .46 1.00
- V3 .46 .46 1.00
- V4
- V5
- V6
- First principal component has 3 loadings of .8
- First principal factor has 3 loadings of (.46)^1/2, or about .68
81. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .568 1.00
- V3 .568 .568 1.00
- V4 .568 .568 .568 1.00
- V5 .568 .568 .568 .568 1.00
- V6 .568 .568 .568 .568 .568 1.00
- First principal component has 6 loadings of .8
- First principal factor has 6 loadings of (.568)^1/2, or about .75 (see the sketch below)
- But one would have to alter the first 3 tests, as their population correlations are altered
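A sketch of the arithmetic behind Slides 80-81, using the closed-form results for the equicorrelated case: the first principal component loading, sqrt((1 + (p - 1) r) / p), depends on the number of indicators p, whereas the principal factor loading, sqrt(r), does not.

```python
import numpy as np

for p, r in ((3, 0.46), (6, 0.568)):
    pc = np.sqrt((1 + (p - 1) * r) / p)   # first principal component loading
    pf = np.sqrt(r)                       # principal factor loading
    print(p, round(pc, 3), round(pf, 3))
# 3 indicators, r = .46 : component loading .800, factor loading .678
# 6 indicators, r = .568: component loading .800, factor loading .754
```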
82. R from Component Parameters
- Crucial indeterminacies within the PCA model
- Consider the case of a well-identified CFA model: 6 manifest variables loading on a single factor
- One could easily construct the population matrix as FF' + uniquenesses to ensure diag(R) = I
- With 6 manifest variables, 6(7)/2 = 21 unique elements of the covariance matrix
- 12 parameter estimates
- therefore 9 df
83. R from Component Parameters
- Crucial indeterminacies within the PCA model
- Consider now 6 manifest variables with defined loadings on the first PC
- To reconstruct the correlation matrix, one must come up with the remaining 5 PCs
- A start: the columns of the full loading matrix [Fc G] must be orthogonal ([Fc G]'[Fc G] diagonal), so the orthogonality constraints yield 6(5)/2 = 15 equations
- Sums of squares across rows = 1, so 6 more equations
- In short, only 21 equations, but 30 unknowns (the loadings of the 6 variables on the 5 components in G)
- Therefore, an infinite number of R matrices will lead to the stated first PC
84. R from Component Parameters
- Crucial indeterminacies within the PCA model
- Related to the Ledermann number, but in reverse (see the sketch below)
- For example, with 10 manifest variables, one can minimally overdetermine no more than 6 factors (so use 6 or fewer factors)
- But here, one must specify at least 6 components (to ensure more equations than unknowns) to ensure a unique R
- If fewer than 6 components are specified, an infinite number of solutions for R can be found
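For reference, a sketch of the Ledermann bound mentioned above (the formula is supplied here, not stated on the slide): the largest number of common factors that p manifest variables can minimally overdetermine.

```python
import math

def ledermann_bound(p: int) -> float:
    """Ledermann (1937) bound: k <= (2p + 1 - sqrt(8p + 1)) / 2 common factors."""
    return (2 * p + 1 - math.sqrt(8 * p + 1)) / 2

print(ledermann_bound(10))   # 6.0 -> at most 6 factors for 10 manifest variables
```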
85. Conclusions: CFA
- CFA (common factor) models may not hold in the population
- But if they do (in a theoretical population):
- The notion of a population factor loading is realistic
- The population factor loading is unaffected by the presence of other variables, as long as the battery contains the same factors
- In the one-factor case, loadings can vary from 0 to 1 (provided reflection of variables is possible)
- This generalizes to the case of multiple factors
86. Conclusions: CFA
- CFA (common factor) models may not hold in the population
- But if they do:
- Residual (i.e., unique) variances are uncorrelated
- The magnitude of the unique variance for a given variable is unaffected by the other variables in the analysis
87. Conclusions: PCA
- PCA models cannot hold in the population (because all variables have measurement error)
- Moreover:
- The notion of "the" population component loading for a particular manifest variable is meaningless
- The population component loading is affected strongly by the presence of other variables
- SEs for component loadings have no clear interpretation
- In the one-component case, component loadings can only vary from (1/m)^1/2 to 1, where m is the number of indicators for the dimension (see the sketch below)
- This generalizes to the multiple-component case
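A sketch of this lower bound in the equicorrelated, one-dimension case: as the common correlation r approaches zero, the first component loading approaches (1/m)^1/2 rather than zero.

```python
import numpy as np

m = 3                                    # number of indicators for the dimension
for r in (0.0, 0.1, 0.4, 0.9):
    loading = np.sqrt((1 + (m - 1) * r) / m)
    print(r, round(loading, 3))
# r = 0.0 gives .577 (= 1/sqrt(3)); as r -> 1 the loading approaches 1.0
```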
88. Conclusions: PCA
- PCA models cannot hold in the population (because all variables have measurement error)
- Moreover:
- Residual variables are correlated, often in an unpredictable and seemingly haphazard fashion
- The magnitudes of the unique variance and covariances for a given manifest variable are affected by the other variables in the analysis
89. Conclusions: PCA
- PCA models cannot hold in the population (because all variables have measurement error)
- Moreover, generating data from a principal component model leads either to
- Impractical, unacceptable outcomes, or
- Indeterminacies in the parameter-R relations