1. Common Factors versus Components: Principals and Principles, Errors and Misconceptions
Keith F. Widaman, University of California at Davis
- Presented at the conference "Factor Analysis at 100"
- L. L. Thurstone Psychometric Laboratory, University of North Carolina at Chapel Hill, May 2004
2. Goal of the Talk
- Flip rendition
- (With apologies to Will) I come not to praise principal components, but to bury them
- Thus, we might inter the procedure beside its creator
- More serious
- To outline several key assumptions, usually implicit, of the simpler principal components approach
- Compare and contrast common factor analysis and principal component analysis
3. Organization of the Talk
- Principals
- Major figures/events
- Important dimensions: factors/components
- Principles
- To organize our thinking
- Lead to methods to evaluate procedures
- Errors
- Structures of residuals
- Unclear presentations
- Misconceptions
4. Principal Individuals' Contributions
- Spearman (1904)
- First conceptualization of the nature of a common factor: the element in common to two or more indicators (preferably three or more)
- Stressed presence of two classes of factors:
- general (with one member), and
- specific (with a potentially infinite number)
- Key: Based evaluation of empirical evidence on the tetrad difference criterion (i.e., on patterns in correlations among manifest variables), with no consideration of the diagonal (see the sketch below)
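Not from the original slides, but a minimal numpy sketch (with hypothetical loadings) may make the tetrad criterion concrete: under a single common factor, every tetrad difference among the off-diagonal correlations vanishes, with no reference to the diagonal.

```python
import numpy as np

# Hypothetical loadings on g; under one common factor, r_ij = l_i * l_j for i != j.
loadings = np.array([0.8, 0.7, 0.6, 0.5])
R = np.outer(loadings, loadings)        # implied correlations
np.fill_diagonal(R, 1.0)                # the diagonal plays no role in the criterion

a, b, c, d = 0, 1, 2, 3
tetrad = R[a, b] * R[c, d] - R[a, c] * R[b, d]
print(round(tetrad, 12))                # ~0: Spearman's tetrad difference criterion holds
```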
5. Principal Individuals' Contributions
- Thomson (1916)
- Early recognition of the elusiveness of the theory-data connection
- A single common factor implies a hierarchical pattern of correlations, but so does an opposite conceptualization
- Key for this talk: Focus was still on the patterns displayed by the off-diagonal correlations. Diagonal elements were of no interest or importance
6. Principal Individuals' Contributions
- Thurstone (1931)
- First foray into factor analysis
- Devised a "center of gravity" method for estimation of loadings
- Led to the centroid method
- Key: Again, diagonal values explicitly disregarded
7. Principal Individuals' Contributions
- Hotelling (1933)
- Proposed the method of principal components
- Method of estimation: least squares
- Decomposition of all of the variance of the manifest variables into dimensions that are
- (a) orthogonal, and
- (b) conditionally variance maximized
- Key 1: Left unities on the diagonal
- Key 2: Interpreted the unrotated solution
8. Principal Individuals' Contributions
- Thurstone (1935), The Vectors of Mind
- "It is a fundamental criterion for a valid method of isolating primary abilities that the weights of the primary abilities for a test must remain invariant when it is moved from one test battery to another test battery."
- "If this criterion is not fulfilled, the psychological description of a test will evidently be as variable as the arbitrarily chosen batteries into which the test may be placed. Under such conditions no stable identification of primary mental abilities can be expected."
9. Principal Individuals' Contributions
- Thurstone (1935)
- This implies an invariant factorial description of a test (a) across batteries and (b) across populations
- Again, diagonal values explicitly disregarded
- Developed the rationale for the necessity of rotation
- Contra Hotelling:
- Unities on the diagonal imply manifest variables are perfectly reliable
- Need for fewer dimensions than manifest variables
- No rotation! This appears, to me, to be the most important criticism of Hotelling by Thurstone.
10. Principal Individuals' Contributions
- McCloy, Metheny, & Knott (1938)
- Published in Psychometrika
- Sought to compare common FA (Thurstone's method) vs. principal components analysis (Hotelling's method)
- Perhaps the first comparison of the two methods
11. Principal Individuals' Contributions
- Thomson (1939)
- Clear statement of the differing aims of the two methods:
- Common factor analysis: to explain the off-diagonal correlations among manifest variables
- Principal component analysis: to re-represent the manifest variables in a mathematically efficient manner
12. Principal Individuals' Contributions
- Guttman (1955, 1958)
- Developed lower bounds for the number of factors
- Weakest lower bound was the number of factors with eigenvalues greater than or equal to unity (illustrated in the sketch below)
- With unities on the diagonal
- With population data
- Other bounds used other diagonal elements (e.g., the strongest lower bound used SMCs), but these did not work as well
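As an illustration (not part of the talk), Guttman's weakest lower bound is simple to compute: count the eigenvalues of R, with unities on the diagonal, that are at least 1.0. The small equal-correlation matrix below is hypothetical and anticipates the examples used later in the talk.

```python
import numpy as np

# Population R with unities on the diagonal (hypothetical equal correlations of .64).
R = np.array([[1.00, 0.64, 0.64],
              [0.64, 1.00, 0.64],
              [0.64, 0.64, 1.00]])

eigvals = np.linalg.eigvalsh(R)[::-1]              # descending: [2.28, 0.36, 0.36]
weakest_lower_bound = int(np.sum(eigvals >= 1.0))  # number of eigenvalues >= 1
print(weakest_lower_bound)                         # 1: at least one common factor
```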
13. Principal Individuals' Contributions
- Kaiser (1960, 1971)
- Described the origin of the "Little Jiffy" (sketched below):
- Principal components
- Retain components with eigenvalues > 1.0
- Rotate using varimax
- Later modifications (Little Jiffy Mark IV) offered important improvements, but were not followed
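A hedged sketch of the recipe just described, in numpy; the SVD-based varimax routine is a standard implementation, and the function names are mine rather than Kaiser's.

```python
import numpy as np

def varimax(L, max_iter=100, tol=1e-8):
    """Standard SVD-based varimax rotation of a loading matrix L (p x k)."""
    p, k = L.shape
    T, d = np.eye(k), 0.0
    for _ in range(max_iter):
        LT = L @ T
        B = L.T @ (LT ** 3 - LT @ np.diag(np.sum(LT ** 2, axis=0)) / p)
        U, s, Vt = np.linalg.svd(B)
        T = U @ Vt
        if s.sum() < d * (1 + tol):
            break
        d = s.sum()
    return L @ T

def little_jiffy(R):
    """Kaiser's 'Little Jiffy' recipe: principal components of R, retain
    components with eigenvalues > 1.0, rotate the retained loadings by varimax."""
    vals, vecs = np.linalg.eigh(R)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    k = int(np.sum(vals > 1.0))                    # Kaiser-Guttman retention rule
    loadings = vecs[:, :k] * np.sqrt(vals[:k])     # unrotated component loadings
    return varimax(loadings) if k > 1 else loadings
```

Calling `little_jiffy(R)` on any correlation matrix returns the varimax-rotated component loadings the recipe would report.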
14. Principles Mislaid or Forgotten
- Principle 1: Common factor analysis and principal component analysis have different goals, à la Thomson (1939)
- Common factor analysis: to explain the off-diagonal correlations among manifest variables
- Principal component analysis: to re-represent the original variables in a mathematically efficient manner
- (a) in reduced dimensionality, or
- (b) in an orthogonal, conditionally variance-maximized form
15. Principles Mislaid or Forgotten
- Principle 2: Common factor analysis was as much a theory of manifest variables as a theory of latent variables
- Spearman: the doctrine of the indifference of the indicator, so any manifest variable was a more-or-less good indicator of g
- Thurstone: test one's theory by developing new variables as differing mixtures of factors and then attempting to verify those presumptions
- Today, the focus seems largely on the latent variables
- Forgetting about manifest variables can be problematic
16. Principles Mislaid or Forgotten
- Principle 3: Invariance of the psychological/mathematical description of manifest variables is a fundamental issue
- "It is a fundamental criterion for a valid method of isolating primary abilities that the weights of the primary abilities for a test must remain invariant when it is moved from one test battery to another test battery."
- Much work on measurement/factorial invariance
- But only similarities between common factors and principal components are stressed; differences are not emphasized
17. Principles Mislaid or Forgotten
- Principle 4: Know your data and your model
- Should know the relation between data and model
- Should know all assumptions (even implicit ones) of the model
- Frequently told:
- information in a correlation matrix is difficult to discern
- so, don't look at the data
- run it through FA or PCA
- interpret the results
- This is not justifiable!
18. Common FA and Principal CA Models
- Common Factor Analysis (sketched below)
- R = FF' + U2 = PΦP' + U2
- where
- R is the (p x p) correlation matrix among manifest variables
- F is a (p x k) unrotated factor matrix, with loadings of the p manifest variables on the k factors
- U2 is a (p x p) diagonal matrix of unique factor variances
- P is a (p x k) rotated factor matrix, with loadings of the p manifest variables on the k rotated factors
- Φ is a (k x k) matrix of covariances among factors (may be I; usually diag(Φ) = I)
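A small numpy sketch of the identity above (hypothetical P, Φ, and U2 values): the common part PΦP' reproduces the off-diagonal correlations, while the diagonal U2 absorbs the unique variances.

```python
import numpy as np

# Hypothetical rotated loadings (p x k) and factor correlations (k x k).
P = np.array([[0.8, 0.0],
              [0.8, 0.0],
              [0.0, 0.6],
              [0.0, 0.6]])
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])

common = P @ Phi @ P.T                       # reproduces only the common variance
U2 = np.diag(1.0 - np.diag(common))          # unique variances chosen so diag(R) = 1
R = common + U2                              # R = P Phi P' + U2
print(np.round(R, 3))
```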
19. Common FA and Principal CA Models
- Principal Component Analysis (sketched below)
- R ≈ FcFc' = PcΦcPc'
- R = FcFc' + GG' = PcΦcPc' + GG'
- R = FcFc' + Θ = PcΦcPc' + Θ
- where
- Fc, Pc, and Φc have the same order as the like-named matrices for CFA, with a c subscript to denote PCA
- G is a (p x (p - k)) matrix of loadings of the p manifest variables on the (p - k) discarded components
- Θ (= GG') is a (p x p) matrix of covariances among residuals
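A companion sketch for the component identities, splitting the full eigendecomposition of a hypothetical R into k retained components (Fc) and p - k discarded components (G); the reproduction is exact, but Θ = GG' is not diagonal.

```python
import numpy as np

R = np.array([[1.00, 0.64, 0.64],            # hypothetical correlation matrix
              [0.64, 1.00, 0.64],
              [0.64, 0.64, 1.00]])
vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
vals, vecs = vals[order], vecs[:, order]
A = vecs * np.sqrt(vals)                     # all p component loadings: R = A A'
k = 1
Fc, G = A[:, :k], A[:, k:]                   # retained vs. discarded components
Theta = G @ G.T                              # residual covariances, Theta = G G'
print(np.allclose(R, Fc @ Fc.T + Theta))     # True: exact re-representation
print(np.round(Theta, 2))                    # non-diagonal: off-diagonal entries = -0.12
```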
20. Present-Day Advice to the Practicing Scientist
- Velicer & Jackson (1990): CFA vs. PCA
- Four practical issues:
- Similarity between solutions
- Issues related to the number of dimensions to retain
- Improper solutions in CFA
- Differences in computational efficiency
- Three theoretical issues:
- Factorial indeterminacy in CFA, not PCA
- CFA can be used in exploratory and confirmatory modes, PCA only in exploratory mode
- CFA is a latent-variable procedure, PCA a manifest-variable procedure
21. Present-Day Advice to the Practicing Scientist
- Goldberg & Digman (1994) and Goldberg & Velicer (in press): CFA vs. PCA
- Results from CFA and PCA are so similar that the differences are unimportant
- If differences are large, the data are not well-structured enough for either type of analysis
- Use "factor" to refer to both factors and components
- The aim is to explain correlations among manifest variables
22. Present-Day Quantitative Approaches
- Recent paper in Psychometrika (Ogasawara, 2003)
- Based on work with oblique factors/components with:
- Equal numbers of indicators per dimension
- Independent cluster solution
- Sphericity (equal error variances), hence equal factor loadings
- Derived expressions for SEs (standard errors) of factor and component loadings and intercorrelations
- SEs for PCA estimates were smaller than those for CFA estimates, implying greater stability of (i.e., lower variability around) population estimates
23. An Apocryphal Example
- A researcher wanted to develop a new inventory to assess three cognitive traits
- Knew to collect data in at least two initial, derivation samples
- Use exploratory procedures to verify initial, a priori hypotheses
- Then move on to confirmatory techniques
- So: Sample 1, N = 1600, and 8 manifest variables
- 3 components explain 51% of the total variance
24. Oblique Components, Sample 1
- Variable Fac 1 Fac 2 Fac 3 h2
- V1 .704 .002 .005 .496
- V2 .704 .002 .005 .496
- V3 .704 .002 .005 .496
- N1 .105 .715 .065 .575
- N2 .002 .725 .014 .538
- N3 .116 .670 .089 .417
- S1 .005 .002 .735 .540
- S2 .005 .002 .735 .540
-
- Fac 1 1.0
- Fac 2 .256 1.0
- Fac 3 .147 .147 1.0
25. Orthogonal Components, Sample 1
- Variable Fac 1 Fac 2 Fac 3 h2
- V1 .698 .079 .044 .496
- V2 .698 .079 .044 .496
- V3 .698 .079 .044 .496
- N1 .211 .717 .127 .575
- N2 .104 .716 .070 .538
- N3 .025 .643 .045 .417
- S1 .050 .046 .732 .540
- S2 .050 .046 .732 .540
-
- Fac 1 1.0
- Fac 2 .000 1.0
- Fac 3 .000 .000 1.0
26. An Apocryphal Example
- After confirming the a priori hypotheses in Sample 1, the researcher collected data from Sample 2
- Same manifest variables
- Sampled from the same general population
- Same mathematical approach: principal components followed by oblique and orthogonal rotation
- Got the same results!
- Decided to test the theory in Sample 3 using a replicate-and-extend approach
- Major change: switch to confirmatory factor analysis
27. Confirmatory Factor Analysis, Sample 3
- Variable Fac 1 Fac 2 Fac 3 θ2 (unique variance)
- V1 2.50 (.18) .0 .0 18.75
- V2 3.00 (.21) .0 .0 27.00
- V3 3.50 (.25) .0 .0 36.75
- N1 .0 2.10 (.13) .0 4.59
- N2 .0 2.00 (.14) .0 12.00
- N3 .0 1.50 (.16) .0 22.75
- S1 .0 .0 2.40 (.44) 58.24
- S2 .0 .0 2.70 (.50) 73.71
-
- Fac 1 1.0
- Fac 2 .50 (.04) 1.0
- Fac 3 .50 (.10) .50 (.10) 1.0
28. Fully Standardized Solution, Sample 3
- Variable Fac 1 Fac 2 Fac 3 h2
- V1 .50 .0 .0 .25
- V2 .50 .0 .0 .25
- V3 .50 .0 .0 .25
- N1 .0 .70 .0 .49
- N2 .0 .50 .0 .25
- N3 .0 .30 .0 .09
- S1 .0 .0 .30 .09
- S2 .0 .0 .30 .09
-
- Fac 1 1.0
- Fac 2 .50 1.0
- Fac 3 .50 .50 1.0
29. Oblique Component Solution, Sample 3
- Variable Fac 1 Fac 2 Fac 3 h2
- V1 .704 .002 .005 .496
- V2 .704 .002 .005 .496
- V3 .704 .002 .005 .496
- N1 .105 .715 .065 .575
- N2 .002 .725 .014 .538
- N3 .116 .670 .089 .417
- S1 .005 .002 .735 .540
- S2 .005 .002 .735 .540
-
- Fac 1 1.0
- Fac 2 .256 1.0
- Fac 3 .147 .147 1.0
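One way to see the apocryphal pattern is a sketch under the assumption that the population exactly follows the standardized CFA solution of Slide 28: build the implied R, then compare the CFA communalities with the "communalities" given by the first three principal components, which come out noticeably larger, echoing Slides 24 and 29.

```python
import numpy as np

# Standardized CFA parameters from Slide 28 (assumed to hold exactly in the population).
P = np.zeros((8, 3))
P[0:3, 0] = 0.50                   # V1-V3
P[3:6, 1] = [0.70, 0.50, 0.30]     # N1-N3
P[6:8, 2] = 0.30                   # S1-S2
Phi = np.full((3, 3), 0.50)
np.fill_diagonal(Phi, 1.0)

R = P @ Phi @ P.T                  # implied correlations
np.fill_diagonal(R, 1.0)

h2_cfa = np.sum((P @ Phi) * P, axis=1)        # CFA communalities: .25 ... .09
vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
A = vecs[:, order] * np.sqrt(vals[order])
h2_pca = np.sum(A[:, :3] ** 2, axis=1)        # "communalities" from 3 components
print(np.round(h2_cfa, 2))
print(np.round(h2_pca, 2))                    # noticeably larger throughout
```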
30. An Early Comparison
- McCloy, Metheny, & Knott (1938)
- Published in Psychometrika
- Sought to compare common FA (Thurstone's method) vs. principal components analysis (Hotelling's method)
- Stated that principal components can be rotated
- So, both techniques are different means to the same end
- Principal difference:
- Thurstone inserts the largest correlation in the row into the diagonal of each residual matrix
- Hotelling begins with unities and stays with the residual values in each residual matrix
31. Hypothetical Factor Matrix (McCloy et al.)
- Variable Fac 1 Fac 2 Fac 3 h2
- 1 .900 .0 .0 .810
- 2 .800 .0 .0 .640
- 3 .0 .700 .0 .490
- 4 .0 .800 .0 .640
- 5 .0 .0 .900 .810
- 6 .0 .0 .600 .360
- 7 .0 .424 .424 .360
- 8 .566 .566 .0 .640
- 9 .495 .0 .495 .490
- 10 .520 .520 .520 .810
32. Rotated Factor Matrix (McCloy et al.)
- Variable Fac 1 Fac 2 Fac 3 h2
- 1 .860 .033 .035 .742
- 2 .819 .025 .023 .672
- 3 .014 .726 .000 .527
- 4 .023 .766 .004 .587
- 5 .010 .004 .808 .653
- 6 .008 .029 .645 .417
- 7 .011 .434 .466 .406
- 8 .587 .548 .014 .645
- 9 .516 .038 .512 .530
- 10 .489 .471 .537 .749
33. Rotated Component Matrix (McCloy et al.)
- Variable Fac 1 Fac 2 Fac 3 h2
- 1 .906 .055 .063 .828
- 2 .874 .053 .046 .769
- 3 .034 .824 .021 .681
- 4 .050 .859 .006 .740
- 5 .060 .000 .885 .787
- 6 .094 .035 .773 .608
- 7 .054 .519 .525 .548
- 8 .653 .558 .029 .739
- 9 .527 .085 .605 .651
- 10 .527 .477 .552 .810
34. An Early Comparison
- McCloy, Metheny, & Knott (1938)
- Argued that
- both CFA and PCA were means to the same end
- both led to a similar pattern of loadings, but
- Thurstone's method was more accurate (Δh2 = .056) than Hotelling's (Δh2 = .125); but these were average absolute differences
- I averaged the signed differences, and Thurstone's method was much more accurate (Δh2 = -.013) than Hotelling's (Δh2 = .120)
35. An Early Comparison
- McCloy, Metheny, & Knott (1938)
- Found a similar pattern of high and low loadings from PCA and CFA
- But they found (and did not stress) that PCA led to decidedly higher loadings
- Tukey (1969)
- Amount, as well as direction, is vital
- For any science to advance, we must pay attention to quantitative variation, not just qualitative variation
36. Regularity Conditions or Phenomena
- Relations between population values of P and R
- Features of eigenvalues
- Covariances among residuals
- Need a theory of errors
- Recount my first exposure
- Should have to acknowledge (predict? live with?)
the patterns in residuals
37. Practicing Scientists vs. Statisticians
- An interesting dimension along which researchers fall:
- Practicing scientists: use CFA; use regression analysis
- Statisticians (the Dark side): prefer PCA; warn of problems of errors in variables
38. Practicing Scientists vs. Statisticians
- At first this seems odd
- The practicing scientist prefers
- CFA (which partials out errors of measurement and specific variance)
- Regression analysis, despite its implicit assumption of perfect measurement
- The statistician prefers
- To warn of the ill effects of errors in variables on the results of regression analysis
- PCA (despite its lack of attention to measurement error), perhaps due to its elegant, reduced-rank representation
39. Practicing Scientists vs. Statisticians
- On second thought, this is rational
- The practicing scientist prefers
- Assumptions that residuals (in CFA or regression analysis) are independent, uncorrelated, and normally distributed
- The statistician prefers
- To try to circumvent (or solve) the problem of errors in variables in regression
- To relegate errors-in-variables problems in PCA to that part of the solution (GG') that is orthogonal to the retained part, thereby circumventing (or solving) this problem
40. Regularity Conditions or Phenomena
- In common factor analysis,
- Characteristics of correlations → characteristics of variables: 1-to-1
- Characteristics of correlations ← characteristics of variables: 1-to-1
- In principal component analysis,
- Characteristics of correlations → characteristics of variables: 1-to-1 (??)
- Characteristics of correlations ← characteristics of variables: many-to-1
41. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .64 1.00
- V3 .64 .64 1.00
- V4 .64 .64 .64 1.00
- V5 .64 .64 .64 .64 1.00
- V6 .64 .64 .64 .64 .64 1.00
-
42. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.92 .80 .64 2.28 .87 .76
- V2 .0 .80 .64 .36 .87 .76
- V3 .0 .80 .64 .36 .87 .76
- V4
- V5
- V6
- P1 1.0 1.0
- P2
-
43. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .36 .00 .00
- V2 .00 .36 .00
- V3 .00 .00 .36
- V4
- V5
- V6
- Covs below diag., corrs above diag.
44. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .24 -.50 -.50
- V2 -.12 .24 -.50
- V3 -.12 -.12 .24
- V4
- V5
- V6
- Covs below diag., corrs above diag.
45. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 3.84 .80 .64 4.20 .84 .70
- V2 .0 .80 .64 .36 .84 .70
- V3 .0 .80 .64 .36 .84 .70
- V4 .0 .80 .64 .36 .84 .70
- V5 .0 .80 .64 .36 .84 .70
- V6 .0 .80 .64 .36 .84 .70
- P1 1.0 1.0
- P2
-
46. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .36 .00 .00 .00 .00 .00
- V2 .00 .36 .00 .00 .00 .00
- V3 .00 .00 .36 .00 .00 .00
- V4 .00 .00 .00 .36 .00 .00
- V5 .00 .00 .00 .00 .36 .00
- V6 .00 .00 .00 .00 .00 .36
- Covs below diag., corrs above diag.
47. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .30 -.20 -.20 -.20 -.20 -.20
- V2 -.06 .30 -.20 -.20 -.20 -.20
- V3 -.06 -.06 .30 -.20 -.20 -.20
- V4 -.06 -.06 -.06 .30 -.20 -.20
- V5 -.06 -.06 -.06 -.06 .30 -.20
- V6 -.06 -.06 -.06 -.06 -.06 .30
- Covs below diag., corrs above diag.
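The one-factor demonstrations of Slides 41-47 can be reproduced with a short sketch; the closed-form component loading below relies on the equicorrelation structure of these examples (the first eigenvector is constant), so it is not a general formula.

```python
import numpy as np

def cfa_vs_pca_one_factor(p, r):
    """Population comparison for p equally correlated indicators of one factor."""
    R = np.full((p, p), r)
    np.fill_diagonal(R, 1.0)
    f = np.sqrt(r)                               # common factor loading
    resid_cfa = R - np.full((p, p), r)           # diagonal matrix of uniquenesses, 1 - r
    pc = np.sqrt((1 + (p - 1) * r) / p)          # first PC loading (equicorrelated case)
    resid_pca = R - np.full((p, p), pc ** 2)     # non-diagonal residual matrix
    return f, pc, resid_cfa, resid_pca

for p in (3, 6):
    f, pc, _, resid_pca = cfa_vs_pca_one_factor(p, 0.64)
    print(p, round(f, 2), round(pc, 2), np.round(resid_pca[:2, :2], 2))
# p = 3: factor loading .80, component loading .87, PCA residuals .24 / -.12
# p = 6: factor loading .80, component loading .84, PCA residuals .30 / -.06
```

Running the same function with r = .36 reproduces the corresponding values on Slides 50-56.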
48. Regularity Conditions or Phenomena
- In common factor analysis,
- If (a) the model fits in the population, (b) there is one factor, and (c) communalities are estimated optimally, then:
- Single non-zero eigenvalue
- Factor loadings and residual variances for the first three variables are unaffected by the addition of 3 identical variables
- Residuals = specific + error variance
- The residual matrix is diagonal
49. Regularity Conditions or Phenomena
- In principal component analysis,
- If (a) the common factor model fits in the population, (b) there is one factor, and (c) unities are retained on the main diagonal, then:
- Single large eigenvalue, plus (p - 1) identical, smaller eigenvalues
- The residual component matrix G is independent of the space defined by Fc
- But the residual covariance matrix is clearly non-diagonal
- And (a) population component loadings and (b) residual variances and covariances vary as a function of the number of manifest variables!
50. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .36 1.00
- V3 .36 .36 1.00
- V4 .36 .36 .36 1.00
- V5 .36 .36 .36 .36 1.00
- V6 .36 .36 .36 .36 .36 1.00
-
51. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.08 .60 .36 1.72 .76 .57
- V2 .0 .60 .36 .64 .76 .57
- V3 .0 .60 .36 .64 .76 .57
- V4
- V5
- V6
- P1 1.0 1.0
- P2
-
52. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .64 .00 .00
- V2 .00 .64 .00
- V3 .00 .00 .64
- V4
- V5
- V6
- Covs below diag., corrs above diag.
53. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .43 -.50 -.50
- V2 -.21 .43 -.50
- V3 -.21 -.21 .43
- V4
- V5
- V6
- Covs below diag., corrs above diag.
54. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 2.16 .60 .36 2.80 .68 .47
- V2 .0 .60 .36 .64 .68 .47
- V3 .0 .60 .36 .64 .68 .47
- V4 .0 .60 .36 .64 .68 .47
- V5 .0 .60 .36 .64 .68 .47
- V6 .0 .60 .36 .64 .68 .47
- P1 1.0 1.0
- P2
-
55. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .64 .00 .00 .00 .00 .00
- V2 .00 .64 .00 .00 .00 .00
- V3 .00 .00 .64 .00 .00 .00
- V4 .00 .00 .00 .64 .00 .00
- V5 .00 .00 .00 .00 .64 .00
- V6 .00 .00 .00 .00 .00 .64
- Covs below diag., corrs above diag.
56. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .53 -.20 -.20 -.20 -.20 -.20
- V2 -.11 .53 -.20 -.20 -.20 -.20
- V3 -.11 -.11 .53 -.20 -.20 -.20
- V4 -.11 -.11 -.11 .53 -.20 -.20
- V5 -.11 -.11 -.11 -.11 .53 -.20
- V6 -.11 -.11 -.11 -.11 -.11 .53
- Covs below diag., corrs above diag.
57. Regularity Conditions or Phenomena
- So, the differences between population parameters from CFA and PCA diverge more:
- (a) the fewer the indicators per dimension, and
- (b) the lower the true communality
- But some regularities still seem to hold (although these vary with the number of indicators):
- regular estimates of loadings
- regular magnitude of residual variances
- regular magnitude of residual covariances
- regular form of the eigenvalue structure
58. Regularity Conditions or Phenomena
- But, what if we have variation in loadings?
59. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .64 1.00
- V3 .64 .64 1.00
- V4 .48 .48 .48 1.00
- V5 .48 .48 .48 .36 1.00
- V6 .48 .48 .48 .36 .36 1.00
-
60. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 3.00 .80 .64 3.47 .83 .69
- V2 .0 .80 .64 .64 .83 .69
- V3 .0 .80 .64 .64 .83 .69
- V4 .0 .60 .36 .53 .68 .47
- V5 .0 .60 .36 .36 .68 .47
- V6 .0 .60 .36 .36 .68 .47
- P1 1.0 1.0
- P2
-
61. Residual Covariances: CFA
- Var V1 V2 V3 V4 V5 V6
- V1 .36 .00 .00 .00 .00 .00
- V2 .00 .36 .00 .00 .00 .00
- V3 .00 .00 .36 .00 .00 .00
- V4 .00 .00 .00 .64 .00 .00
- V5 .00 .00 .00 .00 .64 .00
- V6 .00 .00 .00 .00 .00 .64
- Covs below diag., corrs above diag.
62. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .31 -.15 -.15 -.21 -.21 -.21
- V2 -.05 .31 -.15 -.21 -.21 -.21
- V3 -.05 -.05 .31 -.21 -.21 -.21
- V4 -.09 -.09 -.09 .53 -.20 -.20
- V5 -.09 -.09 -.09 -.11 .53 -.20
- V6 -.09 -.09 -.09 -.11 -.11 .53
- Covs below diag., corrs above diag.
63. Regularity Conditions or Phenomena
- So, with variation in loadings:
- One piece of approximate stability:
- regular estimates of loadings
- But we sacrifice:
- regular magnitude of residual variances
- regular magnitude of residual covariances
- regular form of the eigenvalue structure
64. Regularity Conditions or Phenomena
- But what if we have multiple factors?
- Let's start with
- (a) equal loadings
- (b) orthogonal factors
65. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.08 .60 .0 .64 1.72 .76 .0 .57
- V2 1.08 .60 .0 .64 1.72 .76 .0 .57
- V3 .0 .60 .0 .64 .64 .76 .0 .57
- V4 .0 .0 .60 .64 .64 .0 .76 .57
- V5 .0 .0 .60 .64 .64 .0 .76 .57
- V6 .0 .0 .60 .64 .64 .0 .76 .57
- P1 1.0 1.0
- P2 .0 1.0 .0 1.0
-
66. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .43 -.50 -.50 .00 .00 .00
- V2 -.21 .43 -.50 .00 .00 .00
- V3 -.21 -.21 .43 .00 .00 .00
- V4 .00 .00 .00 .43 -.50 -.50
- V5 .00 .00 .00 -.21 .43 -.50
- V6 .00 .00 .00 -.21 -.21 .43
- Covs below diag., corrs above diag.
67. Regularity Conditions or Phenomena
- So, a strange result:
- Same factor "inflation" as with 1 factor, 3 indicators
- Same within-factor residual covariances as for 1 factor, 3 indicators
- But between-factor residual covariances = 0!
- Let's go to
- (a) equal loadings, but
- (b) oblique factors
68. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.62 .60 .0 .64 2.26 .76 .0 .57
- V2 .54 .60 .0 .64 1.18 .76 .0 .57
- V3 .0 .60 .0 .64 .64 .76 .0 .57
- V4 .0 .0 .60 .64 .64 .0 .76 .57
- V5 .0 .0 .60 .64 .64 .0 .76 .57
- V6 .0 .0 .60 .64 .64 .0 .76 .57
- P1 1.0 1.0
- P2 .5 1.0 .31 1.0
-
69. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .43 -.50 -.50 .00 .00 .00
- V2 -.21 .43 -.50 .00 .00 .00
- V3 -.21 -.21 .43 .00 .00 .00
- V4 .00 .00 .00 .43 -.50 -.50
- V5 .00 .00 .00 -.21 .43 -.50
- V6 .00 .00 .00 -.21 -.21 .43
- Covs below diag., corrs above diag.
70. Regularity Conditions or Phenomena
- So, a strange result:
- Same factor "inflation" as with 1 factor, 3 indicators
- Reduced correlation between factors
- But the residual covariance matrix is identical!
- Let's go to
- (a) unequal loadings, and
- (b) orthogonal factors
71. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.16 .80 .0 .36 1.70 .83 .0 .68
- V2 1.16 .60 .0 .64 1.70 .78 .0 .61
- V3 .0 .40 .0 .84 .79 .64 .0 .41
- V4 .0 .0 .80 .36 .79 .0 .83 .68
- V5 .0 .0 .60 .64 .51 .0 .78 .61
- V6 .0 .0 .40 .84 .51 .0 .64 .41
- P1 1.0 1.0
- P2 .0 1.0 .0 1.0
-
72. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .32 -.47 -.48 .00 .00 .00
- V2 -.16 .39 -.55 .00 .00 .00
- V3 -.21 -.26 .59 .00 .00 .00
- V4 .00 .00 .00 .32 -.47 -.48
- V5 .00 .00 .00 -.16 .39 -.55
- V6 .00 .00 .00 -.21 -.26 .59
- Covs below diag., corrs above diag.
73. Regularity Conditions or Phenomena
- So, a strange result:
- Different factor "inflation" than with 1 factor, 3 indicators
- Reduced correlation between factors
- The residual covariance matrix has unequal covariances and correlations among residuals, but between-factor covariances = 0!
- Let's go to
- (a) unequal loadings, and
- (b) oblique factors
74. Eigenvalues, Loadings, and Explained Variance
- Var EV P1 P2 h2 EVc Pc1 Pc2 hc2
- V1 1.74 .80 .0 .36 2.27 .77 .11 .66
- V2 .58 .60 .0 .64 1.16 .77 .00 .59
- V3 .0 .40 .0 .84 .79 .71 -.12 .46
- V4 .0 .0 .80 .36 .77 .11 .77 .66
- V5 .0 .0 .60 .64 .52 .00 .77 .59
- V6 .0 .0 .40 .84 .49 -.12 .71 .46
- P1 1.0 1.0
- P2 .5 1.0 .32 1.0
-
75. Residual Covariances: PCA
- Var V1 V2 V3 V4 V5 V6
- V1 .34 -.38 -.49 -.13 -.11 .01
- V2 -.14 .41 -.59 -.11 -.04 .07
- V3 -.21 -.28 .54 .01 .07 .16
- V4 -.04 -.04 .00 .34 -.38 -.49
- V5 -.04 -.02 .03 -.14 .41 -.59
- V6 .00 .03 .08 -.21 -.28 .54
- Covs below diag., corrs above diag.
76. Regularity Conditions or Phenomena
- So, a strange result:
- Extremely different factor "inflation" than with 1 factor, 3 indicators
- The largest loading is now UNderrepresented
- Very different population factor loadings (.8, .6, .4) have very similar component loadings
- Now the between-factor covariances are not zero, and some are positive! (see the sketch below)
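A sketch of this final demonstration (unequal loadings .8/.6/.4 on each of two factors correlated .5, as on Slide 74): after retaining two components, the residual covariances between indicators of different factors are no longer zero.

```python
import numpy as np

# Common factor parameters used to build the population R (from Slide 74).
P = np.array([[0.8, 0.0], [0.6, 0.0], [0.4, 0.0],
              [0.0, 0.8], [0.0, 0.6], [0.0, 0.4]])
Phi = np.array([[1.0, 0.5],
                [0.5, 1.0]])
R = P @ Phi @ P.T
np.fill_diagonal(R, 1.0)

vals, vecs = np.linalg.eigh(R)
order = np.argsort(vals)[::-1]
A2 = (vecs[:, order] * np.sqrt(vals[order]))[:, :2]   # two retained components
Theta = R - A2 @ A2.T                                 # PCA residual covariance matrix
print(np.round(Theta, 2))                             # between-factor blocks are nonzero
```

Because Θ depends only on the retained subspace, rotating the two components obliquely (as on Slide 74) does not change it.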
77. R from Component Parameters
- All of the preceding took a common factor view:
- Develop parameters from a common factor model
- Analyze using CFA and PCA
- CFA procedures recover the parameters
- PCA procedures exhibit failings or anomalies
- So what? What else could you expect?
- Challenge (to me):
- Generate data from a principal component model
- Analyze using CFA and PCA
- PCA should recover the parameters, and CFA should exhibit problems and/or anomalies
78. R from Component Parameters
- This is difficult to do
- It leads to
- Impractical, unacceptable outcomes, from the point of view of the practicing scientist
- Crucial indeterminacies within the PCA model
79. R from Component Parameters
- Impractical, unacceptable outcomes, from the point of view of the practicing scientist
80. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .46 1.00
- V3 .46 .46 1.00
- V4
- V5
- V6
- First principal component has 3 loadings of .8
- First principal factor has 3 loadings of (.46)^1/2, or about .68
81. Manifest Correlations
- Var V1 V2 V3 V4 V5 V6
- V1 1.00
- V2 .568 1.00
- V3 .568 .568 1.00
- V4 .568 .568 .568 1.00
- V5 .568 .568 .568 .568 1.00
- V6 .568 .568 .568 .568 .568 1.00
- First principal component has 6 loadings of .8
- First principal factor has 6 loadings of (.568)^1/2, or about .75 (see the sketch below)
- But one would have to alter the first 3 tests, as their population correlations are altered
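A sketch of the arithmetic behind Slides 80-81, using the closed-form results for the equicorrelated case: the first principal component loading, sqrt((1 + (p - 1) r) / p), depends on the number of indicators p, whereas the principal factor loading, sqrt(r), does not.

```python
import numpy as np

for p, r in ((3, 0.46), (6, 0.568)):
    pc = np.sqrt((1 + (p - 1) * r) / p)   # first principal component loading
    pf = np.sqrt(r)                       # principal factor loading
    print(p, round(pc, 3), round(pf, 3))
# 3 indicators, r = .46 : component loading .800, factor loading .678
# 6 indicators, r = .568: component loading .800, factor loading .754
```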
82. R from Component Parameters
- Crucial indeterminacies within the PCA model
- Consider the case of a well-identified CFA model: 6 manifest variables loading on a single factor
- One could easily construct the population matrix as FF' + uniquenesses to ensure diag(R) = I
- With 6 manifest variables, 6(7)/2 = 21 unique elements of the covariance matrix
- 12 parameter estimates
- therefore 9 df
83. R from Component Parameters
- Crucial indeterminacies within the PCA model
- Consider now 6 manifest variables with defined loadings on the first PC
- To reconstruct the correlation matrix, one must come up with the remaining 5 PCs
- A start: the columns of the full loading matrix [Fc G] must be orthogonal ([Fc G]'[Fc G] diagonal), so the orthogonality constraints yield 6(5)/2 = 15 equations
- Sums of squares across rows = 1, so 6 more equations
- In short, only 21 equations, but 30 unknowns (the loadings of the 6 variables on the 5 components in G)
- Therefore, an infinite number of R matrices will lead to the stated first PC
84. R from Component Parameters
- Crucial indeterminacies within the PCA model
- Related to the Ledermann number, but in reverse (see the sketch below)
- For example, with 10 manifest variables, one can minimally overdetermine no more than 6 factors (so use 6 or fewer factors)
- But here, one must specify at least 6 components (to ensure more equations than unknowns) to ensure a unique R
- If fewer than 6 components are specified, an infinite number of solutions for R can be found
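For reference, a sketch of the Ledermann bound mentioned above (the formula is supplied here, not stated on the slide): the largest number of common factors that p manifest variables can minimally overdetermine.

```python
import math

def ledermann_bound(p: int) -> float:
    """Ledermann (1937) bound: k <= (2p + 1 - sqrt(8p + 1)) / 2 common factors."""
    return (2 * p + 1 - math.sqrt(8 * p + 1)) / 2

print(ledermann_bound(10))   # 6.0 -> at most 6 factors for 10 manifest variables
```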
85. Conclusions: CFA
- CFA (common factor) models may not hold in the population
- But if they do (in a theoretical population):
- The notion of a population factor loading is realistic
- The population factor loading is unaffected by the presence of other variables, as long as the battery contains the same factors
- In the one-factor case, loadings can vary from 0 to 1 (provided reflection of variables is possible)
- This generalizes to the case of multiple factors
86. Conclusions: CFA
- CFA (common factor) models may not hold in the population
- But if they do:
- Residual (i.e., unique) variances are uncorrelated
- The magnitude of the unique variance for a given variable is unaffected by the other variables in the analysis
87. Conclusions: PCA
- PCA models cannot hold in the population (because all variables have measurement error)
- Moreover:
- The notion of "the" population component loading for a particular manifest variable is meaningless
- The population component loading is affected strongly by the presence of other variables
- SEs for component loadings have no clear interpretation
- In the one-component case, component loadings can only vary from (1/m)^1/2 to 1, where m is the number of indicators for the dimension (see the sketch below)
- This generalizes to the multiple-component case
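A sketch of this lower bound in the equicorrelated, one-dimension case: as the common correlation r approaches zero, the first component loading approaches (1/m)^1/2 rather than zero.

```python
import numpy as np

m = 3                                    # number of indicators for the dimension
for r in (0.0, 0.1, 0.4, 0.9):
    loading = np.sqrt((1 + (m - 1) * r) / m)
    print(r, round(loading, 3))
# r = 0.0 gives .577 (= 1/sqrt(3)); as r -> 1 the loading approaches 1.0
```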
88. Conclusions: PCA
- PCA models cannot hold in the population (because all variables have measurement error)
- Moreover:
- Residual variables are correlated, often in an unpredictable and seemingly haphazard fashion
- The magnitudes of the unique variance and covariances for a given manifest variable are affected by the other variables in the analysis
89. Conclusions: PCA
- PCA models cannot hold in the population (because all variables have measurement error)
- Moreover, generating data from a principal component model leads either to
- Impractical, unacceptable outcomes, or
- Indeterminacies in the parameter-R relations