Title: Principal Components: An Introduction
1. Principal Components: An Introduction
- Exploratory factoring
- Meaning and application of principal components
- Basic steps in a PC analysis
- PC extraction process
- Determining the number of PCs
  - Statistical approaches
  - Mathematical approaches
  - Nontrivial factors approaches
2. Exploratory vs. Confirmatory Factoring
- Exploratory Factoring: when we do not have research hypotheses (RH) about...
  - the number of factors
  - what variables load on which factors
  - we will explore the factor structure of the variables, consider multiple alternative solutions, and arrive at a post hoc solution
- Weak Confirmatory Factoring: when we have RH about the factors and factor memberships
  - we will test the proposed weak a priori factor structure
- Strong Confirmatory Factoring: when we have RH about the relative strength of contribution to factors by variables
  - we will test the proposed strong a priori factor structure
3. Meaning of Principal Components
- Component analyses are those based on the full correlation matrix
  - 1.00s in the diagonal
  - yep, there are other kinds; more later
- Principal analyses are those for which each successive factor...
  - accounts for the maximum available variance
  - is orthogonal to (uncorrelated with, independent of) all prior factors
  - the full solution (as many factors as variables) accounts for all the variance
4. Applications of PC Analysis
- Components analysis is a kind of data reduction
  - start with an inter-related set of measured variables
  - identify a smaller set of composite variables that can be constructed from the measured variables and that carry as much of their information as possible
- A full components solution...
  - has as many PCs as variables
  - accounts for 100% of the variables' variance
  - each variable has a final communality of 1.00 -- all of its variance is accounted for by the full set of PCs
- A truncated components solution...
  - has fewer PCs than variables
  - accounts for <100% of the variables' variance
  - each variable has a communality < 1.00 -- not all of its variance is accounted for by the PCs (see the sketch below)
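To make the full vs. truncated contrast concrete, here is a minimal numpy sketch (with toy random data standing in for real measured variables) that extracts a full components solution from a correlation matrix and shows what happens to the communalities when the solution is truncated:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))            # 200 cases, 6 measured variables (toy data)
R = np.corrcoef(X, rowvar=False)         # correlation matrix (1.00s in the diagonal)

eigvals, eigvecs = np.linalg.eigh(R)     # full components solution
order = np.argsort(eigvals)[::-1]        # order PCs by variance accounted for
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings = eigvecs * np.sqrt(eigvals)    # structure weights (variable-PC correlations)

# Full solution: each variable's communality (row sum of squared loadings) is 1.00
print(np.round(np.sum(loadings ** 2, axis=1), 3))

# Truncated solution (keep 2 PCs): communalities drop below 1.00,
# and less than 100% of the variables' variance is accounted for
kept = loadings[:, :2]
print(np.round(np.sum(kept ** 2, axis=1), 3))
print(round(eigvals[:2].sum() / eigvals.sum(), 3))
```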
5. The Basic Steps of a PC Analysis
- Compute the correlation matrix
- Extract a full components solution
- Determine the number of components to keep, based on...
  - total variance accounted for
  - variable communalities
  - interpretability
  - replicability
- Rotate the components and interpret (name) them
  - structure weights > .3-.4 define which variables load
- Compute component scores
- Apply the components solution
  - theoretically -- understand the meaning of the data reduction
  - statistically -- use the component scores in other analyses
(The full sequence of steps is sketched in code below.)
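As a rough map of these steps onto code, the sketch below uses scikit-learn (an assumed tool choice; any package that extracts PCs from standardized variables would do). The placeholder data and the λ > 1.00 keep-rule are only there to get started; rotation is sketched on a later slide:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))            # placeholder for the measured variables

# Steps 1-2: standardize (so PCA works from the correlation structure) and extract a full solution
Z = StandardScaler().fit_transform(X)
pca = PCA(n_components=X.shape[1]).fit(Z)

# Step 3: decide how many components to keep (here, the lambda > 1.00 default)
eigvals = pca.explained_variance_        # approximately the eigenvalues of R
n_keep = int(np.sum(eigvals > 1.0))

# Step 4 (rotate and name) is covered later; Step 5: compute component scores
scores = pca.transform(Z)[:, :n_keep]    # component scores for use in other analyses
print(n_keep, round(pca.explained_variance_ratio_[:n_keep].sum(), 3))
```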
6. PC Factor Extraction
- Extraction is the process of forming PCs as linear combinations of the measured variables
  - PC1 = b11X1 + b21X2 + ... + bk1Xk
  - PC2 = b12X1 + b22X2 + ... + bk2Xk
  - ...
  - PCf = b1fX1 + b2fX2 + ... + bkfXk
- Here's the thing to remember:
  - We usually perform factor analyses to find out how many groups of related variables there are, however...
  - The mathematical goal of extraction is to reproduce the variables' variance, efficiently
7. PC Factor Extraction, cont.
- Consider the correlation matrix R below
- Obviously there are 2 kinds of information among these 4 variables: X1 & X2 vs. X3 & X4

          X1    X2    X3    X4
    X1   1.0
    X2    .7   1.0
    X3    .3    .3   1.0
    X4    .3    .3    .5   1.0

- Looks like the PCs should be formed as...
  - PC1 = b11X1 + b21X2 -- capturing the information in X1 & X2
  - PC2 = b32X3 + b42X4 -- capturing the information in X3 & X4
- But remember, PC extraction isn't trying to group variables; it is trying to reproduce variance
  - notice that there are cross-correlations between the groups of variables!!
8. PC Factor Extraction, cont.
- So, because of the cross-correlations, in order to maximize the variance reproduced, PC1 will be formed more like...
  - PC1 = .5X1 + .5X2 + .4X3 + .4X4
  - Notice that all the variables contribute to defining PC1
  - Notice the slightly higher loadings for X1 & X2
- Because PC1 didn't focus on the X1 & X2 variable group or the X3 & X4 variable group, there will still be variance to account for in both, and PC2 will be formed, probably something like...
  - PC2 = .3X1 + .3X2 - .4X3 - .4X4
  - Notice that all the variables contribute to defining PC2
  - Notice the slightly higher loadings for X3 & X4 (a worked numeric version of this example follows below)
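Here is a small numpy check of this point, eigendecomposing the R from the previous slide. The values differ a bit from the rounded weights above (and eigenvector signs are arbitrary), but the pattern is the same: both PCs involve all four variables.

```python
import numpy as np

# The correlation matrix from the previous slide
R = np.array([[1.0, 0.7, 0.3, 0.3],
              [0.7, 1.0, 0.3, 0.3],
              [0.3, 0.3, 1.0, 0.5],
              [0.3, 0.3, 0.5, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
loadings = eigvecs * np.sqrt(eigvals)     # unrotated structure weights

print(np.round(eigvals, 2))               # approx [2.21, 0.99, 0.50, 0.30]
print(np.round(loadings[:, :2], 2))
# PC1 loads on all four variables (a bit higher for X1 and X2);
# PC2 also involves all four, with X3 and X4 opposite in sign to X1 and X2
```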
9. PC Factor Extraction, cont.
- While this set of PCs will account for lots of the variables' variance, it doesn't provide a very satisfactory interpretation
  - PC1 has all 4 variables loading on it
  - PC2 has all 4 variables loading on it, and 2 of them have negative weights, even though all the variables are positively correlated with each other
- The goal here was to point out what extraction does (maximize the variance accounted for) and what it doesn't do (find groups of variables)
10. Determining the Number of PCs
- Determining the number of PCs is arguably the most important decision in the analysis
  - rotation, interpretation, and use of the PCs are all influenced by how many PCs are kept for those processes
  - there are many different procedures available; none are guaranteed to work!!
- Probably the best approach to determining the number of PCs: remember that this is an exploratory factoring -- that means you don't have decent RH about the number of factors
- So explore...
  - consider different reasonable numbers of PCs and try them out
  - rotate, interpret, and/or try out the resulting factor scores from each, and then decide
To get started, we'll use the SPSS standard of λ > 1.00.
11. Statistical Procedures
- PC analyses are extracted from a correlation matrix
- PCs should only be extracted if there is systematic covariation in the correlation matrix
  - this is known as the sphericity question
  - note that the test asks whether the next PC should be extracted
- There are two different sphericity tests
  - whether there is any systematic covariation in the original R
  - whether there is any systematic covariation left in the partial R, after a given number of factors has been extracted
- Both tests are called Bartlett's Sphericity Test
12. Statistical Procedures, cont.
- Applying Bartlett's Sphericity Tests (sketched in code below)
  - retaining H0 means don't extract another factor
  - rejecting H0 means extract the next factor
- Significance tests provide a p-value, and so a known probability that the next factor is 1 too many (a Type I error)
- Like all significance tests, these are influenced by N
  - larger N means more power, so we are more likely to reject H0, more likely to keep the next factor, and more likely to make a Type I error
- Quandary?!? Samples large enough to have a stable R are likely to have excessive power and lead to over-factoring
- Be sure to also consider variance accounted for, replication, and interpretability
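Here is a minimal sketch of Bartlett's sphericity test applied to an original R (the version applied to the partial R after extracting factors works the same way, on the residual matrix). The chi-square approximation is the standard one; the 4-variable R and n = 200 are illustrative assumptions:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(R, n):
    """Test H0 that R is spherical (an identity matrix, no systematic covariation).
    Rejecting H0 says there is covariation worth factoring."""
    p = R.shape[0]
    stat = -(n - 1 - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2.0
    return stat, chi2.sf(stat, df)

R = np.array([[1.0, 0.7, 0.3, 0.3],
              [0.7, 1.0, 0.3, 0.3],
              [0.3, 0.3, 1.0, 0.5],
              [0.3, 0.3, 0.5, 1.0]])
stat, p_value = bartlett_sphericity(R, n=200)
print(round(stat, 1), p_value)   # large chi-square, tiny p -> extract at least one PC
# Note how the statistic grows with n: the same R with n = 2000 gives a far larger
# chi-square, which is the over-factoring quandary described above.
```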
13. Mathematical Procedures
- The most commonly applied decision rule (and the default in most stats packages -- chicken & egg?) is the λ > 1.00 rule; here's the logic
- Part 1
  - Imagine a spherical R (of k variables)
    - each variable is independent and carries unique information
    - so, each variable has 1/kth of the information in R
  - For a normal R (of k variables)
    - each variable, on average, has 1/kth of the information in R
14. Mathematical Procedures, cont.
- Part 2
  - The trace of a matrix is the sum of its diagonal elements
  - So, the trace of R (with 1s in the diagonal) = k (the number of variables)
  - λ tells the amount of variance in R accounted for by each extracted PC
  - for a full PC solution, Σλ = k (accounts for all the variance)
- Part 3
  - PC is about data reduction and parsimony
  - trading a few more-complex things (PCs -- linear combinations of variables) for the many simpler things (the original variables)
15. Mathematical Procedures, cont.
- Putting it all together (hold on tight!)
  - Any PC with λ > 1.00 accounts for more variance than the average variable in that R
    - that PC has parsimony -- the more complex composite carries more information than the average variable
  - Any PC with λ < 1.00 accounts for less variance than the average variable in that R
    - that PC doesn't have parsimony -- the more complex composite carries no more information than the average variable
(A quick numeric check of this rule follows below.)
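A quick numeric check of the trace and λ > 1.00 logic, using the 4-variable R from the extraction example. Note that the rule keeps only one PC here, even though there are arguably two groups of variables; that is part of why no single rule is guaranteed to work.

```python
import numpy as np

R = np.array([[1.0, 0.7, 0.3, 0.3],
              [0.7, 1.0, 0.3, 0.3],
              [0.3, 0.3, 1.0, 0.5],
              [0.3, 0.3, 0.5, 1.0]])

eigvals = np.linalg.eigvalsh(R)[::-1]          # variance accounted for by each PC
print(np.trace(R), round(eigvals.sum(), 2))    # both equal k (= 4 here)
print(np.round(eigvals, 2))                    # approx [2.21, 0.99, 0.50, 0.30]
print(int(np.sum(eigvals > 1.0)))              # the lambda > 1.00 rule keeps 1 PC
```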
16. Mathematical Procedures, cont.
- There have been examinations of the accuracy of this criterion
  - The usual procedure is to generate a set of variables from a known number of factors (vk = b1kPC1 + ... + bfkPCf, etc.), while varying N, the number of factors, and the PCs' communalities
  - Then factor those variables and see if the λ > 1.00 rule leads to the correct number of factors
- Results -- the rule works pretty well on average, which really means that it gets the number of factors right sometimes, underestimates sometimes, and overestimates sometimes
  - No one has produced an accurate rule for assessing when each of these will occur
  - But the rule is most accurate with k < 40, f between k/5 and k/3, and N > 300
(A toy version of this kind of simulation is sketched below.)
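A toy version of this kind of simulation (my own simplified setup, not a reproduction of the published studies): generate k variables from f known factors with a fixed loading, then count how often the λ > 1.00 rule recovers f exactly.

```python
import numpy as np

rng = np.random.default_rng(2)

def kaiser_hit_rate(k=12, f=3, n=300, loading=0.6, reps=50):
    """Proportion of simulated samples in which the lambda > 1.00 rule
    recovers the known number of factors f."""
    hits = 0
    for _ in range(reps):
        F = rng.normal(size=(n, f))                    # factor scores
        B = np.zeros((k, f))
        for j in range(k):
            B[j, j % f] = loading                      # each variable loads on one factor
        E = rng.normal(size=(n, k)) * np.sqrt(1 - loading ** 2)   # unique variance
        X = F @ B.T + E
        eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))
        hits += int(np.sum(eigvals > 1.0) == f)
    return hits / reps

print(kaiser_hit_rate())   # how often the rule is exactly right in this toy setup
```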
17. Nontrivial Factors Procedures
- These common-sense approaches became increasingly common as...
  - the limitations of the statistical and mathematical procedures became better known
  - the distinction between exploratory and confirmatory factoring developed, and the crucial role of successful exploring became better known
- These procedures are more like judgement calls and require greater application of content knowledge and persuasion, but they are often the basis of good factorings!!
18. Nontrivial Factors Procedures, cont.
- Scree -- the junk that piles up at the foot of a glacier
  - a diminishing-returns approach
  - plot the λ for each factor and look for the elbow (a plotting sketch appears below)
  - Old rule -- number of factors = number at the elbow (1966; 3 in the plot below)
  - New rule -- number of factors = number at the elbow - 1 (1967; 2 in the plot below)
- Sometimes there isn't a clear elbow -- try another rule
- This approach seems to work best when combined with attention to interpretability!!
[Scree plot: eigenvalue (λ) on the y-axis (0-4) vs. PC number (1-6) on the x-axis, with the elbow at PC 3]
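A minimal matplotlib sketch of a scree plot (the random data here are just a placeholder for your own measured variables):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 6))                  # replace with your measured variables

eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]
plt.plot(np.arange(1, len(eigvals) + 1), eigvals, "o-")
plt.axhline(1.0, linestyle="--")               # lambda > 1.00 reference line
plt.xlabel("Principal component")
plt.ylabel("Eigenvalue (lambda)")
plt.title("Scree plot: look for the elbow")
plt.show()
```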
19. An Example
A buddy in graduate school wanted to build a measure of contemporary morality. He started with the 10 Commandments and the 7 Deadly Sins and created a 56-item scale with 8 subscales. His scree plot looked like the one summarized below. How many factors?
[Scree plot: eigenvalue (λ) vs. component number, 1 to 56, with a very large first eigenvalue, a big elbow at component 2, and a smaller elbow at component 8]
- 1? There is a big elbow at 2, so the '67 rule suggests a single factor, which clearly accounts for the biggest portion of variance
- 7? There is a smaller elbow at 8, so the '67 rule suggests 7
- 8? With the smaller elbow at 8, the '66 rule gives the 8 he was looking for; also, the 8th eigenvalue was > 1.0 and the 9th was < 1.0
- Remember that these are subscales of a central construct, so...
  - items will have substantial correlations both within and between subscales
  - to maximize the variance accounted for, the first factor is likely to pull in all these inter-correlated variables, leading to a large λ for the first (general) factor and much smaller λs for subsequent factors
- This is a common scree configuration when factoring items from a multi-subscale scale!
20. Rotation -- Finding Groups in the Variables
- Factor rotations
  - change the "viewing angle" or "head tilt" of the factor space
  - make the groupings visible in the graph apparent in the structure matrix (a varimax sketch follows below)

    Unrotated structure          Rotated structure
          PC1    PC2                   PC1    PC2
    V1    .7     .5              V1    .7    -.1
    V2    .6     .6              V2    .7     .1
    V3    .6    -.5              V3    .1     .5
    V4    .7    -.6              V4    .2     .6

[Plot: the four variables in PC1-PC2 space before and after the axes are rotated; after rotation V1 & V2 lie close to PC1 and V3 & V4 lie close to PC2]
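Here is a small varimax sketch (one common orthogonal rotation; the implementation is a standard textbook version, not SPSS's exact routine), applied to unrotated loadings like those obtained from the 4-variable R earlier:

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading (structure) matrix --
    the 'head tilt' that makes variable groupings easier to see."""
    p, k = loadings.shape
    R = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        u, s, vt = np.linalg.svd(
            loadings.T @ (L ** 3 - L @ np.diag(np.sum(L ** 2, axis=0)) / p))
        R = u @ vt
        d_new = np.sum(s)
        if d_new < d * (1 + tol):      # stop when the criterion no longer improves
            break
        d = d_new
    return loadings @ R

# Unrotated loadings of the first two PCs from the 4-variable example
unrotated = np.array([[0.80, 0.46],
                      [0.80, 0.46],
                      [0.68, -0.54],
                      [0.68, -0.54]])
print(np.round(varimax(unrotated), 2))
# After rotation (up to arbitrary sign flips of the columns), X1 & X2 load mainly
# on one PC and X3 & X4 on the other, much like the rotated structure shown above.
```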
21. Interpretation -- Naming Groups in the Variables
- Usually we interpret factors using the rotated solution
- Factors are named for the variables correlated with them
  - usual cutoffs are +/- .3 to .4
  - so a variable that shares at least 9-16% of its variance with a factor is used to name that factor
  - variables may load on none, 1, or 2 factors

    Rotated structure
          PC1    PC2
    V1    .7    -.1
    V2    .7     .1
    V3    .1     .5
    V4    .2     .6

This rotated structure is easy: PC1 is V1 & V2, PC2 is V3 & V4. It is seldom this easy!?!?! (A small cutoff-based naming sketch follows below.)
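A tiny sketch of cutoff-based naming, applying a .40 cutoff to the rotated structure matrix above:

```python
import numpy as np

# Rotated structure matrix from the slide (rows = variables, columns = PCs)
rotated = np.array([[0.7, -0.1],    # V1
                    [0.7,  0.1],    # V2
                    [0.1,  0.5],    # V3
                    [0.2,  0.6]])   # V4

cutoff = 0.40
for i, row in enumerate(rotated, start=1):
    loads_on = [f"PC{j + 1}" for j, w in enumerate(row) if abs(w) >= cutoff]
    print(f"V{i} loads on: {loads_on if loads_on else 'none'}")
# V1 and V2 define PC1; V3 and V4 define PC2 -- each shares at least
# cutoff**2 (16% here) of its variance with the factor it helps to name.
```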
22. Kinds of Factors
- General factor
  - all or almost all variables load
  - there is a dominant underlying theme among the set of variables which can be represented with a single composite variable
- Group factor
  - some subset of the variables load
  - there is an identifiable sub-theme in the variables that must be represented with a specific subset of the variables
  - smaller vs. larger group factors (in terms of the number of variables and the variance accounted for)
- Unique factor
  - a single variable loads
23. Kinds of Variables
- Univocal variable -- loads on a single factor
- Multivocal variable -- loads on 2 factors
- Nonvocal variable -- doesn't load on any factor
- You should notice a pattern here
  - a higher cutoff (e.g., .40) tends to produce...
    - fewer variables loading on a given factor
    - less likelihood of a general factor
    - fewer multivocal variables
    - more nonvocal variables
  - a lower cutoff (e.g., .30) tends to produce...
    - more variables loading on a given factor
    - more likelihood of a general factor
    - more multivocal variables
    - fewer nonvocal variables