Title: Determining the # of PCs
1 Determining the # of PCs
- Remembering the process
- Some cautionary comments
- Statistical approaches
- Mathematical approaches
- Nontrivial factors approaches
- Help that's coming later
2 How the process really works
- Here's the series of steps we talked about earlier.
  - # factors decision
  - Rotate the factors
  - Interpreting the factors
  - Factor scores
These steps aren't made independently or done strictly in this order!
Considering the interpretations of the factors can aid the # factors decision!
Considering how the factor scores (representing the factors) relate to each other and to variables external to the factoring can aid both the # factors decision and interpretation.
3 Some cautionary comments
- Remember that the # factors decision ...
  - is influenced by the particular variables in the analysis
    - so, unless you are working with a closed set of variables, there probably isn't a real # of factors
    - the whole story includes how the # factors changes with variable additions and deletions
    - how do these change your interpretation of factors and variables?
  - isn't independent of interpretability
    - a factor is only real if it's meaningful
    - be cautious of both making up and missing meaning
4 Some cautionary comments, cont.
- agreement across decision rules is helpful
  - we'll talk about several decision rules, each of which is flawed in known ways
- replication is convincing
  - split-half and hold-out sampling can help
  - separate-sample replication is more convincing
- convergence research is more convincing
  - not just replicating, but correctly anticipating what the results of adding or deleting variables will be across samplings
- Remember this is Exploratory factoring
  - explore & consider alternative factor solutions
- Want to be really convincing? Use confirmatory factoring!!
5 Statistical Procedures
- PCs are extracted from a correlation matrix
- PCs should only be extracted if there is systematic covariation in the correlation matrix
  - This is known as the sphericity question
  - Note: the test asks whether the next PC should be extracted
- There are two different sphericity tests
  - Whether there is any systematic covariation in the original R
  - Whether there is any systematic covariation left in the partial R, after a given number of factors has been extracted
- Both tests are called Bartlett's Sphericity Test
6 Statistical Procedures, cont.
- Applying Bartlett's Sphericity Tests
  - Retaining H0 means don't extract another factor
  - Rejecting H0 means extract the next factor
- Significance tests provide a p-value, and so a known probability that the next factor is 1 too many (a Type I error)
- Like all significance tests, these are influenced by N
  - larger N → more power → more likely to reject H0 → more likely to keep the next factor (& make a Type I error)
- Quandary?!?
  - Samples large enough to have a stable R are likely to have excessive power and lead to over-factoring
  - Be sure to consider % variance, replication & interpretability
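The dependence on N can be seen in a small sketch of the first test (any systematic covariation in the original R). This is a minimal illustration, not the output of any particular stats package: the function name `bartlett_sphericity` and the 3-variable correlation matrix are made up for the example, and the statistic uses the usual chi-square approximation, -(N - 1 - (2k + 5)/6) ln|R| with k(k - 1)/2 degrees of freedom. The test on the partial R after extraction is not shown.

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's chi-square statistic for H0: R is spherical (an identity matrix).

    Returns (chi_square, df). The p-value is the upper-tail probability of a
    chi-square distribution with df degrees of freedom.
    """
    R = np.asarray(R, dtype=float)
    k = R.shape[0]                          # number of variables
    _, logdet = np.linalg.slogdet(R)        # ln|R|, computed stably
    chi_square = -(n - 1 - (2 * k + 5) / 6) * logdet
    df = k * (k - 1) // 2
    return chi_square, df

# Hypothetical correlation matrix (values invented for illustration)
R = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.1],
              [0.2, 0.1, 1.0]])

CRITICAL_05_DF3 = 7.815  # chi-square critical value for alpha = .05, df = 3

for n in (30, 300):
    chi_square, df = bartlett_sphericity(R, n)
    decision = "reject H0" if chi_square > CRITICAL_05_DF3 else "retain H0"
    print(f"N = {n}: chi-square = {chi_square:.2f}, df = {df}, {decision}")
```

The same R leads to retaining H0 with N = 30 and rejecting it with N = 300, which is the quandary described above.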
7 Mathematical Procedures
- The most commonly applied decision rule (and the default in most stats packages -- chicken & egg?) is the λ > 1.00 rule -- here's the logic
- Part 1
  - Imagine a spherical R (of k variables)
    - each variable is independent and carries unique information
    - so, each variable has 1/kth of the information in R
  - For a normal R (of k variables)
    - each variable, on average, has 1/kth of the information in R
8 Mathematical Procedure, cont.
- Part 2
  - The trace of a matrix is the sum of its diagonal
  - So, the trace of R (with 1s in the diagonal) = k (# vars)
  - λ tells the amount of variance in R accounted for by each extracted PC
  - for a full PC solution Σλ = k (accounts for all variance)
- Part 3
  - PC is about data reduction and parsimony
  - trading many more-simple things (original variables) for fewer more-complex things (PCs -- linear combinations of variables)
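The trace argument in Part 2 can be checked numerically. The sketch below uses a made-up 4-variable correlation matrix (the values are hypothetical) and NumPy's symmetric eigenvalue routine; it only illustrates that the eigenvalues of a full PC solution sum to k.

```python
import numpy as np

# Hypothetical correlation matrix for k = 4 variables (values invented)
R = np.array([[1.0, 0.6, 0.5, 0.1],
              [0.6, 1.0, 0.4, 0.2],
              [0.5, 0.4, 1.0, 0.3],
              [0.1, 0.2, 0.3, 1.0]])

k = R.shape[0]
# eigvalsh returns eigenvalues in ascending order; reverse so PC1 comes first
eigenvalues = np.linalg.eigvalsh(R)[::-1]

print("trace of R:", np.trace(R))                        # sum of the diagonal 1s
print("sum of eigenvalues:", eigenvalues.sum())          # matches the trace
print("proportion of variance per PC:", eigenvalues / k)
```

Dividing each eigenvalue by k gives the proportion of the total variance that PC accounts for, so the average PC (like the average variable) carries 1/k of it.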
9 Mathematical Procedure, cont.
- Putting it all together (hold on tight!)
- Any PC with λ > 1.00 accounts for more variance than the average variable in that R
  - That PC has parsimony -- the more complex composite has more information than the average variable
- Any PC with λ < 1.00 accounts for less variance than the average variable in that R
  - That PC doesn't have parsimony -- the more complex composite has no more information than the average variable
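As a minimal sketch of the rule, the function below counts the eigenvalues of R that exceed 1.00. The function name and the block-diagonal example matrix (two uncorrelated pairs of variables) are assumptions chosen so the eigenvalues are easy to verify by hand.

```python
import numpy as np

def count_eigenvalues_above_one(R):
    """Apply the eigenvalue-greater-than-1.00 rule to a correlation matrix."""
    eigenvalues = np.linalg.eigvalsh(np.asarray(R, dtype=float))[::-1]
    n_retained = int(np.sum(eigenvalues > 1.0))
    return n_retained, eigenvalues

# Hypothetical R: variables 1-2 correlate .7, variables 3-4 correlate .6,
# and the two pairs are uncorrelated with each other.
R = np.array([[1.0, 0.7, 0.0, 0.0],
              [0.7, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.6],
              [0.0, 0.0, 0.6, 1.0]])

n_retained, eigenvalues = count_eigenvalues_above_one(R)
print(np.round(eigenvalues, 2))   # eigenvalues are 1.7, 1.6, 0.4, 0.3
print(n_retained)                 # 2 PCs exceed the average variable
```

Each retained PC here accounts for more variance than a single variable would (1.7 and 1.6 versus 1.0), while the last two account for less.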
10 Mathematical Procedure, cont.
- There have been examinations of the accuracy of this criterion
  - The usual procedure is to generate a set of variables from a known number of factors (v_k = b_1k*PC_1 + ... + b_fk*PC_f, etc.) --- while varying N, # factors, # PCs & communalities
  - Then factor those variables and see if λ > 1.00 leads to the correct number of factors
- Results -- the rule works pretty well on the average, which really means that it gets the # factors right sometimes, underestimates sometimes and overestimates sometimes
  - No one has generated an accurate rule for assessing when which of these occurs
  - But the rule is most accurate with k < 40, f between k/5 and k/3, and N > 300
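A single run of that kind of simulation might look like the sketch below. Everything specific here is an assumption made for illustration: the function name, the simple-structure loading pattern (each variable loads on exactly one factor), the loading of .8, and the fixed random seed. Published studies vary these conditions systematically; this only shows the mechanics under favorable conditions (k = 12, f = 3, N = 500).

```python
import numpy as np

def simulate_and_count(n, k, f, loading, seed=0):
    """Build k variables from f independent factors, then apply the
    eigenvalue-greater-than-1.00 rule to their correlation matrix."""
    rng = np.random.default_rng(seed)
    factors = rng.standard_normal((n, f))

    # Simple structure (an assumption): variable j loads only on factor j % f
    B = np.zeros((f, k))
    B[np.arange(k) % f, np.arange(k)] = loading

    # Unique part scaled so each variable has population variance 1
    unique_sd = np.sqrt(1.0 - loading ** 2)
    X = factors @ B + unique_sd * rng.standard_normal((n, k))

    R = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]
    return int(np.sum(eigenvalues > 1.0))

true_f = 3
estimated_f = simulate_and_count(n=500, k=12, f=true_f, loading=0.8)
print("true # factors:", true_f, "| rule retains:", estimated_f)
```

Lowering the loading (smaller communalities), shrinking N, or changing the ratio of f to k are the kinds of manipulations under which the rule's accuracy is reported to vary.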
11 Nontrivial Factors Procedures
- These common sense approaches became increasingly common as
  - the limitations of statistical and mathematical procedures became better known
  - the distinction between exploratory and confirmatory factoring developed and the crucial role of successful exploring became better known
- These procedures are more like judgement calls and require greater application of content knowledge and persuasion, but are often the basis of good factorings!!
12 Nontrivial Factors Procedures, cont.
- Scree -- the junk that piles up at the foot of a glacier
  - a diminishing returns approach
  - plot the λ for each factor and look for the elbow
  - Old rule -- # factors = elbow (1966; 3 below)
  - New rule -- # factors = elbow - 1 (1967; 2 below)
- Sometimes there isn't a clear elbow -- try another rule
- This approach seems to work best when combined with attention to interpretability!!
[Scree plot: λ (y-axis, 0 to 4) by PC (x-axis, 1 to 6)]
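Finding the elbow is normally done by eye, but the two rules can be sketched in code. The eigenvalues below are hypothetical (not read from the plot), and defining the elbow as the point with the largest second difference is an assumption of this sketch, not a standard definition.

```python
import numpy as np

def find_elbow(eigenvalues):
    """Return the 1-based PC position of the sharpest bend in the scree plot.

    Assumption: the elbow is the interior point with the largest second
    difference, i.e. where the drop in eigenvalues flattens most abruptly.
    """
    lam = np.asarray(eigenvalues, dtype=float)
    second_diff = lam[:-2] - 2 * lam[1:-1] + lam[2:]
    # index 0 of second_diff belongs to the 2nd PC, hence the +2
    return int(np.argmax(second_diff)) + 2

# Hypothetical eigenvalues for 6 PCs (they sum to 6, the trace of R)
lam = [3.2, 1.6, 0.5, 0.35, 0.2, 0.15]

elbow = find_elbow(lam)
old_rule = elbow        # 1966 rule: # factors = elbow
new_rule = elbow - 1    # 1967 rule: # factors = elbow - 1
print("elbow at PC", elbow, "| old rule:", old_rule, "| new rule:", new_rule)
```

With these values the elbow falls at PC 3, so the old rule keeps 3 factors and the new rule keeps 2. When the second differences are all similar there is no clear elbow, and another rule is the better choice.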
13 An Example
A buddy in graduate school wanted to build a measure of contemporary morality. He started with the 10 Commandments and the 7 Deadly Sins and created a 56-item scale with 8 subscales. His scree plot looked like the one sketched here. How many factors?
[Scree plot: λ (y-axis) by PC (x-axis, 1 to 56)]
- 1? Big elbow at 2, so the '67 rule suggests a single factor, which clearly accounts for the biggest portion of variance
- 7? Smaller elbow at 8, so the '67 rule suggests 7
- 8? Smaller elbow at 8, so the '66 rule gives the 8 he was looking for; also, the 8th had λ > 1.0 and the 9th had λ < 1.0
- Remember that these are subscales of a central construct, so...
  - items will have substantial correlations both within and between subscales
  - to maximize the variance accounted for, the first factor is likely to pull in all these inter-correlated variables, leading to a large λ for the first (general) factor and much smaller λs for subsequent factors
- This is a common scree configuration when factoring items from a multi-subscale scale!
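This configuration can be reproduced with an idealized correlation matrix. The sketch assumes 8 subscales of 7 items each (56 items, as in the example) and uses made-up correlations of .5 within subscales and .3 between subscales; the actual item correlations from the example are not known.

```python
import numpy as np

n_subscales, items_per_subscale = 8, 7     # 56 items, as in the example
within_r, between_r = 0.5, 0.3             # hypothetical correlations

k = n_subscales * items_per_subscale
subscale = np.repeat(np.arange(n_subscales), items_per_subscale)

# Items in the same subscale correlate within_r; all other pairs between_r
same_subscale = subscale[:, None] == subscale[None, :]
R = np.where(same_subscale, within_r, between_r)
np.fill_diagonal(R, 1.0)

eigenvalues = np.linalg.eigvalsh(R)[::-1]
print(np.round(eigenvalues[:10], 2))                   # first 10 eigenvalues
print("eigenvalues > 1.00:", int(np.sum(eigenvalues > 1.0)))
```

With these values the first eigenvalue is 18.7 (the general factor), the next seven are 1.9 each, and the remaining 48 are 0.5 each, so the scree shows a big elbow at 2 and a smaller one after the 8th PC.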
14 Nontrivial Factors Procedures, cont.
- % of variance accounted for
  - keep the # factors necessary to account for enough variance -- 75% to 90% are common goals
- Interpretability -- meaningfulness of resulting PCs
  - Depends greatly upon content knowledge
  - Beware factoring illusions
  - We're good at finding patterns, even when they're not really there
- Rotational Survival -- akin to meaningfulness
  - Consider different # factors with different types of rotation -- see which factors keep showing up
- Replicability -- split, holdout, or independent samples
  - What PCs appear consistently across factorings?
- Jack-knifing
  - Re-sampling from a single dataset looking for consistency of factors
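Two of these ideas, % of variance accounted for and split-sample replicability, can be combined in one sketch. The data are simulated (12 variables from 3 independent factors with loadings of .9), and the function name, the 75% target, and the random seed are all assumptions made for illustration.

```python
import numpy as np

def n_components_for_variance(X, target=0.75):
    """Smallest # of PCs whose eigenvalues account for at least `target`
    of the total variance (the trace of R)."""
    R = np.corrcoef(X, rowvar=False)
    eigenvalues = np.linalg.eigvalsh(R)[::-1]
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    return int(np.argmax(cumulative >= target)) + 1

# Simulated data: each variable loads on exactly one of 3 factors
rng = np.random.default_rng(1)
n, k, f, loading = 600, 12, 3, 0.9
B = np.zeros((f, k))
B[np.arange(k) % f, np.arange(k)] = loading
X = (rng.standard_normal((n, f)) @ B
     + np.sqrt(1 - loading ** 2) * rng.standard_normal((n, k)))

# Random split-half: apply the same rule to each half and compare
order = rng.permutation(n)
half_a, half_b = X[order[: n // 2]], X[order[n // 2:]]
n_a = n_components_for_variance(half_a)
n_b = n_components_for_variance(half_b)
print("half A:", n_a, "| half B:", n_b, "| agree:", n_a == n_b)
```

Agreement on the number of components is only a first check; comparing what the components are in each half (their loadings) addresses the question of which PCs appear consistently.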
15 Help that's coming later
- If you have a reasonably clear factor structure, all the different ways of deciding the # factors are likely to give the same result (except maybe the statistical tests, which are likely to over-factor with large N)
- Remember that what the factors are can be very important in deciding how many factors there are
  - Consider the different interpretations of the factors from the different #-of-factors solutions
  - we can also look at the correlations between the factors to help with these decisions
- Remember that what the factors do can be very important in deciding how many factors there are
  - you can look at how factors from the different #-of-factors solutions correlate with other variables that are not in the factor analysis