Title: Principal Components
1Principal Components
Principal components is a method of dimension
reduction. Suppose that you have a dozen
variables that are correlated. You might use
principal components analysis to reduce your 12
measures to a few principal components. Unlike
factor analysis, principal components analysis is
not usually used to identify underlying latent
variables.
2Principal Components
Principal components is a technique that requires
a large sample size. Principal components is
based on the correlation matrix of the variables
involved, and correlations usually need a large
sample size before they stabilize.
3Principal Components
As a rule of thumb, a bare minimum of 10
observations per variable is necessary to avoid
computational difficulties.
Comrey and Lee (1992), A First Course in Factor
Analysis.
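The rule of thumb above can be checked with a few lines of Python. This is only a sketch: the function name and the example sample sizes are hypothetical, and the factor of 10 is simply the figure quoted on this slide.

```python
def meets_minimum_sample_size(n_observations, n_variables, per_variable=10):
    """Return True if there are at least `per_variable` observations per variable."""
    return n_observations >= per_variable * n_variables


# With 12 variables, the rule of thumb asks for at least 120 observations.
# The sample sizes below are made up for illustration.
for n in (100, 120, 500):
    print(n, meets_minimum_sample_size(n, 12))
```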
4Principal Components
In this example we have included many options.
You may not wish to use all of them, but we have
included them here to aid in the explanation of
the analysis.
5Principal Components
In this example we examine students' assessments
of academic courses. We restrict attention to 12
variables, each scored on a five-point Likert
scale.
7Principal Components
Analyze > Dimension Reduction > Factor
8Principal Components
Select variables 13-24, that is, "instructor well
prepared" through "compared to other courses this
course was", using the arrow button.
Use the buttons at the side of the screen to set
additional options.
9Principal Components
Use the buttons at the side of the screen to set
the Descriptives options, then use the Continue
button to return to the main Factor Analysis
screen.
10Principal Components
Use the buttons at the side of the screen to set
the Extraction options, then use the Continue
button to return to the main Factor Analysis
screen.
Select the appropriate method and the eigenvalue
criterion, set at 1. It is essential to obtain a
scree plot.
11Principal Components
Select the OK button to proceed with the analysis.
12Principal Components
The descriptive statistics table is output
because we used the univariate option.
13Principal Components
Mean - These are the means of the variables used
in the factor analysis. Are these appropriate
for a Likert scale?
14Principal Components
Std. Deviation - These are the standard
deviations of the variables used in the factor
analysis. Are these appropriate for a Likert
scale?
15Principal Components
Analysis N - This is the number of cases used in
the factor analysis.
16Principal Components
The correlation matrix table was included in the
output because we included the correlation
option. This table gives the correlations
between the original variables (which were
specified). Before conducting a principal
components analysis, you want to check the
correlations between the variables. If any of the
correlations are too high (say above 0.9), you
may need to remove one of the variables from the
analysis, as the two variables seem to be
measuring the same thing. Another alternative
would be to combine the variables in some way
(perhaps by taking the average).
17Principal Components
If the correlations are too low, say below 0.1,
then one or more of the variables might load only
onto one principal component (in other words,
make its own principal component). This is not
helpful, as the whole point of the analysis is to
reduce the number of items (variables).
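The two screening checks described above (correlations that are too high, and variables that barely correlate with anything) can be sketched in plain Python. The function names, the item names, and the scores are all hypothetical; the 0.9 and 0.1 cut-offs are the ones suggested in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_correlations(data, high=0.9, low=0.1):
    """Flag pairs with |r| > high and variables with |r| < low against all others.

    `data` maps a variable name to its list of scores.
    """
    names = list(data)
    r = {(a, b): pearson(data[a], data[b]) for a in names for b in names if a != b}
    too_high = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
                if abs(r[(a, b)]) > high]
    isolated = [a for a in names
                if all(abs(r[(a, b)]) < low for b in names if b != a)]
    return too_high, isolated


# Made-up Likert scores for three hypothetical items.
scores = {
    "item_a": [1, 2, 3, 4, 5],
    "item_b": [2, 2, 3, 4, 4],
    "item_c": [2, 4, 3, 4, 2],
}
too_high, isolated = screen_correlations(scores)
print("Pairs that may be redundant:", too_high)
print("Variables likely to form their own component:", isolated)
```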
18Principal Components
The correlation matrix is extremely large.
20Principal Components
Kaiser-Meyer-Olkin Measure of Sampling Adequacy -
This measure varies between 0 and 1, and values
closer to 1 are better. A value of 0.6 is a
suggested minimum.
21Principal Components
Bartlett's Test of Sphericity - This tests the
null hypothesis that the correlation matrix is an
identity matrix. An identity matrix is a matrix in
which all of the diagonal elements are 1 and all
off diagonal elements are 0. You want to reject
this null hypothesis.
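As a rough illustration of what the test computes, the sketch below uses the usual chi-square approximation for Bartlett's statistic, -((n - 1) - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix, p the number of variables and n the number of cases. The function names and the example matrix are hypothetical, and the p-value is not computed here (it would be read from a chi-square table or a statistics package).

```python
import math


def determinant(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square statistic, degrees of freedom).

    Raises ValueError if the correlation matrix is singular (determinant 0).
    """
    p = len(corr)
    statistic = -((n_obs - 1) - (2 * p + 5) / 6) * math.log(determinant(corr))
    df = p * (p - 1) // 2
    return statistic, df


# Made-up example: two variables correlated at 0.5, 100 cases.
stat, df = bartlett_sphericity([[1.0, 0.5], [0.5, 1.0]], n_obs=100)
print(f"chi-square = {stat:.3f} on {df} degree(s) of freedom")
```

An identity matrix has determinant 1, so its statistic is 0; the stronger the correlations, the smaller the determinant and the larger the statistic.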
22Principal Components
Taken together, these tests provide a minimum
standard, which should be passed before a
principal components analysis (or a factor
analysis) should be conducted.
23Principal Components
Communalities - This is the proportion of each
variable's variance that can be explained by the
principal components (e.g. the underlying latent
continua).
24Principal Components
Initial - By definition, the initial value of the
communality in a principal components analysis is
1.
25Principal Components
Extraction - The values in this column indicate
the proportion of each variable's variance that
can be explained by the principal components.
Variables with high values are well represented
in the common factor space, while variables with
low values are not well represented. (In this
example, we don't have any particularly low
values.)
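For uncorrelated principal components, the extraction communality of a variable equals the sum of its squared loadings on the retained components (a loading being the correlation between the variable and a component). A minimal sketch, with a hypothetical function name and made-up loadings:

```python
def communalities(loadings):
    """Sum of squared loadings per variable.

    `loadings` maps a variable name to its loadings on the retained components.
    """
    return {var: sum(value ** 2 for value in row) for var, row in loadings.items()}


# Made-up loadings on two retained components.
example = {
    "item_a": [0.8, 0.3],
    "item_b": [0.6, -0.5],
}
for var, h2 in communalities(example).items():
    print(f"{var}: {h2:.2f}")
```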
26Principal Components
Component - A principal components analysis
extracts as many components as there are
variables put into it. In our example, we used 12
variables (item13 through item24), so we have 12
components.
27Principal Components
Initial eigenvalues - Eigenvalues are the
variances of the principal components. Because we
conducted our principal components analysis on
the correlation matrix, the variables are
standardized, which means that each variable has
a variance of 1, and the total variance is equal
to the number of variables used in the analysis,
in this case, 12.
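To see that the eigenvalues of a correlation matrix add up to the number of variables, the sketch below finds them with a small Jacobi rotation routine written in plain Python. This is not how SPSS computes them; the function name and the 3 x 3 matrix (every pair of variables correlated at 0.5) are made up for illustration.

```python
import math


def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(max_sweeps):
        off_diagonal = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off_diagonal < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


# Made-up correlation matrix for three variables.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
eigenvalues = jacobi_eigenvalues(corr)
print("Eigenvalues:", [round(ev, 3) for ev in eigenvalues])
print("Sum of eigenvalues:", round(sum(eigenvalues), 3))
```

For this matrix the eigenvalues are 2, 0.5 and 0.5, which sum to 3, the number of variables.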
28Principal Components
Initial eigenvalues - Total - This column
contains the eigenvalues. The first component
will always account for the most variance (and
hence have the highest eigenvalue), and the next
component will account for as much of the
leftover variance as it can, and so on. Hence,
each successive component will account for less
and less variance.
29Principal Components
Initial eigenvalues - % of Variance - This column
contains the percent of variance accounted for by
each principal component (for the first
component, 6.249/12 = 0.52, or about 52%).
30Principal Components
Initial eigenvalues - Cumulative % - This column
contains the cumulative percentage of variance
accounted for by the current and all preceding
principal components. For example, the second row
shows a value of 62.322. This means that the
first two components together account for 62.322%
of the total variance.
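The "% of Variance" and "Cumulative %" columns are simple arithmetic on the eigenvalues. A minimal sketch follows; the function name is hypothetical and the four eigenvalues are made up (they are not the values from the SPSS output).

```python
def variance_explained(eigenvalues):
    """Return rows of (eigenvalue, percent of variance, cumulative percent)."""
    total = sum(eigenvalues)
    rows, cumulative = [], 0.0
    for ev in eigenvalues:
        percent = 100 * ev / total
        cumulative += percent
        rows.append((ev, percent, cumulative))
    return rows


# Made-up eigenvalues for a four-variable analysis (they sum to 4).
for ev, percent, cumulative in variance_explained([2.4, 1.0, 0.4, 0.2]):
    print(f"{ev:5.2f}  {percent:6.2f}  {cumulative:6.2f}")

# The figure quoted in the text: an eigenvalue of 6.249 out of 12 variables.
print(round(100 * 6.249 / 12, 3))
```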
31Principal Components
Extraction Sums of Squared Loadings - The three
columns in this half of the table exactly
reproduce the values given on the same row on the
left side of the table. The number of rows
reproduced on the right side of the table is
determined by the number of principal components
whose eigenvalues are 1 or greater.
32Principal Components
The scree plot graphs the eigenvalue against the
component number.
33Principal Components
In general, we are interested in keeping only
those principal components whose eigenvalues are
greater than 1 (we set this value).
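The retention rule can be written as a one-line filter. In this sketch the function name and the eigenvalues are made up, and the comparison is taken to be strictly "greater than" the threshold, which is an assumption about how ties at exactly 1 are handled.

```python
def retain_components(eigenvalues, threshold=1.0):
    """Return the 1-based numbers of the components whose eigenvalue exceeds the threshold."""
    return [number for number, ev in enumerate(eigenvalues, start=1) if ev > threshold]


# Made-up eigenvalues for a four-variable analysis.
print(retain_components([1.9, 1.2, 0.6, 0.3]))
```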
34Principal Components
Component Matrix - This table contains component
loadings, which are the correlations between the
variable and the component. Because these are
correlations, possible values range from -1 to
1. It is usual not to report any loadings that
are less than .3 in absolute value, as shown.
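Suppressing small loadings for display can be sketched as follows. The function name and the loadings are hypothetical; the .3 cut-off is the one mentioned above, applied to the absolute value so that large negative loadings are kept.

```python
def suppress_small_loadings(loadings, cutoff=0.3):
    """Replace loadings smaller than `cutoff` in absolute value with None (blank)."""
    return {
        var: [value if abs(value) >= cutoff else None for value in row]
        for var, row in loadings.items()
    }


# Made-up loadings on two components.
example_loadings = {
    "item_a": [0.82, 0.12],
    "item_b": [0.45, -0.61],
}
for var, row in suppress_small_loadings(example_loadings).items():
    cells = ["" if value is None else f"{value:.2f}" for value in row]
    print(var, cells)
```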
35Principal Components
Component - The columns under this heading are
the principal components that have been
extracted. As you can see by the footnote
provided by SPSS, two components were extracted
(the two components that had an eigenvalue
greater than 1).
36Principal Components
You usually do not try to interpret the
components in the way that you would factors that
have been extracted from a factor analysis.
Rather, most people are interested in the
component scores, which are used for dimension
reduction (as opposed to factor analysis where
you are looking for underlying latent continua).
37Principal Components
For a component plot, use the Rotation option.
38Principal Components
It's always wise to plot your results. Note the
clusters.
39Principal Components
Summary
Principal components is used to help understand
the covariance structure in the original
variables and/or to create a smaller number of
variables using this structure.
Factor analysis, like principal components, is
used to summarise the data covariance structure
in a smaller number of dimensions. The emphasis
is on the identification of underlying factors
that might explain the dimensions associated with
large data variability.
40Similarities
Principal Components Analysis and Factor Analysis
have these assumptions in common:
Measurement scale is interval or ratio level.
Random sample: at least 5 observations per
observed variable and at least 100 observations.
Larger sample sizes are recommended for more
stable estimates: 10-20 observations per observed
variable.
41Similarities
Principal Components Analysis and Factor Analysis
have these assumptions in common:
Oversample to compensate for missing values.
Linear relationship between observed variables.
Normal distribution for each observed variable.
Each pair of observed variables has a bivariate
normal distribution.
Both are variable reduction techniques. If
communalities are large, close to 1.00, results
could be similar.
42Similarities
Principal Components Analysis assumes the
absence of outliers in the data. Factor
Analysis assumes a multivariate normal
distribution when using Maximum Likelihood
extraction method.
43Differences