Title: Factor Analysis and Principal Components
1Factor Analysis and Principal Components
- Factor analysis with principal components,
presented as a subset of factor analysis
techniques, which it is.
2Principal Components (PC)
- Principal components is about explaining the
variance-covariance structure, Σ, of a set of
variables through a few linear combinations of
these variables.
- In general, PC is used for either
- Data reduction
- or
- Interpretation
3So the original data of n measurements on p
variables can be reduced to a data set of
n measurements on k principal components. PC
tends to be a means to an end rather than the end
itself. That is, PC is often not the final
step. The PCs may be used for multiple
regression, cluster analysis, etc.
5Find the principal components and the proportion
of the total population variance explained by
each when the covariance matrix is
To solve this you will have to go through your
notes, but you can do this even though I didn't
give you the formula.
17Factor Analysis The main purpose of factor
analysis is to try to describe the covariance
relationships among many variables in terms of a
few underlying, but unobservable, random
quantities called factors. The Orthogonal
Factor Model The observable random vector X,
with p components, has mean μ and covariance
matrix Σ. The factor model proposes that X is
linearly dependent upon a few unobservable
random variables, F1, F2, …, Fm, called common
factors, and p additional sources of variation,
ε1, ε2, …, εp, called errors, or specific
factors.
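The orthogonal factor model just described is conventionally written as follows (standard multivariate-analysis notation, with m < p):

```latex
X - \mu = L\,F + \varepsilon,
\qquad
E(F) = 0,\quad \operatorname{Cov}(F) = I,\qquad
E(\varepsilon) = 0,\quad \operatorname{Cov}(\varepsilon) = \Psi \ \text{(diagonal)},
```

where L is the p × m matrix of factor loadings. Together these assumptions imply the covariance structure Σ = L L′ + Ψ, which is what the loadings and specific variances must reproduce.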
26 Maximum Likelihood Method for Factor
Analysis If the common factors, F, and the
specific factors, ε, can be assumed to be
normally distributed, then ML estimates of the
factor loadings and specific variances may be
obtained.
28 Factor Rotation All factor loadings
obtained from the initial loadings by an
orthogonal transformation have the same ability
to reproduce the covariance (or correlation)
matrix.
30A Reference
- The following 13 slides come from
- Multivariate Data Analysis Using SPSS
- By John Zhang
- ARL, IUP
31Factor Analysis-1
- The main goal of factor analysis is data
reduction. A typical use of factor analysis is in
survey research, where a researcher wishes to
represent a number of questions with a smaller
number of factors
- Two questions in factor analysis
- How many factors are there, and what do they
represent (interpretation)?
- Two technical aids
- Eigenvalues
- Percentage of variance accounted for
32Factor Analysis-2
- Two types of factor analysis
- Exploratory (introduced here)
- Confirmatory (SPSS AMOS)
- Theoretical basis
- Correlations among variables are explained by
underlying factors
- An example of a mathematical 1-factor model for two
variables
- V1 = L1·F1 + E1
- V2 = L2·F1 + E2
33Factor Analysis-3
- Each variable is composed of a common factor (F1)
multiplied by a loading coefficient (L1, L2: the
lambdas or factor loadings) plus a random
component
- V1 and V2 correlate because they share the common
factor, and the correlation should relate to the
factor loadings; thus, the factor loadings can be
estimated from the correlations
- A given set of correlations can yield different factor
loadings (i.e., the solutions are not unique)
- One should pick the simplest solution
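The two-variable, one-factor model above can be checked with a small simulation. The sketch below uses Python rather than SPSS, and the loadings L1 = 0.8 and L2 = 0.6 are illustrative assumptions; it draws a common factor plus independent specific factors and confirms that the implied correlation is the product of the loadings:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical loadings for a 1-factor model with standardized variables:
# Var(Vi) = Li**2 + Var(Ei) = 1, so each error SD is sqrt(1 - Li**2).
L1, L2 = 0.8, 0.6
F1 = rng.standard_normal(n)                       # common factor
E1 = rng.standard_normal(n) * np.sqrt(1 - L1**2)  # specific factor for V1
E2 = rng.standard_normal(n) * np.sqrt(1 - L2**2)  # specific factor for V2

V1 = L1 * F1 + E1
V2 = L2 * F1 + E2

# The correlation between V1 and V2 is implied by the loadings alone:
# corr(V1, V2) = L1 * L2 = 0.48
r = np.corrcoef(V1, V2)[0, 1]
print(round(r, 2))  # close to 0.48
```

This is the sense in which factor loadings "can be estimated from the correlations": with one factor and standardized variables, each off-diagonal correlation equals a product of two loadings.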
34Factor Analysis-4
That is, the findings should not differ by
methodology of analysis nor by sample
- A factor solution needs to be confirmed
- By a different factoring method
- By a different sample
- More on terminology
- Factor loading: interpreted as the Pearson
correlation between the variable and the factor
- Communality: the proportion of variability for a
given variable that is explained by the factors
- Extraction: the process by which the factors are
determined from a large set of variables
35Factor Analysis-5 (Principal components)
- Principal components: one of the extraction
methods
- A principal component is a linear combination of
observed variables that is independent
(orthogonal) of the other components
- The first component accounts for the largest
amount of variance in the input data; the second
component accounts for the largest amount of the
remaining variance
- That the components are orthogonal means they are
uncorrelated
36Factor Analysis-6 (Principal components)
- A possible application of principal components
- E.g., in survey research it is common to have
many questions addressing one issue (e.g.,
customer service). It is likely that these
questions are highly correlated. It is
problematic to use these variables in some
statistical procedures (e.g., regression). One can
instead use factor scores, computed from the factor
loadings on each orthogonal component
37Factor Analysis-7 (Principal components)
- Principal components vs. other extraction methods
- Principal components focuses on accounting for the
maximum amount of variance (the diagonal of the
correlation matrix)
- Other extraction methods (e.g., principal axis
factoring) focus more on accounting for the
correlations between variables (the off-diagonal
correlations)
- A principal component can be defined as a unique
combination of variables, but the other factor
methods cannot
- Principal components are used for data reduction
but are more difficult to interpret
38Factor Analysis-8
- Number of factors
- Eigenvalues are often used to determine how many
factors to take
- Take as many factors as there are eigenvalues
greater than 1
- An eigenvalue represents the amount of standardized
variance in the variables accounted for by a
factor
- The amount of standardized variance in a variable
is 1
- The sum of the retained eigenvalues, divided by the
number of variables, gives the proportion of
variance accounted for
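The eigenvalue rule above can be sketched in a few lines of Python (rather than SPSS); the 4-variable correlation matrix here is a made-up illustration:

```python
import numpy as np

# A hypothetical 4-variable correlation matrix (illustrative only).
R = np.array([
    [1.0, 0.7, 0.2, 0.1],
    [0.7, 1.0, 0.1, 0.2],
    [0.2, 0.1, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
])

eigvals = np.linalg.eigvalsh(R)[::-1]    # sorted largest first
n_factors = int(np.sum(eigvals > 1))     # Kaiser rule: keep eigenvalues > 1

# Each standardized variable contributes 1 unit of variance, so the
# proportion explained by the retained factors is their eigenvalue sum / p.
prop = eigvals[:n_factors].sum() / R.shape[0]
print(n_factors, round(prop, 3))  # 2 factors, explaining 82.5% of the variance
```

For this matrix two eigenvalues exceed 1, so the rule retains two factors.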
39Factor Analysis-9
- Rotation
- Objective: to facilitate interpretation
- Orthogonal rotation: done when data reduction is
the objective and the factors need to be orthogonal
- Varimax: attempts to simplify interpretation by
maximizing the variances of the variable loadings
on each factor
- Quartimax: simplifies the solution by finding a
rotation that produces high and low loadings
across factors for each variable
- Oblique rotation: used when there is reason to
allow factors to be correlated
- Oblimin and Promax (Promax runs fast)
40Factor Analysis-10
- Factor scores: if you are satisfied with a factor
solution
- You can request that a new set of variables be
created that represents the scores of each
observation on the factors (difficult to
interpret)
- You can use the lambda coefficients to judge which
variables are highly related to a factor, then
compute the sum or the mean of those variables for
further analysis (easier to interpret)
41Factor Analysis-11
- Sample size: the sample size should be about 10
to 15 times the number of variables (as in other
multivariate procedures)
- Number of methods: there are 8 factoring methods,
including principal components
- Principal axis: accounts for the correlations
between the variables
- Unweighted least squares: minimizes the residual
between the observed and the reproduced
correlation matrices
42Factor Analysis-12
- Generalized least squares: similar to unweighted
least squares, but gives more weight to the
variables with stronger correlations
- Maximum likelihood: generates the solution that is
the most likely to produce the correlation matrix
- Alpha factoring: considers the variables as a
sample, not using factor loadings
- Image factoring: decomposes the variables into a
common part and a unique part, then works with the
common part
43Factor Analysis-13
- Recommendations
- Principal components and principal axis are the
most commonly used methods
- When there is multicollinearity, use principal
components
- Rotations are often done; try Varimax
44Reference
- Factor Analysis from SPSS
- Much of the wording comes from the SPSS help and
tutorial.
45Factor Analysis
- Factor Analysis is primarily used for data
reduction or structure detection. - The purpose of data reduction is to remove
redundant (highly correlated) variables from the
data file, perhaps replacing the entire data file
with a smaller number of uncorrelated variables. - The purpose of structure detection is to examine
the underlying (or latent) relationships between
the variables.
46Factor Analysis
- The Factor Analysis procedure has several
extraction methods for constructing a solution. - For Data Reduction. The principal components
method of extraction begins by finding a linear
combination of variables (a component) that
accounts for as much variation in the original
variables as possible. It then finds another
component that accounts for as much of the
remaining variation as possible and is
uncorrelated with the previous component,
continuing in this way until there are as many
components as original variables. Usually, a few
components will account for most of the
variation, and these components can be used to
replace the original variables. This method is
most often used to reduce the number of variables
in the data file. - For Structure Detection. Other Factor Analysis
extraction methods go one step further by adding
the assumption that some of the variability in
the data cannot be explained by the components
(usually called factors in other extraction
methods). As a result, the total variance
explained by the solution is smaller; however,
the addition of this structure to the factor
model makes these methods ideal for examining
relationships between the variables.
- With any extraction method, the two questions
that a good solution should try to answer are
"How many components (factors) are needed to
represent the variables?" and "What do these
components represent?"
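The extraction procedure described above, taking the direction of maximum variance, removing it, and repeating, can be sketched numerically. The following is an illustrative Python sketch (not SPSS); the covariance matrix is constructed with a known, well-separated spectrum so the comparison is clean:

```python
import numpy as np

def leading_component(S, iters=500):
    """Power iteration: converges to the eigenvector of S whose
    eigenvalue has the largest magnitude."""
    v = np.ones(S.shape[0]) / np.sqrt(S.shape[0])
    for _ in range(iters):
        v = S @ v
        v /= np.linalg.norm(v)
    return v

# Build a covariance matrix with a known spectrum (an assumption
# for illustration only).
rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))
S = Q @ np.diag([5.0, 3.0, 2.0, 1.0, 0.5]) @ Q.T

# Extract just the first two components, deflating after each one:
# each new component accounts for as much remaining variance as possible.
v1 = leading_component(S)
lam1 = v1 @ S @ v1
v2 = leading_component(S - lam1 * np.outer(v1, v1))

# Full eigendecomposition for comparison (eigh sorts ascending).
eigvals, eigvecs = np.linalg.eigh(S)
full1, full2 = eigvecs[:, -1], eigvecs[:, -2]

# The retained directions match the full solution
# (eigenvectors are only defined up to sign).
print(np.allclose(np.abs(v1), np.abs(full1), atol=1e-8))  # True
print(np.allclose(np.abs(v2), np.abs(full2), atol=1e-8))  # True
```

This also shows why "a few components can replace the original variables": extracting components one at a time gives exactly the leading part of the full solution.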
47Factor Analysis Data Reduction
- An industry analyst would like to predict
automobile sales from a set of predictors.
However, many of the predictors are correlated,
and the analyst fears that this might adversely
affect her results. - This information is contained in the file
car_sales.sav . Use Factor Analysis with
principal components extraction to focus the
analysis on a manageable subset of the
predictors.
48Factor Analysis Structure Detection
- A telecommunications provider wants to better
understand service usage patterns in its customer
database. If services can be clustered by usage,
the company can offer more attractive packages to
its customers. - A random sample from the customer database is
contained in telco.sav . Factor Analysis to
determine the underlying structure in service
usage. - Use Principal Axis Factoring
49Example of Factor Analysis Structure Detection
A telecommunications provider wants to better
understand service usage patterns in its customer
database. Selecting service offerings
50Example of Factor Analysis Descriptives
Click Descriptives. Recommend checking Initial
Solution (default). In addition, check
Anti-image and KMO and Bartlett's test of sphericity.
51Example of Factor Analysis Extraction
Click Extraction. Select Method: Principal axis
factoring. Recommend keeping the defaults, but also
check Scree plot.
52Example of Factor Analysis Rotation
Click Rotation. Select Varimax and Loading
plot(s).
53Understanding the Output
The Kaiser-Meyer-Olkin Measure of Sampling
Adequacy is a statistic that indicates the
proportion of variance in your variables that
might be caused by underlying factors. Factor
analysis may not be appropriate if the KMO is
less than 0.5.
Bartlett's test of sphericity tests the
hypothesis that your correlation matrix is an
identity matrix, which would indicate that your
variables are unrelated and therefore unsuitable
for structure detection. If Sig. is less than 0.05,
then factor analysis may be helpful.
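Bartlett's statistic itself has a simple closed form, sketched below in Python (SPSS computes it for you). The correlation matrix and sample size here are made-up illustrations, and 7.81 is the 0.05 critical value of a chi-square with 3 degrees of freedom:

```python
import numpy as np

def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity: H0 is that the correlation
    matrix R is an identity matrix (variables uncorrelated).
    Returns the chi-square statistic and its degrees of freedom."""
    p = R.shape[0]
    chi2 = -((n - 1) - (2 * p + 5) / 6.0) * np.log(np.linalg.det(R))
    df = p * (p - 1) // 2
    return chi2, df

# Hypothetical correlation matrix and sample size, for illustration only.
R = np.array([
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
])
chi2, df = bartlett_sphericity(R, n=100)
print(df)           # 3 degrees of freedom for p = 3 variables
print(chi2 > 7.81)  # statistic exceeds the 0.05 critical value: reject H0
```

A large statistic (small Sig.) rejects the identity-matrix hypothesis, which is the condition under which factor analysis may be helpful.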
54Understanding the Output
Extraction communalities are estimates of the
variance in each variable accounted for by the
factors in the factor solution. Small values
indicate variables that do not fit well with the
factor solution, and should possibly be dropped
from the analysis. The lower values of Multiple
lines and Calling card show that they don't fit
as well as the others.
55Understanding the Output
Before rotation
Only three factors in the initial solution have
eigenvalues greater than 1. Together, they
account for almost 65% of the variability in the
original variables. This suggests that three
latent influences are associated with service
usage, but there remains room for a lot of
unexplained variation.
56Understanding the Output
After rotation
After extraction and rotation, approximately 56%
of the variation is explained, about a 10-point
loss in explained variation.
57Understanding the Output
In general, there are a lot of services that have
correlations greater than 0.2 with multiple
factors, which muddies the picture. The rotated
factor matrix should clear this up.
Before rotation
The relationships in the unrotated factor matrix
are somewhat clear. The third factor is
associated with Long distance last month. The
second corresponds most strongly to Equipment
last month, Internet, and Electronic billing. The
first factor is associated with Toll free last
month, Wireless last month, Voice mail, Paging
service, Caller ID, Call waiting, Call
forwarding, and 3-way calling.
58Understanding the Output
After rotation
The first rotated factor is most highly
correlated with Toll free last month, Caller ID,
Call waiting, Call forwarding, and 3-way calling.
These variables are not particularly correlated
with the other two factors. The second factor is
most highly correlated with Equipment last month,
Internet, and Electronic billing. The third
factor is largely unaffected by the rotation.
59Understanding the Output
Thus, there are three major groupings of
services, as defined by the services that are
most highly correlated with the three factors.
Given these groupings, you can make the following
observations about the remaining services:
Because of their moderately large correlations
with both the first and second factors, Wireless
last month, Voice mail, and Paging service bridge
the "Extras" and "Tech" groups. Calling card last
month is moderately correlated with the first and
third factors, thus it bridges the "Extras" and
"Long Distance" groups. Multiple lines is
moderately correlated with the second and third
factors, thus it bridges the "Tech" and "Long
Distance" groups. This suggests avenues for
cross-selling. For example, customers who
subscribe to extra services may be more
predisposed to accepting special offers on
wireless services than Internet services.
60Summary What Was Learned
- Using a principal axis factoring extraction, you
have uncovered three latent factors that describe
relationships between your variables. These
factors suggest various patterns of service
usage, which you can use to more efficiently
increase cross-sales.
61Using Principal Components
- Principal components can aid in clustering.
- What are principal components?
- Principal components analysis is a statistical
technique that creates new variables that are
linear functions of the old variables. The main
goal of principal components is to reduce the
number of variables needed for analysis.
62Principal Components Analysis (PCA)
- What it is and when it should be used.
63Introduction to PCA
- What does principal components analysis do?
- It takes a set of correlated variables and creates a
smaller set of uncorrelated variables.
- These newly created variables are called
principal components.
- There are two main objectives for using PCA
- Reduce the dimensionality of the data.
- In simple English: turn p variables into fewer
than p variables.
- While reducing the number of variables, we attempt
to keep as much of the information in the original
variables as possible.
- Thus we try to reduce the number of variables
without loss of information.
- Identify new, meaningful underlying variables.
- This is often not possible.
- The principal components created are linear
combinations of the original variables and often
don't lend themselves to any meaning beyond that.
- There are several reasons why, and situations
where, PCA is useful.
64Introduction to PCA
- There are several reasons why PCA is useful.
- PCA is helpful in discovering whether abnormalities
exist in a multivariate dataset.
- Clustering (which will be covered later)
- PCA is helpful when it is desirable to classify
units into groups with similar attributes.
- For example: in marketing you may want to
classify your customers into groups (or clusters)
with similar attributes for marketing purposes.
- It can also be helpful for verifying the clusters
created when clustering.
- Discriminant analysis
- In some cases there may be more response
variables than independent variables. It is not
possible to use discriminant analysis in this
case.
- Principal components can help reduce the number
of response variables to a number less than that
of the independent variables.
- Regression
- It can help address the issue of multicollinearity
in the independent variables.
65Introduction to PCA
- Formation of principal components
- They are uncorrelated
- The 1st principal component accounts for as much
of the variability in the data as possible. - The 2nd principal component accounts for as much
of the remaining variability as possible. - The 3rd
- Etc.
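The three properties listed above, uncorrelated components ordered by how much variability each accounts for, can be verified directly. A minimal Python sketch with arbitrary simulated data:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((500, 4)) @ rng.standard_normal((4, 4))  # correlated data
Xc = X - X.mean(axis=0)                      # center the columns

# Principal components come from the eigendecomposition of the covariance matrix.
cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]            # largest variance first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

scores = Xc @ eigvecs                        # the principal component scores

# 1) Component variances decrease: each component accounts for as much
#    of the remaining variability as possible.
variances = scores.var(axis=0, ddof=1)
print(np.all(np.diff(variances) <= 0))       # True

# 2) The components are uncorrelated.
C = np.corrcoef(scores, rowvar=False)
print(np.allclose(C, np.eye(4), atol=1e-8))  # True
```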
66Principal Components and Least Squares
- Think of the least squares model
- Eigenvector (mathematics): A vector which, when
acted on by a particular linear transformation,
produces a scalar multiple of the original
vector. The scalar in question is called
the eigenvalue corresponding to this eigenvector.
- www.dictionary.com
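The defining property in that dictionary entry, A v = λ v, is easy to see numerically. A tiny Python sketch with an arbitrary symmetric matrix standing in for a covariance matrix:

```python
import numpy as np

# A small symmetric matrix (a stand-in for a covariance matrix).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(A)  # eigh returns eigenvalues in ascending order
v = eigvecs[:, 1]                     # eigenvector for the largest eigenvalue
lam = eigvals[1]

# The defining property: A acting on v is just v rescaled by its eigenvalue.
print(np.allclose(A @ v, lam * v))  # True
print(round(lam, 1))                # 3.0 (the eigenvalues of A are 1 and 3)
```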
67Calculation of the PCA
- There are two options
- Correlation matrix.
- Covariance matrix.
- Using the covariance matrix will cause variables
with large variances to be more strongly
associated with components with large eigenvalues,
and the opposite is true of variables with small
variances.
- For this reason you should use the
correlation matrix unless the variables are
comparable or have been standardized.
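The scale effect described above can be demonstrated with simulated data. In this Python sketch, one variable is given an artificially large variance (an illustrative assumption):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((1000, 3))
X[:, 0] *= 100.0  # give the first variable a much larger variance

# PCA on the covariance matrix: the high-variance variable dominates
# the component with the largest eigenvalue.
_, vec_cov = np.linalg.eigh(np.cov(X, rowvar=False))
first_pc_cov = vec_cov[:, -1]           # eigenvector of the largest eigenvalue
print(np.argmax(np.abs(first_pc_cov)))  # 0: loads almost entirely on X[:, 0]

# PCA on the correlation matrix is equivalent to standardizing first,
# so no variable dominates purely through its scale; the eigenvalues
# then sum to p (here, 3).
vals_corr, _ = np.linalg.eigh(np.corrcoef(X, rowvar=False))
print(round(float(vals_corr.sum()), 1))  # 3.0
```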
68Limitations of Principal Components
- PCA converts a set of correlated variables into a
smaller set of uncorrelated variables.
- If the variables are already uncorrelated, then
PCA has nothing to add.
- Often it is difficult or impossible to explain a
principal component. That is, often principal
components do not lend themselves to any meaning.
69SAS Example of PCA
- We will analyze data on crime.
- CRIME RATES PER 100,000 POPULATION BY STATE.
- The variables are
- MURDER
- RAPE
- ROBBERY
- ASSAULT
- BURGLARY
- LARCENY
- AUTO
- SAS CODE
- PROC PRINCOMP DATA=CRIME OUT=CRIMCOMP;
- RUN;
SAS command for PCA
The dataset is CRIME and the results will be saved
to CRIMCOMP
70SAS Output Of Crime Example
71More SAS Output Of Crime Example
The first two principal components capture
76.48% of the variation.
If you include 6 of the 7 principal components,
you capture 98.23% of the variability. The 7th
component only captures 1.77%.
The proportion of variability explained by each
principal component individually. This value
equals the Eigenvalue/(sum of the Eigenvalues).
72More SAS Output Of Crime Example
Prin1 has all positive values. This variable can
be used as a proxy for the overall crime rate.
Prin2 has positive and negative values. Murder,
Rape, and Assault are all negative (violent
crimes). Robbery, Burglary, Larceny, and Auto are
all positive (property crimes). This variable can
be used for an understanding of property vs.
violent crime.
73CRIME RATES PER 100,000 POPULATION BY STATE.
STATES LISTED IN ORDER OF OVERALL CRIME
RATE AS DETERMINED BY THE FIRST PRINCIPAL
COMPONENT. Lowest 10 States and Then the Top 10
States
74CRIME RATES PER 100,000 POPULATION BY
STATE. STATES LISTED IN ORDER OF PROPERTY VS.
VIOLENT CRIME AS DETERMINED BY THE SECOND
PRINCIPAL COMPONENT. Lowest 10 States and Then
the Top 10 States
75Correlation From SAS First the Descriptive
Statistics (A part of the output from
Correlation)
76Correlation Matrix
77Correlation Matrix Just the Variables
Note that there is correlation among the crime
rates.
78Correlation Matrix Just the Principal Components
Note that there is no correlation among the
principal components.
79Correlation Matrix Just the Principal Components
Note the high correlations with the first few
principal components; the correlations decrease
toward the last principal component.
80What If We Told SAS to Produce Only 2 Principal
Components?
The 2 principal components produced when SAS is
asked to produce only 2 are exactly the same as
the first 2 when it produced all 7.