Transcript and Presenter's Notes

Title: Factor Analysis and Principal Components


1
Factor Analysis and Principal Components
  • Factor analysis with principal components is
    presented here as a subset of factor analysis
    techniques, which it is.

2
Principal Components (PC)
  • Principal components is about explaining the
    variance-covariance structure, Σ, of a set of
    variables through a few linear combinations of
    these variables (the defining equations appear
    below).
  • In general, PC is used for either
  • Data reduction, or
  • Interpretation

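In symbols (a standard statement of the construction,
added here for reference): if the covariance matrix Σ has
eigenvalue-eigenvector pairs (λ_1, e_1), ..., (λ_p, e_p)
with λ_1 ≥ ... ≥ λ_p ≥ 0, then the i-th principal
component is the linear combination

    Y_i = e_i^{\prime} X, \qquad
    \operatorname{Var}(Y_i) = \lambda_i, \qquad
    \operatorname{Cov}(Y_i, Y_k) = 0 \ (i \neq k).
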
3
So the original data of n measurements on p
variables can be reduced to a data set of
n measurements on k principal components. PC
tends to be a means to an end, not the end
itself. That is, PC is often not the final
step. The PCs may be used for multiple
regression, cluster analysis, etc.
4
(No Transcript)
5
Find the principal components and the proportion
of the total population variance explained by
each when the covariance matrix is as given on
the slide (the matrix did not transcribe).
To solve this you will have to go through your
notes, but you can do this even though I didn't
give you the formula.
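The general recipe for a problem of this form (standard
theory, stated here because the matrix itself did not
transcribe): find the eigenvalues from the characteristic
equation, take the normalized eigenvectors as the
component coefficients, and report each component's share
of the total variance:

    \det(\Sigma - \lambda I) = 0, \qquad
    Y_i = e_i^{\prime} X, \qquad
    \text{proportion for } Y_i =
    \frac{\lambda_i}{\lambda_1 + \cdots + \lambda_p}
    = \frac{\lambda_i}{\operatorname{tr}(\Sigma)}.
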
6
(No Transcript)
7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
(No Transcript)
12
(No Transcript)
13
(No Transcript)
14
(No Transcript)
15
(No Transcript)
16
(No Transcript)
17
Factor Analysis The main purpose of factor
analysis is to try to describe the covariance
relationships among many variables in terms of a
few underlying, but unobservable, random
quantities called factors. The Orthogonal
Factor Model The observable random vector X,
with p components, has mean μ and covariance
matrix Σ. The factor model proposes that X is
linearly dependent upon a few unobservable
random variables, F1, F2, ..., Fm, called common
factors, and p additional sources of variation
ε1, ε2, ..., εp, called errors, or specific
factors.
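In matrix form, the standard orthogonal factor model is

    X - \mu = L F + \varepsilon,

where L is the p × m matrix of factor loadings, with the
assumptions E(F) = 0, Cov(F) = I, E(ε) = 0, and
Cov(ε) = Ψ (a diagonal matrix), F and ε independent.
These assumptions imply the covariance structure

    \Sigma = L L^{\prime} + \Psi.
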
18
(No Transcript)
19
(No Transcript)
20
(No Transcript)
21
(No Transcript)
22
(No Transcript)
23
(No Transcript)
24
(No Transcript)
25
(No Transcript)
26
Maximum Likelihood Method for Factor
Analysis If the common factors F and the
errors ε can be assumed to be normally
distributed, then ML estimates of the factor
loadings and specific variances may be obtained.
27
(No Transcript)
28
Factor Rotation All factor loadings
obtained from the initial loadings by an
orthogonal transformation have the same ability
to reproduce the covariance (or correlation)
matrix.
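This is immediate from the model: if T is any orthogonal
matrix (T T' = I), then the rotated loadings L* = L T
reproduce Σ exactly as well as L does:

    L^{*} L^{*\prime} + \Psi
    = L T T^{\prime} L^{\prime} + \Psi
    = L L^{\prime} + \Psi = \Sigma.
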
29
(No Transcript)
30
A Reference
  • The following 13 slides come from
  • Multivariate Data Analysis Using SPSS
  • By John Zhang
  • ARL, IUP

31
Factor Analysis-1
  • The main goal of factor analysis is data
    reduction. A typical use of factor analysis is in
    survey research, where a researcher wishes to
    represent a number of questions with a smaller
    number of factors
  • Two questions in factor analysis
  • How many factors are there, and what do they
    represent (interpretation)?
  • Two technical aids
  • Eigenvalues
  • Percentage of variance accounted for

32
Factor Analysis-2
  • Two types of factor analysis
  • Exploratory (introduced here)
  • Confirmatory (SPSS AMOS)
  • Theoretical basis
  • Correlations among variables are explained by
    underlying factors
  • An example of a mathematical 1-factor model for
    two variables
  • V1 = L1 F1 + E1
  • V2 = L2 F1 + E2

33
Factor Analysis-3
  • Each variable is composed of a common factor (F1)
    multiplied by a loading coefficient (L1, L2: the
    lambdas or factor loadings) plus a random
    component
  • V1 and V2 correlate because of the common factor,
    and their correlation should relate to the factor
    loadings; thus, the factor loadings can be
    estimated from the correlations (a small
    simulation illustrating this follows below)
  • A set of correlations can derive different factor
    loadings (i.e., the solutions are not unique)
  • One should pick the simplest solution

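A minimal SAS sketch of that correlation-loading link,
under assumed loadings L1 = 0.8 and L2 = 0.6 (the dataset
and variable names are illustrative, not from the deck).
With a unit-variance factor and errors scaled so that
Var(V1) = Var(V2) = 1, the model implies
Corr(V1, V2) = L1 x L2 = 0.48, which PROC CORR should
approximately reproduce:

  * Simulate the 1-factor model V1 = L1*F1 + E1, V2 = L2*F1 + E2;
  data sim;
    call streaminit(12345);      * fixed seed for reproducibility;
    do i = 1 to 100000;
      F1 = rand('normal');       * common factor, variance 1;
      V1 = 0.8*F1 + sqrt(1 - 0.8**2)*rand('normal');  * Var(V1) = 1;
      V2 = 0.6*F1 + sqrt(1 - 0.6**2)*rand('normal');  * Var(V2) = 1;
      output;
    end;
  run;

  * The sample correlation of V1 and V2 should be near 0.48;
  proc corr data=sim;
    var V1 V2;
  run;
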
34
Factor Analysis-4
That is, the findings should not differ by
methodology of analysis nor by sample
  • A factor solution needs to be confirmed
  • By a different factor method
  • By a different sample
  • More on terminology
  • Factor loading: interpreted as the Pearson
    correlation between the variable and the factor
  • Communality: the proportion of variability for a
    given variable that is explained by the factors
  • Extraction: the process by which the factors are
    determined from a large set of variables

35
Factor Analysis-5 (Principal components)
  • Principal component: one of the extraction
    methods
  • A principal component is a linear combination of
    observed variables that is independent
    (orthogonal) of the other components
  • The first component accounts for the largest
    amount of variance in the input data; the second
    component accounts for the largest amount of the
    remaining variance
  • Components being orthogonal means they are
    uncorrelated

36
Factor Analysis-6 (Principal components)
  • Possible application of principal components
  • E.g., in survey research it is common to have
    many questions addressing one issue (e.g.,
    customer service). It is likely that these
    questions are highly correlated. It is
    problematic to use these variables in some
    statistical procedures (e.g., regression). One
    can instead use factor scores, computed from the
    factor loadings on each orthogonal component

37
Factor Analysis-7 (Principal components)
  • Principal components vs. other extraction methods
  • Principal components focuses on accounting for
    the maximum amount of variance (the diagonal of a
    correlation matrix)
  • Other extraction methods (e.g., principal axis
    factoring) focus more on accounting for the
    correlations between variables (the off-diagonal
    correlations)
  • A principal component can be defined as a unique
    combination of variables, but the other factor
    methods cannot
  • Principal components are used for data reduction
    but are more difficult to interpret

38
Factor Analysis-8
  • Number of factors
  • Eigenvalues are often used to determine how many
    factors to take
  • Take as many factors as there are eigenvalues
    greater than 1
  • An eigenvalue represents the amount of
    standardized variance in the variables accounted
    for by a factor
  • The amount of standardized variance in a variable
    is 1
  • The sum of the retained eigenvalues, divided by
    the number of variables, gives the proportion of
    variance accounted for (see the expression below)

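In symbols, for p standardized variables whose
correlation matrix has eigenvalues λ_1 ≥ ... ≥ λ_p
(which sum to p):

    \text{proportion accounted for by the first } k
    \text{ factors} =
    \frac{\lambda_1 + \cdots + \lambda_k}{p}.
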
39
Factor Analysis-9
  • Rotation
  • Objective: to facilitate interpretation
  • Orthogonal rotation: done when data reduction is
    the objective and factors need to be orthogonal
  • Varimax: attempts to simplify interpretation by
    maximizing the variances of the variable loadings
    on each factor
  • Quartimax: simplifies the solution by finding a
    rotation that produces high and low loadings
    across factors for each variable
  • Oblique rotation: used when there is reason to
    allow factors to be correlated
  • Oblimin and Promax (Promax runs fast)

40
Factor Analysis-10
  • Factor scores: if you are satisfied with a factor
    solution
  • You can request that a new set of variables be
    created that represents the scores of each
    observation on the factors (difficult to
    interpret)
  • You can use the lambda coefficients to judge
    which variables are highly related to the factor,
    then compute the sum or the mean of those
    variables for further analysis (easy to
    interpret)

41
Factor Analysis-11
  • Sample size: the sample size should be about 10
    to 15 times the number of variables (as in other
    multivariate procedures)
  • Number of methods: there are 8 factoring methods,
    including principal components
  • Principal axis: accounts for the correlations
    between the variables
  • Unweighted least-squares: minimizes the residuals
    between the observed and the reproduced
    correlation matrix

42
Factor Analysis-12
  • Generalized least-squares: similar to unweighted
    least-squares but gives more weight to the
    variables with stronger correlations
  • Maximum likelihood: generates the solution that
    is the most likely to have produced the
    correlation matrix
  • Alpha factoring: considers the variables as a
    sample, not using factor loadings
  • Image factoring: decomposes the variables into a
    common part and a unique part, then works with
    the common part

43
Factor Analysis-13
  • Recommendations
  • Principal components and principal axis are the
    most commonly used methods
  • When there is multicollinearity, use principal
    components
  • Rotations are often done. Try Varimax.

44
Reference
  • Factor Analysis from SPSS
  • Much of the wording comes from the SPSS help and
    tutorial.

45
Factor Analysis
  • Factor Analysis is primarily used for data
    reduction or structure detection.
  • The purpose of data reduction is to remove
    redundant (highly correlated) variables from the
    data file, perhaps replacing the entire data file
    with a smaller number of uncorrelated variables.
  • The purpose of structure detection is to examine
    the underlying (or latent) relationships between
    the variables.

46
Factor Analysis
  • The Factor Analysis procedure has several
    extraction methods for constructing a solution.
  • For Data Reduction. The principal components
    method of extraction begins by finding a linear
    combination of variables (a component) that
    accounts for as much variation in the original
    variables as possible. It then finds another
    component that accounts for as much of the
    remaining variation as possible and is
    uncorrelated with the previous component,
    continuing in this way until there are as many
    components as original variables. Usually, a few
    components will account for most of the
    variation, and these components can be used to
    replace the original variables. This method is
    most often used to reduce the number of variables
    in the data file.
  • For Structure Detection. Other Factor Analysis
    extraction methods go one step further by adding
    the assumption that some of the variability in
    the data cannot be explained by the components
    (usually called factors in other extraction
    methods). As a result, the total variance
    explained by the solution is smaller; however,
    the addition of this structure to the factor
    model makes these methods ideal for examining
    relationships between the variables.
  • With any extraction method, the two questions
    that a good solution should try to answer are
    "How many components (factors) are needed to
    represent the variables?" and "What do these
    components represent?"

47
Factor Analysis Data Reduction
  • An industry analyst would like to predict
    automobile sales from a set of predictors.
    However, many of the predictors are correlated,
    and the analyst fears that this might adversely
    affect her results.
  • This information is contained in the file
    car_sales.sav . Use Factor Analysis with
    principal components extraction to focus the
    analysis on a manageable subset of the
    predictors.

48
Factor Analysis Structure Detection
  • A telecommunications provider wants to better
    understand service usage patterns in its customer
    database. If services can be clustered by usage,
    the company can offer more attractive packages to
    its customers.
  • A random sample from the customer database is
    contained in telco.sav. Use Factor Analysis to
    determine the underlying structure in service
    usage.
  • Use Principal Axis Factoring.

49
Example of Factor Analysis: Structure Detection
A telecommunications provider wants to better
understand service usage patterns in its customer
database. Selecting the service offerings.
50
Example of Factor Analysis: Descriptives
Click Descriptives. Recommend checking Initial
Solution (the default). In addition, check
Anti-image and KMO and Bartlett's test of
sphericity.
51
Example of Factor Analysis: Extraction
Click Extraction. Select Method: Principal axis
factoring. Recommend keeping the defaults but
also check Scree plot.
52
Example of Factor Analysis: Rotation
Click Rotation. Select Varimax and Loading
plot(s).
53
Understanding the Output
The Kaiser-Meyer-Olkin Measure of Sampling
Adequacy is a statistic that indicates the
proportion of variance in your variables that
might be caused by underlying factors. Factor
analysis is probably not usable if the KMO
statistic is less than 0.5.
Bartlett's test of sphericity tests the
hypothesis that your correlation matrix is an
identity matrix, which would indicate that your
variables are unrelated and therefore unsuitable
for structure detection. If Sig. is less than
0.05, then factor analysis may be helpful.
54
Understanding the Output
Extraction communalities are estimates of the
variance in each variable accounted for by the
factors in the factor solution. Small values
indicate variables that do not fit well with the
factor solution, and should possibly be dropped
from the analysis. The lower values of Multiple
lines and Calling card show that they don't fit
as well as the others.
55
Understanding the Output
Before rotation
Only three factors in the initial solution have
eigenvalues greater than 1. Together, they
account for almost 65% of the variability in the
original variables. This suggests that three
latent influences are associated with service
usage, but there remains room for a lot of
unexplained variation.
56
Understanding the Output
After rotation
Approximately 56% of the variation is now
explained, about a 10-percentage-point loss in
explained variation.
57
Understanding the Output
In general, there are a lot of services that have
correlations greater than 0.2 with multiple
factors, which muddies the picture. The rotated
factor matrix should clear this up.
Before rotation
The relationships in the unrotated factor matrix
are somewhat clear. The third factor is
associated with Long distance last month. The
second corresponds most strongly to Equipment
last month, Internet, and Electronic billing. The
first factor is associated with Toll free last
month, Wireless last month, Voice mail, Paging
service, Caller ID, Call waiting, Call
forwarding, and 3-way calling.
58
Understanding the Output
After rotation
The first rotated factor is most highly
correlated with Toll free last month, Caller ID,
Call waiting, Call forwarding, and 3-way calling.
These variables are not particularly correlated
with the other two factors. The second factor is
most highly correlated with Equipment last month,
Internet, and Electronic billing. The third
factor is largely unaffected by the rotation.
59
Understanding the Output
Thus, there are three major groupings of
services, as defined by the services that are
most highly correlated with the three factors.
Given these groupings, you can make the following
observations about the remaining services:
Because of their moderately large correlations
with both the first and second factors, Wireless
last month, Voice mail, and Paging service bridge
the "Extras" and "Tech" groups. Calling card last
month is moderately correlated with the first and
third factors, thus it bridges the "Extras" and
"Long Distance" groups. Multiple lines is
moderately correlated with the second and third
factors, thus it bridges the "Tech" and "Long
Distance" groups. This suggests avenues for
cross-selling. For example, customers who
subscribe to extra services may be more
predisposed to accepting special offers on
wireless services than Internet services.
60
Summary What Was Learned
  • Using a principal axis factors extraction, you
    have uncovered three latent factors that describe
    relationships between your variables. These
    factors suggest various patterns of service
    usage, which you can use to more efficiently
    increase cross-sales.

61
Using Principal Components
  • Principal components can aid in clustering.
  • What is principal components?
  • Principal components is a statistical technique
    that creates new variables that are linear
    functions of the old variables. The main goal of
    principal components is to reduce the number of
    variables needed in an analysis.

62
Principal Components Analysis (PCA)
  • What it is and when it should be used.

63
Introduction to PCA
  • What does principal components analysis do?
  • Takes a set of correlated variables and creates a
    smaller set of uncorrelated variables.
  • These newly created variables are called
    principal components.
  • There are two main objectives for using PCA
  • Reduce the dimensionality of the data.
  • In simple English: turn p variables into fewer
    than p variables.
  • While reducing the number of variables we attempt
    to keep as much information from the original
    variables as possible.
  • Thus we try to reduce the number of variables
    without loss of information.
  • Identify new meaningful underlying variables.
  • This is often not possible.
  • The principal components created are linear
    combinations of the original variables and often
    don't lend themselves to any meaning beyond that.
  • There are several reasons why and situations
    where PCA is useful.

64
Introduction to PCA
  • There are several reasons why PCA is useful.
  • PCA is helpful in discovering whether
    abnormalities exist in a multivariate dataset.
  • Clustering (which will be covered later)
  • PCA is helpful when it is desirable to classify
    units into groups with similar attributes.
  • For example: in marketing you may want to
    classify your customers into groups (or clusters)
    with similar attributes for marketing purposes.
  • It can also be helpful for verifying the clusters
    created when clustering.
  • Discriminant analysis
  • In some cases there may be more response
    variables than independent variables. It is not
    possible to use discriminant analysis in this
    case.
  • Principal components can help reduce the number
    of response variables to a number less than that
    of the independent variables.
  • Regression
  • It can help address the issue of multicollinearity
    in the independent variables (see the sketch
    after this list).

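A minimal SAS sketch of that regression use, with
hypothetical names (a dataset PREDICTORS containing
correlated predictors X1-X5 and a response Y is assumed,
not taken from the deck): replace the predictors with
their first few principal components, then regress on
those uncorrelated scores.

  * Hypothetical principal-components-regression sketch;
  proc princomp data=predictors out=pcscores n=2;
    var x1-x5;                 * correlated predictors;
  run;

  proc reg data=pcscores;
    model y = prin1 prin2;     * PCs are uncorrelated by construction;
  run;
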
65
Introduction to PCA
  • Formation of principal components
  • They are uncorrelated
  • The 1st principal component accounts for as much
    of the variability in the data as possible.
  • The 2nd principal component accounts for as much
    of the remaining variability as possible.
  • The 3rd
  • Etc.

66
Principal Components and Least Squares
  • Think of the Least Squares model
  • Eigenvector <mathematics> A vector which, when
    acted on by a particular linear transformation,
    produces a scalar multiple of the original
    vector. The scalar in question is called the
    eigenvalue corresponding to this eigenvector.
  • www.dictionary.com

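In symbols (restating the definition just quoted), for a
square matrix A, an eigenvector v ≠ 0 and its eigenvalue
λ satisfy

    A v = \lambda v.

In PCA, A is the covariance (or correlation) matrix, and
the normalized eigenvectors supply the coefficients of
the principal components.
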
67
Calculation of the PCA
  • There are two options
  • Correlation matrix.
  • Covariance matrix.
  • Using the covariance matrix will cause variables
    with large variances to be more strongly
    associated with components with large
    eigenvalues, and the opposite is true of
    variables with small variances.
  • For this reason you should use the correlation
    matrix unless the variables are comparable or
    have been standardized (see the SAS options
    below).

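In SAS this choice is a single option on PROC PRINCOMP
(the CRIME dataset name is borrowed from the example
later in the deck): the procedure works from the
correlation matrix by default, and the COV option
switches it to the covariance matrix.

  * Default: components extracted from the correlation matrix;
  proc princomp data=crime out=crimcomp;
  run;

  * COV option: components extracted from the covariance matrix;
  proc princomp data=crime cov out=crimcomp_cov;
  run;
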
68
Limitations to Principal Components
  • PCA converts a set of correlated variables into a
    smaller set of uncorrelated variables.
  • If the variables are already uncorrelated, then
    PCA has nothing to add.
  • Often it is difficult or impossible to interpret
    a principal component. That is, principal
    components often do not lend themselves to any
    meaning.

69
SAS Example of PCA
  • We will analyze data on crime.
  • CRIME RATES PER 100,000 POPULATION BY STATE.
  • The variables are
  • MURDER
  • RAPE
  • ROBBERY
  • ASSAULT
  • BURGLARY
  • LARCENY
  • AUTO
  • SAS CODE
  • PROC PRINCOMP DATA=CRIME OUT=CRIMCOMP;
  • RUN;

SAS command for PCA.
The dataset is CRIME and the results will be
saved to CRIMCOMP.
70
SAS Output Of Crime Example
71
More SAS Output Of Crime Example
The first two principal components capture
76.48% of the variation.
If you include 6 of the 7 principal components,
you capture 98.23% of the variability. The 7th
component captures only 1.77%.
The proportion of variability explained by each
principal component individually equals that
component's eigenvalue divided by the sum of the
eigenvalues.
72
More SAS Output Of Crime Example
Prin1 has all positive values. This variable can
be used as a proxy for the overall crime rate.
Prin2 has positive and negative values. Murder,
Rape, and Assault are all negative (violent
crimes). Robbery, Burglary, Larceny, and Auto are
all positive (property crimes). This variable can
be used for an understanding of property vs.
violent crime.
73
CRIME RATES PER 100,000 POPULATION BY STATE.
STATES LISTED IN ORDER OF OVERALL CRIME RATE AS
DETERMINED BY THE FIRST PRINCIPAL COMPONENT.
Lowest 10 States and Then the Top 10 States
74
CRIME RATES PER 100,000 POPULATION BY STATE.
STATES LISTED IN ORDER OF PROPERTY VS. VIOLENT
CRIME AS DETERMINED BY THE SECOND PRINCIPAL
COMPONENT. Lowest 10 States and Then the Top 10
States
75
Correlation From SAS: First the Descriptive
Statistics (a part of the output from
Correlation)
76
Correlation Matrix
77
Correlation Matrix Just the Variables
Note that there is correlation among the crime
rates.
78
Correlation Matrix Just the Principal Components
Note that there is no correlation among the
principal components.
79
Correlation Matrix Just the Principal Components
Note the high correlations with the first few
principal components; the correlations decrease
as you move toward the last principal component.
80
What If We Told SAS to Produce Only 2 Principal
Components?
The 2 principal components produced when SAS is
asked to produce only 2 are exactly the same as
when it produced all of them (a sketch of the
request follows below).
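One way that request might look in SAS (the N= option of
PROC PRINCOMP limits the number of components computed;
dataset names follow the earlier example):

  * Keep only the first 2 principal components;
  proc princomp data=crime out=crimcomp2 n=2;
  run;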