Principal Components Analysis

Transcript and Presenter's Notes
1
VI. Principal Components Analysis
A. The Basic Principle
We wish to explain/summarize the underlying variance-covariance structure of a large set of variables through a few linear combinations of these variables. The objectives of principal components analysis are
- data reduction
- interpretation
The results of principal components analysis are often used as inputs to
- regression analysis
- cluster analysis
2
B. Population Principal Components
Suppose we have a population measured on p random variables X1, ..., Xp. Note that these random variables represent the p axes of the Cartesian coordinate system in which the population resides. Our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability.

[Figure: a scatter of the population in the (X1, X2) plane, with a new pair of axes rotated into the directions of greatest variability]

This is accomplished by rotating the axes.
3
Consider our random vector X' = [X1, X2, ..., Xp] with covariance matrix Σ and eigenvalues λ1 ≥ λ2 ≥ ... ≥ λp. We can construct p linear combinations

Y1 = a1'X = a11 X1 + a12 X2 + ... + a1p Xp
Y2 = a2'X = a21 X1 + a22 X2 + ... + a2p Xp
...
Yp = ap'X = ap1 X1 + ap2 X2 + ... + app Xp
4
It is easy to show that

E(Yi) = ai'μ,  Var(Yi) = ai'Σai,  Cov(Yi, Yk) = ai'Σak

The principal components are those uncorrelated linear combinations Y1, ..., Yp whose variances are as large as possible. Thus the first principal component is the linear combination of maximum variance, i.e., we wish to solve the nonlinear optimization problem

maximize    Var(Y1) = a1'Σa1    <- source of nonlinearity
subject to  a1'a1 = 1           <- restricts to coefficient vectors of unit length
5
The second principal component is the linear combination of maximum variance that is uncorrelated with the first principal component, i.e., we wish to solve the nonlinear optimization problem

maximize    Var(Y2) = a2'Σa2
subject to  a2'a2 = 1
            Cov(a1'X, a2'X) = a1'Σa2 = 0    <- restricts covariance to zero

The third principal component is the solution to the nonlinear optimization problem

maximize    Var(Y3) = a3'Σa3
subject to  a3'a3 = 1
            a1'Σa3 = 0, a2'Σa3 = 0          <- restricts covariances to zero
6
Generally, the ith principal component is the linear combination of maximum variance that is uncorrelated with all previous principal components, i.e., we wish to solve the nonlinear optimization problem

maximize    Var(Yi) = ai'Σai
subject to  ai'ai = 1
            ak'Σai = 0 for all k < i

We can show that, for random vector X with covariance matrix Σ and eigenvalues λ1 ≥ λ2 ≥ ... ≥ λp ≥ 0, the ith principal component is given by

Yi = ei'X = ei1 X1 + ei2 X2 + ... + eip Xp,  i = 1, ..., p

where (λi, ei) is the ith eigenvalue-eigenvector pair of Σ. Then

Var(Yi) = ei'Σei = λi  and  Cov(Yi, Yk) = ei'Σek = 0 for i ≠ k

Note that the principal components are not unique if some eigenvalues are equal.
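These results are easy to check numerically. Below is a minimal NumPy sketch (illustrative only; the deck's own computations use SAS, and the matrix Sigma here is a made-up example):

import numpy as np

# A hypothetical covariance matrix (any symmetric positive definite matrix will do)
Sigma = np.array([[2.0, 3.0, 1.0],
                  [3.0, 8.0, 4.0],
                  [1.0, 4.0, 7.0]])

# eigh handles symmetric matrices; it returns eigenvalues in ascending order,
# so reverse to get lambda_1 >= lambda_2 >= ... >= lambda_p
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam, E = eigvals[order], eigvecs[:, order]

# Var(Y_i) = e_i' Sigma e_i = lambda_i for each component
for i in range(len(lam)):
    assert np.isclose(E[:, i] @ Sigma @ E[:, i], lam[i])

# Distinct components are uncorrelated: e_i' Sigma e_k = 0 for i != k
assert np.isclose(E[:, 0] @ Sigma @ E[:, 1], 0.0)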
7
We can also show for random vector X with covariance matrix Σ and eigenvalue-eigenvector pairs (λ1, e1), ..., (λp, ep) where λ1 ≥ λ2 ≥ ... ≥ λp,

σ11 + σ22 + ... + σpp = Σi Var(Xi) = λ1 + λ2 + ... + λp = Σi Var(Yi)

i.e., the total population variance equals the sum of the eigenvalues of Σ, so we can assess how well a subset of the principal components Yi summarizes the original random variables Xi; one common method of doing so is the

proportion of total population variance due to the kth principal component = λk / (λ1 + λ2 + ... + λp),  k = 1, ..., p

If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without loss of much information!
8
We can also easily find the correlations between the original random variables Xk and the principal components Yi:

ρ(Yi, Xk) = eik √λi / √σkk,  i, k = 1, ..., p

These values are often used in interpreting the principal components Yi.
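A short NumPy sketch of this formula (again illustrative; pc_correlations is a hypothetical helper name, and Sigma is the made-up matrix from the sketch above):

def pc_correlations(Sigma):
    """Correlations rho(Y_i, X_k) = e_ik * sqrt(lambda_i) / sqrt(sigma_kk)."""
    eigvals, eigvecs = np.linalg.eigh(Sigma)
    order = np.argsort(eigvals)[::-1]
    lam, E = eigvals[order], eigvecs[:, order]
    sd = np.sqrt(np.diag(Sigma))       # sqrt(sigma_kk) for each variable
    # Row k, column i holds rho(Y_i, X_k)
    return E * np.sqrt(lam) / sd[:, None]

print(pc_correlations(Sigma))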
9
Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3:

X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

Find the three population principal components Y1, Y2, and Y3.
10
First we need the covariance matrix Σ (computed with the n - 1 divisor, as SAS does):

Σ = [ 2.0000  3.3333  1.3333 ]
    [ 3.3333  8.0000  4.6667 ]
    [ 1.3333  4.6667  7.0000 ]

and the corresponding eigenvalue-eigenvector pairs

λ1 = 13.2194,  e1' = [0.2910, 0.7342, 0.6133]
λ2 =  3.3793,  e2' = [0.4150, 0.4807, -0.7724]
λ3 =  0.4013,  e3' = [0.8620, -0.4794, 0.1648]
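These values can be reproduced with a short NumPy sketch (illustrative only; they match the SAS PROC PRINCOMP output shown later in the deck):

import numpy as np

# The four observations on X1, X2, X3
X = np.array([[1.0,  6.0,  9.0],
              [4.0, 12.0, 10.0],
              [3.0, 12.0, 15.0],
              [4.0, 10.0, 12.0]])

S = np.cov(X, rowvar=False, ddof=1)   # reproduces the covariance matrix above
eigvals, eigvecs = np.linalg.eigh(S)
order = np.argsort(eigvals)[::-1]
lam, E = eigvals[order], eigvecs[:, order]
print(lam)   # approx. [13.2194, 3.3793, 0.4013]
print(E)     # columns e1, e2, e3 (signs may be flipped)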
11
so the principal components are

Y1 = 0.2910 X1 + 0.7342 X2 + 0.6133 X3
Y2 = 0.4150 X1 + 0.4807 X2 - 0.7724 X3
Y3 = 0.8620 X1 - 0.4794 X2 + 0.1648 X3

Note that

Var(Y1) + Var(Y2) + Var(Y3) = 13.2194 + 3.3793 + 0.4013 = 17.0 = σ11 + σ22 + σ33
12
and the proportion of total population variance due to each principal component is

λ1/(λ1 + λ2 + λ3) = 13.2194/17 = 0.7776
λ2/(λ1 + λ2 + λ3) =  3.3793/17 = 0.1988
λ3/(λ1 + λ2 + λ3) =  0.4013/17 = 0.0236

Note that the third principal component is relatively irrelevant!
13
Next we obtain the correlations between the original random variables Xk and the principal components Yi using ρ(Yi, Xk) = eik √λi / √σkk; for example,

ρ(Y1, X1) = e11 √λ1 / √σ11 = 0.2910 √13.2194 / √2.0 ≈ 0.748
14
15
We can display these results in a correlation matrix

         Y1       Y2       Y3
X1    0.748    0.540    0.386
X2    0.944    0.312   -0.107
X3    0.843   -0.537    0.039

Here we can easily see that
- the first principal component (Y1) is a mixture of all three random variables (X1, X2, and X3)
- the second principal component (Y2) is a trade-off between X1 and X3
- the third principal component (Y3) is a residual of X1
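Continuing the NumPy sketch from this example (S, lam, E as defined there), the whole matrix can be produced at once; values agree with the SAS PROC CORR output shown later, up to rounding and sign:

sd = np.sqrt(np.diag(S))
corr_XY = E * np.sqrt(lam) / sd[:, None]
print(corr_XY)   # row k, column i: rho(Y_i, X_k); first column approx. [0.748, 0.944, 0.843]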
16
When the principal components are derived from an X ~ Np(μ, Σ) distributed population, the density of X is constant on the μ-centered ellipsoids

(x - μ)' Σ^(-1) (x - μ) = c²

which have axes

± c √λi ei,  i = 1, ..., p

where (λi, ei) are the eigenvalue-eigenvector pairs of Σ.
17
We can set μ = 0 w.l.o.g.; we can then write

c² = x' Σ^(-1) x = (1/λ1)(e1'x)² + (1/λ2)(e2'x)² + ... + (1/λp)(ep'x)²

where the ei'x are the principal components of x. Setting yi = ei'x and substituting into the previous expression yields

c² = y1²/λ1 + y2²/λ2 + ... + yp²/λp

which defines an ellipsoid (note that λi > 0 for all i) in a coordinate system with axes y1, ..., yp lying in the directions of e1, ..., ep, respectively.
18
The major axis lies in the direction determined by the eigenvector e1 associated with the largest eigenvalue λ1; the remaining minor axes lie in the directions determined by the other eigenvectors.
19
Example: For the principal components derived from the following population of four observations made on three random variables X1, X2, and X3

X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

plot the major and minor axes.
20
We will need the centroid μ' = [3.0, 10.0, 11.5]. The direction of the major axis is given by e1' = [0.2910, 0.7342, 0.6133], while the directions of the two minor axes are given by e2' = [0.4150, 0.4807, -0.7724] and e3' = [0.8620, -0.4794, 0.1648].
21
We first graph the centroid

[Figure: the centroid (3.0, 10.0, 11.5) plotted in the (X1, X2, X3) coordinate system]
22
then use the first eigenvector to find a second point on the first principal axis

[Figure: the centroid and the point μ + e1 in the (X1, X2, X3) coordinate system, with the Y1 axis drawn through them]

The line connecting these two points is the Y1 axis.
23
then do the same thing with the second eigenvector

[Figure: the centroid and the point μ + e2, with the Y2 axis drawn through them alongside the Y1 axis]

The line connecting these two points is the Y2 axis.
24
and do the same thing with the third eigenvector

[Figure: the centroid and the point μ + e3, with the Y3 axis drawn through them alongside the Y1 and Y2 axes]

The line connecting these two points is the Y3 axis.
25
What we have done is a rotation

[Figure: the rotated axes Y1, Y2, Y3 overlaid on the original X1, X2, X3 axes]
26
and a translation in p = 3 dimensions.

[Figure: the rotated and translated axes; the Y1, Y2, Y3 axes originate at the centroid]

Note that the rotated axes remain orthogonal!
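Orthogonality is easy to verify numerically; a quick check, continuing the NumPy sketch from the example (X, E, lam as defined there):

# The eigenvector matrix is orthonormal, so y = E'(x - xbar) is a
# rotation (possibly with a reflection) about the centroid
print(np.allclose(E.T @ E, np.eye(3)))                              # True

xbar = X.mean(axis=0)
Y = (X - xbar) @ E                                                  # component scores
print(np.allclose(np.cov(Y, rowvar=False, ddof=1), np.diag(lam)))   # True: scores are uncorrelated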
27
Note that we can also construct principal components for the standardized variables

Zi = (Xi - μi) / √σii,  i = 1, ..., p

which in matrix notation is

Z = (V^(1/2))^(-1) (X - μ)

where V^(1/2) is the diagonal standard deviation matrix. Obviously

E(Z) = 0  and  Cov(Z) = (V^(1/2))^(-1) Σ (V^(1/2))^(-1) = ρ
28
This suggests that the principal components for the standardized variables Zi may be obtained from the eigenvectors of the correlation matrix ρ! The operations are analogous to those used in conjunction with the covariance matrix.

We can show that, for random vector Z of standardized variables with covariance matrix ρ and eigenvalues λ1 ≥ λ2 ≥ ... ≥ λp ≥ 0, the ith principal component is given by

Yi = ei'Z = ei' (V^(1/2))^(-1) (X - μ),  i = 1, ..., p

where (λi, ei) is now the ith eigenvalue-eigenvector pair of ρ. Note again that the principal components are not unique if some eigenvalues are equal.
29
We can also show for random vector Z with covariance matrix ρ and eigenvalue-eigenvector pairs (λ1, e1), ..., (λp, ep) where λ1 ≥ λ2 ≥ ... ≥ λp,

Var(Z1) + Var(Z2) + ... + Var(Zp) = p = λ1 + λ2 + ... + λp

and we can again assess how well a subset of the principal components Yi summarizes the original random variables Xi by using the

proportion of total population variance due to the kth principal component = λk / p,  k = 1, ..., p

If a large proportion of the total population variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without loss of much information!
30
Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3:

X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

Find the three population principal components Y1, Y2, and Y3 for the standardized random variables Z1, Z2, and Z3.
31
We could standardize the variables X1, X2, and X3, then work with the resulting covariance matrix, but it is much easier to proceed directly with the correlation matrix ρ:

ρ = [ 1.0000  0.8333  0.3563 ]
    [ 0.8333  1.0000  0.6236 ]
    [ 0.3563  0.6236  1.0000 ]

and the corresponding eigenvalue-eigenvector pairs

λ1 = 2.2295,  e1' = [0.5811, 0.6454, 0.4958]
λ2 = 0.6621,  e2' = [-0.5626, -0.1215, 0.8177]
λ3 = 0.1084,  e3' = [0.5880, -0.7541, 0.2925]

These results differ from the covariance-based principal components!
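The same NumPy check works with the correlation matrix (continuing the earlier sketch; values should agree with the SAS output shown later, up to sign):

R = np.corrcoef(X, rowvar=False)
eigvals_R, eigvecs_R = np.linalg.eigh(R)
order_R = np.argsort(eigvals_R)[::-1]
lam_R, E_R = eigvals_R[order_R], eigvecs_R[:, order_R]
print(lam_R)   # approx. [2.2295, 0.6621, 0.1084]
print(E_R)     # compare with the eigenvector table above (signs may flip)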
32
so the principal components are

Y1 =  0.5811 Z1 + 0.6454 Z2 + 0.4958 Z3
Y2 = -0.5626 Z1 - 0.1215 Z2 + 0.8177 Z3
Y3 =  0.5880 Z1 - 0.7541 Z2 + 0.2925 Z3

Note that

Var(Y1) + Var(Y2) + Var(Y3) = 2.2295 + 0.6621 + 0.1084 = 3 = p
33
and the proportion of total population variance due to each principal component is

λ1/p = 2.2295/3 = 0.7432
λ2/p = 0.6621/3 = 0.2207
λ3/p = 0.1084/3 = 0.0361

Note that the third principal component is again relatively irrelevant!
34
Next we obtain the correlations between the original random variables Xi and the principal components Yi; for principal components of standardized variables the formula simplifies to ρ(Yi, Xk) = ρ(Yi, Zk) = eik √λi.
35
36
We can display these results in a correlation matrix

         Y1       Y2       Y3
X1    0.868   -0.458    0.194
X2    0.964   -0.099   -0.248
X3    0.740    0.665    0.096

Here we can easily see that
- the first principal component (Y1) is a mixture of all three random variables (X1, X2, and X3)
- the second principal component (Y2) is a trade-off between X1 and X3
- the third principal component (Y3) is a trade-off between X1 and X2
37
SAS code for Principal Components Analysis

OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
INPUT x1 x2 x3;
LABEL x1='Random Variable 1'
      x2='Random Variable 2'
      x3='Random Variable 3';
CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3;
VAR x1 x2 x3;
RUN;
PROC CORR DATA=pcstuff;
VAR x1 x2 x3;
WITH prin1 prin2 prin3;
RUN;
PROC FACTOR DATA=stuff SCREE;
VAR x1 x2 x3;
RUN;

Note that the SAS default is to use the correlation matrix to perform this analysis!
38
SAS output for Principal Components Analysis

The PRINCOMP Procedure

Observations  4
Variables     3

Simple Statistics
            x1            x2            x3
Mean   3.000000000   10.00000000   11.50000000
StD    1.414213562    2.82842712    2.64575131

Correlation Matrix
                           x1      x2      x3
x1  Random Variable 1  1.0000  0.8333  0.3563
x2  Random Variable 2  0.8333  1.0000  0.6236
x3  Random Variable 3  0.3563  0.6236  1.0000

Eigenvalues of the Correlation Matrix
     Eigenvalue   Difference   Proportion   Cumulative
1    2.22945702   1.56733894       0.7432       0.7432
2    0.66211808   0.55369318       0.2207       0.9639
3    0.10842490                    0.0361       1.0000

Eigenvectors
                          Prin1      Prin2      Prin3
x1  Random Variable 1  0.581128  -0.562643   0.587982
x2  Random Variable 2  0.645363  -0.121542  -0.754145
x3  Random Variable 3  0.495779   0.817717   0.292477
39
SAS output for Correlation Matrix: Original Random Variables vs. Principal Components

The CORR Procedure

3 With Variables: Prin1 Prin2 Prin3
3 Variables:      x1 x2 x3

Simple Statistics
Variable  N      Mean   Std Dev        Sum    Minimum   Maximum
Prin1     4         0   1.49314          0   -2.20299   1.11219
Prin2     4         0   0.81371          0   -0.94739   0.99579
Prin3     4         0   0.32928          0   -0.28331   0.47104
x1        4   3.00000   1.41421   12.00000    1.00000   4.00000
x2        4  10.00000   2.82843   40.00000    6.00000  12.00000
x3        4  11.50000   2.64575   46.00000    9.00000  15.00000

Pearson Correlation Coefficients, N = 4
Prob > |r| under H0: Rho = 0
              x1         x2        x3
Prin1    0.86770    0.96362   0.74027
         0.1323     0.0364    0.2597
Prin2   -0.45783   -0.09890   0.66538
         0.5422     0.9011    0.3346
Prin3    0.19361   -0.24832   0.09631
         0.8064     0.7517    0.9037
40
SAS output for Factor Analysis

PRINCIPAL COMPONENTS ANALYSIS FOR QA 610, SPRING QUARTER 2001
Using PROC FACTOR to obtain a Scree Plot for Principal Components Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components
Prior Communality Estimates: ONE

Eigenvalues of the Correlation Matrix: Total = 3, Average = 1
     Eigenvalue   Difference   Proportion   Cumulative
1    2.22945702   1.56733894       0.7432       0.7432
2    0.66211808   0.55369318       0.2207       0.9639
3    0.10842490                    0.0361       1.0000

1 factor will be retained by the MINEIGEN criterion.

Note that this is consistent with the results from PCA!
41
SAS output for Factor Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

[Scree plot of eigenvalues: 2.23, 0.66, and 0.11 plotted against component numbers 1, 2, and 3]
42
SAS output for Factor Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Factor Pattern
                         Factor1
x1  Random Variable 1    0.86770
x2  Random Variable 2    0.96362
x3  Random Variable 3    0.74027

Variance Explained by Each Factor
Factor1
2.2294570

Final Communality Estimates: Total = 2.229457
        x1          x2          x3
0.75291032  0.92855392  0.54799278

The factor pattern holds the Pearson correlation coefficients for the first principal component with the three original variables X1, X2, and X3; the variance explained is the first eigenvalue λ1.
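With one retained factor, each final communality is simply the squared Factor1 loading (e.g. 0.86770² ≈ 0.75291); a one-line check, continuing the NumPy sketch (lam_R, E_R as defined earlier):

loadings_R = E_R * np.sqrt(lam_R)   # loadings for the correlation-based components
print(loadings_R[:, 0] ** 2)        # approx. [0.7529, 0.9286, 0.5480]: the communalities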
43
SAS code for Principal Components Analysis

OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
INPUT x1 x2 x3;
LABEL x1='Random Variable 1'
      x2='Random Variable 2'
      x3='Random Variable 3';
CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff OUT=pcstuff N=3 COV;
VAR x1 x2 x3;
RUN;
PROC CORR DATA=pcstuff;
VAR x1 x2 x3;
WITH prin1 prin2 prin3;
RUN;
PROC FACTOR DATA=stuff SCREE COV;
VAR x1 x2 x3;
RUN;

Note that here we use the COV option to derive the covariance matrix based principal components!
44
SAS output for Principal Components Analysis

The PRINCOMP Procedure

Observations  4
Variables     3

Simple Statistics
            x1            x2            x3
Mean   3.000000000   10.00000000   11.50000000
StD    1.414213562    2.82842712    2.64575131

Covariance Matrix
                                x1            x2            x3
x1  Random Variable 1  2.000000000   3.333333333   1.333333333
x2  Random Variable 2  3.333333333   8.000000000   4.666666667
x3  Random Variable 3  1.333333333   4.666666667   7.000000000

Total Variance = 17

Eigenvalues of the Covariance Matrix
     Eigenvalue   Difference   Proportion   Cumulative
1    13.2193960    9.8400643       0.7776       0.7776
2     3.3793317    2.9780594       0.1988       0.9764
3     0.4012723                    0.0236       1.0000

Eigenvectors
                          Prin1      Prin2      Prin3
x1  Random Variable 1  0.291038   0.415039   0.861998
x2  Random Variable 2  0.734249   0.480716  -0.479364
x3  Random Variable 3  0.613331  -0.772434   0.164835
45
SAS output for Correlation Matrix: Original Random Variables vs. Principal Components

The CORR Procedure

3 With Variables: Prin1 Prin2 Prin3
3 Variables:      x1 x2 x3

Simple Statistics
Variable  N      Mean   Std Dev        Sum    Minimum   Maximum
Prin1     4         0   3.63585          0   -5.05240   3.61516
Prin2     4         0   1.83830          0   -1.74209   2.53512
Prin3     4         0   0.63346          0   -0.38181   0.94442
x1        4   3.00000   1.41421   12.00000    1.00000   4.00000
x2        4  10.00000   2.82843   40.00000    6.00000  12.00000
x3        4  11.50000   2.64575   46.00000    9.00000  15.00000

Pearson Correlation Coefficients, N = 4
Prob > |r| under H0: Rho = 0
              x1         x2         x3
Prin1    0.74824    0.94385    0.84285
         0.2518     0.0561     0.1571
Prin2    0.53950    0.31243   -0.53670
         0.4605     0.6876     0.4633
Prin3    0.38611   -0.10736    0.03947
         0.6139     0.8926     0.9605
46
SAS output for Factor Analysis

PRINCIPAL COMPONENTS ANALYSIS FOR QA 610, SPRING QUARTER 2001
Using PROC FACTOR to obtain a Scree Plot for Principal Components Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components
Prior Communality Estimates: ONE

Eigenvalues of the Covariance Matrix: Total = 17, Average = 5.66666667
     Eigenvalue   Difference   Proportion   Cumulative
1    13.2193960    9.8400643       0.7776       0.7776
2     3.3793317    2.9780594       0.1988       0.9764
3     0.4012723                    0.0236       1.0000

1 factor will be retained by the MINEIGEN criterion.

Note that this is consistent with the results from PCA!
47
SAS output for Factor Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

[Scree plot of eigenvalues: 13.22, 3.38, and 0.40 plotted against component numbers 1, 2, and 3]
48
SAS output for Factor Analysis

The FACTOR Procedure
Initial Factor Method: Principal Components

Factor Pattern
                         Factor1
x1  Random Variable 1    0.74824
x2  Random Variable 2    0.94385
x3  Random Variable 3    0.84285

Variance Explained by Each Factor
Factor      Weighted    Unweighted
Factor1   13.2193960    2.16112149

Final Communality Estimates and Variable Weights
Total Communality: Weighted = 13.219396, Unweighted = 2.161121
Variable   Communality       Weight
x1          0.55986257   2.00000000
x2          0.89085847   8.00000000
x3          0.71040045   7.00000000

The factor pattern again holds the Pearson correlation coefficients for the first principal component with the three original variables X1, X2, and X3; the weighted variance explained is the first eigenvalue λ1.
49
Covariance matrices with special structures yield particularly interesting principal components.
- Diagonal covariance matrices: suppose Σ is the diagonal matrix

Σ = [ σ11    0   ...    0  ]
    [  0   σ22  ...    0  ]
    [ ...                  ]
    [  0    0   ...   σpp ]

Since the eigenvector ei has a value of 1 in the ith position and 0 in all other positions, we have

Σei = σii ei

so (σii, ei) is the ith eigenvalue-eigenvector pair.
50
so the linear combination

Yi = ei'X = Xi

demonstrates that the set of principal components and the original set of (uncorrelated) random variables are the same! Note that this result is also true if we work with the correlation matrix.
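A minimal NumPy sketch of this special case (the diagonal matrix here is a made-up example):

import numpy as np

Sigma_diag = np.diag([4.0, 2.0, 1.0])   # covariance matrix of uncorrelated variables
eigvals, eigvecs = np.linalg.eigh(Sigma_diag)
order = np.argsort(eigvals)[::-1]
print(eigvals[order])                # [4. 2. 1.]: the original variances
print(np.abs(eigvecs[:, order]))     # identity matrix: Y_i = X_i (up to sign)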
51
- Constant variances and covariances: suppose Σ is the patterned matrix

Σ = [ σ²    ρσ²  ...  ρσ² ]
    [ ρσ²   σ²   ...  ρσ² ]
    [ ...                  ]
    [ ρσ²   ρσ²  ...  σ²  ]

Here the resulting correlation matrix

ρ = [ 1   ρ  ...  ρ ]
    [ ρ   1  ...  ρ ]
    [ ...            ]
    [ ρ   ρ  ...  1 ]

is also the covariance matrix of the standardized variables Z.
52
C. Using Principal Components to Summarize Sample Variation
Suppose the data x1, ..., xn represent n independent observations from a p-dimensional population with some mean vector μ and covariance matrix Σ; these data yield a sample mean vector x̄, sample covariance matrix S, and sample correlation matrix R. As in the population case, our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability.

[Figure: a scatter of the sample with new axes rotated into the directions of greatest variability]
53
Again it is easy to show that the linear combinations

yi = ai'x

have sample means ai'x̄ and sample variances ai'Sai. The principal components are those uncorrelated linear combinations y1, ..., yp whose variances are as large as possible. Thus the first principal component is the linear combination of maximum sample variance, i.e., we wish to solve the nonlinear optimization problem

maximize    a1'Sa1    <- source of nonlinearity
subject to  a1'a1 = 1 <- restricts to coefficient vectors of unit length
54
The second principal component is the linear combination of maximum sample variance that is uncorrelated with the first principal component, i.e., we wish to solve the nonlinear optimization problem

maximize    a2'Sa2
subject to  a2'a2 = 1
            a1'Sa2 = 0    <- restricts covariance to zero

The third principal component is the solution to the nonlinear optimization problem

maximize    a3'Sa3
subject to  a3'a3 = 1
            a1'Sa3 = 0, a2'Sa3 = 0    <- restricts covariances to zero
55
Generally, the ith principal component is the linear combination of maximum sample variance that is uncorrelated with all previous principal components, i.e., we wish to solve the nonlinear optimization problem

maximize    ai'Sai
subject to  ai'ai = 1
            ak'Sai = 0 for all k < i

We can show that, for a random sample with sample covariance matrix S and eigenvalues λ̂1 ≥ λ̂2 ≥ ... ≥ λ̂p ≥ 0, the ith sample principal component is given by

ŷi = êi'x = êi1 x1 + êi2 x2 + ... + êip xp,  i = 1, ..., p

with sample variance λ̂i. Note that the principal components are not unique if some eigenvalues are equal.
56
We can also show for a random sample with sample covariance matrix S and eigenvalue-eigenvector pairs (λ̂1, ê1), ..., (λ̂p, êp) where λ̂1 ≥ λ̂2 ≥ ... ≥ λ̂p,

s11 + s22 + ... + spp = λ̂1 + λ̂2 + ... + λ̂p

i.e., the total sample variance equals the sum of the eigenvalues of S, so we can assess how well a subset of the principal components yi summarizes the original random sample; one common method of doing so is the

proportion of total sample variance due to the kth principal component = λ̂k / (λ̂1 + λ̂2 + ... + λ̂p),  k = 1, ..., p

If a large proportion of the total sample variance can be attributed to relatively few principal components, we can replace the original p variables with these principal components without loss of much information!
57
We can also easily find the correlations between the original variables xk and the principal components yi:

r(yi, xk) = êik √λ̂i / √skk

These values are often used in interpreting the principal components yi.
58
Note that
- the approach for standardized data (i.e., principal components derived from the sample correlation matrix R) is analogous to the population approach
- when principal components are derived from sample data, the sample data are frequently centered, i.e., we work with the deviations xj - x̄, which has no effect on the sample covariance matrix S and yields the derived principal components

ŷij = êi'(xj - x̄),  i = 1, ..., p,  j = 1, ..., n

Under these circumstances, the mean value of the ith principal component associated with all n observations in the data set is

ȳi = (1/n) Σj êi'(xj - x̄) = 0
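A minimal NumPy sketch of the centering remark, reusing X, S, and E from the earlier sketches:

Xc = X - X.mean(axis=0)                                     # centered data
# Centering leaves the sample covariance matrix unchanged ...
print(np.allclose(np.cov(Xc, rowvar=False, ddof=1), S))     # True
# ... and the resulting principal component scores have mean zero
scores = Xc @ E
print(np.allclose(scores.mean(axis=0), 0.0))                # True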
59
Example: Suppose we have the following sample of four observations made on three random variables X1, X2, and X3:

X1    X2    X3
1.0   6.0   9.0
4.0  12.0  10.0
3.0  12.0  15.0
4.0  10.0  12.0

Find the three sample principal components y1, y2, and y3 based on the sample covariance matrix S.
60
First we need the sample covariance matrix S:

S = [ 2.0000  3.3333  1.3333 ]
    [ 3.3333  8.0000  4.6667 ]
    [ 1.3333  4.6667  7.0000 ]

and the corresponding eigenvalue-eigenvector pairs

λ̂1 = 13.2194,  ê1' = [0.2910, 0.7342, 0.6133]
λ̂2 =  3.3793,  ê2' = [0.4150, 0.4807, -0.7724]
λ̂3 =  0.4013,  ê3' = [0.8620, -0.4794, 0.1648]
61
so the principal components are

ŷ1 = 0.2910 x1 + 0.7342 x2 + 0.6133 x3
ŷ2 = 0.4150 x1 + 0.4807 x2 - 0.7724 x3
ŷ3 = 0.8620 x1 - 0.4794 x2 + 0.1648 x3

Note that

s11 + s22 + s33 = 2 + 8 + 7 = 17 = λ̂1 + λ̂2 + λ̂3
62
and the proportion of total sample variance due to each principal component is

λ̂1/17 = 0.7776,  λ̂2/17 = 0.1988,  λ̂3/17 = 0.0236

Note that the third principal component is relatively irrelevant!
63
Next we obtain the correlations between the observed values xk of the original random variables and the sample principal components yi using r(yi, xk) = êik √λ̂i / √skk.
64
65
We can display these results in a correlation matrix

         y1       y2       y3
x1    0.748    0.540    0.386
x2    0.944    0.312   -0.107
x3    0.843   -0.537    0.039

How would we interpret these results? Note that results based on the sample correlation matrix R will not differ from results based on the population correlation matrix ρ (why? correlations are unaffected by the choice of divisor, n or n - 1, so here R and ρ are identical).
66
SAS code for Principal Components Analysis

OPTIONS LINESIZE=72 NODATE PAGENO=1;
DATA stuff;
INPUT x1 x2 x3;
LABEL x1='Random Variable 1'
      x2='Random Variable 2'
      x3='Random Variable 3';
CARDS;
1.0 6.0 9.0
4.0 12.0 10.0
3.0 12.0 15.0
4.0 10.0 12.0
;
PROC PRINCOMP DATA=stuff COV OUT=pcstuff;
VAR x1 x2 x3;
TITLE4 'Using PROC PRINCOMP for Principal Components Analysis';
RUN;
PROC CORR DATA=pcstuff;
VAR x1 x2 x3;
WITH prin1 prin2 prin3;
RUN;

The COV option is used to instruct SAS to perform the principal components analysis on the sample covariance matrix rather than the default (correlation matrix)!
67
SAS output for Principal Components Analysis

The PRINCOMP Procedure

Observations  4
Variables     3

Simple Statistics
            x1            x2            x3
Mean   3.000000000   10.00000000   11.50000000
StD    1.414213562    2.82842712    2.64575131

Covariance Matrix
                                x1            x2            x3
x1  Random Variable 1  2.000000000   3.333333333   1.333333333
x2  Random Variable 2  3.333333333   8.000000000   4.666666667
x3  Random Variable 3  1.333333333   4.666666667   7.000000000

Total Variance = 17

Eigenvalues of the Covariance Matrix
     Eigenvalue   Difference   Proportion   Cumulative
1    13.2193960    9.8400643       0.7776       0.7776
2     3.3793317    2.9780594       0.1988       0.9764
3     0.4012723                    0.0236       1.0000

Eigenvectors
                          Prin1      Prin2      Prin3
x1  Random Variable 1  0.291038   0.415039   0.861998
x2  Random Variable 2  0.734249   0.480716  -0.479364
x3  Random Variable 3  0.613331  -0.772434   0.164835
68
SAS output for Correlation Matrix: Original Random Variables vs. Principal Components

The CORR Procedure

3 With Variables: Prin1 Prin2 Prin3
3 Variables:      x1 x2 x3

Simple Statistics
Variable  N      Mean   Std Dev        Sum    Minimum   Maximum
Prin1     4         0   3.63585          0   -5.05240   3.61516
Prin2     4         0   1.83830          0   -1.74209   2.53512
Prin3     4         0   0.63346          0   -0.38181   0.94442
x1        4   3.00000   1.41421   12.00000    1.00000   4.00000
x2        4  10.00000   2.82843   40.00000    6.00000  12.00000
x3        4  11.50000   2.64575   46.00000    9.00000  15.00000

Pearson Correlation Coefficients, N = 4
Prob > |r| under H0: Rho = 0
              x1         x2         x3
Prin1    0.74824    0.94385    0.84285
         0.2518     0.0561     0.1571
Prin2    0.53950    0.31243   -0.53670
         0.4605     0.6876     0.4633
Prin3    0.38611   -0.10736    0.03947
         0.6139     0.8926     0.9605