Title: Principal Components Analysis
1 Principal Components Analysis
The Basic Principle: PCA transforms a set of correlated variables into a smaller set of uncorrelated variables called principal components.
Uses:
- Data screening and reduction
- Discriminant analysis
- Regression analysis
- Cluster analysis
2 Objectives of PCA
- Discovering the true dimension of the data. It may be that p-dimensional data can be represented in q < p dimensions without losing much information.
- Interpreting the principal components (new variables).
3 Population Principal Components
For a population measured on p random variables X1, ..., Xp, our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability.
[Figure: scatter of X2 against X1 with the rotated principal axes overlaid.]
This is accomplished by rotating the axes.
4 Consider our random vector
\[ \mathbf{X}' = [X_1, X_2, \ldots, X_p] \]
with covariance matrix \(\Sigma\) and eigenvalues \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p\). We can construct p linear combinations
\[ Y_i = \mathbf{a}_i'\mathbf{X} = a_{1i}X_1 + a_{2i}X_2 + \cdots + a_{pi}X_p, \qquad i = 1, \ldots, p. \]
5 It is easy to show that
\[ \mathrm{Var}(Y_i) = \mathbf{a}_i'\Sigma\mathbf{a}_i, \qquad \mathrm{Cov}(Y_i, Y_k) = \mathbf{a}_i'\Sigma\mathbf{a}_k. \]
The principal components are those uncorrelated linear combinations Y1, ..., Yp whose variances are as large as the constraints allow. Thus the first principal component is the linear combination of maximum variance:
\[ \max_{\mathbf{a}_1} \ \mathbf{a}_1'\Sigma\mathbf{a}_1 \ \text{(source of nonlinearity)} \qquad \text{subject to} \qquad \mathbf{a}_1'\mathbf{a}_1 = 1 \ \text{(restricts to coefficient vectors of unit length).} \]
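A small numerical illustration of this constrained maximization (the 3x3 covariance matrix and all numbers below are assumed for illustration, not taken from the slides): no random unit-length coefficient vector attains a larger variance than the leading eigenvector of Sigma.

```python
import numpy as np

rng = np.random.default_rng(0)
Sigma = np.array([[4.0, 2.0, 0.0],      # assumed illustrative covariance matrix
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)   # eigenvalues in ascending order
e1 = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue

# Variance of the candidate first component a'X is the quadratic form a' Sigma a.
var_e1 = e1 @ Sigma @ e1

# No random unit-length vector should beat the leading eigenvector.
for _ in range(10000):
    a = rng.normal(size=3)
    a /= np.linalg.norm(a)                 # restrict to coefficient vectors of unit length
    assert a @ Sigma @ a <= var_e1 + 1e-9

print(var_e1, eigvals[-1])                 # both equal lambda_1
```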
6 The second principal component is the linear combination of maximum variance that is uncorrelated with the first principal component:
\[ \max_{\mathbf{a}_2} \ \mathbf{a}_2'\Sigma\mathbf{a}_2 \qquad \text{subject to} \qquad \mathbf{a}_2'\mathbf{a}_2 = 1, \quad \mathbf{a}_2'\Sigma\mathbf{a}_1 = 0 \ \text{(restricts covariance to zero).} \]
The third principal component is the solution to the nonlinear optimization problem
\[ \max_{\mathbf{a}_3} \ \mathbf{a}_3'\Sigma\mathbf{a}_3 \qquad \text{subject to} \qquad \mathbf{a}_3'\mathbf{a}_3 = 1, \quad \mathbf{a}_3'\Sigma\mathbf{a}_1 = \mathbf{a}_3'\Sigma\mathbf{a}_2 = 0 \ \text{(restricts covariances to zero).} \]
7 The ith principal component is the linear combination of maximum variance that is uncorrelated with all previous principal components:
\[ \max_{\mathbf{a}_i} \ \mathbf{a}_i'\Sigma\mathbf{a}_i \qquad \text{subject to} \qquad \mathbf{a}_i'\mathbf{a}_i = 1, \quad \mathbf{a}_i'\Sigma\mathbf{a}_k = 0 \ \text{for } k < i. \]
We can show that, for random vector \(\mathbf{X}\) with covariance matrix \(\Sigma\) and eigenvalues \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0\), the ith principal component is given by
\[ Y_i = \mathbf{e}_i'\mathbf{X}, \qquad \mathrm{Var}(Y_i) = \lambda_i, \]
where \(\mathbf{e}_i\) is the eigenvector associated with \(\lambda_i\). Note that the principal components are not unique if some eigenvalues are equal.
8 From previous work,
\[ \sigma_{11} + \sigma_{22} + \cdots + \sigma_{pp} = \lambda_1 + \lambda_2 + \cdots + \lambda_p. \]
A measure of the importance of the kth principal component is
\[ \frac{\lambda_k}{\lambda_1 + \lambda_2 + \cdots + \lambda_p} \]
(the proportion of total population variance due to the kth principal component).
Goal: to have a large proportion of the total population variance attributed to relatively few principal components.
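A self-contained snippet computing these proportions from assumed illustrative eigenvalues (not the slides' example), including the cumulative share that tells us how few components carry most of the variance:

```python
import numpy as np

lam = np.array([5.8, 2.4, 0.8])        # assumed eigenvalues, sorted decreasing
prop = lam / lam.sum()                 # proportion due to each component
print("proportion:", prop)
print("cumulative:", np.cumsum(prop))  # e.g., the first two components may suffice
```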
9 Component Loading Factors
- The elements \(a_{1j}, a_{2j}, \ldots, a_{pj}\) of \(\mathbf{a}_j\) are often compared to try to interpret the principal components.
- Example: suppose \(a_{1j}\) is large positive and \(a_{2j}\) is large negative while the others are near 0. Then Yj can be thought of as measuring the difference between the 1st and 2nd original variables.
10 Component Loading Factors
- To make comparisons, the coefficient vectors can be scaled by the square root of their corresponding eigenvalue:
\[ \mathbf{c}_j = \sqrt{\lambda_j}\,\mathbf{e}_j. \]
- The \(\mathbf{c}_j\) are called component loading vectors, with length \(\|\mathbf{c}_j\| = \sqrt{\lambda_j}\).
11 We can also easily find the correlations between the original random variables Xk and the principal components Yi:
\[ \rho_{Y_i, X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \qquad i, k = 1, 2, \ldots, p. \]
These values are often used in interpreting the principal components Yi.
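A sketch of this correlation formula in numpy, using the same assumed Sigma as in the earlier snippets; rows index the original variables Xk and columns index the components Yi:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.0],      # assumed illustrative covariance matrix
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam, E = eigvals[order], eigvecs[:, order]

# rho(Y_i, X_k) = e_ik * sqrt(lambda_i) / sqrt(sigma_kk):
# scale each eigenvector column by sqrt(lambda_i), then divide row k by sqrt(sigma_kk).
corr = E * np.sqrt(lam) / np.sqrt(np.diag(Sigma))[:, None]
print(np.round(corr, 3))
```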
12 Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3.
Find the three population principal components Y1, Y2, and Y3.
13 First we need the covariance matrix \(\Sigma\)
and the corresponding eigenvalue-eigenvector pairs.
14 so the principal components are
\[ Y_i = \mathbf{e}_i'\mathbf{X}, \qquad i = 1, 2, 3. \]
Note that \(\mathrm{Var}(Y_i) = \lambda_i\) for each component.
15 and the proportion of total population variance due to each principal component is
\[ \frac{\lambda_k}{\lambda_1 + \lambda_2 + \lambda_3}, \qquad k = 1, 2, 3. \]
Note that the third principal component is relatively irrelevant!
16 Next we obtain the correlations between the
original random variables Xi and the principal
components Yi
18 We can display these results in a correlation matrix.
Here we can easily see that
- the first principal component (Y1) is a mixture of all three random variables (X1, X2, and X3)
- the second principal component (Y2) is a trade-off between X1 and X3
- the third principal component (Y3) is largely X1
19 When \(\boldsymbol{\mu} = \mathbf{0}\) (normal case), recall that the constant-density contours satisfy
\[ c^2 = \mathbf{x}'\Sigma^{-1}\mathbf{x} = \frac{(\mathbf{e}_1'\mathbf{x})^2}{\lambda_1} + \frac{(\mathbf{e}_2'\mathbf{x})^2}{\lambda_2} + \cdots + \frac{(\mathbf{e}_p'\mathbf{x})^2}{\lambda_p}, \]
where the \(\mathbf{e}_i'\mathbf{x}\) are the principal components of \(\mathbf{x}\). Setting \(y_i = \mathbf{e}_i'\mathbf{x}\) and substituting into the previous expression yields
\[ c^2 = \frac{y_1^2}{\lambda_1} + \frac{y_2^2}{\lambda_2} + \cdots + \frac{y_p^2}{\lambda_p}, \]
which defines an ellipsoid in a coordinate system with axes y1, ..., yp lying in the directions of \(\mathbf{e}_1, \ldots, \mathbf{e}_p\), respectively.
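A quick numerical check of this identity, using the same assumed Sigma and an arbitrary point x: the quadratic form in the original coordinates equals the weighted sum of squares in the eigenvector coordinates.

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.0],      # assumed illustrative covariance matrix
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])
lam, E = np.linalg.eigh(Sigma)

x = np.array([1.0, -2.0, 0.5])          # any point
y = E.T @ x                             # coordinates y_i = e_i' x
lhs = x @ np.linalg.solve(Sigma, x)     # x' Sigma^{-1} x
rhs = np.sum(y**2 / lam)                # sum_i y_i^2 / lambda_i
print(lhs, rhs)                         # equal, up to floating-point error
```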
20 Example: For the principal components derived from the following population of four observations made on three random variables X1, X2, and X3,
plot the major and minor axes.
21 We will need the centroid \(\boldsymbol{\mu}\).
The direction of the major axis is given by \(\mathbf{e}_1\), while the directions of the two minor axes are given by \(\mathbf{e}_2\) and \(\mathbf{e}_3\).
22 Note that we can also construct principal components for the standardized variables Zi:
\[ Z_i = \frac{X_i - \mu_i}{\sqrt{\sigma_{ii}}}, \qquad i = 1, 2, \ldots, p, \]
which in matrix notation is
\[ \mathbf{Z} = \left(\mathbf{V}^{1/2}\right)^{-1}(\mathbf{X} - \boldsymbol{\mu}), \]
where \(\mathbf{V}^{1/2}\) is the diagonal standard deviation matrix. Obviously
\[ \mathrm{E}(\mathbf{Z}) = \mathbf{0} \qquad \text{and} \qquad \mathrm{Cov}(\mathbf{Z}) = \left(\mathbf{V}^{1/2}\right)^{-1}\Sigma\left(\mathbf{V}^{1/2}\right)^{-1} = \boldsymbol{\rho}. \]
23 The principal components for the standardized variables Zi may be obtained from the eigenvectors of the correlation matrix \(\boldsymbol{\rho}\).
We can show that, for random vector \(\mathbf{Z}\) of standardized variables with covariance matrix \(\boldsymbol{\rho}\) and eigenvalues \(\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0\), the ith principal component is given by
\[ Y_i = \mathbf{e}_i'\mathbf{Z}, \qquad \mathrm{Var}(Y_i) = \lambda_i. \]
Note again that the principal components are not unique if some eigenvalues are equal.
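A sketch of the standardized case, converting an assumed Sigma into the correlation matrix rho and taking eigenpairs of rho instead; the eigenvalues of rho sum to p:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 0.0],      # assumed illustrative covariance matrix
                  [2.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])

d = np.sqrt(np.diag(Sigma))             # standard deviations
rho = Sigma / np.outer(d, d)            # correlation matrix V^{-1/2} Sigma V^{-1/2}

lam, E = np.linalg.eigh(rho)
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]
print("eigenvalues of rho:", lam, "sum =", lam.sum())   # sum equals p = 3
```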
24 For random vector \(\mathbf{Z}\),
\[ \sum_{i=1}^{p} \mathrm{Var}(Z_i) = \lambda_1 + \lambda_2 + \cdots + \lambda_p = p, \]
which allows us to use
\[ \frac{\lambda_k}{p} \]
(the proportion of total population variance due to the kth principal component).
25 Example: Suppose we have the following population of four observations made on three random variables X1, X2, and X3.
Find the three population principal components Y1, Y2, and Y3 for the standardized random variables Z1, Z2, and Z3.
26 First we need the correlation matrix \(\boldsymbol{\rho}\) and the corresponding eigenvalue-eigenvector pairs.
These results differ from the covariance-based principal components!
27 so the principal components are
\[ Y_i = \mathbf{e}_i'\mathbf{Z}, \qquad i = 1, 2, 3. \]
Note that \(\mathrm{Var}(Y_i) = \lambda_i\) for each component.
28 and the proportion of total population variance due to each principal component is
\[ \frac{\lambda_k}{p} = \frac{\lambda_k}{3}, \qquad k = 1, 2, 3. \]
Note that the third principal component is again relatively irrelevant!
29 Next we obtain the correlations between the
original random variables Xi and the principal
components Yi
31 We can display these results in a correlation matrix.
Here we can easily see that
- the first principal component (Y1) is a mixture of all three random variables (X1, X2, and X3)
- the second principal component (Y2) is a trade-off between X1 and X3
- the third principal component (Y3) is a trade-off between X1 and X2
32 Special structures yield particularly interesting principal components.
- Diagonal covariance matrices: suppose \(\Sigma\) is the diagonal matrix
\[ \Sigma = \begin{bmatrix} \sigma_{11} & 0 & \cdots & 0 \\ 0 & \sigma_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sigma_{pp} \end{bmatrix}. \]
Letting \(\mathbf{e}_i\) be the vector with 1 in the ith position and 0 elsewhere, we have
\[ \Sigma\mathbf{e}_i = \sigma_{ii}\mathbf{e}_i, \]
so \((\sigma_{ii}, \mathbf{e}_i)\) is the ith eigenvalue-eigenvector pair.
33 - The set of principal components and the original set of (uncorrelated) random variables are the same!
- Note that this result is also true if we work with the correlation matrix.
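A quick numerical check of the diagonal case with an assumed diagonal covariance matrix: the eigenvalues are the diagonal entries and the eigenvectors are the coordinate axes, so the components coincide with the original variables.

```python
import numpy as np

Sigma = np.diag([4.0, 3.0, 2.0])        # assumed diagonal covariance matrix
lam, E = np.linalg.eigh(Sigma)
print(lam)                              # the diagonal entries s_ii (ascending)
print(np.abs(E))                        # identity matrix, up to column order and sign
```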
34 - Constant variances and covariances: suppose \(\Sigma\) is the patterned matrix
\[ \Sigma = \sigma^2 \begin{bmatrix} 1 & \rho & \cdots & \rho \\ \rho & 1 & \cdots & \rho \\ \vdots & \vdots & \ddots & \vdots \\ \rho & \rho & \cdots & 1 \end{bmatrix}. \]
Here the resulting correlation matrix has eigenvalues and eigenvectors
\[ \lambda_1 = 1 + (p-1)\rho \ \text{ for } \rho > 0, \qquad \mathbf{e}_1' = \left(p^{-1/2}, p^{-1/2}, \ldots, p^{-1/2}\right), \qquad \lambda_i = 1 - \rho \ \text{ for } i \ge 2. \]
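A numerical check of this equicorrelation pattern, with the dimension p and common correlation rho assumed for illustration:

```python
import numpy as np

p, r = 5, 0.6                           # assumed dimension and common correlation
R = (1 - r) * np.eye(p) + r * np.ones((p, p))   # equicorrelation matrix

lam, E = np.linalg.eigh(R)
print(lam)                              # four values 1 - rho = 0.4, one value 1 + (p-1)rho = 3.4
print(E[:, -1])                         # proportional to (1, ..., 1)'/sqrt(p)
```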
35 Using Principal Components to Summarize Sample Variation
Suppose the data \(\mathbf{x}_1, \ldots, \mathbf{x}_n\) represent n independent observations from a p-dimensional population with some mean vector \(\boldsymbol{\mu}\) and covariance matrix \(\Sigma\). These data yield a sample mean vector \(\bar{\mathbf{x}}\), sample covariance matrix \(\mathbf{S}\), and sample correlation matrix \(\mathbf{R}\). As in the population case, our goal is to develop a new set of p axes (linear combinations of the original p axes) in the directions of greatest variability.
36 Again it is easy to show that the linear combinations
\[ \hat{y}_i = \mathbf{a}_i'\mathbf{x} \]
have sample means \(\mathbf{a}_i'\bar{\mathbf{x}}\) and sample variances \(\mathbf{a}_i'\mathbf{S}\mathbf{a}_i\).
Thus the first sample principal component is the linear combination of maximum sample variance:
\[ \max_{\mathbf{a}_1} \ \mathbf{a}_1'\mathbf{S}\mathbf{a}_1 \ \text{(source of nonlinearity)} \qquad \text{subject to} \qquad \mathbf{a}_1'\mathbf{a}_1 = 1 \ \text{(restricts to coefficient vectors of unit length).} \]
37 The second sample principal component solves
\[ \max_{\mathbf{a}_2} \ \mathbf{a}_2'\mathbf{S}\mathbf{a}_2 \qquad \text{subject to} \qquad \mathbf{a}_2'\mathbf{a}_2 = 1, \quad \mathbf{a}_2'\mathbf{S}\mathbf{a}_1 = 0 \ \text{(restricts covariance to zero),} \]
and the third sample principal component solves
\[ \max_{\mathbf{a}_3} \ \mathbf{a}_3'\mathbf{S}\mathbf{a}_3 \qquad \text{subject to} \qquad \mathbf{a}_3'\mathbf{a}_3 = 1, \quad \mathbf{a}_3'\mathbf{S}\mathbf{a}_1 = \mathbf{a}_3'\mathbf{S}\mathbf{a}_2 = 0 \ \text{(restricts covariances to zero).} \]
38 Generally, the ith sample principal component is the linear combination of maximum sample variance that is uncorrelated with all previous sample principal components.
We can show that, for a random sample with sample covariance matrix \(\mathbf{S}\) and eigenvalues \(\hat{\lambda}_1 \ge \hat{\lambda}_2 \ge \cdots \ge \hat{\lambda}_p \ge 0\), the ith sample principal component is given by
\[ \hat{y}_i = \hat{\mathbf{e}}_i'\mathbf{x}, \]
where \(\hat{\mathbf{e}}_i\) is the eigenvector associated with \(\hat{\lambda}_i\).
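A minimal sample-PCA sketch on an assumed small data matrix X (rows are observations, not the slides' example data): eigenpairs of the sample covariance matrix S give the sample principal components and their variance shares.

```python
import numpy as np

X = np.array([[1.0, 2.0, 0.0],          # assumed data: n = 4 observations
              [4.0, 1.0, 1.0],          # on p = 3 variables
              [3.0, 5.0, 2.0],
              [0.0, 2.0, 3.0]])

S = np.cov(X, rowvar=False)             # sample covariance matrix (divisor n - 1)
lam, E = np.linalg.eigh(S)
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

scores = (X - X.mean(axis=0)) @ E       # sample principal component scores
print("eigenvalues:", lam)
print("proportion of sample variance:", lam / lam.sum())
```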
39 which yields
\[ \frac{\hat{\lambda}_k}{\hat{\lambda}_1 + \hat{\lambda}_2 + \cdots + \hat{\lambda}_p} \]
(the proportion of total sample variance due to the kth principal component).
40 We can also easily find the correlations between the original random variables xk and the sample principal components \(\hat{y}_i\):
\[ r_{\hat{y}_i, x_k} = \frac{\hat{e}_{ik}\sqrt{\hat{\lambda}_i}}{\sqrt{s_{kk}}}, \qquad i, k = 1, 2, \ldots, p. \]
41 Note that
- the approach for standardized data (i.e., principal components derived from the sample correlation matrix R) is analogous to the population approach
- when principal components are derived from sample data, the sample data are frequently centered, i.e., replaced by the deviations \(\mathbf{x}_j - \bar{\mathbf{x}}\), which has no effect on the sample covariance matrix S and yields the derived principal components
\[ \hat{y}_i = \hat{\mathbf{e}}_i'(\mathbf{x} - \bar{\mathbf{x}}). \]
Under these circumstances, the mean value of the ith principal component across all n observations in the data set is
\[ \bar{\hat{y}}_i = \frac{1}{n}\sum_{j=1}^{n} \hat{\mathbf{e}}_i'(\mathbf{x}_j - \bar{\mathbf{x}}) = 0. \]
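A short check of both claims, using the same assumed data matrix as in the earlier sketch: centering leaves S unchanged, and every sample principal component of the centered data averages to zero.

```python
import numpy as np

X = np.array([[1.0, 2.0, 0.0],          # same assumed data as above
              [4.0, 1.0, 1.0],
              [3.0, 5.0, 2.0],
              [0.0, 2.0, 3.0]])

Xc = X - X.mean(axis=0)                 # centered data
print(np.allclose(np.cov(X, rowvar=False), np.cov(Xc, rowvar=False)))  # True

lam, E = np.linalg.eigh(np.cov(X, rowvar=False))
scores = Xc @ E                         # component scores of the centered data
print(scores.mean(axis=0))              # zeros, up to floating-point error
```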
42 Example: Suppose we have the following sample of four observations made on three random variables X1, X2, and X3.
Find the three sample principal components \(\hat{y}_1\), \(\hat{y}_2\), and \(\hat{y}_3\) based on the sample covariance matrix S.
43 First we need the sample covariance matrix S
and the corresponding eigenvalue-eigenvector
pairs
44 so the sample principal components are
\[ \hat{y}_i = \hat{\mathbf{e}}_i'(\mathbf{x} - \bar{\mathbf{x}}), \qquad i = 1, 2, 3. \]
Note that \(\mathrm{Var}(\hat{y}_i) = \hat{\lambda}_i\) for each component.
45 and the proportion of total sample variance due to each principal component is
\[ \frac{\hat{\lambda}_k}{\hat{\lambda}_1 + \hat{\lambda}_2 + \hat{\lambda}_3}, \qquad k = 1, 2, 3. \]
Note that the third principal component is relatively irrelevant!
46 Next we obtain the correlations between the observed values of the original random variables xk and the sample principal components \(\hat{y}_i\).
48 We can display these results in a correlation
matrix
How would we interpret these results?