1
Principal Component Analysis (PCA)
Presented by Aycan YALÇIN
2003700369
2
Outline of the Presentation
  • Introduction
  • Objectives of PCA
  • Terminology
  • Algorithm
  • Applications
  • Conclusion

3
  • Introduction

4
Introduction
  • Problem
  • Multivariate data plays a key role in data
    analysis
  • But a multidimensional hyperspace is often difficult to
    visualize

The goal is to represent the data in a manner that
facilitates the analysis
5
Introduction (contd)
  • Objectives of unsupervised learning methods
  • Reduce dimensionality
  • Score all observations
  • Cluster similar observations together
  • Well-known linear transformation methods
  • PCA, Factor Analysis, Projection Pursuit, etc.

6
Introduction (contd)
  • Benefits of dimensionality reduction
  • The computational overhead of the subsequent
    processing stages is reduced
  • Noise may be reduced
  • A projection into a subspace of a very low
    dimension is useful for visualizing the data

7
Objectives of PCA
8
Objectives of PCA
  • Principal Component Analysis is a technique used to
  • Reduce the dimensionality of the data set
  • Identify new, meaningful underlying variables
  • Lose as little information as possible
  • It does so by finding the directions in which the cloud of
    data points is stretched most.

9
Objectives of PCA (contd)
  • PCA, also known as the Karhunen-Loeve transform, summarizes
    the variation in a set of (possibly) correlated attributes
    with a (smaller) set of uncorrelated components (principal
    components).
  • These uncorrelated variables are linear combinations of the
    original variables.
  • The objective of PCA is to reduce the dimensionality by
    extracting the smallest number of components that account
    for most of the variation in the original multivariate data,
    summarizing the data with little loss of information.

10
Terminology
11
Terminology
  • Variance
  • Covariance
  • Eigenvectors &amp; Eigenvalues
  • Principal Components

12
Terminology (Variance)
  • Standard deviation
  • Roughly, the average distance of the points from the mean
  • Variance
  • The standard deviation squared (see the sketch below)
  • A one-dimensional measure of spread
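
A minimal NumPy sketch of this relationship; the sample values are arbitrary:

```python
import numpy as np

x = np.array([1.0, 2.0, 4.0, 7.0])   # arbitrary one-dimensional sample

std = x.std()                     # standard deviation: spread around the mean
var = x.var()                     # variance
print(np.isclose(var, std ** 2))  # True: variance is the standard deviation squared
```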

13
Terminology (Covariance)
  • How two dimensions vary from the mean with
    respect to each other
  • cov(X,Y) > 0  the dimensions increase together
  • cov(X,Y) < 0  one increases while the other decreases
  • cov(X,Y) = 0  the dimensions are uncorrelated

14
Terminology (Covariance Matrix)
  • Contains the covariance values between all possible
    pairs of dimensions
  • Example for three dimensions (x, y, z), written out below
    (always symmetric)

cov(x,x) = variance of component x
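
In LaTeX form, the three-dimensional covariance matrix takes the standard symmetric shape, with variances on the diagonal and covariances off the diagonal:

```latex
C =
\begin{pmatrix}
  \operatorname{cov}(x,x) & \operatorname{cov}(x,y) & \operatorname{cov}(x,z) \\
  \operatorname{cov}(y,x) & \operatorname{cov}(y,y) & \operatorname{cov}(y,z) \\
  \operatorname{cov}(z,x) & \operatorname{cov}(z,y) & \operatorname{cov}(z,z)
\end{pmatrix}
```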
15
Terminology (Eigenvalues &amp; Eigenvectors)
  • Eigenvalues measure the amount of variation
    explained by each PC (largest for the first PC,
    smaller for the subsequent PCs)
  • An eigenvalue > 1 indicates that the PC accounts for more
    variance than one of the original variables does in
    standardized data
  • This is commonly used as a cutoff point for deciding
    which PCs are retained.
  • Eigenvectors provide the weights used to compute the
    uncorrelated PCs. These vectors give the directions in
    which the data cloud is stretched most.

16
Terminology (Eigenvalues &amp; Eigenvectors)
  • Vectors x having the same direction as Ax are called
    eigenvectors of A (where A is an n by n matrix).
  • In the equation Ax = λx, λ is called an eigenvalue
    of A.
  • Ax = λx  ⇒  (A - λI)x = 0
  • How to calculate x and λ (a NumPy sketch follows below)
  • Calculate det(A - λI); this yields a polynomial of
    degree n
  • Determine the roots of det(A - λI) = 0; the roots are the
    eigenvalues λ
  • Solve (A - λI)x = 0 for each λ to obtain the
    eigenvectors x
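
A minimal sketch of this calculation with NumPy; the matrix A is purely illustrative:

```python
import numpy as np

# Illustrative symmetric 2 x 2 matrix A (hypothetical values)
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# NumPy solves det(A - lambda*I) = 0 and (A - lambda*I)x = 0 internally
eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)         # e.g. [3. 1.]
print(eigenvectors[:, 0])  # eigenvector (a column) paired with the first eigenvalue

# Check the defining property Ax = lambda * x for the first pair
x, lam = eigenvectors[:, 0], eigenvalues[0]
print(np.allclose(A @ x, lam * x))  # True
```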

17
Terminology (Principal Component)
  • The extracted uncorrelated components are called
    principal components (PCs)
  • They are estimated from the eigenvectors of the covariance
    or correlation matrix of the original variables.
  • They are the projections of the data onto the eigenvectors
  • They are extracted by linear transformations of the
    original variables, so that the first few PCs contain
    most of the variation in the original dataset.

18
Algorithm
19
Algorithm
We look for axes which minimise the projection error
and maximise the variance after projection
Example: transform from 2 dimensions to 1 dimension
20
Algorithm (contd)
  • Preserve as much of the variance as possible

21
Algorithm (contd)
  • The data is a matrix where
  • Rows → observations (values)
  • Columns → attributes (dimensions)
  • First center the data by subtracting the mean in each
    dimension  DataAdjust_ij = Data_ij - mean_j,
    with mean_j = (1/m) Σi Data_ij
  • where i indexes observations, j indexes dimensions, and m
    is the total number of observations
  • Calculate the covariance matrix of DataAdjust (see the
    sketch below)
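
A minimal sketch of these steps with NumPy; Data is a small hypothetical m x n matrix (5 observations, 2 attributes):

```python
import numpy as np

# Hypothetical data matrix: m observations (rows) x n attributes (columns)
Data = np.array([[2.5, 2.4],
                 [0.5, 0.7],
                 [2.2, 2.9],
                 [1.9, 2.2],
                 [3.1, 3.0]])

# Center the data: DataAdjust_ij = Data_ij - mean_j
OriginalMean = Data.mean(axis=0)
DataAdjust = Data - OriginalMean

# Covariance matrix of DataAdjust (the attributes are the variables)
C = np.cov(DataAdjust, rowvar=False)
print(C)   # an n x n symmetric matrix
```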

22
Algorithm (contd)
  • Calculate the eigenvalues λ and eigenvectors x of the
    covariance matrix
  • The eigenvalues λj are used to calculate the proportion of
    the total variance Vj explained by each component j
    Vj = λj / (λ1 + λ2 + ... + λn)  (see the sketch below)
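
A continuation of the sketch above, assuming the covariance matrix C computed there:

```python
# Eigen-decomposition of the covariance matrix C from the previous sketch
eigenvalues, eigenvectors = np.linalg.eig(C)

# Proportion of the total variance explained by component j:
#   V_j = lambda_j / (lambda_1 + ... + lambda_n)
V = eigenvalues / eigenvalues.sum()
print(V)   # entries sum to 1; the largest one belongs to the first PC
```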

23
Algorithm (contd)
  • Choose components to form the feature vector
  • The eigenvalues λ and eigenvectors x are sorted in
    descending order of λ
  • The component with the highest λ is the first principal
    component
  • FeatureVector = (x1, ..., xn), where each xi is a
    column-oriented eigenvector; it contains the chosen
    components.
  • Derive the new dataset
  • Transpose FeatureVector and DataAdjust
  • FinalData = RowFeatureVector x RowDataAdjust
  • This expresses the original data in terms of the chosen
    components
  • FinalData has the eigenvectors as coordinate axes (see the
    sketch below)
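
A sketch of these steps, continuing the NumPy example above; the variable names follow the slide's terminology:

```python
# Sort the eigenvalues (and the matching eigenvectors) in descending order
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep the top k eigenvectors as the feature vector (k = 1 here: 2 -> 1 dimensions)
k = 1
FeatureVector = eigenvectors[:, :k]           # columns are the chosen eigenvectors

# FinalData = RowFeatureVector x RowDataAdjust
RowFeatureVector = FeatureVector.T            # chosen eigenvectors as rows
RowDataAdjust = DataAdjust.T                  # one observation per column
FinalData = RowFeatureVector @ RowDataAdjust  # data expressed on the new axes
print(FinalData.shape)                        # (k, m)
```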

24
Algorithm (contd)
  • Retrieving the original data (e.g. in data compression)
  • RetrievedRowData =
    (RowFeatureVector^T x FinalData) + OriginalMean
  • This yields the original data using the chosen components
    (see the sketch below)
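
A sketch of the retrieval step under the same assumptions; because the eigenvectors are orthonormal, the transpose acts as the inverse of RowFeatureVector:

```python
# RetrievedRowData = (RowFeatureVector^T x FinalData), then add the mean back
RowRetrievedData = RowFeatureVector.T @ FinalData   # back onto the original axes
RetrievedData = RowRetrievedData.T + OriginalMean   # undo the centering

# With all components kept (k = n) this reproduces Data exactly;
# with fewer components it is a lossy approximation.
print(RetrievedData)
```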

25
Algorithm (contd)
  • Estimating the number of PCs
  • Scree Test: plotting the eigenvalues against the
    corresponding PCs produces a scree plot that illustrates
    the rate of change in the magnitude of the eigenvalues.
    The rate of decline tends to be fast at first and then
    levels off. The elbow, the point at which the curve bends,
    is considered to indicate the maximum number of PCs to
    extract. One PC less than the number at the elbow might be
    appropriate if you are concerned about getting an overly
    defined solution. (A plotting sketch follows below.)
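
A minimal plotting sketch with matplotlib, assuming the sorted eigenvalues from the earlier sketches:

```python
import matplotlib.pyplot as plt

# Scree plot: eigenvalue magnitude against the component number
components = np.arange(1, len(eigenvalues) + 1)
plt.plot(components, eigenvalues, "o-")
plt.xlabel("Principal component")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()   # look for the 'elbow' where the curve levels off
```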

26
Applications
27
Applications
  • Example applications
  • Computer Vision
  • Representation
  • Pattern Identification
  • Image compression
  • Face recognition
  • Gene expression analysis
  • Purpose: determine a core set of conditions for
    useful gene comparison
  • Handwritten character recognition
  • Data Compression, etc.

28
Conclusion
29
Conclusion
  • PCA can be useful when there is a high degree of
    correlation present among the attributes
  • When a data set consists of several clusters, the
    principal axes found by PCA usually pick
    projections with good separation. PCA provides
    an effective basis for feature extraction in this
    case.
  • For data compression, PCA offers a useful
    self-organized learning procedure

30
Conclusion (contd)
  • Shortcomings of PCA
  • PCA requires diagonalising the matrix C (of dimension
    n x n), which is computationally heavy if n is large
  • PCA only finds linear sub-spaces
  • It works best if the individual components are
    Gaussian-distributed (ICA, for example, does not rely on
    such a distributional assumption)
  • PCA does not say how many target dimensions to use

31
Questions?