Title: Principal Component Analysis (PCA)
1 Principal Component Analysis (PCA)
Presented by Aycan YALÇIN
2003700369
2 Outline of the Presentation
- Introduction
- Objectives of PCA
- Terminology
- Algorithm
- Applications
- Conclusion
3 Introduction
4 Introduction
- Problem
- Analysis of multivariate data plays a key role in data analysis
- Multidimensional hyperspace is often difficult to visualize
Represent data in a manner that facilitates the analysis
5 Introduction (contd)
- Objectives of unsupervised learning methods
- Reduce dimensionality
- Score all observations
- Cluster similar observations together
- Well-known linear transformation methods
- PCA, Factor Analysis, Projection Pursuit, etc.
6 Introduction (contd)
- Benefits of dimensionality reduction
- The computational overhead of the subsequent processing stages is reduced
- Noise may be reduced
- A projection into a subspace of very low dimension is useful for visualizing the data
7 Objectives of PCA
8 Objectives of PCA
- Principal Component Analysis is a technique used to
- Reduce the dimensionality of the data set
- Identify new meaningful underlying variables
- Lose minimum information
- by finding the directions in which a cloud of data points is stretched most.
9 Objectives of PCA (contd)
- PCA, or the Karhunen-Loève transform, summarizes the variation in a (possibly) correlated multi-attribute data set with a (smaller) set of uncorrelated components (principal components).
- These uncorrelated variables are linear combinations of the original variables.
- The objective of PCA is to reduce the dimensionality by extracting the smallest number of components that account for most of the variation in the original multivariate data, and to summarize the data with little loss of information.
10 Terminology
11 Terminology
- Variance
- Covariance
- Eigenvectors & Eigenvalues
- Principal Components
12 Terminology (Variance)
- Standard deviation
- Average distance from mean to a point
- Variance
- Standard deviation squared
- One-dimensional measure
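As a minimal illustration of these one-dimensional measures, a short numpy sketch (the sample values are made up):

    import numpy as np

    x = np.array([2.5, 0.5, 2.2, 1.9, 3.1])   # hypothetical one-dimensional sample

    std = x.std(ddof=1)    # sample standard deviation: spread of the points around the mean
    var = x.var(ddof=1)    # sample variance
    print(np.isclose(var, std ** 2))   # True: variance is the standard deviation squared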
13 Terminology (Covariance)
- How two dimensions vary from the mean with respect to each other
- cov(X,Y) > 0: Dimensions increase together
- cov(X,Y) < 0: One increases while the other decreases
- cov(X,Y) = 0: Dimensions are uncorrelated
14 Terminology (Covariance Matrix)
- Contains the covariance values between all possible pairs of dimensions
- Example for three dimensions (x, y, z); the matrix is always symmetric:
  C = [ cov(x,x)  cov(x,y)  cov(x,z)
        cov(y,x)  cov(y,y)  cov(y,z)
        cov(z,x)  cov(z,y)  cov(z,z) ]
- cov(x,x) = variance of component x (see the sketch below)
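A minimal numpy sketch of such a covariance matrix, using made-up three-dimensional data:

    import numpy as np

    # hypothetical data: 10 observations of three dimensions (x, y, z)
    rng = np.random.default_rng(0)
    data = rng.normal(size=(10, 3))

    # rows are observations and columns are dimensions, hence rowvar=False
    C = np.cov(data, rowvar=False)
    print(C.shape)                  # (3, 3)
    print(np.allclose(C, C.T))      # True: the matrix is symmetric
    print(np.allclose(np.diag(C), data.var(axis=0, ddof=1)))   # diagonal = variances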
15 Terminology (Eigenvalues & Eigenvectors)
- Eigenvalues measure the amount of the variation explained by each PC (largest for the first PC, smaller for the subsequent PCs)
- An eigenvalue > 1 indicates that a PC accounts for more variance than one of the original variables does in standardized data
- This is commonly used as a cutoff point for deciding which PCs are retained (see the sketch below)
- Eigenvectors provide the weights to compute the uncorrelated PCs. These vectors give the directions in which the data cloud is stretched most
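A minimal sketch of the eigenvalue-greater-than-1 cutoff on standardized data; the data here is made up, and for standardized variables the eigenvalues of the covariance (correlation) matrix sum to the number of variables:

    import numpy as np

    # hypothetical data: 100 observations of 5 variables
    rng = np.random.default_rng(1)
    data = rng.normal(size=(100, 5))

    # standardize so that each variable has zero mean and unit variance
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)

    eigenvalues, _ = np.linalg.eig(np.cov(z, rowvar=False))

    # retain the PCs that explain more variance than a single original variable
    retained = np.sum(eigenvalues > 1)
    print(retained, "of", len(eigenvalues), "components retained")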
16 Terminology (Eigenvalues & Eigenvectors)
- Vectors x having the same direction as Ax are called eigenvectors of A (A is an n-by-n matrix).
- In the equation Ax = λx, λ is called an eigenvalue of A.
- Ax = λx  ⇒  (A - λI)x = 0
- How to calculate x and λ:
- Calculate det(A - λI); this yields a polynomial of degree n
- Determine the roots of det(A - λI) = 0; the roots are the eigenvalues λ
- Solve (A - λI)x = 0 for each λ to obtain the eigenvectors x (a numpy sketch follows)
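A minimal numpy sketch of this calculation; the 2-by-2 matrix A is made up for illustration, and numpy solves the same equations internally rather than asking us for the characteristic polynomial:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [1.0, 2.0]])            # hypothetical 2-by-2 matrix

    eigenvalues, eigenvectors = np.linalg.eig(A)
    print(eigenvalues)                    # e.g. 3.0 and 1.0 for this matrix

    # check the defining equation A x = lambda x for each pair
    for lam, x in zip(eigenvalues, eigenvectors.T):
        print(np.allclose(A @ x, lam * x))    # True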
17 Terminology (Principal Component)
- The extracted uncorrelated components are called principal components (PCs)
- They are estimated from the eigenvectors of the covariance or correlation matrix of the original variables
- They are the projections of the data onto the eigenvectors
- They are extracted by linear transformations of the original variables so that the first few PCs contain most of the variation in the original dataset
18 Algorithm
19 Algorithm
We look for axes which minimise projection errors and maximise the variance after projection
Example: transform from 2 dimensions to 1 dimension
20 Algorithm (contd)
- Preserve as much of the variance as possible
21 Algorithm (contd)
- Data is a matrix in which
- Rows → Observations (values)
- Columns → Attributes (dimensions)
- First center the data by subtracting the mean in each dimension:
- DataAdjust(i, j) = Data(i, j) - mean(j), where mean(j) = (1/m) * Σi Data(i, j)
- i is the observation, j is the dimension, and m is the total number of observations
- Calculate the covariance matrix of DataAdjust (a numpy sketch of these steps follows)
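A minimal numpy sketch of the centering and covariance steps, using a small made-up data matrix (names such as data_adjust are just illustrative):

    import numpy as np

    # hypothetical data: rows = observations, columns = attributes (dimensions)
    data = np.array([[2.5, 2.4],
                     [0.5, 0.7],
                     [2.2, 2.9],
                     [1.9, 2.2],
                     [3.1, 3.0]])

    # center the data: subtract the mean of each dimension
    data_adjust = data - data.mean(axis=0)

    # covariance matrix of the centered data
    cov = np.cov(data_adjust, rowvar=False)
    print(cov)      # 2-by-2 symmetric matrix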
22 Algorithm (contd)
- Calculate the eigenvalues λ and eigenvectors x of the covariance matrix
- The eigenvalues λj are used to calculate the proportion of the total variance (Vj) explained by each component j:
- Vj = λj / (λ1 + λ2 + ... + λn) * 100 (see the sketch below)
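A minimal sketch of this step, continuing with a made-up covariance matrix:

    import numpy as np

    cov = np.array([[0.62, 0.61],
                    [0.61, 0.72]])        # hypothetical covariance matrix

    eigenvalues, eigenvectors = np.linalg.eig(cov)

    # proportion of the total variance explained by each component, in percent
    order = np.argsort(eigenvalues)[::-1]
    explained = eigenvalues[order] / eigenvalues.sum() * 100
    print(explained)                      # the first component dominates here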
23 Algorithm (contd)
- Choose components to form the feature vector
- Eigenvalues λ and eigenvectors x are sorted in descending order of λ
- The component with the highest λ is the first principal component
- FeatureVector = (x1, ..., xn), where each xi is a column eigenvector; it contains the chosen components
- Derive the new dataset
- Transpose FeatureVector and DataAdjust
- FinalData = RowFeatureVector x RowDataAdjust
- This expresses the original data in terms of the chosen components
- FinalData has the eigenvectors as coordinate axes (a numpy sketch follows)
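A minimal numpy sketch of choosing the top component and deriving the new dataset; the data is the same made-up matrix as in the earlier sketch:

    import numpy as np

    data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                     [1.9, 2.2], [3.1, 3.0]])            # hypothetical data
    data_adjust = data - data.mean(axis=0)

    eigenvalues, eigenvectors = np.linalg.eig(np.cov(data_adjust, rowvar=False))

    # sort components by descending eigenvalue and keep the top k as the feature vector
    order = np.argsort(eigenvalues)[::-1]
    k = 1
    feature_vector = eigenvectors[:, order[:k]]          # columns = chosen eigenvectors

    # FinalData = RowFeatureVector x RowDataAdjust
    final_data = feature_vector.T @ data_adjust.T
    print(final_data.shape)                              # (k, number of observations)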
24 Algorithm (contd)
- Retrieving the old data (e.g. in data compression)
- RetrievedRowData = (RowFeatureVector^T x FinalData) + OriginalMean
- This yields the original data using the chosen components (a numpy sketch follows)
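A minimal sketch of the reconstruction step, reusing the same made-up data as above:

    import numpy as np

    data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
                     [1.9, 2.2], [3.1, 3.0]])            # hypothetical original data
    original_mean = data.mean(axis=0)
    data_adjust = data - original_mean

    eigenvalues, eigenvectors = np.linalg.eig(np.cov(data_adjust, rowvar=False))
    order = np.argsort(eigenvalues)[::-1]
    feature_vector = eigenvectors[:, order[:1]]          # keep only the first PC

    final_data = feature_vector.T @ data_adjust.T        # projection from the previous slide

    # RetrievedRowData = (RowFeatureVector^T x FinalData) + OriginalMean
    retrieved = (feature_vector @ final_data).T + original_mean
    print(retrieved)     # approximates the original data; exact if all PCs are kept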
25 Algorithm (contd)
- Estimating the number of PCs
- Scree test: Plotting the eigenvalues against the corresponding PC numbers produces a scree plot that illustrates the rate of change in the magnitude of the eigenvalues for the PCs. The rate of decline tends to be fast at first and then levels off. The elbow, the point at which the curve bends, is considered to indicate the maximum number of PCs to extract. One less PC than the number at the elbow might be appropriate if you are concerned about getting an overly defined solution. (A plotting sketch follows.)
26 Applications
27 Applications
- Example applications
- Computer Vision
- Representation
- Pattern Identification
- Image compression
- Face recognition
- Gene expression analysis
- Purpose: Determine a core set of conditions for useful gene comparison
- Handwritten character recognition
- Data compression, etc.
28 Conclusion
29 Conclusion
- PCA can be useful when there is a high degree of correlation among the attributes
- When a data set consists of several clusters, the principal axes found by PCA usually pick projections with good separation; PCA provides an effective basis for feature extraction in this case
- For good data compression, PCA offers a useful self-organized learning procedure
30 Conclusion (contd)
- Shortcomings of PCA
- PCA requires diagonalising the matrix C (of dimension n x n), which is heavy if n is large
- PCA only finds linear sub-spaces
- It works best if the individual components are Gaussian-distributed (ICA, by contrast, does not rely on such a distribution)
- PCA does not say how many target dimensions to use
31 Questions?