Principal Component Analysis - PowerPoint PPT Presentation

About This Presentation

Title:

Principal Component Analysis

Slides: 12

Provided by: OpticsC

Transcript and Presenter's Notes

Title: Principal Component Analysis

1
Principal Component Analysis

Adam Day

2
Introduction

Principal Component Analysis (PCA) is a
statistical technique that can be used for many
things, including data compression and shape
recognition.
A fluorescence spectrum has a shape which PCA can
recognise.

3
Method of PCA part 1. The Covariance Matrix

It is important to understand a set of data in
terms of its dimensions and how they vary
together.
For a set of 2-dimensional data it is important
to know whether high values in one dimension tend
to occur with high values in the other, or the
opposite, where high values in one dimension tend
to occur with low values in the other.

4
Method of PCA part 1. The Covariance Matrix

One way of telling whether one dimension of data
varies with another is to find the covariance
between the two dimensions.
Covariance can be derived from the formula for
standard deviation, where X and Y are data
dimensions.
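
For reference, assuming the sample (n - 1) normalisation and writing \bar{X} for the mean of X, the variance (the square of the standard deviation) and the covariance can be written as below. The choice of n - 1 rather than n is an assumption, not something stated in the slides.

```latex
\operatorname{var}(X) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})}{n - 1}
\qquad
\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}
```

Replacing the second factor X_i - \bar{X} in the variance with Y_i - \bar{Y} gives the covariance, so var(X) is simply cov(X, X).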

It should be noted here that the first step in
finding the covariance is to subtract the mean of
the data in a dimension from each value in that
dimension. This mean-adjusted data set, called
"row data adjust", will be used again later.
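
The mean subtraction and the covariance of two dimensions can be sketched in a few lines of NumPy. The data values and the variable names (`data`, `row_data_adjust`, `cov_xy`) are made up for illustration, and the n - 1 normalisation is an assumption.

```python
import numpy as np

# Hypothetical 2-dimensional data set: one row per dimension,
# one column per sample.
data = np.array([
    [2.5, 0.5, 2.2, 1.9, 3.1],   # dimension X
    [2.4, 0.7, 2.9, 2.2, 3.0],   # dimension Y
])

# Subtract each dimension's mean from its values ("row data adjust").
row_data_adjust = data - data.mean(axis=1, keepdims=True)

# Sample covariance of X and Y (assumes the n - 1 normalisation).
n = data.shape[1]
cov_xy = (row_data_adjust[0] * row_data_adjust[1]).sum() / (n - 1)

print(f"cov(X, Y) = {cov_xy:.4f}")
```

The result is positive here because X and Y in this made-up data rise and fall together.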
5
Method of PCA part 1. The Covariance Matrix

When the covariance between two dimensions is
positive, the values in those two dimensions
increase together; when it is negative, one
increases as the other decreases.
When a data set comprises many dimensions, it is
useful to lay out the covariance values for every
pair of dimensions in a matrix, called the
covariance matrix.

6
Method of PCA part 1. The Covariance Matrix

From the definition of covariance, it should be
obvious that cov(X, Y) = cov(Y, X).
A consequence of this is that the covariance
matrix (3 x 3 in the case of 3-dimensional data)
is always a symmetric square matrix with the
variances down the main diagonal.
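
These properties are easy to check numerically. The following is a minimal sketch on randomly generated 3-dimensional data; the data, the seed and the variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(3, 50))   # hypothetical: 3 dimensions, 50 samples

# Mean-adjust each dimension, then form every pairwise covariance at once.
row_data_adjust = data - data.mean(axis=1, keepdims=True)
cov_matrix = row_data_adjust @ row_data_adjust.T / (data.shape[1] - 1)

# Symmetric: cov(X, Y) equals cov(Y, X) for every pair of dimensions.
is_symmetric = np.allclose(cov_matrix, cov_matrix.T)

# Main diagonal: the sample variance of each dimension.
diagonal_is_variance = np.allclose(np.diag(cov_matrix),
                                   data.var(axis=1, ddof=1))

print(cov_matrix.shape, is_symmetric, diagonal_is_variance)
```

NumPy's `np.cov(data)` computes the same matrix directly when the dimensions are stored in rows.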

7
Method of PCA part 2. Finding Principal Components

The principal components of the data set are,
simply, the eigenvectors of the covariance
matrix.
There are as many eigenvectors as there are
dimensions in the data set.
The eigenvector with the highest corresponding
eigenvalue is the first principal component. It
lies along the direction of greatest variance and
so contains the most information about the
original data set. This information will
correspond to large features in the shape of the
original data set.
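
As a sketch of this step, the eigenvectors of a covariance matrix can be found with `np.linalg.eigh` and sorted by eigenvalue. The correlated 2-dimensional data below is invented for illustration, and names such as `first_pc` and `explained` are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical correlated 2-D data: y roughly follows x.
x = rng.normal(size=200)
y = 0.8 * x + 0.2 * rng.normal(size=200)
data = np.vstack([x, y])          # dimensions in rows

cov_matrix = np.cov(data)

# eigh is intended for symmetric matrices. It returns eigenvalues in
# ascending order and the matching eigenvectors as columns.
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

# Reorder so the largest eigenvalue (first principal component) comes first.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

first_pc = eigenvectors[:, 0]
explained = eigenvalues / eigenvalues.sum()   # fraction of total variance

print("first principal component:", first_pc)
print("fraction of variance per component:", explained)
```

There are two eigenvectors because the data has two dimensions, and nearly all of the variance falls on the first one because the two dimensions are strongly correlated.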

[Figure: original data and its first principal component]

8
Method of PCA part 2. Finding Principal Components

As the corresponding eigenvalue falls, so too
does the size of the features represented by the
eigenvector. This specimen set of results shows
principal components 1-5 of a set of data
comprising 4 peaks of random height added to 200
points of random noise. Notice that the peak
heights fall as the PC number rises. In fact, all
of the principal components above PC4 contain
only noise.
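
A comparable experiment can be sketched with synthetic data. The slides do not say how the specimen data was generated, so everything below is an assumption: 50 spectra, Gaussian peak shapes at made-up positions, uniformly random peak heights and a small Gaussian noise level.

```python
import numpy as np

rng = np.random.default_rng(42)
n_points, n_spectra = 200, 50     # 200 points as in the text; 50 spectra assumed
centres = [30, 80, 120, 170]      # hypothetical peak positions
x = np.arange(n_points)

# Four Gaussian peak shapes, one per row.
peaks = np.array([np.exp(-0.5 * ((x - c) / 3.0) ** 2) for c in centres])

# Each spectrum: four peaks of random height plus random noise.
heights = rng.uniform(0.0, 1.0, size=(n_spectra, 4))
noise = 0.01 * rng.normal(size=(n_spectra, n_points))
spectra = heights @ peaks + noise          # one spectrum per row

# Treat each of the 200 points as a dimension (columns are variables).
cov_matrix = np.cov(spectra, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
eigenvalues = eigenvalues[::-1]            # largest first
eigenvectors = eigenvectors[:, ::-1]

for i in range(5):
    print(f"PC{i + 1}: eigenvalue = {eigenvalues[i]:.6f}")
```

Under these assumptions the first four eigenvalues stand well clear of the fifth, which matches the observation that components beyond the number of independent peaks carry only noise.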

[Figure: original data and principal components PC1 to PC5]

9
Method of PCA part 3. Data Reconstruction

If the matrix dot product of a single principal
component and the row data adjust data set were
taken, then the output data set (called final
data) would have a similar shape to that of the
original data. However, it is likely that it
would be a poor representation of the original
data.

10
Method of PCA part 3. Data Reconstruction

If the dot product of all the principal
components lined up in a matrix (called a row
feature vector) and the row data adjust data set
were taken, no information would be lost:
transforming the result back and restoring the
means would return exactly the original data set.
So, if we could choose which principal components
were put into the row feature vector, then it
would be possible to control the detail of the
data returned.
In the previous example, it was clear that there
was no significant data relating to the peaks in
the input graph in any principal component above
number 4.
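
The reconstruction step can be sketched as a small function. The function name `pca_reconstruct` and the test data are hypothetical. The transform back to the original axes (multiplying by the transpose of the row feature vector and adding the means back on) is the usual way to invert the projection and is assumed here, since the slides do not spell it out.

```python
import numpy as np

def pca_reconstruct(data, n_components):
    """Rebuild `data` (dimensions in rows, samples in columns) from its
    first `n_components` principal components."""
    mean = data.mean(axis=1, keepdims=True)
    row_data_adjust = data - mean

    eigenvalues, eigenvectors = np.linalg.eigh(np.cov(data))
    order = np.argsort(eigenvalues)[::-1][:n_components]

    # Chosen principal components as rows: the "row feature vector".
    row_feature_vector = eigenvectors[:, order].T

    # Project onto the chosen components ("final data").
    final_data = row_feature_vector @ row_data_adjust

    # Map back to the original axes and restore the means.
    return row_feature_vector.T @ final_data + mean

rng = np.random.default_rng(0)
mixing = rng.normal(size=(5, 5))
data = mixing @ rng.normal(size=(5, 100))   # hypothetical 5-dimensional data

full = pca_reconstruct(data, 5)      # every component kept
partial = pca_reconstruct(data, 1)   # first component only

error_full = np.abs(full - data).max()
error_partial = np.abs(partial - data).max()
print(f"max error, all components: {error_full:.2e}")
print(f"max error, one component:  {error_partial:.2e}")
```

Keeping every component returns the original data to within rounding error, while keeping only the first gives a coarser approximation, so the number of components controls the level of detail returned.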

11
PCA in Fluorescence Spectroscopy

In the study of drug fluorescence, it is
necessary to study spectra similar to the
patterns of data shown in previous examples.
These are created by shining light from a
source onto a drug sample; the resulting spectrum
is then detected and received by a computer as
data.
These spectra are likely to contain large shapes
caused by unwanted signals, such as reflection of
light from the source, and noise caused by a
number of things, such as electrical interference
and fluctuations in the output of the source.
Principal Component Analysis can remove these
unwanted shapes from spectra by careful choice of
the principal components included in the row
feature vector mentioned earlier.
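
As an illustration of the idea only, the sketch below removes noise from synthetic spectra by keeping the four largest principal components. It is not the author's procedure: the spectra, peak positions, noise level and number of components kept are all assumptions, and here each spectrum is stored as a row rather than a column.

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.arange(200)
centres = [30, 80, 120, 170]      # hypothetical peak positions
peaks = np.array([np.exp(-0.5 * ((x - c) / 3.0) ** 2) for c in centres])

# 50 hypothetical spectra (one per row): clean peaks plus measurement noise.
clean = rng.uniform(0.2, 1.0, size=(50, 4)) @ peaks
noisy = clean + 0.05 * rng.normal(size=clean.shape)

mean = noisy.mean(axis=0)
adjusted = noisy - mean

# Principal components of the noisy spectra (each point is a dimension).
eigenvalues, eigenvectors = np.linalg.eigh(np.cov(noisy, rowvar=False))
keep = np.argsort(eigenvalues)[::-1][:4]    # keep the 4 largest (assumed)
feature_vector = eigenvectors[:, keep]      # kept components as columns

# Project onto the kept components, map back, and restore the mean.
denoised = adjusted @ feature_vector @ feature_vector.T + mean

rms_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rms_denoised = np.sqrt(np.mean((denoised - clean) ** 2))
print(f"RMS error before: {rms_noisy:.4f}, after: {rms_denoised:.4f}")
```

The same selection could in principle leave out a component dominated by an unwanted shape such as reflected source light, but which component that is depends on the data and would need to be checked case by case.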