Principal Component Analysis - PowerPoint PPT Presentation

About This Presentation
Title:

Principal Component Analysis


Slides: 12
Provided by: OpticsC

Transcript and Presenter's Notes

Title: Principal Component Analysis


1
Principal Component Analysis
  • Adam Day

2
Introduction
  • Principal Component Analysis (PCA) is a
    statistical technique that can be used for many
    tasks, including data compression and shape
    recognition.
  • A fluorescence spectrum has a characteristic
    shape, which PCA can recognise.

3
Method of PCA part 1. The Covariance Matrix
  • It is important to understand a set of data in
    terms of its dimensions and how they vary
    together.
  • For a set of 2-dimensional data it is important
    to know whether high values in one dimension tend
    to occur with high values in the other, or the
    opposite, where high values in one dimension
    occur with low values in the other.

4
Method of PCA part 1. The Covariance Matrix
  • One way of telling whether one dimension of data
    varies with another is to find the covariance
    between the two dimensions.
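  • Covariance can be derived from the formula for
    variance (the square of the standard deviation)
    by replacing one of the X terms with Y, where X
    and Y are data dimensions, each with n values:
    $$\mathrm{var}(X) = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(X_i-\bar{X})}{n-1}$$
    $$\mathrm{cov}(X,Y) = \frac{\sum_{i=1}^{n}(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}$$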

It should be noted here that the first step in
finding the covariance is to subtract the mean of
each dimension from every value in that dimension.
The resulting mean-adjusted data set, called the
row data adjust, will be used again later.
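As a minimal sketch of the covariance calculation described on this slide, the Python below subtracts each dimension's mean (the mean-adjustment step) and then averages the products of the adjusted values. It assumes the n - 1 (sample) divisor, plain lists of numbers, and the hypothetical function names `mean_adjust` and `covariance`; the example data are invented.

```python
def mean_adjust(values):
    """Subtract the mean of one dimension from every value in it."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def covariance(xs, ys):
    """Covariance of two dimensions, using the n - 1 (sample) divisor (an assumption)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two dimensions of equal length with at least 2 values")
    dx = mean_adjust(xs)
    dy = mean_adjust(ys)
    # Average of the products of the mean-adjusted values.
    return sum(a * b for a, b in zip(dx, dy)) / (len(xs) - 1)


# Hypothetical example data.
x = [1.0, 2.0, 3.0]
print(covariance(x, [2.0, 4.0, 6.0]))  # 2.0: both dimensions rise together
print(covariance(x, [6.0, 4.0, 2.0]))  # -2.0: one falls as the other rises
print(covariance(x, x))                # 1.0: equal to the variance of x
```

The last call illustrates that the covariance of a dimension with itself is simply its variance.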
5
Method of PCA part 1. The Covariance Matrix
  • When the covariance between two dimensions is
    positive, the values in those two dimensions
    increase together; when it is negative, one
    tends to decrease as the other increases.
  • When a data set has many dimensions, it is
    convenient to lay out the covariance between
    every pair of dimensions in a matrix: the
    covariance matrix.

6
Method of PCA part 1. The Covariance Matrix
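  • From the definition of covariance, it should be
    clear that $\mathrm{cov}(X,Y) = \mathrm{cov}(Y,X)$,
    and that $\mathrm{cov}(X,X) = \mathrm{var}(X)$.
  • A consequence of this is that the covariance
    matrix is always a symmetric square matrix with
    the variances down the main diagonal. For
    3-dimensional data (dimensions x, y and z) it is:
    $$C = \begin{pmatrix} \mathrm{cov}(x,x) & \mathrm{cov}(x,y) & \mathrm{cov}(x,z) \\ \mathrm{cov}(y,x) & \mathrm{cov}(y,y) & \mathrm{cov}(y,z) \\ \mathrm{cov}(z,x) & \mathrm{cov}(z,y) & \mathrm{cov}(z,z) \end{pmatrix}$$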
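Extending the covariance calculation to every pair of dimensions gives the covariance matrix. The sketch below is a minimal illustration, assuming data stored as one list per dimension (matching the row layout used in these slides) and the n - 1 divisor; `covariance_matrix` is a hypothetical helper name and the data are invented. It builds the matrix so the symmetry and the diagonal of variances can be checked directly.

```python
def covariance_matrix(dimensions):
    """Covariance matrix for data stored as one list of values per dimension.

    Entry [i][j] is cov(dimension i, dimension j), using the n - 1 divisor.
    """
    n = len(dimensions[0])
    if n < 2 or any(len(dim) != n for dim in dimensions):
        raise ValueError("every dimension needs the same number (>= 2) of values")
    # Mean-adjust each dimension first (the "row data adjust" step).
    adjusted = []
    for dim in dimensions:
        mean = sum(dim) / n
        adjusted.append([v - mean for v in dim])
    size = len(adjusted)
    return [
        [sum(a * b for a, b in zip(adjusted[i], adjusted[j])) / (n - 1) for j in range(size)]
        for i in range(size)
    ]


# Hypothetical 3-dimensional data set: y rises with x, z falls as x rises.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
z = [4.0, 3.0, 2.0, 1.0]
c = covariance_matrix([x, y, z])
for row in c:
    print(row)

# The matrix is symmetric: cov(a, b) == cov(b, a).
assert all(c[i][j] == c[j][i] for i in range(3) for j in range(3))
```

The negative entries in the x-z and y-z positions reflect z falling while x and y rise.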

7
Method of PCA part 2. Finding Principal Components
  • The principal components of the data set are
    simply the eigenvectors of the covariance
    matrix.
  • There are as many eigenvectors as there are
    dimensions in the data set.
  • The eigenvector with the largest eigenvalue
    points along the direction in which the data
    vary most, so it carries the most information
    about the original data set. This information
    corresponds to the large features in the shape
    of the original data set.

[Figure: the original data and its first principal component]
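Finding the principal components therefore means finding the eigenvectors of a symmetric matrix. The slides do not say which numerical method was used; as a stand-in, the sketch below uses the cyclic Jacobi eigenvalue method in plain Python, which suits small symmetric matrices. The function name `jacobi_eigen`, the tolerance and the sweep limit are assumptions, and the example matrix is hypothetical. In practice a library routine such as NumPy's `numpy.linalg.eigh` would normally be used instead.

```python
import math


def jacobi_eigen(matrix, tol=1e-24, max_sweeps=50):
    """Eigenvalues and eigenvectors of a real symmetric matrix (cyclic Jacobi method).

    Returns (eigenvalue, eigenvector) pairs sorted by decreasing eigenvalue, so the
    first pair corresponds to the first principal component.
    """
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]  # working copy, rotated towards diagonal
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # eigenvectors as columns
    scale = sum(x * x for row in a for x in row)  # squared Frobenius norm, unchanged by rotations
    for _ in range(max_sweeps):
        off = sum(a[i][j] * a[i][j] for i in range(n) for j in range(n) if i != j)
        if off <= tol * scale:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Choose the rotation angle that makes a[p][q] zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                # a <- a * P: mix columns p and q.
                for r in range(n):
                    arp, arq = a[r][p], a[r][q]
                    a[r][p] = c * arp - s * arq
                    a[r][q] = s * arp + c * arq
                # a <- P^T * a: mix rows p and q.
                for r in range(n):
                    apr, aqr = a[p][r], a[q][r]
                    a[p][r] = c * apr - s * aqr
                    a[q][r] = s * apr + c * aqr
                # v <- v * P: accumulate the rotations.
                for r in range(n):
                    vrp, vrq = v[r][p], v[r][q]
                    v[r][p] = c * vrp - s * vrq
                    v[r][q] = s * vrp + c * vrq
    pairs = [(a[k][k], [v[r][k] for r in range(n)]) for k in range(n)]
    pairs.sort(key=lambda pair: pair[0], reverse=True)
    return pairs


# Hypothetical 2 x 2 covariance matrix.
cov = [[2.5, 2.0], [2.0, 2.5]]
for value, vector in jacobi_eigen(cov):
    print(value, vector)  # largest eigenvalue (first principal component) first
```

Because the pairs are sorted by eigenvalue, taking the vectors from the first k pairs gives the first k principal components.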
8
Method of PCA part 2. Finding Principal Components
  • As the corresponding eigenvalue falls, so does
    the size of the features represented by the
    eigenvector. This specimen set of results shows
    principal components 1 to 5 for a data set made
    of 4 peaks of random height added to 200 points
    of random noise. Notice that the peak heights
    fall as the PC number rises; in fact, every
    principal component above PC4 contains only
    noise.

[Figure: the original data and principal components PC1 to PC5]
9
Method of PCA part 3. Data Reconstruction
  • If the matrix product of a single principal
    component (such as PC1) and the row data adjust
    data set were taken, the output data set (called
    final data) would describe the data in terms of
    that one component only. Mapped back into the
    original dimensions, it would have a shape
    similar to that of the original data, but it
    would probably be a poor representation of it.
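In matrix form, and following the naming used in these slides, the final data step and its reverse can be written as below. This is a sketch that assumes the rows of RowFeatureVector are unit-length, mutually orthogonal eigenvectors (true for the eigenvectors of a symmetric covariance matrix), so that its transpose undoes the projection. The primes, the vector of dimension means $\bar{\mathbf{x}}$ and the column vector of N ones $\mathbf{1}$ are notation introduced here.

```latex
\begin{aligned}
\mathrm{FinalData} &= \mathrm{RowFeatureVector} \times \mathrm{RowDataAdjust} \\
\mathrm{RowDataAdjust}' &= \mathrm{RowFeatureVector}^{\mathsf{T}} \times \mathrm{FinalData} \\
\mathrm{RowOriginalData}' &= \mathrm{RowDataAdjust}' + \bar{\mathbf{x}}\,\mathbf{1}^{\mathsf{T}}
\end{aligned}
```

With d dimensions, N observations and k chosen components, RowFeatureVector is k × d, RowDataAdjust is d × N and FinalData is k × N. With a single component, FinalData holds one value per observation, and RowDataAdjust′ equals RowDataAdjust exactly only when all d components are kept.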

10
Method of PCA part 3. Data Reconstruction
  • If all of the principal components are lined up
    as the rows of a matrix (called the row feature
    vector), taking its product with the row data
    adjust data set and then transforming the result
    back (multiplying by the transpose of the row
    feature vector and adding the means back)
    returns exactly the original data set.
  • So, by choosing which principal components go
    into the row feature vector, it is possible to
    control the level of detail in the data returned.
  • In the previous example, there was clearly no
    significant information about the peaks in the
    input graph in any principal component above
    number 4, so a row feature vector built from
    PC1 to PC4 should keep the peaks while
    discarding most of the noise.
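The reconstruction idea can be sketched in plain Python as follows. This is a minimal illustration rather than the author's code: the helper names are hypothetical, the data set is invented, and its two principal components were worked out by hand (its covariance matrix is [[2.5, 2.0], [2.0, 2.5]], with unit eigenvectors (1, 1)/√2 and (-1, 1)/√2). Keeping both components returns the original data; keeping only PC1 returns a simplified version in which every point lies on the line through the mean along PC1.

```python
import math


def mean_adjust_rows(rows):
    """Subtract each dimension's mean; return (row_data_adjust, means)."""
    means = [sum(r) / len(r) for r in rows]
    return [[v - m for v in r] for r, m in zip(rows, means)], means


def matmul(a, b):
    """Plain-Python matrix product of a (m x k) and b (k x n)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def reconstruct(rows, row_feature_vector):
    """Project mean-adjusted data onto the chosen components, then map it back."""
    row_data_adjust, means = mean_adjust_rows(rows)
    final_data = matmul(row_feature_vector, row_data_adjust)    # k x N
    restored = matmul(transpose(row_feature_vector), final_data)  # d x N
    return [[v + m for v in r] for r, m in zip(restored, means)]


# Hypothetical 2-dimensional data set (rows = dimensions, columns = observations).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = 1 / math.sqrt(2)
pc1 = [r, r]   # eigenvalue 4.5
pc2 = [-r, r]  # eigenvalue 0.5

full = reconstruct([x, y], [pc1, pc2])  # every component kept: original data back
reduced = reconstruct([x, y], [pc1])    # only PC1 kept: detail along PC2 discarded
print(full)
print(reduced)
```

Swapping which vectors go into the row feature vector is the only change needed to control how much detail comes back.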

11
PCA in Fluorescence Spectroscopy
  • In the study of drug fluorescence, it is
    necessary to analyse spectra similar to the
    patterns of data shown in the previous examples.
  • These are created by shining light from a source
    onto a drug sample; the light emitted by the
    sample is then detected and received by a
    computer as data.
  • These spectra are likely to contain unwanted
    features: large shapes caused by stray signals,
    such as light reflected from the source, and
    noise from causes such as electrical
    interference and fluctuations in the output of
    the source.
  • Principal Component Analysis will be able to
    remove these unwanted shapes from the spectra
    through careful construction of the row feature
    vector described earlier.