Principal Components Analysis - PowerPoint PPT Presentation

Provided by: HalWhi9
Title: Principal Components Analysis


1
Principal Components Analysis
  • Hal Whitehead
  • BIOL4062/5062

2
Principal Components Analysis (PCA)
  • A.K.A.
    • latent vectors
    • latent variates
    • principal axes
    • principal factors
    • etc.

3
Principal Components Analysis: Principal purpose
  • Reducing dimensionality
  • large body of data to manageable set

4
PCA General methodology
  • From k original variables x1, x2, ..., xk
  • Produce k new variables y1, y2, ..., yk
  • y1 = a11 x1 + a12 x2 + ... + a1k xk
  • y2 = a21 x1 + a22 x2 + ... + a2k xk
  • ...
  • yk = ak1 x1 + ak2 x2 + ... + akk xk

5
PCA General methodology
  • From k original variables x1, x2, ..., xk
  • Produce k new variables y1, y2, ..., yk
  • y1 = a11 x1 + a12 x2 + ... + a1k xk
  • y2 = a21 x1 + a22 x2 + ... + a2k xk
  • ...
  • yk = ak1 x1 + ak2 x2 + ... + akk xk

such that:
  • the yk's are uncorrelated (orthogonal)
  • y1 explains as much as possible of the original variance in the data set
  • y2 explains as much as possible of the remaining variance
  • etc.
6
Principal Components Analysis
7
PCA General methodology
  • From k original variables x1, x2, ..., xk
  • Produce k new variables y1, y2, ..., yk
  • y1 = a11 x1 + a12 x2 + ... + a1k xk
  • y2 = a21 x1 + a22 x2 + ... + a2k xk
  • ...
  • yk = ak1 x1 + ak2 x2 + ... + akk xk

The yk's are the principal components, chosen such that:
  • the yk's are uncorrelated (orthogonal)
  • y1 explains as much as possible of the original variance in the data set
  • y2 explains as much as possible of the remaining variance
  • etc.
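
The construction above can be sketched numerically. This is a minimal NumPy sketch, not the presenter's code: the helper name `principal_components` and the synthetic data are assumptions made for illustration. It builds the new variables from the eigenvectors of the covariance matrix and leaves the covariance of the result in `cov_Y`, so the two defining properties (uncorrelated components, decreasing variance) can be inspected.

```python
import numpy as np

def principal_components(X):
    """Return (A, eigenvalues, Y) for data X (units in rows, variables in columns).

    Row j of A holds the coefficients a_j1 ... a_jk of component y_j.
    """
    Xc = X - X.mean(axis=0)                 # centre each variable
    S = np.cov(Xc, rowvar=False)            # k x k covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # reorder so y1 has the largest variance
    eigvals = eigvals[order]
    A = eigvecs[:, order].T                 # rows of A are the eigenvectors
    Y = Xc @ A.T                            # new variables y1..yk for every unit
    return A, eigvals, Y

rng = np.random.default_rng(0)
# Synthetic example (made-up data): 200 units, 3 correlated variables.
X = rng.normal(size=(200, 3)) @ np.array([[2.0, 0.5, 0.0],
                                          [0.0, 1.0, 0.3],
                                          [0.0, 0.0, 0.5]])
A, eigvals, Y = principal_components(X)
cov_Y = np.cov(Y, rowvar=False)             # should be diagonal, with eigvals on the diagonal
print(np.round(cov_Y, 6))
```

The off-diagonal entries of `cov_Y` vanish up to rounding error, and its diagonal matches the eigenvalues in decreasing order.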
8
Principal Components Analysis
  • Rotates multivariate dataset into a new
    configuration which is easier to interpret
  • Purposes
    • simplify data
    • look at relationships between variables
    • look at patterns of units

9
Principal Components Analysis
  • Uses
    • Correlation matrix, or
    • Covariance matrix when variables are in the same units
      (morphometrics, etc.)

10
Principal Components Analysis
  • (a11, a12, ..., a1k) is the 1st eigenvector of the
    correlation/covariance matrix, and gives the coefficients
    of the 1st principal component
  • (a21, a22, ..., a2k) is the 2nd eigenvector of the
    correlation/covariance matrix, and gives the coefficients
    of the 2nd principal component
  • ...
  • (ak1, ak2, ..., akk) is the kth eigenvector
    of the correlation/covariance matrix,
    and gives the coefficients of the kth principal component

11
Principal Components Analysis
  • So, principal components are given by
  • y1 = a11 x1 + a12 x2 + ... + a1k xk
  • y2 = a21 x1 + a22 x2 + ... + a2k xk
  • ...
  • yk = ak1 x1 + ak2 x2 + ... + akk xk
  • xj's are standardized if the correlation matrix is
    used (mean 0.0, SD 1.0)

12
Principal Components Analysis
  • Score of ith unit on jth principal component
  • yi,j = aj1 xi1 + aj2 xi2 + ... + ajk xik
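
The score formula can also be written in matrix form. In the sketch below, X (the n-by-k data matrix, centred and, for a correlation-matrix analysis, standardized), A (the k-by-k matrix whose jth row is the jth eigenvector) and Y (the n-by-k matrix of scores) are symbols introduced here for illustration; they do not appear in the slides.

```latex
y_{ij} \;=\; \sum_{m=1}^{k} a_{jm}\, x_{im}
\qquad\Longleftrightarrow\qquad
Y \;=\; X A^{\mathsf{T}}
```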

13
PCA Scores
14
Principal Components Analysis
  • Amount of variance accounted for by
    • 1st principal component: λ1, the 1st eigenvalue
    • 2nd principal component: λ2, the 2nd eigenvalue
    • ...
  • λ1 > λ2 > λ3 > λ4 > ...
  • Average λj = 1 (correlation matrix)
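
The average-eigenvalue property for a correlation matrix follows from the trace. Writing R for the k-by-k correlation matrix (a symbol assumed here, not used in the slides), every diagonal entry of R is 1 and the eigenvalues of any square matrix sum to its trace:

```latex
\sum_{j=1}^{k} \lambda_j \;=\; \operatorname{tr}(R) \;=\; \sum_{j=1}^{k} 1 \;=\; k
\quad\Longrightarrow\quad
\bar{\lambda} \;=\; \frac{1}{k}\sum_{j=1}^{k} \lambda_j \;=\; 1
```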

15
Principal Components Analysis: Eigenvalues
16
PCA Terminology
  • jth principal component is jth eigenvector
    of correlation/covariance matrix
  • coefficients, ajk, are elements of eigenvectors
    and relate original variables (standardized if
    using correlation matrix) to components
  • scores are values of units on components
    (produced using coefficients)
  • amount of variance accounted for by component is
    given by eigenvalue, λj
  • proportion of variance accounted for by component
    is given by λj / Σ λj
  • loading of kth original variable on jth component
    is given by ajk √λj, which is the correlation between
    variable and component (when the correlation matrix is used)
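
The terminology above can be checked on a small example. This is a sketch under stated assumptions: the data are made up, the analysis uses the correlation matrix, and all variable names are illustrative. It computes scores, the proportion of variance, and the loadings as coefficient times square root of eigenvalue, then compares the loadings with correlations computed directly between each standardized variable and each component.

```python
import numpy as np

rng = np.random.default_rng(1)
# Made-up data: 300 units, 4 variables sharing a common signal.
common = rng.normal(size=(300, 1))
X = common @ np.array([[1.0, 0.8, 0.5, 0.1]]) + rng.normal(size=(300, 4))

Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)   # standardize: mean 0, SD 1
R = np.corrcoef(X, rowvar=False)                    # correlation matrix
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]                   # largest eigenvalue first
eigvals, A = eigvals[order], eigvecs[:, order].T    # row j of A = coefficients a_jk

scores = Z @ A.T                                    # values of units on components
proportion = eigvals / eigvals.sum()                # lambda_j / sum of lambda_j
loadings = A * np.sqrt(eigvals)[:, None]            # a_jk * sqrt(lambda_j)

# Direct check: correlation between variable v and component j.
k = X.shape[1]
direct = np.array([[np.corrcoef(Z[:, v], scores[:, j])[0, 1] for v in range(k)]
                   for j in range(k)])
print(np.allclose(loadings, direct))
```

Here `loadings[j, v]` is the loading of variable `v` on component `j`, following the index order of ajk in the slides.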

17
How many components to use?
  • If λj < 1 then component explains less variance
    than an original variable (correlation matrix)
  • Use 2 components (or 3) for visual ease
  • Scree diagram
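
The first and third criteria can be sketched in a few lines. The helper names `kaiser_count` and `scree_text` and the example eigenvalues are hypothetical, chosen only to illustrate the idea; a real scree diagram would normally be drawn with a plotting library rather than text bars.

```python
import numpy as np

def kaiser_count(eigenvalues):
    """Number of components with eigenvalue > 1 (meaningful for a correlation matrix)."""
    return int(np.sum(np.asarray(eigenvalues, dtype=float) > 1.0))

def scree_text(eigenvalues, width=40):
    """Render a crude text scree diagram: one bar per eigenvalue."""
    lam = np.asarray(eigenvalues, dtype=float)
    lines = []
    for j, value in enumerate(lam, start=1):
        bar = "#" * int(round(width * value / lam.max()))
        lines.append(f"PC{j:<2d} {value:6.3f} {bar}")
    return "\n".join(lines)

# Hypothetical eigenvalues from a 6-variable correlation matrix (they sum to 6).
example = [2.9, 1.4, 0.8, 0.5, 0.3, 0.1]
n_keep = kaiser_count(example)
print("components with eigenvalue > 1:", n_keep)
print(scree_text(example))
```

In the text diagram, the point where the bars stop shrinking sharply (the "elbow") is the usual visual cue for how many components to retain.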

18
Principal Components Analysis on
  • Covariance Matrix
    • Variables must be in same units
    • Emphasizes variables with most variance
    • Mean eigenvalue ≠ 1.0
    • Useful in morphometrics, a few other cases
  • Correlation Matrix
    • Variables are standardized (mean 0.0, SD 1.0)
    • Variables can be in different units
    • All variables have same impact on analysis
    • Mean eigenvalue = 1.0
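
The contrast between the two matrices shows up clearly when variables are deliberately put in different units. The sketch below uses made-up measurements (the variable names and scales are assumptions for illustration only): the first covariance-matrix component is dominated by the high-variance variable, while the correlation-matrix eigenvalues average 1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
# Made-up measurements in different units: a length in mm, a mass in kg, a count.
length_mm = rng.normal(1500.0, 200.0, size=n)
mass_kg = 0.01 * length_mm + rng.normal(0.0, 1.0, size=n)
count = rng.normal(5.0, 1.0, size=n)
X = np.column_stack([length_mm, mass_kg, count])

def sorted_eigen(M):
    """Eigenvalues (descending) and matching eigenvectors (as columns) of symmetric M."""
    vals, vecs = np.linalg.eigh(M)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]

cov_vals, cov_vecs = sorted_eigen(np.cov(X, rowvar=False))
cor_vals, cor_vecs = sorted_eigen(np.corrcoef(X, rowvar=False))

print("covariance PC1 coefficients: ", np.round(cov_vecs[:, 0], 3))
print("correlation PC1 coefficients:", np.round(cor_vecs[:, 0], 3))
print("mean eigenvalue (covariance): ", cov_vals.mean())
print("mean eigenvalue (correlation):", cor_vals.mean())
```

Rescaling `length_mm` to metres would change the covariance-matrix result but leave the correlation-matrix result unchanged, which is why the covariance matrix is reserved for variables in the same units.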

19
PCA Potential Problems
  • Lack of Independence
    • NO PROBLEM
  • Lack of Normality
    • Normality desirable but not essential
  • Lack of Precision
    • Precision desirable but not essential
  • Many Zeroes in Data Matrix
    • Problem (use Correspondence Analysis)

20
Hourly records of sperm whale behaviour
  • Data collected
    • Off Galapagos Islands
    • 1985 and 1987
  • Units
    • hours spent following sperm whales
    • 440 hours
  • Variables
    • Mean cluster size
    • Max. cluster size
    • Mean speed
    • Heading consistency
    • Fluke-up rate
    • Breach rate
    • Lobtail rate
    • Spyhop rate
    • Sidefluke rate
    • Coda rate
    • Creak rate
    • High click rate

21
(No Transcript)
22
(No Transcript)
23
Scores plots
24
Rotations of Principal Components (Exploratory Factor Analysis)
  • Factors are rotated components
    • (just rotate a few principal components)
  • Varimax tries to maximize variance of squared
    loadings for each factor (orthogonal)
    • lines up factors with original variables
    • improves interpretability of factors
  • Quartimax tries to minimize sums of squares of
    products of loadings (orthogonal)
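
A varimax rotation can be sketched with the widely used SVD-based iteration. This is one common way to compute it, not necessarily what the presenter's software does; the function names, iteration limit and tolerance are assumptions. The input is the two-component crime loading matrix from this deck.

```python
import numpy as np

def varimax(loadings, max_iter=100, tol=1e-8):
    """Orthogonally rotate a (variables x factors) loading matrix by varimax.

    Returns the rotated loadings and the orthogonal rotation matrix.
    """
    L = np.asarray(loadings, dtype=float)
    p, m = L.shape
    rotation = np.eye(m)
    previous = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        # Target derived from the gradient of the varimax criterion.
        target = rotated**3 - rotated @ np.diag((rotated**2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(L.T @ target)
        rotation = u @ vt                      # orthogonal by construction
        if s.sum() < previous * (1.0 + tol):   # stop when improvement stalls
            break
        previous = s.sum()
    return L @ rotation, rotation

def varimax_criterion(L):
    """Sum over factors of the variance of the squared loadings."""
    return float(np.var(np.asarray(L, dtype=float)**2, axis=0).sum())

# Component loadings for the crime data, copied from the deck (7 variables x 2 components).
unrotated = np.array([[0.557, -0.771], [0.851, -0.139], [0.782, 0.055],
                      [0.784, -0.546], [0.881, 0.308], [0.728, 0.480],
                      [0.714, 0.438]])
rotated, rotation = varimax(unrotated)
print(np.round(rotated, 3))
print("criterion before:", varimax_criterion(unrotated))
print("criterion after: ", varimax_criterion(rotated))
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the way that variance is divided between the factors changes.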

25
US Crime Statistics
  • Variables
    • Murder
    • Rape
    • Robbery
    • Assault
    • Burglary
    • Larceny
    • Autotheft
  • Units
    • States

26
Crime Statistics
  • Component loadings

                   1        2
    MURDER       0.557   -0.771
    RAPE         0.851   -0.139
    ROBBERY      0.782    0.055
    ASSAULT      0.784   -0.546
    BURGLARY     0.881    0.308
    LARCENY      0.728    0.480
    AUTOTHFT     0.714    0.438
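
The loadings on this slide can be turned back into eigenvalues with a little arithmetic. The sketch assumes the analysis used the correlation matrix (the slide does not say so explicitly), so that each loading is ajk √λj and the squared loadings of a component sum to its eigenvalue.

```python
import numpy as np

variables = ["MURDER", "RAPE", "ROBBERY", "ASSAULT", "BURGLARY", "LARCENY", "AUTOTHFT"]
# Loadings copied from the slide: column 0 = component 1, column 1 = component 2.
loadings = np.array([[0.557, -0.771], [0.851, -0.139], [0.782, 0.055],
                     [0.784, -0.546], [0.881, 0.308], [0.728, 0.480],
                     [0.714, 0.438]])

# Assumption: correlation-matrix PCA, so total variance = number of variables.
eigenvalues = (loadings**2).sum(axis=0)        # sum of squared loadings per component
proportion = eigenvalues / len(variables)      # share of total variance per component
communality = (loadings**2).sum(axis=1)        # variance of each variable explained by both

print("eigenvalues:", np.round(eigenvalues, 3))
print("proportion: ", np.round(proportion, 3))
for name, h2 in zip(variables, communality):
    print(f"{name:<9s} {h2:.3f}")
```

Under that assumption the two eigenvalues come out at about 4.08 and 1.43, so the components account for roughly 58% and 20% of the total variance.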

27
Crime Statistics Component Loadings
28
Crime Statistics Scores Plot
(Plot regions labelled: crimes against people; crimes against property)
29
Procedure for principal components analysis
  • 1. Decide whether to use correlation or
    covariance matrix
  • 2. Find eigenvectors (components) and
    eigenvalues (variance accounted for)
  • 3. Decide how many components to use by
    examining eigenvalues (perhaps using scree
    diagram)
  • 4. Examine loadings (perhaps vector loading
    plot)
  • 5. Plot scores
  • 6. Try rotation, then go back to step 4
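
Steps 1 to 5 can be strung together in one short function. This is an outline sketch, not a definitive implementation: the name `run_pca`, its `use_correlation` switch and the random test data are all assumptions, and plotting and rotation are left out.

```python
import numpy as np

def run_pca(X, use_correlation=True):
    """Steps 1-5 in outline; returns a dict of results for inspection or plotting."""
    X = np.asarray(X, dtype=float)
    Z = X - X.mean(axis=0)
    if use_correlation:                        # step 1: standardize for a correlation matrix
        Z = Z / X.std(axis=0, ddof=1)
    M = np.cov(Z, rowvar=False)                # the correlation matrix when standardized
    eigvals, eigvecs = np.linalg.eigh(M)       # step 2: eigenvalues and eigenvectors
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return {
        "eigenvalues": eigvals,
        "proportion": eigvals / eigvals.sum(),                         # step 3: examine these
        "loadings": eigvecs * np.sqrt(np.clip(eigvals, 0.0, None)),    # step 4: variables x components
        "scores": Z @ eigvecs,                                         # step 5: units x components
    }

rng = np.random.default_rng(3)
# Made-up data: 50 units, 4 variables on very different scales.
X = rng.normal(size=(50, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])
result = run_pca(X, use_correlation=True)
print(np.round(result["proportion"], 3))
```

Note that `loadings` here is laid out with variables in rows and components in columns, matching the loading tables usually reported by statistics packages.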