PCA on raw data

Provided by: jakesc3
1
PCA on raw data
Importance of components
                          PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
Standard deviation     20.159 20.118 16.851 11.621 9.4234 8.8082 6.2834 6.0790
Proportion of Variance  0.256  0.255  0.179  0.085 0.0559 0.0488 0.0249 0.0233
Cumulative Proportion   0.256  0.510  0.689  0.774 0.8301 0.8789 0.9037 0.9270

                                   PC1           PC2           PC3
Ichthyomyzon.gagei        0.0017832346 -1.383224e-04 -1.268793e-03
Labidesthes.sicculus     -0.0028727706 -1.881688e-03 -2.055479e-03
Notemigonus.chrysoleucas  0.0002350518  2.011322e-04  2.584251e-04
Nocomis.leptocephalus    -0.0404813269  2.005198e-02  1.965699e-01
Ericymba.buccata         -0.1276139906 -3.517594e-02  1.969177e-01
Pimephales.vigilax       -0.0016238194 -5.924597e-04 -6.249391e-04
Cyprinella.venusta       -0.2071696573 -1.051431e-01  1.097405e-01
Luxilus.chrysocephalus   -0.2058364947  3.661838e-02  8.918647e-01
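
A minimal sketch of how output like the table above could be produced in R. The `community` matrix here is simulated and purely hypothetical; it stands in for the sites-by-species fish data, which is not included in the slides.

```r
# Hypothetical stand-in for the sites x species abundance matrix.
set.seed(1)
community <- matrix(rpois(20 * 8, lambda = 5), nrow = 20,
                    dimnames = list(paste0("site", 1:20),
                                    paste0("sp", 1:8)))

# PCA on raw abundances: centered, but not scaled to unit variance.
pca_raw <- prcomp(community, center = TRUE, scale. = FALSE)

summary(pca_raw)          # prints the "Importance of components" table
pca_raw$rotation[, 1:3]   # species coefficients on PC1 to PC3
```

Because the data are not scaled, species with large abundance variance tend to dominate the leading components.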
2
PCA on raw data
3
PCoA on raw data, Euclidean distance
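
A short sketch, on simulated (hypothetical) data, of why PCoA on Euclidean distances gives the same picture as PCA on the raw data: the sample scores agree up to a sign flip on each axis.

```r
# Hypothetical stand-in for the sites x species abundance matrix.
set.seed(1)
community <- matrix(rpois(20 * 8, lambda = 5), nrow = 20)

# PCoA (classical metric MDS) on Euclidean distances.
pcoa <- cmdscale(dist(community, method = "euclidean"), k = 2)

# PCA sample scores on the first two components.
pca <- prcomp(community)$x[, 1:2]

# Axes may be mirrored, so compare absolute correlations axis by axis.
abs(diag(cor(pcoa, pca)))
```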
4
PCA, scaled, no rare species
Importance of components
                          PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
Standard deviation      2.097  1.904 1.7647 1.4790 1.4664 1.3255 1.2615 1.2194
Proportion of Variance  0.129  0.107 0.0916 0.0643 0.0632 0.0517 0.0468 0.0437
Cumulative Proportion   0.129  0.236 0.3276 0.3919 0.4551 0.5068 0.5536 0.5974

                                PC1          PC2          PC3         PC4
Ichthyomyzon.gagei     -0.090135288  0.043747687 -0.002877967  0.01744758
Labidesthes.sicculus    0.041107995 -0.038163610  0.124468292 -0.01789019
Nocomis.leptocephalus   0.108923218 -0.108325946 -0.450937632  0.02454138
Ericymba.buccata        0.255164516 -0.036053266 -0.218276882  0.04718731
Cyprinella.venusta      0.162116327 -0.058824879 -0.150635695  0.13533026
Luxilus.chrysocephalus  0.185008668 -0.212133298 -0.358435526 -0.02178550
Lythrurus.roseipinnis  -0.135509539 -0.392224605  0.050916821  0.07785387
Notropis.longirostris   0.257018040 -0.060612841  0.087004613 -0.03929462
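
A possible sketch of the "scaled, no rare species" step. The slides do not say how rare species were defined, so the cutoff `min_sites` (species present at fewer than 3 sites are dropped) is an assumption, and the data are simulated.

```r
# Hypothetical data: 8 species with very different mean abundances.
set.seed(1)
lambda <- rep(c(0.1, 2, 5, 10), each = 40)   # two species per level
community <- matrix(rpois(20 * 8, lambda = lambda), nrow = 20,
                    dimnames = list(paste0("site", 1:20),
                                    paste0("sp", 1:8)))

# Assumed rarity rule: keep species that occur at min_sites or more sites.
min_sites <- 3
occurrences <- colSums(community > 0)
common <- community[, occurrences >= min_sites, drop = FALSE]

# Scaled PCA: every remaining species has unit variance.
pca_scaled <- prcomp(common, center = TRUE, scale. = TRUE)
summary(pca_scaled)
biplot(pca_scaled)   # samples as points, species as arrows
```

With scaling, total variance equals the number of species kept, so no single abundant species can dominate an axis.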
5
PCA biplot, scaled, no rare species
6
PCA, scaled, no rare species, log transformed
                          PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
Standard deviation      2.080  1.778  1.607  1.223 1.1031 0.9896 0.9001 0.7607
Proportion of Variance  0.226  0.165  0.135  0.078 0.0635 0.0511 0.0423 0.0302
Cumulative Proportion   0.226  0.391  0.525  0.603 0.6667 0.7177 0.7600 0.7902

                                 PC1          PC2           PC3          PC4
Ichthyomyzon.gagei      0.0165784997 -0.005150938  0.0055671363 -0.002121706
Labidesthes.sicculus   -0.0236522313 -0.011409750  0.0001065219  0.009332682
Nocomis.leptocephalus  -0.0546988734 -0.084951957 -0.2062884655 -0.201899456
Ericymba.buccata       -0.2946789538  0.031038552 -0.2694025305 -0.167515049
Cyprinella.venusta     -0.3610402907 -0.069026127  0.0066823120  0.306635579
Luxilus.chrysocephalus -0.2598456140 -0.410318820 -0.3358348999 -0.003684535
Lythrurus.roseipinnis  -0.0983100419 -0.560316356  0.2872093354 -0.065075897
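
A brief sketch of the log-transform step on simulated, skewed counts. The slides do not state which log transform was used; `log1p()` (that is, log(x + 1)) is assumed here because it keeps zero counts at zero.

```r
# Hypothetical skewed count data (many small values, a few large ones).
set.seed(1)
community <- matrix(rnbinom(20 * 8, size = 0.8, mu = 10), nrow = 20,
                    dimnames = list(paste0("site", 1:20),
                                    paste0("sp", 1:8)))

# Assumed transform: log(x + 1), so zeros stay zero.
log_comm <- log1p(community)

# Scaled PCA on the transformed abundances.
pca_log <- prcomp(log_comm, center = TRUE, scale. = TRUE)
summary(pca_log)$importance[, 1:3]
```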
7
PCA, scaled, no rare species, log transformed
8
(No Transcript)
9
Non Metric Multidimensional Scaling
  • Most robust and common unconstrained ordination
  • Distance based
  • Similar to metric multidimensional scaling
    (PCoA), but uses rank distances
  • Usually called NMDS, sometimes called MDS
  • Goal of analysis: place samples in k-dimensional
    space to minimize differences between the rank
    order of dissimilarities in the distance matrix
    and the rank order of Euclidean distances in
    ordination space.

10
Non Metric Multidimensional Scaling
  • Stress: a measure of the lack of fit between
    rank-order dissimilarities and rank-order Euclidean
    distances in ordination space.
  • Usually expressed as a scaled percentage.
  • 0 = perfect fit; higher values = worse fit
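
As an illustration of the definition above, the stress that `isoMDS` reports (in percent) can be recomputed by hand from the ordination distances and their monotone-regression fit. This is a sketch on simulated data; the matrix `x` is hypothetical.

```r
library(MASS)

# Hypothetical continuous data, so the distances have no ties.
set.seed(1)
x <- matrix(rgamma(20 * 8, shape = 2), nrow = 20)
d <- dist(x)

fit <- isoMDS(d, k = 2, trace = FALSE)

# Shepard() returns the ordination distances (y) and the fitted values
# from the monotone regression on the original dissimilarities (yf).
sh <- Shepard(d, fit$points)

# Stress as a scaled percentage: residuals relative to total distance.
stress_manual <- 100 * sqrt(sum((sh$y - sh$yf)^2) / sum(sh$y^2))

c(reported = fit$stress, manual = stress_manual)
```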

11
Shepard (stress) Plot
[Figure panels: multidimensional Shepard plot; Shepard plot for the first 2 axes of the same solution]
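
A Shepard plot can be drawn along these lines with `MASS::Shepard`. This is a sketch on simulated (hypothetical) data; for `metaMDS` results, vegan's `stressplot()` gives a packaged version.

```r
library(MASS)

# Hypothetical data standing in for the community matrix.
set.seed(1)
x <- matrix(rgamma(20 * 8, shape = 2), nrow = 20)
d <- dist(x)
fit <- isoMDS(d, k = 2, trace = FALSE)

sh <- Shepard(d, fit$points)

# Points: observed dissimilarity vs. distance in the ordination.
plot(sh$x, sh$y, pch = 20,
     xlab = "Observed dissimilarity",
     ylab = "Ordination distance",
     main = "Shepard plot")

# Step line: monotone regression fit; scatter around it is the stress.
lines(sh$x, sh$yf, type = "S", col = "red")
```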
12
Non Metric Multidimensional Scaling
  • Species information lost
  • No variation accounted for by axes; stress is
    the analog
  • Not necessarily any order of importance to axes
  • Points and axes can be freely rotated; all that
    matters is the relative position of points
  • Must specify number of axes ahead of time
  • Stress is reduced as more axes are used
  • 1st dimension of a 2D NMDS is not the same thing
    as the 1st dimension of a 6D NMDS
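
A small sketch of the last three points, using simulated (hypothetical) data: the number of axes is chosen up front, stress drops as axes are added, and axis 1 of a 2D solution need not match axis 1 of a higher-dimensional one.

```r
library(MASS)

# Hypothetical data standing in for the community matrix.
set.seed(1)
x <- matrix(rgamma(25 * 10, shape = 2), nrow = 25)
d <- dist(x)

# Fit a separate NMDS for each number of axes.
ks <- 2:5
fits <- lapply(ks, function(k) isoMDS(d, k = k, trace = FALSE))
stress <- sapply(fits, function(f) f$stress)

data.frame(k = ks, stress = stress)

# Correlation between axis 1 of the 2D and axis 1 of the 5D solution.
cor(fits[[1]]$points[, 1], fits[[4]]$points[, 1])
```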

13
(No Transcript)
14
Non Metric Multidimensional Scaling
  • Iterative algorithm, computationally intensive
  • Start with initial configuration, move points to
    reduce stress
  • Susceptible to getting stuck in local optima

[Figure axes: stress; multidimensional space]
15
Multiple Starting Points
  • Use multiple starting points to reduce local
    optima problem
  • Starting point options:
    • Multiple random starting points
    • Perform another ordination first to get a
      starting configuration
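
One way to try several random starting configurations with `isoMDS` and keep the lowest-stress result. This is a sketch only: the data are simulated, and the number of starts (20) and `maxit = 200` are arbitrary choices.

```r
library(MASS)

# Hypothetical data standing in for the community matrix.
set.seed(1)
x <- matrix(rgamma(20 * 8, shape = 2), nrow = 20)
d <- dist(x)
n <- attr(d, "Size")

# Default behaviour: start from a PCoA configuration.
fit_pcoa <- isoMDS(d, k = 2, trace = FALSE)

# Alternative: 20 random starting configurations.
random_fits <- lapply(1:20, function(i) {
  start <- matrix(rnorm(n * 2), nrow = n)
  isoMDS(d, y = start, k = 2, maxit = 200, trace = FALSE)
})
stresses <- sapply(random_fits, function(f) f$stress)
best <- random_fits[[which.min(stresses)]]

range(stresses)   # spread shows how much the start matters
c(pcoa_start = fit_pcoa$stress, best_random = best$stress)
```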

[Figure axes: stress; multidimensional space]
16
Convergence and Procrustes Analysis
  • NMDS has converged when it reaches a configuration
    that minimizes stress and additional iterations
    no longer improve it
  • Some approaches use Procrustean analysis to
    assess differences in configurations for each
    iteration

[Figure panels: Iteration 1; Iteration 2]
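
A minimal base-R sketch of a Procrustes comparison between two configurations. `procrustes_fit` is a hypothetical helper written for illustration; vegan's `procrustes()` is the packaged function.

```r
# Rotate, reflect, and rescale Y to match X as closely as possible.
procrustes_fit <- function(X, Y) {
  # Center both configurations on their centroids.
  Xc <- scale(X, center = TRUE, scale = FALSE)
  Yc <- scale(Y, center = TRUE, scale = FALSE)
  # Optimal rotation/reflection from the SVD of the cross-product.
  s <- svd(crossprod(Xc, Yc))
  rot <- s$v %*% t(s$u)
  # Optimal uniform scaling of Y.
  scl <- sum(s$d) / sum(Yc^2)
  Yrot <- scl * Yc %*% rot
  # Residual sum of squares: 0 means identical relative positions.
  list(Yrot = Yrot, ss = sum((Xc - Yrot)^2))
}

set.seed(1)
config1 <- matrix(rnorm(40), ncol = 2)

# Same relative positions, but rotated, rescaled, and shifted.
theta <- pi / 3
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), 2)
config2 <- 2 * config1 %*% R + 5

# A genuinely different configuration.
config3 <- config1 + matrix(rnorm(40, sd = 0.5), ncol = 2)

procrustes_fit(config1, config2)$ss   # effectively zero
procrustes_fit(config1, config3)$ss   # clearly above zero
```

A residual near zero between successive runs suggests they found the same solution, regardless of how the axes happen to be oriented.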
17
NMDS Code
  • Plain NMDS
  • Code (isoMDS from the MASS package; vegdist from vegan)
    • distance <- vegdist(community, method = "bray")
    • nmds <- isoMDS(distance, k = 2)
    • nmds
    • plot(nmds$points)
  • Options
    • k: number of axes
    • tol: convergence tolerance
    • maxit: maximum number of iterations
    • y: starting configuration; if omitted, isoMDS
      performs a PCoA to obtain a starting position

18
NMDS Code
  • metaMDS
    • Performs multiple NMDS runs from different
      starting positions
    • Uses Procrustes to track convergence
  • Code (vegan package)
    • metanmds <- metaMDS(community, k = 4, distance = "bray")
  • Options
    • plot: plot Procrustes errors along the way
    • k: number of axes
    • distance: distance measure to use on raw data
    • autotransform: apply some automatic
      transformations if the analysis judges them
      necessary
    • wascores, expand: get species scores as weighted
      averages (wascores) and expand them to match the
      spread of the sample scores (expand)
    • noshare: use extended (step-across) dissimilarities
      if a certain proportion of samples have no species
      in common
    • Various other options to center/rotate scores
      (?metaMDS for details)
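
A runnable sketch of a `metaMDS` call, assuming the vegan package is installed. It uses `varespec`, an example data set shipped with vegan, in place of the fish data, and `k = 2` rather than the `k = 4` on the slide.

```r
library(vegan)

data(varespec)   # example community data shipped with vegan
set.seed(1)      # random starts, so fix the seed for repeatability

metanmds <- metaMDS(varespec, k = 2, distance = "bray",
                    autotransform = FALSE, trace = FALSE)

metanmds$stress                               # final stress
head(scores(metanmds, display = "sites"))     # sample scores
head(scores(metanmds, display = "species"))   # weighted-average species scores
```

Note that recent vegan versions use the monoMDS engine by default and report stress as a proportion, whereas `isoMDS` reports it as a percentage.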

19
NMDS Code
  • Output
    • isoMDS
      • Scores for each sample and the final stress;
        stress at each iteration is printed during the run
    • metaMDS
      • Sample scores, species scores (weighted
        averages), Procrustes errors, Procrustes plots,
        final stress, stress at each iteration.

20
NMDS Example
21
NMDS Example
initial  value 14.250589
iter   5 value 9.263973
iter  10 value 7.920977
iter  15 value 7.734398
iter  15 value 7.726740
final  value 7.701990
converged
> nmds<-isoMDS(distance,k=3)
initial  value 7.999925
iter   5 value 4.118777
iter  10 value 3.973427
iter  15 value 3.870255
final  value 3.822107
converged
> nmds<-isoMDS(distance,k=5)
initial  value 1.595118
iter   5 value 1.371944
iter  10 value 1.210396
iter  15 value 1.137483
iter  20 value 1.060814
iter  25 value 1.026373
iter  30 value 1.019002
iter  30 value 1.018266
iter  35 value 1.004853
iter  35 value 1.003870
iter  35 value 1.003204
final  value 1.003204
converged
> nmds<-isoMDS(distance,k=10)
initial  value 1.110138
iter   5 value 0.765316
iter  10 value 0.664021
iter  15 value 0.610360
iter  20 value 0.552149
iter  25 value 0.490166
iter  30 value 0.453501
iter  35 value 0.408528
iter  40 value 0.366280
iter  45 value 0.340277
iter  50 value 0.324808
final  value 0.324808
stopped after 50 iterations
> nmds<-isoMDS(distance,k=20)
Error in isoMDS(distance, k = 20) :
  initial configuration must be complete
In addition: Warning messages:
1: In cmdscale(d, k) : some of the first 20 eigenvalues are < 0
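  • Stress ranges from 7.7 with k = 2 to 0.3 with k = 10
    (the k = 10 run stopped at 50 iterations without
    converging)
  • Undefined at k = 20: isoMDS could not build a complete
    starting configuration with 20 axes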

Note: isoMDS treats a stress value that no longer
changes as convergence on a solution. Change the
tolerance (tol) value to adjust what is
considered converged.
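
A sketch of how `tol` and `maxit` change when `isoMDS` stops, on simulated (hypothetical) data. The specific values are arbitrary and chosen only to contrast a loose and a tight setting.

```r
library(MASS)

# Hypothetical data standing in for the community matrix.
set.seed(1)
x <- matrix(rgamma(30 * 12, shape = 2), nrow = 30)
d <- dist(x)

# Loose tolerance: stops as soon as stress changes only slightly.
loose <- isoMDS(d, k = 2, tol = 1e-1, trace = FALSE)

# Tight tolerance with more iterations allowed (default maxit is 50).
tight <- isoMDS(d, k = 2, tol = 1e-6, maxit = 500, trace = FALSE)

c(loose = loose$stress, tight = tight$stress)
```

Both runs start from the same PCoA configuration, so the tighter setting simply carries the same search further.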
22
Stress Plots
[Figure panels: k = 10; k = 2; k = 10, first 2 axes]
If you want a good 2D representation, k = 2 is
better than k = 10, even though the stress will be
higher.
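
A sketch of this comparison on simulated (hypothetical) data, using k = 6 instead of k = 10 to keep the example small. `stress_of` is a hypothetical helper that scores any configuration against the original dissimilarities.

```r
library(MASS)

# Hypothetical data standing in for the community matrix.
set.seed(1)
x <- matrix(rgamma(25 * 10, shape = 2), nrow = 25)
d <- dist(x)

# Stress (percent) of an arbitrary configuration of the same samples.
stress_of <- function(d, pts) {
  sh <- Shepard(d, pts)
  100 * sqrt(sum((sh$y - sh$yf)^2) / sum(sh$y^2))
}

fit2 <- isoMDS(d, k = 2, trace = FALSE)
fit6 <- isoMDS(d, k = 6, trace = FALSE)

c(k2_solution       = stress_of(d, fit2$points),
  k6_first_two_axes = stress_of(d, fit6$points[, 1:2]),
  k6_all_axes       = fit6$stress)
```

The first two axes of the higher-dimensional solution were never optimized as a 2D picture, which is why plotting them alone tends to fit worse than the dedicated k = 2 solution.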
23
Procrustes
24
No indication of percent variation accounted for
on the two axes. However, the first axis very
nicely captures the pattern in the raw data.