Multivariate Applications in Ecology - PowerPoint PPT Presentation

About This Presentation
Slides: 29
Provided by: jakesc3

Transcript and Presenter's Notes

Title: Multivariate Applications in Ecology


1
Call: cca(X = community)

Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total           2.761          1
Unconstrained   2.761          1

Eigenvalues, and their contribution to the mean squared contingency coefficient
             CA1    CA2    CA3    CA4    CA5    CA6    CA7    CA8     CA9
lambda    0.6812 0.5616 0.3733 0.2121 0.2043 0.1805 0.1567 0.1165 0.07975
accounted 0.2467 0.4501 0.5853 0.6621 0.7361 0.8014 0.8582 0.9004 0.92925
            CA10    CA11    CA12    CA13    CA14     CA15
lambda    0.0590 0.04742 0.04265 0.02934 0.01352 0.003424
accounted 0.9506 0.96779 0.98324 0.99386 0.99876 1.000000

Species scores
                          CA1      CA2      CA3      CA4     CA5      CA6
Aphredoderus.sayanus  1.09199 -0.78358  0.66014 -0.39574  0.2458 -0.47548
Campostoma.anomalum   0.21026 -0.12250 -0.92305 -1.01428  0.0582 -0.74628
Cyprinella.lutrensis -0.86791 -0.33791 -1.01547 -0.16398 -0.5521 -0.22539
Cyprinella.venusta   -0.04740  1.75157  0.56948  0.37331 -0.4395 -0.19827
Dorosoma.cepedianum  -1.35682 -0.13187  0.60274  0.64975  0.6390 -0.35667

Site scores (weighted averages of species scores)
              CA1     CA2     CA3      CA4      CA5     CA6
Fun07-01 -0.13052 2.10087  0.5171  1.44237 -0.61814 -0.9957
Fun07-02  0.15168 1.44981  0.6642  0.12578  0.95677 -1.1987
Fun07-03 -0.06375 1.38442  0.1463  0.75381  1.20652 -0.7524
Fun07-04 -1.31171 0.44255 -0.3123  5.66553  2.73707 -2.7540
Fun07-05  0.09770 1.92609  0.9488 -0.99570 -1.87127  1.6549
Fun07-06 -0.15637 0.24728 -0.6476  0.60933  2.46743  2.5612
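The "accounted" row is the running total of the eigenvalues divided by the total inertia (2.761). As a quick check, a minimal Python sketch reproduces the start of that row; the helper name `cumulative_accounted` is hypothetical, and the eigenvalues are simply copied from the output above.

```python
import numpy as np

def cumulative_accounted(eigenvalues, total_inertia):
    """Cumulative proportion of total inertia explained by successive axes."""
    return np.cumsum(eigenvalues) / total_inertia

# CA1-CA15 eigenvalues copied from the cca() output above
lam = np.array([0.6812, 0.5616, 0.3733, 0.2121, 0.2043, 0.1805, 0.1567,
                0.1165, 0.07975, 0.0590, 0.04742, 0.04265, 0.02934,
                0.01352, 0.003424])
accounted = cumulative_accounted(lam, 2.761)
print(np.round(accounted[:3], 4))  # → [0.2467 0.4501 0.5853]
```

The last entry comes out at about 1.0001 rather than exactly 1 only because the printed eigenvalues are rounded.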
2
CA - raw data
3
CA of log-transformed data
Call: cca(X = com_log)

Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total           2.052          1
Unconstrained   2.052          1

Total inertia is reduced (2.052 vs. 2.761 for the raw data), and the arch is less pronounced.
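Total inertia in CA is the mean squared contingency coefficient, φ² = χ²/N, of the sites × species table, so transforming the abundances changes it. A minimal Python sketch of the calculation, assuming a dense table with no empty rows or columns; the function name `total_inertia` is hypothetical, and the log(x + 1) transform shown is an assumption, since the slides do not say how com_log was produced.

```python
import numpy as np

def total_inertia(Y):
    """Mean squared contingency coefficient (phi^2 = chi^2 / N) of a table."""
    Y = np.asarray(Y, dtype=float)
    P = Y / Y.sum()                  # relative frequencies
    r = P.sum(axis=1)                # site (row) masses
    c = P.sum(axis=0)                # species (column) masses
    expected = np.outer(r, c)        # frequencies expected under independence
    return np.sum((P - expected) ** 2 / expected)

counts = np.array([[10, 2, 0],
                   [3, 8, 1],
                   [0, 1, 12]])
raw = total_inertia(counts)
logged = total_inertia(np.log1p(counts))   # assumed log(x + 1) transform
```

A table that is exactly the product of its row and column totals gives an inertia of 0, because there is no departure from independence to explain.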
4
DCA of raw data
decorana reports no total inertia. The arch is less pronounced, but recall the problems with the detrending procedure. The long first axis (4.07 SD units) indicates high beta diversity. Only four axes are returned.
Call: decorana(veg = community)

Detrended correspondence analysis with 26 segments.
Rescaling of axes with 4 iterations.

                  DCA1   DCA2   DCA3    DCA4
Eigenvalues     0.6662 0.3033 0.1826 0.28720
Decorana values 0.6812 0.2749 0.1088 0.07432
Axis lengths    4.0662 2.0618 1.6719 1.70013
5
DCA of log transformed data
                  DCA1   DCA2    DCA3    DCA4
Eigenvalues     0.4835 0.3074 0.18072 0.10157
Decorana values 0.4871 0.2696 0.09774 0.04817
Axis lengths    3.7761 2.0160 2.56518 1.89460
6
Gradients unknown; want to describe the data or infer gradients from the species data?
  • Yes, and the data are thought to be distinctly structured (goal is to put samples into distinct groups): cluster analysis, either hierarchical (agglomerative or divisive) or non-hierarchical (k-means).
  • Yes, and the goal is to look at the relationships among samples: indirect gradient analysis. NMDS or CA preferred; PCA or PCoA for non-community datasets.
  • No, important gradients are known and measured. The goal is to describe variability in the species data as it relates to the measured gradients; variability due to other, unmeasured factors is ignored: direct gradient analysis, CCA or RDA.
7
Canonical Correspondence Analysis (CCA)
Ter Braak, C. J. F. (1986) Canonical correspondence analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology, 67, 1167-1179.
  • Canonical: in its simplest or standard form
  • A good choice if you have clear and strong a priori
    hypotheses about the constraints and you are not
    interested in the general structure of the data.

[Figure: data matrices used in CCA: Samples × Species, Samples × Env. Variables, and Samples × Factors]
8
CCA
  • 1st step (CA)
  • Site scores: weighted averages of the species
    scores, weighted by the species abundances at each site
  • Species scores: weighted averages of the site
    scores, weighted by the species abundances
  • Standardize the species and site scores and iterate
    until they converge on a solution
  • ---------------------------
  • Calculate a weighted multiple regression of the site
    scores on the environmental variables, and use the
    fitted values as the new site scores
  • Rescale axes (various types)
  • Repeat the process until convergence
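The alternating steps above can be sketched for the first axis only. This is a minimal Python sketch under stated assumptions: a dense abundance matrix Y (sites × species) with no empty rows or columns, an environmental matrix Z (sites × variables), a single axis, and no rescaling. The name `ra_first_axis` is hypothetical, and vegan's cca() is not claimed to use this iteration internally (SVD-based solutions give the same eigenvalues). Leaving Z out gives plain CA by reciprocal averaging.

```python
import numpy as np

def ra_first_axis(Y, Z=None, n_iter=2000, tol=1e-14, seed=0):
    """First axis of CA (Z is None) or CCA (Z given) by reciprocal averaging.

    Returns the eigenvalue and site scores scaled to unit weighted variance.
    """
    Y = np.asarray(Y, dtype=float)
    P = Y / Y.sum()
    r = P.sum(axis=1)                      # site weights (sum to 1)
    c = P.sum(axis=0)                      # species weights (sum to 1)
    n = P.shape[0]
    if Z is not None:
        Z1 = np.column_stack([np.ones(n), np.asarray(Z, dtype=float)])
        sw = np.sqrt(r)                    # square-root weights for weighted LS
    x = np.random.default_rng(seed).standard_normal(n)
    lam = 0.0
    for _ in range(n_iter):
        x = x - r @ x                      # weighted centring
        x = x / np.sqrt(r @ x ** 2)        # standardize: weighted variance 1
        u = (P.T @ x) / c                  # species scores = WA of site scores
        x_wa = (P @ u) / r                 # site scores = WA of species scores
        if Z is None:
            x_new = x_wa
        else:
            # CCA step: weighted multiple regression of the WA site scores on
            # the environmental variables; fitted values become the new scores
            beta, *_ = np.linalg.lstsq(sw[:, None] * Z1, sw * x_wa, rcond=None)
            x_new = Z1 @ beta
        x_new = x_new - r @ x_new
        lam_new = np.sqrt(r @ x_new ** 2)  # shrinkage per cycle = eigenvalue
        converged = abs(lam_new - lam) < tol
        x, lam = x_new, lam_new
        if converged:
            break
    return lam, x / lam
```

At convergence, one full averaging cycle shrinks the standardized site scores by a factor equal to the eigenvalue, which is why the eigenvalue is read off the weighted spread of the new scores. Because the regression step can only remove variation, the constrained eigenvalue is never larger than the unconstrained one.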

9
Use CCA with caution
  • CCA uses multiple regression, so it has all of the
    associated assumptions and potential issues:
  • Multicollinearity
  • Outliers or errors in the environmental data
  • Linear relationships
  • Not well suited to exploratory analyses.
  • Only use it when you have a good idea of which
    environmental variables are structuring a
    gradient.

10
Running CCA in R
  • The cca function in the vegan package (see also the
    cca function in the ade4 package)
  • The CA from last week used the same function,
    but with no constraining environmental matrix.
  • Code:
  • ca_example <- cca(community)
  • cca_example <- cca(community ~ ., environmental)
  • Models must be specified. The example above includes
    all environmental variables in the model; do not
    do this.
  • cca_example <- cca(community ~ DO + pH + substrate,
    environmental)

11
CCA models
  • Multiple regression: assess the importance of
    predictor variables by adding or removing them and
    looking at the change in R² (ΔR²)
  • Judged by R² alone, the "best" model is always the one
    with all variables: adding variables, even random
    numbers, never decreases (and almost always increases)
    the variance explained.
  • Goal: the most explanatory power with the fewest
    variables
  • Recall that you are using CCA because the important
    gradients are known and measured. If this is
    true, do not throw additional variables into the
    analysis just to see how it works.
  • Models can include interaction terms, but
    interpretation will be difficult.
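The point about random predictors is easy to check numerically. A minimal Python sketch using ordinary least squares on a single response (not CCA); the function name `r_squared` is hypothetical and all data are simulated noise.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an ordinary least-squares fit of y on X (with intercept)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)

rng = np.random.default_rng(42)
n = 20
y = rng.standard_normal(n)               # response unrelated to any predictor
noise = rng.standard_normal((n, 10))     # ten purely random "predictors"
# nested models: first k random columns, k = 1..10
r2 = [r_squared(y, noise[:, :k]) for k in range(1, 11)]
```

Because each model contains the previous one, R² can only stay the same or rise, which is the same effect seen later in these slides when CCA is run with 3, 6, and 10 random environmental variables.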

12
CCA models
  • Specify your own model based on a priori
    hypotheses
  • Test for and eliminate redundant variables
  • Use stepwise procedures to select the best model
  • AIC in conjunction with forward or backward
    selection
  • Function stepAIC (MASS package)
  • fullmodel_cca <- cca(community ~ ., environmental)
  • smallmodel_cca <- cca(community ~ 1, environmental)
  • fit_model <- stepAIC(smallmodel_cca,
    scope = formula(fullmodel_cca))
  • Iterative procedure that seeks the best combination
    of environmental variables
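The idea behind forward selection with AIC can be sketched outside vegan. This is a swapped-in illustration, not stepAIC applied to a CCA: it uses ordinary least squares on a single simulated response and AIC = n·log(RSS/n) + 2k (up to an additive constant), adds at each step the variable that lowers AIC most, and stops when no addition lowers it. The function names are hypothetical.

```python
import numpy as np

def aic_ols(y, X, cols):
    """AIC (up to a constant) of an OLS fit of y on the listed columns of X."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = np.sum((y - A @ beta) ** 2)
    return n * np.log(rss / n) + 2 * A.shape[1]

def forward_select(y, X):
    """Greedy forward selection: add the variable that lowers AIC most."""
    chosen = []
    best = aic_ols(y, X, chosen)            # intercept-only model
    while len(chosen) < X.shape[1]:
        candidates = [j for j in range(X.shape[1]) if j not in chosen]
        scores = {j: aic_ols(y, X, chosen + [j]) for j in candidates}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:          # no candidate improves AIC: stop
            break
        chosen.append(j_best)
        best = scores[j_best]
    return chosen, best

rng = np.random.default_rng(7)
X = rng.standard_normal((50, 6))            # six candidate "environmental" variables
y = 3 * X[:, 2] - 2 * X[:, 4] + 0.1 * rng.standard_normal(50)
chosen, best_aic = forward_select(y, X)
```

The two informative columns (2 and 4) enter first; a pure-noise column can occasionally slip in afterwards, which is one reason to check the selected model rather than trust the procedure blindly.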

13
Variance Inflation Factors (VIF)
  • If environmental variables are correlated, they
    will have high VIF scores.
  • Automatic model selection procedures (step) do
    not control or check for this. Always look at VIF
    scores after a model is selected.
  • Rule of thumb: a VIF over 10 indicates redundancy in
    the environmental variables
  • vif.cca function

> vif.cca(fullmodel_cca)
      Temp    ORP.mV.      depth       flow  substrate      cover        veg
 29.089645  13.714015 235.274774  12.056663  92.520635  32.770680  76.474315
      bank      width     canopy     cv_dep    cv_flow     cv_sub   cv_width
 45.308347  24.108944   8.022191  21.856455   5.770589 135.175366  36.929193
14
CCA output and interpretation
  • Correlation matrix of the constraining
    (environmental) matrix
  • Total, constrained, and unconstrained inertia
    (species variance). Eigenvalues are proportional to
    the variance explained by each axis.
  • Species scores: weighted averages of sample
    scores
  • Two sets of site scores: one a weighted average
    of species scores, the other from the multiple
    regression on the environmental variables.
  • Environmental variable scores derived from
    canonical correlates: correlations between the
    environmental variables and the CCA axes, weighted by
    the eigenvalues of those axes

15
Monte Carlo Hypothesis Tests
  • Test the significance of the overall ordination
  • Community data are permuted
  • Pseudo-F is calculated as the ratio of constrained to
    unconstrained variance accounted for, each divided
    by its degrees of freedom
  • Null hypothesis: the environmental variables are not
    related to the community data (zero constrained variation)
  • P = proportion of random (permuted) communities producing
    more than the observed constrained variation
  • Test significance of each axis
  • Similar, but tests the significance of each axis
    separately
  • Test significance of constraining (environmental)
    variables
  • Variables are tested sequentially, so their order in
    the model will affect the results.
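The overall test can be sketched as a loop that permutes the rows of the community matrix, recomputes pseudo-F, and counts how often the permuted value reaches the observed one. To keep the sketch short it uses the RDA variance partition (linear regression on Z) rather than CCA's weighted chi-square partition, so it is an illustration of the permutation logic, not vegan's implementation. Pseudo-F divides each part by its degrees of freedom: q for the constraints and n - q - 1 for the residual. The same arithmetic reproduces the F of the CCA model test later in these slides: (0.8360/9) / (0.9436/9) ≈ 0.886. Function names and data are hypothetical.

```python
import numpy as np

def pseudo_f(Y, Z):
    """Pseudo-F: (constrained / q) / (residual / (n - q - 1)), RDA-style."""
    n, q = Z.shape
    Yc = Y - Y.mean(axis=0)
    Z1 = np.column_stack([np.ones(n), Z])
    B, *_ = np.linalg.lstsq(Z1, Yc, rcond=None)
    constrained = np.sum((Z1 @ B) ** 2)       # variation explained by Z
    residual = np.sum(Yc ** 2) - constrained  # unexplained variation
    return (constrained / q) / (residual / (n - q - 1))

def permutation_test(Y, Z, n_perm=199, seed=0):
    """Permute rows of the community matrix; P = (hits + 1) / (n_perm + 1)."""
    rng = np.random.default_rng(seed)
    f_obs = pseudo_f(Y, Z)
    hits = 0
    for _ in range(n_perm):
        Y_perm = Y[rng.permutation(Y.shape[0])]   # break the site-environment link
        if pseudo_f(Y_perm, Z) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(1)
Z = rng.standard_normal((30, 2))                  # simulated environmental data
Y = Z @ rng.standard_normal((2, 5)) + 0.5 * rng.standard_normal((30, 5))
f_obs, p = permutation_test(Y, Z)
```

Counting the observed arrangement as one of the permutations, (hits + 1) / (n_perm + 1), means P can never be exactly zero; with 199 permutations the smallest attainable value is 0.005.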

16
CCA Options
  • Detrending (DCCA), but recall the problems with
    detrending
  • Multiple regression options
  • Model can include interactions (difficult
    interpretation)
  • A third matrix (covariables) can be partialled out
  • Forward or backward selection
  • Model selection criteria other than AIC
  • Number of permutations in Monte Carlo procedures
  • Data transformations: same considerations as
    with CA

17
Triplot
Samples and species represented as
points. Environmental constraints represented as
vectors.
18
Redundancy Analysis (RDA)
  • CCA is aligned with CA: species relationships are
    assumed to be unimodal
  • A CCA with no constraints = CA
  • RDA is aligned with PCA: species relationships are
    assumed to be linear
  • An RDA with no constraints = PCA
  • CCA is more common, for the same reasons PCA is not
    usually used for species data.
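The RDA/PCA correspondence can be sketched with linear algebra: regress the centred species matrix on the environmental variables, then take principal components of the fitted values (constrained axes) and of the residuals (unconstrained axes). A minimal Python sketch, assuming a dense sites × species matrix Y and an optional environmental matrix Z; the function name `rda_eigenvalues` is hypothetical, and eigenvalues are expressed as variances (divided by n - 1).

```python
import numpy as np

def rda_eigenvalues(Y, Z=None):
    """Constrained and unconstrained eigenvalues of an RDA (as variances)."""
    Y = np.asarray(Y, dtype=float)
    n = Y.shape[0]
    Yc = Y - Y.mean(axis=0)                    # centre species columns
    if Z is None:
        fitted = np.zeros_like(Yc)             # no constraints: nothing explained
    else:
        Z1 = np.column_stack([np.ones(n), np.asarray(Z, dtype=float)])
        B, *_ = np.linalg.lstsq(Z1, Yc, rcond=None)
        fitted = Z1 @ B                        # multiple regression fitted values
    resid = Yc - fitted
    # squared singular values / (n - 1) = variances along the principal axes
    constrained = np.linalg.svd(fitted, compute_uv=False) ** 2 / (n - 1)
    unconstrained = np.linalg.svd(resid, compute_uv=False) ** 2 / (n - 1)
    return constrained, unconstrained

rng = np.random.default_rng(5)
Y = rng.standard_normal((15, 4))
con, unc = rda_eigenvalues(Y)                  # unconstrained RDA
pca = np.sort(np.linalg.eigvalsh(np.cov(Y, rowvar=False)))[::-1]
```

With no Z every axis is unconstrained and the eigenvalues equal those of a PCA on the covariance matrix; with Z, the constrained and unconstrained eigenvalues together add up to the total variance, because OLS residuals are orthogonal to the fitted values.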

19
Species data: the same sample dataset. For the CCA, I combined it with 12 environmental variables (env1, env2, and so on) that are all simply random numbers.
20
CCA Example
              Inertia Rank
Total            1.78
Unconstrained    1.78    9
Inertia is mean squared contingency coefficient

Eigenvalues for unconstrained axes:
     CA1      CA2      CA3      CA4      CA5      CA6      CA7      CA8
0.849792 0.529798 0.252153 0.095300 0.030295 0.010971 0.006376 0.003073
     CA9
0.001793

Standard CA of our sample dataset.

Call: cca(formula = community ~ env1 + env2 + env3, data = environmental)

Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total          1.7795     1.0000
Constrained    0.2047     0.1150
Unconstrained  1.5748     0.8850

Eigenvalues, and their contribution to the mean squared contingency coefficient
            CCA1    CCA2     CCA3    CA1    CA2    CA3     CA4     CA5     CA6
lambda    0.1839 0.01449 0.006308 0.7758 0.4086 0.2487 0.09124 0.02972 0.01028
accounted 0.1034 0.11150 0.115047 0.4360 0.6656 0.8053 0.85658 0.87328 0.87906
               CA7      CA8      CA9
lambda    0.005796 0.003044 0.001652
accounted 0.882314 0.884025 0.884953

CCA using the first three random environmental variables.
21
CCA example
Call: cca(formula = community ~ env1 + env2 + env3 + env4 + env5 + env6, data = environmental)

Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total          1.7795     1.0000
Constrained    0.5965     0.3352
Unconstrained  1.1831     0.6648

Eigenvalues, and their contribution to the mean squared contingency coefficient
            CCA1   CCA2    CCA3     CCA4     CCA5     CCA6    CA1    CA2    CA3
lambda    0.3445 0.2254 0.02022 0.003053 0.002212 0.001101 0.5236 0.3252 0.2326
accounted 0.1936 0.3202 0.33160 0.333318 0.334561 0.335180 0.2943 0.4770 0.6077
              CA4     CA5      CA6      CA7      CA8      CA9
lambda    0.05594 0.02716 0.008827 0.005477 0.002783 0.001475
accounted 0.63913 0.65439 0.659350 0.662428 0.663991 0.664820

CCA using six and then all ten random environmental variables.

Call: cca(formula = community ~ env1 + env2 + env3 + env4 + env5 + env6 + env7 + env8 + env9 + env10, data = environmental)

Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total          1.7795     1.0000
Constrained    0.8360     0.4698
Unconstrained  0.9436     0.5302

Eigenvalues, and their contribution to the mean squared contingency coefficient
            CCA1   CCA2    CCA3    CCA4     CCA5     CCA6      CCA7      CCA8
lambda    0.4859 0.3034 0.03013 0.01050 0.003002 0.002401 0.0005999 4.651e-05
accounted 0.2731 0.4435 0.46046 0.46636 0.468047 0.469397 0.4697336 4.698e-01
               CCA9    CA1    CA2    CA3    CA4     CA5     CA6      CA7
lambda    5.707e-06 0.3992 0.2512 0.2046 0.0504 0.02481 0.00662 0.003603
accounted 4.698e-01 0.2243 0.3655 0.4805 0.5088 0.52272 0.52644 0.528463
               CA8       CA9
lambda     0.002315 0.0008424
accounted  0.529764 0.5302370
0.4859 / 1.7795 = 0.2731, the proportion of the total variability accounted for by the first constrained axis.
22
VIF
    env1     env2     env3     env4     env5     env6     env7     env8
1.888859 1.349229 1.491540 1.675699 1.428965 2.180527 1.546446 2.118550
    env9    env10
1.467078 2.144085
Recall that a larger VIF indicates that a variable carries redundant information and can be eliminated; a VIF of 1.0 means all of its information is unique. As the number of random environmental variables increased, so did the percentage of variation accounted for by the constrained ordination (11.5%, 33.5%, and 47.0% with 3, 6, and 10 variables). Be careful! Recall that our environmental variables here are random. CCA maximizes species-environment correlations.
Permutation test for cca under direct model

         Df  Chisq      F N.Perm Pr(>F)
Model     9 0.8360 0.8859   2000 0.0655 .
Residual  9 0.9436
---
Permutation test for cca under direct model
Terms added sequentially (first to last)

         Df  Chisq      F N.Perm Pr(>F)
env1      1 0.0626 0.5971   1000  0.313
env2      1 0.1284 1.2250   1000  0.089 .
env3      1 0.0137 0.1306   1000  0.865
env4      1 0.1338 1.2765   1000  0.066 .
env5      1 0.2417 2.3054   1000  0.006
env6      1 0.0162 0.1545   1000  0.833
env7      1 0.1054 1.0056   1000  0.139
env8      1 0.1025 0.9778   1000  0.138
env9      1 0.0231 0.2203   1000  0.760
env10     0 0.0085    Inf   1000 <0.001
Residual  9 0.9436

         Df     Chisq      F N.Perm Pr(>F)
CCA1      1    0.4859 4.6346    200  0.115
CCA2      1    0.3034 2.8936    100  0.390
CCA3      1    0.0301 0.2873    100  1.000
CCA4      1    0.0105 0.1002    100  1.000
CCA5      1    0.0030 0.0286    100  1.000
CCA6      1    0.0024 0.0229    100  1.000
CCA7      1    0.0006 0.0057    100  1.000
CCA8      1 4.651e-05 0.0004    100  1.000
CCA9      1 5.707e-06 0.0001    100  1.000
Residual  9    0.9436
23
(No Transcript)
24
CCA AIC model fit
Call: cca(formula = community ~ env5 + env2 + env7 + env4 + env1 + env8, data = environmental)

Partitioning of mean squared contingency coefficient:
              Inertia Proportion
Total          1.7795     1.0000
Constrained    0.7887     0.4432
Unconstrained  0.9909     0.5568

Eigenvalues, and their contribution to the mean squared contingency coefficient
            CCA1   CCA2    CCA3    CCA4      CCA5      CCA6    CA1    CA2
lambda    0.4663 0.2922 0.02674 0.00209 0.0009866 0.0003904 0.4194 0.2614
accounted 0.2620 0.4262 0.44125 0.44242 0.4429767 0.4431961 0.2357 0.3825
             CA3    CA4     CA5      CA6      CA7      CA8      CA9
lambda    0.2120 0.0522 0.02698 0.009981 0.004913 0.002755 0.001240
accounted 0.5017 0.5310 0.54619 0.551798 0.554559 0.556107 0.556804
The AIC-selected model: six variables were selected that account for nearly as much variation as all 10 (44% vs. 47%).
25
(No Transcript)
26
RDA Example
Importance of components:
                         PC1   PC2   PC3    PC4     PC5     PC6     PC7     PC8
Standard deviation     0.809 0.680 0.383 0.3692 0.11845 0.09077 0.06398 0.06184
Proportion of Variance 0.457 0.323 0.102 0.0952 0.00979 0.00575 0.00286 0.00267
Cumulative Proportion  0.457 0.780 0.882 0.9774 0.98723 0.99298 0.99584 0.99851
                           PC9    PC10
Standard deviation     0.03825 0.02595
Proportion of Variance 0.00102 0.00047
Cumulative Proportion  0.99953 1.00000
Standard PCA and an unconstrained RDA (no environmental data) produce the same results.
Call: rda(X = community)

Partitioning of variance:
              Inertia Proportion
Total           1.432          1
Unconstrained   1.432          1

Eigenvalues, and their contribution to the variance
             PC1    PC2    PC3    PC4     PC5     PC6      PC7      PC8      PC9
lambda    0.6541 0.4628 0.1468 0.1363 0.01403 0.00824 0.004094 0.003824 0.001463
accounted 0.4567 0.7798 0.8822 0.9774 0.98723 0.99298 0.995838 0.998508 0.999530
               PC10
lambda    0.0006734
accounted 1.0000000
27
RDA Example
Model selected using AIC criteria
rda(formula = community ~ env5 + env2 + env1 + env7 + env4 + env12 + env8, data = environmental)

              Inertia Rank
Total          1.4323
Constrained    0.7197    7
Unconstrained  0.7126   10
Inertia is variance

Eigenvalues for constrained axes:
     RDA1      RDA2      RDA3      RDA4      RDA5      RDA6      RDA7
0.3907903 0.2590340 0.0659836 0.0016877 0.0012908 0.0007069 0.0002364

Eigenvalues for unconstrained axes:
      PC1       PC2       PC3       PC4       PC5       PC6       PC7       PC8
0.2978968 0.2056125 0.1157088 0.0668621 0.0123420 0.0072716 0.0036784 0.0019144
      PC9      PC10
0.0008962 0.0004091
Permutation test for rda under direct model
Terms added sequentially (first to last)

Model: rda(formula = community ~ env5 + env2 + env1 + env7 + env4 + env12 + env8, data = environmental)
         Df   Var      F  N.Perm Pr(>F)
env5      1 0.209 2.9345 100.000  <0.01
env2      1 0.087 1.2258 100.000   0.03
env1      1 0.084 1.1729 100.000   0.04
env7      1 0.091 1.2732 100.000   0.04
env4      1 0.092 1.2935 100.000   0.08 .
env12     1 0.086 1.2131 100.000   0.04
env8      1 0.070 0.9872 100.000   0.11
Residual 10 0.713

         Df       Var      F N.Perm  Pr(>F)
RDA1      1      0.39 5.4841 200.00 0.00500
RDA2      1      0.26 3.6351 600.00 0.02667
RDA3      1      0.07 0.9260 100.00 0.85000
RDA4      1 0.0016877 0.0237 100.00 1.00000
RDA5      1 0.0012908 0.0181 100.00 1.00000
RDA6      1 0.0007069 0.0099 100.00 1.00000
RDA7      1 0.0002364 0.0033 100.00 1.00000
Residual 10      0.71
28
Example RDA