Title: Cluster Analysis of Abiotic Environmental Characteristics
1. Cluster Analysis of Abiotic Environmental Characteristics
- Sandy Gillespie
- Holly Bernardo
- Nicole Soper Gorden
2. The Data - Background
Open Area
Understory
3. The Data - Background
- 32 plots (16 understory, 16 open area)
- Set of abiotic characteristics
- Percent canopy cover
- Soil moisture
- Soil pH
- Photosynthetically active radiation (ppf)
4. The Data - Background
- Wanted to test how well the designations "open area" and "understory" represent the true abiotic characteristics of the area
- In other words, do the abiotic characteristics cluster plots into those two groups?
5. The Data - Screening
- Normality
- Canopy and ppf non-normal in univariate space
- But we're working in multivariate space
6. The Data - Screening
- Standardization
- Columns in different units!
- Column standardization
- Z-scores scale them all the same
- Outliers
- No univariate outliers
- No multivariate outliers (Bray-Curtis)
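Column standardization can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name `standardize_columns` and the four example rows are made up, and the sample standard deviation is an assumed choice.

```python
from statistics import mean, stdev

def standardize_columns(rows):
    """Return z-scores computed column by column (sample standard deviation)."""
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    return [
        [(value - c) / s for value, c, s in zip(row, centers, spreads)]
        for row in rows
    ]

# Hypothetical plots: canopy cover (%), soil moisture (%), soil pH, ppf
plots = [
    [85.0, 30.0, 5.5, 120.0],
    [90.0, 12.0, 7.1, 95.0],
    [10.0, 15.0, 6.8, 1500.0],
    [5.0, 18.0, 6.9, 1650.0],
]
z = standardize_columns(plots)
```

After this step every column has mean 0 and standard deviation 1, so a variable measured in large units (ppf) no longer dominates one measured in small units (pH).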
7. The Data - Screening
- Correlations
- Spearman correlation coefficients
- Canopy and ppf
- Soil moisture and pH
Significant, with P < 0.0001
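For reference, a Spearman coefficient is a Pearson correlation computed on ranks. The sketch below shows that idea with made-up canopy and ppf values; the helper names are hypothetical, and it does not compute the P-values reported on the slide.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman(x, y):
    """Spearman rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical values: more canopy cover, less light reaching the ground
canopy = [85.0, 90.0, 10.0, 5.0, 70.0]
ppf = [120.0, 95.0, 1500.0, 1650.0, 300.0]
rho = spearman(canopy, ppf)
```

Because only ranks are used, the coefficient does not require the normality that canopy and ppf lack.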
8. The Data - Distance Metric
- Chose Euclidean distance
- Continuous data
- No uninhabitable space
- No need to be proportional
- Correlations suggest Mahalanobis distance
- BUT only useful after clusters formed
- Balanced correlations (2 and 2)
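As a small illustration of the chosen metric, the sketch below builds a pairwise Euclidean distance matrix. The function name and the three example points are assumptions for illustration; in the analysis the inputs would be the standardized (z-score) columns.

```python
from math import dist

def euclidean_matrix(rows):
    """Pairwise Euclidean distances between all rows (symmetric, zero diagonal)."""
    return [[dist(a, b) for b in rows] for a in rows]

# Hypothetical standardized values for three plots and two variables
z_scores = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
d = euclidean_matrix(z_scores)
```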
9. Hierarchical or Non-hierarchical?
- Sample size suggests HC
- 32 plots (small)
- Goals suggest NHC
- Looking for environmental clusters, not ranked
- We did both (exploratory)
10. Non-Hierarchical Clustering
- Makes sense even with a small # of variables
- K-Means: expected # of groups
- 1) specify seeds
- 2) all samples are assigned to nearest seed
- 3) compute centroids and variances
- 4) move samples to closest centroid
- 5) new centroids and variances
- Repeat until the variance in each group no longer gets any smaller
- i.e. the within-group homogeneity is maximized
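The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used for the analysis: the function names and the six points are made up, and the loop stops when assignments stop changing, which is the point at which the within-group sum of squares can no longer decrease.

```python
from math import dist

def centroid(points):
    """Mean of each coordinate across a group of points."""
    return [sum(col) / len(points) for col in zip(*points)]

def k_means(points, seeds, max_iter=100):
    """Assign points to the nearest centroid, recompute centroids, repeat."""
    centroids = [list(s) for s in seeds]            # step 1: specify seeds
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]                                           # steps 2 and 4: nearest centroid
        if new_labels == labels:                    # no sample moved: converged
            break
        labels = new_labels
        for k in range(len(centroids)):             # steps 3 and 5: new centroids
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                             # keep old centroid if a group empties
                centroids[k] = centroid(members)
    return labels, centroids

def within_group_ss(points, labels, centroids):
    """Total squared distance of each point to its own centroid."""
    return sum(dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))

# Hypothetical data: two well-separated groups, seeded with one point from each
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels, centroids = k_means(points, seeds=[points[0], points[3]])
wss = within_group_ss(points, labels, centroids)
```

The result depends on the seeds, which is one reason to check the stability of the solution afterwards.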
11. Evaluating our choice of 2 clusters
Or maybe 3?
12. So we ran the analysis both ways
3 clusters
Jaccard bootstrap mean: cluster 1 = 0.9716024, cluster 2 = 0.9242937, cluster 3 = 0.9385516
2 clusters
Jaccard bootstrap mean: cluster 1 = 0.9413424, cluster 2 = 0.9224089
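The Jaccard values reported here compare cluster memberships between the original solution and a re-clustered bootstrap sample. A minimal sketch of that comparison, with made-up plot IDs and hypothetical function names, assuming both clusters are non-empty:

```python
def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def best_match(cluster, new_clusters):
    """Similarity of a cluster to its most similar cluster in a new solution."""
    return max(jaccard(cluster, other) for other in new_clusters)

# Hypothetical plot IDs: one original cluster and a re-clustered solution
original = [1, 2, 3, 4]
resampled = [[1, 2, 3], [4, 5, 6]]
score = best_match(original, resampled)
```

Averaging this best-match score over many bootstrap samples gives a per-cluster mean; values close to 1 indicate a cluster that reappears almost unchanged.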
13. Box plots of the 2 cluster solution
14. Box plots of the 3 cluster solution
Despite our original prediction of 2 groups, we are going with 3
15. Polythetic Hierarchical Clustering
16. Introduction
- While the purpose of this study was better suited to non-hierarchical clustering, the low number of entities means that HC might give interesting insight into relationships between the 3 groups described by NHC
17. Clustering strategy
- Goal: maintain original data structure
- Used Average Linkage
- Compared with Ward's
- Also tried DIANA (divisive clustering)
- Reminder: still using Euclidean distance, for reasons mentioned above
18. Results
19. Evaluating the Cluster solution
- Average Linkage
- Agglomerative Coefficient: 0.8026942
- Cophenetic Correlation: 0.8381367
- Ward's
- Agglomerative Coefficient: 0.9732249
- Cophenetic Correlation: 0.779189
- Ward's has denser clusters - not surprising
- Average linkage has higher Cophenetic correlation
- better representation of original data
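To show what these two statistics measure, here is a small pure-Python sketch of average linkage with both values computed from the merge history. It is an illustration under assumptions, not the routine behind the numbers on the slide: the function names and the four points are made up, and the agglomerative coefficient follows the usual definition (mean of 1 minus each point's first merge height divided by the final merge height).

```python
from itertools import combinations
from math import dist, sqrt

def average_linkage(points):
    """Agglomerative average-linkage clustering on Euclidean distances.

    Returns the merge heights, the cophenetic distance of every pair,
    and the height at which each point first joins another cluster.
    """
    clusters = {i: [i] for i in range(len(points))}
    heights, cophenetic, first_merge = [], {}, {}

    def mean_distance(pair):
        a, b = pair
        total = sum(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
        return total / (len(clusters[a]) * len(clusters[b]))

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=mean_distance)
        height = mean_distance((a, b))
        heights.append(height)
        for i in clusters[a]:          # pairs joined by this merge
            for j in clusters[b]:
                cophenetic[(min(i, j), max(i, j))] = height
        for key in (a, b):             # singletons merging for the first time
            if len(clusters[key]) == 1:
                first_merge[clusters[key][0]] = height
        clusters[a] = clusters[a] + clusters.pop(b)
    return heights, cophenetic, first_merge

def cophenetic_correlation(points, cophenetic):
    """Pearson correlation between original and cophenetic distances."""
    pairs = sorted(cophenetic)
    x = [dist(points[i], points[j]) for i, j in pairs]
    y = [cophenetic[pair] for pair in pairs]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def agglomerative_coefficient(heights, first_merge):
    """Mean of 1 - (first merge height / final merge height) over all points."""
    return sum(1 - h / heights[-1] for h in first_merge.values()) / len(first_merge)

# Hypothetical data: two tight pairs of plots far from each other
points = [[0, 0], [0, 1], [5, 0], [5, 1]]
heights, coph, first = average_linkage(points)
coph_corr = cophenetic_correlation(points, coph)
ac = agglomerative_coefficient(heights, first)
```

A high cophenetic correlation means the dendrogram distorts the original distances only slightly, which is the sense in which average linkage gives the better representation of the original data here.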
20. We decided to use the 3 cluster solution
The 3 cluster solution is at an obvious elbow
21. Cluster Stability for 3 cluster solution
Evaluated using bootstrap
Clusterwise Jaccard bootstrap mean: 1.0000 0.9975 0.9975
dissolved: 0 0 0
recovered: 100 99 99
Clusterwise Jaccard subsetting mean: 0.9923687 0.9574524 0.9801667
dissolved: 0 3 1
recovered: 99 90 97
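The mean, dissolved, and recovered figures are summaries of the per-run Jaccard similarities for each cluster. A minimal sketch of that tally follows; the function name, the five example similarities, and the thresholds (at most 0.5 counts as dissolved, above 0.75 as recovered) are assumptions for illustration, not values confirmed by the slides.

```python
from statistics import mean

def summarize_stability(similarities, dissolve_at=0.5, recover_above=0.75):
    """Summarize one cluster's Jaccard similarities across resampling runs."""
    return {
        "mean": mean(similarities),
        "dissolved": sum(s <= dissolve_at for s in similarities),
        "recovered": sum(s > recover_above for s in similarities),
    }

# Hypothetical similarities for one cluster over five resampling runs
runs = [1.0, 0.9, 0.8, 0.4, 0.7]
summary = summarize_stability(runs)
```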
22. Describing the clusters
24. This solution looks great, so we'd expect the same results with Divisive Hierarchical Clustering, right?
Close, but not quite
- Divisive Coefficient: 0.87
- Cophenetic Correlation: 0.8138967
- Comparable to our findings with Average Linkage
25. Plot 26 looks like a moderate outlier in its canopy cover, which may explain the strange grouping in the Divisive method
26. Hierarchical or Non-hierarchical?
- Consistent clustering with both
- HC more common for small sample size
- But small sample size can be used in either
- NHC makes the most sense ecologically
- Data is not hierarchical in nature
27. Ecological Importance
Open Area
Understory
- 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
- 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
28. Ecological Importance
- Understory/open area cluster by light
- Understory is broken apart by soil moisture and pH
- Wet acidic
- Dry basic
29. Ecological Importance
- The 2 initial designations are not the full picture!
- Microhabitats in understory
- Important to measure several abiotic
characteristics