Title: Discovery of Climate Indices Using Clustering
1. Discovery of Climate Indices Using Clustering
Michael Steinbach, Vipin Kumar (University of Minnesota / AHPCRC)
Pang-Ning Tan (Michigan State University)
Chris Potter (NASA Ames Research Center)
Steven Klooster (California State University, Monterey Bay)
NASA-funded project: Discovery of Changes from the Global Carbon Cycle and Climate System Using Data Mining
Additional support from the Army High Performance Computing Research Center
2. Overview
- Background
- Climate indices
- Techniques for Discovering Climate Indices
  - Traditional approaches
  - Clustering
- Results
- Conclusion
3. Research Goal
[Figure: Average Monthly Temperature]
- Research Goal
  - Find global climate patterns of interest to Earth Scientists.
  - A key interest is finding connections between the ocean / atmosphere and the land.
- Global snapshots of values for a number of variables on land surfaces or water.
  - The snapshots span a range of 10 to 50 years.
  - Gridded data
4. The El Niño Climate Phenomenon
- El Niño is the anomalous warming of the eastern tropical region of the Pacific.
Normal Year: Trade winds push warm ocean water west; cool water rises in its place.
El Niño Year: Trade winds ease, switch direction; warmest water moves east.
http://www.usatoday.com/weather/tg/wetnino/wetnino.htm
7. Climate Indices: Connecting the Ocean/Atmosphere and the Land
- A climate index is a time series of temperature or pressure
  - Similar to business or economic indices
  - Based on Sea Surface Temperature (SST) or Sea Level Pressure (SLP)
- Climate indices are important because
  - They distill climate variability at a regional or global scale into a single time series.
  - They are well-accepted by Earth scientists.
  - They are related to well-known climate phenomena such as El Niño.
[Figure: Dow Jones Index (from Yahoo)]
8. A Temperature-Based Climate Index: NINO 1+2
[Figure: Nino 1+2 Index, with El Niño events marked]
9. A Pressure-Based Climate Index: SOI
- The Southern Oscillation Index (SOI) is also associated with El Niño.
  - Defined as the normalized pressure difference between Tahiti and Darwin, Australia.
- Both temperature-based and pressure-based indices capture the same El Niño climate phenomenon.
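The "normalized pressure difference" in the SOI definition can be illustrated with a minimal sketch. It assumes two equal-length series of monthly sea level pressure, here called `tahiti` and `darwin` (hypothetical names, made-up values), and standardizes their difference to zero mean and unit variance. Operational SOI definitions add further steps, such as standardizing by calendar month, which this sketch omits.

```python
from statistics import mean, pstdev

def normalized_difference(tahiti, darwin):
    """Standardize the Tahiti-minus-Darwin pressure difference."""
    if len(tahiti) != len(darwin):
        raise ValueError("series must have the same length")
    diff = [t - d for t, d in zip(tahiti, darwin)]
    mu = mean(diff)
    sigma = pstdev(diff)  # population standard deviation of the difference
    if sigma == 0:
        raise ValueError("pressure difference is constant")
    return [(x - mu) / sigma for x in diff]

# Made-up monthly sea level pressures in hPa, for illustration only.
tahiti = [1012.1, 1011.4, 1013.0, 1010.2, 1012.8, 1011.9]
darwin = [1009.8, 1010.6, 1008.9, 1011.5, 1009.1, 1010.0]
index = normalized_difference(tahiti, darwin)
```

Months in which Tahiti pressure is unusually low relative to Darwin produce negative values of the resulting series.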
10. List of Well-Known Climate Indices
12. Discovering Climate Indices: Traditional Approaches
- Earth scientists have discovered currently known climate indices.
  - Observation
    - The El Niño phenomenon was first noticed by Peruvian fishermen centuries ago.
    - They observed that in some years the warm southward current, which appeared around Christmas, would persist for an unusually long time, with a disastrous impact on fishing.
  - Eigenvalue techniques such as Singular Value Decomposition (SVD).
13. Discovering Climate Indices via SVD
- At a high level, SVD decomposes a matrix into two sets of patterns
  - A set of spatial patterns
  - A set of temporal patterns
- We applied SVD to the global Sea Surface Temperature (SST) data
  - MATLAB command: [u, s, v] = svds(z_sst, 30)
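In matrix terms, the truncated decomposition can be written as below. This is a sketch that assumes the rows of the data matrix $Z$ (`z_sst`) index time steps and the columns index grid points; the slides do not state the orientation, and the roles of $U_k$ and $V_k$ swap if the matrix is transposed. The symbols $m$, $n$, and $k$ are introduced here for illustration.

```latex
Z \approx U_k S_k V_k^{\top}, \qquad
Z \in \mathbb{R}^{m \times n},\quad
U_k \in \mathbb{R}^{m \times k},\quad
S_k = \operatorname{diag}(\sigma_1, \dots, \sigma_k),\quad
V_k \in \mathbb{R}^{n \times k},\quad
\sigma_1 \ge \dots \ge \sigma_k \ge 0,\quad k = 30
```

Under that assumption, column $i$ of $U_k$ is a temporal pattern (a candidate index time series) and column $i$ of $V_k$ is the matching spatial pattern.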
14. Limitations of SVD
- Eigenvector analysis techniques have some well-known limitations
  - Components (patterns) must be orthogonal, making physical interpretation difficult.
  - Stronger patterns tend to hide weaker patterns.
  - Earth Scientists select the regions of interest
    - Requires domain knowledge
16. Discovering Climate Indices via Data Mining
- Clustering provides an alternative approach for finding candidate indices.
  - Clusters represent ocean regions with relatively homogeneous behavior.
  - The centroids of these clusters are time series that summarize the behavior of these ocean areas and thus represent potential climate indices.
- Need to evaluate the influence of potential indices on land points.
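A cluster centroid, read as a candidate index, can be sketched as the point-wise mean of the member time series. The sketch assumes each cluster member is an equal-length list of monthly values; the name `cluster_members` and its values are hypothetical.

```python
def centroid(series_list):
    """Point-wise mean of equal-length time series."""
    if not series_list:
        raise ValueError("cluster has no members")
    length = len(series_list[0])
    if any(len(s) != length for s in series_list):
        raise ValueError("all series must have the same length")
    n = len(series_list)
    return [sum(s[t] for s in series_list) / n for t in range(length)]

# Hypothetical cluster: three ocean grid points, four monthly values each.
cluster_members = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.0],
    [3.0, 4.0, 5.0, 6.0],
]
candidate_index = centroid(cluster_members)
```

The result is itself a time series of the same length as the members, so it can be correlated against land time series in the same way as a known index.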
17. Shared Nearest Neighbor (SNN) Clustering
- Density-based clustering approach
- Determine the density of each point (time series)
  - Density is high if most of a point's neighbors also have that point as a neighbor
- Perform the clustering using the density
  - Identify and eliminate noise and outliers, which are points with low density.
  - Identify core points, which are time series with high density.
  - Build clusters around the core points.
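The density step can be illustrated with a simplified sketch; it is not the implementation behind these slides. It assumes Pearson correlation as the similarity between time series, and the parameters `k` (neighbor list size), `eps` (minimum number of shared neighbors), and the core and noise thresholds are hypothetical choices. The final step of building clusters around core points is omitted.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def knn_lists(series, k):
    """For each series, the set of indices of its k most correlated series."""
    n = len(series)
    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: pearson(series[i], series[j]), reverse=True)
        result.append(set(others[:k]))
    return result

def snn_density(series, k, eps):
    """Per point, count mutual neighbors that share at least eps neighbors."""
    nn = knn_lists(series, k)
    density = []
    for i in range(len(series)):
        count = 0
        for j in nn[i]:
            # j counts only if i is also in j's list (mutual neighbors)
            if i in nn[j] and len(nn[i] & nn[j]) >= eps:
                count += 1
        density.append(count)
    return density

# Made-up data: two homogeneous groups of three series and one outlier.
series = [
    [1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 7], [0, 2, 3, 4, 5, 6],
    [1, -1, 1, -1, 1, -1], [2, -1, 1, -1, 1, -1], [1, -1, 1, -1, 1, -2],
    [3, 1, 4, 1, 5, 9],
]
density = snn_density(series, k=2, eps=1)
core = [i for i, d in enumerate(density) if d >= 2]   # high density
noise = [i for i, d in enumerate(density) if d == 0]  # low density
```

In this toy data the outlier picks two neighbors, but neither lists it in return, so its density is zero and it is treated as noise.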
18. SNN Density of SLP Time Series
[Figure: SNN density map] Redder areas are high density, i.e., high homogeneity.
19. SLP Clusters
[Figure: 25 SLP Clusters]
20. Influence of Climate Indices on Land: Area-Weighted Correlation
- For each grid point, compute the correlation of the climate index with a time series representing the temperature at that point.
  - Use absolute correlation
- The area-weighted correlation is the weighted average of these correlations.
  - The weights are the areas of the grid cells.
  - The area of a grid cell varies by latitude.
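The computation described above can be sketched as follows. The sketch assumes a regular latitude-longitude grid, so that the area of a cell is proportional to the cosine of its latitude; the slides do not say how cell areas were computed. The function names and the sample values are hypothetical.

```python
from math import cos, radians, sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def area_weighted_correlation(index, land_series, latitudes):
    """Weighted mean of |correlation|, weighting each cell by cos(latitude)."""
    weights = [cos(radians(lat)) for lat in latitudes]
    corrs = [abs(pearson(index, s)) for s in land_series]
    return sum(w * c for w, c in zip(weights, corrs)) / sum(weights)

# Made-up example: one index and three land grid cells.
index = [1, 2, 3, 4, 5]
land_series = [
    [2, 4, 6, 8, 10],   # perfectly correlated with the index
    [5, 4, 3, 2, 1],    # perfectly anti-correlated (absolute value is 1)
    [1, -2, 2, -2, 1],  # uncorrelated with the index
]
latitudes = [0.0, 60.0, 60.0]  # degrees; cells at 60 degrees get half weight
score = area_weighted_correlation(index, land_series, latitudes)
```

Taking the absolute value means that strong negative correlations count as much as strong positive ones, which matches the "use absolute correlation" bullet.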
21. Baseline for Area-Weighted Correlation
- Need to establish what level of area-weighted correlation is significant
  - Baseline based on correlation of random time series to land temperature
  - Typical values of current indices
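A random baseline of this kind can be sketched as below. The slides do not say how the random time series were generated; Gaussian white noise is an assumption made here, as are the cosine-of-latitude cell weights, the number of trials, and all names and sample values.

```python
import random
from math import cos, radians, sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def area_weighted_correlation(index, land_series, latitudes):
    """Weighted mean of |correlation|, weighting each cell by cos(latitude)."""
    weights = [cos(radians(lat)) for lat in latitudes]
    corrs = [abs(pearson(index, s)) for s in land_series]
    return sum(w * c for w, c in zip(weights, corrs)) / sum(weights)

def random_baseline(land_series, latitudes, n_trials=200, seed=0):
    """Area-weighted correlations of white-noise series with the land data."""
    rng = random.Random(seed)  # seeded so the baseline is reproducible
    length = len(land_series[0])
    scores = []
    for _ in range(n_trials):
        noise = [rng.gauss(0.0, 1.0) for _ in range(length)]
        scores.append(area_weighted_correlation(noise, land_series, latitudes))
    return scores

# Made-up land temperature series (12 time steps) and cell latitudes.
land_series = [
    [0.5, 1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0, 7.9, 9.2, 10.1, 10.8],
    [11.0, 9.8, 9.1, 8.2, 6.9, 6.1, 5.0, 3.8, 3.1, 2.2, 0.9, 0.1],
    [2.0, 2.5, 2.9, 3.6, 4.1, 4.4, 5.2, 5.5, 6.1, 6.4, 7.0, 7.6],
]
latitudes = [10.0, 40.0, 65.0]
candidate = [float(t) for t in range(12)]  # hypothetical candidate index

scores = random_baseline(land_series, latitudes)
baseline_mean = sum(scores) / len(scores)
candidate_score = area_weighted_correlation(candidate, land_series, latitudes)
```

A candidate index is of interest only if its score is clearly above what the random series achieve; with short series, even noise produces non-trivial absolute correlations, which is why a baseline is needed.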
22. Overview
- Background
- Climate indices
- Techniques for Discovering Climate Indices
  - Traditional approaches
  - Clustering
- Results: SST-based Clusters
- Conclusion
23. SST Clusters
24. Evaluating Cluster Centroids as Potential Climate Indices
- Evaluation will be based on area-weighted correlation
  - Ignore clusters whose area-weighted correlation is low.
[Figure: SST Clusters With Area-Weighted Correlation > 0.1]
25. SST Clusters That Reproduce El Niño Indices
[Figure: SST clusters labeled 75, 78, 67, 94; El Niño Regions Defined by Earth Scientists]
26. An SST Cluster Moderately Correlated to Known Indices
[Figure: Cluster 29 vs. Known El Niño Climate Indices (Nino 1+2, Nino 3, Nino 3.4, Nino 4, and SOI)]
27. Overview
- Background
- Climate indices
- Techniques for Discovering Climate Indices
  - Traditional approaches
  - Clustering
- Results: SVD vs. Clusters vs. Known Indices
- Conclusion
28. SVD Approach
- Found the top 30 SVD components for SST and SLP
29. Correlation of Known Indices with SST Cluster Centroids and SVD Components
SST-based cluster centroids have better correlation to known indices than SVD-based indices in all but one case. Red indicates higher magnitude of correlation.
30. Area-Weighted Correlation of Known Indices with SST Cluster Centroids and SVD Components
SST-based cluster centroids have higher area-weighted correlation than SVD-based indices and known indices in most cases. Red indicates higher correlation.
31. Conclusions and Future Work
- We have demonstrated that clustering is a viable alternative to eigenvalue-based approaches for the discovery of climate indices.
  - Can replicate many well-known climate indices
  - Have also discovered variants of known indices that may be better for some regions
  - Some indices may represent new Earth Science phenomena
  - No need for discovered indices to be orthogonal
  - No need to pre-select the area to analyze
- Future Work
  - Investigation of candidate indices by Earth Scientists
  - Investigate whether there are climate indices that cannot be represented by clusters
  - Noise elimination
  - Aggregation