Title: Best Practices vs. Misuse of PCA in the Analysis of Climate Variability
1 Best Practices vs. Misuse of PCA in the Analysis of Climate Variability
Bob Livezey, Climate Services / Office of Services / NWS / NOAA
30th Climate Diagnostics and Prediction Workshop, State College, PA, October 26, 2005
2 Outline
- Motivation, take-home messages, and references
- Preprocessing considerations
- S-mode example: mathematics, characteristics, interpretation, testing, and truncation
- Rotation: benefits and truncation considerations
- Conclusions
3 Eigenvector-Based Linear Techniques
- Dealing simultaneously with many time series
- Principal Component Analysis (PCA): efficient representation of the information in multiple time series (time series of gridded maps)
- Rotation: linear transformation of PCA and other eigenvector-based methods to improve the representation
- Canonical Correlation Analysis (CCA): one of the better ways to efficiently represent linearly the relationships between two different time series of gridded maps (say, 500 mb heights and surface temperatures)
4 Take-Home Messages
- PCA is an extremely useful linear tool for data compression, orthogonalization, and filtering.
- PCA results are mathematical and (even for the first mode) don't necessarily have physical relevance.
- Even when the first mode has physical relevance, its representation may be flawed (e.g., the Arctic Oscillation).
- PCA results can be critically impacted by choices of domain, grid, scaling, etc.
- Effective PC truncation requires insight and experimentation.
- Rotation can enhance physical relevance and reduce sampling variability.
- Under- and over-rotation can negate these gains.
- Just because an area on a map has a closed loading contour doesn't make it part of a dipole or tripole.
5 REFERENCES FOR BASIC PCA AND RPCA
- Barnston, A. G., and R. E. Livezey, 1987: Classification, seasonality, and persistence of low-frequency atmospheric circulation patterns. Mon. Wea. Rev., 115, 1083-1126.
- Huth, R., 2006: The effect of various methodological options on the detection of leading modes of sea level pressure variability. Tellus, under revision.
- Jolliffe, I. T., 1995: Rotation of principal components: choice of normalization constraints. J. Appl. Statistics, 22, 29-35.
- Livezey, R. E., and T. M. Smith, 1999b: Considerations for use of the Barnett and Preisendorfer (1987) algorithm for canonical correlation analysis of climate variations. J. Climate, 12, 303-305.
- North, G. R., T. L. Bell, and R. F. Cahalan, 1982: Sampling errors in the estimation of empirical orthogonal functions. Mon. Wea. Rev., 110, 699-706.
- O'Lenic, E., and R. E. Livezey, 1988: Practical considerations in the use of rotated principal components analysis (RPCA) in diagnostic studies of upper-air height fields. Mon. Wea. Rev., 116, 1682-1689.
- Richman, M. B., 1986: Rotation of principal components. J. Climatology, 6, 293-335.
- Richman, M. B., and P. J. Lamb, 1985: Climatic pattern analysis of 3- and 7-day summer rainfall in the central United States: Some methodological considerations and a regionalization. J. Clim. Appl. Meteor., 24, 1325-1343.
6 Preparing Data
- 1. Preprocessing often has a major impact on results and their interpretation.
- 2. PCA results are inherently domain dependent, as I will illustrate later.
- 3. Standardization means each record has equal weight in variance-based multivariate analyses (e.g., high latitudes vs. tropics, January vs. November).
  - If this is desirable, the PCA should be based on the correlation matrix; if not, on the covariance matrix.
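The choice in item 3 can be illustrated with a minimal NumPy sketch. The data are synthetic and hypothetical: three records, one of which is given ten times the standard deviation of the others to mimic a high-variance region. The variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: n = 200 times, m = 3 records; the third record has
# ten times the standard deviation of the other two.
n, m = 200, 3
z = rng.standard_normal((n, m)) * np.array([1.0, 1.0, 10.0])

anom = z - z.mean(axis=0)              # remove period-of-record means
std = anom / anom.std(axis=0, ddof=1)  # standardize each record

cov = anom.T @ anom / (n - 1)          # covariance matrix
cor = std.T @ std / (n - 1)            # correlation matrix

# np.linalg.eigh returns eigenvalues in ascending order, so the leading
# eigenvector is the last column.
_, vec_cov = np.linalg.eigh(cov)
_, vec_cor = np.linalg.eigh(cor)
lead_cov = vec_cov[:, -1]
lead_cor = vec_cor[:, -1]

print("leading covariance eigenvector: ", np.round(lead_cov, 2))
print("leading correlation eigenvector:", np.round(lead_cor, 2))
```

In the covariance-based analysis the high-variance record dominates the leading pattern; after standardization every record enters with unit variance.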
7 Preparing Data
- 4. PCA should be performed on as narrow a window in the seasonal cycle as sample considerations permit, to avoid mixing inhomogeneous climates (like the January vs. November example in 3 above).
- 5. Area-averaged or gridded data often must be weighted in multivariate analyses:
  - Smaller areas can influence results as much as larger ones.
  - On lat/lon grids the density of points (and their influence) increases with latitude.
8 Preparing Data
- 5 (continued). Two ways to treat the problem:
  - Create an approximately equal-area representation (e.g., CPC megadivisions; the Barnston and Livezey, 1987, grid).
  - Weight the data, generally in proportion to the square root of the area.
9 Preparing Data
- 5 (continued). If weights are needed and PCA on the correlation matrix is the objective, then standardization should be performed before weighting, and the covariance matrix should then be formed from the weighted data. Otherwise the weights are removed in the standardization step.
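The ordering issue in item 5 can be checked numerically. The sketch below assumes a regular lat/lon grid, where grid-box area is proportional to the cosine of latitude, so the weight is taken as the square root of cos(latitude). The grid, the data, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical regular lat/lon grid: 4 latitudes x 6 longitudes, 120 times.
lats = np.array([20.0, 40.0, 60.0, 80.0])
nlon, n = 6, 120
lat_of_point = np.repeat(lats, nlon)            # latitude of each grid point
z = rng.standard_normal((n, lats.size * nlon))  # placeholder data (time, point)

# Grid-box area on a regular lat/lon grid is proportional to cos(latitude),
# so the weight is taken proportional to sqrt(cos(latitude)).
w = np.sqrt(np.cos(np.deg2rad(lat_of_point)))

anom = z - z.mean(axis=0)
std = anom / anom.std(axis=0, ddof=1)

# Recommended order: standardize, then weight, then form the covariance matrix.
zw = std * w
cov_weighted = zw.T @ zw / (n - 1)

# Reverse order: weight, then standardize. The weights cancel out.
wa = anom * w
undone = wa / wa.std(axis=0, ddof=1)

print(np.allclose(undone, std))   # weighting before standardizing has no effect
```

With the recommended order, the diagonal of `cov_weighted` equals the squared weights, so each point contributes variance in proportion to the area it represents.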
10 Preparing Data
- 6. In EPCA (see below), CCA, etc., maps of variables with greater numbers of data points will have disproportionate influence on the results unless the maps are weighted, e.g., in proportion to the square root of the ratio of the total variance in all variables to the total variance in the weighted variable (see Livezey and Smith, 1999b).
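One possible reading of the weighting in item 6 is sketched below for two hypothetical fields on grids of different size. This is an illustration under that assumed reading, not the algorithm of Livezey and Smith (1999b); all names and data are made up.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100

# Two hypothetical anomaly fields (time, point): field A has 50 points,
# field B only 5, so A would dominate an unweighted combined analysis.
field_a = rng.standard_normal((n, 50))
field_b = rng.standard_normal((n, 5))
fields = [f - f.mean(axis=0) for f in (field_a, field_b)]

# Total variance of each field = sum of its point variances.
total_var = np.array([f.var(axis=0, ddof=1).sum() for f in fields])

# Assumed weighting: sqrt(total variance of all fields / own total variance).
weights = np.sqrt(total_var.sum() / total_var)
weighted = [wk * f for wk, f in zip(weights, fields)]

# After weighting, every field carries the same total variance.
weighted_var = np.array([f.var(axis=0, ddof=1).sum() for f in weighted])

combined = np.hstack(weighted)   # input for a combined analysis
print(np.round(weights, 2), np.round(weighted_var, 2))
```

The field with fewer points receives the larger weight, which is what keeps the larger grid from dominating.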
11 Principal Component Analysis
- 1. Used principally for data compression and filtering, often as a first step to other analyses; direct physical interpretation is VERY limited.
- 2. The form most commonly used in climate studies (S-mode) starts with n (t = 1, ..., n) maps or groups of maps z with m data points x and the period-of-record means removed: z(x,t).
- 3. The maps are decomposed into a linear combination of map patterns: the first pattern explains the most variance, the second is orthogonal to the first and explains the second most variance, etc.
12 Principal Component Analysis
- $z(x,t) = \sum_{i=1}^{N} a_i(t)\, e_i(x)$, with $N \le \min(m, n)$
- z(x,t): original maps, linear combinations of fixed patterns e_i(x) with time-dependent weights a_i(t)
- a_i(t): principal component scores (time series), the projections of the maps onto the eigenvectors
- e_i(x): principal component loadings (map patterns), also eigenvectors of the covariance matrix of z
- λ_i: eigenvalues of the covariance matrix of z
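The S-mode decomposition defined on this slide can be sketched with a singular value decomposition of the anomaly matrix. This is a minimal illustration on synthetic data, assuming the data are arranged as a (time, point) array; the SVD route is one common way to obtain the eigenvectors of the covariance matrix, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 60, 8                       # n maps in time, m data points per map
z = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))
z = z - z.mean(axis=0)             # remove period-of-record means

# SVD of the anomaly matrix: z = U S Vt.
u, s, vt = np.linalg.svd(z, full_matrices=False)

e = vt                             # e[i] is the pattern e_i(x); rows are orthonormal
a = u * s                          # a[:, i] is the score series a_i(t)
lam = s**2 / (n - 1)               # eigenvalues of the covariance matrix of z

z_rebuilt = a @ e                  # sum over i of a_i(t) e_i(x)

# Truncation: keep only the k leading modes.
k = 3
z_trunc = a[:, :k] @ e[:k]
explained = lam[:k].sum() / lam.sum()
print(f"{k} modes retain {explained:.1%} of the variance")
```

Keeping all modes reproduces the original maps exactly; `z_trunc` is the compressed or filtered version obtained by retaining only the leading `k` patterns.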
13 Principal Component Analysis
- 4. Example: the first four patterns of 3-day precipitation for May-August over the central US (Richman and Lamb, 1985). This sequence of patterns is seen repeatedly in other analyses and can be considered an artifact of the geometry of PCA.
14 Principal Component Analysis
- 5. All of the patterns (the e_i) are orthogonal, and the leading ones reflect the data points with the most variance. The eigenvalues give these variances: the first four for the Richman and Lamb patterns are 11.13, 9.33, 5.55, and 4.54.
- 6. Usually (always when the PCA is on the correlation matrix) the numbers on the maps are correlations of the original data series with the corresponding scores, so their squares represent explained variance. Thus, in the latter context:
  - (a) a point with 0.5 is more than 6 times more important than a point with 0.2, a point with 0.8 more than 7 times more important than one with 0.3, etc.;
  - (b) summations of the squares over the maps give the total variances listed in 5 above;
  - (c) comparing the squared central values within closed contours allows practical discrimination between monopoles, dipoles, etc.
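Points (a) and (b) can be verified on synthetic data. The sketch below assumes a PCA on the correlation matrix with the eigenvectors scaled by the square root of their eigenvalues, which is the scaling under which the mapped numbers equal correlations. The data and names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 300, 6

# Hypothetical data: one shared signal with a spatial pattern, plus noise.
signal = rng.standard_normal(n)
pattern = np.linspace(-1.0, 1.0, m)
z = np.outer(signal, pattern) + 0.5 * rng.standard_normal((n, m))

# Standardize so that the PCA is on the correlation matrix.
zs = (z - z.mean(axis=0)) / z.std(axis=0, ddof=1)
cor = zs.T @ zs / (n - 1)

lam, vec = np.linalg.eigh(cor)
order = np.argsort(lam)[::-1]          # sort modes by descending eigenvalue
lam, vec = lam[order], vec[:, order]

scores = zs @ vec                      # a_i(t) in column i
loadings = vec * np.sqrt(lam)          # eigenvectors scaled by sqrt(eigenvalue)

# Correlation of every data series (row j) with every score series (column i).
corr = np.array([[np.corrcoef(zs[:, j], scores[:, i])[0, 1]
                  for i in range(m)] for j in range(m)])

# (a) relative importance is the ratio of squared loadings.
ratio_a = 0.5**2 / 0.2**2              # 6.25
ratio_b = 0.8**2 / 0.3**2              # about 7.11

# (b) squared loadings summed over the map give the eigenvalue of each mode.
var_per_mode = (loadings**2).sum(axis=0)
```

Here `corr` matches `loadings` and `var_per_mode` matches `lam`, so the scaled loadings can be read directly as correlations and their squares as explained variance.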
15 Principal Component Analysis
- 7. The time series that go with the patterns (the a_i) are uncorrelated (i.e., not collinear), so they are desirable for multiple linear regression.
- 8. To compress or filter the data some of the patterns must be thrown out, i.e., the series must be truncated. This is an ART (see O'Lenic and Livezey, 1988, for the best approach I know).
  - In these applications over-truncation (throwing the baby out with the bath water) is of far more concern than under-truncation (retention of some noise). As a pre-step for rotation, CCA, etc., both should be of concern (see below).
16 Principal Component Analysis
- 9. Physical interpretation of other than the leading PC pattern is usually unwarranted, and this is often the case for the first as well. Richman (1986) shows this for the precipitation example in two ways. First he splits the domain in two and does a separate PCA on each. Here's the result for the first PCA mode. Note that the first mode for the southern domain (a monopole covering the domain) is not reproduced in the full-domain analysis.
17 Principal Component Analysis
- Next he computes the one-point teleconnection pattern for the largest loading on each pattern. Here's the result for the second PCA mode. The PCA mode is a dipole; the teleconnection pattern (reflecting the physical covariance structure around the point) is a monopole.
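A one-point teleconnection pattern of the kind described here is the correlation between the series at a chosen base point and the series at every point on the map. The following sketch computes one for the point with the largest loading on the second PCA mode, using synthetic, spatially smoothed data; it is an illustration, not a reproduction of Richman's analysis.

```python
import numpy as np

rng = np.random.default_rng(5)
n, m = 200, 10

# Hypothetical anomalies, smoothed along the point axis so neighbours co-vary.
noise = rng.standard_normal((n, m + 2))
z = noise[:, :-2] + noise[:, 1:-1] + noise[:, 2:]
z = z - z.mean(axis=0)

# Second PCA pattern from the SVD of the anomalies.
_, _, vt = np.linalg.svd(z, full_matrices=False)
mode = vt[1]
base = int(np.argmax(np.abs(mode)))      # point with the largest loading

# One-point teleconnection pattern: correlation of the base point's series
# with the series at every point.
zs = z / z.std(axis=0, ddof=1)
telecon = zs.T @ zs[:, base] / (n - 1)

print("base point:", base)
print("teleconnection pattern:", np.round(telecon, 2))
```

Comparing the sign structure of `mode` and `telecon` around the base point is the check described on this slide.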
18 Principal Component Analysis
- 10. The North et al. (1982) test determines whether two consecutive patterns can reasonably be interpreted as distinct patterns or separate signals. It assumes the n samples are independent (heuristically adjust n downward for dependence).
- 11. Other kinds of PCA:
  - Combined (CPCA): more than one mapped variable
  - Extended (EPCA): group of maps of the same variable at different lags to capture pattern evolution (MSSA is a variant)
  - Rotated (RPCA): to reduce sampling error and improve physical representativeness
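The North et al. (1982) test in item 10 is usually applied through its rule of thumb: the sampling error of an eigenvalue is roughly the eigenvalue times the square root of 2/n, and two neighbouring modes are hard to separate when that error is comparable to or larger than the gap between them. The sketch below uses the four eigenvalues quoted earlier purely as input numbers. The sample sizes are hypothetical, the function name is made up, and comparing the gap with the error of the larger eigenvalue is one common reading of the rule, so the output says nothing about the actual Richman and Lamb analysis.

```python
import numpy as np

def north_separated(lam, n_eff):
    """Rule-of-thumb sampling errors and separation flags for neighbouring modes.

    lam   : eigenvalues in descending order
    n_eff : number of independent samples (reduce n if samples are dependent)
    """
    lam = np.asarray(lam, dtype=float)
    err = lam * np.sqrt(2.0 / n_eff)     # approximate sampling error of each eigenvalue
    gap = lam[:-1] - lam[1:]             # spacing between neighbouring eigenvalues
    # Pair (i, i+1) is flagged as separated when the gap exceeds the error of lam[i].
    return err, gap > err[:-1]

lam = [11.13, 9.33, 5.55, 4.54]          # values quoted on the earlier slide

# Hypothetical sample sizes, chosen only to show the effect of adjusting
# n downward for dependence.
err_100, sep_100 = north_separated(lam, n_eff=100)
err_50, sep_50 = north_separated(lam, n_eff=50)

print("n_eff = 100:", sep_100)
print("n_eff = 50: ", sep_50)
```

Halving the assumed number of independent samples is enough to change the verdict for two of the three pairs, which is why the downward adjustment for dependence matters.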
19 Rotation
- 1. Rotation, i.e., the linear transformation of a truncated set of patterns (Richman, 1986), should be considered in many problems when patterns with minimum sampling variability, little domain dependence, and increased physical relevance are needed.
- 2. Note the robustness of rotated patterns in Richman's split-domain example (all patterns are present in both analyses).
20 Rotation
- Now compare rotated mode 2 and its corresponding teleconnection pattern (both are monopoles with similar scales).
21 Rotation
- 3. Barnston and Livezey (1987) compared 120 monthly 700 mb height PCA and RPCA patterns with their corresponding one-point teleconnection patterns: the average pattern correlation was 0.69 and 0.90, respectively. They also used sensitivity tests to demonstrate dramatic reductions in sampling error.
22 Barnston and Livezey (1987) RPCA Patterns
- Pacific North America
- North Atlantic Oscillation (a dipole!)
- Western Pacific Oscillation
- Tropical Northern Hemisphere
23 Rotation
- 4. The most likely reason for the success of rotation is the relaxation of the geometrical and mathematical constraints on the analysis, i.e., the data can speak more for itself.
  - In a commonly used variant of varimax where the eigenvectors are weighted by the square root of the eigenvalue, the resulting patterns do not have to be orthogonal and the resulting time series do not have to be independent (Jolliffe, 1995).
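A varimax rotation of a truncated set of patterns can be sketched as follows. This uses the widely circulated SVD-based iteration for the varimax criterion, applied to eigenvectors weighted by the square root of their eigenvalues as in the variant mentioned above. It is a minimal illustration on synthetic data, not the specific implementation used in the cited studies; the function name, the truncation point k = 3, and the data are assumptions.

```python
import numpy as np

def varimax(loadings, max_iter=200, tol=1e-10):
    """Orthogonal varimax rotation of an (m points, k modes) loading matrix."""
    m, k = loadings.shape
    rot = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        lr = loadings @ rot
        # Gradient-like term of the varimax criterion.
        grad = loadings.T @ (lr**3 - lr * (lr**2).sum(axis=0) / m)
        u, s, vt = np.linalg.svd(grad)
        rot = u @ vt                     # nearest orthogonal matrix
        crit = s.sum()
        if crit_old != 0.0 and crit / crit_old < 1.0 + tol:
            break
        crit_old = crit
    return loadings @ rot, rot

rng = np.random.default_rng(6)
n, m, k = 150, 12, 3                     # k = number of modes kept before rotation
z = rng.standard_normal((n, m)) @ rng.standard_normal((m, m))
z = z - z.mean(axis=0)

_, s, vt = np.linalg.svd(z, full_matrices=False)
lam = s**2 / (n - 1)
unrotated = vt[:k].T * np.sqrt(lam[:k])  # eigenvectors weighted by sqrt(eigenvalue)

rotated, rot = varimax(unrotated)

gram_unrot = unrotated.T @ unrotated     # diagonal: unrotated patterns are orthogonal
gram_rot = rotated.T @ rotated           # off-diagonal terms are generally nonzero
print(np.round(gram_rot, 3))
```

The rotation matrix itself is orthogonal, so the variance retained at each point is unchanged; what changes is how that variance is shared among the k patterns, and with this weighting the rotated patterns are in general no longer mutually orthogonal.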
24 Under- and Over-Rotation
- 5. Under-rotation (truncation of too many modes) can result in discarded signal, while over-rotation (truncation of too few) can result in over-regionalization of signals (see O'Lenic and Livezey, 1988).
- Map (a) here is a dipole, but (b) and (c) are monopoles.
25 Conclusions
- PCA is an extremely useful linear tool for data compression, orthogonalization, and filtering.
- PCA results are mathematical and (even for the first mode) don't necessarily have physical relevance.
- Even when the first mode has physical relevance, its representation may be flawed (e.g., the Arctic Oscillation).
- PCA results can be critically impacted by choices of domain, grid, scaling, etc.
- Effective PC truncation requires insight and experimentation.
- Rotation can enhance physical relevance and reduce sampling variability.
- Under- and over-rotation can negate these gains.
- Just because an area on a map has a closed loading contour doesn't make it part of a dipole or tripole.