Title: Jery R. Stedinger
1Regionalization of Statistics Describing the
Distribution of Hydrologic Extremes
SAMSI Workshop 23 January 2008
- Jery R. Stedinger
- Cornell University
- Research with G. Tasker, E. Martins, D. Reis, A.
Gruber, V. Griffis, D.I. Jeong and Y.O. Kim
2Extreme Value Theory and Hydrology
- Annual maximum flood may be the daily maximum or the instantaneous maximum.
- Annual maximum 24-hour rainfall may be the daily maximum or the maximum of 1440-minute values.
- Annual maximums are not the maximum of an I.I.D. series:
- Years have definite wet and dry seasons
- Daily values are correlated
- Because of El Niño and atmospheric patterns, some years are extreme-event prone, others are not.
- Peaks-over-threshold (partial duration series, PDS) is another alternative.
3Outline
- Summarizing Data: Moments and L-moments
- Parameter estimation for GEV
- Use of a prior on $\kappa$
- PDS versus AMS with GMLEs
- Bayesian GLS Regression for regionalization
- Concluding observations
5Definitions: Product-Moments
- Mean, measure of location
- $\mu_x = E[X]$
- Variance, measure of spread
- $\sigma_x^2 = E[(X - \mu_x)^2]$
- Coef. of Skewness, asymmetry
- $\gamma_x = E[(X - \mu_x)^3] / \sigma_x^3$
6Conventional Moment Ratios
- Conventional descriptions of shape are
- Coefficient of Variation, $CV = \sigma / \mu$
- Coefficient of skewness, $\gamma = E[(X - \mu)^3] / \sigma^3$
- Coefficient of kurtosis, $\kappa = E[(X - \mu)^4] / \sigma^4$
7Samples drawn from a Gumbel distribution.
8L-Moments
- An alternative to product moments
- now widely used in hydrology.
9L-Moments an alternative
- L-moments can summarize data as do conventional moments, using linear combinations of the ordered observations.
- Because L-moments avoid squaring and cubing the data, their ratios do not suffer from the severe bias problems encountered with product moments.
- Estimate using order statistics
10L-Moments an alternative
- Let $X_{(i|n)}$ be the ith smallest obs. in a sample of size n, so that $X_{(n|n)}$ is the largest.
- Measure of Scale
- expected difference between largest and smallest observations in sample of 2
- $\lambda_2 = (1/2) E[X_{(2|2)} - X_{(1|2)}]$
- Measure of Asymmetry
- $\lambda_3 = (1/3) E[X_{(3|3)} - 2 X_{(2|3)} + X_{(1|3)}]$
- where $\lambda_3 > 0$ for positively skewed distributions
11L-Moments an alternative
- Measure of Kurtosis
- $\lambda_4 = (1/4) E[X_{(4|4)} - 3 X_{(3|4)} + 3 X_{(2|4)} - X_{(1|4)}]$
- For highly kurtotic distributions, $\lambda_4$ is large.
- For the uniform distribution $\lambda_4 = 0$.
12Dimensionless L-moment ratios
- L-moment Coefficient of variation (L-CV)
- $\tau_2 = \lambda_2/\lambda_1 = \lambda_2/\mu$
- L-moment coef. of skew (L-Skewness)
- $\tau_3 = \lambda_3/\lambda_2$
- L-moment coef. of kurtosis (L-Kurtosis)
- $\tau_4 = \lambda_4/\lambda_2$
- (Note: Hosking calls L-CV $\tau$ instead of $\tau_2$.)
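The slides say only that L-moments are estimated from the order statistics. A minimal sketch of one standard route is given here, assuming the unbiased probability-weighted-moment (PWM) estimators $b_r$ and the usual relations $\lambda_1 = b_0$, $\lambda_2 = 2b_1 - b_0$, $\lambda_3 = 6b_2 - 6b_1 + b_0$, $\lambda_4 = 20b_3 - 30b_2 + 12b_1 - b_0$. The function name and the returned dictionary keys are illustrative, not taken from the talk.

```python
def sample_l_moments(data):
    """Sample L-moments and L-moment ratios from unbiased PWM estimators.

    Illustrative helper (name and return layout are assumptions).
    """
    x = sorted(data)              # x[0] is the smallest observation
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")

    # b[r] = (1/n) * sum_j [(j-1)...(j-r)] / [(n-1)...(n-r)] * x_(j), j = rank
    b = [0.0, 0.0, 0.0, 0.0]
    for j, xj in enumerate(x):    # j = rank - 1
        weight = 1.0
        b[0] += xj
        for r in range(1, 4):
            weight *= (j - r + 1) / (n - r)
            b[r] += weight * xj
    b = [value / n for value in b]

    l1 = b[0]
    l2 = 2.0 * b[1] - b[0]
    l3 = 6.0 * b[2] - 6.0 * b[1] + b[0]
    l4 = 20.0 * b[3] - 30.0 * b[2] + 12.0 * b[1] - b[0]
    return {
        "l1": l1,
        "l2": l2,
        "t2": l2 / l1,   # L-CV
        "t3": l3 / l2,   # L-skewness
        "t4": l4 / l2,   # L-kurtosis
    }
```

For an evenly spaced sample such as `[1, 2, 3, 4, 5]` the L-skewness and L-kurtosis both come out as zero, matching the slide's remark that $\lambda_4 = 0$ for the uniform distribution.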
13Samples drawn from a Gumbel distribution.
14Samples drawn from a Gumbel distribution.
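15Generalized Extreme Value (GEV) distribution
- Gumbel's Type I, II & III Extreme Value distributions
- $F(x) = \exp\{-[1 - (\kappa/\alpha)(x - \xi)]^{1/\kappa}\}$ for $\kappa \ne 0$
- $\kappa$ = shape, $\alpha$ = scale, $\xi$ = location.
- Mostly $-0.3 < \kappa \le 0$
- Others use the shape parameter with the opposite sign, $-\kappa$.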
16GEV Prob. Density Function
17GEV Prob. Density Function large x
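18Simple GEV L-Moment Estimators
- Using L-moments [Hosking, Wallis & Wood (1985)]
- $c = 2/(\tau_3 + 3) - \ln(2)/\ln(3)$, with $\tau_3 = \lambda_3 / \lambda_2$
- then
- $\kappa = 7.8590 c + 2.9554 c^2$ for $|\tau_3| \le 0.5$
- $\alpha = \kappa \lambda_2 / [\Gamma(1+\kappa)(1 - 2^{-\kappa})]$
- $\xi = \lambda_1 + \alpha[\Gamma(1+\kappa) - 1] / \kappa$
- Quantiles
- $x_p = \xi + (\alpha/\kappa)[1 - (-\ln p)^{\kappa}]$
- Method of L-moments simple and attractive.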
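The estimators on this slide translate almost line for line into code. The sketch below uses the slide's sign convention for $\kappa$ and the coefficients 7.8590 and 2.9554 quoted there. The function names and the Gumbel fallback for $\kappa$ near zero are additions for illustration.

```python
import math

EULER_GAMMA = 0.5772156649015329

def gev_from_l_moments(l1, l2, t3):
    """GEV (xi, alpha, kappa) from lambda_1, lambda_2 and L-skewness tau_3."""
    c = 2.0 / (t3 + 3.0) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c * c
    if abs(kappa) < 1e-8:
        # Gumbel limit (kappa -> 0), added so the sketch never divides by zero
        alpha = l2 / math.log(2.0)
        return l1 - EULER_GAMMA * alpha, alpha, 0.0
    g = math.gamma(1.0 + kappa)
    alpha = kappa * l2 / (g * (1.0 - 2.0 ** (-kappa)))
    xi = l1 + alpha * (g - 1.0) / kappa
    return xi, alpha, kappa

def gev_quantile(p, xi, alpha, kappa):
    """Quantile x_p with non-exceedance probability p."""
    if abs(kappa) < 1e-8:
        return xi - alpha * math.log(-math.log(p))
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)
```

As a check on the sign convention, `gev_quantile(0.999, 0.0, 1.0, -0.20)` evaluates to about 14.9, the "true" value quoted for $x_{0.999}$ in the MLE example later in the talk.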
19Index Flood Methodology
- Research has demonstrated potential advantages
of index flood procedures for combining
regional and at-site data to improve the
estimators at individual sites.
20Hosking and Wallis (1997)
Development of L-moments for regional flood frequency analysis. Research done in the 1980-1995 period.
J.R.M. Hosking and J.R. Wallis, Regional Frequency Analysis: An Approach Based on L-moments, Cambridge University Press, 1997.
21Compute for the region the average L-CV and L-CS, which yield the regional $y_p$
22Index Flood Methodology
- Use data from hydrologically "similar" basins to estimate a dimensionless flood distribution which is scaled by at-site sample mean.
- "Substitutes Space for Time" by using regional information to compensate for relatively short records at each site.
- Most of these studies have used the GEV distribution and L-moments or equivalent.
23Outline
- Summarizing Data: Moments and L-moments
- Parameter estimation for GEV
- Use of a prior on $\kappa$
- PDS versus AMS with GMLEs
- Bayesian GLS Regression for regionalization
- Concluding observations
24Trouble with MLEs for GEV
CASE: N = 15, X ~ GEV($\xi = 0$, $\alpha = 1$, $\kappa = -0.20$)
MLE Solution
- $x_{0.999}$
- 14.9 (true)
- 6,000,000 (est.)
25Parameter Estimators for 3-parameter GEV distribution
- 1. Maximum Likelihood (ML)
- 2. Method of Moments (MOM)
- 3. Method of L-moments (LM)
- 4. Generalized Maximum Likelihood (GML)
- Introduces a prior distribution for $\kappa$ that ensures the estimator stays within (-0.5, 0.5), and encourages values within (-0.3, 0.1)
- Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood GEV quantile estimators for hydrologic data, Water Resour. Res., 36(3), 737-744, 2000.
- Or can use a penalty to enforce the constraint that $\kappa > -1$
- Coles, S.G., and M.J. Dixon, Likelihood-Based Inference for Extreme Value Models, Extremes, 2(1), 5-23, 1999.
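To make the GML idea concrete, here is a hedged sketch of a generalized log-likelihood: the ordinary GEV log-likelihood plus the log of a prior density on $\kappa$ restricted to (-0.5, 0.5). The slides do not give the prior's form. The beta shape values p = 6 and q = 9 (prior mean -0.10) are an assumption here, recalled from the cited Martins and Stedinger (2000) paper and not verified against it. All function names are illustrative.

```python
import math

P, Q = 6.0, 9.0   # ASSUMED beta shape values, not stated on the slides

def log_prior_kappa(kappa, p=P, q=Q):
    """Log of a beta density for kappa on the open interval (-0.5, 0.5)."""
    if not -0.5 < kappa < 0.5:
        return -math.inf
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return ((p - 1.0) * math.log(0.5 + kappa)
            + (q - 1.0) * math.log(0.5 - kappa) - log_beta)

def gev_log_likelihood(x, xi, alpha, kappa):
    """GEV log-likelihood in the slides' parameterization."""
    if alpha <= 0.0:
        return -math.inf
    total = 0.0
    for value in x:
        if abs(kappa) < 1e-8:                 # Gumbel limit
            z = (value - xi) / alpha
            total += -math.log(alpha) - z - math.exp(-z)
            continue
        y = 1.0 - kappa * (value - xi) / alpha
        if y <= 0.0:
            return -math.inf                  # observation outside the support
        total += (-math.log(alpha)
                  + (1.0 / kappa - 1.0) * math.log(y)
                  - y ** (1.0 / kappa))
    return total

def generalized_log_likelihood(x, xi, alpha, kappa):
    """Log-likelihood plus log-prior on kappa; maximize over (xi, alpha, kappa)."""
    log_prior = log_prior_kappa(kappa)
    if log_prior == -math.inf:
        return -math.inf
    return gev_log_likelihood(x, xi, alpha, kappa) + log_prior
```

Any general-purpose optimizer can then maximize `generalized_log_likelihood`. The prior term pushes the search away from the extreme $\kappa$ values that produce results like the 6,000,000 quantile estimate in the small-sample MLE example.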
26Prior distribution on GEV $\kappa$
27Performance of Alternative Estimators of x0.99 for GEV distribution, n = 25
28Performance of Alternative Estimators of x0.99 for GEV distribution, n = 100
29GEV Estimators
- In 1985 when Hosking, Wallis and Wood introduced L-moment (PWM) estimators for GEV, they were much better than MLEs and Quantile estimators.
- In 1998 Madsen and Rosbjerg demonstrated MOM were not so bad, perhaps better than L-Moments.
- Finally in 2000 Martins & Stedinger demonstrated that adding realistic control of the GEV shape parameter $\kappa$ yielded estimators that dominated the competition. This is a distribution with a modest-accuracy regional description of the shape parameter.
30Outline
- Summarizing Data: Moments and L-moments
- Parameter estimation for GEV
- Use of a prior on $\kappa$
- PDS versus AMS with GMLEs
- Bayesian GLS Regression for regionalization
- Concluding observations
31Partial Duration or Annual Maximum Series.
- by seeing more little floods,
- do we know more about big floods ?
32Partial Duration Series (PDS), Peaks over threshold (POT)
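33Poisson/Pareto model for PDS
- $\lambda$ = arrival rate for floods > $x_0$, which follow a Poisson process
- $G(x) = \Pr[X \le x]$ for peaks over threshold, $x > x_0$, is a Generalized Pareto distribution with scale $\alpha^*$
- $G(x) = 1 - [1 - \kappa (x - x_0)/\alpha^*]^{1/\kappa}$
- Then annual maximums have a Generalized Extreme Value distribution
- $F(x) = \exp\{-[1 - \kappa (x - \xi)/\alpha]^{1/\kappa}\}$
- $\xi = x_0 + \alpha^*(1 - \lambda^{-\kappa})/\kappa$
- $\alpha = \alpha^* \lambda^{-\kappa}$
- same $\kappa$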
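The link between the two models follows from the Poisson arrivals: the annual maximum is below x when no peak exceeds x, so $F(x) = \exp\{-\lambda[1 - G(x)]\}$. The sketch below codes the parameter mapping from this slide and lets that identity be checked numerically. The function names, and the name `alpha_star` for the Generalized Pareto scale, are illustrative choices.

```python
import math

def pds_to_ams(lam, x0, alpha_star, kappa):
    """Map Poisson/GP parameters to the GEV (xi, alpha, kappa) of annual maxima."""
    if abs(kappa) < 1e-8:
        # exponential exceedances: limit of (1 - lam**-kappa) / kappa is ln(lam)
        return x0 + alpha_star * math.log(lam), alpha_star, 0.0
    alpha = alpha_star * lam ** (-kappa)
    xi = x0 + alpha_star * (1.0 - lam ** (-kappa)) / kappa
    return xi, alpha, kappa

def gp_cdf(x, x0, alpha_star, kappa):
    """Generalized Pareto CDF for peaks over the threshold x0 (kappa != 0)."""
    return 1.0 - (1.0 - kappa * (x - x0) / alpha_star) ** (1.0 / kappa)

def gev_cdf(x, xi, alpha, kappa):
    """GEV CDF in the slides' parameterization (kappa != 0)."""
    return math.exp(-(1.0 - kappa * (x - xi) / alpha) ** (1.0 / kappa))
```

For example, with `lam = 5`, `x0 = 10`, `alpha_star = 2` and `kappa = -0.1`, the value `math.exp(-5 * (1 - gp_cdf(x, 10, 2, -0.1)))` agrees with `gev_cdf(x, *pds_to_ams(5, 10, 2, -0.1))` for any `x` above the threshold.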
34Which is more precise: AMS or PDS?
Consider the case where we estimate only 2 parameters. Fix $\kappa = 0$, corresponding to Poisson arrivals with exponential exceedances: the Shane & Lynn (1964) model for flood risk.
35Poisson Arrivals with Exponential Exceedances ($\kappa = 0$)
36Which is more precise: AMS-GEV or PDS-GP?
RMSE-ratio
Now estimate 3 parameters using PDS data, employing XXX = MOM, L-Moments (LM) and GML with the Generalized Pareto distribution, and compare the RMSE of PDS-XXX to the RMSE of the AMS-GMLE GEV estimator.
37RMSE 3 PDS estimators vs AMS-GML, $\lambda = 5$ events/year
[Figure: RMSE-Ratio PDS/AMS-GMLE versus shape parameter $\kappa$, from -0.3 to 0.3]
38RMSE 3 PDS estimators vs AMS-GML, $\kappa$ = 0.30
[Figure: RMSE-Ratio PDS/AMS-GMLE versus $\lambda$, events per year]
39Conclusions PDS versus AMS
- For $\kappa < 0$, with PDS data, again GML quantile estimators are generally better than MOM, LM and ML.
- Precision of GML quantile estimators is insensitive to $\lambda$.
- A year of PDS data is generally worth a year of AMS data for estimating the 100-year flood when employing the GMLE estimators of GP and GEV parameters: more little floods do not tell us about the distribution of large floods.
40Outline
- Summarizing Data: Moments and L-moments
- Parameter estimation for GEV
- Use of a prior on $\kappa$
- PDS versus AMS with GMLEs
- Bayesian GLS Regression for regionalization
- Concluding observations
41GLS Regression for Regional Analyses
- GOAL
- Obtain efficient estimators of the mean, standard deviation, T-yr flood, or GEV parameters as a function of physiographic basin characteristics, and provide the precision of that estimator.
- MODEL
- log[Statistic-of-interest] = $a + b_1 \log(\text{Area}) + b_2 \log(\text{Slope}) + \ldots + \text{Error}$
42GLS Analysis Complications
- With available records, only obtain sample estimates of the Statistic-of-Interest, denoted $\hat{y}_i$
- Total error $\varepsilon_i$ is a combination of
- time-sampling-error $\eta_i$ in sample estimators $\hat{y}_i$, which are often cross-correlated, and
- underlying model error $\delta_i$ (true lack of fit).
- Variance of those errors about prediction $X\beta$ depends on statistics-of-interest at each site.
43GLS for Regionalization
- Use available
- record lengths $n_i$,
- concurrent record lengths $m_{ij}$,
- regional estimates of stan. deviations $\sigma_i$, or $\tau_{2i}$, $\tau_{3i}$, and
- cross-correlations $\rho_{ij}$ of floods
- to estimate variances & cross-correlations of $\eta$ describing errors in $\hat{y}_i$.
- With true model error variance $\sigma_\delta^2$, determine covariance matrix $\Lambda(\sigma_\delta^2)$ of residual errors
- $\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
- where $\Sigma(\hat{y})$ is the covariance matrix of the estimator $\hat{y}$
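44GLS Analysis Solution
- GLS regression model (Stedinger & Tasker, 1985, 1989)
- $\hat{y} = X\beta + \varepsilon$
- with parameter estimator b for $\beta$
- $\{X^T \Lambda(\sigma_\delta^2)^{-1} X\}\, b = X^T \Lambda(\sigma_\delta^2)^{-1} \hat{y}$
- Can estimate model-error $\sigma_\delta^2$ using moments
- $(\hat{y} - X b)^T \Lambda(\sigma_\delta^2)^{-1} (\hat{y} - X b) = n - k$
- $\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
- n = dimension of $\hat{y}$; k = dimension of b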
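A minimal numerical sketch of these two steps follows, assuming NumPy is available and that the sampling covariance matrix $\Sigma$ has already been estimated. It solves the GLS normal equations for a given model error variance, then searches for the variance at which the weighted residual sum of squares equals n - k. The function names, the bisection search and the `upper` starting bracket are illustrative choices, not the authors' algorithm.

```python
import numpy as np

def gls_fit(X, y, sigma, model_var):
    """Solve {X' L^-1 X} b = X' L^-1 y with L = model_var * I + sigma.

    Returns b, the covariance matrix of b, and the weighted residual sum of squares.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    lam = model_var * np.eye(len(y)) + np.asarray(sigma, dtype=float)
    xtlx = X.T @ np.linalg.solve(lam, X)
    b = np.linalg.solve(xtlx, X.T @ np.linalg.solve(lam, y))
    resid = y - X @ b
    q = float(resid @ np.linalg.solve(lam, resid))
    return b, np.linalg.inv(xtlx), q

def mm_model_error_variance(X, y, sigma, upper=1.0, tol=1e-10):
    """Method-of-moments estimate: the model_var at which q(model_var) = n - k."""
    n, k = np.asarray(X).shape
    target = n - k
    if gls_fit(X, y, sigma, 0.0)[2] <= target:
        return 0.0                 # sampling error alone accounts for the residuals
    lo, hi = 0.0, upper
    while gls_fit(X, y, sigma, hi)[2] > target:
        hi *= 2.0                  # q decreases as model_var grows, so this ends
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gls_fit(X, y, sigma, mid)[2] > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

The early return of exactly zero is the situation the next slide warns about: when the residuals are no larger than sampling error predicts, the method-of-moments estimate of the model error variance collapses to zero.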
45Likelihood function - model error $\sigma_\delta^2$ (Tibagi River, Brazil, n = 17)
Maximum of likelihood may be at zero, but larger values are very probable. Zero is clearly not in the middle of the likely range of values. Method of moments has the same problem: a zero estimate.
46Advantages of Bayesian Analysis
- Provides posterior distribution of
- parameters $\beta$
- model error variance $\sigma_\delta^2$, and
- predictive distribution for dependent variable
Bayesian Approach is a natural solution to the problem
47Bayesian GLS Model
- Prior distribution $\xi(\beta, \sigma_\delta^2)$
- Parameters $\beta$ are multivariate normal (Q)
- Model error variance $\sigma_\delta^2$
- Exponential dist. (?) E??2 ? 24
- Likelihood function
- Assume data is multivariate $N[X\beta, \Lambda]$
48Quasi-Analytic Bayesian GLS
- Joint posterior distribution
- Marginal posterior of $\sigma_\delta^2$
where we integrate analytically the normal likelihood & prior to determine f in closed form.
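49Example of a posterior of $\sigma_\delta^2$ (Model 1, Tibagi, Brazil, n = 17)
- MM-GLS for $\sigma_\delta^2$ = 0.000
- MLE-GLS for $\sigma_\delta^2$ = 0.000
- Bayesian GLS for $\sigma_\delta^2$ = 0.046
[Figure: posterior density versus model error variance $\sigma_\delta^2$]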
50Quasi-Analytic Result
From the joint posterior distribution, can compute the marginal posterior of $\beta$ and its moments by 1-dimensional numerical integrations.
51Bayesian GLS for Regionalization of Flood
Characteristics in Korea
- Dae Il Jeong
- Post-doctoral Researcher, Cornell University
- Jery R. Stedinger
- Professor, Cornell University
- Young-Oh Kim
- Associate Professor, Seoul National University
- Jang Hyun Sung
- Graduate Student, Seoul National University
52Korean River basins
- Land Area: 120,000 km2
- Major river basins: Han, Nakdong, Geum
- Total Annual Precipitation (TAP): 1283 mm
- Two thirds of TAP occurs during the 3-month flood season (Jul-Sep)
- Available sites: 31
- Average length: 22 years
[Map labels: Han River Basin, Nakdong River Basin, Geum River Basin]
53Korean Application
- Regional estimators of L-CV $\tau_2$ and L-CS $\tau_3$ for flood frequency analysis using GEV distribution
- 6 Explanatory Variables
- 2 indicators (Han-Nakdong-Geum basins)
- logs of drainage area
- logs of channel slope
- mean precipitation
- SD of annual maximum precipitation
54Cross-correlation of concurrent maxima
55Monte Carlo results for cross-correlation of L-CS estimators, GEV with $\kappa = -0.3$ and $\tau_2 = 0.3$
[Figure: cross-correlation of L-CS estimators versus cross-correlation $\rho_{xy}$ of annual maxima]
56Regression Results L-CV
Standard error in parentheses ( ); p-value in brackets [ ].
57Performance Measures
- Average Variance of Prediction (AVP)
- How well model estimates true value of quantity of interest on average across sites
- Pseudo R2: improvement of GLS(k) versus GLS(0)
- Effective Record Length (ERL)
- Relative uncertainty of regional estimate compared to an at-site estimator
58Regression Results L-CS $\tau_3$
Standard error in parentheses ( ); p-value in brackets [ ].
59Model Diagnostic Measures
- Pseudo ANOVA table
- Variation explained by regional model
- Residual variation due to model errors
- Residual variation due to sampling errors
- Represents partition of TOTAL variation
60Pseudo ANOVA Table for L-CV and L-CS
We need GLS regression analysis
ERL (years): L-CV 21, L-CS 51
61Conclusion: Value in Korea
- Regional estimator for L-Coefficient of Variation should be combined with its at-site estimator
- ERL($\tau_2$) = 21 years $\approx$ average record length (22 yrs)
- Regional estimator for L-skewness was more precise than at-site estimators
- ERL($\tau_3$) = 51 years > average record length (22 yrs)
- Clearly advantageous to use BOTH regional and at-site information in analysis of annual maxima.
62Diagnostic Statistics
- Statistics for evaluating data concerns, precision of predicted values, sources of variation, and model adequacy
- Leverage and Influence
- Measures of Prediction Precision
- Pseudo R2 and ANOVA
- Modeling Diagnostics: EVR & MBV
- Bayesian Plausibility Level
63Bayesian Hierarchical Model: Solve whole problem at once?
- Assume values for each site i for i = 1, ..., K
- $X_{it} \sim \text{GEV}(\xi_i, \alpha_i, \kappa_i)$, t = 1, ..., $n_i$
- where for parameters we have
- $\xi_i \sim N(\mu_\xi, \sigma_\xi^2)$
- $\alpha_i \sim N(\mu_\alpha, \sigma_\alpha^2)$, where perhaps one uses the ratio $\alpha_i / \xi_i$ or the coef. of variation
- $\kappa_i \sim N(\mu_\kappa, \sigma_\kappa^2)$
- with priors on $(\mu_\xi, \sigma_\xi^2)$, $(\mu_\alpha, \sigma_\alpha^2)$, $(\mu_\kappa, \sigma_\kappa^2)$
- whose values for each site i may depend on at-site physiographic characteristics of that site.
- Ignores cross-correlations: need multivariate model for K variates?
- Beware of special cases and lack of fit.
64Outline
- Summarizing Data: Moments and L-moments
- Parameter estimation for GEV
- Use of a prior on $\kappa$
- PDS versus AMS with GMLEs
- Bayesian GLS Regression for regionalization
- Concluding observations
65Concluding Remarks
- GEV distribution used by many water agencies and countries to describe the distribution of extremes.
- L-moments provide simple estimators, but not efficient.
- Generalized Maximum Likelihood Estimators (GMLEs), with a modest prior on $\kappa$, solve problems with MLEs and were the most precise.
- PDS (GPD-Poisson) no better than AMS (GEV) when estimating three parameters with GMLE.
66Final Comments
- Regional regression procedures should account for precision of at-site estimators and their cross-correlations, as can be done with Generalized Least Squares regression.
- Otherwise estimates of model accuracy and of precision of parameter estimates will be in error.
- When model error variance is small relative to errors in estimated hydrologic statistics, the Bayesian model error variance estimator is particularly attractive.
67Hosking and Wallis (1997)
We can do better than simple index flood procedures that everywhere use regional average L-CV $\tau_2$ and L-CS $\tau_3$ values.
68Conclusion: Applicability of GLS
- Developed Bayesian Generalized Least Squares modeling framework to analyze regional information addressing distribution parameters, recognizing
- Sampling error in at-site estimators as function of record length, cross-correlation of concurrent events, and concurrent record lengths, and
- regional model error (true precision of regional model)
- Developed regression models for L-CV and L-CS for Korean annual maximum flood using B-GLS analysis
69Flood Frequency References
Background Reading
- Stedinger, J.R., Flood Frequency Analysis and Statistical Estimation of Flood Risk, Chapter 12, Inland Flood Hazards: Human, Riparian and Aquatic Communities, E.E. Wohl (ed.), Cambridge University Press, Stanford, United Kingdom, 2000.
References
- Hosking, J.R.M., L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics, J. of Royal Statistical Society, B, 52(2), 105-124, 1990.
- Hosking, J.R.M., and J.R. Wallis, Regional Frequency Analysis: An Approach Based on L-moments, Cambridge University Press, 1997.
- Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood GEV quantile estimators for hydrologic data, Water Resources Research, 36(3), 737-744, 2000.
- Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood Pareto-Poisson Flood Risk Analysis for Partial Duration Series, Water Resources Research, 37(10), 2559-2567, 2001.
- Stedinger, J.R., and L. Lu, Appraisal of Regional and Index Flood Quantile Estimators, Stochastic Hydrology and Hydraulics, 9(1), 49-75, 1995.
70GLS References
- Griffis, V.W., and J.R. Stedinger, The Use of GLS Regression in Regional Hydrologic Analyses, J. of Hydrology, 344(1-2), 82-95, 2007, doi:10.1016/j.jhydrol.2007.06.023.
- Gruber, Andrea M., Dirceu S. Reis Jr., and Jery R. Stedinger, Models of Regional Skew Based on Bayesian GLS Regression, Paper 40927-3285, World Environ. & Water Resour. Conf. - Restoring our Natural Habitat, K.C. Kabbes editor, Tampa, FL, May 15-18, 2007.
- Jeong, Dae Il, Jery R. Stedinger, Young-Oh Kim, and Jang Hyun Sung, Bayesian GLS for Regionalization of Flood Characteristics in Korea, Paper 40927-2736, World Environ. & Water Resour. Conf. - Restoring our Natural Habitat, Tampa, FL, May 15-18, 2007.
- Martins, E.S., and J.R. Stedinger, Cross-correlation among estimators of shape, Water Resources Research, 38(11), doi:10.1029/2002WR001589, 26 November 2002.
- Reis, D.S., Jr., J.R. Stedinger, and E.S. Martins, Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation, Water Resour. Res., 41, W10419, doi:10.1029/2004WR003445, 2005.
- Stedinger, J.R., and G.D. Tasker, Regional Hydrologic Analysis, 1. Ordinary, Weighted and Generalized Least Squares Compared, Water Resour. Res., 21(9), 1421-1432, 1985.
- Tasker, G.D., and J.R. Stedinger, Estimating Generalized Skew With Weighted Least Squares Regression, J. of Water Resources Planning and Management, 112(2), 225-237, 1986.
- Tasker, G.D., and J.R. Stedinger, An Operational GLS Model for Hydrologic Regression, J. of Hydrology, 111(1-4), 361-375, 1989.
71Pseudo R2 for GLS
Consider the GLS model
- Not interested in total error $\varepsilon$ that includes sampling error $\eta$, which we cannot explain.
- Traditional adjusted R2
- How much of the critical model error $\delta$ can we explain, where $\text{Var}[\delta] = \sigma_\delta^2(k)$ for a model with k parameters?
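The formulas for this slide did not survive extraction. One way to write the contrast it describes is offered here as an assumption, consistent with the earlier description of pseudo R2 as the improvement of GLS(k) versus GLS(0). The traditional adjusted R2 compares estimated total-error variances, while the pseudo R2 compares estimated model-error variances for a model with k explanatory variables against a constant-only model. The symbols $\hat{\sigma}_\varepsilon^2(\cdot)$ and $R_\delta^2$ are notation introduced for this sketch.

```latex
\bar{R}^2 = 1 - \frac{\hat{\sigma}_\varepsilon^2(k)}{\hat{\sigma}_\varepsilon^2(0)},
\qquad
R_\delta^2 = 1 - \frac{\hat{\sigma}_\delta^2(k)}{\hat{\sigma}_\delta^2(0)}
```

Read this way, $R_\delta^2$ near 1 means the basin characteristics explain nearly all of the variation that is not sampling noise, even when the traditional measure looks poor because sampling error dominates.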
72Pseudo ANOVA Table
Source                    Degrees of Freedom
Model                     k
Model Error $\delta$      n - k - 1
Sampling Error $\eta$     n
Total                     2n - 1
73Modeling Diagnostics
Do we need WLS or GLS to correctly analyze this data?
- To evaluate whether OLS might be sufficient, consider the Error Variance Ratio (EVR).
- If EVR > 20%, then sampling errors $\eta$ in estimators of y are potentially an important fraction of the observed total error $\varepsilon = \eta + \delta$.
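The slide names the EVR but its formula is not in the extracted text. The sketch below assumes the definition I recall from Griffis and Stedinger (2007): the average sampling variance, tr[$\Sigma$]/n, divided by the model error variance $\sigma_\delta^2(k)$. It also reads the slide's threshold of 20 as 20%. Both are assumptions, and the function names are illustrative.

```python
def error_variance_ratio(sampling_variances, model_error_variance):
    """EVR = (average sampling variance) / (model error variance).

    ASSUMED definition: trace of the sampling covariance matrix divided by
    n times the model error variance.
    """
    if model_error_variance <= 0.0:
        raise ValueError("model error variance must be positive")
    n = len(sampling_variances)
    return sum(sampling_variances) / (n * model_error_variance)

def suggests_wls_or_gls(evr, threshold=0.20):
    """True when sampling error is a non-negligible share of total error."""
    return evr > threshold
```

With sampling variances `[0.02, 0.04, 0.06]` and a model error variance of `0.1`, the ratio is 0.4, above the 20% guide, so OLS would not be considered adequate.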
74Modeling Diagnostics
- EVR > 20% suggests a need for WLS or GLS.
- But when is cross-correlation so large that a GLS analysis is needed?
- Misrepresentation of Beta Variance (MBV)
- Describes error made by WLS in its evaluation of precision of estimator $b_0$ of the constant term.
75OLS, WLS and GLS for L-CS
Standard error in parentheses ( ).