Jery R. Stedinger presentation

1
Regionalization of Statistics Describing the
Distribution of Hydrologic Extremes
SAMSI Workshop 23 January 2008

Jery R. Stedinger
Cornell University
Research with G. Tasker, E. Martins, D. Reis, A.
Gruber, V. Griffis, D.I. Jeong and Y.O. Kim

2
Extreme Value Theory & Hydrology

Annual maximum flood may be the daily maximum
or the instantaneous maximum.
Annual maximum 24-hour rainfall may be the daily
maximum or the maximum of 1440-minute values.
Annual maximums are not the maximum of an I.I.D. series:
Years have definite wet and dry seasons.
Daily values are correlated.
Because of El Niño and atmospheric patterns,
some years are extreme-event prone, others are not.
Partial duration series (PDS), or peaks over threshold, are another alternative.

3
Outline

Summarizing Data: Moments and L-moments
Parameter estimation for GEV
Use of a prior on $\kappa$
PDS versus AMS with GMLEs
Bayesian GLS Regression for regionalization
Concluding observations

5
Definitions: Product-Moments

Mean, a measure of location:
$\mu_X = E[X]$
Variance, a measure of spread:
$\sigma_X^2 = E[(X - \mu_X)^2]$
Coefficient of skewness, a measure of asymmetry:
$\gamma_X = E[(X - \mu_X)^3] / \sigma_X^3$

6
Conventional Moment Ratios

Conventional descriptions of shape are
Coefficient of variation, $CV = \sigma / \mu$
Coefficient of skewness, $\gamma = E[(X - \mu)^3] / \sigma^3$
Coefficient of kurtosis, $\kappa = E[(X - \mu)^4] / \sigma^4$
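
As a minimal sketch of the ratios above, the following Python computes sample versions of the coefficient of variation, skewness and kurtosis. The function name `product_moment_ratios` and the choice of simple divide-by-n moment estimators are assumptions made for illustration; the slides do not specify an estimator.

```python
import math


def product_moment_ratios(data):
    """Return (cv, skewness, kurtosis) from divide-by-n sample moments."""
    n = len(data)
    mean = sum(data) / n
    # Central moments of order 2, 3 and 4 (hypothetical helper names).
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    sd = math.sqrt(m2)
    return sd / mean, m3 / sd ** 3, m4 / sd ** 4


cv, skew, kurt = product_moment_ratios([1.0, 2.0, 3.0, 4.0, 5.0])
```

Because the deviations are squared, cubed and raised to the fourth power, a single large observation can dominate the skewness and kurtosis estimates in small samples.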

7
Samples drawn from a Gumbel distribution.
8
L-Moments

An alternative to product moments
now widely used in hydrology.

9
L-Moments: an alternative

L-moments can summarize data as do conventional
moments using linear combinations of the ordered
observations.
Because L-moments avoid squaring and cubing the
data, their ratios do not suffer from the severe
bias problems encountered with product moments.
Estimate using order statistics

10
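L-Moments: an alternative

Let $X_{(i|n)}$ be the ith-smallest observation (the ith order statistic) in a sample of size n.
Measure of scale,
based on the expected difference between the largest and smallest
observations in a sample of 2:
$\lambda_2 = \tfrac{1}{2} E[X_{(2|2)} - X_{(1|2)}]$
Measure of asymmetry:
$\lambda_3 = \tfrac{1}{3} E[X_{(3|3)} - 2 X_{(2|3)} + X_{(1|3)}]$
where $\lambda_3 > 0$ for positively skewed distributions.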

11
L-Moments: an alternative

Measure of kurtosis:
$\lambda_4 = \tfrac{1}{4} E[X_{(4|4)} - 3 X_{(3|4)} + 3 X_{(2|4)} - X_{(1|4)}]$
For highly kurtotic distributions, $\lambda_4$ is large.
For the uniform distribution, $\lambda_4 = 0$.

12
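Dimensionless L-moment ratios

L-moment coefficient of variation (L-CV):
$\tau_2 = \lambda_2 / \lambda_1 = \lambda_2 / \mu$
L-moment coefficient of skewness (L-skewness):
$\tau_3 = \lambda_3 / \lambda_2$
L-moment coefficient of kurtosis (L-kurtosis):
$\tau_4 = \lambda_4 / \lambda_2$
(Note: Hosking calls L-CV $\tau$ instead of $\tau_2$.)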
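
The L-moment ratios can be estimated from the ordered sample. The sketch below uses the unbiased probability-weighted-moment (PWM) estimators b0 to b3, which is one standard route and an assumption here, since the slides only say that order statistics are used. The function name `sample_l_moments` is hypothetical.

```python
def sample_l_moments(data):
    """Return a dict with l1, l2 and the ratios L-CV, L-skewness, L-kurtosis."""
    x = sorted(data)                      # ascending order statistics
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations for L-kurtosis")
    b0 = b1 = b2 = b3 = 0.0
    for j, xj in enumerate(x):            # j = number of smaller observations
        b0 += xj
        b1 += xj * j / (n - 1)
        b2 += xj * j * (j - 1) / ((n - 1) * (n - 2))
        b3 += xj * j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    # L-moments are linear combinations of the PWMs.
    l1 = b0
    l2 = 2.0 * b1 - b0
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    l4 = 20.0 * b3 - 30.0 * b2 + 12.0 * b1 - b0
    return {"l1": l1, "l2": l2, "lcv": l2 / l1, "lskew": l3 / l2, "lkurt": l4 / l2}


result = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
```

For this evenly spaced sample the L-skewness and L-kurtosis are both zero, in line with the statement that the fourth L-moment vanishes for the uniform distribution.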

13
Samples drawn from a Gumbel distribution.
14
Samples drawn from a Gumbel distribution.
15
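Generalized Extreme Value (GEV) distribution

Gumbel's Type I, II & III extreme value distributions:
$F(x) = \exp\{ -[1 - (\kappa/\alpha)(x - \xi)]^{1/\kappa} \}$ for $\kappa \neq 0$
$\kappa$ = shape, $\alpha$ = scale, $\xi$ = location.
Mostly $-0.3 < \kappa \le 0$.
Others use $\xi = -\kappa$ for the shape.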

16
GEV Prob. Density Function
17
GEV Prob. Density Function large x
18
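Simple GEV L-Moment Estimators

Using L-moments, Hosking, Wallis & Wood (1985):
$c = 2/(\tau_3 + 3) - \ln(2)/\ln(3)$, with $\tau_3 = \lambda_3 / \lambda_2$
then
$\kappa = 7.8590\,c + 2.9554\,c^2$ for $|\tau_3| \le 0.5$
$\alpha = \kappa \lambda_2 / [\Gamma(1+\kappa)(1 - 2^{-\kappa})]$
$\xi = \lambda_1 + \alpha[\Gamma(1+\kappa) - 1]/\kappa$
Quantiles:
$x_p = \xi + (\alpha/\kappa)[1 - (-\ln p)^{\kappa}]$
The method of L-moments is simple and attractive.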
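
The estimator formulas on this slide translate almost line for line into code. The following is a minimal sketch; the function names `gev_from_l_moments` and `gev_quantile` are hypothetical, and the Gumbel limit (shape near zero) is deliberately not handled.

```python
import math


def gev_from_l_moments(l1, l2, t3):
    """Return (xi, alpha, kappa) from the first two L-moments and the L-skewness."""
    c = 2.0 / (t3 + 3.0) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c * c
    if abs(kappa) < 1e-9:
        raise ValueError("kappa is ~0: use the Gumbel distribution instead")
    g = math.gamma(1.0 + kappa)
    alpha = kappa * l2 / (g * (1.0 - 2.0 ** (-kappa)))
    xi = l1 + alpha * (g - 1.0) / kappa
    return xi, alpha, kappa


def gev_quantile(p, xi, alpha, kappa):
    """Quantile x_p with non-exceedance probability p (kappa != 0)."""
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)


x999 = gev_quantile(0.999, 0.0, 1.0, -0.20)
```

With location 0, scale 1 and shape -0.20, the 0.999 quantile evaluates to about 14.9.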

19
Index Flood Methodology

Research has demonstrated potential advantages
of index flood procedures for combining
regional and at-site data to improve the
estimators at individual sites.

20
Hosking and Wallis (1997): Development of
L-moments for regional flood frequency
analysis. Research done in the 1980-1995
period. J.R.M. Hosking and J.R. Wallis,
Regional Frequency Analysis: An Approach Based
on L-moments, Cambridge University Press, 1997.
21
Compute the regional average L-CV and L-CS, which
yield the regional dimensionless quantile $y_p$.
22
Index Flood Methodology

Use data from hydrologically "similar" basins to
estimate a dimensionless flood distribution which
is scaled by at-site sample mean.
"Substitutes Space for Time" by using regional
information to compensate for relatively short
records at each site.
Most of these studies have used the GEV
distribution and L-moments or equivalent.
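
A minimal sketch of the index flood idea follows: average the dimensionless L-moment ratios across sites, fit a GEV with unit mean, and scale the resulting quantile by each site's sample mean. Weighting the regional average by record length, and all function names, are assumptions made for illustration.

```python
import math


def gev_from_l_moments(l1, l2, t3):
    """GEV (xi, alpha, kappa) from L-moments; Gumbel limit not handled."""
    c = 2.0 / (t3 + 3.0) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c * c
    g = math.gamma(1.0 + kappa)
    alpha = kappa * l2 / (g * (1.0 - 2.0 ** (-kappa)))
    return l1 + alpha * (g - 1.0) / kappa, alpha, kappa


def gev_quantile(p, xi, alpha, kappa):
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)


def regional_growth_quantile(sites, p):
    """sites: list of (record_length, lcv, lskew). Returns dimensionless y_p."""
    total = sum(n for n, _, _ in sites)
    lcv = sum(n * t2 for n, t2, _ in sites) / total      # regional L-CV
    lskew = sum(n * t3 for n, _, t3 in sites) / total    # regional L-skewness
    # With the mean scaled to 1, the second L-moment equals the L-CV.
    xi, alpha, kappa = gev_from_l_moments(1.0, lcv, lskew)
    return gev_quantile(p, xi, alpha, kappa)


def site_quantile(site_mean, y_p):
    """Scale the regional dimensionless quantile by the at-site mean."""
    return site_mean * y_p


sites = [(10, 0.20, 0.10), (30, 0.40, 0.30)]   # made-up illustrative values
y99 = regional_growth_quantile(sites, 0.99)
q99 = site_quantile(250.0, y99)
```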

23
Outline

Summarizing Data: Moments and L-moments
Parameter estimation for GEV
Use of a prior on $\kappa$
PDS versus AMS with GMLEs
Bayesian GLS Regression for regionalization
Concluding observations

24
Trouble with MLEs for GEV
Case: N = 15, X ~ GEV($\xi = 0$, $\alpha = 1$, $\kappa = -0.20$)
MLE solution for $x_{0.999}$:
14.9 (true)
6,000,000 (est.)

25
Parameter Estimators for 3-parameter GEV
distribution

1. Maximum Likelihood (ML)
2. Method of Moments (MOM)
3. Method of L-moments (LM)
4. Generalized Maximum Likelihood (GML)
Introduces a prior distribution for $\kappa$ that
ensures the estimator lies
within (-0.5, 0.5), and encourages values
within (-0.3, 0.1).
Martins, E.S., and J.R. Stedinger, Generalized
Maximum Likelihood GEV quantile estimators for
hydrologic data, Water Resour. Res., 36(3),
737-744, 2000.
Or can use a penalty to enforce the constraint that $\kappa > -1$.
Coles, S.G., and M.J. Dixon, Likelihood-Based
Inference for Extreme Value Models, Extremes, 2(1),
5-23, 1999.
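
The generalized likelihood idea can be sketched as a GEV log-likelihood plus the log of a prior on the shape parameter. The Beta(6, 9) prior on the shape shifted to (-0.5, 0.5) is recalled from the cited Martins and Stedinger (2000) paper and should be treated as an assumption to check against that paper; the function names are hypothetical. Maximizing `generalized_log_likelihood` with any numerical optimizer would give a GML-style estimate.

```python
import math

P, Q = 6.0, 9.0   # assumed Beta shape parameters for the prior on (kappa + 0.5)


def log_prior_kappa(kappa, p=P, q=Q):
    """Log density of a Beta(p, q) prior shifted to the interval (-0.5, 0.5)."""
    if not -0.5 < kappa < 0.5:
        return -math.inf
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return ((p - 1.0) * math.log(0.5 + kappa)
            + (q - 1.0) * math.log(0.5 - kappa) - log_beta)


def gev_log_likelihood(data, xi, alpha, kappa):
    """GEV log-likelihood in the (xi, alpha, kappa) parameterization."""
    if alpha <= 0.0:
        return -math.inf
    total = -len(data) * math.log(alpha)
    for x in data:
        z = (x - xi) / alpha
        if abs(kappa) < 1e-8:             # Gumbel limit as kappa -> 0
            total += -z - math.exp(-z)
            continue
        y = 1.0 - kappa * z
        if y <= 0.0:                      # observation outside the support
            return -math.inf
        try:
            total += (1.0 / kappa - 1.0) * math.log(y) - y ** (1.0 / kappa)
        except OverflowError:
            return -math.inf
    return total


def generalized_log_likelihood(data, xi, alpha, kappa):
    """Log-likelihood plus log-prior on kappa (the quantity GML maximizes)."""
    return gev_log_likelihood(data, xi, alpha, kappa) + log_prior_kappa(kappa)
```

Under this assumed prior the mean shape is -0.10 with a standard deviation of about 0.12, so values outside roughly (-0.3, 0.1) are penalized and values outside (-0.5, 0.5) are excluded.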

26
Prior distribution on GEV $\kappa$
27
Performance of Alternative Estimators of $x_{0.99}$ for
GEV distribution, n = 25

28
Performance of Alternative Estimators of $x_{0.99}$ for
GEV distribution, n = 100

29
GEV Estimators

In 1985, when Hosking, Wallis and Wood introduced
L-moment (PWM) estimators for the GEV, they were much
better than MLEs and quantile estimators.
In 1998 Madsen and Rosbjerg demonstrated that MOM estimators were
not so bad, perhaps better than L-moments.
Finally, in 2000 Martins & Stedinger demonstrated
that adding realistic control of the GEV shape
parameter $\kappa$ yielded estimators that dominated the
competition. This is a distribution with a
modest-accuracy regional description of the shape
parameter.

30
Outline

Summarizing Data: Moments and L-moments
Parameter estimation for GEV
Use of a prior on $\kappa$
PDS versus AMS with GMLEs
Bayesian GLS Regression for regionalization
Concluding observations

31
Partial Duration or Annual Maximum Series.

By seeing more little floods,
do we know more about big floods?

32
Partial Duration Series (PDS): Peaks over
threshold (POT)
33
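Poisson/Pareto model for PDS

$\lambda$ = arrival rate for floods > $x_0$,
which follow a Poisson process.
$G(x) = \Pr[X \le x]$ for peaks over threshold, $x > x_0$,
is a Generalized Pareto distribution:
$G(x) = 1 - [1 - \kappa (x - x_0)/\alpha]^{1/\kappa}$
Then annual maximums have a
Generalized Extreme Value distribution:
$F(x) = \exp\{ -[1 - \kappa (x - \xi)/\alpha^{*}]^{1/\kappa} \}$
$\xi = x_0 + \alpha (1 - \lambda^{-\kappa})/\kappa$
$\alpha^{*} = \alpha \lambda^{-\kappa}$
same $\kappa$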
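
The link between the Poisson/Pareto model and the GEV can be checked numerically: with Poisson arrivals, the annual maximum has distribution exp(-rate * (1 - G(x))), and this should equal the GEV with the converted parameters. The sketch below assumes a nonzero shape parameter; the function names and the example values are hypothetical.

```python
import math


def gp_cdf(x, x0, alpha, kappa):
    """Generalized Pareto CDF for peaks above the threshold x0 (kappa != 0)."""
    return 1.0 - (1.0 - kappa * (x - x0) / alpha) ** (1.0 / kappa)


def gev_cdf(x, xi, alpha, kappa):
    """GEV CDF in the (xi, alpha, kappa) parameterization (kappa != 0)."""
    return math.exp(-(1.0 - kappa * (x - xi) / alpha) ** (1.0 / kappa))


def poisson_gp_to_gev(rate, x0, alpha, kappa):
    """Convert (arrival rate, threshold, GP scale, shape) to GEV (xi, alpha*, kappa)."""
    xi = x0 + alpha * (1.0 - rate ** (-kappa)) / kappa
    alpha_star = alpha * rate ** (-kappa)
    return xi, alpha_star, kappa


rate, x0, alpha, kappa = 5.0, 10.0, 2.0, -0.1     # illustrative values only
xi, alpha_star, _ = poisson_gp_to_gev(rate, x0, alpha, kappa)
for x in (12.0, 20.0, 50.0):
    annual_max = math.exp(-rate * (1.0 - gp_cdf(x, x0, alpha, kappa)))
    assert abs(annual_max - gev_cdf(x, xi, alpha_star, kappa)) < 1e-12
```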

34
Which is more precise: AMS or PDS?
Consider the case where we estimate only 2 parameters. Fix $\kappa$ =
0, corresponding to Poisson arrivals with
exponential exceedances: the Shane & Lynn (1964)
model for flood risk.
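
For the two-parameter case with zero shape, Poisson arrivals with exponential exceedances imply Gumbel-distributed annual maxima. The following sketch checks that identity numerically; the function names and the closed-form location (threshold plus scale times the log of the arrival rate) are stated here as a worked consequence of the model, not taken from the slides.

```python
import math


def annual_max_cdf_from_pds(x, rate, x0, alpha):
    """Annual-maximum CDF for Poisson arrivals with exponential exceedances."""
    exceed_prob = math.exp(-(x - x0) / alpha)      # 1 - G(x) for x > x0
    return math.exp(-rate * exceed_prob)


def gumbel_cdf(x, xi, alpha):
    """Gumbel CDF with location xi and scale alpha."""
    return math.exp(-math.exp(-(x - xi) / alpha))


def gumbel_location(rate, x0, alpha):
    """Location of the implied Gumbel distribution; the scale is unchanged."""
    return x0 + alpha * math.log(rate)


rate, x0, alpha = 5.0, 10.0, 2.0                   # illustrative values only
xi = gumbel_location(rate, x0, alpha)
for x in (11.0, 15.0, 30.0):
    assert abs(annual_max_cdf_from_pds(x, rate, x0, alpha) - gumbel_cdf(x, xi, alpha)) < 1e-12
```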
35
Poisson Arrivals with Exponential Exceedances
($\kappa = 0$)
36
Which is more precise: AMS/GEV or PDS/GP?
RMSE-ratio
Now estimate 3 parameters using PDS data,
employing XXX = MOM, L-Moments (LM) and
GML with the Generalized Pareto distribution, and
compare the RMSE of PDS-XXX to the RMSE of the AMS-GMLE GEV
estimator.
37
[Figure: RMSE of 3 PDS estimators vs AMS-GML, $\lambda$ = 5 events/year. Vertical axis: RMSE-ratio PDS/AMS-GMLE; horizontal axis: shape parameter $\kappa$, from -0.3 to 0.3.]
38
[Figure: RMSE of 3 PDS estimators vs AMS-GML, $\kappa$ = 0.30. Vertical axis: RMSE-ratio PDS/AMS-GMLE; horizontal axis: $\lambda$, events per year.]
39
Conclusions: PDS versus AMS
For $\kappa < 0$, with PDS data, again GML quantile
estimators are generally better than MOM, LM and
ML. Precision of GML quantile estimators is
insensitive to $\lambda$. A year of PDS data is generally
worth a year of AMS data for estimating the 100-year
flood when employing the GMLE estimators of GP
and GEV parameters: more little floods do not
tell us about the distribution of large floods.
40
Outline

Summarizing Data: Moments and L-moments
Parameter estimation for GEV
Use of a prior on $\kappa$
PDS versus AMS with GMLEs
Bayesian GLS Regression for regionalization
Concluding observations

41
GLS Regression for Regional Analyses

GOAL
Obtain efficient estimators of the mean, standard
deviation, T-yr flood, or GEV parameters
as a function of physiographic basin
characteristics,
and provide the precision of that estimator.
MODEL
$\log[\text{Statistic-of-interest}] = \alpha + \beta_1 \log(\text{Area}) + \beta_2 \log(\text{Slope}) + \ldots + \text{Error}$

42
GLS Analysis Complications

With available records, we only obtain sample
estimates of the Statistic-of-Interest, denoted $\hat{y}_i$.
Total error $\varepsilon_i$ is a combination of
time-sampling error $\eta_i$ in the sample estimators $\hat{y}_i$,
which are often cross-correlated, and
underlying model error $\delta_i$ (true lack of fit).
Variance of those errors about the prediction $X\beta$
depends on the statistics-of-interest at each site.

43
GLS for Regionalization

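Use available
record lengths $n_i$,
concurrent record lengths $m_{ij}$,
regional estimates of standard deviations $\sigma_i$ (or $\tau_{2i}$, $\tau_{3i}$), and
cross-correlations $\rho_{ij}$ of floods
to estimate the variances and cross-correlations of $\eta$ describing errors in $\hat{y}_i$.
With the true model error variance $\sigma_\delta^2$, determine the covariance
matrix $\Lambda(\sigma_\delta^2)$ of the residual errors:
$\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
where $\Sigma(\hat{y})$ is the covariance matrix of the estimator $\hat{y}$.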

44
GLS Analysis Solution

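GLS regression model (Stedinger & Tasker, 1985,
1989):
$\hat{y} = X\beta + \varepsilon$
with parameter estimator b for $\beta$:
$[X^T \Lambda(\sigma_\delta^2)^{-1} X]\, b = X^T \Lambda(\sigma_\delta^2)^{-1} \hat{y}$
Can estimate the model-error variance $\sigma_\delta^2$ using moments:
$(\hat{y} - X b)^T \Lambda(\sigma_\delta^2)^{-1} (\hat{y} - X b) = n - k$
$\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
n = dimension of y; k = dimension of b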
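
The GLS normal equations and the moment equation for the model error variance can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the helper names (`solve`, `gls`, `moment_gap`, `mm_model_variance`), the Gauss-Jordan solver and the bisection search are all assumptions chosen to keep the example self-contained.

```python
def solve(a, rhs):
    """Solve a @ z = rhs by Gauss-Jordan elimination; rhs is a list of rows."""
    n = len(a)
    m = [list(a[i]) + list(rhs[i]) for i in range(n)]    # augmented [a | rhs]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [[v / m[i][i] for v in m[i][n:]] for i in range(n)]


def gls(x, y, sampling_cov, model_var):
    """Return (b, lam), where lam = model_var * I + sampling_cov."""
    n, k = len(x), len(x[0])
    lam = [[sampling_cov[i][j] + (model_var if i == j else 0.0) for j in range(n)]
           for i in range(n)]
    w = solve(lam, [list(x[i]) + [y[i]] for i in range(n)])   # lam^-1 [X | y]
    xtwx = [[sum(x[i][r] * w[i][c] for i in range(n)) for c in range(k)]
            for r in range(k)]
    xtwy = [[sum(x[i][r] * w[i][k] for i in range(n))] for r in range(k)]
    return [row[0] for row in solve(xtwx, xtwy)], lam


def moment_gap(x, y, sampling_cov, model_var):
    """(y - Xb)^T lam^-1 (y - Xb) - (n - k); zero at the moment estimate."""
    n, k = len(x), len(x[0])
    b, lam = gls(x, y, sampling_cov, model_var)
    resid = [y[i] - sum(x[i][j] * b[j] for j in range(k)) for i in range(n)]
    w = solve(lam, [[r] for r in resid])
    return sum(resid[i] * w[i][0] for i in range(n)) - (n - k)


def mm_model_variance(x, y, sampling_cov, upper=1e3, iterations=100):
    """Bisection on the moment equation; returns 0.0 if the gap is <= 0 at zero."""
    if moment_gap(x, y, sampling_cov, 0.0) <= 0.0:
        return 0.0
    lo, hi = 0.0, upper
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if moment_gap(x, y, sampling_cov, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

When the weighted residual sum of squares is already below n - k with zero model error variance, the moment estimate is zero, which is the situation the Bayesian treatment is meant to address.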

45
Likelihood function for model error variance $\sigma_\delta^2$ (Tibagi
River, Brazil, n = 17)
The maximum of the likelihood may be at zero, but
larger values are very probable. Zero is clearly
not in the middle of the likely range of values. The method
of moments has the same problem: a zero estimate.
46
Advantages of Bayesian Analysis

Provides posterior distribution of
parameters $\beta$,
model error variance $\sigma_\delta^2$, and
predictive distribution for dependent variable.

Bayesian Approach is a natural solution to the
problem
47
Bayesian GLS Model

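Prior distribution $\xi(\beta, \sigma_\delta^2)$:
Parameters $\beta$ are multivariate normal (Q).
Model error variance $\sigma_\delta^2$:
exponential distribution ($\lambda$), with $E[\sigma_\delta^2]$ set by $\lambda$ = 24.
Likelihood function:
Assume data are multivariate normal, $N[X\beta, \Lambda]$.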

48
Quasi-Analytic Bayesian GLS

Joint posterior distribution

Marginal posterior of $\sigma_\delta^2$

where the normal likelihood and prior are integrated
analytically to determine f in closed form.
49
Example of a posterior of $\sigma_\delta^2$ (Model 1, Tibagi,
Brazil, n = 17)
MM-GLS for $\sigma_\delta^2$ = 0.000
MLE-GLS for $\sigma_\delta^2$ = 0.000
Bayesian GLS for $\sigma_\delta^2$ = 0.046
[Figure axis: model error variance $\sigma_\delta^2$]
50
Quasi-Analytic Result
From the joint posterior distribution,
can compute the marginal posterior of $\beta$
and its moments by 1-dimensional numerical integrations.
51
Bayesian GLS for Regionalization of Flood
Characteristics in Korea

Dae Il Jeong
Post-doctoral Researcher, Cornell University
Jery R. Stedinger
Professor, Cornell University
Young-Oh Kim
Associate Professor, Seoul National University
Jang Hyun Sung
Graduate Student, Seoul National University

52
Korean River basins

Land area: 120,000 km2
Major river basins:
Han, Nakdong, Geum
Total Annual Precipitation (TAP): 1283 mm
Two thirds of TAP occurs
during the 3-month flood season (Jul-Sep)
Available sites: 31
Average record length: 22 years

[Map labels: Han River Basin, Nakdong River Basin, Geum River Basin]
53
Korean Application

Regional estimators of L-CV $\tau_2$ and L-CS $\tau_3$ for
flood frequency analysis using the GEV distribution
6 explanatory variables:
2 indicators (Han-Nakdong-Geum basins)
log of drainage area
log of channel slope
mean precipitation
SD of annual maximum precipitation

54
Cross-correlation concurrent maxima
55
[Figure: Monte Carlo results for cross-correlation of L-CS estimators, GEV with $\kappa = -0.3$ and $\tau_2 = 0.3$. Vertical axis: cross-correlation of L-CS estimators; horizontal axis: $\rho_{xy}$, cross-correlation of annual maxima.]
56
Regression Results: L-CV
Standard errors in parentheses ( ); p-values in
brackets [ ].
57
Performance Measures

Average Variance of Prediction (AVP)
How well the model estimates the true value of the quantity of
interest, on average across sites
Pseudo R2: improvement of GLS(k) versus GLS(0)
Effective Record Length (ERL)
Relative uncertainty of the regional estimate
compared to an at-site estimator

58
Regression Results: L-CS $\tau_3$
Standard errors in parentheses ( ); p-values in
brackets [ ].
59
Model Diagnostic Measures

Pseudo ANOVA table
Variation explained by regional model
Residual variation due to model errors
Residual variation due to sampling errors
Represents partition of TOTAL variation

60
Pseudo ANOVA Table for L-CV and L-CS
We need GLS regression analysis.
ERL (years): L-CV 21, L-CS 51
61
Conclusion Value in Korea

Regional estimator for L-coefficient of variation
should be combined with its at-site estimator:
ERL($\tau_2$) = 21 years ≈ average record length (22
yrs)
Regional estimator for L-skewness was more
precise than at-site estimators:
ERL($\tau_3$) = 51 years > average record length (22
yrs)
Clearly advantageous to use
BOTH regional and at-site information
in analysis of annual maxima.
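
One simple way to picture combining regional and at-site information is a record-length-weighted average, treating the effective record length (ERL) of the regional estimate as if it were additional years of data. This weighting rule is an assumption for illustration (it presumes estimator variance is inversely proportional to record length and that the two estimates are independent); the slides do not state the combination formula, and the function name and input values are hypothetical.

```python
def combine_estimates(at_site, n_years, regional, erl_years):
    """Weighted average of at-site and regional estimates, weights by record length."""
    weight_site = n_years / (n_years + erl_years)
    return weight_site * at_site + (1.0 - weight_site) * regional


# Illustrative values: 22 years of at-site record, ERL of 21 (L-CV) or 51 (L-skewness).
lcv = combine_estimates(at_site=0.30, n_years=22, regional=0.20, erl_years=21)
lskew = combine_estimates(at_site=0.40, n_years=22, regional=0.20, erl_years=51)
```

With an ERL close to the record length the two sources get nearly equal weight, while an ERL of 51 years pulls the combined value strongly toward the regional estimate.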

62
Diagnostic Statistics

Statistics for evaluating data concerns,
precision of predicted values, sources of
variation, and model adequacy
Leverage and Influence
Measures of Prediction Precision
Pseudo R2 and ANOVA
Modeling Diagnostics: EVR & MBV
Bayesian Plausibility Level

63
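Bayesian Hierarchical Model: Solve whole problem
at once?

Assume values for each site i, for i = 1, ..., K:
$X_{it} \sim \text{GEV}(\xi_i, \alpha_i, \kappa_i)$, t = 1, ..., $n_i$
where for the parameters we have
$\xi_i \sim N(\mu_\xi, \sigma_\xi^2)$
$\alpha_i \sim N(\mu_\alpha, \sigma_\alpha^2)$, where perhaps $\alpha_i$ is replaced by $\alpha_i / \xi_i$
or the coefficient of variation
$\kappa_i \sim N(\mu_\kappa, \sigma_\kappa^2)$
with priors on ($\mu_\xi$, $\sigma_\xi^2$), ($\mu_\alpha$, $\sigma_\alpha^2$), ($\mu_\kappa$, $\sigma_\kappa^2$),
whose values for each site i may depend on
at-site physiographic characteristics of that
site.
Ignores cross-correlations: need a multivariate
model for K variates?
Beware of special cases and lack of fit.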

64
Outline

Summarizing Data: Moments and L-moments
Parameter estimation for GEV
Use of a prior on $\kappa$
PDS versus AMS with GMLEs
Bayesian GLS Regression for regionalization
Concluding observations

65
Concluding Remarks

GEV distribution used by many water agencies and
countries to describe the distribution of
extremes.
L-moments provide simple estimators, but they are not
efficient.
Generalized Maximum Likelihood Estimators (GMLEs,
with a modest prior on $\kappa$) solve problems with MLEs and
were the most precise.
PDS (GPD-Poisson) no better than AMS (GEV) when
estimating three parameters with GMLE.

66
Final Comments

Regional regression procedures should account for
precision of at-site estimators and their
cross-correlations, as can be done with
Generalized Least Squares regression.
Otherwise estimates of model accuracy and of
precision of parameter estimates will be in
error.
When model error variance is small relative to
errors in estimated hydrologic statistics, the
Bayesian model error variance
estimator is particularly attractive.

67
Hosking and Wallis (1997)
We can do better than simple index flood
procedures that everywhere use regional
average L-CV $\tau_2$ and L-CS $\tau_3$ values.
68
Conclusion: Applicability of GLS

Developed a Bayesian Generalized Least Squares
modeling framework to analyze regional
information addressing distribution parameters,
recognizing
sampling error in at-site estimators as a function
of record length, cross-correlation of concurrent
events, and concurrent record lengths, and
regional model error (true precision of the regional
model).
Developed regression models for L-CV and L-CS for
Korean annual maximum floods using B-GLS analysis.

69
Flood Frequency References

Background Reading
Stedinger, J.R., Flood Frequency Analysis and Statistical Estimation of
Flood Risk, Chapter 12, Inland Flood Hazards:
Human, Riparian and Aquatic Communities, E.E.
Wohl (ed.), Cambridge University Press, Stanford,
United Kingdom, 2000.

References
Hosking, J.R.M., L-Moments: Analysis and Estimation of
Distributions Using Linear Combinations of Order
Statistics, J. of Royal Statistical Society, B,
52(2), 105-124, 1990.
Hosking, J.R.M., and J.R. Wallis, Regional Frequency Analysis: An
Approach Based on L-moments, Cambridge University
Press, 1997.
Martins, E.S., and J.R. Stedinger,
Generalized Maximum Likelihood GEV quantile
estimators for hydrologic data, Water Resources
Research, 36(3), 737-744, 2000.
Martins, E.S., and J.R. Stedinger, Generalized Maximum
Likelihood Pareto-Poisson Flood Risk Analysis for
Partial Duration Series, Water Resources
Research, 37(10), 2559-2567, 2001.
Stedinger, J.R., and L. Lu, Appraisal of Regional and Index
Flood Quantile Estimators, Stochastic Hydrology
and Hydraulics, 9(1), 49-75, 1995.
70
GLS References

Griffis, V. W., and J. R. Stedinger, The Use of
GLS Regression in Regional Hydrologic Analyses,
J. of Hydrology, 344(1-2), 82-95, 2007
doi:10.1016/j.jhydrol.2007.06.023.
Gruber, Andrea M., Dirceu S. Reis Jr., and Jery
R. Stedinger, Models of Regional Skew Based on
Bayesian GLS Regression, Paper 40927-3285, World
Environ. Water Resour. Conf. - Restoring our
Natural Habitat, K.C. Kabbes editor, Tampa, FL,
May 15-18, 2007.
Jeong, Dae Il, Jery R. Stedinger, Young-Oh Kim,
and Jang Hyun Sung, Bayesian GLS for
Regionalization of Flood Characteristics in
Korea, Paper 40927-2736, World Environ. Water
Resour. Conf. - Restoring our Natural Habitat,
Tampa, FL, May 15-18, 2007.
Martins, E.S., and J.R. Stedinger,
Cross-correlation among estimators of shape,
Water Resources Research, 38(11),
doi:10.1029/2002WR001589, 26 November 2002.
Reis, D. S., Jr., J. R. Stedinger, and E. S.
Martins, Bayesian generalized least squares
regression with application to log Pearson type 3
regional skew estimation, Water Resour. Res., 41,
W10419, doi:10.1029/2004WR003445, 2005.
Stedinger, J.R., and G.D. Tasker, Regional
Hydrologic Analysis, 1. Ordinary, Weighted and
Generalized Least Squares Compared, Water Resour.
Res., 21(9), 1421-1432, 1985.
Tasker, G.D., and J.R. Stedinger, Estimating
Generalized Skew With Weighted Least Squares
Regression, J. of Water Resources Planning and
Management, 112(2), 225-237, 1986.
Tasker, G.D., and J.R. Stedinger, An Operational
GLS Model for Hydrologic Regression, J. of
Hydrology, 111(1-4), 361-375, 1989.

71
Pseudo R2 for GLS
Consider the GLS model

Not interested in the total error $\varepsilon$ that includes
sampling error $\eta$, which the model cannot explain.
Traditional adjusted R2
How much of the critical model error $\delta$ can we
explain, where Var[$\delta$] = $\sigma_\delta^2(k)$ for a model with k
parameters?

72
Pseudo ANOVA Table

Source                      Degrees of Freedom    Estimator
Model                       k
Model Error $\delta$        n - k - 1
Sampling Error $\eta$       n
Total                       2n - 1

73
Modeling Diagnostics
Do we need WLS or GLS to correctly analyze these
data?

To evaluate whether OLS might be sufficient,
consider the Error Variance Ratio (EVR).
If EVR > 20%, then sampling errors $\eta$ in the
estimators of y
are potentially an important fraction of the
observed total error $\varepsilon = \eta + \delta$.

74
Modeling Diagnostics

EVR > 20% suggests a need for WLS or GLS.
But when is cross-correlation so large
that a GLS analysis is needed?
Misrepresentation of Beta Variance (MBV)
describes the error made by WLS in its evaluation
of the precision of the estimator $b_0$ of the constant
term.
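
The pseudo R2, pseudo ANOVA partition and EVR named on these diagnostic slides can be sketched numerically. The expressions below are assumptions based on the forms usually given for GLS diagnostics (see Griffis and Stedinger, 2007, in the GLS references) and should be checked against that paper: pseudo R2 = 1 - var(k)/var(0), where var(k) is the model error variance with k explanatory variables and var(0) the value for a constant-only model, and EVR = (sum of sampling variances) / (n * var(k)). Function names and input values are hypothetical.

```python
def pseudo_r2(model_var_k, model_var_0):
    """Fraction of the model error variance explained by the k-variable model."""
    return 1.0 - model_var_k / model_var_0


def pseudo_anova(n, model_var_k, model_var_0, sampling_variances):
    """Partition of total variation into model, model-error and sampling-error parts."""
    sampling = sum(sampling_variances)        # trace of the sampling covariance matrix
    return {
        "model": n * (model_var_0 - model_var_k),
        "model_error": n * model_var_k,
        "sampling_error": sampling,
        "total": n * model_var_0 + sampling,
    }


def error_variance_ratio(n, model_var_k, sampling_variances):
    """Sampling-error variation relative to model-error variation."""
    return sum(sampling_variances) / (n * model_var_k)


# Illustrative numbers only: 10 sites, each with sampling variance 0.01.
table = pseudo_anova(10, 0.02, 0.05, [0.01] * 10)
r2 = pseudo_r2(0.02, 0.05)
evr = error_variance_ratio(10, 0.02, [0.01] * 10)
```

In this made-up example the EVR is 0.5, well above the 20% guideline, so sampling error would not be negligible and OLS would be hard to justify.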

75
OLS, WLS and GLS for L-CS
Standard errors in parentheses ( ).
