Title: Verification - Evaluating the Quality of Seasonal Forecasts
1 Verification - Evaluating the Quality of Seasonal Forecasts
2 Acknowledgments
- IRI online course material
- http://iri.columbia.edu
- Robert Fawcett, Andrew Watkins, Phillip Reid,
David Jones (BoM)
3 Verification - Questions
- How do we decide whether a forecast was correct?
- How do we decide whether a set of forecasts is correct often enough to be considered good?
- How do we decide if forecasts are valuable?
- How can we answer any of these questions when forecasts are expressed probabilistically?
4 Was a Forecast Correct?
Deterministic forecast: it will rain tomorrow. Verification: did it rain? Yes or no.
Probabilistic forecast: there is a 50% chance of rain tomorrow. Verification: the forecast is correct irrespective of whether it rains!
5 Verifying Climate (probabilistic) Forecasts
How do we decide whether a forecast was correct? Unless the probability is 0% or 100%, a probability forecast is always correct: all possible outcomes are forecast (e.g., a 90% chance of rain implies a 10% chance of no rain). However, the forecast may be over-confident or under-confident; this is termed reliability. Whenever a forecaster says there is a high probability of rain tomorrow, it should rain more frequently than when the forecaster says there is a low probability of rain.
6 Terminology
- Validation: assessment of hindcast skill
- assessment of skill by scoring (cross-validated) hindcasts
- essential for assessing new models and the expected future performance of current models
- (+) large sample size, immediate results; the likely skill of forecasts is known before issuing to the public
- (-) possibly gives inflated (or deflated) skill measures; the past may not be a good guide to the future
- tells us how well we would have done in the past
7 Terminology
- Verification: assessment of forecast skill
- assessment of skill by scoring independent real-time forecasts
- used for assessing how forecasts have performed
- undertaken for accountability reasons
- can be applied across multiple forecast models, but interpretation may be problematic
- forecasts accumulate too slowly to allow verification to be used as the basis for model selection
- (+) measures the skill of what we provide to the public; an accurate and accountable measure of performance
- (-) takes many forecasts (years) to obtain reliable statistics
8 Climate Prediction 101
Climate prediction is the process of estimating the PDF (probability distribution function) of a climate variable, conditional on an external forcing (e.g., the Southern Oscillation, SSTs, greenhouse gases, etc.).
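As a minimal illustration of this definition, a forecast can be expressed as a PDF shifted by the conditioning forcing. The Gaussian distributions and numbers below are assumptions for illustration, not real climatology or a real forecast model:

```python
# Illustrative sketch only: the distributions and numbers are assumptions,
# not real climatology or a real forecast model.
from scipy.stats import norm

climatology = norm(loc=50.0, scale=15.0)  # unconditional spring rainfall PDF (mm)
conditional = norm(loc=42.0, scale=12.0)  # PDF conditional on, say, warm SSTs

median = climatology.median()  # climatological median rainfall
print(1.0 - climatology.cdf(median))  # 0.50: above-median chance, unconditional
print(1.0 - conditional.cdf(median))  # < 0.50: the forcing shifts the odds drier
```

The forecast is then simply the probability of each outcome under the conditional PDF, rather than under climatology.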
9 The Basis for Climate Prediction
- This model points to three distinct verification issues:
- How consistent is the shift in the PDF (probability distribution function) with observations?
  - Are the probabilities reliable?
- How large are the shifts in the PDF?
  - Are the forecasts emphatic?
- How tight is the shifted PDF?
  - Are the forecasts sharp?
- No single skill measure can describe all three of these aspects; this is why there are so many skill measures.
10 Desired Characteristics of Forecasts
- Probabilities should be reliable.
  - Reliability is a function of forecast accuracy.
- Probabilities should be sharp.
  - Assuming the forecasts are reliable, sharpness is a function of predictability or forecast signal, and relates to skill.
11 Reliability Diagrams
For all forecasts of a given confidence, identify how often the event occurs. If the proportion of times that the event occurs is the same as the forecast probability, the probabilities are reliable (well calibrated, i.e., accurate). A plot of relative frequency of occurrence against forecast probability will be a diagonal line if the forecasts are reliable. Problem: a large number of forecasts is required, and reliability cannot be mapped spatially or temporally.
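A minimal sketch of how the points on a reliability diagram can be computed, assuming binary outcomes (1 = event occurred) and equal-width probability bins:

```python
import numpy as np

def reliability_points(probs, outcomes, n_bins=10):
    """For each forecast-probability bin, return (mean forecast probability,
    observed relative frequency, sample count). Reliable forecasts plot on
    the diagonal: observed frequency equals forecast probability."""
    probs = np.asarray(probs, float)
    outcomes = np.asarray(outcomes, float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    idx = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    points = []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            points.append((probs[in_bin].mean(), outcomes[in_bin].mean(),
                           int(in_bin.sum())))
    return points
```

Plotting observed frequency against mean forecast probability for each returned point gives the reliability diagram; the sample counts give the accompanying histogram of forecast probabilities.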
12 Rainfall Above/Below Median
- Reliability data (all Aust. grid points)
- 34 verified forecasts
- 1 percent probability bins
- Evidence that wet conditions are easier to forecast
[Figure: reliability diagram; histogram of forecast probabilities]
13 Brier Score
Measures the mean-squared error of probability forecasts: effectively a root-mean-square error measure, framed in a probabilistic context. If an event was forecast with a probability of 60% and the event occurred, the probability error is 0.6 - 1.0 = -0.4, and the Brier score averages the squared errors, so this forecast contributes (-0.4)^2 = 0.16.
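A minimal sketch of the calculation, reproducing the worked example above (probabilities expressed as fractions rather than percentages):

```python
import numpy as np

def brier_score(probs, outcomes):
    """Mean squared error of probability forecasts: outcomes are 1 if the
    event occurred and 0 otherwise. 0 is a perfect score; 1 is the worst."""
    probs = np.asarray(probs, float)
    outcomes = np.asarray(outcomes, float)
    return np.mean((probs - outcomes) ** 2)

# A 60% forecast followed by the event occurring: error 0.6 - 1.0 = -0.4
print(brier_score([0.6], [1]))  # 0.16
```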
14 Relative Operating Characteristics (ROC)
Convert probabilistic forecasts to deterministic forecasts by issuing a warning if the probability exceeds a minimum threshold. By raising the threshold, fewer warnings are likely to be issued, reducing the potential of issuing a false alarm but increasing the potential of a miss. By lowering the threshold, more warnings are likely to be issued, reducing the potential of a miss but increasing the potential of a false alarm. The ROC curve measures the trade-off between a correct warning and a false alarm, i.e. between hits and false alarms, across a range of decision thresholds.
15 Relative Operating Characteristics (ROC)

                     OBSERVED: YES        OBSERVED: NO
  Warning issued     Hits                 False Alarms
  No warning         Misses               Correct Rejections

HIT RATE = Hits / (Hits + Misses) = probability that an event is forewarned
FALSE ALARM RATE = False Alarms / (False Alarms + Correct Rejections) = probability that a warning is made for a non-event
16 Relative Operating Characteristics (ROC) - EXAMPLE
A warning is issued when the forecast probability of above-median rainfall exceeds 50%. Number of forecasts: 32. Threshold: 50%.

                     OBSERVED: YES        OBSERVED: NO
  Warning issued     7 (hits)             2 (false alarms)
  No warning         10 (misses)          13 (correct rejections)

HIT RATE = Hits / (Hits + Misses) = 7 / (7 + 10) = 0.41
FALSE ALARM RATE = FA / (FA + CR) = 2 / (2 + 13) = 0.13
17 Relative Operating Characteristics (ROC) - EXAMPLE
A warning is issued when the forecast probability of above-median rainfall exceeds 10%. Number of forecasts: 32. Threshold: 10%.

                     OBSERVED: YES        OBSERVED: NO
  Warning issued     21 (hits)            7 (false alarms)
  No warning         0 (misses)           4 (correct rejections)

HIT RATE = Hits / (Hits + Misses) = 21 / (21 + 0) = 1.0
FALSE ALARM RATE = FA / (FA + CR) = 7 / (7 + 4) = 0.64
18 Relative Operating Characteristics (ROC) - EXAMPLE
A warning is issued when the forecast probability of above-median rainfall exceeds 80%. Number of forecasts: 32. Threshold: 80%.

                     OBSERVED: YES        OBSERVED: NO
  Warning issued     1 (hit)              1 (false alarm)
  No warning         13 (misses)          17 (correct rejections)

HIT RATE = Hits / (Hits + Misses) = 1 / (1 + 13) = 0.07
FALSE ALARM RATE = FA / (FA + CR) = 1 / (1 + 17) = 0.06
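The three worked examples can be reproduced with a short helper; the contingency counts are taken straight from the tables above:

```python
def roc_point(hits, misses, false_alarms, correct_rejections):
    """One point on the ROC curve for a given warning threshold."""
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, false_alarm_rate

# (hits, misses, false alarms, correct rejections) at each threshold
examples = {0.50: (7, 10, 2, 13), 0.10: (21, 0, 7, 4), 0.80: (1, 13, 1, 17)}
for threshold, counts in examples.items():
    hr, far = roc_point(*counts)
    print(f"threshold {threshold:.0%}: hit rate {hr:.2f}, false alarm rate {far:.2f}")
```

Sweeping the threshold from low to high traces the ROC curve from the top-right corner (warn on everything) towards the bottom-left (warn on nothing).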
19 Relative Operating Characteristics (ROC)
[Figure: ROC curve, with warning thresholds from 0% to 80% marked along the curve]
20 Relative Operating Characteristics (ROC)
21 Relative Operating Characteristics
- Advantages
  - Skill can be mapped in space and time
  - Weights forecasts equally
  - Relatively simple to calculate
- Disadvantages
  - Somewhat complex to understand
  - Categorical, not probabilistic
  - The ROC score is not intuitive
22 Percent Consistent (Correct Forecast Rate)
For a particular category (e.g., above median):
PC = (Hits + Correct Rejections) / (total no. of outlooks)
Simply: how often did the outlook favor the eventual outcome?
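A minimal sketch of the calculation, reusing the contingency counts from the 50% ROC example above purely for illustration:

```python
def percent_consistent(hits, correct_rejections, total_outlooks):
    """How often the outlook favored the eventual outcome, as a percentage."""
    return 100.0 * (hits + correct_rejections) / total_outlooks

print(percent_consistent(7, 13, 32))  # 62.5
```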
23 Max Temp Above/Below Median
- Correct forecast rates
- 37 verified forecasts
- better than guessing across most of the country
- correct 2/3 of the time across most of eastern Australia
[Figure: map of correct forecast rates; inset shows validation comparison]
24 Percent Correct
- Advantages
  - Very simple to calculate
  - Simple to understand
  - Able to be mapped
- Disadvantages
  - May be misinterpreted: PC does not measure accuracy.
  - Categorical, not probabilistic, thereby encouraging categorical decision making.
25 Linear Error in Probability Space (LEPS)
[Figure: climatological cumulative probability curve for rainfall. The forecast value (23 mm) maps to cumulative probability Pf = 0.20; the observed value (31 mm) maps to Po = 0.48.]
LEPS = 100 × (1 - |Pf - Po|)
For the values in the figure, LEPS = 100 × (1 - |0.20 - 0.48|) = 72.
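A minimal sketch of the score, using an empirical CDF as a stand-in for the fitted climatological curve in the figure (the climatology sample passed in is an assumption for illustration):

```python
import numpy as np

def leps(forecast_value, observed_value, climatology):
    """Basic LEPS = 100 * (1 - |Pf - Po|), where Pf and Po are the positions
    of the forecast and observed values in climatological probability space."""
    clim = np.sort(np.asarray(climatology, float))
    def prob_space(x):
        # empirical cumulative probability of x within the climatology
        return np.searchsorted(clim, x, side="right") / clim.size
    return 100.0 * (1.0 - abs(prob_space(forecast_value)
                              - prob_space(observed_value)))

# With the figure's values Pf = 0.20 and Po = 0.48: 100 * (1 - 0.28) = 72
```

Measuring error in probability space rather than in mm means the score is penalized more for errors in the well-populated middle of the climatology than in the extreme tails.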
26 Rainfall Above/Below Median
- LEPS skill scores
- 34 verified forecasts (from JJA 2000)
- positive skill across most of the country
[Figure: map of LEPS skill scores]
27 Max/Min Temp Above/Below Median
- Australian average LEPS2 scores
- positive averages through most of the 2002/03 El Niño
- periods of low skill generally correspond to periods of low forecast signal
[Figure: time series of Australian average LEPS2 scores; max temp solid, min temp dotted]
28 Linear Error in Probability Space (LEPS)
- Advantages
  - Rewards emphatic forecasts
  - Valid across all categories
  - Can be mapped
- Disadvantages
  - Complex to understand / not intuitive
  - Rather difficult to calculate
  - Penalizes forecast systems which give near-climatological probabilities
29 The Value of Forecasts
- Just because a forecast is skilful doesn't mean that it is valuable. A forecast only has value if it leads to a changed decision.
An Idealized Example - Informed Use of a Probability Forecast
Consider a farmer with 100 sheep. It is the end of winter, and she/he has the option of buying supplementary feed for summer at $10 a head.
30 Informed Use of a Probability Forecast
If the farm receives 100 mm of spring rainfall, there will be sufficient pasture and no need for the extra feed (which will rot). If 100 mm of rain does not fall, however, there will not be sufficient pasture, and hay will have to be bought over summer at a cost of $20 a head. The climate forecast is that there is only a 30% chance of receiving at least 100 mm in spring. What should the farmer do?
31 Informed Use of a Probability Forecast
Definitions: C is the cost of preventative action ($1000); L is the loss from not taking preventative action if the adverse climate outcome occurs ($2000). In general, if P > C/L, the user should take preventative action. That is, protection is optimal when the C/L ratio is less than the probability of the adverse climate outcome. Buying feed costs $1000; not buying has an expected cost of $1400 (0.7 × $2000), so the farmer should buy feed now. The forecast is valuable as it motivates a decision. Imagine, however, an accurate model which never predicts P > 0.5: for this decision, its forecasts would never be valuable! Not all forecasts are relevant to all decisions.
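A minimal sketch of the cost/loss decision rule, with the farmer's numbers from the slide:

```python
def should_protect(p_adverse, cost, loss):
    """Protect when the probability of the adverse outcome exceeds the
    cost/loss ratio, i.e. when the expected loss exceeds the fixed cost."""
    return p_adverse > cost / loss

# Farmer's decision: C = $1000 (feed), L = $2000 (hay), P(adverse) = 0.7
print(should_protect(0.7, 1000, 2000))  # True -> buy feed now
print(0.7 * 2000)                       # expected cost of inaction: $1400
```

Note that the decision threshold here, C/L = 0.5, is specific to this farmer; a different user with a different cost/loss ratio would act on different probabilities, which is why the same forecast can be valuable to one user and worthless to another.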
32 Conclusions
33 Conclusions
- Even for deterministic forecasts, there is no single measure that gives a comprehensive summary of forecast quality:
  - accuracy
  - skill
  - uncertainty
- Probabilistic forecasts address the two fundamental questions:
  - What is going to happen?
  - How confident can we be that it is going to happen?
- These aspects require verification.
34 Conclusions
A probability forecast is always correct. Further, any scoring technique which converts a probability to a categorical outcome runs the risk of encouraging inappropriate decision making. No single skill score can describe all aspects of the verification problem. Fortunately, most skill scores in most situations will tell a similar story. Forecast reliability ≠ forecast skill ≠ forecast value.
35 Further Information
- Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, San Diego. Chapter 7, Forecast verification, pp. 233-283.
- Wilks, D. S., 2001: A skill score based on economic value for probability forecasts. Meteor. Appl., 8, 209-219.
- Hartmann, H. C., et al., 2002: Confidence builders: Evaluating seasonal climate forecasts from user perspectives. Bull. Amer. Meteor. Soc., 83, 683-698.
- d.jones@bom.gov.au or a.watkins@bom.gov.au