1
Verification - Evaluating the Quality of Seasonal
Forecasts
2
Acknowledgments
  • IRI online course material
  • http://iri.columbia.edu
  • Robert Fawcett, Andrew Watkins, Phillip Reid,
    David Jones (BoM)

3
Verification - Questions
  • How do we decide whether a forecast was
    correct?
  • How do we decide whether a set of forecasts is
    correct often enough to be considered good?
  • How do we decide if forecasts are valuable?
  • How can we answer any of these questions when
    forecasts are expressed probabilistically?

4
Was a Forecast Correct?
Deterministic Forecast: "it will rain tomorrow".
Verification: did it rain? Yes or no.
Probabilistic Forecast: "there is a 50% chance of
rain tomorrow". Verification: the forecast is
correct irrespective of whether it rains!
5
Verifying Climate (probabilistic) Forecasts
How do we decide whether a forecast was correct?
Unless the probability is 0% or 100%, a
probability forecast is always correct: all
possible outcomes are forecast (e.g., a 90% chance
of rain implies a 10% chance of no rain). However,
the forecast may be over-confident or
under-confident; this property is termed
reliability. Whenever a forecaster says there is a
high probability of rain tomorrow, it should rain
more frequently than when the forecaster says
there is a low probability of rain.
6
Terminology
  • Validation: assessment of hindcast skill
  • assessment of skill by scoring (cross-validated)
    hindcasts
  • essential for assessing new models and the
    expected future performance of current models
  • Pro: large sample size and immediate results; the
    likely skill of forecasts is known before they
    are issued to the public
  • Con: possibly gives inflated (or deflated) skill
    measures; the past may not be a good guide to
    the future
  • Tells us how well we would have done in the past

7
Terminology
  • Verification: assessment of forecast skill
  • assessment of skill by scoring independent
    real-time forecasts
  • used for assessing how forecasts have performed
  • undertaken for accountability reasons
  • can be applied across multiple forecast models,
    but interpretation may be problematic
  • forecasts accumulate too slowly to allow
    verification to be used as the basis for model
    selection
  • Pro: measures the skill of what we provide to the
    public; an accurate and accountable measure of
    performance
  • Con: takes many forecasts (years) to obtain
    reliable statistics

8
Climate Prediction 101
Climate prediction is the process of estimating
the PDF (probability distribution function) of a
climate variable, conditional on an external
forcing (e.g., the Southern Oscillation, SSTs,
greenhouse gases, etc.).
9
The Basis for Climate Prediction
  • This model points to three distinct verification
    issues:
  • How consistent is the shift in the PDF
    (probability distribution function) with
    observations? Are the probabilities reliable?
  • How large are the shifts in the PDF? Are the
    forecasts emphatic?
  • How tight is the shifted PDF? Are the forecasts
    sharp?
  • No single skill measure can describe all three of
    these aspects, which is why so many different
    skill measures are in use.

10
Desired Characteristics of Forecasts
  • Probabilities should be reliable.
  • Reliability is a function of forecast accuracy.
  • Probabilities should be sharp.
  • Assuming the forecasts are reliable, sharpness
    is a function of predictability or forecast
    signal, and relates to skill.

11
Reliability Diagrams
For all forecasts of a given confidence, identify
how often the event occurs. If the proportion of
times that the event occurs is the same as the
forecast probability, the probabilities are
reliable (well calibrated, and hence accurate). A
plot of relative frequency of occurrence against
forecast probability will be a diagonal line if
the forecasts are reliable. Problem: a large
number of forecasts is required, and the result
can't be mapped spatially or temporally.
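A minimal sketch of this calculation in Python, assuming arrays of forecast probabilities and binary outcomes (all names here are illustrative, not from the presentation):

```python
import numpy as np

def reliability_curve(probs, outcomes, n_bins=10):
    """For each probability bin, return the mean forecast probability,
    the observed relative frequency of the event, and the bin count.
    probs: forecast probabilities in [0, 1]; outcomes: 1 if the event
    occurred, 0 otherwise. Reliable forecasts lie on the diagonal."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Assign each forecast to a bin; clip so p = 1.0 falls in the last bin.
    idx = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    mean_prob, obs_freq, counts = [], [], []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            mean_prob.append(probs[in_bin].mean())
            obs_freq.append(outcomes[in_bin].mean())
            counts.append(int(in_bin.sum()))
    return np.array(mean_prob), np.array(obs_freq), np.array(counts)
```

Plotting obs_freq against mean_prob gives the reliability diagram; the counts give the histogram of forecast probabilities shown on the next slide.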
12
Rainfall Above/Below median
  • Reliability data (all Aust. grid points)
  • 34 verified forecasts
  • 1 percent probability bins
  • Evidence that wet conditions are easier to
    forecast

[histogram of forecast probabilities]
13
Brier Score
Measures the mean squared error of probability
forecasts; effectively a root-mean-square error
measure, framed in a probabilistic context. If an
event was forecast with a probability of 60% and
the event occurred, the probability error is
60% - 100% = -40%. The Brier score is the average
of the squared probability errors:
BS = (1/N) Σ (p_i - o_i)^2, where p_i is the
forecast probability and o_i = 1 if the event
occurred, 0 otherwise.
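The score itself is a one-liner; a minimal Python sketch (names illustrative):

```python
import numpy as np

def brier_score(probs, outcomes):
    """BS = (1/N) * sum_i (p_i - o_i)^2, where o_i = 1 if the event
    occurred and 0 otherwise. 0 is perfect; larger is worse."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    return float(np.mean((probs - outcomes) ** 2))

# The slide's example: a 60% forecast followed by the event contributes
# (0.6 - 1.0)**2 = 0.16 to the mean.
print(brier_score([0.6], [1]))  # 0.16 (up to floating-point rounding)
```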
14
Relative Operating Characteristics (ROC)
Convert probabilistic forecasts to deterministic
forecasts by issuing a warning if the probability
exceeds a minimum threshold. Raising the threshold
means fewer warnings are likely to be issued,
reducing the potential for a false alarm but
increasing the potential for a miss. Lowering the
threshold means more warnings are likely to be
issued, reducing the potential for a miss but
increasing the potential for a false alarm. The
ROC curve measures the trade-off between a correct
warning and a false alarm, i.e. between hits and
false alarms, across a range of decision
thresholds.
15
Relative Operating Characteristics (ROC)
[2x2 contingency table: warnings issued vs. observed outcomes]
HIT RATE = Hits / (Hits + Misses)
  = probability that an event is forewarned
FALSE ALARM RATE = False Alarms / (False Alarms + Correct Rejections)
  = probability that a warning is made for a non-event
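A minimal Python sketch of these two rates, assuming boolean arrays marking warnings and observed events (names illustrative):

```python
import numpy as np

def hit_and_false_alarm_rates(warned, occurred):
    """Hit rate and false alarm rate for deterministic warnings.
    warned: True where a warning was issued; occurred: True where
    the event was observed."""
    warned = np.asarray(warned, dtype=bool)
    occurred = np.asarray(occurred, dtype=bool)
    hits = np.sum(warned & occurred)
    misses = np.sum(~warned & occurred)
    false_alarms = np.sum(warned & ~occurred)
    correct_rejections = np.sum(~warned & ~occurred)
    hit_rate = hits / (hits + misses)
    false_alarm_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, false_alarm_rate
```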
16
Relative Operating Characteristics (ROC) - EXAMPLE
Warning criterion: probability of rainfall greater
than the median exceeds 50%.
Number of forecasts = 32; Threshold = 50%
[Contingency table: Hits = 7, Misses = 10, False Alarms = 2, Correct Rejections = 13]
HIT RATE = Hits / (Hits + Misses) = 7 / (7 + 10) = 0.41
FALSE ALARM RATE = FA / (FA + CR) = 2 / (2 + 13) = 0.13
17
Relative Operating Characteristics (ROC) - EXAMPLE
Warning criterion: probability of rainfall greater
than the median exceeds 10%.
Number of forecasts = 32; Threshold = 10%
[Contingency table: Hits = 21, Misses = 0, False Alarms = 7, Correct Rejections = 4]
HIT RATE = Hits / (Hits + Misses) = 21 / (21 + 0) = 1.0
FALSE ALARM RATE = FA / (FA + CR) = 7 / (7 + 4) = 0.64
18
Relative Operating Characteristics (ROC) - EXAMPLE
Warning criterion: probability of rainfall greater
than the median exceeds 80%.
Number of forecasts = 32; Threshold = 80%
[Contingency table: Hits = 1, Misses = 13, False Alarms = 1, Correct Rejections = 17]
HIT RATE = Hits / (Hits + Misses) = 1 / (1 + 13) = 0.07
FALSE ALARM RATE = FA / (FA + CR) = 1 / (1 + 17) = 0.06
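Sweeping the warning threshold, as in the three examples above, traces out the ROC curve. A sketch reusing hit_and_false_alarm_rates from the earlier snippet (the threshold range is assumed, matching the 0%-80% labels on the next slide):

```python
import numpy as np

def roc_points(probs, occurred, thresholds=(0.1, 0.2, 0.3, 0.4,
                                             0.5, 0.6, 0.7, 0.8)):
    """Convert probability forecasts to warnings at each threshold
    (warn when prob > threshold) and collect the resulting
    (false alarm rate, hit rate) pairs; together they form the ROC curve."""
    probs = np.asarray(probs, dtype=float)
    points = []
    for t in thresholds:
        hr, far = hit_and_false_alarm_rates(probs > t, occurred)
        points.append((far, hr))
    return points
```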
19
Relative Operating Characteristics (ROC)
[Figure: ROC curve of hit rate vs. false alarm rate, with points labelled by warning threshold from 0% to 80%]
20
Relative Operating Characteristics (ROC)
21
Relative Operating Characteristics
  • Advantages
  • Skill can be mapped in space and time
  • Weights forecasts equally
  • Relatively simple to calculate
  • Disadvantages
  • Somewhat complex to understand
  • Categorical, not probabilistic
  • ROC score not intuitive

22
Percent Consistent (Correct Forecast Rate)
For a particular category (e.g., above median):
PC = (Hits + Correct Rejections) / (total number of outlooks)
Simply: how often did the outlook favor the
eventual outcome?
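A minimal Python sketch (names illustrative); PC is simply the fraction of outlooks whose favored category matched the outcome:

```python
def percent_consistent(favored_event, event_occurred):
    """PC = (hits + correct rejections) / total outlooks, i.e. the
    fraction of outlooks where the favored category (e.g. above-median)
    matched what actually happened."""
    matches = sum(f == o for f, o in zip(favored_event, event_occurred))
    return matches / len(favored_event)
```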
23
Max Temp above/below median
  • Correct forecast rates
  • 37 verified forecasts
  • better than guessing across most of the country
  • correct 2/3 of the time across most of eastern
    Australia

[inset shows validation comparison]
24
Percent Correct
  • Advantages
  • Very simple to calculate
  • Simple to understand
  • Able to map
  • Disadvantages
  • May be misinterpreted: PC does not measure
    accuracy.
  • Categorical, not probabilistic, thereby
    encouraging categorical decision making.

25
Linear Error in Probability Space (LEPS)
[Figure: climatological cumulative probability vs. rainfall value; the forecast value (23 mm) lies at cumulative probability Pf = 0.20 and the observed value (31 mm) at Po = 0.48]
LEPS = 100 (1 - |Pf - Po|)
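A simplified Python sketch of this basic LEPS formula, estimating Pf and Po from a climatological sample via the empirical CDF (the operational LEPS2 averages shown later are a variant; names here are illustrative):

```python
import numpy as np

def leps(forecast_value, observed_value, climatology):
    """LEPS = 100 * (1 - |Pf - Po|), where Pf and Po are the positions
    of the forecast and observed values in cumulative (climatological)
    probability space."""
    sample = np.sort(np.asarray(climatology, dtype=float))
    def cum_prob(x):
        # Empirical cumulative probability of x within the climatology.
        return np.searchsorted(sample, x, side="right") / sample.size
    return 100.0 * (1.0 - abs(cum_prob(forecast_value)
                              - cum_prob(observed_value)))

# With Pf = 0.20 (23 mm) and Po = 0.48 (31 mm), as in the figure,
# LEPS = 100 * (1 - 0.28) = 72.
```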
26
Rainfall above/below median
  • LEPS skill scores
  • 34 verified forecasts (from JJA 2000)
  • positive skill across most of country

27
Max/Min Temp above/below median
  • Australian average LEPS2 scores
  • max temp solid
  • min temp dotted
  • positive averages through most of 2002/03 El Niño
  • Periods of low skill generally correspond to
    periods of low forecast signal

28
Linear Error in Probability Space
(LEPS)
  • Advantages
  • Rewards emphatic forecasts
  • Valid across all categories
  • Can be mapped
  • Disadvantages
  • Complex to understand/not intuitive
  • Rather difficult to calculate
  • Penalizes forecast systems which give
    near-climatological probabilities

29
The Value of Forecasts
  • Just because a forecast is skilful doesn't mean
    that it is valuable. A forecast only has value if
    it leads to a changed decision.

An Idealized Example - Informed use of a
Probability Forecast
Consider a farmer with 100 sheep. It is the end
of winter, and she/he has the option of buying
supplementary feed for summer at $10 a head.
30
Informed use of a Probability Forecast
If the farm receives 100 mm of spring rainfall
there will be sufficient pasture and no need for
the extra feed (which will rot). If 100 mm of rain
does not fall, however, there will not be
sufficient pasture and hay will have to be bought
over summer at a cost of $20 a head. The climate
forecast is that there is only a 30% chance of
receiving at least 100 mm in spring. What should
the farmer do?
31
Informed use of a Probability Forecast
Definitions: C is the cost of preventative action
(-$1000); L is the cost of not taking preventative
action if the adverse climate outcome occurs
(-$2000). In general, if P > C/L the user should
take preventative action. That is, protection is
optimal when the C/L ratio is less than the
probability of the adverse climate outcome.
Buying feed costs $1000; not buying has an
expected cost of $1400 (0.7 x $2000), so the
farmer should buy feed now. The forecast is
valuable as it motivates a decision. Imagine,
however, an accurate model which never predicts
P > 0.5. For this decision, its forecasts would
never be valuable! Not all forecasts are relevant
to all decisions.
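A minimal sketch of the cost/loss decision rule with the slide's numbers:

```python
def should_protect(p_adverse, cost, loss):
    """Take preventative action when P(adverse outcome) > C/L."""
    return p_adverse > cost / loss

# Slide's example: C = 1000 (buy feed now), L = 2000 (buy hay later),
# P(adverse) = 0.7 (70% chance of receiving less than 100 mm).
# Expected cost of not acting: 0.7 * 2000 = 1400 > 1000, so act.
print(should_protect(0.7, 1000, 2000))  # True
```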
32
Conclusions
33
Conclusions
  • Even for deterministic forecasts, there is no
    single measure that gives a comprehensive summary
    of forecast quality:
  • - accuracy
  • - skill
  • - uncertainty
  • Probabilistic forecasts address the two
    fundamental questions:
  • - What is going to happen?
  • - How confident can we be that it is going to
    happen?
  • These aspects require verification.

34
Conclusions
A probability forecast is always correct. Further,
any scoring technique which converts a probability
to a categorical outcome runs the risk of
encouraging inappropriate decision making. No
single skill score can describe all aspects of the
verification problem. Fortunately, most skill
scores in most situations will tell a similar
story.
Forecast Reliability ≠ Forecast Skill ≠ Forecast Value
35
Further Information
  • Wilks, D. S., 1995: Statistical Methods in the
    Atmospheric Sciences. Academic Press, San Diego.
    Chapter 7, Forecast verification, pp. 233-283.
  • Wilks, D. S., 2001: A skill score based on
    economic value for probability forecasts.
    Meteor. Appl., 8, 209-219.
  • Hartmann et al., 2002: Confidence Builders:
    Evaluating Seasonal Forecasts from User
    Perspectives. Bull. Amer. Met. Soc., 683-698.
  • d.jones@bom.gov.au or a.watkins@bom.gov.au