Title: Verifying Climate Forecasts

1. Verifying Climate Forecasts
Stakeholder Workshop: Enhanced Application of Climate Predictions in Pacific Island Countries
Vanuatu, Solomon Islands, Fiji, Tonga, Samoa, Niue, Cook Islands, Tuvalu, Kiribati - 2005
2. Verification
- How do we decide if a forecast is correct?
- How do we decide if a set of forecasts is correct often enough to be good?
- How do we decide if climate forecasts are valuable?
3. When is a Forecast Correct?
Categorical forecast: "Tomorrow it will rain."
- Observed rain: correct
- Observed no rain: incorrect
Probabilistic forecast: "Tomorrow there is a 70% chance of rain."
- Observed rain: correct
- Observed no rain: also correct
4. Climate Prediction
Statistical climate prediction is the process of estimating the change in the probability distribution function of rainfall (or temperature) conditional on a climate forcing.
5. The Basis for Climate Forecast Verification
There are three distinct verification issues:
- How consistent are the probability shifts with observations? (Are the forecasts reliable?)
- How large are the probability shifts? (Are the forecasts emphatic?)
- How tight is the shifted distribution? (Are the forecasts sharp?)
Just as no single measure (weight, height, age) fully describes a person, no single measure can describe all three of these, which is why we use a variety of skill measures.
6. Hit Rate (Percent Consistent)
Hit Rate = (total number of hits) / (total number of forecasts)
Hit: the outcome with the highest probability is observed.
Miss: the outcome with the highest probability is not observed.

Forecast (% Below / % Above Median) | Observed | Result
75 / 25                             | Below    | hit
30 / 70                             | Below    | miss
10 / 90                             | Above    | hit

The Hit Rate measures how often the outlook favoured the observed outcome.
7. Hit Rates Above/Below
[Figure: contingency table of above/below-median forecasts against observed outcomes]
8. An Example
Forecast of rainfall being above/below the median. Number of forecasts: 50.

                  OBSERVED
                Below   Above
Forecast Below   20       6
Forecast Above    5      19

HIT RATE = (Hits)/(Hits + Misses) = (20 + 19)/(20 + 6 + 5 + 19) = 39/50 = 0.78
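The hit-rate arithmetic above can be sketched in a few lines. The table layout below is an assumption based on the worked example (hits on the diagonal):

```python
# Contingency table of above/below-median forecasts vs. observations,
# using the counts from the slide. Keys are (forecast, observed).
table = {
    ("below", "below"): 20,  # forecast below, observed below -> hit
    ("below", "above"): 6,   # forecast below, observed above -> miss
    ("above", "below"): 5,   # forecast above, observed below -> miss
    ("above", "above"): 19,  # forecast above, observed above -> hit
}

hits = sum(n for (f, o), n in table.items() if f == o)
total = sum(table.values())
hit_rate = hits / total
print(hit_rate)  # 39/50 = 0.78
```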
9. Max Temp Above/Below Median
- Hit Rate
- 37 verified forecasts
- Better than guessing across most of the country
- Correct two-thirds of the time across most of eastern Australia
[Figure: hit-rate map; inset shows historical comparison]
10. Forecast Performance Via Hit Rate Score (for above/below median)
Often expressed as a percentage:
- Hit Rate = 50%: as good as climatology
- Hit Rate below 50% (towards 0%): worse than climatology
- Hit Rate above 50% (towards 100%): better than climatology
e.g., Hit Rate = 70% -> good forecasting; Hit Rate = 20% -> poor forecasting.
11. Hit Rate
- Advantages
  - Very simple to calculate
  - Simple to understand
  - Able to be mapped
- Disadvantages
  - May be misinterpreted: Hit Rate does not measure accuracy.
  - Categorical, not probabilistic, thereby encouraging categorical decision making.
  - Does not distinguish between a near and a far miss!
12. Linear Error in Probability Space (LEPS)
[Figure: cumulative probability distribution of rainfall. The forecast value (23 mm) maps to cumulative probability Pf = 0.20 and the observed value (31 mm) to Po = 0.48; the error is measured in probability space as |Pf - Po|.]

LEPS = 100 x (1 - |Pf - Po|)
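A minimal sketch of the LEPS calculation, assuming the slide's formula LEPS = 100 x (1 - |Pf - Po|), with Pf and Po taken as climatological cumulative probabilities of the forecast and observed values. The rainfall record below is invented for illustration:

```python
# LEPS for a single forecast, evaluated against an empirical
# climatological CDF built from a (hypothetical) rainfall record.
import bisect

def empirical_cdf(climatology, value):
    """Cumulative probability of `value` within a climatological record."""
    data = sorted(climatology)
    return bisect.bisect_right(data, value) / len(data)

def leps(climatology, forecast_mm, observed_mm):
    pf = empirical_cdf(climatology, forecast_mm)
    po = empirical_cdf(climatology, observed_mm)
    return 100 * (1 - abs(pf - po))

# Hypothetical 20-year seasonal rainfall record (mm), illustration only.
record = [12, 15, 18, 20, 22, 23, 25, 26, 28, 30,
          31, 33, 35, 38, 40, 44, 48, 52, 60, 75]
print(leps(record, forecast_mm=23, observed_mm=31))
```

Note that the score depends only on where the two values fall in the climatological distribution, so an error of a few millimetres near the median costs more than the same error in the far tail.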
13. Another Way of Looking at It: the Cumulative Probability Distribution
Note: both forecasts have an error of 2°C.
14. Rainfall Above/Below Median
- LEPS skill scores
- 34 verified forecasts (from JJA 2000)
- Positive skill across most of the country
[Figure: LEPS skill map]
15. Forecast Performance Via LEPS Score
Often expressed as a percentage:
- LEPS = 0: as good as climatology
- LEPS below 0 (towards -100): worse than climatology
- LEPS above 0 (towards 100): better than climatology
e.g., LEPS = 42 -> good forecasting; LEPS = -3 -> poor forecasting.
16. Linear Error in Probability Space (LEPS)
- Advantages
  - Rewards emphatic forecasts
  - Valid across all categories
  - Can be mapped
  - Rewards difficult/extreme forecasts
- Disadvantages
  - Complex to understand / not intuitive
  - Rather difficult to calculate
17. Reliability
Unless the probability is 0% or 100%, a probabilistic forecast is always correct: e.g., a 90% chance of rain implies a 10% chance of no rain. However, the forecast may be over- or under-confident; this is reliability. When a forecaster says there is a high probability of rain tomorrow, it should rain more frequently than when the forecaster says there is a low probability of rain.
18. Reliability Diagrams
For all forecasts of a given confidence, identify how often the event occurs. If the proportion of times that the event occurs is the same as the forecast probability, the probabilities are reliable (or well calibrated). A plot of relative frequency of occurrence against forecast probability will be a straight line if the forecasts are reliable. Problem: a large number of forecasts is required, and the result can't be mapped spatially or temporally.
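The binning behind a reliability diagram can be sketched as follows. The forecast probabilities and outcomes below are invented for illustration; real diagrams need far more forecasts, which is exactly the problem noted above:

```python
# Group forecast probabilities into 10%-wide bins; within each bin,
# compare the mean forecast probability with the observed frequency.
from collections import defaultdict

forecast_probs = [0.1, 0.15, 0.3, 0.35, 0.6, 0.62, 0.65, 0.9, 0.92, 0.95]
event_occurred = [0,   0,    0,   1,    1,   0,    1,    1,   1,    1]

bins = defaultdict(list)  # bin index -> list of (probability, outcome)
for p, o in zip(forecast_probs, event_occurred):
    bins[int(p * 10)].append((p, o))

for idx in sorted(bins):
    pairs = bins[idx]
    mean_p = sum(p for p, _ in pairs) / len(pairs)       # forecast confidence
    freq = sum(o for _, o in pairs) / len(pairs)         # observed frequency
    print(f"bin {idx * 10}-{idx * 10 + 9}%: forecast {mean_p:.2f}, observed {freq:.2f}")
```

Plotting observed frequency against mean forecast probability for each bin gives the reliability curve; points on the diagonal indicate well-calibrated forecasts.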
19. Rainfall Above/Below Median
- Reliability data (all Australian grid points)
- 34 verified forecasts
- 1-percent probability bins
- Evidence that wet conditions are easier to forecast
e.g., forecast probability 60%, outcome probability 64%.
[Figure: reliability curve, with a histogram of forecast probabilities]
20. Forecast Performance Via the Reliability Curve
[Figure: reliability diagram annotated with regions that are as good as, worse than, and better than climatology]
21. How Do We Verify Forecasts?
We use two different methods to verify climate forecasts, both involving a comparison of forecasts against independent observations: hindcast verification and forecast verification.
22. Hindcast Verification
Hindcast verification uses a technique called cross-validation: one past forecast (hindcast) period is deleted, the forecast model is developed on the remaining cases, and then tested on the deleted case. This process is repeated exhaustively. This attempts to mimic the process of producing independent forecasts.
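The delete-one-and-refit procedure described above is leave-one-out cross-validation. A minimal sketch, in which the "model" is just a stand-in (predict the mean of the training cases) rather than SCOPIC itself:

```python
# Leave-one-out cross-validation: each hindcast period is withheld in
# turn, the model is fitted on the remaining cases, and the withheld
# case is predicted, mimicking independent forecasts.
def leave_one_out(values):
    """Yield (prediction, actual) pairs, one per withheld case."""
    for i, actual in enumerate(values):
        training = values[:i] + values[i + 1:]        # delete one case
        prediction = sum(training) / len(training)    # fit on the rest
        yield prediction, actual

rainfall = [80.0, 120.0, 95.0, 60.0, 140.0]  # hypothetical seasonal totals (mm)
for pred, obs in leave_one_out(rainfall):
    print(f"predicted {pred:.1f}, observed {obs:.1f}")
```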
23. Forecast Verification
Forecast verification involves the comparison of real forecasts with real outcomes. Obviously, this cannot be done reliably until SCOPIC has been used for some months or years.
24. Conclusions
- There is no single measure that gives a comprehensive summary of forecast quality.
- Probabilistic forecasts tell you:
  - What is most likely to happen?
  - How likely is it that something will happen?
- Both of these aspects require verification.
26. Conclusions
A probability forecast is always correct. Any scoring technique which converts a probability to a yes/no (categorical) outcome runs the risk of encouraging inappropriate decision making. No single skill score can describe all aspects of the verification problem. Fortunately, most skill scores in most situations will tell a similar story.
28. Further Information
- Your notes and the SCOPIC manual
- The Internet, e.g., IRI online course material: http://iri.columbia.edu
- Wilks, D. S., 1995: Statistical Methods in the Atmospheric Sciences. Academic Press, San Diego. Chapter 7, Forecast Verification, pp. 233-283.
- Hartmann et al., 2002: Confidence Builders: Evaluating Seasonal Forecasts from User Perspectives. Bull. Amer. Met. Soc., 683-698.
- d.jones@bom.gov.au or a.watkins@bom.gov.au
30. Relative Operating Characteristics (ROC)
Convert probabilistic forecasts to deterministic forecasts by issuing a warning if the probability exceeds a threshold minimum. By raising the threshold, fewer warnings are likely to be issued, reducing the potential of issuing a false alarm but increasing the potential of a miss. By lowering the threshold, more warnings are likely to be issued, reducing the potential of a miss but increasing the potential of a false alarm. The ROC curve measures the trade-off between a correct warning and a false alarm (i.e., between hits and false alarms) across a range of decision thresholds.
31. Relative Operating Characteristics (ROC)
[Figure: 2x2 contingency table of warnings against observed outcomes]
HIT RATE = Hits / (Hits + Misses) = probability that the event is forewarned
FALSE ALARM RATE = F.A. / (F.A. + Correct Rejections) = probability that a warning is made for a non-event
32. Relative Operating Characteristics (ROC) - Example
A warning is issued when the probability of rainfall being greater than the median exceeds 50%. Number of forecasts: 32. Threshold: 50%.
[Figure: 2x2 table of warnings against observed outcomes]
HIT RATE = Hits/(Hits + Misses) = 7/(7 + 10) = 0.41
FALSE ALARM RATE = FA/(FA + CR) = 2/(2 + 13) = 0.13
33. Relative Operating Characteristics (ROC) - Example
A warning is issued when the probability of rainfall being greater than the median exceeds 10%. Number of forecasts: 32. Threshold: 10%.
[Figure: 2x2 table of warnings against observed outcomes]
HIT RATE = Hits/(Hits + Misses) = 21/(21 + 0) = 1.0
FALSE ALARM RATE = FA/(FA + CR) = 7/(7 + 4) = 0.64
34. Relative Operating Characteristics (ROC) - Example
A warning is issued when the probability of rainfall being greater than the median exceeds 80%. Number of forecasts: 32. Threshold: 80%.
[Figure: 2x2 table of warnings against observed outcomes]
HIT RATE = Hits/(Hits + Misses) = 1/(1 + 13) = 0.07
FALSE ALARM RATE = FA/(FA + CR) = 1/(1 + 17) = 0.06
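The three examples above give one ROC point per threshold. A short sketch computing those points from the slide counts:

```python
# Hit rate and false alarm rate from a 2x2 warning table, one table
# per probability threshold (counts taken from the worked examples).
def roc_point(hits, misses, false_alarms, correct_rejections):
    hit_rate = hits / (hits + misses)
    fa_rate = false_alarms / (false_alarms + correct_rejections)
    return hit_rate, fa_rate

# (threshold, hits, misses, false alarms, correct rejections)
tables = [(0.10, 21, 0, 7, 4), (0.50, 7, 10, 2, 13), (0.80, 1, 13, 1, 17)]
for threshold, h, m, fa, cr in tables:
    hr, far = roc_point(h, m, fa, cr)
    print(f"threshold {threshold:.0%}: hit rate {hr:.2f}, false alarm rate {far:.2f}")
```

Plotting hit rate against false alarm rate for all thresholds traces out the ROC curve; the 10% threshold catches every event but at a high false-alarm cost, while the 80% threshold rarely warns at all.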
35. Relative Operating Characteristics (ROC)
[Figure: ROC curve of hit rate against false alarm rate, with points labelled by threshold from 0% to 80%]
36. Relative Operating Characteristics (ROC)
37. Relative Operating Characteristics
- Advantages
  - Skill can be mapped in space and time
  - Weights forecasts equally
  - Relatively simple to calculate
- Disadvantages
  - Some complexity to understand
  - Categorical, not probabilistic
  - ROC score not intuitive
38. The Value of Forecasts
Just because a forecast is skilful doesn't mean that it is valuable. A forecast only has value if it leads to a changed decision.

An idealized example - informed use of a probability forecast:
Consider a farmer with 100 sheep. It is the end of winter, and she/he has the option of buying supplementary feed for summer at $10 a head.
39. Informed Use of a Probability Forecast
If the farm receives 100mm of spring rainfall there will be sufficient pasture and no need for the extra feed (which will rot). If 100mm of rain does not fall, however, there will not be sufficient pasture, and hay will have to be bought over summer at a cost of $20 a head. The climate forecast is that there is only a 30% chance of receiving at least 100mm in spring. What should the farmer do?
40. Informed Use of a Probability Forecast
Definitions: C is the cost of preventative action (-$1000); L is the cost of not taking preventative action if the adverse climate outcome occurs (-$2000). In general, if P > C/L the user should take preventative action. That is, protection is optimal when the C/L ratio is less than the probability P of the adverse climate outcome. Buying feed costs -$1000; not buying has an average cost of -$1400 (0.7 x -$2000) -> the farmer should buy feed now. The forecast is valuable as it motivates a decision. Imagine, however, an accurate model which never predicts P > 0.5: for this decision, the forecasts would never be valuable! Not all forecasts are relevant to all decisions.
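The cost-loss rule above can be sketched directly, using the farmer's numbers (C = $1000 for feed, L = $2000 for hay, and a 70% chance of the dry outcome):

```python
# Cost-loss decision rule: take preventative action when the
# probability of the adverse outcome exceeds C/L.
def should_protect(p_adverse, cost, loss):
    """Protection is optimal when p_adverse > cost/loss."""
    return p_adverse > cost / loss

def expected_costs(p_adverse, cost, loss):
    """(cost of acting, expected cost of not acting)."""
    return cost, p_adverse * loss

p_dry = 0.7            # 30% chance of >= 100 mm, so 70% chance of the dry outcome
cost, loss = 1000.0, 2000.0

act_cost, wait_cost = expected_costs(p_dry, cost, loss)
print(should_protect(p_dry, cost, loss))  # buying feed ($1000) beats waiting ($1400 expected)
```

With a model that never issues P > 0.5 for the dry outcome, `should_protect` would never fire for this C/L ratio, which is the point of the final remark on the slide.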