
1
METHODS TO EVALUATE PROBABILISTIC AND ENSEMBLE
FORECASTS
  • Yuejian Zhu and Zoltan Toth
  • SAIC at Environmental Modeling Center
  • National Centers for Environmental Prediction
  • EMC Seminar Presentation
  • November 5, 2002
  • Camp Springs, MD

2
Acknowledgements
  • Richard Wobus, EMC/NCEP
  • Jun Du, EMC/NCEP
  • Mozheng Wei, EMC/NCEP
  • Steve Tracton, ONR/Navy
  • Olivier Talagrand, LMD/France
  • David Richardson, ECMWF
  • Kenneth Mylne, UKMET
  • Hua-Lu Pan, EMC/NCEP
  • Steve Lord, EMC/NCEP

3
References
  • 1. Toth, Talagrand, Candille and Zhu, 2002: "Probability and ensemble forecasts", book chapter, in print.
  • 2. Zhu, Iyengar, Toth, Tracton and Marchok, 1996: "Objective evaluation of the NCEP global ensemble forecasting system", AMS conference proceedings.
  • 3. Toth, Zhu and Marchok, 2001: "The use of ensembles to identify forecasts with small and large uncertainty", Weather and Forecasting.
  • 4. Zhu, Toth, Wobus, Richardson and Mylne, 2002: "The economic value of ensemble-based weather forecasts", BAMS.
  • ... more related articles

4
CONTENTS
  • Introduction
  • Ensemble forecasts
  • Current status of NCEP model evaluation
  • Probabilistic evaluation: simple measurements
  • Probabilistic evaluation: multi-categories
  • Probabilistic evaluation: cost-loss analysis
  • Probabilistic evaluation: useful tools
  • Conclusion and discussion
  • Plans and more information

5
Introduction
  • 1. Why do we need verification/evaluation/diagnostics?
  • a. Diagnose errors introduced at each step of the forecast process:
    Observation --(data assimilation)--> Analysis
    Initial condition --(numerical model)--> Forecast
  • b. Assess forecast uncertainty;
    include uncertainty as part of the forecast.
  • 2. What do we use as a proxy for truth?
  • a. analysis (at grid points)
  • b. observations (at report stations)
  • 3. Various aspects of forecast verification:
  • a. skill --> compare errors to a reference system (climatology)
  • b. compare different NWP model forecasts
  • c. spatial/temporal variations in skill

6
Ensemble Forecasts
  • 1. Why do we need ensemble forecasts?
  • Look at the following schematic diagrams.

7
Ensemble Forecasts (continued)
[Schematic: initial uncertainty around the analysis grows into a forecast probability distribution; a single deterministic forecast compared with the verified analysis]
8
Ensemble Forecasts (continued)
  • 2. Ensemble forecast methods, for example:
  • NCEP and ECMWF:
    both add perturbations to the initial unperturbed analysis;
    perturbations are generated by
    . breeding vectors (BVs ---> NCEP), fast and cheap;
    . singular vectors (SVs ---> ECMWF), solving a matrix problem.
  • CMC/MSC -- Monte Carlo approach,
    using 2 models and different physical parameterizations,
    designed to simulate
    . observation errors (random perturbations),
    . imperfect boundary conditions,
    . model errors.

9
Ensemble Forecasts (NCEP's Configuration)
  • Last implementation --- T12Z, Jan. 9, 2001
  • 24-hour breeding cycle
  • Initial times --- T00Z and T12Z
  • Total of 23 runs --- 12 runs at T00Z, 11 runs at T12Z
  • Integration --- up to 384 hours (16 days)
  • 3 different model resolutions:
    T170L42 --- 75 km
    T126L28 --- 105 km
    T62L28 --- 210 km
  • GFS --- T254/L64 implementation (Oct. 29, 2002)
  • Next implementation --- 40 members (T126 up to 168 hrs), Feb. 2003 (new IBM machine)
  • Next implementation --- 6-hr breeding cycle

10

Current Status of NCEP Model Verification
1. Current status of NCEP global model objective verification.
   >> For the single (deterministic) model forecast:
      Fcst vs. Analysis (500 hPa, 1000 hPa, 850 hPa, etc.)
      Fcst vs. Obs (sigma/pressure levels vs. obs, 3-dimensional)
   >> We are objectively evaluating the following models, too:
      ECMWF: T12Z, up to 168 hours (7 days)
      UKMET: T00Z and T12Z, up to 144 hours (6 days)
      CMC: T00Z and T12Z, up to 144 hours (6 days)
      NOGAPS: T00Z and T12Z, up to 120 hours (5 days)
      ---> multi-model ensemble from the above products, if we receive longer lead-time forecasts
2. Global ensemble evaluation/verification.
      NCEP ensemble: T00Z and T12Z, up to 16 days
      ECMWF ensemble: T00Z and T12Z, up to 10 days
      CMC ensemble: T00Z and T12Z, up to 10 days
      ---> We will have a seminar on the comparison

11
Probabilistic Evaluation
  • Introduce two characteristics: reliability and resolution
  • reliability --- forecast <-> observation:
    the property of statistical consistency between predicted probabilities and observed frequencies of occurrence of the event under consideration.
  • resolution --- forecast/observation <-> climatology:
    the ability of a forecast system to discern sub-sample forecast periods with different relative frequencies of the event (the difference between sub-sample and overall-sample climatology).
  • reliability and resolution together determine the usefulness of a probabilistic forecast system.

12
Prob. Evaluation (simple measurement)
  • 1. Talagrand distribution (rank histogram)
  • Sort the ensemble forecasts in order and check into which bin the analysis falls (a computational sketch follows below).
  • A reliability measurement; system bias is detected:
    positive/negative bias of the forecasting model;
    the example forecasts show a cold bias,
    assuming the analysis is bias-free (perfect). A "U"-shaped histogram is common.

[Histogram: Talagrand distribution compared with the flat average distribution]
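A minimal sketch of how such a rank histogram can be computed; the array shapes and the function name are illustrative, not part of the NCEP verification code.

```python
import numpy as np

def talagrand_histogram(ens, anl):
    """Rank (Talagrand) histogram: how often the verifying analysis falls
    into each of the n+1 bins defined by the sorted n-member ensemble.
    ens: (n_members, n_cases) forecasts, anl: (n_cases,) analyses."""
    n_members, n_cases = ens.shape
    sorted_ens = np.sort(ens, axis=0)
    ranks = (sorted_ens < anl).sum(axis=0)            # 0 .. n_members
    counts = np.bincount(ranks, minlength=n_members + 1)
    return counts / n_cases                           # ideal: flat at 1/(n+1)
```

A flat histogram near 1/(n+1) per bin indicates a statistically consistent ensemble; a "U" shape indicates under-dispersion, and an asymmetric histogram (as in the cold-bias example) indicates a systematic bias.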
13
Prob. Evaluation (simple measurement)
  • 1. Talagrand distribution (continued).
  • Outlier evolution at different lead times:
    add up the two outlier (extreme) bins and subtract the expected average;
    an ideal forecast system would have zero excess outliers.

Due to the inability of the ensemble to capture model-related errors?
14
Prob. Evaluation (simple measurement)
  • Outlier --> diagnostic
  • forecasts vs. the next forecasts (f+24 hrs, valid at the same time),
    assuming the forecasting model is perfect at f+24;
  • a perfect forecast system would be expected to have zero outliers.

Detecting initial uncertainty in the model?
15
Prob. Evaluation (simple measurement)
  • 2. Outlier maps (2-dimensional).
  • Flow-dependent systematic model errors:
    . 40 members at 4 different lead times,
    . measured by the normalized distance (a computational sketch follows below)
      d = ( AVG_e - ANL ) / SQRT( SPRD_e / (n-1) )
  • The model error evaluation (outliers) is part of the model bias estimation (see map). The "normalized distance" is defined as the difference between AVG and ANL divided by the square root of the normalized ensemble spread. Therefore, a positive distance means a positive bias of the ensemble forecasts, and a negative distance means a negative bias.
  • ---> show example maps.
  • Areas where consecutive ensembles fail (missed):
    -- initial errors? / errors in boundary forcing?
    -- model development: identify problem areas.
  • Weather systems where consecutive ensembles fail (missed):
    -- inability to capture model-related errors?
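A minimal sketch of the normalized-distance map, assuming SPRD_e is the sum of squared member deviations from the ensemble mean (so SPRD_e/(n-1) is the sample variance); names and array shapes are illustrative.

```python
import numpy as np

def normalized_distance(ens, anl):
    """d = (AVG_e - ANL) / sqrt(SPRD_e / (n - 1)).
    ens: (n_members, ny, nx) ensemble at one lead time,
    anl: (ny, nx) verifying analysis.
    Positive d -> positive ensemble bias; negative d -> negative bias."""
    n = ens.shape[0]
    avg_e = ens.mean(axis=0)
    sprd_e = ((ens - avg_e) ** 2).sum(axis=0)   # sum of squared deviations
    return (avg_e - anl) / np.sqrt(sprd_e / (n - 1))
```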

16
Prob. Evaluation (multi-categories)
  • Based on climatologically equally likely bins (for example, 5 bins).
  • For verifying multi-category probability forecasts;
  • measures both reliability and resolution.
  • 1. Ranked (ordered) probability score (RPS) and skill score (RPSS), sketched below:
  • RPSS = ( RPS_f - RPS_c ) / ( 1 - RPS_c )
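A minimal sketch of RPS/RPSS under the convention the slide's formula implies (a positively oriented RPS with a perfect score of 1); the function names and the 5-bin setup are illustrative assumptions.

```python
import numpy as np

def rps(prob, obs_cat, n_cat=5):
    """Positively oriented ranked probability score (perfect = 1).
    prob: (n_cases, n_cat) forecast probabilities per category,
    obs_cat: (n_cases,) integer index of the observed category."""
    n_cases = prob.shape[0]
    obs = np.zeros((n_cases, n_cat))
    obs[np.arange(n_cases), obs_cat] = 1.0
    cum_f, cum_o = np.cumsum(prob, axis=1), np.cumsum(obs, axis=1)
    return 1.0 - ((cum_f - cum_o) ** 2).sum(axis=1).mean() / (n_cat - 1)

def rpss(prob_f, prob_c, obs_cat, n_cat=5):
    """RPSS = (RPS_f - RPS_c) / (1 - RPS_c), climatology as the reference
    (for equally likely bins, prob_c is simply 1/n_cat in every bin)."""
    rps_f, rps_c = rps(prob_f, obs_cat, n_cat), rps(prob_c, obs_cat, n_cat)
    return (rps_f - rps_c) / (1.0 - rps_c)
```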

17
Prob. Evaluation (multi-categories)
  • 2. Brier Score (BS, non-ranked) and Brier Skill Score (BSS):
  • from two categories to multi-category/probabilistic forecasts;
  • measures both reliability and resolution (a sketch follows below).

[Chart: Brier Skill Score vs. lead time; skill line (reference is climatology)]
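A minimal sketch of the two-category Brier score and skill score against a climatological reference; p_clim and the function names are illustrative placeholders.

```python
import numpy as np

def brier_score(p, o):
    """BS = mean (p - o)^2; p = forecast probability of the event,
    o = 1 if the event occurred, 0 otherwise."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return ((p - o) ** 2).mean()

def brier_skill_score(p, o, p_clim):
    """BSS = 1 - BS_f / BS_c, with the constant climatological
    probability p_clim as the reference forecast."""
    bs_f = brier_score(p, o)
    bs_c = brier_score(np.full_like(np.asarray(o, float), p_clim), o)
    return 1.0 - bs_f / bs_c
```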
18
Prob. Evaluation (multi-categories)
  • 3. Decomposition of the Brier Score:
  • considers sub-samples and the overall sample;
  • reliability, resolution and uncertainty.
  • For reliability, 0 is perfectly reliable;
  • for resolution, 0 is no resolution (climatology);
  • when resolution = reliability --> no skill.
  • Example from the global ensemble (a computational sketch follows below).

[Chart: reliability and resolution terms vs. lead time; no skill beyond the point where the two curves cross]
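A minimal sketch of the standard (Murphy) decomposition BS = reliability - resolution + uncertainty, grouping forecast probabilities into sub-samples; the 10-bin choice and names are illustrative.

```python
import numpy as np

def brier_decomposition(p, o, n_bins=10):
    """Returns (reliability, resolution, uncertainty) such that
    BS = reliability - resolution + uncertainty.
    Skill relative to climatology is lost once reliability >= resolution."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    n, obar = p.size, o.mean()
    bins = np.minimum((p * n_bins).astype(int), n_bins - 1)
    rel = res = 0.0
    for k in range(n_bins):                 # loop over probability sub-samples
        idx = bins == k
        nk = idx.sum()
        if nk == 0:
            continue
        pk, ok = p[idx].mean(), o[idx].mean()
        rel += nk * (pk - ok) ** 2
        res += nk * (ok - obar) ** 2
    return rel / n, res / n, obar * (1.0 - obar)
```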
19
Prob. Evaluation (multi-categories)
  • 4. Reliability and possible calibration ( remove
    bias )
  • For period precipitation evaluation

[Reliability diagram for period precipitation: calibrated forecast vs. raw forecast, with the skill line, the resolution line and the climatological probability]
20
Prob. Evaluation (multi-categories)
  • 5. Reliability and possible probabilistic calibration:
  • re-label the forecast probability with the observed frequency associated with that forecast probability (a sketch follows below).

[Reliability diagram: calibrated vs. un-calibrated forecast probabilities]
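A minimal sketch of the re-labeling calibration described above: observed frequencies are estimated per forecast-probability bin on a training sample and then substituted for new forecast probabilities. The bin count and names are illustrative.

```python
import numpy as np

def calibrate(p_train, o_train, p_new, n_bins=10):
    """Replace each forecast probability with the observed frequency
    historically associated with its probability bin."""
    p_train = np.asarray(p_train, float)
    o_train = np.asarray(o_train, float)
    bins = np.minimum((p_train * n_bins).astype(int), n_bins - 1)
    obs_freq = np.full(n_bins, np.nan)
    for k in range(n_bins):
        idx = bins == k
        if idx.any():
            obs_freq[k] = o_train[idx].mean()   # observed frequency in bin k
    new_bins = np.minimum((np.asarray(p_new, float) * n_bins).astype(int),
                          n_bins - 1)
    return obs_freq[new_bins]                   # NaN where a bin was never used
```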
21
Prob. Evaluation (cost-loss analysis)
  • Based on hit rate (HR) and false alarm (FA) rate.
  • 1. Relative Operating Characteristics (ROC) area --- an application of signal detection theory for measuring discrimination between two alternative outcomes.
  • ROC area = integrated area x 2 ( normalized to 0-1 )

Relative Operating Characteristics: contingency table (o = observed, f = forecast)

             y(f)   n(f)
    y(o)      h      m
    n(o)      f      c

ROC curve axes: hit rate h/(h+m) vs. false alarm f/(h+f)
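A minimal sketch of computing a ROC area from probabilistic forecasts by sweeping probability thresholds and forming the 2 x 2 table (h, m, f, c) at each one. It integrates hit rate h/(h+m) against the conventional false alarm rate f/(f+c), giving an area between 0.5 (no skill) and 1 (perfect); the threshold set and names are illustrative assumptions.

```python
import numpy as np

def roc_area(p, o, thresholds=np.linspace(0.0, 1.0, 11)):
    """Area under the ROC curve from forecast probabilities p and
    binary observations o (True where the event occurred)."""
    p, o = np.asarray(p, float), np.asarray(o, bool)
    hr, fa = [], []
    for t in thresholds:                       # t = 0 warns always -> (1, 1)
        warn = p >= t
        h = np.sum(warn & o);  m = np.sum(~warn & o)
        f = np.sum(warn & ~o); c = np.sum(~warn & ~o)
        hr.append(h / max(h + m, 1))           # hit rate
        fa.append(f / max(f + c, 1))           # false alarm rate
    hr.append(0.0); fa.append(0.0)             # never warn
    return -np.trapz(hr, fa)                   # trapezoidal area under the curve
```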
22
Prob. Evaluation (cost-loss analysis)
  • Based on hit rate (HR) and false alarm (FA) rate
  • 2. Relative Operating Characteristics (ROC) distance:
    for the control forecast and the ensemble forecasts;
    D = distance from the point (control) to the ensemble polygon;
    positive --> control is better than the ensemble;
    negative --> ensemble is better than the control.

[ROC diagrams for the high-resolution and low-resolution control forecasts vs. the ensemble]
23
Prob. Evaluation (cost-loss analysis)
  • 3. Economic Value (EV) of forecasts.
  • Given a particular forecast, a user either does or does not take action (a cost-loss sketch follows below).

[Chart: economic value as a function of the cost/loss ratio; ensemble forecast vs. deterministic forecast, value line, highest value]
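A minimal sketch of the standard cost-loss economic value calculation (the framework of reference 4 above), assuming misses cost the full loss L, hits and false alarms cost C, and value is measured relative to climatology; the variable names are illustrative.

```python
import numpy as np

def economic_value(hit_rate, false_alarm_rate, s, cost_loss):
    """Economic value for a user with cost/loss ratio alpha = C/L,
    given hit rate H, false alarm rate F = f/(f+c) and the
    climatological event frequency s (expenses in units of L)."""
    a = np.asarray(cost_loss, float)
    e_clim = np.minimum(a, s)                      # always or never protect
    e_perf = s * a                                 # perfect forecast
    e_fcst = (false_alarm_rate * (1 - s) * a       # false alarms cost C
              + hit_rate * s * a                   # hits cost C
              + (1 - hit_rate) * s)                # misses cost L (= 1)
    return (e_clim - e_fcst) / (e_clim - e_perf)
```

For an ensemble, H and F can be computed at each probability threshold and the value curve taken as the envelope (maximum over thresholds) at each cost/loss ratio, which is what gives the ensemble its advantage over a single deterministic forecast.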
24
Prob. Evaluation (cost-loss analysis)
  • Based on hit rate (HR) and false alarm (FA)
    analysis
  • ... Economic Value (EV) of forecasts (continued)

[Chart: economic value vs. lead time; the ensemble forecast shows an average 2-day advantage over the deterministic forecast]
25
Prob. Evaluation (cost-loss analysis)
  • Based on a 2 x 2 table for precipitation forecasts
  • 4. ETS, TSS and FBI (a computational sketch follows below)
  • ETS -- Equitable Threat Score
    ETS = ( h - R(h) ) / ( h + f + m - R(h) ),
    where R(h) = (h+f)·(h+m) / (h+f+m+c)
  • TSS -- True Skill Statistic
    (probability of detection: hit rate and false alarm)
    TSS = ( h·c - f·m ) / ( (h+m)·(f+c) )
  • FBI -- Frequency Bias
    FBI = ( h + f ) / ( h + m )
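A minimal sketch computing the three scores directly from the 2 x 2 table entries; the function name is illustrative.

```python
def precip_scores(h, m, f, c):
    """Scores from the 2 x 2 contingency table:
    h = hits, m = misses, f = false alarms, c = correct rejections."""
    n = h + m + f + c
    r_h = (h + f) * (h + m) / n                   # hits expected by chance
    ets = (h - r_h) / (h + f + m - r_h)           # Equitable Threat Score
    tss = (h * c - f * m) / ((h + m) * (f + c))   # True Skill Statistic
    fbi = (h + f) / (h + m)                       # Frequency Bias
    return ets, tss, fbi
```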

26
Prob. Evaluation (Useful Tools)
  • 1. Small and large uncertainty
  • products: spaghetti diagrams and RMOP
  • for an example of the evaluation and the statistical results, see the next diagram --->

27
Prob. Evaluation (useful tools)
  • ... Small and large uncertainty.
  • 1 day (large uncertainty) 4 days (control)
    10-13 days (small uncertainty)

28
Prob. Evaluation (useful tools)
  • 2. Information content

[Chart: information content vs. lead time for the large-uncertainty, small-uncertainty and deterministic forecasts]
29
Prob. Evaluation (useful tools)
  • 2. Information content
  • Statistics show that a 7.5-day fully probabilistic forecast, or a 6-day categorical forecast, has as much information content as a 5-day control forecast; equivalently, the fully probabilistic forecast has more than twice as much information content at day 5.

(the ensemble mode is taken as the most frequent forecast)
30
Prob. Evaluation (useful tools)
  • 3. Bimodality -- possibly another seminar by Zoltan Toth.
  • Work in progress.

31
Conclusion and Discussion
  • 1. Two probabilistic attributes:
    reliability --> must be statistically consistent with observations;
    resolution --> more information with respect to climatology.
  • 2. Measures of the probabilistic attributes:
    reliability: Talagrand, outliers, RPS, BS;
    resolution: ROC, EV, IC, RPS, BS.
  • 3. Calibration could improve reliability (resolution).
  • 4. Ensemble forecasts vs. single-value forecasts:
    potential abilities of ensemble forecasts.
  • 5. Probabilistic forecasts and related probabilistic skill?

32
Plans and Information
  • 1. Apply the methods to forecasts at different scales,
    such as short-range, medium-range and seasonal,
    or regional/global.
  • 2. Using observations instead of the analysis will be another set of evaluations (thanks to Mark Iredell for making the OBS verification possible).
  • Visit the NCEP global ensemble web page:
    http://wwwt.emc.ncep.noaa.gov/gmb/ens
    and click "Verification"
  • Or visit Yuejian Zhu's research home page:
    http://wwwt.emc.ncep.noaa.gov/gmb/yzhu
    and click "Ensemble evaluation"