Title: An Overview of How Using Reforecasts Can Improve Your Probabilistic Weather Forecasts
1. An Overview of How Using Reforecasts Can Improve Your Probabilistic Weather Forecasts
NOAA Earth System Research Laboratory
- Tom Hamill and Jeff Whitaker
- NOAA / ESRL, Physical Sciences Div.
- tom.hamill@noaa.gov, jeffrey.s.whitaker@noaa.gov
2. More background in the January 2006 BAMS article and others. A reference list is provided after the conclusions.
3. What do we want from ensemble forecasts?
(Figure: example forecast distributions ranging from BAD (unreliable), to GOOD (reliable), to BEST (sharp and reliable).)
4. Ensemble-based probabilistic forecasts: some problems we'd like to correct
5. (1) Bias (drizzle over-forecast).
6. (2) Ensemble members too similar to each other.
7. (3) Ensembles too smooth, not capturing intense local precipitation due to orographic forcing. Downscaling needed.
8. Calibration and reforecasting
- Problems with probabilities from the raw ensemble: so then what?
- We would like f(O|F), that is, the probability distribution of the expected observed state given the forecast (much like your thought process as a forecaster).
(Figure label: today's ensemble mean forecast.)
9. (Same figure, adding:) lots of other past forecasts that are like today's forecast.
10. (Same figure, adding:) form an ensemble from the observed weather on the days of those past forecasts.
11. The concept of reforecasting
- Approach: use a FIXED model and a data set of many past forecasts from this model. Correct the current forecast using knowledge of this model's forecast errors over several decades in the past ("MOS on steroids").
- Calibration should implicitly:
  - adjust for model bias
  - adjust for any spread deficiency
  - downscale (coarse prediction grid --> predictable local detail in observations).
12. NOAA's reforecast data set
- Reforecast definition: a data set of retrospective numerical forecasts using the same model as is used to generate real-time forecasts.
- Model: T62L28 NCEP GFS, circa 1998.
- Initial states: NCEP-NCAR Reanalysis II plus 7 +/- bred modes.
- Duration: 15-day runs every day at 00Z from 19781101 to now (http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/refcst/week2).
- Data: selected fields (winds, hgt, temp on 5 pressure levels, precip, t2m, u10m, v10m, pwat, prmsl, rh700, heating). NCEP/NCAR reanalysis verifying fields included (Web form to download at http://www.cdc.noaa.gov/reforecast).
- Real-time probabilistic precipitation forecasts: http://www.cdc.noaa.gov/reforecast/narr
13. Why the fuss? Can't we calibrate with only a few past forecasts?
But consider training with a short sample in a climatologically dry region: how could you calibrate this latest forecast? You'd like enough training data to have some similar events, at a similar time of year, to this one.
14. Analog high-resolution precipitation forecast calibration technique
(actually run with 10 to 75 analogs)
15. (Same figure, adding:) the analog ensemble approximates f(O|F).
(actually run with 10 to 75 analogs)
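The analog technique in these slides can be sketched in a few lines. This is a minimal illustration with synthetic data, not the operational implementation: the array names, grid size, and the simple RMS pattern-matching metric are all assumptions. The idea is the one on the slides: find past forecasts that resemble today's forecast, then form the probabilistic forecast from the observed weather on those past dates.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training archive: many days of reforecasts and their
# verifying observed precipitation (mm/day) on a small grid. In the
# real system these come from the GFS reforecast data set.
n_days, ny, nx = 9000, 8, 8
train_fcst = rng.gamma(shape=0.5, scale=6.0, size=(n_days, ny, nx))
train_obs = train_fcst * rng.lognormal(0.0, 0.4, size=(n_days, ny, nx))

def analog_probability(today_fcst, train_fcst, train_obs,
                       n_analogs=25, threshold=25.0):
    """Find the n_analogs past forecasts closest (RMS difference) to
    today's forecast pattern, form an ensemble from the observations
    on those dates, and return the probability of exceeding
    `threshold` at each grid point."""
    # RMS distance between today's forecast and every past forecast
    dist = np.sqrt(((train_fcst - today_fcst) ** 2).mean(axis=(1, 2)))
    idx = np.argsort(dist)[:n_analogs]   # dates of the closest analogs
    analog_ens = train_obs[idx]          # observed weather on those days
    return (analog_ens > threshold).mean(axis=0)

today = rng.gamma(shape=0.5, scale=6.0, size=(ny, nx))
prob = analog_probability(today, train_fcst, train_obs)
```

Because the ensemble is built from observations rather than model output, the result is implicitly bias-corrected and downscaled to the observation grid.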
16. Here we show forecast analogs for a recent precipitation forecast in the NW, as well as the analyzed precip. on those dates. Notice patterns in the observed that are not in the forecast (e.g., rain on the Coast Range, drier in Central Oregon).
17. Example: probability of greater than 25 mm/day (downscaled to 5 km)
Downscaling using PRISM / Mountain Mapper technology (C. Daly, Oregon St.; NOAA RFCs, OHD).
18. Verified over 25 years of forecasts. Skill scores use the conventional method of calculation, which may overestimate skill (Hamill and Juras 2006).
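The "conventional method" for probabilistic precipitation skill is typically the Brier skill score relative to climatology. A minimal sketch (toy numbers, not data from the slides) of that conventional calculation:

```python
import numpy as np

def brier_skill_score(prob_fcst, obs_binary, clim_prob):
    """Conventional Brier skill score versus climatology:
    BSS = 1 - BS_forecast / BS_climatology. Hamill and Juras (2006)
    show this pooled calculation can overestimate skill when samples
    with differing climatologies are lumped together."""
    bs_f = np.mean((prob_fcst - obs_binary) ** 2)
    bs_c = np.mean((clim_prob - obs_binary) ** 2)
    return 1.0 - bs_f / bs_c

# Toy example: six probabilistic forecasts of a binary event
p = np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3])
o = np.array([1, 0, 1, 0, 1, 1])
bss = brier_skill_score(p, o, clim_prob=o.mean())  # 0.49
```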
19. Comparison against the NCEP medium-range T126 ensemble, ca. 2002: the improvement is a little increased reliability and a lot of increased resolution.
20. Effect of training sample size: colors of dots indicate which size of analog ensemble provided the largest amount of skill.
21. Real-time products
22. Here's a recent forecast
23. Climatological probabilities
24. We're also working on temperature calibration (right now, MOS is a better bet).
25. Conclusions
- Large improvement in probabilistic forecast skill and reliability by calibrating using a large, stable data set of NWP forecasts / obs.
- Precipitation products are out there for you to use (www.cdc.noaa.gov/reforecast/narr).
- The NWS expects to produce more reforecasts and calibrated products in the coming years. We're working with Zoltan Toth's group at NCEP on this.
26. References
Hamill, T. M., J. S. Whitaker, and X. Wei, 2004: Ensemble re-forecasting: improving medium-range forecast skill using retrospective forecasts. Mon. Wea. Rev., 132, 1434-1447. http://www.cdc.noaa.gov/people/tom.hamill/reforecast_mwr.pdf
Hamill, T. M., J. S. Whitaker, and S. L. Mullen, 2006: Reforecasts, an important dataset for improving weather predictions. Bull. Amer. Meteor. Soc., 87, 33-46. http://www.cdc.noaa.gov/people/tom.hamill/refcst_bams.pdf
Whitaker, J. S., F. Vitart, and X. Wei, 2006: Improving week two forecasts with multi-model re-forecast ensembles. Mon. Wea. Rev., 134, 2279-2284. http://www.cdc.noaa.gov/people/jeffrey.s.whitaker/Manuscripts/multimodel.pdf
Hamill, T. M., and J. S. Whitaker, 2006: Probabilistic quantitative precipitation forecasts based on reforecast analogs: theory and application. Mon. Wea. Rev., in press. http://www.cdc.noaa.gov/people/tom.hamill/reforecast_analog_v2.pdf
Hamill, T. M., and J. Juras, 2006: Measuring forecast skill: is it real skill or is it the varying climatology? Quart. J. Royal Meteor. Soc., in press. http://www.cdc.noaa.gov/people/tom.hamill/skill_overforecast_QJ_v2.pdf
Wilks, D. S., and T. M. Hamill, 2006: Comparison of ensemble-MOS methods using GFS reforecasts. Mon. Wea. Rev., in press. http://www.cdc.noaa.gov/people/tom.hamill/WilksHamill_emos.pdf
Hamill, T. M., and J. S. Whitaker, 2006: White paper: producing high-skill probabilistic forecasts using reforecasts: implementing the National Research Council vision. Available at http://www.cdc.noaa.gov/people/tom.hamill/whitepaper_reforecast.pdf
27. Bias correction using forecast and observed CDFs?
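CDF-based bias correction (quantile mapping) can be sketched as follows. This is an assumed illustration with synthetic data, since the slide only poses the question: map each forecast value to its quantile in the forecast CDF, then read off the observed value at that same quantile, so corrected forecasts share the observed climatology.

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical training data: a wet-biased forecast climatology vs.
# the verifying observed climatology (mm/day).
train_fcst = rng.gamma(0.5, 8.0, size=5000)
train_obs = rng.gamma(0.5, 5.0, size=5000)

def cdf_bias_correct(fcst, train_fcst, train_obs):
    """Find the quantile of `fcst` in the training-forecast CDF, then
    return the observed value at that same quantile."""
    q = np.searchsorted(np.sort(train_fcst), fcst) / len(train_fcst)
    return np.quantile(train_obs, np.clip(q, 0.0, 1.0))

# A 12 mm/day forecast maps to a smaller corrected amount, since the
# forecast climatology here is wetter than the observed one.
corrected = cdf_bias_correct(12.0, train_fcst, train_obs)
```

Note this corrects the marginal distribution only; unlike the analog approach, it does not by itself sharpen spread or add spatial detail.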
28. Issues (3): NCEP proposes a single-member T126 reforecast. Is that enough?
The analog reforecast process is repeated, as in the prior cartoon, but now, rather than matching the ensemble-mean pattern, today's control forecast is matched to past control forecasts. The grey area measures the degradation relative to the ensemble-mean baseline. Not much degradation in skill, especially at short leads! (And you don't even have to run an ensemble to get a probabilistic forecast.)
29. Issues (2): Are reforecasts still necessary with improved models?
ECMWF produced a short reforecast data set. Calibration using their week-2 reforecasts produced a skill increase of 11%; for our reforecast, the skill improvement was 16% (Whitaker and Vitart 2006).
30. 850 hPa temperature bias for a grid point in the central U.S.
Spread of yearly bias estimates from a 31-day running mean of F-O. Note the spread is often larger than the bias, especially at long leads.
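The 31-day running-mean bias estimate on this slide can be sketched directly. The data here are synthetic (an assumed true bias of +0.5 K plus noise), standing in for a year of daily forecast-minus-observed 850 hPa temperature differences at one grid point:

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical daily F - O differences: true bias +0.5 K, noisy
f_minus_o = 0.5 + rng.normal(0.0, 1.5, size=365)

def running_mean_bias(diff, window=31):
    """Centered running mean of daily F-O differences; each output
    value is the bias estimate for the middle day of its window."""
    kernel = np.ones(window) / window
    return np.convolve(diff, kernel, mode='valid')

bias = running_mean_bias(f_minus_o)  # 365 - 31 + 1 = 335 estimates
```

With a short window like this, the day-to-day spread of the estimates can easily exceed the bias itself, which is the slide's point about the value of a long, stable training sample.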
31. Disadvantages to reforecast calibration?
- Calibration research won't lead to a correction of the underlying problem. We would prefer to achieve unbiased, reliable forecasts by doing the numerical modeling correctly in the first place.
- Forecasts may be improved, but as end products, not raw forecasts, so there is little gain in meteorological insight.
- Corrections may be model-specific: the calibrations for GFS v2.0 may not be useful for ECMWF, much less GFS v3.0.
- Could constrain model development. Calibration is ideally based on a long database of prior forecasts (reforecasts, or hindcasts) from the same model. Do we delay model upgrades until a new set of reforecasts is completed?
32. Main Points
- Large improvement in probabilistic forecast skill and reliability by calibrating using a large, stable data set of NWP forecasts / obs.
- Generally:
  - smaller training sample size --> smaller benefit;
  - larger training sample size --> larger benefit.
- Improvements are larger for surface variables (surface temperature, precipitation) than for upper-air variables (Z500).
- Use for bias correction, of course. But also useful for calibration of spread deficiencies and for statistical downscaling.