Robin Hogan, Ewan O - PowerPoint PPT Presentation

Objective assessment of the skill of cloud forecasts: Towards an NWP-testbed Robin Hogan, Ewan O Connor, Andrew Barrett University of Reading, UK – PowerPoint PPT presentation

Slides: 44
Provided by: RobinH154

Transcript and Presenter's Notes

1
Objective assessment of the skill of cloud
forecasts: Towards an NWP-testbed
  • Robin Hogan, Ewan O'Connor, Andrew Barrett
  • University of Reading, UK
  • Maureen Dunn, Karen Johnson
  • Brookhaven National Laboratory

2
Overview
  • Cloud schemes in NWP models are basically the
    same as in climate models, but easier to evaluate
    using ARM because:
  • NWP models are trying to simulate the actual
    weather observed
  • They are run every day
  • In Europe at least, NWP modelers are more
    interested in comparisons with ARM-like data than
    climate modelers (not true in US?)
  • But can we use these comparisons to improve the
    physics?
  • Can compare different models which have different
    parameterizations
  • But each model uses different data assimilation
    system
  • Cleaner test if the setup is identical except one
    aspect of physics
  • SCM-testbed is the crucial addition to the
    NWP-testbed
  • How do we set such a system up?
  • Start by interfacing Cloudnet processing with ARM
    products
  • Metrics test both bias and skill (can only test
    bias of climate model)
  • Diurnal compositing to evaluate boundary-layer
    physics

3
Level 1b
  • Minimum instrument requirements at each site
  • Cloud radar, lidar, microwave radiometer, rain
    gauge, model or sondes
  • Radar
  • Lidar

4
Level 1c
  • Instrument Synergy product
  • Example of target classification and data quality
    fields

Ice
Liquid
Rain
Aerosol
5
Level 2a/2b
  • Cloud products on (L2a) observational and (L2b)
    model grid
  • Water content and cloud fraction

L2a: IWC on radar/lidar grid
L2b: Cloud fraction on model grid
6
Cloud fraction
Panels: Chilbolton observations; Met Office Mesoscale
Model; ECMWF Global Model; Meteo-France ARPEGE
Model; KNMI RACMO Model; Swedish RCA model
7
Cloud fraction in 7 models
  • Mean PDF for 2004 for Chilbolton, Paris and
    Cabauw

0-7 km
Illingworth et al. (BAMS 2007)
8
ARM-Cloudnet interface
  • First step: interface ARM products to Cloudnet
    processing
  • Now done at Reading; need to implement at
    Brookhaven
  • Is this a long-term solution?
  • Extra products and verification metrics still
    desirable

9
Skill and bias
  • If directly evaluating a climate model, can only
    evaluate bias
  • Zero bias can often be because of compensating
    errors
  • In NWP- and SCM-testbed, can also measure skill
  • Answers the question: was cloud forecast at the
    right time?
  • This checks whether the cloud responds to the
    correct forcing
  • Easiest to do for binary events, e.g. threshold
    exceedance
  • Metrics of skill should be:
  • Equitable (random and constant forecasts score
    zero)
  • Robust for rare events (many scores tend to 0 or
    1)
  • A metric with good properties is the Symmetric
    Extreme Dependency Score (SEDS; Hogan et al.
    2009)
  • Awards score of 1 to perfect forecast and 0 for
    random
  • We have tested 3 models over SGP in 2004...
  • Apply with cloud-fraction threshold of 0.1

10
Southern Great Plains 2004
ECMWF, NCEP, UK Met Office (Hadley Centre
? Met Office)
11
Winter 2004
12
Summer 2004
13
Microbase IWC vs. ECMWF
  • Maureen Dunn

14
Different mixing schemes
Longwave cooling
Height (z)
Virtual potential temp. (θv)
15
Different mixing schemes
Non-local mixing scheme (e.g. Met Office, ECMWF,
RACMO)
Longwave cooling
  • Use a test parcel to locate the unstable
    regions of the atmosphere
  • Eddy diffusivity is positive over this region
    with a strength determined by the cloud-top
    cooling rate (Lock 1998)

Height (z)
Virtual potential temp. (θv)
Eddy diffusivity (Km) (strength of the mixing)
16
Different mixing schemes
Prognostic turbulent kinetic energy (TKE) scheme
(e.g. SMHI-RCA)
Longwave cooling
  • Model carries an explicit variable for TKE
  • Eddy diffusivity parameterized as Km ~ TKE^(1/2) l,
    where l is a typical eddy size

TKE generated
Height (z)
TKE transported downwards by turbulence itself
dθv/dz < 0
dθv/dz > 0
TKE destroyed
Virtual potential temp. (θv)
17
Diurnal cycle composite of clouds
Radar and lidar provide cloud boundaries and
cloud properties above site
Most models have a non-local mixing scheme in
unstable conditions and an explicit formulation
for entrainment at cloud top: good performance
over the diurnal cycle
  • Barrett, Hogan & O'Connor (GRL 2009)

18
Summary and future work
  • One year's evaluation over SGP
  • All models underestimate mid- and low-level cloud
  • Skill may be robustly quantified using SEDS: less
    skill in summer
  • Infrastructure to interface ARM and Cloudnet data
    has been tested on 1 year of data with cloud
    fraction and IWC
  • So far Met Office, NCEP, ECMWF and Meteo-France
    can be processed
  • Next: implement code at BNL, with other ARM
    products and models
  • Then run on many years of ARM data from multiple
    sites
  • Question: have cloud forecasts improved in 10
    years?
  • Next: apply to SCM-testbed
  • Comparisons already demonstrate strong difference
    in performance of different boundary-layer
    parameterizations: non-local mixing with explicit
    entrainment is clearly best
  • We have the tools to quantify objectively
    improvements in both bias and skill with changed
    parameterizations in SCMs
  • Other metrics of performance or compositing
    methods required?
  • Could also forward-model the observations and
    evaluate in obs space?

20
Joint PDFs of cloud fraction
  • Raw (1 hr) resolution
  • 1 year from Murgtal
  • DWD COSMO model

21
Contingency tables
DWD model, Murgtal

                   Observed cloud          Observed clear-sky
Model cloud        a = 7194 (cloud hit)    b = 4098 (false alarm)
Model clear-sky    c = 4502 (miss)         d = 41062 (clear-sky hit)

For a given set of observed events, there are only 2 degrees
of freedom in all possible forecasts (e.g. a and b), because
2 quantities are fixed:
  - Number of events that occurred, n = a + b + c + d
  - Base rate (observed frequency of occurrence), p = (a + c)/n
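
A short sketch of the "2 degrees of freedom" statement, using the Murgtal counts from this slide. The helper name `complete_table` is hypothetical: it shows that once n and the number of observed cloud events are fixed, choosing a and b determines c and d.

```python
def complete_table(n, n_observed, a, b):
    """Given fixed n and observed-event count (a + c), recover c and d from a and b."""
    c = n_observed - a          # misses: observed events that were not forecast
    d = n - a - b - c           # clear-sky hits: everything that is left
    return a, b, c, d


a, b, c, d = 7194, 4098, 4502, 41062     # DWD model, Murgtal (from the slide)
n = a + b + c + d
base_rate = (a + c) / n

print("n =", n)
print("base rate p = %.3f" % base_rate)
print(complete_table(n, a + c, a, b))
```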
22
Skill-Bias diagrams
[Figure: grids of 16 boxes comparing Reality (n = 16, p = 1/4) with Forecast]
23
5 desirable properties of verification measures
  • Equitable: all random forecasts receive
    expected score zero
  • Constant forecasts of occurrence or
    non-occurrence also score zero
  • Note that forecasting the right cloud climatology
    versus height but with no other skill should also
    score zero
  • Difficult to hedge
  • Some measures reward under- or over-prediction
  • Useful for rare events
  • Almost all measures are degenerate in that they
    asymptote to 0 or 1 for vanishingly rare events
  • Dependence on full joint PDF, not just 2x2
    contingency table
  • Difference between cloud fraction of 0.9 and 1 is
    as important for radiation as a difference
    between 0 and 0.1
  • Difficult to achieve with other desirable
    properties; won't be studied much today...
  • Linear: so that we can fit an inverse exponential
    for half-life
  • Some measures (e.g. Odds Ratio Skill Score) are
    very non-linear

24
Hedging: issuing a forecast that differs from
your true belief in order to improve your score
(e.g. Jolliffe 2008)
  • Hit rate H = a/(a + c)
  • Fraction of events correctly forecast
  • Easily hedged by randomly changing some forecasts
    of non-occurrence to occurrence

25
Equitability
  • Defined by Gandin and Murphy (1992)
  • Requirement 1: An equitable verification measure
    awards all random forecasting systems, including
    those that always forecast the same value, the
    same expected score
  • Inequitable measures rank some random forecasts
    above skillful ones
  • Requirement 2: An equitable verification measure
    S must be expressible as the linear weighted sum
    of the elements of the contingency table, i.e.
    S = (Sa a + Sb b + Sc c + Sd d) / n
  • This can safely be discarded: it is incompatible
    with other desirable properties, e.g. usefulness
    for rare events
  • Gandin and Murphy reported that only the Peirce
    Skill Score and linear transforms of it are
    equitable by their requirements
  • PSS = Hit Rate minus False Alarm Rate
    = a/(a + c) - b/(b + d)
  • What about all the other measures reported to be
    equitable?
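
A minimal sketch of the Peirce Skill Score as defined on this slide, applied to the Murgtal table and to the two constant forecasts ("always cloud", "always clear"). Treating an empty forecast category as a rate of zero is an assumption made here so that the constant forecasts can be evaluated.

```python
def rate(numerator, denominator):
    """Ratio that is taken as zero when the denominator is empty (assumption)."""
    return numerator / denominator if denominator else 0.0


def peirce_skill_score(a, b, c, d):
    """PSS = hit rate minus false alarm rate."""
    return rate(a, a + c) - rate(b, b + d)


murgtal = (7194, 4098, 4502, 41062)
n_cloud, n_clear = 7194 + 4502, 4098 + 41062

pss_model = peirce_skill_score(*murgtal)
pss_always_cloud = peirce_skill_score(n_cloud, n_clear, 0, 0)
pss_always_clear = peirce_skill_score(0, 0, n_cloud, n_clear)

print("DWD model:    %.3f" % pss_model)       # roughly 0.52
print("always cloud: %.3f" % pss_always_cloud)
print("always clear: %.3f" % pss_always_clear)
```

Both constant forecasts score exactly zero, which is the behaviour Requirement 1 asks for.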

26
Some reportedly equitable measures
HSS = [x - E(x)] / [n - E(x)], where x = a + d
ETS = [a - E(a)] / [a + b + c - E(a)]
E(a) = (a + b)(a + c)/n is the expected value of a
for an unbiased random forecasting system
LOR = ln[ad/(bc)]
ORSS = [ad/(bc) - 1] / [ad/(bc) + 1]
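
The four measures on this slide can be written out as below. This is a sketch under one stated assumption: E(x) is taken to be the number of correct forecasts expected by chance, E(a) + E(d), which is the usual Heidke definition but is not spelled out on the slide. The counts are the Murgtal values from the contingency-table slide.

```python
import math


def expected_hits(a, b, c, d):
    """E(a): hits expected from an unbiased random forecasting system."""
    n = a + b + c + d
    return (a + b) * (a + c) / n


def heidke_skill_score(a, b, c, d):
    n = a + b + c + d
    x = a + d                                            # correct forecasts
    e_x = ((a + b) * (a + c) + (c + d) * (b + d)) / n    # assumed form of E(x)
    return (x - e_x) / (n - e_x)


def equitable_threat_score(a, b, c, d):
    e_a = expected_hits(a, b, c, d)
    return (a - e_a) / (a + b + c - e_a)


def log_odds_ratio(a, b, c, d):
    return math.log((a * d) / (b * c))


def odds_ratio_skill_score(a, b, c, d):
    theta = (a * d) / (b * c)                            # odds ratio
    return (theta - 1) / (theta + 1)


murgtal = (7194, 4098, 4502, 41062)
for score in (heidke_skill_score, equitable_threat_score,
              log_odds_ratio, odds_ratio_skill_score):
    print("%-24s %.3f" % (score.__name__, score(*murgtal)))
```

Note that LOR and ORSS are undefined when b or c is zero, so a real implementation would need to guard those cases.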
27
Skill versus cloud-fraction threshold
  • Consider 7 models evaluated over 3 European sites
    in 2003-2004

LOR
HSS
28
Extreme dependency score
  • Stephenson et al. (2008) explained this behavior
  • Almost all scores have a meaningless limit as
    base rate p → 0
  • HSS tends to zero and LOR tends to infinity
  • They proposed the Extreme Dependency Score
  • where n = a + b + c + d
  • It can be shown that this score tends to a
    meaningful limit
  • Rewrite in terms of hit rate H = a/(a + c) and base
    rate p = (a + c)/n
  • Then assume a power-law dependence of H on p as p
    → 0
  • In the limit p → 0 we find
  • This is useful because random forecasts have hit
    rate converging to zero at the same rate as base
    rate (power-law exponent δ = 1), so EDS = 0
  • Perfect forecasts have constant hit rate with
    base rate (δ = 0), so EDS = 1
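
The equations on this slide did not survive in the transcript. The following is a reconstruction under stated assumptions: the EDS definition is the one usually attributed to Stephenson et al. (2008), and the constant \kappa in the power law is a symbol introduced here for illustration.

```latex
% Definition (as commonly given for Stephenson et al. 2008)
\mathrm{EDS} = \frac{2\ln\left[(a+c)/n\right]}{\ln\left(a/n\right)} - 1

% Rewrite with H = a/(a+c) and p = (a+c)/n, so that a/n = Hp
\mathrm{EDS} = \frac{2\ln p}{\ln(Hp)} - 1

% Assume H = \kappa\, p^{\delta} as p \to 0, so \ln(Hp) = \ln\kappa + (1+\delta)\ln p
\lim_{p \to 0} \mathrm{EDS} = \frac{2}{1+\delta} - 1 = \frac{1-\delta}{1+\delta}
```

With \delta = 1 the limit is 0 and with \delta = 0 it is 1, which agrees with the two limiting cases stated in the bullets.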

29
Symmetric extreme dependency score
  • EDS problems
  • Easy to hedge (unless calibrated)
  • Not equitable
  • Solved by defining a symmetric version
  • All the benefits of EDS, none of the drawbacks!

Hogan, O'Connor and Illingworth (2009 QJRMS)
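
The defining formulas are missing from the transcript, so the sketch below uses the forms usually quoted for EDS (Stephenson et al. 2008) and SEDS (Hogan et al. 2009); treat them as assumptions rather than a copy of the slide. The three small tables (perfect, random, always-cloud) are made-up examples with n = 100 and base rate 0.2.

```python
import math


def eds(a, b, c, d):
    """Extreme Dependency Score (assumed form): 2 ln[(a+c)/n] / ln(a/n) - 1."""
    n = a + b + c + d
    return 2 * math.log((a + c) / n) / math.log(a / n) - 1


def seds(a, b, c, d):
    """Symmetric EDS (assumed form): {ln[(a+b)/n] + ln[(a+c)/n]} / ln(a/n) - 1."""
    n = a + b + c + d
    return (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1


perfect = (20, 0, 0, 80)        # every cloud event forecast, no false alarms
random_fc = (4, 16, 16, 64)     # a = (a+b)(a+c)/n, i.e. no skill
always_cloud = (20, 80, 0, 0)   # hedged: cloud forecast every time

for name, table in [("perfect", perfect), ("random", random_fc),
                    ("always cloud", always_cloud)]:
    print("%-13s EDS = %5.2f  SEDS = %5.2f" % (name, eds(*table), seds(*table)))
```

In this example the always-cloud forecast gets a perfect EDS but a SEDS of zero, which illustrates why the symmetric version is harder to hedge.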
30
Skill versus cloud-fraction threshold
SEDS
LOR
HSS
SEDS has much flatter behaviour for all models
(except for Met Office which underestimates high
cloud occurrence significantly)
31
Skill versus height
  • Most scores not reliable near the tropopause
    because cloud fraction tends to zero

LBSS
EDS
LOR
HSS
32
A surprise?
  • Is mid-level cloud well forecast???
  • Frequency of occurrence of these clouds is
    commonly too low (e.g. from Cloudnet; Illingworth
    et al. 2007)
  • Specification of cloud phase cited as a problem
  • Higher skill could be because large-scale ascent
    has largest amplitude here, so cloud response to
    large-scale dynamics most clear at mid levels
  • Higher skill for Met Office models (global and
    mesoscale) because they have the arguably most
    sophisticated microphysics, with separate liquid
    and ice water content (Wilson and Ballard 1999)?
  • Low skill for boundary-layer cloud is not a
    surprise!
  • Well known problem for forecasting (Martin et al.
    2000)
  • Occurrence and height a subtle function of
    subsidence rate, stability, free-troposphere
    humidity, surface fluxes, entrainment rate...

33
Key properties for estimating ½ life
  • We wish to model the score S versus forecast lead
    time t as
  • where t1/2 is forecast half-life
  • We need linearity
  • Some measures saturate at high skill end
    (e.g. Yule's Q / ORSS)
  • Leads to misleadingly long half-life
  • ...and equitability
  • The formula above assumes that score tends to
    zero for very long forecasts, which will only
    occur if the measure is equitable
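
The model equation is missing from the transcript. A form consistent with the wording (an inverse exponential with initial score S0 and half-life t1/2) is S(t) = S0 * 2^(-t / t1/2); this is an assumption. Under that assumption, a minimal fit is a straight line through log2(S) against lead time, as sketched below with synthetic scores.

```python
import math


def fit_half_life(lead_times, scores):
    """Least-squares line through log2(score) vs lead time; returns (S0, t_half).

    Assumes S(t) = S0 * 2**(-t / t_half) and strictly positive scores.
    """
    ys = [math.log2(s) for s in scores]
    n = len(ys)
    mean_t = sum(lead_times) / n
    mean_y = sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in lead_times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(lead_times, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return 2 ** intercept, -1.0 / slope


# Synthetic example (not real forecast scores): S0 = 0.6, half-life = 72 hours.
true_s0, true_half_life = 0.6, 72.0
lead_times = [0, 12, 24, 36, 48]
scores = [true_s0 * 2 ** (-t / true_half_life) for t in lead_times]

s0, t_half = fit_half_life(lead_times, scores)
print("S0 = %.2f, half-life = %.1f h" % (s0, t_half))
```

The log transform only makes sense if the score decays towards zero, which is why the slide ties this fit to equitability.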

34
Which measures are equitable?
  • Expected values of a, b, c, d for a random
    forecasting system may score zero
  • S[E(a), E(b), E(c), E(d)] = 0
  • But expected score may not be zero!
  • E[S(a,b,c,d)] = Σ P(a,b,c,d) S(a,b,c,d)
  • Width of random probability distribution
    decreases for larger sample size n
  • A measure is only equitable if positive and
    negative scores cancel
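
The expected score can be computed exactly for small n by summing over every possible contingency table. The sketch below assumes a random forecasting system in which observation and forecast are independent, so (a, b, c, d) follows a multinomial distribution; q (the forecast frequency) and the convention of scoring undefined cases as zero are assumptions made here.

```python
from math import factorial


def pss(a, b, c, d):
    den = (a + c) * (b + d)
    return (a * d - b * c) / den if den else 0.0     # undefined -> 0 (assumption)


def ets(a, b, c, d):
    n = a + b + c + d
    chance = (a + b) * (a + c)                        # equals n * E(a)
    den = n * (a + b + c) - chance
    return (n * a - chance) / den if den else 0.0     # undefined -> 0 (assumption)


def expected_score(score, n, p, q):
    """E[S] = sum of P(a,b,c,d) * S(a,b,c,d) over all tables with a+b+c+d = n."""
    pa, pb, pc, pd = p * q, (1 - p) * q, p * (1 - q), (1 - p) * (1 - q)
    total = 0.0
    for a in range(n + 1):
        for b in range(n + 1 - a):
            for c in range(n + 1 - a - b):
                d = n - a - b - c
                ways = factorial(n) // (factorial(a) * factorial(b)
                                        * factorial(c) * factorial(d))
                prob = ways * pa ** a * pb ** b * pc ** c * pd ** d
                total += prob * score(a, b, c, d)
    return total


e_pss = expected_score(pss, 20, 0.5, 0.5)
e_ets = expected_score(ets, 20, 0.5, 0.5)
print("n = 20: E[PSS] = %.4f, E[ETS] = %.4f" % (e_pss, e_ets))
```

For PSS the positive and negative scores cancel, whereas for ETS they do not at finite n.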

35
Asymptotic equitability
  • Consider first unbiased forecasts of events that
    occur with probability p = ½

36
What about rarer events?
  • Equitable Threat Score still virtually
    equitable for n > 30
  • ORSS, EDS and SEDS approach zero much more slowly
    with n
  • For events that occur 2% of the time (e.g.
    Finley's tornado forecasts), need n > 25,000
    before magnitude of expected score is less than
    0.01
  • But these measures are supposed to be useful for
    rare events!

37
Possible solutions
  • Ensure n is large enough that E(a) > 10
  • Inequitable scores can be scaled to make them
    equitable
  • This opens the way to a new class of non-linear
    equitable measures

38
What is the origin of the term ETS?
  • First use of Equitable Threat Score: Mesinger &
    Black (1992)
  • A modification of the Threat Score, a/(a + b + c)
  • They cited Gandin and Murphy's equitability
    requirement that constant forecasts score zero
    (which ETS does), although it doesn't satisfy the
    requirement that non-constant random forecasts
    have expected score 0
  • ETS now one of most widely used verification
    measures in meteorology
  • An example of rediscovery
  • Gilbert (1884) discussed a/(a + b + c) as a possible
    verification measure in the context of Finley's
    (1884) tornado forecasts
  • Gilbert noted deficiencies of this and also
    proposed exactly the same formula as ETS, 108
    years before!
  • Suggest that ETS is referred to as the Gilbert
    Skill Score (GSS)
  • Or use the Heidke Skill Score, which is
    unconditionally equitable and is uniquely related
    to ETS: ETS = HSS / (2 - HSS)

Hogan, Ferro, Jolliffe and Stephenson (WAF, in
press)
39
Properties of various measures
Measure Equitable Useful for rare events Linear
Peirce Skill Score, PSS Heidke Skill Score, HSS Y N Y
Equitably Transformed SEDS Y Y
Symmetric Extreme Dependency Score, SEDS Y
Log of Odds Ratio, LOR
Odds Ratio Skill Score, ORSS (also known as Yule's Q) N
Gilbert Skill Score, GSS (formerly ETS) N N
Extreme Dependency Score, EDS N Y
Hit rate, H False alarm rate, FAR N N Y
Critical Success Index, CSI N N N
  • Truly equitable
  • Asymptotically equitable
  • Not equitable

40
Skill versus lead time
2004
2007
  • Only possible for UK Met Office 12-km model and
    German DWD 7-km model
  • Steady decrease of skill with lead time
  • Both models appear to improve between 2004 and
    2007
  • Generally, UK model best over UK, German best
    over Germany
  • An exception is Murgtal in 2007 (Met Office model
    wins)

41
Forecast half life
2004
2007
  • Fit an inverse-exponential
  • S0 is the initial score and t1/2 is the half-life
  • Noticeably longer half-life fitted after 36 hours
  • Same thing found for Met Office rainfall forecast
    (Roberts 2008)
  • First timescale due to data assimilation and
    convective events
  • Second due to more predictable large-scale
    weather systems

42
Why is half-life less for clouds than pressure?
  • Different spatial scales? Convection?
  • Average temporally before calculating skill
    scores
  • Absolute score and half-life increase with number
    of hours averaged

43
Statistics from AMF
  • Murgtal, Germany, 2007
  • 140-day comparison with Met Office 12-km model
  • Dataset released to the COPS community
  • Includes German DWD model at multiple resolutions
    and forecast lead times

44
Alternative approach
  • How valid is it to estimate 3D cloud fraction
    from 2D slice?
  • Henderson and Pincus (2009) imply that it is
    reasonable, although presumably not in convective
    conditions
  • Alternative: treat cloud fraction as a
    probability forecast
  • Each time the model forecasts a particular cloud
    fraction, calculate the fraction of time that
    cloud was observed instantaneously over the site
  • Leads to a Reliability Diagram
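
The binning behind a reliability diagram can be sketched as follows. This is an illustration only: the function name, the number of bins, and the sample data are assumptions, and the observed value is taken to be a simple yes/no for cloud over the site at each time.

```python
def reliability_curve(forecast_cf, cloud_observed, n_bins=5):
    """Per forecast cloud-fraction bin: (mean forecast, observed frequency, count)."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for f, o in zip(forecast_cf, cloud_observed):
        k = min(int(f * n_bins), n_bins - 1)   # a forecast of 1.0 goes in the top bin
        sums[k] += f
        hits[k] += 1 if o else 0
        counts[k] += 1
    return [(sums[k] / counts[k], hits[k] / counts[k], counts[k])
            for k in range(n_bins) if counts[k]]


# Made-up sample: model cloud fraction and whether cloud was seen over the site.
forecast_cf = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9]
cloud_observed = [False, False, False, True, True, True]

curve = reliability_curve(forecast_cf, cloud_observed)
for mean_forecast, observed_freq, count in curve:
    print("forecast %.2f -> observed %.2f (%d cases)"
          % (mean_forecast, observed_freq, count))
```

A perfectly reliable forecast would put every point on the 1:1 line, where the observed frequency equals the forecast cloud fraction.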

Reference lines on the diagram: Perfect, No resolution, No skill
Jakob et al. (2004)