Robin Hogan - PowerPoint PPT Presentation

About This Presentation
Title:

Robin Hogan

Description:

Most model evaluations of clouds test the cloud climatology. What about individual forecasts? ... Continuous evaluation of the climatology of clouds in models ... – PowerPoint PPT presentation

Slides: 51
Provided by: robin107

Transcript and Presenter's Notes



1
Verifying cloud forecasts
What is the half-life of a cloud forecast?
Is the Equitable Threat Score really equitable?
  • Robin Hogan
  • Ewan O'Connor, Anthony Illingworth
  • University of Reading, UK
  • Chris Ferro, Ian Jolliffe, David Stephenson
  • University of Exeter, UK

2
How skillful is a forecast?
ECMWF 500-hPa geopotential anomaly correlation
  • Most model evaluations of clouds test the cloud
    climatology
  • What about individual forecasts?
  • Standard measure shows ECMWF forecast half-life
    of 6 days in 1980 and 9 days in 2000
  • But virtually insensitive to clouds!

3
Overview
  • The Cloudnet processing of ground-based radar
    and lidar observations
  • Continuous evaluation of the climatology of
    clouds in models
  • Evaluation of the diurnal cycle of boundary-layer
    clouds
  • Desirable properties of verification measures
    (skill scores)
  • Usefulness for rare events: the Symmetric Extreme
    Dependency Score
  • Equitability: is the Equitable Threat Score
    equitable?
  • Testing the skill of cloud forecasts from seven
    models
  • Skill versus cloud fraction, height, scale,
    forecast lead time, season...
  • Estimating the forecast half-life
  • Testing the skill of cloud forecasts from space
  • Evaluation of ECMWF model with ICESat/GLAS lidar
  • Most results are taken from these papers:
  • Hogan, O'Connor and Illingworth (QJ 2009)
  • Hogan, Ferro, Jolliffe and Stephenson (WAF, in
    press)

4
Project
  • Aim to retrieve and evaluate the crucial cloud
    variables in forecast and climate models
  • 8 models: global, mesoscale and high-resolution
    forecast models
  • Variables: cloud fraction, LWC, IWC, plus a
    number of others
  • Sites: 4 across Europe plus worldwide ARM sites
  • Period: several years to avoid unrepresentative
    case studies
  • Current status:
  • Funded by US Department of Energy Climate Change
    Prediction Program to apply to ARM data worldwide

5
Level 1b
  • Minimum instrument requirements at each site
  • Cloud radar, lidar, microwave radiometer, rain
    gauge, model or sondes
(Figure panels: radar; lidar)

6
Level 1c
  • Instrument Synergy product
  • Example of target classification and data quality
    fields

(Figure legend: ice; liquid; rain; aerosol)
7
Level 2a/2b
  • Cloud products on the observational grid (L2a)
    and on the model grid (L2b)
  • Water content and cloud fraction

(Figure panels: L2a IWC on radar/lidar grid; L2b cloud fraction on model grid)
8
Cloud fraction
(Figure panels: Chilbolton observations; Met Office Mesoscale Model; ECMWF Global Model; Meteo-France ARPEGE Model; KNMI RACMO Model; Swedish RCA Model)
9
Cloud fraction in 7 models
  • Mean PDF for 2004 for Chilbolton, Paris and
    Cabauw

(Figure: 0-7 km)
Illingworth et al. (BAMS 2007)
10
Diurnal cycle composite of clouds
Radar and lidar provide cloud boundaries and
cloud properties above the site
  • Barrett, Hogan and O'Connor (GRL 2009)

11
Joint PDFs of cloud fraction
  • Raw (1 hr) resolution
  • 1 year from Murgtal
  • DWD COSMO model

12
Contingency tables
DWD model, Murgtal:
                    Observed cloud         Observed clear-sky
  Model cloud       a = 7194 (cloud hit)   b = 4098 (false alarm)
  Model clear-sky   c = 4502 (miss)        d = 41062 (clear-sky hit)

For a given set of observed events, there are only 2 degrees
of freedom in all possible forecasts (e.g. a and b), because
2 quantities are fixed:
  • The number of cases, $n = a + b + c + d$
  • The base rate (observed frequency of occurrence),
    $p = (a + c)/n$
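The two fixed quantities can be checked with a short Python sketch using the DWD/Murgtal counts above. The helper names `table_summary` and `complete_table` are illustrative, not from the original analysis. Once n and the number of observed events a + c are fixed, choosing a and b determines c and d.

```python
def table_summary(a: int, b: int, c: int, d: int) -> tuple[int, float]:
    """Return the number of cases n and the base rate p of a 2x2 table."""
    n = a + b + c + d        # every forecast/observation pair falls in one cell
    p = (a + c) / n          # observed frequency of cloud occurrence
    return n, p


def complete_table(a: int, b: int, n: int, n_events: int) -> tuple[int, int, int, int]:
    """Fill in c and d from the 2 free choices (a, b) and the 2 fixed quantities."""
    c = n_events - a         # observed events that were not hits are misses
    d = n - a - b - c        # everything left over is a clear-sky hit
    return a, b, c, d


n, p = table_summary(7194, 4098, 4502, 41062)   # DWD model, Murgtal
print(n, round(p, 3))
print(complete_table(7194, 4098, n, 7194 + 4502))
```

This gives n = 56856 and p ≈ 0.206, and rebuilds the original table from a, b, n and a + c alone.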
13
Skill-Bias diagrams
(Figure: reality (n = 16, p = 1/4) compared with example forecasts)
14
5 desirable properties of verification measures
  • Equitable: all random forecasts receive an
    expected score of zero
  • Constant forecasts of occurrence or
    non-occurrence also score zero
  • Note that forecasting the right cloud climatology
    versus height but with no other skill should also
    score zero
  • Difficult to hedge
  • Some measures reward under- or over-prediction
  • Useful for rare events
  • Almost all measures are degenerate in that they
    asymptote to 0 or 1 for vanishingly rare events
  • Dependence on the full joint PDF, not just the 2x2
    contingency table
  • Difference between cloud fraction of 0.9 and 1 is
    as important for radiation as a difference
    between 0 and 0.1
  • Difficult to achieve with other desirable
    properties; won't be studied much today...
  • Linear, so that an inverse exponential can be
    fitted to estimate the half-life
  • Some measures (e.g. Odds Ratio Skill Score) are
    very non-linear

15
Hedging: "Issuing a forecast that differs from
your true belief in order to improve your score"
(e.g. Jolliffe 2008)
  • Hit rate $H = a/(a + c)$
  • Fraction of events correctly forecast
  • Easily hedged by randomly changing some forecasts
    of non-occurrence to occurrence
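The hedge can be made concrete with expected counts: if a fraction q of the "no cloud" forecasts are switched at random to "cloud", a fraction q of the misses become hits and a fraction q of the clear-sky hits become false alarms. This Python sketch reuses the DWD/Murgtal counts for illustration; `hedge` is a hypothetical helper.

```python
def hedge(a: float, b: float, c: float, d: float, q: float) -> tuple[float, float]:
    """Expected hit rate H and false alarm rate F after randomly switching
    a fraction q of non-occurrence forecasts to occurrence."""
    a_h = a + q * c          # misses that become hits
    b_h = b + q * d          # clear-sky hits that become false alarms
    hit_rate = a_h / (a + c)             # observed events a + c are unchanged
    false_alarm_rate = b_h / (b + d)     # observed non-events b + d are unchanged
    return hit_rate, false_alarm_rate


for q in (0.0, 0.25, 0.5, 1.0):
    h, f = hedge(7194, 4098, 4502, 41062, q)
    print(f"q = {q:.2f}   H = {h:.3f}   F = {f:.3f}")
```

H rises to 1 as q approaches 1 although no skill has been added; only a measure that also looks at the false alarm rate registers the cost.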

16
Equitability
  • Defined by Gandin and Murphy (1992)
  • Requirement 1: An equitable verification measure
    awards all random forecasting systems, including
    those that always forecast the same value, the
    same expected score
  • Inequitable measures rank some random forecasts
    above skillful ones
  • Requirement 2: An equitable verification measure
    S must be expressible as the linear weighted sum
    of the elements of the contingency table, i.e.
    $S = (S_a a + S_b b + S_c c + S_d d)/n$
  • This can safely be discarded: it is incompatible
    with other desirable properties, e.g. usefulness
    for rare events
  • Gandin and Murphy reported that only the Peirce
    Skill Score and linear transforms of it are
    equitable by their requirements
  • PSS = Hit Rate minus False Alarm Rate =
    $a/(a + c) - b/(b + d)$
  • What about all the other measures reported to be
    equitable?
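Because the observations fix a + c and b + d, the Peirce Skill Score is linear in a and b, so evaluating it at the expected counts of a random forecasting system gives its expected score. A minimal Python check, assuming the random system issues "cloud" with the same frequency (a + b)/n as the model; the helper names are illustrative.

```python
def pss(a: float, b: float, c: float, d: float) -> float:
    """Peirce Skill Score: hit rate minus false alarm rate."""
    return a / (a + c) - b / (b + d)


def expected_random_table(a, b, c, d):
    """Expected counts when forecasts are issued at random with frequency
    (a + b)/n, independently of the observations."""
    n = a + b + c + d
    return ((a + b) * (a + c) / n, (a + b) * (b + d) / n,
            (c + d) * (a + c) / n, (c + d) * (b + d) / n)


table = (7194, 4098, 4502, 41062)    # DWD model, Murgtal
print(f"PSS = {pss(*table):.3f}")
print(f"|PSS| of expected random table = {abs(pss(*expected_random_table(*table))):.1e}")
```

The first line gives PSS ≈ 0.524 for the DWD model; the second is zero to rounding error.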

17
Some reportedly equitable measures
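$\mathrm{HSS} = \frac{x - E(x)}{n - E(x)}$, where $x = a + d$
$\mathrm{ETS} = \frac{a - E(a)}{a + b + c - E(a)}$
where $E(a) = (a + b)(a + c)/n$ is the expected value of a
for an unbiased random forecasting system
$\mathrm{LOR} = \ln[ad/(bc)]$
$\mathrm{ORSS} = \frac{ad/(bc) - 1}{ad/(bc) + 1}$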
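The four formulas transcribe directly into Python; this sketch evaluates them on the DWD/Murgtal contingency table for illustration and is not code from the study. The final checks use two algebraic identities: ETS = HSS/(2 - HSS) and ORSS = tanh(LOR/2).

```python
import math


def expected_a(a, b, c, d):
    """E(a) = (a + b)(a + c)/n, the number of hits expected by chance."""
    return (a + b) * (a + c) / (a + b + c + d)


def hss(a, b, c, d):
    n = a + b + c + d
    x = a + d                                             # correct forecasts
    e_x = expected_a(a, b, c, d) + (c + d) * (b + d) / n  # E(x) = E(a) + E(d)
    return (x - e_x) / (n - e_x)


def ets(a, b, c, d):
    e_a = expected_a(a, b, c, d)
    return (a - e_a) / (a + b + c - e_a)


def lor(a, b, c, d):
    return math.log(a * d / (b * c))                      # log of the odds ratio


def orss(a, b, c, d):
    theta = a * d / (b * c)                               # odds ratio
    return (theta - 1) / (theta + 1)


table = (7194, 4098, 4502, 41062)   # DWD model, Murgtal
for name, score in (("HSS", hss), ("ETS", ets), ("LOR", lor), ("ORSS", orss)):
    print(f"{name:4s} {score(*table):.3f}")

# Consistency checks between the measures
h = hss(*table)
assert abs(ets(*table) - h / (2 - h)) < 1e-12
assert abs(orss(*table) - math.tanh(lor(*table) / 2)) < 1e-12
```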
18
Skill versus cloud-fraction threshold
  • Consider 7 models evaluated over 3 European sites
    in 2003-2004

(Figure panels: LOR; HSS)
19
Extreme dependency score
  • Stephenson et al. (2008) explained this behavior
  • Almost all scores have a meaningless limit as
    base rate $p \to 0$
  • HSS tends to zero and LOR tends to infinity
  • They proposed the Extreme Dependency Score:
    $\mathrm{EDS} = \frac{2\ln[(a + c)/n]}{\ln(a/n)} - 1$
  • where $n = a + b + c + d$
  • It can be shown that this score tends to a
    meaningful limit
  • Rewrite in terms of hit rate $H = a/(a + c)$ and base
    rate $p = (a + c)/n$:
    $\mathrm{EDS} = \frac{2\ln p}{\ln p + \ln H} - 1$
  • Then assume a power-law dependence of H on p as
    $p \to 0$: $H \propto p^{\delta}$
  • In the limit $p \to 0$ we find
    $\mathrm{EDS} = \frac{1 - \delta}{1 + \delta}$
  • This is useful because random forecasts have a hit
    rate converging to zero at the same rate as the base
    rate: $\delta = 1$, so EDS = 0
  • Perfect forecasts have a constant hit rate with
    base rate: $\delta = 0$, so EDS = 1
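The limiting behaviour can be checked numerically. This Python sketch assumes, for illustration, an exact power law $H = p^{\delta}$ with unit prefactor, in which case EDS equals $(1 - \delta)/(1 + \delta)$ at every base rate rather than only in the limit; $\delta = 1$ mimics a random forecast and $\delta = 0$ a perfect one. `hits_for` is a hypothetical helper.

```python
import math


def eds(a: float, c: float, n: float) -> float:
    """Extreme Dependency Score: 2 ln[(a + c)/n] / ln(a/n) - 1."""
    return 2 * math.log((a + c) / n) / math.log(a / n) - 1


def hits_for(n: float, p: float, delta: float) -> float:
    """Hits a = n * p * H when the hit rate follows H = p**delta (assumed)."""
    return n * p * p ** delta


n = 1_000_000
for delta in (1.0, 0.5, 0.0):          # random, intermediate, perfect
    for p in (0.1, 0.01, 0.001):
        a = hits_for(n, p, delta)
        c = n * p - a                   # misses: observed events that were not hit
        print(f"delta={delta:.1f} p={p:<6} EDS={eds(a, c, n):.3f}"
              f"  limit={(1 - delta) / (1 + delta):.3f}")
```

With a real forecasting system the prefactor is not 1, so EDS only approaches the limit as p becomes small.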

20
Symmetric extreme dependency score
  • EDS problems
  • Easy to hedge (unless calibrated)
  • Not equitable
  • Solved by defining a symmetric version
  • All the benefits of EDS, none of the drawbacks!

Hogan, O'Connor and Illingworth (2009 QJRMS)
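A Python sketch of SEDS, using the definition as we understand it from Hogan, O'Connor and Illingworth (2009): $\mathrm{SEDS} = \frac{\ln[(a + b)/n] + \ln[(a + c)/n]}{\ln(a/n)} - 1$. Compared with EDS, the base rate is replaced by the geometric mean of the forecast and observed frequencies, so b and c enter symmetrically; check the formula against the paper before relying on it. With the DWD/Murgtal counts, a perfect forecast scores 1 and the expected table of a random forecast with the same frequencies scores 0.

```python
import math


def seds(a: float, b: float, c: float, d: float) -> float:
    """Symmetric Extreme Dependency Score (definition assumed, see text)."""
    n = a + b + c + d
    return (math.log((a + b) / n) + math.log((a + c) / n)) / math.log(a / n) - 1


a, b, c, d = 7194, 4098, 4502, 41062          # DWD model, Murgtal
n = a + b + c + d
print(f"SEDS (DWD)     = {seds(a, b, c, d):.3f}")
print(f"SEDS (perfect) = {seds(a + c, 0, 0, b + d):.3f}")

# Expected counts of a random forecast with the same forecast frequency
e_a = (a + b) * (a + c) / n
random_table = (e_a, a + b - e_a, a + c - e_a, n - (a + b) - (a + c) + e_a)
print(f"SEDS (random)  = {seds(*random_table):.1e}")
```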
21
Skill versus cloud-fraction threshold
(Figure panels: SEDS; LOR; HSS)
SEDS has much flatter behaviour for all models
(except for the Met Office, which significantly
underestimates the occurrence of high cloud)
22
Skill versus height
  • Most scores are not reliable near the tropopause
    because cloud fraction tends to zero

(Figure panels: LBSS; EDS; LOR; HSS)
23
A surprise?
  • Is mid-level cloud well forecast???
  • Frequency of occurrence of these clouds is
    commonly too low (e.g. from Cloudnet: Illingworth
    et al. 2007)
  • Specification of cloud phase cited as a problem
  • Higher skill could be because large-scale ascent
    has largest amplitude here, so cloud response to
    large-scale dynamics most clear at mid levels
  • Higher skill for Met Office models (global and
    mesoscale) because they have arguably the most
    sophisticated microphysics, with separate liquid
    and ice water content (Wilson and Ballard 1999)?
  • Low skill for boundary-layer cloud is not a
    surprise!
  • Well-known problem for forecasting (Martin et al.
    2000)
  • Occurrence and height a subtle function of
    subsidence rate, stability, free-troposphere
    humidity, surface fluxes, entrainment rate...

24
Key properties for estimating half-life
  • We wish to model the score S versus forecast lead
    time t as $S(t) = S_0\,2^{-t/t_{1/2}}$
  • where $t_{1/2}$ is the forecast half-life
  • We need linearity
  • Some measures saturate at the high-skill end
    (e.g. Yule's Q / ORSS)
  • Leads to a misleadingly long half-life
  • ...and equitability
  • The formula above assumes that the score tends to
    zero for very long forecasts, which will only
    occur if the measure is equitable
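One simple way to estimate the half-life in $S(t) = S_0\,2^{-t/t_{1/2}}$ (with $S_0$ the initial score) is a straight-line fit of ln S against t, since $\ln S = \ln S_0 - (\ln 2/t_{1/2})\,t$. This Python sketch uses hypothetical, noise-free scores. With real scores a least-squares fit in log space weights errors differently from a non-linear fit, and it requires S > 0 at every lead time.

```python
import math


def fit_half_life(lead_times, scores):
    """Least-squares fit of ln S = ln S0 - (ln 2 / t_half) * t; returns (S0, t_half)."""
    logs = [math.log(s) for s in scores]           # requires every score > 0
    n = len(lead_times)
    t_mean = sum(lead_times) / n
    y_mean = sum(logs) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(lead_times, logs))
    var = sum((t - t_mean) ** 2 for t in lead_times)
    slope = cov / var
    intercept = y_mean - slope * t_mean
    return math.exp(intercept), -math.log(2) / slope


# Hypothetical scores generated with S0 = 0.6 and a 72-hour half-life
hours = [0, 12, 24, 36, 48]
scores = [0.6 * 2 ** (-t / 72) for t in hours]
s0, t_half = fit_half_life(hours, scores)
print(round(s0, 3), round(t_half, 1))   # 0.6 72.0
```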

25
Which measures are equitable?
  • Expected values of a, b, c and d for a random
    forecasting system may score zero:
    $S[E(a), E(b), E(c), E(d)] = 0$
  • But the expected score may not be zero!
    $E[S(a, b, c, d)] = \sum P(a, b, c, d)\,S(a, b, c, d)$
  • Width of random probability distribution
    decreases for larger sample size n
  • A measure is only equitable if positive and
    negative scores cancel
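The difference between $S[E(a), \ldots]$ and $E[S(a, b, c, d)]$ can be checked by exact enumeration. This Python sketch assumes a random forecasting system that forecasts cloud with probability q independently on each of n occasions, m of which are observed events (0 < m < n), and sums S over the binomial probabilities of a and b. With q equal to the base rate (an unbiased system), the expected ETS comes out positive and shrinks as n grows, while the expected HSS is zero to rounding error.

```python
from math import comb


def ets(a, b, c, d):
    n = a + b + c + d
    e_a = (a + b) * (a + c) / n
    return (a - e_a) / (a + b + c - e_a)


def hss(a, b, c, d):
    n = a + b + c + d
    e_x = ((a + b) * (a + c) + (c + d) * (b + d)) / n
    return (a + d - e_x) / (n - e_x)


def binom_pmf(k, n, q):
    return comb(n, k) * q**k * (1 - q) ** (n - k)


def expected_random_score(score, n, m, q):
    """Exact E[S] for random forecasts of 'cloud' with probability q,
    given m observed events among n cases (requires 0 < m < n)."""
    total = 0.0
    for a in range(m + 1):                  # hits among the m events
        p_a = binom_pmf(a, m, q)
        for b in range(n - m + 1):          # false alarms among the n - m non-events
            total += p_a * binom_pmf(b, n - m, q) * score(a, b, m - a, n - m - b)
    return total


for n in (12, 32, 100):
    m = n // 4                              # base rate 1/4
    print(f"n={n:3d}  E[ETS]={expected_random_score(ets, n, m, 0.25):+.4f}"
          f"  E[HSS]={expected_random_score(hss, n, m, 0.25):+.1e}")
```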

26
Asymptotic equitability
  • Consider first unbiased forecasts of events that
    occur with probability p = 1/2

27
What about rarer events?
  • Equitable Threat Score still virtually
    equitable for n > 30
  • ORSS, EDS and SEDS approach zero much more slowly
    with n
  • For events that occur 2% of the time (e.g.
    Finley's tornado forecasts), need n > 25,000
    before the magnitude of the expected score is less
    than 0.01
  • But these measures are supposed to be useful for
    rare events!

28
Possible solutions
  • Ensure n is large enough that E(a) > 10
  • Inequitable scores can be scaled to make them
    equitable
  • This opens the way to a new class of non-linear
    equitable measures
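One simple form of such a rescaling, assuming the perfect score is 1, is $S' = (S - E[S_\mathrm{random}])/(1 - E[S_\mathrm{random}])$. Because this is linear in S, random forecasts then average zero while perfect forecasts still score 1. In this Python sketch, $E[S_\mathrm{random}]$ for ETS is estimated by Monte Carlo for a hypothetical small sample (n = 12, base rate 1/4, unbiased random forecasts); in practice it has to be computed for the sample size and base rate actually being verified.

```python
import random


def ets(a, b, c, d):
    n = a + b + c + d
    e_a = (a + b) * (a + c) / n
    return (a - e_a) / (a + b + c - e_a)


def equitable_transform(score, expected_random, perfect=1.0):
    """Rescale so that random forecasts have expected score 0 and perfect ones 1."""
    return (score - expected_random) / (perfect - expected_random)


def sample_random_ets(n, m, q, trials, seed=1):
    """ETS of many random forecasts (cloud with probability q) against
    fixed observations containing m events among n cases."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        a = sum(rng.random() < q for _ in range(m))        # hits
        b = sum(rng.random() < q for _ in range(n - m))    # false alarms
        scores.append(ets(a, b, m - a, n - m - b))
    return scores


scores = sample_random_ets(n=12, m=3, q=0.25, trials=20_000)
e_random = sum(scores) / len(scores)            # Monte Carlo estimate of E[ETS]
rescaled = [equitable_transform(s, e_random) for s in scores]
print(f"E[ETS] for random forecasts        ~ {e_random:+.3f}")
print(f"mean rescaled ETS (random)         = {sum(rescaled) / len(rescaled):+.1e}")
print(f"rescaled ETS of a perfect forecast = {equitable_transform(1.0, e_random):.3f}")
```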

29
What is the origin of the term ETS?
  • First use of "Equitable Threat Score": Mesinger
    and Black (1992)
  • A modification of the Threat Score $a/(a + b + c)$
  • They cited Gandin and Murphy's equitability
    requirement that constant forecasts score zero
    (which ETS does), although it doesn't satisfy the
    requirement that non-constant random forecasts
    have expected score 0
  • ETS is now one of the most widely used
    verification measures in meteorology
  • An example of rediscovery
  • Gilbert (1884) discussed $a/(a + b + c)$ as a possible
    verification measure in the context of Finley's
    (1884) tornado forecasts
  • Gilbert noted deficiencies of this and also
    proposed exactly the same formula as ETS, 108
    years before!
  • Suggest that ETS is referred to as the Gilbert
    Skill Score (GSS)
  • Or use the Heidke Skill Score, which is
    unconditionally equitable and is uniquely related
    to ETS: $\mathrm{ETS} = \mathrm{HSS}/(2 - \mathrm{HSS})$

Hogan, Ferro, Jolliffe and Stephenson (WAF, in
press)
30
Properties of various measures
Measure Equitable Useful for rare events Linear
Peirce Skill Score, PSS Heidke Skill Score, HSS Y N Y
Equitably Transformed SEDS Y Y
Symmetric Extreme Dependency Score, SEDS Y
Log of Odds Ratio, LOR
Odds Ratio Skill Score, ORSS (also known as Yule's Q) N
Gilbert Skill Score, GSS (formerly ETS) N N
Extreme Dependency Score, EDS N Y
Hit rate, H False alarm rate, FAR N N Y
Critical Success Index, CSI N N N
  • Truly equitable
  • Asymptotically equitable
  • Not equitable

31
Skill versus lead time
(Figure panels: 2004; 2007)
  • Only possible for UK Met Office 12-km model and
    German DWD 7-km model
  • Steady decrease of skill with lead time
  • Both models appear to improve between 2004 and
    2007
  • Generally, UK model best over UK, German best
    over Germany
  • An exception is Murgtal in 2007 (Met Office model
    wins)

32
Forecast half life
(Figure panels: 2004; 2007)
  • Fit an inverse exponential: $S(t) = S_0\,2^{-t/t_{1/2}}$
  • $S_0$ is the initial score and $t_{1/2}$ is the half-life
  • Noticeably longer half-life fitted after 36 hours
  • Same thing found for Met Office rainfall forecast
    (Roberts 2008)
  • First timescale due to data assimilation and
    convective events
  • Second due to more predictable large-scale
    weather systems

33
Why is the half-life shorter for clouds than for pressure?
  • Different spatial scales? Convection?
  • Average temporally before calculating skill
    scores
  • Absolute score and half-life increase with number
    of hours averaged
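A minimal Python sketch of averaging before scoring, with hypothetical hourly cloud fractions in which the model gets the cloud right but one hour early. Hourly, every cloudy hour is a miss or a false alarm; after 2-hour averaging the same data give only hits. The threshold of 0.05 and the helper names are illustrative choices.

```python
def block_average(series, hours):
    """Mean over consecutive blocks of `hours` values (a partial last block is dropped)."""
    return [sum(series[i:i + hours]) / hours
            for i in range(0, len(series) - hours + 1, hours)]


def contingency(model, obs, threshold=0.05):
    """2x2 counts (a, b, c, d) for cloud fraction above `threshold`."""
    a = b = c = d = 0
    for m, o in zip(model, obs):
        if m > threshold and o > threshold:
            a += 1          # hit
        elif m > threshold:
            b += 1          # false alarm
        elif o > threshold:
            c += 1          # miss
        else:
            d += 1          # clear-sky hit
    return a, b, c, d


obs = [0.0, 0.8, 0.0, 0.8, 0.0, 0.8]     # hypothetical observed cloud fraction
model = [0.8, 0.0, 0.8, 0.0, 0.8, 0.0]   # same cloud, one hour early
print(contingency(model, obs))                                       # (0, 3, 3, 0)
print(contingency(block_average(model, 2), block_average(obs, 2)))   # (3, 0, 0, 0)
```

This is an extreme case, but it shows how timing errors that are penalized hour by hour can be forgiven after averaging, which is one way the score can increase with the number of hours averaged.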

34
(Figure panels: geopotential height anomaly; vertical velocity)
  • Cloud is noisier than geopotential height Z
    because it is separated from it by around two
    orders of differentiation:
  • Cloud ~ vertical wind ~ relative vorticity ~
    $\nabla^2$(streamfunction) ~ $\nabla^2$(pressure)
  • Suggests cloud observations should be used
    routinely to evaluate models
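A one-line spectral argument, standard rather than taken from the talk, shows why two orders of differentiation make a field noisier: the Laplacian multiplies each Fourier component by the square of its wavenumber, so a small-scale component (wavenumber $k_s$) is amplified relative to a large-scale one ($k_l$).

```latex
\psi(x) = \sum_k \hat{\psi}_k e^{ikx}
\quad\Longrightarrow\quad
\nabla^2 \psi = -\sum_k k^2 \hat{\psi}_k e^{ikx},
\qquad
\frac{|k_s^2 \hat{\psi}_{k_s}|}{|k_l^2 \hat{\psi}_{k_l}|}
= \left(\frac{k_s}{k_l}\right)^2 \frac{|\hat{\psi}_{k_s}|}{|\hat{\psi}_{k_l}|}
```

For example, a component with one-tenth the wavelength of the large-scale pattern gains a factor of 100 in relative amplitude.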

35
Satellite observations: ICESat
  • Cloud observations from the ICESat 0.5-micron
    lidar (first data Feb 2003)
  • Global coverage, but the lidar is attenuated by
    thick clouds, so direct model comparison is difficult

(Figure: lidar apparent backscatter coefficient (m-1 sr-1) versus latitude)
Optically thick liquid cloud obscures the view of any
clouds beneath
Solution: forward-model the measurements
(including attenuation) using the ECMWF variables
36
Global cloud fraction comparison
(Figure panels: ECMWF raw cloud fraction; ECMWF processed cloud fraction)
  • Results for October 2003
  • Tropical convection peaks too high
  • Too much polar cloud
  • Elsewhere agreement is good
  • Results can be ambiguous
  • An apparent low-cloud underestimate could be a
    real error, or could be due to the high cloud above
    being too thick

(Figure panel: ICESat cloud fraction)
Wilkinson, Hogan, Illingworth and Benedetti (MWR
2008)
37
Testing the model skill from space
(Figure annotation: unreliable region)
  • Clearly need to apply SEDS to cloud estimated
    from lidar and radar!

Wilkinson, Hogan, Illingworth and Benedetti (MWR
2008)
38
CCPP project
  • US Dept of Energy Climate Change Prediction
    Program recently funded 5-year consortium project
    centred at Brookhaven, NY
  • Implement updated Cloudnet processing system at
    Atmospheric Radiation Measurement (ARM)
    radar-lidar sites worldwide
  • Ingests ARM's cloud boundary diagnosis, but uses
    Cloudnet for stats
  • New diagnostics being tested
  • Testing of NWP models
  • NCEP, ECMWF, Met Office, Meteo-France...
  • Over a decade of data at several sites: have
    cloud forecasts improved over this time?
  • Single-column model testbed
  • SCM versions of many GCMs will be run over ARM
    sites by Roel Neggers
  • Different parameterization schemes tested
  • Verification measures can be used to judge
    improvements

39
US Southern Great Plains 2004
40
Winter 2004
41
Summer 2004
42
Summary and outlook
  • Model comparisons reveal
  • Half-life of a cloud forecast is between 2.5 and
    4 days, much less than 9 days for ECMWF 500-hPa
    geopotential height forecast
  • In Europe, higher skill for mid-level cloud and
    lower for boundary-layer cloud, but larger
    seasonal contrast in Southern US
  • Findings applicable to other verification
    problems
  • Symmetric Extreme Dependency Score is a
    reliable measure of skill for both common and
    rare events (given we have large enough sample)
  • Many measures regarded as equitable are only so
    for very large samples, including the Equitable
    Threat Score, but they can be rescaled
  • Future work (in addition to CCPP)
  • CloudSat and CALIPSO: what is the skill of cloud
    forecasts globally?
  • What is half-life of ECMWF cloud forecasts? (Need
    more data!)
  • Near-real-time evaluation for rapid feedback to
    NWP centres?
  • Dept of Meteorology Lunchtime Seminar, 1pm
    Tuesday 3rd Nov: "Faster and more accurate
    representation of clouds and gases in GCM
    radiation schemes"

43
(No Transcript)
44
Monthly skill versus time
  • Measure of the skill of forecasting cloud
    fraction > 0.05
  • Comparing models using similar forecast lead time
  • Compared with the persistence forecast
    (yesterday's measurements)
  • Lower skill in summer convective events

45
Statistics from AMF
  • Murgtal, Germany, 2007
  • 140-day comparison with Met Office 12-km model
  • Dataset released to the COPS community
  • Includes German DWD model at multiple resolutions
    and forecast lead times

46
Possible skill scores
Contingency table:
                      Observed cloud    Observed clear sky
  Modeled cloud       a: hit            b: false alarm
  Modeled clear sky   c: miss           d: correct negative
  • Cloud deemed to occur when cloud fraction f is
    larger than some threshold $f_\mathrm{thresh}$
  • To ensure equitability and linearity, we can use
    the concept of the generalized skill score
    $(x - x_\mathrm{random})/(x_\mathrm{perfect} - x_\mathrm{random})$
  • where x is any number derived from the joint
    PDF
  • Resulting scores vary linearly from random = 0 to
    perfect = 1
  • Simplest example: the Heidke skill score (HSS) uses
    $x = a + d$
  • We will use this as a reference to test other
    scores

                    DWD model   Perfect forecast   Random forecast
  a (hit)           7194        11696              2581
  b (false alarm)   4098        0                  8711
  c (miss)          4502        0                  9115
  d (correct neg.)  41062       45160              36449
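The generalized skill score can be evaluated directly from the three tables on this slide. This Python sketch uses x = a + d and takes the random-forecast counts exactly as given (how they were generated is not stated in the transcript); `generalized_skill` is an illustrative name.

```python
def generalized_skill(x: float, x_random: float, x_perfect: float) -> float:
    """(x - x_random) / (x_perfect - x_random): 0 for random, 1 for perfect."""
    return (x - x_random) / (x_perfect - x_random)


# x = a + d (number of correct forecasts) for each table on this slide
x_model = 7194 + 41062      # DWD model
x_perfect = 11696 + 45160   # perfect forecast
x_random = 2581 + 36449     # random forecast
print(f"HSS-type score = {generalized_skill(x_model, x_random, x_perfect):.3f}")
```

Any other quantity derived from the joint PDF can be substituted for x, provided its random and perfect reference values are computed the same way.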
  • Brier skill score uses x = mean squared
    cloud-fraction difference; Linear Brier skill
    score (LBSS) uses x = mean absolute difference
  • Sensitive to errors in model for all values of
    cloud fraction
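A Python sketch of the Linear Brier skill score in the same generalized form, with x the mean absolute cloud-fraction difference and x_perfect = 0. How the random reference is built is an assumption here: every model value is paired with every observation, which is equivalent to averaging over all time-shuffles of the model series. The data are hypothetical.

```python
def mean_abs_diff(model, obs):
    """x: mean absolute difference between paired model and observed cloud fraction."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / len(obs)


def random_mean_abs_diff(model, obs):
    """x_random (assumed): mean absolute difference over all model/observation pairings."""
    return sum(abs(m - o) for m in model for o in obs) / (len(model) * len(obs))


def lbss(model, obs):
    x = mean_abs_diff(model, obs)
    x_random = random_mean_abs_diff(model, obs)   # must be non-zero
    x_perfect = 0.0
    return (x - x_random) / (x_perfect - x_random)


obs = [0.0, 0.2, 0.9, 1.0, 0.5]          # hypothetical observed cloud fraction
model = [0.1, 0.3, 0.7, 0.9, 0.4]        # hypothetical model cloud fraction
print(f"LBSS = {lbss(model, obs):.3f}")
```

Because x uses the cloud fraction itself rather than a thresholded yes/no, an error of 0.1 counts the same whether it occurs near 0 or near 1.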

47
(No Transcript)
48
Alternative approach
  • How valid is it to estimate 3D cloud fraction
    from a 2D slice?
  • Henderson and Pincus (2009) imply that it is
    reasonable, although presumably not in convective
    conditions
  • Alternative: treat cloud fraction as a
    probability forecast
  • Each time the model forecasts a particular cloud
    fraction, calculate the fraction of time that
    cloud was observed instantaneously over the site
  • Leads to a Reliability Diagram
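A minimal Python sketch of building the points of such a reliability diagram: bin the model cloud fraction and, in each bin, compare the mean forecast value with the fraction of times cloud was observed over the site. The equal-width bins, the bin count and the data are hypothetical choices.

```python
def reliability(model_cf, obs_cloudy, n_bins=5):
    """Return (mean forecast cloud fraction, observed frequency, count) for each
    non-empty bin of forecast cloud fraction."""
    sums = [0.0] * n_bins
    hits = [0] * n_bins
    counts = [0] * n_bins
    for f, o in zip(model_cf, obs_cloudy):
        k = min(int(f * n_bins), n_bins - 1)   # f = 1.0 goes in the top bin
        sums[k] += f
        hits[k] += int(o)                      # 1 if cloud was observed over the site
        counts[k] += 1
    return [(sums[k] / counts[k], hits[k] / counts[k], counts[k])
            for k in range(n_bins) if counts[k]]


# Hypothetical model cloud fractions and instantaneous cloud observations
model_cf = [0.0, 0.1, 0.3, 0.35, 0.6, 0.7, 0.9, 1.0]
obs_cloudy = [False, False, True, False, True, False, True, True]
for mean_f, freq, count in reliability(model_cf, obs_cloudy):
    print(f"forecast {mean_f:.2f}  observed {freq:.2f}  (n = {count})")
```

Points on the diagonal (observed frequency equal to forecast cloud fraction) indicate a reliable probability forecast.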

(Figure: reliability diagram with "perfect", "no resolution" and "no skill" lines; Jakob et al. (2004))
49
ECMWF raw cloud fraction
  • Simulate lidar backscatter:
  • Create subcolumns with maximum-random overlap
  • Forward-model lidar backscatter from ECMWF water
    content and particle size
  • Remove signals below lidar sensitivity
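The subcolumn step can be sketched in Python with a simple maximum-random overlap generator; this particular formulation is our choice and not necessarily the scheme used for the ECMWF comparison. Each subcolumn carries a random number x that is kept where the layer above is cloudy (maximum overlap) and redrawn where it is clear (random overlap), and a layer is cloudy where x > 1 - C. The cloud-fraction profile is hypothetical, and the lidar forward model itself is not attempted.

```python
import random


def max_random_subcolumns(cloud_fraction, n_sub, seed=0):
    """Boolean cloud masks (one list per subcolumn, ordered top to bottom) whose
    layer-mean cloudiness matches `cloud_fraction`, with maximum overlap between
    adjacent cloudy layers and random overlap across clear gaps."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_sub):
        x = rng.random()
        mask = [x > 1 - cloud_fraction[0]]
        for k in range(1, len(cloud_fraction)):
            c_above = cloud_fraction[k - 1]
            if x <= 1 - c_above:                  # clear in the layer above:
                x = rng.random() * (1 - c_above)  # redraw -> random overlap
            # cloudy in the layer above: keep x -> maximum overlap
            mask.append(x > 1 - cloud_fraction[k])
        columns.append(mask)
    return columns


profile = [0.3, 0.3, 0.0, 0.6]                   # hypothetical layer cloud fractions
cols = max_random_subcolumns(profile, n_sub=20_000)
for k in range(len(profile)):
    frac = sum(col[k] for col in cols) / len(cols)
    print(f"layer {k}: target {profile[k]:.2f}, subcolumns {frac:.3f}")
```

The two adjacent layers with equal cloud fraction end up cloudy in exactly the same subcolumns, while the lowest layer, separated by a clear gap, overlaps them at random.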

50
Testing the model climatology