1
Backtesting Stochastic Mortality Models: An Ex-Post Evaluation of Multi-Period-Ahead Density Forecasts
Kevin Dowd (CRIS, NUBS)
Andrew J. G. Cairns (Heriot-Watt)
David Blake (Pensions Institute, Cass Business School)
Guy D. Coughlan (JPMorgan)
David Epstein (JPMorgan)
Marwa Khalaf-Allah (JPMorgan)
4th International Longevity Risk and Capital Market Solutions Conference
Amsterdam, September 2008
2
Purposes of Paper
  • To set out a framework to backtest the forecast performance of mortality models
  • Backtesting = the evaluation of forecasts against subsequently realised outcomes
  • To apply this backtesting framework to a set of mortality models
  • How well do they actually perform?

3
Background
  • This study is the fourth in a series involving a
    collaboration between Blake, Cairns and Dowd and
    the LifeMetrics team at JPMorgan
  • Involves actuaries, economists and investment
    bankers
  • Of course, it is very easy (and fun!) to attack the forecasting abilities of actuaries (remember Equitable?) and investment bankers (remember subprime, etc.?), but we should remember …

4
It's not just actuaries and investment bankers who can't forecast
5
Background
  • Cairns et al. (2007) examines the empirical fits of 8 different mortality models applied to England & Wales (EW) and US male mortality data
  • Compares model performance
  • Uses a range of qualitative criteria (e.g., biological reasonableness)
  • Uses a range of quantitative criteria (e.g., the Bayes information criterion)

6
Models considered
  • Model M1: Lee-Carter, no cohort effect
  • Model M2: Renshaw and Haberman's (2006) cohort-effect generalisation of M1
  • Model M3: Currie's age-period-cohort model
  • Model M4: P-splines model (Currie, 2004)
  • Model M5: CBD two-factor model, Cairns et al. (2006), no cohort effect
  • Models M6, M7 and M8: alternative cohort-effect generalisations of CBD

7
Second study: Cairns et al. (2008)
  • Examines the ex ante plausibility of the models' density forecasts
  • M4 (P-splines) not considered
  • Amongst other conclusions, finds that M8 (which did very well in the first study) gives very implausible forecasts for US data
  • Hence, decided to drop M8 as well
  • Thus, a model might fit past data well but still give unreliable forecasts
  • ⇒ Not enough just to look at past fits

8
Third study: Dowd et al. (2008a)
  • Examines the goodness of fit of models M1, M2B, M3B, M5, M6 and M7 more systematically
  • M2B is a special case of M2, which uses an ARIMA(1,1,0) process for the cohort effect (sketched below)
  • M3B is a special case of M3, which uses the same ARIMA(1,1,0) for the cohort effect
  • Basic idea: to unravel the models' testable implications and test them systematically
  • Finds some problems with all models, but M2B is unstable
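A minimal sketch of fitting such a cohort-effect process, assuming Python with statsmodels and an illustrative (simulated) cohort-effect series; the papers' actual estimation procedure may differ:

```python
# Sketch: fit an ARIMA(1,1,0) process to an estimated cohort-effect series,
# as M2B and M3B do. The series `gamma` below is fake, for illustration only.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(42)
gamma = np.cumsum(0.02 + 0.1 * rng.standard_normal(80))  # fake cohort effect

# AR(1) in first differences; trend="t" adds a drift term when d = 1
fit = ARIMA(gamma, order=(1, 1, 0), trend="t").fit()

# Simulate future cohort effects from the end of the sample, for forecasting
future = fit.simulate(nsimulations=25, repetitions=1000, anchor="end")
```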

9
Motivation for present study
  • A model might
  • Give a good fit to past data and
  • Generate density forecasts that appear plausible
    ex ante
  • And still produce poor forecasts
  • Hence, it is essential to test performance of
    models against subsequently realised outcomes
  • This is what backtesting is about
  • In the end, it is the forecast performance that
    really matters
  • Would you want to drive a car that hadn't been field-tested?

10
Backtesting framework
  • Choose metric of interest
  • Could choose mortality rates, survival rates, life expectancy, annuity prices, etc.
  • Select historical lookback window used to estimate model parameters
  • Select forecast horizon or lookforward window for forecasts
  • Implement tests of how well the forecasts subsequently performed
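As a concrete illustration, the framework might be coded along the following lines; `fit_model` and `forecast_density` are hypothetical stand-ins for whichever stochastic mortality model is being backtested:

```python
# Sketch of the generic backtesting loop: fit on a lookback window,
# forecast ahead, and compare against the subsequently realised outcome.
import numpy as np

def backtest(q_obs, years, lookback=10, horizon=15, n_sims=5000):
    """q_obs: realised mortality rates for one age, indexed by `years`."""
    results = []
    for i in range(lookback, len(years) - horizon):
        window = q_obs[i - lookback:i]                    # estimation data only
        params = fit_model(window)                        # hypothetical helper
        sims = forecast_density(params, horizon, n_sims)  # hypothetical helper
        realised = q_obs[i + horizon]
        pit = np.mean(sims <= realised)  # where the outcome falls in the density
        results.append((years[i], pit))
    return results
```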

11
Backtesting framework
  • We choose to focus mainly on the mortality rate as the metric
  • We choose a fixed 10-year lookback window
  • This seems to be emerging as the standard amongst practitioners
  • We examine a range of backtests (the first three designs are sketched below):
  • Over contracting horizons
  • Over expanding horizons
  • Over rolling fixed-length horizons
  • Future mortality density tests
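One way to generate the first three designs, with a sample running 1980-2005 assumed here purely for concreteness:

```python
# Illustrative (forecast origin, evaluation year) pairs for the three designs.
START, END = 1980, 2005   # assumed sample span, for illustration only

# Contracting horizon: fixed end date, progressively later forecast origins
contracting = [(t, END) for t in range(START, END)]

# Expanding horizon: fixed origin, progressively more distant end dates
expanding = [(START, t) for t in range(START + 1, END + 1)]

# Rolling fixed-length horizon: always the same number of years ahead
H = 15
rolling = [(t, t + H) for t in range(START, END - H + 1)]
```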

12
Backtesting framework
  • We consider forecasts both with and without parameter uncertainty
  • Parameter-certain (PC) case: treat parameter estimates as if they were known values
  • Parameter-uncertain (PU) case: forecast using a Bayesian approach that allows for uncertainty in the parameter estimates (sketched below)
  • Allows for uncertainty in the parameters governing period and cohort effects
  • Results indicate it is very important to allow for parameter uncertainty
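The distinction can be sketched for a random-walk-with-drift period index (the kappa process of Lee-Carter-type models); the conjugate posterior draws below indicate the flavour of the Bayesian step, not the papers' exact algorithm:

```python
# PC vs PU simulation of a random-walk-with-drift period index kappa_t.
import numpy as np

def simulate_kappa(kappa_hist, horizon, n_sims, uncertain, rng):
    d = np.diff(kappa_hist)                       # observed annual changes
    n, mu_hat, s = len(d), d.mean(), d.std(ddof=1)
    paths = np.empty((n_sims, horizon))
    for j in range(n_sims):
        if uncertain:
            # PU: draw (mu, sigma^2) from a standard normal/inverse-chi^2 posterior
            sig2 = (n - 1) * s**2 / rng.chisquare(n - 1)
            mu = rng.normal(mu_hat, np.sqrt(sig2 / n))
        else:
            # PC: treat the point estimates as if they were the known values
            mu, sig2 = mu_hat, s**2
        shocks = rng.normal(mu, np.sqrt(sig2), horizon)
        paths[j] = kappa_hist[-1] + np.cumsum(shocks)
    return paths
```

Because each PU path carries its own parameter draw, the simulated fan of forecasts is wider than in the PC case, which is the pattern seen in the charts that follow.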

13
Contracting horizon backtest, age 65
14
Contracting horizon backtest, age 75
15
Contracting horizon backtest, age 85
16
Conclusions so far
  • Big difference between PC and PU forecasts
  • PU prediction intervals usually considerably
    wider than PC ones
  • M2B sometimes unstable
  • Now consider expanding horizon predictions

17
Prediction intervals from 1980, age 65
18
Prediction intervals from 1980, age 75
19
Prediction intervals from 1980, age 85
20
Expanding PI conclusions
  • PC models have far too many lower exceedances
  • PU models have exceedances that are much closer
    to expectations
  • Especially for M1, M7 and M3B
  • Suggests that PU forecasts are more plausible
    than PC ones
  • Negligible differences between PC and PU median
    predictions
  • Very few upper exceedances
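Exceedance counts of this kind can be tallied with a short helper; the 90% interval level and the variable names below are illustrative:

```python
# Count prediction-interval exceedances and compare with expectations.
import numpy as np

def exceedance_summary(realised, lower, upper, level=0.90):
    realised, lower, upper = map(np.asarray, (realised, lower, upper))
    n = len(realised)
    return {
        "lower_exceedances": int(np.sum(realised < lower)),
        "upper_exceedances": int(np.sum(realised > upper)),
        "expected_per_tail": n * (1 - level) / 2,  # e.g. 5% of n in each tail
    }
```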

21
Expanding PI conclusions
  • Too few upper exceedances, and too many median and lower exceedances
  • ⇒ some upward bias
  • This upward bias is especially pronounced for PC forecasts
  • Evidence of upward bias is less clear-cut for PU forecasts

22
Rolling Fixed Horizon Forecasts
  • From now on, work with PU forecasts only
  • Assume an illustrative horizon of 15 years
  • Now examine the performance of each model in turn

23
Model M1
24
Model M2B
25
Model M3B
26
Model M5
27
Model M6
28
Model M7
29
Tentative conclusions so far
  • Rolling PI charts broadly consistent with earlier
    results
  • Some evidence of upward bias but not consistent
    across models or always especially compelling
  • M2B again shows instability



30
Mortality density tests
  • Choose an age (e.g., 65) and a horizon (e.g., 15 years ahead)
  • Use the model to project the pdf (or cdf) of the mortality rate 15 years ahead
  • Plot the realised mortality rate q onto the pdf/cdf
  • Obtain the associated p-value (or PIT value)
  • Reject if p is too far out in either tail
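In code, the test for a single case might look like this minimal sketch, where `sims` holds the model's simulated mortality rates at the chosen age and horizon:

```python
# PIT value and two-sided p-value for one realised mortality rate.
import numpy as np

def density_test(sims, realised, alpha=0.05):
    pit = np.mean(np.asarray(sims) <= realised)  # PIT value in [0, 1]
    p_value = 2 * min(pit, 1 - pit)              # small if q falls in either tail
    return pit, p_value, p_value < alpha         # True => reject the model
```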



31
Example: P-Values of Realised Mortality, Males Aged 65, 1980 Start, Horizon 26 Years Ahead



32
Many ways to do this
  • For h = 25 years ahead: 1 way
  • 1980-2005 only
  • For h = 24 years ahead: 2 ways
  • 1980-2004, 1981-2005
  • For h = 23 years ahead: 3 ways
  • …
  • For h = 1 year ahead: 25 ways
  • 1980-1981, 1981-1982, …, 2004-2005
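The full enumeration is mechanical; a quick check, assuming as above that outcomes are observed over 1980-2005:

```python
# Every legitimate (forecast origin, horizon) pair with outcomes in 1980-2005.
pairs = [(t0, h)
         for h in range(1, 26)               # horizons 1..25 years ahead
         for t0 in range(1980, 2006 - h)]    # origins with t0 + h <= 2005
assert len(pairs) == sum(range(1, 26))       # 25 + 24 + ... + 1 = 325 cases
```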



33
Lots of cases to consider
  • There are 25 + 24 + 23 + … + 1 = 325 separate cases to consider, each equally legitimate
  • Need some way to make use of all possibilities but consolidate results
  • We do so by computing p-values for each case and then working with the mean p-values from each test
  • These are reported below for each age, for h = 5, 10 and 15 years ahead
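A sketch of the consolidation step; `p_value_for` is a hypothetical helper wrapping the single-case density test above:

```python
# Mean p-value per horizon, averaged over all feasible forecast origins.
import numpy as np

mean_p = {}
for h in (5, 10, 15):
    # p_value_for(t0, h): hypothetical helper returning the p-value of the
    # forecast made at origin t0 for h years ahead
    p_vals = [p_value_for(t0, h) for t0 in range(1980, 2006 - h)]
    mean_p[h] = float(np.mean(p_vals))
```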



34
Age 65


35
Age 75


36
Age 85


37
Conclusions from these tests
  • All models perform well
  • No rejections at the 1% significance level
  • Only 3 rejections at the 5% significance level



38
Overall conclusions
  • Study outlines a framework for backtesting the forecasts of mortality models
  • As regards individual models and this dataset:
  • M1, M3B, M5 and M7 perform well most of the time, and there is little to choose between them
  • M2B is unstable
  • Of the Lee-Carter family of models, it is hard to choose between M1 and M3B
  • Of the CBD family, M7 seems to perform best, though there is little to choose between M5 and M7



39
Two other points stand out
  • In many but not all cases, and depending also on
    the model, there is evidence of an upward bias in
    forecasts
  • This is very pronounced for PC forecasts
  • This bias is less pronounced for PU forecasts
  • Except maybe for M2B, PU forecasts are more plausible than the PC forecasts
  • ⇒ Very important to take account of parameter uncertainty, more or less regardless of the model one uses



40
References
  • Cairns et al. (2007) "A quantitative comparison of stochastic mortality models using data from England & Wales and the United States." Pensions Institute Discussion Paper PI-0701, March.
  • Cairns et al. (2008) "The plausibility of mortality density forecasts: An analysis of six stochastic mortality models." Pensions Institute Discussion Paper PI-0801, April.
  • Dowd et al. (2008a) "Evaluating the goodness of fit of stochastic mortality models." Pensions Institute Discussion Paper PI-0802, September.
  • Dowd et al. (2008b) "Backtesting stochastic mortality models: An ex-post evaluation of multi-year-ahead density forecasts." Pensions Institute Discussion Paper PI-0803, September.
  • These papers are also available at www.lifemetrics.com

