Title: Backtesting Stochastic Mortality Models: An Ex-Post Evaluation of Multi-Period-Ahead Density Forecasts Kevin Dowd (CRIS, NUBS) Andrew J. G. Cairns (Heriot-Watt) David Blake (Pensions Institute, Cass Business School) Guy D. Coughlan (JPMorgan) David
1. Backtesting Stochastic Mortality Models: An Ex-Post Evaluation of Multi-Period-Ahead Density Forecasts
Kevin Dowd (CRIS, NUBS)
Andrew J. G. Cairns (Heriot-Watt)
David Blake (Pensions Institute, Cass Business School)
Guy D. Coughlan (JPMorgan)
David Epstein (JPMorgan)
Marwa Khalaf-Allah (JPMorgan)
4th International Longevity Risk and Capital Market Solutions Conference
Amsterdam, September 2008
2. Purposes of Paper
- To set out a framework to backtest the forecast performance of mortality models
- Backtesting = evaluation of forecasts against subsequently realised outcomes
- To apply this backtesting framework to a set of mortality models
- How well do they actually perform?
3. Background
- This study is the fourth in a series involving a collaboration between Blake, Cairns and Dowd and the LifeMetrics team at JPMorgan
- Involves actuaries, economists and investment bankers
- Of course, it is very easy (and fun!) to attack the forecasting abilities of actuaries (remember Equitable?) and investment bankers (remember subprime? etc.), but we should remember ...
4. It's not just actuaries and investment bankers who can't forecast
5. Background
- Cairns et al. (2007) examines the empirical fits of 8 different mortality models applied to E&W and US male mortality data
- Compares model performance
- Uses a range of qualitative criteria (e.g., biological reasonableness)
- Uses a range of quantitative criteria (e.g., Bayes information criterion)
6. Models considered
- Model M1: Lee-Carter, no cohort effect
- Model M2: Renshaw-Haberman's 2006 cohort-effect generalisation of M1
- Model M3: Currie's age-period-cohort model
- Model M4: P-splines model, Currie 2004
- Model M5: CBD two-factor model, Cairns et al. (2006), no cohort effect
- Models M6, M7 and M8: alternative cohort-effect generalisations of CBD
7. Second study: Cairns et al. (2008)
- Examines ex ante plausibility of models' density forecasts
- M4 (P-splines) not considered
- Amongst other conclusions, finds that M8 (which did very well in first study) gives very implausible forecasts for US data
- Hence, decided to drop M8 as well
- Thus, a model might fit past data well but still give unreliable forecasts
- ⇒ Not enough just to look at past fits
8. Third study: Dowd et al. (2008a)
- Examines the goodness of fit of models M1, M2B, M3B, M5, M6 and M7 more systematically
- M2B is a special case of M2, which uses an ARIMA(1,1,0) for the cohort effect
- M3B is a special case of M3, which uses the same ARIMA(1,1,0) for the cohort effect
- Basic idea: to unravel the models' testable implications and test them systematically
- Finds some problems with all models, but M2B is unstable
9. Motivation for present study
- A model might
- Give a good fit to past data and
- Generate density forecasts that appear plausible ex ante
- And still produce poor forecasts
- Hence, it is essential to test performance of models against subsequently realised outcomes
- This is what backtesting is about
- In the end, it is the forecast performance that really matters
- Would you want to drive a car that hadn't been field-tested?
10. Backtesting framework
- Choose metric of interest
- Could choose mortality rates, survival rates, life expectancy, annuity prices, etc.
- Select historical lookback window used to estimate model parameters
- Select forecast horizon or lookforward window for forecasts
- Implement tests of how well forecasts subsequently performed
11. Backtesting framework
- We choose to focus mainly on the mortality rate as the metric
- We choose a fixed 10-year lookback window
- This seems to be emerging as the standard amongst practitioners
- We examine a range of backtests
- Over contracting horizons
- Over expanding horizons
- Over rolling fixed-length horizons
- Future mortality density tests
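The three horizon schemes can be pictured as different ways of pairing a forecast start year with a target year. The sketch below is one reading of the slide, not the authors' code: the function names are hypothetical, and the years 1980 and 2005 are borrowed from the later slides purely for illustration.

```python
def contracting_windows(first_start, target_year):
    """Fixed target year, forecast from successively later start years,
    so the horizon shrinks as the start year approaches the target."""
    return [(start, target_year) for start in range(first_start, target_year)]


def expanding_windows(start, last_year):
    """Fixed start year, forecast over horizons of 1, 2, 3, ... years."""
    return [(start, end) for end in range(start + 1, last_year + 1)]


def rolling_windows(first_start, last_year, horizon):
    """Fixed horizon length, with the start year rolled forward one year at a time."""
    return [(start, start + horizon)
            for start in range(first_start, last_year - horizon + 1)]


# Illustrative years only (assumed, not taken from the backtests themselves).
contracting = contracting_windows(1980, 2005)
expanding = expanding_windows(1980, 2005)
rolling = rolling_windows(1980, 2005, horizon=15)
```

Each function returns a list of (start year, target year) pairs; a model would be re-estimated on the lookback window ending at each start year before forecasting to the target year.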
12. Backtesting framework
- We consider forecasts both with and without parameter uncertainty
- Parameter-certain (PC) case: treat estimates of parameters as if known values
- Parameter-uncertain (PU) case: forecast using a Bayesian approach that allows for uncertainty in parameter estimates
- Allows for uncertainty in parameters governing period and cohort effects
- Results indicate it is very important to allow for parameter uncertainty
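To see why allowing for parameter uncertainty can widen prediction intervals, consider a toy example. This is a deliberately simplified sketch and not the Bayesian approach used in the study: it assumes a random walk with drift as a stand-in for a model's period effect, treats only the drift as uncertain, and uses made-up data. The names `interval_widths` and `log_rates` are hypothetical.

```python
import numpy as np


def interval_widths(history, horizon, n_sims=20000, seed=0):
    """Return (PC width, PU width) of a 90% interval for the value `horizon` steps ahead."""
    rng = np.random.default_rng(seed)
    history = np.asarray(history, dtype=float)
    diffs = np.diff(history)
    n = diffs.size
    mu_hat = diffs.mean()            # estimated drift
    sigma_hat = diffs.std(ddof=1)    # estimated volatility of annual changes

    # PC case: the drift is treated as known and equal to its estimate.
    shocks = rng.normal(0.0, sigma_hat, size=(n_sims, horizon))
    pc_end = history[-1] + horizon * mu_hat + shocks.sum(axis=1)

    # PU case (assumption): the drift is drawn from an approximate sampling
    # distribution N(mu_hat, sigma_hat^2 / n); volatility is still held fixed.
    mu_draws = rng.normal(mu_hat, sigma_hat / np.sqrt(n), size=n_sims)
    shocks = rng.normal(0.0, sigma_hat, size=(n_sims, horizon))
    pu_end = history[-1] + horizon * mu_draws + shocks.sum(axis=1)

    pc_lo, pc_hi = np.percentile(pc_end, [5, 95])
    pu_lo, pu_hi = np.percentile(pu_end, [5, 95])
    return pc_hi - pc_lo, pu_hi - pu_lo


# Synthetic 10-year history of log mortality rates (made-up numbers).
log_rates = [-4.00, -4.03, -4.04, -4.08, -4.09, -4.12, -4.15, -4.16, -4.20, -4.21]
pc_width, pu_width = interval_widths(log_rates, horizon=15)
```

In this toy setting the PC forecast variance grows linearly with the horizon, while the PU variance gains an extra term that grows with the square of the horizon, so the gap between the two widens for longer-range forecasts estimated from a short lookback window.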
13. Contracting horizon BT: age 65
14. Contracting horizon BT: age 75
15. Contracting horizon BT: age 85
16. Conclusions so far
- Big difference between PC and PU forecasts
- PU prediction intervals usually considerably wider than PC ones
- M2B sometimes unstable
- Now consider expanding horizon predictions
17. Prediction intervals from 1980: age 65
18. Prediction intervals from 1980: age 75
19. Prediction intervals from 1980: age 85
20. Expanding PI conclusions
- PC models have far too many lower exceedances
- PU models have exceedances that are much closer to expectations
- Especially for M1, M7 and M3B
- Suggests that PU forecasts are more plausible than PC ones
- Negligible differences between PC and PU median predictions
- Very few upper exceedances
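Exceedance counts of this kind can be tallied by comparing realised rates with the forecast bounds. The sketch below is illustrative only: it assumes a 90% prediction interval (so 5% expected in each tail) and takes a "median exceedance" to mean a realised value below the median forecast. Both are assumptions, and the function names are hypothetical.

```python
import numpy as np


def count_exceedances(realised, lower, median, upper):
    """Count realised values falling below the lower bound, below the median,
    and above the upper bound of the forecast interval."""
    realised = np.asarray(realised, dtype=float)
    return {
        "lower": int(np.sum(realised < np.asarray(lower, dtype=float))),
        "median": int(np.sum(realised < np.asarray(median, dtype=float))),
        "upper": int(np.sum(realised > np.asarray(upper, dtype=float))),
    }


def expected_exceedances(n_obs, coverage=0.90):
    """Expected counts if the forecasts were well calibrated (coverage is assumed)."""
    tail = (1.0 - coverage) / 2.0
    return {"lower": n_obs * tail, "median": n_obs * 0.5, "upper": n_obs * tail}


# Tiny made-up example: four realised values against constant bounds.
counts = count_exceedances(
    realised=[1.0, 2.0, 3.0, 4.0],
    lower=[2.0] * 4,
    median=[3.0] * 4,
    upper=[3.5] * 4,
)
expected = expected_exceedances(20)
```

Many more lower and median exceedances than expected, combined with few upper exceedances, would point to forecasts that sit too high relative to realised mortality.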
21. Expanding PI conclusions
- Too few upper exceedances, and too many median and lower exceedances
- ⇒ some upward bias, especially for PC forecasts
- This upward bias is especially pronounced for PC forecasts
- Evidence of upward bias less clear-cut for PU forecasts
22. Rolling Fixed Horizon Forecasts
- From now on, work with PU forecasts only
- Assume illustrative horizon 15 years
- Now examine performance of each model in turn
23. Model M1
24. Model M2B
25. Model M3B
26. Model M5
27. Model M6
28. Model M7
29. Tentative conclusions so far
- Rolling PI charts broadly consistent with earlier results
- Some evidence of upward bias, but not consistent across models or always especially compelling
- M2B again shows instability
30. Mortality density tests
- Choose age (e.g., 65) and horizon (e.g., 15 years ahead)
- Use model to project pdf (or cdf) of mortality rate 15 years ahead
- Plot realised q on to pdf/cdf
- Obtain associated p-value (or PIT value)
- Reject if p is too far out in either tail
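When the forecast density is available only as a set of simulated mortality rates, the steps above can be sketched with an empirical cdf. This is a minimal illustration under stated assumptions: the simulated sample stands in for the model's projected density, the test is two-sided with equal tails, and the names `pit_value` and `reject` are hypothetical.

```python
import numpy as np


def pit_value(simulated_q, realised_q):
    """Empirical cdf of the forecast sample evaluated at the realised rate,
    i.e. the probability integral transform (PIT) value."""
    simulated_q = np.asarray(simulated_q, dtype=float)
    return float(np.mean(simulated_q <= realised_q))


def reject(pit, significance=0.05):
    """Two-sided check: reject if the PIT value lies in either tail."""
    return pit < significance / 2 or pit > 1 - significance / 2


# Made-up forecast sample and realised rate, for illustration only.
example_pit = pit_value([0.01, 0.02, 0.03, 0.04], realised_q=0.025)
```

A PIT value near 0 means the realised rate fell far below what the model projected, and a value near 1 means it fell far above; either suggests the forecast density was poorly placed.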
31. Example: P-Values of Realised Mortality, Males 65, 1980 Start, Horizon 26 Years Ahead
32. Many ways to do this
- For h = 25 years ahead, 1 way
- 1980-2005 only
- For h = 24 years ahead, 2 ways
- 1980-2004, 1981-2005
- For h = 23 years ahead, 3 ways
- ...
- For h = 1 year ahead, 25 ways
- 1980-1981, 1981-1982, ..., 2004-2005
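The enumeration can be checked mechanically. The sketch below assumes start and end years running from 1980 to 2005, as in the examples listed, and groups every (start, end) pair by its horizon; the function name is hypothetical.

```python
def forecast_cases(first_year=1980, last_year=2005):
    """Group every (start, end) forecast window by its horizon h = end - start."""
    cases = {}
    for start in range(first_year, last_year):
        for end in range(start + 1, last_year + 1):
            cases.setdefault(end - start, []).append((start, end))
    return cases


cases = forecast_cases()
ways_per_horizon = {h: len(windows) for h, windows in cases.items()}
total_cases = sum(ways_per_horizon.values())
```

With these years a horizon of h has 26 - h windows, so h = 25 has one window, h = 1 has 25, and the total across all horizons is 325.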
33. Lots of cases to consider
- There are 25 + 24 + 23 + ... + 1 = 325 separate cases to consider, each equally legitimate
- Need some way to make use of all possibilities but consolidate results
- We do so by computing p-values for each case and then working with the mean p-values from each test
- These are reported below for each age, for h = 5, 10 and 15 years ahead
34. Age 65
35. Age 75
36. Age 85
37. Conclusions from these tests
- All models perform well
- No rejections at the 1% significance level (SL)
- Only 3 at the 5% SL
38. Overall conclusions
- Study outlines a framework for backtesting forecasts of mortality models
- As regards individual models and this dataset:
- M1, M3B, M5 and M7 perform well most of the time and there is little between them
- M2B unstable
- Of the Lee-Carter family of models, hard to choose between M1 and M3B
- Of the CBD family, M7 seems to perform best, though there is little to choose between M5 and M7
39. Two other points stand out
- In many but not all cases, and depending also on the model, there is evidence of an upward bias in forecasts
- This is very pronounced for PC forecasts
- This bias is less pronounced for PU forecasts
- Except maybe for M2B, PU forecasts are more plausible than the PC forecasts
- ⇒ Very important to take account of parameter uncertainty, more or less regardless of the model one uses
40. References
- Cairns et al. (2007) A quantitative comparison of stochastic mortality models using data from England & Wales and the United States. Pensions Institute Discussion Paper PI-0701, March.
- Cairns et al. (2008) The plausibility of mortality density forecasts: An analysis of six stochastic mortality models. Pensions Institute Discussion Paper PI-0801, April.
- Dowd et al. (2008a) Evaluating the goodness of fit of stochastic mortality models. Pensions Institute Discussion Paper PI-0802, September.
- Dowd et al. (2008b) Backtesting stochastic mortality models: An ex-post evaluation of multi-year-ahead density forecasts. Pensions Institute Discussion Paper PI-0803, September.
- These papers are also available at www.lifemetrics.com