Estimating the Predictive Distribution for Loss Reserve Models

Learn more at: http://www.casact.org

1
Estimating the Predictive Distribution for Loss
Reserve Models
  • Glenn Meyers
  • ISO Innovative Analytics
  • CAS Annual Meeting
  • November 14, 2007

2
S&P Report, November 2003: "Insurance Actuaries: A
Crisis in Credibility"
  • Actuaries are signing off on reserves that turn
    out to be wildly inaccurate.

3
Background to Methodology - 1
  • Zehnwirth/Mack
      • Loss reserve estimates via regression
      • $y = a \cdot x + e$
  • GLM: $E[Y] = f(a \cdot x)$
      • Allows choice of f and the distribution of Y
      • Choices restricted to speed calculations
  • Clark: Direct maximum likelihood
      • Assumes Y has an Overdispersed Poisson
        distribution

4
Background to Methodology - 2
  • Heckman/Meyers
  • Used Fourier transforms to calculate aggregate
    loss distributions in terms of frequency and
    severity distributions.
  • Hayne
  • Applied Heckman/Meyers to calculate distributions
    of ultimate outcomes, given estimate of mean
    losses

5
High Level View of Paper
  • Combine 1-2 above
  • Use aggregate loss distributions defined in terms
    of Fourier transforms to (1) estimate losses and
    (2) get distributions of ultimate outcomes.
  • Uses other information from data of ISO and
    from other insurers.
  • Implemented with Bayes' Theorem

6
Objectives of Paper
  • Develop a methodology for predicting the
    distribution of outcomes for a loss reserve
    model.
  • The methodology will draw on the combined
    experience of other similar insurers.
  • Use Bayes' Theorem to identify similar
    insurers.
  • Illustrate the methodology on Schedule P data
  • Test the predictions of the methodology on
    several insurers with data from later Schedule P
    reports.
  • Compare results with reported reserves.

7
A Quick Description of the Methodology
  • Expected loss is predicted by chain ladder/Cape
    Cod type formula
  • The distribution of the actual loss around the
    expected loss is given by a collective risk (i.e.
    frequency/severity) model.

8
A Quick Description of the Methodology
  • The first step in the methodology is to get the
    maximum likelihood estimates of the model
    parameters for several large insurers.
  • For an insurer's data:
      • Find the likelihood (probability of the data)
        given the parameters of each model in the first
        step.
      • Use Bayes' Theorem to find the posterior
        probability of each model in the first step given
        the insurer's data.

9
A Quick Description of the Methodology
  • The predictive loss model is a mixture of each of
    the models from the first step, weighted by its
    posterior probability.
  • From the predictive loss model, one can calculate
    ranges or statistics of interest such as the
    standard deviation or various percentiles of the
    predicted outcomes.

10
The Data
  • Commercial Auto Paid Losses from 1995 Schedule P
    (from AM Best)
  • Long enough tail to be interesting, yet we expect
    minimal development after 10 years.
  • Selected 250 Insurance Groups
      • Exposure in all 10 years
      • Believable payment patterns
  • Set negative incremental losses equal to zero.

11
16 insurer groups account for one half of the
premium volume
12
Look at Incremental Development Factors
  • Accident year 1986
  • Proportion of loss paid in development year Lag
  • Divided the 250 Insurers into four industry
    segments, each accounting for about 1/4 of the
    total premium.
  • Plot the payment paths

13
Incremental Development Factors - 1986
Incremental development factors appear to be
relatively stable for the 40 insurers that
represent about 3/4 of the premium. They are
highly unstable for the 210 insurers that
represent about 1/4 of the premium. The
variability appears to increase as size decreases.
14
Do Incremental Development Factors Differ by Size
of Insurer?
  • Form loss triangles as the sum of the loss
    triangles for all insurers in each of the four
    industry segments defined above.
  • Plot the payment paths

15
There is no consistent pattern in aggregate loss
payment factors for the four industry segments.
[Figure: aggregate loss payment factors, one panel each for Segments 1-4]
16
Expected Loss Model
  • Paid Loss is the incremental paid loss in the AY
    and Lag
  • ELR is the Expected Loss Ratio
  • ELR and Dev_Lag are unknown parameters
      • Can be estimated by maximum likelihood
      • Can be assigned posterior probabilities for
        Bayesian analysis
  • Similar to Cape Cod method in that the expected
    loss ratio is estimated rather than determined
    externally.
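
The formula on this slide did not survive in the transcript. The sketch below assumes it has the Cape Cod form E[Paid Loss(AY, Lag)] = Premium(AY) × ELR × Dev(Lag), with the incremental development factors summing to one. The function name, premiums, ELR, and factors are all made up for illustration.

```python
def expected_paid_loss(premium_by_ay, elr, dev_by_lag):
    """Expected incremental paid loss for every (AY, Lag) cell.

    Assumed form (not confirmed by the transcript):
        E[Paid Loss(AY, Lag)] = Premium(AY) * ELR * Dev(Lag)
    """
    if abs(sum(dev_by_lag) - 1.0) > 1e-9:
        raise ValueError("incremental development factors should sum to 1")
    return {
        (ay, lag): premium * elr * dev
        for ay, premium in premium_by_ay.items()
        for lag, dev in enumerate(dev_by_lag, start=1)
    }


# Hypothetical inputs, for illustration only.
premium = {1994: 1000.0, 1995: 1200.0}
dev = [0.25, 0.30, 0.20, 0.10, 0.06, 0.04, 0.02, 0.01, 0.01, 0.01]
expected = expected_paid_loss(premium, elr=0.70, dev_by_lag=dev)
```

Summing the cells of one accident year returns Premium × ELR, which is what makes the ELR an estimated parameter rather than an external input.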

17
Distribution of Actual Loss around the Expected
Loss
  • Compound Negative Binomial Distribution (CNB)
      • Conditional on expected loss: CNB(x | E[Paid
        Loss])
      • Claim count is negative binomial
      • Claim severity distribution determined externally
  • The claim severity distributions were derived
    from data reported to ISO. Policy limit:
    1,000,000.
      • Vary by settlement lag. Later lags are more
        severe.
  • Claim count has a negative binomial distribution
    with $\lambda = E[\text{Paid Loss}] / E[\text{Claim Severity}]$
    and $c = 0.01$
  • See Meyers (2007), "The Common Shock Model for
    Correlated Insurance Losses," for background on
    this model.
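
To see what the contagion parameter c does, here is a minimal sketch of the first two moments of a compound negative binomial. It assumes the claim count has mean λ and variance λ + cλ², and uses a hypothetical severity mean and second moment (the ISO severity curves are not in the transcript).

```python
import math


def cnb_mean_and_cv(expected_paid_loss, sev_mean, sev_second_moment, c=0.01):
    """Mean and coefficient of variation of a compound negative binomial.

    Assumes claim count N has E[N] = lam and Var[N] = lam + c * lam**2,
    with lam = E[Paid Loss] / E[Claim Severity].
    """
    lam = expected_paid_loss / sev_mean
    count_var = lam + c * lam ** 2
    sev_var = sev_second_moment - sev_mean ** 2
    mean = lam * sev_mean
    # Var[X] = E[N] * Var[Z] + Var[N] * E[Z]^2
    var = lam * sev_var + count_var * sev_mean ** 2
    return mean, math.sqrt(var) / mean


# Hypothetical severity: mean 10,000 and second moment 4e8.
small_mean, small_cv = cnb_mean_and_cv(100_000.0, 10_000.0, 4.0e8)
large_mean, large_cv = cnb_mean_and_cv(10_000_000.0, 10_000.0, 4.0e8)
```

Under these assumptions the CV shrinks as expected loss grows but never falls below sqrt(c) = 0.1, so even the largest cells keep some relative volatility.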

18
Claim Severity Distributions
[Figure: claim severity distributions by settlement lag, with curves for Lag 1, Lag 2, Lag 3, Lag 4, and Lags 5-10]
19
Where
20
Likelihood Function for a Given
Insurer's Losses
where
21
Maximum Likelihood Estimates
  • Estimate ELR and Dev_Lag simultaneously by maximum
    likelihood
  • Constraints on Dev_Lag
      • $\mathrm{Dev}_1 \le \mathrm{Dev}_2$
      • $\mathrm{Dev}_i \ge \mathrm{Dev}_{i+1}$ for $i = 2, 3, \ldots, 7$
      • $\mathrm{Dev}_8 = \mathrm{Dev}_9 = \mathrm{Dev}_{10}$
  • Use R's optim function to maximize likelihood
  • Read appendix of paper before you try this
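
One way to hand such constraints to an unconstrained optimizer is to reparameterize. The sketch below is not the approach from the paper's appendix (which is not reproduced here). It assumes the constraints read Dev1 ≤ Dev2, Dev_i ≥ Dev_(i+1) for i = 2..7, and Dev8 = Dev9 = Dev10, and that the ten factors sum to one. The function name is hypothetical.

```python
import math


def dev_from_unconstrained(theta):
    """Map 7 unconstrained reals to 10 incremental development factors.

    Each theta is squashed into (0, 1) and used as a shrink factor, so the
    assumed ordering constraints hold for any real-valued input.
    """
    if len(theta) != 7:
        raise ValueError("expected 7 unconstrained parameters")
    shrink = [1.0 / (1.0 + math.exp(-t)) for t in theta]
    dev = [shrink[0], 1.0]            # Dev1 <= Dev2 (Dev2 scaled to 1 for now)
    for s in shrink[1:]:              # Dev3..Dev8, each no larger than the last
        dev.append(dev[-1] * s)
    dev += [dev[-1], dev[-1]]         # Dev9 = Dev10 = Dev8
    total = sum(dev)
    return [d / total for d in dev]   # rescale so the factors sum to 1


dev = dev_from_unconstrained([0.5, 1.0, 0.0, -0.5, 0.2, 0.3, -1.0])
```

Rescaling at the end preserves every inequality, so a general-purpose optimizer can search over the seven reals (plus the ELR) without a constraint handler.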

22
Maximum Likelihood Estimates of Incremental
Development Factors
Loss development factors reflect the constraints
on the MLEs described in the prior slide. Contrast
this with the observed 1986 loss development
factors on the next slide.
23
Incremental Development Factors - 1986 (Repeat of
Earlier Slide)
Loss payment factors appear to be relatively
stable for the 40 insurers that represent about
3/4 of the premium. They are highly unstable for
the 210 insurers that represent about 1/4 of the
premium. The variability appears to increase as
size decreases.
24
Maximum Likelihood Estimates of Expected Loss
Ratios
Estimates of the ELRs are more volatile for the
smaller insurers.
25
Testing the Compound Negative Binomial (CNB)
Assumption
  • Calculate the percentiles of each observation
    given E[Paid Loss].
      • 55 observations for each insurer
  • If CNB is right, the calculated percentiles
    should be uniformly distributed.
  • Test with PP Plot
      • Sort calculated percentiles in increasing order
      • Vector (1:n)/(n+1), where n is the number of
        percentiles
      • The plot of the above two vectors against each
        other should be on the diagonal line.
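
The PP plot recipe above can be sketched in a few lines. The function names are hypothetical, and the "largest gap" helper is only a rough reading of how far the plot strays from the diagonal, not a full KS test.

```python
def pp_plot_points(percentiles):
    """Return the two vectors plotted against each other in a PP plot."""
    n = len(percentiles)
    observed = sorted(percentiles)                      # sorted predicted percentiles
    expected = [i / (n + 1) for i in range(1, n + 1)]   # (1:n)/(n+1)
    return expected, observed


def largest_gap_from_diagonal(percentiles):
    """Largest vertical distance between the PP plot and the 45-degree line."""
    expected, observed = pp_plot_points(percentiles)
    return max(abs(o - e) for e, o in zip(expected, observed))


# Four percentiles that happen to fall exactly on the diagonal.
expected_pts, observed_pts = pp_plot_points([0.8, 0.2, 0.6, 0.4])
gap = largest_gap_from_diagonal([0.8, 0.2, 0.6, 0.4])
```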

26
Interpreting PP Plots
Take 1000 lognormally distributed random
variables with $\mu = 0$ and $\sigma = 2$ as "data."
  • If a whole bunch of predicted percentiles are at
    the ends, the predicted tail is too light.
  • If a whole bunch of predicted percentiles are in
    the middle, the predicted tail is too heavy.
  • If in general the predicted percentiles are low,
    the predicted mean is too high.
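
These rules of thumb can be checked by simulation. This is a minimal sketch, assuming the "predictions" are lognormal models with deliberately wrong parameters. The seed, the alternative parameter values, and the helper names are illustrative choices.

```python
import math
import random
from statistics import NormalDist

random.seed(2007)
MU, SIGMA = 0.0, 2.0
data = [random.lognormvariate(MU, SIGMA) for _ in range(1000)]


def predicted_percentiles(data, mu, sigma):
    """Percentile of each observation under a lognormal(mu, sigma) prediction."""
    model = NormalDist(mu, sigma)
    return [model.cdf(math.log(x)) for x in data]


def share_in_ends(pcts, width=0.1):
    """Fraction of percentiles in the lowest or highest `width` of [0, 1]."""
    return sum(p < width or p > 1 - width for p in pcts) / len(pcts)


correct = share_in_ends(predicted_percentiles(data, MU, SIGMA))   # near 0.2 in expectation
too_light = share_in_ends(predicted_percentiles(data, MU, 1.0))   # sigma understated
too_heavy = share_in_ends(predicted_percentiles(data, MU, 4.0))   # sigma overstated

# Predicted mean too high: percentiles drift toward the low end.
high_mean_pcts = predicted_percentiles(data, 1.0, SIGMA)
avg_pct_high_mean = sum(high_mean_pcts) / len(high_mean_pcts)
```

A light predicted tail piles percentiles at the ends, a heavy one piles them in the middle, and an overstated mean pulls the average percentile below one half.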
27
Testing the CNB Assumptions: Insurer Ranks 1-40
(Large Insurers)
This sample has 55 × 40, or 2,200, observations.
According to the Kolmogorov-Smirnov test, the D
statistic for a sample of 2,200 uniform random
numbers should be within 0.026 of the 45° line
95% of the time. Actual D statistic: 0.042. As
the plot shows, the predicted percentiles are
slightly outside the 95% band. We are close.
28
Testing the CNB Assumptions: Insurer Ranks 1-40
(Large Insurers)
Breaking down the prior plot by settlement lag
shows that there could be some improvement by
settlement lag. But in general, not bad!
[Figure: PP plots by settlement lag]
29
Testing the CNB Assumptions: Insurer Ranks 41-250
(Smaller Insurers)
This is bad!
[Figure: PP plots by settlement lag]
30
Using Bayes' Theorem
  • Let $\Omega = \{\text{ELR}, \text{Dev}_{\text{Lag}}, \text{Lag} = 1, 2, \ldots, 10\}$
    be a set of models for the data.
  • The set may consist of different models or of
    different parameters for the same model.
  • For each model in $\Omega$, calculate the likelihood of
    the data being analyzed.

31
Using Bayes' Theorem
  • Then, using Bayes' Theorem, calculate the
    posterior probability of each parameter set given
    the data.
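
The formula for this step is not in the transcript. For a finite set of models (written Ω here), the standard statement of Bayes' Theorem is:

```latex
\Pr\{\text{model} \mid \text{data}\}
  = \frac{\Pr\{\text{data} \mid \text{model}\}\,\Pr\{\text{model}\}}
         {\sum_{m \in \Omega} \Pr\{\text{data} \mid m\}\,\Pr\{m\}}
```

The denominator is the same for every model, so the posterior is proportional to likelihood times prior.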

32
Selecting Prior Probabilities
  • For Lag, select the payment paths from the
    maximum likelihood estimates of the 40 largest
    insurers, each with equal probability.
  • For ELR, first look at the distribution of
    maximum likelihood estimates of the ELR from the
    40 largest insurers and visually smooth out the
    distribution. See the slide on ELR prior below.
  • Note that Lag and ELR are assumed to be
    independent.

33
Prior Distribution of Loss Payment Paths
Prior loss payment paths come from the loss
development paths of the insurers ranked 1-40,
with equal probability Posterior loss payment
path is a mixture of prior loss development paths.
34
Prior Distribution of Expected Loss Ratios
The prior distribution of expected loss ratios
was chosen by visual inspection.
35
Predicting Future Loss Payments Using Bayes'
Theorem
  • For each model, estimate the statistic of choice,
    S, for future loss payments.
  • Examples of S
      • Expected value of future loss payments
      • Second moment of future loss payments
      • The probability density of a future loss payment
        of x
      • The cumulative probability, or percentile, of a
        future loss payment of x
  • These examples can apply to single (AY, Lag)
    cells, or any combination of cells such as a
    given Lag or accident year.

36
Predicting Future Loss Payments Using Bayes'
Theorem for Sums over Sets of {AY, Lag}
  • If we assume losses are independent by AY and Lag
  • Actually use the negative multinomial
    distribution
      • Assumes correlation of frequency between lags in
        the same accident year

37
Predicting Future Loss Payments Using Bayes'
Theorem
  • Calculate the Statistic S for each model.
  • Then the posterior estimate of S is the model
    estimate of S weighted by the posterior
    probability of each model
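
The weighting step can be sketched as follows. The log-likelihoods, priors, and per-model moments are hypothetical, and the function names are not from the paper. Working from log-likelihoods with the maximum subtracted is a numerical convenience chosen here, not something the slides specify.

```python
import math


def posterior_probabilities(log_likelihoods, priors):
    """Posterior probability of each model: likelihood times prior, normalized."""
    top = max(log_likelihoods)  # subtract the max to avoid underflow in exp()
    weights = [p * math.exp(ll - top) for ll, p in zip(log_likelihoods, priors)]
    total = sum(weights)
    return [w / total for w in weights]


def posterior_estimate(stat_by_model, posterior):
    """Model estimates of S weighted by posterior probability."""
    return sum(s * p for s, p in zip(stat_by_model, posterior))


# Hypothetical three-model example with equal priors.
log_lik = [-1050.0, -1048.0, -1055.0]
prior = [1 / 3, 1 / 3, 1 / 3]
post = posterior_probabilities(log_lik, prior)

means = [100.0, 120.0, 90.0]                      # E[future payments] per model
sds = [10.0, 12.0, 9.0]                           # standard deviation per model
second_moments = [m ** 2 + s ** 2 for m, s in zip(means, sds)]

mix_mean = posterior_estimate(means, post)
mix_sd = math.sqrt(posterior_estimate(second_moments, post) - mix_mean ** 2)
```

Weighting the first and second moments separately and combining them afterwards gives the standard deviation of the mixture, which also reflects disagreement between the models.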

38
Sample Calculations for Selected Insurers
  • Coefficient of Variation of predictive
    distribution of unpaid losses.
  • Plot the probability density of the predictive
    distribution of unpaid losses.

39
Predictive Distribution: Insurer Rank 7
Predictive mean = 401,951 K; CV of total reserve =
6.9%
40
Predictive Distribution: Insurer Rank 97
Predictive mean = 40,277 K; CV of total reserve =
12.6%
41
CV of Unpaid Losses
42
Validating the Model on Fresh Data
  • Examined data from 2001 Annual Statements
  • Both 1995 and 2001 statements contained losses
    paid for accident years 1992-1995.
  • Often statements did not agree in overlapping
    years because of changes in corporate structure.
    We got agreement in earned premium for 109 of the
    250 insurers.
  • Calculated the predicted percentiles for the
    amount paid 1997-2001
  • Evaluate predictions with PP plots.

43
PP Plots on Validation Data
KS 95% critical values: ±13.03%
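
The 13.03 figure is consistent with the usual large-sample approximation to the 95% Kolmogorov-Smirnov critical value, 1.36/sqrt(n), evaluated at n = 109. Reading n as one percentile per validation insurer is an assumption, as is the function name.

```python
import math


def ks_critical_value_95(n):
    """Large-sample 95% critical value for the KS D statistic (about 1.36/sqrt(n))."""
    return 1.36 / math.sqrt(n)


# 109 insurers in the validation sample.
critical_pct = round(100 * ks_critical_value_95(109), 2)
```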
44
Feedback
  • If you have paid data, you must also have the
    posted reserves. How do your predictions match
    up with reported reserves?
  • In other words, is S&P right?
  • Your results are conditional on the data reported
    in Schedule P. Shouldn't an actuary with access
    to detailed company data (e.g. case reserves) be
    able to get more accurate estimates?

45
Response: Expand the Original Scope of the Paper
  • Could persuade more people to look at the
    technical details.
  • Warning: Do not over-generalize the results
    beyond commercial auto in the 1995-2001 timeframe.

46
Predictive and Reported Reserves
  • For the validation sample, the predictive mean
    (in aggregate) is closer to the 2001
    retrospective reserve.
  • Possible conservatism in reserves. OK?
  • means reported over the predictive mean.
  • Retrospective = reported less paid prior to the
    end of 1995.

47
Predictive Percentiles of Reported Reserves
  • Conservatism is not evenly spread out.
  • Conservatism appears to be independent of insurer
    size
  • Except for the evidence of conservatism, the
    reserves are spread out in a way similar to
    losses.
  • Were the reserves equal to ultimate losses?

48
Reported Reserves More Accurate?
  • Divide the validation sample into two groups and
    look at subsequent development.
      • 1. Reported Reserve < Predictive Mean
      • 2. Reported Reserve > Predictive Mean
  • Expected result if Reported Reserve is accurate:
      • Reported Reserve = Retrospective Reserve for each
        group
  • Expected result if Predictive Mean is accurate?
      • Predictive Mean ≈ Retrospective Reserve for each
        group
  • There are still some outstanding losses in the
    retrospective reserve.

49
Subsequent Reserve Changes
[Figure: subsequent reserve changes, one panel each for Group 1 and Group 2]
  • Group 1
      • 50-50 up/down
      • Ups are bigger
  • Group 2
      • More downs than ups
  • Results are independent of insurer size
50
Subsequent Reserve Changes
  • The CNB formula identified two groups where
  • Group 1 tends to under-reserve
  • Group 2 tends to over-reserve
  • Incomplete agreement at Group level
  • Some in each group get it right
  • Discussion??

51
Main Points of Paper
  • How do we evaluate stochastic loss reserve
    formulas?
      • Test predictions of future loss payments
      • Test on several insurers
      • Main focus
  • Are there any formulas that can pass these tests?
      • Bayesian CNB does pretty well on CA Schedule P
        data.
      • Uses information from many insurers
      • Are there other formulas? This paper sets a bar
        for others to raise.

52
Subsequent Developments
  • Paper completed in April 2006
  • Additional critique
  • Describe recent developments
  • Describe ongoing research

53
PP Plots on Validation Data: Clive Keatinge's
Observation
  • Does the leveling of plots at the end indicate
    that the predicted tails are too light?
  • The plot is still within the KS bounds and thus
    is not statistically significant.
  • The leveling looks rather systematic.

54
Alternative to the KS: Anderson-Darling Test
  • AD is more sensitive to tails.
  • Critical values are 1.933, 2.492, and 3.857 for
    10%, 5%, and 1% levels respectively.
  • Value for validation sample is 2.966
  • Not outrageously bad, but Clive has a point.
  • Explanation: Did not reflect all sources of
    uncertainty??
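
For reference, the Anderson-Darling statistic for a set of predicted percentiles tested against the uniform distribution can be computed as below. This is the textbook formula for a fully specified distribution, not code from the paper. The two test samples are made up to show the tail sensitivity.

```python
import math


def anderson_darling_uniform(percentiles):
    """Anderson-Darling A^2 statistic for percentiles tested against uniform(0, 1).

    Every percentile must lie strictly between 0 and 1.
    """
    u = sorted(percentiles)
    n = len(u)
    total = sum(
        (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n


# Evenly spread percentiles versus percentiles crowded into both tails.
even = [i / 101 for i in range(1, 101)]
tails = [0.001 * k for k in range(1, 51)] + [1 - 0.001 * k for k in range(1, 51)]
a2_even = anderson_darling_uniform(even)
a2_tails = anderson_darling_uniform(tails)
```

The logarithms weight observations near 0 and 1 heavily, which is why the statistic reacts to tail behavior that the KS test can miss.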

55
Is Bayesian Methodology Necessary?
  • "Thinking Outside the Triangle"
      • Paper in June 2007 ASTIN Colloquium
      • Works with simulated data on a similar model
      • Compares Bayesian with maximum likelihood
        predictive distributions

56
Maximum Likelihood Fitting Methodology: PP Plots
for Combined Fits
  • PP plot reveals the S-shape that characterizes
    overfitting.
  • The tails are too light

57
Bayesian Fitting Methodology: PP Plots for
Combined Fits
Nailed the Tails
58
IN THIS EXAMPLE
  • Maximum Likelihood method understates the true
    variability
      • I call this "overfitting," i.e., the model fits
        the data rather than the population
      • Nine parameters fit to 55 points
  • SPECULATION: Overfitting will occur in all
    maximum likelihood methods and in moment-based
    methods
      • i.e., GLM and Mack

59
Expository Paper in Preparation
  • Focus on the Bayesian method described in this
    paper
  • Uses Gibbs sampler to simulate posterior
    distribution of the results
  • Complete algorithm coded in R
  • Hope to increase population of actuaries who
      • Understand what the method means
      • Can actually use the method