CAS Predictive Modeling Seminar: Evaluating Predictive Models

1
CAS Predictive Modeling Seminar: Evaluating
Predictive Models
  • Glenn Meyers
  • ISO Innovative Analytics
  • October 5, 2006

2
Choosing Models
  • Predicting losses for individual insurance
    policies involves
  • Millions of policy records
  • Hundreds (or thousands) of variables
  • There are a number of models that provide good
    predictions
  • GLM, GAM, CART, MARS, Neural Nets, etc.
  • Business objectives influence choice of model

3
The Modeling Process
  • Modeling process involves dimension reduction
    techniques
  • Clustering, Principal Components, Factor Analysis
  • Building submodels and using predicted values as
    input into a higher level model
  • The modeling cycle
  • 1. Build model with training data
  • 2. Evaluate model with test data
  • 3. Identify improvements in models and data
  • 4. Go back to Step 1

4
Hidden Parameters
  • Classic model building methods correct for the
    number of parameters using degrees of freedom.
  • The model exploration process eats up degrees of
    freedom in ways that cannot be captured by
    formal model adjustments.
  • In essence the test data gets merged into the
    training data.

5
What Is Significant?
  • Statistical packages will often identify
    improvements that are statistically significant
    but not practically significant.
  • This talk is about determining when a model
    identifies practically significant
    improvements.
  • Illustrate how to do this on a real example.

6
The Example: A Personal Auto Model Under
Development (Preliminary Results)
  • Input: Address of insured vehicle
  • Output: Address Specific Loss Cost
  • 30 year old, single car with no SDIP points
  • $500 deductible or 25/50/25 policy limits
  • Symbol 8, model year 2006
  • etc.
  • Model derived from over 1,200 variables
    reflecting weather, traffic, demographic,
    topographical and economic conditions.

7
Difference Between Address Specific and ISO
Territory Loss Cost
8
Differences Abound
  • Some Questions to Ask
  • Can the model output be used to improve insurer
    underwriting results?
  • Are the results statistically significant?
  • Define ELI (Expected Loss Index)

9
Use Expected Loss Index for Risk Selection
10
Propose a Standard Way of Evaluating Lift: The
Gini Index
  • Originally proposed by Corrado Gini in 1912
  • Most often used to measure income and/or wealth
    inequality
  • Search for 'Gini' in wikipedia.org
  • In insurance underwriting, we want to evaluate
    systematic methods of finding loss inequality.

11
Gini Index
  • Look at the set of policy records below the cutoff
    point, ELI < 1.
  • This set of records accounts for 59% of total ISO
    (full) loss cost.
  • This set of records accounts for 48% of total
    loss.
  • 1 - 48/59 ≈ 19% reduction in loss ratio.
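
The arithmetic on this slide can be checked directly. The following is a minimal sketch in Python; the function name `loss_ratio_reduction` and its parameter names are illustrative and do not come from the presentation. Only the two shares (59% and 48%) are taken from the slide.

```python
def loss_ratio_reduction(loss_cost_share, loss_share):
    """Relative drop in loss ratio from keeping only the selected policies.

    loss_cost_share: selected policies' share of total ISO (full) loss cost.
    loss_share:      selected policies' share of total actual loss.

    The selected set's loss ratio, relative to the whole book's, is
    loss_share / loss_cost_share, so the reduction is one minus that ratio.
    """
    if not 0 < loss_cost_share <= 1:
        raise ValueError("loss_cost_share must be in (0, 1]")
    return 1.0 - loss_share / loss_cost_share


# Figures quoted on the slide for policies with ELI < 1.
reduction = loss_ratio_reduction(loss_cost_share=0.59, loss_share=0.48)
print(f"{reduction:.1%}")  # 18.6%, which the slide rounds to 19%
```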

12
Gini Index
  • Do this calculation for other cutoff points.
  • The results make up what we call the Lorenz
    curve.
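
Repeating the cutoff calculation at every record can be sketched as below. This is an illustration under stated assumptions, not the presenter's code: records are sorted by ascending ELI, the horizontal axis is the cumulative share of ISO loss cost, and the vertical axis is the cumulative share of actual loss (matching the 59% and 48% pairing used for the single cutoff). The function name `lorenz_curve` and the four toy records are hypothetical.

```python
def lorenz_curve(eli, loss_cost, loss):
    """Return (xs, ys), the Lorenz curve points for a set of policy records.

    xs[k] is the cumulative share of loss cost and ys[k] the cumulative
    share of actual loss for the k records with the lowest ELI.
    """
    # Indices of the records, lowest ELI first.
    order = sorted(range(len(eli)), key=lambda i: eli[i])
    total_cost = sum(loss_cost)
    total_loss = sum(loss)

    xs, ys = [0.0], [0.0]
    cum_cost = cum_loss = 0.0
    for i in order:
        cum_cost += loss_cost[i]
        cum_loss += loss[i]
        xs.append(cum_cost / total_cost)
        ys.append(cum_loss / total_loss)
    return xs, ys


# Hypothetical toy data: four policies with equal loss cost.
eli = [0.8, 1.3, 0.9, 1.1]
loss_cost = [100.0, 100.0, 100.0, 100.0]
loss = [50.0, 200.0, 50.0, 100.0]

xs, ys = lorenz_curve(eli, loss_cost, loss)
for x, y in zip(xs, ys):
    print(f"{x:.3f} of loss cost -> {y:.3f} of loss")
```

Each (x, y) pair is one cutoff point: the policies below that cutoff carry share x of the loss cost and share y of the loss.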

13
Gini Index
  • If ELI is random, the Lorenz curve will be on the
    diagonal line.
  • The Gini index is the percentage of the area
    under the random line that is above the Lorenz
    curve.
  • Higher Gini means better predictive model.
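
The area definition above translates into a short calculation. The sketch below assumes the Lorenz curve is given as a list of points running from (0, 0) to (1, 1) and approximates the area under it with trapezoids. The function name `gini_index` and the sample points are hypothetical. Since the area under the diagonal is 0.5, the Gini index works out to 1 - 2 × (area under the Lorenz curve).

```python
def gini_index(xs, ys):
    """Share of the area under the diagonal that lies above the Lorenz curve.

    xs, ys: Lorenz curve points, starting at (0, 0) and ending at (1, 1).
    """
    # Trapezoid rule for the area under the Lorenz curve.
    area_under_lorenz = sum(
        (xs[k] - xs[k - 1]) * (ys[k] + ys[k - 1]) / 2.0
        for k in range(1, len(xs))
    )
    area_under_diagonal = 0.5
    return (area_under_diagonal - area_under_lorenz) / area_under_diagonal


# Hypothetical Lorenz curve that bows below the diagonal.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.0, 0.125, 0.25, 0.5, 1.0]
print(gini_index(xs, ys))  # 0.3125

# A curve lying on the diagonal (random ELI) gives a Gini index of zero.
print(gini_index([0.0, 0.5, 1.0], [0.0, 0.5, 1.0]))  # 0.0
```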

14
A Gini Index Thought Experiment
  • If we had the ability to predict who will have
    losses, what would the Gini index be?
  • It would be 100% if only one risk had all the
    losses.
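
The thought experiment can be tried numerically. The sketch below assumes that every policy carries the same loss cost and that policies are ranked by their actual losses, which stands in for perfect foresight. The function name `perfect_foresight_gini` and both toy portfolios are hypothetical. Under these assumptions, a portfolio of 1,000 policies with a single loss gives a Gini index just under 100%, while one where 10% of policies have equal losses gives about 90%.

```python
def perfect_foresight_gini(losses):
    """Gini index when policies are ranked by their actual losses.

    Assumes equal loss cost per policy, so the horizontal axis is the
    cumulative share of policies.
    """
    ordered = sorted(losses)  # lowest actual loss first
    n = len(ordered)
    total = sum(ordered)

    area = 0.0    # area under the Lorenz curve (trapezoid rule)
    prev_y = 0.0
    cum = 0.0
    for loss in ordered:
        cum += loss
        y = cum / total
        area += (1.0 / n) * (prev_y + y) / 2.0  # trapezoid of width 1/n
        prev_y = y
    return 1.0 - 2.0 * area


one_big_loss = [0.0] * 999 + [1.0]            # one risk has all the losses
ten_percent_claim = [0.0] * 900 + [1.0] * 100  # 10% of risks have equal losses

print(f"{perfect_foresight_gini(one_big_loss):.3f}")       # 0.999
print(f"{perfect_foresight_gini(ten_percent_claim):.3f}")  # 0.900
```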

15
Bodily Injury
16
Property Damage
17
Collision
18
Statistical Significance
  • How much random fluctuation is in the Gini index
    calculation?
  • Use bootstrapping to evaluate
  • Take a random sample of records, with
    replacement.
  • Calculate Gini index for the sample.
  • Repeat 250 times.
  • Plot a histogram of the results.
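
The bootstrap steps above can be sketched as follows. This is an illustration rather than the presenter's implementation. The record layout `(eli, loss_cost, loss)`, the function names, the random seeds, and the synthetic portfolio are all assumptions. Only the resampling with replacement and the 250 repetitions come from the slide. The sketch summarizes the results with a mean and a rough percentile range instead of a plot; the returned list could be passed to any histogram routine.

```python
import random


def gini_index(records):
    """Gini index for (eli, loss_cost, loss) records, ranked by ascending ELI."""
    ordered = sorted(records, key=lambda r: r[0])
    total_cost = sum(r[1] for r in ordered)
    total_loss = sum(r[2] for r in ordered)
    area = 0.0
    x_prev = y_prev = 0.0
    cum_cost = cum_loss = 0.0
    for _, cost, loss in ordered:
        cum_cost += cost
        cum_loss += loss
        x, y = cum_cost / total_cost, cum_loss / total_loss
        area += (x - x_prev) * (y + y_prev) / 2.0  # trapezoid under the curve
        x_prev, y_prev = x, y
    return 1.0 - 2.0 * area


def bootstrap_gini(records, n_resamples=250, seed=0):
    """Gini index of n_resamples samples drawn with replacement."""
    rng = random.Random(seed)
    n = len(records)
    return [gini_index(rng.choices(records, k=n)) for _ in range(n_resamples)]


# Hypothetical synthetic portfolio: expected loss is proportional to ELI.
data_rng = random.Random(1)
records = []
for _ in range(2000):
    eli = data_rng.uniform(0.5, 1.5)
    records.append((eli, 100.0, 100.0 * eli * data_rng.expovariate(1.0)))

ginis = sorted(bootstrap_gini(records))
mean_gini = sum(ginis) / len(ginis)
low, high = ginis[6], ginis[243]  # rough 2.5th and 97.5th percentiles of 250
print(f"mean Gini {mean_gini:.3f}, rough 95% range {low:.3f} to {high:.3f}")
```

The spread of the 250 values indicates how much of an observed Gini index could be random fluctuation.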

19
Bootstrap Results
20
Summary
  • Standard tests of statistical significance are
    suspect.
  • Informal model selection process
  • Statistical/Practical significance
  • Propose Gini index as a test of practical
    significance.
  • Divide data into three samples
  • Training: Used to fit models
  • Test: Used to evaluate fits
  • Holdout: Final evaluation
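
A three-way split along these lines might look like the sketch below. The 60/20/20 proportions, the function name `three_way_split`, and the fixed seed are assumptions chosen for illustration; the presentation does not specify how the data were divided.

```python
import random


def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Randomly partition records into training, test, and holdout samples.

    The holdout sample receives whatever remains after the first two.
    """
    shuffled = list(records)               # copy so the input is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = int(round(len(shuffled) * train_frac))
    n_test = int(round(len(shuffled) * test_frac))
    training = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    holdout = shuffled[n_train + n_test:]
    return training, test, holdout


policy_ids = list(range(1000))  # hypothetical policy identifiers
training, test, holdout = three_way_split(policy_ids)
print(len(training), len(test), len(holdout))  # 600 200 200
```

The holdout sample is set aside until the modeling cycle is finished, so that repeated model exploration on the test sample does not contaminate the final evaluation.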
