Title: Pr
2- New Developments in Predictive Modeling
- Jonathan Shreve, FSA, MAAA, Principal and Consulting Actuary, Milliman
3- Summary
- Overview
- Optimal Use of Risk Adjusters
- Lifestyle-Based Prediction
4- Predictive Modeling
- Methods to predict expected claims costs
- Uses historical data and calibrated models
- Many uses in health insurance context
- Renewal underwriting
- Cost impact modeling
- Payment equalization
- Care management
5- Health Insurance Market in the United States
- Individual
- Small group (2-50 employees)
- Medium group (51-500 employees)
- Large group (> 500 employees)
6- Risk Adjusters Overview
- Risk adjusters measure morbidity
- Used for adjusting payments (Medicare), predictive modeling (SG rating), and medical management (DM)
- Function of age, gender, and claim history (diagnoses and services - medical and/or Rx)
- ERG, ACG, DxCG, etc.
7- Risk Adjusters Overview
- Claim detail is sorted and formatted
- Software assigns members to relatively broad diagnosis categories (e.g. Symmetry has 120 categories called Episode Risk Groups (ERGs))
- Output file (array) of 0s and 1s under each demographic category and each condition category for each member
- Regression to fit actual costs to array of 0s and 1s
- Other risk adjusters
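The "array of 0s and 1s" step above can be sketched as an ordinary least squares fit. This is a minimal illustration, not any vendor's algorithm: the four categories, the seven members, and the cost figures are hypothetical, and commercial tools use far more categories and proprietary calibration.

```python
# Hypothetical toy data: one row per member, 0/1 flags for
# [male_40_49, female_40_49, diabetes, asthma] (illustrative categories only).
FLAGS = [
    [1, 0, 0, 0], [0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 0],
    [1, 0, 0, 1], [0, 1, 0, 1], [1, 0, 1, 1],
]
COSTS = [1000.0, 1200.0, 5000.0, 5200.0, 2500.0, 2700.0, 6500.0]  # made-up annual costs


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_weights(flags, costs):
    """Least squares via the normal equations (X'X) w = X'y."""
    k = len(flags[0])
    xtx = [[sum(row[i] * row[j] for row in flags) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(flags, costs)) for i in range(k)]
    return solve_linear(xtx, xty)


def predict(weights, member_flags):
    """Predicted cost is the sum of the weights for the member's flagged categories."""
    return sum(w * f for w, f in zip(weights, member_flags))


weights = fit_weights(FLAGS, COSTS)
new_member = predict(weights, [0, 1, 1, 1])  # female 40-49 with both conditions
```

Each fitted weight is the additive cost attributed to one demographic or condition category, so a member's prediction is simply the sum of the weights for the categories flagged 1.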
8- Risk Adjusters Theoretical Value
9- United States Small Group Underwriting
- Small group rating
- Health insurance coverage
- Small group 2 to 50 employees
- Guaranteed Issue
- Limits on rate adjustments due to health status
- Limits on rates offered to different groups
10- Introduction: Real World Considerations
- Delay between when rates are developed and the rating period
- Incomplete data (IBNR)
- Rating limits (total Health Status Factor and changes)
- Turnover
- Competing against carriers' new business methods, not their renewal methods
11- Introduction: Prior Studies
- Society of Actuaries Report (May 2002, Cumming et al.)
- Society of Actuaries Health Section Council Article (Aug 2003, Ellis - DxCGs)
- Society of Actuaries Report (Summer 2006)
12- Society of Actuaries Assessment of Available Claims Based Predictive Modeling/Risk Adjuster Tools
- Objective analysis of predictive power of commercially available risk adjusters
- Updates 2002 study
- Measures R-squared, MAPE, and grouped statistics (including fit within disease category)
13- Society of Actuaries Assessment of Available Claims Based Predictive Modeling/Risk Adjuster Tools
- Vendors/Products Included (Company: Product)
- Ingenix: Episode Risk Groups (ERGs)
- Ingenix: Pharmacy Risk Groups (PRGs)
- Ingenix: Impact Pro
- Johns Hopkins: Adjusted Clinical Groups (ACGs)
- UCSD, Todd Gilmer: Medicaid Rx
- MedAI: MedAI
- DxCG: Diagnostic Cost Groups (DCGs)
- DxCG: RxGroups
- DxCG: Underwriting Models
- 3M: Clinical Risk Groups
14- Society of Actuaries Assessment of Available Claims Based Predictive Modeling/Risk Adjuster Tools
- Biggest changes from prior study
- New tools (e.g. MedAI)
- Improvement in tools
- Use of prior costs in some models
- Results with data lag
15- Publicly Available Risk Adjusters
- Medicaid Rx
- RxRisk
- CDPS
- Information from 2002 study "A Comparative Analysis of Claims-Based Methods of Health Risk Assessment for Commercial Populations" (Cumming/Knutson/Cameron/Derrick)
- Some restrictions on use may exist
16- Publicly Available Risk Adjusters
- Medicaid Rx
- Pharmacy-based risk assessment model developed by Todd Gilmer and others at Univ. of California
- Assigns each member to one or more of 45 condition categories based on prescription drugs used
- Assigns each member to one of 11 age/gender categories
- Predicts overall costs for each member
- Includes separate sets of weights for adults and children
17- Publicly Available Risk Adjusters
- RxRisk
- Pharmacy-based risk assessment model developed by Paul Fishman at Group Health Cooperative of Puget Sound
- Assigns each member to one or more of 27 medical condition categories for adults, and up to 42 for children
- Assigns members to one of 22 age/gender categories
- Predicts total medical costs for each member
18- Publicly Available Risk Adjusters
- CDPS (www.medicine.ucsd.edu/fpm/cdps)
- Diagnosis-based risk assessment model developed by Richard Kronick and others at the Univ. of California
- Originally intended for use with Medicaid, including disabled and Temporary Assistance for Needy Families (TANF) populations
- Assigns members to up to 67 possible medical condition categories
- Assigns members to one of 16 age/gender categories
- Predicts total medical costs
- Model contains different sets of weights for adults and children
19- Milliman Research
- Optimal Renewal Guidelines
- Goal of Research
- Understand current small group renewal practices
- Identify optimal renewal methodologies
20- Introduction: Survey Results
- What methods are currently practiced to rate small groups at renewal?
- Surveyed 21 carriers on SG methods
- 30% of carriers used risk adjusters
- 60% of groups
21- Introduction: Main Components
- Individualized Data Analysis
- Carrier Analysis
- Competitive Simulation
22- Introduction: Individualized Data
- Large multicarrier database used to review individual predictions
- Advantages
- Large database
- Good geographical representation
- Disadvantages
- No group identifiers
- Manual rate unavailable
23- Introduction: Carrier Data
- Advantages
- Actual Group Data
- Group Manual Rates Available
- Disadvantages
- Medium sized data set
- Geographical concentration
- Biased
24- Models: Loss Ratio Model
- 1st Renewal
- 2nd Renewal
25- Models: Risk Adjuster Model
- 1st Renewal
- 2nd Renewal
26- Models: Service Category Model
- 1st Renewal
27- Results: Error Measures
- R-Squared: % of variance from the mean explained by rating variables
- MAPE: absolute error as % of total costs
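The two error measures on this slide can be written out as follows. This is a sketch that takes the slide's wording at face value: R-squared as the share of variance around the mean that the predictions explain, and MAPE as total absolute error divided by total actual costs (rather than an average of per-group percentage errors). The function names and the three-group data are illustrative only.

```python
def r_squared(actual, predicted):
    """1 - SSE/SST: share of variance around the mean explained by the predictions."""
    mean = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - mean) ** 2 for a in actual)
    return 1.0 - sse / sst


def mape(actual, predicted):
    """Total absolute error expressed as a fraction of total actual costs."""
    abs_error = sum(abs(a - p) for a, p in zip(actual, predicted))
    return abs_error / sum(actual)


# Hypothetical claim costs for three groups, with one set of predictions.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
r2 = r_squared(actual, predicted)   # 1 - 1100/20000
err = mape(actual, predicted)       # 50/600
```

Because R-squared works on squared errors, a few large misses dominate it, while MAPE weights every unit of error equally.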
29- Results: Error Calculation Example
- Small Group ABC
- Traditional Prediction: 150%
- Risk Adjuster Prediction: 125%
- Actual Claims equal 120% of manual
- Which method is better?
- Error / R-squared?
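A worked version of the Group ABC comparison, assuming all three figures are percentages of the manual rate (the transcript appears to have dropped the percent signs). It shows why the choice between absolute and squared error matters for a single group.

```python
# Figures from the slide, read as percentages of the manual rate (assumption).
actual = 120.0
predictions = {"traditional": 150.0, "risk_adjuster": 125.0}


def errors(predicted, actual_value):
    """Return (absolute error, squared error) for one prediction."""
    diff = predicted - actual_value
    return abs(diff), diff ** 2


results = {name: errors(p, actual) for name, p in predictions.items()}

# How many times worse the traditional prediction is under each measure.
abs_ratio = results["traditional"][0] / results["risk_adjuster"][0]
sq_ratio = results["traditional"][1] / results["risk_adjuster"][1]
```

Under absolute error the traditional prediction misses by 30 points against 5, a factor of 6. Under squared error, which is what R-squared is built on, the same miss counts 36 times as heavily.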
30- Results: Credibility Weights
- 1st Renewal, Individual Analysis
- Svc category: 2% IP, 24% OP, 18% Rx
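One way to read this slide is as a standard credibility blend by service category. The sketch below assumes the three figures are credibility weights of 2% (inpatient), 24% (outpatient) and 18% (pharmacy) applied to a member's own prior experience, with the remainder going to the manual rate. The blend formula is the usual actuarial convention, not something the slide states, and the cost figures are hypothetical.

```python
# Credibility weights by service category, read from the slide as percentages (assumption).
CREDIBILITY = {"IP": 0.02, "OP": 0.24, "Rx": 0.18}


def blend(experience, manual, credibility):
    """Credibility-weighted prediction per category: Z * experience + (1 - Z) * manual."""
    return {
        cat: credibility[cat] * experience[cat] + (1.0 - credibility[cat]) * manual[cat]
        for cat in manual
    }


# Hypothetical per-member monthly costs.
manual = {"IP": 100.0, "OP": 80.0, "Rx": 40.0}
experience = {"IP": 300.0, "OP": 120.0, "Rx": 60.0}

by_category = blend(experience, manual, CREDIBILITY)
total = sum(by_category.values())
```

With weights this low, even a member whose inpatient costs ran at three times manual moves the inpatient prediction by only a few percent, while outpatient and pharmacy history carry noticeably more weight.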
31- Results: R-Squared
- R-Squared vs. Rating Caps (Group Size 10)
32- Results: Mean Absolute Prediction Error (as %)
- MAPE vs. Rating Caps (Group Size 10)
33- Results: Mean Absolute Prediction Error (as %)
- MAPE vs. Group Size (Rating Cap 35%)
34- Results: Mean Absolute Prediction Error (as %)
- MAPE vs. Group Size (Uncapped)
35- Results: Carrier Analysis
- Real groups
- Turnover
- Biased sample
- Traditional / Risk Adjuster very similar!
- Health status correlation
36- Competitive Simulation: Introduction
- Based on carrier data
- Excel model - stochastic
- First renewal with 9 months of historic claims
- New business method accuracy simulated relative to renewal method accuracy (less accurate)
- New business quotes generated stochastically (Bayesian from renewal quote distribution) with some correlation among different carriers
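A rough sketch of a simulation in this spirit, written in Python rather than Excel. It is not the model described above: it replaces the Bayesian draw from the renewal quote distribution with plain correlated normal errors, and every parameter value (error standard deviations, correlation, number of groups, the lognormal spread of true costs) is a hypothetical placeholder.

```python
import random


def simulate(n_groups=5000, n_competitors=3, renewal_sd=0.15,
             new_business_sd=0.25, correlation=0.5, seed=0):
    """Return (retention rate, loss ratio of retained groups) for the renewing carrier."""
    rng = random.Random(seed)
    retained = 0
    retained_claims = 0.0
    retained_premium = 0.0
    for _ in range(n_groups):
        true_cost = rng.lognormvariate(0.0, 0.3)  # group's true cost relative to manual
        renewal_quote = true_cost * (1.0 + rng.gauss(0.0, renewal_sd))
        shared = rng.gauss(0.0, 1.0)  # error component common to all competitors
        best_competitor = float("inf")
        for _ in range(n_competitors):
            own = rng.gauss(0.0, 1.0)  # carrier-specific error component
            # Unit-variance error with pairwise correlation `correlation` across carriers.
            z = correlation ** 0.5 * shared + (1.0 - correlation) ** 0.5 * own
            quote = true_cost * (1.0 + new_business_sd * z)
            best_competitor = min(best_competitor, quote)
        if renewal_quote <= best_competitor:  # group stays if renewal is the cheapest quote
            retained += 1
            retained_claims += true_cost
            retained_premium += renewal_quote
    return retained / n_groups, retained_claims / retained_premium


retention_1, loss_ratio_1 = simulate(n_competitors=1)
retention_5, loss_ratio_5 = simulate(n_competitors=5)
```

Varying `n_competitors` and `new_business_sd` in a sketch like this is one way to explore how the number and accuracy of competing quotes affect which groups a carrier keeps.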
37- Competitive Simulation: Results
- Small improvements in new business methods significantly increase profitability for new business and hurt profitability for renewal
- Very sensitive to point at which group seeks new business quotes (try to keep your groups from getting quotes!)
- Number of competing quotes is important
- Accuracy and results are sensitive to credibility of risk adjuster and/or historic experience components
38- Research Conclusions
- Marginal value of improvements decreases as allowable rate variation decreases, and as group size increases
- New business is less profitable than renewal business. Don't chase the wrong groups away.
- Competitive results are very sensitive to accuracy of new business methods
- Credibility is affected by accuracy / explanatory power of manual rate and level of health status correlation
39- Recommendations
- Understand effects of rating environment
- Fundamentals (Blocking & Tackling)
- Objectively analyze what prediction method is right for you. It may be that multiple methods are most appropriate (state, group size, costs, etc.).
- Use all relevant data / information on a group.
- Understand what your competitors are doing with new business.
- Assign credibility explicitly and carefully.
- Use a rigorous, systematic method to develop renewal quotes, with appropriate, efficient manual intervention.
- Capture all information on each renewal quote and what happens with the group. Analyze data and modify your approach.
40- Lifestyle-Based Prediction
41- The US Surgeon General
- 70% of the diseases and subsequent deaths in the U.S. are lifestyle-based
- The Centers for Disease Control
- Lifestyle-based chronic diseases account for 75% of the United States' $1.4 trillion medical care costs
42- Definition of Lifestyle Diseases
- Lifestyle diseases (also called diseases of longevity or diseases of civilization) are diseases that appear to increase in frequency as countries become more industrialized and people live longer. (WHO)
- Lifestyle disease is a disease associated with the way a person or group of people lives.
- Lifestyle diseases include atherosclerosis, heart disease, and stroke; obesity and type 2 diabetes; and diseases associated with smoking, alcohol, and drug abuse. Regular physical activity helps prevent obesity, heart disease, hypertension, diabetes, colon cancer, and premature mortality. (Stedman's Medical Dictionary)
43- Lifestyle-Based Diseases
- Lifestyle-Based Diseases/Conditions
- Diabetes
- Hypertension
- Cardiovascular
- Stroke
- COPD
- Most cancers
- Some mental health: Depression, Alzheimer's, etc.
- Others: Osteoporosis, Arthritis, Back Pain, etc.
- Maternity
44- Lifestyle-Based Diseases
- Correlation between Lifestyle and Cancer (Source: American Cancer Society)
45- 2004 INTERHEART Study
- Over 90% of the risk of a heart attack (myocardial infarction) is attributed to lifestyle factors
- Factors include abnormal lipids, smoking, hypertension, abdominal obesity, consumption of fruits and vegetables, alcohol, and regular physical activity
- Family history, thought by many to be the major risk, only accounts for 1% of the population attributable risk
46- Lifestyle-Based Prediction (LBP)
- Most healthcare costs are driven by lifestyle choices
- Claims data does not reflect lifestyle
- How else can we gather this information?
47- Lifestyle-Based Prediction (LBP)
- Lifestyle-Based Prediction is based on strong correlations that exist between lifestyle-based behaviors and diseases, in particular lifestyle-based diseases
- LBP switches the focus of detection from poorly correlated medical events to highly correlated lifestyle behaviors
48- Challenges in Predictive Modeling
- Predictive models are only as good as the data that drive them
- Challenge 1: New business
- Challenge 2: High employee turnover
- Challenge 3: Data consolidation
- Challenge 4: Increase in lifestyle diseases
49- Development of Lifestyle-Based Prediction Models
- Over 700 fields of lifestyle-based data are appended to two data sets
- Individuals with a disease state
- Base group: average representation of the group at large
- Clinical datasets development
- Various models are tested, including linear regression, logistic regression, CHAID analysis, discriminant analysis, Bayesian methods, and cluster analysis
50- Ties Between Lifestyles and Diseases
- Two types of statistical principles used in LBP
- Correlation: lifestyle-based behaviors which will result in a higher propensity for an individual to have the disease
- Obesity and latent lifestyle promote diabetes
- Causality: there are lifestyle-based behaviors that exist or change as a result of the disease
- Once diagnosed with diabetes, you become a diet food purchaser
51- Lifestyle-Based Prediction Example
52- Maternity Example
- Traditional maternity factors are based on age/sex/geographic/family enrollment
- In fact, a simple Bayesian model using number and ages of children can lift results by over 40%
- Lifestyle-Based Prediction can dramatically improve accuracy by including number and ages of children, financial indicators, household living parameters, etc.
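To illustrate what a "simple Bayesian model" on household composition might look like, the sketch below applies Bayes' rule to a single household indicator. The indicator (a child under age 3 in the household) and all three probabilities are hypothetical, chosen only to show the mechanics. They are not taken from the study, and the resulting lift is not comparable to the figure quoted above.

```python
def posterior(prior, p_feature_given_event, p_feature_given_no_event):
    """Bayes' rule: P(event | feature) from the prior and the two likelihoods."""
    evidence = (prior * p_feature_given_event
                + (1.0 - prior) * p_feature_given_no_event)
    return prior * p_feature_given_event / evidence


# Hypothetical inputs (not from the presentation):
base_rate = 0.05               # P(maternity claim next year) for the rating cell
p_child_given_claim = 0.30     # P(child under 3 in household | maternity claim)
p_child_given_no_claim = 0.10  # P(child under 3 in household | no maternity claim)

p_claim_given_child = posterior(base_rate, p_child_given_claim, p_child_given_no_claim)
lift = p_claim_given_child / base_rate  # how much the indicator sharpens the base rate
```

The same update can be chained over several indicators (number of children, age of youngest child) if they are treated as conditionally independent, which is itself a modeling assumption.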
53- Early Disease Detection Study (EDDS)
- Screening Data
- Over 100,000 patient screening records per condition
- Abdominal Aortic Aneurysm (AA Screening)
- Carotid Artery Disease (CA Ultrasound)
- Congestive Heart Failure (Cardiac Echo)
- Diabetes (Fasting Plasma Glucose)
- Osteoporosis (Bone Densitometer)
- Peripheral Arterial Disease (Ankle Brachial Index)
54- Early Disease Detection Study (EDDS)
- Health Information
- Health History
- 45 Personal health history elements
- Medical histories: stroke, heart attack, CAD, etc.
- Medical procedures: improve blood flow to heart or legs, prior screenings, medications, etc.
- Medical symptoms: chest pain, loss of speech, blurred vision, etc.
- 10 Family history elements
- Medical conditions
- Medical procedures
55- Early Disease Detection Study (EDDS)
- Lifestyle Information
- Lifestyle Elements
- 8 Exercise elements
- How often do you exercise
- What types of exercise
- 5 Tobacco elements
- 8 Nutritional elements
- Caffeine intake
- Calcium intake
- Fast food intake
- Food group intake
56- Early Disease Detection Study (EDDS) Results
- Predictive coefficients for the 21 lifestyle-based elements were relatively equal to the 55 health elements in all six cases
- Minimum: Coronary Artery Disease
- Lifestyle-based elements relatively equal to the health history elements on a stand-alone basis
- Maximum: Osteoporosis
- Lifestyle elements have twice the potential to affect the score compared to health history elements
- Combination of lifestyle with health elements increased health risk identification by over 45% (as defined by R-squared)
57- Currently in Place
- Applications and enrollment forms
- Individuals and groups
- Family information
- Age, sex and age differences in family members
- Employment
- Job description
- Height/weight
- Commute time
- Geography
58- HRAs and Other Surveys
- Excellent source for lifestyle-based data
- Several key problems
- Expensive to administer (>$10/member)
- Additional cost tied to participation incentives
- Poor participation rates
- Questionable results on the unhealthiest population
- Timing issues for new business/members
59- Publicly Available Consumer Data
- Who, What, Where & Why
60- Consumer Data in the United States
- The plethora of consumer data has dramatically changed our way of interacting with consumers
- Consumer data measured in Disk Storage per Person (DSPS)
- 1985: 0.02 Mbytes/yr
- 1995: 26 Mbytes/yr
- 2005: 3,500 Mbytes/yr
61- Consumer Data: Why?
- Primarily used for marketing, customer service and fraud purposes
- United States Gramm-Leach-Bliley Act of 1999
- Requires opt-out
- Permitted by law
- Joint marketing agreements
62- Consumer Data: Where?
- Government Public Records
- Census
- Financial Services
- Surveys
- Warranties
- Loyalty Programs
- Internet Purchases
- Subscriptions
63- Consumer Data: Who?
- 95% of U.S. Households
- Historically household-based
- Newest trend: individual-based
- Observed
- Implied
64- Consumer Data: What?
- Traditional Demographics
- Age, sex, race, etc.
- Financial
- Homeowner, credit score, mortgage/auto/credit card balances, etc.
- Household
- Marital status, number and ages of children, etc.
65- Consumer Data: What?
- Lifestyle-Based Elements
- Physical activeness
- Running, walking, cycling, aerobics, golf, tennis, etc.
- Physical inactiveness
- Television time, computer time, board games, stamp and coin collecting, etc.
66- Consumer Data: What?
- Lifestyle-Based Elements
- Food purchases
- Fast food, diet food, gourmet, vegetarian, etc.
- Wine and other alcohol
- Self improvement
- Health fitness, dieting/weight loss, etc.
- Mental wellness, personal improvement, etc.
67- Consumer Data: What?
- Lifestyle-Based Elements
- Tobacco
- Occupation
- Travel
- Motor vehicle type
- Recreational vehicles
- Other
68- The Expense of Consumer Data
- Medical Data Costs
- MIB, Rx, historical medical, etc. start at about $10.00 per individual and go up
- Consumer Data Costs
- Rapidly decreasing in price due to fierce competition
- 5 years ago, 100 data elements cost $2.00/head
- Today, over 500 data elements cost $0.25/head
- The data needed for medical modeling costs about $0.10/head or less
69- Practical Applications
- Individual
- Small group (2-50 employees)
- Medium group (51-500 employees)
- Large group (> 500 employees)
70- Practical Applications: Tele-underwriting
- Determiner of At Risk population
- Who to call
- Identifier of Risk Conditions
- What questions to ask
71- Practical Applications: Preferred Risk
- Determination of Jet Issue Application
- Clean application plus healthy score
- Determination of Preferred Status
- Current techniques rely on clean application plus "what?"
- Lifestyle indicators provide the best "what"
72- Massive Consumer Database
- Over 55 million records in the US
- Every US adult over the age of 50
- Over 500 fields of lifestyle-based data
- Updated monthly
- Scored for marketing and health risk status monthly
- Looking at real-time hosted applications
73- Cancer Policy Example
- Model Objective
- Develop models to identify the most risky cancer policies in terms of claims and track the quality of the portfolio
- Rank customers by their likelihood to have claims in the next 2.5 years
- Used in conjunction with the underwriting rules to validate and improve the underwriting process
74- Risk Model: Logistic Regression
- The risk model was based on a comparison of key customer demographics and lifestyle characteristics of policyholders or applicants who had claims in the performance window against those who did not have claims.
- The rank and plot distributions of claims vs. non-claims are compared for each demographic attribute.
- The attributes which showed significantly different distributions or trends were selected for the logistic regression analysis.
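The final step above, fitting a logistic regression on the selected attributes, can be sketched with plain gradient descent. The two inputs stand in for attributes of the kind the deck names (issue age and presence of children). The ten records, the age scaling, and the learning settings are all hypothetical, and a production model would be fitted with a statistics package rather than hand-written code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, learning_rate=0.1, epochs=2000):
    """Fit weights [bias, w1, w2, ...] by batch gradient descent on the log-loss."""
    k = len(rows[0])
    weights = [0.0] * (k + 1)
    n = len(rows)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            p = sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))
            err = p - y
            grad[0] += err
            for j, v in enumerate(x):
                grad[j + 1] += err * v
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad)]
    return weights


def claim_probability(weights, x):
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


# Hypothetical records: [(issue age - 50) / 10, children present (1/0)]; label 1 = had a claim.
ROWS = [[-2, 1], [-2, 0], [-1, 1], [-1, 0], [0, 1],
        [0, 0], [1, 1], [1, 0], [2, 1], [2, 0]]
CLAIMS = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]

weights = fit_logistic(ROWS, CLAIMS)
p_young = claim_probability(weights, [-2, 0])  # issue age 30, no children
p_old = claim_probability(weights, [2, 0])     # issue age 70, no children
```

The fitted probabilities are what would be used to rank applicants by claim risk. The attribute screening described in the second and third bullets would happen before this step.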
75- The Key Drivers of the Application Risk Model
- ISSUEAGE: Customer Age at the Time of Application
- CHILD: Presence of Children (Yes/No)
- MARRITAL: Marital Status
- VEHREG: Dominant Vehicle Life Style
- KID610: Have Kids Between 6 and 10 Years Old
- VEHSUV: Dominant Vehicle Life Style
- ADUL35: Adult Age Under 35 in Household
- ADUL65P: Adult Age Over 65 in Household
- RATIO1: Weight/Height for the First Individual
- RATIO2: Weight/Height for the Second Individual
76- 5% of the Customers Ranked by Scores Include 13% of Claims
- Conclusion: the Lorenz curve shows the application risk model rank-orders claim risk well.
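The statistic in this slide's title is a single point on a cumulative gains (Lorenz-type) curve: rank customers by score, take the top slice, and measure the share of all claims that falls inside it. A minimal sketch follows. The ten scored policies are hypothetical, and the closing calculation reads the slide's figures as 5% of customers and 13% of claims.

```python
def capture_rate(scores, claims, top_fraction):
    """Share of all claims found among the highest-scoring `top_fraction` of customers."""
    ranked = sorted(zip(scores, claims), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * top_fraction))
    captured = sum(claim for _, claim in ranked[:cutoff])
    return captured / sum(claims)


def lift(captured_share, top_fraction):
    """Captured share relative to what random selection of the same size would capture."""
    return captured_share / top_fraction


# Hypothetical scored policies (1 = had a claim in the window).
scores = [0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10, 0.05]
claims = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]

top20_share = capture_rate(scores, claims, 0.20)
top20_lift = lift(top20_share, 0.20)

# The slide's own point, read as percentages: 13% of claims in the top 5% of customers.
slide_lift = lift(0.13, 0.05)
```

Plotting `capture_rate` across all cutoffs from 0 to 1 traces the full curve. The further it bows above the diagonal, the better the model rank-orders risk.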
77- Model Summary
- By working at the top 20% of the policies, we have the potential to cut 43% of claims, which represents 45% of dollar losses. The hit rate (number of good policies sacrificed per bad policy stopped) is 14 (in the 2.5 year analysis window); model lift renders 117% gains in targeting.
79- Statistical Results
- Compared Traditional Underwriting and LBA Scores to Actual Claims Results
- LBA Beat Traditional Underwriting in All Statistical Measures
- Adjusted R-squared
- Bias
- MSE
- MAD
- AAD
80- Operational Overview - Individual
81- Conclusion
- Recognize much of medical costs cannot be predicted by traditional methods
- Look for nontraditional data sources
- The real value of consumer data in the healthcare industry lies in its ability to predict lifestyle-based diseases.
- Whether used as an identifier for health risks or as an early predictor of a disease state, we see the use of Lifestyle-Based Analytics accelerating rapidly within the healthcare and, in particular, disease management industries.
82- Questions?
- Jonathan Shreve, FSA, MAAA, Milliman
- Jon.Shreve_at_Milliman.com, 001 303-299-9400