Risk Prediction Models: Calibration, Recalibration, and Remodeling

1
Risk Prediction Models: Calibration,
Recalibration, and Remodeling
  • HST 951 Biomedical Decision Support
  • 12/04/2006 Lecture 23
  • Michael E. Matheny, MD, MS
  • Brigham and Women's Hospital
  • Boston, MA

2
Lecture Outline
  • Review Risk Model Performance Measurements
  • Individual Risk Prediction for Binary Outcomes
  • Inadequate Calibration is the rule not the
    exception
  • Addressing the problem with Recalibration and
    Remodeling

3
Model Performance Measures
  • Discrimination
      • Ability to distinguish well between patients who
        will and will not experience an outcome
  • Calibration
      • Ability of a model to match expected and observed
        outcome rates across all of the data

4
Discrimination: Area Under the Receiver Operating
Characteristic Curve
5
Discrimination: ROC Curve Generation
6
Calibration: Example Data
Expected Outcome    Observed Outcome
0.05                0
0.10                0
0.15                0
0.20                0
0.25                1
0.30                0
0.35                0
0.40                1
0.45                1
0.50                1
Total: 2.75         Total: 4
7
Standardized Outcomes Ratio
  • Most Aggregated (Crude) comparison of expected
    and observed values
  • 1 Value for Entire Sample
  • Risk-Adjusted by using a risk prediction model to
    generate expected outcomes
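
As a minimal sketch of the ratio described above, the following Python computes one observed-to-expected value for the ten-case example data from the calibration slide. The function name `outcome_ratio` is an illustrative choice, not something from the lecture.

```python
# Predicted risks and observed outcomes from the calibration example data.
expected = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
observed = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]


def outcome_ratio(observed, expected):
    """Observed event count divided by the sum of predicted risks."""
    total_expected = sum(expected)
    if total_expected <= 0:
        raise ValueError("expected event count must be positive")
    return sum(observed) / total_expected


ratio = outcome_ratio(observed, expected)
print(f"observed={sum(observed)}, expected={sum(expected):.2f}, ratio={ratio:.2f}")
```

With 4 observed events against 2.75 expected, the ratio is about 1.45, so the model under-predicts for the sample as a whole. A single number like this says nothing about where in the risk range the mismatch occurs.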

8
Standardized Mortality Ratios (SMR)
Cause of Death (ICD Codes 140-204)    Expected Deaths    Observed Deaths    SMR
All Cancer Deaths                     1325.37            1516               1.14
Lip, Oral Cavity and Pharynx          33.81              47                 1.39
Esophagus                             36.84              45                 1.22
Stomach                               54.58              72                 1.32
Colon, Rectum, Rectosigmoid           180.48             238                1.32
Pancreas                              62.51              72                 1.15
Trachea, Bronchus and Lung            430.98             481                1.12
Genitourinary                         168.90             162                0.96
Bladder                               45.02              50                 1.11
Lymphomas                             44.57              47                 1.05

Cancer Mortality Analysis: All Males, Scranton
City, 1975-1985
9
Outcome Ratios
  • Strengths
      • Simple
      • Frequently used in medical literature
      • Easily understood by clinical audiences
  • Weaknesses
      • Not a quantitative test of model calibration
      • Unable to show variations in calibration in
        different risk strata
      • Likely to underestimate the lack of fit

10
Outcome Ratios: Example Calibration Plot
11
Global Performance Measurements with Calibration
Components
  • Methods that calculate a value for each data
    point (most granular)
  • Pearson Test
  • Residual Deviance
  • Brier Score

12
Brier Score Calculation
Expected Outcome    Observed Outcome    (Yi - Pi)^2
0.05                0                   0.0025
0.10                0                   0.01
0.15                0                   0.0225
0.20                0                   0.04
0.25                1                   0.5625
0.30                0                   0.09
0.35                0                   0.1225
0.40                1                   0.36
0.45                1                   0.3025
0.50                1                   0.25
Sum                                     1.7625
13
Brier Score Calculation
  • To assess the accuracy of the set of predictions,
    Spiegelhalter's method is used
  • Expected Brier (EBrier) = 0.17875
  • Variance of Brier (VBrier) = 0.003292
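
The calculation can be sketched in Python as below, using the ten-case example data. The expectation and variance formulas are the usual ones for Spiegelhalter's test under the hypothesis that each prediction equals the true risk; treating them as the exact formulas behind the slide is an assumption, and the function names are illustrative.

```python
import math

# Predicted risks (Pi) and observed outcomes (Yi) from the example data.
expected = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
observed = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]


def brier_score(y, p):
    """Mean squared difference between outcome and predicted risk."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)


def spiegelhalter(y, p):
    """Return (Brier, expected Brier, variance of Brier, z statistic).

    Assumed formulas, valid if every prediction equals the true risk:
      E[Brier]   = mean of Pi * (1 - Pi)
      Var[Brier] = sum of (1 - 2*Pi)^2 * Pi * (1 - Pi), divided by n^2
    """
    n = len(y)
    brier = brier_score(y, p)
    e_brier = sum(pi * (1 - pi) for pi in p) / n
    v_brier = sum((1 - 2 * pi) ** 2 * pi * (1 - pi) for pi in p) / n ** 2
    z = (brier - e_brier) / math.sqrt(v_brier)
    return brier, e_brier, v_brier, z


brier, e_brier, v_brier, z = spiegelhalter(observed, expected)
print(f"Brier={brier:.5f}  z={z:.3f}")
```

The Brier score here is the mean of the squared errors, so the tabulated sum of 1.7625 over ten cases corresponds to 0.17625. The z statistic is compared against a standard normal distribution; a value near zero gives no evidence of miscalibration.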

14
Brier Score
  • Strengths
      • Quantitative evaluation
  • Weaknesses
      • Sensitive to sample size (larger samples are more
        likely to fail the test)
      • Sensitive to outliers (large differences between
        expected and observed)
      • Difficult to determine relative performance in
        risk subpopulations

15
Hosmer-Lemeshow Goodness of Fit
  • Divide the data into subgroups and compare
    observed to expected outcomes by subgroup
  • C Test
      • Divides the sample into 10 equal groups (by
        number of samples)
  • H Test
      • Divides the sample into 10 groups (by fixed
        intervals of predicted risk, e.g. 0-0.1, 0.1-0.2)
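
A rough sketch of the two grouping rules follows. It assumes the H test bins cases by fixed-width intervals of predicted risk, which is what the risk ranges in the CALICO H-test table suggest; both function names are hypothetical.

```python
def equal_size_groups(preds, k=10):
    """C-style grouping: rank cases by predicted risk, then split the
    ranking into k groups holding (nearly) the same number of cases."""
    order = sorted(range(len(preds)), key=lambda i: preds[i])
    groups = [[] for _ in range(k)]
    for rank, i in enumerate(order):
        groups[rank * k // len(preds)].append(i)
    return groups


def fixed_interval_groups(preds, k=10):
    """H-style grouping (assumed): bin cases by risk intervals of width 1/k.
    Group sizes can be very unequal, and some groups may be empty."""
    groups = [[] for _ in range(k)]
    for i, p in enumerate(preds):
        groups[min(int(p * k), k - 1)].append(i)
    return groups


preds = [0.02, 0.91, 0.07, 0.12, 0.55, 0.18]
print([len(g) for g in equal_size_groups(preds, k=3)])
print([len(g) for g in fixed_interval_groups(preds, k=10)])
```

Each function returns lists of case indices, so observed and expected totals per group can be summed afterwards. Ties in predicted risk are split arbitrarily by the equal-size rule, which is one way different implementations can disagree.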

16
Hosmer-Lemeshow Goodness of Fit
17
CALICO Registry: Hosmer-Lemeshow Goodness of Fit (C Test)
Predicted Mortality by Decile    Admissions    Observed Deaths    Expected Deaths    H-L Statistic
0.007 - 0.034                    466           2                  10.3               6.88
0.034 - 0.052                    461           17                 19.7               0.39
0.052 - 0.073                    454           27                 28.3               0.07
0.073 - 0.100                    478           24                 41.5               8.07
0.100 - 0.127                    450           35                 51.4               5.89
0.127 - 0.154                    469           53                 65.8               2.90
0.154 - 0.202                    465           66                 82.1               3.83
0.203 - 0.287                    461           93                 111.2              3.94
0.288 - 0.445                    463           138                162.5              5.70
0.445 - 0.968                    463           255                287.9              9.94
Total                            4630          710                860.8              47.61
C = 47.61, df = 8, p < 0.0001
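
The C statistic can be approximately recomputed from the grouped counts. The sketch below assumes the usual per-group term (O - E)^2 / (E * (1 - E/n)) and uses the closed-form chi-square tail for an even number of degrees of freedom, so only the standard library is needed. Because the expected deaths are rounded to one decimal, the result differs slightly from the tabulated 47.61.

```python
import math

# (admissions, observed deaths, expected deaths) per group, C-test table.
groups = [
    (466, 2, 10.3), (461, 17, 19.7), (454, 27, 28.3), (478, 24, 41.5),
    (450, 35, 51.4), (469, 53, 65.8), (465, 66, 82.1), (461, 93, 111.2),
    (463, 138, 162.5), (463, 255, 287.9),
]


def hosmer_lemeshow(groups):
    """Sum of (O - E)^2 / (E * (1 - E/n)) over the groups (assumed form)."""
    return sum((obs - exp) ** 2 / (exp * (1 - exp / n)) for n, obs, exp in groups)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, valid only for even df:
    exp(-x/2) * sum_{j < df/2} (x/2)^j / j!"""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for j in range(1, df // 2):
        term *= half / j
        total += term
    return math.exp(-half) * total


stat = hosmer_lemeshow(groups)
p_value = chi2_sf_even_df(stat, len(groups) - 2)  # df = groups - 2 = 8
print(f"H-L statistic={stat:.2f}, df={len(groups) - 2}, p={p_value:.2e}")
```

For odd degrees of freedom a statistics library would be needed instead of `chi2_sf_even_df`.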
18
Calibration Plot: C Test Data
19
CALICO Registry: Hosmer-Lemeshow Goodness of Fit (H Test)
Predicted Mortality by Decile    Admissions    Observed Deaths    Expected Deaths    H-L Statistic
0.007 - 0.100                    1859          70                 99.9               9.46
0.100 - 0.200                    1348          149                192.0              11.24
0.200 - 0.300                    555           115                135.5              4.10
0.301 - 0.400                    323           97                 110.9              2.65
0.400 - 0.499                    185           58                 83.0               13.64
0.500 - 0.598                    131           70                 71.7               0.09
0.600 - 0.694                    103           58                 66.4               3.02
0.701 - 0.800                    65            48                 48.6               0.03
0.803 - 0.896                    48            34                 40.7               7.29
0.904 - 0.968                    13            11                 12.1               1.59
Total                            4630          710                860.8              53.10
H = 53.10, df = 8, p < 0.0001
20
Calibration Plot: H Test Data
21
Hosmer-Lemeshow Goodness of Fit
  • Strengths
      • Quantitative evaluation
      • Assesses calibration in risk subgroups
  • Weaknesses
      • Disagreement over how to generate subgroups (C
        versus H)
      • Even with the same method (C or H), different
        statistical packages generate different results
        due to rounding rule differences
      • Sensitive to sample size (larger samples are more
        likely to fail the test)
      • Sensitive to outliers (but to a lesser degree
        than Brier Score)

22
Risk Prediction Models for Binary Outcomes
  • Case Data (Variables X1..Xi)
  • -> Predictive Model for Outcome Y (Yes/No)
  • -> Case Outcome Prediction (0 to 1)
  • Logistic Regression
  • Bayesian Networks
  • Artificial Neural Networks
  • Support Vector Machine Regression

23
Risk Prediction Models: Clinical Utility
  • Risk Stratification for Research and Clinical
    Practice
  • Risk-Adjusted Assessment of Providers and
    Institutions
  • Individual risk prediction

24
Individual Risk Prediction
  • Good discrimination is necessary but not
    sufficient for individual risk prediction
  • Calibration is the key index for individual risk
    prediction

25
Inadequate Calibration: Why?
  • Models require external validation to be
    generally accepted, and in those studies the
    general trend is:
      • Discrimination retained
      • Calibration fails
  • Factors that contribute to inadequate model
    calibration in clinical practice:
      • Regional Variation
          • Different Clinical Practice Standards
          • Different Patient Case Mixes
      • Temporal Variation
          • Changes in Clinical Practice
          • New diagnostic tools available
          • Changes in Disease Incidence and Prevalence

26
Individual Risk Prediction: Clinical Examples
  • 10-year hard coronary heart disease risk
    estimation
      • Logistic Regression
      • Framingham Heart Study
  • Calibration Problems
      • Low SES
      • Young age
      • Female
      • Non-US populations

Kannel et al. Am J Cardiol, 1976
27
Individual Risk Prediction: Clinical Examples
  • Lifetime Invasive Breast Cancer Risk Estimation
      • Logistic Regression
      • Gail Model
  • Calibration Problems
      • Age < 35
      • Prior Hx Breast CA
      • Strong Family Hx
      • Lack of regular mammograms

Gail et al. JNCI, 1989
28
Individual Risk Prediction: Clinical Examples
  • Intensive Care Unit Mortality Prediction
  • APACHE-II
  • APACHE-III
  • MPM0
  • MPM0-II
  • SAPS
  • SAPS-II

29
Individual Risk Prediction: Clinical Examples
Ohno-Machado, et al. Annu Rev Biomed Eng.
2006;8:567-99
30
Individual Risk Prediction: Clinical Examples
Ohno-Machado, et al. Annu Rev Biomed Eng.
2006;8:567-99
31
Individual Risk Prediction: Clinical Examples
  • Interventional Cardiology Mortality Prediction

Model       Dates        Location          Sample
NY 1992     1991         NY                5827
NY 1997     1991-1994    NY                62670
CC 1997     1993-1994    Cleveland, OH     12985
NNE 1999    1994-1996    NH, ME, MA, VT    15331
MI 2001     1999-2000    Detroit, MI       10796
BWH 2001    1997-1999    Boston, MA        2804
ACC 2002    1998-2000    National          100253
Matheny, et al. J Biomed Inform. 2005
Oct;38(5):367-75
32
Individual Risk Prediction: Clinical Examples
Model       Deaths    AUC     HL χ²    HL (p)
NY 1992     96.7      0.82    31.1     < 0.001
NY 1997     61.6      0.88    32.2     < 0.001
CC 1997     78.8      0.88    27.8     < 0.001
NNE 1999    56.2      0.89    45.9     < 0.001
MI 2001     61.8      0.86    30.4     < 0.001
BWH 2001    136.1     0.89    39.7     < 0.001
ACC 2002    49.9      0.90    42.0     < 0.001
BWH 2004    70.5      0.93    7.61     0.473
Observed Deaths: 71
Matheny, et al. J Biomed Inform. 2005
Oct;38(5):367-75
33
Inadequate Calibration: What to do?
  • In most cases, risk prediction models are
    developed on much larger data sets than are
    available for local model generation.
  • Decreased variance and increased stability of
    model covariate values
  • Large, external models (especially those that
    have been externally validated) are generally
    accepted by domain experts
  • Goal is to throw out as little prior model
    information as possible while improving
    performance

34
Recalibration and Remodeling: General Evaluation
Rules
  • Model recalibration or remodeling follows the
    same rules of evaluation as model building in
    general
  • Separate training and test data, or
  • Cross-Validation, etc
  • If temporal issues are central to that domain's
    calibration problems, training data should be
    both before (in time) and separate from testing
    data

35
Discrimination versus Calibration
Model A Expected Outcome    Model B Expected Outcome    Observed Outcome
0.05                        0.33                        0
0.10                        0.45                        0
0.15                        0.47                        0
0.20                        0.53                        0
0.25                        0.68                        1
0.30                        0.77                        0
0.35                        0.81                        0
0.40                        0.93                        1
0.45                        0.95                        1
0.50                        0.96                        1
Total: 2.75                 Total: 6.88                 Total: 4
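
To make the contrast concrete, the sketch below computes the area under the ROC curve for both models as the fraction of (event, non-event) pairs that are ranked correctly, and compares it with the expected event totals. The pairwise formulation and the function name `auc` are illustrative choices rather than the lecture's own method.

```python
model_a = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
model_b = [0.33, 0.45, 0.47, 0.53, 0.68, 0.77, 0.81, 0.93, 0.95, 0.96]
observed = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]


def auc(y, p):
    """Fraction of (event, non-event) pairs in which the event case has the
    higher predicted risk; ties count as half."""
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum((a > b) + 0.5 * (a == b) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))


for name, preds in (("A", model_a), ("B", model_b)):
    print(f"Model {name}: AUC={auc(observed, preds):.3f}, "
          f"expected={sum(preds):.2f}, observed={sum(observed)}")
```

Both models rank the cases identically, so their AUC is the same, yet Model A expects 2.75 events and Model B expects 6.88 against 4 observed. Discrimination depends only on the ordering of the predictions, whereas calibration depends on their magnitude.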
36
Logistic Regression: General Equation
  • B0 is the intercept of the equation, which
    represents (on the log-odds scale) the outcome
    probability in the absence of all other risk
    factors (baseline risk)
  • The model assumes the covariates are independent
    of one another, and each Bx is the natural log of
    the odds ratio for the risk attributable to that
    risk factor
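
The equation itself did not survive in this transcript. For reference, the standard form of a logistic regression model is given below, writing the slide's B0 and Bx as \beta_0 and \beta_j; that the slide used exactly this notation is an assumption.

```latex
P(Y = 1 \mid X_1, \dots, X_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \dots + \beta_k X_k)}}
\qquad\Longleftrightarrow\qquad
\ln\frac{P}{1 - P} = \beta_0 + \sum_{j=1}^{k} \beta_j X_j
```

With every X_j set to zero, the predicted probability reduces to 1 / (1 + e^{-\beta_0}), which is the baseline risk carried by the intercept.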

37
Logistic Regression: Original Model and Cases
Model
Variable            β coeff    Case 1    Case 2    Case 3    Case 4
Intercept           -3         1         1         1         1
Variable 1          0.2        0         1         1         1
Variable 2          0.5        0         0         1         1
Variable 3          1.5        0         0         0         1
Case Probability               0.047     0.057     0.091     0.310
  • Minimum predicted risk for each case is intercept
    only
  • Adjusting intercept scales all results

Case 4 has Outcome 1; Cases 1-3 have Outcome 0
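
As a quick check on the table, the case probabilities can be recomputed from the coefficients. In this sketch the coefficient for Variable 3 is taken to be 1.5, since that is the value that reproduces the Case 4 probability of 0.310 and the expected total of 0.51 quoted on later slides; the function name is illustrative.

```python
import math


def logistic_probability(coefficients, case):
    """Inverse logit of the linear predictor sum(beta_j * x_j)."""
    z = sum(b * x for b, x in zip(coefficients, case))
    return 1.0 / (1.0 + math.exp(-z))


# Order: intercept, Variable 1, Variable 2, Variable 3.
# ASSUMPTION: Variable 3 = 1.5 (the value consistent with Case 4 = 0.310).
beta = [-3.0, 0.2, 0.5, 1.5]
cases = [
    [1, 0, 0, 0],  # Case 1: intercept only
    [1, 1, 0, 0],  # Case 2
    [1, 1, 1, 0],  # Case 3
    [1, 1, 1, 1],  # Case 4
]

probs = [logistic_probability(beta, case) for case in cases]
print([round(p, 3) for p in probs], "expected events:", round(sum(probs), 2))
```

Case 1 has no risk factors present, so its probability is the intercept-only minimum that the first bullet refers to.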
38
LR Intercept Recalibration
  • The proportion of risk contributed by the
    intercept (baseline) can be calculated for a data
    set by

39
LR Intercept Recalibration
  • The intercept contribution to risk (RiskInt())
    is multiplied by the observed event rate, and
    converted back to a Beta Coefficient from a
    probability
  • A relative weakness of the method is that values
    can exceed 1, and must be truncated

40
LR Intercept Recalibration: Example Model and Cases
Variable      Old β coeff    New β coeff    Case 1    Case 2    Case 3    Case 4
Intercept     -3.0           -2.2           1         1         1         1
Variable 1    0.2            0.2            0         1         1         1
Variable 2    0.5            0.5            0         0         1         1
Variable 3    1.5            1.5            0         0         0         1
New Prob.                                   0.099     0.119     0.182     0.500
Orig Prob.                                  0.047     0.057     0.091     0.310
  • Original Expected 0.51
  • Intercept Recalibration Expected 0.90
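
The effect of swapping in the new intercept can be sketched as follows. The code only applies the recalibrated intercept of -2.2 given in the table; it does not reproduce how that value was derived from the data. Variable 3 is again taken as 1.5, the value consistent with the tabulated probabilities, and the helper names are hypothetical.

```python
import math


def inv_logit(z):
    """Convert a log-odds value (beta scale) to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Convert a probability back to a log-odds (beta) value."""
    return math.log(p / (1.0 - p))


# ASSUMPTION: Variable 3 = 1.5, consistent with the case probabilities.
SLOPES = [0.2, 0.5, 1.5]
CASES = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]


def case_probabilities(intercept):
    """Predicted risk for each case under a given intercept."""
    return [
        inv_logit(intercept + sum(b * x for b, x in zip(SLOPES, case)))
        for case in CASES
    ]


original = case_probabilities(-3.0)
recalibrated = case_probabilities(-2.2)
print("original expected:    ", round(sum(original), 2))
print("recalibrated expected:", round(sum(recalibrated), 2))
print("intercept recovered from Case 1:", round(logit(recalibrated[0]), 3))
```

Only the intercept changes, so every case moves in the same direction and the ordering of the cases (and therefore discrimination) is untouched.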

41
LR Slope Recalibration
  • In this method, the output probability of the
    original LR equation is used to model a new LR
    equation with that output as the only covariate

42
LR Slope Recalibration: Example Model and Cases
New Model
Variable                 β coeff    Case 1    Case 2    Case 3    Case 4
New Model Intercept      -3.0       1         1         1         1
Orig Model Result        11.0       0.047     0.057     0.091     0.310
New Probability                     0.077     0.086     0.119     0.601
Intercept Probability               0.099     0.119     0.182     0.500
  • Original Expected 0.51
  • Slope Recalibration Expected 0.88
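
Applying the slope-recalibrated model amounts to a second logistic equation whose only covariate is the original model's output. The sketch below uses the tabulated coefficients (intercept -3.0, slope 11.0) and the rounded original probabilities, so results can differ from the table in the third decimal place; fitting those two coefficients on local data is not shown.

```python
import math


def slope_recalibrated(p_orig, intercept=-3.0, slope=11.0):
    """New probability from a logistic model whose single covariate is the
    original model's predicted probability."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * p_orig)))


orig_probs = [0.047, 0.057, 0.091, 0.310]  # rounded values from the table
new_probs = [slope_recalibrated(p) for p in orig_probs]
print([round(p, 3) for p in new_probs], "expected events:", round(sum(new_probs), 2))
```

Because the transformation is monotonic in the original probability, the ranking of cases is preserved, while low and high risks are rescaled by different amounts.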

43
LR Covariate Recalibration
Variable      Old β coeff    New β coeff    Case 1    Case 2    Case 3    Case 4
Intercept     -3             -2.5           1         1         1         1
Variable 1    0.2            0.1            0         1         1         1
Variable 2    0.5            0.3            0         0         1         1
Variable 3    1.5            3.0            0         0         0         1
New Prob                                    0.076     0.083     0.109     0.711
Orig Prob                                   0.047     0.057     0.091     0.310
  • Original Expected 0.51
  • Covariate Recalibration Expected 0.97

44
Recalibration Example: Local Institutional Data
Year    Cases    Mortality (%)
2002    1947     15 (0.8)
2003    1841     33 (1.8)
2004    1767     33 (1.9)
45
Recalibration Example: External Risk Prediction
Models
Model                     Abbrev    Outcomes    Sample    Event Rate (%)
National ACC              ACC       707         50123     1.4
Northern New England      NNE       165         15331     1.1
University of Michigan    MIC       169         10796     1.6
Cleveland Clinic          CCL       169         12985     1.3
46
Results: No Recalibration
Year    Model    Observed    Expected    HL χ²
2003    ACC      33          414         634
2003    NNE      33          39.0        24.3
2003    MIC      33          27.2        6.6
2003    CCL      33          56.3        14.0
2004    ACC      33          418         641
2004    NNE      33          36.6        51.0
2004    MIC      33          23.3        22.9
2004    CCL      33          60.3        21.2
47
Results: LR Intercept Recalibration
Year    Model    Observed    Expected    HL χ²
2003    ACC      33          45.1        10.0
2003    NNE      33          26.0        43.6
2003    MIC      33          22.1        12.7
2003    CCL      33          24.8        10.5
2004    ACC      33          34.1        14.6
2004    NNE      33          28.9        69.8
2004    MIC      33          26.5        17.6
2004    CCL      33          33.5        14.2
48
Results: LR Slope Recalibration
Year    Model    Observed    Expected    HL χ²
2003    ACC      33          24.0        12.7
2003    NNE      33          18.6        32.9
2003    MIC      33          20.1        24.0
2003    CCL      33          25.5        15.2
2004    ACC      33          32.0        35.7
2004    NNE      33          31.2        21.7
2004    MIC      33          31.0        23.6
2004    CCL      33          31.6        13.2
49
Clinical Applications: CALICO
  • California Intensive Care Outcomes (CALICO)
    Project
  • 23 Volunteer Hospitals beginning in 2002
  • Compare hospital outcomes for selected
    conditions, procedures, and intensive care unit
    types
  • Identified popular, well-validated models
  • MPM0-II, SAPS-II, APACHE-II, APACHE-III
  • Evaluated the models on CALICO data and, after
    determining that they were inadequately calibrated,
    recalibrated each model using the LR Covariate
    Recalibration method

50
Clinical Applications: CALICO
51
Examples on Website
  • Most of the calculations from this presentation
    are available on the website in an Excel workbook

52
Michael Matheny, MD, MS mmatheny_at_dsg.harvard.edu
Brigham and Women's Hospital, Thorn 309, 75 Francis
Street, Boston, MA 02115
The End