1
Model Evaluation and Selection via Prediction

2
Real contributors
  • Lu Tian (Northwestern University)
  • Tianxi Cai (Harvard University)
  • Hajime Uno (Harvard University, DFCI)

3
Outline
  • Background and motivation
  • Developing and evaluating prediction rules based
    on a set of markers for
  • Non-censored outcomes
  • Censored event time outcomes
  • Evaluating the incremental value of a biomarker
    over
  • the entire population
  • various sub-populations
  • Incorporating the patient-level precision of the
    prediction
  • Prediction intervals/sets
  • Remarks

4
  • Regression modeling, tree classification, etc.?
  • Association
  • Prediction

5
Model checking?
  • Goodness-of-fit test (lack-of-fit test)? Is the
    p-value a good metric for measuring lack of fit?
  • Quantitative approach? R-squared? Likelihood
    ratio-type statistic? Need a heuristically
    interpretable distance function? (cost-benefit)
  • Every model is an approximation to the truth?

6
Background and Motivation
  • Personalized medicine: using information about a
    person's biological and genetic make-up to tailor
    strategies for the prevention, detection and
    treatment of disease
  • Important step: develop prediction rules that can
    accurately predict the disease outcome or
    treatment response

7
Background and Motivation
  • Accurate prediction of disease outcome and
    treatment response, however, is a complex and
    difficult task.
  • Developing prediction rules involves
  • Identifying important predictors
  • Evaluating the accuracy of the prediction
  • Evaluating the incremental value of new markers

8
Background and Motivation: AIDS Clinical Trial
ACTG 320
  • Study objective: to compare
  • 3-drug regimen (n = 579): Zidovudine + Lamivudine +
    Indinavir
  • 2-drug regimen (n = 577): Zidovudine + Lamivudine
  • Identify biomarkers for predicting treatment
    response
  • How well can we predict the treatment response?
  • Is RNA needed?

9
Background and Motivation
Is RNA needed?
[Table: candidate predictors]
10
Background and Motivation: AIDS Clinical Trial
Regression Coefficients
  • Coefficient for ΔRNA at week 8 is highly
    significant →
  • Is RNA needed for a more precise prediction of the
    responses?

11
Background and Motivation
Is RNA needed?
Y = ΔCD4 at week 8
Z = Predictors
12
Developing Prediction Rules Based on a Set of
Markers
  • Regression approach to approximate Y given Z
  • Non-censored outcome: linear regression
  • Survival outcome:
  • Proportional hazards model (example: Framingham
    Risk Score)
  • Time-specific prediction models
  • Regression modeling as a vehicle (see the sketch
    below):
  • the procedure has to be valid even when the
    imposed statistical model is not the true model!
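
A minimal Python sketch of this idea, not taken from the talk: fit a working linear model by least squares and use it purely as a prediction rule; the data arrays are placeholders.

import numpy as np

def fit_working_linear_rule(Z, Y):
    """Fit the working model E(Y | Z) = a + b'Z by least squares and
    return a prediction rule; the rule is used for prediction even
    when the linear model is not the true model."""
    X = np.column_stack([np.ones(len(Y)), Z])      # add an intercept
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least-squares fit
    return lambda Znew: np.column_stack([np.ones(len(Znew)), Znew]) @ beta

A rule fitted this way can be plugged into the loss and cross-validation sketches on the later slides.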

13
Developing and Evaluating Prediction Rules
  • Predict Y with Z based on the prediction model
  • Evaluate the performance of the prediction by the
    average distance between Ŷ and Y
  • The utility or cost of predicting Y as Ŷ is a
    distance function, say L(Y, Ŷ)
  • The average distance is D = E[ L(Y, Ŷ) ]
  • Examples (sketched below)
  • Absolute prediction error
  • Total Cost of Risk Stratification
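
The two example distances can be written down concretely; in the sketch below the cost weights for the risk-stratification loss are illustrative assumptions, not values from the talk.

import numpy as np

def absolute_prediction_error(y, y_hat):
    """Estimate D = E|Y - Yhat| by the sample average of |Y - Yhat|."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

def risk_stratification_cost(y, y_hat_class, cost_fp=1.0, cost_fn=4.0):
    """Average cost of a binary risk stratification; the false-positive
    and false-negative costs are illustrative placeholders."""
    y = np.asarray(y)
    y_hat_class = np.asarray(y_hat_class)
    fp = (y_hat_class == 1) & (y == 0)
    fn = (y_hat_class == 0) & (y == 1)
    return float(np.mean(cost_fp * fp + cost_fn * fn))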

14
Evaluating and Comparing Prediction Rules
  • The performance of the fitted prediction
    model/rule can be estimated by the empirical
    average of the distance, D̂
  • Prediction model/rule comparison
  • Prediction with E(Y | Z) = g1(a'Z) vs E(Y | W) =
    g2(b'W)
  • Compare the two models/rules by comparing D̂1 and D̂2
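
Continuing the sketches above with hypothetical arrays Y, Z and W (say, predictors without and with the RNA markers), the apparent average loss of each working rule is computed on the same data and compared directly.

# Apparent-error comparison of two working rules; the cross-validated
# version appears two slides later.
rule_Z = fit_working_linear_rule(Z, Y)            # rule based on Z
rule_W = fit_working_linear_rule(W, Y)            # rule based on W
D_hat_Z = absolute_prediction_error(Y, rule_Z(Z))
D_hat_W = absolute_prediction_error(Y, rule_W(W))
difference = D_hat_Z - D_hat_W                    # estimated D1 - D2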

15
Variability in the Estimated Prediction
Performance Measures
  • Variability in the prediction errors
  • Estimate D̂ = 50: is SE = 1? Is SE = 50?
  • Inference about D and Δ = D1 - D2
  • Confidence intervals based on large-sample
    approximations to the distributions of D̂ and Δ̂
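
The talk builds the intervals from large-sample approximations; the resampling sketch below is only a rough stand-in with the same intent (refit the rule on each bootstrap sample and recompute the average loss), and fit/loss stand for any of the choices in the earlier sketches.

import numpy as np

def bootstrap_ci_for_D(Y, Z, fit, loss, n_boot=1000, level=0.95, seed=0):
    """Percentile-bootstrap interval for the average loss D; a simple
    substitute for the large-sample approximations used in the talk."""
    rng = np.random.default_rng(seed)
    Y, Z = np.asarray(Y), np.asarray(Z)
    n = len(Y)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)               # resample subjects
        rule = fit(Z[idx], Y[idx])                # refit on the resample
        stats.append(loss(Y[idx], rule(Z[idx])))  # average loss on it
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)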

16
Bias Correction
  • Bias issue with apparent-error-type estimators
  • Bias correction via cross-validation (see the
    sketch below)
  • Data partition: training sets Tk and validation
    sets Vk
  • For each partition
  • Obtain the fitted rule based on observations in Tk
  • Obtain the error estimate based on observations in Vk
  • Obtain the cross-validated estimator by averaging
    over the partitions
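
A minimal K-fold rendering of this scheme; fit and loss are the fitting procedure and distance function from the earlier sketches, and K = 5 is an arbitrary choice.

import numpy as np

def cross_validated_error(Y, Z, fit, loss, K=5, seed=0):
    """Randomly partition the data, fit the rule on each training part
    T_k, evaluate the loss on the held-out part V_k, and average."""
    rng = np.random.default_rng(seed)
    Y, Z = np.asarray(Y), np.asarray(Z)
    folds = np.array_split(rng.permutation(len(Y)), K)
    errors = []
    for k in range(K):
        v = folds[k]                            # validation set V_k
        t = np.concatenate([folds[j] for j in range(K) if j != k])  # T_k
        rule = fit(Z[t], Y[t])
        errors.append(loss(Y[v], rule(Z[v])))
    return float(np.mean(errors))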

17
Example: AIDS Clinical Trial
  • Objective: identify biomarkers to predict the
    treatment response
  • Outcome: Y = ΔCD4 at week 24
  • Predictors: Z = (Age, CD4 at week 0, ΔCD4 at week 8,
    RNA at week 0, ΔRNA at week 8)
  • Working model: E(Y | Z) = β'Z

18
Example (AIDS Clinical Trial): Incremental Value of
RNA
[Table: point estimates, standard error estimates
and 95% confidence intervals]
19
Incremental Value of RNA within Various
Sub-populations
20
Example: Breast Cancer Gene Expression Study
  • Objective: construct a new classifier that can
    accurately predict future disease outcome
  • van't Veer et al. (2002) established a classifier
    based on a 70-gene profile
  • good- or poor-prognosis signature based on the
    correlation with the previously determined
    average profile in tumors from patients with good
    prognosis
  • Classify subjects as
  • Good prognosis if gene score > cut-off
  • Poor prognosis if gene score < cut-off
  • van de Vijver et al. (2002) evaluated the accuracy
    of this classifier by using hazard ratios and
    signature-specific Kaplan-Meier curves

21
Example: Breast Cancer Gene Expression Study
  • Data consist of 295 subjects
  • Outcome: T = time to death
  • Predictors: lymph-node status, estrogen-receptor
    status, gene score
  • We are interested in
  • Constructing prediction rules for identifying
    subjects who would survive t years, i.e. Y = I(T ≥
    t) = 1
  • Evaluating the incremental value of the gene
    score

22
Example (Breast Cancer Data): Predicting 10-year
Survival
23
Evaluating the Prediction Rule Based on Various
Accuracy Measures
  • For a future patient with T0 and Z0, we predict
    Y0 = I(T0 ≥ t) by Ŷ0
  • Classification accuracy measures
  • Sensitivity
  • Specificity
  • Prediction accuracy measures (positive and
    negative predictive values); see the sketch below
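
A plain-count Python sketch of the four measures for a binary rule; it ignores censoring, which the talk handles with survival-specific estimators, so treat it only as a reminder of the definitions.

import numpy as np

def accuracy_measures(y_true, y_pred):
    """Empirical sensitivity, specificity, PPV and NPV for binary
    outcomes y_true and predicted labels y_pred."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {
        "sensitivity": tp / (tp + fn),   # P(Yhat = 1 | Y = 1)
        "specificity": tn / (tn + fp),   # P(Yhat = 0 | Y = 0)
        "PPV":         tp / (tp + fp),   # P(Y = 1 | Yhat = 1)
        "NPV":         tn / (tn + fn),   # P(Y = 0 | Yhat = 0)
    }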

24
Example (Breast Cancer Data): Predicting 10-year
Survival
25
Example: Breast Cancer Data
  • To compare
  • Model II: g(a0 + a1·Node + a2·ER)
  • Model III: g(a0 + a1·Node + a2·ER + a3·Gene)
  • Choosing cut-off values for each model to achieve
    SE = 0.69, which is an attainable value for Model
    II, then
  • Model II → SP = 0.45, PPV = 0.35, NPV = 0.77
  • Model III → SP = 0.75, PPV = 0.54, NPV = 0.85
  • 95% CI for the difference in
  • SP: (0.11, 0.45); PPV: (0.01, 0.24); NPV:
    (0.06, 0.19)
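
To mirror the matched-sensitivity comparison, here is a hypothetical helper (not from the talk) that picks the largest cut-off whose rule I(score > c) reaches a target sensitivity; the two models can then be compared in SP, PPV and NPV at that common sensitivity.

import numpy as np

def cutoff_for_target_sensitivity(score, y_true, target=0.69):
    """Largest cut-off c such that the rule I(score > c) has empirical
    sensitivity >= target; returns None if the target is unattainable."""
    score, y_true = np.asarray(score), np.asarray(y_true)
    n_pos = np.sum(y_true == 1)
    for c in np.sort(np.unique(score))[::-1]:     # high to low cut-offs
        pred = (score > c).astype(int)
        sens = np.sum((pred == 1) & (y_true == 1)) / n_pos
        if sens >= target:
            return float(c)
    return None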

26
Prediction Interval: Accounting for the Precision
of the Prediction
  • Based on a prediction model, we
  • predict the response Ŷ0
  • summarize the corresponding population average
    accuracy
  • What if the population average accuracy of 70% is
    not satisfactory? How can we achieve 90% accuracy?
  • What if the rule can predict Y0 precisely for
    certain Z0, while for other Z0 it fails to
    predict Y0 accurately?
  • Can we account for the precision of the prediction
    and identify patients who would need further
    assessment?

27
(No Transcript)
28
Prediction Interval
  • To account for patient-level prediction error, one
    may instead predict a set of outcome values for Y0
    such that the set covers Y0 with probability at
    least 1 - α given Z0
  • The optimal interval for the sub-population with
    Z0 = z0 is built from the
  • estimated conditional density function of Y given
    Z (see the binary-outcome sketch below)
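
For a binary outcome the construction reduces to taking the smallest set of outcome values whose estimated conditional probability given Z0 reaches the desired level; the threshold form below is a simplification of the density-based rule in the talk (and, unlike that rule, it never returns the empty set).

def prediction_set_binary(p_hat, level=0.90):
    """Smallest outcome set with estimated conditional probability
    >= level, where p_hat is the estimated P(Y0 = 1 | Z0)."""
    if p_hat >= level:
        return {1}
    if 1.0 - p_hat >= level:
        return {0}
    return {0, 1}

With this rule a predicted risk of 0.04 gives the 90% prediction set {0}, while a predicted risk of 0.51 gives the uninformative set {0, 1}, matching the two cases shown on the next slides.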

29
Example: Breast Cancer Study
  • Data: 295 patients
  • Response: 10-year survival
  • Predictors: lymph-node status, estrogen-receptor
    status, gene score
  • Working model for the 10-year survival probability
  • Possible prediction sets: ∅, {0}, {1}, {0, 1}
  • Classical prediction considers {0} and {1} only.

30
[Figure: 90% prediction sets for two patients; a
predicted risk of 0.04 gives the 90% prediction set
{0}, while a predicted risk of 0.51 gives {0, 1}]
31
Example (Breast Cancer Study): Prediction Sets
Based on Clinical + Gene Score
32
Remarks
  • Proper choice of the accuracy/cost measure
  • Classification accuracy vs predictive values
  • Utility function: what is the consequence of
    predicting a subject with outcome Y as Ŷ?
  • With an expensive or invasive marker
  • Should it be applied to the entire population?
  • Is it helpful for a certain sub-population?
  • Should the cost of the marker be considered when
    evaluating its value?