1
Model Evaluation and Selection via Prediction

2
Real contributors
  • Lu Tian (Northwestern University)
  • Tianxi Cai (Harvard University)
  • Hajime Uno (Harvard University, DFCI)

3
Outline
  • Background and motivation
  • Developing and evaluating prediction rules based
    on a set of markers for
  • Non-censored outcomes
  • Censored event time outcomes
  • Evaluating the incremental value of a biomarker
    over
  • the entire population
  • various sub-populations
  • Incorporating the patient-level precision of the
    prediction
  • Prediction intervals/sets
  • Remarks

4
  • Regression modeling, tree classification, etc.?
  • Association
  • Prediction

5
Model checking?
  • Goodness-of-fit test (lack-of-fit test)? Is the
    p-value a good metric for measuring lack of fit?
  • Quantitative approach? R-squared? Likelihood
    ratio-type statistic? Need a heuristically
    interpretable distance function? (cost-benefit)
  • Every model is an approximation to the truth?

6
Background and Motivation
  • Personalized medicine: using information about a
    person's biological and genetic make-up to tailor
    strategies for the prevention, detection and
    treatment of disease
  • Important step: develop prediction rules that can
    accurately predict the disease outcome or
    treatment response

7
Background and Motivation
  • Accurate prediction of disease outcome and
    treatment response, however, is a complex and
    difficult task.
  • Developing prediction rules involves
  • Identifying important predictors
  • Evaluating the accuracy of the prediction
  • Evaluating the incremental value of new markers

8
Background and Motivation: AIDS Clinical Trial
ACTG 320
  • Study objective: to compare
  • 3-drug regimen (n = 579): Zidovudine + Lamivudine +
    Indinavir
  • 2-drug regimen (n = 577): Zidovudine + Lamivudine
  • Identify biomarkers for predicting treatment
    response
  • How well can we predict the treatment response?
  • Is RNA needed?

9
Background and Motivation
Is RNA needed?
[Table: candidate predictors]
10
Background and Motivation: AIDS Clinical Trial
Regression Coefficients
  • Coefficient for ΔRNA at week 8 is highly
    significant →
  • Is RNA needed for a more precise prediction of the
    responses?

11
Background and Motivation
Is RNA needed?
Y = ΔCD4 at week 8
Z = Predictors
12
Developing Prediction Rules Based on a Set of
Markers
  • Regression approach to approximate Y given Z
  • Non-censored outcome: linear regression
  • Survival outcome:
  • Proportional hazards model (example: Framingham
    Risk Score)
  • Time-specific prediction models
  • Regression modeling as a vehicle (see the sketch
    below):
  • the procedure has to be valid even when the
    imposed statistical model is not the true model!
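
A minimal Python sketch of this idea, not taken from the talk: fit a working linear model by least squares and use it purely as a prediction rule; the data arrays are placeholders.

import numpy as np

def fit_working_linear_rule(Z, Y):
    """Fit the working model E(Y | Z) = a + b'Z by least squares and
    return a prediction rule; the rule is used for prediction even
    when the linear model is not the true model."""
    X = np.column_stack([np.ones(len(Y)), Z])      # add an intercept
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)   # least-squares fit
    return lambda Znew: np.column_stack([np.ones(len(Znew)), Znew]) @ beta

A rule fitted this way can be plugged into the loss and cross-validation sketches on the later slides.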

13
Developing and Evaluating Prediction Rules
  • Predict Y with Z based on the prediction model
  • Evaluate the performance of the prediction by the
    average distance between Ŷ and Y
  • The utility or cost of predicting Y as Ŷ is a
    distance function, say L(Y, Ŷ)
  • The average distance is D = E[ L(Y, Ŷ) ]
  • Examples (sketched below)
  • Absolute prediction error
  • Total Cost of Risk Stratification
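
The two example distances can be written down concretely; in the sketch below the cost weights for the risk-stratification loss are illustrative assumptions, not values from the talk.

import numpy as np

def absolute_prediction_error(y, y_hat):
    """Estimate D = E|Y - Yhat| by the sample average of |Y - Yhat|."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

def risk_stratification_cost(y, y_hat_class, cost_fp=1.0, cost_fn=4.0):
    """Average cost of a binary risk stratification; the false-positive
    and false-negative costs are illustrative placeholders."""
    y = np.asarray(y)
    y_hat_class = np.asarray(y_hat_class)
    fp = (y_hat_class == 1) & (y == 0)
    fn = (y_hat_class == 0) & (y == 1)
    return float(np.mean(cost_fp * fp + cost_fn * fn))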

14
Evaluating and Comparing Prediction Rules
  • The performance of the fitted prediction
    model/rule can be estimated by the empirical
    average of the distance, D̂
  • Prediction model/rule comparison
  • Prediction with E(Y | Z) = g1(a'Z) vs E(Y | W) =
    g2(b'W)
  • Compare the two models/rules by comparing D̂1 and D̂2
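
Continuing the sketches above with hypothetical arrays Y, Z and W (say, predictors without and with the RNA markers), the apparent average loss of each working rule is computed on the same data and compared directly.

# Apparent-error comparison of two working rules; the cross-validated
# version appears two slides later.
rule_Z = fit_working_linear_rule(Z, Y)            # rule based on Z
rule_W = fit_working_linear_rule(W, Y)            # rule based on W
D_hat_Z = absolute_prediction_error(Y, rule_Z(Z))
D_hat_W = absolute_prediction_error(Y, rule_W(W))
difference = D_hat_Z - D_hat_W                    # estimated D1 - D2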

15
Variability in the Estimated Prediction
Performance Measures
  • Variability in the prediction errors
  • Estimate D̂ = 50: is SE = 1? Is SE = 50?
  • Inference about D and Δ = D1 - D2
  • Confidence intervals based on large-sample
    approximations to the distributions of D̂ and Δ̂
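
The talk builds the intervals from large-sample approximations; the resampling sketch below is only a rough stand-in with the same intent (refit the rule on each bootstrap sample and recompute the average loss), and fit/loss stand for any of the choices in the earlier sketches.

import numpy as np

def bootstrap_ci_for_D(Y, Z, fit, loss, n_boot=1000, level=0.95, seed=0):
    """Percentile-bootstrap interval for the average loss D; a simple
    substitute for the large-sample approximations used in the talk."""
    rng = np.random.default_rng(seed)
    Y, Z = np.asarray(Y), np.asarray(Z)
    n = len(Y)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)               # resample subjects
        rule = fit(Z[idx], Y[idx])                # refit on the resample
        stats.append(loss(Y[idx], rule(Z[idx])))  # average loss on it
    alpha = 1.0 - level
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)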

16
Bias Correction
  • Bias issue with apparent-error-type estimators
  • Bias correction via cross-validation (see the
    sketch below)
  • Data partition: training sets Tk and validation
    sets Vk
  • For each partition
  • Obtain the fitted rule based on observations in Tk
  • Obtain the error estimate based on observations in Vk
  • Obtain the cross-validated estimator by averaging
    over the partitions
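
A minimal K-fold rendering of this scheme; fit and loss are the fitting procedure and distance function from the earlier sketches, and K = 5 is an arbitrary choice.

import numpy as np

def cross_validated_error(Y, Z, fit, loss, K=5, seed=0):
    """Randomly partition the data, fit the rule on each training part
    T_k, evaluate the loss on the held-out part V_k, and average."""
    rng = np.random.default_rng(seed)
    Y, Z = np.asarray(Y), np.asarray(Z)
    folds = np.array_split(rng.permutation(len(Y)), K)
    errors = []
    for k in range(K):
        v = folds[k]                            # validation set V_k
        t = np.concatenate([folds[j] for j in range(K) if j != k])  # T_k
        rule = fit(Z[t], Y[t])
        errors.append(loss(Y[v], rule(Z[v])))
    return float(np.mean(errors))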

17
Example: AIDS Clinical Trial
  • Objective: identify biomarkers to predict the
    treatment response
  • Outcome: Y = ΔCD4 at week 24
  • Predictors: Z = (Age, CD4 at week 0, ΔCD4 at week 8,
    RNA at week 0, ΔRNA at week 8)
  • Working model: E(Y | Z) = β'Z

18
Example (AIDS Clinical Trial): Incremental Value of
RNA
[Table: point estimates, standard error estimates
and 95% confidence intervals]
19
Incremental Value of RNA within Various
Sub-populations
20
Example: Breast Cancer Gene Expression Study
  • Objective: construct a new classifier that can
    accurately predict future disease outcome
  • van't Veer et al. (2002) established a classifier
    based on a 70-gene profile
  • good- or poor-prognosis signature based on the
    correlation with the previously determined
    average profile in tumors from patients with good
    prognosis
  • Classify subjects as
  • Good prognosis if gene score > cut-off
  • Poor prognosis if gene score < cut-off
  • van de Vijver et al. (2002) evaluated the accuracy
    of this classifier by using hazard ratios and
    signature-specific Kaplan-Meier curves

21
Example: Breast Cancer Gene Expression Study
  • Data consist of 295 subjects
  • Outcome: T = time to death
  • Predictors: lymph-node status, estrogen-receptor
    status, gene score
  • We are interested in
  • Constructing prediction rules for identifying
    subjects who would survive t years, i.e. Y = I(T ≥
    t) = 1
  • Evaluating the incremental value of the gene
    score

22
Example (Breast Cancer Data): Predicting 10-year
Survival
23
Evaluating the Prediction Rule Based on Various
Accuracy Measures
  • For a future patient with T0 and Z0, we predict
    Y0 = I(T0 ≥ t) by Ŷ0
  • Classification accuracy measures
  • Sensitivity
  • Specificity
  • Prediction accuracy measures (positive and
    negative predictive values); see the sketch below
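
A plain-count Python sketch of the four measures for a binary rule; it ignores censoring, which the talk handles with survival-specific estimators, so treat it only as a reminder of the definitions.

import numpy as np

def accuracy_measures(y_true, y_pred):
    """Empirical sensitivity, specificity, PPV and NPV for binary
    outcomes y_true and predicted labels y_pred."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return {
        "sensitivity": tp / (tp + fn),   # P(Yhat = 1 | Y = 1)
        "specificity": tn / (tn + fp),   # P(Yhat = 0 | Y = 0)
        "PPV":         tp / (tp + fp),   # P(Y = 1 | Yhat = 1)
        "NPV":         tn / (tn + fn),   # P(Y = 0 | Yhat = 0)
    }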

24
Example (Breast Cancer Data): Predicting 10-year
Survival
25
Example: Breast Cancer Data
  • To compare
  • Model II: g(a0 + a1·Node + a2·ER)
  • Model III: g(a0 + a1·Node + a2·ER + a3·Gene)
  • Choosing cut-off values for each model to achieve
    SE = 0.69, which is an attainable value for Model
    II, then
  • Model II → SP = 0.45, PPV = 0.35, NPV = 0.77
  • Model III → SP = 0.75, PPV = 0.54, NPV = 0.85
  • 95% CI for the difference in
  • SP: (0.11, 0.45); PPV: (0.01, 0.24); NPV:
    (0.06, 0.19)
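
To mirror the matched-sensitivity comparison, here is a hypothetical helper (not from the talk) that picks the largest cut-off whose rule I(score > c) reaches a target sensitivity; the two models can then be compared in SP, PPV and NPV at that common sensitivity.

import numpy as np

def cutoff_for_target_sensitivity(score, y_true, target=0.69):
    """Largest cut-off c such that the rule I(score > c) has empirical
    sensitivity >= target; returns None if the target is unattainable."""
    score, y_true = np.asarray(score), np.asarray(y_true)
    n_pos = np.sum(y_true == 1)
    for c in np.sort(np.unique(score))[::-1]:     # high to low cut-offs
        pred = (score > c).astype(int)
        sens = np.sum((pred == 1) & (y_true == 1)) / n_pos
        if sens >= target:
            return float(c)
    return None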

26
Prediction Interval: Accounting for the Precision
of the Prediction
  • Based on a prediction model, we
  • predict the response Ŷ0
  • summarize the corresponding population average
    accuracy
  • What if the population average accuracy of 70% is
    not satisfactory? How can we achieve 90% accuracy?
  • What if the rule can predict Y0 precisely for
    certain Z0, while for other Z0 it fails to
    predict Y0 accurately?
  • Can we account for the precision of the prediction
    and identify patients who would need further
    assessment?

27
(No Transcript)
28
Prediction Interval
  • To account for patient-level prediction error, one
    may instead predict a set of outcome values for Y0
    such that the set covers Y0 with probability at
    least 1 - α given Z0
  • The optimal interval for the sub-population with
    Z0 = z0 is built from the
  • estimated conditional density function of Y given
    Z (see the binary-outcome sketch below)
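
For a binary outcome the construction reduces to taking the smallest set of outcome values whose estimated conditional probability given Z0 reaches the desired level; the threshold form below is a simplification of the density-based rule in the talk (and, unlike that rule, it never returns the empty set).

def prediction_set_binary(p_hat, level=0.90):
    """Smallest outcome set with estimated conditional probability
    >= level, where p_hat is the estimated P(Y0 = 1 | Z0)."""
    if p_hat >= level:
        return {1}
    if 1.0 - p_hat >= level:
        return {0}
    return {0, 1}

With this rule a predicted risk of 0.04 gives the 90% prediction set {0}, while a predicted risk of 0.51 gives the uninformative set {0, 1}, matching the two cases shown on the next slides.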

29
Example: Breast Cancer Study
  • Data: 295 patients
  • Response: 10-year survival
  • Predictors: lymph-node status, estrogen-receptor
    status, gene score
  • Working model for the 10-year survival probability
  • Possible prediction sets: ∅, {0}, {1}, {0, 1}
  • Classical prediction considers {0} and {1} only.

30
[Figure: 90% prediction sets for two patients; a
predicted risk of 0.04 gives the 90% prediction set
{0}, while a predicted risk of 0.51 gives {0, 1}]
31
Example (Breast Cancer Study): Prediction Sets
Based on Clinical + Gene Score
32
Remarks
  • Proper choice of the accuracy/cost measure
  • Classification accuracy vs predictive values
  • Utility function: what is the consequence of
    predicting a subject with outcome Y as Ŷ?
  • With an expensive or invasive marker
  • Should it be applied to the entire population?
  • Is it helpful for a certain sub-population?
  • Should the cost of the marker be considered when
    evaluating its value?