Title: Model and Variable Selections for Personalized Medicine
1Model and Variable Selections for Personalized
Medicine
Lu Tian (Northwestern University), Hajime Uno (Kitasato University), Tianxi Cai, Els Goetghebeur, L.J. Wei (Harvard University)
2Outline
- Background and motivation
- Developing and evaluating prediction rules based on a set of markers for
  - Continuous or binary outcome
  - Censored event time outcome
- Evaluating the incremental value of a biomarker over
  - the entire population
  - various sub-populations
- Incorporating the patient-level precision of the prediction
  - Prediction intervals/sets
- Remarks
3Background and Motivation
- Personalized medicine: using information about a person's biological and genetic makeup to tailor strategies for the prevention, detection and treatment of disease
- Important step: develop prediction rules that can accurately predict health outcome or diagnosis of clinical phenotype
4Background and Motivation
- Accurate prediction of disease outcome and treatment response, however, is a complex and difficult task.
- Developing prediction rules involves
  - Identifying important predictors
  - Evaluating the accuracy of the prediction
  - Evaluating the incremental value of new markers
5Background and Motivation: AIDS Clinical Trial ACTG320
- Study objective: to compare
  - 3-drug regimen (n = 579): Zidovudine + Lamivudine + Indinavir
  - 2-drug regimen (n = 577): Zidovudine + Lamivudine
- Identify biomarkers for predicting treatment response
  - How well can we predict the treatment response?
  - Is RNA needed?
6Background and Motivation
Is RNA needed?
Predictors
7Background and Motivation: AIDS Clinical Trial
Regression Coefficient
- Coefficient for ΔRNA week 8 is highly significant
- Does this mean RNA is needed for a more precise prediction of responses?
8Background and Motivation
Is RNA needed?
Y = ΔCD4 week 8
Z = Predictors
9Developing Prediction Rules Based on a Set of Markers
- Regression approach to approximate Y | Z
  - Continuous or binary outcome: generalized linear regression
  - Survival outcome
    - Proportional hazards model
    - Time-specific prediction models
- Regression modeling as a vehicle
  - the procedure has to be valid when the imposed statistical model is not the true model!
10Developing and Evaluating Prediction Rules
- Predict Y with Z based on the prediction model, giving a predicted value Ŷ
- Evaluate the performance of the prediction by the average distance between Ŷ and Y
  - The utility or cost of predicting Y as Ŷ is d(Ŷ, Y)
  - The average distance is D = E{d(Ŷ, Y)}
- Examples
  - Absolute prediction error
  - Total cost of risk stratification
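A minimal sketch of the average-distance idea, assuming absolute error as the distance and a linear working model fitted by least squares. The function names and the simulated data are illustrative assumptions, not taken from the talk.

```python
import numpy as np

def fit_working_model(Z, y):
    """Least-squares fit of a linear working model E(Y | Z) = b0 + b'Z."""
    X = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def predict(coef, Z):
    """Predicted outcome for each row of Z."""
    X = np.column_stack([np.ones(len(Z)), Z])
    return X @ coef

def average_distance(y, y_hat):
    """Average absolute distance between observed and predicted outcomes."""
    return float(np.mean(np.abs(np.asarray(y) - np.asarray(y_hat))))

# Simulated data, for illustration only (hypothetical, not from the study)
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
y = 1.0 + Z @ np.array([2.0, -1.0]) + rng.normal(size=200)

coef = fit_working_model(Z, y)
D_hat = average_distance(y, predict(coef, Z))
print(f"apparent absolute prediction error: {D_hat:.3f}")
```

Because the same observations are used for fitting and for evaluation, this is an apparent error and tends to be optimistic.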
11Evaluating and Comparing Prediction Rules
- The performance of the prediction model/rule can be estimated by the corresponding sample average over the observed data
- Prediction Model/Rule Comparison
  - Prediction with E(Y | Z) = g1(a'Z) vs E(Y | W) = g2(b'W)
  - Compare two models/rules by comparing their performance measures D1 and D2
12Variability in the Estimated Prediction Performance Measures
- Variability in the prediction errors
  - Estimate = 50: SE = 1? SE = 50?
- Inference about D and Δ = D1 - D2
  - Confidence intervals based on large sample approximations to the distribution of the estimators
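To illustrate what an interval for the difference between D1 and D2 might look like, here is a rough sketch. It swaps in a simple nonparametric bootstrap with percentile limits in place of the large-sample approximation used in the talk, and it assumes absolute error and least-squares working models. All names and the simulated data are hypothetical.

```python
import numpy as np

def abs_error(Z, y):
    """Apparent absolute prediction error of a least-squares working model."""
    X = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean(np.abs(y - X @ coef)))

def bootstrap_delta(Z, W, y, n_boot=500, seed=0):
    """Point estimate, bootstrap SE and 95% percentile interval for D1 - D2."""
    rng = np.random.default_rng(seed)
    n = len(y)
    delta_hat = abs_error(Z, y) - abs_error(W, y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        draws[b] = abs_error(Z[idx], y[idx]) - abs_error(W[idx], y[idx])
    lo, hi = np.percentile(draws, [2.5, 97.5])
    return delta_hat, float(draws.std(ddof=1)), (float(lo), float(hi))

# Simulated data: W adds one informative marker to the routine marker in Z
rng = np.random.default_rng(1)
n = 300
base = rng.normal(size=(n, 1))
new = rng.normal(size=(n, 1))
y = base[:, 0] + 2.0 * new[:, 0] + rng.normal(size=n)
Z, W = base, np.column_stack([base, new])

delta_hat, se, ci = bootstrap_delta(Z, W, y)
print(f"delta = {delta_hat:.3f}, SE = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

An interval that excludes zero suggests the added marker changes the prediction error by more than sampling noise alone would explain.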
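13Bias Correction
- Bias issue in the apparent error type estimators
- Bias correction via cross-validation
  - Data partition: (Tk, Vk), with Tk the training part and Vk the validation part
  - For each partition
    - Obtain the fitted prediction rule based on observations in Tk
    - Obtain the estimated prediction error based on observations in Vk
  - Obtain the cross-validated estimator by averaging over the partitions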
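The cross-validation steps above can be sketched as follows, assuming K random folds, absolute error as the distance, and a least-squares working model. The function name, the fold count and the simulated data are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def cv_abs_error(Z, y, n_folds=5, seed=0):
    """K-fold cross-validated absolute prediction error of a linear working model."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    fold_errors = []
    for k in range(n_folds):
        val = folds[k]                                   # validation part Vk
        train = np.concatenate(
            [folds[j] for j in range(n_folds) if j != k]
        )                                                # training part Tk
        X_train = np.column_stack([np.ones(len(train)), Z[train]])
        coef, *_ = np.linalg.lstsq(X_train, y[train], rcond=None)
        X_val = np.column_stack([np.ones(len(val)), Z[val]])
        fold_errors.append(np.mean(np.abs(y[val] - X_val @ coef)))
    return float(np.mean(fold_errors))

# Simulated data with many noise markers, so the apparent error is optimistic
rng = np.random.default_rng(2)
n, p = 60, 20
Z = rng.normal(size=(n, p))
y = Z[:, 0] + rng.normal(size=n)

X = np.column_stack([np.ones(n), Z])
coef_all, *_ = np.linalg.lstsq(X, y, rcond=None)
apparent = float(np.mean(np.abs(y - X @ coef_all)))
cv_err = cv_abs_error(Z, y)
print(f"apparent = {apparent:.3f}, cross-validated = {cv_err:.3f}")
```

With many markers relative to the sample size, the cross-validated error is typically noticeably larger than the apparent error, which is the bias the slide refers to.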
14Example: AIDS Clinical Trial
- Objective: identify biomarkers to predict the treatment response
- Outcome: Y = ΔCD4 week 24
- Predictors: Z = (Age, CD4 week 0, ΔCD4 week 8, RNA week 0, ΔRNA week 8)
- Working model: E(Y | Z) = β'Z
15Example: AIDS Clinical Trial, Incremental Value of RNA
[Table: estimates, standard errors and 95% C.I.]
16Incremental Value of RNA within Various
Sub-populations
17Trandolapril Cardiac Evaluation Study (Kober et al 2005, NEJM)
- Prognostic importance of the left ventricular dysfunction
  - Thune et al (2005): Diamond study
  - Trace study (Kober et al 2005, NEJM)
    - Designed to determine whether patients with left ventricular dysfunction soon after myocardial infarction benefit from long-term oral ACE inhibition
    - Between 1990 and 1992, a total of 6676 patients with myocardial infarction were screened with echocardiography
    - A total of 5921 subjects had available data
18Trandolapril Cardiac Evaluation Study (Kober et
al 2005, NEJM)
- Routine markers include
  - Age
  - creatine (CRE)
  - occurrence of heart failure (CHF)
  - history of diabetes (DIA)
  - history of hypertension (HYP)
  - cardiogenic shock after MI (KS)
- We are interested in evaluating the incremental value of wall motion index (WMI)
19Trandolapril Cardiac Evaluation Study (Kober et
al 2005, NEJM)
- Does WMI improve the prediction of 5-year
survival?
20Population Average Incremental Value of WMI: Predicting 5-year Survival
5-year mortality rate: 42%
21D1
D2
23Gain Due to WMI
24? 1
? 4
? 9
Gain Due to WMI with respect to D?
25Example: Breast Cancer Gene Expression Study
- Objective: construct a new classifier that can accurately predict future disease outcome
- van 't Veer et al (2002) established a classifier based on a 70-gene profile
  - good- or poor-prognosis signature based on their correlation with the previously determined average profile in tumors from patients with good prognosis
- Classify subjects as
  - Good prognosis if gene score > cut-off
  - Poor prognosis if gene score < cut-off
- van de Vijver et al (2002) evaluated the accuracy of this classifier by using hazard ratios and signature-specific Kaplan-Meier curves
26Example: Breast Cancer Gene Expression Study
- Data consist of 295 subjects
  - Outcome: T = time to death
  - Predictors: Lymph-Node Status, Estrogen Receptor Status, gene score
- We are interested in
  - Constructing prediction rules for identifying subjects who would survive t years, Y = I(T ≥ t) = 1.
  - Evaluating the incremental value of the gene score.
27Example: Breast Cancer Data, Predicting 10-year Survival
28Evaluating the Prediction Rule Based on Various Accuracy Measures
- For a future patient with T0 and Z0, we predict the t-year survival status from Z0
- Classification accuracy measures
  - Sensitivity
  - Specificity
- Prediction accuracy measures
  - Positive predictive value (PPV)
  - Negative predictive value (NPV)
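For reference, the four accuracy measures can be computed from a 2x2 table of true versus predicted status. The sketch below assumes fully observed binary outcomes, so it ignores the censoring that the actual survival analysis has to handle. The function name and the toy data are hypothetical.

```python
import numpy as np

def accuracy_measures(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV for binary outcomes coded 0/1."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = np.sum(y_true & y_pred)      # true positives
    fn = np.sum(y_true & ~y_pred)     # false negatives
    fp = np.sum(~y_true & y_pred)     # false positives
    tn = np.sum(~y_true & ~y_pred)    # true negatives
    return {
        "sensitivity": tp / (tp + fn),  # P(predict 1 | truth 1)
        "specificity": tn / (tn + fp),  # P(predict 0 | truth 0)
        "ppv": tp / (tp + fp),          # P(truth 1 | predict 1)
        "npv": tn / (tn + fn),          # P(truth 0 | predict 0)
    }

# Toy data, for illustration only
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
measures = accuracy_measures(y_true, y_pred)
print(measures)
```

Sensitivity and specificity condition on the true status, while PPV and NPV condition on the prediction, so the latter two also depend on how common the outcome is.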
29Example: Breast Cancer Data, Predicting 10-year Survival
30Example: Breast Cancer Data
- To compare
  - Model II: g(a + Node + ER)
  - Model III: g(a + Node + ER + Gene)
- Choosing cut-off values for each model to achieve SE = 69%, which is an attainable value for Model II, then
  - Model II: SP = 0.45, PPV = 0.35, NPV = 0.77
  - Model III: SP = 0.75, PPV = 0.54, NPV = 0.85
- 95% CI for the difference in
  - SP: (0.11, 0.45); PPV: (0.01, 0.24); NPV: (0.06, 0.19)
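Fixing the sensitivity and then reading off the specificity can be sketched as below. The sketch assumes that a score at or above the cut-off counts as a positive call and that outcomes are fully observed (no censoring). The function names and the toy scores are hypothetical and unrelated to the study data.

```python
import math
import numpy as np

def cutoff_for_sensitivity(score, y, target):
    """Largest cut-off whose sensitivity is at least `target`."""
    case_scores = np.sort(score[y == 1])[::-1]   # scores of true positives, descending
    k = math.ceil(target * len(case_scores))     # number of cases that must be flagged
    return case_scores[k - 1]

def sens_spec(score, y, cutoff):
    """Sensitivity and specificity of the rule `score >= cutoff`."""
    pred = score >= cutoff
    sens = float(np.mean(pred[y == 1]))
    spec = float(np.mean(~pred[y == 0]))
    return sens, spec

# Toy scores: the first 10 subjects have the outcome, the last 6 do not
score = np.array([0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1,
                  0.05, 0.15, 0.25, 0.35, 0.45, 0.55])
y = np.array([1] * 10 + [0] * 6)

cutoff = cutoff_for_sensitivity(score, y, target=0.69)
sens, spec = sens_spec(score, y, cutoff)
print(f"cut-off = {cutoff}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Applying the same target sensitivity to two competing scores puts them on a common footing, so the remaining measures can be compared directly.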
31Prediction Interval: Accounting for the Precision of the Prediction
- Based on a prediction model
  - predict the response
  - summarize the corresponding population average accuracy
- What if the population average accuracy of 70% is not satisfactory? How to achieve 90% accuracy?
- What if the rule can predict Y0 more precisely for certain Z0, while failing to predict Y0 accurately for others?
- Account for the precision of the prediction? Identify patients who would need further assessment?
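33Prediction Interval
- To account for patient-level prediction error, one may instead predict that Y0 lies in an interval (or set) chosen such that its coverage probability given Z0 reaches a prescribed level
- The optimal interval for the population with a given value of Z0 is determined by the estimated conditional density function of Y0 given Z0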
34Example: Breast Cancer Study
- Data: 295 patients
- Response: 10-year survival
- Predictors: Lymph-Node Status, Estrogen Receptor Status, Gene Score
- Model
- Possible prediction sets: ∅, {0}, {1}, {0,1}
- Classic prediction considers {0}, {1} only.
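One simple way to picture a prediction set for a binary outcome is to take the smallest set of outcomes whose predicted probability reaches the required level. The sketch below makes that simplifying assumption and treats the predicted risk as known, so it ignores estimation uncertainty and never returns the empty set. It is an illustration, not the authors' procedure, and `prediction_set` is a hypothetical name.

```python
def prediction_set(risk, level=0.90):
    """Smallest set of outcomes in {0, 1} with predicted probability >= level.

    `risk` is the predicted probability that the outcome equals 1.
    """
    probs = {1: risk, 0: 1.0 - risk}
    chosen, total = set(), 0.0
    # Add outcomes from most to least likely until the level is reached
    for outcome in sorted(probs, key=probs.get, reverse=True):
        if total >= level:
            break
        chosen.add(outcome)
        total += probs[outcome]
    return chosen

print(prediction_set(0.04))  # {0}
print(prediction_set(0.51))  # {0, 1}
```

Under this rule a patient with predicted risk 0.04 gets the informative set {0}, while a patient with predicted risk 0.51 gets {0, 1}, which flags that the markers alone cannot classify that patient at the 90% level.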
35
- 90% Prediction Set: {0,1}
- 90% Prediction Set: {0}
- Predicted Risk: 0.04
- Predicted Risk: 0.51
36Example: Breast Cancer Study, Prediction Sets Based on Clinical + Gene Score
37Remarks
- Proper choice of the accuracy/cost measure
  - Classification accuracy vs predictive values
  - Utility function: what is the consequence of predicting a subject with outcome Y as Ŷ?
- With an expensive or invasive marker
  - Should it be applied to the entire population?
  - Is it helpful for a certain sub-population?
  - Should the cost of the marker be considered when evaluating its value?