Title: Use of Prognostic
1Use of Prognostic and Predictive Biomarkers in
Clinical Trial Design
- Richard Simon, D.Sc.
- Chief, Biometric Research Branch
- National Cancer Institute
- http://brb.nci.nih.gov
2BRB Website: brb.nci.nih.gov
- Powerpoint presentations
- Reprints
- BRB-ArrayTools software
- Data archive
- Q/A message board
- Web based Sample Size Planning
- Clinical Trials
- Optimal 2-stage phase II designs
- Phase III designs using predictive biomarkers
- Phase II/III designs
- Development of gene expression based predictive
classifiers
3Prognostic and Predictive Biomarkers
- Most cancer treatments benefit only a minority of
patients to whom they are administered
- Being able to predict which patients are likely
to benefit would
- Save patients from unnecessary toxicity, and
enhance their chance of receiving a drug that
helps them
- Control medical costs
- Improve the success rate of clinical drug
development
4- Predictive biomarkers
- Measured before treatment to identify who will or
will not benefit from a particular treatment
- ER, HER2, KRAS
- Prognostic biomarkers
- Measured before treatment to indicate long-term
outcome for patients untreated or receiving
standard treatment
- Only have medical utility if therapeutically
relevant
- Used to identify who does or does not require
more intensive than standard treatment
- OncotypeDx
5Prognostic and Predictive Biomarkers in Oncology
- Single gene or protein measurement
- Scalar index or classifier that summarizes
expression levels of multiple genes
6Prognostic Factors in Oncology
- Many prognostic factors are not used because they
are not actionable
- Most prognostic factor studies are not conducted
with an intended use
- They use a convenience sample of heterogeneous
patients for whom tissue is available
- Retrospective studies of prognostic markers
should be planned and analyzed with specific
focus on intended use of the marker
- Design of prospective studies depends on context
of use of the biomarker
- Treatment options and practice guidelines
- Other prognostic factors
7Clinical Utility
- Biomarker benefits patient by improving treatment
decisions
- Depends on context of use of the biomarker
- Treatment options and practice guidelines
- Other prognostic factors
8Potential Uses of a Prognostic Biomarker
- Identify patients who have very good prognosis on
standard treatment and do not require more
intensive regimens
- Identify patients who have poor prognosis on
standard chemotherapy who are good candidates for
experimental regimens
9Prospective Evaluation of Prognostic Biomarker
- Identify low stage patients for whom standard of
care is chemotherapy
- Find dataset of low stage patients who did not
receive chemotherapy for whom archived tissue is
available
- Develop prognostic biomarker classifier of risk
without chemotherapy of low stage patients
- Conduct RCT in which low stage patients who are
low risk by biomarker classifier are randomized
to +/- chemotherapy
10- In some cases, if biomarker predicted risk of
recurrence is sufficiently low for randomized
patients, then randomization is omitted and the
test of the biomarker is a test of whether the
risk is as low as predicted
- Absolute benefit for very low risk patients is by
necessity very small
- This is the approach of TAILORx
11How Does This Approach Compare to the So Called
Gold Standard of Randomizing Patients to Receive
or Not Receive the Test?
12Prospective Marker Strategy Design
- Patients are randomized to either
- have marker measured and treatment determined
based on marker result and clinical features
- do not have marker measured and receive standard
of care treatment based on clinical features
alone
13Randomize Patients to Test or No Test
Rx Determined by Test
Rx Determined By SOC
14Marker Strategy Design
- Inefficient
- Many patients get the same treatment regardless
of which arm they are randomized to
- Uninformative
- Since patients in the standard of care arm do not
have the marker measured, it is not possible to
compare outcome for patients whose treatment is
changed based on the marker result
15Apply Test to All Eligible Patients
Using phase II data, develop predictor of
response to new drug
Test Determined Rx Different From SOC
Test Determined Rx Same as SOC
Off Study
Use Test Determined Rx
Use SOC
16Prospective Evaluation of OncotypeDx (TAILORx)
- For patients with predicted low risk of
recurrence
- Withhold chemotherapy and observe long term
recurrence rate
- If recurrence rate is very low, potential
chemotherapy benefit must be very small
17- MINDACT randomizes breast cancer patients whose
Mammaprint based Rx differs from SOC
- SOC = chemo, low risk Mammaprint
- SOC = no chemo Rx, high risk Mammaprint
- Trial is sized to estimate risk of relapse of low
risk Mammaprint patients randomized to no
chemotherapy
18Predictive Biomarkers
19(No Transcript)
20(No Transcript)
21- Cancers of a primary site are a molecularly
heterogeneous group of diverse diseases which
vary enormously in their responsiveness to a
given treatment
- Can we develop new drugs in a manner more
consistent with modern tumor biology and obtain
reliable actionable information about what
regimens work for what kinds of tumors?
22- Evaluating a predictive biomarker for treatment T
involves an RCT of T versus a control C.
- Analysis of RCT determines whether the biomarker
distinguishes the patients who benefit from T vs
C from those who do not
- In this RCT, ideally
- the biomarker is completely specified in advance
- the analysis is focused on the single specific biomarker
- the trial is sized with sufficient marker + and
marker - patients for adequately powered separate
analysis of T vs C differences in each stratum.
- Evaluating a predictive biomarker does not
involve comparison of outcome of marker + vs
marker - patients
23(No Transcript)
24Prospective Co-Development of Drugs and Companion
Diagnostics
- Develop a completely specified genomic classifier
of the patients likely to benefit from a new drug
- Establish analytical validity of the classifier
- Use the completely specified classifier in the
primary analysis plan of a phase III trial of the
new drug
25Guiding Principle
- The data used to develop the classifier should be
distinct from the data used to test hypotheses
about treatment effect in subsets determined by
the classifier
- Developmental studies can be exploratory
- Studies on which treatment effectiveness claims
are to be based should not be exploratory
26Develop Predictor of Response to New Drug
Using phase II data, develop predictor of
response to new drug
Patient Predicted Responsive
Patient Predicted Non-Responsive
Off Study
New Drug
Control
27Applicability of Targeted/Enrichment Design
- Primarily for settings where the classifier is
based on a single gene whose protein product is
the target of the drug
- e.g. trastuzumab
- With a strong biological basis for the
classifier, it may be unacceptable to expose
classifier negative patients to the new drug
- Analytical validation, biological rationale and
phase II data provide basis for regulatory
approval of the test
- Phase III study focused on test + patients to
provide data for approving the drug
28Principle
- If a drug is found safe and effective in a
defined (test +) patient population, approval
should not depend on finding the drug ineffective
in some other (test -) population
29Evaluating the Efficiency of Enrichment Design
- Simon R and Maitournam A. Evaluating the
efficiency of targeted designs for randomized
clinical trials. Clinical Cancer Research
10:6759-63, 2004; Correction and supplement
12:3229, 2006
- Maitournam A and Simon R. On the efficiency of
targeted clinical trials. Statistics in Medicine
24:329-339, 2005.
- reprints and interactive sample size calculations
at http://linus.nci.nih.gov
30- Relative efficiency of targeted design depends on
- proportion of patients test positive
- effectiveness of new drug (compared to control)
for test negative patients
- When less than half of patients are test positive
and the drug has little or no benefit for test
negative patients, the targeted design requires
dramatically fewer randomized patients
31Trastuzumab (Herceptin)
- Metastatic breast cancer
- 234 randomized patients per arm
- 90% power for 13.5% improvement in 1-year
survival over 67% baseline at 2-sided .05 level
- If benefit were limited to the 25% assay +
patients, overall improvement in survival would
have been 3.375%
- 4025 patients/arm would have been required
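The dilution arithmetic on this slide can be checked with a short sketch. It assumes a continuity-corrected normal approximation for comparing two proportions with equal allocation; this is one common formula and may differ in detail from the approximation used for the slide, so the printed values should only be expected to land near 234 and 4025. The function name `n_per_arm` is illustrative.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_arm(p_control, p_treat, alpha=0.05, power=0.90):
    """Patients per arm, continuity-corrected normal approximation (assumed formula)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    diff = abs(p_treat - p_control)
    p_bar = (p_control + p_treat) / 2
    # Uncorrected size: pooled variance under H0, unpooled under H1.
    n0 = (z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p_control * (1 - p_control)
                       + p_treat * (1 - p_treat))) ** 2 / diff ** 2
    # Continuity correction for two equal-sized groups.
    return ceil(n0 / 4 * (1 + sqrt(1 + 4 / (n0 * diff))) ** 2)


baseline = 0.67             # 1-year survival on control (from the slide)
effect_in_positive = 0.135  # improvement in assay + patients (from the slide)
prevalence = 0.25           # fraction of patients who are assay +

# Benefit averaged over all patients if assay - patients gain nothing.
diluted_effect = prevalence * effect_in_positive

n_targeted = n_per_arm(baseline, baseline + effect_in_positive)
n_untargeted = n_per_arm(baseline, baseline + diluted_effect)
print(diluted_effect, n_targeted, n_untargeted)
```

Because the required sample size scales roughly with the inverse square of the effect, cutting the effect to a quarter multiplies the trial size by about sixteen.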
32(No Transcript)
33Model for Two Treatments With Binary Response
- Molecularly targeted treatment T
- Control treatment C
- 1-γ = proportion of patients that express target
- pc = control response probability
- Response probability for T patients who express
target (R+) is (pc + δ1)
- Response probability for T patients who do not
express target (R-) is (pc + δ0)
34Untargeted Trial
- Compare outcome for treatment group T vs control
group C without classifier data
- Fisher exact test at two-sided level .05
comparing response proportion in control group to
response proportion in treatment group
- Number of responses in C group of n patients is
binomial B(n, pc)
- Number of responses in T group is
B(n, (1-γ)(pc + δ1) + γ(pc + δ0))
- Determine n patients per treatment group for
power 1-β
- Use Ury and Fleiss approximation, Biometrics
36:347-51, 1980.
35Targeted Trial
- Compare outcome for treatment group T vs control
group C for assay positive patients
- Fisher exact test at two-sided level .05
comparing response proportion in control group to
response proportion in treatment group
- Number of responses in C group of n patients is
binomial B(n, pc)
- Number of responses in T group is
B(n, pc + δ1)
- Determine nT patients per treatment group for
power 1-β
- Use Ury and Fleiss approximation, Biometrics
36:347-51, 1980.
36(No Transcript)
37(No Transcript)
38Approximations
- Observed response rate ~ N(p, p(1-p)/n)
- pe(1-pe) ≈ pc(1-pc)
39Number of Randomized Patients Required
- Type I error α
- Power 1-β for obtaining significance
40Randomized Ratio (normal approximation)
- RandRat = nuntargeted/ntargeted
- δ1 = rx effect in marker + patients
- δ0 = rx effect in marker - patients
- γ = proportion of marker - patients
- If δ0 = 0, RandRat = 1/(1-γ)^2
- If δ0 = δ1/2, RandRat = 1/(1-γ/2)^2
41Randomized Ratio: nuntargeted/ntargeted
1-γ (express target)   δ0 = 0   δ0 = δ1/2
0.75                   1.78     1.31
0.5                    4        1.78
0.25                   16       2.56
42Screened Ratio
- Nuntargeted = nuntargeted
- Ntargeted = ntargeted/(1-γ)
- ScreenRat = Nuntargeted/Ntargeted = (1-γ) RandRat
43Screened Ratio
Express target   δ0 = 0   δ0 = δ1/2
0.75             1.33     0.98
0.5              2        0.89
0.25             4        0.64
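The two ratio tables for a perfect assay can be reproduced with a minimal sketch. It assumes the normal-approximation result that the sample size ratio is the square of the ratio of treatment effects, with the untargeted effect being the prevalence-weighted average of the effects in marker + and marker - patients. The function and argument names are illustrative; `gamma` is the proportion of marker - patients.

```python
def rand_ratio(gamma, delta0_over_delta1):
    """n_untargeted / n_targeted for a perfect assay (normal approximation)."""
    # Average effect in the untargeted trial, relative to the marker + effect.
    relative_effect = (1 - gamma) + gamma * delta0_over_delta1
    return 1 / relative_effect ** 2


def screen_ratio(gamma, delta0_over_delta1):
    """Ratio of patients screened: the targeted trial screens n/(1-gamma) patients."""
    return (1 - gamma) * rand_ratio(gamma, delta0_over_delta1)


for express in (0.75, 0.5, 0.25):
    gamma = 1 - express
    print(express,
          round(rand_ratio(gamma, 0.0), 2), round(rand_ratio(gamma, 0.5), 2),
          round(screen_ratio(gamma, 0.0), 2), round(screen_ratio(gamma, 0.5), 2))
```

A screened ratio below 1 means the targeted design needs more patients screened than the untargeted design needs randomized, even though it randomizes fewer.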
44Randomized Ratio
- RandRat = nuntargeted/ntargeted
45Randomized Ratio: sensitivity = specificity = 0.9
Express target   δ0 = 0   δ0 = δ1/2
0.75             1.29     1.26
0.5              1.8      1.6
0.25             3.0      1.96
0.1              25.0     1.86
46Screened Ratio: Imperfect Assay
47Screened Ratio: sensitivity = specificity = 0.9
Express target   δ0 = 0   δ0 = δ1/2
0.75             0.9      0.88
0.5              0.9      0.80
0.25             0.9      0.59
0.1              4.5      0.33
48Web Based Software for Designing RCT of Drug and
Predictive Biomarker
49- It can be very difficult to develop an effective
and analytically validated predictive biomarker
prior to launch of the phase III trial
- Even for anti-EGFR antibodies, a more effective
biomarker turned out to be KRAS mutation, not
EGFR expression
- For small molecule kinase inhibitors the task is
more difficult
- In some settings it can be easier to use an
analytically validated biomarker of poor outcome
on the standard therapy
50- Score function S for distinguishing patients with
favorable outcome on standard rx vs those with
unfavorable outcome
- Developed on training set of pts receiving std rx
- GF(s) = CDF of S in favorable pts
- GU(s) = CDF of S in unfavorable pts
- Computed on test set of pts receiving std rx
51- GU(s) = sensitivity of test for selecting pts with
unfavorable outcome on std rx using threshold s
- 1-GF(s) = specificity of test
- Plot of GU(s) vs GF(s) = ROC curve
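The relationship between the two score distributions and the ROC curve can be sketched with empirical CDFs. This assumes that low scores indicate unfavorable outcome, so a patient is selected when S is at or below the threshold s; the scores below are made-up illustrative values, not data from the talk.

```python
def ecdf(scores, s):
    """Empirical CDF: fraction of scores less than or equal to s."""
    return sum(x <= s for x in scores) / len(scores)


def roc_points(favorable, unfavorable):
    """(GF(s), GU(s)) pairs, i.e. (1 - specificity, sensitivity), per threshold."""
    thresholds = sorted(set(favorable) | set(unfavorable))
    return [(ecdf(favorable, s), ecdf(unfavorable, s)) for s in thresholds]


# Hypothetical test-set scores for patients on standard treatment.
favorable = [0.6, 0.7, 0.8, 0.9]
unfavorable = [0.1, 0.2, 0.3, 0.7]

roc = roc_points(favorable, unfavorable)
for gf, gu in roc:
    print(f"1-specificity={gf:.2f}  sensitivity={gu:.2f}")
```

Choosing a threshold where GF(s) = 0 gives specificity 1: no patient with favorable outcome on the standard treatment is selected.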
52- Latent classes
- LCF
- LCU
- Pr[LCF] = γ
- Pr[S RespF | LCF] = p1
- Pr[S RespF | LCU] = p0
- Pr[E RespF | LCF] = p1
- Pr[E RespF | LCU] = p0 + δ
53(No Transcript)
54(No Transcript)
55(No Transcript)
56(No Transcript)
57- The maximum treatment effect is δ. It can be
achieved if one selects a threshold t small
enough that the specificity of the test for
excluding cases with favorable outcome on the
standard treatment is 1. If the specificity is 1,
then the size of the treatment effect does not
depend on the sensitivity of the test
- Proportion randomized = (1-γ)GU(t) + γGF(t)
58- Simon and Maitournam showed that the ratio of
number of patients needed to randomize for a
targeted design compared to a standard design
that does not use the biomarker is approximately
equal to the square of the ratio of the treatment
effects for the two designs
- For the standard design the treatment effect is
(1-γ)δ
59(No Transcript)
60- If the threshold is selected for specificity 1,
then the randomization ratio equals (1-γ)^2
- Hence if half of the patients have favorable
outcome with standard treatment, i.e. γ = 0.5, then
the targeted design requires only one quarter the
number of randomized patients as the standard
design.
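The one-quarter figure follows directly from the squared-effect-ratio rule. As a worked step, write γ for the proportion of patients with favorable outcome on standard treatment and δ for the treatment effect among the remaining patients (symbols reconstructed here; the original slide characters did not survive extraction):

```latex
\frac{n_{\text{targeted}}}{n_{\text{standard}}}
  \approx \left(\frac{\text{effect}_{\text{standard}}}{\text{effect}_{\text{targeted}}}\right)^{2}
  = \left(\frac{(1-\gamma)\,\delta}{\delta}\right)^{2}
  = (1-\gamma)^{2},
\qquad
\gamma = 0.5 \;\Rightarrow\; (1-\gamma)^{2} = 0.25 .
```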
61Stratification Design
62Stratification Design
- Use the test to structure a prospectively specified
primary analysis plan
- Having a prospective analysis plan is essential
- Stratifying (balancing) the randomization is
useful to ensure that all randomized patients
have tissue available but is not a substitute for
a prospective analysis plan
- The purpose of the study is to evaluate the new
treatment overall and for the pre-defined
subsets, not to modify or refine the classifier
- The purpose is not to demonstrate that repeating
the classifier development process on independent
data results in the same classifier
63Not Interaction Design
- Requiring a significant interaction at 5% level
to justify evaluating treatment effects in
subsets
- was useful in the context of post-hoc subset
analysis when drugs were non-specific cytotoxins,
the subsets were not biology based and the prior
probability of qualitative interactions was low
- is not useful for focused co-development of
molecularly targeted drugs when the subset
analysis is part of the primary analysis plan and
the study-wise type I error is controlled
- is an example of how progress could be
unnecessarily stymied by making co-development
impracticably expensive
64- R Simon. Using genomics in clinical trial design,
Clinical Cancer Research 14:5984-93, 2008
- R Simon. Designs and adaptive analysis plans for
pivotal clinical trials of therapeutics and
companion diagnostics, Expert Opinion in Medical
Diagnostics 2:721-29, 2008
65Analysis Plan A
- Compare the new drug to the control for
classifier positive patients
- If p > 0.05 make no claim of effectiveness
- If p ≤ 0.05 claim effectiveness for the
classifier positive patients and
- Compare new drug to control for classifier
negative patients using 0.05 threshold of
significance
66Analysis Plan B (Limited confidence in test)
- Compare the new drug to the control overall for
all patients ignoring the classifier.
- If poverall ≤ 0.03 claim effectiveness for the
eligible population as a whole
- Otherwise perform a single subset analysis
evaluating the new drug in the classifier +
patients
- If psubset ≤ 0.02 claim effectiveness for the
classifier + patients.
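The decision rule of Analysis Plan B can be written as a small function. This is only a sketch of the logic on the slide: the function name and the returned strings are illustrative, and the p-values are assumed to come from whatever pre-specified tests the protocol names. The two thresholds, 0.03 and 0.02, sum to 0.05, which is how the plan keeps the study-wise type I error at the 5% level.

```python
def analysis_plan_b(p_overall, p_subset, alpha_overall=0.03, alpha_subset=0.02):
    """Return the claim supported under Analysis Plan B (illustrative sketch)."""
    if p_overall <= alpha_overall:
        return "effective for the eligible population as a whole"
    # The subset test is performed only when the overall test fails.
    if p_subset <= alpha_subset:
        return "effective for classifier + patients"
    return "no claim of effectiveness"


print(analysis_plan_b(0.01, 0.50))
print(analysis_plan_b(0.20, 0.015))
print(analysis_plan_b(0.20, 0.04))
```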
67Analysis Plan C
- Test for difference (interaction) between
treatment effect in test positive patients and
treatment effect in test negative patients at an
elevated level αint (e.g. .10)
- If interaction is significant at level αint then
compare treatments separately for test positive
patients and test negative patients
- Otherwise, compare treatments overall
68Sample Size Planning for Analysis Plan C
- 88 events in test + patients needed to detect 50%
reduction in hazard at 5% two-sided significance
level with 90% power
- If 25% of patients are positive, when there are
88 events in positive patients there will be
about 264 events in negative patients
- 264 events provides 90% power for detecting 33%
reduction in hazard at 5% two-sided significance
level
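The event counts on this slide are consistent with the usual approximation for a two-arm comparison of survival with equal allocation, in which the required number of events is 4(z_α/2 + z_β)² / (log hazard ratio)². The sketch below assumes that formula (the slide does not state which one was used) and assumes event rates are similar in test + and test - patients when scaling 88 events up to 264.

```python
from math import ceil, log
from statistics import NormalDist


def events_required(hazard_ratio, alpha=0.05, power=0.90):
    """Total events for a 1:1 randomized comparison (assumed approximation)."""
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    return ceil(4 * z ** 2 / log(hazard_ratio) ** 2)


events_positive = events_required(0.50)   # 50% reduction in hazard
prevalence = 0.25
# With 25% of patients positive, negatives accrue about 3 events per positive event.
events_negative = round(events_positive * (1 - prevalence) / prevalence)

print(events_positive, events_negative, events_required(0.67))
```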
69Simulation Results for Analysis Plan C
- Using αint = 0.10, the interaction test has power
93.7% when there is a 50% reduction in hazard in
test positive patients and no treatment effect in
test negative patients
- A significant interaction and significant
treatment effect in test positive patients is
obtained in 88% of cases under the above
conditions
- If the treatment reduces hazard by 33% uniformly,
the interaction test is negative and the overall
test is significant in 87% of cases
70Does the RCT Need to Be Significant Overall for
the T vs C Treatment Comparison?
- No
- It is incorrect to require that the overall T vs
C comparison be significant to claim that T is
better than C for test + patients but not for
test - patients
- That requirement has been traditionally used to
protect against data dredging. It is
inappropriate for focused trials of a treatment
with a companion test.
71Development of Genomic Classifiers
- During phase II development or
- Adaptively during phase III trial
- Using archived specimens from previous phase III
trial
72(No Transcript)
73(No Transcript)
74(No Transcript)
75Biomarker Adaptive Threshold Design
- Wenyu Jiang, Boris Freidlin and Richard Simon
- JNCI 99:1036-43, 2007
76Biomarker Adaptive Threshold Design
- Randomized trial of T vs C
- Have identified a biomarker score B thought to be
predictive of patients likely to benefit from T
relative to C
- Eligibility not restricted by biomarker
- No threshold for biomarker determined
- Biomarker value scaled to range (0,1)
- Time-to-event data
77Procedure A
- Compare T vs C for all patients
- If results are significant at level .04 claim
broad effectiveness of T
- Otherwise proceed as follows
78Procedure A
- Test T vs C restricted to patients with biomarker
B > b
- Let S(b) be log likelihood ratio statistic
- Repeat for all values of b
- Let S* = max S(b)
- Compute null distribution of S* by permuting
treatment labels
- If the data value of S* is significant at 0.01
level, then claim effectiveness of T for a
patient subset
- Compute point and bootstrap interval estimates of
the threshold b
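The second stage of Procedure A can be sketched as a permutation test on the maximum statistic over candidate cut-points. To stay short and self-contained, the sketch swaps in a binary response and a two-proportion z statistic in place of the log likelihood ratio statistic for time-to-event data described on the slide; the simulated data, the minimum group size of 5, and all function names are assumptions made for illustration.

```python
import random
from math import sqrt


def z_stat(rows):
    """Two-proportion z statistic (T minus C) for rows of (arm, response)."""
    t = [r for a, r in rows if a == "T"]
    c = [r for a, r in rows if a == "C"]
    if len(t) < 5 or len(c) < 5:
        return 0.0  # too few patients above this cut-point to evaluate
    p = (sum(t) + sum(c)) / (len(t) + len(c))
    se = sqrt(p * (1 - p) * (1 / len(t) + 1 / len(c)))
    return 0.0 if se == 0 else (sum(t) / len(t) - sum(c) / len(c)) / se


def max_stat(arms, biomarker, response, cutpoints):
    """Maximum over b of S(b), with S(b) computed among patients with B > b."""
    return max(
        z_stat([(a, r) for a, x, r in zip(arms, biomarker, response) if x > b])
        for b in cutpoints
    )


def permutation_p_value(arms, biomarker, response, cutpoints, n_perm=200, seed=0):
    """Permute treatment labels and recompute the maximum each time."""
    rng = random.Random(seed)
    observed = max_stat(arms, biomarker, response, cutpoints)
    shuffled = list(arms)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_stat(shuffled, biomarker, response, cutpoints) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Simulated trial (hypothetical): T helps only patients with B > 0.5.
rng = random.Random(1)
arms = [rng.choice("TC") for _ in range(200)]
biomarker = [rng.random() for _ in range(200)]
response = [int(rng.random() < (0.8 if a == "T" and x > 0.5 else 0.3))
            for a, x in zip(arms, biomarker)]
cutpoints = [i / 10 for i in range(9)]

p_value = permutation_p_value(arms, biomarker, response, cutpoints)
print(p_value)  # compare with the 0.01 level reserved for this stage
```

Permuting the labels and recomputing the maximum each time is what accounts for having searched over all cut-points.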
79Estimated Power of Broad Eligibility Design
(n = 386 events) vs Adaptive Design A (n = 412
events); 80% power for 30% hazard reduction
Model                                                      Broad Eligibility Design   Biomarker Adaptive Threshold A
40% reduction in 50% of patients (22% overall reduction)   .70                        .78
60% reduction in 25% of patients (20% overall reduction)   .65                        .91
79% reduction in 10% of patients (14% overall reduction)   .35                        .93
80(No Transcript)
81Multiple Biomarker Design
- Have identified K candidate binary classifiers B1,
..., BK thought to be predictive of patients
likely to benefit from T relative to C
- Eligibility not restricted by candidate
classifiers
- For notation let B0 denote the classifier with
all patients positive
82- Test T vs C restricted to patients positive for
Bk for k = 0, 1, ..., K
- Let S(Bk) be log likelihood ratio statistic for
treatment effect in patients positive for Bk
(k = 1, ..., K)
- Let S* = max S(Bk), k* = argmax S(Bk)
- For a global test of significance
- Compute null distribution of S* by permuting
treatment labels
- If the data value of S* is significant at 0.05
level, then claim effectiveness of T for patients
positive for Bk*
83- Test T vs C restricted to patients positive for
Bk for k = 0, 1, ..., K
- Let S(Bk) be log likelihood ratio statistic for
treatment effect in patients positive for Bk
(k = 1, ..., K)
- Let S* = max S(Bk), k* = argmax S(Bk)
- The new treatment is superior to control for the
population defined by k*
- Repeating the analysis for bootstrap samples of
cases provides
- an estimate of the stability of k* (the
indication)
- an interval estimate of S* (the size of treatment
effect in the target population)
84Adaptive Signature Design
- Boris Freidlin and Richard Simon
- Clinical Cancer Research 11:7872-8, 2005
85Adaptive Signature Design: End of Trial Analysis
- Compare E to C for all patients at significance
level 0.04
- If overall H0 is rejected, then claim
effectiveness of E for eligible patients
- Otherwise
86- Otherwise
- Using only the first half of patients accrued
during the trial, develop a binary classifier
that predicts the subset of patients most likely
to benefit from the new treatment T compared to
control C
- Compare T to C for patients accrued in second
stage who are predicted responsive to T based on
classifier
- Perform test at significance level 0.01
- If H0 is rejected, claim effectiveness of T for
subset defined by classifier
87Treatment effect restricted to subset. 10% of
patients sensitive, 10 sensitivity genes, 10,000
genes, 400 patients.
Test                                                                                        Power (%)
Overall .05 level test                                                                      46.7
Overall .04 level test                                                                      43.1
Sensitive subset .01 level test (performed only when overall .04 level test is negative)   42.2
Overall adaptive signature design                                                           85.3
88Cross-Validated Adaptive Signature Design (to be
submitted for publication)
- Wenyu Jiang, Boris Freidlin, Richard Simon
89Cross-Validated Adaptive Signature Design: End of
Trial Analysis
- Compare T to C for all patients at significance
level αoverall
- If overall H0 is rejected, then claim
effectiveness of T for eligible patients
- Otherwise
90Otherwise
- Partition the full data set into K parts
- Form a training set by omitting one of the K
parts. The omitted part is the test set
- Using the training set, develop a predictive
classifier of the subset of patients who benefit
preferentially from the new treatment T compared
to control C using the methods developed for the
ASD
- Classify the patients in the test set as
sensitive (classifier +) or insensitive
(classifier -)
- Repeat this procedure K times, leaving out a
different part each time
- After this is completed, all patients in the full
dataset are classified as sensitive or
insensitive
91- Compare T to C for sensitive patients by
computing a test statistic S, e.g. the difference
in response proportions or log-rank statistic
(for survival)
- Generate the null distribution of S by permuting
the treatment labels and repeating the entire
K-fold cross-validation procedure
- Perform test at significance level 0.05 -
αoverall
- If H0 is rejected, claim effectiveness of T for
subset defined by classifier
- The sensitive subset is determined by developing
a classifier using the full dataset
92 70% Response to T in Sensitive Patients, 25%
Response to T Otherwise, 25% Response to C, 20%
Patients Sensitive
                             ASD     CV-ASD
Overall 0.05 Test            0.486   0.503
Overall 0.04 Test            0.452   0.471
Sensitive Subset 0.01 Test   0.207   0.588
Overall Power                0.525   0.731
93Does It Matter If the Randomization in the RCT
Was Not Stratified By the Test?
- No
- Stratification improves balance of stratification
factors in overall comparisons
- Stratification does not improve comparability of
treatment (T) and control (C) groups within test
positive patients or within test negative
patients.
- In a fully prospective trial, stratification of
the randomization by the test is only useful for
ensuring that all patients have an adequate test
performed
94Information about a predictive biomarker may
develop following completion of the pivotal
trials
- It may be infeasible to conduct a new
prospective trial for a previously approved drug
- KRAS for anti-EGFR antibodies in colorectal
cancer
- HER2 for doxorubicin in breast cancer
95- In some cases the benefits of a prospective trial
can be closely achieved by the carefully planned
use of archived tissue from a previously
conducted randomized clinical trial
96Use of Archived Specimens in Evaluation of
Prognostic and Predictive Biomarkers. Richard M.
Simon, Soonmyung Paik and Daniel F. Hayes
- Claims of medical utility for prognostic and
predictive biomarkers based on analysis of
archived tissues can be considered to have either
a high or low level of evidence depending on
several key factors.
- Studies using archived tissues, when conducted
under ideal conditions and independently
confirmed, can provide the highest level of
evidence.
- Traditional analyses of prognostic or predictive
factors, using non analytically validated assays
on a convenience sample of tissues and conducted
in an exploratory and unfocused manner, provide a
very low level of evidence for clinical utility.
97Use of Archived Specimens in Evaluation of
Prognostic and Predictive Biomarkers. Richard M.
Simon, Soonmyung Paik and Daniel F. Hayes
- For Level I Evidence
- (i) Archived tissue adequate for a successful
assay must be available on a sufficiently large
number of patients from a phase III trial that
the appropriate analyses have adequate
statistical power and that the patients included
in the evaluation are clearly representative of
the patients in the trial.
- (ii) The test should be analytically and
pre-analytically validated for use with archived
tissue.
- (iii) The analysis plan for the biomarker
evaluation should be completely specified in
writing prior to the performance of the biomarker
assays on archived tissue and should be focused
on evaluation of a single completely defined
classifier.
- (iv) The results from archived specimens should be
validated using specimens from a similar, but
separate, study.
98Factor, by study category A, B, C, D
Clinical trial
- A: PRCT designed to address tumor marker
- B: Prospective trial not designed to address tumor marker, but design accommodates tumor marker utility. Accommodation of predictive marker requires PRCT
- C: Prospective observational registry, treatment and followup not dictated
- D: No prospective aspect to study
Patients and patient data
- A: Prospectively enrolled, treated, and followed in RCT
- B: Prospectively enrolled, treated, and followed in clinical trial and, especially if a predictive utility is considered, a PRCT addressing the treatment of interest
- C: Prospectively enrolled in registry, but treatment and followup standard of care
- D: No prospective stipulation of treatment or followup; patient data collected by retrospective chart review
Specimen collection, processing, and archival
- A: Specimens collected, processed and assayed for specific marker in real time
- B: Specimens collected, processed, and archived prospectively using generic SOPs. Assayed after trial completed
- C: Specimens collected, processed, and archived prospectively using generic SOPs. Assayed after trial completed
- D: Specimens collected, processed and archived with no prospective SOPs
Statistical design and analysis
- A: Study powered to address tumor marker question.
- B: Study powered to address therapeutic question; underpowered to address tumor marker question. Focused analysis plan for marker question developed prior to doing assays
- C: Study not prospectively powered at all. Retrospective study design confounded by selection of specimens for study. Focused analysis plan for marker question developed prior to doing assays
- D: Study not prospectively powered at all. Retrospective study design confounded by selection of specimens for study. No focused analysis plan for marker question developed prior to doing assays
Validation
- A: Result unlikely to be play of chance. Although preferred, validation not required
- B: Result more likely to be play of chance than A, but less likely than C. Requires one or more validation studies
- C: Result very likely to be play of chance. Requires subsequent validation studies
- D: Result very likely to be play of chance. Requires subsequent validation
Terminology
- A: Prospective
- B: Prospective using archived samples
- C: Prospective/observational
- D: Retrospective/observational
99Revised Levels of Evidence for Tumor Marker
Studies
Level of Evidence   Category from Table 1   Validation Studies Available
I                   A                       None required
I                   B                       One or more with consistent results
II                  B                       None or inconsistent results
II                  C                       2 or more with consistent results
III                 C                       None or 1 with consistent results or inconsistent results
IV-V                D                       NA
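The table above amounts to a small lookup from study category and validation status to level of evidence. A minimal sketch of that mapping follows; the function name and its arguments are illustrative, and treating any inconsistent validation result as overriding the count of consistent ones is an assumption about how the rows combine.

```python
def level_of_evidence(category, consistent_validations=0, inconsistent=False):
    """Map a study category (A-D) and validation status to a level of evidence."""
    if category == "A":
        return "I"  # no validation required
    if category == "B":
        return "I" if consistent_validations >= 1 and not inconsistent else "II"
    if category == "C":
        return "II" if consistent_validations >= 2 and not inconsistent else "III"
    if category == "D":
        return "IV-V"
    raise ValueError(f"unknown category: {category!r}")


print(level_of_evidence("B", consistent_validations=1))
print(level_of_evidence("C", consistent_validations=1))
```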
100New Paradigms for Study Design and Analysis for
Prediction
- Developments in biotechnology have forced
statisticians to focus on prediction problems
- This has led to many exciting methodological
developments
- p >> n problems in which number of genes is much
greater than the number of cases
- Statistics has over-focused on inference. Many of
the methods and much of the conventional wisdom
of statistics are based on inference problems and
are not applicable to prediction problems
101Some statisticians believe that accurate
prediction is not possible for p >> n
- Accurate prediction is often possible, but
standard statistical methods for model building
and evaluation are not effective
- Much of the conventional wisdom about how to
develop and evaluate regression models is flawed
when applied to prediction
102- p > n prediction problems are not multiple
comparison problems
- Feature selection should be optimized for
accurate prediction, not for controlling the
false discovery rate
- Standard statistical methods for model building
and evaluation are not effective
- e.g. Fisher's LDA vs diagonal LDA
- Model performance on the training set is
extremely misleading for p > n problems and should
never be reported
- Inadequate focus on selecting samples based on
pre-defined intended use of model
103- Goodness of fit is not a proper measure of
predictive accuracy
- Odds ratios, hazard ratios and statistical
significance of regression coefficients are not
proper measures of predictive accuracy
104- Validation of a predictive model means that the
model predicts accurately for independent data
- Validation does not mean that the model is stable
or that using the same algorithm on independent
data will give a similar model
105Prediction Based Clinical Trials
- Using cross-validation we can evaluate new
methods for analysis of clinical trials in terms
of their intended use which is informing
therapeutic decision making
106- fj(x) = probability of response for patient with
covariate vector x who receives treatment j
107Single Hypothesis Testing Based Decision Making
in an RCT
- Test H0: Ex[fT(x)] = Ex[fC(x)]
- or fT(x) = fC(x) for all x
- If you reject H0 then treat future patients with
T, otherwise treat future patients with C
108Other Approaches
109Predicting the Effect of Analysis Methods on
Patient Outcome
- At the conclusion of the trial randomly partition
the patients into 10 equally sized sets P1, ...,
P10
- Let D-i denote the full dataset minus data for
patients in Pi
- Using 10-fold complete cross-validation, omit
patients in Pi
- Analyze trial using only data in D-i with both
the standard analysis and the alternative
analysis
110- For each patient j in Pi record the
cross-validated treatment recommendations based
on D-i
111- Let ST denote the set of cases for which the
standard analysis recommends C and the
alternative analysis recommends T
- Let SC denote the set of cases for which the
standard analysis recommends T and the
alternative analysis recommends C
112- For patients in ST compare outcomes for patients
who received T versus those who received C
- For patients in SC compare outcomes for patients
who received T versus those who received C
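The cross-validated comparison of two analysis methods described on slides 109 to 112 can be sketched as follows. Everything specific here is an assumption for illustration: the simulated patients, the two toy analysis rules (one recommending a single arm for everyone, one choosing the better arm within each marker group), and the function names. Only the structure follows the slides: recommendations for each patient come from a fit that excluded that patient, and outcomes are then compared by the arm actually received within ST and SC.

```python
import random


def response_rate(rows, arm):
    """Observed response rate among rows that received the given arm."""
    outcomes = [p["response"] for p in rows if p["arm"] == arm]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def standard_rule(training):
    """Toy 'standard analysis': recommend one arm for all future patients."""
    best = "T" if response_rate(training, "T") > response_rate(training, "C") else "C"
    return lambda patient: best


def alternative_rule(training):
    """Toy 'alternative analysis': pick the better arm within each marker group."""
    best = {}
    for m in (0, 1):
        rows = [p for p in training if p["marker"] == m]
        best[m] = "T" if response_rate(rows, "T") > response_rate(rows, "C") else "C"
    return lambda patient: best[patient["marker"]]


def discordant_sets(patients, k=10, seed=0):
    """Return indices of ST (standard says C, alternative says T) and SC (reverse)."""
    order = list(range(len(patients)))
    random.Random(seed).shuffle(order)
    folds = [order[i::k] for i in range(k)]
    s_t, s_c = [], []
    for fold in folds:
        held_out = set(fold)
        training = [p for i, p in enumerate(patients) if i not in held_out]
        std, alt = standard_rule(training), alternative_rule(training)
        for i in fold:
            pair = (std(patients[i]), alt(patients[i]))
            if pair == ("C", "T"):
                s_t.append(i)
            elif pair == ("T", "C"):
                s_c.append(i)
    return s_t, s_c


# Simulated trial (hypothetical): T helps only marker-positive patients.
rng = random.Random(2)
patients = []
for _ in range(400):
    arm, marker = rng.choice("TC"), rng.randint(0, 1)
    p = 0.7 if (arm == "T" and marker == 1) else 0.3
    patients.append({"arm": arm, "marker": marker, "response": int(rng.random() < p)})

s_t, s_c = discordant_sets(patients)
for name, idx in (("ST", s_t), ("SC", s_c)):
    rows = [patients[i] for i in idx]
    print(name, len(rows), response_rate(rows, "T"), response_rate(rows, "C"))
```

Because treatment was randomized, the comparison of T versus C within each discordant set remains a randomized comparison.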
113- Hence, alternative methods for analyzing RCTs
can be evaluated in an unbiased manner with
regard to their value to patients using the
actual RCT data
114Conclusions
- New biotechnology and knowledge of tumor biology
provide important opportunities to improve
therapeutic decision making
- Treatment of broad populations with regimens that
do not benefit most patients is increasingly
neither necessary nor economically sustainable
- The established molecular heterogeneity of human
diseases requires the use of new approaches to the
development and evaluation of therapeutics
116Conclusions
- Some of the conventional wisdom about statistical
analysis of clinical trials is not applicable to
trials dealing with co-development of drugs and
diagnostics
- e.g. not performing subset analysis if the overall
results are not significant, or if an interaction
test is not significant, or if the randomization
was not stratified by the subsetting variable
117Conclusions
- Can we develop new drugs in a manner more
consistent with modern tumor biology and obtain
reliable actionable information about what
regimens work for what kinds of patients?
- The information does not have to be perfect to be
much better than what we currently have
118Conclusions
- Co-development of drugs and companion diagnostics
increases the complexity of drug development
- It does not make drug development simpler,
cheaper and quicker
- But it may make development more successful and
it has great potential value for patients and for
the economics of health care