Predictive Analysis of Clinical Trials

About This Presentation

Title:

Predictive Analysis of Clinical Trials

Description:

Richard Simon, D.Sc., Chief, Biometric Research Branch, National Cancer Institute, http://brb.nci.nih.gov (PowerPoint PPT presentation)

Slides: 112

Provided by: rsi9

Learn more at: https://brb.nci.nih.gov

Transcript and Presenter's Notes

1
Predictive Analysis of Clinical Trials

Richard Simon, D.Sc.
Chief, Biometric Research Branch
National Cancer Institute
http://brb.nci.nih.gov

2
Biomarker = Biological Measurement

Early detection biomarker
Endpoint biomarker
Prognostic biomarkers
Predictive biomarkers

3
Kinds of Biomarkers

Endpoint
Measured before, during and after treatment to
monitor pace of disease and treatment effect
Pharmacodynamic (phase 0-1)
Does drug hit target
Intermediate response (phase 2)
Does drug have anti-tumor effect
Surrogate for clinical outcome (phase 3)

Prognostic biomarkers
Measured before treatment to indicate long-term
outcome for patients untreated or receiving
standard treatment
May reflect both disease aggressiveness and
effect of standard treatment
Used to determine who needs more intensive
treatment
Predictive biomarkers
Measured before treatment to identify who will
benefit from a particular treatment

5
Validation = Fitness for Intended Use
6

Single gene or protein measurement
Scalar index or classifier that summarizes
contributions of multiple genes

7
Prognostic & Predictive Biomarkers in Genomic
Oncology

Many cancer treatments benefit only a minority of
patients to whom they are administered
Being able to predict which patients are likely
to benefit can
Help patients get an effective treatment
Help control medical costs
Improve the success rate of clinical drug
development

8
Biomarker Validity

Analytical validity
Measures what it's supposed to
Reproducible and robust
Clinical validity (correlation)
It correlates with something clinically
Medical utility
Actionable resulting in patient benefit

9
Clinical Utility

Prognostic and predictive biomarkers have utility
if they are actionable for informing treatment
decisions in a manner that results in patient
benefit

10
Clinical Utility

Biomarker benefits patient by improving treatment
decisions
Identify patients who have very good prognosis on
standard treatment and do not require more
intensive regimens
Identify patients who are likely or unlikely to
benefit from a specific regimen

11
Prognostic markers

There is an enormous published literature on
prognostic markers in cancer.
Very few prognostic markers (factors) are
recommended for measurement by ASCO, are approved
by FDA or are reimbursed for by payers. Very few
play a role in treatment decisions.

12
Pusztai et al., The Oncologist 8:252-8, 2003

939 articles on prognostic markers or
prognostic factors in breast cancer in past 20
years
ASCO guidelines only recommend routine testing
for ER, PR and HER-2 in breast cancer
With the exception of ER or progesterone
receptor expression and HER-2 gene amplification,
there are no clinically useful molecular
predictors of response to any form of anticancer
therapy.

13
Prognostic Factors in Oncology

Most prognostic factors are not used because they
are not therapeutically relevant
Most prognostic factor studies are not conducted
with an intended use clearly in mind
They use a convenience sample of patients for
whom tissue is available.
Generally the patients are too heterogeneous to
support therapeutically relevant conclusions
There is rarely a validation study separate from
the developmental study that addresses medical
utility
An analytically validated test is rarely developed

Prognostic factors for such a heterogeneous group
of patients are not actionable, i.e. they do not
help with treatment decision making.

16
Major problems with prognostic studies of gene
expression signatures

Inadequate focus on intended use
Cases selected based on availability of specimens
rather than for relevance to intended use
Heterogeneous sample of patients with mixed
stages and treatments. Attempt to disentangle
effects using regression modeling
Too great a focus on which marker is prognostic
or independently prognostic, not on whether the
marker is effective for intended use

17
If you don't know where you are going, you might
not get there. (Yogi Berra)
18
Prognostic Biomarkers Can be Therapeutically
Relevant

<10% of node-negative ER+ breast cancer patients
require or benefit from the cytotoxic
chemotherapy that they receive

19
OncotypeDx Recurrence Score

Intended use
Patients with node negative estrogen receptor
positive breast cancer who are going to receive
an anti-estrogen drug following local
surgery/radiotherapy
Identify patients who have such good prognosis
that they are unlikely to derive much benefit
from adjuvant chemotherapy

Selected patients relevant for the intended use
Analyzed the data to see if the recurrence score
identified a subset with such good prognosis that
the absolute benefit of chemotherapy would at
best be very small

21
Biotechnology Has Forced Biostatistics to Focus
on Prediction

This has led to many exciting methodological
developments
$p \gg n$ problems, in which the number of genes is much
greater than the number of cases
And many erroneous publications
And growing pains in transitioning from an
over-dependence on inference
Many of the methods and much of the conventional
wisdom of statistics are based on inference
problems and are not applicable to prediction
problems

Goodness of fit is not a proper measure of
predictive accuracy
Odds ratios and hazards ratios are not proper
measures of prediction accuracy
Statistical significance of regression
coefficients is not a proper measure of
predictive accuracy

23
Goodness of Fit vs Prediction Accuracy

Fit of a model to the same data used to develop
it is no evidence of prediction accuracy for
independent data
"Prediction is difficult, particularly about the
future."
Dan Quayle or Niels Bohr?

25
Prediction on Simulated Null Data (Simon et al., J
Natl Cancer Inst 95:14, 2003)

Generation of Gene Expression Profiles
20 specimens ($P_i$ is the expression profile for
specimen $i$)
Log-ratio measurements on 6000 genes
$P_i \sim \mathrm{MVN}(0, I_{6000})$
Can we distinguish between the first 10
specimens (Class 1) and the last 10 (Class 2)?
Prediction Method
Compound covariate predictor built from the
log-ratios of the 10 most differentially
expressed genes.
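Below is a minimal Python sketch of this null simulation using NumPy. It is not the code used in the paper. It rests on several assumptions. The compound covariate predictor is implemented as the sum of the selected genes' log-ratios, each weighted by its two-sample t statistic. The cut-point sits at the midpoint of the two class means of that score. The seed and the helper names (`t_stats`, `fit_ccp`, `predict_ccp`) are hypothetical. When the predictor is evaluated on the same 20 specimens used to build it (resubstitution), it typically shows near-zero error, even though the data contain no class signal.

```python
import numpy as np

def t_stats(X, y):
    """Pooled-variance two-sample t statistic for every column of X (class 1 minus class 0)."""
    a, b = X[y == 1], X[y == 0]
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(axis=0, ddof=1) + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1 / na + 1 / nb))

def fit_ccp(X, y, n_genes=10):
    """Compound covariate predictor on the n_genes genes with the largest |t|."""
    t = t_stats(X, y)
    genes = np.argsort(-np.abs(t))[:n_genes]
    weights = t[genes]
    score = X[:, genes] @ weights
    # Cut-point: midpoint between the mean scores of the two classes.
    cut = (score[y == 1].mean() + score[y == 0].mean()) / 2
    return genes, weights, cut

def predict_ccp(model, X):
    genes, weights, cut = model
    return (X[:, genes] @ weights > cut).astype(int)

rng = np.random.default_rng(2003)          # hypothetical seed
X = rng.standard_normal((20, 6000))        # P_i ~ MVN(0, I_6000): no real signal
y = np.array([1] * 10 + [0] * 10)          # first 10 = Class 1, last 10 = Class 2

model = fit_ccp(X, y)
resub_error = np.mean(predict_ccp(model, X) != y)
print(f"resubstitution error on null data: {resub_error:.2f}")
```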

27
Cross Validation

Cross-validation simulates the process of
separately developing a model on one set of data
and predicting for a test set of data not used in
developing the model
The cross-validated estimate of misclassification
error is an estimate of the prediction error for
the model fit to the full dataset using the
specified algorithm

28
Cross-validation Estimate of Prediction Error
29

Cross validation is only valid if the test set is
not used in any way in the development of the
model. Using the complete set of samples to
select genes violates this assumption and
invalidates cross-validation.
With proper cross-validation, the model must be
developed from scratch for each leave-one-out
training set. This means that feature selection
must be repeated for each leave-one-out training
set.
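The difference between improper and proper leave-one-out cross-validation can be sketched in Python with NumPy. The sketch is illustrative and self-contained. It uses hypothetical helper names and the same assumed t-weighted compound covariate predictor.

- With `select_inside=False`, the 10 genes are chosen once from all 20 specimens. This violates the rule stated above.
- With `select_inside=True`, gene selection is repeated on every leave-one-out training set.

On pure-noise data, the first version typically reports a misclassification rate near zero. The second typically reports a rate near the 50% expected from guessing.

```python
import numpy as np

def fit_ccp(X, y, n_genes=10):
    """t-weighted compound covariate predictor on the n_genes genes with the largest |t|."""
    a, b = X[y == 1], X[y == 0]
    sp2 = ((len(a) - 1) * a.var(axis=0, ddof=1) + (len(b) - 1) * b.var(axis=0, ddof=1)) / (len(y) - 2)
    t = (a.mean(axis=0) - b.mean(axis=0)) / np.sqrt(sp2 * (1 / len(a) + 1 / len(b)))
    genes = np.argsort(-np.abs(t))[:n_genes]
    score = X[:, genes] @ t[genes]
    cut = (score[y == 1].mean() + score[y == 0].mean()) / 2
    return genes, t[genes], cut

def predict_ccp(model, X):
    genes, weights, cut = model
    return (X[:, genes] @ weights > cut).astype(int)

def loocv_error(X, y, select_inside=True, n_genes=10):
    n = len(y)
    # Improper version: genes chosen once from ALL samples, including each future test sample.
    fixed = None if select_inside else fit_ccp(X, y, n_genes)[0]
    errors = 0
    for i in range(n):
        train = np.arange(n) != i
        if select_inside:
            # Proper version: gene selection is repeated on each leave-one-out training set.
            model = fit_ccp(X[train], y[train], n_genes)
        else:
            # Weights and cut-point are refit, but only on the pre-selected genes.
            sub_genes, weights, cut = fit_ccp(X[train][:, fixed], y[train], n_genes)
            model = (fixed[sub_genes], weights, cut)
        errors += int(predict_ccp(model, X[i:i + 1])[0] != y[i])
    return errors / n

rng = np.random.default_rng(2003)          # hypothetical seed
X = rng.standard_normal((20, 6000))        # null data: 20 specimens x 6000 genes of pure noise
y = np.array([1] * 10 + [0] * 10)
err_wrong = loocv_error(X, y, select_inside=False)
err_right = loocv_error(X, y, select_inside=True)
print(f"genes selected outside CV: {err_wrong:.2f}")
print(f"genes selected inside CV:  {err_right:.2f}")
```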

30
Predictive Biomarkers

Cancers of a primary site often represent a
heterogeneous group of diverse molecular entities
which vary fundamentally with regard to
the oncogenic mutations that cause them
their responsiveness to specific drugs

31
Most cancer treatments benefit only a minority of
patients to whom they are administered

Being able to predict who requires intensive
treatment and who is likely to benefit from which
treatments could
save patients from unnecessary debilitating
adverse effects of treatments that they don't
need or benefit from
enhance their chance of receiving a treatment
that helps them
Help control medical costs
Improve the success rate of clinical drug
development

In most positive phase III clinical trials
comparing a new treatment to control, most of the
patients treated with the new treatment did not
benefit.
Adjuvant breast cancer: 70% long-term
disease-free survival on control, 80%
disease-free survival on new treatment. 70% of
patients don't need the new treatment. Of the
remaining 30%, only 1/3 benefit.
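The arithmetic behind this example is shown below. It assumes, as the slide implicitly does, that the new treatment does not harm patients who would have been disease-free on control. Under that assumption, the 10-point gain in disease-free survival reflects 10 of every 100 patients benefiting.

```latex
\begin{aligned}
\text{DFS on control} = 70\% &\;\Rightarrow\; 70 \text{ of every } 100 \text{ patients do not need the new treatment} \\
\text{DFS gain on new treatment} = 80\% - 70\% = 10\% &\;\Rightarrow\; 10 \text{ of every } 100 \text{ patients benefit} \\
\text{benefit among the remaining } 30 &= 10/30 = 1/3
\end{aligned}
```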

33
Predictive Biomarkers

Estrogen receptor over-expression in breast
cancer
Anti-estrogens, aromatase inhibitors
HER2 amplification in breast cancer
Trastuzumab, Lapatinib
OncotypeDx gene expression recurrence score in
breast cancer
Low score for ER+, node-negative → no chemotherapy
KRAS in colorectal cancer
WT KRAS: cetuximab or panitumumab
EGFR mutation in NSCLC
EGFR inhibitor
V600E mutation in BRAF of melanoma
vemurafenib
ALK translocation in NSCLC
crizotinib

34
Standard Paradigm of Broad Eligibility Phase III
Clinical Trials Sometimes Leads to

Treating many patients with few benefiting
Small average treatment effects
Problematic for health care economics
Inconsistency in results among studies
False negative studies

35
The standard approach to designing phase III
clinical trials is based on two assumptions

Qualitative treatment by subset interactions are
unlikely
Costs of over-treatment are less than costs
of under-treatment

Oncology therapeutics development is now focused
on molecularly targeted drugs that are only
expected to be effective in a subset of patients
whose tumors are driven by the molecular targets
Most new cancer drugs are very expensive
The "aspirin paradigm" on which some current
clinical trial dogma is based is a roadblock to
progress

37
Subset Analysis

In the past, often studied as unfocused post-hoc
analyses
Numerous subsets examined
Same data used to define subsets for analysis and
for comparing treatments within subsets
No control of type I error
Led to conventional wisdom
Only hypothesis generation
Only valid if overall treatment difference is
significant
Only valid if there is a significant treatment by
subset interaction

Neither current practices of subset analysis nor
current practices of ignoring differences in
treatment effect among patients are effective for
evaluating treatments where qualitative
interactions are likely or for informing labeling
indications

Although the randomized clinical trial remains of
fundamental importance for predictive genomic
medicine, some of the conventional wisdom of how
to design and analyze RCTs requires
re-examination
The concept of doing an RCT of thousands of
patients to answer a single question about
average treatment effect for a target population
presumed homogeneous with regard to the direction
of treatment efficacy in many cases no longer has
an adequate scientific basis

How can we develop new drugs in a manner more
consistent with modern tumor biology and obtain
reliable information about what regimens work for
what kinds of patients?

41
Development is Most Efficient When the Scientific
Basis for the Clinical Trial is Strong

Having an important molecular target
Having a drug that can inhibit the target in an
overwhelming proportion of tumor cells at an
achievable concentration
Having a pre-treatment assay that can identify
the patients for whom the molecular target is
driving progression of disease

42
When the Biology is Clear

Develop a classifier that identifies the patients
likely (or unlikely) to benefit from the new drug
Classifier is based on either a single
gene/protein or composite score
Develop an analytically validated test
Measures what it should accurately and
reproducibly
Design a focused clinical trial to evaluate
effectiveness of the new treatment in test+
patients

43
Using phase II data, develop predictor of
response to new drug
Targeted (Enrichment) Design
45
Evaluating the Efficiency of Targeted Design

Simon R and Maitournam A. Evaluating the
efficiency of targeted designs for randomized
clinical trials. Clinical Cancer Research
10:6759-63, 2004; Correction and supplement
12:3229, 2006
Maitournam A and Simon R. On the efficiency of
targeted clinical trials. Statistics in Medicine
24:329-339, 2005.

Relative efficiency of targeted design depends on
proportion of patients test positive
specificity of treatment effect for test positive
patients
When less than half of patients are test positive
and the drug has minimal benefit for test
negative patients, the targeted design requires
dramatically fewer randomized patients than the
standard design in which the marker is not used

47
Two Clinical Trial Designs

Standard design
Randomized comparison of new drug E to control C
without the test for screening patients
Targeted design
Test patients
Randomize only test+ patients
Treatment effect $\Delta_+$ in test+ patients
Treatment effect $\Delta_-$ in test- patients
Proportion of patients test+ is $p$
Size each design to have power 0.9 and
significance level 0.05

48
RandRat = $n_{\text{untargeted}} / n_{\text{targeted}}$

If $\Delta_- = 0$, RandRat $= 1/p^2$
if $p = 0.5$, RandRat $= 4$
If $\Delta_- = \Delta_+/2$, RandRat $= 4/(p+1)^2$
if $p = 0.5$, RandRat $= 16/9 \approx 1.78$
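These ratios follow from two simplifying assumptions. First, the number of randomized patients needed is inversely proportional to the square of the treatment effect being detected. Second, the effect in the unselected population is the prevalence-weighted average $p\Delta_+ + (1-p)\Delta_-$.

The small Python sketch below works under those assumptions. `rand_ratio` is a hypothetical helper. It ignores differences in outcome variance between the subsets and the cost of screening patients with the test.

```python
def rand_ratio(p, delta_plus, delta_minus):
    """Ratio of randomized patients, untargeted / targeted design (simplified sketch).

    Assumes required sample size is proportional to 1 / effect**2 and that the effect
    in the unselected population is p * delta_plus + (1 - p) * delta_minus.
    """
    overall_effect = p * delta_plus + (1 - p) * delta_minus
    return (delta_plus / overall_effect) ** 2

print(rand_ratio(0.5, 1.0, 0.0))   # no benefit in test- patients: 1 / p**2 -> 4.0
print(rand_ratio(0.5, 1.0, 0.5))   # half the benefit in test- patients: 4 / (p + 1)**2, about 1.78
```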

49
Comparing T vs C on Survival or DFS: 5% 2-sided
Significance and 90% Power
Reduction in Hazard | Number of Events Required
25% | 509
30% | 332
35% | 227
40% | 162
45% | 118
50% | 88
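The event counts in this table agree, to within one event, with Schoenfeld's approximation for a 1:1 randomized log-rank comparison:

$d = 4(z_{1-\alpha/2} + z_{1-\beta})^2 / (\ln \mathrm{HR})^2$

The slide does not say which formula was used. Treat this Python sketch (standard library only; `events_required` is a hypothetical helper) as a way to reproduce the numbers, not as the author's method.

```python
from math import ceil, log
from statistics import NormalDist

def events_required(hazard_reduction, alpha=0.05, power=0.90):
    """Schoenfeld approximation: events for a 1:1 randomized log-rank test, two-sided alpha."""
    z = NormalDist().inv_cdf
    hazard_ratio = 1 - hazard_reduction
    return ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hazard_ratio) ** 2)

slide_table = {0.25: 509, 0.30: 332, 0.35: 227, 0.40: 162, 0.45: 118, 0.50: 88}
for reduction, slide_value in slide_table.items():
    print(f"{reduction:.0%}: {events_required(reduction)} events (slide: {slide_value})")
```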
50

Hazard ratio 0.60 for test+ patients
40% reduction in hazard
Hazard ratio 1.0 for test- patients
0% reduction in hazard
33% of patients test positive
Hazard ratio for unselected population is
$0.33 \times 0.60 + 0.67 \times 1.0 \approx 0.87$
13% reduction in hazard

To have 90% power for detecting a 40% reduction in
hazard within a biomarker positive subset
Number of events within subset: 162
To have 90% power for detecting a 13% reduction in
hazard overall
Number of events: 2172
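The arithmetic of this example can be sketched with the same Schoenfeld approximation. As on the slide, the overall hazard ratio is the prevalence-weighted average of the subset hazard ratios, rounded to 0.87 before events are computed. This averaging is a rough approximation, not an exact property of hazard ratios.

The Python below uses a hypothetical helper and is not the author's code. It gives 162 events for the subset and roughly 2,170 events overall, close to the slide's 2172. The small difference reflects rounding.

```python
from math import ceil, log
from statistics import NormalDist

def events_required(hazard_ratio, alpha=0.05, power=0.90):
    """Schoenfeld approximation for a 1:1 randomized log-rank test (two-sided alpha)."""
    z = NormalDist().inv_cdf
    return ceil(4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hazard_ratio) ** 2)

p_pos, hr_pos, hr_neg = 0.33, 0.60, 1.0
# Slide's approximation: prevalence-weighted average of the subset hazard ratios.
hr_overall = p_pos * hr_pos + (1 - p_pos) * hr_neg
n_subset = events_required(hr_pos)
n_overall = events_required(round(hr_overall, 2))   # rounded to 0.87 as on the slide
print(f"overall HR = {hr_overall:.3f}")
print(f"events needed in marker-positive subset: {n_subset}")
print(f"events needed in unselected population:  {n_overall}")
```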

52
Trastuzumab (Herceptin)

Metastatic breast cancer
234 randomized patients per arm
90% power for 13.5% improvement in 1-year
survival over 67% baseline at 2-sided .05 level
If benefit were limited to the 25% test+
patients, overall improvement in survival would
have been 3.375%
4025 patients/arm would have been required

53
Web Based Software for Planning Clinical Trials
of Treatments with a Candidate Predictive
Biomarker

http://brb.nci.nih.gov

55
Principle

If a drug is found safe and effective in a
defined patient population, approval should not
depend on finding the drug ineffective in some
other population

56
Implications for Early Phase Studies

Need to design and size early phase studies to
discover an effective predictive biomarker for
identifying the correct target population
Need to establish an analytically validated test
for measuring the predictive marker in the phase
III pivotal studies

57
When the drug is specific for one target and the
biology is well understood

May need to evaluate several candidate tests
e.g. protein expression of target or
amplification of gene
Need to decide whether to include test negative
patients in phase II trials
Phase II trials sized for adequate numbers of
test positive patients

58
When the drug has several targets or the biology
is not well understood

Should biologically characterize tumors for all
patients on phase II studies with regard to
candidate targets and response moderators
Phase II trials sized for evaluating candidates
Opportunity for sequential and adaptive designs
to improve efficiency

59
Empirical screening of expression profiles or
mutations to develop predictive marker

Larger sample size required
Dobbin, Zhao, Simon, Clin Cancer Res 14:108,
2008.
Use of archived samples from previous negative
phase III trial
Use of large disease specific panel of
molecularly characterized human tumor cell lines
to identify predictive marker

61
Stratification Design / Interaction Design
62
Develop prospective analysis plan for evaluation
of treatment effect and how it relates to
biomarker

Defined analysis plan that protects type I error
Trial sized for evaluating treatment effect in
test+ and test- subsets
Test negative patients should be adequately
protected using interim futility analysis

63
Fallback Analysis Plan

Test average treatment effect at reduced level p0
(e.g. .01)
If significant, claim broad effectiveness
If overall effect is not significant, test
treatment effect in marker subset at level
.05 - p0
If significant, claim effectiveness for marker
subset
Test of marker subset should require neither
Overall significance nor
Significant interaction
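The decision rule of this fallback plan can be written as a small function. This is a sketch with hypothetical names. `p0` is the reduced level for the overall test, as on the slide.

Splitting .05 into `p0` and `.05 - p0` keeps the chance of any false claim at or below .05 under the global null hypothesis. This follows from a Bonferroni-type argument.

```python
def fallback_analysis(p_overall, p_subset, alpha=0.05, p0=0.01):
    """Sketch of the fallback plan; inputs are two-sided p-values.

    Testing the overall effect at p0 and the marker subset at alpha - p0 keeps the
    probability of any false claim at or below alpha under the global null (Bonferroni).
    """
    if p_overall <= p0:
        return "broad effectiveness"
    # The subset test requires neither overall significance nor a significant interaction.
    if p_subset <= alpha - p0:
        return "effectiveness in marker subset"
    return "no claim"

print(fallback_analysis(p_overall=0.005, p_subset=0.001))  # broad effectiveness
print(fallback_analysis(p_overall=0.09, p_subset=0.02))    # effectiveness in marker subset
print(fallback_analysis(p_overall=0.03, p_subset=0.045))   # no claim
```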

64
Sample size for Analysis Plan

To have 90% power for detecting a uniform 33%
reduction in overall hazard at the 1% two-sided
level requires 370 events.
If 33% of patients are positive, then when there
are 370 total events there will be approximately
123 events in positive patients
123 events provides 90% power for detecting a 45%
reduction in hazard at a 4% two-sided
significance level.
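The numbers on this slide can be checked with the Schoenfeld approximation. The check assumes that marker-positive patients contribute about 33% of the events, i.e. similar event rates in the two subsets. Solving the formula for the hazard ratio detectable with $d$ events gives:

$\mathrm{HR} = \exp\{-2(z_{1-\alpha/2} + z_{1-\beta})/\sqrt{d}\}$

A Python sketch with hypothetical helper names follows.

```python
from math import exp, log, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf

def events_required(hazard_ratio, alpha, power=0.90):
    """Schoenfeld approximation for a 1:1 randomized log-rank test (two-sided alpha)."""
    return 4 * (z(1 - alpha / 2) + z(power)) ** 2 / log(hazard_ratio) ** 2

def detectable_hazard_ratio(events, alpha, power=0.90):
    """Hazard ratio detectable with the given number of events (Schoenfeld, solved for HR)."""
    return exp(-2 * (z(1 - alpha / 2) + z(power)) / sqrt(events))

total_events = events_required(hazard_ratio=0.67, alpha=0.01)   # uniform 33% reduction
subset_events = 0.33 * total_events    # assumes similar event rates in both subsets
hr_subset = detectable_hazard_ratio(subset_events, alpha=0.04)
print(f"total events: {total_events:.0f}")
print(f"events in marker-positive patients: {subset_events:.0f}")
print(f"detectable reduction in hazard: {1 - hr_subset:.0%}")
```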

68
Bayesian Two-Stage Design: RCT With Single Binary
Marker
69
The Biology is Often Not So Clear

Cancer biology is complex and it is not always
possible to have the right single predictive
classifier identified with an appropriate
cut-point by the time the phase 3 trial of a new
drug is ready to start accrual

70
The Objectives of a Phase III Clinical Trial

Test the strong null hypothesis that the test
treatment is uniformly ineffective compared to
control for primary endpoint
If the null hypothesis is rejected, develop a
labeling indication for informing physicians in
their decisions about which patients they treat
with the drug.

The test of the null hypothesis of no average
treatment effect is not necessarily a good test
of the strong null hypothesis that the new
treatment is uniformly ineffective
Rejection of the null hypothesis is not in itself
adequate information for guiding physicians on
how to use the treatment

72
Biomarker Selection Design

Based on Adaptive Threshold Design
W Jiang, B Freidlin, R Simon
JNCI 99:1036-43, 2007

73
Biomarker Selection Design

Have identified $K$ candidate biomarkers $B_1, \ldots, B_K$
thought to be predictive of patients likely to
benefit from T relative to C
Cut-points not necessarily established for each
biomarker
Eligibility not restricted by candidate markers

74
Marker Selection Design
75

Compute $p = \min\{p_1, p_2, \ldots, p_K\}$
Compute whether the value of $p$ is statistically
significant when adjusted for multiple testing
Adjust for multiple testing by permuting the
treatment labels and re-calculating $p_1, \ldots, p_K$ and $p$
for the permuted treatment labels
Repeat for 10,000 random permutations to
approximate the null distribution of p
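Below is a sketch of this permutation procedure in Python (NumPy plus the standard library). It rests on assumptions the slide does not state:

- The endpoint is a binary response rather than survival.
- The markers are already dichotomized.
- Each marker-positive subset is analyzed with a two-proportion z-test comparing E with C.

The function names and the simulated data are hypothetical. The adjusted p-value is the fraction of permutations whose minimum p-value is at least as small as the observed one, with the usual +1 correction.

```python
import numpy as np
from math import erfc, sqrt

def subset_pvalues(resp, treat, markers):
    """Two-sided two-proportion z-test p-values comparing E (treat == 1) with C (treat == 0)
    within each marker-positive subset. resp: (n,) 0/1; treat: (n,) 0/1; markers: (n, K) 0/1."""
    pvals = np.ones(markers.shape[1])
    for k in range(markers.shape[1]):
        pos = markers[:, k] == 1
        e, c = resp[pos & (treat == 1)], resp[pos & (treat == 0)]
        if len(e) == 0 or len(c) == 0:
            continue
        pooled = np.concatenate([e, c]).mean()
        se = sqrt(pooled * (1 - pooled) * (1 / len(e) + 1 / len(c)))
        if se > 0:
            z = (e.mean() - c.mean()) / se
            pvals[k] = erfc(abs(z) / sqrt(2))   # two-sided normal p-value
    return pvals

def min_p_permutation_test(resp, treat, markers, n_perm=10_000, seed=0):
    """Observed minimum p over the K markers and its permutation-adjusted p-value."""
    rng = np.random.default_rng(seed)
    p_min = subset_pvalues(resp, treat, markers).min()
    null = np.array([subset_pvalues(resp, rng.permutation(treat), markers).min()
                     for _ in range(n_perm)])
    return p_min, (1 + np.sum(null <= p_min)) / (1 + n_perm)

# Hypothetical data: 400 patients, 4 binary markers, E helps only marker-1-positive patients.
rng = np.random.default_rng(1)
n = 400
markers = (rng.random((n, 4)) < 0.33).astype(int)
treat = rng.integers(0, 2, n)
prob = np.where((treat == 1) & (markers[:, 0] == 1), 0.70, 0.20)
resp = (rng.random(n) < prob).astype(int)
p_min, p_adjusted = min_p_permutation_test(resp, treat, markers, n_perm=2_000)
print(f"min p = {p_min:.2g}, permutation-adjusted p = {p_adjusted:.4f}")
```

The example uses 2,000 permutations to keep the run short. The slide's 10,000 is the function's default.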

To detect a 40% reduction in hazard in an
a-priori defined subset with 90% power and a 4%
two-sided significance level requires 171 events
in the subset.
To adjust for multiplicity with 4 independent
binary tests, 171 → 224.
If 33% are positive for each marker, then the
trial might be sized for 3 × 224 = 672 total events.

77
Designs When there are Many Candidate Markers and
too Much Patient Heterogeneity for any Single
Marker
79
Adaptive Signature Design
80

The indication classifier is not a binary
classifier of whether a patient has good
prognosis or poor prognosis
It is a two sample classifier of whether the
prognosis of a patient on E is better than the
prognosis of the patient on C

The indication classifier can be a binary
classifier that maps the vector of candidate
covariates into {E, C} indicating which treatment
is predicted superior for that patient
The classifier need not use all the covariates
but variable selection must be determined using
only the training set
Variable selection may be based on selecting
variables with apparent interactions with
treatment, with cut-off for variable selection
determined by cross-validation within training
set for optimal classification
The indication classifier can be a probabilistic
classifier

84
Treatment effect restricted to subset: 10% of
patients sensitive, 400 patients.
Test | Power (%)
Overall .05 level test | 46.7
Overall .04 level test | 43.1
Sensitive subset .01 level test (performed only when overall .04 level test is negative) | 42.2
Overall adaptive signature design | 85.3
85
Overall treatment effect, no subset effect. 400
patients
Test | Power (%)
Overall .05 level test | 74.2
Overall .04 level test | 70.9
Sensitive subset .01 level test | 1.0
Overall adaptive signature design | 70.9
86

This approach can be used with any set of
candidate predictor variables
This approach can also be used to identify the
subset of patients who don't benefit from E in
cases where E is superior to C overall

89
Cross-Validated Adaptive Signature Design

Define indication classifier development
algorithm $A$
Apply algorithm to full dataset to develop
indication classifier for use in future patients
$M(x \mid A, P)$
Using K-fold cross-validation
Classify patients in test sets based on
classifiers developed in training sets, e.g.
$y_i = M(x_i \mid A, P_{-i})$
$S = \{i : y_i = E\}$
Compare E to C in S and estimate size of
treatment effect
The estimated treatment effect in S is an
estimate of the size of the treatment effect
for future patients with $M(x \mid A, P) = E$

90
Cross-Validated Adaptive Signature Design

Approximate null distribution of the
cross-validated treatment effect estimate
Permute treatment labels
Repeat complete cross-validation procedure
Generate permutation distribution of the
cross-validated estimates for permuted data
Test null hypothesis that the treatment effect in
classifier positive patients is null using as
test statistic cross-validated estimate of
treatment effect in positive patients
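Below is a compact sketch of the cross-validated adaptive signature design for a binary response. The slides use survival and Kaplan-Meier curves. Several parts are hypothetical stand-ins for illustration:

- **Classifier development algorithm `A`.** It selects binary covariates whose estimated treatment-by-covariate interaction exceeds a fixed threshold. The slides instead suggest tuning that cut-off by cross-validation within the training set.
- **Test statistic.** The permutation test uses the two-proportion z statistic for E vs C in S, analogous to the log-rank statistic used for survival. The difference in response rates is reported as the effect estimate.

The rest follows the steps above. Each test fold is classified with a classifier built on the remaining folds. S is formed from the cross-validated classifications. The p-value comes from re-running the complete cross-validation on permuted treatment labels.

```python
import numpy as np

def arm_comparison(resp, treat, mask):
    """E-minus-C response-rate difference and two-proportion z statistic within mask."""
    e, c = resp[mask & (treat == 1)], resp[mask & (treat == 0)]
    if len(e) == 0 or len(c) == 0:
        return 0.0, 0.0
    diff = e.mean() - c.mean()
    pooled = np.concatenate([e, c]).mean()
    se = np.sqrt(pooled * (1 - pooled) * (1 / len(e) + 1 / len(c)))
    return diff, (diff / se if se > 0 else 0.0)

def develop_classifier(X, resp, treat, threshold=0.2):
    """Hypothetical algorithm A: keep covariates whose estimated interaction (E-C difference
    when x_j = 1 minus E-C difference when x_j = 0) exceeds the threshold."""
    selected = []
    for j in range(X.shape[1]):
        d1, _ = arm_comparison(resp, treat, X[:, j] == 1)
        d0, _ = arm_comparison(resp, treat, X[:, j] == 0)
        if d1 - d0 > threshold:
            selected.append(j)
    return np.array(selected, dtype=int)

def classify(selected, X):
    """Indication classifier: True (treat with E) if any selected covariate equals 1."""
    if selected.size == 0:
        return np.zeros(len(X), dtype=bool)
    return (X[:, selected] == 1).any(axis=1)

def cross_validated_s(X, resp, treat, k, rng):
    """Membership in S = {i : y_i = E}; each fold is classified by a classifier built without it."""
    folds = rng.permutation(np.arange(len(resp)) % k)
    in_s = np.zeros(len(resp), dtype=bool)
    for f in range(k):
        test = folds == f
        selected = develop_classifier(X[~test], resp[~test], treat[~test])
        in_s[test] = classify(selected, X[test])
    return in_s

def cv_asd(X, resp, treat, k=10, n_perm=200, seed=0):
    """Cross-validated E-C difference in S and a permutation p-value based on its z statistic.
    Each permutation re-runs the complete cross-validation on permuted treatment labels."""
    rng = np.random.default_rng(seed)
    diff, z_obs = arm_comparison(resp, treat, cross_validated_s(X, resp, treat, k, rng))
    null_z = []
    for _ in range(n_perm):
        perm = rng.permutation(treat)
        null_z.append(arm_comparison(resp, perm, cross_validated_s(X, resp, perm, k, rng))[1])
    return diff, (1 + np.sum(np.array(null_z) >= z_obs)) / (1 + n_perm)

# Hypothetical data: 10 binary covariates; covariate 0 marks the ~30% "sensitive" patients.
# Response: 70% on E if sensitive, otherwise 25% on either arm.
rng = np.random.default_rng(1)
n = 300
X = (rng.random((n, 10)) < 0.3).astype(int)
treat = rng.integers(0, 2, n)
prob = np.where((treat == 1) & (X[:, 0] == 1), 0.70, 0.25)
resp = (rng.random(n) < prob).astype(int)
effect, p_value = cv_asd(X, resp, treat)
print(f"cross-validated E - C response difference in S: {effect:.2f}, permutation p = {p_value:.3f}")
```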

91
Key Ideas

Replace multiple significance testing by
development of one indication classifier
Control study-wise type I error for significance
test of
Overall average treatment effect
Treatment effect in classifier positive patients
Test of treatment effect in classifier positive
patients does not depend on significance of
overall test nor on significant interaction
Obtain unbiased or conservative estimate of the
treatment effect of future classifier positive
patients

The size of the E vs C treatment effect for the
indicated population is (conservatively)
estimated from the cross-validation by the
Kaplan-Meier survival curves of E and of C in S
The Kaplan-Meier curves of E and C for patients
in S provide an estimate of the treatment effect
for future patients with $M(x \mid A, P) = E$

The stability of the indication classifier
$M(x \mid A, D)$ can be evaluated by examining the
consistency of classifications $M(x_i \mid A, B)$ for
bootstrap samples $B$ from $D$.

Although there may be less certainty about
exactly which types of patient benefit from E
relative to C, classification may be better than
in many standard clinical trials, in which all
patients are classified based on results of
testing the single overall null hypothesis

95
70 Response to E in Sensitive Patients25
Response to E Otherwise25 Response to C30
Patients Sensitive
ASD CV-ASD
Overall 0.05 Test 0.830 0.838
Overall 0.04 Test 0.794 0.808
Sensitive Subset 0.01 Test 0.306 0.723
Overall Power 0.825 0.918
96
25% Response to T, 25% Response to C, No Subset
Effect
Test | ASD | CV-ASD
Overall 0.05 Test | 0.047 | 0.056
Overall 0.04 Test | 0.04 | 0.048
Sensitive Subset 0.01 Test | 0.001 | 0
Overall Power | 0.041 | 0.048
97
For Binary Outcome
98
For Binary Outcome
100
506 prostate cancer patients were randomly
allocated to one of four arms: placebo, 0.2 mg
of diethylstilbestrol (DES), 1.0 mg DES, or 5.0 mg
DES. Placebo and 0.2 mg DES were combined as the
control arm C; 1.0 mg DES and 5.0 mg DES were
combined as T. The end-point was overall
survival (death from any cause).
Covariates: Age, performance status (pf), tumor
size (sz), stage/grade index (sg), serum acid
phosphatase (ap)
101
Figure 1 Overall analysis. The value of the
log-rank statistic is 2.9 and the corresponding
p-value is 0.09. The new treatment thus shows no
benefit overall at the 0.05 level.
102
Figure 2 Cross-validated survival curves for
patients predicted to benefit from the new
treatment. The value of the log-rank statistic is
10.0; the permutation p-value is .002.
103
Figure 3 Survival curves for cases predicted not
to benefit from the new treatment. The value of
the log-rank statistic is 0.54.
107
Prediction Based Clinical Trials

We can evaluate our methods for analysis of
clinical trials in terms of their effect on
patient outcome via informing therapeutic
decision making

108
Expected Survival Distribution for Future
Patients With Standard Analysis
109
Expected Survival Distribution for Future
Patients With Indication Classifier
110

Hence, alternative methods for analyzing RCTs
can be evaluated in an unbiased manner with
regard to their value to patients using the
actual RCT data

111
Conclusions

New biotechnology and knowledge of tumor biology
provide important opportunities to improve
therapeutic decision making
Treatment of broad populations with regimens that
do not benefit most patients is increasingly
neither necessary nor economically sustainable
The established molecular heterogeneity of human
diseases requires the use of new approaches to the
development and evaluation of therapeutics