Title: Predictive Classifiers Based on High Dimensional Data: Development and Use in Clinical Trial Design
1. Predictive Classifiers Based on High Dimensional Data: Development and Use in Clinical Trial Design
- Richard Simon, D.Sc.
- Chief, Biometric Research Branch
- National Cancer Institute
- http://linus.nci.nih.gov/brb
2. Biomarkers
- Surrogate endpoints
  - A measurement made before and after treatment to determine whether the treatment is working
  - A surrogate for clinical benefit
- Predictive classifiers
  - A measurement made before treatment to select good patient candidates for the treatment
3. Predictive Biomarker Classifiers
- Many cancer treatments benefit only a small proportion of the patients to whom they are administered
- Targeting treatment to the right patients can greatly improve the therapeutic ratio of benefit to adverse effects
  - Treated patients benefit
  - Treatment is more cost-effective for society
4. Developmental Strategy (I)
- Develop a diagnostic classifier that identifies the patients likely to benefit from the new drug
- Develop a reproducible assay for the classifier
- Use the diagnostic to restrict eligibility to a prospectively planned evaluation of the new drug
- Demonstrate that the new drug is effective in the prospectively defined set of patients determined by the diagnostic
5. Develop Predictor of Response to New Drug
- Using phase II data, develop a predictor of response to the new drug
- Patients predicted responsive: randomized between new drug and control
- Patients predicted non-responsive: off study
6. Applicability of Design I
- Primarily for settings where the classifier is based on a single gene whose protein product is the target of the drug
- With a substantial biological basis for the classifier, it may be ethically unacceptable to expose classifier-negative patients to the new drug
7. Evaluating the Efficiency of Strategy (I)
- Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004.
- Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.
- Reprints and interactive sample size calculations at http://linus.nci.nih.gov/brb
8. Two Clinical Trial Designs
- Untargeted design
  - Randomized comparison of T to C without screening for expression of the molecular target
- Targeted design
  - Assay patients for expression of the target
  - Randomize only patients expressing the target
9.
- Relative efficiency depends on the proportion of patients who test positive, and the effectiveness of the drug (compared to control) for test-negative patients
- When fewer than half of patients test negative and the drug has little or no benefit for test-negative patients, the targeted design requires dramatically fewer randomized patients
- The targeted design may require fewer or more patients to be screened than are randomized in the untargeted design
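The scaling behind these comparisons can be sketched numerically. This is a simplified sketch, not the papers' exact formulas: it assumes the required number of randomized patients scales as 1/δ² for detecting a treatment effect δ, so the untargeted trial, which sees a diluted average effect, needs proportionally more patients. All parameter names are illustrative.

```python
# Sketch of the targeted-vs-untargeted efficiency comparison, assuming
# required randomized sample size ~ 1/delta^2 (the cited Simon & Maitournam
# papers give exact formulas; this is only the scaling argument).
def randomized_ratio(prop_pos, delta_pos, delta_neg):
    """Untargeted / targeted ratio of randomized patients needed.

    The untargeted trial sees the population-average effect, diluted by
    test-negative patients; the targeted trial sees the full effect."""
    delta_avg = prop_pos * delta_pos + (1 - prop_pos) * delta_neg
    return (delta_pos / delta_avg) ** 2

# Example: 25% of patients test positive; no benefit in test-negatives.
print(randomized_ratio(0.25, delta_pos=0.5, delta_neg=0.0))  # 16.0
# Screening burden: the targeted design must assay 1/0.25 = 4 patients
# per patient randomized, which may still be far fewer screened overall.
```

When the drug also benefits test-negative patients (`delta_neg > 0`), the dilution shrinks and the targeted design's advantage drops toward 1, matching the bullet above.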
10. Web-Based Software for Comparing Sample Size Requirements
- http://linus.nci.nih.gov/brb/
15. Developmental Strategy (II)
- Do not use the diagnostic to restrict eligibility, but to structure a prospective analysis plan
- Compare the new drug to the control overall for all patients, ignoring the classifier
- If p_overall ≤ 0.04, claim effectiveness for the eligible population as a whole
- Otherwise, perform a single subset analysis evaluating the new drug in the classifier-positive patients
- If p_subset ≤ 0.01, claim effectiveness for the classifier-positive patients
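The analysis plan above is a simple pre-specified decision rule, sketched here with illustrative names:

```python
# Sketch of the Strategy (II) pre-specified analysis plan.
def strategy_ii_claim(p_overall, p_subset):
    """Apply the split-alpha plan: 0.04 overall, 0.01 for the subset."""
    if p_overall <= 0.04:
        return "effective for the eligible population as a whole"
    if p_subset <= 0.01:
        return "effective for classifier-positive patients"
    return "no claim of effectiveness"

print(strategy_ii_claim(0.03, 0.50))   # overall claim
print(strategy_ii_claim(0.20, 0.005))  # subset claim
# The alphas are split so that 0.04 + 0.01 = 0.05: by a Bonferroni argument,
# the chance of any false claim stays at the conventional 0.05 level.
```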
16.
- The purpose of the RCT is to evaluate the new treatment overall and for the pre-defined subset
- The purpose is not to re-evaluate the components of the classifier, or to modify or refine the classifier
- The purpose is not to demonstrate that repeating the classifier development process on independent data results in the same classifier
17. Developmental Strategy (III)
- Do not use the diagnostic to restrict eligibility, but to structure a prospective analysis plan
- Compare the new drug to the control for classifier-positive patients
  - If p > 0.05, make no claim of effectiveness
  - If p ≤ 0.05, claim effectiveness for the classifier-positive patients, and
- Continue accrual of classifier-negative patients and eventually test the treatment effect at the 0.05 level
18. The Roadmap
- Develop a completely specified genomic classifier of the patients likely to benefit from a new drug
- Establish reproducibility of measurement of the classifier
- Use the completely specified classifier to design and analyze a new clinical trial to evaluate the effectiveness of the new treatment with a pre-defined analysis plan
19. Guiding Principle
- The data used to develop the classifier must be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier
- Developmental studies are exploratory
  - And not closely regulated by FDA
  - FDA should not regulate classifier development
- Studies on which treatment effectiveness claims are to be based should be definitive studies that test a treatment hypothesis in a patient population completely pre-specified by the classifier
20. Adaptive Signature Design: An adaptive design for generating and prospectively testing a gene expression signature for sensitive patients
- Boris Freidlin and Richard Simon
- Clinical Cancer Research 11:7872-8, 2005
21. Adaptive Signature Design: End-of-Trial Analysis
- Compare E to C for all patients at significance level 0.04
- If the overall H0 is rejected, then claim effectiveness of E for eligible patients
- Otherwise:
22.
- Otherwise:
  - Using only the first half of patients accrued during the trial, develop a binary classifier that predicts the subset of patients most likely to benefit from the new treatment E compared to control C
  - Compare E to C for patients accrued in the second stage who are predicted responsive to E based on the classifier
  - Perform the test at significance level 0.01
  - If H0 is rejected, claim effectiveness of E for the subset defined by the classifier
23. Classifier Development
- Using data from stage 1 patients, fit all single-gene logistic models (j = 1, …, M)
- Select genes with interaction significant at level α
24. Classification of Stage 2 Patients
- For the ith stage 2 patient, selected gene j votes to classify the patient as preferentially sensitive to T if …
25. Classification of Stage 2 Patients
- Classify the ith stage 2 patient as differentially sensitive to T relative to C if at least G selected genes vote for differential sensitivity of that patient
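The voting rule above can be sketched as follows. How each gene's vote is computed is an assumption here (a coefficient-times-expression threshold standing in for the untranscribed logistic-model criterion on the previous slide); the count-at-least-G rule is from this slide.

```python
# Sketch of the stage-2 voting classifier. Each selected gene carries a
# fitted interaction coefficient and a vote threshold -- illustrative
# stand-ins for the single-gene logistic-model criterion in the paper.
def differentially_sensitive(patient_expr, selected_genes, G):
    """Vote-count rule: sensitive if at least G selected genes vote yes."""
    votes = sum(
        1 for gene, (beta, threshold) in selected_genes.items()
        if beta * patient_expr[gene] > threshold
    )
    return votes >= G

selected = {"g1": (1.2, 0.5), "g2": (-0.8, 0.1), "g3": (0.9, 0.4)}
patient = {"g1": 1.0, "g2": -0.5, "g3": 0.2}
print(differentially_sensitive(patient, selected, G=2))  # True (g1 and g2 vote yes)
```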
26. Treatment effect restricted to subset. 10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.
27. Overall treatment effect, no subset effect. 10% of patients sensitive, 10 sensitivity genes, 10,000 genes, 400 patients.
28. Development of Classifiers Based on High Dimensional Data
29. Good Microarray Studies Have Clear Objectives
- Class Comparison
  - For predetermined classes, identify differentially expressed genes
- Class Prediction
  - Prediction of a predetermined class (e.g., response) using information from the gene expression profile
- Class Discovery
  - Discover clusters among specimens or among genes
30. Components of Class Prediction
- Feature (gene) selection
  - Which genes will be included in the model
- Select model type
  - E.g., diagonal linear discriminant analysis, nearest-neighbor, …
- Fitting parameters (regression coefficients) for the model
- Selecting values of tuning parameters
31. Simple Feature Selection
- Select genes that are differentially expressed among the classes at a significance level α (e.g., 0.01)
- The α level is selected only to control the number of genes in the model
32. Complex Feature Selection
- Small subset of genes which together give the most accurate predictions
- Combinatorial optimization algorithms
  - Decision trees, random forest
  - Top scoring pairs, greedy pairs
- Little evidence that complex feature selection is useful in microarray problems
- Many published complex methods for selecting combinations of features do not appear to have been properly evaluated
  - Wessels et al. (Bioinformatics 21:3755, 2005)
  - Lai et al. (BMC Bioinformatics 7:235, 2006)
  - Lecocke and Hess (Cancer Informatics 2:313, 2006)
34. Linear Classifiers for Two Classes
- Fisher linear discriminant analysis
- Diagonal linear discriminant analysis (DLDA): assumes features are uncorrelated
- Compound covariate predictor
- Weighted voting classifier
- Support vector machines with inner product kernel
- Perceptrons
- Naïve Bayes classifier
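Since DLDA is the simplest of these and recurs throughout the talk, here is a minimal sketch: two classes, equal priors, per-gene class means with a pooled per-gene variance, and no gene-gene correlations. This is an illustration, not BRB-ArrayTools' implementation.

```python
# Minimal DLDA sketch: diagonal covariance, i.e. each gene contributes a
# variance-scaled squared distance to each class centroid independently.
def dlda_fit(X, y):
    genes = range(len(X[0]))
    means = {}
    for c in (0, 1):
        rows = [x for x, label in zip(X, y) if label == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in genes]
    pooled_var = [
        sum((x[j] - means[label][j]) ** 2 for x, label in zip(X, y)) / (len(X) - 2)
        for j in genes
    ]
    return means, pooled_var

def dlda_predict(x, means, pooled_var):
    # Assign to the class with the smaller variance-scaled distance.
    def dist(c):
        return sum((x[j] - means[c][j]) ** 2 / v for j, v in enumerate(pooled_var))
    return 0 if dist(0) < dist(1) else 1

X = [[0.0, 0.0], [0.2, 0.4], [5.0, 5.0], [5.2, 4.6]]
y = [0, 0, 1, 1]
model = dlda_fit(X, y)
print(dlda_predict([0.1, 0.3], *model))  # 0
print(dlda_predict([5.1, 4.9], *model))  # 1
```

Dropping the off-diagonal covariance terms is exactly what makes DLDA usable when genes far outnumber specimens: only 2p means and p variances are estimated, not a p×p matrix.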
35. Other Simple Methods
- Nearest neighbor classification
- Nearest centroid classification
- Shrunken centroid classification
36. When p >> n
- It is always possible to find a set of features and a weight vector for which the classification error on the training set is zero
- There is generally not sufficient information in p >> n training sets to effectively use complex methods
37.
- Myth: complex classification algorithms perform better than simpler methods for class prediction
- Comparative studies indicate that simpler methods usually work as well or better for microarray problems because they avoid overfitting the data
38. Internal Validation of a Classifier
- Split-sample validation
  - Split data into training and test sets
  - Test a single fully specified model on the test set
  - Often applied invalidly, with tuning parameters optimized on the test set
- Cross-validation or bootstrap resampling
  - Repeated training-test partitions
  - Average errors over repetitions
39. Cross-Validated Prediction (Leave-One-Out Method)
1. The full data set is divided into training and test sets (the test set contains 1 specimen).
2. A prediction rule is built from scratch using the training set.
3. The rule is applied to the specimen in the test set for class prediction.
4. The process is repeated until each specimen has appeared once in the test set.
40.
- Cross-validation is only valid if the test set is not used in any way in the development of the model. Using the complete set of samples to select genes violates this assumption and invalidates cross-validation.
- With proper cross-validation, the model must be developed from scratch for each leave-one-out training set. This means that feature selection must be repeated for each leave-one-out training set.
- The cross-validated estimate of misclassification error is an estimate of the prediction error of the model obtained by applying the specified algorithm to the full dataset.
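The procedure, with feature selection correctly repeated inside each loop, can be sketched as follows. The mean-difference gene selection and nearest-centroid prediction are simple stand-ins for whatever fully specified algorithm the study pre-defines.

```python
# Sketch of leave-one-out cross-validation done properly: every step of
# classifier development, including gene selection, is re-run on each
# leave-one-out training set. Selection/prediction rules are illustrative.
def select_genes(X, y, k):
    """The k genes with the largest absolute difference in class means."""
    def score(j):
        a = [x[j] for x, c in zip(X, y) if c == 0]
        b = [x[j] for x, c in zip(X, y) if c == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def loocv_error(X, y, k=10):
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]   # hold out specimen i entirely
        train_y = y[:i] + y[i + 1:]
        genes = select_genes(train_X, train_y, k)  # selection INSIDE the loop
        centroids = {}
        for c in (0, 1):
            rows = [x for x, lab in zip(train_X, train_y) if lab == c]
            centroids[c] = [sum(r[j] for r in rows) / len(rows) for j in genes]
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(genes, centroids[c]))
                for c in (0, 1)}
        errors += min(dist, key=dist.get) != y[i]
    return errors / len(X)
```

Hoisting `select_genes` out of the loop and running it once on the full dataset is precisely the incomplete cross-validation criticized above: the held-out specimen then influences which genes the classifier uses.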
41. Prediction on Simulated Null Data
- Generation of gene expression profiles
  - 14 specimens (Pi is the expression profile for specimen i)
  - Log-ratio measurements on 6000 genes
  - Pi ~ MVN(0, I6000)
- Can we distinguish between the first 7 specimens (Class 1) and the last 7 (Class 2)?
- Prediction method
  - Compound covariate prediction
  - Compound covariate built from the log-ratios of the 10 most differentially expressed genes
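The point of this simulation can be reproduced in a few lines (a sketch; MVN(0, I) is just 6000 independent N(0,1) values per specimen). Genes are selected on the full pure-noise dataset and the resubstitution error of the compound covariate is computed; despite there being no real signal, the error comes out near zero.

```python
import math
import random

random.seed(1)
# Slide setup: 14 specimens, 6000 genes, pure noise -- Pi ~ MVN(0, I).
y = [0] * 7 + [1] * 7
X = [[random.gauss(0.0, 1.0) for _ in range(6000)] for _ in range(14)]

def t_stat(j):
    """Two-sample t statistic for gene j (class 0 minus class 1)."""
    a = [X[i][j] for i in range(14) if y[i] == 0]
    b = [X[i][j] for i in range(14) if y[i] == 1]
    ma, mb = sum(a) / 7, sum(b) / 7
    va = sum((v - ma) ** 2 for v in a) / 6
    vb = sum((v - mb) ** 2 for v in b) / 6
    return (ma - mb) / math.sqrt(va / 7 + vb / 7)

# Select the 10 "most differentially expressed" genes -- on pure noise.
top = sorted(range(6000), key=lambda j: abs(t_stat(j)), reverse=True)[:10]

# Compound covariate: t-weighted sum of the selected genes per specimen.
cc = [sum(t_stat(j) * X[i][j] for j in top) for i in range(14)]
cutoff = (sum(cc[:7]) / 7 + sum(cc[7:]) / 7) / 2
pred = [0 if c > cutoff else 1 for c in cc]
error = sum(p != truth for p, truth in zip(pred, y)) / 14
print(error)  # near 0: selection bias makes noise look like signal
```

Because the 10 genes were chosen from 6000 for their apparent separation of these very specimens, the classifier fits the noise almost perfectly; only cross-validation that repeats the gene selection on each training set reveals the true ~50% error rate.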
43.
- For small studies, cross-validation, if performed correctly, can be preferable to split-sample validation
- Cross-validation can only be used when there is a well-specified algorithm for classifier development
46. Simulated data: 40 cases, 10 genes selected from 5000
47. Simulated data: 40 cases
48. Permutation Distribution of the Cross-Validated Misclassification Rate of a Multivariate Classifier
- Randomly permute the class labels and repeat the entire cross-validation
- Re-do for all (or 1000) random permutations of the class labels
- The permutation p value is the fraction of random permutations that gave as few misclassifications as observed with the real data
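A sketch of this permutation test. Here `cv_error` stands for the entire classifier-development-plus-cross-validation pipeline, which, as the slide stresses, must be re-run in full for every permuted label vector.

```python
import random

# Sketch of the permutation test for a cross-validated error rate.
def permutation_p_value(X, y, cv_error, n_perm=1000, seed=0):
    """Fraction of label permutations whose cross-validated error is as
    small as (or smaller than) the error observed with the real labels."""
    rng = random.Random(seed)
    observed = cv_error(X, y)
    as_good = 0
    for _ in range(n_perm):
        permuted = y[:]
        rng.shuffle(permuted)
        if cv_error(X, permuted) <= observed:  # full CV repeated per permutation
            as_good += 1
    return as_good / n_perm
```

If the labels carry no information, permuted labels do about as well as the real ones and the p value is large; a genuinely predictive classifier yields a small p value.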
49. Validation of a Predictive Classifier Does Not Involve
- Measuring overlap with gene sets used in classifiers developed from independent data
- Statistical significance of gene expression levels or summary signatures in multivariate analysis
- Confirmation of gene expression measurements on other platforms
- Demonstrating that the classifier or any of its components are validated biomarkers of disease status
53. Publications Reviewed
- Searched Medline
- Hand screening of abstracts and papers
- Original study on human cancer patients
- Published in English before December 31, 2004
- Analyzed gene expression of more than 1000 probes
- Related gene expression to clinical outcome
54. Types of Clinical Outcome
- Survival or disease-free survival
- Response to therapy
55.
- 90 publications identified that met the criteria
- Abstracted information for all 90
- Performed a detailed review of the statistical analysis for the 42 papers published in 2004
56. Major Flaws Found in 40 Studies Published in 2004
- Inadequate control of multiple comparisons in gene finding
  - 9/23 studies had unclear or inadequate methods to deal with false positives
  - 10,000 genes × 0.05 significance level = 500 false positives
- Misleading reports of prediction accuracy
  - 12/28 reports based on incomplete cross-validation
- Misleading use of cluster analysis
  - 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes
- 50% of studies contained one or more major flaws
57. Class Comparison and Class Prediction
- Not clustering problems
  - Global similarity measures generally used for clustering arrays may not distinguish classes
  - Clustering doesn't control multiplicity, or distinguish data used for classifier development from data used for classifier evaluation
- Supervised methods
  - Require multiple biological samples from each class
60. Sample Size Planning References
- K Dobbin, R Simon. Sample size determination in microarray experiments for class comparison and prognostic classification. Biostatistics 6:27-38, 2005
- K Dobbin, R Simon. Sample size planning for developing classifiers using high dimensional DNA microarray data. Biostatistics 8:101-117, 2007
- K Dobbin, Y Zhao, R Simon. How large a training set is needed to develop a classifier for microarray data? Clinical Cancer Research, in press
61. Predictive Classifiers in BRB-ArrayTools
- Classifiers
  - Diagonal linear discriminant
  - Compound covariate
  - Bayesian compound covariate
  - Support vector machine with inner product kernel
  - K-nearest neighbor
  - Nearest centroid
  - Shrunken centroid (PAM)
  - Random forest
  - Tree of binary classifiers for k classes
  - Survival risk-group
  - Supervised principal components
- Feature selection options
  - Univariate t/F statistic
  - Hierarchical variance option
  - Restricted by fold effect
  - Univariate classification power
  - Recursive feature elimination
  - Top-scoring pairs
- Validation methods
  - Split-sample
  - LOOCV
  - Repeated k-fold CV
  - .632 bootstrap
62. Acknowledgements
- Kevin Dobbin
- Alain Dupuy
- Boris Freidlin
- Aboubakar Maitournam
- Michael Radmacher
- Sudhir Varma
- Yingdong Zhao
- BRB-ArrayTools Development Team