Title: Development and Validation of Predictive Classifiers using Gene Expression Profiles
1Development and Validation of Predictive
Classifiers using Gene Expression Profiles
- Richard Simon, D.Sc.
- Chief, Biometric Research Branch
- National Cancer Institute
- http://brb.nci.nih.gov
2BRB Website: brb.nci.nih.gov
- Powerpoint presentations and audio files
- Reprints and Technical Reports
- BRB-ArrayTools software
- BRB-ArrayTools Data Archive
- 100 published cancer gene expression datasets
with clinical annotations - Sample Size Planning for Clinical Trials with
Predictive Biomarkers
3(No Transcript)
4(No Transcript)
5Types of Clinical Outcome
- Survival or disease-free survival
- Response to therapy
6- 90 publications identified that met criteria
- Abstracted information for all 90
- Performed detailed review of statistical analysis
for the 42 papers published in 2004
7Major Flaws Found in 40 Studies Published in 2004
- Inadequate control of multiple comparisons in
gene finding - 9/23 studies had unclear or inadequate methods to
deal with false positives - 10,000 genes x .05 significance level = 500 false
positives - Misleading report of prediction accuracy
- 12/28 reports based on incomplete
cross-validation - Misleading use of cluster analysis
- 13/28 studies invalidly claimed that expression
clusters based on differentially expressed genes
could help distinguish clinical outcomes - 50% of studies contained one or more major flaws
8(No Transcript)
9(No Transcript)
10Kinds of Biomarkers
- Surrogate endpoint
- Pre vs. post rx, early measure of clinical outcome
- Pharmacodynamic
- Pre vs. post rx, measures an effect of rx on
disease - Prognostic
- Which patients need rx
- Predictive
- Which patients are likely to benefit from a
specific rx - Product characterization
11Cardiac Arrhythmia Suppression Trial
- Ventricular premature beats was proposed as a
surrogate for survival - Antiarrhythmic drugs suppressed ventricular
premature beats but killed patients at
approximately 2.5 times the rate of placebo
12Prognostic Biomarkers
- Most prognostic factors are not used because they
are not therapeutically relevant - Most prognostic factor studies are poorly
designed - They are not focused on a clear therapeutically
relevant objective - They use a convenience sample of patients for
whom tissue is available. Generally the patients
are too heterogeneous to support therapeutically
relevant conclusions - They address statistical significance rather than
predictive accuracy relative to standard
prognostic factors
13Pusztai et al. The Oncologist 8:252-8, 2003
- 939 articles on prognostic markers or
prognostic factors in breast cancer in past 20
years - ASCO guidelines only recommend routine testing
for ER, PR and HER-2 in breast cancer - With the exception of ER or progesterone
receptor expression and HER-2 gene amplification,
there are no clinically useful molecular
predictors of response to any form of anticancer
therapy.
14Prognostic and Predictive Classifiers
- Most cancer treatments benefit only a minority of
patients to whom they are administered - Particularly true for molecularly targeted drugs
- Being able to predict which patients are likely
to benefit would - save patients from unnecessary toxicity, and
enhance their chance of receiving a drug that
helps them - Help control medical costs
- Improve the success rate of clinical drug
development
15(No Transcript)
16- Molecularly targeted drugs may benefit a
relatively small population of patients with a
given primary site/stage of disease - Iressa
- Herceptin
17(No Transcript)
18Prognostic Biomarkers Can be Therapeutically
Relevant
- 3-5% of node-negative ER+ breast cancer patients
require or benefit from systemic rx other than
endocrine rx - Prognostic biomarker development should focus on
specific therapeutic decision context
19B-14 Results: Relapse-Free Survival
Paik et al, SABCS 2003
20Key Features of OncotypeDx Development
- Identification of important therapeutic decision
context - Prognostic marker development was based on
patients with node negative ER positive breast
cancer receiving tamoxifen as only systemic
treatment - Use of patients in NSABP clinical trials
- Staged development and validation
- Separation of data used for test development from
data used for test validation - Development of robust assay with rigorous
analytical validation - 21-gene RT-PCR assay for FFPE tissue
- Quality assurance by single reference laboratory
operation
21Predictive Biomarkers
- Cancers of a primary site are often a
heterogeneous grouping of diverse molecular
diseases - The molecular diseases vary enormously in their
responsiveness to a given treatment - It is feasible (but difficult) to develop
prognostic markers that identify which patients
need systemic treatment and which have tumors
likely to respond to a given treatment - e.g. breast cancer and ER/PR, Her2
22Mutations  Copy number changes  Translocations  Expression profile
Treatment
23DNA Microarray Technology
- Powerful tool for understanding mechanisms and
enabling predictive medicine - Challenges ability of biomedical scientists to
use effectively to produce biological knowledge
or clinical utility - Challenges statisticians with new problems for
which existing analysis paradigms are often
inapplicable - Excessive hype and skepticism
24Myth
- That microarray investigations should be
unstructured data-mining adventures without clear
objectives
25- Good microarray studies have clear objectives,
but not generally gene specific mechanistic
hypotheses - Design and analysis methods should be tailored to
study objectives
26Good Microarray Studies Have Clear Objectives
- Class Comparison
- Find genes whose expression differs among
predetermined classes - Find genes whose expression varies over a time
course in response to a defined stimulus - Class Prediction
- Prediction of predetermined class (phenotype)
using information from gene expression profile - Survival risk group prediction
- Class Discovery
- Discover clusters of specimens having similar
expression profiles - Discover clusters of genes having similar
expression profiles
27Class Comparison and Class Prediction
- Not clustering problems
- Global similarity measures generally used for
clustering arrays may not distinguish classes - Don't control multiplicity or distinguish
data used for classifier development from data
used for classifier evaluation - Supervised methods
- Requires multiple biological samples from each
class
28Levels of Replication
- Technical replicates
- RNA sample divided into multiple aliquots and
re-arrayed - Biological replicates
- Multiple subjects
- Replication of the tissue culture experiment
29- Biological conclusions generally require
independent biological replicates. The power of
statistical methods for microarray data depends
on the number of biological replicates. - Technical replicates are useful insurance to
ensure that at least one good quality array of
each specimen will be obtained.
30Class Prediction
- Predict which tumors will respond to a particular
treatment - Predict which patients will relapse after a
particular treatment
31Microarray Platforms for Developing Predictive
Classifiers
- Single label arrays
- Affymetrix GeneChips
- Dual label arrays using common reference design
- Dye swaps are unnecessary
32Common Reference Design
A1
A2
B1
B2
RED
R
R
R
R
GREEN
Array 1
Array 2
Array 3
Array 4
Ai = ith specimen from class A
Bi = ith specimen from class B
R = aliquot from reference pool
33- The reference generally serves to control
variation in the size of corresponding spots on
different arrays and variation in sample
distribution over the slide. - The reference provides a relative measure of
expression for a given gene in a given sample
that is less variable than an absolute measure. - The reference is not the object of comparison.
- The relative measure of expression will be
compared among biologically independent samples
from different classes.
34(No Transcript)
35Class Prediction
- A set of genes is not a classifier
- Testing whether analysis of independent data
results in selection of the same set of genes is
not an appropriate test of predictive accuracy of
a classifier
36Components of Class Prediction
- Feature (gene) selection
- Which genes will be included in the model
- Select model type
- E.g. Diagonal linear discriminant analysis,
Nearest-Neighbor, - Fitting parameters (regression coefficients) for
model - Selecting value of tuning parameters
- Estimating prediction accuracy
37Class Prediction ≠ Class Comparison
- The criteria for gene selection for class
prediction and for class comparison are different - For class comparison false discovery rate is
important - For class prediction, predictive accuracy is
important - Demonstrating statistical significance of
prognostic factors is not the same as
demonstrating predictive accuracy. - Statisticians are used to inference, not
prediction - Most statistical methods were not developed for
p >> n prediction problems
38Myth
- Complex classification algorithms such as neural
networks perform better than simpler methods for
class prediction.
39Simple Gene Selection
- Select genes that are differentially expressed
among the classes at a significance level α (e.g.
0.01) - The α level is a tuning parameter
- For class comparison false discovery rate is
important - For class prediction, predictive accuracy is
important - For prediction it is usually more serious to
exclude an informative variable than to include
some noise variables
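The simple gene selection above can be sketched as a univariate screen. This is a minimal illustration (not BRB-ArrayTools code); as an assumption, a fixed |t| cutoff stands in for an exact p-value threshold at level α:

```python
import math

def t_statistic(xs, ys):
    """Pooled-variance two-sample t-statistic for one gene."""
    nx, ny = len(xs), len(ys)
    mx, my = sum(xs) / nx, sum(ys) / ny
    vx = sum((v - mx) ** 2 for v in xs) / (nx - 1)
    vy = sum((v - my) ** 2 for v in ys) / (ny - 1)
    sp = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
    return (mx - my) / (sp * math.sqrt(1 / nx + 1 / ny))

def select_genes(class_a, class_b, t_cutoff=2.9):
    """Return indices of genes whose |t| exceeds the cutoff; a cutoff
    near 2.9 corresponds roughly to alpha = 0.01 for ~10 samples per
    class (hypothetical choice of tuning parameter)."""
    p = len(class_a[0])
    return [j for j in range(p)
            if abs(t_statistic([s[j] for s in class_a],
                               [s[j] for s in class_b])) > t_cutoff]
```

The cutoff plays the role of the α tuning parameter: loosening it admits more noise genes, which, as noted above, is usually less harmful for prediction than excluding informative ones.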
40Optimal significance level cutoffs for gene
selection. 50 differentially expressed genes
out of 22,000 on n arrays
2δ/σ (standardized difference) n=10 n=30 n=50
1 0.167 0.003 0.00068
1.25 0.085 0.0011 0.00035
1.5 0.045 0.00063 0.00016
1.75 0.026 0.00036 0.00006
2 0.015 0.0002 0.00002
41(No Transcript)
42Complex Gene Selection
- Small subset of genes which together give most
accurate predictions - Genetic algorithms
- Little evidence that complex feature selection is
useful in microarray problems - Failure to compare to simpler methods
- Improper use of cross-validation
43Linear Classifiers for Two Classes
44Linear Classifiers for Two Classes
- Fisher linear discriminant analysis
- Diagonal linear discriminant analysis (DLDA)
assumes features are uncorrelated - Compound covariate predictor (Radmacher)
- Golub's weighted voting method
- Support vector machines with inner product kernel
- Perceptron
45Fisher LDA
46The Compound Covariate Predictor (CCP)
- Motivated by J. Tukey, Controlled Clinical
Trials, 1993 - A compound covariate is built from the basic
covariates (log-ratios) - tj is the two-sample t-statistic for gene j.
- xij is the log-expression measure of sample i for
gene j. - Sum is over selected genes.
- Threshold of classification midpoint of the CCP
means for the two classes.
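A minimal sketch of the compound covariate predictor described above, with t-statistic weights and the midpoint threshold; it assumes genes have already been selected:

```python
import math

def fit_ccp(class_a, class_b):
    """Compound covariate predictor: weight each (pre-selected) gene by
    its two-sample t-statistic, sum the weighted log-ratios, and
    classify by the midpoint of the CCP means of the two classes."""
    def t_stat(j):
        xs = [s[j] for s in class_a]
        ys = [s[j] for s in class_b]
        nx, ny = len(xs), len(ys)
        mx, my = sum(xs) / nx, sum(ys) / ny
        vx = sum((v - mx) ** 2 for v in xs) / (nx - 1)
        vy = sum((v - my) ** 2 for v in ys) / (ny - 1)
        sp = math.sqrt(((nx - 1) * vx + (ny - 1) * vy) / (nx + ny - 2))
        return (mx - my) / (sp * math.sqrt(1 / nx + 1 / ny))

    p = len(class_a[0])
    w = [t_stat(j) for j in range(p)]
    ccp = lambda s: sum(w[j] * s[j] for j in range(p))
    mean_a = sum(map(ccp, class_a)) / len(class_a)
    mean_b = sum(map(ccp, class_b)) / len(class_b)
    cut = (mean_a + mean_b) / 2
    # a sample is called 'A' when its CCP falls on the class-A side of the midpoint
    if mean_a >= mean_b:
        return lambda s: 'A' if ccp(s) >= cut else 'B'
    return lambda s: 'A' if ccp(s) < cut else 'B'
```

Note the classifier is a fully specified rule (genes, weights, and threshold), not merely a gene list, which is the point of slide 35.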
47Linear Classifiers for Two Classes
- Compound covariate predictor: gene weights are the
two-sample t-statistics, instead of the variance-scaled
mean differences used for DLDA
48Support Vector Machine
49Perceptrons
- Perceptrons are neural networks with no hidden
layer and linear transfer functions between input
output - Number of input nodes equals number of genes
selected - Number of output nodes equals number of classes
minus 1 - Number of inputs may be major principal
components of genes or major principal components
of informative genes - Perceptrons are linear classifiers
50Other Simple Methods
- Nearest neighbor classification
- Nearest k-neighbors
- Nearest centroid classification
- Shrunken centroid classification
51Nearest Neighbor Classifier
- To classify a sample in the validation set as
being in outcome class 1 or outcome class 2,
determine which sample in the training set its
gene expression profile is most similar to. - Similarity measure used is based on genes
selected as being univariately differentially
expressed between the classes - Correlation similarity or Euclidean distance
generally used - Classify the sample as being in the same class as
its nearest neighbor in the training set
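The nearest-neighbor rule above, using Euclidean distance over the selected genes (a minimal sketch; the univariate gene selection is assumed to have been done beforehand):

```python
import math

def nearest_neighbor_classify(sample, training_set, labels):
    """Assign the class label of the training sample whose expression
    profile (restricted to the selected genes) is closest in
    Euclidean distance to the sample being classified."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    best = min(range(len(training_set)),
               key=lambda i: dist(sample, training_set[i]))
    return labels[best]
```

Correlation similarity can be substituted for the distance function without changing the structure of the rule.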
52 When p >> n
- It is always possible to find a set of features
and a weight vector for which the classification
error on the training set is zero. - Why consider more complex models?
53- Artificial intelligence sells to journal
reviewers and peers who cannot distinguish hype
from substance when it comes to microarray data
analysis. - Comparative studies generally indicate that
simpler methods work as well or better for
microarray problems because they avoid
overfitting the data.
54(No Transcript)
55Other Methods
- Top-scoring pairs
- CART
- Random Forest
56Apparent Dimension Reduction Based Methods
- Principal component regression
- Supervised principal component regression
- Partial least squares
- Stepwise logistic regression
57When There Are More Than 2 Classes
- Nearest neighbor type methods
- Decision tree of binary classifiers
58Decision Tree of Binary Classifiers
- Partition the set of classes {1,2,…,K} into two
disjoint subsets S1 and S2 - e.g. S1={1}, S2={2,3,4}
- Develop a binary classifier for distinguishing
the composite classes S1 and S2 - Compute the cross-validated classification error
for distinguishing S1 and S2 - Repeat the above steps for all possible
partitions in order to find the partition S1and
S2 for which the cross-validated classification
error is minimized - If S1and S2 are not singleton sets, then repeat
all of the above steps separately for the classes
in S1and S2 to optimally partition each of them
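The exhaustive search over partitions in the first steps above can be sketched with a generator of all bipartitions of the class set (a hypothetical helper, not BRB-ArrayTools code):

```python
from itertools import combinations

def bipartitions(classes):
    """Yield every split of a set of class labels into two non-empty
    disjoint subsets S1 and S2, listing each unordered pair exactly
    once by anchoring the first label inside S1."""
    classes = list(classes)
    first, rest = classes[0], classes[1:]
    for r in range(len(rest)):          # r = len(rest) would leave S2 empty
        for combo in combinations(rest, r):
            s1 = {first, *combo}
            s2 = set(classes) - s1
            yield s1, s2
```

For K classes there are 2^(K-1) - 1 distinct bipartitions, so the exhaustive cross-validated search described above is practical only for small K.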
59Evaluating a Classifier
- Prediction is difficult, especially the future.
- Niels Bohr
60Validating a Predictive Classifier
- Fit of a model to the same data used to develop
it is no evidence of prediction accuracy for
independent data - Goodness of fit is not prediction accuracy
- Demonstrating statistical significance of
prognostic factors is not the same as
demonstrating predictive accuracy - Demonstrating stability of selected genes is not
demonstrating predictive accuracy of a model for
independent data
61(No Transcript)
62(No Transcript)
63Split-Sample Evaluation
- Training-set
- Used to select features, select model type,
determine parameters and cut-off thresholds - Test-set
- Withheld until a single model is fully specified
using the training-set. - Fully specified model is applied to the
expression profiles in the test-set to predict
class labels. - Number of errors is counted
- Ideally test set data is from different centers
than the training data and assayed at a different
time
64Cross-Validated Prediction (Leave-One-Out Method)
1. Full data set is divided into training and
test sets (test set contains 1 specimen). 2.
Prediction rule is built from scratch
using the training
set. 3. Rule is applied to the specimen in the
test set for class prediction. 4. Process is
repeated until each specimen has appeared once in
the test set.
65Leave-one-out Cross Validation
- Omit sample 1
- Develop multivariate classifier from scratch on
training set with sample 1 omitted - Predict class for sample 1 and record whether
prediction is correct
66Leave-one-out Cross Validation
- Repeat analysis for training sets with each
single sample omitted one at a time - e number of misclassifications determined by
cross-validation - Subdivide e for estimation of sensitivity and
specificity
67- Cross validation is only valid if the test set is
not used in any way in the development of the
model. Using the complete set of samples to
select genes violates this assumption and
invalidates cross-validation. - With proper cross-validation, the model must be
developed from scratch for each leave-one-out
training set. This means that feature selection
must be repeated for each leave-one-out training
set. - Simon R, Radmacher MD, Dobbin K, McShane LM.
Pitfalls in the analysis of DNA microarray data.
Journal of the National Cancer Institute
95:14-18, 2003. - The cross-validated estimate of misclassification
error is an estimate of the prediction error for
model fit using specified algorithm to full
dataset
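The requirement that every step, including gene selection, be redone inside each leave-one-out loop can be sketched generically. This is a minimal illustration; `select_and_fit` is a placeholder for the user's full model-building pipeline:

```python
def loocv_error(samples, labels, select_and_fit):
    """Leave-one-out cross-validation done properly: the entire
    model-building pipeline (feature selection, fitting, tuning) is
    rerun from scratch on each n-1 case training set, and the held-out
    case is predicted by a model that never saw it."""
    errors = 0
    n = len(samples)
    for i in range(n):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        predict = select_and_fit(train_x, train_y)  # selection happens INSIDE the loop
        if predict(samples[i]) != labels[i]:
            errors += 1
    return errors / n
```

Any pipeline that touches the full label set before this loop, even just to pick genes, invalidates the resulting error estimate.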
68Prediction on Simulated Null Data
- Generation of Gene Expression Profiles
- 14 specimens (Pi is the expression profile for
specimen i) - Log-ratio measurements on 6000 genes
- Pi ~ MVN(0, I6000)
- Can we distinguish between the first 7 specimens
(Class 1) and the last 7 (Class 2)? - Prediction Method
- Compound covariate prediction (discussed later)
- Compound covariate built from the log-ratios of
the 10 most differentially expressed genes.
69(No Transcript)
70Partial Cross-Validation of Random Data
- Generate data for p features and n cases
identically distributed in two classes - No model should predict more accurately than the
flip of a fair coin - Using all the data select kltltp features that
appear most differentially expressed between the
two classes - Cross validate the estimation of model parameters
using the same k features for all LOOCV training
sets - The cross-validated estimate of prediction error
will be 0 over 99% of the time.
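The bias described above can be reproduced directly. This sketch uses standard-library Python on simulated null data, with absolute mean difference standing in for the t-statistic ranking (an assumption for brevity); it compares partial cross-validation, where genes are selected once on all cases, against proper cross-validation that reselects genes in every leave-one-out training set:

```python
import random

def make_null_data(n=20, p=1000, seed=1):
    """Null data: n cases, p independent N(0,1) genes, two equal
    classes -- no real signal, so no rule can truly beat a coin flip."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [0] * (n // 2) + [1] * (n - n // 2)
    return X, y

def top_genes(X, y, k):
    """k genes with the largest absolute mean difference between classes
    (a cheap stand-in for ranking by t-statistic)."""
    idx = {c: [i for i, lab in enumerate(y) if lab == c] for c in (0, 1)}
    def score(j):
        m0 = sum(X[i][j] for i in idx[0]) / len(idx[0])
        m1 = sum(X[i][j] for i in idx[1]) / len(idx[1])
        return abs(m0 - m1)
    return sorted(range(len(X[0])), key=score, reverse=True)[:k]

def centroid_predict(X, y, genes, sample):
    """Nearest-centroid rule restricted to the chosen genes."""
    def centroid(c):
        idx = [i for i, lab in enumerate(y) if lab == c]
        return [sum(X[i][j] for i in idx) / len(idx) for j in genes]
    c0, c1 = centroid(0), centroid(1)
    d0 = sum((sample[j] - m) ** 2 for j, m in zip(genes, c0))
    d1 = sum((sample[j] - m) ** 2 for j, m in zip(genes, c1))
    return 0 if d0 <= d1 else 1

def loocv_error(X, y, k=10, reselect=True):
    """LOOCV error; reselect=False reproduces the flawed 'partial'
    cross-validation where genes are picked once using ALL cases."""
    fixed = None if reselect else top_genes(X, y, k)
    errors = 0
    for i in range(len(X)):
        Xt, yt = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(Xt, yt, k) if reselect else fixed
        errors += centroid_predict(Xt, yt, genes, X[i]) != y[i]
    return errors / len(X)
```

On null data the partial estimate collapses toward zero while the proper estimate stays near the coin-flip rate, which is exactly the failure mode in the "incomplete cross-validation" studies tallied earlier.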
71(No Transcript)
72Major Flaws Found in 40 Studies Published in 2004
- Inadequate control of multiple comparisons in
gene finding - 9/23 studies had unclear or inadequate methods to
deal with false positives - 10,000 genes x .05 significance level = 500 false
positives - Misleading report of prediction accuracy
- 12/28 reports based on incomplete
cross-validation - Misleading use of cluster analysis
- 13/28 studies invalidly claimed that expression
clusters based on differentially expressed genes
could help distinguish clinical outcomes - 50% of studies contained one or more major flaws
73Class Prediction
- Cluster analysis is frequently used in
publications for class prediction in a misleading
way
74Fallacy of Clustering Classes Based on Selected
Genes
- Even for arrays randomly distributed between
classes, genes will be found that are
significantly differentially expressed - With 10,000 genes measured, about 500 false
positives will be differentially expressed with
p < 0.05 - Arrays in the two classes will necessarily
cluster separately when using a distance measure
based on genes selected to distinguish the
classes
75Major Flaws Found in 40 Studies Published in 2004
- Inadequate control of multiple comparisons in
gene finding - 9/23 studies had unclear or inadequate methods to
deal with false positives - 10,000 genes x .05 significance level = 500 false
positives - Misleading report of prediction accuracy
- 12/28 reports based on incomplete
cross-validation - Misleading use of cluster analysis
- 13/28 studies invalidly claimed that expression
clusters based on differentially expressed genes
could help distinguish clinical outcomes - 50% of studies contained one or more major flaws
76(No Transcript)
77Myth
- Split sample validation is superior to LOOCV or
10-fold CV for estimating prediction error
78(No Transcript)
79Simulated Data: 40 cases, 10 genes selected from
5000
Method Estimate Std Deviation
True .078
Resubstitution .007 .016
LOOCV .092 .115
10-fold CV .118 .120
5-fold CV .161 .127
Split sample 1-1 .345 .185
Split sample 2-1 .205 .184
.632 bootstrap .274 .084
80Comparison of Internal Validation
Methods (Molinaro, Pfeiffer, Simon)
- For small sample sizes, LOOCV is much more
accurate than split-sample validation - Split sample validation over-estimates prediction
error - For small sample sizes, LOOCV is preferable to
10-fold, 5-fold cross-validation or repeated
k-fold versions - For moderate sample sizes, 10-fold is preferable
to LOOCV - Some claims for bootstrap resampling for
estimating prediction error are not valid for
p >> n problems
81(No Transcript)
82Simulated Data: 40 cases, 10 genes selected from
5000
Method Estimate Std Deviation
True .078
Resubstitution .007 .016
LOOCV .092 .115
10-fold CV .118 .120
5-fold CV .161 .127
Split sample 1-1 .345 .185
Split sample 2-1 .205 .184
.632 bootstrap .274 .084
83Simulated Data: 40 cases
Method Estimate Std Deviation
True .078
10-fold .118 .120
Repeated 10-fold .116 .109
5-fold .161 .127
Repeated 5-fold .159 .114
Split 1-1 .345 .185
Repeated split 1-1 .371 .065
84DLBCL Data
Method Bias Std Deviation MSE
LOOCV -.019 .072 .008
10-fold CV -.007 .063 .006
5-fold CV .004 .07 .007
Split 1-1 .037 .117 .018
Split 2-1 .001 .119 .017
.632 bootstrap -.006 .049 .004
85Permutation Distribution of Cross-validated
Misclassification Rate of a Multivariate
Classifier
- Randomly permute class labels and repeat the
entire cross-validation - Re-do for all (or 1000) random permutations of
class labels - Permutation p value is fraction of random
permutations that gave as few misclassifications
as e in the real data
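The permutation test above can be sketched generically; `cv_error` is a placeholder for a fully cross-validated error estimator (such as the LOOCV procedure described earlier), rerun in its entirety for each permutation:

```python
import random

def permutation_pvalue(samples, labels, cv_error, n_perm=1000, seed=0):
    """Significance of a cross-validated error rate: permute the class
    labels, redo the ENTIRE cross-validation for each permutation, and
    report the fraction of permutations yielding as few
    misclassifications as observed (the +1 terms, an added convention,
    keep the estimate away from exactly zero)."""
    rng = random.Random(seed)
    observed = cv_error(samples, labels)
    hits = sum(
        cv_error(samples, rng.sample(labels, len(labels))) <= observed
        for _ in range(n_perm))
    return (hits + 1) / (n_perm + 1)
```

Because the cross-validation is repeated inside every permutation, the runtime is (number of permutations) x (cost of one full cross-validation).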
86Gene-Expression Profiles in Hereditary Breast
Cancer
- Breast tumors studied
- 7 BRCA1 tumors
- 8 BRCA2 tumors
- 7 sporadic tumors
- Log-ratios measurements of 3226 genes for each
tumor after initial data filtering
RESEARCH QUESTION: Can we distinguish BRCA1+ from
BRCA1− cancers and BRCA2+ from BRCA2− cancers
based solely on their gene expression profiles?
87BRCA1
88BRCA2
89Classification of BRCA2 Germline Mutations
Classification Method LOOCV Prediction Error (%)
Compound Covariate Predictor 14
Fisher LDA 36
Diagonal LDA 14
1-Nearest Neighbor 9
3-Nearest Neighbor 23
Support Vector Machine (linear kernel) 18
Classification Tree 45
90Myth
- Huge sample sizes are needed to develop effective
predictive classifiers
91Sample Size Planning References
- K Dobbin, R Simon. Sample size determination in
microarray experiments for class comparison and
prognostic classification. Biostatistics 6:27,
2005 - K Dobbin, R Simon. Sample size planning for
developing classifiers using high dimensional DNA
microarray data. Biostatistics 8:101, 2007 - K Dobbin, Y Zhao, R Simon. How large a training
set is needed to develop a classifier for
microarray data? Clinical Cancer Res 14:108, 2008
92Sample Size Planning for Classifier Development
- The expected value (over training sets) of the
probability of correct classification PCC(n)
should be within a specified tolerance of the
maximum achievable PCC(∞)
93Probability Model
- Two classes
- Log expression or log ratio MVN in each class
with common covariance matrix - m differentially expressed genes
- p-m noise genes
- Expression of differentially expressed genes are
independent of expression for noise genes - All differentially expressed genes have same
inter-class mean difference 2δ - Common variance for differentially expressed
genes and for noise genes
94Classifier
- Feature selection based on univariate t-tests for
differential expression at significance level α - Simple linear classifier with equal weights
(except for sign) for all selected genes. Power
for selecting each of the informative genes that
are differentially expressed by mean difference
2δ is 1-β(n)
95- For 2 classes of equal prevalence, let λ1 denote
the largest eigenvalue of the covariance matrix
of informative genes. Then
96(No Transcript)
97(No Transcript)
98Sample size as a function of effect size
(log-base 2 fold-change between classes divided
by standard deviation). Two different tolerances
are shown. Each class is equally represented in the
population. 22000 genes on an array.
99(No Transcript)
100BRB-ArrayTools: Survival Risk Group Prediction
- No need to transform data to good vs bad outcome.
Censored survival is directly analyzed - Gene selection based on significance in
univariate Cox Proportional Hazards regression - Uses k principal components of selected genes
- Gene selection re-done for each resampled
training set - Develop k-variable Cox PH model for each
leave-one-out training set
101BRB-ArrayTools: Survival Risk Group Prediction
- Classify left out sample as above or below median
risk based on model not involving that sample - Repeat, leaving out 1 sample at a time to obtain
cross-validated risk group predictions for all
cases - Compute Kaplan-Meier survival curves of the two
predicted risk groups - Permutation analysis to evaluate statistical
significance of separation of K-M curves
102BRB-ArrayTools: Survival Risk Group Prediction
- Compare Kaplan-Meier curves for gene expression
based classifier to that for standard clinical
classifier - Develop classifier using standard clinical
staging plus genes that add to standard staging
103Does an Expression Profile Classifier Predict
More Accurately Than Standard Prognostic
Variables?
- Some publications fit logistic model to standard
covariates and the cross-validated predictions of
expression profile classifiers - This is valid only with split-sample analysis
because the cross-validated predictions are not
independent
104Does an Expression Profile Classifier Predict
More Accurately Than Standard Prognostic
Variables?
- Not an issue of which variables are significant
after adjusting for which others or which are
independent predictors - Predictive accuracy and inference are different
- The predictiveness of the expression profile
classifier can be evaluated within levels of the
classifier based on standard prognostic variables
105Survival Risk Group Prediction
- LOOCV loop
- Create training set by omitting ith case
- Develop PH model for training set
- Compute predictive index for ith case using PH
model developed for training set - Compute percentile of predictive index for ith
case among predictive indices for cases in the
training set
106Survival Risk Group Prediction
- Plot Kaplan Meier survival curves for cases with
predictive index percentiles above 50% and for
cases with cross-validated risk percentiles below
50% - Or for however many risk groups and thresholds are
desired - Compute log-rank statistic comparing the
cross-validated Kaplan Meier curves
107Survival Risk Group Prediction
- Evaluate individual genes by fitting single
variable proportional hazards regression models
to log expression for each gene - Select genes based on p-value threshold for
single gene PH regressions - Compute first k principal components of the
selected genes - Fit PH regression model with the k pcs as
predictors. Let b1 , , bk denote the estimated
regression coefficients - To predict for case with expression profile
vector x, compute the k supervised PCs y1, …,
yk and the predictive index = b1y1 + … + bkyk
108Survival Risk Group Prediction
- Repeat the entire procedure for permutations of
survival times and censoring indicators to
generate the null distribution of the log-rank
statistic - The usual chi-square null distribution is not
valid because the cross-validated risk
percentiles are correlated among cases - Evaluate statistical significance of the
association of survival and expression profiles
by referring the log-rank statistic for the
unpermuted data to the permutation null
distribution
109- Outcome prediction in estrogen-receptor positive,
chemotherapy and tamoxifen treated patients with
locally advanced breast cancer
R. Simon, G. Bianchini, M. Zambetti, S. Govi, G.
Mariani, M. L. Carcangiu, P. Valagussa, L.
Gianni National Cancer Institute, Bethesda, MD
Fondazione IRCCS - Istituto Tumori di Milano,
Milan, Italy
110PATIENTS AND METHODS - I
- Fifty-seven patients with ER positive tumors
enrolled in a neoadjuvant clinical trial for LABC
were evaluated. All patients had been treated
with doxorubicin and paclitaxel q 3wk x 3,
followed by weekly paclitaxel x 12 before
surgery, then adjuvant intravenous CMF q 4wk x 4
and thereafter tamoxifen. - High-throughput qRT-PCR gene expression analysis
in paraffin-embedded formalin-fixed core biopsies
at diagnosis was performed by Genomic Health to
quantify expression of 363 genes (plus 21 for
Oncotype DXTM determination), as described
previously (Gianni L, JCO 2005). RS genes were
excluded from analysis.
111PATIENTS AND METHODS - II
- Three models (prognostic index) were developed to
predict Distant Event Free Survival (DEFS) - GENE MODEL Using only expression data, genes were
selected based on univariate Cox analysis p value
under a specific threshold significance level. - COVARIATES MODEL Using RS (as continuous
variable), age and IBC status (covariates) a
multivariate proportional hazards model was
developed. - COMBINED MODEL Using a combination of these
covariates and expression data, genes were
selected which add to predicting survival over
the predictive value provided by the covariates
and under a specific threshold significance
level. - Survival risk groups were constructed using the
supervised principal component method implemented
in BRB-ArrayTools (Bair E, Tibshirani R, PLOS
Biology 2004).
112PATIENTS AND METHODS - III
- In order to evaluate the predictive value for
each model a complete Leave-One-Out
Cross-Validation was used. - For each i-th cross-validated training set (with
one case removed) a prognostic index (PI)
function was created. The PI for the omitted
patient is ranked relative to the PI for the i-th
training set. Because the PI is a continuous
variable, cut-off percentiles have to be
pre-specified for defining the risk groups. The
omitted patient is placed into a risk group based
on her percentile ranking. The entire procedure
has been repeated using different cut-off
percentiles (BRB-ArrayTools User's Manual v3.7).
113PATIENTS AND METHODS - IV
- Statistical significance was determined by
repeating the entire cross-validation process for
1000 random permutations of the survival data. - For GENE MODEL the p value was testing the null
hypothesis that there is no relation between the
expression data and survival (by providing a
null-distribution of the log-rank statistic) - For COVARIATES MODEL the p value was the
parametric log-rank test statistic between risk
groups - For COMBINED MODEL the p value addressed whether
the expression data adds significantly to risk
prediction compared to the covariates
114RESULTS: Patients' characteristics at diagnosis
- The median follow-up was 76 months (range 18-103)
(by inverse Kaplan-Meier method) - Patients characteristics were summarized in Table
1.
115OS and DEFS of all patients
Overall Survival and Distant Event Free survival
All patients
116Genes selected for the GENE MODEL and COMBINED
MODEL
- The significance level for gene selection used
for the identified models was p=0.005. - All genes included in the COMBINED MODEL were
also selected in the GENE MODEL.
117Cross-validated Kaplan-Meier curves for risk
groups using 50th percentile cut-off
DISTANT EVENT FREE SURVIVAL
DISTANT EVENT FREE SURVIVAL
DISTANT EVENT FREE SURVIVAL
COMBINED MODEL
COVARIATES MODEL
GENE MODEL
118BRB-ArrayTools
- Contains analysis tools that I have selected as
valid and useful - Analysis wizard and multiple help screens for
biomedical scientists - Imports data from all platforms and major
databases - Automated import of data from NCBI Gene Express
Omnibus
119Predictive Classifiers in BRB-ArrayTools
- Classifiers
- Diagonal linear discriminant
- Compound covariate
- Bayesian compound covariate
- Support vector machine with inner product kernel
- K-nearest neighbor
- Nearest centroid
- Shrunken centroid (PAM)
- Random forest
- Tree of binary classifiers for k-classes
- Survival risk-group
- Supervised pcs
- Feature selection options
- Univariate t/F statistic
- Hierarchical variance option
- Restricted by fold effect
- Univariate classification power
- Recursive feature elimination
- Top-scoring pairs
- Validation methods
- Split-sample
- LOOCV
- Repeated k-fold CV
- .632 bootstrap
120Selected Features of BRB-ArrayTools
- Multivariate permutation tests for class
comparison to control number and proportion of
false discoveries with specified confidence level - Permits blocking by another variable, pairing of
data, averaging of technical replicates - SAM
- Fortran implementation 7X faster than R versions
- Extensive annotation for identified genes
- Internal annotation of NetAffx, Source, Gene
Ontology, Pathway information - Links to annotations in genomic databases
- Find genes correlated with quantitative factor
while controlling number or proportion of false
discoveries - Find genes correlated with censored survival
while controlling number or proportion of false
discoveries - Analysis of variance
121Selected Features of BRB-ArrayTools
- Gene set enrichment analysis.
- Gene Ontology groups, signaling pathways,
transcription factor targets, micro-RNA putative
targets - Automatic data download from Broad Institute
- KS and LS test statistics for the null hypothesis that
gene set is not enriched - Efron/Tibshirani max mean test
- Goeman's Global test of the null hypothesis that no
genes in set are differentially expressed - Class prediction
- Multiple classifiers
- Complete LOOCV, k-fold CV, repeated k-fold, .632
bootstrap - permutation significance of cross-validated error
rate
122Selected Features of BRB-ArrayTools
- Survival risk-group prediction
- Supervised principal components with and without
clinical covariates - Cross-validated Kaplan Meier Curves
- Permutation test of cross-validated KM curves
- Clustering tools for class discovery with
reproducibility statistics on clusters - Internal access to Eisens Cluster and Treeview
- Visualization tools including rotating 3D
principal components plot exportable to
Powerpoint with rotation controls - Extensible via R plug-in feature
- Tutorials and datasets
123BRB-ArrayTools
- Extensive built-in gene annotation and linkage to
gene annotation websites - Publicly available for non-commercial use
- http://brb.nci.nih.gov
124Conclusions
- New technology and biological knowledge make it
increasingly feasible to identify which patients
are most likely to benefit from a specified
treatment - Predictive medicine is feasible based on
genomic characterization of a patient's tumor - Targeting treatment can greatly improve the
therapeutic ratio of benefit to adverse effects - Treated patients benefit
- Economic benefit for society
125Conclusions
- Achieving the potential of new technology
requires paradigm changes in focus and methods of
correlative science. - Effective interdisciplinary research requires
increased emphasis on cross education of
laboratory, clinical and statistical scientists
126Acknowledgements
- Kevin Dobbin
- Alain Dupuy
- Wenyu Jiang
- Annette Molinaro
- Michael Radmacher
- Joanna Shih
- Yingdong Zhao
- BRB-ArrayTools Development Team