Title: Computational Diagnostics
Breast Cancer, Expression Profiles and Binary
Regression in 7000 Dimensions
Computational Diagnostics
We are a new research group in the Department of Computational Molecular Biology at the Max Planck Institute for Molecular Genetics in Berlin-Dahlem. Our group is part of the Berlin Center for Genome Based Bioinformatics and participates in the NGFN (National Genome Research Network).
Research
A comprehensive understanding of the mostly subtle differences in gene expression in patient-specific cell samples is crucial for elucidating the molecular characteristics of diseases as well as for the optimal choice of treatment. Large-scale gene expression profiling allows for a systematic investigation of these characteristics. Recently, there has been tremendous progress in the development of technologies that allow the parallel measurement of expression levels for tens of thousands of genes. However, it is still very challenging to interpret the data and to use it in clinical decision processes. The focus of this group is to develop statistical methodology for the use of gene expression profiles in medical diagnostics. We aim to identify patterns in expression profiles that improve or facilitate diagnosis, help to predict clinical outcome, or refine common diagnostic schemes.
Members
- Stefan Bentink. Web: www.molgen.mpg.de/bentink, email: bentink_at_molgen.mpg.de, phone: (49 30) 8413 - 1352
- Claudio Lottaz. Web: www.molgen.mpg.de/lottaz, email: lottaz_at_molgen.mpg.de, phone: (49 30) 8413 - 1352
- Florian Markowetz. Web: www.molgen.mpg.de/markowet, email: markowet_at_molgen.mpg.de, phone: (49 30) 8413 - 1352
- Rainer Spang (head). Web: www.molgen.mpg.de/spang, email: spang_at_molgen.mpg.de, phone: (49 30) 8413 - 1352
- Stefanie Scheid. Web: www.molgen.mpg.de/scheid, email: scheid_at_molgen.mpg.de, phone: (49 30) 8413 - 1352
Publications
- Prediction and uncertainty in the analysis of gene expression profiles. Rainer Spang, Carrie Blanchette, Harry Zuzan, Jeffrey R. Marks, Joseph Nevins and Mike West. Proceedings of the German Conference on Bioinformatics GCB 2001.
- Predicting the clinical status of human breast cancer by using gene expression profiles. West M, Blanchette C, Dressman H, Huang E, Ishida S, Spang R, Zuzan H, Olson JA Jr, Marks JR, Nevins JR. Proc Natl Acad Sci U S A. 2001 Sep 25;98(20):11462-7.
- Role for E2F in control of both DNA replication and mitotic functions as revealed from DNA microarray analysis. Ishida S, Huang E, Zuzan H, Spang R, Leone G, West M, Nevins JR. Mol Cell Biol. 2001 Jul;21(14):4684-99.
Rainer Spang, Harry Zuzan, Carrie Blanchette, Erich Huang, Holly Dressman, Jeff Marks, Joe Nevins, Mike West
Duke Medical Center, Duke University
- Estrogen Receptor Status
- 7000 genes
- 49 breast tumors
- 25 ER+
- 24 ER-
7000 Numbers Are More Numbers Than We Need
- Overfitting
We Cannot Identify a Model
- There are many different models that assign high probabilities to ER+ tumors and low probabilities to ER- tumors in the training set
- For a new patient, we find among these models some that predict she is ER+ and others that predict she is ER-
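The non-identifiability described above can be reproduced on synthetic data. The following is a minimal sketch, assuming random Gaussian numbers in place of real expression values and a plain linear score whose sign gives the predicted class. All variable names are illustrative, and nothing here is the authors' model. It builds two weight vectors that both reproduce all 49 training labels yet give opposite predictions for a hypothetical new patient.

```python
import numpy as np

rng = np.random.default_rng(0)

n_tumors, n_genes = 49, 7000               # sizes quoted on the slides
# Synthetic stand-in for the tumors x genes expression matrix (not real data).
X = rng.standard_normal((n_tumors, n_genes))
# Labels: 25 ER+ tumors coded +1 and 24 ER- tumors coded -1.
y = np.array([1.0] * 25 + [-1.0] * 24)

# Model A: minimum-norm weights that reproduce the training labels exactly.
w_a = np.linalg.pinv(X) @ y

# A hypothetical new patient.
x_new = rng.standard_normal(n_genes)

# r is the part of x_new that is orthogonal to every training profile.
coef, *_ = np.linalg.lstsq(X.T, x_new, rcond=None)
r = x_new - X.T @ coef

# Model B: move model A along r until its score for the new patient flips sign.
# Since r is orthogonal to the training profiles, the training fit is unchanged.
score_a = w_a @ x_new
w_b = w_a - 2.0 * score_a / (r @ x_new) * r
score_b = w_b @ x_new

fits_a = np.allclose(X @ w_a, y)           # model A fits the training set
fits_b = np.allclose(X @ w_b, y)           # so does model B
print(fits_a, fits_b, np.sign(score_a), np.sign(score_b))
```

Because 7000 weights are constrained by only 49 tumors, any direction orthogonal to the training profiles can be added to the weights without changing the training fit, which is why the training data alone cannot single out one model.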
Informative Priors
Likelihood × Prior ∝ Posterior
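The relation between likelihood, prior and posterior named on this slide can be written out for a binary regression model. The notation below is an assumption and does not come from the slides: y_i in {0, 1} is the ER status of tumor i, x_i its expression profile, beta the vector of regression weights, and F a link function such as the probit or logistic CDF.

```latex
% Assumed notation: y_i = ER status, x_i = expression profile,
% \beta = regression weights, F = link function (e.g. probit or logistic CDF).
p(\beta \mid y, X) \;\propto\;
  \underbrace{\prod_{i=1}^{n} F(x_i^\top \beta)^{y_i}
              \bigl(1 - F(x_i^\top \beta)\bigr)^{1 - y_i}}_{\text{likelihood}}
  \;\times\;
  \underbrace{p(\beta)}_{\text{prior}}
```

With many more weights than tumors the likelihood alone does not determine beta, so the prior term decides which of the equally well-fitting models receive most of the posterior mass.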
Prior Choice
- Center
- Orientation
- Not too wide, not too narrow
- Auto-adjusting model hyper-parameters with their own priors
- Assumptions on the model correspond to assumptions on the diagnosis
- Orthogonal super-genes
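One common way to obtain orthogonal super-genes is a singular value decomposition of the expression matrix. The slides do not say how the super-genes are constructed, so the SVD, the synthetic data and all names below are assumptions made for illustration only. The sketch reduces 7000 genes to 49 super-genes and checks that the super-genes are mutually orthogonal and lose no information about the tumors.

```python
import numpy as np

rng = np.random.default_rng(1)

n_genes, n_tumors = 7000, 49
# Synthetic stand-in for the genes x tumors expression matrix (not real data).
X = rng.standard_normal((n_genes, n_tumors))

# Thin SVD: X = A @ diag(d) @ F.
# A (7000 x 49) holds the gene loadings of each super-gene,
# d (49,) the singular values, F (49 x 49) orthonormal rows.
A, d, F = np.linalg.svd(X, full_matrices=False)

# Super-gene values per tumor: each row is one super-gene, each column one tumor.
supergenes = d[:, None] * F                # same as A.T @ X, shape (49, 49)

# Orthogonality: the Gram matrix of the super-genes is diagonal.
gram = supergenes @ supergenes.T
is_orthogonal = np.allclose(gram, np.diag(d ** 2))

# No information is lost: the 49 super-genes reconstruct all 7000 genes.
is_lossless = np.allclose(A @ supergenes, X)
print(supergenes.shape, is_orthogonal, is_lossless)
```

A regression on the 49 super-genes then has as many predictors as tumors, and a prior placed on the super-gene coefficients implies a prior on the gene-level model.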
Which Genes Have Driven the Prediction?
- What are the additional assumptions introduced by the prior?
- The model cannot be dominated by only a few super-genes (genes!)
- The diagnosis is based on global changes in the expression profiles, influenced by many genes
- The assumptions are neutral with respect to the individual diagnosis
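To see which genes contribute to a prediction, coefficients fitted on super-genes can be mapped back to weights on individual genes. The sketch below assumes that super-genes come from an SVD and that the score is linear in the super-genes. The coefficients `gamma` are random placeholders rather than fitted values, and the data are synthetic, so the printed share says nothing about real tumors.

```python
import numpy as np

rng = np.random.default_rng(2)

n_genes, n_tumors = 7000, 49
# Synthetic stand-in for the genes x tumors expression matrix (not real data).
X = rng.standard_normal((n_genes, n_tumors))

# Gene loadings of the super-genes from a thin SVD (an assumption).
A, d, F = np.linalg.svd(X, full_matrices=False)

# Hypothetical regression coefficients, one per super-gene.
gamma = rng.standard_normal(n_tumors)

# Equivalent gene-level weights: gamma @ (A.T @ x) == (A @ gamma) @ x.
beta = A @ gamma

x = X[:, 0]                                # expression profile of one tumor
score_supergenes = gamma @ (A.T @ x)
score_genes = beta @ x

# Share of the total absolute weight carried by the 10 heaviest genes.
abs_beta = np.sort(np.abs(beta))[::-1]
top10_share = abs_beta[:10].sum() / abs_beta.sum()
print(np.isclose(score_supergenes, score_genes), top10_share)
```

A small value of `top10_share` would indicate that the score is spread over many genes, which is the kind of global, many-gene behavior the bullets above describe.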