Title: Microarrays: algorithms for knowledge discovery in oncology and molecular biology
1 Microarrays: algorithms for knowledge discovery in oncology and molecular biology
- Frank De Smet
- Katholieke Universiteit Leuven
- Faculteit Toegepaste Wetenschappen
- Departement Elektrotechniek (ESAT)
- Supervisor: Prof. dr. ir. B. De Moor
2 Overview
- Introduction: basic concepts of microarray data
- Feature extraction
- Univariate analysis
- Multivariate analysis: PCA
- Classification
- Clustering
- Conclusions and future research
Introduction Feature extraction Classification
Clustering Conclusions
3 Transcription - Translation
4 Microarrays
5 Importance
- Clinical (oncology)
- Clinical management of cancer is in many cases empirical, and not all clinically relevant information can be extracted from the data that physicians have access to
- Fundamental mechanisms behind carcinogenesis are not always taken into account
- But: expression patterns measured with microarrays in malignant cells reflect the phenotype of the tumour
- Molecular biology
- Study of the expression behaviour of genes can help to determine their biological role or function
6 Data-mining framework
7 Expression matrix
Microarray experiments
8 Univariate analysis in microarray data
9 Multiple testing
- Overlap of the p-values of the genes with and without actual differential expression: Type I and II errors
- In literature: control of the Type I error, too conservative for microarray data
- Here: balance of Type I and II error
10 Estimation of Type I and II error
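The slides do not spell out the estimator, so the following is only a minimal sketch of one common way to estimate these error counts from p-values alone. It assumes that p-values of genes without differential expression are uniform on [0, 1]; the function names and the cut-off `lam` are hypothetical and are not taken from the presentation.

```python
def estimate_null_count(p_values, lam=0.5):
    """Estimate n0, the number of genes without differential expression.

    Assumption: null p-values are uniform on [0, 1], so almost every
    p-value above `lam` belongs to a null gene.
    """
    above = sum(1 for p in p_values if p > lam)
    return min(len(p_values), above / (1.0 - lam))


def expected_errors(p_values, alpha, lam=0.5):
    """Expected Type I and Type II error counts when calling p <= alpha."""
    n0 = estimate_null_count(p_values, lam)
    n1 = len(p_values) - n0                      # genes with differential expression
    called = sum(1 for p in p_values if p <= alpha)
    false_pos = min(called, n0 * alpha)          # nulls expected below alpha (Type I)
    true_pos = called - false_pos
    false_neg = max(0.0, n1 - true_pos)          # affected genes that were missed (Type II)
    return {"n0": n0, "n1": n1,
            "false_positives": false_pos, "false_negatives": false_neg}
```

Scanning `alpha` over a grid and comparing the two counts is one way to look for a threshold that balances both error types rather than controlling the Type I error only.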
11 Calculations
12 ROC curve
- Optimal balance between Type I and II errors
- Area under the curve
- Quantifies how well the genes whose expression is and is not affected by the difference between conditions can be discriminated using their p-values
- Quality measure for microarray data
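As an illustration of the points above, here is a hedged sketch of how an ROC curve and its area could be estimated from p-values when the true status of each gene is unknown. It assumes uniform null p-values and takes an estimate of n0 as input; `estimated_roc` and `area_under_curve` are hypothetical names, and this is not necessarily the calculation used in the presentation.

```python
from bisect import bisect_right


def estimated_roc(p_values, n0):
    """Return estimated (false positive rate, true positive rate) points.

    Assumption: at threshold t about n0 * t null genes are called, and
    the remaining calls are counted as true positives.
    """
    ps = sorted(p_values)
    n1 = len(ps) - n0
    if n0 <= 0 or n1 <= 0:
        raise ValueError("need 0 < n0 < number of genes")
    points = [(0.0, 0.0)]
    best_tpr = 0.0
    for t in sorted(set(ps)):
        called = bisect_right(ps, t)             # genes with p <= t
        false_pos = min(called, n0 * t)
        tpr = min(1.0, (called - false_pos) / n1)
        best_tpr = max(best_tpr, tpr)            # keep the curve non-decreasing
        points.append((false_pos / n0, best_tpr))
    points.append((1.0, 1.0))
    return points


def area_under_curve(points):
    """Trapezoidal area under a curve given as points ordered by x."""
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(points, points[1:]))
```

An area close to 1 means the p-values separate affected from unaffected genes well, while an area close to 0.5 means they hardly separate them at all.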
13 Example: acute leukemia
14 Multivariate analysis in microarray data: Principal Component Analysis
Unsupervised
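For readers unfamiliar with PCA, the following minimal sketch extracts the first principal component of a small expression matrix by power iteration, using only the Python standard library. The layout (rows are experiments, columns are genes) and the function name are assumptions for illustration; a real analysis would normally rely on a linear-algebra library.

```python
import math


def first_principal_component(rows, iterations=200):
    """Leading principal component of `rows` (samples x features)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]

    def project(v):
        # Score of every sample along direction v.
        return [sum(c[j] * v[j] for j in range(d)) for c in centred]

    # Asymmetric start vector; a start exactly orthogonal to the
    # component would fail, which this sketch does not guard against.
    v = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        scores = project(v)
        # Multiply by X^T X, which is proportional to the covariance matrix.
        w = [sum(scores[i] * centred[i][j] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v, project(v)
```

The returned scores give each experiment's coordinate along the direction of largest variance, which is the kind of low-dimensional feature PCA provides before classification or clustering.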
15 Classification
Unsupervised
16 Clustering gene expression profiles
- Importance
- Identification of groups of coexpressed genes
- Coexpressed genes have a higher probability of having similar biological functions, e.g., they might interact with the same transcription factors (coregulation)
- First-generation algorithms: disadvantages
- Parameter fine-tuning
- Every profile is assigned to a cluster
- Computational complexity
17 Quality-based clustering (Heyer et al.)
- Algorithm produces clusters with
- a quality guarantee (fixed and user-defined threshold for diameter D)
- a maximum number of profiles
[Figure: cluster diameter D]
Still some disadvantages!
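The quality-based idea can be sketched as follows: grow a candidate cluster around every profile while its diameter stays within D, keep the largest candidate, remove its members, and repeat. This is a simplified illustration that assumes Euclidean distance and a hypothetical `min_size` stopping rule; it is not presented as the exact procedure of Heyer et al.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def qt_clust(profiles, max_diameter, min_size=2):
    """Return (clusters, unassigned) as lists of profile indices."""
    remaining = list(range(len(profiles)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            pool = [i for i in remaining if i != seed]

            def farthest(i):
                # Largest distance from profile i to the current members;
                # adding i makes the diameter at least this large.
                return max(distance(profiles[i], profiles[m]) for m in candidate)

            while pool:
                nxt = min(pool, key=farthest)
                if farthest(nxt) > max_diameter:
                    break                        # quality guarantee would be violated
                candidate.append(nxt)
                pool.remove(nxt)
            if len(candidate) > len(best):
                best = candidate
        if len(best) < min_size:
            break                                # leftovers stay unassigned
        clusters.append(sorted(best))
        remaining = [i for i in remaining if i not in best]
    return clusters, remaining
```

Because every remaining profile is tried as a seed in every round, the cost grows quickly with the number of profiles, and the result depends on a single fixed D chosen by the user.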
18 Adaptive quality-based clustering (AQBC)
- A heuristic iterative two-step approach
- Step 1: quality-based approach
- Find a cluster center in an area of the data set where the density of expression profiles, within a sphere with preliminary radius, is locally maximal
- Step 2: adaptive approach
- Re-estimation of the radius
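Step 1 can be illustrated with a simplified sketch: a sphere of fixed preliminary radius is moved repeatedly to the mean of the profiles it contains until it stops moving, which drives it toward a local density maximum. The function name, the Euclidean distance and the stopping tolerance are assumptions, and the published AQBC procedure may differ in its details. Step 2 is not sketched because the slides do not state the rule used to re-estimate the radius.

```python
import math


def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def locate_cluster_center(profiles, start, radius, max_iter=100, tol=1e-9):
    """Return (center, members) for a sphere of fixed `radius`."""

    def inside(center):
        # Indices of the profiles that fall within the sphere.
        return [i for i, p in enumerate(profiles)
                if distance(p, center) <= radius]

    center = [float(x) for x in start]
    for _ in range(max_iter):
        members = inside(center)
        if not members:
            break                                # empty sphere: nothing to follow
        new_center = [sum(profiles[i][j] for i in members) / len(members)
                      for j in range(len(center))]
        shift = distance(new_center, center)
        center = new_center
        if shift < tol:
            break                                # sphere has stopped moving
    return center, inside(center)
```

For example, starting at `[0, 0]` with radius 2.5 on four profiles at the corners of a 2 x 2 square plus one distant profile, the sphere settles on the centre of the square and excludes the distant profile.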
19 Step 1: localization of a cluster center
[Figure: radius R]
20 Step 2: re-calculation of the radius
21 Comparison
22 Validation
23 Availability
24 Conclusions
- Data-mining framework for microarray data
- Feature extraction
- Univariate analysis
- Estimation of n1 and n0
- ROC curves: optimal balance between Type I and II error, quality measure
- Multivariate analysis: PCA
- Classification: FDA and LS-SVM
- Clustering
- Microarray experiments
- Gene expression profiles: AQBC
- Clinical data
25 Selected publications
- De Smet, F., Marchal, K., Timmerman, D., Vergote, I., De Moor, B. and Moreau, Y. (2001) Gebruik van microroosters in de klinische oncologie. Tijdschr voor Geneeskunde, 57, 1225-1236.
- De Smet, F., Mathys, J., Marchal, K., Thijs, G., De Moor, B. and Moreau, Y. (2002) Adaptive quality-based clustering of gene expression profiles. Bioinformatics, 18, 735-746.
- Moreau, Y., De Smet, F., Thijs, G., Marchal, K. and De Moor, B. (2002) Functional bioinformatics of microarray data: from expression to regulation. Proceedings of the IEEE, 90, 1722-1743.
- De Smet, F., Moreau, Y., Timmerman, D., Vergote, I. and De Moor, B. (2004) Balancing false positives and false negatives for the detection of differential expression in malignancies. Br J Cancer, submitted.
- Epstein, E., Skoog, L., Isberg, P.E., De Smet, F., De Moor, B., Olofsson, P.A., Gudmundsson, S. and Valentin, L. (2002) An algorithm including results of gray-scale and power Doppler ultrasound examination to predict endometrial malignancy in women with postmenopausal bleeding. Ultrasound Obstet Gynecol, 20, 370-376.
26 Future research
- Specific
- Ovarian cancer: transcriptomics
- Prediction of chemosensitivity in stage III
- Prediction of recurrence in stage I
- Endometriosis: proteomics and transcriptomics
- Detection of endometriosis
- Prediction of relapse after surgery
- General
- Microarrays: number of patients, validation, standardization
- Proteomics
- Combination and comparison of microarray, proteomic and clinical data