Microarrays: algorithms for knowledge discovery in oncology and molecular biology

Frank De Smet, Katholieke Universiteit Leuven, Faculteit Toegepaste Wetenschappen


1
Microarrays: algorithms for knowledge discovery
in oncology and molecular biology

Frank De Smet
Katholieke Universiteit Leuven
Faculteit Toegepaste Wetenschappen
Departement Elektrotechniek (ESAT)
Promoter: Prof. dr. ir. B. De Moor

2
Overview

Introduction: basic concepts of microarray data
Feature extraction
  Univariate analysis
  Multivariate analysis: PCA
Classification
Clustering
Conclusions and future research

Introduction Feature extraction Classification
Clustering Conclusions
3
Transcription - Translation
4
Microarrays
5
Importance

Clinical (oncology):
  Clinical management of cancer is in many cases
  empirical, and not all clinically relevant
  information can be extracted from the data
  that physicians have access to
  The fundamental mechanisms behind carcinogenesis
  are not always taken into account
  But: expression patterns measured with microarrays
  in malignant cells reflect the phenotype of the
  tumour
Molecular biology:
  Studying the expression behaviour of genes can
  help to determine their biological role or
  function

6
Data-mining framework
7
Expression matrix

Microarray experiments

8
Univariate analysis in microarray data
9
Multiple testing

The p-values of genes with and without actual
differential expression overlap, which leads to
Type I and Type II errors
In the literature: control of the Type I error,
which is too conservative for microarray data
Here: balance between Type I and Type II errors
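
The trade-off on this slide can be illustrated with a minimal sketch. It assumes simulated p-values for which the truth is known, which is never the case for real microarray data. The function name error_counts and the simulated distributions are hypothetical and not taken from the presentation.

```python
import numpy as np

def error_counts(p_values, is_de, threshold):
    """Count Type I and Type II errors when genes with p <= threshold are called."""
    p_values = np.asarray(p_values, dtype=float)
    is_de = np.asarray(is_de, dtype=bool)
    called = p_values <= threshold
    false_pos = int(np.sum(called & ~is_de))  # Type I: called, but not differentially expressed
    false_neg = int(np.sum(~called & is_de))  # Type II: differentially expressed, but missed
    return false_pos, false_neg

# Simulated data (assumption): 900 unaffected genes with uniform p-values and
# 100 affected genes whose p-values are concentrated near zero.
rng = np.random.default_rng(0)
p_all = np.concatenate([rng.uniform(0.0, 1.0, size=900),
                        rng.beta(0.5, 10.0, size=100)])
truth = np.concatenate([np.zeros(900, dtype=bool), np.ones(100, dtype=bool)])

for t in (0.001, 0.01, 0.05):
    fp, fn = error_counts(p_all, truth, t)
    print(f"threshold={t}: Type I errors={fp}, Type II errors={fn}")
```

A stricter threshold lowers the number of Type I errors but raises the number of Type II errors, which is why controlling only the Type I error can be too conservative.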

10
Estimation of Type I and II error
11
Calculations
12
ROC curve

Optimal balance between Type I and II errors
Area under the curve:
  Quantifies how well genes whose expression is
  affected by the difference between conditions
  can be discriminated from genes whose expression
  is not affected, using their p-values
  Quality measure for microarray data
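
A minimal sketch of the area under the ROC curve as described on this slide. It assumes the true status of each gene is known (here through simulation) and that there are no tied p-values. The function name roc_from_p_values is hypothetical, and the presentation's own estimation procedure for unlabeled data is not reproduced here.

```python
import numpy as np

def roc_from_p_values(p_values, is_de):
    """Return (fpr, tpr, auc) when genes are ranked by increasing p-value."""
    p = np.asarray(p_values, dtype=float)
    y = np.asarray(is_de, dtype=bool)
    order = np.argsort(p, kind="stable")
    y_sorted = y[order]
    # Cumulative true and false positives as the threshold is relaxed gene by gene.
    tp = np.concatenate([[0], np.cumsum(y_sorted)])
    fp = np.concatenate([[0], np.cumsum(~y_sorted)])
    tpr = tp / y.sum()
    fpr = fp / (~y).sum()
    # Trapezoidal rule for the area under the curve.
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))
    return fpr, tpr, auc

# Simulated data (assumption): uniform p-values for unaffected genes,
# p-values concentrated near zero for affected genes.
rng = np.random.default_rng(0)
p_sim = np.concatenate([rng.uniform(size=900), rng.beta(0.5, 10.0, size=100)])
y_sim = np.concatenate([np.zeros(900, dtype=bool), np.ones(100, dtype=bool)])
_, _, auc_sim = roc_from_p_values(p_sim, y_sim)
print(f"AUC = {auc_sim:.3f}")
```

An area close to 1 means the two groups of genes are well separated by their p-values; an area close to 0.5 means the p-values carry little information.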

13
Example: Acute leukemia
14
Multivariate analysis in microarray data:
Principal Component Analysis
Unsupervised
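
A minimal sketch of principal component analysis on an expression matrix, computed through the singular value decomposition. The orientation (experiments as rows, genes as columns), the function name pca and the simulated data are assumptions for illustration only.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the rows of X (experiments x genes) onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # center each gene across experiments
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # coordinates of each experiment
    explained = s**2 / np.sum(s**2)                  # fraction of variance per component
    return scores, Vt[:n_components], explained[:n_components]

# Simulated data (assumption): 20 experiments, 500 genes, and the first
# 10 experiments over-express the first 50 genes.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 500))
X[:10, :50] += 3.0
scores, components, explained = pca(X, n_components=2)
print(scores.shape, explained)
```

No class labels are used, so the method is unsupervised; in this simulation the first component is expected to separate the two groups of experiments.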
15
Classification
Unsupervised
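
The conclusions of this presentation name FDA (Fisher discriminant analysis) and LS-SVM as classifiers. The following is a minimal two-class Fisher discriminant sketch, not the author's implementation. It assumes a small number of input features (for example a few principal components) and adds a ridge term so that the within-class scatter matrix stays invertible; the function name and the ridge parameter are hypothetical.

```python
import numpy as np

def fisher_discriminant(X, y, ridge=1e-3):
    """Two-class Fisher discriminant; X @ w + b > 0 predicts class 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    X0, X1 = X[y == 0], X[y == 1]
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Pooled within-class scatter matrix.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    Sw += ridge * np.eye(X.shape[1])  # ridge term (assumption) for numerical stability
    w = np.linalg.solve(Sw, m1 - m0)  # direction that best separates the class means
    b = -0.5 * w @ (m0 + m1)          # threshold halfway between the projected means
    return w, b

# Simulated data (assumption): two well-separated classes in three features.
rng = np.random.default_rng(2)
X_train = np.vstack([rng.normal(0.0, 1.0, size=(30, 3)),
                     rng.normal(5.0, 1.0, size=(30, 3))])
y_train = np.array([0] * 30 + [1] * 30)
w, b = fisher_discriminant(X_train, y_train)
pred = (X_train @ w + b > 0).astype(int)
print("training accuracy:", np.mean(pred == y_train))
```

Unlike PCA and clustering, this step uses class labels and is therefore supervised.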
16
Clustering gene expression profiles

Importance:
  Identification of groups of coexpressed genes
  Coexpressed genes have a higher probability of
  having similar biological functions, e.g., they
  might interact with the same transcription
  factors (coregulation)
Disadvantages of first-generation algorithms:
  Parameter fine-tuning
  Every profile is assigned to a cluster
  Computational complexity

17
Quality-based clustering (Heyer et al.)

The algorithm produces clusters with:
  a quality guarantee (a fixed, user-defined
  threshold for the diameter D)
  a maximum number of profiles
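
A minimal sketch of a greedy quality-threshold clustering in the spirit of the approach named on this slide. It is not the published implementation: the Euclidean distance, the function name qt_cluster and the min_size stopping rule are assumptions.

```python
import numpy as np

def qt_cluster(X, max_diameter, min_size=2):
    """Greedy quality-threshold clustering; returns a list of sorted index arrays."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distances (the distance measure is an assumption).
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    remaining = list(range(len(X)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            members = [seed]
            # reach[i]: largest distance from profile i to any current member,
            # i.e. the diameter contributed by adding profile i.
            reach = dist[seed].copy()
            candidates = [i for i in remaining if i != seed]
            while candidates:
                j = min(candidates, key=lambda i: reach[i])
                if reach[j] > max_diameter:
                    break  # any further addition would violate the diameter threshold
                members.append(j)
                candidates.remove(j)
                reach = np.maximum(reach, dist[j])
            if len(members) > len(best):
                best = members
        if len(best) < min_size:
            break  # remaining profiles stay unclustered
        clusters.append(np.array(sorted(best)))
        remaining = [i for i in remaining if i not in best]
    return clusters

# Two tight groups and one isolated profile (hypothetical data).
profiles = [[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [20, 20]]
print([c.tolist() for c in qt_cluster(profiles, max_diameter=0.5)])
```

Each pass keeps the largest candidate cluster whose diameter stays within the threshold, so not every profile has to be assigned to a cluster. Building a candidate cluster around every remaining profile in every pass makes this sketch expensive for large data sets.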

[Figure: cluster with diameter D]
Still some disadvantages!
18
Adaptive quality-based clustering (AQBC)

A heuristic, iterative two-step approach
Step 1: Quality-based approach
  Find a cluster center in an area of the data set
  where the density of expression profiles, within
  a sphere with a preliminary radius, is locally
  maximal
Step 2: Adaptive approach
  Re-estimation of the radius
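
A minimal sketch of Step 1 only. It uses a simple stand-in: the center is moved repeatedly to the mean of the profiles inside the sphere (a mean-shift style update with a fixed preliminary radius). This is an assumption for illustration and may differ from the published AQBC procedure. Step 2 is not sketched because the slides do not state how the radius is re-estimated. The function name locate_center and the simulated data are hypothetical.

```python
import numpy as np

def locate_center(X, start, radius, max_iter=100, tol=1e-6):
    """Move a center to the mean of the profiles inside its sphere until it stops moving."""
    X = np.asarray(X, dtype=float)
    center = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        inside = np.linalg.norm(X - center, axis=1) <= radius
        if not inside.any():
            break  # empty sphere: keep the current center
        new_center = X[inside].mean(axis=0)
        moved = np.linalg.norm(new_center - center)
        center = new_center
        if moved < tol:
            break  # converged
    members = np.linalg.norm(X - center, axis=1) <= radius
    return center, members

# Simulated data (assumption): a dense group near the origin and a second group far away.
rng = np.random.default_rng(3)
data = np.vstack([rng.normal(0.0, 0.1, size=(50, 2)),
                  rng.normal(5.0, 0.1, size=(20, 2))])
center, members = locate_center(data, start=[0.5, 0.5], radius=1.0)
print(center, int(members.sum()))
```

The center drifts toward the locally densest region; the profiles inside the final sphere form the preliminary cluster whose radius Step 2 would then re-estimate.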

19
Step 1: Localization of a cluster center
[Figure: sphere with radius R]
20
Step 2: Re-calculation of the radius
21
Comparison
22
Validation
23
Availability
24
Conclusions

Data-mining framework for microarray data
  Feature extraction
    Univariate analysis
      Estimation of n1 and n0
      ROC curves: optimal balance between Type I and II
      errors, quality measure
    Multivariate analysis: PCA
  Classification: FDA and LS-SVM
  Clustering
    Microarray experiments
    Gene expression profiles: AQBC
    Clinical data

25
Selected publications

De Smet, F., Marchal, K., Timmerman, D., Vergote, I., De Moor, B. and Moreau, Y. (2001) Gebruik van microroosters in de klinische oncologie. Tijdschr voor Geneeskunde, 57, 1225-1236.
De Smet, F., Mathys, J., Marchal, K., Thijs, G., De Moor, B. and Moreau, Y. (2002) Adaptive quality-based clustering of gene expression profiles. Bioinformatics, 18, 735-746.
Moreau, Y., De Smet, F., Thijs, G., Marchal, K. and De Moor, B. (2002) Functional bioinformatics of microarray data: from expression to regulation. Proceedings of the IEEE, 90, 1722-1743.
De Smet, F., Moreau, Y., Timmerman, D., Vergote, I. and De Moor, B. (2004) Balancing false positives and false negatives for the detection of differential expression in malignancies. Br J Cancer, submitted.
Epstein, E., Skoog, L., Isberg, P.E., De Smet, F., De Moor, B., Olofsson, P.A., Gudmundsson, S. and Valentin, L. (2002) An algorithm including results of gray-scale and power Doppler ultrasound examination to predict endometrial malignancy in women with postmenopausal bleeding. Ultrasound Obstet Gynecol, 20, 370-376.