Compact and Understandable Descriptions of Mixtures of Bernoulli Distributions

1
Compact and Understandable Descriptions of
Mixtures of Bernoulli Distributions

Jaakko Hollmén and Jarkko Tikka
Helsinki Institute of Information Technology
Helsinki University of Technology
Espoo, Finland

2
Background on the problem

Collaboration: Knuutila, Myllykangas at the University of Helsinki
DNA copy number amplifications are mutations in the DNA structure → cancer
Bibliomics survey of 838 journal articles during 1992-2002
Data: chromosomal mutations of 4500 cancer patients

3
Example on the data collection
S. Myllykangas, J. Himberg, T. Böhling, B. Nagy,
J. Hollmén, and S. Knuutila. DNA copy number
amplification profiling of human neoplasms.
Oncogene, 25(55):7324-7332, November 2006.
4
Chromosomal region names

Standardized nomenclature for chromosomal regions (spatial)
1p36.2: chromosome 1, arm p, region 36, subregion 2
Ranges: 1p36.1-p36.3
Hierarchical, irregular naming scheme used in the literature
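A minimal sketch of how such names could be split into their parts, assuming the simple "chromosome, arm, region, optional subregion" reading given above. The regular expression and the function names parse_band and parse_range are hypothetical and cover only the name shapes seen in this talk, not the full nomenclature.

```python
import re

# Hypothetical pattern for names such as "1p36.2" or "Xq28":
# chromosome (1-22, X or Y), arm (p or q), region digits, optional subregion.
BAND = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+)(?:\.(\d+))?$")

def parse_band(name):
    """Split one region name into chromosome, arm, region and subregion."""
    m = BAND.match(name)
    if m is None:
        raise ValueError(f"not a region name: {name!r}")
    chrom, arm, region, sub = m.groups()
    return {"chromosome": chrom, "arm": arm, "region": region, "subregion": sub}

def parse_range(text):
    """Split a range such as '1p36.1-p36.3' into two parsed endpoints.

    When the end omits the chromosome (it starts with p or q),
    it inherits the chromosome of the start.
    """
    start_txt, _, end_txt = text.partition("-")
    start = parse_band(start_txt)
    if not end_txt:
        return start, start          # a single region, not a range
    if end_txt[0] in "pq":
        end_txt = start["chromosome"] + end_txt
    return start, parse_band(end_txt)

print(parse_band("1p36.2"))
print(parse_range("1p36.1-p36.3"))
```

The irregular, hierarchical naming means a real parser would need more cases than this one handles.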

5
DNA copy number amplification data as 0-1 data
Figure axes: cancer patients (i), chromosomal areas as spatial coordinates (j)
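A small sketch of the 0-1 encoding, with one row per patient (i) and one column per chromosomal area (j). The area list, the patient identifiers and the helper to_binary_rows are invented for illustration and are not taken from the actual data collection.

```python
# Hypothetical ordered list of chromosomal areas (columns, index j).
areas = ["1p36", "1p35", "1p34", "1q21", "1q22"]

# Hypothetical patients (rows, index i) with the areas reported as amplified.
reported = {
    "patient_a": {"1q21", "1q22"},
    "patient_b": {"1p36"},
    "patient_c": set(),
}

def to_binary_rows(reported, areas):
    """Encode each patient as a 0-1 vector: 1 where the area is amplified."""
    return {pid: [1 if a in amp else 0 for a in areas]
            for pid, amp in reported.items()}

X = to_binary_rows(reported, areas)
for pid, row in X.items():
    print(pid, row)
```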
6
Mixture models for 0-1 data

Cancer is a collection of diseases
Finite mixture model of multivariate Bernoulli
distributions

Learn the model with the EM algorithm
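As an illustration of the model and its training, here is a plain-Python sketch of EM for a finite mixture of multivariate Bernoulli distributions. Each component has a mixing proportion (pi) and one amplification probability per chromosomal area (theta); a data vector's likelihood is the pi-weighted sum over components of the product of per-area Bernoulli probabilities. The random initialisation, the clipping constant eps, the fixed iteration count and all function names are assumptions of this sketch, not details from the talk.

```python
import math
import random

def log_bernoulli(x, theta):
    """Log-probability of a 0-1 vector x under independent Bernoulli parameters."""
    return sum(math.log(t) if v else math.log(1.0 - t) for v, t in zip(x, theta))

def e_step(X, pi, theta):
    """Posterior component probabilities per row, plus the total log-likelihood."""
    post, loglik = [], 0.0
    for x in X:
        logs = [math.log(p) + log_bernoulli(x, th) for p, th in zip(pi, theta)]
        top = max(logs)                               # log-sum-exp for stability
        weights = [math.exp(l - top) for l in logs]
        total = sum(weights)
        post.append([w / total for w in weights])
        loglik += top + math.log(total)
    return post, loglik

def m_step(X, post, eps=1e-3):
    """Re-estimate mixing proportions and Bernoulli parameters from posteriors."""
    n, d, J = len(X), len(X[0]), len(post[0])
    pi, theta = [], []
    for j in range(J):
        nj = max(sum(r[j] for r in post), 1e-12)      # guard against empty components
        pi.append(nj / n)
        row = []
        for k in range(d):
            p = sum(r[j] * x[k] for r, x in zip(post, X)) / nj
            row.append(min(max(p, eps), 1.0 - eps))   # keep logs finite (assumption)
        theta.append(row)
    return pi, theta

def fit_bernoulli_mixture(X, J, iters=50, seed=0):
    """Run a fixed number of EM iterations from a random start."""
    rng = random.Random(seed)
    d = len(X[0])
    pi = [1.0 / J] * J
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(J)]
    history = []
    for _ in range(iters):
        post, loglik = e_step(X, pi, theta)
        history.append(loglik)
        pi, theta = m_step(X, post)
    return pi, theta, history
```

EM only finds a local optimum, so in practice several random restarts would usually be compared.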

7
Model selection: how many components in a mixture?

5-fold cross-validation repeated 10 times
Try different solutions; choose based on average likelihood for a validation set → J = 6
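The selection procedure could be sketched as follows, assuming the candidate with the highest average validation log-likelihood is chosen (the talk only says the choice is based on that quantity). The training routine is passed in as a hypothetical fit(X_train, J) callable returning (pi, theta); the names avg_loglik, repeated_kfold and select_J are also inventions of this sketch.

```python
import math
import random

def avg_loglik(X, pi, theta):
    """Average log-likelihood per row under a Bernoulli mixture (pi, theta)."""
    total = 0.0
    for x in X:
        terms = []
        for p, th in zip(pi, theta):
            lb = sum(math.log(t) if v else math.log(1.0 - t) for v, t in zip(x, th))
            terms.append(math.log(p) + lb)
        top = max(terms)
        total += top + math.log(sum(math.exp(t - top) for t in terms))
    return total / len(X)

def repeated_kfold(n, k=5, repeats=10, seed=0):
    """Yield (train_idx, valid_idx) pairs: k folds, reshuffled for each repeat."""
    rng = random.Random(seed)
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            train = [i for g in range(k) if g != f for i in folds[g]]
            yield train, folds[f]

def select_J(X, candidates, fit, k=5, repeats=10):
    """Score each candidate J by mean validation log-likelihood.

    fit(X_train, J) -> (pi, theta) is supplied by the caller (placeholder).
    The same splits are reused for every J because the seed is fixed.
    """
    scores = {}
    for J in candidates:
        vals = []
        for train, valid in repeated_kfold(len(X), k, repeats):
            pi, theta = fit([X[i] for i in train], J)
            vals.append(avg_loglik([X[i] for i in valid], pi, theta))
        scores[J] = sum(vals) / len(vals)
    return max(scores, key=scores.get), scores
```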

(Plot legend: training, validation)
8
Mixture model: Chromosome 1
Mixture components j = 1,...,6
Chromosomal areas (spatial coordinates)

Model is summarized by J + Jd parameters (about 200 parameters altogether)

9
Mixture model in clustering
Clustered cancer patients
Chromosomal areas (spatial coordinates)
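One common way to use a fitted mixture for clustering is to assign each patient to the component with the highest posterior probability; that rule is an assumption here, since the slide shows only the resulting figure. The two-component model below is a made-up toy, and posterior and assign_cluster are hypothetical helper names.

```python
import math

def posterior(x, pi, theta):
    """Posterior probability of each component given a 0-1 vector x."""
    logs = [math.log(p) + sum(math.log(t) if v else math.log(1.0 - t)
                              for v, t in zip(x, th))
            for p, th in zip(pi, theta)]
    top = max(logs)
    w = [math.exp(l - top) for l in logs]
    s = sum(w)
    return [v / s for v in w]

def assign_cluster(x, pi, theta):
    """Index of the most probable component for x."""
    post = posterior(x, pi, theta)
    return max(range(len(post)), key=post.__getitem__)

# Hypothetical two-component model over four chromosomal areas.
pi = [0.5, 0.5]
theta = [[0.9, 0.8, 0.1, 0.1],
         [0.1, 0.1, 0.8, 0.9]]

print(assign_cluster([1, 1, 0, 0], pi, theta))
print(assign_cluster([0, 0, 1, 1], pi, theta))
```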
10
Solution creates a problem

We solved the modeling problem, but created a
communications problem!
How do the cancer experts understand and refer to
our models? Names?

11
Compact and Understandable Descriptions

Understandable (language, nomenclature)
Compact (size of the description)
Describe the parameters of the model
Use the model to cluster the data and describe
the data in the clusters

12
Describe the model parameters

Mode of the component distribution: most probable chromosomal area
Hypothetical mean organism (HMO): quantize the parameters to represent a hypothetical case of data
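Both descriptions can be read directly off one component's parameter vector. The sketch below takes the mode to be the area with the largest parameter, as worded above, and quantizes with a threshold of 0.5; the threshold, the toy values and the function names are assumptions, not the authors' stated procedure.

```python
def component_mode(theta_j, areas):
    """Area with the highest amplification probability in one component."""
    k = max(range(len(theta_j)), key=theta_j.__getitem__)
    return areas[k]

def hypothetical_mean_organism(theta_j, threshold=0.5):
    """Quantize the parameters to a 0-1 vector (threshold is an assumption)."""
    return [1 if t >= threshold else 0 for t in theta_j]

# Hypothetical areas and parameters for a single component.
areas = ["1q21", "1q22", "1q23", "1q24"]
theta_j = [0.7, 0.9, 0.6, 0.2]

print(component_mode(theta_j, areas))
print(hypothetical_mean_organism(theta_j))
```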

13
Describe the clustered data

Describe the margins of the clusters with maximal frequent itemsets
Why maximal? They describe the largest representative commonality in the data; extracting all frequent itemsets is not feasible
Express the itemsets as ranges of contiguous chromosomal areas
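A toy sketch of the two steps: find the maximal frequent itemsets, then print each one as ranges of contiguous areas. The brute-force enumeration is exactly the approach that does not scale to real data and stands in for a dedicated maximal-itemset miner; the support threshold, the toy rows and all names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(rows, min_support):
    """Brute-force enumeration; rows are sets of column indices. Tiny data only."""
    items = sorted(set().union(*rows))
    found = []
    for size in range(1, len(items) + 1):
        level = [set(c) for c in combinations(items, size)
                 if sum(set(c) <= r for r in rows) >= min_support]
        if not level:
            break      # no frequent set of this size, so none of a larger size
        found.extend(level)
    return found

def maximal(itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]

def as_ranges(itemset, areas):
    """Group column indices into runs of contiguous areas."""
    idx = sorted(itemset)
    runs, start, prev = [], idx[0], idx[0]
    for i in idx[1:]:
        if i != prev + 1:
            runs.append((start, prev))
            start = i
        prev = i
    runs.append((start, prev))
    return [areas[a] if a == b else f"{areas[a]}-{areas[b]}" for a, b in runs]

# Hypothetical areas and clustered patients (sets of amplified column indices).
areas = ["1q21", "1q22", "1q23", "1q24", "1q25"]
rows = [{0, 1, 2}, {0, 1, 2}, {1, 2}, {4}]

for s in maximal(frequent_itemsets(rows, min_support=2)):
    print(as_ranges(s, areas))
```

The output keeps the chromosome number on both ends of a range; shortening it to the "1q21-q23" style would be a formatting step on top.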

14
Descriptions, Chromosome 1

Maximal frequent itemsets extracted globally: 1q21-q22, 1q22-q23
Shadowing and spurious mutations

15
Amplification models and patterns
1q32-q44, 1q11-q44, 1q21-q25, 1q21-q23,
1p35-p32, 2p15-p14, 2q32, 2p25-2p24, 2p24-p23,
2p25-2p11.1, 3q26.1-q26.3, 3q11.1-q29, 3p26-q29,
3q25-q29, 3p24, 3q27-q29, 4q12, 4p15.3-p12,
5p13-p12, 5p15.3-p11, 5p15.3-5q35, 6q22,
6p25-q27, 6p25-p22, 6p12, 6p25-p11.1, 6q21-q27,
7q3-q36, 7p21, 7p13-p11.2, 7q21, 7p22-q36,
7p22-p11.1, 8p23-q24.3, 8q24.1-q24.3, 8q23,
8q21.1-q22, 8q21.1-q24.3, 8q11.1-q24.3, 9q11-q34,
9p24-q34, 9q34, 9p24-p21, 10q11.1-q26, 10p15-p12,
11q11-q25, 11p15-q25, 11q23, 11q13, 11q14-q22,
11p12-p11.2, 11q12-q13, 12p13-p11.1, 12q13-q15,
12q11-q21, 12q12-12q23, 12q24.1-q24.3, 12p12,
12q14-q15, 13q32-q34, 13p13-q34, 13q13-q14,
13q22-q34, 13q22-q31, 13q11-q34,
14q12-q21, 14q12-q32, 14q32, 15q11.1-q26,
15q24-q25, 16p13.3-q24, 16p13.3-p11.1, 16q22,
16p13.1-p12, 17q11.1-q25, 17p13-11.1, 17q21-q25,
17q12-q21, 17p13-q25, 17q24-q25, 17q22,
18q11.1-q23, 18q21, 18p11.3-18q23, 18p11.3-11.1,
19q13.1, 19p13.3-p13.2, 19p13.3-q13.4,
19q13.1-q13.4, 20q12, 20p12-p11.2, 20q11.1-q13.3,
20p13-q13.3, 20q13.1-q13.3, 20q11.1-q12,
21p13-q22, 21q11.2-q21, 21q21-q22, 21q11.1-q22,
22q11.1-q13, 22q13, 22p13-q13, Xp22.3-q28,
Xp22.1-p11.2, Xq26-q28, Xq11-q28
16
Summary and Conclusions

DNA copy number amplifications (mutations) in
cancer: database collected from literature
Mixture modeling of 0-1 data
Models summarized based on parameters and
clustered data with maximal frequent itemsets
The collection of DNA copy number amplifications
forms a new basis for cancer classification
