Compact and Understandable Descriptions of Mixtures of Bernoulli Distributions

1
Compact and Understandable Descriptions of
Mixtures of Bernoulli Distributions
  • Jaakko Hollmén and Jarkko Tikka
  • Helsinki Institute of Information Technology
  • Helsinki University of Technology
  • Espoo, Finland

2
Background on the problem
  • Collaboration: Knuutila and Myllykangas at the
    University of Helsinki
  • DNA copy number amplifications are mutations in
    the DNA structure → cancer
  • Bibliomics survey of 838 journal articles during
    1992-2002
  • Data: chromosomal mutations of 4500 cancer
    patients

3
Example on the data collection
S. Myllykangas, J. Himberg, T. Böhling, B. Nagy,
J. Hollmén, and S. Knuutila. DNA copy number
amplification profiling of human neoplasms.
Oncogene, 25(55):7324-7332, November 2006.
4
Chromosomal region names
  • Standardized nomenclature for chromosomal regions
    (spatial)
  • 1p36.2: chromosome 1, arm p, region 36,
    subregion 2
  • Ranges: 1p36.1-p36.3
  • Hierarchical, irregular naming scheme used in
    literature
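
The naming scheme above can be turned into structured fields with a small parser. The following is a minimal sketch, not the authors' tooling: the regular expression and the names `Band`, `parse_band` and `parse_range` are hypothetical, and only names shaped like 1p36.2 or ranges shaped like 1p36.1-p36.3 are assumed.

```python
import re
from typing import NamedTuple, Optional


class Band(NamedTuple):
    chromosome: str            # "1".."22", "X" or "Y"
    arm: str                   # "p" or "q"
    region: str                # e.g. "36"
    subregion: Optional[str]   # e.g. "2", or None when absent


# Hypothetical pattern covering names such as 1p36.2, 2q32 or Xq28.
BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d+)(?:\.(\d+))?$")


def parse_band(name: str) -> Band:
    """Split a single band name into chromosome, arm, region, subregion."""
    m = BAND_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized band name: {name!r}")
    return Band(*m.groups())


def parse_range(text: str):
    """Parse 'start-end'; the end may omit the chromosome (1p36.1-p36.3)."""
    start_txt, end_txt = text.split("-")
    start = parse_band(start_txt)
    if end_txt[0] in "pq":     # chromosome omitted on the right-hand side
        end_txt = start.chromosome + end_txt
    return start, parse_band(end_txt)
```

Irregular names from the literature that do not fit this pattern raise `ValueError` rather than being guessed at.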

5
DNA copy number amplification data as 0-1 data
[Figure: 0-1 data matrix; axes: cancer patients (i) and chromosomal areas, spatial coordinates (j)]
6
Mixture models for 0-1 data
  • Cancer is a collection of diseases
  • Finite mixture model of multivariate Bernoulli
    distributions
  • Learn the model with the EM algorithm
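
As an illustration of the bullets above, here is a minimal NumPy sketch of EM for a finite mixture of multivariate Bernoulli distributions. It is not the authors' implementation: the function names, the random initialization, the fixed iteration count and the `eps` smoothing are all assumptions made for this example.

```python
import numpy as np


def log_joint(X, pi, theta, eps=1e-9):
    """log p(x_i, component j) for every row i and component j, shape (n, J)."""
    return (X @ np.log(theta + eps).T
            + (1 - X) @ np.log(1 - theta + eps).T
            + np.log(pi + eps))


def fit_bernoulli_mixture(X, J, n_iter=100, seed=0, eps=1e-9):
    """EM for J components on an (n, d) array of 0/1 values.

    Returns mixing proportions pi (J,) and Bernoulli parameters theta (J, d).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(J, 1.0 / J)
    theta = rng.uniform(0.25, 0.75, size=(J, d))   # random start (assumption)
    for _ in range(n_iter):
        # E-step: responsibilities, normalized in a numerically stable way.
        log_p = log_joint(X, pi, theta, eps)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: update proportions and per-component frequencies.
        nk = resp.sum(axis=0)
        pi = nk / n
        theta = (resp.T @ X) / (nk[:, None] + eps)
    return pi, theta


def avg_log_likelihood(X, pi, theta, eps=1e-9):
    """Average per-row log-likelihood of X under the mixture."""
    log_p = log_joint(X, pi, theta, eps)
    m = log_p.max(axis=1, keepdims=True)
    return float(np.mean(m[:, 0] + np.log(np.exp(log_p - m).sum(axis=1))))
```

A convergence test on the likelihood would normally replace the fixed `n_iter`; it is omitted here to keep the sketch short.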

7
Model selection: how many components in a
mixture?
  • 5-fold cross-validation repeated 10 times
  • Try different solutions and compare the average
    likelihood on a validation set → J = 6

[Figure legend: training, validation]
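
The selection procedure on this slide can be sketched as follows. This is an assumption-laden outline rather than the authors' code: `repeated_kfold` and `select_num_components` are hypothetical names, the caller supplies `fit` and `score` (for example EM training and average validation log-likelihood), and taking the plain maximum of the mean score is a simplification of how the number of components was chosen.

```python
import numpy as np


def repeated_kfold(n, k=5, repeats=10, seed=0):
    """Yield (train_idx, val_idx) for k-fold cross-validation, repeated."""
    rng = np.random.default_rng(seed)
    for _ in range(repeats):
        folds = np.array_split(rng.permutation(n), k)
        for i in range(k):
            train = np.concatenate([f for j, f in enumerate(folds) if j != i])
            yield train, folds[i]


def select_num_components(X, candidates, fit, score, k=5, repeats=10, seed=0):
    """Return the candidate J with the highest mean validation score.

    fit(X_train, J) -> model and score(model, X_val) -> float are
    caller-supplied callables (hypothetical interface).
    """
    mean_scores = {}
    for J in candidates:
        vals = [score(fit(X[tr], J), X[va])
                for tr, va in repeated_kfold(len(X), k, repeats, seed)]
        mean_scores[J] = float(np.mean(vals))
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores
```

With the defaults, each candidate J is trained and scored 50 times (5 folds × 10 repetitions), and the same splits are reused for every candidate because the seed is fixed.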
8
Mixture model: Chromosome 1
Mixture components j = 1,...,6
Chromosomal areas (spatial coordinates)
  • Model is summarized by J + Jd parameters (J mixing
    proportions plus Jd Bernoulli parameters; about
    200 parameters altogether)

9
Mixture model in clustering
[Figure: clustered cancer patients against chromosomal areas (spatial coordinates)]
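
One common way to use a fitted mixture for clustering is to assign each patient to the component with the highest posterior probability. The sketch below assumes that rule (the slides do not state it), and the name `assign_clusters` and the `eps` smoothing are hypothetical.

```python
import numpy as np


def assign_clusters(X, pi, theta, eps=1e-9):
    """Hard-assign each 0/1 row of X to its most probable component.

    pi has shape (J,), theta has shape (J, d). The posterior is
    proportional to the joint p(x, j), so the argmax of the joint suffices.
    """
    X = np.asarray(X)
    pi = np.asarray(pi, dtype=float)
    theta = np.asarray(theta, dtype=float)
    log_p = (X @ np.log(theta + eps).T
             + (1 - X) @ np.log(1 - theta + eps).T
             + np.log(pi + eps))
    return np.argmax(log_p, axis=1)
```

Sorting the rows of the data matrix by the returned labels groups the patients by cluster.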
10
Solution creates a problem
  • We solved the modeling problem, but created a
    communications problem!
  • How do the cancer experts understand and refer to
    our models? Names?

11
Compact and Understandable Descriptions
  • Understandable (language, nomenclature)
  • Compact (size of the description)
  • Describe the parameters of the model
  • Use the model to cluster the data and describe
    the data in the clusters

12
Describe the model parameters
  • Mode of the component distribution: the most
    probable chromosomal area
  • Hypothetical mean organism (HMO): quantize the
    parameters to represent a hypothetical case of
    data
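
Both parameter-based descriptions can be computed directly from one component's parameter vector. In this sketch the function names are hypothetical, the mode is read as the area with the largest Bernoulli parameter, and the 0.5 quantization threshold is an assumption; the slides do not specify either detail.

```python
import numpy as np


def component_mode(theta_j, area_names):
    """Name of the chromosomal area with the largest parameter."""
    return area_names[int(np.argmax(theta_j))]


def hypothetical_mean_organism(theta_j, threshold=0.5):
    """Quantize the parameters to a 0/1 vector (threshold is an assumption)."""
    return (np.asarray(theta_j) >= threshold).astype(int)


# Toy parameters for three hypothetical areas.
names = ["1q21", "1q22", "1q23"]
theta_j = [0.1, 0.8, 0.6]
mode = component_mode(theta_j, names)
hmo = hypothetical_mean_organism(theta_j)
```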

13
Describe the clustered data
  • Describe the margins of the clusters with maximal
    frequent itemsets
  • Why maximal? They describe the largest
    representative commonality in the data; extracting
    all frequent itemsets is not feasible
  • Express the itemsets as ranges of contiguous
    chromosomal areas
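
The description step can be illustrated on toy data. The sketch below is a brute-force stand-in, not the miner used in the work: it enumerates all frequent itemsets before filtering for maximal ones, which is only workable for small inputs. The function names and the support threshold are assumptions, and ranges are written with the full name on both ends (1q21-1q23) for simplicity.

```python
from itertools import combinations


def frequent_itemsets(rows, min_support):
    """Level-wise search; rows is a list of sets of column indices."""
    n = len(rows)
    items = sorted(set().union(*rows))
    frequent, size = [], 1
    current = [frozenset([i]) for i in items]
    while current:
        level = [s for s in current
                 if sum(s <= r for r in rows) / n >= min_support]
        frequent.extend(level)
        # Next candidates: unions of two frequent sets that grow by one item.
        size += 1
        current = list({a | b for a, b in combinations(level, 2)
                        if len(a | b) == size})
    return frequent


def maximal(itemsets):
    """Keep only itemsets that have no frequent proper superset."""
    return [s for s in itemsets if not any(s < t for t in itemsets)]


def as_ranges(itemset, names):
    """Group consecutive column indices and name each run 'first-last'."""
    idx = sorted(itemset)
    runs, start = [], idx[0]
    for prev, cur in zip(idx, idx[1:] + [None]):
        if cur is None or cur != prev + 1:
            runs.append(names[start] if start == prev
                        else f"{names[start]}-{names[prev]}")
            start = cur
    return runs
```

Applied per cluster, `maximal(frequent_itemsets(rows, s))` followed by `as_ranges` yields one short list of ranges for each cluster.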

14
Descriptions, Chromosome 1
  • Maximal frequent itemsets extracted globally:
    1q21-q22, 1q22-q23
  • Shadowing and spurious mutations

15
Amplification models and patterns
1q32-q44, 1q11-q44, 1q21-q25, 1q21-q23,
1p35-p32, 2p15-p14, 2q32, 2p25-2p24, 2p24-p23,
2p25-2p11.1, 3q26.1-q26.3, 3q11.1-q29, 3p26-q29,
3q25-q29, 3p24, 3q27-q29, 4q12, 4p15.3-p12,
5p13-p12, 5p15.3-p11, 5p15.3-5q35, 6q22,
6p25-q27, 6p25-p22, 6p12, 6p25-p11.1, 6q21-q27,
7q3-q36, 7p21, 7p13-p11.2, 7q21, 7p22-q36,
7p22-p11.1, 8p23-q24.3, 8q24.1-q24.3, 8q23,
8q21.1-q22, 8q21.1-q24.3, 8q11.1-q24.3, 9q11-q34,
9p24-q34, 9q34, 9p24-p21, 10q11.1-q26, 10p15-p12,
11q11-q25, 11p15-q25, 11q23, 11q13, 11q14-q22,
11p12-p11.2, 11q12-q13, 12p13-p11.1, 12q13-q15,
12q11-q21, 12q12-12q23, 12q24.1-q24.3, 12p12,
12q14-q15, 13q32-q34, 13p13-q34, 13q13-q14,
13q22-q34, 13q22-q31, 13q11-q34,
14q12-q21, 14q12-q32, 14q32, 15q11.1-q26,
15q24-q25, 16p13.3-q24, 16p13.3-p11.1, 16q22,
16p13.1-p12, 17q11.1-q25, 17p13-11.1, 17q21-q25,
17q12-q21, 17p13-q25, 17q24-q25, 17q22,
18q11.1-q23, 18q21, 18p11.3-18q23, 18p11.3-11.1,
19q13.1, 19p13.3-p13.2, 19p13.3-q13.4,
19q13.1-q13.4, 20q12, 20p12-p11.2, 20q11.1-q13.3,
20p13-q13.3, 20q13.1-q13.3, 20q11.1-q12,
21p13-q22, 21q11.2-q21, 21q21-q22, 21q11.1-q22,
22q11.1-q13, 22q13, 22p13-q13, Xp22.3-q28,
Xp22.1-p11.2, Xq26-q28, Xq11-q28
16
Summary and Conclusions
  • DNA copy number amplifications (mutations) in
    cancer: database collected from the literature
  • Mixture modeling of 0-1 data
  • Models summarized based on parameters and
    clustered data with maximal frequent itemsets
  • The collection of DNA copy number amplifications
    forms a new basis for cancer classification