
1
GIMscan: A New Statistical Method for Analyzing
Whole-Genome Array CGH Data
RECOMB 2007 Presentation
  • Yanxin Shi (1), Fan Guo (1), Wei Wu (2), Eric P. Xing (1)

(1) School of Computer Science, Carnegie Mellon University
(2) Division of Pulmonary, Allergy, and Critical Care Medicine,
University of Pittsburgh
2
Outline
  • Motivation and Background
  • Computational framework
  • Experiments and Results
  • Summary

3
Copy number aberration and Array CGH
  • DNA copy number (a.k.a. dosage state)
  • Normal: 2 DNA copies
  • Aberrations: deletion (0 copies), loss (1 copy),
    gain (3 copies), amplification (>3 copies)
  • Array CGH: a high-throughput method to measure
    DNA copy number

4
Array CGH data
Ideally, the log2 ratio (LR) of test to normal copy number is:
  • Deletion (0 copies): LR = log2(0/2) = -∞
  • Loss (1 copy): LR = log2(1/2) = -1
  • Normal (2 copies): LR = log2(2/2) = 0
  • Gain (3 copies): LR = log2(3/2) ≈ 0.58
  • Amplification (≥4 copies): LR ≥ log2(4/2) = 1
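The ideal values on this slide follow directly from the base-2 logarithm of the copy-number ratio. A minimal sketch that reproduces them (the function name `ideal_log_ratio` is illustrative, not from the talk):

```python
import math

def ideal_log_ratio(copies, normal_copies=2):
    """Ideal log2 ratio of the test copy number to the normal copy number."""
    if copies == 0:
        # log2(0/2) diverges, so report minus infinity explicitly.
        return float("-inf")
    return math.log2(copies / normal_copies)

states = [("deletion", 0), ("loss", 1), ("normal", 2),
          ("gain", 3), ("amplification", 4)]
for name, copies in states:
    print(f"{name:13s} {copies} copies  LR = {ideal_log_ratio(copies):.2f}")
```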
5
However
  • Factors influencing the LR values:
  • Impurity of the test sample (e.g. a mixture of
    normal and cancer cells)
  • Variations in hybridization efficiency
  • Base compositions of different probes
  • Saturation of the array
  • Divergent sequence lengths of the clones
  • Many others: measurement noise, etc.
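To make the first factor concrete, here is a small sketch of how sample impurity alone pulls the LR toward zero. It assumes a simple linear mixing model in which a fraction `purity` of cells carries the aberration and the rest are normal diploid cells. That model and the names are assumptions for illustration, not something stated in the talk.

```python
import math

def observed_log_ratio(tumor_copies, purity, normal_copies=2):
    """Expected log2 ratio when only a fraction `purity` of cells is aberrant.

    Assumes the measured copy number is the cell-weighted average of the
    tumor and normal copy numbers (an illustrative mixing model).
    """
    mixed = purity * tumor_copies + (1.0 - purity) * normal_copies
    if mixed == 0:
        return float("-inf")
    return math.log2(mixed / normal_copies)

# A single-copy loss in a pure sample versus a 50% pure sample.
print(observed_log_ratio(1, purity=1.0))
print(observed_log_ratio(1, purity=0.5))
# A full deletion at 50% purity averages to 1 copy, so it looks like a pure loss.
print(observed_log_ratio(0, purity=0.5))
```

Under this model, different dosage states can produce the same LR, which is one reason fixed ideal values are hard to apply to real profiles.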

6
Segmental pattern and spatial drift
[Figure: LR profile annotated with a segmental pattern and spatial drift]
7
Existing Computational Methods
  • Threshold method
  • Mixture models (e.g. Hodgson et al., 2001): assume
    observations are i.i.d. samples from a mixture
    distribution.
  • Regression models (e.g. Hsu et al., 2005; Myers
    et al., 2004): smoothing for visual inspection to
    detect copy number states.
  • Segmentation models (e.g. Hupé et al., 2004):
    directly search for breakpoints in sequential data.
  • Spatial dynamics models (e.g. Fridlyand et al.,
    2004)
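As a point of reference for the simplest entry in this list, a threshold method can be sketched in a few lines. The cutoffs of -0.3 and 0.3 and the example profile are made-up values for illustration. The talk does not specify which thresholds were used.

```python
def threshold_call(lr, loss_cut=-0.3, gain_cut=0.3):
    """Label one clone by comparing its LR with fixed cutoffs (hypothetical values)."""
    if lr <= loss_cut:
        return "loss"
    if lr >= gain_cut:
        return "gain"
    return "normal"

# Made-up LR profile along a chromosome.
profile = [0.02, -0.05, 0.61, 0.55, 0.48, -0.04, -0.95, -1.10, 0.03]
calls = [threshold_call(lr) for lr in profile]
print(calls)
```

Each clone is labeled independently of its neighbors, so this approach ignores both the segmental structure and any spatial drift in the profile.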

8
Spatial Dynamic Methods
  • Hidden Markov Models
  • Dosage states form a Markov chain of hidden
    variables
  • Observed LR ratios are generated from
    state-specific Gaussian distributions

[Figure: HMM with a chain of hidden dosage states emitting the observed LR ratios]
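The HMM described on this slide can be decoded with the standard Viterbi algorithm. The sketch below assumes one fixed Gaussian mean per dosage state and a shared standard deviation. The three-state setup, the transition matrix, and the example profile are illustrative assumptions, not values from the talk.

```python
import numpy as np

def viterbi_gaussian(y, means, sigma, trans, init):
    """Most probable dosage state path for an HMM with Gaussian emissions."""
    y = np.asarray(y, dtype=float)
    means = np.asarray(means, dtype=float)
    T, M = len(y), len(means)
    # log N(y_t | mean_m, sigma^2) for every position t and state m.
    log_emit = (-0.5 * ((y[:, None] - means[None, :]) / sigma) ** 2
                - np.log(sigma * np.sqrt(2.0 * np.pi)))
    log_trans = np.log(trans)
    delta = np.log(init) + log_emit[0]
    back = np.zeros((T, M), dtype=int)
    for t in range(1, T):
        # scores[i, j]: best log-probability of a path ending in i, then moving to j.
        scores = delta[:, None] + log_trans
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(M)] + log_emit[t]
    # Trace the best path backwards from the final position.
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

# Hypothetical setup: states 0 = loss, 1 = normal, 2 = gain.
means = [-1.0, 0.0, 0.58]
trans = np.array([[0.90, 0.05, 0.05],
                  [0.05, 0.90, 0.05],
                  [0.05, 0.05, 0.90]])
init = np.array([1 / 3, 1 / 3, 1 / 3])
y = [0.05, -0.10, 0.60, 0.50, 0.65, 0.00, -0.90, -1.05]
path = viterbi_gaussian(y, means, sigma=0.2, trans=trans, init=init)
print(path)
```

Because the emission means are fixed, a profile that drifts away from them within one dosage state can be split into spurious segments.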
9
Dosage-Specific Kalman Filters
  • Introduce a hidden trajectory for each dosage
    state, so that the state-specific LR distribution
    no longer has a fixed mean

Linear Dynamics for dosage state m
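The equation on this slide did not survive in the transcript. As a sketch only, assume the common scalar form x_t = a * x_{t-1} + w_t with w_t ~ N(0, b), observed as y_t = x_t + v_t with v_t ~ N(0, r). The symbols b and r match the noise names used later in the simulation slides, but the exact parameterization here is an assumption. A standard Kalman filter for one such trajectory looks like this:

```python
def kalman_filter(y, a, b, r, mu0, p0):
    """Filtered means and variances of a scalar linear-Gaussian trajectory.

    Assumed model (not confirmed by the slides):
        x_1 ~ N(mu0, p0)
        x_t = a * x_{t-1} + w_t,  w_t ~ N(0, b)
        y_t = x_t + v_t,          v_t ~ N(0, r)
    """
    means, variances = [], []
    m_pred, p_pred = mu0, p0  # prior on the first hidden value
    for obs in y:
        # Update step: blend the prediction with the new observation.
        gain = p_pred / (p_pred + r)
        m = m_pred + gain * (obs - m_pred)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
        # Predict step: propagate through the linear dynamics.
        m_pred = a * m
        p_pred = a * a * p + b
    return means, variances

# Made-up LR values that drift upward within a single dosage state.
y = [0.50, 0.55, 0.62, 0.66, 0.73, 0.78]
means, variances = kalman_filter(y, a=1.0, b=0.01, r=0.04, mu0=0.58, p0=0.1)
print([round(m, 3) for m in means])
```

The filtered mean follows the drift instead of staying pinned to one fixed level, which is the behavior the hidden trajectory is meant to provide.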
10
Switching Kalman Filters
[Figure: SKF graphical model with hidden trajectories 1 to M and a dosage state chain]
  • An SKF generates each observation from one of the
    trajectories.
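The generative process can be sketched as a simulation: M hidden trajectories evolve in parallel, a Markov chain of dosage states picks one of them at each position, and the observation is that trajectory's value plus noise. The scalar random-walk dynamics, the parameter names, and all numeric values below are assumptions for illustration, not the parameterization from the talk.

```python
import numpy as np

def simulate_skf(T, a, b, mu, r, trans, init, seed=0):
    """Sample trajectories x (T x M), switch states s (T), and observations y (T).

    Assumed model: x_t[m] = a[m] * x_{t-1}[m] + N(0, b[m]),
    x_1[m] ~ N(mu[m], b[m]), and y_t = x_t[s_t] + N(0, r).
    """
    rng = np.random.default_rng(seed)
    a, b, mu = (np.asarray(v, dtype=float) for v in (a, b, mu))
    trans = np.asarray(trans, dtype=float)
    M = len(mu)
    x = np.zeros((T, M))
    s = np.zeros(T, dtype=int)
    y = np.zeros(T)
    x[0] = mu + rng.normal(0.0, np.sqrt(b))
    s[0] = rng.choice(M, p=init)
    y[0] = x[0, s[0]] + rng.normal(0.0, np.sqrt(r))
    for t in range(1, T):
        # Every trajectory keeps evolving, whether or not it is selected.
        x[t] = a * x[t - 1] + rng.normal(0.0, np.sqrt(b))
        # The dosage state chain switches as in an HMM.
        s[t] = rng.choice(M, p=trans[s[t - 1]])
        # The observation is read off the selected trajectory.
        y[t] = x[t, s[t]] + rng.normal(0.0, np.sqrt(r))
    return x, s, y

trans = [[0.95, 0.03, 0.02],
         [0.03, 0.94, 0.03],
         [0.02, 0.03, 0.95]]
x, s, y = simulate_skf(T=100, a=[1.0, 1.0, 1.0], b=[1e-4, 1e-4, 1e-4],
                       mu=[-1.0, 0.0, 0.58], r=0.01,
                       trans=trans, init=[0.2, 0.6, 0.2])
print(s[:20])
print(np.round(y[:5], 3))
```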

11
Posterior Inference
  • Dosage annotation is equivalent to estimating the
    posterior of the dosage states given the observed
    LR ratios.
  • Recovering the hidden trajectories.

12
Variational Inference
  • Exact posterior inference is intractable.
  • Variational inference decouples the hidden
    chains.
  • The decoupled chains have tractable distributions.

13
Variational Inference
  • Use this tractable distribution to approximate
    the true distribution by minimizing the KL
    divergence.
  • Fixed-point equations update the variational
    parameters.
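The same idea can be shown on a toy problem. The sketch below is not the SKF update itself. It approximates a small two-variable discrete joint p(a, b) with a factorized q(a)q(b) and iterates the standard mean-field fixed-point equations, each of which minimizes KL(q || p) over one factor with the other held fixed. The joint table is a made-up example.

```python
import numpy as np

def kl_to_joint(qa, qb, p):
    """KL(q || p) for the factorized distribution q(a, b) = qa[a] * qb[b]."""
    q = np.outer(qa, qb)
    return float(np.sum(q * (np.log(q) - np.log(p))))

def mean_field(p, n_iter=50):
    """Coordinate-wise fixed-point updates for a factorized approximation of p."""
    log_p = np.log(p)
    n_a, n_b = p.shape
    qa = np.full(n_a, 1.0 / n_a)
    qb = np.full(n_b, 1.0 / n_b)
    history = [kl_to_joint(qa, qb, p)]
    for _ in range(n_iter):
        # log qa(a) = E_qb[log p(a, b)] + const
        qa = np.exp(log_p @ qb)
        qa /= qa.sum()
        # log qb(b) = E_qa[log p(a, b)] + const
        qb = np.exp(qa @ log_p)
        qb /= qb.sum()
        history.append(kl_to_joint(qa, qb, p))
    return qa, qb, history

# Made-up joint distribution with strictly positive entries.
p = np.array([[0.5, 0.2],
              [0.1, 0.2]])
qa, qb, history = mean_field(p)
print(np.round(qa, 3), np.round(qb, 3))
print(round(history[0], 4), round(history[-1], 4))
```

Each update can only lower the divergence, so the recorded KL values are non-increasing across iterations.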

14
Parameter Sharing
  • The CGH dataset contains whole-genome
    measurements for multiple individuals.
  • Chromosome-specific parameters shared across
    individuals: trajectory parameters
  • Individual-specific parameters shared across
    chromosomes: all other parameters, e.g. output
    noise variance
15
Experiment Design
  • Simulation Analysis
  • Data generated from SKFs.
  • Compare with threshold, HMM.
  • aCGH profiles of 125 colorectal tumors (Nakao et
    al. 2004)
  • Case studies of 3 representative chromosomes.
  • Populational analysis over 125 genomes

16
Simulation Analysis (1)
Performance of dosage state prediction (b: noise in
hidden dynamics; r: noise in observation; M = 5)
17
Simulation Analysis (2)
[Figure panels: synthetic data; prediction by HMM; prediction by SKF]
18
Experiment Design
  • Simulation Analysis
  • Data generated from SKFs.
  • Compare with threshold, HMM.
  • aCGH profiles of 125 colorectal tumors (Nakao et
    al. 2004)
  • Case studies of 3 representative chromosomes.
  • Populational analysis over 125 genomes

19
Real aCGH Profile
Spatial Patterns Difficult for Conventional
Methods (1): Flat-Arch Pattern
20
Real aCGH Profile
Spatial Patterns Difficult for Conventional
Methods (2): Step Pattern
21
Real aCGH Profile
Spatial Patterns Difficult for Conventional
Methods (3): Spikes Pattern
22
Populational Analysis
Frequency of dosage state alteration across 125
individuals
  • Red bars: copy number gain or amplification
  • Blue bars: copy number loss or deletion
  • Solid vertical lines: boundaries between
    chromosomes
23
Populational Analysis
Frequency of dosage state alteration on 2
chromosomes
  • Top, red squares: copy number gain
  • Top, blue circles: copy number loss
  • Bottom, red squares: copy number amplification
  • Bottom, blue circles: copy number deletion
24
Summary
  • SKF for whole-genome analysis of aCGH data.
  • SKF can capture variations in the hybridization
    efficiency.
  • Parameter sharing scheme for data integration.
  • Possible extensions:
  • Gene expression concordance analysis
  • Incorporating information about sequence length
    and distance between clones

25
Thank you!
26
Populational Analysis
Detailed spectrum of GIM rates across 125
colorectal cancer patients in 4 hotspot regions,
annotated with cancer-related genes
27
  • M is selected by AIC.
  • We have also run experiments comparing SKF with
    segmentation methods (results not shown here).
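For reference, AIC is 2k - 2 ln L, where k is the number of free parameters and L is the maximized likelihood. The candidate with the lowest value is chosen. The sketch below shows that selection step only. The log-likelihoods and parameter counts are made-up numbers for illustration, not results from the talk.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood

def select_by_aic(candidates):
    """Pick the key whose (log_likelihood, n_params) pair gives the lowest AIC."""
    return min(candidates, key=lambda m: aic(*candidates[m]))

# Hypothetical fits: number of dosage states M -> (log-likelihood, parameter count).
candidates = {
    3: (-520.0, 12),
    4: (-498.0, 17),
    5: (-495.0, 22),
    6: (-493.0, 27),
}
best_m = select_by_aic(candidates)
print(best_m, aic(*candidates[best_m]))
```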

28
Switching Kalman Filters
  • An SKF generates each observation from one of the
    trajectories.
  • The dosage state chain is the switching process, as in an
    HMM.
  • The observations are the LR ratios.