1
Modelling of CGH arrays experiments

Philippe Broët
Faculté de Médecine,
Université de Paris-XI

Sylvia Richardson
Imperial College
London

CGH: Comparative Genomic Hybridization
2
Outline

Background
Mixture model with spatial allocations
Performance, comparison with CGH-Miner
Analyses of CGH-array cancer data sets
Extensions

3
Aim: study genomic alterations in oncology
Loss: tumor suppressor gene
Gain: oncogene
The development of solid tumors is associated
with the acquisition of complex genetic
alterations that modify normal cell growth and
survival. Many of these changes involve gains
and/or losses of parts of the genome.
Amplification of an oncogene or deletion of a
tumor suppressor gene are considered important
mechanisms of tumorigenesis.
4

CGH: Comparative Genomic Hybridization
Array containing short sequences of DNA bound to
a glass slide
Fluorescein-labeled normal and pathologic
samples co-hybridised to the array

1. Extraction of DNA
2. Labelling (fluo)
3. Co-hybridization
4. Scanning

Once hybridization has been performed, the
signal intensities of the fluorophores are
quantified

This provides a means to quantitatively measure DNA
copy-number alterations and to map them directly
onto the genomic sequence
6
MCF7 cell line investigated in Pollack et al.
(2002): 23 chromosomes and 6691 cDNA sequences
Data log transformed: difference between MCF7 and
reference
7
Types of alterations observed

(Single) gain or deletion of sequences, occurring
for contiguous regions
Low-level changes in the log2 ratio,
but attenuation (dye bias) → ratio 0.4
Multiple gains (small regions)
High-level change, easy to pick up
Focus the modelling on the first, common type
of alterations

8
Chromosome 1 (figure annotations: multiple gains? deletion? normal?)
9
2 -- Mixture model
10
Specificity of CGH array experiment

A priori biological knowledge from conventional
CGH:
Limited number of states for a genomic sequence (GS):
- presence (modal), - deletion, - gain(s),
corresponding to different intensity ratios on
the array
Mixture model to capture the underlying
discrete states
GS located contiguously on chromosomes are likely
to carry alterations of the same type
Use clone spatial location in the allocation
model

3-component mixture model with spatial allocation
11
Mixture model
For chromosome $k$: $Z_{gk}$ = log ratio of the measurement
of normal versus tumoral change for genomic
sequence (GS) $g$ on chromosome $k$
Dye bias is estimated by using a reference array
(normal/normal) and then subtracting the bias
from $Z_{gk}$
$Z_{gk} \sim w_{1gk}\,N(\mu_1, \sigma_1^2) + w_{2gk}\,N(\mu_2, \sigma_2^2) + w_{3gk}\,N(\mu_3, \sigma_3^2)$
Components: 1 = deletion, 2 = presence, 3 = gain
For unique labelling: $\mu_1 < 0$, $\mu_3 > 0$, $\mu_2 = 0$
(dye bias has been adjusted)
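
The three-component density on this slide can be made concrete with a small sketch. It evaluates the mixture at one log ratio and, for fixed weights, returns the probability of each state. The parameter values and the function names (`normal_pdf`, `mixture_density`, `membership_probs`) are illustrative assumptions, not estimates or code from the presentation.

```python
import math

def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(z, weights, means, sds):
    """Three-component mixture density at z (weights are assumed to sum to 1)."""
    return sum(w * normal_pdf(z, m, s) for w, m, s in zip(weights, means, sds))

def membership_probs(z, weights, means, sds):
    """Probability of each state (deletion, presence, gain) given z and fixed parameters."""
    terms = [w * normal_pdf(z, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(terms)
    return [t / total for t in terms]

# Illustrative values only (assumptions): mu1 < 0, mu2 = 0, mu3 > 0.
means = [-0.4, 0.0, 0.4]
sds = [0.3, 0.3, 0.3]
weights = [0.1, 0.8, 0.1]

# A strongly negative log ratio should favour the deletion component.
probs = membership_probs(-0.9, weights, means, sds)
```

In the model of the slides the weights vary with the GS and the chromosome; here they are held fixed to keep the example short.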
12
Mixture model with spatial allocation

$Z_{gk} \sim w_{1gk}\,N(\mu_1, \sigma_1^2) + w_{2gk}\,N(\mu_2, \sigma_2^2) + w_{3gk}\,N(\mu_3, \sigma_3^2)$
Spatial structure on the weights (cf. Fernández
and Green, 2002)
Introduce 3 centred Markov random fields
$\{u_{mgk}\}$, $m = 1, 2, 3$, with nearest neighbours
along the chromosomes

Define mixture proportions to depend on the
chromosomal location via a logistic model:
$w_{cgk} = \exp(u_{cgk}) / \sum_m \exp(u_{mgk})$
This favours allocation of nearby GS to the same component
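
As a minimal sketch of the logistic model above, the function below maps the three field values at one GS to mixture weights. The name `allocation_weights` and the input values are assumptions made for illustration.

```python
import math

def allocation_weights(u):
    """Map field values (u_1gk, u_2gk, u_3gk) at one GS to weights (w_1gk, w_2gk, w_3gk)."""
    shift = max(u)  # subtracting a constant leaves the ratio unchanged and avoids overflow
    expu = [math.exp(v - shift) for v in u]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical field values: the second (presence) component dominates.
w = allocation_weights([0.0, 2.0, -1.0])
```

Because neighbouring GS have similar field values under the spatial prior, their weights are similar too, which is how the model encourages contiguous GS to share a component.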

13
Prior structure

$w_{cgk} = \exp(u_{cgk}) / \sum_m \exp(u_{mgk})$
with Gaussian Conditional AutoRegressive (CAR) model:
$u_{cgk} \mid u_{c,-gk} \sim N\left(\sum_h u_{chk} / n_g,\; s_{ck}^2 / n_g\right)$
for $h$ neighbour of $g$ ($n_g$ = number of neighbours $h$, one or two in
this simple case), with constraint $\sum_g u_{cgk} = 0$
The variance parameters $s_{ck}^2$ of the CAR act as a
smoothing prior → indexed by the chromosome: the
switching structure between the states can be
different between chromosomes
Means and variances $(\mu_c, \sigma_c^2)$ of the mixture
components are common to all chromosomes →
borrowing information
Inverse gamma priors for the variances, uniform
priors for the means
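
The CAR conditional for a chain of GS can be sketched as follows. The sketch assumes GS are stored in chromosomal order, so the neighbours of position `g` are `g - 1` and `g + 1` where they exist, and that the chromosome carries at least two GS. The helper names `car_conditional` and `centre` are hypothetical.

```python
def car_conditional(u, g, s2):
    """Conditional mean and variance of u[g] given its neighbours along the chromosome.

    u  : field values of one component on one chromosome, in chromosomal order
    g  : index of the GS of interest
    s2 : CAR variance parameter for this component and chromosome
    """
    neighbours = [h for h in (g - 1, g + 1) if 0 <= h < len(u)]
    n_g = len(neighbours)  # one at the chromosome ends, two elsewhere
    mean = sum(u[h] for h in neighbours) / n_g
    return mean, s2 / n_g

def centre(u):
    """Impose the sum-to-zero constraint by subtracting the average field value."""
    avg = sum(u) / len(u)
    return [v - avg for v in u]

# Interior GS: average of both neighbours, variance halved.
interior = car_conditional([1.0, 0.0, 3.0], 1, 0.5)
# End GS: a single neighbour, full variance.
end = car_conditional([1.0, 0.0, 3.0], 0, 0.5)
```

A small variance parameter pulls each value towards its neighbours, so fewer switches between states are expected on that chromosome.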

14
Posterior quantities of interest

Bayesian inference via MCMC, implemented using
WinBUGS
In particular, latent allocations $L_{gk}$ of GS $g$
on chromosome $k$ to state $c$ are sampled during
the MCMC run
Compute posterior allocation probabilities:
$p_{cgk} = P(L_{gk} = c \mid \text{data})$, $c = 1, 2, 3$
Probabilistic classification of each GS using
a threshold on $p_{cgk}$:
-- Assign $g$ to a modified state, deletion ($c = 1$)
or gain ($c = 3$), if the corresponding $p_{cgk} > 0.8$,
-- Otherwise allocate to the modal state.
Subset $S$ of genomic sequences classified as
modified
(this subset depends on the chosen threshold)
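
The thresholding rule can be written down directly. This is a minimal sketch that assumes the posterior probabilities have already been estimated from the MCMC output; the function name `classify` and the example probabilities are illustrative.

```python
def classify(p, threshold=0.8):
    """Classify one GS from p = (p1, p2, p3), the posterior probabilities of
    deletion, modal and gain. The 0.8 default is the threshold quoted on the slide."""
    if p[0] > threshold:
        return "deletion"
    if p[2] > threshold:
        return "gain"
    return "modal"

# Hypothetical posterior probabilities for three GS.
labels = [classify(p) for p in [(0.95, 0.05, 0.0), (0.3, 0.6, 0.1), (0.0, 0.1, 0.9)]]
```

Note that a GS with, say, a 0.7 probability of deletion is still allocated to the modal state under this rule, which is why the selected subset depends on the threshold.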

15
False Discovery Rate

Using the posterior allocation probabilities, one can
compute an estimate of the FDR for the list $S$:
$\text{Bayes FDR}(S) \mid \text{data} = \frac{1}{\mathrm{card}(S)} \sum_{g \in S} p_{2gk}$
where $p_{2gk}$ is the posterior probability of allocation
to the modal ($c = 2$) state
Note: the threshold can be adjusted to get a desired
FDR, and vice versa
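
The FDR estimate is the average posterior probability of the modal state over the selected GS. The sketch below assumes a list of (p1, p2, p3) triples, one per GS; the function name `bayes_fdr`, the example values, and the choice to return 0.0 for an empty list are assumptions.

```python
def bayes_fdr(posteriors, threshold=0.8):
    """Estimated FDR of the list S of GS classified as deletion or gain.

    posteriors: list of (p1, p2, p3) triples, one per GS.
    Returns the mean of p2 over S, or 0.0 if S is empty (a convention assumed here).
    """
    selected = [p for p in posteriors if p[0] > threshold or p[2] > threshold]
    if not selected:
        return 0.0
    return sum(p[1] for p in selected) / len(selected)

# Hypothetical posterior probabilities: two GS pass the 0.8 threshold.
posteriors = [(0.9, 0.1, 0.0), (0.05, 0.9, 0.05), (0.0, 0.15, 0.85)]
fdr = bayes_fdr(posteriors)
```

Raising the threshold shrinks S to GS with smaller modal probabilities, so the estimated FDR generally drops; this is the trade-off mentioned in the note above.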

16
3 -- Performance
17
Simulation set-up

200 simulated GS with
$Z \sim N(0, 0.3^2)$, modal
$Z \sim N(\log 2, 0.3^2)$, deletion, a block of 30 GS
$Z \sim N(-\log 2, 0.3^2)$, gains, blocks of 20 and 10
GS
Reference array with $Z \sim N(0, 0.3^2)$
50 replications
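
One replicate of such a data set could be generated along these lines. The block means and sizes are taken as listed on the slide; the block start positions, the seed, and the function name `simulate_profile` are assumptions, since the slide does not say where the blocks sit.

```python
import math
import random

def simulate_profile(blocks, n_gs=200, sd=0.3, seed=0):
    """Simulate log ratios for n_gs genomic sequences.

    blocks: list of (start, length, mean) triples for altered regions;
            every other GS has mean 0 (modal state).
    Returns the simulated values and the true means.
    """
    rng = random.Random(seed)
    means = [0.0] * n_gs
    for start, length, mean in blocks:
        for g in range(start, start + length):
            means[g] = mean
    z = [rng.gauss(m, sd) for m in means]
    return z, means

# Block means and sizes as listed on the slide; start positions are hypothetical.
blocks = [(20, 30, math.log(2)), (90, 20, -math.log(2)), (150, 10, -math.log(2))]
z, true_means = simulate_profile(blocks)

# Reference (normal/normal) array: no altered blocks.
z_ref, _ = simulate_profile([], seed=1)
```

Repeating the call with 50 different seeds would give the 50 replications.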

18
CGH-Miner

Data mining approach to select gains and losses
(Wang et al., 2005)
Hierarchical clustering with a spatial constraint
(i.e. only spatially adjacent clusters are joined)
Subtree selection according to predefined rules
→ focus on selecting large consistent gain/loss
regions and small (big spike) regions
Implemented in the CGH-Miner Excel plug-in
Estimation of FDR using a reference
(normal/normal) array and the same set of rules
to prune the tree. Declared target: 1%
Simulation set-up is similar to Wang et al.

19
Classification obtained by CGH-Miner and CGHmix
(figure; region labels: Gain, Modal, Deletion; block sizes 30, 10 and 20)
20
Posterior probabilities of allocation to the 3
components
21
Comparative performance between CGHmix and
CGH-Miner (50 simulations)

                                   CGHmix     CGH-Miner
Realised false positive (mean)     1.9        16.4
Realised false positive (range)    0 -- 20    3 -- 39
Realised false negative (mean)     1.0        9.6
Realised false negative (range)    0 -- 4     0 -- 50
Realised FDR (%)                   2.8        23.7
Estimated FDR (%)                  1.3        1.2
22
4 -- Analyses of CGH-array cancer data sets
23
Breast cancer cell line MCF7

Data from Pollack et al., 6691 GS on 23
chromosomes
$\mu_1 = -0.35$, $\sigma_1 = 0.37$
($\mu_2 = 0$), $\sigma_2 = 0.27$
$\mu_3 = 0.44$, $\sigma_3 = 0.54$
Estimated FDR, CGHmix: 2.6%
Estimated FDR, CGH-Miner: 1.5%

25
Classification of GS obtained by CGHmix
26
Known alterations found by both methods
Additional known alterations found by CGHmix
27
Neuroblastoma KCNR cell line: Curie Institute CGH
custom array for chromosome 1

190 genomic clones, mostly on the short arm
3 replicate spots for each
$\mu_1 = -0.49$, loss component
$\mu_3 = 0.04$, not plausible → no gain in this case
Estimate FDR by regrouping the $c = 2$ and $c = 3$ classes
Substantial number of deletions on the short arm
No deletion found for the long arm by CGHmix, a
result confirmed by classical cytogenetic
information

28
Long arm
29
Extensions

Account for variability in the case of repeated
measurements
→ add a measurement model with GS-specific
noise, with exchangeable prior
Refine the spatial model
Incorporate genomic sequence location in the
neighbourhood definition of the CAR model:
0-1 contiguity → spatial weights
In particular, account for overlapping sequences
by using weights that depend on the overlap