Title: Modelling of CGH arrays experiments
1Modelling of CGH arrays experiments
- Philippe Broët
- Faculté de Médecine,
- Université de Paris-XI
- Sylvia Richardson
- Imperial College
- London
CGH: Comparative Genomic Hybridization
2Outline
- Background
- Mixture model with spatial allocations
- Performance, comparison with CGH-Miner
- Analyses of CGH-array cancer data sets
- Extensions
3Aim: study genomic alterations in oncology
Loss
Gain
Tumor suppressor gene
Oncogene
The development of solid tumors is associated
with the acquisition of complex genetic
alterations that modify normal cell growth and
survival. Many of these changes involve gains
and/or losses of parts of the genome.
Amplification of an oncogene and deletion of a
tumor suppressor gene are considered important
mechanisms of tumorigenesis.
4- CGH: Comparative Genomic Hybridization
- Array containing short sequences of DNA bound to a glass slide
- Fluorescein-labeled normal and pathologic samples co-hybridised to the array
- 1. Extraction
- - DNA
- 2. Labelling (fluo)
- 3. Co-hybridization
- 4. Scanning
5- Once hybridization has been performed, the signal intensities of the fluorophores are quantified
- Provides a means to quantitatively measure DNA copy-number alterations and to map them directly onto genomic sequence
6MCF7 cell line investigated in Pollack et al. (2002)
- 23 chromosomes and 6691 cDNA sequences
- Data log-transformed: difference between MCF7 and reference
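The log transformation can be sketched as follows. This is a minimal illustration, not the pipeline used for the MCF7 data: the function names, the test/reference orientation of the ratio and the base-2 default are all assumptions, since the slides do not state them.

```python
import math

def log_ratio(test_intensity, reference_intensity, base=2.0):
    """Log ratio of two positive fluorescence intensities.

    The orientation (test over reference) and the default base are
    assumptions made for illustration only.
    """
    if test_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log(test_intensity / reference_intensity, base)

def log_ratios(test, reference, base=2.0):
    """Apply log_ratio spot by spot to two equally long intensity lists."""
    return [log_ratio(t, r, base) for t, r in zip(test, reference)]
```

On the log scale the ratio becomes a difference between the two channels, which is the quantity modelled in the rest of the talk.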
7Types of alterations observed
- (Single) gain or deletion of sequences, occurring for contiguous regions
- Low-level changes in the ratio: log 2
- but attenuation (dye bias) → ratio of about 0.4
- Multiple gains (small regions)
- High-level change, easy to pick up
- Focus the modelling on the first, common type of alterations
8Chromosome 1
[Figure: chromosome 1 profile, with regions annotated "Multiple gains?", "Deletion?" and "Normal?"]
92 -- Mixture model
10Specificity of CGH array experiment
- A priori biological knowledge from conventional CGH
- Limited number of states for a genomic sequence (GS):
- - presence (modal), - deletion, - gain(s)
- corresponding to different intensity ratios on the array
- Mixture model to capture the underlying discrete states
- GS located contiguously on chromosomes are likely to carry alterations of the same type
- Use clone spatial location in the allocation model
3-component mixture model with spatial allocation
11Mixture model
For chromosome k:
$Z_{gk}$: log ratio of measurement of normal versus tumoral change, for genomic sequence (GS) g on chromosome k
Dye bias is estimated by using a reference array (normal/normal); the bias is then subtracted from $Z_{gk}$
$$Z_{gk} \sim w_{1gk}\,N(\mu_1, \sigma_1^2) + w_{2gk}\,N(\mu_2, \sigma_2^2) + w_{3gk}\,N(\mu_3, \sigma_3^2)$$
Components: 1 = deletion, 2 = presence, 3 = gain
For unique labelling: $\mu_1 < 0$, $\mu_3 > 0$, $\mu_2 = 0$
(dye bias has been adjusted)
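To make the three-component mixture concrete, here is a minimal sketch that evaluates the mixture density for one log ratio and the probability of each component given fixed weights and parameters. The function names are hypothetical, and in the model itself these quantities are estimated by MCMC rather than plugged in.

```python
import math

def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2) at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(z, weights, means, sds):
    """Weighted sum of the three normal densities at z."""
    return sum(w * normal_pdf(z, m, s) for w, m, s in zip(weights, means, sds))

def component_probabilities(z, weights, means, sds):
    """Probability of each component (deletion, presence, gain) given z."""
    terms = [w * normal_pdf(z, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(terms)
    return [t / total for t in terms]

# Illustrative values only: equal weights are an assumption.
probs = component_probabilities(
    -1.0,
    weights=(1 / 3, 1 / 3, 1 / 3),
    means=(-0.35, 0.0, 0.44),
    sds=(0.37, 0.27, 0.54),
)
```

With these values a log ratio of -1.0 is attributed mainly to the first (deletion) component, as the labelling constraint on the means would suggest.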
12Mixture model with spatial allocation
- $Z_{gk} \sim w_{1gk}\,N(\mu_1, \sigma_1^2) + w_{2gk}\,N(\mu_2, \sigma_2^2) + w_{3gk}\,N(\mu_3, \sigma_3^2)$
- Spatial structure on the weights (cf. Fernandez and Green, 2002)
- Introduce 3 centred Markov random fields $u_{mgk}$, $m = 1, 2, 3$, with nearest neighbours along the chromosomes
- Define mixture proportions to depend on the chromosomic location via a logistic model:
- $w_{cgk} = \exp(u_{cgk}) / \sum_m \exp(u_{mgk})$
- favours allocation of nearby GS to the same component
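The logistic link between the random fields and the mixture weights can be sketched as below. The function name is hypothetical; subtracting the maximum before exponentiating is a standard numerical safeguard and does not change the weights.

```python
import math

def mixture_weights(u):
    """Map the three field values (u_1gk, u_2gk, u_3gk) for one GS
    to mixture weights that are positive and sum to one."""
    shift = max(u)  # stabilises exp() without altering the ratios
    expu = [math.exp(x - shift) for x in u]
    total = sum(expu)
    return [e / total for e in expu]
```

Because neighbouring GS have similar field values under the spatial prior, they receive similar weights, which is what favours allocating them to the same component.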
13Prior structure
- $w_{cgk} = \exp(u_{cgk}) / \sum_m \exp(u_{mgk})$
- with Gaussian Conditional AutoRegressive (CAR) model:
- $u_{cgk} \mid u_{c,-gk} \sim N\left(\sum_h u_{chk} / n_g,\; s_{ck}^2 / n_g\right)$
- for h neighbour of g ($n_g$ = number of neighbours h, one or two in this simple case), with constraint $\sum_g u_{cgk} = 0$
- Variance parameters $s_{ck}^2$ of the CAR act as a smoothing prior → indexed by the chromosome:
- switching structure between the states can be different between chromosomes
- Means and variances $(\mu_c, \sigma_c^2)$ of the mixture components are common to all chromosomes → borrowing information
- Inverse gamma priors for the variances, uniform priors for the means
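The CAR conditional distribution for one field along a chromosome can be sketched as follows. This is an illustration of the formula only: the function name is hypothetical, GS are assumed to be indexed in chromosomal order, and the sum-to-zero constraint is not enforced here.

```python
def car_conditional(u, g, variance):
    """Conditional mean and variance of u[g] given its nearest neighbours.

    u        : field values for one component along one chromosome
    g        : index of the GS of interest
    variance : CAR variance parameter for this component and chromosome
    """
    neighbours = [h for h in (g - 1, g + 1) if 0 <= h < len(u)]
    n_g = len(neighbours)  # one at the chromosome ends, two elsewhere
    if n_g == 0:
        raise ValueError("a single GS has no neighbours")
    mean = sum(u[h] for h in neighbours) / n_g
    return mean, variance / n_g
```

A small variance parameter pulls each value towards the average of its neighbours, giving long runs in the same state; a large one allows more frequent switching.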
14Posterior quantities of interest
- Bayesian inference via MCMC, implemented using WinBUGS
- In particular, latent allocations, $L_{gk}$, of GS g on chromosome k to state c, are sampled during the MCMC run
- Compute posterior allocation probabilities:
- $p_{cgk} = P(L_{gk} = c \mid \text{data})$, $c = 1, 2, 3$
- Probabilistic classification of each GS using a threshold on $p_{cgk}$:
- -- Assign g to a modified state, deletion (c = 1) or gain (c = 3), if the corresponding $p_{cgk} > 0.8$
- -- Otherwise allocate to the modal state
- Subset S of genomic sequences classified as modified
- (this subset depends on the chosen threshold)
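The classification rule can be sketched from the sampled allocations of a single GS. The function names and the input format (a plain list of sampled states 1, 2 or 3) are assumptions; the 0.8 threshold is the one quoted above.

```python
from collections import Counter

def allocation_probabilities(samples):
    """Estimate posterior allocation probabilities for one GS as the
    fraction of MCMC draws spent in each state (1, 2 or 3)."""
    counts = Counter(samples)
    n = len(samples)
    return {c: counts.get(c, 0) / n for c in (1, 2, 3)}

def classify(probs, threshold=0.8):
    """Return 1 (deletion) or 3 (gain) if that state's probability
    exceeds the threshold, otherwise 2 (modal)."""
    if probs[1] > threshold:
        return 1
    if probs[3] > threshold:
        return 3
    return 2
```

For instance, a GS allocated to state 1 in 9 draws out of 10 is classified as a deletion, while one allocated to state 3 in 7 draws out of 10 stays in the modal class.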
15False Discovery Rate
- Using the posterior allocation probabilities, can compute an estimate of FDR for the list S:
- $\text{Bayes FDR}(S) \mid \text{data} = \frac{1}{\mathrm{card}(S)} \sum_{g \in S} p_{2gk}$
- where $p_{2gk}$ is the posterior probability of allocation to the modal (c = 2) state
- Note: can adjust the threshold to get a desired FDR and vice versa
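The Bayes FDR estimate is simply the average posterior probability of the modal state over the selected list. A minimal sketch, with a hypothetical function name and a dictionary keyed by GS identifier as an assumed input format:

```python
def bayes_fdr(p_modal, selected):
    """Average posterior probability of the modal state over the list S.

    p_modal  : dict mapping GS identifier -> posterior probability of state 2
    selected : identifiers of the GS classified as modified (the list S)
    """
    selected = list(selected)
    if not selected:
        return 0.0  # convention for an empty list, an assumption
    return sum(p_modal[g] for g in selected) / len(selected)
```

Raising the classification threshold shrinks S to GS with smaller modal probabilities, so the estimated FDR goes down; this is how the threshold and the FDR can be traded against each other.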
163 -- Performance
17Simulation set-up
- 200 fake GS with
- $Z \sim N(0, 0.3^2)$, modal
- $Z \sim N(\log 2, 0.3^2)$, deletion, a block of 30 GS
- $Z \sim N(-\log 2, 0.3^2)$, gains, blocks of 20 and 10 GS
- Reference array with $Z \sim N(0, 0.3^2)$
- 50 replications
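A simulation of this kind could be generated along the following lines. This is a sketch under stated assumptions: the block start positions are hypothetical (only the block lengths are given), the logarithm is taken as natural, and the means are used with the signs exactly as listed in the set-up.

```python
import math
import random

def simulate_profile(rng, n_gs=200, sd=0.3,
                     blocks=((20, 30, "deletion"), (90, 20, "gain"), (150, 10, "gain"))):
    """Return (states, z, reference) for one simulated chromosome.

    blocks holds (start, length, state) triples; the start positions are
    hypothetical, only the block lengths come from the set-up.
    """
    # Means as listed in the set-up; natural logarithm is an assumption.
    means = {"modal": 0.0, "deletion": math.log(2), "gain": -math.log(2)}
    states = ["modal"] * n_gs
    for start, length, state in blocks:
        for g in range(start, start + length):
            states[g] = state
    z = [rng.gauss(means[s], sd) for s in states]
    reference = [rng.gauss(0.0, sd) for _ in range(n_gs)]  # normal/normal array
    return states, z, reference

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replications = [simulate_profile(rng) for _ in range(50)]
```

Each replication keeps the true states alongside the simulated log ratios, so that false positives and false negatives can be counted once a method has classified the GS.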
18CGH-Miner
- Data mining approach to select gains and losses (Wang et al., 2005)
- Hierarchical clustering with a spatial constraint
- (i.e. only spatially adjacent clusters are joined)
- Subtree selection according to predefined rules
- → focus on selecting large consistent gain/loss regions and small (big spike) regions
- Implemented in the CGH-Miner Excel plug-in
- Estimation of FDR using a reference (normal/normal) array and the same set of rules to prune the tree. Declared target: 1%
- Simulation set-up is similar to Wang et al.
19Classification obtained by CGH-Miner and CGHmix
[Figure: classification of the simulated GS by the two methods; region labels: Gain, Modal, Deletion; block sizes: 30, 10, 20]
20Posterior probabilities of allocation to the 3
components
21Comparative performance between CGHmix and CGH-Miner
50 simulations | CGHmix | CGH-Miner
Realised false positive (mean) | 1.9 | 16.4
Realised false positive (range) | 0 -- 20 | 3 -- 39
Realised false negative (mean) | 1.0 | 9.6
Realised false negative (range) | 0 -- 4 | 0 -- 50
Realised FDR (%) | 2.8 | 23.7
Estimated FDR (%) | 1.3 | 1.2
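Realised error counts of this kind can be computed once the true states are known, as in a simulation. The sketch below assumes the usual definitions (a false positive is a modal GS called modified, a false negative is a modified GS called modal); the slides do not spell these out, and the function name is hypothetical.

```python
def realised_errors(true_modified, called_modified):
    """Count false positives and false negatives and compute the realised FDR.

    Both arguments are equally long lists of booleans, one entry per GS.
    """
    pairs = list(zip(true_modified, called_modified))
    false_pos = sum(1 for truth, called in pairs if called and not truth)
    false_neg = sum(1 for truth, called in pairs if truth and not called)
    n_called = sum(1 for _, called in pairs if called)
    fdr = false_pos / n_called if n_called else 0.0
    return false_pos, false_neg, fdr
```

Averaging these quantities over the replications gives realised figures that can be set against each method's own FDR estimate.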
224 -- Analyses of CGH-array cancer data sets
23Breast cancer cell line MCF7
- Data from Pollack et al., 6691 GS on 23 chromosomes
- µ1 = -0.35, σ1 = 0.37
- (µ2 = 0), σ2 = 0.27
- µ3 = 0.44, σ3 = 0.54
- Estimated FDR, CGHmix: 2.6%
- Estimated FDR, CGH-Miner: 1.5%
25Classification of GS obtained by CGHmix
26Known alterations found by both methods
Additional known alterations found by CGHmix
27Neuroblastoma KCNR cell line: Curie Institute CGH custom array for chromosome 1
- 190 genomic clones, mostly on the short arm
- 3 replicate spots for each
- µ1 = -0.49, loss component
- µ3 = 0.04, not plausible → no gain in this case
- Estimate FDR by regrouping c2 and c3 classes
- Substantial number of deletions on short arm
- No deletion found for the long arm by CGHmix, a
result confirmed by classical cytogenetic
information
28Long arm
29Extensions
- Account for variability in the case of repeated measurements
- → add a measurement model with GS-specific noise, with exchangeable prior
- Refine the spatial model
- Incorporate genomic sequence location in the neighbourhood definition of the CAR model
- 0-1 contiguity → spatial weights
- In particular, account for overlapping sequences by using weights that depend on the overlap
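One way such overlap-dependent weights might enter the CAR conditional is sketched below. This is purely illustrative of the proposed extension, not a worked-out model: the function names are hypothetical, sequences are assumed to be given as (start, end) intervals, and the weighted form used is the standard weighted CAR, in which the neighbour count is replaced by the sum of the weights.

```python
def overlap_length(a, b):
    """Length of the overlap between two intervals given as (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def weighted_car_conditional(u, neighbour_weights, variance):
    """Conditional mean and variance of one field value under a weighted CAR.

    u                 : field values along the chromosome
    neighbour_weights : dict mapping neighbour index -> positive weight
    variance          : CAR variance parameter
    """
    total = sum(neighbour_weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mean = sum(w * u[h] for h, w in neighbour_weights.items()) / total
    return mean, variance / total
```

With all weights equal to one this reduces to the 0-1 contiguity case, where the conditional mean is the plain average of the neighbours and the variance is divided by their number.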