Modelling of CGH arrays experiments - PowerPoint PPT Presentation


Title: Modelling of CGH arrays experiments


1
Modelling of CGH arrays experiments
  • Philippe Broët
  • Faculté de Médecine,
  • Université de Paris-XI
  • Sylvia Richardson
  • Imperial College
  • London

CGH: Comparative Genomic Hybridization
2
Outline
  • Background
  • Mixture model with spatial allocations
  • Performance, comparison with CGH-Miner
  • Analyses of CGH-array cancer data sets
  • Extensions

3
Aim: study genomic alterations in oncology
Loss: tumor suppressor gene
Gain: oncogene
The development of solid tumors is associated
with the acquisition of complex genetic
alterations that modify normal cell growth and
survival. Many of these changes involve gains
and/or losses of parts of the genome.
Amplification of an oncogene and deletion of a
tumor suppressor gene are considered important
mechanisms of tumorigenesis.
4
  • CGH: Comparative Genomic Hybridization
  • Array containing short sequences of DNA bound to
    a glass slide
  • Fluorescein-labeled normal and pathologic
    samples co-hybridised to the array
  • 1. DNA extraction
  • 2. Labelling (fluorescence)
  • 3. Co-hybridization
  • 4. Scanning

5
  • Once hybridization has been performed, the
    signal intensities of the fluorophores are
    quantified

Provides a means to quantitatively measure DNA
copy-number alterations and to map them directly
onto genomic sequence
6
MCF7 cell line investigated in Pollack et al.
(2002): 23 chromosomes and 6691 cDNA sequences.
Data log-transformed: difference between MCF7 and
reference.
7
Types of alterations observed
  • (Single) gain or deletion of sequences, occurring
    over contiguous regions
  • Low-level changes in the log2 ratio
  • but attenuation (dye bias) → ratio 0.4
  • Multiple gains (small regions)
  • High-level change, easy to pick up
  • Focus the modelling on the first, common type
    of alterations

8
(Figure: Chromosome 1, with regions annotated "Multiple gains?", "Deletion?" and "Normal?")
9
2 -- Mixture model
10
Specificity of CGH array experiment
  • A priori biological knowledge from conventional
    CGH
  • Limited number of states for a genomic sequence
  • - presence (modal), - deletion, - gain(s)
  • corresponding to different intensity ratios on
    the array
  • Mixture model to capture the underlying
    discrete states
  • Genomic sequences (GS) located contiguously on
    chromosomes are likely to carry alterations of
    the same type
  • Use clone spatial location in the allocation
    model

3-component mixture model with spatial allocation
11
Mixture model
For chromosome k, $Z_{gk}$ is the log ratio of the
measurement of normal versus tumoral change for
genomic sequence (GS) g on chromosome k. Dye bias is
estimated by using a reference array
(normal/normal) and then subtracted from $Z_{gk}$.
$$Z_{gk} \sim w_{1gk}\,N(\mu_1, \sigma_1^2) + w_{2gk}\,N(\mu_2, \sigma_2^2) + w_{3gk}\,N(\mu_3, \sigma_3^2)$$
Components: 1 = deletion, 2 = presence, 3 = gain
For unique labelling: $\mu_1 < 0$, $\mu_3 > 0$, $\mu_2 = 0$
(dye bias has been adjusted)
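
The three-component density above can be evaluated directly. The following is a minimal sketch, not the authors' implementation; the numerical values for the means, standard deviations and weights are illustrative assumptions, not estimates from the talk.

```python
import math

def normal_pdf(z, mean, sd):
    """Density of N(mean, sd^2) evaluated at z."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def mixture_density(z, weights, means, sds):
    """Density of sum_c w_c * N(mu_c, sigma_c^2) at z.

    Components are ordered as on the slide: 1 = deletion, 2 = presence, 3 = gain.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * normal_pdf(z, m, s) for w, m, s in zip(weights, means, sds))

# Illustrative values only (assumed); mu_2 is fixed at 0 for unique labelling.
means = [-0.4, 0.0, 0.4]
sds = [0.3, 0.3, 0.3]
weights = [0.1, 0.8, 0.1]
density_at_zero = mixture_density(0.0, weights, means, sds)
```

In the model the weights depend on the genomic sequence and chromosome, so `weights` would differ from one GS to the next.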
12
Mixture model with spatial allocation
  • $Z_{gk} \sim w_{1gk}\,N(\mu_1, \sigma_1^2) + w_{2gk}\,N(\mu_2, \sigma_2^2) + w_{3gk}\,N(\mu_3, \sigma_3^2)$
  • Spatial structure on the weights (cf. Fernandez
    and Green, 2002)
  • Introduce 3 centred Markov random fields
    $\{u_{mgk}\}$, $m = 1, 2, 3$, with nearest neighbours
    along the chromosomes
  • Define mixture proportions to depend on the
    chromosomal location via a logistic model
  • $w_{cgk} = \exp(u_{cgk}) / \sum_m \exp(u_{mgk})$
  • favours allocation of nearby GS to the same component
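
The logistic link from the three fields to the mixture weights can be sketched as follows. This is only an illustration; the function name `mixture_weights` and the field values are hypothetical.

```python
import math

def mixture_weights(u):
    """Map field values (u_1, u_2, u_3) at one GS to weights
    w_c = exp(u_c) / sum_m exp(u_m)."""
    shift = max(u)  # subtracting the max avoids overflow and leaves the ratios unchanged
    expu = [math.exp(x - shift) for x in u]
    total = sum(expu)
    return [e / total for e in expu]

# Hypothetical field values at two neighbouring GS: similar fields give
# similar weights, which favours allocation to the same component.
w_a = mixture_weights([0.0, 2.0, -1.0])
w_b = mixture_weights([0.1, 1.9, -1.1])
```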

13
Prior structure
  • $w_{cgk} = \exp(u_{cgk}) / \sum_m \exp(u_{mgk})$
  • with Gaussian Conditional AutoRegressive (CAR) model
  • $u_{cgk} \mid u_{c,-gk} \sim N\left(\sum_h u_{chk} / n_g,\ s_{ck}^2 / n_g\right)$
  • for h neighbour of g ($n_g$ = number of neighbours h, one or two in
    this simple case), with constraint $\sum_g u_{cgk} = 0$
  • Variance parameters $s_{ck}^2$ of the CAR act as a
    smoothing prior → indexed by the chromosome
  • switching structure between the states can be
    different between chromosomes
  • Means and variances ($\mu_c$, $\sigma_c^2$) of the mixture
    components are common to all chromosomes →
    borrowing information
  • Inverse gamma priors for the variances, uniform
    priors for the means
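
The conditional distribution in the CAR prior can be made concrete for a single chromosome, where each GS has one or two nearest neighbours. This is a sketch under that assumption; the function names are hypothetical.

```python
def car_conditional(u, g, s2):
    """Conditional mean and variance of u[g] given its nearest neighbours.

    u  : field values for one component along one chromosome
    g  : index of the genomic sequence
    s2 : CAR variance parameter for this component and chromosome
    """
    neighbours = [h for h in (g - 1, g + 1) if 0 <= h < len(u)]
    n_g = len(neighbours)  # 1 at the chromosome ends, 2 elsewhere
    if n_g == 0:
        raise ValueError("a chromosome needs at least two GS")
    mean = sum(u[h] for h in neighbours) / n_g
    return mean, s2 / n_g

def centre(u):
    """Impose the sum-to-zero constraint by subtracting the mean."""
    m = sum(u) / len(u)
    return [x - m for x in u]

mean_mid, var_mid = car_conditional([1.0, 0.0, 3.0], 1, 0.5)  # two neighbours
mean_end, var_end = car_conditional([1.0, 0.0, 3.0], 0, 0.5)  # one neighbour
```

A smaller `s2` ties each value more tightly to its neighbours, which is how the variance parameter acts as a smoothing prior.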

14
Posterior quantities of interest
  • Bayesian inference via MCMC, implemented using
    WinBUGS
  • In particular, latent allocations, $L_{gk}$, of GS g
    on chromosome k to state c, are sampled during
    the MCMC run
  • Compute posterior allocation probabilities
  • $p_{cgk} = P(L_{gk} = c \mid \text{data})$, $c = 1, 2, 3$
  • Probabilistic classification of each GS using a
    threshold on $p_{cgk}$
  • -- Assign g to a modified state, deletion (c = 1)
    or gain (c = 3), if the corresponding $p_{cgk} > 0.8$
  • -- Otherwise allocate to the modal state
  • Subset S of genomic sequences classified as
    modified
  • (this subset depends on the chosen threshold)
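
The allocation and thresholding rule can be sketched as below. It assumes the sampled labels for one GS are available as a list of integers in {1, 2, 3}; that format and the function names are hypothetical and are not the WinBUGS output format.

```python
def allocation_probabilities(draws):
    """Estimate (p_1, p_2, p_3) for one GS as the fraction of MCMC draws
    in which its latent label equals 1, 2 or 3."""
    n = len(draws)
    return [sum(1 for d in draws if d == c) / n for c in (1, 2, 3)]

def classify(p, threshold=0.8):
    """Assign deletion or gain if the matching probability exceeds the
    threshold; otherwise allocate to the modal state."""
    p_deletion, _, p_gain = p
    if max(p_deletion, p_gain) <= threshold:
        return "modal"
    # With a threshold below 0.5 both could exceed it; take the larger one.
    return "deletion" if p_deletion >= p_gain else "gain"

p = allocation_probabilities([1, 1, 1, 1, 1, 1, 1, 1, 1, 2])  # 9 of 10 draws in state 1
label = classify(p)
```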

15
False Discovery Rate
  • Using the posterior allocation probabilities, one can
    compute an estimate of FDR for the list S
  • $\text{Bayes FDR}(S) \mid \text{data} = \frac{1}{\text{card}(S)} \sum_{g \in S} p_{2gk}$
  • where $p_{2gk}$ is the posterior probability of allocation
    to the modal (c = 2) state
  • Note: the threshold can be adjusted to get a desired
    FDR, and vice versa
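
The FDR estimate is the average posterior probability of the modal state over the selected list. A minimal sketch, with made-up probabilities and hypothetical function names:

```python
def select_modified(probs, threshold=0.8):
    """Indices of GS whose deletion or gain probability exceeds the threshold.

    probs is a list of (p_1, p_2, p_3) tuples, one per GS.
    """
    return [i for i, (p1, _, p3) in enumerate(probs) if max(p1, p3) > threshold]

def bayes_fdr(probs, selected):
    """Mean posterior probability of the modal state (p_2) over the list S."""
    if not selected:
        return 0.0
    return sum(probs[i][1] for i in selected) / len(selected)

# Made-up posterior probabilities for four GS.
probs = [(0.9, 0.1, 0.0), (0.05, 0.9, 0.05), (0.0, 0.3, 0.7), (0.0, 0.05, 0.95)]
strict = select_modified(probs, threshold=0.8)
loose = select_modified(probs, threshold=0.6)
fdr_strict = bayes_fdr(probs, strict)
fdr_loose = bayes_fdr(probs, loose)
```

Lowering the threshold lengthens the list S and raises the estimated FDR, which is the trade-off noted in the last bullet.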

16
3 -- Performance
17
Simulation set-up
  • 200 simulated GS with
  • $Z \sim N(0, 0.3^2)$, modal
  • $Z \sim N(\log 2, 0.3^2)$, deletion, a block of 30 GS
  • $Z \sim N(-\log 2, 0.3^2)$, gains, blocks of 20 and 10
    GS
  • Reference array with $Z \sim N(0, 0.3^2)$
  • 50 replications
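
One replicate of this set-up could be generated as follows. This is a sketch under several assumptions: the block positions are hypothetical (only the block lengths are given), "log 2" is read as the natural logarithm, the second parameter is read as a standard deviation of 0.3, and the signs of the deletion and gain means follow the slide as written and can be swapped through the arguments.

```python
import math
import random

def simulate_profile(rng, n=200, sd=0.3,
                     deletion_mean=math.log(2), gain_mean=-math.log(2)):
    """Simulate one tumour profile and one normal/normal reference array."""
    states = ["modal"] * n
    # Block positions are hypothetical; only the lengths (30, 20, 10) are from the slide.
    states[30:60] = ["deletion"] * 30
    states[100:120] = ["gain"] * 20
    states[160:170] = ["gain"] * 10
    centre = {"modal": 0.0, "deletion": deletion_mean, "gain": gain_mean}
    z = [rng.gauss(centre[s], sd) for s in states]
    z_ref = [rng.gauss(0.0, sd) for _ in range(n)]  # reference array, all modal
    return states, z, z_ref

rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [simulate_profile(rng) for _ in range(50)]
```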

18
CGH-Miner
  • Data mining approach to select gains and losses
    (Wang et al., 2005)
  • Hierarchical clustering with a spatial constraint
  • (i.e. only spatially adjacent clusters are joined)
  • Subtree selection according to predefined rules
  • → focus on selecting large consistent gain/loss
    regions and small (big spike) regions
  • Implemented in the CGH-Miner Excel plug-in
  • Estimation of FDR using a reference
    (normal/normal) array and the same set of rules
    to prune the tree. Declared target: 1%
  • Simulation set-up is similar to that of Wang et al.

19
Classification obtained by CGH-Miner and CGHmix
(Figure labels: Gain, Modal, Deletion; block sizes 30, 10 and 20)
20
Posterior probabilities of allocation to the 3
components
21
Comparative performance between CGHmix and
CGH-Miner
50 simulations                      CGHmix    CGH-Miner
Realised false positives (mean)     1.9       16.4
Realised false positives (range)    0-20      3-39
Realised false negatives (mean)     1.0       9.6
Realised false negatives (range)    0-4       0-50
Realised FDR (%)                    2.8       23.7
Estimated FDR (%)                   1.3       1.2
22
4 -- Analyses of CGH-array cancer data sets
23
Breast cancer cell line MCF7
  • Data from Pollack et al., 6691 GS on 23
    chromosomes
  • $\mu_1 = -0.35$, $\sigma_1 = 0.37$
  • ($\mu_2 = 0$), $\sigma_2 = 0.27$
  • $\mu_3 = 0.44$, $\sigma_3 = 0.54$
  • Estimated FDR, CGHmix: 2.6%
  • Estimated FDR, CGH-Miner: 1.5%

24
(No Transcript)
25
Classification of GS obtained by CGHmix
26
Known alterations found by both methods
Additional known alterations found by CGHmix
27
Neuroblastoma KCNR cell line: Curie Institute CGH
custom array for chromosome 1
  • 190 genomic clones, mostly on the short arm
  • 3 replicate spots for each
  • $\mu_1 = -0.49$, loss component
  • $\mu_3 = 0.04$, not plausible → no gain in this case
  • Estimate FDR by regrouping the c = 2 and c = 3 classes
  • Substantial number of deletions on short arm
  • No deletion found for the long arm by CGHmix, a
    result confirmed by classical cytogenetic
    information

28
Long arm
29
Extensions
  • Account for variability in the case of repeated
    measurements
  • → add a measurement model with GS-specific
    noise, with exchangeable prior
  • Refine the spatial model
  • Incorporate genomic sequence location in the
    neighbourhood definition of the CAR model
  • 0-1 contiguity → spatial weights
  • In particular, account for overlapping sequences
    by using weights that depend on the overlap
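
One way to read the move from 0-1 contiguity to spatial weights is the usual weighted form of the CAR conditional, where the mean becomes a weighted average of the neighbours and the variance is divided by the total weight. The sketch below assumes that form; the `weight` function (for example, a measure of sequence overlap) is hypothetical and is not specified in the talk.

```python
def weighted_car_conditional(u, g, s2, weight):
    """Conditional mean and variance of u[g] with neighbour weights.

    weight(g, h) returns a non-negative weight for neighbour h of g;
    a constant weight of 1 recovers the 0-1 contiguity case.
    """
    neighbours = [h for h in (g - 1, g + 1) if 0 <= h < len(u)]
    w = [weight(g, h) for h in neighbours]
    total = sum(w)
    if total <= 0:
        raise ValueError("at least one neighbour must have positive weight")
    mean = sum(wi * u[h] for wi, h in zip(w, neighbours)) / total
    return mean, s2 / total

u = [1.0, 0.0, 3.0]
contiguity = weighted_car_conditional(u, 1, 0.5, lambda g, h: 1.0)
# Hypothetical overlap weights: the left neighbour overlaps more than the right one.
overlap = weighted_car_conditional(u, 1, 0.5, lambda g, h: 3.0 if h < g else 1.0)
```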