Bayesian process-based modeling of gene expression data: estimating absolute mRNA concentrations

1
Bayesian process-based modeling of gene expression data: estimating absolute mRNA concentrations
  • Arnoldo Frigessi, University of Oslo
  • Mark van de Wiel, Technische Universiteit Eindhoven
  • Marit Holden, Norwegian Computing Center
  • Ingrid K. Glad, University of Oslo
  • Heidi Lyng, The Norwegian Radium Hospital

Bricks dag (Bricks day), 29-11-2005
2
[Figure: from DNA to phenotype. DNA → (transcription) → mRNA → (translation) → amino acids → protein → cell phenotype → organism phenotype. The cDNA microarray measures the mRNA step.]
  • Microarrays measure gene expression at the
    transcription level
  • Microarray technology: a cDNA microarray with 40,000 spots is hybridized with a sample labelled with a green fluorescent dye (Cy3) and a reference sample labelled in red (Cy5).

3
[Figure adapted from Christina Kendziorski, http://www.biostat.wisc.edu/kendzior]
Data: log2(r_j / g_j), the log-ratio of the red and green intensities of spot j
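The data summary above is the per-spot log-ratio. A minimal sketch in Python, assuming background-corrected, strictly positive red (Cy5) and green (Cy3) intensities; the function name `log_ratios` and the intensity values are hypothetical:

```python
import math


def log_ratios(red, green):
    """Return log2(r_j / g_j) for each spot j.

    Assumes red (Cy5) and green (Cy3) intensities are already
    background-corrected and strictly positive.
    """
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    return [math.log2(r / g) for r, g in zip(red, green)]


# Hypothetical intensities for three spots.
red = [1200.0, 300.0, 800.0]
green = [600.0, 300.0, 1600.0]
print(log_ratios(red, green))  # [1.0, 0.0, -1.0]
```

A log-ratio of 1 means twice as much signal in the red channel; 0 means equal signal in both channels.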

4
  • Efficient production of spotted glass-slide arrays has made microarray technology a widespread technique.
  • Spotted microarrays provide valuable information on relative transcript levels in tissues, but differences in experimental protocols make direct comparison of results between microarray studies very difficult.

5
  • Can we get information about absolute transcript levels from standard spotted microarray data?
  • Extracting absolute transcript levels is complicated by experimental variation and noise originating in the production and hybridisation processes.
  • Main difficulty: probes (to which the transcripts bind) have different properties.

6
  • Is information about absolute transcript levels useful?
  • Absolute mRNA concentrations are universal: they can be combined in further analyses with similar estimates obtained with different techniques in other labs.
  • A first step towards building an annotated database of transcript levels in cells.
  • It becomes possible to detect significant concentration differences between two different genes within the same tumor, a comparison that standard intensity ratios do not allow.


7

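[Figure: samples 1, 2, 3, ..., t; for each gene g, TransCount gives estimates $K_{1g}, K_{2g}, K_{3g}, \dots, K_{tg}$.]
TransCount: counting the number of transcripts of each gene in each sample ($K_{tg}$ is the number of transcripts of gene g in sample t).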
8
Propagating uncertainty
  • Current practice:
    • divide the experiment into separate steps:
      • microarray production
      • transcription, labelling, hybridisation
      • image analysis
      • normalisation
      • imputation
      • estimation of intensities
      • testing / clustering
    • do inference inside each task and plug the results into the next step.


Instead, we do a coherent statistical analysis and propagate uncertainties.
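Why plug-in understates uncertainty can be sketched with a toy two-step pipeline: a scale factor s is estimated from a few calibration measurements, then a quantity K = y / s is computed from one observation y. This is a generic Monte Carlo illustration under hypothetical numbers, not the authors' Bayesian model; all names and values below are assumptions.

```python
import random
import statistics

rng = random.Random(1)

# Step 1 (hypothetical calibration): five noisy measurements of a scale factor s.
calibration = [rng.gauss(2.0, 0.3) for _ in range(5)]
s_hat = statistics.mean(calibration)
s_se = statistics.stdev(calibration) / len(calibration) ** 0.5

# Step 2: one observed intensity y with known measurement SD; target K = y / s.
y, y_sd = 100.0, 5.0

# Plug-in: treat s_hat as exact, so only the noise in y enters.
plug_in_sd = y_sd / s_hat

# Propagation: sample s and y together and look at the spread of K.
k_draws = [rng.gauss(y, y_sd) / rng.gauss(s_hat, s_se) for _ in range(20_000)]
propagated_sd = statistics.stdev(k_draws)

print(f"plug-in SD of K:    {plug_in_sd:.2f}")
print(f"propagated SD of K: {propagated_sd:.2f}")
```

The propagated spread is larger because it also carries the uncertainty of the calibration step, which the plug-in approach silently drops.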
9
  • We use available covariates describing the various steps of the experiment, from target preparation to laser scanning of the images.
  • We try to keep the model as close as possible to the biology, physics and biochemistry of the experiment.
  • MCMC converges (slowly, as usual in complex models).


10
  • Some genes must be spotted at least twice on some arrays.
  • The number of such genes depends not on the total number of genes in the study but on the design; currently 50 genes are spotted in duplicate.
  • Our method succeeds in obtaining absolute concentrations because it makes explicit use of probe- and spot-related covariates, such as probe length and quantity, to describe probe-dependent hybridisation efficiency. Thanks to duplicate spotting, many transcripts have more than one probe, so the effect of probe-dependent covariates can be estimated.
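Why duplicates make probe effects estimable can be sketched with a deliberately simplified, hypothetical log-linear model, log intensity = log K_g + beta * x + noise, where x is a probe covariate. This is not TransCount's binomial model: with one spot per gene, log K_g and beta * x are confounded, but the difference between two spots of the same gene cancels log K_g and leaves only the covariate effect. The names `estimate_probe_effect`, `true_beta` and the simulated data are assumptions.

```python
import random


def estimate_probe_effect(duplicate_pairs):
    """Least-squares slope through the origin on within-gene differences.

    Each element is ((y1, x1), (y2, x2)) for one gene spotted twice, where
    y is a log intensity and x a probe covariate. Subtracting the two spots
    of the same gene cancels the unknown gene term log K_g.
    """
    num = den = 0.0
    for (y1, x1), (y2, x2) in duplicate_pairs:
        dy, dx = y1 - y2, x1 - x2
        num += dx * dy
        den += dx * dx
    return num / den


rng = random.Random(0)
true_beta = 0.8                       # hypothetical covariate effect
pairs = []
for _ in range(50):                   # 50 duplicated genes, as on the slide
    log_k = rng.uniform(2.0, 10.0)    # unknown gene-level term, differs per gene
    x1, x2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    y1 = log_k + true_beta * x1 + rng.gauss(0.0, 0.05)
    y2 = log_k + true_beta * x2 + rng.gauss(0.0, 0.05)
    pairs.append(((y1, x1), (y2, x2)))

print(f"estimated probe effect: {estimate_probe_effect(pairs):.3f}")
```

The estimate lands close to 0.8 even though the gene-level terms vary widely, because they never enter the differences.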


11
  • We follow the mRNA molecules through the whole experiment.
  • At each step, some molecules survive, according to a Binomial process with a success probability that depends on appropriate covariates.
  • At the end, some molecules are scanned and produce our data, i.e. the raw measured intensities.
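The survival process described above can be sketched as repeated Binomial thinning. In this minimal sketch the per-step survival probabilities are fixed, hypothetical numbers; in the model they depend on covariates, and the link used there is not reproduced here. The function names are hypothetical.

```python
import random


def binomial_draw(n, p, rng):
    """Binomial(n, p) draw: count independent successes (simple, not fast)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def follow_molecules(k, survival_probs, rng):
    """Track how many of k molecules survive each successive step."""
    counts = [k]
    for p in survival_probs:
        counts.append(binomial_draw(counts[-1], p, rng))
    return counts


rng = random.Random(42)
steps = [0.6, 0.3, 0.5]   # hypothetical survival probabilities, one per step
counts = follow_molecules(10_000, steps, rng)
print(counts)
```

Independent Binomial thinnings compose: the final count is Binomial(k, p1 * p2 * p3), so here its expected value is 10,000 * 0.09 = 900.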

12
  • Two off-line experiments are needed to determine two technology-dependent covariates:
    • Hybridisation factor c: used to scale the estimated values to the true number of transcripts. Estimated using two control samples (spikes) with known concentrations.
    • Amplification factor f: a measure of the increase in intensity per unit increase in PMT voltage during laser scanning. Estimated once for each dye and scanner.
  • Under ordinary, stable experimental settings it is sufficient to estimate these factors once.
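One way to picture the spike calibration for c is a least-squares fit through the origin, assuming (hypothetically) that the model-scale estimates equal c times the known number of transcripts, so that dividing by c rescales new estimates to transcript numbers. This is a sketch of the idea only; how TransCount actually estimates c is not shown on the slide, and the spike values and function names are hypothetical.

```python
def estimate_hyb_factor(estimated, known):
    """Least-squares slope through the origin for estimated ~ c * known."""
    if len(estimated) != len(known):
        raise ValueError("inputs must have the same length")
    num = sum(e * k for e, k in zip(estimated, known))
    den = sum(k * k for k in known)
    return num / den


def rescale(estimates, c):
    """Convert model-scale estimates to transcript numbers by dividing by c."""
    return [e / c for e in estimates]


# Hypothetical spike data: known transcript numbers and model-scale estimates.
known = [1e5, 5e5, 1e6, 5e6]
estimated = [98.0, 510.0, 1005.0, 4990.0]
c_hat = estimate_hyb_factor(estimated, known)
print(f"c_hat = {c_hat:.5f}")
print(rescale([250.0], c_hat))
```

Fitting through the origin encodes the assumption that zero transcripts give a zero estimate; the large spikes dominate the fit, which is a deliberate simplification here.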

13
MODEL
  1. Scaling and selection of target molecules
  2. Inclusion of covariate information
  3. Scanning
  4. Imaging
14
  • Reparametrisation principle:
  • Approximate the Binomial with a Poisson,
    $H_{gt,a} \sim \mathrm{Poisson}(c\, n_{as}\, q_{ta}\, K_{tg}\, p_{st,a})$,
    and find the parameters that are identifiable.
  • Reparametrise $K_{tg}$ so that it includes all other remaining parameters.
  • Next, approximate the Binomial with a Normal.
  • Parameters that were not identifiable under the Poisson approximation do not occur in the mean, but only in the variance.
  • (a is the ratio of the two dye parameters.)
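The identifiability argument can be checked with first and second moments. After thinning K molecules with overall survival probability q, the Poisson approximation has mean and variance both equal to K * q, so only the product is identifiable; the Binomial (and its Normal approximation) has variance K * q * (1 - q), where q appears on its own. The two settings below are hypothetical; the function name is an assumption.

```python
import math


def thinned_moments(K, probs):
    """Mean and variances of the count left after Binomial thinning.

    K molecules pass through steps with survival probabilities probs;
    the overall survival probability is q = prod(probs).
    """
    q = math.prod(probs)
    mean = K * q
    poisson_var = mean              # Poisson: variance equals the mean
    binomial_var = mean * (1 - q)   # Binomial(K, q), also used by the Normal approximation
    return mean, poisson_var, binomial_var


# Two hypothetical settings with the same product K * q = 100.
setting_a = thinned_moments(1000, [0.2, 0.5])   # q = 0.1
setting_b = thinned_moments(200, [1.0, 0.5])    # q = 0.5
print(setting_a)
print(setting_b)
```

Both settings have mean and Poisson variance 100, so Poisson data cannot tell them apart; the Binomial variances (90 versus 50) differ, which is how parameters absent from the mean can still enter through the variance.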

15
  • Validate estimated concentrations in a dye-swap experiment with control samples at known concentrations.
  • 17 genes, each spotted 6 times on 2 arrays.
  • 2 control samples (spikes), each with 17 different mRNA sequences at specific concentrations.
  • Hybridisation factor: 0.001


Low concentrations are overestimated.
16
  • Validate estimated concentrations against results from quantitative real-time PCR.
  • 12 cervix tumor samples and a pool of ten cancer cell lines
  • 24 arrays, each with one tumor and the pool, dye-swapped
  • 10,000 genes


17
[Figure with two panels: TransCount and Log-ratios]
18
  • Clear linear relation between the PCR data and the estimated concentrations.
  • The best agreement is for intermediate and high concentrations, reflecting the increased uncertainty of both methods in quantifying low-abundance transcripts.
  • Good positive correlation between estimated concentration and PCR data for some individual genes, despite limited within-gene variability and few data points.
  • Standard log-ratio expressions also showed a significant correlation to PCR data, BUT a much lower one.
  • Many genes had approximately the same log-ratio although their absolute transcript concentrations differed considerably. This shows the additional information in absolute measures.
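How two genes can share a log-ratio while differing greatly in absolute concentration can be sketched with a hypothetical spot model in which both channels of a spot share one probe-specific efficiency. The efficiency cancels in the ratio, and so does any information about absolute level; the concentrations and efficiencies below are made up for illustration, and `spot_signals` is a hypothetical name.

```python
import math


def spot_signals(conc_tumour, conc_pool, efficiency):
    """Hypothetical spot model: each channel's intensity is the absolute
    concentration times a probe-specific efficiency shared by both channels."""
    red = efficiency * conc_tumour    # tumour in one dye
    green = efficiency * conc_pool    # reference pool in the other dye
    return red, green, math.log2(red / green)


# Hypothetical genes: 100-fold different concentrations, same fold change.
gene_a = spot_signals(conc_tumour=20.0, conc_pool=10.0, efficiency=0.9)
gene_b = spot_signals(conc_tumour=2000.0, conc_pool=1000.0, efficiency=0.01)
for name, (red, green, lr) in [("gene A", gene_a), ("gene B", gene_b)]:
    print(f"{name}: red={red:.1f}, green={green:.1f}, log2 ratio={lr:.2f}")
```

Both genes give a log-ratio of 1, and their raw intensities are also similar, because the low probe efficiency of gene B hides its 100-fold higher concentration. Separating concentration from efficiency therefore needs a model of the probe effects, which is the role the probe covariates play in the slides above.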

19
Cervix tumor samples and pooled cell lines
  • Estimated absolute transcript concentrations
  • 12 cervix tumor samples and a pool of ten cancer cell lines
  • 24 arrays, each with one tumor and the pool, dye-swapped
  • 10,000 genes

22
  • Estimate both parameters of a binomial (this allows gene-dependent effects to be estimated).
  • Use scaling to focus on the hybridisation process.
  • Do not estimate a variance from single observations.
  • Use conditional independence in hierarchical models to model complex dependencies in a flexible way.
  • Start MCMC runs with central initial values.
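The advice on central initial values can be illustrated with a generic random-walk Metropolis sampler for a single Poisson rate, started at the log of the sample mean. This is not the TransCount sampler: the one-parameter model, the Normal(0, 10^2) prior on the log-rate and the replicate counts are all hypothetical choices for the sketch.

```python
import math
import random


def log_posterior(theta, counts, prior_sd=10.0):
    """Unnormalised log posterior of theta = log(rate) for Poisson counts,
    with a Normal(0, prior_sd**2) prior on theta."""
    rate = math.exp(theta)
    log_lik = sum(y * theta - rate for y in counts)  # log(y!) terms dropped
    return log_lik - 0.5 * (theta / prior_sd) ** 2


def metropolis(counts, n_iter=5000, step=0.1, seed=0):
    """Random-walk Metropolis sampler for theta."""
    rng = random.Random(seed)
    # Central initial value: log of the sample mean (0.5 guards against log(0)).
    theta = math.log(sum(counts) / len(counts) + 0.5)
    current = log_posterior(theta, counts)
    samples = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, counts)
        # Accept with probability min(1, exp(candidate - current)).
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        samples.append(theta)
    return samples


counts = [95, 110, 102, 98]          # hypothetical replicate spot counts
draws = metropolis(counts)[1000:]    # discard burn-in
rates = [math.exp(t) for t in draws]
print(f"posterior mean rate ~ {sum(rates) / len(rates):.1f}")
```

Starting near the bulk of the posterior means the chain spends little time travelling from an arbitrary starting point, which matters more when, as in the full model, convergence is slow.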


23
  • Four main ideas:
    • we use covariates explicitly, including some describing the hybridisation efficiency of each spot;
    • we handle an unequal number of replicates per gene;
    • we use the binomial process, which better describes the experimental dynamics and allows estimation of gene and dye effects;
    • we build a bottom-to-top coherent stochastic model, avoiding plug-ins and fully propagating uncertainty.


24
Publications
  • Arnoldo Frigessi, M.A. van de Wiel, M. Holden, D.H. Svendsrud, I.K. Glad and H. Lyng (2005). Genome-wide estimation of transcript concentrations from spotted cDNA microarray data. Nucleic Acids Research (Methods Online), 33, e143.
  • M.A. van de Wiel, M. Holden, I.K. Glad, H. Lyng and Arnoldo Frigessi (2006). Bayesian process-based modeling of two-channel microarray experiments: estimating absolute mRNA concentrations. In Bayesian Inference for Gene Expression and Proteomics (Mueller, Do, eds.). Tech report available here: http://www.nr.no/files/samba/smbi/Transcount/report999.pdf
  • TransCount: a prototype and quite user-friendly version of the MCMC sampler is available here: http://www.nr.no/pages/samba/area_emr_smbi_transcount