Bayesian process-based modeling of gene expression data: estimating absolute mRNA concentrations

1
Bayesian process-based modeling of gene
expression data: estimating absolute mRNA
concentrations

Arnoldo Frigessi, University of Oslo
Mark van de Wiel, Technische Universiteit
Eindhoven
Marit Holden, Norwegian Computing Center
Ingrid K. Glad, University of Oslo
Heidi Lyng, The Norwegian Radium Hospital

Bricks dag 29-11-2005
2
cDNA microarray
DNA -> (transcription) -> mRNA -> (translation) -> amino acid -> protein -> cell phenotype -> organism phenotype

Microarrays measure gene expression at the
transcription level.
Microarray technology: a cDNA microarray with
40,000 spots, hybridised with a sample labelled
with a green fluorescent dye (Cy3) and a
reference sample labelled in red (Cy5).

3
Adapted from Christina Kendziorski,
http://www.biostat.wisc.edu/kendzior
Data: log2(rj/gj), where rj and gj are the red
and green intensities of spot j.
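
As a minimal sketch of the data summary on this slide, the log-ratio can be computed per spot from the red (Cy5) and green (Cy3) intensities. The function name `log_ratios` and the intensity values are illustrative assumptions, not taken from the presentation.

```python
import math

def log_ratios(red, green):
    """Return log2(r_j / g_j) for every spot j."""
    if len(red) != len(green):
        raise ValueError("red and green must have the same length")
    return [math.log2(r / g) for r, g in zip(red, green)]

# Hypothetical intensities for three spots (invented numbers).
red = [1200.0, 300.0, 800.0]
green = [600.0, 300.0, 3200.0]

ratios = log_ratios(red, green)
print(ratios)
```

A positive value means the red channel is brighter at that spot, a negative value means the green channel is brighter, and zero means equal intensity.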

4

Efficient production of spotted glass-slide
arrays has made microarray technology a
widespread technique.
Spotted microarrays provide valuable
information on relative transcript levels in
tissues, but differences in experimental
protocols make direct comparison of results
between microarray studies very difficult.

Can we get information about absolute transcript
levels from standard spotted microarray data?
Extraction of absolute transcript levels is
complicated by experimental variation and
noise originating in the production and
hybridisation processes.
Main difficulty: probes (to which the transcripts
bind) have different properties.

Is information about absolute transcript levels
useful?
Absolute concentrations of mRNA are universal
and can be combined in further analysis with
similar estimates obtained with different
techniques in other labs.
They are a first step towards building an
annotated database of transcript levels of cells.
It is possible to detect significant
concentration differences between two different
genes within the same tumor, comparisons that are
not possible with standard intensity ratios.

7

TransCount: counting the number of transcripts of
each gene in each sample.
[Figure: samples 1, 2, 3, ..., t, each giving an estimate K1g, K2g, K3g, ..., Ktg]
8
Propagating uncertainty

Current practice:
Divide the experiment into separate steps:
microarray production
transcription, labelling, hybridisation
image analysis
normalisation
imputation
estimation of intensities
testing / clustering
Do inference inside each step and plug the
results into the next step.

We do a coherent statistical analysis and
propagate uncertainties.
9

We use available covariates describing the
various steps of the
experiment, from target preparation to laser
scanning of the images.
We try to keep the model as close as possible to
the biology, physics and biochemistry of the
experiment.
MCMC converges (slowly, as usual in complex
models).

Some genes must be spotted at least twice on
some arrays. The number of such genes does not
depend on the total number of genes in the study
but on the design. Currently: 50 genes in
duplicate.
Our method succeeds in obtaining absolute
concentrations because it makes explicit use of
probe- and spot-related covariates, like probe
length and quantity, to describe probe-dependent
hybridisation efficiency. By means of duplicate
spotting, we have many transcripts with more than
one probe, and the effect of probe-dependent
covariates can be estimated.

We follow the mRNA molecules
through the whole experiment.
At each step, some molecules
survive, according to a Binomial
process with a success probability
depending on appropriate covariates.
At the end, some molecules are
scanned, and produce our data,
i.e. the raw measured intensities.
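
The survival process described above can be sketched as repeated binomial thinning. This is only an illustration: the step probabilities below are fixed, invented numbers, whereas in the model each success probability depends on covariates, and the function names `thin` and `follow_molecules` are hypothetical.

```python
import random

def thin(count, prob, rng):
    """Binomial thinning: each of `count` molecules survives
    independently with probability `prob`."""
    return sum(1 for _ in range(count) if rng.random() < prob)

def follow_molecules(n_start, step_probs, rng):
    """Pass molecules through successive steps and return the
    number of survivors after each step (including the start)."""
    counts = [n_start]
    for p in step_probs:
        counts.append(thin(counts[-1], p, rng))
    return counts

rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical survival probabilities for three experimental steps.
step_probs = [0.5, 0.2, 0.1]
counts = follow_molecules(100_000, step_probs, rng)
print(counts)
```

Because each step thins the survivors of the previous one independently, the number of molecules left at the end is again Binomial, with a success probability equal to the product of the step probabilities (0.01 in this sketch).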

Two off-line experiments are needed to determine
two covariates which are technology dependent:
Hybridisation factor c: used to scale the
estimated values to the true number of
transcripts. Estimated using two control samples
(spikes) with known concentrations.
Amplification factor f: a measure of the
increase in intensity per unit of increase in
PMT voltage during laser scanning. Estimated once
for each dye and scanner.
Under ordinary, stable experimental settings it
is sufficient to estimate these factors once.

13
MODEL
1. scaling and selection of target molecules
2. inclusion of covariate information
3. scanning
4. imaging
14

Reparametrisation principle
Approximate the Binomial with a Poisson,
Hgt,a ~ Poisson(c · nas · qta · Ktg · pst,a),
and find the parameters that are identifiable.
Reparametrise Ktg to include all other
remaining parameters.
Next, approximate the Binomial with a Normal.
Parameters that were not identifiable under the
Poisson approximation do not occur in the mean,
but only in the variance.
(a is the ratio of the two dye parameters.)
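
A small numerical sketch may help show why the second approximation recovers information the first one loses. For a Binomial(n, p), the Poisson approximation has mean and variance both equal to n·p, so only the product is visible; the Normal approximation has mean n·p and variance n·p·(1 − p), so p also enters through the variance. The function names and the (n, p) pairs below are illustrative assumptions and are not the model's actual parameters.

```python
def poisson_moments(n, p):
    """Mean and variance of the Poisson approximation to Binomial(n, p)."""
    lam = n * p
    return lam, lam

def normal_moments(n, p):
    """Mean and variance of the Normal approximation to Binomial(n, p)."""
    return n * p, n * p * (1.0 - p)

# Two hypothetical settings with the same product n * p = 500.
settings = [(1000, 0.5), (2000, 0.25)]

for n, p in settings:
    print(f"n={n}, p={p}: Poisson {poisson_moments(n, p)}, "
          f"Normal {normal_moments(n, p)}")
```

Both settings give identical Poisson moments, so they cannot be told apart under that approximation. Under the Normal approximation the means still agree but the variances differ (250 versus 375), which is the sense in which a parameter can appear only in the variance.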

Validate estimated concentrations in a dye-swap
experiment with control samples at known
concentrations.
17 genes, each spotted 6 times on 2 arrays.
2 control samples (spikes), each with 17
different mRNA sequences at specific
concentrations.
Hybridisation factor: 0.001

Low concentrations are overestimated.
16

Validate estimated concentrations with results
from quantitative real-time PCR.
12 cervix tumor samples and a pool of ten cancer
cell lines
24 arrays, each with one tumor and the pool,
dye-swapped
10,000 genes

17
[Figure: two panels, TransCount and Log-ratios]
18

Clear linear relation between the PCR data and
the estimated concentrations.
Agreement is best for intermediate and high
concentrations, reflecting the increased
uncertainty of both methods when quantifying
low-abundance transcripts.
Good positive correlation between estimated
concentration and PCR data for some individual
genes, despite limited within-gene variability
and few data points.
Standard log-ratio expressions also showed a
significant correlation with the PCR data, but a
much lower one.
Many genes had approximately the same log-ratio
although their absolute transcript
concentrations differed considerably. This
shows the additional information carried by
absolute measures.
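
The last point can be illustrated with a toy calculation. The gene names and transcript counts below are invented for illustration and are not results from the study.

```python
import math

# Hypothetical absolute transcript counts per gene:
# (tumor sample, reference pool). Invented numbers.
counts = {
    "gene_A": (20.0, 10.0),
    "gene_B": (2_000.0, 1_000.0),
    "gene_C": (200_000.0, 100_000.0),
}

# The log-ratio only depends on the sample/pool quotient.
log_ratio = {
    gene: math.log2(sample / pool)
    for gene, (sample, pool) in counts.items()
}

for gene, (sample, pool) in counts.items():
    print(f"{gene}: sample={sample:>9.0f}  pool={pool:>9.0f}  "
          f"log2 ratio={log_ratio[gene]:.1f}")
```

All three genes have the same log-ratio even though their counts span four orders of magnitude, so a ranking by abundance is only available from the absolute estimates.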

19
Cervix tumor samples and pooled cell lines

Estimated absolute transcript concentrations
12 cervix tumor samples and a pool of ten cancer
cell lines
24 arrays, each with one tumor and the pool,
dye-swapped
10,000 genes

22

Estimate both parameters in a binomial (which
allows for estimating gene-dependent effects).
Use scaling to focus on the hybridisation process.
Do not use a single observation to estimate its
own variance.
Use conditional independence in hierarchical
models to model complex dependencies in a
flexible way.
Start MCMC runs with central initial values.

Four main ideas:
1. we use covariates explicitly, including some
describing the hybridisation efficiency of each
spot
2. we handle unequal numbers of replicates per
gene
3. we use the binomial process, which better
describes the experimental dynamics and allows
estimation of gene and dye effects
4. we build a bottom-to-top coherent stochastic
model, avoiding plug-ins and fully propagating
uncertainty.

24
Publications
Arnoldo Frigessi, M.A. van de Wiel, M. Holden,
D.H. Svendsrud, I.K. Glad and H. Lyng (2005),
Genome-wide estimation of transcript
concentrations from spotted cDNA microarray data,
Nucleic Acids Research - Methods Online, 33, e143.
M.A. van de Wiel, M. Holden, I.K. Glad, H. Lyng
and Arnoldo Frigessi (2006), Bayesian
process-based modeling of two-channel microarray
experiments: estimating absolute mRNA
concentrations. In Bayesian Inference for Gene
Expression and Proteomics (Mueller, Do, eds).
Tech report available here:
http://www.nr.no/files/samba/smbi/Transcount/report999.pdf
TransCount: a prototype, quite user-friendly
version of the MCMC sampler is available here:
http://www.nr.no/pages/samba/area_emr_smbi_transcount