Adsorption models of oligonucleotide microarrays. Conrad Burden, Centre for Bioinformation Science, ANU

1
Adsorption models of oligonucleotide microarrays
Conrad Burden, Centre for Bioinformation Science, ANU
2
Oligonucleotide microarray chips
Affymetrix make these little beasties for
testing for the presence of genes in prepared
cRNA samples
Image courtesy of Affymetrix
3

Single-stranded DNA oligo probes, 25 bases in length,
are deposited onto a glass substrate using a
photolithographic process

Image courtesy of Affymetrix
4

The chip surface is divided up into 500,000
features tens of microns across; the probes within
each feature all have one specific sequence
Each gene is represented by between 11 and 16 pairs
of such regions: one perfect match (PM) sequence
and one mismatch (MM) sequence
⇒ Tens of thousands of genes are measured by a single
chip

5
Image courtesy of Affymetrix
6
Image courtesy of Affymetrix
7
Image courtesy of Affymetrix
8
Image courtesy of Affymetrix
9

Data from an experiment showing the expression of
thousands of genes on a single GeneChip probe
array.

Image courtesy of Affymetrix
10

Given a set of typically 16 PM and MM intensity
values (× number of replicate chips in expt.), how
can we obtain a measure of mRNA expression for a
given gene?
Either as an absolute mRNA concentration in, say,
picomolar,
or as a relative change in mRNA concentration
between treatments

Absolute concentration: we'll come to this later
Relative expression between treatments: existing
expression measures such as
MAS5
RMA
Li-Wong
attempt to do this.
(MAS5 is provided with Affymetrix chips.
The Bioconductor software provides built-in
functions for all three measures.)

12
MAS5 (MicroArray Suite v5)

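1. MM subtraction

$V = \mathrm{PM} - \mathrm{IM}$, where the adjusted mismatch is
$\mathrm{IM} = \mathrm{MM}$ if $\mathrm{MM} < \mathrm{PM}$, and
$\mathrm{IM} =$ (something $< \mathrm{PM}$) otherwise

2. Tukey biweight average of logged Vs within
probeset (summarisation)

$\mathrm{SignalLogValue} = \mathrm{TukeyBiweight}(\log_2 V)$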
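
The two MAS5 steps above can be sketched as follows. This is only an illustration under stated assumptions: it uses a one-step Tukey biweight with tuning constant c = 5 and a small eps = 0.0001 to avoid division by zero, and it uses PM/2 as a stand-in for the slide's "something < PM". The function names, constants and the PM/2 fallback are hypothetical and may not match the Affymetrix implementation.

```python
import math
import statistics

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight: a weighted mean that down-weights
    points lying far from the median (c and eps are assumed constants)."""
    m = statistics.median(values)
    mad = statistics.median([abs(v - m) for v in values])
    weights = []
    for v in values:
        u = (v - m) / (c * mad + eps)
        # Bisquare weight: zero for points more than one scaled unit away.
        weights.append((1.0 - u * u) ** 2 if abs(u) < 1.0 else 0.0)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def signal_log_value(pm, mm):
    """MM subtraction followed by a Tukey biweight of log2(V)."""
    logged = []
    for p, q in zip(pm, mm):
        # Assumption: when MM >= PM, fall back to PM/2 so that V stays positive.
        im = q if q < p else p / 2.0
        logged.append(math.log2(p - im))
    return tukey_biweight(logged)
```

For example, `tukey_biweight([1, 1, 1, 1, 100])` ignores the outlier entirely, because its weight is zero.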
13

3. Optional scaling factor

4. Final output is
Reported value of ith probeset
14
RMA (Robust Microarray Average)
Irizarry et al. Biostatistics, 4 (2003) 249-264
1. Background Correction
Subtract from the PMs a probe-specific background
correction, using a model in which the observed
intensity is the sum of an (exponential) signal
and (normal) noise.

2. Quantile normalisation

Assuming multiple replicates of each experiment,
this adjusts intensities so that the
distribution of intensities is the same for all
chips within a set of replicates.
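
Quantile normalisation as described above can be sketched in a few lines. This is a minimal illustration, not the Bioconductor implementation: the function name is hypothetical, each chip is assumed to be a plain list of intensities of equal length, and ties are not treated specially.

```python
def quantile_normalise(chips):
    """chips: list of equal-length intensity lists, one list per chip.
    Returns new lists that all share the same distribution of values."""
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across chips.
    reference = [sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)]
    normalised = []
    for chip in chips:
        # Indices of this chip's values, from smallest to largest.
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalised.append(out)
    return normalised
```

After the call, sorting any normalised chip gives the same list of values, while the ranking of probes within each chip is unchanged.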
15

3. Take logs

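4. Average across the 16 probes in the probeset using
median polish summarisation,
i.e., fit to the model
$\log_2 \mathrm{PM}_{ij} = \mu_i + \alpha_j + \epsilon_{ij}$ (chip $i$, probe $j$);
$\mu_i$ is the required measure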
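
A median polish fit of an additive chip-plus-probe model can be sketched as below. This is an illustrative version only: the function name, the fixed iteration count and the row/column convention (rows are chips, columns are probes) are assumptions, and the expression measure for chip i is taken to be the overall level plus that chip's effect.

```python
import statistics

def median_polish(table, n_iter=10):
    """table[i][j]: log2 intensity of probe j on chip i.
    Returns (overall, chip_effects, probe_effects, residuals)."""
    resid = [row[:] for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(n_iter):
        # Sweep row medians out of the residuals into the chip effects.
        for i in range(n_rows):
            m = statistics.median(resid[i])
            row_eff[i] += m
            resid[i] = [v - m for v in resid[i]]
        # Re-centre the probe effects, moving their median into the overall level.
        m = statistics.median(col_eff)
        overall += m
        col_eff = [c - m for c in col_eff]
        # Sweep column medians out of the residuals into the probe effects.
        for j in range(n_cols):
            m = statistics.median([resid[i][j] for i in range(n_rows)])
            col_eff[j] += m
            for i in range(n_rows):
                resid[i][j] -= m
        # Re-centre the chip effects in the same way.
        m = statistics.median(row_eff)
        overall += m
        row_eff = [r - m for r in row_eff]
    return overall, row_eff, col_eff, resid
```

Because medians are used instead of means, a few outlying probes have little influence on the fitted chip effects.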
16
Affymetrix Latin Square experiment

14 genes spiked at cyclic permutations of the 14
concentrations (0, 0.25, 0.5, 1, …, 1024) pM
into a background of human pancreas cRNA
Hybridised onto 14 arrays
3 replicates of the experiment
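
The cyclic-permutation design can be illustrated with a short sketch. It assumes the elided concentrations double from 1 to 1024 pM (which gives exactly 14 levels), and the particular assignment of concentrations to genes and chips is hypothetical; only the Latin square property is the point.

```python
# 14 concentrations in pM: 0, 0.25, 0.5, then 1, 2, 4, ..., 1024 (doubling assumed).
concentrations = [0.0, 0.25, 0.5] + [float(2 ** k) for k in range(11)]

def latin_square(levels):
    """Row g, column c holds the level given to gene g on chip c.
    Each row is a cyclic shift of the previous one."""
    n = len(levels)
    return [[levels[(gene + chip) % n] for chip in range(n)] for gene in range(n)]

design = latin_square(concentrations)
```

Every gene sees every concentration exactly once across the 14 chips, and every chip contains every concentration exactly once across the 14 genes.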

17
[Figure: axes labelled GENES and CHIPS]
18
[Figure: Gene 37777_at; labels: Background, 64 pM, Saturation, 1 pM]
19
(No Transcript)
20

Existing expression measures:
wrongly assume a linear relationship between
target concentration and measured fluorescent dye
intensity
fail to account for saturation effects
fail to account properly for probe-specific
differences in probe-target binding affinities
An alternative approach is to use adsorption models
from physical chemistry to infer absolute
concentration estimates.

21
Langmuir Adsorption Model

[Diagram: PROBE + TARGET ⇌ DUPLEX; forward reaction ADSORPTION, reverse reaction DESORPTION]
Image courtesy of Affymetrix
22
Langmuir Adsorption Model

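Let $x$ be the concentration of mRNA target and
$\theta(t)$ be the fraction of sites occupied by
probe-target duplexes.
Assume
(Adsorption) Target mRNA attaches to probes at a
rate $k_f x (1 - \theta(t))$, proportional to the concentration
of specific target mRNA and the fraction of
unoccupied probes
(Desorption) Target mRNA detaches from probes at
a rate $k_b \theta(t)$, proportional to the fraction of
occupied probes

Then
$\frac{d\theta}{dt} = k_f x (1 - \theta) - k_b \theta$

Solution with initial condition $\theta(0) = 0$ is
$\theta(t) = \frac{x}{x + K}\left[1 - e^{-(x + K) k_f t}\right]$

where $K = k_b/k_f$. Let $y(x,t)$ be the measured
fluorescence intensity and $y_0$ be the background
intensity at zero concentration. Also assume
intensity above background is proportional to
$\theta(t)$, with constant of proportionality $b$. Then
$y(x,t) = y_0 + b\,\frac{x}{x + K}\left[1 - e^{-(x + K) k_f t}\right]$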
24
Equilibrium limit, $t \to \infty$, gives the Langmuir
isotherm
$y(x) = y_0 + b\,\frac{x}{x + K}$
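
The model can be written down directly as code. This is a minimal sketch, assuming the time-dependent solution theta(t) = x/(x+K) [1 - exp(-(x+K) k_f t)] with K = k_b/k_f, and intensity y = y0 + b theta; the function names are hypothetical, and any parameter values passed in are for illustration only, not fitted values from the talk.

```python
import math

def theta(x, t, k_f, K):
    """Fraction of occupied probe sites at time t, starting from theta(0) = 0."""
    return x / (x + K) * (1.0 - math.exp(-(x + K) * k_f * t))

def y_langmuir(x, t, y0, b, k_f, K):
    """Time-dependent fluorescence intensity: background plus b times occupancy."""
    return y0 + b * theta(x, t, k_f, K)

def y_isotherm(x, y0, b, K):
    """Equilibrium (t -> infinity) Langmuir isotherm."""
    return y0 + b * x / (x + K)
```

Two features follow immediately: at x = K the isotherm sits halfway between background and saturation, and for large x it approaches y0 + b, which is the saturation effect the linear expression measures ignore.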
25
Time-dependent solutions
26
[Figure: axes labelled GENES and CHIPS]
27
Raw data from .cel files
Affy spike-in experiment, gene 37777_at. Red: PM; black: MM
28
Raw data from .cel files
Affy spike-in experiment, gene 37777_at. Red: PM; black: MM
29
Raw data from .cel files
Affy spike-in experiment, gene 1024_at. Red: PM; black: MM
30
Raw data from .cel files
Affy spike-in experiment, gene 1024_at. Red: PM; black: MM
31
Statistical Model

Use a Generalized Linear Model to fit
fluorescence intensity values y to a Gamma
distribution, i.e. assume the random variable Y has a
Gamma distribution
$Y \sim \Gamma(\mu, \nu)$
with mean given by the Langmuir adsorption
solution
$\mu = y_{\mathrm{Langmuir}}(x,t)$
and constant shape parameter $\nu$, i.e. constant
coefficient of variation.
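
The link between a constant shape parameter and a constant coefficient of variation can be checked by simulation. This sketch assumes the Gamma distribution is parameterised by its mean and shape, so that the scale is mean/shape and the coefficient of variation is 1/sqrt(shape); the function name and the numerical values (mean 500, shape 27) are illustrative assumptions.

```python
import random
import statistics

def sample_intensity(mu, shape, rng):
    """Draw one intensity from a Gamma distribution with mean mu.
    With scale = mu / shape, the coefficient of variation is 1 / sqrt(shape)."""
    return rng.gammavariate(shape, mu / shape)

rng = random.Random(0)
samples = [sample_intensity(500.0, 27.0, rng) for _ in range(20000)]
sample_mean = statistics.fmean(samples)
sample_cv = statistics.pstdev(samples) / sample_mean
```

With shape 27 the theoretical coefficient of variation is 1/sqrt(27), roughly 0.19, whatever the mean; changing `mu` rescales the samples but leaves `sample_cv` essentially unchanged.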

32
Justification for Gamma distribution

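Add to the Langmuir equation a stochastic noise term:
$\frac{d\theta}{dt} = k_f x (1 - \theta) - k_b \theta + h(x,\theta)\, z(t)$
where $z(t)$ is a Gaussian noise. Then, under
reasonable assumptions on $h(x,\theta)$, $\theta$ follows an
approximate Gamma distribution.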
33
Test of Gamma assumption using Q-Q plot

$Y \sim \Gamma(\mu, \nu) \Rightarrow Y/\mu \sim \Gamma(1, \nu)$
coeff. of variation
= std. dev./mean = 0.192
(> 8,000 data points)

We tested many versions of the model

and determined the best supported model
(parsimonious, i.e. no unnecessary parameters,
but accurate over all data):
Equilibrium Langmuir isotherm
Parameters y0, b, K all probe
dependent
Overall wafer-dependent
scaling effect

36
Inverse problem

Given the measured fluorescence intensities from
16 probes, what is the concentration of mRNA?

First try a simple algorithm
(following D. Hekstra et al., Nucl. Acids Res.
31 (2003) 1962)
1) Fit parameters to a linear model

where nA, nC and nG are the numbers of each nucleotide
in the probe
2) Given a new set of 16 probe sequences,
estimate their parameters y0, b and K from
the model
38

3) Invert the Langmuir isotherm to get 16
estimates of the gene concentration
$x = K\,\frac{y - y_0}{y_0 + b - y}$

4) The median of these 16 values gives a robust
estimate of the mRNA concentration for this gene
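
Steps 3 and 4 can be sketched as follows. The inversion assumes the isotherm y = y0 + b x/(x + K), which rearranges to x = K (y - y0)/(y0 + b - y) for intensities strictly between y0 and y0 + b. The function names are hypothetical, and per-probe parameters are assumed to be supplied as (y0, b, K) tuples from whatever fitting step precedes this.

```python
import statistics

def invert_isotherm(y, y0, b, K):
    """Solve y = y0 + b*x/(x + K) for x; valid for y0 < y < y0 + b."""
    return K * (y - y0) / (y0 + b - y)

def estimate_concentration(intensities, params):
    """One concentration estimate per probe, summarised by the median.
    params: list of (y0, b, K) tuples, one per probe."""
    estimates = [invert_isotherm(y, y0, b, K)
                 for y, (y0, b, K) in zip(intensities, params)]
    return statistics.median(estimates)
```

As a usage check, an intensity generated from the isotherm at a known concentration is mapped back to that concentration, e.g. `invert_isotherm(5100.0, 100.0, 10000.0, 64.0)` gives 64.0.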
39
Why the median and not the mean?
40
Why the median and not the mean?
41
Why the median and not the mean?
So that we can account for data outside the
range $y_0 < y < y_0 + b$
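
One way to see why the median copes with out-of-range intensities is sketched below. It assumes, as one plausible reading of the slide, that an intensity at or below background maps to a concentration of zero and an intensity at or above saturation maps to infinity; the function name and all numerical values are hypothetical.

```python
import math
import statistics

def invert_clipped(y, y0, b, K):
    """Invert y = y0 + b*x/(x + K), sending out-of-range intensities
    to the ends of the concentration scale (an assumed convention)."""
    if y <= y0:
        return 0.0
    if y >= y0 + b:
        return math.inf
    return K * (y - y0) / (y0 + b - y)

# Five made-up probe intensities: one below background, one above saturation.
intensities = (90.0, 3000.0, 5100.0, 6000.0, 10200.0)
estimates = [invert_clipped(y, 100.0, 10000.0, 64.0) for y in intensities]

median_estimate = statistics.median(estimates)
mean_estimate = sum(estimates) / len(estimates)
```

Here the median is the finite estimate from the middle probe, while the mean is infinite because a single saturated probe dominates it.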
42
Calculated mRNA concentration vs. true values
43
compare with MAS5 and RMA
44

Even this mindlessly simple algorithm is an
improvement on the currently available
expression measures!

45
The challenge is to find an algorithm that will
predict y0, b and K for any given probe sequence

Recall the Langmuir isotherm
$y = y_0 + b\,\frac{x}{x + K}$
Parameters y0, b and K are probe dependent:
explanation from physical chemistry?
Work in progress

46
Improvements to naïve Langmuir model

Include cross-hybridization: competition from
mRNA other than the intended target sequence

Rate of uptake of specific target
Rate of uptake of non-specific target
47

Include dynamics of probe-target binding

Even with these two improvements, the
hyperbolic form of the isotherm is preserved!
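
The preservation of the hyperbolic form can be illustrated for the cross-hybridization case. The sketch below is not the model from the talk, whose rate equations are not preserved in this transcript. It assumes the standard competitive Langmuir equilibrium in which specific and non-specific targets compete for the same probe sites with dissociation constants K_s and K_ns. All names are hypothetical.

```python
def competitive_occupancy(x_s, x_ns, K_s, K_ns):
    """Equilibrium fraction of sites bound to the specific target when a
    non-specific target at concentration x_ns competes for the same sites."""
    return (x_s / K_s) / (1.0 + x_s / K_s + x_ns / K_ns)

def effective_K(x_ns, K_s, K_ns):
    """Effective constant such that occupancy = x_s / (x_s + K_eff)."""
    return K_s * (1.0 + x_ns / K_ns)
```

For a fixed non-specific background, `competitive_occupancy(x_s, ...)` equals `x_s / (x_s + effective_K(...))` for every x_s, so the curve is still a hyperbola in x_s, only with a parameter whose meaning is less simple.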
48

The Langmuir isotherm
$y = y_0 + b\,\frac{x}{x + K}$
is still appropriate (but the three parameters
have less simple meanings).
This model enables a comparison between PM and MM
probe parameters in terms of binding free
energies

49
Langmuir isotherms for PM and MM
Affy spike-in experiment, gene 37777_at. Red: PM; black: MM
50
(No Transcript)
51

... which is where we are up to.
Where is it going?
The final aim is to combine our adsorption model with
existing models (e.g. the Position Dependent Nearest
Neighbour model) to find an algorithm for
determining y0, b and K for any probe sequence.
This will provide a practical way of measuring
absolute concentrations of mRNA in biological
samples

52
References