Adsorption models of oligonucleotide microarrays Conrad Burden, Centre for Bioinformation Science, A - PowerPoint PPT Presentation


Transcript and Presenter's Notes

1
Adsorption models of oligonucleotide microarrays
Conrad Burden, Centre for Bioinformation Science, ANU
2
Oligonucleotide microarray chips
Affymetrix make these little beasties for
testing for the presence of genes in prepared
cRNA samples
Image courtesy of Affymetrix
3
  • Single-stranded DNA oligo probes, 25 bases in length, are
    deposited onto a glass substrate using a
    photolithographic process

Image courtesy of Affymetrix
4
  • The chip surface is divided into 500,000
    features, each tens of microns across; the probes within
    each feature all share one specific sequence
  • Each gene is represented by between 11 and 16 pairs
    of such features: one perfect match (PM) sequence
    and one mismatch (MM) sequence per pair
  • ⇒ Tens of thousands of genes are measured by a single
    chip

5
Image courtesy of Affymetrix
6
Image courtesy of Affymetrix
7
Image courtesy of Affymetrix
8
Image courtesy of Affymetrix
9
  • Data from an experiment showing the expression of
    thousands of genes on a single GeneChip probe
    array.

Image courtesy of Affymetrix
10
  • Given a set of typically 16 PM and MM intensity
    values (times the number of replicate chips in the
    experiment), how can we obtain a measure of mRNA
    expression for a given gene?
  • Either as an absolute mRNA concentration in, say,
    picomolar
  • Or as a relative change in mRNA concentration
    between treatments

11
  • Absolute concentration: we'll come to this later
  • Relative expression between treatments: existing
    expression measures such as
      • MAS5
      • RMA
      • Li-Wong
    attempt to do this.
  • (MAS5 is provided with Affymetrix chips.
    The Bioconductor software provides inbuilt
    functions for all three measures.)

12
MAS5 (MicroArray Suite v5)
1. MM subtraction
$V = PM - IM$, where $IM = MM$ if $MM < PM$,
and $IM$ = something $< PM$ otherwise
2. Tukey biweight average of logged $V$s within
probeset (summarisation)
gives the SignalLogValue
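The summarisation step can be sketched as a one-step Tukey biweight applied to the logged values. This is only an illustration, not Affymetrix's code: the function name `tukey_biweight`, the tuning constants `c = 5` and `eps = 1e-4`, the use of base-2 logs, and the example intensities are all assumptions made for the sketch.

```python
import numpy as np

def tukey_biweight(values, c=5.0, eps=1e-4):
    """One-step Tukey biweight location estimate (illustrative sketch)."""
    x = np.asarray(values, dtype=float)
    m = np.median(x)                      # robust centre
    s = np.median(np.abs(x - m))          # median absolute deviation
    u = (x - m) / (c * s + eps)           # scaled distance from the centre
    # Points far from the centre (|u| >= 1) receive zero weight
    w = np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)
    return float(np.sum(w * x) / np.sum(w))

# Hypothetical background-corrected values V for one probeset;
# the last one is a deliberate outlier
log_v = np.log2([120.0, 135.0, 110.0, 128.0, 5000.0])
signal_log_value = tukey_biweight(log_v)
```

In this example the outlying probe gets zero weight, so the estimate stays close to the other four logged values.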
13
3. Optional scaling factor
4. Final output is the
reported value of the $i$th probeset
14
RMA (Robust Multi-array Average)
Irizarry et al., Biostatistics 4 (2003) 249-264
1. Background correction
Subtract from PMs a probe-specific background
correction, using a model in which the observed
intensity is the sum of (exponential) signal +
(normal) noise.
2. Quantile normalisation
Assuming multiple replicates of each experiment,
this adjusts intensities so that the
distribution of intensities is the same for all
chips within a set of replicates.
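Quantile normalisation as described above can be sketched in a few lines. This is a simplified illustration rather than the Bioconductor implementation: the function name `quantile_normalise` and the toy intensities are hypothetical, and ties are broken by position instead of being averaged.

```python
import numpy as np

def quantile_normalise(intensities):
    """Give every chip (column) the same empirical distribution.

    intensities: 2-D array, rows = probes, columns = chips.
    """
    x = np.asarray(intensities, dtype=float)
    order = np.argsort(x, axis=0)                 # per-chip sort order
    ranks = np.argsort(order, axis=0)             # rank of each probe within its chip
    reference = np.sort(x, axis=0).mean(axis=1)   # mean of the k-th smallest values
    return reference[ranks]                       # replace each value by the reference at its rank

# Hypothetical intensities: 4 probes on 3 replicate chips
chips = np.array([[5.0, 4.0, 3.0],
                  [2.0, 1.0, 4.0],
                  [3.0, 3.0, 6.0],
                  [4.0, 2.0, 8.0]])
normalised = quantile_normalise(chips)
```

After the call, every column of `normalised` contains the same set of values, in the rank order of the original chip.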
15
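3. Take logs
4. Average across the 16 probes in the probeset using
median polish summarisation,
i.e., fit to the model
$Y_{ij} = \mu_i + \alpha_j + \epsilon_{ij}$, where $Y_{ij}$ is the logged PM intensity of probe $j$ on chip $i$;
the chip effect $\mu_i$ is the required measure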
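Median polish can be sketched as alternately sweeping row and column medians out of the logged intensities. The sketch below assumes an additive model (chip effect plus probe effect plus residual) with rows as chips and columns as probes; the function name `median_polish`, the fixed iteration count and the toy data are hypothetical, and the RMA implementation in Bioconductor differs in detail.

```python
import numpy as np

def median_polish(log_pm, n_iter=10):
    """Fit log_pm[i, j] ~ chip_effect[i] + probe_effect[j] (illustrative sketch)."""
    resid = np.array(log_pm, dtype=float)
    chip_effect = np.zeros(resid.shape[0])
    probe_effect = np.zeros(resid.shape[1])
    for _ in range(n_iter):
        row_med = np.median(resid, axis=1)     # sweep out chip (row) medians
        resid -= row_med[:, None]
        chip_effect += row_med
        col_med = np.median(resid, axis=0)     # sweep out probe (column) medians
        resid -= col_med[None, :]
        probe_effect += col_med
    # Centre the probe effects so the chip effect carries the overall level
    shift = np.median(probe_effect)
    return chip_effect + shift, probe_effect - shift, resid

# Toy data: 3 chips, 5 probes, exactly additive apart from one outlier
mu_true = np.array([8.0, 9.0, 10.0])
alpha_true = np.array([-1.0, 0.0, 1.0, 0.5, -0.5])
log_pm = mu_true[:, None] + alpha_true[None, :]
log_pm[0, 0] += 5.0
mu_hat, alpha_hat, resid = median_polish(log_pm)
```

Because medians are used instead of means, the single outlier ends up in the residuals and leaves the chip effects unchanged.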
16
Affymetrix Latin Square experiment
  • 14 genes spiked at cyclic permutations of the 14
    concentrations (0, 0.25, 0.5, 1, ..., 1024) pM
    into a background of human pancreas cRNA
  • Hybridised onto 14 arrays
  • 3 replicates of the experiment
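A cyclic Latin square of this kind can be sketched as follows. The intermediate concentrations (2, 4, ..., 512 pM) are assumed from the doubling pattern, which gives exactly 14 values; the assignment of particular genes to particular chips is hypothetical and not taken from the Affymetrix design files.

```python
import numpy as np

# 0, 0.25, then doubling from 0.5 up to 1024 pM (assumed intermediate values)
concentrations = np.array([0.0, 0.25] + [2.0**k for k in range(-1, 11)])
n = len(concentrations)

# design[g, c] = spike-in concentration (pM) of gene g on chip c;
# each row is the previous row shifted cyclically by one chip
design = np.array([np.roll(concentrations, g) for g in range(n)])
```

Every gene then sees each concentration exactly once across the 14 chips, and every chip contains each concentration exactly once across the 14 genes.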

17
GENES
CHIPS
18
Gene 37777_at
Background
64 pM
Saturation
1 pM
19
(No Transcript)
20
  • Existing expression measures
      • wrongly assume a linear relationship between
        target concentration and measured fluorescent dye
        intensity
      • fail to account for saturation effects
      • fail to account properly for probe-specific
        differences in probe-target binding affinities
  • An alternative approach is to use adsorption models
    from physical chemistry to infer absolute
    concentration estimates.

21
Langmuir Adsorption Model

PROBE + TARGET ⇌ DUPLEX
(forward: ADSORPTION, reverse: DESORPTION)
Image courtesy of Affymetrix
22
Langmuir Adsorption Model
  • Let $x$ be the concentration of mRNA target and
    $\theta(t)$ be the fraction of sites occupied by
    probe-target duplexes.
  • Assume
      • (Adsorption) Target mRNA attaches to probes at a
        rate $k_f x (1 - \theta(t))$, proportional to the concentration
        of specific target mRNA and the fraction of
        unoccupied probes
      • (Desorption) Target mRNA detaches from probes at
        a rate $k_b \theta(t)$, proportional to the fraction of
        occupied probes

23
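  • Solution of $d\theta/dt = k_f x (1 - \theta) - k_b \theta$ with initial condition $\theta(0) = 0$ is

$\theta(t) = \frac{x}{x + K}\left[1 - e^{-(x + K) k_f t}\right]$,
where $K = k_b/k_f$. Let $y(x,t)$ be the measured
fluorescence intensity and $y_0$ be the background
intensity at zero concentration. Also assume
intensity above background is proportional to
$\theta(t)$, with proportionality constant $b$. Then
$y(x,t) = y_0 + b\,\frac{x}{x + K}\left[1 - e^{-(x + K) k_f t}\right]$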
24
Equilibrium limit, $t \to \infty$, gives the Langmuir
isotherm
$y(x) = y_0 + b\,\frac{x}{x + K}$
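As a rough numerical check, the adsorption/desorption balance dθ/dt = kf·x·(1 − θ) − kb·θ can be integrated step by step and compared with its closed-form solution, whose long-time limit is the occupied fraction x/(x + K) with K = kb/kf. The rate constants, concentration and hybridisation time below are hypothetical values chosen only for illustration.

```python
import math

def theta_closed_form(x, t, kf, kb):
    """Closed-form occupied fraction for theta(0) = 0."""
    K = kb / kf
    return x / (x + K) * (1.0 - math.exp(-(x + K) * kf * t))

def theta_euler(x, t, kf, kb, steps=20_000):
    """Forward-Euler integration of d(theta)/dt = kf*x*(1 - theta) - kb*theta."""
    dt = t / steps
    theta = 0.0
    for _ in range(steps):
        theta += dt * (kf * x * (1.0 - theta) - kb * theta)
    return theta

# Hypothetical values: kf in 1/(pM h), kb in 1/h, so K = kb/kf = 64 pM
kf, kb = 1e-3, 6.4e-2
x, t = 64.0, 16.0            # target concentration (pM), hybridisation time (h)

numeric = theta_euler(x, t, kf, kb)
exact = theta_closed_form(x, t, kf, kb)
equilibrium = theta_closed_form(x, 1e6, kf, kb)   # effectively t -> infinity
```

With x equal to K, the equilibrium occupancy is one half, which is the usual reading of K as the concentration at half saturation.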
25
Time-dependent solutions
26
GENES
CHIPS
27
Raw data from .cel files
Affy spike-in experiment, Gene 37777_at
(Red = PM, Black = MM)
28
Raw data from .cel files
Affy spike-in experiment, Gene 37777_at
(Red = PM, Black = MM)
29
Raw data from .cel files
Affy spike-in experiment, Gene 1024_at
(Red = PM, Black = MM)
30
Raw data from .cel files
Affy spike-in experiment, Gene 1024_at
(Red = PM, Black = MM)
31
Statistical Model
  • Use a Generalized Linear Model to fit
    fluorescence intensity values $y$ to a Gamma
    distribution, i.e. assume the random variable $Y$ has a
    Gamma distribution
      $Y \sim G(\mu, \nu)$
    with mean given by the Langmuir adsorption
    solution
      $\mu = y_{\mathrm{Langmuir}}(x,t)$
    and constant shape parameter $\nu$, i.e. constant
    coefficient of variation.

32
Justification for Gamma distribution
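  • Add to the Langmuir equation a stochastic noise term

$\frac{d\theta}{dt} = k_f x (1 - \theta) - k_b \theta + h(x,\theta)\, z(t)$,
where $z(t)$ is Gaussian noise; then, under
reasonable assumptions on $h(x,\theta)$, $\theta$ follows an
approximate Gamma distribution.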
33
Test of Gamma assumption using Q-Q plot
  • $Y \sim G(\mu, \nu) \Rightarrow Y/\mu \sim G(1, \nu)$
  • coeff. of variation
    = std. dev./mean = 0.192
  • (> 8,000 data points)
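The Gamma assumption and the normalisation by the mean can be illustrated with simulated data. For a Gamma distribution the coefficient of variation is 1/sqrt(shape), so a coefficient of variation of 0.192 corresponds to a shape of roughly 27. The isotherm parameters and concentration grid below are hypothetical, and the simulation is a sketch of the idea rather than the fit performed on the spike-in data.

```python
import numpy as np

def langmuir_isotherm(x, y0, b, K):
    """Equilibrium Langmuir mean intensity."""
    return y0 + b * x / (x + K)

cv = 0.192
shape = 1.0 / cv**2                    # Gamma shape parameter, about 27.1

rng = np.random.default_rng(0)
# Assumed concentration grid (pM), repeated 1000 times
x = np.tile([0.0, 0.25] + [2.0**k for k in range(-1, 11)], 1000)
mu = langmuir_isotherm(x, y0=100.0, b=10_000.0, K=100.0)   # hypothetical parameters

# Gamma with mean mu and constant shape: scale = mu / shape
y = rng.gamma(shape, mu / shape)

# Dividing by the mean removes the concentration dependence:
# ratio should follow a Gamma distribution with mean 1 and the same shape
ratio = y / mu
empirical_cv = ratio.std() / ratio.mean()
```

A Q-Q plot of `ratio` against the quantiles of a unit-mean Gamma distribution with this shape would then be the simulated analogue of the test described above.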

34
  • We tested many versions of the model

35
  • ... and determined the best supported model
    (parsimonious, i.e. no unnecessary parameters,
    but accurate over all data):
  • Equilibrium Langmuir isotherm
  • Parameters $y_0$, $b$, $K$ all probe
    dependent
  • Overall wafer-dependent
    scaling effect

36
Inverse problem
  • Given the measured fluorescence intensities from
    16 probes, what is the concentration of mRNA?

37
  • First try a simple algorithm
    (following D. Hekstra et al., Nucl. Acids Res.
    31 (2003) 1962)

1) Fit the parameters to a linear model in
$n_A$, $n_C$ and $n_G$, the number of each nucleotide
in the probe
2) Given a new set of 16 probe sequences,
estimate their parameters $y_0$, $b$ and $K$ from
the model
38
3) Invert the Langmuir isotherm $y = y_0 + b\,x/(x + K)$ to get 16
estimates of the gene concentration,
$x = K\,\frac{y - y_0}{y_0 + b - y}$
4) The median of these 16 values gives a robust
estimate of mRNA concentration for this gene
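Steps 3 and 4 can be sketched as below, assuming the equilibrium isotherm y = y0 + b·x/(x + K). The per-probe parameter values are randomly generated placeholders, not fitted values, and the intensities are noise-free so that the round trip is exact.

```python
import numpy as np

def langmuir_isotherm(x, y0, b, K):
    """Equilibrium intensity y = y0 + b x / (x + K)."""
    return y0 + b * x / (x + K)

def invert_isotherm(y, y0, b, K):
    """Concentration x = K (y - y0) / (y0 + b - y), valid for y0 < y < y0 + b."""
    return K * (y - y0) / (y0 + b - y)

rng = np.random.default_rng(1)
n_probes = 16
# Hypothetical probe-dependent parameters for one probeset
y0 = rng.uniform(50.0, 150.0, n_probes)
b = rng.uniform(5_000.0, 15_000.0, n_probes)
K = rng.uniform(50.0, 500.0, n_probes)

true_x = 32.0                                   # pM
y = langmuir_isotherm(true_x, y0, b, K)         # noise-free intensities
estimates = invert_isotherm(y, y0, b, K)        # one estimate per probe
gene_concentration = float(np.median(estimates))
```

With real data each probe gives a different estimate, and the median across the 16 probes is the reported concentration.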
39
Why the median and not the mean?
40
Why the median and not the mean?
41
Why the median and not the mean?
So that we can account for data outside the
range $y_0 < y < y_0 + b$
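One way to see the point: an intensity at or below background has no positive solution, and one at or above saturation has no finite solution. The sketch below assigns such probes the limiting values 0 and infinity, which is an assumption of this example rather than something stated in the slides. The median still returns a sensible value while the mean does not. All parameter values and intensities are hypothetical.

```python
import numpy as np

def invert_with_limits(y, y0, b, K):
    """Invert y = y0 + b x / (x + K), mapping out-of-range data to 0 or infinity."""
    y, y0, b, K = np.broadcast_arrays(
        *[np.asarray(a, dtype=float) for a in (y, y0, b, K)]
    )
    x = np.full(y.shape, np.nan)
    below = y <= y0                 # at or under background
    above = y >= y0 + b             # at or over saturation
    inside = ~(below | above)
    x[below] = 0.0
    x[above] = np.inf
    x[inside] = K[inside] * (y[inside] - y0[inside]) / (y0[inside] + b[inside] - y[inside])
    return x

y0, b, K = 100.0, 10_000.0, 100.0              # hypothetical, same for every probe
y = np.full(16, y0 + b * 50.0 / (50.0 + K))    # 16 probes at a true 50 pM
y[0] = 80.0                                    # below background
y[1] = 10_500.0                                # above saturation
y[2] = 10_200.0                                # above saturation

estimates = invert_with_limits(y, y0, b, K)
median_estimate = float(np.median(estimates))
mean_estimate = float(np.mean(estimates))
```

Here three of the 16 probes fall outside the invertible range, yet the median remains at the true concentration, whereas the mean is infinite.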
42
Calculated mRNA concentration vs. true values
43
Compare with MAS5 and RMA
44
  • Even this mindlessly simple algorithm is an
    improvement on the currently available
    expression measures!

45
The challenge is to find an algorithm that will
predict $y_0$, $b$ and $K$ for any given probe sequence
  • Recall the Langmuir isotherm
    $y(x) = y_0 + b\,\frac{x}{x + K}$
  • Parameters $y_0$, $b$ and $K$ are probe dependent:
    is there an explanation from physical chemistry?
  • Work in progress

46
Improvements to naïve Langmuir model
  • Include cross-hybridization: competition from
    mRNA other than the intended target sequence

Rate of uptake of specific target
Rate of uptake of non-specific target
47
  • Include dynamics of probe-target binding

Even with these two improvements, the
hyperbolic form of the isotherm is preserved!
48
  • The Langmuir isotherm
    $y(x) = y_0 + b\,\frac{x}{x + K}$
    is still appropriate (but the three parameters
    have less simple meanings).
  • This model enables a comparison between PM and MM
    probe parameters in terms of binding free
    energies

49
Langmuir isotherms for PM and MM
Affy spike-in experiment, Gene 37777_at
(Red = PM, Black = MM)
50
(No Transcript)
51
  • ... which is where we are up to.
  • Where is it going?
  • The final aim is to combine our adsorption model with
    existing models (e.g. the Position Dependent Nearest
    Neighbour model) to find an algorithm for
    determining $y_0$, $b$ and $K$ for any probe sequence.
  • This will provide a practical way of measuring
    absolute concentration of mRNA in biological
    samples

52
References
  • Statistical Analysis of Adsorption Models for
    Oligonucleotide Microarrays,
    Statistical Applications in Genetics and
    Molecular Biology, 2004 (to appear)
  • An Adsorption Model of Hybridization Behaviour
    on Oligonucleotide Microarrays,
    ePrint arXiv q-bio.BM/1411005

53
Acknowledgements
  • Susan Wilson (CBiS/CMA, ANU)
  • Yvonne Pittelkow (CBiS, ANU)
  • C.B. (CBiS/JCSMR, ANU)