Comparison of Various Normalisation Strategies for Microarray Analysis

C. Steinhoff, U.A. Nuber, R. Versteeg and M. Vingron
Max Planck Institute for Molecular Genetics, Dept. Computational Molecular Biology, Ihnestr. 73, D-14195 Berlin, Germany
ABSTRACT
In microarray experiments the expression levels of thousands of genes are measured simultaneously. Because of the many variable experimental steps, such as probe acquisition, preparation, labelling, hybridisation and scanning, the resulting data are highly variable, very noisy and have no fixed scale.
These systematic variation effects, when not detected and analysed, will affect further data analysis and interpretation. In order to compare independent microarray experiments these effects should be detected and removed. A number of approaches to these problems have been proposed in the literature. So far there is neither a consensus about the use of the different existing normalisation methods nor an overall comparison of the different types of analyses. Furthermore, it is unclear to what extent different normalisation strategies influence different types of analysis, such as the detection of differential genes or classification, as well as further biological interpretation.

We examined a number of normalisation strategies and applied them to a repetition series of dye swap experiments. In our experimental setting we first applied various normalisation methods and then studied the effect of the type of strategy on the detection of differential genes. For that purpose we performed northern blots of a subset of those genes which were detected as differential by the different methods, and we also compared the data with corresponding SAGE experiments. We thus contrast the detection of outliers after different preceding normalisation methods with outlier detection by northern blotting or SAGE.

Overall, normalisation methods can be divided into two groups: those which use subsets of the spotted sequences (for example housekeeping genes, spiked controls etc.) to normalise the whole dataset, and those which use all clones/probes for normalisation. The latter approach assumes that most genes remain unchanged in a sample-control setting. We applied different normalisation methods using all probes on the chip. These methods can be divided into (a) scaling methods, (b) regression methods, (c) methods analysing and stabilising variable variance across the spectrum of intensities, (d) methods analysing various influencing factors, such as ANOVA, and (e) distribution based methods.

(a) Scaling methods only adjust for overall and linear effects across the slides. Dividing the raw intensities by the overall mean, median or shorth, or (b) applying linear regression, has been used in different studies. Dudoit et al. [1] estimated a smooth normalising curve by local regression.
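
As a rough illustration of (a) and (b), the following minimal sketch scales one channel by its median and removes a global linear trend from the log ratios. It assumes a two-colour setting in which the log ratio M is regressed on the mean log intensity A; this formulation, the function names and the toy intensities are assumptions made for illustration and are not taken from the poster. A local regression as in Dudoit et al. [1] would replace the single straight line by a smooth curve.

```python
import math
from statistics import mean, median

def median_scale(intensities):
    """Scaling: divide raw intensities by their overall median."""
    m = median(intensities)
    return [x / m for x in intensities]

def linear_regression_normalise(red, green):
    """Regression: fit M = a + b*A by least squares and return the residuals.

    M is the log2 ratio and A the mean log2 intensity of each spot
    (an assumed formulation, not specified in the poster).
    """
    M = [math.log2(r / g) for r, g in zip(red, green)]
    A = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    a_bar, m_bar = mean(A), mean(M)
    sxx = sum((a - a_bar) ** 2 for a in A)
    sxy = sum((a - a_bar) * (m - m_bar) for a, m in zip(A, M))
    slope = sxy / sxx
    intercept = m_bar - slope * a_bar
    # The residuals are the normalised log ratios.
    return [m - (intercept + slope * a) for a, m in zip(A, M)]

# Made-up intensities for five spots (illustration only).
red = [120.0, 450.0, 900.0, 3000.0, 15000.0]
green = [100.0, 300.0, 800.0, 2000.0, 9000.0]

scaled_red = median_scale(red)
normalised_M = linear_regression_normalise(red, green)
```

After median scaling the channel has median 1, and the regression residuals are centred on zero, which is what the "most genes unchanged" assumption requires.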
(c) The variance of the log intensities in microarray experiments tends to increase as the mean log intensity decreases, so the variance depends strongly on the mean. To address this, Kepler et al. [3] proposed estimating the normalised expression levels and the expression-level-dependent error variance by local regression. Huber et al. [4] proposed a variance stabilising transformation to obtain log transformed data with approximately constant variance across the dataset.

(d) Kerr et al. [2] proposed an ANOVA approach to model log intensities. They analysed specific effects due to probe, dye, array and gene, and the probe-gene and array-gene interactions. The parameters are determined by maximum likelihood estimation.

We used three repetitions of dye swap experiments and applied the normalisation strategies mentioned above. The different lists of outliers resulting from the different normalisations were compared by defining distance measures on rank ordered differential genes. Seven genes which were classified as differential according to the different normalisations were subjected to northern blot analysis. All cases of differential expression could be confirmed. Results from northern blotting and the preceding detection of outliers by the different methods were correlated.
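
The variance stabilising idea mentioned under (c) can be sketched with an arsinh-type transformation, which behaves like a logarithm at high intensities but stays finite and nearly linear close to zero. The parameters `a` and `b` below are placeholders chosen for illustration; in practice they would have to be estimated from the data, and this sketch does not attempt to do so.

```python
import math

def arsinh_transform(y, a=0.0, b=1.0):
    """Return arsinh(a + b*y); a and b are illustrative, not fitted, parameters."""
    return math.asinh(a + b * y)

# Compare with the plain logarithm over a range of made-up intensities.
for y in (0.0, 1.0, 10.0, 1000.0, 100000.0):
    h = arsinh_transform(y)
    log_like = math.log(2 * y) if y > 0 else float("-inf")
    print(f"y={y:>9}: arsinh={h:.4f}  log(2y)={log_like:.4f}")
```

For large intensities the two columns agree closely, while at zero the arsinh value is 0 rather than minus infinity.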
Apart from ZScore and Variance Stabilisation, a high correlation between the expression differences detected by northern blotting and by microarray was seen. On the other hand there was no apparent correlation with SAGE data.

References
[1] Dudoit S, Yang YH, Speed TP, Callow MJ. Statistica Sinica, 12:111-139, 2002.
[2] Kerr MK, Martin M, Churchill GA. Journal of Computational Biology, 7:819-837, 2000.
[3] Kepler TB, Crosby L, Morgan KT. Genome Biology, 3(7):research0037.1-0037.12, 2002.
[4] Huber W, von Heydebreck A, Sültmann H, Poustka A, Vingron M. Bioinformatics, ISMB, 2002.
Figure: overview of normalisation methods.
User defined sets (housekeeping genes, controls etc.): useful for "most genes changed" settings
Entire dataset: useful for "most genes unchanged" settings
(a) Scaling methods
  • Mean
  • Median (2)
  • Shorth
  • Zscore (3)
(b) Regression methods
  • Overall (4) linear/polynomial
  • Local (5) linear/polynomial
(c) Transformation methods
  • Variance stabilisation (6)
(d) Analysis of Variance / ML based methods
  • ANOVA (7)
(e) Distribution based
  • Quantile normalisation
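
Quantile normalisation, listed above among the distribution based methods, is not described further in the poster. A minimal sketch of the usual procedure is given here under stated assumptions: every array is forced onto a common reference distribution, taken as the mean of the sorted values across arrays. The function name and the toy values are hypothetical, and ties are simply broken by position.

```python
def quantile_normalise(arrays):
    """Give every array the same empirical distribution.

    arrays: list of equal-length lists, one list per array or channel.
    """
    n = len(arrays[0])
    sorted_cols = [sorted(col) for col in arrays]
    # Reference distribution: mean of the k-th smallest values across arrays.
    reference = [sum(col[k] for col in sorted_cols) / len(arrays) for k in range(n)]
    result = []
    for col in arrays:
        # Indices of the spots ordered by increasing intensity.
        order = sorted(range(n), key=lambda idx: col[idx])
        normalised = [0.0] * n
        for rank, idx in enumerate(order):
            normalised[idx] = reference[rank]
        result.append(normalised)
    return result

# Three made-up arrays with four spots each (illustration only).
arrays = [[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0], [3.0, 4.0, 6.0, 8.0]]
normalised = quantile_normalise(arrays)
```

After the call each array contains the same set of values, while the rank order of the spots within each array is preserved.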

Figure: 1% maximal differential genes (red) using different normalisations. Data: Microarray, SAGE. Panels: Raw, Median, ZScore, Least Med Regr, Local Regr, Var Stab, ANOVA. Axis label: logproduct. Each panel is annotated with the gene sets "p < 0.05", "> 2-fold" and "500 most diff" and with one value; in extraction order the values are 36.2, 35.4, 30.0, 36.4, 36.2, 36.0, 36.4.

Pairwise distances according to the 1% maximal differential genes between different normalisation methods, with genes ordered by abs(logratio):

$$d(i,j) = 1 - \frac{6}{N(N^2-1)} \sum_{k,l=1}^{N} d(i,j)_{k,l}, \qquad d(i,j)_{k,l} = \begin{cases} \mathrm{rank}(\mathrm{gene}_k) - \mathrm{rank}(\mathrm{gene}_l) & \text{if it exists} \\ N+1 & \text{else} \end{cases}$$

$$d(i,j) = N - \#\{\text{shared genes}\}$$

Panel labels: single linkage clustering (two panels).
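
The shared-gene distance and the subsequent single linkage clustering can be sketched as follows. Only the second distance, N minus the number of shared genes, is implemented, because the rank-based formula is not fully recoverable from the transcript. The log ratios, gene names and function names are made up for illustration and are not the poster's data.

```python
from itertools import combinations

def top_genes(log_ratios, n):
    """Return the n gene IDs with the largest abs(log ratio)."""
    ranked = sorted(log_ratios, key=lambda g: abs(log_ratios[g]), reverse=True)
    return set(ranked[:n])

def shared_gene_distance(lr_i, lr_j, n):
    """d(i,j) = N - number of genes shared by the two top-N lists."""
    return n - len(top_genes(lr_i, n) & top_genes(lr_j, n))

def single_linkage(names, dist):
    """Repeatedly merge the two clusters whose closest members are nearest.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [frozenset([name]) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(dist[frozenset((p, q))]
                        for p in clusters[x] for q in clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        merges.append((sorted(clusters[x]), sorted(clusters[y]), d))
        merged = clusters[x] | clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return merges

# Made-up log ratios per normalisation method (illustration only).
methods = {
    "Median":    {"g1": 2.1, "g2": -1.8, "g3": 0.2, "g4": 0.9, "g5": -0.1},
    "LocalRegr": {"g1": 2.0, "g2": -1.7, "g3": 0.3, "g4": 0.8, "g5": -0.2},
    "ZScore":    {"g1": 0.4, "g2": -1.9, "g3": 1.5, "g4": 0.1, "g5": -1.2},
}
N = 2
names = list(methods)
dist = {frozenset((a, b)): shared_gene_distance(methods[a], methods[b], N)
        for a, b in combinations(names, 2)}
merges = single_linkage(names, dist)
```

In this toy example the two methods with identical top-N lists are merged first at distance 0, and the third method joins at distance 1.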
Figure: northern blots, panels 1 to 4, for ID 4695, ID 1193, ID 2145 and ID 3657. Lanes: P = Patient, C = Control.
Figure: 1% maximal differential genes (red) with minimal p-value (horizontal line) using different normalisations. Axis label: logratio.

Figure: pairwise distances according to the 1% maximal differential genes with p-value < 0.01 between different normalisation methods. Panel labels: single linkage clustering (two panels).

Correlation values annotated in this region, in extraction order: 0.867, 0.867, 0.859, 0.232, 0.599, 0.864.
Fold change (raw data)

ID      Microarray   Northern
1193     1.603        2.015
2145    -2.046       -1.944
3176    -1.803       -1.910
3657     1.812        1.883
3993    -2.390       -2.312
4460     3.590        1.845
4695     1.211        1.990

corr 0.867
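
A correlation between the two columns of the table can be computed along the following lines. The sketch uses the Pearson coefficient on the signed fold changes as listed; the poster does not state which correlation measure or scale was used, so the result of this sketch need not match the reported value.

```python
import math

# Signed fold changes from the table above (raw data).
microarray = [1.603, -2.046, -1.803, 1.812, -2.390, 3.590, 1.211]
northern = [2.015, -1.944, -1.910, 1.883, -2.312, 1.845, 1.990]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson(microarray, northern)
print(f"Pearson correlation: {r:.3f}")
```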