Title: Microarray Data Analysis
1Microarray Data Analysis - The Need for Normalization
Christine Steinhoff
Max Planck Institut für Molekulare Genetik, Berlin, Germany
2Types of Arrays
Red/Green experiments
Affymetrix Chips
Radioactive filters
3Data Analysis - Procedure
?
4Data Analysis - Procedure
5Example Quality Check
Ratio of intensities of both channels
Yang, YH et al, SPIE BiOS, San Jose 2001
Product of intensities of both channels
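The two quantities plotted in this quality check can be computed per spot as sketched below. This is only an illustration: the function name and the example intensities are hypothetical, and taking base-2 logarithms is an assumption (the later slides plot log product vs. log ratio without naming the base).

```python
import math

def ratio_and_product(red, green):
    """Return (log2 ratio, log2 product) for one spot; both intensities must be > 0."""
    log_ratio = math.log2(red / green)    # difference between the two channels
    log_product = math.log2(red * green)  # overall brightness of the spot
    return log_ratio, log_product

# Hypothetical (red, green) intensities for three spots.
spots = [(1200.0, 1100.0), (80.0, 160.0), (5000.0, 5000.0)]
values = [ratio_and_product(r, g) for r, g in spots]
```

Plotting the log ratio against the log product for all spots makes intensity-dependent effects visible as a trend away from the horizontal line at 0.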
7Data Analysis - Procedure
Starting with image processing
Scanner output: information about spot intensities, local background, pins, PCR plates, localization, standard deviation ...
Quality check: are there any effects due to pins, PCR plates, local effects ...?
8What is Normalization ?
Systematic Variation in Microarray Experiments
- Saturation (Scanner, Labelling)
- Nonlinearity of Cy5, Cy3 Labelling
- Efficiencies of Cy5, Cy3 Labelling
- Variation of Low Intensities
- Pins
- PCR Plates
- Local Effects ...
Normalization is the process of describing and removing such variation.
9What is Normalization ?
10Why do we normalize ? Do we need normalization ?
Goal: reliable measurement of ratios, patient vs. control
Patient(red)/Control(green)
Patient(green)/Control(red)
In a self-self hybridization we would expect green/red = 1 for all genes.
Mixture of:
- unequal labelling
- noise
- non-constant variance
- differential expression (not in this example!)
...
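The self-self expectation can be checked numerically: if green/red = 1 held for all genes, the median log ratio would be 0. The sketch below uses hypothetical intensities and a hypothetical helper name; it is one simple way to spot an overall dye effect, not the check used for the slides.

```python
import math
import statistics

def median_log_ratio(red, green):
    """Median of log2(red/green) over all spots; about 0 if both channels agree."""
    return statistics.median(math.log2(r / g) for r, g in zip(red, green))

# Hypothetical self-self intensities in which green is roughly twice as bright as red.
red = [100.0, 210.0, 400.0, 820.0]
green = [200.0, 400.0, 800.0, 1600.0]
bias = median_log_ratio(red, green)  # clearly below 0: a systematic channel difference
```

A median far from 0 in a self-self experiment cannot be differential expression, so it points to unequal labelling or scanning.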
11Normalization Methods
Entire Dataset (useful for "most genes unchanged" settings)
- Overall Mean, Median, Shorth
- Overall Regression
- Local Regression
- Zscore
- ANOVA
- Variance Stabilization
User Defined Sets (useful for "most genes changed" settings)
- Housekeeping (?!)
- Controls
- etc.
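Two of the simplest whole-dataset methods listed above, overall median and Zscore, can be sketched as follows. The sketch assumes they are applied to log ratios, which the slides do not state explicitly; the function names and the example values are hypothetical.

```python
import statistics

def median_center(log_ratios):
    """Overall median normalization: shift log ratios so that their median is 0."""
    med = statistics.median(log_ratios)
    return [m - med for m in log_ratios]

def zscore(log_ratios):
    """Zscore normalization: subtract the mean and divide by the standard deviation."""
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)
    return [(m - mean) / sd for m in log_ratios]

# Hypothetical log ratios with an overall shift of about -1 and one changed gene.
log_ratios = [-1.2, -0.9, -1.0, -1.1, 0.8]
centered = median_center(log_ratios)
scaled = zscore(log_ratios)
```

Both rely on most genes being unchanged: the median (or mean) is then taken to describe the systematic shift rather than biology.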
12Normalization Strategies
Local Regression: determine regression lines locally
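The idea of determining regression lines locally can be sketched as below: for each spot, a straight line is fitted to the neighbouring spots (neighbours in overall intensity), and the fitted value is subtracted from the log ratio. This is a simplified, non-robust stand-in for loess-type smoothing, not the implementation behind the slides; the tricube weights, the `frac` parameter, and all names and data are assumptions.

```python
def local_fit(x, y, x0, frac=0.5):
    """Tricube-weighted straight-line fit around x0, evaluated at x0.

    x0 is expected to be one of the data points in x.
    """
    k = max(2, int(round(frac * len(x))))
    dists = sorted(abs(xi - x0) for xi in x)
    h = dists[k - 1] or 1e-12  # bandwidth: distance to the k-th nearest point
    w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)

def local_regression_normalize(intensity, log_ratio, frac=0.5):
    """Subtract the locally fitted trend from each log ratio."""
    return [m - local_fit(intensity, log_ratio, a, frac)
            for a, m in zip(intensity, log_ratio)]

# Hypothetical data: log ratios bend away from 0 at low and high intensities.
intensity = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0, 14.0]
log_ratio = [0.1 * (a - 10.0) ** 2 for a in intensity]
normalized = local_regression_normalize(intensity, log_ratio)
```

Because the trend is estimated separately in each neighbourhood, a curved dependence on intensity is followed, which a single overall regression line cannot do.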
13Normalization Strategies
14Comparison of Normalization Strategies
1% maximal differential genes (red, 138 genes); the 5% lowest expressed genes (green, 691 genes) are discarded beforehand
Plots: log product vs. log ratio of normalized intensities
No Normalization
Linear Regression
Local Regression
Overall Median
Zscore
ANOVA
Variance Stabilization
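The gene selection used for this comparison can be sketched as follows. It assumes that "lowest expressed" is judged by the log product, that "maximal differential" means largest absolute log ratio, and that the 1% refers to the total number of genes; the function name and parameters are hypothetical.

```python
def select_top_differential(log_ratios, log_products, drop_frac=0.05, top_frac=0.01):
    """Return indices of the most differential genes after dropping the weakest spots."""
    n = len(log_ratios)
    n_drop = int(n * drop_frac)  # number of lowest expressed genes to discard
    n_top = int(n * top_frac)    # number of genes to report
    by_intensity = sorted(range(n), key=lambda i: log_products[i])
    kept = by_intensity[n_drop:]
    ranked = sorted(kept, key=lambda i: abs(log_ratios[i]), reverse=True)
    return ranked[:n_top]
```

Applied after each normalization method in turn, the selected sets can then be compared between methods.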
15Comparison of Normalization Strategies
Goal: detection of differentially expressed genes
Set of 30 maximal differential genes out of 13824
16Comparison of Normalization Strategies
17Comparison of Normalization Strategies
Goal: detection of differentially expressed genes
Methods compared: Var Stab, ANOVA, Lin Regr, Least Med, Local Regr, Mean, Median, Shorth, Zscore, Raw
18What is the right Normalization-method to use ?
General assumption: most genes unchanged!
Variance Stabilization
- quantifiable differential expression!
- high variance in low intensities
Local Regression
- removing spatial effects of various types (pins, PCR plates, nonlinearity of labelling)
- overall nonlinear data
ANOVA
- analyzing for a variety of influences
- specific for dye-swap setting
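Why a variance-stabilizing transformation helps with low intensities can be illustrated with the arsinh function, one commonly used choice. The slides do not say which transformation was applied, so this is an assumption; `scale` is a hypothetical fixed constant, whereas in practice such parameters are estimated from the data.

```python
import math

def glog_difference(red, green, scale=100.0):
    """Difference of arsinh-transformed intensities (a generalized log ratio)."""
    return math.asinh(red / scale) - math.asinh(green / scale)

high = glog_difference(40000.0, 20000.0)  # behaves like ln(red/green) at high intensity
low = glog_difference(20.0, 10.0)         # the same twofold ratio is shrunk near background
neg = glog_difference(-5.0, 10.0)         # still defined after background subtraction
```

For bright spots the result is close to an ordinary log ratio, so differences stay quantifiable, while noisy twofold ratios near background no longer stand out.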
19Summary
- Quality check can detect removable effects which otherwise could lead to false positives
- Normalization of the data is essential to
  (1) remove systematic effects and
  (2) make experiments comparable
- Different normalization methods can lead to different interpretations of the data
- Depending on the experimental question, different methods are appropriate:
  - for quantifiable expression differences, e.g. Variance Stabilization
  - for removal of spatial effects and overall nonlinear data, e.g. Local Regression
  - for detection of various influences in dye-swap settings, e.g. ANOVA
- For Red/Green experiments always use dye swaps
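One reason dye swaps are useful can be shown with a small calculation. Assume, as a simplification, that each measured log ratio is the true log ratio plus a dye bias that is the same in both hybridizations. In the swapped array the true log ratio changes sign while the bias does not, so half the difference recovers the true value. Names and numbers below are hypothetical.

```python
def dye_swap_average(m_forward, m_swapped):
    """Combine a dye-swap pair of log ratios (red/green) gene by gene.

    m_forward: log2(Patient(red) / Control(green))
    m_swapped: log2(Control(red) / Patient(green))
    """
    return [(f - s) / 2 for f, s in zip(m_forward, m_swapped)]

# Hypothetical true log ratios (patient vs. control) and a constant dye bias of 0.5.
m_forward = [1.5, 0.5, -1.5]  # true + bias
m_swapped = [-0.5, 0.5, 2.5]  # -true + bias
combined = dye_swap_average(m_forward, m_swapped)
```

Gene-specific dye effects cancel in the same way, which a single red/green hybridization cannot separate from differential expression.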
20Acknowledgement
Human Molecular Genetics: Ulrike Nuber, H.-Hilger Ropers
Computational Molecular Biology: Martin Vingron