2. Data quality assessment and normalization

1
2. Data quality assessment and normalization
  • Alex Sánchez. Dept. Estadística
  • Universitat de Barcelona

2
Outline
  • Microarray data quality diagnostic plots
  • Pre-processing
  • Image analysis
  • Normalization

3
Microarray studies life cycle
Here we are
4
Looking at microarray data
  • Diagnostic Plots

5
Diagnostic plots
  • Plots can be used to check microarray quality and
    to select the appropriate normalization
  • Many types available
  • Image (Fg, Bg) plots, histograms, spatial plots,
    box plots, scatterplots, MA plots
  • Qualitative approach
  • No threshold
  • Needs some experience of seeing good and bad plots

6
Red / Green overlay images
  • Start by looking at the slides

Bad: high background
Good: low background
7
Signal/Noise histograms
Images with high background tend to have lower
log2(signal/noise) ratios
8
Spatial plots for slide backgrounds
Log-ratios (M)
9
Spatial plot of high intensity log ratios
If there are no spatial effects, high-intensity
spots should be uniformly distributed
Top (black) and bottom (green) 5% of log ratios
10
Scatterplots: always log, always rotate
log2 R vs log2 G
M = log2(R/G) vs A = log2 √(RG)
Rather than plotting log2 R vs log2 G, the M-A plot is better
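
The rotation from (log2 G, log2 R) to (A, M) can be sketched in a few lines. This is only an illustration: the function name ma_values and the intensities are made up, and all intensities are assumed to be strictly positive (for example after background correction).

```python
import math

def ma_values(red, green):
    """Return (M, A) for paired red/green spot intensities (all > 0)."""
    # M is the log ratio; A is the mean log intensity, i.e. log2 of sqrt(R*G).
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * (math.log2(r) + math.log2(g)) for r, g in zip(red, green)]
    return m, a

# Made-up intensities for three spots, for illustration only.
red = [1024.0, 512.0, 100.0]
green = [256.0, 512.0, 400.0]
m, a = ma_values(red, green)
for mi, ai in zip(m, a):
    print(f"A = {ai:.2f}, M = {mi:+.2f}")
```

Plotting M against A puts unchanged spots around the horizontal line M = 0, which is easier to judge by eye than the diagonal of the log2 R vs log2 G plot.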
11
MA-plot for spotted arrays (2 colors)
Samples: Mutant (MT) and Wild Type (WT), labeled as Cy3/Cy5 cDNA or aRNA
MT and WT intensity for each probe (spot)
M = log2(MT/WT)
A = log2(MT*WT) / 2 (signal strength)
12
MA-plot for GeneChip arrays (1 color)
MT: aRNA → RMA → MT intensity for each probe set
WT: aRNA → RMA → WT intensity for each probe set
M = log2(MT/WT)
A = log2(MT*WT) / 2 (signal strength)
13
Pin-group effects
MA-plot
Boxplot
Boxplots of log ratios by pin group
Lowess lines through points from pin groups
14
Highlighting pin group effects
Log-ratios
Print-tip groups
Scatterplot and boxplots show a clear spatial
bias which may be associated with sample
preparation because spatially defined groups
are of different colours
15
Slide effects
16
Normalization
  • Addressing systematic bias

17
Preprocessing normalization
  • The word normalization describes techniques used
    to suitably transform the data before they are
    analysed.
  • Goal is to correct for systematic differences
    between samples on the same slide, or between
    slides, which do not represent true biological
    variation between samples.

18
The origin of systematic differences
  • Systematic differences may be due to
      • Dye biases, which vary with spot intensity
      • Location on the array
      • Plate origin
      • Printing quality, which may vary between pins
        and with time of printing
      • Scanning parameters

19
Dye bias
  • Cy3 and Cy5 are relatively unstable, and may
    present different incorporation efficiencies
    during labeling, different quantum efficiencies,
    and are detected by the scanner with different
    efficiencies.
  • Normalization is performed to balance the
    fluorescence intensities of the two dyes, as well
    as to allow the comparison of expression levels
    across experiments (slides).

20
How to know if it's necessary?
  • Look at diagnostic plots for dye, slide or
    spatial effects
  • Perform a self-self hybridization
  • If we hybridize a sample with itself instead of
    sample vs control, intensities should be the same
    in both channels
  • Any deviation from this equality means there is
    systematic bias that needs correction
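
A rough numerical counterpart of the self-self check is to summarize how far the log ratios sit from zero. The sketch below is an assumption-laden illustration, not a procedure from the slides: the function name self_self_bias and the intensities are invented, and the slides give no threshold for what counts as too much bias.

```python
import math
from statistics import median

def self_self_bias(red, green):
    """Return (median M, largest |M|) for a self-self hybridization."""
    # With no systematic bias, every M = log2(R/G) would be close to 0.
    m = [math.log2(r / g) for r, g in zip(red, green)]
    return median(m), max(abs(v) for v in m)

# Made-up example: the red channel reads about 10 to 12% higher everywhere.
red = [220.0, 450.0, 900.0, 1800.0]
green = [200.0, 400.0, 800.0, 1600.0]
med_m, max_abs_m = self_self_bias(red, green)
print(f"median M = {med_m:.3f}, max |M| = {max_abs_m:.3f}")
```

A median M clearly away from zero, as in this toy data, points to a dye bias; the diagnostic plots then show whether it also depends on intensity or position.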

21
R vs G plot
DIRECT REPRESENTATION OF INTENSITY VALUES
22
log R vs log G
WELL NORMALIZED DATA SHOULD FOLLOW THE DIAGONAL
Y = X
23
M vs A
WELL NORMALIZED DATA SHOULD FOLLOW THE HORIZONTAL
Y = 0
24
Self-self hybridizations
False color overlay
Boxplots within pin-groups
Scatter (MA-)plots
25
Some non self-self hybridizations
From the NCI60 data set
Early Ngai lab, UC Berkeley
Early PMCRI, Melbourne Australia
Early Goodman lab, UC Berkeley
26
Normalization: methods and issues
  • Methods
      • Global adjustment
      • Median normalization
      • Regression based normalization
      • Intensity dependent normalization
      • Within print-tip group normalization
      • And many others
  • Selection of spots for normalization

27
Global normalization
  • Based on a global adjustment
  • log2(R/G) → log2(R/G) - c = log2(R/(kG))
  • Choices for k or c = log2 k are
      • c = median or mean of log ratios for a particular
        gene set: all genes, or control or housekeeping
        genes.
      • Total intensity normalization, where
        k = Σ Ri / Σ Gi.
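
Both choices of the global constant can be sketched as follows. The function names and the intensities are illustrative assumptions; the gene set used here is simply all spots.

```python
import math
from statistics import median

def global_median_normalize(m_values):
    """Subtract c = median of the log ratios from every log ratio."""
    c = median(m_values)
    return [m - c for m in m_values], c

def total_intensity_factor(red, green):
    """k = sum(R_i) / sum(G_i), the total intensity scaling factor."""
    return sum(red) / sum(green)

# Made-up intensities for four spots.
red = [200.0, 400.0, 800.0, 1600.0]
green = [100.0, 200.0, 400.0, 400.0]
m = [math.log2(r / g) for r, g in zip(red, green)]

# Median normalization: shift all log ratios by c.
m_median, c = global_median_normalize(m)

# The same shift written as a rescaling of the green channel, with c = log2 k.
k_median = 2.0 ** c
m_rescaled = [math.log2(r / (k_median * g)) for r, g in zip(red, green)]

# Total intensity normalization uses a different k.
k_total = total_intensity_factor(red, green)
m_total = [math.log2(r / (k_total * g)) for r, g in zip(red, green)]
print(c, k_total)
```

The two forms log2(R/G) - c and log2(R/(kG)) give the same values whenever c = log2 k, which is why the slides treat k and c interchangeably.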

28
Example (Callow et al. 2002): Global median
normalization.
29
Regression normalization
  • Linear regression (calibration)
  • log(Cy3) = a + b log(Cy5)
  • Use estimates of a and b to normalize the data
  • Alternative: regression through the origin
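
A minimal sketch of the calibration idea, using ordinary least squares written out by hand. The slides do not say exactly how a and b are applied, so the step below (replacing log(Cy5) by a + b log(Cy5) so that the two channels line up) is an assumption, as are the function names and the synthetic data.

```python
def fit_line(x, y):
    """Least-squares estimates (a, b) for y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

def fit_through_origin(x, y):
    """Least-squares slope for y = b * x (no intercept)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

# Synthetic log intensities with a known linear dye bias.
log_cy5 = [6.0, 8.0, 10.0, 12.0]
log_cy3 = [0.5 + 1.1 * v for v in log_cy5]

a, b = fit_line(log_cy5, log_cy3)
# Assumed normalization step: calibrate Cy5 onto the Cy3 scale.
log_cy5_calibrated = [a + b * v for v in log_cy5]
log_ratio_after = [y - x for y, x in zip(log_cy3, log_cy5_calibrated)]

b_origin = fit_through_origin(log_cy5, log_cy3)
print(a, b, b_origin)
```

On this noise-free toy data the calibrated log ratios are all zero; on real arrays the residuals are what remains after the linear bias is removed.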

30
Regression normalization
Before normalization
After normalization
31
Intensity-dependent normalization
  • Dye bias is not linear, as can easily be seen in
    an MA plot
  • Run a line through the middle of the MA plot,
    shifting the M value of the pair (A, M) by c = c(A),
    i.e. log2(R/G) → log2(R/G) - c(A) = log2(R/(k(A)G)).
  • One estimate of c(A) is made using the LOWESS
    function of Cleveland (1979): LOcally WEighted
    Scatterplot Smoothing.
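
The intensity-dependent shift can be sketched with a small locally weighted linear smoother. This is a simplified stand-in for Cleveland's LOWESS, not the full algorithm: it uses tricube weights and a local straight line but omits the robustness iterations, and the function name lowess_fit, the frac parameter value and the data are illustrative assumptions.

```python
import math

def lowess_fit(x, y, frac=0.5):
    """Local linear fit with tricube weights, evaluated at each x.

    Simplified sketch: no robustness iterations, O(n^2) time.
    """
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))  # points in each local window
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1]  # distance to the k-th nearest point
        if h > 0:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        else:
            w = [1.0 if d == 0 else 0.0 for d in dist]
        sw = sum(w)
        mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
                  for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted

# Synthetic MA data with a curved dye bias and no real differential expression.
a = [6.0 + 0.5 * i for i in range(17)]
m = [0.05 * (ai - 10.0) ** 2 for ai in a]

c_of_a = lowess_fit(a, m, frac=0.4)
m_norm = [mi - ci for mi, ci in zip(m, c_of_a)]
print(max(abs(v) for v in m), max(abs(v) for v in m_norm))
```

After subtracting c(A) the curved trend is largely gone, which is what a global constant could not achieve. For real data an established implementation with robustness iterations would be the safer choice.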

32
Intensity-dependent normalization

33
Example (Callow et al. 2002): loess vs median
normalization.
34
Example (Callow et al. 2002): Global median
normalization.
  • Global normalization performs a global correction
    but it cannot account for spatial effects.
  • See next slide: boxplots for the same situations
    in only one mouse, showing all sectors

35
Global normalisation does not correct spatial
bias (print-tip-sectors)
36
Within print-tip group normalization
  • To correct for spatial bias produced by
    hybridization artefacts or print-tip or plate
    effects during the construction of arrays.
  • To correct for both print-tip and
    intensity-dependent bias, perform LOWESS fits to
    the data within print-tip groups, i.e.
  • log2(R/G) → log2(R/G) - ci(A) = log2(R/(ki(A)G)),
    where ci(A) is the LOWESS fit to the MA-plot for
    the ith grid only.
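
The within-group version applies the same kind of fit separately to the spots of each print-tip group. The sketch below reuses a simplified local linear smoother (tricube weights, no robustness iterations) as a stand-in for LOWESS; the function names, the frac value and the two-group data are illustrative assumptions.

```python
import math
from collections import defaultdict

def lowess_fit(x, y, frac=0.5):
    """Simplified local linear fit with tricube weights at each x."""
    n = len(x)
    k = min(n, max(2, math.ceil(frac * n)))
    fitted = []
    for x0 in x:
        dist = [abs(xi - x0) for xi in x]
        h = sorted(dist)[k - 1]
        if h > 0:
            w = [(1 - (d / h) ** 3) ** 3 if d < h else 0.0 for d in dist]
        else:
            w = [1.0 if d == 0 else 0.0 for d in dist]
        sw = sum(w)
        mean_x = sum(wi * xi for wi, xi in zip(w, x)) / sw
        mean_y = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mean_x) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mean_x) * (yi - mean_y)
                  for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted

def print_tip_normalize(a, m, tip, frac=0.5):
    """Subtract c_i(A), fitted separately within each print-tip group i."""
    groups = defaultdict(list)
    for index, group_id in enumerate(tip):
        groups[group_id].append(index)
    m_norm = [0.0] * len(m)
    for indices in groups.values():
        fit = lowess_fit([a[i] for i in indices], [m[i] for i in indices], frac)
        for i, ci in zip(indices, fit):
            m_norm[i] = m[i] - ci
    return m_norm

# Two made-up print-tip groups with different biases.
a = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0] * 2
tip = [1] * 6 + [2] * 6
m = [0.5 + 0.1 * ai for ai in a[:6]] + [-0.3] * 6
m_norm = print_tip_normalize(a, m, tip, frac=0.8)
print(m_norm)
```

Each group gets its own curve, so a bias that differs between sectors of the slide is removed even though no single global curve would fit both groups.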

37
Local print-tip normalisation corrects spatial
bias (print-tip-sectors)
38
Normalization, which spots to use?
  • LOWESS can be run through many different sets of
    points:
      • All genes on the array.
      • Constantly expressed genes (housekeeping).
      • Controls.
      • Spiked controls (genes from distant species).
      • Genomic DNA titration series.
      • Rank invariant set.

39
Strategies for selecting a set of spots for
normalization
  • Use of a global LOWESS approach can be justified
    by supposing that, when stratified by mRNA
    abundance, either
      • Only a minority of genes are expected to be
        differentially expressed, or
      • Any differential expression is as likely to be
        up-regulation as down-regulation.
  • Pin-group LOWESS requires the stronger assumption
    that one of the above applies within each
    pin-group.

40
Summary
  • Microarray experiments have many hot spots
    where errors or systematic biases can appear
  • Visual and numerical quality control should be
    performed
  • Usually intensities will require normalisation
  • At least global or intensity dependent
    normalisation should be performed
  • More sophisticated procedures rely on stronger
    assumptions, so a balance must be found

41
Acknowledgments
  • Special thanks to Yee Hwa Yang (UCSF) for
    allowing me to use some of her materials
  • Sandrine Dudoit and Terry Speed, U.C. Berkeley
  • M. Carme Ruíz de Villa, U. Barcelona
  • Sara Marsal, U. Reumatología, HVH Barcelona