Analysis of Microarray Data Using EXPANDER and SHARP - PowerPoint PPT Presentation


About This Presentation


Slides: 39

Provided by: YossiS7


Transcript and Presenter's Notes


1
Analysis of Microarray Data Using EXPANDER and SHARP

Workshop, Jan 06

2
EXPANDER work flow
Input data
Normalization/Filtering
Links to public annotation DBs (Hs, Mm, Rn, Dm, S.cer)
Visualization utilities
Clustering (CLICK, SOM, K-means, Hierarchical)
Biclustering (SAMBA)
Functional enrichment (TANGO)
Promoter signals (PRIMA)
4
EXPANDER Input Data

Input data:
Expression matrix (probes as rows, conditions as columns)
One-channel data (e.g., Affymetrix)
Dual-channel data (cDNA microarrays; values are (log) ratios between the red and green channels)
ID conversion file: maps probe IDs to gene IDs
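To make the input concrete, here is a minimal Python sketch that reads a tab-delimited expression matrix and a probe-to-gene ID conversion file. The file layouts (a header row of condition names followed by one row per probe; two columns per line in the ID file), the function names, and the gene IDs in the example are hypothetical assumptions for illustration, not EXPANDER's documented format.

```python
import csv
import io

def read_expression_matrix(text):
    """Parse a tab-delimited matrix (hypothetical layout): a header row of
    condition names, then one row per probe: probe_id, value1, value2, ..."""
    rows = [row for row in csv.reader(io.StringIO(text), delimiter="\t") if row]
    conditions = rows[0][1:]
    matrix = {row[0]: [float(v) for v in row[1:]] for row in rows[1:]}
    return conditions, matrix

def read_id_map(text):
    """Parse a two-column probe_id -> gene_id file (hypothetical layout)."""
    return {row[0]: row[1]
            for row in csv.reader(io.StringIO(text), delimiter="\t") if row}

# Made-up example data
matrix_text = "probe\tcontrol\ttreated\np1\t1.5\t2.0\np2\t0.3\t0.1\n"
id_text = "p1\tTP53\np2\tMYC\n"

conditions, matrix = read_expression_matrix(matrix_text)
id_map = read_id_map(id_text)
by_gene = {id_map[probe]: values for probe, values in matrix.items()}
print(conditions, by_gene)
```

Real probe-to-gene maps are often many-to-one; this sketch simply lets a later probe overwrite an earlier one for the same gene, so a real pipeline would need an explicit rule (for example, averaging).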

5
1. Normalization
6
Outline

What is normalization
Why is normalization needed
Three quantitative methods for normalization
Software tools

7
Hybridization of the same sample to 2 chips/channels

Ideally, the scatter plot coincides with the x-y diagonal.
Due to random errors, we expect to see a cloud around the x-y diagonal.

[Figure: scatter plot of probe intensity 1 vs. probe intensity 2]
8
Hybridization of the same sample to 2 chips/channels

In practice, there are both random and systematic measurement errors (biases).
Due to biases, the scatter plots are not centered around the x-y diagonal.

9
Hybridization of the same sample to 2 chips/channels
10
Normalization: the process of removing systematic errors (biases) from the data
11
Sources of Systematic Errors

Different incorporation efficiency of dyes
Different amounts of mRNA
Experimenter/protocol issues (comparing chips
processed by different labs)
Different scanning parameters
Batch bias

12
Normalization - two problems

How to detect biases? Which genes to use for
estimating biases among chips/channels?
How to remove the biases?

13
Which Genes to use for bias detection?

All genes on the chip
Assumption: most genes are equally expressed in the compared samples; the proportion of differential genes is low (< 20%).
Limits:
Not appropriate when comparing highly heterogeneous samples (e.g., different tissues)
Not appropriate for the analysis of dedicated chips (apoptosis chips, inflammation chips, etc.)

14
Which Genes to use for bias detection?

Housekeeping genes
Assumption: based on prior knowledge, a set of genes can be regarded as equally expressed in the compared samples
Affymetrix: newer chips include a normalization set of 100 genes
NHGRI's cDNA microarrays: a set of 70 "housekeeping" genes
Limits:
The validity of the assumption is questionable
Housekeeping genes are usually expressed at high levels, so they are not informative for the low-intensity range

15
Which Genes to use for bias detection?

Spiked-in controls from another organism, over a range of concentrations
Limits:
Low number of controls, so less robust
Can't detect biases due to differences in RNA extraction protocols
Invariant set
Tries to identify genes that are expressed at similar levels in the compared samples without relying on any prior knowledge:
Rank the genes in each chip according to their expression level
Find genes with a small change in rank
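The rank-based idea of the invariant set can be sketched in a few lines of Python. This is a single-pass illustration under stated assumptions: two chips, a hypothetical `max_rank_change` threshold, and ties broken by position. Published tools (e.g., dChip) may use a more elaborate selection rule that this sketch does not reproduce.

```python
def ranks(values):
    """Rank position (0 = lowest) of each value; ties are broken by index,
    because Python's sort is stable."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result

def invariant_set(chip_a, chip_b, max_rank_change):
    """Indices of genes whose rank differs by at most max_rank_change
    between the two chips (max_rank_change is a hypothetical parameter)."""
    rank_a, rank_b = ranks(chip_a), ranks(chip_b)
    return [g for g in range(len(chip_a))
            if abs(rank_a[g] - rank_b[g]) <= max_rank_change]

# Made-up intensities for five genes on two chips
chip_a = [10, 200, 50, 800, 30]
chip_b = [15, 40, 60, 900, 400]
print(invariant_set(chip_a, chip_b, max_rank_change=0))  # [0, 2, 3]
```

Only the selected genes, rather than all genes on the chip, would then be used to estimate the normalization factor or curve.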

16
Normalization Methods
17
1. Global normalization (Scaling)

A single normalization factor (k) is computed for balancing chips/channels:
$X_i^{\text{norm}} = k \cdot X_i$
Multiplying intensities by this factor equalizes the mean (median) intensity among the compared chips
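A minimal Python sketch of global scaling, assuming the median is the statistic being equalized and the first chip serves as the reference; both choices are illustrative (the slide also allows the mean).

```python
from statistics import median

def global_scale(chips, reference=0):
    """Scale every chip by a single factor k so that all chips share the
    median intensity of the reference chip."""
    target = median(chips[reference])
    normalized = []
    for chip in chips:
        k = target / median(chip)          # one normalization factor per chip
        normalized.append([k * x for x in chip])
    return normalized

chips = [[100, 200, 300], [50, 100, 150]]
print(global_scale(chips))  # [[100.0, 200.0, 300.0], [100.0, 200.0, 300.0]]
```

Because the same k multiplies every intensity on a chip, this cannot correct a bias whose size depends on the intensity itself.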

18
Global Normalization
[Figure panels: Before, After]
19
Boxplots
[Figure: boxplots of log(intensity), marking the upper quartile, median intensity, and lower quartile]
20
[Figure panels: Before Normalization, After Scaling]
21
2. Intensity-dependent normalization (Yang, Speed)

(Lowess: locally weighted linear fit)
Compensates for intensity-dependent biases

22
Detect intensity-dependent biases: M vs. A plots

X axis: A, the average log intensity
$A = \frac{1}{2}\log(\mathrm{Cy3} \cdot \mathrm{Cy5})$
Y axis: M, the log ratio
$M = \log(\mathrm{Cy3}/\mathrm{Cy5})$
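Computing M and A is a simple per-spot transformation. A minimal sketch, assuming base-2 logarithms (a common convention; the slide writes only "log") and strictly positive intensities:

```python
import math

def ma_transform(cy3, cy5):
    """Return (M, A) for paired Cy3/Cy5 spot intensities (base-2 logs assumed).
    M = log2(Cy3/Cy5) is the log ratio; A = 0.5 * log2(Cy3 * Cy5) is the
    average log intensity of the two channels."""
    m = [math.log2(c3 / c5) for c3, c5 in zip(cy3, cy5)]
    a = [0.5 * math.log2(c3 * c5) for c3, c5 in zip(cy3, cy5)]
    return m, a

# Made-up intensities for two spots
m, a = ma_transform([1024, 256], [256, 256])
print(m, a)
```

A spot with equal intensity in both channels gets M = 0, so points lying on the x-y diagonal of the earlier scatter plots map to the horizontal line M = 0 in the M vs. A plot.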

23
We expect the M vs A plot to look like this:
[Figure: M = log(Cy3/Cy5) plotted against A]
24
Intensity-dependent bias
[Figure: M = log(Cy3/Cy5) plotted against A]
Global normalization cannot remove intensity-dependent biases
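Lowess normalization fits a smooth curve to M as a function of A and subtracts it, so that the corrected log ratios no longer drift with intensity. The Python below is a simplified sketch, not the implementation used by Yang and Speed or by EXPANDER: it performs a single pass of tricube-weighted local linear regression (full lowess adds robustness iterations that down-weight outliers), and the span `frac=0.3` is an arbitrary illustrative choice.

```python
def local_linear_fit(a, m, frac=0.3):
    """Simplified lowess: for each point, fit a tricube-weighted straight
    line to its nearest neighbours along A and evaluate it at that point.
    No robustness iterations are performed."""
    n = len(a)
    k = min(n, max(3, round(frac * n)))    # neighbourhood size
    fitted = []
    for i in range(n):
        # The k points closest to a[i] along the A axis
        neighbours = sorted((abs(a[j] - a[i]), j) for j in range(n))[:k]
        h = neighbours[-1][0] or 1e-12     # bandwidth: distance to k-th neighbour
        sw = swx = swy = swxx = swxy = 0.0
        for d, j in neighbours:
            w = (1 - (d / h) ** 3) ** 3    # tricube weight
            sw += w
            swx += w * a[j]
            swy += w * m[j]
            swxx += w * a[j] ** 2
            swxy += w * a[j] * m[j]
        denom = sw * swxx - swx ** 2
        if abs(denom) < 1e-12:             # all neighbours share one A value
            fitted.append(swy / sw)
        else:
            slope = (sw * swxy - swx * swy) / denom
            intercept = (swy - slope * swx) / sw
            fitted.append(intercept + slope * a[i])
    return fitted

def lowess_normalize(a, m, frac=0.3):
    """Subtract the fitted intensity-dependent trend from each log ratio M."""
    trend = local_linear_fit(a, m, frac)
    return [mi - ti for mi, ti in zip(m, trend)]

# Synthetic curved bias: M drifts with intensity A (made-up data)
a = [6 + 0.25 * i for i in range(40)]
m = [0.05 * (x - 10) ** 2 - 0.4 for x in a]
corrected = lowess_normalize(a, m)
print(max(abs(v) for v in m), max(abs(v) for v in corrected))
```

In practice one would use an established routine, such as R's `lowess()` or `loess()`, rather than this hand-written loop.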
25
(No Transcript)
26
3. Quantile Normalization
[Figure panels: Before Normalization, After Scaling]
27
Quantile normalization: equalizing the entire distribution
28
Quantile Normalization

Sort the intensities in each chip
Compute the mean intensity at each rank across the chips
Replace each intensity by the mean intensity at its rank

[Figure: Chip 1, Chip 2, Chip 3 and the resulting average chip]
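The three quantile-normalization steps translate directly into code. A minimal Python sketch, with each chip given as a list of intensities of equal length; ties within a chip are broken by position here, which is a simplifying assumption (implementations differ in how they treat ties).

```python
def quantile_normalize(chips):
    """Give every chip the same intensity distribution: the mean of the
    sorted chips, rank by rank."""
    n_genes = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Mean intensity at each rank across the chips (the "average chip")
    rank_means = [sum(s[r] for s in sorted_chips) / len(chips)
                  for r in range(n_genes)]
    normalized = []
    for chip in chips:
        # Rank the genes on this chip (ties broken by position)
        order = sorted(range(n_genes), key=lambda g: chip[g])
        out = [0.0] * n_genes
        # Replace each intensity by the mean intensity at its rank
        for rank, g in enumerate(order):
            out[g] = rank_means[rank]
        normalized.append(out)
    return normalized

# Made-up intensities: 3 chips x 4 genes
chips = [[5, 2, 3, 4], [4, 1, 6, 2], [3, 4, 6, 8]]
for chip in quantile_normalize(chips):
    print(chip)
```

After this step every chip holds exactly the same set of values, which is what distinguishes quantile normalization from scaling: the whole distribution is equalized, not only its mean or median.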
29
Normalization - tools

Bioconductor (both Affymetrix and cDNA arrays)
Packages in the R language
dChip (Affymetrix)
Quantile, Invariant set
Expander (Affymetrix)
Lowess
Quantile

30
Acknowledgements

Figures in this presentation were taken in part from presentations by
Henrik Bengtsson, Terry Speed
Yee Yang, Terry Speed
Guilherme J. M. Rosa
Laurent Gautier, Rafael Irizarry, Leslie Cope,
and Ben Bolstad

31
2. Identification of Differential Genes
32
Identification of differential genes

The most basic experimental design: a comparison between 2 conditions, treatment vs. control
The goal: to identify genes that are differentially expressed between the examined conditions
The number of replicates is usually low (n = 2-4)

33
1. Fold Change

Consider genes whose mean expression level changed by at least 1.75- to 2-fold as differential genes
Limits:
Usually no estimate of the false positive rate is provided
Biased toward genes with low expression levels
Ignores the variability of gene levels over replicates
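A minimal sketch of a fold-change filter. It assumes a symmetric cut-off, so that both up- and down-regulation count, and uses `min_fold=2.0` as an illustrative threshold from the 1.75-2 range above; the gene names and values are made up.

```python
import math
from statistics import mean

def fold_change_calls(control, treatment, min_fold=2.0):
    """Flag genes whose mean level changed by at least min_fold, up or down.
    control and treatment map gene name -> list of replicate levels."""
    calls = {}
    for gene in control:
        ratio = mean(treatment[gene]) / mean(control[gene])
        # |log2 ratio| >= log2(min_fold) treats 2x up and 2x down alike
        calls[gene] = abs(math.log2(ratio)) >= math.log2(min_fold)
    return calls

control = {"g1": [100, 110], "g2": [10, 12], "g3": [500, 520]}
treatment = {"g1": [210, 230], "g2": [4, 5], "g3": [540, 560]}
print(fold_change_calls(control, treatment))
# {'g1': True, 'g2': True, 'g3': False}
```

Only the replicate means enter the decision, so two genes with the same means but very different scatter across replicates receive the same call.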

34
Fold change limitation: it ignores variability over replicates

Seek a score that penalizes genes with high variability over replicates

35
2. T-test

Compute a t-score for each gene, where:

$m_c$, $m_t$: mean levels in Control and Treatment
$S_c^2$, $S_t^2$: variance estimates in Control and Treatment
$n_c$, $n_t$: number of replicates in Control and Treatment
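The t-score formula itself did not survive in this transcript. Assuming the common unpooled (Welch) form built from exactly the quantities defined on this slide, $t = (m_t - m_c)/\sqrt{S_t^2/n_t + S_c^2/n_c}$, a minimal Python sketch follows; the original slide may instead have used a pooled-variance version. For p-values, `scipy.stats.ttest_ind(..., equal_var=False)` computes this same statistic.

```python
import math
from statistics import mean, variance

def t_score(control, treatment):
    """Unpooled (Welch) two-sample t-score for one gene; needs at least two
    replicates per group and non-zero variance.
    Positive values mean higher expression in Treatment."""
    m_c, m_t = mean(control), mean(treatment)
    s2_c, s2_t = variance(control), variance(treatment)  # sample variances (n - 1)
    n_c, n_t = len(control), len(treatment)
    return (m_t - m_c) / math.sqrt(s2_c / n_c + s2_t / n_t)

# Same 3-unit mean shift, different replicate variability (made-up data)
print(t_score([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(t_score([0.0, 2.0, 4.0], [3.0, 5.0, 7.0]))
```

Both genes have the same mean shift, so their fold change is similar, but the noisier second gene receives a t-score half as large: this is the penalty on replicate variability that fold change lacks.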
36
T - test

t-scores can be associated with a p-value (under the assumption that expression levels follow a normal distribution)
Log-transformation (to bring expression levels closer to a normal distribution)
Set a cut-off for the p-value (α = 0.01)
Consider all genes with p-value < α as differential genes

37
Multiple Testing

The p-value $P_g$ associated with the t-score $T_g$ of gene g is the probability of obtaining, by chance, a t-score that is at least as extreme as $T_g$.
Multiplicity problem: thousands of genes are tested simultaneously.
E.g., suppose that:
there are 10,000 genes on a chip,
not a single one is differentially expressed, and
α = 0.01.
Then 10,000 × 0.01 = 100 genes are expected to have a p-value < 0.01 just by chance.
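The arithmetic above can be checked with a quick simulation. It assumes, as is standard for a continuous test statistic, that the p-values of genes that are not differentially expressed are uniformly distributed on [0, 1]; the seed is arbitrary.

```python
import random

def count_chance_hits(n_genes=10_000, alpha=0.01, seed=1):
    """Count null genes whose p-value falls below alpha purely by chance."""
    rng = random.Random(seed)
    # Under the null hypothesis every p-value is Uniform(0, 1)
    p_values = [rng.random() for _ in range(n_genes)]
    return sum(p < alpha for p in p_values)

print(count_chance_hits())   # close to the expected 10,000 * 0.01 = 100
```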

38
Multiple testing

Need to adjust for multiple testing when assessing the statistical significance of findings
Corrections:
Bonferroni (e.g., α = 0.01, N = 10,000: cut-off = α/N = 0.000001)
False Discovery Rate (FDR)
In high-throughput studies, a certain proportion of false positives is tolerable
Control the expected proportion of false positives among the genes identified as differential (e.g., q = 10%)
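Both corrections fit in a few lines of Python. The slide does not say which FDR procedure is used; the sketch below assumes the widely used Benjamini-Hochberg step-up procedure, and the example p-values are made up.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-gene p-value cut-off that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests

def benjamini_hochberg(p_values, q=0.10):
    """Indices of genes called differential with the FDR controlled at level q
    (Benjamini-Hochberg step-up procedure)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        # Largest rank k with p_(k) <= k * q / m; all smaller p-values pass too
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.055, 0.074, 0.205, 0.212, 0.216]
print(bonferroni_cutoff(0.01, 10_000))
print(benjamini_hochberg(p, q=0.05))   # [0, 1]
print(benjamini_hochberg(p, q=0.10))   # [0, 1, 2, 3, 4, 5]
```

With q = 0.10, gene 2 (p = 0.039) is called even though it misses its own rank threshold (0.03), because the step-up rule keeps every gene up to the largest rank that passes.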

39
Differential Genes - Tools