Analysis of Microarray Data Using EXPANDER and SHARP

Slides: 39
Provided by: YossiS7
1
Analysis of Microarray Data Using EXPANDER and
SHARP
  • Workshop, Jan 06

2
EXPANDER work flow
Input data
Normalization/Filtering
Links to public annotation DBs (Hs, Mm, Rn, Dm, S.cer)
Visualization utilities
Clustering (CLICK, SOM, K-means, Hierarchical)
Biclustering (SAMBA)
Functional enrichment (TANGO)
Promoter signals (PRIMA)
4
EXPANDER Input Data
  • Input data
  • Expression matrix (probes in rows, conditions
    in columns)
  • One-channel data (e.g., Affymetrix)
  • Dual-channel data (cDNA microarrays; data are
    (log) ratios between the Red and Green channels)
  • ID conversion file: maps probe IDs to gene IDs

5
1. Normalization
6
Outline
  • What is normalization?
  • Why is normalization needed?
  • Three quantitative methods for normalization
  • Software tools

7
Hybridization of the same sample to 2
chips/channels
  • Ideally, the scatter plot coincides with the x=y
    diagonal
  • Due to random errors, we expect to see a cloud
    around the x=y diagonal.

[Figure: scatter plot; y axis probe intensity 2, x axis probe intensity 1]
8
Hybridization of the same sample to 2
chips/channels
  • In practice: both random and systematic
    measurement errors (bias)
  • Due to biases, scatter plots are not centered
    around the x=y diagonal

9
Hybridization of the same sample to 2
chips/channels
10
Normalization: the process of removing
systematic errors (biases) from the data
11
Sources of Systematic Errors
  • Different incorporation efficiency of dyes
  • Different amounts of mRNA
  • Experimenter/protocol issues (comparing chips
    processed by different labs)
  • Different scanning parameters
  • Batch bias

12
Normalization - two problems
  • How to detect biases? Which genes to use for
    estimating biases among chips/channels?
  • How to remove the biases?

13
Which Genes to use for bias detection?
  • All genes on the chip
  • Assumption: most of the genes are equally
    expressed in the compared samples; the proportion
    of differential genes is low (<20%).
  • Limits
  • Not appropriate when comparing highly
    heterogeneous samples (different tissues)
  • Not appropriate for analysis of dedicated chips
    (apoptosis chips, inflammation chips etc)

14
Which Genes to use for bias detection?
  • Housekeeping genes
  • Assumption: based on prior knowledge, a set of
    genes can be regarded as equally expressed in the
    compared samples
  • Affy novel chips: normalization set of 100
    genes
  • NHGRI's cDNA microarrays: set of 70
    "house-keeping" genes
  • Limits
  • The validity of the assumption is questionable
  • Housekeeping genes are usually expressed at high
    levels, not informative for the low intensities
    range

15
Which Genes to use for bias detection?
  • Spiked-in controls from another organism, over a
    range of concentrations
  • Limits
  • Low number of controls: less robust
  • Can't detect biases due to differences in RNA
    extraction protocols
  • Invariant set
  • Trying to identify genes that are expressed at
    similar levels in the compared samples without
    relying on any prior knowledge
  • Rank the genes in each chip according to their
    expression level
  • Find genes with small change in ranks
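
The rank-based idea above can be sketched in a few lines of Python. This is only a single-pass illustration, not the procedure of any particular tool; the function names and the `max_rank_shift` threshold are hypothetical choices made for the example.

```python
def ranks(values):
    """Rank of each value within its chip (0 = lowest); ties broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def invariant_set(chip_a, chip_b, max_rank_shift=0.05):
    """Indices of genes whose rank changes by at most max_rank_shift * n.

    max_rank_shift is an assumed tuning parameter, not taken from the slides.
    """
    n = len(chip_a)
    ranks_a, ranks_b = ranks(chip_a), ranks(chip_b)
    limit = max_rank_shift * n
    return [g for g in range(n) if abs(ranks_a[g] - ranks_b[g]) <= limit]


# Toy data: genes 1 and 4 swap places between the chips, the rest keep their rank.
chip_a = [10.0, 20.0, 30.0, 40.0, 50.0]
chip_b = [12.0, 55.0, 33.0, 41.0, 25.0]
stable = invariant_set(chip_a, chip_b, max_rank_shift=0.2)
print(stable)
```

The genes returned this way would then serve as the reference set for estimating the bias between the two chips.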

16
Normalization Methods
17
1. Global normalization (Scaling)
  • A single normalization factor (k) is computed for
    balancing chips/channels
  • $X_i^{\text{norm}} = k \cdot X_i$
  • Multiplying intensities by this factor equalizes
    the mean (median) intensity among compared chips
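
A minimal sketch of scaling, assuming the median is used as the summary statistic and one chip is picked as the reference (both are choices made for this example; the helper names are hypothetical):

```python
from statistics import median


def scaling_factor(reference, chip):
    """Factor k such that median(k * chip) equals median(reference)."""
    return median(reference) / median(chip)


def scale_chip(reference, chip):
    """Multiply every intensity on the chip by the single factor k."""
    k = scaling_factor(reference, chip)
    return [k * x for x in chip]


chip1 = [100.0, 200.0, 400.0]   # reference chip
chip2 = [50.0, 100.0, 150.0]    # chip to be rescaled
chip2_norm = scale_chip(chip1, chip2)
print(chip2_norm)
```

Because every intensity is multiplied by the same k, the shape of the distribution is unchanged; only its location on the intensity scale moves.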

18
Global Normalization
[Figure: two panels, "Before" and "After"]
19
Boxplots
[Figure: boxplot of log(intensity), marking the upper quartile, median intensity, and lower quartile]
20
[Figure: two panels, "Before Normalization" and "After Scaling"]
21
2. Intensity-dependent normalization (Yang, Speed)
  • Lowess (local linear fit)
  • Compensate for intensity-dependent biases

22
Detecting intensity-dependent biases: M vs. A plots
  • X axis: A, the average intensity
  • $A = \tfrac{1}{2}\log(\mathrm{Cy3} \cdot \mathrm{Cy5})$
  • Y axis: M, the log ratio
  • $M = \log(\mathrm{Cy3}/\mathrm{Cy5})$
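
As a small illustration, M and A can be computed per spot from the two channel intensities. The slides do not state the base of the logarithm; base 2 is assumed here, and `ma_values` is a hypothetical helper name.

```python
import math


def ma_values(cy3, cy5):
    """Return (M, A) lists for paired channel intensities, using log base 2."""
    m = [math.log2(g / r) for g, r in zip(cy3, cy5)]
    a = [0.5 * math.log2(g * r) for g, r in zip(cy3, cy5)]
    return m, a


cy3 = [1024.0, 256.0]
cy5 = [256.0, 256.0]
m, a = ma_values(cy3, cy5)
print(m, a)
```

Plotting M against A for all spots gives the M vs. A plot: a spot with equal intensity in both channels has M = 0 whatever its A.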

23
We expect the M vs. A plot to look like:
[Figure: M vs. A plot; y axis M = log(Cy3/Cy5), x axis A]
24
Intensity-dependent bias
Global normalization cannot remove
intensity-dependent biases
[Figure: M vs. A plot; y axis M = log(Cy3/Cy5), x axis A]
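
One way to see how an intensity-dependent correction works is to fit M as a smooth function of A and subtract the fit. The sketch below is a simplified stand-in for lowess, not a full implementation: it makes a single pass of tricube-weighted local linear regression with no robustness iterations, and the `frac` parameter and function names are assumptions made for the example.

```python
import math


def local_linear_fit(a, m, frac=0.5):
    """Fitted M at each A from a tricube-weighted local linear regression."""
    n = len(a)
    k = max(2, math.ceil(frac * n))          # neighbours used per local fit
    fitted = []
    for x0 in a:
        dists = sorted(abs(x - x0) for x in a)
        h = dists[k - 1] or 1e-12            # bandwidth: k-th nearest distance
        w = [(1 - min(abs(x - x0) / h, 1.0) ** 3) ** 3 for x in a]
        sw = sum(w)
        mean_x = sum(wi * x for wi, x in zip(w, a)) / sw
        mean_y = sum(wi * y for wi, y in zip(w, m)) / sw
        sxx = sum(wi * (x - mean_x) ** 2 for wi, x in zip(w, a))
        sxy = sum(wi * (x - mean_x) * (y - mean_y) for wi, x, y in zip(w, a, m))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(mean_y + slope * (x0 - mean_x))
    return fitted


def normalize_m(a, m, frac=0.5):
    """Subtract the fitted intensity-dependent trend from each M value."""
    return [y - f for y, f in zip(m, local_linear_fit(a, m, frac))]


# Toy data with a purely intensity-dependent bias: M = 0.1 * A - 1.
a_vals = [float(i) for i in range(6, 16)]
m_vals = [0.1 * x - 1.0 for x in a_vals]
m_norm = normalize_m(a_vals, m_vals)
print(max(abs(v) for v in m_norm))
```

A single scaling factor would shift all M values by the same constant and leave the slope in place, whereas subtracting the local fit removes the trend at every intensity.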
25
(No Transcript)
26
3. Quantile Normalization
[Figure: two panels, "Before Normalization" and "After Scaling"]
27
Quantile normalization: equalizing the entire
distribution
28
Quantile Normalization
  • Sort intensities in each chip
  • Compute mean intensity in each rank across the
    chips
  • Replace each intensity by the mean intensity at
    its rank
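
The three steps can be sketched directly in Python. This is a minimal illustration assuming all chips have the same number of probes and that ties are broken by position; the function name is hypothetical.

```python
def quantile_normalize(chips):
    """chips: list of equal-length intensity lists, one per chip."""
    n = len(chips[0])
    # Step 1: sort the intensities within each chip.
    sorted_chips = [sorted(chip) for chip in chips]
    # Step 2: mean intensity at each rank across the chips (the "average chip").
    rank_means = [sum(sc[r] for sc in sorted_chips) / len(chips) for r in range(n)]
    # Step 3: replace each intensity by the mean intensity at its rank.
    normalized = []
    for chip in chips:
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


chips = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0], [3.0, 5.0, 7.0]]
result = quantile_normalize(chips)
print(result)
```

After this step every chip holds exactly the same set of values, only assigned to different probes, so all chips share one distribution.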

[Figure: intensity columns for Chip 1, Chip 2, Chip 3 and the resulting average chip]
29
Normalization - tools
  • Bioconductor (both Affymetrix and cDNA): packages
    in the R language
  • dChip (Affymetrix): quantile, invariant set
  • Expander (Affy): lowess, quantile

30
Acknowledgements
  • Figures in this presentation were taken in part
    from presentations by:
  • Henrik Bengtsson, Terry Speed
  • Yee Yang, Terry Speed
  • Guilherme J. M. Rosa
  • Laurent Gautier, Rafael Irizarry, Leslie Cope,
    and Ben Bolstad

31
2. Identification of Differential Genes
32
Identification of differential genes
  • The most basic experimental design: comparison
    between 2 conditions, treatment vs. control
  • The goal: to identify genes that are
    differentially expressed between the examined
    conditions
  • Number of replicates is usually low (n = 2-4)

33
1. Fold Change
  • Consider genes whose mean expression level
    changed by at least 1.75-2 fold as differential
    genes
  • Limits
  • Usually no estimation of false positive rate is
    provided
  • Biased to genes with low expression level
  • Ignores the variability of gene levels over
    replicates.
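
A minimal sketch of a fold-change filter, assuming linear-scale intensities and a symmetric threshold so that up- and down-regulation are treated alike. The threshold of 2.0 is picked from the range quoted above; the function name and toy data are hypothetical.

```python
from statistics import mean


def fold_change_calls(control, treatment, threshold=2.0):
    """Map each gene to True if its mean level changed by >= threshold fold.

    control / treatment: dict of gene name -> list of replicate intensities.
    """
    calls = {}
    for gene in control:
        ratio = mean(treatment[gene]) / mean(control[gene])
        fold = ratio if ratio >= 1 else 1 / ratio   # symmetric for up and down
        calls[gene] = fold >= threshold
    return calls


control = {"geneA": [100.0, 110.0], "geneB": [10.0, 50.0], "geneC": [400.0, 420.0]}
treatment = {"geneA": [220.0, 230.0], "geneB": [20.0, 25.0], "geneC": [100.0, 105.0]}
calls = fold_change_calls(control, treatment)
print(calls)
```

Only the replicate means enter the calculation: the wide spread of geneB's control replicates (10 vs. 50) plays no part in the decision, which is the third limitation listed above.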

34
Fold change limitation: ignores variability over
replicates
  • Seek a score that penalizes genes with high
    variability over replicates

35
2. T-test
  • Compute a t-score for each gene

  • $m_c$, $m_t$: mean levels in Control and Treatment
  • $S_c^2$, $S_t^2$: variance estimates in Control and
    Treatment
  • $n_c$, $n_t$: number of replicates in Control and
    Treatment
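
The formula itself is not present in this transcript. One common form that is consistent with the quantities defined above is the unequal-variance (Welch) t-score, t = (m_t - m_c) / sqrt(S_c^2/n_c + S_t^2/n_t). The sketch below assumes that form and that sign convention; it is not necessarily the exact formula shown on the slide.

```python
import math
from statistics import mean, variance


def t_score(control, treatment):
    """Welch-style t-score for one gene (assumed form, see note above)."""
    m_c, m_t = mean(control), mean(treatment)
    s2_c, s2_t = variance(control), variance(treatment)   # sample variances
    n_c, n_t = len(control), len(treatment)
    return (m_t - m_c) / math.sqrt(s2_c / n_c + s2_t / n_t)


# Log-scale expression levels of one gene, three replicates per condition.
control = [1.0, 2.0, 3.0]
treatment = [4.0, 5.0, 6.0]
t = t_score(control, treatment)
print(round(t, 4))
```

Unlike fold change, this score shrinks when the replicates disagree, because the variances sit in the denominator.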
36
T - test
  • t-scores can be associated with a p-value (under
    the assumption that expression levels follow a
    normal distribution)
  • Log-transformation
  • Set a cut-off for the p-value (α = 0.01)
  • Consider all genes with p-value < α as
    differential genes

37
Multiple Testing
  • The p-value of gene g, associated with its t-score
    $T_g$, is the probability of obtaining by chance a
    t-score at least as extreme as $T_g$.
  • Multiplicity problem: thousands of genes are
    tested simultaneously.
  • E.g., suppose:
  • 10,000 genes on a chip
  • Not a single one is differentially expressed
  • α = 0.01
  • 10,000 × 0.01 = 100 genes are expected to have a
    p-value < 0.01 just by chance.
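
The arithmetic can be checked with a small simulation. It relies on the standard fact that p-values are uniformly distributed on [0, 1] when no gene is differentially expressed; the function name and the seed are arbitrary choices for the example.

```python
import random


def count_chance_hits(n_genes=10_000, alpha=0.01, seed=0):
    """Count null genes whose (uniform) p-value falls below alpha."""
    rng = random.Random(seed)
    return sum(1 for _ in range(n_genes) if rng.random() < alpha)


expected = 10_000 * 0.01          # expected number of false positives
observed = count_chance_hits()    # one simulated chip with no true effects
print(expected, observed)
```

The simulated count fluctuates around the expected value from run to run, but it is never close to zero, which is the point of the example.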

38
Multiple testing
  • Need to adjust for multiple testing when
    assessing the statistical significance of
    findings
  • Corrections
  • Bonferroni (e.g., α = 0.01, N = 10,000:
    cut-off = 0.000001)
  • False Discovery Rate (FDR)
  • In high-throughput studies a certain proportion of
    false positives is tolerable
  • Control the expected proportion of false
    positives among the genes identified as
    differential (q = 10%).
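
Both corrections can be sketched briefly. The slides name FDR control without specifying a procedure; the Benjamini-Hochberg step-up procedure is used here as one widely used choice, which is an assumption of this example, as are the function names and the toy p-values.

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-gene p-value cut-off that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def benjamini_hochberg(p_values, q=0.10):
    """Indices of genes called differential at FDR level q (BH step-up)."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    last = 0
    # Find the largest rank whose p-value is below its rank-specific threshold.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / n:
            last = rank
    return sorted(order[:last])


cutoff = bonferroni_cutoff(0.01, 10_000)
p_values = [0.001, 0.008, 0.039, 0.041, 0.6]
selected = benjamini_hochberg(p_values, q=0.10)
print(cutoff, selected)
```

With these five p-values, a Bonferroni cut-off of 0.01/5 = 0.002 would keep only the first gene, while the FDR procedure at q = 10% keeps four, illustrating the trade of a tolerated fraction of false positives for more findings.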

39
Differential Genes - Tools
  • Cyber-T
  • SAM (Significance Analysis of Microarrays)