Statistical Methods for Identifying Differentially Expressed Genes in Replicated cDNA Microarray Experiments

1
Statistical Methods for Identifying
Differentially Expressed Genes in Replicated cDNA
Microarray Experiments
  • Presented by Nan Lin
  • 13 October 2002

2
Introduction to cDNA Microarray Experiment
  • Single-slide Design
  • Two mRNA samples (red/green) on the same slide
  • Multiple-slide Design
  • Two or more types of mRNA on different slides
  • Exclude time-course experiment

3
Examples of Multiple-slide Design
  • Apo AI
  • Treatment group: 8 mice with the apo AI gene
    knocked out
  • Control group: 8 C57BL/6 mice
  • Cy5: each of the 16 mice
  • Cy3: pooled cDNA from the 8 control mice
  • SR-BI
  • Treatment group: 8 SR-BI transgenic mice
  • Control group: 8 normal FVB mice
  • Microarray Setup
  • 6384 spots: 4x4 grids with 19x21 spots in each

4
Single-slide Methods
  • Two types
  • Based solely on the intensity ratio R/G
  • Take into account overall transcript abundance,
    measured by the product RG
  • Historical Review
  • Fold increase/decrease cut-offs (1995-1996)
  • Probabilistic modeling based on distributional
    assumptions (1997-2000)
  • Methods that consider RG (2000-2001), e.g.
    Gamma-Gamma-Bernoulli

5
Summary of Single-slide Methods
  • Each method produces a model-dependent rule that
    amounts to drawing two curves in the (R,G) plane
  • Power (1-Type II error rate)
  • False positive rate (Type I error rate)
  • Multiple testing
  • Replication is needed because gene expression
    data are too noisy

6
Image Analysis
  • Raw data: 16-bit TIFF files
  • Addressing
  • Within a batch, important characteristics are
    similar
  • Segmentation
  • Seeded region growing algorithm
  • Background adjustment
  • Morphological opening (a nonlinear filter)
  • Software: the Spot package in the R environment

7
Single-slide Data Display
  • Plot log2R vs. log2G
  • variation less dependent on absolute magnitude
  • normalization is additive for logged intensities
  • evens out highly skewed distributions
  • a more realistic sense of variation
  • Plot M = log2(R/G) vs. A = log2(RG)/2
  • More revealing for identifying spot artifacts
    and for normalization purposes
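
The M and A coordinates above can be computed directly from the two channel intensities. The following is a minimal sketch in Python; the function name `ma_transform` and the three example spots are made up for illustration and do not come from the talk.

```python
import math


def ma_transform(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    M = log2(R/G) is the log-ratio of the two channels.
    A = log2(R*G)/2 is the average log-intensity of the spot.
    """
    m_values, a_values = [], []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            raise ValueError("intensities must be positive to take logs")
        m_values.append(math.log2(r / g))
        a_values.append(0.5 * math.log2(r * g))
    return m_values, a_values


# Hypothetical intensities for three spots (not data from the talk).
red = [1024.0, 512.0, 256.0]
green = [256.0, 512.0, 1024.0]
M, A = ma_transform(red, green)
```

In this toy input all three spots have the same overall intensity A, while M separates a four-fold increase, no change, and a four-fold decrease.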

8
Normalization
  • Identify and remove sources of systematic
    variation other than differential expression
  • Different labeling efficiencies and scanning
    properties for Cy3 and Cy5
  • Different scanning parameters
  • Print-tip, spatial or plate effects
  • Red intensity is often lower than green intensity
  • The imbalance between R and G varies
  • across spots and between arrays
  • with overall spot intensity A
  • with location on the array, plate origin, etc.

9
An Example: Self-Self Experiment
10
Normalization (Cont.)
  • Global normalization
  • subtract mean or median from all intensity
    log-ratios
  • More complex normalization
  • Robust locally weighted regression
  • Model M as a function of spot intensity A,
    location, and plate origin
  • Use print-tip group to represent the spot
    locations
  • log2(R/G) → log2(R/G) − l(A,j)
  • l(A,j): lowess fit in R for print-tip group j,
    with span f (0.2 < f < 0.4)
  • Control sequences
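
The two normalization steps on this slide can be sketched as follows. This is an illustration only: the talk uses `lowess` in R, whereas the code below swaps in a simplified tricube-weighted local linear fit without lowess's robustness iterations. All function names and the synthetic data are assumptions made for this example.

```python
import statistics


def global_normalize(m_values):
    """Global normalization: subtract the median log-ratio from every spot."""
    center = statistics.median(m_values)
    return [m - center for m in m_values]


def local_linear_fit(a_values, m_values, f=0.3):
    """Tricube-weighted local linear fit of M on A, evaluated at each A.

    Simplified stand-in for lowess (assumption): it uses the nearest
    fraction f of the points as the window but skips the robustness
    iterations that the real lowess performs.
    """
    n = len(a_values)
    k = max(2, min(n, int(round(f * n))))  # number of points in each window
    fitted = []
    for a0 in a_values:
        dists = sorted(abs(a - a0) for a in a_values)
        h = dists[k - 1] or 1e-12  # bandwidth: distance to k-th nearest point
        sw = swx = swm = swxx = swxm = 0.0
        for a, m in zip(a_values, m_values):
            d = abs(a - a0) / h
            if d >= 1.0:
                continue
            w = (1.0 - d ** 3) ** 3  # tricube weight
            x = a - a0               # center at a0 so the intercept is the fit
            sw += w
            swx += w * x
            swm += w * m
            swxx += w * x * x
            swxm += w * x * m
        denom = sw * swxx - swx * swx
        if abs(denom) < 1e-12:
            fitted.append(swm / sw)  # degenerate window: weighted mean
        else:
            fitted.append((swxx * swm - swx * swxm) / denom)
    return fitted


def print_tip_normalize(m_values, a_values, tip_groups, f=0.3):
    """Subtract the fitted curve l(A, j) within each print-tip group j."""
    groups = {}
    for i, j in enumerate(tip_groups):
        groups.setdefault(j, []).append(i)
    normalized = [0.0] * len(m_values)
    for idx in groups.values():
        fit = local_linear_fit([a_values[i] for i in idx],
                               [m_values[i] for i in idx], f)
        for i, l_aj in zip(idx, fit):
            normalized[i] = m_values[i] - l_aj
    return normalized


# Synthetic example: two print-tip groups with different intensity-dependent bias.
a_grid = [6.0 + 0.5 * i for i in range(20)]
a_values = a_grid + a_grid
tip_groups = [1] * 20 + [2] * 20
m_values = [0.5 * a - 3.0 for a in a_grid] + [-0.2 * a + 1.0 for a in a_grid]
normalized = print_tip_normalize(m_values, a_values, tip_groups, f=0.3)
```

Because the synthetic bias is exactly linear in A within each group, the local fit removes it and the normalized log-ratios are numerically zero. Real data would leave residual variation, and a robust fit would matter when many genes are differentially expressed.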

11
Apo AI: Normalization
12
Graphical Display for Test Statistics (I)
  • Test statistics
  • Hj: no association between treatment and the
    expression level of gene j, j = 1, ..., m.
  • Two-sided alternative
  • Two-sample Welch t-statistics
  • Replication is essential to assess the
    variability in treatment and control group
  • The joint distribution is estimated by a
    permutation procedure because the actual
    distribution is not a t-distribution
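
For a single gene, the Welch t-statistic and a two-sided permutation p-value might be computed as in the sketch below. It enumerates all relabelings of the samples for one gene only; the talk's procedure estimates the joint distribution across all genes, which this example does not attempt. The function names and the log-ratio values are hypothetical.

```python
import itertools
import math
import statistics


def welch_t(x, y):
    """Two-sample Welch t-statistic (unequal variances)."""
    vx = statistics.variance(x)
    vy = statistics.variance(y)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(
        vx / len(x) + vy / len(y)
    )


def permutation_p_value(treatment, control):
    """Two-sided permutation p-value for one gene, using every relabeling."""
    pooled = list(treatment) + list(control)
    n1 = len(treatment)
    observed = abs(welch_t(treatment, control))
    count = total = 0
    for idx in itertools.combinations(range(len(pooled)), n1):
        chosen = set(idx)
        x = [pooled[i] for i in idx]
        y = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so relabelings tied with the observed value count.
        if abs(welch_t(x, y)) >= observed - 1e-12:
            count += 1
    return count / total


# Hypothetical log-ratios for one gene, 4 treatment and 4 control mice.
treatment = [-1.9, -2.1, -2.4, -1.8]
control = [0.1, -0.2, 0.3, 0.0]
t_obs = welch_t(treatment, control)
p_perm = permutation_p_value(treatment, control)
```

With 8 mice per group, as in the Apo AI experiment, full enumeration covers 12870 relabelings per gene. That is still feasible, but larger designs would need random sampling of permutations instead.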

13
Graphical Display for Test Statistics (II)
  • Quantile-Quantile plots
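
The coordinates of a Q-Q plot can be obtained by pairing the sorted test statistics with reference quantiles. The sketch below assumes a standard normal reference distribution and plotting positions (i − 0.5)/n; both are choices made for this example, not details taken from the slide. The statistics are made up.

```python
from statistics import NormalDist


def qq_points(stats):
    """Pair sorted statistics with standard normal quantiles.

    Returns a list of (theoretical_quantile, observed_statistic) tuples,
    using plotting positions (i - 0.5) / n for i = 1, ..., n.
    """
    ordered = sorted(stats)
    n = len(ordered)
    reference = NormalDist()  # mean 0, standard deviation 1
    return [
        (reference.inv_cdf((i - 0.5) / n), value)
        for i, value in enumerate(ordered, start=1)
    ]


# Hypothetical t-statistics for five genes; one is far from the rest.
t_stats = [0.3, -0.8, 1.1, -12.5, 0.1]
points = qq_points(t_stats)
```

Genes whose points fall far from the line through the bulk of the data, such as the value −12.5 here, are the candidates for differential expression.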

14
Graphical Display for Test Statistics (III)
  • Plots vs. absolute expression levels

15
Multiple Hypothesis Testing: Adjusted p-values
(I)
  • P-value: Pj = Pr(|Tj| > |tj| | Hj), j = 1, ..., m.
  • Family-wise Type I Error Rate (FWER)
  • The probability of at least one Type I error in
    the family
  • Strong Control of the FWER
  • Control the FWER for any combination of true and
    false hypotheses
  • Weak Control of the FWER
  • Control the FWER only under the complete null
    hypothesis that all hypotheses in the family are
    true
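
A small calculation shows why the FWER matters at microarray scale. The sketch assumes that all m hypotheses are true and that the tests are independent, which is a simplification because gene expression levels are correlated. Under that assumption, FWER = 1 − (1 − α)^m. The function name is hypothetical; m = 6384 is the number of spots quoted earlier in the talk.

```python
def fwer_independent(m, alpha):
    """FWER under the complete null, ASSUMING m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m


m = 6384  # number of spots on the array
unadjusted = fwer_independent(m, 0.05)      # each test run at level 0.05
bonferroni = fwer_independent(m, 0.05 / m)  # each test run at level 0.05 / m
```

Testing every spot at level 0.05 makes at least one false positive practically certain, while dividing the level by m keeps the FWER just below 0.05.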

16
Multiple Hypothesis Testing: Adjusted p-values
(II)
  • Adjusted p-value for Hj
  • P̃j = inf{α : Hj is rejected at FWER = α}
  • Hj is rejected at FWER α if P̃j < α
  • P-value adjustment approaches
  • Bonferroni
  • Sidak single-step
  • Holm step-down
  • Westfall and Young step-down minP
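
The first three adjustments in this list can be computed from the raw p-values alone, as in the sketch below. The Westfall and Young step-down minP procedure is omitted because it needs the permutation distribution of the test statistics. Function names and the example p-values are made up for illustration.

```python
def bonferroni(p_values):
    """Bonferroni single-step adjusted p-values: min(m * p, 1)."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def sidak(p_values):
    """Sidak single-step adjusted p-values: 1 - (1 - p)^m."""
    m = len(p_values)
    return [1.0 - (1.0 - p) ** m for p in p_values]


def holm(p_values):
    """Holm step-down adjusted p-values.

    The k-th smallest p-value is multiplied by (m - k + 1); a running
    maximum keeps the adjusted values monotone in the original ordering.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[i])
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adj_bonferroni = bonferroni(raw)  # approximately [0.04, 0.16, 0.12, 0.02]
adj_sidak = sidak(raw)
adj_holm = holm(raw)              # approximately [0.03, 0.06, 0.06, 0.02]
```

Holm's adjusted p-values are never larger than Bonferroni's, so the step-down procedure rejects at least as many hypotheses at the same FWER.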

17
Multiple Hypothesis Testing: Estimation of
adjusted p-values (I)
18
Multiple Hypothesis Testing: Estimation of
adjusted p-values (II)
19
Apo AI: Adjusted p-values (I)
20
Apo AI: Adjusted p-values (II)
21
Apo AI: Comparison with Single-slide Methods
22
Discussion
  • M-A plots
  • Normalization
  • Robust local regression, e.g. lowess
  • Q-Q plots; plots vs. absolute expression level
  • False discovery rate (FDR)
  • Replication is necessary
  • Design issues
  • Factorial experiments
  • Joint behavior of genes
  • R package: SMA