Statistical Issues in the Design of Microarray Experiments

Learn more at: http://www.nettab.org

1
Statistical Issues in the Design of Microarray
Experiments
  • Lara Lusa
  • U.O. Statistica Medica e Biometria
  • Istituto Nazionale per lo Studio e la Cura dei
    Tumori, Milano
  • NETTAB 2003
  • Bologna, 28th November 2003

2
Outline
  • Biostatistics and microarrays
  • Study objectives
  • Design of microarray experiments
  • A case study: a designed experiment

3
Biostatistics and microarrays
  • Microarray research: a unique challenge for
    interdisciplinary collaboration
  • Can biostatisticians be useful in microarray
    research?
  • Are the available software tools a valid
    substitute for collaboration with biostatisticians?

4
What can biostatisticians do?
  • Active collaboration with researchers from
    biomedical and bioinformatics fields
      • to develop and critically evaluate methods for
          • design of microarray experiments
          • analysis of data
      • to perform data analysis
      • to develop software tools and train biomedical
        researchers to use them

5
Italian inter-university research group
  • Statistical issues in design and analysis of
    microarray data
  • MIUR grant 2003-2005
  • Firenze (Annibale Biggeri)
  • Milano (Giuseppe Gallus)
  • Padova (Monica Chiogna)
  • Torino (Mauro Gasparini)
  • Udine (Corrado Lagazio)

6
Collaborations
  • Milano
  • Istituto Nazionale per lo Studio e la Cura dei
    Tumori, Milano (statisticians, biologists,
    molecular oncologists)
  • IFOM, Milano (biologists, bioinformatics)
  • Biometric Research Branch, NCI, Bethesda
    (statisticians)
  • Edo Tempia Foundation, Biella
  • Bioconductor project (software development)

7
Study objectives
  • Class comparison (supervised)
      • establish differences in gene expression between
        predetermined classes
  • Class prediction (supervised)
      • prediction of phenotype using gene expression
        data
  • Class discovery (unsupervised)
      • discover groups of samples or genes with similar
        expression

8
Design of microarray experiments
  • Design of arrays
  • Allocation of samples
  • Replication
  • Labeling of samples (cDNA)
  • reference design
  • balanced block design
  • loop design
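
The three labeling schemes for two-color cDNA arrays can be sketched as lists of array pairings. This is only an illustrative sketch: the function names, the sample labels and the "REF" label are hypothetical, and the dye-alternation rule in the balanced block layout is one common choice rather than something stated on the slide.

```python
def reference_design(samples, reference="REF"):
    """Each sample is co-hybridized with a common reference, one array per sample."""
    return [{"Cy5": s, "Cy3": reference} for s in samples]


def balanced_block_design(class_a, class_b):
    """Each array carries one sample from each class; dyes alternate across arrays."""
    if len(class_a) != len(class_b):
        raise ValueError("balanced block design needs equal class sizes")
    arrays = []
    for i, (a, b) in enumerate(zip(class_a, class_b)):
        # Alternate dye assignment so that class and dye are not confounded.
        if i % 2 == 0:
            arrays.append({"Cy5": a, "Cy3": b})
        else:
            arrays.append({"Cy5": b, "Cy3": a})
    return arrays


def loop_design(samples):
    """Sample i is co-hybridized with sample i+1; the last sample closes the loop."""
    n = len(samples)
    return [{"Cy5": samples[i], "Cy3": samples[(i + 1) % n]} for i in range(n)]


if __name__ == "__main__":
    treated, control = ["T1", "T2"], ["C1", "C2"]
    print(reference_design(treated + control))
    print(balanced_block_design(treated, control))
    print(loop_design(treated + control))
```

With four samples, the reference and loop layouts both use four arrays, while the balanced block layout uses two arrays and no reference sample.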

9
Levels of replication
  • Biological replicates
  • multiple samples from different populations
  • Technical replicates
  • multiple samples from the same subject
  • multiple samples from the same mRNA
  • multiple clones or probes of the same gene on the
    array

10
How many replicates?
  • Biological replicates essential to make inference
    about population
  • Technical replicates useful for quality control
    and for increasing precision
  • How to determine sample size?
  • Problem-dependent
  • simple methods available for class comparison
    problems
  • not yet clear what to use for class discovery
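
One of the simple methods for class comparison is the usual normal-approximation formula for a two-sample comparison applied gene by gene, n per class = 2 (z_{alpha/2} + z_{beta})^2 (sigma/delta)^2. The sketch below assumes that formula. The values of `delta` (difference in mean log2 expression to detect), `sigma` (within-class standard deviation), the significance level and the power are all hypothetical choices, not figures from the talk.

```python
import math
from statistics import NormalDist


def samples_per_class(delta, sigma, alpha=0.001, power=0.95):
    """Biological replicates needed per class for a two-sided, two-sample comparison.

    Normal approximation; delta and sigma are on the log2 expression scale.
    """
    z = NormalDist()  # standard normal
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical inputs: detect a two-fold change (delta = 1 on the log2
    # scale) when the within-class standard deviation is 0.5.
    print(samples_per_class(delta=1.0, sigma=0.5))
```

A stringent per-gene significance level such as 0.001 is used here because thousands of genes are tested at once. The formula counts biological replicates, not technical ones.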

11
Common pitfalls in microarray experiments
  • Too little or no replication
  • Use of replication at the wrong level
  • Experiments with cell lines assuming no
    variability among cell lines of the same type
  • Inappropriate use of pooling
  • ok use of multiple independent pools
  • but is it useful?
  • individual information lost
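
Why the level of replication matters can be illustrated with a two-component variance model. The sketch assumes that biological and technical variances add, and that pooling averages the expression of the individuals in a pool. Both are modelling assumptions, and the variance values are hypothetical.

```python
def variance_of_class_mean(var_bio, var_tech, n_units, pool_size=1, tech_reps=1):
    """Variance of a class mean under an additive two-level model (assumed).

    n_units   : independent biological units (individuals or independent pools)
    pool_size : individuals mixed into each unit (1 means no pooling)
    tech_reps : arrays hybridized per unit
    """
    biological = var_bio / (n_units * pool_size)
    technical = var_tech / (n_units * tech_reps)
    return biological + technical


if __name__ == "__main__":
    var_bio, var_tech = 0.2, 0.1  # hypothetical variance components
    # One cell-line culture hybridized on 8 arrays: only technical noise shrinks.
    print(variance_of_class_mean(var_bio, var_tech, n_units=1, tech_reps=8))
    # Eight independent cultures, one array each.
    print(variance_of_class_mean(var_bio, var_tech, n_units=8))
    # Four independent pools of two cultures, one array each.
    print(variance_of_class_mean(var_bio, var_tech, n_units=4, pool_size=2))
```

Under these assumptions, eight arrays from a single culture leave the biological variance untouched, and a single pool gives no estimate of biological variability at all, which is why multiple independent pools are required.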

12
Case study: a designed experiment
  • Biological aim: assess the effect of toremifene on
    the MCF-7 breast cancer cell line, in terms of gene
    expression

13
Week 1
  • [Design diagram; surviving labels: POOL, cDNA,
    Affymetrix]

14
Weeks 2 and 3
  • [Design diagram; surviving labels: POOL, cDNA,
    Affymetrix]

15
Statistical aims
  • comparison of microarray platforms (cDNA vs
    Affymetrix)
  • hybridization of individual samples vs pools
  • variability of cell lines
  • robustness of commonly used statistical methods

16
Data available (so far)
  • Hybridizations from Affymetrix HGU133 chips
  • Summary measure of intensities: MAS5 (Affymetrix,
    2002)
      • most commonly used, but other possibilities
        available
      • Robust Multichip Analysis (Irizarry et al., 2002)
      • Model-Based Expression Index (Li and Wong, 2001)
        (at least 16 chips!)

17
Brief summary of data
  • HGU133A
      • chipA: 22,283 probe sets
      • chipB: 22,645 probe sets
  • Present
      • chipA: 48.5%
      • chipB: 38.2%
  • PM < MM
      • chipA: 27%
      • chipB: 31%

18
Methods for exploring reproducibility among
arrays
  • Pearson's correlation coefficient (common, but
    wrong!)
  • Coefficient of variation
  • Distribution of differences of intensities
  • Bland and Altman's plot (MA plot)
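
A small sketch of why correlation alone can mislead when judging reproducibility: two arrays can be perfectly correlated and still disagree systematically. The intensities below are made up for illustration, and the function names are hypothetical. M and A are the quantities plotted in an MA plot (difference and average of log2 intensities).

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ma_values(x_log2, y_log2):
    """M = difference and A = average of log2 intensities, gene by gene."""
    m = [a - b for a, b in zip(x_log2, y_log2)]
    a = [(a + b) / 2 for a, b in zip(x_log2, y_log2)]
    return m, a


def agreement_summary(x_log2, y_log2):
    """Correlation next to the mean and SD of the differences."""
    m, _ = ma_values(x_log2, y_log2)
    return {
        "pearson_r": pearson(x_log2, y_log2),
        "mean_difference": mean(m),
        "sd_difference": stdev(m),
    }


if __name__ == "__main__":
    # Made-up log2 intensities: array 2 reads one log2 unit (two-fold) higher.
    array1 = [6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
    array2 = [v + 1.0 for v in array1]
    print(agreement_summary(array1, array2))
```

Here the correlation is 1 although every gene differs two-fold between the arrays. The distribution of differences exposes the disagreement that the correlation hides.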

20
Class comparison
  • Identification of differentially expressed genes
    between treated and untreated cell lines
  • t-tests (adjusting for multiple comparisons)
      • all arrays
      • only pooled arrays
      • only individual arrays
  • ANOVA (linear) model
      • estimation of treatment effect, adjusting for
        pool effect and week effect
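
For a single gene, the pooled-variance t statistic used in this kind of class comparison can be sketched as follows. The expression values are made up. Only the statistic and its degrees of freedom are computed, since a p-value needs the t distribution, which is not in the Python standard library.

```python
import math
from statistics import mean, variance


def pooled_variance_t(treated, control):
    """Two-sample t statistic with pooled variance, and its degrees of freedom."""
    n1, n2 = len(treated), len(control)
    df = n1 + n2 - 2
    # Pooled estimate of the common within-class variance.
    sp2 = ((n1 - 1) * variance(treated) + (n2 - 1) * variance(control)) / df
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(treated) - mean(control)) / se, df


if __name__ == "__main__":
    # Made-up log2 expression values for one gene.
    t, df = pooled_variance_t([8.0, 9.0, 10.0], [5.0, 6.0, 7.0])
    print(f"t = {t:.3f} on {df} degrees of freedom")
```

The same statistic is computed for every probe set, so the resulting p-values have to be adjusted for multiple comparisons.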

21
Some results...
  • Pooled variance t-test on whole data
      • treated versus controls
  • chipA
      • 1948 with p < 0.001 (356 with p < 0.001 and
        abs(FC) > 2)
      • 240 with Bonferroni correction
  • chipB
      • 743 with p < 0.001 (143 with p < 0.001 and
        abs(FC) > 2)
      • 76 with Bonferroni correction
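
To put these counts in context, a short calculation gives the number of probe sets expected to pass p < 0.001 by chance alone if no gene were differentially expressed, and the per-test threshold implied by a Bonferroni correction. The probe-set counts are those reported in the data summary. The family-wise level of 0.05 is an assumption, since the slides do not state which level was used.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests below alpha when every null hypothesis is true."""
    return n_tests * alpha


def bonferroni_threshold(n_tests, fwer=0.05):
    """Per-test p-value threshold for a given family-wise error rate (assumed 0.05)."""
    return fwer / n_tests


if __name__ == "__main__":
    for chip, n_probe_sets in {"chipA": 22283, "chipB": 22645}.items():
        efp = expected_false_positives(n_probe_sets, 0.001)
        thr = bonferroni_threshold(n_probe_sets)
        print(f"{chip}: about {efp:.1f} expected by chance at p < 0.001; "
              f"Bonferroni per-test threshold {thr:.2e}")
```

Roughly 22 probe sets per chip would be expected below 0.001 by chance, far fewer than the 1948 and 743 observed.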

22
Some results...
  • Pooled variance t-test on pooled data
      • treated versus controls
  • chipA
      • 204 with p < 0.001
      • 189/(204, 1948) common to overall analysis
      • 82 with p < 0.001 and abs(FC) > 2
      • 82/(82, 356) common to overall analysis
  • chipB
      • 80 with p < 0.001
      • 69/(80, 743) common to overall analysis
      • 37 with p < 0.001 and abs(FC) > 2
      • 37/(37, 143) common to overall analysis

23
Some results...
  • Pooled variance t-test on individual data
      • treated versus controls
  • chipA
      • 669 with p < 0.001
      • 594/(669, 1948) common to overall analysis
      • 226 with p < 0.001 and abs(FC) > 2
      • 221/(226, 356) common to overall analysis
  • chipB
      • 245 with p < 0.001
      • 196/(245, 743) common to overall analysis
      • 80 with p < 0.001 and abs(FC) > 2
      • 77/(80, 143) common to overall analysis

24
Some results...
  • Pooled variance ANOVA results
      • treated versus controls
  • chipA
      • 1913 with p < 0.001
      • 1624/(1913, 1948) common to overall analysis
      • 343 with p < 0.001 and abs(FC) > 2
      • 339/(343, 356) common to overall analysis
  • chipB
      • 245 with p < 0.001
      • 196/(245, 743) common to overall analysis
      • 80 with p < 0.001 and abs(FC) > 2
      • 77/(80, 143) common to overall analysis
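
The overlap counts on the last three results slides can be turned into proportions. The sketch reads an entry such as 189/(204, 1948) as "189 probe sets in common, out of 204 selected in this analysis and 1948 selected in the overall t-test analysis". That reading is an assumption, as the slides do not define the notation. Only the chipA counts at p < 0.001 are used.

```python
# chipA, p < 0.001: (common with overall analysis, selected in this analysis)
CHIP_A = {
    "pooled arrays only": (189, 204),
    "individual arrays only": (594, 669),
    "ANOVA": (1624, 1913),
}
OVERALL_SELECTED = 1948  # chipA probe sets with p < 0.001 in the overall t-test


def overlap_fractions(common, selected, overall=OVERALL_SELECTED):
    """Share of this analysis confirmed overall, and share of the overall list recovered."""
    return common / selected, common / overall


if __name__ == "__main__":
    for name, (common, selected) in CHIP_A.items():
        confirmed, recovered = overlap_fractions(common, selected)
        print(f"{name}: {confirmed:.1%} also found overall, "
              f"{recovered:.1%} of the overall list recovered")
```

Under this reading, most probe sets selected in each restricted analysis are also selected overall, but the pooled-only analysis recovers only about a tenth of the overall list.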

26
Conclusions
  • ?????????
  • So far no evidence for the usefulness of pooling
    data from cell lines: no evidence of decreased
    variability
      • but need to further investigate the differences
        in the individual versus pooled results
  • Need for a plan of biological (quantitative)
    validations of expression measures