1
Statistical Methods for the Screening and
Classification of Microarray Gene Expression Data
Geoff McLachlan
Department of Mathematics and Institute for Molecular Bioscience,
University of Queensland
http://www.maths.uq.edu.au/~gjm
2
Institute for Molecular Bioscience, University
of Queensland
3
Liat Jones
Richard Bean
Justin Zhu
4
Outline of Workshop
Part 1: Introduction to Microarray Technology
Part 2: Detecting Differentially Expressed Genes
in Known Classes of Tissue Samples
Part 3: Supervised Classification of Tissue
Samples
Part 4: Unsupervised Classification: Cluster
Analysis of Tissue Samples and Gene Profiles
Part 5: Linking Microarray Data with
Survival Analysis
5
A microarray is a new technology which allows the
measurement of the expression levels of thousands
of genes simultaneously.
  • (1) Sequencing of the genome (human, mouse, and
    others)
  • (2) Improvement in technology to generate
    high-density arrays on chips (glass slides or
    nylon membrane)

The entire genome of an organism can be probed at
a single point in time.
6
Draft of the Human Genome
Public Sequence: Nature, Feb. 2001
Celera Sequence: Science, Feb. 2001
7
The Challenge for Statistical Analysis of
Microarray Data
Microarrays present new problems for statistics
because the data are very high dimensional with
very little replication.
The challenge is to extract useful information
and discover knowledge from the data, such as
gene functions, gene interactions, regulatory
pathways, metabolic pathways etc.
8
"Vital Statistics" by C. Tilstone, Nature 424,
610-612, 2003.
DNA microarrays have given geneticists and
molecular biologists access to more data than
ever before. But do these researchers have the
statistical know-how to cope?
Branching out: cluster analysis can group samples
that show similar patterns of gene expression.
9
Representation of Data from M Microarray
Experiments
            Sample 1   Sample 2   ...   Sample M
Gene 1
Gene 2
...
Gene N
Assume we have extracted gene expression values
from intensities.
Expression Signature: a column (all N genes in one sample)
Expression Profile: a row (one gene across all M samples)
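A minimal sketch of this layout, assuming the expression values are held as an N x M nested list with genes as rows and samples as columns, and taking an expression profile to be a row and an expression signature to be a column. The gene names, sample names, and numbers are invented for illustration only.

```python
# Hypothetical expression matrix: N = 3 genes (rows) by M = 4 samples (columns).
# All names and values are made up for illustration.
genes = ["gene_1", "gene_2", "gene_3"]
samples = ["sample_1", "sample_2", "sample_3", "sample_4"]
expression = [
    [0.5, 1.2, -0.3, 0.8],
    [2.1, 1.9, 2.4, 2.0],
    [-1.0, -0.7, 0.1, -1.3],
]


def expression_profile(matrix, gene_index):
    """Return one row: a single gene measured across all M samples."""
    return list(matrix[gene_index])


def expression_signature(matrix, sample_index):
    """Return one column: all N genes measured in a single sample."""
    return [row[sample_index] for row in matrix]


profile = expression_profile(expression, 0)      # gene_1 across the samples
signature = expression_signature(expression, 1)  # sample_2 across the genes
print(genes[0], profile)
print(samples[1], signature)
```

With N in the thousands and M typically in the tens, the matrix is much taller than it is wide, which is the high-dimension, low-replication setting described earlier.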
10
  • It is assumed that the (logged) expression
    levels have been preprocessed with adjustment for
    array effects.

11
  • Majority of time on a data analysis project will
    be spent cleaning the data before doing any
    analysis.
  • Paradoxically, most statistical training assumes
    that the data arrive precleaned. Students, whether
    in PhD programs or an undergraduate introductory
    course, are not taught routinely to check data for
    accuracy or even to worry about it.
  • Exacerbating the problem further are claims
    by software vendors that their techniques can
    produce valid results no matter what the quality
    of the incoming data.
  • De Veaux and Hand (How to Lie with Bad
    Data, Statist. Sci., 2005)

12
Large-scale gene expression studies are not a
passing fashion, but are instead one aspect of
new work of biological experimentation, one
involving large-scale, high throughput assays.
Speed et al., 2002, Statistical Analysis of Gene
Expression Microarray Data, Chapman and Hall/CRC
13
Growth of microarray and microarray methodology
literature listed in PubMed from 1995 to 2003.
The category "all microarray papers" includes
those found by searching PubMed for microarray
OR "gene expression profiling". The category
"statistical microarray papers" includes those
found by searching PubMed for ("statistical
method" OR "statistical techniq" OR
"statistical approach") AND (microarray OR "gene
expression profiling").
14
Mehta et al. (Nature Genetics, Sept. 2004)
The field of expression data analysis is
particularly active with novel analysis
strategies and tools being published weekly, and
the value of many of these methods is
questionable. Some results produced by using
these methods are so anomalous that a breed of
forensic statisticians (Ambroise and McLachlan,
2002; Baggerly et al., 2003) who doggedly detect
and correct other HDB (high-dimensional biology)
investigators' prominent mistakes, has been
created.
19
Analyzing Microarray Gene Expression Data (UQ, Wiley)
Analysis of Microarray Gene Expression Data (Harvard, Kluwer)
The Analysis of Gene Expression Data (Johns Hopkins, Springer)
The Statistical Analysis of Gene Expression Data (Berkeley, CH)
23
Analyzing Microarray Gene Expression Data
Analysis of Microarray Gene Expression Data
The Analysis of Gene Expression Data
The Statistical Analysis of Gene Expression Data
Statistics for Microarrays
Design and Analysis of DNA Microarrays
Exploration and Analysis of Microarrays
Data Analysis Tools for DNA Microarrays
24
In the sequel, references to most of the material
presented can be found in my joint book,
McLachlan, Do, and Ambroise (2004),
Analyzing Microarray Gene Expression Data,
Hoboken, NJ: Wiley.
26
Contents
  1. Microarrays in Gene Expression Studies
  2. Cleaning and Normalization
  3. Some Cluster Analysis Methods
  4. Clustering of Tissue Samples
  5. Screening and Clustering of Genes
  6. Discriminant Analysis
  7. Supervised Classification of Tissue Samples
  8. Linking Microarray Data with Survival Analysis

27
Distribution of References by Year
Year     Number of references
2004     34
2003     73
2002     80
2001     93
2000     47 (67.8)
Total    481
28
mRNA Levels Indirectly Measure Gene Activity
  • Essentially every cell contains the same genes.
  • Type and amount of mRNA produced by a cell tells
    which genes are being expressed.
  • Cells differ in the genes which are active at
    any one time.
  • Gene expression is the transcription of
    DNA to mRNA.
  • mRNA is translated to proteins.

29
Technical Background
Two recent advances
  • Human Genome Project (also other sequenced
    genomes: mouse, dog, etc.)
  • DNA microarray technology -- works by exploiting
    the ability of a given mRNA molecule to bind
    specifically to (hybridize) the DNA template from
    which it originated

30
What is a DNA microarray?
  • Small, solid supports onto which the sequences
    from thousands (tens of thousands) of genes are
    attached at fixed locations.
  • They may be glass slides, or silicon chips or
    nylon membranes.
  • The DNA is printed, spotted or synthesized
    directly onto the support
  • The spots can be DNA, cDNA or oligonucleotides.

31
The microarray experiment
Spot DNA (known)
Sample (unknown)
32
Microarrays Indirectly Measure Levels of mRNA
  • mRNA is extracted from the cell
  • mRNA is reverse transcribed to cDNA (mRNA itself
    is unstable)
  • cDNA is labeled with fluorescent dye (the TARGET)
  • The sample is hybridized to known DNA sequences
    on the array (tens of thousands of genes)
    (the PROBE)
  • If present, complementary target binds to probe
    DNA (complementary base pairing)
  • Target bound to probe DNA fluoresces

33
The microarray experiment
  • mRNA from the cell (sample) is washed over the
    surface (HYBRIDIZATION)
  • Measure the amount of bound mRNA at each spot

Allows the measurement of expression for
thousands of genes from the amount of bound mRNA.
34
A Spotted cDNA Microarray Experiment
  • Compare the gene expression levels for
    two cell populations on a single microarray,
    e.g. tumour and normal cells

36
Microarray Image
Red: High expression in target labelled with cyanine 5 dye
Green: High expression in target labelled with cyanine 3 dye
Yellow: Similar expression in both target samples
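The colour coding above can be sketched as a simple rule on the log ratio of the two channel intensities. This is only an illustration: the function name and the two-fold cut-off are assumptions, not something stated on the slide, and real image software renders colour from the raw channel intensities rather than from a threshold.

```python
import math


def spot_colour(cy5_intensity, cy3_intensity, threshold=1.0):
    """Label a spot from its log2(Cy5 / Cy3) ratio.

    threshold is a hypothetical cut-off in log2 units (1.0 means two-fold).
    Both intensities are assumed to be positive.
    """
    log_ratio = math.log2(cy5_intensity / cy3_intensity)
    if log_ratio > threshold:
        return "red"     # higher expression in the Cy5-labelled target
    if log_ratio < -threshold:
        return "green"   # higher expression in the Cy3-labelled target
    return "yellow"      # similar expression in both targets


print(spot_colour(4000.0, 500.0))
print(spot_colour(500.0, 4000.0))
print(spot_colour(1000.0, 1100.0))
```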
37
Assumptions
Gene Expression -> mRNA -> Fluorescence Intensity
(1) Cellular mRNA levels directly reflect gene
expression.
(2) Intensity of bound target is a measure of the
abundance of the mRNA in the sample.
38
Experimental Error
Sample contamination
Poor quality/insufficient mRNA
Reverse transcription bias
Fluorescent labeling bias
Hybridization bias
Cross-linking of DNA (double strands)
Poor probe design (cross-hybridization)
Defective chips (scratches, degradation)
Background from non-specific hybridization
39
Why are microarrays important?
  • They contain a very large number of genes and
    are very small.
  • Compare gene expression within a single sample
    or in two different cell types or tissue samples
  • Examine expressions in a single sample on a
    genome-wide scale (GENOMICS)
  • Infer new gene functions and develop diagnostic
    tools, e.g. in cancer, where they provide a
    molecular view.

40
The Microarray Technologies
Spotted Microarray
  • cDNAs, clones, or short and long
    oligonucleotides deposited onto glass slides
  • Each gene (or EST) represented by its
    purified PCR product
  • Simultaneous analysis of two samples (treated
    vs untreated cells) provides internal control
  • Relative gene expressions
Affymetrix GeneChip
  • Short oligonucleotides synthesized in situ onto
    glass wafers
  • Each gene represented multiply, using 16-20
    (preferably non-overlapping) 25-mers
  • Each oligonucleotide has a single-base mismatch
    partner for internal control of hybridization
    specificity
  • Absolute gene expressions
Each with its own advantages and disadvantages
41
Pros and Cons of the Technologies
Spotted Microarray
  • Flexible and cheaper
  • Allows study of genes not yet sequenced (spotted
    ESTs can be used to discover new genes and their
    functions)
  • Variability in spot quality from slide to slide
  • Provides information only on relative gene
    expressions between cells or tissue samples
Affymetrix GeneChip
  • More expensive yet less flexible
  • Good for whole genome expression analysis where
    genome of that organism has been sequenced
  • High quality with little variability between
    slides
  • Gives a measure of absolute expression of genes

42
Aims of a Microarray Experiment
  • Observe changes in a gene in response to
    external stimuli (cell samples exposed to
    hormones, drugs, toxins)
  • Compare gene expressions between different
    tissue types (tumour vs normal cell samples)
  • Gain understanding of the function of unknown
    genes and of the disease process at the
    molecular level
  • Ultimately to use as tools in Clinical Medicine
    for diagnosis, prognosis and therapeutic
    management.

43
Importance of Experimental Design
  • Good DNA microarray experiments should have
    clear objectives.
  • They should not be performed as aimless data
    mining in search of unanticipated patterns that
    will provide answers to unasked questions
    (Richard Simon, BioTechniques 34: S16-S21, 2003)

44
Replicates
Technical replicates: arrays that have been
hybridized to the same biological source (using
the same treatment, protocols, etc.)
Biological replicates: arrays that have been
hybridized to different biological sources, but
with the same preparation, treatments, etc.
45
Extracting Data from the Microarray
  • Cleaning: image processing, filtering,
    missing value estimation
  • Normalization: remove sources of systematic
    variation.

Sample 1
Sample 2
Sample 3
Sample 4 etc
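The filtering, missing value estimation, and normalization steps can be sketched as below. This is a deliberately simple stand-in under stated assumptions, not the procedure used in the workshop: values are taken to be already logged, missing values are marked with None, imputation uses the row mean, and normalization is plain median centring of each array. All function names and data are hypothetical.

```python
from statistics import mean, median


def filter_genes(matrix, max_missing=1):
    """Keep genes (rows) with at most max_missing missing values (None)."""
    return [row for row in matrix
            if sum(value is None for value in row) <= max_missing]


def impute_row_mean(matrix):
    """Replace each missing value by the mean of the observed values in its row.

    Assumes every row still has at least one observed value after filtering.
    """
    imputed = []
    for row in matrix:
        fill = mean(value for value in row if value is not None)
        imputed.append([fill if value is None else value for value in row])
    return imputed


def median_centre_arrays(matrix):
    """Subtract each array's (column's) median so every array has median zero."""
    n_arrays = len(matrix[0])
    medians = [median(row[j] for row in matrix) for j in range(n_arrays)]
    return [[row[j] - medians[j] for j in range(n_arrays)] for row in matrix]


# Made-up logged expression values: 4 genes (rows) by 4 arrays (columns).
raw = [
    [1.0, 2.0, None, 3.0],
    [None, None, 0.5, 1.5],   # two missing values: removed by the filter
    [0.0, 1.0, 2.0, 3.0],
    [2.0, 3.0, 4.0, 5.0],
]
filtered = filter_genes(raw, max_missing=1)
imputed = impute_row_mean(filtered)
normalized = median_centre_arrays(imputed)
print(normalized)
```

Median centring only removes a constant shift between arrays; intensity-dependent effects would need a more elaborate normalization.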
48
Examples of spot imperfections: A. donut shape;
B. oval or pear shape; C. holey, heterogeneous
interior; D. high-intensity artifact; E. sickle
shape; F. scratches.
49
Gene Expressions from Measured Intensities
Spotted Microarray:
log2(Intensity Cy5 / Intensity Cy3)
Affymetrix:
(Perfect Match Intensity - Mismatch Intensity)
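As a rough sketch of the two calculations, assuming the spotted-array value is the base-2 log of the Cy5 to Cy3 intensity ratio and the Affymetrix value is the perfect match minus mismatch difference, averaged here over a gene's probe pairs. The averaging step, the function names, and the intensities are assumptions made for illustration.

```python
import math


def spotted_log_ratio(cy5_intensity, cy3_intensity):
    """Relative expression for one spot: log2(Cy5 / Cy3)."""
    return math.log2(cy5_intensity / cy3_intensity)


def affymetrix_average_difference(perfect_match, mismatch):
    """Mean of (PM - MM) over the probe pairs of one gene.

    Averaging over probe pairs is an assumption of this sketch.
    """
    differences = [pm - mm for pm, mm in zip(perfect_match, mismatch)]
    return sum(differences) / len(differences)


# Made-up intensities for illustration only.
log_ratio = spotted_log_ratio(800.0, 200.0)
average_difference = affymetrix_average_difference(
    [1200.0, 900.0, 1500.0],   # perfect match probes
    [200.0, 300.0, 400.0],     # corresponding mismatch probes
)
print(log_ratio)
print(average_difference)
```

A log ratio of 0 means equal intensity in the two channels; positive values mean higher Cy5 intensity and negative values higher Cy3 intensity.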
50
Data Transformation
Rocke and Durbin (2001), Munson (2001), Durbin et
al. (2002), and Huber et al. (2002)
51
Representation of Data from M Microarray
Experiments
            Sample 1   Sample 2   ...   Sample M
Gene 1
Gene 2
...
Gene N
Assume we have extracted gene expression values
from intensities.
Expression Signature: a column (all N genes in one sample)
Expression Profile: a row (one gene across all M samples)
Gene expressions can be shown as Heat Maps