Microarray analysis presentation

Title: Microarray analysis

1
Microarray analysis

Quantitation of Gene Expression
Expression Data to Networks

BIO520 Bioinformatics Jim Lund
2
Microarray data

Image quantitation.
Normalization
Find genes with significant expression
differences
Annotation
Clustering, pattern analysis, network analysis

3
Sources of Non-Biological Variation

Dye bias: differences in heat and light
sensitivity, efficiency of dye incorporation
Differences in the amount of labeled cDNA
hybridized to each channel in a microarray
experiment (Channel is used to refer to a
combination of a dye and a slide.)
Variation across replicate slides
Variation across hybridization conditions
Variation in scanning conditions
Variation among technicians doing the lab work.

4
Factors that affect the signal level

Amount of mRNA
Labeling efficiencies
Quality of the RNA
Laser/dye combination
Detection efficiency of photomultiplier or CCD

5
Hela HepG2
6
Hela HepG2
7
M vs. A Plot
M = log(Red) - log(Green)
A = (log(Green) + log(Red)) / 2
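
A minimal sketch of how M (the log ratio) and A (the average log intensity) could be computed for each spot. Base-2 logarithms and the example intensities are assumptions made for illustration; the slide does not state a log base.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from its red and green intensities."""
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g        # M: log ratio of the two channels
    a = (log_g + log_r) / 2  # A: average log intensity
    return m, a

# Made-up intensities (red, green), for illustration only
spots = [(1024.0, 256.0), (512.0, 512.0), (64.0, 256.0)]
ma = [ma_values(r, g) for r, g in spots]
```

A spot with equal signal in both channels has M = 0, so a dye bias shows up as a cloud of points whose center drifts away from M = 0 as A changes.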
8
M v A plots of chip pairs before normalization
9
M v A plots of chip pairs after quantile
normalization
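
Quantile normalization forces every chip to share the same intensity distribution. The following is a simplified sketch, assuming each chip is a list of intensities for the same genes in the same order; the function name is hypothetical and ties are broken by position rather than averaged.

```python
def quantile_normalize(chips):
    """chips: list of equal-length intensity lists, one list per chip."""
    n = len(chips[0])
    sorted_chips = [sorted(chip) for chip in chips]
    # Reference distribution: mean of the k-th smallest value across chips
    reference = [sum(s[k] for s in sorted_chips) / len(chips) for k in range(n)]
    normalized = []
    for chip in chips:
        # Gene indices ordered from lowest to highest intensity on this chip
        order = sorted(range(n), key=lambda i: chip[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # replace value by the reference quantile
        normalized.append(out)
    return normalized

# Made-up intensities for two chips and three genes
normalized = quantile_normalize([[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]])
```

After this step each chip contains exactly the same set of values; only the assignment of values to genes differs between chips.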
10
Types of normalization

To total signal (linear normalization)
LOESS (LOcally WEighted polynomial regreSSion).
To housekeeping genes
To genomic DNA spots (Research Genetics) or mixed
cDNAs
To internal spikes
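
As an illustration of the first item, normalization to total signal can be sketched as a single scale factor that makes the summed intensity of the two channels equal. The function name and the choice to rescale the red channel (leaving green unchanged) are assumptions for this example.

```python
def normalize_to_total(red, green):
    """Scale the red channel so its total signal matches the green channel."""
    scale = sum(green) / sum(red)  # one linear factor for the whole array
    return [r * scale for r in red], list(green)

# Made-up intensities: the red channel is uniformly twice as bright
red_n, green_n = normalize_to_total([100.0, 200.0, 300.0], [50.0, 100.0, 150.0])
```

Because one factor is applied to every spot, this corrects an overall brightness difference but not an intensity-dependent bias, which is the case LOESS is meant to handle.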

11
Fold change: the crudest method of finding
differentially expressed genes
[Figure: Hela vs. HepG2, marking >2-fold expression change]
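
A minimal sketch of a fold-change filter, assuming normalized intensities are held in two dictionaries keyed by gene name. The gene names, values, and function name are made up for illustration.

```python
def fold_change_calls(hela, hepg2, threshold=2.0):
    """Label genes whose HepG2/Hela ratio exceeds the threshold in either direction."""
    calls = {}
    for gene, hela_value in hela.items():
        ratio = hepg2[gene] / hela_value
        if ratio > threshold:
            calls[gene] = "up"
        elif ratio < 1.0 / threshold:
            calls[gene] = "down"
    return calls

# Hypothetical normalized intensities
hela = {"geneA": 100.0, "geneB": 100.0, "geneC": 400.0}
hepg2 = {"geneA": 250.0, "geneB": 120.0, "geneC": 100.0}
calls = fold_change_calls(hela, hepg2)
```

The filter uses no replicate information, so it cannot tell a reproducible 2-fold change from a noisy one; this is why it is called the crudest method.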
12
What do we mean by differentially expressed?

Statistically, our gene is different from the
other genes.

Distribution of average ratios for all genes
Number of genes
Log ratio
13
Finding differentially expressed genes: what
affects our certainty that a gene is up- or
down-regulated?

Number of sample points
Difference in means
Standard deviations of sample
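
These three factors are exactly the ingredients of a t statistic. The sketch below uses the unequal-variance (Welch) form as one possible choice, not necessarily the form used in the course; the replicate values are made up.

```python
import math
import statistics

def welch_t(x, y):
    """t statistic: difference in means divided by its standard error."""
    mean_diff = statistics.mean(x) - statistics.mean(y)
    # Standard error shrinks with more samples and grows with larger variance
    se = math.sqrt(statistics.variance(x) / len(x) + statistics.variance(y) / len(y))
    return mean_diff / se

# Hypothetical log expression values for one gene in two conditions
t_three = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
t_six = welch_t([1.0, 2.0, 3.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 4.0, 5.0, 6.0])
```

With the same difference in means and similar spread, doubling the number of replicates gives a larger absolute t value, which is the sense in which more sample points increase certainty.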

14
Practical views on statistics

With appropriate biological replicates, it is
possible to select statistically meaningful
genes/patterns.
Sensitivity and selectivity are inversely
related - e.g. increased selection of true
positives WILL result in more false positives and
fewer false negatives.
False negatives are lost opportunities; false
positives cost money and waste time.
A typical set of experiments treated with
conservative statistics typically results in more
genes/pathways/patterns than one can sensibly
follow - so use conservative statistics to
protect against false positives when designing
follow-on experiments.

15
Statistical Tests

Student's t-test
Correct for multiple testing! (Holm-Bonferroni)
False discovery rate.
Significance Analysis of Microarrays (SAM)
http://www-stat.stanford.edu/~tibs/SAM/
ANOVA
Principal components analysis
Special methods for periodic patterns in data.
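
To make the multiple-testing items concrete, here is a small sketch of the Holm-Bonferroni step-down procedure next to the Benjamini-Hochberg procedure, a common way to control the false discovery rate. The slide names no specific FDR method, so Benjamini-Hochberg is an assumption, as are the p-values.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if pvalues[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # first failure: this and all larger p-values are retained
    return reject

def benjamini_hochberg(pvalues, alpha=0.05):
    """Reject the k smallest p-values, k being the largest rank with p <= rank/m * alpha."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * alpha:
            cutoff = rank
    reject = [False] * m
    for idx in order[:cutoff]:
        reject[idx] = True
    return reject

# Made-up p-values for four genes
pvals = [0.01, 0.04, 0.03, 0.005]
holm = holm_bonferroni(pvals)
bh = benjamini_hochberg(pvals)
```

On these made-up values Holm-Bonferroni keeps two genes while Benjamini-Hochberg keeps all four, which illustrates the trade between strict error control and the number of genes retained.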

16
Volcano plot: log(expr) vs p-value
[Figure axes: p-value, Log(fold change)]
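
A sketch of how the coordinates of one point on a volcano plot might be computed. Plotting log2 fold change against -log10 of the p-value is a common convention and is assumed here; the function name and the example values are hypothetical.

```python
import math

def volcano_point(fold_change, p_value):
    """Return (x, y) = (log2 fold change, -log10 p-value) for one gene."""
    return math.log2(fold_change), -math.log10(p_value)

# A made-up gene: 4-fold up-regulated with p = 0.001
x, y = volcano_point(4.0, 0.001)
```

Genes that are both strongly changed and statistically significant end up in the upper left and upper right corners of the plot.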
17
Scatter plot showing genes with significant
p-values
18
Pattern finding

In many cases, the patterns of differential
expression are the target (as opposed to specific
genes)
Clustering or other approaches for pattern
identification - find genes which behave
similarly across all experiments or experiments
which behave similarly across all genes
Classification - identify genes which best
distinguish 2 or more classes.
The statistical reliability of the pattern or
classifier is still an issue and similar
considerations apply - e.g. cluster analysis of
random noise will produce clusters which will be
meaningless.

19
What is clustering?

Group similar objects together.
Genes with similar expression patterns.
Objects in the same cluster (group) are more
similar to each other than objects in different
clusters.

20
Clustering

What is clustering?
Similarity/distance metrics
Hierarchical clustering algorithms
Made popular by Stanford, i.e. Eisen et al. 1998
K-means
Made popular by many groups, e.g. Tavazoie et al.
1999
Self-organizing map (SOM)
Made popular by Whitehead, i.e. Tamayo et al.
1999
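
As a rough illustration of one of these algorithms, a bare-bones K-means can be sketched in a few lines. This is a simplified version, not the implementation from any of the cited papers: it uses Euclidean distance, takes the first k profiles as starting centroids, and runs a fixed number of iterations. All of these are assumptions, as are the example profiles.

```python
import math

def kmeans(points, k, iterations=20):
    """Cluster expression profiles; returns (assignment, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive start: first k profiles
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each profile joins its nearest centroid
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Made-up profiles: three low-expression genes and three high-expression genes
profiles = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [10.0, 11.0], [1.0, 0.0], [11.0, 10.0]]
assignment, centroids = kmeans(profiles, k=2)
```

Unlike hierarchical clustering, K-means needs the number of clusters chosen in advance, and the result can depend on the starting centroids.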

21
Typical Tools

Expression NTI
GeneSpring
Affymetrix GeneChip Operating System (GCOS)
Cluster/Treeview
R statistics package with microarray analysis
libraries.

22
How to define similarity?
[Figure: the raw matrix X (n genes x p experiments) is converted into a similarity matrix Y (n genes x n genes)]

Similarity metric
A measure of pairwise similarity or
dissimilarity
Examples
Correlation coefficient
Euclidean distance

23
Similarity metrics

Euclidean distance
Correlation coefficient

Euclidean clustering: magnitude and direction
Correlation clustering: direction
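
The difference between the two metrics can be seen on a toy example. The sketch below assumes the Pearson correlation coefficient is the one meant; the three gene profiles are made up.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation coefficient between two expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]  # same direction as gene_a, 10x the magnitude
gene_c = [4.0, 3.0, 2.0, 1.0]      # similar magnitude to gene_a, opposite direction
```

By correlation, gene_a and gene_b are near-perfect matches and gene_c is the opposite pattern; by Euclidean distance, gene_c is the closer neighbor of gene_a because its values are similar in size.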
24
Sporulation-example
25
Sporulation-example
26
Self-organizing maps (SOM), Kohonen 1995

Basic idea
Map high-dimensional data onto a 2D grid of nodes
Neighboring nodes are more similar than points
far away
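
A compact sketch of the basic idea, assuming a rectangular grid, a Gaussian neighborhood, and linearly decaying learning rate and radius. These choices, the parameter names, and the example data are illustrative assumptions rather than the settings of any particular SOM package.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    rng = random.Random(seed)
    radius0 = max(rows, cols) / 2
    # Start each node at a randomly chosen data point
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)                    # learning rate decays
            radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_dist = math.dist(node, bmu)
                if grid_dist <= radius:
                    # Nodes near the winner on the grid are pulled toward x
                    influence = math.exp(-grid_dist ** 2 / (2 * radius ** 2))
                    for i in range(len(w)):
                        w[i] += lr * influence * (x[i] - w[i])
            step += 1
    return weights

# Made-up two-dimensional profiles forming two loose groups
data = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
som = train_som(data, rows=2, cols=2)
```

Because the winning node drags its grid neighbors along, nodes that sit next to each other on the grid end up with similar weight vectors, which is what gives the map its spatial organization.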

27
Self-organizing maps (SOM)
28
SOM Clusters
29
Inference

NDT80 transcription factor
Can account for control of many, not ALL, genes
with pattern
How do we find the other factor(s)?
Infer binding site
DNA binding protein selection?

30
Inferences from Expression

Pathways not known to be involved
Ontology?
Novel genes involved in a known pathway
like and unlike tissues

31
Transcription Factors: Regulatory Networks

ID co-regulated genes
Search for common motifs
Evaluate known motifs/factors
Search for new ones.
Programs: MEME, etc.
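
The search for common motifs can be illustrated with a deliberately naive approach: count how many promoters of a co-regulated gene set contain each exact k-mer. This is not how MEME works (MEME fits probabilistic motif models); the function name, gene names, and sequences below are made up for the example.

```python
from collections import Counter

def shared_kmers(promoters, k=6):
    """Count, for each k-mer, the number of promoter sequences containing it."""
    counts = Counter()
    for seq in promoters.values():
        # Use a set so a k-mer is counted once per promoter
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts

# Hypothetical promoter fragments from three co-regulated genes
promoters = {
    "geneA": "ACGTTTGACAGG",
    "geneB": "TTGACACCCGTA",
    "geneC": "GGGATTGACATT",
}
counts = shared_kmers(promoters)
top_kmer, n_promoters = counts.most_common(1)[0]
```

Exact matching misses degenerate sites, which is one reason dedicated motif-finding programs are used in practice.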

32
mRNA-protein Correlation

YPD should have relevant data
will yeast be typical?
Electrophoresis 18:533
23 proteins on 2D gels
r = 0.48 for mRNA vs. protein
Post-transcriptional and post-translational
regulation important!

33
Drosophila Fusion Project
Lots of introns

Exon GFP vector
Good site?
Fluorescent sort
Image

Lynne Cooley
34
Developmental Localization
35
Other microarray formats

Single nucleotide polymorphism (SNP) chips
Oligos with each of 4 nt at each SNP.
Chromatin IP chips (ChIP-chip)
Determine transcription factor binding sites
Promoter DNA on the chip.
Alternative splicing chips
Long oligos, covering alternatively spliced
exons, or all exons.
Genome tiling chips

36
ChIP-chip: Identification of Transcription Factor
Binding Sites

Cross link transcription factors to DNA with
formaldehyde
Pull out transcription factor of interest via
immunoprecipitation with an antibody or by
tagging the factor of interest with an isolatable
epitope (e.g. GST fusion).
Fractionate the DNA associated with the
transcription factor, reverse the cross links,
label and hybridize to an array of promoter DNA.
Brown et al. (2001) Nature 409:533-8

37
Analysis of TF Binding Sites
38
On to Proteomics
DNA -> RNA -> Protein
