Microarray analysis - PowerPoint PPT Presentation

Provided by: chucks96

1
Microarray analysis
  • Quantitation of Gene Expression
  • Expression Data to Networks

BIO520 Bioinformatics Jim Lund
2
Microarray data
  • Image quantitation.
  • Normalization
  • Find genes with significant expression
    differences
  • Annotation
  • Clustering, pattern analysis, network analysis

3
Sources of Non-Biological Variation
  • Dye bias: differences in heat and light
    sensitivity and in efficiency of dye incorporation
  • Differences in the amount of labeled cDNA
    hybridized to each channel in a microarray
    experiment (Channel is used to refer to a
    combination of a dye and a slide.)
  • Variation across replicate slides
  • Variation across hybridization conditions
  • Variation in scanning conditions
  • Variation among technicians doing the lab work.

4
Factors that affect the signal level
  • Amount of mRNA
  • Labeling efficiencies
  • Quality of the RNA
  • Laser/dye combination
  • Detection efficiency of photomultiplier or CCD

5
[Figure: HeLa vs. HepG2]
6
[Figure: HeLa vs. HepG2]
7
M vs. A Plot
$M = \log R - \log G$ (R = red intensity, G = green intensity)
$A = (\log G + \log R)/2$
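Both quantities can be computed per spot, as in the minimal Python sketch below. It rests on three assumptions. First, it uses base-2 logarithms, a common convention for two-colour arrays; the slide does not state the base. Second, the intensities must be positive and already background-corrected. Third, `ma_values` is a hypothetical helper name.

```python
import math

def ma_values(red, green):
    """Return (M, A) lists for paired red/green spot intensities.

    Hypothetical helper (illustrative name). Assumes positive,
    background-corrected intensities and log base 2.
    """
    m_vals, a_vals = [], []
    for r, g in zip(red, green):
        log_r, log_g = math.log2(r), math.log2(g)
        m_vals.append(log_r - log_g)          # M: log ratio, red over green
        a_vals.append((log_g + log_r) / 2)    # A: average log intensity
    return m_vals, a_vals

# Hypothetical intensities for three spots
m, a = ma_values([1024.0, 256.0, 64.0], [256.0, 256.0, 128.0])
print(m)  # [2.0, 0.0, -1.0]
print(a)  # [9.0, 8.0, 6.5]
```

Each spot is then drawn at (A, M). If the two channels are balanced, the cloud of mostly unchanged genes should centre on M = 0 across the whole range of A.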
8
M v A plots of chip pairs before normalization
9
M v A plots of chip pairs after quantile
normalization
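Quantile normalization gives every array the same distribution of values. The sketch below is a minimal pure-Python version under these assumptions:

- Each array is a list of spot values in the same gene order.
- Ties are broken by position rather than averaged.
- `quantile_normalize` is a hypothetical name.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one column per array).

    Hypothetical helper (illustrative name). Each value is replaced by the
    mean, across arrays, of the values sharing its rank; ties are broken by
    original position, a simplification of what some tools do.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all arrays must have the same number of spots")
    # Reference distribution: row-wise mean of the sorted columns
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(vals) / len(vals) for vals in zip(*sorted_cols)]
    normalized = []
    for col in columns:
        # Spot indices of this array in ascending order of value (its ranks)
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]
        normalized.append(out)
    return normalized

# Two hypothetical arrays with four spots each
result = quantile_normalize([[5.0, 2.0, 3.0, 4.0], [4.0, 1.0, 4.5, 2.0]])
print(result)  # [[4.75, 1.5, 2.5, 4.0], [4.0, 1.5, 4.75, 2.5]]
```

After normalization, both arrays contain exactly the same set of values; only their assignment to genes differs.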
10
Types of normalization
  • To total signal (linear normalization)
  • LOESS (LOcally WEighted polynomial regreSSion).
  • To housekeeping genes
  • To genomic DNA spots (Research Genetics) or mixed
    cDNAs
  • To internal spike-in controls
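The first option, normalization to total signal, amounts to a single scale factor per channel. Below is a minimal sketch of that idea. It assumes raw (not log) intensities for the two channels of one slide, and it assumes that most genes are unchanged. `total_signal_normalize` is a hypothetical name.

```python
def total_signal_normalize(red, green):
    """Scale the red channel so its total signal equals the green total.

    Hypothetical helper (illustrative name): global linear normalization
    with one scale factor, assuming most genes are not differentially
    expressed.
    """
    factor = sum(green) / sum(red)
    return [r * factor for r in red], list(green)

red_norm, green_norm = total_signal_normalize([200.0, 400.0, 600.0],
                                              [100.0, 200.0, 300.0])
print(red_norm)  # [100.0, 200.0, 300.0]
```

A single factor shifts every log ratio by the same constant. It therefore cannot remove intensity-dependent dye bias, which shows up as curvature on an M vs. A plot. That is the problem LOESS normalization addresses by fitting a local curve.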

11
Fold change: the crudest method of finding
differentially expressed genes
[Figure: HeLa vs. HepG2, with ">2-fold expression change" labels]
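A fold-change filter takes only a few lines. The sketch below flags genes whose normalized ratio (e.g. HeLa/HepG2) changes by more than 2-fold in either direction. It works on log2 ratios so that up- and down-regulation are treated symmetrically. `fold_change_hits` is a hypothetical name.

```python
import math

def fold_change_hits(ratios, threshold=2.0):
    """Return indices of genes changed by more than `threshold`-fold.

    Hypothetical helper (illustrative name). `ratios` are assumed to be
    normalized expression ratios; |log2 ratio| > log2(threshold) catches
    both up- and down-regulated genes.
    """
    cutoff = math.log2(threshold)
    return [i for i, r in enumerate(ratios) if abs(math.log2(r)) > cutoff]

print(fold_change_hits([2.5, 1.2, 0.25, 0.6, 4.0]))  # [0, 2, 4]
```

A fixed cutoff like this ignores how variable each gene is across replicates. That is why fold change is only the crudest method.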
12
What do we mean by differentially expressed?
  • Statistically, our gene's ratio stands out from
    those of the other genes.

[Figure: Distribution of average ratios for all genes; x-axis: log ratio, y-axis: number of genes]
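One crude way to ask whether a gene stands out from this distribution is to z-score its log ratio against all genes. This is an illustrative choice, not a method named on the slide. It assumes most genes are unchanged, so the genome-wide mean and standard deviation describe the background. `log_ratio_z_scores` is a hypothetical name.

```python
import math
import statistics

def log_ratio_z_scores(ratios):
    """Z-score each gene's log2 ratio against the distribution for all genes.

    Hypothetical helper (illustrative name). Assumes most genes are
    unchanged, so the genome-wide mean and standard deviation describe
    the 'not differentially expressed' background.
    """
    logs = [math.log2(r) for r in ratios]
    mu = statistics.mean(logs)
    sd = statistics.stdev(logs)
    return [(x - mu) / sd for x in logs]

# Hypothetical ratios: seven unchanged genes and one 8-fold up gene
z = log_ratio_z_scores([1.0] * 7 + [8.0])
print([i for i, v in enumerate(z) if abs(v) > 2])  # [7]
```

This still says nothing about replicate variability, which the factors on the next slide address.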
13
Finding differentially expressed genes: what affects
our certainty that a gene is up- or down-regulated?
  • Number of sample points
  • Difference in means
  • Standard deviations of the samples
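These three factors combine in a two-sample t statistic. It is shown here in Welch's form, which is an illustrative choice; the slides do not specify a variant. The symbols are the group means $\bar{x}_i$, standard deviations $s_i$ and numbers of replicates $n_i$:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

A larger difference in means or more replicates increases $|t|$, while larger standard deviations decrease it.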

14
Practical views on statistics
  • With appropriate biological replicates, it is
    possible to select statistically meaningful
    genes/patterns.
  • Sensitivity and selectivity are inversely
    related: e.g. increasing the selection of true
    positives WILL also produce more false positives
    and fewer false negatives.
  • False negatives are lost opportunities; false
    positives cost money and waste time.
  • Even with conservative statistics, a typical set
    of experiments yields more genes/pathways/patterns
    than one can sensibly follow up, so use
    conservative statistics to protect against false
    positives when designing follow-on experiments.

15
Statistical Tests
  • Student's t-test
  • Correct for multiple testing! (Holm-Bonferroni)
  • False discovery rate.
  • Significance Analysis of Microarrays (SAM)
    • http://www-stat.stanford.edu/~tibs/SAM/
  • ANOVA
  • Principal components analysis
  • Special methods for periodic patterns in data.
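The Holm-Bonferroni correction and a false-discovery-rate procedure can both be sketched in plain Python. The sketch assumes per-gene p-values have already been computed. For FDR it uses the Benjamini-Hochberg procedure, a standard choice that the slide does not name. Both function names are hypothetical.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Indices rejected by Holm's step-down procedure (controls the FWER).

    Hypothetical helper (illustrative name).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    rejected = []
    for step, idx in enumerate(order):
        # The (step+1)-th smallest p-value is compared with alpha / (m - step)
        if pvalues[idx] <= alpha / (m - step):
            rejected.append(idx)
        else:
            break  # stop at the first p-value that fails
    return sorted(rejected)

def benjamini_hochberg(pvalues, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure (controls the FDR).

    Hypothetical helper (illustrative name).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff = 0  # largest 1-based rank k with p_(k) <= k * alpha / m
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff = rank
    return sorted(order[:cutoff])

# Hypothetical per-gene p-values, e.g. from t-tests on replicate arrays
pvals = [0.03, 0.2, 0.001, 0.5, 0.012, 0.004]
print(holm_bonferroni(pvals))     # [2, 4, 5]
print(benjamini_hochberg(pvals))  # [0, 2, 4, 5]
```

On these p-values the FDR procedure keeps one more gene than Holm, because it uses a less conservative error criterion.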

16
Volcano plot: log(expr) vs. p-value
[Figure: volcano plot; axes: log(fold change) and p-value]
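The plotted coordinates are simple to compute. The sketch below follows the common convention of plotting -log10(p), so the most significant genes sit at the top, with log2 fold change on the other axis. That convention, the cutoffs, and the name `volcano_points` are illustrative assumptions.

```python
import math

def volcano_points(fold_changes, pvalues, fc_cutoff=2.0, p_cutoff=0.05):
    """Return (x, y, flagged) per gene for a volcano plot.

    Hypothetical helper (illustrative name). x = log2(fold change),
    y = -log10(p); a gene is flagged when it exceeds the fold-change
    cutoff in either direction AND has p below p_cutoff.
    """
    points = []
    for fc, p in zip(fold_changes, pvalues):
        x = math.log2(fc)
        y = -math.log10(p)
        flagged = abs(x) > math.log2(fc_cutoff) and p < p_cutoff
        points.append((x, y, flagged))
    return points

# Hypothetical fold changes and p-values for four genes
pts = volcano_points([4.0, 4.0, 1.1, 0.25], [0.001, 0.3, 0.0001, 0.01])
print([flagged for _, _, flagged in pts])  # [True, False, False, True]
```

The second gene has a large but not significant change. The third is highly significant, but its change is small. Only genes passing both criteria land in the plot's upper corners.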
17
Scatter plot showing genes with significant
p-values
18
Pattern finding
  • In many cases, the patterns of differential
    expression are the target (as opposed to specific
    genes)
  • Clustering or other approaches for pattern
    identification - find genes which behave
    similarly across all experiments or experiments
    which behave similarly across all genes
  • Classification - identify genes which best
    distinguish 2 or more classes.
  • The statistical reliability of the pattern or
    classifier is still an issue and similar
    considerations apply - e.g. cluster analysis of
    random noise will produce clusters which will be
    meaningless.

19
What is clustering?
  • Group similar objects together.
  • Genes with similar expression patterns.
  • Objects in the same cluster (group) are more
    similar to each other than objects in different
    clusters.

20
Clustering
  • What is clustering?
  • Similarity/distance metrics
  • Hierarchical clustering algorithms
    • Made popular by Stanford, i.e. Eisen et al. 1998
  • K-means
    • Made popular by many groups, e.g. Tavazoie et al.
      1999
  • Self-organizing map (SOM)
    • Made popular by Whitehead, i.e. Tamayo et al.
      1999
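K-means alternates between two steps. It assigns each gene to the nearest centroid, then recomputes each centroid as the mean of its genes. The sketch below implements Lloyd's algorithm with Euclidean distance under these assumptions:

- Profiles are equal-length lists of log ratios.
- The first k profiles seed the centroids, a deterministic simplification.
- `kmeans` is a hypothetical name.

```python
import math

def kmeans(profiles, k, max_iter=100):
    """Cluster expression profiles with Lloyd's k-means algorithm.

    Hypothetical helper (illustrative name). Simplification: the first k
    profiles seed the centroids; real tools typically use random or
    k-means++ starts with several restarts.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid (Euclidean)
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in profiles
        ]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # leave an emptied cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical log-ratio profiles for five genes across three conditions
genes = [[1.0, 1.0, 1.0], [5.0, 5.0, 5.0], [1.2, 0.9, 1.1],
         [5.1, 4.8, 5.2], [0.8, 1.1, 0.9]]
labels, centroids = kmeans(genes, k=2)
print(labels)  # [0, 1, 0, 1, 0]
```

Note that k must be chosen in advance, and results can depend on the starting centroids.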

21
Typical Tools
  • Expression NTI
  • GeneSpring
  • Affymetrix GeneChip Operating System (GCOS)
  • Cluster/Treeview
  • R statistics package with microarray analysis
    libraries.

22
How to define similarity?
[Figure: a raw matrix (genes 1..n × experiments 1..p) is converted into a similarity matrix (genes × genes, n × n)]
  • Similarity metric
    • A measure of pairwise similarity or
      dissimilarity
    • Examples
      • Correlation coefficient
      • Euclidean distance
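Converting the raw matrix into a similarity matrix means applying the chosen metric to every pair of gene rows. The sketch below uses the Pearson correlation coefficient; Euclidean distance could be substituted as a dissimilarity. It assumes no gene has a constant profile. `pearson` and `similarity_matrix` are hypothetical names.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles.

    Hypothetical helper (illustrative name). Undefined, and raises
    ZeroDivisionError here, if either profile is constant.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def similarity_matrix(raw):
    """Convert an n x p raw matrix (genes x experiments) to an n x n matrix."""
    return [[pearson(g1, g2) for g2 in raw] for g1 in raw]

raw = [
    [1.0, 2.0, 3.0],  # hypothetical gene X
    [2.0, 4.0, 6.0],  # same shape as X at twice the level
    [3.0, 2.0, 1.0],  # mirror image of X
]
sim = similarity_matrix(raw)
for row in sim:
    print([round(v, 2) for v in row])
```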

23
Similarity metrics
  • Euclidean distance
  • Correlation coefficient

[Figure: Euclidean clustering groups by magnitude; correlation clustering groups by direction]
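The magnitude-versus-direction contrast shows up with three toy profiles. In the sketch below, one gene has the same shape as the first at ten times the level. Another has a similar level but the opposite shape. The profiles and helper names are hypothetical.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two profiles: sensitive to magnitude."""
    return math.dist(x, y)

def pearson(x, y):
    """Pearson correlation: compares the shape (direction) of two profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

low = [1.0, 2.0, 1.0]       # hypothetical gene peaking in condition 2
high = [10.0, 20.0, 10.0]   # same shape, ten times the level
mirror = [2.0, 1.0, 2.0]    # similar level, opposite shape

print(euclidean(low, high), euclidean(low, mirror))  # large vs. small
print(pearson(low, high), pearson(low, mirror))      # about +1 vs. about -1
```

Euclidean distance would group `low` with `mirror`, while correlation would group it with `high`. This is why profiles are often log-transformed or standardized before Euclidean clustering.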
24
Sporulation example
25
Sporulation example
26
Self-organizing maps (SOM), Kohonen 1995
  • Basic idea
    • Map high-dimensional data onto a 2D grid of nodes
    • Neighboring nodes are more similar than nodes
      far apart
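The basic SOM idea fits in a few dozen lines. Each grid node holds a weight vector. For every input, the best-matching node and its grid neighbours are pulled toward that input. The learning rate and neighbourhood radius shrink as training proceeds.

This is a minimal sketch under stated assumptions:

- Neighbourhoods are Gaussian.
- Rate and radius decay linearly.
- Nodes are initialised from randomly chosen inputs.
- `train_som` and `assign` are hypothetical names.

```python
import math
import random

def train_som(data, rows, cols, epochs=200, lr0=0.5, seed=0):
    """Train a small self-organizing map on a rows x cols grid.

    Hypothetical helper (illustrative name). Gaussian neighbourhood with
    linearly decaying learning rate and radius (assumed schedule).
    """
    rng = random.Random(seed)
    dim = len(data[0])
    radius0 = max(rows, cols) / 2
    # Initialise each node's weights from a randomly chosen input vector
    nodes = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over inputs
            frac = step / total_steps
            lr = lr0 * (1 - frac)
            radius = max(radius0 * (1 - frac), 0.5)
            # Best-matching unit: node whose weights are closest to the input
            bmu = min(nodes, key=lambda n: math.dist(nodes[n], x))
            for (r, c), w in nodes.items():
                grid_d = math.dist((r, c), bmu)
                # Nodes near the BMU on the grid move more than distant ones
                h = math.exp(-(grid_d ** 2) / (2 * radius ** 2))
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
            step += 1
    return nodes

def assign(data, nodes):
    """Map each input vector to the grid coordinates of its best-matching node."""
    return [min(nodes, key=lambda n: math.dist(nodes[n], x)) for x in data]

# Two hypothetical groups of two-condition expression profiles
profiles = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
nodes = train_som(profiles, rows=2, cols=2)
print(assign(profiles, nodes))
```

Genes mapped to the same node form a cluster. Clusters on adjacent nodes should have similar average profiles.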

27
Self-organizing maps (SOM)
28
SOM Clusters
29
Inference
  • NDT80 transcription factor
    • Can account for control of many, but not ALL,
      genes with this pattern
  • How do we find the other factor(s)?
    • Infer binding site
    • DNA-binding protein selection?

30
Inferences from Expression
  • Pathways not known to be involved
  • Ontology?
  • Novel genes involved in a known pathway
  • Like and unlike tissues

31
Transcription Factors/Regulatory Networks
  • Identify co-regulated genes
  • Search for common motifs
    • Evaluate known motifs/factors
    • Search for new ones
  • Programs: MEME, etc.
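Tools such as MEME fit probabilistic motif models. As a much cruder illustration of searching for common motifs, the sketch below does two things. It reports exact k-mers present in most promoters of a co-regulated group, and it shows what fraction of background promoters contain each one. The sequences, `k`, the threshold and the function names are all hypothetical.

```python
from collections import Counter

def kmer_presence(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once.

    Hypothetical helper (illustrative name).
    """
    counts = Counter()
    for seq in sequences:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def shared_kmers(cluster_seqs, background_seqs, k, min_fraction=0.8):
    """Return {k-mer: background fraction} for k-mers found in at least
    `min_fraction` of the co-regulated promoters (hypothetical helper)."""
    fg = kmer_presence(cluster_seqs, k)
    bg = kmer_presence(background_seqs, k)
    return {
        kmer: bg[kmer] / len(background_seqs)
        for kmer, n in fg.items()
        if n / len(cluster_seqs) >= min_fraction
    }

# Hypothetical promoter fragments (toy data, not real yeast sequence)
cluster = ["ATGCACAAATT", "GGCACAAAGTC", "TTTCACAAACG"]
background = ["ATATATATAT", "GCGCGCGCGC", "CACGAAATTT"]
print(shared_kmers(cluster, background, k=6))  # {'CACAAA': 0.0}
```

A k-mer shared by the cluster but rare in the background is a candidate binding site to evaluate further.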

32
mRNA-protein Correlation
  • YPD should have relevant data
  • Will yeast be typical?
  • Electrophoresis 18:533
    • 23 proteins on 2D gels
    • r = 0.48 for mRNA-protein
  • Post-transcriptional and post-translational
    regulation are important!

33
Drosophila Fusion Project
Lots of introns
  • Exon GFP vector
  • Good site?
  • Fluorescent sort
  • Image

Lynne Cooley
34
Developmental Localization
35
Other microarray formats
  • Single nucleotide polymorphism (SNP) chips
    • Oligos with each of the 4 nucleotides at each SNP
  • Chromatin IP chips (ChIP-chip)
    • Determine transcription factor binding sites
    • Promoter DNA on the chip
  • Alternative splicing chips
    • Long oligos covering alternatively spliced
      exons, or all exons
  • Genome tiling chips

36
ChIP-chip: Identification of Transcription Factor
Binding Sites
  • Cross-link transcription factors to DNA with
    formaldehyde
  • Pull out the transcription factor of interest via
    immunoprecipitation with an antibody, or by
    tagging the factor of interest with an isolatable
    epitope (e.g. GST fusion).
  • Fractionate the DNA associated with the
    transcription factor, reverse the cross-links,
    label, and hybridize to an array of promoter DNA.
  • Brown et al. (2001) Nature 409:533-8

37
Analysis of TF Binding Sites
38
On to Proteomics
DNA → RNA → Protein