Microarrays

1
Microarrays
Carsten Daub, www.mpimp-golm.mpg.de, daub_at_mpimp-golm.mpg.de
2
Microarrays
3
Microarrays
  • Short Introduction to Microarray Technology
  • The data matrix
  • Distance metrics
  • Clustering techniques
  • Relevance of networks

4
Microarrays
[Figure: DNA (in the cell) -> RNA polymerase -> mRNA -> reverse transcriptase with labeled nucleotides -> labeled cDNA]
5
Microarrays
  • The concentration of labels in the cDNA is
    measured!
  • Assumptions for the interpretation of microarray
    data:
      • mRNA is stable over time until it is
        reverse-transcribed into cDNA
      • The concentration of cDNA equals the
        concentration of mRNA
      • The labeled nucleotides are incorporated at
        random
  • The labels are often fluorophores with a cyanine
    structure

[Figure: Cy3 label]
6
Microarrays
[Figure: labeled cDNA as probe, target on the array]
  • cDNA microarrays
      • Immobilisation of whole cDNA
      • cDNA is obtained via bacteria
      • Spotted by inkjet technology
      • cDNA libraries are needed
  • Oligonucleotide microarrays
      • Immobilisation of characteristic sequences
      • Sequences are designed
      • Sequences are synthesized on the chip
      • Several commercial products are available

7
Comparing two situations
cDNA Microarrays
Situation A: treated probe
Situation B: wild type
(A reference is needed to compare data from
different arrays.) A mixture of A and B is spotted.
Intensities are extracted with image analysis
software.
8
Comparing two situations
  • 0 < ratio < Inf
  • -Inf < log2(ratio) < Inf
  • where
      • log2(ratio) > 0: increase
      • log2(ratio) < 0: decrease
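
A minimal sketch of the log2 transformation described above. The function name and the example intensities are illustrative assumptions, not taken from the slides.

```python
import math

def log2_ratio(intensity_a, intensity_b):
    """Return log2(A / B) for two positive spot intensities."""
    if intensity_a <= 0 or intensity_b <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(intensity_a / intensity_b)

# Hypothetical intensities (situation A, situation B) for three genes.
spots = {"gene1": (800.0, 200.0), "gene2": (150.0, 600.0), "gene3": (500.0, 500.0)}
for name, (a, b) in spots.items():
    # Positive values mean an increase in A relative to B, negative a decrease.
    print(name, log2_ratio(a, b))
```

The log scale treats a fourfold increase (+2) and a fourfold decrease (-2) symmetrically, which the raw ratio (4 versus 0.25) does not.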

9
Absolute Intensities
10
Data Matrix
11
Data Normalisation
  • Background noise
      • The background is determined around each spot
      • Some spots are left empty to determine the
        background
  • There are several different procedures to
    subtract the background
      • The background around a point is subtracted
        for that point
      • The average background is subtracted from each
        point
      • A background profile is determined for the
        whole array; it can also be used to detect
        systematic errors of the array
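
The first two subtraction procedures can be sketched as follows. This is an illustration only: the function names are hypothetical, and clipping negative results at zero is an assumption that the slides do not state.

```python
def subtract_local_background(signal, background):
    """Subtract each spot's own surrounding background (clipped at zero)."""
    return [max(s - b, 0.0) for s, b in zip(signal, background)]

def subtract_average_background(signal, background):
    """Subtract the array-wide mean background from every spot (clipped at zero)."""
    mean_background = sum(background) / len(background)
    return [max(s - mean_background, 0.0) for s in signal]

# Hypothetical spot intensities and the background measured around each spot.
signal = [100.0, 50.0, 20.0]
background = [10.0, 20.0, 30.0]
print(subtract_local_background(signal, background))
print(subtract_average_background(signal, background))
```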

12
Data Normalisation
  • Correct for
      • Differences in labelling and detection
        efficiencies
      • Differences in the quantity of initial mRNA
  • Some widely used techniques
      • Total intensity normalisation
          • Assumption: the quantity of initial mRNA
            is similar for both labelled samples
          • Total integrated intensities are equal for
            both samples
          • Re-scale intensities for each gene
      • Normalisation using regression techniques
      • Normalisation using ratio statistics
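
Total intensity normalisation can be sketched as below. Which channel gets re-scaled (Cy5 here) and the function name are assumptions made for the example.

```python
def total_intensity_normalise(cy3, cy5):
    """Re-scale the Cy5 channel so both channels have the same total intensity."""
    factor = sum(cy3) / sum(cy5)
    return [value * factor for value in cy5], factor

# Hypothetical intensities for three genes in both channels.
cy3 = [100.0, 200.0, 300.0]
cy5 = [50.0, 100.0, 150.0]
scaled_cy5, factor = total_intensity_normalise(cy3, cy5)
print(factor, scaled_cy5)
```

The single factor is applied to every gene, which is only justified under the stated assumption that both samples contain a similar quantity of initial mRNA.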

13
Data Normalisation
  • Some widely used techniques (continued)
      • Normalisation using regression techniques
          • Assumption: mRNA derived from closely
            related samples is used
          • The scatterplot of int(Cy5) vs. int(Cy3) has
            a diagonal part with slope 1
          • Re-scale intensities for each gene
          • Local regression techniques are also
            available for non-linear scatterplots
      • Normalisation using ratio statistics
          • Confidence limits are calculated via
            statistics
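
A rough sketch of the regression idea: fit a straight line to int(Cy5) vs. int(Cy3) and re-scale Cy5 so that the fitted slope becomes 1. Fitting a line through the origin is a simplifying assumption made for this example; the slides do not specify the regression model, and local regression is not covered here.

```python
def regression_slope(cy3, cy5):
    """Least-squares slope of a line through the origin fitted to cy5 vs. cy3."""
    numerator = sum(x * y for x, y in zip(cy3, cy5))
    denominator = sum(x * x for x in cy3)
    return numerator / denominator

def regression_normalise(cy3, cy5):
    """Divide Cy5 by the fitted slope so the scatterplot follows slope 1."""
    slope = regression_slope(cy3, cy5)
    return [y / slope for y in cy5]

# Hypothetical data in which Cy5 is detected half as efficiently as Cy3.
cy3 = [100.0, 200.0, 400.0]
cy5 = [50.0, 100.0, 200.0]
normalised = regression_normalise(cy3, cy5)
print(regression_slope(cy3, cy5), normalised)
```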

14
Data Normalisation
15
Data Normalisation
16
Undefined values
  • Gene expression datasets often contain undefined
    values.
      • The background and the signal give similar
        intensities
      • The surface of the chip is not planar (cDNA
        chips)
      • The probe is not properly fixed on the chip
      • The hybridisation step did not work properly
      • The probe was not washed away properly
  • Undefined values can be
      • Discarded -> leads to problems in data analysis
      • Replaced with averages of rows/columns
      • Replaced by zero
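
The two replacement strategies can be sketched as follows. Representing undefined values as None, the function names, and the fallback to zero for rows without any defined value are assumptions made for the example.

```python
def impute_row_means(matrix):
    """Replace None in each row (gene) by the mean of that row's defined values."""
    result = []
    for row in matrix:
        defined = [v for v in row if v is not None]
        # Assumption: a row with no defined value at all falls back to zero.
        fill = sum(defined) / len(defined) if defined else 0.0
        result.append([fill if v is None else v for v in row])
    return result

def impute_zero(matrix):
    """Replace every None by zero."""
    return [[0.0 if v is None else v for v in row] for row in matrix]

# Hypothetical data matrix: rows are genes, columns are experiments.
data = [[1.0, None, 3.0], [None, None, None]]
print(impute_row_means(data))
print(impute_zero(data))
```

Averaging over columns instead of rows works the same way on the transposed matrix.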

17
Similarity of Genes
  • Co-regulated genes show a similar behaviour
  • There exists no a priori definition of similarity
  • Different similarity measures are used
  • Different patterns are seen with different
    similarity measures

18
Similarity of Genes
  • Example
  • An expression vector is defined for each gene
  • The expression values are the dimensions (x1,x2,
    ..., xn) in expression space
  • Each gene is represented as one n-dimensional
    point in expression space
  • Two genes with similar expression behaviour are
    spatially nearby
  • Similarity is inversely proportional to spatial
    distance

19
Similarity of Genes
  • Example (continued)
  • Different clustering algorithms can be applied to
    find clusters of genes

20
Similarity of Experiments
  • Instead of clustering genes it is also possible
    to cluster experiments
  • Each experiment is regarded as an experiment
    vector in the experiment space

21
Visualisation of Data Matrix
  • The data matrix can be visualised to get an
    impression of gene expression behaviour.
  • The order of genes in data matrix can be
    reorganised for better recognition of expression
    patterns.

22
Distance Metrics
  • For clustering algorithms the calculation of a
    distance between gene vectors or experiment
    vectors is a necessary step
  • Distance metrics can be classified as
      • Metric distances
      • Semi-metric distances
  • Metric distances satisfy
      • 1) d_ab > 0 for a != b
      • 2) d_ab = d_ba
      • 3) d_aa = 0
      • 4) d_ab <= d_ac + d_cb
  • Semi-metric distances obey 1) to 3) but fail 4)
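
The four conditions above can be checked mechanically on a small distance matrix. The sketch below is illustrative: the function name is hypothetical and the tolerance is an assumption to absorb rounding errors. The last condition is the triangle inequality.

```python
from itertools import product

def check_distance_conditions(d, tol=1e-12):
    """Check positivity, symmetry, zero diagonal and triangle inequality
    on a square distance matrix given as a list of lists."""
    n = range(len(d))
    return {
        "positive": all(d[a][b] > 0 for a, b in product(n, n) if a != b),
        "symmetric": all(abs(d[a][b] - d[b][a]) <= tol for a, b in product(n, n)),
        "zero_diagonal": all(abs(d[a][a]) <= tol for a in n),
        "triangle": all(
            d[a][b] <= d[a][c] + d[c][b] + tol for a, b, c in product(n, n, n)
        ),
    }

# Three points at positions 0, 1 and 2 on a line.
absolute = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]  # |x - y|: all four conditions hold
squared = [[0, 1, 4], [1, 0, 1], [4, 1, 0]]   # (x - y)^2: triangle inequality fails
print(check_distance_conditions(absolute))
print(check_distance_conditions(squared))
```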

23
Distance Metrics
Minkowski distance: d(i,j) = (sum_k |x_ik - x_jk|^q)^(1/q)
If q = 1, d is the Manhattan distance (metric distance).
If q = 2, d is the Euclidean distance (metric distance).
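
A minimal sketch of the Minkowski distance between two expression vectors, using the standard definition d = (sum_k |x_k - y_k|^q)^(1/q). The function name and the example vectors are illustrative.

```python
def minkowski(x, y, q):
    """Minkowski distance of order q between two equally long vectors."""
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1.0 / q)

# Hypothetical expression vectors of two genes over two experiments.
gene_a = [0.0, 0.0]
gene_b = [3.0, 4.0]
print(minkowski(gene_a, gene_b, 1))  # Manhattan distance: 7.0
print(minkowski(gene_a, gene_b, 2))  # Euclidean distance: 5.0
```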
24
Distance Metrics
Pearson correlation coefficient (semi-metric
distance)
-1 <= d(i,j) <= 1
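
The Pearson correlation coefficient of two expression vectors can be computed as in this sketch. It uses the standard textbook definition; the function name and the example vectors are illustrative, and vectors with zero variance are not handled.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    denominator = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return numerator / denominator

# Hypothetical expression vectors: perfectly correlated and anti-correlated.
print(pearson([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]))
print(pearson([1.0, 2.0, 3.0, 4.0], [8.0, 6.0, 4.0, 2.0]))
```

The first pair gives a value of 1 and the second a value of -1 (up to rounding), the two ends of the range stated above.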
25
Distance Metrics
Rank-ordered Pearson correlation coefficient ->
Spearman (semi-metric distance)
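
Spearman's coefficient is the Pearson coefficient applied to ranks instead of raw values. The sketch below assumes that tied values share the mean of their ranks, which is a common convention but is not stated in the slides; all names are illustrative.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    numerator = sum(a * b for a, b in zip(dx, dy))
    return numerator / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but non-linear relation still gives a rank correlation of 1.
print(spearman([1.0, 2.0, 3.0, 4.0], [1.0, 8.0, 27.0, 64.0]))
```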
26
Distance Metrics
Other variations of Pearson's correlation
coefficient: uncentered Pearson
correlation (semi-metric distance)
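
In the uncentered variant the means are not subtracted, so the value equals the cosine of the angle between the two expression vectors. A sketch with an illustrative function name:

```python
import math

def uncentered_pearson(x, y):
    """Uncentered Pearson correlation (cosine of the angle between x and y)."""
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return numerator / denominator

# Proportional vectors give 1; a constant offset lowers the value,
# whereas the centered coefficient would ignore the offset.
print(uncentered_pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(uncentered_pearson([1.0, 2.0, 3.0], [11.0, 12.0, 13.0]))
```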
27
Distance Metrics
Entropy-based distances: Mutual
Information (semi-metric distance)
  • Mutual Information (MI) is a statistical
    representation of the correlation of two signals
    A and B.
  • MI is a measure of the additional information
    known about one expression pattern when given
    another.
  • MI is not based on linear models and can
    therefore also see non-linear dependencies (see
    picture).
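
A rough sketch of how MI can be estimated from two expression patterns. Discretising the values into equal-width bins and the default of three bins are assumptions made for the example; the slides do not say how the probabilities are estimated.

```python
import math
from collections import Counter

def discretise(values, bins):
    """Map each value to an equal-width bin index in 0..bins-1."""
    low, high = min(values), max(values)
    width = (high - low) / bins or 1.0  # avoid a zero width for constant vectors
    return [min(int((v - low) / width), bins - 1) for v in values]

def mutual_information(a, b, bins=3):
    """Mutual information (in bits) of two patterns after binning."""
    da, db = discretise(a, bins), discretise(b, bins)
    n = len(da)
    count_a, count_b, count_ab = Counter(da), Counter(db), Counter(zip(da, db))
    mi = 0.0
    for (i, j), c in count_ab.items():
        # p(i,j) * log2( p(i,j) / (p(i) * p(j)) )
        mi += (c / n) * math.log2(c * n / (count_a[i] * count_b[j]))
    return mi

# b depends on a through a parabola: the linear correlation is zero,
# but the mutual information is clearly positive.
a = [-2.0, -1.0, 0.0, 1.0, 2.0]
b = [v * v for v in a]
print(mutual_information(a, b))
```

Estimates from so few data points are noisy; the example only illustrates that a non-linear dependency is picked up.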

28
Clustering Techniques
  • Several classification criteria of clustering
    algorithms exist with regard to
      • Clustering result: hierarchical vs.
        non-hierarchical
      • Clustering process: divisive vs. agglomerative
      • Clustering criterion: sequential vs. global

29
Clustering Techniques
Hierarchical clustering
30
Clustering Techniques
Hierarchical clustering: phylogenetic tree
31
Clustering Techniques
  • Hierarchical clustering
  • Various hierarchical clustering algorithms exist
  • Single-linkage clustering, nearest-neighbour
  • Complete-linkage, furthest-neighbour
  • Average-linkage, unweighted pair-group method
    average (UPGMA)
  • Weighted-pair group average, UPGMA weighted by
    cluster sizes
  • Within-groups clustering
  • Ward's method
  • ...
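
The first three linkage rules can be illustrated with a small agglomerative procedure. This is a naive sketch for small data sets, assuming Euclidean distance; the function names and the returned merge history are illustrative choices.

```python
import math

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def linkage_distance(c1, c2, points, method):
    """Distance between two clusters (lists of point indices)."""
    pairs = [euclidean(points[i], points[j]) for i in c1 for j in c2]
    if method == "single":    # nearest neighbour
        return min(pairs)
    if method == "complete":  # furthest neighbour
        return max(pairs)
    if method == "average":   # mean over all pairs (UPGMA)
        return sum(pairs) / len(pairs)
    raise ValueError("unknown method: " + method)

def agglomerate(points, method="average"):
    """Merge the two closest clusters until one remains; return the merges."""
    clusters = [[i] for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage_distance(clusters[a], clusters[b], points, method)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((sorted(clusters[a]), sorted(clusters[b]), d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return history

# Hypothetical one-dimensional expression values of four genes.
points = [[0.0], [1.0], [5.0], [7.0]]
for step in agglomerate(points, "single"):
    print(step)
```

The merge history corresponds to the tree: each entry records which two clusters were joined and at what distance.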

32
Clustering Techniques
  • k-means clustering
      1. The number of clusters k has to be chosen in
         advance
      2. The initial position of the cluster centers is
         random
      3. For each data point the (Euclidean) distance to
         each cluster center is calculated
      4. Each data point is assigned to its nearest
         cluster center
      5. Cluster centers are shifted to the center of
         the data points assigned to them
      6. Steps 3 to 5 are iterated until the cluster
         centers no longer shift
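
The k-means procedure listed above can be sketched as follows. Drawing the initial centers from the data points, the optional initial_centers parameter, the seed, and the iteration limit are assumptions made for the example.

```python
import math
import random

def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def k_means(points, k, initial_centers=None, seed=0, max_iter=100):
    """Return (centers, assignment) after iterating to a stable assignment."""
    if initial_centers is None:
        # Random start: pick k distinct data points as initial centers.
        initial_centers = random.Random(seed).sample(points, k)
    centers = [list(c) for c in initial_centers]
    assignment = None
    for _ in range(max_iter):
        # Distance to every center, then assignment to the nearest one.
        new_assignment = [
            min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # nothing changed, so the centers will not shift any more
        assignment = new_assignment
        # Shift each center to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assignment

# Hypothetical two-dimensional expression vectors forming two groups.
points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centers, assignment = k_means(points, 2, initial_centers=[[0.0, 0.0], [10.0, 10.0]])
print(centers, assignment)
```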

33
Clustering Techniques
k-means clustering (continued): A reasonable
number of cluster centers k can be estimated
34
Clustering Techniques
k-means clustering (continued): The initial
position of cluster centers can be estimated by
the distribution of the data vectors
35
Clustering Techniques
  • Principal Component Analysis (PCA)
  • Singular value decomposition (SVD)
  • Reduction of the effective dimensionality of
    gene-expression space by (linear) combination of
    the initial dimensions
  • Mathematically complex
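
To make the idea of a linear combination of dimensions concrete, the sketch below finds only the first principal component, using power iteration on the covariance matrix. This is a simplification chosen for the example: a full PCA or SVD would normally come from a numerical library, and the fixed start vector and iteration count are assumptions.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (component, centred_rows); rows are points in expression space."""
    n, dim = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dim)]
    centred = [[r[j] - means[j] for j in range(dim)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Power iteration: repeated multiplication converges to the leading
    # eigenvector, unless the start vector is orthogonal to it.
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v, centred

def project(centred, component):
    """Coordinate of every centred point along the component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centred]

# Hypothetical data lying on a line: one dimension describes it completely.
rows = [[2.0, 1.0], [4.0, 2.0], [6.0, 3.0]]
component, centred = first_principal_component(rows)
print(component, project(centred, component))
```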

36
Clustering Techniques
  • No clustering technique is better than another
  • Different clustering techniques have been shown
    to lead to reasonable results depending on the
    measured data
  • Organism
  • Tissue
  • Experimental condition, e.g. mutants or time
    course experiments
  • Distance metric used

37
Visualisation
Visualisation in clusters
38
Relevance of Networks
  • There is no absolute measure for the quality of
    clusters
  • Biological relevance of clusters can be checked
    with biochemical knowledge
  • Functional classifications are provided by some
    databases, e.g. MIPS (http://gsf.mips.de)
  • Genes in clusters can be checked if a majority is
    annotated to one or a few functional classes
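
The majority check can be sketched as below. The annotation dictionary, the gene names and the class names are made up for illustration and do not come from MIPS; genes without annotation are ignored when the fraction is computed, which is also an assumption.

```python
from collections import Counter

def dominant_class(cluster_genes, annotation):
    """Return (class, fraction) of the most frequent functional class
    among the annotated genes of a cluster."""
    classes = [annotation[g] for g in cluster_genes if g in annotation]
    if not classes:
        return None, 0.0
    name, count = Counter(classes).most_common(1)[0]
    return name, count / len(classes)

# Hypothetical annotation and cluster; "g5" has no annotation.
annotation = {"g1": "metabolism", "g2": "metabolism",
              "g3": "transport", "g4": "metabolism"}
cluster = ["g1", "g2", "g3", "g4", "g5"]
print(dominant_class(cluster, annotation))
```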

39
Conclusion
  • Microarrays measure the concentration of mRNA
  • Several assumptions are made in the course of the
    analysis
      • Biological
      • Numerical
  • Experimental conditions are important!
  • Different distance metrics combined with
    different clustering algorithms can lead to
    completely different results

40
Software Sources
  • AFM (Array File Maker) from Samuel Lunenfeld
    Research Institute, Mount Sinai Hospital (Toronto):
    http://www.mshri.on.ca/tyers/software.html
  • AMADA (Analyzing Microarray Data) from Univ. of
    Hong Kong:
    http://web.hku.hk/xxia/software/AMADA.htm
  • ANOVA programs for microarray data from
    Churchill's group at Jackson Lab:
    http://www.jax.org/research/churchill/software/anova/index.html
  • CLUSFAVOR (CLUSter and Factor Analysis using
    Varimax Orthogonal Rotation) from Baylor:
    http://mbcr.bcm.tmc.edu/genepi/
  • CLUSTER, TREEVIEW from Eisen's lab at Lawrence
    Berkeley National Lab:
    http://rana.lbl.gov/EisenSoftware.htm
    also http://www.microarrays.org/software.html
  • D-CHIP, specifically for Affymetrix chip data,
    from Wong's lab at the Biostatistics Department,
    Harvard University:
    http://www.dchip.org
  • GENE-CLUSTER from Whitehead Institute
    (registration is required). A commercial version
    of this program is also developed at Affymetrix:
    http://www.genome.wi.mit.edu/MPR/software.html
  • J-EXPRESS from Univ. of Bergen, Norway:
    http://www.ii.uib.no/bjarted/jexpress/index.html
  • ONTO-EXPRESS from Wayne Univ.:
    http://vortex.cs.wayne.edu/Projects.html
  • PAGE (PAtterns from Gene Expression) from U Penn:
    http://www.cbil.upenn.edu/PaGE/
  • PLAID from Owen's lab at Stanford Univ.:
    http://www-stat.stanford.edu/owen/plaid/
  • SAM (significance analysis of microarrays) from
    Tibshirani's group at Stanford Univ.:
    http://www-stat.stanford.edu/tibs/SAM/index.html
  • SAM (significance of array measurement) from
    Institute for Systems Biology:
    http://www.systemsbiology.org/VERAandSAM/
  • SMA (Statistic for Microarray Analysis, in R or
    S-PLUS subroutines) from Speed's lab at the
    Statistics Department of UC Berkeley:
    http://www.stat.berkeley.edu/users/terry/zarray/Software/smacode.html
  • SVDMAN from Los Alamos National Laboratory:
    http://public.lanl.gov/mewall/svdman/
  • TREE-ARRANGE and TREEPS from Univ. Waterloo:
    http://monod.uwaterloo.ca/downloads/treearrange/
  • VERA (variability and error assessment) from
    Institute for Systems Biology:
    http://www.systemsbiology.org/VERAandSAM/
  • XCLUSTER from Sherlock's lab at the Genetics
    Department of Stanford Univ.:
    http://genome-www.stanford.edu/sherlock/cluster.html
