Making Sense of Complicated Microarray Data, Part II: Gene Clustering and Data Analysis

1
Making Sense of Complicated Microarray Data
Part II: Gene Clustering and Data Analysis
  • Gabriel Eichler
  • Boston University
  • Some slides adapted from MeV documentation slides

2
Why Cluster?
  • Clustering is a process by which you can explore
    your data in an efficient manner.
  • Visualization of data can help you review the
    data quality.
  • Assumption: guilt by association. Similar gene
    expression patterns may indicate a biological
    relationship.

3
Expression Vectors
  • Gene Expression Vectors encapsulate the
    expression of a gene over a set of experimental
    conditions or sample types.

[Figure: one expression vector (-0.8, 1.5, 1.8, 0.5, -1.3, -0.4, 1.5, 0.8) shown three ways: as a numeric vector, a line graph, and a heatmap]
4
Expression Vectors As Points in Expression Space
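        t 1    t 2    t 3
  G1   -0.8   -0.3   -0.7
  G2   -0.8   -0.7   -0.4
  G3   -0.4   -0.6   -0.8
  G4    0.9    1.2    1.3
  G5    1.3    0.9   -0.6
[Figure: the genes plotted as points in a 3-D expression space with axes Experiment 1, Experiment 2, and Experiment 3; a "Similar Expression" label marks a group of nearby points]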
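To make the "points in expression space" picture concrete, here is a minimal sketch that treats each gene as a 3-D point and ranks all gene pairs by straight-line distance. It assumes the slide's values read as one row per gene (G1 to G5) across t 1 to t 3; the names `expression`, `euclidean`, and `pairs` are illustrative and not from the slides.

```python
import math
from itertools import combinations

# Expression vectors read from the slide: one value per experiment (t1, t2, t3).
# Assumption: each gene's three values appear in row order.
expression = {
    "G1": (-0.8, -0.3, -0.7),
    "G2": (-0.8, -0.7, -0.4),
    "G3": (-0.4, -0.6, -0.8),
    "G4": (0.9, 1.2, 1.3),
    "G5": (1.3, 0.9, -0.6),
}


def euclidean(a, b):
    """Straight-line distance between two points in expression space."""
    return math.dist(a, b)


# Distance for every unordered pair of genes, smallest first.
pairs = sorted(
    (euclidean(expression[g], expression[h]), g, h)
    for g, h in combinations(expression, 2)
)

for distance, g, h in pairs:
    print(f"{g}-{h}: {distance:.2f}")
```

Under that reading of the table, the three closest pairs involve only G1, G2, and G3, which is consistent with the "Similar Expression" label on the slide.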
5
Distance and Similarity
  • The ability to calculate a distance (or
    similarity, its inverse) between two expression
    vectors is fundamental to clustering algorithms.
  • Distance between vectors is the basis upon which
    decisions are made when grouping similar
    patterns of expression.
  • Selection of a distance metric defines the
    concept of distance.
6
Distance: a measure of similarity between gene
expression vectors.
  • Some distances (MeV provides 11 metrics):
    • Euclidean: $\sqrt{\sum_{i=1}^{n} (x_{iA} - x_{iB})^2}$
    • Pearson correlation
[Figure: two points labeled p0 and p1]
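The two metrics named on this slide can behave very differently, and a short sketch may help show why the choice matters. The code below is an illustration only: the function names are made up, and turning Pearson correlation into a distance as `1 - r` is a common convention assumed here, not necessarily the definition MeV uses.

```python
import math


def euclidean_distance(a, b):
    """Square root of the summed squared differences, condition by condition."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation coefficient r between two expression vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pearson_distance(a, b):
    """Assumed convention: distance = 1 - r, so identical shapes give 0."""
    return 1.0 - pearson_correlation(a, b)


# Two hypothetical genes with the same shape but a tenfold difference in magnitude.
gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [10.0, 20.0, 30.0, 40.0]

print(f"Euclidean distance: {euclidean_distance(gene_a, gene_b):.2f}")
print(f"Pearson distance:   {pearson_distance(gene_a, gene_b):.2f}")
```

Euclidean distance treats these two genes as far apart because their magnitudes differ, while the correlation-based distance treats them as essentially identical because their shapes match.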
7
Clustering Algorithms
8
Clustering Algorithms
  • Be wary: confounding computational artifacts
    are associated with all clustering algorithms.
  • You should always understand the basic concepts
    behind an algorithm before using it.
  • Anything will cluster! Garbage in means garbage
    out.
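The "anything will cluster" warning can be checked with pure noise. The sketch below is an illustration, not part of the original slides: it generates random "genes" with no biological signal and reports the strongest pairwise Pearson correlation. The sizes (100 genes, 6 conditions) and the seed are arbitrary assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def strongest_random_correlation(n_genes=100, n_conditions=6, seed=0):
    """Largest correlation found among genes drawn from pure Gaussian noise."""
    rng = random.Random(seed)
    genes = [
        [rng.gauss(0.0, 1.0) for _ in range(n_conditions)]
        for _ in range(n_genes)
    ]
    return max(
        pearson(genes[i], genes[j])
        for i in range(n_genes)
        for j in range(i + 1, n_genes)
    )


best = strongest_random_correlation()
print(f"strongest correlation among random genes: {best:.3f}")
```

With 100 genes there are 4,950 pairs, so with only a handful of conditions some pair is almost certain to look tightly co-expressed by chance. A clustering algorithm will group such pairs without complaint.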

9
Hierarchical Clustering
  • Idea: iteratively combine genes into groups
    based on similar patterns of observed expression
  • By combining genes with genes OR genes with
    groups, the algorithm produces a dendrogram of
    the hierarchy of relationships.
  • Display the data as a heatmap and dendrogram
  • Cluster genes, samples or both
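The iterative merging described above can be sketched in a few lines. This is a minimal illustration, not MeV's implementation: it assumes Euclidean distance and average linkage (the slides do not name a linkage rule), and it reuses the five genes from the earlier expression-space slide, read as one row per gene. All function and variable names are hypothetical.

```python
import math
from itertools import combinations

vectors = {
    "G1": (-0.8, -0.3, -0.7),
    "G2": (-0.8, -0.7, -0.4),
    "G3": (-0.4, -0.6, -0.8),
    "G4": (0.9, 1.2, 1.3),
    "G5": (1.3, 0.9, -0.6),
}


def average_linkage(members_a, members_b):
    """Mean pairwise Euclidean distance between the genes of two clusters."""
    total = sum(
        math.dist(vectors[g], vectors[h]) for g in members_a for h in members_b
    )
    return total / (len(members_a) * len(members_b))


def hierarchical_cluster(names):
    """Merge the closest pair of clusters until one remains.

    Returns the dendrogram as nested tuples plus a log of (subtree, height).
    """
    # Each cluster is (subtree, flat list of member genes).
    clusters = [(name, [name]) for name in names]
    merges = []
    while len(clusters) > 1:
        # Genes merge with genes, or genes with groups: whichever pair is closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]][1], clusters[p[1]][1]),
        )
        height = average_linkage(clusters[i][1], clusters[j][1])
        subtree = (clusters[i][0], clusters[j][0])
        members = clusters[i][1] + clusters[j][1]
        merges.append((subtree, height))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((subtree, members))
    return clusters[0][0], merges


root, merges = hierarchical_cluster(list(vectors))
for subtree, height in merges:
    print(f"{height:.2f}  {subtree}")
```

Each logged height is the distance at which two branches join, which is what the vertical axis of a dendrogram displays. The same function clusters samples instead of genes if the input vectors are the columns rather than the rows.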

(HCL-1)
10-18
Hierarchical Clustering
[Figure-only slides; the transcript preserves no text apart from the labels H and L on slide 18]
19
Hierarchical Clustering
[Figure: heatmap with axes labeled Samples and Genes]
The Leaf Ordering Problem
  • Find the optimal layout of branches for a given
    dendrogram architecture
  • 2^(N-1) possible orderings of the branches
  • For a small microarray dataset of 500 genes
    there are 1.6E150 branch configurations
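The count follows from the shape of the tree: a binary dendrogram with N leaves has N-1 internal nodes, and the two branches under each node can be swapped independently, giving 2^(N-1) layouts. A quick check of the slide's figure for 500 genes (the function name is illustrative):

```python
def branch_orderings(n_leaves):
    """Number of leaf orderings of a binary dendrogram with n_leaves leaves.

    Each of the n_leaves - 1 internal nodes can be flipped independently.
    """
    return 2 ** (n_leaves - 1)


count = branch_orderings(500)
print(f"{count:.1e}")  # prints 1.6e+150
```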

20
Hierarchical Clustering
The Leaf Ordering Problem
21
Hierarchical Clustering
  • Pros
    • Commonly used algorithm
    • Simple and quick to calculate
  • Cons
    • Real genes probably do not have a hierarchical
      organization

22
Self-Organizing Maps (SOMs)
  • Idea: place genes onto a grid so that genes with
    similar patterns of expression are placed on
    nearby squares.
[Figure: grid diagram with elements labeled A, B, C, D and a, b, c, d]
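The grid idea can be sketched with a very small self-organizing map. This is a simplified illustration under stated assumptions, not the algorithm as implemented in MeV or GEDI: the learning-rate and radius schedules, the Gaussian neighborhood, the initialization, and every name below are choices made for the example. The input vectors are the five genes from the earlier expression-space slide, read as one row per gene.

```python
import math
import random


def train_som(vectors, rows, cols, epochs=200, seed=0):
    """Fit one reference vector per grid square to the expression vectors."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Start every square at a randomly chosen gene's expression vector.
    grid = {
        (r, c): list(rng.choice(vectors))
        for r in range(rows)
        for c in range(cols)
    }
    for epoch in range(epochs):
        remaining = 1.0 - epoch / epochs           # shrinks from 1 toward 0
        learning_rate = 0.5 * remaining
        radius = max(rows, cols) / 2 * remaining + 0.5
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over genes
            # Best-matching square: the one whose reference vector is closest.
            best = min(grid, key=lambda node: math.dist(grid[node], v))
            for node, weights in grid.items():
                grid_dist = math.dist(node, best)
                if grid_dist <= radius:
                    # Nearby squares are pulled toward the gene as well.
                    pull = learning_rate * math.exp(
                        -(grid_dist ** 2) / (2 * radius ** 2)
                    )
                    for k in range(dim):
                        weights[k] += pull * (v[k] - weights[k])
    return grid


def assign(vectors, grid):
    """Grid square of the closest reference vector for each gene."""
    return [min(grid, key=lambda node: math.dist(grid[node], v)) for v in vectors]


vectors = [
    (-0.8, -0.3, -0.7),
    (-0.8, -0.7, -0.4),
    (-0.4, -0.6, -0.8),
    (0.9, 1.2, 1.3),
    (1.3, 0.9, -0.6),
]
grid = train_som(vectors, rows=2, cols=2)
squares = assign(vectors, grid)
for name, square in zip(["G1", "G2", "G3", "G4", "G5"], squares):
    print(name, "->", square)
```

Because neighboring squares are updated together, squares that sit close on the grid end up with similar reference vectors, which is what places similar genes on nearby squares.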
24-25
Self-Organizing Maps (SOMs)
[Figure-only slides; the transcript preserves no text apart from the title]
26
The Gene Expression Dynamics Inspector (GEDI)
[Figure: expression table with samples as columns (Group A: A1-A4, Group B: B1-B4, Group C: C1-C4) and genes as rows]
  Gene 1   1.5  1.4  1.7  1.2  .85  .65  .50  .55  2.5  2.8  2.7  2.1
  Gene 2   .78  .95  .75  .45  1.1  1.2  1.0  1.3  .56  .62  .78  .89
  Gene 3   .45  .23  .15  .05  .82  .71  .62  .49  .11  .16  .11  .95
  Gene 4   2.2  4.5  6.7  6.2  2.2  2.5  2.8  2.9  .48  .90  1.5  1.8
  Gene 5   2.1  2.0  1.9  1.6  4.2  4.8  5.2  5.5  2.5  2.6  2.0  1.9
  Gene 6   1.2  1.1  1.6  2.9  1.1  1.8  1.9  1.4  1.7  1.2  1.1  1.6
  • GEDI's Features
    • Allows for simultaneous analysis of several
      time courses or datasets
    • Displays the data in an intuitive and
      comparable, mathematically driven visualization
    • The same genes map to the same tiles
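The last feature is what makes the pictures comparable: once each gene has a fixed tile, every sample is drawn on the same layout. The sketch below illustrates only that idea and is not GEDI's implementation. The expression rows come from the slide's table, while the 2x2 tile assignment is invented for the example, and summarizing a tile by the mean of its genes is an assumption.

```python
from collections import defaultdict

# Expression rows from the slide's table (6 genes x 12 samples).
expression = {
    "Gene 1": [1.5, 1.4, 1.7, 1.2, .85, .65, .50, .55, 2.5, 2.8, 2.7, 2.1],
    "Gene 2": [.78, .95, .75, .45, 1.1, 1.2, 1.0, 1.3, .56, .62, .78, .89],
    "Gene 3": [.45, .23, .15, .05, .82, .71, .62, .49, .11, .16, .11, .95],
    "Gene 4": [2.2, 4.5, 6.7, 6.2, 2.2, 2.5, 2.8, 2.9, .48, .90, 1.5, 1.8],
    "Gene 5": [2.1, 2.0, 1.9, 1.6, 4.2, 4.8, 5.2, 5.5, 2.5, 2.6, 2.0, 1.9],
    "Gene 6": [1.2, 1.1, 1.6, 2.9, 1.1, 1.8, 1.9, 1.4, 1.7, 1.2, 1.1, 1.6],
}

# Hypothetical gene-to-tile assignment on a 2x2 grid (not from the slides).
tile_of_gene = {
    "Gene 1": (0, 0),
    "Gene 6": (0, 0),
    "Gene 2": (0, 1),
    "Gene 3": (0, 1),
    "Gene 4": (1, 0),
    "Gene 5": (1, 1),
}


def mosaic(sample_index):
    """Mean expression per tile for one sample, using the fixed tile layout."""
    values = defaultdict(list)
    for gene, tile in tile_of_gene.items():
        values[tile].append(expression[gene][sample_index])
    return {tile: sum(v) / len(v) for tile, v in values.items()}


first, last = mosaic(0), mosaic(11)
for tile in sorted(first):
    print(tile, f"{first[tile]:.2f} -> {last[tile]:.2f}")
```

Since the layout never changes, a difference between two mosaics at a given tile always refers to the same set of genes.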

[Figure: panels for Group A, Group B, and Group C, numbered 1 to 4, with a scale labeled H and L]
27
Software Demonstrations
  • MeV available at http://www.tigr.org/software/tm4/mev.html
  • GEDI available at http://www.chip.org/ge/gedihome.htm
28
Comparison of GEDI vs. Hierarchical Clustering
Hierarchical clustering of random data (GIGO)
From CreateGEP_Journal.wpd, random_A
29
Questions