Making Sense of Complicated Microarray Data, Part II: Gene Clustering and Data Analysis

Slides: 30

Provided by: Gabriel154

Learn more at: https://pga.mgh.harvard.edu

Transcript and Presenter's Notes

1
Making Sense of Complicated Microarray Data
Part II: Gene Clustering and Data Analysis

Gabriel Eichler
Boston University
Some slides adapted from MeV documentation slides

2
Why Cluster?

Clustering is a process by which you can explore
your data in an efficient manner.
Visualization of data can help you review the
data quality.
Assumption: guilt by association. Similar gene
expression patterns may indicate a biological
relationship.

3
Expression Vectors

Gene Expression Vectors encapsulate the
expression of a gene over a set of experimental
conditions or sample types.

Example vector: (-0.8, 1.5, 1.8, 0.5, -1.3, -0.4, 1.5, 0.8)
(Figure: the same expression vector shown as a numeric vector, a line graph, and a heatmap.)
4
Expression Vectors As Points in Expression Space
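Gene    t1     t2     t3
G1     -0.8   -0.3   -0.7
G2     -0.8   -0.7   -0.4
G3     -0.4   -0.6   -0.8
G4      0.9    1.2    1.3
G5      1.3    0.9   -0.6

(Figure: the genes plotted as points in a space with axes Experiment 1, Experiment 2, and Experiment 3; a "Similar Expression" label marks genes that lie close together.)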
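To make the "points in expression space" idea concrete, here is a minimal Python sketch. It assumes the slide's values read row by row as G1 to G5 over t1 to t3, and it uses plain Euclidean distance; the helper names `euclidean` and `nearest` are illustrative, not part of MeV.

```python
import math

# Expression vectors, assuming each gene's three values are (t1, t2, t3).
genes = {
    "G1": (-0.8, -0.3, -0.7),
    "G2": (-0.8, -0.7, -0.4),
    "G3": (-0.4, -0.6, -0.8),
    "G4": (0.9, 1.2, 1.3),
    "G5": (1.3, 0.9, -0.6),
}

def euclidean(a, b):
    """Straight-line distance between two points in expression space."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(name):
    """Return the gene whose expression vector lies closest to `name`."""
    others = (g for g in genes if g != name)
    return min(others, key=lambda g: euclidean(genes[name], genes[g]))

for name in genes:
    print(name, "is closest to", nearest(name))
```

Under these assumptions G1, G2, and G3 end up as each other's nearest neighbors, which matches the "Similar Expression" label on the slide.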
5
Distance and Similarity
- The ability to calculate a distance (or similarity, its inverse) between two expression vectors is fundamental to clustering algorithms.
- Distance between vectors is the basis upon which decisions are made when grouping similar patterns of expression.
- Selection of a distance metric defines the concept of distance.
6
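Distance: a measure of similarity between gene
expression vectors.

Some distances (MeV provides 11 metrics):
1. Euclidean: $d(A, B) = \sqrt{\sum_{i=1}^{n} (x_{iA} - x_{iB})^2}$, where $n$ is the number of experiments
3. Pearson correlation

(Figure: two points, p0 and p1.)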
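The choice of metric changes which genes count as "close". The sketch below is an illustration rather than MeV's implementation: it compares Euclidean distance with Pearson correlation on a profile and a doubled copy of it. The `scaled` vector is hypothetical, and turning correlation into a distance as `1 - r` is one common convention, assumed here.

```python
import math

def euclidean(a, b):
    """Square root of the summed squared differences across experiments."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pearson(a, b):
    """Pearson correlation coefficient between two expression vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (norm_a * norm_b)

profile = [-0.8, 1.5, 1.8, 0.5]
scaled = [2 * x for x in profile]  # hypothetical: same shape, twice the amplitude

print("Euclidean distance:", round(euclidean(profile, scaled), 3))
print("Pearson distance (1 - r):", round(1 - pearson(profile, scaled), 3))
```

The two profiles are far apart in Euclidean terms but essentially identical under Pearson correlation, so a Pearson-based clustering would group them while a Euclidean one might not.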
7
Clustering Algorithms
8
Clustering Algorithms

Be wary: confounding computational artifacts
are associated with all clustering algorithms.
You should always understand the basic concepts
behind an algorithm before using it.
Anything will cluster! Garbage in means garbage
out.

9
Hierarchical Clustering

Idea: Iteratively combine genes into groups
based on similar patterns of observed expression.
By combining genes with genes, or genes with
groups, the algorithm produces a dendrogram of the
hierarchy of relationships.
Display the data as a heatmap and dendrogram.
Cluster genes, samples, or both.
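The iterative merging described above can be sketched in a few lines of Python. This is a minimal illustration, not MeV's HCL code: the slides do not name a linkage rule, so average linkage with Euclidean distance is assumed, and the data are the five example genes from the expression-space slide (assumed to read row by row).

```python
import math
from itertools import combinations

genes = {
    "G1": (-0.8, -0.3, -0.7),
    "G2": (-0.8, -0.7, -0.4),
    "G3": (-0.4, -0.6, -0.8),
    "G4": (0.9, 1.2, 1.3),
    "G5": (1.3, 0.9, -0.6),
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(members_a, members_b, data):
    """Mean distance over all pairs drawn from the two clusters."""
    total = sum(euclidean(data[i], data[j]) for i in members_a for j in members_b)
    return total / (len(members_a) * len(members_b))

def hierarchical_cluster(data):
    """Return (dendrogram as nested tuples, list of merges in order)."""
    # Each cluster is (subtree, list of member gene names).
    clusters = [(name, [name]) for name in data]
    merges = []
    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]][1], clusters[p[1]][1], data),
        )
        (tree_i, members_i), (tree_j, members_j) = clusters[i], clusters[j]
        merges.append((tree_i, tree_j))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(((tree_i, tree_j), members_i + members_j))
    return clusters[0][0], merges

tree, merges = hierarchical_cluster(genes)
print(tree)
```

The nested tuples play the role of the dendrogram: each pair is one branch point, and the order of `merges` records which groups were joined first.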

(HCL-1)
10
Hierarchical Clustering
11
Hierarchical Clustering
12
Hierarchical Clustering
13
Hierarchical Clustering
14
Hierarchical Clustering
15
Hierarchical Clustering
16
Hierarchical Clustering
17
Hierarchical Clustering
18
Hierarchical Clustering
(Figure: heatmap color scale from L to H.)
19
Hierarchical Clustering
(Figure: heatmap with axes labeled Samples and Genes.)
The Leaf Ordering Problem

Find the optimal layout of branches for a given
dendrogram architecture.
There are $2^{N-1}$ possible orderings of the branches for $N$ genes.
For a small microarray dataset of 500 genes
there are about 1.6E150 ($2^{499}$) branch configurations.
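The count follows from the shape of the tree: a binary dendrogram with N leaves has N - 1 branch points, and each one can be flipped independently without changing the tree, giving 2^(N-1) orderings. A quick Python check of the slide's figure for 500 genes (the function name is illustrative):

```python
def branch_orderings(n_leaves):
    """Number of leaf orderings consistent with one binary dendrogram."""
    return 2 ** (n_leaves - 1)

# Python integers are arbitrary precision; convert to float only for display.
print(f"{float(branch_orderings(500)):.1e}")  # prints 1.6e+150
```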

20
Hierarchical Clustering
The Leaf Ordering Problem
21
Hierarchical Clustering

Pros
Commonly used algorithm
Simple and quick to calculate
Cons
Real genes probably do not have a hierarchical
organization

22
Self-Organizing Maps (SOMs)
Idea: Place genes onto a grid so that genes with
similar patterns of expression are placed on
nearby squares.
(Figure: items labeled A, B, C, D and a, b, c, d.)
23
Self-Organizing Maps (SOMs)
Idea: Place genes onto a grid so that genes with
similar patterns of expression are placed on
nearby squares.
(Figure: items labeled A, B, C, D and a, b, c, d.)
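A minimal self-organizing map can be sketched as follows. This is an illustration under stated assumptions, not the algorithm as implemented in MeV or GEDI: the grid size, learning rate, Gaussian neighborhood, linear decay schedule, and random seed are all arbitrary choices, and the input is the five example genes from the expression-space slide (assumed to read row by row).

```python
import math
import random

def best_matching_unit(weights, vector):
    """Grid node whose weight vector is closest to the input vector."""
    return min(weights, key=lambda node: math.dist(weights[node], vector))

def train_som(vectors, rows=2, cols=2, epochs=200, lr0=0.5, radius0=1.0, seed=0):
    """Train a small SOM; returns a dict mapping (row, col) to a weight vector."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    # Start each node at a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(vectors))
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs           # decays linearly from 1 toward 0
        lr = lr0 * frac
        radius = max(radius0 * frac, 1e-3)
        for v in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, v)
            for node, w in weights.items():
                # Nodes near the winner on the grid are pulled more strongly.
                grid_dist = math.dist(node, bmu)
                influence = math.exp(-(grid_dist ** 2) / (2 * radius ** 2))
                for k in range(dim):
                    w[k] += lr * influence * (v[k] - w[k])
    return weights

genes = {
    "G1": (-0.8, -0.3, -0.7),
    "G2": (-0.8, -0.7, -0.4),
    "G3": (-0.4, -0.6, -0.8),
    "G4": (0.9, 1.2, 1.3),
    "G5": (1.3, 0.9, -0.6),
}
weights = train_som(list(genes.values()))
tiles = {name: best_matching_unit(weights, vec) for name, vec in genes.items()}
print(tiles)
```

Each gene is assigned to the grid square whose weight vector it most resembles; because neighboring squares are updated together during training, similar genes tend to land on the same or adjacent squares.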
24
Self-Organizing Maps (SOMs)
25
Self-Organizing Maps (SOMs)
26
The Gene Expression Dynamics Inspector (GEDI)

Samples: Group A (A1 to A4), Group B (B1 to B4), Group C (C1 to C4)

Genes (rows) by samples (columns):
Gene 1   1.5   1.4   1.7   1.2   0.85  0.65  0.50  0.55  2.5   2.8   2.7   2.1
Gene 2   0.78  0.95  0.75  0.45  1.1   1.2   1.0   1.3   0.56  0.62  0.78  0.89
Gene 3   0.45  0.23  0.15  0.05  0.82  0.71  0.62  0.49  0.11  0.16  0.11  0.95
Gene 4   2.2   4.5   6.7   6.2   2.2   2.5   2.8   2.9   0.48  0.90  1.5   1.8
Gene 5   2.1   2.0   1.9   1.6   4.2   4.8   5.2   5.5   2.5   2.6   2.0   1.9
Gene 6   1.2   1.1   1.6   2.9   1.1   1.8   1.9   1.4   1.7   1.2   1.1   1.6

GEDI's Features
Allows for simultaneous analysis of several time
courses or datasets
Displays the data in an intuitive, comparable,
mathematically driven visualization
The same genes map to the same tiles
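The "same genes, same tiles" property is what makes the per-sample pictures comparable. The sketch below illustrates that idea only and is not GEDI's code: the gene-to-tile assignment `tile_of` is hypothetical (in practice it would come from a trained SOM), the rows are assumed to be Gene 1 to Gene 6 in the order shown on the slide, and each tile is summarized by the mean expression of its genes.

```python
# Expression values from the slide, one list of 12 samples per gene
# (row order Gene 1 to Gene 6 is an assumption).
expression = {
    "Gene 1": [1.5, 1.4, 1.7, 1.2, 0.85, 0.65, 0.50, 0.55, 2.5, 2.8, 2.7, 2.1],
    "Gene 2": [0.78, 0.95, 0.75, 0.45, 1.1, 1.2, 1.0, 1.3, 0.56, 0.62, 0.78, 0.89],
    "Gene 3": [0.45, 0.23, 0.15, 0.05, 0.82, 0.71, 0.62, 0.49, 0.11, 0.16, 0.11, 0.95],
    "Gene 4": [2.2, 4.5, 6.7, 6.2, 2.2, 2.5, 2.8, 2.9, 0.48, 0.90, 1.5, 1.8],
    "Gene 5": [2.1, 2.0, 1.9, 1.6, 4.2, 4.8, 5.2, 5.5, 2.5, 2.6, 2.0, 1.9],
    "Gene 6": [1.2, 1.1, 1.6, 2.9, 1.1, 1.8, 1.9, 1.4, 1.7, 1.2, 1.1, 1.6],
}

# Hypothetical fixed assignment of genes to tiles on a 2 x 2 grid.
tile_of = {
    "Gene 1": (0, 0), "Gene 6": (0, 0),
    "Gene 2": (0, 1), "Gene 3": (0, 1),
    "Gene 4": (1, 0),
    "Gene 5": (1, 1),
}

def mosaic(sample_index):
    """Mean expression per tile for one sample, using the fixed assignment."""
    sums, counts = {}, {}
    for gene, tile in tile_of.items():
        sums[tile] = sums.get(tile, 0.0) + expression[gene][sample_index]
        counts[tile] = counts.get(tile, 0) + 1
    return {tile: sums[tile] / counts[tile] for tile in sums}

first, last = mosaic(0), mosaic(11)
print(first)
print(last)
```

Because `tile_of` never changes, tile (0, 0) always shows the same genes, so two mosaics can be compared square by square across samples or time points.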

(Figure: GEDI mosaics for Group A, Group B, and Group C, numbered 1 to 4, with a color scale from L to H.)
27
Software Demonstrations

MeV available at http://www.tigr.org/software/tm4/mev.html

GEDI available at http://www.chip.org/ge/gedihome.htm
28
Comparison of GEDI vs. Hierarchical Clustering
Hierarchical clustering of random data (GIGO)
From CreateGEP_Journal.wpd, random_A
29
Questions
