Annotation Driven Hierarchical Clustering Analysis

1
Annotation Driven Hierarchical Clustering Analysis
  • Darja Krushevskaja

2
Motivation and Background
  • Data
  • High-throughput technologies produce and store
    huge datasets, now faster than ever
  • Data availability (internet, cooperation with
    other research groups, etc.)

3
For Instance ... in biology
6000 genes
1400x1050
  • Huge multidimensional datasets
  • Dataset structure
  • Structure Visualisation
  • Knowledge discovery
  • Already known facts.

4
Gene expression data
[Figure: gene expression matrix (axes: genes, experiments), values from -10 to 10, obtained by scanning with laser]
Each experiment: 40 000 measurements
5
Things to do with it
  • It is just a set of numerical vectors. You can:
  • Sort
  • Find 10 best matches for some item
  • Search for patterns
  • Divide into groups
  • Etc
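
As an illustration of "find 10 best matches for some item", here is a minimal sketch. It assumes Euclidean distance between vectors, which the slides do not specify; the function name `best_matches` and the toy profiles are hypothetical and not taken from the presentation.

```python
import math

def best_matches(query, vectors, k=10):
    """Return the names of the k vectors closest to vectors[query]."""
    target = vectors[query]
    # Exclude the query itself, then rank the rest by Euclidean distance
    # (an assumed similarity measure; the slides do not name one).
    others = [name for name in vectors if name != query]
    others.sort(key=lambda name: math.dist(target, vectors[name]))
    return others[:k]

# Hypothetical toy profiles (not data from the presentation).
profiles = {
    "g1": (0.0, 0.0),
    "g2": (1.0, 0.0),
    "g3": (0.0, 3.0),
    "g4": (5.0, 5.0),
}
nearest = best_matches("g1", profiles, k=2)
print(nearest)
```

Sorting every candidate is fine for a few thousand vectors; a different distance (for example one based on correlation) could be swapped in by changing the sort key.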

6
Clustering
The goal is to put similar objects into the same
cluster and dissimilar ones into different
clusters.
7
Hierarchical clustering
  • provides a data structure
  • explains the relation between all objects in the
    data set.

8
Annotations
  • Already known properties of genes
  • Molecular function
  • Biological process
  • Cell component
  • ...
  • Each gene has a set of properties

9
Back to motivation
6000 genes
1400x1050
10
Handling the dataset
11
Annotation Driven Hierarchical Clustering Analysis
  • The analysis that finds the right balance between
    the raw data, the clustering results, and the
    most interesting features in the data set.

12
Input
x1  3.5  0.9
x2  1.5  3.0
x3  1.7  5.5
x4  2.1  8.0
x5  3.0  8.4
x6  5.1  5.2
x7  8.4  6.5
x8  8.3  2.2
x9  8.3  1.4

  • Elements X(x1, x2, ..., xn)
  • Experiment results (vectors)
  • Known facts (binary vectors)

13
Step I Clustering
x2  1.5  3.0
x3  1.7  5.5
x1  3.5  0.9
x4  2.1  8.0
x5  3.0  8.4
x8  8.3  2.2
x9  8.3  1.4
x6  5.1  5.2
x7  8.4  6.5
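
The clustering step can be sketched on the nine points from the Input slide. This is a naive bottom-up (agglomerative) procedure that assumes Euclidean distance and average linkage; the slides do not state which distance or linkage was used, so both are assumptions, and the names `leaves`, `average_linkage` and `agglomerate` are hypothetical.

```python
import math
from itertools import combinations

# Points from the Input slide: element name -> 2-D experiment vector.
POINTS = {
    "x1": (3.5, 0.9), "x2": (1.5, 3.0), "x3": (1.7, 5.5),
    "x4": (2.1, 8.0), "x5": (3.0, 8.4), "x6": (5.1, 5.2),
    "x7": (8.4, 6.5), "x8": (8.3, 2.2), "x9": (8.3, 1.4),
}

def leaves(node):
    """Return the leaf names under a node (a name or a (left, right) pair)."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def average_linkage(a, b, points):
    """Mean Euclidean distance over all cross-cluster pairs (assumed choice)."""
    la, lb = leaves(a), leaves(b)
    total = sum(math.dist(points[p], points[q]) for p in la for q in lb)
    return total / (len(la) * len(lb))

def agglomerate(points):
    """Merge the two closest clusters until one tree remains."""
    clusters = sorted(points)  # start with one cluster per element
    merges = []                # history of (left leaves, right leaves)
    while len(clusters) > 1:
        a, b = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        clusters.remove(a)
        clusters.remove(b)
        clusters.append((a, b))  # the new internal node of the tree
        merges.append((leaves(a), leaves(b)))
    return clusters[0], merges

tree, merges = agglomerate(POINTS)
print(tree)
```

Under these assumptions the first merge joins x8 and x9, the two closest points. The leaf order produced by this sketch need not match the order shown on the slide, since that depends on the linkage and on how the dendrogram is drawn.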

14
Step II Tree Annotation
x2  1.5  3.0
x3  1.7  5.5
x1  3.5  0.9
x4  2.1  8.0
x5  3.0  8.4
x8  8.3  2.2
x9  8.3  1.4
x6  5.1  5.2
x7  8.4  6.5
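
One possible way to annotate a tree is to record, for every node, how many of its leaves carry each known fact. The sketch below does only that counting; the slides do not describe the actual annotation procedure, so this is an assumption, and the tree shape, the labels "A", "B", "C" and the function name `annotate` are all hypothetical.

```python
from collections import Counter

# Hypothetical binary tree over five elements (nested pairs of names).
TREE = ((("x2", "x3"), "x1"), ("x4", "x5"))

# Hypothetical known facts: element -> set of annotation labels.
ANNOTATIONS = {
    "x1": {"A"},
    "x2": {"A", "B"},
    "x3": {"A"},
    "x4": {"C"},
    "x5": {"C"},
}

def annotate(node, annotations):
    """Return a dict with the node's leaves, annotation counts and children."""
    if isinstance(node, str):
        return {
            "leaves": [node],
            "counts": Counter(annotations.get(node, ())),
            "children": [],
        }
    children = [annotate(child, annotations) for child in node]
    counts = Counter()
    leaf_names = []
    for child in children:
        counts.update(child["counts"])     # sum counts from both subtrees
        leaf_names.extend(child["leaves"])
    return {"leaves": leaf_names, "counts": counts, "children": children}

root = annotate(TREE, ANNOTATIONS)
print(root["counts"])
```

A real analysis would probably score each count against the size of the subtree and of the whole data set; plain counts are used here only to keep the sketch short.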

15
Step III Tree collapsing
x2  1.5  3.0
x3  1.7  5.5
x1  3.5  0.9
x4  2.1  8.0
x5  3.0  8.4
x8  8.3  2.2
x9  8.3  1.4
x6  5.1  5.2
x7  8.4  6.5
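
To make the idea of collapsing concrete, here is a minimal sketch under one assumed rule: a subtree is collapsed into a single summary node when all of its leaves share at least one annotation. The slides do not state the collapsing criterion actually used, so this rule is only a stand-in; the tree, the labels and the function names are hypothetical.

```python
# Hypothetical binary tree and known facts (not from the presentation).
TREE = ((("x2", "x3"), "x1"), ("x4", "x5"))
ANNOTATIONS = {
    "x1": {"A"},
    "x2": {"A", "B"},
    "x3": {"A"},
    "x4": {"C"},
    "x5": {"C"},
}

def leaves(node):
    """Return the leaf names under a node (a name or a (left, right) pair)."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)

def shared(node, annotations):
    """Return the annotations carried by every leaf under the node."""
    if isinstance(node, str):
        return set(annotations.get(node, ()))
    left, right = node
    return shared(left, annotations) & shared(right, annotations)

def collapse(node, annotations):
    """Replace each maximal subtree with a shared annotation by a summary."""
    if isinstance(node, str):
        return node
    common = shared(node, annotations)
    if common:
        # Assumed rule: one common annotation is enough to collapse.
        return {"collapsed": leaves(node), "annotations": sorted(common)}
    left, right = node
    return (collapse(left, annotations), collapse(right, annotations))

collapsed = collapse(TREE, ANNOTATIONS)
print(collapsed)
```

The sketch recomputes shared annotations at every level, which is wasteful on large trees; caching them per node would be the obvious refinement.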
16
Step IV Visualisation
x2  1.5  3.0
x3  1.7  5.5
x1  3.5  0.9
x4  2.1  8.0
x5  3.0  8.4
x8  8.3  2.2
x9  8.3  1.4
x6  5.1  5.2
x7  8.4  6.5
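
As a rough stand-in for a dendrogram view, a tree can be printed as an indented text outline. This is only a sketch and not the visualisation used in the presentation; the tree and the function name `render` are hypothetical.

```python
def render(node, depth=0):
    """Return the tree as indented text lines, one line per node."""
    indent = "  " * depth
    if isinstance(node, str):
        return [indent + node]      # a leaf: print its name
    lines = [indent + "+"]          # an internal node
    for child in node:
        lines.extend(render(child, depth + 1))
    return lines

# Hypothetical tree (nested pairs of element names).
TREE = (("x2", "x3"), "x1")
outline = render(TREE)
print("\n".join(outline))
```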

17
Method Implementation
  • Analysis and visualisation of one data set at a
    time
  • Analysis and visualisation of one cluster at a
    time
  • Comparison of datasets.

18
Single Dataset Analysis
  • Heatmap
  • Dendrogram
  • Interesting subtrees
  • Interactive
  • Characteristics of interesting clusters
  • Gene(s) search

19
Single cluster analysis
  • Single cluster information
  • Size
  • Annotation
  • Links to other applications (gProfiler, URLMap, ...)

20
Comparison of Datasets
  • Comparative table for interesting annotations

21
Further work
  • Consider alternative collapsing techniques
  • Knowledge discovery

22
Thank You!