Clustering of Leaf-labelled Trees on Free Leafset


1
Clustering of Leaf-labelled Trees on Free Leafset
  • Jakub Koperwas
  • Krzysztof Walczak
  • Institute of Computer Science
  • Warsaw University of Technology

2
Agenda
  • Background
  • Introduction to phylogenetic trees
  • Consensus and distance methods
  • Clustering of leaf-labelled trees
  • Clustering motivation
  • Clustering Quality Measure
  • Clustering Approaches
  • Free leafset extension
  • Experiments and results
  • Future work

3
Phylogenetic Tree
  • Root: common ancestor
  • Leaves (species and sequence): cat CACCTGT, dog CAACTGT, mouse CACCTAT, rat CACTTGT, horse CACCTCT, cow CACCTCT
4
Tree Representation
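Splits (unrooted tree on leaves a to f)
  • Trivial splits: a|bcdef, b|acdef, c|abdef, d|abcef, e|abcdf, f|abcde
  • Non-trivial splits: ab|cdef, abc|def, abcd|ef
Clusters (rooted tree on leaves a to e)
  • abcde, ab, cde, cd, a, b, c, d, e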
5
Robinson-Foulds Distance
  • Splits for tree T1: a|bcdef, b|acdef, c|abdef, d|abcef, e|abcdf, f|abcde, ab|cdef, abc|def, abcd|ef
  • Splits for tree T2: a|bcdef, b|acdef, c|abdef, d|abcef, e|abcdf, f|abcde, ab|cdef, abc|def, abce|fd
  • Uncommon splits: abcd|ef, abce|fd
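The distance on this slide can be sketched in Python as the number of splits that occur in exactly one of the two trees. This is a minimal sketch, not the authors' implementation: the `a|bcdef` string notation for a split and the helper names `split`, `tree` and `rf_distance` are assumptions made for illustration, and the two six-leaf trees are illustrative examples that differ in one non-trivial split each. Some definitions divide the count by two.

```python
def split(text):
    """Parse 'ab|cdef' into an unordered pair of leaf sets (assumed notation)."""
    left, right = text.split("|")
    return frozenset({frozenset(left), frozenset(right)})


def tree(*splits):
    """Represent a tree by the set of its splits."""
    return frozenset(split(s) for s in splits)


def rf_distance(t1, t2):
    """Count the splits present in exactly one of the two trees."""
    return len(t1 ^ t2)


# Illustrative trees on leaves a to f; only the last split differs.
TRIVIAL = ["a|bcdef", "b|acdef", "c|abdef", "d|abcef", "e|abcdf", "f|abcde"]
T1 = tree(*TRIVIAL, "ab|cdef", "abc|def", "abcd|ef")
T2 = tree(*TRIVIAL, "ab|cdef", "abc|def", "abce|fd")

print(rf_distance(T1, T2))  # 2
```

Storing each split as an unordered pair means that `ef|abcd` and `abcd|ef` compare equal, so the order in which the two sides are written does not affect the distance.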
6
Strict Consensus Tree
  • Splits for tree T1: a|bcdef, b|acdef, c|abdef, d|abcef, e|abcdf, f|abcde, ab|cdef, abc|def, abcd|ef
  • Splits for tree T2: a|bcdef, b|acdef, c|abdef, d|abcef, e|abcdf, f|abcde, ab|cdef, abc|def, abce|fd
  • The common splits: a|bcdef, b|acdef, c|abdef, d|abcef, e|abcdf, f|abcde, ab|cdef, abc|def
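As a rough illustration, the strict consensus tree can be computed as the intersection of the split sets of all trees in the profile. The sketch below assumes that a tree is stored as a set of splits and that a split is written as two leaf groups separated by `|`; the names `split`, `tree` and `strict_consensus` are hypothetical helpers, and the two trees are illustrative.

```python
def split(text):
    """Parse 'ab|cdef' into an unordered pair of leaf sets (assumed notation)."""
    left, right = text.split("|")
    return frozenset({frozenset(left), frozenset(right)})


def tree(*splits):
    """Represent a tree by the set of its splits."""
    return frozenset(split(s) for s in splits)


def strict_consensus(trees):
    """Keep only the splits that appear in every tree of the profile."""
    trees = list(trees)
    if not trees:
        raise ValueError("the profile must contain at least one tree")
    return frozenset.intersection(*trees)


# Illustrative trees on leaves a to f; they share all but one split each.
TRIVIAL = ["a|bcdef", "b|acdef", "c|abdef", "d|abcef", "e|abcdf", "f|abcde"]
T1 = tree(*TRIVIAL, "ab|cdef", "abc|def", "abcd|ef")
T2 = tree(*TRIVIAL, "ab|cdef", "abc|def", "abce|fd")

consensus = strict_consensus([T1, T2])
print(len(consensus))  # 8
```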
7
Clustering motivation
  • Phylogenetic tree reconstruction methods may
    produce many candidate trees
  • It is hard to apply consensus methods to obtain a
    single tree from a profile of hundreds of trees
  • Clustering helps to designate a small number of
    candidate trees from a large number of trees

8
Representative Tree
  • Representative tree: a tree that shares the common
    knowledge of all trees in a cluster.
  • Strict Consensus Tree
  • Majority-rule Consensus Tree
  • Other

9
Information in Tree
  • Example trees on the leaves a, b, c, d, e
10
Information Loss
  • Cluster Information Loss: the amount of
    information lost when a cluster of trees is
    replaced with one representative tree
  • Clustering Information Loss: the amount of
    information lost when the input profile of trees
    is replaced with k representative trees

11
Information Loss - Example
12
K-best problem
  • The K-best problem is the problem of finding a
    partition of the dataset into k clusters (where k
    is a given value) such that the partition
    maximizes Information Gain with respect to a
    given type of representative tree.
  • Proposition:
  • K-mean algorithm for the majority-rule consensus
    tree as representative tree
  • Agg-inf algorithm for the strict consensus tree as
    representative tree

13
K-mean Approach (Majority-rule CT)
  • The majority-rule consensus tree is a center tree
    and can therefore be used as a centroid
  • The objective of k-mean for trees is identical
    with the k-best objective if the majority-rule
    consensus tree is chosen as representative tree
  • Conclusion: K-mean is a good candidate when the
    majority-rule consensus tree is used as
    representative tree
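The idea of using the majority-rule consensus tree as a centroid can be sketched as a k-mean style loop. This is a minimal sketch under stated assumptions, not the authors' algorithm: trees are stored as sets of splits written as `ab|cdef`, the distance is the number of splits found in exactly one tree, the initial centroids are passed in explicitly, and all function names are hypothetical. The example trees list only their non-trivial splits, because trivial splits are shared by all trees on the same leafset and do not change the distances.

```python
from collections import Counter


def split(text):
    """Parse 'ab|cdef' into an unordered pair of leaf sets (assumed notation)."""
    left, right = text.split("|")
    return frozenset({frozenset(left), frozenset(right)})


def tree(*splits):
    return frozenset(split(s) for s in splits)


def rf(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(t1 ^ t2)


def majority_rule(trees):
    """Splits that occur in more than half of the given trees."""
    counts = Counter(s for t in trees for s in t)
    return frozenset(s for s, c in counts.items() if c > len(trees) / 2)


def k_mean_trees(trees, initial, iterations=20):
    """Alternate between assigning trees and recomputing consensus centroids."""
    centroids = list(initial)
    assignment = None
    for _ in range(iterations):
        # Assign each tree to the nearest centroid (ties go to the lowest index).
        new_assignment = [
            min(range(len(centroids)), key=lambda j: rf(t, centroids[j]))
            for t in trees
        ]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recompute each centroid; an empty cluster keeps its old centroid.
        for j in range(len(centroids)):
            members = [t for t, a in zip(trees, assignment) if a == j]
            if members:
                centroids[j] = majority_rule(members)
    return assignment, centroids


# Illustrative profile: two groups of three trees on leaves a to f.
A1 = tree("ab|cdef", "abc|def", "abcd|ef")
A2 = tree("ab|cdef", "abc|def", "abce|df")
A3 = tree("ab|cdef", "abc|def", "abcf|de")
B1 = tree("ae|bcdf", "ade|bcf", "bc|adef")
B2 = tree("ae|bcdf", "ade|bcf", "bf|acde")
B3 = tree("ae|bcdf", "ade|bcf", "cf|abde")
profile = [A1, A2, A3, B1, B2, B3]

assignment, centroids = k_mean_trees(profile, initial=[A1, B1])
print(assignment)  # [0, 0, 0, 1, 1, 1]
```

Passing the initial centroids explicitly keeps the example deterministic; a practical version would probably pick them at random and restart several times.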

14
Agglomerative approach (Strict CT)
  • Typical merging strategies
  • Single linkage
  • Complete linkage
  • Average linkage
  • Our merging strategy: minimize the information
    loss after merging
  • For the strict consensus tree as representative tree
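The merging strategy above can be sketched as an agglomerative loop that always merges the pair of clusters whose union adds the least information loss. The transcript does not give a formula for information loss, so the measure used here is a stand-in assumption: the number of splits of each tree that are missing from the strict consensus tree of its cluster. The function names are hypothetical, and this is not the authors' Agg-inf implementation.

```python
from itertools import combinations


def split(text):
    """Parse 'ab|cdef' into an unordered pair of leaf sets (assumed notation)."""
    left, right = text.split("|")
    return frozenset({frozenset(left), frozenset(right)})


def tree(*splits):
    return frozenset(split(s) for s in splits)


def strict_consensus(trees):
    """Splits shared by every tree in the cluster."""
    return frozenset.intersection(*trees)


def cluster_loss(trees):
    """Assumed loss: splits of each tree that the strict consensus drops."""
    consensus = strict_consensus(trees)
    return sum(len(t - consensus) for t in trees)


def agglomerate(trees, k):
    """Merge clusters until k remain, minimizing the added loss at each step."""
    clusters = [[t] for t in trees]
    while len(clusters) > k:
        def added_loss(pair):
            i, j = pair
            merged = clusters[i] + clusters[j]
            return (cluster_loss(merged)
                    - cluster_loss(clusters[i])
                    - cluster_loss(clusters[j]))

        i, j = min(combinations(range(len(clusters)), 2), key=added_loss)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Illustrative profile (non-trivial splits only): two groups of three trees.
A1 = tree("ab|cdef", "abc|def", "abcd|ef")
A2 = tree("ab|cdef", "abc|def", "abce|df")
A3 = tree("ab|cdef", "abc|def", "abcf|de")
B1 = tree("ae|bcdf", "ade|bcf", "bc|adef")
B2 = tree("ae|bcdf", "ade|bcf", "bf|acde")
B3 = tree("ae|bcdf", "ade|bcf", "cf|abde")

clusters = agglomerate([A1, A2, A3, B1, B2, B3], k=2)
print([len(c) for c in clusters])  # [3, 3]
```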

15
Free leafset extension
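  • Two trees whose leafsets differ: T1 on a, b, c, d, e and T2 on a, b, c, d, f
  • T2: a|bcdf, b|acdf, c|abdf, d|abcf, ac|bdf, abcdf
  • T1: a|bcde, b|acde, c|abde, d|abce, ac|bde, abcde
  • No two splits can ever be equal
  • Consensus methods: always the empty set
  • Distance: always the sum of the numbers of splits in all trees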
16
Z-restriction

abcdefgy
abcdefgx
abcdefg
17
Z-restricted consensus tree
Z = abcd
18
Z-restricted distance
dRF(T1, T2) = 15
dRF restricted to abcde = 2
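A Z-restricted distance can be sketched by restricting every split to the leaves in Z before comparing the trees. The restriction rule used here is an assumption: each side of a split is intersected with Z, and a split is dropped when one side becomes empty. The trees below are illustrative and are not the trees from the slide, so the numbers differ from those shown above; all helper names are hypothetical.

```python
def split(text):
    """Parse 'ab|cdex' into an unordered pair of leaf sets (assumed notation)."""
    left, right = text.split("|")
    return frozenset({frozenset(left), frozenset(right)})


def tree(leaves, *nontrivial):
    """Build a tree from its leafset and its non-trivial splits."""
    leaves = frozenset(leaves)
    trivial = {frozenset({frozenset(x), leaves - {x}}) for x in leaves}
    return frozenset(trivial | {split(s) for s in nontrivial})


def restrict(t, z):
    """Restrict every split to the leaves in z; drop splits with an empty side."""
    z = frozenset(z)
    restricted = set()
    for s in t:
        sides = [side & z for side in s]
        if all(sides):
            restricted.add(frozenset(sides))
    return frozenset(restricted)


def rf_distance(t1, t2):
    """Number of splits present in exactly one of the two trees."""
    return len(t1 ^ t2)


def z_restricted_rf(t1, t2, z):
    return rf_distance(restrict(t1, z), restrict(t2, z))


# Illustrative trees: the leafsets share a to e but differ in x and y.
T1 = tree("abcdex", "ab|cdex", "abc|dex", "abcd|ex")
T2 = tree("abcdey", "ab|cdey", "abd|cey", "abcd|ey")

print(rf_distance(T1, T2))               # 18: no split is shared
print(z_restricted_rf(T1, T2, "abcde"))  # 2
```

Without the restriction every split differs because the leafsets differ, so the distance is just the total number of splits. After restricting to the common leaves only the splits abc|de and abd|ce remain unshared.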
19
Z-restriction in Clustering
  • 1. The Z-restricted majority-rule consensus tree is a
    middle tree
  • K-mean can be used

2.
  • Agg-inf improvement can be used

20
Pros and cons of Z-restriction
  • Pros
  • Simple and efficient
  • Nice mathematical features
  • Cons
  • Arbitrary discarding of information
  • Hard to choose the Z parameter

21
Results: strict consensus (same leafset)
22
Results (free leafset)
23
Future Work
  • Extending the presented methods with a frequent
    subsplits approach
  • Developing an algorithm for a general
    representative tree
  • Incorporating more biologically significant
    features into the clustering objective function
  • Experiments on datasets from other disciplines,
    such as linguistics

24
Thank You