Using PQ Trees For Comparative Genomics - PowerPoint PPT Presentation

1
Using PQ Trees For Comparative Genomics
  • Gad M. Landau
  • Laxmi Parida
  • Oren Weimann

2
Gene Clusters
  • Genes that appear together consistently across
    genomes are believed to be functionally related,
    although their order need not be the same.

3
What is a πpattern?
  • Given a string S = s1 s2 s3 ... sn and an integer K, a
    pattern P = {p1, p2, p3, ..., pm} is a πpattern if P
    occurs (possibly permuted) in at least K places
    in S.
  • Example:
  • S = a b c d b a c d a b a c b, P = {a, b, c}, K = 4
  • P is a 4-πpattern with location list
    {1, 5, 10, 11}
  • For the moment we will assume that every
    character appears once in the pattern.
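
The definition above can be checked directly on the slide's example. The following is a minimal sketch, not the linear-time algorithm of the talk: it slides a window of length |P| over S and compares character counts. The function names are hypothetical, chosen only for this illustration.

```python
from collections import Counter

def pi_pattern_locations(s, pattern):
    """Return the 1-based start positions where `pattern` occurs in `s`, in any order."""
    m = len(pattern)
    target = Counter(pattern)
    # A window matches when it holds exactly the pattern's characters.
    return [i + 1 for i in range(len(s) - m + 1) if Counter(s[i:i + m]) == target]

def is_pi_pattern(s, pattern, k):
    """True when `pattern` occurs (possibly permuted) in at least k places of `s`."""
    return len(pi_pattern_locations(s, pattern)) >= k

print(pi_pattern_locations("abcdbacdabacb", "abc"))  # [1, 5, 10, 11]
print(is_pi_pattern("abcdbacdabacb", "abc", 4))      # True
```

This brute-force check costs O(|S| * |P|) time, which is fine for illustrating the definition on short strings.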

4
Maximal πpatterns
  • A πpattern p is non-maximal with respect to a
    πpattern p′ if every occurrence of p is covered
    by an occurrence of p′ and every occurrence of p′
    covers an occurrence of p.
  • Example: in S = a b c d e b a d c e, {a, b} is
    non-maximal with respect to {a, b, c, d, e}.
  • Maximal notation: a representation of a maximal
    πpattern p that illustrates all the non-maximal
    πpatterns with respect to p.
  • Example: the maximal notation of {a, b, c, d, e}
    is ((a,b)-(c,d)-e).
  • Our goal: given a string S, find all πpatterns p
    and their maximal notation.
  • Our solution: a linear-time algorithm based on
    PQ trees.

5
PQ trees [Booth, Lueker]: Definitions
  • PQ trees [Booth, Lueker, 1976]
  • Character-labeled leaves.
  • P-nodes
  • Represent truly permuted components
  • Children may be permuted arbitrarily
  • Q-nodes
  • Represent bi-connected components
  • The order of the children may only be reversed

6
PQ trees Definitions
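  • Equivalent PQ trees (denoted ≡): two PQ trees are
    equivalent if one can be obtained from the other
    by permuting the children of P-nodes and
    reversing the children of Q-nodes.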

7
PQ trees Definitions
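  • FRONTIER(T): the leaf labels of T read from left
    to right
  • C(T): the set of frontiers of all trees
    equivalent to T
  • Example: FRONTIER(T) = A B C D E F G H I J K, and
    for an equivalent tree T′,
    FRONTIER(T′) = A B C G H I J K E F D
  • Theorem: if C(T1) = C(T2) then T1 ≡ T2.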
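
To make FRONTIER and C(T) concrete, here is a small sketch under stated assumptions: a tree is either a leaf label (a string) or a pair (kind, children) with kind "P" or "Q". This representation and the function names are illustrative only, and the example tree is hypothetical since the slide's figure is not in the transcript. The enumeration is exponential and is meant only for tiny trees.

```python
from itertools import permutations, product

def frontier(tree):
    """Leaf labels of `tree` read from left to right."""
    if isinstance(tree, str):
        return tree
    _kind, children = tree
    return "".join(frontier(child) for child in children)

def all_frontiers(tree):
    """C(T): the frontiers of every tree equivalent to `tree`."""
    if isinstance(tree, str):
        return {tree}
    kind, children = tree
    indices = range(len(children))
    if kind == "P":
        orders = list(permutations(indices))                     # any order of children
    else:
        orders = [tuple(indices), tuple(reversed(indices))]      # as is, or reversed
    child_frontiers = [all_frontiers(child) for child in children]
    result = set()
    for order in orders:
        for parts in product(*(child_frontiers[i] for i in order)):
            result.add("".join(parts))
    return result

# Hypothetical tree: a P-node over leaf A and a Q-node with children B, C, D.
tree = ("P", ["A", ("Q", ["B", "C", "D"])])
print(frontier(tree))               # ABCD
print(sorted(all_frontiers(tree)))  # ['ABCD', 'ADCB', 'BCDA', 'DCBA']
```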
8
Our Use of the PQ tree
  • Suppose the πpattern {a,b,c,d} appears in 4
    locations as
  • Π = {abcd, acbd, dbca, dcba}.
  • Our goal: a PQ tree T with
  • C(T) = {abcd, acbd, dbca, dcba}.
  • Write the P-nodes as "," and the Q-nodes as "-"
    and get (a-(b,c)-d), which is exactly the maximal
    notation of the πpattern {a,b,c,d}.

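
The notation rule above is a simple recursive rendering. The sketch below assumes a tree is a leaf label (a string) or a pair (kind, children) with kind "P" or "Q"; that representation and the name `maximal_notation` are assumptions made for this illustration.

```python
def maximal_notation(tree):
    """Render a PQ tree: children of a P-node joined by ',', of a Q-node by '-'."""
    if isinstance(tree, str):
        return tree
    kind, children = tree
    separator = "," if kind == "P" else "-"
    return "(" + separator.join(maximal_notation(child) for child in children) + ")"

# Q-node with children a, P(b, c), d, as in the slide's example.
print(maximal_notation(("Q", ["a", ("P", ["b", "c"]), "d"])))  # (a-(b,c)-d)
```
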
9
The minimal Consensus PQ tree
  • It is not always possible to find a tree T where
    Π = C(T).
  • Consider a πpattern {a,b,c,d} that appears as
    Π = {abcd, bdac}.
  • {abcd, bdac} ⊂ C(T)
  • Given permutations Π = {π1, π2, ..., πk}, the consensus
    PQ tree T of Π is such that Π ⊆ C(T), and the
    consensus is minimal when there exists no other
    T′ such that Π ⊆ C(T′) and C(T′) ⊂ C(T).
  • The problem of obtaining a maximal notation for a
    πpattern is the same as obtaining a minimal
    consensus PQ tree of all the k occurrences.
  • Theorem: the minimal consensus PQ tree T is
    unique.

10
The original use of the PQ Tree
  • The consecutive ones problem
  • The restriction sets:
  • F = {{a,b,c}, {b,c}, {b,c,d}, {b}}
  • The solution [Booth, Lueker, 1976]:
  • Reduce(F)
  • The result is C(T); in our case C(T) = {abcd,
    acbd, dbca, dcba},
  • and the tree is constructed by Reduce(F) in
    O(n²) time (for an n x n matrix).

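
What Reduce(F) must produce can be checked by brute force on this small example. The sketch below is not the Booth and Lueker algorithm: it simply tries every ordering of the elements and keeps those in which each restriction set is consecutive. The function names are hypothetical.

```python
from itertools import permutations

def is_consecutive(order, restriction):
    """True when the members of `restriction` occupy adjacent positions in `order`."""
    positions = sorted(order.index(x) for x in restriction)
    return positions[-1] - positions[0] == len(positions) - 1

def consistent_orders(elements, restrictions):
    """All orderings of `elements` that keep every restriction set consecutive."""
    return sorted(
        "".join(order)
        for order in permutations(elements)
        if all(is_consecutive(order, r) for r in restrictions)
    )

F = [{"a", "b", "c"}, {"b", "c"}, {"b", "c", "d"}, {"b"}]
print(consistent_orders("abcd", F))  # ['abcd', 'acbd', 'dbca', 'dcba']
```

The brute force takes factorial time, which is exactly what the PQ tree avoids by representing all valid orderings compactly.
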
11
Obtaining the Minimal Consensus PQ tree
  • Some definitions [Heber, Stoye, 2001]:
  • Common interval: an interval whose elements appear
    as a consecutive sequence in all the appearances,
    e.g. [4,8] in the example.
  • We denote the set of all common intervals by C_Π:
  • [1,2], [2,3], [1,3], [1,9], [1,8], [4,5], [4,6], [4,7],
    [4,8], [5,6]
  • A list p of common intervals is a chain if every
    two successive intervals in p have a non-trivial
    overlap. For example, p = ([1,2], [2,3]).
  • A common interval is called reducible if there
    is a chain that generates it; otherwise it is
    called irreducible. [1,3] is a reducible interval
    since it can be generated by the chain of
    irreducible intervals [1,2], [2,3].
  • We denote the set of all irreducible intervals of
    Π by I_Π:
  • [1,2], [1,8], [2,3], [4,5], [4,7], [4,8], [5,6]
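
These definitions can be tried on a small input. The permutations behind the [4,8] example are not part of this transcript, so the sketch below instead uses the four occurrences abcd, acbd, dbca, dcba from the earlier slide, relabeled a=1, b=2, c=3, d=4 (an assumption made for illustration). It is a quadratic brute force, not the Heber and Stoye algorithm, and the function names are hypothetical. An interval is treated as reducible when it is the union of two smaller common intervals with a non-trivial overlap.

```python
def common_intervals(perms):
    """Value ranges (lo, hi), lo < hi, whose elements are consecutive in every permutation."""
    n = len(perms[0])
    positions = [{value: i for i, value in enumerate(p)} for p in perms]
    found = []
    for lo in range(1, n + 1):
        for hi in range(lo + 1, n + 1):
            values = range(lo, hi + 1)
            if all(
                max(pos[v] for v in values) - min(pos[v] for v in values) == hi - lo
                for pos in positions
            ):
                found.append((lo, hi))
    return found

def irreducible_intervals(intervals):
    """Drop intervals that are the union of two overlapping, smaller common intervals."""
    known = set(intervals)

    def reducible(interval):
        lo, hi = interval
        # Look for (lo, x) and (y, hi) with lo < y <= x < hi.
        return any(
            (lo, x) in known and (y, hi) in known
            for x in range(lo + 1, hi)
            for y in range(lo + 1, x + 1)
        )

    return [c for c in intervals if not reducible(c)]

perms = [(1, 2, 3, 4), (1, 3, 2, 4), (4, 2, 3, 1), (4, 3, 2, 1)]
common = common_intervals(perms)
print(common)                         # [(1, 3), (1, 4), (2, 3), (2, 4)]
print(irreducible_intervals(common))  # [(1, 3), (2, 3), (2, 4)]
```

Here [1,4] is reducible because the chain ([1,3], [2,4]) generates it. Mapped back to letters, the irreducible intervals are {a,b,c}, {b,c} and {b,c,d}.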

12

Obtaining the Minimal Consensus PQ tree
  • Theorem: Reduce(C_Π) = Reduce(I_Π) = the minimal
    consensus tree.
  • The Algorithm:
  • Compute I_Π, the irreducible intervals of Π:
  • [1,2], [1,8], [2,3], [4,5], [4,7], [4,8], [5,6]
  • Compute Reduce(I_Π) to get the minimal
    consensus tree of Π.
  • The πpattern notation is
    ((1-2-3)-(((4-5-6),7),8)-9)
  • Time complexity: for a πpattern of size n that
    appears in k places, it takes a total
    of O(kn ) to compute maximal notation.

13
Improving the Time Complexity to O(kn)
  • In Heber and Stoye's algorithm for obtaining I_Π,
    a data structure S is maintained to hold the
    chains of the irreducible intervals:
  • [1,2], [1,8], [2,3], [4,5], [4,7], [4,8], [5,6]
  • REPLACE(S):
  • Replace every chain by a Q-node.
  • Replace every element that is not a leaf or a
    Q-node and is pointed to by a vertical link with
    a P-node.

14
Maximal ?Patterns and Sub-Trees
  • A sub-tree of the PQ tree T is obtained by
    picking a P-node in T with all its descendants,
    or by picking a Q-node in T with any number of
    its consecutive children (and all their
    descendants).
  • Suppose the πpattern {a,b,c,d} appears in 4
    locations as
  • Π = {abcd, acbd, dbca, dcba}.
  • Theorem 4: if p1 and p2 are πpatterns, and p1 is
    non-maximal with respect to p2, then the PQ tree
    T1 that represents p1 is a sub-tree of the PQ
    tree T2 that represents p2.

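
The sub-tree definition can be illustrated by listing the leaf sets it yields for (a-(b,c)-d). The sketch assumes a tree is a leaf label (a string) or a pair (kind, children) with kind "P" or "Q", and it assumes a Q-node selection takes at least two consecutive children. Both the representation and the function names are hypothetical.

```python
def leaves(tree):
    """Leaf labels below `tree`, left to right."""
    if isinstance(tree, str):
        return [tree]
    _kind, children = tree
    return [label for child in children for label in leaves(child)]

def subtree_leaf_sets(tree):
    """Leaf sets of the sub-trees: whole P-nodes, or runs of consecutive Q-node children."""
    if isinstance(tree, str):
        return []
    kind, children = tree
    found = []
    if kind == "P":
        found.append(leaves(tree))
    else:
        for i in range(len(children)):
            for j in range(i + 1, len(children)):  # runs of two or more children
                found.append([x for child in children[i:j + 1] for x in leaves(child)])
    for child in children:
        found.extend(subtree_leaf_sets(child))
    return found

tree = ("Q", ["a", ("P", ["b", "c"]), "d"])
print(sorted("".join(s) for s in subtree_leaf_sets(tree)))  # ['abc', 'abcd', 'bc', 'bcd']
```

Besides the full pattern {a,b,c,d}, the sub-trees give {a,b,c}, {b,c} and {b,c,d}, the candidates for patterns that are non-maximal with respect to it.
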
15
So what did we achieve?
  • The first algorithm (optimal in time) that
    generates the maximal notation of a pattern,
    allowing:
  • A visualization of the inner structure of a
    pattern.
  • Filtering of meaningful from apparently
    meaningless (non-maximal) clusters.
  • Experimental results that show this tool can aid
    in predicting gene functions.
  • Clustering for the various genome models.

16
Using Our Tool for Various Genome Models
  • Genome model I (orthologs only)
  • A sequence is a permutation of the set
    {1, 2, ..., n}. There is only one maximal πpattern:
    {1, 2, ..., n}. In O(kn) time we get a PQ tree that
    describes all patterns of all sizes and their
    non-maximal relations.

17
Using Our Tool for Various Genome Models
  • Genome model II A gene may appear once in a
    sequence or not appear at all in that sequence.
  • We can extend the algorithm to work on
    sequences that are not permutations of the same
    set in
  • Example: consider the 2 sequences
  • 1 2 3 4 5 6 7 and 1 8 2 4 3 7 6
  • Add characters as needed:
  • 8 1 2 3 4 5 5 6 7 8 and 5 1 8 8 2 4 3 7 6 5
  • Build a PQ tree on the new sequences.
  • The sub-trees that have no red leaves are all
    the maximal patterns.

18
Using Our Tool for Various Genome Models
  • Genome model III (paralogs and orthologs)
  • A gene may appear any number of times in a
    sequence (including zero).
  • The minimal consensus PQ tree is not necessarily
    unique. Solution
  • Example: consider 2 appearances of the πpattern
    {a,a,b} as
  • Π = {aab, baa}
  • 1. Π = {a1a2b, ba2a1}:
    C(T) = {a1a2b, ba2a1}
  • 2. Π = {a1a2b, ba1a2}:
    C(T) = {a1a2b, ba2a1, a2a1b, ba1a2}

19
Our Current work
  • We are extending the notion of permutation
    patterns to permutation patterns of trees
    (connected components with the same vertex label
    set) and graphs. We are developing algorithms to
    represent the maximal notation of a πpattern in
    trees and graphs.

20
Using PQ Trees For Comparative Genomics
  • Cast: Trees, Lines, Arrows, Intervals, Strings,
    Patterns, Frontiers
  • Based on a true story by Gad M. Landau, Laxmi
    Parida, Oren Weimann
  • No trees were harmed during the making of this
    presentation