Tree Pattern Matching in Phylogenetic Trees

1
Tree Pattern Matching in Phylogenetic Trees
  • Automatic Search for Orthologs or Paralogs in
    Homologous Gene Sequence Databases
  • By Jean-François Dufayard, Laurent Duret, Simon
    Penel, Manolo Gouy, François Rechenmann, and Guy
    Perrière

Presented by Jean Yeh
2
Background Information
  • The authors have created three databases that
    gather genes into homologous families
  • HOVERGEN: vertebrates
  • HOBACGEN: prokaryotes
  • HOGENOM: completely sequenced organisms
  • Among homologous genes, need to be able to
    differentiate orthologs from paralogs

3
Homologous Sequences
  • Homologs: Two genes related by descent from a
    common ancestral DNA sequence
  • Orthologs: Two genes in different species that
    evolved from a single ancestral gene by
    speciation
  • Paralogs: Two genes related by duplication within
    a genome
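
The definitions above can be sketched in code. This is only an illustration, assuming a gene tree whose internal nodes are already labeled "speciation" or "duplication"; the Node class, the gene names, and the function names are hypothetical and do not come from the presentation. Two genes are called orthologs when their last common ancestor is a speciation node and paralogs when it is a duplication node. Note that this common convention also labels genes from different species as paralogs when a duplication separates them, which is slightly broader than the wording on the slide.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    # Leaves carry a gene name; internal nodes carry "speciation" or "duplication".
    label: str
    children: list = field(default_factory=list)

def leaves(node):
    """Return the set of gene names found under this node."""
    if not node.children:
        return {node.label}
    found = set()
    for child in node.children:
        found |= leaves(child)
    return found

def lca_event(node, a, b):
    """Return the event label of the deepest node that contains both genes."""
    for child in node.children:
        under = leaves(child)
        if a in under and b in under:
            return lca_event(child, a, b)
    return node.label

def relationship(root, a, b):
    """Classify two distinct genes of the tree as orthologs or paralogs."""
    if not {a, b} <= leaves(root):
        raise ValueError("both genes must be in the tree")
    return "orthologs" if lca_event(root, a, b) == "speciation" else "paralogs"

# Hypothetical family: one ancient duplication followed by a speciation in each copy.
tree = Node("duplication", [
    Node("speciation", [Node("humanA"), Node("mouseA")]),
    Node("speciation", [Node("humanB"), Node("mouseB")]),
])
print(relationship(tree, "humanA", "mouseA"))  # orthologs
print(relationship(tree, "humanA", "humanB"))  # paralogs
```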

4
Orthologs and Paralogs
http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/orthologs3.gif
5
Gene Function
  • Gene function tends to change after gene
    duplication
  • Orthologs are more reliable predictors of gene
    function than paralogs
  • Evolutionary distance also plays a role
  • Closely related paralogs probably more similar
    than distantly related orthologs

6
Goal
  • Create algorithms that allow for automatic
    searching for orthologs or paralogs in their
    databases
  • One algorithm for tree reconciliation
  • One algorithm for tree pattern matching
  • Implement under architecture used to query the
    databases

7
Tree Reconciliation
  • Infers speciation and duplication events
  • Compares gene tree G with species tree S to give
    a reconciled tree R
  • Algorithm
  • Initialize R = S
  • Step through G and R simultaneously
  • If nodes are incongruent, insert duplication node
    in R and annotate gene losses
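
The idea of comparing G with S to infer duplications can be illustrated with a short sketch. It does not reproduce the authors' algorithm, which builds the reconciled tree R and annotates gene losses; instead it uses the standard LCA-mapping approach, which only counts duplications. The assumptions here are that trees are nested tuples, that each gene-tree leaf is labeled with the name of its species, and that all function names are hypothetical.

```python
def clades(tree):
    """Return (leaf set of the tree, list of the leaf sets of all its clades)."""
    if isinstance(tree, str):
        single = frozenset([tree])
        return single, [single]
    union, collected = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        union |= child_leaves
        collected.extend(child_clades)
    collected.append(union)
    return union, collected

def lca_map(species, species_clades):
    """Smallest clade of the species tree that contains all given species."""
    return min((c for c in species_clades if species <= c), key=len)

def reconcile(gene_tree, species_clades):
    """Return (species-tree clade the node maps to, number of duplications)."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree]), 0
    child_maps, dups, species = [], 0, frozenset()
    for child in gene_tree:
        mapped, child_dups = reconcile(child, species_clades)
        child_maps.append(mapped)
        dups += child_dups
        species |= mapped
    here = lca_map(species, species_clades)
    # A node that maps to the same clade as one of its children
    # cannot be a speciation, so it is counted as a duplication.
    if here in child_maps:
        dups += 1
    return here, dups

S = (("human", "mouse"), "chicken")
_, S_CLADES = clades(S)

congruent = (("human", "mouse"), "chicken")
duplicated = ((("human", "mouse"), ("human", "mouse")), "chicken")
incongruent = (("human", "chicken"), "mouse")

print(reconcile(congruent, S_CLADES)[1])    # 0
print(reconcile(duplicated, S_CLADES)[1])   # 1
print(reconcile(incongruent, S_CLADES)[1])  # 1
```

In the incongruent case the gene tree disagrees with the species tree, and the sketch explains the disagreement with one duplication; the gene losses that would accompany it are not reported here.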

8
Tree Reconciliation
9
Tree Pattern Matching
  • A tree pattern is a particular tree structure with
    taxonomic and evolutionary parameters contained
    in nodes and leaves
  • Can be considered a subtree
  • Want to match to a target tree
  • E.g. pattern (X, Y, Z) matches ((X, Y), Z), (X,
    (Y, Z)), and ((X, Z), Y)
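
The (X, Y, Z) example can be checked with a small sketch. It assumes trees are nested tuples with unique leaf labels, and it treats a pattern as matching a target when both have the same leaves and every clade of the pattern also appears in the target, so an unresolved pattern node matches any of its resolutions. The taxonomic and evolutionary constraints that real patterns carry are left out, and the function names are hypothetical.

```python
def clade_sets(tree):
    """Return (leaf set of the tree, set of the leaf sets of all its clades)."""
    if isinstance(tree, str):
        single = frozenset([tree])
        return single, {single}
    union, collected = frozenset(), set()
    for child in tree:
        child_leaves, child_clades = clade_sets(child)
        union |= child_leaves
        collected |= child_clades
    collected.add(union)
    return union, collected

def matches(pattern, target):
    """True if target has the same leaves and contains every clade of pattern."""
    p_leaves, p_clades = clade_sets(pattern)
    t_leaves, t_clades = clade_sets(target)
    return p_leaves == t_leaves and p_clades <= t_clades

pattern = ("X", "Y", "Z")
targets = [(("X", "Y"), "Z"), ("X", ("Y", "Z")), (("X", "Z"), "Y")]
print([matches(pattern, t) for t in targets])  # [True, True, True]

# A fully resolved pattern is stricter: it only matches its own topology.
print(matches((("X", "Y"), "Z"), ("X", ("Y", "Z"))))  # False
```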

10
Tree Pattern Matching
  • Uses a recurrence algorithm that takes into
    account different taxonomic levels as well as the
    specific branch constraints
  • Cuts down on run time by checking the number of
    leaves in the pattern and the target tree
  • Allows users to search for orthologs/paralogs
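
The leaf-count shortcut can be sketched as follows. This is an assumption about how such pruning might look, not the authors' recurrence: the search walks the target tree and skips any subtree with fewer leaves than the pattern, since neither it nor its descendants can contain a match. Trees are nested tuples with unique leaf labels, taxonomic levels and branch constraints are ignored, and all names are hypothetical.

```python
def leaf_set(tree):
    """Return the set of leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def clades(tree):
    """Return the leaf sets of every clade in the tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    found = {leaf_set(tree)}
    for child in tree:
        found |= clades(child)
    return found

def matches(pattern, subtree):
    """Same leaves, and every pattern clade is present in the subtree."""
    return (leaf_set(pattern) == leaf_set(subtree)
            and clades(pattern) <= clades(subtree))

def find_matches(pattern, target):
    """Return every subtree of target that matches pattern."""
    needed = len(leaf_set(pattern))
    hits, stack = [], [target]
    while stack:
        node = stack.pop()
        if len(leaf_set(node)) < needed:
            continue  # too few leaves: prune this subtree and everything below it
        if matches(pattern, node):
            hits.append(node)
        if not isinstance(node, str):
            stack.extend(node)
    return hits

target = ((("human", "mouse"), "chicken"), "frog")
print(find_matches(("human", "mouse"), target))
# [('human', 'mouse')]
```

The sketch recomputes leaf sets at every node for brevity; a real implementation would store the leaf count on each node once.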

11
FamFetch Interface
  • User interface to access the databases
  • Incorporates both algorithms
  • Pattern editor has two frames: tool and pattern
  • Pattern frame: interactive editor to construct,
    load, save, and match patterns with a tree
    database
  • Tool frame: tools used in the pattern frame

12
FamFetch
13
Tree Rooting
  • For tree reconciliation, the trees must be rooted
  • Authors use their reconciliation algorithm to
    find the most parsimonious solution: the one
    that requires the fewest gene duplications
  • Reconciliation algorithm relatively fast
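
Rooting by parsimony can be sketched as: try every possible root position and keep the one whose reconciliation needs the fewest duplications. The sketch below assumes binary trees written as nested tuples, gene-tree leaves labeled with their species, and a duplication count based on the standard LCA mapping rather than the authors' own reconciliation; all function names are hypothetical.

```python
def clades(tree):
    """Return (leaf set of the tree, list of the leaf sets of all its clades)."""
    if isinstance(tree, str):
        single = frozenset([tree])
        return single, [single]
    union, collected = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        union |= child_leaves
        collected.extend(child_clades)
    collected.append(union)
    return union, collected

def duplications(gene_tree, species_clades):
    """Return (mapped species clade, duplication count) using LCA mapping."""
    if isinstance(gene_tree, str):
        return frozenset([gene_tree]), 0
    child_maps, dups, species = [], 0, frozenset()
    for child in gene_tree:
        mapped, child_dups = duplications(child, species_clades)
        child_maps.append(mapped)
        dups += child_dups
        species |= mapped
    here = min((c for c in species_clades if species <= c), key=len)
    return here, dups + (1 if here in child_maps else 0)

def rootings_below(sub, rest):
    """Yield rootings on the edge above `sub` and on every edge inside it."""
    yield (sub, rest)
    if not isinstance(sub, str):
        left, right = sub
        yield from rootings_below(left, (right, rest))
        yield from rootings_below(right, (left, rest))

def all_rootings(tree):
    """Yield every rooted version of a binary tree given with an arbitrary root."""
    left, right = tree
    yield tree
    for side, other in ((left, right), (right, left)):
        if not isinstance(side, str):
            a, b = side
            yield from rootings_below(a, (b, other))
            yield from rootings_below(b, (a, other))

def best_rooting(tree, species_clades):
    """Pick the rooting that requires the fewest duplications."""
    return min(all_rootings(tree),
               key=lambda t: duplications(t, species_clades)[1])

S = (("human", "mouse"), "chicken")
_, S_CLADES = clades(S)

arbitrary = ("human", ("mouse", "chicken"))  # same unrooted tree, poor root choice
best = best_rooting(arbitrary, S_CLADES)
print(best, duplications(best, S_CLADES)[1])
# ('chicken', ('mouse', 'human')) 0
```

A tree with n leaves has 2n - 3 candidate root positions, so this exhaustive search stays cheap as long as each reconciliation is fast.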

14
Tree Pattern Search
  • By forming their algorithm as a tree pattern
    search, the authors managed to increase possible
    queries for the users
  • Can search for gene duplication or gene
    speciation events, not just orthologs and
    paralogs
  • Also a relatively fast algorithm, though it loses
    the flexibility of human pattern matching

15
Automatic Search for Orthologs
  • Previously done with pairwise BLAST searches and
    reciprocal hits
  • Needs all the genes, and if the genes are wrong,
    the results may be wrong
  • Classifying genes into clusters of orthologs
    depends on evolutionary distance between species
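
The reciprocal-hit approach mentioned above can be sketched as follows. This is a minimal illustration, assuming similarity scores have already been computed in both directions (for example by pairwise BLAST runs) and are available as dictionaries; the gene names and scores are made up and the function names are hypothetical. A pair is kept only when each gene is the other's best-scoring hit.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    scores: dict of (query, subject) -> similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}

def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)

# Made-up scores between genes of species A and species B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 195, ("b2", "a1"): 140, ("b2", "a2"): 95}

print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1')]
```

Here a2's best hit is b2, but b2's best hit is a1, so the pair (a2, b2) is dropped. The example also shows the weakness noted on the slide: if a gene is missing or wrongly annotated, the best hits shift and the inferred pairs can be wrong.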

16
Possible Improvement
  • Have program estimate reliability of
    reconciliation
  • While it allows for easier comparative sequence
    analysis, it was designed solely for databases
    the authors had already created
  • Might be improved if it could be generalized for
    more databases