Bioinformatics Algorithms and Data Structures - PowerPoint PPT Presentation

UNIVERSITY OF SOUTH CAROLINA. College of Engineering & Information Technology

Provided by: john244
Learn more at: https://www.cse.sc.edu

1
Bioinformatics Algorithms and Data Structures
  • Chapter 17.4-6: Strings and Evolutionary Trees
  • Lecturer: Dr. Rose
  • Slides by Dr. Rose
  • April 17, 2003

2
Ultrametric Problem Centrality
  • Four related tree problems:
  • Ultrametric
  • Additive
  • Binary perfect phylogeny
  • Tree compatibility
  • All can be solved as ultrametric tree problems.
  • Recall: tree compatibility reduces to perfect
    phylogeny.
  • Now we reduce the additive tree and (binary) perfect
    phylogeny problems to the ultrametric tree
    problem.

3
Ultrametric Problem: Additive Trees
  • Goal: reduce the additive tree problem to the ultrametric
    problem
  • Complexity: $O(n^2)$ reduction
  • Approach: create a matrix $D'$ that is ultrametric
    $\iff$ D is additive.
  • We will start by describing a reduction that
    involves a tree T for D and $T'$ for $D'$.
  • We will then describe a direct reduction of D to
    $D'$.

4
Ultrametric Problem: Additive Trees
  • Assume that D is additive.
  • Assume that we know of an additive tree T for D.
  • Assume that each of the n taxa in D labels a leaf
    of T.
  • Idea: label the nodes of T to create an
    ultrametric tree $T'$.
  • Q: How can we do this?

5
Ultrametric Problem: Additive Trees
  • A: We will do the following:
  • Select one node as the root.
  • Stretch the leaf edges so that the leaves are
    equidistant from the root.
  • Let v be the row of D containing the largest
    entry.
  • Let $m_v$ denote the value of this entry.
  • Select node v as the root of T.
  • This creates a directed tree.

6
Ultrametric Problem: Additive Trees
  • Example:
  • A is the row of D containing the largest entry.
  • Select node A as the root of T.

7
Ultrametric Problem: Additive Trees
  • Stretch the leaf edges:
  • For each leaf i, add $m_A - D(A, i)$ to the leaf
    edge.
  • The leaves are now equidistant from A.

8
Ultrametric Problem: Additive Trees
  • The resulting tree $T'$ is:
  • a rooted edge-weighted tree
  • at distance $m_v$ from the root to every leaf
  • such that each internal node is equidistant to the leaves in
    its subtree.

9
Ultrametric Problem: Additive Trees
  • Since each internal node is equidistant to the
    leaves in its subtree:
  • Label each internal node by this unique distance.
  • These labels can be used to define an ultrametric
    matrix $D'$.
  • $D'(i, j)$ is the label at the least common
    ancestor of leaves i and j in $T'$.
  • Q: How can we go directly from matrix D to matrix
    $D'$ without involving T and $T'$?

10
Ultrametric Problem: Additive Trees
  • Consider leaves i and j in T.
  • Let node w be their least common ancestor.
  • Let x be the distance from the root v to w.
  • Let y be the distance from node w to leaf i.

11
Ultrametric Problem: Additive Trees
  • Q: What is the distance from w to i in $T'$?
  • A: $y + m_v - D(v, i)$.
  • Q: Where does $m_v - D(v, i)$ come from?
  • A: Recall that we add $m_v - D(v, i)$ to stretch the leaf
    edges.

12
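Ultrametric Problem: Additive Trees
  • Gusfield presents the following lemma:
  • Without knowing T or $T'$ explicitly, we can deduce
    that $D'(i, j) = m_v + (D(i, j) - D(v, i) - D(v, j))/2$
  • Q: Is this equation correct? (Let z be the distance from w to leaf j.)
  • $D'(i, j) = m_v + ((y + z) - (x + y) - (x + z))/2$ ?
  • $D'(i, j) = m_v - 2x/2 = m_v - x$ ?
  • Should it instead be
  • $D'(i, j) = 2m_v + D(i, j) - D(v, i) - D(v, j)$?
  • i.e., $D'(i, j) = 2m_v - 2x$?
  • No: $m_v - x$ is the label at w (slide 9), while $2m_v - 2x$
    is the path length between i and j in $T'$, i.e., twice that label.
  • Either choice yields an ultrametric matrix, so the distinction is not
    necessary for the reduction (slide 9).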

13
Ultrametric Problem: Additive Trees
  • This brings us to the following Theorem:
  • If D is an additive matrix, then $D'$ is
    ultrametric, where $D'(i, j) = m_v + (D(i, j) - D(v, i) - D(v, j))/2$.
  • Proof. We've shown that
  • $D'(i, j) = y + m_v - D(v, i)$
  • $y = D(v, i) - x$
  • $x = (D(v, i) + D(v, j) - D(i, j))/2$
  • Putting it all together establishes the equation in
    the theorem.
  • $D'$ satisfies the ultrametric requirement.
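
The substitution behind "putting it all together" can be written out step by step. This is only a worked version of the three identities listed in the proof, using the same symbols (x, y, $m_v$):

```latex
\begin{align*}
D'(i,j) &= y + m_v - D(v,i) \\
        &= \bigl(D(v,i) - x\bigr) + m_v - D(v,i) \\
        &= m_v - x \\
        &= m_v - \frac{D(v,i) + D(v,j) - D(i,j)}{2} \\
        &= m_v + \frac{D(i,j) - D(v,i) - D(v,j)}{2}.
\end{align*}
```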

14
Ultrametric Problem: Additive Trees
  • Q: What is the value of y?
  • A: $y = D(v, i) - x$.
  • Q: What is the value of x in terms of values in
    D?
  • A: $x = (D(v, i) + D(v, j) - D(i, j))/2$
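
A minimal Python sketch of the direct reduction, assuming D is given as a symmetric list of lists and that $D'(i, j) = m_v + (D(i, j) - D(v, i) - D(v, j))/2$ for $i \ne j$. The function names, the zero diagonal, and the tolerance parameter are illustrative choices, not part of the slides. The ultrametric test uses the three-point condition (in every triple of taxa the largest of the three values is attained at least twice).

```python
from itertools import combinations


def ultrametric_from_additive(D):
    """Return (v, m_v, D') for a symmetric distance matrix D."""
    n = len(D)
    # v is a row containing the largest entry of D; m_v is that entry.
    v = max(range(n), key=lambda r: max(D[r]))
    m_v = max(D[v])
    # D'(i, j) = m_v + (D(i, j) - D(v, i) - D(v, j)) / 2, with 0 on the diagonal.
    Dp = [[0.0 if i == j else m_v + (D[i][j] - D[v][i] - D[v][j]) / 2
           for j in range(n)] for i in range(n)]
    return v, m_v, Dp


def is_ultrametric(Dp, tol=1e-9):
    """Three-point condition: the maximum of each triple is attained twice."""
    for i, j, k in combinations(range(len(Dp)), 3):
        _, second, largest = sorted((Dp[i][j], Dp[i][k], Dp[j][k]))
        if largest - second > tol:
            return False
    return True


# Path distances of a small tree with leaf edges A:2, B:3, C:4, D:2
# and one internal edge of weight 1 separating {A, B} from {C, D}.
D = [[0, 5, 7, 5],
     [5, 0, 8, 6],
     [7, 8, 0, 6],
     [5, 6, 6, 0]]
v, m_v, Dp = ultrametric_from_additive(D)
print(v, m_v, is_ultrametric(Dp))
```

The whole computation touches each matrix entry a constant number of times, which matches the $O(n^2)$ bound for building $D'$; the triple-based check is cubic and is included only as a sanity test.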

15
Ultrametric Problem: Additive Trees
  • So D additive $\Rightarrow$ $D'$ ultrametric.
  • By contraposition: $D'$ not ultrametric $\Rightarrow$ D not additive.
  • Q: Does $D'$ ultrametric $\Rightarrow$ D additive?
  • A: Theorem: $D'$ ultrametric $\Rightarrow$ D additive.
  • Proof (constructive):
  • Let $T'$ be the ultrametric tree for $D'$.
  • Assign weights to the edges of $T'$.
  • Note: the sum of the edge weights from a leaf to an
    ancestor must match the ancestor's label.
  • For each edge (p, q), assign the weight p - q, the difference
    of the labels at p and q (leaves have label 0).

16
Ultrametric Problem: Additive Trees
  • Assign weights to the edges of $T'$, continued:
  • Note: the path distance between leaves i and j is
    twice the value labeling their least common
    ancestor.
  • Hence the path distance is $2D'(i, j) = 2m_v + D(i, j) - D(v, i) - D(v, j)$.
  • Now shrink the edge into each leaf i by $m_v - D(v, i)$.
  • The path from leaf i to leaf j now has length $D(i, j)$.
  • The result is an additive tree for matrix D, obtained from
    the ultrametric tree for $D'$.
  • Putting all of this together results in a method
    for constructing an additive tree for an additive
    matrix.
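
As a check on the shrinking step, the path length between leaves i and j after both leaf edges are shortened can be worked out directly. The only inputs are the path length $2D'(i, j)$ in the ultrametric tree and the two shrink amounts $m_v - D(v, i)$ and $m_v - D(v, j)$:

```latex
\begin{align*}
&2D'(i,j) - \bigl(m_v - D(v,i)\bigr) - \bigl(m_v - D(v,j)\bigr) \\
&\quad= \bigl(2m_v + D(i,j) - D(v,i) - D(v,j)\bigr) - 2m_v + D(v,i) + D(v,j) \\
&\quad= D(i,j).
\end{align*}
```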

17
Ultrametric Problem: Additive Trees
  • Additive Tree Algorithm:
  • Create matrix $D'$ from D.
  • Create ultrametric tree $T'$ from $D'$.
  • Create T from $T'$:
  • Label edge (p, q) with the value p - q.
  • For each leaf i, shrink the leaf edge by $m_v - D(v, i)$.
  • Note: no step takes more than $O(n^2)$ time.
  • Thm. An additive tree for an additive matrix can
    be constructed in $O(n^2)$ time.

18
Ultrametric Problem: Additive Trees
  • Example: Given D, first find $D'$.
  • Recall: $D'(i, j) = m_v + (D(i, j) - D(v, i) - D(v, j))/2$

19
Ultrametric Problem: Additive Trees
  • Example: From $D'$ find $T'$.
  • Recall: label inner edges (p, q) by p - q.

20
Ultrametric Problem: Additive Trees
  • Example: From $T'$ find T.
  • Recall: shrink leaf edge i by $m_v - D(v, i)$.

21
Ultrametric Problem: Additive Trees
  • Example: Finally, compare the derived T with the
    original tree as a sanity check.

22
Ultrametric Problem: Perfect Phylogeny
  • We now recast perfect phylogeny in terms of an
    ultrametric tree problem.
  • Defn. $D_M$: the n by n matrix of shared characters.
  • More formally:
  • Given the n by m character matrix M, define the n
    by n matrix $D_M$: for each pair of objects p and q, set
    $D_M(p, q)$ to be the number of characters that p
    and q both possess.
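
A minimal Python sketch of this definition, assuming M is a list of 0/1 rows (one row per object, one column per character). The function name and the sample matrix are illustrative.

```python
def shared_character_matrix(M):
    """D_M(p, q) = number of characters that objects p and q both possess."""
    n = len(M)
    return [[sum(a & b for a, b in zip(M[p], M[q])) for q in range(n)]
            for p in range(n)]


# Five objects, five binary characters (illustrative data).
M = [[1, 1, 0, 0, 0],
     [0, 0, 1, 0, 0],
     [1, 1, 0, 0, 1],
     [0, 0, 1, 1, 0],
     [0, 1, 0, 0, 0]]
D_M = shared_character_matrix(M)
print(D_M[0][2], D_M[1][3], D_M[0][1])
```

With this convention the diagonal entry $D_M(p, p)$ is simply the number of characters that p possesses.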

23
Ultrametric Problem: Perfect Phylogeny
  • Lemma: If M has a perfect phylogeny, then $D_M$ is a
    min-ultrametric matrix.
  • Proof: convert M's perfect phylogeny T to a
    min-ultrametric tree for $D_M$.
  • Let T be the perfect phylogeny for M.
  • Label T's root with zero.
  • Traverse T from top to bottom; for each node v:
  • Let $p_v$ be the number labeling node v's parent.
  • Let $e_v$ be the number of characters labeling the edge
    into v.
  • Label node v with the sum $p_v + e_v$.

24
Ultrametric Problem: Perfect Phylogeny
  • The label of node v is the number of characters
    common to all leaves in the subtree rooted at v.
  • If v is the immediate parent of leaves p and q,
    then the label of v is $D_M(p, q)$.
  • The numbers labeling nodes on any path from the
    root are strictly increasing.
  • The result is a min-ultrametric tree for matrix $D_M$.

25
Ultrametric Problem: Perfect Phylogeny
  • Algorithm: perfect phylogeny via ultrametrics
  • Step 1: Create matrix $D_M$ from M.
  • Step 2: Attempt to create a min-ultrametric tree $T'$ from
    $D_M$. If not possible, then M has no perfect
    phylogeny.
  • Step 3: If $T'$ was successfully created in step 2:
  • Attempt to label its edges with the m characters
    of M.
  • If not possible, then M has no perfect phylogeny.
  • O/w the modified $T'$ is the perfect phylogeny T.
  • Note: $T'$ may be min-ultrametric but M may not
    have a perfect phylogeny, hence the check in step
    3.
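
A minimal Python sketch of the test behind step 2, under the assumption that a matrix is min-ultrametric exactly when, for every three objects, the smallest of the three pairwise values is attained at least twice. It only checks the matrix; it does not build the tree or perform the edge labeling of step 3. Function names and sample matrices are illustrative.

```python
from itertools import combinations


def shared_character_matrix(M):
    """D_M(p, q) = number of characters that objects p and q both possess."""
    n = len(M)
    return [[sum(a & b for a, b in zip(M[p], M[q])) for q in range(n)]
            for p in range(n)]


def is_min_ultrametric(D):
    """In every triple, the minimum pairwise value must not be unique."""
    for i, j, k in combinations(range(len(D)), 3):
        smallest, second, _ = sorted((D[i][j], D[i][k], D[j][k]))
        if smallest != second:
            return False
    return True


# Characters {0,1} and {0,2} overlap without nesting: no perfect phylogeny,
# and the matrix test already fails.
fails_step_2 = [[1, 1], [1, 0], [0, 1]]
# Characters {0,1}, {0,2}, {1,2}: again no perfect phylogeny, yet every
# pair shares exactly one character, so the matrix test passes.
needs_step_3 = [[1, 1, 0], [1, 0, 1], [0, 1, 1]]

print(is_min_ultrametric(shared_character_matrix(fails_step_2)))
print(is_min_ultrametric(shared_character_matrix(needs_step_3)))
```

The second matrix illustrates why the labeling check in step 3 cannot be skipped: passing the matrix test is necessary for a perfect phylogeny but not sufficient.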

26
Ultrametric Problem: Perfect Phylogeny
  • Final notes on the centrality of the ultrametric
    problem.
  • We can see that the following problems:
  • perfect phylogeny
  • tree compatibility
  • can be cast as ultrametric problems.
  • This is not an efficient way to address these
    problems.

27
Maximum Parsimony
  • Maximum parsimony:
  • Perfect phylogeny is a special instance.
  • Can be viewed as a Steiner tree problem on a
    hypercube.
  • Presentation approach:
  • Introduce Steiner trees
  • Hypercube graphs
  • Maximum parsimony as a Steiner tree problem

28
Maximum Parsimony
  • Definitions:
  • Let N be a set of nodes.
  • Let E be a set of undirected edges with non-negative
    weights.
  • Let $G = (N, E)$ be an undirected graph.
  • Let $X \subseteq N$ be a subset of nodes.
  • A Steiner tree ST for X is any connected subtree
    of G that contains all nodes of X and possibly
    nodes in $N - X$.
  • Weighted Steiner Tree Problem: Given G and X,
    find the Steiner tree of minimum total weight.

29
Maximum Parsimony
  • More Definitions:
  • A hypercube of dimension d is an undirected graph
    with $2^d$ nodes, labeled $0 \ldots 2^d - 1$. Adjacent nodes
    differ in only one label bit position.
  • The weighted Steiner tree problem on hypercubes:
    G must be a hypercube.
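
A minimal Python sketch of the adjacency rule, assuming each node is represented by the integer whose d-bit binary expansion is its label. The function names are illustrative.

```python
def hypercube_neighbors(node, d):
    """Nodes adjacent to `node` in the d-dimensional hypercube:
    flip exactly one of the d label bits."""
    return [node ^ (1 << bit) for bit in range(d)]


def hamming(a, b):
    """Number of bit positions in which labels a and b differ."""
    return bin(a ^ b).count("1")


print(hypercube_neighbors(0b000, 3))   # the three nodes one bit away from 000
print(hamming(0b101, 0b100))           # adjacent nodes are at Hamming distance 1
```

Two nodes are adjacent exactly when their Hamming distance is 1, and in general the shortest path between two nodes has as many edges as their Hamming distance.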

30
Maximum Parsimony
  • More Definitions:
  • Maximum Parsimony: Occam's razor applied to
    phylogenetic reconstruction. A preference for
    trees requiring fewer evolutionary events to
    explain the data.
  • Gusfield's definition:
  • The Maximum Parsimony problem is the unweighted
    Steiner tree problem on a d-dimensional hypercube.

31
Maximum Parsimony
  • More about the hypercube formulation of MP:
  • The input taxa X are described as d-length binary
    vectors.
  • Recall: adjacent nodes differ in only one label
    bit position.
  • Correspondingly, taxa that differ by a single
    mutation will be adjacent.
  • Hence there exists a Steiner tree for X with l edges iff there exists a
    corresponding phylogenetic tree that entails l
    character-state mutations.

32
Steiner interpretation of Perfect Phylogeny
  • Define a nontrivial binary character to be a
    character contained by some taxa but not all.
  • Consider an MP dataset of d nontrivial binary
    characters.
  • Q: What is the minimal number of mutations in the
    MP tree?
  • A: At least d.

33
Steiner interpretation of Perfect Phylogeny
  • Q: What is the relation to binary perfect
    phylogeny?
  • A: The binary perfect phylogeny problem is
    equivalent to asking if there is an MP solution
    with a cost of exactly d.
  • Q: What about generalized perfect phylogeny?
  • A: It's similar. The lower bound must reflect
  • the number of character states in the input taxa:
  • a character having r states in the input taxa is
    allowed only r - 1 transitions.
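
A minimal Python sketch of this lower bound, assuming the taxa are given as equal-length sequences of character states. Each character with r distinct states among the input taxa contributes r - 1, so a nontrivial binary character contributes 1 and a trivial one contributes 0. The function name and sample data are illustrative.

```python
def parsimony_lower_bound(taxa):
    """Sum over characters (columns) of (number of observed states - 1)."""
    return sum(len(set(column)) - 1 for column in zip(*taxa))


binary_taxa = ["110", "100", "001"]      # three nontrivial binary characters
multistate_taxa = ["AC", "AG", "AT"]     # column 1 is trivial, column 2 has 3 states
print(parsimony_lower_bound(binary_taxa), parsimony_lower_bound(multistate_taxa))
```

A perfect phylogeny exists exactly when some tree meets this bound, which is the "cost of exactly d" condition in the binary case.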

34
Steiner interpretation of Perfect Phylogeny
  • Complexity:
  • No known efficient solution for the Steiner tree
    problem on unweighted graphs.
  • Polynomial time solution for the generalized perfect
    phylogeny problem when r is fixed.
  • $\Rightarrow$ this particular Steiner tree problem can be
    answered in polynomial time.

35
Steiner interpretation of Perfect Phylogeny
  • MP approximations:
  • The weighted Steiner tree problem on hypercubes
    is NP-hard.
  • There is an approximate method with an error
    bound of a factor of 11/6.
  • Also, a minimum spanning tree (MST) can be used to find a Steiner tree with
    weight less than twice that of the optimal Steiner tree.
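
A minimal Python sketch of the MST-based approximation, assuming taxa are integers read as hypercube labels and that the weight between two taxa is their Hamming distance (the length of a shortest hypercube path between them). It runs Prim's algorithm on the complete graph over the input taxa; expanding each MST edge into a hypercube path gives a Steiner tree of at most the reported weight. The function names are illustrative, and the simple quadratic-per-step search is chosen for brevity rather than speed.

```python
def hamming(a, b):
    """Number of bit positions in which labels a and b differ."""
    return bin(a ^ b).count("1")


def mst_over_terminals(terminals):
    """Prim's algorithm on the complete graph over the (distinct) terminals,
    with Hamming distance as edge weight. Returns (total weight, edges)."""
    terminals = list(dict.fromkeys(terminals))   # drop duplicates, keep order
    in_tree = {terminals[0]}
    edges, total = [], 0
    while len(in_tree) < len(terminals):
        # Cheapest edge leaving the current tree.
        weight, u, t = min((hamming(u, t), u, t)
                           for u in in_tree
                           for t in terminals if t not in in_tree)
        in_tree.add(t)
        edges.append((u, t, weight))
        total += weight
    return total, edges


total, edges = mst_over_terminals([0b000, 0b011, 0b101, 0b110])
print(total, edges)
```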

36
Phylogenetic Alignment
  • Recall:
  • Phylogenetic alignment was discussed in section
    14.8.
  • The focus was on deriving a multiple alignment
    informed by evolutionary history.
  • The tree focused emphasis on specific alignment
    groupings.
  • Internal node sequences were a secondary artifact.
37
Phylogenetic Alignment
  • Phylogenetic alignment as a parsimony problem
  • In contrast:
  • We are now interested in the internal sequences.
  • These sequences are waypoints in the evolutionary
    trajectory leading to the extant taxa.
  • Phylogenetic alignment is thus a parsimony problem.

38
Phylogenetic Alignment
  • Hypothesis: optimal phylogenetic alignment
    describes evolutionary history.
  • Assumptions:
  • Edit distance realistically models evolutionary
    distance.
  • Globally optimal phylogenetic alignment captures
    the essence of the evolutionary process.
  • We will look at minimum mutation, a variant of
    phylogenetic alignment.

39
Fitch-Hartigan minimum mutation problem
  • Defn. minimum mutation problem: a variant of the
    phylogenetic alignment problem.
  • Input comprised of:
  • Tree
  • Strings labeling the leaves
  • A multiple alignment of those strings

40
Fitch-Hartigan minimum mutation problem
  • Q: If you are given the tree and the multiple
    alignment, what is left to compute?
  • A: The mutations that account for the input
    data.
  • These mutations should be
  • a minimum sequence of site mutations that is
  • compatible with the given tree and
  • the given multiple alignment.

41
Fitch-Hartigan minimum mutation problem
  • Q: How is the input data used to determine the
    minimum sequence of mutations?
  • The multiple alignment associates each amino acid
    with a specific position.
  • The evolutionary history of the sequences is then
    treated as a combined but independent
    evolutionary history of each position.
  • The tree guides the order of mutations for each
    position.

42
Fitch-Hartigan minimum mutation problem
  • Assumptions:
  • Each column of the alignment can be solved
    separately
  • The strings labeling inner nodes adhere to the
    same alignment
  • The problem reduces to a computation at a single
    position.

43
Fitch-Hartigan minimum mutation problem
  • Minimum mutation for a single position
  • Input:
  • A rooted tree with n nodes
  • Each leaf is labeled by a single character
  • Output:
  • Each interior node is labeled by a single
    character
  • The labeling minimizes the number of edges
    between nodes with different labels.

44
Fitch-Hartigan minimum mutation problem
  • Algorithmic approach: Dynamic Programming
  • Let $T_v$ denote the subtree rooted at node v.
  • Let $C(v)$ be the cost of the optimal solution for
    $T_v$.
  • Let $C(v, x)$ be the cost when v must be labeled by
    x.
  • Let $v_i$ denote the i-th child of node v.
  • Base case: for each leaf, specify $C(v)$ and $C(v, x)$
    $\forall x \in \Sigma$.
  • $C(v) = 0$; $C(v, x) = 0$ if leaf v is labeled by x.
  • $C(v, x) = \infty$ if leaf v is not labeled by x.

45
Fitch-Hartigan minimum mutation problem
  • When v is an internal node:
  • The recurrence relations start from the base
    cases.
  • Bottom up from the leaves.
  • Backtracking is used after all $C(v, x)$ are computed
    to extract the solution.

46
Fitch-Hartigan minimum mutation problem
  • Backtracking process:
  • The root r is labeled by the character x s.t. $C(r) = C(r, x)$.
  • The traversal is then top-down.
  • If v is labeled x, then $v_i$ is labeled
  • character x if $C(v_i) + 1 > C(v_i, x)$
  • o/w character y such that $C(v_i) = C(v_i, y)$
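
A minimal Python sketch of the dynamic program and the backtracking for a single position. The recurrence for internal nodes is not spelled out in the transcript; the sketch assumes the form consistent with the backtracking rule, namely $C(v, x) = \sum_i \min(C(v_i) + 1, C(v_i, x))$ and $C(v) = \min_x C(v, x)$. The tree encoding (a dict from node to list of children), the function name, and the sample data are illustrative.

```python
import math


def min_mutation(children, leaf_label, root, alphabet):
    """Return (minimum number of mutations, labeling of every node)."""
    C, Cx = {}, {}   # C[v] and Cx[v][x] play the roles of C(v) and C(v, x)

    def solve(v):    # bottom-up pass
        kids = children.get(v, [])
        if not kids:  # base case: leaf
            Cx[v] = {x: 0 if leaf_label[v] == x else math.inf for x in alphabet}
        else:
            for c in kids:
                solve(c)
            # Assumed recurrence: a child either keeps x or pays one mutation.
            Cx[v] = {x: sum(min(C[c] + 1, Cx[c][x]) for c in kids)
                     for x in alphabet}
        C[v] = min(Cx[v].values())

    def assign(v, x, labels):   # top-down backtracking
        labels[v] = x
        for c in children.get(v, []):
            if C[c] + 1 > Cx[c][x]:
                assign(c, x, labels)          # cheaper to keep the parent's x
            else:
                y = min(alphabet, key=lambda s: Cx[c][s])   # C(c) = C(c, y)
                assign(c, y, labels)

    solve(root)
    labels = {}
    assign(root, min(alphabet, key=lambda s: Cx[root][s]), labels)
    return C[root], labels


children = {"r": ["u", "w"], "u": ["a", "b"], "w": ["c", "d"]}
leaf_label = {"a": "A", "b": "C", "c": "A", "d": "G"}
cost, labels = min_mutation(children, leaf_label, "r", "ACGT")
print(cost, labels)
```

Each node is evaluated once per character, and each child is inspected once per character of its parent, so the bottom-up pass does work proportional to the number of nodes times the alphabet size.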

47
Fitch-Hartigan minimum mutation problem
  • Let's evaluate an example.
  • $C(v) = 0$; $C(v, x) = 0$ if leaf v is labeled by x,
    o/w $C(v, x) = \infty$ if leaf v is not labeled by x.

48
Fitch-Hartigan minimum mutation problem
  • Time complexity:
  • Bottom-up portion:
  • Let $\sigma = |\Sigma|$.
  • Each node is evaluated w.r.t. each $x \in \Sigma$.
  • For n nodes this gives $O(n\sigma)$.
  • The backtracking portion is $O(n)$.
  • Overall: $O(n\sigma)$

49
Maximum Parsimony
  • Most widely used tree building algorithm
  • Differs from distance-based algorithms
  • Does not actually build trees from distances
  • Parsimony is used to compute the cost of a tree
  • A search strategy is used to search through all
    topologies
  • Goal: find the tree topology with the overall
    minimum cost

50
Traditional Parsimony
  • Algorithm: Traditional parsimony (Fitch 1971)
  • Goal: count the number of substitutions at a
    site.
  • Method: recursion, keeping track of
  • C, the current cost
  • $R_k$, the residues at k, the current node

51
Traditional Parsimony
  • Algorithm: Traditional parsimony (Fitch 1971)
  • $C = 0$, $k = \text{root}$ (initialize the cost and the current node)
  • TP(k):
  • If k is a leaf then return $\{x_k\}$
  • $R_{left} = TP(k.left)$
  • $R_{right} = TP(k.right)$
  • if $R_{left} \cap R_{right} \ne \emptyset$ return $R_{left} \cap R_{right}$
  • else
  • $C = C + 1$
  • return $R_{left} \cup R_{right}$
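
A minimal Python sketch of the recursion above for one site, assuming a binary tree stored as a dict from each internal node to its (left, right) children and a dict giving the residue at each leaf. The function name, the tree encoding, and the choice to also return the residue sets are illustrative.

```python
def fitch(tree, leaf_residue, root):
    """Return (substitution count C, residue set R_k for every node k)."""
    cost = 0
    sets = {}

    def tp(k):
        nonlocal cost
        if k not in tree:                       # leaf: the set {x_k}
            sets[k] = {leaf_residue[k]}
        else:
            left, right = tree[k]
            r_left, r_right = tp(left), tp(right)
            common = r_left & r_right
            if common:                          # non-empty intersection
                sets[k] = common
            else:                               # one substitution is needed
                cost += 1
                sets[k] = r_left | r_right
        return sets[k]

    tp(root)
    return cost, sets


tree = {"r": ("u", "w"), "u": ("a", "b"), "w": ("c", "d")}
leaf_residue = {"a": "A", "b": "C", "c": "A", "d": "G"}
cost, sets = fitch(tree, leaf_residue, "r")
print(cost, sets["r"])
```

Keeping the sets $R_k$ for all nodes, rather than only the cost, is what makes a later traceback for ancestral assignments possible.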

52
Traditional Parsimony
  • Let's evaluate an example.
  • if $R_{left} \cap R_{right} \ne \emptyset$ return $R_{left} \cap R_{right}$
  • else $C = C + 1$, return $R_{left} \cup R_{right}$

53
Traditional Parsimony
  • There is a traceback procedure for finding
    ancestral assignments.
  • Q: How do you think the traceback works?
  • A: Start from the root.
  • Pick a residue from the root's set.
  • Pick the same residue for each child set if
    possible.
  • If a child set does not contain the parent's
    residue, randomly select a residue from its set.
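
A minimal Python sketch of this traceback, assuming the residue sets for all nodes have already been computed by the bottom-up pass and are passed in as a dict. The tree encoding, the function name, and the `rng` parameter (used so that the random choice can be made reproducible) are illustrative.

```python
import random


def fitch_traceback(tree, sets, root, rng=random):
    """Assign one residue per node, top-down from the root."""
    assignment = {}

    def walk(k, parent_residue):
        if parent_residue in sets[k]:
            assignment[k] = parent_residue           # keep the parent's residue
        else:
            assignment[k] = rng.choice(sorted(sets[k]))   # otherwise pick one
        for child in tree.get(k, ()):
            walk(child, assignment[k])

    walk(root, None)   # the root has no parent, so a residue is picked from its set
    return assignment


tree = {"r": ("u", "w"), "u": ("a", "b"), "w": ("c", "d")}
sets = {"a": {"A"}, "b": {"C"}, "c": {"A"}, "d": {"G"},
        "u": {"A", "C"}, "w": {"A", "G"}, "r": {"A"}}
assignment = fitch_traceback(tree, sets, "r")
print(assignment)
```

In this example the root set is a singleton and both internal children contain its residue, so the result does not depend on the random choices.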

54
Traditional Parsimony
  • Let's perform the traceback on our example.