1
An introduction to maximum parsimony and
compatibility
  • Trevor Bruen
  • PhD Candidate
  • McGill Centre for Bioinformatics

2
Overview
  • The point of this talk is to give a sense of how
    discrete mathematics enters into phylogenetic and
    genetic inference.
  • I will illustrate these ideas by describing two
    approaches in detail, namely maximum compatibility
    and maximum parsimony.
  • I will also show how ideas from these two
    criteria can be used to develop applications such
    as bounds and tests for recombination.
  • My goal is to give a basis for further study in
    this area and greater insight into these methods.

3
Outline
  • Introduction to compatibility and parsimony
  • Overview of basic notation/concepts
  • Compatibility
  • Compatibility as a graph theory problem
  • Compatibility for pairs of characters
  • Interpretation of compatibility
  • Parsimony
  • Parsimony score with connections to graph theory
  • Connections between parsimony and compatibility
  • Homoplasy
  • Parsimony for pairs of characters
  • Connections between SPRs/TBRs and parsimony
  • Applications to recombination
  • Parsimony as a consensus method

4
Introduction
  • Maximum parsimony and maximum compatibility are
    used in phylogenetics, linguistics and
    population genetics
  • Phylogenetics: the goal is to infer an evolutionary
    tree
  • Linguistics: often the same
  • Population genetics: compatibility is used to
    detect recombination
  • For general phylogenetic inference with molecular
    data, likelihood (probability-based) methods are
    generally preferred.
  • BUT compatibility and parsimony are
    computationally tractable.
  • ALSO the mathematics behind parsimony and
    compatibility is very well developed. We can
    show that parsimony coincides with maximum
    likelihood in certain circumstances (Tuffley and
    Steel 1997). This gives us insight into where to
    go in terms of research.

5
Formalism
  • A character is a mapping from a set of taxa to a
    set of states.
  • In this case, $X = \{S_1, S_2, S_3, S_4\}$
  • Also, $C = \{A, C\}$
  • Informally, a character is a column in a
    multiple sequence alignment

6
Binary Character / Splits
  • If a character has two states then it induces a
    split of the taxa set.
  • Example: let X be the taxa set $\{S_1, S_2, S_3, S_4\}$.
    Let C be the state set $\{A, C\}$.
  • Then $\{S_1, S_2\} \mid \{S_3, S_4\}$ is the split
    induced by the first character.
  • In general, a character induces a set of
    equivalence classes (a partition of the taxa set)
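
The idea of a character as a mapping, and of the split it induces, can be sketched in a few lines of Python. This is only an illustration: the dictionary `chi` and the helper name `induced_partition` are hypothetical, chosen to match the four-taxon example above.

```python
def induced_partition(character):
    """Group taxa by state: the equivalence classes induced by a character."""
    blocks = {}
    for taxon, state in character.items():
        blocks.setdefault(state, set()).add(taxon)
    return {frozenset(block) for block in blocks.values()}


# Hypothetical binary character on X = {S1, S2, S3, S4} with states {A, C}.
chi = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}

# A binary character gives exactly two blocks, i.e. a split of X.
split = induced_partition(chi)
print(split)
```

With more than two states the same function returns one block per state, which is the general "set of equivalence classes" case.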

7
Tree and Labeling
  • Informally, we would like to be able to
    mathematically describe a tree and a labeling
    structure.
  • In graph theory, a tree $T = (V, E)$ is a
    connected graph with no cycles.
  • Informally, we would also like to be able to add
    taxa (members of X) to our tree (actually to the
    leaves).
  • Define a labeling function $\phi : X \to V(T)$ (such
    that the leaves of T are labeled by members of X)

8
X-Trees
  • An X-tree consists of a pair (T, phi), where phi is
    a labeling function that labels the leaves of T.
  • Recall

9
Extensions
  • Informally, we have an X-tree consisting of the
    pair (T,phi). We also have a character chi. We
    need to relate the character to the tree.
  • Define an extension of a character chi as a function
    $\bar{\chi} : V(T) \to C$ (which is consistent at the
    leaves with chi, i.e. $\bar{\chi} \circ \phi = \chi$)
  • Informally, an extension provides a description
    of how the internal vertices are labeled.

10
Quick Summary
  • Summary so far
  • X-trees are trees along with functions labeling
    the leaves with members of X
  • A character is a function from X into a state set
    C
  • An extension is a labeling of the vertices of T
    with states of C that agrees with the character
    at the leaves

11
Compatibility - Definition
  • A character is compatible with a tree if and only
    if there exists an extension of the character to
    the tree so that the subgraphs induced by each of
    the states are connected.
  • Example:
  • First tree: the character is compatible with the tree
  • Second tree: the character is incompatible, since the
    vertices in state A cannot form a connected subgraph
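
A minimal sketch of this definition, under some assumptions of mine: the tree is an adjacency dictionary, taxa are identified with the leaf vertices (so phi is the identity), and the internal vertex names `u` and `v` are hypothetical. It uses the standard fact that a character is compatible with a tree exactly when the minimal subtrees spanning the leaves of each state are pairwise vertex-disjoint.

```python
from itertools import combinations


def path(adj, a, b):
    """Return the unique path from a to b in a tree given as an adjacency dict."""
    stack = [(a, [a])]
    seen = {a}
    while stack:
        node, walked = stack.pop()
        if node == b:
            return walked
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                stack.append((nb, walked + [nb]))
    raise ValueError("vertices are not connected")


def spanning_subtree(adj, leaves):
    """Vertices of the minimal subtree connecting the given leaves."""
    leaves = list(leaves)
    vertices = set(leaves)
    for a, b in combinations(leaves, 2):
        vertices.update(path(adj, a, b))
    return vertices


def is_compatible(adj, character):
    """True iff the per-state spanning subtrees are pairwise vertex-disjoint."""
    blocks = {}
    for taxon, state in character.items():
        blocks.setdefault(state, []).append(taxon)
    subtrees = [spanning_subtree(adj, block) for block in blocks.values()]
    return all(s.isdisjoint(t) for s, t in combinations(subtrees, 2))


# Quartet tree ((S1,S2),(S3,S4)) with hypothetical internal vertices u and v.
tree = {
    "S1": ["u"], "S2": ["u"], "S3": ["v"], "S4": ["v"],
    "u": ["S1", "S2", "v"], "v": ["u", "S3", "S4"],
}
print(is_compatible(tree, {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}))
print(is_compatible(tree, {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}))
```

The first character groups S1 with S2, matching the tree; the second needs both internal vertices for state A and for state C at once, so no suitable extension exists.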

12
Compatibility
  • Problem definition: given a sequence of
    characters $\chi_1, \ldots, \chi_k$,
    determine whether there exists a tree on
    which all characters are compatible.
  • Related problem: given a sequence of characters
    $\chi_1, \ldots, \chi_k$, determine the largest set
    of characters that are compatible with some tree

13
Intersection Graph
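  • Suppose we have a sequence of characters
    $\mathcal{C} = (\chi_1, \ldots, \chi_k)$, where
    $\chi_i : X \to C_i$
  • Then each character induces a partition of X,
    i.e. $\pi(\chi_i) = \{\chi_i^{-1}(\alpha) : \alpha \in \chi_i(X)\}$
  • Create a graph whose vertex set consists of the
    pairs $(\chi_i, A)$ with $A \in \pi(\chi_i)$
  • There is an edge between two vertices if and only
    if the intersection of the two subsets is non-empty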
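
The construction above can be sketched as follows. The representation is an assumption made for illustration: each vertex is a pair (index of the character, block of taxa), and the function name `intersection_graph` is hypothetical.

```python
from itertools import combinations


def intersection_graph(characters):
    """Build the intersection graph of a list of characters (dicts taxon -> state)."""
    vertices = []
    for i, chi in enumerate(characters):
        blocks = {}
        for taxon, state in chi.items():
            blocks.setdefault(state, set()).add(taxon)
        # One vertex per block, tagged with the character it came from.
        vertices += [(i, frozenset(block)) for block in blocks.values()]
    # Two vertices are adjacent iff their blocks share at least one taxon.
    edges = [(u, v) for u, v in combinations(vertices, 2) if u[1] & v[1]]
    return vertices, edges


chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}
vertices, edges = intersection_graph([chi1, chi2])
print(len(vertices), len(edges))
```

Blocks of the same character are disjoint, so every edge automatically joins vertices that come from different characters.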

14
Intersection Graph
  • Whether the sequence of characters
    $\chi_1, \ldots, \chi_k$ is compatible can be
    determined directly from the intersection graph.
  • First we need to define two concepts: a chordal
    graph and a restricted chordal completion of the
    intersection graph.

15
Chordal Graphs
  • A graph $G = (V, E)$ is a chordal graph if every
    cycle with at least four vertices contains a chord
    (an edge connecting two non-consecutive vertices).
  • A chordalization of a graph $G = (V, E)$ is a graph
    $G' = (V, E')$ where $E \subseteq E'$, such that
    $G'$ is chordal
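
As a rough illustration of chordality, the sketch below tests it by repeatedly deleting a simplicial vertex (one whose neighbours are pairwise adjacent). This is a standard characterisation of chordal graphs rather than anything specific to the talk, and the simple quadratic search is chosen for readability, not speed.

```python
def is_chordal(adjacency):
    """True iff the graph (dict vertex -> set of neighbours) is chordal."""
    adj = {v: set(ns) for v, ns in adjacency.items()}  # work on a copy
    while adj:
        for v, ns in adj.items():
            # v is simplicial if its neighbours form a clique.
            if all(b in adj[a] for a in ns for b in ns if a != b):
                break
        else:
            return False  # no simplicial vertex left: some long cycle has no chord
        for n in adj[v]:
            adj[n].discard(v)
        del adj[v]
    return True


square = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}
square_with_chord = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2, 4}, 4: {1, 3}}
print(is_chordal(square), is_chordal(square_with_chord))
```

Adding the edge between 1 and 3 is a chordalization of the four-cycle in the sense defined above.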

16
Restricted Chordal Completions
  • Imagine the vertices of our graph $G = (V, E)$ are
    colored. Then a restricted chordalization of G
    is a graph $G' = (V, E')$ with $E \subseteq E'$,
    where $G'$ is chordal and all edges of $G'$
    connect vertices of different colors.

17
Restricted chordal completions
  • A restricted chordal completion of the
    intersection graph is a chordalization where
    there is no edge between vertices that share the
    same character.
  • In this case, the colors correspond to
    characters

18
Main Theorem for Compatibility
  • Let $\mathcal{C} = (\chi_1, \ldots, \chi_k)$ be a
    collection of characters. Then $\mathcal{C}$ is
    compatible if and only if there is a restricted
    chordal completion of the intersection graph.

19
Pairs of Characters
  • A simple corollary of the main theorem arises when
    we restrict our attention to two characters.
  • Corollary: two characters $\chi_1, \chi_2$
    are compatible if and only if the
    intersection graph G for both characters is
    acyclic
  • Proof (backwards direction): if the graph is
    acyclic then it is chordal, so the characters are
    compatible.
  • (Forward direction) Conversely, suppose G contains
    a cycle. Then any chordal completion of G must
    contain a three-cycle. But with only two
    characters (colors), a three-cycle would join two
    vertices of the same color, so no restricted
    completion of G can contain a three-cycle. So if
    the characters are compatible, G is acyclic.
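
The corollary gives a direct test for a pair of characters. The sketch below builds the bipartite intersection graph implicitly and uses a union-find structure to detect a cycle; the function names and the example characters are hypothetical.

```python
def blocks_of(character):
    """Blocks (sets of taxa sharing a state) of a character given as a dict."""
    groups = {}
    for taxon, state in character.items():
        groups.setdefault(state, set()).add(taxon)
    return list(groups.values())


def pairwise_compatible(chi1, chi2):
    """True iff the bipartite intersection graph of the two characters is acyclic."""
    left, right = blocks_of(chi1), blocks_of(chi2)
    parent = list(range(len(left) + len(right)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(left):
        for j, b in enumerate(right):
            if a & b:  # the two blocks intersect: an edge of the graph
                ri, rj = find(i), find(len(left) + j)
                if ri == rj:
                    return False  # this edge closes a cycle
                parent[ri] = rj
    return True


chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}
chi3 = {"S1": "A", "S2": "C", "S3": "C", "S4": "C"}
print(pairwise_compatible(chi1, chi2), pairwise_compatible(chi1, chi3))
```

For binary characters this reduces to the familiar check that not all four state combinations occur among the taxa.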

20
Interpretation
  • Recall: a character is compatible with an
    X-tree if and only if there exists an extension
    of the character to the tree so that the
    subgraphs induced by each of the states are
    connected.
  • Informally speaking, this is a very strict
    condition. It is an all-or-nothing
    condition: either a character is
    compatible with a tree or it isn't. Relaxing
    this condition is the subject of the next
    section.

21
Parsimony
  • Informally: given a leaf-labeled tree and a
    character, how can we define the fit of the
    character to the tree?
  • Consider a character $\chi$, along with an
    extension $\bar{\chi}$ to a leaf-labeled tree T.
    Then the length of the extension is the number of
    edges $\{u, v\}$ where $\bar{\chi}(u) \neq \bar{\chi}(v)$
  • Define the parsimony score of a character on a
    tree as the length of a minimal extension of the
    character to the tree. Denote this value by
    $l(\chi, T)$
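
To make the definition concrete, the following sketch computes the parsimony score by brute force: it tries every extension (every labeling of the internal vertices) and keeps the shortest. The edge-list representation and the internal vertex names `u` and `v` are assumptions for illustration, and the approach is only practical for very small trees.

```python
from itertools import product


def parsimony_score(edges, character):
    """Length of a minimal extension of `character` on the tree given by `edges`."""
    vertices = {v for edge in edges for v in edge}
    internal = sorted(vertices - set(character))   # vertices not labeled by a taxon
    states = sorted(set(character.values()))       # unused states never help
    best = None
    for assignment in product(states, repeat=len(internal)):
        extension = dict(character)                # agrees with chi at the leaves
        extension.update(zip(internal, assignment))
        length = sum(extension[u] != extension[v] for u, v in edges)
        best = length if best is None else min(best, length)
    return best


# Quartet tree ((S1,S2),(S3,S4)) with hypothetical internal vertices u and v.
edges = [("S1", "u"), ("S2", "u"), ("u", "v"), ("v", "S3"), ("v", "S4")]
print(parsimony_score(edges, {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}))
print(parsimony_score(edges, {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}))
```

The first character needs a single change on the internal edge; the second needs two changes on this tree.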

22
Parsimony
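  • Then the parsimony score for a set of
    characters $\mathcal{C} = (\chi_1, \ldots, \chi_k)$
    on a tree T is defined as
    $l(\mathcal{C}, T) = \sum_{i=1}^{k} l(\chi_i, T)$,
    where $l(\chi_i, T)$ is the parsimony score of
    $\chi_i$ on T
  • The tree that minimizes this score is referred to
    as the maximum parsimony tree.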
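
A small sketch of the search for a maximum parsimony tree on four taxa. Per-character scores are computed with Fitch's algorithm, a standard method that the slides do not name, on rooted binary trees written as nested tuples (rooting does not change the score). The three alignment columns are made-up data.

```python
def fitch(tree, character):
    """Return (candidate root states, number of changes) for a rooted binary tree."""
    if isinstance(tree, str):                      # a leaf: its state is fixed
        return {character[tree]}, 0
    (ls, lc), (rs, rc) = fitch(tree[0], character), fitch(tree[1], character)
    if ls & rs:                                    # children agree on some state
        return ls & rs, lc + rc
    return ls | rs, lc + rc + 1                    # otherwise one more change


def total_score(tree, characters):
    """Sum of the per-character parsimony scores on one tree."""
    return sum(fitch(tree, chi)[1] for chi in characters)


taxa = ["S1", "S2", "S3", "S4"]
columns = ["AACC", "AACC", "ACAC"]                 # hypothetical alignment columns
characters = [dict(zip(taxa, column)) for column in columns]

quartets = [
    (("S1", "S2"), ("S3", "S4")),
    (("S1", "S3"), ("S2", "S4")),
    (("S1", "S4"), ("S2", "S3")),
]
best = min(quartets, key=lambda tree: total_score(tree, characters))
print(best, [total_score(tree, characters) for tree in quartets])
```

Exhaustive search is only feasible here because four taxa admit just three unrooted binary topologies; larger data sets need heuristic tree search.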

23
Parsimony and graph theory
  • A minimal cut-set for a leaf-labeled tree $T = (V, E)$
    and a character $\chi$ is a smallest set of edges
    whose removal ensures that if $\chi(x) \neq \chi(y)$
    then x and y are in different components.
  • Claim: there is a bijection between the set of
    minimal cut-sets and minimal extensions. So the
    cardinality of a minimal cut-set is equal to
    the parsimony score.

24
Parsimony and Graph Theory
  • Recall Menger's Theorem (1927): let $G = (V, E)$ be
    a graph with V1 and V2 as two disjoint subsets of
    V. Then the minimum number of edges whose
    removal from G leaves vertices of V1 and V2 in
    different components is equal to the maximum
    number of edge-disjoint paths between V1 and V2.
  • Corollary: for a binary character, the maximum
    number of edge-disjoint paths between leaves in
    one state and leaves in the other state equals
    the parsimony score.

25
Compatibility and parsimony
  • Recall: let $\mathcal{C} = (\chi_1, \ldots, \chi_k)$
    be a collection of characters. Then $\mathcal{C}$
    is compatible if and only if there is a
    restricted chordal completion of the intersection
    graph.
  • Question: how can we characterize parsimony with
    respect to an intersection graph?

26
Compatibility Graph
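  • Recall: each character induces a partition of X,
    i.e. $\pi(\chi) = \{\chi^{-1}(\alpha) : \alpha \in \chi(X)\}$
  • A block for a character $\chi$
    is a maximal subset of taxa on which $\chi$ is
    constant.
  • Thus we may identify the blocks of
    $\chi_1, \ldots, \chi_k$ with the vertices of the
    intersection graph.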

27
Character Refinement
  • A character $\chi'$ refines another character
    $\chi$ if, for all x, y in X,
    $\chi'(x) = \chi'(y)$ implies $\chi(x) = \chi(y)$
  • Thus characters that refine other characters
    correspond to refinements of the partition
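
A short sketch of a refinement check, assuming the usual definition: one character refines another when any two taxa that share a state under the first also share a state under the second. The function name and the example characters are hypothetical.

```python
def refines(chi_fine, chi_coarse):
    """True iff chi_fine(x) == chi_fine(y) implies chi_coarse(x) == chi_coarse(y)."""
    image = {}
    for taxon, state in chi_fine.items():
        # Every taxon in one block of chi_fine must land in one block of chi_coarse.
        if image.setdefault(state, chi_coarse[taxon]) != chi_coarse[taxon]:
            return False
    return True


coarse = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
fine = {"S1": "A", "S2": "G", "S3": "C", "S4": "C"}
print(refines(fine, coarse), refines(coarse, fine))
```

Here `fine` splits the block {S1, S2} of `coarse` into two blocks, so it refines `coarse` but not the other way round.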

28
Compatibility and Parsimony
  • Recall: let $\mathcal{C} = (\chi_1, \ldots, \chi_k)$
    be a collection of characters. Then $\mathcal{C}$
    is compatible if and only if there is a
    restricted chordal completion of the intersection
    graph.
  • Main

29
Special Case Two characters
  • Recall: two characters are compatible if and only
    if the intersection graph G for both characters
    is acyclic
  • Using the previous theorem, we can show that the
    parsimony score for two characters
    $\chi_1, \chi_2$ corresponds to
    $l(\chi_1, \chi_2) = |E(G)| + k - 2$,
    where k is the number of components in the graph.
  • Note: this is the score of the maximum parsimony
    tree, i.e. the minimum parsimony score over all
    trees.

30
Homoplasy
  • Recall: the parsimony score of a character on a
    tree, $l(\chi, T)$, corresponds to the minimum
    number of changes of the character on the tree.
  • Informally: what is an intuitive way to think
    about the parsimony score?
  • Define the homoplasy of a character on a tree as
    $h(\chi, T) = l(\chi, T) - (|\chi(X)| - 1)$

31
Homoplasy
  • Note that $l(\chi, T) \geq |\chi(X)| - 1$, i.e. the
    homoplasy is non-negative, with equality
    if and only if $\chi$ is convex on T (compatible
    with T)
  • Informally: homoplasy corresponds to the number
    of extra mutations of the character on the
    tree. These extra mutations correspond to
    recurrent mutations
  • Informally: thus a character is incompatible
    with a tree iff it cannot be placed on the tree
    without extra mutations.

32
Homoplasy For Two Characters
  • Recall: the parsimony score for a pair of
    characters, $l(\chi_1, \chi_2)$, can be found
    directly from the bipartite intersection graph.
  • Recall: this score corresponds to an optimum
    over all trees.
  • Thus for two characters, we can define a pairwise
    homoplasy score as
    $h(\chi_1, \chi_2) = l(\chi_1, \chi_2) - (|\chi_1(X)| - 1) - (|\chi_2(X)| - 1)$
  • Recall: up to now, homoplasy refers to extra
    mutations on a tree.
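
A sketch of how a pairwise homoplasy score might be read off the bipartite intersection graph. It assumes that the score equals the number of independent cycles of that graph, |E| - |V| + k with k the number of components; that identity is my assumption here, chosen so that the score is zero exactly when the graph is acyclic. All names are hypothetical.

```python
def pairwise_homoplasy(chi1, chi2):
    """Cycle count |E| - |V| + k of the bipartite intersection graph (assumed score)."""
    def blocks(chi):
        groups = {}
        for taxon, state in chi.items():
            groups.setdefault(state, set()).add(taxon)
        return list(groups.values())

    left, right = blocks(chi1), blocks(chi2)
    n_vertices = len(left) + len(right)
    adj = {i: set() for i in range(n_vertices)}
    n_edges = 0
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            if a & b:                              # intersecting blocks: an edge
                adj[i].add(len(left) + j)
                adj[len(left) + j].add(i)
                n_edges += 1

    seen, n_components = set(), 0
    for start in adj:                              # count connected components
        if start in seen:
            continue
        n_components += 1
        stack = [start]
        seen.add(start)
        while stack:
            x = stack.pop()
            for y in adj[x]:
                if y not in seen:
                    seen.add(y)
                    stack.append(y)
    return n_edges - n_vertices + n_components


chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}
print(pairwise_homoplasy(chi1, chi2), pairwise_homoplasy(chi1, chi1))
```

The incompatible binary pair forms a single four-cycle, so it scores one; any compatible pair scores zero.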

33
A second look at homoplasy
  • Example: two characters with a pairwise
    homoplasy score equal to one.
  • Informally: we have seen that the homoplasy
    corresponds to the number of extra mutations on
    a tree.
  • But in certain situations, this is biologically
    implausible. The state 1 may correspond to a
    mutation that has only arisen once. In this
    case, the fact that the pair of characters is
    incompatible can be explained by a recombination
    event.
  • This will be defined more precisely later.

34
A quick aside - tree distances.
  • Differences between leaf-labeled trees can be
    measured using various metrics, e.g. the subtree
    prune and regraft (SPR) distance
  • A subtree prune and regraft corresponds to a
    specific rearrangement of a tree.
  • For two leaf-labeled trees, $d_{SPR}(T_1, T_2)$ is
    the minimum number of SPRs needed to turn T1
    into T2

35
Homoplasy for two characters
  • Theorem: if $\chi_1$ and $\chi_2$ are two
    characters, then the pairwise homoplasy score
    $h(\chi_1, \chi_2)$ corresponds
    to the minimum number of SPRs from any
    leaf-labeled tree on which $\chi_1$ is compatible
    to any leaf-labeled tree on which $\chi_2$ is
    compatible!
  • Informally: thus we have a whole new
    interpretation of homoplasy.

36
Application - Testing for Recombination
  • If recombination has occurred, sites will have
    different histories
  • Nearby sites will tend to have greater
    genealogical correlation than distant sites
  • Idea: if recombination has occurred,
    genealogical correlation will be partially
    reflected by a tendency for pairs of closely
    linked sites to have less homoplasy than
    distant sites

37
Test for Recombination
  • Idea: we would like to distinguish between two
    possibilities, recurrent mutation and
    recombination.
  • Idea: use the previous observations to develop a
    test for recombination.
  • H0: a single history describes all sites.
  • Equivalently, under H0 nearby sites share no more
    compatibility than arbitrary pairs of sites
  • Use a statistic to capture this information and
    solve analytically for p-values
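
A rough sketch of such a test, with two substitutions that are my assumptions rather than the method of the talk: the statistic is the mean pairwise incompatibility of sites at most `window` positions apart, and the p-value comes from randomly permuting the site order instead of the analytic calculation mentioned above. The alignment columns are made up.

```python
import random


def incompatibility(col_a, col_b):
    """Cycle count e - v + k of the bipartite intersection graph of two sites."""
    pairs = set(zip(col_a, col_b))                 # one edge per co-occurring state pair
    parent = {("L", a): ("L", a) for a, _ in pairs}
    parent.update({("R", b): ("R", b) for _, b in pairs})

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(("L", a))] = find(("R", b))
    components = len({find(x) for x in parent})
    return len(pairs) - len(parent) + components


def mean_local_incompatibility(columns, window):
    """Average incompatibility over pairs of sites at most `window` apart."""
    scores = [incompatibility(columns[i], columns[j])
              for i in range(len(columns))
              for j in range(i + 1, min(i + window + 1, len(columns)))]
    return sum(scores) / len(scores)


def permutation_p_value(columns, window=1, trials=999, seed=0):
    """Fraction of random site orders whose local incompatibility is <= observed."""
    rng = random.Random(seed)
    observed = mean_local_incompatibility(columns, window)
    shuffled = list(columns)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if mean_local_incompatibility(shuffled, window) <= observed:
            hits += 1
    return observed, (hits + 1) / (trials + 1)


# Hypothetical alignment: each string is one site (column) over taxa S1..S4.
columns = ["AACC", "AACC", "AACC", "ACAC", "ACAC", "ACAC"]
observed, p_value = permutation_p_value(columns)
print(observed, p_value)
```

A small p-value says that neighbouring sites are more compatible than randomly ordered sites would be, which is the pattern expected under recombination rather than under a single history with recurrent mutation.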

38
Application Parsimony and supertrees
  • Supertree MRP (matrix representation with
    parsimony): parsimony with characters that
    represent trees.
  • What does homoplasy mean in this context?

Courtesy of TREE 12:315-322
39
Parsimony as a consensus tree
  • Recall: if $\chi_1$ and $\chi_2$ are two
    characters, then the pairwise homoplasy score
    $h(\chi_1, \chi_2)$ corresponds
    to the minimum number of SPRs from any
    leaf-labeled tree on which $\chi_1$ is compatible
    to any leaf-labeled tree on which $\chi_2$ is
    compatible.
  • Informally: this can be generalized to show that
    the maximum parsimony tree for a set of characters
    $\chi_1, \ldots, \chi_k$ minimizes the total SPR
    distance to the sets of trees on which each
    character is compatible

40
Acknowledgements
  • Thanks for listening!
  • Background and further reading
  • Semple and Steel, Phylogenetics (book, 2003)
  • Some results I presented are not in this book;
    they come from my own work. Please talk
    to me if you are interested.
  • I have many other references; please see me if
    interested.