Title: An introduction to maximum parsimony and compatibility
1An introduction to maximum parsimony and
compatibility
- Trevor Bruen
- PhD Candidate
- McGill Centre for Bioinformatics
2Overview
- The point of this talk is to give a sense of how
discrete mathematics enters into phylogenetic and
genetic inference.
- I will illustrate these ideas by describing two
approaches in detail, namely maximum compatibility
and maximum parsimony.
- I will also show how ideas from these two
criteria can be used to develop applications such
as bounds and tests for recombination.
- My goal is to give a basis for further study in
this area and greater insight into these methods.
3Outline
- Introduction to compatibility and parsimony
- Overview of basic notation/concepts
- Compatibility
- Compatibility as a graph theory problem
- Compatibility for pairs of characters
- Interpretation of compatibility
- Parsimony
- Parsimony score with connections to graph theory
- Connections between parsimony and compatibility
- Homoplasy
- Parsimony for pairs of characters
- Connections between SPRs/TBRs and parsimony
- Applications to recombination
- Parsimony as a consensus method
4Introduction
- Maximum parsimony and maximum compatibility are
criteria used in phylogenetics, linguistics and
population genetics.
- Phylogenetics: the goal is to infer an evolutionary
tree.
- Linguistics: often the same.
- Population genetics: compatibility is used to study
recombination.
- For general phylogenetic inference with molecular
data, likelihood (probability-based) methods are
generally preferred.
- BUT compatibility and parsimony are
computationally tractable.
- ALSO the mathematics behind parsimony and
compatibility is very well developed. We can
show that parsimony is equivalent to likelihood in certain
circumstances (Tuffley and Steel 1997). This
gives us insight into where to go in terms of
research.
5Formalism
- A character is a mapping from a set of taxa to a
set of states.
- In this case, $X = \{S_1, S_2, S_3, S_4\}$
- Also, $C = \{A, C\}$
- Informally, a character is a column in a
multiple sequence alignment
6Binary Character / Splits
- If a character has two states then it induces a
split of the taxa set.
- Example: Let $X$ be the taxa set $\{S_1, S_2, S_3, S_4\}$.
Let $C$ be the state set $\{A, C\}$.
- Then $\{S_1, S_2\} \mid \{S_3, S_4\}$ is the split induced by
the first character.
- In general a character induces a set of
equivalence classes
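A minimal sketch of how a character induces equivalence classes on the taxa. The state assignment in `chi` (S1 and S2 in state A, S3 and S4 in state C) is an assumption chosen to match the split in the example, and `blocks` is a hypothetical helper name.

```python
from collections import defaultdict

def blocks(character):
    """Group taxa by state: the equivalence classes induced by a character."""
    groups = defaultdict(set)
    for taxon, state in character.items():
        groups[state].add(taxon)
    return dict(groups)

# Assumed example character on X = {S1, S2, S3, S4} with states {A, C}.
chi = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
partition = blocks(chi)  # two blocks, so the character induces a split
```

With only two states the result has exactly two blocks, which is the split S1,S2 | S3,S4.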
7Tree and Labeling
- Informally, we would like to be able to
describe a tree and a labeling structure
mathematically.
- In graph theory a tree T(V,E) is a connected
graph with no cycles.
- Informally, we would also like to be able to attach
taxa (members of X) to our tree (specifically, to the
leaves).
- Define a labeling function phi (such that the leaves of
T are labeled by members of X)
8X-Trees
- An X-tree consists of a pair (T, phi) where phi is
a labeling function that labels the leaves of T.
- Recall
9Extensions
- Informally, we have an X-tree consisting of the
pair (T,phi). We also have a character chi. We
need to relate the character to the tree.
- Define an extension of the character as a function
from the vertices of T to the state set C
(which is consistent at the leaves with chi)
- Informally, an extension provides a description
of how the internal vertices are labeled.
10Quick Summary
- Summary so far
- X-trees are trees along with functions labeling
the leaves with members of X
- A character is a function from X into a state set
C
- An extension is a labeling of the vertices of T
with states of C
11Compatibility - Definition
- A character is compatible with a tree if and only
if there exists an extension of the character to
the tree so that the subgraphs induced by each of
the states are connected.
- Example
- First tree: the character is compatible with the tree
- Second tree: the character is incompatible since the
two As are disconnected from each other
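The definition can be checked by brute force on small trees. The sketch below assumes leaves are named directly by their taxa (so the labeling function is implicit), tries every labeling of the internal vertices, and tests whether each state induces a connected subgraph. The two quartet trees and the character are assumed examples, not the ones from the slide figure.

```python
from collections import defaultdict
from itertools import product

def states_connected(edges, labeling):
    """True if, for every state, the vertices carrying it induce a connected subgraph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    by_state = defaultdict(set)
    for vertex, state in labeling.items():
        by_state[state].add(vertex)
    for verts in by_state.values():
        start = next(iter(verts))
        seen, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if w in verts and w not in seen:
                    seen.add(w)
                    stack.append(w)
        if seen != verts:
            return False
    return True

def is_compatible(edges, character, internal):
    """Try every extension of the character to the internal vertices."""
    states = sorted(set(character.values()))
    for choice in product(states, repeat=len(internal)):
        extension = dict(character)
        extension.update(zip(internal, choice))
        if states_connected(edges, extension):
            return True
    return False

chi = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
tree_12_34 = [("S1", "u"), ("S2", "u"), ("u", "v"), ("v", "S3"), ("v", "S4")]
tree_13_24 = [("S1", "u"), ("S3", "u"), ("u", "v"), ("v", "S2"), ("v", "S4")]
first = is_compatible(tree_12_34, chi, ["u", "v"])
second = is_compatible(tree_13_24, chi, ["u", "v"])
```

The search is exponential in the number of internal vertices, so this is only meant to make the definition concrete.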
12Compatibility
- Problem definition: Given a sequence of
characters, determine whether there exists a tree on
which all characters are compatible.
- Related problem: Given a sequence of characters,
determine the largest set of characters that are
compatible with some tree
13Intersection Graph
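- Suppose we have a sequence of characters
$\mathcal{C} = (\chi_1, \ldots, \chi_n)$
where $\chi_i : X \to C_i$
- Then each character $\chi_i$ induces a partition of $X$,
i.e. $\pi(\chi_i) = \{\chi_i^{-1}(\alpha) : \alpha \in \chi_i(X)\}$
- Create a graph where the vertex set consists of the pairs
$(\chi_i, A)$ with $A$ a block of $\pi(\chi_i)$
- There is an edge between two vertices if and only if
the intersection of the two subsets is non-empty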
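The construction can be sketched as follows. Characters are assumed to be dictionaries from taxa to states, vertices are represented as hypothetical `(character index, state)` pairs, and the two example characters are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations

def blocks(character):
    """Partition of the taxa induced by a character, keyed by state."""
    groups = defaultdict(set)
    for taxon, state in character.items():
        groups[state].add(taxon)
    return dict(groups)

def intersection_graph(characters):
    """Vertices are (character index, state); edges join blocks sharing a taxon."""
    labelled = [((i, state), taxa)
                for i, chi in enumerate(characters)
                for state, taxa in blocks(chi).items()]
    edges = {(a, b) for (a, ta), (b, tb) in combinations(labelled, 2) if ta & tb}
    return [v for v, _ in labelled], edges

chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}
vertices, edges = intersection_graph([chi1, chi2])
```

Blocks of the same character are disjoint, so they are never joined by an edge. Here the graph is a four-cycle on four vertices.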
14Intersection Graph
- We will be able to determine whether the sequence of
characters is compatible directly from the
intersection graph.
- First we need to define two concepts: a chordal
graph and a restricted chordal completion of the
intersection graph.
15Chordal Graphs
- A graph G(V,E) is a chordal graph if every cycle
with at least four vertices contains a chord (an
edge connecting two non-consecutive vertices of the cycle).
- A chordalization of a graph G(V,E) is a graph G'(V,E')
where $E \subseteq E'$ such that G' is
chordal
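One way to test chordality on small graphs, offered as a sketch and not as the method used in the talk: a graph is chordal exactly when its vertices can be removed one at a time, each being simplicial (its remaining neighbours are pairwise adjacent) at the moment of removal. The function name and the square example are assumptions.

```python
def is_chordal(vertices, edges):
    """Greedy simplicial-vertex elimination; returns True iff the graph is chordal."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    remaining = set(vertices)
    while remaining:
        simplicial = None
        for v in remaining:
            nbrs = adj[v] & remaining
            # v is simplicial if its remaining neighbours form a clique
            if all(b in adj[a] for a in nbrs for b in nbrs if a != b):
                simplicial = v
                break
        if simplicial is None:
            return False  # some chordless cycle blocks every elimination
        remaining.remove(simplicial)
    return True

square = [(1, 2), (2, 3), (3, 4), (4, 1)]
square_is_chordal = is_chordal([1, 2, 3, 4], square)
completed_is_chordal = is_chordal([1, 2, 3, 4], square + [(1, 3)])
```

Adding the diagonal (1, 3) is a chordalization of the square: the edge set grows and the result is chordal.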
16Restricted Chordal Completions
- Imagine the vertices of our graph G(V,E) are
colored. Then a restricted chordalization of G
is a graph G'(V,E'), where G' is chordal but all
edges of G' connect vertices of different colors.
17Restricted chordal completions
- A restricted chordal completion of the
intersection graph is a chordalization where
there is no edge between vertices that belong to the
same character.
- In this case, the colors correspond to
characters
18Main Theorem for Compatibility
- Let $\mathcal{C} = (\chi_1, \ldots, \chi_n)$ be a
collection of characters. Then $\mathcal{C}$ is
compatible if and only if there is a restricted
chordal completion of the intersection graph.
19Pairs of Characters
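- A simple corollary of the main theorem arises when we
restrict our attention to two characters.
- Corollary: Two characters $\chi_1$ and $\chi_2$
are compatible if and only if the
intersection graph G of the two characters is
acyclic
- Proof: (backward direction) If G is acyclic
then it is chordal, so G is its own restricted
chordal completion and the characters are
compatible.
- (forward direction) Suppose instead that G contains
a cycle. G is bipartite, since every edge joins a
block of one character to a block of the other, so
the cycle has at least four vertices. Then any
chordal completion of G must
contain a three-cycle. But no restricted
completion of G can contain a three-cycle, because
with only two colors two of its vertices would share
a color. So if the characters are compatible, G
is acyclic.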
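The corollary gives a direct pairwise test. A minimal sketch, assuming both characters are dictionaries over the same taxa: each taxon contributes one edge between its state under the first character and its state under the second, and a union-find structure detects whether any edge closes a cycle. The example characters are assumptions.

```python
def pairwise_compatible(chi1, chi2):
    """Two characters are compatible iff their intersection graph is acyclic."""
    edges = {((1, chi1[x]), (2, chi2[x])) for x in chi1}
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # this edge closes a cycle
        parent[ru] = rv
    return True

chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}
chi3 = {"S1": "A", "S2": "A", "S3": "A", "S4": "C"}
clash = pairwise_compatible(chi1, chi2)
agree = pairwise_compatible(chi1, chi3)
```

For two binary characters this reduces to checking that not all four state combinations occur.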
20Interpretation
- Recall: a set of characters is compatible with an
X-tree if and only if, for each character, there
exists an extension
of the character to the tree so that the
subgraphs induced by each of the states are
connected.
- Informally speaking this is a very strict
condition. It corresponds to an all-or-nothing
condition: either a character is
compatible with a tree or it isn't. Relaxing
this condition is the subject of the next
section.
21Parsimony
- Informally, given a leaf-labeled tree and a
character, how can we define the fit of the
character to the tree?
- Consider a character $\chi$, along with an
extension $\bar{\chi}$ to a leaf-labeled tree $T$. Then
the length of the extension is the number of edges
$\{u, v\}$ where $\bar{\chi}(u) \neq \bar{\chi}(v)$
- Define the parsimony score of a character on a
tree as the length of a minimal extension of the
character to the tree. Denote this value by $l(\chi, T)$
22Parsimony
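- Then the parsimony score for a set of
characters $\mathcal{C} = (\chi_1, \ldots, \chi_n)$
on a tree $T$ is defined as
$l(\mathcal{C}, T) = \sum_{i=1}^{n} l(\chi_i, T)$,
where $l(\chi_i, T)$ is the parsimony score of $\chi_i$ on $T$
- The tree that minimizes this score is referred to
as the maximum parsimony tree.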
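The talk does not say how the score is computed. One standard choice is Fitch's algorithm, sketched here under the assumption that the tree is rooted and binary and written as nested tuples of taxon names (the rooting does not change the score). The tree and characters are made-up examples.

```python
def fitch(tree, character):
    """Return (candidate state set, number of changes) for a nested-tuple tree."""
    if isinstance(tree, str):  # a leaf, named by its taxon
        return {character[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, character)
    right_states, right_cost = fitch(right, character)
    common = left_states & right_states
    if common:
        return common, left_cost + right_cost
    # no shared state: one change is needed on an edge below this vertex
    return left_states | right_states, left_cost + right_cost + 1

def parsimony_score(tree, characters):
    """Sum of the per-character parsimony scores on one tree."""
    return sum(fitch(tree, chi)[1] for chi in characters)

tree = (("S1", "S2"), ("S3", "S4"))
chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}
score1 = fitch(tree, chi1)[1]
score2 = fitch(tree, chi2)[1]
total = parsimony_score(tree, [chi1, chi2])
```

Finding the maximum parsimony tree would mean repeating this over candidate trees and keeping the smallest total.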
23Parsimony and graph theory
- A minimal cut-set for a leaf-labeled tree T(V,E)
and a character $\chi$ is a minimal set of edges
whose removal ensures that if $\chi(x) \neq \chi(y)$
then x and y are in different components.
- Claim: There is a bijection between the set of
minimal cut-sets and minimal extensions. So the
cardinality of a minimal cut-set is equal to
the parsimony score.
24Parsimony and Graph Theory
- Recall Menger's Theorem (1927): Let G(V,E) be a
graph with V1 and V2 two disjoint subsets of
V. Then the minimum number of edges whose
removal from G leaves the vertices of V1 and V2 in
different components is equal to the maximum
number of edge-disjoint paths between V1 and V2.
- Corollary: For a binary character, the maximum
number of edge-disjoint paths between the two
blocks of its split corresponds to the
parsimony score.
25Compatibility and parsimony
- Recall: let $\mathcal{C} = (\chi_1, \ldots, \chi_n)$
be a collection of characters. Then $\mathcal{C}$
is compatible if and only if there is a
restricted chordal completion of the intersection
graph.
- Question: How can we characterize parsimony with
respect to an intersection graph?
26Compatibility Graph
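- Recall: Each character $\chi_i$ induces a partition of $X$,
i.e. $\pi(\chi_i) = \{\chi_i^{-1}(\alpha) : \alpha \in \chi_i(X)\}$
- A block of a character $\chi_i$
is a maximal subset of taxa on which $\chi_i$ is constant.
- Thus we may identify the blocks of
$\chi_1, \ldots, \chi_n$ with the vertices of the intersection
graph.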
27Character Refinement
- A character $\chi'$ refines another character $\chi$
if $\chi'(x) = \chi'(y)$ implies $\chi(x) = \chi(y)$
- Thus characters that refine other characters
correspond to refinements of the partition
28Compatibility and Parsimony
- Recall: Let $\mathcal{C} = (\chi_1, \ldots, \chi_n)$
be a collection of characters. Then $\mathcal{C}$
is compatible if and only if there is a
restricted chordal completion of the intersection
graph.
- Main
29Special Case Two characters
- Recall: Two characters are compatible if and only
if the intersection graph G of the two characters
is acyclic
- Using the previous theorem we can show that the
parsimony score for two characters $\chi_1$ and $\chi_2$
corresponds to $|E(G)| + k - 2$
where k is the number of components in the graph.
- Note: This score corresponds to the maximum
parsimony score, i.e. the minimum of the parsimony
score over all trees.
30Homoplasy
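- Recall: The parsimony score of a character on a
tree, $l(\chi, T)$, corresponds to the minimum number of
changes of the character on the tree.
- Informally: What is an intuitive way to think
about the parsimony score?
- Define the homoplasy of a character $\chi$ on a tree $T$ as
$h(\chi, T) = l(\chi, T) - (|\chi(X)| - 1)$,
where $|\chi(X)|$ is the number of states the character takes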
31Homoplasy
- Note that the homoplasy satisfies $h(\chi, T) \geq 0$ with equality
if and only if $\chi$ is convex on $T$ (i.e. compatible with $T$)
- Informally: Homoplasy corresponds to the number
of extra mutations of the character on the
tree. These extra mutations correspond to
recurrent mutations
- Informally: Thus a character is not compatible
with a tree iff it cannot be placed on the tree
without extra mutations.
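As a small worked example, assuming homoplasy is the parsimony score minus the unavoidable minimum of one change per state beyond the first: the helper below takes a parsimony score that has already been computed. The character and the scores passed in are assumptions for illustration.

```python
def homoplasy(parsimony_score, character):
    """Extra changes beyond the (number of states - 1) that any tree requires."""
    n_states = len(set(character.values()))
    return parsimony_score - (n_states - 1)

# Assumed example: this character needs 2 changes on the tree ((S1,S2),(S3,S4)),
# but only 1 change on the tree ((S1,S3),(S2,S4)).
chi = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}
extra_on_first = homoplasy(2, chi)
extra_on_second = homoplasy(1, chi)
```

A result of zero means the character fits the tree with no recurrent mutation, which is the compatible case.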
32Homoplasy For Two Characters
- Recall: The parsimony score for a pair of
characters can be found directly from the
bipartite intersection graph.
- Recall: This score corresponds to an optimum
over all trees.
- Thus for two characters, we can define a pairwise
homoplasy score as $h(\chi_1, \chi_2) = |E(G)| - |V(G)| + k$,
where G is the intersection graph of the two characters
and k is its number of components
- Recall: Up to now homoplasy refers to extra
mutations on a tree.
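A sketch of the pairwise score, under the assumption that it equals the number of independent cycles of the bipartite intersection graph, that is, edges minus vertices plus components. Characters are assumed to be dictionaries over the same taxa, and the examples are made up.

```python
def pairwise_homoplasy(chi1, chi2):
    """Assumed score: |E| - |V| + k for the intersection graph of two characters."""
    vertices = {(1, s) for s in chi1.values()} | {(2, s) for s in chi2.values()}
    edges = {((1, chi1[x]), (2, chi2[x])) for x in chi1}
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), 0
    for start in vertices:
        if start in seen:
            continue
        components += 1  # a new component, explored by depth-first search
        seen.add(start)
        stack = [start]
        while stack:
            u = stack.pop()
            for w in adj[u] - seen:
                seen.add(w)
                stack.append(w)
    return len(edges) - len(vertices) + components

chi1 = {"S1": "A", "S2": "A", "S3": "C", "S4": "C"}
chi2 = {"S1": "A", "S2": "C", "S3": "A", "S4": "C"}
chi3 = {"S1": "A", "S2": "A", "S3": "A", "S4": "C"}
h12 = pairwise_homoplasy(chi1, chi2)
h13 = pairwise_homoplasy(chi1, chi3)
```

The score is zero exactly when the graph is acyclic, which matches the pairwise compatibility criterion.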
33A second look at homoplasy
- Example: Two characters with a pairwise
homoplasy score equal to one.
- Informally: We have seen that the homoplasy
corresponds to the number of extra mutations on
a tree.
- But in certain situations, this is biologically
implausible. The state 1 may correspond to a
mutation that has only arisen once. In this
case, the fact that the pair of characters is
incompatible can be explained by a recombination
event.
- This will be defined more precisely later.
34A quick aside - tree distances.
- Differences between leaf-labeled trees can be
measured using various metrics, e.g. Subtree
Prune and Regraft (SPR) distance
- A subtree prune and regraft corresponds to a
specific rearrangement of a tree: a subtree is cut
off and reattached on another edge.
- For two leaf-labeled trees, dSPR(T1, T2) is the
minimum number of SPRs needed to turn T1 into T2
35Homoplasy for two characters
- Theorem: If $\chi_1$ and $\chi_2$ are two
characters then the pairwise homoplasy score
$h(\chi_1, \chi_2)$ corresponds
to the minimum number of SPRs from any
leaf-labeled tree on which $\chi_1$ is compatible
to any leaf-labeled tree on which $\chi_2$ is
compatible!
- Informally: Thus we have a whole new
interpretation of homoplasy.
36Application - Testing for Recombination
- If recombination has occurred, sites will have
different histories
- Nearby sites will tend to have greater
genealogical correlation than distant sites
- Idea: If recombination has occurred,
genealogical correlation will be partially
reflected by a tendency for pairs of closely
linked sites to have less homoplasy than
distant sites
37Test for Recombination
- Idea: We would like to distinguish between two
possibilities: recurrent mutation and
recombination.
- Idea: Use the previous observations to develop a test
for recombination.
- H0: A single history describes all sites.
- H0: Nearby sites share no more compatibility
than arbitrary pairs of sites
- Use a statistic to capture this information and
solve analytically for p-values
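The talk solves for p-values analytically; the sketch below swaps in a plain permutation test of the same null hypothesis, because it is easier to show in a few lines. It assumes each site is a string with one state per taxon, that the pairwise score is edges minus vertices plus components of the intersection graph, and that the statistic is the mean score over sites at most `w` positions apart. All function names and the data are hypothetical.

```python
import random

def pairwise_homoplasy(a, b):
    """Assumed score: |E| - |V| + k for the intersection graph of two sites."""
    vertices = {(0, s) for s in a} | {(1, s) for s in b}
    edges = {((0, s), (1, t)) for s, t in zip(a, b)}
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            v = parent[v]
        return v

    k = len(vertices)
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            k -= 1  # each merge removes one component
    return len(edges) - len(vertices) + k

def neighbour_score(sites, w=1):
    """Mean pairwise homoplasy over pairs of sites at most w positions apart."""
    pairs = [(i, j) for i in range(len(sites))
             for j in range(i + 1, min(i + w, len(sites) - 1) + 1)]
    return sum(pairwise_homoplasy(sites[i], sites[j]) for i, j in pairs) / len(pairs)

def permutation_p_value(sites, w=1, n_perm=999, seed=0):
    """Share of random site orders whose nearby sites clash no more than observed."""
    rng = random.Random(seed)
    observed = neighbour_score(sites, w)
    shuffled = list(sites)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if neighbour_score(shuffled, w) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

sites = ["AACC", "AACC", "ACAC", "ACAC"]  # made-up alignment columns
observed, p_value = permutation_p_value(sites)
```

A small p-value would suggest that nearby sites are more compatible than randomly placed ones, which is the pattern expected under recombination and not under recurrent mutation alone.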
38Application Parsimony and supertrees
- Supertree MRP (matrix representation with parsimony):
parsimony with characters that
represent trees.
- What does homoplasy mean in this context?
Courtesy of TREE 12:315-322
39Parsimony as a consensus tree
- Recall: If $\chi_1$ and $\chi_2$ are two
characters then the pairwise homoplasy score
$h(\chi_1, \chi_2)$ corresponds
to the minimum number of SPRs from any
leaf-labeled tree on which $\chi_1$ is compatible
to any leaf-labeled tree on which $\chi_2$ is
compatible.
- Informally: This can be generalized to show that
the maximum parsimony tree for a set of characters
$\chi_1, \ldots, \chi_n$ minimizes the total SPR distance to the
sets of trees on which each character is compatible
40Acknowledgements
- Thanks for listening!
- Background and further reading
- Phylogenetics, Semple and Steel (book, 2003)
- Some results I presented are not in this book;
they are from my own work. Please talk
to me if you are interested.
- I have many other references; please see me if
interested.