
PHYLOGENY RECONSTRUCTION

Slides: 31
Provided by: gan98


1
PHYLOGENY RECONSTRUCTION FROM QUARTETS
Jia-Huai You
Department of Computing Science, University of Alberta
2
Outline
  • Introduction
  • Research Methods
  • Computational Results and Analysis

3
Common Evolutionary Tree Terminology
Phylogeny: the pattern of historical relationships among species.
Tree: a mathematical structure used to depict the evolutionary history of a group of species.
Leaf nodes: represent the species (genes, populations, etc.) used to infer the phylogeny.
Internal nodes: represent hypothetical ancestors of the species.
Root of the tree: the common ancestor of all species.
[Figure: example tree with nodes labeled A to E, with its leaf nodes, internal nodes, branches (edges), and root marked]
4
Phylogeny Example for Mammal
5
Rooted and Unrooted tree
6
General Process of Phylogeny Construction
Input: a set of (DNA or protein) sequences for the species.
Output: an evolutionary tree (phylogeny) whose leaf nodes are the input species.
Methods: Maximum Parsimony (MP), Maximum Likelihood (ML), etc.
These methods are not suitable for large trees (over 20 species), so current software relies on heuristics to reduce computation time.
7
Quartet Based Phylogeny Construction
  • There is only one unrooted tree for one, two or
    three species.
  • There are three possible unrooted trees for four
    species (A, B, C, D)
  • Quartets are the smallest informative unrooted trees
  • MP or ML can be solved exactly on quartets

The three possible quartet topologies on A, B, C, D: AB|CD, AC|BD, AD|BC
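The counts on this slide (one tree for up to three species, three trees for four) follow a standard counting argument for fully resolved (binary) unrooted trees. The sketch below is illustrative only; the function name is hypothetical and not part of the presentation.

```python
def num_unrooted_binary_trees(n: int) -> int:
    """Count unrooted binary tree topologies on n labeled leaves.

    Assumption: trees are fully resolved (binary). A tree on k - 1 leaves
    has 2k - 5 edges, and the k-th leaf can be attached to any of them.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 1
    for k in range(4, n + 1):
        count *= 2 * k - 5
    return count


for n in (3, 4, 5, 6):
    print(n, num_unrooted_binary_trees(n))
```

The rapid growth of this count is one way to see why exact tree search becomes impractical beyond a few dozen species.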
8
Process of Quartet Based Phylogeny Construction
9
Definitions
A quartet ab|cd is consistent with a phylogeny T, or a phylogeny T satisfies a quartet ab|cd, if and only if a, b, c, d are all leaves of T and the path from a to b does not share any nodes with the path from c to d.
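A minimal sketch of this definition, assuming the phylogeny is given as a list of undirected edges. The example tree (three cherries joined at a central node r) and all function names are hypothetical; the tree drawn on the original slides is not part of the transcript.

```python
from collections import deque


def build_adjacency(edges):
    """Turn a list of undirected edges into an adjacency dictionary."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def path_nodes(adj, start, goal):
    """Return the set of nodes on the unique start-goal path in a tree."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt in adj[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    nodes = set()
    node = goal
    while node is not None:
        nodes.add(node)
        node = parent[node]
    return nodes


def is_consistent(adj, a, b, c, d):
    """True if quartet ab|cd is consistent with the tree."""
    leaves = {n for n, nbrs in adj.items() if len(nbrs) == 1}
    if not {a, b, c, d} <= leaves:
        return False
    return path_nodes(adj, a, b).isdisjoint(path_nodes(adj, c, d))


# Hypothetical tree: cherries (a,b), (c,d), (e,f) attached to a center r.
T = build_adjacency([
    ("a", "x"), ("b", "x"), ("c", "y"), ("d", "y"),
    ("e", "z"), ("f", "z"), ("x", "r"), ("y", "r"), ("z", "r"),
])
print(is_consistent(T, "a", "e", "c", "d"))  # ae|cd
print(is_consistent(T, "a", "c", "e", "d"))  # ac|ed
```

On this assumed tree, ae|cd is consistent because the a-e path avoids the node joining c and d, while ac|ed is not because both paths pass through r.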
10
Quartet set Q:
ae|cd, ab|cd, ab|ce, ab|cf, ab|de, ab|df, ab|ef,
af|cd, ac|ef, ad|ef, be|cd, bf|cd, bc|ef,
bd|ef, cd|ef
[Figure: phylogeny T]
The quartet ae|cd is consistent with T, or T satisfies ae|cd.
11
Definitions
Given a set of quartets Q on a set S of species, Q is compatible if and only if there is a phylogeny on S which satisfies all the quartets in Q.
A set Q of quartet topologies is complete if Q contains a quartet topology for every four-element subset of the label set S.
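Completeness can be checked mechanically. The sketch below assumes each quartet is written as a four-letter string whose first two letters form one pair and last two letters the other (so "aecd" means ae|cd); the function name is hypothetical.

```python
from itertools import combinations


def is_complete(quartets, species):
    """True if every 4-subset of species has a quartet topology in quartets."""
    covered = {frozenset(q) for q in quartets}
    return all(frozenset(s) in covered for s in combinations(species, 4))


# The quartet set Q listed on the slides, over species a to f.
Q = ("aecd abcd abce abcf abde abdf abef "
     "afcd acef adef becd bfcd bcef bdef cdef").split()
print(is_complete(Q, "abcdef"))      # all 15 four-element subsets are covered
print(is_complete(Q[1:], "abcdef"))  # dropping ae|cd leaves {a, c, d, e} uncovered
```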
12
Quartet set Q:
ae|cd, ab|cd, ab|ce, ab|cf, ab|de, ab|df, ab|ef,
af|cd, ac|ef, ad|ef, be|cd, bf|cd, bc|ef,
bd|ef, cd|ef
[Figure: phylogeny T]
  • The quartet set Q is compatible
  • The quartet set Q is complete

13
Problem Descriptions
In practice, the given quartet set Q usually contains errors and is therefore incompatible.
Quartet Compatibility Problem (QCP)
Input: a set Q of quartets on S.
Question: is Q compatible? Equivalently, is there a phylogeny T on S such that all quartets in Q are satisfied?
Maximum Quartet Consistency Problem (MQC)
Input: a set Q of quartets on S.
Goal: find a phylogeny T on S such that the number of consistent quartets in Q is maximized.
Minimum Quartet Inconsistency Problem (MQI)
Input: a set Q of quartets on S.
Goal: find a phylogeny T on S such that the number of inconsistent quartets in Q is minimized.
14
Input quartet set Q:
ac|ed, ab|cd, ab|ce, ab|cf, ab|de, ab|df, ab|ef,
af|cd, ac|ef, ad|ef, be|cd, bf|cd, bc|ef,
bd|ef, cd|ef
Quartet Compatibility Problem (QCP)? No.
MQC or MQI? Only ac|ed is not satisfied.
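The MQC and MQI objectives can be evaluated for one candidate tree by counting satisfied quartets. The sketch below uses the fact that ab|cd is satisfied exactly when some edge of the tree separates {a, b} from {c, d}. The nested-tuple tree is an assumption (three cherries joined at one node), since the slide's figure is not in the transcript, and the helper names are hypothetical.

```python
def clusters(tree):
    """Return (leaf set, list of leaf sets below each edge) for a nested tuple."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_found = clusters(child)
        leaves |= child_leaves
        found.extend(child_found)
        found.append(child_leaves)  # the edge above this child
    return leaves, found


def is_satisfied(quartet, splits):
    """quartet is a 4-letter string: first two letters vs. last two letters."""
    left, right = set(quartet[:2]), set(quartet[2:])
    return any(
        (left <= s and not (right & s)) or (right <= s and not (left & s))
        for s in splits
    )


# Assumed candidate tree and the input quartet set from the slide.
T = (("a", "b"), ("c", "d"), ("e", "f"))
Q = ("aced abcd abce abcf abde abdf abef "
     "afcd acef adef becd bfcd bcef bdef cdef").split()

_, splits = clusters(T)
unsatisfied = [q for q in Q if not is_satisfied(q, splits)]
print(len(Q) - len(unsatisfied), "satisfied; unsatisfied:", unsatisfied)
```

For this assumed tree the only unsatisfied quartet is "aced", matching the slide's remark, so the MQC value is 14 and the MQI value is 1.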
15
Known Results
The Quartet Compatibility Problem (QCP) can be solved in polynomial time if the given quartet set Q is complete, but it is NP-complete if Q is incomplete.
The Maximum Quartet Consistency Problem (MQC) and the Minimum Quartet Inconsistency Problem (MQI) are NP-complete even if Q is complete.
Exact algorithms: "guarantee" to find the optimal or "best" tree.
Heuristic algorithms: approximate or quick-and-dirty methods that attempt to find the optimal tree, but cannot guarantee to do so.
16
Known Results
Many heuristics exist. The best known approximation algorithm is Quartet Cleaning, with an approximation ratio of [expression missing from transcript] for MQI, where n is the number of species.
Only two exact algorithms appear in the literature.
Dynamic programming has a complexity of [expression missing from transcript], where m is the number of input quartets and n is the number of input species. It is a general algorithm.
The fixed-parameter algorithm has a complexity of [expression missing from transcript], where k is the largest number of quartet errors and n is the number of input species. It performs well if k is very small compared to the total number of quartets, and worse than dynamic programming if k is relatively large.
Dynamic programming can solve the MQC problem with 20 species in 6 days on a 300 MHz computer. The fixed-parameter algorithm can solve the MQI problem with 50 species when k = 100 in 40 minutes on a 750 MHz computer.
17
Research Objectives
  • Exact algorithm for MQC
  • Quartet set Q is complete
  • Faster
  • Can solve problem with more species

18
Ultrametric Tree and Matrix
Ultrametric tree: label each internal node with a number. If, along every root-to-leaf path, the labels of the internal nodes are strictly decreasing, then the tree together with its labels is called an ultrametric tree.
Ultrametric matrix: each entry M(i, j) is the label of the least common ancestor of leaf nodes i and j. The matrix is
  • symmetric, with M(i, i) = 0, and
  • for every triplet (i, j, k), two of the values M(i, j), M(j, k), and M(i, k)
    are equal, and they are greater than the third value.

e.g., for i = 1, j = 3, k = 4: M(1, 3) = M(3, 4) > M(1, 4)
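The two definitions can be sketched in a few lines. The example tree, its labels, and the function names below are hypothetical and are not taken from the slide's figure. A rooted tree is written as a nested tuple (label, subtree, subtree, ...), with leaves as strings.

```python
from itertools import combinations


def lca_matrix(tree):
    """Return (leaves, M), where M[{i, j}] is the label of the LCA of i and j."""
    if isinstance(tree, str):
        return [tree], {}
    label, *children = tree
    leaves, m = [], {}
    for child in children:
        child_leaves, child_m = lca_matrix(child)
        m.update(child_m)
        # Leaves from different children meet at the current node.
        for x in leaves:
            for y in child_leaves:
                m[frozenset((x, y))] = label
        leaves.extend(child_leaves)
    return leaves, m


def is_ultrametric(m, leaves):
    """Check the triplet condition: two equal values, both greater than the third."""
    for i, j, k in combinations(leaves, 3):
        low, mid, high = sorted(
            (m[frozenset((i, j))], m[frozenset((j, k))], m[frozenset((i, k))])
        )
        if not (low < mid == high):
            return False
    return True


# Hypothetical labeled tree; labels strictly decrease from the root down.
tree = (3, (1, "a", "b"), (2, "c", (1, "d", "e")))
leaves, M = lca_matrix(tree)
print(M[frozenset(("c", "d"))], M[frozenset(("a", "e"))])
print(is_ultrametric(M, leaves))
```

The check uses the strict form stated on the slide, which assumes a binary tree; a node with three or more children would make all three values of some triplet equal.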
19
Theorem 1: A quartet ab|cd is consistent with a phylogeny T if and only if any ultrametric labeling scheme M of T satisfies
min{M(a, c), M(b, d)} > min{M(a, b), M(c, d)}.
20
Theorem 1: A quartet ab|cd is consistent with a phylogeny T if and only if any ultrametric labeling scheme M of T satisfies
min{M(a, c), M(b, d)} > min{M(a, b), M(c, d)}.
Example: the quartet s1s5|s2s3 is consistent with the tree and its corresponding matrix:
min{M(1, 2), M(5, 3)} = 4 > min{M(1, 5), M(2, 3)} = 1. Condition satisfied!
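The test in Theorem 1 needs only four matrix lookups. The sketch below applies it to a small hypothetical matrix (not the one in the slide's figure) built from a rooted tree in which a and b join at label 1, c and d join at label 2, and the root has label 3.

```python
def pair(x, y):
    """Unordered key for a symmetric matrix entry."""
    return frozenset((x, y))


def satisfies(m, a, b, c, d):
    """True if matrix m passes the Theorem 1 test for the quartet ab|cd."""
    cross = min(m[pair(a, c)], m[pair(b, d)])
    within = min(m[pair(a, b)], m[pair(c, d)])
    return cross > within


# Hypothetical ultrametric matrix for the rooted tree ((a, b), (c, d)).
M = {
    pair("a", "b"): 1, pair("c", "d"): 2,
    pair("a", "c"): 3, pair("a", "d"): 3,
    pair("b", "c"): 3, pair("b", "d"): 3,
}
print(satisfies(M, "a", "b", "c", "d"))  # ab|cd
print(satisfies(M, "a", "c", "b", "d"))  # ac|bd
print(satisfies(M, "a", "d", "b", "c"))  # ad|bc
```

Only ab|cd passes: the cross-pair minimum is 3 and the within-pair minimum is 1. For the other two topologies the inequality reverses.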
21
Theorem 2: Given a set Q of quartets on a set of species S and an ultrametric phylogeny T on S, T satisfies the maximum number of quartets in Q if and only if the corresponding ultrametric matrix M on S satisfies the maximum number of quartets in Q.
This transforms the original MQC problem into a search for an ultrametric matrix.
23
Formulation in Answer Set Programming
Domain:
1 { m(1, 2, 1), m(1, 2, 2), m(1, 2, 3), m(1, 2, 4), m(1, 2, 5) } 1
Matrix entry (1, 2) takes exactly one value in the domain {1, ..., 5}.
Ultrametric constraints:
For three matrix values m(i, j), m(j, k), and m(i, k), two of them are equal and greater than the third one.
Quartet constraints:
If min{m(i, k), m(j, l)} > min{m(i, j), m(k, l)}, then quartet ij|kl is satisfied.
Objective:
maximize q(i, j, k, l)
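The same formulation can be illustrated without an answer set solver by exhaustive search. The sketch below is a brute-force stand-in, not the authors' ASP encoding. It enumerates every assignment of labels to matrix entries, keeps those meeting the ultrametric constraint, and maximizes the number of satisfied quartets. The function names, the `max_label` parameter, and the five-species example (the slides' erroneous quartet set restricted to a through e) are assumptions for illustration. The search is exponential in the number of matrix entries and is only practical for very small inputs.

```python
from itertools import combinations, product


def is_ultrametric(m, species):
    """Triplet constraint: two values equal and greater than the third."""
    for i, j, k in combinations(species, 3):
        low, mid, high = sorted(
            (m[frozenset((i, j))], m[frozenset((j, k))], m[frozenset((i, k))])
        )
        if not (low < mid == high):
            return False
    return True


def satisfies(m, quartet):
    """Quartet constraint for a 4-letter string 'ijkl' meaning ij|kl."""
    i, j, k, l = quartet
    cross = min(m[frozenset((i, k))], m[frozenset((j, l))])
    within = min(m[frozenset((i, j))], m[frozenset((k, l))])
    return cross > within


def best_matrix(species, quartets, max_label):
    """Exhaustively search label assignments in 1..max_label."""
    pairs = [frozenset(p) for p in combinations(species, 2)]
    best_score, best = -1, None
    for values in product(range(1, max_label + 1), repeat=len(pairs)):
        m = dict(zip(pairs, values))
        if not is_ultrametric(m, species):
            continue
        score = sum(satisfies(m, q) for q in quartets)
        if score > best_score:
            best_score, best = score, m
    return best_score, best


# Assumed small instance: the quartets on a..e from the erroneous set Q.
quartets = ["aced", "abcd", "abce", "abde", "becd"]
score, matrix = best_matrix("abcde", quartets, max_label=3)
print(score, "of", len(quartets), "quartets satisfied")
```

An answer set solver explores the same search space but prunes it with constraint propagation, which is what makes the approach viable for larger species sets.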
24
Optimizations
25
Experiment Results
n: number of species
p: percentage of quartet errors
26
Phylogenetic Analysis of SARS
  • Severe Acute Respiratory Syndrome (SARS) is caused by a virus
    recognized as a coronavirus
  • The coronaviruses are currently divided into
    three groups
  • The representative viruses from each group are
    shown in a figure (not included in the transcript)

27
Phylogeny Construction Procedure
  • Get the whole-genome data and protein data
    for each virus from the NCBI website.
  • Compute a distance matrix M for these viruses.
  • Use the quartet-based algorithm to generate a
    phylogeny from M.
  • Compute the average distance between SARS-Cov
    and the Group 1 (D1), Group 2 (D2), and Group 3 (D3)
    viruses, respectively.
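The slides do not say how quartet topologies are derived from the distance matrix M. One common choice, shown here purely as an assumption, is the four-point method: for each four species, pick the pairing whose two within-pair distances have the smallest sum. The taxa and distances below are made-up toy values, not data from the presentation.

```python
from itertools import combinations


def infer_quartet(dist, a, b, c, d):
    """Return the pairing (p, q, r, s), meaning pq|rs, with the smallest sum."""
    options = {
        (a, b, c, d): dist[a][b] + dist[c][d],
        (a, c, b, d): dist[a][c] + dist[b][d],
        (a, d, b, c): dist[a][d] + dist[b][c],
    }
    return min(options, key=options.get)


def quartets_from_distances(dist, taxa):
    """Infer one quartet topology for every four-element subset of taxa."""
    return [infer_quartet(dist, *four) for four in combinations(taxa, 4)]


# Toy distances (hypothetical): w and x are close, y and z are close.
dist = {
    "w": {"x": 2, "y": 4, "z": 4},
    "x": {"w": 2, "y": 4, "z": 4},
    "y": {"w": 4, "x": 4, "z": 2},
    "z": {"w": 4, "x": 4, "y": 2},
}
print(quartets_from_distances(dist, ["w", "x", "y", "z"]))
```

With noisy distances some inferred quartets will be wrong, which is the situation the MQC formulation is designed to handle.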

28
Phylogeny on Protein Data with Outgroup
The following is a phylogeny in which SARS-Cov lies in Group 3.
D1 = 464.4, D2 = 460.3, D3 = 459
[Figure: phylogeny with an outgroup and Groups 1, 2, and 3. Labeled viruses: MHV, TGEV, HCov-229E, BCov, HCov-OC43, Hcov-NL63, PEDV, IBV, SARS-Cov. Branch lengths are not recoverable from the transcript.]
29
Phylogeny on Genome Data with Outgroup
Based on genome data, SARS lies in a group of its own, but somewhat closer to Groups 2 and 3.
D1 = 457.4, D2 = 456.1, D3 = 455.2
30
Summary
  • Our phylogeny construction method can
    successfully identify the three groups of
    coronaviruses.
  • SARS-Cov lies closer to Groups 2 and 3
    than to Group 1. Its average distances
    to the Group 2 and Group 3 viruses are approximately
    the same.
  • Our quartet-based method consistently
    generates the same phylogeny with various outgroups.
    This phylogeny suggests that SARS-Cov most likely
    lies in Group 3.