Minimal Recombinations Histories and Global Pedigrees

Slides: 51
Provided by: Jot50
Transcript and Presenter's Notes

Title: Minimal Recombinations Histories and Global Pedigrees


1
Minimal Recombinations Histories and Global Pedigrees
Finding Minimal Recombination Histories
(Figure: diagrams with leaves labelled 1 to 4)
Global Pedigrees
Acknowledgements: Yun Song, Rune Lyngsø, Mike Steel, Carsten Wiuf
2
Basic Evolutionary Events
Recombination
Gene Conversion
Coalescent/Duplication
Mutation
3
Time slices
All positions have found a common ancestor on one sequence
All positions have found a common ancestor
(Figure: time slices through a population of size N; axes Time and Population)
4
Recombination-Coalescence Illustration
Copied from Hudson 1991
Intensities
Coales.   Recomb.
0         ρ
1         (1+b)ρ
3         (2+b)ρ
6         2ρ
3         2ρ
1         2ρ
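The coalescence intensities listed on this slide (0, 1, 3, 6) equal the number of lineage pairs, k(k-1)/2, for k = 1, 2, 3, 4 ancestral lineages. A minimal sketch of that count; the function name is ours and does not come from the slides:

```python
from math import comb


def coalescence_intensity(k: int) -> int:
    """Number of lineage pairs that can coalesce among k ancestral lineages."""
    return comb(k, 2)


# Lineage counts 1..4 give the coalescence intensities for each time slice.
intensities = [coalescence_intensity(k) for k in range(1, 5)]
print(intensities)
```

The recombination intensities are not computed here, since they depend on how much ancestral material each lineage carries.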
5
Encoding, Phylogenies and Incompatibility
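Encoding (1 mutation per site):
Sequence: 1 2 3 4 5 6 7
Base:     C C C C A A A
Code:     0 0 0 0 1 1 1
(Figure: tree in which one mutation 0 -> 1 separates leaves 1,2,3,4 from leaves 5,6,7)
Incompatibility: two sites are incompatible when all four combinations 00, 10, 01, 11 occur.
Site A: 0 0 0 1 1 0 1
Site B: 0 0 0 0 1 1 1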
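The encoding and the incompatibility check on this slide can be sketched in a few lines. This is an illustration only: the helper names are hypothetical, and treating C as the ancestral base is an assumption read off the slide's 0/1 coding.

```python
def encode_site(bases: str, ancestral: str) -> list[int]:
    """Recode one alignment column: 0 for the ancestral base, 1 otherwise."""
    return [0 if base == ancestral else 1 for base in bases]


def incompatible(site_a: list[int], site_b: list[int]) -> bool:
    """Four-gamete test: True when 00, 01, 10 and 11 all occur."""
    return len(set(zip(site_a, site_b))) == 4


# Column from the slide: sequences 1-4 carry C, sequences 5-7 carry A.
column = encode_site("CCCCAAA", ancestral="C")  # assumption: C is ancestral

# The two sites shown under "Incompatibility".
site_a = [0, 0, 0, 1, 1, 0, 1]
site_b = [0, 0, 0, 0, 1, 1, 1]
print(column)
print(incompatible(site_a, site_b))
```

Under one mutation per site, a pair of sites showing all four combinations cannot be placed on a single tree, which is why the test signals recombination.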
6
The 1983 Kreitman Data & the infinite site assumption (M. Kreitman 1983 Nature)
  • 11 sequences of alcohol dehydrogenase gene in
    Drosophila melanogaster.
  • Can be reduced to 9 sequences (3 of 11 are
    identical).
  • 3200 bp long, 43 segregating sites, 28 of which
    are informative

Recoded Kreitman data:
  i. (0,1), ancestor state known
  ii. Multiple copies represented by 1 sequence
  iii. Non-informative sites could be removed
7
Hudson & Kaplan's RM
0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 1 0
0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 1 0 0 1 0 0 0 0
0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1
1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 0 1 1
1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
If you equate RM with expected number of
recombinations, this could be used as an
estimator. Unfortunately, RM is a gross
underestimate of the real number of
recombinations.
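As a rough sketch of how RM is usually computed: every pair of incompatible sites forces at least one recombination between them, and RM is the largest number of such intervals that do not overlap. The code below assumes rows are sequences and columns are 0/1 sites; the function name and the toy data are ours, not taken from the slides.

```python
from itertools import combinations


def hudson_kaplan_rm(haplotypes: list[list[int]]) -> int:
    """Lower bound on recombinations from pairwise incompatible sites."""
    n_sites = len(haplotypes[0])
    columns = [[h[s] for h in haplotypes] for s in range(n_sites)]

    # Site pairs (i, j) that fail the four-gamete test.
    intervals = [
        (i, j)
        for i, j in combinations(range(n_sites), 2)
        if len(set(zip(columns[i], columns[j]))) == 4
    ]

    # Greedy choice by right endpoint gives the maximum number of
    # intervals that share at most an endpoint.
    intervals.sort(key=lambda interval: interval[1])
    rm, last_right = 0, -1
    for left, right in intervals:
        if left >= last_right:
            rm += 1
            last_right = right
    return rm


# Toy data (hypothetical): sites 0-1 and sites 1-2 are incompatible.
toy = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [1, 1, 1]]
print(hudson_kaplan_rm(toy))
```

Because each chosen interval only proves that at least one event happened inside it, the count can fall well below the true number of recombinations.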
8
Recombination Parsimony (Hein, 1990, 1993; Song & Hein, 2002)
9
Metrics on Trees based on subtree transfers.
Trees including branch lengths
Unrooted tree topologies
Rooted tree topologies
Tree topologies with age ordered internal nodes
Pretending the easy problem (unrooted) is the real problem (age ordered) causes violations of the triangle inequality
10
Tree Combinatorics and Neighborhoods
Observe that the size of the unit-neighbourhood
of a tree does not grow nearly as fast as the
number of trees
Due to Yun Song
Song (2003)
Allen & Steel (2001)
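To make the growth comparison concrete, the sketch below sets the number of unrooted binary trees on n leaves, (2n-5)!!, against the size of the subtree-prune-and-regraft unit neighbourhood, 2(n-3)(2n-7). The second formula is quoted as we recall it from Allen & Steel (2001) and should be treated as an assumption; the slide may have shown the rooted case instead.

```python
def unrooted_tree_count(n: int) -> int:
    """(2n-5)!!: unrooted binary tree topologies on n >= 3 labelled leaves."""
    count = 1
    for odd in range(3, 2 * n - 4, 2):
        count *= odd
    return count


def spr_neighbourhood_size(n: int) -> int:
    """Assumed formula 2(n-3)(2n-7) for the unrooted SPR unit neighbourhood."""
    return 2 * (n - 3) * (2 * n - 7)


for n in (4, 5, 10, 20):
    print(n, unrooted_tree_count(n), spr_neighbourhood_size(n))
```

The tree count grows super-exponentially while the neighbourhood grows only quadratically in n.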
11-17
(No Transcript; figure panels numbered 1 to 7)
18
Branch and Bound Algorithm
0: 3, 1: 91, 2: 1314, 3: 8618, 4: 30436, 5: 62794, 6: 78970, 7: 63049, 8: 32451, 9: 10467, 10: 1727
Lower bound ≤ exact length ≤ upper bound
k-recombination neighborhood
  1. The number of ancestral sequences in the ACs.
  2. The number of ancestral sequences in the ACs for neighbor pairs.
  3. AC compatible with the minimal ARG.
  4. AC compatible with a close-to-minimal ARG.
19
The Minimal Recombination History for the
Kreitman Data
Method: number of rec. events obtained
  • Hudson & Kaplan (1985): 5
  • Myers & Griffiths (2003): 6
  • Song & Hein (2004), set theory based approach: 7
  • Song & Hein (2003), current program using rooted trees: 7
  • Lyngsø, Song & Hein (2006), massive acceleration using Branch and Bound Algorithm: 7
  • Lyngsø, Song & Hein (2006), minimal number of Gene Conversions (in prep.): 5-2
20
Spatial Coalescent-Recombination Algorithm (Wiuf & Hein 1999 TPB)
Spatial Process
Temporal Process
i. The process is non-Markovian
ii. The trees cannot be reduced to Topologies


21
Gene Conversions & Treeness
Gene Conversion
Recombination
Coalescent
Star tree
22
The Bad News: actual, potentially detectable and detected recombinations
(Figure: minimal ARG and true ARG along a sequence from 0 to 4 Mb)
23
The Good News: Quality of the estimated local tree
(Figure: true ARG and reconstructed ARG on leaves 1 to 5, labelled ((1,2),(1,2,3)) and ((1,3),(1,2,3)))
n = 7, r = 10, Q = 75
24
Simultaneous Inference of Haplotypes & Recombination Events: Combinatorial Optimization Version
Data: Genotypes/SNPs
Gusfield, 2002
(Figure: genotype data with entries A, C, A,G, C,G, G, G and unresolved haplotypes for individuals 1 to 3)
Song et al., 2006
Rahman/Lyngsø (unpubl.): Heuristic Sequence of Phylogenies
25
The Griffiths-Ethier-Tavare Recursions
No recombination, Infinite Site Assumption
Ancestral State Known
History Graph: Recursions Exist, No cycles
Possible Histories without Recombination for simple data example
(Figure: history graph with nodes numbered 1 to 8)
- recombination: 27 ACs
+ recombination: 3108 ACs
26
Ancestral configurations to 2 sequences with 2
segregating sites
27
Counting Recursion
Summary statistic lumping configurations
k1(k21)1 padded with -

1
k1
k
28
Enumeration of Ancestral States (via counting restricted non-negative integer matrices with given row and column sums)
Due to Yun Song

29
Examples of Likelihood Calculations
010 010 101 101 110
R3
R1
R2
30
Time slices
All positions have found a common ancestor on one sequence
All positions have found a common ancestor
(Figure: time slices through a population of size N; axes Time and Population)
31
Number of genetic ancestors to the Human Genome
S_r: number of segments; E(S_r) = 1 + r
(Figure: coalescences (C) and recombinations (R) along the sequence, back in time)
Simulations
Statements about number of ancestors are much
harder to make.
32
Applications to Human Genome (Wiuf and Hein, 1997)
Parameters used: 4Ne = 20,000; chromosome 1: 263 Mb, 263 cM
  • Chromosome 1: 52,000 segments, 6,800 ancestors
  • All chromosomes: 86,000 ancestors
  • Physical population: 1.3-5.0 million
A randomly picked ancestor (ancestral material comes in batteries!)
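The segment count quoted for chromosome 1 is consistent with an expected number of segments of 1 + r, provided r is read as the scaled rate 4Ne times the map length in Morgans. That reading is our assumption, as is the function name; the sketch only checks the arithmetic.

```python
def expected_segments(four_ne: float, map_length_cm: float) -> float:
    """E(S) = 1 + r, with r = 4Ne * map length in Morgans (assumed scaling)."""
    rho = four_ne * map_length_cm / 100.0  # centiMorgans -> Morgans
    return 1.0 + rho


# Slide parameters: 4Ne = 20,000 and chromosome 1 = 263 cM.
chromosome_1 = expected_segments(20_000, 263)
print(chromosome_1)
```

The result lands close to the roughly 52,000 segments given on the slide.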
33
Multiple and Simultaneous Coalescents
  1. Simultaneous Events
  2. Multifurcations
  3. Underestimation of Coalescent Rates
34
Recombination Induced Multiple Coalescent Events
P(X_2 > 1) = (2N-1)/2N = 1 - 1/(2N)
High recombination rate will create many ancestors, violating the coalescent assumption that sample size << 2N.
2N = 10,000; sample sizes (10, 200, 3000, 8000)
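One way to see why the assumption matters is to compare the exact one-generation probability that n lineages all pick distinct parents among 2N gene copies with the pairwise approximation 1 - C(n,2)/2N that underlies the coalescent. This is a sketch under a plain Wright-Fisher assumption; the function names are ours.

```python
from math import comb


def p_no_coalescence_exact(n: int, two_n: int) -> float:
    """Probability that n lineages choose n distinct parents among 2N."""
    p = 1.0
    for i in range(1, n):
        p *= 1.0 - i / two_n
    return p


def p_no_coalescence_pairwise(n: int, two_n: int) -> float:
    """First-order approximation: at most one pair coalesces per generation."""
    return 1.0 - comb(n, 2) / two_n


# 2N and sample sizes as given on the slide.
for n in (2, 10, 200, 3000, 8000):
    print(n, p_no_coalescence_exact(n, 10_000), p_no_coalescence_pairwise(n, 10_000))
```

For n = 2 both expressions reduce to 1 - 1/(2N). For the larger sample sizes the approximation turns negative, which is the sense in which the assumption sample size << 2N is violated.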
35
Recombination Induced Multiple Coalescent Events
Number of our genetic ancestors: Recombination Carriers, Gene Conversion Carriers
Gene Conversion Length 300, GR,100R
  • Recombination
  • Recombination Gene Conversion

Recombination Carriers, Gene Conversion Carriers, Mixed
36
Recombination Induced Multiple Coalescent Events
Coalescent Rate: Discrete versus Continuous
Consequences for Recombination-Coalescent Process: Globally Wrong, Locally Correct.
37
Questions based on Large Data Sets
Much, much more sequence data
  1. Comparative Genomics on a Huge Scale
  2. Population Genomics. One issue: reconstructing population pedigrees. Extreme data: identifiability of pedigrees
  3. Association Mapping on the Tree of Life
  4. Somatic Gene Genealogies and the Models of Embryology
38
Global Pedigrees
1999: Chang and Derrida. Time to a universal common ancestor.
2004: Rohde tries to answer this for a realistic population model.
  • Combining the Coalescent and Pedigree Process
  • Super-pedigree problem
  • Bound on how much data is needed to infer a pedigree
  • Do embedded phylogenies determine the pedigree?
  1. Wiuf & Hein (1999) 'A contribution to the discussion of J. Chang's paper "Recent Common Ancestor of All Present Human Individuals"' (Adv. Appl. Prob. vol. 31.4)
  2. Hein (2004) "Pedigrees for all Humanity" Nature 431.512-13.
  3. Steel and Hein (2005) Reconstructing Pedigrees: A combinatorial perspective. J. Theor. Biol.

39
Combining Ancestral Individuals and the Coalescent (Wiuf & Hein, 2000)
Let T be the time when somebody was everybody's ancestor.
Chang's result: lim T/log2(N) = 1 (prob. 1)
Unify the two processes:
  I. Sample more individuals
  II. Let each have 2 parents with probability p.
Result: a discontinuity at 1. For p < 1, change log2 -> logp
Comment: Genetic Ancestors is a vanishing set within Genealogical Ancestors.
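Chang's limit can be explored with a small simulation. The sketch assumes a constant population of N individuals in which each individual draws two parents uniformly at random, with replacement, from the previous generation. That model choice, the function name and the seed are our assumptions for illustration.

```python
import math
import random


def time_to_universal_ancestor(n: int, rng: random.Random) -> int:
    """Generations back until some individual is an ancestor of everybody."""
    everybody = (1 << n) - 1
    # descendants[i]: bitmask of present-day individuals descending from i.
    descendants = [1 << i for i in range(n)]
    generations = 0
    while True:
        generations += 1
        previous = [0] * n
        for mask in descendants:
            for _ in range(2):  # two parents per individual
                previous[rng.randrange(n)] |= mask
        descendants = previous
        if any(mask == everybody for mask in descendants):
            return generations


population = 256
t = time_to_universal_ancestor(population, random.Random(0))
print(t, math.log2(population))
```

A single run only gives one draw of T; repeating it over several seeds and population sizes shows how T tracks log2(N).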
40
Pedigree Ancestors and Human History (Rohde, Olson & Chang, 2004)
More realistic model of human history: geography and growth
E(T): 2300 years ago; E(U): 4500 years ago
41
Probability of Data given a Pedigree.
Elston-Stewart (1971): Temporal Peeling Algorithm
Condition on parental states
Recombination and mutation are Markovian
42
Counting Pedigrees (Tong Chen & Rune Lyngsø)
(Figure: pedigree diagram with numeric labels)
Ak(i,j): the number of pedigrees k generations back with i females and j males.
k     number of pedigrees
2     4
3     279
4     2.8 x 10^7
5     2.8 x 10^20
6     7.4 x 10^52
7     2.8 x 10^131
8     2.9 x 10^317
9     3.5 x 10^749
10    3.9 x 10^1737
43
Pedigree Counting
  • Counting gender un-labelled pedigrees
  • Much harder.
  • Counting gender labellings on un-labelled
    pedigree.

gender un-labelable
44

Inverting Random Functions: a bound on the segregating sites needed to reconstruct a global pedigree (Steel & Szekely, 1998; Steel and Hein, 2005)
The population can be partitioned into triples: a couple that gets a pair of children, and an outsider that has a child with one of them. This creates a mapping from a generation to the previous, fundamentally labeling all ancestors.
The number of global pedigrees for k generations with 3n individuals
Number of segregating sites, s, needed to predict the correct global pedigree with at least 0.5 probability for a population of size n over d generations
Ex.: 3 x 10^6, 300 generations (7000 years): this lower bound would give a minimum of 2000 sites (probably a gross underestimate).
45
Reconstructing global pedigrees (Steel and Hein, 2005)
Knowing the gender-labeled pedigrees for all pairs defines the global pedigree (last k generations)
Links and lassos determine the global pedigree (last k generations)
Gender labelling of ancestors is crucial
46
Benevolent Mutation and Recombination Process
Genomes with r and m/r -> infinity (r: recombination rate, m: mutation rate)
  • All embedded phylogenies are observable
  • Do they determine the pedigree?

Counter example
Embedded phylogenies
47
Pedigree Reconstruction Principles
Distance Based Reconstructions
Gender specific rates
Continuous Birth Time with Perfect Clock
t3
t2
t1
Subtree Transfer Identification of Ancestors
Recursive Definition of Ancestral Genomes
48
The Coalescent with Recombination
Retrospective instead of prospective formulation of genetical processes (Ewens, 1979)
1940s: retrospective arguments used by both Fisher and Wright.
1975: Watterson, full formulation of the probability of the genealogical relationship of a set of alleles.
1982: three famous articles by Kingman.
1983: Hudson includes recombination in the genealogical process.
  • Number of Ancestors to a DNA Sequence.
  • Reformulation of Genealogical Process.
  • Inclusion of Gene Conversion in Genealogical
    Process.
  1. Wiuf & Hein (1997) On the Number of Ancestors to a DNA Sequence
  2. Wiuf & Hein (1999) The Ancestry of a Sample of Sequences Subject to Recombination
  3. Wiuf & Hein (1999) The Coalescent with Recombination as a point process moving along sequences.
  4. Wiuf & Hein (2000) The Coalescent with Gene Conversion

49
Finding Minimal Recombination Histories
1964: Bodmer & Edwards, parsimony defined as a reconstruction principle.
1985: Hudson & Kaplan use minimal recombination histories as observed recombinations.
  • Attempts to find minimal histories of sequences
  • Definition of recombination as Subtree Prune
    Regraft operations
  1. Hein, J.J. (1990) Reconstructing the history of sequences subject to Gene Conversion and Recombination. Mathematical Biosciences 98:185-200.
  2. Hein, J.J. (1993) A Heuristic Method to Reconstruct the History of Sequences Subject to Recombination. J. Mol. Evol. 20:402-411.
  3. Hein, J.J., Jiang, T., Wang, L. & Zhang, K. (1996) On the complexity of comparing evolutionary trees. Discrete Applied Mathematics 71:153-169.
  4. Song, Y.S. (2003) On the combinatorics of rooted binary phylogenetic trees. Annals of Combinatorics 7:365-379.
  5. Song, Y.S. & Hein, J. (2005) Constructing Minimal Ancestral Recombination Graphs. J. Comp. Biol. 12:147-169.
  6. Song, Y.S. & Hein, J. (2004) On the minimum number of recombination events in the evolutionary history of DNA sequences. J. Math. Biol. 48:160-186.
  7. Song, Y.S. & Hein, J. (2003) Parsimonious reconstruction of sequence evolution and haplotype blocks: finding the minimum number of recombination events. Lecture Notes in Bioinformatics, Proceedings of WABI'03, 2812:287-302.
  8. Lyngsø, Song & Hein (2005) Minimal Recombination Histories by Branch and Bound. WABI.

50
Likelihood of Data Set
1972: Ewens, likelihood of allele number observations.
1987: Griffiths, recursions for infinite site data.
1990: Felsenstein uses Metropolis-Hastings.
1994: Griffiths-Tavare uses MCMC on the coalescent-mutation process.
1996: Griffiths-Marjoram uses MCMC on the coalescent-mutation-recombination process.
1999: Donnelly-Matthews-Fearnhead uses IS to accelerate earlier methods.
2000: Hudson introduces the pseudolikelihood method.
  • How hard is the coalescent-mutation-recombination
    process?
  1. Song, Y.S., Lyngsø, R.B. & Hein, J. (2005) Counting Ancestral States in Population Genetics. In Press