Title: Parametric Alignment
1Parametric Alignment
- Colin Dewey and Lior Pachter
2Biological Sequence Alignment
- What are we really doing?
- Establishing homology relationships
- Given two biological strings ?1,?2, define
homology relation Rhom on ?1 ? ?2
-CGATT GCC-TA
Homologous positions Nucleotides evolved from a
common ancestral nucleotide
3Alignment Parameters
- Global Alignment (Needleman-Wunsch)
- Special case of alignment pair HMM
- Parameters (combinations of emission and
transition probabilities) - Match (m)
- Mismatch (x)
- Gap Start (g)
- Gap Extend (e)
4Example Global Alignment
?1 GCCTA, ?2 CGATT log(match) 1,
log(mismatch) -1
C
G
A
T
T
G
C
C
T
A
of possible alignments D(5,5) 1,683
5Example Global Alignment
?1 GCCTA, ?2 CGATT log(match) 1,
log(mismatch) -1
C
G
A
T
T
-GCCTA CG-ATT -GCCTA CGA-TT -GCCTA CGATT- GC-CT
A -CGATT GCC-TA -CGATT
G
C
C
T
A
6Parametric Alignment
- An alignment corresponds to a monomial
- Represented by point
7Input/Output
Newton Polytope
Parametric Aligner
?1,?2
Normal Fan
Alignments
-GCCTA CG-ATT -GCCTA CGA-TT
8Example Polytope
VERTICES 1 1 4 1 2 4 1 0 6 1 2 6 1 0
10 FACETS -4 0 1 0 1 0 -6 2 1 10 -2 -1 2 -1 0
9Normal Fan
log(mismatch)
4
3
log(match)
2
1
0
10Constrained Normal Fan
log(mismatch) lt 0
log(mismatch)
log(mismatch) lt log(match)
log(match)
1
0
11Output
Number of polytope vertices 5 Inequalities for
cone 0 -1m 0x gt 0 1m -1x gt 0 match
-0.5 mismatch -1 gap -1 Number of optimal
alignments for cone 1 Inequalities for cone
1 1m 0x gt 0 0m -1x gt 0 match
0.5 mismatch -0.5 gap -0.5 Number of optimal
alignments for cone 5 Total number of optimal
alignments 6
12Alignments
Cone 0 GCCTA CGATT
Cone 1 -GCCTA CG-ATT -GCCTA CGA-TT -GCCTA CGATT
- GC-CTA -CGATT GCC-TA -CGATT
13Random vs. Related Sequences
14Polytope vertices for different models
15Theorem
- Theorem There exist two homologous biological
sequences ?1,?2 such that no choice of parameters
for the 2-parameter Needleman-Wunsch algorithm
gives the biologically correct alignment
16Proof of Theorem
dr_3 AGTGATTT----TTGCATAACAGGTCTACTT---- hs_34
AGGCATCAGAAGTTGAGAGACAACTCTCCATGCAG
TACATGAC----------ATTTTCGAGAAAAAAA
TCCACGCCCTCAGAGAAGACTTTCGGGAGAAAAA
VERTICES 1 38 33 1 37 33 1 38 45 1 0 70 1 0 121
17Implementation
templatelttypename SemiRinggt void alignGlobalLastR
ow(const string seq1, const
string seq2, const typename
SemiRingElement match, const
typename SemiRingElement mismatch,
const typename SemiRingElement gap,
vectorlttypename SemiRingElementgt
row)
18Implementation
const Element one SemiRingmultiplicativeIdenti
ty // Initialize row row.resize(seq2.size()
1) row0 one // Calculate first row for
(size_t j 1 j lt seq2.size() j) rowj
gap rowj - 1 // Calculate remaining
rows Element up, diag for (size_t i 1 i lt
seq1.size() i) diag row0 row0
gap for (size_t j 1 j lt seq2.size()
j) up rowj if (seq1i -
1 seq2j - 1) rowj match
diag gap (up rowj - 1) else
rowj mismatch diag gap (up
rowj - 1) diag up
19Next Steps
- Bayesian Alignment
- Prior probabilities on parameters
- Find best alignment given priors
- More easily