Title: A Practical Approach to Significance Assessment in Alignments with Gaps
1. A Practical Approach to Significance Assessment in Alignments with Gaps
- Nicholas Chia and Ralf Bundschuh
- Department of Physics
- Ohio State University
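2. Gapped Alignment
alignment score = (sum of scoring matrix entries over aligned letter pairs) minus (gap cost) x (number of gaps)
[Figure: example gapped alignment of two short DNA sequences, with "-" marking a gap]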
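As a minimal sketch of the scoring rule on this slide, the function below scores an alignment that is already written out as two equal-length rows. The match/mismatch values, the gap cost, and the example sequences are hypothetical choices for illustration, not values taken from the talk.

```python
def match_mismatch(a, b, match=1, mismatch=-1):
    """Simple match-mismatch scoring matrix (values are assumptions)."""
    return match if a == b else mismatch


def alignment_score(row_a, row_b, gap_cost=2, scoring=match_mismatch):
    """Score a gapped alignment given as two equal-length rows.

    A '-' in either row marks a gap. The score is the sum of scoring-matrix
    entries over aligned letter pairs minus gap_cost times the number of gaps.
    """
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pair_score = 0
    gaps = 0
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            gaps += 1
        else:
            pair_score += scoring(a, b)
    return pair_score - gap_cost * gaps


# Hypothetical example: six matches and one gap.
example = alignment_score("GATTACA", "GA-TACA")
```

With these assumed parameters the example has six matched pairs and one gap, so its score is 6 - 2 = 4.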
3. Significance Assessment
Studying the distribution of alignment scores among random sequences yields information about the rarity, i.e., the biological import, of a given alignment score.
Evidence suggests that gapped alignment scores are distributed according to the Gumbel (extreme value) distribution,
but statistically characterizing the slow exponential tail of the Gumbel distribution (λ) requires a large number of simulations.
[Equation labels: score; Gumbel parameters; maximum alignment score]
Can we find a better method to understand gapped alignment distributions?
Can we evaluate λ faster?
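To make the role of the Gumbel parameters concrete, here is a small sketch that assumes the standard Karlin-Altschul form used in BLAST-style statistics, P(S >= s) = 1 - exp(-K m n exp(-λ s)). The function name and all numerical values are hypothetical; the slide's own equation did not survive extraction.

```python
import math


def gumbel_pvalue(score, lam, k, m, n):
    """P-value of an alignment score under an assumed Gumbel distribution.

    lam and k are the Gumbel parameters; m and n are the sequence lengths.
    The expected number of chance alignments scoring at least `score` is
    k * m * n * exp(-lam * score).
    """
    expected_hits = k * m * n * math.exp(-lam * score)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
    return -math.expm1(-expected_hits)


# Hypothetical parameters: the p-value falls off exponentially in the score,
# at a rate set by lam.
p_low = gumbel_pvalue(40, lam=0.5, k=0.1, m=100, n=100)
p_high = gumbel_pvalue(50, lam=0.5, k=0.1, m=100, n=100)
```

Because λ sits in the exponent, a small error in λ changes the reported significance by orders of magnitude, which is why it has to be determined precisely.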
4. Small Change in Geometry
[Figure: alignment lattice for two short DNA sequences; the accompanying equation labels the Needleman-Wunsch global alignment score]
If we can solve for ?, we know ?!
but how do we solve for ??
i.e., how do we account for the length dependence of the score?
Change geometry!
- fixing the width also fixes the state space, allowing us to model alignment as a Markov process
- replaces a 2D length dependence with a 1D width dependence, where we can use a Markov matrix to solve for the infinite-t behavior
[Figure: fixed-width lattice strip advancing from t to t+1]
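For reference, the Needleman-Wunsch global alignment score named on this slide can be computed with the usual dynamic program. The sketch below assumes a match-mismatch scoring matrix and a linear gap cost; the default parameter values are hypothetical.

```python
def needleman_wunsch_score(seq_a, seq_b, match=1, mismatch=-1, gap_cost=2):
    """Optimal global alignment score with a linear gap cost.

    Only two rows of the dynamic-programming lattice are kept in memory,
    since each lattice site depends only on its three neighbours.
    """
    # Row 0: aligning an empty prefix of seq_a costs one gap per letter.
    prev = [-gap_cost * j for j in range(len(seq_b) + 1)]
    for i, a in enumerate(seq_a, start=1):
        curr = [-gap_cost * i]
        for j, b in enumerate(seq_b, start=1):
            diagonal = prev[j - 1] + (match if a == b else mismatch)
            gap_in_b = prev[j] - gap_cost
            gap_in_a = curr[j - 1] - gap_cost
            curr.append(max(diagonal, gap_in_b, gap_in_a))
        prev = curr
    return prev[-1]
```

The two nested loops make the cost proportional to the product of the sequence lengths, which is the 2D length dependence the change of geometry is meant to remove.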
5. The Markov Model
Using score differences between lattice sites as our elements, we can write a Markov matrix describing the transition from t to t+1.
In this way we can describe the fixed-width dynamics for infinite t.
In order to obtain information about the relevant quantity, we modify our Markov matrix to include the necessary ? dependence.
[Equation labels: largest eigenvalue of the modified Markov matrix; all alignment parameters]
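The idea that the largest eigenvalue of a modified Markov matrix controls the infinite-t behavior can be illustrated on a toy chain. The sketch below is not the authors' matrix (whose elements are score differences between lattice sites); it assumes a hypothetical two-state chain in which each transition adds a score increment, and weights each transition probability by exp(lam * increment).

```python
import numpy as np


def tilted_matrix(transition, score_increment, lam):
    """Weight each transition probability by exp(lam * score gained)."""
    return transition * np.exp(lam * score_increment)


def largest_eigenvalue(matrix):
    """Largest eigenvalue of a non-negative matrix (real by Perron-Frobenius)."""
    return float(np.max(np.linalg.eigvals(matrix).real))


def exponential_moment(transition, score_increment, lam, start, steps):
    """Exact E[exp(lam * S_t)] after `steps` transitions from distribution `start`."""
    m = tilted_matrix(transition, score_increment, lam)
    return float(start @ np.linalg.matrix_power(m, steps) @ np.ones(len(start)))


# Hypothetical toy chain: two states, +1 for staying, -1 for switching.
P = np.array([[0.7, 0.3], [0.4, 0.6]])
DS = np.array([[1.0, -1.0], [-1.0, 1.0]])
START = np.array([0.5, 0.5])

rho = largest_eigenvalue(tilted_matrix(P, DS, 0.3))
growth = (exponential_moment(P, DS, 0.3, START, 51)
          / exponential_moment(P, DS, 0.3, START, 50))
```

For large t the exponential moment grows by a factor rho per step, so `growth` and `rho` agree; at lam = 0 the matrix is the plain Markov matrix and its largest eigenvalue is 1.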
6. Technique for Calculating ?w(??)
The System Parameters - ?
- match-mismatch scoring matrix
- technical condition to reduce state space
- periodic boundary conditions
- Bernoulli randomness (uncorrelated bonds) approximation
Calculating ?w(??)
- construct the modified Markov matrices symbolically
- eigenvalue simply too difficult to solve symbolically for large matrices (10^5)
- ARPACK solves for the largest eigenvalue ?w with precision and speed since the matrices are sparse (N log N); Lehoucq et al., SIAM 1997
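One common way to reach ARPACK from Python is `scipy.sparse.linalg.eigs`, which wraps it. The sketch below is an assumption about tooling, not the authors' code: it builds a hypothetical random sparse stochastic matrix as a stand-in for the modified Markov matrix and asks ARPACK for the single eigenvalue of largest magnitude.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs


def random_sparse_stochastic(n, density=0.02, seed=0):
    """Hypothetical stand-in matrix: sparse, non-negative, rows summing to 1."""
    a = sp.random(n, n, density=density, format="csr", random_state=seed)
    a = a + sp.identity(n, format="csr")  # guarantees no empty rows
    row_sums = np.asarray(a.sum(axis=1)).ravel()
    return sp.diags(1.0 / row_sums) @ a


def largest_eigenvalue(matrix):
    """Largest-magnitude eigenvalue via ARPACK (implicitly restarted Arnoldi)."""
    values = eigs(matrix, k=1, which="LM", return_eigenvectors=False)
    return float(values[0].real)


matrix = random_sparse_stochastic(500)
top = largest_eigenvalue(matrix)
```

ARPACK only needs matrix-vector products, so the cost per iteration scales with the number of non-zero entries rather than with the full matrix size. For a stochastic matrix the largest eigenvalue is 1, which gives a quick sanity check on the solver.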
7. Understanding the W-dependence
Definition
... not quite that simple
- cannot solve for extremely large W
- non-trivial width dependence
So, what can we do?
Kardar-Parisi-Zhang Systems
[Diagram labels: ASEP, KPZ, Sequence Alignment, Gapped Alignment]
KPZ systems all share properties on a coarse-grained level
From Derrida and Lebowitz (PRL 1998) comes the scaling function G, which gives the form of the width dependence
By understanding the W-dependence, we can solve for ? and ?!
8. Calculating λ
[Equation labels: parameter dependent scaling factors; equation for λ; equation for ?W; W-dependence]
Can solve for λ if we know the scaling factors a? and b?!
Fit difference in order to obtain scaling factors
[Equation label: solution for λ]
9. Convergence of λ
[Figure: λ (vertical axis, 0.55 to 0.65) versus width W (horizontal axis, 0 to 12)]
10. Results
11. Conclusion
- Succeeded in calculating λ with precision on fast timescales
- Demonstrated a non-sampling method for calculating λ
- Successful synthesis of computational biology, high performance numerics, and statistical physics
- In principle, can be generalized to more complex scoring systems (affine gap costs, correlated bonds, PAM or BLOSUM matrices)