A Practical Approach to Significance Assessment in Alignments with Gaps


1
A Practical Approach to Significance Assessment
in Alignments with Gaps
  • Nicholas Chia and Ralf Bundschuh
  • Department of Physics
  • Ohio State University

2
Gapped Alignment
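[Figure: example gapped alignment of two short DNA sequences, one containing a gap (-), with an equation whose terms are labeled: alignment score, scoring matrix, gap cost, number of gaps]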
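The quantities labeled on this slide (alignment score, scoring matrix, gap cost, number of gaps) can be tied together in a minimal sketch. It assumes the usual linear-gap-cost convention, in which the score is the sum of scoring-matrix entries over aligned letter pairs minus the gap cost times the number of gaps. The function name, the match/mismatch values, and the example sequences are illustrative assumptions, not taken from the slides.

```python
def alignment_score(row_a, row_b, match=1.0, mismatch=-1.0, gap_cost=2.0):
    """Score one given alignment; '-' marks a gap (assumed convention)."""
    if len(row_a) != len(row_b):
        raise ValueError("alignment rows must have equal length")
    pair_score = 0.0  # sum of scoring-matrix entries over aligned pairs
    n_gaps = 0        # number of gap columns
    for a, b in zip(row_a, row_b):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            n_gaps += 1
        elif a == b:
            pair_score += match
        else:
            pair_score += mismatch
    # Linear gap costs: every gap is charged the same amount.
    return pair_score - gap_cost * n_gaps


# Hypothetical example: four matching columns and one gap column.
print(alignment_score("GATTC", "GA-TC"))
```

A match/mismatch scheme is used here because it is the scoring system the talk restricts itself to; a full substitution matrix would replace the two constants with a lookup table.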
3
Significance Assessment
Studying the distribution of alignment scores
among random sequences tells us how rare, and
therefore how biologically significant, a given
alignment score is.
Evidence suggests that gapped alignment scores
are distributed according to the Gumbel (extreme
value) distribution, but statistically
characterizing the slow exponential tail of the
Gumbel distribution (λ) requires a large number
of simulations.
[Equation: Gumbel distribution of the maximum alignment score, with labels for the score and the Gumbel parameters]
Can we find a better method to understand gapped
alignment distributions?
Can we evaluate λ faster?
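To make the role of the Gumbel parameters concrete, here is a small sketch of the tail probability in the standard Karlin-Altschul form, P(S >= s) = 1 - exp(-K m n exp(-λ s)). That this is the exact form shown on the slide is an assumption, as are the names K, m, n (prefactor and sequence lengths) and every numerical value below; the numbers are placeholders, not results from the talk.

```python
import math


def gumbel_pvalue(score, lam, k, m, n):
    """P(max score >= score) under a Gumbel law with parameters lam and k.

    m and n are the lengths of the two compared sequences (assumed names).
    """
    expected_hits = k * m * n * math.exp(-lam * score)
    # 1 - exp(-x), evaluated accurately even when x is tiny.
    return -math.expm1(-expected_hits)


# Placeholder parameters: the tail decays exponentially with rate lam.
for s in (10, 20, 30):
    print(s, gumbel_pvalue(s, lam=0.6, k=0.1, m=100, n=100))
```

Because the score enters only through exp(-λ s), a small error in λ is amplified in the p-value of a high-scoring alignment, which is why λ needs to be known precisely.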
4
Small Change in Geometry
[Figure: alignment lattice for two short DNA sequences, labeled with the Needleman-Wunsch global alignment score]
If we can solve for ?, we know ?!
But how do we solve for ??
I.e., how do we account for the length dependence
of the score?
Change geometry!
  • fixing the width also fixes the state space,
    allowing us to model alignment as a Markov
    process
  • replaces a 2D length dependence with a 1D width
    dependence, where we can use a Markov matrix to
    solve for the infinite-t behavior
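For reference, the Needleman-Wunsch global alignment score mentioned on this slide can be computed with the textbook dynamic-programming recursion. The sketch below uses the ordinary rectangular lattice rather than the fixed-width geometry, and it assumes match/mismatch scoring with linear gap costs; the parameter values are illustrative.

```python
def needleman_wunsch_score(a, b, match=1.0, mismatch=-1.0, gap_cost=1.0):
    """Best global alignment score of a and b with linear gap costs."""
    # Row 0: aligning the empty prefix of a against prefixes of b costs gaps only.
    prev = [-gap_cost * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [-gap_cost * i]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,           # align a[i-1] with b[j-1]
                            prev[j] - gap_cost,        # gap in b
                            curr[j - 1] - gap_cost))   # gap in a
        prev = curr
    return prev[-1]


print(needleman_wunsch_score("AC", "AGC"))
```

Only two rows of the lattice are kept at a time, which hints at why the score differences between neighboring lattice sites are a natural state description.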

5
The Markov Model
Using score differences between lattice sites as
our elements, we can write a Markov matrix
describing the transition from t to t+1.
In this way we can describe the fixed-width
dynamics for infinite t.
In order to obtain information about the relevant
quantity, we modify our Markov matrix to include
the necessary λ dependence.
[Equation with labels: largest eigenvalue of the modified Markov matrix; all alignment parameters]
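The idea of a modified Markov matrix can be illustrated on a toy chain. For a Markov chain whose transitions each add a score increment, weighting every transition probability by exp(λ × increment) gives a matrix whose largest eigenvalue ρ(λ) controls how the average of exp(λ × total score) grows with the number of steps. The two-state chain below (state 0 = match, state 1 = mismatch, uniform four-letter alphabet, scores +1 and -1) is an assumption chosen for simplicity; it is not the state space of score differences used in the talk.

```python
import math


def tilted_matrix(transition, increment, lam):
    """Weight each transition probability by exp(lam * score increment)."""
    return [[p * math.exp(lam * d) for p, d in zip(p_row, d_row)]
            for p_row, d_row in zip(transition, increment)]


def largest_eigenvalue_2x2(m):
    """Closed-form largest eigenvalue of a nonnegative 2x2 matrix."""
    (a, b), (c, d) = m
    trace = a + d
    det = a * d - b * c
    # The discriminant equals (a - d)^2 + 4bc, which is >= 0 here.
    disc = max(trace * trace - 4.0 * det, 0.0)
    return 0.5 * (trace + math.sqrt(disc))


# Toy chain: the next column is a match with probability 1/4, whatever came before.
P = [[0.25, 0.75], [0.25, 0.75]]
# Score increment depends on the state entered: +1 for a match, -1 for a mismatch.
D = [[1.0, -1.0], [1.0, -1.0]]

for lam in (0.0, 0.5, math.log(3.0)):
    print(lam, largest_eigenvalue_2x2(tilted_matrix(P, D, lam)))
```

At λ = 0 the matrix is the plain Markov matrix, so ρ = 1. For this toy chain ρ returns to 1 at λ = ln 3, which is the value of λ that Karlin-Altschul theory gives for ungapped alignment with these scores.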
6
Technique for Calculating the Largest Eigenvalue
The System Parameters
  • match-mismatch scoring matrix
  • linear gap costs
  • technical condition to reduce state space
  • periodic boundary conditions
  • Bernoulli randomness (uncorrelated bonds)
    approximation
  • Alphabet size 4

Calculating the largest eigenvalue
  • construct the modified Markov matrices
    symbolically
  • the eigenvalue is simply too difficult to solve
    for symbolically: the matrices are large (10^5)
  • ARPACK solves for the largest eigenvalue with
    precision and speed, since the matrices are sparse
    (N log N) - Lehoucq et al., SIAM 1997
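Why sparsity matters can be seen in a small sketch that stores only the nonzero entries of each row and finds the largest eigenvalue by plain power iteration. This is a swapped-in technique for illustration: ARPACK uses an implicitly restarted Arnoldi method, and in Python it is typically reached through SciPy's `scipy.sparse.linalg.eigs`. The ring-shaped matrix with periodic boundaries and all of its weights are hypothetical, not the matrices of the talk.

```python
import math


def ring_matrix(n, lam, stay=0.5, right=0.25, left=0.25):
    """Sparse nonnegative matrix on a ring of n states (periodic boundaries).

    Each row is a dict {column: weight} holding at most three entries.
    """
    rows = []
    for i in range(n):
        row = {}
        entries = ((i, stay),
                   ((i + 1) % n, right * math.exp(lam)),
                   ((i - 1) % n, left * math.exp(-lam)))
        for j, w in entries:
            row[j] = row.get(j, 0.0) + w
        rows.append(row)
    return rows


def sparse_matvec(rows, x):
    """Multiply the sparse matrix by a vector, touching nonzeros only."""
    return [sum(w * x[j] for j, w in row.items()) for row in rows]


def largest_eigenvalue(rows, iterations=500):
    """Power iteration; assumes a nonnegative, aperiodic, connected matrix."""
    x = [1.0 + i for i in range(len(rows))]  # arbitrary positive start vector
    total = sum(x)
    x = [v / total for v in x]
    rho = 0.0
    for _ in range(iterations):
        y = sparse_matvec(rows, x)
        rho = sum(y)              # growth factor, because sum(x) == 1
        x = [v / rho for v in y]  # renormalize so the entries sum to 1
    return rho


print(largest_eigenvalue(ring_matrix(8, lam=0.3)))
```

Each matrix-vector product costs time proportional to the number of nonzero entries rather than to the square of the matrix dimension, which is what makes matrices of the size quoted on the slide tractable.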

7
Understanding the W-dependence
Definition
[Equation not preserved in the transcript]
... not quite that simple
  • cannot solve for extremely large W
  • non-trivial width dependence

So, what can we do?
Kardar-Parisi-Zhang (KPZ) Systems
KPZ systems all share properties on a
coarse-grained level.
[Figure: diagram relating ASEP, KPZ, sequence alignment, and gapped alignment]
From Derrida and Lebowitz (PRL 1998) comes the
scaling function G, which gives the form of the
width dependence.
[Equation not preserved in the transcript]
By understanding the W-dependence, we can solve
for ? and ?!
8
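Calculating λ
We can solve for λ if we know the
parameter-dependent scaling factors a and b!
Fit the difference in order to obtain the scaling
factors.
[Equations not preserved in the transcript; surviving labels: equation for λ, equation at finite width W, W-dependence, parameter-dependent scaling factors, solution for λ]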
9
Convergence of λ
[Plot: λ versus width W, with the W axis running from 0 to 12 and the λ axis from 0.55 to 0.65]
10
Results
11
Conclusion
  • Succeeded in calculating λ with precision on
    fast timescales
  • Demonstrated a non-sampling method for
    calculating λ
  • Successful synthesis of computational biology,
    high performance numerics, and statistical
    physics
  • In principle, can be generalized to more complex
    scoring systems (affine gap costs, correlated
    bonds, PAM or BLOSUM matrices)