An efficient multiple alignment method for RNA secondary structures including pseudoknots



1
An efficient multiple alignment method for RNA
secondary structures including pseudoknots
2nd International Workshop on Natural Computing,
Dec. 10-12, 2007 Noyori Conference Hall, Nagoya
University, Japan
  • Shinnosuke Seki (1), Satoshi Kobayashi (2)

(1) Department of Computer Science, University of
Western Ontario, London, Ontario, Canada, N6A 5B7,
sseki_at_csd.uwo.ca
(2) Department of Computer Science, The University
of Electro-Communications, 1-5-1 Chofugaoka, Chofu,
Tokyo, Japan, 182-8585, satoshi_at_cs.uec.ac.jp
2
Problem setting
  • INPUT
  • RNA secondary structures (2 or more)
  • Sequential info.
  • Structural info. (which can be obtained through
    database or a prediction algorithm based on the
    sequential info.)
  • OUTPUT
  • The alignment of the input RNA secondary
    structures as a grammatical model

3
Secondary structure alignment
  • DNA and RNA sequences fold onto themselves to
    form 2D (secondary) or 3D (tertiary) structures.
  • These higher-dimensional structures play an
    important role in determining biological
    functions.
  • Similar structures may have similar functions.
  • Structure alignment aims at finding similarity
    between structures as well as between sequences.

4
Cloverleaf structure (tRNA)
  • Secondary structure
  • 1 multiple loop with 3 hairpin loops
  • Tertiary structure
  • L-shaped 3D-structure

5
Pseudoknotted structure (tmRNA)
  • E. coli transfer-messenger RNA
  • Hairpin loops
  • Bulge loops
  • Internal loops
  • Multiple loops
  • Pseudoknots

6
NP-hardness of pseudoknotted structure alignment
  • Alignment based on the edit distance between
    pseudoknotted structures has been proven NP-hard.
  • We focus on a subset of pseudoknotted structures
    that can be modeled by a class of grammars called
    SLTAGs.
  • Most pseudoknots found in nature can be modeled
    by SLTAGs.

7
Chomsky-Schützenberger hierarchy
  • Context-free grammars are strong enough to model
    pseudoknot-free secondary structures.
  • Modeling pseudoknotted structures requires
    stronger grammars like context-sensitive grammars.
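
The difference between the two cases comes down to whether base pairs nest or cross. A minimal sketch, assuming a structure is given as a list of 0-based index pairs (the function name `has_pseudoknot` and the toy pair lists are illustrative, not from the slides):

```python
from itertools import combinations

def has_pseudoknot(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l."""
    normalized = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(normalized, 2):
        # After sorting, i <= k. The pairs cross when k falls inside
        # (i, j) while l falls outside it.
        if i < k < j < l:
            return True
    return False

# Toy inputs (assumed for illustration only).
nested = [(0, 9), (1, 8), (2, 7)]    # a stem: every pair nests in the previous one
crossing = [(0, 5), (1, 4), (3, 8)]  # (3, 8) crosses (0, 5)

print(has_pseudoknot(nested))    # False
print(has_pseudoknot(crossing))  # True
```

Nested pairs correspond to the bracket-like dependencies a context-free grammar can generate; crossing pairs are what push the model beyond context-free.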

8
Simple Linear Tree Adjoining Grammars (SLTAGs)
  • A mildly context-sensitive grammar (between CF
    and CS)
  • Growing a tree by replacing -node by a tree
    called the adjoining tree (bolded in left fig.)
  • Terminal symbols derived at the same time are
    considered to form a base-pair.
  • Descriptive power for pseudoknots (left fig.)

[Figure: SLTAG derivation tree, with the adjoining
tree in bold, yielding the sequence 5'-ACUG-3'.]
9
Simple Linear TAG (SLTAG)
  • SLTAG
  • A TAG with the property that any tree derived
    from it has exactly one -node.
  • Hence, a derivation by SLTAGs can be regarded as
    a sequence of symbols for adjoining trees.
  • For example, D A1 A2 A3 A1 A4
  • Known to be expressive enough to model most
    pseudoknots that occur in nature.

10
Challenges in modeling by SLTAGs
  • Ambiguity
  • A grammar may admit multiple derivations of the
    same word.
  • When modeling something by a grammar, its
    ambiguity must be taken into account!
  • How to overcome the ambiguity?
  • Alignment of derivations by SLTAGs [Seki and
    Kobayashi, 2005]
  • Multiple pseudoknots modeling
  • SLTAGs can model an RNA secondary structure with
    1 pseudoknot, but not with multiple pseudoknots.

11
Abstract RNA Structure (ARNAS) model
  • A tree that models an RNA secondary structure by
    representing the relationships among its
    components.
  • Vertices of ARNAS models are
  • String (single base chain)
  • Tandem (also called a stem; a cascade of
    base-pairs)
  • Pseudoknot

12
Example 1: ARNAS model for tRNA cloverleaf
[Figure: tRNA secondary structure next to its ARNAS
model. The model is a tree with a root, tandem nodes
(including those for the D-arm, T-arm, and A-arm),
and SC leaves. SC = single-base chain.]
13
ARNAS components
  • String (can be modeled by regular grammar)
  • A single base chain of maximal length
  • Sequential information only
  • Tandem (can be modeled by context-free grammar)
  • A cascade of base-pairs
  • Information of sequence, of nested base-pairing,
    and of its child components.
  • Pseudoknot (requires context-sensitive grammar)
  • A pseudoknot in a biological sense
  • A pseudoknot structure which can be modeled by
    SLTAGs
  • Information of sequence, of crossing
    base-pairing, and of its child components.
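
The three component kinds and the grammar class each one needs can be sketched as a small tree type. This is only an illustration: the class name `Component`, the field names, and the toy sequences are assumptions, not the authors' data structure.

```python
from dataclasses import dataclass, field
from typing import List

# Grammar class needed per component kind, weakest to strongest,
# following the slide: regular < context-free < SLTAG.
GRAMMAR_RANK = {"string": 0, "tandem": 1, "pseudoknot": 2}
GRAMMAR_NAME = ["regular", "context-free", "SLTAG"]

@dataclass
class Component:
    kind: str                      # "string", "tandem", or "pseudoknot"
    sequence: str = ""             # bases carried by this component
    children: List["Component"] = field(default_factory=list)

def required_grammar(node: Component) -> str:
    """Strongest grammar class needed anywhere in the subtree."""
    def rank(n: Component) -> int:
        return max([GRAMMAR_RANK[n.kind]] + [rank(c) for c in n.children])
    return GRAMMAR_NAME[rank(node)]

# Toy hairpin: a tandem (stem) whose only child is a string (loop).
hairpin = Component("tandem", "GGCC", [Component("string", "AAAA")])
print(required_grammar(hairpin))   # context-free

# Adding a pseudoknot child raises the requirement.
with_knot = Component("tandem", "GGCC", [Component("pseudoknot", "ACUG")])
print(required_grammar(with_knot)) # SLTAG
```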

14
Example 2: ARNAS model for tmRNA
[Figure: tmRNA secondary structure next to its ARNAS
model. The model has root, SC, tandem, and pseudoknot
nodes labeled with sequence fragments: AAAAAAUAGUGAC;
GCUUUAGCAG CUGC UAGAGC; CUUAAUAAC; U;
CGAGG GCGGUU CCUCG AGCCGC; G; GG; UAAAA.]
15
Alignment of ARNAS components
  • ARNAS components can be modeled by SLTAGs.
  • The SLTAG parser [Uemura et al., 1999] provides
    the set of all derivations of each component to
    be aligned.
  • Using dynamic programming, the alignment
    algorithm for SLTAG models [Seki and Kobayashi,
    2005] calculates alignments for all combinations
    of 2 derivations, and finds the optimal alignment
    among them.
  • The components to be aligned may have sub-ARNAS
    models as their children. The alignments of these
    sub-ARNAS models have already been calculated and
    are incorporated into the alignment of these
    components.
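
The "all combinations of 2 derivations" step can be sketched as follows. This is a simplified stand-in, not the Seki and Kobayashi algorithm: each derivation is treated as a flat sequence of adjoining-tree symbols (as on the SLTAG slide) and scored with plain global sequence alignment. The scoring values and function names are assumptions.

```python
from itertools import product

def align(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of two symbol sequences (dynamic programming)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against nothing
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against nothing
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]

def best_over_derivations(derivs1, derivs2):
    """Align every pair of derivations and keep the best score."""
    return max(align(d1, d2) for d1, d2 in product(derivs1, derivs2))

# Toy derivations (assumed): an ambiguous component has two derivations.
derivs1 = [["A1", "A2", "A3"], ["A1", "A4"]]
derivs2 = [["A1", "A4"]]
print(best_over_derivations(derivs1, derivs2))  # 2
```

Trying every pair is what handles ambiguity: a component with several derivations is not penalized for the parser's arbitrary choice among them.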

16
Time-complexity of component alignment algorithm
(Table 1)
  • The algorithm can employ context-free or regular
    grammars as its base-grammar depending on
    components to be aligned.
  • Its time-complexity varies as follows, where s1
    and s2 are the numbers of bases in the components
    to be aligned.

17
ARNAS Alignment algorithm
  • Based on the tree alignment algorithm [Jiang et
    al., 1995], whose time complexity depends on n1
    and n2, the numbers of nodes of the trees to be
    aligned.
  • Scores to edit nodes of ARNAS models are
    alignment scores of corresponding ARNAS
    components.
  • Bottom-up approach
  • Given two ARNAS models, the algorithm
  • calculates alignments between leaf components
    (strings),
  • calculates alignments between their parent
    components based on those alignments,
  • repeats this process until it reaches the
    alignment of the root components, which is the
    alignment between the ARNAS models.
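
The bottom-up flow can be sketched with a much-simplified recursion. This is not the Jiang et al. algorithm: it only pairs children in order and deletes whole subtrees, and `component_score` is a hypothetical placeholder for the real component alignment score. Nodes are `(kind, children)` tuples, an assumed encoding.

```python
GAP = -1  # assumed cost per unmatched node

def component_score(kind1, kind2):
    """Hypothetical stand-in for the alignment score of two components."""
    return 1 if kind1 == kind2 else -1

def size(node):
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(size(c) for c in node[1])

def align_nodes(t1, t2):
    """Score of pairing two nodes; child scores are resolved first."""
    return component_score(t1[0], t2[0]) + align_children(t1[1], t2[1])

def align_children(f1, f2):
    """Ordered DP over two child lists; a skipped child drops its subtree."""
    rows, cols = len(f1) + 1, len(f2) + 1
    dp = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        dp[i][0] = dp[i - 1][0] + GAP * size(f1[i - 1])
    for j in range(1, cols):
        dp[0][j] = dp[0][j - 1] + GAP * size(f2[j - 1])
    for i in range(1, rows):
        for j in range(1, cols):
            dp[i][j] = max(
                dp[i - 1][j - 1] + align_nodes(f1[i - 1], f2[j - 1]),
                dp[i - 1][j] + GAP * size(f1[i - 1]),
                dp[i][j - 1] + GAP * size(f2[j - 1]),
            )
    return dp[-1][-1]

leaf = ("string", ())
stem = ("tandem", (leaf,))
model_a = ("tandem", (leaf, stem, leaf))
model_b = ("tandem", (leaf, leaf))

print(align_nodes(model_a, model_a))  # 5
print(align_nodes(model_a, model_b))  # 1
```

In the second call the best choice is to match the two outer strings and drop the inner stem with its loop, giving 1 + (1 - 2 + 1) = 1.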

18
The time-complexity of ARNAS alignment algorithm
  • Given RNA secondary structures of length n1 and
    n2,
  • Theoretical time complexity is .
  • In practice, it is far more tractable because of
  • The scarcity of pseudoknots
  • Almost all component alignments can be done in
    time.
  • Short-bp property
  • A pseudoknot is much shorter than the secondary
    structure itself.

19
Multiple alignment algorithm
  • Progressive alignment approach
  • Given multiple ARNAS models, find the two ARNAS
    models with the highest similarity and align
    them.
  • The alignment result is itself an ARNAS model,
    so we can repeat this process until all the
    given ARNAS models are aligned.

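[Figure: guide tree. ARNAS2 and ARNAS3 are aligned
into ARNAS(2, 3); ARNAS1 is aligned with that result
to give ARNAS(1, (2, 3)); ARNAS4 is aligned last to
give ARNAS((1, (2, 3)), 4).]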
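
The progressive loop can be sketched as a greedy merge. The `similarity` and `merge` functions below are hypothetical placeholders (a model is reduced to a label plus one made-up feature value) chosen only so that the toy run follows the merge order of the guide tree on this slide.

```python
from itertools import combinations

def progressive_alignment(models, similarity, merge):
    """Repeatedly merge the most similar pair until one model remains."""
    pool = list(models)
    while len(pool) > 1:
        # Pick the index pair (i < j) with the highest similarity.
        i, j = max(combinations(range(len(pool)), 2),
                   key=lambda ij: similarity(pool[ij[0]], pool[ij[1]]))
        pool[i] = merge(pool[i], pool[j])  # the result is again a model
        del pool[j]
    return pool[0]

# Hypothetical stand-ins: each model is (label, feature value).
def similarity(a, b):
    return -abs(a[1] - b[1])               # closer values = more similar

def merge(a, b):
    return ((a[0], b[0]), (a[1] + b[1]) / 2)

models = [(1, 3.0), (2, 1.0), (3, 1.2), (4, 9.0)]
result = progressive_alignment(models, similarity, merge)
print(result[0])  # ((1, (2, 3)), 4)
```

The loop only works because the merge result has the same type as its inputs, which is the point made on the slide: an alignment of ARNAS models is itself an ARNAS model.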
20
Experimental results (1)
  • How many pseudoknotted secondary structures can
    be converted into ARNAS models?
  • INPUT: 675 pseudoknotted RNA structures from the
    Comparative RNA Web (CRW) Site,
    http://www.rna.icmb.utexas.edu.
  • 561 of 675 (83.1%) can be converted into ARNAS
    models.
  • All but one of the RNAs of length up to about
    2400 can be converted.
  • This means that RNA structures rarely contain a
    pseudoknot that cannot be modeled by SLTAGs.

21
Experimental results (2)
  • Short-bp property
  • INPUT: The 561 RNA secondary structures whose
    pseudoknots can be modeled by SLTAGs.
  • Compare the length of each RNA secondary
    structure with the length of the longest
    pseudoknot in it.
  • The least-squares method provides the following
    theoretical curve, where x is the length of the
    secondary structure, and P(x) is the length of
    its longest pseudoknot.

22
Short-bp Property
23
Experimental results (3)
  • An experimental time complexity
  • SETTING
  • Intel(R) Xeon processors, 2.8 GHz x 2, with 2 GB
    memory
  • For comparison, in this environment our original
    algorithm without the ARNAS modification takes
    about 600 sec. to align pseudoknots of around 80
    nucleotides.
  • INPUT: 150 of the 561 RNAs with structural info.
  • RESULT: A theoretical curve between x (the length
    of the RNAs) and T(x) (the alignment time in
    sec.) is as follows
  • It can align RNAs of 2400 nucleotides in about 15
    sec.

24
Running Time
25
Future work
  • Experiments on the accuracy of ARNAS alignment
    algorithm
  • Comparison with other algorithms for
    pseudoknotted RNA alignment