An efficient multiple alignment method for RNA secondary structures including pseudoknots presentation

1
An efficient multiple alignment method for RNA
secondary structures including pseudoknots
2nd International Workshop on Natural Computing,
Dec. 10-12, 2007 Noyori Conference Hall, Nagoya
University, Japan

Shinnosuke Seki (1) and Satoshi Kobayashi (2)

(1) Department of Computer Science, University of
Western Ontario, London, Ontario, Canada, N6A
5B7, sseki_at_csd.uwo.ca
(2) Department of Computer Science, The University
of Electro-Communications, 1-5-1 Chofugaoka,
Chofu, Tokyo, Japan, 182-8585,
satoshi_at_cs.uec.ac.jp
2
Problem setting

INPUT
RNA secondary structures (2 or more)
Sequential info.
Structural info. (which can be obtained from a
database or from a prediction algorithm based on
the sequential info.)
OUTPUT
The alignment of the input RNA secondary
structures as a grammatical model

3
Secondary structure alignment

DNA and RNA sequences fold onto themselves to
form 2D (secondary) or 3D (tertiary)
structures.
These higher-dimensional structures play an
important role in determining biological
functions.
Similar structures may have similar functions.
Structure alignment aims at finding
similarity between structures as well as between
sequences.

4
Cloverleaf structure (tRNA)

Secondary structure
1 multiple loop with 3 hairpin loops
Tertiary structure
L-shaped 3D-structure

5
Pseudoknotted structure (tmRNA)

E. coli transfer-messenger RNA
Hairpin loops
Bulge loops
Internal loops
Multiple loops
Pseudoknots

6
NP-hardness of pseudoknotted structure alignment

Alignment based on the edit distance between
pseudoknotted structures has been proven NP-hard.
We focus on a subset of pseudoknotted structures
which can be modeled by grammars called SLTAGs
(simple linear tree adjoining grammars).
Most pseudoknots found in nature can be modeled by
SLTAGs.

7
Chomsky-Schützenberger hierarchy

Context-free grammars are strong enough to model
pseudoknot-free secondary structures.
Modeling pseudoknotted structures requires
stronger grammars like context-sensitive grammars.
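
The difference between nested and crossing base pairs can be illustrated with a short Python sketch. It assumes the extended dot-bracket notation, which the slides do not use: "(" and ")" mark one family of nested pairs, "[" and "]" a second family, and "." an unpaired base. The function names are illustrative only. Pairs written with a single bracket type can be matched with one stack, much as a context-free grammar would generate them; a pseudoknot shows up as two pairs (i, j) and (k, l) with i < k < j < l.

```python
def parse_pairs(structure):
    """Return the sorted (i, j) base pairs of an extended dot-bracket string."""
    closer_to_opener = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}  # one stack per bracket type
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closer_to_opener:
            stack = stacks[closer_to_opener[ch]]
            if not stack:
                raise ValueError(f"unmatched {ch!r} at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r} at position {pos}")
    if any(stacks.values()):
        raise ValueError("unmatched opening bracket")
    return sorted(pairs)


def crossing_pairs(pairs):
    """Return every couple of base pairs (i, j), (k, l) with i < k < j < l."""
    ordered = sorted(pairs)
    found = []
    for index, (i, j) in enumerate(ordered):
        for k, l in ordered[index + 1:]:
            if i < k < j < l:
                found.append(((i, j), (k, l)))
    return found


hairpin = parse_pairs("((..))")             # nested pairs only
knot = parse_pairs("((..[[..))..]]")        # two stems that cross each other
print(crossing_pairs(hairpin))
print(len(crossing_pairs(knot)))
```

The hairpin yields no crossing couples, while every pair of the first stem crosses every pair of the second stem in the toy pseudoknot.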

8
Simple Linear Tree Adjoining Grammars (SLTAGs)

A mildly context-sensitive grammar (between CF
and CS)
Growing a tree by replacing a marked node with a
tree called the adjoining tree (bolded in left fig.)
Terminal symbols derived at the same time are
considered to form a base pair.
Descriptive power for pseudoknots (left fig.)

[Figure: SLTAG derivation tree with internal nodes labeled S and the adjoining tree in bold, shown with the sequence 5'-A C U G-3'.]
9
Simple Linear TAG (SLTAG)

SLTAG
A TAG with the property that any tree derived
from it has exactly one marked node.
Hence, a derivation by an SLTAG can be regarded as
a sequence of symbols for adjoining trees,
e.g. D = A1 A2 A3 A1 A4.
Known to be descriptive enough to model most
pseudoknots that exist in reality.

10
Challenges in modeling by SLTAGs

Ambiguity
A grammar may admit multiple derivations of the
same word.
When modeling something by a grammar, its
ambiguity must be taken into account!
How to overcome the ambiguity?
Alignment of derivations by SLTAGs (Seki &
Kobayashi, 2005)
Multiple pseudoknots modeling
An SLTAG can model an RNA secondary structure with
1 pseudoknot, but not one with multiple pseudoknots.

11
Abstract RNA Structure (ARNAS) model

A tree structure that models an RNA secondary
structure by representing the relationships among
its components.
The vertices of ARNAS models are:
String (single base chain)
Tandem (also called a stem; a cascade of base pairs)
Pseudoknot

12
Example 1: ARNAS model for tRNA cloverleaf
[Figure: the tRNA secondary structure and its ARNAS model, a tree with a root, tandem vertices (including the D-arm, A-arm, and T-arm), and SC vertices.]
SC: single-base chain
13
ARNAS components

String (can be modeled by regular grammar)
A single base chain of maximal length
Sequential information only
Tandem (can be modeled by context-free grammar)
A cascade of base-pairs
Information of sequence, of nested base-pairing,
and of its child components.
Pseudoknot (requires context-sensitive grammar)
A pseudoknot in a biological sense
A pseudoknot structure which can be modeled by
SLTAGs
Information of sequence, of crossing
base-pairing, and of its child components.
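
A minimal Python sketch of how such a component tree might be represented. The class name, field names, the "root" kind, and the toy cloverleaf-like tree are assumptions made for illustration; only the three component kinds and the grammar class each one needs are taken from the slide.

```python
from dataclasses import dataclass, field
from typing import List

# Grammar class needed for each component kind, as listed on the slide.
GRAMMAR_FOR_KIND = {
    "string": "regular",
    "tandem": "context-free",
    "pseudoknot": "SLTAG",
}
GRAMMAR_RANK = {"regular": 0, "context-free": 1, "SLTAG": 2}


@dataclass
class Component:
    kind: str                      # "root", "string", "tandem" or "pseudoknot"
    label: str = ""                # free-text name, e.g. "D-arm"
    children: List["Component"] = field(default_factory=list)

    def __post_init__(self):
        if self.kind != "root" and self.kind not in GRAMMAR_FOR_KIND:
            raise ValueError(f"unknown component kind: {self.kind}")


def count_vertices(node):
    """Number of vertices in the tree rooted at node."""
    return 1 + sum(count_vertices(child) for child in node.children)


def strongest_grammar(node):
    """Strongest grammar class needed by any component in the tree.

    The root carries no sequence here, so it is treated as 'regular'
    (an assumption of this sketch).
    """
    best = GRAMMAR_FOR_KIND.get(node.kind, "regular")
    for child in node.children:
        candidate = strongest_grammar(child)
        if GRAMMAR_RANK[candidate] > GRAMMAR_RANK[best]:
            best = candidate
    return best


def sc():
    return Component("string", "SC")


def arm(name):
    # A stem closed by a hairpin loop: one tandem with one string child.
    return Component("tandem", name, [sc()])


# Toy cloverleaf-like tree: an outer stem enclosing three arms.
outer_stem = Component(
    "tandem", "outer stem",
    [sc(), arm("D-arm"), sc(), arm("A-arm"), sc(), arm("T-arm"), sc()],
)
cloverleaf = Component("root", "root", [sc(), outer_stem, sc()])

print(count_vertices(cloverleaf), strongest_grammar(cloverleaf))
```

Adding a single pseudoknot vertex anywhere in the tree raises the result of `strongest_grammar` to "SLTAG", which is why the alignment cost of a model depends on how many pseudoknot components it contains.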

14
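Example 2: ARNAS model for tmRNA
[Figure: the tmRNA secondary structure and its ARNAS model, a tree with root, SC, tandem, and pseudoknot vertices. Sequence labels shown in the figure: AAAAAAUAGUGAC; GCUUUAGCAG CUGC UAGAGC; CUUAAUAAC; U; CGAGG GCGGUU CCUCG AGCCGC; G; GG; UAAAA.]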
15
Alignment of ARNAS components

ARNAS components can be modeled by SLTAGs.
The SLTAG parser (Uemura et al., 1999) provides
the set of all derivations of each component to
be aligned.
Using dynamic programming, the alignment
algorithm for SLTAG models (Seki & Kobayashi,
2005) calculates alignments for all combinations
of 2 derivations, and finds the optimal alignment
among them.
The components to be aligned may have sub-ARNAS
models as their children. The alignments of these
sub-ARNAS models have been calculated previously,
and are incorporated into the alignment of these
components.
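
The idea of scoring every combination of two derivations and keeping the best one can be sketched in Python as follows. This is a simplified stand-in, not the algorithm of Seki & Kobayashi (2005): each derivation is treated as a plain sequence of adjoining-tree symbols, and two sequences are compared with a standard global alignment (Needleman-Wunsch style) under unit scores. The scores, the function names, and the example derivations are all assumptions made for illustration.

```python
from itertools import product


def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Score of the best global alignment of two symbol sequences."""
    previous = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        current = [i * gap]
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            current.append(max(
                previous[j - 1] + substitution,  # align a[i-1] with b[j-1]
                previous[j] + gap,               # a[i-1] against a gap
                current[j - 1] + gap,            # b[j-1] against a gap
            ))
        previous = current
    return previous[len(b)]


def best_derivation_pair(derivations1, derivations2):
    """Try every combination of two derivations and keep the best score."""
    return max(
        ((global_alignment_score(d1, d2), d1, d2)
         for d1, d2 in product(derivations1, derivations2)),
        key=lambda item: item[0],
    )


# Hypothetical derivations of two ambiguous components.
component1 = [("A1", "A2", "A3"), ("A1", "A4")]
component2 = [("A1", "A2", "A3", "A1"), ("A4",)]

score, d1, d2 = best_derivation_pair(component1, component2)
print(score, d1, d2)
```

Because every pair of derivations is examined, the cost grows with the product of the numbers of derivations, which is one reason ambiguity matters for the running time.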

16
Time-complexity of component alignment algorithm
(Table 1)

The algorithm can employ context-free or regular
grammars as its base-grammar depending on
components to be aligned.
Its time complexity varies as follows, where s1
and s2 are the numbers of bases in the components
to be aligned.

17
ARNAS Alignment algorithm

Based on the tree alignment algorithm (Jiang et
al., 1995), whose time complexity depends on n1
and n2, the numbers of nodes of the trees to be
aligned.
The score for editing a node of an ARNAS model is
the alignment score of the corresponding ARNAS
components.
Bottom-up approach
Given two ARNAS models, the algorithm
calculates alignments between leaf components
(strings),
calculates alignments between their parent
components based on those alignments,
repeats this process until it reaches the
alignment of the root components, which is the
alignment between the ARNAS models.
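
The bottom-up scheme can be sketched in Python as follows. This is a deliberately simplified stand-in, not the tree alignment algorithm of Jiang et al. (1995) and not the authors' implementation: a vertex is a nested tuple `(kind, children)`, the component score is a toy function of the two kinds, and the ordered child lists of two vertices are aligned by a sequence-style dynamic program in which skipping a child costs one gap per vertex of its subtree. All names and score values are assumptions. Memoization ensures that the score of each pair of subtrees is computed once and reused by its parents.

```python
from functools import lru_cache

GAP = -1  # assumed cost per vertex left unaligned


def size(tree):
    """Number of vertices in a (kind, children) tree."""
    return 1 + sum(size(child) for child in tree[1])


def component_score(kind1, kind2):
    """Toy stand-in for the alignment score of two components."""
    return 2 if kind1 == kind2 else -1


@lru_cache(maxsize=None)
def align_score(tree1, tree2):
    """Score of aligning two trees root to root."""
    kids1, kids2 = tree1[1], tree2[1]
    rows, cols = len(kids1), len(kids2)
    table = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        table[i][0] = table[i - 1][0] + GAP * size(kids1[i - 1])
    for j in range(1, cols + 1):
        table[0][j] = table[0][j - 1] + GAP * size(kids2[j - 1])
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            table[i][j] = max(
                # align the two child subtrees (score reused from the cache)
                table[i - 1][j - 1] + align_score(kids1[i - 1], kids2[j - 1]),
                # leave a whole child subtree unaligned
                table[i - 1][j] + GAP * size(kids1[i - 1]),
                table[i][j - 1] + GAP * size(kids2[j - 1]),
            )
    return component_score(tree1[0], tree2[0]) + table[rows][cols]


leaf = ("string", ())
hairpin = ("tandem", (leaf,))
model1 = ("root", (leaf, hairpin, leaf))
model2 = ("root", (leaf, hairpin))

print(align_score(model1, model2))
```

In the approach described on the slide, the toy `component_score` would be replaced by the component alignment scores from the previous slides.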

18
The time-complexity of ARNAS alignment algorithm

Given RNA secondary structures of lengths n1 and
n2,
Theoretical time complexity is .
In practice, it is not so intractable, because of
The scarcity of pseudoknots
Almost all component alignments can be done in
time.
Short-bp property
A pseudoknot is much shorter than the secondary
structure itself.

19
Multiple alignment algorithm

Progressive alignment approach
Given multiple ARNAS models, find the two ARNAS
models with the highest similarity and align them.
The alignment result is also an ARNAS model, so
we can repeat this process until all the given
ARNAS models are aligned.

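[Figure: guide tree of the progressive alignment. ARNAS2 and ARNAS3 are aligned into ARNAS(2, 3); ARNAS1 is aligned with it into ARNAS(1, (2, 3)); ARNAS4 is aligned last, giving ARNAS((1, (2, 3)), 4).]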
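
A minimal Python sketch of the progressive loop. The `similarity` and `merge` callables are placeholders supplied by the caller; in the toy run below each model is just a guide-tree label plus a set of made-up features, similarity is the Jaccard index, and merging takes the union. None of this is the authors' scoring; it only shows the control flow of repeatedly aligning the most similar pair.

```python
from itertools import combinations


def progressive_alignment(models, similarity, merge):
    """Repeatedly merge the most similar pair until one model remains."""
    pool = list(models)
    if not pool:
        raise ValueError("at least one model is required")
    while len(pool) > 1:
        i, j = max(
            combinations(range(len(pool)), 2),
            key=lambda pair: similarity(pool[pair[0]], pool[pair[1]]),
        )
        pool[i] = merge(pool[i], pool[j])  # the result is again a model
        del pool[j]                        # j > i, so index i stays valid
    return pool[0]


def jaccard(model_a, model_b):
    features_a, features_b = model_a[1], model_b[1]
    return len(features_a & features_b) / len(features_a | features_b)


def union_merge(model_a, model_b):
    return ((model_a[0], model_b[0]), model_a[1] | model_b[1])


# Toy models: (guide-tree label, made-up feature set).
toy_models = [
    (1, frozenset("abc")),
    (2, frozenset("abcd")),
    (3, frozenset("abcde")),
    (4, frozenset("ax")),
]

guide_tree, _ = progressive_alignment(toy_models, jaccard, union_merge)
print(guide_tree)
```

With these toy features the merge order reproduces the shape of the guide tree in the figure: models 2 and 3 first, then model 1, then model 4.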
20
Experimental results (1)

How many pseudoknotted secondary structures can
be converted into ARNAS models?
INPUT: 675 pseudoknotted RNA structures from the
Comparative RNA Web (CRW) Site,
http://www.rna.icmb.utexas.edu.
561 of 675 (83.1%) can be converted into ARNAS
models.
All but one of the RNAs of length up to about 2400
can be converted.
This suggests that RNA structures rarely contain a
pseudoknot which cannot be modeled by SLTAGs.
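
The conversion rate can be checked from the two counts given above; the variable names are illustrative.

```python
total = 675       # pseudoknotted structures taken from the CRW site
converted = 561   # structures that could be converted into ARNAS models

rate = 100 * converted / total
not_converted = total - converted
print(f"{rate:.1f}% converted, {not_converted} not converted")
```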

21
Experimental results (2)

Short-bp property
INPUT: the 561 RNA secondary structures whose
pseudoknots can be modeled by SLTAGs.
Compare the length of each RNA secondary structure
with the length of the longest pseudoknot in it.
The least-squares method provides a theoretical
curve P(x), where x is the length of the
secondary structure and P(x) is the length of
its longest pseudoknot.
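
A minimal Python sketch of a least-squares fit, for readers unfamiliar with the method. The functional form of the fitted curve is not preserved in this transcript, so a straight line P(x) = a*x + b is assumed here purely for illustration, and the points below are synthetic, not the measured data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points lying exactly on y = 0.5 * x + 2 (not experimental data).
xs = [0, 2, 4, 6]
ys = [2, 3, 4, 5]

slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

A different assumed form, such as a power law, can be fitted with the same routine after transforming both axes to logarithms.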

22
Short-bp Property
23
Experimental results (3)

An experimental time complexity
SETTING
Intel(R) Xeon 2.8 GHz processors (x2) with 2 GB
memory
Cf. in this environment, our original algorithm
without the ARNAS modification takes about 600
seconds to align pseudoknots of length around 80
nucleotides.
INPUT: 150 of the 561 RNAs with structural info.
RESULT: a theoretical curve was fitted between x
(the length of the RNAs) and T(x) (the alignment
time in seconds).
It can align RNAs of 2400 nucleotides in about 15
seconds.
