On The RNA Structure Prediction Problems: Structural Inference Technique and Other Recent Algorithms

1
On The RNA Structure Prediction Problems
Structural Inference Technique and Other Recent
Algorithms

Hugo Willy
HT030031E

2
Content of Presentation

Introduction
RNA and Its Functions
RNA Structures
Brief Review on Computational Methods on RNA
Structure Prediction
Ab Initio Predictive Methods
Comparison Methods
Inference Methods
Current Work
Introduction
Preliminaries and Problem Definition
Previous Approach
Sparsification
Recursive Dynamic Programming
Hirschberg-like Recursive Traceback Technique
Conclusion and Future Direction
References

3
RNA

Stands for Ribonucleic Acid
A biological polymer consisting of monomers called nucleotides
Each nucleotide consists of a (ribose) sugar, a phosphate group and a base.
There are four main types of base in RNA sequences: adenine (A), cytosine (C), guanine (G) and uracil (U).

4
RNA
Watson-Crick Base Pairing
Wobble Base Pairing
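
The two pairing rules named on this slide can be written down as a small lookup. This is a minimal sketch, assuming only the canonical Watson-Crick pairs (A-U, G-C) and the G-U wobble pair; the names `WATSON_CRICK`, `WOBBLE`, `pair_type` and `can_pair` are illustrative and do not come from the presentation.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair (assumed rule set).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pair_type(x: str, y: str) -> str:
    """Classify the pair (x, y) as 'watson-crick', 'wobble' or 'none'."""
    pair = (x.upper(), y.upper())
    if pair in WATSON_CRICK:
        return "watson-crick"
    if pair in WOBBLE:
        return "wobble"
    return "none"


def can_pair(x: str, y: str) -> bool:
    """True if the two bases form a Watson-Crick or a wobble pair."""
    return pair_type(x, y) != "none"
```

For example, `pair_type("G", "C")` is classified as Watson-Crick and `pair_type("G", "U")` as wobble, while `can_pair("A", "G")` is false.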
5
RNA Functions

DNA transcription and translation
Transcription: Messenger RNA (mRNA)
Translation: Transfer RNA (tRNA), Ribosomal RNA (rRNA)
Catalyst and regulator in nucleic acid processing and gene expression
Messenger RNA splicing: Small Nuclear RNA (snRNA)
rRNA processing in the nucleus: Small Nucleolar RNA (snoRNA)
Regulators: Micro RNA (miRNA), which has two types:
1) Small Interfering RNA (siRNA) and
2) Small Temporally Regulated RNA (stRNA)

6
RNA Primary Structure

The view of RNA as a sequence of nucleotide bases
Commonly represented by a string S over the alphabet Σ = {A, C, G, U}
Can be determined using techniques similar to those for DNA sequencing, such as gel electrophoresis.

7
RNA Secondary Structures
Helices
Bulge Loop
Hairpin Loop
Internal Loop
Multi Loop
8
RNA Tertiary Structures
Pseudoknot
Base Triple
9
RNA Structure

To a preserved function there corresponds a preserved molecular conformation.
Secondary and tertiary structures are being solved much more slowly than new RNA sequences are being discovered. Existing experimental methods are relatively expensive and slow.
Denatured RNAs deterministically fold back to their original conformation in vitro. Thus, RNA structure depends solely on its nucleotide content, so a computational method should exist!

10
Existing Computational Methods for RNA Structure
Prediction

Ab Initio Predictive Methods
Try to compute the RNA structure based solely on its nucleotide content, by minimizing the free energy of the predicted structure.
Comparative Methods using sequence homology
By examining a set of homologous sequences along with their covarying positions, we can predict interactions between non-adjacent positions in the sequence, such as base pairs, triples, etc.
Structural Inference Methods
Given a sequence with a known structure, we infer the structure of another sequence known to be similar to the first one by maximizing some similarity function.

11
Ab Initio Predictive Methods

Minimizing the total free energy of the predicted structure
Uses experimentally determined free energies of local structures.
Best result without pseudoknots: O(n³) time [1]
Best results with restricted pseudoknots:
Simple pseudoknot [2]: O(n⁴) time
Recursive pseudoknot [2]: O(n⁵) time
General pseudoknot [2]: NP-hard

12
Ab Initio Predictive Methods

Equilibrium Partition Function
A weighted sum over all possible structures, where the weight of each structure is computed from its free energy.
Best result without pseudoknots: O(n³) time [1]
Best result with restricted pseudoknots: O(n⁵) time [3] (restricted to the class of pseudoknots that are physically most likely to occur)
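
In symbols, the equilibrium partition function is usually written as below. The notation is an assumption added here and does not appear on the slide: Ω is the set of admissible structures of the sequence, ΔG(s) is the free energy of structure s, R is the gas constant and T is the temperature. The second formula gives the equilibrium probability of a single structure.

```latex
Z = \sum_{s \in \Omega} \exp\!\left(-\frac{\Delta G(s)}{RT}\right),
\qquad
P(s) = \frac{1}{Z}\,\exp\!\left(-\frac{\Delta G(s)}{RT}\right)
```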

13
Comparative Methods

Simultaneous Sequence Structure Alignment
First proposed by D. Sankoff [4], with O(n⁶) time complexity
Best result without pseudoknots: O(M³n³) [5], where M is the maximum distance between the two sequences
Stochastic Context Free Grammars (SCFG)
Use a context-free grammar to produce the base pairings with some probability distribution (hence the term stochastic)
Can only handle non-pseudoknotted structures
Most recent result in [6]
A new model called Parallel Communicating Grammar System (PCGS) is designed in [7] to handle pseudoknots

14
Comparative Methods

Maximum Weighted Matching
Based on Gabow's Maximum Weighted Matching algorithm. Tries to find the base pairs in the structure given the likelihood scores of all possible pairs.
Computing the likelihood scores might require multiple sequence alignment (slow)
The most recent publications are [8] and [9]. [8] only considers bi-secondary RNA structures, while [9] tries to find helices of some minimum length in the sequences and then aligns them.

15
Comparative Methods

Iterative Loop Matching [10]
Applies the Loop Matching algorithm by Nussinov et al. The algorithm finds a non-pseudoknotted structure in each iteration and then runs the same algorithm on the remaining unpaired bases.
Genetic algorithm [11]
Finds a set of possible structures. The algorithm then passes these structures through several stages of evolution.
Bayesian Network and other approaches
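
The loop matching step referred to under Iterative Loop Matching can be illustrated with a Nussinov-style dynamic program. This is a simplified sketch, not the scoring used in [10]: it assumes every allowed pair has weight 1 (so it maximizes the number of nested base pairs), it performs a single pass rather than iterating, and the names `PAIRS`, `nussinov_pairs` and `min_loop` are hypothetical.

```python
# Allowed pairs: Watson-Crick plus G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def _score(dp, i, j):
    """dp[i][j], treating empty intervals (i > j) as score 0."""
    return dp[i][j] if i <= j else 0


def nussinov_pairs(seq: str, min_loop: int = 0) -> list[tuple[int, int]]:
    """Return a maximum-size set of nested base pairs (0-based indices).

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    Runs in O(n^3) time and O(n^2) space.
    """
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case: position j stays unpaired
            for k in range(i, j - min_loop):  # case: k pairs with j
                if (seq[k], seq[j]) in PAIRS:
                    cand = _score(dp, i, k - 1) + 1 + _score(dp, k + 1, j - 1)
                    best = max(best, cand)
            dp[i][j] = best

    # Standard traceback over the filled table.
    pairs = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in PAIRS:
                cand = _score(dp, i, k - 1) + 1 + _score(dp, k + 1, j - 1)
                if cand == dp[i][j]:
                    pairs.append((k, j))
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return sorted(pairs)
```

An iterative scheme in the spirit of the slide would remove the paired positions found here and run the same routine again on the remaining unpaired bases.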

16
Structural Inferring Methods

Given two RNA sequences S1 and S2, where the secondary structure of S1 is known, methods of this class infer the secondary structure of S2 by aligning S1 and S2. Let the length of S1 be n and the length of S2 be m.
Previously, Bafna et al. used dynamic programming to solve the problem in O(n²m² + nm³) time and O(n²m²) space [12]. K. Zhang improved the result to O(nm³) time and O(nm²) space [13].
We further improve it to O(nm² log n) time and O(m² log n) space. The algorithm will be described later.
The survey of works related to this class of
algorithm can be found here

17
Current Work

We submitted a paper to WABI 2004 under the title "A Faster and More Space-Efficient Algorithm for Inferring Arc-Annotations of RNA Sequences through Alignment".
Our contributions:
Improvement in running time by sparsification and recursive dynamic programming
Improvement in space requirement using score-only dynamic programming with a Hirschberg-like traceback algorithm and compression.

18
Preliminaries

Consider two RNA sequences S1 and S2 of length n and m respectively. Only the secondary structure of S1 is provided.
To represent a base pair between the bases S1[i] and S1[j], we use the pair (i, j), called an arc, where 1 ≤ i < j ≤ n. The structure of S1 can then be defined by a set P1 of arcs. The pair (S1, P1) is called an arc-annotated sequence.
For RNA, S1[i] and S1[j] must be complementary to each other.

19
Preliminaries

For secondary structures, the corresponding arc annotation is the Nested Arc Annotation.
Any two arcs (i, j), (k, l) in a nested arc annotation P1 satisfy i < k < j ⇔ i < l < j.
For any arc u in P1, let u_l be its left endpoint and u_r be its right endpoint. The size of an arc is u_r − u_l + 1.
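
The nestedness condition and the arc size can be checked mechanically. The sketch below represents an arc as a tuple (u_l, u_r) of 1-based positions; the function names are illustrative, and the extra requirement that no two arcs share an endpoint is an assumption added here (each base takes part in at most one pair), not something stated on the slide.

```python
from itertools import permutations

Arc = tuple[int, int]  # (u_l, u_r) with u_l < u_r


def arc_size(arc: Arc) -> int:
    """Size of an arc as defined on the slide: u_r - u_l + 1."""
    u_l, u_r = arc
    return u_r - u_l + 1


def is_nested(arcs: set[Arc]) -> bool:
    """Check that a set of arcs is a nested arc annotation."""
    # Assumption of this sketch: arcs must not share endpoints.
    endpoints = [p for arc in arcs for p in arc]
    if len(endpoints) != len(set(endpoints)):
        return False
    # Slide condition for every ordered pair: i < k < j  <=>  i < l < j.
    for (i, j), (k, l) in permutations(arcs, 2):
        if (i < k < j) != (i < l < j):
            return False
    return True
```

For instance, {(1, 10), (2, 5), (6, 9)} passes the check, while the crossing arcs {(1, 5), (3, 8)} do not.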

20
Alignment Score Function

Unpaired base alignment score function
δ(S1[i], S2[j]) = β if S1[i] and S2[j] are complementary,
0 otherwise

Arc Alignment Score Function

α1, α2, and β are positive integers.

21
Problem Formulation

The Weighted Largest Common Substructure (WLCS) of two arc-annotated sequences (S1, P1) and (S2, P2) is the maximum weighted alignment between S1 and S2 in which unpaired bases are aligned to unpaired bases and arcs are aligned to arcs.
The problem we address is: given (S1, P1) and S2, infer the arc annotation P2 of S2 such that their WLCS is maximized.

22
Previous Algorithm
EXTEND(DP(i, i))
MERGE(DP(i, u_l − 1), DP(u_l, u_r))
ARC-MATCH(DP(i, i))
23
Previous Algorithm (2)

All EXTEND operations take O(nm²) time and space.
All ARC-MATCH operations also take O(nm²) time and space.
The bottleneck of the computation is the MERGE procedure: each call requires O(m³) time, yielding a total of O(nm³) time.

24
Sparsification Technique

Based on the observation that the entries in each row of the table DP are monotonically non-decreasing, we do not need to check all possible j in the MERGE equation. Instead, we check only the positions j where the corresponding DP entries are distinct.
This way, we can reduce the cost of each MERGE operation to
O(min{|u_l − u_r|, |u_l − i|} · m²)
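
The idea behind sparsification can be illustrated on a toy version of the maximization. The sketch below assumes a simplified form of the MERGE step, namely maximizing a[j] + b[j] over j, where a is a non-decreasing row and b is a non-increasing row; this form and the names `breakpoints`, `merge_max_naive` and `merge_max_sparse` are assumptions made for illustration, since the slides do not spell out the MERGE equation. Within a run of equal values of a, the first position is always at least as good as the others, so only positions where a takes a new value need to be examined.

```python
def breakpoints(row: list[int]) -> list[int]:
    """Indices where a non-decreasing row takes a new value (index 0 included)."""
    return [j for j in range(len(row)) if j == 0 or row[j] != row[j - 1]]


def merge_max_naive(a: list[int], b: list[int]) -> int:
    """Maximum of a[j] + b[j], checking every position j."""
    return max(a[j] + b[j] for j in range(len(a)))


def merge_max_sparse(a: list[int], b: list[int]) -> int:
    """Same maximum, checking only the breakpoints of a.

    Correct when a is non-decreasing and b is non-increasing: inside a
    plateau of a, b can only get smaller, so the plateau's first index wins.
    """
    return max(a[j] + b[j] for j in breakpoints(a))
```

The saving comes from the number of distinct values in a row being much smaller than the row length when scores are small integers.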

25
Recursive Dynamic Programming

An arc u is the parent of an arc v iff u_l < v_l < v_r < u_r and there is no arc w such that u_l < w_l < v_l < v_r < w_r < u_r.
Conversely, v is a child of u.
Let core-arc(u) be the child of u with the largest size.
Let side-arc(u) be the set of children of u excluding its core-arc.
Let core-path(u) be the transitive closure of core-arc(u), i.e. the chain core-arc(u), core-arc(core-arc(u)), and so on.
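
These definitions translate directly into a small amount of code. The following is a sketch under stated assumptions: arcs are tuples (u_l, u_r), the input set is a nested arc annotation with distinct endpoints, ties for the largest child are broken by taking the leftmost one, and all function names are illustrative rather than taken from the paper.

```python
Arc = tuple[int, int]  # (u_l, u_r)


def size(arc: Arc) -> int:
    """Arc size u_r - u_l + 1."""
    return arc[1] - arc[0] + 1


def children_of(arcs: set[Arc]) -> dict[Arc, list[Arc]]:
    """Map every arc to the list of its children, ordered left to right."""
    kids: dict[Arc, list[Arc]] = {u: [] for u in arcs}
    open_arcs: list[Arc] = []  # arcs that currently enclose the scan position
    for v in sorted(arcs):  # increasing left endpoint
        while open_arcs and open_arcs[-1][1] < v[0]:
            open_arcs.pop()  # this arc closed before v starts
        if open_arcs:
            kids[open_arcs[-1]].append(v)  # innermost enclosing arc is the parent
        open_arcs.append(v)
    return kids


def core_arc(u: Arc, kids: dict[Arc, list[Arc]]):
    """Largest child of u, or None if u has no children."""
    return max(kids[u], key=size) if kids[u] else None


def side_arcs(u: Arc, kids: dict[Arc, list[Arc]]) -> list[Arc]:
    """Children of u other than its core arc."""
    core = core_arc(u, kids)
    return [v for v in kids[u] if v != core]


def core_path(u: Arc, kids: dict[Arc, list[Arc]]) -> list[Arc]:
    """Chain core-arc(u), core-arc(core-arc(u)), ... until a childless arc."""
    path = []
    v = core_arc(u, kids)
    while v is not None:
        path.append(v)
        v = core_arc(v, kids)
    return path
```

With the arcs {(1, 20), (2, 5), (6, 18), (7, 9), (10, 17)}, the core arc of (1, 20) is (6, 18), its only side arc is (2, 5), and its core path continues through (10, 17).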

26
Recursive Dynamic Programming
[Figure with regions labelled: Left Part, Right Part, Computed]
27
Running Time Analysis

Since we compute the DP table only for side arcs, and since the size of any side arc does not exceed half the size of its parent, the recursion reaches at most log n levels.
In each level, the total time spent by MERGE is at most O(nm²).
Thus the total running time is still dominated by the MERGE operation, which takes O(nm² log n) in total.
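
The argument can be spelled out as a short derivation. The notation is assumed for this note: |v| denotes the size of an arc v, v_k is a side arc handled at recursion level k, and T(n, m) is the total time spent in MERGE.

```latex
\begin{aligned}
|v_k| &\le \tfrac{1}{2}\,|v_{k-1}| \le \cdots \le 2^{-k}\,n
  \quad\text{and}\quad |v_k| \ge 1
  \;\Longrightarrow\; k \le \log_2 n, \\
T(n, m) &\le \sum_{k=0}^{\lfloor \log_2 n \rfloor} O(nm^2) = O(nm^2 \log n).
\end{aligned}
```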

28
Space Improvement

In order to use the standard traceback, we need to store all the tables corresponding to the arcs in P1.
This requires O(nm²) storage. For sequences of length 3,000 to 5,000, which are commonly used in lab experiments, the storage requirement can reach tens of gigabytes.
Solution: use the score-only version of the dynamic programming and recover the alignment by recursion.

29
Hirschberg-like Trace Back Algorithm
1. Find two points p1 and p2 in S1 such that p2 − p1 is at least n/3.
2. Find the positions in S2 to which p1 and p2 are aligned.
3. Divide the problem into two subproblems, each a constant fraction of the size of the original.
4. Summing up the decreasing geometric series, we still have the same running time as before.
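
For readers unfamiliar with the underlying idea, here is the classic Hirschberg technique applied to the plain longest common subsequence of two strings. It is not the arc-annotated algorithm of this talk: it splits at the midpoint of the first string rather than at two points p1 and p2, and the function names are illustrative. It does show the same two ingredients, a score-only dynamic program in linear space and a recursive split that recovers the alignment.

```python
def lcs_last_row(a: str, b: str) -> list[int]:
    """Score-only DP: row[j] = LCS length of a and b[:j], in O(len(b)) space."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev


def hirschberg_lcs(a: str, b: str) -> str:
    """Recover one longest common subsequence using only linear space."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # Scores of the first half of a against every prefix of b ...
    left = lcs_last_row(a[:mid], b)
    # ... and of the second half of a against every suffix of b.
    right = lcs_last_row(a[mid:][::-1], b[::-1])
    # Find where the midpoint of a is aligned in b.
    k = max(range(len(b) + 1), key=lambda t: left[t] + right[len(b) - t])
    # Solve the two independent subproblems recursively.
    return hirschberg_lcs(a[:mid], b[:k]) + hirschberg_lcs(a[mid:], b[k:])
```

Each level of the recursion does work proportional to the total size of its subproblems, and those sizes shrink geometrically, which is the same series argument as in step 4.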
30
Conclusion

In general, much remains to be done in RNA structure prediction.
We considered a problem of RNA structure inference, where we infer the structure of an RNA sequence given a similar sequence with known structure.
Our technique is general enough that it can also directly solve the LAPCS problem mentioned elsewhere [14].
We wish to handle pseudoknots in the future by applying the algorithm iteratively, following the idea of Iterative Loop Matching.

31
References

[1] R. B. Lyngsø, M. Zuker, and C. N. S. Pedersen. Internal loops in RNA secondary structure prediction. In ICMB, pages 260-267, 1999.
[2] T. Akutsu. Dynamic programming algorithms for RNA secondary structure with pseudoknots. In Disc. Appl. Math, volume 104, pages 45-62, 2000.
[3] R. M. Dirks and N. A. Pierce. A partition function algorithm for nucleic acid secondary structure including pseudoknots. In J. Comput. Chem., volume 24, pages 1664-1677, 2003.
[4] D. Sankoff. Simultaneous solution of the RNA folding, alignment and protosequence problems. In SIAM J. Appl. Math, volume 45, pages 810-825, 1985.
[5] D. Mathews and D. Turner. Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. In J. Mol. Biol, volume 317, pages 191-203, 2002.
[6] B. Knudsen and J. Hein. RNA secondary structure prediction using stochastic context free grammars and evolutionary history. In Bioinformatics (6), volume 15, pages 446-454, 1999.
[7] L. M. Cai, R. L. Malmberg, and Y. Z. Wu. Stochastic modeling of RNA pseudoknotted structures: a grammatical approach. In Bioinformatics (suppl. 3), volume 15, pages 166-173, 2003.

32
References (2)

[8] C. Witwer, I. L. Hofacker, and P. F. Stadler. Prediction of consensus RNA secondary structures including pseudoknots. To appear in European Conference on Computational Biology, 2004.
[9] Yongmei Ji, Xing Xu, and G. D. Stormo. A graph theoretical approach to predict common RNA secondary structure motifs including pseudoknots in unaligned sequences. To appear in Bioinformatics, 2004.
[10] Jianhua Ruan, G. D. Stormo, and Weixiong Zhang. An iterated loop matching approach to the prediction of RNA secondary structures with pseudoknots. In Bioinformatics (1), volume 20, pages 58-66, 2004.
[11] J. H. Chen, S. Y. Le, and J. V. Maizel. Prediction of common secondary structures of RNAs: a genetic algorithm approach. In Nuc. Acids Res. (4), volume 28, pages 991-999, 2000.
[12] V. Bafna, S. Muthukrishnan, and R. Ravi. Computing similarity between RNA strings. In CPM, volume 937, pages 1-16. Springer-Verlag, 1995.
[13] K. Zhang. Computing similarity between RNA secondary structures. In IEEE International Joint Symposia on Intelligence and Systems, pages 126-132, 1998.

33
References (3)

[14] T. Jiang, G. H. Lin, B. Ma, and K. Zhang. The longest common subsequence problem for arc-annotated sequences. In Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching, volume 1848, pages 154-165. Springer-Verlag, 2000.
