RNA secondary structure prediction

Slides: 21

Provided by: isrecI

1
RNA secondary structure prediction

Introduction
  Examples of RNA molecules
  Secondary structure elements
  Pseudo-knots
RNA folding
  Nussinov algorithm
  Energy minimization
  Covariance analysis
RNA secondary structure motifs
  Examples, biological function
  RNA secondary structure patterns

2
Basics about RNA (for computer scientists)

RNA initially synthesized as co-linear copy of DNA
U replaces T (however, U represented as T in nucleotide database entries)
RNA may undergo splicing and other post-transcriptional modifications
Two major RNA classes in cellular organisms:
  messenger RNA (mRNA): templates for protein synthesis
  structural and catalytic RNAs
The genome of many viruses (e.g. HIV) consists of RNA
RNA is usually single-stranded (exception: a few viral genomes)
RNA folds back onto itself to form short base-paired regions
As in DNA, base-paired regions form anti-parallel helices
Same base-pairing rules as for DNA, but U-G pairs also permitted
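
The pairing rules above can be written down as a small lookup. This is only a sketch: the names `to_rna`, `can_pair` and `forms_helix` are made up for illustration, and the helix check assumes two segments of equal length with no bulges.

```python
# Allowed RNA base pairs: the two Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def to_rna(seq):
    """Convert a database-style entry (U written as T) to RNA letters."""
    return seq.upper().replace("T", "U")


def can_pair(x, y):
    """True if bases x and y may form a base pair."""
    return (x.upper(), y.upper()) in PAIRS


def forms_helix(left, right):
    """True if `left` and `right` can form an anti-parallel helix:
    the first base of `left` pairs with the last base of `right`, and so on."""
    if len(left) != len(right):
        return False
    return all(can_pair(a, b) for a, b in zip(left, reversed(right)))
```

For instance, `forms_helix("GGA", "UCC")` holds because G-C, G-C and A-U pair when the second segment is read backwards.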

3
Examples of structural and/or catalytic RNAs
ribosomal RNA (rRNA)
transfer RNA (tRNA)
small nuclear RNA (snRNA, e.g. U1)
small nucleolar RNAs (snoRNA)
small cytoplasmic RNA (scRNA, e.g. 7SL-RNA)
microRNAs (miRNA)
4
RNA secondary structure elements: Terminology
5
Purpose of RNA folding algorithm

Prediction of the native secondary structure of an RNA molecule
  Formally, the secondary structure of an RNA consists of all pairs of bases that interact with each other, usually through standard Watson-Crick base pairs.
Recognition of RNA functional motifs
  RNA molecules may contain regulatory motifs that interact with RNA-binding proteins
  Such motifs may have a conserved secondary structure in addition to conserved primary structure elements.

6
Pseudo-knots
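Pseudo-knots cause problems for ordinary RNA folding algorithms.
Pseudo-knots imply an arrangement of two interacting base pairs of the type a b a b, where the first a pairs with the second a and the first b with the second b, so that the two pairs interleave instead of nesting.
Such structures require intersecting lines in the following type of representation:
U U C C G A A G C U C A A C G G G A A A A U G A G C U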
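
The interleaving condition can be checked directly on a list of base pairs. A minimal sketch, assuming pairs are given as position tuples (the function names are hypothetical):

```python
from itertools import combinations


def crossing(p, q):
    """True if base pairs p and q interleave (i < k < j < l)."""
    (i, j), (k, l) = sorted([tuple(sorted(p)), tuple(sorted(q))])
    return i < k < j < l


def has_pseudoknot(pairs):
    """True if any two base pairs in the structure cross each other."""
    return any(crossing(p, q) for p, q in combinations(pairs, 2))
```

Nested pairs such as (1,10) and (2,9) and side-by-side pairs such as (1,5) and (6,10) pass the check; (1,10) together with (5,15) does not.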
7
RNA secondary structure notation

RNA secondary structures can be specified by a sequence of the three symbols -, >, <.
Base pairs can be reconstructed as follows:
  process the sequence from left to right
  if a base is marked -, leave it unpaired
  if a base is marked >, wait
  if a base is marked <, connect it to the closest unpaired base marked > on its left side

AAGACUUCGGAUCUGGCGACACCC
-->>>----<-<<->>->---<<<
Note: works only if no pseudoknots occur.
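
The reconstruction procedure amounts to a stack: each opening symbol `>` waits on the stack, and each closing symbol `<` takes the most recent one. A minimal sketch (the function name `parse_structure` is an assumption; positions are reported 1-based):

```python
def parse_structure(notation):
    """Return the base pairs (1-based positions) encoded by a string
    over the symbols '-', '>' and '<'."""
    waiting = []   # positions marked '>' that have no partner yet
    pairs = []
    for pos, symbol in enumerate(notation, start=1):
        if symbol == "-":
            continue                          # unpaired base
        elif symbol == ">":
            waiting.append(pos)               # wait for a partner
        elif symbol == "<":
            if not waiting:
                raise ValueError(f"no open base for '<' at position {pos}")
            pairs.append((waiting.pop(), pos))  # closest unpaired '>' on the left
        else:
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if waiting:
        raise ValueError(f"unmatched '>' at positions {waiting}")
    return pairs
```

Applied to the 24-symbol example string, this yields the pairs (5,10), (4,12), (3,13), (18,22), (16,23) and (15,24), which correspond to C-G, A-U, G-C, G-C, G-C and G-C in the sequence.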
8
Nussinov algorithm: Principle
Objective: To find the secondary structure with the maximal number of base pairs under the pseudo-knot exclusion constraint.
Principle: Recursive procedure (dynamic programming algorithm).
Scoring function: sum of base-pair scores, no penalties for loops.
Optimal score computed from the optimal scores of subsequences.
Filling stage: Scores for subsequences are recursively computed from and recorded in a quadratic table.
Trace-back: Reconstruction of filling steps indicates optimal structure.
Time complexity: O(N^3)
Limitations:
  No pseudo-knots
  No constraints on loop lengths
  No penalties for bulge loops
  No scoring terms for base-pair stacking interactions (see later)
9
Nussinov algorithm: extension operations
10
Nussinov algorithm: fill-stage
Scoring system: d(i,j) = 1 for all RNA Watson-Crick base pairs including G-U, else d(i,j) = 0.
Blue: addition of unpaired base 3 or 7
Green: addition of paired bases 1,7
Pink: joining of substructures 1..4 and 5..8
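
The fill stage can be sketched as follows, using the scoring system above and the three kinds of extension (unpaired base, paired bases, joining of substructures). Function and variable names are illustrative, and, as in the slides, no minimum loop length is enforced.

```python
# Pairs scored 1 by d(i,j): Watson-Crick pairs plus the G-U pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def d(x, y):
    """Base-pair score: 1 if x and y can pair, else 0."""
    return 1 if (x, y) in PAIRS else 0


def nussinov_fill(seq):
    """Return the table g where g[i][j] is the maximal number of
    base pairs on the subsequence seq[i..j] (0-based, inclusive)."""
    n = len(seq)
    # Entries on and below the diagonal stay 0 (empty or single-base spans).
    g = [[0] * n for _ in range(n)]
    for span in range(1, n):                      # shorter subsequences first
        for i in range(n - span):
            j = i + span
            best = max(
                g[i + 1][j],                          # base i left unpaired
                g[i][j - 1],                          # base j left unpaired
                g[i + 1][j - 1] + d(seq[i], seq[j]),  # i paired with j
            )
            for k in range(i + 1, j):                 # join i..k and k+1..j
                best = max(best, g[i][k] + g[k + 1][j])
            g[i][j] = best
    return g


def max_pairs(seq):
    """Optimal score for the whole sequence (top-right table entry)."""
    return nussinov_fill(seq)[0][len(seq) - 1] if seq else 0
```

The three nested loops over span, i and k are where the O(N^3) running time comes from.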
11
Nussinov algorithm: trace-back
current   record   stack
-         -        1,9
1,9       -        1,8
1,8       -        1,4 5,8
1,4       1,4      2,3 5,8
2,3       2,3      3,2 5,8
3,2       -        5,8
5,8       5,8      6,7
6,7       6,7      7,6
7,6       -        -
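
A stack-based trace-back of this kind can be sketched as below. The fill step is repeated in compact form so that the block runs on its own. The order in which the cases are tested is an assumption, so on sequences with several optimal structures it returns just one of them.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def fill(seq):
    """Nussinov table: g[i][j] = maximal number of pairs on seq[i..j]."""
    n = len(seq)
    g = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            paired = g[i + 1][j - 1] + (1 if (seq[i], seq[j]) in PAIRS else 0)
            split = max((g[i][k] + g[k + 1][j] for k in range(i + 1, j)), default=0)
            g[i][j] = max(g[i + 1][j], g[i][j - 1], paired, split)
    return g


def traceback(seq, g):
    """Recover one optimal set of base pairs (1-based positions)."""
    pairs = []
    stack = [(0, len(seq) - 1)] if seq else []
    while stack:
        i, j = stack.pop()                    # "current" interval
        if i >= j:
            continue                          # nothing left to pair
        if g[i + 1][j] == g[i][j]:
            stack.append((i + 1, j))          # base i unpaired
        elif g[i][j - 1] == g[i][j]:
            stack.append((i, j - 1))          # base j unpaired
        elif (seq[i], seq[j]) in PAIRS and g[i + 1][j - 1] + 1 == g[i][j]:
            pairs.append((i + 1, j + 1))      # "record" the pair i,j
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):         # find the joining point
                if g[i][k] + g[k + 1][j] == g[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return pairs


def nussinov(seq):
    """Fill the table, then trace back one optimal structure."""
    return traceback(seq, fill(seq))
```

For example, `nussinov("GGGAAAUCC")` returns three nested base pairs, matching the optimal score of 3 for that sequence.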

12
RNA folding by energy minimization
Note: a bulge loop does not alter stacking energy!
13
Principle of the Zuker algorithm (RNAFOLD)

Energy minimization using a richer scoring system
  Stacking energies: scores for overlapping dinucleotide pairs
  Bulge loop scores: dependent on length
  Hairpin loop scores: dependent on length and closing pair
  Internal loop scores: dependent on length and closing pair
Same principle as the Nussinov algorithm, but two minimal energy values are stored for each subsequence:
  W(i,j): energy of the best structure on i..j
  V(i,j): energy of the best structure on i..j closed by the pair i,j
Computational complexity essentially O(N^3) (if constraints on maximal loop sizes are applied)
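
One common textbook way of writing the two-table recursion is sketched below. The symbols e_H (hairpin loop energy), e_S (stacking energy), e_L (bulge or internal loop energy) and a (a constant multi-loop penalty) are assumptions introduced here; they are not taken from the slides, and actual implementations differ in detail.

```latex
\begin{aligned}
W(i,j) &= \min
\begin{cases}
W(i+1,j) \\
W(i,j-1) \\
V(i,j) \\
\min_{i<k<j}\,\bigl[\,W(i,k) + W(k+1,j)\,\bigr]
\end{cases} \\[6pt]
V(i,j) &= \min
\begin{cases}
e_H(i,j) \\
e_S(i,j) + V(i+1,j-1) \\
\min_{i<i'<j'<j}\,\bigl[\,e_L(i,j,i',j') + V(i',j')\,\bigr] \\
\min_{i<k<j-1}\,\bigl[\,W(i+1,k) + W(k+1,j-1)\,\bigr] + a
\end{cases}
\end{aligned}
```

The third case of V(i,j) searches over all inner pairs (i',j'). Bounding the loop size i'-i + j-j' limits that search to a constant number of candidates per cell, which is what keeps the overall cost at essentially O(N^3).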

14
Energy-parameters used by RNAFOLD
Note: some energy terms (e.g. for the terminal mismatch of a hairpin) are missing.
15
Prediction of RNA structure by covariance models
Motivation: Energy minimization-based approaches often predict large numbers of alternative RNA secondary structures with very similar free energy.
A multiple alignment of related RNAs potentially reveals base-pair interactions.
Interacting positions in a multiple alignment are expected to show co-variation compatible with standard RNA base-pairing rules.
Limitation: requires within-column variation. No information is obtained for completely conserved positions.
16
Prediction of RNA structure by covariance models
Covariance measure used: mutual information
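
The formula shown on this slide did not survive in the transcript. The sketch below uses the standard definition of mutual information between two alignment columns i and j, M(i,j) = sum over base pairs (x,y) of f_ij(x,y) * log2( f_ij(x,y) / (f_i(x) * f_j(y)) ), where the f values are observed frequencies. The function name and the representation of the alignment as a list of equal-length strings are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j (0-based)
    of an alignment given as a list of equal-length sequences."""
    n = len(alignment)
    count_i = Counter(s[i] for s in alignment)           # bases seen in column i
    count_j = Counter(s[j] for s in alignment)           # bases seen in column j
    count_ij = Counter((s[i], s[j]) for s in alignment)  # co-occurring bases
    mi = 0.0
    for (x, y), c in count_ij.items():
        f_xy = c / n
        mi += f_xy * log2(f_xy / ((count_i[x] / n) * (count_j[y] / n)))
    return mi
```

With the toy alignment `["GAC", "CAG", "AAU", "UAA"]`, columns 0 and 2 co-vary perfectly and give 2 bits, while the fully conserved column 1 gives 0 bits against any other column, which illustrates the limitation noted for conserved positions.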
17
Covariance analysis: tRNA-Phe
18
RNA motifs, signatures, domains, and families

Terminology
  Motif: short RNA regions with partly conserved primary and secondary structure, usually with a defined function.
  Signature: short RNA regions with partly conserved primary and secondary structure, useful for identifying members of an RNA family.
  Domain: a larger RNA region with conserved secondary structure, usually considered an independent folding unit.
  Family: a family of homologous and/or structurally related RNA molecules, e.g. tRNAs.
RNA sequence-structural motifs play a role in various biological processes
  Translational control, e.g. iron-response element (IRE)
  RNA degradation
  RNA localization (zip-code motifs)

19
RNABOB: an example of an RNA pattern recognition program
Characteristics:
  Supports qualitative patterns (true/false, no scores or probabilities)
  Based on simple but powerful pattern syntax
  Fast search engine
  Supports non-Watson-Crick type base interactions
  Supports pseudo-knots!
  Allows for errors (mismatches) in the pattern.
20
RNABOB pattern syntax
Example:
s1 h1 s2 h2 s3 h2' h1'
h1 00 NNNNNNNNNN
h2 00 NNNN
s1 0 NN
s2 0 R
s3 0 ANYA

The first line indicates the ordering of pattern elements
  s1, s2, s3 consist of contiguous unpaired sequences
  h1, h1' represent complementary sequence segments forming a double helix.
Lines 2 to 6 contain the descriptions of each element
  NNNNNNNNNN means that any base is permitted in this structure; the only constraint is that the bases have to respect base-pairing rules
  Numbers indicate how many mismatches are allowed per element.
  IUPAC codes are used to specify ambiguous positions, e.g. Y = C or T
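
The two ideas behind single-stranded elements, IUPAC ambiguity codes and a per-element mismatch allowance, can be illustrated with a few lines of code. This is not RNABOB's implementation or interface: the function names are hypothetical, helix elements are not modelled, and U is treated as T as in database entries.

```python
# IUPAC nucleotide codes mapped to the bases they stand for (U treated as T).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def count_mismatches(pattern, segment):
    """Number of positions where `segment` violates the IUPAC `pattern`."""
    if len(pattern) != len(segment):
        raise ValueError("pattern and segment must have the same length")
    bases = segment.upper().replace("U", "T")
    return sum(1 for code, base in zip(pattern.upper(), bases)
               if base not in IUPAC[code])


def find_element(pattern, sequence, max_mismatches=0):
    """0-based start positions where `pattern` matches `sequence`
    with at most `max_mismatches` violations."""
    m = len(pattern)
    return [pos for pos in range(len(sequence) - m + 1)
            if count_mismatches(pattern, sequence[pos:pos + m]) <= max_mismatches]
```

With the element `ANYA` from the example and zero mismatches allowed, `find_element("ANYA", "GGAGCAUU")` reports the single match AGCA starting at position 2.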