Title: Improving Free Energy Functions for RNA Folding
1Improving Free Energy Functions for RNA Folding
- RNA Secondary Structure Prediction
2Why RNA is Important
- Machinery of protein construction
- Catalytic role in cells
- May be possible to destroy specific sequences of RNA (to interrupt protein production)
- RNase P (Cech/Altman c.1981)
3RNA Structural Levels
Secondary: http://anx12.bio.uci.edu/hudel/bs99a/lecture21/lecture2_2.html
Tertiary: http://www.leeds.ac.uk/bmb/courses/teachers/trnballs.html
4Abstracting the problem
[Figure: example RNA with bases A, G, C, G, C, A, U, C]
Zuker (1981) Nucleic Acids Research 9(1) 133-149
5Why it is hard
- Large search space (hard to enumerate)
Hofacker et al. (1994) Monat. Chem. 125 167-188
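To give a feel for the size of the search space, the sketch below counts the pseudoknot-free secondary structures a sequence can form. It is an illustration only: the pairing rules (Watson-Crick plus G-U wobble), the minimum hairpin loop of 3 unpaired bases, and the name `count_structures` are assumptions, not taken from the slides.

```python
from functools import lru_cache

# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def count_structures(seq: str, min_loop: int = 3) -> int:
    """Count pseudoknot-free secondary structures of seq (open chain included)."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # An empty or single-base interval has exactly one structure: unpaired.
        if j - i < 1:
            return 1
        # Case 1: base i stays unpaired.
        total = count(i + 1, j)
        # Case 2: base i pairs with base k, enclosing at least min_loop bases.
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)


if __name__ == "__main__":
    # The count grows quickly as the sequence gets longer.
    for n in (10, 20, 30):
        print(n, count_structures("GC" * (n // 2)))
```

The counting itself is cheap because it reuses sub-results; listing every structure explicitly is what becomes infeasible.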
6Why it is hard
- Secondary structure does not exist.
- Unlike proteins
- Putative structures (prone to revision)
- Quality of Energy Functions
- Discussed later
7Current Algorithms
- Single-Strand
- Minimum Free Energy (Zuker et al. 1981)
- Partition Functions (McCaskill 1990)
- Comparative Sequence Analysis
- Max. Weighted Matching (Nussinov et al. 1978)
- Stochastic CFG (Sakakibara et al. 1994)
- Phylogenetic Trees (Gulko et al. 1995)
- Statistical Significance (Noller & Woese, early 1980s)
See proposal for references
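As a concrete example of the dynamic-programming style these algorithms share, here is a minimal sketch of Nussinov-style base-pair maximization. It uses unit weights (every allowed pair scores 1) rather than a general weighting; the pairing rules, the minimum loop size, and the function name are assumptions made for illustration.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Return the maximum number of nested base pairs seq can form."""
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs on seq[i..j]; short intervals stay at 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            # Option 1: base j is unpaired.
            score = best[i][j - 1]
            # Option 2: base j pairs with k, leaving >= min_loop bases between.
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1]
```

The MFE algorithm follows the same interval-by-interval recursion but replaces the pair count with loop and stacking energies.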
8MFE / Tinoco Hypothesis
The free energy of a secondary structure equals
the sum of the free energies of the loops and
stacked pairs
Tinoco et al. (1971) Nature 230 362-367.
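Written as a formula, the hypothesis says the energy is additive over structural elements. The notation below (S for a structure, loops(S) and stacks(S) for its loops and stacked pairs) is an assumed shorthand, not taken from the slides.

```latex
\Delta G(S) \;=\; \sum_{L \,\in\, \mathrm{loops}(S)} \Delta G(L)
\;+\; \sum_{P \,\in\, \mathrm{stacks}(S)} \Delta G(P)
```

This additivity is what lets dynamic programming find the minimum: each term depends only on a local piece of the structure.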
9Proposed System
[Figure: pipeline diagram. The sequence AAUCG...CUUCUUCCA passes through numbered steps 1, 2, 3 involving the boxes "MFE (E)" and "GA (E)".]
10Step I - Calc MFE Structure
- Given a sequence ? apply the MFE algorithm
- Generates secondary structure S?
11Step II - Structural Similarity
- Given a database of experimentally verified RNA structures
- Let Q? be the database structure most similar to S?
- Based on RNase P Database (Brown 1999)
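The slides do not say how similarity is measured. One simple possibility, shown here purely as an assumption, is to write each structure in dot-bracket notation and score the overlap of the two base-pair sets (Jaccard index). The function names and the dot-bracket input format are hypothetical.

```python
def base_pairs(dot_bracket: str) -> set:
    """Convert a dot-bracket string into a set of (i, j) base pairs."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of base-pair sets: 1.0 = identical, 0.0 = disjoint."""
    pa, pb = base_pairs(a), base_pairs(b)
    if not pa and not pb:
        return 1.0  # two fully unpaired structures are treated as identical
    return len(pa & pb) / len(pa | pb)


def most_similar(predicted: str, database: list) -> str:
    """Return the database structure with the highest similarity score."""
    return max(database, key=lambda q: similarity(predicted, q))
```

Comparing pairs by position is crude when the database sequences differ in length from the input; an alignment-aware measure would be needed in that case.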
12Step III - Construct E
- Create a new energy function
13Discussion on E
- E has global information
- Global information precludes the use of dynamic programming (MFE, Partition)
- Leaves (stochastic) combinatorial optimization
- Gradient Descent (no ∂E/∂S)
- Genetic Algorithms / Simulated Annealing
14Step IV - Genetic Algorithm
- RNA Structural Prediction by GA
- Input: sequence ?
- Output: structure that maximizes E for ?
- Steady State Genetic Algorithm
- Pseudoknots forbidden (conflicts)
- Fitness: -E
- Effect of Similarity(Q?, S?) diminishes with each generation (pseudo-SA).
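A steady-state genetic algorithm can be sketched generically as below. This is not the implementation behind the slides: the tournament selection, the replace-the-worst rule, and all function names are assumptions. The fitness callback receives the generation number so that a similarity term could be down-weighted over time, for example with a weight of the form w0 * decay**generation; the actual schedule is not given in the slides. The usage at the bottom runs on a toy bit-string problem, not on RNA.

```python
import random


def steady_state_ga(population, fitness, mutate, crossover, generations, rng):
    """Steady-state GA: each generation creates one child and, if it is
    better than the current worst individual, replaces that individual."""
    pop = list(population)
    for gen in range(generations):
        score = lambda ind: fitness(ind, gen)
        # Binary tournament selection of two parents (assumed scheme).
        parents = []
        for _ in range(2):
            a, b = rng.sample(pop, 2)
            parents.append(a if score(a) >= score(b) else b)
        child = mutate(crossover(parents[0], parents[1], rng), rng)
        # Replace the worst individual only when the child improves on it.
        worst = min(range(len(pop)), key=lambda i: score(pop[i]))
        if score(child) > score(pop[worst]):
            pop[worst] = child
    return max(pop, key=lambda ind: fitness(ind, generations))


# Toy usage: maximize the number of 1s in a 12-bit tuple.
def one_point(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]


def flip_one(ind, rng):
    k = rng.randrange(len(ind))
    return ind[:k] + (1 - ind[k],) + ind[k + 1:]


rng = random.Random(0)
start = [tuple(rng.randint(0, 1) for _ in range(12)) for _ in range(8)]
best = steady_state_ga(start, lambda ind, gen: sum(ind), flip_one, one_point, 200, rng)
```

Because only the worst individual is ever replaced, the best individual found so far is never lost when the fitness does not change between generations.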
15Genetic Algorithm - Representation
- Stem-loop representation (Chen et al. 2000)
- Window method (EMBOSS Palindrome)
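A stem pool can be built by scanning the sequence for complementary stretches. The sketch below is a simplified stand-in for a window-based search and does not reproduce EMBOSS Palindrome's behaviour or options; the pairing rules, the thresholds `min_stem` and `min_loop`, and the `(i, j, length)` stem encoding are assumptions.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def stem_pool(seq: str, min_stem: int = 3, min_loop: int = 3) -> list:
    """List maximal stems as (i, j, length): pairs (i+t, j-t), t < length."""
    n = len(seq)
    stems = []
    for i in range(n):
        for j in range(i + 1, n):
            # Skip starts that merely continue a helix beginning further out.
            if i > 0 and j < n - 1 and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            # Extend inward while bases pair and the hairpin stays big enough.
            length = 0
            while ((j - length) - (i + length) > min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_stem:
                stems.append((i, j, length))
    return stems
```

A structure is then represented as a set of stems drawn from this pool rather than as individual base pairs.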
16Genetic Algorithm - Operators
- Mutation
- Add stem from stem pool to a child
- Crossover
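The mutation operator can be sketched as follows, assuming an individual is a frozenset of stems encoded as `(i, j, length)` tuples (pairs `(i+t, j-t)` for `t < length`). The compatibility test rejects stems that share a base or cross an existing stem, which is one way to enforce the no-pseudoknot rule. All names are hypothetical, and crossover is omitted because the slides do not describe it.

```python
import random


def stem_pairs(stem):
    """Expand a stem (i, j, length) into its individual base pairs."""
    i, j, length = stem
    return [(i + t, j - t) for t in range(length)]


def compatible(s1, s2) -> bool:
    """True if two stems share no base and do not cross (no pseudoknot)."""
    for a, b in stem_pairs(s1):
        for c, d in stem_pairs(s2):
            if len({a, b, c, d}) < 4:
                return False  # a base would be paired twice
            if a < c < b < d or c < a < d < b:
                return False  # crossing pairs form a pseudoknot
    return True


def mutate(child: frozenset, pool: list, rng: random.Random) -> frozenset:
    """Try to add one random stem from the pool; keep child if it conflicts."""
    stem = rng.choice(pool)
    if stem not in child and all(compatible(stem, s) for s in child):
        return child | {stem}
    return child
```

Returning the child unchanged on a conflict keeps every individual a valid, pseudoknot-free structure.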
17Preliminary Results
- E does not lead to a drastic speedup
- Genetic algorithm is very slow if the initial population is generated randomly from the stem pool.
- Use suboptimal folding for the initial population.
18Preliminary Results Explained
- The real structure is usually very similar to the Tinoco optimal structure.
- View E as a way of choosing among the suboptimal structures.
19Future Work
- More testing on the entire RNase P Database (> 400 structures)
- Tune E
- Accuracy comparison to MFE and Partition Function Algorithms
- Parallelize genetic algorithm