Title: Non-coding RNA
1Non-coding RNA
- William Liu
- CS374 Algorithms in Biology
- November 23, 2004
2Non-Coding RNA
- Background Basics
- Biology Overview
- Why ncRNA - Central Dogma?
- Problem Space
- HMM/sCFG Solution
- Paper
- Pair HMMs on Tree Structures
- Alignment of Trees, Structural Alignment
- Experimental Evaluation
- Conclusion
3Central Dogma of Molec. Bio.
4Biology Overview
- RNA merely plays an accessory role
- Complexity is defined by proteins encoded in the
genome
5Biology Overview
- Non-coding RNA (ncRNA) is an RNA molecule that functions without being translated into a protein
- Most prominent examples: transfer RNA (tRNA), ribosomal RNA (rRNA)
6Why Non-coding RNA
- Protein-coding genes can't account for all complexity
- ncRNA is important!
- Gene regulators
Genome Biol. 2002, "Beyond The Proteome: Non-coding Regulatory RNAs"
7Non-coding RNA Problems
- Finding ncRNA genes in the genome: locate these genes
- Finding homologs of ncRNA: figure out what they do
8Finding ncRNA Genes
- Protein Approaches
- Statistically biased (codon triplets)
- Open Reading Frames
- ncRNA Approaches
- High GC content (hyperthermophiles)
- Promoter/Terminator identification (E. coli)
- Comparative Genome Analysis
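The GC-content approach listed above can be sketched as a sliding-window scan. This is only an illustration: the function names, window size, step, and threshold are assumptions chosen for the toy example, not values from the talk or from any published ncRNA gene finder.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_rich_windows(genome: str, window: int = 50, step: int = 10,
                    threshold: float = 0.6):
    """Yield (start, gc_fraction) for each window at or above threshold.

    window, step and threshold are hypothetical defaults for illustration.
    """
    last_start = max(len(genome) - window, 0)
    for start in range(0, last_start + 1, step):
        frac = gc_content(genome[start:start + window])
        if frac >= threshold:
            yield start, frac


if __name__ == "__main__":
    # Toy AT-rich background with one GC-rich island in the middle.
    toy = "AT" * 50 + "GC" * 30 + "TA" * 50
    for start, frac in gc_rich_windows(toy, window=40, step=20, threshold=0.7):
        print(start, frac)
```

Such a scan is only meaningful in AT-rich genomes, where a GC-rich stretch stands out from the background; it says nothing about structure.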
9Genetic Code
10Similarity Searching
- Proteins
- BLAST, Sequence Alignment (DP)
- Genes that code for proteins are conserved across genomes (e.g. low rate of mutation)
- ncRNA
- Secondary structure usually conserved
- Alignment scoring based on structure is imperative
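For reference, the sequence-only DP alignment mentioned for proteins can be sketched as a Needleman-Wunsch global alignment. The scoring values (match, mismatch, gap) are arbitrary assumptions for illustration. Note that this scheme sees only sequence identity, which is why it is a poor fit for ncRNA whose structure is conserved while bases change.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                out_a.append(a[i - 1])
                out_b.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

Two hairpins with swapped stem bases (for example G-C pairs replaced by C-G pairs) fold the same way but score poorly here, which motivates scoring on structure.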
11ncRNA Sequence vs Structure
12Alignment Approaches
- sCFGs: modeling secondary structure, scoring sequences
- HMM for scoring of sequence and secondary structure alignment
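To give a feel for the kind of dynamic program that structure models rely on, here is a sketch of Nussinov-style base-pair maximization. This is not an sCFG and not the method from the talk: it is a simpler, non-probabilistic recursion with the same interval-based shape as sCFG parsing. The allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop length are assumptions.

```python
def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq.

    min_loop is the assumed minimum number of unpaired bases in a hairpin loop.
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # best[i][j] = max pairs within seq[i..j]; short intervals stay at 0.
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            cand = max(best[i + 1][j], best[i][j - 1])      # i or j unpaired
            if (seq[i], seq[j]) in pairs:
                cand = max(cand, best[i + 1][j - 1] + 1)    # i pairs with j
            for k in range(i + 1, j):                       # bifurcation
                cand = max(cand, best[i][k] + best[k + 1][j])
            best[i][j] = cand
    return best[0][n - 1]
```

The triple loop over (i, j, k) is where the cubic dependence on sequence length comes from in this family of algorithms.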
13Pair HMMs on Tree Structures
- Outline
- Alignment on Trees
- Structural Alignment
- Secondary Structure Representation
- Hidden Markov Model
- Recurrence Relations
- Experimental Evaluation
- Future Work
14Alignment on Trees
15Structural Alignment
- Problem: Given an RNA sequence with known secondary structure and an RNA sequence with unknown structure, obtain the optimal alignment of the two
16Structural Representation
- ?(?, ?): branch structure
- ?(X, ?, Y): base pairs
- ?(X, ?) or ?(?, Y): unpaired bases
- X, Y ∈ {A, U, G, C}
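One way to picture a folded RNA as a tree is to parse dot-bracket notation into nested nodes. The sketch below is an assumption-laden illustration: the `Node` class and the kind names "root", "pair", and "unpaired" are hypothetical and are not the node labels used on the slide. A pair node with two or more pair children corresponds to a branch.

```python
class Node:
    """Hypothetical tree node for a folded RNA."""

    def __init__(self, kind, left=None, right=None):
        self.kind = kind          # "root", "pair", or "unpaired"
        self.left = left          # 5' base of a pair, or the unpaired base
        self.right = right        # 3' base of a pair
        self.children = []


def parse_structure(seq: str, dot_bracket: str) -> Node:
    """Build a tree from a sequence and its dot-bracket structure."""
    if len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must have equal length")
    root = Node("root")
    stack = [root]
    for base, sym in zip(seq, dot_bracket):
        if sym == "(":
            node = Node("pair", left=base)
            stack[-1].children.append(node)
            stack.append(node)            # descend into the new stem
        elif sym == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            stack.pop().right = base      # close the innermost open pair
        elif sym == ".":
            stack[-1].children.append(Node("unpaired", left=base))
        else:
            raise ValueError(f"unexpected symbol {sym!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return root


def count_nodes(node: Node) -> int:
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(count_nodes(child) for child in node.children)
```

For a simple hairpin such as `parse_structure("GGGAAACCC", "(((...)))")`, the tree is a chain of three pair nodes with three unpaired leaves under the innermost pair.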
17Hidden Markov Model
- M: Match state, I: Insertion state, D: Deletion state
- ?XY: State transition probability from X to Y
- ?X: Initial probability
- Emission probability for pair x,y
- X, Y ∈ {M, I, D}
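To make the three-state model concrete, here is a sketch of Viterbi scoring for an ordinary pair HMM on two plain sequences (not on trees, which is the paper's extension). All numbers in `INIT`, `TRANS`, `emit_pair`, and `emit_single` are made-up illustrative values, and the convention that I consumes a symbol of `b` while D consumes a symbol of `a` is an assumption.

```python
import math

NEG_INF = float("-inf")
STATES = ("M", "I", "D")
# How many symbols each state consumes from (a, b).
STEP = {"M": (1, 1), "I": (0, 1), "D": (1, 0)}

# Illustrative parameters only; none of these values come from the slides.
INIT = {"M": 0.8, "I": 0.1, "D": 0.1}
TRANS = {
    "M": {"M": 0.8, "I": 0.1, "D": 0.1},
    "I": {"M": 0.7, "I": 0.3, "D": 0.0},
    "D": {"M": 0.7, "I": 0.0, "D": 0.3},
}


def emit_pair(x: str, y: str) -> float:
    """Assumed emission probability of the aligned pair (x, y) in state M."""
    return 0.22 if x == y else 0.01


def emit_single(x: str) -> float:
    """Assumed emission probability of one base in state I or D."""
    return 0.25


def _log(p: float) -> float:
    return math.log(p) if p > 0 else NEG_INF


def viterbi_score(a: str, b: str) -> float:
    """Log-probability of the most likely state path emitting a and b."""
    n, m = len(a), len(b)
    # v[s][i][j]: best log-prob of a path that emitted a[:i], b[:j], ending in s
    v = {s: [[NEG_INF] * (m + 1) for _ in range(n + 1)] for s in STATES}
    for i in range(n + 1):
        for j in range(m + 1):
            for s in STATES:
                di, dj = STEP[s]
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0:
                    continue
                if s == "M":
                    e = _log(emit_pair(a[i - 1], b[j - 1]))
                elif s == "I":
                    e = _log(emit_single(b[j - 1]))
                else:
                    e = _log(emit_single(a[i - 1]))
                if pi == 0 and pj == 0:
                    prev = _log(INIT[s])      # first emission of the path
                else:
                    prev = max(v[r][pi][pj] + _log(TRANS[r][s]) for r in STATES)
                v[s][i][j] = prev + e
    return max(v[s][n][m] for s in STATES)
```

With these toy parameters, `viterbi_score("ACGU", "ACGU")` is higher than `viterbi_score("ACGU", "UGCA")`, since every aligned pair in the first case is a match.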
18Notation
- Let w = a_1 a_2 ... a_n be an unfolded RNA sequence of length n
- Let w_i denote the ith symbol in w
- Let w_{i,j} denote the substring a_i a_{i+1} ... a_j of w
19Notation
- Let T be a skeletal tree representing a folded RNA sequence (known structure)
- Let v(j) denote the label of node j in tree T
- Let T_j denote the subtree rooted at node j in tree T
- Let j_n denote the nth child of node j in tree T
20Recurrence Relation (Match)
21Recurrence Relation (Delete)
22Recurrence Relation (Insert)
23Structural Alignment
- Intuition: Given the ncRNA sequence b with unknown structure, generate a predicted folded structure for b and align the resulting tree with the ncRNA a, whose secondary structure is known.
- Complexity: O(K M N^3)
- K: number of states in the pair HMM,
- M: size of the skeletal tree,
- N: length of the unfolded sequence
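A quick back-of-the-envelope calculation shows how the O(K M N^3) bound scales. The helper name and the input values below are illustrative assumptions (three states M, I, D; a tree size and sequence length of 76, roughly tRNA-sized), and constant factors are ignored.

```python
def dp_work_estimate(k_states: int, tree_size: int, seq_len: int) -> int:
    """Rough operation count K * M * N^3, ignoring constant factors."""
    return k_states * tree_size * seq_len ** 3


if __name__ == "__main__":
    base = dp_work_estimate(3, 76, 76)
    doubled = dp_work_estimate(3, 76, 152)
    print(base)             # 100086528
    print(doubled // base)  # 8
```

Doubling the length of the unfolded sequence multiplies the estimate by eight, so N dominates the cost far more than K or M.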
24Experimental Evaluation
- Dynamic programming to calculate recurrence relations; prototype system to execute the algorithm
- Experiments on 2 families of RNA: transfer RNAs and hammerhead ribozyme
25Parameters
Gorodkin et al. (1997)
26Results tRNA
27Results Hammerhead Ribozyme
28Future Work
- Since based on dynamic programming (of pairwise alignment), many DP techniques can apply
- Refine emission probabilities, relate score matrix (reliable alignment for RNA families)
29Conclusions
- ncRNA space is quite open: no really great techniques yet
- How many ncRNA genes are there?
- Absence of evidence ≠ evidence of absence
- Eddy's call to arms: "it is time for RNA computational biologists to step up"
30Thanks!
31References
- Sakakibara, Y., Pair Hidden Markov Models on Tree Structures, Bioinformatics, 19:232-240, 2003
- Eddy, S., Computational Genomics of Noncoding RNA Genes, Cell, 109:137-140, 2002
- Szymanski, M., Barciszewski, J., Beyond The Proteome: Non-coding Regulatory RNAs, Genome Biol., 2002