Bioinformatics: Applications

Provided by: jonath76



1
Bioinformatics: Applications

ZOO 4903
Fall 2006, MW 10:30-11:45
Sutton Hall, Room 312
RNA secondary structures

2
Lecture overview

What we've talked about so far
Gene prediction and alternative splicing
Microarrays as a means of measuring genome-wide transcription
Overview
Aside from being DNA's messenger, RNA performs functions itself
RNA secondary structure is related to mRNA stability and RNA functions
RNA folding can be predicted and the effects of mutations modeled

3
Applications for RNA folding

Explain why non-coding regions are conserved
Viral RNA packing inside capsid
Prediction of functional RNAs
Identify similarity, not by sequence but by
structure

4
RNA Basics
RNA bases: A, C, G, U
Canonical base pairs:
A-U: 2 hydrogen bonds
G-C: 3 hydrogen bonds (more stable)
G-U: wobble pairing (less stable)
Each base can pair with only one other base.

5
RNA types

transfer RNA (tRNA)
messenger RNA (mRNA)
ribosomal RNA (rRNA)
small interfering RNA (siRNA)
micro RNA (miRNA)
small nucleolar RNA (snoRNA)

6
RNA Secondary Structure

The RNA molecule folds on itself.
The base pairing is as follows: G-C, A-U, G-U

[Figure: hairpin with a single-stranded LOOP closed by a base-paired STEM held together by hydrogen bonds; 5' and 3' ends labeled]
7
RNA Secondary Structure
Pseudoknot
Stem
Interior Loop
Single-Stranded
Bulge Loop
Junction (Multiloop)
Hairpin loop
Image: Wuchty
8
RNA Structure Representations
2D model
E Mountains
Circle with lines
Ordered tree
Balanced nested parentheses
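
The balanced nested parentheses representation can be read back into base pairs with a stack. The following is a minimal sketch, assuming the common dot-bracket convention where "(" and ")" mark paired positions and "." marks unpaired ones; the function names are illustrative and not part of any tool mentioned in this lecture. The second function derives the heights used for a mountain-style plot from the same string.

```python
def parse_dot_bracket(structure):
    """Return a sorted list of (i, j) base pairs (0-based) from a dot-bracket string."""
    stack = []   # positions of '(' that are still waiting for a partner
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError(f"unbalanced '(' at position {stack[-1]}")
    return sorted(pairs)


def mountain_heights(structure):
    """Nesting depth at each position: rises on '(' and falls after ')'."""
    heights = []
    depth = 0
    for symbol in structure:
        if symbol == "(":
            depth += 1
            heights.append(depth)
        elif symbol == ")":
            heights.append(depth)
            depth -= 1
        else:
            heights.append(depth)
    return heights


hairpin = "((((....))))"
pairs = parse_dot_bracket(hairpin)   # four nested pairs closing a 4 nt loop
heights = mountain_heights(hairpin)
```

Because the notation relies on proper nesting, it can only describe structures without pseudoknots.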
9
RNA secondary structure representation
No pseudoknots
Pseudoknots
10
RNA secondary Structure representation
tRNA
11
Some biological functions of non-coding RNA

RNA splicing (snRNAs)
Guide RNAs (RNA editing)
Catalysis
Telomere maintenance
Control of translation (miRNAs)

The function of the RNA molecule depends on its
folded structure
12
Control of iron levels by mRNA secondary structure
Iron Responsive Element (IRE)
[Figure: IRE stem-loop; the conserved bases G, U, A, G, C are shown among positions marked N (any base); 5' and 3' ends labeled]
Recognized by IRP1, IRP2
13
F: Ferritin (iron storage)
TR: Transferrin receptor (iron uptake)
[Figure: IRP1/2 binding to the IRE on the F mRNA and on the TR mRNA; 5' and 3' ends labeled]
14
Examples of known interactions of RNA secondary
structural elements
These patterns are excluded from the prediction
schemes as their computation is too intensive.
Pseudo-knot
Kissing hairpins
Hairpin-bulge contact
15
Structure-based similarity
Sequence similarity: ID 34
gurken   AAGTAATTTTCGTGCTCTCAACAATTGTCGCCGTCACAGATTGTTGTTCGAGCCGAATCTTACT  64
Ifactor  ---TGCACACCTCCCTCGTCACTCTTGATTTT-TCAAGAGCCTTCGATCGAGTAGGTGTGCA--  58

Structural similarity
I Factor: 58 nt stem loop
gurken: 64 nt stem loop
16
RNA secondary structure prediction

Dynamic programming free energy minimization

17
Predicting RNA Secondary Structure

According to base pairing rules alone (A-U, G-C and wobble pairs G-U), a sequence can potentially form many different structures
An energy value is associated with each possible
structure
Predict the structure with the minimal free
energy (MFE)

18
Simplifying Assumptions for Structure Prediction

RNA folds into one minimum free-energy structure
There are no knots (base pairs never cross)
The energy of a particular base pair in a double-stranded region is sequence independent
Neighbors do not influence the energy

19
Sequence alignment as a method to determine
structure

Bases pair in order to form backbones and
determine the secondary structure
Aligning bases based on their ability to pair
with each other gives an algorithmic approach to
determining the optimal structure

20
Base Pair Maximization Dynamic Programming
Algorithm
S(i,j) is the folding of the subsequence of the RNA strand from index i to index j which results in the highest number of base pairs
Simple example: maximizing base pairing
Base pair at i and j
Unmatched at i
Unmatched at j
Bifurcation
21
Base Pair Maximization Dynamic Programming
Algorithm
Alignment Method
Align RNA strand to itself
Score increases for feasible base pairs
Each score independent of overall structure
Bifurcation adds extra dimension

Initialize first two diagonal arrays to 0
Fill in squares sweeping diagonally
Bases cannot pair, similar to unmatched alignment
Bases can pair, similar to matched alignment
Dynamic programming possible paths: S(i+1, j), S(i, j-1), S(i+1, j-1) + 1
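
The fill procedure on this slide can be sketched as follows. This is a minimal sketch of base pair maximization, not the code of any particular tool: the matrix starts at 0 (which covers the two initial diagonals), cells are filled diagonal by diagonal, and each cell takes the best of the unmatched, matched and bifurcation options. The `min_loop` parameter is an assumption added for illustration; it is not on the slide and defaults to 0 so that the behaviour matches the simple version described here.

```python
# Pairs allowed by the slide's rules: A-U, G-C and the G-U wobble pair.
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}


def max_base_pairs(seq, min_loop=0):
    """Return the maximum number of non-crossing base pairs in seq.

    min_loop is a hypothetical extra: i and j may pair only if j - i > min_loop.
    """
    n = len(seq)
    if n == 0:
        return 0
    # S[i][j] = best score for subsequence i..j; cells with i >= j stay 0.
    S = [[0] * n for _ in range(n)]
    for span in range(1, n):                 # sweep diagonals outward
        for i in range(n - span):
            j = i + span
            # i unmatched or j unmatched
            best = max(S[i + 1][j], S[i][j - 1])
            # i pairs with j
            if seq[i] + seq[j] in CAN_PAIR and j - i > min_loop:
                best = max(best, S[i + 1][j - 1] + 1)
            # bifurcation: split into two independent sub-structures
            for k in range(i + 1, j):
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]


score = max_base_pairs("GGGAAAUCC")
```

The bifurcation loop over k is what makes the fill cubic in sequence length rather than quadratic, which is the "extra dimension" mentioned above.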
22
Base Pair Maximization - Drawbacks

Base pair maximization will not necessarily lead
to the most stable structure
May create structure with many interior loops or
hairpins which are energetically unfavorable
Comparable to aligning sequences with scattered matches: not biologically reasonable

23
Trouble with Pseudoknots

Pseudoknots cause a breakdown in the Dynamic
Programming Algorithm.
In order to form a pseudoknot, checks must be made to ensure a base is not already paired; this breaks down the recurrence relations
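
One way to see why pseudoknots fall outside the recurrence is to test a set of base pairs for crossings. The sketch below assumes pairs are given as (i, j) index tuples; the function name is illustrative. Two pairs cross when one starts inside the other and ends outside it, which is exactly the situation the nested sub-problems S(i,j) cannot represent.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """Return True if any two base pairs cross (i < k < j < l)."""
    # Normalize each pair so the smaller index comes first, then sort by start.
    ordered = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for (i, j), (k, l) in combinations(ordered, 2):
        # After sorting, i <= k, so only this one crossing pattern is possible.
        if i < k < j < l:
            return True
    return False


nested = [(0, 9), (1, 8)]      # a plain stem
crossing = [(0, 5), (3, 8)]    # second pair starts inside the first, ends outside
```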

24
Energy Minimization

Thermodynamic Stability
Estimated using experimental techniques
Theory: the most stable structure is the most likely
No pseudoknots due to algorithm limitations
Uses Dynamic Programming alignment technique
Attempts to maximize the score taking into
account thermodynamics
MFOLD and ViennaRNA

25
Thermodynamics

Gibbs Free Energy, G
Describes the energetics of molecules in aqueous solution. The change in free energy, ΔG, for a chemical process, such as nucleic acid folding, can be used to determine the direction of the process:
ΔG = 0: equilibrium
ΔG > 0: unfavorable process
ΔG < 0: favorable process
Thus the natural tendency for biomolecules in solution is to minimize the free energy of the entire system (biomolecules + solvent).

ΔG = ΔH - TΔS
ΔH is the change in enthalpy, ΔS is the change in entropy, and T is the temperature in Kelvin.
Molecular interactions, such as hydrogen bonds, van der Waals and electrostatic interactions, contribute to the ΔH term. ΔS describes the change of order of the system.
Thus, both molecular interactions and the order of the system determine the direction of a chemical process.
For any nucleic acid solution, it is extremely difficult to calculate the free energy from first principles.
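
The relation between enthalpy, entropy and free energy can be made concrete with a few lines of code. This is a sketch only: the numeric values are made up for illustration and are not measurements from the lecture, and the function names are hypothetical.

```python
def gibbs_free_energy(delta_h, delta_s, temperature_k):
    """Free energy change from enthalpy change, entropy change and temperature.

    delta_h in kcal/mol, delta_s in kcal/(mol*K), temperature_k in Kelvin.
    """
    return delta_h - temperature_k * delta_s


def direction(delta_g, tol=1e-9):
    """Classify a process by the sign of its free energy change."""
    if abs(delta_g) <= tol:
        return "equilibrium"
    return "favorable" if delta_g < 0 else "unfavorable"


# Hypothetical folding event: favorable enthalpy, unfavorable (ordering) entropy.
dg = gibbs_free_energy(delta_h=-50.0, delta_s=-0.1, temperature_k=300.0)
verdict = direction(dg)
```

With these illustrative numbers the enthalpy term outweighs the entropy penalty, so the process comes out as favorable; raising the temperature far enough would flip the sign.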

26
Free energy computation
[Figure: hairpin with a 4 nt loop and a 1 nt bulge, annotated with free energy contributions in kcal/mol; 5' and 3' ends labeled]
+5.9 4nt loop
-1.1 mismatch of hairpin
-2.9 stacking
+3.3 1nt bulge
-2.9 stacking
-1.8 stacking
-0.9 stacking
-1.8 stacking
-2.1 stacking
-0.3 5' dangling
ΔG = -4.6 kcal/mol
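
The total on this slide is simply the sum of the individual contributions. The sketch below adds them up; assigning the -0.3 value to the 5' dangling end is an assumption based on how the figure labels line up, and the function name is illustrative.

```python
# Contributions in kcal/mol as labeled on the slide.
CONTRIBUTIONS = [
    ("4 nt hairpin loop", 5.9),
    ("hairpin mismatch", -1.1),
    ("stacking", -2.9),
    ("1 nt bulge", 3.3),
    ("stacking", -2.9),
    ("stacking", -1.8),
    ("stacking", -0.9),
    ("stacking", -1.8),
    ("stacking", -2.1),
    ("5' dangling end", -0.3),   # assumed pairing of label and value
]


def total_free_energy(contributions):
    """Sum the energy terms, rounded to the one decimal used on the slide."""
    return round(sum(value for _, value in contributions), 1)


total = total_free_energy(CONTRIBUTIONS)   # -4.6 kcal/mol
```

The loop and the bulge are the only positive (destabilizing) terms; the stacked base pairs more than compensate for them.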
27
Adding Complexity to Energy Calculations

Stacking energy - negative energies are assigned to the stacking between adjacent base pairs.
Energy is influenced by the previous base pair (not by the base pairs further down).
These energies are estimated experimentally from
small synthetic RNAs.
Positive energy - added for destabilizing regions
such as bulges, loops, etc.
More than one structure can be predicted

28
Energy Minimization Drawbacks

Compute only one optimal structure
Usual drawbacks of purely mathematical approaches
Similar difficulties in other algorithms
Protein structure
Exon finding

29
Prediction Tools based on Energy Calculation

Fold, Mfold
Zuker & Stiegler (1981) Nucleic Acids Res. 9:133-48
Zuker (1989) Science 244:48-52
RNAfold
Vienna RNA secondary structure server
Hofacker (2003) Nucleic Acids Res. 31:3429-31

30
Mfold Multiple Folding

Original (1980) computed one single minimum energy folding of RNA
Multiple Folding algorithm - Given an RNA:
Predict min. free energy G
Given a set of possible folds F1...Fn, calculate their free energies H1...Hn
Eliminate all folds i with Hi > G + g
g = |G| × (P/100)
P is user defined
Compute remaining folds and plot each with all base pairs.

http://www.bioinfo.rpi.edu/applications/mfold/
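
The filtering step can be sketched in a few lines. This is one reading of the rule on this slide, assuming that folds are kept when their energy lies within P percent of the minimum free energy; it is not Mfold's actual code, and the function name and example energies are made up for illustration.

```python
def keep_suboptimal(fold_energies, percent):
    """Keep folds whose free energy is within `percent` of the minimum.

    fold_energies maps a fold name to its free energy in kcal/mol.
    """
    g_min = min(fold_energies.values())
    window = abs(g_min) * percent / 100.0     # g in the slide's notation
    return {
        name: energy
        for name, energy in fold_energies.items()
        if energy <= g_min + window
    }


# Hypothetical candidate folds.
folds = {"F1": -10.0, "F2": -9.6, "F3": -8.0}
kept = keep_suboptimal(folds, percent=5)      # window of 0.5 kcal/mol
```

A larger P widens the energy window and returns more alternative structures to inspect.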
31
Submitting RNA to MFOLD
Paste your sequence
Use default parameters. Scroll wayyyy down and hit "Fold RNA"
32
Tools Features

Sub-optimal structures
- Provide solutions within a specific energy range.
Constraints
- Regions known experimentally to be single/double stranded can be defined.
Statistical significance
- Currently lacking in energy-based methods
- Recently, a method was suggested to estimate a significantly stable and conserved fold in aligned sequences (Washietl and Hofacker 2004)
- Support by compensatory mutations.

33
Searching databases for secondary structures

Genomic or mRNA

34
Compensatory substitutions
Expect areas of base pairing in tRNA to be
covarying between various species
Base pairing creates same stable tRNA structure
in organisms
Mutation in one base makes pairing less favorable
and breaks down structure
Covariation ensures ability to base pair is
maintained and RNA structure is conserved
35
Evolutionary conservation of RNA molecules can be
revealed by identification of compensatory
mutations
[Figure: stem-loop sequences illustrating compensatory mutations at paired positions]
36
Insight from Multiple Alignment

Information from multiple alignment about the probability of positions i,j being base-paired.
Conservation: no additional information
Consistent mutations (GC → GU) support stem
Inconsistent mutations do not support stem.
Compensatory mutations support stem.
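
The four cases above can be told apart mechanically for any two alignment columns i and j. The sketch below assumes the usual working definitions: a consistent mutation changes one base and still allows pairing, a compensatory mutation changes both bases and still allows pairing, and an inconsistent mutation destroys the pair. The function name is illustrative, and the reference pair is assumed to be able to pair in the first place.

```python
# Pairs allowed by the lecture's rules: A-U, G-C and the G-U wobble pair.
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}


def classify_pair_change(reference, observed):
    """Classify how the bases at columns (i, j) differ between two sequences.

    reference and observed are two-letter strings, e.g. "GC" and "GU".
    """
    if observed == reference:
        return "conserved"          # no additional information
    if observed not in CAN_PAIR:
        return "inconsistent"       # pairing lost, does not support the stem
    changed = sum(a != b for a, b in zip(reference, observed))
    return "consistent" if changed == 1 else "compensatory"


examples = {
    "GC": classify_pair_change("GC", "GC"),
    "GU": classify_pair_change("GC", "GU"),
    "AU": classify_pair_change("GC", "AU"),
    "GA": classify_pair_change("GC", "GA"),
}
```

Counting consistent and compensatory changes across all sequences in an alignment gives a simple measure of support for a proposed stem.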

37
RNA families

Rfam: general non-coding RNA database
379 families annotating 280,000 regions

http://www.sanger.ac.uk/Software/Rfam/
Includes many families of non-coding RNAs and
functional motifs, as well as their alignment and
secondary structures
38
An example of an RNA family: miR-1 microRNAs
39
Summary

MFOLD and other RNA secondary structure
prediction tools rarely give the right answer
first (or at all)
Too many possible structures in the low energy
neighbourhood
Can be used as a first-pass tool
Eyeball key conserved motifs
Collect sequences to build a consensus
Often need to adjust parameters
Use prior knowledge to force base pairing
Motif-searching tools can be used to identify
conserved secondary structure motifs in a
sequence database
Retrieves more results than sequence-based
searches

40
Next class: Exam 2. What you should know

Test: mostly multiple choice and short answer
100 points

41
Next class: Exam 2. What you should know

Gene finding in prokaryotes
How are genes located in prokaryotes?
How are basic gene-finding systems built? (rule, content, extrinsic evidence, pattern-based, similarity)
Gene finding in eukaryotes
How does gene structure differ between eukaryotes and prokaryotes?
How do HMMs work (in general)?

42
Next class: Exam 2. What you should know

Alternative splicing
How are genes alternatively spliced?
What are the evolutionary advantages of having an
alternative splicing system?
How would a microarray detect alternative splice
variants?

43
Next class: Exam 2. What you should know

Microarray analysis
Technology for measuring transcription
Image processing: what's done, why, and advantages/disadvantages
Normalization: what is it, what kinds of data are normalized, what kinds of methods are used for normalization?

44
Next class: Exam 2. What you should know

Microarray analysis
Clustering: goals and methods
Analysis of gene lists in terms of their biological meaning/significance
Genetic networks: what are they and how are they being approximated computationally?

45
Next class: Exam 2. What you should know

RNA secondary structure
How does one identify secondary structure?
General strategies for trying to calculate
secondary structures

46
For next time

Exam 2: good luck!
Homework 5 due

47
Base Pair Maximization Dynamic Programming
Algorithm

Alignment Method
Align RNA strand to itself
Score increases for feasible base pairs
Each score independent of overall structure
Bifurcation adds extra dimension

Reminder: for all k, S(i,k) + S(k+1, j)
k = 0: bifurcation max in this case is S(i,k) + S(k+1, j)
Initialize first two diagonal arrays to 0
Fill in squares sweeping diagonally
Bases cannot pair, similar to unmatched alignment
Bases can pair, similar to matched alignment
Dynamic programming possible paths
Bifurcation: add values for all k
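
Once the matrix is filled, a structure achieving the maximum score can be recovered by retracing which path produced each cell, including the bifurcation case. The following is a minimal sketch under the same pairing rules; the function name is illustrative, and when several structures tie, which one is returned depends on the order in which the options are tested.

```python
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}


def fold_max_pairs(seq):
    """Return one dot-bracket structure with the maximum number of base pairs."""
    n = len(seq)
    if n == 0:
        return ""
    # Fill: S[i][j] = max pairs in subsequence i..j; cells with i >= j stay 0.
    S = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(S[i + 1][j], S[i][j - 1])
            if seq[i] + seq[j] in CAN_PAIR:
                best = max(best, S[i + 1][j - 1] + 1)
            for k in range(i + 1, j):            # bifurcation over all k
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    # Traceback: revisit each interval and follow a path that explains its score.
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if S[i][j] == S[i + 1][j]:               # i unmatched
            stack.append((i + 1, j))
        elif S[i][j] == S[i][j - 1]:             # j unmatched
            stack.append((i, j - 1))
        elif seq[i] + seq[j] in CAN_PAIR and S[i][j] == S[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:                                    # bifurcation
            for k in range(i + 1, j):
                if S[i][j] == S[i][k] + S[k + 1][j]:
                    stack.append((i, k))
                    stack.append((k + 1, j))
                    break
    return "".join(structure)


folded = fold_max_pairs("GGGAAAUCC")
```

The returned string uses balanced parentheses, so it can only ever describe nested (pseudoknot-free) structures.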
