Bioinformatics: Applications - PowerPoint PPT Presentation

Provided by: jonath76

1
Bioinformatics Applications
  • ZOO 4903
  • Fall 2006, MW 1030-1145
  • Sutton Hall, Room 312
  • RNA secondary structures

2
Lecture overview
  • What we've talked about so far
  • Gene prediction and alternative splicing
  • Microarrays as a means of measuring genome-wide
    transcription
  • Overview
  • Aside from being DNA's messenger, RNA performs
    functions itself
  • RNA secondary structure is related to mRNA
    stability and RNA function
  • RNA folding can be predicted and the effects of
    mutations modeled

3
Applications for RNA folding
  • Explain why non-coding regions are conserved
  • Viral RNA packing inside capsid
  • Prediction of functional RNAs
  • Identify similarity, not by sequence but by
    structure

4
RNA Basics
  • RNA bases: A, C, G, U
  • Canonical base pairs
  • A-U (2 hydrogen bonds)
  • G-C (3 hydrogen bonds, more stable)
  • G-U (wobble pairing, less stable)
  • Each base can pair with only one other base.

5
RNA types
  • transfer RNA (tRNA)
  • messenger RNA (mRNA)
  • ribosomal RNA (rRNA)
  • small interfering RNA (siRNA)
  • micro RNA (miRNA)
  • small nucleolar RNA (snoRNA)

6
RNA Secondary Structure
  • The RNA molecule folds on itself.
  • Bases pair through hydrogen bonds as follows:
    G-C, A-U, G-U

[Figure: hairpin with the LOOP and the STEM labeled and the 5' and 3' ends marked]
7
RNA Secondary Structure
  • Pseudoknot
  • Stem
  • Interior loop
  • Single-stranded region
  • Bulge loop
  • Junction (multiloop)
  • Hairpin loop
Image: Wuchty
8
RNA Structure Representations
2D model
E Mountains
Circle with lines
Ordered tree
Balanced nested parentheses
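The balanced nested parentheses representation can be read back into base pairs with a simple stack. The sketch below is only an illustration: it assumes the common convention of `(` and `)` for paired positions and `.` for unpaired ones, and the function name and example string are made up for this note.

```python
def pairs_from_dot_bracket(structure):
    """Return the sorted (i, j) base pairs (0-based) encoded by a dot-bracket string."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: extra ')' at position %d" % j)
            pairs.append((stack.pop(), j))
        elif symbol != ".":
            raise ValueError("unexpected symbol %r" % symbol)
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return sorted(pairs)


# Hypothetical 10-nt hairpin: a 3-bp stem closing a 4-nt loop.
print(pairs_from_dot_bracket("(((....)))"))  # [(0, 9), (1, 8), (2, 7)]
```

Because every closing parenthesis matches the most recent open one, this notation can only describe nested (pseudoknot-free) structures.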
9
RNA secondary structure representation
No pseudoknots
Pseudoknots
10
RNA secondary Structure representation
tRNA
11
Some biological functions of non-coding RNA
  • RNA splicing (snRNAs)
  • Guide RNAs (RNA editing)
  • Catalysis
  • Telomere maintenance
  • Control of translation (miRNAs)

The function of the RNA molecule depends on its
folded structure
12
Control of iron levels by mRNA secondary structure
Iron Responsive Element (IRE)
[Figure: IRE hairpin with a conserved loop sequence on a stem of variable bases (N); 5' and 3' ends marked]
Recognized by IRP1, IRP2
13
F = Ferritin (iron storage)
TR = Transferrin receptor (iron uptake)
[Figure: IRP1/2 bound to the IRE on F mRNA and on TR mRNA; 5' and 3' ends marked]
14
Examples of known interactions of RNA secondary
structural elements
These patterns are excluded from the prediction
schemes as their computation is too intensive.
Pseudo-knot
Kissing hairpins
Hairpin-bulge contact
15
Structure-based similarity
Sequence Similarity (ID 34)
gurken   AAGTAATTTTCGTGCTCTCAACAATTGTCGCCGTCACAGATTGTTGTTCGAGCCGAATCTTACT  64
Ifactor  ---TGCACACCTCCCTCGTCACTCTTGATTTT-TCAAGAGCCTTCGATCGAGTAGGTGTGCA--  58

Structural Similarity
I Factor 58nt stem loop
gurken 64nt stem loop
16
RNA secondary structure prediction
  • Dynamic programming free energy minimization

17
Predicting RNA Secondary Structure
  • According to base-pairing rules alone (A-U, G-C,
    and wobble G-U pairs), a sequence can potentially
    form many different structures
  • An energy value is associated with each possible
    structure
  • Predict the structure with the minimal free
    energy (MFE)

18
Simplifying Assumptions for Structure Prediction
  • RNA folds into one minimum free-energy structure
  • There are no knots (base pairs never cross)
  • The energy of a particular base pair in a
    double-stranded region is sequence independent
  • Neighbors do not influence the energy

19
Sequence alignment as a method to determine
structure
  • Bases pair in order to form backbones and
    determine the secondary structure
  • Aligning bases based on their ability to pair
    with each other gives an algorithmic approach to
    determining the optimal structure

20
Base Pair Maximization Dynamic Programming
Algorithm
S(i,j) is the folding of the subsequence of the
RNA strand from index i to index j which results
in the highest number of base pairs
Simple example: maximizing base pairing. Four cases:
  • Base pair at i and j
  • Unmatched at i
  • Unmatched at j
  • Bifurcation
21
Base Pair Maximization Dynamic Programming
Algorithm
S(i, j - 1)
S(i + 1, j)
  • Alignment Method
  • Align RNA strand to itself
  • Score increases for feasible base pairs
  • Each score independent of overall structure
  • Bifurcation adds extra dimension

Initialize first two diagonal arrays to 0
Fill in squares sweeping diagonally
Bases cannot pair, similar to unmatched alignment
Bases can pair, similar to matched alignment
Dynamic Programming possible paths
S(i + 1, j - 1) + 1
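A minimal sketch of base pair maximization, assuming the four cases named on these slides: pair i with j, leave i unmatched, leave j unmatched, or bifurcate at some k. The function name and the `min_loop` parameter are additions for illustration and are not part of the slides; with the default `min_loop=0` any two positions may pair, as in the simple scheme.

```python
# Allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_base_pairs(seq, min_loop=0):
    """Return the maximum number of nested base pairs for an RNA sequence."""
    n = len(seq)
    if n == 0:
        return 0
    # S[i][j] = best score for subsequence i..j. The diagonal (i == j) and the
    # cells just below it (i == j + 1) stay 0, as in the initialization step.
    S = [[0] * n for _ in range(n)]
    for span in range(1, n):              # sweep diagonal by diagonal
        for i in range(n - span):
            j = i + span
            best = max(S[i + 1][j],       # i unmatched
                       S[i][j - 1])       # j unmatched
            if j - i - 1 >= min_loop and (seq[i], seq[j]) in PAIRS:
                best = max(best, S[i + 1][j - 1] + 1)   # i pairs with j
            for k in range(i + 1, j):     # bifurcation: S(i,k) + S(k+1,j)
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best
    return S[0][n - 1]


print(max_base_pairs("GGGAAACCC"))  # 3
```

The innermost loop over k is what makes the algorithm cubic in sequence length. The score counts pairs only, so it says nothing about which of several equally scored structures is most stable.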
22
Base Pair Maximization - Drawbacks
  • Base pair maximization will not necessarily lead
    to the most stable structure
  • May create structure with many interior loops or
    hairpins which are energetically unfavorable
  • Comparable to aligning sequences with scattered
    matches not biologically reasonable

23
Trouble with Pseudoknots
  • Pseudoknots cause a breakdown in the Dynamic
    Programming Algorithm.
  • In order to form a pseudoknot, checks must be
    made to ensure a base is not already paired; this
    breaks down the recurrence relations
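One way to see the problem: the recurrence treats the region inside a base pair and the region outside it as independent subproblems, which only holds if no pair crosses another. The sketch below checks a list of pairs for such crossings; the function name and the example pairs are hypothetical.

```python
from itertools import combinations


def has_pseudoknot(pairs):
    """Return True if any two base pairs (i, j) and (k, l) cross: i < k < j < l."""
    ordered = sorted(tuple(sorted(p)) for p in pairs)
    for (i, j), (k, l) in combinations(ordered, 2):
        # After sorting, i <= k, so a crossing means k lies inside (i, j)
        # while its partner l lies outside.
        if i < k < j < l:
            return True
    return False


print(has_pseudoknot([(0, 9), (1, 8), (2, 7)]))  # False: nested hairpin
print(has_pseudoknot([(0, 5), (3, 8)]))          # True: the pairs cross
```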

24
Energy Minimization
  • Thermodynamic Stability
  • Estimated using experimental techniques
  • Theory: the most stable structure is the most likely
  • No pseudoknots due to algorithm limitations
  • Uses Dynamic Programming alignment technique
  • Attempts to maximize the score taking into
    account thermodynamics
  • MFOLD and ViennaRNA

25
Thermodynamics
  • Gibbs Free Energy, G
  • Describes the energetics of molecules in aqueous
    solution. The change in free energy, ΔG, for a
    chemical process, such as nucleic acid folding,
    can be used to determine the direction of the
    process
  • ΔG = 0: equilibrium
  • ΔG > 0: unfavorable process
  • ΔG < 0: favorable process
  • Thus the natural tendency for biomolecules in
    solution is to minimize free energy of the entire
    system (biomolecules + solvent).
  • ΔG = ΔH - TΔS
  • ΔH is the change in enthalpy, ΔS is the change in
    entropy, and T is the temperature in Kelvin.
  • Molecular interactions, such as hydrogen bonds,
    van der Waals and electrostatic interactions
    contribute to the ΔH term. ΔS describes the
    change of order of the system.
  • Thus, both molecular interactions as well as the
    order of the system determine the direction of a
    chemical process.
  • For any nucleic acid solution, it is extremely
    difficult to calculate the free energy from first
    principles
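As a small worked example of the free energy relation (change in G = change in H minus T times change in S), the sketch below evaluates it for made-up values. The enthalpy and entropy numbers are hypothetical and chosen only to show the sign convention; they are not measurements for any real RNA.

```python
def gibbs_free_energy(delta_h, delta_s, temperature_k):
    """Free energy change = enthalpy change - T * entropy change (T in Kelvin)."""
    return delta_h - temperature_k * delta_s


def direction(delta_g, tolerance=1e-9):
    """Classify a process by the sign of its free energy change."""
    if abs(delta_g) <= tolerance:
        return "equilibrium"
    return "favorable" if delta_g < 0 else "unfavorable"


# Hypothetical folding event at 37 C (310.15 K):
# enthalpy change -40.0 kcal/mol, entropy change -0.110 kcal/(mol*K).
dg = gibbs_free_energy(-40.0, -0.110, 310.15)
print(f"{dg:.2f} kcal/mol, {direction(dg)}")  # -5.88 kcal/mol, favorable
```

Folding here lowers enthalpy (new hydrogen bonds and stacking) but also lowers entropy (a more ordered chain), so the result depends on temperature: raise T enough and the same process becomes unfavorable.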

26
Free energy computation
[Figure: hairpin structure annotated with the energy contribution of each element, in kcal/mol]
+5.9 4 nt loop
-1.1 mismatch of hairpin
-2.9 stacking
+3.3 1 nt bulge
-2.9 stacking
-1.8 stacking
-0.9 stacking
-1.8 stacking
-2.1 stacking
-0.3 5' dangling
ΔG = -4.6 kcal/mol
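The total on this slide is simply the sum of the individual contributions. The sketch below adds up the values shown, assuming the single unlabeled -0.3 term belongs to the 5' dangling end (the only label without a value).

```python
# Energy contributions from the slide, in kcal/mol.
contributions = [
    ("4 nt hairpin loop", 5.9),
    ("mismatch of hairpin", -1.1),
    ("stacking", -2.9),
    ("1 nt bulge", 3.3),
    ("stacking", -2.9),
    ("stacking", -1.8),
    ("stacking", -0.9),
    ("stacking", -1.8),
    ("stacking", -2.1),
    ("5' dangling end (assumed label for -0.3)", -0.3),
]

total = sum(energy for _, energy in contributions)
print(f"{total:.1f} kcal/mol")  # -4.6 kcal/mol
```

The loop and the bulge are the only positive (destabilizing) terms; the stacked base pairs supply the stabilizing energy that makes the overall fold favorable.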
27
Adding Complexity to Energy Calculations
  • Stacking energy: we assign negative energies to
    the stacking between adjacent base pairs.
  • Energy is influenced by the previous base pair
    (not by the base pairs further down).
  • These energies are estimated experimentally from
    small synthetic RNAs.
  • Positive energy - added for destabilizing regions
    such as bulges, loops, etc.
  • More than one structure can be predicted

28
Energy Minimization Drawbacks
  • Compute only one optimal structure
  • Usual drawbacks of purely mathematical approaches
  • Similar difficulties in other algorithms
  • Protein structure
  • Exon finding

29
Prediction Tools based on Energy Calculation
  • Fold, Mfold
  • Zuker & Stiegler (1981) Nuc. Acids Res. 9:
    133-48
  • Zuker (1989) Science 244: 48-52
  • RNAfold
  • Vienna RNA secondary structure server
  • Hofacker (2003) Nuc. Acids Res. 31: 3429-31

30
Mfold: Multiple Folding
  • Original (1980) computed one single minimum
    energy folding of RNA
  • Multiple Folding algorithm - Given RNA
  • Predict min. free energy G
  • Given a set of possible folds F1...Fn, calculate
    their free energies H1...Hn
  • Eliminate all folds i with Hi > G + g
  • g = |G| (P/100)
  • P is user defined
  • Compute remaining folds and plot each with all
    base pairs.

http://www.bioinfo.rpi.edu/applications/mfold/
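The elimination step can be sketched as a filter over candidate folds. This is not Mfold's code: the function name and the fold energies are hypothetical, and the window is assumed to be P percent of the magnitude of the minimum free energy, so that folds slightly less stable than the optimum are kept.

```python
def within_energy_window(fold_energies, percent):
    """Keep folds whose energy is within `percent` % of the minimum free energy.

    fold_energies: dict mapping a fold name to its free energy (kcal/mol).
    """
    g_min = min(fold_energies.values())        # minimum free energy G
    window = abs(g_min) * percent / 100.0      # allowed increment g
    return {name: h for name, h in fold_energies.items() if h <= g_min + window}


# Hypothetical candidate folds for one sequence.
folds = {"F1": -10.0, "F2": -9.6, "F3": -8.0}
print(within_energy_window(folds, percent=5))  # {'F1': -10.0, 'F2': -9.6}
```

A larger P widens the window and returns more suboptimal folds to inspect.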
31
Submitting RNA to MFOLD
Paste your sequence
Use default parameters. Scroll wayyyy down and
hit "Fold RNA".
32
Tools Features
  • Sub-optimal structures
  • - Provide solutions within a specific energy
    range.
  • Constraints
  • - Regions known experimentally to be
    single/double stranded can be defined.
  • Statistical significance
  • - Currently lacking in energy-based methods
  • A method was recently suggested to estimate a
    significant, stable, and conserved fold in aligned
    sequences (Washietl and Hofacker 2004)
  • Support by compensatory mutations.

33
Searching databases for secondary structures
  • Genomic or mRNA

34
Compensatory substitutions
Expect areas of base pairing in tRNA to be
covarying between various species
Base pairing creates same stable tRNA structure
in organisms
Mutation in one base makes pairing less favorable
and breaks down structure
Covariation ensures ability to base pair is
maintained and RNA structure is conserved
35
Evolutionary conservation of RNA molecules can be
revealed by identification of compensatory
mutations
[Figure: aligned stem-loop sequences in which compensatory mutations preserve base pairing]
36
Insight from Multiple Alignment
  • Information from multiple alignment about the
    probability of positions i,j to be base-paired.
  • Conservation: no additional information
  • Consistent mutations (GC → GU): support stem
  • Inconsistent mutations: do not support stem.
  • Compensatory mutations: support stem.
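The four categories can be turned into a rough classifier for a pair of alignment columns i and j. The sketch below is an illustration under stated assumptions: the first sequence is taken as the reference, gaps are not handled, a change at one position that keeps pairing counts as consistent, and a change at both positions that keeps pairing counts as compensatory. The function name is hypothetical.

```python
# Allowed pairs: Watson-Crick plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def classify_column_pair(col_i, col_j):
    """Classify two alignment columns (one character per sequence) as a candidate stem pair."""
    if len(col_i) != len(col_j) or not col_i:
        raise ValueError("columns must be non-empty and of equal length")
    observed = list(zip(col_i.upper(), col_j.upper()))
    if any(pair not in PAIRS for pair in observed):
        return "inconsistent"       # at least one sequence cannot pair here
    reference = observed[0]
    if all(pair == reference for pair in observed):
        return "conserved"          # no mutation, so no additional information
    for a, b in observed:
        if a != reference[0] and b != reference[1]:
            return "compensatory"   # both bases changed, pairing kept
    return "consistent"             # one base changed, pairing kept


print(classify_column_pair("GGG", "CUC"))  # consistent   (GC -> GU)
print(classify_column_pair("GAG", "CUC"))  # compensatory (GC -> AU)
print(classify_column_pair("GAG", "CCC"))  # inconsistent (A cannot pair with C)
```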

37
RNA families
  • Rfam: general non-coding RNA database
  • 379 families annotating 280,000 regions

http://www.sanger.ac.uk/Software/Rfam/
Includes many families of non-coding RNAs and
functional motifs, as well as their alignment and
secondary structures
38
An example of an RNA family: miR-1 microRNAs
39
Summary
  • MFOLD and other RNA secondary structure
    prediction tools rarely give the right answer
    first (or at all)
  • Too many possible structures in the low energy
    neighbourhood
  • Can be used as a first-pass tool
  • Eyeball key conserved motifs
  • Collect sequences to build a consensus
  • Often need to adjust parameters
  • Use prior knowledge to force base pairing
  • Motif-searching tools can be used to identify
    conserved secondary structure motifs in a
    sequence database
  • Retrieves more results than sequence-based
    searches

40
Next class: Exam 2. What you should know
  • Test mostly multiple choice and short answer
  • 100 points

41
Next class: Exam 2. What you should know
  • Gene finding in prokaryotes
  • How are genes located in prokaryotes?
  • How are basic gene-finding systems built? (rule,
    content, extrinsic evidence, pattern-based,
    similarity)
  • Gene finding in eukaryotes
  • How does gene structure differ between eukaryotes
    and prokaryotes?
  • How do HMMs work (in general)?

42
Next class: Exam 2. What you should know
  • Alternative splicing
  • How are genes alternatively spliced?
  • What are the evolutionary advantages of having an
    alternative splicing system?
  • How would a microarray detect alternative splice
    variants?

43
Next class: Exam 2. What you should know
  • Microarray analysis
  • Technology for measuring transcription
  • Image processing: what's done, why, and
    advantages/disadvantages
  • Normalization: what is it, what kinds of data
    are normalized, what kinds of methods are used
    for normalization?

44
Next class: Exam 2. What you should know
  • Microarray analysis
  • Clustering: goals and methods
  • Analysis of gene lists in terms of their
    biological meaning/significance
  • Genetic networks, what are they and how are they
    being approximated computationally?

45
Next class: Exam 2. What you should know
  • RNA secondary structure
  • How does one identify secondary structure?
  • General strategies for trying to calculate
    secondary structures

46
For next time
  • Exam 2: good luck!
  • Homework 5 due

47
Base Pair Maximization Dynamic Programming
Algorithm
  • Alignment Method
  • Align RNA strand to itself
  • Score increases for feasible base pairs
  • Each score independent of overall structure
  • Bifurcation adds extra dimension

Reminder: for all k, S(i,k) + S(k + 1, j)
k = 0: bifurcation max in this case is S(i,k) + S(k + 1, j)
Initialize first two diagonal arrays to 0
Fill in squares sweeping diagonally
Bases cannot pair, similar to unmatched alignment
Bases can pair, similar to matched alignment
Dynamic Programming possible paths
Bifurcation: add values for all k