Structure Prediction - PowerPoint PPT Presentation

Slides: 30
Provided by: csF8
Learn more at: https://cs.fit.edu

1
Structure Prediction
  • dmitra

2
Methods
  • Ab initio
  • Heuristics
  • Machine learning
  • Homology modeling
  • Threading

3
RNA Structure Prediction: Ab initio
  • Sequence over the alphabet A, C, G, U
  • Complementary bases attract and form base pairs,
    which minimizes energy
  • We are not interested in the overall energy of the
    sequence, just in the process of minimization
  • The bare linear sequence, with zero base pairs,
    has energy = 0
  • The physics is embedded in the free-energy
    parameter/function
  • Minimizing the energy is the objective

4
RNA Structure Prediction: Knot-free
  • Knot-free assumption
  • Knot: base pairs (i, j) and (k, l) where i < k < j < l
  • Knot-freeness makes the structure graph planar and
    makes a DP algorithm feasible
  • Base pairs are either disjoint or embedded in each other
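
The knot-free condition can be checked directly on a list of base pairs. The following is a minimal sketch, assuming 0-based positions and the crossing definition of a knot (two pairs whose endpoints interleave); the function name `is_knot_free` and the sample pair lists are illustrative only.

```python
from itertools import combinations

def is_knot_free(pairs):
    """Return True if no two base pairs cross, i.e. none satisfy i < k < j < l."""
    # Normalize each pair so the smaller index comes first.
    norm = [tuple(sorted(p)) for p in pairs]
    for (i, j), (k, l) in combinations(norm, 2):
        # Order the two pairs by their left endpoint before testing.
        if i > k:
            i, j, k, l = k, l, i, j
        if i < k < j < l:
            return False
    return True

nested = [(0, 9), (1, 8), (2, 7)]   # pairs embedded in each other
disjoint = [(0, 3), (4, 8)]         # pairs side by side
crossing = [(0, 5), (3, 8)]         # 0 < 3 < 5 < 8: a knot

print(is_knot_free(nested), is_knot_free(disjoint), is_knot_free(crossing))
```

Nested and disjoint pairs pass the check; only interleaved pairs are rejected, which is what lets the DP treat each enclosed substring independently.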

5
RNA Structure Prediction: Principle of optimality
  • Assumption 1: base pairs do not affect each
    other's energy
  • One can then add up the energy contributions of all
    base pairs in a string and check which configuration
    produces the lowest energy
  • The number of configurations is exponential
  • A further assumption is needed

6
RNA Structure Prediction: DP Algorithm
  • Assume the energy of each component can be
    calculated independently
  • a(r, k): free energy of base pair (r, k), where r and
    k are from {A, C, G, U}
  • a is zero for self-pairing (impossible)

7
RNA Structure Prediction: DP Algorithm
  • E(S_{i,j}) = min of:
  • E(S_{i+1,j-1}) + a(r_i, r_j), when i and j pair
  • min over k of [E(S_{i,k-1}) + E(S_{k,j})], when j pairs
    with k, i < k < j
  • Compute the (n x n) matrix over i and j bottom up,
    for j - i = 0, j - i = 1, j - i = 2, ...
  • Complexity: O(n^3)
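
A bottom-up table for this recurrence can be sketched as follows. This is an illustration under stated assumptions, not the presenter's code: the pair energies in `PAIR_ENERGY` are made-up placeholder values, positions are 0-based, empty substrings cost 0, and the split index k is allowed to reach j so that the case of j staying unpaired is covered.

```python
# Hypothetical pair energies (placeholders, not measured parameters).
PAIR_ENERGY = {
    frozenset("GC"): -3.0,
    frozenset("AU"): -2.0,
    frozenset("GU"): -1.0,
}

def pair_energy(x, y):
    """a(x, y): zero for self-pairing and for any non-complementary pair."""
    return PAIR_ENERGY.get(frozenset((x, y)), 0.0)

def min_energy(seq, a=pair_energy):
    """Minimum energy of seq when base pairs contribute independently."""
    n = len(seq)
    if n == 0:
        return 0.0
    # E[i][j] holds the minimum energy of seq[i..j]; single bases cost 0.
    E = [[0.0] * n for _ in range(n)]

    def e(i, j):
        # Empty substrings (i > j) contribute nothing.
        return E[i][j] if i <= j else 0.0

    for span in range(1, n):              # span = j - i, filled bottom up
        for i in range(n - span):
            j = i + span
            best = e(i + 1, j - 1) + a(seq[i], seq[j])      # i pairs with j
            for k in range(i + 1, j + 1):                   # split at k
                best = min(best, e(i, k - 1) + e(k, j))
            E[i][j] = best
    return E[0][n - 1]

print(min_energy("GGCC"))
```

The two nested loops over (i, j) and the inner loop over k give the O(n^3) running time stated on the slide.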

8
RNA Structure Prediction: Relaxing the assumptions
  • Consider some special energy functions, beyond
    the base-pairing energies a(r, k)
  • This means distinguishing different types of base pairings
  • Allows a more practical topology
9
RNA Structure Prediction: Loops
  • Say there is a base pair at (i, j), and i < u < v < w < j
  • v is accessible from base pair (i, j) if there is
    no base pair at (u, w)
  • The loop is the set of bases accessible from base pair (i, j)
  • Note: still no knots
  • Examples of loops: p. 249
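
The accessibility definition can be made concrete with a short sketch. It assumes 0-based positions, a knot-free list of pairs, and that a base v is accessible from (i, j) when no pair (u, w) with i < u < v < w < j encloses it; the function name `loop_bases` is illustrative.

```python
def loop_bases(closing, pairs):
    """Bases strictly between i and j that no inner pair (u, w) encloses."""
    i, j = closing
    return [
        v
        for v in range(i + 1, j)
        if not any(i < u < v < w < j for (u, w) in pairs)
    ]

pairs = [(0, 10), (2, 6)]
print(loop_bases((0, 10), pairs))  # bases 3, 4, 5 are hidden inside (2, 6)
print(loop_bases((2, 6), pairs))   # hairpin: everything inside is accessible
```

Note that the endpoints of an inner pair (here 2 and 6) count as accessible under this definition; only the bases strictly inside the inner pair are excluded.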

10
RNA Structure Prediction: Energy over loops
  • Say base pair (i, j) closes a loop
  • S_{i+1,j-1} may not be in its minimum-energy
    configuration
  • This is because the energy of S_{i+1,j-1} plus the free
    energy a(r_i, r_j) may be less than that of the min-energy
    configuration of the string (i+1 to j-1) without the
    base pair at (i, j)
  • This interaction was ignored at the previous
    assumption level
  • Dynamic programming can still be done if we
    explicitly specify the energy parameters

11
RNA Structure Prediction: Energy over loops
  • E(S_{i,j}) = min of:
  • E(S_{i+1,j}), when i is not paired
  • E(S_{i,j-1}), when j is not paired
  • min over k of [E(S_{i,k-1}) + E(S_{k,j})], when i and j
    are paired but not with each other, i < k < j
  • E(L_{i,j}), when (i, j) is a base pair; any of the special
    structures may appear within it, and this case subsumes
    the first formula of the previous assumption level

12
RNA Structure Prediction: More assumptions
  • Disregard free energies that do not belong to any
    loop
  • The sum of the component energies is the final
    energy of the string: no interaction between
    components
  • Only the 4 types of loops on p. 249 are used for E(L_{i,j})
    (more can be added if their energy
    parameterization is known)

13
RNA Structure Prediction: Free energies for 4 loops
  • Hairpin loop of size k: xi(k)
  • Additional stabilizing energy for two adjacent
    base pairs (in addition to a(r, k)): eta, a constant
  • Destabilizing energy for a bulge of size k:
    beta(k)
  • Destabilizing energy for an interior loop of size k:
    gamma(k)

14
RNA Structure Prediction: E(L_{i,j})
  • Hairpin: a(r_i, r_j) + xi(j-i+1)
  • Stacked pair: a(r_i, r_j) + eta + E(S_{i+1,j-1})
  • Bulge on i: min over k of [a(r_i, r_j) + beta(k) +
    E(S_{i+k+1,j-1})], k >= 1
  • Bulge on j: min over k of [a(r_i, r_j) + beta(k) +
    E(S_{i+1,j-k-1})], k >= 1
  • Interior loop: min over k1, k2 of [a(r_i, r_j) + gamma(k1+k2) +
    E(S_{i+k1+1,j-k2-1})], k1, k2 >= 1
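
The five cases for a loop closed by (i, j) can be sketched as one function that takes the minimum over all of them. This is only an illustration: every numeric parameter below is a hypothetical placeholder, positions are 0-based, the hairpin size argument j - i + 1 is taken as read from the slide, and `E` is any callable returning the already-computed energy of substring p..q.

```python
# All numeric values are hypothetical placeholders, not measured parameters.
def a(x, y):
    """Base-pair free energy; only G-C is stabilizing in this toy table."""
    return -3.0 if {x, y} == {"G", "C"} else 0.0

def xi(k):        # hairpin loop of size k (destabilizing)
    return 4.0

ETA = -1.0        # extra stabilizing energy of a stacked pair

def beta(k):      # bulge of size k (destabilizing)
    return 2.0 + k

def gamma(k):     # interior loop of size k (destabilizing)
    return 3.0 + k

def loop_energy(seq, i, j, E):
    """Minimum over the hairpin, stacked-pair, bulge and interior-loop cases."""
    pair = a(seq[i], seq[j])
    inner = j - i - 1                       # bases strictly between i and j
    options = [pair + xi(j - i + 1)]        # hairpin
    if inner >= 2:                          # room for an inner pair (i+1, j-1)
        options.append(pair + ETA + E(i + 1, j - 1))
    for k in range(1, inner - 1):           # keep at least 2 bases inside
        options.append(pair + beta(k) + E(i + k + 1, j - 1))   # bulge on i
        options.append(pair + beta(k) + E(i + 1, j - k - 1))   # bulge on j
    for k1 in range(1, inner):
        for k2 in range(1, inner - 1 - k1):
            options.append(
                pair + gamma(k1 + k2) + E(i + k1 + 1, j - k2 - 1)
            )
    return min(options)

flat = lambda p, q: 0.0                     # stand-in for E(S_{p,q})
print(loop_energy("GAAAC", 0, 4, flat))
```

In a full implementation `E` would be the DP table itself, filled in order of increasing substring length so that every inner substring is available when (i, j) is processed.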

15
RNA Structure Prediction: Complexity
  • O(n^2) table entries
  • For each entry:
  • First 2 formulae: O(1) each, leading to O(n^2)
  • Third formula: O(n), leading to O(n^3)
  • 4.1 (E(L) hairpin): O(1), leading to O(n^2)
  • 4.2: O(1), leading to O(n^2)
  • 4.3: O(n), running over k, leading to O(n^3)
  • 4.4: O(n), running over k, leading to O(n^3)
  • 4.5: O(n^2), running over k1 and k2, leading to O(n^4)
  • Final complexity, from 4.5: O(n^4)

16
Protein Threading
  • Interactions in proteins are between 20x20
    residue types, as opposed to at most 4x4 nucleotides in RNA
  • Residue interactions are quite non-local, causing
    much more structural complexity
  • Proteins have frequent loops (helices are loops)
  • So ab initio prediction is extremely difficult

17
Protein Threading
  • The number of protein folds is small (1,000 for
    20,000 proteins)
  • Threading: map the target sequence onto a
    template fold
  • Threading is an alignment problem (Torda, Fig 1)
  • Find the fold to which the target aligns optimally
    (minimum energy function)
  • Needs basic scoring functions, as in sequence
    alignment

18
Protein Threading: Number of folds
  • The more folds in the database, the more time it
    takes to find the correct template
  • Scoring functions for threading are quite
    imperfect, so more templates need to be available
    (contradictory requirements)

19
Protein Threading: Scoring functions
  • A full force field is not necessarily ideal
  • It involves dynamics between molecules: stretch,
    torsion, etc.
  • These are unimportant for a static alignment

20
Protein Threading: Scoring functions
  • A scoring function could be between residues of
    the same sequence that come close to each other
    in the alignment
  • Torda, Fig 5
  • Example scoring function (free energy)
  • For a pair of residues A and B to be at distance r
    (Torda, p7)
  • G(AB) = -kT ln(rho_{AB}(r) / rho^0_{AB}(r)),
  • rho_{AB}(r) is the probability of A and B being at
    distance r,
  • rho^0_{AB}(r) is the probability of that occurring at
    random (k, T as usual)
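
The pair score can be evaluated numerically as a log-odds ratio. The sketch below assumes the usual Boltzmann sign convention (a pair seen more often than random gets a negative, favorable energy), which is an assumption if the slide's formula is read literally, and it assumes energies in kcal/mol at 300 K; the function name `pair_free_energy` is illustrative.

```python
import math

K_B = 0.0019872   # Boltzmann constant in kcal/(mol*K)

def pair_free_energy(p_observed, p_reference, temperature=300.0):
    """Log-odds score: observed probability of A, B at distance r vs. random."""
    return -K_B * temperature * math.log(p_observed / p_reference)

# A pair seen twice as often as random is favorable (negative energy),
# one seen half as often is unfavorable (positive energy).
print(pair_free_energy(0.2, 0.1))
print(pair_free_energy(0.05, 0.1))
```

In practice the two probabilities would be estimated from distance counts in proteins of known structure, binned by residue-pair type and distance.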

21
Protein Threading: Scoring functions
  • Probabilities are collected from PDB proteins
    with known structure
  • Different threading schemes use different scoring
    functions, but most are derived from the PDB

22
Protein Threading: Scoring functions
  • Example (Setubal-Meidanis, p257)
  • G1(i, t_i): score for placing the i-th residue of the
    sequence at position t_i in the fold
  • G2(i, j, t_i, t_j): simultaneous placement of i and j,
    for i < j
  • Each t_i is constrained to a range, say b_i < t_i < e_i
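
One way to combine the two terms is to add the singleton scores and the pairwise scores over all i < j. The slide does not state how they are combined, so the plain sum below is an assumption, as are the strict range check, the function name `threading_score`, and the toy `g1`/`g2` functions.

```python
def threading_score(t, g1, g2, bounds):
    """Total score of placement t, or infinity if a range constraint fails."""
    n = len(t)
    for i, (b, e) in enumerate(bounds):
        if not (b < t[i] < e):
            return float("inf")          # placement violates b_i < t_i < e_i
    total = sum(g1(i, t[i]) for i in range(n))
    total += sum(
        g2(i, j, t[i], t[j]) for i in range(n) for j in range(i + 1, n)
    )
    return total

# Toy scoring functions (hypothetical): residue i prefers position 2*i,
# and placing a later residue at or before an earlier one is penalized.
g1 = lambda i, ti: abs(ti - 2 * i)
g2 = lambda i, j, ti, tj: 1.0 if tj <= ti else 0.0
bounds = [(-1, 5)] * 3

print(threading_score([0, 2, 4], g1, g2, bounds))
print(threading_score([0, 2, 1], g1, g2, bounds))
```

Any search method (dynamic programming, Monte Carlo, branch and bound) can then be written against this single evaluation function.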

23
Protein Threading
  • Optimization is not only over placements, but also
    over the multiple folds in the database
  • Accuracy is very sensitive to alignment errors

24
Protein Threading: Dynamic programming
  • Advantage/disadvantage of DP is that it is
    deterministic
  • Problem: adjacency is hard to define in 3D

25
Protein Threading: Dynamic programming
  • DP tries out different combinations of adjacent
    residues on different parts of a template (Torda,
    Fig 5c: adjacency comes from the template sequence)
  • Start with a smaller number of elements and build
    up to the full sequence
  • Alternative approach: start by placing each
    residue at one of its possible positions and
    see where the next residue should go; continue
    residue by residue

26
Protein Threading: Probabilistic algorithm
  • Monte Carlo simulation: randomly throw residues
    at positions on the fold and check the aggregate scoring
    function
  • Simulated annealing: gradually move residues to
    optimize, stochastically making random shifts to
    avoid local optima
  • Time consuming, and the result is non-deterministic
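
The annealing idea can be sketched generically for any placement score. This is a toy version under stated assumptions: positions are integers in 0..n_positions-1, a move shifts one residue by one position, the temperature follows a geometric cooling schedule, and all parameter defaults and the name `anneal` are illustrative.

```python
import math
import random

def anneal(score, n_residues, n_positions, steps=2000,
           t_start=5.0, t_end=0.01, seed=0):
    """Minimize score(placement) by random single-residue shifts."""
    rng = random.Random(seed)               # seeded only to make runs repeatable
    current = [rng.randrange(n_positions) for _ in range(n_residues)]
    current_score = score(current)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(steps - 1, 1))
        candidate = list(current)
        idx = rng.randrange(n_residues)
        shifted = candidate[idx] + rng.choice((-1, 1))
        candidate[idx] = min(n_positions - 1, max(0, shifted))
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature drops (escapes local optima).
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = candidate, current_score + delta
            if current_score < best_score:
                best, best_score = list(current), current_score
    return best, best_score

target = [3, 1, 4, 1, 5]
toy_score = lambda t: sum((x - y) ** 2 for x, y in zip(t, target))
print(anneal(toy_score, n_residues=5, n_positions=10))
```

Different seeds can return different placements, which is the non-determinism noted on the slide; a real threading score would replace `toy_score`.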

27
Protein Threading: Branch and bound
  • In the worst case, try all possible alignments,
    but prune non-useful branches of the search space
    using some bounding function
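
A small branch-and-bound search shows how a bounding function prunes alignments. The setting is a simplified assumption, not the scheme from any cited source: residues are placed at strictly increasing template positions, only a singleton cost `cost[i][p]` is used, and the bound adds the cheapest possible cost of each remaining residue while ignoring the ordering constraint.

```python
def branch_and_bound(cost):
    """Best order-preserving placement; cost[i][p] scores residue i at position p."""
    n, m = len(cost), len(cost[0])
    # tail[i]: optimistic (lower-bound) cost of placing residues i..n-1.
    tail = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        tail[i] = tail[i + 1] + min(cost[i])
    best = {"score": float("inf"), "placement": None}

    def extend(i, start, partial, placement):
        if i == n:                                   # complete alignment
            if partial < best["score"]:
                best["score"], best["placement"] = partial, list(placement)
            return
        # Leave enough positions for the residues still to be placed.
        for p in range(start, m - (n - i) + 1):
            new_partial = partial + cost[i][p]
            if new_partial + tail[i + 1] >= best["score"]:
                continue                             # prune: cannot beat best
            placement.append(p)
            extend(i + 1, p + 1, new_partial, placement)
            placement.pop()

    extend(0, 0, 0.0, [])
    return best["score"], best["placement"]

cost = [
    [5, 1, 9, 9],
    [1, 9, 2, 9],
    [9, 9, 9, 3],
]
print(branch_and_bound(cost))
```

Because the bound never overestimates the remaining cost, pruning cannot discard the optimum; a tighter bound prunes more but costs more to compute.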

28
Protein Threading: Search on folds
  • Divide and conquer over the space of folds
  • Assumption: folds can be ordered by their
    goodness for the target protein
  • Example: Setubal-Meidanis, p258

29
Protein Threading: Future
  • Slow
  • Subsumed by ab initio methods of IBM Blue Gene type
    projects
  • De novo technique using linear programming (Xu
    and Li, 2003)
  • Threading techniques are useful not only for
    structure prediction but also for the fold recognition
    problem: no alignment, just find the
    template (fold suggests function)