Structure Prediction - PowerPoint PPT Presentation

About This Presentation

Title:

Structure Prediction

Description:

Structure Prediction dmitra – PowerPoint PPT presentation

Slides: 30

Provided by: fit63

Learn more at: https://cs.fit.edu

Transcript and Presenter's Notes

1
Structure Prediction

dmitra

2
Methods

Ab initio
Heuristics
Machine learning
Homology modeling
Threading

3
RNA Structure Prediction: Ab initio

Sequence over the alphabet A, C, G, U
Complementary bases attract and form base pairs, which
minimizes the free energy
We are not interested in the overall energy of the
sequence, only in the process of minimization
Reference state: the bare linear sequence, with zero base pairs,
has energy = 0
Physics is embedded within the free-energy
parameters/function
Minimization of energy is the objective

4
RNA Structure Prediction: Knot-free

Knot-free assumption
Knot: base pairs (i, j) and (k, l) where i < k < j < l
The knot-free assumption yields a planar graph and makes a DP
algorithm feasible
Any two base pairs are either disjoint or nested one inside the other
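The knot-free condition can be checked directly from the definition: a structure is knot-free exactly when no two pairs (i, j) and (k, l) satisfy i < k < j < l. A minimal sketch, assuming base pairs are given as tuples of base indices; the function name `is_knot_free` is hypothetical:

```python
def is_knot_free(pairs):
    """Return True if no two base pairs cross, i.e. no (i, j), (k, l) with i < k < j < l."""
    norm = [tuple(sorted(p)) for p in pairs]   # store each pair as (smaller, larger) index
    for i, j in norm:
        for k, l in norm:
            if i < k < j < l:                  # (k, l) starts inside (i, j) but ends outside
                return False
    return True


print(is_knot_free([(1, 10), (2, 9), (12, 20)]))  # nested and disjoint pairs
print(is_knot_free([(1, 10), (5, 15)]))           # crossing pairs form a knot
```

The quadratic double loop mirrors the definition; it is meant for illustration, not speed.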

5
RNA Structure Prediction: Principle of optimality

Assumption 1: base pairs do not affect each
other's energy
Then one can add up the energies of all base
pairs in a string and check which configuration
produces the lowest energy
But the number of configurations is exponential
Need a further assumption
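To see how quickly exhaustive enumeration becomes hopeless, the sketch below counts the knot-free structures on n bases under the simplifying (hypothetical) assumption that any two bases may pair. The last base is either unpaired or paired with some earlier base, which splits the rest into an outside part and an inside part; the resulting counts are the Motzkin numbers, which grow exponentially:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_structures(n):
    """Number of knot-free structures on n bases, assuming every pair of bases may pair."""
    if n <= 1:
        return 1
    total = count_structures(n - 1)          # last base unpaired
    for k in range(n - 1):                   # last base paired with base k+1 (1-based)
        # outside part: k bases before the partner; inside part: n-2-k bases between them
        total += count_structures(k) * count_structures(n - 2 - k)
    return total


print([count_structures(n) for n in range(9)])
```

Real sequences only allow complementary pairs, which lowers the counts but, under this model, they still grow far too fast to enumerate.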

6
RNA Structure Prediction: DP Algorithm

Assume the energy of each component can be
calculated independently
a(r,k): free energy of base pair (r,k), where r,
k are from {A, C, G, U}
a is zero for self-pairing (an impossible pair)

7
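RNA Structure Prediction: DP Algorithm

$$
E(S_{i,j}) = \min \begin{cases}
E(S_{i+1,j-1}) + a(r_i, r_j) & \text{when } i, j \text{ pair} \\
\min_{i \le k < j} \left[ E(S_{i,k}) + E(S_{k+1,j}) \right] & \text{otherwise (split at } k\text{)}
\end{cases}
$$
Compute the n x n matrix over i and j bottom up,
for j - i = 0, 1, 2, ...
Complexity O(n^3)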
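A minimal Python sketch of this recurrence, filling the table by increasing span j - i. The energy table is hypothetical (G-C = -3, A-U = -2, G-U = -1, every other pair 0, in arbitrary units), and, like the simple model above, the sketch imposes no minimum loop size; `a` and `min_energy` are illustrative names:

```python
# Hypothetical base-pair free energies a(r, k), arbitrary units.
# Pairs not listed (including self-pairs such as A-A) get 0.
PAIR_ENERGY = {
    ("G", "C"): -3, ("C", "G"): -3,
    ("A", "U"): -2, ("U", "A"): -2,
    ("G", "U"): -1, ("U", "G"): -1,
}


def a(r, k):
    return PAIR_ENERGY.get((r, k), 0)


def min_energy(seq):
    """Minimum free energy E(S_1n) under the additive base-pair model."""
    n = len(seq)
    if n == 0:
        return 0
    # E[i][j] = E(S_ij); entries with j - i = 0 (single bases) stay 0.
    E = [[0] * n for _ in range(n)]
    for span in range(1, n):              # fill bottom up: j - i = 1, 2, ..., n - 1
        for i in range(n - span):
            j = i + span
            inner = E[i + 1][j - 1] if i + 1 <= j - 1 else 0   # empty inside -> 0
            best = inner + a(seq[i], seq[j])                   # i pairs with j
            for k in range(i, j):                              # split into S_ik and S_k+1,j
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E[0][n - 1]


print(min_energy("GGGAAACCC"))
```

The three nested loops (span, i, k) give the O(n^3) running time. For "GGGAAACCC" the three G-C pairs nest inside one another, so the minimum is 3 x (-3) = -9.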

8
RNA Structure Prediction: Relaxing the assumptions

Consider special energy functions other
than just the base-pairing ones a(r,k)
These distinguish different types (contexts) of base pairing
This gives a more realistic topology

9
RNA Structure Prediction: Loops

Say there is a base pair at (i,j)
Base v is accessible from base pair (i,j) if there is
no base pair (u,w) with i < u < v < w < j
The loop closed by (i,j) is the set of bases accessible from it
Note: knots are still excluded
Some loop types: see p. 249
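The accessibility rule translates directly into code. A minimal sketch, assuming the structure is given as a list of base-index pairs; the function name `loop_bases` is hypothetical:

```python
def loop_bases(i, j, pairs):
    """Bases accessible from the base pair (i, j), i.e. the loop that (i, j) closes.

    v (i < v < j) is accessible unless some base pair (u, w) satisfies
    i < u < v < w < j, i.e. v is enclosed by a pair nested inside (i, j).
    """
    norm = (tuple(sorted(p)) for p in pairs)               # store pairs as (u, w), u < w
    inner = [(u, w) for (u, w) in norm if i < u < w < j]   # pairs strictly inside (i, j)
    return [v for v in range(i + 1, j) if not any(u < v < w for (u, w) in inner)]


print(loop_bases(1, 10, [(1, 10), (3, 8), (4, 7)]))
```

In this example the loop closed by (1, 10) consists of bases 2, 3, 8 and 9: bases 3 and 8 close the inner pair, while 2 and 9 are single unpaired bases on either side, so the loop is an interior loop in the sense used below.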

10
RNA Structure Prediction: Energy over loops

Say the base pair (i,j) closes a loop
Then $S_{i+1,j-1}$ need not be in its own minimum-energy
configuration
Because the loop energy contributed by (i,j) depends on how $S_{i+1,j-1}$
is folded, a non-optimal configuration of $S_{i+1,j-1}$ plus that loop energy
may be lower than the min-energy configuration of the string (i+1 to j-1),
which was computed without the base pairing at (i,j)
This interaction was ignored under the previous
assumption
Dynamic programming can still be done if we
explicitly specify the energy parameters

11
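RNA Structure Prediction: Energy over loops

$$
E(S_{i,j}) = \min \begin{cases}
E(S_{i+1,j}) & i \text{ is not paired} \\
E(S_{i,j-1}) & j \text{ is not paired} \\
\min_{i \le k < j} \left[ E(S_{i,k}) + E(S_{k+1,j}) \right] & i \text{ and } j \text{ pair with other bases (split at } k\text{)} \\
E(L_{i,j}) & (i,j) \text{ is a base pair}
\end{cases}
$$
In $E(L_{i,j})$ all the special loop structures may appear inside; it
subsumes the first formula of the previous assumption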

12
RNA Structure Prediction: More assumptions

Disregard free energies that do not belong to any
loop
The final energy of the string is just the sum of the energies of its
components; there is no interaction between
components
Only the 4 loop types of p. 249 are used for $E(L_{i,j})$
(more can be added, if you know their energy
parameterization)

13
RNA Structure Prediction: Free energies for 4 loops

Hairpin loop of size k: zi(k)
Additional stabilizing energy for two adjacent (stacked)
base pairs, in addition to a(r,k): eta, a constant
Destabilizing energy for a bulge of size k:
beta(k)
Destabilizing energy for an interior loop of size k:
gamma(k)

14
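RNA Structure Prediction: $E(L_{i,j})$

$$
\begin{aligned}
\text{Hairpin:} \quad & a(r_i, r_j) + \mathrm{zi}(j - i - 1) \\
\text{Stacked pair:} \quad & a(r_i, r_j) + \eta + E(S_{i+1,j-1}) \\
\text{Bulge on } i\text{:} \quad & \min_{k \ge 1} \left[ a(r_i, r_j) + \beta(k) + E(S_{i+k+1,j-1}) \right] \\
\text{Bulge on } j\text{:} \quad & \min_{k \ge 1} \left[ a(r_i, r_j) + \beta(k) + E(S_{i+1,j-k-1}) \right] \\
\text{Interior loop:} \quad & \min_{k_1, k_2 \ge 1} \left[ a(r_i, r_j) + \gamma(k_1 + k_2) + E(S_{i+k_1+1,j-k_2-1}) \right]
\end{aligned}
$$
$E(L_{i,j})$ is the minimum over these cases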
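The loop-based recurrences can be sketched in Python as two tables: E for $E(S_{i,j})$ and L for $E(L_{i,j})$. All parameter values below (pair energies, eta, zi, beta, gamma, and a minimum hairpin size of 3) are hypothetical and only make the sketch runnable. One deliberate tightening: inside stacks, bulges and interior loops the sketch uses the pair-closed energy L of the inner substring instead of E(S), so that the inner base pair really forms; only complementary pairs may close a loop:

```python
import math

INF = math.inf

# Hypothetical parameters (arbitrary units), chosen only to make the sketch runnable.
PAIR_ENERGY = {("G", "C"): -3.0, ("C", "G"): -3.0, ("A", "U"): -2.0,
               ("U", "A"): -2.0, ("G", "U"): -1.0, ("U", "G"): -1.0}
ETA = -1.0                      # extra stabilizing energy of a stacked pair


def zi(k):                      # hairpin loop of size k; hairpins under 3 bases forbidden
    return INF if k < 3 else 4.0


def beta(k):                    # bulge of size k
    return 2.0 + 0.5 * k


def gamma(k):                   # interior loop with k unpaired bases in total
    return 1.0 + 0.5 * k


def fold_energy(seq):
    n = len(seq)
    E = [[0.0] * n for _ in range(n)]    # E[i][j] = E(S_ij)
    L = [[INF] * n for _ in range(n)]    # L[i][j] = E(L_ij), i.e. (i, j) forced to pair

    def e(i, j):                          # empty or single-base substring has energy 0
        return E[i][j] if i < j else 0.0

    def l(i, j):                          # no pair is possible on fewer than 2 bases
        return L[i][j] if i < j else INF

    for span in range(1, n):              # bottom up by j - i
        for i in range(n - span):
            j = i + span
            if (seq[i], seq[j]) in PAIR_ENERGY:
                a = PAIR_ENERGY[(seq[i], seq[j])]
                best = a + zi(j - i - 1)                                 # hairpin
                best = min(best, a + ETA + l(i + 1, j - 1))              # stacked pair
                for k in range(1, j - i):
                    best = min(best, a + beta(k) + l(i + k + 1, j - 1))  # bulge on i
                    best = min(best, a + beta(k) + l(i + 1, j - k - 1))  # bulge on j
                for k1 in range(1, j - i):
                    for k2 in range(1, j - i - k1):
                        best = min(best, a + gamma(k1 + k2)
                                   + l(i + k1 + 1, j - k2 - 1))          # interior loop
                L[i][j] = best
            E[i][j] = min(e(i + 1, j), e(i, j - 1), L[i][j],             # i unpaired, j unpaired, (i, j) pairs
                          min(e(i, k) + e(k + 1, j) for k in range(i, j)))  # split at k
    return e(0, n - 1)


print(fold_energy("GGGAAACCC"))
```

With these made-up numbers, "GGGAAACCC" folds into a hairpin of three A's closed by (2, 6), stacked under (1, 7) and (0, 8): 1 + (-4) + (-4) = -7. The doubly nested interior-loop scan is what makes the whole routine O(n^4).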

15
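RNA Structure Prediction: Complexity

O(n^2) table entries
Work per entry, and resulting total:
First 2 formulae: O(1), leading to O(n^2)
Third formula: O(n), leading to O(n^3)
4.1 (E(L) hairpin): O(1), leading to O(n^2)
4.2 (stacked pair): O(1), leading to O(n^2)
4.3 (bulge on i): O(n), loop over k, leading to O(n^3)
4.4 (bulge on j): O(n), loop over k, leading to O(n^3)
4.5 (interior loop): O(n^2), loop over k1, k2, leading to O(n^4)
Final complexity, dominated by 4.5: O(n^4)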
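A quick count shows why the interior-loop case (4.5) dominates. Writing d = j - i, an entry with span d scans the pairs $(k_1, k_2)$ with $k_1, k_2 \ge 1$ and $k_1 + k_2 \le d$, of which there are $d(d-1)/2$ (a rough bound: the exact upper limit on $k_1 + k_2$ depends on how the inner substring is required to be non-empty). Summing over the $n - d$ entries of each span:

$$
\sum_{d=1}^{n-1} (n-d)\,\frac{d(d-1)}{2}
\;\le\; \frac{1}{2}\sum_{d=1}^{n-1} (n-d)\,d^{2}
\;\approx\; \frac{1}{2}\int_{0}^{n} (n-x)\,x^{2}\,dx
\;=\; \frac{1}{2}\left(\frac{n^{4}}{3} - \frac{n^{4}}{4}\right)
\;=\; \frac{n^{4}}{24}
$$

So the interior loop alone costs on the order of $n^4/24$ steps, against O(n^3) for the bulge cases.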

16
Protein Threading

Interactions in proteins are among 20 x 20 residue
types, as opposed to at most 4 x 4 nucleotides in RNA
Residue interactions are highly non-local, causing
much more structural complexity
Proteins have frequent loops (helices are loops)
So ab initio prediction is extremely difficult

17
Protein Threading

The number of protein folds is small (1,000 folds for
20,000 proteins)
Threading maps the target sequence onto a
template fold
Threading is an alignment problem (Torda, Fig. 1)
Find the fold to which the target aligns optimally
(minimum energy function)
Needs basic scoring functions, as in sequence
alignment

18
Protein Threading: Number of folds

The more folds in the database, the more time it takes
to find the correct template
But the scoring function for threading is quite
imperfect, so more templates are needed
(contradictory requirements)

19
Protein Threading: Scoring functions

A full force field is not necessarily ideal:
it models molecular dynamics terms such as bond stretching,
torsion, etc.,
which are unimportant for a static alignment

20
Protein Threading: Scoring functions

A scoring function can be defined between residues of
the same sequence that come close to each other
in the alignment (Torda, Fig. 5)
Example scoring function (free energy)
for a pair of residues A and B at distance r
(Torda, p. 7):
$$G_{AB}(r) = -kT \ln \frac{\rho_{AB}(r)}{\rho^{0}_{AB}(r)}$$
where $\rho_{AB}(r)$ is the probability of A and B being at distance r,
$\rho^{0}_{AB}(r)$ is the probability of that occurring at random,
and k, T have their usual meaning (Boltzmann constant, temperature)
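The formula can be evaluated from simple counts per distance bin. A minimal sketch, assuming the observed counts come from known structures and the reference counts from some random model; the function name, the binning, and the toy counts are all hypothetical, and kT defaults to 1 so energies come out in units of kT:

```python
import math


def pair_potential(obs_counts, ref_counts, kT=1.0):
    """G_AB(r) = -kT * ln(rho_AB(r) / rho0_AB(r)) for each distance bin r.

    obs_counts[r]: how often residues A and B were seen in distance bin r
    ref_counts[r]: how often that bin occurs in the random (reference) model
    Bins with a zero count on either side get no estimate (None).
    """
    obs_total = sum(obs_counts)
    ref_total = sum(ref_counts)
    potential = []
    for o, f in zip(obs_counts, ref_counts):
        if o == 0 or f == 0:
            potential.append(None)
            continue
        rho = o / obs_total          # observed probability of this distance bin
        rho0 = f / ref_total         # probability of this bin at random
        potential.append(-kT * math.log(rho / rho0))
    return potential


print(pair_potential([30, 50, 20], [20, 50, 30]))
```

Distances seen more often than chance (first bin) get a negative, favorable energy; distances seen less often (last bin) get a positive one.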

21
Protein Threading Scoring functions

Probabilities are collected from PDB proteins
with known structures
Different threading schemes use different scoring
functions, but most are derived from the PDB

22
Protein Threading Scoring functions

Example (Setubal-Meidanis, p. 257)
$G_1(i, t_i)$: score for placing the i-th residue of the sequence at
position $t_i$ in the fold
$G_2(i, j, t_i, t_j)$: score for the simultaneous placement of residues i and j,
for i < j
Each placement is constrained to a range, say $b_i < t_i < e_i$
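The total score of one candidate threading combines these terms. A minimal sketch, assuming G1 and G2 are supplied as Python callables and the ranges as (b_i, e_i) tuples; `threading_score` and the toy scoring functions are hypothetical:

```python
def threading_score(placement, G1, G2, bounds):
    """Total score of placing residue i at template position placement[i].

    G1(i, t): score for residue i at position t.
    G2(i, j, ti, tj): score for the simultaneous placement of residues i < j.
    bounds[i] = (b_i, e_i): the placement must satisfy b_i < t_i < e_i.
    Returns None if some residue lies outside its allowed range.
    """
    n = len(placement)
    for i, t in enumerate(placement):
        b, e = bounds[i]
        if not (b < t < e):
            return None
    total = sum(G1(i, placement[i]) for i in range(n))
    total += sum(G2(i, j, placement[i], placement[j])
                 for i in range(n) for j in range(i + 1, n))
    return total


# Toy example: residue i "wants" position 2*i, and pairs placed out of order are penalized.
G1 = lambda i, t: abs(t - 2 * i)
G2 = lambda i, j, ti, tj: 0 if tj > ti else 5
bounds = [(-1, 3), (0, 5), (2, 7)]
print(threading_score([0, 2, 4], G1, G2, bounds))
```

Every pairwise term couples two placements, which is what makes minimizing this score much harder than ordinary sequence alignment.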

23
Protein Threading

Optimization is not only over placements, but also
over the multiple folds in the database
Accuracy is very sensitive to alignment errors

24
Protein Threading: Dynamic programming

The advantage (and disadvantage) of DP is that it is
deterministic
Problem: adjacency is hard to define in 3D

25
Protein Threading: Dynamic programming

DP tries out different combinations of adjacent
residues on different parts of a template (Torda,
Fig. 5c: adjacency comes from the template sequence)
Start with a smaller number of elements and build
up to the full sequence
Alternative approach: start by placing each
residue at one of its possible positions,
see where the next residue should go, and continue
residue by residue
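When adjacency is taken from the sequence, as in the residue-by-residue approach above, a DP becomes possible. A minimal sketch under two simplifying assumptions that are not from the slides: residues occupy strictly increasing template positions, and pairwise terms exist only between consecutive residues (i-1, i). The names `dp_thread` and `G2_adj` are hypothetical:

```python
import math


def dp_thread(n, m, G1, G2_adj, bounds):
    """Place residues 0..n-1 at strictly increasing template positions 0..m-1.

    Only consecutive residues interact: G2_adj(i, t_prev, t) scores residue i-1 at
    t_prev next to residue i at t. bounds[i] = (b_i, e_i) with b_i < t_i < e_i.
    Returns (best score, placement), or (inf, None) if nothing is feasible.
    """
    INF = math.inf
    best = [[INF] * m for _ in range(n)]     # best[i][t]: residues 0..i placed, residue i at t
    back = [[None] * m for _ in range(n)]    # position of residue i-1 in that best solution
    for t in range(m):
        if bounds[0][0] < t < bounds[0][1]:
            best[0][t] = G1(0, t)
    for i in range(1, n):
        b, e = bounds[i]
        for t in range(m):
            if not (b < t < e):
                continue
            for tp in range(t):                # previous residue sits strictly earlier
                if best[i - 1][tp] == INF:
                    continue
                cand = best[i - 1][tp] + G2_adj(i, tp, t) + G1(i, t)
                if cand < best[i][t]:
                    best[i][t], back[i][t] = cand, tp
    t_end = min(range(m), key=lambda t: best[n - 1][t])
    if best[n - 1][t_end] == INF:
        return INF, None
    placement = [t_end]
    for i in range(n - 1, 0, -1):              # follow back pointers to recover placements
        placement.append(back[i][placement[-1]])
    return best[n - 1][t_end], placement[::-1]


G1 = lambda i, t: abs(t - 2 * i)
G2_adj = lambda i, tp, t: 0 if t - tp == 2 else 1
print(dp_thread(3, 6, G1, G2_adj, [(-1, 6)] * 3))
```

The table has n x m entries, each scanning up to m predecessors, so the cost is O(n m^2). Allowing arbitrary G2 terms between non-consecutive residues breaks this decomposition.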

26
Protein Threading: Probabilistic algorithms

Monte Carlo simulation: randomly throw residues
at positions on the fold and check the aggregate scoring
function
Simulated annealing: gradually move residues to
optimize the score, stochastically making random shifts to
avoid local optima
Time consuming, and the result is non-deterministic
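A minimal simulated-annealing sketch of the idea above, assuming any placement of residues on positions 0..m-1 is allowed and the aggregate score is passed in as a callable (lower is better). The move set (relocate one random residue), the geometric cooling schedule and all parameter defaults are hypothetical choices; a fixed seed makes a single run reproducible:

```python
import math
import random


def anneal_thread(score, n, m, steps=5000, t_start=2.0, t_end=0.01, seed=0):
    """Simulated annealing over placements (residue i -> template position 0..m-1).

    Each step moves one random residue to a random position. Improvements are always
    accepted; worse moves are accepted with probability exp(-delta / T), where the
    temperature T cools geometrically from t_start to t_end.
    """
    rng = random.Random(seed)
    current = [rng.randrange(m) for _ in range(n)]       # random initial throw
    cur_score = score(current)
    best, best_score = list(current), cur_score
    for step in range(steps):
        T = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        cand = list(current)
        cand[rng.randrange(n)] = rng.randrange(m)        # random shift of one residue
        cand_score = score(cand)
        delta = cand_score - cur_score
        if delta <= 0 or rng.random() < math.exp(-delta / T):
            current, cur_score = cand, cand_score
            if cur_score < best_score:
                best, best_score = list(current), cur_score
    return best, best_score


# Toy score: residue i prefers position 2*i.
score = lambda p: sum(abs(t - 2 * i) for i, t in enumerate(p))
print(anneal_thread(score, 3, 6))
```

Because the walk is random, different seeds can return different answers on harder instances, which is the non-determinism noted above.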

27
Protein Threading: Branch and bound

In the worst case, try all possible alignments,
but prune non-useful branches of the search space
using a bounding function
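A minimal branch-and-bound sketch for the score $\sum_i G_1(i, t_i) + \sum_{i<j} G_2(i, j, t_i, t_j)$, assuming every residue may take any of m positions. The bounding function (the best possible $G_1$ for each unplaced residue plus the global minimum of $G_2$ for every pair not yet scored) is a hypothetical choice; it is valid because it never overestimates the remaining score:

```python
import math


def branch_and_bound(n, m, G1, G2):
    """Exact minimum of sum_i G1(i, t_i) + sum_{i<j} G2(i, j, t_i, t_j), t_i in range(m)."""
    min_g1 = [min(G1(i, t) for t in range(m)) for i in range(n)]
    g2_min = min((G2(i, j, a, b) for i in range(n) for j in range(i + 1, n)
                  for a in range(m) for b in range(m)), default=0)
    best = [math.inf, None]                        # best score and placement found so far

    def bound(d, partial):
        """Optimistic total when residues 0..d-1 are placed with score `partial`."""
        remaining_pairs = n * (n - 1) // 2 - d * (d - 1) // 2
        return partial + sum(min_g1[d:]) + remaining_pairs * g2_min

    def extend(placement, partial):
        d = len(placement)
        if d == n:
            if partial < best[0]:
                best[0], best[1] = partial, list(placement)
            return
        if bound(d, partial) >= best[0]:
            return                                 # prune: this branch cannot improve
        for t in range(m):
            add = G1(d, t) + sum(G2(i, d, placement[i], t) for i in range(d))
            placement.append(t)
            extend(placement, partial + add)
            placement.pop()

    extend([], 0)
    return best[0], best[1]


G1 = lambda i, t: (2 * i + 3 * t) % 5
G2 = lambda i, j, a, b: ((i + j) * (a - b)) % 3 - 1
print(branch_and_bound(4, 3, G1, G2))
```

The worst case is still exponential (m^n leaves), but a tighter bound prunes more branches; the answer always equals exhaustive search.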

28
Protein Threading: Search on folds

Divide and conquer over the space of folds
Assumption: folds can be ordered by their
goodness for the target protein
Example: Setubal-Meidanis, p. 258

29
Protein Threading: Future

Slow
Subsumed by ab initio approaches of IBM Blue Gene-type
projects
De novo technique using linear programming (Xu
and Li, 2003)
Threading techniques are useful not only for
structure prediction but also for the fold recognition
problem: no alignment, just find the
template (the fold suggests function)
