Robotics Algorithms for the Study of Protein Structure and Motion PowerPoint PPT Presentation

presentation player overlay
About This Presentation
Transcript and Presenter's Notes

Title: Robotics Algorithms for the Study of Protein Structure and Motion


1
Robotics Algorithms for the Study of Protein
Structure and Motion
  • Jean-Claude Latombe
  • Computer Science DepartmentStanford University

2
Protein
  • Long sequence of amino-acids (dozens to
    thousands), from a dictionary of 20 distinct
    amino-acids

3
Central Dogma of Molecular Biology
Physiological conditions aqueous solution,
37C, pH 7, atmospheric pressure
4
Why Proteins?
  • They are the workhorses of living organisms
  • They perform many vital functions, e.g.
  • catalysis of reactions
  • storage of energy
  • transmission of signals
  • building blocks of muscles
  • They raise challenging computational issues
  • Large molecules (100s to several 1000s of atoms)
  • Made of building blocks drawn from a small
    dictionary
  • Unusual kinematic structure
  • They are associated with many critical problems
  • Folded structure determination
  • Global and local structural similarities
  • Prediction of folding and binding motions

5
f-y Kinematic Linkage Model
6
Molecule and Robot
7
Two problems
  • Structure determination from electron density
    maps
  • Inverse kinematics techniques
  • Itay Lotan, Henry van den Bedem, Ashley Deacon
    (Joint Center for Structural Genomics)
  • Energy maintenance during Monte Carlo simulation
  • Collision detection techniques
  • Itay Lotan, Fabian Schwarzer, and Danny
    Halperin (Tel Aviv University)

8
Structure Determination/Prediction
  • Experimental tools
  • Computational tools
  • Homology, threading
  • Molecular dynamics

X-ray crystallography
9
Protein Data Bank
Only about 10 of structures have been
determined for known protein sequences ?
Protein Structure Initiative (PSI)
1990 ? 250 new structures 1999 ? 2500 new
structures 2000 ? gt20,000 structures total 2004 ?
30,000 structures total
10
X-Ray Crystallography
11
Automated Model Building
  • Software systems RESOLVE, TEXTAL, ARP/wARP, MAID
  • 1.0Å lt d lt 2.3Å 90 completeness
  • 2.3Å d lt 3.0Å 67 completeness (varies
    widely)1

1.0Å
3.0Å
JCSG 43 of data sets ? 2.3Å
  • ? Manually completing a model
  • Labor intensive, time consuming
  • Existing tools are highly interactive

? Model completion is high-throughput bottleneck
1Badger (2003) Acta Cryst. D59
12
The Completion Problem
  • Input
  • Electron-density map
  • Partial structure
  • Two anchor residues
  • Amino-acid sequence of missing fragment
    (typically 4 15 residues long)
  • Output
  • Few candidate conformation(s) of fragment that
  • Respect the closure constraint (IK)
  • Maximize match with electron-density map

13
IK Problem
  • Input
  • Closed kinematic chain with n gt 6 degrees of
    freedom
  • Relative positions/orientations X of end frames
  • Target function T(Q) ? R
  • Output
  • Joint angles Q that
  • Achieve closure
  • Optimize T

14
Related Work
  • Biology/Crystallography
  • Exact IK solvers
  • Wedemeyer Scheraga 99
  • Coutsias et al. 04
  • Optimization IK solvers
  • Fine et al. 86
  • Canutescu Dunbrack Jr. 03
  • Ab-initio loop closure
  • Fiser et al. 00
  • Kolodny et al. 03
  • Database search loop closure
  • Jones Thirup 86
  • Van Vlijman Karplus 97
  • Semi-automatic tools
  • Jones Kjeldgaard 97
  • Oldfield 01
  • Robotics/Computer Science
  • Exact IK solvers
  • Manocha Canny 94
  • Manocha et al. 95
  • Optimization IK solvers
  • Wang Chen 91
  • Redundant manipulators
  • Khatib 87
  • Burdick 89
  • Motion planning for closed loops
  • Han Amato 00
  • Yakey et al. 01
  • Cortes et al. 02, 04

15
Two-Stage IK Method
  • Candidate generations? Closed fragments
  • Candidate refinement? Optimize fit with EDM

16
Stage 1 Candidate Generation
  • Generate random conformation of fragment (only
    one end attached to anchor)
  • Close fragment (i.e., bring other end to second
    anchor) using Cyclic Coordinate Descent (CCD)
    (Wang Chen 91, Canutescu Dunbrack 03)

17
Closure Distance
  • Closure Distance

A.A. Canutescu and R.L. Dunbrack Jr.Cyclic
coordinate descent A robotics algorithm for
protein loop closure. Prot. Sci. 12963972,
2003.
Compute bias toward EDM
avoid steric clashes
18
Stage 2 Candidate Refinement
  • Target function T (Q) measuring quality of the
    fit with the EDM
  • Minimize T while retaining closure
  • Closed conformations lie on a self-motion
    manifold of lower dimension

Null space
1-D manifold
19
Closure and Null Space
  • dX J dQ, where J is the 6?n Jacobian matrix (n
    gt 6)
  • Null space dQ J dQ 0 has dim n 6
  • N orthonormal basis of null space
  • Pseudo-inverse J such that JJ I
  • dQ JdX NNTy

20
Computation of J and N
SVD of J
S6?6
dX
U6?6
VT6?n
dQ

J V S UT where Sdiag1/si
21
Refinement Procedure
  • Repeat until minimum is reached
  • Compute J, J and N at current Q
  • Compute ?T at current Q(analytical expression of
    ?T linear-time recursive computation Abe et
    al., Comput. Chem., 1984)
  • Move along dQ JdX NNT ?T until minimum is
    reached or closure is broken

Monte Carlo simulated annealing protocol to
deal with local minima
22
Monte Carlo Optimization
  • Repeat
  • Perform a random move of the fragment
  • either by picking a random direction in null
    space
  • or by using an exact IK solver over 6 dofs
    Coutsias et al, 2004 (? big jumps)
  • Minimize T(Q)
  • Accept move with Metropolis-criterion probability
    exp(-DT/Temp)

23
Tests 1 Artificial Gaps
  • TM1621 (234 residues) and TM0423 (376 residues),
    SCOP classification a/b
  • Complete structures (gold standard) resolved with
    EDM at 1.6Å resolution
  • Compute EDM at 2, 2.5, and 2.8Å resolution
  • Remove fragments and rebuild

24
TM1621 103 Fragments from TM1621 at 2.5Å
Short Fragments 100 lt 1.0Å aaRMSD
Long Fragments 12 96 lt 1.0Å aaRMSD 15 88 lt
1.0Å aaRMSD
Produced by H. van den Bedem
25
Comparison Across Resolutions
Resolution 2.0Å
Resolution 2.8Å
Resolution 2.5Å
26
Example TM0423
PDB 1KQ3, 376 res. 2.0Å resolution 12 residue
gap Best 0.3Å aaRMSD
27
Tests 2 True Gaps
  • Structure computed by RESOLVE
  • Gaps completed independently (gold standard)
  • Example TM1742 (271 residues)
  • 2.4Å resolution 5 gaps left by RESOLVE

Produced by H. van den Bedem
28
TM0813
PDB 1J5X, 342 res. 2.8Å resolution 12 residue gap
GLU-83
GLY-96
29
TM0813
PDB 1J5X, 342 res. 2.8Å resolution 12 residue
gap Best 0.6Å aaRMSD
GLU-83
GLY-96
30
TM1621
  • Green manually completed conformation
  • Cyan conformation computed by stage 1
  • Magenta conformation computed by stage 2
  • The aaRMSD improved by 2.4Å to 0.31Å

31
Alr1529 D72-D78
resolution 2.0Å initial model
ARP/wARP contour 1.0s PDB 1VJG aaRMSD 0.33Å
32
TM0542
  • Top-scoring fragment in cyan
  • Manually completed fragment in green
  • Residues A259 and A260 are flipped

33
Current/Future Work
  • Software actively being used at the JCSG
  • What about multi-modal loops?

34
  • TM0755 data at 1.8Å
  • 8-residue fragment crystallized in 2
    conformations
  • Overlapping density Difficult to interpret
    manually

Algorithm successfully identified and built both
conformations
35
Current/Future Work
  • Software actively being used at the JCSG
  • What about multi-modal loops?
  • Fuzziness in EDM can then be exploited
  • Use EDM to infer probability measure over the
    conformation space of the loop

36
Amylosucrase
J. Cortés, T. Siméon, M. Renaud-Siméon, and V.
Tran. J. Comp. Chemistry, 25956-967, 2004
37
Energy maintenance during Monte Carlo simulation
  • joint work with Itay Lotan, Fabian Schwarzer, and
    Dan Halperin11 Computer Science Department, Tel
    Aviv University

38
Monte Carlo Simulation (MCS)
  • Random walk through conformation space
  • At each attempted step
  • Perturb current conformation at random
  • Accept step with probability
  • The conformations generated by an arbitrarily
    long MCS are Boltzman distributed, i.e.,
  • conformations in V

39
Monte Carlo Simulation (MCS)
  • Used to
  • sample meaningful distributions of conformations
  • generate energetically plausible motion pathways
  • A simulation run may consist of millions of steps
    ? energy must be evaluated frequently
  • Problem How to maintain energy efficiently?

40
Energy Function
  • E S bonded terms S
    non-bonded terms
    S solvation terms
  • Bonded terms - O(n)
  • Non-bonded terms - E.g., e.g. Van der Waals and
    electrostatic- Depend on distances between pairs
    of atoms - O(n2) ? Expensive to compute
  • Solvation terms- May require computing molecular
    surface

41
Non-Bonded Terms
  • Energy terms go to 0 when distance increases ?
    Cutoff distance (6 - 12Å)
  • vdW forces prevent atoms from bunching up ?
    Only O(n) interacting pairs
    HalperinOvermars 98

Problem How to find interacting pairswithout
enumerating all atom pairs?
42
Grid Method
  • Subdivide 3-space into cubic cells
  • Compute cell that contains each atom center
  • Represent grid as hashtable

43
Grid Method
  • T(n) time to build grid
  • O(1) time to find interactive pairs for each atom
  • T(n) to find all interactive pairs of atoms
    HalperinOvermars, 98
  • Asymptotically optimal in worst-case

44
Can we do better on average?
  • Few DOFs are changed at each MC step

simulationof 100,000 attempted steps
45
Can we do better on average?
  • Few DOFs are changed at each MC step
  • Proteins are long chain kinematics
  • Long sub-chains stay rigid at each step
  • Many partial energy sums remain constant

Problem How to retrieve the unchanged partial
sums?
46
Hierarchical Collision Checking
  • Widely used technique in robotics/graphics to
    approximate distances between objects
  • Pre-computation of bounding-volume hierarchy
  • How to update this hierarchy if the objects deform

47
Two New Data Structures
  • ChainTree ? Fast detection of interacting atom
    pairs
  • EnergyTree ? Retrieval of unchanged partial
    energy sums

48
ChainTree(Twofold Hierarchy BVs Transforms)
links
49
ChainTree(Twofold Hierarchy BVs Transforms)
joints
50
Updating the ChainTree
  • Update path to root
  • Recompute transforms that shortcut the DOF
    change
  • Recompute BVs that contain the DOF change
  • O(k log(n/k)) work for k changes

51
Finding Interacting Pairs
??
52
Finding Interacting Pairs
53
Finding Interacting Pairs
  • Do not search inside rigid sub-chains (unmarked
    nodes)

54
Finding Interacting Pairs
  • Do not search inside rigid sub-chains (unmarked
    nodes)
  • Do not test two nodes with no marked node between
    them
  • ? New interacting pairs

55
EnergyTree
E(N,N)
E(K.L)
E(M,M)
E(L,L)
E(J,L)
56
EnergyTree
E(N,N)
E(K.L)
E(M,M)
E(L,L)
E(J,L)
57
Complexity
  • n total number of DOFs
  • k number of DOF changes at each MCS step
  • k ltlt n
  • Complexity of
  • updating ChainTree O(k log(n/k))
  • finding interacting pairs O(n4/3) but performs
    much better in practice!!!

58
Experimental Setup
  • Energy function
  • Van der Waals
  • Electrostatic
  • Attraction between native contacts
  • Cutoff at 12Å
  • 300,000 steps MCS with Grid and ChainTree
  • Steps are the same with both methods
  • Early rejection for large vdW terms

59
Results 1-DOF change
12.5
7.8
speedup
5.8
3.5
amino acids
60
Results 5-DOF change
5.9
speedup
4.5
3.4
2.2
61
Two-Pass ChainTree (ChainTree)
1st pass small cutoff distance to detect steric
clashes 2nd pass normal cutoff distance
Tests around native state
gt5
62
Interaction with Solvent
  • Explicit solvent models 100s or 1000s of
    discrete solvent molecules
  • Implicit solvent models solvent as continuous
    medium, interface is solvent-accessible surface

E. Eyal, D. Halperin. Dynamic Maintenance of
Molecular Surfaces under Conformational Changes.
http//www.give.nl/movie/publications/telaviv/EH0
4.pdf
63
Summary
  • Inverse kinematics techniques ? Improve structure
    determination from fuzzy electron density maps
  • Collision detection techniques ? Speedup energy
    maintenance during Monte Carlo simulation

64
About Computational Biology
  • Computational Biology is more than using
    computers to biological problems or mimicking
    nature (e.g., performing MD simulation)
  • One of its goals is to achieve algorithmic
    efficiency by exploiting properties of molecules,
    e.g.
  • Proteins are long kinematic chains
  • Atoms cannot bunch up together
  • Forces have relatively short ranges
Write a Comment
User Comments (0)
About PowerShow.com