Title: The Probabilistic Roadmap Approach to Study Molecular Motion
- Jean-Claude Latombe
- Kwan Im Thong Hood Cho Temple Visiting Professor, NUS
- Kumagai Professor, Computer Science, Stanford
Molecular motion is an essential process of life
(Figure: CspA)
Understanding molecular motion could help cure many diseases
- Mad cow disease is caused by protein misfolding
- Drug molecules act by binding to proteins
As few experimental tools are available, computational tools are critical
- Computer simulation
  - Monte Carlo (MC) simulation
  - Molecular Dynamics (MD)
But MD and MC simulation have two major drawbacks
- Each simulation run yields a single pathway, while molecules tend to move along many different pathways
  ⇒ Interest in ensemble properties
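Example of Ensemble Property: Probability of Folding (pfold)
- pfold(q): probability that a molecule starting at conformation q reaches the folded state before the unfolded state
- Measures the kinetic distance to the folded state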
Other Examples of Ensemble Properties
- Order of formation of secondary structure elements
- Average time for a ligand to escape a binding site
- Folding rate of a protein
- Key intermediates along folding pathways
- Etc.
But MD and MC simulation have two major drawbacks
- Each simulation run yields a single pathway, while molecules tend to move along many different pathways ⇒ Interest in ensemble properties
- Each simulation run tends to waste much time in local minima
Roadmap-Based Representation
- Network of conformations connected by local motion pathways
- Compact representation of a huge number of motion pathways
- Coarse resolution relative to MC and MD simulation
- Efficient algorithms for analyzing multiple pathways
Roadmaps for Robot Motion Planning
Initial Work: Application of Roadmaps to Ligand Binding
A.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th Int. Conf. on Intelligent Syst. for Molecular Biology (ISMB), pp. 252-261, 1999
- The ligand is modeled as a flexible molecule, but the protein is assumed rigid
- A conformation of the ligand is defined by the position and orientation of a group of 3 atoms relative to the protein and by the torsional angles of the ligand
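Roadmap Construction (Node Generation)
- Conformations of the ligand are sampled at random around the protein
- The energy E at each sampled conformation is computed:
$$E = E_{\text{interaction}} + E_{\text{internal}}$$
$$E_{\text{interaction}} = \text{electrostatic} + \text{van der Waals potential}$$
$$E_{\text{internal}} = \sum_{\text{non-bonded pairs of atoms}} \left(\text{electrostatic} + \text{van der Waals}\right)$$
- A sampled conformation is retained as a node with probability
$$P_{\text{retain}}(E) = \begin{cases} 0 & \text{if } E > E_{\max} \\ \dfrac{E_{\max} - E}{E_{\max} - E_{\min}} & \text{if } E_{\min} \le E \le E_{\max} \\ 1 & \text{if } E < E_{\min} \end{cases}$$
⇒ Denser distribution of nodes in low-energy regions of conformational space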
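The retention rule above can be sketched in Python as follows. This is a minimal illustration, not the original implementation: it assumes the energies of the sampled conformations have already been computed by some energy function (not shown), and the function names are hypothetical.

```python
import random

def retention_probability(energy: float, e_min: float, e_max: float) -> float:
    """Probability of keeping a sampled conformation as a roadmap node."""
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    # Linear ramp between E_min (always kept) and E_max (never kept).
    return (e_max - energy) / (e_max - e_min)

def filter_samples(energies, e_min, e_max, rng=None):
    """Return the indices of the samples retained as roadmap nodes."""
    if rng is None:
        rng = random.Random()
    return [i for i, e in enumerate(energies)
            if rng.random() < retention_probability(e, e_min, e_max)]
```

For example, with E_min = -10 and E_max = 0, a sample at E = -5 is kept with probability 0.5. Because low-energy samples survive more often, the retained nodes concentrate in low-energy regions.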
Roadmap Construction (Edge Generation)
- Each node is connected to each of its closest neighbors by a straight edge
- Each edge is discretized at some resolution ε (~1 Å)
- If any E(q_i) > E_max, then the edge is rejected
- If all E(q_i) ≤ E_max, then the edge is retained and is assigned two weights, w(q→q') and w(q'→q), computed from the probabilities P_i that the ligand moves from q_i to q_{i+1} when it is constrained to move along the edge
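The exact weight formula is not recoverable from the slide. A common choice, assumed here, is $P_i = e^{-\Delta E_{i,i+1}/kT} / (e^{-\Delta E_{i,i+1}/kT} + e^{-\Delta E_{i,i-1}/kT})$, the chance of stepping forward rather than backward, with $w(q \to q') = \sum_i -\ln P_i$, so that low-weight paths are high-probability paths. The sketch below checks an edge and computes both directed weights; the function names, the treatment of the first point, and the default kT (about 0.593 kcal/mol at 298 K) are all assumptions.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return x + math.log1p(math.exp(-x)) if x > 0 else math.log1p(math.exp(x))

def directed_weight(energies, kT):
    """Sum of -ln P_i over the moves q_i -> q_{i+1} along a discretized edge.

    -ln(e^{-dE_fwd/kT} / (e^{-dE_fwd/kT} + e^{-dE_back/kT})) simplifies to
    softplus((E_{i+1} - E_{i-1}) / kT). The first point has no backward
    neighbour, so it is assumed to contribute -ln 1 = 0.
    """
    return sum(softplus((energies[i + 1] - energies[i - 1]) / kT)
               for i in range(1, len(energies) - 1))

def edge_weights(energies, e_max, kT=0.593):
    """energies: E(q_0), ..., E(q_n) at the discretization points of an edge.
    Returns (w(q -> q'), w(q' -> q)), or None if the edge is rejected."""
    if any(e > e_max for e in energies):
        return None  # some intermediate conformation is too high in energy
    return directed_weight(energies, kT), directed_weight(energies[::-1], kT)
```

On a downhill edge the forward weight comes out smaller than the backward one, which is what makes the two directed weights useful for distinguishing binding from dissociation.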
Querying the Roadmap
- For a given goal node q_g (e.g., a binding conformation), Dijkstra's single-source algorithm computes the lowest-weight paths from q_g to each node (in either direction) in O(N log N) time, where N = number of nodes
- Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering q_g and of all paths leaving q_g (related to the binding and dissociation rates K_on and K_off)
(Figure: protein lactate dehydrogenase, ligand oxamate, 7 degrees of freedom)
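A minimal sketch of this query using Python's `heapq`. The adjacency-dictionary encoding and the reading of "average weight of all paths entering q_g" as the average, over all other nodes, of the lowest-weight path into q_g are assumptions; unreachable nodes are simply ignored.

```python
import heapq
import math

def dijkstra(adj, source):
    """Lowest path weight from `source` to every reachable node.
    adj maps node -> list of (neighbour, weight) with non-negative weights."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: a shorter path to u was already found
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse_graph(adj):
    """Reverse every directed edge, keeping its weight."""
    radj = {}
    for u, edges in adj.items():
        for v, w in edges:
            radj.setdefault(v, []).append((u, w))
    return radj

def average_entry_exit_weights(adj, goal):
    """Average lowest-path weight into `goal` and out of `goal`."""
    def average(dist):
        others = [d for node, d in dist.items() if node != goal]
        return sum(others) / len(others) if others else 0.0
    entering = dijkstra(reverse_graph(adj), goal)  # paths q -> goal
    leaving = dijkstra(adj, goal)                  # paths goal -> q
    return average(entering), average(leaving)
```

Running Dijkstra on the reversed graph is what gives the "other direction": one pass yields all paths into q_g, the other all paths out of it.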
Experiments on 3 Complexes
- PDB ID 1ldm
  - Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues)
  - Ligand: Oxamate (6 atoms, 7 dofs)
- PDB ID 4ts1
  - Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues)
  - Ligand: L-leucyl-hydroxylamine (13 atoms, 9 dofs)
- PDB ID 1stp
  - Receptor: Streptavidin (901 atoms, 121 residues)
  - Ligand: Biotin (16 atoms, 11 dofs)
Computation of Potential Binding Conformations
- Sample many (several 1000s) ligand conformations at random around the protein
- Repeat several times:
  - Select lowest-energy conformations that are close to the protein surface
  - Resample around them
- Retain the k (10) lowest-energy conformations whose centers of mass are at least 5 Å apart
(Figure: lactate dehydrogenase)
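The final selection step can be sketched as a greedy filter. The slide does not say how conflicts between nearby candidates are resolved; taking candidates in increasing energy order and skipping any within 5 Å of one already kept is an assumption, as is the `(energy, center_of_mass)` input format.

```python
import math

def select_binding_sites(candidates, k=10, min_sep=5.0):
    """candidates: list of (energy, (x, y, z) centre of mass in Å).
    Greedily keep the lowest-energy conformations whose centres of mass are
    at least `min_sep` from every conformation already kept, up to k of them."""
    kept = []
    for energy, com in sorted(candidates, key=lambda c: c[0]):
        if all(math.dist(com, other) >= min_sep for _, other in kept):
            kept.append((energy, com))
            if len(kept) == k:
                break
    return kept
```

The separation constraint keeps the k retained conformations spread over distinct candidate sites rather than clustered in one deep well.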
Results for 1ldm
- Some potential binding sites have slightly lower energy than the active site ⇒ Energy is not a discriminating factor for recognizing the active site
- Average path weights (energetic difficulty) to enter and leave the binding site are significantly greater for the active site ⇒ Indicates that the active site is surrounded by an energy barrier that traps the ligand
Application of Roadmaps to Protein Folding
N.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures. J. Comp. Biology, 10(2):239-255, 2003
- Known native state
- Degrees of freedom: φ-ψ angles
- Energy: van der Waals, hydrogen bonds, hydrophobic effect
- New idea: sampling strategy
Sampling Strategy (Node Generation)
- High dimensionality ⇒ non-uniform sampling
- Conformations are sampled using a Gaussian distribution around the native state
- Conformations are sorted into bins by number of native contacts (pairs of Cα atoms that are close together in the native structure)
- Sampling ends when all bins have a minimum number of conformations ⇒ good coverage of conformational space
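To make the binning idea concrete, here is a toy sketch. Everything model-specific is an assumption: a 2D bead chain stands in for the protein, turn angles stand in for φ-ψ angles, a contact is two beads at least 3 apart in sequence and closer than 1.5 bond lengths, bins are quarters of the native-contact fraction, and the Gaussian spread σ is drawn at random so that both near-native and far-from-native bins can fill.

```python
import math
import random

def chain_coords(angles):
    """Toy 2D bead chain with unit bonds; the first bond lies along +x and
    each angle is the turn at the next joint."""
    x = y = heading = 0.0
    coords = [(x, y)]
    for turn in [0.0, *angles]:
        heading += turn
        x, y = x + math.cos(heading), y + math.sin(heading)
        coords.append((x, y))
    return coords

def contacts(angles, cutoff=1.5, min_gap=3):
    """Bead pairs at least `min_gap` apart in sequence and closer than `cutoff`."""
    c = chain_coords(angles)
    return {(i, j) for i in range(len(c)) for j in range(i + min_gap, len(c))
            if math.dist(c[i], c[j]) < cutoff}

def sample_by_native_contacts(native, n_bins=4, per_bin=5, max_samples=20000, seed=0):
    """Gaussian sampling around the native angles, binned by the fraction of
    native contacts, until every bin holds `per_bin` conformations."""
    rng = random.Random(seed)
    native_contacts = contacts(native)
    bins = [[] for _ in range(n_bins)]
    for _ in range(max_samples):
        if all(len(b) >= per_bin for b in bins):
            break  # every bin has its minimum number of conformations
        sigma = rng.uniform(0.0, math.pi)  # vary the spread to reach all bins
        conf = [a + rng.gauss(0.0, sigma) for a in native]
        frac = len(contacts(conf) & native_contacts) / max(len(native_contacts), 1)
        b = min(int(frac * n_bins), n_bins - 1)
        if len(bins[b]) < per_bin:
            bins[b].append(conf)
    return bins
```

The `max_samples` cap is a safeguard added here so the loop terminates even if some bin is hard to reach.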
Application: Order of Formation of Secondary Structure Elements
- The lowest-weight path is extracted from each denatured conformation to the folded one
- The order of formation of SSEs is computed along each path
- The formation order that appears most often over all paths is considered the SSE formation order of the protein
Order of Formation of Secondary Structures along a Path
- The contact matrix showing the time step when each native contact appears is built
(Figure: protein CI2, 1 α-helix and 4 β-strands)
- The time step at which a structure appears is approximated as the average of the appearance time steps of its contacts
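The per-path computation and the "most frequent order" rule can be sketched as follows. Assumptions: the contact matrix of each path is given as a dictionary from native contact to the time step at which it first appears, and the assignment of contacts to structures is known in advance; all names and contact indices are hypothetical.

```python
from collections import Counter
from statistics import mean

def structure_times(contact_times, structures):
    """Approximate appearance time of each structure along one path: the
    average appearance time step of its native contacts.
    contact_times: {(i, j): time step}; structures: {name: [(i, j), ...]}."""
    return {name: mean(contact_times[c] for c in cs) for name, cs in structures.items()}

def formation_order(contact_times, structures):
    """Structures sorted by their approximate appearance time."""
    times = structure_times(contact_times, structures)
    return tuple(sorted(times, key=times.get))

def dominant_order(paths, structures):
    """Formation order that appears most often over all paths."""
    counts = Counter(formation_order(ct, structures) for ct in paths)
    return counts.most_common(1)[0][0]
```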
Protein CI2 (1 α-helix, 4 β-strands):
- α forms at time step 122 (II)
- β3 and β4 come together at 187 (V)
- β2 and β3 come together at 210 (IV)
- β1 and β4 come together at 214 (III)
Comparison with Experimental Data
Stochastic Roadmaps
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Comp. Biol., 10(3-4):257-281, 2003
- New idea: capture the stochastic nature of molecular motion by assigning probabilities to edges
Edge Probabilities
- Follow the Metropolis criterion
- Each node also has a self-transition probability
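The slide's formulas did not survive extraction. A standard Metropolis form, assumed here: for each of the N_i neighbors j of node i, P_ij = (1/N_i) min(1, e^{-(E_j - E_i)/kT}), and the self-transition probability is P_ii = 1 - Σ_{j≠i} P_ij, so that every row sums to 1. The function name and the list-of-lists layout are hypothetical; kT must be in the same units as the energies.

```python
import math

def metropolis_transition_matrix(energies, neighbours, kT=1.0):
    """Row-stochastic transition matrix for a roadmap (list of lists).
    energies[i]: energy of node i; neighbours[i]: indices of nodes adjacent to i."""
    n = len(energies)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        n_i = len(neighbours[i])
        for j in neighbours[i]:
            dE = energies[j] - energies[i]
            # Metropolis criterion: downhill moves are always accepted,
            # uphill moves with probability exp(-dE / kT).
            accept = 1.0 if dE <= 0 else math.exp(-dE / kT)
            P[i][j] = accept / n_i
        # Self-transition: the rejected probability mass stays at node i.
        P[i][i] = 1.0 - sum(P[i][j] for j in neighbours[i])
    return P
```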
Stochastic Roadmap Simulation
(Figure: roadmap with transition probabilities P_ij)
Roadmap as Markov Chain
(Figure: edge from node i to node j with transition probability P_ij)
- Transition probability P_ij depends only on i and j
Probability of Folding pfold
(Figure: unfolded state and folded state)
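First-Step Analysis
Let f_i = pfold(i). After one step from node i (with neighbors j, k, l, m):
$$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$$
with f_i = 1 for nodes in the folded state and f_i = 0 for nodes in the unfolded state.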
- One linear equation per node
- Solution gives pfold for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system
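A minimal sketch of this first-step analysis, assuming NumPy and a dense transition matrix for brevity (for large roadmaps a sparse matrix solved with `scipy.sparse.linalg.spsolve` would be the natural choice). Folded nodes are pinned to 1 and unfolded nodes to 0; the sketch also assumes every node can reach one of the two states, so the system is nonsingular.

```python
import numpy as np

def pfold_first_step(P, folded, unfolded):
    """Solve f = P f with f_i = 1 on folded nodes and f_i = 0 on unfolded nodes.
    P: (n, n) row-stochastic transition matrix."""
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    A = np.eye(n) - P  # row i encodes f_i - sum_j P_ij f_j = 0
    b = np.zeros(n)
    for i in set(folded) | set(unfolded):
        A[i, :] = 0.0  # replace the equation by a boundary condition
        A[i, i] = 1.0
        b[i] = 1.0 if i in folded else 0.0
    return np.linalg.solve(A, b)
```

On a 5-node chain with the unfolded state at one end, the folded state at the other, and equal left/right moves, this returns pfold = 0, 0.25, 0.5, 0.75, 1 in a single solve, with no simulation runs.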
Number of Self-Avoiding Walks on a 2D Grid
1, 2, 12, 184, 8512, 1262816, 575780564, 789360053252, 3266598486981642, (10x10) 41044208702632496804, (11x11) 1568758030464750013214100, (12x12) 182413291514248049241470885236 > 10^28
http://mathworld.wolfram.com/Self-AvoidingWalk.html
In contrast
- Computing pfold with MC simulation requires, for every conformation q of interest:
  - Performing many MC simulation runs from q
  - Counting the number of runs that reach the folded state F before the unfolded state
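For comparison, the MC procedure can be sketched as random walks on a roadmap-style Markov chain. This is a simplification assumed for illustration: real MC runs move through continuous conformational space rather than along roadmap edges. Each run stops when it reaches F or U; runs exceeding `max_steps` are counted as not folded.

```python
import random

def pfold_monte_carlo(P, start, folded, unfolded, runs=1000, seed=0, max_steps=100000):
    """Estimate pfold(start) as the fraction of random walks from `start`
    that reach the folded set before the unfolded set."""
    rng = random.Random(seed)
    n = len(P)
    hits = 0
    for _ in range(runs):
        i = start
        for _ in range(max_steps):
            if i in folded:
                hits += 1
                break
            if i in unfolded:
                break
            i = rng.choices(range(n), weights=P[i])[0]  # one stochastic step
    return hits / runs
```

Note that this estimate covers a single start conformation and carries sampling noise, whereas the linear system gives pfold for every node at once.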
Computational Tests
- 1ROP (repressor of primer)
  - 2 α-helices
  - 6 DOF
- 1HDD (Engrailed homeodomain)
  - 3 α-helices
  - 12 DOF
H-P energy model with steric clash exclusion (Sun et al., 95)
pfold for β hairpin
- Immunoglobulin binding protein (Protein G)
- Last 16 amino acids
- Cα-based representation
- Go model energy function (Zhou and Karplus, 99)
- 42 DOFs
Correlation with MC Approach
(Figure: 1ROP)
Computation Times (β hairpin)
- Monte Carlo (30 simulations): over 10^7 energy computations, 10 hours of computer time, 1 conformation
- Roadmap: 50,000 energy computations, 23 seconds of computer time, 2000 conformations
⇒ 6 orders of magnitude speedup!
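A quick per-conformation check of these figures, using only the numbers above:

$$\text{MC: } \frac{10 \times 3600\ \text{s}}{1\ \text{conformation}} = 36\,000\ \text{s/conformation}, \qquad \text{Roadmap: } \frac{23\ \text{s}}{2000\ \text{conformations}} \approx 0.0115\ \text{s/conformation}$$

$$\frac{36\,000}{0.0115} \approx 3.1 \times 10^{6}$$

That is roughly six orders of magnitude. Counting energy computations instead gives more than 10^7 per conformation for MC versus 50,000 / 2000 = 25 per conformation for the roadmap, a ratio above 4 × 10^5.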
Using Path Sampling to Construct Roadmaps
N. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build Better Markovian State Models: Predicting the Folding Rate and Mechanism of a Tryptophan Zipper Beta Hairpin. J. Chemical Physics, 121(1):415-425, 2004
- New idea:
  - Paths computed with Molecular Dynamics simulation techniques are used to create the nodes of the roadmap ⇒ more pertinent, better-distributed nodes
  - Edges are labeled with the time needed to traverse them
Sampling Nodes from Computed Paths (Path Shooting)
(Figure: unfolded state U, folded state F)
Node Merging
- If two nodes are closer than some ε, they are merged into one ⇒ roadmap
- Rules are applied to update edge probabilities and times
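The merging rules are not spelled out here, so the sketch below is one hypothetical choice, not the method of Singhal et al.: trajectory frames (toy coordinate tuples) are greedily assigned to the first existing node within ε, and edge probabilities and traversal times are re-estimated from the observed moves between merged nodes, with the time spent in a node charged to the edge along which it is left. Self-transitions are not modeled.

```python
import math
from collections import defaultdict

def merge_and_estimate(trajectories, eps, dt=1.0):
    """Returns (nodes, probs, times): merged node centres, edge probabilities
    probs[(i, j)] and mean traversal times times[(i, j)], with dt per frame."""
    centres, labelled = [], []
    for traj in trajectories:
        labels = []
        for x in traj:
            for k, c in enumerate(centres):
                if math.dist(x, c) < eps:
                    labels.append(k)  # merge with an existing node
                    break
            else:
                centres.append(x)     # no node within eps: create one
                labels.append(len(centres) - 1)
        labelled.append(labels)

    counts, total_time = defaultdict(int), defaultdict(float)
    for labels in labelled:
        start = 0
        for s in range(1, len(labels)):
            if labels[s] != labels[start]:
                edge = (labels[start], labels[s])
                counts[edge] += 1
                total_time[edge] += (s - start) * dt  # dwell time before leaving
                start = s
    leaving = defaultdict(int)
    for (i, _), c in counts.items():
        leaving[i] += c
    probs = {e: c / leaving[e[0]] for e, c in counts.items()}
    times = {e: total_time[e] / counts[e] for e in counts}
    return centres, probs, times
```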
Application: Computation of MFPT
- Mean First Passage Time (MFPT): the average time at which a protein first reaches its folded state
- First-step analysis yields:
$$\mathrm{MFPT}(i) = \sum_j P_{ij}\,\bigl(t_{ij} + \mathrm{MFPT}(j)\bigr), \qquad \mathrm{MFPT}(i) = 0 \ \text{if } i \in F$$
- Assuming first-order kinetics, the probability density that a protein folds at time t is
$$p(t) = r\,e^{-rt}$$
where r is the folding rate
- MFPT = 1/r
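A minimal NumPy sketch of the first-step equation for MFPT, with dense matrices for brevity; the transition probabilities P_ij and edge times t_ij are assumed to be given (e.g., from a roadmap built from trajectories). Rearranged, each non-folded node contributes the linear equation MFPT(i) - Σ_j P_ij MFPT(j) = Σ_j P_ij t_ij.

```python
import numpy as np

def mfpt(P, T, folded):
    """Mean first passage time to the folded set by first-step analysis.
    P[i][j]: transition probability; T[i][j]: time to traverse edge i -> j."""
    P = np.asarray(P, dtype=float)
    T = np.asarray(T, dtype=float)
    n = P.shape[0]
    A = np.eye(n) - P
    b = (P * T).sum(axis=1)  # expected duration of one step from each node
    for i in folded:
        A[i, :] = 0.0
        A[i, i] = 1.0
        b[i] = 0.0           # MFPT(i) = 0 for folded nodes
    return np.linalg.solve(A, b)
```

Under the first-order-kinetics assumption, the folding rate can then be estimated as r = 1/MFPT of an unfolded start node.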
Computational Test
- 12-residue tryptophan zipper beta hairpin (TZ2)
- Folding@Home used to generate trajectories (fully atomistic simulation) ranging from 10 to 450 ns
- 1750 trajectories (14 reaching the folded state)
  ⇒ 22,400-node roadmap
- MFPT of 2-9 μs, which is similar to experimental measurements (from fluorescence and IR)
Conformational Analysis of Protein Loops
J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. Geometric Algorithms for the Conformational Analysis of Long Protein Loops. J. Comp. Chemistry, 25:956-967, 2004
- New idea:
  - Explore the clash-free subset of the conformational space of a loop by building a tree-shaped roadmap
- Kinematic model: φ-ψ angles on the backbone, χ_i torsional angles in side chains
Amylosucrase (AS)
- Only enzyme in its family that acts on a sucrose substrate
- The 17-residue loop (named loop 7) between Gly433 and Gly449 is believed to play a pivotal role
Roadmap Construction
- A tree-shaped roadmap is created from a start conformation q_start
- At each step of the roadmap construction, a conformation q_rand of the loop is picked at random, and a new roadmap node is created by iteratively pulling toward it the existing node that is closest to q_rand
(Figure: tree rooted at q_start in the conformation space C, with subsets C_closed and C_free)
- Pulling stops when one can't get closer to q_rand or a clash is detected
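A toy sketch of this tree-growing loop in a 2D stand-in for the loop's conformation space. The clash test is a hypothetical callback, the step size is arbitrary, and the loop-closure constraint (the C_closed subset) is ignored entirely; the scheme is close in spirit to RRT-style planners.

```python
import math
import random

def grow_tree(q_start, is_clash_free, bounds, n_iters=200, step=0.1, seed=0):
    """Tree-shaped roadmap in a toy 2D conformation space.
    is_clash_free(q) -> bool stands in for the loop's clash test."""
    rng = random.Random(seed)
    nodes, parent = [tuple(q_start)], {0: None}
    (xmin, xmax), (ymin, ymax) = bounds
    for _ in range(n_iters):
        q_rand = (rng.uniform(xmin, xmax), rng.uniform(ymin, ymax))
        k = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], q_rand))
        # Pull the closest node toward q_rand in small steps, one new node per step.
        while True:
            q = nodes[k]
            d = math.dist(q, q_rand)
            if d < 1e-9:
                break  # reached q_rand: cannot get any closer
            s = min(step, d)
            q_new = (q[0] + s * (q_rand[0] - q[0]) / d,
                     q[1] + s * (q_rand[1] - q[1]) / d)
            if not is_clash_free(q_new):
                break  # clash detected: stop pulling
            nodes.append(q_new)
            parent[len(nodes) - 1] = k
            k = len(nodes) - 1
    return nodes, parent
```

With a circular "obstacle" as the clash test, the tree spreads around it, and the region it cannot enter gives a crude picture of how a bottleneck residue limits loop mobility.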
Computational Results
- Surprisingly, loop 7 can't move much
- Main bottleneck is residue Asp231
(Figure: positions of the Cα atom of the middle residue, Ser441)
- If residue Asp231 is removed, then loop 7's mobility increases dramatically: the Cα atom of Ser441 can be displaced by more than 9 Å from its crystallographic position
Conclusion
- Probabilistic roadmaps are a recent but promising tool for exploring conformational spaces and computing ensemble properties of molecular pathways
- Current/future research:
  - Better sampling strategies able to handle more complex molecular models (protein-protein binding)
  - More work to include time information in roadmaps
  - More thorough experimental validation to compare computed and measured quantitative properties