The Probabilistic Roadmap Approach to Study Molecular Motion


1
The Probabilistic Roadmap Approach to Study
Molecular Motion
  • Jean-Claude Latombe
  • Kwan Im Thong Hood Cho Temple Visiting Professor,
    NUS
  • Kumagai Professor, Computer Science, Stanford

2
Molecular motion is an essential process of life
CspA
3
Understanding molecular motion could help cure
many diseases
Mad cow disease is caused by misfolding
Drug molecules act by binding to proteins
4
As few experimental tools are available,
computational tools are critical
  • Computer simulation
  • Monte Carlo simulation
  • Molecular Dynamics

5
But MD and MC simulation have two major drawbacks
  1. Each simulation run yields a single pathway,
    while molecules tend to move along many different
    pathways

7
But MD and MC simulation have two major drawbacks
  1. Each simulation run yields a single pathway,
    while molecules tend to move along many different
    pathways → Interest in ensemble properties

8
Example of Ensemble Property: Probability of
Folding pfold
Measure kinetic distance to folded state
9
Other Examples of Ensemble Properties
  • Order of formation of secondary structure
    elements
  • Average time for a ligand to escape a binding
    site
  • Folding rate of a protein
  • Key intermediates along folding pathways
  • Etc ...

10
But MD and MC simulation have two major drawbacks
  1. Each simulation run yields a single pathway,
    while molecules tend to move along many different
    pathways → Interest in ensemble properties
  2. Each simulation run tends to waste much time in
    local minima

11
Roadmap-Based Representation
  • Network of conformations connected by local
    motion pathways
  • Compact representation of huge number of motion
    pathways
  • Coarse resolution relative to MC and MD
    simulation
  • Efficient algorithms for analyzing multiple
    pathways

12
Roadmaps for Robot Motion Planning
13
Initial Work: Application of Roadmaps to Ligand Binding
A.P. Singh, J.C. Latombe, and D.L.
Brutlag. A Motion Planning Approach to Flexible
Ligand Binding. Proc. 7th Int. Conf. on
Intelligent Syst. for Molecular Biology (ISMB),
pp. 252-261, 1999
  • The ligand is modeled as a flexible molecule,
    but the protein is assumed rigid
  • A conformation of the ligand is defined by the
    position and orientation of a group of 3 atoms
    relative to the protein and by the torsional
    angles of the ligand

14
Roadmap Construction (Node Generation)
  • Conformations of the ligand are sampled at random
    around the protein
  • The energy E at each sampled conformation is
    computed
  • $E = E_{\text{interaction}} + E_{\text{internal}}$
  • $E_{\text{interaction}}$ = electrostatic + van der Waals potential
  • $E_{\text{internal}} = \sum_{\text{non-bonded pairs of atoms}}$ (electrostatic + van der Waals)
  • A sampled conformation is retained as a node with
    probability $P$, where
    $P = 0$ if $E > E_{\max}$,
    $P = (E_{\max} - E)/(E_{\max} - E_{\min})$ if $E_{\min} \le E \le E_{\max}$,
    $P = 1$ if $E < E_{\min}$
  • → Denser distribution of nodes in low-energy
    regions of conformational space
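
The energy-based retention rule on this slide can be sketched in a few lines of Python. This is only an illustration: the names `retention_probability`, `sample_nodes`, `sample_conformation` and `energy_of` are hypothetical and not taken from the cited paper, and the sampler and the energy function are assumed to be supplied by the caller.

```python
import random


def retention_probability(energy, e_min, e_max):
    """Probability of keeping a sampled conformation as a roadmap node."""
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    # Linear ramp between the two thresholds.
    return (e_max - energy) / (e_max - e_min)


def sample_nodes(sample_conformation, energy_of, e_min, e_max, n_samples, rng=random):
    """Draw n_samples conformations and keep each with its retention probability.

    sample_conformation and energy_of are caller-supplied callables
    (hypothetical placeholders for a real sampler and energy function).
    """
    nodes = []
    for _ in range(n_samples):
        q = sample_conformation()
        if rng.random() < retention_probability(energy_of(q), e_min, e_max):
            nodes.append(q)
    return nodes
```

Because low-energy samples are kept more often, the retained nodes end up denser in low-energy regions, as the last bullet states.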

15
Roadmap Construction (Edge Generation)
  • Each node is connected to each of its closest
    neighbors by a straight edge
  • Each edge is discretized at some resolution ε
    (1 Å)
  • If any $E(q_i) > E_{\max}$, then the edge is rejected
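
A minimal sketch of the edge test, under a simplifying assumption: a conformation is treated here as a plain tuple of coordinates with Euclidean distance, whereas a real ligand conformation mixes position, orientation and torsional angles. The names `edge_is_valid` and `energy_of` are hypothetical.

```python
import math


def edge_is_valid(q, q_prime, energy_of, e_max, resolution=1.0):
    """Check a straight edge from q to q_prime at the given resolution.

    The edge is rejected as soon as one intermediate conformation
    has an energy above e_max.
    """
    length = math.dist(q, q_prime)
    n_steps = max(1, math.ceil(length / resolution))
    for k in range(n_steps + 1):
        s = k / n_steps
        # Linear interpolation between the two end conformations.
        q_i = tuple(a + s * (b - a) for a, b in zip(q, q_prime))
        if energy_of(q_i) > e_max:
            return False
    return True
```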

16
Roadmap Construction (Edge Generation)
  • Each node is connected to each of its closest
    neighbors by a straight edge
  • Each edge is discretized at some resolution ε
    (1 Å)
  • If all $E(q_i) \le E_{\max}$, then the edge is retained
    and is assigned two weights $w(q \to q')$ and $w(q' \to q)$
  • where each weight is computed from the probability
    that the ligand moves from $q_i$ to $q_{i+1}$ when it is
    constrained to move along the edge

17
Querying the Roadmap
  • For a given goal node qg (e.g., binding
    conformation), Dijkstra's single-source
    algorithm computes the lowest-weight paths from
    qg to each node (in either direction) in O(N
    log N) time, where N = number of nodes
  • Various quantities can then be easily computed
    in O(N) time, e.g., average weights of all
    paths entering qg and of all paths leaving qg
    (related to the binding and dissociation rates Kon and Koff)
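
The query step can be illustrated with a standard binary-heap version of Dijkstra's algorithm. The roadmap is assumed here to be a dictionary mapping each node to a list of `(neighbor, weight)` pairs, with one directed entry per edge direction; this layout and the helper `reverse_graph` are choices made for the sketch. Paths leaving qg are obtained on the roadmap itself, paths entering qg on the reversed roadmap.

```python
import heapq


def lowest_weight_paths(graph, source):
    """Lowest path weight from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, ()):
            candidate = d + w
            if candidate < dist.get(v, float("inf")):
                dist[v] = candidate
                heapq.heappush(heap, (candidate, v))
    return dist


def reverse_graph(graph):
    """Flip every directed edge, keeping its weight."""
    rev = {u: [] for u in graph}
    for u, edges in graph.items():
        for v, w in edges:
            rev.setdefault(v, []).append((u, w))
    return rev
```

Averaging the returned values over all nodes other than qg gives the average path weights mentioned in the second bullet.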

Protein: Lactate dehydrogenase. Ligand: Oxamate (7
degrees of freedom)
18
Experiments on 3 Complexes
  • PDB ID: 1ldm
      • Receptor: Lactate Dehydrogenase (2386 atoms, 309
        residues)
      • Ligand: Oxamate (6 atoms, 7 dofs)
  • PDB ID: 4ts1
      • Receptor: Mutant of tyrosyl-transfer-RNA
        synthetase (2423 atoms, 319 residues)
      • Ligand: L-leucyl-hydroxylamine (13 atoms, 9
        dofs)
  • PDB ID: 1stp
      • Receptor: Streptavidin (901 atoms, 121 residues)
      • Ligand: Biotin (16 atoms, 11 dofs)

19
Computation of Potential Binding Conformations
  • Sample many (several 1000s) ligand
    conformations at random around protein
  • Repeat several times:
      • Select lowest-energy conformations that are
        close to protein surface
      • Resample around them
  • Retain k (10) lowest-energy conformations
    whose centers of mass are at least 5 Å apart

lactate dehydrogenase
20
Results for 1ldm
  • Some potential binding sites have slightly lower
    energy than the active site → Energy is not a
    discriminating factor for recognizing active site
  • Average path weights (energetic difficulty) to
    enter and leave binding site are significantly
    greater for the active site → Indicates that the
    active site is surrounded by an energy barrier
    that traps the ligand

21
Application of Roadmaps to Protein Folding
N.M. Amato, K.A. Dill, and G. Song. Using Motion
Planning to Map Protein Folding Landscapes and
Analyze Folding Kinetics of Known Native
Structures. J. Comp. Biology, 10(2):239-255, 2003
  • Known native state
  • Degrees of freedom: φ-ψ angles
  • Energy: van der Waals, hydrogen bonds,
    hydrophobic effect
  • New idea: Sampling strategy

22
Sampling Strategy (Node Generation)
  • High dimensionality → non-uniform sampling
  • Conformations are sampled using Gaussian
    distribution around native state
  • Conformations are sorted into bins by number of
    native contacts (pairs of Cα atoms that are
    close together in the native structure)
  • Sampling ends when all bins have minimum number
    of conformations → good coverage of
    conformational space

23
Application: Order of Formation of Secondary
Structure Elements
  • The lowest-weight path is extracted from each
    denatured conformation to the folded one
  • The order of formation of SSEs is computed along
    each path
  • The formation order that appears the most often
    over all paths is considered the SSE formation
    order of the protein

24
Order of Formation of Secondary Structures along
a Path
  1. The contact matrix showing the time step when
    each native contact appears is built

25
Protein CI2 (1 α + 4 β)
26
Protein CI2 (1 α + 4 β)
27
Order of Formation of Secondary Structures along
a Path
  1. The contact matrix showing the time step when
    each native contact appears is built
  2. The time step at which a structure appears is
    approximated as the average of the appearance
    time steps of its contacts
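
The two steps above, together with the vote over all paths, can be sketched as follows. The data layout is an assumption made for the illustration: `contact_times` maps each native contact (a pair of residue indices) to the time step at which it appears along one path, and `sse_contacts` maps each structure name to its list of contacts. All function names are hypothetical.

```python
from collections import Counter


def sse_formation_times(contact_times, sse_contacts):
    """Appearance time of each structure: mean appearance time of its contacts."""
    return {
        name: sum(contact_times[c] for c in contacts) / len(contacts)
        for name, contacts in sse_contacts.items()
    }


def formation_order(contact_times, sse_contacts):
    """Structure names sorted by increasing appearance time along one path."""
    times = sse_formation_times(contact_times, sse_contacts)
    return tuple(sorted(times, key=times.get))


def most_common_order(orders):
    """Formation order that appears most often over all paths."""
    return Counter(orders).most_common(1)[0][0]
```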

28
  • α forms at time step 122 (II)
  • β3 and β4 come together at 187 (V)
  • β2 and β3 come together at 210 (IV)
  • β1 and β4 come together at 214 (III)
Protein CI2 (1 α + 4 β)
30
Comparison with Experimental Data
31
Stochastic Roadmaps
M.S. Apaydin, D.L. Brutlag,
C. Guestrin, D. Hsu, J.C. Latombe and C. Varma.
Stochastic Roadmap Simulation: An Efficient
Representation and Algorithm for Analyzing
Molecular Motion. J. Comp. Biol.,
10(3-4):257-281, 2003
  • New idea: Capture the stochastic nature of
    molecular motion by assigning probabilities to
    edges

32
Edge Probabilities
Follow the Metropolis criterion
Self-transition probability
vj
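
One common way to turn the Metropolis criterion into edge probabilities is sketched below. The exact form is an assumption of this sketch, not something stated on the slide: a move from node i to neighbor j is proposed with probability 1/N_i (N_i being the number of neighbors of i) and accepted with probability min(1, exp(-ΔE/kT)); whatever probability remains is assigned to the self-transition. The names `transition_probabilities`, `energies`, `neighbors` and `kT` are hypothetical.

```python
import math


def transition_probabilities(energies, neighbors, kT=1.0):
    """Build P[i][j] for a roadmap.

    energies:  dict node -> energy
    neighbors: dict node -> list of adjacent nodes (assumed not to contain the node itself)
    Each row of the result sums to 1 thanks to the self-transition entry P[i][i].
    """
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            delta = energies[j] - energies[i]
            # Metropolis acceptance: downhill moves are always accepted.
            accept = math.exp(-delta / kT) if delta > 0 else 1.0
            row[j] = accept / len(nbrs)
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P
```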
33
Stochastic Roadmap Simulation
V
Pij
34
Roadmap as Markov Chain
j
Pij
i
  • Transition probability Pij depends only on i and
    j

35
Probability of Folding pfold
Unfolded state
Folded state
36
First-Step Analysis
Let $f_i = \text{pfold}(i)$. After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$
37
First-Step Analysis
  • One linear equation per node
  • Solution gives pfold for all nodes
  • No explicit simulation run
  • All pathways are taken into account
  • Sparse linear system

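[Figure: node i with self-transition probability $P_{ii}$ and transition probabilities $P_{ij}$, $P_{ik}$, $P_{il}$, $P_{im}$ to its neighbors j, k, l, m]
Let $f_i = \text{pfold}(i)$. After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$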
38
Number of Self-Avoiding Walks on a 2D Grid
1, 2, 12, 184, 8512, 1262816, 575780564,
789360053252, 3266598486981642,
(10x10) 41044208702632496804,
(11x11) 1568758030464750013214100,
(12x12) 182413291514248049241470885236 > 10^28
http://mathworld.wolfram.com/Self-AvoidingWalk.html
39
In contrast
  • Computing pfold with MC simulation requires:
      • For every conformation q of interest:
          • Perform many MC simulation runs from q
          • Count number of times F is attained first
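
For comparison, the run-and-count procedure can be sketched on a discrete chain. This is a simplified stand-in: an actual MC simulation moves in the continuous conformational space, while the sketch below walks on a transition table `P[i][j]` (a hypothetical layout) and counts how often the folded set is reached before the unfolded set.

```python
import random


def pfold_monte_carlo(P, start, folded, unfolded, n_runs=1000, rng=None):
    """Estimate pfold(start) as the fraction of runs that reach folded first."""
    rng = rng or random.Random()
    hits = 0
    for _ in range(n_runs):
        state = start
        while state not in folded and state not in unfolded:
            targets = list(P[state])
            weights = [P[state][t] for t in targets]
            state = rng.choices(targets, weights=weights)[0]
        if state in folded:
            hits += 1
    return hits / n_runs
```

The estimate covers a single starting conformation and its accuracy improves only with the square root of the number of runs, which is why the procedure has to be repeated for every conformation of interest.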

40
Computational Tests
  • 1ROP (repressor of primer)
      • 2 α helices
      • 6 DOF
  • 1HDD (Engrailed homeodomain)
      • 3 α helices
      • 12 DOF

H-P energy model with steric clash exclusion [Sun
et al., 95]
41
pfold for β hairpin
  • Immunoglobulin binding protein (Protein G)
  • Last 16 amino acids
  • Cα-based representation
  • Gō model energy function
  • 42 DOFs
[Zhou and Karplus, 99]
42
Correlation with MC Approach
1ROP
43
Computation Times (β hairpin)
Monte Carlo (30 simulations):
  • Over 10^7 energy computations
  • 10 hours of computer time
  • 1 conformation
Roadmap:
  • 50,000 energy computations
  • 23 seconds of computer time
  • 2000 conformations
6 orders of magnitude speedup!
44
Using Path Sampling to Construct Roadmaps
N. Singhal, C.D. Snow, and V.S. Pande. Using Path
Sampling to Build Better Markovian State Models:
Predicting the Folding Rate and Mechanism of a
Tryptophan Zipper Beta Hairpin, J. Chemical
Physics, 121(1):415-425, 2004
  • New idea:
  • Paths computed with Molecular Dynamics
    simulation techniques are used to create the
    nodes of the roadmap → More pertinent/better
    distributed nodes
  • Edges are labeled with the time needed to
    traverse them

45
Sampling Nodes from Computed Paths (Path Shooting)
F
U
47
Node Merging
  • If two nodes are closer than some ε, they
    are merged into one → roadmap
  • Rules are applied to update edge probabilities
    and times

48
Application: Computation of MFPT
  • Mean First Passage Time: the average time when a
    protein first reaches its folded state
  • First-Step Analysis yields
  • $\text{MFPT}(i) = \sum_j P_{ij} \times (t_{ij} + \text{MFPT}(j))$
  • $\text{MFPT}(i) = 0$ if $i \in F$
  • Assuming first-order kinetics, the probability
    that a protein has folded by time t is
    $P_f(t) = 1 - e^{-rt}$
  • where r is the folding rate
  • MFPT = 1/r
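
The first-step equations for MFPT can be solved in the same way as those for pfold. The sketch below assumes nested dictionaries `P[i][j]` for transition probabilities and `t[i][j]` for edge traversal times (a hypothetical layout) and uses Gauss-Seidel sweeps, which is a choice made for the illustration rather than the method of the cited paper.

```python
def mean_first_passage_time(P, t, folded, tol=1e-12, max_iter=100_000):
    """Solve MFPT(i) = sum_j P[i][j] * (t[i][j] + MFPT(j)), with MFPT = 0 on folded nodes.

    Assumes every non-folded node has P[i][i] < 1 and can reach the folded set.
    """
    m = {i: 0.0 for i in P}
    free = [i for i in P if i not in folded]
    for _ in range(max_iter):
        change = 0.0
        for i in free:
            stay = P[i].get(i, 0.0)
            # Expected time spent on the next step, plus remaining time from neighbors.
            total = sum(p * t[i][j] for j, p in P[i].items())
            total += sum(p * m[j] for j, p in P[i].items() if j != i)
            new = total / (1.0 - stay)
            change = max(change, abs(new - m[i]))
            m[i] = new
        if change < tol:
            break
    return m
```

Under the first-order assumption, the folding rate then follows as r = 1/MFPT evaluated at the unfolded starting node.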

49
Computational Test
  • 12-residue tryptophan zipper beta hairpin (TZ2)
  • Folding@Home used to generate trajectories (fully
    atomistic simulation) ranging from 10 to 450 ns
  • 1750 trajectories (14 reaching folded state)
  • → 22,400-node roadmap
  • MFPT: 2-9 μs, which is similar to experimental
    measurements (from fluorescence and IR)

50
Conformational Analysis of Protein Loops
J. Cortés, T. Siméon, M. Remaud-Siméon, and V. Tran.
Geometric Algorithms for the Conformational
Analysis of Long Protein Loops. J. Comp.
Chemistry, 25:956-967, 2004
  • New idea:
  • Explore the clash-free subset of the
    conformational space of a loop, by building a
    tree-shaped roadmap
  • Kinematic model: φ-ψ angles on the backbone + χi
    torsional angles in side-chains

51
  • Amylosucrase (AS)
      • Only enzyme in its family that acts on
        sucrose substrate
  • The 17-residue loop (named loop 7) between
    Gly433 and Gly449 is believed to play a pivotal role

52
Roadmap Construction
  • A tree-shaped roadmap is created from a start
    conformation qstart
  • At each step of the roadmap construction, a
    conformation qrand of the loop is picked at
    random, and a new roadmap node is created by
    iteratively pulling toward it the existing node
    that is closest to qrand
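
The tree construction can be sketched as below, under strong simplifications that are assumptions of this illustration: a conformation is a plain tuple of coordinates with Euclidean distance, and a single caller-supplied predicate `is_valid` stands in for clash detection. The loop-closure constraint handled in the cited work is not modeled. The names `grow_tree`, `sample` and `is_valid` are hypothetical.

```python
import math


def grow_tree(q_start, sample, is_valid, step=0.1, n_iter=200):
    """Grow a tree-shaped roadmap from q_start.

    At each iteration a random conformation is drawn, the closest tree node
    is pulled toward it in small steps, and the pulling stops when the target
    is reached or the next step would be invalid. The last valid conformation
    becomes a new node. Returns the node list and a child -> parent index map.
    """
    nodes = [tuple(q_start)]
    parent = {0: None}
    for _ in range(n_iter):
        q_rand = tuple(sample())
        nearest = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], q_rand))
        q = nodes[nearest]
        moved = False
        while True:
            d = math.dist(q, q_rand)
            if d < 1e-12:
                break  # already at the target
            s = min(step, d)
            q_next = tuple(a + s * (b - a) / d for a, b in zip(q, q_rand))
            if not is_valid(q_next):
                break  # a clash is detected
            q = q_next
            moved = True
            if s == d:
                break  # target reached
        if moved:
            nodes.append(q)
            parent[len(nodes) - 1] = nearest
    return nodes, parent
```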

53
Roadmap Construction
C
Cfree
Cclosed
qstart
Stops when one can't get closer to qrand or a
clash is detected
54
Computational Results
  • Surprisingly, loop 7 can't move much
  • Main bottleneck is residue Asp231

Positions of the Cα atom of the middle residue
(Ser441)
55
Computational Results
  • If residue Asp231 is removed, then loop 7's
    mobility increases dramatically. The Cα atom of
    Ser441 can be displaced by more than 9 Å from its
    crystallographic position

56
Conclusion
  • Probabilistic roadmaps are a recent, but
    promising tool for exploring conformational
    spaces and computing ensemble properties of
    molecular pathways
  • Current/future research:
      • Better sampling strategies able to handle more
        complex molecular models (protein-protein
        binding)
      • More work to include time information in
        roadmaps
      • More thorough experimental validation to compare
        computed and measured quantitative properties