Title: Bio-CS Exploration of Molecular Conformational Spaces
1Bio-CS: Exploration of Molecular Conformational Spaces
- Jean-Claude Latombe, Computer Science Department, Robotics Laboratory, Bio-X Clark Center
2Range of Bio-CS Research
- Body system: Robotic surgery
- Tissue/Organs: Soft-tissue simulation and surgical training
- Cells: Simulation of cell interaction
- Molecules: Molecular structures, similarities, and motions
- Gene
3Range of Bio-CS Research
Accuray
5Motion ? Structure
6Motion ? Structure ? Function
Develop efficient algorithms and data structures to explore protein conformational spaces:
- Sampling
- Similarities
- Pathways
7Vision for the Future
- In-silico experiments
- Drugs on demand
- → Interactive Biology
8Analogy with Robotics
9But Biology ≠ Robotics
- Energy field, instead of joint control
- Continuous energy field, instead of binary free and in-collision spaces
- Multiple pathways, instead of single collision-free path
- Potentially many more degrees of freedom
- Relation to real world is more complex
10Overview
- Part I: Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe, and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Computational Biology, 10(3-4):257-281, 2003.
- Part II: ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
I. Lotan, F. Schwarzer, J.C. Latombe. Efficient Energy Computation for Monte Carlo Simulation of Proteins. 3rd Workshop on Algorithms in Bioinformatics (WABI), Budapest, Hungary, Sept. 2003.
11Part I: Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
- Serkan Apaydin, Doug Brutlag(1), Carlos Guestrin, David Hsu(2), Jean-Claude Latombe, Chris Varma
- Computer Science Department
- Stanford University
- (1) Department of Biochemistry, Stanford University
- (2) Computer Science Department, Nat. Univ. of Singapore
13Initial Work [Singh, Latombe, Brutlag, 99]
- Study of ligand-protein binding
- Probabilistic roadmaps with edges weighted by energetic plausibility
- Search of most plausible path
14Initial Work [Singh, Latombe, Brutlag, 99]
- Study of energy profiles along most plausible paths
- Extensions to protein folding [Song and Amato, 01] [Apaydin et al., 01]
- But: Molecules fold/bind along a myriad of pathways. Any single pathway is of limited interest.
15New Idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges
16Edge probabilities
Follow Metropolis criteria
Self-transition probability
vj
17Stochastic Roadmap Simulation
[Figure: roadmap with transition probabilities Pij on its edges]
Stochastic simulation on the roadmap and Monte Carlo simulation converge to the same Boltzmann distribution
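The edge probabilities and the convergence claim can be illustrated on a toy roadmap. This is a minimal sketch, not the authors' implementation: the formulas on the slides did not survive in this transcript, so the Metropolis rule below is an assumption, and so is the common proposal probability 1/max_degree (chosen so that detailed balance with the Boltzmann distribution holds exactly). The energies, the neighbor lists, and the names `transition_matrix` and `boltzmann` are all hypothetical.

```python
import numpy as np

def transition_matrix(energies, neighbors, kT=1.0):
    """Roadmap transition probabilities under an assumed Metropolis rule."""
    n = len(energies)
    # Assumption: every node proposes each neighbor with probability 1/max_degree.
    max_degree = max(len(nb) for nb in neighbors)
    P = np.zeros((n, n))
    for i, nb in enumerate(neighbors):
        for j in nb:
            dE = energies[j] - energies[i]
            # Metropolis criterion: downhill moves always accepted,
            # uphill moves accepted with probability exp(-dE / kT).
            P[i, j] = min(1.0, np.exp(-dE / kT)) / max_degree
        # Self-transition probability: the probability mass that is left over.
        P[i, i] = 1.0 - P[i].sum()
    return P

def boltzmann(energies, kT=1.0):
    """Boltzmann distribution restricted to the roadmap nodes."""
    w = np.exp(-np.asarray(energies) / kT)
    return w / w.sum()

# Hypothetical 4-node roadmap (a path graph) with made-up energies.
energies = [0.0, 1.0, 0.5, 2.0]
neighbors = [[1], [0, 2], [1, 3], [2]]
P = transition_matrix(energies, neighbors)
pi = boltzmann(energies)

# Distribution over nodes after many simulation steps started at node 0.
after_many_steps = np.linalg.matrix_power(P, 2000)[0]
```

In this toy case `pi @ P` reproduces `pi`, and `after_many_steps` approaches `pi`, which is the sense in which the stochastic simulation on the roadmap settles into the Boltzmann distribution over the sampled nodes.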
18Problems with Monte Carlo Simulation
- Much time is wasted escaping local minima
- Each run generates a single pathway
19Proposed Solution
Treat a roadmap as a Markov chain and use the First-Step Analysis tool
20Example 1: Probability of Folding pfold
HIV integrase
[Du et al. 98]
"We stress that we do not suggest using pfold as a transition coordinate for practical purposes as it is very computationally intensive." Du, Pande, Grosberg, Tanaka, and Shakhnovich, "On the Transition Coordinate for Protein Folding", Journal of Chemical Physics (1998).
[Figure labels: Unfolded state, Folded state]
21First-Step Analysis
- One linear equation per node
- Solution gives pfold for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system
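[Figure: node i with self-transition probability Pii and neighbors j, k, l, m reached with probabilities Pij, Pik, Pil, Pim]
Let $f_i = p_{fold}(i)$. After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$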
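The linear system described above can be sketched as follows. This is a minimal illustration under stated assumptions: the boundary conditions (pfold = 1 on folded nodes, 0 on unfolded nodes) are the usual definition of pfold rather than something this transcript spells out, the function name `pfold_first_step` is hypothetical, and the 5-node chain is a made-up example. A dense solver is used for brevity; for a large roadmap a sparse solver would be the natural choice, since the system is sparse.

```python
import numpy as np

def pfold_first_step(P, folded, unfolded):
    """Solve f_i = sum_j P_ij f_j with f = 1 on folded and f = 0 on unfolded nodes."""
    n = P.shape[0]
    folded, unfolded = sorted(folded), sorted(unfolded)
    free = [i for i in range(n) if i not in folded and i not in unfolded]
    # One linear equation per free node: (I - P_free,free) f_free = P_free,folded * 1
    A = np.eye(len(free)) - P[np.ix_(free, free)]
    b = P[np.ix_(free, folded)].sum(axis=1)
    f = np.zeros(n)
    f[folded] = 1.0
    f[free] = np.linalg.solve(A, b)
    return f

# Hypothetical roadmap: 5 nodes in a line, node 0 unfolded, node 4 folded,
# interior nodes step left or right with probability 1/2.
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in (1, 2, 3):
    P[i, i - 1] = P[i, i + 1] = 0.5

f = pfold_first_step(P, folded=[4], unfolded=[0])
# f is [0, 0.25, 0.5, 0.75, 1]: pfold for all nodes from a single solve.
```

A single solve returns pfold for every node at once, with no explicit simulation run.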
22In Contrast
- Computing pfold with MC simulation requires, for every conformation c of interest:
- Performing many MC simulation runs from c
- Counting the number of times the folded state F is attained first
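For comparison, the simulation-based estimate can be sketched on a small Markov chain. This is an illustrative toy under assumptions: the 5-node chain, the run count, and the name `pfold_by_simulation` are hypothetical, and a real Monte Carlo run would walk through conformation space rather than over a fixed transition matrix.

```python
import numpy as np

def pfold_by_simulation(P, start, folded, unfolded, runs, rng):
    """Estimate pfold(start) as the fraction of runs that reach F before U."""
    folded, unfolded = set(folded), set(unfolded)
    n = P.shape[0]
    hits = 0
    for _ in range(runs):
        state = start
        # Walk until either the folded or the unfolded set is reached.
        while state not in folded and state not in unfolded:
            state = int(rng.choice(n, p=P[state]))
        hits += state in folded
    return hits / runs

# Hypothetical chain: node 0 unfolded, node 4 folded, symmetric steps in between.
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in (1, 2, 3):
    P[i, i - 1] = P[i, i + 1] = 0.5

rng = np.random.default_rng(0)
estimate = pfold_by_simulation(P, start=1, folded=[4], unfolded=[0], runs=2000, rng=rng)
```

The estimate covers only the one starting node and carries sampling noise that shrinks slowly with the number of runs; the whole procedure has to be repeated for every conformation of interest.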
23Computational Tests
- 1ROP (repressor of primer)
- 2 α helices
- 6 DOF
- 1HDD (Engrailed homeodomain)
- 3 α helices
- 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]
24Correlation with MC Approach
1ROP
25Computation Times (1ROP)
Monte Carlo:
- Over 10^6 energy computations
- Over 11 days of computer time
- 49 conformations
Roadmap:
- 15,000 energy computations
- 1.5 hours of computer time
- 5000 conformations
4 orders of magnitude speedup per conformation!
26Example 2: Ligand-Protein Interaction
Computation of escape time from funnels of attraction around potential binding sites (funnel = ball of 10 Å rmsd) [Camacho, Vajda, 01]
27Similar Computation Through Simulation [Sept, Elcock and McCammon 99]
10K to 30K independent simulations
28Computing Escape Time with Roadmap
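[Figure: node i inside the funnel of attraction, with self-transition probability Pii and neighbors j, k, l, m reached with probabilities Pij, Pik, Pil, Pim; a node outside the funnel is labeled 0]
$t_i = 1 + P_{ii} t_i + P_{ij} t_j + P_{ik} t_k + P_{il} t_l + P_{im} t_m$
(escape time is measured as number of steps of stochastic simulation)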
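The escape-time equations can be solved the same way as the pfold system. The sketch below assumes that nodes outside the funnel have escape time 0, which matches the "0" label that survives from the figure; the function name `escape_times` and the 5-node chain are hypothetical.

```python
import numpy as np

def escape_times(P, funnel):
    """Expected number of steps to leave the funnel, for every node.

    Solves t_i = 1 + sum_j P_ij t_j for nodes inside the funnel,
    with t = 0 for nodes outside (assumption stated in the lead-in).
    """
    funnel = sorted(funnel)
    A = np.eye(len(funnel)) - P[np.ix_(funnel, funnel)]
    t = np.zeros(P.shape[0])
    t[funnel] = np.linalg.solve(A, np.ones(len(funnel)))
    return t

# Hypothetical chain: nodes 1..3 form the funnel, nodes 0 and 4 lie outside.
P = np.zeros((5, 5))
P[0, 0] = P[4, 4] = 1.0
for i in (1, 2, 3):
    P[i, i - 1] = P[i, i + 1] = 0.5

t = escape_times(P, funnel=[1, 2, 3])
# t is [0, 3, 4, 3, 0] steps.
```

As with pfold, one solve yields the escape time from every node in the funnel, where the simulation approach needs many independent runs.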
29Distinguishing Catalytic Site
- Given several potential binding sites, which one is the catalytic site?
Energy = electrostatic + van der Waals + solvation free energy terms
30Complexes Studied
Ligand          Protein   Random nodes   DOFs
oxamate         1ldm      8000           7
Streptavidin    1stp      8000           11
Hydroxylamine   4ts1      8000           9
COT             1cjw      8000           21
THK             1aid      8000           14
IPM             1ao5      8000           10
PTI             3tpi      8000           13
31Distinction Based on Energy
Protein   Bound state (kcal/mol)   Best potential binding site (kcal/mol)   Able to distinguish catalytic site?
1stp      -15.1                    -14.6                                    Able
4ts1      -19.4                    -14.6                                    Able
3tpi      -25.2                    -16.0                                    Able
1ldm      -11.8                    -13.6                                    Not able
1cjw      -11.7                    -18.0                                    Not able
1aid      -11.2                    -22.2                                    Not able
1ao5      -7.5                     -13.1                                    Not able
32Distinction Based on Escape Time
Protein   Bound state (# steps)   Best potential binding site (# steps)   Able to distinguish catalytic site?
1stp      3.4E9                   1.1E7                                   Able
4ts1      3.8E10                  1.8E6                                   Able
3tpi      1.3E11                  5.9E5                                   Able
1ldm      8.1E5                   3.4E6                                   Not able
1cjw      5.4E8                   4.2E6                                   Able
1aid      9.7E5                   1.6E8                                   Not able
1ao5      6.6E7                   5.7E6                                   Able
33Conclusion
- Probabilistic roadmaps are a promising tool for computing ensemble properties of molecular pathways
- Current work:
- Non-uniform sampling strategies to handle more complex molecules
- More realistic energetic models
- Extension to molecular dynamic simulation
- Connection to in-vitro experiments (interaction of two proteins)
34Part II: ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
- Itay Lotan, Fabian Schwarzer, Dan Halperin(1), Jean-Claude Latombe
- Computer Science Department
- Stanford University
- (1) Computer Science Department, Tel Aviv University
35Monte Carlo Simulation (MCS)
- Used to study thermodynamic and kinetic properties of proteins
- Random walk through conformation space
- At each attempted step:
- Perturb current conformation at random
- Accept step with probability
- Problem: How to maintain energy efficiently?
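The random walk above can be sketched in a few lines. The acceptance formula is missing from this transcript, so the sketch assumes the Metropolis criterion that Part I names; `toy_energy` is a hypothetical stand-in for a protein energy function, and the step size, temperature, and chain length are arbitrary.

```python
import math
import random

def toy_energy(angles):
    """Hypothetical stand-in for a protein energy function."""
    return sum(1.0 - math.cos(a) for a in angles)

def mc_step(angles, energy, energy_fn, kT, rng, max_delta=0.3):
    """One attempted step: perturb one DOF at random, then accept or reject."""
    i = rng.randrange(len(angles))
    trial = list(angles)
    trial[i] += rng.uniform(-max_delta, max_delta)
    trial_energy = energy_fn(trial)  # full recomputation: the costly part
    dE = trial_energy - energy
    # Assumed Metropolis criterion: accept with probability min(1, exp(-dE / kT)).
    if dE <= 0.0 or rng.random() < math.exp(-dE / kT):
        return trial, trial_energy, True
    return angles, energy, False

rng = random.Random(0)
angles = [rng.uniform(-math.pi, math.pi) for _ in range(10)]
energy = toy_energy(angles)
accepted = 0
for _ in range(2000):
    angles, energy, ok = mc_step(angles, energy, toy_energy, kT=0.5, rng=rng)
    accepted += ok
```

Every attempted step calls `energy_fn` on the whole conformation, even though only one angle changed. That repeated full evaluation is the cost the rest of Part II sets out to reduce.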
36Energy Function
- E = Σ bonded terms + Σ non-bonded terms
- Bonded terms, e.g. bond length: Easy to compute
- Non-bonded terms, e.g. Van der Waals, depend on distances between pairs of atoms: Expensive to compute, O(n^2)
37Energy Function
- Non-bonded terms → Use cutoff distance (6-12 Å) → Only O(n) interacting pairs [Halperin & Overmars 98]
Problem: How to find interacting pairs without enumerating all atom pairs?
38Grid Method
- Subdivide space into cubic cells
- Compute cell that contains each atom center
- Store results in hash table
- Θ(n) time to update grid
- O(1) time to find interactions for each atom
- Θ(n) to find all interactions
Asymptotically optimal in worst-case!
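The grid method can be sketched with a dictionary as the hash table. This is a minimal illustration, assuming cubic cells whose side equals the cutoff distance, so that any interacting pair lies in the same or an adjacent cell; the function name `interacting_pairs` and the sample coordinates are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def interacting_pairs(centers, cutoff):
    """Return all index pairs (i, j), i < j, whose centers are within cutoff."""
    def cell_of(p):
        return tuple(math.floor(c / cutoff) for c in p)

    # Hash table from cell coordinates to the atoms whose centers fall in that cell.
    grid = defaultdict(list)
    for idx, p in enumerate(centers):
        grid[cell_of(p)].append(idx)

    pairs = set()
    for idx, p in enumerate(centers):
        cx, cy, cz = cell_of(p)
        # Only the 27 cells around the atom can contain interacting partners.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                if j > idx and math.dist(p, centers[j]) <= cutoff:
                    pairs.add((idx, j))
    return pairs

# Hypothetical atom centers (in Å) with a 6 Å cutoff.
centers = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0), (13.0, 4.0, 0.0)]
pairs = interacting_pairs(centers, cutoff=6.0)
# pairs == {(0, 1), (2, 3)}
```

The constant-time lookup per atom relies on atoms not overlapping much, so each cell holds a bounded number of centers. The grid is rebuilt from scratch here, which is the linear cost per step that the slide refers to.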
39Can We Do Better on Average?
- Proteins are long kinematic chains
40Proteins Kinematic Structure
torsional dof
- Angles φ, ψ for backbone and χ for side-chains
- Conformational space
41Can We Do Better on Average?
- Proteins are long kinematic chains
- Few DOFs are perturbed at each MC step
- Long sub-chains stay rigid at each step
- Many partial energy sums remain constant
How to retrieve unchanged partial sums?
42Two New Data Structures
- ChainTree → Fast detection of interacting atom pairs
- EnergyTree → Reuse of unchanged partial energy sums
43ChainTree
- Combination of two hierarchies
- Transform hierarchy
- Bounding volume hierarchy
44ChainTree
- Combination of two hierarchies
- Transform hierarchy: approximate kinematics of protein backbone at successive resolutions
45ChainTree
- Combination of two hierarchies
- Bounding volume hierarchy: approximate geometry of protein at successive resolutions (Larsen et al., 00)
46ChainTree
47Updating the ChainTree
- Update path to root
- Recompute transforms that shortcut change
- Recompute bounding volumes that contain change
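The "update path to root" step can be illustrated on the transform hierarchy alone. This is a heavily simplified sketch, not the ChainTree itself: it uses a planar chain with 2D homogeneous transforms, omits bounding volumes and node marking entirely, and the names `link_transform` and `TransformHierarchy` are hypothetical. Each internal node stores the composed transform of the sub-chain below it, so changing one DOF only requires recomputing that node's ancestors.

```python
import numpy as np

def link_transform(angle, length=1.0):
    """2D homogeneous transform of one link: rotate by angle, then advance by length."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, length * c],
                     [s,  c, length * s],
                     [0.0, 0.0, 1.0]])

class TransformHierarchy:
    """Balanced binary tree whose nodes hold the transform of a sub-chain."""

    def __init__(self, angles):
        self.size = 1
        while self.size < len(angles):
            self.size *= 2
        # Heap layout: node v has children 2v and 2v+1; unused leaves stay identity.
        self.node = [np.eye(3) for _ in range(2 * self.size)]
        for i, a in enumerate(angles):
            self.node[self.size + i] = link_transform(a)
        for v in range(self.size - 1, 0, -1):
            self.node[v] = self.node[2 * v] @ self.node[2 * v + 1]

    def set_angle(self, i, angle):
        """Change one DOF and recompute only the transforms on the path to the root."""
        v = self.size + i
        self.node[v] = link_transform(angle)
        recomputed = 0
        v //= 2
        while v >= 1:
            self.node[v] = self.node[2 * v] @ self.node[2 * v + 1]
            recomputed += 1
            v //= 2
        return recomputed

    def end_to_end(self):
        """Transform from the first link frame to the end of the chain."""
        return self.node[1]

angles = [0.1 * i for i in range(8)]
tree = TransformHierarchy(angles)
recomputed = tree.set_angle(3, 0.7)  # 3 internal nodes recomputed for 8 links
```

For a chain of n links, a single DOF change touches about log2(n) internal nodes, and the transforms of all other sub-chains are reused unchanged.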
48Finding Interacting Pairs
- Do not search inside rigid sub-chains (unmarked nodes)
- Do not test two nodes with no marked node between them
50Computational Complexity
- n: total number of DOFs in protein backbone
- k: number of simultaneous DOF changes at each step of MCS
- Updating complexity
- Worst-case complexity of finding all interacting pairs, but performs much better in practice!
51EnergyTree
52EnergyTree
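The idea of reusing unchanged partial energy sums can be shown with a flat, non-hierarchical simplification. This is not the EnergyTree: it caches one energy sum per pair of fixed-size blocks of a planar chain and, after a DOF change, recomputes only the block pairs that straddle the changed joint. The pairwise energy form, the block size, and the names `positions`, `block_energy`, and `CachedEnergy` are all hypothetical.

```python
import numpy as np

def positions(angles):
    """Atom positions of a planar chain with unit links; atom 0 sits at the origin."""
    headings = np.cumsum(angles)
    steps = np.stack([np.cos(headings), np.sin(headings)], axis=1)
    return np.vstack([np.zeros((1, 2)), np.cumsum(steps, axis=0)])

def block_energy(pos, a, b):
    """Toy pairwise energy between atoms in index ranges a and b (each pair counted once)."""
    total = 0.0
    for p in range(*a):
        for q in range(*b):
            if p < q:
                total += 1.0 / (1.0 + np.sum((pos[p] - pos[q]) ** 2))
    return total

class CachedEnergy:
    def __init__(self, angles, block=4):
        self.angles = np.array(angles, dtype=float)
        n_atoms = len(self.angles) + 1
        self.blocks = [(s, min(s + block, n_atoms)) for s in range(0, n_atoms, block)]
        pos = positions(self.angles)
        # One cached partial sum per (unordered) pair of blocks.
        self.cache = {(ia, ib): block_energy(pos, self.blocks[ia], self.blocks[ib])
                      for ia in range(len(self.blocks))
                      for ib in range(ia, len(self.blocks))}

    def change(self, j, angle):
        """Change joint j; atoms 0..j stay fixed, atoms j+1.. move as one rigid body."""
        self.angles[j] = angle
        pos = positions(self.angles)
        recomputed = 0
        for ia, ib in list(self.cache):
            start_a, end_b = self.blocks[ia][0], self.blocks[ib][1]
            # Only block pairs with atoms on both sides of joint j can change.
            if start_a <= j < end_b - 1:
                self.cache[ia, ib] = block_energy(pos, self.blocks[ia], self.blocks[ib])
                recomputed += 1
        return recomputed

    def total(self):
        return sum(self.cache.values())

model = CachedEnergy([0.3 * i for i in range(15)])
recomputed = model.change(1, 1.2)  # 4 of the 10 cached block sums are recomputed
```

The cached total still matches a brute-force sum over all atom pairs, because sub-chains on the same side of the changed joint keep their internal distances. A hierarchical structure lets the same reuse work at every resolution instead of one fixed block size.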
53Experimental Setup
- Energy function
- Van der Waals
- Electrostatic
- Attraction between native contacts
- Cutoff at 12 Å
- 300,000 steps MCS
- Early rejection for large vdW terms
54Results 1-DOF change
55Results 5-DOF change
56Two-Pass ChainTree
57Conclusion
- Chain/EnergyTree reduces average time per step in MCS of proteins (vs. grid)
- Exploit chain kinematics of protein
- Larger speed-up for bigger proteins and for smaller number of simultaneous DOF changes
58What is Computational Biology?
- Using computers in Biology?
- Designing efficient algorithms for analyzing biological data and simulating biological processes?
- Using Biology to design new algorithms and computing hardware?
- Cultural clash: Biology → classification, Computer Science → abstraction
- In any case, Computational Biology will be a critical domain for the next 20 years, probably the next big thing after the Internet