Bio-CS: Exploration of Molecular Conformational Spaces

1
Bio-CS: Exploration of Molecular Conformational Spaces
  • Jean-Claude Latombe, Computer Science
    Department, Robotics Laboratory, Bio-X Clark
    Center

2
Range of Bio-CS Research
  • Body system: Robotic surgery
  • Tissue/Organs: Soft-tissue simulation and
    surgical training
  • Cells: Simulation of cell interaction
  • Molecules: Molecular structures, similarities,
    and motions
  • Gene

3
Range of Bio-CS Research
  • Body system: Robotic surgery (Accuray)
  • Tissue/Organs: Soft-tissue simulation and
    surgical training
  • Cells: Simulation of cell interaction
  • Molecules: Molecular structures, similarities,
    and motions
  • Gene

5
Motion → Structure
6
Motion → Structure → Function
Develop efficient algorithms and data
structures to explore protein conformational
spaces: sampling, similarities, pathways
7
Vision for the Future
  • In-silico experiments
  • Drugs on demand
  • → Interactive Biology

8
Analogy with Robotics
9
But Biology ≠ Robotics
  • Energy field, instead of joint control
  • Continuous energy field, instead of binary free
    and in-collision spaces
  • Multiple pathways, instead of single
    collision-free path
  • Potentially many more degrees of freedom
  • Relation to real world is more complex

10
Overview
  • Part I: Probabilistic Roadmaps: A Tool for
    Computing Ensemble Properties of Molecular
    Motions. M.S. Apaydin, D.L. Brutlag, C. Guestrin,
    D. Hsu, J.C. Latombe, and C. Varma. Stochastic
    Roadmap Simulation: An Efficient Representation
    and Algorithm for Analyzing Molecular Motion. J.
    Computational Biology, 10(3-4):257-281, 2003.
  • Part II: ChainTree: A Data Structure for
    Efficient Monte Carlo Simulation of Proteins. I.
    Lotan, F. Schwarzer, J.C. Latombe. Efficient
    Energy Computation for Monte Carlo Simulation of
    Proteins. 3rd Workshop on Algorithms in
    Bioinformatics (WABI), Budapest, Hungary, Sept.
    2003.

11
Part I: Probabilistic Roadmaps: A Tool for
Computing Ensemble Properties of Molecular Motions
  • Serkan Apaydin, Doug Brutlag¹, Carlos Guestrin,
    David Hsu², Jean-Claude Latombe, Chris Varma
  • Computer Science Department
  • Stanford University
  • ¹ Department of Biochemistry, Stanford University
  • ² Computer Science Department, Nat. Univ. of
    Singapore

13
Initial Work [Singh, Latombe, Brutlag, 99]
  • Study of ligand-protein binding
  • Probabilistic roadmaps with edges weighted by
    energetic plausibility
  • Search of most plausible path

14
Initial Work [Singh, Latombe, Brutlag, 99]
  • Study of energy profiles along most plausible
    paths
  • Extensions to protein folding [Song and Amato,
    01; Apaydin et al., 01]
  • But: molecules fold/bind along a myriad of
    pathways. Any single pathway is of limited
    interest.

15
New Idea: Capture the stochastic nature of
molecular motion by assigning probabilities to
edges
16
Edge probabilities
Follow Metropolis criteria
Self-transition probability
vj
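A minimal Python sketch of how such edge probabilities could be assigned. The transcript does not preserve the slide's formulas, so the rule below is an assumption: each of a node's N_i roadmap neighbors is proposed with probability 1/N_i and accepted by the Metropolis criterion, and the remaining probability mass becomes the self-transition probability. The names `edge_probabilities`, `energy`, `neighbors`, and `kT` are hypothetical.

```python
import math


def edge_probabilities(energy, neighbors, kT=1.0):
    """Return P[i][j] for a roadmap (assumed Metropolis-style rule).

    energy:    dict node -> energy of that conformation
    neighbors: dict node -> list of adjacent roadmap nodes
    kT:        Boltzmann constant times temperature, in energy units
    """
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        n_i = len(nbrs)
        for j in nbrs:
            delta_e = energy[j] - energy[i]
            # Downhill moves are always accepted, uphill moves
            # with probability exp(-dE / kT).
            accept = math.exp(-delta_e / kT) if delta_e > 0 else 1.0
            row[j] = accept / n_i
        # Self-transition probability: whatever mass is left over.
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P


# Toy roadmap with three mutually connected nodes.
toy_energy = {0: 0.0, 1: 1.0, 2: 0.5}
toy_neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
toy_P = edge_probabilities(toy_energy, toy_neighbors)
```

Every row of `toy_P` sums to 1, so the roadmap can be read directly as the transition matrix of a Markov chain.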
17
Stochastic Roadmap Simulation
[Figure: roadmap with transition probability P_ij on each edge]
Stochastic simulation on the roadmap and Monte Carlo
simulation converge to the same Boltzmann distribution
18
Problems with Monte Carlo Simulation
  • Much time is wasted escaping local minima
  • Each run generates a single pathway

19
Proposed Solution
Treat a roadmap as a Markov chain and use the
First-Step Analysis tool
20
Example 1: Probability of Folding (pfold) [Du et al., 98]
"We stress that we do not suggest using pfold as
a transition coordinate for practical purposes as
it is very computationally intensive." Du,
Pande, Grosberg, Tanaka, and Shakhnovich, "On the
Transition Coordinate for Protein Folding,"
Journal of Chemical Physics (1998).
[Figure: HIV integrase, with the unfolded state and the folded state marked]
21
First-Step Analysis
  • One linear equation per node
  • Solution gives pfold for all nodes
  • No explicit simulation run
  • All pathways are taken into account
  • Sparse linear system

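[Figure: node i with self-transition probability P_ii and edges to neighbors j, k, l, m with probabilities P_ij, P_ik, P_il, P_im]
Let $f_i = \mathrm{pfold}(i)$. After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$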
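The first-step equations can be solved as one linear system. The sketch below is illustrative only: it assumes pfold is fixed at 1 on folded nodes and 0 on unfolded nodes, and it uses a small dense Gaussian elimination where a sparse solver would be the natural choice for a real roadmap. The names `solve_linear`, `pfold`, and `demo_P` are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def pfold(P, folded, unfolded):
    """One equation per interior node: f_i - sum_j P_ij f_j = 0.

    P is a dict node -> {neighbor: probability}, self-transition included.
    Terms for folded neighbors (f = 1) move to the right-hand side,
    terms for unfolded neighbors (f = 0) vanish.
    """
    interior = [i for i in P if i not in folded and i not in unfolded]
    index = {node: r for r, node in enumerate(interior)}
    A = [[0.0] * len(interior) for _ in interior]
    b = [0.0] * len(interior)
    for i in interior:
        r = index[i]
        A[r][r] += 1.0
        for j, p in P[i].items():
            if j in folded:
                b[r] += p
            elif j not in unfolded:
                A[r][index[j]] -= p
    f = solve_linear(A, b)
    result = {i: f[index[i]] for i in interior}
    result.update({i: 1.0 for i in folded})
    result.update({i: 0.0 for i in unfolded})
    return result


# Toy chain: 0 (unfolded) - 1 - 2 - 3 (folded).
demo_P = {
    0: {0: 1.0},
    1: {0: 0.3, 1: 0.4, 2: 0.3},
    2: {1: 0.2, 2: 0.3, 3: 0.5},
    3: {3: 1.0},
}
demo_pfold = pfold(demo_P, folded={3}, unfolded={0})
```

A single solve yields pfold for every node at once, with no simulation run.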
22
In Contrast
  • Computing pfold with MC simulation requires:
  • For every conformation c of interest:
  • Perform many MC simulation runs from c
  • Count the number of runs in which the folded
    state F is attained first
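For comparison, the simulation-based estimate can be sketched in Python as repeated random walks from one starting node. For simplicity the walks run on a small transition table rather than on a real energy landscape; `estimate_pfold`, `walk_P`, the run count, and the seed are all hypothetical choices.

```python
import random


def estimate_pfold(P, start, folded, unfolded, runs=20000, seed=0):
    """Estimate pfold(start) as the fraction of walks reaching F before U.

    P is a dict node -> {neighbor: probability}, self-transition included.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        node = start
        # Walk until either the folded or the unfolded set is reached.
        while node not in folded and node not in unfolded:
            targets = list(P[node].keys())
            weights = list(P[node].values())
            node = rng.choices(targets, weights=weights)[0]
        if node in folded:
            hits += 1
    return hits / runs


# Toy chain: 0 (unfolded) - 1 - 2 - 3 (folded).
walk_P = {
    1: {0: 0.3, 1: 0.4, 2: 0.3},
    2: {1: 0.2, 2: 0.3, 3: 0.5},
}
estimate = estimate_pfold(walk_P, start=1, folded={3}, unfolded={0})
```

Each call covers a single starting conformation, and its statistical error shrinks only with the square root of the number of runs.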

23
Computational Tests
  • 1ROP (repressor of primer): 2 α helices, 6 DOF
  • 1HDD (Engrailed homeodomain): 3 α helices, 12 DOF

H-P energy model with steric clash exclusion [Sun
et al., 95]
24
Correlation with MC Approach
1ROP
25
Computation Times (1ROP)
Monte Carlo
  • Over $10^6$ energy computations
  • Over 11 days of computer time
  • 49 conformations

Roadmap
  • 15,000 energy computations
  • 1.5 hours of computer time
  • 5000 conformations

4 orders of magnitude speedup!
26
Example 2: Ligand-Protein Interaction
Computation of escape time from funnels of
attraction around potential binding
sites (funnel: ball of 10 Å rmsd) [Camacho, Vajda,
01]
27
Similar Computation Through Simulation [Sept,
Elcock and McCammon, 99]
10K to 30K independent simulations
28
Computing Escape Time with Roadmap
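[Figure: node i inside the funnel of attraction, with self-transition probability P_ii and edges to neighbors j, k, l, m with probabilities P_ij, P_ik, P_il, P_im]
$t_i = 1 + P_{ii} t_i + P_{ij} t_j + P_{ik} t_k + P_{il} t_l + P_{im} t_m$,
with $t = 0$ for any neighbor outside the funnel
(escape time is measured as the number of
steps of stochastic simulation)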
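A small Python sketch of the same computation, under the assumption that the escape time is 0 for every node outside the funnel. It solves the equations by Gauss-Seidel iteration purely for brevity; with escape times as large as those reported later in the deck, a direct sparse solve would be the more practical choice. The names `escape_times` and `funnel_P` are hypothetical.

```python
def escape_times(P, funnel, tol=1e-12, max_iter=100000):
    """Expected number of steps to leave the funnel from each funnel node.

    P is a dict node -> {neighbor: probability}, self-transition included.
    Iterates t_i = 1 + sum_j P_ij t_j, with t_j = 0 outside the funnel.
    """
    t = {i: 0.0 for i in funnel}
    for _ in range(max_iter):
        largest_change = 0.0
        for i in funnel:
            # Neighbors outside the funnel contribute 0 via t.get(j, 0.0).
            new = 1.0 + sum(p * t.get(j, 0.0) for j, p in P[i].items())
            largest_change = max(largest_change, abs(new - t[i]))
            t[i] = new
        if largest_change < tol:
            break
    return t


# Toy funnel with two nodes; "out" lies outside the funnel.
funnel_P = {
    "a": {"a": 0.5, "b": 0.5},
    "b": {"a": 0.5, "out": 0.5},
}
funnel_t = escape_times(funnel_P, funnel=["a", "b"])
```

For this toy funnel the equations give t_b = 4 and t_a = 6 steps, which the iteration reproduces.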
29
Distinguishing Catalytic Site
  • Given several potential binding sites, which one
    is the catalytic site?

Energy = electrostatic + van der Waals + solvation
free energy terms
30
Complexes Studied
Ligand         Protein  Random nodes  DOFs
oxamate        1ldm     8000          7
Streptavidin   1stp     8000          11
Hydroxylamine  4ts1     8000          9
COT            1cjw     8000          21
THK            1aid     8000          14
IPM            1ao5     8000          10
PTI            3tpi     8000          13
31
Distinction Based on Energy
Protein  Bound state  Best potential binding site  Able to distinguish catalytic site
1stp     -15.1        -14.6                        yes
4ts1     -19.4        -14.6                        yes
3tpi     -25.2        -16.0                        yes
1ldm     -11.8        -13.6                        no
1cjw     -11.7        -18.0                        no
1aid     -11.2        -22.2                        no
1ao5     -7.5         -13.1                        no
Energies in kcal/mol. The catalytic site is distinguished when the bound state has the lower energy.
32
Distinction Based on Escape Time
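Protein  Bound state  Best potential binding site  Able to distinguish catalytic site
1stp     3.4E9        1.1E7                        yes
4ts1     3.8E10       1.8E6                        yes
3tpi     1.3E11       5.9E5                        yes
1ldm     8.1E5        3.4E6                        no
1cjw     5.4E8        4.2E6                        yes
1aid     9.7E5        1.6E8                        no
1ao5     6.6E7        5.7E6                        yes
Escape times in number of steps. The catalytic site is distinguished when the bound state has the longer escape time.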
33
Conclusion
  • Probabilistic roadmaps are a promising tool for
    computing ensemble properties of molecular
    pathways
  • Current work:
  • Non-uniform sampling strategies to handle more
    complex molecules
  • More realistic energetic models
  • Extension to molecular dynamics simulation
  • Connection to in-vitro experiments (interaction
    of two proteins)

34
Part II: ChainTree: A Data Structure for
Efficient Monte Carlo Simulation of Proteins
  • Itay Lotan, Fabian Schwarzer, Dan Halperin¹,
    Jean-Claude Latombe
  • Computer Science Department
  • Stanford University
  • ¹ Computer Science Department, Tel Aviv University

35
Monte Carlo Simulation (MCS)
  • Used to study thermodynamic and kinetic
    properties of proteins
  • Random walk through conformation space
  • At each attempted step:
  • Perturb current conformation at random
  • Accept step with probability
    $P = \min(1, e^{-\Delta E / k_B T})$ (Metropolis
    criterion)
  • Problem: How to maintain energy efficiently?
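The loop described above can be sketched in Python as follows. This is an illustration under assumptions, not the presenters' implementation: it uses the standard Metropolis acceptance rule, a toy quadratic energy over a tuple of angles, and a perturbation that changes one randomly chosen angle. All function names are hypothetical.

```python
import math
import random


def metropolis_accept(delta_e, kT, rng):
    """Accept downhill moves always, uphill moves with prob. exp(-dE/kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / kT)


def monte_carlo(energy, perturb, start, steps, kT=1.0, seed=0):
    """Random walk through conformation space; returns the trajectory."""
    rng = random.Random(seed)
    current, e_current = start, energy(start)
    trajectory = [current]
    for _ in range(steps):
        candidate = perturb(current, rng)
        # Recomputing the energy is the costly part for a real protein.
        e_candidate = energy(candidate)
        if metropolis_accept(e_candidate - e_current, kT, rng):
            current, e_current = candidate, e_candidate
        trajectory.append(current)
    return trajectory


def toy_energy(angles):
    """Toy energy: sum of squared angles (minimum at all zeros)."""
    return sum(a * a for a in angles)


def toy_perturb(angles, rng):
    """Change a single randomly chosen angle (a 1-DOF move)."""
    k = rng.randrange(len(angles))
    changed = list(angles)
    changed[k] += rng.uniform(-0.5, 0.5)
    return tuple(changed)


toy_trajectory = monte_carlo(toy_energy, toy_perturb, (2.0, -2.0, 1.0), 500)
```

The energy is re-evaluated at every attempted step, accepted or not, which is why efficient energy maintenance dominates the running time.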

36
Energy Function
  • $E = \sum \text{bonded terms} + \sum \text{non-bonded terms}$
  • Bonded terms, e.g. bond length: easy to compute
  • Non-bonded terms, e.g. Van der Waals, depend on
    distances between pairs of atoms: expensive to
    compute, $O(n^2)$

37
Energy Function
  • Non-bonded terms → Use cutoff distance (6-12 Å)
    → Only O(n) interacting pairs [Halperin &
    Overmars, 98]

Problem: How to find interacting pairs without
enumerating all atom pairs?
38
Grid Method
  • Subdivide space into cubic cells
  • Compute cell that contains each atom center
  • Store results in hash table
  • $\Theta(n)$ time to update grid
  • O(1) time to find interactions for each atom
  • $\Theta(n)$ to find all interactions

Asymptotically optimal in the worst case!
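A minimal Python sketch of the grid method. It assumes the cell size equals the cutoff distance, so each atom only needs to be compared with atoms in its own cell and the 26 surrounding cells. The constant time per atom relies on each cell holding a bounded number of atoms, which is reasonable when atoms cannot overlap. The names `build_grid` and `interacting_pairs` are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(atoms, cell):
    """Hash each atom index by the cubic cell containing its center."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(atoms):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        grid[key].append(idx)
    return grid


def interacting_pairs(atoms, cutoff):
    """Return all index pairs (i, j), i < j, within the cutoff distance."""
    grid = build_grid(atoms, cutoff)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get avoids creating empty cells while iterating the grid.
            others = grid.get((cx + dx, cy + dy, cz + dz))
            if not others:
                continue
            for i in members:
                for j in others:
                    if i < j and math.dist(atoms[i], atoms[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs


toy_atoms = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0),
             (5.5, 0.0, 0.0), (-0.5, 0.0, 0.0)]
toy_pairs = interacting_pairs(toy_atoms, cutoff=1.2)
```

The grid is rebuilt from scratch at each step here, which matches the linear update cost listed above and motivates the question on the next slide.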
39
Can We Do Better on Average?
  • Proteins are long kinematic chains

40
Proteins Kinematic Structure
torsional dof
  • Angles φ, ψ for backbone and χ for
    side-chains
  • Conformational space

41
Can We Do Better on Average?
  • Proteins are long kinematic chains
  • Few DOFs are perturbed at each MC step
  • Long sub-chains stay rigid at each step
  • Many partial energy sums remain constant

How to retrieve unchanged partial sums?

42
Two New Data Structures
  • ChainTree → Fast detection of interacting atom
    pairs
  • EnergyTree → Reuse of unchanged partial energy
    sums

43
ChainTree
  • Combination of two hierarchies
  • Transform hierarchy
  • Bounding volume hierarchy

44
ChainTree
  • Combination of two hierarchies
  • Transform hierarchy: approximates kinematics of
    protein backbone at successive resolutions

45
ChainTree
  • Combination of two hierarchies
  • Bounding volume hierarchy: approximates geometry
    of protein at successive resolutions

(Larsen et al., 00)
46
ChainTree
47
Updating the ChainTree
  • Update path to root
  • Recompute transforms that shortcut change
  • Recompute bounding volumes that contain change
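The "update path to root" step can be illustrated with a heavily simplified transform hierarchy in Python. This sketch is not the ChainTree itself: it assumes a planar chain, stores only composed rigid transforms (no bounding volumes), and represents each transform z -> rot * z + trans with complex numbers. The names `TransformTree`, `link`, and `compose` are hypothetical.

```python
import cmath

IDENTITY = (1 + 0j, 0j)  # (rotation factor, translation)


def compose(a, b):
    """Return the transform z -> a(b(z))."""
    return (a[0] * b[0], a[0] * b[1] + a[1])


def link(angle, length):
    """Transform of one link: shift by `length` along x, then rotate."""
    rot = cmath.exp(1j * angle)
    return (rot, rot * length)


class TransformTree:
    """Balanced binary tree whose internal nodes cache composed transforms."""

    def __init__(self, links):
        self.n = 1
        while self.n < len(links):
            self.n *= 2
        self.node = [IDENTITY] * (2 * self.n)
        for k, transform in enumerate(links):
            self.node[self.n + k] = transform
        for v in range(self.n - 1, 0, -1):
            self.node[v] = compose(self.node[2 * v], self.node[2 * v + 1])

    def set_link(self, k, transform):
        """Change one link; recompute only its ancestors. Returns their count."""
        v = self.n + k
        self.node[v] = transform
        updated = 0
        v //= 2
        while v >= 1:
            self.node[v] = compose(self.node[2 * v], self.node[2 * v + 1])
            updated += 1
            v //= 2
        return updated

    def end_position(self):
        """Position of the chain's end, i.e. the root transform applied to 0."""
        return self.node[1][1]


toy_tree = TransformTree([link(0.0, 1.0)] * 8)
toy_updates = toy_tree.set_link(3, link(cmath.pi / 2, 1.0))
```

Changing one of the 8 links touches only 3 cached transforms, one per tree level, while every cached transform off that path is reused unchanged.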

48
Finding Interacting Pairs
vs.
  • Do not search inside rigid sub-chains (unmarked
    nodes)
  • Do not test two nodes with no marked node between
    them

50
Computational Complexity
  • n: total number of DOFs in protein backbone
  • k: number of simultaneous DOF changes at each
    step of MCS
  • Updating complexity
  • Worst-case complexity of finding all interacting
    pairs, but performs much better in practice!

51
EnergyTree
52
EnergyTree
53
Experimental Setup
  • Energy function:
  • Van der Waals
  • Electrostatic
  • Attraction between native contacts
  • Cutoff at 12 Å
  • 300,000 steps MCS
  • Early rejection for large vdW terms

54
Results 1-DOF change
55
Results 5-DOF change
56
Two-Pass ChainTree
57
Conclusion
  • Chain/EnergyTree reduces average time per step in
    MCS of proteins (vs. grid)
  • Exploit chain kinematics of protein
  • Larger speed-up for bigger proteins and for
    smaller number of simultaneous DOF changes

58
What is Computational Biology?
  • Using computers in Biology?
  • Designing efficient algorithms for analyzing
    biological data and simulating biological
    processes?
  • Using Biology to design new algorithms and
    computing hardware?
  • Cultural clash: Biology → classification,
    Computer Science → abstraction
  • In any case, Computational Biology will be a
    critical domain for the next 20 years, probably
    the next big thing after the Internet