Algorithms Exploiting the Chain Structure of Proteins

Slides: 78

Provided by: ItayL7

Learn more at: http://www-cs-students.stanford.edu

Transcript and Presenter's Notes

Title: Algorithms Exploiting the Chain Structure of Proteins

1
Algorithms Exploiting the Chain Structure of
Proteins

Itay Lotan
Computer Science

2
Proteins 101

Involved in all functions of our body
metabolism, motion, defense, etc.

© Michael Levitt
3
Protein representation

Torsion angle model
Cα model

4
Structure determination
X-ray crystallography
© Bernhard Rupp
5
Outline

Fast energy computation during Monte Carlo
simulation
Model completion for protein X-ray
crystallography
Large scale computation of similarity

Exploit specific properties of proteins to
perform the computation efficiently
6
Outline

Fast energy computation during Monte Carlo
simulation
Model completion for protein X-ray
crystallography
Large scale computation of similarity

Lotan, Schwarzer, Halperin and Latombe. J.
Comput. Bio. 2004 (to appear)
CS Department, Tel-Aviv University
7
Monte Carlo simulation (MCS)
Popular method for sampling the conformation
space of proteins

Estimate thermodynamic quantities
Search for low-energy conformations and the
folded structure

8
MCS: How it works

Propose random change in conformation

Compute energy E of new conformation
Accept with probability

Requires >> 10^6 steps to sample adequately
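
The propose / evaluate / accept loop above can be sketched as follows. The acceptance formula did not survive in this transcript, so the sketch assumes the standard Metropolis criterion, min(1, exp(-ΔE / kT)); the names `metropolis_accept`, `run_mcs`, `energy` and `propose` are hypothetical and are not taken from the author's MCS software.

```python
import math
import random

def metropolis_accept(e_old, e_new, kT, rng=random):
    """Assumed Metropolis rule: always accept downhill moves,
    accept uphill moves with probability exp(-(e_new - e_old) / kT)."""
    if e_new <= e_old:
        return True
    return rng.random() < math.exp(-(e_new - e_old) / kT)

def run_mcs(energy, propose, x0, kT, n_steps, seed=0):
    """Generic Monte Carlo loop: propose a change, compute E, accept or reject."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    for _ in range(n_steps):
        x_new = propose(x, rng)          # random change in conformation
        e_new = energy(x_new)            # energy of the new conformation
        if metropolis_accept(e, e_new, kT, rng):
            x, e = x_new, e_new
    return x, e

# Toy usage: a 1-D "conformation" with a harmonic energy.
final_x, final_e = run_mcs(
    energy=lambda x: x * x,
    propose=lambda x, rng: x + rng.uniform(-0.5, 0.5),
    x0=5.0, kT=0.01, n_steps=2000,
)
```

In a real simulation the cost of `energy` dominates each step, which is why the rest of this part of the talk focuses on computing it quickly.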
9
Energy function

Bonded terms
Bond lengths
Bond angles
Dihedral angles
Non-bonded terms
Van der Waals
Electrostatic
Heuristic: Go models, HP models, etc.

10
Pair-wise interactions

Cutoff distance (6 - 12Å)
Linear number of interactions contribute to
energy (Halperin & Overmars 98)

Challenge: Find all interacting pairs without
enumerating all pairs
11
Related work

Biology
Neighbor lists
Verlet 67
Brooks et al. 83
Grid
Quentrec & Brot 73
Hockney et al. 74
Van Gunsteren et al. 84
Neighbor lists + grid
Yip & Elber 89
Petrella 02

Computer Science
Bounding volume hierarchies for collision
detection
Gottschalk et al. 96
Larsen et al. 00
Guibas et al. 02
Space partition methods for collision detection
Faverjon 84
Halperin & Overmars 98
Collision detection for chains
Halperin et al. 97
Guibas et al. 02

12
Grid method
d: Cutoff distance

Linear complexity
Optimal in worst case
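
A minimal sketch of the grid method, assuming atoms are given as 3-D coordinate tuples and the grid cell size equals the cutoff distance d. The function name `interacting_pairs` is hypothetical. Each atom is hashed into a cell, and only the 27 neighboring cells are scanned, so the work is linear when each cell holds a bounded number of atoms.

```python
from collections import defaultdict
from itertools import product
import math

def interacting_pairs(points, d):
    """Return all index pairs (i, j), i < j, whose distance is at most d."""
    grid = defaultdict(list)
    for i, p in enumerate(points):
        # Cell side equals the cutoff, so neighbors lie in adjacent cells.
        grid[tuple(math.floor(c / d) for c in p)].append(i)

    pairs = set()
    for cell, members in grid.items():
        for offset in product((-1, 0, 1), repeat=3):
            other = tuple(c + o for c, o in zip(cell, offset))
            # grid.get avoids creating empty cells while iterating.
            for i in members:
                for j in grid.get(other, ()):
                    if i < j and math.dist(points[i], points[j]) <= d:
                        pairs.add((i, j))
    return sorted(pairs)
```

The grid has to be rebuilt (or at least re-bucketed) after every move, regardless of how few degrees of freedom changed.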

13
Contributions

Efficient maintenance and self-collision
detection for kinematic chains
Efficient computation of pair-wise interactions
in MCS of proteins
Scheme for caching and reusing partial energy
sums during MCS
MCS software

Much faster than existing algorithm (grid method)
Download at http://robotics.stanford.edu/itayl/mcs
14
Properties of kinematic chains

Small changes → large effects

16
Properties of kinematic chains

Small changes → large effects
Local changes → global effects

17
Properties of kinematic chains

Small changes → large effects
Local changes → global effects
Few DoF changes → long rigid sub-chains

19
ChainTree: A tale of two hierarchies

Transform hierarchy: approximates kinematics of
protein backbone at successive resolutions
Bounding volume hierarchy: approximates geometry
of protein at successive resolutions

20
Hierarchy of transforms
22
Hierarchy of bounding volumes
23
The ChainTree
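Each node holds a transform T and a bounding volume B (figure labels, by tree level):
Root: (T_AI, B_AH)
Level 2: (T_AE, B_AD), (T_EI, B_EH)
Level 1: (T_AC, B_AB), (T_CE, B_CD), (T_EG, B_EF), (T_GI, B_GH)
Leaves: (T_AB, B_A), (T_BC, B_B), (T_CD, B_C), (T_DE, B_D), (T_EF, B_E), (T_FG, B_F), (T_GH, B_G), (T_HI, B_H)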
24
Updating the ChainTree
Figure: the same ChainTree as on slide 23 (root T_AI, B_AH; leaves T_AB, B_A through T_HI, B_H)
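
A minimal sketch of the transform half of such a tree, under simplifying assumptions: the chain is planar, every link has the same length, and transforms are stored as (cos, sin, tx, ty) tuples. The class `TransformTree` and its methods are hypothetical names, not the author's implementation, and the bounding volumes are omitted. Each internal node stores the composition of its children, so changing one joint only recomputes the nodes on the path from that leaf to the root.

```python
import math

def rot_trans(theta, length):
    """Rigid 2-D transform: rotate by theta, then advance `length` along the new x-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return (c, s, length * c, length * s)        # (cos, sin, tx, ty)

def compose(a, b):
    """Transform a followed by b, where b is expressed in a's frame."""
    ca, sa, xa, ya = a
    cb, sb, xb, yb = b
    return (ca * cb - sa * sb, sa * cb + ca * sb,
            xa + ca * xb - sa * yb, ya + sa * xb + ca * yb)

class TransformTree:
    """Balanced binary tree of transforms over the links of a chain."""

    def __init__(self, angles, length=1.0):
        self.length = length
        self.size = 1
        while self.size < len(angles):
            self.size *= 2
        identity = (1.0, 0.0, 0.0, 0.0)
        self.node = [identity] * (2 * self.size)  # heap layout, root at index 1
        for i, theta in enumerate(angles):
            self.node[self.size + i] = rot_trans(theta, length)
        for k in range(self.size - 1, 0, -1):
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])

    def set_angle(self, i, theta):
        """Change one joint; return the number of nodes that were recomputed."""
        k = self.size + i
        self.node[k] = rot_trans(theta, self.length)
        touched = 1
        k //= 2
        while k >= 1:
            self.node[k] = compose(self.node[2 * k], self.node[2 * k + 1])
            touched += 1
            k //= 2
        return touched

    def end_transform(self):
        """Transform from the first frame to the last (the root, T_AI on the slide)."""
        return self.node[1]
```

For an 8-link chain, a single joint change touches 4 nodes (one leaf plus three ancestors) instead of all 15.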
25
Computing the energy
Recursively search ChainTree for interactions

Pruning rules
Prune search when distance between bounding
volumes is more than cutoff distance
Do not search inside rigid sub-chains
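
The two pruning rules can be illustrated with a simplified stand-in for the ChainTree: a bounding-sphere hierarchy over atoms ordered along the chain, plus a prefix count of changed joints. All names (`Node`, `changed_pairs`) and the convention that joint j connects atoms j and j+1 are assumptions made for this sketch. It reports only pairs that are within the cutoff and whose relative position could have changed.

```python
import math
from itertools import accumulate

class Node:
    """Bounding sphere over atoms lo..hi (inclusive), split at the midpoint."""
    def __init__(self, pts, lo, hi):
        self.lo, self.hi = lo, hi
        chunk = pts[lo:hi + 1]
        self.center = tuple(sum(p[k] for p in chunk) / len(chunk) for k in range(3))
        self.radius = max(math.dist(self.center, p) for p in chunk)
        self.left = self.right = None
        if lo < hi:
            mid = (lo + hi) // 2
            self.left, self.right = Node(pts, lo, mid), Node(pts, mid + 1, hi)

def changed_pairs(pts, changed_joints, cutoff):
    """Pairs (p, q), p < q, within cutoff, separated by at least one changed joint."""
    flags = [1 if j in changed_joints else 0 for j in range(len(pts) - 1)]
    prefix = [0] + list(accumulate(flags))

    def moved(lo, hi):
        # True if some changed joint lies between atoms lo and hi.
        return prefix[hi] - prefix[lo] > 0

    out = []

    def search(a, b):
        if not moved(a.lo, b.hi):
            return                                # rule 2: rigid sub-chain
        if a is b:
            if a.left is not None:
                search(a.left, a.left)
                search(a.left, a.right)
                search(a.right, a.right)
            return
        gap = math.dist(a.center, b.center) - a.radius - b.radius
        if gap > cutoff:
            return                                # rule 1: volumes too far apart
        if a.left is None and b.left is None:
            out.append((a.lo, b.lo))
        elif b.left is None or (a.left is not None and a.radius >= b.radius):
            search(a.left, b)
            search(a.right, b)
        else:
            search(a, b.left)
            search(a, b.right)

    root = Node(pts, 0, len(pts) - 1)
    search(root, root)
    return sorted(out)
```

Unlike this sketch, which rebuilds the hierarchy from scratch, the approach described in the talk updates the existing tree after each move.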

26
Computing the energy

Figure (built up over slides 26-32): tree of node and node-pair labels. Root P; below it N, O and N-O; then J, K, L, M and their pairs (J-K, K-L, J-L, J-M, K-M, L-M); then A through H and their pairs (A-B, A-D, ..., H-G). The last frame adds the label E(O).
33
Computing the energy

Only changed interactions are found
Reuse unaffected partial sums
Better performance for
Longer proteins
Fewer simultaneous changes

34
Computational complexity

Updating
Searching

worst case bound
Much faster in practice
35
Test
Figure: plots for 1-DoF and 5-DoF changes, each on proteins of
68, 144, 374 and 755 residues
36
Simulation of α-Synuclein

140 res. protein implicated in Parkinson's
disease
Multi-canonical Replica-exchange MC regime
Over 1000 CPU days of simulation
Study conformations at room temp.
Joint work with Vijay Pande

37
Outline

Fast energy computation during Monte Carlo
simulation
Model completion for protein X-ray
crystallography
Large scale computation of similarity

Lotan, van den Bedem, Deacon and Latombe, WAFR 2004
van den Bedem, Lotan, Latombe and Deacon,
submitted to Acta Cryst. D
Joint Center for Structural Genomics (JCSG) at
SSRL
38
Protein Structure Initiative
152K sequenced genes (30K/year)
25K determined structures (3.6K/year)

Reduce cost and time to determine protein
structure

Develop software to automatically interpret the
electron density map (EDM)

39
EDM

3-D image of atomic structure
High value (electron density) at atom centers
Density falls off exponentially away from center

40
Automated model building

90% built at high resolution (2Å)
66% built at medium to low resolution (2.5 -
2.8Å)
Gaps left at noisy areas in EDM (blurred density)

Gaps need to be resolved manually
41
The Fragment completion problem

Input
EDM
Partially resolved structure
2 Anchor residues
Length of missing fragment
Output
A small number of candidate structures for
missing fragment

A robotics inverse kinematics (IK) problem
42
Related work

Biology/Crystallography
Exact IK solvers
Wedemeyer & Scheraga 99
Coutsias et al. 04
Optimization IK solvers
Fine et al. 86
Canutescu & Dunbrack Jr. 03
Ab-initio loop closure
Fiser et al. 00
Kolodny et al. 03
Database search loop closure
Jones & Thirup 86
van Vlijmen & Karplus 97
Semi-automatic tools
Jones & Kjeldgaard 97
Oldfield 01

Computer Science
Exact IK solvers
Manocha & Canny 94
Manocha et al. 95
Optimization IK solvers
Wang & Chen 91
Redundant manipulators
Khatib 87
Burdick 89
Motion planning for closed loops
Han & Amato 00
Yakey et al. 01
Cortes et al. 02, 04

43
Contributions

Sampling of gap-closing fragments biased by the
EDM
Refinement of fit to density without breaking
closure
Fully automatic fragment completion software for
X-ray Crystallography

Novel application of a combination of inverse
kinematics techniques
44
Two-stage IK method

Candidate generation: Optimize density fit while
closing the gap
Refinement: Optimize closed fragments without
breaking closure

45
Stage 1: candidate generation

Generate random conformation
Close using Cyclic Coordinate Descent (CCD) (Wang &
Chen 91, Canutescu & Dunbrack Jr. 03)

49
Stage 1: candidate generation

Generate random conformation
Close using Cyclic Coordinate Descent (CCD) (Wang &
Chen 91, Canutescu & Dunbrack Jr. 03)

CCD moves biased toward high-density
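
To illustrate how CCD closes a gap, here is a minimal planar sketch. It is an assumption-laden simplification: the chain lives in 2-D, only the end position (not orientation) is matched, and the density bias mentioned above is not modeled. The names `forward` and `ccd_close` are hypothetical. Each step rotates one joint so that the chain end lies on the ray from that joint to the target, which never increases the end-to-target distance.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain rooted at the origin (relative angles)."""
    pts, x, y, heading = [(0.0, 0.0)], 0.0, 0.0, 0.0
    for a, l in zip(angles, lengths):
        heading += a
        x, y = x + l * math.cos(heading), y + l * math.sin(heading)
        pts.append((x, y))
    return pts

def ccd_close(angles, lengths, target, tol=1e-6, max_sweeps=200):
    """Cyclic Coordinate Descent: adjust one joint at a time, last joint first."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            jx, jy = pts[i]                      # pivot of joint i
            ex, ey = pts[-1]                     # current chain end
            to_end = math.atan2(ey - jy, ex - jx)
            to_target = math.atan2(target[1] - jy, target[0] - jx)
            angles[i] += to_target - to_end      # align end with target direction
        if math.dist(forward(angles, lengths)[-1], target) < tol:
            break
    return angles

# Toy usage: a 4-link chain closing onto a reachable target.
lengths = [1.0] * 4
closed = ccd_close([0.3] * 4, lengths, target=(2.0, 1.0))
```

Starting from different random conformations yields different closed candidates, which is what the candidate-generation stage relies on.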
50
Stage 2: refinement

Target function T (goodness of fit to EDM)
Minimize T while retaining closure
Closed conformations lie on Self-motion manifold
of lower dimension

1-D manifold
51
Stage 2: null-space minimization

Jacobian: linear relation between joint
velocities and end-effector linear and angular
velocity.

Compute minimizing move using
N: orthonormal basis of null space
52
Stage 2: minimization with closure

Choose sub-fragment with n > 6 DOFs
Compute N using SVD
Project the gradient of T onto the null space
Move until minimum is reached or closure is broken

Escape from local minima using Monte Carlo with
simulated annealing
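
The projection step can be sketched without a linear algebra library. Instead of the SVD named on the slide, this sketch swaps in Gram-Schmidt orthonormalization of the Jacobian rows and subtracts the row-space component of the gradient; the result equals N Nᵀ g for any orthonormal null-space basis N. The function name `project_to_null_space` and the toy Jacobian are assumptions for illustration only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_to_null_space(J, g, eps=1e-12):
    """Return the component of g in the null space of J (so J times it is ~0)."""
    basis = []                                   # orthonormal basis of the row space
    for row in J:
        r = list(row)
        for b in basis:
            c = dot(r, b)
            r = [x - c * y for x, y in zip(r, b)]
        norm = dot(r, r) ** 0.5
        if norm > eps:                           # skip linearly dependent rows
            basis.append([x / norm for x in r])
    out = list(g)
    for b in basis:
        c = dot(out, b)
        out = [x - c * y for x, y in zip(out, b)]
    return out

# Toy usage: 2 closure constraints on 4 DOFs leave a 2-D null space.
J = [[1.0, 2.0, 3.0, 4.0],
     [0.0, 1.0, 0.0, 1.0]]
grad_T = [1.0, 1.0, 1.0, 1.0]
step = [-0.1 * x for x in project_to_null_space(J, grad_T)]
```

A step along `step` reduces the target function to first order while keeping the end-effector fixed to first order, which is why the move must stop once closure is measurably broken.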
53
Test: artificial gaps

Completed structure (gold standard)
Good density (1.6Å res.)
Remove fragment and rebuild

Length   High (2.0Å)     Medium (2.5Å)   Low (2.8Å)
4        100% (0.14Å)    100% (0.19Å)    100% (0.32Å)
8        100% (0.18Å)    100% (0.23Å)    100% (0.36Å)
12       91% (0.51Å)     96% (0.41Å)     91% (0.52Å)
15       91% (0.53Å)     88% (0.63Å)     83% (0.76Å)
Produced by H. van den Bedem
54
Test: true gaps

Completed structure (gold standard)
O.K. density (2.4Å res.)
6 gaps left by model builder (RESOLVE)

Length   Top scorer   Lowest error
4        0.44Å        0.40Å
4        0.22Å        0.22Å
5        0.78Å        0.78Å
5        0.36Å        0.36Å
7        0.72Å        0.66Å
10       0.43Å        0.43Å
Produced by H. van den Bedem
55
Example: TM0423
PDB 1KQ3, 376 res., 2.0Å resolution, 12-residue
gap. Best: 0.3Å aaRMSD
56
Example: TM0813
PDB 1J5X, 342 res., 2.8Å resolution, 12-residue
gap. Best: 0.6Å aaRMSD
GLU-83
GLY-96
59
Outline

Fast energy computation during Monte Carlo
simulation
Model completion for protein X-ray
crystallography
Large scale computation of similarity

Lotan and Schwarzer, J. Comput. Biol. 11(2-3):
299-317, 2004
60
Large scale similarity

Analysis of simulation trajectories
Molecular dynamics simulation
Monte Carlo simulation
Clustering of decoy sets (e.g., Shortle et al.
98)
Stochastic Roadmap Simulation (Apaydin et
al. 03)

Fast similarity measures are needed for analyzing
large sets of conformations
61
Contributions

Uniform simplification of protein structure for
similarity computation
Speed-up existing similarity measures
Method offers trade-off between speed and
precision
Efficient computation of nearest neighbors

62
m-Averaged approximation

Cut chain into pieces of length m
Replace each sequence of m Cα atoms by its
centroid

3n coordinates → 3n/m coordinates
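
A minimal sketch of m-averaging, assuming a conformation is a list of Cα coordinate tuples. The function name `m_average` is hypothetical, and letting the last piece be shorter when n is not a multiple of m is an assumption of this sketch.

```python
def m_average(coords, m):
    """Replace each run of m consecutive points by its centroid."""
    out = []
    for start in range(0, len(coords), m):
        piece = coords[start:start + m]          # last piece may be shorter
        out.append(tuple(sum(p[k] for p in piece) / len(piece) for k in range(3)))
    return out

# Toy usage: 6 points averaged with m = 3 give 2 centroids.
chain = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (4.0, 0.0, 0.0),
         (1.0, 1.0, 1.0), (1.0, 3.0, 1.0), (1.0, 5.0, 1.0)]
reduced = m_average(chain, 3)
```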
63
Chains and distances

Proximity along the chain entails spatial
proximity

Far away links along the chain are spatially
distant (on average)

ci
cj
64
Similarity measures
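
The formulas on this slide did not survive in the transcript. The later slides refer to cRMS (coordinate RMS deviation after optimal superposition) and dRMS (RMS difference between intra-molecular distance matrices). As a hedged illustration, the sketch below computes dRMS with one common normalization, the mean over all n(n-1)/2 point pairs; the normalization used in the talk may differ, and the function name `drms` is hypothetical.

```python
import math
from itertools import combinations

def drms(P, Q):
    """Distance RMS between two conformations with the same number of points."""
    if len(P) != len(Q):
        raise ValueError("conformations must have the same length")
    pairs = list(combinations(range(len(P)), 2))
    total = sum((math.dist(P[i], P[j]) - math.dist(Q[i], Q[j])) ** 2
                for i, j in pairs)
    return math.sqrt(total / len(pairs))
```

Because dRMS compares internal distances only, it needs no superposition, but its cost grows quadratically with the number of points, which is where reducing n to n/m pays off most.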
65
Evaluation: test sets
8 structurally diverse proteins (54-76 residues)

Decoy sets: conformations from the Park-Levitt
set (Park et al. 97), N = 10,000
Random sets: conformations generated by the
program FOLDTRAJ (Feldman & Hogue 00), N = 5,000

66
Evaluation results: decoy sets
m     cRMS         dRMS
3     0.99         0.96-0.98
4     0.98-0.99    0.94-0.97
6     0.92-0.99    0.78-0.93
9     0.81-0.98    0.65-0.96
12    0.54-0.92    0.52-0.69

9x speed-up for cRMS (m = 9)
36x speed-up for dRMS (m = 6)

Higher correlation for random sets!
67
k Nearest-neighbors problem
Given a set S of conformations of a protein and a
query conformation c, find the k conformations in
S most similar to c

Brute force complexity: O(NL) per query,
O(N^2 L) for all conformations in S

N: size of S; L: time to compute
similarity
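
A brute-force baseline for the k nearest-neighbors problem, as a minimal sketch: one similarity computation per member of S for each query. The function name `k_nearest` is hypothetical, and Euclidean distance on flat coordinate tuples stands in for whichever similarity measure is used.

```python
import heapq
import math

def k_nearest(S, c, k, dist=math.dist):
    """Indices of the k members of S closest to query c, nearest first."""
    return heapq.nsmallest(k, range(len(S)), key=lambda i: dist(S[i], c))

# Toy usage with 3-D points standing in for conformations.
S = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
nearest = k_nearest(S, (0.0, 0.0, 0.0), k=2)
```

Passing a different `dist` callable, for example a dRMS function applied to m-averaged conformations, changes the measure without changing the search.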
68
Efficient nearest neighbor search

kd-tree time per query
Limitations
Requires Minkowski metric
Less efficient when d > 20

cRMS is not a Minkowski metric; dRMS has
dimensionality of n(n-1)/2
Reduce dRMS dimensionality using SVD
69
Reduction using SVD

Stack m-averaged distance matrices as vectors
Compute the SVD of entire set
Project onto principal components

dRMS is reduced to ≤ 20 dimensions
Complexity of SVD
70
Testing the method

Use decoy sets (N = 10,000) and random sets (N =
5,000)
m-averaging with m = 4
Project onto 16 PCs for decoys, 12 PCs for random
sets
Find k = 10, 25, 100 NNs for 250 conformations in
each set

71
Results

Decoy sets
77% correct
Furthest NN off by 10-15% (0.7Å - 1.5Å)
4k approximate NNs contain all true k NNs
Random sets: slightly better results

Use reduction as fast filter
72
Running Time

N = 100,000, m = 4, 16 PCs
Find k = 100 for each conformation

Brute-force: 84 hours
Brute-force + m-averaging: 4.8 hours
Brute-force + m-averaging + SVD: 41 minutes
kd-tree + m-averaging + SVD: 19 minutes
kd-tree has more impact for larger sets
73
Contributions

Energy computation in MCS
Efficient maintenance and self-collision
detection for kinematic chains
Efficient computation of pair-wise interactions
in MCS of proteins
Caching scheme for partial energy sums during MCS
MCS software
Model completion in X-ray crystallography
sampling of gap-closing fragments biased towards
the EDM
Refinement of fit to density without breaking
closure
Fully automatic fragment completion software
Similarity computation for large conformation
sets
Uniform simplification of protein structure for
similarity computation
Speed-up existing similarity measures
Method offers trade-off between speed and
precision
Efficient computation of nearest neighbors

74
Take-home message

Taking into account physical properties of
proteins can lead to efficient algorithms for a
wide variety of applications in structural biology

75
Outlook
computer scientist
biophysicist/biochemist
Models that simplify the physics and chemistry of
proteins
Algorithms that exploit properties of protein
models
Develop simplified protein models that lend
themselves to efficient computations
76
Acknowledgements

Jean-Claude Latombe
Vijay Pande
Michael Levitt
Leo Guibas
Axel Brunger, Balaji Prabhakar, Serafim Batzoglou
Fabian Schwarzer, Henry van den Bedem, Dan
Halperin
Carlo Tomasi
Daniel Russakoff, Rachel Kolodny
Latombe group
Serkan Apaydin, Tim Bretl, Joel Brown, Phil Fong,
Mitul Saha, Pekka Isto, Kris Hauser
Pande group
Bojan Zagrovic, Stefan Larson, Lillian Chong,
Young Min Rhee, Sidney Elmer, Chris Snow, Guha
Jayachandran, Eric Sorin, Sung-Joo Lee, Jim
Cladwell, Michael Shirts, Nina Singhal, Relly
Brandman, Vishal Vaidyanathan, Nick Kelley, Mark
Engelhardt
Levitt Group
Patrice Koehl, Tanya Raschke, Erik Lindahl