Algorithms Exploiting the Chain Structure of Proteins

1
Algorithms Exploiting the Chain Structure of
Proteins
  • Itay Lotan
  • Computer Science

2
Proteins 101
  • Involved in all functions of our body
    metabolism, motion, defense, etc.

© Michael Levitt
3
Protein representation
  • Torsion angle model
  • Cα model

4
Structure determination
X-ray crystallography
© Bernhard Rupp
5
Outline
  1. Fast energy computation during Monte Carlo
    simulation
  2. Model completion for protein X-ray
    crystallography
  3. Large scale computation of similarity

Exploit specific properties of proteins to
perform the computation efficiently
6
Outline
  1. Fast energy computation during Monte Carlo
    simulation
  2. Model completion for protein X-ray
    crystallography
  3. Large scale computation of similarity

Lotan, Schwarzer, Halperin and Latombe. J.
Comput. Biol. 2004 (to appear)
CS Department, Tel-Aviv University
7
Monte Carlo simulation (MCS)
Popular method for sampling the conformation
space of proteins
  • Estimate thermodynamic quantities
  • Search for low-energy conformations and the
    folded structure

8
MCS: How it works
  1. Propose random change in conformation
  2. Compute energy E of new conformation
  3. Accept with probability $P = \min\{1,\ e^{-\Delta E / k_B T}\}$ (Metropolis criterion)

Requires >> 10^6 steps to sample adequately
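
The loop on this slide can be sketched as follows. This is a minimal illustration, assuming the standard Metropolis acceptance rule min(1, exp(-ΔE/kT)); the names `metropolis_accept`, `monte_carlo`, `toy_energy` and `toy_propose` are hypothetical, and the one-dimensional "conformation" stands in for a real protein model.

```python
import math
import random

def metropolis_accept(delta_e, kT, rng=random):
    """Decide whether to accept a move that changes the energy by delta_e."""
    if delta_e <= 0.0:
        return True  # downhill (or neutral) moves are always accepted
    return rng.random() < math.exp(-delta_e / kT)

def monte_carlo(energy, propose, x0, kT, n_steps, seed=0):
    """Run n_steps of Metropolis Monte Carlo; return the best state seen."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for _ in range(n_steps):
        x_new = propose(x, rng)      # step 1: propose a random change
        e_new = energy(x_new)        # step 2: compute the new energy
        if metropolis_accept(e_new - e, kT, rng):  # step 3: accept or reject
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Toy stand-ins (assumptions): a quadratic energy and a uniform random step.
def toy_energy(x):
    return (x - 2.0) ** 2

def toy_propose(x, rng):
    return x + rng.uniform(-0.5, 0.5)

best_x, best_e = monte_carlo(toy_energy, toy_propose, 0.0, kT=0.1, n_steps=5000)
```

In a protein simulation the energy call dominates each step, which is why the rest of this part of the talk focuses on computing it quickly.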
9
Energy function
  • Bonded terms
    • Bond lengths
    • Bond angles
    • Dihedral angles
  • Non-bonded terms
    • Van der Waals
    • Electrostatic
    • Heuristic: Go models, HP models, etc.

10
Pair-wise interactions
  • Cutoff distance (6-12 Å)
  • Linear number of interactions contribute to
    energy (Halperin & Overmars 98)

Challenge: Find all interacting pairs without
enumerating all pairs
11
Related work
  • Biology
    • Neighbor lists
      • Verlet 67
      • Brooks et al. 83
    • Grid
      • Quentrec & Brot 73
      • Hockney et al. 74
      • Van Gunsteren et al. 84
    • Neighbor lists + grid
      • Yip & Elber 89
      • Petrella 02
  • Computer Science
    • Bounding volume hierarchies for collision
      detection
      • Gottschalk et al. 96
      • Larsen et al. 00
      • Guibas et al. 02
    • Space partition methods for collision detection
      • Faverjon 84
      • Halperin & Overmars 98
    • Collision detection for chains
      • Halperin et al. 97
      • Guibas et al. 02

12
Grid method
d: cutoff distance
  • Linear complexity
  • Optimal in worst case
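
A rough sketch of a grid method of this kind, assuming cubic cells whose side equals the cutoff distance d, so that any interacting pair lies in the same cell or in adjacent cells. The function name `pairs_within_cutoff` is hypothetical; this is not the code benchmarked in the talk.

```python
import math
from collections import defaultdict
from itertools import product

def pairs_within_cutoff(points, d):
    """Return all index pairs (i, j), i < j, whose distance is at most d."""
    # Hash every point into a cubic cell of side d.
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        cell = tuple(math.floor(c / d) for c in p)
        grid[cell].append(idx)

    # Compare each cell only with itself and its 26 neighbours.
    offsets = list(product((-1, 0, 1), repeat=3))
    pairs = []
    for cell, members in grid.items():
        for off in offsets:
            other = (cell[0] + off[0], cell[1] + off[1], cell[2] + off[2])
            for i in members:
                for j in grid.get(other, ()):
                    if i < j and math.dist(points[i], points[j]) <= d:
                        pairs.append((i, j))
    return sorted(pairs)

atoms = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (5.5, 5, 5), (-0.5, 0, 0)]
close_pairs = pairs_within_cutoff(atoms, 1.2)
```

Every point is rehashed at every call, so the cost is linear in the number of atoms even when only a few of them moved. That is the behavior the ChainTree is meant to improve on.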

13
Contributions
  • Efficient maintenance and self-collision
    detection for kinematic chains
  • Efficient computation of pair-wise interactions
    in MCS of proteins
  • Scheme for caching and reusing partial energy
    sums during MCS
  • MCS software

Much faster than existing algorithm (grid method)
Download at http://robotics.stanford.edu/itayl/mcs
14
Properties of kinematic chains
  • Small changes → large effects

16
Properties of kinematic chains
  • Small changes → large effects
  • Local changes → global effects

17
Properties of kinematic chains
  • Small changes → large effects
  • Local changes → global effects
  • Few DoF changes → long rigid sub-chains

19
ChainTree: A tale of two hierarchies
  • Transform hierarchy: approximates kinematics of
    protein backbone at successive resolutions
  • Bounding volume hierarchy: approximates geometry
    of protein at successive resolutions

20
Hierarchy of transforms
22
Hierarchy of bounding volumes
23
The ChainTree
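[Figure: the ChainTree, a binary tree in which each node pairs a transform T with a bounding volume B]
  • Root: T_AI, B_AH
  • Level 2: T_AE, B_AD | T_EI, B_EH
  • Level 1: T_AC, B_AB | T_CE, B_CD | T_EG, B_EF | T_GI, B_GH
  • Leaves: T_AB, B_A | T_BC, B_B | T_CD, B_C | T_DE, B_D | T_EF, B_E | T_FG, B_F | T_GH, B_G | T_HI, B_H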
24
Updating the ChainTree
[Figure: the same ChainTree, with the same node labels as on the previous slide]
25
Computing the energy
Recursively search ChainTree for interactions
  • Pruning rules
    • Prune search when distance between bounding
      volumes is more than cutoff distance
    • Do not search inside rigid sub-chains
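
The two pruning rules can be illustrated with a simplified sketch. It assumes one point per link, bounding spheres as the bounding volumes, and a `rigid` flag that marks sub-chains containing no changed degree of freedom; `Node`, `build`, `search_pair` and `search_self` are hypothetical names. It reports every interacting pair that does not lie inside a single rigid sub-chain, and it omits the transform hierarchy entirely.

```python
import math

class Node:
    def __init__(self, lo, hi, center, radius, left=None, right=None, rigid=True):
        self.lo, self.hi = lo, hi              # covers atoms lo .. hi-1
        self.center, self.radius = center, radius
        self.left, self.right = left, right
        self.rigid = rigid                     # no changed DoF inside

def build(points, lo, hi, changed_joints):
    """Balanced tree over atoms lo..hi-1. Joint i connects atoms i and i+1."""
    if hi - lo == 1:
        return Node(lo, hi, points[lo], 0.0)
    mid = (lo + hi) // 2
    left = build(points, lo, mid, changed_joints)
    right = build(points, mid, hi, changed_joints)
    center = tuple((a + b) / 2 for a, b in zip(left.center, right.center))
    # Enclosing (not minimal) sphere around both child spheres.
    radius = max(math.dist(center, c.center) + c.radius for c in (left, right))
    rigid = left.rigid and right.rigid and (mid - 1) not in changed_joints
    return Node(lo, hi, center, radius, left, right, rigid)

def search_pair(a, b, cutoff, out):
    # Rule 1: prune when the bounding volumes are farther apart than the cutoff.
    if math.dist(a.center, b.center) - a.radius - b.radius > cutoff:
        return
    if a.left is None and b.left is None:
        out.append((a.lo, b.lo))               # two single atoms within the cutoff
        return
    # Descend into the larger node (or the only one that can be split).
    if b.left is None or (a.left is not None and a.radius >= b.radius):
        search_pair(a.left, b, cutoff, out)
        search_pair(a.right, b, cutoff, out)
    else:
        search_pair(a, b.left, cutoff, out)
        search_pair(a, b.right, cutoff, out)

def search_self(node, cutoff, out):
    # Rule 2: interactions inside a rigid sub-chain cannot have changed.
    if node.rigid:
        return
    search_self(node.left, cutoff, out)
    search_self(node.right, cutoff, out)
    search_pair(node.left, node.right, cutoff, out)

chain = [(float(i), 0.0, 0.0) for i in range(8)]
root = build(chain, 0, 8, changed_joints={3})
found = []
search_self(root, 1.5, found)
```

A fuller implementation could also skip pairs of sub-chains that did not move relative to each other; this sketch does not attempt that.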

26
Computing the energy
[Figure, built up over slides 26-32: the recursive search drawn as a tree of node pairs. It starts at the root P, descends through N, O and the pair N-O, then through pairs such as J-K, J-L, K-M and L-M, down to leaf-level pairs such as A-B, C-D, E-F and H-G. The last frame is labeled E(O).]
33
Computing the energy
  • Only changed interactions are found
  • Reuse unaffected partial sums
  • Better performance for
    • Longer proteins
    • Fewer simultaneous changes

34
Computational complexity
  • Updating
  • Searching

worst case bound
Much faster in practice
35
Test
[Figure: results for 1-DoF and 5-DoF changes on proteins of 68, 144, 374 and 755 residues]
36
Simulation of α-Synuclein
  • 140 res. protein implicated in Parkinson's
    disease
  • Multi-canonical Replica-exchange MC regime
  • Over 1000 CPU days of simulation
  • Study conformations at room temp.
  • Joint work with Vijay Pande

37
Outline
  1. Fast energy computation during Monte Carlo
    simulation
  2. Model completion for protein X-ray
    crystallography
  3. Large scale computation of similarity

Lotan, van den Bedem, Deacon and Latombe, WAFR 2004
van den Bedem, Lotan, Latombe and Deacon, submitted to Acta Cryst. D
Joint Center for Structural Genomics (JCSG) at SSRL
38
Protein Structure Initiative
152K sequenced genes (30K/year)
25K determined structures (3.6K/year)
  • Reduce cost and time to determine protein
    structure
  • Develop software to automatically interpret the
    electron density map (EDM)

39
EDM
  • 3-D image of atomic structure
  • High value (electron density) at atom centers
  • Density falls off exponentially away from center

40
Automated model building
  • 90% built at high resolution (2 Å)
  • 66% built at medium to low resolution (2.5-2.8 Å)
  • Gaps left at noisy areas in EDM (blurred density)

Gaps need to be resolved manually
41
The Fragment completion problem
  • Input
    • EDM
    • Partially resolved structure
    • 2 anchor residues
    • Length of missing fragment
  • Output
    • A small number of candidate structures for
      missing fragment

A robotics inverse kinematics (IK) problem
42
Related work
  • Biology/Crystallography
    • Exact IK solvers
      • Wedemeyer & Scheraga 99
      • Coutsias et al. 04
    • Optimization IK solvers
      • Fine et al. 86
      • Canutescu & Dunbrack Jr. 03
    • Ab-initio loop closure
      • Fiser et al. 00
      • Kolodny et al. 03
    • Database search loop closure
      • Jones & Thirup 86
      • van Vlijmen & Karplus 97
    • Semi-automatic tools
      • Jones & Kjeldgaard 97
      • Oldfield 01
  • Computer Science
    • Exact IK solvers
      • Manocha & Canny 94
      • Manocha et al. 95
    • Optimization IK solvers
      • Wang & Chen 91
    • Redundant manipulators
      • Khatib 87
      • Burdick 89
    • Motion planning for closed loops
      • Han & Amato 00
      • Yakey et al. 01
      • Cortes et al. 02, 04

43
Contributions
  • Sampling of gap-closing fragments biased by the
    EDM
  • Refinement of fit to density without breaking
    closure
  • Fully automatic fragment completion software for
    X-ray Crystallography

Novel application of a combination of inverse
kinematics techniques
44
Two-stage IK method
  • Candidate generation: Optimize density fit while
    closing the gap
  • Refinement: Optimize closed fragments without
    breaking closure

45
Stage 1: candidate generation
  • Generate random conformation
  • Close using Cyclic Coordinate Descent (CCD) (Wang &
    Chen 91, Canutescu & Dunbrack Jr. 03)

49
Stage 1: candidate generation
  • Generate random conformation
  • Close using Cyclic Coordinate Descent (CCD) (Wang &
    Chen 91, Canutescu & Dunbrack Jr. 03)

CCD moves biased toward high density
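
For readers unfamiliar with CCD, here is a minimal planar sketch of the closing step alone. It assumes a 2-D chain with relative joint angles and a point target; protein backbones need 3-D torsion rotations, and the bias toward high density mentioned above is not modeled. `forward` and `ccd_close` are hypothetical names.

```python
import math

def forward(angles, lengths, base=(0.0, 0.0)):
    """Joint positions of a planar chain; the last entry is the free end."""
    pts, theta = [base], 0.0
    x, y = base
    for angle, length in zip(angles, lengths):
        theta += angle
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        pts.append((x, y))
    return pts

def ccd_close(angles, lengths, target, tol=1e-5, max_sweeps=1000):
    """Adjust one joint at a time so the free end moves toward the target."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            end = pts[-1]
            if math.dist(end, target) < tol:
                return angles, True
            jx, jy = pts[i]
            # Rotate joint i so that joint->end lines up with joint->target.
            a_end = math.atan2(end[1] - jy, end[0] - jx)
            a_tgt = math.atan2(target[1] - jy, target[0] - jx)
            angles[i] += a_tgt - a_end
    closed = math.dist(forward(angles, lengths)[-1], target) < tol
    return angles, closed

closed_angles, ok = ccd_close([0.3, 0.4, -0.2], [1.0, 1.0, 1.0], (1.2, 1.4))
```

Each single-joint update is a closed-form, one-variable minimization of the distance to the target, which makes the method simple and cheap per iteration. It can stall in degenerate configurations, for example when the whole chain and the target are collinear.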
50
Stage 2: refinement
  • Target function T (goodness of fit to EDM)
  • Minimize T while retaining closure
  • Closed conformations lie on self-motion manifold
    of lower dimension

1-D manifold
51
Stage 2: null-space minimization
  • Jacobian $J$: linear relation between joint
    velocities $\dot{q}$ and end-effector linear and angular
    velocity $\dot{x}$: $\dot{x} = J\,\dot{q}$

Compute minimizing move using $\dot{q} = -N N^{T} \nabla T(q)$
$N$: orthonormal basis of the null space of $J$
52
Stage 2: minimization with closure
  1. Choose sub-fragment with n > 6 DOFs
  2. Compute N using SVD
  3. Project $\nabla T$ onto N
  4. Move until minimum is reached or closure is broken

Escape from local minima using Monte Carlo with
simulated annealing
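
The projection step can be illustrated on a toy case: a planar 3-joint chain whose end position is fixed, which leaves a 1-D self-motion manifold. This sketch departs from the slides in one stated way: because the Jacobian here is only 2 x 3, its null space is obtained with a cross product instead of an SVD. All function names are hypothetical.

```python
import math

def joint_positions(q, lengths):
    """Positions of the joints and the end-effector of a planar chain."""
    pts, theta, x, y = [(0.0, 0.0)], 0.0, 0.0, 0.0
    for angle, length in zip(q, lengths):
        theta += angle
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        pts.append((x, y))
    return pts

def jacobian(q, lengths):
    """2 x n Jacobian of the end-effector position w.r.t. the joint angles."""
    pts = joint_positions(q, lengths)
    ex, ey = pts[-1]
    row_x = [-(ey - py) for (_, py) in pts[:-1]]
    row_y = [ex - px for (px, _) in pts[:-1]]
    return row_x, row_y

def null_vector(J):
    """Unit vector spanning the null space of a full-rank 2 x 3 Jacobian."""
    (a0, a1, a2), (b0, b1, b2) = J
    n = (a1 * b2 - a2 * b1, a2 * b0 - a0 * b2, a0 * b1 - a1 * b0)
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        raise ValueError("Jacobian is rank deficient")
    return tuple(c / norm for c in n)

def null_space_step(q, lengths, grad_T, step):
    """Move against grad_T after projecting it onto the null space."""
    n = null_vector(jacobian(q, lengths))
    s = sum(g * c for g, c in zip(grad_T, n))   # N^T grad_T (a scalar here)
    return [qi - step * s * c for qi, c in zip(q, n)]

q0 = [0.3, 0.4, -0.2]
q1 = null_space_step(q0, [1.0, 1.0, 1.0], grad_T=[1.0, 0.0, 0.0], step=1e-3)
```

The move preserves closure only to first order, so a practical loop would keep steps small and check the closure error after each one, in the spirit of step 4 above.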
53
Test: artificial gaps
  • Completed structure (gold standard)
  • Good density (1.6 Å res.)
  • Remove fragment and rebuild

Length   High (2.0 Å)     Medium (2.5 Å)   Low (2.8 Å)
4        100% (0.14 Å)    100% (0.19 Å)    100% (0.32 Å)
8        100% (0.18 Å)    100% (0.23 Å)    100% (0.36 Å)
12       91% (0.51 Å)     96% (0.41 Å)     91% (0.52 Å)
15       91% (0.53 Å)     88% (0.63 Å)     83% (0.76 Å)
Produced by H. van den Bedem
54
Test: true gaps
  • Completed structure (gold standard)
  • O.K. density (2.4 Å res.)
  • 6 gaps left by model builder (RESOLVE)

Length   Top scorer   Lowest error
4        0.44 Å       0.40 Å
4        0.22 Å       0.22 Å
5        0.78 Å       0.78 Å
5        0.36 Å       0.36 Å
7        0.72 Å       0.66 Å
10       0.43 Å       0.43 Å
Produced by H. van den Bedem
55
Example: TM0423
PDB 1KQ3, 376 res., 2.0 Å resolution, 12-residue gap
Best: 0.3 Å aaRMSD
56
Example: TM0813
PDB 1J5X, 342 res., 2.8 Å resolution, 12-residue gap
Best: 0.6 Å aaRMSD
GLU-83
GLY-96
59
Outline
  1. Fast energy computation during Monte Carlo
    simulation
  2. Model completion for protein X-ray
    crystallography
  3. Large scale computation of similarity

Lotan and Schwarzer, J. Comput. Biol. 11(2-3):299-317, 2004
60
Large scale similarity
  • Analysis of simulation trajectories
    • Molecular dynamics simulation
    • Monte Carlo simulation
  • Clustering of decoy sets (e.g., Shortle et al. 98)
  • Stochastic Roadmap Simulation (Apaydin et al. 03)

Fast similarity measures are needed for analyzing
large sets of conformations
61
Contributions
  • Uniform simplification of protein structure for
    similarity computation
  • Speed-up existing similarity measures
  • Method offers trade-off between speed and
    precision
  • Efficient computation of nearest neighbors

62
m-Averaged approximation
  • Cut chain into pieces of length m
  • Replace each sequence of m Cα atoms by its
    centroid

3n coordinates → 3n/m coordinates
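
The m-averaging step is simple enough to sketch directly. The function name `m_average` is hypothetical, and averaging a shorter final piece when m does not divide the chain length is an assumption; the slides do not say how the remainder is handled.

```python
def m_average(coords, m):
    """Replace each run of m consecutive 3-D points by its centroid."""
    averaged = []
    for start in range(0, len(coords), m):
        block = coords[start:start + m]   # the last block may be shorter
        k = len(block)
        averaged.append(tuple(sum(p[axis] for p in block) / k for axis in range(3)))
    return averaged

ca_trace = [(0, 0, 0), (3, 0, 0), (6, 0, 0), (0, 3, 0), (0, 6, 0), (0, 9, 0)]
reduced = m_average(ca_trace, 3)   # 6 points become 2 centroids
```

Any similarity measure can then be applied to the shorter list, which is where the speed-up comes from.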
63
Chains and distances
  • Proximity along the chain entails spatial
    proximity
  • Far away links along the chain are spatially
    distant (on average)

ci
cj
64
Similarity measures
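
The later slides evaluate two measures, cRMS and dRMS. As a sketch of the second one, dRMS is conventionally the root mean square difference between corresponding intra-molecular distances of two conformations. The function name `drms` is hypothetical. cRMS, which first requires an optimal rigid superposition, is not shown.

```python
import math
from itertools import combinations

def drms(p, q):
    """Distance RMS between two conformations with the same number of points."""
    n = len(p)
    if n != len(q) or n < 2:
        raise ValueError("need two conformations of equal length >= 2")
    total = 0.0
    for i, j in combinations(range(n), 2):
        diff = math.dist(p[i], p[j]) - math.dist(q[i], q[j])
        total += diff * diff
    return math.sqrt(total / (n * (n - 1) / 2))   # average over all pairs

a = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
b = [(5, 5, 5), (6, 5, 5), (6, 6, 5)]   # a translated copy of a
same = drms(a, b)                       # translation does not change dRMS
```

Because it compares all pairs, the cost is quadratic in the number of points, so shortening the chain by a factor m reduces the work by roughly m².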
65
Evaluation: test sets
8 structurally diverse proteins (54-76 residues)
  1. Decoy sets: conformations from the Park-Levitt
    set (Park et al. 97), N = 10,000
  2. Random sets: conformations generated by the
    program FOLDTRAJ (Feldman & Hogue 00), N = 5,000

66
Evaluation results: decoy sets
m    cRMS         dRMS
3    0.99         0.96-0.98
4    0.98-0.99    0.94-0.97
6    0.92-0.99    0.78-0.93
9    0.81-0.98    0.65-0.96
12   0.54-0.92    0.52-0.69
  • 9x speed-up for cRMS (m = 9)
  • 36x speed-up for dRMS (m = 6)

Higher correlation for random sets!
67
k Nearest-neighbors problem
Given a set S of conformations of a protein and a
query conformation c, find the k conformations in
S most similar to c
  • Brute force complexity
  • for all

N: size of S; L: time to compute similarity
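
A brute-force baseline for this problem can be sketched in a few lines: evaluate the similarity measure against every member of S and keep the k smallest values. The function name `k_nearest` is hypothetical, and `dist` stands for whatever similarity measure is in use.

```python
import heapq
import math

def k_nearest(query, conformations, k, dist):
    """Indices of the k conformations most similar to query (smallest dist)."""
    return heapq.nsmallest(
        k,
        range(len(conformations)),
        key=lambda i: dist(query, conformations[i]),
    )

# Toy data (an assumption): 1-D "conformations" compared by Euclidean distance.
S = [(0.0,), (5.0,), (1.0,), (9.0,), (2.0,)]
nearest = k_nearest((0.4,), S, 2, math.dist)
```

Each query costs one similarity evaluation per member of S, which is what makes all-against-all queries expensive for large sets.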
68
Efficient nearest neighbor search
  • kd-tree time per query
  • Limitations
    • Requires Minkowski metric
    • Less efficient when d > 20

cRMS is not a Minkowski metric
dRMS has dimensionality of
Reduce dRMS dimensionality using SVD
69
Reduction using SVD
  1. Stack m-averaged distance matrices as vectors
  2. Compute the SVD of entire set
  3. Project onto principal components

dRMS is reduced to ≤ 20 dimensions
Complexity of SVD
70
Testing the method
  • Use decoy sets (N = 10,000) and random sets (N =
    5,000)
  • m-averaging with m = 4
  • Project onto 16 PCs for decoys, 12 PCs for random
    sets
  • Find k = 10, 25, 100 NNs for 250 conformations in
    each set

71
Results
  • Decoy sets
    • 77% correct
    • Furthest NN off by 10-15% (0.7 Å-1.5 Å)
    • 4k approximate NNs contain all true k NNs
  • Random sets: slightly better results

Use reduction as fast filter
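
One way to use the reduction as a filter is sketched below: take the 4k nearest candidates under the cheap, reduced measure, then re-rank only those with the exact measure. The function name `filtered_k_nearest` and the toy distances are assumptions. The factor of 4 mirrors the observation above, and the result is exact only when the candidate set really contains all true neighbors.

```python
import heapq
import math

def filtered_k_nearest(query, items, k, cheap_dist, exact_dist, factor=4):
    """Approximate k-NN: cheap filter to factor*k candidates, then exact re-rank."""
    candidates = heapq.nsmallest(
        factor * k,
        range(len(items)),
        key=lambda i: cheap_dist(query, items[i]),
    )
    return heapq.nsmallest(k, candidates, key=lambda i: exact_dist(query, items[i]))

# Toy stand-ins (assumptions): the cheap measure looks at one coordinate only.
def first_coordinate_gap(a, b):
    return abs(a[0] - b[0])

points = [(0, 0), (1, 5), (2, 0), (3, 0), (0.5, 9), (10, 0)]
result = filtered_k_nearest((0, 0), points, 2, first_coordinate_gap, math.dist, factor=2)
```

The exact measure is evaluated only factor·k times per query instead of once per member of the set.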
72
Running Time
  • N = 100,000, m = 4, PC = 16
  • Find k = 100 for each conformation

Brute-force: 84 hours
Brute-force + m-averaging: 4.8 hours
Brute-force + m-averaging + SVD: 41 minutes
kd-tree + m-averaging + SVD: 19 minutes
kd-tree has more impact for larger sets
73
Contributions
  • Energy computation in MCS
    • Efficient maintenance and self-collision
      detection for kinematic chains
    • Efficient computation of pair-wise interactions
      in MCS of proteins
    • Caching scheme for partial energy sums during MCS
    • MCS software
  • Model completion in X-ray crystallography
    • Sampling of gap-closing fragments biased towards
      the EDM
    • Refinement of fit to density without breaking
      closure
    • Fully automatic fragment completion software
  • Similarity computation for large conformation
    sets
    • Uniform simplification of protein structure for
      similarity computation
    • Speed-up existing similarity measures
    • Method offers trade-off between speed and
      precision
    • Efficient computation of nearest neighbors

74
Take-home message
  • Taking into account physical properties of
    proteins can lead to efficient algorithms for a
    wide variety of applications in structural biology

75
Outlook
computer scientist
biophysicist/biochemist
Models that simplify the physics and chemistry of
proteins
Algorithms that exploit properties of protein
models
Develop simplified protein models that lend
themselves to efficient computations
76
Acknowledgements
  • Jean-Claude Latombe
  • Vijay Pande
  • Michael Levitt
  • Leo Guibas
  • Axel Brunger, Balaji Prabhakar, Serafim Batzoglou
  • Fabian Schwarzer, Henry van den Bedem, Dan
    Halperin
  • Carlo Tomasi
  • Daniel Russakoff, Rachel Kolodny
  • Latombe group
  • Serkan Apaydin, Tim Bretl, Joel Brown, Phil Fong,
    Mitul Saha, Pekka Isto, Kris Hauser
  • Pande group
  • Bojan Zagrovic, Stefan Larson, Lillian Chong,
    Young Min Rhee, Sidney Elmer, Chris Snow, Guha
    Jayachandran, Eric Sorin, Sung-Joo Lee, Jim
    Cladwell, Michael Shirts, Nina Singhal, Relly
    Brandman, Vishal Vaidyanathan, Nick Kelley, Mark
    Engelhardt
  • Levitt Group
  • Patrice Koehl, Tanya Raschke, Erik Lindahl

77
Thank you!