Sidechain Placement and Protein Design - PowerPoint PPT Presentation



Provided by: Leave


Transcript and Presenter's Notes

Title: Sidechain Placement and Protein Design

1
Sidechain Placement and Protein Design

GCMB07, 2 May

2
Protein design

Sequence → Structure → Function
Ribose binding protein
Modify sequence to change structure and function Looger03
or behavior Ambroggio06
(Figure: sequence fragments KDTIALVVST, YPVDLKLVVKQ, TNT; labels "binding" and "folding order")

3
Protein Design or Redesign

Create an amino acid sequence that folds to a
stable protein and performs a desired function
Avoid
Sampling all sequences
Solving protein folding
Relying on molecular dynamics
A successful design strategy: build on an
existing structure
Scaffold backbone from a known folded structure
Redesign 20 residues
Find side chains that fit

4
Outline

Sidechain Rotamers & Rotamer Libraries
Algorithms for Sidechain Placement
Brute Force
Dead End Elimination
Simulated Annealing
Stochastic Mean Field
Dynamic Programming
A Biased View of Protein Structure Design
How is design done?
Why is it successful?

5
Protein Structure

Chemical
1-Dimensional Sequence of amino acids
Two components for each amino acid
Backbone (N-Cα-C-O)
Side chain (residue)
Placed residue: a position in an amino acid
sequence

(Figure: chemical structure drawings labeled MSS and MSW)
6
Side chain geometry

Conformational flexibility from dihedral angles
Side chain internal geometry
Bond angles and bond lengths fixed
Dihedrals χ1, χ2, ... may rotate
Rotamers: rotational isomers
Side chains have preferred conformations
Prefer dihedrals around 60°, 180° and −60°
Rotamer library: set of dihedral angles
Ponder87, Dunbrack93, Lovell2000

7
Side chain conformation
Side chains differ in size (# of atoms) and
degrees of freedom (# of χ angles)
(Figure: side chain with dihedrals χ1 and χ2 labeled)
8
Serine χ1 distribution
A chosen combination of side chain torsion
angles χ1, χ2, etc. for a residue is known
as a rotamer.
9
Side chain conformations: canonical staggered
forms
Newman projections for χ1 of glutamate
t = trans, g = gauche (name of conformation)
Side chain angles are defined moving outward from
the backbone, starting with the N atom, so the χ1
angle is N-Cα-Cβ-Cγ, the χ2 angle is Cα-Cβ-Cγ-Cδ,
...
IUPAC nomenclature: http://www.chem.qmw.ac.uk/iupac/misc/biop.html
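A minimal sketch of how a χ angle could be computed from four atom positions, following the N, Cα, Cβ, Cγ ordering given on this slide. The function names and the coordinates are illustrative assumptions (the coordinates are made up and do not have realistic bond lengths); the sign follows the usual convention in which a clockwise rotation, viewed along the central bond, is positive.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees defined by four 3D points."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])
    # Bond vectors along the chain p0 -> p1 -> p2 -> p3.
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    # Normals of the two planes that share the central bond b2.
    n1, n2 = cross(b1, b2), cross(b2, b3)
    x = dot(n1, n2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    return math.degrees(math.atan2(y, x))

def nearest_staggered(angle):
    """Return whichever of 60, 180, -60 degrees is closest on the circle."""
    def circular_distance(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return min((60.0, 180.0, -60.0), key=lambda c: circular_distance(angle, c))

# Hypothetical coordinates, chosen only so that chi1 comes out near -60 degrees.
N = (1.0, 0.0, 0.0)
CA = (0.0, 0.0, 0.0)
CB = (0.0, 0.0, 1.0)
CG = (0.5, -math.sqrt(3) / 2, 1.0)
chi1 = dihedral(N, CA, CB, CG)
```

Here `nearest_staggered(chi1)` assigns the measured angle to one of the three preferred wells, which is the binning step a rotamer library relies on.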
10
Backbone independent rotamer library

Dunbrack & Cohen, 1997

11
What do rotamer libraries provide? J. Meiler07

Rotamer libraries significantly reduce the number
of conformations that need to be evaluated during
the search.
This is done with almost no risk of missing the
real conformations.
Even small libraries of about 100-150 rotamers
cover about 96-97% of the conformations actually
found in protein structures.
The probabilities of each rotamer in the library
provide estimates of the potential energy due to
interactions within the side chain and with the
local backbone atoms, using the Boltzmann
distribution: $E \propto -\ln(P)$
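As a rough illustration of the Boltzmann relation above, library probabilities can be turned into relative energies with E = −kT ln(P). The function name, the choice of kT = 1 (arbitrary units), and the three probabilities are assumptions made for this sketch; they are not values from any published library.

```python
import math

def rotamer_energies(probabilities, kT=1.0):
    """Convert rotamer probabilities to energies relative to the most likely rotamer."""
    energies = {name: -kT * math.log(p) for name, p in probabilities.items()}
    lowest = min(energies.values())
    # Shift so the most probable rotamer has energy zero.
    return {name: e - lowest for name, e in energies.items()}

# Made-up probabilities for three chi1 wells of one residue type.
EXAMPLE_P = {"chi1=+60": 0.5, "chi1=180": 0.25, "chi1=-60": 0.25}
EXAMPLE_E = rotamer_energies(EXAMPLE_P)
```

With these inputs the two less likely wells sit ln(2) kT above the most likely one, since each is half as probable.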

12
Side chain geometry

Conformational flexibility from dihedral angles
Side chain internal geometry
Bond angles and bond lengths fixed
Dihedrals χ1, χ2, ... may rotate
Rotamers: rotational isomers
Side chains have preferred conformations
Prefer dihedrals around 60°, 180° and −60°
Rotamer library: set of dihedral angles
Ponder87, Dunbrack93, Lovell2000
http://dunbrack.fccc.edu/bbdep/bbdepdownload.php
(Backbone dependent and independent libraries)
http://kinemage.biochem.duke.edu/databases/rotamer.html
(Backbone independent library)

13
Rotamers in crystallographic refinement
Fit structure to electron density from x-ray
diffraction

Red indicates clashes with added hydrogen atoms

better choice of side chain
14
Outline

Sidechain Rotamers & Rotamer Libraries
Algorithms for Sidechain Placement
Brute Force Search
Dead End Elimination
Simulated Annealing
Stochastic Mean Field
Dynamic Programming

15
Side Chain Placement Problem

Given
A fixed protein backbone
A set of fixed (background) residues
A set of changing (molten) residues
A list of allowed amino acids for each molten
residue
A rotamer library
A pairwise decomposable energy function
Find the assignment of rotamers to the molten
residues, S, that minimizes the energy function

Kinemage rotamers for Ubiquitin surface residues
16
Energy Functions

$f : \text{Protein Structure} \to \mathbb{R}$
Lennard-Jones
van der Waals attractive energies
atom overlap repulsive overlap
Electrostatics
Solvent Effects
Hydrogen bonds
Often pairwise decomposable
sum of atom-pair or rotamer-pair interaction
energies

17
Side Chain Placement Problem

Find the assignment of rotamers to the molten
residues, S, that minimizes the energy function
Functions stated in terms of rotamer energies
rotamer / background energy (Esingle)
rotamer pair energies (Epair)
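A small sketch of the objective in code, assuming the total energy is the sum of one Esingle term per molten residue plus one Epair term per interacting pair, and using exhaustive enumeration to find the minimum. The table layout (`e_single[i][r]`, `e_pair[(i, j)][r][s]`) and all numbers are invented for illustration; enumeration is only feasible for tiny instances like this one.

```python
from itertools import product

def total_energy(assign, e_single, e_pair):
    """Sum rotamer/background energies and rotamer pair energies for one assignment."""
    energy = sum(e_single[i][r] for i, r in enumerate(assign))
    for (i, j), table in e_pair.items():
        energy += table[assign[i]][assign[j]]
    return energy

def brute_force(e_single, e_pair):
    """Try every combination of rotamers and return the lowest-energy assignment."""
    choices = [range(len(rotamers)) for rotamers in e_single]
    return min(product(*choices),
               key=lambda assign: total_energy(assign, e_single, e_pair))

# Toy instance: three molten residues with two rotamers each (made-up energies).
E_SINGLE = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
E_PAIR = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 1.0], [0.5, 0.0]],
}
BEST = brute_force(E_SINGLE, E_PAIR)
```

The number of assignments grows as the product of the per-residue rotamer counts, which is why the techniques on the following slides try to prune or avoid the full enumeration.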
18
Side Chain Placement Problem

NP-Complete
Reduction from SAT Pierce2002
Techniques
Optimality Guarantee
Dead-End Elimination Desmet92, Goldstein94,
Looger2001
Integer Linear Programming Erickson2001
Branch and Bound Gordon99, Canutescu2003
Dynamic Programming Leaver-Fay2005
No Optimality Guarantee
Genetic Algorithms Jones94
Simulated Annealing Holm92,Hellinga94,Kuhlman03
Self-Consistent Mean Field Koehl96

19
Dead End Elimination (DEE)

Reduce the search space without losing the Global
Minimum Energy Conformation (GMEC).
Eliminates rotamers which cannot be in the GMEC,
using more accurate (and more computationally
expensive) upper and lower bounds.
Uses brute force search on rotamers remaining.
Typically assumes that the scoring function can
be expressed as a sum of pair-wise interactions

20
A first, simple condition for elimination

A rotamer can be eliminated for a residue when
the minimum (best) energy it obtains by
interaction with other rotamers is still
higher (worse) than the maximum energy of some
other rotamer

21
The Goldstein improvement

A rotamer can be safely eliminated when there
exists a rotamer that has lower (better) energy
for each given environment.
This criterion is more powerful, though it
typically requires more computation time.
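One way to write the Goldstein test in code: rotamer r can be dropped if some rotamer t at the same residue satisfies E(r) − E(t) + Σ_j min_s [E(r, j_s) − E(t, j_s)] > 0, meaning t is better than r no matter what the neighbors do. The data layout and the toy energies are assumptions for this sketch, and only a single pass is performed.

```python
def pair_energy(e_pair, i, r, j, s):
    """Pair energy between rotamer r of residue i and rotamer s of residue j (0 if no edge)."""
    if (i, j) in e_pair:
        return e_pair[(i, j)][r][s]
    if (j, i) in e_pair:
        return e_pair[(j, i)][s][r]
    return 0.0

def goldstein_dee(e_single, e_pair):
    """Return the set of (residue, rotamer) pairs removed by the Goldstein criterion."""
    n = len(e_single)
    eliminated = set()
    for i in range(n):
        others = [j for j in range(n) if j != i]
        rotamers = range(len(e_single[i]))

        def gap(r, t):
            # Smallest possible advantage of t over r across all environments.
            total = e_single[i][r] - e_single[i][t]
            for j in others:
                total += min(
                    pair_energy(e_pair, i, r, j, s) - pair_energy(e_pair, i, t, j, s)
                    for s in range(len(e_single[j])))
            return total

        for r in rotamers:
            if any(gap(r, t) > 0 for t in rotamers if t != r):
                eliminated.add((i, r))
    return eliminated

# Toy instance (made-up energies) where comparing best case to worst case is too weak,
# because the pair energies vary widely with the neighbor's state.
E_SINGLE = [[0.0, 1.0], [0.0, 0.0]]
E_PAIR = {(0, 1): [[0.0, 10.0], [0.5, 10.5]]}
REMOVED = goldstein_dee(E_SINGLE, E_PAIR)
```

For residue 0, rotamer 1 is worse than rotamer 0 by at least 1.5 in every environment, so it is removed even though its best case (1.5) is lower than rotamer 0's worst case (10.0).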

22
Even more powerful criteria can be obtained with
even more computation

A rotamer can be safely eliminated when, for each
environment, there exists some rotamer that has
lower (better) energy.

23
Dynamic Programming via an Interaction Graph

Surface residues on ubiquitin's β-sheet

Interaction graph defined by Rosetta's
energy function
24
Interaction Graph

$G = (V, E)$, a multi-hypergraph
vertices ↔ molten residues: $v$
state space ↔ rotamers for a residue: $S(v)$
edge ↔ possibility of residue interaction: $e \subseteq V$
scoring function ↔ interaction energy: $f_e : \prod_{v \in e} S(v) \to \mathbb{R}$

(Figure: a hypergraph and a graph)
25
Interaction Graph Evaluation (Pairwise case)

For $G = (V, E)$:
Each vertex, $v$, has a function to capture
interactions with the background: $f_v : S(v) \to \mathbb{R}$
Each pair of interacting vertices, $\{u, v\}$,
defines an edge with a function to capture pair
interactions: $f_{u,v} : S(u) \times S(v) \to \mathbb{R}$
Given an interaction graph, $G = (V, E)$, find the
state assignment $S$ that minimizes $\sum_{w \in V \cup E} f_w$

26
Bottom Up Dynamic Programming

Eliminate node $v$
Let $E_v$ be the edges incident upon $v$
Let $N_v$ be the neighbors of $v$
For each edge $e \in E_v$ with scoring function $f_e$,
let $f_{e,v=s}$ be edge $e$'s scoring function with
vertex $v$ fixed in state $s$
Create a new hyperedge incident upon $N_v$.
Compute $f_{N_v} = \min_{s \in S(v)} \sum_{e \in E_v} f_{e,v=s}$
Remove $v$ from graph
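The elimination step above can be sketched with tables keyed by state tuples. In this sketch every scoring function, including the per-vertex background function, is stored as a `(scope, table)` pair, which is an assumed representation rather than the one used in the original work. The code returns only the minimum energy; recovering the minimizing assignment would need an extra traceback that records the best state of each eliminated vertex.

```python
from itertools import product

def eliminate(v, factors, states):
    """Replace all functions touching v with one new function over v's neighbors."""
    touching = [f for f in factors if v in f[0]]
    rest = [f for f in factors if v not in f[0]]
    neighbors = sorted({u for scope, _ in touching for u in scope if u != v})
    table = {}
    for combo in product(*(states[u] for u in neighbors)):
        fixed = dict(zip(neighbors, combo))
        best = float("inf")
        for s in states[v]:
            fixed[v] = s  # fix v in state s and add up every touching function
            total = sum(tab[tuple(fixed[u] for u in scope)] for scope, tab in touching)
            best = min(best, total)
        table[combo] = best
    return rest + [(tuple(neighbors), table)]

def min_energy(factors, states, order):
    """Eliminate vertices in the given order and return the minimum total energy."""
    for v in order:
        factors = eliminate(v, factors, states)
    # Every remaining function now has an empty scope and a single entry.
    return sum(tab[()] for _, tab in factors)

# Toy chain 0 - 1 - 2 with two states per vertex (made-up energies).
STATES = {0: [0, 1], 1: [0, 1], 2: [0, 1]}
FACTORS = [
    ((0,), {(0,): 0.0, (1,): 1.0}),
    ((1,), {(0,): 0.5, (1,): 0.0}),
    ((2,), {(0,): 0.2, (1,): 0.3}),
    ((0, 1), {(0, 0): 2.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}),
    ((1, 2), {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 0.0}),
]
RESULT = min_energy(FACTORS, STATES, order=[0, 2, 1])
```

The size of each new table is the product of the neighbors' state counts, so the elimination order, and ultimately the treewidth of the graph, determines time and memory.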

27
Scoring Function Representation: Tables
(Figure: table for edge e = {u, v}, with rows and columns indexed by the states in S(u) and S(v); example state labels a-e and f-j)
28
Scoring Function Representation: Tables
(Figure: table for hyperedge e = {u, v, w}, indexed by the states of u, v and w)
29
Experiments and Results

Rotamer Relaxation Task
Sequence fixed; choose new rotamers for each
residue
Redesign Task
Search of conformation and sequence spaces.
Ubiquitin's 15 surface residues
Large rotamer library
Relaxation: 32 states per vertex, treewidth-4
interaction graph
Redesign: 680 states per vertex, treewidth-3
interaction graph (drop one edge)

Task        Running Time   Memory
Relaxation  200 ms         (small)
Redesign    15.99 hrs      3.7 GB
30
Dynamic Programming for Hydrogen Placement

Dynamic programming (DP) limited by treewidth of
graph instances
Treewidths from graphs in protein design too
large for DP to be practical
Adding hydrogen atoms to PDB
Hydrogen placement via combinatorial
optimization REDUCE Word99
Non-pairwise decomposable energy function
Previously used brute force
Replaced with dynamic programming
Interaction graphs have low treewidth
Effective in practice: running time drops from minutes to ms.
REDUCE v3.02 in Molprobity suite, and distributed
from http://kinemage.biochem.duke.edu/software/reduce.php

31
Simulated Annealing

Stochastic optimization technique
Monte Carlo
Make a random change, determine ΔE
Metropolis criterion Metropolis57
accept with probability $\min(1, e^{-\Delta E / kT})$
Gradually lower temperature T
In Side Chain Placement
Assign each residue a rotamer
Repeat
Select a random residue, and a random alternate
rotamer
Find ΔE induced by substituting the alternate
rotamer
Accept/Reject substitution according to
Metropolis criterion
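The loop above can be sketched as follows. The geometric cooling schedule, the parameter values, the table layout, and the toy energies are all assumptions chosen for illustration; the sketch also recomputes the full energy at every step for clarity, whereas a practical implementation would compute ΔE from only the terms that involve the changed residue.

```python
import math
import random

def total_energy(assign, e_single, e_pair):
    """Sum rotamer/background energies and rotamer pair energies for one assignment."""
    energy = sum(e_single[i][r] for i, r in enumerate(assign))
    for (i, j), table in e_pair.items():
        energy += table[assign[i]][assign[j]]
    return energy

def anneal(e_single, e_pair, steps=2000, t_start=5.0, t_end=0.05, seed=0):
    """Simulated annealing over rotamer assignments; returns (best assignment, energy)."""
    rng = random.Random(seed)
    n = len(e_single)
    assign = [rng.randrange(len(e_single[i])) for i in range(n)]
    energy = total_energy(assign, e_single, e_pair)
    best, best_energy = list(assign), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        i = rng.randrange(n)
        new_r = rng.randrange(len(e_single[i]))
        if new_r == assign[i]:
            continue
        old_r = assign[i]
        assign[i] = new_r
        new_energy = total_energy(assign, e_single, e_pair)
        delta = new_energy - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy = new_energy
            if energy < best_energy:
                best, best_energy = list(assign), energy
        else:
            assign[i] = old_r  # reject: restore the previous rotamer
    return tuple(best), best_energy

# Toy instance: three molten residues with two rotamers each (made-up energies).
E_SINGLE = [[0.0, 1.0], [0.5, 0.0], [0.2, 0.3]]
E_PAIR = {
    (0, 1): [[2.0, 0.0], [0.0, 0.0]],
    (1, 2): [[0.0, 1.0], [0.5, 0.0]],
}
BEST, BEST_ENERGY = anneal(E_SINGLE, E_PAIR)
```

Unlike dead-end elimination or dynamic programming, this procedure carries no optimality guarantee; on larger instances different seeds can return different assignments.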

32
Self-consistent mean field

I planned to cull a description from Patrice's
BioEbook sections
http://nook.cs.ucdavis.edu:8080/koehl/BioEbook/design_scmf.html
http://nook.cs.ucdavis.edu:8080/koehl/BioEbook/scmf.html
but didn't have time in class.

33
The practical problem of side chain modeling M07

The way we deal today with the problem of protein
structure prediction is very different from the
way nature deals with it.
Due to technical issues such as computation time
we are usually forced to accept a fixed backbone
and only then put the side chains on it.
The quality of the side chain modeling is
therefore heavily dependent on the position of
the backbone. If the initial backbone
conformation is wrong, the side chain modeling
quality will be accordingly bad.
What is really needed is a combined algorithm
that optimizes backbone conformation
simultaneously with side chain modeling.

34
Protein Design or Redesign

Create an amino acid sequence that folds to a
stable protein and performs a desired function
Avoid
Sampling all sequences
Solving protein folding
Relying on molecular dynamics
A successful design strategy: build on an
existing structure
Scaffold backbone from a known folded structure
Redesign 20 residues
Find side chains that fit

35
Why Design Proteins?

Nature uses proteins
to signal events
to catalyze reactions
to move cells (motors)
to bear weight (I-beams)
Design is an experiment to help understand
folding/binding
Industrial biosynthesis
Proteins are both efficient and specific
Cure disease
Antibodies
Inhibition peptides as drugs
Perturb cell signaling pathways

36
Why do RosettaDesign, Dezymer, ... work?

Geometric approximations (3D jigsaw puzzles) are
surprisingly effective in design.
They mine PDB structures for behaviors of native
proteins and fragments.
They precompute energies for pairwise
interactions.
They use many fast computers to allow detailed
sampling of discrete conformations.
Fast optimization algorithms
Competition

37
How do RosettaDesign, Dezymer, ... fail?

Computationally difficult to achieve good packing
and hydrogen bond satisfaction in protein core
Scores for packing, solvation and hydrogen bond
satisfaction cannot be pairwise additive.
Scores are often used as filters; we'd prefer to
optimize them.
Stability of designed proteins
Multistate or negative design

38
Protein Stability

A naturally occurring protein adopts a compact
geometry when placed in water
Stability is difference in free energies of the
folded and unfolded states

39
Protein Stability

A naturally occurring protein adopts a compact
geometry when placed in water
Different proteins have different free energies
in their unfolded states

40
Challenges in Protein Design

Side chain placement is hard
The complexities of individual instances of SCPP
are related to the treewidth of their interaction
graphs.
Tight, collision-free packing is often impossible
on the input scaffold
Use the interaction graph to allow simultaneous
optimization of side chain and backbone
structures
Protein stability is not well captured by
pairwise decomposable energy functions
The interaction graph supports using non-pairwise
decomposable energy functions during side chain
placement