Title: Sidechain Placement and Protein Design
1. Sidechain Placement and Protein Design
2. Protein design
- Sequence → Structure → Function
- Modify sequence to change structure and function [Looger03]
- or behavior [Ambroggio06]
[Figure: ribose binding protein, with sequence fragments KDTIALVVST, YPVDLKLVVKQ, TNT and the labels "binding" and "folding order"]
3. Protein Design or Redesign
- Create an amino acid sequence that folds to a stable protein and performs a desired function
- Avoid
  - Sampling all sequences
  - Solving protein folding
  - Relying on molecular dynamics
- A successful design strategy: build on an existing structure
  - Scaffold backbone from a known folded structure
  - Redesign 20 residues
  - Find side chains that fit
4. Outline
- Sidechain Rotamers & Rotamer Libraries
- Algorithms for Sidechain Placement
  - Brute Force
  - Dead End Elimination
  - Simulated Annealing
  - Stochastic Mean Field
  - Dynamic Programming
- A Biased View of Protein Structure Design
  - How is design done?
  - Why is it successful?
5. Protein Structure
- Chemical
- 1-Dimensional: sequence of amino acids
- Two components for each amino acid
  - Backbone (N, Cα, C, O)
  - Side chain (residue)
- Placed residue: a position in an amino acid sequence
[Figure: chemical structure drawings labeled MSS and MSW]
6. Side chain geometry
- Conformational flexibility comes from dihedral angles
- Side chain internal geometry
  - Bond angles and bond lengths fixed
  - Dihedrals χ1, χ2, ... may rotate
- Rotamers: rotational isomers
  - Side chains have preferred conformations
  - Prefer dihedrals around 60°, 180° and -60°
- Rotamer library: set of dihedral angles [Ponder87, Dunbrack93, Lovell2000]
7. Side chain conformation
Side chains differ in size (# of atoms) and degrees of freedom (# of χ angles).
[Figure: side chain with dihedral angles χ1 and χ2 marked]
8. Serine χ1 distribution
A chosen combination of side chain torsion angles χ1, χ2, etc. for a residue is known as a rotamer.
9. Side chain conformations: canonical staggered forms
[Figure: Newman projections for χ1 of glutamate, each labeled with the name of its conformation; t = trans, g = gauche]
Side chain angles are defined moving outward from the backbone, starting with the N atom, so the χ1 angle is N-Cα-Cβ-Cγ, the χ2 angle is Cα-Cβ-Cγ-Cδ, ...
IUPAC nomenclature: http://www.chem.qmw.ac.uk/iupac/misc/biop.html
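The definition of χ1 as the N-Cα-Cβ-Cγ dihedral can be made concrete with a small sketch. The function names (`dihedral_degrees`, `nearest_staggered`) and the toy coordinates are illustrative assumptions, not taken from any rotamer library; the sign follows the usual convention in which a clockwise rotation of the far bond, viewed along the central bond, is positive.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def dihedral_degrees(p1, p2, p3, p4):
    """Signed dihedral angle in degrees about the p2-p3 bond.

    For chi1, pass the coordinates of N, CA, CB, CG in that order.
    """
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b3 = _sub(p4, p3)
    n1 = _cross(b1, b2)          # normal of the plane through p1, p2, p3
    n2 = _cross(b2, b3)          # normal of the plane through p2, p3, p4
    b2_len = math.sqrt(_dot(b2, b2))
    y = b2_len * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def nearest_staggered(chi):
    """Return whichever of 60, 180, -60 degrees is closest to chi (circular distance)."""
    def circular_distance(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return min((60.0, 180.0, -60.0), key=lambda c: circular_distance(chi, c))

# Toy coordinates (not a real residue): central bond along z, far atom rotated by 90 degrees.
angle = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Binning a measured χ angle with `nearest_staggered` is one simple way to assign it to a staggered well; real libraries store finer-grained angle values.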
10. Backbone-independent rotamer library
11. What do rotamer libraries provide? [J. Meiler07]
- Rotamer libraries significantly reduce the number of conformations that need to be evaluated during the search.
- This is done with almost no risk of missing the real conformations.
- Even small libraries of about 100-150 rotamers cover about 96-97% of the conformations actually found in protein structures.
- The probabilities of each rotamer in the library provide estimates of the potential energy due to interactions within the side chain and with the local backbone atoms, using the Boltzmann distribution: E ∝ -ln(P)
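As a minimal sketch of the Boltzmann relation, library probabilities can be turned into relative energies via E = -kT ln(P). The function name, the `kT` parameter, and the example populations below are made up for illustration and do not come from any published library.

```python
import math

def rotamer_energies(probabilities, kT=1.0):
    """Convert rotamer probabilities to energies E = -kT * ln(P).

    Energies are shifted so that the most probable rotamer has energy 0;
    only energy differences are meaningful here.
    """
    raw = {name: -kT * math.log(p) for name, p in probabilities.items()}
    baseline = min(raw.values())
    return {name: energy - baseline for name, energy in raw.items()}

# Hypothetical chi1 populations, for illustration only.
example_populations = {"chi1=-60": 0.5, "chi1=180": 0.3, "chi1=60": 0.2}
example_energies = rotamer_energies(example_populations)
```

A rotamer that is 2.5 times less populated than the most common one ends up ln(2.5) kT higher in energy under this model.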
12. Side chain geometry
- Conformational flexibility comes from dihedral angles
- Side chain internal geometry
  - Bond angles and bond lengths fixed
  - Dihedrals χ1, χ2, ... may rotate
- Rotamers: rotational isomers
  - Side chains have preferred conformations
  - Prefer dihedrals around 60°, 180° and -60°
- Rotamer library: set of dihedral angles [Ponder87, Dunbrack93, Lovell2000]
  - http://dunbrack.fccc.edu/bbdep/bbdepdownload.php (backbone-dependent and backbone-independent libraries)
  - http://kinemage.biochem.duke.edu/databases/rotamer.html (backbone-independent library)
13. Rotamers in crystallographic refinement
Fit structure to electron density from X-ray diffraction
- Red indicates clashes with added hydrogen atoms
- Figure label: better choice of side chain
14. Outline
- Sidechain Rotamers & Rotamer Libraries
- Algorithms for Sidechain Placement
  - Brute Force Search
  - Dead End Elimination
  - Simulated Annealing
  - Stochastic Mean Field
  - Dynamic Programming
15. Side Chain Placement Problem
- Given
  - A fixed protein backbone
  - A set of fixed (background) residues
  - A set of changing (molten) residues
  - A list of allowed amino acids for each molten residue
  - A rotamer library
  - A pairwise decomposable energy function
- Find the assignment of rotamers to the molten residues, S, that minimizes the energy function
[Figure: Kinemage rotamers for ubiquitin surface residues]
16. Energy Functions
- f: Protein Structure → ℝ
  - Lennard-Jones
    - van der Waals attractive energies
    - repulsive energies from atom overlap
  - Electrostatics
  - Solvent Effects
  - Hydrogen bonds
- Often pairwise decomposable
  - sum of atom-pair or rotamer-pair interaction energies
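To illustrate what "pairwise decomposable" means, here is a minimal sketch using the textbook 12-6 Lennard-Jones form summed over atom pairs. The parameter values (`epsilon`, `sigma`) are placeholders, and real design force fields use per-atom-type parameters, distance cutoffs, and additional terms.

```python
import math
from itertools import combinations

def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy: repulsive at short range, attractive further out."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def pairwise_energy(coords, epsilon=1.0, sigma=1.0):
    """Total energy as a sum over all atom pairs (the pairwise decomposable form)."""
    total = 0.0
    for a, b in combinations(coords, 2):
        total += lennard_jones(math.dist(a, b), epsilon, sigma)
    return total

# Two atoms placed at the energy minimum r = 2**(1/6) * sigma (toy coordinates).
r_min = 2.0 ** (1.0 / 6.0)
example = pairwise_energy([(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)])
```

Because the total is a sum over pairs, the contribution of any pair of rotamers can be tabulated once and reused across every candidate assignment.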
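17. Side Chain Placement Problem
- Find the assignment of rotamers to the molten residues, S, that minimizes the energy function
  $E(S) = \sum_{i} E_{single}(s_i) + \sum_{i<j} E_{pair}(s_i, s_j)$
  where $s_i$ is the rotamer that S assigns to molten residue i
- Functions stated in terms of rotamer energies
  - $E_{single}$: rotamer / background energy
  - $E_{pair}$: rotamer pair energies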
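A brute-force search over rotamer assignments can be sketched directly from the single and pair energies. The data layout is an assumption made for illustration: `e_single[i][r]` holds the rotamer/background energy of rotamer r at molten residue i, and `e_pair[(i, j)][(r, s)]` (with i < j) holds the pair energy, with missing entries treated as zero. The numbers are invented.

```python
from itertools import product

def total_energy(assignment, e_single, e_pair):
    """Sum of single-rotamer energies plus all rotamer-pair energies."""
    energy = sum(e_single[i][r] for i, r in enumerate(assignment))
    for i in range(len(assignment)):
        for j in range(i + 1, len(assignment)):
            key = (assignment[i], assignment[j])
            energy += e_pair.get((i, j), {}).get(key, 0.0)
    return energy

def brute_force(e_single, e_pair):
    """Enumerate every assignment; the cost grows as the product of rotamer counts."""
    best = None
    for assignment in product(*(range(len(es)) for es in e_single)):
        energy = total_energy(assignment, e_single, e_pair)
        if best is None or energy < best[0]:
            best = (energy, assignment)
    return best

# Three molten residues with two rotamers each (made-up energies).
e_single = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
e_pair = {(0, 1): {(0, 1): 5.0}, (1, 2): {(0, 0): -1.0}}
best_energy, best_assignment = brute_force(e_single, e_pair)
```

This is only practical for tiny instances, which is why the elimination and stochastic methods on the following slides matter.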
18. Side Chain Placement Problem
- NP-Complete
  - Reduction from SAT [Pierce2002]
- Techniques
  - Optimality Guarantee
    - Dead-End Elimination [Desmet92, Goldstein94, Looger2001]
    - Integer Linear Programming [Erickson2001]
    - Branch and Bound [Gordon99, Canutescu2003]
    - Dynamic Programming [Leaver-Fay2005]
  - No Optimality Guarantee
    - Genetic Algorithms [Jones94]
    - Simulated Annealing [Holm92, Hellinga94, Kuhlman03]
    - Self-Consistent Mean Field [Koehl96]
19. Dead End Elimination (DEE)
- Reduce the search space without losing the Global Minimum Energy Conformation (GMEC).
- Eliminates rotamers which cannot be in the GMEC, using more accurate (and more computationally expensive) upper and lower bounds.
- Uses brute force search on rotamers remaining.
- Typically assumes that the scoring function can be expressed as a sum of pairwise interactions
20. A first, simple condition for elimination
- A rotamer can be eliminated for a residue when the minimum (best) energy it obtains by interaction with other rotamers is still higher (worse) than the maximum (worst) energy of some other rotamer
21. The Goldstein improvement
- A rotamer can be safely eliminated when there exists a rotamer that has lower (better) energy for each given environment.
- This criterion is more powerful, though it typically requires more computation time.
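A sketch of the Goldstein-style test, under the assumption that energies are stored as `e_single[i][r]` and `e_pair[(i, j)][(r, s)]` with i < j (a hypothetical layout): rotamer r at residue i is dropped if some alternative t satisfies E(r) - E(t) + Σ_j min_s [E(r, s) - E(t, s)] > 0, meaning t beats r in every environment.

```python
def goldstein_dee(e_single, e_pair):
    """Eliminate rotamers that some alternative beats in every environment."""
    n = len(e_single)
    alive = [set(range(len(es))) for es in e_single]

    def pair(i, r, j, s):
        if i < j:
            return e_pair.get((i, j), {}).get((r, s), 0.0)
        return e_pair.get((j, i), {}).get((s, r), 0.0)

    def dominated(i, r, t):
        # Smallest possible advantage of t over r across all neighbor states.
        gap = e_single[i][r] - e_single[i][t]
        for j in range(n):
            if j != i:
                gap += min(pair(i, r, j, s) - pair(i, t, j, s) for s in alive[j])
        return gap > 0.0

    changed = True
    while changed:
        changed = False
        for i in range(n):
            for r in sorted(alive[i]):
                if any(dominated(i, r, t) for t in alive[i] if t != r):
                    alive[i].discard(r)
                    changed = True
    return alive

# Made-up instance: rotamer 1 at residue 0 is always 1.5 worse than rotamer 0,
# but the min/max bounds of the simple condition overlap because of residue 1.
e_single = [[0.0, 0.5], [0.0, 0.0]]
e_pair = {(0, 1): {(0, 0): 0.0, (0, 1): 10.0, (1, 0): 1.0, (1, 1): 11.0}}
survivors = goldstein_dee(e_single, e_pair)
```

Comparing the two rotamers environment by environment, instead of comparing a best case against a worst case, is what makes this test stronger than the simple condition.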
22. Even more powerful criteria can be obtained with even more computation
- A rotamer can be safely eliminated when, for each environment, there exists some rotamer that has lower (better) energy.
23. Dynamic Programming via an Interaction Graph
- Surface residues on ubiquitin's β-sheet
- Interaction graph defined by Rosetta's energy function
24. Interaction Graph
- G = (V, E), a multi-hypergraph
  - vertices: molten residues, v
  - state space: rotamers for a residue, S(v)
  - edge: possibility of residue interaction, e ⊆ V
  - scoring function: interaction energy, $f_e : \prod_{v \in e} S(v) \to \mathbb{R}$
[Figure: a hypergraph and a graph]
25. Interaction Graph Evaluation (Pairwise case)
- For G = (V, E):
  - Each vertex, v, has a function to capture interactions with the background: $f_v : S(v) \to \mathbb{R}$
  - Each pair of interacting vertices, {u, v}, defines an edge with a function to capture pair interactions: $f_{u,v} : S(u) \times S(v) \to \mathbb{R}$
- Given an interaction graph, G = (V, E), find the state assignment S that minimizes $\sum_{w \in V \cup E} f_w$
26. Bottom Up Dynamic Programming
- Eliminate node v
  - Let $E_v$ be the edges incident upon v
  - Let $N_v$ be the neighbors of v
  - For each edge $e \in E_v$ with scoring function $f_e$, let $f_{e,v=s}$ be edge e's scoring function with vertex v fixed in state s
  - Create a new hyperedge incident upon $N_v$
  - Compute $f_{N_v} = \min_{s \in S(v)} \sum_{e \in E_v} f_{e,v=s}$
  - Remove v from graph
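The elimination step can be sketched as follows. This is an illustrative assumption about data layout, not the implementation behind the slides: every scoring function, including vertex functions, is a `(scope, table)` pair where `scope` is a tuple of vertex ids and `table` maps a tuple of states (one per scope vertex) to an energy, and every table is fully populated.

```python
from itertools import product

def eliminate_all(states, factors, order):
    """Minimize the sum of all scoring functions by eliminating vertices in order.

    states:  {vertex: number of states}
    factors: list of (scope, table) pairs with fully populated tables
    order:   elimination order covering every vertex
    """
    factors = list(factors)
    traces = []  # (vertex, neighbor scope, best state of vertex per neighbor assignment)
    for v in order:
        incident = [f for f in factors if v in f[0]]
        factors = [f for f in factors if v not in f[0]]
        nbrs = tuple(sorted({u for scope, _ in incident for u in scope} - {v}))
        new_table, argmin = {}, {}
        for nbr_states in product(*(range(states[u]) for u in nbrs)):
            fixed = dict(zip(nbrs, nbr_states))
            best = None
            for s in range(states[v]):
                fixed[v] = s
                e = sum(table[tuple(fixed[u] for u in scope)]
                        for scope, table in incident)
                if best is None or e < best[0]:
                    best = (e, s)
            new_table[nbr_states], argmin[nbr_states] = best
        factors.append((nbrs, new_table))  # the new hyperedge over the neighbors
        traces.append((v, nbrs, argmin))
    energy = sum(table[()] for _, table in factors)
    assignment = {}
    for v, nbrs, argmin in reversed(traces):  # recover the minimizing states
        assignment[v] = argmin[tuple(assignment[u] for u in nbrs)]
    return energy, assignment

# Made-up chain 0 - 1 - 2 with two states per vertex.
states = {0: 2, 1: 2, 2: 2}
factors = [
    ((0,), {(0,): 0.0, (1,): 1.0}),
    ((1,), {(0,): 0.5, (1,): 0.0}),
    ((2,), {(0,): 0.0, (1,): 0.2}),
    ((0, 1), {(0, 0): 0.0, (0, 1): 5.0, (1, 0): 0.0, (1, 1): 0.0}),
    ((1, 2), {(0, 0): -1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}),
]
energy, assignment = eliminate_all(states, factors, order=[0, 1, 2])
```

The table built for each new hyperedge has one entry per joint state of the neighbors, so the choice of elimination order determines how large the intermediate tables become.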
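27. Scoring Function Representation: Tables
[Figure: for an edge e = {u, v}, the scoring function is stored as a two-dimensional table indexed by S(u) and S(v); state labels a-e and f-j]
28. Scoring Function Representation: Tables
[Figure: for a hyperedge e = {u, v, w}, the table gains a third dimension for w]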
29. Experiments and Results
- Rotamer Relaxation Task
  - Sequence fixed; choose new rotamers for each residue
- Redesign Task
  - Search of conformation and sequence spaces
- Ubiquitin's 15 surface residues
- Large rotamer library
  - Relaxation: 32 states per vertex, tw = 4 interaction graph
  - Redesign: 680 states per vertex, tw = 3 interaction graph (drop one edge)

Task        Running Time  Memory
Relaxation  200 ms        (small)
Redesign    15.99 hrs     3.7 GB
30. Dynamic Programming for Hydrogen Placement
- Dynamic programming (DP) limited by treewidth of graph instances
  - Treewidths from graphs in protein design too large for DP to be practical
- Adding hydrogen atoms to PDB structures
  - Hydrogen placement via combinatorial optimization: REDUCE [Word99]
  - Non-pairwise decomposable energy function
  - Previously used brute force
  - Replaced with dynamic programming
  - Interaction graphs have low treewidth
  - Effective in practice: minutes to ms
- REDUCE v3.02 in MolProbity suite, and distributed from http://kinemage.biochem.duke.edu/software/reduce.php
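Since the cost of dynamic programming is governed by treewidth, a quick way to gauge whether an interaction graph is tractable is to compute an upper bound on its treewidth from a greedy elimination ordering. The min-degree heuristic below is a common generic choice, swapped in here for illustration; it is not claimed to be what REDUCE or the design experiments use, and it gives only an upper bound.

```python
def min_degree_width(adjacency):
    """Upper bound on treewidth via greedy min-degree elimination.

    adjacency: {vertex: iterable of neighbors}, assumed symmetric.
    Returns (width, elimination order).
    """
    graph = {v: set(nbrs) for v, nbrs in adjacency.items()}
    order, width = [], 0
    while graph:
        # Pick the vertex with the fewest remaining neighbors (ties broken by id).
        v = min(graph, key=lambda u: (len(graph[u]), u))
        nbrs = graph.pop(v)
        width = max(width, len(nbrs))
        for a in nbrs:
            graph[a].discard(v)
            graph[a] |= nbrs - {a}  # neighbors of v become mutually connected
        order.append(v)
    return width, order

# A 4-cycle has treewidth 2; a path has treewidth 1.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
width, order = min_degree_width(cycle)
```

If eliminating a vertex with w remaining neighbors scans every joint state of the vertex and its neighbors, the work per elimination grows roughly as s^(w+1) for s states per vertex, which is why graphs with 680 states per vertex become expensive even at small treewidth.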
31. Simulated Annealing
- Stochastic optimization technique
- Monte Carlo
  - Make a random change, determine ΔE
  - Metropolis criterion [Metropolis57]
    - accept with probability $\min(1, e^{-\Delta E / kT})$
  - Gradually lower temperature T
- In Side Chain Placement
  - Assign each residue a rotamer
  - Repeat
    - Select a random residue, and a random alternate rotamer
    - Find ΔE induced by substituting the alternate rotamer
    - Accept/Reject substitution according to Metropolis criterion
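The loop above can be sketched in a few lines. The energy layout (`e_single[i][r]`, `e_pair[(i, j)][(r, s)]` with i < j), the geometric cooling schedule, and the default parameter values are all assumptions chosen for illustration; temperature is expressed in energy units, so k is folded into T.

```python
import math
import random

def anneal(e_single, e_pair, steps=5000, t_start=5.0, t_end=0.05, seed=0):
    """Simulated annealing over rotamer assignments; returns (energy, assignment)."""
    rng = random.Random(seed)
    n = len(e_single)

    def pair(i, r, j, s):
        if i < j:
            return e_pair.get((i, j), {}).get((r, s), 0.0)
        return e_pair.get((j, i), {}).get((s, r), 0.0)

    def total(state):
        e = sum(e_single[i][r] for i, r in enumerate(state))
        return e + sum(pair(i, state[i], j, state[j])
                       for i in range(n) for j in range(i + 1, n))

    movable = [i for i in range(n) if len(e_single[i]) > 1]
    state = [rng.randrange(len(es)) for es in e_single]
    if not movable:
        return total(state), state
    energy = total(state)
    best, best_energy = list(state), energy
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        i = rng.choice(movable)
        old = state[i]
        new = rng.randrange(len(e_single[i]) - 1)
        if new >= old:
            new += 1  # guarantees an alternate rotamer
        d_e = e_single[i][new] - e_single[i][old] + sum(
            pair(i, new, j, state[j]) - pair(i, old, j, state[j])
            for j in range(n) if j != i)
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if d_e <= 0.0 or rng.random() < math.exp(-d_e / temperature):
            state[i] = new
            energy += d_e
            if energy < best_energy:
                best, best_energy = list(state), energy
    return total(best), best

# Made-up energies: three residues, two rotamers each.
e_single = [[0.0, 1.0], [0.5, 0.0], [0.0, 0.2]]
e_pair = {(0, 1): {(0, 1): 5.0}, (1, 2): {(0, 0): -1.0}}
final_energy, final_state = anneal(e_single, e_pair)
```

Only the energy change of the substituted residue is computed at each step, which is what makes each move cheap; unlike dead-end elimination, the result carries no optimality guarantee.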
32. Self-consistent mean field
- I planned to cull a description from Patrice's BioEbook sections
  - http://nook.cs.ucdavis.edu:8080/koehl/BioEbook/design_scmf.html
  - http://nook.cs.ucdavis.edu:8080/koehl/BioEbook/scmf.html
- but didn't have time in class.
33. The practical problem of side chain modeling [M07]
- The way we deal today with the problem of protein structure prediction is very different from the way nature deals with it.
- Due to technical issues such as computation time we are usually forced to accept a fixed backbone and only then put the side chains on it.
- The quality of the side chain modeling is therefore heavily dependent on the position of the backbone. If the initial backbone conformation is wrong, the side chain modeling quality will be accordingly bad.
- What is really needed is a combined algorithm that optimizes backbone conformation simultaneously with side chain modeling.
34. Protein Design or Redesign
- Create an amino acid sequence that folds to a stable protein and performs a desired function
- Avoid
  - Sampling all sequences
  - Solving protein folding
  - Relying on molecular dynamics
- A successful design strategy: build on an existing structure
  - Scaffold backbone from a known folded structure
  - Redesign 20 residues
  - Find side chains that fit
35. Why Design Proteins?
- Nature uses proteins
  - to signal events
  - to catalyze reactions
  - to move cells (motors)
  - to bear weight (I-beams)
- Design is an experiment to help understand folding/binding
- Industrial biosynthesis
  - Proteins are both efficient and specific
- Cure disease
  - Antibodies
  - Inhibition peptides as drugs
  - Perturb cell signaling pathways
36. Why do RosettaDesign, Dezymer, ... work?
- Geometric approximations (3D jigsaw puzzles) are surprisingly effective in design.
- They mine PDB structures for behaviors of native proteins and fragments.
- They precompute energies for pairwise interactions.
- They use many fast computers to allow detailed sampling of discrete conformations.
- Fast optimization algorithms
- Competition
37. How do RosettaDesign, Dezymer, ... fail?
- Computationally difficult to achieve good packing and hydrogen bond satisfaction in protein core
- Scores for packing, solvation and hydrogen bond satisfaction cannot be pairwise additive.
- Scores are often used as filters where we'd prefer to optimize them.
- Stability of designed proteins
- Multistate or negative design
38. Protein Stability
- A naturally occurring protein adopts a compact geometry when placed in water
- Stability is the difference in free energies of the folded and unfolded states
39. Protein Stability
- A naturally occurring protein adopts a compact geometry when placed in water
- Different proteins have different free energies in their unfolded states
40. Challenges in Protein Design
- Side chain placement is hard
  - The complexities of individual instances of the side chain placement problem (SCPP) are related to the treewidth of their interaction graphs.
- Tight, collision-free packing is often impossible on the input scaffold
  - The interaction graph could allow simultaneous optimization of side chain and backbone structures
- Protein stability is not well captured by pairwise decomposable energy functions
  - The interaction graph supports using non-pairwise decomposable energy functions during side chain placement