Title: Bioinformatics I
 1Swiss Institute of Bioinformatics
Bioinformatics I: Ab initio Protein Structure 
Modeling & Fold Recognition
14.1.2003 
Torsten.Schwede_at_unibas.ch 
 2Growth of the Protein Data Bank (PDB)
08 January 2003: 19691 structures
 PDB: http://www.pdb.org
 3Public Database Holdings
- No experimental structure for most sequences
4In the near future, no experimental structure will be 
available for most of the known protein sequences.
Can we predict protein structures from genome 
sequences? 
 5gattccagag atggacgctt ttgctcttat tcctcgtact 
cagtggcaat atgtgatggg tccttcactt taccgaataa 
tgaacaacct cttttaattt tataaatacc 
ttctataaat acttaggagg tattatgaat atatttgaaa 
tgttacgtat agatgaacgt cttagactta aaatctataa 
agacacagaa ggctattaca ctattggcat cggtcatttg 
cttacaaaaa gtccatcact taatgctgct aaatctgaat 
tagataaagc tattgggcgt aattgcaatg gtgtaattac 
aaaagatgag gctgaaaaac tctttaatca ggatgttgat 
gctgctgttc gcggaattct gagaaatgct aaattaaaac 
cggtttatga ttctcttgat gcggttcgtc gctgtgcatt 
gattaatatg gttttccaaa tgggagaaac cggtgtggca 
ggatttacta actctttacg tatgcttcaa caaaaacgct 
gggatgaagc agcagttaac ttagctaaaa gtatatggta 
taatcaaaca cctaatcgcg caaaacgagt cattacaacg 
tttagaactg
 Gene prediction
MNIFEMLRID EGLRLKIYKD TEGYYTIGIG HLLTKSPSLN 
AAKSELDKAI GRNCNGVITK DEAEKLFNQD VDAAVRGILR 
NAKLKPVYDS LDAVRRCALI NMVFQMGETG 
VAGFTNSLRM LQQKRWDEAA VNLAKSRWYN QTPNRAKRVI 
TTFRTGTWDA YKNL
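
Once a coding region has been located, turning it into a protein sequence is a table lookup. A minimal sketch, assuming the standard genetic code and a deliberately naive rule (start at the first ATG, stop at the first in-frame stop codon); the function name `translate_orf` is hypothetical, and real gene prediction does considerably more than this:

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_orf(dna: str) -> str:
    """Translate from the first ATG up to the first in-frame stop codon."""
    seq = "".join(dna.split()).upper()  # drop the blanks between 10-mers
    start = seq.find("ATG")
    if start < 0:
        return ""
    protein = []
    for i in range(start, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # X for ambiguous codons
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)
```

Applied to the stretch of the slide's DNA that begins with `atgaat atatttgaaa`, this reproduces the first residues of the protein shown (MNIFEMLRID). The first ATG in a genomic fragment is not necessarily the true start codon, which is one reason gene prediction is a problem of its own.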
Can we predict protein structures from protein 
sequences? 
 6-  Many proteins fold spontaneously to their native 
 structure
-  Protein folding is relatively fast 
-  Chaperones speed up folding, but do not alter 
 the structure
The protein sequence contains all information 
needed to create a correctly folded protein. 
 7Empirical Force Fields and Molecular Mechanics
-  describe interaction of atoms or groups 
-  the parameters are empirical, i.e. they depend 
 on one another and have no direct intrinsic
 meaning on their own
- Examples: GROMOS96 (van Gunsteren), CHARMM (M. 
 Karplus), AMBER (Kollman)
8- Bond stretching 
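- $E_{\mathrm{bond}}(l) = \frac{k}{2}\,(l - l_0)^2$
- Approximation of the Morse potential by an 
 elastic spring model
- Hooke's law is a reasonable approximation close to 
 the reference bond length $l_0$
$k$: force constant; $l$: bond length (distance); $l_0$: reference bond length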
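
How good the spring approximation is can be checked numerically. A minimal sketch, assuming a Morse potential with well depth `depth` and width parameter `a` (both hypothetical illustration values, not taken from any force field); the harmonic force constant is chosen to match the curvature of the Morse potential at the reference bond length:

```python
import math


def harmonic_bond_energy(l: float, l0: float, k: float) -> float:
    """Hooke's law: E = k/2 * (l - l0)^2."""
    return 0.5 * k * (l - l0) ** 2


def morse_bond_energy(l: float, l0: float, depth: float, a: float) -> float:
    """Morse potential: E = D * (1 - exp(-a * (l - l0)))^2."""
    return depth * (1.0 - math.exp(-a * (l - l0))) ** 2


def matching_force_constant(depth: float, a: float) -> float:
    """Second derivative of the Morse potential at l0, i.e. k = 2 * D * a^2."""
    return 2.0 * depth * a ** 2
```

With `depth = 100`, `a = 2` and `l0 = 1.0`, the two energies agree to within a few percent at `l = 1.01`, while at `l = 1.5` the harmonic term is more than twice the Morse energy: the spring model cannot describe bond dissociation.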
 9- Angle Bending 
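- $E_{\mathrm{angle}}(\theta) = \frac{k}{2}\,(\theta - \theta_0)^2$
- Deviation of angles from their reference angle 
 $\theta_0$ is often described by Hooke's law
$k$: force constant; $\theta$: bond angle; $\theta_0$: reference angle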
-  Force constants are much smaller than those for 
 bond stretching
10Torsional Terms
- Hypothetical potential function for rotation 
 around a chemical bond
$V(\omega) = \sum_n \frac{V_n}{2}\,[1 + \cos(n\omega - \gamma)]$
$V_n$: barrier height; $n$: multiplicity (e.g. $n = 3$); $\omega$: torsion angle; $\gamma$: phase factor
- Need to include higher terms for non-symmetric 
 bonds (i.e. to distinguish trans, gauche/droit
 conformations)
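
Why additional terms are needed can be seen from a small calculation. A minimal sketch, assuming the usual cosine series with barrier height `v_n`, multiplicity `n`, torsion angle `omega` (radians) and phase `gamma`; the numerical parameters in the note below are hypothetical:

```python
import math


def torsion_energy(omega: float, terms) -> float:
    """Sum of V_n/2 * (1 + cos(n*omega - gamma)) over (v_n, n, gamma) tuples."""
    return sum(
        0.5 * v_n * (1.0 + math.cos(n * omega - gamma))
        for v_n, n, gamma in terms
    )
```

A single threefold term, `[(2.0, 3, 0.0)]`, gives exactly the same energy at 60° (gauche) and 180° (trans). Adding a onefold term, `[(2.0, 3, 0.0), (1.0, 1, 0.0)]`, makes trans lower in energy than gauche.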
11Non-bonded (Van der Waals) interactions 
-  act only at very short distances 
- Attractive interaction by induced dipoles between 
 uncharged atoms: $\propto r^{-6}$
- When atoms come too close, their valence shells 
 start to overlap and repel each other: $\propto r^{-12}$
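
The two contributions are commonly combined in a 12-6 Lennard-Jones potential. The slide does not name this form, so treat it as an assumption; `epsilon` (well depth) and `sigma` (distance at which the energy is zero) are the conventional parameter names:

```python
def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 Lennard-Jones energy: 4*eps * [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)
```

The energy is zero at `r = sigma`, reaches its minimum of `-epsilon` at `r = 2**(1/6) * sigma`, rises steeply below that, and has almost vanished at three times `sigma`, which is why the interaction matters only at short range.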
12Electrostatic interactions 
- Electronegative elements attract electrons more 
 than less electronegative elements
- Unequal charge distribution is expressed by 
 fractional charges
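- Electrostatic interaction is often calculated by 
 Coulomb's law:
 $E_{\mathrm{elec}} = \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}}$
 ($q_i$, $q_j$: fractional charges; $r_{ij}$: distance between the atoms)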
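
In practical units, Coulomb's law for two fractional charges is a one-liner. A minimal sketch, assuming charges in units of the elementary charge, distances in nm and energies in kJ/mol; the optional `eps_r` argument is a uniform relative dielectric constant, which is a simplification:

```python
# e^2 / (4 * pi * eps_0) expressed in kJ mol^-1 nm e^-2
COULOMB_KJ_NM = 138.935458


def coulomb_energy(q_i: float, q_j: float, r_nm: float, eps_r: float = 1.0) -> float:
    """Coulomb energy in kJ/mol between two point charges r_nm apart."""
    return COULOMB_KJ_NM * q_i * q_j / (eps_r * r_nm)
```

Two opposite unit charges 1 nm apart attract with about 139 kJ/mol in vacuum; with `eps_r = 80` the same pair interacts 80 times more weakly.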
 13Electrostatic interactions: Solvent dielectric 
model?
-  use relative dielectric constant: $\varepsilon_0 \varepsilon_r$ 
-  Problem: inhomogeneous permittivity 
-  -> For proteins, we need to solve the 
 Poisson-Boltzmann equation numerically
$\varepsilon_r \approx 80$ (water)
$\varepsilon_r \approx 2$-$4$ (protein interior)
 14Example for a (very) simple Force Field  
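
One common way to write such a simple force field combines the terms from the preceding slides: harmonic bonds and angles, a cosine torsion series, and non-bonded van der Waals and Coulomb terms. This is a sketch of the usual textbook form, not necessarily the exact expression shown on the slide; the 12-6 Lennard-Jones form and its parameters $\varepsilon_{ij}$ and $\sigma_{ij}$ are assumptions:

```latex
\begin{aligned}
V(\mathbf{r}^N) ={}& \sum_{\text{bonds}} \frac{k_i}{2}\,(l_i - l_{i,0})^2
  + \sum_{\text{angles}} \frac{k_i}{2}\,(\theta_i - \theta_{i,0})^2
  + \sum_{\text{torsions}} \frac{V_n}{2}\,\bigl[1 + \cos(n\omega - \gamma)\bigr] \\
 &+ \sum_{i=1}^{N} \sum_{j=i+1}^{N} \left(
    4\varepsilon_{ij}\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12}
      - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]
    + \frac{q_i q_j}{4\pi\varepsilon_0\, r_{ij}} \right)
\end{aligned}
```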
 15Molecular Mechanics - Energy Minimization
- The energy of the system is minimized. The system 
 tries to relax
- Typically, the system relaxes to a local minimum 
 (LM).
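
Relaxation into a local minimum can be illustrated with steepest descent on a one-dimensional double-well function. A minimal sketch; the energy function is a hypothetical toy, not a molecular force field, and production codes use more sophisticated minimizers:

```python
def energy(x: float) -> float:
    """Toy double-well 'energy surface' with one local and one global minimum."""
    return x ** 4 - 3.0 * x ** 2 + x


def gradient(x: float) -> float:
    """Analytical derivative of the toy energy."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def steepest_descent(x: float, step: float = 0.01, tol: float = 1e-8,
                     max_iter: int = 100_000) -> float:
    """Move downhill along the negative gradient until the gradient vanishes."""
    for _ in range(max_iter):
        g = gradient(x)
        if abs(g) < tol:
            break
        x -= step * g
    return x
```

Starting at `x = 2.0` the minimizer ends near `x = 1.13`, the local minimum; starting at `x = -2.0` it ends near `x = -1.30`, the deeper global minimum. Which minimum is found depends only on the starting point, since minimization never climbs over the barrier in between.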
16Molecular Dynamics (MD)
In molecular dynamics, energy is supplied to the 
system, typically using a constant temperature 
(i.e. constant average kinetic energy).  
 17Molecular Dynamics (MD)
- Use Newtonian mechanics to calculate the net 
 force and acceleration experienced by each atom.
- Each atom i is treated as a point with mass mi 
 and fixed charge qi
- Determine the force Fi on each atom
- Use positions and accelerations at time t (and 
 positions from t - Δt) to calculate new
 positions at time t + Δt
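
The update rule in the last bullet is the Verlet algorithm: r(t + Δt) = 2 r(t) - r(t - Δt) + a(t) Δt². A minimal one-dimensional sketch, assuming unit mass so that force and acceleration coincide; the function name and the way the first "previous" position is bootstrapped from the initial velocity are illustrative choices:

```python
def verlet_trajectory(x0, v0, accel, dt, n_steps):
    """Integrate one coordinate with the position Verlet scheme.

    accel(x) returns the acceleration (force / mass) at position x.
    """
    # Position at t - dt from a backward Taylor expansion.
    x_prev = x0 - v0 * dt + 0.5 * accel(x0) * dt ** 2
    x = x0
    trajectory = [x0]
    for _ in range(n_steps):
        x_next = 2.0 * x - x_prev + accel(x) * dt ** 2
        x_prev, x = x, x_next
        trajectory.append(x)
    return trajectory
```

For a harmonic oscillator, `verlet_trajectory(1.0, 0.0, lambda x: -x, 0.01, 628)` returns to almost exactly the starting position after one period, and the amplitude stays bounded throughout.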
18Implicit Solvent Models
- Water molecules are not included as molecules, 
 but represented by an extra potential on the
 solvent accessible surface.
-  Advantages: 
-  only 50% slower than vacuum calculations 
-  10 times faster than explicit water MD 
-  Disadvantages: 
-  Does it really represent water? -> heavy discussions 
- Example: SASA model (CHARMM) 
19Explicit Solvent Models
- Water molecules are explicitly included as 
 individual molecules.
-  Force Fields for water molecules are not trivial 
 ...
-  Computationally expensive ...
20Periodic Boundary Conditions (PBC)
- Periodic boundary conditions are used to simulate 
 solvated systems or crystals.
- In solvated systems, PBC prevent the solvent 
 from "evaporating in silico"
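
In code, periodic boundaries come down to two operations: wrapping coordinates back into the box, and measuring distances to the nearest periodic image. A minimal sketch for one axis of a cubic box of edge length `box` (the cubic box and the function names are assumptions):

```python
def wrap_coordinate(x: float, box: float) -> float:
    """Map a coordinate back into the primary box [0, box)."""
    return x % box


def minimum_image(dx: float, box: float) -> float:
    """Shortest separation along one axis, using the nearest periodic image."""
    return dx - box * round(dx / box)
```

A particle leaving the box at `x = 10.5` in a box of length 10 re-enters at `x = 0.5`, so nothing is lost; two particles 9 apart along an axis are really only 1 apart through the boundary.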
21Typical Time Scales ....
- Bond stretching: 10^-14 to 10^-13 s 
- Elastic vibrations: 10^-12 to 10^-11 s 
- Rotations of surface side chains: 10^-11 to 10^-10 s
- Hinge bending: 10^-11 to 10^-7 s 
- Rotation of buried side chains: 10^-4 to 1 s 
- Protein folding: 10^-6 to 10^2 s 
- Timescale in MD: 
- A typical timestep in MD is 1 fs (10^-15 s), 
 ideally 1/10 of the period of the
 highest-frequency vibration
22Ab initio protein folding simulation
-> Blue Gene will need 3 years to simulate 100 µs. 
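
The gap between timestep and folding time is simple arithmetic. A minimal sketch, assuming a fixed timestep of 1 fs (the function name is hypothetical):

```python
def md_steps(simulated_time_s: float, timestep_s: float = 1e-15) -> int:
    """Number of integration steps needed to cover simulated_time_s."""
    return round(simulated_time_s / timestep_s)
```

One microsecond takes 10^9 steps, and 100 µs takes 10^11 steps, each requiring a full force evaluation over all atoms.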
 23Want to fold some proteins at home? 
 24Want to fold some proteins at home?
-  Simulations of the villin headpiece 
-  Folding time is on the order of 10 microseconds 
-  Hundreds of microseconds of MD time simulated 
For the villin movie, please see
 http://folding.stanford.edu/villin/
 25Can we predict protein structures ?
-  ab initio folding simulation not yet ... 
-  ???
26Rosetta Stone Approach 
 27Rosetta Stone Approach (David Baker)
1. Find sequence patterns that strongly correlate 
with protein structure at the local level to 
create a library of fragments (I-sites). 
E.g. amphipathic helix (figure: amino acid statistics by helix position)
 28Rosetta Stone Approach (David Baker)
2. Model building for a new sequence:
- Search for compatible fragments (reduced alphabet)
- Use Monte Carlo simulated annealing to assemble 
 overlapping fragments
- Scoring functions are used to select the best 
 models (1000)
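
The Monte Carlo simulated annealing step can be sketched generically. This is a toy illustration of the Metropolis acceptance rule with a falling temperature, not the actual Rosetta protocol: the state is a tuple of fragment choices, and `TARGET`, `toy_energy` and `toy_propose` are hypothetical stand-ins for a real fragment library and scoring function:

```python
import math
import random


def metropolis_accept(delta_e: float, temperature: float, u: float) -> bool:
    """Accept downhill moves always, uphill moves with probability exp(-dE/T)."""
    return delta_e <= 0.0 or u < math.exp(-delta_e / temperature)


def anneal(initial, energy, propose, t_start=2.0, t_end=0.05,
           n_steps=2000, seed=0):
    """Simulated annealing with a geometric cooling schedule."""
    rng = random.Random(seed)
    state, e = initial, energy(initial)
    best, best_e = state, e
    for step in range(n_steps):
        t = t_start * (t_end / t_start) ** (step / max(n_steps - 1, 1))
        candidate = propose(state, rng)
        cand_e = energy(candidate)
        if metropolis_accept(cand_e - e, t, rng.random()):
            state, e = candidate, cand_e
            if e < best_e:
                best, best_e = state, e
    return best, best_e


# Hypothetical toy problem: pick one of three fragments at each of 8 positions.
TARGET = (0, 2, 1, 1, 0, 2, 2, 1)


def toy_energy(state) -> float:
    """Score = number of positions whose fragment differs from TARGET."""
    return float(sum(a != b for a, b in zip(state, TARGET)))


def toy_propose(state, rng):
    """Replace the fragment at one random position by a random choice."""
    i = rng.randrange(len(state))
    new = list(state)
    new[i] = rng.randrange(3)
    return tuple(new)
```

Calling `anneal((0,) * 8, toy_energy, toy_propose)` returns the lowest-scoring assembly seen during the run. Accepting some uphill moves at high temperature is what lets the search escape local minima that plain minimization would be stuck in.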
29Rosetta Stone Approach 
- Generates thousands of models 
- Best models in CASP4: 6-10 Å Cα rmsd 
- Difficult to distinguish good from bad models 
http://isites.bio.rpi.edu/index.html
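
The Cα rmsd quoted above is the root-mean-square distance between corresponding Cα atoms of model and experimental structure. A minimal sketch, assuming both coordinate lists are in the same residue order and have already been superimposed (the optimal superposition step is not shown); the function name is hypothetical:

```python
import math


def ca_rmsd(coords_a, coords_b) -> float:
    """RMSD in the input units between two equally long lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```

Two structures whose atoms are each displaced by 5 Å have an rmsd of 5 Å, while identical structures give 0.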
 30Can we predict protein structures ?
-  ab initio folding simulation not yet ... 
-  Rosetta approach neither ... 
-  ???
31Growth of the Protein Data Bank (PDB)
08 January 2003: 19691 structures
 PDB: http://www.pdb.org
 32Protein Structure Databases
- Worldwide repository for the processing and 
 distribution of 3-D biological macromolecular
 structure data
- http://www.pdb.org 
- Protein structures solved experimentally (X-Ray 
 or NMR)
- Provides 
-  Coordinates (sometimes structure factors, NOEs) 
-  Images 
-  Links to derived data, e.g. similar structures, 
 fold families, etc.
33The number of different protein folds is limited
Seen this before ...
New Folds 
 34The number of different protein folds is limited
 last update Oct 2001  
 35Protein Structure Databases
CATH - Protein Structure Classification
- hierarchical classification of protein domain 
 structures
- UCL, Janet Thornton & Christine Orengo 
- clusters proteins at four major levels: 
- Class (C) 
- Architecture (A) 
- Topology (T) 
- Homologous superfamily (H)
 http://www.biochem.ucl.ac.uk/bsm/cath_new/
 36- Class (C): derived from secondary structure content; 
 assigned automatically
- Architecture (A): describes the gross orientation 
 of secondary structures, independent of
 connectivity
- Topology (T): clusters structures according to 
 their topological connections and numbers of
 secondary structures
 http://www.biochem.ucl.ac.uk/bsm/cath_new/
 39Protein Structure Databases
SCOP - Structural Classification of Proteins 
- MRC Cambridge (UK), Alexey Murzin, Brenner S. E., 
 Hubbard T., Chothia C.
- hierarchical classification of protein domain 
 structures
- created by manual inspection 
- comprehensive description of the structural and 
 evolutionary relationships
- organized as a tree structure 
- Class 
- Fold 
- Superfamily 
- Family 
- Species
 http://scop.mrc-lmb.cam.ac.uk/scop/