Title: Homology Modelling Course
1. Homology Modelling Course
Dr Marcus Durrant
Computational Biology Group, Genome Centre Room 101b, John Innes Centre
marcus.durrant_at_bbsrc.ac.uk
2. Premise: every protein has a single structure, defined by its primary amino acid sequence.
Why?
All chemical structures at equilibrium are determined by thermodynamics:
G = H - TS
where G = free energy, H = enthalpy, T = temperature, S = entropy.
At equilibrium, the value of G is as small as it can be.
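As a minimal worked illustration of G = H - TS, the sketch below compares two hypothetical states of the same molecule. The enthalpy and entropy values are invented purely for illustration; the point is only that the state with the lower G is the one favoured at equilibrium, and that the favoured state can change with temperature.

```python
def free_energy(h, s, t):
    """Return G = H - T*S (h in kJ/mol, s in kJ/(mol K), t in K)."""
    return h - t * s

# Hypothetical values, chosen only to show the enthalpy/entropy trade-off.
states = {
    "A": {"h": -100.0, "s": 0.10},  # lower enthalpy, lower entropy
    "B": {"h": -80.0, "s": 0.20},   # higher enthalpy, higher entropy
}

def favoured_state(states, t):
    """Name of the state with the smallest G at temperature t."""
    return min(
        states,
        key=lambda name: free_energy(states[name]["h"], states[name]["s"], t),
    )

for t in (150.0, 300.0):
    print(t, favoured_state(states, t))
```

With these made-up numbers the two states have equal G at 200 K, so state A is favoured below that temperature and state B above it.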
3. Can we calculate G from first principles?
- YES, but at present only for small molecules...
- "The underlying physical laws necessary for the mathematical theory of a large part of physics and the whole of chemistry are thus completely known, and the difficulty is only that the exact application of these laws leads to equations much too complicated to be soluble." - P.A.M. Dirac, 1929
- In practice, we can approximate the enthalpy term H quite well by molecular mechanics.
- However,
  - proteins have very many possible conformations.
  - the entropy term is very hard to calculate.
- Hence, we need homology modelling.
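To give a feel for "very many possible conformations", here is a back-of-envelope count in the spirit of Levinthal's argument. The figure of three backbone states per residue and the sampling rate are illustrative assumptions, not values from the course.

```python
def conformation_count(n_residues, states_per_residue=3):
    """Number of backbone conformations if every residue independently
    adopts one of a few discrete states (a deliberately crude assumption)."""
    return states_per_residue ** n_residues

n_conformations = conformation_count(100)   # a 100-residue protein
sampling_rate = 1e12                        # conformations per second (assumed)
years = n_conformations / sampling_rate / (3600 * 24 * 365)

print(f"{n_conformations:.2e} conformations")
print(f"about {years:.1e} years to enumerate them all")
```

Even under these generous assumptions, exhaustive enumeration is out of reach, which is why a template-based starting point is so valuable.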
4. It has been predicted that the total number of protein folds will be about 1,000 (C. Chothia, Nature 357, 543-4, 1992).
At present, about 700 folds have been characterised (SCOP database).
Note, however, that some folds are much less well represented, e.g. membrane surface proteins (12 known so far).
Structure is much more highly conserved than sequence.
In homology modelling, we exploit this principle to build an initial model, which is then refined.
5. Principles of Protein Structure
Normally, only 20 amino acids are used in proteins.
[Figure labels: sidechain, N-terminus, C-terminus, backbone]
6. Protein Structural Elements
The principal protein structural elements are formed by hydrogen bonding between the backbone amide groups.
The local pattern of hydrogen bonding is determined by the sidechains.
7. Protein Structural Elements
- Helix
- Sheet
  - parallel
  - antiparallel (preferred)
- Loop
- Turn
9. Example of Structure Conservation/Divergence
[Figure labels: missing N-terminus, extra loop, conserved sheets, conserved helices]
These two structures are only 28% identical.
10. The α-helix
The nth backbone O group is hydrogen bonded to the (n+4)th NH group.
Destabilised by Pro, Gly.
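The n to n+4 hydrogen-bonding pattern can be written down directly. The sketch below lists the expected acceptor/donor pairs for an idealised helix and flags Pro and Gly positions in a sequence; the function names and the example sequence are hypothetical, and real helices deviate from this ideal at their ends.

```python
def helix_hbond_pairs(start, end):
    """(acceptor, donor) residue numbers for an ideal alpha-helix spanning
    residues start..end inclusive: O of residue n to NH of residue n + 4."""
    return [(n, n + 4) for n in range(start, end - 3)]

def destabilising_positions(sequence):
    """1-based positions of Pro (P) and Gly (G) in a one-letter sequence."""
    return [(i, aa) for i, aa in enumerate(sequence, start=1) if aa in "PG"]

print(helix_hbond_pairs(1, 8))
print(destabilising_positions("AEELLKPAAGL"))  # made-up sequence
```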
11. The antiparallel β-sheet
[Figure labels: turn (note Gly); note bifurcated H-bond]
Destabilised by unfavourable R-group interactions (steric/electronic)
12. How useful will my model be?
[Figure: scale of % sequence identity with ticks at 0, 30, 60 and 100. Labels: twilight zone; overall fold; residue-specific effects; mutagenesis; electrostatics; cavity volume; comparable to medium-resolution X-ray/NMR structure]
In the twilight zone, a good alignment is often more useful than a 3D model.
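Percent sequence identity is simple to compute once two sequences are aligned. The sketch below uses one common convention (identical positions divided by the number of columns where neither sequence has a gap); other conventions exist and give slightly different numbers. The 30% cut-off for the twilight zone is read from the identity scale on this slide, and the example sequences are made up.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over columns where both aligned sequences have a residue."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    aligned = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not aligned:
        return 0.0
    matches = sum(x == y for x, y in aligned)
    return 100.0 * matches / len(aligned)

def in_twilight_zone(identity, threshold=30.0):
    """True if the identity falls below the assumed twilight-zone cut-off."""
    return identity < threshold

identity = percent_identity("ACDE-F", "ACDQGF")  # toy alignment
print(identity, in_twilight_zone(identity))
```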
13. Homology Modelling Procedure
[Flowchart; surviving label: validate model]
14. Software and servers available at JIC
- Online resources for locating structural homologues
  - Fugue (Shi, Blundell & Mizuguchi (2001), J Mol Biol, 310, 243-57)
  - FASTA on PDB website
- Modelling software
  - DeepView (AKA Swiss PDB viewer)
  - Insight II (commercial package)
- See course homepage for links to other protein tools
15. Types of PDB structure file
- X-ray crystal structure
  - Done on solid at very low temperature (minimise thermal motion/entropy)
  - Quality is indicated by resolution: the lower the number, the better (typically 1-4 Å)
  - Frequently contain more than one molecule of protein
  - May have missing residues: check header
  - Always read the paper!
- NMR structure
  - Done on protein in solution
  - Generally small proteins only
  - Usually presented as a set of similar structures
  - No resolution value, but roughly equivalent to 2 Å
  - Always read the paper!
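Checking the header can be partly automated. The sketch below reads three standard PDB-format records: EXPDTA (experimental method), REMARK 2 (resolution) and REMARK 465 (missing residues). The function name and the header fragment are hypothetical, and the parsing is deliberately simple: REMARK 465 rows that do not match the three-field "residue chain number" layout are skipped.

```python
import re

def summarise_pdb_header(text):
    """Pull the experimental method, resolution and missing residues
    out of PDB-format header text."""
    method = None
    resolution = None   # stays None when no numeric resolution is given
    missing = []        # (chain, residue name, residue number)
    for line in text.splitlines():
        if line.startswith("EXPDTA"):
            method = line[6:].strip()
        elif line.startswith("REMARK   2 RESOLUTION."):
            match = re.search(r"(\d+\.\d+)\s+ANGSTROMS", line)
            if match:
                resolution = float(match.group(1))
        elif line.startswith("REMARK 465"):
            fields = line[10:].split()
            # Data rows look like "MET A 1"; explanatory rows are skipped.
            if len(fields) == 3 and fields[2].lstrip("-").isdigit():
                missing.append((fields[1], fields[0], int(fields[2])))
    return {"method": method, "resolution": resolution, "missing": missing}

# Hypothetical header fragment, written by hand for this example.
example = """\
EXPDTA    X-RAY DIFFRACTION
REMARK   2 RESOLUTION.    1.80 ANGSTROMS.
REMARK 465 MISSING RESIDUES
REMARK 465   M RES C SSSEQI
REMARK 465     MET A     1
REMARK 465     GLY A     2
"""
summary = summarise_pdb_header(example)
print(summary)
```

A summary like this is a quick first check only; it does not replace reading the header and the paper.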
16. Principles of structure-guided sequence alignment
- The degree of structure conservation is closely linked to the biological function. Structure is highly conserved when it needs to be.
- Some residues (e.g. catalytic triad) will be strictly conserved.
- At least some of the core structural elements (e.g. turns) should show significant conservation across multiple sequences.
- Some regions of the sequence will be better conserved than others.
- Deletions and insertions are much more likely to occur in loop regions.
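One simple way to see which regions are better conserved is to score each column of a multiple alignment. The sketch below uses the fraction of sequences sharing the most common residue in each column, which is only one of many possible conservation measures; the function names and the toy alignment are hypothetical.

```python
from collections import Counter

def column_conservation(alignment, gap="-"):
    """Per-column fraction of sequences carrying the most common residue.
    Gaps count against conservation because we divide by the number of rows."""
    scores = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores

def strictly_conserved(alignment):
    """0-based indices of columns where every sequence has the same residue."""
    return [i for i, s in enumerate(column_conservation(alignment)) if s == 1.0]

toy_alignment = ["ACD-G", "ACE-G", "ACDKG"]  # made-up sequences
print(column_conservation(toy_alignment))
print(strictly_conserved(toy_alignment))
```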
17. Dayhoff Mutation Matrix
- The different chemical properties of the amino acids mean that some are more similar than others
- The probability of a given mutation can be estimated by comparing many related sequences
- The results are expressed as a probability matrix
- another tool for local sequence alignment
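The idea of estimating mutation probabilities from related sequences can be sketched by counting the residue pairs observed in aligned sequences and normalising each row. This is a heavily simplified illustration, not Dayhoff's actual procedure; the function name and the toy alignments are hypothetical.

```python
from collections import defaultdict

def substitution_probabilities(aligned_pairs, gap="-"):
    """Estimate P(y | x) from observed aligned residue pairs.
    Each pair is counted in both directions, so the counts are symmetric."""
    counts = defaultdict(lambda: defaultdict(int))
    for a, b in aligned_pairs:
        for x, y in zip(a, b):
            if x == gap or y == gap:
                continue
            counts[x][y] += 1
            counts[y][x] += 1
    probabilities = {}
    for x, row in counts.items():
        total = sum(row.values())
        probabilities[x] = {y: n / total for y, n in row.items()}
    return probabilities

# Made-up pairwise alignments of closely related sequences.
pairs = [("AAVL", "AAIL"), ("AVVL", "AIVL")]
probs = substitution_probabilities(pairs)
print(probs["V"])
```

In this toy data V is paired with I as often as with itself, so the estimated probability of a V to I exchange is 0.5; a realistic estimate needs far more sequences.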
18. Why use multiple structures in the sequence alignment?
VLPGDMMHFAADEKRNDLLDQQEGARHFSSPYMDA
LLPGDDDIYGVDT------NDQDLTRHLTSPFQNA

VLPGDMMHFAADEKNLDLRDQQEGARHFSSPYMDA
LLPGDDDIYGVDTNDQDLTR------HLTSPFQNA
MLPGRKMVFALPIKVGDLHHR---SKKVTSPYNNA
MVPGHHTLFGITQDLADLVTR----SPQSSPFNDG
VVPGKHSPYVVSTRDQDLITRPG--TVRSSPYQNG
19. Use structure to refine sequence alignment
---helix---------loop-------helix--
HFNVKVRTMQAHRAAAV--PVYYAGKGLTTENFTT
HFQAKVRSMQAKKTGLYTKLKKPGVQALTSENWNS
HFNVKVRT-----HAIYLYTKLKKAVTLTNDNFKT

HFNVKVRTMQAHRAAAV--PVYYAGKGLTTENFTT
HFQAKVRSMQAKKTGLYTKLKKPGVQALTSENWNS
HFNVKVRTHAIYLYTKL-----KKAVTLTNDNFKT
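The refinement shown on this slide amounts to moving a gap out of a helix and into the loop. A small check for this is sketched below. The per-column secondary-structure string is an assumption: the slide only labels the regions approximately, so the helix/loop boundaries used here (12 helix columns, 14 loop columns, 9 helix columns) are illustrative.

```python
def gaps_in_structured_regions(sequence, secondary_structure,
                               structured="HE", gap="-"):
    """Count gap characters that fall in columns annotated as helix (H)
    or strand (E) in a per-column secondary-structure string."""
    if len(sequence) != len(secondary_structure):
        raise ValueError("sequence and annotation must have the same length")
    return sum(
        1
        for residue, ss in zip(sequence, secondary_structure)
        if residue == gap and ss in structured
    )

# Assumed per-column annotation: H = helix, L = loop.
ss = "H" * 12 + "L" * 14 + "H" * 9

before = "HFNVKVRT-----HAIYLYTKLKKAVTLTNDNFKT"  # gap placed from sequence alone
after = "HFNVKVRTHAIYLYTKL-----KKAVTLTNDNFKT"   # gap moved into the loop

print(gaps_in_structured_regions(before, ss))
print(gaps_in_structured_regions(after, ss))
```

An alignment that scores zero on this check places all of its insertions and deletions in loops, in line with the principle that indels are much more likely there.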
20. Structure assignment procedures
- Structurally conserved regions
  - Conserved residue: copy all coordinates
  - Non-conserved residue: copy sidechains using the maximum overlap principle (Summers & Karplus)
21. Loops
- If possible, use a reference structure with the same length of loop
- Otherwise, either
  - Search the database for comparable loops (preferred)
  - Build the loops using random conformational searching
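A database search for comparable loops can be pictured as a filter followed by a ranking. The sketch below is only an illustration of that idea: the loop library, its field names, and the selection rule (same length, then closest separation between the two anchor residues) are all assumptions rather than the method used by any particular package.

```python
def pick_loop(candidates, length, anchor_distance):
    """Choose the candidate loop with the required number of residues whose
    end-to-end (anchor) separation best matches the gap to be bridged.
    Returns None if no loop of that length is available."""
    same_length = [c for c in candidates if c["length"] == length]
    if not same_length:
        return None
    return min(same_length,
               key=lambda c: abs(c["anchor_distance"] - anchor_distance))

# Hypothetical loop library; distances in angstroms are invented.
library = [
    {"name": "loop_a", "length": 5, "anchor_distance": 9.8},
    {"name": "loop_b", "length": 5, "anchor_distance": 12.1},
    {"name": "loop_c", "length": 7, "anchor_distance": 11.9},
]

best = pick_loop(library, length=5, anchor_distance=11.5)
print(best["name"] if best else "no loop of that length: build it instead")
```

When nothing of the right length is found, the function returns None, which corresponds to falling back on building the loop by conformational searching.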
22. Molecular Mechanics
Treats molecules using classical mechanics rather than quantum mechanics.
Electrostatics, etc. are also included as classical terms.
Very efficient but limited accuracy: avoid over-minimisation.
MM optimisation can't make a bad model into a good one!
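To make the "classical terms" concrete, here is a toy two-particle energy function with a harmonic stretch term and a Coulomb term, minimised by steepest descent with a hard cap on the number of steps. All parameters are in arbitrary units and are invented for illustration; this is a sketch of the idea, not a real force field.

```python
def total_energy(r, k=100.0, r0=1.5, q1=0.3, q2=-0.3, coulomb_const=1.0):
    """Toy energy (arbitrary units) for two particles a distance r apart:
    a harmonic stretch term plus a classical Coulomb term."""
    stretch = 0.5 * k * (r - r0) ** 2
    electrostatic = coulomb_const * q1 * q2 / r
    return stretch + electrostatic

def minimise(r, step=0.001, max_steps=50, h=1e-6):
    """Steepest descent using a central-difference gradient.
    max_steps caps the run so the structure is relaxed, not over-minimised."""
    for _ in range(max_steps):
        gradient = (total_energy(r + h) - total_energy(r - h)) / (2 * h)
        r -= step * gradient
    return r

r_start = 2.0
r_relaxed = minimise(r_start)
print(total_energy(r_start), total_energy(r_relaxed))
```

The minimiser only slides downhill to the nearest minimum of whatever starting geometry it is given, which is why it cannot rescue a model built on a bad alignment.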
23. Now for the practical