Title: Solving NMR Structures II: Calculation and evaluation
1. Solving NMR Structures II: Calculation and evaluation
- What NMR-based (solution) structures look like
  - the NMR ensemble
  - inclusion of hydrogen coordinates
- Methods for calculating structures
  - distance geometry, restrained molecular dynamics, simulated annealing
- Evaluating the quality of NMR structures
  - resolution, stereochemical quality, restraint violations, etc.
2. NMR data do not uniquely define a 3D protein structure (single set of coordinates)
- Restraints are ranges of allowed distances, angles, etc. rather than single values, reflecting the fact that the experimental data contain uncertainties in both measurement and interpretation.
- Only a limited number of the possible restraints are observable experimentally
  - due to peak overlap/chemical shift degeneracy, lack of stereospecific assignments, etc.
- The view of a protein structure as a single set of atomic coordinates may itself be physically unrealistic!
  - proteins are dynamic molecules
3. The NMR Ensemble
- NMR methods do not calculate a single structure, but rather repeat a structure calculation many times to generate an ensemble of structures
- The structure calculations are designed to thoroughly explore all regions of conformational space that satisfy the experimentally derived restraints
- At the same time, they often impose some physical reasonableness on the system, such as bond angles, distances and proper stereochemistry.
- The ideal result is an ensemble which
  - A. satisfies all the experimental restraints (minimizes violations)
  - B. at the same time accurately represents the full permissible conformational space under the restraints (maximizes RMSD between ensemble members)
  - C. looks like a real protein
4. The NMR Ensemble
The fact that NMR structures are reported as ensembles gives them a fuzzy appearance, which is both informative and sometimes annoying.
At right, an ensemble of 25 structures for Syrian hamster prion protein (only the backbone is shown). Liu et al. Biochemistry (1999) 38, 5362.
5. NMR structures include hydrogen coordinates
- X-ray structures do not generally include hydrogen atoms in atomic coordinate files, because the heavy atoms dominate the diffraction pattern and the hydrogen atoms are not explicitly seen.
- By contrast, NMR restraints such as NOE distance restraints and hydrogen bond restraints often explicitly include the positions of hydrogen atoms. Therefore, these positions are reported in the PDB coordinate files.
6. Methods for structure calculation
- distance geometry (DG)
- restrained molecular dynamics (rMD)
- simulated annealing (SA)
- hybrid methods
7. Starting points for calculations
- To get the most unbiased, representative ensemble, it is wise to start the calculations from a set of randomly generated starting structures.
- Alternatively, in some methods the same initial structure is used for each trial structure calculation, but the calculation trajectory is pushed in a different initial direction each time using a random-number generator.
8. DG--Distance geometry
- In distance geometry, one uses the nOe-derived distance restraints to generate a distance matrix, which one then uses as a guide in calculating a structure
- Structures calculated from distance geometry will produce the correct overall fold but usually have poor local geometry (e.g. improper bond angles, distances)
  - hence distance geometry must be combined with some extensive energy minimization method to generate physically reasonable structures
9. rMD--Restrained molecular dynamics
- Molecular dynamics involves computing the potential energy V with respect to the atomic coordinates. Usually this is defined as the sum of a number of terms:
  $V_{total} = V_{bond} + V_{angle} + V_{dihedr} + V_{vdW} + V_{coulomb} + V_{NMR}$
- the first five terms here are real energy terms corresponding to such forces as van der Waals and electrostatic repulsions and attractions, and the cost of deforming bond lengths and angles. These come from some standard molecular force field like CHARMM or AMBER
- the NMR restraints are incorporated into the $V_{NMR}$ term, which is a pseudoenergy or pseudopotential term included to represent the cost of violating the restraints
10. Pseudo-energy potentials for rMD
- Generate fake energy potentials representing the cost of violating the distance or angle restraints. Here's an example of a distance restraint potential:

$$
V_{NOE} =
\begin{cases}
K_{NOE}\,(r_{ij} - r_{ij}^{u})^{2} & \text{if } r_{ij} > r_{ij}^{u} \\
0 & \text{if } r_{ij}^{l} \le r_{ij} \le r_{ij}^{u} \\
K_{NOE}\,(r_{ij} - r_{ij}^{l})^{2} & \text{if } r_{ij} < r_{ij}^{l}
\end{cases}
$$

where $r_{ij}^{l}$ and $r_{ij}^{u}$ are the lower and upper bounds of our distance restraint, and $K_{NOE}$ is some chosen force constant, typically 250 kcal mol$^{-1}$ nm$^{-2}$. So it's somewhat permissible to violate restraints, but it raises V.
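
The distance restraint potential above can be sketched in a few lines of Python. This is only a minimal illustration: the function and argument names are hypothetical, distances are assumed to be in nm so that they match the units of the force constant quoted above, and no real structure calculation package is being reproduced.

```python
def noe_restraint_energy(r_ij, r_lower, r_upper, k_noe=250.0):
    """Flat-bottomed harmonic pseudo-energy for one distance restraint.

    Distances are assumed to be in nm and k_noe in kcal mol^-1 nm^-2.
    All names here are illustrative only.
    """
    if r_ij > r_upper:          # too far apart: penalize the excess
        return k_noe * (r_ij - r_upper) ** 2
    if r_ij < r_lower:          # too close: penalize the shortfall
        return k_noe * (r_ij - r_lower) ** 2
    return 0.0                  # inside the allowed range: no cost


def total_noe_energy(distances, restraints, k_noe=250.0):
    """Sum the pseudo-energy over a list of (lower, upper) restraints."""
    return sum(
        noe_restraint_energy(r, lower, upper, k_noe)
        for r, (lower, upper) in zip(distances, restraints)
    )
```

A distance inside the bounds contributes nothing, while a distance 0.1 nm beyond the upper bound contributes 250 x 0.1^2 = 2.5 kcal/mol, so small violations are tolerated but large ones quickly dominate the total.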
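11. Example of nOe pseudopotential
[Figure: plot of $V_{NOE}$ against $r_{ij}$. The potential is 0 between $r_{ij}^{l}$ and $r_{ij}^{u}$ and rises steeply with degree of violation outside these bounds.]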
12. SA--Simulated annealing
- SA is essentially a special implementation of rMD and uses similar potentials, but employs raising the temperature of the system and then slow cooling in order not to get trapped in local energy minima
- SA is very efficient at locating the global minimum of the target function
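
The heat-then-cool idea can be illustrated with a toy example. Note the substitution: the sketch below uses Metropolis Monte Carlo moves on a hypothetical one-dimensional target function rather than restrained molecular dynamics, and every name and parameter value in it is an assumption chosen for illustration.

```python
import math
import random


def toy_energy(x):
    """Hypothetical 1-D target with a local minimum near x = 1
    and a lower (global) minimum near x = -1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def simulated_annealing(energy, x0, t_start=2.0, t_end=0.01,
                        cooling=0.95, steps_per_temp=50, seed=0):
    """Metropolis-style annealing: start hot, then cool slowly."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    temperature = t_start
    while temperature > t_end:
        for _ in range(steps_per_temp):
            x_new = x + rng.uniform(-0.5, 0.5)   # random trial move
            e_new = energy(x_new)
            # Downhill moves are always accepted; uphill moves are accepted
            # with Boltzmann probability, which lets the search leave
            # local minima while the temperature is still high.
            if e_new <= e or rng.random() < math.exp((e - e_new) / temperature):
                x, e = x_new, e_new
                if e < best_e:
                    best_x, best_e = x, e
        temperature *= cooling                   # slow cooling step
    return best_x, best_e


# Start in the local minimum and let the annealing schedule run.
best_x, best_e = simulated_annealing(toy_energy, x0=1.0)
```

At high temperature almost any move is accepted, so the barrier between the two minima can be crossed; as the temperature drops, uphill moves become rare and the search settles into whichever basin it currently occupies.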
13. Dealing with ambiguous restraints
- it is often not possible to tell which atoms are involved in a NOESY crosspeak, either because of a lack of stereospecific assignments or because multiple protons have the same chemical shift
- sometimes an ambiguous restraint is included but is expressed ambiguously in the restraint file, e.g. 3 HA --> 6 HB*, where the wildcard indicates that the beta protons of residue 6 are not stereospecifically assigned. This is quite commonly done for stereochemical ambiguities.
- it is also possible to leave ambiguous restraints out and then try to resolve them iteratively using multiple cycles of calculation. This is often done for restraints that involve more complicated ambiguities, e.g. 3 HA --> 10 HN, 43 HN, or 57 HN, where three amides all have the same shift.
- one can also make stereospecific assignments iteratively using what are called floating chirality methods
14. Example of resolving an ambiguity during structure calculation
[Figure: atom A (9.52 ppm) and atoms B and C (both 4.34 ppm). Range of interatomic distances observed in trial ensemble: A to B 9-11 Å, A to C 3-4 Å.]
Due to resonance overlap between atoms B and C, an NOE crosspeak between 9.52 ppm and 4.34 ppm could be A to C or A to B -- this restraint is ambiguous.
But if an ensemble generated with this ambiguous restraint left out shows that A is never close to B, then the restraint must be A to C.
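
The reasoning in this example can be written as a small filter over a trial ensemble. This is a hedged sketch only: the function name, the data layout, and the 5 Å cutoff for "close enough to give an NOE" are all assumptions made for illustration, and the distances are invented to fall in the ranges shown in the example.

```python
def resolve_ambiguous_noe(candidate_distances, max_noe_distance=5.0):
    """Return the candidate partners compatible with a trial ensemble.

    candidate_distances maps each candidate atom label to a list of
    distances (in Å) to the other atom, one per ensemble member.
    max_noe_distance is an assumed cutoff, not a value from the text.
    """
    compatible = []
    for atom, distances in candidate_distances.items():
        # Keep a candidate if it comes within range in at least one
        # structure; discard it if it is never close.
        if min(distances) <= max_noe_distance:
            compatible.append(atom)
    return compatible


# Distances from atom A in a hypothetical five-member trial ensemble,
# chosen to fall in the 9-11 Å (B) and 3-4 Å (C) ranges of the example.
trial = {
    "B": [9.2, 10.5, 9.8, 11.0, 10.1],
    "C": [3.1, 3.6, 3.9, 3.4, 3.2],
}
assigned = resolve_ambiguous_noe(trial)
```

Here only C survives the filter, so the crosspeak would be assigned as A to C. If more than one candidate survives, the restraint stays ambiguous for that cycle.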
15. Iterative structure calculation with assignment of ambiguous restraints
Start with some set of unambiguous NOEs and calculate an ensemble.
- there are programs, such as ARIA, with automatic routines for iterative assignment of ambiguous restraints. The key to success is to make absolutely sure the restraints you start with are right!
source: http://www.pasteur.fr/recherche/unites/Binfs/aria/
16. Acceptance criteria: choosing structures for an ensemble
- it is typical to generate 50 or more trial structures, but not all will converge to a final structure that is physically reasonable or consistent with the experimentally derived NMR restraints. We want to throw such structures away rather than include them in our reported ensemble.
- these are typical acceptance criteria for including calculated structures in the ensemble:
  - no more than 1 nOe distance restraint violation greater than 0.4 Å
  - no dihedral angle restraint violations greater than 5°
  - no gross violations of reasonable molecular geometry
- sometimes structures are rejected on other grounds as well:
  - too many residues with backbone angles in disfavored regions of Ramachandran space
  - too high a final potential energy in the rMD calculation
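
As a rough illustration, the restraint-based acceptance criteria listed above could be applied to a set of trial structures as follows. The function names and the input layout (one list of violation sizes per structure) are hypothetical, the dihedral threshold is assumed to be in degrees, and the geometry, Ramachandran and energy checks are left out for brevity.

```python
def accept_structure(noe_violations, dihedral_violations,
                     max_noe_violation=0.4, max_noe_count=1,
                     max_dihedral_violation=5.0):
    """Apply the typical restraint criteria to one trial structure.

    noe_violations: violation sizes in Å, one per NOE distance restraint.
    dihedral_violations: violation sizes in degrees (assumed unit),
    one per dihedral angle restraint.
    """
    large_noe = sum(1 for v in noe_violations if v > max_noe_violation)
    large_dihedral = any(v > max_dihedral_violation for v in dihedral_violations)
    return large_noe <= max_noe_count and not large_dihedral


def select_ensemble(trials):
    """Keep only the names of trial structures that pass the criteria."""
    return [name for name, noe, dih in trials if accept_structure(noe, dih)]


# Hypothetical violation lists for three trial structures.
trials = [
    ("trial_01", [0.0, 0.1, 0.45], [1.0, 2.5]),  # one NOE violation > 0.4 Å
    ("trial_02", [0.5, 0.6, 0.0], [0.0, 0.0]),   # two NOE violations > 0.4 Å
    ("trial_03", [0.0, 0.2, 0.0], [7.0, 0.0]),   # one dihedral violation > 5
]
accepted = select_ensemble(trials)
```

Only the first trial structure passes: the second has too many large distance violations and the third has a large dihedral violation.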
17. Precision of NMR Structures (Resolution)
- judged by RMSD of superimposed ensemble of accepted structures
- RMSDs for both backbone (Cα, N, carbonyl C) and all heavy atoms (i.e. everything except hydrogen) are typically reported, e.g.
  - bb 0.6 Å
  - heavy 1.4 Å
- sometimes only the more ordered regions are included in the reported RMSD, e.g. for a 58-residue protein you will see RMSD (residues 5-58) if residues 1-4 are completely disordered.
18. Reporting ensemble RMSD
- two major ways of calculating RMSD of the ensemble:
  - pairwise: compute RMSDs for all possible pairs of structures in the ensemble, and calculate the mean of these RMSDs
  - from mean: calculate a mean structure from the ensemble and measure RMSD of each ensemble structure from it, then calculate the mean of these RMSDs
- pairwise will generally give a higher number, so be aware that these two ways of reporting RMSD are not equivalent. Usually the Materials and Methods, or a footnote somewhere in the paper, will indicate which is being used.
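
The two ways of reporting ensemble RMSD can be compared with a short sketch. It assumes the structures have already been superimposed (no fitting is performed here) and that each structure is a list of (x, y, z) tuples in the same atom order; the function names and the tiny ensemble are hypothetical.

```python
import math
from itertools import combinations


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) coordinates."""
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def mean_structure(ensemble):
    """Average each atom's coordinates over all ensemble members."""
    n = len(ensemble)
    return [
        tuple(sum(model[i][k] for model in ensemble) / n for k in range(3))
        for i in range(len(ensemble[0]))
    ]


def pairwise_rmsd(ensemble):
    """Mean RMSD over all possible pairs of structures."""
    values = [rmsd(a, b) for a, b in combinations(ensemble, 2)]
    return sum(values) / len(values)


def rmsd_from_mean(ensemble):
    """Mean RMSD of each structure from the mean structure."""
    mean = mean_structure(ensemble)
    return sum(rmsd(model, mean) for model in ensemble) / len(ensemble)


# Hypothetical three-member ensemble of two atoms, already superimposed.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.3, 0.0, 0.0), (1.0, 0.3, 0.0)],
    [(0.0, 0.3, 0.0), (1.3, 0.0, 0.0)],
]
```

For this toy ensemble `pairwise_rmsd(ensemble)` comes out larger than `rmsd_from_mean(ensemble)`, which is why a paper should state which convention it reports.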
19. Minimized average structure
- a minimized average is just that: a mean structure is calculated from the ensemble and then subjected to energy minimization to restore reasonable geometry, which is often lost in the calculation of a mean
- this is NMR's way of generating a single representative structure from the data. It is much easier to visualize structural features from a minimized average than from the ensemble.
- for highly disordered regions a minimized average will not be informative and may even be misleading--such regions are sometimes left out of the minimized average
- sometimes when an NMR structure is deposited in the PDB, there will be separate entries for both the ensemble and the minimized average. It is nice when people do this. Alternatively, a member of the ensemble may be identified which is considered the most representative (often the one closest to the mean).
20. How many restraints do we need to get a high-resolution NMR structure?
- usually 15-20 nOe distance restraints per residue, but the total is not as important as how many long-range restraints you have, meaning long-range in the sequence: |i-j| > 5, where i and j are the two residues involved
- good NMR structures usually have 3.5 long-range distance restraints per residue in the structured regions
- to get a very good quality structure, it is usually also necessary to have some stereospecific assignments, e.g. β hydrogens; Leu, Val methyls
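
Counting long-range restraints per residue is simple bookkeeping, sketched below. The function name and the restraint layout (a list of residue-number pairs) are hypothetical, and the sequence-separation cutoff follows the |i-j| > 5 definition given above.

```python
def long_range_per_residue(restraints, n_residues, min_separation=5):
    """Number of long-range restraints (|i - j| > min_separation)
    divided by the number of residues in the structured region.

    restraints is a list of (i, j) residue-number pairs, one per
    distance restraint. All names here are illustrative.
    """
    long_range = [(i, j) for i, j in restraints if abs(i - j) > min_separation]
    return len(long_range) / n_residues


# Hypothetical restraint list for a 20-residue structured region.
restraints = [(1, 2), (1, 10), (3, 4), (2, 20), (5, 11), (7, 12)]
ratio = long_range_per_residue(restraints, n_residues=20)
```

In this made-up list three of the six restraints are long-range, so the ratio is 3/20 = 0.15 per residue, far below what a good structure would need.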
21. Assessing Structure Quality
- NMR spectroscopists usually run their ensemble through the program PROCHECK-NMR to assess its quality
- a high-resolution structure will have backbone RMSD ≤ 0.8 Å, heavy atom RMSD ≤ 1.5 Å
- low RMS deviation from restraints (good agreement w/restraints)
- will have good stereochemical quality
  - ideally >90% of residues in core (most favorable) regions of Ramachandran plot
  - very few unusual side chain angles and rotamers (as judged by those commonly found in crystal structures)
  - low deviations from idealized covalent geometry
22. Structural Statistics Tables
- list of restraints, and type
- calculated energies
- agreement of ensemble structures with restraints (RMS)
- precision of structure (RMSD)
- sometimes also see listings of Ramachandran statistics, deviations from ideal covalent geometry, etc.