Title: Assessment and Validation Tools
1Assessment and Validation Tools for NMR Structure Determinations
Thanks to Chris Spronk, Jurgen Doreleijers, and Guy Montelione (for figures and results)
2References
R. A. Laskowski, J. A. Rullmann, M. W. MacArthur, R. Kaptein, and J. M. Thornton, AQUA and PROCHECK-NMR: programs for checking the quality of protein structures solved by NMR, J. Biomol. NMR 8, 477-486 (1996).
S. B. Nabuurs, C. A. E. M. Spronk, G. Vriend, and G. W. Vuister, Concepts and Tools for NMR Restraint Analysis and Validation, Concepts in Magnetic Resonance 22A, 90-105 (2004).
J. F. Doreleijers, A. J. Nederveen, W. Vranken, J. Lin, A. M. J. J. Bonvin, R. Kaptein, J. L. Markley, and E. L. Ulrich, BioMagResBank databases DOCR and FRED with converted and filtered sets of experimental NMR restraints and coordinates from over 500 protein PDB structures, J. Biomol. NMR 32, 1-12 (2005).
A. J. Nederveen, J. F. Doreleijers, W. Vranken, Z. Miller, C. A. E. M. Spronk, S. B. Nabuurs, P. Güntert, M. Livny, J. L. Markley, M. Nilges, E. L. Ulrich, R. Kaptein, and A. M. J. J. Bonvin, RECOORD: a REcalculated COORdinates Database of 500 proteins generated from restraint data downloaded from the BioMagResBank, Proteins 59, 662-672 (2005).
L. Wang, H. R. Eghbalnia, A. Bahrami, and J. L. Markley, Linear analysis of carbon-13 chemical shift differences and its application to the detection and correction of errors in referencing and spin system identifications, J. Biomol. NMR 32, 13-22 (2005).
3References continued
Zhang, H., Neal, S. & Wishart, D. S. (2003) RefDB: a database of uniformly referenced protein chemical shifts, J Biomol NMR. 25, 173-195.
Nabuurs, S. B., Spronk, C. A., Krieger, E., Maassen, H., Vriend, G. & Vuister, G. W. (2003) Quantitative evaluation of experimental NMR restraints, J Am Chem Soc. 125, 12026-12034.
Moseley, H. N., Sahota, G. & Montelione, G. T. (2004) Assignment validation software suite for the evaluation and presentation of protein resonance assignment data, J Biomol NMR. 28, 341-55.
Nabuurs, S. B., Krieger, E., Spronk, C. A., Nederveen, A. J., Vriend, G. & Vuister, G. W. (2005) Definition of a new information-based per-residue quality parameter, J Biomol NMR. 33, 123-34.
Nabuurs, S. B., Spronk, C. A., Vuister, G. W. & Vriend, G. (2006) Traditional biomolecular structure determination by NMR spectroscopy allows for major errors, PLoS Comput Biol. 2, e9.
Ginzinger, S. W., Gerick, F., Coles, M. & Heun, V. (2007) CheckShift: automatic correction of inconsistent chemical shift referencing, J Biomol NMR. 39, 223-7.
Vranken, W. (2007) A global analysis of NMR distance constraints from the PDB, J Biomol NMR. 39, 303-14.
Bhattacharya, A., Tejero, R. & Montelione, G. T. (2007) Evaluating protein structures determined by structural genomics consortia, Proteins. 66, 778-95.
CING (pronounced "king") stands for the Common Interface for NMR structure Generation: http://nmr.cmbi.ru.nl/cing/Home.html
4Importance of structure validation
- Means for determining the precision and accuracy of NMR structures
- Benchmark for comparing different methods for structure determination
- Needed for community-wide assessment of the validity of NMR structures
- Standard by which improvements in technology can be gauged
- Structures should be reliable: consistent with experimental data, with good local and overall quality
5Precision vs. accuracy (often confused in the
literature)
- Precision is the variation of X around <X>, expressed as standard deviation or variance
- Accuracy is the closeness of <X> to the true value of X
- Accuracy can only be measured relative to a gold standard (e.g. by reconstructing a known result with simulated data)
Adapted from Chris Spronk
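The distinction can be made concrete with a small numerical sketch. The function name and the simulated numbers below are illustrative assumptions, not taken from the slides: the estimates are deliberately tight but biased, so they are precise without being accurate.

```python
import random
import statistics

def precision_and_accuracy(samples, true_value):
    """Return (precision, accuracy) for repeated estimates of a quantity X."""
    mean = statistics.fmean(samples)        # <X>
    precision = statistics.stdev(samples)   # spread of X around <X>
    accuracy = abs(mean - true_value)       # distance of <X> from the gold standard
    return precision, accuracy

# Simulated data with a known answer (true value 10.0): small scatter, large bias.
random.seed(0)
biased = [10.5 + random.gauss(0.0, 0.05) for _ in range(200)]
precision, accuracy = precision_and_accuracy(biased, true_value=10.0)
print(f"precision (sd) = {precision:.3f}, accuracy (|<X> - true|) = {accuracy:.3f}")
```

A small standard deviation here says nothing about the 0.5 offset from the true value, which only shows up because the simulated gold standard is known.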
6Precision vs. true variance
Precision underestimates true variance
Precision equals true variance
Precision overestimates true variance
Adapted from Chris Spronk
7Limitations of the biomolecular NMR field
- No standard convention for estimating precision
- No standard convention for estimating accuracy
- No standard convention for estimating true variance
- Lack of objective reproducibility of manual data analysis steps
- Recognition of these problems is coming to the forefront
- Position paper on validation disseminated at a meeting in Florence in January 2007
- Validation was the major issue addressed by the Worldwide Protein Data Bank (wwPDB) NMR Task Group (at the ISMAR meeting in Taiwan, 10/2007)
8Approaches to assessing accuracy and their
limitations
- Restraint violations
  - Restraints are interpreted data
  - No standard for calibrating constraints
- Restraints per residue
  - Conformationally-restraining
- Restraints per restrained residue
  - How to define restrained residues?
- ProCheck / MAGE
  - Parameters derived from crystal structures
  - Question of which residues to include/exclude
- Cross validation with RDC
  - Not measured universally
  - Not sensitive to rigid body translation; multiple alignments needed
Adapted from Guy Montelione
9Approaches to assessing accuracy and their
limitations
- Comparison with crystal structures
  - Differences with x-ray structure may be biologically relevant
  - Comparisons with solid state NMR data may be better, but still could reflect real differences
- Back calculation of NOEs: relaxation matrix analysis
  - Compare to NOESY peak list?
  - Compare to NOESY spectrum? (what does this mean?)
  - Exchange broadening, lineshape, differential relaxation effects
  - Diagonal, ridges, overlap, residual water, saturation transfer
  - Differential relaxation of heteroatoms
- Back calculation of chemical shifts
  - Promising: used more and more
- H-bond geometry
  - Interesting, but not comprehensive
10Analysis of 151 pairs of NMR and crystal
structures
- NMR overestimates precision of the ensemble
- NMR provides inaccurate global structure
  - Ensemble averaging
  - Just plain wrong
- Xray is inaccurate
- Crystallization shifts global conformational equilibria
Line: rmsd of superimposed NMR ensemble (PRECISION)
Shade: rmsd between median NMR conformer and Xtal structure (ACCURACY)
Filtered to be in same ligand state, similar pH
Analysis for FindCore core (bb and sc) atoms only
Andrec, Snyder, Montelione, Levy, et al. (2007) Proteins 69, 449
11Least biased representation of carbon chemical
shifts, irrespective of structure, is as the sum
of three Gaussian distributions
Data for all alanine residues in RefDB
Occurrences separately as a function of δ13Cα and δ13Cβ
Occurrences as a function of δ13Cα - δ13Cβ
12Linear Analysis of Chemical Shifts (LACS) plot
Data for valine from RefDB
L. Wang et al. (2005) J. Biomol. NMR 32, 13-22
13LACS data for a particular protein: assigned 13Cα and 13Cβ chemical shifts from BMRB
This intercept should be at (0,0) for properly
referenced data
L. Wang et al. (2005) J. Biomol. NMR 32, 13-22
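The intercept idea can be illustrated with a deliberately simplified sketch. It assumes a single straight-line least-squares fit of the Cα secondary shift against the difference between the Cα and Cβ secondary shifts; this is a stand-in for illustration, not the published LACS procedure. The function name and the synthetic data are hypothetical.

```python
def referencing_offset(ca_secondary, cb_secondary):
    """Fit y = slope * x + intercept with x = dCa - dCb and y = dCa.

    A constant referencing error shared by Ca and Cb cancels in x but not
    in y, so it appears as a non-zero intercept.
    """
    x = [a - b for a, b in zip(ca_secondary, cb_secondary)]
    y = list(ca_secondary)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic, perfectly linear secondary shifts with a 1.2 ppm error added to both nuclei.
true_offset = 1.2
x_true = [4.0, 2.5, -3.5, -2.5, 0.5]
ca_obs = [0.6 * x + true_offset for x in x_true]
cb_obs = [-0.4 * x + true_offset for x in x_true]
slope, offset = referencing_offset(ca_obs, cb_obs)
print(f"slope = {slope:.2f}, estimated offset = {offset:.2f} ppm")
```

Subtracting the estimated offset from every 13C shift would move the fitted line back through (0,0) in this toy case.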
14We have used LACS to re-reference the BMRB
database
- 11 ( 1.0 ppm )
- 26 ( 0.5 ppm )
- 46 ( 0.3 ppm )
L. Wang et al. (2005) J. Biomol. NMR 32, 13-22
15NMR structure determination
NMR experimental data
Structure ensemble
Experimental restraints
Structure calculation and selection
Assignment and conversion
Constraint violation and error analysis
Validated structure data
Structure quality checks and statistics
(often not done!)
Adapted from Chris Spronk
16Distance restraints
(A) Ensemble of 30 structural models of GB1. The α-helix is shown as a blue ribbon; the β-sheets are indicated with red ribbons. Hydrogen atoms have been omitted for clarity. (B) Restrained minimized average structure of GB1, with the 659 experimental distance restraints in the experimental dataset shown in yellow. For clarity, restraints involving groups of hydrogen atoms are shown for only one of the protons involved. Figure made using YASARA (http://www.yasara.org). (From Nabuurs et al., 2004)
17Sources of restraints (constraints)
- NOE values
- J-couplings (Karplus eq.)
- Residual dipolar couplings
- H-bonds: experimental (trans HB coupling) or inferred
- Relaxation probes (relaxation or pseudocontact shifts)
- Chemical shifts
- Biochemical information: crosslinking, ....
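As an illustration of how a J-coupling becomes a dihedral restraint, here is a minimal sketch of the Karplus relation J = A cos²θ + B cosθ + C. The coefficients and the -60° phase are one commonly quoted parameterization for the HN-HA coupling as a function of the backbone angle phi; they are assumptions here and should be checked against the parameter set actually used.

```python
import math

def karplus_3j(phi_deg, a=6.51, b=-1.76, c=1.60, phase_deg=-60.0):
    """Three-bond coupling (Hz) predicted from a dihedral angle in degrees.

    a, b, c and phase_deg are assumed, illustrative Karplus parameters.
    """
    theta = math.radians(phi_deg + phase_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c

# Typical helical (phi = -60) and extended (phi = -120) backbone angles
for phi in (-60.0, -120.0):
    print(f"phi = {phi:7.1f} deg -> 3J = {karplus_3j(phi):.2f} Hz")
```

Because the curve is not one-to-one, a measured coupling generally maps to several possible angles, which is why such restraints are usually entered as ranges.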
18Ambiguously determined H-bonds in an α-helix from NOEs
Three contributing distances are shown in yellow, allowing for the formation of either the i, i+3, the i, i+4, or the i, i+5 hydrogen bond. In this case, the distance would be restrained to 2 Å.
(From Nabuurs et al., 2004)
19Classification of restraints
- Intra-residue
  - Information on side chain conformation
- Sequential: residue i to residue i+1
  - Information on secondary structure
- Medium range: residue i up to residue i+4
  - Information on secondary structure
- Long range: residue i to residue i+5 and higher
  - Information on secondary and tertiary structure
- Inter chain: between subunits
  - Information on quaternary structure
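The classification above can be sketched as a small function. The function name, the tuple layout of a restraint, and the example restraints are hypothetical; the separation thresholds follow the slide.

```python
from collections import Counter

def classify_restraint(chain_a, res_a, chain_b, res_b):
    """Classify a restraint by chain membership and sequence separation."""
    if chain_a != chain_b:
        return "inter-chain"
    separation = abs(res_a - res_b)
    if separation == 0:
        return "intra-residue"
    if separation == 1:
        return "sequential"
    if separation <= 4:
        return "medium-range"
    return "long-range"

# Hypothetical restraints: (chain, residue number) for each partner
restraints = [
    ("A", 12, "A", 12),
    ("A", 12, "A", 13),
    ("A", 12, "A", 15),
    ("A", 12, "A", 40),
    ("A", 12, "B", 12),
]
counts = Counter(classify_restraint(*r) for r in restraints)
print(dict(counts))
```

Tallying restraints per class is a quick way to see whether a restraint list contains enough long-range information to define the fold.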
20Redundancy of restraints
- Redundant restraints shouldn't be counted because they don't add information to the structure
- E.g. HN-HA distance of 3.5 Å
21Restraints and NOE completeness per residue
- NOE completeness = (observed / expected) × 100%
  - On a per-residue basis
- Restraints per residue (useful for identifying regions with possible problems)
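A per-residue completeness can be sketched as follows, taking completeness as the percentage of expected contacts that were actually observed. How the expected contacts are counted (for example, proton pairs within some distance cutoff in the structure) is left outside this sketch; the function name, the dictionaries, and the 50% flagging threshold are illustrative assumptions.

```python
def noe_completeness(observed, expected):
    """Percentage of expected contacts observed, keyed by residue number.

    Residues with no expected contacts get None rather than a percentage.
    """
    result = {}
    for residue, n_expected in expected.items():
        if n_expected == 0:
            result[residue] = None
        else:
            result[residue] = 100.0 * observed.get(residue, 0) / n_expected
    return result

# Hypothetical counts per residue
expected = {10: 20, 11: 25, 12: 0}
observed = {10: 15, 11: 5}
completeness = noe_completeness(observed, expected)

# Flag residues below an arbitrary, illustrative threshold of 50%
flagged = [r for r, c in completeness.items() if c is not None and c < 50.0]
print(completeness, flagged)
```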
22Common selection criteria for NMR structures
- Violations cutoff
  - No distance restraint violations > 0.5 Å
  - No dihedral angle violations > 5°
- Energy
  - Select a sub-ensemble consisting of the lowest energy structures
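The two selection criteria can be sketched side by side. The `Conformer` record, its field names, and the example values are hypothetical; the 0.5 Å and 5° cutoffs are the ones quoted on the slide.

```python
from dataclasses import dataclass

@dataclass
class Conformer:
    name: str
    energy: float                  # arbitrary energy units
    max_distance_violation: float  # largest distance restraint violation, in Å
    max_dihedral_violation: float  # largest dihedral restraint violation, in degrees

def select_by_violations(conformers, max_dist=0.5, max_dihedral=5.0):
    """Keep conformers with no violation above either cutoff."""
    return [c for c in conformers
            if c.max_distance_violation <= max_dist
            and c.max_dihedral_violation <= max_dihedral]

def select_by_energy(conformers, n_keep):
    """Keep the n_keep lowest-energy conformers."""
    return sorted(conformers, key=lambda c: c.energy)[:n_keep]

ensemble = [
    Conformer("m1", -1200.0, 0.30, 3.0),
    Conformer("m2", -1250.0, 0.70, 2.0),   # lowest energy, but violates a distance
    Conformer("m3", -1100.0, 0.20, 6.5),   # violates a dihedral
    Conformer("m4", -1180.0, 0.45, 4.9),
]
by_violations = [c.name for c in select_by_violations(ensemble)]
by_energy = [c.name for c in select_by_energy(ensemble, n_keep=2)]
print(by_violations, by_energy)
```

In this toy ensemble the two criteria pick different sub-ensembles, which is why the selection rule needs to be reported along with the structure statistics.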
23Examples of selected conformers
rmsd = 3.04
rmsd = 0.82
rmsd = 0.77
energy cutoff
violations cutoff
24An example of structure statistics
25Protein structure properties used for validation
- Bond lengths, bond angles, chirality, omega angles, side chain planarity
- Ramachandran plot, rotameric states, packing quality, backbone conformation
- Inter-atomic bumps, buried hydrogen-bonds, electrostatics
Adapted from Chris Spronk
26Bonded geometry
Distorted Cα-chirality
L-amino acid
D-amino acid
27Rotameric states
Eclipsed
Staggered
28Inter-atomic bumps
Overlap of two backbone atoms
29Omega angles
Trans-configuration (omega = 180°)
Cis-configuration (omega = 0°)
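A check of this kind can be sketched by computing the omega dihedral from the four atoms Cα(i), C(i), N(i+1), Cα(i+1) and comparing it with 180° and 0°. The 30° tolerance and the toy planar coordinates are assumptions for illustration, not values from any validation program.

```python
import math

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points, in (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))

def classify_omega(omega_deg, tolerance=30.0):
    """Label a peptide bond as trans, cis, or distorted (tolerance is assumed)."""
    if 180.0 - abs(omega_deg) <= tolerance:
        return "trans"
    if abs(omega_deg) <= tolerance:
        return "cis"
    return "distorted"

# Toy planar coordinates standing in for CA(i), C(i), N(i+1), CA(i+1)
trans_omega = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, -1.0, 0.0))
cis_omega = dihedral((0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0))
print(classify_omega(trans_omega), classify_omega(cis_omega))
```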
30Side chain planarity
Planar Arg side-chain (good)
Non-planar Arg side-chain (bad)
31Internal hydrogen bonding should be satisfied
Internal hydrogen bonding in crambin
32Electrostatics should be reasonable
After energy minimization including electrostatics
Bad electrostatics
33Packing quality
Bad packing
Good packing
34Backbone conformation
Normal
Unique
35Backbone angles should lie in favorable regions
of the Ramachandran plot
Phi and psi angles
Ramachandran plot
36Example of decrease in number of violations
following refinement in explicit solvent
(From Nabuurs et al., 2004)
37Examples of tools available for assessing
structural quality
- AQUA and PROCHECK-NMR
  - Laskowski et al. (1996) J Biomol NMR 8, 477
  - Useful graphical and text output
- WHAT IF
  - http://swift.cmbi.ru.nl/whatif/
  - More checks and more critical checks
- QUEEN
  - Nabuurs et al. (2003) J Am Chem Soc 125, 12026
  - Check of input constraints
- PSVS
  - Bhattacharya et al. (2007) Proteins 66, 778
  - Bundles several tools
  - Provides an extensive report
38Information content of distance constraints from
QUEEN
- QUantitative Evaluation of Experimental NMR constraints (QUEEN)
- Method for evaluating distance constraints from distance matrices
- Quantifies information contained in distance constraints
- Identifies the relative contribution of each constraint to the structure determination
- QUEEN identifies
  - Important restraints
  - Unique restraints
  - Redundant restraints
39Example of a WHAT IF summary report
40Protein Structure Validation Software (PSVS)
Bhattacharya, Tejero, Montelione (2007) Proteins 66, 778
41Protein structure validation software suite (PSVS)
Bhattacharya, Tejero, Montelione (2007) Proteins 66, 778
42Poorly defined regions are excluded from analysis
Bhattacharya, Tejero, Montelione, Proteins
(2007)
43Example of PSVS report
Bhattacharya, Tejero, Montelione, Proteins
(2007)
44Correlation between ProCheck and MolProbity Z
scores
45ProCheck and MolProbity Z scores
Following NMR Structure Refinement
X-ray
NMR
- Why is NMR different from X-ray?
  - Solution structure
  - Multiple conformational states?
  - Less accurate structures?
Bhattacharya, Tejero, Montelione, Proteins
(2007)
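A Z score of this kind expresses a raw quality measure relative to the mean and standard deviation of a reference set. The sketch below assumes the reference set is a collection of values from trusted structures (for example, high-resolution crystal structures); the function name and all numbers are hypothetical.

```python
import statistics

def z_score(value, reference_values, higher_is_better=True):
    """Standardize a quality measure against a reference distribution.

    With higher_is_better=False the sign is flipped so that negative Z
    always means worse than the reference set.
    """
    mean = statistics.fmean(reference_values)
    sd = statistics.stdev(reference_values)
    z = (value - mean) / sd
    return z if higher_is_better else -z

# Hypothetical reference values: percent of residues in most favoured regions
reference = [88.0, 90.0, 92.0, 91.0, 89.0]
z = z_score(82.0, reference)
print(f"Z = {z:.2f}")
```

Putting different quality measures on a common Z scale is what makes scores from separate programs comparable on one plot.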
46RPF quality scores
3D Structure
NOESY Peak List / Assignment List
Global and Local measures of the fit of NOESY
peak list data with 3D structure.
Violations map to the 3D structure and to the
NOESY spectrum
Essentially, a comparison of calc and observed
contact maps
Huang, Powers, Montelione (2005) J. Am. Chem. Soc. 127, 1665
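The comparison of calculated and observed contact maps can be sketched with plain sets. This is a simplified illustration that treats recall as the fraction of observed NOESY contacts explained by the structure, precision as the fraction of short proton-proton contacts in the structure that were observed, and F as their harmonic mean. It ignores peak ambiguity and is not the published RPF implementation; the function name and atom labels are hypothetical.

```python
def rpf_scores(observed_pairs, structure_pairs):
    """Return (recall, precision, f_measure) for two lists of atom pairs."""
    observed = {frozenset(p) for p in observed_pairs}    # order-independent pairs
    predicted = {frozenset(p) for p in structure_pairs}
    if not observed or not predicted:
        return 0.0, 0.0, 0.0
    matched = observed & predicted
    recall = len(matched) / len(observed)
    precision = len(matched) / len(predicted)
    if not matched:
        return recall, precision, 0.0
    f_measure = 2.0 * recall * precision / (recall + precision)
    return recall, precision, f_measure

# Hypothetical contacts from a NOESY peak list and from short distances in a model
observed_pairs = [("A1.HN", "A2.HN"), ("A1.HA", "A2.HN"),
                  ("A3.HN", "A9.HB"), ("A4.HA", "A7.HN")]
structure_pairs = [("A2.HN", "A1.HN"), ("A1.HA", "A2.HN"), ("A4.HA", "A7.HN"),
                   ("A5.HN", "A6.HN"), ("A5.HA", "A8.HN")]
recall, precision, f_measure = rpf_scores(observed_pairs, structure_pairs)
print(f"R = {recall:.2f}, P = {precision:.2f}, F = {f_measure:.2f}")
```

Unmatched observed pairs point to peaks the structure cannot explain, and unmatched structure pairs point to expected peaks that are missing from the data.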
47Validation at the wwPDB
- PDB
  - Completeness
  - Check of coordinates
  - Nomenclature, ligands
  - Accept restraints, but pass them directly to BMRB
- BMRB
  - Completeness
  - Nomenclature and self consistency
  - Chemical shift ranges (AVS from Montelione)
  - Chemical shift referencing
  - Consistency of restraints and structure
48Summary and prospects
- Much of the work in developing approaches to validating NMR structures has taken place in Europe
- Tools are now available that can avoid problems if used intelligently
- Additional approaches are on the horizon
- Authors should be encouraged to validate their constraints and structures prior to data deposition in the wwPDB
- Centralized servers could facilitate this
- Authors are strongly encouraged to deposit restraints, peak lists, NOESY spectra, and raw data (including time-domain data) to BMRB so that structures can be checked by others and recalculated as improved methods become available