Title: NMR Structure Refinement
1. A Knowledge-Based, Structural Bioinformatics Approach to NMR Structure Refinement
Zhijun Wu
Department of Mathematics, Program on Bioinformatics and Computational Biology
Iowa State University, Ames, Iowa, USA
2. Fundamental Problem in NMR Structure Determination
Given a set of distance constraints obtained from
NMR, find the coordinates of the atoms (and hence
the structure of the protein) satisfying the
distance constraints.
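To make "satisfying the distance constraints" concrete, the following minimal sketch checks a candidate set of coordinates against lower and upper distance bounds. The function name `violated_constraints`, the `(i, j, lower, upper)` tuple layout, the tolerance, and the sample numbers are all assumptions made for illustration; they do not come from the talk.

```python
import math

def violated_constraints(coords, constraints, tol=1e-6):
    """Return the constraints (i, j, lower, upper) that the coordinates do not satisfy."""
    violations = []
    for i, j, lower, upper in constraints:
        d = math.dist(coords[i], coords[j])  # Euclidean distance between atoms i and j
        if d < lower - tol or d > upper + tol:
            violations.append((i, j, lower, upper))
    return violations

# Hypothetical three-atom example (coordinates and bounds in angstroms)
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 2.0, 0.0)]
constraints = [(0, 1, 1.0, 2.0), (0, 2, 3.0, 4.0), (1, 2, 1.0, 1.8)]
print(violated_constraints(coords, constraints))
```

A structure solves the problem in this sense when the returned list is empty.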
3. Distance Geometry Problem
Given n atoms a_1, ..., a_n and a set of distances d_{i,j} between a_i and a_j, find the coordinates x_1, ..., x_n of the atoms such that ||x_i - x_j|| = d_{i,j}.
(Blumenthal 1953, Torgerson 1958, Crippen and Havel 1988)
4. Solutions to Various Distance Geometry Problems
- Problems with all distances: solvable in O(n^3) using SVD (Crippen and Havel 1988)
- Problems with all distances: solvable in O(n) using geometric build-up (GBU) (Dong and Wu 2002)
- Problems with sparse sets of distances: NP-complete (Saxe 1979)
- Problems with distance ranges (NMR data): NP-complete (Moré and Wu 1997), if the ranges are small
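The linear-time result for the all-distances case rests on placing atoms one at a time. The sketch below shows a single such placement step under stated assumptions: four already-placed, non-coplanar atoms and exact distances from them to the new atom. The helper names `det3` and `place_atom` are hypothetical, and this is an illustration of the idea rather than the published algorithm.

```python
import math

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def place_atom(base, dists):
    """Locate one atom from exact distances to four known, non-coplanar atoms.

    Subtracting the sphere equation |x - b0|^2 = d0^2 from the other three
    leaves a 3x3 linear system A x = rhs, solved here with Cramer's rule.
    """
    b0, d0 = base[0], dists[0]
    n0 = sum(c * c for c in b0)
    A, rhs = [], []
    for b, d in zip(base[1:], dists[1:]):
        A.append([2.0 * (bc - b0c) for bc, b0c in zip(b, b0)])
        rhs.append(sum(c * c for c in b) - n0 - d * d + d0 * d0)
    det = det3(A)
    if abs(det) < 1e-12:
        raise ValueError("base atoms are coplanar; the position is not unique")
    x = []
    for k in range(3):
        # Replace column k of A with rhs (Cramer's rule)
        Ak = [row[:k] + [rhs[r]] + row[k + 1:] for r, row in enumerate(A)]
        x.append(det3(Ak) / det)
    return tuple(x)

# Hypothetical example: recover a known position from its four distances
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
target = (0.3, -0.7, 1.2)
dists = [math.dist(target, b) for b in base]
placed = place_atom(base, dists)
print(placed)
```

Each placement costs a constant amount of work, which is why repeating it for every remaining atom gives a running time linear in n when all the needed distances are available.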
5. Remarks
- If the distance data is adequate and accurate,
the problem can be solved efficiently; otherwise,
the solution may be either indeterminate or
intractable.
- If the distance ranges are all that can be
specified, the solution will not be unique, if
not intractable.
- The distance constraints obtained from NMR cover
only a subset of all inter-atomic distances, and
are given in distance ranges.
- The structures determined by NMR are not as
detailed and accurate as those by X-ray
crystallography.
- The structures determined by NMR are not as
fixed as those by X-ray crystallography.
- An ensemble of structures can be determined for
a protein based on the NMR distance data --
structural flexibilities or modeling errors?
6. Ensemble of Structures
[Figure: ensemble of structures and a representative structure]
7. Issues in NMR Structure Refinement
- Inadequate Distance Constraints
- Calculation of Structures
- Interpretation of Structural Variations
8. Deriving Additional Conformational Constraints
- Dipolar Coupling (Tjandra and Bax 1997, Clore and Gronenborn 1998)
- Dihedral Angles in Structural Databases (Kuszewski, Gronenborn, and Clore 1996)
- Distances in Structural Databases (Wall, Subramaniam, and Phillips, Jr. 1999)
9. Inter-Atomic Distances in Structural Databases
[Figure: polypeptide chain diagram showing backbone atoms N, Cα, C with attached H, O, and side chains R]
10. Cross-Residue, Inter-Atomic Distances
Distances   1st Atom   2nd Atom   1st Residue   2nd Residue   Separation
Example     N          C          ALA           ALA           0
Choices     5          5          20            20            2
Total: 20,000 distances (Cα, Cβ, C, N, O; 20 residues; separated by 0 or 1 residue)
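The total of 20,000 follows from multiplying the choices: 5 x 5 x 20 x 20 x 2. A small sketch that enumerates the distance types confirms the count. The PDB-style atom names (`CA`, `CB`) and the tuple ordering are assumptions chosen for illustration.

```python
from itertools import product

# Five atom types and the 20 standard residues (three-letter codes)
ATOMS = ["CA", "CB", "C", "N", "O"]
RESIDUES = ["ALA", "ARG", "ASN", "ASP", "CYS", "GLN", "GLU", "GLY", "HIS", "ILE",
            "LEU", "LYS", "MET", "PHE", "PRO", "SER", "THR", "TRP", "TYR", "VAL"]
SEPARATIONS = [0, 1]

# One tuple (A1, A2, R1, R2, S) per distance type
distance_types = list(product(ATOMS, ATOMS, RESIDUES, RESIDUES, SEPARATIONS))
print(len(distance_types))  # 20000
```

Since glycine has no Cβ atom, some of these types would have no samples in practice.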
11. Distributions of the Distances
P_{A1,A2,R1,R2,S}(D) in Databases of Known Protein Structures
Example: A1 = N, A2 = C, R1 = ALA, R2 = ALA, S = 0
12. Distributions of the Distances
P_{A1,A2,R1,R2,S}(D): probability distribution of cross-residue, inter-atomic distances in databases of known protein structures
A1 = 1st atom, A2 = 2nd atom, R1 = 1st residue, R2 = 2nd residue, S = separation
For any D in [D_i, D_{i+1}],
P_{A1,A2,R1,R2,S}(D) = (number of distances in [D_i, D_{i+1}]) / (total number of distances)
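The bin-counting definition of the distribution can be sketched as a simple histogram for one distance type. The function name `distance_distribution`, the bin width, the half-open bin convention, and the sample distances are all assumptions for illustration, not values from the talk.

```python
def distance_distribution(distances, bin_width=0.5):
    """Estimate P(D) per bin as (distances in the bin) / (total distances).

    Bin i covers the half-open interval [i * bin_width, (i + 1) * bin_width).
    """
    if not distances:
        raise ValueError("no distances observed for this distance type")
    counts = {}
    for d in distances:
        i = int(d // bin_width)
        counts[i] = counts.get(i, 0) + 1
    total = len(distances)
    return {i: c / total for i, c in sorted(counts.items())}

# Made-up sample of distances (angstroms) for a single (A1, A2, R1, R2, S) type
sample = [2.3, 2.4, 2.6, 2.7, 2.8, 3.1, 2.45, 2.9]
profile = distance_distribution(sample)
print(profile)  # {4: 0.375, 5: 0.5, 6: 0.125}
```

The probabilities sum to 1 by construction, and a full profile would hold one such histogram per distance type.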
13. Database Profiles of Distributions of Distances
2150 X-ray structures from the Protein Data Bank (PDB), with resolution of 2 Å or
better and sequence similarity of 90% or less, are utilized.
The probability distributions of 20,000
short-range, cross-residue, inter-atomic
distances (S = 0 or 1) are profiled from these structures.
14. Inter-Atomic Distances in NMR Structures
A survey of 462 NMR structures shows that
cross-residue, inter-atomic distances in the
structures deviate significantly (by more than two
standard deviations) from the means of their
distributions in structural databases.
15. Deviations of Distances in NMR-Determined Structures
Sample atomic pairs (A1 and A2) across some of
the residues (R1 and R2) in NMR structure 2GB1
with distances (D) deviating by more than two
standard deviations (STD) from the means (Mean)
of their distributions in known protein
structures.
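A check of this kind can be sketched as a lookup of each observed distance against the database mean and standard deviation for its distance type. The function name `flag_deviations` and the data layout are assumptions, and the profile numbers below are made up for illustration; they are not taken from any database or from 2GB1.

```python
def flag_deviations(observed, profiles, n_std=2.0):
    """Return the distances lying more than n_std standard deviations from the mean."""
    flagged = []
    for key, d in observed:
        mean, std = profiles[key]  # key = (A1, A2, R1, R2, S)
        if abs(d - mean) > n_std * std:
            flagged.append((key, d, mean, std))
    return flagged

# Made-up (mean, STD) values in angstroms, for illustration only
profiles = {
    ("N", "C", "ALA", "ALA", 0): (2.45, 0.05),
    ("CA", "CA", "ALA", "GLY", 0): (3.80, 0.04),
}
observed = [
    (("N", "C", "ALA", "ALA", 0), 2.48),   # within 2 STD of the mean
    (("CA", "CA", "ALA", "GLY", 0), 3.95), # more than 2 STD from the mean
]
flagged = flag_deviations(observed, profiles)
print(flagged)
```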
16. NMR Structure Refinement with Database-Derived Distance Constraints
Mean ± 2 STD are used as lower and upper bounds
for distances (among Cα, Cβ, C, N, O with S = 0 or 1).
Ten structures (1EPH, 1GB1, 1IGL, 2IGG,
2SOB, 1CEY, 1CRP, 1E8L, 1ITL, 1PFL) are tested,
using experimental data from BioMagResBank.
The dynamic simulated annealing protocol in CNS
is used for the refinement.
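The bounds described on this slide can be sketched as follows for one distance type. The function name `database_bounds`, the use of the population standard deviation, the clamp at zero, and the sample values are assumptions for illustration; how the bounds are passed to CNS is not shown here.

```python
import statistics

def database_bounds(samples, n_std=2.0):
    """Return (lower, upper) = mean -/+ n_std * STD for one distance type."""
    mean = statistics.mean(samples)
    std = statistics.pstdev(samples)  # population STD; an assumption of this sketch
    lower = max(0.0, mean - n_std * std)  # a distance cannot be negative
    return lower, mean + n_std * std

# Made-up database distances (angstroms) for a single distance type
samples = [2.40, 2.44, 2.46, 2.50]
lower, upper = database_bounds(samples)
print(round(lower, 3), round(upper, 3))
```

A tighter or looser restraint can be obtained by changing `n_std`; the talk uses two standard deviations.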
17. Incorrectly Formed Cross-Residue, Inter-Atomic Distances
Numbers of incorrectly formed cross-residue,
inter-atomic distances versus numbers of affected
residue pairs for structures refined with and
without database distance constraints (DDD).
18. Acceptance Rates of the Refined NMR Structures
The acceptance rates for two ensembles of NMR
structures, 1E8L on the left and 1IGL on the
right, refined with (green line) and without
(blue line) database distance constraints.
19. RMSD Values of the Ensembles of Refined NMR Structures
The means and standard deviations of the RMSD
values of the structure ensembles refined with
and without database distance constraints.
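As a rough illustration of how such ensemble statistics could be computed, the sketch below takes the RMSD over all pairs of structures in an ensemble and reports the mean and standard deviation. It assumes the structures are already superimposed and list the same atoms in the same order; the pairwise scheme, the function names, and the toy coordinates are assumptions, since the talk does not state how its RMSD values were obtained.

```python
import math
import statistics
from itertools import combinations

def rmsd(a, b):
    """RMSD between two equally long, pre-superimposed coordinate lists."""
    if len(a) != len(b) or not a:
        raise ValueError("structures must be non-empty and of equal length")
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))

def ensemble_rmsd_stats(ensemble):
    """Mean and population STD of the RMSD over all pairs in the ensemble."""
    values = [rmsd(a, b) for a, b in combinations(ensemble, 2)]
    return statistics.mean(values), statistics.pstdev(values)

# Toy ensemble: three two-atom "structures" shifted along z
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)],
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],
]
mean_rmsd, std_rmsd = ensemble_rmsd_stats(ensemble)
print(mean_rmsd, std_rmsd)
```

The ensemble must contain at least two structures, otherwise there are no pairs to average.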
20. Refined NMR Structures Compared with Their X-ray Structures
The means and standard deviations of the RMSD
values for the ensembles of NMR structures
compared with their X-ray structures.
21. 2IGG: Immunoglobulin G binding domain of protein
G, a cell-surface protein from the pathogenic
bacterium Streptococcus strain G148
2IGG
Number of residues: 64; number of atoms: 973
Number of NOE distance constraints: 445; number
of angle constraints: 31 φ and 9 χ1
1FCC
Lian, Derrick, Sutcliffe, Yang, and Roberts, 1992
22. NMR and X-ray Crystal Structures of 2IGG
The NMR structures are refined with (green line)
and without (red line) additional database
distance constraints. They are compared against
the structure determined by X-ray crystallography
(blue line).
23. Residue-Residue Comparisons between Refined NMR and X-ray Structures
The residue RMSD values for an accepted structure
(left) and an averaged and energy minimized
structure (right) of 2IGG refined with (red line)
and without database distance constraints (blue
line).
24. 1FO7: E200K variant of human prion protein,
related to the study of spongiform
encephalopathies
1FO7
Number of residues: 107; number of atoms: 1733
Number of NOE distance constraints: 516;
J-coupling constraints: 44
1FKC
Zhang, Swietnicki, Zagorski, Surewicz,
Sönnichsen 2000
25. Tertiary Structure of Prion Protein
http://content.karger.com/ProdukteDB/Katalogteile/isbn3_8055/_76/_56/prions_01.pdf
26. Ramachandran Plot of Unimproved Structure
85% in most favorable regions
27. Ramachandran Plot of Improved Structure
90% in most favorable regions
28. Superimposed Structures of Loop Regions
29. Remarks
- Distance constraints are derived based on the
distance distributions in databases of known
protein structures.
- NMR structures can be improved when the
distances are confined to the most probable
ranges according to their distributions.
- The constraints can be derived similarly for
longer-range distances and to include other types
of atoms as well.
- More efficient distance geometry algorithms can
be developed for structure calculations.
- The structural fluctuations should be
distinguished from the structural variations that
originate from the modeling errors.
30. Acknowledgements
- Feng Cui, Program on Bioinformatics and
Computational Biology, Iowa State University
- Robert Jernigan, Lawrence Baker Center for
Bioinformatics and Biological Statistics, Iowa
State University
- Kriti Mukhopadhyay, Department of Mathematics,
Iowa State University
- Di Wu, Program on Bioinformatics and
Computational Biology, Iowa State University
- Wonbin Young, Newlink Genetics, Inc., Iowa State
University Research Park
- F. Cui, R. Jernigan, Z. Wu, J. Bioinformatics and
Computational Biology (2005), published
- F. Cui, K. Mukhopadhyay, W. Young, R. Jernigan,
Z. Wu, J. Applied Bioinformatics (2006), under
review
- D. Wu, F. Cui, R. Jernigan, Z. Wu, Nucleic Acids
Research (2006), to submit