Title: Introduction to Computational Structural Biology
1Mathematical / Computational Problems for Protein Structure Prediction and Determination
Zhijun Wu
Department of Mathematics
Graduate Program on Bioinformatics and Computational Biology
Iowa State University
June 14, 2001
National Institute of Applied Sciences - Rouen
Mont Saint Aignan Cedex, France
2Some Fundamental Biological Questions
- Question 1: Given a protein or DNA molecule, what is the geometric structure of the molecule?
- Question 2: Why and how does a protein fold into a unique three-dimensional structure?
- Question 3: Given a set of distances between pairs of atoms, how can we determine the coordinates of the atoms?
- Question 4: Given the magnitudes of the structure factors of a protein, how can we determine the phases of the structure factors?
- Question 5: Given two proteins, how can we compare their geometric structures?
- Question 6
Watson and Crick, 1962
Protein folding
Anfinsen, 1972
NMR
Karle and Hauptman, 1985
X-ray Crystallography
Mathematical Answers?
3Biological Building Blocks DNA, RNA, Protein
DNA
GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG
RNA
GAA GUU GAA AAU CAG GCG AAC CCA CGA CUG
Protein
GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU
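The DNA, RNA, and protein lines above are related by transcription and translation, which can be sketched in a few lines. This is only an illustration: it assumes the DNA string is the coding strand (so transcription amounts to replacing T with U), it lists only the nine codons that occur in this example rather than the full genetic code, and the names CODON_TO_RESIDUE, transcribe, and translate are made up for the sketch.

```python
# Partial codon table: only the codons that occur in the slide's example
# (an assumption of this sketch; the full genetic code has 64 codons).
CODON_TO_RESIDUE = {
    "GAA": "GLU", "GUU": "VAL", "AAU": "ASN", "CAG": "GLN", "GCG": "ALA",
    "AAC": "ASN", "CCA": "PRO", "CGA": "ARG", "CUG": "LEU",
}


def transcribe(dna_codons):
    """Turn coding-strand DNA codons into RNA codons (T becomes U)."""
    return [codon.replace("T", "U") for codon in dna_codons]


def translate(rna_codons):
    """Map each RNA codon to its three-letter amino acid code."""
    return [CODON_TO_RESIDUE[codon] for codon in rna_codons]


dna = "GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG".split()
rna = transcribe(dna)
protein = translate(rna)
print(" ".join(rna))
print(" ".join(protein))
```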
4Adenine
5Cytosine
6Guanine
7Thymine
8Uracil
9[Figure: genetic code table indexed by first base (U, C, A, G), second base (U, C, A, G), and third base (U, C, A, G)]
10Alanine (Ala), Arginine (Arg)
11Asparagine (Asn), Aspartate (Asp)
12Cysteine (Cys), Glutamate (Glu)
13Glycine (Gly), Glutamine (Gln)
14Histidine (His), Isoleucine (Ile)
15Leucine (Leu), Lysine (Lys)
16Methionine (Met), Phenylalanine (Phe)
17Proline (Pro), Serine (Ser)
18Threonine (Thr), Tryptophan (Trp)
19Tyrosine (Tyr), Valine (Val)
20Protein Folding
[Figure: residue labels of the chain GLU, VAL, GLU, ASN, GLN, ALA, ASN, PRO, ARG, LEU, . . . shown in two spatial arrangements]
21HIV Reverse Transcriptase
554 amino acids
4200 atoms
22HIV Reverse Transcriptase
554 amino acids
4200 atoms
23Structure Prediction and Determination
Molecular Dynamics Simulation
Potential Energy Minimization
Nuclear Magnetic Resonance
X-ray Crystallography
24Molecular Dynamics Simulation
Physical Model
Mathematical Model
25Molecular Dynamics Simulation
Numerical Solution
Computer Simulation
26Simulation of Folding -- An Initial Value Problem
Time step: femtoseconds. Folding: seconds or longer.
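The initial value problem can be illustrated with a toy integrator: given positions and velocities at time zero, Newton's equation m x'' = F(x) is advanced one small time step at a time. The sketch below assumes the velocity Verlet scheme (a common choice in MD codes, not necessarily the one shown on the slide) and a one-dimensional harmonic force standing in for a real force field; the function name and parameters are illustrative.

```python
import math


def velocity_verlet(force, x, v, mass, dt, n_steps):
    """Integrate m * x'' = force(x) from the initial state (x, v)."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance the position
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance the velocity
        a = a_new
    return x, v


# Toy system: unit mass on a unit spring, exact solution x(t) = cos(t).
x_end, v_end = velocity_verlet(lambda x: -x, x=1.0, v=0.0,
                               mass=1.0, dt=0.01, n_steps=1000)
energy = 0.5 * v_end ** 2 + 0.5 * x_end ** 2
print(x_end, math.cos(10.0), energy)
```

The gap in time scales is what makes folding simulations hard: with a step of one femtosecond, reaching one second of simulated time takes 10^15 steps.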
27Molecular Dynamics Simulation Software
- CHARMM, M. Karplus et al., Harvard University
- AMBER, P. Kollman et al., UC San Francisco
- XPLOR, A. Brünger et al., Yale University
- GROMOS, W. van Gunsteren et al., ETH Zürich
28Potential Energy Minimization
- minimization of potential energy in conformation space
- local / global optimization: nonlinear, unconstrained, continuous
- example: Lennard-Jones
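As a concrete instance of the Lennard-Jones example, the sketch below evaluates the energy of a small cluster. It assumes the common 12-6 form 4 * epsilon * ((sigma / r)^12 - (sigma / r)^6) in reduced units; the slide's own formula is not reproduced in this text, so the exact scaling is an assumption, and the function names are illustrative.

```python
import math
from itertools import combinations


def lj_pair(r, epsilon=1.0, sigma=1.0):
    """12-6 Lennard-Jones energy of one pair at distance r (assumed form)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def lj_energy(coords, epsilon=1.0, sigma=1.0):
    """Total energy of a cluster: sum of pair energies over all atom pairs."""
    return sum(lj_pair(math.dist(p, q), epsilon, sigma)
               for p, q in combinations(coords, 2))


# The pair energy is minimal (-epsilon) at r = 2^(1/6) * sigma, so an
# equilateral triangle with that side length has energy -3 * epsilon.
r_min = 2.0 ** (1.0 / 6.0)
triangle = [(0.0, 0.0, 0.0),
            (r_min, 0.0, 0.0),
            (r_min / 2.0, r_min * math.sqrt(3.0) / 2.0, 0.0)]
print(lj_pair(r_min), lj_energy(triangle))
```

Minimizing lj_energy over all 3n coordinates is the unconstrained, continuous, nonlinear problem the slide refers to; the number of local minima grows quickly with the number of atoms.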
29Protein Energy Function
30Multi-Start Search
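Multi-start search runs a local minimizer from many starting points and keeps the best result. The sketch below is a minimal illustration under stated assumptions: a one-dimensional double-well function stands in for the protein energy, plain gradient descent stands in for the local minimizer, and the starting points are drawn uniformly from [-2, 2]. None of these choices come from the slides.

```python
import random


def f(x):
    """Toy double-well energy: global minimum near -1.30, local one near 1.13."""
    return x ** 4 - 3.0 * x ** 2 + x


def grad_f(x):
    return 4.0 * x ** 3 - 6.0 * x + 1.0


def local_minimize(x, step=0.01, n_iter=5000):
    """Fixed-step gradient descent; converges to the minimum of the start's basin."""
    for _ in range(n_iter):
        x -= step * grad_f(x)
    return x


def multi_start(n_starts=10, seed=0):
    """Run the local minimizer from random starts and keep the lowest minimum."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_starts):
        x = local_minimize(rng.uniform(-2.0, 2.0))
        if f(x) < best_y:
            best_x, best_y = x, f(x)
    return best_x, best_y


best_x, best_y = multi_start()
print(best_x, best_y)
```

A single start from x = 2 would stop in the shallower right-hand well; the multi-start run finds the deeper one as soon as any start falls in its basin.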
31Simulated Annealing
High Temperature
Algorithm
  input initial x0, y0 = f(x0)
  set x = x0, y = y0
  for T = T0, T1, ..., Tm (decreasing)
    for k = 1, ..., n
      x1 = perturb(x0)
      y1 = f(x1)
      dy = y1 - y0
      e = exp(-dy / T)
      if (rand < e)
        x0 = x1, y0 = y1
      end
      update x, y
    end
  end
Low Temperature
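The annealing loop on this slide can be written as runnable code. The sketch below follows the same structure (outer loop over decreasing temperatures, inner loop of perturb, evaluate, accept or reject) under some assumptions of its own: the "update x, y" step is taken to mean keeping the best point seen so far, the test function is a one-dimensional double well, the perturbation is a Gaussian step, and the cooling schedule is geometric.

```python
import math
import random


def simulated_annealing(f, x_start, temperatures, n_inner, perturb, rng):
    """Return the best point and value seen while annealing from x_start."""
    x0, y0 = x_start, f(x_start)
    best_x, best_y = x0, y0
    for T in temperatures:                      # decreasing temperatures
        for _ in range(n_inner):
            x1 = perturb(x0, rng)
            y1 = f(x1)
            dy = y1 - y0
            # Same rule as "rand < exp(-dy / T)": downhill moves always pass.
            # Testing dy <= 0 first avoids overflow in exp at low T.
            if dy <= 0.0 or rng.random() < math.exp(-dy / T):
                x0, y0 = x1, y1
            if y0 < best_y:                     # "update x, y": keep the best
                best_x, best_y = x0, y0
    return best_x, best_y


def f(x):
    """Toy double-well energy (an assumption of this sketch)."""
    return x ** 4 - 3.0 * x ** 2 + x


rng = random.Random(0)
temperatures = [2.0 * 0.7 ** i for i in range(20)]   # assumed geometric cooling
best_x, best_y = simulated_annealing(
    f, 2.0, temperatures, 200, lambda x, r: x + r.gauss(0.0, 0.5), rng)
print(best_x, best_y)
```

At high temperature almost every move is accepted, so the walk crosses energy barriers freely; at low temperature uphill moves are rarely accepted and the walk settles into a minimum.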
32Global Smoothing and Continuation
33Gaussian Transformation
Scheraga et al., 1989, 1992; Shalloway, 1992; Straub et al., 1996; Wu, 1996; Moré and Wu, 1997
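The effect of a Gaussian transformation can be seen on a one-dimensional toy function. The sketch below assumes the transform is the Gaussian-kernel average (1 / (sqrt(pi) * lam)) * integral of f(t) * exp(-((x - t) / lam)^2) dt, with lam controlling the amount of smoothing; the slide's own formula is not reproduced in this text, so this normalization is an assumption. The integral is approximated with the trapezoid rule, and the test function and names are illustrative.

```python
import math


def f(x):
    """Toy objective with several local minima on [-3, 3]."""
    return x * x + 2.0 * math.cos(5.0 * x)


def gaussian_transform(func, x, lam, n_quad=400, width=8.0):
    """Trapezoid-rule approximation of the Gaussian average of func around x."""
    h = 2.0 * width * lam / n_quad
    total = 0.0
    for i in range(n_quad + 1):
        t = x - width * lam + i * h
        weight = 0.5 if i in (0, n_quad) else 1.0
        total += weight * func(t) * math.exp(-((x - t) / lam) ** 2)
    return total * h / (math.sqrt(math.pi) * lam)


def count_local_minima(values):
    """Count interior grid points that lie strictly below both neighbours."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i] < values[i - 1] and values[i] < values[i + 1])


grid = [i / 100.0 for i in range(-300, 301)]
original = [f(x) for x in grid]
smoothed = [gaussian_transform(f, x, 1.0) for x in grid]
print(count_local_minima(original), count_local_minima(smoothed))
```

For this test function the transform has a closed form, x^2 + lam^2 / 2 + 2 * cos(5x) * exp(-25 * lam^2 / 4): the oscillating term is damped while the quadratic trend survives. A continuation method would minimize the smoothed function first and then trace the minimizer back as lam is reduced to zero.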
34NMR Distance-Based Structural Modeling
Bond Lengths / Angles
NMR Distance Data
Structure
35Molecular Distance Geometry Problem
Given distances between certain pairs of atoms in
the molecule, find the coordinates of the atoms.
36Graph Embedding
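Given a weighted graph $G = (V, E, W)$, where $V = \{v_i : i = 1, \ldots, n\}$, $E = \{(v_i, v_j) : (i, j) \in S\}$, and $W = \{w_{i,j} = w(v_i, v_j) : (v_i, v_j) \in E\}$, find points $x_1, \ldots, x_n$ in $\mathbb{R}^k$ such that $\|x_i - x_j\| = w_{i,j}$ for all $(i, j) \in S$.
[Figure: a graph on v1, v2, v3 with edge weights 3, 4, 5 and its embedding as points x1, x2, x3]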
37Under-Determined System
kn -- total number of coordinates
k(k+1)/2 -- rigid-motion degrees of freedom: k translations, k(k-1)/2 rotations
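The counting argument can be made concrete. The sketch below computes how many coordinates remain once rigid motions are factored out, which is the number of independent distance constraints needed before the structure can be pinned down; with fewer, the system is under-determined. The function names are illustrative, and the count is a necessary condition only, assuming n is at least k + 1.

```python
def rigid_motion_dof(k):
    """k translations plus k(k-1)/2 rotations in k dimensions."""
    return k * (k + 1) // 2


def internal_dof(n, k):
    """Coordinates of n points in k dimensions, minus rigid motions."""
    return k * n - rigid_motion_dof(k)


def count_pairs(n):
    """Number of distinct atom pairs, i.e. the most distances one could have."""
    return n * (n - 1) // 2


for n in (4, 10, 100):
    print(n, internal_dof(n, 3), count_pairs(n))
```

In three dimensions, four atoms have 12 - 6 = 6 internal degrees of freedom, which is exactly the number of pairs among four atoms; for 100 atoms the count is 294, far fewer than the 4950 possible pairs.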
38Over-Determined System
39Inconsistent Data
[Figure: a graph on v1, v2, v3 with edge weights 3, 4, and 8, and candidate points x1, x2, x3]
The triangle inequality, c ≤ a + b, may be violated!
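A quick consistency check on distance data is to test the triangle inequality on every triple of atoms whose three mutual distances are known. The sketch below is a minimal version that assumes a full symmetric distance matrix given as a list of lists; the function names are illustrative.

```python
from itertools import combinations


def satisfies_triangle_inequality(a, b, c):
    """True if three lengths can be the sides of a (possibly flat) triangle."""
    return a <= b + c and b <= a + c and c <= a + b


def violated_triples(d):
    """Index triples (i, j, k) whose distances in matrix d cannot be realized."""
    n = len(d)
    return [(i, j, k) for i, j, k in combinations(range(n), 3)
            if not satisfies_triangle_inequality(d[i][j], d[j][k], d[i][k])]


consistent = [[0, 3, 4],
              [3, 0, 5],
              [4, 5, 0]]
inconsistent = [[0, 3, 4],
                [3, 0, 8],
                [4, 8, 0]]
print(violated_triples(consistent), violated_triples(inconsistent))
```

With sides 3, 4, and 5 the triangle can be drawn; with sides 3, 4, and 8 it cannot, since 8 > 3 + 4. Passing this check is necessary for an embedding to exist but not sufficient.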
40Flexible Structures
[Figure: a graph on v1, v2, v3, v4 and several embeddings x1, x2, x3, x4 that satisfy the same distance constraints]
The structure can be deformed continuously
without violating any distance constraints.
41Rigid Structures
[Figure: graphs on v1, v2, v3, v4 with additional edges and their embeddings x1, x2, x3, x4]
The structures cannot be deformed any more!
42Reflections
[Figure: a graph on v1, v2, v3, v4 with edge weights 3, 4, 5 and an unspecified distance d, shown with two embeddings x1, x2, x3, x4 related by a reflection]
Rigid, but unique?
Hendrickson 1991
43Algorithms and Complexity
- When all distances are available: solvable in polynomial time
- When only a subset of distances is available: NP-complete
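To see why the full-distance case is easy, the sketch below places atoms one at a time in three dimensions using only closed-form formulas. It is one possible polynomial-time procedure, not necessarily the one the speaker had in mind, and it assumes exact distances, at least four atoms, atoms 0, 1, 2 not collinear, and atom 3 not in their plane. The function name is illustrative.

```python
import math


def embed_from_all_distances(d):
    """Coordinates for n >= 4 atoms from a full, exact distance matrix d."""
    n = len(d)
    d01 = d[0][1]
    # Atom 0 at the origin, atom 1 on the x-axis, atom 2 in the xy-plane.
    x2 = (d01 ** 2 + d[0][2] ** 2 - d[1][2] ** 2) / (2.0 * d01)
    y2 = math.sqrt(max(d[0][2] ** 2 - x2 ** 2, 0.0))
    coords = [(0.0, 0.0, 0.0), (d01, 0.0, 0.0), (x2, y2, 0.0)]
    for j in range(3, n):
        # x and y follow from the distances to atoms 0, 1, and 2.
        x = (d01 ** 2 + d[0][j] ** 2 - d[1][j] ** 2) / (2.0 * d01)
        y = (d[0][2] ** 2 + d[0][j] ** 2 - d[2][j] ** 2 - 2.0 * x * x2) / (2.0 * y2)
        z = math.sqrt(max(d[0][j] ** 2 - x ** 2 - y ** 2, 0.0))
        if j > 3:
            # Atom 3 fixes the orientation; choose the sign of z that
            # better matches the distance to atom 3.
            err_pos = abs(math.dist((x, y, z), coords[3]) - d[3][j])
            err_neg = abs(math.dist((x, y, -z), coords[3]) - d[3][j])
            if err_neg < err_pos:
                z = -z
        coords.append((x, y, z))
    return coords


true_coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0),
               (0.0, 0.0, 1.0), (1.0, 1.0, 1.0), (0.5, -0.3, -0.8)]
d = [[math.dist(p, q) for q in true_coords] for p in true_coords]
coords = embed_from_all_distances(d)
print(coords)
```

Each atom costs a constant amount of work, so the whole embedding is linear in the number of atoms once the distances are given. With only a subset of distances, these formulas are no longer available for every atom, which is where the hardness comes from.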
44ε-Optimal Solutions
45It is NP-hard to obtain an ε-approximate solution to the distance geometry problem when ε < 1/2n, where n is the number of atoms.
-- Moré and Wu 1996
46Least-Squares Formulation
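A least-squares formulation turns the distance geometry problem into minimizing an error function that is zero exactly when every given distance is matched. The sketch below assumes the commonly used form, the sum over known pairs of (||x_i - x_j||^2 - d_ij^2)^2; the slide's own formula is not reproduced in this text, so this choice and the function name are assumptions.

```python
def distance_error(coords, distances):
    """Sum over known pairs of (squared distance - squared target)^2."""
    total = 0.0
    for (i, j), d in distances.items():
        sq = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        total += (sq - d * d) ** 2
    return total


# Known distances of a 3-4-5 triangle, keyed by atom index pairs.
distances = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
exact = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
perturbed = [(0.0, 0.0), (3.0, 0.0), (0.0, 5.0)]
print(distance_error(exact, distances), distance_error(perturbed, distances))
```

Squared distances keep the function smooth everywhere, so it can be handed to the local and global minimization methods discussed earlier.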
47Inexact Distance Data
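When distances are known only approximately, one way to state the problem is with lower and upper bounds on each distance, penalizing only the violations. The sketch below assumes that formulation, with a penalty of max(l^2 - s, 0)^2 + max(s - u^2, 0)^2 for squared distance s and bounds l and u; the slide's own formulation is not reproduced in this text, and the function name is illustrative.

```python
def bound_violation(coords, bounds):
    """Sum of squared violations of lower/upper bounds on squared distances."""
    total = 0.0
    for (i, j), (lower, upper) in bounds.items():
        sq = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        total += max(lower * lower - sq, 0.0) ** 2
        total += max(sq - upper * upper, 0.0) ** 2
    return total


coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]
loose = {(0, 1): (2.5, 3.5), (0, 2): (3.5, 4.5), (1, 2): (4.5, 5.5)}
tight = {(0, 1): (2.5, 3.5), (0, 2): (3.5, 4.5), (1, 2): (5.5, 6.0)}
print(bound_violation(coords, loose), bound_violation(coords, tight))
```

Any structure whose distances all fall inside their bounds has zero penalty, so the set of solutions is typically a region rather than a single point.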
48X-ray Crystallography
X-ray beam
Protein crystal
X-ray diffraction
49Electron Density Distribution
50Magnitudes, Phases, Diffraction Intensities
51The Phase Problem
Given the magnitudes of the structure
factors, find correct phases that define the
electron density distribution function of the
crystal.
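A one-dimensional toy crystal shows why magnitudes alone are not enough. The sketch below assumes the usual convention in which the structure factors are the discrete Fourier coefficients of the density, F_h = sum over x of rho(x) * exp(2 * pi * i * h * x / n), and the density is recovered by the inverse sum; the density values and function names are made up for illustration.

```python
import cmath
import math


def structure_factors(density):
    """Discrete Fourier coefficients of a periodic 1D density."""
    n = len(density)
    return [sum(density[x] * cmath.exp(2j * math.pi * h * x / n)
                for x in range(n)) for h in range(n)]


def density_from(magnitudes, phases):
    """Inverse transform from structure-factor magnitudes and phases."""
    n = len(magnitudes)
    return [sum(magnitudes[h] * cmath.exp(1j * phases[h])
                * cmath.exp(-2j * math.pi * h * x / n)
                for h in range(n)).real / n for x in range(n)]


density = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0]
shifted = density[2:] + density[:2]           # same crystal, moved origin

F = structure_factors(density)
magnitudes = [abs(v) for v in F]
phases = [cmath.phase(v) for v in F]

recovered = density_from(magnitudes, phases)             # correct phases
zero_phase = density_from(magnitudes, [0.0] * len(F))    # phases discarded
shifted_magnitudes = [abs(v) for v in structure_factors(shifted)]
print(recovered)
print(zero_phase)
```

With the correct phases the density comes back exactly; with the phases set to zero the same magnitudes give a different function. The shifted density has identical magnitudes, which shows that the measured intensities cannot distinguish it from the original.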
52Direct Methods
- Karle and Hauptman (1950s), Nobel Prize 1985
  - nonlinear least squares
  - joint probability distribution
  - successful for small molecules
- Bricogne (1984, 1988, 1993, 1997)
  - Bayesian statistical approach
  - statistical mechanics / information theory
  - apply to macromolecules
53Entropy Maximization for Statistical Phase
Estimation
54The Dual Problem
Fast Newton, O(n log n) (Wu, Phillips, Tapia, and Zhang, SIAM Review, 2001)
55Summary
- Introduction
  - DNA, RNA, Protein
  - Structure Prediction and Determination
- Molecular Dynamics Simulation
  - Physical and Mathematical Models
  - Numerical Simulation
  - Simulation of Protein Folding
- Potential Energy Minimization
  - Potential Energy Functions
  - Global Minimization
  - Multi-Start, SA, Global Smoothing
- NMR Distance-Based Modeling
  - Molecular Distance Geometry Problem
  - Geometric Properties
  - Least-Squares Formulation
- X-ray Crystallography Computing
  - Phase Problem
  - Entropy Maximization for Phase Estimation
- Summary