Title: Macromolecular Modeling and Simulation: Problems, Approaches, Challenges
1 Macromolecular Modeling and Simulation: Problems, Approaches, Challenges
Zhijun Wu
Department of Mathematics
Graduate Program on Bioinformatics and Computational Biology
Iowa State University
May 8, 2001
Laboratory of Science and Engineering Computing
Chinese Academy of Sciences
2 Outline of the Talk
- Introduction
  - DNA, RNA, Protein
  - Structure Prediction and Determination
  - Theoretical and Experimental Approaches
- Molecular Dynamics Simulation
  - Physical and Mathematical Models
  - Numerical Simulation
  - Simulation of Folding / Misfolding
- Potential Energy Minimization
  - Potential Energy Functions
  - Global Minimization
  - Global Smoothing and Continuation
- NMR Distance-Based Modeling
  - Molecular Distance Geometry Problem
  - Geometric Properties
  - Least-Squares Formulation
- X-ray Crystallography Computing
  - Phase Problem
  - Entropy Maximization for Phase Estimation
3Biological Building Blocks: DNA, RNA, Protein
DNA
GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG
RNA
GAA GUU GAA AAU CAG GCG AAC CCA CGA CUG
Protein
GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU
4Protein Folding
[Figure: chain of amino acid residues (GLU, VAL, GLU, ASN, GLN, ALA, ASN, PRO, ARG, LEU, . . .)]
5HIV Retrotranscriptase
554 amino acids
4200 atoms
6HIV Retrotranscriptase
4200 atoms
554 amino acids
7Structure Prediction and Determination
Molecular Dynamics Simulation
Potential Energy Minimization
Nuclear Magnetic Resonance
X-ray Crystallography
8 Molecular Dynamics Simulation
Folding can be simulated by following the movement of the atoms in the protein according to Newton's second law of motion.
- The time step has to be small, on the order of femtoseconds, to achieve accuracy.
- Current computing technology can simulate only picoseconds to nanoseconds, while protein folding may take seconds or even longer.
- Molecular dynamics simulation has been used successfully to study other types of dynamical behavior of proteins.
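As a rough illustration of what following the atoms under Newton's second law means numerically, the sketch below integrates m x'' = F(x) with the velocity Verlet scheme. The integrator choice, the one-dimensional harmonic "bond", and all names (`velocity_verlet`, `spring_force`, `total_energy`) are assumptions made for this example; the talk does not say which scheme it uses.

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * x'' = force(x) and return the list of (x, v) states."""
    a = force(x) / mass
    trajectory = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
        trajectory.append((x, v))
    return trajectory


def spring_force(x, k=1.0):
    """Toy force field (an assumption): a single harmonic bond."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the toy system."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


# 1000 steps of size 0.01 cover 10 time units of the oscillator.
traj = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=1.0, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])
print("energy drift:", abs(e_end - e_start))
```

The same loop structure carries over to many atoms, but the number of steps is the obstacle: with femtosecond steps, one second of folding would need on the order of 10^15 iterations.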
9 Potential Energy Minimization
Hypothesis: The native structure of a protein has the lowest, or almost the lowest, potential energy. It can therefore be located at the global minimum of the protein's potential energy.
- A reasonably accurate potential energy function needs to be constructed.
- Given such a function, a local minimizer is easy to find, but a global one is hard, especially if the function has many local minimizers. No completely satisfactory algorithm has yet been developed for minimizing protein energy functions.
- Potential energy minimization has, however, been used successfully for structure refinement.
10 NMR Structure Determination
The NMR approach is based on the fact that nuclei spin and generate magnetic fields. When two nuclei are close, their spins interact. The intensity of the interaction depends on the distance between the nuclei. Therefore, the distances between certain pairs of atoms can be estimated by measuring the intensities of the nuclear spin-spin couplings.
The distance data obtained from the NMR experiment can be used to deduce structural information about the molecule. One way of achieving this is based on molecular distance geometry.
- 15% of the structures in the Protein Data Bank (PDB) were determined by using NMR spectroscopy.
- Not all distances between pairs of atoms can be detected. In practice, moreover, only lower and upper bounds on the distances can be obtained.
- Structure can be determined by solving a distance geometry problem with the distance data from the NMR experiments.
11 X-ray Crystallography Computing
In X-ray crystallography, the protein first needs to be purified and crystallized, which may take months or years to complete, if it succeeds at all. After that, the protein crystal is placed in an X-ray apparatus to produce an X-ray diffraction image. The diffraction image can be used to determine the three-dimensional structure of the protein.
- 80% of the structures in the Protein Data Bank (PDB) were determined by using X-ray crystallography.
- The process is time consuming, and some proteins cannot even be crystallized.
- A mathematical problem, called the phase problem, needs to be solved before any crystal structure can be fully determined from the diffraction data.
12Molecular Dynamics Simulation
Physical Model
Mathematical Model
13Molecular Dynamics Simulation
Numerical Solution
Computer Simulation
14Simulation of Folding -- An Initial Value Problem
Time step: femtoseconds. Folding: seconds or longer.
15Simulation of Misfolding -- A Boundary Value
Problem
Ron Elber, 1996
Stochastic Path Integration / Parallel Multiple Shooting
16Potential Energy Minimization
Minimization of potential energy in conformation space
Local / global optimization: nonlinear, unconstrained, continuous
Example: Lennard-Jones
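To make the Lennard-Jones example concrete, here is a minimal sketch of the pairwise potential in its standard 12-6 form, 4ε[(σ/r)^12 - (σ/r)^6], summed over all pairs of a small cluster. The reduced units (ε = σ = 1) and the function names are assumptions for illustration; the slide does not show which parameterization it uses.

```python
import math


def lennard_jones(r, epsilon=1.0, sigma=1.0):
    """Standard 12-6 Lennard-Jones pair energy at distance r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def cluster_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all distinct pairs of atoms."""
    energy = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            energy += lennard_jones(r, epsilon, sigma)
    return energy


# The pair energy is minimal (equal to -epsilon) at r = 2^(1/6) * sigma.
r_min = 2.0 ** (1.0 / 6.0)
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
# Equilateral triangle with side r_min: every pair sits at its own minimum.
triangle = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0),
            (r_min / 2.0, r_min * math.sqrt(3.0) / 2.0, 0.0)]

print(cluster_energy(dimer), cluster_energy(triangle))
```

For two or three atoms every pair can sit at its optimal distance at once. For larger clusters this is no longer possible, and the energy surface acquires many local minimizers, which is what makes the global problem hard.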
17Protein Energy Function
18Global Smoothing and Continuation
19Gaussian Transformation
Scheraga, et al., 1989, 1992; Shalloway, 1992; Straub, et al., 1996; Wu, 1996; Moré and Wu, 1997
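The slide's formulas are not preserved here, so the following is only a sketch of the idea under stated assumptions. It takes the Gaussian transform of a one-dimensional function f to be the average of f(x + λt) over t, weighted by exp(-t²)/√π, evaluates it with a simple trapezoid rule, and then traces a minimizer from a heavily smoothed function back to the original one. The test function, the λ schedule, the step sizes, and every function name are hypothetical choices for this example.

```python
import math

# Trapezoid nodes and Gaussian weights on [-6, 6]; the integrand is negligible outside.
_H = 0.1
_NODES = [-6.0 + _H * i for i in range(121)]
_WEIGHTS = [math.exp(-t * t) * _H / math.sqrt(math.pi) for t in _NODES]


def gaussian_transform(f, x, lam):
    """Approximate the Gaussian-weighted average of f around x with scale lam."""
    if lam == 0.0:
        return f(x)
    return sum(w * f(x + lam * t) for t, w in zip(_NODES, _WEIGHTS))


def local_minimize(g, x, step=0.02, iters=500, h=1e-5):
    """Plain gradient descent with a central-difference derivative."""
    for _ in range(iters):
        x -= step * (g(x + h) - g(x - h)) / (2.0 * h)
    return x


def f(x):
    """Tilted double well: local minimum near x = 0.93, global near x = -1.06."""
    return x ** 4 - 2.0 * x ** 2 + 0.5 * x


# Local descent from x = 2 gets trapped in the nearer, shallower well.
x_plain = local_minimize(f, 2.0)

# Continuation: minimize the smoothed function, then reduce lam step by step,
# restarting each local minimization from the previous minimizer.
x_cont = 2.0
for lam in (1.0, 0.75, 0.5, 0.25, 0.0):
    x_cont = local_minimize(lambda y, lam=lam: gaussian_transform(f, y, lam), x_cont)

print(x_plain, f(x_plain))
print(x_cont, f(x_cont))
```

In this toy case the smoothed function at λ = 1 has a single minimizer, and following it as λ shrinks leads to the deeper well. That outcome is a property of this example, not a guarantee of the method in general.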
20Statistical Averaging
21Geometric Smoothing
22Some Simple Transformations
23High-Dimensional Transformation
24Transformation of Potential Functions
25NMR Distance-Based Structural Modeling
Bond Lengths / Angles
NMR Distance Data
Structure
26Molecular Distance Geometry Problem
Given distances between certain pairs of atoms in
the molecule, find the coordinates of the atoms.
27Graph Embedding
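Given a weighted graph $G = (V, E, W)$, where $V = \{v_i : i = 1, \dots, n\}$, $E = \{(v_i, v_j) : (i, j) \in S\}$, and $W = \{w_{i,j} = w(v_i, v_j) : (v_i, v_j) \in E\}$,
[Figure: triangle graph on v1, v2, v3 with edge weights 3, 4, 5 and its embedding x1, x2, x3]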
28Under-Determined System
kn: total number of coordinates
k(k+1)/2: k translations and k(k-1)/2 rotations
29Over-Determined System
30Inconsistent Data
[Figure: triangle graph on v1, v2, v3 with edge weights 3, 4, 8 and an attempted embedding x1, x2, x3]
The triangle inequality, c ≤ a + b, may be violated!
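A quick consistency check of this kind can be sketched as follows. The function name is hypothetical, and the two test triples are the edge weights that appear in the figures (3, 4, 5 and 3, 4, 8).

```python
def violates_triangle_inequality(a, b, c):
    """Return True if no triangle, even a degenerate one, has side lengths a, b, c."""
    longest = max(a, b, c)
    # The longest side must not exceed the sum of the other two.
    return longest > (a + b + c) - longest


print(violates_triangle_inequality(3, 4, 5))  # consistent distance data
print(violates_triangle_inequality(3, 4, 8))  # inconsistent: 8 > 3 + 4
```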
31Flexible Structures
[Figure: a graph on v1, v2, v3, v4 and several embeddings x1, x2, x3, x4]
The structure can be deformed continuously without violating any distance constraints.
32Rigid Structures
[Figure: a graph on v1, v2, v3, v4 and its embeddings x1, x2, x3, x4]
The structures cannot be deformed any more!
33Reflections
[Figure: a graph on v1, v2, v3, v4 with edge weights 3, 4, 5 and an unknown distance d, and two embeddings x1, x2, x3, x4 related by a reflection]
Does rigid imply unique?
Hendrickson 1991
34Algorithms and Complexity
When all distances are available: solvable in polynomial time
When only a subset of distances is available: NP-complete
35If all distances are given, the problem can be
solved in polynomial time.
36 [Figure: atoms x_i, x_{i+1}, x_{i+2}, x_{i+3}]
If for every i, all distances between atoms i, i+1, i+2, i+3 are given, a solution can be found in polynomial time.
37 [Figure: atoms x_i, x_j, x_k and x_l]
Let atoms i, j, k be three atoms not on the same line. Then x_l can be determined for any l ≠ i, j, k in constant time, if all distances between atom l and atoms i, j, k are given.
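The constant-time step can be sketched with ordinary trilateration in three dimensions. This is an illustrative assumption rather than the construction used in the talk. In 3D, three distances fix the new atom only up to a reflection through the plane of the three reference atoms, so the function below returns both candidates. All names are hypothetical.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def _scale(a, s):
    return tuple(x * s for x in a)


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def trilaterate(pi, pj, pk, di, dj, dk):
    """Return the two points at distances di, dj, dk from pi, pj, pk.

    pi, pj, pk must not lie on one line, otherwise the frame below is undefined.
    """
    d = math.sqrt(_dot(_sub(pj, pi), _sub(pj, pi)))
    ex = _scale(_sub(pj, pi), 1.0 / d)              # unit vector from pi to pj
    t = _dot(ex, _sub(pk, pi))
    ey_raw = _sub(_sub(pk, pi), _scale(ex, t))
    s = math.sqrt(_dot(ey_raw, ey_raw))
    ey = _scale(ey_raw, 1.0 / s)                    # in-plane unit vector orthogonal to ex
    ez = _cross(ex, ey)                             # normal of the reference plane
    # Local coordinates of the unknown point from the three sphere equations.
    x = (di * di - dj * dj + d * d) / (2.0 * d)
    y = (di * di - dk * dk + t * t + s * s - 2.0 * t * x) / (2.0 * s)
    z = math.sqrt(max(di * di - x * x - y * y, 0.0))
    base = _add(pi, _add(_scale(ex, x), _scale(ey, y)))
    return _add(base, _scale(ez, z)), _sub(base, _scale(ez, z))


above, below = trilaterate((0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (0.0, 3.0, 0.0),
                           3.0, math.sqrt(17.0), math.sqrt(6.0))
print(above, below)
```

A fourth known distance, to an atom outside the reference plane, would select one of the two candidates.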
38 [Figure: structures on atoms x1, ..., x8]
In general, for an arbitrary S, the problem is NP-hard (Saxe 1979).
39ε-Optimal Solutions
40It is NP-hard to obtain an ε-approximate solution to the distance geometry problem when ε < 1 / 2n, where n is the number of atoms.
-- Moré and Wu 1996
41Least-Squares Formulation
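The slide's formula is not preserved, so the sketch below assumes one common least-squares form: sum over the given pairs (i, j) of (‖x_i − x_j‖² − d_ij²)², which is zero exactly when every given distance is matched. The function name and the sample data are hypothetical.

```python
def distance_error(coords, distances):
    """Least-squares misfit between coordinates and the given pairwise distances."""
    total = 0.0
    for (i, j), d in distances.items():
        sq = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        total += (sq - d * d) ** 2
    return total


# Distances of a 3-4-5 triangle, given for pairs of atoms indexed 0, 1, 2.
distances = {(0, 1): 3.0, (0, 2): 4.0, (1, 2): 5.0}
exact = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 4.0, 0.0)]
perturbed = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (0.0, 3.0, 0.0)]

print(distance_error(exact, distances))      # 0.0: all distances satisfied
print(distance_error(perturbed, distances))  # positive: distances violated
```

In this form the distance geometry problem becomes a global minimization problem: a structure is a solution exactly when the objective reaches zero.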
42Inexact Distance Data
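When only lower and upper bounds on the distances are known, one possible objective penalizes a pair only if its distance falls outside its interval. The sketch below is such a penalty under that assumption. The slide's own formula is not preserved, and the function name and data are hypothetical.

```python
def bound_violation(coords, bounds):
    """Sum of squared violations of lower/upper bounds on squared distances."""
    total = 0.0
    for (i, j), (lower, upper) in bounds.items():
        sq = sum((a - b) ** 2 for a, b in zip(coords[i], coords[j]))
        total += max(lower * lower - sq, 0.0) ** 2   # pair is too close
        total += max(sq - upper * upper, 0.0) ** 2   # pair is too far apart
    return total


coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(bound_violation(coords, {(0, 1): (2.5, 3.5)}))  # 0.0: distance 3 is within bounds
print(bound_violation(coords, {(0, 1): (4.0, 5.0)}))  # positive: distance 3 is below 4
```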
43X-ray Crystallography
X-ray beam
Protein crystal
X-ray diffraction
44Electron Density Distribution
45Magnitudes, Phases, Diffraction Intensities
46The Phase Problem
Given the magnitudes of the structure
factors, find correct phases that define the
electron density distribution function of the
crystal system.
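A one-dimensional toy model shows why the magnitudes alone are not enough. The sketch below treats the electron density as a periodic sequence, computes its Fourier coefficients (playing the role of structure factors), and rebuilds the density once with the true phases and once with all phases set to zero. The discrete setting, the sign convention, and the names are assumptions for illustration.

```python
import cmath
import math


def structure_factors(density):
    """Discrete Fourier coefficients of a periodic 1D density."""
    n = len(density)
    return [sum(density[x] * cmath.exp(2j * math.pi * h * x / n) for x in range(n))
            for h in range(n)]


def density_from(magnitudes, phases):
    """Inverse transform: rebuild the density from magnitudes and phases."""
    n = len(magnitudes)
    factors = [m * cmath.exp(1j * p) for m, p in zip(magnitudes, phases)]
    return [sum(factors[h] * cmath.exp(-2j * math.pi * h * x / n)
                for h in range(n)).real / n
            for x in range(n)]


# Two point "atoms" of different weight in a unit cell of 16 grid points.
density = [0.0] * 16
density[3] = 1.0
density[10] = 2.0

F = structure_factors(density)
magnitudes = [abs(f) for f in F]       # what a diffraction experiment measures
phases = [cmath.phase(f) for f in F]   # what is lost

recovered = density_from(magnitudes, phases)
phaseless = density_from(magnitudes, [0.0] * len(F))

print([round(v, 3) for v in recovered])
print([round(v, 3) for v in phaseless])
```

With the true phases the two atoms reappear at positions 3 and 10. With zero phases the same magnitudes produce a density peaked at the origin, where there is no atom.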
47Direct Methods
Nobel Prize 1985
- Karle and Hauptman (1950s)
- nonlinear least squares
- joint probability distribution
- successful for small molecules
- Bricogne (1984, 1988, 1993, 1997)
- Bayesian statistical approach
- statistical mechanics / information theory
- apply to macromolecules
48Entropy Maximization for Statistical Phase
Estimation
49The Lagrangian
50The Dual Problem
51The Dual Problem
52Newton's Method
53A Fast Newton's Method (Wu, Phillips, Tapia, Zhang 2001)
Sherman-Morrison-Woodbury Formula
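The slide's derivation is not preserved, so the sketch below only illustrates the formula itself in its simplest rank-one (Sherman-Morrison) case: a system whose matrix is a diagonal plus a rank-one term can be solved in O(n) operations without forming or factoring the full matrix. It is not the algorithm of Wu, Phillips, Tapia, and Zhang. The matrix structure and the names are assumptions.

```python
def solve_diag_plus_rank_one(d, u, v, b):
    """Solve (diag(d) + u v^T) x = b using the Sherman-Morrison formula."""
    y = [bi / di for bi, di in zip(b, d)]            # diag(d)^{-1} b
    z = [ui / di for ui, di in zip(u, d)]            # diag(d)^{-1} u
    denom = 1.0 + sum(vi * zi for vi, zi in zip(v, z))
    if abs(denom) < 1e-14:
        raise ValueError("matrix is singular or nearly singular")
    scale = sum(vi * yi for vi, yi in zip(v, y)) / denom
    return [yi - scale * zi for yi, zi in zip(y, z)]


d = [2.0, 3.0, 4.0]
u = [1.0, 1.0, 1.0]
v = [1.0, 2.0, 3.0]
x_true = [1.0, -1.0, 2.0]

# Build the right-hand side b = (diag(d) + u v^T) x_true explicitly for the check.
vx = sum(vi * xi for vi, xi in zip(v, x_true))
b = [di * xi + ui * vx for di, xi, ui in zip(d, x_true, u)]

x = solve_diag_plus_rank_one(d, u, v, b)
print(x)
```

The general Woodbury form handles low-rank corrections of rank greater than one in the same way, replacing the scalar division by the solution of a small dense system.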
54Newton Step
55- The fast Newton's method converges quadratically to the solution of the entropy maximization problem and requires only O(n log n) floating-point operations per iteration.
56 Summary
- Introduction
  - DNA, RNA, Protein
  - Structure Prediction and Determination
  - Theoretical and Experimental Approaches
- Molecular Dynamics Simulation
  - Physical and Mathematical Models
  - Numerical Simulation
  - Simulation of Folding / Misfolding
- Potential Energy Minimization
  - Potential Energy Functions
  - Global Minimization
  - Global Smoothing and Continuation
- NMR Distance-Based Modeling
  - Molecular Distance Geometry Problem
  - Geometric Properties
  - Least-Squares Formulation
- X-ray Crystallography Computing
  - Phase Problem
  - Entropy Maximization for Phase Estimation