Introduction to Computational Structural Biology

1
Mathematical / Computational Problems for Protein
Structure Prediction and Determination
Zhijun Wu
Department of Mathematics
Graduate Program on Bioinformatics and Computational Biology
Iowa State University
June 14, 2001
National Institute of Applied Sciences - Rouen
Mont Saint Aignan Cedex, France
2
Some Fundamental Biological Questions
  • Question 1: Given a protein or DNA molecule, what is the
    geometric structure of the molecule?
  • Question 2: Why and how does a protein fold into a unique
    three-dimensional structure?
  • Question 3: Given a set of distances between pairs of atoms,
    how can we determine the coordinates of the atoms?
  • Question 4: Given the magnitudes of the structure factors of
    a protein, how can we determine the phases of the
    structure factors?
  • Question 5: Given two proteins, how can we compare their
    geometric structures?
  • Question 6

Watson and Crick, 1962
Protein folding
Anfinsen, 1972
NMR
Karle and Hauptman, 1985
X-ray Crystallography
Mathematical Answers?
3
Biological Building Blocks: DNA, RNA, Protein
DNA
GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG
RNA
GAA GUU GAA AAU CAG GCG AAC CCA CGA CUG
Protein
GLU VAL GLU ASN GLN ALA ASN PRO ARG LEU
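The step from DNA to RNA to protein on this slide can be sketched in a few lines of Python. This is a minimal illustration, assuming the DNA line is the coding strand (so transcription only replaces T with U). The codon table below is deliberately partial and covers only the codons in the slide's example. The function names are hypothetical.

```python
# Partial codon table: only the codons that occur in the slide's example.
CODON_TO_RESIDUE = {
    "GAA": "GLU", "GUU": "VAL", "AAU": "ASN", "CAG": "GLN", "GCG": "ALA",
    "AAC": "ASN", "CCA": "PRO", "CGA": "ARG", "CUG": "LEU",
}

def transcribe(dna):
    """Rewrite a coding-strand DNA sequence as RNA (T becomes U)."""
    return dna.upper().replace("T", "U")

def translate(rna):
    """Read an RNA sequence codon by codon into three-letter residue names."""
    bases = rna.replace(" ", "")
    usable = len(bases) - len(bases) % 3          # ignore an incomplete last codon
    codons = [bases[i:i + 3] for i in range(0, usable, 3)]
    return [CODON_TO_RESIDUE[c] for c in codons]

dna = "GAA GTT GAA AAT CAG GCG AAC CCA CGA CTG"
rna = transcribe(dna)
protein = translate(rna)
print(rna)
print(" ".join(protein))
```

The second codon, GUU, codes for valine, so the translated chain starts GLU VAL GLU.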
4
Adenine
5
Cytosine
6
Guanine
7
Thymine
8
Uracil
9
[Figure: genetic code table indexed by first base, second base, and third base, each one of U, C, A, G]
10
Alanine (Ala)
Arginine (Arg)
11
Asparagine (Asn)
Aspartate (Asp)
12
Cysteine (Cys)
Glutamate (Glu)
13
Glycine (Gly)
Glutamine (Gln)
14
Histidine (His)
Isoleucine (Ile)
15
Leucine (Leu)
Lysine (Lys)
16
Methionine (Met)
Phenylalanine (Phe)
17
Proline (Pro)
Serine (Ser)
18
Threonine (Thr)
Tryptophan (Trp)
19
Tyrosine (Tyr)
Valine (Val)
20
Protein Folding
LEU ARG ASN PRO ALA ASN GLN GLU GLU VAL
GLU GLU ASN VAL LEU ARG PRO ASN ALA GLN . . .
21
HIV Reverse Transcriptase
554 amino acids
4200 atoms
22
HIV Reverse Transcriptase
4200 atoms
554 amino acids
23
Structure Prediction and Determination
Molecular Dynamics Simulation
Potential Energy Minimization
Nuclear Magnetic Resonance
X-ray Crystallography
24
Molecular Dynamics Simulation
Physical Model
Mathematical Model
25
Molecular Dynamics Simulation
Numerical Solution
Computer Simulation
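The equations on these two slides did not survive in the transcript. As a rough sketch of the chain from physical model to computer simulation, the code below assumes the usual setup: Newton's equation m x'' = -dE/dx as the mathematical model and the velocity Verlet scheme as the numerical solution. The harmonic spring and all parameter values are toy assumptions for illustration.

```python
import math

def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate m * x'' = force(x) with the velocity Verlet scheme."""
    f = force(x)
    trajectory = [(x, v)]
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass      # half-step velocity update
        x = x + dt * v_half                   # full-step position update
        f = force(x)                          # force at the new position
        v = v_half + 0.5 * dt * f / mass      # second half-step velocity update
        trajectory.append((x, v))
    return trajectory

# Toy system: spring energy E(x) = 0.5 * k * x**2, so force(x) = -k * x.
k, mass = 1.0, 1.0
traj = velocity_verlet(x=1.0, v=0.0, force=lambda x: -k * x,
                       mass=mass, dt=0.01, n_steps=1000)

def total_energy(x, v):
    return 0.5 * mass * v * v + 0.5 * k * x * x

x_end, v_end = traj[-1]
print(x_end, math.cos(10.0))                       # numerical vs exact position at t = 10
print(total_energy(*traj[0]), total_energy(x_end, v_end))
```

For this toy system the exact solution is x(t) = cos(t), so the first printed pair shows the integration error and the second shows how well the total energy is conserved.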
26
Simulation of Folding -- An Initial Value Problem
Time step: femtoseconds; folding: seconds or longer
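To see why this gap in time scales matters, take a time step of 1 femtosecond and a folding time of 1 second as representative values (both are assumptions for the arithmetic). The number of integration steps is then:

```latex
N_{\text{steps}} = \frac{T_{\text{fold}}}{\Delta t}
  \approx \frac{1\ \text{s}}{10^{-15}\ \text{s}} = 10^{15}
```

Each step requires a full force evaluation over all atoms, which is what makes direct simulation of folding so expensive.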
27
Molecular Dynamics Simulation Software
  • CHARMM, M. Karplus et al., Harvard University
  • AMBER, P. Kollman et al., UC San Francisco
  • XPLOR, A. Brünger et al., Yale University
  • GROMOS, W. van Gunsteren et al., ETH Zürich

28
Potential Energy Minimization
Minimization of potential energy in conformation space
Local / global optimization: nonlinear, unconstrained, continuous
Example: Lennard-Jones
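The slide does not show the Lennard-Jones formula. The sketch below assumes the standard 12-6 form, V(r) = 4 eps [(sigma/r)^12 - (sigma/r)^6], in reduced units (eps = sigma = 1). The function names are hypothetical.

```python
import math

def lennard_jones_pair(r, epsilon=1.0, sigma=1.0):
    """Pair energy 4 * eps * ((sigma / r)**12 - (sigma / r)**6)."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)

def cluster_energy(coords, epsilon=1.0, sigma=1.0):
    """Sum the pair energy over all distinct pairs of atoms."""
    total = 0.0
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            r = math.dist(coords[i], coords[j])
            total += lennard_jones_pair(r, epsilon, sigma)
    return total

r_min = 2.0 ** (1.0 / 6.0)      # distance at which the pair energy is lowest
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
triangle = dimer + [(r_min / 2.0, r_min * math.sqrt(3.0) / 2.0, 0.0)]
print(cluster_energy(dimer))      # close to -1: one pair at its optimal distance
print(cluster_energy(triangle))   # close to -3: three pairs, each at r_min
```

For larger clusters the pairs can no longer all sit at the optimal distance, and the energy surface has many local minima. That is why the slide lists global as well as local optimization.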
29
Protein Energy Function
30
Multi-Start Search
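A minimal sketch of multi-start search: run a local minimizer from many random starting points and keep the best result. The objective, the fixed-step gradient descent, and all parameter values are toy assumptions chosen for illustration. The slides do not specify them.

```python
import random

def f(x):
    return x**4 - 4.0 * x**2 + x        # toy objective with two local minima

def grad_f(x):
    return 4.0 * x**3 - 8.0 * x + 1.0

def local_descent(x, step=0.01, n_iter=2000):
    """Plain gradient descent; converges to the local minimum of the start's basin."""
    for _ in range(n_iter):
        x -= step * grad_f(x)
    return x

def multi_start(n_starts, low, high, seed=0):
    """Run local descent from random starts and return the best minimizer found."""
    rng = random.Random(seed)
    minima = [local_descent(rng.uniform(low, high)) for _ in range(n_starts)]
    return min(minima, key=f), minima

best_x, minima = multi_start(n_starts=20, low=-2.0, high=2.0)
print(best_x, f(best_x))
print(sorted(set(round(m, 3) for m in minima)))   # distinct local minima that were found
```

A single local search started near x = 1 would stop at the shallower minimum on the right. The repeated starts are what give a chance of reaching the deeper one near x = -1.47.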
31
Simulated Annealing
High Temperature
Algorithm
  input initial x0
  y0 = f(x0)
  set x = x0, y = y0
  for T = T0, T1, ..., Tm (decreasing)
    for k = 1, ..., n
      x1 = perturb(x0)
      y1 = f(x1)
      dy = y1 - y0
      e = exp(-dy / T)
      if (rand < e)
        x0 = x1
        y0 = y1
      end
      update x, y
    end
  end
Low Temperature
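The algorithm on this slide can be turned into a short runnable sketch. Several details are assumptions, because the slide leaves them open: perturb is taken to be a uniform random step, "update x, y" is read as keeping the best point seen so far, and the cooling schedule and objective are toy choices.

```python
import math
import random

def simulated_annealing(f, x0, temperatures, n_inner, step, seed=0):
    rng = random.Random(seed)
    y0 = f(x0)
    x, y = x0, y0                                 # best point seen so far
    for T in temperatures:                        # decreasing schedule T0 > T1 > ... > Tm
        for _ in range(n_inner):
            x1 = x0 + rng.uniform(-step, step)    # perturb(x0)
            y1 = f(x1)
            dy = y1 - y0
            # Downhill moves are always accepted (this also avoids overflow in exp);
            # uphill moves are accepted with probability exp(-dy / T).
            if dy <= 0.0 or rng.random() < math.exp(-dy / T):
                x0, y0 = x1, y1
            if y0 < y:                            # update x, y
                x, y = x0, y0
    return x, y

def f(x):
    return x**4 - 4.0 * x**2 + x                  # toy objective, not from the slides

schedule = [2.0 * 0.9**i for i in range(60)]      # geometric cooling, an assumption
x_best, y_best = simulated_annealing(f, x0=2.0, temperatures=schedule,
                                     n_inner=50, step=0.5)
print(x_best, y_best)
```

At high temperature almost every move is accepted, so the search can cross energy barriers. At low temperature it behaves like a local descent.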
32
Global Smoothing and Continuation

33
Gaussian Transformation

Scheraga et al., 1989, 1992
Shalloway, 1992
Straub et al., 1996
Wu, 1996
Moré and Wu, 1997
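The slide's formula did not survive in this transcript. For orientation, the Gaussian transform in the global continuation literature cited above is usually written as below, with lambda as the smoothing parameter. The one-dimensional double-well example is an illustration added here and is not taken from the slides.

```latex
\langle f \rangle_\lambda(x) = \frac{1}{\pi^{n/2}\lambda^{n}}
  \int_{\mathbb{R}^n} f(y)\, \exp\!\left(-\frac{\|y - x\|^{2}}{\lambda^{2}}\right) dy

% Illustration in one dimension: f(x) = x^4 - a x^2 with a > 0 has two minima.
\langle f \rangle_\lambda(x) = x^{4} + (3\lambda^{2} - a)\, x^{2}
  + \tfrac{3}{4}\lambda^{4} - \tfrac{1}{2} a \lambda^{2}
```

In the example, once 3 lambda^2 >= a the transformed function has a single minimizer at x = 0. Decreasing lambda gradually back to 0 and tracking the minimizer is the continuation idea.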
34
NMR Distance-Based Structural Modeling
Bond Lengths / Angles
NMR Distance Data
Structure
35
Molecular Distance Geometry Problem
Given distances between certain pairs of atoms in
the molecule, find the coordinates of the atoms.
36
Graph Embedding
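Given a weighted graph $G = (V, E, W)$, where $V = \{v_i : i = 1, \dots, n\}$,
$E = \{(v_i, v_j) : (i, j) \in S\}$, and
$W = \{w_{i,j} = w(v_i, v_j) : (v_i, v_j) \in E\}$,
find points $x_1, \dots, x_n$ in $\mathbb{R}^k$ such that
$\|x_i - x_j\| = w_{i,j}$ for all $(i, j) \in S$.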
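[Figure: a triangle graph on v1, v2, v3 with edge weights 3, 4, and 5, embedded as points x1, x2, x3 with the same pairwise distances]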
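For the triangle in the figure, an embedding can be computed directly. The sketch below places the three points in the plane. Which edge carries which weight cannot be recovered from the transcript, so the assignment (v1,v2) = 3, (v1,v3) = 4, (v2,v3) = 5 is an assumption, and the function name is hypothetical.

```python
import math

def embed_triangle(d12, d13, d23):
    """Place three points in the plane with the given pairwise distances."""
    x1 = (0.0, 0.0)                       # fix x1 at the origin (removes translations)
    x2 = (d12, 0.0)                       # put x2 on the horizontal axis (removes rotation)
    # Law of cosines: coordinate of x3 along the x1-x2 axis.
    a = (d12**2 + d13**2 - d23**2) / (2.0 * d12)
    h_sq = d13**2 - a**2                  # squared height of x3 above the axis
    if h_sq < 0.0:
        raise ValueError("distances violate the triangle inequality")
    return x1, x2, (a, math.sqrt(h_sq))   # the mirror image (a, -sqrt(h_sq)) also works

x1, x2, x3 = embed_triangle(3.0, 4.0, 5.0)
print(x1, x2, x3)
print(math.dist(x1, x2), math.dist(x1, x3), math.dist(x2, x3))
```

Fixing x1 and the direction of x2 removes the freedom to translate and rotate. The remaining choice of sign for the height of x3 is a reflection.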
37
Under-Determined System
$kn$ -- total number of coordinates
$k(k+1)/2$ -- $k$ translations, $k(k-1)/2$ rotations
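As a worked count (a sketch of the reasoning, not taken from the slide): rigid motions do not change any distance, so distance data can determine at most the remaining degrees of freedom.

```latex
kn - \frac{k(k+1)}{2} \quad \text{degrees of freedom left to determine};
\qquad k = 3:\quad 3n - 6
```

For example, n = 4 atoms in three dimensions leave 12 - 6 = 6 unknowns, which matches the 6 pairwise distances among four atoms. With fewer independent distances than this count, the system is under-determined.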
38
Over-Determined System
39
Inconsistent Data
[Figure: a triangle graph on v1, v2, v3 with edge weights 3, 4, and 8; no points x1, x2, x3 realize these distances]
The triangle inequality, $c \le a + b$, may be violated!
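A quick consistency check on distance data is to test the triangle inequality on every triple of atoms for which all three distances are known. The sketch below is a minimal version. The data layout (a dictionary keyed by unordered pairs) and the assignment of weights to edges are assumptions for illustration.

```python
from itertools import combinations

def triangle_violations(dist):
    """Return the triples (i, j, k) whose three distances break the triangle inequality.

    dist maps frozenset({i, j}) to the distance between vertices i and j.
    """
    vertices = sorted(set().union(*dist))
    bad = []
    for i, j, k in combinations(vertices, 3):
        sides = [dist.get(frozenset(pair)) for pair in ((i, j), (i, k), (j, k))]
        if None in sides:
            continue                      # skip triples with a missing distance
        a, b, c = sorted(sides)
        if c > a + b:                     # longest side exceeds the sum of the other two
            bad.append((i, j, k))
    return bad

consistent = {frozenset({1, 2}): 3, frozenset({1, 3}): 4, frozenset({2, 3}): 5}
inconsistent = {frozenset({1, 2}): 3, frozenset({1, 3}): 4, frozenset({2, 3}): 8}
print(triangle_violations(consistent))     # []
print(triangle_violations(inconsistent))   # [(1, 2, 3)]
```

Passing this check is necessary but not sufficient: data can satisfy every triangle inequality and still have no embedding in three dimensions.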
40
Flexible Structures
[Figure: a four-vertex graph on v1, v2, v3, v4 and several embeddings x1, x2, x3, x4 that satisfy the same distance constraints]
The structure can be deformed continuously
without violating any distance constraints.
41
Rigid Structures
[Figure: a four-vertex graph on v1, v2, v3, v4 with additional distance constraints and its embeddings x1, x2, x3, x4]
The structures cannot be deformed any more!
42
Reflections
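[Figure: a four-vertex graph on v1, v2, v3, v4 with edge weights 3, 4, and 5 and one distance d, shown with two embeddings x1, x2, x3, x4 that differ by a reflection]
Rigid, but unique?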
Hendrickson 1991
43
Algorithms and Complexity
When all distances are available: can be solved in polynomial time (P)
When only a subset of distances is available: NP-complete
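The slide does not say which algorithm is meant for the case where all distances are available. One standard polynomial-time approach, sketched here under the assumption that the distances are exact, builds the Gram matrix of the vectors x_i - x_0 from the distances and factors it. The pivoted Cholesky factorization used below is a substitute chosen to keep the example dependency-free. An eigenvalue or singular value decomposition is the more common choice.

```python
import math

def coordinates_from_distances(D, dim=3, tol=1e-9):
    """Recover coordinates, up to rigid motion and reflection, from a full exact distance matrix."""
    n = len(D)
    # Gram matrix G[i][j] = (x_i - x_0) . (x_j - x_0), from the law of cosines.
    G = [[0.5 * (D[0][i]**2 + D[0][j]**2 - D[i][j]**2) for j in range(n)]
         for i in range(n)]
    X = [[0.0] * dim for _ in range(n)]
    for m in range(dim):
        p = max(range(n), key=lambda i: G[i][i])   # pivot: largest remaining diagonal
        if G[p][p] <= tol:
            break                                   # remaining rank is numerically zero
        root = math.sqrt(G[p][p])
        col = [G[i][p] / root for i in range(n)]
        for i in range(n):
            X[i][m] = col[i]                        # m-th coordinate of every point
            for j in range(n):
                G[i][j] -= col[i] * col[j]          # remove the rank-one part just used
    return X

# Made-up test points; only their distance matrix is passed to the solver.
pts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (1.0, 1.0, 3.0), (2.0, -1.0, 1.0)]
D = [[math.dist(p, q) for q in pts] for p in pts]
X = coordinates_from_distances(D)
max_err = max(abs(math.dist(X[i], X[j]) - D[i][j]) for i in range(5) for j in range(5))
print(max_err)
```

The recovered coordinates generally differ from the original points by a rotation or reflection, but they reproduce every distance. The cost is polynomial in the number of atoms.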
44
ε-Optimal Solutions
45
It is NP-hard to obtain an ε-approximate
solution to the distance geometry problem when
$\varepsilon < 1/2n$, where $n$ is the number of atoms.
-- Moré and Wu, 1996
46
Least-Squares Formulation
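The formula on this slide is missing from the transcript. One common least-squares form of the distance geometry problem, given here as an assumption rather than as the slide's exact expression, uses the graph embedding notation (pair set S, weights w_{i,j}):

```latex
\min_{x_1, \dots, x_n \in \mathbb{R}^k} \; f(x) =
  \sum_{(i,j) \in S} \left( \|x_i - x_j\|^{2} - w_{i,j}^{2} \right)^{2}
```

The objective is zero exactly when every distance constraint is satisfied, so a global minimizer with f(x) = 0 solves the embedding problem.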
47
Inexact Distance Data
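With inexact data, each distance is typically known only to lie between a lower and an upper bound. The sketch below measures how far a set of coordinates is from satisfying such bounds. The penalty form, the data layout, and the numbers are all assumptions for illustration. The slide's own formulation is not preserved in the transcript.

```python
import math

def bound_violation(coords, bounds):
    """Sum of squared violations of lower <= ||x_i - x_j|| <= upper.

    bounds maps an index pair (i, j) to a tuple (lower, upper).
    """
    total = 0.0
    for (i, j), (lower, upper) in bounds.items():
        r = math.dist(coords[i], coords[j])
        total += max(lower - r, 0.0) ** 2 + max(r - upper, 0.0) ** 2
    return total

coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (0.0, 4.0, 0.0)]
bounds = {(0, 1): (1.0, 2.0), (0, 2): (2.0, 3.0)}
print(bound_violation(coords, bounds))   # 1.0: only pair (0, 2) is too long, by 1.0
```

A structure is consistent with the data when this value is zero, so the function can serve as the objective of a minimization.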
48
X-ray Crystallography

X-ray beam
Protein crystal
X-ray diffraction
49
Electron Density Distribution
50
Magnitudes, Phases, Diffraction Intensities
51
The Phase Problem

Given the magnitudes of the structure
factors, find correct phases that define the
electron density distribution function of the
crystal.
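A one-dimensional toy model shows what is lost when only magnitudes are measured. The sketch assumes the standard Fourier relationship between structure factors and electron density. The point atoms, their weights, and the truncation at |h| <= 10 are made-up values.

```python
import cmath
import math

def structure_factors(atoms, h_max):
    """F_h = sum_j f_j * exp(2*pi*i*h*x_j) for a 1-D cell with fractional coordinates x_j."""
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
            for h in range(-h_max, h_max + 1)}

def density(F, x):
    """Truncated Fourier synthesis rho(x) = sum_h F_h * exp(-2*pi*i*h*x), unit cell length 1."""
    return sum(Fh * cmath.exp(-2j * math.pi * h * x) for h, Fh in F.items()).real

atoms = [(0.10, 6.0), (0.35, 8.0), (0.70, 7.0)]    # (position, scattering weight)
F = structure_factors(atoms, h_max=10)
magnitudes_only = {h: abs(Fh) for h, Fh in F.items()}   # what the experiment provides

# With the correct phases the synthesis peaks at the atom positions.
print([round(density(F, x), 1) for x in (0.10, 0.35, 0.70, 0.50)])
# With all phases set to zero the largest value is at the origin instead.
print([round(density(magnitudes_only, x), 1) for x in (0.0, 0.10, 0.35, 0.70)])
```

The magnitudes are the same in both syntheses. Only the phases differ, and without them the atom positions cannot be read off the map.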
52
Direct Methods
Nobel Prize 1985
  • Karle and Hauptman (1950s)
    • nonlinear least squares
    • joint probability distribution
    • successful for small molecules
  • Bricogne (1984, 1988, 1993, 1997)
    • Bayesian statistical approach
    • statistical mechanics / information theory
    • applies to macromolecules

53
Entropy Maximization for Statistical Phase
Estimation

54
The Dual Problem

Fast Newton, $O(n \log n)$ (Wu, Phillips, Tapia,
and Zhang, SIAM Review, 2001)
55
  • Introduction
  • DNA, RNA, Protein
  • Structure Prediction and Determination
  • Molecular Dynamics Simulation
  • Physical and Mathematical Models
  • Numerical Simulation
  • Simulation of Protein Folding
  • Potential Energy Minimization
  • Potential Energy Functions
  • Global Minimization
  • Multi-Start, SA, Global Smoothing
  • NMR Distance-Based Modeling
  • Molecular Distance Geometry Problem
  • Geometric Properties
  • Least-Squares Formulation
  • X-ray Crystallography Computing
  • Phase Problem
  • Entropy Maximization for Phase Estimation
  • Summary

Summary