Conformational Space - PowerPoint PPT Presentation

Title: CS Biology. Author: latombe. Last modified by: latombe. Created: 2/6/2002 10:25:02 PM. Presentation format: On-screen Show.

Slides: 49
Provided by: latombe
Learn more at: http://web.stanford.edu

Transcript and Presenter's Notes


1
Conformational Space
2
Conformational Space
  • Conformation of a molecule: specification of the
    relative positions of all atoms in 3D space
  • Typical parameterizations:
  • List of coordinates of atom centers
  • List of torsional angles (e.g., the φ-ψ-χ angles for a
    protein)
  • Conformational space: space of all conformations

5
Relation to Robotics/Graphics
Configuration space
6
Need for a Metric
  • Simulation and sampling techniques can produce
    millions of conformations
  • Which conformations are similar?
  • Which ones are close to the folded one?
  • Do some conformations form small clusters (e.g.
    key intermediates while folding)?

7
Metric in Conformational Space
  • A metric over conformational space C is a
    function $d : (c, c') \in C \times C \mapsto d(c, c') \in \mathbb{R}_{\ge 0}$
    such that
  • $d(c, c') = 0 \iff c = c'$ (non-degeneracy)
  • $d(c, c') = d(c', c)$ (symmetry)
  • $d(c, c') + d(c', c'') \ge d(c, c'')$ (triangle
    inequality)
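
The three axioms can be spot-checked numerically on a finite sample of points. The sketch below is only an illustration: the helper name `check_metric_axioms` and the tolerance are assumptions, and passing the check on a sample does not prove that a function is a metric, while failing it does show a violation.

```python
import itertools
import math


def check_metric_axioms(points, d, tol=1e-9):
    """Return True if d satisfies the metric axioms on every pair/triple of points."""
    for a, b in itertools.product(points, repeat=2):
        # Non-degeneracy: d(a, b) == 0 exactly when a == b.
        if (d(a, b) <= tol) != (a == b):
            return False
        # Symmetry: d(a, b) == d(b, a).
        if abs(d(a, b) - d(b, a)) > tol:
            return False
    for a, b, c in itertools.product(points, repeat=3):
        # Triangle inequality: d(a, b) + d(b, c) >= d(a, c).
        if d(a, b) + d(b, c) < d(a, c) - tol:
            return False
    return True


sample = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
euclid_ok = check_metric_axioms(sample, math.dist)
```

A squared distance, by contrast, fails the triangle inequality on collinear points such as 0, 1, 2.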

8
But not all metrics are good
  • Euclidean metric
  • $d(c, c') = \sum_{i=1}^{n} \left( |\phi_i - \phi'_i|^2 + |\psi_i - \psi'_i|^2 \right)$

11
Metric in Conformational Space
  • A good metric should measure how well the atoms
    in two conformations can be aligned
  • Usual metrics: cRMSD, dRMSD

12
RMSD
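  • Given two sets of n points in $\mathbb{R}^3$: $A = \{a_1, \dots, a_n\}$
    and $B = \{b_1, \dots, b_n\}$
  • The RMSD between A and B is
    $\mathrm{RMSD}(A, B) = \left[ \frac{1}{n} \sum_{i=1}^{n} \|a_i - b_i\|^2 \right]^{1/2}$
  • where $\|a_i - b_i\|$ denotes the Euclidean distance
    between $a_i$ and $b_i$ in $\mathbb{R}^3$
  • $\mathrm{RMSD}(A, B) = 0$ iff $a_i = b_i$ for all i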
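
As a minimal sketch of the definition above, assuming each point set is given as a list of (x, y, z) tuples in matching order (the function name `rmsd` is illustrative):

```python
import math


def rmsd(A, B):
    """Root-mean-square deviation between two equally sized, ordered point sets."""
    if len(A) != len(B) or not A:
        raise ValueError("A and B must be non-empty and have the same length")
    # Sum of squared Euclidean distances between corresponding points.
    total = sum(math.dist(a, b) ** 2 for a, b in zip(A, B))
    return math.sqrt(total / len(A))
```

No alignment is performed here: the two sets are compared exactly as given.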

13
cRMSD
  • Molecule M with n atoms $a_1, \dots, a_n$
  • Two conformations c and c' of M
  • $a_i(c)$ is the position of $a_i$ when M is at c
  • cRMSD(c, c') is the minimized RMSD between the two
    sets of atom centers:
    $\min_T \left[ \frac{1}{n} \sum_{i=1}^{n} \|a_i(c) - T(a_i(c'))\|^2 \right]^{1/2}$
  • where the minimization is over all possible
    rigid-body transforms T
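
The slides do not say how the minimization over T is carried out. One common choice, used here as an assumption, is the Kabsch procedure: center both point sets, then obtain the optimal rotation from the SVD of their 3 x 3 covariance matrix. The sketch below relies on NumPy; the function name `crmsd` is illustrative.

```python
import numpy as np


def crmsd(P, Q):
    """Minimum RMSD between ordered point sets P and Q over rigid-body transforms of Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # The optimal translation superimposes the two centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # 3x3 covariance matrix H = sum_i q_i p_i^T.
    H = Qc.T @ Pc
    U, _, Vt = np.linalg.svd(H)
    # Force a proper rotation (determinant +1), so reflections are excluded.
    sign = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    Q_aligned = Qc @ R.T
    return float(np.sqrt(((Pc - Q_aligned) ** 2).sum() / len(P)))
```

Because reflections are excluded, a conformation and its mirror image generally have a nonzero cRMSD.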

16
cRMSD
  • cRMSD satisfies the triangle inequality
  • cRMSD takes linear time to compute
  • Often, cRMSD is restricted to a subset of atoms,
    e.g., the Cα atoms on a protein's backbone

17
Representation Restricted to Cα Atoms
- The positions of amino-acid (AA) residue centers (Cα atoms)
  mainly determine the structure of a protein.
- In structural comparison, people usually work only on the
  backbone of Cα atoms, and neglect the other atoms.
Protein 1tph
18
  • Possible project: design a method for
    efficiently finding nearest neighbors in a
    sampled conformation space of a protein, using
    the cRMSD metric.

19
dRMSD
  • Molecule M with n atoms $a_1, \dots, a_n$
  • Two conformations c and c' of M
  • $\{d_{ij}(c)\}$: n × n symmetric intra-molecular
    distance matrix of M at c
  • dRMSD(c, c') is
    $\left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
  • $d_{ij}$ is usually restricted to a subset of atoms,
    e.g., the Cα atoms on a protein's backbone

20
Intra-Molecular Distance Matrix
Distances between Cα pairs of a protein with 142
residues. Darker squares represent shorter
distances.
24
dRMSD
  • Molecule M with n atoms $a_1, \dots, a_n$
  • Two conformations c and c' of M
  • $\{d_{ij}(c)\}$: n × n symmetric intra-molecular
    distance matrix of M at c
  • $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
  • $d_{ij}$ is usually restricted to a subset of atoms,
    e.g., the Cα atoms on a protein's backbone
  • Advantage: no aligning transform
  • Drawback: takes quadratic time to compute
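
A direct transcription of the dRMSD definition might look as follows. It is a sketch under the assumption that each conformation is a list of (x, y, z) tuples with atoms in the same order; the name `drmsd` is illustrative. The double loop over atom pairs makes the quadratic cost visible.

```python
import math


def drmsd(P, Q):
    """Distance RMSD: compares intra-molecular distances, so no alignment is needed."""
    n = len(P)
    if n != len(Q) or n < 2:
        raise ValueError("P and Q must have the same length, at least 2")
    total = 0.0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # Difference between the (i, j) distances in the two conformations.
            total += (math.dist(P[i], P[j]) - math.dist(Q[i], Q[j])) ** 2
    # There are n(n-1)/2 pairs, hence the factor 2 / (n(n-1)).
    return math.sqrt(2.0 * total / (n * (n - 1)))
```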

25
Is dRMSD a metric?
  • $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
    is a metric in the n(n-1)/2-dimensional space,
    where a conformation c is represented by
    $\{d_{ij}(c)\}$
  • But, in this representation, the same point
    represents both a conformation and its mirror
    image
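
The vector representation can be made concrete by flattening the upper triangle of the distance matrix. The sketch below (helper names are hypothetical) shows that a conformation and its mirror image map to the same vector, so their distance in this space is zero.

```python
import math


def distance_vector(P):
    """Flatten the upper triangle of the intra-molecular distance matrix of P."""
    n = len(P)
    return [math.dist(P[i], P[j]) for i in range(n - 1) for j in range(i + 1, n)]


def drmsd_from_vectors(u, v):
    """dRMSD as a scaled Euclidean distance between two distance vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


conformation = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
mirror_image = [(x, y, -z) for x, y, z in conformation]
same_point = drmsd_from_vectors(distance_vector(conformation),
                                distance_vector(mirror_image))
```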

26
k-Nearest-Neighbors Problem
Given a set S of conformations of a protein and a
query conformation c, find the k conformations in
S most similar to c (w.r.t. cRMSD, dRMSD, or another
metric)
Can be done in time $O(N(\log k + L))$, where
- N: size of S
- L: time to compare two conformations
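
One way to reach the stated bound is a single scan of S that keeps the k best candidates in a bounded heap: each of the N comparisons costs L, and each heap update costs O(log k). The sketch below assumes a caller-supplied distance function `dist`; the function name `k_nearest` is illustrative.

```python
import heapq


def k_nearest(query, conformations, k, dist):
    """Return the k nearest conformations as sorted (distance, index) pairs."""
    if k <= 0:
        return []
    heap = []  # max-heap on distance (stored negated), never larger than k
    for idx, conf in enumerate(conformations):
        d = dist(query, conf)
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))
        elif -d > heap[0][0]:
            # Closer than the current k-th best: replace the farthest entry.
            heapq.heapreplace(heap, (-d, idx))
    return sorted((-neg_d, idx) for neg_d, idx in heap)
```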
27
k-Nearest-Neighbors Problem
The total time needed to compute the k nearest
neighbors of every conformation in S is $O(N^2(\log k + L))$.
Much too long for large datasets, where
N ranges from 10,000s to millions!
Can be improved by:
1. Reducing L
2. Using a more efficient algorithm (e.g., kd-tree)
28
kd-Tree
In a d-dimensional space, where d > 2, range
searching for a point takes $O(d\,n^{1-1/d})$ time
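
For readers unfamiliar with the data structure, here is a minimal kd-tree sketch. It performs a nearest-neighbor query rather than the range search quoted above, and the dictionary-based node layout and function names are assumptions made for brevity.

```python
import math


def build_kdtree(points, depth=0):
    """Recursively split the points at the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {"point": pts[mid], "axis": axis,
            "left": build_kdtree(pts[:mid], depth + 1),
            "right": build_kdtree(pts[mid + 1:], depth + 1)}


def nearest(node, query, best=None):
    """Return (distance, point) of the stored point closest to query."""
    if node is None:
        return best
    d = math.dist(node["point"], query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best hit.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best
```

The pruning step is what saves work in low dimensions; as d grows, fewer subtrees can be skipped, which is why the slides go on to reduce the dimension first.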
29
k-Nearest-Neighbors Problem
Idea: simplify the protein's description
30
Assume that each conformation is described by
the coordinates of the n Cα atoms
cRMSD → $O(n)$ time; dRMSD → $O(n^2)$ time
31
This representation is highly redundant
  • Proximity along the chain entails spatial
    proximity
  • Atoms can't bunch up, hence atoms that are far apart along
    the chain are on average spatially distant

32
→ m-Averaged Approximation
  • Cut the backbone into fragments of m Cα atoms
  • Replace each fragment by the centroid of the m Cα
    atoms
  • → Simplified cRMSD and dRMSD

3n coordinates → 3n/m coordinates
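
The averaging step can be sketched as below, assuming the backbone is a list of (x, y, z) Cα coordinates in chain order. How a leftover fragment shorter than m is handled is not stated in the slides; here it is simply averaged over the atoms it contains, which is an assumption.

```python
def m_average(coords, m):
    """Replace each run of m consecutive atoms by its centroid."""
    if m < 1:
        raise ValueError("m must be at least 1")
    averaged = []
    for start in range(0, len(coords), m):
        fragment = coords[start:start + m]
        # Centroid: per-axis mean over the atoms of the fragment.
        averaged.append(tuple(sum(axis) / len(fragment) for axis in zip(*fragment)))
    return averaged
```

The simplified cRMSD or dRMSD is then the usual measure applied to the averaged point lists.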
33
Evaluation: Test Sets [Lotan and Schwarzer, 2003]
  • 8 diverse proteins (54-76 residues)
  • Decoy sets of N = 10,000 conformations from the
    Park-Levitt set [Park et al., 1997]

Correlation
 m    cRMSD        dRMSD
 3    0.99         0.96-0.98
 4    0.98-0.99    0.94-0.97
 6    0.92-0.99    0.78-0.93
 9    0.81-0.98    0.65-0.96
12    0.54-0.92    0.52-0.69
Higher correlation for random sets (→ greater
savings)
34
Running Times
35
Further Reduction for dRMSD
  1. Stack m-averaged distance matrices as column vectors of
    a matrix A

36
Matrix A (r rows, N columns)
Column $a_i$: vector of the elements of the distance matrix of the ith
conformation (i = 1 to N)
37
Further Reduction for dRMSD
  • Stack m-averaged distance matrices as column vectors of
    a matrix A
  • Compute the SVD $A = U D V^T$

38
SVD Decomposition
$A_{r \times N} = U_{r \times r} \, D_{r \times r} \, V^T_{r \times N}$
Column $a_j$ of A: vector of the elements of the distance matrix of the jth
conformation (j = 1 to N)
39
SVD Decomposition
$A_{r \times N} = U_{r \times r} \, D_{r \times r} \, V^T_{r \times N}$
  • U: orthonormal (rotation) matrix
  • D: diagonal matrix of singular values,
    $s_1 \ge s_2 \ge \dots \ge s_r \ge 0$
  • $V^T$: matrix with orthonormal rows
42
SVD Decomposition
$A_{r \times N} = U_{r \times r} \, D_{r \times r} \, V^T_{r \times N}$
  • Rows of $D V^T$: $s_1 v_1^T$, $s_2 v_2^T$, ...
  • p principal components: keep $s_1, \dots, s_p$ and the rows
    $v_1^T, \dots, v_p^T$; the remaining entries are set to 0
45
Further Reduction for dRMSD
  1. Stack m-averaged distance matrices as column vectors of
    a matrix A
  2. Compute the SVD $A = U D V^T$
  3. Project onto p principal components
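
The three steps can be sketched with NumPy as follows. This follows the slides literally and applies the SVD to A without mean-centering, which is an assumption; the function name `project_onto_pcs` is illustrative. Each column of A is one conformation, and the result holds its p coordinates along the leading left singular vectors.

```python
import numpy as np


def project_onto_pcs(A, p):
    """Project the columns of the r x N matrix A onto its p leading principal components."""
    A = np.asarray(A, dtype=float)
    if not 1 <= p <= min(A.shape):
        raise ValueError("p must be between 1 and min(r, N)")
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in decreasing order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # U[:, :p].T @ A equals diag(s[:p]) @ Vt[:p]: one p-vector per conformation.
    return s[:p, None] * Vt[:p]
```

Euclidean distances between the projected columns approximate those between the original columns, and are exact when A has rank at most p.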

46
Correlation
47
Complexity of SVD
  • SVD of an r × N matrix, where N > r, takes $O(r^2 N)$ time
  • Here $r \sim (n/m)^2$
  • So, time complexity is $O(n^4 N)$
  • Would be too costly without m-averaging

48
Evaluation for 1CTF Decoy Sets [Lotan and
Schwarzer, 2003]
  • N = 100,000, k = 100, 4-averaging, 16 PCs
  • 70 correct, with furthest NN off by 20
  • Brute-force: 84 h
  • Brute-force + m-averaging: 4.8 h
  • Brute-force + m-averaging + PC: 41 min
  • kd-tree + m-averaging + PC: 19 min
  • Speedup greater than 200×
  • 6k approximate NNs contain all true k NNs
  • → Use m-averaging and PC reduction as fast
    filters
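
The "fast filter" idea can be sketched as a two-stage search: rank all conformations with a cheap approximate distance, keep the best 6k, then re-rank only those with the exact measure. The function and parameter names below are hypothetical, and the factor 6 is the value reported for this evaluation, not a guarantee for other data.

```python
import heapq


def filtered_knn(query, items, k, cheap_dist, exact_dist, factor=6):
    """Two-stage k-NN: cheap filter down to factor*k candidates, then exact re-ranking."""
    # Stage 1: approximate distances (e.g., m-averaged and PC-reduced).
    candidates = heapq.nsmallest(factor * k, items,
                                 key=lambda c: cheap_dist(query, c))
    # Stage 2: exact distances (e.g., full dRMSD) on the survivors only.
    return heapq.nsmallest(k, candidates, key=lambda c: exact_dist(query, c))
```

The exact measure is evaluated on only factor × k items instead of all N, which is where the savings come from.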