Title: Conformational Space
1Conformational Space
2Conformational Space
- Conformation of a molecule: specification of the relative positions of all atoms in 3D space
- Typical parameterizations:
  - List of coordinates of atom centers
  - List of torsional angles (e.g., the φ-ψ-χ angles for a protein)
- Conformational space: space of all conformations
5Relation to Robotics/Graphics
Configuration space
6Need for a Metric
- Simulation and sampling techniques can produce millions of conformations
- Which conformations are similar?
- Which ones are close to the folded one?
- Do some conformations form small clusters (e.g., key intermediates while folding)?
7Metric in Conformational Space
- A metric over conformational space C is a function $d : (c, c') \in C^2 \mapsto d(c, c') \in \mathbb{R}_{\geq 0}$ such that:
  - $d(c, c') = 0 \Leftrightarrow c = c'$ (non-degeneracy)
  - $d(c, c') = d(c', c)$ (symmetry)
  - $d(c, c') + d(c', c'') \geq d(c, c'')$ (triangle inequality)
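The three axioms above can be checked numerically on a finite sample of points. The sketch below is illustrative only: the helper name `check_metric_axioms`, the tolerance, and the toy 2D sample are assumptions, and passing on a sample does not prove that a function is a metric in general.

```python
import itertools
import math

def check_metric_axioms(points, d, tol=1e-9):
    """Return True if d satisfies the metric axioms on this finite sample."""
    for a, b in itertools.product(points, repeat=2):
        # Non-degeneracy: d(a, b) = 0 exactly when a = b
        if (abs(d(a, b)) <= tol) != (a == b):
            return False
        # Symmetry: d(a, b) = d(b, a)
        if abs(d(a, b) - d(b, a)) > tol:
            return False
    for a, b, c in itertools.product(points, repeat=3):
        # Triangle inequality: d(a, b) + d(b, c) >= d(a, c)
        if d(a, b) + d(b, c) < d(a, c) - tol:
            return False
    return True

def one_way_cost(p, q):
    """A deliberately asymmetric function, used as a counter-example."""
    return max(q[0] - p[0], 0.0)

sample = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
euclidean_ok = check_metric_axioms(sample, math.dist)
one_way_ok = check_metric_axioms(sample, one_way_cost)
```

Here `math.dist` passes all three checks on the sample, while `one_way_cost` fails already at non-degeneracy.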
8But not all metrics are good
- Euclidean metric
- $d(c, c') = \sum_{i=1,\dots,n} \left( |\phi_i - \phi_i'|^2 + |\psi_i - \psi_i'|^2 \right)$
11Metric in Conformational Space
- A good metric should measure how well the atoms in two conformations can be aligned
- Usual metrics: cRMSD, dRMSD
12RMSD
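- Given two sets of n points in $\mathbb{R}^3$: $A = \{a_1, \dots, a_n\}$ and $B = \{b_1, \dots, b_n\}$
- The RMSD between A and B is $\mathrm{RMSD}(A, B) = \left[ \frac{1}{n} \sum_{i=1,\dots,n} \|a_i - b_i\|^2 \right]^{1/2}$, where $\|a_i - b_i\|$ denotes the Euclidean distance between $a_i$ and $b_i$ in $\mathbb{R}^3$
- $\mathrm{RMSD}(A, B) = 0$ iff $a_i = b_i$ for all i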
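As a minimal sketch of the RMSD definition above, assuming the two point sets are stored as NumPy arrays of shape (n, 3) with matching row order (the function name `rmsd` is illustrative):

```python
import numpy as np

def rmsd(A, B):
    """RMSD between two (n, 3) arrays of corresponding points, without alignment."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    if A.shape != B.shape:
        raise ValueError("A and B must have the same shape")
    # Squared Euclidean distance between each pair (a_i, b_i)
    sq_dists = np.sum((A - B) ** 2, axis=1)
    # Square root of the mean squared distance
    return float(np.sqrt(sq_dists.mean()))

A = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
B = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
value = rmsd(A, B)  # the only displaced point moves by 2, so this is sqrt(4 / 2)
```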
13cRMSD
- Molecule M with n atoms $a_1, \dots, a_n$
- Two conformations c and c' of M
- $a_i(c)$ is the position of $a_i$ when M is at c
- cRMSD(c, c') is the minimized RMSD between the two sets of atom centers: $\min_T \left[ \frac{1}{n} \sum_{i=1,\dots,n} \|a_i(c) - T(a_i(c'))\|^2 \right]^{1/2}$, where the minimization is over all possible rigid-body transforms T
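The minimization over rigid-body transforms has a closed-form solution. The sketch below uses the Kabsch algorithm (centroid superposition followed by an SVD of the 3×3 covariance matrix); the slides do not name a particular method, so this choice and the function name `crmsd` are assumptions.

```python
import numpy as np

def crmsd(P, Q):
    """Minimum RMSD between (n, 3) arrays P and Q over rigid-body transforms of Q."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # The optimal translation superimposes the two centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # 3x3 covariance matrix between the centered point sets
    H = Qc.T @ Pc
    U, _, Vt = np.linalg.svd(H)
    # Sign correction so that R is a proper rotation (no reflection)
    sign = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, sign]) @ U.T
    # Apply R to every row of Qc and measure the residual
    diff = Pc - Qc @ R.T
    return float(np.sqrt(np.sum(diff ** 2) / len(P)))
```

The cost is linear in n: building the covariance matrix takes O(n) time, and the SVD is applied to a fixed-size 3×3 matrix.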
16cRMSD
- cRMSD satisfies the triangle inequality
- cRMSD takes time linear in n to compute
- Often, cRMSD is restricted to a subset of atoms, e.g., the Cα atoms on a protein's backbone
17Representation Restricted to Ca Atoms
- The positions of amino-acid residue centers (Cα atoms) mainly determine the structure of a protein.
- In structural comparison, people usually work only on the backbone of Cα atoms and neglect the other atoms.
Protein 1tph
18- Possible project: design a method for efficiently finding nearest neighbors in a sampled conformation space of a protein, using the cRMSD metric.
19dRMSD
- Molecule M with n atoms $a_1, \dots, a_n$
- Two conformations c and c' of M
- $\{d_{ij}(c)\}$: $n \times n$ symmetric intra-molecular distance matrix of M at c
- dRMSD(c, c') is $\left[ \frac{2}{n(n-1)} \sum_{i=1,\dots,n-1} \sum_{j=i+1,\dots,n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
- $d_{ij}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein's backbone
20Intra-Molecular Distance Matrix
Distances between Cα pairs of a protein with 142 residues. Darker squares represent shorter distances.
24dRMSD
- Molecule M with n atoms $a_1, \dots, a_n$
- Two conformations c and c' of M
- $\{d_{ij}(c)\}$: $n \times n$ symmetric intra-molecular distance matrix of M at c
- $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1,\dots,n-1} \sum_{j=i+1,\dots,n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$
- $d_{ij}$ is usually restricted to a subset of atoms, e.g., the Cα atoms on a protein's backbone
- Advantage: no aligning transform
- Drawback: takes quadratic time to compute
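The dRMSD definition can be sketched directly from the distance matrices. The code below assumes each conformation is an (n, 3) NumPy array of atom positions with the same atom order; the names `distance_matrix` and `drmsd` are illustrative.

```python
import numpy as np

def distance_matrix(X):
    """n x n symmetric matrix of Euclidean distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(diff ** 2, axis=-1))

def drmsd(X, Y):
    """dRMSD between two conformations of the same n atoms."""
    n = len(X)
    # Indices of the n(n-1)/2 pairs with i < j
    upper = np.triu_indices(n, k=1)
    dX = distance_matrix(X)[upper]
    dY = distance_matrix(Y)[upper]
    # Mean over n(n-1)/2 pairs corresponds to the factor 2 / (n(n-1))
    return float(np.sqrt(np.mean((dX - dY) ** 2)))

X = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
Y = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
value = drmsd(X, Y)
```

No alignment step is needed because the intra-molecular distances do not change under rigid-body motion, but both the time and the memory used here grow quadratically with n.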
25Is dRMSD a metric?
- $\mathrm{dRMSD}(c, c') = \left[ \frac{2}{n(n-1)} \sum_{i=1,\dots,n-1} \sum_{j=i+1,\dots,n} \left( d_{ij}(c) - d_{ij}(c') \right)^2 \right]^{1/2}$ is a metric in the $n(n-1)/2$-dimensional space, where a conformation c is represented by $\{d_{ij}(c)\}$
- But, in this representation, the same point represents both a conformation and its mirror image
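The mirror-image ambiguity can be seen on a small example. The sketch below uses a hypothetical four-atom chiral chain: its reflection has exactly the same vector of pairwise distances, even though the signed volume spanned by the atoms changes sign, so no rotation maps one onto the other.

```python
import numpy as np

def pair_distances(X):
    """Vector of the n(n-1)/2 pairwise distances between the rows of X."""
    X = np.asarray(X, dtype=float)
    i, j = np.triu_indices(len(X), k=1)
    return np.linalg.norm(X[i] - X[j], axis=1)

def signed_volume(X):
    """Determinant of the three edge vectors from atom 0; flips sign under reflection."""
    return float(np.linalg.det(X[1:4] - X[0]))

# A small chiral chain (illustrative coordinates, not taken from a real molecule)
c = np.array([[0.0, 0.0, 0.0],
              [1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0]])
mirror = c * np.array([1.0, 1.0, -1.0])  # reflect through the xy-plane

same_point = np.allclose(pair_distances(c), pair_distances(mirror))
handedness = (signed_volume(c), signed_volume(mirror))
```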
26k-Nearest-Neighbors Problem
Given a set S of conformations of a protein and a query conformation c, find the k conformations in S most similar to c (w.r.t. cRMSD, dRMSD, or another metric)
Can be done in time $O(N(\log k + L))$, where:
- N: size of S
- L: time to compare two conformations
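One way to reach the $O(N(\log k + L))$ bound is a single scan over S that keeps the k best candidates in a bounded heap. This is a minimal sketch; the function name `k_nearest` and the generic `dist` callback (standing in for cRMSD, dRMSD, or another metric) are assumptions.

```python
import heapq
import math

def k_nearest(S, query, k, dist):
    """Indices of the k elements of S closest to query, nearest first."""
    heap = []  # max-heap on distance (stored negated), never larger than k
    for idx, conf in enumerate(S):
        d = dist(query, conf)                    # one comparison: cost L
        if len(heap) < k:
            heapq.heappush(heap, (-d, idx))      # O(log k)
        elif -d > heap[0][0]:                    # closer than the current k-th best
            heapq.heapreplace(heap, (-d, idx))   # O(log k)
    # Largest negated distance first, i.e. smallest distance first
    return [idx for _, idx in sorted(heap, reverse=True)]

# Toy usage with 1D "conformations" and the Euclidean distance
S = [(0.0,), (5.0,), (1.0,), (9.0,), (2.0,)]
neighbors = k_nearest(S, (0.9,), 3, math.dist)
```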
27k-Nearest-Neighbors Problem
The total time needed to compute the k nearest neighbors of every conformation in S is $O(N^2(\log k + L))$. Much too long for large datasets, where N ranges from 10,000s to millions!
Can be improved by:
1. Reducing L
2. Using a more efficient algorithm (e.g., kd-tree)
28kd-Tree
In a d-dimensional space, where $d > 2$, range searching for a point takes $O(d\,n^{1-1/d})$
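For orientation, here is a compact kd-tree sketch in pure Python. It implements nearest-neighbor search rather than the range search whose bound is quoted above, and the tuple-based node layout and function names are assumptions made for brevity.

```python
import math

def build_kdtree(points, depth=0):
    """Build a kd-tree as nested (point, axis, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                 # median point splits the set
    return (points[mid], axis,
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    """Return (distance, point) for the tree point closest to query."""
    if node is None:
        return best
    point, axis, left, right = node
    d = math.dist(point, query)
    if best is None or d < best[0]:
        best = (d, point)
    delta = query[axis] - point[axis]
    near, far = (left, right) if delta < 0 else (right, left)
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the best match
    if abs(delta) < best[0]:
        best = nearest(far, query, best)
    return best

tree = build_kdtree([(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)])
hit = nearest(tree, (9.0, 2.0))
```

The pruning test is what saves work in low dimensions; as d grows, fewer subtrees can be skipped, which is one reason to reduce the dimension of the representation first.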
29k-Nearest-Neighbors Problem
Idea: simplify the protein's description
30Assume that each conformation is described by the coordinates of the n Cα atoms
cRMSD ⇒ $O(n)$ time; dRMSD ⇒ $O(n^2)$ time
31This representation is highly redundant
- Proximity along the chain entails spatial proximity
- Atoms can't bunch up, hence atoms far apart along the chain are on average spatially distant
32⇒ m-Averaged Approximation
- Cut the backbone into fragments of m Cα atoms
- Replace each fragment by the centroid of the m Cα atoms
- ⇒ Simplified cRMSD and dRMSD
- 3n coordinates ⇒ 3n/m coordinates
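The m-averaging step can be sketched in a few lines, assuming the Cα coordinates are stored as an (n, 3) NumPy array in chain order. How to treat a final fragment shorter than m is not specified on the slide; averaging whatever atoms remain is an assumption made here.

```python
import numpy as np

def m_average(coords, m):
    """Replace each fragment of m consecutive atoms by its centroid."""
    coords = np.asarray(coords, dtype=float)
    # Fragment start indices: 0, m, 2m, ...
    starts = range(0, len(coords), m)
    # Assumption: a trailing fragment with fewer than m atoms is averaged as-is
    return np.array([coords[s:s + m].mean(axis=0) for s in starts])

coords = np.arange(18.0).reshape(6, 3)   # 6 toy "atoms"
reduced = m_average(coords, 3)           # 2 centroids, i.e. 3n/m = 6 coordinates
```

The reduced array can then be passed to any cRMSD or dRMSD routine in place of the full coordinates.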
33Evaluation: Test Sets [Lotan and Schwarzer, 2003]
- 8 diverse proteins (54-76 residues)
- Decoy sets of N = 10,000 conformations from the Park-Levitt set [Park et al., 1997]
Correlation
m     cRMSD        dRMSD
3     0.99         0.96-0.98
4     0.98-0.99    0.94-0.97
6     0.92-0.99    0.78-0.93
9     0.81-0.98    0.65-0.96
12    0.54-0.92    0.52-0.69
Higher correlation for random sets (⇒ greater savings)
34Running Times
35Further Reduction for dRMSD
- Stack m-averaged distance matrices as vectors of
a matrix A
36[Figure: the r × N matrix A; its ith column $a_i$ is the vector of elements of the distance matrix of the ith conformation (i = 1 to N)]
37Further Reduction for dRMSD
- Stack m-averaged distance matrices as vectors of a matrix A
- Compute the SVD $A = U D V^T$
38SVD Decomposition
[Figure: $A\,(r \times N) = U\,(r \times r)\; D\,(r \times r)\; V^T\,(r \times N)$]
- Column $a_j$ of A: vector of elements of the distance matrix of the jth conformation (j = 1 to N)
- U: orthonormal (rotation) matrix
- D: diagonal matrix of singular values, $\sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_r \geq 0$
- $V^T$: matrix with orthonormal rows
- Rows of $D V^T$: $\sigma_1 v_1^T, \sigma_2 v_2^T, \dots$
- p principal components: keep $\sigma_1, \dots, \sigma_p$ and $v_1^T, \dots, v_p^T$, and set the remaining singular values to 0
45Further Reduction for dRMSD
- Stack m-averaged distance matrices as vectors of a matrix A
- Compute the SVD $A = U D V^T$
- Project onto p principal components
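The three steps above can be sketched with NumPy's SVD. The sketch assumes A is stored as an (r, N) array with one conformation per column and applies the SVD to A directly, without mean-centering, since the slides do not mention centering; the function name `project_onto_pcs` is illustrative.

```python
import numpy as np

def project_onto_pcs(A, p):
    """Reduce each column of the (r, N) matrix A to its first p principal coordinates."""
    # Thin SVD: A = U @ diag(s) @ Vt, singular values in decreasing order
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # First p rows of D V^T, which equal U[:, :p].T @ A
    return s[:p, None] * Vt[:p, :]

# Toy data: 8 "conformations" whose 6-dimensional vectors lie in a 2D subspace
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 8))
Y = project_onto_pcs(A, 2)   # shape (2, 8)
```

Because U is orthonormal, Euclidean distances between columns of $D V^T$ equal those between columns of A; truncating to p rows approximates them, and is exact for this rank-2 toy matrix.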
46Correlation
47Complexity of SVD
- SVD of an r × N matrix, where $N > r$, takes $O(r^2 N)$ time
- Here $r \sim (n/m)^2$
- So, time complexity is $O(n^4 N)$
- Would be too costly without m-averaging
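To see how much m-averaging saves in the SVD step, the bound can be written with m kept explicit. This is a back-of-the-envelope sketch that takes $r \sim (n/m)^2$ as stated above and ignores constant factors.

```latex
\[
T_{\mathrm{SVD}} = O(r^2 N), \qquad r \sim (n/m)^2
\;\Longrightarrow\;
T_{\mathrm{SVD}} = O\!\left(\frac{n^4}{m^4}\,N\right)
\]
\[
\frac{T_{\mathrm{SVD}}(m = 1)}{T_{\mathrm{SVD}}(m = 4)} \sim 4^4 = 256
\]
```

So with 4-averaging the SVD is roughly two orders of magnitude cheaper than on the full Cα distance matrices.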
48Evaluation for 1CTF Decoy Sets [Lotan and Schwarzer, 2003]
- N = 100,000, k = 100, 4-averaging, 16 PCs
- 70% correct, with furthest NN off by 20%
- Brute-force: 84 h
- Brute-force + m-averaging: 4.8 h
- Brute-force + m-averaging + PC: 41 min
- kd-tree + m-averaging + PC: 19 min
- Speedup greater than ×200
- 6k approximate NNs contain all true k NNs
- ⇒ Use m-averaging and PC reduction as fast filters
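The filtering idea can be sketched as a two-stage search: a cheap approximate distance selects a candidate pool, and the exact metric re-ranks only that pool. The function name `filtered_knn`, the two distance callbacks, and the toy data are assumptions; the default pool size of 6k follows the observation reported above for this data set and is not a general guarantee.

```python
import heapq
import math

def filtered_knn(S, query, k, cheap_dist, exact_dist, factor=6):
    """Indices of k nearest neighbors, found among the factor*k best by cheap_dist."""
    # Stage 1: fast filter using the approximate (e.g. reduced) distance
    candidates = heapq.nsmallest(
        factor * k, range(len(S)), key=lambda i: cheap_dist(query, S[i]))
    # Stage 2: exact metric evaluated only on the surviving candidates
    return heapq.nsmallest(
        k, candidates, key=lambda i: exact_dist(query, S[i]))

# Toy usage: the "cheap" distance looks only at the first coordinate
S = [(float(i), float((i * 7) % 5)) for i in range(30)]
query = (10.2, 1.0)
result = filtered_knn(S, query, 2,
                      cheap_dist=lambda q, c: abs(q[0] - c[0]),
                      exact_dist=math.dist)
```

If the candidate pool misses a true neighbor, the result is only approximate, so the pool size should be validated against brute-force results on a sample of queries.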