Title: Protein Structure Alignment
1. Protein Structure Alignment
Human hemoglobin alpha-chain: pdb 1jebA
Human myoglobin: pdb 2mm1
Another example: G-proteins 1c1yA and 1kk1A (residues 6-200). Sequence identity 18%, structural identity 72%.
2. Transformations
- Translation
- Translation and Rotation
- Rigid Motion (Euclidean Transformation)
- Translation, Rotation + Scaling
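The transformation classes listed above can be illustrated with a minimal sketch. It assumes points are plain `(x, y, z)` tuples and, for brevity, restricts rotation to the z-axis; the function names are hypothetical and not taken from the slides. The point it shows is that a rigid motion preserves pairwise distances, while scaling multiplies them by the scale factor.

```python
import math

def rotate_z(p, angle):
    """Rotate point p = (x, y, z) about the z-axis by `angle` radians."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)

def translate(p, t):
    """Shift point p by the vector t."""
    return (p[0] + t[0], p[1] + t[1], p[2] + t[2])

def scale(p, factor):
    """Uniformly scale point p about the origin."""
    return (factor * p[0], factor * p[1], factor * p[2])

def rigid_motion(p, angle, t):
    """Rotation followed by translation (a Euclidean transformation)."""
    return translate(rotate_z(p, angle), t)

def dist(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

a, b = (1.0, 0.0, 0.0), (0.0, 2.0, 1.0)
moved_a = rigid_motion(a, 0.7, (3.0, -1.0, 2.0))
moved_b = rigid_motion(b, 0.7, (3.0, -1.0, 2.0))
print(dist(a, b), dist(moved_a, moved_b))      # equal up to rounding
print(dist(scale(a, 2.0), scale(b, 2.0)))      # twice the original distance
```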
3. Inexact Alignment
Simple case: two closely related proteins with the same number of amino acids.
Question: how do we measure the alignment error?
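4. Distance Functions
- Two point sets: $A = \{a_i\},\ i = 1, \dots, n$ and $B = \{b_j\},\ j = 1, \dots, m$
- Pairwise correspondence: $(a_{k_1}, b_{t_1}), (a_{k_2}, b_{t_2}), \dots, (a_{k_N}, b_{t_N})$
(1) Exact matching: $\|a_{k_i} - b_{t_i}\| = 0$
(2) Bottleneck: $\max_i \|a_{k_i} - b_{t_i}\|$
(3) RMSD (Root Mean Square Distance): $\sqrt{\sum_i \|a_{k_i} - b_{t_i}\|^2 / N}$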
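The three distance functions can be written down directly. The sketch below assumes that a correspondence is given as a list of index pairs `(k, t)` matching `A[k]` with `B[t]`; that representation and the function names are illustrative choices, not part of the slides.

```python
import math

def dist(p, q):
    """Euclidean distance between two 3-D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

def pair_distances(A, B, pairs):
    """Distances of all matched pairs; `pairs` holds (k, t) index pairs."""
    return [dist(A[k], B[t]) for k, t in pairs]

def is_exact(A, B, pairs, tol=0.0):
    """(1) Exact matching: every matched pair coincides (within tol)."""
    return all(d <= tol for d in pair_distances(A, B, pairs))

def bottleneck(A, B, pairs):
    """(2) Bottleneck: the largest distance over all matched pairs."""
    return max(pair_distances(A, B, pairs))

def rmsd(A, B, pairs):
    """(3) RMSD: square root of the mean squared pair distance."""
    ds = pair_distances(A, B, pairs)
    return math.sqrt(sum(d * d for d in ds) / len(ds))

A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
B = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 2.0), (9.0, 9.0, 9.0)]
pairs = [(0, 0), (1, 1), (2, 2)]          # N = 3 matched pairs, n = 3, m = 4
print(is_exact(A, B, pairs), bottleneck(A, B, pairs), rmsd(A, B, pairs))
```

The bottleneck measure is dominated by the single worst pair, whereas RMSD averages the error over all N pairs.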
5. Superposition - best least squares (RMSD: Root Mean Square Deviation)
Given two sets of 3-D points $P = \{p_i\}$, $Q = \{q_i\}$, $i = 1, \dots, n$:
$\mathrm{rmsd}(P, Q) = \sqrt{\sum_i \|p_i - q_i\|^2 / n}$
Find a 3-D rigid transformation $T^*$ such that
$\mathrm{rmsd}(T^*(P), Q) = \min_T \sqrt{\sum_i \|T(p_i) - q_i\|^2 / n}$
A closed form solution exists for this task. It can be computed in O(n) time.
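The slide does not say which closed-form solution is meant. One well-known option is the SVD-based Kabsch method, sketched below with NumPy under that assumption; the function name `superpose` is hypothetical. Building the 3x3 covariance matrix takes O(n) time and the SVD of a 3x3 matrix takes constant time, which is consistent with the O(n) bound stated above.

```python
import numpy as np

def superpose(P, Q):
    """Return (R, t, rmsd) such that R @ p_i + t best fits q_i in the
    least-squares sense, with R a proper rotation (Kabsch method)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    cp, cq = P.mean(axis=0), Q.mean(axis=0)      # centroids
    H = (P - cp).T @ (Q - cq)                    # 3x3 covariance matrix, O(n)
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cq - R @ cp
    diff = (P @ R.T + t) - Q
    return R, t, float(np.sqrt((diff ** 2).sum() / len(P)))

# Usage: recover a known rigid motion from noise-free points.
rng = np.random.default_rng(1)
P = rng.normal(size=(10, 3))
a = 0.5
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([1.0, -2.0, 0.5])
Q = P @ R_true.T + t_true
R, t, err = superpose(P, Q)
print(err)   # close to zero for noise-free input
```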
6. Correspondence is Unknown
Given two configurations of points in three-dimensional space, find those rotations and translations of one of the point sets which produce large superimpositions of corresponding 3-D points.
7. A 3-D reference frame can be uniquely defined by the ordered vertices p1, p2, p3 of a non-degenerate triangle.
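One possible way to build such a frame from an ordered triangle is sketched below. The specific convention (origin at p1, x-axis along p1 to p2, z-axis normal to the triangle plane) is an assumption chosen for illustration; the slides only state that a frame can be uniquely defined. Coordinates expressed in this frame do not change when the whole structure undergoes a rigid motion.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    if n < 1e-12:
        raise ValueError("degenerate triangle: vertices coincide or are collinear")
    return tuple(x / n for x in a)

def frame(p1, p2, p3):
    """Right-handed orthonormal frame: origin p1, x along p1->p2,
    z normal to the triangle, y = z x x."""
    ex = unit(sub(p2, p1))
    ez = unit(cross(ex, sub(p3, p1)))
    ey = cross(ez, ex)
    return p1, (ex, ey, ez)

def to_local(fr, p):
    """Coordinates of point p in the frame fr = (origin, axes)."""
    origin, axes = fr
    v = sub(p, origin)
    return tuple(dot(v, e) for e in axes)

fr = frame((1.0, 2.0, 3.0), (4.0, 2.0, 3.0), (1.0, 6.0, 3.0))
print(to_local(fr, (2.0, 3.0, 7.0)))
```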
8. Sequence Based Structure Alignment
- Run pairwise sequence alignment.
- Based on the sequence correspondence, compute a 3D transformation (a least-squares fit can be applied).
- Iteratively improve the structural superposition.
Not a good approach: the sequence alignment can be incorrect.
9. Structure Alignment (Straightforward Algorithm)
- For each pair of triplets, one from each molecule, which define almost congruent triangles, compute the rigid transformation that superimposes them.
- Count the number of aligned point pairs and sort the hypotheses by this number.
10. Structure Alignment (Straightforward Algorithm), continued
- For the highest ranking hypotheses, improve the transformation by replacing it with the best RMSD transformation for all the matching pairs.
- Complexity: $O(n^3 m^3) \cdot O(nm)$ (number of hypotheses times the cost of verifying each one).
- Applying a 3D grid gives, in practice, $O(n^3 m^3) \cdot O(n)$.
- If one exploits protein backbone geometry together with the 3D grid: $O(nm) \cdot O(n)$.
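The role of the 3D grid in the verification step can be sketched as follows. The idea is to hash the points of one molecule into cubic cells so that, for each transformed point of the other molecule, only the 27 surrounding cells need to be inspected instead of all m points. The cell size (equal to the distance threshold `eps`), the function names, and the simplification of counting a point as aligned when any target point lies within `eps` (without enforcing a one-to-one matching) are assumptions made for this illustration.

```python
import math
from collections import defaultdict

def cell_of(p, size):
    """Integer grid cell containing point p, for cubic cells of edge `size`."""
    return tuple(math.floor(c / size) for c in p)

def build_grid(points, size):
    """Hash every point index into its grid cell."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, size)].append(idx)
    return grid

def has_neighbor(p, points, grid, eps):
    """True if some stored point lies within eps of p (checks 27 cells)."""
    cx, cy, cz = cell_of(p, eps)
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    d2 = sum((a - b) ** 2 for a, b in zip(p, points[j]))
                    if d2 <= eps * eps:
                        return True
    return False

def count_aligned(moved, target, eps):
    """Number of points in `moved` that have a point of `target` within eps."""
    grid = build_grid(target, eps)
    return sum(has_neighbor(p, target, grid, eps) for p in moved)

target = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
moved = [(0.5, 0.0, 0.0), (5.0, 0.9, 0.0), (20.0, 0.0, 0.0), (-0.5, 0.0, 0.0)]
print(count_aligned(moved, target, 1.0))
```

Because protein atoms cannot pack arbitrarily densely, each cell holds a bounded number of points, which is why the lookup is constant time in practice.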
11. Structural Alignment Approaches
Two interrelated problems: 3D transformation and point correspondence (matching, alignment).
Some methods:
Method 1:
- Generate a set of 3D transformations.
- Cluster similar transformations.
- Compute 3D alignment for each cluster representative.
Method 2:
- Generate a set of 3D transformations.
- Compute 3D alignment for each transformation.
Geometric Hashing: combines transformation and correspondence detection in one scheme.
12. Accuracy improvement during detection of 3D transformation
Instead of 3 points, use more. How many?
Align every possible pair of fragments $F_{ij}(k)$.
13. Accept $F_{ij}(k)$ if $\mathrm{rmsd}(F_{ij}(k)) \le \varepsilon$.
Complexity: $O(n^3 \cdot n) \cdot O(n)$ (assume $n \sim m$; for each $F_{ij}(k)$ we need to compute its rmsd). This can be reduced to $O(n^3) \cdot O(n)$.
14. Improvement: the BLAST idea - detect short similar fragments, then extend as much as possible.
[Figure: neighboring residues $a_{i-1}, a_i, a_{i+1}$ and $b_{j-1}, b_j, b_{j+1}$]
Extend while $\mathrm{rmsd}(F_{ij}(k)) \le \varepsilon$.
Complexity: $O(n^2) \cdot O(n)$
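The seed-and-extend idea can be sketched as below. Several details are assumptions, since the slides do not define them: the fragment pair is taken to be k consecutive points starting at index i in A and index j in B, its rmsd is computed after optimal superposition with the SVD-based Kabsch method, and extension is tried greedily one residue at a time to the right and to the left. The names `fit_rmsd` and `extend` are hypothetical.

```python
import numpy as np

def fit_rmsd(P, Q):
    """RMSD of two equal-length point arrays after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc, Qc = P - P.mean(axis=0), Q - Q.mean(axis=0)
    U, S, Vt = np.linalg.svd(Pc.T @ Qc)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return float(np.sqrt(((Pc @ R.T - Qc) ** 2).sum() / len(P)))

def extend(A, B, i, j, k, eps):
    """Grow the seed (A[i:i+k], B[j:j+k]) while its rmsd stays <= eps.
    Returns (start_in_A, start_in_B, length), or None if the seed fails."""
    sa, sb, length = i, j, k
    if fit_rmsd(A[sa:sa + length], B[sb:sb + length]) > eps:
        return None
    grew = True
    while grew:
        grew = False
        # Try to add one residue on the right.
        if sa + length < len(A) and sb + length < len(B):
            if fit_rmsd(A[sa:sa + length + 1], B[sb:sb + length + 1]) <= eps:
                length += 1
                grew = True
        # Try to add one residue on the left.
        if sa > 0 and sb > 0:
            if fit_rmsd(A[sa - 1:sa + length], B[sb - 1:sb + length]) <= eps:
                sa, sb, length = sa - 1, sb - 1, length + 1
                grew = True
    return sa, sb, length

# Usage: B contains a rigidly moved copy of A[4:10] at positions 5..10.
rng = np.random.default_rng(0)
A = rng.normal(scale=10.0, size=(15, 3))
c, s = np.cos(0.9), np.sin(0.9)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
core = A[4:10] @ R.T + np.array([3.0, -2.0, 7.0])
B = np.vstack([rng.normal(scale=10.0, size=(5, 3)), core,
               rng.normal(scale=10.0, size=(5, 3))])
print(extend(A, B, 6, 7, 3, 1e-6))
```

This sketch recomputes the superposition from scratch at every step; an implementation aiming at the stated complexity would update the fit incrementally.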
15. Sequence-order Independent Alignment
[Figure: structures P and Q]
16. 4-helix bundle
2cblA, 1f4nA, 1rhgA, 1b3q
17. Sequence Order Independent Alignment
18. Sequence Order Independent Alignment
2cblA, 1f4n, 1rhgA, 1b3q
[Figure: aligned fragments of the four structures, labeled by residue numbers, with chains A and B marked]
19. The C2 domain calcium-binding motif
E. A. Nalefski and J. J. Falke, "The C2 domain calcium-binding motif: Structural and functional diversity," Protein Sci. 1996; 5: 2375-2390.
20. TRAF-Immunoglobulin Ensemble
- Ensemble: 8 proteins from 2 folds.
- Core: sandwich of 6 strands.
- Runtime: 21 seconds.
[Figure legend: helices, strands (E - strand)]
21. Some Links
- RasMol - Molecular Visualization
- SCOP - Structural Classification of Proteins
- MultiProt - Protein Structural (pairwise/multiple) Alignment
- MASS - Secondary Structure Based (pairwise/multiple) Alignment