Title: Structure Alignment
1Structure Alignment
3Content
- Motivation
- Some basics
- Double Dynamic Programming
4PART I Motivation
5Motivation: Conformational changes
- Upon ligand binding structures may change
- Structural alignment can highlight the changes
6Conformational changes: Small GTPases
- Small GTPases act as molecular switches to control and regulate important functions and pathways within the cell
- Activated by guanine nucleotide exchange factors (GEF)
- Inactivated by GTPase activating proteins (GAP)
7G proteins: Conformational change between the GTP-bound and GDP-bound states
8Open and closed conformation of citrate synthase (1cts, 5cts)
- Open: oxalacetate; closed: oxalacetate and co-enzyme A
- Loop between two helices moves by 6 Å and rotates by 28°, some atoms move by 10 Å
10Hinge motion in Lactoferrin (1lfh, 1lfg)
- Lactoferrin is an iron-binding protein found in secretions such as milk or tears
- Rotation of 54° upon iron-binding
13Motivation: (Distant) Relatives
- Sequence similarity may be low, but structural similarity can still be high
Picture from www.jenner.ac.uk/YBF/DanielleTalbot.ppt
14Distant relatives
- Globins occur widely
- Primary function: binding oxygen
- Assembly of helices surrounding haem group
15Relatives
- Sperm whale myoglobin (1mbd) and lupin leghaemoglobin (2lh7)
16Distant Relatives
17Relatives
- Actinidin (2act) and Papain (9pap)
- Sequence identity 49%, RMSD 0.77 Å
- Same family: papain-like
18Relatives
- Plastocyanin (5pcy) and azurin (2aza)
- Core of structure is conserved
19Relatives
- Structure classifications like CATH and FSSP use
structural alignments to identify superfamilies.
20Motivation: Convergent Evolution
21Sequence similarity low
>1cse Subtilisin
AQTVPYGIPLIKADKVQAQGFKGANVKVAVLDTGIQA
SHPDLNVVGGASFVAGEAYNTDGNGHGTHVAGTVAAL
DNTTGVLGVAPSVSLYAVKVLNSSGSGSYSGIVSGIE
WATTNGMDVINMSLGGASGSTAMKQAVDNAYARGVVV
VAAAGNSGNSGSTNTIGYPAKYDSVIAVGAVDSNSNR
ASFSSVGAELEVMAPGAGVYSTYPTNTYATLNGTSMA
SPHVAGAAALILSKHPNLSASQVRNRLSSTATYLGSS
FYYGKGLINVEAAAQ
>1acb Chymotrypsin
CGVPAIQPVLSGLSRIVNGEEAVPGSWPWQVSLQDKT
GFHFCGGSLINENWVVTAAHCGVTTSDVVVAGEFDQG
SSSEKIQKLKIAKVFKNSKYNSLTINNDITLLKLSTA
ASFSQTVSAVCLPSASDDFAAGTTCVTTGWGLTRYTN
ANTPDRLQQASLPLLSNTNCKKYWGTKIKDAMICAGA
SGVSSCMGDSGGPLVCKKNGAWTLVGIVSWGSSTCST
STPGVYARVTALVNWVQQTLAAN
22Structural similarity low
1CSE chain E, 1ACB chain E
23Convergent Evolution
- c.41.1 and b.47.1 share interaction partners
d.40.1 CI-2 family of serine protease inhibitors
d.58.3 Protease propeptides/inhibitors
c.41.1 Subtilisin-like
b.47.1 Trypsin-like serine proteases
d.84.1 Subtilisin inhibitor
c.56.5 Zn-dependent exopeptidase
g.15.1 Ovomucoid/PCI-1 like inhibitor
24Convergent Evolution
- 1oyv: Ovomucoid/PCI-1 like inhibitor, g.15.1 (top); Subtilisin-like, c.41.1 (bottom)
- 4sgb: Ovomucoid/PCI-1 like inhibitor, g.15.1 (top); Trypsin-like serine proteases, b.47.1.2 (bottom)
25Convergent Evolution
- 1cse: CI-2 family of serine protease inhibitors, d.40.1 (top); Subtilisin-like, c.41.1 (bottom)
- 1acb: CI-2 family of serine protease inhibitors, d.40.1 (top); Trypsin-like serine proteases, b.47.1.2 (bottom)
26Catalytic Triad
27Convergent evolution
[Figure with structures labelled A, B and C]
- A and B are native, C is viral
Henschel et al., Bioinformatics 2006
28HIV Nef mimics kinase in binding SH3
Kinase (Src haematopoietic cell kinase, catalytic domain)
- Comparison of Nef-SH3 and intra-chain interaction of catalytic domain and SH3 of Hck, PDBs 1efn and 2hck
- No evidence of homology between Nef and kinase
HIV1-Nef
Fyn-SH3/Hck-SH3
Henschel et al., Bioinformatics 2006
29Automatic calculation of equivalent residues
Nef
Kinase
- Apart from the PxxP motif, matches include Arg71/Lys249 and Phe90/His289
- Residues with equivalents are strictly conserved in HIV-Nef
Henschel et al., Bioinformatics 2006
30Mimicry of baculovirus p35 and human inhibitor of apoptosis
- Caspase (red)
- p35 (yellow)
- IAP (green)
- Upon infection cell starts apoptosis programme,
p35 tries to stop it
Henschel et al., Bioinformatics 2006
31Mimicry of Capsids and Cyclophilin
- HIV capsid protein (yellow)
- Cyclophilin (red, green)
- Cyclophilin A restricts HIV infectivity
- Upon mutation of cyclophilin or inhibition with cyclosporin, infectivity goes up more than 100-fold (Towers, Nature Medicine, 2003)
Henschel et al., Bioinformatics 2006
32PART II Some basics
33What do we need?
- Two main operations to align structures
- Translation
- Rotation
- How to evaluate a structural alignment?
- Root mean square deviation, rmsd
34Basic Operations Translation
37Basic Operations Rotation
38Root Mean Square Deviation
- What is the distance between two points a with coordinates $x_a$ and $y_a$ and b with coordinates $x_b$ and $y_b$?
- Euclidean distance: $d(a,b) = \sqrt{(x_a - x_b)^2 + (y_a - y_b)^2}$
- And in 3D?
39Root Mean Square Deviation
- In a structure alignment the score measures how far the aligned atoms are from each other on average
- Given the distances $d_i$ between n aligned atoms, the root mean square deviation is defined as
- $\mathrm{rmsd} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} d_i^2}$
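The definition above translates almost directly into code. The following is a minimal sketch, assuming the atoms are already aligned pairwise and given as coordinate tuples; the function names and the toy coordinates are illustrative and not taken from the slides. The distance function also covers the 3D case, which simply adds a third squared difference under the root.

```python
import math

def euclidean_distance(a, b):
    """Distance between two points given as equal-length coordinate tuples (2D or 3D)."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def rmsd(coords_a, coords_b):
    """Root mean square deviation over n aligned atoms; coords_a[i] pairs with coords_b[i]."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [euclidean_distance(a, b) ** 2 for a, b in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical toy coordinates (in Å), not taken from any PDB entry.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]

print(rmsd(structure_a, structure_a))  # identical structures give 0.0
print(rmsd(structure_a, structure_b))  # one atom displaced by 3 Å gives sqrt(3)
```

Note that a single displaced atom dominates the value, because the distances are squared before averaging.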
40Quality of Alignment and Example
- Unit of RMSD => e.g. Ångströms
- Identical structures => RMSD = 0
- Similar structures => RMSD is small (1 to 3 Å)
- Distant structures => RMSD > 3 Å
41PART III Dynamic Programming
42A very simple algorithm
to align identical structures with conformational changes:
- Generate a sequence alignment (not necessary if both sequences are really 100% identical)
- Compute center of mass for both structures
- Move both structures so that the centers of mass are at the origin
- Compute the angle between all aligned residues
- Rotate structure by median of all angles
43A very simple algorithm
Question: How do we compute the center of mass? Assume n atoms $(x_1,y_1,z_1)$ to $(x_n,y_n,z_n)$ (for one structure)
44A very simple algorithm
Computing the center of mass: assume n atoms $(x_1,y_1,z_1)$ to $(x_n,y_n,z_n)$. Then
$(x_{CoM}, y_{CoM}, z_{CoM}) = \left(\frac{1}{n}\sum_{i=1}^{n} x_i,\ \frac{1}{n}\sum_{i=1}^{n} y_i,\ \frac{1}{n}\sum_{i=1}^{n} z_i\right)$
Question: How do we move a structure so that its center of mass is at the origin?
45A very simple algorithm
Moving the center of mass to the origin: for all i do $x_i \leftarrow x_i - x_{CoM}$, $y_i \leftarrow y_i - y_{CoM}$, $z_i \leftarrow z_i - z_{CoM}$
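The two translation steps (compute the center of mass, then subtract it from every atom) can be sketched as follows. This is a minimal illustration under the slides' assumption that every atom has the same weight 1/n; the function names and the toy coordinates are hypothetical.

```python
def center_of_mass(coords):
    """Unweighted mean position of n atoms, i.e. every atom gets weight 1/n."""
    n = len(coords)
    return tuple(sum(c[axis] for c in coords) / n for axis in range(3))

def move_to_origin(coords):
    """Translate all atoms so that the center of mass ends up at (0, 0, 0)."""
    com = center_of_mass(coords)
    return [tuple(c[axis] - com[axis] for axis in range(3)) for c in coords]

# Hypothetical toy structure with three atoms.
atoms = [(1.0, 2.0, 3.0), (3.0, 2.0, 1.0), (2.0, 5.0, 2.0)]
centered = move_to_origin(atoms)

print(center_of_mass(atoms))     # (2.0, 3.0, 2.0)
print(center_of_mass(centered))  # (0.0, 0.0, 0.0)
```

Applying the same translation to both structures removes the positional offset, so that only a rotation remains to be found.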
46A very simple algorithm
Question: Why median and not mean?
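One plausible answer is that the median is robust to outliers: residues in a region that changed conformation yield angles far from the rest, and they pull the mean but hardly affect the median. The sketch below illustrates this in 2D only, which is a simplifying assumption made here (in 3D a rotation needs an axis as well as an angle). The points are treated as already centered, and all names and coordinates are hypothetical.

```python
import math
from statistics import mean, median

def rotate(coords, theta):
    """Rotate 2D points counter-clockwise by theta radians around the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for x, y in coords]

def rotation_angles(coords_a, coords_b):
    """Signed angle that turns each point of A onto the direction of its partner in B."""
    angles = []
    for (ax, ay), (bx, by) in zip(coords_a, coords_b):
        theta = math.atan2(by, bx) - math.atan2(ay, ax)
        # Wrap the difference into [-pi, pi).
        theta = (theta + math.pi) % (2 * math.pi) - math.pi
        angles.append(theta)
    return angles

# Toy example: B is A rotated by 30 degrees, except for the last residue,
# which stands in for a loop that moved (it ends up 120 degrees away).
a = [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0), (0.0, -1.0), (2.0, 2.0)]
b = rotate(a, math.radians(30))
b[-1] = rotate([a[-1]], math.radians(120))[0]

angles = rotation_angles(a, b)
print(round(math.degrees(median(angles)), 1))  # 30.0, the rotation shared by most residues
print(round(math.degrees(mean(angles)), 1))    # 48.0, dragged up by the single outlier

# Rotating A by the median angle superposes every residue except the moved one.
superposed = rotate(a, median(angles))
```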
47A refinement: Alternating alignment and superposition
- 1. P = initial alignment (e.g. based on sequence alignment)
- 2. Superpose structures A and B based on P
- 3. Generate distance-based scoring matrix R from superposition
- 4. Use dynamic programming to align A and B using scoring matrix R
- 5. P' = new alignment derived from dynamic programming step
- 6. If P' is different from P, then set P = P' and go to step 2 again
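The dynamic programming step of this loop (steps 4 and 5) can be sketched as below. This is a minimal illustration that assumes a scoring matrix R is already available and that gaps cost nothing; the superposition step is not shown. The function name `align` and the matrix values are hypothetical.

```python
def align(score):
    """Global alignment that maximises the summed score of matched pairs; gaps cost 0.

    score[i][j] is the reward for matching residue i of A with residue j of B.
    Returns (best_total, list of matched (i, j) pairs).
    """
    n = len(score)
    m = len(score[0]) if score else 0
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j - 1] + score[i - 1][j - 1],  # match i with j
                             best[i - 1][j],                            # skip residue of A
                             best[i][j - 1])                            # skip residue of B
    # Trace back from the bottom-right corner to recover the matched pairs.
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return best[n][m], pairs[::-1]

# Hypothetical scoring matrix for two structures with three residues each.
R = [[2.0, 0.1, -0.1],
     [0.1, 1.5, 0.2],
     [-0.1, 0.3, 1.0]]
print(align(R))  # (4.5, [(0, 0), (1, 1), (2, 2)])
```

The returned pairs play the role of the new alignment P', which is compared with P to decide whether another round of superposition is needed.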
48Distance-based scoring matrix
- Let $d(A_i, B_j)$ be the Euclidean distance between $A_i$ and $B_j$
- Let t be the upper distance limit for residues to be rewarded
- The scoring matrix R is defined as follows: $R(A_i, B_j) = 1/d(A_i, B_j) - 1/t$; if $R(A_i, B_j) >$ max. score then $R(A_i, B_j) =$ max. score
- The gap/mismatch penalty is set to 0
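A minimal sketch of how such a matrix could be filled, assuming both structures are already superposed and given as 3D coordinate tuples. The values t = 10 and max. score = 2 are chosen here purely for illustration, and treating a distance of exactly 0 as the maximum score is an assumption added to avoid a division by zero.

```python
import math

def distance_score_matrix(coords_a, coords_b, t, max_score):
    """R[i][j] = 1/d(A_i, B_j) - 1/t, capped at max_score."""
    matrix = []
    for a in coords_a:
        row = []
        for b in coords_b:
            d = math.dist(a, b)
            # Assumption: a perfect overlap (d == 0) simply receives the cap.
            raw = max_score if d == 0 else 1.0 / d - 1.0 / t
            row.append(min(raw, max_score))
        matrix.append(row)
    return matrix

# Hypothetical superposed structures: A has 2 residues, B has 3.
coords_a = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
coords_b = [(0.5, 0.0, 0.0), (4.0, 0.0, 0.0), (24.0, 0.0, 0.0)]

R = distance_score_matrix(coords_a, coords_b, t=10.0, max_score=2.0)
print(len(R), len(R[0]))  # 2 3: one row per residue of A, one column per residue of B
for row in R:
    print([round(value, 3) for value in row])
```

Pairs closer than t receive a positive score, pairs further apart than t receive a slightly negative one, and very close pairs are capped so that a single near-perfect overlap cannot dominate the alignment.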
49Distance-based scoring matrix
Questions: What size does PAM have? What size does R have?
50Example
- $R(A_i, B_j) = 1/d(A_i, B_j) - 1/t$ for t = 1/10 and max. score = 2
51Part IV Double dynamic programming (chapter 9)
52Double dynamic programming
- Goal: simultaneously align and superpose structures
- Double dynamic programming is a heuristic which tries to achieve this goal
- Implemented as part of SSAP (used e.g. by CATH)
53Idea of double dynamic programming
- Use two levels of dynamic programming
- High level, which summarises low level DP
- Low level, which generates an alignment based on the assumption that $a_i$ and $b_j$ are part of an optimal alignment
54Low level matrix
- ${}^{ij}R$ is the low level scoring matrix assuming the pair $a_i$ and $b_j$ are aligned
- ${}^{ij}R_{kl}$ is the score showing how well $a_k$ fits onto $b_l$ under the constraint that $a_i$ and $b_j$ are aligned
- Perform dynamic programming for all pairs i,j using ${}^{ij}R$ with the constraint that the optimal alignment includes (i,j)
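The two levels can be sketched as follows. This is a simplified illustration, not SSAP's actual scoring: the low level score used here (comparing the distance from a_i to a_k with the distance from b_j to b_l) is an assumption made for the example, as are the zero gap penalty and all function names. Each low level alignment is forced through (i, j) by aligning the residues before and after that pair separately, and the scores along its path are added to the high level matrix.

```python
import math

def align(score, rows, cols):
    """Zero-gap-penalty DP over the given row/column indices; returns matched (k, l) pairs."""
    n, m = len(rows), len(cols)
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j - 1] + score(rows[i - 1], cols[j - 1]),
                             best[i - 1][j], best[i][j - 1])
    pairs, i, j = [], n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + score(rows[i - 1], cols[j - 1]):
            pairs.append((rows[i - 1], cols[j - 1]))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def double_dp(coords_a, coords_b):
    """Return (high level matrix, final alignment) for two coordinate lists."""
    n, m = len(coords_a), len(coords_b)
    high = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Assumed low level score: how well a_k fits onto b_l when a_i and b_j
            # are taken as aligned, judged by comparing distances to the anchors.
            def low(k, l, i=i, j=j):
                diff = abs(math.dist(coords_a[i], coords_a[k])
                           - math.dist(coords_b[j], coords_b[l]))
                return 1.0 / (1.0 + diff)
            # Constrained alignment: (i, j) is on the path by construction.
            path = (align(low, list(range(i)), list(range(j)))
                    + [(i, j)]
                    + align(low, list(range(i + 1, n)), list(range(j + 1, m))))
            # Aggregate the best low level path into the high level matrix.
            for k, l in path:
                high[k][l] += low(k, l)
    final = align(lambda k, l: high[k][l], list(range(n)), list(range(m)))
    return high, final

# Hypothetical toy structure and a translated copy of it.
points = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.0)]
shifted = [(x + 5.0, y, z) for x, y, z in points]
high, alignment = double_dp(points, shifted)
print(alignment)
```

Because the low level score in this sketch depends only on distances within each structure, no superposition is needed before aligning.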
57Question: How was max. score set in this example?
63Summary
- Structural alignments are useful to study conformational changes, to classify domains into families (DDP is used in CATH), and to study proteins with distant relationships and hence low sequence similarity
- Algorithms
- Basic operations: translate and rotate
- Simple algorithm based on dynamic programming
- Double dynamic programming
- Low-level programming using a substitution matrix based on residue distance
- Aggregation of best paths for high-level programming