Title: Doug Raiford
1 Protein Structure Searches
2 Problem definition
- Given a protein conformation, can we find other structurally similar proteins?
- Might have a database of structures (like the PDB)
3 If we have a predicted and a known structure
- Can do a simple RMSD to compare the two conformations
- Know precisely which amino acids (aas) compare to which
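A minimal sketch of the RMSD comparison, assuming the two conformations are already superimposed and that point i in one list corresponds to point i in the other. The function name and the coordinates are made up for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long lists of
    (x, y, z) points, where point i in coords_a matches point i in coords_b."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Hypothetical C-alpha coordinates (angstroms) for a three-residue fragment.
predicted = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
known = [(0.0, 0.0, 0.0), (3.8, 1.0, 0.0), (7.6, 0.0, 0.0)]
print(rmsd(predicted, known))
```

Only the middle residue differs (by 1 angstrom), so the result is small; identical conformations give 0.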
4 What if the sequences are not identical?
- Must map aas from one to aas in the other
- How might you do this?
- Sequence similarity
- MSAs
5 Have we seen this before?
- 3D PSSM
- Sequence alignment integrated with 3D alignment
- Stored in a profile (position-specific similarity profile)
- Generates 1D profiles first (MSAs)
- Then uses a structural alignment program (SAP) to augment the profiles with structural similarity
6 SAP (structural alignment program)
- Aligning secondary structures
7 How?
- What do you think of when you hear that you will need to align two things?
- Dynamic programming
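[Figure: dynamic programming matrix aligning two secondary-structure strings, α α α T β β β (columns) against α α β β β β (rows)]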
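A rough sketch of aligning two secondary-structure strings with dynamic programming. This is plain Needleman-Wunsch global alignment, not SAP itself. The letters (H = helix, T = turn, E = strand) and the match, mismatch, and gap values are assumptions chosen for illustration.

```python
def align(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Global (Needleman-Wunsch) alignment of two strings.
    Returns (score, aligned_a, aligned_b)."""
    n, m = len(seq_a), len(seq_b)
    # score[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + s,   # pair the two symbols
                              score[i - 1][j] + gap,     # gap in seq_b
                              score[i][j - 1] + gap)     # gap in seq_a
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq_a[i - 1] == seq_b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + s:
                out_a.append(seq_a[i - 1])
                out_b.append(seq_b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))

# The two strings from the figure, written with H/T/E letters.
total, top, bottom = align("HHHTEEE", "HHEEEE")
print(total)
print(top)
print(bottom)
```

The shorter string needs one gap to line up its helix and strand with those of the longer string.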
8 Scoring
- Three components
- AA similarity (substitution matrix)
- Local structure
- E.g. both aas are members of an alpha helix
- Solvent exposure
Are the associated aas similar sequence-wise (e.g. both glycines)?
Are they both in a similar local structure?
Are they both buried or both exposed to solvent?
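One way the three components could be combined into a single score for a candidate residue pairing. The weights, the tiny substitution table, and the dictionary keys are all hypothetical. A real system would use a full substitution matrix and tuned weights.

```python
# Toy substitution scores (hypothetical values, not taken from a real matrix).
TOY_SUBSTITUTION = {("G", "G"): 6, ("A", "A"): 4, ("A", "G"): 0, ("G", "A"): 0}

def pair_score(res_a, res_b, w_seq=1.0, w_struct=1.0, w_solvent=1.0):
    """Score one candidate pairing of residues.
    Each residue is a dict with keys 'aa' (one-letter code),
    'ss' (secondary structure label) and 'buried' (True or False)."""
    # 1. Sequence similarity: look up the pair, penalize unknown pairs.
    seq = TOY_SUBSTITUTION.get((res_a["aa"], res_b["aa"]), -1)
    # 2. Local structure: reward the same secondary structure class.
    struct = 1 if res_a["ss"] == res_b["ss"] else -1
    # 3. Solvent exposure: reward both buried or both exposed.
    solvent = 1 if res_a["buried"] == res_b["buried"] else -1
    return w_seq * seq + w_struct * struct + w_solvent * solvent

gly_helix_buried = {"aa": "G", "ss": "helix", "buried": True}
ala_strand_exposed = {"aa": "A", "ss": "strand", "buried": False}
print(pair_score(gly_helix_buried, gly_helix_buried))
print(pair_score(gly_helix_buried, ala_strand_exposed))
```

A score like this can fill the cells of the dynamic programming matrix in place of a plain match/mismatch value.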
9 Benefits
- SAP (structure alignment) allows a profile to be influenced by secondary structure
- Useful to 3D PSSM in that it improves threading decisions (which aas match to a profile)
- Homology-based conformation prediction is enhanced by making better decisions on where to insert gaps/varying-length loops
10 Another one we have already seen
- Pfam
- Has hidden Markov models (HMMs) for protein families
- Sequences that match a model have a high probability of matching its conformation
- Even though we are not comparing structures (query to target)
- We are matching a sequence to its most probable structure
Pfam
HMMER
11 What about finding similar structure in an alternative way?
- Can't really align
- How else might it work?
12 Dali (distance matrix alignment)
- How might two distance matrices look?
- All pairwise distances from each aa to all other aas
- If identical proteins, the matrices would be almost identical
Low-distance region in the matrix if parallel
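A small sketch of building a distance matrix, assuming one (x, y, z) point per amino acid (for example the C-alpha atom). The coordinates are made up. It also checks the property that makes distance matrices useful here: moving or rotating the whole protein leaves the matrix unchanged.

```python
import math

def distance_matrix(coords):
    """All pairwise Euclidean distances between residues."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def max_difference(matrix_a, matrix_b):
    """Largest absolute difference between two equally sized matrices."""
    return max(abs(a - b)
               for row_a, row_b in zip(matrix_a, matrix_b)
               for a, b in zip(row_a, row_b))

# Hypothetical coordinates for a four-residue fragment.
protein = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (2.0, 5.0, 1.5)]
# The same fragment rotated 90 degrees about the z axis and then shifted.
moved = [(-y + 10.0, x - 5.0, z + 2.0) for x, y, z in protein]

print(max_difference(distance_matrix(protein), distance_matrix(moved)))
```

The printed difference is zero up to floating-point rounding, so no superposition is needed before comparing two matrices.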
13 How to turn this into a similarity score?
- Find optimum set of similar sub-structures
- Even if in different 1D locations
- Find amino acid equivalence
- Once we have equivalence, can easily compare structure similarity
- E.g. with RMSD
14 Approach
- Break matrix into a bunch of overlapping sub-matrices
- Do an all-pairwise comparison
- Sub-matrices that naturally extend each other are merged
- Must find pairings of sub-matrices that yield the best overall score
[Figure: two 7 x 7 distance matrices, rows and columns numbered 1 to 7]
15 How to optimize the choice of pairings
[Figure: 7 x 7 distance matrix, rows and columns numbered 1 to 7]
- Monte Carlo approach
- Randomly generate pairings
- Calculate overall similarity
- Multiple solutions in parallel
- Slowly improve each by randomly altering pairings (like a random search)
- Have some probability of keeping a solution that is worse than the previous one
[Figure: second 7 x 7 distance matrix, rows and columns numbered 1 to 7]
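A toy sketch of the Monte Carlo search over pairings. It is not Dali's actual procedure. It assumes each sub-matrix of protein A is paired with exactly one sub-matrix of protein B, and that a precomputed table `sim[i][j]` (hypothetical values below) gives the similarity of each possible pair. The acceptance rule for worse solutions is the standard Metropolis criterion, which is an assumption here.

```python
import math
import random

def total_score(pairing, sim):
    """Overall similarity when sub-matrix i of A is paired with pairing[i] of B."""
    return sum(sim[i][j] for i, j in enumerate(pairing))

def monte_carlo_pairing(sim, steps=2000, temperature=1.0, n_walkers=4, seed=0):
    rng = random.Random(seed)
    n = len(sim)
    best, best_score = None, -math.inf
    for _ in range(n_walkers):                 # several solutions, improved independently
        current = list(range(n))
        rng.shuffle(current)                   # random starting pairing
        current_score = total_score(current, sim)
        if current_score > best_score:
            best, best_score = current[:], current_score
        for _ in range(steps):
            a, b = rng.sample(range(n), 2)     # randomly alter the pairing: swap two partners
            candidate = current[:]
            candidate[a], candidate[b] = candidate[b], candidate[a]
            candidate_score = total_score(candidate, sim)
            delta = candidate_score - current_score
            # Always keep improvements; keep a worse pairing with probability exp(delta / T).
            if delta >= 0 or rng.random() < math.exp(delta / temperature):
                current, current_score = candidate, candidate_score
                if current_score > best_score:
                    best, best_score = current[:], current_score
    return best, best_score

# Hypothetical similarity table: sub-matrix i of A resembles sub-matrix i of B most.
sim = [[5 if i == j else 1 for j in range(5)] for i in range(5)]
best, best_score = monte_carlo_pairing(sim)
print(best, best_score)
```

Occasionally accepting a worse pairing lets a walker escape a local optimum. The best pairing seen so far is remembered separately, so it is never lost.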
16 Once we have aa associations
- Can determine similarity
- How?
17 Have to minimize aa distances
- Must perturb XYZ (translation) and pitch, yaw, and roll (rotation) of one of the proteins, minimizing RMSD
- Like linear regression
- Can't do until we know which aas are associated
18 Have to minimize aa distances
- Some numeric methods start by fixing between 2 and 4 amino acids
- Some shortcuts
- Center of gravity is the average of all vectors
- Translate
- ave(p1) - ave(p2)
- Singular value decomposition to rotate (like eigenvectors)
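A sketch of the two shortcuts: translate using the centers of gravity, then rotate using singular value decomposition. This is the Kabsch algorithm written with NumPy, and both of those choices are assumptions, since the slides do not name a specific method or library. Row i of one array must already be matched to row i of the other. The coordinates are made up.

```python
import numpy as np

def superimpose(mobile, target):
    """Rigidly move `mobile` onto `target` (both N x 3, row i matched to row i).
    Returns (moved_coords, rmsd_after_fit)."""
    mobile = np.asarray(mobile, dtype=float)
    target = np.asarray(target, dtype=float)
    # Translation: move both centers of gravity to the origin.
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    # Rotation: SVD of the 3 x 3 covariance matrix between the two point sets.
    u, _, vt = np.linalg.svd(p.T @ q)
    # Flip one axis if needed so the result is a rotation, not a reflection.
    d = np.sign(np.linalg.det(u @ vt))
    rotation = u @ np.diag([1.0, 1.0, d]) @ vt
    moved = p @ rotation + target_center
    fit_rmsd = float(np.sqrt(((moved - target) ** 2).sum() / len(target)))
    return moved, fit_rmsd

# Hypothetical fragment, and the same fragment rotated 90 degrees about z and shifted.
mobile = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0)]
target = [(-y + 5.0, x - 3.0, z + 2.0) for x, y, z in mobile]
moved, fit_rmsd = superimpose(mobile, target)
print(fit_rmsd)
```

Because the target here is an exact rigid copy of the mobile fragment, the RMSD after fitting is zero up to rounding. For two different proteins it would be the minimum RMSD over all rigid placements.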
20 Score is more complex, so
- Requires double dynamic programming
- If n x m matrix, then n times m different matrices are generated, pinning the return path to each aa pair
- Used to generate a position-specific scoring, which is then used in aa similarity scoring
- Reduces the constraint that two particular aas are equivalent
[Figure: five small dynamic programming matrices, each with columns α α α T and rows α α β]
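A much simplified sketch of the double dynamic programming idea. For every pair (i, j), a lower-level alignment is run with aa i of protein A pinned to aa j of protein B, and its score is stored in an upper-level matrix. The upper-level matrix is then aligned with ordinary dynamic programming. The environment score (how similar the distances to the pinned pair are), the gap penalty, and the way the pinned results are combined are all assumptions made for illustration, not SAP's actual scheme.

```python
import math

def best_path(score, rows, cols, gap=0.5):
    """Best global alignment score of `rows` against `cols`, where
    score(r, c) is the reward for matching r with c."""
    prev = [-gap * j for j in range(len(cols) + 1)]
    for i, r in enumerate(rows, start=1):
        cur = [-gap * i]
        for j, c in enumerate(cols, start=1):
            cur.append(max(prev[j - 1] + score(r, c),   # match r with c
                           prev[j] - gap,               # leave r unmatched
                           cur[j - 1] - gap))           # leave c unmatched
        prev = cur
    return prev[-1]

def double_dp(coords_a, coords_b, gap=0.5):
    """Upper-level matrix: entry [i][j] is the best lower-level path score
    when aa i of A is pinned to aa j of B."""
    n, m = len(coords_a), len(coords_b)
    dist_a = [[math.dist(p, q) for q in coords_a] for p in coords_a]
    dist_b = [[math.dist(p, q) for q in coords_b] for p in coords_b]
    upper = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # Seen from the pinned pair, k and l look alike if they sit at
            # similar distances from i and from j respectively.
            def env(k, l):
                return 1.0 / (1.0 + abs(dist_a[i][k] - dist_b[j][l]))
            # The path must pass through (i, j): align what comes before
            # and what comes after the pinned pair separately.
            before = best_path(env, range(i), range(j), gap)
            after = best_path(env, range(i + 1, n), range(j + 1, m), gap)
            upper[i][j] = before + env(i, j) + after
    return upper

# Hypothetical five-residue fragment compared with itself.
fragment = [(0.0, 0.0, 0.0), (1.5, 0.0, 1.0), (1.5, 1.5, 2.0),
            (0.0, 1.5, 3.0), (0.0, 0.0, 4.5)]
upper = double_dp(fragment, fragment)
# Second level: ordinary dynamic programming over the upper-level scores.
final = best_path(lambda i, j: upper[i][j], range(5), range(5))
print(final)
```

When a fragment is compared with itself, each row of the upper-level matrix peaks on the diagonal, so the second-level alignment pairs every aa with itself. No single pair has to be declared equivalent up front, because every possible pin is tried.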