Title: Protein Structure Prediction
1. Protein Structure Prediction
- Samantha Chui
- Oct. 26, 2004
2. Central Dogma of Biology
- Question: Given a protein sequence, into what conformation will it fold?
3. How does nature do it?
- Hydrophobicity vs. hydrophilicity
- Van der Waals interaction
- Electrostatic interaction
- Hydrogen bonds
- Disulfide bonds
4. Current Approaches
- Experimental Methods
  - X-ray crystallography
  - NMR spectroscopy
- Computational Methods
  - Homology modeling
    - Similar sequences fold into similar structures
  - Threading
    - Dissimilar sequences may fold into similar structures
  - Ab initio
    - No similarity assumptions
    - Conformational search
5. Assembly of sub-structural units
[Diagram: known structures, predicted structure]
6. Small Libraries of Protein Fragments Model Native Protein Structures Accurately
Rachel Kolodny, Patrice Koehl, Leonidas Guibas, and Michael Levitt, 2002
- Goal: Find a finite set of protein fragments that can be used to construct accurate discrete conformations for any protein
  - 1. Generate fragments from known proteins
  - 2. Cluster the fragments to identify common structural motifs
  - 3. Test library accuracy on proteins not in the initial set
7. Datasets of protein fragments
- 200 unique protein domains from the Protein Data Bank (PDB)
  - 36,397 residues
- Four sets of backbone fragments
  - Fragments 4, 5, 6, and 7 residues long
- Divide each protein domain into consecutive fragments, beginning at a random initial position
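The following is a minimal sketch of the fragmentation step. It assumes residues are indexed 0..n-1 and that the random start offset is drawn from [0, f). It also assumes that residues which do not fill a complete fragment are dropped, because the slide does not say how the ends are handled. The function name `consecutive_fragments` is hypothetical.

```python
import random


def consecutive_fragments(n_residues: int, f: int, rng: random.Random) -> list[tuple[int, int]]:
    """Split residues 0..n_residues-1 into consecutive, non-overlapping
    fragments of length f, starting at a random offset in [0, f).

    Returns half-open (start, end) index ranges. Residues before the offset
    and a final incomplete fragment are dropped (an assumption).
    """
    start = rng.randrange(f)
    return [(s, s + f) for s in range(start, n_residues - f + 1, f)]


rng = random.Random(0)
for f in (4, 5, 6, 7):  # the four fragment lengths used for the datasets
    print(f, consecutive_fragments(30, f, rng)[:3])
```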
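8. Fragment structural similarity
- Coordinate root-mean-square (cRMS) deviation of Cα atoms
  - $\mathrm{cRMS}(A,B) = \sqrt{\sum_i d_i^2 / N}$, where $d_i$ is the distance between the ith atoms of A and B and N is the number of atoms
- One-to-one mapping between the atoms in structure A and structure B
- Translate and rotate to find the best alignment
- cRMS is 0 if the structures superimpose perfectly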
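The cRMS definition can be sketched in Python with NumPy. The slide does not name the superposition method. This sketch assumes the standard Kabsch (SVD-based) algorithm for the optimal rotation, and `crms` is a hypothetical helper name.

```python
import numpy as np


def crms(A: np.ndarray, B: np.ndarray) -> float:
    """cRMS deviation between two (N, 3) arrays of C-alpha coordinates.

    Atom i of A is mapped to atom i of B. A is translated and rotated onto B
    (Kabsch algorithm) before computing sqrt(sum_i d_i^2 / N).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    # Remove translation by centering both structures at the origin.
    A0 = A - A.mean(axis=0)
    B0 = B - B.mean(axis=0)
    # Optimal rotation from the SVD of the 3x3 covariance matrix.
    U, _, Vt = np.linalg.svd(A0.T @ B0)
    # Flip the last axis if needed so R is a proper rotation (no reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = A0 @ R.T - B0  # per-atom displacement after superposition
    return float(np.sqrt((diff ** 2).sum() / len(A)))


rng = np.random.default_rng(0)
A = rng.normal(size=(7, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
B = A @ Rz.T + np.array([1.0, -2.0, 3.0])  # same shape, moved rigidly
print(round(crms(A, B), 6))  # → 0.0
```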
9. Pruning and clustering
- Outliers have a large cRMS deviation from all other fragments
  - They are discarded according to a fragment-length-specific threshold
- k-means simulated annealing clustering
  - Repeatedly run k-means clustering, merge nearby clusters, and split dispersed clusters
  - Scoring function: total variance $\sum (x - \mu)^2$, where $\mu$ is the mean of the cluster containing x
  - Less sensitive to the initial choice of cluster centers than plain k-means
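The pruning rule and the scoring function can be sketched under two assumptions. First, `dist` is a precomputed matrix of pairwise cRMS deviations. Second, each fragment is also available as a numeric feature vector in `X`, so that a cluster mean µ exists. The threshold value is a placeholder, not the paper's fragment-length-specific value, and the function names are hypothetical.

```python
import numpy as np


def prune_outliers(dist: np.ndarray, threshold: float) -> np.ndarray:
    """Return indices of fragments to keep.

    dist is a symmetric (n, n) matrix of pairwise cRMS deviations. A fragment
    is an outlier when even its nearest neighbor is farther than threshold,
    i.e. it deviates strongly from all other fragments.
    """
    d = np.array(dist, dtype=float)  # copy so the caller's matrix is untouched
    np.fill_diagonal(d, np.inf)  # ignore each fragment's distance to itself
    return np.flatnonzero(d.min(axis=1) <= threshold)


def total_variance(X: np.ndarray, labels: np.ndarray) -> float:
    """Clustering score: sum over clusters of sum (x - mu)^2, mu = cluster mean."""
    score = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        score += float(((members - members.mean(axis=0)) ** 2).sum())
    return score


dist = np.array([[0.0, 0.3, 0.4, 2.5],
                 [0.3, 0.0, 0.2, 2.6],
                 [0.4, 0.2, 0.0, 2.4],
                 [2.5, 2.6, 2.4, 0.0]])
print(prune_outliers(dist, threshold=1.0))  # → [0 1 2]
X = np.array([[0.0], [1.0], [10.0], [12.0]])
print(total_variance(X, np.array([0, 0, 1, 1])))  # → 2.5
```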
10. Compiling the libraries
- Select the cluster centroids as library entries
  - Centroid: the cluster fragment with the minimum sum of cRMS deviations from all the other fragments in the cluster
  - Together, the centroids form a representative set of protein fragments
- Library contents are highly dependent on the clustering procedure
  - For each set of fragments, start with 50 random seeds and choose the library with the minimal total variance score
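The following sketch of library compilation makes several simplifications. Plain Lloyd k-means replaces the simulated-annealing k-means described above, fragments are treated as numeric vectors, and Euclidean distance stands in for cRMS when picking each centroid. Only the 50 random restarts and the selection by minimal total variance follow the slides directly. All names are hypothetical.

```python
import numpy as np


def total_variance(X, labels):
    """Sum over clusters of sum (x - mu)^2, with mu the cluster mean."""
    return sum(float(((X[labels == c] - X[labels == c].mean(0)) ** 2).sum())
               for c in np.unique(labels))


def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd k-means; a stand-in for simulated-annealing k-means."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        new = np.array([X[labels == c].mean(0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def compile_library(X, k, n_seeds=50):
    """Run k-means from n_seeds random seeds, keep the clustering with minimal
    total variance, and return the centroid (medoid) index of each cluster."""
    runs = [kmeans(X, k, np.random.default_rng(seed)) for seed in range(n_seeds)]
    labels = min(runs, key=lambda lab: total_variance(X, lab))
    # Euclidean distance stands in for pairwise cRMS deviation here.
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    library = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # Centroid: the member with the minimum summed distance to the others.
        library.append(int(members[D[np.ix_(members, members)].sum(1).argmin()]))
    return sorted(library)


X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
print(compile_library(X, k=2))  # → [0, 3]
```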
11. Evaluating the quality of a library
- Local-fit
  - How well the library fits the local conformation of all proteins in the test set
- Global-fit
  - How well the library fits the global three-dimensional conformation of all proteins in the test set
12. Local-fit method
- Protein structures are broken into the set of all overlapping fragments of length f
- For each protein fragment, find the most similar fragment in the library (by cRMS)
- Score: the average cRMS value over all fragments in all proteins in the test set
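The local-fit score can be sketched as follows. The sketch assumes that each protein is an (n, 3) NumPy array of Cα coordinates and that the library is a list of (f, 3) arrays. The Kabsch-based `crms` helper (a hypothetical name) is repeated so the block runs on its own, and the toy coordinates are random, not real proteins.

```python
import numpy as np


def crms(A, B):
    """cRMS after optimal superposition (Kabsch algorithm)."""
    A0, B0 = A - A.mean(0), B - B.mean(0)
    U, _, Vt = np.linalg.svd(A0.T @ B0)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return float(np.sqrt(((A0 @ R.T - B0) ** 2).sum() / len(A)))


def local_fit(proteins, library):
    """Average, over all overlapping length-f fragments of all test proteins,
    of the cRMS to the most similar library fragment."""
    f = len(library[0])
    best = [min(crms(lib, ca[s:s + f]) for lib in library)
            for ca in proteins
            for s in range(len(ca) - f + 1)]
    return float(np.mean(best))


rng = np.random.default_rng(0)
protein = rng.normal(scale=3.0, size=(12, 3))  # toy C-alpha trace
library = [rng.normal(scale=3.0, size=(5, 3)) for _ in range(20)]
print(local_fit([protein], library))
```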
13. Local-fit results
14. Global-fit method
- Concatenate the best local-fit library fragments just found
- Determine each fragment's orientation by superimposing its first three Cα atoms onto the last three Cα atoms of the preceding fragment
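The concatenation step can be sketched as follows. The sketch assumes that fragments and the growing chain are (n, 3) Cα coordinate arrays, and that the rigid transformation is found with the Kabsch algorithm (an assumption, since the slide only says the atoms are superimposed). Because the first three atoms of each new fragment overlap the chain, each attachment adds f - 3 residues. `superpose_transform` and `attach` are hypothetical names.

```python
import numpy as np


def superpose_transform(P, Q):
    """Rotation R and translation t that best map points P onto Q (Kabsch),
    so that P @ R.T + t approximates Q."""
    pc, qc = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - pc).T @ (Q - qc))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - pc @ R.T


def attach(chain, fragment):
    """Orient fragment so its first three C-alpha atoms lie on the last three
    C-alpha atoms of chain, then append its remaining atoms."""
    R, t = superpose_transform(fragment[:3], chain[-3:])
    placed = fragment @ R.T + t
    return np.vstack([chain, placed[3:]])


rng = np.random.default_rng(0)
trace = rng.normal(scale=3.0, size=(7, 3))
# Overlapping pieces of the same trace, the second one moved rigidly elsewhere.
theta = 1.2
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
piece = trace[2:7] @ Rz.T + 10.0
print(np.allclose(attach(trace[:5], piece), trace))  # → True
```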
15. Global-fit method
- The number of possible sequences of fragments is exponential in the protein's length
- A greedy algorithm finds a good, rather than the best, global-fit approximation
  - Start at the N terminus and approximate increasingly larger segments of the protein
  - At each step, concatenate the library fragment that yields the structure with minimal cRMS deviation from the corresponding segment
  - Deterministic, linear time
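The greedy global fit can be sketched as follows. The sketch repeats hypothetical Kabsch-based helpers so the block is self-contained. It assumes a fragment length f > 3 and a target at least f residues long. It also trims the last fragment if it overshoots the target length, because the slide does not say how the C-terminal end is handled.

```python
import numpy as np


def superpose_transform(P, Q):
    """Kabsch: rotation R and translation t with P @ R.T + t ~ Q."""
    pc, qc = P.mean(0), Q.mean(0)
    U, _, Vt = np.linalg.svd((P - pc).T @ (Q - qc))
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return R, qc - pc @ R.T


def crms(A, B):
    R, t = superpose_transform(A, B)
    return float(np.sqrt(((A @ R.T + t - B) ** 2).sum() / len(A)))


def attach(chain, fragment):
    # Superimpose the fragment's first three atoms on the chain's last three.
    R, t = superpose_transform(fragment[:3], chain[-3:])
    return np.vstack([chain, (fragment @ R.T + t)[3:]])


def greedy_global_fit(target, library):
    """Grow a model from the N terminus; at each step attach the library
    fragment giving minimal cRMS to the matching prefix of the target.
    Assumes fragment length f > 3 and len(target) >= f."""
    n, f = len(target), len(library[0])
    first = min(range(len(library)), key=lambda i: crms(library[i], target[:f]))
    chosen, model = [first], library[first].copy()
    while len(model) < n:
        best = None
        for i, frag in enumerate(library):
            ext = attach(model, frag)[:n]  # trim an overshooting last fragment
            score = crms(ext, target[:len(ext)])
            if best is None or score < best[0]:
                best = (score, i, ext)
        _, i, model = best
        chosen.append(i)
    return chosen, crms(model, target)


rng = np.random.default_rng(0)
target = rng.normal(scale=3.0, size=(9, 3))
library = [target[0:5], target[2:7], target[4:9]]  # toy library
chosen, score = greedy_global_fit(target, library)
print(chosen, round(score, 6))
```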
16. Global-fit results
[Figure: results labeled 0.91 Å, 1.85 Å, and 2.78 Å]
- 50 fragments, 7 residues: 2.66 states/residue
- 100 fragments, 5 residues: 10 states/residue
- 20 fragments, 5 residues: 4.47 states/residue
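The states/residue figures are consistent with the expression (number of fragments)^(1/(f - 3)). Each new fragment overlaps the previous one by three residues, so a library of L fragments offers L choices per f - 3 new residues. This formula is inferred from the numbers rather than stated on the slide. The snippet below checks it, and `states_per_residue` is a hypothetical name.

```python
def states_per_residue(n_fragments: int, f: int) -> float:
    # Each concatenated fragment overlaps the previous one by three residues
    # (the superimposed C-alpha atoms), so it contributes f - 3 new residues.
    return n_fragments ** (1.0 / (f - 3))


for n_frag, f in [(50, 7), (100, 5), (20, 5)]:
    print(n_frag, f, round(states_per_residue(n_frag, f), 2))
# 50 7 2.66
# 100 5 10.0
# 20 5 4.47
```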
17. Assembly of sub-structural units
[Diagram: known structures, predicted structure]
18. Protein structure prediction via combinatorial assembly of sub-structural units
Yuval Inbar, Hadar Benyamini, Ruth Nussinov, and Haim J. Wolfson, 2003
19. CombDock
- Input: structural units (SUs) with known 3D conformations
  - SUs are considered rigid bodies
  - They are rotated and translated with respect to each other
- Goal: predict the overall structure
- Constraints
  - Penetration: avoid steric clashes
  - Backbone: a restriction on the maximum distance between consecutive SUs
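The two constraints can be sketched as follows. The sketch assumes that each placed SU is an (n, 3) array of atom coordinates in sequence order, and that the backbone distance is measured from the last atom of one SU to the first atom of the next. The thresholds are hypothetical placeholders, and so are the function names.

```python
import numpy as np


def penetrates(su_a, su_b, min_dist):
    """True if any atom of su_a is closer than min_dist (Å) to any atom of su_b."""
    d = np.linalg.norm(su_a[:, None, :] - su_b[None, :, :], axis=-1)
    return bool(d.min() < min_dist)


def backbone_ok(sus, max_dist):
    """True if, for every pair of consecutive SUs, the last atom of one is
    within max_dist (Å) of the first atom of the next."""
    return all(np.linalg.norm(sus[i][-1] - sus[i + 1][0]) <= max_dist
               for i in range(len(sus) - 1))


a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = np.array([[5.0, 0.0, 0.0], [6.0, 0.0, 0.0]])
print(penetrates(a, b, min_dist=3.0), backbone_ok([a, b], max_dist=4.5))  # → False True
```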
20. All pairs docking
- N(N-1)/2 pairs of SUs
- Calculate candidate transformations by matching complementary local features on the surfaces of the SUs
  - Apply each transformation to the second SU of the pair
- Keep the K best transformations for each pair
  - Clustering ensures that all K transformations yield significantly different complexes
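The following sketch covers pair enumeration and the "keep the K best, mutually distinct" step. Candidate generation by surface-feature matching is not shown. Each candidate is assumed to be a (score, coordinates) pair for the second SU after its transformation, and higher scores are assumed to be better. A greedy RMSD-threshold filter stands in for the clustering step. The threshold and function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def all_pairs(n_sus):
    """The N(N-1)/2 unordered pairs of SU indices to dock."""
    return list(combinations(range(n_sus), 2))


def keep_k_distinct(candidates, k, min_rmsd):
    """Keep up to k best-scoring candidates that are mutually distinct.

    candidates: (score, coords) pairs, coords being the second SU after its
    candidate transformation (first SU fixed); higher score = better. A
    candidate is skipped if its coordinates lie within min_rmsd (Å) of an
    already kept candidate, a simple stand-in for the clustering step.
    """
    kept = []
    for score, coords in sorted(candidates, key=lambda c: -c[0]):
        if all(np.sqrt(((coords - other) ** 2).sum(1).mean()) >= min_rmsd
               for _, other in kept):
            kept.append((score, coords))
            if len(kept) == k:
                break
    return kept


base = np.zeros((4, 3))
cands = [(10.0, base), (9.0, base + 0.1), (8.0, base + 5.0)]
print(len(all_pairs(5)), [s for s, _ in keep_k_distinct(cands, k=2, min_rmsd=1.0)])  # → 10 [10.0, 8.0]
```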
21. Combinatorial assembly
- Multigraph representation
  - Vertices: SUs
  - Edges: transformations between two SUs
  - K parallel edges between any two vertices
- Final protein conformation: a spanning tree
  - N SUs, one connected component, no cycles
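Checking that a chosen set of edges forms a valid spanning tree can be sketched with a union-find structure. Edges are assumed to be `(u, v, k)` triples, where k picks one of the K parallel transformations between SUs u and v. The function name is hypothetical.

```python
def is_spanning_tree(n, edges):
    """edges: list of (u, v, k) triples over SUs 0..n-1. With exactly n - 1
    edges and no cycle, the graph is necessarily connected, so it is a
    spanning tree."""
    if len(edges) != n - 1:
        return False
    parent = list(range(n))

    def find(x):
        # Follow parent links to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v, _ in edges:
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # would close a cycle (also rejects parallel edges)
        parent[ru] = rv
    return True


print(is_spanning_tree(3, [(0, 1, 0), (1, 2, 1)]))  # → True
print(is_spanning_tree(3, [(0, 1, 0), (0, 1, 1)]))  # → False
```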
22. Combinatorial Assembly
- $N^{N-2} K^{N-1}$ different spanning trees
- Not all spanning trees are valid complexes
- Use a heuristic algorithm
  - Two subtrees are adjacent iff there exists an index i such that vertex i is in one subtree and vertex i+1 is in the other
  - Sequential tree, defined recursively as either:
    - A single vertex
    - A tree with an edge that connects two adjacent sequential trees
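The spanning-tree count follows from Cayley's formula (N^(N-2) labeled trees on N vertices), multiplied by K choices for each of the N - 1 tree edges. The following small sketch shows the count and the adjacency test, with subtrees given as sets of SU indices. The function names are hypothetical.

```python
def num_spanning_trees(n, k):
    # Cayley's formula gives n^(n-2) labeled trees on n vertices; each of the
    # n - 1 tree edges can be any of the k parallel transformations.
    return n ** (n - 2) * k ** (n - 1)


def adjacent(a, b):
    """Subtrees (sets of SU indices) are adjacent iff some index i is in one
    subtree and i + 1 is in the other."""
    return any(i + 1 in b for i in a) or any(i + 1 in a for i in b)


print(num_spanning_trees(10, 5))  # → 195312500000000
print(adjacent({0, 1}, {2}), adjacent({0}, {2}))  # → True False
```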
23. Combinatorial Assembly
- Hierarchical algorithm of N stages
  - The ith stage generates sequential trees with i vertices
  - Trees are constructed by connecting adjacent sequential trees of smaller sizes generated earlier
  - Keep the D best sequential trees at each step
- Discard trees that do not meet the backbone and penetration constraints
- Score: the sum of the scores of the transformations
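The hierarchical algorithm can be sketched under two assumptions: higher transformation scores are better, and "keep the D best at each step" means D trees per stage. Two sequential trees can only be joined when they are adjacent, so every sequential tree covers a contiguous range of SU indices. The sketch therefore stores a tree as its score, its range, and its edge set. `is_valid` is a hypothetical hook for the backbone and penetration checks, and the other names are also hypothetical.

```python
def assemble(n, edge_scores, d, is_valid=lambda edges: True):
    """edge_scores[(u, v)] (u < v) lists the K transformation scores for SU
    pair (u, v). A sequential tree is (score, lo, hi, edges), covering SUs
    lo..hi; edges is a frozenset of (u, v, k) triples."""
    stages = {1: [(0.0, i, i, frozenset()) for i in range(n)]}
    for size in range(2, n + 1):
        seen, trees = set(), []
        for left_size in range(1, size):
            for s1, lo, mid, e1 in stages[left_size]:
                for s2, lo2, hi, e2 in stages[size - left_size]:
                    if lo2 != mid + 1:  # the two trees are not adjacent
                        continue
                    # Join them with one transformation between the subtrees.
                    for u in range(lo, mid + 1):
                        for v in range(lo2, hi + 1):
                            for k, w in enumerate(edge_scores.get((u, v), [])):
                                edges = e1 | e2 | {(u, v, k)}
                                if edges in seen or not is_valid(edges):
                                    continue  # duplicate tree or constraint violation
                                seen.add(edges)
                                trees.append((s1 + s2 + w, lo, hi, edges))
        trees.sort(key=lambda t: -t[0])
        stages[size] = trees[:d]  # keep the D best sequential trees
    return stages[n]


scores = {(0, 1): [5.0], (1, 2): [3.0], (0, 2): [1.0]}
best = assemble(3, scores, d=10)[0]
print(best[0], sorted(best[3]))  # → 8.0 [(0, 1, 0), (1, 2, 0)]
```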
24. Combinatorial Assembly
25. CombDock Results
26. Conclusion
[Diagram: protein sequence, fragment library, known structures, predicted structure]
- Experimental Methods
  - X-ray crystallography
  - NMR spectroscopy
- Computational Methods
  - Homology modeling
    - Similar sequences fold into similar structures
  - Threading
    - Dissimilar sequences may fold into similar structures
  - Ab initio
    - No similarity assumptions
    - Conformational search
27. References
- Kolodny et al., "Small libraries of protein fragments model native protein structures accurately," 2002
- Inbar et al., "Protein structure prediction via combinatorial assembly of sub-structural units," 2003