Title: A computational study of protein folding pathways
1 A computational study of protein folding pathways
- Reducing the computational complexity of the folding process using the building block folding model.
- Nurit Haspel, Chung-Jung Tsai, Haim Wolfson and Ruth Nussinov
2 The building blocks model (Chung-Jung Tsai)
- Protein folding is a hierarchical process.
- A protein is constructed from hydrophobic folding units (HFUs).
- HFU - the result of a combinatorial assembly of building blocks.
- Building block - a contiguous, highly populated fragment.
- The building block model makes it possible to illustrate the protein folding pathway.
3 An outline of the building blocks algorithm
- Scoring function - measures the relative stability of a candidate building block
- Three ingredients:
- Compactness
- Degree of isolation
- Hydrophobicity
- The result - an anatomy tree that illustrates the most probable folding route.
4 The Scoring Function
Z - compactness, H - hydrophobicity, I - isolation
5 Compactness, Hydrophobicity and Isolation definitions
- Compactness -
- Hydrophobicity -
- Isolation -
6 The Cutting Procedure
- Locate a basket of candidate building blocks (relatively stable contiguous fragments).
- Assign a stability score to all the candidate fragments.
- Collect the local minima in the fragment map (best score in a given radius).
- Recursively split the protein top-down.
- Search the basket for a set of fragments that constitute the whole fragment, allowing a short overlap (7 residues) and a gap of up to 15 residues.
- Minimum building block size - 15 residues.
- No node can have only one child (except for the root).
- Stop when the node cannot be split any further.
- In this work, building blocks are generated up to level 6.
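The tiling step of the cutting procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: fragments are inclusive residue ranges, the names `can_follow` and `find_split` are hypothetical, the 15-residue gap allowance is also applied at the two ends of the parent fragment (an assumption), and the search returns the first valid tiling instead of ranking tilings by stability score.

```python
from typing import List, Optional, Tuple

Fragment = Tuple[int, int]  # inclusive residue range (start, end)

MAX_OVERLAP = 7   # residues two neighbouring fragments may share
MAX_GAP = 15      # residues that may be left uncovered between neighbours
MIN_SIZE = 15     # minimum building block size


def can_follow(prev: Fragment, nxt: Fragment) -> bool:
    """True if nxt may directly follow prev along the chain."""
    if nxt[0] <= prev[0] or nxt[1] <= prev[1]:
        return False
    offset = nxt[0] - prev[1] - 1  # > 0: gap length, < 0: overlap length
    return -MAX_OVERLAP <= offset <= MAX_GAP


def find_split(parent: Fragment, basket: List[Fragment]) -> Optional[List[Fragment]]:
    """Return a chain of at least two basket fragments that tiles parent, or None."""
    inside = sorted(
        f for f in basket
        if f != parent
        and parent[0] <= f[0] and f[1] <= parent[1]
        and f[1] - f[0] + 1 >= MIN_SIZE
    )

    def extend(chain: List[Fragment]) -> Optional[List[Fragment]]:
        last = chain[-1]
        # A node may not have a single child, so require two or more fragments.
        if len(chain) >= 2 and parent[1] - last[1] <= MAX_GAP:
            return chain
        for f in inside:
            if can_follow(last, f):
                found = extend(chain + [f])
                if found:
                    return found
        return None

    for f in inside:
        if f[0] - parent[0] <= MAX_GAP:  # assumption: end gaps use the same limit
            found = extend([f])
            if found:
                return found
    return None  # the node cannot be split any further
```

Calling `find_split` recursively on each returned fragment, and stopping when it returns `None`, would yield one possible anatomy tree.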
7 Example - Annexin III
8 Example (cont.)
9 Example (cont.)
10 Usefulness of the anatomy tree
- It is possible to see whether a protein folds through a single route or multiple routes.
- These routes can be observed by inspecting the fragment map (there can be more than one way to construct a tree).
- Sequential versus non-sequential folding.
- Sequential - contact is made only between consecutive building blocks.
- Binary anatomy tree - sequential folder.
- Fast versus slow folding.
- Sequential folding proteins usually fold faster.
- Climbing up the tree allows us to illustrate the folding process.
11 Critical building blocks (Sandeep Kumar)
- Some building blocks may be considered critical for correct folding.
- A critical building block is in contact with other building blocks in the protein.
- It is likely to be inserted between sequentially connected building blocks.
- Without it, the other building blocks are likely to mis-associate.
- The structure and sequence of a critical building block (BB) are more likely to be conserved.
12 Critical building block algorithm
- For each building block:
- Compute its diff. contacting surface area.
- Compute its critical building block index (CIndex).
- Compute its Z-score.
13 Critical building blocks (cont.)
A building block is critical if
- It is found at most levels below the hydrophobic folding unit level
- It has a consistently high CIndex at different levels
- Its CIndex is significant by at least 2 standard deviations in at least one level of protein anatomy
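The significance criterion can be illustrated with a short sketch. It assumes the CIndex values for one anatomy level have already been computed (their definition is not given here) and uses the population standard deviation, which is an assumption. The function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List


def z_scores(cindex: Dict[str, float]) -> Dict[str, float]:
    """Z-score of each building block's CIndex within one anatomy level."""
    values = list(cindex.values())
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return {name: 0.0 for name in cindex}
    return {name: (value - mu) / sigma for name, value in cindex.items()}


def significant_blocks(cindex: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Building blocks whose CIndex lies at least `threshold` deviations above the mean."""
    return [name for name, z in z_scores(cindex).items() if z >= threshold]
```

This covers only the third criterion at a single level. The other two (presence at most levels, consistently high CIndex) would need the per-level results to be compared across the anatomy tree.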
14 The goals of my research
- Clustering the building blocks according to their 3-D structures, using a rigid matching algorithm.
- Analyzing the building blocks: sequence, stability distribution, size.
- Analyzing the clusters: size, stability score distribution, sequence conservation, criticalness conservation.
15 The goals of my research (cont.)
- Analyzing the critical building blocks: position within the protein, relative stability, sequence and structure conservation.
- Developing an algorithm that assigns a set of building blocks to a protein sequence, using sequence similarity, relative stability and other information.
16 Clustering the building blocks
- Each cluster has representative members (one or more).
- For each building block structure:
- Go over the clusters.
- Match with cluster representative(s).
- If it matches (1.5 Å RMSD, 70% size) - join the building block to the cluster.
- If no match is found - open a new cluster with this building block as a representative.
Problem - O(n²) comparisons, n - number of clusters
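The clustering loop above can be sketched as a greedy, representative-based procedure. In this minimal sketch the structural comparison is left as a caller-supplied predicate `matches` (a hypothetical stand-in for the rigid matching step and its RMSD and size thresholds), and each cluster keeps a single representative, its first member, which is a simplifying assumption.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def greedy_cluster(blocks: Iterable[T], matches: Callable[[T, T], bool]) -> List[List[T]]:
    """Assign each block to the first cluster whose representative it matches."""
    clusters: List[List[T]] = []
    for block in blocks:
        for cluster in clusters:
            if matches(block, cluster[0]):  # cluster[0] acts as the representative
                cluster.append(block)
                break
        else:
            clusters.append([block])  # no match found: open a new cluster
    return clusters
```

For example, with numbers standing in for structures and `lambda a, b: abs(a - b) <= 1.0` as the predicate, `greedy_cluster([0.0, 0.5, 5.0, 5.2, 0.9], ...)` groups the values around 0 and around 5. The result depends on input order, which is typical of this kind of greedy scheme.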
17 Clustering of the building blocks
[Figure: Cluster 1, Cluster 2, ..., Cluster n]
18 Making clustering more efficient
- Dividing the building blocks into SCOP families (proteins from the same family usually produce the same building blocks).
- Clustering each family and then merging all the clusters - reduces the number of clusters at each instance.
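A rough sketch of the family-first strategy is shown below. It assumes each building block comes tagged with a family label, that `matches` is a caller-supplied structural comparison (hypothetical), and that clusters are merged when their first members match. The merge rule in particular is an assumption, since the slides do not specify it.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def greedy_cluster(blocks: Iterable[T], matches: Callable[[T, T], bool]) -> List[List[T]]:
    """Greedy clustering against the first member of each cluster."""
    clusters: List[List[T]] = []
    for block in blocks:
        for cluster in clusters:
            if matches(block, cluster[0]):
                cluster.append(block)
                break
        else:
            clusters.append([block])
    return clusters


def cluster_by_family(
    tagged: Iterable[Tuple[Hashable, T]], matches: Callable[[T, T], bool]
) -> List[List[T]]:
    """Cluster within each family first, then merge clusters across families."""
    by_family: Dict[Hashable, List[T]] = defaultdict(list)
    for family, block in tagged:
        by_family[family].append(block)

    merged: List[List[T]] = []
    for blocks in by_family.values():
        for cluster in greedy_cluster(blocks, matches):
            for target in merged:
                if matches(cluster[0], target[0]):  # compare representatives only
                    target.extend(cluster)
                    break
            else:
                merged.append(cluster)
    return merged
```

The saving comes from the merge step comparing one representative per family-level cluster instead of every individual building block against every cluster.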
19 Building block and cluster data
20 Distribution of number of clusters
21 An example of a cluster
22 Sequence analysis of the clusters
- Sequence clustering of each structural cluster (using BLAST).
- Creating a non-redundant sequence dataset.
- Goal - finding a connection between (short) sequences and structures.
23 Statistical analysis of the clusters and of the critical building blocks
- Stability score distribution among cluster members.
- Criticalness score distribution among cluster members.
- Position distribution of the critical building blocks.
- Stability score as a function of criticalness score.
24 An example of stability distribution
25 Criticalness score distribution within a cluster
26 An N-terminus critical building block example
27 A C-terminus critical building block example
28 A mid-sequence critical building block example
29 Distribution of the position inside the protein - all-alpha, level 3
30 Stability vs. Criticalness score example
31 Stability score of critical and non-critical building blocks (histogram)
[Figure legend: Non-critical, Critical]
32 Final goal
- Given a sequence and using the information accumulated so far - is there a way of matching a set of building blocks to it?
33 The building block assignment algorithm
- Perform sequence alignment of the protein sequence against the building block sequence database.
- Construct a directed, acyclic graph.
- Each matching building block is a graph vertex and is assigned a score depending on the sequence alignment score, building block stability and other parameters.
- Directed edges connect the fragments that match consecutive areas in the protein sequence, allowing short overlaps and small gaps.
- Edge score - average score of connected vertices.
34 The building block assignment algorithm (cont.)
- Add fictitious start and target vertices.
- Connect start to all starting vertices.
- Connect all ending vertices to target.
- Find the shortest path from start to target using the single-source shortest path algorithm.
- The path is an optimal building block assignment covering the protein sequence.
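The graph construction and path search can be sketched with dynamic programming over the matches sorted by start position, which is a valid topological order for this graph. Several details are assumptions made for illustration only: vertex scores are treated as costs (lower is better), the overlap and gap limits are placeholder values, the fictitious start and target edges are charged the cost of the real vertex they touch, and the name `assign_blocks` is hypothetical.

```python
from math import inf
from typing import List, Tuple

Match = Tuple[int, int, float]  # (start, end, cost); lower cost = better (assumption)


def assign_blocks(
    matches: List[Match], seq_len: int, max_overlap: int = 3, max_gap: int = 5
) -> List[Match]:
    """Cheapest chain of matches covering residues 1..seq_len, or [] if none exists."""
    nodes = sorted(matches)  # edges always point to a later start, so this is topological
    n = len(nodes)
    best = [inf] * n   # cheapest known cost of reaching each vertex from `start`
    prev = [-1] * n    # predecessor on that cheapest path

    # Fictitious start vertex: connect to every match that begins near residue 1.
    for i, (s, _, c) in enumerate(nodes):
        if s - 1 <= max_gap:
            best[i] = c

    for i, (s_i, e_i, c_i) in enumerate(nodes):
        if best[i] == inf:
            continue
        for j in range(i + 1, n):
            s_j, e_j, c_j = nodes[j]
            offset = s_j - e_i - 1  # > 0: gap, < 0: overlap
            if s_j > s_i and e_j > e_i and -max_overlap <= offset <= max_gap:
                cand = best[i] + (c_i + c_j) / 2  # edge cost: average of its vertices
                if cand < best[j]:
                    best[j], prev[j] = cand, i

    # Fictitious target vertex: connect from every match that ends near seq_len.
    end, total = -1, inf
    for i, (_, e, c) in enumerate(nodes):
        if seq_len - e <= max_gap and best[i] + c < total:
            end, total = i, best[i] + c

    path: List[Match] = []
    while end != -1:
        path.append(nodes[end])
        end = prev[end]
    return path[::-1]
```

Because the graph is acyclic, a single pass in topological order is enough, and no priority queue is needed.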
35 Illustration of the algorithm
36 Example - ROP protein from E. coli (1rpo)
37 Example - Myoglobin from sea hare (1mba)
38 Suggestions for future work
- Improving the algorithm and adding new parameters to it (secondary structure alignment, trying other building blocks from the same cluster as the matching building blocks, etc.).
- Combinatorial assembly - Yuval's work.
- Further cluster analysis - inquiring into sequence conservation.
- Conformation stability measurements (molecular dynamics).
39 Conclusions
- Using the hierarchical folding model, it may be possible to reduce the folding complexity by assigning local substructures and then assembling them.