Title: Protein Function prediction using network concepts
1- Lecture 4
- Protein function prediction using network concepts
- Application of network concepts in DNA sequencing
2The topology of a protein-protein interaction
(PPI) network is informative in itself, but
further analysis can reveal additional
information. A popular assumption, which holds
in many cases, is that proteins with similar
functions interact with each other. Based on this
assumption, we have developed methods to predict
protein functions and protein complexes from
PPI networks, mainly based on cluster analysis.
3Cluster Analysis
Cluster Analysis, also called data segmentation,
implies grouping or segmenting a collection of
objects into subsets or "clusters", such that
those within each cluster are more closely
related to one another than objects assigned to
different clusters.
In the context of a graph, densely connected
groups of nodes are considered clusters.
Visually, we can detect two clusters in this graph.
4K-cores of Protein-Protein Interaction Networks
Definition: Let a graph G(V, E) consist of a
finite set of nodes V and a finite set of edges
E. A subgraph S(V′, E′), where V′ ⊆ V and E′ ⊆ E,
is a k-core (or a core of order k) of G if and
only if ∀ v ∈ V′: deg(v) ≥ k within S, and S is
the maximal subgraph with this property.
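The definition above can be turned into a simple pruning procedure: repeatedly delete nodes whose degree within the surviving subgraph is below k. The following is a minimal sketch, assuming the graph is given as a list of undirected edge pairs; the function name `k_core` and the toy graph are hypothetical and not taken from the lecture.

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    nodes = set(adj)
    changed = True
    while changed:                      # prune until every survivor has degree >= k
        changed = False
        for n in list(nodes):
            if len(adj[n] & nodes) < k:  # degree counted within the surviving subgraph
                nodes.remove(n)
                changed = True
    return nodes

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
print(sorted(k_core(toy_edges, 1)))   # all four nodes
print(sorted(k_core(toy_edges, 2)))   # the triangle only
print(sorted(k_core(toy_edges, 3)))   # empty: no 3-core exists
```

Because the surviving set is the union of all subgraphs with minimum degree k, the result is maximal, as the definition requires.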
5Graph G
1-core graph: the degree of every node is one or
more
6 1-core graph: the degree of every node is one or
more
7 2-core graph: the degree of every node is two or
more
8 1-core graph: the degree of every node is one or
more
9 3-core graph: the degree of every node is three
or more. The 3-core is the highest k-core subgraph
of the graph G.
10Analyzing protein-protein interaction data
obtained from different sources, G. D. Bader and
C.W.V. Hogue, Nature biotechnology, Vol 20, 2002
12Prediction of Protein Functions Based on K-cores
of Protein-Protein Interaction Networks
Prediction of Protein Functions Based on
K-cores of Protein-Protein Interaction Networks
and Amino Acid Sequences, Md. Altaf-Ul-Amin,
Kensaku Nishikata, Toshihiro Koma, Teppei
Miyasato, Yoko Shinbo, Md. Arifuzzaman, Chieko
Wada, Maki Maeda, Taku Oshima, Hirotada Mori,
Shigehiko Kanaya, The 14th International
Conference on Genome Informatics, December 14-17,
2003, Yokohama, Japan.
13The total network has 3007 proteins and 11531
interactions. Around 2000 of the proteins have
unknown function. The highest k-core of this
total graph is not very helpful.
14 10-core graph
15We separate 1072 interactions (out of 11531)
involving protein synthesis and function unknown
proteins.
[Figure: interaction pairs labeled P. S. (protein synthesis) and U. F. (function unknown)]
16Function-unknown proteins of this 6-core graph
are likely to be involved in protein synthesis.
17 193 interactions out of 11531 interactions
involving electron transport and function-unknown
proteins.
18Function-unknown proteins of this 2-core graph
are likely to be involved in electron
transport. Further sub-classification may be
possible by combining other information with the
k-core subgraph.
The highest k-core (the 2-core) subgraph of the
graph of the previous page
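The procedure of the last few slides can be sketched as follows: keep only the interactions among proteins of one functional class and function-unknown proteins, take the highest k-core of that subnetwork, and report the function-unknown proteins it contains. This is a minimal sketch under assumptions: the reading that both endpoints must belong to the class or be unknown, the annotation dictionary and all protein names are hypothetical.

```python
def highest_k_core(edges):
    """Return (k, nodes) for the highest non-empty k-core of an undirected edge list."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    nodes, k, best = set(adj), 0, set(adj)
    while nodes:
        best = set(nodes)               # nodes is currently the k-core
        k += 1
        changed = True
        while changed:                  # prune down to the core of the new order k
            changed = False
            for n in list(nodes):
                if len(adj[n] & nodes) < k:
                    nodes.remove(n)
                    changed = True
    return max(k - 1, 0), best

# Hypothetical annotation: None marks a function-unknown protein.
function_of = {"P1": "protein synthesis", "P2": "protein synthesis",
               "P3": "protein synthesis", "E1": "electron transport",
               "X1": None, "X2": None}
interactions = [("P1", "P2"), ("P2", "P3"), ("P1", "P3"), ("X1", "P1"),
                ("X1", "P2"), ("X2", "P3"), ("E1", "X2"), ("E1", "P1")]

wanted = {"protein synthesis", None}
sub_edges = [(u, v) for u, v in interactions
             if function_of[u] in wanted and function_of[v] in wanted]
k, core_nodes = highest_k_core(sub_edges)
candidates = sorted(n for n in core_nodes if function_of[n] is None)
print(k, candidates)
```

In this toy example the highest core is the 2-core, and X1 is the only function-unknown protein that survives, so X1 would be the candidate for protein synthesis.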
19 Prediction of Protein Functions Based on
Protein-Protein Interaction Networks: A Min-Cut
Approach, Md. Altaf-Ul-Amin, Toshihiro Koma, Ken
Kurokawa, Shigehiko Kanaya, Proceedings of the
Workshop on Biomedical Data Engineering (BMDE),
Tokyo, Japan, pp. 37-43, April 3-4, 2005.
20- Outline
- Introduction
- The concept of Min-Cut
- Problem Formulation
- A Heuristic Method
- Evaluation of the Proposed Method
- Conclusions
22Introduction
After the complete sequencing of several genomes,
the challenging problem now is to determine the
functions of proteins.
- Determining protein functions experimentally
- Using various computational methods based on
a) sequence b) structure c) gene neighborhood
d) gene fusions e) cellular localization
f) protein-protein interactions
23Introduction
The present work predicts protein functions based
on the protein-protein interaction network.
- For the purpose of prediction, we consider the
interactions of
  - function-unknown proteins with function-known proteins and
  - function-unknown proteins with function-unknown proteins
in the context of the whole network.
24Introduction
Schwikowski, B., Uetz, P. and Fields, S. A
network of protein-protein interactions in yeast.
Nature Biotech. 18, 1257-1261 (2000). Deals with
a network of 2039 proteins and 2709 interactions.
65% of interactions occurred between protein
pairs with at least one common function.
Hishigaki, H., Nakai, K., Ono, T., Tanigami, A.,
and Takagi, T. Assessment of prediction accuracy
of protein function from protein-protein
interaction data. Yeast 18, 523-531 (2001).
Reported similar results.
25Introduction
So, the majority of protein-protein interactions
are between protein pairs with similar functions.
Therefore, we assign function-unknown proteins to
different functional groups in such a way that
the number of inter-group interactions is
minimized.
Hence we call the proposed approach a Min-Cut
approach.
27The concept of Min-Cut
[Figure: a small, typical network of known proteins (K1-K6, K8) and unknown proteins (U1-U4), with groups G1 and G2]
28The concept of Min-Cut
[Figure: the same network; the unknown proteins U1-U4 are assigned to the known groups G1 and G2 based on majority interactions]
29The concept of Min-Cut
[Figure: the same assignment of U1-U4 to G1 and G2]
Number of CUT = 4
30The concept of Min-Cut
[Figure: an alternative assignment of the unknown proteins U1-U4 to G1 and G2]
31The concept of Min-Cut
[Figure: the alternative assignment of U1-U4 to G1 and G2]
Number of CUT = 2
For every assignment of unknown proteins, there
is a value of CUT. The Min-Cut approach looks for an
assignment for which the number of CUT is minimum.
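The value of CUT for a given assignment is simply the number of interactions whose two endpoints fall in different groups, not counting interactions between two known proteins. A minimal sketch follows; the edge list is hypothetical (the edges of the figure are not recoverable from the transcript), and the function name `cut_size` is an assumption.

```python
def cut_size(edges, group_of, unknown):
    """Count inter-group edges that touch at least one unknown protein."""
    return sum(
        1
        for u, v in edges
        if (u in unknown or v in unknown) and group_of[u] != group_of[v]
    )

# Hypothetical toy network: K1, K2 belong to G1; K3, K4 belong to G2.
known_groups = {"K1": "G1", "K2": "G1", "K3": "G2", "K4": "G2"}
unknown = {"U1", "U2"}
edges = [("U1", "K1"), ("U1", "K2"), ("U1", "K3"), ("U2", "K3"),
         ("U2", "K4"), ("U1", "U2"), ("K2", "K3")]  # K2-K3 is known-known

assignment_a = {**known_groups, "U1": "G1", "U2": "G2"}
assignment_b = {**known_groups, "U1": "G1", "U2": "G1"}
cut_a = cut_size(edges, assignment_a, unknown)
cut_b = cut_size(edges, assignment_b, unknown)
print(cut_a, cut_b)
```

In this toy case the first assignment gives the smaller CUT, so it would be preferred; the known-known edge K2-K3 is ignored under both assignments.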
33Problem Formulation
Here we explain some points with a typical
example.
34Problem Formulation
V = set of all nodes, E = set of all edges
G = {K1, K2, K3, K4, K5, K6, K7, K8, K9, K10}
U = {U1, U2, U3, U4, U5, U6, U7, U8}
35Problem Formulation
We generate U′ ⊆ U such that each protein of U′ is
connected in N with at least one protein of group
G by a path of length 1 or length 2.
U′ = {U1, U2, U3, U4, U5, U6, U7}
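The filtering step above keeps only those unknown proteins that have a known protein among their neighbors or their neighbors' neighbors. A minimal sketch, assuming the network is stored as an adjacency dictionary; the function name and the chain-shaped toy network are hypothetical.

```python
def reachable_unknowns(adj, unknown, known):
    """Unknown proteins with a known protein at path length 1 or 2."""
    kept = set()
    for u in unknown:
        first = adj.get(u, set())                                  # length-1 neighbors
        second = set().union(*(adj.get(n, set()) for n in first))  # length-2 neighbors
        if (first | second) & known:
            kept.add(u)
    return kept

# Hypothetical chain: K1 - U1 - U2 - U3 - U4
toy_edges = [("K1", "U1"), ("U1", "U2"), ("U2", "U3"), ("U3", "U4")]
adj = {}
for a, b in toy_edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

u_prime = reachable_unknowns(adj, {"U1", "U2", "U3", "U4"}, {"K1"})
print(sorted(u_prime))
```

U3 and U4 are dropped because their nearest known protein is three or more steps away.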
36Problem Formulation
We can assign proteins of U to different groups
and calculate CUT.
Interactions between known protein pairs can
never be part of CUT.
For this assignment of unknown proteins, CUT = 6
37Problem Formulation
The problem we are trying to solve is to assign
the proteins of set U to the known groups G1, G2,
.., G3 in such a way that the CUT becomes
the minimum.
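For a very small network, the problem as stated can be solved by trying every possible assignment and keeping the one with the smallest CUT. The sketch below does exactly that; it is only an illustration of the objective, since the number of assignments grows exponentially with the number of unknown proteins. The toy network and all names are hypothetical, and the edge list contains only interactions that involve at least one unknown protein.

```python
from itertools import product

def min_cut_assignment(edges, known_groups, unknown, groups):
    """Exhaustively search all assignments of unknown proteins to groups."""
    best_cut, best = None, None
    for choice in product(groups, repeat=len(unknown)):
        assign = dict(known_groups)
        assign.update(zip(unknown, choice))
        cut = sum(1 for u, v in edges if assign[u] != assign[v])
        if best_cut is None or cut < best_cut:
            best_cut, best = cut, dict(zip(unknown, choice))
    return best, best_cut

# Hypothetical toy network; every edge touches at least one unknown protein.
known_groups = {"K1": "G1", "K2": "G1", "K5": "G1", "K3": "G2", "K4": "G2"}
unknown = ["U1", "U2"]
edges = [("U1", "K1"), ("U1", "K2"), ("U1", "K5"), ("U1", "K3"),
         ("U2", "K3"), ("U2", "K4"), ("U1", "U2")]

best, best_cut = min_cut_assignment(edges, known_groups, unknown, ["G1", "G2"])
print(best, best_cut)
```

With two unknown proteins and two groups there are only four assignments to test; with hundreds of unknown proteins an exhaustive search of this kind is not feasible.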
39A Heuristic Method
- The problem at hand is a variant of the network
partitioning problem.
- Network partitioning problems are known to be
NP-hard.
- Therefore, we resort to heuristics to find as
good a solution as possible.
40A Heuristic Method
41A Heuristic Method
U1 has one path of length 1 with G2 and two paths
of length 2 with G1.
42A Heuristic Method
U4 has two paths of length 1 with G1, one path of
length 1 with G2 and one path of length 2
with G3.
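The path counts quoted for U1 and U4 can be obtained by walking one and two steps out from an unknown protein and tallying the groups of the known proteins reached. The sketch below assumes that a path of length 2 may pass through any intermediate node; the edges are hypothetical and were chosen only so that U1 reproduces the counts stated on the slide.

```python
from collections import Counter

def path_counts(adj, group_of, u):
    """Count paths of length 1 and 2 from u to known proteins, per group."""
    one, two = Counter(), Counter()
    for x in adj[u]:
        if x in group_of:                    # direct neighbor is a known protein
            one[group_of[x]] += 1
        for k in adj[x]:
            if k != u and k in group_of:     # known protein two steps away
                two[group_of[k]] += 1
    return one, two

# Hypothetical edges: U1-K3, U1-U2, U2-K1, U2-K2
group_of = {"K1": "G1", "K2": "G1", "K3": "G2"}
adj = {"U1": {"K3", "U2"}, "U2": {"U1", "K1", "K2"},
       "K1": {"U2"}, "K2": {"U2"}, "K3": {"U1"}}

one, two = path_counts(adj, group_of, "U1")
print(dict(one), dict(two))
```

For U1 this gives one path of length 1 to G2 and two paths of length 2 to G1, which is the kind of information the heuristic uses to rank candidate groups.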
43A Heuristic Method
44A Heuristic Method
45A Heuristic Method
By assigning all the unknown proteins to their
respective highest-priority groups, CUT = 6
46A Heuristic Method
For this assignment of unknown proteins, CUT = 7
47A Heuristic Method
For this assignment of unknown proteins, CUT = 4
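The slides show different assignments giving different values of CUT. One generic way to search for a lower CUT is a greedy local search that moves one unknown protein at a time and keeps a move only if CUT decreases. The sketch below is such a local search, offered as an illustration only; it is not the heuristic of the paper, whose details are not given in this transcript. All names and the toy network are hypothetical.

```python
def cut_size(edges, assign):
    """Number of edges whose endpoints are assigned to different groups."""
    return sum(1 for u, v in edges if assign[u] != assign[v])

def local_search(edges, known_groups, unknown, groups, max_rounds=100):
    assign = dict(known_groups)
    for u in unknown:
        assign[u] = groups[0]            # arbitrary starting assignment
    best = cut_size(edges, assign)
    for _ in range(max_rounds):
        improved = False
        for u in unknown:
            for g in groups:
                old = assign[u]
                if g == old:
                    continue
                assign[u] = g            # tentatively move u to group g
                c = cut_size(edges, assign)
                if c < best:
                    best, improved = c, True
                else:
                    assign[u] = old      # undo a move that does not lower CUT
        if not improved:
            break
    return {u: assign[u] for u in unknown}, best

# Hypothetical toy network; every edge touches at least one unknown protein.
known_groups = {"K1": "G1", "K2": "G1", "K5": "G1", "K3": "G2", "K4": "G2"}
edges = [("U1", "K1"), ("U1", "K2"), ("U1", "K5"), ("U1", "K3"),
         ("U2", "K3"), ("U2", "K4"), ("U1", "U2")]

result, cut = local_search(edges, known_groups, ["U1", "U2"], ["G1", "G2"])
print(result, cut)
```

A search of this kind can stop in a local minimum, so it gives no guarantee of finding the true minimum CUT.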
49Evaluation of the Proposed Approach
- The proposed method is a general one and can be
applied to any organism and any type of
functional classification.
- Here we applied it to the protein-protein
interaction network of the yeast Saccharomyces
cerevisiae.
- We obtained the protein-protein interaction data
from ftp://ftpmips.gsf.de/yeast/PPI/, which
contains 15613 genetic and physical interactions.
50Evaluation of the Proposed Approach
YAR019c YMR001c
YAR019c YNL098c
YAR019c YOR101w
YAR019c YPR111w
YAR027w YAR030c
YAR027w YBR135w
YAR031w YBR217w
...
Total 12487 pairs
We discard self-interactions and extract a set of
12487 unique binary interactions involving 4648
proteins.
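The cleaning step described above (dropping self-interactions and duplicate pairs) can be sketched as follows. This assumes one whitespace-separated pair per line, as in the listing; the reversed duplicate and the self-interaction in the sample are hypothetical lines added for illustration, and the function name is an assumption.

```python
def unique_interactions(lines):
    """Return the set of unique unordered pairs, without self-interactions."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue                      # skip blank or malformed lines
        a, b = fields
        if a == b:
            continue                      # discard self-interactions
        pairs.add(tuple(sorted((a, b))))  # A-B and B-A count as one interaction
    return pairs

sample = [
    "YAR019c YMR001c",
    "YMR001c YAR019c",   # hypothetical reversed duplicate
    "YAR019c YAR019c",   # hypothetical self-interaction
    "YAR027w YAR030c",
]
pairs = unique_interactions(sample)
proteins = {p for pair in pairs for p in pair}
print(len(pairs), len(proteins))
```

Applied to the full data set, counts of this kind correspond to the 12487 interactions and 4648 proteins reported on the slide.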
51Evaluation of the Proposed Approach
A network of 12487 interactions and 4648 proteins
is reasonably big
52Evaluation of the Proposed Approach
We collect the classification data from
http://mips.gsf.de/genre/proj/yeast/index.jsp
53Evaluation of the Proposed Approach
- The proposed approach is intended to predict the
functions of function-unknown proteins.
- However, when we predict the functions of
truly function-unknown proteins, it is not
possible to determine the correctness of the
predictions.
- Therefore, we treat around 10% of the proteins
of each group of Table 1, selected at random, as
function-unknown proteins.
54Evaluation of the Proposed Approach
- The union of the 10% selected from all groups
consists of 604 proteins. This is the unknown
group U.
- The union of the remaining 90% of each of the
functional groups constitutes the set of known
proteins G. There are 3783 proteins in total in G.
- We generate U′ ⊆ U such that each protein of U′ is
connected in N with at least one protein of group
G by a path of length 1 or length 2. There are
470 proteins in U′.
- We predicted the functions of these 470 proteins
using the proposed method.
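The evaluation setup can be sketched as a per-group random hold-out: about 10% of each functional group is hidden and treated as function-unknown, and the rest stays known. This is a minimal sketch under assumptions: the function name, the fixed seed, the rule of hiding at least one protein per group, and the removal of a hidden protein from every group are all choices made here, not details confirmed by the slides.

```python
import random

def hide_fraction(groups, fraction=0.10, seed=0):
    """Hide about `fraction` of each group; return (hidden set, remaining groups)."""
    rng = random.Random(seed)             # fixed seed so the split is repeatable
    hidden = set()
    for members in groups.values():
        ordered = sorted(members)
        n = max(1, round(fraction * len(ordered)))
        hidden.update(rng.sample(ordered, n))
    # Assumption: a hidden protein is removed from every group it belongs to.
    known = {g: set(m) - hidden for g, m in groups.items()}
    return hidden, known

# Hypothetical groups with made-up protein names.
groups = {
    "A": {f"a{i}" for i in range(20)},
    "B": {f"b{i}" for i in range(10)},
}
hidden, known = hide_fraction(groups)
print(len(hidden), {g: len(m) for g, m in known.items()})
```

The hidden proteins play the role of U, and their true groups are kept aside so that predictions can be checked afterwards.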
55Evaluation of the Proposed Approach
We applied this algorithm using Max_value = 50000
to predict the functions of 470 proteins.
56Evaluation of the Proposed Approach
- We cannot guarantee that the minimum CUT
corresponds to the maximum number of successful
predictions.
- However, the trend of the results in the Figure
above suggests that, very likely, the lower the
value of CUT, the greater the number of
successful predictions.
57Evaluation of the Proposed Approach
We then examine the relation between successful
predictions and the degrees of the
proteins in the network.
Degree of U4 = 7, degree of U7 = 3
58Evaluation of the Proposed Approach
We then examine the relation between successful
predictions and the degrees of the
proteins in the network.
59Evaluation of the Proposed Approach
- The success rate of prediction is as low as
30.46% for proteins that have degree one in
the interaction network.
- However, it is 67.61% for proteins that have
degree 8 or more.
- This implies that the reliability of the
prediction can be improved by providing a
reasonable amount of interaction information.
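A breakdown of success rate by degree, like the one summarized above, can be computed by grouping the predicted proteins by their degree and pooling all degrees of 8 or more into one bin. The sketch below uses made-up records of the form (degree, prediction correct); the function name and the data are hypothetical and do not reproduce the reported percentages.

```python
from collections import defaultdict

def success_rate_by_degree(records, cap=8):
    """Percentage of correct predictions per degree; degrees >= cap share one bin."""
    hits, totals = defaultdict(int), defaultdict(int)
    for degree, correct in records:
        key = min(degree, cap)
        totals[key] += 1
        hits[key] += int(correct)
    return {k: 100.0 * hits[k] / totals[k] for k in sorted(totals)}

# Hypothetical records, for illustration only.
records = [(1, True), (1, False), (1, False), (1, False),
           (3, True), (3, False),
           (8, True), (12, True), (9, False), (10, True)]
rates = success_rate_by_degree(records)
print(rates)
```

The key 8 in the result stands for "degree 8 or more".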