Protein Function prediction using network concepts - PowerPoint PPT Presentation

Title:

Protein Function prediction using network concepts

Description:

... Ul-Amin, Kensaku Nishikata, Toshihiro Koma, Teppei Miyasato, Yoko Shinbo, Md. ... Networks: A Min-Cut Approach', Md. Altaf-Ul-Amin, Toshihiro Koma, Ken ... – PowerPoint PPT presentation

Provided by: altaf

Transcript and Presenter's Notes

Title: Protein Function prediction using network concepts


1
  • Lecture 4
  • Protein Function prediction using network
    concepts
  • Application of network concepts in DNA sequencing

2
The topology of a protein-protein interaction
(PPI) network is informative, but further analysis
can reveal other information. A popular
assumption, which is true in many cases, is that
proteins with similar functions interact with each
other. Based on this assumption, we have developed
methods to predict protein functions and protein
complexes from PPI networks, mainly based on
cluster analysis.
3
Cluster Analysis
Cluster Analysis, also called data segmentation,
implies grouping or segmenting a collection of
objects into subsets or "clusters", such that
those within each cluster are more closely
related to one another than objects assigned to
different clusters.
In the context of a graph, densely connected nodes
are considered clusters.
Visually we can detect two clusters in this graph
4
K-cores of Protein-Protein Interaction Networks
Definition: Let a graph G(V, E) consist of a
finite set of nodes V and a finite set of edges
E. A subgraph S(V′, E′), where V′ ⊆ V and E′ ⊆ E,
is a k-core or a core of order k of G if and
only if ∀ v ∈ V′: deg(v) ≥ k within S, and S is
the maximal subgraph with this property.
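The definition above can be turned into a short peeling procedure: keep deleting nodes whose degree inside the remaining subgraph is below k until none are left. The following is a minimal sketch, not code from the presentation; the function name `k_core` and the toy graph are illustrative assumptions.

```python
from collections import defaultdict

def k_core(edges, k):
    """Return the node set of the k-core of an undirected graph given as edge pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:                      # self-loops do not count towards the degree
            adj[u].add(v)
            adj[v].add(u)
    alive = set(adj)
    # Nodes that already violate deg(v) >= k are the first to be removed.
    queue = [n for n in alive if len(adj[n]) < k]
    while queue:
        n = queue.pop()
        if n not in alive:
            continue
        alive.discard(n)
        for m in adj[n]:
            if m in alive:
                adj[m].discard(n)       # m loses one neighbour inside the subgraph
                if len(adj[m]) < k:
                    queue.append(m)
    return alive

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
core1 = k_core(toy_edges, 1)
core2 = k_core(toy_edges, 2)
core3 = k_core(toy_edges, 3)
```

In this toy graph the 1-core contains all four nodes, the 2-core is the triangle, and the 3-core is empty.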
5
Graph G
1-core graph: the degree of every node is one or
more
6
1-core graph: the degree of every node is one or
more
7
2-core graph: the degree of every node is two or
more
8
1-core graph: the degree of every node is one or
more
9
3-core graph: the degree of every node is three
or more. The 3-core is the highest k-core subgraph
of the graph G.
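The highest k-core of a graph can be found by computing a core number for every node, that is, the largest k for which the node still belongs to the k-core. The sketch below uses the standard idea of repeatedly removing a node of minimum remaining degree; it favours clarity over speed, and the names `core_numbers` and `toy_edges` are illustrative assumptions.

```python
def core_numbers(edges):
    """Map each node to the largest k such that the node lies in the k-core."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue                      # ignore self-loops
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    degree = {n: len(nb) for n, nb in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Remove a node of minimum remaining degree (quadratic, but simple).
        n = min(remaining, key=degree.get)
        k = max(k, degree[n])
        core[n] = k
        remaining.discard(n)
        for m in adj[n]:
            if m in remaining:
                degree[m] -= 1
    return core

# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
core = core_numbers(toy_edges)
highest_k = max(core.values())
highest_core_nodes = {n for n, c in core.items() if c == highest_k}
```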
10
Analyzing yeast protein-protein interaction data
obtained from different sources, G. D. Bader and
C. W. V. Hogue, Nature Biotechnology, Vol. 20,
2002.
11
(No Transcript)
12
Prediction of Protein Functions Based on K-cores
of Protein-Protein Interaction Networks
Prediction of Protein Functions Based on
K-cores of Protein-Protein Interaction Networks
and Amino Acid Sequences, Md. Altaf-Ul-Amin,
Kensaku Nishikata, Toshihiro Koma, Teppei
Miyasato, Yoko Shinbo, Md. Arifuzzaman, Chieko
Wada, Maki Maeda, Taku Oshima, Hirotada Mori,
Shigehiko Kanaya, The 14th International
Conference on Genome Informatics, December 14-17,
2003, Yokohama, Japan.
13
In total there are 3007 proteins and 11531
interactions. Around 2000 are function-unknown
proteins. The highest k-core of this total graph
is not so helpful.
14
10-core graph
15
We separate 1072 interactions (out of 11531)
involving protein synthesis and function-unknown
proteins.
(Figure: interaction diagram with nodes labelled
P. S. and U. F.)
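The separation step can be sketched as an edge filter. The sketch assumes one reading of the slide: an interaction is kept when each of its two proteins is either annotated with the target class or has no known function. The function name, the annotation dictionary and the protein identifiers are hypothetical.

```python
def class_subnetwork(edges, annotation, target, unknown_label="unknown"):
    """Keep interactions whose two endpoints are each in the target class or unknown."""
    allowed = {target, unknown_label}
    return [
        (u, v)
        for u, v in edges
        # Proteins missing from the annotation are treated as function-unknown.
        if annotation.get(u, unknown_label) in allowed
        and annotation.get(v, unknown_label) in allowed
    ]

# Hypothetical toy data: P1 and P2 are protein-synthesis proteins, X1 and X2 are unknown.
annotation = {
    "P1": "protein synthesis",
    "P2": "protein synthesis",
    "P3": "electron transport",
}
edges = [("P1", "P2"), ("P1", "X1"), ("X1", "X2"), ("P3", "X1"), ("P2", "P3")]
sub = class_subnetwork(edges, annotation, "protein synthesis")
```

The k-cores of the resulting subnetwork can then be inspected for function-unknown proteins.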
16
Function-unknown proteins of this 6-core graph
are likely to be involved in protein synthesis.
17
193 interactions out of 11531 involve electron
transport and function-unknown proteins.
18
Function-unknown proteins of this 2-core graph
are likely to be involved in electron
transfer. Further sub-classification may be
possible by applying other information to the
k-core subgraph.
The highest k-core, here the 2-core subgraph, of
the graph of the previous page
19
Prediction of Protein Functions Based on
Protein-Protein Interaction Networks: A Min-Cut
Approach, Md. Altaf-Ul-Amin, Toshihiro Koma, Ken
Kurokawa, Shigehiko Kanaya, Proceedings of the
Workshop on Biomedical Data Engineering (BMDE),
Tokyo, Japan, pp. 37-43, April 3-4, 2005.
20
  • Outline
  • Introduction
  • The concept of Min-Cut
  • Problem Formulation
  • A Heuristic Method
  • Evaluation of the Proposed Method
  • Conclusions

22
Introduction
After the complete sequencing of several genomes,
the challenging problem now is to determine the
functions of proteins:
  • Determining protein functions experimentally
  • Using various computational methods based on
    a) sequence, b) structure, c) gene neighborhood,
    d) gene fusions, e) cellular localization,
    f) protein-protein interactions
23
Introduction
The present work predicts protein functions based
on the protein-protein interaction network.
  • For the purpose of prediction, we consider the
    interactions of
  • function-unknown proteins with function-known
    proteins and
  • function-unknown proteins with function-unknown
    proteins

in the context of the whole network.
24
Introduction
Schwikowski, B., Uetz, P. and Fields, S. A
network of protein-protein interactions in yeast.
Nature Biotech. 18, 1257-1261 (2000). Deals with
a network of 2039 proteins and 2709 interactions.
65% of interactions occurred between protein
pairs with at least one common function.
Hishigaki, H., Nakai, K., Ono, T., Tanigami, A.,
and Takagi, T. Assessment of prediction accuracy
of protein function from protein-protein
interaction data. Yeast 18, 523-531 (2001).
Reported similar results.
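The share of interactions between protein pairs with at least one common function can be computed directly once every protein has a set of functional classes. The sketch below is an illustration only; it assumes that interactions with an unannotated partner are left out of the denominator, and all names and data are hypothetical.

```python
def shared_function_fraction(edges, functions):
    """Fraction of interactions whose two annotated proteins share a functional class."""
    # Only interactions where both partners carry at least one annotation are counted.
    annotated = [(u, v) for u, v in edges if functions.get(u) and functions.get(v)]
    if not annotated:
        return 0.0
    shared = sum(1 for u, v in annotated if functions[u] & functions[v])
    return shared / len(annotated)

# Hypothetical toy data: D has no annotation, so the edge C-D is skipped.
functions = {
    "A": {"transport"},
    "B": {"transport", "metabolism"},
    "C": {"transcription"},
}
edges = [("A", "B"), ("B", "C"), ("C", "D")]
fraction = shared_function_fraction(edges, functions)
```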
25
Introduction
So, the majority of protein-protein interactions
are between protein pairs with similar functions.
Therefore, we assign function-unknown proteins to
different functional groups in such a way that
the number of inter-group interactions is
minimized.
Hence we call the proposed approach a Min-Cut
approach.
26
  • Outline
  • Introduction
  • The concept of Min-Cut
  • Problem Formulation
  • A Heuristic Method
  • Evaluation of the Proposed Method
  • Conclusions

27
The concept of Min-Cut
(Figure: network of unknown proteins U1 to U4 and
known proteins K in groups G1 and G2)
A typical small network of known and unknown
proteins
28
The concept of Min-Cut
(Figure: network of unknown proteins U1 to U4 and
known proteins K in groups G1 and G2)
Unknown proteins assigned to known groups based
on majority interactions
29
The concept of Min-Cut
(Figure: network of unknown proteins U1 to U4 and
known proteins K in groups G1 and G2)
Number of CUT = 4
30
The concept of Min-Cut
(Figure: network of unknown proteins U1 to U4 and
known proteins K in groups G1 and G2)
An alternative assignment of unknown proteins
31
The concept of Min-Cut
(Figure: network of unknown proteins U1 to U4 and
known proteins K in groups G1 and G2)
Number of CUT = 2
For every assignment of unknown proteins, there
is a value of CUT. The Min-Cut approach looks for
an assignment for which the value of CUT is
minimum.
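For a very small network, the idea of searching for the assignment with the smallest CUT can be illustrated by trying every possible assignment. This exhaustive sketch is only practical for a handful of unknown proteins and is not the method proposed in the presentation; the network, the group labels and the function name are hypothetical.

```python
from itertools import product

def brute_force_min_cut(edges, known_group, unknown, groups):
    """Try every assignment of unknown proteins to groups and keep the smallest CUT."""
    best_cut, best_assignment = None, None
    for labels in product(groups, repeat=len(unknown)):
        group_of = dict(known_group)
        group_of.update(zip(unknown, labels))
        # CUT = number of interactions whose two proteins end up in different groups.
        cut = sum(1 for u, v in edges if group_of[u] != group_of[v])
        if best_cut is None or cut < best_cut:
            best_cut, best_assignment = cut, dict(zip(unknown, labels))
    return best_cut, best_assignment

# Hypothetical toy network: K1 and K2 belong to G1, K3 belongs to G2.
known_group = {"K1": "G1", "K2": "G1", "K3": "G2"}
unknown = ["U1", "U2"]
edges = [("U1", "K1"), ("U1", "K2"), ("U1", "U2"), ("U2", "K1"), ("U2", "K3")]
best_cut, best_assignment = brute_force_min_cut(edges, known_group, unknown, ["G1", "G2"])
```

The number of assignments grows as (number of groups) to the power of (number of unknown proteins), so exhaustive search is not feasible for large networks.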
32
  • Outline
  • Introduction
  • The concept of Min-Cut
  • Problem Formulation
  • A Heuristic Method
  • Evaluation of the Proposed Method
  • Conclusions

33
Problem Formulation
Here we explain some points with a typical
example.
34
Problem Formulation
V = set of all nodes
E = set of all edges
G = {K1, K2, K3, K4, K5, K6, K7, K8, K9, K10}
U = {U1, U2, U3, U4, U5, U6, U7, U8}
35
Problem Formulation
We generate U′ ⊆ U such that each protein of U′ is
connected in N with at least one protein of group
G by a path of length 1 or length 2.
U′ = {U1, U2, U3, U4, U5, U6, U7}
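The selection of unknown proteins that lie within one or two interactions of a known protein can be sketched as follows. The function name and the toy network are illustrative assumptions, not taken from the slides.

```python
def select_reachable_unknowns(edges, known, unknown):
    """Unknown proteins linked to some known protein by a path of length 1 or 2."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    selected = set()
    for p in unknown:
        neighbours = adj.get(p, set())
        direct = bool(neighbours & known)                    # path of length 1
        two_step = any(adj[n] & known for n in neighbours)   # path of length 2
        if direct or two_step:
            selected.add(p)
    return selected

# Hypothetical toy chain K1 - U1 - U2 - U3, plus U4 with no interactions at all.
edges = [("K1", "U1"), ("U1", "U2"), ("U2", "U3")]
reachable = select_reachable_unknowns(edges, {"K1"}, {"U1", "U2", "U3", "U4"})
```

Here U3 is three interactions away from K1 and U4 is isolated, so neither is selected.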
36
Problem Formulation
We can assign the proteins of U′ to different
groups and calculate CUT.
Interactions between known protein pairs can
never be part of CUT.
For this assignment of unknown proteins, CUT = 6.
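The CUT of a given assignment can be computed with a single pass over the interactions. The sketch below assumes one reading of the rule above: an interaction is counted only if at least one of its proteins is unknown and the two proteins are placed in different groups. Names and data are hypothetical.

```python
def cut_value(edges, group_of, unknown):
    """Count inter-group interactions that involve at least one unknown protein."""
    return sum(
        1
        for u, v in edges
        # Known-known interactions are skipped regardless of their groups.
        if (u in unknown or v in unknown) and group_of[u] != group_of[v]
    )

# Hypothetical toy data: K1 is in G1, K2 is in G2, and U1 has been assigned to G2.
group_of = {"K1": "G1", "K2": "G2", "U1": "G2"}
edges = [("K1", "K2"), ("U1", "K1"), ("U1", "K2")]
cut = cut_value(edges, group_of, {"U1"})
```

Only the interaction U1-K1 contributes: K1-K2 joins two known proteins, and U1-K2 stays inside G2.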
37
Problem Formulation
The problem we are trying to solve is to assign
the proteins of set U′ to the known groups G1, G2,
..., G3 in such a way that CUT is minimized.
38
  • Outline
  • Introduction
  • The concept of Min-Cut
  • Problem Formulation
  • A Heuristic Method
  • Evaluation of the Proposed Method
  • Conclusions

39
A Heuristic Method
  • The problem at hand is a variant of the network
    partitioning problem.
  • It is known that network partitioning problems
    are NP-hard.
  • Therefore, we resort to heuristics to find as
    good a solution as possible.

40
A Heuristic Method
41
A Heuristic Method
U1 has one path of length 1 with G2 and two paths
of length 2 with G1.
42
A Heuristic Method
U4 has two paths of length 1 with G1, one path of
length 1 with G2 and one path of length 2
with G3.
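Path counts of this kind can be collected per unknown protein and per group. The sketch below assumes that a path of length 2 may pass through any intermediate protein; the toy network only reproduces the counts stated for U1 and is not the network drawn on the slides.

```python
from collections import Counter

def path_counts(edges, group_of, protein):
    """Count paths of length 1 and 2 from one protein to the known groups."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    length1, length2 = Counter(), Counter()
    for n in adj.get(protein, []):
        if n in group_of:
            length1[group_of[n]] += 1          # protein - n, with n known
        for k in adj[n]:
            if k != protein and k in group_of:
                length2[group_of[k]] += 1      # protein - n - k, with k known
    return length1, length2

# Hypothetical toy network: U1-K5 with K5 in G2, and U1-U2 with U2 linked to K1, K2 in G1.
group_of = {"K1": "G1", "K2": "G1", "K5": "G2"}
edges = [("U1", "K5"), ("U1", "U2"), ("U2", "K1"), ("U2", "K2")]
length1, length2 = path_counts(edges, group_of, "U1")
```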
43
A Heuristic Method
44
A Heuristic Method
45
A Heuristic Method
By assigning all the unknown proteins to their
respective highest-priority groups, CUT = 6.
46
A Heuristic Method
For this assignment of unknown proteins, CUT = 7.
47
A Heuristic Method
For this assignment of unknown proteins, CUT = 4.
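Different assignments give different CUT values, so a heuristic can start from an initial assignment and keep changing it while CUT decreases. The sketch below is a generic local search, not the algorithm of the presentation (whose details are not part of this transcript): it moves an unknown protein to the group that holds strictly more of its neighbours than its current group, which lowers CUT with every move. All names and data are hypothetical.

```python
from collections import Counter

def refine_assignment(edges, group_of, unknown, max_rounds=100):
    """Move unknown proteins towards the group of most neighbours until no move helps."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    group_of = dict(group_of)                  # work on a copy
    for _ in range(max_rounds):
        moved = False
        for p in sorted(unknown):
            votes = Counter(group_of[n] for n in adj.get(p, []) if n in group_of)
            if not votes:
                continue
            best_group, best_votes = votes.most_common(1)[0]
            # Moving p changes CUT by votes[current] - best_votes, so move only if negative.
            if best_votes > votes[group_of[p]]:
                group_of[p] = best_group
                moved = True
        if not moved:
            break
    return group_of

# Hypothetical toy network; both unknown proteins start in G2.
start = {"K1": "G1", "K2": "G1", "K3": "G2", "U1": "G2", "U2": "G2"}
edges = [("U1", "K1"), ("U1", "K2"), ("U1", "U2"), ("U2", "K1"), ("U2", "K3")]
refined = refine_assignment(edges, start, {"U1", "U2"})
```

Such a search can stop in a local minimum, so it does not guarantee the minimum CUT.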
48
  • Outline
  • Introduction
  • The concept of Min-Cut
  • Problem Formulation
  • A Heuristic Method
  • Evaluation of the Proposed Method
  • Conclusions

49
Evaluation of the Proposed Approach
  • The proposed method is a general one and can be
    applied to any organism and any type of
    functional classification.
  • Here we applied it to the protein-protein
    interaction network of the yeast Saccharomyces
    cerevisiae.
  • We obtain the protein-protein interaction data
    from ftp://ftpmips.gsf.de/yeast/PPI/, which
    contains 15613 genetic and physical interactions.

50
Evaluation of the Proposed Approach
YAR019c YMR001c
YAR019c YNL098c
YAR019c YOR101w
YAR019c YPR111w
YAR027w YAR030c
YAR027w YBR135w
YAR031w YBR217w
...
Total 12487 pairs
We discard self-interactions and extract a set of
12487 unique binary interactions involving 4648
proteins.
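The clean-up step can be sketched as follows, assuming the interaction list is available as text lines with two whitespace-separated protein names per line (the actual file layout is an assumption). Self-interactions are dropped, and A-B and B-A are treated as the same interaction.

```python
def unique_interactions(lines):
    """Return the set of unique, unordered protein pairs without self-interactions."""
    pairs = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue                          # skip blank or malformed lines
        a, b = fields[0], fields[1]
        if a == b:
            continue                          # discard self-interactions
        pairs.add(tuple(sorted((a, b))))      # store each pair in a canonical order
    return pairs

# Hypothetical input: one duplicate in reversed order and one self-interaction.
lines = [
    "YAR019c YMR001c",
    "YMR001c YAR019c",
    "YAR027w YAR027w",
    "YAR027w YAR030c",
]
pairs = unique_interactions(lines)
proteins = {p for pair in pairs for p in pair}
```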
51
Evaluation of the Proposed Approach
A network of 12487 interactions and 4648 proteins
is reasonably big
52
 
Evaluation of the Proposed Approach
We collect the classification data from
http://mips.gsf.de/genre/proj/yeast/index.jsp
 
53
 
Evaluation of the Proposed Approach
  • The proposed approach is intended to predict the
    functions of function-unknown proteins.
  • However, when the functions of truly
    function-unknown proteins are predicted, it is
    not possible to determine the correctness of the
    predictions.
  • We therefore treat around 10% of the proteins of
    each group of Table 1, selected at random, as
    function-unknown proteins.

 
54
 
Evaluation of the Proposed Approach
  • The union of the 10% taken from all groups
    consists of 604 proteins. This is the unknown
    group U.
  • The union of the remaining 90% of each of the
    functional groups constitutes the set of known
    proteins G. There are 3783 proteins in total in G.
  • We generate U′ ⊆ U such that each protein of U′ is
    connected in N with at least one protein of group
    G by a path of length 1 or length 2. There are
    470 proteins in U′.
  • We predicted the functions of these 470 proteins
    using the proposed method.
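The hold-out step can be sketched as follows. The sketch assumes that a protein hidden through one of its groups is removed from the known set altogether; the fixed seed, the function name and the toy groups are illustrative choices.

```python
import random

def hide_fraction(groups, fraction=0.10, seed=0):
    """Randomly hide a fraction of each functional group; return (hidden, known) sets."""
    rng = random.Random(seed)                 # fixed seed so the split is reproducible
    hidden = set()
    for members in groups.values():
        ordered = sorted(members)
        n_hide = round(fraction * len(ordered))
        hidden.update(rng.sample(ordered, n_hide))
    all_proteins = set().union(*groups.values())
    known = all_proteins - hidden             # hidden proteins count as function-unknown
    return hidden, known

# Hypothetical toy groups with ten proteins each.
groups = {
    "G1": {f"A{i}" for i in range(10)},
    "G2": {f"B{i}" for i in range(10)},
}
hidden, known = hide_fraction(groups)
```

The predictions made for the hidden proteins can then be compared with their true groups.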

 
55
Evaluation of the Proposed Approach
We applied this algorithm using Max_value = 50000
to predict the functions of the 470 proteins.
56
Evaluation of the Proposed Approach
  • We cannot guarantee that the minimum CUT
    corresponds to the maximum number of successful
    predictions.
  • However, the trend of the results in the figure
    above shows that, very likely, the lower the
    value of CUT, the greater the number of
    successful predictions.

57
Evaluation of the Proposed Approach
We then examine the relation between successful
predictions and the degree of the proteins in the
network.
Degree of U4 = 7; degree of U7 = 3
58
Evaluation of the Proposed Approach
We then examine the relation between successful
predictions and the degree of the proteins in the
network.
59
Evaluation of the Proposed Approach
  • The success rate of prediction is as low as
    30.46% for proteins that have degree one in
    the interaction network.
  • However, it is 67.61% for proteins that have
    degree 8 or more.
  • This implies that the reliability of the
    prediction can be improved by providing a
    reasonable amount of interaction information.
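A breakdown of the success rate by degree can be sketched as follows. The sketch assumes that a prediction counts as successful when the predicted group is among the protein's true groups, and it pools all degrees of 8 or more into one bucket; the names and the toy data are hypothetical and do not reproduce the reported percentages.

```python
from collections import defaultdict

def success_rate_by_degree(degree, predicted, actual, cap=8):
    """Percentage of correct predictions per degree, with degrees >= cap pooled."""
    hits, totals = defaultdict(int), defaultdict(int)
    for protein, guess in predicted.items():
        bucket = min(degree[protein], cap)
        totals[bucket] += 1
        if guess in actual[protein]:          # correct if the guess is a true group
            hits[bucket] += 1
    return {b: 100.0 * hits[b] / totals[b] for b in sorted(totals)}

# Hypothetical toy data: two proteins of degree 1 and two of degree 8 or more.
degree = {"P1": 1, "P2": 1, "P3": 9, "P4": 8}
predicted = {"P1": "A", "P2": "A", "P3": "B", "P4": "C"}
actual = {"P1": {"B"}, "P2": {"A"}, "P3": {"B", "C"}, "P4": {"C"}}
rates = success_rate_by_degree(degree, predicted, actual)
```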