YoungRae Cho - PowerPoint PPT Presentation

About This Presentation

Title:

YoungRae Cho

Description:

... of Functional Modules and Hub Proteins in Protein Interaction Networks ... Parent node p(a) of a: Hub Confidence Measurement. Set of child nodes D(a) of a: ... – PowerPoint PPT presentation

Slides: 30

Provided by: youngr5

Learn more at: https://cse.buffalo.edu

Transcript and Presenter's Notes

Title: YoungRae Cho

1
Seminar 2009
Identification of Functional Modules and Hub
Proteins in Protein Interaction Networks

Young-Rae Cho
Department of Computer Science and Engineering
State University of New York at Buffalo

2
What is Bioinformatics?

Bioinformatics
Interdisciplinary research area to manage and
analyze biological data

Techniques
Data
Applications
3
What is Bioinformatics?
Computational Techniques: Data Mining, Machine Learning
Biological Data: Genome, Proteome, Networks
Biomedical Applications: Functional Characterization, Disease Diagnosis, Drug Development
Data Mining: Biological Data → Knowledge
4
Overview

Introduction
Protein Interaction Networks and Their Structural
Properties
Preprocess - Network Weighting
Integration of Gene Ontology using Semantic
Similarity Measures
Functional Module Identification
Weighted Interaction Networks → Functional Modules
Hub Protein Identification
Weighted Interaction Networks → Hub Proteins
Conclusion

5
Biological Network

Definition
Directed or undirected graph representation
Biological molecules as nodes and
biochemical reactions or biophysical
interactions as edges
Examples
Metabolic networks
Signal transduction networks
Gene regulatory networks
Protein interaction networks
Importance
Provide a global view of cellular organizations
and biological processes
Applicable to systematic approaches for knowledge
discovery

6
Protein-Protein Interaction (PPI)

Biological Meaning of PPI
Proteins interact with each other for stability
and functionality
Most cellular functions are performed at the
protein-complex level
Interaction evidence is interpreted as functional
coherence / consistency
Determination of PPIs
Experimental methods
Yeast two-hybrid systems, Mass
spectrometry, Protein microarray
Computational methods
Homology search, Gene fusion analysis,
Phylogenetic profiles
Problem of PPI data
Current PPI databases contain a large number of
false positives and false negatives
→ Unreliability

7
Protein Interaction Network

Representation of Protein Interaction Networks
Undirected, un-weighted graph G(V, E), with
a set of nodes V as proteins and a set
of edges E as interactions
Problem of Protein Interaction Networks
Large scale
Complex connectivity

8
Structural Properties

Small-world Phenomenon (Watts & Strogatz)
Networks that lie between regular and random networks
Higher average clustering coefficient than expected by random chance
Significantly small average shortest path length
Scale-free Distribution (Barabási & Albert)
Network growth by preferential attachment
Power-law degree distribution: a few high-degree nodes, many low-degree nodes
Clustering coefficient distribution independent of degree
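
The two small-world indicators named on this slide, average clustering coefficient and average shortest path length, can be computed on a small undirected graph as in the sketch below. The function names and the edge-list input format are illustrative assumptions, not part of the presentation.

```python
from collections import deque
from itertools import combinations


def build_graph(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def clustering_coefficient(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean clustering coefficient over all nodes."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)


def average_shortest_path_length(adj):
    """Mean BFS distance over all ordered pairs of mutually reachable nodes."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Toy network: a triangle (a, b, c) with one pendant node d attached to c.
toy = build_graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])
avg_cc = average_clustering(toy)
avg_path = average_shortest_path_length(toy)
```

Comparing these two values against those of a random graph with the same number of nodes and edges is the usual way to check for the small-world property.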

9
Overview

Introduction
Protein Interaction Networks and Their Structural
Properties
Preprocess - Network Weighting
Integration of Gene Ontology using Semantic
Similarity Measures
Functional Module Identification
Weighted Interaction Networks → Functional Modules
Hub Protein Identification
Weighted Interaction Networks → Hub Proteins
Conclusion

10
Network Weighting Schemes

Motivation
Unreliable protein interaction networks
Transforming the un-weighted graph into a weighted graph
by assigning the interaction reliability
(or intensity) to each edge as a weight
Unsupervised Approaches
Using network connectivity, e.g., common
neighbors, alternative paths
Problem: unreliable weights
Supervised Approaches
Using other resources to verify interactions,
e.g., gene sequence, gene expression
Integrating Gene Ontology data in my work:
the most comprehensive and
well-curated resource
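
As an illustration of the unsupervised, connectivity-based approach, the sketch below weights each edge by the overlap of its endpoints' neighborhoods. The Jaccard-style formula and the function name are assumptions chosen for illustration; the slide only names "common neighbors" as an example and does not specify a formula.

```python
def common_neighbor_weights(edges):
    """Assign each edge (u, v) the Jaccard overlap of the neighbor sets of u and v.

    Assumption: the neighbor sets exclude u and v themselves, and an edge
    whose endpoints have no other neighbors gets weight 0.0.
    """
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    weights = {}
    for u, v in edges:
        nu = adj[u] - {v}
        nv = adj[v] - {u}
        union = nu | nv
        weights[(u, v)] = len(nu & nv) / len(union) if union else 0.0
    return weights


# Toy network: a triangle (a, b, c) with one pendant node d attached to c.
toy_weights = common_neighbor_weights(
    [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
)
```

Because such weights are derived from the same unreliable topology they are meant to correct, false-positive edges can distort them, which is the problem the slide points out.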

11
Gene Ontology (GO)

Structure
Terms (concepts): well-defined biological
descriptions
Relationships: is-a / part-of
(general-to-specific) between terms
Annotation
If a protein is annotated to a term, then it is
also annotated to all terms on the
paths toward the root.

→ Transitivity
(Figure: example GO hierarchy whose terms carry the annotation sets {P5}, {P1}, {P1, P2, P3}, {P1, P2, P4}, {P2, P3}, {P1, P6}, {P1, P2, P3, P6}, {P2, P3})
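
The transitivity rule can be sketched as an upward propagation over the term hierarchy. The data layout (a `parents` map from each term to its parent terms, and a `direct` map from each term to its directly annotated proteins) and the term names are hypothetical.

```python
def ancestors(term, parents):
    """Return every term reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate_annotations(direct, parents):
    """Copy each protein annotation to all ancestor terms (transitivity)."""
    full = {term: set(proteins) for term, proteins in direct.items()}
    for term, proteins in direct.items():
        for anc in ancestors(term, parents):
            full.setdefault(anc, set()).update(proteins)
    return full


# Hypothetical hierarchy: D has two parents (B and C), both children of root.
parents = {"B": ["root"], "C": ["root"], "D": ["B", "C"]}
direct = {"D": {"P1"}, "B": {"P2"}, "C": {"P3"}}
full = propagate_annotations(direct, parents)
```

After propagation, the annotation set of a term is always a superset of the annotation sets of its descendants, so general terms near the root accumulate the most proteins.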
12
Semantic Similarity

Reliability of Interacting Proteins
Average (or maximum) semantic similarity over
pairs of terms whose annotations
include the interacting proteins
Structure-based Approaches
Path length or common parent terms
Problem: assumes that all edges represent uniform
specificity
Information Content-based Approaches
Information content of a term T is defined as
$-\log P(T)$
$\mathrm{sim}(x, y) = -\log P_i(x, y)$
where $P_i(x, y)$ is the proportion of the
annotations of the term that includes both x and y
Normalized $\mathrm{sim}(x, y)$
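
A minimal sketch of the information-content approach follows. It assumes that P(T) is the fraction of all annotated proteins that are annotated to term T (after upward propagation), and that the similarity of two proteins is taken from the most specific term annotating both. The normalization by the maximum possible information content, log(total), is an assumption, since the transcript does not preserve the normalization formula.

```python
import math


def information_content(term, annotations, total):
    """-log P(T), with P(T) = |proteins annotated to T| / total (assumed definition)."""
    return -math.log(len(annotations[term]) / total)


def semantic_similarity(x, y, annotations, total):
    """Information content of the most specific term annotating both x and y."""
    shared = [t for t, proteins in annotations.items() if x in proteins and y in proteins]
    if not shared:
        return 0.0
    return max(information_content(t, annotations, total) for t in shared)


def normalized_similarity(x, y, annotations, total):
    """Scale similarity into [0, 1] by dividing by log(total) (assumed normalization)."""
    max_ic = math.log(total)
    if max_ic <= 0:
        return 0.0
    return semantic_similarity(x, y, annotations, total) / max_ic


# Hypothetical propagated annotations over four proteins.
annotations = {
    "root": {"P1", "P2", "P3", "P4"},
    "B": {"P1", "P2"},
    "D": {"P1"},
}
sim_close = semantic_similarity("P1", "P2", annotations, 4)
sim_far = semantic_similarity("P1", "P3", annotations, 4)
sim_norm = normalized_similarity("P1", "P2", annotations, 4)
```

Proteins that share only the root term get a similarity of zero, while proteins sharing a small, specific term score high, which is what makes the value usable as an edge reliability weight.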

13
Overview

Introduction
Protein Interaction Networks and Their Structural
Properties
Preprocess - Network Weighting
Integration of Gene Ontology using Semantic
Similarity Measures
Functional Module Identification
Weighted Interaction Networks → Functional Modules
Hub Protein Identification
Weighted Interaction Networks → Hub Proteins
Conclusion

14
Functional Module Identification

Functional Module
A set of molecules that participate in the same
biological processes or functions
Sub-network with dense intra-connections and
sparse inter-connections
Functional Module Identification
→ Graph clustering problem
Previous Clustering Approaches
Density-based methods, e.g., maximum clique,
quasi clique, clique percolation
Partition-based methods, e.g., restricted
neighborhood search, Markov clustering
Hierarchical methods
Bottom-up approaches, e.g., distance-based,
common neighbors
Top-down approaches, e.g., minimum cut,
betweenness cut

15
Functional Influence Model

Functional Influence
Influence factors: normalized weights, inverse
of degree
Measurements
Single-path-based method: O(V E)
All-path-based method: NP
Random-walk-based method: O(V^3) × iterations, i.e., O(V^4)

Improvement by an efficient algorithm
16
Flow Simulation

Information Flow Simulation
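Computation of functional influence $\mathit{inf}_s(x)$ of s
on $x \in V$ based on random walks
Input: a weighted interaction network and a
source node s
Output: functional influence pattern of s
Algorithm
1. Initialize $\mathit{inf}_s(s)$
2. Compute initial flow $f_{init}(s \to y)$ by [formula omitted in transcript]
3. Update $\mathit{inf}_s(y)$ by [formula omitted in transcript]
4. Compute flow $f_s(y \to z)$ by [formula omitted in transcript]
5. Repeat steps 3 and 4 until $f_s(y \to z)$ is less than a
minimum threshold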
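
The overall shape of the flow simulation can be sketched as follows. The update formulas were not preserved in this transcript, so every formula in the code is an assumption made for illustration: the source starts with influence 1.0, flow along an edge is scaled by the edge weight, and flow leaving a node is divided by that node's degree (following the "normalized weights, inverse of degree" influence factors). The `max_rounds` guard is also an added assumption to guarantee termination.

```python
from collections import defaultdict


def flow_simulation(weights, source, threshold=0.01, max_rounds=100):
    """Spread influence from `source` over a weighted undirected network.

    weights: dict mapping (u, v) tuples to edge weights in [0, 1].
    Returns a dict {node: accumulated influence of source on node}.
    All formulas here are illustrative assumptions, not the author's.
    """
    adj = defaultdict(dict)
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w

    influence = defaultdict(float)
    influence[source] = 1.0  # step 1: initialize (assumed value)

    # Step 2: initial flow from the source to each neighbor (assumed formula).
    flows = {y: w * influence[source] for y, w in adj[source].items()}

    for _ in range(max_rounds):
        # Drop trivial flow as early as possible.
        flows = {y: f for y, f in flows.items() if f >= threshold}
        if not flows:
            break
        next_flows = defaultdict(float)
        for y, incoming in flows.items():
            influence[y] += incoming          # step 3: update influence
            degree = len(adj[y])
            for z, w in adj[y].items():       # step 4: forward the flow
                next_flows[z] += incoming * w / degree
        flows = next_flows

    return dict(influence)


# Toy network: chain s - a - b, plus a disconnected edge c - d.
pattern = flow_simulation(
    {("s", "a"): 0.8, ("a", "b"): 0.5, ("c", "d"): 0.9}, "s", threshold=0.05
)
```

Running the simulation once per source node yields one influence pattern per protein; these patterns are what a subsequent pattern-clustering step would group into modules.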

17
Lower-level Algorithm
18
Schematic View
(Figure: schematic network annotated with numeric values, leading to Pattern Clustering)
19
Time Complexity

Efficiency
Traces only connected nodes to calculate the
functional influence of a source
Removes trivial flow (below the threshold) as early
as possible
Run Time
Theoretical upper bound is unknown (it does not depend
on the network diameter)
Test potential factors (number of nodes, density,
average degree) with synthetic networks

20
Accuracy

Experiment
Data: yeast protein interaction network from DIP
Pattern clustering: pCluster algorithm (Wang et
al., SIGMOD 2002)
Evaluation
Functional categories and annotations from MIPS
Hyper-geometric p-value
Result
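
The hypergeometric p-value named under Evaluation measures how unlikely it is that a module overlaps a functional category as much as it does by chance. A minimal sketch, with parameter names chosen here for illustration:

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """Probability of at least k category members in a module of size n.

    N: total number of proteins in the network
    K: number of proteins in the functional category
    n: number of proteins in the identified module
    k: number of module proteins that belong to the category
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# A module of 5 proteins, all drawn from a 5-protein category among 10 proteins.
p_full_overlap = hypergeom_pvalue(10, 5, 5, 5)
p_any_overlap = hypergeom_pvalue(10, 5, 5, 0)
```

A lower p-value indicates a module that matches a known functional category more closely than random selection would explain.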

21
Overview

Introduction
Protein Interaction Networks and Their Structural
Properties
Preprocess - Network Weighting
Integration of Gene Ontology using Semantic
Similarity Measures
Functional Module Identification
Weighted Interaction Networks → Functional Modules
Hub Protein Identification
Weighted Interaction Networks → Hub Proteins
Conclusion

22
Hub Protein Identification

Hub Protein
Centrally located node in the modular structure
of a protein interaction network
(a structural hub)
Functionally essential protein
Previous Centrality Measurements
Closeness centrality
Betweenness centrality
Bridging centrality
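
As one example of these earlier measurements, closeness centrality can be sketched for an unweighted graph with breadth-first search. The adjacency-map input and the convention of averaging over reachable nodes only are assumptions of this sketch.

```python
from collections import deque


def closeness_centrality(adj, node):
    """Number of reachable nodes divided by the sum of BFS distances to them."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Toy path graph a - b - c: the middle node is the most central.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
closeness_b = closeness_centrality(path, "b")
closeness_a = closeness_centrality(path, "a")
```

Measures of this kind rank nodes purely by topology, without using edge weights or functional information.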

23
Functional Influence Model