YoungRae Cho - PowerPoint PPT Presentation

Provided by: youngr5
Learn more at: https://cse.buffalo.edu

Transcript and Presenter's Notes



1
Seminar 2009
Identification of Functional Modules and Hub
Proteins in Protein Interaction Networks
  • Young-Rae Cho
  • Department of Computer Science and Engineering
  • State University of New York at Buffalo

2
What is Bioinformatics?
  • Bioinformatics
  • Interdisciplinary research area to manage and
    analyze biological data

  • [Diagram labels: Techniques, Data, Applications]
3
What is Bioinformatics?
  • Computational Techniques: Data Mining, Machine Learning
  • Biological Data: Genome, Proteome, Networks
  • Biomedical Applications: Functional Characterization, Disease
    Diagnosis, Drug Development
  • [Diagram highlights: Data Mining, Networks, Knowledge,
    Functional Characterization]
4
Overview
  • Introduction
  • Protein Interaction Networks and Their Structural
    Properties
  • Preprocess - Network Weighting
  • Integration of Gene Ontology using Semantic
    Similarity Measures
  • Functional Module Identification
  • Weighted Interaction Networks → Functional Modules
  • Hub Protein Identification
  • Weighted Interaction Networks → Hub Proteins
  • Conclusion

5
Biological Network
  • Definition
  • Directed or undirected graph representation
  • Biological molecules as nodes, and
    biochemical reactions or biophysical
    interactions as edges
  • Examples
  • Metabolic networks
  • Signal transduction networks
  • Gene regulatory networks
  • Protein interaction networks
  • Importance
  • Provide a global view of cellular organizations
    and biological processes
  • Applicable to systematic approaches for knowledge
    discovery

6
Protein-Protein Interaction (PPI)
  • Biological Meaning of PPI
  • Proteins interact with each other for stability
    and functionality
  • Most cellular functions are performed at the
    protein-complex level
  • Interaction evidence is interpreted as functional
    coherence / consistency
  • Determination of PPIs
  • Experimental methods
  • Yeast two-hybrid systems, Mass
    spectrometry, Protein microarray
  • Computational methods
  • Homology search, Gene fusion analysis,
    Phylogenetic profiles
  • Problem of PPI data
  • Current PPI databases include a large number of
    false positives and false negatives
  • → Unreliability

7
Protein Interaction Network
  • Representation of Protein Interaction Networks
  • Undirected, un-weighted graph G(V,E), with
    a set of nodes V as proteins and a set
    of edges E as interactions
  • Problem of Protein Interaction Networks
  • Large scale
  • Complex connectivity

8
Structural Properties
  • Small-world Phenomenon (Watts & Strogatz)
  • Networks that lie between regular
    and random networks
  • Higher average clustering coefficient than
    expected by random chance
  • Significantly small average shortest path length
  • Scale-free Distribution (Barabási & Albert)
  • Network growth by preferential attachment
  • Power-law degree distribution: a few high-degree
    nodes, many low-degree nodes
  • Clustering coefficient distribution independent
    of degree
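
As an illustration of the clustering coefficient mentioned above, here is a minimal sketch for an undirected, un-weighted network stored as an adjacency dictionary. The toy network and the function names are hypothetical and are not taken from the presentation.

```python
from itertools import combinations

def clustering_coefficient(adj, node):
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adj[node]
    k = len(neighbors)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pairs to check
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))

def average_clustering(adj):
    """Mean clustering coefficient over all nodes."""
    return sum(clustering_coefficient(adj, n) for n in adj) / len(adj)

# Hypothetical toy network: a triangle (A, B, C) with a pendant node D on C.
toy_network = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
print(average_clustering(toy_network))
```

Comparing this average with the value obtained on a degree-matched random network is one common way to check the small-world property.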

9
Overview
  • Introduction
  • Protein Interaction Networks and Their Structural
    Properties
  • Preprocess - Network Weighting
  • Integration of Gene Ontology using Semantic
    Similarity Measures
  • Functional Module Identification
  • Weighted Interaction Networks → Functional Modules
  • Hub Protein Identification
  • Weighted Interaction Networks → Hub Proteins
  • Conclusion

10
Network Weighting Schemes
  • Motivation
  • Unreliable protein interaction networks
  • Transforming an un-weighted graph into a weighted graph
    by assigning the interaction reliability
    (or intensity) to each edge as a weight
  • Unsupervised Approaches
  • Using network connectivity, e.g., common
    neighbors, alternative paths
  • Problem: unreliable weights
  • Supervised Approaches
  • Using other resources to verify interactions,
    e.g., gene sequence, gene expression
  • Integrating Gene Ontology data in my work:
    the most comprehensive and well-curated resource

11
Gene Ontology (GO)
  • Structure
  • Terms (concepts): well-defined biological
    descriptions
  • Relationships: is-a / part-of
    (general-to-specific) between terms
  • Annotation
  • If a protein is annotated on a term, then it is
    also annotated on the terms on the
    paths towards the root
  • → Transitivity
[Figure: example GO hierarchy whose terms are labeled with protein sets: {P5}, {P1}, {P1, P2, P3}, {P1, P2, P4}, {P2, P3}, {P1, P6}, {P1, P2, P3, P6}, {P2, P3}]
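
The transitivity rule for annotations can be sketched as follows. This is only an illustration: the term names, the `parents` map, and the protein labels are hypothetical and do not reproduce the figure on the slide.

```python
def propagate_annotations(parents, direct):
    """Push every direct annotation up to all ancestor terms.

    parents: term -> list of parent terms (is-a / part-of edges)
    direct:  term -> set of proteins annotated directly on that term
    """
    full = {term: set() for term in parents}
    for term, proteins in direct.items():
        stack, seen = [term], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue  # a term can be reached through several paths
            seen.add(current)
            full[current] |= proteins
            stack.extend(parents[current])
    return full

# Hypothetical ontology: T3 has two parents, as GO allows.
parents = {"root": [], "T1": ["root"], "T2": ["root"], "T3": ["T1", "T2"]}
direct = {"T1": {"P2"}, "T2": {"P3"}, "T3": {"P1"}}
annotations = propagate_annotations(parents, direct)
print(annotations["root"])
```

After propagation the root term carries every protein, and each term carries at least the proteins of its descendants.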
12
Semantic Similarity
  • Reliability of Interacting Proteins
  • Average (or maximum) semantic similarity of the
    pair-wise terms
  • whose annotations include the interacting
    proteins
  • Structure-based Approaches
  • Path length or common parent terms
  • Problem: assumes that all edges represent uniform
    specificity
  • Information Content-based Approaches
  • Information content of a term T is defined as
    -log(P(T))
  • sim(x, y) = -log(Pi(x,y))
  • where Pi(x,y) is the proportion of the
    annotations of the term including x and y
  • Normalized sim(x, y)
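
A minimal sketch of an information-content-based similarity follows. It assumes that the relevant term for a protein pair is the most specific one (the one with the fewest annotated proteins) that includes both proteins; that choice, the toy annotations, and the function name are assumptions for illustration, and the slide's normalization step is not reproduced.

```python
import math

def semantic_similarity(annotations, x, y, total_proteins):
    """-log of the annotation proportion of the most specific term
    that includes both x and y (assumed choice of term)."""
    shared_sizes = [len(p) for p in annotations.values() if x in p and y in p]
    if not shared_sizes:
        return 0.0  # no common term: treated here as no similarity
    return -math.log(min(shared_sizes) / total_proteins)

# Hypothetical annotations (already propagated towards the root).
term_annotations = {
    "root": {"P1", "P2", "P3", "P4"},
    "T1": {"P1", "P2"},
    "T2": {"P1", "P3", "P4"},
}
print(semantic_similarity(term_annotations, "P1", "P2", 4))
```

Pairs that only share the root term get a similarity of zero, while pairs sharing a rarely used term score higher.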

13
Overview
  • Introduction
  • Protein Interaction Networks and Their Structural
    Properties
  • Preprocess - Network Weighting
  • Integration of Gene Ontology using Semantic
    Similarity Measures
  • Functional Module Identification
  • Weighted Interaction Networks → Functional Modules
  • Hub Protein Identification
  • Weighted Interaction Networks → Hub Proteins
  • Conclusion

14
Functional Module Identification
  • Functional Module
  • A set of molecules that participate in the same
    biological processes or functions
  • Sub-network with dense intra-connections and
    sparse inter-connections
  • Functional Module Identification
  • → Graph clustering problem
  • Previous Clustering Approaches
  • Density-based methods, e.g., maximum clique,
    quasi clique, clique percolation
  • Partition-based methods, e.g., restricted
    neighborhood search, Markov clustering
  • Hierarchical methods
  • Bottom-up approaches, e.g., distance-based,
    common neighbors
  • Top-down approaches, e.g., minimum cut,
    betweenness cut

15
Functional Influence Model
  • Functional Influence
  • Influence factors: normalized weights, inverse
    of degree
  • Measurements
  • Single-path-based method
  • O( V E )
  • All-path-based method: NP
  • Random-walk-based method
  • O(V^3); with iteration, O(V^4)

Improvement by an efficient algorithm
16
Flow Simulation
  • Information Flow Simulation
  • Computation of the functional influence inf_s(x) of s
    on each x ∈ V, based on random walks
  • Input: a weighted interaction network and a
    source node s
  • Output: the functional influence pattern of s
  • Algorithm (the slide's formulas are not preserved in this transcript)
  • Step 1: Initialize inf_s(s)
  • Step 2: Compute the initial flow f_init(s → y)
  • Step 3: Update inf_s(y)
  • Step 4: Compute the flow f_s(y → z)
  • Step 5: Repeat steps 3 and 4 until f_s(y → z) falls below a
    threshold
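
The flow simulation can be sketched as below. The update formulas are missing from the transcript, so every rule here is an assumption: the source starts with influence 1.0, the initial flow to a neighbor is the edge weight times that influence, and a node forwards its incoming flow along each edge scaled by the edge weight and the inverse of its degree. The names `simulate_flow`, `threshold`, and `max_rounds` are hypothetical.

```python
def simulate_flow(graph, source, threshold=1e-3, max_rounds=1000):
    """graph: node -> {neighbor: weight in (0, 1]}. Returns node -> influence."""
    influence = {node: 0.0 for node in graph}
    influence[source] = 1.0  # assumed initial influence of the source
    # Initial flow from the source to each of its neighbors (assumed rule).
    flows = {y: w * influence[source] for y, w in graph[source].items()}
    for _ in range(max_rounds):  # safety cap, not part of the slide
        # Drop trivial flow as early as possible.
        flows = {y: f for y, f in flows.items() if f >= threshold}
        if not flows:
            break
        next_flows = {}
        for y, incoming in flows.items():
            influence[y] += incoming  # update the influence on y
            degree = len(graph[y])
            for z, w in graph[y].items():  # forward the flow from y to z
                next_flows[z] = next_flows.get(z, 0.0) + w * incoming / degree
        flows = next_flows
    return influence

# Hypothetical weighted network; node "d" is disconnected from the source.
toy_graph = {
    "s": {"a": 0.8, "b": 0.4},
    "a": {"s": 0.8, "b": 0.5},
    "b": {"s": 0.4, "a": 0.5, "c": 0.9},
    "c": {"b": 0.9},
    "d": {},
}
pattern = simulate_flow(toy_graph, "s")
print(pattern)
```

The returned dictionary plays the role of the functional influence pattern of the source; running it once per source node gives one pattern per protein, which is the input a pattern clustering step would need.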

17
Lower-level Algorithm
18
Schematic View
[Figure: schematic with numeric node values feeding a Pattern Clustering step; the individual values are not recoverable as a layout]
19
Time Complexity
  • Efficiency
  • Traces only connected nodes to calculate the
    functional influence of a source
  • Removes trivial flow, i.e., flow below the threshold, as early
    as possible
  • Run Time
  • Theoretical upper bound is unknown (it does not depend
    on the network diameter)
  • Tests potential factors (number of nodes, density,
    average degree) with synthetic networks

20
Accuracy
  • Experiment
  • Data: yeast protein interaction network from DIP
  • Pattern clustering: pCluster algorithm (Wang et
    al., SIGMOD 2002)
  • Evaluation
  • Functional categories and annotations from MIPS
  • Hyper-geometric p-value
  • Result
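
For reference, the hypergeometric p-value used in the evaluation can be computed as in the sketch below. It gives the probability that a module of n proteins contains at least k members of a functional category of size M, out of N proteins in total. The parameter names and the example numbers are illustrative and are not results from the presentation.

```python
from math import comb

def hypergeom_pvalue(N, M, n, k):
    """P(X >= k) for X ~ Hypergeometric(N total, M in category, n drawn)."""
    upper = min(n, M)
    favorable = sum(comb(M, i) * comb(N - M, n - i) for i in range(k, upper + 1))
    return favorable / comb(N, n)

# Illustrative numbers only: 10 proteins, 4 in the category,
# a module of 3 proteins of which 2 belong to the category.
print(hypergeom_pvalue(10, 4, 3, 2))
```

A smaller p-value means the overlap between the module and the category is less likely to arise by chance.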

21
Overview
  • Introduction
  • Protein Interaction Networks and Their Structural
    Properties
  • Preprocess - Network Weighting
  • Integration of Gene Ontology using Semantic
    Similarity Measures
  • Functional Module Identification
  • Weighted Interaction Networks → Functional Modules
  • Hub Protein Identification
  • Weighted Interaction Networks → Hub Proteins
  • Conclusion

22
Hub Protein Identification
  • Hub Protein
  • Centrally located node in the modular structure
    of a protein interaction network
  • ( a structural hub )
  • Functionally essential protein
  • Previous Centrality Measurements
  • Closeness centrality
  • Betweenness centrality
  • Bridging centrality
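
As a point of comparison, standard (un-weighted) closeness centrality can be sketched with a breadth-first search. This is the textbook measure, not the weighted variant used later in the presentation; the toy network is hypothetical.

```python
from collections import deque

def closeness_centrality(adj, node):
    """(reachable nodes) / (sum of shortest-path lengths from node)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Hypothetical path network A - B - C: the middle node is the most central.
path_network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
print(closeness_centrality(path_network, "B"))
```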

23
Functional Influence Model
  • Functional Influence
  • Influence factors: normalized weights, inverse
    of degree
  • Measurements
  • Single-path-based method
  • O( V E )
  • All-path-based method: NP
  • Random-walk-based method
  • O(V^3); with iteration, O(V^4)

Improvement by a heuristic algorithm
24
Path Strength
  • Single-path-based path strength
  • All-path-based path strength
  • sums up the k-length path strength for all
    possible k
  • uses the threshold of maximum k
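
The two notions of path strength can be illustrated with the sketch below. The slide's formulas are not preserved, so the definition used here is an assumption: the strength of a path is the product, over its edges, of the edge weight divided by the degree of the node the edge leaves. The single-path value is then taken as the strongest path, and the all-path value as the sum over all simple paths with at most `k_max` edges. All names are hypothetical.

```python
def path_strengths(graph, a, b, k_max):
    """Yield the strength of every simple path from a to b with <= k_max edges."""
    def walk(node, strength, visited, depth):
        if node == b:
            yield strength
            return
        if depth == k_max:
            return  # threshold on the maximum path length
        degree = len(graph[node])
        for nxt, w in graph[node].items():
            if nxt not in visited:
                yield from walk(nxt, strength * w / degree,
                                visited | {nxt}, depth + 1)
    yield from walk(a, 1.0, {a}, 0)

# Hypothetical weighted triangle.
triangle = {
    "a": {"b": 0.5, "c": 1.0},
    "b": {"a": 0.5, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0},
}
strengths = list(path_strengths(triangle, "a", "b", 2))
single_path = max(strengths)  # strongest single path
all_path = sum(strengths)     # sum over all paths up to k_max edges
print(single_path, all_path)
```

Enumerating all simple paths grows exponentially with the path length, which is why a cap on k is needed in practice.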

25
Network Conversion
  • Network Conversion
  • Input: a protein interaction network; Output:
    a hierarchical tree structure
  • Algorithm
  • Centrality (weighted closeness) of a node a
  • Set of ancestor nodes T(a) of a
  • Parent node p(a) of a
  • Hub Confidence Measurement
  • Set of child nodes D(a) of a
  • Set of descendent nodes La of a
  • Hub confidence H(a) of a

26
Schematic View
  • Hub Confidence
  • How strongly a node acts as a structural
    hub
  • Does not depend entirely on the hierarchical level in
    the tree structure

27
Structural Hubs
  • Top 10 Structural Hubs in the Yeast Protein
    Interaction Network
  • Not related to their degree
  • Each one has several different functions

28
Lethality
  • Biological Essentiality
  • Evaluated by comparing with lethal proteins
  • Lethality has been determined by protein
    knock-out experiments
  • Result

29
Conclusion
  • Problems
  • Complex and unreliable connectivity in protein
    interaction networks
  • Contributions
  • Reliable network generation by edge weighting
  • Hidden knowledge discovery, e.g., patterns or
    taxonomy
  • Collaboration with existing computational
    techniques
  • Future Work
  • Integration with multiple data sources
  • Comparative analysis across organisms

30
Questions?