Title: Gene and Protein Networks II Monday, April 16 2006
1Gene and Protein Networks IIMonday, April 16 2006
- CSCI 4830 Algorithms for Molecular Biology
- Debra Goldberg
2Outline
- Recap
- Confidence assessment, edge prediction (contd)
- Predicting protein function
- Predicting protein complexes/functional groups
- Network integration
- Caveats, cautions, practical issues
3Summary of network models
Random not grown, low clustering, short distances, Poisson degree distribution
Regular (lattice) high clustering,long distances
Small world high clustering, short distances
Scale-free power law degree distribution
Hierarchical high clustering, modular, power law degree distribution
4There is information in a genes position in the
network
- We can use this to predict
- Relationships
- Interactions
- Regulatory relationships
- Protein function
- Process
- Complex / molecular machine
5Confidence assessment
- Can use topology to assess confidence if true
edges and false edges have different network
properties - Assess how well each edge fits topology of true
network - Can also predict unknown relations
6Prediction
- A v-w edge would have a high clustering
coefficient
v
w
7Outline
- Recap
- Confidence assessment, edge prediction (contd)
- Predicting protein function
- Predicting protein complexes/functional groups
- Network integration
- Caveats, cautions, practical issues
8Interaction generality
- Confidence measure for edge based on topology
around neighbors.
Saito, Suzuki, and Hayashizaki 2002,2003
9Confidence assessment
- Integrate experimental details with local
topology - Degree
- Clustering coefficient
- Degree of neighbors
- Etc.
- Used logistic regression
Bader, et al., Nature Biotechnology 2003
10The synthetic lethal network has many triangles
Xiaofeng Xin, Boone Lab
112-hop predictors for SSL
- SSL SSL (S-S)
- Homology SSL (H-S)
- Co-expressed SSL (X-S)
- Physical interaction SSL (P-S)
- 2 physical interactions (P-P)
Wong, et al., PNAS 2004
12Multi-color motifs
Zhang, et al., Journal of Biology 2005
13Outline
- Recap
- Confidence assessment, edge prediction (contd)
- Predicting protein function
- Predicting protein complexes/functional groups
- Network integration
- Caveats, cautions, practical issues
14Computationally predicting protein function
- Homology
- Machine Learning
- Graph-theoretic methods
15Majority method
- Consider immediate neighbors
- Guilt by association
- Schwikowski, et al., Nature Biotechnology 2001
16Neighborhood method
- How does frequency affect assignment?
- Consider a given radius
- Hishigaki, et al., Yeast 2001
17Minimum Cut methods
- Minimize interactions between proteins with
different annotations - Vazquez, et al., Nature Biotech. 2003
- Karaoz, et al., PNAS 2004
18Functional flow
- Use network flow algorithm to transport
function annotation - Nabieva, et al., Bioinformatics 2005
19A Markov Random Field method
- Function prediction based on
- Frequency of each function
- neighbors
- of these neighbors with function in question
- Functional linkage graph
- Iterate twice
- Letovsky and Kasif, Bioinformatics 2003
20Outline
- Recap
- Confidence assessment, edge prediction (contd)
- Predicting protein function
- Predicting protein complexes/functional groups
- Network integration
- Caveats, cautions, practical issues
21Community structure
- Proteins in a community may be involved in a
common process or function - Communities are dense subgraphs with sparse
interconnections
22Hierarchical clustering (1)Using natural edge
weights
- Gene co-expression
- e.g., Eisen MB, et al., PNAS 1998
from www.medscape.com
23Hierarchical clustering (2)Adjacency vector
- Function cluster Tong et al., Science 2004
- Find drug targets Parsons et al., Nature
Biotechnology 2004
24Topological overlap
- A measure of neighborhood similarity
Ravasz, et al., Science 2002
25Spectral clustering
- Compute adjacency matrix eigenvectors
- Each eigenvector defines a cluster
- Proteins with high magnitude contributions
Bu, et al., Nucleic Acids Research 2003
positive eigenvalue negative eigenvalue
26Dense subgraphs
- Spirin and Mirny, PNAS 2003
- Find fully connected subgraphs (cliques), OR
- Find subgraphs that maximize density 2 m / (n
(n-1)) - Bader and Hogue, BMC Bioinformatics 2003
- Weight vertices by neighborhood density,
connectedness - Find connected communities with high weights
27Betweenness centrality
- Consider the shortest path(s) between all pairs
of nodes - Betweenness centrality of an edge is a measure
of how many shortest paths traverse this edge
- Edges between communities have higher centrality
Girvan , et al., PNAS 2002
28Finding motifs
29Finding motifs
30Motif function and aggregation
31Motif function and aggregation
32Outline
- Recap
- Confidence assessment, edge prediction (contd)
- Predicting protein function
- Predicting protein complexes/functional groups
- Network integration
- Caveats, cautions, practical issues
33Relationships between network data types
- Distinct data sources generally lead to better
inferences. - Associations not independent
- Errors independent
34Various methods with varying goals
35Incorporating experimental conditions
Luscombe, et al., Nature 2004
36Party and date hubs
- Protein interaction network
- Partition hubs by expression correlation of
neighbors
Han, et al., Nature 2004
37Network connectivity
- Scale-free networks are
- Robust to random failures
- Vulnerable to attacks on hubs
- Removing hubs quickly disconnects a network and
reduces the size of the largest component
Albert, et al., Nature 2000
38Removing date hubs shatters network into
communities
Date Hubs
Party Hubs
Many sub-networks
A single main component
39Multiple species
40Network alignment
- Across or within species
- Interaction network and genome sequence
- e.g., Ogata, et al., Nucleic Acids Research 2000
41Outline
- Recap
- Confidence assessment, edge prediction (contd)
- Predicting protein function
- Predicting protein complexes/functional groups
- Network integration
- Caveats, cautions, practical issues
42Bias Protein abundance
- Abundant proteins are
- more likely to be represented in some types of
experiments - More likely to be essential
- Correlation between degree (hubs) and
essentiality disappears or is reduced when
corrected for protein abundance
Bloom and Adami, BMC Evolutionary Biology 2003
43Bias Degree correlation
- Anti-correlation of degrees of interacting
proteins disappears in un-biased data
Coulomb, et al., Proceedings of the Royal
Society B 2005
44Data quality and sparseness
45No gold standard
- Insufficient highly-accurate data
- Gold-standards often used to train and validate
- Insufficient standardization of procedures
46Significance
47Final words
- Network analysis has become an essential tool for
analyzing complex systems - There is still much biologists can learn from
scientists in other disciplines - Network analysis is itself a new and evolving
field