Protein Interaction Networks presentation

Title: Protein Interaction Networks

1
Protein Interaction Networks

Thanks to Mehmet Koyuturk

2
Protein-Protein Interactions

Physical association between proteins
Signal transduction, phosphorylation
Docking, complex formation
Permanent vs. transient interactions
Co-location of proteins
Proteins that work in the same cellular component
Soluble location: lysosome, mitochondrial matrix
Membrane location: receptors in plasma membrane,
transporters in mitochondrial membrane
Functional association of proteins
Proteins involved in the same biomolecular
activity
Enzymes in the same pathway, co-regulated proteins

3
Permanent vs Transient Interactions

Permanent interactions
Some proteins form a stable protein complex that
carries out a structural or functional
biomolecular role
These proteins are protein subunits of the
complex and they work together
ATPase subunits, subunits of nuclear pore
Transient interactions
Proteins that come together in certain cellular
states to undertake a biomolecular function
DNA replicative complex, signal transduction

4
Signal Transduction

Phosphorylation
Protein-kinase interaction
Enzyme activation

Signaling cascade

5
Why Study Protein Interactions?

Identification of functional modules and
interconnections between these modules
Functional annotation based on binding partners
and interaction patterns
Identification of evolutionarily conserved
pathways
Identification of drug target proteins to
minimize side effects

6
Identification of Protein Interactions

Traditionally, protein interactions are
identified by wetlab experiments based on
hypotheses on candidate proteins
Small scale assays
Coimmunoprecipitation: immunoprecipitate one
protein, see if the other is also precipitated
Reliable, but can only verify interactions
between suspected partners
High throughput screening
Throw in thousands of ORFs and see which ones
bind to each other
Yeast two hybrid, tandem affinity purification
Large scale, but a lot of noise

7
Yeast Two Hybrid

Split the yeast GAL4 gene, which encodes a
transcription factor required for activation of
GAL genes, into two parts
Activating domain, binding domain
The split protein does not work unless the two
parts are in physical contact

8
Protein Interaction Networks

Organize all identified interactions in a
network, where proteins are represented by nodes
and interactions are represented by edges
TAP identifies a group of proteins that are
captured by a target (bait) protein
Spoke model (star network) vs. matrix model
(clique)
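
The two ways of turning a purification result into edges can be sketched as follows. This is a minimal illustration, not a reference implementation: the function names and the example bait and prey identifiers are hypothetical.

```python
from itertools import combinations


def spoke_edges(bait, preys):
    """Spoke model: connect the bait to every prey (star network)."""
    return {frozenset((bait, prey)) for prey in preys if prey != bait}


def matrix_edges(bait, preys):
    """Matrix model: connect every pair in the purified group (clique)."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}


# Hypothetical pull-down: bait "A" retrieves three preys.
spoke = spoke_edges("A", ["B", "C", "D"])
matrix = matrix_edges("A", ["B", "C", "D"])
print(len(spoke), "spoke edges;", len(matrix), "matrix edges")
```

The spoke edges are always a subset of the matrix edges, so the choice between the two models trades missed interactions against false ones.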

(Figure labels: Interaction, Protein)
9
Functional Modularity in PPI Networks

A protein complex
Dense subgraph
A signal transduction pathway
Simple path, parallel paths
A protein with common, key,
fundamental role (e.g., a kinase)
Hub node
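
The correspondence between modules and graph patterns can be made concrete with a small sketch. It assumes the network is given as a list of undirected edges; the helper names, the degree threshold, and the toy network are illustrative assumptions only.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def density(adj, nodes):
    """Fraction of possible edges among `nodes` that are present."""
    nodes = list(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    present = sum(
        1 for i, u in enumerate(nodes) for v in nodes[i + 1:] if v in adj[u]
    )
    return present / (n * (n - 1) / 2)


def hubs(adj, min_degree):
    """Nodes whose degree is at least `min_degree`."""
    return sorted(node for node, nbrs in adj.items() if len(nbrs) >= min_degree)


# Toy network: P1-P3 form a complex, K plays the role of a kinase with many substrates.
edges = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"),
         ("K", "P1"), ("K", "S1"), ("K", "S2"), ("K", "S3")]
adj = build_adjacency(edges)
complex_density = density(adj, ["P1", "P2", "P3"])
substrate_density = density(adj, ["K", "S1", "S2", "S3"])
hub_nodes = hubs(adj, min_degree=4)
print(complex_density, substrate_density, hub_nodes)
```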

10
Computational Prediction of PPIs

Functional association is a higher level
conceptualization of interaction
Proteins that act as enzymes catalyzing reactions
in the same metabolic pathway
Functionally associated proteins are likely to
show up in similar contexts
Co-regulation, co-expression, co-evolution,
co-citation
Functional association between proteins can be
computationally identified by looking at
different sources of data such as sequences, gene
expression, literature
Can also be extended to capture physical
associations, for example, by taking into account
evolution at structural level

11
Conservation of Gene Neighborhood

In bacteria, the genome of an organism is
organized in such a way that functionally
related proteins are coded by neighboring regions
Operons
When more than one bacterial species is
considered, it is observed that this neighborhood
relationship becomes even more relevant

Distribution of neighboring genes in H.
influenzae and E. coli into functional classes
12
Comparison of Nine Bacterial Genomes

trpB-trpA is the only gene pair whose proximity
is conserved across nine prokaryotic genomes
These genes encode the two subunits of tryptophan
synthase that interact and catalyze a single
reaction

13
Close Orthologs

Run of genes
A set of genes on one strand, such that the gap
between adjacent genes is less than a threshold,
g (in practice, g ≈ 300 bp)
Any pair of genes on the same run are said to be
close
Bidirectional best hits
Genes X1 and X2 from genomes G1 and G2 are BBH
if their sequence similarity is significant and
there is no Y1 (Y2) in G1 (G2) that is more
similar to X2 (X1) than X1 (X2) is
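
The two definitions on this slide can be sketched in code. This is a simplified illustration under stated assumptions: genes are given as hypothetical (name, strand, start, end) tuples, the similarity table already contains only significant hits, and ties between equally good hits are broken by input order.

```python
def find_runs(genes, max_gap=300):
    """Group genes into runs: same strand, gap to the previous gene below max_gap.

    `genes` is a list of (name, strand, start, end) tuples for one genome.
    """
    runs = []
    for strand in ("+", "-"):
        on_strand = sorted((g for g in genes if g[1] == strand), key=lambda g: g[2])
        current = []
        for gene in on_strand:
            # Close the current run when the gap to the previous gene is too large.
            if current and gene[2] - current[-1][3] >= max_gap:
                runs.append([g[0] for g in current])
                current = []
            current.append(gene)
        if current:
            runs.append([g[0] for g in current])
    return runs


def bidirectional_best_hits(scores):
    """Pairs (x1, x2) where each gene is the other's best-scoring hit.

    `scores[(x1, x2)]` is the similarity between gene x1 in G1 and gene x2 in G2.
    """
    best_for_g1, best_for_g2 = {}, {}
    for (x1, x2), score in scores.items():
        if x1 not in best_for_g1 or score > scores[(x1, best_for_g1[x1])]:
            best_for_g1[x1] = x2
        if x2 not in best_for_g2 or score > scores[(best_for_g2[x2], x2)]:
            best_for_g2[x2] = x1
    return {(x1, x2) for x1, x2 in best_for_g1.items() if best_for_g2[x2] == x1}


# Hypothetical genome G1 and similarity scores against genome G2.
genes_g1 = [("Xa", "+", 100, 900), ("Ya", "+", 1000, 1800),
            ("Za", "+", 5000, 6000), ("Wa", "-", 1900, 2500)]
scores = {("Xa", "Xb"): 250.0, ("Xa", "Yb"): 40.0,
          ("Ya", "Yb"): 300.0, ("Za", "Yb"): 90.0}
runs = find_runs(genes_g1)
bbh = bidirectional_best_hits(scores)
print(runs)
print(sorted(bbh))
```

Any two genes that share a run are close in the slide's sense, so Xa and Ya are close in G1 here.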

Pair of close bidirectional best hits: Xa, Ya
close in G1; Xb, Yb close in G2; Xa-Xb BBH;
Ya-Yb BBH
14
Predicting Interactions

For each pair of close orthologs (occurring in at
least one pair of genomes), calculate a score
Score should increase with the phylogenetic
distance between the two genomes, since closely
related organisms are more likely to have similar
genes nearby due to chance alone
Existence of a triplet (P1, P2, P3) should be
stronger evidence than the existence of two pairs
(P1, P2 and P1, P3)
Triplet distance can be estimated as the minimum
distance between any pair of organisms (in
addition to pair score)

15
Reconstructing Pathways
Purine Metabolism

Can identify the association between unknown
proteins and known pathways!

16
Projection of Gene Neighborhood

The composition of operons is evolutionarily
variable
A particular set of functionally related genes
does not always comprise an operon
The application of gene neighborhood based
interaction prediction is limited for a single
organism
With multiple organisms, it is possible to
statistically strengthen conclusions and project
findings on other organisms
If an operon with functionally related genes
exists in several genomes, a functional
association can be predicted for other organisms,
even if the corresponding genes are scattered
Variability turns out to be an advantage for
prediction

17
Gene Neighborhood - Limitations

It is only directly applicable to bacteria (and
archaea), because relevance of gene order does
not necessarily extend to eukaryotes
For closely related species, conserved gene order
might just be due to lack of time for genome
rearrangements
We are interested in selective constraints that
preserve gene order
Compared species should be distant enough
But not too distant, because we need sufficient
number of orthologs to be able to derive
statistically meaningful results

18
Gene Fusion

Domain fusion events
Two protein domains that act as independent
proteins (components) in one organism may form
(part of) a single polypeptide chain (composite)
in another organism
Most proteins that are involved in domain fusion
events are known to be subunits of multiprotein
complexes (76 in E. coli metabolic network)

19
Gene Fusion Based PPI Prediction

A pair of proteins in the query genome is a
candidate interacting pair if
They show (local) sequence similarity to the same
protein (rosetta stone) in the reference genome
They do not show sequence similarity with each
other
Complete genomes!
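
The two conditions can be sketched as a filter over precomputed similarity hits. This is a simplified illustration: the input layout and all protein identifiers are hypothetical, and a practical implementation would likely also check that the two query proteins align to different regions of the composite protein.

```python
from itertools import combinations


def rosetta_stone_candidates(hits_to_reference, similar_in_query):
    """Candidate interacting pairs in the query genome.

    hits_to_reference: query protein -> set of reference proteins it shows
        local similarity to.
    similar_in_query: set of frozenset pairs of query proteins that are
        similar to each other.
    """
    candidates = {}
    for a, b in combinations(sorted(hits_to_reference), 2):
        if frozenset((a, b)) in similar_in_query:
            continue  # homologs of each other: a shared hit is not fusion evidence
        shared = hits_to_reference[a] & hits_to_reference[b]
        if shared:
            candidates[(a, b)] = shared
    return candidates


# Hypothetical hits: Q1 and Q2 both match R1; Q3 and Q4 are paralogs matching R2.
hits = {"Q1": {"R1"}, "Q2": {"R1"}, "Q3": {"R2"}, "Q4": {"R2"}, "Q5": {"R3"}}
similar = {frozenset(("Q3", "Q4"))}
candidates = rosetta_stone_candidates(hits, similar)
print(candidates)
```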

20
Predicted Interactions
Known physical interactions
Proteins in the same pathway
21
Gene Fusion Based Prediction - Results

Interactions predicted based on gene fusion
events
Distance on circle shows distance on genome

22
Co-evolution of Interacting Proteins

Selective pressure is likely to act on common
function
Proteins that are interacting are expected to
either be conserved together along with their
interactions, or not conserved at all
Hypothesis I: Orthologs of interacting proteins
also interact in other species (supported by
evidence, but there are subtleties, which we will
discuss later)
Hypothesis II: If two proteins are
interacting, then they will show
similar conservation patterns
Phylogenetic profiles

23
Phylogenetic Profiles
24
Correlation of Phylogenetic Profiles

Assume we have N genomes, protein X has homologs
in x of them, Y has y, and they co-occur in z
genomes
Hamming distance
Pearson correlation
Mutual information
Statistical significance
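
The first three measures can be written directly in terms of N, x, y and z. The sketch below treats each profile as a binary presence/absence vector over the N genomes; the function names are illustrative, and the formulas are the standard ones for binary vectors rather than anything specific to this presentation.

```python
from math import log2, sqrt


def hamming(x, y, z):
    """Number of genomes where exactly one of the two proteins has a homolog."""
    return x + y - 2 * z


def pearson(n, x, y, z):
    """Pearson correlation of two binary profiles (phi coefficient)."""
    denom = sqrt(x * y * (n - x) * (n - y))
    return (n * z - x * y) / denom if denom else 0.0


def mutual_information(n, x, y, z):
    """Mutual information (in bits) between the two presence/absence variables."""
    cells = {(1, 1): z, (1, 0): x - z, (0, 1): y - z, (0, 0): n - x - y + z}
    px = {1: x / n, 0: 1 - x / n}
    py = {1: y / n, 0: 1 - y / n}
    mi = 0.0
    for (a, b), count in cells.items():
        if count:  # empty cells contribute nothing
            p = count / n
            mi += p * log2(p / (px[a] * py[b]))
    return mi


# Two identical profiles: present in the same 5 of 10 genomes.
print(hamming(5, 5, 5), pearson(10, 5, 5, 5), mutual_information(10, 5, 5, 5))
```

Unlike Hamming distance and Pearson correlation, mutual information is equally high for perfectly complementary profiles, which may or may not be desirable depending on the question.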

25
Phylogenetic Profiles - Limitations

Many processes may be common across lineages
Too many false positives
Database of genomes may be biased
All organisms are treated equally
Improvement: use trees instead of profiles
Proteins are assumed to be conserved as a whole
It is domains that interact
Improvement: use domain profiles

(Figure labels: Yeast nucleoli and ribosomal proteins; Organisms)
26
Phylogenetic Tree Based Prediction

Phylogenetic trees of Ntr-family two-component
sensor histidine kinases and their corresponding
regulators

27
Mirror Tree Method

Need to have sufficient number of genomes that
contain homologs of both proteins
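
The slide's figure is not in the transcript. As commonly described, the mirror tree approach compares the pairwise distance matrices behind the two protein families' trees, which is why enough shared genomes are needed. A minimal sketch under that assumption, with hypothetical distance values and function names:

```python
from math import sqrt


def upper_triangle(matrix):
    """Off-diagonal entries above the diagonal, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def mirror_tree_score(dist_a, dist_b):
    """Pearson correlation between two distance matrices over the same species order."""
    a, b = upper_triangle(dist_a), upper_triangle(dist_b)
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need matching matrices over at least three species")
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)


# Hypothetical distances for four species; family B evolves twice as fast as family A.
dist_a = [[0, 1, 4, 5], [1, 0, 3, 6], [4, 3, 0, 2], [5, 6, 2, 0]]
dist_b = [[2 * d for d in row] for row in dist_a]
score = mirror_tree_score(dist_a, dist_b)
print(score)
```

With four species there are only six distances to correlate, so the estimate is noisy, which illustrates the requirement stated on the slide.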

28
Matrix Method

Start with families of proteins that are
suspected to interact
Identify specific pairs of proteins that interact
by aligning the phylogenetic trees that underlie
the two families
Assumption: identical number of proteins in each
family

29
Correlated Mutations

Co-evolution of interacting proteins can be
followed more closely by quantifying the degree
of co-variation between pairs of residues from
these proteins
Correlated mutations may correspond to
compensatory mutations that stabilize the
mutations in one protein with changes in the
other

Distribution of distances between amino acid
positions on a folded protein
30
In silico Two-Hybrid

The correlation of mutations between two
positions (which may be on different proteins)
can be estimated from pairwise comparison of the
sequences in multiple sequence alignments
Position pairs with high correlation are
potential contact points
Interaction index
For a protein pair, compute the aggregate
correlation (of mutations) across all positions
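
The idea can be sketched with a deliberately simplified scoring scheme. The assumptions here are loud ones: residue similarity is reduced to identity (1 or 0) instead of a substitution matrix, the interaction index is taken as a plain average of inter-protein correlations, and the toy alignments are invented. Row k of both alignments is assumed to come from the same species.

```python
from itertools import combinations
from math import sqrt


def column_similarity(column):
    """Identity score (1 or 0) for every pair of sequences at one alignment column."""
    return [1.0 if a == b else 0.0 for a, b in combinations(column, 2)]


def correlation(u, v):
    """Pearson correlation; 0.0 when either vector is constant."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        return 0.0  # a fully conserved column carries no covariation signal
    return cov / sqrt(var_u * var_v)


def inter_protein_correlations(aln_a, aln_b):
    """Correlation for every (position in A, position in B) pair."""
    cols_a = [column_similarity(col) for col in zip(*aln_a)]
    cols_b = [column_similarity(col) for col in zip(*aln_b)]
    return {(i, j): correlation(ca, cb)
            for i, ca in enumerate(cols_a) for j, cb in enumerate(cols_b)}


def interaction_index(aln_a, aln_b):
    """Aggregate (here: mean) correlation across all inter-protein position pairs."""
    values = inter_protein_correlations(aln_a, aln_b).values()
    return sum(values) / len(values)


# Toy alignments of two proteins across four species.
aln_a = ["AK", "AK", "GE", "GE"]
aln_b = ["LD", "LD", "LR", "MR"]
corr = inter_protein_correlations(aln_a, aln_b)
index = interaction_index(aln_a, aln_b)
print(corr, index)
```

Position pairs with a correlation near 1, such as position 0 of the first protein and position 1 of the second in this toy case, would be the candidate contact points.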

31
In silico Two-Hybrid
32
Performance of I2H

I2H predicts physical, rather than functional
association
It requires complete genomes (a sufficient number
of homologs)

33
Co-citation Based PPI Prediction

Functionally associated proteins are likely to be
cited in the same research article
We can assess the statistical significance of
co-citation based on hypergeometric model
Algorithmic problem: how to recognize and match
protein names?
Train algorithm using annotated abstracts via
conditional random fields (CRF)
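
Once protein names have been mapped to articles, the hypergeometric test of co-citation can be sketched as follows. The function name, the article identifiers, and the corpus size are hypothetical; the p-value is the probability of seeing at least the observed number of shared articles if the two proteins were cited independently.

```python
from math import comb


def cocitation_pvalue(articles_a, articles_b, total_articles):
    """P(at least the observed overlap) under the hypergeometric null model."""
    a, b = len(articles_a), len(articles_b)
    k = len(articles_a & articles_b)
    tail = sum(
        comb(a, i) * comb(total_articles - a, b - i)
        for i in range(k, min(a, b) + 1)
    )
    return tail / comb(total_articles, b)


# Hypothetical corpus of 100 articles; the two proteins share articles 3 and 4.
p_value = cocitation_pvalue({1, 2, 3, 4}, {3, 4, 5}, total_articles=100)
print(p_value)
```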

34
Performance of Co-citation

Statistical significance is quite relevant until
it saturates

The method is robust to choice of parameters for
name recognition

35
Integrating PPI Networks

Interaction data coming from multiple sources
Different sources refer to different levels of
interaction
Can integration handle noise, making interaction
data more reliable?
Superpose interactions based on their reliability

36
Bayesian Integration

For each prediction method, compute
log-likelihood score
Let P(L|E) be the frequency of interactions
predicted by method E, such that functional
association between corresponding proteins is
known
Let P(~L|E) be the frequency of false positives
Let P(L) and P(~L) be the corresponding priors
Assign weights to methods based on their
log-likelihood scores
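
The formula itself is missing from the transcript. A commonly used form of the log-likelihood score is the log of the posterior odds of linkage given the evidence divided by the prior odds, and the sketch below assumes that form. The method names, counts, and priors are invented for illustration.

```python
from math import log


def log_likelihood_score(linked_given_e, unlinked_given_e, linked_prior, unlinked_prior):
    """ln of (posterior odds of linkage given evidence E) / (prior odds of linkage).

    Arguments may be frequencies or raw counts; only the ratios matter.
    """
    values = (linked_given_e, unlinked_given_e, linked_prior, unlinked_prior)
    if any(v <= 0 for v in values):
        raise ValueError("all arguments must be positive")
    return log((linked_given_e / unlinked_given_e) / (linked_prior / unlinked_prior))


# Hypothetical benchmark: (true positives, false positives) per method.
methods = {"gene_neighborhood": (80, 20), "co_citation": (30, 70)}
prior_linked, prior_unlinked = 0.01, 0.99
scores = {
    name: log_likelihood_score(tp, fp, prior_linked, prior_unlinked)
    for name, (tp, fp) in methods.items()
}
ranked = sorted(scores, key=scores.get, reverse=True)
print(scores, ranked)
```

A score of zero means the method does no better than the prior, so only methods with clearly positive scores would receive meaningful weight.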

37
Comparison of Prediction Methods

Integrated network captures functional
association better
Note that the integrated network is trained
using available data on functional association

38
Classification Based Integration

Points: proteins; Space: expression,
conservation; Labels: function
Points: protein pairs; Space: co-expression,
co-evolution, etc.; Labels: existence of
interaction

39
Performance of Domain Co-evolution
40
Co-Evolutionary Matrix
41
Domain Identification
42
Difference between Predicted PPIs
