Protein Interaction Networks - PowerPoint PPT Presentation

About This Presentation
Title:

Protein Interaction Networks

Description:

Protein Interaction Networks Thanks to Mehmet Koyuturk – PowerPoint PPT presentation

Slides: 43
Provided by: koy6

1
Protein Interaction Networks
  • Thanks to Mehmet Koyuturk

2
Protein-Protein Interactions
  • Physical association between proteins
  • Signal transduction, phosphorylation
  • Docking, complex formation
  • Permanent vs. transient interactions
  • Co-location of proteins
  • Proteins that work in the same cellular component
  • Soluble location: lysosome, mitochondrial matrix
  • Membrane location: receptors in plasma membrane,
    transporters in mitochondrial membrane
  • Functional association of proteins
  • Proteins involved in the same biomolecular
    activity
  • Enzymes in the same pathway, co-regulated proteins

3
Permanent vs Transient Interactions
  • Permanent interactions
  • Some proteins form a stable protein complex that
    carries out a structural or functional
    biomolecular role
  • These proteins are protein subunits of the
    complex and they work together
  • ATPase subunits, subunits of nuclear pore
  • Transient interactions
  • Proteins that come together in certain cellular
    states to undertake a biomolecular function
  • DNA replicative complex, signal transduction

4
Signal Transduction
  • Phosphorylation
  • Protein-kinase interaction
  • Enzyme activation
  • Signaling cascade

5
Why Study Protein Interactions?
  • Identification of functional modules and
    interconnections between these modules
  • Functional annotation based on binding partners
    and interaction patterns
  • Identification of evolutionarily conserved
    pathways
  • Identification of drug target proteins to
    minimize side effects

6
Identification of Protein Interactions
  • Traditionally, protein interactions are
    identified by wetlab experiments based on
    hypotheses on candidate proteins
  • Small scale assays
  • Co-immunoprecipitation: immunoprecipitate one
    protein, see if the other is also precipitated
  • Reliable, but can only verify interactions
    between suspected partners
  • High throughput screening
  • Throw in thousands of ORFs and see which ones
    bind to each other
  • Yeast two hybrid, tandem affinity purification
  • Large scale, but a lot of noise

7
Yeast Two Hybrid
  • Split the yeast GAL4 gene, which encodes a
    transcription factor required for activation of
    the GAL genes, into two parts
  • Activation domain, DNA-binding domain
  • The split protein does not work unless the two
    parts are in physical contact

8
Protein Interaction Networks
  • Organize all identified interactions in a
    network, where proteins are represented by nodes
    and interactions are represented by edges
  • Tandem affinity purification (TAP) identifies a
    group of proteins that are caught by the target
    protein
  • Spoke model (star network) vs. matrix model
    (clique)
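
The two ways of turning one purification into edges can be sketched as follows. This is a minimal illustration, assuming a single bait with a list of co-purified preys; the function names and the protein labels A to D are hypothetical.

```python
from itertools import combinations

def spoke_edges(bait, preys):
    """Spoke model: connect the bait to each prey (star network)."""
    return {frozenset((bait, p)) for p in preys if p != bait}

def matrix_edges(bait, preys):
    """Matrix model: connect every pair in the purified group (clique)."""
    members = sorted(set(preys) | {bait})
    return {frozenset(pair) for pair in combinations(members, 2)}

# Hypothetical purification: bait A pulls down B, C and D.
bait, preys = "A", ["B", "C", "D"]
spoke = spoke_edges(bait, preys)
matrix = matrix_edges(bait, preys)
print(len(spoke), len(matrix))
```

For a group of n proteins the spoke model yields n - 1 edges and the matrix model n(n - 1)/2, so the matrix model always contains the spoke edges plus every prey-prey pair.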

[Figure: network diagram with nodes labeled Protein and edges labeled Interaction]
9
Functional Modularity in PPI Networks
  • A protein complex
  • Dense subgraph
  • A signal transduction pathway
  • Simple path, parallel paths
  • A protein with a common, key, fundamental role
    (e.g., a kinase)
  • Hub node
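
As a rough illustration of these graph features, the sketch below measures the edge density of a candidate complex and lists high-degree hub nodes. The toy network, the helper names, and the degree cutoff are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations

def build_adjacency(edges):
    """Undirected adjacency sets from a list of (u, v) interactions."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def density(adj, nodes):
    """Fraction of possible edges among `nodes` that are present."""
    nodes = set(nodes)
    possible = len(nodes) * (len(nodes) - 1) // 2
    present = sum(1 for u, v in combinations(sorted(nodes), 2) if v in adj[u])
    return present / possible if possible else 0.0

def hubs(adj, min_degree):
    """Nodes whose degree is at least `min_degree` (cutoff is arbitrary)."""
    return sorted(n for n, nbrs in adj.items() if len(nbrs) >= min_degree)

# Hypothetical network: a 4-protein complex, a short path, and a hub K.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
         ("D", "E"), ("E", "F"),
         ("K", "A"), ("K", "E"), ("K", "F"), ("K", "G"), ("K", "H")]
adj = build_adjacency(edges)
complex_density = density(adj, {"A", "B", "C", "D"})
path_density = density(adj, {"D", "E", "F"})
hub_nodes = hubs(adj, min_degree=5)
```

A fully connected complex has density 1, while a path through the same number of proteins is much sparser, which is what lets dense-subgraph searches separate the two.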

10
Computational Prediction of PPIs
  • Functional association is a higher level
    conceptualization of interaction
  • Proteins that act as enzymes catalyzing reactions
    in the same metabolic pathway
  • Functionally associated proteins are likely to
    show up in similar contexts
  • Co-regulation, co-expression, co-evolution,
    co-citation
  • Functional association between proteins can be
    computationally identified by looking at
    different sources of data such as sequences, gene
    expression, literature
  • Can also be extended to capture physical
    associations, for example, by taking into account
    evolution at structural level

11
Conservation of Gene Neighborhood
  • In bacteria, the genome of an organism is
    organized in such a way that functionally
    related proteins are coded by neighboring regions
  • Operons
  • When more than one bacterial species is
    considered, it is observed that this neighborhood
    relationship becomes even more relevant

Distribution of neighboring genes in H.
influenzae and E. coli into functional classes
12
Comparison of Nine Bacterial Genomes
  • trpB-trpA is the only gene pair whose proximity
    is conserved across nine prokaryotic genomes
  • These genes encode the two subunits of tryptophan
    synthase that interact and catalyze a single
    reaction

13
Close Orthologs
  • Run of genes
  • A set of genes on one strand, such that the gap
    between adjacent genes is less than a threshold,
    g (in practice, g ≈ 300 bp)
  • Any pair of genes on the same run are said to be
    close
  • Bidirectional best hits
  • Genes X1 and X2 from genomes G1 and G2 are BBH
    if their sequence similarity is significant and
    there is no Y1 (Y2) in G1 (G2) that is more
    similar to X2 (X1) than X1 (X2) is

Pair of close bidirectional best hits: Xa and Ya
are close in G1, Xb and Yb are close in G2, Xa and
Xb are BBH, and Ya and Yb are BBH
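
The definitions on this slide can be sketched in code as below. This is a simplified illustration, not the original implementation: genes are grouped into runs per strand, similarity scores are assumed to be pre-filtered for significance, and all gene names, coordinates, and scores are made up.

```python
from itertools import combinations

def find_runs(genes, max_gap=300):
    """Group genes (name, strand, start, end) into runs: same strand,
    gap between consecutive genes on that strand below max_gap."""
    runs = []
    for strand in ("+", "-"):
        on_strand = sorted((g for g in genes if g[1] == strand), key=lambda g: g[2])
        current, prev_end = [], None
        for name, _, start, end in on_strand:
            if current and start - prev_end >= max_gap:
                runs.append(current)
                current = []
            current.append(name)
            prev_end = end
        if current:
            runs.append(current)
    return runs

def bidirectional_best_hits(sim):
    """sim maps (gene_in_G1, gene_in_G2) to a significant similarity score."""
    best1, best2 = {}, {}
    for (a, b), s in sim.items():
        if a not in best1 or s > best1[a][0]:
            best1[a] = (s, b)
        if b not in best2 or s > best2[b][0]:
            best2[b] = (s, a)
    return {(a, b) for a, (_, b) in best1.items() if best2[b][1] == a}

def close_pairs(runs):
    """All pairs of genes that share a run."""
    return {frozenset(p) for run in runs for p in combinations(run, 2)}

def pairs_of_close_bbh(runs1, runs2, bbh):
    """Pairs close in G1 whose BBH partners are close in G2."""
    close1, close2 = close_pairs(runs1), close_pairs(runs2)
    ortholog = dict(bbh)
    found = set()
    for pair in close1:
        xa, ya = sorted(pair)
        if xa in ortholog and ya in ortholog:
            if frozenset((ortholog[xa], ortholog[ya])) in close2:
                found.add((xa, ya))
    return found

# Hypothetical genomes and similarity scores.
g1 = [("trpB1", "+", 100, 1300), ("trpA1", "+", 1350, 2150),
      ("x1", "-", 2300, 3000), ("y1", "+", 5000, 6000)]
g2 = [("trpB2", "-", 500, 1700), ("trpA2", "-", 1800, 2600), ("y2", "+", 100, 400)]
sim = {("trpB1", "trpB2"): 900, ("trpA1", "trpA2"): 700,
       ("trpA1", "trpB2"): 40, ("y1", "y2"): 300}
bbh = bidirectional_best_hits(sim)
result = pairs_of_close_bbh(find_runs(g1), find_runs(g2), bbh)
```

In this toy input only the trpB/trpA pair is reported: y1 has an ortholog but sits alone in its run, so it has no close neighbor to pair with.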
14
Predicting Interactions
  • For each pair of close orthologs (occurring in at
    least one pair of genomes), calculate a score
  • Score should increase with the phylogenetic
    distance between the two genomes, since closely
    related organisms are more likely to have similar
    genes nearby due to chance alone
  • Existence of a triplet (P1, P2, P3) should be
    stronger evidence than the existence of two pairs
    (P1, P2 and P1, P3)
  • Triplet distance can be estimated as the minimum
    distance between any pair of organisms (in
    addition to pair score)

15
Reconstructing Pathways
Purine Metabolism
  • Can identify the association between unknown
    proteins and known pathways!

16
Projection of Gene Neighborhood
  • The composition of operons is evolutionarily
    variable
  • A particular set of functionally related genes
    does not always comprise an operon
  • The application of gene neighborhood based
    interaction prediction is limited for a single
    organism
  • With multiple organisms, it is possible to
    statistically strengthen conclusions and project
    findings on other organisms
  • If an operon with functionally related genes
    exists in several genomes, a functional
    association can be predicted for other organisms,
    even if the corresponding genes are scattered
  • Variability turns out to be an advantage for
    prediction

17
Gene Neighborhood - Limitations
  • It is only directly applicable to bacteria (and
    archaea), because relevance of gene order does
    not necessarily extend to eukaryotes
  • For closely related species, conserved gene order
    might just be due to lack of time for genome
    rearrangements
  • We are interested in selective constraints that
    preserve gene order
  • Compared species should be distant enough
  • But not too distant, because we need sufficient
    number of orthologs to be able to derive
    statistically meaningful results

18
Gene Fusion
  • Domain fusion events
  • Two protein domains that act as independent
    proteins (components) in one organism may form
    (part of) a single polypeptide chain (composites)
    in another organism
  • Most proteins that are involved in domain fusion
    events are known to be subunits of multiprotein
    complexes (76 in E. coli metabolic network)

19
Gene Fusion Based PPI Prediction
  • A pair of proteins in query genome are candidate
    interacting pairs if
  • They show (local) sequence similarity to the same
    protein (rosetta stone) in reference genome
  • They do not show sequence similarity with each
    other
  • Complete genomes!
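
The two conditions above can be sketched as a filter over precomputed similarity hits. The input layout, the protein labels, and the extra requirement that the two hits cover non-overlapping regions of the rosetta stone protein are assumptions added for this example.

```python
from itertools import combinations

def rosetta_stone_pairs(hits, similar_pairs):
    """hits: {query_protein: {reference_protein: (start, end)}}, the region of
    the reference protein that each query protein is locally similar to.
    similar_pairs: set of frozensets of query proteins similar to each other."""
    candidates = set()
    for p, q in combinations(sorted(hits), 2):
        if frozenset((p, q)) in similar_pairs:
            continue  # homologs of each other: not a fusion signal
        for ref in hits[p].keys() & hits[q].keys():
            (s1, e1), (s2, e2) = hits[p][ref], hits[q][ref]
            # Assumption: the two hits must not overlap on the reference.
            if e1 < s2 or e2 < s1:
                candidates.add((p, q, ref))
    return candidates

# Hypothetical hits of query proteins P1..P3 on reference protein R1.
hits = {"P1": {"R1": (1, 200)}, "P2": {"R1": (250, 480)}, "P3": {"R1": (10, 190)}}
similar_pairs = {frozenset(("P1", "P3"))}
predicted = rosetta_stone_pairs(hits, similar_pairs)
```

Here P1 and P3 both match R1 but are similar to each other, so they are excluded, while P1-P2 and P2-P3 map to different parts of R1 and are kept as candidates.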

20
Predicted Interactions
Known physical interactions
Proteins in the same pathway
21
Gene Fusion Based Prediction - Results
  • Interactions predicted based on gene fusion
    events
  • Distance on circle shows distance on genome

22
Co-evolution of Interacting Proteins
  • Selective pressure is likely to act on common
    function
  • Proteins that are interacting are expected to
    either be conserved together along with their
    interactions, or not conserved at all
  • Hypothesis I: Orthologs of interacting proteins
    also interact in other species (supported by
    evidence, but there are subtleties, which we will
    discuss later)
  • Hypothesis II: If two proteins are interacting,
    then they will show similar conservation
    patterns
  • Phylogenetic profiles

23
Phylogenetic Profiles
24
Correlation of Phylogenetic Profiles
  • Assume we have N genomes, protein X has homologs
    in x of them, Y has y, and they co-occur in z
    genomes
  • Hamming distance
  • Pearson correlation
  • Mutual information
  • Statistical significance
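
With binary presence/absence profiles, all four measures can be computed from the counts N, x, y and z alone. The sketch below uses standard textbook formulas, with a hypergeometric tail for significance as one reasonable choice; the function names are hypothetical.

```python
import math

def hamming(x, y, z):
    """Number of genomes in which exactly one of the two proteins occurs."""
    return x + y - 2 * z

def pearson(n, x, y, z):
    """Pearson correlation of two binary profiles from their counts."""
    denom = math.sqrt(x * (n - x) * y * (n - y))
    if denom == 0:
        return float("nan")  # undefined if a protein is in all or no genomes
    return (n * z - x * y) / denom

def mutual_information(n, x, y, z):
    """Mutual information (in bits) between the two presence indicators."""
    joint = {(1, 1): z, (1, 0): x - z, (0, 1): y - z, (0, 0): n - x - y + z}
    px = {1: x / n, 0: (n - x) / n}
    py = {1: y / n, 0: (n - y) / n}
    mi = 0.0
    for (a, b), count in joint.items():
        if count > 0:
            p = count / n
            mi += p * math.log2(p / (px[a] * py[b]))
    return mi

def cooccurrence_pvalue(n, x, y, z):
    """P(co-occurrence >= z) if the y genomes of Y were drawn at random."""
    tail = sum(math.comb(x, k) * math.comb(n - x, y - k)
               for k in range(z, min(x, y) + 1))
    return tail / math.comb(n, y)

# Example: 10 genomes, both proteins present in the same 5 of them.
n, x, y, z = 10, 5, 5, 5
scores = (hamming(x, y, z), pearson(n, x, y, z),
          mutual_information(n, x, y, z), cooccurrence_pvalue(n, x, y, z))
```

Note that Hamming distance and Pearson correlation distinguish correlated from anti-correlated profiles, whereas mutual information scores both as equally informative.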

25
Phylogenetic Profiles - Limitations
  • Many processes may be common across lineages
  • Too many false positives
  • Database of genomes may be biased
  • All organisms are treated equally
  • Improvement: use trees instead of profiles
  • Proteins are assumed to be conserved as a whole
  • It is domains that interact
  • Improvement: use domain profiles

[Figure: Yeast nucleoli and ribosomal proteins; axis label: Organisms]
26
Phylogenetic Tree Based Prediction
  • Phylogenetic trees of Ntr-family two-component
    sensor histidine kinases and their corresponding
    regulators

27
Mirror Tree Method
  • Need to have sufficient number of genomes that
    contain homologs of both proteins
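
One common formulation of the mirror tree idea scores a protein pair by the linear correlation between the two families' inter-species distance matrices, restricted to species that have both proteins. The sketch below assumes the distances are already computed; the species names and distance values are invented.

```python
import math
from itertools import combinations

def pearson(u, v):
    """Pearson correlation of two equal-length lists of numbers."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv)

def mirror_tree_score(dist_a, dist_b, species):
    """Correlate the pairwise distances of family A with those of family B.
    dist_a / dist_b map frozenset({s1, s2}) to an evolutionary distance."""
    pairs = [frozenset(p) for p in combinations(species, 2)]
    return pearson([dist_a[p] for p in pairs], [dist_b[p] for p in pairs])

# Hypothetical distances for four shared species.
species = ["s1", "s2", "s3", "s4"]
values = [0.10, 0.40, 0.50, 0.35, 0.45, 0.20]
dist_a = {frozenset(p): d for p, d in zip(combinations(species, 2), values)}
# Family B evolves along the same tree at a different rate.
dist_b = {pair: 2 * d + 0.1 for pair, d in dist_a.items()}
score = mirror_tree_score(dist_a, dist_b, species)
```

With k shared species there are only k(k - 1)/2 distances to correlate, which is why the method needs enough genomes containing homologs of both proteins.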

28
Matrix Method
  • Start with families of proteins that are
    suspected to interact
  • Identify specific pairs of proteins that interact
    by aligning the phylogenetic trees that underlie
    the two families
  • Assumption: identical number of proteins in each
    family

29
Correlated Mutations
  • Co-evolution of interacting proteins can be
    followed more closely by quantifying the degree
    of co-variation between pairs of residues from
    these proteins
  • Correlated mutations may correspond to
    compensatory mutations that stabilize the
    mutations in one protein with changes in the
    other

Distribution of distances between amino acid
positions on a folded protein
30
In silico Two-Hybrid
  • The correlation of mutations between two
    positions (may be on different proteins) can be
    estimated from pairwise assessment of aligned
    multiple sequences
  • Position pairs with high correlation are
    potential contact points
  • Interaction index
  • For a protein pair, compute the aggregate
    correlation (of mutations) across all positions
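
A heavily simplified sketch of this computation follows. It is not the published I2H procedure: residue similarity is reduced to identity (1 or 0) instead of a substitution matrix, and the aggregate is a plain mean of inter-protein position correlations. The alignments are toy data, with rows in the same species order in both.

```python
import math
from itertools import combinations

def column_profile(alignment, col):
    """Similarity (1 = same residue, 0 = different) at one column
    for every pair of sequences in the alignment."""
    return [1.0 if a[col] == b[col] else 0.0 for a, b in combinations(alignment, 2)]

def pearson(u, v):
    """Pearson correlation, or None when either vector has no variance."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return None  # fully conserved (or fully variable) column
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (su * sv)

def position_correlations(aln_a, aln_b):
    """Correlation for every (position in A, position in B) pair."""
    scores = {}
    for i in range(len(aln_a[0])):
        profile_i = column_profile(aln_a, i)
        for j in range(len(aln_b[0])):
            r = pearson(profile_i, column_profile(aln_b, j))
            if r is not None:
                scores[(i, j)] = r
    return scores

def interaction_index(aln_a, aln_b):
    """Simplified aggregate: mean correlation over inter-protein pairs."""
    scores = position_correlations(aln_a, aln_b)
    return sum(scores.values()) / len(scores) if scores else 0.0

# Toy alignments of proteins A and B from the same four species.
aln_a = ["AK", "AK", "GK", "GK"]
aln_b = ["LD", "LD", "LE", "LE"]
pairs = position_correlations(aln_a, aln_b)
index = interaction_index(aln_a, aln_b)
```

In the toy data position 0 of A and position 1 of B change in the same species, so that pair is the only one with a defined correlation and is flagged as a potential contact.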

31
In silico Two-Hybrid
32
Performance of I2H
  • I2H predicts physical, rather than functional
    association
  • It requires complete genomes sufficient number
    of homologs

33
Co-citation Based PPI Prediction
  • Functionally associated proteins are likely to be
    cited in the same research article
  • We can assess the statistical significance of
    co-citation based on hypergeometric model
  • Algorithmic problem: how to recognize and match
    protein names?
  • Train algorithm using annotated abstracts via
    conditional random fields (CRF)
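
The significance test can be sketched as a hypergeometric tail probability over article counts. The example assumes name recognition has already been done (the CRF step is not shown), so each abstract is simply a set of protein names; the data are invented.

```python
from math import comb

def cocitation_pvalue(abstracts, a, b):
    """Return (co-citation count, P(count >= observed)) under a
    hypergeometric null in which citations of a and b are independent."""
    n_total = len(abstracts)
    n_a = sum(1 for names in abstracts if a in names)
    n_b = sum(1 for names in abstracts if b in names)
    n_ab = sum(1 for names in abstracts if a in names and b in names)
    tail = sum(comb(n_a, k) * comb(n_total - n_a, n_b - k)
               for k in range(n_ab, min(n_a, n_b) + 1))
    return n_ab, tail / comb(n_total, n_b)

# Hypothetical corpus: protein names already extracted per abstract.
abstracts = [{"A", "B"}, {"A", "B"}, {"C"}, {"C", "D"}, {"D"}, {"E"}]
count, p_value = cocitation_pvalue(abstracts, "A", "B")
```

A small p-value means the two proteins appear together more often than their individual citation frequencies would explain.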

34
Performance of Co-citation
  • Statistical significance is quite relevant until
    it saturates
  • The method is robust to choice of parameters for
    name recognition

35
Integrating PPI Networks
  • Interaction data coming from multiple sources
  • Different sources refer to different levels of
    interaction
  • Can integration handle noise, making interaction
    data more reliable?
  • Superpose interactions based on their reliability

36
Bayesian Integration
  • For each prediction method, compute
    log-likelihood score
  • Let P(L|E) be the frequency of interactions
    predicted by method E for which functional
    association between the corresponding proteins
    is known
  • Let P(¬L|E) be the frequency of false positives
  • Let P(L) and P(¬L) be the corresponding priors
  • Assign weights to methods based on their
    log-likelihood scores
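
One widely used form of this score is the log of the posterior odds over the prior odds, LLS = ln[(P(L|E) / P(¬L|E)) / (P(L) / P(¬L))], where L means the pair is functionally linked. The sketch below assumes a gold standard of known positive and negative pairs; the data and the function name are hypothetical.

```python
import math

def log_likelihood_score(predicted, positives, negatives):
    """LLS of one prediction method against a gold standard.
    predicted, positives, negatives are sets of protein pairs."""
    true_hits = len(predicted & positives)
    false_hits = len(predicted & negatives)
    if true_hits == 0 or false_hits == 0:
        raise ValueError("score undefined without both kinds of hits")
    posterior_odds = true_hits / false_hits       # P(L|E) / P(not L|E)
    prior_odds = len(positives) / len(negatives)  # P(L) / P(not L)
    return math.log(posterior_odds / prior_odds)

# Hypothetical gold standard: 10 linked pairs, 90 unlinked pairs.
positives = {("pos", i) for i in range(10)}
negatives = {("neg", i) for i in range(90)}
# A method that recovers 8 linked pairs but also predicts 8 unlinked ones.
predicted = {("pos", i) for i in range(8)} | {("neg", i) for i in range(8)}
lls = log_likelihood_score(predicted, positives, negatives)
```

A score of zero means the method does no better than the prior, and higher scores give the method more weight when networks are combined.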

37
Comparison of Prediction Methods
  • Integrated network captures functional
    association better
  • Note that the integrated network is trained
    using available data on functional association

38
Classification Based Integration
  • Points: proteins; Space: expression,
    conservation; Labels: function
  • Points: protein pairs; Space: co-expression,
    co-evolution, etc.; Labels: existence of
    interaction

39
Performance of Domain Co-evolution
40
Co-Evolutionary Matrix
41
Domain Identification
42
Difference between Predicted PPIs