Using phylogenetic profiles to predict protein function and localization

Author: Catherine Grasso

1
Using phylogenetic profiles to predict protein
function and localization
  • As discussed by Catherine Grasso

2
Papers
  • Pellegrini, et al. Assigning protein functions
    by comparative genome analysis: Protein
    phylogenetic profiles. (1999) PNAS 96, 4285-4288.
  • Marcotte, et al. Localizing proteins in the cell
    from their phylogenetic profiles. (2000) PNAS 97,
    12115-12120.

3
Basic Idea
  • Sequence alignment is a good way to infer protein
    function, when two proteins do the exact same
    thing in two different organisms.
  • Proteins with > 30% sequence identity have the
    same fold, and typically the same function.

4
Basic Idea
  • But can we decide if two proteins function in the
    same pathway, such as histidine biosynthesis, or
    the same biomolecular structure, such as the
    flagella or ribosome, even if they don't do the
    exact same thing?
  • Yes. Assume that if the two proteins function
    together, they must evolve in a correlated
    fashion, so every organism that has a homolog of
    one of the proteins must also have a homolog of
    the other protein.

5
Phylogenetic Profile
  • For a given protein, BLAST against N sequenced
    genomes.
  • Construct a vector with N coordinates.
  • If the protein has a homolog in organism n, set
    coordinate n to 1. Otherwise set it to 0.

Protein P1: 0 0 1 0 1 1 0 0
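
The profile construction above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the genome names, the `best_evalues` dictionary (best BLAST E-value per genome), and the `1e-6` significance threshold are all assumptions made for the example.

```python
def binary_profile(best_evalues, genomes, threshold=1e-6):
    """Return a 0/1 profile: 1 if the genome has a significant hit, else 0.

    best_evalues maps genome name -> best BLAST E-value (hypothetical input);
    a missing genome means BLAST returned no hit at all.
    The threshold is an assumed significance cutoff.
    """
    profile = []
    for genome in genomes:
        e = best_evalues.get(genome)
        profile.append(1 if e is not None and e < threshold else 0)
    return profile


# Hypothetical genomes and hits chosen to reproduce the P1 profile above.
genomes = ["g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8"]
hits_p1 = {"g3": 1e-40, "g5": 3e-12, "g6": 1e-8, "g7": 0.5}

print(binary_profile(hits_p1, genomes))  # [0, 0, 1, 0, 1, 1, 0, 0]
```

Note that the weak hit in `g7` (E = 0.5) does not pass the assumed threshold, so that coordinate stays 0.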
6
Functional Link
  • Assign a degree of functional linkage between P1
    and P2 based on the number of positions (or bits)
    at which their profiles differ.

Protein P1: 0 0 1 0 1 1 0 0
Protein P2: 0 1 1 0 1 1 0 0
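
Counting the positions at which two profiles differ is a Hamming distance. A minimal sketch, using the P1 and P2 profiles from this slide (the function name `profile_distance` is made up for the example):

```python
def profile_distance(p, q):
    """Number of positions (bits) at which two binary profiles differ."""
    if len(p) != len(q):
        raise ValueError("profiles must cover the same set of genomes")
    return sum(a != b for a, b in zip(p, q))


p1 = [0, 0, 1, 0, 1, 1, 0, 0]
p2 = [0, 1, 1, 0, 1, 1, 0, 0]

print(profile_distance(p1, p2))  # 1
```

A smaller distance would be read as a stronger functional link; here P1 and P2 differ only in the second genome.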
7
What They Did
  • Computed phylogenetic profiles for 4,290
    proteins in E. coli.
  • Aligned each protein sequence Pi with the
    proteins from 16 other fully sequenced genomes.
  • Genome n is defined as including a homolog of
    Pi if one of its proteins aligns to Pi with a
    score that is deemed statistically significant.

8
Conclusions
  • Comparing profiles is a useful tool for
    identifying the complex or pathway in which a
    protein participates.
  • As the number of fully sequenced genomes
    increases, scientists will be able to construct
    longer, more informative profiles.
  • In 1999, 100 more genomes were due to be
    completed in the next few months.
  • Suggests that as eukaryotic genomes come out,
    profiles will be a useful tool for studying
    pathways in higher organisms.

9
Evolutionary Origin of Eukaryotic Cell
  • Mitochondria, chloroplasts and perhaps other
    organelles descended from microbes captured by
    progenitors of eukaryotic cells.
  • You exist because of a bad case of indigestion!

10
Evolutionary Origin of Eukaryotic Cell
  • This endosymbiosis was stabilized by the shifting
    of organelle genes into the nuclear genome and
    by transport systems being established to shuttle
    organellar proteins from the cytoplasm into
    organelles.
  • Contemporary mitochondrial genomes encode only a
    few genes (< 20), primarily large integral
    membrane proteins which can't be transported.

11
Evidence
  • Proteins of these organelles have molecular
    properties resembling prokaryotic rather than
    eukaryotic proteins:
  1. Average lengths
  2. Domain composition
  3. Amino acid composition
  4. Homologs among prokaryotes

12
Phylogenetic profiles
  • Will show that proteins with similar phylogenetic
    profiles localize to similar subcellular
    locations.
  • Actually, will primarily show this for the
    mitochondria.

13
Calculating phylogenetic profiles
  • In this study, the value at each position of the
    profile is equal to -1/log E, where E is the
    BLAST expectation value of the best matching
    protein in a genome.
  • Calculated only for E < 1 x 10^-6; set to 1.0
    otherwise. So values near zero indicate a strong
    match and one indicates no match.
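
As a worked example of the -1/log E rule, here is a small Python sketch. The slide does not state the base of the logarithm; base 10 is assumed here, and the handling of a missing hit (`None`) and of an E-value reported as exactly 0 are also assumptions added for the example.

```python
import math


def profile_value(e_value, cutoff=1e-6):
    """Real-valued profile entry: -1/log10(E) for E < cutoff, else 1.0.

    Assumptions (not stated on the slide): log base 10, None means no hit,
    and an E-value of exactly 0 maps to 0.0 (the limit as E -> 0).
    """
    if e_value is None or e_value >= cutoff:
        return 1.0
    if e_value == 0.0:
        return 0.0
    return -1.0 / math.log10(e_value)


for e in (None, 1e-3, 1e-10, 1e-100):
    print(e, profile_value(e))
```

With these assumptions, E = 1e-10 gives about 0.1 and E = 1e-100 gives about 0.01, so stronger hits move toward zero, while anything at or above the cutoff is set to 1.0.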

14
Three Categories
  • Prokaryote Derived: Only has homologs in
    prokaryotes.
  • Eukaryote Derived: Only has homologs in
    eukaryotes.
  • Organism Specific: Has no homologs.
  • Why split these categories? Should have
    different functions and roles in mitochondria.
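
The three categories can be expressed as a simple decision rule. This is a sketch under assumptions: the inputs are hypothetical counts of significant homologs in prokaryotic and eukaryotic genomes, and the extra "mixed" label for proteins with homologs in both groups is not one of the slide's categories.

```python
def classify_origin(n_prokaryote_homologs, n_eukaryote_homologs):
    """Assign one of the three categories from homolog counts.

    The counts are hypothetical inputs; "mixed" is an assumed fallback
    for proteins with homologs in both groups, which the slide does not cover.
    """
    has_pro = n_prokaryote_homologs > 0
    has_euk = n_eukaryote_homologs > 0
    if has_pro and not has_euk:
        return "prokaryote-derived"
    if has_euk and not has_pro:
        return "eukaryote-derived"
    if not has_pro and not has_euk:
        return "organism-specific"
    return "mixed"


print(classify_origin(5, 0))  # prokaryote-derived
print(classify_origin(0, 3))  # eukaryote-derived
print(classify_origin(0, 0))  # organism-specific
```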

15
Linear Discriminant Functions
[Figure: linear discriminant with threshold t separating the MP and Non-MP classes]
Varying t increases prediction accuracy at the
expense of coverage.
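
The trade-off controlled by t can be illustrated with a threshold sweep. This is a sketch, not the discriminant from the paper: the scores and labels are invented, and the definitions used (accuracy as the fraction of MP predictions that are correct, coverage as the fraction of true MPs that are predicted) are assumptions about how the terms are meant.

```python
def accuracy_and_coverage(scores, labels, t):
    """Predict MP when score > t; return (accuracy, coverage).

    accuracy = correct MP predictions / all MP predictions (assumed definition)
    coverage = correct MP predictions / all true MPs (assumed definition)
    """
    predicted = [s > t for s in scores]
    true_pos = sum(1 for p, y in zip(predicted, labels) if p and y)
    n_pred = sum(predicted)
    n_mp = sum(labels)
    accuracy = true_pos / n_pred if n_pred else 0.0
    coverage = true_pos / n_mp if n_mp else 0.0
    return accuracy, coverage


# Invented discriminant scores; True marks a known MP.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [True, True, False, True, False, True, False, False]

for t in (0.25, 0.5, 0.75):
    print(t, accuracy_and_coverage(scores, labels, t))
```

On this toy data, raising t from 0.25 to 0.75 moves accuracy from about 0.67 to 1.0 while coverage drops from 1.0 to 0.5.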
16
Testing Algorithm
  • First, predicted the location of yeast proteins
    of known location (open diamonds).
  • Second, a jackknife test was performed. Repeated
    100 times with different random sets (filled
    diamonds). Coverage: 58% at 50% accuracy.
  • Third, used yeast proteins as training set and
    worm proteins as test set. Coverage: 65% at 50%
    accuracy.

17
Prediction
  • Applied algorithm to all yeast proteins.
    Estimate 630 total mitochondrion-targeted genes
    in yeast, or 10% of the genome.
  • Applied algorithm to all worm proteins. Estimate
    660 total mitochondrion-targeted genes in worms,
    or 4% of the genome.

18
Verifications
  • Tested whether functions of newly predicted
    mitochondrial proteins matched functions of known
    mitochondrial proteins better than the functions
    of a random set of proteins. (Jaccard
    Coefficient, Pie Charts)
  • Fraction of predicted mitochondrial proteins with
    predicted transmembrane segments or signal
    peptides.
  • 2D gel of whole rat liver and human placental
    mitochondria reveals 250-350 visible proteins.
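
For the first verification, the Jaccard coefficient between two sets is the size of their intersection divided by the size of their union. A minimal sketch follows; the functional category names are invented for the example and do not come from the study.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B|; 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Invented functional categories for illustration only.
predicted = {"respiration", "translation", "transport"}
known = {"respiration", "transport", "signaling", "metabolism"}

print(jaccard(predicted, known))  # 0.4
```

Here two categories are shared out of five distinct ones, giving 2/5 = 0.4; a value of 1.0 would mean identical sets.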

19
Conclusions
  • There is information in the phylogenetic
    profiles, but it is quite noisy.
  • Yields approximate numbers of genes that migrated
    to the nuclear genomes from the mitochondria.
  • Gives even more evidence for endosymbiotic
    theory.
  • However, verifications did not confirm results as
    much as one might like.
  • Perhaps the fundamental assumption is flawed.