Computational functional genomics

Slides: 45
Provided by: science4
Transcript and Presenter's Notes

1
Computational functional genomics
  • Lital Haham
  • Sivan Pearl

2
Introduction
  • Piles of information but only flakes of knowledge.
  • The existing information: collections of genomic
    sequences, expression profiles, protein-protein
    interactions, and many more.

3
Introduction
  • Computational biology strives to extract the
    maximal possible information from known
    sequences, by classifying them according to their
    homologous relationships, predicting their
    biochemical activity, cellular function,
    3-dimensional structures and evolutionary origin.

4
The COG- Clusters of Orthologous Groups of proteins
  • Identification of orthologs is critical for
    reliable prediction of gene function in newly
    sequenced genomes.
  • The purpose of COG is to serve as a platform for
    functional annotation of newly sequenced genomes
    and for study of genome evolution.
  • COGs reflect one-to-one, one-to-many and
    many-to-many orthologous relationships.

5
The COG-statistics
  • As of 2003, there are 3307 COGs including 74059
    proteins from 43 genomes.
  • The genomes come from Bacteria, Archaea and
    Eukaryota.
  • The database includes 17 functional groups.

6
The COG- building your own
  • The COG construction procedure is based on the
    notion that any group of at least 3 proteins from
    distant genomes that are more similar to each
    other than to any other protein from the same
    genomes is most likely to belong to an
    orthologous family.
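
The notion above can be sketched as a search for triangles of mutually best hits across three genomes. This is only a minimal illustration under stated assumptions, not the actual COG pipeline: the `best_hit` table, the genome labels and the protein names are hypothetical toy data standing in for the results of an all-against-all comparison.

```python
from itertools import combinations

# Hypothetical toy data: best_hit[(protein, target_genome)] is the
# best-scoring protein in target_genome, as an all-against-all
# sequence comparison might report it.
best_hit = {
    ("a1", "B"): "b1", ("a1", "C"): "c1",
    ("b1", "A"): "a1", ("b1", "C"): "c1",
    ("c1", "A"): "a1", ("c1", "B"): "b1",
    ("a2", "B"): "b1", ("a2", "C"): "c2",
    ("c2", "A"): "a1", ("c2", "B"): "b1",
}
genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C", "c2": "C"}


def is_bidirectional(p, q):
    """True if p and q are each other's best hit in the other's genome."""
    return (best_hit.get((p, genome_of[q])) == q
            and best_hit.get((q, genome_of[p])) == p)


def find_triangles():
    """Return sets of 3 proteins from 3 genomes that are mutual best hits."""
    triangles = []
    for trio in combinations(sorted(genome_of), 3):
        if len({genome_of[p] for p in trio}) < 3:
            continue  # the three proteins must come from different genomes
        if all(is_bidirectional(p, q) for p, q in combinations(trio, 2)):
            triangles.append(set(trio))
    return triangles


triangles = find_triangles()
print(triangles)
```

In this toy data only a1, b1 and c1 are mutually best hits, so they form the single candidate orthologous group; a2 and c2 point at members of the group, but the hits are not reciprocated.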

7
The COG- building your own
All-against-all protein sequence comparison
8
The COG- building your own
9
The COG- adding new genomes
  • The COGNITOR program adds new proteins to
    pre-existing COGs on the basis of multiple Best
    Hits.
  • 60-80% of the proteins of prokaryotes could be
    included.

10
The COG- more applications
  • Detecting missed genes.
  • Convenient for a variety of evolution-oriented
    analyses of protein families.

11
Methods
  • Experimental methods: biochemical and genetic
    experiments.
  • Computational methods: homology method (BLAST),
    mRNA expression, phylogenetic profile, fusion
    method (Rosetta Stone analysis) and gene
    neighbour method.

12
Homology method
  • The homology method searches for proteins whose
    amino acid sequences are similar.
  • 40-70% of the genes in a new genome can be
    assigned some function.
  • Involves identification of some molecular
    function.

13
mRNA expression
  • Analysis of correlated mRNA expression levels
    makes it possible to establish functional
    linkages, by detecting changes in mRNA expression
    across different cell types or different
    environments.
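
One common way to quantify correlated expression is the Pearson correlation between two genes' expression profiles across conditions. The sketch below assumes that measure; the gene names, the expression values and the 0.9 cutoff are hypothetical, and the slides do not say which measure or threshold was actually used.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)  # undefined for a constant profile


# Hypothetical expression levels in four conditions (cell types or
# environments); the values are illustrative only.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [3.0, 1.0, 4.0, 2.0],
}

THRESHOLD = 0.9  # assumed cutoff for calling two genes linked

linked = [
    (g1, g2)
    for g1, g2 in combinations(sorted(expression), 2)
    if pearson(expression[g1], expression[g2]) > THRESHOLD
]
print(linked)
```

Here geneA and geneB rise and fall together and are reported as linked, while geneC is uncorrelated with both.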

14
Phylogenetic profile
  • Describes the pattern of presence or absence of a
    particular protein, across a set of organisms.
  • Number of possible profiles for n genomes: 2^n.
  • This number far exceeds the number of protein
    families.

15
Phylogenetic profile
  • Why would two proteins always both be inherited
    into new species or neither inherited, unless the
    two function together?
  • If two proteins have the same phylogenetic
    profile, it is inferred that they are
    functionally linked, for example engaged in a
    common pathway or complex.
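
A phylogenetic profile can be represented as a tuple of 0/1 values, one per genome, and proteins with identical tuples grouped together. The following is a minimal sketch; the protein names, genome names and presence data are hypothetical. With n genomes each position is either 0 or 1, so there are 2^n possible profiles.

```python
from collections import defaultdict

# Hypothetical set of sequenced genomes and, for each protein, the
# genomes in which a homolog is present.
genomes = ["G1", "G2", "G3", "G4"]
presence = {
    "P1": {"G1", "G3"},
    "P2": {"G1", "G3"},
    "P3": {"G2", "G4"},
    "P4": {"G1", "G2", "G3"},
}


def profile(protein):
    """Presence (1) or absence (0) of the protein in each genome."""
    return tuple(1 if g in presence[protein] else 0 for g in genomes)


# Group proteins that share exactly the same profile.
groups = defaultdict(list)
for protein in sorted(presence):
    groups[profile(protein)].append(protein)

# Proteins sharing a profile are candidates for a functional link.
linked = [members for members in groups.values() if len(members) > 1]

n_possible = 2 ** len(genomes)  # each genome is present or absent

print(linked, n_possible)
```

Only P1 and P2 share a profile in this toy data, so they are the only pair predicted to be functionally linked.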

16
Phylogenetic profile
17
Phylogenetic profile- example
  • Analysis of three proteins, RL7, FlgL and His5,
    according to their phylogenetic profiles.
  • RL7: more than half of the proteins with similar
    profiles have functions associated with the
    ribosome.
  • FlgL: more than half are flagellar proteins or
    cell-wall maintenance proteins.
  • His5: more than half are involved in amino acid
    metabolism.

18
Phylogenetic profile- example
PgsA: phospholipid synthesis
YGGH: hypothetical
YBEX: hypothetical
RL34: ribosome L34
RL36: ribosome L36
RL27: ribosome L27
RL25: ribosome L25
YQCB: hypothetical
YABO: hypothetical
YCEC: hypothetical
RFH: peptide release factor
ClpB: heat shock protein

RL7: ribosome L7
RL15: ribosome L15
RL17: ribosome L17
PTH: peptidyl-tRNA hydrolase
RNC: ribonuclease III

YJFH: hypothetical

RS14: ribosome S14

GidB: glucose-inhibited division
RL24: ribosome L24
DEF: polypeptide deformylase
RL20: ribosome L20
MesJ: cell cycle protein
RL19: ribosome L19
RL21: ribosome L21
RL9: ribosome L9
SmpB: small protein B

G3P3: dehydrogenase

RL4: ribosome L4
NONE: hypothetical

GrpE: co-chaperone
19
Phylogenetic profile

Phylogenetic profiles link proteins with similar
keywords
20
Fusion method, or Rosetta Stone analysis
  • Some pairs of interacting proteins have homologs
    in another organism, fused into a single protein
    chain.
  • When two separate proteins in one organism, A and
    B, are expressed as a fused protein in some other
    species, there is a high probability that A and B
    are linked in function.
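
The idea can be sketched as follows: two proteins are a candidate Rosetta Stone pair when both align to the same protein in another organism, at regions that do not overlap. This is a minimal sketch under that assumption; the `hits` table, the protein names and the coordinates are hypothetical, standing in for the output of a sequence-similarity search.

```python
from itertools import combinations

# Hypothetical alignment results: for each query protein, a list of
# (composite_protein, start, end) regions it aligns to in another organism.
hits = {
    "A": [("C_fused", 1, 200)],
    "B": [("C_fused", 210, 400)],
    "D": [("C_fused", 150, 300)],
}


def regions_overlap(start1, end1, start2, end2):
    """True if two closed intervals on the same protein share a position."""
    return start1 <= end2 and start2 <= end1


def rosetta_pairs(hits):
    """Pairs of queries matching distinct regions of one composite protein."""
    pairs = set()
    for p, q in combinations(sorted(hits), 2):
        for comp_p, s1, e1 in hits[p]:
            for comp_q, s2, e2 in hits[q]:
                if comp_p == comp_q and not regions_overlap(s1, e1, s2, e2):
                    pairs.add((p, q, comp_p))
    return sorted(pairs)


pairs = rosetta_pairs(hits)
print(pairs)
```

A and B match separate halves of the fused protein and are reported as a candidate pair. D overlaps both, which suggests it is a homolog of part of A and B rather than a third fusion partner, so it is not paired with either.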

21
Fusion method
22
The Rosetta Stone model
23
Fusion method- what is it good for?
  • Predicts protein pairs that have related
    biological functions.
  • Predicts potential protein-protein interactions.
  • Can turn up complexes of proteins, or protein
    pathways.

24
Fusion method- what is it good for?
25
Fusion method
  • The group searched the 4290 protein sequences of
    the E. coli genome.
  • The proteins could form at most (4290)(4289)/2
    pairwise interactions, but far fewer are
    expected.
  • 6809 candidate pairwise interactions were found.
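
To put the numbers above in proportion, the upper bound and the share of it covered by the candidates can be computed directly. This is a small worked example; the variable names are illustrative.

```python
n_proteins = 4290
candidates = 6809

# Upper bound on unordered protein pairs: n(n-1)/2.
max_pairs = n_proteins * (n_proteins - 1) // 2

# Share of all possible pairs that the fusion method proposes.
fraction = candidates / max_pairs

print(max_pairs)           # 9199905
print(f"{fraction:.4%}")
```

The candidates amount to well under 0.1% of all possible pairs, which fits the expectation that real interactions are far fewer than the upper bound.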

26
Fusion method- validation
  • Look for similar functions in existing
    annotations, which would imply at least a
    functional interaction.
  • Of the E. coli pairs found in the Rosetta Stone
    analysis, 68% share at least one keyword in
    their annotations, whereas of randomly selected
    E. coli protein pairs only 15% share a keyword.
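
The keyword comparison can be sketched as the fraction of pairs whose annotation keyword sets intersect, computed once for the predicted pairs and once for a baseline. In this minimal sketch the annotations and predicted pairs are hypothetical, and the baseline is taken as all possible pairs rather than a random sample.

```python
from itertools import combinations

# Hypothetical annotation keywords per protein.
annotations = {
    "p1": {"ribosome", "translation"},
    "p2": {"translation"},
    "p3": {"flagellum"},
    "p4": {"flagellum", "motility"},
    "p5": {"histidine"},
}


def shared_keyword_fraction(pairs):
    """Fraction of pairs whose annotations share at least one keyword."""
    sharing = sum(1 for a, b in pairs if annotations[a] & annotations[b])
    return sharing / len(pairs)


# Hypothetical predicted pairs versus every possible pair as a baseline.
predicted = [("p1", "p2"), ("p3", "p4"), ("p1", "p5")]
all_pairs = list(combinations(sorted(annotations), 2))

predicted_fraction = shared_keyword_fraction(predicted)
baseline_fraction = shared_keyword_fraction(all_pairs)
print(predicted_fraction, baseline_fraction)
```

A predicted fraction well above the baseline, as in this toy data, is the kind of enrichment the validation looks for.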

27
Fusion method- validation
  • In a database of protein pairs found
    experimentally to interact, 6.4% are linked by
    Rosetta Stone sequences.
  • The phylogenetic profile method was applied to
    the interactions predicted by the fusion method:
    more than 8 times as many of them were also
    suggested by the phylogenetic profile method as
    for randomly chosen sets of interactions.

28
Fusion method- missing pairs
  • False negatives arise when the interacting
    proteins were never fused, or when the fused
    protein disappeared during the course of
    evolution.

29
Fusion method- false alarms
  • False positives arise when fused proteins are
    co-regulated but do not physically interact,
    and because the method cannot distinguish
    between homologs that bind and those that do
    not.

30
Fusion method- false alarms
  • The false positive rate in E. coli due to the
    inability to distinguish homologs is about 82%.
  • To reduce these errors, promiscuous domains were
    identified and removed during the analysis.
  • Filtering out only 5% of all domains removes the
    majority of falsely predicted interactions.
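
One way to picture the filtering step is to rank domains by how many predicted links they take part in and to drop links that involve the top few percent. The sketch below rests on that assumption: the slides do not state how promiscuous domains were identified, and the domain names, the links and the ranking by link count are all hypothetical.

```python
from collections import Counter

# Hypothetical predicted links between domains: one "hub" domain is
# linked to many others, the remaining links are between ordinary domains.
links = [("hub", f"d{i}") for i in range(1, 16)]
links += [("d1", "d2"), ("d3", "d4"), ("d16", "d17"), ("d18", "d19")]


def filter_promiscuous(links, fraction=0.05):
    """Drop links that involve the most highly connected domains."""
    degree = Counter()
    for a, b in links:
        degree[a] += 1
        degree[b] += 1
    # Number of domains to treat as promiscuous (at least one).
    n_remove = max(1, int(len(degree) * fraction))
    promiscuous = {domain for domain, _ in degree.most_common(n_remove)}
    kept = [(a, b) for a, b in links
            if a not in promiscuous and b not in promiscuous]
    return promiscuous, kept


promiscuous, kept = filter_promiscuous(links)
print(promiscuous, len(links), len(kept))
```

In this toy data, removing 1 domain out of 20 (5%) discards 15 of the 19 links, which illustrates how a small set of promiscuous domains can account for most of the predictions.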

31
Fusion method- false alarms
32
Neighbour method
  • Functional links between genes can be identified
    by examining whether the proximity of the genes
    is conserved across multiple genomes.
  • Powerful in uncovering functional linkages in
    prokaryotes where operons are common.

33
Neighbour method
34
Neighbour method- definitions
  • Close (proximate) genes: genes on the same
    strand, within 300 bp of each other, and
    transcribed in the same direction.
  • Direct link: two proximate genes that are also
    proximate in at least two other genomes of
    different phylogenetic groups.
  • Inferred link: two genes that are not close but
    whose orthologs are close in at least three
    other genomes of different phylogenetic groups.
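
These definitions can be sketched in code. The sketch makes two assumptions that the slides do not spell out: "same strand" is taken to cover "transcribed in the same direction", and "genomes of different phylogenetic groups" is read as a count of distinct groups. The genome names, group labels and gene coordinates are hypothetical.

```python
# Hypothetical gene positions: positions[genome][gene_family] gives
# (strand, start, end) of that family's ortholog in the genome.
positions = {
    "query": {"X": ("+", 100, 900), "Y": ("+", 1000, 1800)},
    "g1": {"X": ("-", 5000, 5800), "Y": ("-", 5900, 6700)},
    "g2": {"X": ("+", 200, 1000), "Y": ("+", 1250, 2000)},
    "g3": {"X": ("+", 200, 1000), "Y": ("-", 1100, 2000)},
    "far": {"X": ("+", 100, 900), "Y": ("+", 50000, 51000)},
}
group_of = {"query": "grpQ", "g1": "grp1", "g2": "grp2",
            "g3": "grp3", "far": "grpF"}


def proximate(genome, fam_a, fam_b, max_gap=300):
    """Same strand and at most max_gap bp between the two genes."""
    genes = positions[genome]
    if fam_a not in genes or fam_b not in genes:
        return False
    strand_a, start_a, end_a = genes[fam_a]
    strand_b, start_b, end_b = genes[fam_b]
    if strand_a != strand_b:
        return False
    gap = max(start_a, start_b) - min(end_a, end_b)
    return gap <= max_gap


def supporting_groups(genome, fam_a, fam_b):
    """Other phylogenetic groups in which the two families are proximate."""
    return {
        group_of[g] for g in positions
        if group_of[g] != group_of[genome] and proximate(g, fam_a, fam_b)
    }


def link_type(genome, fam_a, fam_b):
    """Classify a gene pair as 'direct', 'inferred' or None."""
    n_groups = len(supporting_groups(genome, fam_a, fam_b))
    if proximate(genome, fam_a, fam_b):
        return "direct" if n_groups >= 2 else None
    return "inferred" if n_groups >= 3 else None


print(link_type("query", "X", "Y"), link_type("far", "X", "Y"))
```

In the toy data X and Y are adjacent in "query" and in two other groups, giving a direct link. In "far" they are 49 kb apart, but their orthologs are adjacent in three other groups, giving an inferred link.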

35
Neighbour method- definitions
36
Neighbour method
  • Proximity between genes is maintained mostly
    because it facilitates their co-transfer to
    another organism.
  • Example: restriction-modification systems.

37
Neighbour method- validation
  • Identify links whose genes are annotated in KEGG
    or COG, and calculate the fraction of those that
    fall in the same functional pathway / category.
  • The functional correspondence correlates with
    the minimum number of phylogenetic groups in
    which the proximity is detected.

38
Neighbour method- validation
N tradeoff
39
Neighbour method- example
40
Happy end???
  • The group analyzed the 6,217 proteins of the
    yeast Saccharomyces cerevisiae, combining
    several methods.
  • One can expect each protein to be functionally
    linked to perhaps 5-50 other proteins, giving
    30,000-300,000 biologically meaningful links.

41
Happy end???
42
Networks
  • When methods of detecting functional linkages are
    applied to all the proteins of an organism, a
    network of interacting, functionally linked
    proteins can be traced.
  • As methods improve for detecting protein
    linkages, it seems likely that most of the
    proteins will be included in the network.

43
Networks