Predicting interactions between genes based on genome comparisons: The genomic context component of STRING

Provided by: ber756

1
Predicting interactions between genes based on
genome comparisons: The genomic context
component of STRING
Bioinformatics seminar series, 5-10-2004
Berend Snel
2
To do
  • Seminar (today): please ask questions
  • Article: "A gene co-expression network for global
    discovery of conserved genetic modules"
  • Make schedule for article discussion (today)
  • Read article (next couple of days)
  • 5-minute discussion of the article per person
    (preferably Monday 11 October)

3
http://string.embl.de
4
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
      • General
      • Gene fusion
      • Gene order
      • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

5
Complete genomes, now what?
  • Post-genomic era: we have the parts list
    (complete genomes)
  • To understand the cell, we need to know the
    functions of the genes

6
For most genes in any genome we need function
prediction
  • E. coli: the most intensively studied organism
  • Only 1924 of its genes (43%) have been (partially)
    experimentally characterized

7
Predicting protein function
What is function? There are various levels of
description. Sequence similarity/homology is most
relevant for molecular function, the aspect of
protein function that is best conserved.
Molecular function can often be predicted from
similarities between protein sequences (BLAST)
or structures.
8
Beyond homology and molecular function
  • Homology-based function prediction works very
    well, but
  • a large fraction of genes are poorly described
    (no homologs, or only uncharacterized homologs;
    this holds for 60% of the human genes)
  • There are other aspects of function: functional
    associations, e.g. the target of a protein kinase
    or a transcriptional regulator
  • Thus: predict these associations

9
  • Genome sequences allow us to interpret the
    function of proteins within the context in which
    they occur
  • Reverse this process: predict the function of a
    protein from the context in which it tends to
    occur → prediction of protein function/pathways
    from genome sequences. Use the genome sequences
    (through comparative genome analysis) for
    interaction prediction: genomic context methods
  • Genomic context methods have been shown to be
    reliable indicators for functional associations

10
There are many types of functional associations
(AKA functional interactions, interactions,
functional links, functional relations) in
molecular biology
Cellular process
11
Types of functional associations
Metabolic pathways: filling gaps
12
Types of functional associations
Transcription regulation
Signalling pathways
13
Types of functional associations
Cellular process
Protein complexes
14
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
      • General
      • Gene fusion
      • Gene order
      • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

15
Genomic context is a tool to predict functional
associations between genes
  • Use the genome sequences (through comparative
    genome analysis) for interaction prediction:
    genomic context methods
  • Genomic context methods have been shown to be
    reliable indicators for functional interactions
  • Genomic context is also known as in silico
    interaction prediction, or genomic associations

16
Genomic context methods detect evolutionary
traces in genomes of functionally associated
proteins
trpA
trpB
17
(No Transcript)
18
Three different genomic context methods in STRING
  • Gene fusion, Rosetta stone method
  • Conserved gene order between divergent genomes
  • Co-occurrence of genes across genomes,
    phylogenetic profiles

19
All genomic context methods use orthologs:
corresponding genes between genomes
  • Orthologs: not just homologs, but homologs
    related by speciation
  • Orthologs are very likely to have the same
    function
  • Orthologs are to genomes what an alignment is
    to sequences

Gene Duplication
Speciation
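The slides rely on orthologous groups such as COGs (slide 30 cites COG0346). As a simpler stand-in, and not necessarily the procedure behind STRING's orthologous groups, here is a minimal sketch of reciprocal best hits, a common first approximation of orthology between two genomes. The gene names and score dictionaries are hypothetical; in practice the scores might be BLAST bit scores.

```python
def best_hits(scores):
    """For each query gene, the subject gene with the highest score.

    scores: dict mapping (query, subject) -> similarity score (hypothetical input).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit in genome B and a is b's best hit in genome A."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores: b3 resembles a2, but a2's best hit is b2, so b3 is not paired.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 40.0,
          ("a2", "b2"): 180.0, ("a2", "b3"): 170.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 178.0,
          ("b2", "a1"): 38.0, ("b3", "a2"): 165.0}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('a1', 'b1'), ('a2', 'b2')]
```

Reciprocal best hits only yield one-to-one pairs, so after a gene duplication (the figure's duplication case) the extra copy is left out; group-based methods are designed to handle such cases.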
20
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
      • General
      • Gene fusion
      • Gene order
      • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

21
Gene fusion
  • i.e. the orthologs of two genes in another
    organism are fused into one polypeptide
  • A very reliable indicator of functional
    interaction, partly because it is a relatively
    infrequent evolutionary event: 3470 distinct
    fusions when surveying 179 genomes
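A minimal sketch of how fusion links could be called, assuming each protein has already been split into segments that are assigned to orthologous groups (the `domain_hits` input and all identifiers are hypothetical). Any two different groups found inside one polypeptide are linked, and the link then applies to the unfused orthologs in other genomes.

```python
from itertools import combinations


def fusion_links(domain_hits):
    """Map each pair of orthologous groups (OGs) to the proteins in which
    they occur fused.

    domain_hits: protein ID -> list of OGs matched by distinct segments of
    that protein (a hypothetical, precomputed input).
    """
    links = {}
    for protein, groups in domain_hits.items():
        # Two different OGs inside one polypeptide count as one fusion event.
        for a, b in combinations(sorted(set(groups)), 2):
            links.setdefault((a, b), set()).add(protein)
    return links


# Toy data: genes A and B are separate in genome X but fused in genome Y.
hits = {
    "X:geneA": ["OG_A"],
    "X:geneB": ["OG_B"],
    "Y:geneAB": ["OG_B", "OG_A"],
}
print(fusion_links(hits))  # {('OG_A', 'OG_B'): {'Y:geneAB'}}
```

The predicted association is between OG_A and OG_B, so it transfers to the separate X:geneA and X:geneB.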

Fusion
22
Gene fusion: an example
23
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
      • General
      • Fusion
      • Gene order
      • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

24
Gene order evolves rapidly
But
25
Differential retention of divergent / convergent
gene pairs suggests that conservation implies a
functional association
26
Comparison to pathways: conservation implies a
functional association
27
Conserved gene order
  • i.e. genes that occur in the same gene cluster
    across sufficiently large evolutionary
    distances
  • Contributes by far the most predictions
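A minimal sketch of detecting conserved gene neighbourhood, assuming each genome is given as its genes in chromosomal order, already mapped to orthologous groups with a strand. As a simplification, "sufficiently large evolutionary distances" is replaced here by a hypothetical minimum number of genomes; a real method would weigh how distant those genomes are.

```python
from collections import defaultdict


def conserved_neighbours(genomes, min_genomes=2):
    """Pairs of orthologous groups that are adjacent, on the same strand,
    in at least `min_genomes` genomes.

    genomes: genome name -> list of (orthologous_group, strand) in
    chromosomal order (linear; wrap-around of circular chromosomes ignored).
    """
    support = defaultdict(set)
    for name, genes in genomes.items():
        for (og1, s1), (og2, s2) in zip(genes, genes[1:]):
            # Same strand and adjacent: candidate members of one operon.
            if s1 == s2 and og1 != og2:
                support[tuple(sorted((og1, og2)))].add(name)
    return {pair: sorted(names) for pair, names in support.items()
            if len(names) >= min_genomes}


genomes = {
    "g1": [("A", "+"), ("B", "+"), ("C", "-")],
    "g2": [("C", "+"), ("A", "-"), ("B", "-")],
    "g3": [("B", "+"), ("D", "+"), ("A", "+")],
}
print(conserved_neighbours(genomes))  # {('A', 'B'): ['g1', 'g2']}
```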

28
Conserved gene order
NB1: predicting operons is not trivial; in fact,
conserved gene order or functional association is
a major clue.
NB2: using only operons, without requiring
conservation, results in much less reliable
function prediction.
29
Conserved gene order: an example from metabolism
of propionyl-CoA
target
query
30
Conserved gene order: an example from metabolism
of propionyl-CoA
Biochemical assays confirm the function of
members of COG0346 as a DL-methylmalonyl-CoA
racemase
31
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
      • General
      • Gene fusion
      • Gene order
      • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

32
Presence / absence of genes
Gene content → co-evolution (the easy case: few
genomes)
Differences in gene content reflect differences
in phenotypic potential
Genomes share genes for phenotypes they have in
common
33
Presence / absence of genes
L. innocua (non-pathogen)
L. monocytogenes (pathogen)
34
Presence / absence of genes
Genes involved in pathogenicity
L. monocytogenes (pathogenic)
L. innocua (non-pathogenic)
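The "easy case" with two closely related genomes reduces to a set difference: genes present in the pathogen but absent from the non-pathogen are candidate pathogenicity genes. A minimal sketch; the orthologous-group identifiers are placeholders, not real Listeria annotations.

```python
def candidate_phenotype_genes(with_phenotype, without_phenotype):
    """Orthologous groups present in the genome with the phenotype but
    absent from the closely related genome lacking it."""
    return sorted(set(with_phenotype) - set(without_phenotype))


# Placeholder gene content, not real annotations.
monocytogenes = {"OG1", "OG2", "OG3", "OG7", "OG9"}  # pathogen
innocua = {"OG1", "OG2", "OG3"}                      # non-pathogen
print(candidate_phenotype_genes(monocytogenes, innocua))  # ['OG7', 'OG9']
```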
35
Generalization: phylogenetic profiles /
co-occurrence
          species 1  species 2  species 3  species 4  species 5  ...
Gene 1        1          0          1          1          0       1
Gene 2        1          1          0          0          1       0
Gene 3        0          1          0          0          1       0
...
36
But: phylogenetic signal in gene content!
Escherichia coli
Haemophilus influenzae
       sp1   sp2   sp3   sp4
sp1    1     0.2   0.4   0.2
sp2          1     0.9   0.1
sp3                1     0.3
sp4                      1

37
Co-occurrence of genes across genomes
  • i.e. two genes have the same presence/absence
    pattern over multiple genomes: they have
    co-evolved
  • AKA phylogenetic profiles
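A minimal sketch of comparing phylogenetic profiles, using the three example profiles from the table on slide 35. Counting mismatching genomes (Hamming distance) is an assumed similarity measure chosen for illustration; as slide 36 warns, it ignores the phylogenetic signal in gene content, so closely related genomes are over-counted.

```python
from itertools import combinations

# Presence (1) / absence (0) of each gene's ortholog in six genomes,
# copied from the example profile table.
profiles = {
    "Gene 1": (1, 0, 1, 1, 0, 1),
    "Gene 2": (1, 1, 0, 0, 1, 0),
    "Gene 3": (0, 1, 0, 0, 1, 0),
}


def hamming(p, q):
    """Number of genomes in which exactly one of the two genes is present."""
    return sum(a != b for a, b in zip(p, q))


def co_occurring_pairs(profiles, max_mismatches=1):
    """Gene pairs whose profiles differ in at most `max_mismatches` genomes."""
    return [(g, h, hamming(profiles[g], profiles[h]))
            for g, h in combinations(sorted(profiles), 2)
            if hamming(profiles[g], profiles[h]) <= max_mismatches]


print(co_occurring_pairs(profiles))  # [('Gene 2', 'Gene 3', 1)]
```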

38
Predicting the function of a disease-gene protein
with unknown function, frataxin, using
co-occurrence of genes across genomes
  • Friedreich's ataxia
  • No (homolog with) known function

39
Frataxin has co-evolved with hscA and hscB,
indicating that it plays a role in iron-sulfur
cluster assembly
(Figure: presence/absence of frataxin (cyaY in E. coli, Yfh1 in yeast), hscA and hscB across genomes, including E. coli and D. melanogaster)
40
Iron-Sulfur (2Fe-2S) cluster in the Rieske protein
41
Prediction
Confirmation
42
The opposite of co-occurrence: anti-correlation /
complementary patterns, predicting analogous
enzymes
Genes with complementary phylogenetic profiles
tend to have a similar biochemical function.
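A minimal sketch of scoring complementary (anti-correlated) profiles: the fraction of genomes in which exactly one of two genes is present. The profiles are hypothetical, and the score is an illustrative choice rather than the measure used in the thiamin analysis.

```python
def complementarity(p, q):
    """Fraction of genomes in which exactly one of the two genes is present.

    1.0 means perfectly complementary profiles, the pattern expected for
    two analogous enzymes that can replace each other.
    """
    if len(p) != len(q) or not p:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(a != b for a, b in zip(p, q)) / len(p)


# Hypothetical presence/absence profiles over six genomes.
enzyme_x = (1, 1, 0, 0, 1, 0)
enzyme_y = (0, 0, 1, 1, 0, 1)
print(complementarity(enzyme_x, enzyme_y))  # 1.0
print(complementarity(enzyme_x, enzyme_x))  # 0.0
```

A gene absent from every genome would also look complementary to one present everywhere, so one would presumably require both genes to occur somewhere before calling them analogous.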
43
Complementary patterns in thiamin biosynthesis
predict analogous enzymes
44
Prediction of analogous enzymes is confirmed
45
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
      • General
      • Gene fusion
      • Gene order
      • Presence / absence of genes across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

46
Benchmark and integration: KEGG maps
47
Integrating genomic context scores into one
single score
  • Compare each individual method against an
    independent benchmark (KEGG), so that the scores
    of all methods are on an equivalent scale
  • Multiply the probabilities that two proteins are
    not interacting, and subtract the product from 1:
    naive Bayesian, i.e. assuming independence
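Written out, the integration rule on this slide is S = 1 - (1 - S_1)(1 - S_2)...(1 - S_n), where S_i is the benchmark-calibrated score of method i. A minimal sketch, assuming the per-method scores have already been calibrated to probabilities:

```python
from math import prod


def combine_scores(scores):
    """Integrate calibrated per-method scores into one score.

    Each score is read as the probability that two proteins are functionally
    associated according to one method. Assuming the methods are independent,
    the chance that none of them is right is the product of (1 - s); the
    combined score is one minus that product.
    """
    return 1.0 - prod(1.0 - s for s in scores)


# E.g. gene order gives 0.5 and co-occurrence gives 0.6 for the same pair.
print(round(combine_scores([0.5, 0.6]), 3))  # 0.8
```

The combined score is never lower than the best single score, and the independence assumption is what makes it naive: correlated evidence gets counted twice.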

(Plot: fraction of pairs in the same KEGG map (y-axis, 0 to 1) versus score (x-axis, 0 to 1), for Fusion, Gene Order and Co-occurrence)
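A minimal sketch of calibrating one method's raw scores against KEGG, in the spirit of the plot: benchmark pairs are binned by raw score, and each bin's calibrated value is the fraction of its pairs that share a KEGG map. The bin edges and data are hypothetical, and binning is only one way to obtain such a curve; empty bins are mapped to 0.0 here by choice.

```python
from bisect import bisect_right


def calibrate(benchmark, edges):
    """Return a function mapping a raw method score to the fraction of
    benchmark pairs in the same score bin that share a KEGG map.

    benchmark: list of (raw_score, same_map) for pairs annotated in KEGG.
    edges: sorted bin boundaries (a hypothetical choice of bins).
    """
    hits = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for score, same_map in benchmark:
        b = bisect_right(edges, score)  # index of the bin this score falls in
        totals[b] += 1
        hits[b] += int(same_map)
    fractions = [h / t if t else 0.0 for h, t in zip(hits, totals)]

    def lookup(raw_score):
        return fractions[bisect_right(edges, raw_score)]

    return lookup


benchmark = [(0.1, False), (0.2, False), (0.3, True),
             (0.4, False), (0.7, True), (0.9, True)]
to_probability = calibrate(benchmark, edges=[0.25, 0.5])
print(to_probability(0.35), to_probability(0.8))  # 0.5 1.0
```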
48
Benchmark
(Plot: coverage, the number of predicted links between orthologous groups (y-axis, log scale, 10 to 100000), versus accuracy, the fraction of confirmed predictions, i.e. same KEGG map (x-axis, 0.5 to 1.0), for Integrated, Gene Order (norm.), Gene Order (abs.), Co-occurrence, Fusion (norm.) and Fusion (abs.))
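Each curve in such a benchmark can be traced by sweeping a score threshold: at each threshold, coverage is the number of predicted links kept and accuracy is the fraction of them confirmed by KEGG. A minimal sketch with hypothetical data, assuming every prediction can be checked against KEGG (in practice only pairs annotated in KEGG can be).

```python
def coverage_accuracy_curve(predictions, thresholds):
    """(threshold, number of links kept, fraction confirmed) per threshold.

    predictions: list of (score, confirmed), where confirmed means both
    orthologous groups fall in the same KEGG map.
    """
    curve = []
    for t in thresholds:
        kept = [confirmed for score, confirmed in predictions if score >= t]
        if kept:
            curve.append((t, len(kept), sum(kept) / len(kept)))
    return curve


predictions = [(0.9, True), (0.8, True), (0.6, False), (0.5, True), (0.3, False)]
print(coverage_accuracy_curve(predictions, [0.7, 0.4, 0.0]))
# [(0.7, 2, 1.0), (0.4, 4, 0.75), (0.0, 5, 0.6)]
```

Lowering the threshold trades accuracy for coverage, which is why each method appears as a curve rather than a single point.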
49
Performance of genomic context compared to
high-throughput interaction data
(Plot: coverage, the fraction of the reference set covered by the data, versus accuracy, the fraction of the data confirmed by the reference set, for purified complexes (TAP), purified complexes (HMS-PCI), genomic context, mRNA co-expression, synthetic lethality, yeast two-hybrid and combined evidence (two methods, three methods); also labelled: raw data, filtered data, parameter choices)
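The two axes of this comparison can be computed from sets of unordered protein pairs. A minimal sketch, with the definitions read off the axis labels and hypothetical data; it does not restrict the data to proteins that the reference set covers, which a careful comparison would likely do.

```python
def coverage_and_accuracy(data, reference):
    """Coverage: fraction of reference interactions found in the data.
    Accuracy: fraction of data interactions confirmed by the reference."""
    data = {frozenset(pair) for pair in data}  # pairs are unordered
    reference = {frozenset(pair) for pair in reference}
    if not data or not reference:
        raise ValueError("data and reference must be non-empty")
    overlap = data & reference
    return len(overlap) / len(reference), len(overlap) / len(data)


data = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
reference = [("B", "A"), ("C", "D"), ("E", "F")]
print(coverage_and_accuracy(data, reference))  # (0.6666666666666666, 0.5)
```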
50
Genomic context: biochemistry by other means
Despite the high performance of genomic context
methods, as a tool for function prediction it is
not a push-button method; it is more like
biochemistry by other means. Often quite a lot
of manual input and expert knowledge from the
researcher is needed to distill associations into
a concrete function prediction. Small-scale
bioinformatics?
51
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
      • General
      • Fusion
      • Gene order
      • Co-occurrence across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

52
STRING allows a network view,
i.e. you see not only which genes the query gene
is associated with, but also how these other
genes relate to each other
53
STRING
Network output (depth 1)
Assigning
uncharacterized archaeal proteins
to a network around
Archaeal flagellins
Archaeal flagellin biosynth. ATPase
54
STRING
Type IV secretion pathway
Network (depth 2)
Connecting associated cellular processes
Archaeal flagellins
Archaeal flagella components
Chemotaxis-related
55
STRING
Network (depth 3)
Zooming out to other cellular processes
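A minimal sketch of growing a network around a query protein, assuming "depth" means the number of association links away from the query (how STRING defines depth in its interface is not stated on these slides). It returns all links among the selected proteins, so relations between the query's partners are shown as well.

```python
from collections import deque


def neighbourhood(edges, query, depth):
    """Proteins within `depth` links of `query`, plus every link among them.

    edges: iterable of (protein_a, protein_b) functional associations.
    """
    edges = list(edges)
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    distance = {query: 0}
    queue = deque([query])
    while queue:  # breadth-first search, stopping at `depth` links
        node = queue.popleft()
        if distance[node] == depth:
            continue
        for partner in adjacency.get(node, ()):
            if partner not in distance:
                distance[partner] = distance[node] + 1
                queue.append(partner)
    nodes = set(distance)
    # Keep all associations among the selected proteins, not only those to the query.
    links = {tuple(sorted((a, b))) for a, b in edges if a in nodes and b in nodes}
    return nodes, links


edges = [("Q", "A"), ("Q", "B"), ("A", "B"), ("B", "C"), ("C", "D")]
nodes, links = neighbourhood(edges, "Q", depth=1)
print(sorted(nodes), sorted(links))
# ['A', 'B', 'Q'] [('A', 'B'), ('A', 'Q'), ('B', 'Q')]
```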
56
Using the local network to detect
multi-functional proteins
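The slide does not say how multi-functional proteins are detected. One hypothetical heuristic, sketched here for illustration only: remove the protein from its local network and check whether its direct partners fall apart into groups with no links between them, which would hint at separate processes.

```python
def partner_groups(edges, protein):
    """Split the direct partners of `protein` into groups connected by
    partner-partner links, ignoring `protein` itself (hypothetical heuristic).
    More than one group hints at several otherwise unconnected processes."""
    edges = list(edges)
    partners = {b if a == protein else a
                for a, b in edges if protein in (a, b) and a != b}
    links = {p: set() for p in partners}
    for a, b in edges:
        if a in partners and b in partners and a != b:
            links[a].add(b)
            links[b].add(a)
    groups, seen = [], set()
    for start in sorted(partners):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:  # depth-first walk over partner-partner links only
            node = stack.pop()
            if node not in group:
                group.add(node)
                stack.extend(links[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


edges = [("P", "A1"), ("P", "A2"), ("A1", "A2"),
         ("P", "B1"), ("P", "B2"), ("B1", "B2"), ("A2", "X")]
print(partner_groups(edges, "P"))  # [['A1', 'A2'], ['B1', 'B2']]
```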
57
Contents
  • Predicting functional interactions between
    proteins
  • Genomic context methods
      • General
      • Fusion
      • Gene order
      • Co-occurrence across genomes
  • Integration and benchmarking of predictions
  • Interaction networks
  • In addition to genomic context: functional
    genomics data

58
  • In addition, STRING currently includes:
      • Functional association data from large-scale /
        high-throughput biochemical experiments
        (functional genomics data):
          • protein complex purification
          • yeast two-hybrid
          • ChIP-on-chip
          • microarray gene expression
      • Known functional relations, so-called legacy
        data, as present in PubMed abstracts and
        databases such as MIPS or KEGG

59
(No Transcript)