Title: Canadian Bioinformatics Workshops

1
Canadian Bioinformatics Workshops

www.bioinformatics.ca

3
Module 5: Gene Function Prediction

Quaid Morris
Pathway and Network Analysis of omics Data
May 2-3, 2011

http://morrislab.med.utoronto.ca
4
Outline

Concepts in gene function prediction
Guilt-by-association
Gene recommender systems
Gene function prediction use cases
Functional interaction networks
Scoring interactions by guilt-by-association
GeneMANIA and STRING
GeneMANIA demo
STRING demo

5
Using genome-wide data in the lab
ChIP-chip regulation data
Protein-protein interaction data
Genetic interaction data
?!?
Microarray expression data
6
Genomics revolution, the bad news
Genomics datasets are

noisy,
redundant,
incomplete,
mysterious,
massive

11
Google can't do biology
12
Google can't do biology
13
Guilt-by-association principle
Microarray expression data
Co-expression network
Conditions
Genes
Eisen et al. (PNAS 1998)
Fraser AG, Marcotte EM. A probabilistic view of gene function. Nat Genet. 2004 Jun;36(6):559-64
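
As a rough illustration of how a co-expression network can be derived from a genes-by-conditions expression matrix, the sketch below links two genes when their expression profiles are strongly correlated. The gene names, the values, and the 0.8 threshold are made up for illustration; real pipelines typically add normalization and keep only the top-ranked correlations per gene.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # flat profile: treat as uncorrelated
    return cov / (sx * sy)

def coexpression_network(expression, threshold=0.8):
    """Link every pair of genes whose profiles correlate at or above `threshold`."""
    genes = sorted(expression)
    edges = {}
    for i, g in enumerate(genes):
        for h in genes[i + 1:]:
            r = pearson(expression[g], expression[h])
            if r >= threshold:
                edges[(g, h)] = r
    return edges

# Toy data (hypothetical): each gene maps to its expression across 4 conditions.
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 6.2, 7.9],
    "geneC": [4.0, 1.0, 3.5, 0.5],
}
network = coexpression_network(toy_expression)
```

Here geneA and geneB rise together across conditions and become linked, while geneC follows a different pattern and stays unconnected.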
14
Two types of functional prediction

"Give me more genes like these",
e.g. find more genes in the Wnt signaling pathway, find more kinases, find more members of a protein complex
"What does my gene do?"
Goal: determine a gene's function based on who it interacts with (guilt-by-association).

15
Give me more genes like these
Input
Network and profile data
Output
from GeneMANIA
Gene recommender system
Query list
CDC48 CPR3 MCA1 TDH2
e.g., GeneMANIA, STRING, bioPIXIE (not updated)
16
"What does my gene do?" Solution 1
Input
Network and profile data
Output
Gene recommender system then enrichment analysis
Query list
CDC48
e.g., GeneMANIA, bioPIXIE
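
The enrichment-analysis step in Solution 1 can be sketched with a one-sided hypergeometric test: given the genes returned by the recommender, how surprising is their overlap with an annotated category? This is a generic sketch, not necessarily the statistic used by GeneMANIA or bioPIXIE; the function name and the example counts are hypothetical, and a real analysis would also correct for testing many categories.

```python
from math import comb

def hypergeom_enrichment_p(n_genome, n_annotated, n_list, n_overlap):
    """P(overlap >= n_overlap) when n_list genes are drawn at random.

    n_genome    -- total number of genes considered
    n_annotated -- genes in the genome carrying the annotation
    n_list      -- size of the recommended gene list
    n_overlap   -- genes in the list carrying the annotation
    """
    upper = min(n_annotated, n_list)
    total = comb(n_genome, n_list)
    tail = sum(
        comb(n_annotated, k) * comb(n_genome - n_annotated, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Made-up counts: 5 of 20 recommended genes share an annotation
# that only 40 of 6000 genes carry.
p_value = hypergeom_enrichment_p(n_genome=6000, n_annotated=40, n_list=20, n_overlap=5)
```

A small p-value suggests the annotation is over-represented among the recommended genes, which is the evidence used to propose a function for the query gene.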
17
"What does my gene do?" Solution 2
CDC48
Input
Network and profile data
Supervised learning of a classifier
Classifier
from FuncBase
(e.g. Support Vector Machine, Naïve Bayes, Neural
networks, Random Forests)
Gene annotations, e.g. Gene Ontology
18
Comparing solutions

Supervised learning
Needs gene sets for training; training is typically time-consuming and done offline, but the trained classifier is very fast
So, fast but inflexible
Gene recommender systems
Typically most computation is done online (except for the offline calculation of the composite functional interaction network, see next slide), so updating is easier and arbitrary gene sets can be used
So, a little slower but much more flexible

Note: "give me more genes like these" can also be solved with supervised learning, so long as the gene set is predefined
19
Composite functional interaction/linkage/association networks
ChIP-chip regulation data
Protein-protein interaction data
Genetic interaction data
Composite functional association network
Microarray expression data
20
Pre-computed functional interaction networks
Pre-combine networks, e.g. by simple addition or Naïve Bayes
Pavlidis et al., 2002; Marcotte et al., 1999; bioPIXIE
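
A minimal sketch of the "simple addition" option: each network is a mapping from a gene pair to a link weight, and the composite is their (optionally weighted) sum. The data layout, function name, and toy weights are assumptions for illustration, not the format used by any particular tool.

```python
def combine_networks(networks, weights=None):
    """Weighted sum of link weights across networks.

    Each network is a dict mapping an unordered gene pair (a frozenset of
    two gene names) to a link weight. With weights=None every network
    counts equally, which is plain addition.
    """
    if weights is None:
        weights = [1.0] * len(networks)
    composite = {}
    for net, w in zip(networks, weights):
        for pair, value in net.items():
            composite[pair] = composite.get(pair, 0.0) + w * value
    return composite

# Toy networks (hypothetical link weights).
coexpression = {frozenset(("CDC48", "CPR3")): 0.5, frozenset(("CPR3", "MCA1")): 0.25}
physical = {frozenset(("CDC48", "CPR3")): 0.25, frozenset(("MCA1", "TDH2")): 1.0}

composite = combine_networks([coexpression, physical])
```

A link supported by both networks ends up with a larger composite weight than a link seen in only one of them. The `weights` argument is where per-network weights would plug in if the networks should not count equally.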
21
Composite networks: One size doesn't fit all

Gene function could be a/the
Biological process,
Biochemical/molecular function,
Subcellular/Cellular localization,
Regulatory targets,
Temporal expression pattern,
Phenotypic effect of deletion.

Some networks may be better for some types of
gene function than others
22
Solution: Query-specific weights
Pavlidis et al., 2002; Lanckriet et al., 2004; Mostafavi et al., 2008
23
Two rules for network weighting

Relevance
The network should be relevant to predicting the function of interest
Test: Are the genes in the query list more often connected to one another than to other genes?
Redundancy
The network should not be redundant with other datasets; this is particularly a problem for co-expression
Test: Do the two networks share many interactions?
Caveat: Shared interactions also provide more confidence that the interaction is real.
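
The two tests can be made concrete with simple counts. The sketch below is one possible reading, assuming unweighted networks stored as sets of gene pairs: relevance compares link density inside the query list with link density between query genes and all other genes, and redundancy is the Jaccard overlap of two edge sets. The function names and toy genes are hypothetical; published weighting schemes use regression-based methods rather than these raw ratios.

```python
from itertools import combinations

def link_density(edges, pairs):
    """Fraction of the given gene pairs that are linked in `edges`."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(frozenset(p) in edges for p in pairs) / len(pairs)

def relevance(edges, query, all_genes):
    """Return (density within the query list, density from query to other genes)."""
    query = set(query)
    others = set(all_genes) - query
    within = link_density(edges, combinations(sorted(query), 2))
    across = link_density(edges, ((q, o) for q in query for o in others))
    return within, across

def redundancy(edges_a, edges_b):
    """Jaccard overlap between the edge sets of two networks."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0

# Toy example (hypothetical genes A-E; query list is A, B, C).
genes = ["A", "B", "C", "D", "E"]
net1 = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]}
net2 = {frozenset(p) for p in [("A", "B"), ("D", "E")]}

within, across = relevance(net1, ["A", "B", "C"], genes)
shared = redundancy(net1, net2)
```

In this toy case every query pair is linked in `net1` while only one of the six query-to-other pairs is, so `net1` looks relevant to the query; `net1` and `net2` share one of five distinct links, so they are not very redundant.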

24
Scoring nodes by guilt-by-association
Query list: positive examples
25
Scoring nodes by guilt-by-association
Query list: positive examples
Score
high
low
Two main algorithms
26
Node scoring algorithm details

Direct neighbour: node score depends on
Strength of links to positive examples
Number of positive neighbors
Label propagation: node score depends on
Strength of links and number of positive direct neighbors
Number of shared neighbors with positive examples
Modular structure of network
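
The difference between the two scoring schemes can be seen on a tiny network. The sketch below is a generic illustration: the direct-neighbour score sums link weights to positive examples, and the label-propagation score is computed with a simple iterative update that mixes each node's initial label with the weighted average of its neighbours' scores. The update rule, the `alpha` parameter, and the toy graph are assumptions for illustration and are not claimed to be the exact algorithm used by GeneMANIA or STRING.

```python
def direct_neighbour_scores(adj, positives):
    """Score each node by its summed link weight to positive examples."""
    return {
        node: sum(w for nbr, w in nbrs.items() if nbr in positives)
        for node, nbrs in adj.items()
    }

def label_propagation_scores(adj, positives, alpha=0.5, iterations=50):
    """Iteratively spread positive labels through the network.

    Each round, a node's score becomes a mix of its initial label
    (1 for positives, 0 otherwise) and the weighted mean of its
    neighbours' current scores. `alpha` controls how much is propagated.
    """
    initial = {n: 1.0 if n in positives else 0.0 for n in adj}
    scores = dict(initial)
    for _ in range(iterations):
        updated = {}
        for node, nbrs in adj.items():
            total = sum(nbrs.values())
            spread = (
                sum(w * scores[nbr] for nbr, w in nbrs.items()) / total
                if total else 0.0
            )
            updated[node] = (1 - alpha) * initial[node] + alpha * spread
        scores = updated
    return scores

# Toy chain (hypothetical): P - X - Y, with P the only positive example.
adj = {
    "P": {"X": 1.0},
    "X": {"P": 1.0, "Y": 1.0},
    "Y": {"X": 1.0},
}
direct = direct_neighbour_scores(adj, {"P"})
propagated = label_propagation_scores(adj, {"P"})
```

Y has no direct link to a positive example, so its direct-neighbour score is zero, but label propagation still gives it a small positive score because it sits one step beyond X.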

27
Label propagation example
Before
After
28
Three parts of GeneMANIA

A large, automatically updated collection of interaction networks.
A query algorithm to find genes and networks that are functionally associated with your query gene list.
An interactive, client-side network browser with extensive link-outs.

29
GeneMANIA data sources
Legend: minor curation, major curation
Network types:
Co-expression
Co-localization
Pathways
Physical interactions
Genetic interactions
Shared domains
Predicted interactions
Other
MGI Chemogenomics

Gene ID mappings from Ensembl and Ensembl Plant
Network/gene descriptors from Entrez-Gene and PubMed
Gene annotations from Gene Ontology, GOA, and model organism databases
30
Gene identifiers

All unique identifiers within the selected organism, e.g.
Entrez-Gene ID
Gene symbol
Ensembl ID
UniProt (primary)
also, some synonyms and organism-specific names
We use the Ensembl database for gene mappings (but we mirror it once every 3 months, so sometimes we are out of date)

31
Current status

Six organisms
Human, mouse, yeast, worm, fly, A. thaliana; rat coming soon
1,250 networks (about 50 co-expression, 35
physical interaction)
Web network browser

32
Cytoscape plugin
http://www.genemania.org/plugin/
35
QueryRunner
36
http://cytoscapeweb.cytoscape.org/
37
STRING: http://string-db.org/
38
STRING results
39
STRING results
40
GeneMANIA vs STRING

STRING (2003-present)
Large organism coverage
Protein focused
Uses eight pre-computed networks
Heavy use of phylogeny to infer functional
interactions, also contains text mining derived
interactions
Uses direct interaction to score nodes
Link weights are the probability of a functional interaction
GeneMANIA webserver (2010-present)
Covers 6 (not 7) major model organisms (but can
add more with plugin)
Gene focused
Thousands of networks, weights are not
pre-computed, can upload your own network
Relies heavily on functional genomic data so has
genetic interactions, phenotypic info, chemical
interactions
Allows enrichment analysis
Uses label propagation to score nodes

41
Meaning of GeneMANIA link weights
Simple intuition: Sum of link weights to neighbors in each data source is 100
Weight 50
Weight 25
Precise definition: Weight = 100 x 1/sqrt(number of neighbours of node 1) x 1/sqrt(number of neighbours of node 2)
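
The precise definition can be checked against the two example weights on this slide. The short sketch below assumes that the weight-50 link joins two nodes with two neighbours each and the weight-25 link joins two nodes with four neighbours each; those neighbour counts are inferred from the formula rather than stated on the slide, and the function name is hypothetical.

```python
import math

def link_weight(n_neighbours_1, n_neighbours_2):
    """Weight = 100 x 1/sqrt(neighbours of node 1) x 1/sqrt(neighbours of node 2)."""
    return 100 * (1 / math.sqrt(n_neighbours_1)) * (1 / math.sqrt(n_neighbours_2))

# Assumed neighbour counts that reproduce the example weights.
weight_two_neighbours = round(link_weight(2, 2), 6)   # two neighbours each
weight_four_neighbours = round(link_weight(4, 4), 6)  # four neighbours each
```

When every node in a data source has the same number of neighbours, each node's links sum to 100, which matches the simple intuition stated above.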
42
GeneMANIA future directions