1
Canadian Bioinformatics Workshops
  • www.bioinformatics.ca

2
Module Title of Module
3
Module 5: Gene Function Prediction
  • Quaid Morris
  • Pathway and Network Analysis of -omics Data
  • May 2-3, 2011

http://morrislab.med.utoronto.ca
4
Outline
  • Concepts in gene function prediction
  • Guilt-by-association
  • Gene recommender systems
  • Gene function prediction use cases
  • Functional interaction networks
  • Scoring interactions by guilt-by-association
  • GeneMANIA and STRING
  • GeneMANIA demo
  • STRING demo

5
Using genome-wide data in the lab
ChIP-chip regulation data
Protein-protein interaction data
Genetic interaction data
?!?
Microarray expression data
6
Genomics revolution, the bad news
Genomics datasets are
  • noisy,
  • redundant,
  • incomplete,
  • mysterious,
  • massive

7
(No Transcript)
8
(No Transcript)
9
(No Transcript)
10
(No Transcript)
11
Google can't do biology
12
Google can't do biology
13
Guilt-by-association principle
Microarray expression data
Co-expression network
Conditions
Genes
Eisen et al. (PNAS 1998)
Fraser AG, Marcotte EM. A probabilistic view of
gene function. Nat Genet. 2004 Jun;36(6):559-64
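A co-expression network of the kind pictured on this slide can be sketched as follows. This is only an illustration, assuming Pearson correlation as the similarity measure and an arbitrary cut-off of 0.8; the gene names and expression values are invented, and the slide does not specify how the pictured network was built.

```python
from itertools import combinations
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sx * sy)

def coexpression_network(profiles, threshold=0.8):
    """Link every pair of genes whose profiles correlate at or above threshold.

    profiles: dict mapping gene name -> list of expression values,
              one value per condition (hypothetical input format).
    Returns a dict mapping (gene1, gene2) -> correlation.
    """
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if r >= threshold:
            edges[(g1, g2)] = r
    return edges

# Invented toy data: rows are genes, columns are conditions.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 1.0, 3.5, 0.5],
}
network = coexpression_network(profiles)
```

In this toy data only geneA and geneB rise and fall together, so they are the only linked pair. Under guilt-by-association, a known function of one would be a candidate function for the other.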
14
Two types of functional prediction
  • "Give me more genes like these"
  • e.g. find more genes in the Wnt signaling
    pathway, find more kinases, find more members of
    a protein complex
  • "What does my gene do?"
  • Goal: determine a gene's function based on who it
    interacts with (guilt-by-association).

15
Give me more genes like these
Input
Network and profile data
Output
from GeneMANIA
Gene recommender system
Query list
CDC48 CPR3 MCA1 TDH2
e.g., GeneMANIA, STRING, bioPIXIE (not updated)
16
"What does my gene do?", Solution 1
Input
Network and profile data
Output
Gene recommender system then enrichment analysis
Query list
CDC48
e.g., GeneMANIA, bioPIXIE
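The "enrichment analysis" step is commonly carried out with a one-sided hypergeometric test. The slide does not say which test GeneMANIA or bioPIXIE use, so the following is only a minimal sketch under that assumption; the function name and the numbers in the example are invented.

```python
from math import comb

def hypergeom_pvalue(k, K, n, N):
    """Probability of seeing at least k annotated genes by chance.

    N: total number of genes in the background
    K: number of background genes annotated with the function
    n: size of the gene list returned by the recommender
    k: number of genes in that list carrying the annotation
    """
    total = comb(N, n)
    upper = min(K, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible overlaps contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total

# Toy example: 10 genes in total, 4 annotated, list of 3, 2 of them annotated.
p = hypergeom_pvalue(k=2, K=4, n=3, N=10)
```

A small p-value for a Gene Ontology term suggests that the genes recommended for CDC48 share that annotation more often than chance would explain, which makes the term a candidate function for the query gene.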
17
"What does my gene do?", Solution 2
CDC48
Input
Network and profile data
Supervised learning of a classifier
Classifier
from FuncBase
(e.g. Support Vector Machine, Naïve Bayes, Neural
networks, Random Forests)
Gene annotations, e.g. Gene Ontology
18
Comparing solutions
  • Supervised learning
  • Needs gene sets for training; training is typically
    time-consuming and done offline, but the trained
    classifier is very fast
  • So, fast but inflexible
  • Gene recommender systems
  • Typically most computation is done online (except
    for the offline calculation of the composite functional
    interaction network, see next slide), so
    updating is easier and arbitrary gene sets can be
    used
  • So, a little slower but much more flexible

Note: "give me more genes like these" can also be solved
with supervised learning, so long as the gene
set is predefined
19
Composite functional interaction/linkage/association networks
ChIP-chip regulation data
Protein-protein interaction data
Genetic interaction data
Composite functional association network
Microarray expression data
20
Pre-computed functional interaction networks
Pre-combine networks, e.g. by simple addition or
Naïve Bayes
Pavlidis et al, 2002; Marcotte et al, 1999;
bioPIXIE
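Combining networks "by simple addition" can be sketched as summing edge weights across data sources. The representation (a dict from gene pair to weight), the optional per-network weights, and all values below are assumptions made for illustration, not the procedure of any of the cited systems.

```python
from collections import defaultdict

def combine_networks(networks, weights=None):
    """Add edge weights across networks into one composite network.

    networks: list of dicts mapping (gene1, gene2) -> edge strength
    weights:  optional per-network multipliers (default 1.0 each,
              which gives plain addition)
    """
    if weights is None:
        weights = [1.0] * len(networks)
    combined = defaultdict(float)
    for net, w in zip(networks, weights):
        for (a, b), strength in net.items():
            edge = tuple(sorted((a, b)))  # treat edges as undirected
            combined[edge] += w * strength
    return dict(combined)

# Invented toy networks with hypothetical gene names.
coexpression = {("geneA", "geneB"): 0.5, ("geneB", "geneC"): 0.2}
physical = {("geneB", "geneA"): 1.0}
combined = combine_networks([coexpression, physical])
```

Because the combination is computed once and stored, every later query uses the same composite network regardless of which function is being asked about.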
21
Composite networks: One size doesn't fit all
  • Gene function could be a/the
  • Biological process,
  • Biochemical/molecular function,
  • Subcellular/Cellular localization,
  • Regulatory targets,
  • Temporal expression pattern,
  • Phenotypic effect of deletion.

Some networks may be better for some types of
gene function than others
22
Solution: Query-specific weights
Pavlidis et al, 2002; Lanckriet et al, 2004;
Mostafavi et al, 2008
23
Two rules for network weighting
  • Relevance
  • The network should be relevant to predicting the
    function of interest
  • Test: Are the genes in the query list more often
    connected to one another than to other genes?
  • Redundancy
  • The network should not be redundant with other
    datasets (particularly a problem for
    co-expression)
  • Test: Do the two networks share many interactions?
  • Caveat: Shared interactions also provide more
    confidence that the interaction is real.
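The two tests can be made concrete with simple proxies. The sketch below is an assumption for illustration only: relevance is measured as the share of query-touching edge weight that stays inside the query list, and redundancy as the Jaccard overlap of two edge sets. The cited papers use their own, more elaborate weighting schemes.

```python
def relevance(network, query):
    """Share of edge weight touching the query that links two query genes."""
    query = set(query)
    within = touching = 0.0
    for (a, b), w in network.items():
        a_in, b_in = a in query, b in query
        if a_in and b_in:
            within += w
        if a_in or b_in:
            touching += w
    return within / touching if touching else 0.0

def redundancy(net1, net2):
    """Jaccard overlap between the (undirected) edge sets of two networks."""
    e1 = {frozenset(edge) for edge in net1}
    e2 = {frozenset(edge) for edge in net2}
    union = e1 | e2
    return len(e1 & e2) / len(union) if union else 0.0

# Invented toy networks; g1..g6 are hypothetical genes.
net_a = {("g1", "g2"): 1.0, ("g2", "g3"): 1.0, ("g3", "g4"): 1.0}
net_b = {("g2", "g1"): 1.0, ("g3", "g5"): 1.0, ("g1", "g6"): 1.0}
query = ["g1", "g2", "g3"]

rel_a = relevance(net_a, query)   # 2 of 3 query-touching edges stay in the query
rel_b = relevance(net_b, query)   # 1 of 3 query-touching edges stays in the query
overlap = redundancy(net_a, net_b)  # 1 shared edge out of 5 distinct edges
```

For this query net_a would deserve a higher weight than net_b, and the low overlap suggests the two networks carry mostly independent evidence.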

24
Scoring nodes by guilt-by-association
Query list: positive examples
25
Scoring nodes by guilt-by-association
Query list: positive examples
Score
high
low
Two main algorithms
26
Node scoring algorithm details
  • Direct neighbour: node score depends on
  • Strength of links to positive examples
  • # of positive neighbors
  • Label propagation: node score depends on
  • Strength of links and # of positive direct
    neighbors
  • # of shared neighbors with positive examples
  • modular structure of network
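The difference between the two scoring approaches can be seen in a small sketch. The propagation scheme below is a generic iterative one (each node repeatedly mixes its own label with the weighted average of its neighbours' scores); it is an assumption chosen for brevity, not the exact algorithm of GeneMANIA or any other tool, and the mixing parameter `alpha`, the iteration count, and the toy network are invented.

```python
def build_adjacency(edges):
    """Turn {(a, b): weight} into a symmetric adjacency dict."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj

def direct_neighbour_scores(adj, positives):
    """Score = total link strength to positive (query) genes."""
    positives = set(positives)
    return {node: sum(w for nb, w in nbrs.items() if nb in positives)
            for node, nbrs in adj.items()}

def label_propagation_scores(adj, positives, alpha=0.5, iterations=50):
    """Iteratively spread the positive labels through the network."""
    positives = set(positives)
    labels = {n: 1.0 if n in positives else 0.0 for n in adj}
    scores = dict(labels)
    for _ in range(iterations):
        updated = {}
        for node, nbrs in adj.items():
            total = sum(nbrs.values())
            spread = (sum(w * scores[nb] for nb, w in nbrs.items()) / total
                      if total else 0.0)
            # Keep part of the original label, take the rest from neighbours.
            updated[node] = (1 - alpha) * labels[node] + alpha * spread
        scores = updated
    return scores

# Invented chain network q - a - b - c, with q as the only positive example.
adj = build_adjacency({("q", "a"): 1.0, ("a", "b"): 1.0, ("b", "c"): 1.0})
direct = direct_neighbour_scores(adj, ["q"])
propagated = label_propagation_scores(adj, ["q"])
```

Direct neighbour scoring gives b and c a score of zero because neither touches the query gene, whereas label propagation still ranks them (a above b above c) according to how close they sit to it in the network.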

27
Label propagation example
Before
After
28
Three parts of GeneMANIA
  • A large, automatically updated collection of
    interaction networks.
  • A query algorithm to find genes and networks that
    are functionally associated with your query gene
    list.
  • An interactive, client-side network browser with
    extensive link-outs

29
GeneMANIA data sources
Legend
Network types
minor curation
major curation
Co-expression
  • Gene ID mappings from Ensembl and Ensembl Plant
  • Network/gene descriptors from Entrez-Gene and
    Pubmed
  • Gene annotations from Gene Ontology, GOA, and
    model org. databases

Co-localization
Pathways
Physical interactions
Genetic interactions
Shared domains
Predicted interactions
Other
MGI Chemogenomics
30
Gene identifiers
  • All unique identifiers within the selected
    organism, e.g.:
  • Entrez-Gene ID
  • Gene symbol
  • Ensembl ID
  • Uniprot (primary)
  • also, some synonyms and organism-specific names
  • We use the Ensembl database for gene mappings (but we
    mirror it only once every 3 months, so sometimes we are
    out of date)

31
Current status
  • Six organisms
  • Human, mouse, yeast, worm, fly, A. thaliana; rat
    coming soon
  • 1,250 networks (about 50 co-expression, 35
    physical interaction)
  • Web network browser

32
Cytoscape plugin
http://www.genemania.org/plugin/
33
(No Transcript)
34
(No Transcript)
35
QueryRunner
36
http://cytoscapeweb.cytoscape.org/
37
STRING: http://string-db.org/
38
STRING results
39
STRING results
40
GeneMANIA vs STRING
  • STRING (2003-present)
  • Large organism coverage
  • Protein focused
  • Uses eight pre-computed networks
  • Heavy use of phylogeny to infer functional
    interactions, also contains text mining derived
    interactions
  • Uses direct interaction to score nodes
  • Link weights are the probability of a functional interaction
  • GeneMANIA webserver (2010-present)
  • Covers 6 (not 7) major model organisms (but can
    add more with plugin)
  • Gene focused
  • Thousands of networks, weights are not
    pre-computed, can upload your own network
  • Relies heavily on functional genomic data so has
    genetic interactions, phenotypic info, chemical
    interactions
  • Allows enrichment analysis
  • Uses label propagation to score nodes

41
Meaning of GeneMANIA link weights
Simple intuition: Sum of link weights to
neighbors in each data source is 100
Weight = 50
Weight = 25
Precise definition: Weight = 100 x 1/sqrt(# of
neighbours of node 1) x 1/sqrt(# of neighbours of
node 2)
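The precise definition can be checked against the two example weights on the slide. The sketch assumes the symbol dropped from the transcript before "of neighbours" is "#" (number of); the function name is invented.

```python
from math import sqrt

def link_weight(n_neighbours_1, n_neighbours_2):
    """Weight = 100 x 1/sqrt(# neighbours of node 1) x 1/sqrt(# neighbours of node 2)."""
    return 100.0 / (sqrt(n_neighbours_1) * sqrt(n_neighbours_2))

w_two = link_weight(2, 2)   # two nodes with 2 neighbours each: about 50
w_four = link_weight(4, 4)  # two nodes with 4 neighbours each: about 25
```

When a node and all of its neighbours have the same number of neighbours n, each link weighs 100/n, so the weights around that node sum to 100, which matches the simple intuition.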
42
GeneMANIA future directions
  • Rat (1-3 weeks), next is probably E. coli
  • Non-coding genes (miRNAs!!!!)
  • Regulatory networks (ChIP, RNA-protein,
    miRNA-mRNAs)
  • More phenotypic information (OMIM, etc.)
  • Orthology mapping for inferring interologs

43
GeneMANIA URLs
  • Main site (stable but still fun)
  • http://www.genemania.org
  • Beta site (new and edgy but possibly unreliable)
  • http://beta.genemania.org

44
  • We are on a Coffee Break Networking Session