Title: InnateDB
1 InnateDB: Facilitating Systems-Level Analyses
of the Mammalian Innate Immune Response
David Lynn, M.Sc., Ph.D., Research Associate,
Brinkman Lab, Simon Fraser University, and
Hancock Lab, University of British
Columbia. InnateDB Data Analysis Workshop,
UBC, Vancouver, April 2nd-3rd, 2008. Updated
Sept. 2009.
2 Systems Biology Approaches to Investigating the
Innate Immune Response
- Progress has been made in understanding
the innate immune response, including the detailed
dissection of some of the critical signaling
pathways involved.
- It is now becoming clear that the innate immune
response does not involve simple linear pathways
but rather complex networks of pathways and
interactions, negative feedback loops and
multifaceted transcriptional responses.
- To better understand the complexities of the
innate immune response and the cross-talk between
its components, complementary systems-level
analyses and more focused follow-up experimental
approaches are now needed.
3 InnateDB: Developed in the Context of Two Large
International Systems Biology Projects
Mouse Model Datasets: Cerebral Malaria mouse
model (IMR, Australia); Tuberculosis mouse model
(AECM); Shigella xenograft model (Pasteur).
Human Clinical Datasets: Typhoid and Malaria, Vietnam
(OUCRU/Stanford/Sanger); Non-Typhoidal Salmonella,
Malawi (Sanger); Chronic/Acute Helminth, Ecuador
(USF de Quito/Sanger).
Modulating the innate immune response via Host
Defense peptides (Hancock lab, UBC); Mouse KOs
(Sanger).
Novel insight into host response and mechanism of
peptides. Common pathways, networks and
transcriptional regulation?
4 Why Systems Approaches are Needed
- Many layers of complexity
- Layers of regulation
- 100s to 1000s of DE genes
- Not simple pathways but networks of molecular
interactions.
- Gardy, Lynn, Brinkman, Hancock.
Enabling a systems biology approach to
immunology: focus on innate immunity.
Trends in Immunology, June 2009.
5 The Need for InnateDB and the Manual Curation of
Innate Immunity Relevant Molecular Interactions and
Pathways.
- It quickly became apparent that available resources
provided poor coverage and detail of the
molecular interactions and pathways relevant to
innate immunity.
- This information is essential for the
systems-oriented interpretation of large-scale
genomics data.
- TLR4, one of the most important molecules in the
innate immune response, has relatively few
molecular interactions annotated in the major
publicly available interaction DBs.
- 5 of these DBs combined contained annotated
molecular interactions between TLR4 and just 11
other proteins.
- Through a review of the literature we have
curated, in detail, a further 16 unique
interactions, and provided annotation of nearly
60 different lines of evidence supporting these
interactions.
- Relatively new pathways (NLR, RLR pathways) were not
annotated at all in major pathway databases.
- Few resources were available for analysis of data in a
pathway/network context that were accessible to a
biologist, and none were specific to innate immunity.
6 Overview of InnateDB Project (www.innatedb.ca)
- InnateDB (www.innatedb.ca) is a database of all
human and mouse experimentally-verified
interactions and pathways (and their component molecules:
genes/proteins/RNAs).
- Particular emphasis on the contextual manual
curation of interactions involved in innate
immunity (10,000 interactions).
- InnateDB facilitates systems-level analyses of
mammalian signaling through integrated
bioinformatics and visualization tools: pathway and
ontology analysis, network construction and
analysis, orthologs, Cerebral, Cytoscape, CyOOg,
etc.
- The manual curation project and the integration of publicly
available databases into InnateDB greatly
increase the innate immunity relevant molecular
interaction networks and pathways available.
- Enables biologists without a computational
background to explore their data in a more
systems-oriented, yet user-friendly, manner.
7 Contextually Curating Innate Immunity-Relevant
Interactions
- Manual curation of > 10,000 innate immune-relevant
interactions (human and mouse).
- Involving 2,700 genes from review of 2,600
unique publications.
- We can often double the number of interactions for a given
gene.
- Pathways and interactions are curated with
contextual annotations (supporting publication; participant molecules;
the species; the interaction detection method;
the host system; the interaction type; the cell,
cell-line and tissue types; etc.).
- Developed InnateDB submission system software to
allow submission of interaction annotation in an
ontology-controlled and MIMIx and PSI-MI 2.5
compliant manner.
- Developed curator tool software to allow curators
to modify existing annotations.
8 Going Beyond Innate Immunity: A Centralized
Resource for Interactions and Pathways
- Aside from the well-known signalling pathways, a
range of other disparate processes, including
apoptosis, ubiquitination, endocytosis, cell
activation and recruitment, are all required to
mount an effective innate immune response.
- Adding to this complexity, borders between the
innate and adaptive immune responses are becoming
increasingly blurred.
- Furthermore, if we hope to identify new networks
or pathways involved in innate immunity, analyses
must include genes and proteins that are, as yet,
not known to play specific roles in the innate
immune response.
- To address these issues, InnateDB also
incorporates data on the entire human and mouse
interactomes.
9 Going Beyond Innate Immunity: An Integrative
Biology Resource
- 115,000 human and mouse interactions extracted and
loaded from the BIND, INTACT, DIP, BIOGRID and MINT
DBs.
- Cross-referenced genes to > 3,000 pathways from the
KEGG, PID, BIOCARTA, INOH, NetPath and Reactome
DBs.
- Allows one to visualize/analyze interactions
associated with a specific pathway.
- Pathway ORA.
- Annotation from Ensembl provides details of human and
mouse genes, transcripts and proteins.
- UniProt, Entrez, Gene Ontology: rich protein and
gene annotation.
10 Through manual curation and integration of existing
data from publicly available databases we can
greatly increase innate immunity relevant networks
TLR4 direct and secondary interactions annotated
by InnateDB
TLR4 direct and secondary interactions annotated
by MINT Database
11 Direct and Secondary Interactions of TLR4 in
InnateDB (20 of these interactions unique to
InnateDB)
12 www.innatedb.ca
13 InnateDB: Advanced Yet User-Friendly Searching.
Find and Analyze Relevant Interactions, Pathways and
Genes/Proteins.
14 InnateDB: Facilitating Systems-Level Analyses of
Gene Expression Data
Upload Your Own Gene Expression Data: Up to 10
conditions/timepoints at one time.
Overlay Gene Expression Data from Multiple
Conditions on Networks/Pathways.
Pathway, Gene Ontology and TF ORA tools: Find DE
Pathways/Functionally Related Genes/TFs.
Go Beyond Pathway Analysis: Differentially
Expressed Sub-networks. New Pathways? How Are DE
Genes Actually Inter-connected? Central
Regulators (Network Hubs).
15 Pathway Analysis: Any Type of Quantitative Data.
Orthologous Pathways
GWA Candidate Associated Genes
- InnateDB pathway analysis
- identify over-represented (OR) pathways.
- highlight potentially unknown relationships
between markers on different chromosomes.
16 Constructing and Analyzing Networks Using InnateDB
- Pathway analysis can be very powerful in
determining which annotated pathways are most
significantly associated with DE genes.
- Network analysis: move from a simple view of the
signaling response to a more comprehensive
analysis of the molecular interactions between DE
genes and their encoded proteins and RNAs.
- Potentially uncover as yet unknown signaling
cascades or pathways, functionally relevant
sub-networks and the central molecules, or hubs,
of these networks.
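As a rough illustration of what finding "hubs" can mean, the sketch below ranks molecules by degree, i.e. the number of distinct interaction partners. This is only one simple notion of centrality and is not taken from InnateDB itself; the function name `rank_hubs` and the edge list are hypothetical.

```python
from collections import defaultdict

def rank_hubs(interactions, top_n=3):
    """Rank molecules by degree (number of distinct interaction partners)."""
    partners = defaultdict(set)
    for a, b in interactions:
        if a == b:  # ignore self-interactions when counting partners
            continue
        partners[a].add(b)
        partners[b].add(a)
    # Sort by descending degree, then by name so ties are deterministic.
    ranked = sorted(partners.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(gene, len(neigh)) for gene, neigh in ranked[:top_n]]

# Hypothetical edge list, for illustration only (not InnateDB output).
example_edges = [
    ("TLR4", "MYD88"), ("TLR4", "TIRAP"), ("TLR4", "LY96"),
    ("MYD88", "IRAK4"), ("MYD88", "IRAK1"), ("IRAK1", "TRAF6"),
    ("IRAK4", "IRAK1"),
]
hubs = rank_hubs(example_edges)
```

Degree is cheap to compute and easy to interpret; other centrality measures may be more appropriate for larger networks.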
17 Results: Visualize Gene Expression Data in an
Interaction Network Context
18 Multi-experiment View in Cerebral
19 Robust Orthology and Gene Order Predictions
Facilitating Comparative Analysis
- The majority of mammalian interaction data available
in InnateDB and other interaction databases
primarily refers to human genes and proteins.
- To facilitate comparative network-based analysis
of the human, mouse and bovine interactomes,
detailed orthology predictions have been
integrated into InnateDB.
- Orthology predictions are generated using an in-house
method, Ortholuge, which provides accurate
predictions of orthology using a phylogenetic
distance-based approach.
- Orthology predictions are further supported
through the development of a human and mouse gene
order and synteny browser.
20 A Guide to Using InnateDB
21 InnateDB: User-Friendly Interface (www.innatedb.ca)
23 Not sure what you want to search for? Browse
InnateDB by Interaction Type, Pathway or Various
Immune Gene Lists
24 All InnateDB Interactions Can be Downloaded in
Proteomics Standards Initiative (PSI) 2.5 XML
Format
25 Resources Page: Details of Relevant Software,
Databases, and Immune Gene Lists
26 Statistics on Curated Interactions and Interactions
from other Databases
27 Use the contact form or send email to
innatedb-mail_at_sfu.ca to report bugs or errors, or to
get involved in curation.
28 Documentation, Tutorials and Help
29 Searching InnateDB
30 Do a simple search for genes, proteins or
interactions of interest on the InnateDB homepage,
e.g. IRAK genes.
31 Advanced Search for Genes and Proteins
32 Advanced Search for Interactions
InnateDB contains detailed information for more
than 115,000 human and mouse molecular
interactions integrated from several of the major
public interaction databases along with 10,000
manually-curated innate immunity relevant
interactions.
To reduce redundancy, interactions in InnateDB
that have the same participants and interaction
type are grouped together by default. Choose 'No'
to return all redundant interactions separately.
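The grouping rule described above can be sketched as follows. This is a minimal illustration, not InnateDB's implementation: the record layout (`participants`, `type`, `source`) and the source names are hypothetical, and treating A-B and B-A as the same pair is an assumption.

```python
from collections import defaultdict

def group_interactions(records):
    """Group records sharing the same participants and interaction type."""
    groups = defaultdict(list)
    for rec in records:
        # frozenset makes the participant pair order-independent (A-B == B-A).
        key = (frozenset(rec["participants"]), rec["type"])
        groups[key].append(rec)
    return groups

# Hypothetical records, for illustration only.
records = [
    {"participants": ("TLR4", "MYD88"), "type": "physical association", "source": "db_a"},
    {"participants": ("MYD88", "TLR4"), "type": "physical association", "source": "db_b"},
    {"participants": ("TLR4", "MYD88"), "type": "phosphorylation", "source": "db_a"},
]
grouped = group_interactions(records)
```

Here the first two records collapse into one group, while the third stays separate because its interaction type differs. Returning every record ungrouped corresponds to choosing 'No'.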
33 Search for Particular Interactions or Genes that
are in a Specific Pathway
34 Search Results: searching for genes of interest,
e.g. IRAK
36 Interaction Results Page.
38 Visualize Interactions in a subcellular
localization-based layout using the Cerebral
plugin for Cytoscape.
39 How a biologist thinks of a pathway.
40 Pathway Visualization in Cytoscape
41 Pathway Visualization using Cerebral:
www.pathogenomics.ca/cerebral (Bioinformatics
2007)
42 A Quick Guide to Using Cerebral in InnateDB
- Cerebral can be used to visualize interaction
networks from a set of interactions from
InnateDB.
- Cerebral uses subcellular localization
annotations to provide more biologically
intuitive pathway-like layouts of interaction
networks.
- Note: the subcellular localizations in Cerebral
should only be used as a guide. There are many
proteins with no annotated subcellular
localizations and many others that have multiple
possible localizations (only 1 will be shown;
nuclear, extracellular and membrane localizations
will take precedence over cytoplasm if there are
multiple).
- InnateDB batch searching allows users to upload a
list of genes along with associated gene
expression data from up to 4 different
conditions.
- Gene expression data can be overlaid on network
data and you can visualize this in Cerebral.
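The localization precedence rule in the note above can be sketched as a small function. This is not Cerebral's code. The label strings and the name `displayed_localization` are hypothetical, and the tie-break among nuclear, extracellular and membrane annotations (first one listed wins) is an assumption, since the slides do not say how such ties are resolved.

```python
# Localizations that take precedence over cytoplasm (label strings assumed).
PREFERRED = ("nucleus", "extracellular", "membrane")

def displayed_localization(annotations):
    """Pick the single localization to display for a protein.

    Returns None when nothing is annotated.
    """
    if not annotations:
        return None
    for loc in annotations:
        if loc in PREFERRED:
            # Assumption: the first preferred annotation listed is used.
            return loc
    # Only cytoplasm (or other unlisted compartments) remain.
    return annotations[0]
```

For example, a protein annotated as both `"cytoplasm"` and `"nucleus"` would be shown in the nucleus layer under this rule.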
43 Opening Interaction Data in Cerebral from an
Interaction Results page in InnateDB.
- You will be prompted to open a .jnlp file.
- You are recommended to save this file to your
computer and then open it; this will allow you to
save a copy of this dataset.
- Opening the .jnlp file directly without saving
sometimes causes Cerebral to hang when loading
large datasets.
- Note: to use Cerebral you need to install Java
version 6 or greater.
- You can get this from http://java.com/en/download/index.jsp
44 Opening Cerebral
- Cerebral is a Java plugin for the Cytoscape
visualization software.
- When you open the .jnlp file Cytoscape will begin
downloading.
- You will then be prompted "Do you want to run
the application?"; click Run.
45 Cerebral is Now Open and Displays Interactions
Based on Protein Subcellular Localizations
46 Re-size the Network
Click here to re-size the network display to
full-screen.
47 Navigating in Cerebral
- Right-click and push your mouse forward or back
to zoom.
- Hold the middle button of your mouse and drag to
navigate around the network.
- Grey nodes do not have an annotated subcellular
localization (from Gene Ontology data in
InnateDB).
- Lines connecting nodes represent interactions.
Dashed lines have only 1 supporting publication
in InnateDB. The thicker the line, the more
publications support the interaction.
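The edge-drawing convention described above (dashed for a single supporting publication, thicker with more evidence) can be expressed as a small mapping. This is an illustrative sketch only, not Cerebral's code: the function name `edge_style` and the exact width formula and cap are assumptions.

```python
def edge_style(publication_count):
    """Return a (line_style, width) pair for an interaction edge."""
    if publication_count < 1:
        raise ValueError("an interaction needs at least one supporting publication")
    # A single supporting publication is drawn dashed; more evidence is solid.
    line_style = "dashed" if publication_count == 1 else "solid"
    # Assumed width mapping: grows with evidence, capped so lines stay readable.
    width = min(1 + publication_count, 6)
    return line_style, width
```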
48 Interactively Link back to InnateDB to Look up
Information on Particular Genes/Interactions of
Interest.
- Right-click on a node (protein/gene) or edge
(interaction line) to link to the relevant gene
or interaction details page in InnateDB.
49 Nodes Can be Dragged to Other Layers as Desired.
50 Do a simple search for genes, proteins or
interactions of interest on the InnateDB homepage,
e.g. IRAK genes.
51 View Detailed Gene Annotation.
52 Gene Details Page.
53 Gene Details Page: Molecular Interactions and Gene
Ontology Annotation.
54 Integrated Orthology and Gene Order Information
55 Human/Mouse Conserved Gene Order and Synteny Browser
56 Gene Details Page: Associated Pathways
57 Gene Details Page: Cross-references to other
Databases
58 Integrating Gene Expression Data in a Molecular
Interaction Network and Pathway Context
59 InnateDB: Integrating Gene Expression Data in a
Molecular Interaction Network and Pathway Context
Integrated Gene Expression Data with: Molecular
interaction data; Pathway associations; Rich gene
annotation
Batch Search of InnateDB
Microarray Data -> Differentially Expressed Genes
60 Orthologous Interaction Networks
- Detailed protein/gene interaction data are mainly
available for human.
- Can use InnateDB ortholog predictions in mouse
and cow.
- Build the hypothetical orthologous interaction
network for genes of interest in these species.
- Find associations to pathways for orthologous
genes, e.g. map pathways to mouse genes based on
human orthology.
- Predict potential differences in different
species, e.g. a missing orthologous gene in one
species may indicate reliability as a model
organism for the network of interest.
- Compare orthologous predicted networks to
experimental data, e.g. in mouse.
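Building a hypothetical orthologous network amounts to mapping each human interaction through an ortholog table and noting where the mapping fails. The sketch below assumes a simple one-to-one human-to-mouse mapping held in a dictionary; the function name, the mapping and the placeholder gene `GENE_X` are hypothetical and do not reflect InnateDB's data model.

```python
def project_network(human_edges, orthologs):
    """Project human interactions onto another species via ortholog pairs.

    Returns (predicted_edges, human_genes_without_ortholog).
    """
    predicted, missing = set(), set()
    for a, b in human_edges:
        mapped_a, mapped_b = orthologs.get(a), orthologs.get(b)
        for human_gene, mapped in ((a, mapped_a), (b, mapped_b)):
            if mapped is None:
                missing.add(human_gene)  # no predicted ortholog in the target species
        if mapped_a is not None and mapped_b is not None:
            predicted.add(frozenset((mapped_a, mapped_b)))
    return predicted, missing

# Hypothetical inputs, for illustration only.
human_edges = [("TLR4", "MYD88"), ("MYD88", "IRAK4"), ("TLR4", "GENE_X")]
human_to_mouse = {"TLR4": "Tlr4", "MYD88": "Myd88", "IRAK4": "Irak4"}
mouse_edges, missing = project_network(human_edges, human_to_mouse)
```

The `missing` set is the interesting by-product: genes with no ortholog flag parts of the network that may not be conserved in the model organism.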
61 Example Tab-delimited File
62 Upload Gene/Protein List to InnateDB Along with
Any Associated Quantitative Data
Select a file to upload by clicking on the
"Upload File" button: upload a tab-delimited
file of protein/gene identifiers or accession
numbers and obtain a list of all genes, proteins,
pathways, interactors or interactions that they
are associated with. Alternatively, click on the
"Web Form" button and paste your tab-delimited
data in the text box (max. 1000 lines).
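Before uploading, it can help to check that a file really is tab-delimited and small enough for the web form. The sketch below is an illustration under stated assumptions: the column names (`gene_id`, `fold_change`, `p_value`), the placeholder identifiers, and the presence of a header row are all hypothetical, since the transcript does not preserve the example file's layout. Only the 1000-line limit comes from the slide, and counting every line (including the header) toward it is an assumption.

```python
import csv
import io

WEB_FORM_MAX_LINES = 1000  # limit stated for the "Web Form" text box

def parse_upload(text):
    """Parse tab-delimited upload text into a list of row dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        rows.append({
            "gene_id": row["gene_id"],
            "fold_change": float(row["fold_change"]),
            "p_value": float(row["p_value"]),
        })
    return rows

def fits_web_form(text):
    """True if the text is within the web form's line limit."""
    return len(text.splitlines()) <= WEB_FORM_MAX_LINES

# Placeholder identifiers and values, for illustration only.
example = (
    "gene_id\tfold_change\tp_value\n"
    "ID_0001\t2.4\t0.001\n"
    "ID_0002\t-1.8\t0.03\n"
)
rows = parse_upload(example)
```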
67 Results: Visualize Gene Expression Data in an
Interaction Network Context
68 Multi-experiment View in Cerebral
Click on one of the mini-windows to view data for
that condition in the large window.
69 Cerebral Multi-Array Viewer
70 Cerebral Multi-Array Viewer
71 Interactively Link back to InnateDB to Look up
Information on Particular Genes/Interactions of
Interest.
72 Pathway Over-representation Analysis
73 Return Pathways Associated with Uploaded Gene
List
- To do pathway over-representation analysis (ORA)
you first need to upload a list of gene
identifiers and associated fold-change in gene
expression values (and P values) as described
above.
- InnateDB recommends that you upload all genes
from your array dataset, not just differentially
expressed (DE) genes (probes mapping to multiple
different genes should be removed). The pathway
ORA tool uses the proportion of DE genes on the
whole array to determine if a particular pathway
is significant.
- As the above method can be very conservative due
to the large number of tests performed, InnateDB
also provides users with the option of uploading
a subset of genes and performing the pathway
ORA. This subset analysis uses a slightly
different algorithm that does not take gene
expression values into account. This is necessary
as the algorithm does not know the proportion of
DE genes on the array. Therefore, this analysis
cannot handle data from multiple conditions.
- If you have multiple probes for the same gene
these values will be averaged for the purposes of
the pathway ORA.
- Because InnateDB sources its pathway data from
multiple databases, each with its own
interpretations of the components of a given
pathway, you will observe some degree of
duplication in the results; however, this is
outweighed by the extra annotation that can be
obtained from different data sources.
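Two of the steps in the bullets above can be sketched in a few lines: averaging probes that map to the same gene, and testing whether a pathway contains more DE genes than the proportion on the whole array would predict. The test shown is a standard one-sided hypergeometric test. Whether InnateDB computes it exactly this way is an assumption, and the function names are hypothetical.

```python
from collections import defaultdict
from math import comb

def average_probes(probe_values):
    """Average values of probes that map to the same gene.

    probe_values: iterable of (gene_id, value) pairs.
    """
    by_gene = defaultdict(list)
    for gene, value in probe_values:
        by_gene[gene].append(value)
    return {gene: sum(vals) / len(vals) for gene, vals in by_gene.items()}

def hypergeom_pvalue(n_array, n_de, n_pathway, n_de_in_pathway):
    """P(X >= n_de_in_pathway) when a pathway of n_pathway genes is drawn
    from an array of n_array genes, n_de of which are DE."""
    upper = min(n_de, n_pathway)
    total = comb(n_array, n_pathway)
    tail = sum(
        comb(n_de, k) * comb(n_array - n_de, n_pathway - k)
        for k in range(n_de_in_pathway, upper + 1)
    )
    return tail / total

# Toy numbers: 20 genes on the array, 5 DE, a 4-gene pathway with 3 DE genes.
p = hypergeom_pvalue(n_array=20, n_de=5, n_pathway=4, n_de_in_pathway=3)
```

With these toy numbers the tail probability is 155/4845, about 0.032. The calculation needs the full array as background, which is why uploading all genes rather than only the DE subset matters.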
74 Pathways Associated with Uploaded List
75 Choose Parameters for Pathway ORA
Choose the fold-change in gene expression threshold
(determines which genes are considered
differentially expressed). Default: +/- 1.5.
Choose the P value threshold associated with each
fold-change in gene expression value (determines
which genes are considered differentially expressed).
Default: P < 0.05. Several different
statistical methods are available to determine if
pathways are significantly associated with DE
genes: Hypergeometric, Fisher and Chi-Square.
Two options to correct for multiple testing are
included: the Benjamini-Hochberg correction
for the FDR and the more conservative Bonferroni
correction.
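The parameters above can be illustrated with a short sketch: a DE call using the default cut-offs, and the two multiple-testing corrections. These are textbook formulations rather than InnateDB's code. The function names are hypothetical, and two details are assumptions: fold changes are signed (down-regulation negative), and the fold-change cut-off is inclusive.

```python
def is_de(fold_change, p_value, fc_threshold=1.5, p_threshold=0.05):
    """Apply the default cut-offs: |fold change| >= 1.5 and P < 0.05."""
    return abs(fold_change) >= fc_threshold and p_value < p_threshold

def bonferroni(p_values):
    """Bonferroni: multiply each P value by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy pathway P values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
bh = benjamini_hochberg(raw)
bonf = bonferroni(raw)
```

On these toy values the Bonferroni-adjusted P values are never smaller than the Benjamini-Hochberg ones, which is what "more conservative" means in practice.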
77 Pathway Summary Page
KEGG pathway diagrams can be dynamically linked
to overlaying gene expression data
78 Acknowledgements: The Bioinformatics Team
- Overall Project Management: Bob Hancock, Brett Finlay, Lorne Babiuk, Bernadette Mah
- Bioinformatics and InnateDB Management: Fiona Brinkman, David Lynn
- InnateDB Database Development/Data Loading: Matthew Laird, Nicolas Richard, Fiona Roche, Timothy Chan, Michael Acab
- InnateDB Search Engine and User Interface: Geoff Winsor
- InnateDB Submission System and Curator Tool: Calvin Chan, Naisha Shah
- Cerebral Pathway Visualization Software: Jennifer Gardy, Aaron Barsky, Tamara Munzner
- Orthologs and Gene Order: Dan Tulpan, Matthew Whiteside, Mark Sun, Matthew Laird
- Systems Administration: Matthew Laird (SFU), Timothy Chan (UBC)