Title: Babelomics
1Babelomics
- Functional interpretation of genome-scale experiments
Edinburgh, October 2008
Ignacio Medina (Nacho)
dmontaner_at_cipf.es
http://bioinfo.cipf.es/dmontaner
Bioinformatics and Genomics Department
Centro de Investigacion Principe Felipe (CIPF) (Valencia, Spain)
2Babelomics: A systems biology web resource for
the functional interpretation of genome-scale
experiments.
http://www.babelomics.org
3Genome-scale experiment output
Functional Interpretation
4Functional interpretation
To interpret experimental results is to use
current knowledge to rearrange them in a
meaningful way.
Experimental results: observed in the lab (not
always a wet lab).
- Recorded to:
- Test a hypothesis.
- Gain a first insight into a biological process.
DB information: already tested and stored.
5Index
- Functional Annotation Databases
- Babelomics Suite
- FatiGO
- Fatiscan
- Other tools
- Next step towards the GEPAS and Babelomics
integration
7Functional Annotation Databases: Introduction
- In recent years there has been an exponential increase in the number of biological databases
- The Nucleic Acids Research online Molecular
Biology Database Collection is a public
repository that lists almost all biological
databases
- The 2008 update includes 1078 databases
http://www3.oup.co.uk/nar/database/c/
8Functional Annotation Databases: Functional Databases
Some biological databases contain
functional information about genes and sequences
Arabidopsis thaliana
Homo sapiens
Mus musculus
Rattus norvegicus
Drosophila melanogaster
Caenorhabditis elegans
Saccharomyces cerevisiae
Gallus gallus
Danio rerio
HGNC symbol EMBL acc RefSeq
PDB Protein Id IPI.
UniProt/Swiss-Prot UniProtKB/TrEMBL Ensembl IDs
EntrezGene Affymetrix Agilent
Gene IDs
Functional databases
9Functional Annotation Databases: Gene Ontology (GO terms)
- The Gene Ontology project provides a controlled
vocabulary to describe gene and gene product
attributes in any organism
- The latest version has more than 22,000 terms
- The controlled vocabularies of terms are
structured
http://www.geneontology.org/
10Functional Annotation Databases: Gene Ontology (GO terms)
The three categories of GO:
- Molecular Function: the tasks performed by individual gene products;
examples are transcription factor and DNA helicase.
- Biological Process: broad biological goals, such as mitosis or purine
metabolism, that are accomplished by ordered assemblies of molecular
functions.
- Cellular Component: subcellular structures, locations, and
macromolecular complexes; examples include nucleus, telomere, and
origin recognition complex.
11Functional Annotation Databases: Gene Ontology (GO terms)
GO is a DAG (Directed Acyclic Graph): terms are structured in levels,
from more general information at the top
to more detailed information at the bottom.
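The DAG idea can be sketched in a few lines of Python. This is only an illustration: the term names and the `PARENTS` mapping below are made-up placeholders, not real GO content. It shows that a term may have more than one parent and that a detailed term implies all of its more general ancestors.

```python
# Toy DAG: child term -> list of parent terms (hypothetical names, not real GO IDs).
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "cell_cycle": ["root"],
    "dna_metabolism": ["metabolism"],
    "dna_replication": ["dna_metabolism", "cell_cycle"],  # two parents: a DAG, not a tree
}

def ancestors(term, parents=PARENTS):
    """Return every more general term reachable from `term`."""
    seen = set()
    stack = list(parents[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen

def level(term, parents=PARENTS):
    """Shortest distance from the root: 0 for the root, larger for more detailed terms."""
    if not parents[term]:
        return 0
    return 1 + min(level(p, parents) for p in parents[term])

print(sorted(ancestors("dna_replication")))
print(level("dna_replication"))
```

A gene annotated to the detailed term is implicitly annotated to every term returned by `ancestors`, which is why enrichment can be examined at different levels of the graph.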
12Functional Annotation Databases: Gene Ontology (GO terms)
- AmiGO provides a web interface to search and
browse the ontology and annotation data
http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
- QuickGO (EBI) also provides a web interface
http://www.ebi.ac.uk/ego
13Functional Annotation Databases: Gene Ontology (GO terms)
Example of GO annotation of BRCA2 in Ensembl 48
14Functional Annotation Databases: KEGG
KEGG pathways
http://www.genome.jp/kegg/
15Functional Annotation Databases: KEGG
16Functional Annotation Databases: MicroRNA
- Involved in gene regulation
- More than 8600 miRNAs
- The target database contains computationally
predicted targets for microRNAs across many
species
http://microrna.sanger.ac.uk/
17Functional Annotation Databases: InterPro
- A centralized database of protein families,
domains, repeats and sites in which identifiable
features found in known proteins can be applied
to new protein sequences
http://www.ebi.ac.uk/interpro/
Contents of InterPro 18.0
18Functional Annotation Databases: CisRed
- Holds conserved sequence motifs identified by
genome-scale motif discovery, similarity,
clustering, co-occurrence and coexpression
calculations
http://www.cisred.org/
19Functional Annotation Databases: Swissprot
- A curated protein sequence database with a high
level of annotation, such as the description of
the function of a protein, its domain structure,
post-translational modifications, variants, etc.
- The latest release consists of 398,181 entries.
20Functional Annotation Databases: Practical exercise
- About BCL2, BRCA1, ATM and P53 (TP53):
- Try to find their biological processes and cellular
components (GO terms). Do they share some GO
terms? Is that significant?
- Are they targets of the same microRNA (miRBase)?
- What about protein functional domains (InterPro)?
- Are they regulated by the same conserved motifs
(cisRED)?
- Are they involved in a common disease or pathway
(KEGG, BioCarta)?
- ...
21Functional Annotation Databases: From GEPAS to
Babelomics
GEPAS Analysis
Microarray Data
Differential expression
Genes differentially expressed
List of genes (e.g. 120 genes)
Preprocessing (normalization, scaling, ...)
Tab matrix file
Predictors
Predicting genes
Genes with same expression patterns
Clustering
Genes from a deleted or duplicated region
Next generation of High throughput Sequencing
CGH array
Babelomics
22Functional Annotation Databases: Babelomics tries to
answer these questions
- Is there any significant functional enrichment in
my gene list?
- Are these genes involved in the same pathways?
- Do they share a specific microRNA regulation?
- Are they involved in the same disease?
- ...
23Babelomics: Schema
Babelomics, a suite of web tools for statistical
tests, multiple-testing corrections, BLAST, ...
24Index
- Functional Annotation databases
- Babelomics Suite
- FatiGO
- Fatiscan
- Other tools
- Next step towards the GEPAS and Babelomics
integration
25Babelomics: Web tools suite
- A complete suite of web tools for the functional
analysis of groups of genes in high-throughput
experiments
26Babelomics: Key features
- Many functional and regulatory definitions (GO,
KEGG, Biocarta, text-mining derived bioentities,
transcription factors, CisRed, miRNAs, InterPro,
etc.)
- Wide coverage of model organisms (more than 10
species)
- Functional profiling by functional enrichment
method (FatiGO) and gene set method (Fatiscan)
- Tools for automatic functional annotation of
unknown sequences (Blast2GO) integrated
27Babelomics: Databases
- The Rosetta database is freely distributed. To access the data:
- A web page to access the data is already
integrated in Babelomics
- A Java API to access the data is being developed
(local installation of the database required)
- Web services are available
- A DAS server with the annotation is also being
developed
- Next releases of Rosetta will add more regulatory
elements and pathways, such as epigenetic changes and
the Reactome database
28Babelomics: Databases
Babelomics
29Babelomics: Tools
- FatiGO: finds differential distributions of
functional terms between two groups of genes.
These terms can be Gene Ontology terms, InterPro
motifs, SwissProt keywords, transcription factors
(TF), gene expression in tissues, bioentities
from scientific literature, or CisRed cis-regulatory
elements.
- Tissues Mining Tool: compares
reference values of gene expression in tissues to
your results.
- MARMITE: finds differential
distributions of bioentities extracted from
PubMed between two groups of genes.
- FatiScan: detects significant functions with Gene Ontology,
InterPro motifs, Swissprot keywords and KEGG pathways
in lists of genes ordered according to different
characteristics.
- MarmiteScan: uses chemical and
disease-related information to detect related
blocks of genes in a gene list with associated
values.
30Index
- Functional Annotation databases
- Babelomics Suite
- FatiGO
- Fatiscan
- Other tools
- Next direction towards the GEPAS and Babelomics
integration
31Babelomics: FatiGO
- It allows us to compare the functional annotation of:
- Two lists of genes
- One list against the rest of the genome
- A genomic region against the rest of the genome
- One statistical test for each functional block of
annotation
- Fisher's exact test
- Multiple testing context (hundreds of
annotations)
- Filtering of annotations is convenient (the fewer
tests, the better the correction)
32Babelomics: FatiGO test
One gene list (A)
The other list (B)
Are these two groups of genes carrying out
different biological roles?
Biosynthesis: 60% in A, 20% in B
Sporulation: 20% in A, 20% in B
Genes in group A are significantly associated with
biosynthesis, but not with sporulation.
We do this for each GO term, miRNA, InterPro motif, ...
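The comparison above can be sketched with a small pure-Python Fisher's exact test. The slide does not give the list sizes, so the sketch assumes 50 genes in each list and reads the figures as 60% vs 20% for biosynthesis and 20% vs 20% for sporulation; `fisher_exact_two_sided` is a helper written for this illustration, not part of Babelomics.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a = list A genes with the term, b = list A genes without it,
    c = list B genes with the term, d = list B genes without it.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x annotated genes falling in list A.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum the probabilities of all tables at least as unlikely as the observed one.
    p_value = sum(p for p in probs if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)

# Assumed list sizes (not given on the slide): 50 genes in each list.
p_biosynthesis = fisher_exact_two_sided(30, 20, 10, 40)  # 60% vs 20%
p_sporulation = fisher_exact_two_sided(10, 40, 10, 40)   # 20% vs 20%
print(p_biosynthesis, p_sporulation)
```

Under these assumed sizes the biosynthesis table gives a very small p-value, while the sporulation table, with identical proportions in both lists, gives a p-value close to 1.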
33Multiple functional testing
- The unit of information over which we test is
shifted from genes to functional blocks.
- We do one statistical test for each block.
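Because one test is run per functional block, the raw p-values need a multiple-testing correction. The slides do not say which correction is used, so the sketch below assumes the Benjamini-Hochberg false discovery rate procedure as one common choice; the term names and p-values are invented for illustration.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values, one per functional block.
raw = {"term_A": 0.001, "term_B": 0.02, "term_C": 0.03, "term_D": 0.5}
adjusted = dict(zip(raw, benjamini_hochberg(list(raw.values()))))
print(adjusted)

# Testing fewer blocks (after filtering) gives a milder adjustment.
filtered = benjamini_hochberg([0.001, 0.02])
print(filtered)
```

With four tests the smallest p-value is adjusted to about 0.004; with only two tests it is adjusted to about 0.002, which illustrates why filtering annotations before testing helps.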
34Babelomics: FatiGO Compare
Select the organism
List of genes
Or use the rest of the genome
35Babelomics: FatiGO Compare options
Databases available
Select Fisher's exact test
36Babelomics: FatiGO Your Annotations
Useful when you work with your own
annotations or with a species that is not in
Babelomics
38969_at GO:0003677
37639_at GO:0006306
37149_s_at GO:0004674
37149_s_at GO:0005525
37639_at GO:0006306
37149_s_at GO:0004674
37149_s_at GO:0005525
36554_at GO:0004674
38052_at GO:0017111
38052_at GO:0016021
38840_s_at GO:0016021
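A custom annotation file like this can be read into a lookup table with a few lines of Python. The sketch assumes a two-column, whitespace-separated layout (identifier, then term) with GO identifiers written in the usual `GO:` form; `parse_annotations` is a hypothetical helper, not a Babelomics function.

```python
from collections import defaultdict

def parse_annotations(text):
    """Map each gene/probe ID to the set of terms listed for it."""
    annotations = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        gene_id, term = fields
        annotations[gene_id].add(term)
    return dict(annotations)

example = """\
38969_at GO:0003677
37639_at GO:0006306
37149_s_at GO:0004674
37149_s_at GO:0005525
"""
annotations = parse_annotations(example)
print(annotations["37149_s_at"])
```

Using a set per identifier means that repeated rows in the file collapse to a single annotation.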
37Babelomics: FatiGO Genomic
Choose the region
Useful when you have CGH arrays or CNV data
38Babelomics: FatiGO results
Gene group 1 is enriched in this functional block
Gene group 2 is enriched in this functional block
Percentages
p-values
Corrected p-values
39Babelomics: FatiGO exercises
- Tool demo
- Go to the tutorial
- http://bioinfo.cipf.es/babelomicswiki/toolfatigo
- and try to reproduce the examples
40Babelomics: The FatiGO approach may not be very
powerful
[Figure: classes A and B with terms GO1 and GO2; genes ordered by the test statistic from "significantly over-expressed in A" to "significantly over-expressed in B" (two-tailed t-test, p < 0.05)]
If a threshold based on the experimental values
is applied, and the resulting selection of genes
compared for enrichment of a functional term,
this might not be found
41Index
- Functional Annotation databases
- Babelomics Suite
- FatiGO
- Fatiscan
- Other tools
- Next step towards the GEPAS and Babelomics
integration
42Babelomics: Fatiscan features
- Interprets a ranked list of genes
- There is no need to choose a cut-off (all
information is included)
- One statistical test for each functional block of
annotation
- Multiple testing context (hundreds of
annotations)
- Filtering of annotations is convenient (the fewer
tests, the better the correction)
43Babelomics: Fatiscan, testing along an ordered list
- Index ranking genes according to some biological
aspect under study.
- Database that stores gene class membership
information.
- FatiScan searches over the whole ordered list,
trying to find runs of functionally related genes.
[Figure: ordered list of genes with annotations A, B and C]
Block of genes enriched in annotation A
Annotation C is homogeneously distributed along
the list
Block of genes enriched in annotation B
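The idea of scanning an ordered list can be sketched as follows. This is a simplified illustration, not FatiScan's actual implementation: it assumes equally spaced cut points and a one-sided hypergeometric tail test for enrichment in the top part of the list, and the gene names and annotation sets are invented.

```python
from math import comb

def upper_tail(k, n_top, n_annot, n_total):
    """P(at least k annotated genes among the top n_top) under a random ordering."""
    hi = min(n_top, n_annot)
    favourable = sum(comb(n_annot, x) * comb(n_total - n_annot, n_top - x)
                     for x in range(k, hi + 1))
    return favourable / comb(n_total, n_top)

def scan_ranked_list(ranked_genes, annotated, n_partitions=4):
    """Test enrichment of `annotated` genes in the top part at several cut points."""
    n_total = len(ranked_genes)
    n_annot = sum(g in annotated for g in ranked_genes)
    results = []
    for step in range(1, n_partitions):
        n_top = n_total * step // n_partitions
        k = sum(g in annotated for g in ranked_genes[:n_top])
        results.append((n_top, k, upper_tail(k, n_top, n_annot, n_total)))
    return results

ranked = [f"g{i}" for i in range(1, 21)]        # hypothetical ranked list of 20 genes
annotation_a = {"g1", "g2", "g3", "g4", "g5"}   # concentrated at the top of the list
annotation_c = {"g2", "g7", "g12", "g17"}       # spread evenly along the list

for name, genes in (("A", annotation_a), ("C", annotation_c)):
    best = min(scan_ranked_list(ranked, genes), key=lambda r: r[2])
    print(name, best)
```

The annotation concentrated at the top yields a small p-value at the first cut point, while the evenly spread one does not stand out at any cut point. Each annotation still produces its own p-value, so a multiple-testing correction is needed across annotations.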
44Babelomics: Fatiscan results
[Figure: annotations A, B and C along the list of genes]
45Babelomics: Fatiscan results
[Figure: annotations A and B along the gene ranking index]
46FatiScan example: two classes
Tumor vs. Control
All genes in the array are ranked by a t statistic comparing
tumor mean expression with control mean expression, from +t to -t.
Proliferation
Is more associated with the genes at the top of
the list
Is more associated with the genes that show
higher expression in tumors
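A ranking of this kind can be produced with a per-gene t statistic. The sketch below assumes Welch's form of the statistic (the slide does not say which variant is used), and the gene names and expression values are invented.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(tumor, control):
    """t statistic: positive when mean expression is higher in tumors."""
    se = sqrt(variance(tumor) / len(tumor) + variance(control) / len(control))
    return (mean(tumor) - mean(control)) / se

# Made-up expression values (tumor samples, control samples) for three hypothetical genes.
expression = {
    "gene_up":   ([8.1, 8.4, 7.9, 8.3], [5.0, 5.2, 4.9, 5.1]),
    "gene_flat": ([6.0, 6.2, 5.9, 6.1], [6.1, 5.9, 6.0, 6.2]),
    "gene_down": ([3.0, 3.2, 2.9, 3.1], [6.9, 7.1, 7.0, 7.2]),
}

# Genes more expressed in tumors come first, genes more expressed in controls last.
ranking = sorted(expression, key=lambda g: welch_t(*expression[g]), reverse=True)
print(ranking)
```

The resulting ordered list, together with the t values, is the kind of input a ranked-list method needs: no cut-off is applied, so every gene contributes.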
47FatiScan example: survival analysis
- Cromer et al. Identification of genes associated
with tumorigenesis and metastatic potential of
hypopharyngeal cancer by microarray analysis.
Oncogene 2004, 23(14): 2484-2498.
- 34 hypopharyngeal cancer samples taken from
patients undergoing surgery.
- Analyzed using Affymetrix HG-U95A microarrays
(12650 distinct transcription features).
- Disease-free survival time after intervention was
recorded.
Cox proportional hazards model: $h(t) = h_0(t)\,\exp(\beta \cdot \text{gene expression})$
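A short worked step shows how the model's coefficient is read. Here $x$ stands for the expression of one gene and $\beta$ for its regression coefficient; the symbol is an assumption, since it is not legible in the slide text. Comparing two patients whose expression differs by one unit, the baseline hazard cancels:

```latex
\frac{h(t \mid x + 1)}{h(t \mid x)}
  = \frac{h_0(t)\,\exp\bigl(\beta (x + 1)\bigr)}{h_0(t)\,\exp(\beta x)}
  = \exp(\beta)
```

So a positive coefficient means the hazard increases with expression, a negative one means it decreases, and the per-gene coefficient can serve as the index that ranks the genes.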
48Gene Ontology biological process
[Figure: ranked gene list, from "hazard increased with expression" to "hazard decreased with expression"]
49Babelomics: Fatiscan web tool
List of genes
Functional databases
50Babelomics: Fatiscan exercises
- Tool demo
- Go to the tutorial
- http://bioinfo.cipf.es/babelomicswiki/toolfatiscan
- and try to reproduce the examples
51Index
- Functional Annotation databases
- Babelomics Suite
- FatiGO
- Fatiscan
- Other tools
- Next step towards the GEPAS and Babelomics
integration
52Babelomics: and others...
- Tissues Mining Tool (TMT): compares expression of
two lists of genes in a set of tissues
- ID Converter: 10 species and almost all
existing IDs
- GOGraphViewer: a DAG viewer tool that generates joined
gene ontology graphs (DAGs)
- Marmite
- ...
53Babelomics: Marmite
54Missing Functional Annotation
Experiment
Data-Analysis
Gene-List
MNAT1 CTNNBL1 ENOX2 GTPBP1 RALY TAGLN2 RAB3A PPP2R5A
MAPRE1 ...
Functional Annotation
Functional interpretation
Functional Profiling
55Blast2GO
Generates annotations
Visualization of functional annotations
56Blast2GO: Annotation strategy
[Figure: each sequence (Sq1, Sq2, Sq3, Sq4) passes through three steps: Blast, Mapping, Annotation]
57Input data (in FASTA format)
>my_favourite_species_seq1 still unknown
gtgatggaaaagaaaagttttgttatcgtcgacgcatatggg
tttctttttcgcgcgtattatgcgctgcctggattaagcacctcatacaa
ttttcctgtaggaggtgtatatggttttataaacatacttttgaaacatc
tctctttccacgatgcagattatttagttgtggtatttgattcggggtcg
aaaaattttcgtcacactatgtattccgaatacaaaactaatcgccctaa
agcaccagaggatctgtcactacaatgtgctccgctacgtgaggctgttg
aagcgtttaatattgtaagtgaagaagtgcttaactacgaagcagacgac
gtaatagctacactctgtacaaaatatgcatctagtaatgttggagtgag
aatactgtcagcagataaggatttactacaactcctaaatgataatgttc
aagtttacgaccctataaaaagcagatacctcaccaatgaatacgtttta
gaaaaatttggtgtttcatcagataagttgcatattgatacggttgcatc
gagttataatgagaaaattattctcagctaagctgtacaccgtttattac
acactcgaaaggccgttag
>my_favourite_species_seq2 no clue
ttgttagctaaaaaggaagactttcacacctttggtaatggt
gttggctctgctggaacaggtggagttgtagtttctgcatccatgttgtc
tgcggatttttcaaatcttagagaagagatagcagcggttagtacggctg
gtgcagattggttacacattgatgtgatggatgggtgcttcgtccccagt
ttgactatgggtcctgtggtgatttccggcattaggaaatgtacaaatat
gtttcttgatgtgcatttgatgattaatcgcccaggcgatcatctgaaga
gtgtggtagatgctggagctgataagatagagcacattcgcaagatgata
gaggaaagctcatcaaccgcgaaaatcgctgttgatggtggtgtttcaac
ggataatgcccgggctgttatcgaggcaggtgcgaatatactcgttgttg
gaacggcgctgtttgctgctgacgatatgagtaaagttgtaagaacttta
aaatcattttaa
>my_favourite_species_seq3 just sequenced
gtgggactgctcatccctgtaggcagggtggctatttttt
gtgtaaaggcagtctttcatagtcttgtaccgccatactatctatggata
actacaaagcagttttttgaggtgtggtttttctctcttcctatagtagc
agttacatctttgtttacgggaggcgcgttagcccttcaggataccctcg
tgggaagcgctaaagtatcagggtaatggagtttttactcctgcaagatg
taatagagggtctggtaaaagctgtatcgtttgggctggtaatttcgcta
gttgggtgttacaacgggtatcactgtgagataggcgcaaggggtgtagg
aacagcgacaacaaaaacttcggtagcagcttctatgctcataattttgt
taaactatataattactgttttttacgcgta
>my_favourite_species_seq4 we will see soon...
atgtacgctgtatctcttt
caaatttgcatgtctctttcaacaacaaggaggttttgaaaggtgttgac
ttggacatagcatggggggattccctggttatactgggagaatctggtag
tggaaagtctgtactaacaaaggttgtattgggtctaatagtgccccaag
agggaagtgttactgtagatggcaccaatattcttgagaataggcagggc
atcaagaattttagtgttttgtttcaaaactgtgcgttatttgacagtct
tacgatttgggaaaatgtagtattcaatttccgtaggaggcttcgtttag
ataaggataatgccaaggctttggctttacggggattggagcttgtggga
ttggacgccagtgtaatgaacgtgtatcctgtggagctatcaggcgggat
gaaaaagcgcgtagctttggcaagagctattataggtagtcccaaaattc
taattttggatgagccaacttcgggattggatcctataatgtcttcagtg
gt
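FASTA input of this kind is simple to read programmatically: a header line starting with `>` carries the identifier and an optional description, and the following lines carry the sequence. The parser below is a minimal sketch for illustration; `parse_fasta` is a hypothetical helper, not part of Blast2GO, and the example reuses the first lines of two of the sequences above.

```python
def parse_fasta(text):
    """Return {sequence_id: (description, sequence)} from FASTA-formatted text."""
    records = {}
    seq_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: identifier up to the first space, the rest is free text.
            seq_id, _, description = line[1:].partition(" ")
            records[seq_id] = (description, [])
        elif seq_id is not None:
            records[seq_id][1].append(line)
    return {k: (desc, "".join(chunks)) for k, (desc, chunks) in records.items()}

example = """\
>my_favourite_species_seq1 still unknown
gtgatggaaaagaaaagttttgttatcgtcgacgcatatggg
tttctttttcgcgcgtattatgcgctgcctggattaagcacctcatacaa
>my_favourite_species_seq2 no clue
ttgttagctaaaaaggaagactttcacacctttggtaatggt
"""
records = parse_fasta(example)
for seq_id, (description, sequence) in records.items():
    print(seq_id, description, len(sequence))
```

Sequence lines are joined without separators, so the line width used in the file does not matter.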
58Blast2GO Application
(1) Blast
(2) Mapping
(3) Annotation
Table with all the sequence information
Application statistics
Blast results
Application messages
Graph visualisation
59Finding the homologues (BLAST)
Parameter descriptions:
- Where to run Blast
- Choose database
- Number of Blast hits
- e-value cut-off
- Blast algorithm
- Blast mode
- HSP length cut-off
- Save your results separately
60Blast2GO in Babelomics
- Blast
- Annotation
- Visualization
61Blast and annotation in Babelomics
62Blast and annotation in Babelomics
63Graph visualisation in Babelomics
64Babelomics: Blast2GO
Sequence data
Blast options
65Index
- Functional Annotation databases
- Babelomics Suite
- FatiGO
- Fatiscan
- Other tools
- Next step towards the GEPAS and Babelomics
integration
66Next step: GEPAS and Babelomics integration
- Due in November/December
- New web interface based on Web 2.0
technologies
- DAS server and web services for almost all tools
- Local installation and command-line execution
available
- Many bug fixes and performance improvements
- A Java framework to run new tools
- ...
67Next step: New tools
- New tools
- SNOW, a tool to study the interactome
- Transcriptome analysis
- Logistic regression method for gene set analysis
- Genome and annotation browser (DAS-based)
- Genome browser
- SNPs and methylation analysis
- Next-generation sequencing methods
- Epigenome and ChIP-on-chip analysis
- ...