Title: Babelomics


1
Babelomics
  • Functional interpretation of genome-scale experiments

Edinburgh, October 2008
Ignacio Medina (Nacho)
dmontaner_at_cipf.es
http://bioinfo.cipf.es/dmontaner
Bioinformatics and Genomics Department
Centro de Investigacion Principe Felipe (CIPF) (Valencia, Spain)
2
Babelomics: a systems biology web resource for
the functional interpretation of genome-scale
experiments.
http://www.babelomics.org
3
Genome-scale experiment output
Functional Interpretation
4
Functional interpretation
To interpret experimental results is to use
current knowledge to rearrange them in a
meaningful way.
Experimental results: observed in the lab (not
always a wet lab).
  • Recorded to:
  • Test a hypothesis.
  • Get a first insight into a biological process.

DB information: already tested and stored.
5
Index
  • Functional Annotation Databases
  • Babelomics Suite
  • FatiGO
  • Fatiscan
  • Other tools
  • Next step towards the GEPAS and Babelomics
    integration

7
Functional Annotation Databases: Introduction
  • In recent years there has been an exponential
    increase in the number of biological databases
  • The Nucleic Acids Research online Molecular
    Biology Database Collection is a public
    repository that lists almost all biological
    databases
  • The 2008 update includes 1078 databases!

http://www3.oup.co.uk/nar/database/c/
8
Functional Annotation Databases: Functional Databases
Some biological databases contain functional
information about genes and sequences.
Arabidopsis thaliana
Homo sapiens
Mus musculus
Rattus norvegicus
Drosophila melanogaster
Caenorhabditis elegans
Saccharomyces cerevisiae
Gallus gallus
Danio rerio
Gene IDs: HGNC symbol, EMBL acc, RefSeq, PDB,
Protein Id, IPI, UniProt/Swiss-Prot,
UniProtKB/TrEMBL, Ensembl IDs, EntrezGene,
Affymetrix, Agilent
Functional databases
9
Functional Annotation Databases: Gene Ontology (GO
terms)
  • The Gene Ontology project provides a controlled
    vocabulary to describe gene and gene product
    attributes in any organism
  • The latest version has more than 22,000 terms
  • The controlled vocabularies of terms are
    structured

http://www.geneontology.org/
10
Functional Annotation Databases: Gene Ontology (GO
terms)
The three categories of GO:
  • Molecular Function: the tasks performed by
    individual gene products; examples are
    transcription factor and DNA helicase.
  • Biological Process: broad biological goals,
    such as mitosis or purine metabolism, that are
    accomplished by ordered assemblies of
    molecular functions.
  • Cellular Component: subcellular structures,
    locations, and macromolecular complexes;
    examples include nucleus, telomere, and origin
    recognition complex.
11
Functional Annotation Databases: Gene Ontology (GO
terms)
GO is a DAG (Directed Acyclic Graph): terms are
structured in levels, from more general
information at the top to more detailed
information at the bottom.
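A minimal sketch of what the DAG structure implies in practice: a term can have several parents, so collecting its more general terms means walking every parent path. The hierarchy below is a hypothetical toy map for illustration only; it is not taken from the GO release and the names are not real GO identifiers.

```python
from collections import deque

# Hypothetical child -> parents map (illustrative only, not real GO data).
PARENTS = {
    "biological_process": [],
    "metabolic process": ["biological_process"],
    "cellular process": ["biological_process"],
    "cellular metabolic process": ["metabolic process", "cellular process"],
    "purine metabolism": ["cellular metabolic process"],
}


def ancestors(term, parents=PARENTS):
    """Return every more general term reachable by following parent links."""
    seen = set()
    queue = deque(parents[term])
    while queue:
        current = queue.popleft()
        if current not in seen:
            seen.add(current)
            queue.extend(parents[current])
    return seen


def level(term, parents=PARENTS):
    """Shortest distance from the term up to a root (a root has level 0)."""
    if not parents[term]:
        return 0
    return 1 + min(level(p, parents) for p in parents[term])
```

A gene annotated with a detailed term is implicitly annotated with all of that term's ancestors, which is why the level chosen for a test changes how many genes fall into each functional block.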
12
Functional Annotation Databases: Gene Ontology (GO
terms)
  • AmiGO provides a web interface to search and
    browse the ontology and annotation data

http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
  • QuickGO (EBI) also provides a web interface

http://www.ebi.ac.uk/ego
13
Functional Annotation Databases: Gene Ontology (GO
terms)
Example of GO annotation of BRCA2 in Ensembl 48
14
Functional Annotation Databases: KEGG
KEGG pathways
http://www.genome.jp/kegg/
15
Functional Annotation Databases: KEGG
16
Functional Annotation Databases: MicroRNA
  • Involved in gene regulation
  • More than 8600 miRNAs
  • The target database contains computationally
    predicted targets for microRNAs across many
    species

http://microrna.sanger.ac.uk/
17
Functional Annotation Databases: InterPro
  • A centralized database of protein families,
    domains, repeats and sites in which identifiable
    features found in known proteins can be applied
    to new protein sequences

http://www.ebi.ac.uk/interpro/
Contents of InterPro 18.0
18
Functional Annotation Databases: CisRed
  • Holds conserved sequence motifs identified by
    genome-scale motif discovery, similarity,
    clustering, co-occurrence and coexpression
    calculations

http://www.cisred.org/
19
Functional Annotation Databases: Swissprot
  • A curated protein sequence database with a high
    level of annotation, such as the description of
    the function of a protein, its domain structure,
    post-translational modifications, variants, etc.
  • The latest release consists of 398181 entries.

20
Functional Annotation Databases: Practical exercise
  • About BCL2, BRCA1, ATM and P53 (TP53):
  • Try to find the biological processes and cellular
    components (GO terms). Do they share some GO
    terms? Is that significant?
  • Are they targets of the same microRNA (miRBase)?
  • What about protein functional domains (InterPro)?
  • Are they regulated by the same conserved motifs
    (CisRed)?
  • Are they involved in a common disease or pathway
    (KEGG, BioCarta)?
  • ...

21
Functional Annotation Databases: From GEPAS to
Babelomics
GEPAS analysis of microarray data:
  • Preprocessing (normalization, scaling, ...):
    tab matrix file
  • Differential expression: genes differentially
    expressed
  • Predictors: predicting genes
  • Clustering: genes with same expression patterns

Other sources of gene lists:
  • CGH array: genes from a deleted or duplicated
    region
  • Next generation of high-throughput sequencing

Result: a list of genes (e.g. 120 genes) that is
passed on to Babelomics
22
Functional Annotation Databases: Babelomics tries to
answer these questions
  • Is there any significant functional enrichment in
    my gene list?
  • Are these genes involved in the same pathways?
  • Do they share a specific microRNA regulation?
  • Are they involved in the same disease?
  • ...

23
Babelomics: Schema
Babelomics, a suite of web tools for statistical
tests, multiple-testing corrections, BLAST, ...
24
Index
  • Functional Annotation databases
  • Babelomics Suite
  • FatiGO
  • Fatiscan
  • Other tools
  • Next step towards the GEPAS and Babelomics
    integration

25
Babelomics: Web tools suite
  • A complete suite of web tools for the functional
    analysis of groups of genes in high-throughput
    experiments

26
Babelomics: Key features
  • Many functional and regulatory definitions (GO,
    KEGG, Biocarta, text-mining derived bioentities,
    transcription factors, CisRed, miRNAs, InterPro,
    etc.)
  • Wide coverage of model organisms (more than 10
    species)
  • Functional profiling by functional enrichment
    method (FatiGO) and gene set method (Fatiscan)
  • Tools for automatic functional annotation of
    unknown sequences (Blast2GO) integrated

27
Babelomics: Databases
  • The Rosetta database is freely distributed. To
    access data:
  • A web page to access the data is already
    integrated in Babelomics
  • A Java API to access the data is being developed
    (local installation of the database required)
  • Web services are available
  • A DAS server with the annotation is also being
    developed
  • Next releases of Rosetta will add more regulatory
    elements and pathways, such as epigenetic changes
    and the Reactome database

28
Babelomics: Databases
Babelomics
29
Babelomics: Tools
  • FatiGO: finds differential distributions of
    functional terms between two groups of genes.
    These terms can be Gene Ontology terms, InterPro
    motifs, SwissProt keywords (KW), transcription
    factors (TF), gene expression in tissues,
    bioentities from scientific literature, or
    CisRed cis-regulatory elements.
  • Tissues Mining Tool: compares reference values
    of gene expression in tissues to your results.
  • MARMITE: finds differential distributions of
    bioentities extracted from PubMed between two
    groups of genes.
  • FatiScan: detects significant functions with
    Gene Ontology, InterPro motifs, Swissprot
    keywords and KEGG pathways in lists of genes
    ordered according to different characteristics.
  • MarmiteScan: uses chemical and disease-related
    information to detect related blocks of genes in
    a gene list with associated values.
30
Index
  • Functional Annotation databases
  • Babelomics Suite
  • FatiGO
  • Fatiscan
  • Other tools
  • Next direction towards the GEPAS and Babelomics
    integration

31
Babelomics: FatiGO
  • It allows us to compare the functional annotation
    of:
  • Two lists of genes
  • One list against the rest of the genome
  • A genomic region against the rest of the genome
  • One statistical test for each functional block of
    annotation
  • Fisher's exact test
  • Multiple testing context (hundreds of
    annotations)
  • Filtering of annotations is convenient (the fewer
    tests, the better the correction)
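The test named above can be sketched for a single functional block as follows. This is an illustrative pure-Python version of a two-sided Fisher's exact test on a 2x2 table, not the FatiGO implementation; the function name and the counts in the usage note are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: genes in list A with the annotation,  b: genes in list A without it
    c: genes in list B with the annotation,  d: genes in list B without it
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x annotated genes landing in list A.
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(col1, row1)
    # Add up every table that is no more probable than the observed one.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

For example, `fisher_exact_two_sided(12, 8, 4, 16)` compares a made-up list A with 12 of 20 genes annotated against a list B with 4 of 20 annotated. In a real analysis the same call would be repeated for every functional block.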

32
Babelomics: FatiGO test
Are these two groups of genes carrying out
different biological roles?

                One gene list (A)   The other list (B)
Biosynthesis    60                  20
Sporulation     20                  20

Genes in group A are significantly associated with
biosynthesis, but not with sporulation.
We do this for each GO term, miRNA, InterPro motif, ...
33
Multiple functional testing
  • The unit of information over which we test is
    shifted from genes to functional blocks.
  • We do one statistical test for each block.
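Because one test is run per functional block, the resulting p-values need a multiple-testing correction. The slides do not say which correction Babelomics applies; the sketch below assumes the Benjamini-Hochberg false discovery rate procedure purely as an illustration.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

The adjusted value grows with the number of tests `m`, which is why filtering out uninformative annotations before testing leaves more terms significant after correction.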

34
Babelomics: FatiGO Compare
Select the organism
List of genes
Or use the rest of genome
35
Babelomics: FatiGO Compare options
Databases available
Select Fisher's exact test
36
Babelomics: FatiGO Your Annotations
Useful when you work with your own
annotations or with a species that is not in
Babelomics
38969_at GO:0003677
37639_at GO:0006306
37149_s_at GO:0004674
37149_s_at GO:0005525
37639_at GO:0006306
37149_s_at GO:0004674
37149_s_at GO:0005525
36554_at GO:0004674
38052_at GO:0017111
38052_at GO:0016021
38840_s_at GO:0016021
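A custom annotation file like the one above can be read into a lookup table with a few lines of code. This sketch assumes a whitespace-separated, two-column layout (identifier, then term); the exact format Babelomics accepts is not stated in the slides, and `parse_annotations` is a hypothetical helper name.

```python
from collections import defaultdict


def parse_annotations(text):
    """Map each gene or probe identifier to the set of terms annotated to it."""
    annotations = defaultdict(set)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        identifier, term = fields
        annotations[identifier].add(term)  # repeated pairs collapse in the set
    return dict(annotations)


example = """\
38969_at GO:0003677
37149_s_at GO:0004674
37149_s_at GO:0005525
37149_s_at GO:0004674
"""
```

Calling `parse_annotations(example)` groups the two distinct terms of `37149_s_at` under one key, so duplicated rows in the input do no harm.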
37
Babelomics: FatiGO Genomic
Choose the region
Useful when you have CGH arrays or CNV data
38
Babelomics: FatiGO results
Gene group1 is enriched in this functional block
Gene group2 is enriched in this functional block
percentages
p-values
corrected p-values
39
Babelomics: FatiGO exercises
  • Tool demo
  • Go to the tutorial
  • http://bioinfo.cipf.es/babelomicswiki/toolfatigo
  • and try to reproduce the examples

40
Babelomics: the FatiGO approach may not be very
powerful
If a threshold based on the experimental values
is applied, and the resulting selection of genes
is compared for enrichment of a functional term,
the enrichment might not be found.
Figure: genes ordered by a statistic between
classes A and B, with annotations GO1 and GO2. A
two-tailed t-test (p < 0.05) selects the genes
significantly over-expressed in A at one end and
those significantly over-expressed in B at the
other.

41
Index
  • Functional Annotation databases
  • Babelomics Suite
  • FatiGO
  • Fatiscan
  • Other tools
  • Next step towards the GEPAS and Babelomics
    integration

42
Babelomics: Fatiscan features
  • Interprets a ranked list of genes
  • There is no need to choose a cut-off (all
    information is included)
  • One statistical test for each functional block of
    annotation
  • Multiple testing context (hundreds of
    annotations)
  • Filtering of annotations is convenient (the fewer
    tests, the better the correction)

43
Babelomics: Fatiscan, testing along an ordered list
  • Index ranking genes according to some biological
    aspect under study.
  • Database that stores gene class membership
    information.
  • FatiScan searches over the whole ordered list,
    trying to find runs of functionally related genes.

Figure: list of genes with annotations A, B and C.
  • Block of genes enriched in annotation A
  • Annotation C is homogeneously distributed along
    the list
  • Block of genes enriched in annotation B
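One way to picture "testing along an ordered list" is to cut the ranked list at several positions and, at each cut, compare the annotation's frequency above and below it. The slides do not spell out the FatiScan algorithm, so the partition scheme, the function names and the parameter `n_partitions` below are assumptions made for this sketch.

```python
from math import comb


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(col1, x) * comb(total - col1, row1 - x) / comb(total, row1)

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(col1, row1)
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


def scan_ranked_list(ranked_genes, annotated, n_partitions=4):
    """Test one annotation at evenly spaced cut points of a ranked list.

    Returns a list of (cut_index, p_value) pairs, one per cut point.
    """
    n = len(ranked_genes)
    results = []
    for k in range(1, n_partitions):
        cut = n * k // n_partitions
        upper, lower = ranked_genes[:cut], ranked_genes[cut:]
        in_upper = sum(g in annotated for g in upper)
        in_lower = sum(g in annotated for g in lower)
        p = fisher_two_sided(
            in_upper, len(upper) - in_upper, in_lower, len(lower) - in_lower
        )
        results.append((cut, p))
    return results
```

An annotation concentrated at the top of the list gives a small p-value at an early cut, while one spread evenly along the list gives large p-values at every cut. The p-values from all cuts and all annotations would still need a multiple-testing correction.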
44
Babelomics: Fatiscan results
Figure: list of genes with annotations A, B and C.
45
Babelomics: Fatiscan results
Figure: annotations A and B along the gene ranking
index.
46
FatiScan Example: two classes
Tumor vs. Control
t: statistic comparing Tumor mean expression with
Control mean expression
All genes in the array are ranked from +t to -t.
Proliferation:
  • Is more associated with the genes at the top of
    the list
  • Is more associated with the genes that show
    higher expression in Tumors
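The ranking step in this example can be sketched as follows. The slide does not say which t-test variant is used, so Welch's unequal-variance statistic is assumed here; the gene names and expression values in the usage note are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(tumor, control):
    """Welch's t statistic: positive when tumor expression is higher."""
    standard_error = sqrt(
        variance(tumor) / len(tumor) + variance(control) / len(control)
    )
    return (mean(tumor) - mean(control)) / standard_error


def rank_genes(expression):
    """Rank genes from the most positive to the most negative t.

    expression maps a gene name to a (tumor_values, control_values) pair.
    """
    scores = {
        gene: welch_t(tumor, control)
        for gene, (tumor, control) in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

With made-up data such as `{"geneA": ([5, 6, 7], [1, 2, 3]), "geneB": ([1, 2, 3], [5, 6, 7])}`, `rank_genes` places `geneA` first. The resulting ordered list is the kind of input a gene set method works on, with no cut-off applied.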
47
FatiScan Example: Survival Analysis
  • Cromer et al. Identification of genes associated
    with tumorigenesis and metastatic potential of
    hypopharyngeal cancer by microarray analysis.
    Oncogene 2004, 23(14): 2484-2498.
  • 34 hypopharyngeal cancer samples taken from
    patients undergoing surgery.
  • Analyzed using Affymetrix HG-U95A microarrays
    (12650 distinct transcription features).
  • Disease-free survival time after intervention was
    recorded.

Cox proportional hazards model: h(t) = h0(t)
exp(β · gene expression)
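In a Cox model of this form the sign of the coefficient is what allows genes to be ranked: a positive coefficient means hazard rises with expression, a negative one means it falls. The sketch below only evaluates the model's hazard ratio for a given coefficient. It does not fit the model, and the symbol `beta` and all numbers used with it are illustrative assumptions.

```python
from math import exp


def hazard_ratio(beta, expression_a, expression_b):
    """Ratio h_a(t) / h_b(t) under h(t) = h0(t) * exp(beta * expression).

    The baseline hazard h0(t) cancels, so the ratio does not depend on t.
    """
    return exp(beta * (expression_a - expression_b))
```

For instance, `hazard_ratio(0.5, 3.0, 1.0)` is greater than 1 (higher expression, higher hazard), whereas the same call with `beta = -0.5` is below 1.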
48
Gene Ontology biological process
Genes ranked by β, from +β (hazard increased with
expression) to -β (hazard decreased with
expression)
  • lowest p-value 0.96

49
Babelomics: Fatiscan Web tool
List of genes
Functional databases
50
Babelomics: Fatiscan exercises
  • Tool demo
  • Go to the tutorial
  • http://bioinfo.cipf.es/babelomicswiki/toolfatiscan
  • and try to reproduce the examples

51
Index
  • Functional Annotation databases
  • Babelomics Suite
  • FatiGO
  • Fatiscan
  • Other tools
  • Next step towards the GEPAS and Babelomics
    integration

52
Babelomics: and others...
  • Tissues Mining Tool (TMT): compares expression of
    two lists of genes in a set of tissues
  • ID Converter: 10 species and almost all of the
    existing IDs
  • GOGraphViewer: a DAG viewer tool that generates
    joined Gene Ontology graphs (DAGs)
  • Marmite
  • ...

53
Babelomics: Marmite
54
Missing Functional Annotation
Experiment
Data-Analysis
Gene-List
MNAT1 CTNNBL1 ENOX2 GTPBP1 RALY TAGLN2 RAB3A
PPP2R5A MAPRE1 ..... ...
Functional Annotation

Functional interpretation
Functional Profiling
55
Blast2GO
Generates annotations
Visualization of functional annotations
56
Blast2GO Annotation strategy
Each input sequence (Sq1, Sq2, Sq3, Sq4) goes
through three steps:
  • Blast
  • Mapping
  • Annotation
57
Input data (in FASTA format)
>my_favourite_species_seq1 still unknown
gtgatggaaaagaaaagttttgttatcgtcgacgcatatggg
tttctttttcgcgcgtattatgcgctgcctggattaagcacctcatacaa
ttttcctgtaggaggtgtatatggttttataaacatacttttgaaacatc
tctctttccacgatgcagattatttagttgtggtatttgattcggggtcg
aaaaattttcgtcacactatgtattccgaatacaaaactaatcgccctaa
agcaccagaggatctgtcactacaatgtgctccgctacgtgaggctgttg
aagcgtttaatattgtaagtgaagaagtgcttaactacgaagcagacgac
gtaatagctacactctgtacaaaatatgcatctagtaatgttggagtgag
aatactgtcagcagataaggatttactacaactcctaaatgataatgttc
aagtttacgaccctataaaaagcagatacctcaccaatgaatacgtttta
gaaaaatttggtgtttcatcagataagttgcatattgatacggttgcatc
gagttataatgagaaaattattctcagctaagctgtacaccgtttattac
acactcgaaaggccgttag
>my_favourite_species_seq2 no clue
ttgttagctaaaaaggaagactttcacacctttggtaatggt
gttggctctgctggaacaggtggagttgtagtttctgcatccatgttgtc
tgcggatttttcaaatcttagagaagagatagcagcggttagtacggctg
gtgcagattggttacacattgatgtgatggatgggtgcttcgtccccagt
ttgactatgggtcctgtggtgatttccggcattaggaaatgtacaaatat
gtttcttgatgtgcatttgatgattaatcgcccaggcgatcatctgaaga
gtgtggtagatgctggagctgataagatagagcacattcgcaagatgata
gaggaaagctcatcaaccgcgaaaatcgctgttgatggtggtgtttcaac
ggataatgcccgggctgttatcgaggcaggtgcgaatatactcgttgttg
gaacggcgctgtttgctgctgacgatatgagtaaagttgtaagaacttta
aaatcattttaa
>my_favourite_species_seq3 just sequenced
gtgggactgctcatccctgtaggcagggtggctatttttt
gtgtaaaggcagtctttcatagtcttgtaccgccatactatctatggata
actacaaagcagttttttgaggtgtggtttttctctcttcctatagtagc
agttacatctttgtttacgggaggcgcgttagcccttcaggataccctcg
tgggaagcgctaaagtatcagggtaatggagtttttactcctgcaagatg
taatagagggtctggtaaaagctgtatcgtttgggctggtaatttcgcta
gttgggtgttacaacgggtatcactgtgagataggcgcaaggggtgtagg
aacagcgacaacaaaaacttcggtagcagcttctatgctcataattttgt
taaactatataattactgttttttacgcgta
>my_favourite_species_seq4 we will see soon...
atgtacgctgtatctcttt
caaatttgcatgtctctttcaacaacaaggaggttttgaaaggtgttgac
ttggacatagcatggggggattccctggttatactgggagaatctggtag
tggaaagtctgtactaacaaaggttgtattgggtctaatagtgccccaag
agggaagtgttactgtagatggcaccaatattcttgagaataggcagggc
atcaagaattttagtgttttgtttcaaaactgtgcgttatttgacagtct
tacgatttgggaaaatgtagtattcaatttccgtaggaggcttcgtttag
ataaggataatgccaaggctttggctttacggggattggagcttgtggga
ttggacgccagtgtaatgaacgtgtatcctgtggagctatcaggcgggat
gaaaaagcgcgtagctttggcaagagctattataggtagtcccaaaattc
taattttggatgagccaacttcgggattggatcctataatgtcttcagtg
gt
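Input in this format is easy to check before submitting it. The reader below is a minimal sketch that assumes standard FASTA conventions (a `>` header line followed by one or more sequence lines); `parse_fasta` is a hypothetical helper and not part of Blast2GO.

```python
def parse_fasta(text):
    """Return a list of (identifier, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def _make_record(header, chunks):
    """Split the header at the first space and join the sequence lines."""
    identifier, _, description = header.partition(" ")
    return identifier, description, "".join(chunks)
```

Given the text `">my_favourite_species_seq1 still unknown\ngtgatggaaaag\n"`, the function returns one record whose identifier is `my_favourite_species_seq1` and whose description is `still unknown`.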
58
Blast2GO Application
(1) Blast
(2) Mapping
(3) Annotation
Table with all the sequence information
Application statistics
Blast results
Application messages
Graph visualisation
59
Finding the homologues (BLAST)
Parameter descriptions
Where to run Blast
Choose Database
Number of blast hits
e-Value cut-off
Blast algorithm
Blast mode
HSP length cut-off
Save your results separately
60
Blast2GO in Babelomics
  • Blast
  • Annotation
  • Visualization

61
Blast and annotation in Babelomics
62
Blast and annotation in Babelomics
63
Graph visualisation in Babelomics
64
Babelomics: Blast2GO
Sequence data
Blast options
65
Index
  • Functional Annotation databases
  • Babelomics Suite
  • FatiGO
  • Fatiscan
  • Other tools
  • Next step towards the GEPAS and Babelomics
    integration

66
Next step: GEPAS and Babelomics integration
  • Due in November/December
  • New web interface based on new Web 2.0
    technologies
  • DAS server and web services for almost all tools
  • Local installation and command line execution
    available
  • Many bug fixes and performance improvements
  • A Java framework to run new tools
  • ...

67
Next step: New tools
  • New tools
  • SNOW: a tool for studying the interactome
  • Transcriptome analysis
  • Logistic regression method for gene set analysis
  • Genome and annotation browser (DAS based)
  • Genome browser
  • SNPs and methylation analysis
  • Next generation sequencing methods
  • Epigenome and ChIP-on-chip analysis
  • ...