Title: http://creativecommons.org/licenses/by-sa/2.0/
1 http://creativecommons.org/licenses/by-sa/2.0/
2 Ensembl: Database and Web Browser
Erin Pleasance, Canada's Michael Smith Genome
Sciences Centre, Vancouver
3 www.ensembl.org
4 What is Ensembl?
- Joint project of EBI and Sanger
- Automated annotation of eukaryotic genomes
- Open source software
- Relational database system
- Web interface
"The main aim of this campaign is to encourage
scientists across the world - in academia,
pharmaceutical companies, and the biotechnology
and computer industries - to use this free
information."
- Dr. Mike Dexter, Director of the Wellcome Trust
5 Ensembl components
Data
- Chromosomes (ChromoView, KaryoView, CytoView, MapView)
- Genome Sequence (ContigView)
- Genes (GeneView, TransView, ExonView, ProtView)
- SNPs and Haplotypes (SNPView, GeneSNPView, HaploView, LDView)
- Markers (MarkerView)
- Diseases (DiseaseView)
- Functions (GOView)
- Families (DomainView, FamilyView)
- Comparative Genomics (ContigView, MultiContigView, SyntenyView, GeneView)
- Other Annotations
Search tools
- Sequence Similarity (BLAST, SSAHA)
- Text (TextView)
- Anything (EnsMart)
6 Species in Ensembl
- Focus on vertebrates
- No fungi/plants
- Arabidopsis genome browser based on Ensembl at
http://atensembl.arabidopsis.info/
Vertebrates
- Mammals: Human, Chimp, Mouse, Rat, Dog, Cow, Opossum
- Fish: Zebrafish, Fugu Pufferfish, Tetraodon Pufferfish
- Other: Chicken, Frog
Invertebrates
- Insects: Fruitfly, Mosquito, Honeybee
- Other: Nematode
7 Ensembl Gene Annotation
- Basis for initial analysis and publication of
most vertebrate genomes
- Genome assembly from NCBI
- Gene build system
  - Targeted gene builds predict known genes
  - Similarity gene builds predict novel genes
8 Curwen et al., Genome Res 14: 942-950, 2004
9 Targeted gene build
- Align known proteins with pmatch and BLAST
- Incorporate aligned cDNA sequences to find splice
sites, UTRs with genewise
[Figure: ContigView of "best in genome" gene with associated evidence; labels: Known gene (p53), UTRs predicted, Proteins aligned, Unigene clusters aligned, cDNAs aligned]
10 Similarity gene build
- Identify novel exons ab initio using Genscan
- Confirm exons by BLAST to known proteins, mRNAs,
UniGene clusters
[Figure: ContigView of homology gene with associated evidence; labels: Novel gene, Genscan predictions, Proteins aligned, Unigene clusters aligned]
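The confirmation step in the similarity build (keeping ab initio exons only when aligned evidence supports them) can be illustrated with a minimal sketch. This is not the Ensembl pipeline: it assumes exons and BLAST hits have already been reduced to coordinate intervals, treats any overlap as support, and uses made-up coordinates and hypothetical function names.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive genomic coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def confirm_exons(predicted: List[Interval], evidence: List[Interval]) -> List[Interval]:
    """Keep only predicted exons that overlap at least one evidence hit."""
    return [exon for exon in predicted
            if any(overlaps(exon, hit) for hit in evidence)]


# Hypothetical coordinates, for illustration only.
genscan_exons = [(100, 250), (400, 520), (900, 1000)]
blast_hits = [(120, 240), (950, 1100)]
confirmed = confirm_exons(genscan_exons, blast_hits)
```

A real build would also weigh alignment scores, strand and splice-site consistency; the sketch shows only the "prediction plus supporting evidence" idea.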
11 Ensembl Gene Annotation
- Resulting Ensembl genes are highly accurate
with low false positive rates
- Ensembl human gene identifiers are 95% stable
between builds
12 Manually curated genes: VEGA
- Some chromosomes contain manually curated genes
from the VEGA database
- Otter database/server allows integration of
automatic and manual annotations (e.g. from
Apollo)
[Figure: VEGA gene]
13 Ensembl EST genes
- ESTs not accurate enough to produce Ensembl
genes, but important especially for identifying
alternative transcripts
- ESTs aligned to genome and merged to create an
independent set of EST genes
[Figure labels: Known gene, EST genes, Unigene clusters aligned]
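The idea of merging aligned ESTs into EST genes can be sketched as interval merging. This is a deliberately simplified illustration, not the Ensembl EST gene build: it assumes each EST alignment is a single span on one strand of one chromosome and ignores exon structure. The function name and coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive genomic coordinates


def merge_alignments(alignments: List[Interval]) -> List[Interval]:
    """Merge overlapping aligned spans into non-overlapping clusters."""
    merged: List[Interval] = []
    for start, end in sorted(alignments):
        if merged and start <= merged[-1][1]:
            # Overlaps the current cluster: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new cluster.
            merged.append((start, end))
    return merged


# Hypothetical EST alignment spans, for illustration only.
est_spans = [(500, 800), (100, 300), (250, 400), (790, 900)]
est_clusters = merge_alignments(est_spans)
```

Each resulting cluster stands in for one candidate EST gene locus; alternative transcripts would come from comparing the exon structures of the ESTs inside a cluster, which this sketch does not attempt.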
14 Pseudogenes
- Processed pseudogenes are identified in the
annotation (lack of introns, frameshifts, presence
of a multi-exon version elsewhere in the genome, etc.)
[Figure: Pseudogene]
15 Noncoding RNA Genes
- Functional genes with no ORFs (tRNAs, rRNAs, miRNAs)
- 7220 annotations from Sean Eddy and Tom Jones
[Figure labels: miRNAs, Coding gene]
16 Example 1: Exploring Caspase-3
- Aim to demonstrate basic browsing and views
- Caspase-3 is a gene involved in apoptosis (cell
suicide)
- We will look at
  - Gene annotation
  - SNPs
  - Orthologs and genome alignments
  - Alternative transcripts and EST genes
17 Example 1: Exploring Caspase-3
http://www.ensembl.org
18 Species-specific homepage
[Figure: species-specific homepage; labels: Site map, Statistics of current release]
19 Finding the tool/view: Site Map
20 Text Search
[Figure: back to the species-specific homepage; labels: Click, Gene, caspase-3]
21 GeneView
[Figure: GeneView; labelled links: ContigView, ExportView, SNPView, ProteinView, ExonView, TransView of transcript]
22 GeneView
Orthologs predicted by sequence similarity and
synteny
GeneDAS: get data from external sources
23 GeneView
On the same page, information is provided for each
transcript individually
Links to external databases
24 GeneView
25 GeneSNPView
26 Other SNP/Haplotype tools
- SNPView
- ProteinView (protein sequence with SNP markup)
- LDView: View linkage disequilibrium (only limited
regions)
- HaploView: View haplotypes (only limited regions)
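27 GeneView
[Figure: back to GeneView; label: Click]
28 ContigView
[Figure: ContigView overview; labels: Chromosome and bands, Sequence contigs]
29 ContigView: Detailed View
[Figure: ContigView detailed view; tracks: Genscan predictions, Targeted gene predictions (2 alternative transcripts), Gene annotations, EST genes, Other tracks (aligned sequences etc.)]
30 ContigView
31 MultiContigView
[Figure: MultiContigView; labels: DNA sequence homology, Rat ortholog]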
32 Other Comparative Genomics Tools
- So far we saw gene orthology and DNA homology
- Another view is SyntenyView
- Comparative genomics is also accessible through EnsMart
33 Data Mining with EnsMart
- Allows very fast, cross-data source querying
- Search for genes (features, sequences, etc.) or
SNPs based on
  - Position, function, domains, similarity,
  expression, etc.
- Accessible from Ensembl website (MartView) as
well as stand-alone
- Extremely powerful for data mining
34 Example 2: EnsMart
- A new disease locus has been mapped between
markers D21S1991 and D21S171. It may be that the
gene involved has already been identified as
having a role in another disease. What candidates
are in this region?
35 Example 2: EnsMart
- EnsMart is based on BioMart
- http://www.ensembl.org/Multi/martview
- OR
- http://www.ebi.ac.uk/BioMart/martview
36 EnsMart: Choosing your dataset
37 EnsMart: Filtering
[Figure: EnsMart filter page; values entered: chromosome 21, markers D21S1991 and D21S171]
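The logic behind the Example 2 filter (genes on chromosome 21 lying between two markers, optionally restricted to genes with a known disease association) can be sketched in a few lines. This is an offline illustration only and does not call EnsMart: the `Gene` record, the function name, the gene names and all coordinates, including the marker positions, are invented placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    chromosome: str
    start: int
    end: int
    disease: str = ""  # empty when no disease association is annotated


def candidates_between(genes: List[Gene], chromosome: str, left: int,
                       right: int, disease_only: bool = True) -> List[Gene]:
    """Return genes overlapping the region bounded by two marker positions."""
    lo, hi = min(left, right), max(left, right)
    return [
        g for g in genes
        if g.chromosome == chromosome
        and g.start <= hi and g.end >= lo          # overlaps the region
        and (bool(g.disease) or not disease_only)  # optional disease filter
    ]


# Placeholder marker positions: NOT the real coordinates of these markers.
marker_positions = {"D21S1991": 1_000_000, "D21S171": 5_000_000}

# Placeholder genes, for illustration only.
genes = [
    Gene("GENE_A", "21", 1_200_000, 1_250_000, "example disease"),
    Gene("GENE_B", "21", 3_000_000, 3_050_000),
    Gene("GENE_C", "21", 6_000_000, 6_100_000, "example disease"),
    Gene("GENE_D", "20", 2_000_000, 2_050_000, "example disease"),
]

hits = candidates_between(genes, "21",
                          marker_positions["D21S1991"],
                          marker_positions["D21S171"])
```

In EnsMart itself the marker names are entered directly as region boundaries, so no coordinate lookup is needed by the user.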
38 EnsMart: Output
Note: you can output different types of
information, e.g. sequences
39 EnsMart: Output
40 Sequence Similarity Searching
- Use SSAHA for exact matches (fast)
- Use BLAST for more distant similarity (slow)
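Why a hashing approach such as SSAHA is fast for exact matches can be seen from a toy k-mer index. The sketch below is not the SSAHA algorithm itself, only the underlying idea of looking up word positions in a precomputed hash table; the function names and the sequence are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(sequence: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in the sequence to its 0-based start positions."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        index[sequence[i:i + k]].append(i)
    return dict(index)


def find_exact_matches(query: str, sequence: str,
                       index: Dict[str, List[int]], k: int) -> List[int]:
    """Return 0-based positions where the whole query occurs exactly."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    # Look up the first k-mer, then verify the rest of the query in place.
    candidates = index.get(query[:k], [])
    return [pos for pos in candidates
            if sequence[pos:pos + len(query)] == query]


# Toy sequence, for illustration only.
genome = "ACGTACGTTAGCACGTTA"
k = 4
index = build_kmer_index(genome, k)
hits = find_exact_matches("ACGTTA", genome, index, k)
```

The index is built once and each query then costs a dictionary lookup plus a short verification, which is why this style of search suits near-identical sequences. Finding distant similarity needs the more tolerant, and slower, seeding and extension used by BLAST.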
41 Finding anything else: Help
42 DAS: Getting your Own Data in Ensembl
- DAS (Distributed Annotation System)
- Anyone can load data into Ensembl and allow
others to view it in the same view (e.g.
ContigView) as other Ensembl annotations
- Some built-in DAS sources
- http://www.ensembl.org/Docs/ldas.html
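A DAS client fetches annotations with plain HTTP requests, so the request for one region can be sketched as URL construction. The sketch assumes the DAS/1 `features` command with a `segment=reference:start,stop` argument, which should be checked against the DAS specification; the server address and source name are hypothetical.

```python
from urllib.parse import urlencode


def das_features_url(server: str, source: str, segment: str,
                     start: int, end: int) -> str:
    """Build a DAS/1-style 'features' request URL for one region."""
    # Keep ':' and ',' unescaped so the segment reads as reference:start,stop.
    query = urlencode({"segment": f"{segment}:{start},{end}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


# Hypothetical server and data source name, for illustration only.
url = das_features_url("http://das.example.org/", "my_source", "21", 1000, 2000)
```

A browser such as Ensembl issues requests of this kind to each configured DAS source and draws the returned features as an extra track.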
43 Other Ways to Access Ensembl
- MySQL database directly accessible
- APIs for Perl and Java
- Other software
  - Apollo: Java genome annotation viewer/editor
  - Sockeye: Java viewer
- You can get your own local version of Ensembl:
software and data are freely available
[Figure: Sockeye]
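Direct MySQL access can be sketched as follows. The slides do not give connection details: the host `ensembldb.ensembl.org` and the password-less `anonymous` user are the publicly documented Ensembl server settings, while the port is an assumption to verify against the current documentation. The sketch uses the third-party `pymysql` package (any MySQL client would do), and the helper names are hypothetical. It only lists database names, so it does not depend on any particular schema version.

```python
import re
from typing import Dict, List, Union


def connection_settings(
    host: str = "ensembldb.ensembl.org",  # public Ensembl MySQL server
    user: str = "anonymous",              # read-only account, no password
    port: int = 3306,                     # assumption: check the current docs
) -> Dict[str, Union[str, int]]:
    """Collect the settings needed to reach an Ensembl MySQL server."""
    return {"host": host, "user": user, "password": "", "port": port}


def list_databases_query(prefix: str) -> str:
    """Build a query listing databases whose names start with the prefix."""
    if not re.fullmatch(r"[A-Za-z0-9_]+", prefix):
        raise ValueError("prefix may only contain letters, digits and '_'")
    return f"SHOW DATABASES LIKE '{prefix}%'"


def list_databases(prefix: str) -> List[str]:
    """Run the query; needs network access and the pymysql package."""
    import pymysql  # imported here so the helpers above work without it

    connection = pymysql.connect(**connection_settings())
    try:
        with connection.cursor() as cursor:
            cursor.execute(list_databases_query(prefix))
            return [row[0] for row in cursor.fetchall()]
    finally:
        connection.close()
```

Calling `list_databases("homo_sapiens_core")` would list the human core databases available on the server, one per release. For anything beyond simple lookups, the Perl and Java APIs hide the schema details.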
44 For more information
- Publications (listed at http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/EnsemblPublications.html)
  - Ensembl Special: Genome Research, May 2004
  - Ensembl updates: NAR, Jan. 2002-2005
  - EnsMart: Kasprzyk et al., Genome Res, Jan. 2004
- Documentation on how to download software and
database
  - http://www.ensembl.org/Docs/
45 Exercises
- Homologues of human genes are often present in
Fugu rubripes in more condensed form (with
shorter introns). Is this true for the gene PTEN,
a tumor suppressor often mutated in advanced
cancers?
  - Try MultiContigView; can you think of another way
  to get this information as well?
- The microRNA bantam regulates the Drosophila
(fruitfly) gene hid by binding the 3' UTR. Hid is
involved in apoptosis, and it is possible that
binding sites for bantam could be found in the 3'
UTR of other apoptosis genes as well. Obtain the
3' UTR sequence of all Drosophila genes known to
be involved in apoptosis.
  - Using EnsMart, the GO term for apoptosis is
  GO:0006915, evidence code TAS
- The file PCR_product.txt contains the sequence
of a PCR product amplified from a mouse cDNA
library. What gene does the product correspond
to? Does it contain the complete coding sequence
of that gene?
  - Would it be better to use BLAST or SSAHA?
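For the first exercise, once exon coordinates for the two orthologs have been exported (for example from ExonView or EnsMart), the comparison of intron sizes is simple arithmetic. The helper below is a minimal sketch; the exon coordinates are invented placeholders, not the real PTEN structure in either species.

```python
from typing import List, Tuple

Exon = Tuple[int, int]  # (start, end), inclusive genomic coordinates


def intron_lengths(exons: List[Exon]) -> List[int]:
    """Return the lengths of the gaps between consecutive exons."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]


# Placeholder exon coordinates, for illustration only.
human_exons = [(100, 200), (1200, 1300), (5300, 5400)]
fugu_exons = [(100, 200), (350, 450), (700, 800)]

human_total = sum(intron_lengths(human_exons))
fugu_total = sum(intron_lengths(fugu_exons))
```

Comparing `human_total` with `fugu_total` for the real exported coordinates answers whether the Fugu gene is more compact.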