http://creativecommons.org/licenses/by-sa/2.0/ - PowerPoint PPT Presentation

1
http://creativecommons.org/licenses/by-sa/2.0/
2
Ensembl: Database and Web Browser
Erin Pleasance, Canada's Michael Smith Genome
Sciences Centre, Vancouver
3
www.ensembl.org
4
What is Ensembl?
  • Joint project of EBI and Sanger
  • Automated annotation of eukaryotic genomes
  • Open source software
  • Relational database system
  • Web interface

"The main aim of this campaign is to encourage
scientists across the world - in academia,
pharmaceutical companies, and the biotechnology
and computer industries - to use this free
information."
- Dr. Mike Dexter, Director of the Wellcome Trust
5
Ensembl components
Search tools
Data
Chromosomes (ChromoView, KaryoView, CytoView, MapView)
SNPs and Haplotypes (SNPView, GeneSNPView, HaploView, LDView)
Sequence Similarity (BLAST, SSAHA)
Diseases (DiseaseView)
Genome Sequence (ContigView)
Genes (GeneView, TransView, ExonView, ProtView)
Markers (MarkerView)
Functions (GOView)
Text (TextView)
Other Annotations
Families (DomainView, FamilyView)
Anything (EnsMart)
Comparative Genomics (ContigView, MultiContigView, SyntenyView, GeneView)
6
Species in Ensembl
  • Focus on vertebrates
  • No fungi/plants
  • Arabidopsis genome browser based on Ensembl at
    http://atensembl.arabidopsis.info/

Vertebrates
  • Mammals: Human, Chimp, Mouse, Rat, Dog, Cow, Opossum
  • Fish: Zebrafish, Fugu Pufferfish, Tetraodon Pufferfish
  • Other: Chicken, Frog
Invertebrates
  • Insects: Fruitfly, Mosquito, Honeybee
  • Other: Nematode
7
Ensembl Gene Annotation
  • Basis for initial analysis and publication of
    most vertebrate genomes
  • Genome assembly from NCBI
  • Gene build system
  • Targeted gene builds predict known genes
  • Similarity gene builds predict novel genes

8
Curwen et al., Genome Res 14:942-950, 2004
9
Targeted gene build
  • Align known proteins with pmatch and BLAST
  • Incorporate aligned cDNA sequences to find splice
    sites, UTRs with genewise

[Figure: ContigView of a "best in genome" gene, the known gene p53, with associated evidence. Labeled tracks: UTRs predicted, proteins aligned, UniGene clusters aligned, cDNAs aligned.]
10
Similarity gene build
  • Identify novel exons ab initio using Genscan
  • Confirm exons by BLAST to known proteins, mRNAs,
    UniGene clusters

[Figure: ContigView of a homology gene (a novel gene) with associated evidence. Labeled tracks: Genscan predictions, proteins aligned, UniGene clusters aligned.]
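
The confirmation step can be pictured as an interval-overlap check. The Python sketch below is only an illustration of that idea, not the Ensembl gene build code: the function names and all coordinates are hypothetical, and a real pipeline would also weigh strand, alignment score and splice-site consistency.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), inclusive genomic coordinates


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if two inclusive intervals share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def confirm_exons(predicted: List[Interval],
                  evidence: List[Interval]) -> List[Interval]:
    """Keep only predicted exons that overlap at least one evidence hit."""
    return [exon for exon in predicted
            if any(overlaps(exon, hit) for hit in evidence)]


# Hypothetical coordinates, for illustration only.
genscan_exons = [(100, 250), (400, 520), (900, 1000)]
blast_hits = [(120, 240), (950, 1100)]

confirmed = confirm_exons(genscan_exons, blast_hits)
print(confirmed)
```

Here the middle exon has no supporting hit and is dropped; the other two are kept.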
11
Ensembl Gene Annotation
  • Resulting Ensembl genes are highly accurate
    with low false positive rates
  • Ensembl human gene identifiers are 95% stable
    between builds

12
Manually curated genes: VEGA
  • Some chromosomes contain manually curated genes
    from VEGA database
  • Otter database/server allows integration of
    automatic and manual annotations (e.g. from
    Apollo)

VEGA gene
13
Ensembl EST genes
  • ESTs not accurate enough to produce Ensembl
    genes, but important especially for identifying
    alternative transcripts
  • ESTs aligned to genome and merged to create an
    independent set of EST genes

Known gene
EST genes
Unigene clusters aligned
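
As a rough picture of the "aligned and merged" step, the sketch below collapses overlapping alignment spans into clusters. It is an assumption-laden simplification written in Python: the function name and coordinates are hypothetical, and the actual EST gene build works on exon structures and strand rather than on bare spans.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) of one EST alignment on the genome


def merge_alignments(alignments: List[Span]) -> List[Span]:
    """Merge overlapping spans into non-overlapping clusters."""
    merged: List[Span] = []
    for start, end in sorted(alignments):
        if merged and start <= merged[-1][1]:
            # Overlaps the current cluster: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new cluster.
            merged.append((start, end))
    return merged


# Hypothetical EST alignment spans, for illustration only.
est_alignments = [(500, 700), (100, 300), (250, 400), (650, 800)]
clusters = merge_alignments(est_alignments)
print(clusters)
```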
14
Pseudogenes
  • Processed pseudogenes in annotation identified
    (lack of introns, frameshifts, presence of
    multi-exon version elsewhere in genome, etc.)

Pseudogene
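
The criteria listed above can be combined into a simple flag. The Python sketch below is hypothetical: the `GeneModel` fields and the rule for combining the criteria are assumptions made for illustration, and the slide does not say how Ensembl actually weighs them.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    name: str
    exon_count: int
    frameshifts: int
    multi_exon_copy_elsewhere: bool  # a multi-exon version exists elsewhere


def looks_like_processed_pseudogene(gene: GeneModel) -> bool:
    """Assumed rule: intronless, plus at least one further warning sign."""
    lacks_introns = gene.exon_count == 1
    return lacks_introns and (
        gene.frameshifts > 0 or gene.multi_exon_copy_elsewhere
    )


# Hypothetical gene models, for illustration only.
models = [
    GeneModel("model_1", exon_count=1, frameshifts=2,
              multi_exon_copy_elsewhere=True),
    GeneModel("model_2", exon_count=7, frameshifts=0,
              multi_exon_copy_elsewhere=False),
    GeneModel("model_3", exon_count=1, frameshifts=0,
              multi_exon_copy_elsewhere=False),
]
flagged = [m.name for m in models if looks_like_processed_pseudogene(m)]
print(flagged)
```

Note that `model_3` is not flagged: a single-exon gene with no other warning sign may simply be a genuine intronless gene.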
15
Noncoding RNA Genes
  • Functional genes with no ORFs (tRNAs, rRNAs,
    miRNAs)
  • 7220 annotations from Sean Eddy and Tom Jones

miRNAs
Coding gene
16
Example 1: Exploring Caspase-3
  • Aim: to demonstrate basic browsing and views
  • Caspase-3 is a gene involved in apoptosis (cell
    suicide)
  • We will look at:
    • Gene annotation
    • SNPs
    • Orthologs and genome alignments
    • Alternative transcripts and EST genes

17
Example 1: Exploring Caspase-3
http://www.ensembl.org
18
Species-specific homepage
Site map
Statistics of current release
19
Finding the tool/view: Site Map
20
Text Search
Click Back to
Species-specific homepage
Gene
caspase-3
21
GeneView
ContigView
ExportView
SNPView
ProteinView
ExonView
TransView of transcript
22
GeneView
Orthologs predicted by sequence similarity and
synteny
GeneDAS: get data from external sources
23
GeneView
On the same page, information provided for each
transcript individually
Links to external databases
24
GeneView
25
GeneSNPView
26
Other SNP/Haplotype tools
  • SNPView
  • ProteinView (protein sequence with SNP markup)
  • LDView: view linkage disequilibrium (only limited
    regions)
  • HaploView: view haplotypes (only limited regions)

27
GeneView
Click Back to
28
ContigView
Chromosome and bands
Sequence contigs
29
ContigView: Detailed View
Genscan predictions
Targeted gene predictions (2 alternative
transcripts)
Gene annotations
EST genes
Other tracks: aligned sequences, etc.
30
ContigView
31
MultiContigView
DNA sequence homology
Rat ortholog
32
Other Comparative Genomics Tools
  • So far we saw gene orthology and DNA homology
  • Another view is SyntenyView
  • Comparative genomics is also accessible through EnsMart

33
Data Mining with EnsMart
  • Allows very fast, cross-data source querying
  • Search for genes (features, sequences, etc.) or
    SNPs based on:
    • Position, function, domains, similarity,
      expression, etc.
  • Accessible from Ensembl website (MartView) as
    well as stand-alone
  • Extremely powerful for data mining

34
Example 2: EnsMart
  • A new disease locus has been mapped between
    markers D21S1991 and D21S171. It may be that the
    gene involved has already been identified as
    having a role in another disease. What candidates
    are in this region?
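
Conceptually, this question is a positional filter: keep every gene on the chromosome that overlaps the interval bounded by the two markers. The Python sketch below shows that logic on made-up data. The gene names and every coordinate, including the marker positions, are placeholders and NOT real values; in practice EnsMart resolves the markers and runs the query for you.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    chromosome: str
    start: int  # inclusive
    end: int    # inclusive


def genes_in_region(genes: List[Gene], chromosome: str,
                    region_start: int, region_end: int) -> List[Gene]:
    """Return genes on `chromosome` overlapping [region_start, region_end]."""
    return [
        g for g in genes
        if g.chromosome == chromosome
        and g.start <= region_end
        and g.end >= region_start
    ]


# Placeholder positions: NOT the real coordinates of these markers.
marker_positions = {"D21S1991": 1_000_000, "D21S171": 5_000_000}

# Placeholder genes, for illustration only.
genes = [
    Gene("GENE_A", "21", 500_000, 900_000),      # before the interval
    Gene("GENE_B", "21", 950_000, 1_200_000),    # straddles the first marker
    Gene("GENE_C", "21", 3_000_000, 3_050_000),  # inside the interval
    Gene("GENE_D", "20", 3_000_000, 3_050_000),  # wrong chromosome
]

lo, hi = sorted(marker_positions.values())
candidates = genes_in_region(genes, "21", lo, hi)
print([g.name for g in candidates])
```

Sorting the two marker positions first means the filter does not depend on which marker lies closer to the centromere.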

35
Example 2: EnsMart
  • EnsMart is based on BioMart
  • http://www.ensembl.org/Multi/martview
  • OR
  • http://www.ebi.ac.uk/BioMart/martview

36
EnsMart: Choosing your dataset
37
EnsMart: Filtering
Chromosome 21, between markers D21S1991 and D21S171
38
EnsMart: Output
Note: you can output different types of
information, e.g. sequences
39
EnsMart: Output
40
Sequence Similarity Searching
  • Use SSAHA for exact matches (fast)
  • Use BLAST for more distant similarity (slow)
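
Why hashing makes exact matching fast can be shown with a toy k-mer index. The Python sketch below illustrates only the general idea of looking up fixed-length words in a precomputed table; it is not SSAHA's actual algorithm or data layout, and the function names and sequences are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def build_kmer_index(sequence: str, k: int) -> Dict[str, List[int]]:
    """Map every k-mer in `sequence` to the positions where it starts."""
    index: Dict[str, List[int]] = defaultdict(list)
    for pos in range(len(sequence) - k + 1):
        index[sequence[pos:pos + k]].append(pos)
    return index


def find_exact(query: str, sequence: str,
               index: Dict[str, List[int]], k: int) -> List[int]:
    """Return start positions of exact matches of `query` in `sequence`."""
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    # One table lookup gives candidate positions; verify the full query there.
    candidates = index.get(query[:k], [])
    return [pos for pos in candidates if sequence.startswith(query, pos)]


# Toy sequences, for illustration only.
genome = "ACGTACGTTAGC"
k = 4
index = build_kmer_index(genome, k)

hits = find_exact("ACGTT", genome, index, k)
print(hits)
```

The index is built once and reused for every query, which is where the speed comes from; a query with a mismatch in its first k bases finds no candidates at all, which is why this style of search suits near-exact matches rather than distant similarity.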

41
Finding anything else: Help
42
DAS: Getting Your Own Data into Ensembl
  • DAS (Distributed Annotation System)
  • Anyone can load data into Ensembl and allow
    others to view it in the same view (e.g.
    ContigView) as other Ensembl annotations
  • Some built-in DAS sources
  • http://www.ensembl.org/Docs/ldas.html

43
Other Ways to Access Ensembl
  • MySQL database directly accessible
  • APIs for Perl and Java
  • Other software
    • Apollo: Java genome annotation viewer/editor
    • Sockeye: Java viewer
  • You can set up your own local version of Ensembl;
    the software and data are freely available

Sockeye
44
For more information
  • Publications (listed at
    http://www.ensembl.org/Docs/wiki/html/EnsemblDocs/EnsemblPublications.html)
    • Ensembl special issue: Genome Research, May 2004
    • Ensembl updates: NAR, Jan. 2002-2005
    • EnsMart: Kasprzyk et al., Genome Res, Jan. 2004
  • Documentation on how to download software and
    database
    • http://www.ensembl.org/Docs/

45
Exercises
  • Homologues of human genes are often present in
    Fugu rubripes in more condensed form (with
    shorter introns). Is this true for the gene PTEN,
    a tumor suppressor often mutated in advanced
    cancers?
    • Try MultiContigView. Can you think of another
      way to get this information as well?
  • The microRNA bantam regulates the Drosophila
    (fruitfly) gene hid by binding the 3' UTR. Hid is
    involved in apoptosis, and it is possible that
    binding sites for bantam could be found in the 3'
    UTR of other apoptosis genes as well. Obtain the
    3' UTR sequence of all Drosophila genes known to
    be involved in apoptosis.
    • Use EnsMart; the GO term for apoptosis is
      GO:0006915, evidence code TAS
  • The file PCR_product.txt contains the sequence
    of a PCR product amplified from a mouse cDNA
    library. What gene does the product correspond
    to? Does it contain the complete coding sequence
    of that gene?
    • Would it be better to use BLAST or SSAHA?