Investigating Genomes with Ensembl - PowerPoint PPT Presentation

Slides: 32
Provided by: giuliett
Transcript and Presenter's Notes

1
Investigating Genomes with Ensembl
  • Drs. Bert Overduin and Giulietta Spudich

2
Overview of the day
  • Introduction and website walk-through
  • Hands-on exercises (the browser)
  • Tea/Coffee
  • Introduction to BioMart
  • Hands-on exercises (BioMart)
  • Lunch
  • Determining the gene set
  • Hands-on exercises (gene set)
  • Tea/Coffee
  • Variations presentation and hands-on

3
Introducing
  • Genome browsing: a comparison
  • Consensus genes
  • Ensembl annotation and software
  • How to find help

4
Sequencing the genome
5
What can we learn about genomes?
  • Within one genome: regulatory elements, gene
    order, chromatin structure
  • Through comparative studies: evolution, conserved
    regions, rearrangements
  • Gene quality and prediction

6
Genome Browsers Today
  • Ensembl Genome browser
  • http://www.ensembl.org
  • NCBI Map Viewer
  • http://www.ncbi.nlm.nih.gov/mapview/
  • UCSC Genome Browser
  • http://genome.ucsc.edu

7
Ensembl Genome Browser
8
NCBI Map Viewer
9
UCSC Genome Browser
10
What Distinguishes Ensembl from the UCSC and NCBI
Browsers?
  • The gene set: automatic annotation based on mRNA
    and protein information
  • Programmatic access via the Perl API (open
    source)
  • BioMart
  • Integration with other databases (DAS)
  • Comparative analysis (gene trees)

11
Challenges of genome browsers
  • Increasing sequence information

198,879,188,987 nt (Aug 2007)
12
Challenges of genome browsers
  • Increasing annotation: ENCODE
  • Pilot project completed in 2007: 1% of the human
    genome
  • Discovered: promoter elements are on either side
    of the transcription start site

13
To meet a challenge
  • Ensembl's aim: to provide annotation for the
    biological community that is freely available and
    of high quality
  • Started in 1999
  • Joint project between EBI and Sanger
  • Funded primarily by the Wellcome Trust,
    additional funding by EMBL, NIH-NIAID, EU, BBSRC
    and MRC
  • Team of ca. 40 people, led by Ewan Birney (EBI)
    and Tim Hubbard (Sanger)

14
The Ensembl gene set
  • All Ensembl genes start from a known protein or
    mRNA
  • An initial alignment of protein and mRNA to the
    genome begins the Genebuild.

[Diagram: sequence assembly + mRNAs and proteins → Ensembl gene set]

15
Have you heard of...?
  • Ensembl: strives for best possible gene set
  • www.ensembl.org
  • Havana (VEGA): same goal
  • http://vega.sanger.ac.uk
  • HGNC: a unique name and symbol for every gene in
    human
  • http://www.genenames.org/
  • UniProt: focus on proteins and functional
    information
  • www.uniprot.org

16
Ensembl vs Havana annotation
  • All genes at once (Ensembl Genebuild)
  • Quick, keeps current
  • Consistent annotation
  • Can apply rules to more species
  • Gene by gene (Havana/VEGA)
  • Flexible, can deal with inconsistencies
  • Consult publications as well as databases
  • Out-of-the-ordinary biology
  • However: slow, expensive

17
Merging sets
  • Havana transcripts are incorporated into Ensembl
  • UniProt proteins are aligned to the genome in the
    Ensembl genebuild
  • UniProt imports Ensembl peptides for human
  • HGNC moved to Hinxton coordination

18
Consensus across genome browsers: the CCDS set
http://www.ensembl.org/info/about/docs/ccds.html
  • A protein is deposited into the Consensus CDS
    protein set (CCDS set) if NCBI, UCSC, Havana and
    Ensembl have all determined the same sequence.
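The agreement rule on this slide can be sketched in a few lines of Python. This is only an illustration of the "all four groups determined the same sequence" idea: the function name `in_consensus_set` and the toy sequences are hypothetical, and the real CCDS process involves more than a string comparison.

```python
# The four annotation groups named on the slide.
GROUPS = ("NCBI", "UCSC", "Havana", "Ensembl")


def in_consensus_set(cds_by_group):
    """Return True if every group reports a CDS and all four are identical."""
    sequences = [cds_by_group.get(group) for group in GROUPS]
    if any(seq is None for seq in sequences):
        return False  # at least one group has no annotation for this CDS
    # Compare case-insensitively; one distinct value means full agreement.
    return len({seq.upper() for seq in sequences}) == 1


# Hypothetical toy sequences, for illustration only.
agreed = {group: "ATGGCCTAA" for group in GROUPS}
disputed = dict(agreed, Havana="ATGGCTTAA")

print(in_consensus_set(agreed))    # True
print(in_consensus_set(disputed))  # False
```

A single disagreeing or missing annotation is enough to keep a sequence out of the set in this sketch.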

19
More about Ensembl
  • Genome browsing: a comparison
  • Consensus genes
  • Ensembl annotation and software
  • How to find help

20
Ensembl Genes: biological basis
All Ensembl gene predictions are based on
proteins and mRNAs in:
  • UniProt/Swiss-Prot (manually curated)
  • UniProt/TrEMBL
  • NCBI RefSeq (manually curated)

[Diagram: protein/mRNA + sequence assembly → Ensembl genes]
21
Genes and Transcripts in Ensembl
  • Ensembl known genes or transcripts
  • Ensembl novel genes or transcripts
  • Ensembl EST genes or transcripts
  • Non-Ensembl genes
  • Imports for yeast, C. elegans, fly, mosquito,
    Takifugu and Tetraodon

22
Names in Ensembl
  • ENSG: Ensembl Gene ID
  • ENST: Ensembl Transcript ID
  • ENSP: Ensembl Peptide ID
  • ENSE: Ensembl Exon ID
  • For species other than human, a species code is
    inserted after ENS
  • MUS (Mus musculus) for mouse: ENSMUSG
  • DAR (Danio rerio) for zebrafish: ENSDARG,
    etc.
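The naming scheme on this slide can be checked with a short Python sketch. It covers only what the slide lists: the ENS prefix, an optional three-letter species code, one of the four feature letters (G, T, P, E), and a run of digits. The function name `parse_ensembl_id` and the example IDs are made up for illustration, and the assumption that the remainder is purely numeric is a simplification.

```python
import re

# Feature-type letters listed on the slide.
FEATURE_TYPES = {"G": "gene", "T": "transcript", "P": "peptide", "E": "exon"}
# Species codes named on the slide; any other code is returned as-is.
SPECIES_CODES = {"MUS": "Mus musculus", "DAR": "Danio rerio"}

# ENS, optional 3-letter species code, feature letter, then digits (assumed).
ID_PATTERN = re.compile(
    r"^ENS(?P<species>[A-Z]{3})?(?P<type>[GTPE])(?P<number>\d+)$"
)


def parse_ensembl_id(stable_id):
    """Split an Ensembl-style ID into species, feature type and number."""
    match = ID_PATTERN.match(stable_id)
    if match is None:
        raise ValueError(f"not a recognised Ensembl-style ID: {stable_id!r}")
    code = match.group("species")
    # No species code means human, per the slide.
    species = "Homo sapiens" if code is None else SPECIES_CODES.get(code, code)
    return {
        "species": species,
        "feature": FEATURE_TYPES[match.group("type")],
        "number": match.group("number"),
    }


# Illustrative IDs only; the numbers are made up.
print(parse_ensembl_id("ENSG00000000001"))
print(parse_ensembl_id("ENSMUSG00000000001"))
```

Because the species code is optional, human IDs such as ENSG... and mouse IDs such as ENSMUSG... go through the same pattern.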

23
Gene Structure in Ensembl
Calmodulin (chicken): no UTRs
Calmodulin (human): UTRs annotated
24
What annotation is available?
  • Gene/transcript/peptide models (coding and
    noncoding (ncRNAs))
  • IDs in other databases
  • Mapped cDNAs, peptides, microarray probes, BAC
    clones etc.
  • Cytogenetic bands, markers, repeats etc.
  • Comparative data: orthologues and paralogues,
    protein families, whole genome alignments,
    syntenic regions
  • Variation data: Single Nucleotide Polymorphisms
    (SNPs)
  • Regulatory data: best guess set of regulatory
    elements from ENCODE
  • Data from external sources (DAS)

25
Specific data sources
  • Microarrays (Affymetrix, Illumina, Agilent)
  • GO (Gene Ontology functional classes)
  • http://www.geneontology.org/
  • OMIM (human diseases and phenotypes)
  • http://www.ncbi.nlm.nih.gov/sites/entrez?db=OMIM
  • Identifiers in Entrez, UniProt, RefSeq, etc.
  • PDB, MSD (structural databases)
  • http://www.rcsb.org/pdb/
  • http://www.ebi.ac.uk/msd/

26
InterPro
Collection of protein data: sequences, motifs,
structures
http://www.ebi.ac.uk/interpro/
27
How is this information organised?
  • Ensembl Views (Website)
  • Ensembl Database (open source): Perl API, FTP site
  • BioMart: data-mining tool
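As a rough idea of what a BioMart data-mining request looks like, the Python sketch below builds a BioMart-style XML query without sending it anywhere. The helper name `build_biomart_query` is hypothetical. The dataset, filter and attribute names (`hsapiens_gene_ensembl`, `chromosome_name`, `ensembl_gene_id`, `ensembl_transcript_id`) and the `Query` settings are assumptions based on common BioMart usage, not taken from these slides; check them against the BioMart interface before relying on them.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart-style XML query string; nothing is sent over the network."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # tab-separated output
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Filters restrict which rows come back; attributes choose the columns.
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    body = ET.tostring(query, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>' + body


# Assumed dataset/filter/attribute names, for illustration only.
query_xml = build_biomart_query(
    dataset="hsapiens_gene_ensembl",
    attributes=["ensembl_gene_id", "ensembl_transcript_id"],
    filters={"chromosome_name": "21"},
)
print(query_xml)
```

The same filter-then-attribute structure is what the BioMart web interface walks through in the hands-on session.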

28
Ensembl Open Source
  • Data and software freely available
  • More than 50 installs worldwide
  • Academia and industry
  • Local or available via the web
  • Mirrors with Ensembl data, e.g.
    http://ensembl.genome.tugraz.at/index.html
  • http://ensembl.genomics.org.cn/
  • or user projects with own data

29
Powered by Ensembl
30
Help and Information
  • Use our helpdesk!
  • helpdesk_at_ensembl.org
  • View our help pages!
  • (the "Using Ensembl" link)
  • View our animated tutorials
  • http://www.ensembl.org/common/Workshops_Online
  • Mailing lists
  • ensembl-announce_at_ebi.ac.uk
  • Come visit our blog!
  • http://ensembl.blogspot.com/

31
Ensembl Team