Data Mining in Ensembl with EnsMart - PowerPoint PPT Presentation

Provided by: XosM
1
Data Mining in Ensembl with EnsMart
2
Possible queries
  • All genes from a candidate region
  • Genes with a particular protein domain
  • Members of a protein family
  • Genes associated with SNPs

3
Specific queries
  • Disease-related genes between markers D10S255 and
    D10S259
  • Transmembrane proteins with an Ig-MHC domain
    (IPR003006) on chromosome 2
  • Genes with associated coding SNPs on chromosomal
    band 5q35.3
  • Mouse homologues of human disease genes
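The first query above, genes between two markers, reduces to an interval-overlap test once each marker has a chromosomal position. A minimal sketch, assuming the marker positions are already known; the `Gene` record, gene IDs and coordinates are hypothetical, and this is not how EnsMart itself stores or resolves markers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    # Hypothetical gene record for illustration only.
    stable_id: str
    chromosome: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def genes_between_markers(genes, chromosome, marker_a_pos, marker_b_pos):
    """Return genes on `chromosome` that overlap the interval between two markers."""
    # Markers may be given in either order, so normalise to (lo, hi).
    lo, hi = sorted((marker_a_pos, marker_b_pos))
    # Two closed intervals overlap when each starts before the other ends.
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= hi and g.end >= lo
    ]
```

Keeping genes that merely overlap the interval (rather than lie wholly inside it) is a design choice of the sketch; a stricter "contained" test would use `g.start >= lo and g.end <= hi`.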

4
More specific queries
  • Human genes with upstream regions conserved
    w.r.t. mouse
  • Upstream sequences for all Ensembl genes mapped to
    the U95A chip (and, similarly, the complete genomic
    annotation of MG_U74)
  • Genomic location and description of all mouse,
    rat and fugu homologues of all human genes that
    have transmembrane domains, are expressed in the
    cardiovascular system and have non-synonymous
    SNPs
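Retrieving upstream sequence depends on strand: for a forward-strand gene the upstream region precedes its start, while for a reverse-strand gene it follows its end and must be reverse-complemented. A minimal sketch, assuming the chromosome sequence is an in-memory string, 1-based inclusive gene coordinates and Ensembl-style strand values of 1 and -1; the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Reverse-complement an upper-case DNA string (A/C/G/T/N)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.translate(complement)[::-1]


def upstream_sequence(chrom_seq: str, start: int, end: int, strand: int, flank: int) -> str:
    """Return up to `flank` bases upstream of a gene at 1-based inclusive [start, end].

    The result always reads 5' to 3' towards the gene; it is shorter than
    `flank` if the gene lies near a chromosome end.
    """
    if strand == 1:
        # Upstream precedes the gene; clamp the 0-based slice start at 0.
        lo = max(0, start - 1 - flank)
        return chrom_seq[lo:start - 1]
    if strand == -1:
        # Upstream follows the gene on the forward strand; Python slicing
        # clamps at the chromosome end automatically.
        return reverse_complement(chrom_seq[end:end + flank])
    raise ValueError("strand must be 1 or -1")
```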

5
EnsMart: vertical and horizontal data integration
  • Ensembl Genes
  • SNPs
  • EST Genes
  • Vega Genes
6
Ensembl data sets
  • Genes
  • EST
  • Markers
  • Diseases
  • Protein Annotation
  • SNPs
  • Homology
  • Expression
7
EnsMart
  • Data retrieval tool
  • Query builder interface
  • Gene or SNP lists
  • Associated features or sequences
  • Various output formats
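The same result can be emitted as a table of features or as sequences. A minimal sketch of two common formats, tab-separated text and FASTA, chosen here for illustration; the function names and row layout are hypothetical and do not reproduce EnsMart's own exporters.

```python
import csv
import io


def to_tsv(rows, columns):
    """Write dict rows as tab-separated text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()


def to_fasta(records, width=60):
    """Write (header, sequence) pairs as FASTA, wrapping sequence lines at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"
```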

8
Information flow
  • Start
  • Filter
  • Output
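The start, filter, output flow amounts to: pick a dataset, combine restrictions on it, then choose which attributes to return. The `MartQuery` class below is a hypothetical illustration of that flow over in-memory rows, not the EnsMart or MartView API; combining filters with logical AND is an assumption of the sketch.

```python
from collections.abc import Callable, Iterator
from dataclasses import dataclass, field

Row = dict


@dataclass
class MartQuery:
    """Hypothetical model of the start -> filter -> output flow."""
    dataset: list[Row]                                             # start: chosen species/focus
    filters: list[Callable[[Row], bool]] = field(default_factory=list)
    attributes: list[str] = field(default_factory=list)           # output: columns to return

    def add_filter(self, predicate: Callable[[Row], bool]) -> "MartQuery":
        self.filters.append(predicate)
        return self

    def select(self, *names: str) -> "MartQuery":
        self.attributes.extend(names)
        return self

    def run(self) -> Iterator[Row]:
        for row in self.dataset:
            # A row is kept only if every filter accepts it (logical AND).
            if all(f(row) for f in self.filters):
                yield {name: row[name] for name in self.attributes}
```

Usage, for the slide 3 style query "transmembrane proteins with IPR003006 on chromosome 2" (data hypothetical):

```python
data = [
    {"gene": "GENE_A", "chromosome": "2", "domains": {"IPR003006"}, "transmembrane": True},
    {"gene": "GENE_B", "chromosome": "2", "domains": set(), "transmembrane": True},
]
query = (MartQuery(data)
         .add_filter(lambda r: r["chromosome"] == "2")
         .add_filter(lambda r: "IPR003006" in r["domains"])
         .add_filter(lambda r: r["transmembrane"])
         .select("gene", "chromosome"))
print(list(query.run()))
```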
9
Species and focus
10
Restrict your query
11
Restrict your query
12
Select output options
13
Select output options
14
Output formats
15
Obtaining sequences
16
Ensembl core database
  • Normalised
  • Each data point stored only once
  • Quick updates
  • Minimal storage requirements
  • But:
    • Many tables
    • Many joins for complicated queries
    • Slow for data mining questions
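To make the trade-off concrete, here is a toy normalised schema in SQLite via Python's standard `sqlite3` module. The table and column names are hypothetical, not the Ensembl core schema; the point is that even a simple question such as "genes with a given domain on chromosome 2" already needs two joins.

```python
import sqlite3


def build_normalised_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Each fact is stored once; gene-domain links live in their own table.
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, stable_id TEXT, chromosome TEXT);
        CREATE TABLE domain (domain_id INTEGER PRIMARY KEY, accession TEXT);
        CREATE TABLE gene_domain (gene_id INTEGER, domain_id INTEGER);

        INSERT INTO gene VALUES (1, 'GENE_A', '2'), (2, 'GENE_B', '2'), (3, 'GENE_C', '5');
        INSERT INTO domain VALUES (10, 'IPR003006'), (11, 'DOMAIN_X');
        INSERT INTO gene_domain VALUES (1, 10), (2, 11), (3, 10);
    """)
    return conn


def core_genes_with_domain(conn, accession, chromosome):
    # Two joins are needed just to connect a gene to a domain accession.
    rows = conn.execute("""
        SELECT g.stable_id
        FROM gene g
        JOIN gene_domain gd ON gd.gene_id = g.gene_id
        JOIN domain d ON d.domain_id = gd.domain_id
        WHERE d.accession = ? AND g.chromosome = ?
        ORDER BY g.stable_id
    """, (accession, chromosome))
    return [r[0] for r in rows]
```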

17
Mart database
  • De-normalised
  • Tables with redundant information
  • Query-optimised
  • Fast and flexible
  • Ideal for data mining
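For contrast, the same question against a single de-normalised table, again a hypothetical toy rather than the real Mart schema. Gene attributes are repeated on every row, so the query becomes a single-table filter with no joins, at the cost of redundant storage.

```python
import sqlite3


def build_mart_table():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- One wide table: gene attributes are repeated on every row (redundant by design).
        CREATE TABLE gene_main (stable_id TEXT, chromosome TEXT, domain_accession TEXT);
        INSERT INTO gene_main VALUES
            ('GENE_A', '2', 'IPR003006'),
            ('GENE_A', '2', 'DOMAIN_X'),
            ('GENE_B', '2', 'DOMAIN_X'),
            ('GENE_C', '5', 'IPR003006');
        -- An index on the filter columns lets SQLite answer the filter
        -- from the index rather than scanning every row.
        CREATE INDEX idx_filter ON gene_main (domain_accession, chromosome);
    """)
    return conn


def mart_genes_with_domain(conn, accession, chromosome):
    # No joins: every column needed for filtering and output is in one table.
    rows = conn.execute(
        "SELECT DISTINCT stable_id FROM gene_main "
        "WHERE domain_accession = ? AND chromosome = ? ORDER BY stable_id",
        (accession, chromosome),
    )
    return [r[0] for r in rows]
```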

18
Acknowledgements
  • Mart database
    • Arek Kasprzyk
    • Damian Keefe
    • Damian Smedley
    • Darin London
    • Craig Melsopp
  • User interface (MartView)
    • Will Spooner
  • Data and general support
    • The entire Ensembl team