1
Data Mining in Ensembl with BioMart
www.ensembl.org/biomart/martview
www.biomart.org/biomart/martview
November 2009
2
BioMart: Data Mining
  • BioMart is a search engine that can find multiple
    terms and put them into a table format.
  • For example: mouse gene IDs, chromosome and
    base-pair position
  • No programming required!

3
General or Specific Data-Tables
  • All the genes for one species
  • Or only genes on one specific region of a
    chromosome
  • Or genes on one region of a chromosome
    associated with an InterPro domain
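
The three levels of specificity above can be pictured as progressively stricter filters over one gene list. The following is only a toy sketch in Python: the gene records, the `IPR_X`/`IPR_Y` domain labels and the `select` helper are hypothetical placeholders, not real Ensembl annotation or a BioMart API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chromosome: str
    start: int
    end: int
    interpro: frozenset


# Hypothetical placeholder records, not real annotation.
GENES = [
    Gene("GENE_A", "7", 1_000, 5_000, frozenset({"IPR_X"})),
    Gene("GENE_B", "7", 20_000, 26_000, frozenset({"IPR_Y"})),
    Gene("GENE_C", "7", 30_000, 31_000, frozenset({"IPR_X", "IPR_Y"})),
    Gene("GENE_D", "X", 2_000, 4_000, frozenset({"IPR_X"})),
]


def select(genes, chromosome=None, start=None, end=None, interpro=None):
    """Return the genes that pass every filter that is not None."""
    hits = []
    for g in genes:
        if chromosome is not None and g.chromosome != chromosome:
            continue
        # Keep only genes that overlap the requested region.
        if start is not None and g.end < start:
            continue
        if end is not None and g.start > end:
            continue
        if interpro is not None and interpro not in g.interpro:
            continue
        hits.append(g)
    return hits


# All the genes for one (toy) species.
all_genes = select(GENES)
# Only genes on one region of a chromosome.
region = select(GENES, chromosome="7", start=15_000, end=40_000)
# Genes on that region that also carry a given domain.
region_domain = select(GENES, chromosome="7", start=15_000, end=40_000,
                       interpro="IPR_X")
```

Each extra keyword narrows the result, which is the same idea the Filters step of the web interface expresses with panels and check boxes.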

4
The First Step: Choose the Dataset
Dataset: current Ensembl, human genes
5
The Second Step: Filters
Filters: define a gene set
6
Attributes: attach information
Attributes: determine output columns
7
Results
Tables or sequences
10
Query
  • For the human CFTR gene, can I export the
    EntrezGene ID and the probes matching this gene's
    sequence on the Affy HG U133 Plus 2 microarray
    platform?
  • In the query:
  • Filters: what we know
  • Attributes: what we want to know (columns in
    the result table)
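
The web interface needs no programming, but the same question can also be written down as a BioMart XML query, which makes the split between filters and attributes explicit. The sketch below only builds the XML and a request URL; it does not contact any server. The `martservice` endpoint, the dataset name `hsapiens_gene_ensembl`, and the filter and attribute names are assumptions that are not stated in the slides and that can differ between Ensembl releases, so check them against the release you use.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed endpoint; the slides only mention the martview web page.
MARTSERVICE = "http://www.ensembl.org/biomart/martservice"


def build_query(dataset, filters, attributes):
    """Build a BioMart-style XML query string.

    filters: dict of filter name -> value (what we know)
    attributes: list of attribute names (what we want to know)
    """
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset",
                       {"name": dataset, "interface": "default"})
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Names below are assumptions and may vary by release.
query_xml = build_query(
    dataset="hsapiens_gene_ensembl",
    filters={"hgnc_symbol": "CFTR"},           # what we know
    attributes=[                                # what we want to know
        "ensembl_gene_id",
        "entrezgene_id",
        "affy_hg_u133_plus_2",
    ],
)

# URL that a client could request to run the query.
url = MARTSERVICE + "?" + urlencode({"query": query_xml})
```

Printing `query_xml` shows one `Filter` element for CFTR and one `Attribute` element per output column, mirroring the two bullet points above.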

11
A Brief Example
Use the current Ensembl (archives are also
available)
Select Homo sapiens
12
Select the genes with Filters
Click Filters.
Expand the REGION panel.
Expand the GENE panel to enter the gene ID(s).
13
Filters
Change the ID type drop-down to HGNC symbol. Enter
CFTR in the box.
Click Count to see how many genes passed through
your filters.
14
Attributes (Output Options)
Click on Attributes.
Expand the GENE section.
15
Attributes (Output Options)
Select Description and Associated Gene Name.
Expand the EXTERNAL panel for non-Ensembl IDs.
16
Attributes (Output)
External IDs include EntrezGene IDs and also
microarray probe IDs.
17
The Results Table - Preview
For the full result table, click Go or View
ALL rows.
Results show Description, Name, EntrezGene and
probe matches from the Affy HG U133 Plus 2
platform.
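
A gene matched by several probes appears on several rows of the exported table, one row per probe. If the table is saved as tab-separated text, the rows can be regrouped per gene with a few lines of Python. This is a minimal sketch: the column headers, gene names, IDs and probe names in `SAMPLE_TSV` are invented placeholders, not real BioMart output.

```python
import csv
import io
from collections import defaultdict

# Placeholder export; values are invented for illustration only.
SAMPLE_TSV = (
    "Gene name\tEntrezGene ID\tAffy probe\n"
    "GENE_A\t111\tprobe_1_at\n"
    "GENE_A\t111\tprobe_2_at\n"
    "GENE_B\t222\tprobe_3_at\n"
)


def probes_by_gene(tsv_text):
    """Group the probe column by gene name, keeping row order."""
    grouped = defaultdict(list)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        grouped[row["Gene name"]].append(row["Affy probe"])
    return dict(grouped)


grouped = probes_by_gene(SAMPLE_TSV)
```

With a real export, only the header names passed to `row[...]` would need to match the columns chosen in the Attributes step.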
18
Full Result Table
Affy HG probe
Gene Name
EntrezGene ID
Ensembl Gene and Transcript IDs
Description
19
Other Export Options (Attributes)
  • Sequences: UTRs, flanking sequences, cDNA and
    peptides, etc.
  • Gene IDs from Ensembl and external sources (MGI,
    Entrez, etc.)
  • Microarray data
  • Protein functions/descriptions (InterPro, GO)
  • Orthologous gene sets
  • SNP/variation data

20
BioMart Data Sets
  • Ensembl genes
  • Vega genes
  • Variations

21
BioMart around the world
BioMart started at Ensembl... Where has it
travelled?
22
Central Portal
www.biomart.org
23
WormBase
24
HapMap
26
GRAMENE
www.gramene.org
27
The Potato Center
28
How to Get There
  • http://www.biomart.org/biomart/martview
  • http://www.ensembl.org/biomart/martview
  • Or click on BioMart from Ensembl

29
The Flow
  • Choose Dataset (All genes for a species)
  • Choose Filters (narrows the gene set)
  • Choose Attributes (output options)
  • Now Try the Worked Example on Page 23!

30
Ensembl Core Databases
  • Relational database
  • Normalised
      • Each data point stored only once
  • Therefore:
      • Quick updates
      • Minimal storage requirements
  • But:
      • Many tables
      • Many joins for complicated queries
      • Slow for data mining applications
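
The cost of normalisation can be seen with a very small example. The sketch below uses Python's built-in `sqlite3` module and three toy tables; the table names, columns and values are hypothetical and much simpler than the actual Ensembl core schema. Even here, a question as simple as "gene name, chromosome and EntrezGene ID" needs two joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, chromosome TEXT);
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT,
                   seq_region_id INTEGER);
CREATE TABLE xref (xref_id INTEGER PRIMARY KEY, gene_id INTEGER,
                   source TEXT, accession TEXT);

-- Each fact is stored once: chromosome names live only in seq_region.
INSERT INTO seq_region VALUES (1, '7'), (2, 'X');
INSERT INTO gene VALUES (10, 'GENE_A', 1), (11, 'GENE_B', 2);
INSERT INTO xref VALUES (100, 10, 'EntrezGene', 'E1'),
                        (101, 10, 'Affy', 'P1'),
                        (102, 11, 'EntrezGene', 'E2');
""")

# Two joins are needed to put three related values on one row.
rows = conn.execute("""
SELECT g.name, s.chromosome, x.accession
FROM gene g
JOIN seq_region s ON s.seq_region_id = g.seq_region_id
JOIN xref x ON x.gene_id = g.gene_id
WHERE x.source = 'EntrezGene'
ORDER BY g.name
""").fetchall()
```

Renaming a chromosome would touch a single row of `seq_region`, which is the "quick updates" benefit; the price is that every report has to reassemble the data with joins.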

31
Normalised Schema
32
BioMart Database
  • Data warehouse
  • De-normalised
  • Query-optimised
  • Therefore:
      • Fast and flexible
      • Ideal for data mining
  • But:
      • Tables with apparent redundancy
      • Needs rebuilding from scratch for every
        release from normalised core databases
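
The trade-off above can be sketched the same way with Python's `sqlite3` module: a flat table is built once from normalised source tables, and later queries read it without any join. The table name `gene_main`, the columns and the values are hypothetical placeholders chosen for illustration, not the real BioMart table layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy normalised source tables.
CREATE TABLE seq_region (seq_region_id INTEGER PRIMARY KEY, chromosome TEXT);
CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, name TEXT,
                   seq_region_id INTEGER);
CREATE TABLE probe (probe_id INTEGER PRIMARY KEY, gene_id INTEGER,
                    probe_name TEXT);
INSERT INTO seq_region VALUES (1, '7');
INSERT INTO gene VALUES (10, 'GENE_A', 1);
INSERT INTO probe VALUES (100, 10, 'P1'), (101, 10, 'P2');

-- Build step: the joins are paid once, when the flat table is built.
CREATE TABLE gene_main AS
SELECT g.name AS gene_name, s.chromosome, p.probe_name
FROM gene g
JOIN seq_region s ON s.seq_region_id = g.seq_region_id
LEFT JOIN probe p ON p.gene_id = g.gene_id;
""")

# Query step: a plain single-table lookup, no joins.
flat = conn.execute("""
SELECT gene_name, chromosome, probe_name
FROM gene_main
WHERE chromosome = '7'
ORDER BY probe_name
""").fetchall()
```

The gene name and chromosome are repeated on every probe row, which is the apparent redundancy mentioned above, and the flat table has to be rebuilt whenever the source tables change.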

33
De-Normalised Schema
34
Information Flow
DATASET
FILTER
ATTRIBUTES