Title: BioMart Query Network


1
BioMart Query Network
Arek Kasprzyk, European Bioinformatics Institute
8 January 2005
2
Biological databases
  • Distributed
  • Different format
  • Different focus
  • Different release schedule
  • Scalability factor

3
(No Transcript)
4
BioMart
5
(No Transcript)
6
MartView
7
BioMart@Ensembl
8
MartShell
9
MartExplorer
10
Database
11
Schema
12
Schema
(Diagram: tables related through a primary key (PK) and foreign keys (FK).)
13
Schema
(Diagram: tables related through foreign keys (FK).)
14
Schema - reversed star
(Diagram: main tables with primary keys PK1 and PK2, and dimension (dm) tables that reference them through foreign keys FK1 and FK2.)
15
Fixed schema transformation
16
Schema transformation
  • Central table
  • Longest n:1, 1:1 path
  • Dimension table
  • Central transformation around a 1:n table
  • Link tables are first decomposed into a set of 1:n relationships

17
MartBuilder
  • Input
  • central object
  • database metadata
  • cardinalities
  • Output
  • Set of SQL statements
  • CREATE TABLE ... AS SELECT
  • Transformations
  • represented as asymmetric tree
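
The input/output description above can be sketched in Python. This is only an illustration of the idea, not MartBuilder's actual code: the function name `build_statements` and the dictionary layout are hypothetical, and the sketch assumes every joined table shares one key column with the central table and is merged with a plain inner join.

```python
def build_statements(final_table, central, joins):
    """Return CREATE TABLE ... AS SELECT statements that fold each joined
    table into the central table, one temporary table per step.

    central and joins are hypothetical dictionaries describing tables:
    {"name": ..., "columns": [...]} plus a "key" entry for joined tables.
    """
    statements = []
    current = central["name"]
    columns = [f"{current}.{c}" for c in central["columns"]]
    for step, join in enumerate(joins):
        last = step == len(joins) - 1
        target = final_table if last else f"TEMP{step}"
        # The joined table's key is kept under an alias to avoid a name clash.
        key_alias = f"{join['key']}_TEMP{step}"
        select = columns + [f"{join['name']}.{join['key']} AS {key_alias}"]
        select += [f"{join['name']}.{c}" for c in join["columns"]]
        statements.append(
            f"CREATE TABLE {target} AS SELECT {', '.join(select)} "
            f"FROM {current}, {join['name']} "
            f"WHERE {join['name']}.{join['key']} = {current}.{join['key']}"
        )
        if step > 0:
            # The previous temporary table is no longer needed.
            statements.append(f"DROP TABLE TEMP{step - 1}")
        # The next step selects every column produced so far from the new table.
        names = [s.split(" AS ")[-1].split(".")[-1] for s in select]
        columns = [f"{target}.{n}" for n in names]
        current = target
    return statements


example = build_statements(
    "demo__gene__main",
    {"name": "gene", "columns": ["gene_id", "type"]},
    [
        {"name": "gene_description", "key": "gene_id", "columns": ["description"]},
        {"name": "gene_stable_id", "key": "gene_id", "columns": ["stable_id"]},
    ],
)
```

An inner join drops central rows that have no match, so a relationship with 0:n cardinality would presumably need an outer join instead; the sketch does not attempt that.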

18
MartBuilder
DATASET: hsapiens_gene_ensembl
TYPE (MAIN M, DIMENSION D, EXIT E): M
TABLE NAME: gene
gene alt_allele cardinality (11, n1, 0n, 1n, SKIP S): S
gene gene cardinality (11, n1, 0n, 1n, SKIP S): S
gene gene_description cardinality (11, n1, 0n, 1n, SKIP S): 11
gene gene_stable_id cardinality (11, n1, 0n, 1n, SKIP S): 11
gene kk__gene__main cardinality (11, n1, 0n, 1n, SKIP S): S
gene transcript cardinality (11, n1, 0n, 1n, SKIP S): S
gene analysis cardinality (11, n1, 0n, 1n, SKIP S): n1
gene dna cardinality (11, n1, 0n, 1n, SKIP S): S
gene dnac cardinality (11, n1, 0n, 1n, SKIP S): S
gene seq_region cardinality (11, n1, 0n, 1n, SKIP S): S
TYPE (MAIN M, DIMENSION D, EXIT E): E
ADD EXTENSION hsapiens_gene_ensembl__gene__MAIN (Y/N): N
CHANGE FINAL TABLE NAME hsapiens_gene_ensembl__gene__MAIN TO:

CREATE TABLE TEMP0 AS SELECT
  gene.gene_id, gene.type, gene.analysis_id, gene.seq_region_id,
  gene.seq_region_start, gene.seq_region_end, gene.seq_region_strand,
  gene.display_xref_id,
  gene_description.gene_id AS gene_id_TEMP0, gene_description.description
FROM gene, gene_description
WHERE gene_description.gene_id = gene.gene_id;

CREATE TABLE hsapiens_gene_ensembl__gene__MAIN AS SELECT
  TEMP0.gene_id, TEMP0.type, TEMP0.analysis_id, TEMP0.seq_region_id,
  TEMP0.seq_region_start, TEMP0.seq_region_end, TEMP0.seq_region_strand,
  TEMP0.display_xref_id, TEMP0.gene_id_TEMP0, TEMP0.description,
  gene_stable_id.gene_id AS gene_id_TEMP1, gene_stable_id.stable_id,
  gene_stable_id.version
FROM TEMP0, gene_stable_id
WHERE gene_stable_id.gene_id = TEMP0.gene_id;

DROP TABLE TEMP0;
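
The two-step CREATE TABLE ... AS SELECT pattern in the MartBuilder output above can be tried on a toy schema. The sketch below is an assumption-laden demonstration: it uses Python's built-in `sqlite3` module rather than MySQL or Oracle, a reduced set of columns, invented sample rows, and a hypothetical final table name `demo__gene__main`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Toy source tables with made-up rows (not real Ensembl data).
CREATE TABLE gene (gene_id INTEGER, type TEXT);
CREATE TABLE gene_description (gene_id INTEGER, description TEXT);
CREATE TABLE gene_stable_id (gene_id INTEGER, stable_id TEXT, version INTEGER);
INSERT INTO gene VALUES (1, 'protein_coding'), (2, 'pseudogene');
INSERT INTO gene_description VALUES (1, 'example gene one'), (2, 'example gene two');
INSERT INTO gene_stable_id VALUES (1, 'GENE0001', 1), (2, 'GENE0002', 3);

-- Step 1: fold gene_description into a temporary table.
CREATE TABLE TEMP0 AS SELECT gene.gene_id, gene.type,
    gene_description.gene_id AS gene_id_TEMP0, gene_description.description
  FROM gene, gene_description
  WHERE gene_description.gene_id = gene.gene_id;

-- Step 2: fold gene_stable_id in and produce the final main table.
CREATE TABLE demo__gene__main AS SELECT TEMP0.gene_id, TEMP0.type,
    TEMP0.gene_id_TEMP0, TEMP0.description,
    gene_stable_id.gene_id AS gene_id_TEMP1, gene_stable_id.stable_id,
    gene_stable_id.version
  FROM TEMP0, gene_stable_id
  WHERE gene_stable_id.gene_id = TEMP0.gene_id;

DROP TABLE TEMP0;
""")

# One wide row per gene: the 1:1 tables have been merged into the main table.
main_rows = conn.execute(
    "SELECT gene_id, type, description, stable_id, version "
    "FROM demo__gene__main ORDER BY gene_id"
).fetchall()
remaining_tables = [
    name for (name,) in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")
]
```

After the script runs, queries against the main table need no joins to reach the description or stable identifier, which is the point of the transformation.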
19
Transformation configuration
satellog_repeats  M  repeats  disease        n1
satellog_repeats  M  repeats  gc             11
satellog_repeats  M  repeats  linkage_depth  S
satellog_repeats  M  repeats  repeats        S
satellog_repeats  M  repeats  transcripts    S
satellog_repeats  M  repeats  ugcount        S
satellog_repeats  M  repeats  ugstats        S
satellog_repeats  M  repeats  rep_class      n1
satellog_repeats  D  ugcount  ugcount        S
satellog_repeats  D  ugcount  ugstats        S
satellog_repeats  D  ugcount  gc             S
satellog_repeats  D  ugcount  repeats        n1r
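
Each entry in the configuration above appears to hold five whitespace-separated fields. A minimal parsing sketch follows, assuming the fields mean dataset, table type (M for main, D for dimension), central table, linked table, and cardinality code, with S meaning skip as in the MartBuilder prompts. The field names and both functions are hypothetical, and the meaning of the `n1r` code is not stated in the slides, so it is passed through untouched.

```python
from collections import namedtuple

# Field names are an assumed reading of the five columns.
Rule = namedtuple("Rule", "dataset table_type central linked cardinality")


def parse_config(text):
    """Parse whitespace-separated configuration entries, one per line."""
    rules = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 5:  # ignore blank or malformed lines
            rules.append(Rule(*fields))
    return rules


def tables_to_join(rules, table_type, central):
    """Return (linked table, cardinality) pairs that are not skipped."""
    return [
        (r.linked, r.cardinality)
        for r in rules
        if r.table_type == table_type
        and r.central == central
        and r.cardinality != "S"
    ]


sample = """\
satellog_repeats M repeats disease n1
satellog_repeats M repeats gc 11
satellog_repeats M repeats linkage_depth S
satellog_repeats D ugcount repeats n1r
"""
rules = parse_config(sample)
main_joins = tables_to_join(rules, "M", "repeats")
```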
20
Data access
21
Dataset Key Abstraction
  • Dataset
  • Organised into a single schema
  • BioMart database contains one or more dataset(s)
  • Attribute
  • Filter
  • Exportable/Importable (Links)
  • Dataset: the equivalent of a relational table
  • Exportable/Importable: the equivalent of PK/FK

22
Key Abstractions
23
Exportables, Importables and Links
  • Exportable: ordered list of attributes
  • Importable: ordered list of filters
  • WHERE filt1 = value1
  • WHERE filt1 = value1 or filt1 = value2
  • WHERE filt1 > value1 and filt2 < value2
  • Links: matching importable and exportable
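
One way to picture a link is that the attribute values exported by one dataset become filter values on the next. The sketch below is a hypothetical illustration of that matching, not the BioMart API: the function `link_filter` is invented, it handles only equality filters, and it emits `?` placeholders so the values can be bound as query parameters.

```python
def link_filter(importable, exported_rows):
    """Build a WHERE clause that applies exported attribute values
    to the importable's filters, matched by position.

    importable: ordered list of filter names on the target dataset.
    exported_rows: rows of values, each in the exportable's attribute order.
    """
    clauses, params = [], []
    for row in exported_rows:
        if len(row) != len(importable):
            raise ValueError("exportable and importable must have the same length")
        clauses.append("(" + " AND ".join(f"{name} = ?" for name in importable) + ")")
        params.extend(row)
    if not clauses:
        # Nothing was exported, so nothing can match.
        return "WHERE 1 = 0", []
    return "WHERE " + " OR ".join(clauses), params


where, params = link_filter(["filt1"], [("value1",), ("value2",)])
# where is "WHERE (filt1 = ?) OR (filt1 = ?)" and params is ["value1", "value2"]
```

The ordering matters because the pairing is positional: the first exported attribute feeds the first importable filter, and so on.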

24
MartView
25
Dataset Configuration
  • Dataset configuration
  • Attributes
  • Filters
  • Trees, Groups, Collections
  • Links
  • Semantics
  • Relational mapping
  • User interface
  • Linking datasets
  • XML-based

26
Dataset Configuration
27
Table naming convention - Naïve configuration
  • Tables
  • Meta tables: meta_content
  • Data tables: dataset__content__type
  • Data tables
  • Main: __main
  • Dimension: __dm
  • Columns
  • Key: _key
  • Boolean filter: _bool
  • List filter: _list
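
The naming convention above is regular enough to be read mechanically, which is presumably what makes a naïve configuration possible. The following sketch uses hypothetical helper names, and treating any column without a recognised suffix as a plain attribute is an assumption rather than something the slides state.

```python
def parse_table_name(name):
    """Split a table name into its parts using the double-underscore convention."""
    if name.startswith("meta_"):
        return {"kind": "meta", "content": name[len("meta_"):]}
    parts = name.split("__")
    if len(parts) != 3:
        raise ValueError(f"not a dataset__content__type name: {name}")
    dataset, content, table_type = parts
    kinds = {"main": "main", "dm": "dimension"}
    kind = kinds.get(table_type.lower())
    if kind is None:
        raise ValueError(f"unknown table type: {table_type}")
    return {"kind": kind, "dataset": dataset, "content": content}


def classify_column(column):
    """Infer a column's role from its suffix."""
    suffixes = (("_key", "key"), ("_bool", "boolean filter"), ("_list", "list filter"))
    for suffix, role in suffixes:
        if column.endswith(suffix):
            return role
    return "attribute"  # assumed default for unsuffixed columns


main_table = parse_table_name("hsapiens_gene_ensembl__gene__main")
key_role = classify_column("gene_id_key")
```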

28
MartEditor
29
MartEditor
  • Naïve configuration
  • Updates
  • Links
  • Automatic discovery of new tables

30
Class diagram - configuration
31
Class diagram - querying
32
Information flow
  • Read connections
  • Register individual datasets and create linked
    datasets
  • Get input from the user, split queries to
    individual datasets.
  • Find the shortest path between datasets
    (Dijkstra)
  • Compile SQL
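
The shortest-path step can be sketched with a textbook Dijkstra search over a graph whose nodes are datasets and whose edges are links. The graph layout, the unit link weights, and the dataset names used in the example are all assumptions for illustration; the slides do not say how BioMart weights links.

```python
import heapq


def shortest_path(links, start, goal):
    """Dijkstra's algorithm over a dict of {dataset: {neighbour: weight}}.

    Returns (total weight, list of datasets on the path), or None if the
    goal cannot be reached through any chain of links.
    """
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for neighbour, weight in links.get(node, {}).items():
            if neighbour not in seen:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return None


# Hypothetical link graph with every link given weight 1.
links = {
    "ensembl": {"vega": 1, "uniprot": 1},
    "vega": {"uniprot": 1},
    "uniprot": {"msd": 1},
    "msd": {},
}
route = shortest_path(links, "ensembl", "msd")
```

With unit weights the search reduces to finding the chain with the fewest links, so a breadth-first search would give the same answer; Dijkstra only matters once links carry different costs.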

33
Summary
34
BioMart
  • Domain independent
  • Platform independent
  • MySQL 4
  • Oracle 9i
  • Plugin architecture

35
BioMart model
  • Already applied
  • Ensembl
  • Vega
  • dbSNP
  • UniProt
  • MSD
  • Variety of small projects
  • In development
  • ArrayExpress
  • Wormbase
  • RGD

36
Future work
  • BioMart v0.2 to be released later in January
  • Java library to be upgraded over coming months to
    the new architecture
  • BioMart has been integrated with Taverna
  • MartBuilder - to be properly implemented

37
BioMart
  • www.ebi.ac.uk/biomart
  • Open source (LGPL)
  • Public MySQL server
  • ftp
  • mart-dev@ebi.ac.uk
  • mart-announce@ebi.ac.uk

38
Acknowledgments
  • BioMart
  • Damian Smedley
  • Darin London
  • Contributors
  • Arne Stabenau (Ensembl)
  • Andreas Kahari (Ensembl)
  • Craig Melsopp (Ensembl)
  • Katerina Tzouvara (UniProt)
  • Paul Donlon (Unilever)
  • Will Spooner (CSHL)

39
(No Transcript)
40
(No Transcript)