Introduction to the BioMart API

Slides: 30
Provided by: extra4
1
Introduction to the BioMart API
2
BioMart APIs
  • Biomart_plib - Object Oriented Perl interface

3
Biomart_plib Architecture
  • Object Oriented Perl Based API to BioMart
    Datasets
  • Uses XML configuration shared by all BioMart
    Software

4
Query logic
5
Configuration logic
6
Initializing API script
  • my $confFile = "/home/user/martRegistryFile";
  • my $initializer = BioMart::Initializer->new(registryFile => $confFile);
  • my $registry = $initializer->getRegistry;
  • Optional Initializer parameters
  • action => 'clean' - replace the dataset
    configurations stored on the local file-system
    with those from the database and build a new,
    clean registry object
  • action => 'update' - replace any file-system
    dataset configurations modified since the last
    retrieval with the database copies and build a
    new registry object
  • Default behaviour with no action specified is to
    generate the registry object using the cached
    file-system configurations if they exist,
    otherwise retrieve them from the database.
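
The three behaviours above amount to a small decision rule applied to each dataset configuration. The following is a minimal Java sketch of that rule, not BioMart code: the class, enum and method names, and the use of timestamps to detect a modified configuration, are all assumptions made for illustration.

```java
// Hypothetical model of the Initializer "action" options; none of these
// names come from the BioMart code base.
final class ConfigSourceSketch {
    enum Action { NONE, CLEAN, UPDATE }
    enum Source { CACHE, DATABASE }

    /**
     * Decide where one dataset configuration should be read from.
     * cachedAt is when the local copy was retrieved (-1 if there is none);
     * modifiedAt is when the database copy last changed.
     */
    static Source choose(Action action, long cachedAt, long modifiedAt) {
        boolean cached = cachedAt >= 0;
        switch (action) {
            case CLEAN:
                // always replace the local copy with the database copy
                return Source.DATABASE;
            case UPDATE:
                // replace only copies modified since the last retrieval
                // (assumption: a missing local copy is also fetched)
                return (!cached || modifiedAt > cachedAt) ? Source.DATABASE : Source.CACHE;
            default:
                // no action: use the cached copy if it exists
                return cached ? Source.CACHE : Source.DATABASE;
        }
    }

    public static void main(String[] args) {
        System.out.println(choose(Action.NONE, 100L, 200L));
        System.out.println(choose(Action.UPDATE, 100L, 200L));
    }
}
```

With no action a stale cached copy is still used, which is why the update and clean options exist.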

7
Initializing API script
  • Optional Initializer parameters (cont)
  • mode => 'lazyload' - only keep a certain number
    of dataset configurations in memory at once for
    low memory machines and future scalability
  • Default behaviour with no mode specified is to
    keep all configurations in memory.
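
One way to picture the lazyload mode is as a bounded cache that loads a dataset configuration on first use and drops another one when the limit is reached. The Java sketch below illustrates that idea only. The slides do not say which configuration is dropped, so the least-recently-used policy, the class name and the loader function are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical bounded cache of dataset configurations (LRU eviction assumed).
final class LazyConfigCache {
    private final Map<String, String> cache;
    private final Function<String, String> loader;
    private int loads = 0;

    LazyConfigCache(final int maxInMemory, Function<String, String> loader) {
        this.loader = loader;
        // accessOrder = true makes the map iterate from least to most recently used
        this.cache = new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxInMemory;
            }
        };
    }

    /** Return the configuration for a dataset, loading it only if it is not in memory. */
    String get(String dataset) {
        String config = cache.get(dataset);
        if (config == null) {
            config = loader.apply(dataset);
            loads++;
            cache.put(dataset, config);
        }
        return config;
    }

    int loads() { return loads; }

    int inMemory() { return cache.size(); }

    public static void main(String[] args) {
        LazyConfigCache configs = new LazyConfigCache(2, name -> "config for " + name);
        configs.get("hsapiens_gene_ensembl");
        configs.get("mmusculus_gene_ensembl");
        configs.get("hsapiens_gene_ensembl");
        System.out.println(configs.loads() + " loads, " + configs.inMemory() + " in memory");
    }
}
```

The trade-off is the one the slide names: less memory in exchange for reloading a configuration that has been dropped.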

8
Building Query
  • my $query = BioMart::Query->new(registry => $registry,
    virtualSchemaName => 'default');
  • $query->addAttribute('hsapiens_gene_ensembl','ensembl_gene_id');
  • or with optional virtualSchema and interface
    settings
  • $query->addAttribute('hsapiens_gene_ensembl','ensembl_gene_id',
    'default','default');
  • $query->addFilter('hsapiens_gene_ensembl','chromosome_name',['1']);
  • $query->addFilter('hsapiens_gene_ensembl','hgnc_symbol',
    ['FGFR1','IL2','DERL3']);

9
Executing query and printing results
  • my $query_runner = BioMart::QueryRunner->new();
  • $query_runner->execute($query);
  • $query_runner->printResults;

10
Executing query and printing results
  • Print formatted header
  • $query_runner->printHeader;
  • Print just the first 20 results
  • $query_runner->printResults(20);
  • Change the formatter from the tab-separated default
    before executing the query
  • $query->formatter('FASTA');
  • The formatter has to have a corresponding module
    in lib/BioMart/Formatter implementing the
    FormatterI.pm interface, e.g. CSV, TXT, GTF, XLS,
    etc.
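
The requirement that every formatter name has a matching module behind a common interface can be pictured as a lookup from name to implementation. The Java sketch below shows that pattern in isolation. It is not the BioMart formatter code: the interface, the "TSV" name used for the tab-separated default and the naive CSV output (no quoting) are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FormatterSketch {
    /** Stand-in for the role the slide gives to FormatterI.pm. */
    interface RowFormatter {
        String format(List<String> row);
    }

    // Hypothetical registry: one entry per available formatter "module".
    private static final Map<String, RowFormatter> FORMATTERS = new HashMap<>();
    static {
        FORMATTERS.put("TSV", row -> String.join("\t", row));
        FORMATTERS.put("CSV", row -> String.join(",", row)); // no quoting of embedded commas
    }

    /** Look a formatter up by name, failing if no implementation is registered. */
    static RowFormatter byName(String name) {
        RowFormatter formatter = FORMATTERS.get(name);
        if (formatter == null) {
            throw new IllegalArgumentException("No formatter module named " + name);
        }
        return formatter;
    }

    public static void main(String[] args) {
        List<String> row = Arrays.asList("ENSG00000146556", "1");
        System.out.println(byName("CSV").format(row));
    }
}
```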

11
Multi dataset queries
  • my $query = BioMart::Query->new('registry'=>$registry,
    'virtualSchemaName'=>'default');
  • $query->addAttribute('hsapiens_gene_ensembl','ensembl_gene_id');
  • $query->addAttribute('hsapiens_gene_ensembl','ensembl_transcript_id');
  • $query->addAttribute('mmusculus_gene_ensembl','ensembl_gene_id');
  • $query->addAttribute('mmusculus_gene_ensembl','ensembl_transcript_id');
  • This is the equivalent of picking human as the main
    dataset in the web interface and mouse as the optional
    second dataset, i.e. the human attributes appear first
    in the result table, followed by the mouse attributes.
  • Note that BioMart queries are currently restricted to a
    maximum of two datasets, for performance reasons and
    because of technical difficulties in query planning.
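
The two rules stated here (columns grouped by dataset in the order the datasets were first used, and at most two datasets per query) can be modelled in a few lines. This Java sketch illustrates those rules only and is not how BioMart implements them. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a query that spans at most two datasets.
final class TwoDatasetQuerySketch {
    private static final int MAX_DATASETS = 2;

    // LinkedHashMap keeps datasets in the order they were first mentioned.
    private final Map<String, List<String>> attributesByDataset = new LinkedHashMap<>();

    void addAttribute(String dataset, String attribute) {
        if (!attributesByDataset.containsKey(dataset)
                && attributesByDataset.size() >= MAX_DATASETS) {
            throw new IllegalStateException(
                    "Queries are limited to " + MAX_DATASETS + " datasets");
        }
        attributesByDataset.computeIfAbsent(dataset, d -> new ArrayList<>()).add(attribute);
    }

    /** Result columns: all attributes of the first dataset, then those of the second. */
    List<String> columns() {
        List<String> columns = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : attributesByDataset.entrySet()) {
            for (String attribute : entry.getValue()) {
                columns.add(entry.getKey() + "." + attribute);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        TwoDatasetQuerySketch query = new TwoDatasetQuerySketch();
        query.addAttribute("hsapiens_gene_ensembl", "ensembl_gene_id");
        query.addAttribute("mmusculus_gene_ensembl", "ensembl_gene_id");
        query.addAttribute("hsapiens_gene_ensembl", "ensembl_transcript_id");
        System.out.println(query.columns());
    }
}
```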
12
Web services type access
  • To support GRID projects such as Taverna and
    other third party users who want to federate mart
    data without leaving a port to the database
    server openly accessible.

13
Web services type access
  • http://test.biomart.org/cgi-bin/martservice?query=
  • <?xml version="1.0" encoding="UTF-8"?>
  • <!DOCTYPE Query>
  • <Query virtualSchemaName = "defaultSchema">
  • <Dataset name = "hsapiens_gene_ensembl">
  • <Attribute name = "ensembl_gene_id" />
  • <Attribute name = "chromosome_name" />
  • <ValueFilter name = "chromosome_name" value = "1"/>
  • </Dataset>
  • </Query>
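
Because the XML travels in the query parameter of the URL, it has to be URL-encoded before the request is sent. The Java sketch below shows only that step, under the assumption that the service accepts a standard form-encoded query parameter. The class and method names are hypothetical, and the host is the test server named on the slide, which may no longer be reachable, so no request is made.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class MartServiceUrlSketch {
    // Endpoint as given on the slide.
    static final String ENDPOINT = "http://test.biomart.org/cgi-bin/martservice";

    /** Build the request URL for an XML query by URL-encoding the XML. */
    static String queryUrl(String xml) {
        try {
            return ENDPOINT + "?query=" + URLEncoder.encode(xml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform
            throw new IllegalStateException("UTF-8 not available", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE Query>"
                + "<Query virtualSchemaName=\"defaultSchema\">"
                + "<Dataset name=\"hsapiens_gene_ensembl\">"
                + "<Attribute name=\"ensembl_gene_id\"/>"
                + "<ValueFilter name=\"chromosome_name\" value=\"1\"/>"
                + "</Dataset></Query>";
        System.out.println(queryUrl(xml));
    }
}
```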

14
Web services type access
  • Change format from the default tab-separated format
  • <Query virtualSchemaName = "defaultSchema" formatter = "CSV">
  • <Dataset name = "hsapiens_gene_ensembl">
  • <Attribute name = "ensembl_gene_id" />
  • <Attribute name = "chromosome_name" />
  • <ValueFilter name = "chromosome_name" value = "1"/>
  • </Dataset>
  • </Query>

15
Web services type access
  • Get count instead
  • <Query virtualSchemaName = "defaultSchema" count = "1">
  • <Dataset name = "hsapiens_gene_ensembl">
  • <Attribute name = "ensembl_gene_id" />
  • <Attribute name = "chromosome_name" />
  • <ValueFilter name = "chromosome_name" value = "1"/>
  • </Dataset>
  • </Query>
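
The formatter and count variants differ from the basic query only in the attributes of the Query element, so the whole document can be generated from a handful of values. The Java sketch below is one way to do that. The class name and method signature are hypothetical, a single ValueFilter is assumed, and values are assumed to contain no characters that need XML escaping.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical builder for the single-dataset XML queries shown on the slides.
final class QueryXmlSketch {
    /**
     * formatter may be null to keep the tab-separated default;
     * count = true asks for a count instead of the result rows.
     */
    static String build(String schema, String formatter, boolean count,
                        String dataset, List<String> attributes,
                        String filterName, String filterValue) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<!DOCTYPE Query>\n");
        xml.append("<Query virtualSchemaName=\"").append(schema).append('"');
        if (formatter != null) {
            xml.append(" formatter=\"").append(formatter).append('"');
        }
        if (count) {
            xml.append(" count=\"1\"");
        }
        xml.append(">\n");
        xml.append("  <Dataset name=\"").append(dataset).append("\">\n");
        for (String attribute : attributes) {
            xml.append("    <Attribute name=\"").append(attribute).append("\"/>\n");
        }
        xml.append("    <ValueFilter name=\"").append(filterName)
           .append("\" value=\"").append(filterValue).append("\"/>\n");
        xml.append("  </Dataset>\n");
        xml.append("</Query>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        System.out.print(build("defaultSchema", "CSV", false,
                "hsapiens_gene_ensembl",
                Arrays.asList("ensembl_gene_id", "chromosome_name"),
                "chromosome_name", "1"));
    }
}
```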

16
Web services type access
  • Multi-dataset query
  • <Query virtualSchemaName = "defaultSchema">
  • <Dataset name = "mmusculus_gene_ensembl">
  • <ValueFilter name = "chromosome_name" value = "1"/>
  • </Dataset>
  • <Dataset name = "hsapiens_gene_ensembl">
  • <Attribute name = "ensembl_gene_id" />
  • <Attribute name = "chromosome_name" />
  • <ValueFilter name = "chromosome_name" value = "1"/>
  • </Dataset>
  • </Query>

17
Web services type access
(1) Recover the registry file:
http://test.biomart.org/cgi-bin/martservice?type=registry
(2) Recover the datasets available for a mart:
http://test.biomart.org/cgi-bin/martservice?type=datasets&virtualSchema=default&mart=ensembl
(3) Recover the filters available for a dataset:
http://test.biomart.org/cgi-bin/martservice?type=filters&virtualSchema=default&dataset=hsapiens_gene_ensembl
(4) Recover the attributes available for a dataset:
http://test.biomart.org/cgi-bin/martservice?type=attributes&virtualSchema=default&dataset=hsapiens_gene_ensembl
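
All four metadata requests share one shape: a type parameter followed by optional parameters that narrow the scope. A small Java helper, sketched here with hypothetical class and method names, makes that pattern explicit. Parameter values are assumed to be URL-safe, so nothing is encoded, and no request is sent.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class MartMetadataUrlSketch {
    // Endpoint as given on the slide; it may no longer be reachable.
    static final String ENDPOINT = "http://test.biomart.org/cgi-bin/martservice";

    /** Build a metadata URL such as ?type=datasets&virtualSchema=default&mart=ensembl. */
    static String url(String type, Map<String, String> params) {
        StringBuilder url = new StringBuilder(ENDPOINT).append("?type=").append(type);
        for (Map.Entry<String, String> param : params.entrySet()) {
            url.append('&').append(param.getKey()).append('=').append(param.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the parameters in insertion order
        Map<String, String> params = new LinkedHashMap<>();
        params.put("virtualSchema", "default");
        params.put("dataset", "hsapiens_gene_ensembl");
        System.out.println(url("filters", params));
        System.out.println(url("attributes", params));
    }
}
```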
18
MartJ
  • Java Interface to BioMart Datasets
  • Uses XML configuration shared by all BioMart
    Software

19
RegistryDSConfigAdaptor
  • import org.ensembl.mart.lib.config.RegistryDSConfigAdaptor;
  • URL confURL = null;
  • try {
  •     confURL = InputSourceUtil.getURLForString(
            "data/defaultMartRegistry.xml");
  • } catch (MalformedURLException e) {
  •     throw new ConfigurationException("Warning, could not load "
            + "data/defaultMartRegistry.xml" + " file\n");
  • }
  • RegistryDSConfigAdaptor adaptor =
        new RegistryDSConfigAdaptor(confURL, false, false, false);

20
DatasetConfig
  • import org.ensembl.mart.lib.config.DatasetConfig;
  • DatasetConfig config =
        adaptor.getDatasetConfigByDatasetInternalName(
            "hsapiens_gene_ensembl",
            "default");

21
Query
  • import org.ensembl.mart.lib.Query;
  • Query query = new Query();
  • // query needs some information from the DatasetConfig
  • query.setDataSource(config.getAdaptor().getDataSource());
  • query.setMainTables(config.getStarBases());
  • query.setPrimaryKeys(config.getPrimaryKeys());

22
FieldAttribute/AttributeDescription
  • import org.ensembl.mart.lib.config.AttributeDescription;
  • import org.ensembl.mart.lib.FieldAttribute;
  • AttributeDescription adesc =
        config.getAttributeDescriptionByInternalName("gene_stable_id");
  • query.addAttribute(new FieldAttribute(
        adesc.getField(),
        adesc.getTableConstraint(),
        adesc.getKey()));

23
Filter/FilterDescription
  • There are three types of Filter that can be added
    to the query; all are created using the
    attributes of a FilterDescription
  • A. BasicFilter
  • B. BooleanFilter (but watch for the two boolean
    'flavors')
  • C. IDListFilter
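
As a rough mental model, each filter type contributes a different kind of condition to the generated query. The Java sketch below renders the three kinds as SQL-like fragments. It is purely illustrative and is not MartJ's implementation: the class, the method names and the SQL shapes are assumptions, the "boolean_num" flavor is not modelled, and values are not escaped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical rendering of the three filter kinds as WHERE-clause fragments.
final class FilterSketch {
    /** BasicFilter-like: field, condition and a single value. */
    static String basic(String field, String condition, String value) {
        return field + " " + condition + " '" + value + "'";
    }

    /** BooleanFilter-like: require the field to be NULL or to be non-NULL. */
    static String isNull(String field, boolean mustBeNull) {
        return field + (mustBeNull ? " IS NULL" : " IS NOT NULL");
    }

    /** IDListFilter-like: the field must match one of a list of identifiers. */
    static String idList(String field, List<String> ids) {
        String quoted = ids.stream()
                .map(id -> "'" + id + "'")
                .collect(Collectors.joining(", "));
        return field + " IN (" + quoted + ")";
    }

    public static void main(String[] args) {
        System.out.println(basic("chr_name", "=", "22"));
        System.out.println(isNull("some_field", true));
        System.out.println(idList("gene_stable_id",
                Arrays.asList("ENSG00000146556", "ENSG00000197194")));
    }
}
```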

24
FilterDescription
  • import org.ensembl.mart.lib.config.FilterDescription;
  • FilterDescription fdesc =
        config.getFilterDescriptionByInternalName("chr_name");

25
BasicFilter
  • import org.ensembl.mart.lib.BasicFilter;
  • // The config system actually masks a lot of complexity
    // with regard to filters by requiring the internalName
    // again when calling the getXXX methods
  • query.addFilter(new BasicFilter(
        fdesc.getField(name),
        fdesc.getTableConstraint(name),
        fdesc.getKey(name),
        "=",
        "22"));

26
BooleanFilter
  • import org.ensembl.mart.lib.BooleanFilter;
  • // note there are different types of BooleanFilter:
    // "boolean" and "boolean_num"
  • if (fdesc.getType(name).equals("boolean")) {
  •     query.addFilter(new BooleanFilter(
            fdesc.getField(name),
            fdesc.getTableConstraint(name),
            fdesc.getKey(name),
            BooleanFilter.isNULL));
  • } else { // boolean_num
  •     query.addFilter(new BooleanFilter(
            fdesc.getField(name),
            fdesc.getTableConstraint(name),
            fdesc.getKey(name),
            BooleanFilter.isNotNULL_NUM));
  • }

27
IDListFilter
  • import org.ensembl.mart.lib.IDListFilter;
  • String[] ids = new String[] {
        "ENSG00000146556.4",
        "ENSG00000197194.1",
        "ENSG00000197490.1",
        "ENSG00000177693.1" };
  • query.addFilter(new IDListFilter(
        fdesc.getField(name),
        fdesc.getTableConstraint(name),
        fdesc.getKey(name),
        ids));

28
Engine
  • import org.ensembl.mart.lib.Engine;
  • import org.ensembl.mart.lib.FormatSpec;
  • Engine engine = new Engine();
  • engine.execute(
        query,
        new FormatSpec(FormatSpec.TABULATED, "\t"),
        System.out);

29
Future of MartJ
  • In the future, MartJ will be refactored to use
    the more flexible architecture that we developed
    for the Perl-based software.