Title: Introduction to the BioMart API
1 Introduction to the BioMart API
2 BioMart APIs
- Biomart_plib - Object-oriented Perl interface
3 Biomart_plib Architecture
- Object-oriented Perl-based API to BioMart datasets
- Uses XML configuration shared by all BioMart software
4 Query logic
5 Configuration logic
6 Initializing API script
    my $confFile = "/home/user/martRegistryFile";
    my $initializer = BioMart::Initializer->new(registryFile => $confFile);
    my $registry = $initializer->getRegistry;
- Optional Initializer parameters
- action => 'clean' - replace the dataset configurations stored on the local file system with those from the database and build a new, clean registry object
- action => 'update' - replace any file-system dataset configurations modified since the last retrieval with the database copies and build a new registry object
- Default behaviour with no action specified is to generate the registry object using the cached file-system configurations if they exist, otherwise retrieve them from the database.
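The three behaviours described above can be modelled as a single decision per dataset configuration. The following is a minimal sketch, not BioMart code: the class, enum and method names are hypothetical, and it assumes that under 'update' a configuration with no local copy is also fetched, which the slide does not state.

```java
/** Hypothetical model of the Initializer "action" option; not part of BioMart. */
class RegistryActionSketch {

    /** The two documented actions; null stands for "no action specified". */
    enum Action { CLEAN, UPDATE }

    /**
     * Decides whether one dataset configuration is re-read from the database.
     *
     * @param action            requested action, or null for the default behaviour
     * @param cachedOnDisk      true if a file-system copy of the configuration exists
     * @param modifiedSinceSync true if the configuration changed since the last retrieval
     */
    static boolean fetchFromDatabase(Action action, boolean cachedOnDisk,
                                     boolean modifiedSinceSync) {
        if (action == Action.CLEAN) {
            return true; // clean: always replace the local copy
        }
        if (action == Action.UPDATE) {
            // update: refresh modified copies (assumption: also fetch missing ones)
            return !cachedOnDisk || modifiedSinceSync;
        }
        return !cachedOnDisk; // default: use the cached copy when it exists
    }

    public static void main(String[] args) {
        System.out.println("default, cached: "
                + fetchFromDatabase(null, true, true));
        System.out.println("update, cached and modified: "
                + fetchFromDatabase(Action.UPDATE, true, true));
    }
}
```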
7 Initializing API script
- Optional Initializer parameters (cont)
- mode => 'lazyload' - only keep a certain number of dataset configurations in memory at once, for low-memory machines and future scalability
- Default behaviour with no mode specified is to keep all configurations in memory.
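The idea behind lazy loading, keeping only a bounded number of configurations in memory, can be illustrated with a small cache. This is a sketch under assumptions: the slide does not say which configurations BioMart drops first, so least-recently-used eviction is chosen here purely for illustration, and the class name and the "third_dataset" key are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bounded cache of dataset configurations (LRU eviction assumed). */
class LazyConfigCache<V> extends LinkedHashMap<String, V> {
    private final int maxInMemory;

    LazyConfigCache(int maxInMemory) {
        // accessOrder = true: entries are ordered from least to most recently used
        super(16, 0.75f, true);
        this.maxInMemory = maxInMemory;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Called after each put; drop the least recently used entry when over the limit
        return size() > maxInMemory;
    }

    public static void main(String[] args) {
        LazyConfigCache<String> cache = new LazyConfigCache<>(2);
        cache.put("hsapiens_gene_ensembl", "human config");
        cache.put("mmusculus_gene_ensembl", "mouse config");
        cache.get("hsapiens_gene_ensembl");          // human becomes most recently used
        cache.put("third_dataset", "another config"); // evicts the mouse entry
        System.out.println(cache.keySet());
    }
}
```

An evicted configuration would have to be reloaded on its next use, which is the trade-off between memory use and access time that the default (keep everything in memory) avoids.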
8 Building Query
    my $query = BioMart::Query->new(registry => $registry, virtualSchemaName => 'default');
    $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id');
- or with optional virtualSchema and interface settings
    $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id', 'default', 'default');
    $query->addFilter('hsapiens_gene_ensembl', 'chromosome_name', '1');
    $query->addFilter('hsapiens_gene_ensembl', 'hgnc_symbol', 'FGFR1', 'IL2', 'DERL3');
9 Executing query and printing results
    my $query_runner = BioMart::QueryRunner->new();
    $query_runner->execute($query);
    $query_runner->printResults;
10 Executing query and printing results
- Print formatted header
    $query_runner->printHeader;
- Print just the first 20 results
    $query_runner->printResults(20);
- Change the formatter from the tab-separated default before executing the query
    $query->formatter('FASTA');
- The formatter must have a corresponding module in lib/BioMart/Formatter implementing the FormatterI.pm interface (e.g. CSV, TXT, GTF, XLS)
11 Multi-dataset queries
    my $query = BioMart::Query->new('registry' => $registry, 'virtualSchemaName' => 'default');
    $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_gene_id');
    $query->addAttribute('hsapiens_gene_ensembl', 'ensembl_transcript_id');
    $query->addAttribute('mmusculus_gene_ensembl', 'ensembl_gene_id');
    $query->addAttribute('mmusculus_gene_ensembl', 'ensembl_transcript_id');
This is the equivalent of picking human as the main dataset in the web interface and mouse as the optional second dataset, i.e. the human attributes appear first in the result table, followed by the mouse attributes. Note that BioMart queries are currently restricted to a maximum of two datasets, for performance reasons and because of technical difficulties in query planning.
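The column ordering and the two-dataset limit described above can be modelled in a few lines. This is an illustrative sketch only: the class and method names are hypothetical, the "dataset.attribute" column labels are invented for the example, and it does not reflect how BioMart plans or executes queries internally.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of attribute ordering in a query over at most two datasets. */
class TwoDatasetQuerySketch {
    static final int MAX_DATASETS = 2;

    // Insertion-ordered: the first dataset mentioned plays the role of the main dataset
    private final Map<String, List<String>> attributesByDataset = new LinkedHashMap<>();

    void addAttribute(String dataset, String attribute) {
        if (!attributesByDataset.containsKey(dataset)
                && attributesByDataset.size() == MAX_DATASETS) {
            throw new IllegalArgumentException(
                    "at most " + MAX_DATASETS + " datasets per query");
        }
        attributesByDataset.computeIfAbsent(dataset, k -> new ArrayList<>()).add(attribute);
    }

    /** Result columns: all attributes of the first dataset, then those of the second. */
    List<String> columns() {
        List<String> columns = new ArrayList<>();
        attributesByDataset.forEach((dataset, attributes) ->
                attributes.forEach(a -> columns.add(dataset + "." + a)));
        return columns;
    }

    public static void main(String[] args) {
        TwoDatasetQuerySketch query = new TwoDatasetQuerySketch();
        query.addAttribute("hsapiens_gene_ensembl", "ensembl_gene_id");
        query.addAttribute("mmusculus_gene_ensembl", "ensembl_gene_id");
        query.addAttribute("hsapiens_gene_ensembl", "ensembl_transcript_id");
        System.out.println(query.columns()); // human columns are listed before mouse columns
    }
}
```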
12 Web services type access
- Supports Grid projects such as Taverna, and other third-party users who want to federate mart data without leaving a port on the database server openly accessible.
13 Web services type access
- http://test.biomart.org/cgi-bin/martservice?query=
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE Query>
    <Query virtualSchemaName = "defaultSchema">
      <Dataset name = "hsapiens_gene_ensembl">
        <Attribute name = "ensembl_gene_id" />
        <Attribute name = "chromosome_name" />
        <ValueFilter name = "chromosome_name" value = "1"/>
      </Dataset>
    </Query>
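Because the XML document travels as the value of the query parameter, it has to be URL-encoded before being appended to the service address. A minimal sketch using only the Java standard library follows; the class and method names are hypothetical, and the sketch only builds the URL, it does not contact the server.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that turns a query XML document into a martservice URL. */
class MartServiceUrl {
    static final String SERVICE = "http://test.biomart.org/cgi-bin/martservice";

    static String queryUrl(String queryXml) {
        try {
            // Form-style encoding: "<" becomes %3C, quotes become %22, spaces become "+"
            return SERVICE + "?query=" + URLEncoder.encode(queryXml, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                + "<!DOCTYPE Query>"
                + "<Query virtualSchemaName=\"defaultSchema\">"
                + "<Dataset name=\"hsapiens_gene_ensembl\">"
                + "<Attribute name=\"ensembl_gene_id\"/>"
                + "<Attribute name=\"chromosome_name\"/>"
                + "<ValueFilter name=\"chromosome_name\" value=\"1\"/>"
                + "</Dataset>"
                + "</Query>";
        System.out.println(queryUrl(xml));
    }
}
```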
14 Web services type access
- Change format from the default tab-separated format
    <Query virtualSchemaName = "defaultSchema" formatter = "CSV">
      <Dataset name = "hsapiens_gene_ensembl">
        <Attribute name = "ensembl_gene_id" />
        <Attribute name = "chromosome_name" />
        <ValueFilter name = "chromosome_name" value = "1"/>
      </Dataset>
    </Query>
15 Web services type access
- Get count instead
    <Query virtualSchemaName = "defaultSchema" count = "1">
      <Dataset name = "hsapiens_gene_ensembl">
        <Attribute name = "ensembl_gene_id" />
        <Attribute name = "chromosome_name" />
        <ValueFilter name = "chromosome_name" value = "1"/>
      </Dataset>
    </Query>
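The formatter and count variants differ from the basic query only in the attributes of the Query element, so one small builder can produce all three. This is a sketch, assuming the attribute names formatter and count and a single ValueFilter per query; the class and method names are hypothetical, and values are assumed to contain no characters that need XML escaping.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical builder for single-dataset martservice query documents. */
class QueryXmlSketch {

    /**
     * @param formatter output format such as "CSV", or null for the tab-separated default
     * @param countOnly true to request a count instead of result rows
     */
    static String queryXml(String dataset, List<String> attributes,
                           String filterName, String filterValue,
                           String formatter, boolean countOnly) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        xml.append("<!DOCTYPE Query>\n");
        xml.append("<Query virtualSchemaName=\"defaultSchema\"");
        if (formatter != null) {
            xml.append(" formatter=\"").append(formatter).append('"');
        }
        if (countOnly) {
            xml.append(" count=\"1\"");
        }
        xml.append(">\n");
        xml.append("  <Dataset name=\"").append(dataset).append("\">\n");
        for (String attribute : attributes) {
            xml.append("    <Attribute name=\"").append(attribute).append("\"/>\n");
        }
        xml.append("    <ValueFilter name=\"").append(filterName)
           .append("\" value=\"").append(filterValue).append("\"/>\n");
        xml.append("  </Dataset>\n");
        xml.append("</Query>\n");
        return xml.toString();
    }

    public static void main(String[] args) {
        List<String> attributes = Arrays.asList("ensembl_gene_id", "chromosome_name");
        // CSV output for human genes on chromosome 1
        System.out.print(queryXml("hsapiens_gene_ensembl", attributes,
                "chromosome_name", "1", "CSV", false));
        // Count only, default formatter
        System.out.print(queryXml("hsapiens_gene_ensembl", attributes,
                "chromosome_name", "1", null, true));
    }
}
```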
16 Web services type access
- Multi-dataset query
    <Query virtualSchemaName = "defaultSchema">
      <Dataset name = "mmusculus_gene_ensembl">
        <ValueFilter name = "chromosome_name" value = "1"/>
      </Dataset>
      <Dataset name = "hsapiens_gene_ensembl">
        <Attribute name = "ensembl_gene_id" />
        <Attribute name = "chromosome_name" />
        <ValueFilter name = "chromosome_name" value = "1"/>
      </Dataset>
    </Query>
17 Web services type access
(1) Retrieve the registry file:
    http://test.biomart.org/cgi-bin/martservice?type=registry
(2) Retrieve the datasets available for a mart:
    http://test.biomart.org/cgi-bin/martservice?type=datasets&virtualSchema=default&mart=ensembl
(3) Retrieve the filters available for a dataset:
    http://test.biomart.org/cgi-bin/martservice?type=filters&virtualSchema=default&dataset=hsapiens_gene_ensembl
(4) Retrieve the attributes available for a dataset:
    http://test.biomart.org/cgi-bin/martservice?type=attributes&virtualSchema=default&dataset=hsapiens_gene_ensembl
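The four metadata requests share one pattern: a type parameter plus the identifiers that type needs. A minimal sketch that assembles these URLs is given here, assuming the parameters are ordinary key=value pairs joined with "&"; the class and method names are hypothetical, and the code only builds strings without calling the service.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper that builds martservice metadata URLs. */
class MartMetadataUrls {
    static final String SERVICE = "http://test.biomart.org/cgi-bin/martservice";

    static String registry() {
        return SERVICE + "?type=registry";
    }

    static String datasets(String virtualSchema, String mart) {
        return SERVICE + "?type=datasets&virtualSchema=" + enc(virtualSchema)
                + "&mart=" + enc(mart);
    }

    static String filters(String virtualSchema, String dataset) {
        return describe("filters", virtualSchema, dataset);
    }

    static String attributes(String virtualSchema, String dataset) {
        return describe("attributes", virtualSchema, dataset);
    }

    // Filters and attributes requests differ only in the type parameter
    private static String describe(String type, String virtualSchema, String dataset) {
        return SERVICE + "?type=" + type + "&virtualSchema=" + enc(virtualSchema)
                + "&dataset=" + enc(dataset);
    }

    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(registry());
        System.out.println(datasets("default", "ensembl"));
        System.out.println(filters("default", "hsapiens_gene_ensembl"));
        System.out.println(attributes("default", "hsapiens_gene_ensembl"));
    }
}
```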
18 MartJ
- Java interface to BioMart datasets
- Uses XML configuration shared by all BioMart software
19 RegistryDSConfigAdaptor
    import org.ensembl.mart.lib.config.RegistryDSConfigAdaptor;

    URL confURL = null;
    try {
        confURL = InputSourceUtil.getURLForString("data/defaultMartRegistry.xml");
    } catch (MalformedURLException e) {
        throw new ConfigurationException("Warning, could not load "
            + "data/defaultMartRegistry.xml"
            + " file\n");
    }

    RegistryDSConfigAdaptor adaptor =
        new RegistryDSConfigAdaptor(confURL, false, false, false);
20 DatasetConfig
    import org.ensembl.mart.lib.config.DatasetConfig;

    DatasetConfig config =
        adaptor.getDatasetConfigByDatasetInternalName(
            "hsapiens_gene_ensembl",
            "default"
        );
21 Query
    import org.ensembl.mart.lib.Query;

    Query query = new Query();
    // query needs some information from the DatasetConfig
    query.setDataSource(config.getAdaptor().getDataSource());
    query.setMainTables(config.getStarBases());
    query.setPrimaryKeys(config.getPrimaryKeys());
22 FieldAttribute/AttributeDescription
    import org.ensembl.mart.lib.config.AttributeDescription;
    import org.ensembl.mart.lib.FieldAttribute;

    AttributeDescription adesc =
        config.getAttributeDescriptionByInternalName("gene_stable_id");

    query.addAttribute(new FieldAttribute(
        adesc.getField(),
        adesc.getTableConstraint(),
        adesc.getKey()
    ));
23 Filter/FilterDescription
- There are three types of Filter that can be added to the query; all are created using the attributes of a FilterDescription
- A. BasicFilter
- B. BooleanFilter (but watch for the two boolean 'flavors')
- C. IDListFilter
24 FilterDescription
    import org.ensembl.mart.lib.config.FilterDescription;

    FilterDescription fdesc =
        config.getFilterDescriptionByInternalName("chr_name");
25 BasicFilter
    import org.ensembl.mart.lib.BasicFilter;

    // The config system actually masks a lot of complexity
    // with regard to filters by requiring the internalName
    // again when calling the getXXX methods
    String name = "chr_name"; // internalName of the filter described by fdesc
    query.addFilter(new BasicFilter(
        fdesc.getField(name),
        fdesc.getTableConstraint(name),
        fdesc.getKey(name),
        "=",
        "22"
    ));
26 BooleanFilter
    import org.ensembl.mart.lib.BooleanFilter;

    // note there are different types of BooleanFilter:
    // "boolean" and "boolean_num"
    if (fdesc.getType(name).equals("boolean")) {
        query.addFilter(new BooleanFilter(
            fdesc.getField(name),
            fdesc.getTableConstraint(name),
            fdesc.getKey(name),
            BooleanFilter.isNULL
        ));
    } else { // boolean_num
        query.addFilter(new BooleanFilter(
            fdesc.getField(name),
            fdesc.getTableConstraint(name),
            fdesc.getKey(name),
            BooleanFilter.isNotNULL_NUM
        ));
    }
27 IDListFilter
    import org.ensembl.mart.lib.IDListFilter;

    String[] ids = new String[] {
        "ENSG00000146556.4",
        "ENSG00000197194.1",
        "ENSG00000197490.1",
        "ENSG00000177693.1"
    };
    query.addFilter(new IDListFilter(
        fdesc.getField(name),
        fdesc.getTableConstraint(name),
        fdesc.getKey(name),
        ids
    ));
28 Engine
    import org.ensembl.mart.lib.Engine;
    import org.ensembl.mart.lib.FormatSpec;

    Engine engine = new Engine();
    engine.execute(
        query,
        new FormatSpec(FormatSpec.TABULATED, "\t"),
        System.out
    );
29 Future of MartJ
- In the future, MartJ will be refactored to use the more flexible architecture that we developed for the Perl-based software.