Title: Harnessing the Power
1Harnessing the Power of Communities: MOBY and Beyond
Mark Wilkinson
PI Bioinformatics, iCAPTURE Centre for Cardiovascular and Pulmonary Research
Assistant Professor, Dept. of Medical Genetics, UBC, Vancouver
2A brief history of BioMoby
- Model Organism Bring Your own Database Interface Conference, Sept, 2001 (MOBY-DIC)
- May 21, 2002: Genome Canada Platform Award
- May 25, 2002: API Version 0.1 deployed, including the messaging layer that still exists today
- July 18, 2002: first Moby Client released (now gbrowse_moby, part of gbrowse from GMOD)
- June 9, 2003: API Version 0.5 deployed
- Currently, the API is at version 0.86; version 1.0 API in preparation for release end of November
3What does BioMoby do?
4The MOBY-S Plan
- Create an ontology of bioinformatics data-types
- Define a serialization of this ontology (data syntax)
- Create an open API over this ontology
- Define Web Service inputs and outputs vis-à-vis the Ontology
- Register Services in an ontology-aware Registry
- Machines can find an appropriate service
- Machines can execute that service unattended
- Ontology is community-extensible
5Overview of MOBY-S Transactions
6MOBY-S in detail
- MOBY-S Data typing system Semantic Type
- MOBY-S Data typing system Syntactic Type
7MOBY-S in detail
- MOBY-S Data typing system Semantic Type
- MOBY-S Data typing system Syntactic Type
8Moby Namespaces (from GO)
- Any identifiable piece of data is an entity
- Identifiers for these entities fall under Namespaces
- NCBI has gi numbers (gi Namespace)
- GO Terms have accession numbers (GO Namespace)
- Namespaces indicate the data's semantic type.
- GO:0003476 -> a Gene Ontology Term
- gi|163483 -> a GenBank record
- Namespace + ID precisely specifies a data entity
- This differs from an LSID in that our identifiers ARE NOT OPAQUE; they are semantically rich
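The namespace-plus-identifier idea can be sketched in a few lines of Python. This is only an illustration, not part of the Moby API: the `EntityRef` type, the `parse_ref` helper, the `:` separator and the `SEMANTIC_TYPE` table are all assumptions made for the example.

```python
from typing import NamedTuple


class EntityRef(NamedTuple):
    """A namespace plus an identifier within that namespace (hypothetical type)."""
    namespace: str
    id: str


def parse_ref(text: str, sep: str = ":") -> EntityRef:
    """Split 'NAMESPACE<sep>ID' into its two parts; the separator is an assumption."""
    namespace, found, ident = text.partition(sep)
    if not found or not namespace or not ident:
        raise ValueError(f"expected NAMESPACE{sep}ID, got {text!r}")
    return EntityRef(namespace, ident)


# Hypothetical table: the namespace alone tells us what kind of thing is named.
SEMANTIC_TYPE = {
    "GO": "Gene Ontology Term",
    "NCBI_gi": "GenBank record",
}


def semantic_type(ref: EntityRef) -> str:
    """Return the semantic type implied by the namespace, or 'unknown'."""
    return SEMANTIC_TYPE.get(ref.namespace, "unknown")


ref = parse_ref("GO:0003476")
print(ref.namespace, ref.id, "->", semantic_type(ref))
```

Because the identifier is not opaque, the client learns the semantic type without contacting any resolver.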
9MOBY-S in detail
- MOBY-S Data typing system Semantic Type
- MOBY-S Data typing system Syntactic Type
10The MOBY-S Object Ontology
- Syntactic types are defined by a GO-like ontology
- Data Class name at each node
- Edges define the relationships between Classes
- GO used as a model because of its familiarity in the community
- Edges define one of three relationships
- IS A
- Inheritance relationship
- All properties of the parent are present in the child
- HAS A
- Container relationship of exactly 1
- HAS
- Container relationship with 1 or more
Diagram: node, Edge, node
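One way to picture the three edge types is as a small data structure. The sketch below is an assumption-laden illustration, not the Moby implementation: `ObjectClass`, `ONTOLOGY`, `lineage` and `members` are hypothetical names, and the classes registered are the sequence types used as examples elsewhere in this talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectClass:
    """One node of the ontology (hypothetical representation)."""
    name: str
    isa: Optional[str] = None  # IS A: name of the parent class
    hasa: List[Tuple[str, str]] = field(default_factory=list)  # HAS A: exactly one
    has: List[Tuple[str, str]] = field(default_factory=list)   # HAS: one or more


ONTOLOGY: Dict[str, ObjectClass] = {}


def define(cls: ObjectClass) -> None:
    """Register a class under its name."""
    ONTOLOGY[cls.name] = cls


define(ObjectClass("Object"))
define(ObjectClass("Integer", isa="Object"))
define(ObjectClass("String", isa="Object"))
define(ObjectClass("VirtualSequence", isa="Object", hasa=[("length", "Integer")]))
define(ObjectClass("GenericSequence", isa="VirtualSequence",
                   hasa=[("SequenceString", "String")]))
define(ObjectClass("DNASequence", isa="GenericSequence"))


def lineage(name: str) -> List[str]:
    """Follow IS A edges from a class up to the root."""
    chain = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = ONTOLOGY[current].isa
    return chain


def members(name: str) -> List[Tuple[str, str]]:
    """All HAS A members of a class, inherited ones first."""
    result: List[Tuple[str, str]] = []
    for cls_name in reversed(lineage(name)):
        result.extend(ONTOLOGY[cls_name].hasa)
    return result


print(lineage("DNASequence"))
print(members("DNASequence"))
```

The `members` function reflects the inheritance rule: every property of the parent is present in the child.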
11The Simplest Moby Data-Type
<Object namespace="NCBI_gi" id="111076"/>
The combination of a namespace and an identifier within that namespace uniquely identifies a data entity, not its location(s), nor its representation
Diagram: Object
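A minimal sketch of producing and reading this simplest data-type with Python's standard `xml.etree.ElementTree` module. The helper names `moby_object` and `read_moby_object` are hypothetical; a real Moby message wraps such elements in further envelope markup that is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def moby_object(namespace: str, ident: str) -> str:
    """Serialize a bare Object carrying only a namespace and an id."""
    elem = ET.Element("Object", {"namespace": namespace, "id": ident})
    return ET.tostring(elem, encoding="unicode")


def read_moby_object(xml_text: str) -> Tuple[str, str]:
    """Recover (namespace, id) from a serialized Object."""
    elem = ET.fromstring(xml_text)
    return elem.get("namespace", ""), elem.get("id", "")


text = moby_object("NCBI_gi", "111076")
print(text)
print(read_moby_object(text))
```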
12A Primitive Data-type
<Integer namespace="" id="">38</Integer>
Diagram: DateTime ISA Object; Float ISA Object; Integer ISA Object; String ISA Object
13A Derived Data-Type
<VirtualSequence namespace="NCBI_gi" id="111076">
  <Integer namespace="" id="" articleName="length">38</Integer>
</VirtualSequence>
Diagram: Integer ISA Object; String ISA Object; Virtual Sequence ISA Object; Virtual Sequence HASA Integer
14A Derived Data-Type
<GenericSequence namespace="NCBI_gi" id="111076">
  <Integer namespace="" id="" articleName="length">38</Integer>
  <String namespace="" id="" articleName="SequenceString">
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</GenericSequence>
Diagram: Integer ISA Object; String ISA Object; Virtual Sequence ISA Object; Generic Sequence ISA Virtual Sequence; Virtual Sequence HASA Integer; Generic Sequence HASA String
15A Derived Data-Type
<DNASequence namespace="NCBI_gi" id="111076">
  <Integer namespace="" id="" articleName="length">38</Integer>
  <String namespace="" id="" articleName="SequenceString">
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</DNASequence>
Diagram: Integer ISA Object; String ISA Object; Virtual Sequence ISA Object; Generic Sequence ISA Virtual Sequence; DNA Sequence ISA Generic Sequence; Virtual Sequence HASA Integer; Generic Sequence HASA String
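The derived data-types above can be sketched with `xml.etree.ElementTree`. This is an illustration under stated assumptions: `sequence_xml` and `article` are hypothetical helpers, the length is computed from the sequence rather than supplied, and the short sequence is made up for the example.

```python
import xml.etree.ElementTree as ET


def sequence_xml(class_name: str, namespace: str, ident: str, sequence: str) -> ET.Element:
    """Build a sequence object with a 'length' Integer and a 'SequenceString' String."""
    root = ET.Element(class_name, {"namespace": namespace, "id": ident})
    length = ET.SubElement(
        root, "Integer", {"namespace": "", "id": "", "articleName": "length"})
    length.text = str(len(sequence))
    seq = ET.SubElement(
        root, "String", {"namespace": "", "id": "", "articleName": "SequenceString"})
    seq.text = sequence
    return root


def article(elem: ET.Element, name: str) -> str:
    """Return the text of the child whose articleName matches."""
    for child in elem:
        if child.get("articleName") == name:
            return (child.text or "").strip()
    raise KeyError(name)


# Hypothetical data: a six-base sequence, not the one from the slides.
dna = sequence_xml("DNASequence", "NCBI_gi", "111076", "ATGATG")
print(ET.tostring(dna, encoding="unicode"))
print(article(dna, "length"), article(dna, "SequenceString"))
```

Since DNASequence ISA GenericSequence, a client that only knows GenericSequence can still call `article(dna, "SequenceString")`: the member names are inherited unchanged.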
16Legacy file formats
- Containing String allows us to define
ontological classes that represent legacy data
types (e.g. the 20 existing sequence formats!)
<NCBI_Blast_Report namespace="NCBI_gi" id="115325">
  <String namespace="" id="" articleName="content">
TBLASTN 2.0.4 [Feb-24-1998]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database search
programs", Nucleic Acids Res. 25:3389-3402.

Query= gi|1401126 (504 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
          336,723 sequences; 677,679,054 total letters

Searching...done

                                                                  Score     E
Sequences producing significant alignments:                       (bits)  Value

gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA...  1009  0.0
emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t...    58  4e-07
emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein                   53  1e-05
  </String>
</NCBI_Blast_Report>
17Binaries: pictures, movies
- We base64 encode binaries, and then define a hierarchy of data classes that Contain String
- base64_encoded_jpeg ISA text/base64 ISA text/plain HASA String
<base64_encoded_jpeg namespace="TAIR_image" id="3343532">
  <String namespace="" id="" articleName="content">
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
    BAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUx
    HTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVl
    bWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEf
    MB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
  </String>
</base64_encoded_jpeg>
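The base64 wrapping described above might look like the following in Python. Treat it as a sketch: `wrap_binary` and `unwrap_binary` are hypothetical helpers, and the payload is a few made-up bytes rather than a real JPEG.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(class_name: str, namespace: str, ident: str, payload: bytes) -> ET.Element:
    """Base64-encode a binary payload and place it in a String named 'content'."""
    root = ET.Element(class_name, {"namespace": namespace, "id": ident})
    content = ET.SubElement(
        root, "String", {"namespace": "", "id": "", "articleName": "content"})
    content.text = base64.b64encode(payload).decode("ascii")
    return root


def unwrap_binary(root: ET.Element) -> bytes:
    """Find the 'content' String and decode it back to bytes."""
    for child in root:
        if child.get("articleName") == "content":
            return base64.b64decode(child.text or "")
    raise ValueError("no 'content' member found")


# Hypothetical payload standing in for an image file.
fake_image = bytes([0xFF, 0xD8, 0xFF, 0xE0]) + b"not a real JPEG"
wrapped = wrap_binary("base64_encoded_jpeg", "TAIR_image", "3343532", fake_image)
print(ET.tostring(wrapped, encoding="unicode"))
print(unwrap_binary(wrapped) == fake_image)
```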
18Extending legacy data types
- With legacy data-types defined, we can extend them as we see fit
- annotated_jpeg ISA base64_encoded_jpeg
- annotated_jpeg HASA 2D_Coordinate_set
- annotated_jpeg HASA Description
<annotated_jpeg namespace="TAIR_Image" id="3343532">
  <2D_Coordinate_set namespace="" id="" articleName="pixelCoordinates">
    <Integer namespace="" id="" articleName="x_coordinate">3554</Integer>
    <Integer namespace="" id="" articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace="" id="" articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16C
  </String>
  <String namespace="" id="" articleName="content">
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>
19The same object
annotated_jpeg ISA base64_encoded_jpeg HASA 2D_Coordinate_set HASA Description
<annotated_jpeg namespace="TAIR_Image" id="3343532">
  <2D_Coordinate_set namespace="" id="" articleName="pixelCoordinates">
    <Integer namespace="" id="" articleName="x_coordinate">3554</Integer>
    <Integer namespace="" id="" articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace="" id="" articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16C
  </String>
  <String namespace="" id="" articleName="content">
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>
20The same object
annotated_jpeg ISA base64_encoded_jpeg HASA 2D_Coordinate_set HASA Description
<annotated_jpeg namespace="TAIR_Image" id="3343532">
  <2D_Coordinate_set namespace="" id="" articleName="pixelCoordinates">
    <Integer namespace="" id="" articleName="x_coordinate">3554</Integer>
    <Integer namespace="" id="" articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace="" id="" articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16C
  </String>
  <String namespace="" id="" articleName="content">
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>
21How to think about MOBY Objects and Namespaces
22Why define Objects in an ontology?
- Bioinformatics service providers are not all experienced programmers
- The Moby Object Ontology provides an environment within which naïve service providers can create new complex data-types WITHOUT generating new flatfile formats, and without having to understand XML Schema
- Minimize future heterogeneity between new data-types to improve interoperability without requiring endless schema-to-schema mapping efforts.
23The Object Ontology Defines an XML Schema
- Object Ontology terms have meaningful names, but this is for human intuition only
- DNA Sequence, Annotated_GIF
- Object Ontology does not define the biological meaning; however, it does define how every XML tag should be interpreted, and is therefore superior to pure XML/XML-Schema solutions
- It does define the representation
- SYNTAX
24The Object Ontology Defines an XML Schema
- The position of an ontology node precisely defines the syntax by which that node will be represented
- End-users can define new data-types without having to write XML Schema!
- This was an important aim of the project
- A machine can understand the structure of any incoming message by querying its ontological type
25A portion of the MOBY-S Object Ontology: community-built!
35Pipeline discovery on the fly
- No explicit coordination between providers
- Run-time discovery of appropriate Services
- Automated execution of services
- This is happening without semantics
- Syntax only... well, almost ;-)
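Run-time discovery on syntactic type alone can be sketched as a lookup that walks the ISA chain. Everything here is an assumption made for illustration: the `PARENT` table, the `REGISTRY` entries and the service names are hypothetical, and a real registry query would involve more than the input class.

```python
from typing import Dict, List, Optional

# ISA edges for a few object classes used as examples in this talk.
PARENT: Dict[str, Optional[str]] = {
    "Object": None,
    "VirtualSequence": "Object",
    "GenericSequence": "VirtualSequence",
    "DNASequence": "GenericSequence",
}

# Hypothetical registry: service name -> object class it accepts as input.
REGISTRY: Dict[str, str] = {
    "get_related_records": "Object",
    "reverse_sequence": "GenericSequence",
    "translate_dna": "DNASequence",
}


def ancestors(object_class: str) -> List[str]:
    """The class itself followed by every class it inherits from."""
    chain = []
    current: Optional[str] = object_class
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def discover(object_class: str) -> List[str]:
    """Services whose declared input is the class or one of its ancestors."""
    accepted = set(ancestors(object_class))
    return sorted(name for name, wanted in REGISTRY.items() if wanted in accepted)


print(discover("DNASequence"))
print(discover("VirtualSequence"))
```

No provider had to coordinate with any other: each only declared the class it consumes, and the ISA chain does the rest.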
36Conclusions from the Behaviour of this Simple Browser
- Service discovery is a semantic problem
- However interoperability is not
- Data integration is still a problem, both syntactic and semantic, and we've just made that problem worse!
- SYNTAX IS NOT THE PROBLEM!!!!
37Some political details about BioMoby, as we are coming to the end of the current Genome Canada funding period and are trying to get renewal (hint, hint, if there are any GC external reviewers in the audience!)
38Moby Breadth
- Namespaces (semantic datatypes) 236
- Objects (data syntaxes) 161
- Service Types (analytical categories) 18
- Service Instances 401 ( 200 Soaplab)
- Hundreds more in boutique Moby registries serving specialized communities worldwide
- All continents except Antarctica host Moby services
39Moby Impact
- Mailing list count: 175 members (84 on developers mailing list)
- Google Scholar
- BioMOBY: 147
- Citations of 2002 BioMOBY paper: 72
40Moby Developer Activity
- MOBY-DIC Chapter 7 meeting
- Vancouver, May 6-8, 2005
- 23 Developers attending
- Asia
- USA
- Canada
- Germany
- Spain
- France
- Mapped-out the route to the final 1.0 version of
the API
41Moby Registry Activity
PlaNet implements own MOBY Central
42Moby Exemplar Users
- PlaNet consortium (7 sites, 100-130 services)
- EBI SOAPLAB myGrid
- Generation Challenge Programme of the CGIAR (18 sites)
- Genoma España uses MOBY for much of the bioinformatics service provision in the GE Bioinformatics Platform
43Moby Clients
- Gbrowse_moby (M Wilkinson)
- Browser-style client
- Ahab and Ishmael (B Good, M Wilkinson)
- BLAST Semantic Web style clients
- PlaNet Locus_View (H Schoof, R Ernst)
- Aggregator-style client
- Blue-Jay (P Gordon) and Rat Genome Database prototype (S Twigger)
- Menu-style clients
- MOBY Graphs (M Senger)
- Auto-workflow discovery tool
- Taverna (T Oinn, M Senger, E Kawas), and MOWserv (INB, Spain)
- Workflow builder/publisher/execution client
- Enhanced support for MOBY currently being built
- Eclipse plugins etc
44Taverna Workbench Tom Oinn and Martin
Senger myGrid Project
45MOWServ Web interface to the Spanish Instituto
Nacional de Bioinformatica MOBY Central
installation
46INB Collaboration: MOBY Enhancements
- The INB has made several additions to the MOBY API
- Detailed error reporting
- Asynchronous service invocation
- These will become part of the official API in the coming year.
47Future plans for Moby
- Decentralization and enrichment of the registry through distributed RDF-based service instance annotations, LSID resolution
- Complete!
- Mirroring of registries
- RDF-based messaging
- BioMoby pre-dates commodity Semantic Web tools like RDF/OWL by a couple of years
48Future plans for Moby
- Mirroring of Services
- Enhanced registry usage metadata capture
- Ontological markup of Object Ontology Terms
- Better support for Web Service tooling if possible
- Unfortunately, W3C XML Schema is unable to describe MOBY messages
- Collaboration with the GBIF/DIGiR community: biodiversity information served through MOBY
49A weakness of MOBY
- Automated service discovery is fatally flawed due
to insufficiently rich semantics
50The problem with Moby
Chickens go in, pies come out!
51The problem with Moby
What sort of pies?
52The problem with Moby
Apple!
53The MOBY-S Service Ontology
- A simple ISA hierarchy... too simple!
- Primitive types include
- Analysis
- Parsing
- Registration
- Retrieval
- Resolution
- Conversion
- Rendering
54A slice of the Service Ontology
Diagram nodes: Service; Parsing, Parse_NCBI_Blast; Analysis, Alignment, Blast, NCBI_Blast, WU_Blast
"The Exploding Bicycle" - A. Rector, U Manchester
55MOBY in the future
- Tighter collaboration with myGrid
- We now have identical RDF data-models for our registry metadata
- We inherit the excellent myGrid Service Ontology, while retaining the power of the MOBY Object ontology!
56BioMoby Conclusions
- The bioinformatics community is facing mission-critical data management problems
- The solution must be simple.
- The community will adopt solutions that work even if they have to change their behaviour to do so
- The community can be trusted to build useful, simple ontologies on its own
57The Semantic Web for Plant Genomics
- How do Web Services help us with the Semantic Web
problem?
58The Semantic Web RDF Triples
http:// icapture.ubc.ca/ Wilkinson
http://biomoby.org
dc:author
Basically, just entity-relationship diagrams
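A triple is just (subject, predicate, object), so an entity-relationship graph can be sketched with plain Python tuples. No RDF library is used here; the `example.org` URIs and the `outgoing` helper are hypothetical, and only the `http://biomoby.org` subject and the `dc:author` predicate come from the slide.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

triples: List[Triple] = [
    # Subject and predicate from the slide; the object URI is a placeholder.
    ("http://biomoby.org", "dc:author", "http://example.org/people/wilkinson"),
    # Entirely hypothetical triples, added to show a second subject.
    ("http://example.org/transcript/1", "TranscriptOf", "http://example.org/gene/1"),
    ("http://example.org/gene/1", "hasProduct", "http://example.org/protein/1"),
]

# Index the triples by subject so each entity's outgoing edges are easy to find.
by_subject: DefaultDict[str, List[Tuple[str, str]]] = defaultdict(list)
for subject, predicate, obj in triples:
    by_subject[subject].append((predicate, obj))


def outgoing(subject: str) -> List[Tuple[str, str]]:
    """All (predicate, object) pairs leaving a subject node."""
    return list(by_subject.get(subject, []))


print(outgoing("http://biomoby.org"))
```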
59The Internet
Credit to P. Lord, myGrid
60The World Wide Web
Credit to P. Lord, myGrid
61The Semantic Web (low stack)
Diagram edge labels: sameAs, TranscriptOf, ISA, activates, componentOf, hasProduct, address, clonedBy
Credit to P. Lord, myGrid
62How do WS relate to the SW?
- Bioinformatics information is mainly in Databases
- Therefore not available as named documents (URIs)
- Work on Semantic Web Services has focused primarily on semantic annotation of Web Service functionality (e.g. Moby myGrid)
- i.e. the problem of Service Discovery
- Can Web Services be used to build the Semantic Web? (credit to Phillip Lord, myGrid, for this phraseology)
63Web Services: no documents to point to!
Diagram edge labels: sameAs, TranscriptOf, ISA, activates, componentOf, hasProduct, address, clonedBy
64The Semantic Web
Diagram edge labels: sameAs, TranscriptOf, ISA, activates, componentOf, hasProduct, address, clonedBy
Credit to P. Lord, myGrid
65How do we make Web Services look like the
Semantic Web?
- Moby can help!
- Two novel Moby clients - Ahab and Ishmael are
starting to create Semantic Webby outputs
66The Ahab BioMoby Client
67Ahab
68Ahab RDF
69But BioMoby can run unattended!
- Because of syntactic agreement among service providers, and
- Because the machine can automatically disassemble complex objects, and
- Because discovery and execution of services that act on those objects can be fully automated
- BioMoby can build a massive Entity/Relationship model completely unattended
70Okay, so get rid of the GUI
1. Tell the Ahab engine to choose all discovered services for a piece of data
2. Execute every service
3. Take each output, and go to (1)
4. Go home for an early weekend
- This is Ishmael - a prototype BioMoby client
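The loop described on this slide can be sketched as a breadth-first crawl. This is not Ishmael's code: `crawl`, the `Service` callable type, the `LINKS` table and the `max_nodes` safety bound are assumptions for illustration, and every service is simply tried on every node instead of being discovered through a registry.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Set, Tuple

Node = Tuple[str, str]           # (namespace, id)
Triple = Tuple[Node, str, Node]  # (input, service name, output)
Service = Callable[[Node], List[Node]]


def crawl(start: Node, services: Dict[str, Service], max_nodes: int = 1000) -> List[Triple]:
    """Run every service on every newly seen node and record the edges."""
    seen: Set[Node] = {start}
    queue: Deque[Node] = deque([start])
    triples: List[Triple] = []
    # max_nodes is a rough safety bound so an unattended run cannot grow forever.
    while queue and len(seen) <= max_nodes:
        node = queue.popleft()
        for name, service in services.items():
            for output in service(node):
                triples.append((node, name, output))
                if output not in seen:  # already-seen nodes are not queued again
                    seen.add(output)
                    queue.append(output)
    return triples


# Hypothetical data: two entities that point at each other, forming a loop.
LINKS: Dict[Node, List[Node]] = {
    ("NCBI_gi", "1"): [("GO", "0000001")],
    ("GO", "0000001"): [("NCBI_gi", "1")],
}


def related(node: Node) -> List[Node]:
    """Stand-in for a real service call."""
    return LINKS.get(node, [])


for triple in crawl(("NCBI_gi", "1"), {"related": related}):
    print(triple)
```

Keeping a set of seen (namespace, id) pairs is what lets the crawl terminate when a service leads back to data it has already visited.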
71The Output from Ishmael
Diagram edge labels: sameAs, TranscriptOf, ISA, activates, componentOf, hasProduct, address, clonedBy
72mySWeb
- The output of Ishmael is My Semantic Web
- Personalized Semantic Web-like RDF graph
- Centered around your data of interest
- Cachable/explorable by e.g. Haystack
- Because each node is a Moby-like URI with a
namespace id, it auto-detects re-discovery of
data elements (loops in the dataset)
73Acknowledgements
O B F
- BioMOBY: A Bioinformatics Platform for Genome Canada
- Ahab, Ishmael, iCAPTURer: Genome BC Better Biomarkers in Transplantation
- CardioSHARE: Canadian Institutes for Health Research (CIHR)
- Taverna: myGrid
- Ben Good: CIHR Bioinformatics Training Programme
74 Participants and Supporters
- Edward Kawas, Lead Developer, BioMOBY project, UBC, Canada
- Benjamin Good, CIHR Bioinformatics Training Program, UBC, Canada
- Clarence Kwan, Genome Prairie Co-op student, UBC, Canada
- Bruce McManus, Co-director, iCAPTURE Centre, UBC, Canada
- Carole Goble, Phillip Lord, myGrid project, U Manchester, UK
- Martin Senger, myGrid/Taverna, EBI, UK
- Bill Crosby, Matthew Links, U Windsor, Canada
- Heiko Schoof, Rebecca Ernst, MIPS, Germany
- Simon Twigger, Rat Genome Database, USA
- Yan Wong, Pasteur Institute, France
- Frank Gibbons, Harvard, USA
- David Gonzales Pisano, Centro Nacional Biotechnologia, Spain
- Damian Gessler, Gary Schiltz, NCGR, USA
- Lincoln Stein, Cold Spring Harbor Labs, USA
- Midori Harris, Gene Ontology Consortium, UK
- Richard Bruskiewich, CGIAR/IRRI, Philippines