Harnessing the Power - PowerPoint PPT Presentation


Title: Harnessing the Power


1
Harnessing the Power of Communities: MOBY and Beyond
Mark Wilkinson, PI Bioinformatics, iCAPTURE
Centre for Cardiovascular and Pulmonary
Research; Assistant Professor, Dept. of Medical
Genetics, UBC, Vancouver
2
A brief history of BioMoby
  • Model Organism Bring Your own Database Interface
    Conference, Sept, 2001 (MOBY-DIC)
  • May 21, 2002 Genome Canada Platform Award
  • May 25, 2002 API Version 0.1 deployed,
    including the messaging layer that still exists
    today
  • July 18, 2002 first Moby Client released (now
    gbrowse_moby, part of gbrowse from GMOD)
  • June 9, 2003 API Version 0.5 deployed
  • Currently, the API is at version 0.86; the version
    1.0 API is in preparation for release at the end of
    November

3
What does BioMoby do?
4
  • Create an ontology of bioinformatics data-types
  • Define a serialization of this ontology (data
    syntax)
  • Create an open API over this ontology
  • Define Web Service inputs and outputs vis-à-vis the
    Ontology
  • Register Services in an ontology-aware Registry
  • Machines can find an appropriate service
  • Machines can execute that service unattended
  • Ontology is community-extensible

The MOBY-S Plan
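
As a rough illustration of the plan above, the following Python sketch shows how an ontology-aware registry might let a machine find services for a piece of data. The `Registry` and `Service` classes and all service names are hypothetical; this is not the BioMoby API, only a toy model of the idea that a service registered for a parent class also accepts its children.

```python
from dataclasses import dataclass, field

# Toy Object Ontology: child class -> parent class (ISA).
# Class names follow the slides; the table itself is an assumption.
ISA = {
    "VirtualSequence": "Object",
    "GenericSequence": "VirtualSequence",
    "DNASequence": "GenericSequence",
}


def ancestors(object_type):
    """Return the type followed by all of its ISA ancestors."""
    chain = [object_type]
    while chain[-1] in ISA:
        chain.append(ISA[chain[-1]])
    return chain


@dataclass(frozen=True)
class Service:
    name: str         # hypothetical service name
    input_type: str   # Object Ontology class consumed
    output_type: str  # Object Ontology class produced


@dataclass
class Registry:
    services: list = field(default_factory=list)

    def register(self, service):
        self.services.append(service)

    def find(self, object_type):
        """Services whose input is the given type or one of its ancestors."""
        accepted = set(ancestors(object_type))
        return [s for s in self.services if s.input_type in accepted]


registry = Registry()
registry.register(Service("getSequenceLength", "VirtualSequence", "Object"))
registry.register(Service("translateDNA", "DNASequence", "GenericSequence"))
registry.register(Service("renderImage", "base64_encoded_jpeg", "Object"))

found = [s.name for s in registry.find("DNASequence")]
print(found)
```

A DNASequence matches both sequence services because it inherits from VirtualSequence, while the image service is never offered.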
5
Overview of MOBY-S Transactions
6
MOBY-S in detail
  • MOBY-S Data typing system Semantic Type
  • MOBY-S Data typing system Syntactic Type

7
MOBY-S in detail
  • MOBY-S Data typing system Semantic Type
  • MOBY-S Data typing system Syntactic Type

8
Moby Namespaces (from GO)
  • Any identifiable piece of data is an entity
  • Identifiers for these entities fall under
    Namespaces
  • NCBI has gi numbers (gi Namespace)
  • GO Terms have accession numbers (GO Namespace)
  • Namespaces indicate the data's semantic type.
  • GO:0003476 → a Gene Ontology Term
  • gi|163483 → a GenBank record
  • Namespace + ID precisely specifies a data
    entity
  • This differs from an LSID in that our identifiers
    ARE NOT OPAQUE; they are semantically rich
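
A minimal Python sketch of the idea that a namespace plus an identifier names an entity and that the namespace alone reveals the semantic type. The separator characters, the `EntityRef` type and the lookup table are assumptions made for illustration.

```python
from typing import NamedTuple

# Hypothetical table: namespace -> semantic type it implies.
SEMANTIC_TYPE = {
    "GO": "Gene Ontology Term",
    "gi": "GenBank record",
}


class EntityRef(NamedTuple):
    namespace: str
    id: str


def parse_ref(text, sep=":"):
    """Split 'namespace<sep>id' into its two parts."""
    namespace, _, identifier = text.partition(sep)
    if not namespace or not identifier:
        raise ValueError(f"expected namespace{sep}id, got {text!r}")
    return EntityRef(namespace, identifier)


def semantic_type(ref):
    """The namespace alone tells us what kind of thing the entity is."""
    return SEMANTIC_TYPE.get(ref.namespace, "unknown")


ref = parse_ref("GO:0003476")
print(ref.namespace, ref.id, "->", semantic_type(ref))
```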

9
MOBY-S in detail
  • MOBY-S Data typing system Semantic Type
  • MOBY-S Data typing system Syntactic Type

10
The MOBY-S Object Ontology
  • Syntactic types are defined by a GO-like ontology
  • Data Class name at each node
  • Edges define the relationships between Classes
  • GO used as a model because of its familiarity in
    the community
  • Edges define one of three relationships
  • IS A: inheritance relationship; all properties of
    the parent are present in the child
  • HAS A: container relationship of exactly 1
  • HAS: container relationship with 1 or more

[Diagram: two nodes joined by an edge]
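
The three edge types can be sketched in Python as labelled edges between class names. This is only a toy model: the edge list borrows class names used elsewhere in the talk, `SequenceSet` is a hypothetical class added to show a HAS edge, and none of this is the actual MOBY Central data model.

```python
from enum import Enum


class Relation(Enum):
    ISA = "ISA"    # inheritance
    HASA = "HASA"  # contains exactly one
    HAS = "HAS"    # contains one or more


# (source class, relation, target class)
EDGES = [
    ("Integer", Relation.ISA, "Object"),
    ("String", Relation.ISA, "Object"),
    ("VirtualSequence", Relation.ISA, "Object"),
    ("VirtualSequence", Relation.HASA, "Integer"),
    ("GenericSequence", Relation.ISA, "VirtualSequence"),
    ("GenericSequence", Relation.HASA, "String"),
    ("DNASequence", Relation.ISA, "GenericSequence"),
    ("SequenceSet", Relation.HAS, "GenericSequence"),  # hypothetical class
]


def parent_of(name):
    """Follow the single ISA edge leaving a class, if any."""
    for source, relation, target in EDGES:
        if source == name and relation is Relation.ISA:
            return target
    return None


def is_a(name, ancestor):
    """True if `name` equals `ancestor` or inherits from it."""
    while name is not None:
        if name == ancestor:
            return True
        name = parent_of(name)
    return False


def count_ok(relation, count):
    """HASA means exactly one member; HAS means one or more."""
    if relation is Relation.HASA:
        return count == 1
    if relation is Relation.HAS:
        return count >= 1
    raise ValueError("ISA edges carry no member count")


print(is_a("DNASequence", "VirtualSequence"))
print(count_ok(Relation.HASA, 2))
```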
11
The Simplest Moby Data-Type
<Object namespace="NCBI_gi" id="111076"/>
The combination of a namespace and an identifier
within that namespace uniquely identifies a data
entity, not its location(s), nor its
representation
[Diagram: Object]
12
A Primitive Data-type
<Integer namespace="" id="">38</Integer>
[Diagram: DateTime, Float, Integer and String each ISA Object]
13
A Derived Data-Type
<VirtualSequence namespace="NCBI_gi" id="111076">
  <Integer namespace="" id="" articleName="length">38</Integer>
</VirtualSequence>
[Diagram: Integer, String and Virtual Sequence each ISA Object; Virtual Sequence HASA Integer]
14
A Derived Data-Type
<GenericSequence namespace="NCBI_gi" id="111076">
  <Integer namespace="" id="" articleName="length">38</Integer>
  <String namespace="" id="" articleName="SequenceString">
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</GenericSequence>
[Diagram: Integer, String and Virtual Sequence each ISA Object; Virtual Sequence HASA Integer; Generic Sequence ISA Virtual Sequence; Generic Sequence HASA String]
15
A Derived Data-Type
<DNASequence namespace="NCBI_gi" id="111076">
  <Integer namespace="" id="" articleName="length">38</Integer>
  <String namespace="" id="" articleName="SequenceString">
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</DNASequence>
[Diagram: Integer, String and Virtual Sequence each ISA Object; Virtual Sequence HASA Integer; Generic Sequence ISA Virtual Sequence; Generic Sequence HASA String; DNA Sequence ISA Generic Sequence]
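
The derived data-type above could be assembled programmatically. The following Python sketch uses the standard library's ElementTree; the helper `moby_object` is a hypothetical name, and the values are copied from the slide rather than computed.

```python
import xml.etree.ElementTree as ET


def moby_object(tag, namespace="", identifier="", article_name=None, text=None):
    """Build one element carrying the namespace/id pair every object has."""
    attrs = {"namespace": namespace, "id": identifier}
    if article_name is not None:
        attrs["articleName"] = article_name  # names the member inside its parent
    element = ET.Element(tag, attrs)
    element.text = text
    return element


# Values copied from the slide.
seq = "ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC"
dna = moby_object("DNASequence", namespace="NCBI_gi", identifier="111076")
dna.append(moby_object("Integer", article_name="length", text="38"))
dna.append(moby_object("String", article_name="SequenceString", text=seq))

xml_text = ET.tostring(dna, encoding="unicode")
print(xml_text)
```

Because DNASequence ISA GenericSequence ISA VirtualSequence, the same two members (length and SequenceString) appear whichever of the three tags is used as the outer element.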
16
Legacy file formats
  • Containing String allows us to define
    ontological classes that represent legacy data
    types (e.g. the 20 existing sequence formats!)

<NCBI_Blast_Report namespace="NCBI_gi" id="115325">
  <String namespace="" id="" articleName="content">
TBLASTN 2.0.4 [Feb-24-1998]
Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer,
Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
"Gapped BLAST and PSI-BLAST: a new generation of protein database
search programs", Nucleic Acids Res. 25:3389-3402.
Query= gi|1401126 (504 letters)
Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
          336,723 sequences; 677,679,054 total letters
Searching...done
                                                             Score     E
Sequences producing significant alignments:                  (bits)  Value
gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA...  1009  0.0
emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t...    58  4e-07
emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein                   53  1e-05
  </String>
</NCBI_Blast_Report>
17
Binaries: pictures, movies
  • We base64 encode binaries, and then define a
    hierarchy of data classes that Contain String
  • base64_encoded_jpeg ISA text/base64 ISA
    text/plain HASA String

<base64_encoded_jpeg namespace="TAIR_image" id="3343532">
  <String namespace="" id="" articleName="content">
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
    BAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUx
    HTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVl
    bWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEf
    MB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
  </String>
</base64_encoded_jpeg>
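
A minimal Python sketch of the base64 approach described above: binary data is encoded as text and carried in a String member named "content". The function names and the placeholder bytes are assumptions; only the element and attribute names come from the slide.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(tag, namespace, identifier, payload):
    """Encode raw bytes as base64 text inside a 'content' String member."""
    root = ET.Element(tag, {"namespace": namespace, "id": identifier})
    content = ET.SubElement(
        root, "String", {"namespace": "", "id": "", "articleName": "content"}
    )
    content.text = base64.b64encode(payload).decode("ascii")
    return root


def unwrap_binary(root):
    """Recover the original bytes from the 'content' member."""
    content = root.find("String[@articleName='content']")
    return base64.b64decode(content.text)


# Placeholder bytes standing in for a real image file.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image"
element = wrap_binary("base64_encoded_jpeg", "TAIR_image", "3343532", fake_jpeg)
restored = unwrap_binary(element)
print(ET.tostring(element, encoding="unicode"))
```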
18
Extending legacy data types
  • With legacy data-types defined, we can extend
    them as we see fit
  • annotated_jpeg ISA base64_encoded_jpeg
  • annotated_jpeg HASA 2D_Coordinate_set
  • annotated_jpeg HASA Description

<annotated_jpeg namespace="TAIR_Image" id="3343532">
  <2D_Coordinate_set namespace="" id="" articleName="pixelCoordinates">
    <Integer namespace="" id="" articleName="x_coordinate">3554</Integer>
    <Integer namespace="" id="" articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace="" id="" articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16C
  </String>
  <String namespace="" id="" articleName="content">
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>
19
The same object
annotated_jpeg ISA base64_encoded_jpeg HASA
2D_Coordinate_set HASA Description
<annotated_jpeg namespace="TAIR_Image" id="3343532">
  <2D_Coordinate_set namespace="" id="" articleName="pixelCoordinates">
    <Integer namespace="" id="" articleName="x_coordinate">3554</Integer>
    <Integer namespace="" id="" articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace="" id="" articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16C
  </String>
  <String namespace="" id="" articleName="content">
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>
20
The same object
annotated_jpeg ISA base64_encoded_jpeg HASA
2D_Coordinate_set HASA Description
<annotated_jpeg namespace="TAIR_Image" id="3343532">
  <2D_Coordinate_set namespace="" id="" articleName="pixelCoordinates">
    <Integer namespace="" id="" articleName="x_coordinate">3554</Integer>
    <Integer namespace="" id="" articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace="" id="" articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16C
  </String>
  MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
  Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
</annotated_jpeg>
21
How to think about MOBY Objects and Namespaces
22
Why define Objects in an ontology?
  • Bioinformatics service providers are not all
    experienced programmers
  • The Moby Object Ontology provides an environment
    within which naïve service providers can create
    new complex data-types WITHOUT generating new
    flatfile formats, and without having to
    understand XML Schema
  • Minimize future heterogeneity between new
    data-types to improve interoperability without
    requiring endless schema-to-schema mapping
    efforts.

23
The Object Ontology Defines an XML Schema
  • Object Ontology terms have meaningful names,
    but this is for human intuition only
  • DNA Sequence, Annotated_GIF
  • The Object Ontology does not define the biological
    meaning; however, it does define how every XML tag
    should be interpreted, and is therefore superior to
    pure XML/XML-Schema solutions
  • It does define the representation
  • SYNTAX

24
The Object Ontology Defines an XML Schema
  • The position of an ontology node precisely
    defines the syntax by which that node will be
    represented
  • End-users can define new data-types without
    having to write XML Schema!
  • This was an important aim of the project
  • A machine can understand the structure of any
    incoming message by querying its ontological type

25
A portion of the MOBY-S Object Ontology: community-built!
35
Pipeline discovery on the fly
  • No explicit coordination between providers
  • Run-time discovery of appropriate Services
  • Automated execution of services
  • This is happening without semantics
  • Syntax only... well, almost ;-)

36
Conclusions from the Behaviour of this Simple
Browser
  • Service discovery is a semantic problem
  • However, interoperability is not
  • Data integration is still a problem, both
    syntactic and semantic, and we've just made that
    problem worse!
  • SYNTAX IS NOT THE PROBLEM!!!!

37
  • Some political details about BioMoby, as we are
    coming to the end of the current Genome Canada
    funding period and are trying to get renewal
    (hint, hint, if there are any GC external
    reviewers in the audience!)

38
Moby Breadth
  • Namespaces (semantic datatypes): 236
  • Objects (data syntaxes): 161
  • Service Types (analytical categories): 18
  • Service Instances: 401 ( 200 Soaplab)
  • Hundreds more in boutique Moby registries
    serving specialized communities worldwide
  • All continents except Antarctica host Moby
    services

39
Moby Impact
  • Mailing list count: 175 members (84 on developers
    mailing list)
  • Google Scholar
  • BioMOBY: 147
  • Citations of 2002 BioMOBY paper: 72

40
Moby Developer Activity
  • MOBY-DIC Chapter 7 meeting
  • Vancouver, May 6-8, 2005
  • 23 Developers attending
  • Asia
  • USA
  • Canada
  • Germany
  • Spain
  • France
  • Mapped out the route to the final 1.0 version of
    the API

41
Moby Registry Activity
PlaNet implements own MOBY Central
42
Moby Exemplar Users
  • PlaNet consortium (7 sites, 100-130 services)
  • EBI SOAPLAB myGrid
  • Generation Challenge Programme of the CGIAR (18
    sites)
  • Genoma España uses MOBY for much of the
    bioinformatics service provision in the GE
    Bioinformatics Platform

43
Moby Clients
  • Gbrowse_moby (M Wilkinson)
  • Browser-style client
  • Ahab and Ishmael (B Good, M Wilkinson)
  • BLAST Semantic Web style clients
  • PlaNet Locus_View (H Schoof, R Ernst)
  • Aggregator-style client
  • Blue-Jay (P Gordon) and Rat Genome Database
    prototype (S Twigger)
  • Menu-style clients
  • MOBY Graphs (M Senger)
  • Auto-workflow discovery tool
  • Taverna (T Oinn, M Senger, E Kawas), and MOWserv
    (INB, Spain)
  • Workflow builder/publisher/execution client
  • Enhanced support for MOBY currently being built
  • Eclipse plugins etc

44
Taverna Workbench: Tom Oinn and Martin
Senger, myGrid Project
45
MOWServ: Web interface to the Spanish Instituto
Nacional de Bioinformatica MOBY Central
installation
46
INB Collaboration: MOBY Enhancements
  • The INB has made several additions to the MOBY
    API
  • Detailed error reporting
  • Asynchronous service invocation
  • These will become part of the official API in the
    coming year.

47
Future plans for Moby
  • Decentralization and enrichment of the registry
    through distributed RDF-based service instance
    annotations and LSID resolution
  • Complete!
  • Mirroring of registries
  • RDF-based messaging
  • BioMoby pre-dates commodity Semantic Web tools
    like RDF/OWL by a couple of years

48
Future plans for Moby
  • Mirroring of Services
  • Enhanced registry usage metadata capture
  • Ontological markup of Object Ontology Terms
  • Better support for Web Service tooling if
    possible
  • Unfortunately, W3C XML Schema is unable to
    describe MOBY messages
  • Collaboration with the GBIF/DIGiR community:
    biodiversity information served through MOBY

49
A weakness of MOBY
  • Automated service discovery is fatally flawed due
    to insufficiently rich semantics

50
The problem with Moby
Chickens go in, pies come out!
51
The problem with Moby
What sort o' pies?
52
The problem with Moby
Apple!
53
The MOBY-S Service Ontology
  • A simple ISA hierarchy (too simple!)
  • Primitive types include
  • Analysis
  • Parsing
  • Registration
  • Retrieval
  • Resolution
  • Conversion
  • Rendering

54
A slice of the Service Ontology
[Diagram: a slice of the Service Ontology with nodes Service, Analysis, Alignment, Blast, NCBI_Blast, WU_Blast, Parsing and Parse_NCBI_Blast]
The Exploding Bicycle - A. Rector, U Manchester
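
A Python sketch of how such a single-parent ISA hierarchy can be walked. The parent links below are an assumption inferred from the node names on the slide; the transcript does not record the edges themselves.

```python
# Assumed parent links for the slice of the Service Ontology shown.
SERVICE_PARENT = {
    "Analysis": "Service",
    "Parsing": "Service",
    "Alignment": "Analysis",
    "Blast": "Alignment",
    "NCBI_Blast": "Blast",
    "WU_Blast": "Blast",
    "Parse_NCBI_Blast": "Parsing",
}


def lineage(service_type):
    """The type followed by each ancestor up to the root."""
    path = [service_type]
    while path[-1] in SERVICE_PARENT:
        path.append(SERVICE_PARENT[path[-1]])
    return path


def descendants(service_type):
    """Every type that inherits, directly or indirectly, from the given one."""
    found = []
    for child, parent in SERVICE_PARENT.items():
        if parent == service_type:
            found.append(child)
            found.extend(descendants(child))
    return found


path = " > ".join(reversed(lineage("NCBI_Blast")))
blast_like = sorted(descendants("Alignment"))
print(path)
print(blast_like)
```

With only one parent per node, a type such as Parse_NCBI_Blast can sit under Parsing but has no way to also say that it relates to NCBI_Blast, which is one way of reading the "too simple" complaint.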
55
MOBY in the future
  • Tighter collaboration with myGrid
  • We now have identical RDF data-models for our
    registry metadata
  • We inherit the excellent myGrid Service Ontology,
    while retaining the power of the MOBY Object
    ontology!

56
BioMoby Conclusions
  • The bioinformatics community is facing
    mission-critical data management problems
  • The solution must be simple.
  • The community will adopt solutions that work even
    if they have to change their behaviour to do so
  • The community can be trusted to build useful,
    simple ontologies on its own

57
The Semantic Web for Plant Genomics
  • How do Web Services help us with the Semantic Web
    problem?

58
The Semantic Web RDF Triples
http://icapture.ubc.ca/ Wilkinson
http://biomoby.org
dc:author
Basically, just entity-relationship diagrams
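
An RDF triple is a (subject, predicate, object) statement, so a graph can be sketched in Python as a set of three-element tuples. The direction of the example statement and the `example.org` URIs are assumptions; only `http://biomoby.org` and the `dc:author` label come from the slide.

```python
class Graph:
    """A tiny in-memory triple store."""

    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """All objects linked from a subject by a given predicate."""
        return sorted(
            o for s, p, o in self.triples if s == subject and p == predicate
        )

    def subjects(self, predicate, obj):
        """All subjects pointing at an object through a given predicate."""
        return sorted(
            s for s, p, o in self.triples if p == predicate and o == obj
        )


graph = Graph()
# Hypothetical person URI; the slide's own URI is not fully legible.
graph.add("http://biomoby.org", "dc:author", "http://example.org/people/Wilkinson")
graph.add("http://example.org/ahab", "dc:author", "http://example.org/people/Wilkinson")

authors = graph.objects("http://biomoby.org", "dc:author")
works = graph.subjects("dc:author", "http://example.org/people/Wilkinson")
print(authors)
print(works)
```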
59
The Internet
Credit to P. Lord, myGrid
60
The World Wide Web
Credit to P. Lord, myGrid
61
The Semantic Web (low stack)
[Diagram edge labels: sameAs, TranscriptOf, ISA, activates, componentOf, hasProduct, address, clonedBy]
Credit to P. Lord, myGrid
62
How do WS relate to the SW?
  • Bioinformatics information is mainly in Databases
  • Therefore not available as named documents
    (URIs)
  • Work on Semantic Web Services has focused
    primarily on semantic annotation of Web Service
    functionality (e.g. Moby, myGrid)
  • i.e. the problem of Service Discovery
  • Can Web Services be used to build the Semantic
    Web? (credit to Phillip Lord, myGrid, for this
    phraseology)

63
Web Services no documents to point to!
[Diagram edge labels: sameAs, TranscriptOf, ISA, activates, componentOf, hasProduct, address, clonedBy]
64
The Semantic Web
[Diagram edge labels: sameAs, TranscriptOf, ISA, activates, componentOf, hasProduct, address, clonedBy]
Credit to P. Lord, myGrid
65
How do we make Web Services look like the
Semantic Web?
  • Moby can help!
  • Two novel Moby clients, Ahab and Ishmael, are
    starting to create Semantic Webby outputs

66
The Ahab BioMoby Client
67
Ahab
68
Ahab RDF
69
But BioMoby can run unattended!
  • Because of syntactic agreement among service
    providers, and
  • Because the machine can automatically disassemble
    complex objects, and
  • Because discovery and execution of services that
    act on those objects can be fully automated
  • BioMoby can build a massive Entity/Relationship
    model completely unattended

70
Okay, so get rid of the GUI
  • Tell Ahab engine to choose all discovered services
    for a piece of data
  • Execute every service
  • Take each output, and go to (1)
  • Go home for an early weekend
  • This is Ishmael - a prototype BioMoby client
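
The loop above can be sketched in Python as a breadth-first crawl. Everything here is a stand-in: the three toy services, the `discover` function and the (namespace, id) node tuples are hypothetical and are not Ishmael's actual code. A `seen` set stops the crawl from running forever when a service returns data that was already visited.

```python
from collections import deque

# Hypothetical services: each maps an input node to a list of output nodes.
SERVICES = {
    "getTranscripts": lambda n: [("transcript", "T1")] if n == ("gene", "G1") else [],
    "getProtein": lambda n: [("protein", "P1")] if n == ("transcript", "T1") else [],
    "getGene": lambda n: [("gene", "G1")] if n == ("protein", "P1") else [],
}


def discover(node):
    """Stand-in for a registry query; here every service is simply tried."""
    return SERVICES.items()


def crawl(start, max_nodes=100):
    """Run every discovered service on every new piece of data."""
    seen = {start}
    queue = deque([start])
    edges = []
    while queue and len(seen) <= max_nodes:
        node = queue.popleft()
        for name, service in discover(node):
            for output in service(node):
                edges.append((node, name, output))
                if output not in seen:  # already-seen output closes a loop
                    seen.add(output)
                    queue.append(output)
    return edges


edges = crawl(("gene", "G1"))
for source, service_name, target in edges:
    print(source, service_name, target)
```

Each recorded edge is one (input, service, output) statement, so the result is already shaped like a small entity/relationship graph.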

71
The Output from Ishmael
[Diagram edge labels: sameAs, TranscriptOf, ISA, activates, componentOf, hasProduct, address, clonedBy]
72
mySWeb
  • The output of Ishmael is My Semantic Web
  • Personalized Semantic Web-like RDF graph
  • Centered around your data of interest
  • Cachable/explorable by e.g. Haystack
  • Because each node is a Moby-like URI with a
    namespace id, it auto-detects re-discovery of
    data elements (loops in the dataset)

73
Acknowledgements
O B F
  • BioMOBY: A Bioinformatics Platform for
    Genome Canada
  • Ahab, Ishmael, iCAPTURer: Genome BC Better
    Biomarkers in Transplantation
  • CardioSHARE: Canadian Institutes for Health
    Research (CIHR)
  • Taverna: myGrid
  • Ben Good: CIHR Bioinformatics Training
    Programme

74
Participants and Supporters
  • Edward Kawas: Lead Developer, BioMOBY project,
    UBC, Canada
  • Benjamin Good: CIHR Bioinformatics Training
    Program, UBC, Canada
  • Clarence Kwan: Genome Prairie Co-op student,
    UBC, Canada
  • Bruce McManus: Co-director, iCAPTURE Centre,
    UBC, Canada
  • Carole Goble, Phillip Lord: myGrid project, U
    Manchester, UK
  • Martin Senger: myGrid/Taverna, EBI, UK
  • Bill Crosby, Matthew Links: U Windsor, Canada
  • Heiko Schoof, Rebecca Ernst: MIPS, Germany
  • Simon Twigger: Rat Genome Database, USA
  • Yan Wong: Pasteur Institute, France
  • Frank Gibbons: Harvard, USA
  • David Gonzales Pisano: Centro Nacional
    Biotechnologia, Spain
  • Damian Gessler, Gary Schiltz: NCGR, USA
  • Lincoln Stein: Cold Spring Harbor Labs, USA
  • Midori Harris: Gene Ontology Consortium, UK
  • Richard Bruskiewich: CGIAR/IRRI, Philippines