Title: Feasting on Brains!
1Feasting on Brains! From Web Services to Web 2.0 to the Semantic Web and back again
A personal journey through the Semantic Web and Web Services for Health Care and Life Sciences
Mark Wilkinson (markw_at_illuminae.com)
Assistant Professor, Medical Genetics, University of British Columbia
Heart and Lung Research Institute at St. Paul's Hospital
2Benjamin Good (He's a Creep!)
3Approach
- Bioinformatics is a broad field and suffers SEVERE interoperability problems
- Bioinformaticians tend to be specialists in a particular domain of computational analysis
- As a group, the brains of all bioinformaticians contain all (known) bioinformatics
- Is it possible to extract the knowledge required for interoperability from the brains of bioinformaticians en masse?
4Human Computation (Luis von Ahn)
5 Ontology Spectrum
(Figure: a spectrum running from informal to formal)
- Catalog/ID
- Terms/glossary
- Thesauri: "narrower term" relation
- Informal is-a
- Formal is-a
- Formal instance
- Frames (properties)
- Value Restrs.
- Selected Logical Constraints (disjointness, inverse, ...)
- General Logical constraints
Originally from the AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
6An ontology is a representation of knowledge
Animal
has
Mammal
Hair
Primate
is_a
Lemur
Human
Zombie
eats
Brains
Chips
Shoots
Classes, instances, properties, relationships
7Classes
Animal
Mammal
Hair
Primate
Lemur
Human
Zombie
Brains
Chips
Shoots
8instances
9Properties
has
is_a
eats
10relations
has
is_a
eats
11An ontology is a representation of knowledge
Animal
has
Mammal
Hair
Primate
is_a
Lemur
Human
Zombie
eats
Brains
Chips
Shoots
Classes, instances, properties, relationships
12Web Service?
- A software tool that is accessible over the Web
- Web Services are intended to be accessed by
machines, not people.
13 Interoperability?
- The ability of two Web Services to exchange information, and use that information correctly
- This generally requires Semantics in the form of Ontologies
14Mmmm Brains!!
- BioMoby
- Eating brains to enable Web Service
Interoperability
15What does BioMoby do?
16The BioMoby Plan
- Create an ontology of bioinformatics data-types
- Define an ontology of bioinformatics operations
- Open these ontologies for community input
- Define Web Services with respect to these two ontologies
- A Machine can find an appropriate service
- A Machine can execute that service unattended
- Ontology is community-extensible
17Overview of BioMoby Semantic Interoperability
18Why couldn't we do this before?
19Interoperability is HARD!
20Interoperability through Human Computation
- BioMoby Data Type Ontology: an explicit list of all biological data-types, and the relationships between them.
- Ontology built, brain by brain, by informaticians!
- We achieve interoperability simply because informaticians donate their brain-power
- HUMAN COMPUTATION
21A portion of the BioMoby Ontology built from
the brains of the community!
22So what can I do with it?
39Analytical workflow Discovery
- No explicit coordination between providers
- Run-time discovery of appropriate tools
- Automated execution of those tools
- The machine understands the data you have
in-hand, and assists you in choosing the next
step in your analysis.
40Interoperability through Human Computation
- Individuals contributed their knowledge about bioinformatics data-types to a central ontology
- Their combined knowledge enabled the construction of an interoperable framework
41 42Usage Statistics
- 15 Nations
- > 60 independent institutions
- > 1600 interoperable Bioinformatics Resources
- 500,000 requests for brokering each month
43What have we learned?
- We can consume
- the brains of a large community
- to generate something complex, yet organized
44Open Kimono
- The BioMoby ontology is actually quite messy
- Communal brains can build useful ontologies, but the problem is...
45Ontologies are HARD!
46How are ontologies usually constructed?
47By small, hard-working, dedicated groups with
lots of money!
- Gene Ontology code
- Curated: 5 full-time staff
- $25 Million (Lewis, S., personal communication)
- NCI Metathesaurus code
- Curated: 12 full-time staff
- $15 Million (Peter K., estimate)
- Health Level 7 (HL7)
- Curated
- Lots! Some claim as much as $15 Billion (Smith, Barry, KBB Workshop, Montreal, 2005)
48- To build the global Semantic Web for Systems Biology we need to encode knowledge from EVERY domain of biology, from barley root apex structure and function to HIV clinical-trials outcomes, and this knowledge is constantly changing!
- At > $15M each, can we afford the Semantic Web???
49Mmmm Need MORE Brains!!
50Dr. Bruce McManus with a human heart in his hands. He knows his hearts, but he doesn't know how to build an ontology
51What we need
52The Problem
53The Solution?
54The Solution?
55So how do we do it?
56Remember what we learned from Moby: communities CAN build ontologies!
57Building Systems Biology Ontologies through Human Computation
58iCAPTURer
- Benjamin Good
- Ph.D. Student, UBC Bioinformatics
- Genome BC Better Biomarkers in Transplantation project, St. Paul's Hospital iCAPTURE Centre
59Old Way
- A knowledge engineer (KE) drills the brain of one or a very few experts.
- Painful, expensive, and time-consuming
60New Way? The iCAPTURer
- KE creates a clever interface
- No direct interaction with expert
- Thousands of experts
- Cheap Cheap Cheap!
61iCAPTURer 1.0
- Go to a scientific conference
- Text-mine conference abstracts
- Auto-extract concepts
- Put concepts into a series of question templates
- A web interface presents questions about these concepts to conference attendees
- Give points for every question they answer
- Give a prize to the highest point winner
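The template-and-points loop described on this slide can be sketched in a few lines of Python. This is only an illustration: the template wording is borrowed from the question types listed in the talk, while the function names, the attendee names, and the one-point-per-answer scoring are assumptions, not the actual iCAPTURer implementation.

```python
from collections import Counter

# Hypothetical question templates, modelled on the question types in the talk.
TEMPLATES = [
    "Is '{term}' a meaningful term?",
    "What is a synonym for '{term}'?",
]

def make_questions(concepts):
    """Fill every template with every auto-extracted concept."""
    return [t.format(term=c) for c in concepts for t in TEMPLATES]

def leaderboard(answers):
    """answers: iterable of (attendee, question, response) tuples.

    Assumption: each answered question is worth exactly one point.
    Returns (attendee, points) pairs, highest score first.
    """
    points = Counter(attendee for attendee, _question, _response in answers)
    return points.most_common()

# Concepts here are typed in by hand; the talk extracts them by text-mining abstracts.
questions = make_questions(["cardiac myocyte", "STEMI"])
answers = [
    ("alice", questions[0], "yes"),
    ("bob", questions[0], "yes"),
    ("alice", questions[1], "heart muscle cell"),
]
winner, score = leaderboard(answers)[0]
print(winner, score)
```

The prize goes to whoever tops the leaderboard, which is what turns answering questions into a competition.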
62Results
- Is _____ a meaningful term?
- Yes, No, I don't know buttons
- What is a synonym for ______?
- Text entry box
- Where does _____ fit in the following tree of related terms?
- Clickable tree
63Observations
- Yes/No questions work well
- Text entry is less effective
- Adding to a tree is a disaster!
- Competition is a great motivator for human computation!
68COST?
< $15,000,000
69iCAPTURer 1.5
70- Start with a hypothetical concept tree
- Put concept-concept relations into a series of true/false questions
- Make a web interface to ask questions
- If a relationship is false, then re-start at the root of the concept tree
- Give points for every question they answer
- Give a prize to the highest point winner
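One way to read the steps above is as a review pass over the hypothesised tree, with one true/false question per is_a link. The sketch below is an interpretation, not the published iCAPTURer 1.5 code: `review_tree`, `question_for`, the toy tree, and the rule that a rejected concept is simply re-attached at the root are all assumptions made for illustration.

```python
def question_for(child, parent):
    """Phrase a hypothesised is_a link as a true/false question."""
    return f"I've heard that {child} is a type of {parent}. Is this true?"

def review_tree(parent_of, root, answer):
    """Ask one question per hypothesised link.

    parent_of: dict mapping each concept to its hypothesised parent.
    answer:    callable (child, parent) -> bool, standing in for the expert.
    Assumption: a link judged false sends the concept back to the root,
    where its placement would start again.
    Returns the reviewed tree and the points earned (one per question).
    """
    reviewed, points = {}, 0
    for child, parent in parent_of.items():
        points += 1
        reviewed[child] = parent if answer(child, parent) else root
    return reviewed, points

# Toy hypothesis containing one deliberately wrong link (neuron under cardiac cell).
hypothesis = {
    "cardiac cell": "cell",
    "cardiac myocyte": "cardiac cell",
    "neuron": "cardiac cell",
}
expert_knowledge = {("cardiac cell", "cell"), ("cardiac myocyte", "cardiac cell")}

reviewed, points = review_tree(
    hypothesis, root="cell", answer=lambda c, p: (c, p) in expert_knowledge
)
print(question_for("neuron", "cardiac cell"))
print(reviewed["neuron"], points)
```

Restricting the expert to yes/no answers matches the earlier observation that yes/no questions work well while free tree editing does not.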
71 Chatterbot
- I've heard that a cardiac myocyte is a type of cardiac cell. Is this true?
- I've heard that STEMI means the same thing as ST Elevated Myocardial Infarction. Is that nonsense, or is it correct?
- How do you feel about your mother?
72Results
- Knowledge capture in 3 days
- > 11,000 Concepts
73COST
0
74Full details of this experiment are available
in Proceedings of the Pacific Symposium on
Biocomputing, 2006
75Ontology Quality?
76Potential Ontology Evaluation Metrics
Domain independent:
- Philosophical desiderata (manual, subjective)
- Graphical structure (auto, questionable value)
- Satisfiability (auto, useful, not enough)
Domain specific:
- Fit to text (auto, dependent on NLP)
- Similarity to a gold standard (auto/manual, gold standard must exist!)
- Task-based (optimal! auto/manual, but not generalizable)
77Good???
78What do we mean by Good?
"Ontology construction is motivated by the goal of alignment not on concepts but on the universals in reality and thereby also on the corresponding instances" - Barry Smith
Reality should be the benchmark for the goodness of an ontology
79ontology evaluation based on referents in
reality
80Chosen Philosophical Principle: Epistemology Precedes Ontology
- A Class should refer to an invariant pattern of properties common among all its instances
- Mammals have mammary glands and hair
- Humans are an instance of the class Mammal
- Therefore:
- If class-instances are mapped into an ontology
- Each instance has properties or qualities
- These properties or qualities SHOULD segregate into different classes if the ontology is any good
81Philosophical Desiderata
- Non-vagueness
- at least one instance can exist with the Class pattern
- Vague class: mammalian cell wall
- Non-ambiguity
- no more than one common pattern per Class
- Ambiguous class: cell (e.g. cell phone, jail cell)
- Non-redundancy
- within the same level of granularity, no other class refers to the same common properties
- Redundant classes: human, Homo sapiens
Cimino, J, 1998
82Realist Evaluation Step 1: Table of Instance-Properties
Instance  Char1  Char2  Class B?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...
(Figure: classes A, B and C, with instances I.1, I.2, I.3 and I.4)
(Test one class at a time)
83Realist Evaluation Step 2: Machine Learning
Instance  Char1  Char2  Class B?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...
Pattern: If Char1 = Y Then Class X
Class B score for this pattern: 100%
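To make the pattern search concrete, here is a minimal single-attribute rule search over the four example rows from the slide. It is a plain-Python stand-in written for illustration, not the WEKA rule learners the talk goes on to name; `best_rule` and the accuracy-style score are assumptions.

```python
# The four example rows from the slide; in_class encodes the "Class B?" column.
ROWS = [
    {"id": "I.1", "Char1": "Y", "Char2": "N", "in_class": True},
    {"id": "I.2", "Char1": "Y", "Char2": "Y", "in_class": True},
    {"id": "I.3", "Char1": "N", "Char2": "N", "in_class": False},
    {"id": "I.4", "Char1": "N", "Char2": "Y", "in_class": False},
]

def best_rule(rows, features):
    """Try every rule 'if feature == value then in class' and keep the best.

    Score (an assumption of this sketch): the fraction of instances whose
    class membership the rule predicts correctly.
    """
    best = None
    for feature in features:
        for value in sorted({row[feature] for row in rows}):
            hits = sum((row[feature] == value) == row["in_class"] for row in rows)
            score = hits / len(rows)
            if best is None or score > best[2]:
                best = (feature, value, score)
    return best

feature, value, score = best_rule(ROWS, ["Char1", "Char2"])
print(f"If {feature} = {value} then Class B (score {score:.0%})")
```

On these rows Char1 separates the class perfectly while Char2 does no better than chance, which is the kind of clean segregation the philosophical desiderata ask for.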
84WEKA
- Produced by the University of Waikato in New Zealand
- An open source library containing implementations of hundreds of machine learning algorithms
- (rule learners, LDA, SVM, neural networks...)
85Realist Evaluation
Instance  Char1  Char2  Class 1?
I.1       Y      N      Y
I.2       Y      Y      Y
I.3       N      N      N
I.4       N      Y      N
...       ...    ...    ...
Class Score for Each Class (values shown on the slide: 0.35, 0.1, 0.92)
86Realist Evaluation - positive control
- Identify an ontology that already has logical constraints on the properties of its classes.
- Assemble instances that have those properties
- Classify the instances with a reasoner
- Remove class restrictions from the ontology, but keep instances assigned to their classes
- Look for patterns of instance properties
- If successful, patterns should be detected
- The higher the pattern score, the "gooder" the ontology is
87Positive Control: Phosphabase
- An ontology describing different classes of phosphatase enzymes.
- Given the domain composition of a protein, phosphatase class can be inferred automatically.
Wolstencroft et al. (2006) Protein classification using ontology classification. Bioinformatics, Vol. 22 no. 14, pages e530-e538
88Remove the Logical Rules
- Remove the defining rules for each class
- Maintain the classified instances
- Execute the realist evaluation
- Can we re-discover the patterns that the logical
class-rules used to dictate?
89Realist Evaluation: Positive Control
- 25 classes from Phosphabase tested on 700 simulated protein instances
- 21: pattern correctly identified for 100% of instances
- For 4 others, patterns identified covering 99%, 92%, 82%, 82% of instances respectively.
90Realist Evaluation: Positive Control
- So the Phosphabase ontology is good
- We can detect strong patterns of properties in its instances that follow the philosophical desiderata
- This is unsurprising, since we knew that it was good in the first place
91Evaluation of Gene Ontology is ongoing
92Interesting side effect
- Class-defining rules are generated by the realist evaluation
- Most existing bio-ontologies lack formal class-definitions
- This evaluation could be used to create such rules -> automatic classifiers
- Can also detect what TYPE of property is best classified by current bio-ontologies
93Is Realist Evaluation a Valid metric?
- The realist evaluation measures the success of an ontology in classifying a specific set of properties
- We claim that this is a metric relating to the quality of that ontology
- Is this metric any better than other metrics, like graph complexity or fit-to-text?
94Evaluating metrics
95OntoLoki: Making mischief with Ontologies
- Take an ontology that we claim is good
- Make it worse by mischievously adding changes
- Measure the degree of mischief
- Run the evaluation metric of interest
- -> Metric score should correlate with the amount of mischief added
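The OntoLoki loop above can be sketched as: perturb a trusted ontology by a known amount, score each perturbed copy with the metric under test, then check that the scores track the noise. Everything in this Python sketch is illustrative: the toy class assignments, the `add_mischief` perturbation, and the use of similarity to the original as the metric are assumptions, not the OntoLoki implementation.

```python
import random

def add_mischief(assignments, classes, fraction, rng):
    """Copy the ontology and move `fraction` of its instances to a wrong class."""
    noisy = dict(assignments)
    victims = rng.sample(sorted(noisy), round(fraction * len(noisy)))
    for instance in victims:
        noisy[instance] = rng.choice([c for c in classes if c != noisy[instance]])
    return noisy

def gold_similarity(assignments, gold):
    """Stand-in metric under test: share of instances still in their original class."""
    return sum(assignments[i] == gold[i] for i in gold) / len(gold)

def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid extra dependencies."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

classes = ["A", "B", "C"]
gold = {f"I.{n}": classes[n % 3] for n in range(20)}  # the ontology we claim is good
rng = random.Random(0)                                # fixed seed for repeatability

noise_levels = [0.0, 0.25, 0.5, 0.75, 1.0]
scores = [gold_similarity(add_mischief(gold, classes, f, rng), gold) for f in noise_levels]
r = pearson(noise_levels, scores)
print(scores, round(r, 3))
```

A metric worth keeping should show a strong negative correlation here, since its score is meant to fall as mischief rises; a metric whose score barely moves is not measuring quality.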
96Comparison of ontology quality metrics
(Plot: y-axis Measured Ontology Quality; x-axis Amount of noise added, with ontology quality decreasing)
97Is Reality Evaluation a good metric?
98Let's OntoLoki it to find out!
99OntoLoki test of Realist Evaluation Metric
(Plot: y-axis Average Class Score; x-axis Noise Added, a measure of nodes affected)
100Conclusion
- Human computation can collect significant amounts of knowledge in an organized way
- OntoLoki seems to be effective at evaluating the evaluation metrics
- Reality evaluation is an interesting new metric for testing ontologies
101Subjective iCAPTURer Observations
- Humans had an EXTREMELY difficult time classifying concepts into pre-existing categories
- Humans had an EXTREMELY difficult time defining new categories and placing them into the existing classification system
102Classification is HARD!
103Abandoning Classification
104(briefly)
105An ontology is a representation of knowledge
Animal
has
Mammal
Hair
Primate
is_a
Lemur
Human
Gorilla
Classes, instances, properties, relationships
has_size
Big
Medium
Small
106AN ontology is ONE representation of knowledge
Animal
has
Mammal
Hair
Primate
is_a
Lemur
Human
Gorilla
Ontology of Anatomy
has_size
Big
Medium
Small
107AN ontology is ONE representation of knowledge
Animal
lives
African_animal
Africa
Southern_African_animal
is_a
Ontology of Habitat
Also might want: Odour, digits, bone density, friendliness, cuteness...
Aquatic
plains
mountain
108Clay Shirky: Ontology is Overrated
- Attempts to predict the future
- "Soviet Union" used to be a category in the Library of Congress
- Attempts mind-reading
- Size, location, odour... Authors must predict what users are interested in
- Great minds don't think alike...
- No two people are likely to create the same ontology
http://www.shirky.com/writings/ontology_overrated.html
109Categories
Properties
110BRAINS!! MORE BRAINS!!
- Mass Collaborative Tagging
111Mass Open Social Tagging
- A rapidly growing trend on the Web
- Unstructured
- Mass-collaboration
- Anyone can say anything about anything using any
words they wish
112Connotea: Scientific Tagging (Connotea is a product of Nature Publishing Group)
113Connotea Growth
114Tagging is EASY!
115The Tagged World
- Tagging is easy!
- Tagging costs nothing
- Tagging empowers all viewpoints
- Tagging is happening!!!!!!
116Lexical Comparison of Tagging with Formal Indexing Systems and Ontologies
117Ontology (FMA)
118Ontology (GO Molecular Function)
119Ontology (GO Biological Process)
120Tagging (Bibsonomy)
121Tagging (CiteULike)
122Tagging (Connotea)
123Ontologies and Folksonomies are fundamentally
different!
124Problem??
- Folksonomies and ontologies are fundamentally different!
- It may not be possible to derive one from the other accurately
- Nevertheless, we would like to take advantage of tagging behaviour while gaining the power of controlled vocabularies/Ontologies
125E.D.: The Entity Describer
126Connotea tagging
User types in all tags
Type-ahead displays previously used tags
127Connotea E.D. Tagging
128Leveraging Tagging?
- Tagging effectively assigns properties to entities
- ED Tagging constrains those properties to a controlled vocabulary or ontology
- Can we discover patterns in those properties that indicate a natural classification system?
- Can a realist-evaluation generate logical rules that define classes based on patterns of tags?
129Final Thoughts
- Ontologies are important, but hard to build
- iCAPTURer: formal, template-based, cost-free consumption of biologists' brains seems to work!
- Informal annotation (tagging) is cheap, easy, and scalable,
- and is HAPPENING
- Can we leverage tagging to create ontology-like structures? Maybe... Maybe not!
130My journey back to Web Services
131Why do I care about WS so passionately?
133The Deep Web
- All the data and knowledge only accessible through Web Forms
- Estimated to be orders of magnitude greater than the surface Web
- 91,000 Terabytes in the Deep Web
- 167 Terabytes in the Surface Web
- Much of the Deep Web CANNOT be represented on the Semantic Web since it DOES NOT EXIST until the Web Form is accessed
134Moby 2.0 and CardioSHARE: Merging the Deep Web and the Semantic Web
135What Web Services do
BLAST SERVICE
Sequence Data
Blast Hit
136What BioMoby does
??
Sequence Data
Want Blast
MOBY BLAST SERVICE
137The implied relationship between input and output
Sequence Data
Blast Hit
givesBlastResult
Not Biologically Meaningful
138The implied biological relationship between input
and output
hasHomologyTo
Sequence Data
Blast Hit
looks a lot like the RDF statement
139To merge Web Services and the Semantic Web: Simply assert the relationship and let Moby do the rest!
140Start with a partial Triple
URI rdf:type Sequence
hasHomologyTo
141What Moby 2.0 Does
??
URI rdf:type Sequence
hasHomologyTo
MOBY BLAST SERVICE
Moby 2.0: hasHomologyTo property provided by BLAST services
142Moby 2.0 Query
Consume: Sequence Data
Provide: hasHomologyTo Property Attached to other Sequence Data
143Moby 2.0 extends SPARQL
- SPARQL queries contain concepts and relationships of interest
- Map RDF predicates onto Moby services capable of generating them
- Registry query: What Moby service consumes the subject and generates the predicate relationship type?
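The registry question on this slide amounts to a lookup keyed on the subject's type and the requested predicate. The sketch below shows that lookup in Python against an in-memory list; the `Service` record, the second registry entry, and `services_for` are hypothetical names for illustration and are not the BioMoby registry API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Service:
    name: str
    consumes: str   # class of the subject the service accepts
    generates: str  # predicate (relationship type) the service can produce

# Hypothetical registry; only the BLAST entry is taken from the slides.
REGISTRY = [
    Service("MOBY BLAST SERVICE", "Sequence", "hasHomologyTo"),
    Service("hypothetical annotation service", "Sequence", "hasFunction"),
]

def services_for(subject_type, predicate, registry=REGISTRY):
    """Which services consume this subject type and generate this predicate?"""
    return [
        s.name
        for s in registry
        if s.consumes == subject_type and s.generates == predicate
    ]

# Partial triple from the slides: <URI> rdf:type Sequence, with hasHomologyTo unfilled.
partial = {"subject_type": "Sequence", "predicate": "hasHomologyTo"}
matches = services_for(partial["subject_type"], partial["predicate"])
print(matches)
```

A query engine built this way would invoke each matching service to fill in the missing object of the triple, which is how data behind Web forms can answer a query without being stored as RDF in advance.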
144But wait, there's more!
145CardioSHARE: Exploit knowledge in OWL-DL ontologies to enhance query
146CardioSHARE: Exploit knowledge in OWL-DL ontologies to enhance query
This SPARQL query could be posed on a database of RAW, UNANNOTATED Protein sequences, and be answered by Moby 2.0
147What do Moby 2.0 and CardioSHARE achieve?
- Makes the Deep Web transparently accessible as if it were a Semantic Web Resource
- Allows SPARQL to do truly semantic queries!
- Reduces the requirement of Biologists to know how/where to get their data of interest
- Simplifies construction of complex analytical pipelines by automating much of the discovery/execution tasking
148 Ontology Spectrum
(Figure: a spectrum running from informal to formal)
- Catalog/ID
- Terms/glossary
- Thesauri: "narrower term" relation
- Informal is-a
- Formal is-a
- Formal instance
- Frames (properties)
- Value Restrs.
- Selected Logical Constraints (disjointness, inverse, ...)
- General Logical constraints
Originally from the AAAI 1999 Ontologies Panel by Gruninger, Lehmann, McGuinness, Uschold, Welty; updated by McGuinness. Description in www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-abstract.html
149Fin