Title: Ontology Engineering: Tools and Methodologies
1Ontology Engineering: Tools and Methodologies
- Ian Horrocks
- <horrocks_at_cs.man.ac.uk>
- Information Management Group
- School of Computer Science
- University of Manchester
2Tutorial Resources
- http://www.cs.man.ac.uk/~horrocks/nsd07/
3Ontologies
4Ontology Origins and History
- In Philosophy, fundamental branch of metaphysics
- Studies "being" or "existence" and their basic categories
- Aims to find out what entities and types of entities exist
5Ontology in Information Science
- An ontology is an engineering artefact consisting of:
- A vocabulary used to describe (a particular view of) some domain
- An explicit specification of the intended meaning of the vocabulary
- Often includes classification based information
- Constraints capturing background knowledge about the domain
- Ideally, an ontology should:
- Capture a shared understanding of a domain of interest
- Provide a formal and machine manipulable model
6Example Ontology (Protégé)
7The Web Ontology Language OWL
8OWL History
- Semantic Web led to requirement for a "web ontology language"
- W3C set up Web-Ontology (WebOnt) Working Group
- WebOnt developed OWL language
- OWL based on earlier languages RDF, OIL and DAML+OIL
- OWL now a W3C recommendation (i.e., a standard)
- OWL is a family of 3 languages: OWL Lite, OWL DL and OWL Full
- OIL, DAML+OIL and OWL (DL & Lite) based on Description Logics
- Many OWL DL/Lite tools & ontologies
- Relatively few OWL Full tools or ontologies
9What Are Description Logics?
- A family of logic based Knowledge Representation formalisms
- Descendants of semantic networks and KL-ONE
- Describe domain in terms of concepts (classes), roles (properties, relationships) and individuals
- Operators allow for composition of complex concepts
- Names can be given to complex concepts, e.g.
HappyParent ≡ Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic)
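To make the composition of concepts concrete, here is a minimal sketch that evaluates a concept expression over a small finite interpretation. The tuple encoding, the `extension` function and all individuals (john, mary, sue, bob) are assumptions made for illustration; this is not how a DL reasoner works, only a direct reading of the model theoretic semantics.

```python
# A small finite interpretation; every name here is an illustrative assumption.
DOMAIN = {"john", "mary", "sue", "bob"}
CONCEPTS = {
    "Parent": {"john", "bob"},
    "Intelligent": {"mary"},
    "Athletic": {"sue"},
}
ROLES = {"hasChild": {("john", "mary"), ("john", "sue"), ("bob", "john")}}

# Concept expressions are nested tuples:
#   "A"                 atomic concept
#   ("and", C, D)       C ⊓ D
#   ("or", C, D)        C ⊔ D
#   ("some", r, C)      ∃r.C
#   ("all", r, C)       ∀r.C


def extension(expr):
    """Return the set of domain elements that belong to the concept."""
    if isinstance(expr, str):
        return set(CONCEPTS.get(expr, set()))
    op = expr[0]
    if op == "and":
        return extension(expr[1]) & extension(expr[2])
    if op == "or":
        return extension(expr[1]) | extension(expr[2])
    if op in ("some", "all"):
        role = ROLES.get(expr[1], set())
        filler = extension(expr[2])
        result = set()
        for x in DOMAIN:
            successors = {y for (s, y) in role if s == x}
            if op == "some" and successors & filler:
                result.add(x)
            if op == "all" and successors <= filler:
                result.add(x)
        return result
    raise ValueError(f"unknown constructor: {op!r}")


# HappyParent ≡ Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic)
HAPPY_PARENT = ("and", "Parent",
                ("all", "hasChild", ("or", "Intelligent", "Athletic")))
print(sorted(extension(HAPPY_PARENT)))
```

In this interpretation bob is a Parent but not a HappyParent, because his child john is neither Intelligent nor Athletic.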
10Why (Description) Logic?
- OWL exploits results of 15 years of DL research
- Well defined (model theoretic) semantics [figure: Quillian, 1967]
- Formal properties well understood (complexity, decidability)
- "I can't find an efficient algorithm, but neither can all these famous people." [Garey & Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. Freeman, 1979.]
- Known reasoning algorithms
- Implemented systems (highly optimised), e.g., KAON2
14Why the Strange Names?
- Description Logics are a family of KR formalisms
- Mainly distinguished by available operators
- Available operators indicated by letters in name, e.g.,
- S: basic DL (ALC) plus transitive roles (e.g., ancestor ∈ R+)
- H: role hierarchy (e.g., hasDaughter ⊑ hasChild)
- O: nominals/singleton classes (e.g., {Italy})
- I: inverse roles (e.g., isChildOf ≡ hasChild⁻)
- N: number restrictions (e.g., ≥2hasChild, ≤3hasChild)
- Basic DL + role hierarchy + nominals + inverse + NR = SHOIN
- The basis for OWL-DL
- SHOIN is very expressive, but still decidable (just)
- Decidable ⇒ we can build reliable tools and reasoners
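The naming scheme can be read off mechanically. The sketch below spells out a DL name letter by letter, using only the five letters listed on this slide; the `FEATURES` table and the `describe` function are hypothetical helpers, and letters the slide does not cover are reported as such rather than guessed.

```python
# Letter meanings as given on the slide; nothing beyond these is assumed.
FEATURES = {
    "S": "basic DL (ALC) plus transitive roles",
    "H": "role hierarchy",
    "O": "nominals/singleton classes",
    "I": "inverse roles",
    "N": "number restrictions",
}


def describe(name):
    """Pair each letter of a DL name with the operator it stands for."""
    return [(letter, FEATURES.get(letter, "not listed on the slide"))
            for letter in name]


for letter, meaning in describe("SHOIN"):
    print(f"{letter}: {meaning}")
```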
15Why (Description) Logic?
- Foundational research was crucial to design of OWL
- Informed Working Group decisions at every stage, e.g.
- "Why not extend the language with feature x, which is clearly harmless?"
- "Adding x would lead to undecidability - see proof in ..."
16Class/Concept Constructors
- C is a concept (class); P is a role (property); x is an individual name
- XMLS datatypes as well as classes in ∀P.C and ∃P.C
- Restricted form of DL concrete domains
17Knowledge Base / Ontology Axioms
18Knowledge Base / Ontology
- A TBox is a set of "schema" axioms (sentences), e.g.
- Parent ⊑ Person ⊓ ≥1hasChild,
- HappyParent ≡ Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic)
- An ABox is a set of "data" axioms (ground facts), e.g.
- John:HappyParent,
- John hasChild Mary
- An OWL ontology is just a SHOIN KB
19OWL RDF/XML Exchange Syntax
E.g., Parent ⊓ ∀hasChild.(Intelligent ⊔ Athletic):
<owl:Class>
  <owl:intersectionOf rdf:parseType="Collection">
    <owl:Class rdf:about="Parent"/>
    <owl:Restriction>
      <owl:onProperty rdf:resource="hasChild"/>
      <owl:allValuesFrom>
        <owl:unionOf rdf:parseType="Collection">
          <owl:Class rdf:about="Intelligent"/>
          <owl:Class rdf:about="Athletic"/>
        </owl:unionOf>
      </owl:allValuesFrom>
    </owl:Restriction>
  </owl:intersectionOf>
</owl:Class>
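The correspondence between the exchange syntax and the DL expression can be checked by walking the XML tree. The sketch below uses only Python's standard `xml.etree.ElementTree`; wrapping the fragment in an `rdf:RDF` element with namespace declarations is an assumption needed to make it parseable, and the `to_dl` function is a hypothetical helper that covers just the constructors appearing in this example. It reads the fragment as simplified on the slide and is not a general OWL parser.

```python
import xml.etree.ElementTree as ET

OWL = "http://www.w3.org/2002/07/owl#"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's fragment, wrapped in rdf:RDF so that the prefixes resolve.
DOC = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class>
    <owl:intersectionOf rdf:parseType="Collection">
      <owl:Class rdf:about="Parent"/>
      <owl:Restriction>
        <owl:onProperty rdf:resource="hasChild"/>
        <owl:allValuesFrom>
          <owl:unionOf rdf:parseType="Collection">
            <owl:Class rdf:about="Intelligent"/>
            <owl:Class rdf:about="Athletic"/>
          </owl:unionOf>
        </owl:allValuesFrom>
      </owl:Restriction>
    </owl:intersectionOf>
  </owl:Class>
</rdf:RDF>
"""


def owl(local):
    """Qualified ElementTree tag for a name in the OWL namespace."""
    return "{%s}%s" % (OWL, local)


def rdf(local):
    """Qualified ElementTree attribute name in the RDF namespace."""
    return "{%s}%s" % (RDF, local)


def to_dl(node):
    """Render a class-expression element in DL notation."""
    if node.tag == owl("Class"):
        about = node.get(rdf("about"))
        if about is not None:
            return about.lstrip("#")
        return to_dl(node[0])  # anonymous class: descend into its constructor
    if node.tag == owl("intersectionOf"):
        return "(" + " ⊓ ".join(to_dl(child) for child in node) + ")"
    if node.tag == owl("unionOf"):
        return "(" + " ⊔ ".join(to_dl(child) for child in node) + ")"
    if node.tag == owl("Restriction"):
        prop = node.find(owl("onProperty")).get(rdf("resource")).lstrip("#")
        filler = node.find(owl("allValuesFrom"))
        return "∀" + prop + "." + to_dl(filler[0])
    raise ValueError("unsupported element: " + node.tag)


ROOT = ET.fromstring(DOC.strip())
print(to_dl(ROOT[0]))
```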
20Ontology Reasoning
21Why Ontology Reasoning?
- Given key role of ontologies in many applications, it is essential to provide tools and services to help users:
- Design and maintain high quality ontologies, e.g.
- Meaningful: all named classes can have instances
- Correct: captures intuitions of domain experts
- Minimally redundant: no unintended synonyms (e.g., Banana split vs. Banana sundae?)
- Answer queries, e.g.
- Find more general/specific classes
- Retrieve individuals/tuples matching a given query
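The two query types can be illustrated on an already classified hierarchy. The sketch below assumes the subsumption links have been computed beforehand (by a reasoner, in practice); the class names, individuals and helper functions are hypothetical and only show what "more general/specific classes" and "retrieve individuals" mean operationally.

```python
# Direct superclass links and asserted types; all names are hypothetical.
PARENTS = {
    "Dog": {"Mammal"},
    "Cat": {"Mammal"},
    "Mammal": {"Animal"},
    "Animal": set(),
}
TYPES = {"rex": "Dog", "tom": "Cat"}


def superclasses(cls):
    """All classes more general than cls (transitive closure of PARENTS)."""
    seen, stack = set(), list(PARENTS.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return seen


def subclasses(cls):
    """All classes more specific than cls."""
    return {c for c in PARENTS if cls in superclasses(c)}


def instances(cls):
    """Individuals whose asserted type is cls or one of its subclasses."""
    return {i for i, t in TYPES.items() if t == cls or cls in superclasses(t)}


print(sorted(superclasses("Dog")))
print(sorted(subclasses("Mammal")))
print(sorted(instances("Animal")))
```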
25Ontology Applications
26e-Science
- E.g., Open Biomedical Ontologies Consortium (GO, MGED)
- Used, e.g., for "in silico" investigations relating theory and data
- E.g., relating data on phosphatases to (model of) biological knowledge
27Medicine
- Building/maintaining terminologies such as Snomed, NCI, Galen and FMA
- Used, e.g., for semi-automated annotation of MRI images
28Organising Complex Information
- E.g., UN-FAO, NASA, Ordnance Survey, General Motors, Lockheed Martin, ...
30OWL: Experiences and Directions
- Workshop at ESWC'07 (Innsbruck, Austria, 6-7 June)
- Brings together users, implementors and researchers
- Submissions include:
- Enterprise Integration (Mitre)
- Product development (Lockheed Martin)
- Role based access control (NASA)
- Healthcare (SNOMED)
- Agriculture and fisheries (UN Food & Agriculture Organization)
- Oral Medicine (Chalmers)
31Ontology Engineering
32Ontology Engineering Tasks
- Typical tasks in Ontology Engineering:
- author concept descriptions
- refine the ontology
- manage errors
- integrate different ontologies
- (partially) reuse ontologies
- These tasks are highly challenging; need for:
- tool & infrastructure support
- design methodologies
33Tools and Infrastructure
- Editors/environments
- Oiled, Protégé, Swoop, TopBraid Composer, Construct, Ontotrack, ...
- Reasoning systems
- Cerebra, FaCT++, Kaon2, Pellet, Racer, ...
- Design methodologies
- Modularity, foundational ontologies, etc.
36Development & Maintenance
37Development Environments
- Most widely used free to download tools are:
- Protégé (Stanford / Manchester); be sure to get v4.x
- Swoop (UMD / Clark & Parsia)
- Commercial tools include:
- TopBraid, RacerPro, ...
- Facilities typically include:
- Range of display modes and editing features
- Visualisation
- Consistency and subsumption checking
- Useful extras may include:
- Debugging and explanation
- Repair
- Integration and/or partitioning
http://code.google.com/p/swoop/
http://protege.stanford.edu/
38Demo Ontologies
- GALEN
- http://www.cs.man.ac.uk/~horrocks/OWL/Ontologies/galen.owl
- NCI
- http://www.mindswap.org/2003/CancerOntology
- Tambis
- http://www.cs.man.ac.uk/~horrocks/OWL/Ontologies/tambis.owl
39GALEN
- Ontology about medical terms and surgical procedures.
- Work started in the 90s within the OpenGALEN project.
- Main applications:
- Integration of clinical records, and
- decision support.
- GALEN
- is very large (35,000 concepts),
- is fairly expressive (SHIF description logic),
- has not been classified yet by any DL reasoner
- We will look at a smaller version, which
- is still large (3,000 concepts),
- is about as expressive as full GALEN,
- was first classified by the FaCT system.
40GALEN: The Ontology at a Glance
- Size
- 3,000 classes
- 500 object properties
- no individuals or datatypes
- Expressivity
- 350 General Concept Inclusion Axioms (GCIs).
- Concept constructors
- Conjunction (intersectionOf)
- Existential restrictions (someValuesFrom)
- 150 functional properties
- 26 transitive properties
41GALEN: The (Unclassified) Hierarchies
- The class hierarchy
- Number of subsumption relations: 1,978
- Maximum depth of the tree: 13
- No multiple inheritance
- The property hierarchy
- 4 properties with multiple inheritance
42GALEN: Concept definitions and GCIs
- Concept definition
- Axiom of the form A ≡ C with
- A a concept name
- C a (possibly complex) concept
- A definition assigns a name A to a complex concept C
- Some examples:
- LungPathology ≡ PathologicalCondition ⊓ ∃locativeAttribute.Lung
- RenalTransplant ≡ Transplanting ⊓ ∃actsOn.Kidney
43GALEN: Concept definitions and GCIs
- Inclusion axioms
- Axioms of the form A ⊑ C
- A is a concept name
- C is a possibly complex concept
- Represent an incomplete (partial) definition
- Examples:
- XRayMachine ⊑ ImagingDevice
- Candida ⊑ Fungus ⊓ ∃hasFunction.AerobicMetabolicProcess
- In GALEN, some of these can be very complex
- check out the definitions of Knee Joint and Kidney!
44GALEN: Concept definitions and GCIs
- General Concept Inclusion Axioms (GCIs)
- Axioms of the form C ⊑ D
- C, D can be complex
- May describe general (background) knowledge about the ontology
- Examples:
- Secretion ⊓ ∃actsSpecificallyOn.Leucocidin ⊑ ∃isFunctionOf.StraphilococcusAureus
- Transport ⊓ ∃actsOn.Glucose ⊓ ∃carriesFrom.Blood ⊑ ∃carriesTo.Cell
45Classifying GALEN
- Ontology statistics (revisited)
- Number of class subsumption relations: 6,729
- 1,978 of which are told and the rest inferred
- Maximum depth of the class tree: 15
- As opposed to 13 in the case of the unclassified tree
- Classes with multiple inheritance: 408
- All multiple inheritance relations have been inferred!
- This was intended in the design of GALEN
- Maximum depth of the property tree: 9
- No change with respect to the told tree
- Properties with multiple inheritance: 4
- Again, no change with respect to the told tree
- Reasoning is mostly performed on classes and not on properties
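Statistics of this kind are straightforward to compute once the hierarchy is available as direct superclass links. The following sketch compares a told and an inferred hierarchy; the toy class names and the `hierarchy_stats` helper are assumptions for illustration (they are not GALEN data), depth is counted with a root at depth 1, and the hierarchy is assumed to be acyclic.

```python
def hierarchy_stats(parents):
    """Summarise a hierarchy given as {class: set of direct superclasses}."""
    depth_cache = {}

    def depth(cls):
        # Depth of a root is 1; otherwise one more than the deepest parent.
        if cls not in depth_cache:
            supers = parents.get(cls, set())
            depth_cache[cls] = 1 + max((depth(p) for p in supers), default=0)
        return depth_cache[cls]

    return {
        "subsumptions": sum(len(supers) for supers in parents.values()),
        "max_depth": max((depth(c) for c in parents), default=0),
        "multiple_inheritance": sum(
            1 for supers in parents.values() if len(supers) > 1),
    }


# Toy example: the told hierarchy is a flat tree ...
TOLD = {
    "Top": set(),
    "Process": {"Top"},
    "Pathology": {"Top"},
    "PathologicalProcess": {"Top"},
}
# ... and classification moves one class under two inferred parents.
INFERRED = dict(TOLD, PathologicalProcess={"Process", "Pathology"})

print(hierarchy_stats(TOLD))
print(hierarchy_stats(INFERRED))
```

As on the slide, the multiple inheritance in the toy example appears only after classification.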
46Modeling Choices
- The upper part
- Composed of the domain-independent concepts and roles.
- Examples:
- TopCategory, DomainCategory, GeneralisedStructure
- Shallowly defined (mostly a taxonomy)
- The domain specific part
- Examples:
- Plant, LungPathology, ...
- Richly defined
- Much more than just a taxonomy!
47Inferred Knowledge
- A trivial subsumption
- Why is PathologicalCondition a subclass of DomainCategory?
- Simply look at the definition of PathologicalCondition!
- Another example
- Why is PathologicalBehavior a subclass of PathologicalCondition?
- Look at the definition of both classes
- Notice that Behavior is a subclass of DomainCategory
- A non-trivial subsumption
- Why is AchalasiaProcesses a PathologicalBodyProcesses?
48Classifying GALEN
- Simple and multiple inheritance
- Focus, for example, on PathologicalBodyProcess
- Navigate to its super-classes
- Visualisation can be useful
- In Swoop we can Fly the mother ship!
49The NCI Ontology
- Huge bio-medical ontology describing the Cancer domain
- Maintained by dozens of domain experts
- Contains information about:
- genes,
- diseases,
- drugs,
- research institutions, ...
- All with a cancer-centric focus
50NCI: The Ontology at a Glance
- Size
- 30,000 classes
- 70 object properties
- no individuals or datatypes
- Expressivity
- Concept constructors
- Conjunction (intersectionOf)
- Existential restrictions (someValuesFrom)
- Axioms
- Definitions (no GCIs)
- Domain and range of properties
51NCI: The (Unclassified) Hierarchies
- The class hierarchy
- Number of subsumption relations: 103,232
- Maximum depth of the tree: 19
- Classes with multiple inheritance: 4,636
- Browse through it!
- The property hierarchy
- No properties with multiple inheritance
- Browse through it!
52Axioms in NCI
- Examples:
- Cancer_Gene ⊑ Gene ⊓ ∃hasFunction.Tumoregenesis
- Alzheimer_Disease ⊑ Dementia
- Domain(rAnatomic_Structure_Has_Location) = Anatomy_Kind
- Range(rTechnique_Has_Purpose) = Clinical_Or_Research_Activity_Kind
53The NCI Kinds
- Upper concepts representing the sub-domains of NCI
- Examples:
- Anatomy
- Biological processes
- Chemicals and drugs
- Organisms
- Properties relating the Kinds
54NCI
- Partitioning and "crop-circles" view of the partitioning
- Gives an intuition about the different sub-domains in NCI, which ones are central, and which ones are side domains
55NCI and GALEN
- The domains of NCI and GALEN overlap. Both ontologies define concepts such as:
- Anatomical parts: bone, tissue, etc.
- Diseases
- Organisms, ...
- Example:
- Check out how Femur is defined in NCI and GALEN
- Different modeling decisions and focus of interest
56Tambis
- TAMBIS is a medical ontology constructed during the early days of the Web.
- The intended application was the integrated access to information in a set of databases.
- The OWL version was generated from the old format using a (buggy) script.
57Tambis: The Ontology at a Glance
- Size
- 400 classes
- 100 object properties
- no individuals or datatypes
- Expressivity
- No General Concept Inclusion Axioms.
- Concept constructors
- Conjunction (intersectionOf)
- Disjunction (unionOf)
- Existential restrictions (someValuesFrom)
- Universal restriction (allValuesFrom)
- Cardinality restrictions
- Axioms
- Definitions (complete and partial)
- Transitive, functional, symmetric and inverse
properties
58Tambis: the (unclassified) hierarchies
- Subclass relationships: 226
- No multiple inheritance
- Maximum depth of class tree: 6
- Maximum depth of property tree: 2
59Tambis: Example Axioms
- Tambis uses cardinality restrictions profusely
- See definition of anion
- Use of disjunction
- See definition of atom
- Use of universal restrictions
- See definition of book-title
- Use of complex nested restrictions
- See definition of complement-dna
- See definition of gene
- Disjointness axioms
- See definitions of metal, non-metal and metalloid
60Tambis: Classification
- Subclass relationships: 600
- compared to 226
- Classes with multiple inheritance: 19
- compared to none
- Maximum depth of class tree: 7
- compared to 6
- Maximum depth of property tree: 2
- 144 unsatisfiable concepts!
61Tambis: Unsatisfiable concepts
- Almost half of the concepts in Tambis are unsatisfiable
- The explanations are non-trivial
- E.g., protein-structure and macromolecular-part
- Distinguishing root and derived unsatisfiable classes
- derived unsatisfiable classes are unsatisfiable because they depend on another unsatisfiable concept, e.g.
- definition of Enzyme,
- definition of Binding-site
- root unsatisfiable classes contain an inherent contradiction, e.g.
- definition of Metal,
- definition of Non-metal,
- definition of Metalloid
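The root/derived distinction can be approximated with a simple syntactic check, sketched below. It assumes a reasoner has already produced the set of unsatisfiable classes and that we know which class names each definition mentions; the placeholder class names and the `split_unsatisfiable` helper are hypothetical, and this heuristic is a simplification and not necessarily the procedure implemented in any particular debugging tool.

```python
def split_unsatisfiable(unsat, mentions):
    """Split unsatisfiable classes into (root, derived).

    unsat:    set of class names reported unsatisfiable by a reasoner
    mentions: {class: set of class names used in its definition}
    A class counts as derived if its definition mentions another
    unsatisfiable class, and as root otherwise.
    """
    root = set()
    for cls in unsat:
        other_unsat = (mentions.get(cls, set()) - {cls}) & unsat
        if not other_unsat:
            root.add(cls)
    return root, unsat - root


# Placeholder data, not taken from Tambis.
UNSAT = {"RootA", "RootB", "DerivedC", "DerivedD"}
MENTIONS = {
    "RootA": {"Fine1", "Fine2"},   # contradiction is internal to RootA
    "RootB": {"Fine1"},
    "DerivedC": {"RootA", "Fine2"},
    "DerivedD": {"DerivedC"},
}

ROOT, DERIVED = split_unsatisfiable(UNSAT, MENTIONS)
print(sorted(ROOT), sorted(DERIVED))
```

Repairing the root classes first is the natural order of work, since derived classes may become satisfiable as a side effect.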
62Advanced Issues and Design Patterns
63Qualified Number Restrictions (QCRs)
- Existential restrictions in OWL DL are qualified:
- Person ⊓ ∃hasChild.Male
- Cardinality restrictions can only be qualified with ⊤:
- Person ⊓ ≥2 hasChild
- The lack of QCRs has been identified as a major limitation of OWL, especially in biomedical applications:
- A quadruped is an animal with exactly four parts that are legs
- A medical oversight committee is a committee which consists of at least five members of which two are medical doctors, one is a manager and two are members of the public.
64Qualified Cardinality Restrictions
- Can be approximated using property inclusion and property range.
- Quadruped ≡ Animal ⊓ (=4 hasLeg)
- hasLeg ⊑ hasPart
- Range(hasLeg) = Leg
65Qualified Cardinality Restrictions
- This approximation is unsound in general
- MedicalCommittee ≡ Committee ⊓ (=3 hasMember) ⊓ =1hasMember.MD ⊓ =1hasMember.¬MD
- Approximated by:
- MedicalCommittee ≡ (=3 hasMember) ⊓ =1hasMDMember ⊓ =1hasNotMDMember
- hasMDMember ⊑ hasMember
- hasNotMDMember ⊑ hasMember
- Range(hasMDMember) = MD
- Range(hasNotMDMember) = ¬MD
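A small brute-force check shows where the approximation and the intended meaning part ways. The sketch reads the committee example as "exactly three members, exactly one member in MD and exactly one member in ¬MD"; that reading of the stripped symbols is an assumption, as are the member names. Under the qualified reading no assignment works, because every member is either an MD or not, so the two counts must add up to three. The approximation, by contrast, has a model.

```python
from itertools import product

MEMBERS = ["m1", "m2", "m3"]  # exactly three hasMember fillers


def satisfies_qcr(is_md):
    """Exactly one member is an MD and exactly one member is not an MD."""
    n_md = sum(1 for m in MEMBERS if is_md[m])
    n_not_md = sum(1 for m in MEMBERS if not is_md[m])
    return n_md == 1 and n_not_md == 1


# Try every way of deciding which members are MDs.
QCR_MODELS = [
    dict(zip(MEMBERS, flags))
    for flags in product([True, False], repeat=len(MEMBERS))
    if satisfies_qcr(dict(zip(MEMBERS, flags)))
]
print("models of the qualified definition:", QCR_MODELS)

# A candidate model of the approximation: m3 is linked by hasMember only.
HAS_MEMBER = {"m1", "m2", "m3"}
HAS_MD_MEMBER = {"m1"}
HAS_NOT_MD_MEMBER = {"m2"}
IS_MD = {"m1": True, "m2": False, "m3": True}

APPROX_OK = (
    len(HAS_MEMBER) == 3
    and len(HAS_MD_MEMBER) == 1
    and len(HAS_NOT_MD_MEMBER) == 1
    and HAS_MD_MEMBER <= HAS_MEMBER                      # hasMDMember ⊑ hasMember
    and HAS_NOT_MD_MEMBER <= HAS_MEMBER                  # hasNotMDMember ⊑ hasMember
    and all(IS_MD[m] for m in HAS_MD_MEMBER)             # Range(hasMDMember) = MD
    and all(not IS_MD[m] for m in HAS_NOT_MD_MEMBER)     # Range(hasNotMDMember) = ¬MD
)
print("approximation has a model:", APPROX_OK)
```

So a reasoner working with the approximated axioms would not detect that the intended definition is unsatisfiable.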
66Transitive Propagation of Properties
- In OWL, we can express transitive propagation of a property:
- If Paris is located in France and France is located in Europe, then Paris is located in Europe.
- If the hand is a part of the arm and the arm is part of the human body, then the hand is a part of the human body.
- In OWL, however, we cannot express transitive propagation of a property along a different property:
- If an ulcer is located in the gastric mucosa and the gastric mucosa is a part of the stomach, then the ulcer is located in the stomach
- If a burn is located in the foot and the foot is part of the leg, then the burn is located in the leg.
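The inference that cannot be stated in OWL is easy to state as a rule over explicit facts: located_in(x, y) and part_of(y, z) imply located_in(x, z). The sketch below applies that rule to a fixpoint on the slide's two examples; the `propagate` function and the identifier spellings are assumptions for illustration, and the code works on ground facts only, not on class-level axioms.

```python
def propagate(located_in, part_of):
    """Close located_in under: located_in(x, y), part_of(y, z) => located_in(x, z)."""
    closed = set(located_in)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closed):
            for (whole_of, z) in part_of:
                if y == whole_of and (x, z) not in closed:
                    closed.add((x, z))
                    changed = True
    return closed


LOCATED_IN = {("ulcer", "gastric_mucosa"), ("burn", "foot")}
PART_OF = {("gastric_mucosa", "stomach"), ("foot", "leg")}

for fact in sorted(propagate(LOCATED_IN, PART_OF)):
    print(fact)
```

Note that part_of facts themselves are never turned into located_in facts here; only the location of the ulcer and the burn is propagated.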
67Transitive Propagation of Properties
- Various patterns that approximate transitive propagation have been proposed and used in ontologies.
- Use of the property hierarchy and transitivity:
- Part_Of ⊑ Located_In
- Transitive(Part_Of)
- This pattern may yield undesired results, since part-whole relations may not always imply location
- The orange peel is part of the orange, but is it located in the orange?
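The effect of this pattern, including the undesired consequence, can be reproduced on ground facts. The sketch assumes, in addition to the two axioms on the slide, that Located_In is itself treated as transitive (as in the Paris/France/Europe example); without that assumption the ulcer inference would not go through. Function names and identifiers are hypothetical.

```python
def transitive_closure(pairs):
    """Smallest transitive relation containing pairs."""
    closed = set(pairs)
    while True:
        new = {(a, d) for (a, b) in closed for (c, d) in closed if b == c}
        if new <= closed:
            return closed
        closed |= new


def apply_pattern(part_of, located_in):
    """Part_Of ⊑ Located_In, Transitive(Part_Of), and (assumed) Transitive(Located_In)."""
    part_of_closed = transitive_closure(part_of)
    return transitive_closure(located_in | part_of_closed)


PART_OF = {("gastric_mucosa", "stomach"), ("orange_peel", "orange")}
LOCATED_IN = {("ulcer", "gastric_mucosa")}

INFERRED = apply_pattern(PART_OF, LOCATED_IN)
print(("ulcer", "stomach") in INFERRED)       # the intended inference
print(("orange_peel", "orange") in INFERRED)  # the questionable one
```

Both facts are inferred: the pattern gives the desired location of the ulcer, but it also commits the ontology to every part being located in its whole.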
68Design Methodologies
69Modularity in Software Engineering
- Typically referred to as the extent to which software is divided into components with:
- high internal cohesion
- controlled coupling between each other through simple interfaces (encapsulation)
- Benefits of modular software design:
- software maintainability
- software understandability
70Modularity in Ontology Engineering
- Benefits of a modular ontology design: to simplify
- ontology refinement/update
- modifying a module should not lead to modifications in parts of the ontology that are not conceptually related
- understanding
- relationships between different modules in an ontology controlled and well-understood
- integration with other ontologies
- no unexpected consequences
- partial reuse
- reuse only the relevant part/module of an ontology
71
- Q:
- (1) CysticFibrosis ⊑ Fibrosis ⊓ ∃locatedIn.Pancreas ⊓ ∃hasOrigin.GeneticOrigin
- (2) GeneticFibrosis ⊑ Fibrosis ⊓ ∃hasOrigin.GeneticOrigin
- (3) Fibrosis ⊓ ∃locatedIn.Pancreas ⊑ GeneticFibrosis
- (4) GeneticFibrosis ⊑ GeneticDisorder
- Q ⊨ CysticFibrosis ⊑ GeneticDisorder
- P:
- (1) GenDisorderProject ≡ Project ⊓ ∃hasFocus.GeneticDisorder
- (2) CysticFibProject ≡ Project ⊓ ∃hasFocus.CysticFibrosis
- (3) ∃hasFocus.⊤ ⊑ Project
- (4) Project ⊓ (GeneticFibrosis ⊓ GeneticDisorder) ⊑ ⊥
- (5) ∀hasFocus.CysticFibrosis ⊑ ∃hasFocus.GeneticDisorder
- P ∪ Q ⊨ ⊤ ⊑ Project
- P ∪ Q ⊨ ⊤ ⊑ ∃hasFocus.⊤
- P ∪ Q ⊨ GeneticFibrosis ⊔ GeneticDisorder ⊑ ⊥
- P ∪ Q ⊨ CysticFibProject ⊑ GenDisorderProject
72Foundational Ontologies
73Recent Work and Research Challenges
74Increasing Expressive Power
- Complex role inclusion axioms [Horrocks, Kutz & Sattler, KR-06]
- E.g., hasLocation ∘ partOf ⊑ hasLocation
- Concrete domains/datatypes, e.g., [Lutz, IJCAI-99; Pan et al, ISWC-03]
- E.g., value comparison (income > expenditure)
- OWL 1.1 (see http://webont.org/owl/1.1/)
- Syntactic sugar to make commonly-stated things easier to say
- New class & property constructors
- Expanded datatype expressiveness
- Meta-modelling constructs
- Semantic-free comments
- Now a W3C Member Submission
- Database style keys [Lutz et al, JAIR 2004]
- E.g., make + model + chassis-number is a key for Vehicles
- Rule language extensions
- W3C RIF WG (see http://www.w3.org/2005/rules/)
- First order extensions (e.g., SWRL) [Horrocks et al, JWS, 2005]
- Hybrid language extensions, e.g., [Eiter et al, KR-04; Motik et al, ISWC-04; Rosati, JoWS, 2005]
- LP/F-Logic/Common Logic [Chen et al, JLP, 1993; de Bruijn et al, WWW-05]
76Improving Scalability
- Optimisation techniques
- Improve performance of DL reasoners, e.g., [Sirin et al, KR-06]
- Reduction to disjunctive Datalog [Motik et al, KR-04]
- Transform SHOIN ontology to disjunctive Datalog rules
- Use LP techniques to deal with large numbers of ground facts
- Hybrid DL-DB systems [Horrocks et al, CADE-05]
- Use DB to store ABox (individual) axioms
- Cache inferences and use DB queries to answer/scope logical queries
- Polynomial time algorithms for sub-ALC logics
- Graph based techniques for EL [Baader et al, IJCAI-05]
- Database techniques for DL-Lite [Calvanese et al, AAAI-05]
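The hybrid DL-DB idea can be sketched with Python's built-in `sqlite3`: ABox assertions live in one table, cached subsumption inferences in another, and instance retrieval becomes a database query. The table layout, column names and sample data are assumptions made for this illustration and do not describe the schema of any of the cited systems; in a real system the subsumption table would be filled by a DL reasoner.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE class_assertion (ind TEXT, cls TEXT)")
conn.execute("CREATE TABLE subsumption (sub TEXT, sup TEXT)")  # cached inferences

# ABox: asserted (most specific) types of individuals.
conn.executemany(
    "INSERT INTO class_assertion VALUES (?, ?)",
    [("john", "HappyParent"), ("mary", "Person")],
)
# Cached result of classification, including the transitive consequences.
conn.executemany(
    "INSERT INTO subsumption VALUES (?, ?)",
    [("HappyParent", "Parent"), ("Parent", "Person"), ("HappyParent", "Person")],
)


def instances_of(cls):
    """Individuals asserted to be in cls or in a cached subclass of cls."""
    rows = conn.execute(
        """
        SELECT ind FROM class_assertion WHERE cls = ?
        UNION
        SELECT a.ind
        FROM class_assertion AS a
        JOIN subsumption AS s ON a.cls = s.sub
        WHERE s.sup = ?
        """,
        (cls, cls),
    )
    return sorted(row[0] for row in rows)


print(instances_of("Person"))
print(instances_of("Parent"))
```

The reasoner is needed only when the TBox changes; queries over large numbers of ground facts are then answered by the database.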
77Summary
- OWL Ontologies provide vocabulary for annotations
- Terms have well defined meaning
- OWL now being used in a wide range of applications
- e-Science, medicine, geography, geology, ...
- Reasoning enabled tools are of crucial importance
- For both design and deployment of ontologies
- Large and extremely active R&D area
- New and improved tools & methodologies constantly appearing
- Research challenges remain
- But tools now mature enough for "prime time" applications
78Acknowledgements
- Thanks to my many friends in the DL and Semantic Web communities, in particular:
- Alan Rector
- Franz Baader
- Uli Sattler
- The Swoop/Pellet team
- Aditya Kalyanpur
- Evren Sirin
- Bernardo Cuenca Grau
- Bijan Parsia
79Resources
Thank you for listening
Any questions?
- FaCT++ system (open source)
- http://owl.man.ac.uk/factplusplus/
- OWL
- http://www.w3.org/TR/owl-features/
- OWL: Experiences and Directions Workshop
- http://owled2007.iut-velizy.uvsq.fr/
- Protégé
- http://protege.stanford.edu/plugins/owl/
- OWL 1.1 Proposal
- http://webont.org/owl/1.1/