Title: Chapter 7 Ontology Engineering
1 Chapter 7: Ontology Engineering
- Grigoris Antoniou 
 - Frank van Harmelen
 
2 Lecture Outline
- Introduction 
 - Constructing Ontologies Manually 
 - Reusing Existing Ontologies 
 - Semiautomatic Ontology Acquisition 
 - Ontology Mapping 
 - On-To-Knowledge SW Architecture
 
3 Methodological Questions
- How can tools and techniques best be applied?
- Which languages and tools should be used in which circumstances, and in which order?
- What about issues of quality control and resource management?
- Many of these questions for the Semantic Web have been studied in other contexts
  - E.g. software engineering, object-oriented design, and knowledge engineering
5 Main Stages in Ontology Development
- Determine scope 
 - Consider reuse 
 - Enumerate terms 
 - Define taxonomy 
 - Define properties 
 - Define facets 
 - Define instances 
 - Check for anomalies 
 - Not a linear process!
 
6 Determine Scope
- There is no single correct ontology of a specific domain
  - An ontology is an abstraction of a particular domain, and there are always viable alternatives
- What is included in this abstraction should be determined by
  - the use to which the ontology will be put
  - future extensions that are already anticipated
 
7 Determine Scope (2)
- Basic questions to be answered at this stage are
  - What is the domain that the ontology will cover?
  - For what are we going to use the ontology?
  - For what types of questions should the ontology provide answers?
  - Who will use and maintain the ontology?
 
8 Consider Reuse
- With the spreading deployment of the Semantic Web, ontologies will become more widely available
- We rarely have to start from scratch when defining an ontology
  - There is almost always an ontology available from a third party that provides at least a useful starting point for our own ontology
9 Enumerate Terms
- Write down in an unstructured list all the relevant terms that are expected to appear in the ontology
  - Nouns form the basis for class names
  - Verbs (or verb phrases) form the basis for property names
- Traditional knowledge engineering tools (e.g. laddering and grid analysis) can be used to obtain
  - the set of terms
  - an initial structure for these terms
 
10 Define Taxonomy
- Relevant terms must be organized in a taxonomic hierarchy
  - Opinions differ on whether it is more efficient/reliable to do this in a top-down or a bottom-up fashion
- Ensure that the hierarchy is indeed a taxonomy
  - If A is a subclass of B, then every instance of A must also be an instance of B (compatible with the semantics of rdfs:subClassOf)
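
The subclass condition above can be checked mechanically. The following is a minimal sketch, assuming a toy hierarchy stored as plain Python dictionaries; the class names, the individuals, and the helper functions are hypothetical and serve only to illustrate the rule.

```python
# Hypothetical toy data: class names and individuals are illustrative only.
SUBCLASS_OF = {          # class -> set of direct superclasses
    "Lion": {"Carnivore"},
    "Carnivore": {"Animal"},
    "Animal": set(),
    "Plant": set(),
}
DIRECT_TYPE = {"leo": "Lion", "fern1": "Plant"}


def superclasses(cls):
    """Return cls together with all of its direct and indirect superclasses."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUBCLASS_OF.get(current, ()))
    return seen


def is_instance_of(individual, cls):
    """An instance of A is also an instance of every superclass of A."""
    return cls in superclasses(DIRECT_TYPE[individual])


print(is_instance_of("leo", "Animal"))    # leo is a Lion, hence an Animal
print(is_instance_of("fern1", "Animal"))  # a Plant is not an Animal here
```

If a link in the hierarchy fails this test for some plausible instance (for example a part-of link entered as a subclass link), the hierarchy is not a taxonomy at that point.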
11 Define Properties
- Often interleaved with the previous step
- The semantics of subClassOf demands that whenever A is a subclass of B, every property statement that holds for instances of B must also apply to instances of A
  - It makes sense to attach properties to the highest class in the hierarchy to which they apply
12 Define Properties (2)
- While attaching properties to classes, it makes sense to immediately provide statements about the domain and range of these properties
- There is a methodological tension here between generality and specificity
  - Flexibility (inheritance to subclasses)
  - Detection of inconsistencies and misconceptions
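
To make the role of domain and range statements concrete, here is a minimal sketch of RDFS-style reasoning over a list of triples. The property `teaches`, the classes, and the function name are assumptions made up for this illustration. Note that in RDF Schema a domain or range statement produces inferences about the subject and object rather than rejecting a statement.

```python
# Hypothetical property declarations: property -> (domain class, range class).
PROPERTY_SIGNATURE = {
    "teaches": ("StaffMember", "Course"),
}


def inferred_types(triples):
    """Apply domain/range reasoning to (subject, property, object) triples.

    Returns a dict mapping each resource to the set of classes it is
    inferred to belong to.
    """
    types = {}
    for subject, prop, obj in triples:
        if prop in PROPERTY_SIGNATURE:
            domain, range_ = PROPERTY_SIGNATURE[prop]
            types.setdefault(subject, set()).add(domain)
            types.setdefault(obj, set()).add(range_)
    return types


facts = [("david", "teaches", "cs101")]
print(inferred_types(facts))
```

A very general domain (say, a top-level class) keeps the property reusable by all subclasses but yields uninformative inferences; a very specific one makes a misuse of the property visible as an unexpected inferred type.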
 
13 Define Facets: From RDFS to OWL
- Cardinality restrictions
- Required values
  - owl:hasValue
  - owl:allValuesFrom
  - owl:someValuesFrom
- Relational characteristics
  - symmetry, transitivity, inverse properties, functional values
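
As an illustration of what the relational characteristics add, the sketch below computes the statements implied by declaring a property symmetric, transitive, or the inverse of another. It works on plain sets of pairs; the function names and the `part_of` data are hypothetical, and a real OWL reasoner does considerably more than this.

```python
def close_under_characteristics(pairs, symmetric=False, transitive=False):
    """Return all pairs implied by declaring a property symmetric and/or transitive."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        new_pairs = set()
        if symmetric:
            new_pairs |= {(b, a) for a, b in closed}
        if transitive:
            new_pairs |= {(a, d) for a, b in closed for c, d in closed if b == c}
        if not new_pairs <= closed:
            closed |= new_pairs
            changed = True
    return closed


def inverse_of(pairs):
    """Pairs of the inverse property: (b, a) for every (a, b)."""
    return {(b, a) for a, b in pairs}


part_of = {("wheel", "car"), ("spoke", "wheel")}   # hypothetical data
print(close_under_characteristics(part_of, transitive=True))
print(inverse_of(part_of))
```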
14 Define Instances
- Filling the ontology with instances is a separate step
- Number of instances >> number of classes
- Thus populating an ontology with instances is typically not done manually
  - Retrieved from legacy data sources (DBs)
  - Extracted automatically from a text corpus
 
15 Check for Anomalies
- An important advantage of the use of OWL over RDF Schema is the possibility to detect inconsistencies
  - In the ontology itself or in the ontology together with its instances
- Examples of common inconsistencies
  - incompatible domain and range definitions for transitive, symmetric, or inverse properties
  - cardinality properties
  - requirements on property values can conflict with domain and range restrictions
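
Two of the anomalies listed above can be sketched as simple checks. This is an illustrative simplification, not how an OWL reasoner works: it assumes restrictions are given as plain dictionaries, that each value has a single declared type, and that the classes involved are disjoint. All names and data are hypothetical.

```python
def find_cardinality_conflicts(restrictions):
    """Return properties whose minimum cardinality exceeds the maximum.

    `restrictions` maps a property name to (min_cardinality, max_cardinality);
    a maximum of None means "unbounded".
    """
    return sorted(
        prop
        for prop, (minimum, maximum) in restrictions.items()
        if maximum is not None and minimum > maximum
    )


def find_range_violations(required_values, ranges, type_of):
    """Return properties whose required value is not typed with the property's range.

    Assumes single, pairwise disjoint types (a simplification).
    """
    return sorted(
        prop
        for prop, value in required_values.items()
        if prop in ranges and type_of.get(value) != ranges[prop]
    )


restrictions = {"hasParent": (2, 2), "hasSpouse": (2, 1)}
required_values = {"taughtBy": "cs101"}
ranges = {"taughtBy": "StaffMember"}
type_of = {"cs101": "Course"}

print(find_cardinality_conflicts(restrictions))
print(find_range_violations(required_values, ranges, type_of))
```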
17 Existing Domain-Specific Ontologies
- Medical domain: Cancer ontology from the National Cancer Institute in the United States
- Cultural domain:
  - Art and Architecture Thesaurus (AAT), with 125,000 terms in the cultural domain
  - Union List of Artist Names (ULAN), with 220,000 entries on artists
  - Iconclass vocabulary of 28,000 terms for describing cultural images
- Geographical domain: Getty Thesaurus of Geographic Names (TGN), containing over 1 million entries
18 Integrated Vocabularies
- Merge independently developed vocabularies into a single large resource
- E.g. Unified Medical Language System, integrating 100 biomedical vocabularies
  - The UMLS metathesaurus contains 750,000 concepts, with over 10 million links between them
- The semantics of a resource that integrates many independently developed vocabularies is rather low
  - But very useful in many applications as a starting point
19 Upper-Level Ontologies
- Some attempts have been made to define very generally applicable ontologies
  - Not domain-specific
- Cyc, with 60,000 assertions on 6,000 concepts
- Standard Upperlevel Ontology (SUO)
 
20 Topic Hierarchies
- Some ontologies do not deserve this name
  - They are simply sets of terms, loosely organized in a hierarchy
- This hierarchy is typically not a strict taxonomy but rather mixes different specialization relations (e.g. is-a, part-of, contained-in)
- Such resources are often very useful as a starting point
- Example: Open Directory hierarchy, containing more than 400,000 hierarchically organized categories and available in RDF format
21 Linguistic Resources
- Some resources were originally built not as abstractions of a particular domain, but rather as linguistic resources
- These have been shown to be useful as starting places for ontology development
  - E.g. WordNet, with over 90,000 word senses
 
22 Ontology Libraries
- Attempts are currently underway to construct online libraries of online ontologies
- Existing ontologies can rarely be reused without changes
  - Existing concepts and properties must be refined using rdfs:subClassOf and rdfs:subPropertyOf
  - Alternative names that are better suited to the particular domain must be introduced using owl:equivalentClass and owl:equivalentProperty
- We can exploit the fact that RDF and OWL allow private refinements of classes defined in other ontologies
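
A private refinement of a reused ontology amounts to a handful of extra statements kept in our own namespace. The sketch below builds two such statements as tuples and prints them in a Turtle-like form. The prefixes `ext:` (a third-party ontology) and `my:` (our own), the class names, and the helper `to_turtle` are hypothetical, and the prefix declarations a real Turtle file needs are omitted.

```python
RDFS_SUBCLASS_OF = "rdfs:subClassOf"
OWL_EQUIVALENT_CLASS = "owl:equivalentClass"

# "ext:" stands for a third-party ontology, "my:" for our own (both hypothetical).
refinements = [
    ("my:OilPainting", RDFS_SUBCLASS_OF, "ext:Painting"),      # private refinement
    ("my:Schilderij", OWL_EQUIVALENT_CLASS, "ext:Painting"),   # alternative local name
]


def to_turtle(triples):
    """Serialize (subject, predicate, object) triples as simple Turtle statements."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_turtle(refinements))
```

The third-party ontology itself stays untouched; only the statements in the `my:` namespace are added.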
24 The Knowledge Acquisition Bottleneck
- Manual ontology acquisition remains a time-consuming, expensive, highly skilled, and sometimes cumbersome task
- Machine learning techniques may be used to alleviate
  - knowledge acquisition or extraction
  - knowledge revision or maintenance
 
25 Tasks Supported by Machine Learning
- Extraction of ontologies from existing data on the Web
- Extraction of relational data and metadata from existing data on the Web
- Merging and mapping ontologies by analyzing extensions of concepts
- Maintaining ontologies by analyzing instance data
- Improving SW applications by observing users
 
26 Useful Machine Learning Techniques for Ontology Engineering
- Clustering 
 - Incremental ontology updates 
 - Support for the knowledge engineer 
 - Improving large natural language ontologies 
 - Pure (domain) ontology learning 
 
27 Machine Learning Techniques for Natural Language Ontologies
- Natural language ontologies (NLOs) contain lexical relations between language concepts
- They are large in size and do not require frequent updates
- The state of the art in NLO learning looks quite optimistic
  - A stable general-purpose NLO exists
  - Techniques for automatically or semi-automatically constructing and enriching domain-specific NLOs exist
28 Machine Learning Techniques for Domain Ontologies
- Domain ontologies provide detailed descriptions
- Usually they are constructed manually
- The acquisition of domain ontologies is still guided by a human knowledge engineer
  - Automated learning techniques play a minor role in knowledge acquisition
  - They have to find statistically valid dependencies in the domain texts and suggest them to the knowledge engineer
29 Machine Learning Techniques for Ontology Instances
- Ontology instances can be generated automatically and frequently updated while the ontology remains unchanged
- Fits nicely into a machine learning framework
- Successful ML applications
  - Are strictly dependent on the domain ontology, or
  - Populate the markup without relating to any domain theory
- General-purpose techniques are not yet available
 
30 Different Uses of Ontology Learning
- Ontology acquisition tasks in knowledge engineering
  - Ontology creation from scratch by the knowledge engineer
  - Ontology schema extraction from Web documents
  - Extraction of ontology instances from Web documents
- Ontology maintenance tasks
  - Ontology integration and navigation
  - Updating some parts of an ontology
  - Ontology enrichment or tuning
 
31 Ontology Acquisition Tasks
- Ontology creation from scratch by the knowledge engineer
  - ML assists the knowledge engineer by suggesting the most important relations in the field or by checking and verifying the constructed knowledge bases
- Ontology schema extraction from Web documents
  - ML takes the data and meta-knowledge (like a meta-ontology) as input and generates the ready-to-use ontology as output, with the possible help of the knowledge engineer
32 Ontology Acquisition Tasks (2)
- Extraction of ontology instances from Web documents
  - This task extracts the instances of the ontology presented in the Web documents and populates given ontology schemas
  - This task is similar to information extraction and page annotation, and can apply the techniques developed in these areas
33 Ontology Maintenance Tasks
- Ontology integration and navigation
  - Deals with reconstructing and navigating in large and possibly machine-learned knowledge bases
- Updating some parts of an ontology that are designed to be updated
- Ontology enrichment or tuning
  - This does not change major concepts and structures but makes an ontology more precise
34 Potentially Applicable Machine Learning Algorithms
- Propositional rule learning algorithms
- Bayesian learning
  - generates probabilistic attribute-value rules
- First-order logic rules learning
- Clustering algorithms
  - They group the instances together based on the similarity or distance measures between a pair of instances defined in terms of their attribute values
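
The clustering idea can be sketched in a few lines. The example below uses a simple matching distance over attribute values and a greedy threshold rule; both are choices made for this illustration rather than a specific algorithm from the lecture, and the instance data is hypothetical.

```python
def distance(a, b):
    """Fraction of attributes on which two instances disagree."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) != b.get(k) for k in keys) / len(keys)


def cluster(instances, threshold):
    """Greedy clustering: put an instance into the first cluster that has a
    member within `threshold`, otherwise start a new cluster."""
    clusters = []
    for name, attrs in instances.items():
        for members in clusters:
            if any(distance(attrs, instances[m]) <= threshold for m in members):
                members.append(name)
                break
        else:
            clusters.append([name])
    return clusters


instances = {   # hypothetical instance data
    "sparrow": {"legs": 2, "flies": True, "covering": "feathers"},
    "eagle": {"legs": 2, "flies": True, "covering": "feathers"},
    "penguin": {"legs": 2, "flies": False, "covering": "feathers"},
    "dog": {"legs": 4, "flies": False, "covering": "fur"},
}
print(cluster(instances, threshold=0.4))
```

Each resulting cluster is a candidate concept that a knowledge engineer could name and place in the hierarchy.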
36 Ontology Mapping
- A single ontology will rarely fulfill the needs of a particular application; multiple ontologies will have to be combined
- This raises the problem of ontology integration (also called ontology alignment or ontology mapping)
- Current approaches deploy a whole host of different methods; we distinguish linguistic, statistical, structural and logical methods
37 Linguistic Methods
- The most basic methods try to exploit the linguistic labels attached to the concepts in the source and target ontologies in order to discover potential matches
- This can be as simple as basic stemming techniques or calculating Hamming distances, or it can use specialized domain knowledge (e.g. the difference between Diabetes Mellitus type I and Diabetes Mellitus type II is not a negligible difference to be removed by a small Hamming distance)
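
The diabetes example can be reproduced with a small string-distance sketch. Because the two labels differ in length, the code uses Levenshtein edit distance in place of the Hamming distance named on the slide (Hamming distance is only defined for strings of equal length). The function names and the threshold are assumptions for this illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def labels_match(a, b, max_distance=1):
    """Naive matcher: labels match if their edit distance is small."""
    return edit_distance(a.lower(), b.lower()) <= max_distance


type_1 = "Diabetes Mellitus type I"
type_2 = "Diabetes Mellitus type II"
print(edit_distance(type_1, type_2))
print(labels_match(type_1, type_2))
```

The naive matcher accepts the pair, which is exactly the kind of false match that domain knowledge has to rule out.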
38 Statistical Methods
- Some methods use instance data to determine correspondences between concepts
- A significant statistical correlation between the instances of a source concept and a target concept gives us reason to believe that these concepts are strongly related
- These approaches rely on the availability of a sufficiently large corpus of instances that are classified in both the source and the target ontologies
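
As a rough sketch of instance-based matching, the code below scores concept pairs by the Jaccard overlap of their instance sets. Jaccard overlap is used here as a simple stand-in for a proper test of statistical significance; the concept names, instance identifiers, and threshold are hypothetical.

```python
def jaccard(source_instances, target_instances):
    """Overlap of two instance sets: |A & B| / |A | B| (0.0 if both are empty)."""
    union = source_instances | target_instances
    if not union:
        return 0.0
    return len(source_instances & target_instances) / len(union)


def candidate_mappings(source, target, threshold=0.5):
    """Return (source concept, target concept, score) for pairs whose
    instance overlap reaches the threshold."""
    return sorted(
        (s, t, jaccard(s_inst, t_inst))
        for s, s_inst in source.items()
        for t, t_inst in target.items()
        if jaccard(s_inst, t_inst) >= threshold
    )


# Hypothetical instances classified in both ontologies.
source = {"Automobile": {"i1", "i2", "i3"}, "Bicycle": {"i4"}}
target = {"Car": {"i1", "i2", "i3", "i5"}, "Bike": {"i4"}}
print(candidate_mappings(source, target))
```

With only a handful of shared instances such scores are unreliable, which is why these methods need a sufficiently large doubly classified corpus.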
39 Structural Methods
- Since ontologies have internal structure, it makes sense to exploit the graph structure of the source and the target ontologies and try to determine similarities, often in coordination with other methods
- If a source concept and a target concept have similar linguistic labels, then the dissimilarity of their graph neighborhoods could be used to detect homonym problems where purely linguistic methods would falsely declare a potential mapping
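
The homonym check can be sketched by comparing the neighbors of two identically labeled concepts. The example assumes each ontology is given as a list of undirected edges between concept labels; the two toy ontologies and the function names are hypothetical.

```python
def neighborhood(graph, concept):
    """Labels of all concepts directly linked to `concept` in an edge list."""
    return ({b for a, b in graph if a == concept}
            | {a for a, b in graph if b == concept})


def neighborhood_similarity(source_graph, target_graph, concept):
    """Jaccard overlap of the neighbor labels of `concept` in both ontologies."""
    source_n = neighborhood(source_graph, concept)
    target_n = neighborhood(target_graph, concept)
    union = source_n | target_n
    return len(source_n & target_n) / len(union) if union else 0.0


# Two hypothetical ontologies that both contain a concept labeled "Bank".
finance = [("Bank", "FinancialInstitution"), ("Account", "Bank")]
geography = [("Bank", "Landform"), ("Bank", "River")]
print(neighborhood_similarity(finance, geography, "Bank"))
```

A similarity near zero despite identical labels suggests a homonym, so the linguistic match should not be accepted as a mapping.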
40 Logical Methods
- These are the methods most specific to mapping ontologies
- A serious limitation of this approach is that many practical ontologies are semantically rather lightweight and thus do not carry much logical formalism with them
41 Ontology-Mapping Techniques: Conclusion
- Although there is much potential, and indeed need, for these techniques to be deployed for Semantic Web engineering, this is far from a well-understood area
- No off-the-shelf techniques are currently available, and it is not clear that this is likely to change in the near future
43 On-To-Knowledge Architecture
- Building the Semantic Web involves using
  - the new languages described in this course
  - a rather different style of engineering
  - a rather different approach to application integration
- We describe how a number of Semantic Web-related tools can be integrated in a single lightweight architecture using Semantic Web standards to achieve interoperability between tools
44 Knowledge Acquisition
- Initially, tools must exist that use surface analysis techniques to obtain content from documents
  - Unstructured natural language documents: statistical techniques and shallow natural language technology
  - Structured and semi-structured documents: wrapper induction, pattern recognition
45 Knowledge Storage
- The output of the analysis tools is sets of concepts, organized in a shallow concept hierarchy with at best very few cross-taxonomical relationships
- RDF/RDF Schema are sufficiently expressive to represent the extracted information
  - Store the knowledge produced by the extraction tools
  - Retrieve this knowledge, preferably using a structured query language (e.g. RQL)
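
The store-and-retrieve cycle can be illustrated with a minimal in-memory triple store. This is only a sketch: it is not the On-To-Knowledge repository and does not implement RQL; the class `TripleStore`, its methods, and the `ex:` resources are hypothetical. Pattern matching with wildcards stands in for a structured query.

```python
class TripleStore:
    """Minimal in-memory store for (subject, predicate, object) triples."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Store one extracted statement."""
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )


store = TripleStore()
store.add("ex:Lion", "rdfs:subClassOf", "ex:Animal")   # shallow concept hierarchy
store.add("ex:leo", "rdf:type", "ex:Lion")             # extracted instance
print(store.match(predicate="rdfs:subClassOf"))
```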
46 Knowledge Maintenance and Use
- A practical Semantic Web repository must provide functionality for managing and maintaining the ontology
  - change management
  - access and ownership rights
  - transaction management
- There must be support for both
  - Lightweight ontologies that are automatically generated from unstructured and semi-structured data
  - Human engineering of much more knowledge-intensive ontologies
47 Knowledge Maintenance and Use (2)
- Sophisticated editing environments must be able to
  - Retrieve ontologies from the repository
  - Allow a knowledge engineer to manipulate them
  - Place them back in the repository
- The ontologies and data in the repository are to be used by applications that serve an end-user
  - We have already described a number of such applications
48 Technical Interoperability
- Syntactic interoperability was achieved because all components communicated in RDF
- Semantic interoperability was achieved because all semantics was expressed using RDF Schema
- Physical interoperability was achieved because all communications between components were established using simple HTTP connections
49 On-To-Knowledge System Architecture