Title: The Structure of Ontologies
1The Structure of Ontologies
- Martin Volk
- Stockholm University
2What Are You?
- I grew up in Rhode Island, a New England state
which is largely Italian-American and
French-Canadian, known chiefly for its small
stature. When I was a kid in our neighborhood,
the first thing you would ask on encountering a
newcomer was "what's your name?" The second was
"What are you?" as an invitation to recite your
ethnic composition in a kind of singsong voice.
- 90% of the kids would say "Italian with a little
bit of French," or "half-Portuguese, one-quarter
Italian and one-quarter Armenian." When I would
chime in with "half Jewish, one quarter Scottish
and one quarter English," the range of responses
went from very puzzled looks to "does that mean
you're not Catholic?" Wherein, I guess, began my
fascination with classification, and especially
with the problem of residual categories, or, the
Other, or not elsewhere classified.
- Leigh Star
- (quoted from http://weber.ucsd.edu/~gbowker/classification/)
3Overview of today's lecture
- What is an Ontology?
- How are ontologies structured? A closer look
- Why do Computational Linguists work with / on
Ontologies?
- Some terminology with respect to the Semantic
Web.
- Your next task!
4Remarks about your ontologies
- Protégé is a complex system!
- It is not suited for synonyms or for translations.
- Options:
- Use the documentation field
- For translations, build a parallel hierarchy
- Trade-off between
- a Stockholm-specific ontology
- a general university ontology
- Test for classes:
- Is the parent node a good answer to "What is an X?"
5Remarks about SUIS
- SUIS will use your University ontology for query
expansion.
- If your query is "Who is Päivi Juvonen?" then
SUIS will search for Päivi Juvonen in
combination with any person function (teacher,
librarian, cook, rector, director of studies, ...).
- If your query is "Where is the lecture on
Semiotics?" then SUIS will search for semiotics,
some teaching event and a date.
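The query expansion described on this slide could be sketched roughly as follows. This is only an illustration, not how SUIS is actually implemented: the `ONTOLOGY` dictionary, the function names, and the subclasses of "teacher" and "teaching event" are assumptions made for the example.

```python
# Hypothetical mini-ontology: class name -> list of direct subclasses.
# The entries under "teacher" and "teaching event" are invented for illustration.
ONTOLOGY = {
    "person function": ["teacher", "librarian", "cook", "rector",
                        "director of studies"],
    "teacher": ["lecturer", "professor"],
    "teaching event": ["lecture", "seminar"],
}


def descendants(cls):
    """Return all classes below cls (depth-first, no duplicates)."""
    found = []
    for child in ONTOLOGY.get(cls, []):
        if child not in found:
            found.append(child)
        for lower in descendants(child):
            if lower not in found:
                found.append(lower)
    return found


def expand_query(name, cls):
    """Combine a name with every class found below cls in the ontology."""
    return [f"{name} {term}" for term in descendants(cls)]


if __name__ == "__main__":
    for query in expand_query("Päivi Juvonen", "person function"):
        print(query)
```

Each expanded string would then be sent to the search engine as an alternative query, so that a page mentioning "Päivi Juvonen, lecturer" can still be found.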
6Remarks about SUIS
- SUIS will use your ontology for answering
questions directly.
- If your query is "What is a lecturer?" then SUIS
will use the ontology for replying:
- A lecturer is a type of teacher working at a
university.
- A lecturer is called universitetslektor or
högskoleadjunkt in Swedish.
7What is an ontology?
- An ontology holds information about what
categories exist in the domain, what properties
they have, and how they are related to one
another. (Chandrasekaran et al. 1999)
8Types of ontologies
- Top-level ontology
- Lexical ontology (e.g. WordNet)
- WordNet is a lexical database for English (with
synonyms and hyperonyms)
- General ontology (e.g. Cyc)
- Cyc is a formalized representation of fundamental
human knowledge: facts, rules of thumb, and
heuristics for reasoning about the objects and
events of everyday life
- Domain ontology (e.g. UMLS)
- The Unified Medical Language System
- Task ontology (e.g. CPV)
- The Common Procurement Vocabulary
9Kinds of Ontologies
- Ontologies may vary not only in their content,
but also in their structure and implementation.
- Level of description
- An ontology may mean different things:
- from simple lexicons or controlled vocabularies,
to
- categorically organized thesauri, to
- taxonomies where terms are related hierarchically
and can be given distinguishing properties, to
- full-blown ontologies where these properties can
define new concepts and where concepts have named
relationships with other concepts, like "changes
the effect of" or "buys from".
10Kinds of Ontologies
- Conceptual scope
- Ontologies differ with respect to the scope and
purpose of their content. The most prominent
distinction is between
- the domain ontologies describing specific fields
of endeavor, like medicine, and
- upper level ontologies describing the basic
concepts and relationships invoked when
information about any domain is expressed in
natural language.
- The synergy among ontologies springs from the
cross-referencing between upper level ontologies
and various domain ontologies.
11Kinds of Ontologies
- Specification language
- To build ontologies a number of possible
languages can be used, including
- general logic programming languages like Prolog.
- More common are languages that have evolved
specifically for ontology construction. The Open
Knowledge Base Connectivity (OKBC) model and
languages like KIF (and its successor CL,
Common Logic) have become the bases of other
ontology languages.
- There are also several languages based on a form
of logic known as description logics. These
include Loom and DAML+OIL, which is currently
being evolved into the Web Ontology Language
(OWL) standard.
- When comparing ontology languages, some language
expressiveness is usually traded for
computability and simplicity.
12Building Ontologies
- Acquire domain knowledge
- Assemble appropriate information resources and
expertise that will define, with consensus and
consistency, the terms used formally to describe
things in the domain of interest.
- Organize the ontology
- Design the overall conceptual structure of the
domain. This involves identifying the domain's
principal concrete concepts and their properties,
identifying the relationships among the concepts,
creating abstract concepts as organizing
features, referencing or including supporting
ontologies, distinguishing which concepts have
instances, and applying other guidelines of your
chosen methodology.
- Flesh out the ontology
- Add concepts, relations, and individuals to the
level of detail necessary to satisfy the purposes
of the ontology.
13Ontology (Noy & McGuinness)
- An ontology defines a common vocabulary for
researchers who need to share information in a
domain. It includes machine-interpretable
definitions of basic concepts in the domain and
relations among them.
14Why ontologies? (Noy & McGuinness)
- Why would someone want to develop an ontology?
Some of the reasons are:
- To share common understanding of the structure of
information among people or software agents
- To enable reuse of domain knowledge
- To make domain assumptions explicit
- To separate domain knowledge from the operational
knowledge
- To analyze domain knowledge
16WordNet relations
17Other relations (Navigli & Velardi 2004,
CL-Journal, p. 168)
18Ontologies (according to John Sowa 2000, Knowledge
Representation)
- (p. 51) Logic has no vocabulary for describing
the things that exist. Ontology fills that gap:
it is the study of existence, of all the kinds of
entities, abstract or concrete, that make up
the world.
- (p. 52) AI systems start with limited ontologies
(microworlds), a small number of concepts that
are tailored for a single application.
- Example: Chat-80 geographical categories
19Ontologies
- Sowa (p. 68): All perception begins with
contrasts: light-dark, up-down, hard-soft,
loud-quiet, sweet-sour. Such contrasts are the
source of distinctions for generating the
categories of existence.
20Examples of Ontologies
- The categories in the Yellow Pages (telephone
book).
- The Yahoo categories.
- Library cataloging systems.
- The arrangement of goods in a department store.
- The faculties at a university.
- The departments in a municipal or national
administration or in a large enterprise.
21Natural Language Processing (NLP) and Ontologies
- Building ontologies
- extracting terms from corpora as concept
candidates.
- suggesting synonyms from context analysis.
- suggesting hyperonyms from compound analysis
(e.g. cruciate ligament is a ligament).
- extending ontologies.
- merging ontologies,
- e.g. of two companies / two administrative units.
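The compound-analysis idea from the list above (a cruciate ligament is a ligament) can be illustrated with a very small heuristic. This is a sketch under a strong assumption: that the term is an English, right-headed, space-separated compound, so that the last word is the head. The function name is invented for the example.

```python
def suggest_hypernym(term):
    """Suggest the head (last word) of a multi-word term as its hyperonym.

    Returns None for single-word terms, where the heuristic has nothing to say.
    """
    words = term.lower().split()
    if len(words) < 2:
        return None
    return words[-1]


if __name__ == "__main__":
    for term in ["cruciate ligament", "ligament"]:
        print(term, "->", suggest_hypernym(term))
```

A suggestion produced this way is only a candidate: some compounds do not denote a kind of their head, so a human or a corpus check still has to confirm it.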
22NLP and Ontologies
- Applying ontologies
- mapping concepts found in text to nodes in an
ontology
- e.g. Riesling → white wine
- disambiguating word senses
- e.g. SV lag → EN law or team
- reasoning with the help of ontologies
- e.g. If Riesling is a white wine, then it is an
alcoholic beverage.
- allowing for cross-language searches
- e.g. head of department → prefekt
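The reasoning example on this slide (Riesling is a white wine, hence an alcoholic beverage) amounts to following is_a links upwards. A minimal sketch, assuming a single-parent hierarchy; the `IS_A` table and its intermediate node "wine" are invented for the example.

```python
# Hypothetical is_a links: each concept maps to its direct parent.
IS_A = {
    "Riesling": "white wine",
    "white wine": "wine",
    "wine": "alcoholic beverage",
}


def ancestors(concept):
    """Follow is_a links upwards and return all parents, nearest first."""
    chain = []
    seen = {concept}
    while concept in IS_A:
        concept = IS_A[concept]
        if concept in seen:  # guard against accidental cycles in the data
            break
        seen.add(concept)
        chain.append(concept)
    return chain


def is_a(concept, category):
    """True if category is reachable from concept via is_a links."""
    return category in ancestors(concept)


if __name__ == "__main__":
    print(ancestors("Riesling"))
    print(is_a("Riesling", "alcoholic beverage"))
```

A real ontology usually allows several parents per class, in which case the dictionary values would become lists and the walk a graph traversal.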
23Automatic ontology construction
- Find is_a patterns in a corpus
- A computer printer is a computer peripheral
device
- the HP OfficeJet 4215 is a multitalented,
multifunction printer
- Use compounds
- laser printer → printer
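Finding is_a patterns like the two corpus sentences on this slide can be approximated with a regular expression. The pattern below is a deliberately naive sketch and an assumption of this example, not the method used in the lecture: it only handles sentences of the form "X is a/an Y" and performs no parsing.

```python
import re

# Naive pattern: optional article, a hyponym phrase, "is a/an", a hypernym phrase.
IS_A_PATTERN = re.compile(
    r"^(?:(?:an?|the)\s+)?(?P<hypo>.+?)\s+is\s+an?\s+(?P<hyper>.+?)\.?$",
    re.IGNORECASE,
)


def extract_is_a(sentence):
    """Return (hyponym phrase, hypernym phrase), or None if nothing matches."""
    match = IS_A_PATTERN.match(sentence.strip())
    if match is None:
        return None
    return match.group("hypo"), match.group("hyper")


if __name__ == "__main__":
    corpus = [
        "A computer printer is a computer peripheral device",
        "the HP OfficeJet 4215 is a multitalented, multifunction printer",
        "Printer and Photocopier Troubleshooting",
    ]
    for sentence in corpus:
        print(extract_is_a(sentence))
```

The extracted hypernym phrase still contains modifiers ("multitalented, multifunction printer"), so a later step would have to reduce it to its head noun before adding an is_a link to the ontology.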
24Automatic ontology construction
- Use appositive NPs
- Your Basic Laser Printer the Brother 1240
- lets you choose your active printer (the printer
you are about to use)
- Canon launches A4 Photo Printer - The Bubble Jet
i990
- Use coordination
- The PhotoSmart printer and scanner are 399 each
- Printer and Photocopier Troubleshooting
- BUT: Hyena's printer and job management functions
include
25Merging Ontologies
- (Sowa) Different systems may use different names
for the same kinds of entities;
- even worse, they may use the same names for
different kinds.
26Merging Ontologies
- Before two ontologies can be merged it might be
necessary to introduce new classes to accommodate
the different structures.
- Often ontologies get aligned rather than merged.
27The Semantic Web
- a vision by Tim Berners-Lee
- to facilitate the use of web pages by computers
- Ideally: specification of content in a formal
language
- Example:
    father(carl_gustav, victoria).
    father(gustav_adolf, carl_gustav).
    grandfather(A, C) :-
        father(A, B),
        father(B, C).
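For readers who do not know Prolog, the grandfather rule of this example can be mirrored in Python as a join over the father facts. This is only a sketch of the same inference; the names `FATHER` and `grandfathers` are chosen for the illustration.

```python
# The two father facts from the slide, as (father, child) pairs.
FATHER = {
    ("carl_gustav", "victoria"),
    ("gustav_adolf", "carl_gustav"),
}


def grandfathers(father_facts):
    """Derive (grandfather, grandchild) pairs: A is father of B, B is father of C."""
    return {
        (a, c)
        for (a, b) in father_facts
        for (b2, c) in father_facts
        if b == b2
    }


if __name__ == "__main__":
    print(grandfathers(FATHER))
```

The point for the Semantic Web is the same in both notations: once the facts are stated formally, a program can derive statements that were never written down explicitly.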
28What is XML?
- XML: eXtensible Markup Language
- a language for structuring text and data
- the XML structure of a document can be checked
against a grammar (DTD: document type definition)
29XML
- URI (Uniform Resource Identifier): provides names
- RDF (Resource Description Framework): simple statements
- Dublin Core: a set of core elements
- OWL (Web Ontology Language): complex logical statements
30Uniform Resource Identifier (URI)
- If you want to discuss something, you must first
identify it.
- Example: Absolute URIs (from http://en.wikipedia.org)
- http://somehost/absolute/URI/with/absolute/path/to/resource.txt
- ftp://somehost/resource.txt
- urn:a-rose-by-any-other-name (hmm... unregistered
URN namespace)
31Uniform Resource Identifier (URI)
- A URL (Uniform Resource Locator, i.e. a web address) is
one type of URI.
- Anybody can create a URI.
- A URI is not a set of directions telling your
computer how to get to a specific file on the
Web.
- A URI is a name for a "resource" (a thing). This
resource may or may not be accessible over the
Internet.
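The difference between a locator and a mere name shows up when the example URIs from the previous slide are split into their parts. A small sketch using `urlparse` from Python's standard library; the URIs are written here with their scheme separators (`://`, `:`), and the helper name `describe` is made up for the example.

```python
from urllib.parse import urlparse

EXAMPLE_URIS = [
    "http://somehost/absolute/URI/with/absolute/path/to/resource.txt",
    "ftp://somehost/resource.txt",
    "urn:a-rose-by-any-other-name",
]


def describe(uri):
    """Split a URI into (scheme, host, path)."""
    parts = urlparse(uri)
    return parts.scheme, parts.netloc, parts.path


if __name__ == "__main__":
    for uri in EXAMPLE_URIS:
        print(describe(uri))
```

The http and ftp URIs carry a host that a program could contact, while the urn example has no host at all: it names something without saying where to fetch it.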
32RDF (Resource Description Framework)
- RDF is a language for representing information
about resources in the World Wide Web,
- e.g. representing metadata about Web resources,
such as the title, author, and modification date
of a Web page.
- An RDF statement is a lot like a simple sentence,
except that almost all the words are URIs.
- Each RDF statement has three parts: a subject, a
predicate and an object.
33RDF example
- Example: a Wikipedia page about Tony Benn
- To say that the title of a page is "Tony Benn"
and its publisher is "Wikipedia" in RDF:
    <http://en.wikipedia.org/Tony_Benn> <http://purl.org/dc/elements/1.1/title> "Tony Benn" .
    <http://en.wikipedia.org/Tony_Benn> <http://purl.org/dc/elements/1.1/publisher> "Wikipedia" .
- and in RDF/XML:
    <rdf:RDF
        xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:dc="http://purl.org/dc/elements/1.1/">
      <rdf:Description rdf:about="http://en.wikipedia.org/Tony_Benn">
        <dc:title>Tony Benn</dc:title>
        <dc:publisher>Wikipedia</dc:publisher>
      </rdf:Description>
    </rdf:RDF>
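To see that the RDF/XML form carries the same two statements as the triple form, the document can be read back into (subject, predicate, object) triples. The following is a sketch using only Python's standard XML parser; it handles just the simple `rdf:Description` shape of this example and is not a general RDF/XML parser. The function name `triples` is an assumption of the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/Tony_Benn">
    <dc:title>Tony Benn</dc:title>
    <dc:publisher>Wikipedia</dc:publisher>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Return (subject, predicate, object) for each property of each Description."""
    root = ET.fromstring(rdf_xml)
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores qualified names as "{namespace}localname".
            namespace, local = prop.tag[1:].split("}")
            result.append((subject, namespace + local, prop.text))
    return result


if __name__ == "__main__":
    for triple in triples(RDF_XML):
        print(triple)
```

Joining the namespace URI and the element name gives back the full predicate URI, which is how the short `dc:title` in the XML corresponds to the long URI in the triple notation.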
34URI vs Ontologies
- All these URIs are useless if we never
describe what they mean. This is where schemas
and ontologies come in. A schema and an ontology
are ways to describe the meaning and
relationships of terms.
35Richer schema capabilities
- RDF Schema provides basic capabilities. Other
richer schema capabilities that have been
identified as useful include:
- cardinality constraints on properties,
- e.g., that a Person has exactly one biological
father.
- specifying that a given property (such as
ex:hasAncestor) is transitive,
- e.g., that if A ex:hasAncestor B, and B
ex:hasAncestor C, then A ex:hasAncestor C.
- specifying that a given property is a unique
identifier (or key) for instances of a particular
class.
- specifying constraints on the range or
cardinality of a property that depend on the
class of resource to which a property is applied,
- e.g., for a soccer team the ex:hasPlayers
property has 11 values, while for a basketball
team the same property should have only 5 values.
- the ability to describe new classes in terms of
combinations (e.g., unions and intersections) of
other classes, or to say that two classes are
disjoint.
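The class-dependent cardinality constraint from the soccer and basketball example can be made concrete with a small check. This is a plain-Python sketch of what such a constraint means, not OWL syntax; the class names and the `REQUIRED_PLAYERS` table are assumptions made for the illustration.

```python
# Hypothetical class-dependent cardinalities for a hasPlayers property.
REQUIRED_PLAYERS = {
    "SoccerTeam": 11,
    "BasketballTeam": 5,
}


def satisfies_cardinality(team_class, players):
    """Check that the number of distinct players matches the class's constraint.

    Classes without a declared constraint are accepted with any number of values.
    """
    expected = REQUIRED_PLAYERS.get(team_class)
    if expected is None:
        return True
    return len(set(players)) == expected


if __name__ == "__main__":
    five = ["p1", "p2", "p3", "p4", "p5"]
    print(satisfies_cardinality("BasketballTeam", five))
    print(satisfies_cardinality("SoccerTeam", five))
```

The same property is judged differently depending on the class of the resource it is applied to, which is exactly what RDF Schema alone cannot express.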
36OWL (Web Ontology Language)
- is a powerful ontology language.
- is based on RDF and RDF Schema.
- The intent of OWL is to provide additional
machine-processable semantics for resources.
- is a semantic markup language for publishing and
sharing ontologies on the Web.
- can be used to explicitly represent the meaning
of terms in vocabularies and the relationships
between those terms → i.e. ontologies
37OWL
- OWL Lite supports those users needing a
classification hierarchy and simple constraints.
- For example, it only permits cardinality values
of 0 or 1. → simple to provide tool support for
OWL Lite, and → quick migration path for thesauri
and other taxonomies.
- OWL DL (description logics) supports maximum
expressiveness while retaining computational
completeness (all conclusions are guaranteed to
be computable) and decidability (all computations
will finish in finite time). OWL DL includes all
OWL language constructs, but with certain
restrictions.
- For example, while a class may be a subclass of
many classes, a class cannot be an instance of
another class.
- OWL Full gives maximum expressiveness and the
syntactic freedom of RDF with no computational
guarantees.
- For example, in OWL Full a class can be treated
simultaneously as a collection of individuals and
as an individual in its own right.
38The Dublin Core
- The Dublin Core is a set of "elements"
(properties) for describing documents (and hence,
for recording metadata).
- The set was originally developed in 1995 at the
Metadata Workshop in Dublin, Ohio.
- The goal is to provide a minimal set of
descriptive elements that facilitate the
description and the automated indexing of
documents, similar to a library card catalog.
- It is meant to be sufficiently simple to be
understood and used by a wide range of authors.
39The Dublin Core
- Title: A name given to the resource.
- Creator: An entity primarily responsible for
making the content of the resource.
- Subject: The topic of the content of the
resource.
- Description: An account of the content of the
resource.
- Publisher: An entity responsible for making the
resource available.
- Contributor: An entity responsible for making
contributions to the content of the resource.
40The Dublin Core
- Date: A date associated with an event in the life
cycle of the resource.
- Type: The nature or genre of the content of the
resource.
- Format: The physical or digital manifestation of
the resource.
- Identifier: An unambiguous reference to the
resource within a given context.
- Source: A reference to a resource from which the
present resource is derived.
- Language: A language of the intellectual content
of the resource.
- Relation: A reference to a related resource.
- Coverage: The extent or scope of the content of
the resource.
- Rights: Information about rights held in and over
the resource.
41The Dublin Core
- Information using the Dublin Core elements may be
represented in any suitable language (e.g., in
HTML meta elements).
- However, RDF is an ideal representation for
Dublin Core information.
42XML
- URI (Uniform Resource Identifier): provides names
- RDF (Resource Description Framework): simple statements
- Dublin Core: a set of core elements
- OWL (Web Ontology Language): complex logical statements
43Summary
- There are many uses for Natural Language
Processing with respect to ontologies.
- Ontologies play an important role in the vision
of the Semantic Web.