Title: Computational Lexicons and the Semantic Web
1Computational Lexicons and the Semantic Web
- Alessandro Lenci
- Università di Pisa, Department of Linguistics
- Istituto di Linguistica Computazionale - CNR
2Tutorial Outline
- Computational lexicons for the Semantic Web (SW)
- how they are
- how they should be
- The SW for computational lexicons
- lexicon design in the age of the SW
- Training session
- case study: lexical modelling in RDF/S
3The Semantic Web Vision
- Turning the WWW into a machine-understandable knowledge base
Intelligent Agents
Documents
Semantic Web
Applications
Databases
4Six Challenges for the SW (Benjamins et al. 2002)
- Content availability
- Ontology availability
- Multilinguality
- Scalability
- Visualization
- Stability of SW languages
5Six Challenges for the SW (Benjamins et al. 2002)
Human Language Technology (HLT)
6Lexical Information and HLT
- All language analysis involves determining meaning at some level
- Anything from groups of related words to a full-blown representation of each sentence
Information retrieval: bank account money -> Topic: financial
John went to the store -> GO (AGENT: John, TARGET: store)
7Computational Lexicons and HLT
Computational lexicons provide machine-understandable word knowledge
- Explicit representation of word meaning
- word content accessible to computational agents
- Word meaning linked to word syntax and morphology
- Multilingual lexical links
8Computational Lexicons and HLT
- Contain the linguistic information required to build meaning representations
Lexicon
went: v. past, GO
go: v. (NP_SUBJ ((role AGENT) (sem animate)) (VP ((verb GO) (PP ((prep TO) (NP ((role TARGET) (sem loc)))))
John: n., sem human
store: n., sem loc
Lexicon
account: n., domain financial
account: v.
bank_1: n., domain financial
bank_2: n., domain geography
money: n., domain financial
bank account money -> Topic: financial
John went to the store -> GO (AGENT: John, TARGET: store)
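A minimal sketch of how the domain lexicon on this slide could yield the "Topic: financial" analysis. The dictionary layout and the function name `guess_topic` are assumptions made for illustration, not part of any actual lexicon format.

```python
from collections import Counter

# Toy domain lexicon mirroring the slide; the dict layout is an assumption.
DOMAIN_LEXICON = {
    "account": ["financial"],
    "bank": ["financial", "geography"],  # bank_1 and bank_2
    "money": ["financial"],
}

def guess_topic(words):
    """Return the domain supported by most words, or None if no word is known."""
    votes = Counter(d for w in words for d in DOMAIN_LEXICON.get(w, []))
    return votes.most_common(1)[0][0] if votes else None

print(guess_topic(["bank", "account", "money"]))
```

The ambiguous word "bank" votes for both of its domains; the unambiguous neighbours tip the balance towards the financial reading.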
9Computational Lexicons and HLT
- Critical language resources for NLP systems
- syntactic subcategorization frames for parsing
- semantic selectional preferences for ambiguity reduction
- semantic classes for WSD, semantic tagging, etc.
- Key components of HLT
- monolingual lexicons: IE, QA, etc.
- multilingual lexicons: MT, CLIR, etc.
10Ontologies and Computational Lexicons
HLT
Access to Content
Semantic Web
Ontologies
Computational Lexicons
?
11Ontologies
- An ontology is a system of concepts relevant for knowledge and action in (a portion of) the world
- categorization of objects and processes
- inference
- action planning
An ontology is a specification of a conceptualization (Gruber 1993)
12Ontologies
A set of knowledge terms, including the vocabulary, the semantic interconnections, and some simple rules of inference and logic (Hendler 2001)
ARTIFACT
OBJECT
ANIMAL
LOCATION
ENTITY
EVENT
13Types of Ontologies
Vertical typology
Foundational Ontology
OBJECT
Domain Core Ontology
SOFTWARE
Domain Specific Ontology
WORD_PROCESSOR
- Horizontal typology
- Information System ontology
- AI ontology
- Linguistic ontology
14Linguistic Ontology
- A system of symbols representing the concepts (meanings) encoded by NL expressions (lexical units, terms, etc.)
- specifies semantic classes grouping semantically similar terms
- semantic representation language
- interlingua
car, van, truck
ARTIFACT
VEHICLE
OBJECT
dog, cat, horse
ANIMAL
MAMMAL
beach
LOCATION
ENTITY
BEACH
spiaggia
piano concert, rock concert
EVENT
CONCERT
15Ontologies and Computational Lexicons
Ontology
Concept Space
Semantics
Syntax
Multilinguality
Morphology
Language/s
Computational Lexicon
16Computational Lexicons: typology
- Monolingual vs. multilingual
- General purpose vs. domain (application) specific
- Content type
- (Morpho)-Syntactic
- Semantic
- Mixed
- Terminological
17Syntactic Computational Lexicons
- Syntactic lexical information is distilled in subcategorization frames
- ComLex, PAROLE, etc.
- Syntactic frames typically include
- number of selected arguments
- syntactic categories of their realizations (PP, NP, etc.)
- lexical constraints on argument realization (e.g. preposition heading a PP)
- argument functional role (Subj, Obj, etc.)
- optionality, control, auxiliary selection, etc.
hit V (Subj NP) (Objd NP)
answer N (Obji PP_to)
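One possible way to hold such frames in a program, shown here only as a sketch: the class names `Slot` and `Frame` and their fields are hypothetical and do not reproduce the ComLex or PAROLE formats.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class Slot:
    function: str                      # functional role, e.g. "Subj", "Objd", "Obji"
    category: str                      # syntactic category, e.g. "NP", "PP"
    preposition: Optional[str] = None  # lexical constraint on a PP head
    optional: bool = False

@dataclass(frozen=True)
class Frame:
    lemma: str
    pos: str
    slots: Tuple[Slot, ...]

    def arity(self):
        """Number of selected arguments."""
        return len(self.slots)

# The two example entries from the slide.
HIT = Frame("hit", "V", (Slot("Subj", "NP"), Slot("Objd", "NP")))
ANSWER = Frame("answer", "N", (Slot("Obji", "PP", preposition="to"),))

print(HIT.arity(), ANSWER.slots[0].preposition)
```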
18Semantic Computational Lexicons
- Representing the meaning of a word (minimally) requires
- Distinguishing different senses of the word
- E.g. bank: financial institution vs. geographical configuration
- Capturing inferences
- E.g. being human implies being animate
- Representing similarity of meaning with other words
- E.g. bank, account, money: all related to finances
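The inference "being human implies being animate" can be sketched as a walk up a type hierarchy. The mini-taxonomy below, including the top node "entity", is an assumption made for the example.

```python
# Hypothetical mini-taxonomy: each semantic type points to its parent type.
IS_A = {"human": "animate", "animate": "entity"}

def implies(sem_type, target):
    """True if sem_type equals target or inherits from it through IS_A."""
    while sem_type is not None:
        if sem_type == target:
            return True
        sem_type = IS_A.get(sem_type)
    return False

print(implies("human", "animate"))
print(implies("animate", "human"))
```

The relation is directional: a human is animate, but an animate entity need not be human.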
19Semantic Computational Lexicons
- Mikrokosmos (Nirenburg, Mahesh et al.)
- WordNet (Miller, Fellbaum et al.)
- EuroWordNet (Vossen et al.)
- SIMPLE (Calzolari, Lenci et al.)
- FrameNet (Fillmore et al.)
20Computational Lexicons: design issues
- Network based
- hierarchy (taxonomy)
- WordNet
- heterarchy
- EuroWordNet
- Frame based
- Mikrokosmos
- FrameNet
- Hybrid
- SIMPLE
21EuroWordNet
22EuroWordNet: Top Ontology
23EuroWordNet
24PAROLE-SIMPLE Lexicons
- 12 EU monolingual core lexicons built according to a harmonized model and further extended at the national level
- Integrated combinations of syntactic and semantic information
- syntactic subcategorization frames
- semantic type (Ontology)
- semantic frames linked to syntax
- semantic roles
- selectional preferences
- etc.
- semantic relations
- Pustejovsky's qualia roles, etc.
- regular polysemy
- event structure
25SIMPLE Architecture
Italian lexicon
PAROLE Syntax
SemU
Semantic Frame (semantic roles, etc.)
Semantic Relations
Event Structure
Polysemy
etc.
26SIMPLE: semantic relations
Top
Telic
Formal
Constitutive
Agentive
Is_a
Is_a_part_of
Property
Created_by
Agentive_cause
Indirect_telic
Activity
Contains
Instrumental
Is_the_habit_of
...
...
Used_for
Used_as
27SIMPLE: semantic network
<fabbricare> make
Ala (wing)
Agentive
SemU 3232 (Type: Part): Part of an airplane
Agentive
<volare> fly
Used_for
Is_a_part_of
<aeroplano> airplane
Isa
SemU 3268 (Type: Part): Part of a building
<parte> part
Isa
Used_for
Isa
SemU D358 (Type: Body_part): Organ of birds for flying
<edificio> building
Is_a_part_of
Is_a_part_of
SemU 3467 (Type: Role): Role in football
<uccello> bird
<giocatore> player
Isa
28SIMPLE: semantic frames
PRED: employ1, Arg1 <AGENT - HUMAN>, Arg2 <PATIENT - HUMAN>
agent nominalization
master link
patient nominalization
event nominalization
SemU employee
SemU employment
SemU to employ
SemU employer
29SIMPLE: semantic frames
Comprendere V
SemU 61725 (Type: Cognitive_event): To understand
SemU 6962 (Type: Constitutive_state): To include
PRED: Comprendere1 <Arg1: human>, <Arg2: semiotic>
PRED: Comprendere2 <Arg1: Entity>, <Arg2: Entity>
30SIMPLE: semantic frames
il difensore di Berlusconi (Berlusconi's defender)
il difensore del Milan (the Milan fullback)
Difensore N
agent nominalization
SemU 4125 (Type: Role): Defender
PRED: Difendere1 <Arg1>, <Arg2>
SemU 3526 (Type: Role): Fullback
<squadra> team
Is_a_member_of
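A rough sketch of how the two SemUs of "difensore" could be told apart from the modifier. The SemU numbers and glosses come from the slide; the modifier typing and the `selects` field are assumptions introduced for this example and are not part of the SIMPLE model.

```python
# Semantic type of each modifier: an assumption made for this sketch.
MODIFIER_TYPE = {"Berlusconi": "human", "Milan": "team"}

# The two SemUs of "difensore"; the "selects" field is hypothetical.
DIFENSORE = [
    {"semu": 4125, "gloss": "Defender", "selects": "human",
     "link": "argument of PRED Difendere1"},
    {"semu": 3526, "gloss": "Fullback", "selects": "team",
     "link": "Is_a_member_of"},
]

def interpret(modifier):
    """Pick the SemU whose expected modifier type matches; None if unknown."""
    mod_type = MODIFIER_TYPE.get(modifier)
    for sense in DIFENSORE:
        if sense["selects"] == mod_type:
            return sense["gloss"], sense["link"]
    return None

print(interpret("Berlusconi"))
print(interpret("Milan"))
```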
31Semantic multidimensionality
- Identifying the semantic contribution of an NP requires access to a rich representation of the semantic content of its nominal head
- The semantic structure of the nominal head determines the semantic relation expressed by a modifying PP (in Italian)
- la pagina del libro (the page of the book): PART-OF
- il difensore del Milan (the Milan fullback): MEMBER-OF
- il suonatore di liuto (the lute player): TELIC
- il tavolo di legno (the wooden table): MADE-OF
32SIMPLE: sample entries
semantic relations
ontology
semantic frame
33Computational Lexicons: loose ends
- Non-compositional aspects in the lexicon
- collocations, terms, MWEs, etc.
- Integration between lexicons and corpus data
- lexical tuning, data-driven lexicon population, etc.
- Semantic dynamics (polysemy, lexical creativity, etc.)
- context-sensitivity of meaning as a challenge for lexical semantics
- sense enumeration vs. sense generation
- heavy smoker, heavy book, heavy road, heavy sea, heavy wine, heavy sky, heavy artillery, etc.
34Computational Lexicons: loose ends
- Semantic type system for lexical senses must account for a non-static kaleidoscope of senses
- Salience of aspects of meaning differs for different types
- natural kinds -> Is-a; artifacts -> function
- Possible solutions
- multiple layers of representation
- explicit identification of information so that NLP systems can access what is needed at a given time
- dynamic type systems
35Computational Lexicons: new challenges from the SW
- From language resources for HLT to knowledge resources for inferential engines
- in-depth lexical description for better content understanding
- Content interoperability between computational lexicons
- better integration between lexical information from different sources
- Beyond the lexical information bottleneck
- automatic lexical knowledge acquisition
36Lexical Inferences
- Midfielder Scott Sellars was sold to Blackburn for 35,000 and was bought back in the summer for 750,000. (FrameNet Corpus)
after e1: OWN(buyer, goods), NOT(OWN(buyer, money))
after e2: NOT(OWN(seller, goods)), OWN(seller, money)
e1 < e2, TIME(e2) = SUMMER
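The ownership consequences of a commercial transaction can be sketched as an update on a set of OWN facts. The function name `after_transfer`, the placeholder "selling_club" and the fact representation are assumptions for illustration; FrameNet itself does not provide such a function.

```python
def after_transfer(owns, buyer, seller, goods, money):
    """Return the OWN facts that hold after the seller hands goods to the buyer for money."""
    new = set(owns)
    new.discard((seller, goods))   # NOT(OWN(seller, goods))
    new.add((buyer, goods))        # OWN(buyer, goods)
    new.discard((buyer, money))    # NOT(OWN(buyer, money))
    new.add((seller, money))       # OWN(seller, money)
    return new

# e1: Sellars is sold to Blackburn; the selling club is not named on the slide.
before = {("selling_club", "Sellars"), ("Blackburn", "fee_e1")}
after_e1 = after_transfer(before, buyer="Blackburn", seller="selling_club",
                          goods="Sellars", money="fee_e1")
print(sorted(after_e1))
```

Applying the same function again with the roles swapped would model e2, the buy-back in the summer.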
37Hot Topics
To provide SW agents with high inferential
capacities in accessing linguistic content
- In-depth lexical analysis
- e.g. X buys Y from Z at t => Z owns Y before t, X owns Y after t
- Key issues at the lexicon-grammar interface
- predicate event structure
- states, processes, accomplishments, etc.
- temporal adverbs and temporal expressions
- e.g. in three years, etc.
- quantificational expressions etc.
- syntax-semantics argument linking
38Computational Lexicons and the Semantic Web
- Part 2
- Lexicon Design in the Age of the Semantic Web
39Lexicons of the Future
- General purpose
- portable over different domains
- Multilingual
- relations among lexical entities in different languages
- Flexible and extensible
- enable use of information at appropriate granularity for the application
- enable continual extension (dynamic)
- Integrated with Web technology
- content interoperability
40Lexical Content Interoperability
- The Lexical Web
- Enable universal access to lexical information
FrameNet
SIMPLE
WordNet
EuroWordNet
Intelligent Agents
41Some Requirements for Lexical Content Interoperability
- Compatibility between different models of lexical analysis
- relational semantic models (e.g. WordNet)
- syntactic and semantic frames
- Compatibility between different degrees of lexical specification
- deep lexical representations (e.g. PAROLE-SIMPLE)
- shallow semantic descriptions
- Compatibility between different paradigms of multilinguality
- lexicons for transfer-based MT
- interlingua-based lexicons
42The Need for Standards
- To represent common information
- while keeping flexibility
- To enhance the sharing and reusability of multilingual lexical resources
- To establish an open environment for the development and integration of multilingual resources
- Information must be consistent with related technologies in order to take advantage of them
- XML, RDF/S, etc.
43International Standards for Language Engineering
- Definition of standards for multilingual
computational lexicons both at the content and at
the representational level
44ISLE
EAGLES guidelines for syntactic and semantic
lexicons
GENELEX Model
MILE Lexical Model
45The MILE Lexical Model
- A general architecture to foster content interoperability between multilingual computational lexicons
- Key issues
- Modularity
- User-adaptability
- Resource sharing
- Reusability
SW technologies and standards applied to lexicon modelling
46The MILE Lexical Model (MLM)
- The MLM core is the Multilingual ISLE Lexical Entry (MILE)
- a general schema for multilingual lexical resources
- a lexical meta-entry as a common representational layer for multilingual lexicons
- Computational lexicons can be viewed as different instances of the MILE schema
MILE Lexical Model
lexicon1
lexicon3
lexicon2
47MILE: the building-block model
- The MILE architecture is designed according to the building-block model
- Lexical entries are obtained by combining various types of lexical objects (atomic and complex)
- Users design their lexicon by
- selecting and/or specifying the relevant lexical objects
- combining the lexical objects into lexical entries
- Lexical objects may be shared
- within the same lexicon (intra-lexicon reusability)
- among different lexicons (inter-lexicon reusability)
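The building-block idea can be illustrated with a small registry in which lexical objects are created once and then referenced by several entries. All names and identifiers here (`REGISTRY`, `lexical_object`, `entry`, "slot:subj_NP") are made up for the sketch and are not taken from the MILE specification.

```python
# Registry of shared lexical objects keyed by identifier (identifiers are made up).
REGISTRY = {}

def lexical_object(obj_id, **attrs):
    """Create the object once; later calls with the same id return the same object."""
    return REGISTRY.setdefault(obj_id, {"id": obj_id, **attrs})

def entry(lemma, *objects):
    """A lexical entry is a lemma plus references to lexical objects."""
    return {"lemma": lemma, "objects": list(objects)}

subj_np = lexical_object("slot:subj_NP", function="Subj", category="NP")
obj_np = lexical_object("slot:obj_NP", function="Obj", category="NP")

hit = entry("hit", subj_np, obj_np)
see = entry("see", lexical_object("slot:subj_NP"), obj_np)

# Both entries point at the very same slot object (intra-lexicon reuse).
print(hit["objects"][0] is see["objects"][0])
```

Because entries hold references rather than copies, a change to a shared object is visible from every entry that uses it, which is the redundancy-avoiding effect the slide describes.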
48MILE: the building-block model
49Modularity in MILE
multi-MILE
multilingual correspondence conditions
multiple levels of modularity
50The Mono-MILE
- Each monolingual layer within Mono-MILE identifies a basic unit of lexical description
- semantic layer: SemU, basic unit to describe the semantic properties of the MU
- syntactic layer: SynU, basic unit to describe the syntactic behavior of the MU
- morphological layer: MU, basic unit to describe the inflectional and derivational morphological properties of the word
51The Mono-MILE
MU
52Syntax-Semantics Linking
CorrespSynUSemU
53Syntax-Semantics Linking
John gave the book to Mary John gave Mary the book
SynU1
SemU1
obj_NP
obl_PP_to
subj_NP
Semantic_FrameGIVE
Arg2 Theme
Arg3 Goal
Arg1 Agent
SynU2
obj_NP
obj_NP
subj_NP
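A sketch of the syntax-semantics correspondences for the two realizations of "give". The pairing of slots with arguments is one reading of the diagram (first object of SynU2 taken as the Goal), and the table names `FRAME_GIVE` and `CORRESP` are hypothetical.

```python
# Semantic frame GIVE as listed on the slide.
FRAME_GIVE = {"Arg1": "Agent", "Arg2": "Theme", "Arg3": "Goal"}

# Slot-to-argument correspondences per SynU (an interpretation of the diagram).
CORRESP = {
    "SynU1": [("subj_NP", "Arg1"), ("obj_NP", "Arg2"), ("obl_PP_to", "Arg3")],
    "SynU2": [("subj_NP", "Arg1"), ("obj_NP", "Arg3"), ("obj_NP", "Arg2")],
}

def link(synu, fillers):
    """Map the fillers of the syntactic slots, in order, onto semantic roles."""
    return {FRAME_GIVE[arg]: filler
            for (_slot, arg), filler in zip(CORRESP[synu], fillers)}

a = link("SynU1", ["John", "the book", "Mary"])   # John gave the book to Mary
b = link("SynU2", ["John", "Mary", "the book"])   # John gave Mary the book
print(a == b)
```

Both syntactic units resolve to the same role assignment, which is the point of linking several SynUs to one SemU.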
54The Multi-MILE
- Open to various approaches to multilinguality
- transfer-based
- monolingual descriptions are used to state complex correspondences (tests and actions) between source and target entries
- interlingua-based
- monolingual entries linked to language-independent lexical objects (e.g. semantic frames, primitive predicates, etc.)
55Multi-MILE
IT_SemU_2 -> EN_SemU_1
IT_SynU_2 -> EN_SynU_1
IT_Slot_0 -> EN_Slot_1
IT_Slot_1 -> EN_Slot_0
AddFeature to source SemU: HUMAN
AddSlot to target SynU: MODIF PP_with
56Multi-MILE
IT Lexicon
EN Lexicon
multilingual conditions
finger
modif(mano)
dito
modif(piede)
toe
multilingual conditions
entrare to enter
run PP_into
PP_di_corsa
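The dito / finger / toe case can be read as a test on the source modifier that selects the target entry. The table below is a toy rendering of that idea; the name `CONDITIONS` and the tuple layout are assumptions, not the Multi-MILE notation.

```python
# (Italian head, Italian modifier) -> English lemma, following the slide.
CONDITIONS = {
    ("dito", "mano"): "finger",
    ("dito", "piede"): "toe",
}

def translate(head, modifier=None):
    """Return the English lemma licensed by the test on the modifier, else None."""
    return CONDITIONS.get((head, modifier))

print(translate("dito", "mano"))
print(translate("dito", "piede"))
```

Without a modifier the sketch returns None, reflecting that "dito" alone does not decide between the two English entries.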
57Defining the MLM
- The MLM is designed as an E-R model (MILE Entry Schema)
- defines the lexical objects and the ways they can be combined into a lexical entry
- The MLM includes two types of lexical objects
- MILE Lexical Classes (MLC)
- MILE Lexical Data Categories (MDC)
58MILE Lexical Classes
- Represent the main building blocks of lexical entries
- Define an ontology of lexical objects
- represent lexical notions such as semantic unit, syntactic feature, syntactic frame, semantic predicate, semantic relation, synset, etc.
- Similar to class definitions in OO languages
- specify the relevant attributes
- define the relations with other classes
- hierarchically structured
59MILE Lexical Classes: an ontology of lexical objects
60MILE Lexical Data Categories
- MDC are instances of the MILE Lexical Classes
- Each MDC represents a resource
- uniquely identified by a URI
- Two types of MDC
- Core MDC
- belong to shared repositories (Lexical Data Category Registry)
- lexical objects and linguistic notions with wide consensus
- User Defined MDC
- user-specific or language-specific lexical objects
61MILE Lexical Data Categories
MLMFeature
MLMGrammaticalFunction
62Defining the MLM
MILE Entry Schema
MILE Lexical Classes
RDF/S Descriptions
63RDF Instantiation of the MLM
Lexicon2
Resources
Lexicon1
Lexicon3
Metadata
Lexical Objects
Resources
Lexical Classes
Lexical Data Categories
64General Means
- W3C standards
- Resource Description Framework (RDF/S)
- Web Ontology Language (OWL)
- Built on the XML web infrastructure to enable the creation of a Semantic Web
- web objects are classified according to their properties
- semantics of relations (links) to other web objects precisely defined
65MILE Lexical Model
- Ideal structure for rendering in RDF
- hierarchy of lexical objects built up by combining atomic data categories via clearly defined relations
- Proof of concept
- Create an RDF schema for the MILE Lexical Model
- version 1.2
- Instantiate MILE Lexical Data Categories
66The RDF Schema
- Defines classes of objects (MLC) and their relations to other objects
- Like a class definition in Java, etc.
- Classes and properties in the schema correspond to the E-R model
- Can specify sub-classes/sub-properties and inheritance
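What sub-property inheritance buys can be shown with a few lines of plain Python that mimic the RDFS entailment "if p is a sub-property of q, then (s, p, o) implies (s, q, o)". The property chain synCat, synFeature, feature follows the deck's schema; the triple representation and the function names are assumptions, and no RDF library is used.

```python
# Sub-property declarations, following the schema: synCat < synFeature < feature.
SUB_PROPERTY_OF = {"synCat": "synFeature", "synFeature": "feature"}

def super_properties(prop):
    """All properties that prop inherits from, nearest first."""
    chain = []
    while prop in SUB_PROPERTY_OF:
        prop = SUB_PROPERTY_OF[prop]
        chain.append(prop)
    return chain

def entail(triples):
    """Add the triples that follow from the sub-property declarations."""
    inferred = set(triples)
    for s, p, o in triples:
        for sup in super_properties(p):
            inferred.add((s, sup, o))
    return inferred

facts = {("dog", "synCat", "Noun")}
print(sorted(entail(facts)))
```

An agent that only knows the generic property "feature" can therefore still retrieve the more specific syntactic category statement.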
67MILE Lexical Data Category Registry (MDC)
- Instantiation of pre-defined lexical objects
- Extension of the shared class schema with lexicon-specific sub-classes and sub-properties
- Can be used off the shelf or as a departure point for the definition of new or modified categories
- Enables modular specification of lexical entities
- eliminate redundancy
- identify lexical entries or sub-entries with shared properties
68MLC in RDF/S: features
features are properties of lexical objects
mlm:LexObject
mlm:Values
mlm:feature
rdfs:subPropertyOf
rdfs:subClassOf
mlm:semFeature
rdfs:subClassOf
mlm:SemValues
mlm:synFeature
mlm:SynValues
69MLC in RDF/S: syntactic features
<rdf:Property rdf:ID="synCat">
  <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#synFeature"/>
  <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#SynCatValues"/>
</rdf:Property>
<rdfs:Class rdf:ID="SynCatValues">
  <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#SynValues"/>
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="Noun"/>
    <owl:Thing rdf:about="Verb"/>
    <owl:Thing rdf:about="Adjective"/>
    ...
  </owl:oneOf>
</rdfs:Class>
</rdf:RDF>
feature values
70MLC in RDF/S: semantic features
<rdf:Property rdf:ID="domain">
  <rdfs:subPropertyOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#semFeature"/>
  <rdfs:range rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#DomainValues"/>
</rdf:Property>
<rdfs:Class rdf:ID="DomainValues">
  <rdfs:subClassOf rdf:resource="http://webilc.ilc.cnr.it/lenci/isle/mile-schema-v.1#SemValues"/>
  <owl:oneOf rdf:parseType="Collection">
    <owl:Thing rdf:about="Finance"/>
    <owl:Thing rdf:about="Medicine"/>
    <owl:Thing rdf:about="Sport"/>
    ...
  </owl:oneOf>
</rdfs:Class>
</rdf:RDF>
domain ontology
71Synsets in RDF/S
mlm:word
mlm:Synset
rdfs:Literal
mlm:gloss
rdfs:Literal
mlm:feature
mlm:synsetRelation
mlm:Values
mlm:Synset
cf. also http://www.semanticweb.org/library/wordnet/wordnet-20000620.rdfs
72Synsets in RDF/S
<rdfs:Class rdf:ID="Synset">
  <rdfs:label>Synset</rdfs:label>
  <rdfs:comment>This class formalizes the notion of synset as defined in WordNet (Fellbaum 1998).</rdfs:comment>
  <rdfs:subClassOf rdf:resource="LexObject"/>
</rdfs:Class>
<rdf:Property rdf:ID="synsetRelation">
  <rdfs:domain rdf:resource="Synset"/>
  <rdfs:range rdf:resource="Synset"/>
</rdf:Property>
<rdf:Property rdf:ID="hypernym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet hypernym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="synsetRelation"/>
</rdf:Property>
<rdf:Property rdf:ID="meronym" mlm:source="WordNet1.7">
  <rdfs:comment>The WordNet meronym relation</rdfs:comment>
  <rdfs:subPropertyOf rdf:resource="synsetRelation"/>
</rdf:Property>
relation between synsets
different types of synset relations
73WordNet 1.7 Synsets
<mlm:Synset rdf:about="http://www.cogsci.princeton.edu/wn1.7/concept#01752990" mlm:source="WordNet1.7">
  <mlm:gloss>A member of the genus Canis</mlm:gloss>
  <mlm:word>dog</mlm:word>
  <mlm:word>domestic dog</mlm:word>
  <mlm:word>Canis familiaris</mlm:word>
  <mdc:synCat rdf:resource="Noun"/>
  <mdc:domain rdf:resource="Zoology"/>
  <mdc:hypernym rdf:resource="http://www.cogsci.princeton.edu/wn1.7/concept#01752283"/>
</mlm:Synset>
features
hypernym
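A sketch of how a software agent might read such a synset description with Python's standard XML parser. The RDF namespace URI is the standard one; the `mlm` and `mdc` namespace URIs and the `urn:example` identifier are placeholders invented for this example, since the deck does not show the real declarations.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MLM = "http://example.org/mlm#"   # placeholder namespace (assumption)
MDC = "http://example.org/mdc#"   # placeholder namespace (assumption)

DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:mlm="{MLM}" xmlns:mdc="{MDC}">
  <mlm:Synset rdf:about="urn:example:synset-dog" mlm:source="WordNet1.7">
    <mlm:gloss>A member of the genus Canis</mlm:gloss>
    <mlm:word>dog</mlm:word>
    <mlm:word>domestic dog</mlm:word>
    <mlm:word>Canis familiaris</mlm:word>
    <mdc:synCat rdf:resource="Noun"/>
    <mdc:domain rdf:resource="Zoology"/>
  </mlm:Synset>
</rdf:RDF>"""

root = ET.fromstring(DOC)
synset = root.find(f"{{{MLM}}}Synset")

# ElementTree addresses namespaced names as "{namespace-uri}local-name".
words = [w.text for w in synset.findall(f"{{{MLM}}}word")]
syn_cat = synset.find(f"{{{MDC}}}synCat").get(f"{{{RDF}}}resource")
source = synset.get(f"{{{MLM}}}source")

print(words, syn_cat, source)
```

A dedicated RDF toolkit would resolve the resource references into a graph; the plain XML parser is used here only to keep the sketch dependency-free.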
74Conclusions and Future Work
- The MILE Lexical Model is oriented towards open, distributed lexical resources
- Lexical Information Servers for multiple access to lexical information repositories
- Enhance user-adaptivity and resource sharing
- Develop integration and interchange tools
- Promote interchange with the Semantic Web and Ontology communities
- Related projects and initiatives
- ISO, INTERA, ENABLER, etc.
75Acknowledgements
S. Atkins, N. Bel, F. Bertagna, P. Bouillon, N.
Calzolari, C. Fellbaum, R. Grishman, N. Ide, M.
Palmer, W. Peters, G. Thurmair, M. Villegas, P.
Wittenburg, A. Zampolli and many others
Thank You !