Title: LIRICS WP2
1LIRICS WP2 NLP Lexica
Monica Monachini monica.monachini_at_ilc.cnr.it
CNR-ILC - Pisa, 23rd May 2006
2Summary of the presentation
- Overview of WP2
- 1 year objectives
- Main results in T2.1 and T2.2
- Work done
- Synergies with other LIRICS WPs, ISO activities, meetings
- Priorities for future activities
3WP2 overall objective
- Define a family of standards for NLP lexicons
- Two-level standards
  - the high-level specifications provide structural elements, i.e. lexical classes and relations between them: the meta-model
  - the low-level specifications provide standardized constants, i.e. data categories used to adorn the lexical classes (-> ISO 12620)
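As a rough illustration of the two-level idea, the sketch below separates a structural class (high level) from a registry of data categories used to adorn it (low level). The class name `LexicalClass`, the `adorn` method, and the category names and values are assumptions made for this example; they are not taken from ISO 12620 or from the LIRICS deliverables.

```python
from dataclasses import dataclass, field

# Low level: a closed registry of data categories and their permitted values.
# Names and values are illustrative only, not quoted from ISO 12620.
DATA_CATEGORIES = {
    "partOfSpeech": {"noun", "verb", "adjective"},
    "grammaticalNumber": {"singular", "plural"},
}


@dataclass
class LexicalClass:
    """High level: a structural node of the meta-model (hypothetical name)."""
    name: str
    features: dict = field(default_factory=dict)

    def adorn(self, category: str, value: str) -> None:
        """Attach a data category, rejecting anything outside the registry."""
        allowed = DATA_CATEGORIES.get(category)
        if allowed is None:
            raise KeyError(f"unknown data category: {category}")
        if value not in allowed:
            raise ValueError(f"{value!r} is not a valid value for {category}")
        self.features[category] = value


entry = LexicalClass("LexicalEntry")
entry.adorn("partOfSpeech", "noun")
```

The structure stays fixed while the vocabulary of categories can be standardized and extended separately, which is the point of keeping the two levels apart.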
4WP 2 T2.1 overview and objectives
5WP 2 T2.1 results
- Proposal for a unified set of lexical information and unified descriptors as a draft set of Data Categories
- Maximum set of candidate lexical data categories, subdivided along the layers of linguistic description: morphosyntax, syntax and semantics.
- Data Categories shared between WP2 and WP3 that are relevant to morphosyntactic description have been incorporated in the Syntax Tool: the Morphosyntactic Profile.
6WP2 T2.1 Deliverables
[Gantt chart: project months M1-M30 over three years (1st year M1-M12, 2nd year M13-M24, 3rd year M25-M30); rows WP2, T2.1, T2.2, T2.3; an "I" marker at M24 in the T2.2 row and at M21 in the T2.3 row; bar shading not recoverable]
- D.2.1 Survey and evaluation of existing standards for Lexica
- D.2.1 Survey and evaluation of existing standards for Lexica (revision; version foreseen in conjunction with the Data Categories, to be issued together with the data model in T2.2)
7WP2 T2.2 overview and objectives
- Define a lexical framework: a general and abstract meta-model, i.e. a set of structural nodes relevant for lexical description, enabling specific implementations on the basis of common Data Categories
- Definition of the common set of related Data Categories
8WP2 T2.2 results
- Formulation of a high-level lexical meta-model, the Lexical Markup Framework (LMF), a flexible environment for user-defined mark-up languages
- Proof-of-concept mapping exercises of well-known NLP lexicon practices against the model
9WP2 T2.2 Deliverables
[Gantt chart: project months M1-M30 over three years (1st year M1-M12, 2nd year M13-M24, 3rd year M25-M30); rows WP2, T2.1, T2.2, T2.3; an "I" marker at M24 in the T2.2 row and at M21 in the T2.3 row; bar shading not recoverable]
- NLP Lexica standard for CD ballot (submitted at the beginning of 2006)
- NLP Lexica standard for ISO DIS ballot
- Internal milestone for internal quality control
10WP2 Activities, Meetings, Synergies...
- LIRICS WPs bi- and tri-lateral working meetings
  - CNR-ILC / MPI, 15.2.2005: PAROLE-SIMPLE lexical architecture and LEXUS tool
  - WP2 internal meeting, 16.2.2005: basic structure of the meta-model for lexicons (core model and extensions)
  - CNR-ILC / DFKI, 5.5.2005: convergences between morpho-syntactic and syntactic data issues for the submission of the NWI on Syntax (SynAF) to ISO
  - Pisa, 23-24.11.2005: WP2 internal meeting, basic structure of the meta-model for the representation of multiword expressions
- LIRICS meetings
  - Paris, 16-17.3.2005: progress of work within WP2; presentation of the standard core model for lexicons and the extensions for NLP lexicons
  - Barcelona, 21-22.6.2005: LIRICS Industrial Advisory Board Meeting
  - Barcelona, 22.6.2005: presentation of the first bulk of information relevant for lexical description
  - Nancy, 8-9.12.2005: WP4 TDG3 Workshop, connections between lexico-semantic representation and semantic roles in the lexicon
- ISO meetings
  - Berlin, 8-9.4.2005: ISO TC37/SC4 WG4 Meetings
  - Warsaw, 21-26.8.2005: plenary meeting of ISO TC37/SC4; task force for the purpose of designating generic data category sets for alignment with the level of the metamodel; task force related to the representation of MWEs
  - Rome, 27.10.2005: UNI-DIAM Commission, candidature of Italy as P-member in ISO TC37/SC4 (CNR-ILC reference expert)
11What is LMF for?
- provide a common model for the creation and use of lexical resources
- manage the exchange of data between and among these resources
- enable the merging of electronic resources to form extensive global resources
- Range of topics
  - monolingual, bilingual and multilingual lexical resources
- Scalability
  - the same specifications are to be used for both small and large lexicons
- Coverage
  - linguistic descriptions range from morphology, syntax and semantics to multilingual representation
  - languages are not restricted to European languages
  - the range of targeted NLP applications is not restricted
12Future activities/Priorities/Plans
- Data Categories
  - deliver rev. 2 of D2.1: candidate data categories will receive the necessary adjustments after discussion
  - extend the ISO Registry to cover further layers of linguistic description: do we need an ISO Syntactic Profile (Beijing)?
- LMF model
  - refine the NLP multilingual and MWE extensions
  - XML representation of LMF linguistic objects, in order to allow unified access to LMF-conformant lexicons through APIs
  - provide an implementation of test-suite lexical entries: PAROLE-SIMPLE lexicons ready to be described according to LMF (LEXUS), to be put on the LMF server and made accessible via the web
13Structure of LMF
Core package: structural skeleton, with the basic hierarchy of information in a lexical entry
Extensions: extend a subset of core-model classes; are conformant to the core model; cannot be used independently of the core model
LMF specifications comply with UML modeling principles
14Core package
Container for managing the top-level language components. The number of words or MWEs in the lexicon is equal to the number of lexical entries in a given lexicon.
It is a cross-reference pivot that can link to many Lexical Entries within or across Lexicons.
Form consists of a text string that represents a single word or a multi-word expression.
One or more Representation Frames can be associated with a Form; each contains a form and data categories that specify the orthographic type and name of the word.
Sense specifies or disambiguates the meaning and context of a form.
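The core hierarchy described on this slide can be sketched as a handful of plain Python dataclasses. This is only an assumption-laden reading of the slide text: the class names `Lexicon`, `LexicalEntry` and `RepresentationFrame`, and the `script` attribute, are chosen for the example and are not quoted from the LMF specification.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RepresentationFrame:
    written_form: str
    script: str = "Latn"  # hypothetical data category for the orthographic type


@dataclass
class Form:
    text: str  # a single word or a multi-word expression
    representations: List[RepresentationFrame] = field(default_factory=list)


@dataclass
class Sense:
    gloss: str


@dataclass
class LexicalEntry:
    form: Form
    senses: List[Sense] = field(default_factory=list)


@dataclass
class Lexicon:
    language: str
    entries: List[LexicalEntry] = field(default_factory=list)


lexicon = Lexicon("en", [
    LexicalEntry(Form("bank", [RepresentationFrame("bank")]),
                 [Sense("financial institution"), Sense("side of a river")]),
    LexicalEntry(Form("kick the bucket", [RepresentationFrame("kick the bucket")]),
                 [Sense("to die")]),
])

# One entry per word or MWE: two entries, three senses in total.
print(len(lexicon.entries), sum(len(e.senses) for e in lexicon.entries))
```

A word and a multi-word expression each get exactly one entry, so the entry count matches the number of words or MWEs in the lexicon, as the slide states.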
15Package for extensional morphology
1st strategy: describe the morphology by representing all inflections explicitly
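A minimal sketch of the extensional strategy: every inflected form is stored explicitly on the entry, so irregular forms need no special handling. The names `InflectedForm`, `LexicalEntry` and `lookup`, and the feature name `grammaticalNumber`, are assumptions made for the illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InflectedForm:
    written_form: str
    features: Dict[str, str]


@dataclass
class LexicalEntry:
    lemma: str
    inflected_forms: List[InflectedForm] = field(default_factory=list)

    def lookup(self, **features: str) -> List[str]:
        """Return the written forms whose features match all given values."""
        return [f.written_form for f in self.inflected_forms
                if all(f.features.get(k) == v for k, v in features.items())]


# Every form is listed in full, including the irregular plural.
child = LexicalEntry("child", [
    InflectedForm("child", {"grammaticalNumber": "singular"}),
    InflectedForm("children", {"grammaticalNumber": "plural"}),
])

print(child.lookup(grammaticalNumber="plural"))  # ['children']
```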
16Package for inflectional paradigm
2nd strategy: declare an inflectional paradigm; use the inflectional paradigm extension to define it
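A minimal sketch of the paradigm strategy: the entry only names a paradigm, and the forms are derived from it. The suffix-only paradigm below is a deliberate simplification, and the names `InflectionalParadigm`, `LexicalEntry` and `forms` are hypothetical rather than taken from the LMF extension.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class InflectionalParadigm:
    name: str
    suffixes: Dict[str, str]  # grammatical number -> suffix added to the lemma


@dataclass
class LexicalEntry:
    lemma: str
    paradigm: InflectionalParadigm

    def forms(self) -> Dict[str, str]:
        """Derive the inflected forms from the declared paradigm."""
        return {number: self.lemma + suffix
                for number, suffix in self.paradigm.suffixes.items()}


# One paradigm object is declared once and shared by many entries.
regular_noun = InflectionalParadigm("regular-noun-s", {"singular": "", "plural": "s"})
cat = LexicalEntry("cat", regular_noun)
dog = LexicalEntry("dog", regular_noun)

print(cat.forms())  # {'singular': 'cat', 'plural': 'cats'}
```

Compared with listing every form, this keeps the lexicon compact, at the cost of needing a paradigm definition for each inflection class.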
17Package for NLP syntax
Syntactic Behavior represents one of the behaviors of one (or more) senses.
Construction describes one syntactic construction and can be shared by all words with the same syntactic behavior.
Self refers to the head lexical entry and describes its syntactic properties.
Syntactic Argument describes a syntactic actant.
Construction Set groups various Constructions and factors out syntactic descriptions, so that the lexicon holds a minimum of Syntactic Behavior elements.
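The sharing of constructions described here can be sketched as follows. The attribute names (`function`, `category`, `sense_ids`) and the sample values are assumptions for the example; Self and Construction Set are left out to keep the sketch short.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SyntacticArgument:
    function: str  # e.g. "subject"
    category: str  # e.g. "NP"


@dataclass
class Construction:
    name: str
    arguments: List[SyntacticArgument]


@dataclass
class SyntacticBehavior:
    sense_ids: List[str]  # one (or more) senses showing this behavior
    construction: Construction


# The construction is defined once and shared by every verb that uses it.
transitive = Construction("transitive", [
    SyntacticArgument("subject", "NP"),
    SyntacticArgument("object", "NP"),
])
eat = SyntacticBehavior(["eat_1"], transitive)
read = SyntacticBehavior(["read_1", "read_2"], transitive)

print(eat.construction is read.construction)  # True
```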
18XML representation
19Package for NLP semantics
Predicative Representation describes the link between Sense and Semantic Predicate.
Semantic Predicate describes an abstract meaning.
Semantic Argument describes a semantic actant and is linked with its syntactic counterpart.
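One way to picture the link between semantic and syntactic arguments is the sketch below. The predicate name `INGEST`, the role labels and the attribute names are invented for the illustration and are not drawn from the LMF data categories.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SyntacticArgument:
    function: str


@dataclass
class SemanticArgument:
    role: str
    syntactic_counterpart: SyntacticArgument


@dataclass
class SemanticPredicate:
    name: str  # the abstract meaning
    arguments: List[SemanticArgument] = field(default_factory=list)


@dataclass
class PredicativeRepresentation:
    sense_id: str
    predicate: SemanticPredicate


subject = SyntacticArgument("subject")
direct_object = SyntacticArgument("object")
ingest = SemanticPredicate("INGEST", [
    SemanticArgument("agent", subject),
    SemanticArgument("patient", direct_object),
])
eat_link = PredicativeRepresentation("eat_1", ingest)

# Read off which syntactic function realizes each semantic role.
linking = {arg.role: arg.syntactic_counterpart.function
           for arg in eat_link.predicate.arguments}
print(linking)  # {'agent': 'subject', 'patient': 'object'}
```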
20Package for NLP semantics (cont.)
21XML representation
22Package for NLP semantics (cont.)
23Package for Multilingual representation
Sense Axis Relation describes the link between two different Sense Axis instances.
Source Test and Target Test make it possible to express conditions on the translation on the source/target language side.
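A small sketch of how Sense Axis and Sense Axis Relation might fit together, assuming an axis groups equivalent senses across languages and a relation points from a narrower axis to a more general one. The `translate` helper, the sense identifiers and the "more general" label are hypothetical, and the source/target tests are not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SenseAxis:
    axis_id: str
    senses: Dict[str, str] = field(default_factory=dict)  # language -> sense id


@dataclass
class SenseAxisRelation:
    source: SenseAxis
    target: SenseAxis
    label: str


def translate(axis: SenseAxis, target_lang: str,
              relations: List[SenseAxisRelation]) -> Optional[str]:
    """Find a sense in target_lang on this axis, else on a related axis."""
    if target_lang in axis.senses:
        return axis.senses[target_lang]
    for rel in relations:
        if rel.source is axis and target_lang in rel.target.senses:
            return rel.target.senses[target_lang]
    return None


# English "river" and German "Fluss" share an axis; German "Strom"
# (a large river) has no direct English equivalent in this toy lexicon.
axis_river = SenseAxis("A1", {"en": "river_1", "de": "Fluss_1"})
axis_strom = SenseAxis("A2", {"de": "Strom_1"})
relations = [SenseAxisRelation(axis_strom, axis_river, "more general")]

print(translate(axis_strom, "en", relations))  # river_1
```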
24Package for Multiword expressions