Title: Adriana Roventini*
1 Extending the Italian WordNet with the Specialized Language of the Maritime Domain
Adriana Roventini, Rita Marinelli
Istituto di Linguistica Computazionale del CNR, Pisa, Italy
e-mail: rita.marinelli_at_ilc.cnr.it - adriana.roventini_at_ilc.cnr.it
2 Our purpose
- To describe the construction, under way at the Institute for Computational Linguistics, of a terminological subset belonging to the maritime lexical domain (in particular, the technical and commercial/maritime transport domain).
3 WordNet
- In the Princeton WordNet (Miller et al., 1990), the meanings of words are represented in terms of their conceptual-semantic and lexical relations to other words.
- It has been the tool of choice for building Natural Language Processing (NLP) systems of various kinds.
4 EWN
- The main goals of EuroWordNet (EWN) are:
  - to develop a multilingual lexical resource, retaining the basic underlying design of WordNet 1.5 (hereafter WN1.5);
  - to improve it in order to meet the needs of research in the field of NLP (Vossen, 1999).
5 Background
- SI-TAL (Integrated System for the Automatic Treatment of Language), an Italian national project:
  - development of various integrated language resources and software tools for the automatic treatment of written and spoken Italian.
- ITALWORDNET (IWN), a lexical-semantic resource developed within the SI-TAL project, enlarging the first database built in EWN.
6 IWN
(diagram: IWN, originating from the EWN project and developed within SI-TAL, Integrated System for the Automatic Treatment of Language)
- IWN database containing ca. 50,000 synsets:
  - Nouns
  - Verbs
  - Adjectives
  - Adverbs
  - Proper Names
- IWN links synsets by lexical-semantic relations:
  - Synonymy and Hyponymy, the most important relations
  - many other semantic relations encoded for various subsets of Italian nouns (common and proper), verbs and adjectives
- IWN synsets are linked to WordNet 1.5 through a generic ILI (InterLingual Index)
Not encoded in EWN
7 The IWN linguistic model
- Synsets and the synonymy relation
  - The synset is the basic notion around which WN, EWN and IWN are built: a set of synonymous words belonging to the same Part-of-Speech (PoS) that can be interchanged in at least one context.
- Synsets are connected by semantic relations to other synsets and to the ILI (an unstructured version of WN1.5, containing all its synsets but not the relations among them).
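The synset notion above can be pictured as a small record: one PoS, a list of synonymous literals, typed links to other synsets and an optional link to the ILI. The sketch below is only an illustration under that assumption; the class name `Synset`, its fields and all identifiers are hypothetical and do not reproduce the actual IWN database schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Synset:
    """A set of synonymous literals that all share one part of speech."""
    synset_id: str
    pos: str                                   # one PoS for the whole set, e.g. "N" or "V"
    literals: List[Tuple[str, int]]            # (lemma, sense number) pairs
    relations: Dict[str, List[str]] = field(default_factory=dict)  # relation type -> target ids
    ili_id: Optional[str] = None               # equivalence link to an ILI record, if any

    def add_relation(self, rel_type: str, target: "Synset") -> None:
        """Record a language-internal relation pointing at another synset."""
        self.relations.setdefault(rel_type, []).append(target.synset_id)

    def lemmas(self) -> List[str]:
        return [lemma for lemma, _ in self.literals]


# Hypothetical identifiers, for illustration only.
nave = Synset("it-n-001", "N", [("nave", 1), ("bastimento", 1)], ili_id="ili-placeholder-001")
mezzo = Synset("it-n-002", "N", [("mezzo di trasporto", 1)])
nave.add_relation("has_hyperonym", mezzo)
```

Keeping a single `pos` field per synset, rather than one per literal, is one simple way to enforce the "same Part-of-Speech" condition by construction.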
8 Also inherited from EWN
- Language-internal relations link the language-specific synsets (mainly hyperonymy/hyponymy or IS-A, role, cause, purpose and part relations, etc.).
- Equivalence relations link the Italian synsets to the InterLingual Index (ILI).
- By linking our wordnet to the ILI, we ensured that IWN can be used for multilingual applications.
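Why the ILI link makes multilingual use possible can be shown with a toy lookup: two wordnets that map their synsets to shared ILI records can be crossed by going source lemma → ILI record → target synsets. This is a minimal sketch; the dictionary layout, the English wordnet and every identifier below are hypothetical placeholders, not data from IWN or EWN.

```python
from typing import Dict, List, Optional, Tuple

# Each wordnet maps a synset id to (literals, ILI id); all ids are hypothetical placeholders.
Wordnet = Dict[str, Tuple[List[str], Optional[str]]]

ITALIAN: Wordnet = {
    "it-1": (["nave", "bastimento"], "ili-1"),
    "it-2": (["porto"], "ili-2"),
    "it-3": (["molo"], None),            # not yet linked to the ILI
}
ENGLISH: Wordnet = {
    "en-1": (["ship"], "ili-1"),
    "en-2": (["harbour", "port"], "ili-2"),
}


def translate(lemma: str, source: Wordnet, target: Wordnet) -> List[str]:
    """Return target-language literals whose synsets share an ILI record with `lemma`."""
    ili_ids = {ili for literals, ili in source.values() if lemma in literals and ili}
    return sorted({lit for literals, ili in target.values() if ili in ili_ids for lit in literals})
```

For example, `translate("porto", ITALIAN, ENGLISH)` gives `["harbour", "port"]`, while a synset with no equivalence link (here `molo`) yields no translation.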
9 Reasons for our choice
- The globalisation of trade, business and travel, and technological development (the growing importance of transport).
- The changes in maritime activity and in the related terminology (the remarkable incidence of this lexical domain).
- New techniques of communication, translation and diffusion of terms (the monopoly of the English language).
10 Building/structuring the terminological IWN
- according to the design principles of the generic wordnet (applying the same semantic relations model);
- exploiting the possibility, available in IWN through the InterLingual Index (ILI), of linking the specialized terms to the corresponding closest concepts in English.
11 Sources
- Several information sources were used to select the base concepts (BCs):
  - the Dizionario Globale dei termini marinareschi, edited by the Capitaneria del Porto di Livorno, available online;
  - the Dizionario di marina, edited by G. Barberi Squarotti and I. Gallinaro (2002);
  - the Glossario dello spedizioniere (Annuario Federspedi, 1988);
  - the Dizionario di termini marittimi mercantili, compiled by P. R. Brodie and translated by E. Vincenzini, Lloyd's of London Press, Legal Publishing and Conferences Division, 1988.
12 Choice of the base concepts (BCs)
- Design of the top level of the terminological database, identifying the most relevant and representative domain concepts, or base concepts (BCs),
  - i.e. concepts showing a large number of hyponyms and/or used more frequently in this particular domain of maritime navigation and transport.
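The two selection criteria above (many hyponyms, high domain frequency) can be turned into a simple ranking. The sketch below assumes one possible ordering, hyponym count first and frequency as tie-breaker; the hyponym lists come from the polylexical examples in this presentation, while the frequency counts are invented for illustration.

```python
from typing import Dict, List


def count_descendants(concept: str, hyponyms: Dict[str, List[str]]) -> int:
    """Count all direct and indirect hyponyms below `concept`."""
    seen = set()
    stack = list(hyponyms.get(concept, []))
    while stack:
        child = stack.pop()
        if child not in seen:
            seen.add(child)
            stack.extend(hyponyms.get(child, []))
    return len(seen)


def rank_candidates(hyponyms: Dict[str, List[str]], frequency: Dict[str, int]) -> List[str]:
    """Order candidate BCs by hyponym count, then by domain frequency, then alphabetically."""
    candidates = set(hyponyms) | set(frequency)
    return sorted(
        candidates,
        key=lambda c: (-count_descendants(c, hyponyms), -frequency.get(c, 0), c),
    )


HYPONYMS = {
    "carico": ["carico completo", "carico di merci varie", "carico in coperta", "carico parziale"],
    "nolo": ["nolo anticipato", "nolo intero", "nolo secondo il valore", "nolo a destino"],
    "tariffa": ["tariffa doganale", "tariffa di trasporto", "tariffa forfettaria"],
}
FREQUENCY = {"carico": 120, "nolo": 80, "tariffa": 50, "fanale": 30}  # invented counts
```

With these toy inputs the ranking is carico, nolo, tariffa, fanale; a real selection would of course also rely on the lexicographers' judgement and on the sources listed above.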
13 First Base Concepts
- A first nucleus of over 200 BCs was identified,
such as nave (ship), porto (harbour), ormeggio
(mooring), albero (mast), carico (cargo),
spedizione (shipment), navigazione (navigation),
trasporto (transport), tariffa (tariff), nolo
(freight) and so on, which are sufficiently
general and constitute the root nodes of the
specialized database.
14 BCs export/import
- as XML files
- (see the example below concerning the verb imbarcare/to ship)
(diagram: IWN → XML → IWNTerm)
15 Example of an XML export file: imbarcare (to ship)
<WORD_MEANING ID="V32560" PART_OF_SPEECH="V">
  <GLOSS />
  <VARIANTS>
    <LITERAL LEMMA="imbarcare" SENSE="1" STATUS="CT" />
  </VARIANTS>
  <INTERNAL_LINKS>
    <RELATION TYPE="xpos_near_synonym" ID="2" INV_ID="2">
      <TARGET_WM ID="27869" PART_OF_SPEECH="N" LEMMA="imbarco" SENSE="1" GLOSS="" />
    </RELATION>
    <RELATION TYPE="has_hyperonym" ID="8" INV_ID="8">
      <!-- GLOSS: "to cause a change in a process or a state (followed by an infinitive)" -->
      <TARGET_WM ID="32127" PART_OF_SPEECH="V" LEMMA="fare" SENSE="14" GLOSS="causare un cambiamento in un processo o uno stato (seguito da un infinito)." />
    </RELATION>
    <RELATION TYPE="has_hyponym" ID="10" INV_ID="10">
      <TARGET_WM ID="36489" PART_OF_SPEECH="V" LEMMA="reimbarcare" SENSE="1" GLOSS="" />
    </RELATION>
    <RELATION TYPE="involved_instrument" ID="31" INV_ID="31">
      <TARGET_WM ID="15111" PART_OF_SPEECH="N" LEMMA="imbarcatoio" SENSE="1" GLOSS="" />
    </RELATION>
  </INTERNAL_LINKS>
</WORD_MEANING>
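A file like the slide 15 export can be read back with Python's standard `xml.etree.ElementTree` module. The sketch below embeds a trimmed copy of that export (two of its four relations) so it runs on its own; the function name `read_word_meaning` and the tuple-based output are our own illustrative choices, not part of the IWN tools.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the imbarcare export (two of the four relations kept).
SAMPLE = """
<WORD_MEANING ID="V32560" PART_OF_SPEECH="V">
  <GLOSS />
  <VARIANTS>
    <LITERAL LEMMA="imbarcare" SENSE="1" STATUS="CT" />
  </VARIANTS>
  <INTERNAL_LINKS>
    <RELATION TYPE="has_hyperonym" ID="8" INV_ID="8">
      <TARGET_WM ID="32127" PART_OF_SPEECH="V" LEMMA="fare" SENSE="14" GLOSS="" />
    </RELATION>
    <RELATION TYPE="involved_instrument" ID="31" INV_ID="31">
      <TARGET_WM ID="15111" PART_OF_SPEECH="N" LEMMA="imbarcatoio" SENSE="1" GLOSS="" />
    </RELATION>
  </INTERNAL_LINKS>
</WORD_MEANING>
"""


def read_word_meaning(xml_text):
    """Return (id, literals, relations) for a single WORD_MEANING element."""
    root = ET.fromstring(xml_text.strip())
    literals = [(lit.get("LEMMA"), int(lit.get("SENSE"))) for lit in root.iter("LITERAL")]
    relations = []
    for rel in root.iter("RELATION"):
        target = rel.find("TARGET_WM")  # each RELATION wraps exactly one target here
        relations.append((rel.get("TYPE"), target.get("LEMMA"), int(target.get("SENSE"))))
    return root.get("ID"), literals, relations
```

Reading the trimmed sample gives the id `V32560`, the literal `("imbarcare", 1)` and the two relations `has_hyperonym` → `fare` 14 and `involved_instrument` → `imbarcatoio` 1.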
16 New BCs
- Other BCs were included ex novo: they were not present with their maritime senses in the generic database, but they are very frequently used and representative of this specific domain, for instance nolo (freight), classe (class), fanale (light), punto (position), destino (destination), agente marittimo (shipping agent), spedizioniere (freight forwarder).
17 Example: Punto (Position)
18 Use of relations to codify specialized terms
- The first nucleus of terms was increased
  - by encoding hyponyms and using other semantic relations.
19 Example: Ormeggio (Mooring)
20 Kind and Number of Terms
- 2,227 lemmas, corresponding to 1,721 synsets and 2,355 word senses, belonging to the maritime (technical/nautical and maritime transport) domain, all linked to the generic wordnet.
- Terms of all the grammatical categories (nouns, verbs, adjectives, adverbs) and a small set of proper names have been codified in the terminological database (3,971 relations).
21 Example: Porto (Harbour)
22 Polylexical Units
- Base Concepts (BCs) act as the root of a terminological sub-hierarchy.
- In many cases, the hyponyms are formed by the BC plus an adjective or a prepositional phrase.
- For instance:
  - carico (cargo): carico completo (full cargo), carico di merci varie (general cargo), carico in coperta (deck cargo), carico parziale (part load cargo);
  - tariffa (tariff): tariffa doganale (customs tariff), tariffa di trasporto (transport tariff), tariffa forfettaria (flat-rate tariff);
  - nolo (freight): nolo anticipato (freight prepaid), nolo intero (full freight), nolo secondo il valore (ad valorem freight), nolo a destino (freight payable at destination).
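The BC + modifier pattern above suggests a simple way to attach multiword terms to their BC and to tell adjectival from prepositional modifiers. The sketch below is a naive heuristic under stated assumptions: it only checks that the term starts with the BC and looks up the next word in a small, hand-picked list of Italian prepositions (the list and the function names are ours); a real system would use a PoS tagger.

```python
from typing import Dict, List, Optional, Tuple

# Small, hand-picked list of Italian prepositions (an assumption; not exhaustive).
PREPOSITIONS = {"a", "al", "da", "di", "del", "della", "in", "con", "su", "per", "tra", "fra", "secondo"}


def classify_multiword(base: str, term: str) -> Optional[str]:
    """Return 'adjectival' or 'prepositional' if `term` is headed by `base`, else None."""
    words, base_words = term.split(), base.split()
    n = len(base_words)
    if len(words) <= n or words[:n] != base_words:
        return None
    return "prepositional" if words[n] in PREPOSITIONS else "adjectival"


def group_under_bases(bases: List[str], terms: List[str]) -> Dict[str, List[Tuple[str, str]]]:
    """Attach each multiword term to the BC(s) that head it, with its modifier type."""
    groups: Dict[str, List[Tuple[str, str]]] = {base: [] for base in bases}
    for term in terms:
        for base in bases:
            kind = classify_multiword(base, term)
            if kind is not None:
                groups[base].append((term, kind))
    return groups
```

For example, `carico completo` comes out as adjectival and `nolo secondo il valore` as prepositional. Since `secondo` can also be an adjective ("second"), such a heuristic is only a first pass to be checked by hand.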
23 Linking Terms to the ILI
- In maritime transport activity, the English term or multiword (or its acronym) is often much better known and more widely used than the Italian one.
- Difficulty in finding the synonyms.
- Both the English term (or multiword) and the Italian one are included in the synset as variants, as we thought this could be useful to non-professionals as well.
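Keeping English and Italian variants in the same synset means a user can start from either form. The sketch below illustrates that lookup with a case-insensitive index from every variant to its synsets; the language tags, synset ids and the grouping of variants are illustrative assumptions, not entries copied from the terminological database.

```python
from typing import Dict, List, Optional, Set, Tuple

# synset id -> list of (variant, language); ids and grouping are illustrative only.
SYNSETS: Dict[str, List[Tuple[str, str]]] = {
    "t-001": [("nave traghetto per automezzi", "it"), ("RO-RO", "en")],
    "t-002": [("nolo", "it"), ("freight", "en")],
}


def build_variant_index(synsets: Dict[str, List[Tuple[str, str]]]) -> Dict[str, Set[str]]:
    """Map every variant, case-folded, to the ids of the synsets containing it."""
    index: Dict[str, Set[str]] = {}
    for sid, variants in synsets.items():
        for lemma, _lang in variants:
            index.setdefault(lemma.casefold(), set()).add(sid)
    return index


def variants_of(query: str, synsets, index, lang: Optional[str] = None) -> List[str]:
    """Return the variants sharing a synset with `query`, optionally in one language only."""
    found: List[str] = []
    for sid in sorted(index.get(query.casefold(), set())):
        found.extend(lemma for lemma, g in synsets[sid] if lang is None or g == lang)
    return found


INDEX = build_variant_index(SYNSETS)
```

A query for `ro-ro` restricted to Italian returns `nave traghetto per automezzi`, and the reverse query restricted to English returns `RO-RO`.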
24 EXAMPLES
- RO-RO (Roll On/Roll Off) usually indicates nave traghetto per automezzi (ferry for vehicle transport);
- the abbreviation FOB (Free On Board) is used to say con le spese pagate fino a bordo (loading costs paid up to the ship's broadside);
- CIF (Cost, Insurance and Freight) is used to say costi fino a bordo più assicurazione e nolo mare pagati (loading costs, insurance and sea freight prepaid).
25 The Link Structure
- The BCs identified for this terminological lexicon constitute the top level and are the root nodes for the plug-in operation, which links the generic and the specialized wordnets.
26 Two types of plug-in relations are codified
- the eq-plug-in relation: an equivalence (synonymy) relation between synsets of the two databases;
- the has-hyperonym(hyponym)-plug relation: an equivalence hyperonymy/hyponymy relation between synsets of the two databases.
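How the two plug-in relations join the hierarchies can be sketched by climbing from a specialized term to the top: first inside the terminological wordnet, then across a plug-in link into the generic one. In this sketch an eq-plug-in target is treated as the same concept (so the climb continues above it), while a has-hyperonym-plug target is itself the next hyperonym. The toy databases, ids and links (including `g:prezzo` as hyperonym of nolo) are hypothetical, only the first hyperonym is followed, and the hyponym direction of the plug relation is left out.

```python
from typing import Dict, List, Optional, Tuple

# Toy databases: synset id -> list of hyperonym ids. All ids and links are hypothetical.
GENERIC: Dict[str, List[str]] = {
    "g:nave": ["g:mezzo_di_trasporto"],
    "g:mezzo_di_trasporto": [],
    "g:prezzo": [],
}
TERMINOLOGICAL: Dict[str, List[str]] = {
    "t:nave": [],                       # BC: root node of the specialized hierarchy
    "t:nave_traghetto": ["t:nave"],
    "t:nolo": [],                       # BC added ex novo with its maritime sense
}
# (terminological root, plug-in relation, generic synset)
PLUG_INS: List[Tuple[str, str, str]] = [
    ("t:nave", "eq_plug_in", "g:nave"),
    ("t:nolo", "has_hyperonym_plug", "g:prezzo"),
]


def plug_in_for(root: str) -> Optional[Tuple[str, str]]:
    """Return (relation, generic id) for a terminological root, if it is plugged in."""
    for source, relation, target in PLUG_INS:
        if source == root:
            return relation, target
    return None


def upward_chain(term_id: str) -> List[str]:
    """Climb hyperonyms in the terminological db, then continue in the generic db."""
    chain = [term_id]
    while TERMINOLOGICAL[chain[-1]]:            # follow the first hyperonym only
        chain.append(TERMINOLOGICAL[chain[-1]][0])
    link = plug_in_for(chain[-1])
    if link is None:
        return chain
    relation, generic_id = link
    if relation == "eq_plug_in":
        upward = GENERIC[generic_id]            # equivalent synset: continue above it
    elif relation == "has_hyperonym_plug":
        upward = [generic_id]                   # the generic synset is the hyperonym
    else:
        raise ValueError(f"unsupported plug-in relation: {relation}")
    while upward:
        chain.append(upward[0])
        upward = GENERIC[upward[0]]
    return chain
```

With these toy links, `upward_chain("t:nave_traghetto")` passes from `t:nave` straight to the generic `g:mezzo_di_trasporto`, while `t:nolo` reaches `g:prezzo` through the hyperonym plug.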
27 Tool Facilities
- simultaneous parallel consultation of the two databases, to facilitate the insertion of relations;
- integrated search across the two databases:
  - if the lemma is found in both databases and there is an eq-plug-in relation between the synsets, the synset belonging to the specific domain eclipses the generic one.
28 Tool Facilities
- Downward and horizontal relations (part-of, role, cause and derivation relations, etc.) are taken from the terminological wordnet.
- Upward (hyperonymy) relations are taken from the generic one.
- It is possible to access the generic database, the terminological database, or both databases at the same time.
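The integrated search rules on these two slides can be sketched as a merge: a specialized synset linked by eq-plug-in eclipses its generic counterpart, takes its upward relations from the generic wordnet and keeps its own downward and horizontal ones, while generic senses without such a link are still shown. The data, ids and the choice to treat only `has_hyperonym` as upward are assumptions for illustration; the second generic sense of bussola (an inner door) is included only to show a sense that is not eclipsed.

```python
from typing import Dict, List, Tuple

Synsets = Dict[str, Dict]   # id -> {"lemmas": [...], "relations": {type: [target ids]}}
UPWARD = {"has_hyperonym"}  # assumption: only hyperonymy counts as an upward relation

# Toy data; ids, senses and relations are illustrative, not taken from IWN.
GENERIC: Synsets = {
    "g:bussola": {"lemmas": ["bussola"], "relations": {"has_hyperonym": ["g:strumento"]}},
    "g:bussola_porta": {"lemmas": ["bussola"], "relations": {"has_hyperonym": ["g:porta"]}},
}
TERMINOLOGICAL: Synsets = {
    "t:bussola": {"lemmas": ["bussola"], "relations": {"has_mero_part": ["t:ago_magnetico"]}},
}
EQ_PLUG_IN = {"t:bussola": "g:bussola"}  # terminological id -> equivalent generic id


def split_relations(relations: Dict[str, List[str]]) -> Tuple[Dict, Dict]:
    """Separate upward relations from downward/horizontal ones."""
    upward = {k: v for k, v in relations.items() if k in UPWARD}
    other = {k: v for k, v in relations.items() if k not in UPWARD}
    return upward, other


def integrated_lookup(lemma: str) -> List[Dict]:
    """Merged view of every sense of `lemma` across the two databases."""
    results, eclipsed = [], set()
    for tid, syn in TERMINOLOGICAL.items():
        if lemma not in syn["lemmas"]:
            continue
        upward, other = split_relations(syn["relations"])
        gid = EQ_PLUG_IN.get(tid)
        if gid is not None and lemma in GENERIC[gid]["lemmas"]:
            eclipsed.add(gid)                                        # specialized synset wins
            upward, _ = split_relations(GENERIC[gid]["relations"])  # upward from the generic db
        results.append({"synset": tid, "upward": upward, "other": other})
    for gid, syn in GENERIC.items():
        if lemma in syn["lemmas"] and gid not in eclipsed:
            upward, other = split_relations(syn["relations"])
            results.append({"synset": gid, "upward": upward, "other": other})
    return results
```

Looking up `bussola` returns the maritime synset, with its hyperonym taken from the generic wordnet and its part relation from the terminological one, followed by the un-eclipsed generic door sense.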
29 EXAMPLE: Nolo (Freight)
30 Nolo plug-in (with downward relations)
31 Nolo plug-in (with upward relations)
32 EXAMPLE: Bussola (Compass)
33 Bussola plug-in (with downward relations)
34 Bussola plug-in (with upward relations)
35 Differences between IWN and Dictionaries/Glossaries
- The data are not only described (by the definition) but also codified (by relations).
- Data structured only alphabetically in the dictionary edited by the Harbour Master (where, for example, all the information about bussola is presented all together and rather confusingly) become, in a relational database, synsets linked to each other by many types of semantic relations (hyperonymy, hyponymy, holo/mero part, etc.), which can also be managed automatically.
36 FINAL REMARKS
- Maritime terminology is an object of great interest in a maritime nation like Italy, which has a strong marine tradition.
- The English terms prevail over the Italian synonyms.
- Maritime terminology dictionaries are rare, and it is sometimes very difficult to find an English translation of these terms.
37 Instrument for work
- Having definitions and translations of specific terms provides a useful instrument for work (export-import companies, maritime agencies, etc.), for schools and teaching activities of various types (nautical institutes, professional training, etc.) and, in general, whenever a reference to terms of this specific domain is needed.
38
- From a commercial point of view, the English language prevails over all other languages: contracts, negotiations, chartering and operational documents of cargo ships (such as bills of lading) are in English, and so are a great number of reference books.
- From the point of view of usefulness, there are circumstances in which it is necessary to refer to a translation of technical terms that is correct, up to date and absolutely unambiguous.
39 Our aim
- To build a terminological database showing the semantic relations between different concepts and a precise, correct linkage to the English terms, and then to make it a point of reference in circumstances such as legal actions, for instance, when the judge..
- To carry on this research, increasing the number of terms and starting a cooperation with the official transport organizations in order to enrich and refine this product and arrive at a definitive, recognized and validated version.
- To start this kind of research for the Italian language.
40 Results
- Specialized lexicon enlarged
- Italian terms clarified
- More effective management of Italian and English terms
- In spite of globalisation, in a maritime country like ours it is absolutely essential not to lose our linguistic identity.