Title: Building a Primitive-based Lexical Consultation System
- Prepared by Lim Beng Tat
- Supervisors: Dr. Tang Enya Kong and Dr. Guo Cheng Ming
Abstract
This research describes the design of a semantic-primitive-based lexical consultation system and the processes that can be applied to a machine-readable dictionary (MRD) and a corpus to produce a machine-tractable dictionary (MTD) and a tractable corpus automatically. Linguistic tools, such as a sense tagger, and linguistic resources are created during or after these processes. The research also shows how to apply an unsupervised word sense disambiguation (WSD) method, using the newly constructed MTD, to samples of unrestricted text from various prospective application areas. This is important for applications that need lexical semantics, such as machine translation, information retrieval and hypertext navigation, content and thematic analysis, grammatical analysis, speech processing and text processing.
Outline
- Introduction
- Problem
- Objective
- Lexical Consultation System
- System design and architecture
- Example applications
- Bilingual Knowledge Bank
Introduction
- Dictionaries supply knowledge (of both language and the world)
- E.g. Collins English Dictionary (CED), Longman Dictionary of Contemporary English (LDOCE) and Webster's 9th Dictionary (W9)
Introduction (Cont)
- Explicit information (e.g. part of speech, POS)
- Implicit information / semantic information
- Hypernym/hyponym relations (class/subclass)
- Synonymy/antonymy relations
- Meronym/holonym relations (part/whole, ...)
- Collocational relations (compounds, idioms, ...), etc.
Introduction (Cont)
- Problem: how to extract semantic information from a dictionary? Two methods:
- Defining patterns
- Identify significant recurring phrases
- E.g. "a member of" + NP
- hand: "a member of a ship's crew" (W9)
- Extraction of a semantic hierarchy
- Extraction of hyponym relations
- E.g. dipper: "a ladle used for dipping..." (CED)
- ladle: "a long-handled spoon..." (CED)
- spoon: "a metal, wooden, or plastic utensil..." (CED)
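A minimal Python sketch of both methods on the toy CED/W9 definitions above. The regular expression for the "a member of" + NP pattern and the genus-term heuristic (take the first word of a definition that is itself a known headword) are illustrative assumptions, not the extraction rules used in this work; `member_of_np`, `genus_term` and `hypernym_chain` are hypothetical names.

```python
import re
from typing import Dict, List, Optional, Set

# Truncated definitions quoted on the slide (CED for the hierarchy, W9 for "hand").
DEFINITIONS: Dict[str, str] = {
    "dipper": "a ladle used for dipping",
    "ladle": "a long-handled spoon",
    "spoon": "a metal, wooden, or plastic utensil",
    "hand": "a member of a ship's crew",
}
# Assumed set of known headwords; "utensil" and "crew" are listed although
# their own definitions are not quoted here.
HEADWORDS: Set[str] = set(DEFINITIONS) | {"utensil", "crew"}

# Defining pattern "a member of" + NP (an optional determiner is skipped).
MEMBER_OF = re.compile(r"^a member of (?:an? |the )?(?P<np>.+)$")


def member_of_np(definition: str) -> Optional[str]:
    """Return the NP that follows 'a member of', or None if the pattern fails."""
    match = MEMBER_OF.match(definition)
    return match.group("np") if match else None


def genus_term(definition: str, headwords: Set[str]) -> Optional[str]:
    """Heuristic genus term: the first token that is itself a headword."""
    for token in re.findall(r"[a-z'-]+", definition.lower()):
        if token in headwords:
            return token
    return None


def hypernym_chain(word: str, definitions: Dict[str, str], headwords: Set[str]) -> List[str]:
    """Follow genus terms upward until a word has no definition or a word repeats."""
    chain = [word]
    while word in definitions:
        word = genus_term(definitions[word], headwords - {word})
        if word is None or word in chain:  # stop on a dead end or a definition circle
            break
        chain.append(word)
    return chain


print(member_of_np(DEFINITIONS["hand"]))                 # ship's crew
print(hypernym_chain("dipper", DEFINITIONS, HEADWORDS))  # ['dipper', 'ladle', 'spoon', 'utensil']
```

The heuristic would pick "metal" for spoon if "metal" were also a headword, so a realistic genus extractor needs at least part-of-speech information.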
Introduction (Cont)
- Disadvantages
- Circularity
- E.g. tool: "an implement, such as a hammer..." (CED)
- implement: "a piece of equipment, tool or utensil" (CED)
- utensil: "an implement, tool or container..." (CED)
- Inconsistency in dictionaries
- E.g. corkscrew: "a pointed spiral piece of metal..." (W9)
- dinner service: "a complete set of plates and dishes..." (LDOCE)
- Dictionaries are written for human use
- Other methods
- Semantic primitives and word sense disambiguation
Semantic Primitive
- A semantic primitive is a core meaning that cannot be analyzed further
- E.g. bachelor and red
- bachelor means a man who is not married
- red represents a semantic primitive (a basic meaning), while bachelor does not
Semantic Primitive (Cont)
- Two types of semantic primitives
- Prescriptive and descriptive
- Prescriptive semantic primitives
- A set of predefined primitives
- E.g. father marry couple
- marry: human, human
- father: human
- couple: human, thing
- Used to choose the correct sense of couple
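A minimal sketch of how prescriptive primitives can choose the sense of couple: each sense carries one predefined primitive, and marry requires a human subject and a human object. The sense labels `couple_people` and `couple_things` and the function `compatible_senses` are hypothetical; the slide only states that couple can be human or thing.

```python
from typing import Dict, List, Tuple

# Each candidate sense is labelled with one predefined primitive.
FATHER_SENSES: Dict[str, str] = {"father_human": "human"}
COUPLE_SENSES: Dict[str, str] = {
    "couple_people": "human",  # hypothetical label for the "two people" sense
    "couple_things": "thing",  # hypothetical label for the "two objects" sense
}
# Prescriptive frame for marry: (subject primitive, object primitive).
MARRY_FRAME: Tuple[str, str] = ("human", "human")


def compatible_senses(senses: Dict[str, str], required: str) -> List[str]:
    """Keep only the senses whose primitive matches the required primitive."""
    return [sense for sense, primitive in senses.items() if primitive == required]


subject_required, object_required = MARRY_FRAME
print(compatible_senses(FATHER_SENSES, subject_required))  # ['father_human']
print(compatible_senses(COUPLE_SENSES, object_required))   # ['couple_people']
```

Any distinction not covered by the predefined primitives requires extending the inventory, which is the problem noted for prescriptive primitives.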
Semantic Primitive (Cont)
- Prescriptive semantic primitives
- Problem: the set always needs to be extended
- Descriptive semantic primitives
- A set of semantic primitives derived from a natural source of data, such as a dictionary
- E.g.
- father5: a term5 of address for priest2 in some church, especially roman7 or orthodox3 catholic
- marry3: perform1 a marriage4 ceremony
- couple1: a pair5 of people5 who live7 together2
- Uniquely identifies each definition of an entry
- Avoids circularity
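Because every content word in a descriptive definition carries a sense number, each definition can be read as links between specific senses rather than between ambiguous words. The sketch below assumes the "word + digits" tagging shown above, skips untagged words such as "in" or "catholic", and turns the three example definitions into (word, sense) references; `sense_references` is a hypothetical helper.

```python
import re
from typing import Dict, List, Tuple

# Sense-tagged definitions from the example above.
TAGGED: Dict[str, str] = {
    "father5": "a term5 of address for priest2 in some church especially roman7 or orthodox3 catholic",
    "marry3": "perform1 a marriage4 ceremony",
    "couple1": "a pair5 of people5 who live7 together2",
}

# A tagged token is a lowercase word immediately followed by its sense number.
TAG = re.compile(r"^([a-z]+)(\d+)$")


def sense_references(definition: str) -> List[Tuple[str, int]]:
    """Return (word, sense number) pairs for the tagged tokens of a definition."""
    refs = []
    for token in definition.split():
        match = TAG.match(token)
        if match:
            refs.append((match.group(1), int(match.group(2))))
    return refs


for entry, definition in TAGGED.items():
    print(entry, "->", sense_references(definition))
```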
Word Sense Disambiguation (WSD)
- Documents are collections of sentences containing words
- Some words have more than one meaning; these meanings are often called word senses
- Goal
- Assign meanings to words in a given context according to a lexical resource
Objective
- Producing a machine-tractable dictionary (MTD) from a machine-readable dictionary (MRD) using descriptive semantic primitives and WSD
- Producing a tractable database/corpus from a database/corpus
Linguistic Resources
- Machine-tractable dictionary
- Encoded with information extracted from an MRD
- A usable format with highly structured semantic information for NLP tasks
- Descriptive semantic primitives
- Determining the relatedness or closeness among word senses in a dictionary
Lexical Consultation System
- Semantic Primitive Extractor
- LCDD Generator
- WSD
Semantic Primitive Extractor
- Search for self-reference circles in definitions
- For example:
- sense_1 def: sense_2 sense_5 sense_6
- sense_2 def: sense_3 sense_2
- sense_3 def: sense_1 sense_2
- sense_4 def: sense_5
- sense_5 def: sense_2 sense_4
- sense_6 def: sense_5 sense_4
- => sense_1 is a semantic primitive
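A minimal sketch of the circle search, run here on the CED circularity example from the introduction (tool, implement, utensil) rather than on the abstract sense_n example. It groups headwords that define each other, directly or indirectly, i.e. the strongly connected components that contain a cycle. How a primitive is then chosen from each circle is not specified on this slide, so the sketch stops at finding the circles; `self_reference_circles` is a hypothetical name.

```python
from typing import Dict, List, Set

# Definition graph from the CED circularity example: headword -> defining
# words. "hammer", "equipment" and "container" have no entry here, so they
# are treated as leaves.
GRAPH: Dict[str, List[str]] = {
    "tool": ["implement", "hammer"],
    "implement": ["equipment", "tool", "utensil"],
    "utensil": ["implement", "tool", "container"],
}


def reachable(graph: Dict[str, List[str]], start: str) -> Set[str]:
    """All words reachable from `start` by following definitions (iterative DFS)."""
    seen: Set[str] = set()
    stack = list(graph.get(start, []))
    while stack:
        word = stack.pop()
        if word not in seen:
            seen.add(word)
            stack.extend(graph.get(word, []))
    return seen


def self_reference_circles(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Groups of headwords that reach each other through their definitions."""
    reach = {word: reachable(graph, word) for word in graph}
    circles: List[List[str]] = []
    assigned: Set[str] = set()
    for word in graph:
        if word in assigned or word not in reach[word]:
            continue  # already grouped, or not on any circle
        circle = {other for other in reach[word] if word in reach.get(other, set())}
        circles.append(sorted(circle))
        assigned |= circle
    return circles


print(self_reference_circles(GRAPH))  # [['implement', 'tool', 'utensil']]
```

The pairwise reachability used here is quadratic in the number of entries; for a dictionary the size of WordNet, a linear-time strongly connected components algorithm such as Tarjan's would be the natural replacement.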
Semantic Primitive Extractor (Cont)
- Step 1: Expanding the dictionary
- Before expansion:
  abandon 1: a feeling of extreme emotional intensity
  abandon 2: leave behind
  ...
  betray 2: abandon
- After expansion:
  abandon 1: a feeling of extreme emotional intensity
  abandon 2: leave behind
  ...
  betray 2: abandon1 abandon2
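A minimal sketch of Step 1, assuming that expansion replaces each defining word that has dictionary entries by the list of all its sense labels, as betray 2's definition "abandon" becomes "abandon1 abandon2". Words without entries in this toy inventory are left unchanged; `sense_ids` and `expand_definition` are hypothetical helpers.

```python
from typing import Dict, List, Tuple

# Toy sense inventory: (word, sense number) -> definition text.
SENSES: Dict[Tuple[str, int], str] = {
    ("abandon", 1): "a feeling of extreme emotional intensity",
    ("abandon", 2): "leave behind",
    ("betray", 2): "abandon",
}


def sense_ids(word: str, senses: Dict[Tuple[str, int], str]) -> List[str]:
    """All sense labels of `word`, e.g. 'abandon' -> ['abandon1', 'abandon2']."""
    return [f"{w}{n}" for (w, n) in sorted(senses) if w == word]


def expand_definition(definition: str, senses: Dict[Tuple[str, int], str]) -> str:
    """Replace each defining word that has entries by all of its sense labels."""
    expanded: List[str] = []
    for token in definition.split():
        ids = sense_ids(token, senses)
        expanded.extend(ids if ids else [token])  # unknown words stay as they are
    return " ".join(expanded)


expanded = {key: expand_definition(text, SENSES) for key, text in SENSES.items()}
print(expanded[("betray", 2)])  # abandon1 abandon2
```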
Semantic Primitive Extractor (Cont)
- Step 2: Identify semantic primitives using self-reference circles
- Example:
- Primitives were extracted from the pre-released WordNet during SENSEVAL-2
- Pre-released WordNet 1.7: 192,460 entries
- Extracted primitives: 9,368 entries (around 5% of the pre-released WordNet 1.7 entries)
LCDD Generator
- Identify the definition layers of each word sense
- First layer for forecast2 and fixed6
- Second layer for forecast2 and fixed6
- forecast2: predict1 in advance3
- fixed6: specify1 in advance3
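A sketch of how definition layers could be collected level by level: layer 1 holds the tagged senses in a sense's own definition, layer 2 holds the senses in the definitions of layer 1, and so on. Untagged words such as "in" are dropped. The layer-1 entries follow the slide; the definitions of predict1, specify1 and advance3 are not given, so the placeholders `hyp_A` to `hyp_D` stand in for them, and `definition_layers` is a hypothetical name.

```python
from typing import Dict, List

# Sense -> tagged senses in its definition ("in" is untagged and omitted).
DEFS: Dict[str, List[str]] = {
    "forecast2": ["predict1", "advance3"],
    "fixed6": ["specify1", "advance3"],
    # Hypothetical placeholders: the real definitions are not shown on the slide.
    "predict1": ["hyp_A", "hyp_B"],
    "specify1": ["hyp_C", "hyp_B"],
    "advance3": ["hyp_D"],
}


def definition_layers(sense: str, defs: Dict[str, List[str]], depth: int = 2) -> List[List[str]]:
    """Return `depth` layers of senses reachable through definitions."""
    layers: List[List[str]] = []
    frontier = [sense]
    for _ in range(depth):
        layer: List[str] = []
        for current in frontier:
            for ref in defs.get(current, []):
                if ref not in layer:  # keep the first occurrence, preserve order
                    layer.append(ref)
        layers.append(layer)
        frontier = layer
    return layers


print(definition_layers("forecast2", DEFS))  # [['predict1', 'advance3'], ['hyp_A', 'hyp_B', 'hyp_D']]
print(definition_layers("fixed6", DEFS))     # [['specify1', 'advance3'], ['hyp_C', 'hyp_B', 'hyp_D']]
```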
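LCDD Generator (Cont)
$$\mathrm{LCDD}(\mathit{forecast2}, \mathit{fixed6}) = a \times 70\% + \frac{b + c + d}{3} \times 30\%$$
- Depth-first method
[Figure: Layer 1 and Layer 2 for forecast2 and for fixed6, with comparisons labelled a (between the two Layer 1s), b, c, and d (between the two Layer 2s)]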
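The weighted combination can be sketched as follows. The slide does not say how a, b, c and d are measured, so this sketch assumes each is a Dice overlap between two layers, with a comparing the two Layer 1 sets, d the two Layer 2 sets, and b and c the cross pairs (Layer 1 of one sense against Layer 2 of the other). The layer-2 sets reuse hypothetical placeholders (hyp_A to hyp_D), since the slide does not show them; `dice` and `lcdd` are hypothetical names.

```python
from typing import Set


def dice(x: Set[str], y: Set[str]) -> float:
    """Assumed overlap score: 2 * |X & Y| / (|X| + |Y|), or 0 when both are empty."""
    if not x and not y:
        return 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def lcdd(l1_x: Set[str], l2_x: Set[str], l1_y: Set[str], l2_y: Set[str],
         w_first: float = 0.70, w_rest: float = 0.30) -> float:
    """LCDD = a * 70% + (b + c + d) / 3 * 30%, with a..d as assumed layer overlaps."""
    a = dice(l1_x, l1_y)  # Layer 1 vs Layer 1
    b = dice(l1_x, l2_y)  # Layer 1 of x vs Layer 2 of y (assumed pairing)
    c = dice(l2_x, l1_y)  # Layer 2 of x vs Layer 1 of y (assumed pairing)
    d = dice(l2_x, l2_y)  # Layer 2 vs Layer 2
    return w_first * a + w_rest * (b + c + d) / 3


forecast2 = ({"predict1", "advance3"}, {"hyp_A", "hyp_B", "hyp_D"})
fixed6 = ({"specify1", "advance3"}, {"hyp_C", "hyp_B", "hyp_D"})
score = lcdd(forecast2[0], forecast2[1], fixed6[0], fixed6[1])
print(round(score, 4))  # 0.4167
```

Here the shared advance3 gives a = 0.5, and the two shared placeholders in Layer 2 give d = 2/3, so the Layer 1 overlap dominates the score as the 70% weight intends.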
WSD
- Simple summation algorithm
- For example, assume a sentence containing the words father, marry and couple, each of which has only two senses. The candidate sense combinations are:
- father1 marry1 couple1
- father1 marry1 couple2
- father1 marry2 couple1
- father1 marry2 couple2
- father2 marry1 couple1
- father2 marry1 couple2
- father2 marry2 couple1
- father2 marry2 couple2
- Dynamic programming techniques
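A sketch of the simple summation algorithm: enumerate every sense combination (2 x 2 x 2 = 8 here), score each by summing the relatedness of all sense pairs, and keep the best. In the system the relatedness would presumably come from LCDD; the scores below are hypothetical numbers chosen only to make the example run, and `simple_summation` is a hypothetical name.

```python
from itertools import combinations, product
from typing import Callable, Dict, FrozenSet, Sequence, Tuple

# Hypothetical pairwise relatedness scores; unlisted pairs score 0.
SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"father2", "marry2"}): 0.8,
    frozenset({"marry2", "couple1"}): 0.7,
    frozenset({"father1", "marry1"}): 0.3,
    frozenset({"marry1", "couple1"}): 0.4,
    frozenset({"father1", "couple2"}): 0.1,
}


def relatedness(s: str, t: str) -> float:
    return SCORES.get(frozenset({s, t}), 0.0)


def simple_summation(
    words_senses: Sequence[Sequence[str]],
    score: Callable[[str, str], float],
) -> Tuple[Tuple[str, ...], float]:
    """Return the sense combination with the largest sum of pairwise scores."""
    best: Tuple[str, ...] = ()
    best_total = float("-inf")
    for combo in product(*words_senses):  # one sense per word
        total = sum(score(s, t) for s, t in combinations(combo, 2))
        if total > best_total:
            best, best_total = combo, total
    return best, best_total


sentence = [["father1", "father2"], ["marry1", "marry2"], ["couple1", "couple2"]]
print(simple_summation(sentence, relatedness))  # (('father2', 'marry2', 'couple1'), 1.5)
```

The number of combinations is the product of the sense counts, so it grows exponentially with sentence length; this is what makes dynamic programming attractive.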
System Design
[Figure: system design diagram with the components General Dictionary (MTD), Lexical Consultation System, Domain MRD, and Domain MTD for WSD]
System Architecture
[Figure: system architecture diagram with the components Bilingual Knowledge Bank (BKB) and Papillon Dictionaries or FEM]
Tractable Bilingual Knowledge Bank (BKB)
[Figure: example tractable BKB entries, English-Malay sentence pairs aligned as sense-tagged trees. "he pick the ball up" / "dia kutip bola itu", with nodes such as he(1)n / dia(1)n, pick(1)v up(1)p / kutip(1)v and ball(1)n / bola(1)n, and span pairs (0-5,0-4), (0-1,0-1), (2-4,2-4), (2-3,3-4) and (3-4,2-3); "the old man pick the ball up" / "lelaki tua itu kutip bola itu", with word boundaries 0the1old2man3pick4the5ball6up7 and 0lelaki1tua2itu3kutip4bola5itu6, and nodes such as man(4)n / lelaki(3)n and old(3)adj / tua(2)adj]
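The span pairs in the BKB example can be read as word-boundary indices: position 0 lies before the first word, so "he pick the ball up" runs 0-5 and "dia kutip bola itu" runs 0-4. Under that reading, which is an interpretation of the figure rather than a stated definition, a pair such as (2-4,2-4) links "the ball" with "bola itu". The sketch below recovers the aligned text for the pairs shown; `span_text` is a hypothetical helper.

```python
from typing import List, Tuple

Span = Tuple[int, int]

EN = "he pick the ball up".split()
MS = "dia kutip bola itu".split()

# (English span, Malay span) pairs as written in the figure, e.g. (2-4,2-4).
PAIRS: List[Tuple[Span, Span]] = [
    ((0, 5), (0, 4)),
    ((0, 1), (0, 1)),
    ((2, 4), (2, 4)),
    ((2, 3), (3, 4)),
    ((3, 4), (2, 3)),
]


def span_text(tokens: List[str], span: Span) -> str:
    """Words between boundary `start` and boundary `end` (end exclusive)."""
    start, end = span
    return " ".join(tokens[start:end])


for en_span, ms_span in PAIRS:
    print(f"{span_text(EN, en_span)!r} <-> {span_text(MS, ms_span)!r}")
```

The determiner changes position: English "the" (2-3) precedes the noun while Malay "itu" (3-4) follows it, which is what the crossing pairs (2-3,3-4) and (3-4,2-3) record.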
- Thank you
- Please send any comments to btlim_at_cs.usm.my
Semantic Primitive Extractor (Cont)
- Step 2: Compute the frequency of each sense entry in the dictionary according to how often it appears in definition texts
- Sort the list by frequency
- An entry with high frequency => high probability that the entry is a primitive
- Problems
- Empty definitions
- Possibility of selecting wrong semantic primitives based on the self-reference method
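A minimal sketch of the frequency method: count how often each sense appears in the (expanded, sense-tagged) definition texts and sort in descending order, so the most frequent senses become the primitive candidates. The toy entries s1 to s6 are hypothetical; s6 has an empty definition and simply contributes nothing, one of the problems listed above. `rank_by_definition_frequency` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical sense-tagged dictionary: sense -> senses used in its definition.
DEFS: Dict[str, List[str]] = {
    "s1": ["s2", "s5"],
    "s2": ["s5"],
    "s3": ["s5", "s4"],
    "s4": ["s2"],
    "s6": [],  # empty definition: nothing to count
}


def rank_by_definition_frequency(defs: Dict[str, List[str]]) -> List[Tuple[str, int]]:
    """Senses sorted by how often they occur in definition texts (most frequent first)."""
    counts = Counter(ref for refs in defs.values() for ref in refs)
    return counts.most_common()


print(rank_by_definition_frequency(DEFS))  # [('s5', 3), ('s2', 2), ('s4', 1)]
```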
WSD (Cont)
- Improves the quality of a number of natural language processing tasks:
- Machine translation
- Information extraction
- Internet search engines
WSD (Cont)
- Previous path value + difference between the two consecutive paths
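A sketch of a dynamic programming alternative to simple summation, in the style of the Viterbi algorithm, assuming that the path value of a sense is the best previous path value plus the relatedness between the senses of two consecutive words. Unlike simple summation, it scores only adjacent word pairs, so the two methods can disagree in general. The relatedness table is a hypothetical placeholder for LCDD scores, and `best_sense_path` is a hypothetical name.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# Hypothetical relatedness scores between senses of consecutive words.
SCORES: Dict[FrozenSet[str], float] = {
    frozenset({"father2", "marry2"}): 0.8,
    frozenset({"marry2", "couple1"}): 0.7,
    frozenset({"father1", "marry1"}): 0.3,
    frozenset({"marry1", "couple1"}): 0.4,
}


def relatedness(s: str, t: str) -> float:
    return SCORES.get(frozenset({s, t}), 0.0)


def best_sense_path(words_senses: Sequence[Sequence[str]]) -> Tuple[List[str], float]:
    """Viterbi-style search: path value = previous path value + relatedness(prev, cur)."""
    values: Dict[str, float] = {s: 0.0 for s in words_senses[0]}
    backpointers: List[Dict[str, str]] = []
    for senses in words_senses[1:]:
        new_values: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in senses:
            # Best predecessor for this sense of the current word.
            prev = max(values, key=lambda p: values[p] + relatedness(p, cur))
            new_values[cur] = values[prev] + relatedness(prev, cur)
            pointers[cur] = prev
        values = new_values
        backpointers.append(pointers)
    last = max(values, key=lambda s: values[s])
    path = [last]
    for pointers in reversed(backpointers):  # walk back from the last word
        path.append(pointers[path[-1]])
    path.reverse()
    return path, values[last]


sentence = [["father1", "father2"], ["marry1", "marry2"], ["couple1", "couple2"]]
print(best_sense_path(sentence))  # (['father2', 'marry2', 'couple1'], 1.5)
```

The cost is linear in sentence length and quadratic in the number of senses per word, instead of the exponential enumeration needed by simple summation.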