Title: Language and Tools for Lexical Resource Management
1Language and Tools for Lexical Resource Management
- Asanee Kawtrakul (1)
- Aree Thunkijjanukij (2)
- Preeda Lertpongwipusana (1)
- Poonna Yospanya (1)
- (1) Department of Computer Engineering, Faculty of Engineering
- (2) Thai National AGRIS Center
- Kasetsart University
2Acknowledgement
- JIRCAS (Japan International Research Center for Agricultural Sciences)
- Organizing committee
- Kasetsart University
3Outline
- Background and Motivation
- Problems in Lexical Resource Preparation
- Requirements for Lexical Resource Management
- Proposed Language and tools
- Conclusion and Next steps
4Background and Motivation
- Thailand is an agriculture-based country
- with rich knowledge and data in the agricultural field.
- A great quantity of agricultural information is scattered across unstructured and unrelated text.
- Skimming, digesting, and integrating this information becomes essential.
- Knowledge is spread around the world.
- Knowledge discovery without language barriers is also needed.
5The Basic Idea behind..
Graphical User Interface
6Textual Data as an Input
- Let us focus on Canada's agricultural products. In 1998, there were 1,216 registered commercial egg producers in Canada. Ontario produced 39.8% of all eggs in Canada; Quebec was second with 16.6%. The western provinces have a combined egg production of 35.6%, and the eastern provinces have a combined production of 8.0%.
Courtesy of Agriculture and Agri-Food Canada, http://www.agr.ca/cb
7Summarization and Translation as a Result
8The Development of Agricultural System for Knowledge Acquisition and Dissemination
- 5-year project (2001-2005)
- Collaborative work between
- Thai National AGRIS Center
- Providing the bilingual thesaurus (AGROVOC)
- Department of Computer Engineering
- Developing NLP techniques for searching, summarizing, and translation, including tools for lexical resource management
- Funded by Kasetsart University Research and Development Institution
9Acquisition System
Linguist/Domain Expert
Very Large Corpus
Rules
Thesaurus
Lexicon
Linguistic Knowledge Base
- Intelligent Search Engine
- With Translation
- With Summarization
Document Indexing and Clustering
Gathering Module
Internet/Intranet
10Thai Agricultural Thesaurus
- The total number of English terms is 27,531.
- Only 10,280 terms have been translated into Thai (excluding scientific names).
- Scientific names are not translated.
- Ex.: Oryza (genus) sativa (species) for rice, or a family name
11Problems in Hand-Coded Thesaurus
- Scalability
- Reliability and Coherence
- Rigidity
- Cost
12Foods
Processed Products
Bakery Product
Canned Products
Dietetic Foods
Dried Products
Frozen Foods
Frozen Products
Fermented Foods
Fermented Products
Alcoholic Beverage
milk
Fermented Foods
Fermented Fish
13Foods
Processed Products
Products
Fermented Foods
Local Product
Fermented Fish
14Commercial Vegetables: The September index, at 107, was up 1.9 percent from last month but 3.6 percent below September 1998. Price increases for lettuce, tomatoes, broccoli, and celery more than offset price decreases for onions, carrots, and cucumbers.
Commercial Vegetable
tomatoes
Cucumbers
Carrots
Broccoli
15Commercial Vegetable
broccoli
carrot
tomato
User Category
tomato
tomatoes
Keyword Assigned
16Other Major Problems (1)
- Accessing textual information
- Language variation
- Many ways to express the same idea
- Ex.: "thinning flower" uses "deblossoming"
- "thinning branch" uses "pruning"
- How can the computer know that the words a person uses are related to the words found in stored text?
- Ex.: user: "thinning branch"
- computer: "pruning"
17Requirement (1)
- Accessing textual information
- Need intelligent browsing from related concept to related concept, rather than matching on occurrences of stemmed character strings
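One way to picture concept-based access is the minimal Python sketch below. It is an illustration only, not the project's search engine: the concept IDs, the two sample documents, and the function name `concept_search` are all assumptions made for this example. The user's term is first mapped to a concept, and every known expression of that concept is then matched against the stored text.

```python
# Hypothetical concept table: each concept ID groups the expressions
# that the slides treat as equivalent (e.g. "thinning branch" / "pruning").
CONCEPTS = {
    "C_PRUNING": {"pruning", "thinning branch"},
    "C_DEBLOSSOMING": {"deblossoming", "thinning flower"},
}

# Reverse index from each expression to its concept ID.
TERM_TO_CONCEPT = {
    term: concept_id
    for concept_id, terms in CONCEPTS.items()
    for term in terms
}

# Hypothetical stored documents.
DOCUMENTS = {
    "doc1": "Pruning in late winter improves fruit size.",
    "doc2": "Deblossoming reduces the crop load of young trees.",
}


def concept_search(query):
    """Return IDs of documents that mention any expression of the query's concept."""
    concept_id = TERM_TO_CONCEPT.get(query.lower())
    if concept_id is None:
        return []
    variants = CONCEPTS[concept_id]
    return sorted(
        doc_id
        for doc_id, text in DOCUMENTS.items()
        if any(variant in text.lower() for variant in variants)
    )


print(concept_search("thinning branch"))
```

Here the user's "thinning branch" retrieves the document that only says "pruning", which plain string matching would miss.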
18Other Major Problems (2)
- Transforming from unstructured to structured
information
19Requirement (2)
- Need an application-based frame about product prices
- Knowledge is represented in table form
- consisting of attributes and their values
Attributes
Values
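As a rough illustration of such a frame, the Python sketch below fills attribute/value slots from the commercial vegetable price sentence quoted earlier in this deck. The frame's attribute names, the regular expression, and the function `extract_frame` are assumptions for this example; a real system would need far more robust language analysis than a single pattern.

```python
import re
from dataclasses import dataclass


@dataclass
class PriceIndexFrame:
    # Attribute names are hypothetical; the slides only ask for
    # "attributes and their values" about product prices.
    product: str
    month: str
    index_value: float
    change_from_last_month_pct: float
    change_from_last_year_pct: float


# Pattern for one specific sentence shape; an assumption for this sketch.
PATTERN = re.compile(
    r"The (?P<month>\w+) index, at (?P<value>\d+(?:\.\d+)?), "
    r"was (?P<dir1>up|down) (?P<pct1>\d+(?:\.\d+)?) percent from last month "
    r"but (?P<pct2>\d+(?:\.\d+)?) percent (?P<dir2>above|below)"
)


def extract_frame(product, sentence):
    """Fill a PriceIndexFrame from the sentence, or return None if it does not match."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    sign1 = 1.0 if m["dir1"] == "up" else -1.0
    sign2 = 1.0 if m["dir2"] == "above" else -1.0
    return PriceIndexFrame(
        product=product,
        month=m["month"],
        index_value=float(m["value"]),
        change_from_last_month_pct=sign1 * float(m["pct1"]),
        change_from_last_year_pct=sign2 * float(m["pct2"]),
    )


sentence = (
    "The September index, at 107, was up 1.9 percent from last month "
    "but 3.6 percent below September 1998."
)
frame = extract_frame("Commercial Vegetables", sentence)
print(frame)
```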
20Problems in Translation: Pragmatic and Semantic
- The September All Farm Products Index was 97 percent of its 1990-92 base, down 1.0 percent from the August index and 2.0 percent below the September 1998 index.
[Diagram: interpreting the sentence using an ontology. Annotations: "0.97 x averagePrice of years from 1990-1992"; "September of year ??"; "August, Year 1997"; "Down 0.02 x price(September 1998)".]
21Year 1990-1992 meaning
22Requirement (3)
- The lexicon should have semantic constraints between lexical entities and restrictions on usage categories
23Summary of Problems Related to the Lexicon
- In terms of coverage
- Extensional coverage, i.e., the number of entries
- Intensional coverage, i.e., the number of information fields
- In terms of the semantic domain covered by the application
- Meaning interpretation with respect to objects, subject matter, topics of discourse, and pragmatic interpretation
- The user category with reference to the intended system users
- Commercial products vs. plant products vs. family products
24One Solution
- Encode world knowledge in the structures attached to each lexical item, which requires both a language and tools
25The Design of Lexicon: Requirement Specification
- Macrostructure: lexicon structure in terms of relations between lexical entries
- i.e., hierarchical taxonomies, which are characteristic of thesauri of semantically related word families
- Microstructure: types of information for each entry
- Pronunciation or phonemic transcription
- Syntactic properties
- Meaning
- Pragmatics of their use in real context and language
26Microstructure (cont)
- A lexical entity could contain slots/scripts for each specific domain, and needs an intelligent analyzer with language understanding
- Supplies information extraction
- Supplies the missing value
27Lexical Resource Management Language
- which is able to
- Handle heterogeneity of linguistic knowledge structures.
- Handle exceptions and inconsistencies of natural languages.
- Provide an intuitive means to store and manipulate both linguistic and world knowledge.
28Language Features
- The language is designed in a way that will enable:
- Support for heterogeneous structures.
- Sufficient provisions to handle exceptions and inconsistencies of natural languages (this is achieved through the +/- operators).
- Deduction of knowledge from rules.
- Detection and prevention of potential integrity violations.
29Language and Tools: Specification Requirements
- Flexibility: almost any structure can be defined in this model.
- Extensibility: extending a structure is simple.
- Maturability: structure reformation and deformation are supported.
- Integrity: meta-relations help prevent malformed or semantically ill-formed data entries.
- Dealing with inconsistencies is feasible.
30Some Syntactic Elements
- Knowledge manipulations are achieved through these primitives:
- def is used to define structures that do not already exist.
- redef changes aspects of existing structures.
- undef removes specified structures from the knowledge base.
- ret is used to retrieve structures from the knowledge base.
31Examples
- Hierarchies: tree structures representing generalization semantics, or classes, of atoms.
[Diagram: a semantic tree represented by a hierarchy structure. Nodes: thing, animate, inanimate, animal, human.]
32Usage Examples
- Defining a hierarchy
- def thing(animate(human+animal)+inanimate).
- Adding the plant and vehicle concepts
- def animate(plant+vehicle).
- Reparenting the vehicle concept
- redef animate(vehicle) inanimate(vehicle).
- Removing the human concept
- undef human. (provided that there is only a
single instance of human)
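To make the effect of these primitives concrete, here is a minimal Python sketch that mirrors the four steps above on an in-memory tree. It does not parse the language itself: the `Hierarchy` class and its method names are hypothetical, and refusing to remove a node that still has children is an assumption, since the slides do not say how undef treats sub-trees.

```python
class Hierarchy:
    """A tree of concepts stored as child -> parent links."""

    def __init__(self, root):
        self.root = root
        self.parent = {root: None}

    def define(self, parent, *children):
        # Counterpart of "def": add concepts that do not exist yet.
        if parent not in self.parent:
            raise KeyError(f"unknown parent: {parent}")
        for child in children:
            if child in self.parent:
                raise ValueError(f"already defined: {child}")
            self.parent[child] = parent

    def redefine(self, node, new_parent):
        # Counterpart of "redef": move an existing concept under a new parent.
        if node not in self.parent or new_parent not in self.parent:
            raise KeyError("both concepts must already exist")
        ancestor = new_parent
        while ancestor is not None:  # refuse moves that would create a cycle
            if ancestor == node:
                raise ValueError("cannot move a concept under its own sub-tree")
            ancestor = self.parent[ancestor]
        self.parent[node] = new_parent

    def undefine(self, node):
        # Counterpart of "undef" (assumption: only leaf concepts may be removed).
        if any(p == node for p in self.parent.values()):
            raise ValueError(f"{node} still has children")
        del self.parent[node]

    def children(self, node):
        return sorted(c for c, p in self.parent.items() if p == node)


h = Hierarchy("thing")
h.define("thing", "animate", "inanimate")
h.define("animate", "human", "animal")
h.define("animate", "plant", "vehicle")   # adding plant and vehicle
h.redefine("vehicle", "inanimate")         # reparenting vehicle
h.undefine("human")                        # removing human
print(h.children("animate"), h.children("inanimate"))
```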
33Usage Examples (2)
- Defining case frames for verbs
- First, we need to define meta-relations for words belonging to the sub-hierarchy verb.
- def meta case(verb, sub:thing).
- def meta case(verb, sub:thing, obj:thing).
- Then, we define case frames for several verbs.
- def case(eat, sub:human+animal, obj:food).
- def case(fly, sub:bird-penguin). (here, we emphasize the use of +/- operators)
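The idea of a case frame whose slot admits a class but excludes an exception can be sketched in Python as follows. This is an assumption-laden illustration, not the language's implementation: the small taxonomy, the `Slot` class, and the `check` function are invented for the example, and the frames encode one reading of the slide (eat takes a human or animal subject and a food object; fly takes any bird except a penguin).

```python
# Hypothetical taxonomy as child -> parent links.
PARENT = {
    "thing": None,
    "animate": "thing",
    "food": "thing",
    "human": "animate",
    "animal": "animate",
    "bird": "animal",
    "penguin": "bird",
    "sparrow": "bird",
    "rice": "food",
}


def is_a(node, ancestor):
    """True if node equals ancestor or lies below it in the taxonomy."""
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT[node]
    return False


class Slot:
    """A slot that accepts some classes and rejects listed exceptions."""

    def __init__(self, include, exclude=()):
        self.include = tuple(include)
        self.exclude = tuple(exclude)

    def accepts(self, word):
        allowed = any(is_a(word, c) for c in self.include)
        excepted = any(is_a(word, c) for c in self.exclude)
        return allowed and not excepted


CASE_FRAMES = {
    "eat": {"sub": Slot(["human", "animal"]), "obj": Slot(["food"])},
    "fly": {"sub": Slot(["bird"], exclude=["penguin"])},
}


def check(verb, **fillers):
    """True if every given role exists in the verb's frame and accepts its filler."""
    frame = CASE_FRAMES[verb]
    return all(
        role in frame and frame[role].accepts(word)
        for role, word in fillers.items()
    )


print(check("fly", sub="sparrow"), check("fly", sub="penguin"))
```

Keeping the exception on the slot rather than removing penguin from the bird class lets the taxonomy stay intact while the verb's usage restriction is still enforced.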
34Hierarchy and Set
[Diagram: a hierarchy rooted at c1 with nodes c2, p1, and w1 to w7, alongside a set c3 with members f1 to f4.]
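35Defining a Hierarchy
def c1(w1(w3)+c2(w4)+w2).
def w5+w6 under w4.
def p1(w7) under w2.
[Diagram: the resulting hierarchy. c1 has children w1, c2, and w2; w3 is under w1; w4 is under c2; w5 and w6 are under w4; p1 is under w2; w7 is under p1.]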
36Manipulating the Hierarchy
redef w4 under w2.
undef w1.
[Diagram: the hierarchy from the previous slide (nodes c1, c2, p1, w1 to w7), showing w4 moved under w2 and w1 removed.]
37Defining a Set
def c3f1f2f3.
def f4 in c3.
[Diagram: the set c3 with members f1, f2, f3, and f4.]
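38Defining a Relation
def meta r1(c2, c3).
Template defined.
def r1(w4, f1).
Relation defined.
def r1(w1, f3).
Constraint violated. Definition not allowed.
[Diagram: relation r1 links w4 (under c2) to f1 (in c3); w5 and w6, which are under w4, are marked as having inherited the relation. The set c3 contains f1, f2, f3, and f4; w1 lies outside c2.]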
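The template check and the inheritance shown on this slide can be approximated with the Python sketch below. It is a guess at the behavior, not the actual system: the function names are hypothetical, and the tree layout (w4 under c2, with w5 and w6 under w4, and w1 outside c2) is assumed from the "Defining a Hierarchy" slide.

```python
# Assumed hierarchy (child -> parent) and set membership.
PARENT = {
    "c1": None, "w1": "c1", "c2": "c1", "w2": "c1",
    "w3": "w1", "w4": "c2", "w5": "w4", "w6": "w4",
}
SETS = {"c3": {"f1", "f2", "f3", "f4"}}

TEMPLATES = {}   # relation name -> (domain node, range set name)
RELATIONS = {}   # relation name -> set of (source, target) pairs


def under(node, ancestor):
    """True if node equals ancestor or lies below it."""
    while node is not None:
        if node == ancestor:
            return True
        node = PARENT[node]
    return False


def def_meta(name, domain, range_set):
    """Counterpart of 'def meta': register the template for a relation."""
    TEMPLATES[name] = (domain, range_set)
    return "Template defined."


def def_relation(name, source, target):
    """Counterpart of 'def': add a relation instance if the template allows it."""
    domain, range_set = TEMPLATES[name]
    if not under(source, domain) or target not in SETS[range_set]:
        raise ValueError("Constraint violated. Definition not allowed.")
    RELATIONS.setdefault(name, set()).add((source, target))
    return "Relation defined."


def related(name, node):
    """Targets of the relation for node, including those inherited from ancestors."""
    targets = set()
    while node is not None:
        targets |= {t for s, t in RELATIONS.get(name, ()) if s == node}
        node = PARENT[node]
    return targets


print(def_meta("r1", "c2", "c3"))
print(def_relation("r1", "w4", "f1"))
try:
    def_relation("r1", "w1", "f3")
    outcome = "Relation defined."
except ValueError as err:
    outcome = str(err)
print(outcome)
print(related("r1", "w5"))
```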
39Synset and Surrogates
- A synset is an unnamed set identified by its unique ID.
- Members of a synset are considered synonymous, with different degrees of synonymy.
- A distance graph is automatically constructed within a synset, with surrogates being representatives of synset members.
- Entities with identical features are attached to the same surrogate.
40Synset and Surrogates
[Diagram: synset1 contains the entities w1, w2, w3, w4, w6, p1, p2, and p3, each labeled with features drawn from f1 to f4. A surrogate network, internally constructed, links the surrogates s1 to s5.]
41Synset and Multilingual Lexicon
- Synset members are not confined to a single language; that is, entities from different languages may belong to the same synset.
- The distance matrix is computed from the number of differing features over each pair of surrogates.
- Traversal from a word to its nearest words is handled by the system. This lets us determine the words with potentially the nearest semantics.
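A small Python sketch of the surrogate and distance idea follows. The member names, language tags, and feature sets are made up for illustration, and the distance used here (the size of the symmetric difference of two feature sets) is one plausible reading of "number of different features"; the actual system may count differently.

```python
# Hypothetical synset members from two languages, each with a feature set.
MEMBERS = {
    "en:tomato": {"fruit vegetable", "red"},
    "th:makhuea thet": {"fruit vegetable", "red"},
    "en:cherry tomato": {"fruit vegetable", "red", "small"},
    "en:sweet pepper": {"fruit vegetable", "red", "green", "yellow"},
}


def build_surrogates(members):
    """Group members with identical features under one surrogate (keyed by features)."""
    surrogates = {}
    for word, features in members.items():
        surrogates.setdefault(frozenset(features), []).append(word)
    return surrogates


def distance(features_a, features_b):
    """Number of features present in exactly one of the two sets."""
    return len(features_a ^ features_b)


def neighbours(word, members):
    """Other members ranked by the distance between their surrogate and the word's."""
    surrogates = build_surrogates(members)
    own = frozenset(members[word])
    ranked = []
    for features, words in surrogates.items():
        d = distance(own, features)
        ranked.extend((d, w) for w in words if w != word)
    return sorted(ranked)


print(neighbours("en:tomato", MEMBERS))
```

Members that share a surrogate come out at distance 0, which is how a word in one language reaches its closest counterpart in another.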
42Expected Result
43Keyword Generated
[Diagram, built up step by step over slides 43-50; the final state is listed here.]
- Keyword: tomatoes
- Shared features: fruit vegetable, red
- Keywords generated: Sweet pepper, Tomatoes, Cherry Tomatoes
- Expert domain thesaurus:
- BT VEGETABLES
- BROCCOLI (type: leaf vegetable; color: green)
- SWEET PEPPER (type: fruit vegetable; color: red, green, yellow)
- TOMATOES (type: fruit vegetable; color: red, yellow)
- NT CHERRY TOMATOES (type: fruit vegetable; color: red)
- RT LYCOPERSICON ESCULENTUM (type: taxonomic; color: red)
- BT SOLANACEAE
- NT CAPSICUM, NICOTIANA
51Keyword Generated
[Diagram, built up step by step over slides 51-55; the final state is listed here.]
- Keyword: tomatoes
- Criterion: plant in same family
- Keywords generated: Capsicum, Nicotiana
- Expert domain thesaurus:
- BT VEGETABLES
- BROCCOLI (type: leaf vegetable; color: green)
- SWEET PEPPER (type: fruit vegetable; color: red, green, yellow)
- TOMATOES (type: fruit vegetable; color: red, yellow)
- NT CHERRY TOMATOES (type: fruit vegetable; color: red)
- RT LYCOPERSICON ESCULENTUM (type: taxonomic; color: red)
- BT SOLANACEAE
- NT CAPSICUM, NICOTIANA
56[Diagram combining the user category view with the expert domain thesaurus]
- User category: Commercial Vegetable (broccoli, carrot, tomato)
- Keywords assigned: tomato, tomatoes
- Keywords generated for "tomato": Tomato, Tomatoes, Cherry Tomatoes
- Expert domain thesaurus:
- BT VEGETABLES
- BROCCOLI (type: leaf vegetable; color: green)
- SWEET PEPPER (type: fruit vegetable; color: red, green, yellow)
- TOMATOES (type: fruit vegetable; color: red, yellow)
- NT CHERRY TOMATOES (type: fruit vegetable; color: red)
- RT LYCOPERSICON ESCULENTUM (type: taxonomic; color: red)
- BT SOLANACEAE
- NT CAPSICUM, NICOTIANA
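The keyword-generation diagrams above can be read as two lookups over the thesaurus: one by shared features and one by taxonomic family. The Python sketch below follows that reading under stated assumptions: the dictionary layout and function names are hypothetical, and the BT/NT/RT links are taken from one interpretation of the diagram (TOMATOES is related to LYCOPERSICON ESCULENTUM, whose broader term SOLANACEAE has the narrower terms CAPSICUM and NICOTIANA).

```python
# Thesaurus entries with the type and color attributes shown in the diagrams.
ENTRIES = {
    "BROCCOLI": {"type": "leaf vegetable", "color": {"green"}},
    "SWEET PEPPER": {"type": "fruit vegetable", "color": {"red", "green", "yellow"}},
    "TOMATOES": {"type": "fruit vegetable", "color": {"red", "yellow"}},
    "CHERRY TOMATOES": {"type": "fruit vegetable", "color": {"red"}},
}

# Assumed thesaurus links (related, broader, and narrower terms).
RELATED = {"TOMATOES": "LYCOPERSICON ESCULENTUM"}
BROADER = {"LYCOPERSICON ESCULENTUM": "SOLANACEAE"}
NARROWER = {"SOLANACEAE": ["CAPSICUM", "NICOTIANA"]}


def keywords_by_features(entry_type, color):
    """Terms whose type matches and whose colors include the given color."""
    return [
        term
        for term, attrs in ENTRIES.items()
        if attrs["type"] == entry_type and color in attrs["color"]
    ]


def keywords_in_same_family(term):
    """Follow RT to the taxonomic name, BT to the family, then list its NT terms."""
    taxon = RELATED.get(term)
    family = BROADER.get(taxon)
    return list(NARROWER.get(family, []))


print(keywords_by_features("fruit vegetable", "red"))
print(keywords_in_same_family("TOMATOES"))
```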
57Conclusion and Next steps
- This is a preliminary introduction to the language, showing a few of its many possibilities.
- Structures not described in detail here have not yet been firmly specified. These structures are rules, maps, and contexts, which are incorporated to extend the potential for handling deductions, multilingual operations, domain-dependent retrievals, etc.
58Next Steps
- Revise the idea
- Continue the implementation
- Aligner tool
- GUI tools for thesaurus maintenance
- Short-term solutions to language variability problems by exploiting available knowledge sources with available techniques
- The long-range approach needs high-quality language understanding, i.e., automatic thesaurus construction
- System of Agricultural Information Summarization and Translation
59Thank you