Language and Tools for Lexical Resource Management

Transcript and Presenter's Notes

Title: Language and Tools for Lexical Resource Management


1
Language and Tools for Lexical Resource
Management
  • Asanee Kawtrakul (1)
  • Aree Thunkijjanukij (2)
  • Preeda Lertpongwipusana (1)
  • Poonna Yospanya (1)
  • (1) Department of Computer Engineering, Faculty of
    Engineering,
  • (2) Thai National AGRIS Center
  • Kasetsart University

2
Acknowledgement
  • JIRCAS: Japan International Research Center for
    Agricultural Sciences
  • Organizing committee
  • Kasetsart University

3
Outline
  • Background and Motivation
  • Problems in Lexical Resource Preparation
  • Requirements for Lexical Resource Management
  • Proposed Language and tools
  • Conclusion and Next steps

4
Background and Motivation
  • Thailand is an agriculture-based country
  • with rich knowledge and data in the agricultural
    field
  • A great quantity of agricultural information is
    scattered across unstructured and unrelated text
  • Skimming, digesting, and integrating it becomes
    essential
  • Knowledge is spread around the world
  • Knowledge discovery without language barriers is
    also needed

5
The Basic Idea behind..
Graphical User Interface
6
Textual Data as an Input
  • Let us focus on Canada's agricultural products.
    In 1998, there were 1,216 registered commercial
    egg producers in Canada. Ontario produced 39.8%
    of all eggs in Canada; Quebec was second with
    16.6%. The western provinces have a combined egg
    production of 35.6%, and the eastern provinces
    have a combined production of 8.0%.

Courtesy of Agriculture and Agri-Food Canada,
http://www.agr.ca/cb
7
Summarization and Translation as a Result
8
The Development of Agricultural System for
Knowledge Acquisition and Dissemination
  • 5-year project (2001-2005)
  • A collaboration between
  • Thai National AGRIS Center
  • Providing a bilingual thesaurus (AGROVOC)
  • Department of Computer Engineering
  • Developing NLP techniques for searching,
    summarization, and translation, including tools for
    lexical resource management
  • Funded by Kasetsart University Research and
    Development Institute

9
Acquisition System
Linguist/Domain Expert
Very Large Corpus
Rules
Thesaurus
Lexicon
Linguistic Knowledge Base
  • Intelligent Search Engine
  • With Translation
  • With Summarization

Document Indexing Clustering
Gathering Module
Internet/Intranet
10
Thai Agricultural Thesaurus
  • The total English vocabulary is 27,531 terms
  • Only 10,280 terms have been translated into Thai
    (excluding scientific names)
  • Scientific names are not translated
  • e.g., Oryza (genus) sativa (species) for rice, or
    family names

11
Problems in Hand-Coded Thesaurus
  • Scalability
  • Reliability and Coherence
  • Rigidity
  • Cost

12
Foods
Processed Products
Bakery Product
Canned Products
Dietetic Foods
Dried Products
Frozen Foods
Frozen Products
Fermented Foods
Fermented Products
Alcoholic Beverage
milk
Fermented Foods
Fermented Fish
13
Foods
Processed Products
Products
Fermented Foods
Local Product
Fermented Fish
14
Commercial Vegetables: The September index, at
107, was up 1.9 percent from last month but 3.6
percent below September 1998. Price increases
for lettuce, tomatoes, broccoli, and celery more
than offset price decreases for onions, carrots,
and cucumbers.
Commercial Vegetable
tomatoes
Cucumbers
Carrots
Broccoli
15
Commercial Vegetable
broccoli
carrot
tomato
User Category
tomato
tomatoes
Keyword Assigned
16
Other Major Problems (1)
  • Accessing textual information
  • Language variation
  • Many ways to express the same idea
  • Ex.: "thinning flower" uses "deblossoming"
  • "thinning branch" uses "pruning"
  • How can the computer know that the words a person
    uses are related to words found in stored text?
  • Ex.: user: "thinning branch"
  • computer: "pruning"
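
The lookup problem above can be sketched as a small preferred-term table. This is only an illustration: the two mappings come from the slide, while the function names and the two sample documents are made up for the example.

```python
# Non-preferred phrase -> preferred term (the two pairs from the slide).
USE = {
    "thinning flower": "deblossoming",
    "thinning branch": "pruning",
}


def normalize(phrase: str) -> str:
    """Lowercase and collapse whitespace so lookups tolerate spacing and case."""
    return " ".join(phrase.lower().split())


def to_concept(phrase: str) -> str:
    """Map a user phrase to its preferred term; unknown phrases map to themselves."""
    key = normalize(phrase)
    return USE.get(key, key)


def search(query: str, documents: list[str]) -> list[str]:
    """Return documents that mention the query phrase or its preferred term."""
    wanted = {normalize(query), to_concept(query)}
    return [doc for doc in documents
            if any(term in doc.lower() for term in wanted)]


# Made-up documents, for illustration only.
docs = [
    "Pruning improves light penetration in orchards.",
    "Deblossoming delays fruit set.",
]
print(search("Thinning  branch", docs))
```

The user types "thinning branch", yet the document that only says "pruning" is still found, because the match goes through the concept rather than the character string.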

17
Requirement (1)
  • Accessing textual information
  • Need intelligent browsing from related concept to
    related concept, rather than matching on occurrences
    of stemmed character strings

18
Other Major Problems (2)
  • Transforming unstructured information into
    structured information

19
Requirement (2)
  • Need an application-based frame about product price
  • Knowledge representation in table form
  • Consisting of attributes and their values

Attributes
Values
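
A frame of attributes and values might look like the sketch below, filled from the commercial-vegetables sentence quoted on an earlier slide. The frame fields, the `fill_frame` helper, and the regular expression are assumptions made for this illustration; the pattern only covers sentences shaped like that one.

```python
import re
from dataclasses import dataclass, asdict


@dataclass
class PriceIndexFrame:
    commodity: str
    month: str
    index: float
    pct_vs_last_month: float  # signed: positive means "up"
    pct_vs_year_ago: float    # signed: negative means "below"


# Hypothetical pattern for one sentence shape; not a general extractor.
PATTERN = re.compile(
    r"The (?P<month>\w+) index, at (?P<index>[\d.]+), was "
    r"(?P<dir1>up|down) (?P<pct1>[\d.]+) percent from last month "
    r"(?:but|and) (?P<pct2>[\d.]+) percent (?P<dir2>above|below)"
)


def fill_frame(commodity: str, sentence: str) -> PriceIndexFrame:
    """Fill the frame's attributes from one matching sentence."""
    m = PATTERN.search(sentence)
    if m is None:
        raise ValueError("sentence does not match the expected pattern")
    sign1 = 1 if m["dir1"] == "up" else -1
    sign2 = 1 if m["dir2"] == "above" else -1
    return PriceIndexFrame(
        commodity,
        m["month"],
        float(m["index"]),
        sign1 * float(m["pct1"]),
        sign2 * float(m["pct2"]),
    )


sentence = ("The September index, at 107, was up 1.9 percent from last month "
            "but 3.6 percent below September 1998.")
frame = fill_frame("Commercial Vegetables", sentence)

# Print the frame in table form: one attribute and its value per row.
for attribute, value in asdict(frame).items():
    print(f"{attribute:20} {value}")
```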
20
Problems in Translation: Pragmatic and Semantic
0.97 averagePrice of year from 1990-1992
Using Ontology
September Of year ??
  • The September All Farm Products Index was 97
    percent of its 1990-92 base, down 1.0 percent from
    the August index and 2.0 percent below the
    September 1998 index

August Year1997
Down 0.02price(September 1998)
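
The arithmetic hidden in that sentence can be made explicit. The sketch below assumes that "down 1.0 percent from the August index" and "2.0 percent below" are both measured relative to the earlier value; the function and variable names are illustrative.

```python
def previous_value(current: float, pct_change: float) -> float:
    """Recover the earlier index value from the current value and the
    percent change that led from the earlier value to the current one."""
    return current / (1 + pct_change / 100)


september_current = 97.0                   # percent of the 1990-92 base
ratio_to_base = september_current / 100    # i.e. 0.97 x the base-period average
august = previous_value(september_current, -1.0)
september_1998 = previous_value(september_current, -2.0)

print(ratio_to_base, round(august, 1), round(september_1998, 1))
```

Under that reading the August index is about 98.0 and the September 1998 index about 99.0. A translation system needs this kind of world knowledge to render "97 percent of its 1990-92 base" correctly.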
21
Year 1990-1992 meaning
22
Requirement (3)
  • The lexicon should hold semantic constraints
    between lexical entities and restrictions on usage
    categories

23
Summary of Problems Related to the Lexicon
  • In terms of coverage
  • Extensional coverage, i.e., the number of entries
  • Intensional coverage, i.e., the number of
    information fields
  • In terms of the semantic domain covered by the
    application
  • Meaning interpretation with respect to objects,
    subject matter, topics of discourse, and
    pragmatic interpretation
  • The user category, with reference to the intended
    system users
  • Commercial products vs. plant products vs. family
    products

24
One Solution
  • Encode world knowledge in the structures
    attached to each lexical item, which requires both
    a language and tools

25
The Design of Lexicon: Requirement Specification
  • Macrostructure: lexicon structure in terms of
    relations between lexical entries
  • i.e., hierarchical taxonomies, which are
    characteristic of thesauri of semantically
    related word families
  • Microstructure: types of information for each
    entry
  • Pronunciation or phonemic transcription
  • Syntactic properties
  • Meaning
  • Pragmatics of their use in real context and
    language

26
Microstructure (cont)
  • A lexical entity could contain slots/scripts for
    each specific domain, and needs an intelligent
    analyzer that understands language
  • Supports information extraction
  • Supplies missing values

27
Lexical Resource Management Language
  • which is able to
  • Handle heterogeneity of linguistic knowledge
    structures.
  • Handle exceptions and inconsistencies of natural
    languages.
  • Provide an intuitive means to store and
    manipulate both linguistic and world knowledge.

28
Language Features
  • The language is designed to provide
  • Support for heterogeneous structures.
  • Sufficient provisions to handle exceptions and
    inconsistencies of natural languages (this is
    achieved through the +/- operators).
  • Deduction of knowledge from rules.
  • Detection and prevention of potential integrity
    violations.

29
Language and Tools: Specification Requirements
  • Flexibility: almost any structure can be
    defined in this model.
  • Extensibility: extending a structure is simple.
  • Maturability: structure reformation and
    deformation are supported.
  • Integrity: meta-relations help prevent malformed
    or semantically ill-formed data entries.
  • Dealing with inconsistencies is feasible.

30
Some Syntactic Elements
  • Knowledge manipulations are achieved through
    these primitives:
  • def defines structures that do not already
    exist.
  • redef changes aspects of existing structures.
  • undef removes specified structures from the
    knowledge base.
  • ret is used to retrieve structures from the
    knowledge base.

31
Examples
  • Hierarchies: tree structures representing
    generalization semantics, or classes, of atoms.

thing
animate
inanimate
animal
human
A semantic tree represented by a hierarchy
structure
32
Usage Examples
  • Defining a hierarchy:
  • def thing(animate(human+animal)+inanimate).
  • Adding the plant and vehicle concepts:
  • def animate(plant+vehicle).
  • Reparenting the vehicle concept:
  • redef animate(vehicle) inanimate(vehicle).
  • Removing the human concept:
  • undef human. (provided that there is only a
    single instance of human)
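
The four primitives can be mimicked with a small tree class to show the intended effect of each step. This is a sketch, not the authors' implementation: the class and method names are invented, and two behaviours the slides do not specify are assumed here, namely that `undef` removes the whole subtree and that `ret` returns the path from the root.

```python
class Hierarchy:
    """A concept tree stored as child -> parent links."""

    def __init__(self, root):
        self.root = root
        self.parent = {}  # the root has no entry

    def __contains__(self, node):
        return node == self.root or node in self.parent

    def children(self, node):
        return sorted(c for c, p in self.parent.items() if p == node)

    def descendants(self, node):
        found = []
        for child in self.children(node):
            found.append(child)
            found.extend(self.descendants(child))
        return found

    def define(self, parent, *children):
        """def: add concepts that do not exist yet under an existing parent."""
        if parent not in self:
            raise KeyError(f"unknown parent: {parent}")
        for child in children:
            if child in self:
                raise ValueError(f"already defined: {child}")
            self.parent[child] = parent

    def redefine(self, node, new_parent):
        """redef: move an existing concept (with its subtree) to a new parent."""
        if node not in self.parent or new_parent not in self:
            raise KeyError("both concepts must already exist")
        if new_parent == node or new_parent in self.descendants(node):
            raise ValueError("a concept cannot be moved under itself")
        self.parent[node] = new_parent

    def undefine(self, node):
        """undef: remove a concept; assumed here to remove its subtree too."""
        if node not in self.parent:
            raise KeyError(f"cannot remove: {node}")
        for name in [node, *self.descendants(node)]:
            del self.parent[name]

    def retrieve(self, node):
        """ret: assumed here to return the path from the root to the concept."""
        if node not in self:
            raise KeyError(f"unknown concept: {node}")
        path = [node]
        while path[-1] in self.parent:
            path.append(self.parent[path[-1]])
        return path[::-1]


h = Hierarchy("thing")
h.define("thing", "animate", "inanimate")
h.define("animate", "human", "animal")
h.define("animate", "plant", "vehicle")   # adding plant and vehicle
h.redefine("vehicle", "inanimate")        # reparenting vehicle
h.undefine("human")                       # removing human
print(h.children("animate"))
print(h.retrieve("vehicle"))
```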

33
Usage Examples (2)
  • Defining case frames for verbs:
  • First, we need to define meta-relations for words
    belonging to the sub-hierarchy verb.
  • def meta case(verb, sub:thing).
  • def meta case(verb, sub:thing, obj:thing).
  • Then, we define case frames for several verbs.
  • def case(eat, sub:human+animal, obj:food).
  • def case(fly, sub:bird-penguin). (here, we
    emphasize the use of the +/- operators)
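
One way to read these case frames is as include/exclude constraints over the concept hierarchy: a role accepts any concept under one of the listed alternatives unless it falls under an excluded concept. The sketch below assumes that reading (`+` for alternatives, `-` for exceptions). The hierarchy fragment, including `sparrow`, is made up for illustration.

```python
# Hypothetical hierarchy fragment: child -> parent.
PARENT = {
    "animate": "thing", "food": "thing",
    "human": "animate", "animal": "animate",
    "bird": "animal", "penguin": "bird", "sparrow": "bird",
}


def is_a(concept, ancestor):
    """True if concept equals ancestor or lies somewhere below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def allowed(concept, include, exclude=()):
    """Accept concepts under any included class and under no excluded class."""
    return (any(is_a(concept, c) for c in include)
            and not any(is_a(concept, c) for c in exclude))


# role -> (included concepts, excluded concepts)
CASE_FRAMES = {
    "eat": {"sub": (("human", "animal"), ()), "obj": (("food",), ())},
    "fly": {"sub": (("bird",), ("penguin",))},
}


def check(verb, **roles):
    """Check every supplied role filler against the verb's case frame."""
    frame = CASE_FRAMES[verb]
    return all(role in frame and allowed(filler, *frame[role])
               for role, filler in roles.items())


print(check("fly", sub="sparrow"))   # a bird that is not a penguin
print(check("fly", sub="penguin"))   # the exception carved out with "-"
print(check("eat", sub="human", obj="food"))
```

The exclusion lets the general rule "birds fly" stand while recording the exception, which is how exceptions and inconsistencies of natural language are accommodated here.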

34
Hierarchy and Set
[Figure: a hierarchy (labels c1, c2, p1, w1-w7) and a set (labels c3, f1-f4)]
35
Defining a Hierarchy
def c1(w1(w3)+c2(w4)+w2).
def w5+w6 under w4.
def p1(w7) under w2.
[Figure: the resulting tree rooted at c1, with nodes w1, w2, c2, p1, w3, w4, w5, w6, w7]
36
Manipulating the Hierarchy
redef w4 under w2.
undef w1.
[Figure: the tree rooted at c1 being modified; node labels w1, w2, c2, p1, w3, w4, w5, w6, w7]
37
Defining a Set
def c3f1f2f3.
def f4 in c3.
[Figure: set c3 with members f1, f2, f3, f4]
38
Defining a Relation
def meta r1(c2, c3).
Template defined.
def r1(w4, f1).
Relation defined.
def r1(w1, f3).
Constraint violated. Definition not allowed.
[Figure: relation r1 between hierarchy c2 (w4, w5, w6) and set c3 (f1-f4), with an "inherited" label; w1 is also shown]
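
The session above can be imitated with a small store that validates each relation against its template. This sketch rests on assumptions: the tree is the one built on the "Defining a Hierarchy" slide, the "inherited" label is read as descendants of w4 inheriting its relation, and the class and method names are invented. The three messages are the ones shown on the slide.

```python
# Tree from the "Defining a Hierarchy" slide (child -> parent) and set c3.
PARENT = {"w1": "c1", "c2": "c1", "w2": "c1", "w3": "w1", "w4": "c2",
          "w5": "w4", "w6": "w4", "p1": "w2", "w7": "p1"}
SETS = {"c3": {"f1", "f2", "f3", "f4"}}


def lineage(node):
    """Yield the node itself, then each ancestor up to the root."""
    while node is not None:
        yield node
        node = PARENT.get(node)


class RelationStore:
    def __init__(self):
        self.templates = {}  # relation name -> (hierarchy node, set name)
        self.facts = set()   # (relation name, subject, value)

    def def_meta(self, name, domain, range_set):
        """Register a template: subjects under `domain`, values in `range_set`."""
        self.templates[name] = (domain, range_set)
        return "Template defined."

    def define(self, name, subject, value):
        """Add a relation only if it satisfies the template."""
        domain, range_set = self.templates[name]
        if domain not in lineage(subject) or value not in SETS[range_set]:
            return "Constraint violated. Definition not allowed."
        self.facts.add((name, subject, value))
        return "Relation defined."

    def holds(self, name, subject, value):
        """True if stated for the subject or inherited from an ancestor."""
        return any((name, node, value) in self.facts
                   for node in lineage(subject))


store = RelationStore()
print(store.def_meta("r1", "c2", "c3"))
print(store.define("r1", "w4", "f1"))
print(store.define("r1", "w1", "f3"))   # w1 is not under c2
print(store.holds("r1", "w5", "f1"))    # inherited from w4
```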
39
Synset Surrogates
  • A synset is an unnamed set identified by its
    unique ID.
  • Members of a synset are considered synonymous
    with different degrees of synonymy.
  • A distance graph is automatically constructed
    within a synset, with surrogates being
    representatives of synset members.
  • Entities with identical features are attached to
    the same surrogates.

40
Synset Surrogates
[Figure: surrogate network internally constructed for synset1; labels include entities w1-w4, w6, and p1-p3, surrogates s1-s5, and features f1-f4]
41
Synset Multilingual Lexicon
  • Synset members are not confined to a single
    language; that is, entities from different languages
    may belong to the same synset.
  • The distance matrix is computed from the number of
    differing features for each pair of surrogates.
  • Traversal from a word to its nearest words
    is handled by the system, so we can determine the
    words with potentially the closest semantics.
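
A minimal sketch of such a distance computation follows. It takes "number of different features" to mean the size of the symmetric difference of two feature sets, which is an assumption. The surrogates, their features, and the attached words are made up for illustration.

```python
from itertools import combinations

# Hypothetical surrogates and their feature sets.
SURROGATES = {
    "s1": {"f1", "f2"},
    "s2": {"f1", "f2", "f4"},
    "s3": {"f3"},
}
# Hypothetical words, possibly from different languages, attached to surrogates.
ATTACHED = {"w1": "s1", "w2": "s2", "w3": "s3"}


def distance(a, b):
    """Count the features held by exactly one of the two surrogates."""
    return len(SURROGATES[a] ^ SURROGATES[b])


def distance_matrix():
    """Symmetric matrix of pairwise surrogate distances, as nested dicts."""
    matrix = {s: {s: 0} for s in SURROGATES}
    for a, b in combinations(SURROGATES, 2):
        matrix[a][b] = matrix[b][a] = distance(a, b)
    return matrix


def nearest_words(word):
    """Other words in the synset, ordered from nearest to farthest."""
    matrix = distance_matrix()
    home = ATTACHED[word]
    others = [(matrix[home][s], w) for w, s in ATTACHED.items() if w != word]
    return [w for _, w in sorted(others)]


print(distance_matrix()["s1"])
print(nearest_words("w1"))
```

Words whose surrogates share identical features end up at distance 0, which matches attaching entities with identical features to the same surrogate.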

42
Expected Result
43
Keyword Generated
44
Fruit vegetable, red
Keyword Generated
45
BT
VEGETABLES
tomatoes
Fruit vegetable, red
Keyword Generated
Expert Domain
46
BT
VEGETABLES
BROCCOLI
type: leaf vegetable
color: green
tomatoes
Fruit vegetable, red
Keyword Generated
Expert Domain
47
BT
VEGETABLES
BROCCOLI
type: leaf vegetable
color: green
SWEET PEPPER
type: fruit vegetable
color: red, green, yellow
tomatoes
Fruit vegetable, red
Sweet pepper
Keyword Generated
Expert Domain
48
BT
VEGETABLES
BROCCOLI
type: leaf vegetable
color: green
SWEET PEPPER
type: fruit vegetable
color: red, green, yellow
TOMATOES
type: fruit vegetable
color: red, yellow
tomatoes
Fruit vegetable, red
Sweet pepper
Tomatoes
Keyword Generated
Expert Domain
49
BT
VEGETABLES
BROCCOLI
type: leaf vegetable
color: green
SWEET PEPPER
type: fruit vegetable
color: red, green, yellow
TOMATOES
type: fruit vegetable
color: red, yellow
NT
tomatoes
CHERRY TOMATOES
type: fruit vegetable
color: red
Fruit vegetable, red
Sweet pepper
Tomatoes
Cherry Tomatoes
Keyword Generated
Expert Domain
50
BT
VEGETABLES
BROCCOLI
type: leaf vegetable
color: green
SWEET PEPPER
type: fruit vegetable
color: red, green, yellow
TOMATOES
type: fruit vegetable
color: red, yellow
NT
tomatoes
CHERRY TOMATOES
type: fruit vegetable
color: red
RT
LYCOPERSICON ESCULENTUM
BT
type: taxonomic
Fruit vegetable, red
SOLANACEAE
Sweet pepper
color: red
NT
Tomatoes
CAPSICUM
Cherry Tomatoes
NICOTIANA
Keyword Generated
Expert Domain
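
The build-up on these slides suggests that keywords are generated by matching requested attribute values against thesaurus entries. A minimal sketch under that reading follows. The four entries and their attributes are taken from the slides, while the data layout and the `generate_keywords` function are assumptions.

```python
# Thesaurus entries and attribute values as shown on the slides.
THESAURUS = {
    "BROCCOLI": {"type": "leaf vegetable", "color": {"green"}},
    "SWEET PEPPER": {"type": "fruit vegetable",
                     "color": {"red", "green", "yellow"}},
    "TOMATOES": {"type": "fruit vegetable", "color": {"red", "yellow"}},
    "CHERRY TOMATOES": {"type": "fruit vegetable", "color": {"red"}},
}


def generate_keywords(kind, color):
    """Return every term whose type matches and whose colors include `color`."""
    return [term for term, attrs in THESAURUS.items()
            if attrs["type"] == kind and color in attrs["color"]]


print(generate_keywords("fruit vegetable", "red"))
```

For "Fruit vegetable, red" this yields sweet pepper, tomatoes, and cherry tomatoes, the same three keywords listed on the slide, while broccoli is filtered out.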
51
Keyword Generated
52
Plant in same family
Keyword Generated
53
BT
VEGETABLES
BROCCOLI
type: leaf vegetable
color: green
SWEET PEPPER
type: fruit vegetable
color: red, green, yellow
TOMATOES
type: fruit vegetable
color: red, yellow
NT
tomatoes
CHERRY TOMATOES
type: fruit vegetable
color: red
RT
LYCOPERSICON ESCULENTUM
BT
type: taxonomic
Plant in same family
SOLANACEAE
Capsicum
color: red
NT
CAPSICUM
Keyword Generated
Expert Domain
54
BT
VEGETABLES
BROCCOLI
type: leaf vegetable
color: green
SWEET PEPPER
type: fruit vegetable
color: red, green, yellow
TOMATOES
type: fruit vegetable
color: red, yellow
NT
tomatoes
CHERRY TOMATOES
type: fruit vegetable
color: red
RT
LYCOPERSICON ESCULENTUM
BT
type: taxonomic
Plant in same family
SOLANACEAE
Capsicum
color: red
NT
Nicotiana
CAPSICUM
NICOTIANA
Keyword Generated
Expert Domain
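
The "plant in same family" query can be read as a short walk over thesaurus links: from the term to its related taxon (RT), up to the family (BT), then down to the family's other narrower terms (NT). The sketch below assumes that reading of the diagram. The link tables reflect the labels on the slide, and the function names are invented.

```python
# Links as read from the diagram (an interpretation, not a confirmed layout).
RT = {"TOMATOES": ["LYCOPERSICON ESCULENTUM"]}
BT = {
    "LYCOPERSICON ESCULENTUM": "SOLANACEAE",
    "CAPSICUM": "SOLANACEAE",
    "NICOTIANA": "SOLANACEAE",
}


def narrower(term):
    """NT: every term whose broader term is `term`."""
    return [t for t, broader in BT.items() if broader == term]


def same_family(term):
    """Follow RT to the taxon, BT to its family, then list the other NTs."""
    results = []
    for taxon in RT.get(term, []):
        family = BT.get(taxon)
        if family is None:
            continue
        results.extend(t for t in narrower(family) if t != taxon)
    return results


print(same_family("TOMATOES"))
```

Starting from tomatoes, the walk reaches Solanaceae and returns Capsicum and Nicotiana, the two keywords generated on the slide.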
55
BT
VEGETABLES
BROCCOLI
type: leaf vegetable
color: green
SWEET PEPPER
type: fruit vegetable
color: red, green, yellow
TOMATOES
type: fruit vegetable
color: red, yellow
NT
tomatoes
CHERRY TOMATOES
type: fruit vegetable
color: red
RT
LYCOPERSICON ESCULENTUM
BT
type: taxonomic
Plant in same family
SOLANACEAE
Capsicum
color: red
NT
Nicotiana
CAPSICUM
NICOTIANA
Keyword Generated
Expert Domain
56
BT
Commercial Vegetable
VEGETABLES
BROCCOLI
broccoli
type: leaf vegetable
color: green
carrot
SWEET PEPPER
type: fruit vegetable
tomato
color: red, green, yellow
User Category
TOMATOES
type: fruit vegetable
tomato
color: red, yellow
NT
tomatoes
tomatoes
CHERRY TOMATOES
type: fruit vegetable
Keyword Assigned
color: red
RT
LYCOPERSICON ESCULENTUM
BT
type: taxonomic
tomato
SOLANACEAE
Tomato
color: red
NT
Tomatoes
CAPSICUM
Cherry Tomatoes
NICOTIANA
Keyword Generated
Expert Domain
57
Conclusion and Next steps
  • This is a preliminary introduction to the
    language, showing a few of its many possibilities.
  • Structures not described in detail here have not
    yet been firmly specified. These structures are
    rules, maps, and contexts, which are incorporated
    to extend the language's potential for handling
    deductions, multilingual operations,
    domain-dependent retrievals, etc.

58
Next Steps
  • Revise the idea
  • Continue the implementation
  • Aligner tool
  • GUI tools for thesaurus maintenance
  • Short-term solutions to language variability
    problems by exploiting available knowledge
    sources with available techniques
  • The long-range approach needs high-quality language
    understanding, i.e., automatic thesaurus
    construction
  • System for agricultural information summarization
    and translation

59
Thank you