Chen Yirong, Lu Qin, Li Wenjie, Cui Gaoying - PowerPoint PPT Presentation

About This Presentation

Slides: 27

Provided by: csyr

Learn more at: http://www.lrec-conf.org

Transcript and Presenter's Notes

1
Chinese Core Ontology Construction from a
Bilingual Term Bank

Chen Yirong, Lu Qin, Li Wenjie, Cui Gaoying
Department of Computing
The Hong Kong Polytechnic University

2
Outline

Introduction
Related Works
Algorithm Design: COCA
Performance Evaluation
Conclusion

3
Introduction

What is a core ontology?
A mid-level ontology
Bridges the gap between an upper ontology and a
domain ontology

4
Concepts and Terminologies

Upper Ontology
A general ontology to ensure reusability across
different domains (e.g. Computer Program in
SUMO)
Domain Ontology
An ontology that conceptualizes a specific domain
(e.g. Free Software in the IT domain)
More application-dependent; more extents of
concepts
Mid-level Ontology (Core Concept)
Basic concepts of a domain
More application-independent; more intents of
concepts
Core ontology (e.g. Software)
Frequently used; able to form other concepts
Core Terms
Lexical units of core concepts

5
Related Works

Manually constructed ontologies
SUMO
Well-known upper-level ontology work based on
lexicons
CoreLex (Buitelaar, P., 1998)
EuroWordNet (Rodríguez, 1998)
Ontology harmonization: core ontology
Towards a Core Ontology for Information
Integration (M. Doerr, 2003)
The most similar work
Enriching Core Ontology with Domain Thesaurus
through Concept and Relation Classification
(Huang, 2007)
Uses concept and relation classification to enrich
a core ontology

6
Our Previous Works

Chinese terminology extraction
Chinese core term extraction (Ji et al., 2007)
Preliminary work on automatic core ontology
construction using an English-Chinese term bank
(MRCOCA; Ontolex 2007; Chen, 2007)
Bilingual lexicon
Extended strings
Frequency information in synsets
Weights from extended strings are integrated into
the final weight by simple addition
Mapping to synsets and SUMO achieves an
accuracy of only about 50%

7
Issues

What kind of concept should be included?
How to identify core concepts?
If through core terms, disambiguation is needed
What and how to identify relations?
Making use of available resources
Chinese NLP resources are scarce
English NLP resources are abundant

8
Requirements of Core Ontology

The concepts must be widely accepted and commonly
referenced
Corresponding core terms must be frequently used and
productive
The concepts/terms can be mapped to an upper
ontology, so the core ontology can inherit the
attributes provided by the upper ontology

9
Core Ontology Construction Algorithm (COCA) for
Chinese

Extract Chinese core terms from a bilingual term
bank
Map each core term $T_C$ to English terms
Map the English terms to WordNet synsets
Map each synset to an upper ontology concept in
SUMO
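The four steps above can be sketched as a chain of dictionary lookups. This is a minimal illustration, not the authors' implementation: the function name `candidate_paths` and the toy dictionaries (standing in for CETBank, WordNet, and the WordNet-to-SUMO mappings) are hypothetical, the synset identifiers and SUMO labels are made up, and step 1 (core term extraction) is assumed to have already produced the core term.

```python
from typing import Dict, List, Optional, Tuple

def candidate_paths(
    core_term: str,
    bilingual: Dict[str, List[str]],      # Chinese term -> English translations
    en_to_synsets: Dict[str, List[str]],  # English term -> synset ids
    synset_to_sumo: Dict[str, str],       # synset id -> SUMO concept
) -> List[Tuple[str, str, Optional[str]]]:
    """Enumerate (English term, synset, SUMO concept) paths for one core term."""
    paths = []
    for te in bilingual.get(core_term, []):        # step 2: core term -> English terms
        for syn in en_to_synsets.get(te, []):       # step 3: English term -> synsets
            # step 4: synset -> SUMO concept (None if no mapping is known)
            paths.append((te, syn, synset_to_sumo.get(syn)))
    return paths

# Made-up toy resources for illustration only (not real CETBank/WordNet/SUMO data).
bilingual = {"程序": ["program", "procedure"]}
en_to_synsets = {
    "program": ["syn:computer_program", "syn:plan"],
    "procedure": ["syn:computer_program", "syn:process"],
}
synset_to_sumo = {"syn:computer_program": "ComputerProgram", "syn:plan": "Plan"}

for path in candidate_paths("程序", bilingual, en_to_synsets, synset_to_sumo):
    print(path)
```

A single core term usually yields several candidate paths; the remaining COCA modules score them so that one synset can be selected.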

10
COCA - Resources Used

ITCTerm
a domain-specific core term list (Chen, 2007)
CETBank
Chinese-English bilingual term bank
The 1,500 most productive core terms extracted can
serve as suffixes to form more than 50% of the
terms in CETBank
WordNet
SUMO
Mappings between WordNet and SUMO
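The productivity figure for CETBank can be checked with a simple suffix-coverage count. In this sketch, `suffix_coverage` and the toy term list are hypothetical, and an entry counts as "formed" by a core term only if it is strictly longer than the core term and ends with it; the slide does not state the exact counting rule.

```python
from typing import Iterable

def suffix_coverage(core_terms: Iterable[str], term_bank: Iterable[str]) -> float:
    """Fraction of term-bank entries formed by a core term used as a suffix.

    Assumption: an entry counts only if it is longer than the core term and ends
    with it, so a core term does not "form" itself.
    """
    cores = set(core_terms)
    terms = list(term_bank)
    if not terms:
        return 0.0
    covered = sum(
        1 for t in terms if any(t.endswith(c) and len(t) > len(c) for c in cores)
    )
    return covered / len(terms)

# Toy data: "software" and "system" as core terms.
core_terms = ["软件", "系统"]
term_bank = ["软件", "应用软件", "操作系统", "自由软件", "数据库", "网络"]
print(suffix_coverage(core_terms, term_bank))  # 3 of 6 entries -> 0.5
```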

11
The Framework of COCA
12
COCA - Statistical Translation Module

Translation ambiguity
Each Chinese core term $T_C \in$ ITCTerm has a set
of translations $T\_Set_E$, with $T_E \in T\_Set_E$
Objective
to estimate the likelihood of every translation
using the extended terms of $T_C$:
$P(T_E \mid T_C)$ for all $T_E \in T\_Set_E$.
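The slide does not give the estimation formula, so the following is only one plausible reading: count how often each candidate translation $T_E$ occurs, as a whole word or phrase, in the English translations of the extended terms of $T_C$ (term-bank entries ending with $T_C$), add one to each count for smoothing, and normalize. The function name `translation_probs`, the add-one smoothing, and the toy entries are assumptions.

```python
from collections import Counter
from typing import Dict, List

def translation_probs(core_term: str, candidates: List[str],
                      term_bank: Dict[str, str]) -> Dict[str, float]:
    """Estimate P(TE | TC) from translations of TC's extended terms (assumed method)."""
    counts = Counter({te: 1 for te in candidates})  # add-one smoothing
    for zh, en in term_bank.items():
        # Extended terms: longer entries that use the core term as their suffix.
        if zh != core_term and zh.endswith(core_term):
            padded = f" {en.lower()} "
            for te in candidates:
                # Whole-word/phrase match, so "program" does not match "programming".
                if f" {te.lower()} " in padded:
                    counts[te] += 1
    total = sum(counts.values())
    return {te: counts[te] / total for te in candidates}

# Toy Chinese -> English term bank (illustrative only).
term_bank = {
    "程序": "program",
    "应用程序": "application program",
    "计算机程序": "computer program",
    "子程序": "subroutine",
    "操作程序": "operating procedure",
}
print(translation_probs("程序", ["program", "procedure"], term_bank))
```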

13
COCA - Sense Disambiguation Module

Maps a given $T_C$ to a synset $S$ through its
translation set $T\_Set_E(T_C)$
Mapping probability of an English term $T_E$ to a
synset $S$, using frequency information in WordNet
Mapping probability of $T_C$ to a particular
synset $S$ via an English translation $T_E$
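A sketch of the two probabilities above, under stated assumptions: $P(S \mid T_E)$ is taken as the add-one-smoothed relative sense frequency (WordNet records tagged-corpus counts per sense), and the probability of reaching $S$ from $T_C$ via one translation $T_E$ is assumed to be the product $P(T_E \mid T_C)\,P(S \mid T_E)$. The function names, sense counts, and synset names below are made up.

```python
from typing import Dict, List, Tuple

def sense_probs(sense_counts: Dict[str, int]) -> Dict[str, float]:
    """P(S | TE) from WordNet-style sense frequency counts, add-one smoothed."""
    total = sum(c + 1 for c in sense_counts.values())
    return {s: (c + 1) / total for s, c in sense_counts.items()}

def path_probs(p_te_given_tc: Dict[str, float],
               senses_by_te: Dict[str, Dict[str, int]]) -> List[Tuple[str, str, float]]:
    """Probability of each path TC -> TE -> S, assumed to be P(TE|TC) * P(S|TE)."""
    paths = []
    for te, p_te in p_te_given_tc.items():
        for synset, p_s in sense_probs(senses_by_te.get(te, {})).items():
            paths.append((te, synset, p_te * p_s))
    return paths

# Toy inputs: translation probabilities and made-up sense counts.
p_te_given_tc = {"program": 0.6, "procedure": 0.4}
senses_by_te = {
    "program": {"syn:computer_program": 3, "syn:plan": 1},
    "procedure": {"syn:computer_program": 1, "syn:process": 3},
}
for path in path_probs(p_te_given_tc, senses_by_te):
    print(path)
```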

14
COCA - Concept Selection Module

Combining three features
multi-path feature
hypernyms feature
part-of-speech feature
Using Union Probability of Independent Events
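For reference, the union probability of independent events $A_1, \dots, A_n$ is given below; how each feature is turned into an event probability is not specified on the slide.

```latex
P\Big(\bigcup_{i=1}^{n} A_i\Big) = 1 - \prod_{i=1}^{n} \bigl(1 - P(A_i)\bigr)
```

With three feature values $f_1, f_2, f_3$ read as probabilities, this reduces to $1 - (1 - f_1)(1 - f_2)(1 - f_3)$.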

15
Feature 1: Multiple Paths to a Synset

Multiple paths are
the paths between a Chinese core term and a synset
via different English translations

The feature merges the probabilities of the multiple
paths
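One way to read the multi-path feature, sketched under the assumption that each path from $T_C$ to a synset (one per English translation) is treated as an independent event: group the path probabilities by synset and merge them with the union formula. The path probabilities are toy values, and `multi_path_feature` is a hypothetical name.

```python
from collections import defaultdict
from math import prod
from typing import Dict, List, Tuple

def union_prob(ps: List[float]) -> float:
    """P(A1 or ... or An) for independent events with probabilities ps."""
    return 1.0 - prod(1.0 - p for p in ps)

def multi_path_feature(paths: List[Tuple[str, str, float]]) -> Dict[str, float]:
    """Merge the probabilities of all paths that reach the same synset."""
    by_synset: Dict[str, List[float]] = defaultdict(list)
    for _te, synset, p in paths:
        by_synset[synset].append(p)
    return {s: union_prob(ps) for s, ps in by_synset.items()}

# Toy path probabilities: (English term, synset, probability).
paths = [
    ("program", "syn:computer_program", 0.4),
    ("program", "syn:plan", 0.2),
    ("procedure", "syn:computer_program", 0.4 / 3),
    ("procedure", "syn:process", 0.8 / 3),
]
print(multi_path_feature(paths))
```

In this toy case the synset reached through both "program" and "procedure" scores 0.48, above its best single path (0.4), which is presumably the point of merging paths.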
16
Feature 2: Hyponyms in Domain

Incorporates information from all the extended strings

An extended string uses the core term as its headword
and is a hyponym of the core term
Length Ratio
Union Probability of Independent Events
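The slide names a length ratio but not the exact formula, so this is a hypothetical sketch: each extended string whose English translation contains a lemma of the candidate synset contributes evidence equal to len(core term) / len(extended string), and the pieces of evidence are combined as a union of independent events. Lemmas are matched as single whitespace-separated words; `hyponym_feature` and the data are illustrative.

```python
from math import prod
from typing import Dict, List

def hyponym_feature(core_term: str, synset_lemmas: List[str],
                    term_bank: Dict[str, str]) -> float:
    """Hypothetical hyponym feature: union of length-ratio evidence from extended strings."""
    evidence = []
    for zh, en in term_bank.items():
        # Extended strings: longer terms whose headword (suffix) is the core term.
        if zh != core_term and zh.endswith(core_term):
            words = en.lower().split()
            if any(lemma in words for lemma in synset_lemmas):
                # Assumed rationale: shorter extensions stay closer to the core term.
                evidence.append(len(core_term) / len(zh))
    # With no evidence, prod() of an empty sequence is 1, giving 0.0.
    return 1.0 - prod(1.0 - p for p in evidence)

term_bank = {"应用程序": "application program", "子程序": "subroutine",
             "计算机程序": "computer program"}
print(hyponym_feature("程序", ["program"], term_bank))
```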
17
Feature 3: Part of Speech

Probability of the POS tag pos(S)
of a synset S,
given a core term $T_C$: $P(\mathrm{pos}(S) \mid T_C)$
POS tag estimation: heuristics for adjectives, verbs,
and nouns based on position
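The positional heuristics are not spelled out on the slide. The sketch below uses placeholder rules (a core term in final, head position is tagged as a noun; in initial position as an adjective-like modifier; verb rules are omitted) purely to show how a POS distribution, and hence $P(\mathrm{pos}(S) \mid T_C)$, could be estimated from term positions. None of these rules should be taken as the authors' heuristics.

```python
from collections import Counter
from typing import Dict, Iterable

# Placeholder positional rules, NOT the authors' heuristics:
# final (head) position -> noun, initial (modifier) position -> adjective.
POSITION_TO_POS = {"final": "n", "initial": "a"}

def pos_distribution(core_term: str, terms: Iterable[str]) -> Dict[str, float]:
    """Estimate a POS distribution for a core term from where it occurs in longer terms."""
    counts: Counter = Counter()
    for t in terms:
        if t == core_term or core_term not in t:
            continue
        if t.endswith(core_term):
            counts[POSITION_TO_POS["final"]] += 1
        elif t.startswith(core_term):
            counts[POSITION_TO_POS["initial"]] += 1
        # Medial occurrences are ignored in this sketch.
    total = sum(counts.values())
    return {pos: c / total for pos, c in counts.items()} if total else {}

def pos_feature(synset_pos: str, core_term: str, terms: Iterable[str]) -> float:
    """P(pos(S) | TC): probability of the synset's POS tag given the core term."""
    return pos_distribution(core_term, terms).get(synset_pos, 0.0)

terms = ["应用软件", "自由软件", "软件工程", "软件"]
print(pos_feature("n", "软件", terms))
```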

18
Integrate Features

Using Union Probability of Independent Events
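Putting the pieces together: assuming each candidate synset has three feature values in [0, 1], a minimal sketch combines them with the union formula and keeps the highest-scoring synset, matching the algorithm output described in the evaluation (one synset per core term, with the highest mapping weight). `select_synset` and the feature values are hypothetical.

```python
from math import prod
from typing import Dict, Tuple

def union_prob(*ps: float) -> float:
    """Union probability of independent events."""
    return 1.0 - prod(1.0 - p for p in ps)

def select_synset(features: Dict[str, Tuple[float, float, float]]) -> Tuple[str, float]:
    """Combine (multi-path, hyponym, POS) features per synset and return the best one."""
    weights = {s: union_prob(*f) for s, f in features.items()}
    best = max(weights, key=weights.get)
    return best, weights[best]

# Toy feature values for one core term: (multi-path, hyponym, POS).
features = {
    "syn:computer_program": (0.48, 0.7, 2 / 3),
    "syn:plan": (0.2, 0.0, 2 / 3),
}
print(select_synset(features))
```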

19
Evaluation

Algorithm Output
For each Chinese core term, the pair
$\langle T_{C_i}, \mathit{Synset}_i \rangle$ with the highest mapping weight
Evaluation Standard
For each $T_{C_i}$, whether its mapping to a synset
is the best match with respect to this domain
Answer Preparation
The answers are prepared manually by two IT-domain
experts, each working separately on the same set of data

Performance
The evaluation is conducted on the N most frequent
core terms
COCA achieves 71% accuracy (N = 28 in this paper),
compared with only 50% for MRCOCA (Chen, 2007)
Two examples of core-term-to-synset mappings
generated by the algorithm are given for ?? and
??.
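Accuracy here is the share of the N evaluated core terms whose selected synset is judged correct. The slide does not say how the two experts' answers are combined, so this sketch assumes, hypothetically, that either expert's answer is accepted; `merge_answers`, `accuracy`, and the data are illustrative.

```python
from typing import Dict, Set

def merge_answers(expert_a: Dict[str, str], expert_b: Dict[str, str]) -> Dict[str, Set[str]]:
    """Hypothetical rule: a synset is acceptable if either expert chose it."""
    return {
        tc: {s for s in (expert_a.get(tc), expert_b.get(tc)) if s is not None}
        for tc in expert_a.keys() | expert_b.keys()
    }

def accuracy(predicted: Dict[str, str], gold: Dict[str, Set[str]]) -> float:
    """Share of core terms whose selected synset is among the acceptable answers."""
    if not predicted:
        return 0.0
    correct = sum(1 for tc, s in predicted.items() if s in gold.get(tc, set()))
    return correct / len(predicted)

expert_a = {"程序": "syn:computer_program", "软件": "syn:software",
            "网络": "syn:network", "数据": "syn:data"}
expert_b = {"程序": "syn:computer_program", "软件": "syn:software",
            "网络": "syn:network", "数据": "syn:datum"}
predicted = {"程序": "syn:computer_program", "软件": "syn:software",
             "网络": "syn:net", "数据": "syn:data"}
print(accuracy(predicted, merge_answers(expert_a, expert_b)))
```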

21
(No Transcript)
22
Conclusion

Repeated evaluation of COCA on an English-Chinese
bilingual term bank with more than 130K entries
shows that the algorithm improves accuracy by 42%
over MRCOCA (our previous work)
The three features and the new probability-based
algorithm account for the improvement
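Reading the 42% as a relative improvement over MRCOCA's 50% accuracy is consistent with the 71% reported for COCA in the evaluation; the absolute gain is 21 percentage points:

```latex
\frac{71\% - 50\%}{50\%} = \frac{21}{50} = 0.42 = 42\%
```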

A term bank can help quickly construct a domain
core ontology by selecting the concept nodes and
relations used in the domain
A bilingual term bank can further provide the
second-language realization of the core ontology
effectively and automatically

24
Future Works

Evaluation of the three features
how effective they are
how much each contributes to the final performance
Consideration of more features, such as
abbreviations and the synset of the core term's
headword
Use of other resources

26
Q