Learn more at: http://www.lrec-conf.org

1
Chinese Core Ontology Construction from a
Bilingual Term Bank
  • Chen Yirong, Lu Qin, Li Wenjie, Cui Gaoying
  • Department of Computing
  • The Hong Kong Polytechnic University

2
Outline
  • Introduction
  • Related Works
  • Algorithm Design: COCA
  • Performance Evaluation
  • Conclusion

3
Introduction
  • What is a Core Ontology?
  • A mid-level ontology
  • Bridges the gap between an upper ontology and a
    domain ontology

4
Concepts and Terminologies
  • Upper Ontology
  • A general ontology to ensure reusability across
    different domains (e.g. Computer Program in
    SUMO)
  • Domain Ontology
  • An ontology that conceptualizes a specific domain
    (e.g. Free Software in the IT domain)
  • More application-dependent; more of the extension of
    concepts
  • Mid-level Ontology (Core Concept)
  • Basic concepts of a domain
  • More application-independent; more of the intension of
    concepts
  • Core ontology (e.g. Software)
  • Frequently used, with the ability to form other concepts
  • Core Terms
  • Lexical units of core concepts

5
Related Works
  • Manually constructed ontologies
  • SUMO
  • Well-known upper-level ontology work based on
    lexicons
  • CoreLex (Buitelaar, P., 1998)
  • EuroWordNet (Rodríguez, 1998)
  • Ontology harmonization: core ontology
  • Towards a Core Ontology for Information
    Integration (M. Doerr, 2003)
  • The most similar work
  • Enriching Core Ontology with Domain Thesaurus
    through Concept and Relation Classification
    (Huang, 2007)
  • Uses concept and relation classification to enrich
    a core ontology

6
Our Previous Works
  • Chinese terminology extraction
  • Chinese core term extraction (Ji et al., 2007)
  • Preliminary work on automatic core ontology
    construction using an English-Chinese term bank
    (MRCOCA; OntoLex 2007; Chen, 2007)
  • Bilingual lexicon
  • Extended strings
  • Frequency information in synset
  • Weights from extended strings are integrated into
    the final weight by simple addition
  • Mapping to synset and SUMO can only achieve
    accuracy of about 50%

7
Issues
  • What kinds of concepts should be included?
  • How to identify core concepts?
  • If through core terms, disambiguation is needed
  • Which relations to identify, and how?
  • Making use of available resources
  • Chinese NLP resources are scarce
  • English NLP resources are abundant

8
Requirements of Core Ontology
  • The concepts must be widely accepted and commonly
    referenced
  • Corresponding core terms must be frequently used and
    productive
  • The concepts/terms can be mapped to an upper
    ontology, so the core ontology can inherit the
    attributes provided by the upper ontology

9
Core Ontology Construction Algorithm (COCA) for
Chinese
  • Extract Chinese core terms from a bilingual term
    bank
  • Map each core term TC to its English terms
  • Map the English terms to WordNet synsets
  • Map each synset to an upper ontology concept in
    SUMO

10
COCA - Resources Used
  • ITCTerm
  • A domain-specific core term list (Chen, 2007)
  • CETBank
  • Chinese-English bilingual term bank
  • The 1,500 most productive core terms extracted can
    serve as suffixes to form more than 50% of the
    terms in CETBank
  • WordNet
  • SUMO
  • Mappings between WordNet and SUMO

11
The Framework of COCA
12
COCA - Statistical Translation Module
  • Translation ambiguity
  • Each Chinese core term TC ∈ ITCTerm has a set
    of translations T_SetE, with TE ∈ T_SetE
  • Objective
  • To estimate the likelihood of every translation
    using the extended terms of TC
  • P(TE | TC) for all TE ∈ T_SetE
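
The slides do not preserve the estimator itself, so the following is only a minimal sketch of one plausible reading of the objective: treat every bank entry whose Chinese side ends with TC as an extended term, count how often each candidate translation TE closes the English side, and normalize. The function name, the toy entries, and the add-one smoothing are assumptions made for illustration, not the authors' method.

```python
from collections import Counter

def translation_likelihoods(core_term, candidates, term_bank, smoothing=1.0):
    """Estimate P(TE | TC) for each candidate translation TE.

    Assumption (not from the slides): an extended term is any bank entry
    whose Chinese side ends with the core term, and it supports a candidate
    when its English side ends with that candidate.
    """
    counts = Counter({te: 0 for te in candidates})
    for zh, en in term_bank:
        if zh != core_term and zh.endswith(core_term):
            en_lower = en.lower()
            for te in candidates:
                if en_lower.endswith(te.lower()):
                    counts[te] += 1
    # Add-one style smoothing keeps unseen translations above zero.
    total = sum(counts.values()) + smoothing * len(candidates)
    return {te: (counts[te] + smoothing) / total for te in candidates}

# Hypothetical toy term bank: (Chinese term, English term) pairs.
toy_bank = [
    ("自由软件", "free software"),
    ("系统软件", "system software"),
    ("应用软件", "application program"),
]
probs = translation_likelihoods("软件", ["software", "program"], toy_bank)
```

With this toy bank, "software" is supported by two extended terms and "program" by one, so the smoothed estimate favors "software".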

13
COCA - Sense Disambiguation Module
  • Map a given TC to a synset S through its
    translation set T_SetE(TC)
  • Probability that an English term TE takes a synset
    S, estimated from frequency information in WordNet
  • Probability that TC takes a particular synset S
    via an English translation TE
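
The formulas for these two probabilities are not preserved in the transcript. As a hedged sketch, one natural reading is that P(S | TE) is the relative sense frequency of S among the senses of TE, and that the probability of a path from TC to S via TE is the product P(TE | TC) · P(S | TE). The product rule, the function names, the synset labels, and all numbers below are assumptions for illustration; the counts are not real WordNet frequencies.

```python
def sense_probabilities(sense_counts):
    """P(S | TE): relative frequency of each synset among the senses of TE."""
    total = sum(sense_counts.values())
    if total == 0:
        # No frequency information: fall back to a uniform distribution.
        return {s: 1.0 / len(sense_counts) for s in sense_counts}
    return {s: c / total for s, c in sense_counts.items()}

def path_probabilities(translation_probs, sense_counts_by_term):
    """P(S | TC via TE) for every (TE, S) path, assuming the two steps multiply."""
    paths = {}
    for te, p_te in translation_probs.items():
        for synset, p_s in sense_probabilities(sense_counts_by_term[te]).items():
            paths[(te, synset)] = p_te * p_s
    return paths

# Hypothetical inputs: P(TE | TC) and per-term sense frequency counts.
translation_probs = {"software": 0.6, "program": 0.4}
sense_counts_by_term = {
    "software": {"S_software": 10},
    "program": {"S_program_code": 3, "S_program_plan": 9},
}
paths = path_probabilities(translation_probs, sense_counts_by_term)
```

Each key of `paths` is one (translation, synset) route from the Chinese core term, which is the unit the concept selection step later works on.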

14
COCA - Concept Selection Module
  • Combining three features
  • multi-path feature
  • hyponyms feature
  • part-of-speech feature
  • Using Union Probability of Independent Events

15
Feature 1: Multi-Paths to Synset
  • Multiple paths are
  • the paths between a Chinese core term and a synset
  • via different English translations

The feature merges the probabilities of the multiple
paths
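
A minimal sketch of how such a merge could work, assuming the paths are combined with the union probability of independent events, P(A1 ∪ ... ∪ An) = 1 - Π(1 - p_i), which the concept selection slide names as the combination rule. The function names and the path probabilities are hypothetical.

```python
from collections import defaultdict
from math import prod

def union_probability(probabilities):
    """P(A1 or A2 or ...) for independent events: 1 - prod(1 - p_i)."""
    return 1.0 - prod(1.0 - p for p in probabilities)

def merge_paths(path_probs):
    """Merge the probabilities of all paths that reach the same synset."""
    by_synset = defaultdict(list)
    for (_translation, synset), p in path_probs.items():
        by_synset[synset].append(p)
    return {s: union_probability(ps) for s, ps in by_synset.items()}

# Hypothetical path probabilities: (English translation, synset) -> probability.
path_probs = {
    ("software", "S_software"): 0.5,
    ("software package", "S_software"): 0.2,
    ("program", "S_program"): 0.3,
}
merged = merge_paths(path_probs)
```

Here two translations lead to the same synset, so its merged score (1 - 0.5 × 0.8 = 0.6) is higher than either path alone, while the single-path synset keeps its score of 0.3.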
16
Feature 2: Hyponyms in Domain
  • Incorporates information from all the extended strings

An extended string uses the core term as its headword
and is a hyponym of the core term
Length Ratio
Union Probability of Independent Events
17
Feature 3: Part of Speech
  • Probability of the POS tag pos(S)
  • owned by a synset S
  • given a core term TC
  • POS tag estimation: heuristics for adjectives, verbs,
    and nouns based on position

18
Integrate Features
  • Using Union Probability of Independent Events
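
The integration formula is not preserved in the transcript. Under the stated rule, and writing F_1, F_2, F_3 for the three feature scores and W for the final mapping weight (symbols assumed here for illustration), the combination would take the following form, shown with hypothetical values:

```latex
W(T_C, S) = 1 - \prod_{k=1}^{3} \bigl(1 - F_k(T_C, S)\bigr)
% Hypothetical values F_1 = 0.6, F_2 = 0.5, F_3 = 0.2:
W = 1 - (0.4)(0.5)(0.8) = 1 - 0.16 = 0.84
```

Unlike simple addition, this combination stays within [0, 1] and grows whenever any single feature gives strong evidence.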

19
Evaluation
  • Algorithm Output
  • A pair <TC_i, Synset_i> for each Chinese core
    term, with the highest mapping weight
  • Evaluation Standard
  • For each TC_i, whether its mapping to a synset
    is the best match with respect to this domain
  • Answer Preparation
  • Answers were prepared manually and independently by
    two IT-domain experts on the same set of data

20
  • Performance
  • The evaluation was conducted on the top N most
    frequent core terms
  • COCA achieves 71% accuracy (N is 28 in this paper)
  • By comparison, MRCOCA (Chen, 2007) achieved
    only 50%
  • Two examples of core term to synset mappings
    generated by the algorithm are given for ?? and
    ??.

22
Conclusion
  • Evaluation of COCA on an English-Chinese
    bilingual term bank with more than 130K entries
    shows that the algorithm
  • achieves a 42% improvement in accuracy over MRCOCA
    (our previous work)
  • The three features and the new probability-based
    algorithm account for the improvement
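
The 42% figure is consistent with a relative improvement computed from the two accuracies reported in the evaluation (71% for COCA, about 50% for MRCOCA). That reading is an inference from the numbers, not something the slides state. A short sketch of the arithmetic, with hypothetical helper names:

```python
def accuracy(predicted, gold):
    """Fraction of core terms whose predicted synset matches the gold answer."""
    correct = sum(1 for term, synset in gold.items() if predicted.get(term) == synset)
    return correct / len(gold)

def relative_improvement(new, baseline):
    """Improvement of `new` over `baseline`, relative to the baseline."""
    return (new - baseline) / baseline

# Accuracies reported in the slides: COCA 71%, MRCOCA about 50%.
gain = relative_improvement(0.71, 0.50)
print(f"{gain:.0%}")  # prints "42%"
```

The absolute gain is 21 percentage points; dividing by the 50% baseline gives the 42% relative figure.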

23
  • A term bank can help to construct a domain core
    ontology quickly by selecting the concept nodes and
    relations used in the domain
  • A bilingual term bank can further introduce a
    second-language realization of the core ontology
    effectively and automatically

24
Future Works
  • Evaluation of the three features
  • how effective they are
  • how much they contribute to the final performance
  • Consideration of more features, such as
    abbreviations and the synset of the head word of a
    core term
  • Use of other resources

25
  • Q&A
