Simple Features for Chinese Word Sense Disambiguation

1
Simple Features for Chinese Word Sense
Disambiguation
  • Hoa Trang Dang, Ching-yi Chia, Martha Palmer,
    Fu-Dong Chiou
  • Computer and Information Science
  • University of Pennsylvania
  • htd, chingyc, mpalmer, chioufd_at_unagi.cis.upenn.edu

2
Overview
  • Maximum entropy WSD feature types
  • English Senseval2 verbs
  • Chinese
  • Penn Chinese Treebank
  • People's Daily News
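
For reference, a maximum entropy classifier is usually written in the conditional form below. The notation is chosen here for illustration and does not come from the slides: s is a candidate sense, c is the context of the target word, each f_i is a feature function of the kind listed on the following slides, and each λ_i is a learned weight.

```latex
P(s \mid c) = \frac{\exp\left(\sum_i \lambda_i f_i(s, c)\right)}{\sum_{s'} \exp\left(\sum_i \lambda_i f_i(s', c)\right)}
```

The predicted sense is the s that maximizes P(s | c).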

3
English Senseval2 verbs
  • Primarily Penn Treebank WSJ corpus
  • WordNet 1.7 sense inventory
  • 29 verbs
  • 15.6 senses/verb in corpus
  • baseline (most frequent sense): 40%
  • best system performance: 60%
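
The most-frequent-sense baseline can be sketched as follows. This is a minimal illustration, assuming the baseline is computed per word by always predicting the sense seen most often in training; the function names and the sense labels are hypothetical and do not come from the slides.

```python
from collections import Counter

def most_frequent_sense(train_senses):
    """Return the sense label that occurs most often in the training data."""
    return Counter(train_senses).most_common(1)[0][0]

def baseline_accuracy(train_senses, test_senses):
    """Accuracy obtained by always predicting the most frequent training sense."""
    mfs = most_frequent_sense(train_senses)
    correct = sum(1 for s in test_senses if s == mfs)
    return correct / len(test_senses)

# Hypothetical sense labels for a single verb, for illustration only.
train = ["call.1", "call.1", "call.3", "call.1", "call.2"]
test = ["call.1", "call.3", "call.1", "call.2"]
acc = baseline_accuracy(train, test)
```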

4
Local Collocational Features (English)
  • Collocational features for w
  • word w
  • pos of w
  • pos of words at positions 1, -1 relative to w
  • words at positions -2, -1, 1, 2 relative to w
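
The collocational features listed above could be extracted roughly as follows. This is a sketch, assuming a sentence that is already tokenized and POS-tagged; the feature names, the `<PAD>` token for positions outside the sentence, and the example sentence are assumptions made here, not details from the slides.

```python
def collocational_features(tokens, pos_tags, i):
    """Build a feature dict for the target word w at index i."""
    def at(seq, j):
        # Positions that fall outside the sentence get a padding symbol.
        return seq[j] if 0 <= j < len(seq) else "<PAD>"

    feats = {"word": tokens[i], "pos": pos_tags[i]}
    # POS tags of the immediate neighbours of w.
    for off in (-1, 1):
        feats[f"pos{off:+d}"] = at(pos_tags, i + off)
    # Word forms in a window of two positions on each side of w.
    for off in (-2, -1, 1, 2):
        feats[f"word{off:+d}"] = at(tokens, i + off)
    return feats

# Hypothetical example sentence with Penn Treebank style tags.
tokens = ["He", "called", "the", "office"]
pos_tags = ["PRP", "VBD", "DT", "NN"]
feats = collocational_features(tokens, pos_tags, 1)
```

Each key/value pair would then be treated as one binary feature by the classifier.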

5
Local Syntactic Features (English)
  • Syntactic features
  • whether or not the sentence is passive
  • whether there is a subject, direct object,
    indirect object, or clausal complement
  • the words (if any) in the positions of subject,
    direct object, indirect object, particle,
    prepositional complement (and its object)
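
One way to encode the syntactic features above is sketched here. It assumes that a parser has already produced a small per-verb summary; the dictionary layout and the relation names (`subj`, `dobj`, `iobj`, `ccomp`, `particle`, `prep`, `prep_obj`) are hypothetical choices for this illustration, not the representation used by the authors.

```python
def syntactic_features(parse):
    """Map a (hypothetical) per-verb parse summary to a feature dict."""
    feats = {"passive": bool(parse.get("passive", False))}
    # Presence/absence features for the main argument slots.
    for rel in ("subj", "dobj", "iobj", "ccomp"):
        feats[f"has_{rel}"] = rel in parse
    # Lexical features: the word filling each slot, if any.
    for rel in ("subj", "dobj", "iobj", "particle", "prep", "prep_obj"):
        if rel in parse:
            feats[f"{rel}_word"] = parse[rel]
    return feats

# Hypothetical summary for "he called the office from home".
parse = {"passive": False, "subj": "he", "dobj": "office",
         "prep": "from", "prep_obj": "home"}
feats = syntactic_features(parse)
```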

6
Local Semantic Features (English)
  • Semantic features
  • a Named Entity tag (PERSON, ORGANIZATION,
    LOCATION) for proper nouns
  • WordNet synsets and hypernyms for the nouns

7
Overall Accuracy of System (English)
Feature Type                                Accuracy (%)
Collocation                                 48.3
Collocation + Syntax                        53.9
Collocation + Syntax + Semantics            59.0
Collocation + Topic                         52.9
Collocation + Syntax + Topic                54.2
Collocation + Syntax + Semantics + Topic    60.2

8
Data Preparation (Chinese)
  • Penn Chinese Treebank (100K words)
  • CETA (Chinese-English Translation Assistance)
    Dictionary
  • 28 words (multiple verb senses, possibly other
    pos)
  • 3.5 senses/word in corpus
  • Baseline (most frequent sense): 77%

9
Local Collocational Features (Chinese)
  • Collocational Features
  • word
  • pos
  • word-2, word-1, word1, word2
  • pos-1, pos1
  • followsVerb

10
Local Syntactic Features (Chinese)
  • Syntactic Features
  • hassubj
  • subj
  • hasobj
  • obj-p
  • obj
  • hasinobj
  • Comp-VP
  • VPComp
  • Comp-IP
  • hasprd

11
Local Semantic Features (Chinese)
  • Semantic Features (for verbs only)
  • generated by assigning a HowNet noun category to
    each subject and object
  • subjsem
  • objsem
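
The `subjsem` and `objsem` features can be illustrated with a simple lookup. This is a sketch only: the small dictionary below stands in for HowNet, and its entries and category labels are invented for the example rather than taken from HowNet or from the slides.

```python
# Toy stand-in for a HowNet noun-category lookup (entries invented for illustration).
NOUN_CATEGORY = {"老师": "human", "学生": "human", "书": "publication"}

def semantic_features(subj, obj, categories=NOUN_CATEGORY):
    """Assign a noun category to the verb's subject and object, when known."""
    feats = {}
    if subj in categories:
        feats["subjsem"] = categories[subj]
    if obj in categories:
        feats["objsem"] = categories[obj]
    return feats

# Subject "老师" (teacher) and object "书" (book).
feats = semantic_features("老师", "书")
```

Nouns missing from the lookup simply produce no feature, so the classifier falls back on the other feature types.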

12
Overall Accuracy of Maximum Entropy System (CTB)
Feature Type                                Accuracy (%)   Std Dev
Collocation (no pos)                        86.8           1.0
Collocation                                 93.4           0.5
Collocation + Syntax                        94.4           0.4
Collocation + Syntax + Semantics            94.4           0.6
Collocation + Topic                         90.3           1.0
Collocation + Syntax + Topic                92.7           0.9
Collocation + Syntax + Semantics + Topic    92.8           0.8
Baseline                                    76.7

13
Data Preparation (PDN)
  • People's Daily News (PDN)
  • Five words with low accuracy and low counts in CTB
    were subsequently sense-tagged in PDN (1M words).
  • About 200 sentences/word from PDN.
  • 8.2 senses/verb in corpus
  • Baseline (most frequent sense): 58%
  • Automatic segmentation, pos-tagging, parsing

14
Overall Accuracy of Maximum Entropy System (PDN)
Feature Type                                Accuracy (%)   Std Dev
Collocation (no pos)                        72.3           2.2
Collocation                                 70.3           2.9
Collocation + Syntax                        71.7           3.9
Collocation + Syntax + Semantics            71.7           4.2
Collocation + Topic                         73.3           3.2
Collocation + Syntax + Topic                72.6           2.9
Collocation + Syntax + Semantics + Topic    73.0           3.4
Baseline                                    57.6

15
Conclusion
  • Types of features that are important for English
    and Chinese are different.
  • Parse information is useful for English WSD.
  • Lexical collocational information may be
    sufficient for Chinese.
  • Chinese word sense disambiguation addressed at
    segmentation level