1
Scaling Up Word Sense Disambiguation via Parallel Texts
  • Yee Seng Chan
  • Hwee Tou Ng
  • Department of Computer Science
  • National University of Singapore

2
Supervised WSD
  • Word Sense Disambiguation (WSD): identifying the correct meaning, or sense, of a word in context
  • Supervised learning is a successful approach: collect a corpus in which each occurrence of an ambiguous word is annotated with its correct sense
  • Current systems usually rely on SEMCOR, a relatively small manually annotated corpus, which limits scalability

3
Data Acquisition
  • Need to tackle the data acquisition bottleneck
  • Manually annotated corpora:
  • DSO corpus (Ng & Lee, 1996)
  • Open Mind Word Expert (OMWE) (Chklovski & Mihalcea, 2002)
  • Parallel texts:
  • Our prior work (Ng, Wang, & Chan, 2003) exploited English-Chinese parallel texts for WSD

4
WordNet Senses of channel
  • Sense 1: A path over which electrical signals can pass
  • Sense 2: A passage for water
  • Sense 3: A long narrow furrow
  • Sense 4: A relatively narrow body of water
  • Sense 5: A means of communication or access
  • Sense 6: A bodily passage or tube
  • Sense 7: A television station and its programs

5
Chinese Translations of channel
  • Sense 1: 频道 (pin dao)
  • Sense 2: 水道 (shui dao), 水渠 (shui qu), 排水渠 (pai shui qu)
  • Sense 3: 沟 (gou)
  • Sense 4: 海峡 (hai xia)
  • Sense 5: 途径 (tu jing)
  • Sense 6: 导管 (dao guan)
  • Sense 7: 频道 (pin dao)

6
Parallel Texts for WSD
The institutions have already consulted the
staff concerned through various channels,
including discussion with the staff
representatives.
[Chinese translation of the sentence; the Chinese word aligned to "channels" serves as its sense tag]
7
Approach
  1. Use manually translated English-Chinese parallel
    texts
  2. Parallel text alignment
  3. Manually provide Chinese translations for WordNet
    senses of a word (serve as sense-tags)
  4. Gather training examples from the English portion
    of parallel texts (a minimal code sketch follows below)
  5. Train WSD classifiers to disambiguate English
    words in new contexts
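
As a rough illustration of step 4, here is a minimal Python sketch of harvesting sense-tagged English examples from word-aligned parallel text. The data structures (aligned_pairs, sense_translations) are hypothetical stand-ins, not the authors' actual code.

# Sketch of step 4: each English occurrence whose aligned Chinese word
# matches an assigned translation yields one sense-tagged example.
def gather_examples(aligned_pairs, sense_translations):
    """aligned_pairs: iterable of (en_tokens, zh_tokens, alignment),
    where alignment maps English token index -> Chinese token index.
    sense_translations: dict mapping (english_noun, chinese_word)
    to a WordNet sense number."""
    examples = []
    for en_tokens, zh_tokens, alignment in aligned_pairs:
        for en_idx, zh_idx in alignment.items():
            noun = en_tokens[en_idx].lower()
            sense = sense_translations.get((noun, zh_tokens[zh_idx]))
            if sense is not None:
                examples.append((en_tokens, en_idx, sense))
    return examples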

8
Issues
  • (Ng, Wang, & Chan, 2003) evaluated on 22 nouns.
    Can this approach scale up to a large set of
    nouns?
  • Previous evaluation was on lumped senses. How
    would it perform in a fine-grained disambiguation
    setting?
  • In practice, would any difficulties arise in the
    gathering of training examples from parallel
    texts?

9
Size of Parallel Corpora
Parallel corpus                           English (Mwords / MB)   Chinese (Mchars / MB)
Hong Kong Hansards                        39.9 / 223.2            35.4 / 146.8
Hong Kong News                            16.8 / 96.4             15.3 / 67.6
Hong Kong Laws                            9.9 / 53.7              9.2 / 37.5
Sinorama                                  3.8 / 20.5              3.3 / 13.5
Xinhua News                               2.1 / 11.9              2.1 / 8.9
English Translation of Chinese Treebank   0.1 / 0.7               0.1 / 0.4
Sub-total                                 72.6 / 406.4            65.4 / 274.7
Total (English + Chinese)                 138 / 681.1
10
Parallel Text Alignment
  • Sentence alignment
  • Corpora available in sentence-aligned form
  • Pre-processing
  • English tokenization
  • Chinese word segmentation
  • Word alignment
  • GIZA++ (Och & Ney, 2000); a simplified alignment reader is sketched below
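
For concreteness, here is a hedged Python sketch of reading word alignments once they have been post-processed into the common "i-j" index-pair format (one line per sentence pair); this format and the file layout are assumptions of the sketch, not GIZA++'s raw output.

# Illustrative reader for pre-tokenized parallel text plus alignments
# given as "i-j" pairs (e.g. "0-0 1-2 3-1"); not the authors' pipeline.
def read_alignments(en_path, zh_path, align_path):
    with open(en_path, encoding="utf-8") as fe, \
         open(zh_path, encoding="utf-8") as fz, \
         open(align_path, encoding="utf-8") as fa:
        for en_line, zh_line, a_line in zip(fe, fz, fa):
            en_tokens = en_line.split()   # tokenized English
            zh_tokens = zh_line.split()   # segmented Chinese
            alignment = {}
            for pair in a_line.split():
                i, j = map(int, pair.split("-"))
                alignment[i] = j          # English index -> Chinese index
            yield en_tokens, zh_tokens, alignment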

11
Selection of Translations
  • WordNet 1.7 as sense inventory
  • Chinese translations from 2 sources:
  • Oxford Advanced Learner's English-Chinese Dictionary
  • Kingsoft PowerWord 2003 (Chinese translation of the American Heritage Dictionary)
  • Providing Chinese translations for all the WordNet senses of a word takes 15 minutes on average
  • If the same Chinese translation is assigned to several senses, only the lowest-numbered sense keeps it as a valid translation (sketched below)
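
This rule is simple enough to state in code; the sketch below assumes a per-word dict from sense number (1 = most frequent) to its list of assigned Chinese translations.

# Keep a shared Chinese translation only for the lowest-numbered sense.
def prune_duplicate_translations(sense_to_translations):
    seen = set()
    pruned = {}
    for sense in sorted(sense_to_translations):  # ascending sense number
        kept = [t for t in sense_to_translations[sense] if t not in seen]
        seen.update(sense_to_translations[sense])
        pruned[sense] = kept
    return pruned

# For "channel", 频道 (pin dao) is assigned to senses 1 and 7, so only
# sense 1 retains it and sense 7 is left without a valid translation.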

12
Scope of Experiments
  • Aim: scale up to a large set of nouns
  • Frequently occurring nouns are highly ambiguous
  • To maximize benefits, select the 800 most frequent noun types in the Brown Corpus (BC)
  • These represent 60% of the noun tokens in BC

13
WSD
  • Used the WSD program of (Lee & Ng, 2002)
  • Knowledge sources: parts-of-speech, surrounding words, local collocations (illustrated below)
  • Learning algorithm: naïve Bayes
  • Achieves state-of-the-art WSD accuracy
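
A hedged Python sketch of the three knowledge sources for one target occurrence; the exact feature templates of (Lee & Ng, 2002) differ in detail, so the window sizes and template names here are illustrative.

# Simplified feature extraction for one ambiguous occurrence.
def extract_features(tokens, pos_tags, target_idx, window=3):
    feats = []
    # parts-of-speech of words around the target
    for off in range(-window, window + 1):
        i = target_idx + off
        if 0 <= i < len(pos_tags):
            feats.append("pos[%d]=%s" % (off, pos_tags[i]))
    # surrounding words (bag of words in the context)
    feats.extend("surr=" + t.lower() for t in tokens if t.isalpha())
    # a local collocation, e.g. the two words left of the target
    if target_idx >= 2:
        feats.append("coll[-2,-1]=%s_%s"
                     % (tokens[target_idx - 2], tokens[target_idx - 1]))
    return feats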

14
Evaluation Set
  • Suitable evaluation data set: the set of nouns in the SENSEVAL-2 English all-words task

15
Summary Figures
Noun set      No. of noun types   No. of noun tokens   WNs1 accuracy (%)   Avg. no. of senses
All nouns     437                 1067                 71.9                4.23
MFSet         212                 494                  61.1                5.89
All − MFSet   225                 573                  81.2                2.67
(MFSet: the evaluation nouns that are among the 800 most frequent noun types)
16
Evaluation on MFSet
  • Gather parallel text examples for the nouns in MFSet
  • For comparison, what is the accuracy of training on manually annotated examples?
  • SEMCOR (SC)
  • SEMCOR + OMWE (SCOM)

17
Evaluation Results (in %)
System                   MFSet
S1 (best SE2 system)     72.9
S2                       65.4
S3                       64.4
WNs1 (WordNet sense 1)   61.1
SC (SEMCOR)              67.8
SCOM (SEMCOR + OMWE)     68.4
P1 (parallel text)       69.6
18
Evaluation on All Nouns
  • Want an indication of P1's performance on all nouns
  • Expanded the evaluation set to all nouns in the SENSEVAL-2 English all-words task
  • Used the WNs1 strategy for nouns for which no parallel text examples are available

19
Evaluation Results (in %)
System                   MFSet   All nouns
S1 (best SE2 system)     72.9    78.0
S2                       65.4    74.5
S3                       64.4    70.0
WNs1 (WordNet sense 1)   61.1    71.9
SC (SEMCOR)              67.8    76.2
SCOM (SEMCOR + OMWE)     68.4    76.5
P1 (parallel text)       69.6    75.8
20
Lack of Matches
  • Lack of matching English occurrences for some Chinese translations
  • Sense 7 of the noun report: "the general estimation that the public has for a person"
  • Assigned translation: 名声 (ming sheng)
  • In the parallel corpus, no occurrences of report are aligned to 名声 (ming sheng)
  • No examples gathered for sense 7 of report
  • Affects recall

21
Examples from other Nouns
  • Can gather examples for sense 7 of report from other English nouns having the same corresponding Chinese translations (sketched below)

Sense 7 of report: the general estimation that the public has for a person
Sense 3 of name: a person's reputation
Both translate to 名声 (ming sheng)
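
One way to find such substitute nouns is to invert the sense-translation table; below is a minimal Python sketch under that assumption.

from collections import defaultdict

# translation -> list of (noun, sense) pairs sharing that translation
def build_translation_index(sense_translations):
    """sense_translations: dict mapping (noun, sense) to a list of
    assigned Chinese translations."""
    index = defaultdict(list)
    for (noun, sense), translations in sense_translations.items():
        for t in translations:
            index[t].append((noun, sense))
    return index

# index["名声"] would then contain both ("report", 7) and ("name", 3),
# so parallel-text examples of name aligned to 名声 can stand in for
# the missing sense-7 examples of report.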
22
Evaluation Results (in %)
System                        MFSet   All nouns
S1 (best SE2 system)          72.9    78.0
S2                            65.4    74.5
S3                            64.4    70.0
WNs1 (WordNet sense 1)        61.1    71.9
SC (SEMCOR)                   67.8    76.2
SCOM (SEMCOR + OMWE)          68.4    76.5
P1 (parallel text)            69.6    75.8
P2 (P1 + noun substitution)   70.7    76.3
23
JCN Measure
  • The semantic distance measure of Jiang & Conrath (1997) provides a reliable estimate of the distance Dist(s1, s2) between two WordNet synsets
  • JCN:
  • Information content of a concept c: IC(c) = -log P(c)
  • Link strength of the edge between a child c and its parent p: LS(c, p) = IC(c) - IC(p)
  • Distance between two synsets: Dist(s1, s2) = IC(s1) + IC(s2) - 2 · IC(LCS(s1, s2)), where LCS(s1, s2) is the lowest common subsumer of s1 and s2

24
Similarity Measure
  • We used the WordNet::Similarity package (Pedersen, Patwardhan, & Michelizzi, 2004)
  • It provides a similarity score between WordNet synsets based on the jcn measure: jcn(s1, s2) = 1 / Dist(s1, s2) (a runnable sketch using NLTK follows below)
  • In the earlier example, we obtain the similarity score jcn(s1, s2), where
  • s1 = sense 7 of report
  • s2 = sense 3 of name
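
The authors used the Perl WordNet::Similarity package; as a stand-in, NLTK exposes the same jcn formula in Python. The sketch below looks the two synsets up by gloss keywords rather than by sense number, since sense numbering differs between WordNet 1.7 and the WordNet version NLTK ships.

# Requires: nltk.download("wordnet"); nltk.download("wordnet_ic")
from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # Brown-corpus IC counts

# the "reputation" senses of report and name (WordNet 1.7: report#7, name#3)
s1 = next(s for s in wn.synsets("report", pos=wn.NOUN)
          if "estimation" in s.definition())
s2 = next(s for s in wn.synsets("name", pos=wn.NOUN)
          if "reputation" in s.definition())

# jcn(s1, s2) = 1 / Dist(s1, s2)
print(s1.jcn_similarity(s2, brown_ic))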

25
Incorporating JCN Measure
  • In performing WSD with a naïve Bayes classifier, the sense s assigned to an example with features f1, ..., fn is chosen so as to maximize P(s) · Πj P(fj | s)
  • A training example gathered from another English noun based on a common Chinese translation contributes a fractional count to Count(s) and Count(fj, s), based on jcn(s1, s2) (see the sketch below)
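
A minimal Python sketch of such jcn-weighted counting; treating the jcn score directly as the fractional count is an assumption here, and the smoothing is illustrative.

import math
from collections import defaultdict

class WeightedNaiveBayes:
    def __init__(self):
        self.sense_count = defaultdict(float)  # Count(s)
        self.feat_count = defaultdict(float)   # Count(f, s)

    def add(self, features, sense, weight=1.0):
        # a native example adds weight 1.0; a borrowed example adds
        # weight jcn(s1, s2) instead (an assumption of this sketch)
        self.sense_count[sense] += weight
        for f in features:
            self.feat_count[(f, sense)] += weight

    def classify(self, features):
        total = sum(self.sense_count.values())
        def log_posterior(s):
            lp = math.log(self.sense_count[s] / total)
            for f in features:
                # add-one smoothing, purely illustrative
                lp += math.log((self.feat_count[(f, s)] + 1.0)
                               / (self.sense_count[s] + 2.0))
            return lp
        return max(self.sense_count, key=log_posterior)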

26
Evaluation Results (in %)
System                        MFSet   All nouns
S1 (best SE2 system)          72.9    78.0
S2                            65.4    74.5
S3                            64.4    70.0
WNs1 (WordNet sense 1)        61.1    71.9
SC (SEMCOR)                   67.8    76.2
SCOM (SEMCOR + OMWE)          68.4    76.5
P1 (parallel text)            69.6    75.8
P2 (P1 + noun substitution)   70.7    76.3
P2jcn (P2 + jcn)              72.7    77.2
27
Paired t-test for MFSet
System   S1   P1   P2   P2jcn   SC   SCOM   WNs1
S1       >>  >  >>
P1       <<  >>
P2       <  >  >>
P2jcn    >>  >  >>
SC       >>
SCOM     >>
WNs1
(>>, <<: p-value ≤ 0.01;  >, <: p-value in (0.01, 0.05];  blank: p-value > 0.05)
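
As a rough illustration of how such entries could be derived, here is a hedged Python sketch of a paired t-test using scipy; the pairing unit (matched per-instance scores) and the thresholds are assumptions of the sketch.

from scipy.stats import ttest_rel

def significance_symbol(scores_a, scores_b):
    """Paired t-test over matched per-instance scores of two systems."""
    t, p = ttest_rel(scores_a, scores_b)
    if p <= 0.01:
        return ">>" if t > 0 else "<<"
    if p <= 0.05:
        return ">" if t > 0 else "<"
    return ""  # not significant at the 0.05 level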
28
Paired t-test for All Nouns
System   S1   P1   P2   P2jcn   SC   SCOM   WNs1
S1       >  >>
P1       <  >>
P2       >>
P2jcn    >>
SC       >>
SCOM     >>
WNs1
(>>, <<: p-value ≤ 0.01;  >, <: p-value in (0.01, 0.05];  blank: p-value > 0.05)
29
Conclusion
  • Tackling the data acquisition bottleneck is crucial
  • Gathering examples for WSD from parallel texts scales to a large set of nouns
  • Training on parallel text examples can outperform training on manually annotated data, and achieves performance comparable to the best system of the SENSEVAL-2 English all-words task