Word Sense Disambiguation in Queries - PowerPoint PPT Presentation

1 / 16

About This Presentation

Title:

Word Sense Disambiguation in Queries

Description:

Number of Views:104

Avg rating:3.0/5.0

Slides: 17

Provided by: nlgCsie

Category:

Tags: disambiguation | hyponym | queries | sense | word

Transcript and Presenter's Notes

Title: Word Sense Disambiguation in Queries

1
Word Sense Disambiguation in Queries

2
Abstract

This paper present a new approach to determine
the senses of words in queries by using WordNet.
Noun phrases in a query are determined first.
Pieces of information associated with query words
are compared to assign senses to these words.
A guess and a web search are applied if
necessary.
Experimental results show that this approach has
100 applicability and 90 accuracy for WSD.

3
Introduction

WSD
Several approaches such as machine learning and
dictionary based ones are developed.
The best reported result for WSD is 71.2.
Mihalcea, 2002
WSD is important for IR
Adding appropriate synonyms ad hyponyms to a
query can improve retrieval effectiveness.
Adding synonyms of incorrect senses of a query
term will deteriorate IR performance.

4
A Brief Introduction to WordNet

5
The WSD Algorithm

Noun phrases (w, w) in the query are detected.
In WordNet, the pieces of information of the
words which form a phrase are compared to assign
senses to these words.
Synonyms, hyponyms, definitions of synonyms and
hyponyms, and domains.
If the sense of a query word has not been
identified, information of other query words are
used by through the same process as II.
If the sense of a query word has not been
identified, a guess is applied if it has at least
50 chance of being correct.
A Web search is applied if there are still some
words whose senses have not been determined.

6
Case Analysis of Comparing WordNet Information

Case 1
w and w have a common synonym w.
? The senses of w and w are the synset
containing w
Full matching both w and w have the same part
of speech.
Partial matching not full matching
Case 2
w or one of its synonyms appears in the
definition of the jth sense of w, D(w)j.
Subcase 1
w appears in D(w)j
? The sense of w is determined but the sense of
w not.
Subcase 2
A synonym of the ith sense of w appears in D(w)j
? Both the senses of w and w are determined.

7
Case Analysis of Comparing WordNet Information

Case 6
A term t in the hyponym synset, H, of w appears
in the definitions of several hyponym synsets of
w.
? The sense of w is a hypernym synset of H.
Case 8
There are content words in common between the
definition of a synset of w, and the definition
of a hyponym synset of w.
Case 9
There is a common term in a hyponym synset of w
and a hyponym synset of w.
Case 11
One or more senses of w and w belong to the
same domain.

8
Case Conflict Resolution

A term w may be assigned different sense due to
the situation of satisfying several cases. ? case
conflict
The likelihood of choosing a sense of w is a
function of 3 parameters
Case weight the historical accuracy of the case
in determining the sense of w.
Sense weight the frequency of use of the sense
of w.
Supporting weight the historical accuracy of the
case which determines the sense of w.

9
Case Weight

CK-F and CK-P represent the full match and
partial match cases, respectively, where K varies
among the 11 cases.
case_wt(CK-X)
The weights of all cases are normalized so that
the sum is 1.
5-fold cross validation.

10
Sense Weight and Supporting Weight

The sense weight of a sense wi of term w is
sense_wt(w, wi) f(w, wi) / F(w)
where is the frequency of use of the sense wi of
w and F(w) is the sum of the frequencies over all
senses of w.
Suppose w is disambiguated to sense wi using term
w with sense wj
sp_wt(wj)

11
The Disambiguation Function

12
Web Assisted Disambiguation

The query is submitted to Google and the top 20
documents are retrieved.
For each document, find a window of n words, say
50, which contains all query terms.
All contents words in the window except w, form a
vector.
The vectors of the 20 documents are put together
to be compared with the definitions of senses of
w.
The similarity function cosine

13
Experiment on WSD

14
The Retrieval Model

Noun phrases in queries are automatically
identified and used for retrieval.
The retrieval model considers that phrases are
more important than individual words. Meng,
2004
Disambiguated query terms bring in new terms from
WordNet.
Pseudo-feedback is also used to expand query.
Additional weights are assigned to feedback terms
that are semantically related to the
disambiguated query terms.

15
Experiment Result on IR

16
Conclusion

This paper provided an effective approach to
disambiguate word senses in short queries.
This approach can be applied to 100 of ambiguous
terms and achieve an accuracy of 90,
significantly better than any existing method.
On average, the IR effectiveness of our system is
7 better than the best result reported in the
literature.

Write a Comment

User Comments (0)