1
Word Sense Disambiguation in Queries
  • Shuang Liu, Clement Yu
  • University of Illinois at Chicago
  • Weiyi Meng
  • Binghamton University
  • CIKM 2005

2
Abstract
  • This paper presents a new approach to determining
    the senses of words in queries by using WordNet.
  • Noun phrases in a query are determined first.
  • Pieces of information associated with query words
    are compared to assign senses to these words.
  • A guess and a Web search are applied if
    necessary.
  • Experimental results show that this approach has
    100% applicability and 90% accuracy for WSD.

3
Introduction
  • WSD
  • Several approaches, such as machine learning and
    dictionary-based ones, have been developed.
  • The best reported result for WSD is 71.2%
    accuracy [Mihalcea, 2002].
  • WSD is important for IR
  • Adding appropriate synonyms and hyponyms to a
    query can improve retrieval effectiveness.
  • Adding synonyms of incorrect senses of a query
    term will deteriorate IR performance.

4
A Brief Introduction to WordNet
  • A term w possibly has many sets (senses) of
    synonyms.
  • Each set S(w)i is called a synset in WordNet.
  • Each synset is associated with a definition D(w)i.
  • Each synset may have a number of hyponym synsets.
  • Each hyponym synset H(w)ij has a similar but
    narrower meaning.
  • D(H(w)ij) represents the definition of H(w)ij.
  • Each synset may belong to a domain (category)
    Dom(w)i.
  • Ex: hit1 belongs to baseball1.
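The notation above (S(w)i, D(w)i, H(w)ij, Dom(w)i) can be made concrete with a small toy structure. This is an illustrative sketch, not the real WordNet API; the class and field names are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Synset:
    """One sense S(w)i of a term w: its synonyms, definition D(w)i,
    hyponym synsets H(w)ij, and optional domain Dom(w)i."""
    synonyms: list                                 # lemma names in the synset
    definition: str                                # D(w)i
    hyponyms: list = field(default_factory=list)   # H(w)i1, H(w)i2, ...
    domain: str = ""                               # Dom(w)i, e.g. "baseball"

# Toy entry following the slide's example: hit1 belongs to baseball1.
hit1 = Synset(
    synonyms=["hit", "base hit", "safety"],
    definition="a successful stroke in an athletic contest",
    hyponyms=[Synset(["single"],
                     "a hit that advances the batter to first base")],
    domain="baseball",
)

senses = {"hit": [hit1]}          # S(hit)1, S(hit)2, ... (only one shown)
print(senses["hit"][0].domain)    # baseball
```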

5
The WSD Algorithm
  • Noun phrases (w, w') in the query are detected.
  • In WordNet, the pieces of information of the
    words which form a phrase are compared to assign
    senses to these words:
  • synonyms, hyponyms, definitions of synonyms and
    hyponyms, and domains.
  • If the sense of a query word has not been
    identified, information from other query words is
    used, through the same process as in step 2.
  • If the sense of a query word has still not been
    identified, a guess is applied if it has at least
    a 50% chance of being correct.
  • A Web search is applied if there are still some
    words whose senses have not been determined.
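The steps above form a fallback chain: each stage is tried only if the previous ones failed. A minimal sketch of that control flow, with hypothetical callables standing in for the paper's procedures:

```python
def assign_sense(word, steps):
    """Apply the slide's fallback chain: phrase-internal matching,
    then other query words, then a frequency guess (only if it is at
    least 50% likely), then a Web search. Each step is a callable
    returning a sense index or None (hypothetical interface)."""
    for step in steps:
        sense = step(word)
        if sense is not None:
            return sense
    return None  # sense remains undetermined

# Toy steps: phrase-internal matching fails, the guess succeeds.
phrase_match = lambda w: None
guess = lambda w: 0   # most frequent sense, assumed >= 50% likely
print(assign_sense("terminal", [phrase_match, guess]))  # 0
```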

6
Case Analysis of Comparing WordNet Information
  • Case 1
  • w and w' have a common synonym.
  • ⇒ The senses of w and w' are the synsets
    containing that common synonym.
  • Full matching: both w and w' have the same part
    of speech.
  • Partial matching: not a full matching.
  • Case 2
  • w' or one of its synonyms appears in the
    definition of the jth sense of w, D(w)j.
  • Subcase 1
  • w' itself appears in D(w)j.
  • ⇒ The sense of w is determined, but the sense of
    w' is not.
  • Subcase 2
  • A synonym of the ith sense of w' appears in D(w)j.
  • ⇒ Both the senses of w and w' are determined.
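Case 1 can be sketched as a search for a shared synonym across the sense inventories of the two phrase words. A toy model (the sense lists and the example words are illustrative, not real WordNet data):

```python
def case1_common_synonym(senses_w, senses_w2):
    """Case 1: if a synset of w and a synset of w' share a synonym,
    choose those synsets as the senses of w and w'.
    senses_* is a list of (synonym-set, definition) pairs."""
    for i, (syns_i, _) in enumerate(senses_w):
        for j, (syns_j, _) in enumerate(senses_w2):
            common = syns_i & syns_j
            if common:
                return i, j, common   # chosen sense indices + evidence
    return None                       # case does not apply

# Toy data: one sense of "bank" and one of "shore" share "riverside".
bank = [({"bank", "depository"}, "a financial institution"),
        ({"bank", "riverside"}, "sloping land beside a body of water")]
shore = [({"shore", "riverside"}, "the land along the edge of water")]
print(case1_common_synonym(bank, shore))  # (1, 0, {'riverside'})
```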

7
Case Analysis of Comparing WordNet Information
  • Case 6
  • A term t in a hyponym synset H of w appears in
    the definitions of several hyponym synsets of
    w'.
  • ⇒ The sense of w is the hypernym synset of H.
  • Case 8
  • There are content words in common between the
    definition of a synset of w and the definition
    of a hyponym synset of w'.
  • Case 9
  • There is a common term in a hyponym synset of w
    and a hyponym synset of w'.
  • Case 11
  • One or more senses of w and w' belong to the
    same domain.
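Case 11 reduces to intersecting the domain labels of the candidate senses of the two words. A toy sketch in the spirit of the earlier hit1/baseball1 example (the domain assignments are illustrative):

```python
def case11_common_domain(domains_w, domains_w2):
    """Case 11: return the sense pairs (i, j) of w and w' whose
    synsets carry the same domain label.
    domains_* maps sense index -> domain name ("" = no domain)."""
    return [(i, j) for i, di in domains_w.items()
                   for j, dj in domains_w2.items()
                   if di and di == dj]

# Toy domains: hit1 and pitcher1 both belong to the baseball domain.
hit = {0: "baseball", 1: "music"}
pitcher = {0: "baseball", 1: "pottery"}
print(case11_common_domain(hit, pitcher))  # [(0, 0)]
```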

8
Case Conflict Resolution
  • A term w may be assigned different senses because
    several cases are satisfied at once. ⇒ case
    conflict
  • The likelihood of choosing a sense of w is a
    function of 3 parameters:
  • Case weight: the historical accuracy of the case
    in determining the sense of w.
  • Sense weight: the frequency of use of the sense
    of w.
  • Supporting weight: the historical accuracy of the
    case which determined the sense of the supporting
    word w'.

9
Case Weight
  • CK-F and CK-P represent the full match and
    partial match cases, respectively, where K varies
    among the 11 cases.
  • case_wt(CK-X): the fraction of times case CK-X
    assigns the correct sense.
  • The weights of all cases are normalized so that
    the sum is 1.
  • The weights are estimated by 5-fold cross
    validation.
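The case-weight estimate can be sketched as historical accuracy followed by normalization. The counts below are made-up toy numbers; the paper obtains its estimates via 5-fold cross validation:

```python
def normalized_case_weights(correct, applied):
    """Estimate case_wt(CK-X) as historical accuracy
    (correct / applied), then normalize so the weights sum to 1."""
    raw = {c: correct[c] / applied[c] for c in correct}
    total = sum(raw.values())
    return {c: w / total for c, w in raw.items()}

# Toy counts for three cases (hypothetical):
wts = normalized_case_weights({"C1-F": 90, "C1-P": 60, "C2-F": 40},
                              {"C1-F": 100, "C1-P": 100, "C2-F": 80})
print(round(sum(wts.values()), 10))  # 1.0
```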

10
Sense Weight and Supporting Weight
  • The sense weight of a sense wi of term w is
  • sense_wt(w, wi) = f(w, wi) / F(w)
  • where f(w, wi) is the frequency of use of the
    sense wi of w and F(w) is the sum of the
    frequencies over all senses of w.
  • Suppose w is disambiguated to sense wi using term
    w' with sense w'j; the supporting weight is
  • sp_wt(w'j)
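The sense weight is just a relative frequency, sense_wt(w, wi) = f(w, wi) / F(w). A minimal sketch with toy frequency counts:

```python
def sense_wt(freqs, i):
    """sense_wt(w, wi) = f(w, wi) / F(w): the relative frequency of
    use of sense wi among all senses of w."""
    return freqs[i] / sum(freqs)

# Toy WordNet-style frequency counts for three senses of a word:
freqs = [60, 30, 10]
print(sense_wt(freqs, 0))  # 0.6
```

By construction the sense weights of all senses of a word sum to 1.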

11
The Disambiguation Function
  • disam_wt(w, wi) = case_wt(CK-X) × sense_wt(w, wi)
    × sp_wt(w'j)
  • An example
  • the query: health and computer terminal
  • w = terminal, w' = computer
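Assuming the likelihood is the product of the three parameters from the case-conflict slide (a reconstruction; the paper's exact combination may differ, and the weight values below are made up):

```python
def disam_wt(case_wt, sense_wt, sp_wt=1.0):
    """One plausible combination of case weight, sense weight, and
    supporting weight (a sketch, not the paper's verbatim formula)."""
    return case_wt * sense_wt * sp_wt

# In the spirit of the slide's example (w = terminal, w' = computer),
# score two candidate senses of "terminal" with toy weights:
scores = {1: disam_wt(0.45, 0.7, 0.8),   # "computer terminal" sense
          2: disam_wt(0.25, 0.3, 0.8)}   # "station" sense
print(max(scores, key=scores.get))  # 1
```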

12
Web Assisted Disambiguation
  • The query is submitted to Google and the top 20
    documents are retrieved.
  • For each document, find a window of n words, say
    50, which contains all query terms.
  • All content words in the window, except w, form a
    vector.
  • The vectors of the 20 documents are put together
    to be compared with the definitions of the senses
    of w.
  • The similarity function is the cosine.
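The comparison above amounts to a cosine between a pooled term-frequency vector of the windows and each sense definition. A minimal sketch (the window and definition texts are hypothetical):

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Pool the content words of the retrieved windows, then compare
# against each sense definition of w = "terminal":
window_vec = Counter("screen keyboard computer device screen".split())
defs = {1: Counter("electronic device with screen and keyboard".split()),
        2: Counter("station at the end of a travel route".split())}
best = max(defs, key=lambda s: cosine(window_vec, defs[s]))
print(best)  # 1
```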

13
Experiment on WSD
  • 250 queries from the recent Robust tracks of TREC
  • There are 258 unambiguous (single-sense) terms
    and 333 ambiguous terms.

14
The Retrieval Model
  • Noun phrases in queries are automatically
    identified and used for retrieval.
  • The retrieval model considers phrases to be more
    important than individual words [Meng, 2004].
  • Disambiguated query terms bring in new terms from
    WordNet.
  • Pseudo-feedback is also used to expand the query.
  • Additional weights are assigned to feedback terms
    that are semantically related to the
    disambiguated query terms.

15
Experiment Result on IR
  • Best-known results
  • Overall: 0.3333 [Deng, 2004]
  • Hard 50: 0.1941 [Yu, 2004]

16
Conclusion
  • This paper provided an effective approach to
    disambiguating word senses in short queries.
  • This approach can be applied to 100% of ambiguous
    terms and achieves an accuracy of 90%,
    significantly better than any existing method.
  • On average, the IR effectiveness of our system is
    7% better than the best result reported in the
    literature.