1
Survey on WSD and IR
  • Apex_at_SJTU

2
WSD Introduction
  • Problems in an online news retrieval system
  • query: "major"
  • Articles retrieved:
  • about Prime Minister John Major MP
  • "major" appearing as an adjective
  • "major" appearing as a military rank

3
WSD Introduction
  • Gale, Church and Yarowsky (1992) cite work dating
    back to 1950.
  • For many years, WSD was applied only to limited
    domains and a small vocabulary.
  • In recent years, disambiguators have been applied to resolve the
    senses of words in large heterogeneous corpora.
  • With a more accurate representation, and a query also marked up
    with word senses, researchers believe that retrieval accuracy
    would improve.

4
Approaches to disambiguation
  • Disambiguation based on manually generated rules
  • Disambiguation using evidence from existing
    corpora.

5
Disambiguation based on manually generated rules
  • Weiss (1973)
  • general context rule
  • If the word "type" appears near "print", it most likely means a
    small block of metal bearing a raised character on one end.
  • template rule
  • If "of" appears immediately after "type", it most likely means a
    subdivision of a particular kind of thing.

6
Weiss (1973)
  • Template rules were better, so they were applied first.
  • To create rules:
  • Examine 20 occurrences of an ambiguous word.
  • Test these manually created rules on a further 30 occurrences.
  • Accuracy: 90%
  • Cause of errors: idiomatic uses.

7
Disambiguation based on manually generated rules
  • Kelly and Stone (1975)
  • created a set of rules for 6,000 words
  • consisted of contextual rules similar to those
    of Weiss
  • in addition, used the grammatical category of a word as a strong
    indicator of sense
  • e.g. "the train" (noun) vs. "to train" (verb)

8
Kelly and Stone (1975)
  • The grammar and context rules were grouped into
    sets so that only certain rules were applied in
    certain situations.
  • Conditional statements controlled the application
    of rule sets.
  • Unlike Weiss's system, this disambiguator was designed to process
    a whole sentence at a time.
  • Accuracy: not a success.

9
Disambiguation based on manually generated rules
  • Small and Rieger (1982) came to similar
    conclusions.
  • When this type of disambiguator was extended to work on a larger
    vocabulary, the effort involved in building it became too great.
  • Since the 1980s, WSD research has concentrated on automatically
    generated rules based on sense evidence derived from
    machine-readable corpora.

10
Disambiguation using evidence from existing
corpora
  • Lesk (1988)
  • Resolve the sense of "ash" in
  • "There was ash from the coal fire."
  • Dictionary definitions looked up:
  • ash(1) The soft grey powder that remains after something has
    been burnt.
  • ash(2) A forest tree common in Britain.
  • Definitions of context words looked up:
  • coal(1) A black mineral which is dug from the earth, which can
    be burnt to give heat.
  • fire(1) The condition of burning flames, light and great heat.
  • fire(2) The act of firing weapons or artillery at an enemy.

11
Lesk (1988)
  • Sense definitions are ranked by a scoring function based on the
    number of words that co-occur (a minimal sketch follows below).
  • It is questionable how often the word overlap necessary for
    disambiguation actually occurs.
  • Accuracy: very brief experimentation, 50-70%
  • No analysis of the failures, although definition length is
    recognized as a possible factor in deciding which dictionary to
    use.
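
A minimal Python sketch of the overlap scoring, reusing the
definitions from the previous slide; the stopword list and the
tokenization are illustrative assumptions, not part of Lesk's method.

STOPWORDS = {"the", "a", "an", "of", "that", "which", "and", "or",
             "is", "to", "at", "in", "from"}

def tokens(text):
    # Lower-case, strip simple punctuation, drop stopwords.
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return set(words) - STOPWORDS

def lesk(target_senses, context_definitions):
    # Rank the target word's senses by how many definition words
    # they share with the definitions of the context words.
    context_words = set()
    for definition in context_definitions:
        context_words |= tokens(definition)
    return max(target_senses,
               key=lambda s: len(tokens(target_senses[s]) & context_words))

ash = {
    "ash(1)": "the soft grey powder that remains after something has been burnt",
    "ash(2)": "a forest tree common in Britain",
}
context = [
    "a black mineral which is dug from the earth, which can be burnt to give heat",
    "the condition of burning flames, light and great heat",
]
print(lesk(ash, context))  # -> ash(1), via the shared word "burnt"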

12
Disambiguation using evidence from existing
corpora
  • Wilks et al. (1990)
  • addressed this word overlap problem by expanding a dictionary
    definition with words that commonly co-occurred with the text of
    that definition (a rough sketch follows below).
  • Co-occurrence information was derived from all
    definition texts in the dictionary.
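
A rough sketch of the expansion step, under the assumption that
"co-occurrence" means appearing in the same definition text; the
five-entry dictionary, the stopword list, and the plain counts are
illustrative stand-ins for Wilks et al.'s actual statistic.

from collections import Counter
from itertools import combinations

STOP = {"a", "the", "of", "and", "or", "on", "to", "is", "where",
        "used", "out", "over", "across"}

definitions = {
    "bank(1)": "a place where money is kept and paid out on demand",
    "bank(2)": "land along the side of a river",
    "money(1)": "coins or notes used to buy things",
    "bridge(1)": "a structure carrying a road across a river",
    "flood(1)": "a great flow of water over land beside a river",
}

def words_of(text):
    return set(text.split()) - STOP

# Count pairwise co-occurrence within each definition text.
cooc = Counter()
for text in definitions.values():
    for pair in combinations(sorted(words_of(text)), 2):
        cooc[pair] += 1

def expand(sense, k=5):
    # Add the k words that most often co-occur, across all
    # definition texts, with the words of this sense's definition.
    words = words_of(definitions[sense])
    related = Counter()
    for (a, b), n in cooc.items():
        if a in words and b not in words:
            related[b] += n
        elif b in words and a not in words:
            related[a] += n
    return words | {w for w, _ in related.most_common(k)}

print(expand("bank(2)"))  # picks up river-related vocabulary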

13
Wilks et al. (1990)
  • Longman's Dictionary of Contemporary English (LDOCE): all its
    definitions were written using a simplified vocabulary of around
    2,000 words.
  • Few synonyms, which would be a distracting element in the
    co-occurrence calculation.
  • "bank"
  • for the economic sense: money, check, rob
  • for the geographical sense: river, flood, bridge
  • Accuracy: "bank" in 200 sentences, judged correct if the chosen
    sense coincided with a manually chosen one; 53% at the
    fine-grained level (13 senses) and 85% at the coarse-grained
    level (5 senses).
  • They suggested using simulated annealing to disambiguate a whole
    sentence simultaneously.

14
Disambiguating simultaneously
  • Cowie et al. (1992) implemented the simulated annealing approach
    (a sketch of the annealing search follows below).
  • Accuracy: tested on 67 sentences; 47% for fine-grained senses and
    72% for coarse-grained ones.
  • No comparison with Wilks et al.'s results.
  • No baseline.
  • A possible baseline: senses chosen randomly.
  • A better one: select the most common sense.
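
A sketch of a simulated-annealing search in the spirit of Cowie et
al.: explore joint sense assignments for a sentence, scoring each
configuration by the total word overlap among the chosen definitions.
The overlap score, the cooling schedule, and the toy sense inventory
are illustrative assumptions, not their exact settings.

import math
import random

def overlap_score(chosen_defs):
    # Total pairwise word overlap among the chosen definitions.
    bags = [set(d.split()) for d in chosen_defs]
    return sum(len(bags[i] & bags[j])
               for i in range(len(bags))
               for j in range(i + 1, len(bags)))

def anneal(senses_per_word, steps=2000, t0=2.0):
    # senses_per_word: one list of definition strings per ambiguous
    # word. Returns the best joint assignment found and its score.
    def defs(assignment):
        return [senses_per_word[i][k] for i, k in enumerate(assignment)]
    state = [0] * len(senses_per_word)
    cur = best = overlap_score(defs(state))
    best_state = state[:]
    for step in range(steps):
        t = t0 * (1 - step / steps) + 1e-9   # linear cooling
        cand = state[:]
        i = random.randrange(len(cand))
        cand[i] = random.randrange(len(senses_per_word[i]))
        s = overlap_score(defs(cand))
        # Always accept improvements; accept worse moves with a
        # probability that shrinks as the temperature falls.
        if s >= cur or random.random() < math.exp((s - cur) / t):
            state, cur = cand, s
            if s > best:
                best, best_state = s, state[:]
    return best_state, best

senses = [
    ["soft grey powder remains after burnt",   # ash(1)
     "forest tree common in Britain"],         # ash(2)
    ["black mineral burnt for heat",           # coal(1)
     "glowing ember used in drawing"],         # invented second sense
]
print(anneal(senses))  # -> ([0, 0], 1): "burnt" links ash(1), coal(1)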

15
Manually tagging a corpus
  • A technique from POS tagging:
  • manually mark up a large text corpus with POS tags, then train a
    statistical classifier to associate features with occurrences of
    the tags.
  • Ng and Lee (1996)
  • disambiguated 192,000 occurrences of 191 words.
  • examined the following features (a sketch of an extractor follows
    below):
  • POS and morphological form of the sense-tagged word
  • unordered set of its surrounding words
  • local collocations relative to it
  • and, if the sense-tagged word was a noun, the presence of a verb
    was also noted.
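
These feature types can be sketched as a simple extractor over a
POS-tagged sentence; the feature templates are simplified,
Penn-Treebank-style tags are assumed, and Ng and Lee's
exemplar-based learner is not reproduced here.

def features(tagged_sentence, i, window=3):
    # tagged_sentence: list of (word, pos) pairs; i: index of the
    # sense-tagged target word.
    word, pos = tagged_sentence[i]
    words = [w for w, _ in tagged_sentence]
    feats = {
        "pos": pos,                    # POS of the sense-tagged word
        "form": word.lower(),          # stand-in for morphological form
        # unordered set of surrounding words
        "context": frozenset(words[max(0, i - window):i]
                             + words[i + 1:i + 1 + window]),
        # local collocations relative to the target
        "left_bigram": tuple(words[max(0, i - 2):i]),
        "right_bigram": tuple(words[i + 1:i + 3]),
    }
    if pos.startswith("NN"):           # for nouns, note a nearby verb
        verbs = [w for w, p in tagged_sentence if p.startswith("VB")]
        feats["verb"] = verbs[0] if verbs else None
    return feats

sent = [("the", "DT"), ("interest", "NN"), ("rate", "NN"), ("rose", "VBD")]
print(features(sent, 1))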

16
Ng and Lee (1996)
  • Experiments
  • separated their corpus into training and test sets on an 89:11
    split
  • accuracy: 63.7% (baseline: 58.1%)
  • sense definitions used were from WordNet, with 7.8 senses per
    word for nouns and 12.0 senses for verbs
  • no comparison possible between the WordNet definitions and LDOCE

17
Using thesauri: Yarowsky (1992)
  • Roget's thesaurus: 1,042 semantic categories
  • Grolier Multimedia Encyclopedia
  • To decide which semantic category an ambiguous word occurrence
    should be assigned to:
  • a set of "clue" words, one set for each category, was derived
    from a POS-tagged corpus
  • the context of each occurrence was gathered
  • a term selection process similar to relevance feedback was used
    to derive the clue words

18
Yarowsky (1992)
  • E.g. clue words for ANIMALS/INSECTS:
  • species, family, bird, fish, cm, animal, tail, egg, wild,
    common, coat, female, inhabit, eat, nest
  • Comparison between words in the context and the clue word sets
    (a minimal scoring sketch follows below)
  • Accuracy: 12 ambiguous words, several hundred occurrences each,
    92% accuracy on average
  • The comparisons were suspect.
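
A minimal sketch of the comparison step: count how many of each
category's clue words occur in the context. Yarowsky's real model
weights clues by a salience score rather than plain counts, and the
second category with its clue words below is invented for contrast.

clue_words = {
    "ANIMALS/INSECTS": {"species", "family", "bird", "fish", "cm",
                        "animal", "tail", "egg", "wild", "common",
                        "coat", "female", "inhabit", "eat", "nest"},
    "TOOLS/MACHINERY": {"engine", "lift", "steel", "tool", "machine",
                        "operate", "cab", "boom"},
}

def assign_category(context_words):
    # Pick the category whose clue words best cover the context.
    context = {w.lower().strip(".,") for w in context_words}
    return max(clue_words, key=lambda c: len(clue_words[c] & context))

sentence = "the crane spread its wings and built a nest for its egg".split()
print(assign_category(sentence))  # -> ANIMALS/INSECTS via "nest", "egg"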

19
Testing disambiguators
  • Few pre-disambiguated test corpora are publicly available.
  • A sense-tagged version of the Brown corpus, called SEMCOR, is
    available. A TREC-like effort, called SENSEVAL, is underway.

20
WSD and IR experiments
  • Voorhees (1993)
  • based on WordNet
  • Each of its 90,000 words and phrases is assigned to one or more
    synsets.
  • A synset is a set of words that are synonyms of each other; the
    words of a synset define it and its meaning.
  • All synsets are linked together to form a mostly hierarchical
    semantic network based on hypernymy and hyponymy.
  • Other relations: meronymy, holonymy, antonymy.

21
Voorhees (1993)
  • the "hood" of a word sense contained in synset s is the
  • largest connected subgraph that
  • contains s,
  • contains only descendants of an ancestor of s, and
  • contains no synset that has a descendant that includes another
    instance of a member of s.
  • Results: consistently worse; the sense tagging was inaccurate.

22
  • The hood of the first sense of "house" would include the words
    housing, lodging, apartment, flat, cabin, gatehouse, bungalow,
    cottage (a rough construction sketch follows below).
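
A rough construction sketch using NLTK's WordNet interface (assumes
nltk with the wordnet data installed). Voorhees worked with an early
WordNet version and handled multiple ancestors; for simplicity this
version climbs a single hypernym chain.

from nltk.corpus import wordnet as wn

def subtree_has_word(root, word, skip):
    # Does any descendant synset of `root` (other than `skip`)
    # contain `word` as a member?
    stack = [root]
    while stack:
        s = stack.pop()
        if s != skip and word in {l.name() for l in s.lemmas()}:
            return True
        stack.extend(s.hyponyms())
    return False

def hood_root(sense, word):
    # Climb hypernyms from `sense` until going one level higher
    # would pull in a subtree holding another sense of `word`.
    root = sense
    parents = root.hypernyms()
    while parents:
        parent = parents[0]          # follow the first chain only
        if subtree_has_word(parent, word, skip=sense):
            break
        root = parent
        parents = root.hypernyms()
    return root

house = wn.synsets("house", pos=wn.NOUN)[0]
print(hood_root(house, "house"))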

23
Wallis (1993)
  • replaced words with their definitions from LDOCE
  • "ocean" and "sea"
  • ocean: the great mass of salt water that covers most of the
    earth
  • sea: the great body of salty water that covers much of the
    earth's surface
  • disappointing results
  • no analysis of the cause

24
Sussna (1993)
  • Assign a weight to every relation and calculate the semantic
    distance between two synsets (a simplified sketch follows below).
  • Calculate the semantic distance between context words and each of
    the synsets to rank the synsets.
  • Parameters: size of context (41 found optimal) and the number of
    words disambiguated simultaneously (only 10, for computational
    reasons).
  • Accuracy: 56%
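
A simplified sketch of weighted semantic distance over WordNet
(again assuming nltk's wordnet data). The per-relation weights here
are illustrative; Sussna's actual scheme also scales edge weights by
fanout and depth, which is omitted.

import heapq
from nltk.corpus import wordnet as wn

WEIGHT = {"hypernym": 1.0, "hyponym": 1.0, "meronym": 1.5, "holonym": 1.5}

def neighbors(s):
    yield from ((n, WEIGHT["hypernym"]) for n in s.hypernyms())
    yield from ((n, WEIGHT["hyponym"]) for n in s.hyponyms())
    yield from ((n, WEIGHT["meronym"]) for n in s.part_meronyms())
    yield from ((n, WEIGHT["holonym"]) for n in s.part_holonyms())

def distance(a, b, max_cost=10.0):
    # Cheapest-path cost from synset a to synset b (uniform-cost
    # search); returns None if no path within max_cost.
    heap, seen, tick = [(0.0, 0, a)], {}, 1
    while heap:
        cost, _, s = heapq.heappop(heap)
        if s == b:
            return cost
        if s in seen and seen[s] <= cost:
            continue
        seen[s] = cost
        for n, w in neighbors(s):
            if cost + w <= max_cost:
                heapq.heappush(heap, (cost + w, tick, n))
                tick += 1
    return None

print(distance(wn.synset("dog.n.01"), wn.synset("cat.n.01")))  # 4.0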

25
Analyses of WSD and IR
  • Krovetz and Croft: sense mismatches were significantly more
    likely to occur in non-relevant documents.
  • word collocation
  • skewed frequency distribution
  • Situations under which WSD may prove useful:
  • where collocation is less prevalent
  • where query words are used in a minority sense

26
Analyses of WSD and IR
  • Sanderson (1994, 1997)
  • pseudo-words, e.g. "banana/kalashnikov/anecdote" (a minimal
    construction sketch follows below)
  • experiments on the factor of query length
  • the effectiveness of retrieval based on short queries was greatly
    affected by the introduction of ambiguity, but much less so for
    longer queries.
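
The pseudo-word construction is simple enough to sketch directly:
every occurrence of a member word is rewritten as the concatenated
pseudo-word, giving artificial ambiguity whose correct "sense" (the
original word) is known in advance.

def pseudoword_corpus(tokens, members=("banana", "kalashnikov", "anecdote")):
    # Replace each member word with the ambiguous pseudo-word; keep
    # the original word as the gold "sense" label.
    pseudo = "/".join(members)
    return [(pseudo, t) if t in members else (t, None) for t in tokens]

text = "he told an anecdote about a banana".split()
print(pseudoword_corpus(text))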

27
Analyses of WSD and IR
  • Gonzalo et al. (1998): experiments based on SEMCOR; a summary is
    written for each document and used as a query, so each query has
    exactly one relevant document.
  • Cause of errors: a sense may be too specific
  • "newspaper" as a business concern as opposed to the physical
    object

28
Gonzalo et al. (1998)
  • synset-based representation
  • retrieval based on synsets seems to be the best
  • erroneous disambiguation and its impact on retrieval
    effectiveness:
  • baseline precision 52.6%
  • with 30% error, precision 54.4%
  • with 60% error, precision 49.1%

29
Sanderson (1997)
  • output word senses in a list ranked by a confidence score
  • accuracy: worse than the representation without senses, better
    than the one tagged with a single sense
  • possible cause: disambiguation errors

30
Disambiguation without sense definition
  • Zernik (1991)
  • generate clusters for an ambiguous word by three criteria:
    context words, grammatical category and derivational morphology.
  • associate each cluster with a dictionary sense.
  • e.g.
  • "train": 95% accuracy, via grammatical category
  • "office": full of errors

31
Disambiguation without sense definition
  • Schutze and Pederson (1995): one of the very few results to show
    an improvement (14%)
  • Clustering is based on context words only: occurrences with
    similar contexts are put into the same cluster, but a group is
    recognized as a cluster only if its context appears more than
    fifty times in the corpus (a small clustering sketch follows
    below).
  • Similar contexts of "ball": tennis, football, cricket. Thus this
    method breaks up a word's commonest sense into a number of uses
    (e.g. the sporting sense of "ball").
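
A small clustering sketch in the spirit of this method, assuming
scikit-learn is available: occurrences of "ball" are represented by
their bags of context words and grouped into uses. The SVD-reduced
vectors and the fifty-occurrence threshold of the original are
omitted here.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import CountVectorizer

contexts = [
    "tennis players hit the ball over the net",
    "the football match ended with the ball in the stands",
    "guests danced all night at the charity ball",
    "a formal ball with an orchestra and dancing",
    "cricket bowlers polish the ball on their trousers",
]
X = CountVectorizer(stop_words="english").fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # occurrences of "ball" grouped into two uses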

32
Schutze and Pederson (1995)
  • score each use of a word
  • representing a word occurrence by:
  • just the word
  • the word with its commonest use
  • the word with n of its uses

33
WSD in IR Revisited (SIGIR '03)
  • Skewed frequency distributions, coupled with the query term
    co-occurrence effect, are the reasons why traditional IR
    techniques that don't take sense into account are not penalized
    severely.
  • Inaccurate fine-grained WSD has an extremely negative effect on
    the performance of an IR system.
  • To achieve increases in performance, it is imperative to minimize
    the impact of inaccurate disambiguation.
  • The need for 90%-accurate disambiguation in order to see
    performance increases remains questionable.

34
The WSD methods applied
  • A number of experiments were tried, but nothing better than the
    following was found: applying each knowledge source
    (collocations, co-occurrence, and sense frequency) in a stepwise
    fashion (sketched below)
  • use a context window consisting of the sentence surrounding the
    target word to identify the sense of the word
  • examine whether the surrounding sentence contains any collocates
    observed in SemCor
  • fall back on sense-frequency data
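
A sketch of the stepwise fallback; the data structures passed in
(per-sense collocates and co-occurring words from SemCor, sense
frequencies) are illustrative stand-ins, not the paper's exact
representation.

def disambiguate(word, sentence_tokens, collocates, cooc_words, sense_freq):
    # Step 1: collocations observed in SemCor for each sense.
    for sense, clues in collocates.get(word, {}).items():
        if any(c in sentence_tokens for c in clues):
            return sense
    # Step 2: co-occurrence evidence from the surrounding sentence.
    scores = {sense: len(set(sentence_tokens) & words)
              for sense, words in cooc_words.get(word, {}).items()}
    if scores and max(scores.values()) > 0:
        return max(scores, key=scores.get)
    # Step 3: fall back on sense frequency.
    return max(sense_freq[word], key=sense_freq[word].get)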

35
WSD in IR Revisited: Conclusions
  • Reasons for success:
  • a high-precision WSD technique
  • sense frequency statistics
  • Resilience of the vector space model
  • Analysis of Schutze and Pederson's success: added tolerance

36
A highly accurate bootstrapping algorithm for
word sense disambiguation (Rada Mihalcea, 2000)
  • Disambiguate all nouns and verbs
  • step 1: complex nominals
  • step 2: named entities
  • step 3: word pairs, based on SEMCOR
  • (previous word, word) pairs, (word, successive word) pairs
  • step 4: context, based on SEMCOR and WordNet
  • in WordNet, a word's hypernyms are also part of its context

37
A highly accurate bootstrapping algorithm for
word sense disambiguation (cont'd)
  • step 5: words at semantic distance 0 from words that have
    already been disambiguated
  • step 6: words at semantic distance 1 from words that have
    already been disambiguated
  • step 7: words at semantic distance 0 among the remaining
    ambiguous words
  • step 8: words at semantic distance 1 among the remaining
    ambiguous words
  • (a skeleton of the bootstrapping loop follows below)
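
A skeleton of the iterative part (steps 5 and 6), assuming nltk's
WordNet and that the earlier steps have seeded `labeled` with
high-confidence assignments. "Distance 0" is read here as sharing a
synset and "distance 1" as a direct hypernym/hyponym link, which may
differ from the paper's exact definition.

from nltk.corpus import wordnet as wn

def within_distance_1(s1, s2):
    # Distance 0: same synset; distance 1: direct hypernym/hyponym.
    return s1 == s2 or s2 in s1.hypernyms() or s2 in s1.hyponyms()

def bootstrap(labeled, ambiguous):
    # labeled: {word: synset}; ambiguous: set of words to resolve.
    # Repeatedly label any word with a sense close to an already
    # disambiguated one, until no more progress is made.
    changed = True
    while changed:
        changed = False
        for word in sorted(ambiguous):
            for sense in wn.synsets(word, pos=wn.NOUN):
                if any(within_distance_1(sense, s) for s in labeled.values()):
                    labeled[word] = sense
                    ambiguous.discard(word)
                    changed = True
                    break
    return labeled

seed = {"dog": wn.synset("dog.n.01")}
print(bootstrap(seed, {"puppy", "canine"}))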

38
An Effective Approach to Document Retrieval via
Utilizing WordNet and Recognizing Phrases (SIGIR '04)
  • Significant increase for short queries
  • WSD on the query only, plus query expansion
  • Phrase-based and term-based similarity
  • Pseudo-relevance feedback

39
Phrase identification
  • 4 types of phrases: proper names (named entities), dictionary
    phrases (via WordNet), simple phrases, and complex phrases
  • Decide the window size of simple/complex phrases by calculating
    correlation (a sketch follows on the next slide)

40
Correlation
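
The correlation formula itself did not survive the transcript; below
is a hedged stand-in using pointwise mutual information, a common
way to measure how strongly two terms co-occur within a window,
which can then guide the window size chosen for simple/complex
phrases.

import math
from collections import Counter

def pmi(sentences, w1, w2, window=4):
    # Pointwise mutual information of w2 following w1 within `window`
    # tokens; an illustrative stand-in for the slide's correlation.
    uni, pair, total = Counter(), 0, 0
    for sent in sentences:
        toks = sent.split()
        uni.update(toks)
        total += len(toks)
        for i, t in enumerate(toks):
            if t == w1 and w2 in toks[i + 1:i + 1 + window]:
                pair += 1
    if not (pair and uni[w1] and uni[w2]):
        return float("-inf")
    return math.log2((pair / total) / ((uni[w1] / total) * (uni[w2] / total)))

docs = ["hot dog stands sell hot dog buns", "the dog barked at the cat"]
print(pmi(docs, "hot", "dog"))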
41
WSD
  • Unlike Rada Mihalcea's WSD, Liu did not utilize SemCor, only
    WordNet
  • 6 steps; basic ideas: hypernyms, hyponyms, cross-references, etc.

42
Query Expansion
  • Add synonyms (conditional)
  • Add definition words (only the first, shortest noun phrase):
    conditional on being highly globally correlated
  • Add hyponyms (conditional)
  • Add compound words (conditional)
  • (a sketch of these expansions follows below)
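
A minimal sketch of these conditional expansions over NLTK's
WordNet. The "highly globally correlated" test is abstracted into a
caller-supplied predicate, and taking the first few gloss tokens is
a crude stand-in for "the first, shortest noun phrase".

from nltk.corpus import wordnet as wn

def expand_query(term, sense, is_correlated):
    # term: original query word; sense: its disambiguated synset;
    # is_correlated: predicate standing in for the global correlation
    # test (threshold not given in these slides).
    added = set()
    # synonyms from the synset
    added |= {l.name() for l in sense.lemmas() if l.name() != term}
    # crude stand-in for the first, shortest noun phrase of the gloss
    gloss_head = " ".join(sense.definition().split(";")[0].split()[:3])
    if is_correlated(gloss_head):
        added.add(gloss_head)
    # hyponyms, kept only when correlated
    added |= {l.name() for h in sense.hyponyms() for l in h.lemmas()
              if is_correlated(l.name())}
    # compound words containing the term
    added |= {l.name() for l in sense.lemmas() if "_" in l.name()}
    return added

print(expand_query("sea", wn.synset("sea.n.01"), lambda t: True))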

43
Pseudo-relevance feedback
  • Using global correlations and WordNet
  • Global correlation above a threshold, and one of three
    conditions:
  • 1. it is monosense
  • 2. its definition contains some other query terms
  • 3. it is in the top-10 ranked documents
  • Combining local and global correlations (a sketch of the filter
    follows below)
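
A sketch of the acceptance test described above; every helper passed
in here (global_cor, wn_senses, definition_of, top10_terms) is an
illustrative stand-in, and the threshold value is an assumption.

def accept_candidate(term, query_terms, global_cor, wn_senses,
                     definition_of, top10_terms, threshold=0.1):
    # Accept an expansion term when its global correlation passes the
    # threshold AND at least one of the three listed conditions holds.
    if global_cor(term) < threshold:
        return False
    monosense = len(wn_senses(term)) == 1                 # 1. one sense
    gloss_hit = any(q in definition_of(term) for q in query_terms)  # 2.
    in_top10 = term in top10_terms                        # 3. top-10 docs
    return monosense or gloss_hit or in_top10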

44
Results
  • SO: standard Okapi (term similarity)
  • NO: enhanced SO
  • NOP: phrase similarity
  • NOPD: WSD
  • NOPDF: pseudo-relevance feedback

45
Results
46
Model conclusion
  • WSD on the query only
  • WSD by WordNet only, no SemCor
  • complicated query expansion
  • pseudo-relevance feedback
  • phrase-based and term-based similarity

47
Thank you!