Title: Survey on WSD and IR
WSD Introduction
- Problems in an online news retrieval system
  - query: "major"
  - Articles retrieved:
    - about Prime Minister John Major MP
    - where "major" appears as an adjective
    - where "major" appears as a military rank
WSD Introduction
- Gale, Church and Yarowsky (1992) cite work dating back to 1950.
- For many years, WSD was applied only to limited domains and a small vocabulary.
- In recent years, disambiguators have been applied to resolve the senses of words in large heterogeneous corpora.
- With a more accurate document representation, and a query also marked up with word senses, researchers believed that retrieval accuracy would improve.
Approaches to disambiguation
- Disambiguation based on manually generated rules
- Disambiguation using evidence from existing
corpora.
Disambiguation based on manually generated rules
- Weiss (1973) (see the sketch below)
  - general context rule
    - If the word "type" appears near "print", it most likely means a small block of metal bearing a raised character on one end.
  - template rule
    - If "of" appears immediately after "type", it most likely means a subdivision of a particular kind of thing.
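A minimal sketch of how such hand-built rules might be applied, with template rules tried before context rules; the function name, sense labels, and window size are illustrative assumptions, not Weiss's actual notation.

```python
# Illustrative Weiss-style rules for the word "type".
def disambiguate_type(tokens, i, window=5):
    """Resolve tokens[i] == 'type': template rules first, then context rules."""
    # Template rule: "of" immediately after "type" -> the 'kind' sense.
    if i + 1 < len(tokens) and tokens[i + 1] == "of":
        return "subdivision-of-a-kind"
    # Context rule: "print" nearby -> the printing-block sense.
    context = tokens[max(0, i - window): i + window + 1]
    if "print" in context:
        return "printing-block"
    return "unknown"

print(disambiguate_type("the type used to print this page was worn".split(), 1))
```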
Weiss (1973)
- Template rules performed better, so they were applied first.
- To create rules:
  - Examine 20 occurrences of an ambiguous word.
- Test the manually created rules on a further 30 occurrences.
- Accuracy: 90%
- Cause of errors: idiomatic uses.
Disambiguation based on manually generated rules
- Kelly and Stone (1975)
  - created a set of rules for 6,000 words
  - consisted of contextual rules similar to those of Weiss
  - in addition, used the grammatical category of a word as a strong indicator of sense
    - "the train" vs. "to train"
Kelly and Stone (1975)
- The grammar and context rules were grouped into sets so that only certain rules were applied in certain situations.
- Conditional statements controlled the application of rule sets.
- Unlike Weiss's system, this disambiguator was designed to process a whole sentence at a time.
- Accuracy: not a success.
Disambiguation based on manually generated rules
- Small and Rieger (1982) came to similar conclusions.
- When this type of disambiguator was extended to work on a larger vocabulary, the effort involved in building it became too great.
- Since the 1980s, WSD research has concentrated on automatically generated rules based on sense evidence derived from machine-readable corpora.
Disambiguation using evidence from existing corpora
- Lesk (1988)
  - Resolve the sense of "ash" in:
    - "There was ash from the coal fire."
  - Dictionary definitions looked up:
    - ash(1): the soft grey powder that remains after something has been burnt.
    - ash(2): a forest tree common in Britain.
  - Definitions of context words looked up:
    - coal(1): a black mineral which is dug from the earth, which can be burnt to give heat.
    - fire(1): the condition of burning; flames, light and great heat.
    - fire(2): the act of firing weapons or artillery at an enemy.
Lesk (1988)
- Sense definitions are ranked by a scoring function based on the number of words shared between definitions (see the sketch below).
- It is questionable how often the word overlap necessary for disambiguation actually occurs.
- Accuracy: 50--70%, from very brief experimentation.
- No analysis of the failures, although definition length was recognized as a possible factor in deciding which dictionary to use.
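A minimal sketch of Lesk's overlap scoring on the slide's "ash" example; the stopword list and tokenization are simplifications assumed for this sketch.

```python
# Score each sense of "ash" by the number of (non-stopword) words its
# definition shares with the definitions of the context words.
STOP = {"the", "a", "an", "that", "which", "is", "of", "to", "at",
        "or", "and", "in", "after", "has", "been", "be", "can", "from"}

def content_words(text):
    return set(text.lower().replace(",", "").replace(".", "").split()) - STOP

senses = {
    "ash(1)": "the soft grey powder that remains after something has been burnt",
    "ash(2)": "a forest tree common in Britain",
}
context_definitions = [
    "a black mineral which is dug from the earth which can be burnt to give heat",  # coal(1)
    "the condition of burning flames light and great heat",                         # fire(1)
    "the act of firing weapons or artillery at an enemy",                           # fire(2)
]

context = set().union(*(content_words(d) for d in context_definitions))
scores = {s: len(content_words(d) & context) for s, d in senses.items()}
print(scores)  # ash(1) wins, but only via the single shared word "burnt"
```

That the winning sense is selected by a single shared word illustrates the slide's point: the overlap needed for disambiguation can be scarce.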
Disambiguation using evidence from existing corpora
- Wilks et al. (1990)
  - addressed this word-overlap problem by expanding a dictionary definition with words that commonly co-occurred with the text of that definition.
  - Co-occurrence information was derived from all definition texts in the dictionary.
Wilks et al. (1990)
- The Longman Dictionary of Contemporary English (LDOCE): all of its definitions are written using a simplified vocabulary of around 2,000 words.
- Few synonyms, which would otherwise be a distracting element in the co-occurrence calculation.
- "bank"
  - for the economic sense: money, check, rob
  - for the geographical sense: river, flood, bridge
- Accuracy: "bank" in 200 sentences, judged correct if the output coincided with a manually chosen sense; 53% at the fine-grained level (13 senses) and 85% at the coarse-grained level (5 senses).
- They suggested using simulated annealing to disambiguate a whole sentence simultaneously.
Disambiguating simultaneously
- Cowie et al. (1992) (see the annealing sketch below)
- Accuracy: tested on 67 sentences; 47% for fine-grained senses and 72% for coarse-grained ones.
- No comparison with Wilks et al.'s results.
- No baseline.
  - A possible baseline: senses chosen at random.
  - A better one: selecting the most common sense.
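A minimal sketch of the simulated-annealing idea: search over joint sense assignments for all ambiguous words in a sentence, scoring an assignment by the total overlap among the chosen definitions. The data structures, parameters, and cooling schedule are illustrative assumptions.

```python
import math
import random

def overlap_score(assignment, sense_defs):
    """Total pairwise word overlap among the chosen sense definitions.
    sense_defs maps word -> list of definition word-sets."""
    chosen = [sense_defs[w][s] for w, s in assignment.items()]
    return sum(len(a & b) for i, a in enumerate(chosen) for b in chosen[i + 1:])

def anneal(sense_defs, steps=1000, temp=2.0, cooling=0.995):
    # Start from each word's first sense and perturb one word at a time.
    assignment = {w: 0 for w in sense_defs}
    current = overlap_score(assignment, sense_defs)
    best, best_score = dict(assignment), current
    for _ in range(steps):
        w = random.choice(list(sense_defs))
        candidate = dict(assignment)
        candidate[w] = random.randrange(len(sense_defs[w]))
        cand_score = overlap_score(candidate, sense_defs)
        delta = cand_score - current
        # Always accept improvements; accept regressions with a
        # probability that decays as the temperature cools.
        if delta >= 0 or random.random() < math.exp(delta / temp):
            assignment, current = candidate, cand_score
            if current > best_score:
                best, best_score = dict(assignment), current
        temp *= cooling
    return best
```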
Manually tagging a corpus
- A technique from POS tagging:
  - manually mark up a large text corpus with POS tags, then train a statistical classifier to associate features with occurrences of the tags.
- Ng and Lee (1996)
  - disambiguated 192,000 occurrences of 191 words.
  - examined the following features (see the sketch below):
    - the POS and morphological form of the sense-tagged word
    - an unordered set of its surrounding words
    - local collocations relative to it
    - and, if the sense-tagged word was a noun, the presence of a nearby verb was also noted.
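A minimal sketch of the feature set listed above; the feature names, window size, and fixed collocation positions are assumptions for illustration, not the actual representation in Ng and Lee's system.

```python
def extract_features(tokens, pos_tags, i, window=3):
    """Features for the ambiguous word at position i, following the
    slide: POS, morphological (surface) form, unordered surrounding
    words, and local collocations. (Ng and Lee additionally noted a
    nearby verb when the target word was a noun; omitted here.)"""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {
        "pos": pos_tags[i],
        "morph": tokens[i].lower(),
        "surrounding": frozenset(
            tokens[j].lower() for j in range(lo, hi) if j != i),
        "colloc_left": tokens[i - 1].lower() if i > 0 else None,
        "colloc_right": tokens[i + 1].lower() if i + 1 < len(tokens) else None,
    }

sent = "the interest rate on the loan rose sharply".split()
tags = ["DT", "NN", "NN", "IN", "DT", "NN", "VBD", "RB"]
print(extract_features(sent, tags, 1))
```

Feature dictionaries like this would then feed a supervised classifier trained on the sense-tagged corpus.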
Ng and Lee (1996)
- Experiments
  - separated their corpus into training and test sets on an 89/11 split
  - accuracy: 63.7% (baseline: 58.1%)
  - sense definitions were taken from WordNet: 7.8 senses per word for nouns and 12.0 for verbs
  - no comparison is possible between the WordNet and LDOCE sense definitions
Using thesauri: Yarowsky (1992)
- Roget's Thesaurus: 1,042 semantic categories
- Grolier Multimedia Encyclopedia
- To decide which semantic category an ambiguous word occurrence should be assigned to:
  - a set of clue words, one set per category, was derived from a POS-tagged corpus
  - the context of each occurrence was gathered
  - a term selection process similar to relevance feedback was used to derive the clue words
Yarowsky (1992)
- E.g., clue words for ANIMAL/INSECT:
  - species, family, bird, fish, cm, animal, tail, egg, wild, common, coat, female, inhabit, eat, nest
- Words in the context are compared against the clue word sets (see the sketch below).
- Accuracy: 12 ambiguous words, several hundred occurrences each, 92% accuracy on average.
- The comparisons were suspect.
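A minimal sketch of the category comparison: each Roget category is scored by how many of its clue words occur in the target word's context. Yarowsky's clue words carried log-likelihood weights; the unweighted count and the second category here are illustrative simplifications.

```python
clue_words = {
    "ANIMAL/INSECT": {"species", "family", "bird", "fish", "cm", "animal",
                      "tail", "egg", "wild", "common", "coat", "female",
                      "inhabit", "eat", "nest"},
    # A made-up second category so the comparison has something to beat.
    "TOOLS/MACHINERY": {"blade", "metal", "machine", "wheel", "engine"},
}

def best_category(context_tokens):
    """Pick the category whose clue words overlap the context most."""
    context = {t.lower() for t in context_tokens}
    return max(clue_words, key=lambda cat: len(clue_words[cat] & context))

print(best_category("the crane returned to its nest to feed on fish".split()))
```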
Testing disambiguators
- Few pre-disambiguated test corpora are publicly available.
- A sense-tagged version of the Brown corpus, called SemCor, is available. A TREC-like evaluation effort, called SENSEVAL, is underway.
WSD and IR experiments
- Voorhees (1993)
  - based on WordNet
    - Each of WordNet's 90,000 words and phrases is assigned to one or more synsets.
    - A synset is a set of words that are synonyms of each other; the words of a synset define it and its meaning.
    - All synsets are linked together to form a mostly hierarchical semantic network based on hypernymy and hyponymy.
    - Other relations: meronymy, holonymy, antonymy (WordNet's structure is sketched below).
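A quick look at the WordNet structure described above, here through NLTK's interface (a modern convenience assumed for illustration; Voorhees worked with WordNet directly):

```python
# Requires: pip install nltk; then nltk.download("wordnet")
from nltk.corpus import wordnet as wn

for syn in wn.synsets("house")[:3]:
    print(syn.name(), "->", [lemma.name() for lemma in syn.lemmas()])
    print("  hypernyms:", [h.name() for h in syn.hypernyms()])
    print("  hyponyms: ", [h.name() for h in syn.hyponyms()][:5])
```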
Voorhees (1993)
- The "hood" of a word sense contained in synset s:
  - the largest connected subgraph that
    - contains s,
    - contains only descendants of an ancestor of s, and
    - contains no synset that has a descendant containing another instance of a member of s.
- Retrieval results were consistently worse; senses were being tagged inaccurately.
- The hood of the first sense of "house" would include the words housing, lodging, apartment, flat, cabin, gatehouse, bungalow, cottage.
Wallis (1993)
- Replaced words with their definitions from LDOCE.
- "ocean" and "sea"
  - ocean: the great mass of salt water that covers most of the earth.
  - sea: the great body of salty water that covers much of the earth's surface.
- Disappointing results.
- No analysis of the cause.
Sussna (1993)
- Assigned a weight to each relation type and calculated the semantic distance between two synsets (see the sketch below).
- The semantic distance between context words and each of the candidate synsets is calculated in order to rank the synsets.
- Parameters: size of context (41 was optimal) and the number of words disambiguated simultaneously (only 10, for computational reasons).
- Accuracy: 56%
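A loose sketch of a Sussna-style weighted distance: edge costs depend on relation type and fan-out, and the distance is accumulated along a path. For brevity this version follows only hypernym links, and the fan-out weighting is a placeholder rather than Sussna's actual formula.

```python
from nltk.corpus import wordnet as wn

def hypernym_distance(s1, s2, seen=frozenset()):
    """Weighted distance from synset s1 up the hypernym chain to s2,
    or None if s2 is not an ancestor of s1."""
    if s1 == s2:
        return 0.0
    best = None
    hypers = s1.hypernyms()
    for h in hypers:
        if h in seen:
            continue
        d = hypernym_distance(h, s2, seen | {s1})
        if d is not None:
            # Placeholder weighting: steps are cheaper at high fan-out.
            d += 1.0 / len(hypers)
            best = d if best is None else min(best, d)
    return best

print(hypernym_distance(wn.synset("dog.n.01"), wn.synset("animal.n.01")))
```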
Analyses of WSD and IR
- Krovetz and Croft: sense mismatches were significantly more likely to occur in non-relevant documents.
  - word collocation
  - skewed frequency distributions
- Situations under which WSD may prove useful:
  - where collocation is less prevalent
  - where query words are used in a minority sense
Analyses of WSD and IR
- Sanderson (1994, 1997)
  - pseudo-words, e.g. banana/kalashnikov/anecdote (see the sketch below)
  - experiments on the effect of query length
    - retrieval effectiveness for short queries was greatly affected by the introduction of ambiguity, but much less so for longer queries.
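A minimal sketch of the pseudo-word technique: several real words are conflated into one artificial ambiguous token, so the original word serves as a known "correct sense" for evaluation.

```python
PSEUDO_SENSES = {"banana", "kalashnikov", "anecdote"}
PSEUDO_WORD = "banana/kalashnikov/anecdote"

def introduce_ambiguity(tokens):
    """Replace each member word with the conflated pseudo-word."""
    return [PSEUDO_WORD if t.lower() in PSEUDO_SENSES else t for t in tokens]

print(introduce_ambiguity("he told an anecdote about a banana".split()))
```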
Analyses of WSD and IR
- Gonzalo et al. (1998): experiments based on SemCor; a summary is written for each document and used as a query, so each query has exactly one relevant document.
- Cause of error: a sense may be too specific
  - "newspaper" as a business concern as opposed to the physical object
Gonzalo et al. (1998)
- Synset-based representation
  - retrieval based on synsets seems to be the best
- Erroneous disambiguation and its impact on retrieval effectiveness:
  - baseline precision: 52.6%
  - at a 30% error rate, precision: 54.4%
  - at a 60% error rate, precision: 49.1%
Sanderson (1997)
- Output word senses in a list ranked by a confidence score.
- Retrieval effectiveness was worse than with no sense tagging, but better than with each word tagged with a single sense.
- Possible cause: disambiguation errors.
Disambiguation without sense definitions
- Zernik (1991)
  - generated clusters for an ambiguous word using three criteria: context words, grammatical category, and derivational morphology.
  - associated each cluster with a dictionary sense.
  - e.g.:
    - "train": 95% accuracy, driven by grammatical category
    - "office": full of errors
Disambiguation without sense definitions
- Schütze and Pedersen (1995): one of the very few results to show an improvement (14%).
- Clusters are based on context words only: occurrences with similar contexts are put into the same cluster, but a cluster is only recognized if its context appears more than fifty times in the corpus.
- Similar contexts of "ball": tennis, football, cricket. Thus this method breaks up a word's commonest sense into a number of uses (e.g. the sporting sense of "ball"). (See the clustering sketch below.)
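A minimal sketch of sense discovery by context clustering in this spirit: each occurrence of "ball" is represented by its bag-of-words context and clustered. The vectorizer and k-means choices are stand-ins assumed for this sketch; Schütze and Pedersen used SVD-reduced co-occurrence vectors.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

# One context per occurrence of the ambiguous word "ball".
contexts = [
    "the tennis ball flew over the net",
    "a cricket ball struck the batsman on the pad",
    "guests danced all night at the charity ball",
    "the goalkeeper kicked the ball up the pitch",
]

X = CountVectorizer(stop_words="english").fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # clusters stand for "uses"; sporting uses may split further
```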
Schütze and Pedersen (1995)
- Score each use of a word.
- Represent a word occurrence by:
  - just the word
  - the word plus its commonest use
  - the word plus n of its uses
WSD in IR Revisited (SIGIR '03)
- Skewed frequency distributions, coupled with the query term co-occurrence effect, are the reasons why traditional IR techniques that don't take sense into account are not penalized severely.
- Inaccurate fine-grained WSD has an extremely negative effect on the performance of an IR system.
- To achieve increases in performance, it is imperative to minimize the impact of inaccurate disambiguation.
- The need for 90%-accurate disambiguation in order to see performance increases remains questionable.
The WSD methods applied
- A number of experiments were tried, but nothing better than the following was found: applying each knowledge source (collocations, co-occurrence, and sense frequency) in a stepwise fashion (see the sketch below).
  - a context window consisting of the sentence surrounding the target word is used to identify the word's sense
  - the surrounding sentence is examined for any collocates observed in SemCor
  - finally, specific sense-frequency data is used
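A minimal sketch of the stepwise back-off described above: the most reliable knowledge source is tried first, falling through to sense frequency. The dictionaries stand in for statistics mined from SemCor.

```python
def disambiguate(word, sentence_tokens, collocates, cooccurrences, first_sense):
    """collocates/cooccurrences map word -> [(clue_word, sense), ...];
    first_sense maps word -> its most frequent sense."""
    context = {t.lower() for t in sentence_tokens}
    # Step 1: a known collocate in the sentence decides the sense.
    for clue, sense in collocates.get(word, []):
        if clue in context:
            return sense
    # Step 2: fall back to looser co-occurrence evidence.
    for clue, sense in cooccurrences.get(word, []):
        if clue in context:
            return sense
    # Step 3: fall back to the most frequent sense.
    return first_sense.get(word)
```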
WSD in IR Revisited: Conclusions
- Reasons for success:
  - a high-precision WSD technique
  - sense frequency statistics
- Resilience of the vector space model.
- Analysis of Schütze and Pedersen's success: added tolerance.
A highly accurate bootstrapping algorithm for word sense disambiguation (Rada Mihalcea, 2000)
- Disambiguate all nouns and verbs
  - Step 1: complex nominals
  - Step 2: named entities
  - Step 3: word pairs, based on SemCor
    - (previous word, word) pairs and (word, following word) pairs
  - Step 4: context, based on SemCor and WordNet
    - in WordNet, a word's hypernyms also count as part of its context
A highly accurate bootstrapping algorithm for word sense disambiguation (cont'd)
- Step 5: words at semantic distance 0 from some word that has already been disambiguated
- Step 6: words at semantic distance 1 from some word that has already been disambiguated
- Step 7: words at semantic distance 0 among the remaining ambiguous words
- Step 8: words at semantic distance 1 among the remaining ambiguous words
- (the propagation in steps 5--8 is sketched below)
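A minimal sketch of the iterative propagation in steps 5--8: ambiguous words are repeatedly tagged when a candidate sense lies within a small semantic distance of an already-disambiguated sense. `distance` stands in for a WordNet-based semantic distance, and the single-candidate condition is an assumption of this sketch.

```python
def bootstrap(tagged, untagged, candidates, distance):
    """tagged: {word: sense}; untagged: set of words;
    candidates: {word: [possible senses]}."""
    for threshold in (0, 1):              # distance 0 first, then distance 1
        changed = True
        while changed:
            changed = False
            for word in list(untagged):
                hits = [s for s in candidates[word]
                        if any(distance(s, t) <= threshold
                               for t in tagged.values())]
                if len(hits) == 1:        # tag only when exactly one sense fits
                    tagged[word] = hits[0]
                    untagged.discard(word)
                    changed = True
    return tagged
```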
An Effective Approach to Document Retrieval via Utilizing WordNet and Recognizing Phrases (SIGIR '04)
- Significant improvement for short queries
- WSD applied to the query only, plus query expansion
- Phrase-based and term-based similarity
- Pseudo-relevance feedback
Phrase identification
- Four types of phrases: proper names (named entities), dictionary phrases (from WordNet), simple phrases, and complex phrases
- The window size for simple/complex phrases is decided by calculating correlation (see the next slide)
Correlation
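The correlation formula on this slide did not survive extraction. As a stand-in, here is a common co-occurrence correlation (pointwise mutual information) that could be used to test whether two terms within a candidate window behave like a phrase; it is not necessarily the measure Liu et al. define.

```python
import math

def pmi(count_xy, count_x, count_y, n_windows):
    """PMI of terms x and y co-occurring within a window, from counts."""
    p_xy = count_xy / n_windows
    p_x, p_y = count_x / n_windows, count_y / n_windows
    return math.log(p_xy / (p_x * p_y))

# Positive values suggest the terms co-occur more than chance predicts.
print(pmi(count_xy=30, count_x=100, count_y=120, n_windows=10_000))
```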
WSD
- Unlike Rada Mihalcea's WSD, Liu et al. did not use SemCor, only WordNet
- Six steps; the basic ideas rely on hypernyms, hyponyms, cross-references, etc.
Query Expansion
- Add synonyms (conditional)
- Add definition words (only the first shortest noun phrase), conditional on the word being highly globally correlated
- Add hyponyms (conditional)
- Add compound words (conditional)
- (a sketch of these expansions follows below)
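A minimal sketch of the conditional expansions listed above, using NLTK's WordNet interface. The correlation test is stubbed out, and the definition-word and compound-word steps are only noted in comments; all of this is an illustration rather than Liu et al.'s actual procedure.

```python
from nltk.corpus import wordnet as wn

def expand_term(term, is_correlated=lambda w: True):
    """Gather candidate expansion terms; each would be added only if it
    passes the (stubbed) correlation condition."""
    added = set()
    for syn in wn.synsets(term):
        added.update(l.name() for l in syn.lemmas())          # synonyms
        for hypo in syn.hyponyms():                           # hyponyms
            added.update(l.name() for l in hypo.lemmas())
    # Definition words (the first shortest noun phrase) and compound
    # words would be gathered similarly, under the same conditions.
    return {w for w in added if is_correlated(w)} - {term}

print(sorted(expand_term("bank"))[:10])
```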
Pseudo-Relevance Feedback
- Uses global correlations and WordNet
- A term is added if its global correlation (Global_cor) is high and one of the following conditions holds:
  - 1. it is monosemous
  - 2. its definition contains some other query term
  - 3. it appears in the top-10 ranked documents
- Combines local and global correlations
Results
- SO: standard Okapi (term similarity)
- NO: enhanced SO
- NOP: NO + phrase similarity
- NOPD: NOP + WSD
- NOPDF: NOPD + pseudo-relevance feedback
Model conclusions
- WSD on the query only
- WSD using only WordNet, no SemCor
- Complex query expansion
- Pseudo-relevance feedback
- Phrase- and term-based similarity
Thank you!