Title: Survey on WSD and IR
WSD Introduction
- Problems in an online news retrieval system
  - query: "major"
  - Articles retrieved:
    - about Prime Minister John Major MP
    - where "major" appears as an adjective
    - where "major" appears as a military rank
WSD Introduction
- Gale, Church and Yarowsky (1992) cite work dating back to 1950.
- For many years, WSD was applied only to limited domains and a small vocabulary.
- In recent years, disambiguators have been applied to resolve the senses of words in large heterogeneous corpora.
- With a more accurate document representation, and a query also marked up with word senses, researchers believed that retrieval accuracy would improve.
Approaches to disambiguation
- Disambiguation based on manually generated rules
- Disambiguation using evidence from existing
corpora.
Disambiguation based on manually generated rules
- Weiss (1973) (see the sketch below)
  - general context rule
    - If the word "type" appears near "print", it most likely means a small block of metal bearing a raised character on one end.
  - template rule
    - If "of" appears immediately after "type", it most likely means a subdivision of a particular kind of thing.
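A minimal sketch of how such hand-built rules might be applied, with template rules tried before context rules; the function name, sense labels, and window size are illustrative assumptions, not Weiss's actual notation.

```python
# Illustrative Weiss-style rules for the word "type".
def disambiguate_type(tokens, i, window=5):
    """Resolve tokens[i] == 'type': template rules first, then context rules."""
    # Template rule: "of" immediately after "type" -> the 'kind' sense.
    if i + 1 < len(tokens) and tokens[i + 1] == "of":
        return "subdivision-of-a-kind"
    # Context rule: "print" nearby -> the printing-block sense.
    context = tokens[max(0, i - window): i + window + 1]
    if "print" in context:
        return "printing-block"
    return "unknown"

print(disambiguate_type("the type used to print this page was worn".split(), 1))
```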
Weiss (1973)
- Template rules performed better, so they were applied first.
- To create rules:
  - Examine 20 occurrences of an ambiguous word.
- Test the manually created rules on a further 30 occurrences.
- Accuracy: 90%
- Cause of errors: idiomatic uses.
Disambiguation based on manually generated rules
- Kelly and Stone (1975)
  - created a set of rules for 6,000 words
  - consisted of contextual rules similar to those of Weiss
  - in addition, used the grammatical category of a word as a strong indicator of sense
    - "the train" vs. "to train"
Kelly and Stone (1975)
- The grammar and context rules were grouped into sets so that only certain rules were applied in certain situations.
- Conditional statements controlled the application of rule sets.
- Unlike Weiss's system, this disambiguator was designed to process a whole sentence at a time.
- Accuracy: not a success.
Disambiguation based on manually generated rules
- Small and Rieger (1982) came to similar conclusions.
- When this type of disambiguator was extended to work on a larger vocabulary, the effort involved in building it became too great.
- Since the 1980s, WSD research has concentrated on automatically generated rules based on sense evidence derived from machine-readable corpora.
Disambiguation using evidence from existing corpora
- Lesk (1988)
  - Resolve the sense of "ash" in:
    - "There was ash from the coal fire."
  - Dictionary definitions looked up:
    - ash(1): the soft grey powder that remains after something has been burnt.
    - ash(2): a forest tree common in Britain.
  - Definitions of context words looked up:
    - coal(1): a black mineral which is dug from the earth, which can be burnt to give heat.
    - fire(1): the condition of burning; flames, light and great heat.
    - fire(2): the act of firing weapons or artillery at an enemy.
Lesk (1988)
- Sense definitions are ranked by a scoring function based on the number of words shared between definitions (see the sketch below).
- It is questionable how often the word overlap necessary for disambiguation actually occurs.
- Accuracy: 50--70%, from very brief experimentation.
- No analysis of the failures, although definition length was recognized as a possible factor in deciding which dictionary to use.
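A minimal sketch of Lesk's overlap scoring on the slide's "ash" example; the stopword list and tokenization are simplifications assumed for this sketch.

```python
# Score each sense of "ash" by the number of (non-stopword) words its
# definition shares with the definitions of the context words.
STOP = {"the", "a", "an", "that", "which", "is", "of", "to", "at",
        "or", "and", "in", "after", "has", "been", "be", "can", "from"}

def content_words(text):
    return set(text.lower().replace(",", "").replace(".", "").split()) - STOP

senses = {
    "ash(1)": "the soft grey powder that remains after something has been burnt",
    "ash(2)": "a forest tree common in Britain",
}
context_definitions = [
    "a black mineral which is dug from the earth which can be burnt to give heat",  # coal(1)
    "the condition of burning flames light and great heat",                         # fire(1)
    "the act of firing weapons or artillery at an enemy",                           # fire(2)
]

context = set().union(*(content_words(d) for d in context_definitions))
scores = {s: len(content_words(d) & context) for s, d in senses.items()}
print(scores)  # ash(1) wins, but only via the single shared word "burnt"
```

That the winning sense is selected by a single shared word illustrates the slide's point: the overlap needed for disambiguation can be scarce.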
Disambiguation using evidence from existing corpora
- Wilks et al. (1990)
  - addressed this word-overlap problem by expanding a dictionary definition with words that commonly co-occurred with the text of that definition.
  - Co-occurrence information was derived from all definition texts in the dictionary.
Wilks et al. (1990)
- The Longman Dictionary of Contemporary English (LDOCE): all of its definitions are written using a simplified vocabulary of around 2,000 words.
- Few synonyms, which would otherwise be a distracting element in the co-occurrence calculation.
- "bank"
  - for the economic sense: money, check, rob
  - for the geographical sense: river, flood, bridge
- Accuracy: "bank" in 200 sentences, judged correct if the output coincided with a manually chosen sense; 53% at the fine-grained level (13 senses) and 85% at the coarse-grained level (5 senses).
- They suggested using simulated annealing to disambiguate a whole sentence simultaneously.
Disambiguating simultaneously
- Cowie et al. (1992) (see the annealing sketch below)
- Accuracy: tested on 67 sentences; 47% for fine-grained senses and 72% for coarse-grained ones.
- No comparison with Wilks et al.'s results.
- No baseline.
  - A possible baseline: senses chosen at random.
  - A better one: selecting the most common sense.
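A minimal sketch of the simulated-annealing idea: search over joint sense assignments for all ambiguous words in a sentence, scoring an assignment by the total overlap among the chosen definitions. The data structures, parameters, and cooling schedule are illustrative assumptions.

```python
import math
import random

def overlap_score(assignment, sense_defs):
    """Total pairwise word overlap among the chosen sense definitions.
    sense_defs maps word -> list of definition word-sets."""
    chosen = [sense_defs[w][s] for w, s in assignment.items()]
    return sum(len(a & b) for i, a in enumerate(chosen) for b in chosen[i + 1:])

def anneal(sense_defs, steps=1000, temp=2.0, cooling=0.995):
    # Start from each word's first sense and perturb one word at a time.
    assignment = {w: 0 for w in sense_defs}
    current = overlap_score(assignment, sense_defs)
    best, best_score = dict(assignment), current
    for _ in range(steps):
        w = random.choice(list(sense_defs))
        candidate = dict(assignment)
        candidate[w] = random.randrange(len(sense_defs[w]))
        cand_score = overlap_score(candidate, sense_defs)
        delta = cand_score - current
        # Always accept improvements; accept regressions with a
        # probability that decays as the temperature cools.
        if delta >= 0 or random.random() < math.exp(delta / temp):
            assignment, current = candidate, cand_score
            if current > best_score:
                best, best_score = dict(assignment), current
        temp *= cooling
    return best
```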
Manually tagging a corpus
- A technique from POS tagging:
  - manually mark up a large text corpus with POS tags, then train a statistical classifier to associate features with occurrences of the tags.
- Ng and Lee (1996)
  - disambiguated 192,000 occurrences of 191 words.
  - examined the following features (see the sketch below):
    - the POS and morphological form of the sense-tagged word
    - an unordered set of its surrounding words
    - local collocations relative to it
    - and, if the sense-tagged word was a noun, the presence of a nearby verb was also noted.
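A minimal sketch of the feature set listed above; the feature names, window size, and fixed collocation positions are assumptions for illustration, not the actual representation in Ng and Lee's system.

```python
def extract_features(tokens, pos_tags, i, window=3):
    """Features for the ambiguous word at position i, following the
    slide: POS, morphological (surface) form, unordered surrounding
    words, and local collocations. (Ng and Lee additionally noted a
    nearby verb when the target word was a noun; omitted here.)"""
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    return {
        "pos": pos_tags[i],
        "morph": tokens[i].lower(),
        "surrounding": frozenset(
            tokens[j].lower() for j in range(lo, hi) if j != i),
        "colloc_left": tokens[i - 1].lower() if i > 0 else None,
        "colloc_right": tokens[i + 1].lower() if i + 1 < len(tokens) else None,
    }

sent = "the interest rate on the loan rose sharply".split()
tags = ["DT", "NN", "NN", "IN", "DT", "NN", "VBD", "RB"]
print(extract_features(sent, tags, 1))
```

Feature dictionaries like this would then feed a supervised classifier trained on the sense-tagged corpus.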
Ng and Lee (1996)
- Experiments
  - separated their corpus into training and test sets on an 89/11 split
  - accuracy: 63.7% (baseline: 58.1%)
  - sense definitions were taken from WordNet: 7.8 senses per word for nouns and 12.0 for verbs
  - no comparison is possible between the WordNet and LDOCE sense definitions
Using thesauri: Yarowsky (1992)
- Roget's Thesaurus: 1,042 semantic categories
- Grolier Multimedia Encyclopedia
- To decide which semantic category an ambiguous word occurrence should be assigned to:
  - a set of clue words, one set per category, was derived from a POS-tagged corpus
  - the context of each occurrence was gathered
  - a term selection process similar to relevance feedback was used to derive the clue words
Yarowsky (1992)
- E.g., clue words for ANIMAL/INSECT:
  - species, family, bird, fish, cm, animal, tail, egg, wild, common, coat, female, inhabit, eat, nest
- Words in the context are compared against the clue word sets (see the sketch below).
- Accuracy: 12 ambiguous words, several hundred occurrences each, 92% accuracy on average.
- The comparisons were suspect.
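A minimal sketch of the category comparison: each Roget category is scored by how many of its clue words occur in the target word's context. Yarowsky's clue words carried log-likelihood weights; the unweighted count and the second category here are illustrative simplifications.

```python
clue_words = {
    "ANIMAL/INSECT": {"species", "family", "bird", "fish", "cm", "animal",
                      "tail", "egg", "wild", "common", "coat", "female",
                      "inhabit", "eat", "nest"},
    # A made-up second category so the comparison has something to beat.
    "TOOLS/MACHINERY": {"blade", "metal", "machine", "wheel", "engine"},
}

def best_category(context_tokens):
    """Pick the category whose clue words overlap the context most."""
    context = {t.lower() for t in context_tokens}
    return max(clue_words, key=lambda cat: len(clue_words[cat] & context))

print(best_category("the crane returned to its nest to feed on fish".split()))
```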
Testing disambiguators
- Few pre-disambiguated test corpora are publicly available.
- A sense-tagged version of the Brown corpus, called SemCor, is available. A TREC-like evaluation effort, called SENSEVAL, is underway.
WSD and IR experiments
- Voorhees (1993)
  - based on WordNet
    - Each of WordNet's 90,000 words and phrases is assigned to one or more synsets.
    - A synset is a set of words that are synonyms of each other; the words of a synset define it and its meaning.
    - All synsets are linked together to form a mostly hierarchical semantic network based on hypernymy and hyponymy.
    - Other relations: meronymy, holonymy, antonymy (WordNet's structure is sketched below).
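A quick look at the WordNet structure described above, here through NLTK's interface (a modern convenience assumed for illustration; Voorhees worked with WordNet directly):

```python
# Requires: pip install nltk; then nltk.download("wordnet")
from nltk.corpus import wordnet as wn

for syn in wn.synsets("house")[:3]:
    print(syn.name(), "->", [lemma.name() for lemma in syn.lemmas()])
    print("  hypernyms:", [h.name() for h in syn.hypernyms()])
    print("  hyponyms: ", [h.name() for h in syn.hyponyms()][:5])
```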
Voorhees (1993)
- The "hood" of a word sense contained in synset s:
  - the largest connected subgraph that
    - contains s,
    - contains only descendants of an ancestor of s, and
    - contains no synset that has a descendant containing another instance of a member of s.
- Retrieval results were consistently worse; senses were being tagged inaccurately.
- The hood of the first sense of "house" would include the words housing, lodging, apartment, flat, cabin, gatehouse, bungalow, cottage.
Wallis (1993)
- Replaced words with their definitions from LDOCE.
- "ocean" and "sea"
  - ocean: the great mass of salt water that covers most of the earth.
  - sea: the great body of salty water that covers much of the earth's surface.
- Disappointing results.
- No analysis of the cause.
Sussna (1993)
- Assigned a weight to each relation type and calculated the semantic distance between two synsets (see the sketch below).
- The semantic distance between context words and each of the candidate synsets is calculated in order to rank the synsets.
- Parameters: size of context (41 was optimal) and the number of words disambiguated simultaneously (only 10, for computational reasons).
- Accuracy: 56%
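A loose sketch of a Sussna-style weighted distance: edge costs depend on relation type and fan-out, and the distance is accumulated along a path. For brevity this version follows only hypernym links, and the fan-out weighting is a placeholder rather than Sussna's actual formula.

```python
from nltk.corpus import wordnet as wn

def hypernym_distance(s1, s2, seen=frozenset()):
    """Weighted distance from synset s1 up the hypernym chain to s2,
    or None if s2 is not an ancestor of s1."""
    if s1 == s2:
        return 0.0
    best = None
    hypers = s1.hypernyms()
    for h in hypers:
        if h in seen:
            continue
        d = hypernym_distance(h, s2, seen | {s1})
        if d is not None:
            # Placeholder weighting: steps are cheaper at high fan-out.
            d += 1.0 / len(hypers)
            best = d if best is None else min(best, d)
    return best

print(hypernym_distance(wn.synset("dog.n.01"), wn.synset("animal.n.01")))
```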
Analyses of WSD and IR
- Krovetz and Croft: sense mismatches were significantly more likely to occur in non-relevant documents.
  - word collocation
  - skewed frequency distributions
- Situations under which WSD may prove useful:
  - where collocation is less prevalent
  - where query words are used in a minority sense
Analyses of WSD and IR
- Sanderson (1994, 1997)
  - pseudo-words, e.g. banana/kalashnikov/anecdote (see the sketch below)
  - experiments on the effect of query length
    - retrieval effectiveness for short queries was greatly affected by the introduction of ambiguity, but much less so for longer queries.
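A minimal sketch of the pseudo-word technique: several real words are conflated into one artificial ambiguous token, so the original word serves as a known "correct sense" for evaluation.

```python
PSEUDO_SENSES = {"banana", "kalashnikov", "anecdote"}
PSEUDO_WORD = "banana/kalashnikov/anecdote"

def introduce_ambiguity(tokens):
    """Replace each member word with the conflated pseudo-word."""
    return [PSEUDO_WORD if t.lower() in PSEUDO_SENSES else t for t in tokens]

print(introduce_ambiguity("he told an anecdote about a banana".split()))
```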
Analyses of WSD and IR
- Gonzalo et al. (1998): experiments based on SemCor; a summary is written for each document and used as a query, so each query has exactly one relevant document.
- Cause of error: a sense may be too specific
  - "newspaper" as a business concern as opposed to the physical object
Gonzalo et al. (1998)
- Synset-based representation
  - retrieval based on synsets seems to be the best
- Erroneous disambiguation and its impact on retrieval effectiveness:
  - baseline precision: 52.6%
  - at a 30% error rate, precision: 54.4%
  - at a 60% error rate, precision: 49.1%
Sanderson (1997)
- Output word senses in a list ranked by a confidence score.
- Retrieval effectiveness was worse than with no sense tagging, but better than with each word tagged with a single sense.
- Possible cause: disambiguation errors.
Disambiguation without sense definitions
- Zernik (1991)
  - generated clusters for an ambiguous word using three criteria: context words, grammatical category, and derivational morphology.
  - associated each cluster with a dictionary sense.
  - e.g.:
    - "train": 95% accuracy, driven by grammatical category
    - "office": full of errors
Disambiguation without sense definitions
- Schütze and Pedersen (1995): one of the very few results to show an improvement (14%).
- Clusters are based on context words only: occurrences with similar contexts are put into the same cluster, but a cluster is only recognized if its context appears more than fifty times in the corpus.
- Similar contexts of "ball": tennis, football, cricket. Thus this method breaks up a word's commonest sense into a number of uses (e.g. the sporting sense of "ball"). (See the clustering sketch below.)
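A minimal sketch of sense discovery by context clustering in this spirit: each occurrence of "ball" is represented by its bag-of-words context and clustered. The vectorizer and k-means choices are stand-ins assumed for this sketch; Schütze and Pedersen used SVD-reduced co-occurrence vectors.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

# One context per occurrence of the ambiguous word "ball".
contexts = [
    "the tennis ball flew over the net",
    "a cricket ball struck the batsman on the pad",
    "guests danced all night at the charity ball",
    "the goalkeeper kicked the ball up the pitch",
]

X = CountVectorizer(stop_words="english").fit_transform(contexts)
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels)  # clusters stand for "uses"; sporting uses may split further
```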
Schütze and Pedersen (1995)
- Score each use of a word.
- Represent a word occurrence by:
  - just the word
  - the word plus its commonest use
  - the word plus n of its uses
WSD in IR Revisited (SIGIR '03)
- Skewed frequency distributions, coupled with the query term co-occurrence effect, are the reasons why traditional IR techniques that don't take sense into account are not penalized severely.
- Inaccurate fine-grained WSD has an extremely negative effect on the performance of an IR system.
- To achieve increases in performance, it is imperative to minimize the impact of inaccurate disambiguation.
- The need for 90%-accurate disambiguation in order to see performance increases remains questionable.
The WSD methods applied
- A number of experiments were tried, but nothing better than the following was found: applying each knowledge source (collocations, co-occurrence, and sense frequency) in a stepwise fashion (see the sketch below).
  - a context window consisting of the sentence surrounding the target word is used to identify the word's sense
  - the surrounding sentence is examined for any collocates observed in SemCor
  - finally, specific sense-frequency data is used
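A minimal sketch of the stepwise back-off described above: the most reliable knowledge source is tried first, falling through to sense frequency. The dictionaries stand in for statistics mined from SemCor.

```python
def disambiguate(word, sentence_tokens, collocates, cooccurrences, first_sense):
    """collocates/cooccurrences map word -> [(clue_word, sense), ...];
    first_sense maps word -> its most frequent sense."""
    context = {t.lower() for t in sentence_tokens}
    # Step 1: a known collocate in the sentence decides the sense.
    for clue, sense in collocates.get(word, []):
        if clue in context:
            return sense
    # Step 2: fall back to looser co-occurrence evidence.
    for clue, sense in cooccurrences.get(word, []):
        if clue in context:
            return sense
    # Step 3: fall back to the most frequent sense.
    return first_sense.get(word)
```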
WSD in IR Revisited: Conclusions
- Reasons for success:
  - a high-precision WSD technique
  - sense frequency statistics
- Resilience of the vector space model.
- Analysis of Schütze and Pedersen's success: added tolerance.
A highly accurate bootstrapping algorithm for word sense disambiguation (Rada Mihalcea, 2000)
- Disambiguate all nouns and verbs
  - Step 1: complex nominals
  - Step 2: named entities
  - Step 3: word pairs, based on SemCor
    - (previous word, word) pairs and (word, following word) pairs
  - Step 4: context, based on SemCor and WordNet
    - in WordNet, a word's hypernyms also count as part of its context
A highly accurate bootstrapping algorithm for word sense disambiguation (cont'd)
- Step 5: words at semantic distance 0 from some word that has already been disambiguated
- Step 6: words at semantic distance 1 from some word that has already been disambiguated
- Step 7: words at semantic distance 0 among the remaining ambiguous words
- Step 8: words at semantic distance 1 among the remaining ambiguous words
- (the propagation in steps 5--8 is sketched below)
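A minimal sketch of the iterative propagation in steps 5--8: ambiguous words are repeatedly tagged when a candidate sense lies within a small semantic distance of an already-disambiguated sense. `distance` stands in for a WordNet-based semantic distance, and the single-candidate condition is an assumption of this sketch.

```python
def bootstrap(tagged, untagged, candidates, distance):
    """tagged: {word: sense}; untagged: set of words;
    candidates: {word: [possible senses]}."""
    for threshold in (0, 1):              # distance 0 first, then distance 1
        changed = True
        while changed:
            changed = False
            for word in list(untagged):
                hits = [s for s in candidates[word]
                        if any(distance(s, t) <= threshold
                               for t in tagged.values())]
                if len(hits) == 1:        # tag only when exactly one sense fits
                    tagged[word] = hits[0]
                    untagged.discard(word)
                    changed = True
    return tagged
```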
An Effective Approach to Document Retrieval via Utilizing WordNet and Recognizing Phrases (SIGIR '04)
- Significant improvement for short queries
- WSD applied to the query only, plus query expansion
- Phrase-based and term-based similarity
- Pseudo-relevance feedback
Phrase identification
- Four types of phrases: proper names (named entities), dictionary phrases (from WordNet), simple phrases, and complex phrases
- The window size for simple/complex phrases is decided by calculating correlation (see the next slide)
Correlation
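The correlation formula on this slide did not survive extraction. As a stand-in, here is a common co-occurrence correlation (pointwise mutual information) that could be used to test whether two terms within a candidate window behave like a phrase; it is not necessarily the measure Liu et al. define.

```python
import math

def pmi(count_xy, count_x, count_y, n_windows):
    """PMI of terms x and y co-occurring within a window, from counts."""
    p_xy = count_xy / n_windows
    p_x, p_y = count_x / n_windows, count_y / n_windows
    return math.log(p_xy / (p_x * p_y))

# Positive values suggest the terms co-occur more than chance predicts.
print(pmi(count_xy=30, count_x=100, count_y=120, n_windows=10_000))
```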
WSD
- Unlike Rada Mihalcea's WSD, Liu et al. did not use SemCor, only WordNet
- Six steps; the basic ideas rely on hypernyms, hyponyms, cross-references, etc.
Query Expansion
- Add synonyms (conditional)
- Add definition words (only the first shortest noun phrase), conditional on the word being highly globally correlated
- Add hyponyms (conditional)
- Add compound words (conditional)
- (a sketch of these expansions follows below)
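A minimal sketch of the conditional expansions listed above, using NLTK's WordNet interface. The correlation test is stubbed out, and the definition-word and compound-word steps are only noted in comments; all of this is an illustration rather than Liu et al.'s actual procedure.

```python
from nltk.corpus import wordnet as wn

def expand_term(term, is_correlated=lambda w: True):
    """Gather candidate expansion terms; each would be added only if it
    passes the (stubbed) correlation condition."""
    added = set()
    for syn in wn.synsets(term):
        added.update(l.name() for l in syn.lemmas())          # synonyms
        for hypo in syn.hyponyms():                           # hyponyms
            added.update(l.name() for l in hypo.lemmas())
    # Definition words (the first shortest noun phrase) and compound
    # words would be gathered similarly, under the same conditions.
    return {w for w in added if is_correlated(w)} - {term}

print(sorted(expand_term("bank"))[:10])
```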
Pseudo-Relevance Feedback
- Uses global correlations and WordNet
- A term is added if its global correlation (Global_cor) is high and one of the following conditions holds:
  - 1. it is monosemous
  - 2. its definition contains some other query term
  - 3. it appears in the top-10 ranked documents
- Combines local and global correlations
Results
- SO: standard Okapi (term similarity)
- NO: enhanced SO
- NOP: NO + phrase similarity
- NOPD: NOP + WSD
- NOPDF: NOPD + pseudo-relevance feedback
Model conclusions
- WSD on the query only
- WSD using only WordNet, no SemCor
- Complex query expansion
- Pseudo-relevance feedback
- Phrase- and term-based similarity
Thank you!