Title: Word Sense Disambiguation and Information Retrieval
1 Word Sense Disambiguation and Information Retrieval
- By Guitao Gao
- Qing Ma
- Prof Jian-Yun Nie
2 Outline
- Introduction
- WSD Approaches
- Conclusion
3 Introduction
- Task of Information Retrieval
- Content Representation
- Indexing
- Bag-of-words indexing
- Problems
- Synonymy: query expansion
- Polysemy: word sense disambiguation
4 WSD Approaches
- Disambiguation based on manually created rules
- Disambiguation using machine readable dictionaries
- Disambiguation using thesauri
- Disambiguation based on unsupervised machine learning with corpora
5 Disambiguation based on manually created rules
- Weiss's approach [Lesk 1988]
- set of rules to disambiguate five words
- context rule: within 5 words
- template rule: specific location
- accuracy: 90%
- IR improvement: 1%
- Small and Rieger's approach [Small 1982]
- Expert system
6 Disambiguation using machine readable dictionaries
- Lesk's approach [Lesk 1988]
- Senses are represented by different definitions
- Look up the definitions of the context words
- Find co-occurring words
- Select the most similar sense
- Accuracy: 50-70%.
- Problem: not enough overlapping words between definitions
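The definition-overlap idea above can be sketched as follows. This is a minimal illustration and not Lesk's original program: the function name `simplified_lesk`, the toy dictionary, and the whitespace tokenisation are all assumptions made for the example, and no stop-word list is applied.

```python
def simplified_lesk(context_words, sense_definitions, dictionary):
    """Return the sense whose definition shares the most words with
    the definitions of the context words, plus the overlap size."""
    # Collect every word used to define any of the context words.
    context_bag = set()
    for word in context_words:
        for definition in dictionary.get(word, []):
            context_bag.update(definition.lower().split())

    best_sense, best_overlap = None, -1
    for sense, definition in sense_definitions.items():
        overlap = len(context_bag & set(definition.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense, best_overlap


# Toy dictionary entries, invented for illustration only.
dictionary = {
    "cone": ["fruit of evergreen trees",
             "solid body that narrows to a point"],
}
pine_senses = {
    "tree": "kinds of evergreen trees with needle-shaped leaves",
    "grieve": "waste away through sorrow or illness",
}
result = simplified_lesk(["cone"], pine_senses, dictionary)
print(result)
```

With short definitions the overlap is often zero for every sense, which is the sparseness problem noted in the last bullet.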
7 Disambiguation using machine readable dictionaries
- Wilks' approach [Wilks 1990]
- Attempt to solve Lesk's problem
- Expanding dictionary definitions
- Use Longman Dictionary of Contemporary English (LDOCE)
- more word co-occurrence evidence collected
- Accuracy: between 53% and 85%.
8 Wilks' approach [Wilks 1990]
- Commonly co-occurring words in LDOCE [Wilks 1990]
9 Disambiguation using machine readable dictionaries
- Luk's approach [Luk 1995]
- Statistical sense disambiguation
- Use definitions from LDOCE
- co-occurrence data collected from the Brown corpus
- defining concepts: 1792 words used to write the definitions of LDOCE
- LDOCE pre-processed: conceptual expansion
10 Luk's approach [Luk 1995]
Noun "sentence" and its conceptual expansion [Luk 1995]
11 Luk's approach [Luk 1995] cont.
- Collect co-occurrence data of defining concepts by constructing a two-dimensional Concept Co-occurrence Data Table (CCDT)
- Brown corpus divided into sentences
- collect conceptual co-occurrence data for each defining concept which occurs in the sentence
- Insert the collected data into the Concept Co-occurrence Data Table.
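The table construction above might look roughly like this. It is a sketch under stated assumptions: `build_ccdt` and the word-to-concept mapping `concepts_of` are hypothetical names, the mapping stands in for the LDOCE conceptual expansion, and only raw sentence-level pair counts are kept (the slides do not give the exact statistics stored in the table).

```python
from collections import Counter
from itertools import combinations


def build_ccdt(sentences, concepts_of):
    """Count how often two defining concepts occur in the same sentence.

    Keys are alphabetically ordered pairs, so the table is stored as
    the upper triangle of a symmetric two-dimensional table.
    """
    table = Counter()
    for sentence in sentences:
        # Expand each word into its defining concepts (assumed mapping).
        concepts = set()
        for word in sentence:
            concepts.update(concepts_of.get(word, ()))
        for a, b in combinations(sorted(concepts), 2):
            table[(a, b)] += 1
    return table


# Toy conceptual expansion and corpus, invented for illustration.
concepts_of = {
    "bank": {"money", "place"},
    "loan": {"money", "lend"},
    "river": {"water"},
}
sentences = [["bank", "loan"], ["river", "bank"]]
ccdt = build_ccdt(sentences, concepts_of)
print(ccdt[("money", "place")])
```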
12 Luk's approach [Luk 1995] cont.
- Score each sense S with respect to context C [Luk 1995]
13 Luk's approach [Luk 1995] cont.
- Select the sense with the highest score
- Accuracy: 77%
- Human accuracy: 71%
14 Approaches using Roget's Thesaurus [Yarowsky 1992]
- Resources used
- Roget's Thesaurus
- Grolier Multimedia Encyclopedia
- Senses of a word: categories in Roget's Thesaurus
- 1042 broad categories covering areas like tools/machinery or animals/insects
15 Approaches using Roget's Thesaurus [Yarowsky 1992] cont.
- tool, implement, appliance, contraption,
apparatus, utensil, device, gadget, craft,
machine, engine, motor, dynamo, generator, mill,
lathe, equipment, gear, tackle, tackling,
rigging, harness, trappings, fittings,
accoutrements, paraphernalia, equipage, outfit,
appointments, furniture, material, plant,
appurtenances, a wheel, jack, clockwork,
wheel-work, spring, screw,
Some words placed into the tools/machinery category [Yarowsky 1992]
16 Approaches using Roget's Thesaurus [Yarowsky 1992] cont.
- Collect context for each category
- From Grolier Encyclopedia
- each occurrence of each member of the category
- extracts 100 surrounding words
Sample occurrence of words in the tools/machinery category [Yarowsky 1992]
17 Approaches using Roget's Thesaurus [Yarowsky 1992] cont.
- Identify and weight salient words
Sample salient words for Roget categories 348 and 414 [Yarowsky 1992]
- To disambiguate a word: sum the weights of all salient words appearing in its context
- Accuracy: 92% when disambiguating 12 words
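The weight-summing step above can be illustrated with a short sketch. The function name `best_category` and all the weights below are invented for the example; in the method the weights would be estimated from the encyclopedia contexts collected for each category.

```python
def best_category(context_words, salient_weights):
    """Sum the salient-word weights of each category over the context
    and return the highest-scoring category with all scores."""
    scores = {
        category: sum(weights.get(word, 0.0) for word in context_words)
        for category, weights in salient_weights.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical weights for two Roget categories (illustrative values).
salient_weights = {
    "tools/machinery": {"lift": 2.0, "hoist": 3.0, "water": 0.5},
    "animals/insects": {"species": 2.5, "nest": 3.0, "water": 1.0},
}
# Context of the ambiguous word "crane".
context = ["the", "crane", "was", "used", "to", "hoist", "and", "lift", "water"]
category, scores = best_category(context, salient_weights)
print(category, scores)
```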
18 Introduction to WordNet (1)
- Online thesaurus system
- Synsets: synonymous words
- Hierarchical relationships
19 Introduction to WordNet (2)
[Sanderson 2000]
20 Voorhees' Disambiguation Experiment
- Calculation of semantic distance: synset and context words
- Word sense: synset closest to context words
- Retrieval result: worse than non-disambiguated
21 Gonzalo's IR experiment (1)
- Two questions
- Can WordNet really offer any potential for text retrieval?
- How is text retrieval performance affected by disambiguation errors?
22 Gonzalo's IR experiment (2)
- Text collection: summary and document
- Experiments
- 1. Standard SMART run
- 2. Indexed in terms of word senses
- 3. Indexed in terms of synsets
- 4. Introduction of disambiguation errors
23 Gonzalo's IR experiment (3)
- Experiments: % correct document retrieved
- Indexing by synsets: 62.0
- Indexing by word senses: 53.2
- Indexing by words: 48.0
- Indexing by synsets (5% errors): 62.0
- Id. with 10% errors: 60.8
- Id. with 20% errors: 56.1
- Id. with 30% errors: 54.4
- Id. with all possible: 52.6
- Id. with 60% errors: 49.1
24 Gonzalo's IR experiment (4)
- Disambiguation with WordNet can improve text retrieval
- Solution lies in a reliable automatic WSD technique
25 Disambiguation With Unsupervised Learning
- Yarowsky's Unsupervised Method
- One Sense Per Collocation
- e.g. plant (manufacturing/life)
- One Sense Per Discourse
- e.g. defense (war/sports)
26 Yarowsky's Unsupervised Method cont.
- Algorithm Details
- Step 1: Store the word and its contexts as lines
- e.g. zonal distribution of plant life ..
- Step 2: Identify a few words that represent each word sense
- e.g. plant (manufacturing/life)
- Step 3a: Get rules from the training set
- plant X -> A, weight
- plant Y -> B, weight
- Step 3b: Use the rules created in 3a to classify all occurrences of plant in the sample set.
27 Yarowsky's Unsupervised Method cont.
- Step 3c: Use the one-sense-per-discourse rule to filter or augment this addition
- Step 3d: Repeat steps 3a-b-c iteratively.
- Step 4: The training converges on a stable residual set.
- Step 5: The result is a set of rules, which are used to disambiguate the word plant.
- e.g. plant growth -> life
- plant car -> manufacturing
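The iterative procedure in Steps 1 to 5 can be sketched in a few lines. This is a simplified illustration, not Yarowsky's implementation: it assumes exactly two senses, treats any word in the context as a possible collocate, uses a smoothed log-likelihood ratio as the rule weight, and omits the one-sense-per-discourse filter of Step 3c. The names `bootstrap`, `threshold`, and the toy contexts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bootstrap(contexts, seeds, threshold=1.0, max_iter=10):
    """contexts: list of token lists; seeds: collocate -> sense (two senses)."""
    sense_a, sense_b = sorted(set(seeds.values()))
    # Step 2: label the contexts that contain a seed collocate.
    labels = [next((seeds[w] for w in ctx if w in seeds), None)
              for ctx in contexts]
    rules = {}
    for _ in range(max_iter):
        # Step 3a: learn weighted rules from the currently labelled contexts.
        counts = defaultdict(Counter)
        for ctx, label in zip(contexts, labels):
            if label is not None:
                for w in set(ctx):
                    counts[w][label] += 1
        rules = {}
        for w, c in counts.items():
            llr = math.log((c[sense_a] + 0.1) / (c[sense_b] + 0.1))
            if abs(llr) >= threshold:
                rules[w] = (sense_a if llr > 0 else sense_b, abs(llr))
        # Step 3b: relabel every context with its strongest matching rule.
        new_labels = []
        for ctx in contexts:
            matches = [rules[w] for w in ctx if w in rules]
            best = max(matches, key=lambda r: r[1])[0] if matches else None
            new_labels.append(best)
        # Step 4: stop once the labelling no longer changes.
        if new_labels == labels:
            break
        labels = new_labels
    return rules, labels


contexts = [
    ["plant", "life", "zonal"],
    ["plant", "manufacturing", "car"],
    ["plant", "zonal", "growth"],
    ["plant", "car", "assembly"],
    ["plant", "growth", "species"],
]
seeds = {"life": "life", "manufacturing": "manufacturing"}
rules, labels = bootstrap(contexts, seeds)
print(labels)
```

On this toy data the seed labels spread through shared collocates ("zonal", then "growth"), so the final rule set contains entries corresponding to the two example rules above.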
28 Yarowsky's Unsupervised Method cont.
- Advantages of this method
- Better accuracy compared to other unsupervised methods
- No need for costly hand-tagged training sets (as in supervised methods)
29 Schütze and Pedersen's approach [Schütze 1995]
- Source of word sense definitions
- Not using a dictionary or thesaurus
- Using only the corpus to be disambiguated (Category B TREC-1 collection)
- Thesaurus construction
- Collect a (symmetric) term-term matrix C
- Entry c_ij: number of times that words i and j co-occur in a symmetric window of total size k
- Use SVD to reduce the dimensionality
30 Schütze and Pedersen's approach [Schütze 1995] cont.
- Thesaurus vector: columns
- Semantic similarity: cosine between columns
- Thesaurus: associate each word with its nearest neighbors
- Context vector: sum of the thesaurus vectors of the context words
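A small sketch of the term-term counts, the cosine similarity, and the context vector described above. It is an illustration under assumptions: the vectors are kept as sparse dictionaries, the window is given as a hypothetical `half_window` (positions on each side) rather than a total size k, and the SVD dimensionality reduction is left out.

```python
import math
from collections import defaultdict


def cooccurrence_vectors(tokens, half_window=2):
    """c[i][j] = number of times word j occurs within half_window
    positions on either side of word i (a symmetric matrix)."""
    c = defaultdict(lambda: defaultdict(int))
    for pos, word in enumerate(tokens):
        lo = max(0, pos - half_window)
        hi = min(len(tokens), pos + half_window + 1)
        for other in range(lo, hi):
            if other != pos:
                c[word][tokens[other]] += 1
    return c


def cosine(u, v):
    """Cosine between two sparse vectors stored as dicts."""
    dot = sum(u[t] * v.get(t, 0) for t in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def context_vector(context_words, vectors):
    """Sum the thesaurus vectors of the context words."""
    total = defaultdict(int)
    for word in context_words:
        for term, count in vectors.get(word, {}).items():
            total[term] += count
    return total


tokens = "plant life needs water plant growth needs water".split()
vectors = cooccurrence_vectors(tokens, half_window=1)
similarity = cosine(vectors["life"], vectors["growth"])
ctx = context_vector(["life", "growth"], vectors)
print(similarity, dict(ctx))
```

Here "life" and "growth" never co-occur, yet their vectors are nearly identical because they share the same neighbors, which is the kind of second-order similarity the thesaurus is meant to capture.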
31 Schütze and Pedersen's approach [Schütze 1995] cont.
- Disambiguation algorithm
- Identify context vectors corresponding to all occurrences of a particular word
- Partition them into regions of high density
- Tag a sense for each such region
- Disambiguating a word
- Compute context vector of its occurrence
- Find the closest centroid of a region
- Assign the occurrence the sense of that centroid
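The final assignment step can be sketched as a nearest-centroid lookup. The centroids below are invented two-dimensional values and are assumed to come from an earlier clustering of the context vectors, which is not shown; `assign_sense` is a hypothetical name, and cosine is used as the closeness measure to match the similarity used for the thesaurus vectors.

```python
import math


def cosine(u, v):
    """Cosine between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def assign_sense(context_vec, centroids):
    """Return the sense whose region centroid is closest to the context vector."""
    return max(centroids, key=lambda sense: cosine(context_vec, centroids[sense]))


# Hypothetical centroids of two high-density regions for the word "plant".
centroids = {
    "plant/life": [0.9, 0.1],
    "plant/manufacturing": [0.1, 0.9],
}
sense = assign_sense([0.8, 0.3], centroids)
print(sense)
```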
32 Schütze and Pedersen's approach [Schütze 1995] cont.
- Accuracy: 90%
- Application to IR
- replacing the words by word senses
- sense-based retrieval: average precision over 11 points of recall increased 4% with respect to word-based retrieval
- Combine the ranking for each document
- average precision increased 11%
- Each occurrence is assigned n (2, 3, 4, 5) senses
- average precision increased 14% for n = 3
33 Schütze and Pedersen's approach [Schütze 1995] cont.
34 Conclusion
- How much can WSD help improve IR effectiveness? Open question
- Weiss: 1%; Voorhees' method: negative
- Krovetz and Croft, Sanderson: only useful for short queries
- Schütze and Pedersen's approach and Gonzalo's experiment: positive results
- WSD must be accurate to be useful for IR
- Schütze and Pedersen's and Yarowsky's algorithms: promising for IR
- Luk's approach: robust to data sparseness, suitable for small corpora.
35 References
- [Krovetz 92] R. Krovetz, W.B. Croft (1992). Lexical Ambiguity and Information Retrieval, in ACM Transactions on Information Systems, 10(1).
- [Gonzalo 1998] J. Gonzalo, F. Verdejo, I. Chugur and J. Cigarran, Indexing with WordNet synsets can improve Text Retrieval, Proceedings of the COLING/ACL 98 Workshop on Usage of WordNet for NLP, Montreal, 1998
- [Lesk 1988] M. Lesk, They said true things, but called them by wrong names: vocabulary problems in retrieval systems, in Proc. 4th Annual Conference of the University of Waterloo Centre for the New OED, 1988
- [Luk 1995] A.K. Luk. Statistical sense disambiguation with relatively small corpora using dictionary definitions. In Proceedings of the 33rd Annual Meeting of the ACL, Columbus, Ohio, June 1995. Association for Computational Linguistics.
- [Salton 83] G. Salton, M.J. McGill (1983). Introduction to Modern Information Retrieval. The SMART and SIRE experimental retrieval systems, New York: McGraw-Hill
- [Sanderson 1997] Sanderson, M. Word Sense Disambiguation and Information Retrieval, PhD Thesis, Technical Report (TR-1997-7) of the Department of Computing Science at the University of Glasgow, Glasgow G12 8QQ, UK.
- [Sanderson 2000] Sanderson, Mark, Retrieving with Good Sense, http://citeseer.nj.nec.com/sanderson00retrieving.html, 2000
36 References cont.
- [Schütze 1995] H. Schütze, J.O. Pedersen. Information retrieval based on word senses, in Proceedings of the Symposium on Document Analysis and Information Retrieval, 4: 161-175.
- [Small 1982] S. Small, C. Rieger, Parsing and comprehending with word experts (a theory and its realisation), in Strategies for Natural Language Processing, W.G. Lehnert, M.H. Ringle, Eds., LEA: 89-148, 1982
- [Voorhees 1993] E. M. Voorhees, Using WordNet to disambiguate word sense for text retrieval, in Proceedings of ACM SIGIR Conference, (16): 171-180, 1993
- [Weiss 73] S.F. Weiss (1973). Learning to disambiguate, in Information Storage and Retrieval, 9: 33-41, 1973
- [Wilks 1990] Y. Wilks, D. Fass, C. Guo, J.E. Mcdonald, T. Plate, B.M. Slator (1990). Providing Machine Tractable Dictionary Tools, in Machine Translation, 5: 99-154, 1990
- [Yarowsky 1992] D. Yarowsky, Word sense disambiguation using statistical models of Roget's categories trained on large corpora, in Proceedings of COLING Conference: 454-460, 1992
- [Yarowsky 1994] Yarowsky, D. Decision lists for lexical ambiguity resolution: Application to Accent Restoration in Spanish and French. In Proceedings of the 32nd Annual Meeting of the Association for Computational Linguistics, Las Cruces, NM, 1994
- [Yarowsky 1995] Yarowsky, D. Unsupervised word sense disambiguation rivaling supervised methods. In Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics, pages 189-196, Cambridge, MA, 1995