Title: Evaluation of Utility of LSA for Word Sense Discrimination
1. Evaluation of Utility of LSA for Word Sense Discrimination
- Esther Levin, Mehrbod Sharifi, Jerry Ball
- http://www-cs.ccny.cuny.edu/esther/research/lsa/
2. Outline
- Latent Semantic Analysis (LSA)
- Word sense discrimination through the Context Group Discrimination Paradigm
- Experiments
  - Sense-based clusters (supervised learning)
  - K-means clustering (unsupervised learning)
- Homonyms vs. polysemes
- Conclusions
3. Latent Semantic Analysis (LSA) [Deerwester 90]
- Represents words and passages as vectors in the same (low-dimensional) semantic space
- Similarity in word meaning is defined by similarity of their contexts
4. LSA Steps
- Build the document-term co-occurrence matrix
  - e.g., 1151 documents × 5793 terms
- Compute the SVD
- Reduce dimensionality by keeping the k largest singular values
- Compute the new vector representations for the documents
- Our research: clustering of the new context vectors
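The steps above can be sketched with a plain SVD. The matrix below is a toy stand-in for illustration; the paper's actual corpus is far larger (roughly 1151 documents × 5793 terms).

```python
import numpy as np

# Hypothetical toy document-term matrix (documents x terms),
# standing in for the paper's ~1151 x 5793 co-occurrence matrix.
X = np.array([
    [2., 0., 1., 0.],
    [0., 3., 0., 1.],
    [1., 0., 2., 0.],
    [0., 1., 0., 2.],
    [1., 1., 1., 1.],
])

# Step 2: compute the SVD of the co-occurrence matrix.
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Step 3: keep only the k largest singular values.
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Step 4: new k-dimensional representations for the documents (rows);
# scaling the left singular vectors is equivalent to projecting X onto Vt_k.
doc_vectors = U_k * s_k

print(doc_vectors.shape)  # (5, 2): 5 documents in a 2-d semantic space
```

These `doc_vectors` are the context vectors that the rest of the deck clusters.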
5. Context Group Discrimination Paradigm [Schütze 98]
- Inducing the senses of an ambiguous word from the contextual similarity of its occurrences
- (Figure: context vectors of an ambiguous word)
6. Context Group Discrimination Paradigm [Schütze 98]
1. Cluster the context vectors
2. Compute the centroids (sense vectors)
(Figure: a < b)
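The two steps can be sketched as plain k-means over context vectors. The synthetic 2-d "contexts" below are illustrative stand-ins for LSA document vectors, with two artificial senses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical LSA context vectors for one ambiguous word:
# two synthetic "senses" around different points in a 2-d space.
sense_a = rng.normal(loc=[3.0, 0.0], scale=0.3, size=(50, 2))
sense_b = rng.normal(loc=[0.0, 3.0], scale=0.3, size=(50, 2))
contexts = np.vstack([sense_a, sense_b])

# Step 1: cluster the context vectors (k-means, k = 2 senses),
# initialised with the two mutually farthest context vectors.
k = 2
D = ((contexts[:, None] - contexts[None, :]) ** 2).sum(-1)
i, j = np.unravel_index(D.argmax(), D.shape)
centroids = contexts[[i, j]]
for _ in range(20):
    # assign each context to its nearest centroid (squared Euclidean)
    d = ((contexts[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    # Step 2: recompute the centroids -- these are the "sense vectors"
    centroids = np.array([contexts[labels == c].mean(axis=0)
                          for c in range(k)])

# A new occurrence is disambiguated by its nearest sense vector.
new_context = np.array([2.8, 0.2])
sense = ((new_context - centroids) ** 2).sum(-1).argmin()
```

The farthest-pair initialisation is a convenience for this sketch; any standard k-means initialisation would do.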
7. Experiments
8. Experimental Setup
- Corpus: [Leacock 93]
  - Line (3 senses, 1151 instances)
  - Hard (2 senses, 752 instances)
  - Serve (2 senses, 1292 instances)
  - Interest (3 senses, 2113 instances)
- Context size: full document (a small paragraph)
- Number of clusters: number of senses
9. Research Objective
- How well are the different senses of ambiguous words separated in the LSA-based vector space?
- Parameters:
  - Dimensionality of the LSA representation
  - Distance measure:
    - L1 (city block)
    - L2 (squared Euclidean)
    - Cosine
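The three distance measures can be written out directly; the small vectors below are illustrative. Note that cosine, unlike L1 and L2, ignores vector length and compares direction only.

```python
import numpy as np

def l1(u, v):
    """L1 (city block) distance."""
    return np.abs(u - v).sum()

def l2(u, v):
    """L2 as squared Euclidean distance."""
    return ((u - v) ** 2).sum()

def cosine_dist(u, v):
    """1 - cosine similarity."""
    return 1.0 - (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 2.0, 0.0])
v = np.array([2.0, 4.0, 0.0])  # same direction, twice the length

print(l1(u, v))           # 3.0
print(l2(u, v))           # 5.0
print(cosine_dist(u, v))  # ~0.0: cosine ignores vector length
```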
10. Sense-based Clusters
- An instance of supervised learning
- An upper bound on the unsupervised performance of k-means or EM
- Not influenced by the choice of clustering algorithm
11. Sense-based Clusters: Accuracy
- Training: compute sense vectors from 90% of the data
- Testing: assign the remaining 10% of the data to the closest sense vector, and evaluate by comparing this assignment to the sense tags
- Random selection, cross-validation
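A sketch of this supervised procedure on synthetic labeled vectors. The centroid-per-sense training and nearest-sense-vector assignment follow the slide; the data and the single 90/10 split are illustrative stand-ins for the paper's cross-validation runs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical labeled data: LSA context vectors with gold sense tags.
X = np.vstack([rng.normal([3, 0], 0.5, (100, 2)),
               rng.normal([0, 3], 0.5, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Random 90/10 split (the paper averages over cross-validation runs).
idx = rng.permutation(len(X))
cut = int(0.9 * len(X))
train, test = idx[:cut], idx[cut:]

# Training: the sense vector is the centroid of each sense's contexts.
sense_vectors = np.array([X[train][y[train] == s].mean(axis=0)
                          for s in (0, 1)])

# Testing: assign each held-out context to its closest sense vector
# (squared Euclidean here) and compare against the gold sense tags.
d = ((X[test][:, None, :] - sense_vectors[None, :, :]) ** 2).sum(-1)
pred = d.argmin(axis=1)
accuracy = (pred == y[test]).mean()
```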
12. Evaluating Clustering Quality: Tightness and Separation
- Dispersion: intra-cluster (what k-means minimizes)
- Silhouette: combines intra- and inter-cluster distances
  - a(i): average distance of point i to all other points in the same cluster
  - b(i): average distance of point i to the points in the closest other cluster
  - s(i) = (b(i) − a(i)) / max(a(i), b(i))
13. More on Silhouette Value
(Figure: a(i) is the average of all blue lines, b(i) the average of all yellow lines)
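The silhouette value can be computed directly from these definitions, s(i) = (b(i) − a(i)) / max(a(i), b(i)), averaged over all points. The two synthetic data sets below are illustrative: one tight and well-separated, one heavily overlapping.

```python
import numpy as np

def silhouette(X, labels):
    """Mean silhouette value over all points (Euclidean distance)."""
    n = len(X)
    # pairwise Euclidean distances
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    s = np.empty(n)
    for i in range(n):
        same = labels == labels[i]
        same[i] = False
        a = D[i][same].mean()                    # a(i): own cluster
        b = min(D[i][labels == c].mean()         # b(i): closest other cluster
                for c in set(labels.tolist()) if c != labels[i])
        s[i] = (b - a) / max(a, b)
    return s.mean()

rng = np.random.default_rng(0)
tight = np.vstack([rng.normal([0, 0], 0.2, (30, 2)),
                   rng.normal([5, 5], 0.2, (30, 2))])
loose = np.vstack([rng.normal([0, 0], 2.0, (30, 2)),
                   rng.normal([1, 1], 2.0, (30, 2))])
labels = np.array([0] * 30 + [1] * 30)

print(silhouette(tight, labels))  # close to 1: tight, well-separated
print(silhouette(loose, labels))  # near 0: overlapping clusters
```

This illustrates the next slide's numbers: well-separated clusters push the average silhouette toward 1, overlapping clusters toward 0 or below.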
14. Evaluating Clustering Quality: Tightness and Separation
- Average silhouette value (two example clusterings from the figure):
  - Cosine 0.9639, L1 0.7355, L2 0.9271
  - Cosine -0.0876, L1 -0.0504, L2 -0.0879
15. Sense-based Clusters: Discrimination Accuracy
- Baseline: percentage of the majority sense
16. Sense-based Clusters: Average Silhouette Value
17. Sense-based Clusters: Results
- Good discrimination accuracy
- Low silhouette value
- How is that possible?
18. Unsupervised Learning with K-means
19. Unsupervised Learning with K-means
20. Polysemes vs. Homonyms
- Polysemes: words with multiple related meanings
- Homonyms: words with the same spelling but completely different meanings
21. Pseudo-words as Homonyms [Schütze 98]
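A pseudo-word is built by conflating two unambiguous words into one artificial homonym, keeping each occurrence's original word as its gold sense tag. The tiny corpus and the pair "banana"/"door" below are invented for illustration.

```python
# Hypothetical corpus lines mentioning two unambiguous words.
corpus = [
    "the banana was ripe and sweet",
    "a door slammed shut in the wind",
    "she peeled the banana slowly",
    "he painted the door bright red",
]

# Replace both words with a single artificial homonym, "banana_door";
# the replaced word serves as the gold sense tag for evaluation.
pseudo, gold = [], []
for line in corpus:
    for w in ("banana", "door"):
        if w in line.split():
            pseudo.append(line.replace(w, "banana_door"))
            gold.append(w)

print(pseudo[0])  # "the banana_door was ripe and sweet"
```

Because the gold tags are known by construction, pseudo-words give labeled homonym data without manual sense annotation.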
22. Polysemes vs. Homonyms in LSA Space
- The correlation between cluster compactness and discrimination accuracy is higher for homonyms than for polysemes
23. Conclusions
- Good unsupervised sense discrimination performance for homonyms
- Major deterioration in sense discrimination of polysemes in the absence of supervision
- The benefit of dimensionality reduction is computational only (no peak in performance at lower dimensions)
- The cosine measure performs better than L1 and L2