Title: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms
Slide 1: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms
Mamoru Komachi, Nara Institute of Science and Technology
Slide 2: Supervised learning has succeeded in many natural language processing tasks
(Diagram: annotated corpora are used to train a classifier or to build a dictionary)
- But it needs time-consuming annotation (data creation)
- Why not learn from minimally annotated resources?
Slide 3: Corpus-based extraction of a semantic category
Input (seed instances): Hong Kong, Singapore
Patterns (extracted from corpus): "Visa for __", "History of __" (via occurrences such as "Visa for Singapore", "Travel guide to Singapore")
Output (new instances): Australia, Hong Kong, China, Egypt
The pattern and instance steps alternate, step by step.
Slide 4: Semantic drift is the central problem of bootstrapping
Input (seed instances): Singapore, Australia
Generic pattern extracted from corpus: "visa __ is" (generic patterns = patterns co-occurring with many irrelevant instances)
New instance extracted: card — the semantic category has changed!
From "card", the pattern "greeting __" then extracts: messages, card, words, ...
Errors propagate to successive iterations.
Slide 5: Two major problems addressed by this work
- Why does semantic drift occur?
- Is there any way to prevent semantic drift?
Slide 6: Answers to the problems of semantic drift
- Suggest a parallel between semantic drift in Espresso-style bootstrapping (Pantel and Pennacchiotti, 2006) and topic drift in HITS (Kleinberg, 1999)
- Solve semantic drift using a relatedness measure (the regularized Laplacian) instead of the importance measure (HITS authority) used in the link analysis community
Slide 7: Table of contents
2. Overview of Bootstrapping Algorithms
3. Espresso-style Bootstrapping Algorithms
4. Graph-based Analysis of Espresso-style Bootstrapping Algorithms
5. Word Sense Disambiguation
6. Bilingual Dictionary Construction
7. Learning Semantic Categories
Slide 8: Preliminaries: Espresso and HITS
Slide 9: Espresso Algorithm (Pantel and Pennacchiotti, 2006)
- Repeat
- Pattern extraction
- Pattern ranking
- Pattern selection
- Instance extraction
- Instance ranking
- Instance selection
- Until a stopping criterion is met
Slide 10: Pattern/instance ranking is mutually defined in Espresso
Reliable instances are supported by reliable patterns, and vice versa:
  r_π(p) = (1/|I|) Σ_{i∈I} (pmi(i,p) / max_pmi) · r_ι(i)
  r_ι(i) = (1/|P|) Σ_{p∈P} (pmi(i,p) / max_pmi) · r_π(p)
where p is a pattern, i an instance, P the set of patterns, I the set of instances, pmi the pointwise mutual information, and max_pmi the maximum of pmi over all patterns and instances.
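The mutually recursive ranking above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' code; `W` stands for the normalized pmi table pmi(i,p)/max_pmi.

```python
def espresso_rank(W, r_i, iterations=1):
    """One round (or more) of Espresso's mutually recursive ranking.

    W[p][i] holds pmi(i, p) / max_pmi (normalized pointwise mutual
    information); r_i is the current instance reliability vector.
    Returns the pattern and instance reliability vectors.
    """
    P, I = len(W), len(W[0])
    for _ in range(iterations):
        # Pattern reliability: pmi-weighted average of the
        # reliabilities of the instances the pattern extracts.
        r_p = [sum(W[p][i] * r_i[i] for i in range(I)) / I
               for p in range(P)]
        # Instance reliability: the symmetric definition over patterns.
        r_i = [sum(W[p][i] * r_p[p] for p in range(P)) / P
               for i in range(I)]
    return r_p, r_i
```

Seed instances get reliability 1 and everything else 0 in the initial `r_i`; each round then propagates reliability across the pattern-instance graph.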
Slide 11: HITS (Hypertext Induced Topic Search) finds hubs and authorities in a link graph
- Hub score of h: sum of the authority scores of all nodes pointed to by h
- Authority score of a: sum of the hub scores of all nodes pointing to a
Slide 12: HITS Algorithm (Kleinberg, 1999)
- Input
  - Initial hub score vector h
  - Adjacency matrix A
- Main loop
  - Repeat
    - a ← α Aᵀ h (authority update)
    - h ← β A a (hub update)
  - Until a and h converge
- Output
  - Hub and authority score vectors h and a
α, β: normalization factors
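The loop above can be sketched as plain power iteration (an illustrative sketch, not Kleinberg's reference implementation):

```python
def hits(A, h, steps=50):
    """HITS by power iteration.

    A[i][j] = 1 if hub i links to authority j; h is the initial
    hub score vector.  Returns (authority scores, hub scores).
    """
    n, m = len(A), len(A[0])
    a = [0.0] * m
    for _ in range(steps):
        # Authority update: a = alpha * A^T h (sum over incoming hubs).
        a = [sum(A[i][j] * h[i] for i in range(n)) for j in range(m)]
        # Hub update: h = beta * A a (sum over linked authorities).
        h = [sum(A[i][j] * a[j] for j in range(m)) for i in range(n)]
        # alpha and beta are simply L2 normalization factors here.
        a_norm = sum(x * x for x in a) ** 0.5
        h_norm = sum(x * x for x in h) ** 0.5
        a = [x / a_norm for x in a]
        h = [x / h_norm for x in h]
    return a, h
```

In a two-hub toy graph where both hubs link to node 1 but only hub 0 also links to node 0, node 1 ends up with the higher authority score and hub 0 with the higher hub score.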
Slide 13: HITS converges to fixed points regardless of the initial input
- Authority score vector a(k): vector a at the k-th iteration
- Hub score vector h(k): vector h at the k-th iteration
HITS authority vector a = the principal eigenvector of AᵀA
HITS hub vector h = the principal eigenvector of AAᵀ
where AᵀA is the co-citation matrix and AAᵀ the bibliographic coupling matrix.
Slide 14: Graph-based Analysis of Espresso-style Bootstrapping Algorithms
- How Espresso works,
- and how Espresso fails to solve semantic drift
Slide 15: Make Espresso look like HITS
- p: pattern score vector
- i: instance score vector
- A: pattern-instance matrix
|P|: number of patterns; |I|: number of instances; normalization factors keep the score vectors from growing too large.
Slide 16: Espresso uses the pattern-instance matrix A as the adjacency matrix in HITS
- A is a |P|×|I|-dimensional matrix holding the normalized pointwise mutual information (pmi) between patterns and instances:
  A_{p,i} = pmi(p,i) / max_{p,i} pmi(p,i)
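As a concrete sketch, the matrix can be computed from raw pattern-instance co-occurrence counts. The details below are assumptions for illustration (pmi estimated from counts, negative pmi values kept rather than clipped), not a specification from the paper:

```python
from math import log

def pmi_matrix(counts):
    """Build the pattern-instance matrix A from co-occurrence counts.

    counts[p][i] is how often pattern p matched instance i.
    A[p][i] = pmi(p, i) / max pmi, with pmi left at 0 for pairs
    that never co-occur.
    """
    total = sum(sum(row) for row in counts)
    pattern_total = [sum(row) for row in counts]
    instance_total = [sum(row[i] for row in counts)
                      for i in range(len(counts[0]))]
    A = [[0.0] * len(counts[0]) for _ in counts]
    for p, row in enumerate(counts):
        for i, c in enumerate(row):
            if c > 0:
                # pmi(p, i) = log P(p, i) / (P(p) * P(i))
                A[p][i] = log(c * total /
                              (pattern_total[p] * instance_total[i]))
    max_pmi = max(max(row) for row in A)
    if max_pmi > 0:
        A = [[v / max_pmi for v in row] for row in A]
    return A
```

Dividing by the maximum pmi keeps every entry in a bounded range, which is what lets A play the role of a weighted adjacency matrix in the HITS-style iteration.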
Slide 17: Three simplifications reduce Espresso to HITS
For graph-theoretic analysis, we introduce three simplifications to Espresso:
- Repeat
- Pattern extraction
- Pattern ranking
- Pattern selection
- Instance extraction
- Instance ranking
- Instance selection
- Until a stopping criterion is met
Slide 18: Keep the pattern-instance matrix constant in the main loop
- Compute the pattern-instance matrix
- Repeat
- Pattern extraction
- Pattern ranking
- Pattern selection
- Instance extraction
- Instance ranking
- Instance selection
- Until a stopping criterion is met
Simplification 1: Remove the pattern/instance extraction steps. Instead, pre-compute all patterns and instances once at the beginning of the algorithm.
Slide 19: Remove the pattern/instance selection heuristics
- Compute the pattern-instance matrix
- Repeat
- Pattern ranking
- Pattern selection
- Instance ranking
- Instance selection
- Until a stopping criterion is met
Simplification 2: Remove the pattern/instance selection steps, which retain only the highest-scoring k patterns / m instances for the next iteration (i.e., reset the scores of all other items to 0). Instead, retain the scores of all patterns and instances.
Slide 20: Remove the early stopping heuristics
- Compute the pattern-instance matrix
- Repeat
- Pattern ranking
- Instance ranking
- Until score vectors p and i converge
Simplification 3: No early stopping, i.e., run until convergence.
Slide 21: Simplified Espresso
- Input
  - Initial score vector of seed instances i
  - Pattern-instance co-occurrence matrix A
- Main loop
  - Repeat
    - p ← A i / |I| (pattern ranking)
    - i ← Aᵀ p / |P| (instance ranking)
  - Until i and p converge
- Output
  - Instance and pattern score vectors i and p
Slide 22: HITS Algorithm (Kleinberg, 1999)
- Input
  - Initial hub score vector h
  - Adjacency matrix A
- Main loop
  - Repeat
    - a ← α Aᵀ h (authority update)
    - h ← β A a (hub update)
  - Until a and h converge
- Output
  - Hub and authority score vectors h and a
α, β: normalization factors
Slide 23: Simplified Espresso is essentially HITS
- Simplified Espresso = HITS
- Problem
  - No matter which seed you start with, the same instance is always ranked topmost
  - This is semantic drift (called topic drift in HITS)
The ranking vector i tends to the principal eigenvector of AᵀA as the iteration proceeds, regardless of the seed instances!
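This seed-independence is easy to reproduce numerically. The sketch below uses an illustrative toy graph (not data from the paper) and plain L2 normalization; two disjoint seeds converge to the identical instance ranking:

```python
def simplified_espresso(A, i, steps=100):
    """Simplified Espresso: iterate p = A i, i = A^T p with L2
    normalization, starting from the seed instance score vector i."""
    n_p, n_i = len(A), len(A[0])
    for _ in range(steps):
        p = [sum(A[r][c] * i[c] for c in range(n_i)) for r in range(n_p)]
        i = [sum(A[r][c] * p[r] for r in range(n_p)) for c in range(n_i)]
        norm = sum(x * x for x in i) ** 0.5
        i = [x / norm for x in i]
    return i

# Toy pattern-instance graph: instance 1 co-occurs with both patterns.
A = [[1.0, 0.8, 0.0],
     [0.0, 0.7, 1.0]]
# Two completely different seeds...
i_a = simplified_espresso(A, [1.0, 0.0, 0.0])
i_b = simplified_espresso(A, [0.0, 0.0, 1.0])
# ...yield the same ranking: the principal eigenvector of A^T A.
ranking_a = sorted(range(3), key=lambda j: -i_a[j])
ranking_b = sorted(range(3), key=lambda j: -i_b[j])
```

Instance 1, the hub-like instance connected to both patterns, is ranked topmost from either seed, exactly the drift toward the "most frequent" item described above.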
Slide 24: How about full Espresso?
- Espresso has two heuristics not present in Simplified Espresso
  - Early stopping
  - Pattern and instance selection
- Do these heuristics really help reduce semantic drift? And how?
Slide 25: Experiments on semantic drift
- Do the heuristics in the original Espresso help reduce drift?
Slide 26: Word sense disambiguation task of Senseval-3 English Lexical Sample
- Predict the sense of "bank"
Training instances are annotated with their sense:
  "the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ) , bring this up to"
  "In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden"
Predict the sense of the target word in the test set:
  "Possibly aligned to water a sort of bank (???) by a rushing river."
Slide 27: Word sense disambiguation by Espresso
- Seed instance: the instance whose sense is to be predicted
- Proximity measure: the instance score vector given by Espresso
Slide 28: Example of k-NN classification by Espresso
- System output: k-nearest neighbors (k = 3)
- i = (0.9, 0.1, 0.8, 0.5, 0, 0, 0.95, 0.3, 0.2, 0.4) → sense A
(Diagram: the seed instance is linked to labeled training instances such as:
  "the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ) , bring this up to"
  "In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden")
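The k-NN step can be sketched as follows. Only the score vector comes from the slide; the sense labels here are hypothetical placeholders:

```python
from collections import Counter

def knn_sense(scores, senses, k=3):
    """Predict the seed's sense by majority vote among the k labeled
    training instances with the highest Espresso instance scores."""
    top = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
    return Counter(senses[j] for j in top).most_common(1)[0][0]

# Instance score vector from the slide; hypothetical sense labels.
scores = [0.9, 0.1, 0.8, 0.5, 0, 0, 0.95, 0.3, 0.2, 0.4]
senses = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]
```

With k = 3 the nearest neighbors are the instances scored 0.95, 0.9, and 0.8, all labeled A, so the system outputs sense A.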
Slide 29: Two heuristics in Espresso
- Early stopping
  - Plot results at each iteration
- Pattern and instance selection
  - Number of patterns to retain: p = 20 (increase p by 1 on each iteration)
  - Number of instances to retain: m = 100 (increase m by 100 on each iteration)
- Evaluation metric
  - Recall = # correct instances / # total true instances
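The selection heuristic amounts to a top-k cutoff per iteration. A minimal sketch: the growth schedule (p = 20 + t, m = 100 + 100t) is from the slide, while the functions themselves are illustrative:

```python
def select_top(scores, k):
    """Espresso's selection step: keep the k highest-scoring items
    and reset every other score to 0."""
    keep = set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])
    return [s if j in keep else 0.0 for j, s in enumerate(scores)]

def retained_counts(iteration):
    """Patterns and instances retained at a given (0-based) iteration,
    following the schedule on this slide."""
    return 20 + iteration, 100 + 100 * iteration
```

Resetting the other scores to 0 is exactly the step that Simplification 2 removed; it is one of the two knobs that keep full Espresso from drifting as fast as the simplified version.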
Slide 30: Convergence process of Espresso
The heuristics in Espresso help reduce semantic drift (however, early stopping is required for optimal performance).
(Plot: recall per iteration for Original Espresso, Simplified Espresso, and the most-frequent-sense baseline. With Simplified Espresso, semantic drift occurs: it outputs the most frequent sense regardless of the input.)
Slide 31: Learning curve of Espresso: per-sense breakdown
(Plot: recall per iteration for the most frequent sense vs. the other senses. The share of most-frequent-sense predictions increases over iterations; recall for infrequent senses worsens even with the original Espresso.)
Slide 32: Summary: Espresso and semantic drift
- Semantic drift happens because
  - Espresso is designed like HITS, and
  - HITS gives the same ranking list regardless of seeds
- Some heuristics reduce semantic drift
  - Early stopping is crucial for optimal performance
- Still, these heuristics require many parameters to be calibrated, and calibration is difficult
Slide 33: Main contributions of this work
- Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999)
- Solve semantic drift by graph kernels used in the link analysis community
Slide 34: Q. What caused drift in Espresso?
- A. Espresso's resemblance to HITS
  - HITS is an importance computation method (it gives a single ranking list for any seeds)
- Why not use another type of link analysis measure, one that takes the seeds into account?
  - A "relatedness" measure (it gives different rankings for different seeds)
Slide 35: The regularized Laplacian kernel
- A relatedness measure
- Has only one parameter
Normalized graph Laplacian:
  L = D^{-1/2} (D - A) D^{-1/2}
where A is the adjacency matrix of the graph and D the (diagonal) degree matrix.
Regularized Laplacian matrix:
  R_β = Σ_{n=0}^∞ β^n (-L)^n = (I + βL)^{-1}
where β is a parameter. Each column of R_β gives the rankings relative to a node.
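A minimal sketch of the kernel via its truncated power series (an assumption-laden illustration, not the paper's implementation: it assumes every node has at least one edge, and the series converges when β is small relative to the Laplacian's spectral norm, as with the β = 10⁻² used later):

```python
def regularized_laplacian(A, beta=0.01, terms=50):
    """Regularized Laplacian kernel R = sum_n beta^n (-L)^n, where
    L = D^{-1/2} (D - A) D^{-1/2} is the normalized graph Laplacian.
    Column j of R ranks every node by relatedness to node j.
    Assumes every node has degree >= 1."""
    n = len(A)
    deg = [sum(row) for row in A]
    # Normalized Laplacian: L[i][j] = delta_ij - A[i][j] / sqrt(d_i d_j).
    L = [[(1.0 if i == j else 0.0) - A[i][j] / (deg[i] * deg[j]) ** 0.5
          for j in range(n)] for i in range(n)]
    # Truncated Neumann series: R = I + (-beta L) + (-beta L)^2 + ...
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in R]
    for _ in range(terms):
        term = [[-beta * sum(term[i][k] * L[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        R = [[R[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return R
```

On a three-node path graph, node 0 is most related to itself, then to its neighbor (node 1), then to node 2, which it only reaches through node 1. This seed-relative behavior is exactly what a HITS-style importance score lacks.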
Slide 36: Word Sense Disambiguation
- Evaluation of the regularized Laplacian against Espresso and other graph-based algorithms
Slide 37: Label prediction of "bank" (Recall)
Algorithm                        | Most frequent sense | Other senses
Simplified Espresso              | 100.0               | 0.0
Espresso (after convergence)     | 100.0               | 30.2
Espresso (optimal stopping)      | 94.4                | 67.4
Regularized Laplacian (β = 10⁻²) | 92.1                | 62.8
Espresso suffers from semantic drift (unless stopped at the optimal stage); the regularized Laplacian keeps high recall for infrequent senses.
Slide 38: WSD on all nouns in Senseval-3
Algorithm                        | Recall
Most frequent sense (baseline)   | 54.5
HyperLex (Agirre et al., 2005)   | 64.6
PageRank (Agirre et al., 2005)   | 64.6
Simplified Espresso              | 44.1
Espresso (after convergence)     | 46.9
Espresso (optimal stopping)      | 66.5
Regularized Laplacian (β = 10⁻²) | 67.1
Espresso needs optimal stopping to achieve equivalent performance; the regularized Laplacian outperforms the other graph-based methods.
Slide 39: The regularized Laplacian is stable across values of its parameter β
Slide 40: Conclusions
- Semantic drift in Espresso is a parallel form of topic drift in HITS
- The regularized Laplacian reduces semantic drift in bootstrapping for natural language processing tasks
  - It is inherently a relatedness measure (≠ importance measure)
Slide 41: Future work
- Investigate whether a similar analysis is applicable to a wider class of bootstrapping algorithms (including co-training)
- Investigate the influence of seed selection on bootstrapping algorithms, and propose a way to select effective seed instances