Title: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms
Slide 1: Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms
Mamoru Komachi, Nara Institute of Science and Technology
Slide 2: Supervised learning has succeeded in many natural language processing tasks
(Diagram: annotated corpora are used to train a classifier or to build a dictionary)
- But it needs time-consuming annotation (data creation)
- Why not learn from minimally annotated resources?
Slide 3: Corpus-based extraction of a semantic category
Input (seed instances): Hong Kong, Singapore
Patterns (extracted from corpus): "Visa for __", "History of __" (via occurrences such as "Visa for Singapore", "Travel guide to Singapore")
Output (new instances): Australia, Hong Kong, China, Egypt
The pattern and instance steps alternate, step by step.
Slide 4: Semantic drift is the central problem of bootstrapping
Input (seed instances): Singapore, Australia
Generic pattern extracted from corpus: "visa __ is" (generic patterns = patterns co-occurring with many irrelevant instances)
New instance extracted: card — the semantic category has changed!
From "card", the pattern "greeting __" then extracts: messages, card, words, ...
Errors propagate to successive iterations.
Slide 5: Two major problems addressed by this work
- Why does semantic drift occur?
- Is there any way to prevent semantic drift?
Slide 6: Answers to the problems of semantic drift
- Suggest a parallel between semantic drift in Espresso-style bootstrapping (Pantel and Pennacchiotti, 2006) and topic drift in HITS (Kleinberg, 1999)
- Solve semantic drift using a relatedness measure (the regularized Laplacian) instead of the importance measure (HITS authority) used in the link analysis community
Slide 7: Table of contents
2. Overview of Bootstrapping Algorithms
3. Espresso-style Bootstrapping Algorithms
4. Graph-based Analysis of Espresso-style Bootstrapping Algorithms
5. Word Sense Disambiguation
6. Bilingual Dictionary Construction
7. Learning Semantic Categories
Slide 8: Preliminaries: Espresso and HITS
Slide 9: Espresso Algorithm (Pantel and Pennacchiotti, 2006)
- Repeat
- Pattern extraction
- Pattern ranking
- Pattern selection
- Instance extraction
- Instance ranking
- Instance selection
- Until a stopping criterion is met
Slide 10: Pattern/instance ranking is mutually defined in Espresso
Reliable instances are supported by reliable patterns, and vice versa:
  r_π(p) = (1/|I|) Σ_{i∈I} (pmi(i,p) / max_pmi) · r_ι(i)
  r_ι(i) = (1/|P|) Σ_{p∈P} (pmi(i,p) / max_pmi) · r_π(p)
where p is a pattern, i an instance, P the set of patterns, I the set of instances, pmi the pointwise mutual information, and max_pmi the maximum of pmi over all patterns and instances.
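The mutually recursive ranking above can be sketched in a few lines of Python. This is an illustrative reimplementation, not the authors' code; `W` stands for the normalized pmi table pmi(i,p)/max_pmi.

```python
def espresso_rank(W, r_i, iterations=1):
    """One round (or more) of Espresso's mutually recursive ranking.

    W[p][i] holds pmi(i, p) / max_pmi (normalized pointwise mutual
    information); r_i is the current instance reliability vector.
    Returns the pattern and instance reliability vectors.
    """
    P, I = len(W), len(W[0])
    for _ in range(iterations):
        # Pattern reliability: pmi-weighted average of the
        # reliabilities of the instances the pattern extracts.
        r_p = [sum(W[p][i] * r_i[i] for i in range(I)) / I
               for p in range(P)]
        # Instance reliability: the symmetric definition over patterns.
        r_i = [sum(W[p][i] * r_p[p] for p in range(P)) / P
               for i in range(I)]
    return r_p, r_i
```

Seed instances get reliability 1 and everything else 0 in the initial `r_i`; each round then propagates reliability across the pattern-instance graph.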
Slide 11: HITS (Hypertext Induced Topic Search) finds hubs and authorities in a link graph
- Hub score of h: sum of the authority scores of all nodes pointed to by h
- Authority score of a: sum of the hub scores of all nodes pointing to a
Slide 12: HITS Algorithm (Kleinberg, 1999)
- Input
  - Initial hub score vector h
  - Adjacency matrix A
- Main loop
  - Repeat
    - a ← α Aᵀ h (authority update)
    - h ← β A a (hub update)
  - Until a and h converge
- Output
  - Hub and authority score vectors h and a
α, β: normalization factors
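The loop above can be sketched as plain power iteration (an illustrative sketch, not Kleinberg's reference implementation):

```python
def hits(A, h, steps=50):
    """HITS by power iteration.

    A[i][j] = 1 if hub i links to authority j; h is the initial
    hub score vector.  Returns (authority scores, hub scores).
    """
    n, m = len(A), len(A[0])
    a = [0.0] * m
    for _ in range(steps):
        # Authority update: a = alpha * A^T h (sum over incoming hubs).
        a = [sum(A[i][j] * h[i] for i in range(n)) for j in range(m)]
        # Hub update: h = beta * A a (sum over linked authorities).
        h = [sum(A[i][j] * a[j] for j in range(m)) for i in range(n)]
        # alpha and beta are simply L2 normalization factors here.
        a_norm = sum(x * x for x in a) ** 0.5
        h_norm = sum(x * x for x in h) ** 0.5
        a = [x / a_norm for x in a]
        h = [x / h_norm for x in h]
    return a, h
```

In a two-hub toy graph where both hubs link to node 1 but only hub 0 also links to node 0, node 1 ends up with the higher authority score and hub 0 with the higher hub score.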
Slide 13: HITS converges to fixed points regardless of the initial input
- Authority score vector a(k): vector a at the k-th iteration
- Hub score vector h(k): vector h at the k-th iteration
HITS authority vector a = the principal eigenvector of AᵀA
HITS hub vector h = the principal eigenvector of AAᵀ
where AᵀA is the co-citation matrix and AAᵀ the bibliographic coupling matrix.
Slide 14: Graph-based Analysis of Espresso-style Bootstrapping Algorithms
- How Espresso works,
- and how Espresso fails to solve semantic drift
Slide 15: Make Espresso look like HITS
- p: pattern score vector
- i: instance score vector
- A: pattern-instance matrix
|P|: number of patterns; |I|: number of instances; normalization factors keep the score vectors from growing too large.
Slide 16: Espresso uses the pattern-instance matrix A as the adjacency matrix in HITS
- A is a |P|×|I|-dimensional matrix holding the normalized pointwise mutual information (pmi) between patterns and instances:
  A_{p,i} = pmi(p,i) / max_{p,i} pmi(p,i)
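As a concrete sketch, the matrix can be computed from raw pattern-instance co-occurrence counts. The details below are assumptions for illustration (pmi estimated from counts, negative pmi values kept rather than clipped), not a specification from the paper:

```python
from math import log

def pmi_matrix(counts):
    """Build the pattern-instance matrix A from co-occurrence counts.

    counts[p][i] is how often pattern p matched instance i.
    A[p][i] = pmi(p, i) / max pmi, with pmi left at 0 for pairs
    that never co-occur.
    """
    total = sum(sum(row) for row in counts)
    pattern_total = [sum(row) for row in counts]
    instance_total = [sum(row[i] for row in counts)
                      for i in range(len(counts[0]))]
    A = [[0.0] * len(counts[0]) for _ in counts]
    for p, row in enumerate(counts):
        for i, c in enumerate(row):
            if c > 0:
                # pmi(p, i) = log P(p, i) / (P(p) * P(i))
                A[p][i] = log(c * total /
                              (pattern_total[p] * instance_total[i]))
    max_pmi = max(max(row) for row in A)
    if max_pmi > 0:
        A = [[v / max_pmi for v in row] for row in A]
    return A
```

Dividing by the maximum pmi keeps every entry in a bounded range, which is what lets A play the role of a weighted adjacency matrix in the HITS-style iteration.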
Slide 17: Three simplifications reduce Espresso to HITS
For graph-theoretic analysis, we introduce three simplifications to Espresso:
- Repeat
- Pattern extraction
- Pattern ranking
- Pattern selection
- Instance extraction
- Instance ranking
- Instance selection
- Until a stopping criterion is met
Slide 18: Keep the pattern-instance matrix constant in the main loop
- Compute the pattern-instance matrix
- Repeat
- Pattern extraction
- Pattern ranking
- Pattern selection
- Instance extraction
- Instance ranking
- Instance selection
- Until a stopping criterion is met
Simplification 1: Remove the pattern/instance extraction steps. Instead, pre-compute all patterns and instances once at the beginning of the algorithm.
Slide 19: Remove the pattern/instance selection heuristics
- Compute the pattern-instance matrix
- Repeat
- Pattern ranking
- Pattern selection
- Instance ranking
- Instance selection
- Until a stopping criterion is met
Simplification 2: Remove the pattern/instance selection steps, which retain only the highest-scoring k patterns / m instances for the next iteration (i.e., reset the scores of all other items to 0). Instead, retain the scores of all patterns and instances.
Slide 20: Remove the early stopping heuristics
- Compute the pattern-instance matrix
- Repeat
- Pattern ranking
- Instance ranking
- Until score vectors p and i converge
Simplification 3: No early stopping, i.e., run until convergence.
Slide 21: Simplified Espresso
- Input
  - Initial score vector of seed instances i
  - Pattern-instance co-occurrence matrix A
- Main loop
  - Repeat
    - p ← A i / |I| (pattern ranking)
    - i ← Aᵀ p / |P| (instance ranking)
  - Until i and p converge
- Output
  - Instance and pattern score vectors i and p
Slide 22: HITS Algorithm (Kleinberg, 1999)
- Input
  - Initial hub score vector h
  - Adjacency matrix A
- Main loop
  - Repeat
    - a ← α Aᵀ h (authority update)
    - h ← β A a (hub update)
  - Until a and h converge
- Output
  - Hub and authority score vectors h and a
α, β: normalization factors
Slide 23: Simplified Espresso is essentially HITS
- Simplified Espresso = HITS
- Problem
  - No matter which seed you start with, the same instance is always ranked topmost
  - This is semantic drift (called topic drift in HITS)
The ranking vector i tends to the principal eigenvector of AᵀA as the iteration proceeds, regardless of the seed instances!
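This seed-independence is easy to reproduce numerically. The sketch below uses an illustrative toy graph (not data from the paper) and plain L2 normalization; two disjoint seeds converge to the identical instance ranking:

```python
def simplified_espresso(A, i, steps=100):
    """Simplified Espresso: iterate p = A i, i = A^T p with L2
    normalization, starting from the seed instance score vector i."""
    n_p, n_i = len(A), len(A[0])
    for _ in range(steps):
        p = [sum(A[r][c] * i[c] for c in range(n_i)) for r in range(n_p)]
        i = [sum(A[r][c] * p[r] for r in range(n_p)) for c in range(n_i)]
        norm = sum(x * x for x in i) ** 0.5
        i = [x / norm for x in i]
    return i

# Toy pattern-instance graph: instance 1 co-occurs with both patterns.
A = [[1.0, 0.8, 0.0],
     [0.0, 0.7, 1.0]]
# Two completely different seeds...
i_a = simplified_espresso(A, [1.0, 0.0, 0.0])
i_b = simplified_espresso(A, [0.0, 0.0, 1.0])
# ...yield the same ranking: the principal eigenvector of A^T A.
ranking_a = sorted(range(3), key=lambda j: -i_a[j])
ranking_b = sorted(range(3), key=lambda j: -i_b[j])
```

Instance 1, the hub-like instance connected to both patterns, is ranked topmost from either seed, exactly the drift toward the "most frequent" item described above.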
Slide 24: How about full Espresso?
- Espresso has two heuristics not present in Simplified Espresso
  - Early stopping
  - Pattern and instance selection
- Do these heuristics really help reduce semantic drift? And how?
Slide 25: Experiments on semantic drift
- Do the heuristics in the original Espresso help reduce drift?
Slide 26: Word sense disambiguation task of Senseval-3 English Lexical Sample
- Predict the sense of "bank"
Training instances are annotated with their sense:
  "the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ) , bring this up to"
  "In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden"
Predict the sense of the target word in the test set:
  "Possibly aligned to water a sort of bank (???) by a rushing river."
Slide 27: Word sense disambiguation by Espresso
- Seed instance: the instance whose sense is to be predicted
- Proximity measure: the instance score vector given by Espresso
Slide 28: Example of k-NN classification by Espresso
- System output: k-nearest neighbors (k = 3)
- i = (0.9, 0.1, 0.8, 0.5, 0, 0, 0.95, 0.3, 0.2, 0.4) → sense A
(Diagram: the seed instance is linked to labeled training instances such as:
  "the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ) , bring this up to"
  "In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden")
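The k-NN step can be sketched as follows. Only the score vector comes from the slide; the sense labels here are hypothetical placeholders:

```python
from collections import Counter

def knn_sense(scores, senses, k=3):
    """Predict the seed's sense by majority vote among the k labeled
    training instances with the highest Espresso instance scores."""
    top = sorted(range(len(scores)), key=lambda j: -scores[j])[:k]
    return Counter(senses[j] for j in top).most_common(1)[0][0]

# Instance score vector from the slide; hypothetical sense labels.
scores = [0.9, 0.1, 0.8, 0.5, 0, 0, 0.95, 0.3, 0.2, 0.4]
senses = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]
```

With k = 3 the nearest neighbors are the instances scored 0.95, 0.9, and 0.8, all labeled A, so the system outputs sense A.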
Slide 29: Two heuristics in Espresso
- Early stopping
  - Plot results at each iteration
- Pattern and instance selection
  - Number of patterns to retain: p = 20 (increase p by 1 on each iteration)
  - Number of instances to retain: m = 100 (increase m by 100 on each iteration)
- Evaluation metric
  - Recall = # correct instances / # total true instances
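The selection heuristic amounts to a top-k cutoff per iteration. A minimal sketch: the growth schedule (p = 20 + t, m = 100 + 100t) is from the slide, while the functions themselves are illustrative:

```python
def select_top(scores, k):
    """Espresso's selection step: keep the k highest-scoring items
    and reset every other score to 0."""
    keep = set(sorted(range(len(scores)), key=lambda j: -scores[j])[:k])
    return [s if j in keep else 0.0 for j, s in enumerate(scores)]

def retained_counts(iteration):
    """Patterns and instances retained at a given (0-based) iteration,
    following the schedule on this slide."""
    return 20 + iteration, 100 + 100 * iteration
```

Resetting the other scores to 0 is exactly the step that Simplification 2 removed; it is one of the two knobs that keep full Espresso from drifting as fast as the simplified version.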
Slide 30: Convergence process of Espresso
The heuristics in Espresso help reduce semantic drift (however, early stopping is required for optimal performance).
(Plot: recall per iteration for Original Espresso, Simplified Espresso, and the most-frequent-sense baseline. With Simplified Espresso, semantic drift occurs: it outputs the most frequent sense regardless of the input.)
Slide 31: Learning curve of Espresso: per-sense breakdown
(Plot: recall per iteration for the most frequent sense vs. the other senses. The share of most-frequent-sense predictions increases over iterations; recall for infrequent senses worsens even with the original Espresso.)
Slide 32: Summary: Espresso and semantic drift
- Semantic drift happens because
  - Espresso is designed like HITS, and
  - HITS gives the same ranking list regardless of seeds
- Some heuristics reduce semantic drift
  - Early stopping is crucial for optimal performance
- Still, these heuristics require many parameters to be calibrated, and calibration is difficult
Slide 33: Main contributions of this work
- Suggest a parallel between semantic drift in Espresso-like bootstrapping and topic drift in HITS (Kleinberg, 1999)
- Solve semantic drift by graph kernels used in the link analysis community
Slide 34: Q. What caused drift in Espresso?
- A. Espresso's resemblance to HITS
  - HITS is an importance computation method (it gives a single ranking list for any seeds)
- Why not use another type of link analysis measure, one that takes the seeds into account?
  - A "relatedness" measure (it gives different rankings for different seeds)
Slide 35: The regularized Laplacian kernel
- A relatedness measure
- Has only one parameter
Normalized graph Laplacian:
  L = D^{-1/2} (D - A) D^{-1/2}
where A is the adjacency matrix of the graph and D the (diagonal) degree matrix.
Regularized Laplacian matrix:
  R_β = Σ_{n=0}^∞ β^n (-L)^n = (I + βL)^{-1}
where β is a parameter. Each column of R_β gives the rankings relative to a node.
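A minimal sketch of the kernel via its truncated power series (an assumption-laden illustration, not the paper's implementation: it assumes every node has at least one edge, and the series converges when β is small relative to the Laplacian's spectral norm, as with the β = 10⁻² used later):

```python
def regularized_laplacian(A, beta=0.01, terms=50):
    """Regularized Laplacian kernel R = sum_n beta^n (-L)^n, where
    L = D^{-1/2} (D - A) D^{-1/2} is the normalized graph Laplacian.
    Column j of R ranks every node by relatedness to node j.
    Assumes every node has degree >= 1."""
    n = len(A)
    deg = [sum(row) for row in A]
    # Normalized Laplacian: L[i][j] = delta_ij - A[i][j] / sqrt(d_i d_j).
    L = [[(1.0 if i == j else 0.0) - A[i][j] / (deg[i] * deg[j]) ** 0.5
          for j in range(n)] for i in range(n)]
    # Truncated Neumann series: R = I + (-beta L) + (-beta L)^2 + ...
    R = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in R]
    for _ in range(terms):
        term = [[-beta * sum(term[i][k] * L[k][j] for k in range(n))
                 for j in range(n)] for i in range(n)]
        R = [[R[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return R
```

On a three-node path graph, node 0 is most related to itself, then to its neighbor (node 1), then to node 2, which it only reaches through node 1. This seed-relative behavior is exactly what a HITS-style importance score lacks.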
Slide 36: Word Sense Disambiguation
- Evaluation of the regularized Laplacian against Espresso and other graph-based algorithms
Slide 37: Label prediction of "bank" (Recall)
Algorithm                        | Most frequent sense | Other senses
Simplified Espresso              | 100.0               | 0.0
Espresso (after convergence)     | 100.0               | 30.2
Espresso (optimal stopping)      | 94.4                | 67.4
Regularized Laplacian (β = 10⁻²) | 92.1                | 62.8
Espresso suffers from semantic drift (unless stopped at the optimal stage); the regularized Laplacian keeps high recall for infrequent senses.
Slide 38: WSD on all nouns in Senseval-3
Algorithm                        | Recall
Most frequent sense (baseline)   | 54.5
HyperLex (Agirre et al., 2005)   | 64.6
PageRank (Agirre et al., 2005)   | 64.6
Simplified Espresso              | 44.1
Espresso (after convergence)     | 46.9
Espresso (optimal stopping)      | 66.5
Regularized Laplacian (β = 10⁻²) | 67.1
Espresso needs optimal stopping to achieve equivalent performance; the regularized Laplacian outperforms the other graph-based methods.
Slide 39: The regularized Laplacian is stable across values of its parameter β
Slide 40: Conclusions
- Semantic drift in Espresso is a parallel form of topic drift in HITS
- The regularized Laplacian reduces semantic drift in bootstrapping for natural language processing tasks
  - It is inherently a relatedness measure (≠ importance measure)
Slide 41: Future work
- Investigate whether a similar analysis is applicable to a wider class of bootstrapping algorithms (including co-training)
- Investigate the influence of seed selection on bootstrapping algorithms, and propose a way to select effective seed instances