Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms - PowerPoint PPT Presentation

1
Graph-based Analysis of Espresso-style Minimally-supervised Bootstrapping Algorithms
Mamoru Komachi, Nara Institute of Science and Technology
  • Jan 15, 2010

2
Supervised learning has succeeded in many natural language processing tasks
(Diagram: annotated corpora feed a classifier and a dictionary)
  • Needs time-consuming annotation (data creation)
  • Why not learn from minimally annotated resources?

3
Corpus-based extraction of semantic category
Input: seed instances (e.g., Hong Kong, Singapore)
Output: new instances (extracted from corpus)
  • Seed instances (Hong Kong, Singapore) occur in corpus contexts such as "Visa for Singapore" and "Travel guide to Singapore", yielding patterns like "Visa for __" and "History of __"
  • The patterns in turn extract new instances (Australia, China, Egypt)
  • Alternate the two steps, step by step
4
Semantic drift is the central problem of bootstrapping
Input: seed instances (Singapore, Australia)
Output: new instances (extracted from corpus)
  • Generic patterns: patterns co-occurring with many irrelevant instances, e.g., "visa __ is"
  • Such patterns extract irrelevant instances like "card"; "greeting __" then extracts "messages", "card", "words": the semantic category changed!
  • Errors propagate to successive iterations
5
Two major problems solved by this work
  • Why does semantic drift occur?
  • Is there any way to prevent semantic drift?

6
Answers to the problems of semantic drift
  1. Suggest a parallel between semantic drift in Espresso-style bootstrapping [Pantel and Pennacchiotti, 2006] and topic drift in HITS [Kleinberg, 1999]
  2. Solve semantic drift using a relatedness measure (regularized Laplacian) instead of the importance measure (HITS authority) used in the link analysis community

7
Table of contents
2. Overview of Bootstrapping Algorithms
3. Espresso-style Bootstrapping Algorithms
4. Graph-based Analysis of Espresso-style Bootstrapping Algorithms
5. Word Sense Disambiguation
6. Bilingual Dictionary Construction
7. Learning Semantic Categories
8
Preliminaries: Espresso and HITS
9
Espresso Algorithm [Pantel and Pennacchiotti, 2006]
  • Repeat
  • Pattern extraction
  • Pattern ranking
  • Pattern selection
  • Instance extraction
  • Instance ranking
  • Instance selection
  • Until a stopping criterion is met

10
Pattern/instance ranking is mutually defined in Espresso
Reliable instances are supported by reliable patterns, and vice versa
  • Score for pattern p: the average, over all instances i, of pmi(p,i)/max_pmi weighted by the score of i
  • Score for instance i: the average, over all patterns p, of pmi(p,i)/max_pmi weighted by the score of p

p: pattern; i: instance; P: set of patterns; I: set of instances; pmi: pointwise mutual information; max_pmi: max of pmi over all the patterns and instances
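The mutual recursion above can be sketched in code. This is a minimal illustration, not the authors' implementation; the toy matrix W (normalized pmi values), the seed choice, and all variable names are assumptions:

```python
import numpy as np

def espresso_step(W, inst_scores):
    """One round of Espresso's mutual ranking.

    W[p, i] holds pmi(p, i) / max_pmi, pre-normalized to [0, 1].
    A pattern's score averages the scores of the instances it
    co-occurs with, weighted by normalized pmi; instance scores
    are then recomputed from the new pattern scores the same way.
    """
    num_patterns, num_instances = W.shape
    pat_scores = W @ inst_scores / num_instances
    inst_scores = W.T @ pat_scores / num_patterns
    return pat_scores, inst_scores

# Toy data: 2 patterns x 3 instances; instances 0 and 1 are seeds
W = np.array([[1.0, 0.8, 0.1],
              [0.2, 0.9, 0.7]])
p, i = espresso_step(W, np.array([1.0, 1.0, 0.0]))
```

Instance 2, which co-occurs weakly with the seeds' patterns, ends up with the lowest score after one round, as intended.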
11
HITS (Hypertext Induced Topic Search) finds hubs and authorities in a linked graph
  • Hub h, authority a
  • Hub score: sum of authority scores of all nodes pointed to by h
  • Authority score: sum of hub scores of all nodes pointing to a
12
HITS Algorithm [Kleinberg 1999]
  • Input
  • Initial hub score vector h
  • Adjacency matrix A
  • Main loop
  • Repeat
  • a = α A^T h
  • h = β A a
  • Until a and h converge
  • Output
  • Hub and authority score vectors a and h

α: normalization factor; β: normalization factor
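The loop above, with the normalization factors realized as L2 normalization, can be sketched as follows (the three-node link graph is an assumption for illustration):

```python
import numpy as np

def hits(A, h0, iters=100):
    """HITS main loop: a = alpha * A^T h, then h = beta * A a,
    repeated until convergence (here: a fixed number of iterations).

    A[u, v] = 1 iff node u links to node v; h0 is the initial hub vector.
    """
    h = h0.astype(float)
    for _ in range(iters):
        a = A.T @ h
        a /= np.linalg.norm(a)   # alpha: normalization factor
        h = A @ a
        h /= np.linalg.norm(h)   # beta: normalization factor
    return a, h

# Nodes 0 and 1 both link to node 2, so node 2 is the authority
A = np.array([[0, 0, 1],
              [0, 0, 1],
              [0, 0, 0]])
a, h = hits(A, np.ones(3))
```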
13
HITS converges to fixed points regardless of initial input
  • Authority score vector a(k): vector a on the k-th iteration
  • Hub score vector h(k): vector h on the k-th iteration

HITS authority vector a: the principal eigenvector of A^T A
HITS hub vector h: the principal eigenvector of A A^T
where A^T A: co-citation matrix; A A^T: bibliographic coupling matrix
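This fixed-point property can be checked numerically: running the HITS updates from an arbitrary positive start vector yields the principal eigenvector of the co-citation matrix A^T A (the small graph here is an assumption for illustration):

```python
import numpy as np

# Toy adjacency matrix (assumed for illustration)
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]], dtype=float)

# HITS updates from an arbitrary starting vector
h = np.ones(3)
for _ in range(200):
    a = A.T @ h
    a /= np.linalg.norm(a)
    h = A @ a
    h /= np.linalg.norm(h)

# Principal eigenvector of the co-citation matrix A^T A
vals, vecs = np.linalg.eigh(A.T @ A)
principal = vecs[:, np.argmax(vals)]
principal *= np.sign(principal.sum())  # eigh fixes scale, not sign
```

The converged authority vector `a` and `principal` agree to numerical precision.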
14
Graph-based Analysis of Espresso-style
Bootstrapping Algorithms
  • How Espresso works,
  • and how Espresso fails to solve semantic drift

15
Make Espresso look like HITS
  • p: pattern score vector
  • i: instance score vector
  • A: pattern-instance matrix

|P|: number of patterns; |I|: number of instances; normalization factors keep the score vectors from growing too large
16
Espresso uses the pattern-instance matrix A as the adjacency matrix in HITS
  • A is a |P| x |I| matrix holding the (normalized) pointwise mutual information (pmi) between patterns and instances

A_{p,i} = pmi(p,i) / max_{p,i} pmi(p,i)
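For concreteness, here is one way such a matrix could be built from raw co-occurrence counts. The count data, the clipping of undefined and negative pmi to 0, and all names are assumptions for illustration, not details from the presentation:

```python
import numpy as np

def normalized_pmi_matrix(C):
    """A[p, i] = pmi(p, i) / max pmi, from a count matrix C.

    C[p, i] is the co-occurrence count of pattern p and instance i.
    Cells with zero counts (pmi undefined) and negative pmi are
    clipped to 0 here, a simplifying assumption.
    """
    joint = C / C.sum()                        # P(p, i)
    p_marg = joint.sum(axis=1, keepdims=True)  # P(p)
    i_marg = joint.sum(axis=0, keepdims=True)  # P(i)
    with np.errstate(divide="ignore"):
        pmi = np.log(joint / (p_marg * i_marg))
    pmi[~np.isfinite(pmi)] = 0.0
    pmi = np.maximum(pmi, 0.0)
    m = pmi.max()
    return pmi / m if m > 0 else pmi

# Toy counts: 2 patterns x 3 instances
C = np.array([[10, 0, 2],
              [1, 5, 0]])
A = normalized_pmi_matrix(C)
```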
17
Three simplifications to reduce Espresso to HITS
For graph-theoretic analysis, we will introduce
3 simplifications to Espresso
  • Repeat
  • Pattern extraction
  • Pattern ranking
  • Pattern selection
  • Instance extraction
  • Instance ranking
  • Instance selection
  • Until a stopping criterion is met

18
Keep pattern-instance matrix constant in the main
loop
  • Compute the pattern-instance matrix
  • Repeat
  • Pattern extraction
  • Pattern ranking
  • Pattern selection
  • Instance extraction
  • Instance ranking
  • Instance selection
  • Until a stopping criterion is met

Simplification 1: Remove the pattern/instance extraction steps. Instead, pre-compute all patterns and instances once at the beginning of the algorithm
19
Remove pattern/instance selection heuristics
  • Compute the pattern-instance matrix
  • Repeat
  • Pattern ranking
  • Pattern selection
  • Instance ranking
  • Instance selection
  • Until a stopping criterion is met

Simplification 2: Remove the pattern/instance selection steps, which retain only the highest-scoring k patterns / m instances for the next iteration (i.e., reset the scores of all other items to 0). Instead, retain the scores of all patterns and instances
20
Remove early stopping heuristics
  • Compute the pattern-instance matrix
  • Repeat
  • Pattern ranking
  • Instance ranking
  • Until score vectors p and i converge

Simplification 3: No early stopping, i.e., run until convergence
21
Simplified Espresso
  • Input
  • Initial score vector of seed instances
  • Pattern-instance co-occurrence matrix A
  • Main loop
  • Repeat
  • Until i and p converge
  • Output
  • Instance and pattern score vectors i and p
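Simplified Espresso fits in a few lines; running it from two different seeds illustrates the drift problem the talk identifies: both converge to the same ranking. The matrix values are toy assumptions:

```python
import numpy as np

def simplified_espresso(A, i0, iters=200):
    """Simplified Espresso: p = A i, then i = A^T p, normalized each round.

    A is the |P| x |I| normalized pmi matrix; i0 scores the seed instances.
    """
    i = i0.astype(float)
    for _ in range(iters):
        p = A @ i
        p /= np.linalg.norm(p)
        i = A.T @ p
        i /= np.linalg.norm(i)
    return p, i

A = np.array([[0.9, 0.8, 0.1],
              [0.2, 0.7, 0.6]])
# Two different seed instances...
_, i_from_seed0 = simplified_espresso(A, np.array([1.0, 0.0, 0.0]))
_, i_from_seed2 = simplified_espresso(A, np.array([0.0, 0.0, 1.0]))
# ...converge to the same instance ranking
```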

22
HITS Algorithm [Kleinberg 1999]
  • Input
  • Initial hub score vector h
  • Adjacency matrix A
  • Main loop
  • Repeat
  • a = α A^T h
  • h = β A a
  • Until a and h converge
  • Output
  • Hub and authority score vectors a and h

α: normalization factor; β: normalization factor
23
Simplified Espresso is essentially HITS
  • Simplified Espresso = HITS
  • Problem
  • No matter which seed you start with, the same instance is always ranked topmost
  • Semantic drift (also called topic drift in HITS)

The ranking vector i tends to the principal eigenvector of A^T A as the iteration proceeds, regardless of the seed instances!
24
How about Espresso?
  • Espresso has two heuristics not present in
    Simplified Espresso
  • Early stopping
  • Pattern and instance selection
  • Do these heuristics really help reduce semantic
    drift? And how?

25
Experiments on semantic drift
  • Do the heuristics in the original Espresso help reduce drift?

26
Word sense disambiguation task of Senseval-3 English Lexical Sample
  • Predict the sense of "bank"

Training instances are annotated with their sense:
"the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ) , bring this up to"
"In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden"

Predict the sense of the target word in the test set:
"Possibly aligned to water a sort of bank (???) by a rushing river."
27
Word sense disambiguation by Espresso
  • Seed instance: the instance whose sense is to be predicted
  • Proximity measure: the instance score vector given by Espresso
28
Example of k-NN classification by Espresso
  • System output: k-nearest neighbors (k=3)
  • i = (0.9, 0.1, 0.8, 0.5, 0, 0, 0.95, 0.3, 0.2, 0.4) → sense A

Seed instance, scored against labeled training instances such as:
"the financial benefits of the bank (finance) 's employee package ( cheap mortgages and pensions, etc ) , bring this up to"
"In that same year I was posted to South Shields on the south bank (bank of the river) of the River Tyne and quickly became aware that I had an enormous burden"
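That k-NN step can be sketched as: rank the labeled training instances by their Espresso score and take a majority vote among the top k. Only the score vector comes from the slide; the sense labels are invented for illustration:

```python
from collections import Counter

def knn_predict(scores, labels, k=3):
    """Majority vote among the k training instances most related
    to the seed, as measured by the Espresso score vector."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    votes = [labels[j] for j in ranked[:k]]
    return Counter(votes).most_common(1)[0][0]

# Score vector i from the slide; labels are assumptions
i = [0.9, 0.1, 0.8, 0.5, 0, 0, 0.95, 0.3, 0.2, 0.4]
labels = ["A", "B", "A", "B", "A", "B", "A", "B", "A", "B"]
predicted = knn_predict(i, labels, k=3)
# Top-3 scores sit at indices 6, 0, 2, all labeled A, so the vote is A
```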
29
Two heuristics in Espresso
  • Early stopping
  • Plot results on each iteration
  • Pattern and instance selection
  • # of patterns to retain: p = 20 (increase p by 1 on each iteration)
  • # of instances to retain: m = 100 (increase m by 100 on each iteration)
  • Evaluation metric
  • Recall = # of correct instances / # of total true instances

30
Convergence process of Espresso
(Plot: recall per iteration for Original Espresso, Simplified Espresso, and the most-frequent-sense baseline)
The heuristics in Espresso help reduce semantic drift (however, early stopping is required for optimal performance)
In Simplified Espresso, semantic drift occurs: it outputs the most frequent sense regardless of input
31
Learning curve of Espresso: per-sense breakdown
(Plot: recall per iteration for the most frequent sense vs. other senses)
The number of most-frequent-sense predictions increases; recall for infrequent senses worsens even with the original Espresso
32
Summary: Espresso and semantic drift
  • Semantic drift happens because
  • Espresso is designed like HITS
  • HITS gives the same ranking list regardless of seeds
  • Some heuristics reduce semantic drift
  • Early stopping is crucial for optimal performance
  • Still, these heuristics require
  • many parameters to be calibrated
  • and calibration is difficult

33
Main contributions of this work
  1. Suggest a parallel between semantic drift in
    Espresso-like bootstrapping and topic drift in
    HITS (Kleinberg, 1999)
  2. Solve semantic drift by graph kernels used in
    link analysis community

34
Q. What caused drift in Espresso?
  • A. Espresso's resemblance to HITS
  • HITS is an importance computation method (it gives a single ranking list for any seeds)
  • Why not use another type of link analysis measure, one that takes seeds into account?
  • A "relatedness" measure (it gives different rankings for different seeds)

35
The regularized Laplacian kernel
  • A relatedness measure
  • Has only one parameter

Normalized graph Laplacian: L = D^(-1/2) (D - A) D^(-1/2)
A: adjacency matrix of the graph; D: (diagonal) degree matrix
Regularized Laplacian matrix: R_β = Σ_{n=0}^∞ (-βL)^n = (I + βL)^(-1)
β: parameter. Each column of R_β gives the rankings relative to a node
36
Word Sense Disambiguation
  • Evaluation of regularized Laplacian against
    Espresso and other graph-based algorithms

37
Label prediction of "bank" (Recall)

Algorithm                       | Most frequent sense | Other senses
Simplified Espresso             | 100.0               | 0.0
Espresso (after convergence)    | 100.0               | 30.2
Espresso (optimal stopping)     | 94.4                | 67.4
Regularized Laplacian (β=10^-2) | 92.1                | 62.8

Espresso suffers from semantic drift (unless stopped at the optimal stage); the regularized Laplacian keeps high recall for infrequent senses
38
WSD on all nouns in Senseval-3

Algorithm                       | Recall
Most frequent sense (baseline)  | 54.5
HyperLex [Agirre et al. 2005]   | 64.6
PageRank [Agirre et al. 2005]   | 64.6
Simplified Espresso             | 44.1
Espresso (after convergence)    | 46.9
Espresso (optimal stopping)     | 66.5
Regularized Laplacian (β=10^-2) | 67.1

Espresso needs optimal stopping to achieve equivalent performance; the regularized Laplacian outperforms the other graph-based methods
39
The regularized Laplacian is stable across a range of the parameter β
40
Conclusions
  • Semantic drift in Espresso is a parallel form of topic drift in HITS
  • The regularized Laplacian reduces semantic drift in bootstrapping for natural language processing tasks
  • It is inherently a relatedness measure (as opposed to an importance measure)

41
Future work
  • Investigate whether a similar analysis is applicable to a wider class of bootstrapping algorithms (including co-training)
  • Investigate the influence of seed selection on bootstrapping algorithms and propose a way to select effective seed instances