Title: Max-margin sequential learning methods
1. Max-margin sequential learning methods
2. Announcements
- Upcoming assignments
- Wed 3/3: project proposal due
- 1-2 pages per person
- Spring break next week, no class
- Will get feedback on project proposals by the end of break
- Write-ups for the Distance Metrics for Text week are due Wed 3/17, not the Monday after spring break
3. Collins paper
- Notation
- label (y) is a tag t
- observation (x) is a word w
- history h is a 4-tuple ⟨t_{i-1}, t_{i-2}, w_{[1:n]}, i⟩
- φ_s(h, t) is a feature of the pair (h, t)
4. Collins paper
- Notation, continued
- Φ_s is the sum of φ_s over all positions i (spelled out below)
- α_s is the weight given to Φ_s
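Putting the notation together, the global features and the decoding objective can be written as follows; this is a reconstruction from the definitions above, with indexing as in Collins' paper:

```latex
\Phi_s(w_{[1:n]}, t_{[1:n]}) = \sum_{i=1}^{n} \phi_s(h_i, t_i),
\qquad
t^{*}_{[1:n]} = \arg\max_{t_{[1:n]}} \sum_{s} \alpha_s \, \Phi_s(w_{[1:n]}, t_{[1:n]})
```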
5. Collins paper
6. The theory
Claim 1: the algorithm is an instance of this perceptron variant.
Claim 2: the arguments in the mistake-bound classification results of FS99 extend immediately to this ranking task as well.
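To make the perceptron variant behind Claim 1 concrete, here is a minimal runnable sketch of a Collins-style structured perceptron for tagging. The feature set (one-hot word/tag emissions plus tag-bigram transitions) and the toy data are illustration-only assumptions, not the paper's experimental setup.

```python
import numpy as np

N_WORDS, N_TAGS = 5, 3

def feat_index(kind, a, b):
    # Flat index for emission (word, tag) and transition (prev_tag, tag) features.
    if kind == "emit":
        return a * N_TAGS + b
    return N_WORDS * N_TAGS + a * N_TAGS + b

N_FEATS = N_WORDS * N_TAGS + N_TAGS * N_TAGS

def phi(words, tags):
    # Global feature vector Phi(x, y): sum of local features over positions i.
    v = np.zeros(N_FEATS)
    for i, (w, t) in enumerate(zip(words, tags)):
        v[feat_index("emit", w, t)] += 1
        if i > 0:
            v[feat_index("trans", tags[i - 1], t)] += 1
    return v

def viterbi(words, alpha):
    # Highest-scoring tag sequence under the current weights alpha.
    n = len(words)
    score = np.zeros((n, N_TAGS))
    back = np.zeros((n, N_TAGS), dtype=int)
    for t in range(N_TAGS):
        score[0, t] = alpha[feat_index("emit", words[0], t)]
    for i in range(1, n):
        for t in range(N_TAGS):
            cand = [score[i - 1, p] + alpha[feat_index("trans", p, t)]
                    for p in range(N_TAGS)]
            back[i, t] = int(np.argmax(cand))
            score[i, t] = max(cand) + alpha[feat_index("emit", words[i], t)]
    tags = [int(np.argmax(score[-1]))]
    for i in range(n - 1, 0, -1):
        tags.append(int(back[i, tags[-1]]))
    return tags[::-1]

def train(data, epochs=5):
    alpha = np.zeros(N_FEATS)
    for _ in range(epochs):
        for words, gold in data:
            guess = viterbi(words, alpha)
            if guess != gold:
                # Perceptron update: promote gold features, demote guessed ones.
                alpha += phi(words, gold) - phi(words, guess)
    return alpha

data = [([0, 1, 2], [0, 1, 2]), ([2, 3, 4], [2, 1, 0])]
alpha = train(data)
print(viterbi([0, 1, 2], alpha))
```

The update promotes the features of the gold sequence and demotes those of the Viterbi guess; this is the ranking-style perceptron step the two claims analyze.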
8. FS99 algorithm
9. FS99 result
10. Collins result
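Slides 8-10 are figures. The result they build to, restated here from memory of Collins (2002), Theorem 1, adapted from Freund & Schapire 1999, so check the paper for the exact statement: if some unit-norm U separates the data with margin δ, and R bounds ‖Φ(x_i, y_i) − Φ(x_i, z)‖ over all candidates z, then

```latex
\exists U,\ \lVert U \rVert = 1:\;
U \cdot \big(\Phi(x_i, y_i) - \Phi(x_i, z)\big) \ge \delta
\;\;\forall i,\ \forall z \ne y_i
\quad\Longrightarrow\quad
\#\text{mistakes} \;\le\; \frac{R^2}{\delta^2}
```

Notably, the bound is independent of the (exponential) number of candidate sequences z per example.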
11. Results
- Two experiments
- POS tagging, using Adwait Ratnaparkhi's features
- NP chunking (Start, Continue, Outside tags)
- NER on a special AT&T dataset (from another paper)
12. Features for NP chunking
13. Results
14. More ideas
- The dual version of a perceptron
- w is built up by repeatedly adding examples, so w is a weighted sum of the examples x_1, ..., x_n
- the inner product ⟨w, x⟩ can therefore be rewritten (see the derivation below)
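Spelling out that rewrite (a standard identity, with α_i the signed count of how often example x_i was added to w):

```latex
w = \sum_{i=1}^{n} \alpha_i x_i
\quad\Longrightarrow\quad
\langle w, x \rangle
= \Big\langle \sum_{i=1}^{n} \alpha_i x_i,\; x \Big\rangle
= \sum_{i=1}^{n} \alpha_i \langle x_i, x \rangle
```

The payoff is that ⟨x_i, x⟩ can be replaced by a kernel K(x_i, x), so the perceptron can work in an implicit feature space without ever forming w explicitly.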
15. Dual version of perceptron ranking
α_{i,j}, where i ranges over examples and j over the candidate (correct or incorrect) tag sequences for example i; a sketch follows.
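Here is a minimal runnable sketch of that dual ranking perceptron in the re-ranking setting: each training example comes with a candidate list (standing in for n-best tagger output), and α_{i,j} is the signed weight on candidate j of example i. The data layout and the plain dot-product kernel are illustration-only assumptions.

```python
import numpy as np

def score(x, alpha, candidates, kernel=np.dot):
    # <w, x> = sum_{i,j} alpha[i][j] <h_{i,j}, x>; w is never formed explicitly.
    return sum(a * kernel(h, x)
               for cand_list, alphas in zip(candidates, alpha)
               for h, a in zip(cand_list, alphas))

def train_rank(candidates, correct, epochs=10):
    # candidates[i]: feature vectors for example i's candidate tag sequences;
    # correct[i]: index of the correct candidate for example i.
    alpha = [np.zeros(len(c)) for c in candidates]
    for _ in range(epochs):
        for i, cands in enumerate(candidates):
            best = max(range(len(cands)),
                       key=lambda j: score(cands[j], alpha, candidates))
            if best != correct[i]:
                alpha[i][correct[i]] += 1.0  # promote the correct sequence
                alpha[i][best] -= 1.0        # demote the top-ranked mistake
    return alpha

# Tiny made-up example: two examples, two candidates each.
cands = [[np.array([1.0, 0.0]), np.array([0.0, 1.0])],
         [np.array([1.0, 1.0]), np.array([0.0, 2.0])]]
alpha = train_rank(cands, correct=[1, 1])
print(alpha)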
16. NER features for re-ranking MAXENT tagger output
17. NER features
18. NER results
19. Altun et al. paper
- Starting point: the dual version of Collins' perceptron algorithm
- the final hypothesis is a weighted sum of inner products with a subset of the examples
- this is a lot like an SVM, except that the perceptron algorithm is used to set the weights rather than quadratic optimization
20. SVM optimization
- Notation
- y_i is the correct tag sequence for x_i
- y is an incorrect tag sequence
- F(x_i, y_i) is the feature vector
- Optimization problem
- find weights w that maximize the minimal margin while constraining ||w|| = 1, or
- minimize ||w||² such that every margin is ≥ 1 (written out below)
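Written out, the second form is the usual hard-margin program over ranking constraints (a standard formulation; (14)/(15) on the next slides refer to the paper's own variants):

```latex
\min_{w}\ \tfrac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad
\langle w,\; F(x_i, y_i) - F(x_i, y) \rangle \;\ge\; 1
\qquad \forall i,\ \forall y \ne y_i
```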
21. SVMs for ranking
22. SVMs for ranking
Proposition: (14) and (15) are equivalent.
23. SVMs for ranking
A binary classification problem with (x_i, y_i) as the positive example and the pairs (x_i, y) as negative examples, except that the threshold θ_i varies for each example. Why? Because we're ranking: scores only need to separate the correct sequence from the incorrect ones within the same example, not across examples.
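One plausible way to write that per-example threshold (a reconstruction; the exact constants in the paper may differ): classify F(x_i, y_i) as positive and each F(x_i, y) as negative relative to a threshold θ_i that is free per example:

```latex
\langle w, F(x_i, y_i) \rangle \;\ge\; \theta_i + 1,
\qquad
\langle w, F(x_i, y) \rangle \;\le\; \theta_i - 1
\quad \forall y \ne y_i
```

Subtracting the two eliminates θ_i and recovers the pure ranking constraint ⟨w, F(x_i, y_i) − F(x_i, y)⟩ ≥ 2, which is why the threshold can vary freely across examples.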
24. SVMs for ranking
- Altun et al. work out the remaining details
- as with perceptron learning, negative data is found by running Viterbi with the learned weights and looking for errors (see the sketch after this list)
- each mistake is a possible new support vector
- need to iterate over the data repeatedly
- could take exponential time before convergence if the support vectors are dense...
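A runnable sketch of that loop, reusing phi(), viterbi(), N_FEATS, and data from the perceptron sketch earlier. Everything specific here is a simplifying assumption: the mined constraints are mirrored into a binary problem over difference vectors and re-solved with scikit-learn's LinearSVC, rather than with the paper's dual optimizer.

```python
import numpy as np
from sklearn.svm import LinearSVC

def train_svm_struct(data, rounds=5, C=1.0):
    alpha = np.zeros(N_FEATS)
    X, y = [], []  # working set of constraints, kept across rounds
    for _ in range(rounds):
        new_constraints = 0
        for words, gold in data:
            guess = viterbi(words, alpha)      # negative mining via decoding
            if guess != gold:
                d = phi(words, gold) - phi(words, guess)
                X.extend([d, -d])              # each mistake: candidate support vector
                y.extend([1, -1])
                new_constraints += 1
        if new_constraints == 0:
            break                              # no violated constraints remain
        svm = LinearSVC(C=C, fit_intercept=False)
        svm.fit(np.array(X), np.array(y))      # re-solve on the current working set
        alpha = svm.coef_.ravel()
    return alpha

alpha = train_svm_struct(data)
print(viterbi(data[0][0], alpha))
```

Caching the difference vectors across rounds is what makes each mistake a persistent candidate support vector, and the repeated decode-then-retrain passes are the iteration the slide warns about.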
25. Altun et al. results
- NER on 300 sentences from CoNLL2002 shared task
- Spanish
- Four entity types, nine labels (beginning-T, intermediate-T, other)
- POS tagging on 300 sentences from the Penn TreeBank
- 5-fold CV, window of size 3, simple features
26. Altun et al. results
27. Altun et al. results