Max-margin sequential learning methods
1
Max-margin sequential learning methods
  • William W. Cohen
  • CALD

2
Announcements
  • Upcoming assignments
  • Wed 3/3: project proposal due (1-2 pages per person)
  • Spring break next week, no class
  • You will get feedback on project proposals by the end of break
  • Note: write-ups for the Distance Metrics for Text week are due Wed 3/17, not the Monday after spring break

3
Collins paper
  • Notation
  • label (y) is a tag t
  • observation (x) is a word w
  • history h is a 4-tuple <t_{i-1}, t_{i-2}, w_{[1:n]}, i> (the previous two tags, the word sequence, and the current position)
  • φ_s(h, t) is a feature of the pair (h, t)

4
Collins paper
  • Notation, cont'd
  • Φ(x, y) is the sum of the φ_s(h_i, t_i) over all positions i
  • α_s is the weight given to feature φ_s
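The global feature vector Φ can be sketched directly from these definitions. A minimal Python sketch, where the two feature templates (tag/word and tag-bigram) are illustrative stand-ins, not Collins's actual feature set:

```python
# Sketch of the global feature map: Phi(x, y) = sum_i phi_s(h_i, t_i).
# The two templates below are illustrative, not Collins's real features.

def local_features(prev_tag, tag, words, i):
    """phi_s(h, t): indicator features of the history and candidate tag."""
    return {
        ("tag_word", tag, words[i]): 1.0,      # current tag / current word
        ("tag_bigram", prev_tag, tag): 1.0,    # previous tag / current tag
    }

def global_features(words, tags):
    """Phi(x, y): sum the local feature vectors over all positions i."""
    Phi = {}
    prev = "<START>"
    for i, tag in enumerate(tags):
        for f, v in local_features(prev, tag, words, i).items():
            Phi[f] = Phi.get(f, 0.0) + v
        prev = tag
    return Phi
```

Because Φ is a sum of sparse local vectors, the score <α, Φ(x, y)> decomposes over positions, which is what makes Viterbi decoding possible.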

5
Collins paper
6
The theory
Claim 1: the algorithm is an instance of this perceptron variant.
Claim 2: the arguments in the mistake-bound classification results of FS99 extend immediately to this ranking task as well.
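The perceptron variant these claims refer to can be sketched as follows. This toy version brute-forces the argmax over all tag sequences instead of running Viterbi, and takes a hypothetical feature function `feats(x, y)` as a parameter:

```python
# Toy sketch of Collins's perceptron for tagging: on a mistake, move the
# weights toward the gold sequence's features and away from the prediction.
from itertools import product

def sparse_dot(w, phi):
    return sum(w.get(f, 0.0) * v for f, v in phi.items())

def perceptron_train(data, feats, tagset, epochs=5):
    """data: list of (words, gold_tags); feats(x, y) -> sparse dict.
    Brute-force argmax stands in for Viterbi (toy-size inputs only)."""
    w = {}
    for _ in range(epochs):
        for words, gold in data:
            # argmax_y  w . Phi(x, y) over all candidate tag sequences
            best = list(max(product(tagset, repeat=len(words)),
                            key=lambda y: sparse_dot(w, feats(words, list(y)))))
            if best != gold:
                # additive update: w += Phi(x, gold) - Phi(x, best)
                for f, v in feats(words, gold).items():
                    w[f] = w.get(f, 0.0) + v
                for f, v in feats(words, best).items():
                    w[f] = w.get(f, 0.0) - v
    return w
```

The update is exactly the additive form the FS99 mistake-bound analysis covers, which is why the theory carries over.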
8
FS99 algorithm
9
FS99 result
10
Collins result
11
Results
  • Two experiments:
  • POS tagging, using Adwait's features
  • NP chunking (Start, Continue, Outside tags)
  • NER on a special AT&T dataset (in another paper)

12
Features for NP chunking
13
Results
14
More ideas
  • The dual version of the perceptron
  • w is built up by repeatedly adding examples ⇒ w is a weighted sum of the examples x1, ..., xn
  • the inner product <w, x> can then be rewritten as a weighted sum of the inner products <xj, x>
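This rewriting is what enables kernels: since w = Σ_j c_j y_j x_j, the score <w, x> = Σ_j c_j y_j <x_j, x>, and the inner product can be swapped for any kernel K. A minimal sketch for the plain binary case (not the ranking version on the next slide):

```python
# Kernel perceptron: keep one coefficient per example instead of an
# explicit weight vector; K(x, x') replaces the inner product.

def kernel_perceptron(data, K, epochs=10):
    """data: list of (x, y) with y in {-1, +1}; K(x, x') a kernel.
    Returns per-example coefficients c_j (mistake-driven updates)."""
    c = [0.0] * len(data)
    for _ in range(epochs):
        for i, (xi, yi) in enumerate(data):
            score = sum(cj * yj * K(xj, xi)
                        for cj, (xj, yj) in zip(c, data) if cj)
            if yi * score <= 0:       # mistake: add this example to the sum
                c[i] += 1.0
    return c

def predict(c, data, K, x):
    """Sign of the kernelized score sum_j c_j y_j K(x_j, x)."""
    return 1 if sum(cj * yj * K(xj, x)
                    for cj, (xj, yj) in zip(c, data)) > 0 else -1
```

Only examples the algorithm made a mistake on get nonzero coefficients, so the hypothesis depends on a subset of the data, as in an SVM.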

15
Dual version of perceptron ranking
α_{i,j}, where i ranges over the examples and j over the correct/incorrect tag sequences
16
NER features for re-ranking MAXENT tagger output
17
NER features
18
NER results
19
Altun et al. paper
  • Starting point: the dual version of Collins's perceptron algorithm
  • the final hypothesis is a weighted sum of inner products with a subset of the examples
  • this is a lot like an SVM, except that the perceptron algorithm, rather than quadratic optimization, is used to set the weights

20
SVM optimization
  • Notation
  • y_i is the correct tag sequence for x_i
  • y is an incorrect tag sequence
  • F(x_i, y_i) are the features
  • Optimization problem:
  • find weights w on the examples that maximize the minimal margin, constraining ||w|| = 1, or
  • minimize ||w||^2 such that every margin ≥ 1
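Written out in the slide's notation, the two formulations are (a reconstruction from the wording above, so treat the exact constants as an assumption):

```latex
% (a) maximize the minimal margin with normalized weights:
\max_{\|w\| = 1}\;\min_{i,\; y \neq y_i}\;
  \bigl\langle w,\; F(x_i, y_i) - F(x_i, y) \bigr\rangle
% (b) equivalently, minimize the norm subject to unit margins:
\min_{w}\; \tfrac{1}{2}\|w\|^2
  \quad\text{s.t.}\quad
  \bigl\langle w,\; F(x_i, y_i) - F(x_i, y) \bigr\rangle \ge 1
  \quad \forall i,\; \forall y \neq y_i
```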

21
SVMs for ranking
22
SVMs for ranking
Proposition: (14) and (15) are equivalent.
23
SVMs for ranking
A binary classification problem, with (x_i, y_i) the positive example and each (x_i, y) a negative example, except that θ_i varies per example. Why? Because we're ranking.
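In symbols, the reduction reads as follows (reconstructed from the slide's wording; the per-example threshold θ_i is the slide's notation):

```latex
% Each difference vector is a "positive example" that must clear a
% per-example threshold \theta_i:
\bigl\langle w,\; F(x_i, y_i) - F(x_i, y) \bigr\rangle \;\ge\; \theta_i
  \qquad \forall\, y \neq y_i
% \theta_i varies with i because the task is ranking, not plain binary
% classification with a single shared margin.
```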
24
SVMs for ranking
  • Altun et al.'s work gives the remaining details
  • As in perceptron learning, negative data are found by running Viterbi with the learned weights and looking for errors
  • Each mistake is a possible new support vector
  • Need to iterate over the data repeatedly
  • Could take exponential time to converge if the support vectors are dense...
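The loop described above can be sketched as follows; `decode` (a Viterbi decoder under the current model) and `solve_qp` (a quadratic-program re-fit on the working set) are assumed placeholder helpers, not Altun et al.'s actual implementation:

```python
# Working-set sketch: decode with current weights, add margin violations
# as support-vector candidates, re-optimize, and repeat.

def train_working_set(data, decode, solve_qp, max_rounds=20):
    """data: list of (x, gold_y); decode(x, model) returns the best-scoring
    y under the current model; solve_qp re-fits on the working set.
    Both helpers are assumptions, not real library calls."""
    working_set = []                 # (x, gold_y, wrong_y) triples
    model = solve_qp(working_set)
    for _ in range(max_rounds):
        new_violations = 0
        for x, gold in data:
            y_hat = decode(x, model)
            if y_hat != gold and (x, gold, y_hat) not in working_set:
                working_set.append((x, gold, y_hat))   # candidate SV
                new_violations += 1
        if new_violations == 0:
            break                    # no margin violations left
        model = solve_qp(working_set)
    return model, working_set
```

Each round can add new support vectors, which is why convergence may require many passes over the data.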

25
Altun et al. results
  • NER on 300 sentences from the CoNLL-2002 shared task (Spanish)
  • Four entity types, nine labels (beginning-T, intermediate-T, other)
  • POS tagging on 300 sentences from the Penn Treebank
  • 5-fold cross-validation, window of size 3, simple features

26
Altun et al. results
27
Altun et al. results