Structured Prediction, Dual Extragradient

Slides: 33
Title: Structured Prediction, Dual Extragradient


1
Structured Prediction, Dual Extragradient and Bregman Projections
  • Ben Taskar
  • Simon Lacoste-Julien
  • Michael Jordan
  • UC Berkeley

2
Handwriting Recognition
x
y
brace
Sequential structure
3
Natural Language Parsing
x
y
The screen was a sea of red
Recursive structure
4
Object Segmentation
x
y
Spatial structure
5
Bilingual Word Alignment
En vertu de les nouvelles propositions , quel est le coût prévu de perception de les droits ?
x
y
What is the anticipated cost of collecting fees under the new proposal ?
What is the anticipated cost of collecting fees under the new proposal?
En vertu des nouvelles propositions, quel est le coût prévu de perception des droits?
Combinatorial structure
6
Linear Structured Models
scoring function
space of feasible outputs
  • Assumption
  • linear combination of features
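The formula on this slide did not survive extraction. A common way to write a linear structured model is sketched below; the symbols (w for the weights, f for the feature map, Y(x) for the feasible outputs) are assumed notation, not necessarily the slide's own.

```latex
h_{\mathbf{w}}(x) \;=\; \arg\max_{y \in \mathcal{Y}(x)} \; \mathbf{w}^{\top} \mathbf{f}(x, y)
```

Here the scoring function is the inner product of weights and features, and the argmax ranges over the space of feasible outputs.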

7
Chain Markov Net (aka CRF)
y
x
Lafferty et al. 01
8
Chain Markov Net (aka CRF)
y
x
Lafferty et al. 01
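For a chain model, the highest-scoring labeling can be found with the Viterbi recursion. The sketch below is a minimal illustration, assuming the score decomposes into per-position node scores and label-pair edge scores; the names `node_scores` and `edge_scores` are hypothetical and not taken from the slides.

```python
def viterbi(node_scores, edge_scores):
    """Return (best label sequence, its score) for a chain model.

    node_scores[t][a] : score of label a at position t
    edge_scores[a][b] : score of label a being followed by label b
    """
    n_labels = len(node_scores[0])
    # best[a] = best score of any prefix that ends in label a
    best = list(node_scores[0])
    backptrs = []
    for t in range(1, len(node_scores)):
        new_best, ptr = [], []
        for b in range(n_labels):
            prev = max(range(n_labels), key=lambda a: best[a] + edge_scores[a][b])
            ptr.append(prev)
            new_best.append(best[prev] + edge_scores[prev][b] + node_scores[t][b])
        best = new_best
        backptrs.append(ptr)
    # Trace back from the best final label.
    last = max(range(n_labels), key=lambda a: best[a])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, max(best)


# Toy inputs: two labels, three positions.
nodes = [[1, 0], [0, 1], [1, 0]]
no_edges = [[0, 0], [0, 0]]
sticky_edges = [[2, 0], [0, 2]]   # reward repeating the previous label
path_free, score_free = viterbi(nodes, no_edges)
path_sticky, score_sticky = viterbi(nodes, sticky_edges)
```

With no edge scores each position picks its own best label; with the "sticky" edge scores the chain prefers a constant labeling.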
9
CFG Parsing
(NP → DT NN) (PP → IN NP) (NN → sea)
10
Bilingual Word Alignment
En vertu de les nouvelles propositions , quel
est le coût prévu de perception de les
droits ?
What is the anticipated cost of collecting fees
under the new proposal ?
  • association
  • position
  • orthography

k
j
11
Ising Models: Min Cuts
Point features: spin image
1
0
Edge features: length, angle
  • Find max y via min-cut if edge scores are
    non-negative
  • Restrict edge features and
    weights to be non-negative

12
Linear Structured Models
scoring function
space of feasible outputs
  • Assumptions
  • linear combination of features
  • sum of part scores

13
Learning w
  • Training examples
  • Probabilistic approach
  • Computing Z_w(x) can be #P-complete
  • Tractable models but intractable estimation
  • Large margin approach
  • Exact and efficient when prediction is tractable
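For reference, the probabilistic approach is usually written as a log-linear model whose normalizer sums over every feasible output. The notation below is an assumption consistent with the Z_w(x) named on this slide, not a reconstruction of the slide's own formula.

```latex
P_{\mathbf{w}}(y \mid x) \;=\; \frac{\exp\!\big(\mathbf{w}^{\top}\mathbf{f}(x, y)\big)}{Z_{\mathbf{w}}(x)},
\qquad
Z_{\mathbf{w}}(x) \;=\; \sum_{y' \in \mathcal{Y}(x)} \exp\!\big(\mathbf{w}^{\top}\mathbf{f}(x, y')\big)
```

The sum in Z_w(x) is what makes estimation hard even when finding the single best output is tractable, for example for matchings.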

14
Alignment Example: Loss
  • We want
  • Structured Loss
  • Precision, Recall, F1, Hamming

[Figure: candidate alignments between "What is the" and "Quel est le", with losses 0, 1, 2, 2]
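The loss measures named on this slide can be illustrated on alignments represented as sets of edges. The sketch below is one possible convention, assuming Hamming loss counts edges present in exactly one of the two alignments; the edge sets are hypothetical and are not the ones from the slide's figure.

```python
def hamming_loss(gold_edges, predicted_edges):
    """Number of edges that appear in exactly one of the two alignments."""
    return len(set(gold_edges) ^ set(predicted_edges))


def precision_recall_f1(gold_edges, predicted_edges):
    """Edge-level precision, recall and F1 of a predicted alignment."""
    gold, pred = set(gold_edges), set(predicted_edges)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if correct else 0.0
    return precision, recall, f1


# Edges are (English position, French position) pairs; hypothetical data.
gold = {(0, 0), (1, 1), (2, 2)}   # What-Quel, is-est, the-le
pred = {(0, 0), (1, 2), (2, 1)}   # one correct edge, two swapped
loss = hamming_loss(gold, pred)
precision, recall, f1 = precision_recall_f1(gold, pred)
```

Hamming loss decomposes over edges, which is what later allows it to be folded into the inference problem.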
15
Alignment Example: Constraints
  • We want
  • Equivalently

[Figure: the true alignment between "What is the" and "Quel est le" (word positions 1 2 3) compared against alternative alignments]
Exponential number of constraints
16
Large Margin Estimation
  • Approximation: constraint generation/sampling
    [Collins 02; Altun 03; Tsochantaridis 04; Joachims 05]
  • Alternative approach
  • Hinge loss
  • Min-max formulation

17
Alternatives: Constraint Generation
  • Add most violated constraint
  • Handles more general loss functions
  • Only polynomial # of constraints needed
  • Need to re-solve QP many times
  • Worst case # of constraints larger than factored

[Collins 02; Altun et al. 03; Tsochantaridis et al. 04]
18
Min-max Formulation
Structured loss (Hamming)
Inference
LP Inference
Key step
discrete optim.
continuous optim.
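The formulas on this slide were lost in extraction. One standard way to write the min-max objective and the key step is sketched below. The notation (f_i for the features of example i, ℓ_i for its loss, and F_i, c_i, Z_i for the LP encoding of features, loss and feasible set) is assumed, and the equality presumes a decomposable loss such as Hamming and an inference LP with integral solutions.

```latex
\begin{align*}
&\min_{\mathbf{w}} \;\; \tfrac{1}{2}\|\mathbf{w}\|^{2}
  + C \sum_{i} \Big[ \max_{y \in \mathcal{Y}_i}
      \big( \mathbf{w}^{\top}\mathbf{f}_i(y) + \ell_i(y) \big)
      - \mathbf{w}^{\top}\mathbf{f}_i(y_i) \Big] \\[4pt]
&\max_{y \in \mathcal{Y}_i} \big( \mathbf{w}^{\top}\mathbf{f}_i(y) + \ell_i(y) \big)
  \;=\; \max_{\mathbf{z}_i \in \mathcal{Z}_i}
      \big( \mathbf{w}^{\top}\mathbf{F}_i + \mathbf{c}_i^{\top} \big)\, \mathbf{z}_i
\end{align*}
```

The left side of the second line is a discrete optimization over outputs; the right side is a continuous linear program over relaxed indicator variables z_i.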
19
y → z Map for Markov Nets
20
Markov Net Inference LP
normalization
agreement
Has integral solutions z for chains, trees
Can be fractional for untriangulated networks
21
Matching Inference LP
En vertu de les nouvelles propositions , quel
est le coût prévu de perception de les
droits ?
k
What is the anticipated cost of collecting fees
under the new proposal ?
degree
j
Has integral solutions z
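The LP itself is missing from the transcript. A standard form of the bipartite matching LP with degree constraints is sketched below; s_jk denotes the score of the edge between English word j and French word k, and all symbols are assumed notation.

```latex
\begin{align*}
\max_{\mathbf{z}} \quad & \sum_{j,k} s_{jk}\, z_{jk} \\
\text{s.t.} \quad & \sum_{j} z_{jk} \le 1 \quad \forall k, \qquad
                    \sum_{k} z_{jk} \le 1 \quad \forall j, \qquad
                    0 \le z_{jk} \le 1
\end{align*}
```

The constraint matrix of a bipartite graph is totally unimodular, so the vertices of this polytope are integral, which is why relaxing z from binary to the interval [0, 1] loses nothing.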
22
Saddle-point Problem
23
First Try: Projected Gradient
Euclidean projection
Can oscillate!
no convergence guarantee
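The oscillation can be seen on a toy saddle-point problem. The sketch below is an illustration under assumed settings, not the slide's own example: it runs simultaneous projected gradient steps on L(w, z) = w * z over the box [-1, 1]^2, whose saddle point is (0, 0). The step size and starting point are arbitrary choices.

```python
import math


def clip(v, lo=-1.0, hi=1.0):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def projected_gradient(w, z, eta=0.1, steps=500):
    """Simultaneous projected gradient steps on L(w, z) = w * z.

    w takes a descent step (dL/dw = z); z takes an ascent step (dL/dz = w).
    """
    for _ in range(steps):
        w, z = clip(w - eta * z), clip(z + eta * w)
    return w, z


w0, z0 = 0.5, 0.5
w, z = projected_gradient(w0, z0)
start_distance = math.hypot(w0, z0)   # distance from the saddle point (0, 0)
final_distance = math.hypot(w, z)
```

Each unprojected step rotates the iterate and multiplies its norm by sqrt(1 + eta^2), so the iterates spiral away from the saddle point and end up circling near the boundary of the box instead of converging.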
24
Dual Extragradient for Structured Prediction
State -- cumulative gradient
Start
Prediction
Correction
Cumulative gradient update
Output
O(1/ε) convergence rate
[Nesterov 03]
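The steps listed on this slide can be sketched on a toy problem. The code below is a minimal illustration, not the authors' implementation: it applies a dual extragradient iteration to L(w, z) = w * z over the box [-1, 1]^2, with the step size set to 1 because the gradient operator of this toy problem is 1-Lipschitz. The function names and the starting point are hypothetical.

```python
def clip(v, lo=-1.0, hi=1.0):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def project(u):
    """Euclidean projection onto the box [-1, 1]^2."""
    return (clip(u[0]), clip(u[1]))


def operator(u):
    """Gradient operator of L(w, z) = w * z: (dL/dw, -dL/dz)."""
    w, z = u
    return (z, -w)


def dual_extragradient(u_hat, eta=1.0, iterations=2000):
    s = (0.0, 0.0)        # state: cumulative (negated) gradient
    total = (0.0, 0.0)    # running sum of corrected iterates
    for _ in range(iterations):
        # Prediction: project the start point shifted by the cumulative gradient.
        v = project((u_hat[0] + eta * s[0], u_hat[1] + eta * s[1]))
        g = operator(v)
        # Correction: step from v using the gradient evaluated at v.
        u = project((v[0] - eta * g[0], v[1] - eta * g[1]))
        g = operator(u)
        # Cumulative gradient update, using the gradient at the corrected point.
        s = (s[0] - g[0], s[1] - g[1])
        total = (total[0] + u[0], total[1] + u[1])
    # Output: average of the corrected iterates.
    return (total[0] / iterations, total[1] / iterations)


w_bar, z_bar = dual_extragradient((0.5, 0.5))
gap = abs(w_bar) + abs(z_bar)   # duality gap of L(w, z) = w * z on this box
```

The individual iterates still cycle around the box, but their average approaches the saddle point (0, 0), which is the quantity the convergence rate refers to.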
25
Projection for Bipartite Matchings: Min Cost Flow
t
s
  • All capacities 1
  • Min-cost quadratic flow computes projection
  • O(N^3) complexity for fixed precision (N = number of
    nodes)
  • Well-studied problem, free code [Guerreiro &
    Tseng 02]
  • See paper for flow-reduction for min-cuts

26
Non-Euclidean Dual Extragradient
d(·, ·): Bregman divergence
Prediction
Correction
Cumulative dual gradient update
Output
27
Bregman Divergence Updates
  • Squared distance → Euclidean projections
  • KL-distance → Multiplicative update +
    normalization
  • In case of sequences and trees, can be computed via
    forward-backward and inside-outside
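The KL case can be illustrated on a single probability simplex. The sketch below assumes the step maximizes eta * <gradient, z'> - KL(z' || z) over distributions z', whose closed-form solution is a multiplicative update followed by normalization. The function name and the numbers are hypothetical.

```python
import math


def kl_proximal_step(z, gradient, eta):
    """Solve max over the simplex of eta * <gradient, z'> - KL(z' || z).

    The solution is z'_i proportional to z_i * exp(eta * gradient_i).
    """
    unnormalized = [zi * math.exp(eta * gi) for zi, gi in zip(z, gradient)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


z = [1 / 3, 1 / 3, 1 / 3]
z_new = kl_proximal_step(z, gradient=[1.0, 0.0, -1.0], eta=0.5)
```

For a single simplex the normalizer is a plain sum. For sequences and trees the same normalization ranges over exponentially many outputs, which is where forward-backward and inside-outside come in.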

28
Memory-efficient Version
  • Required memory
  • -- proportional to number of parameters
  • -- proportional to number/size of
    examples
  • Luckily, we don't need to maintain the latter explicitly
  • Sufficient to maintain a state with memory proportional
    to number of parameters
  • Similar trick works in Exponentiated Gradient
    [Bartlett 04], which requires decomposable models

29
Experiments
  • Word Alignment
  • Training data
  • 5000 sentences
  • 555K edges
  • Object Segmentation
  • Training data
  • 5 scenes
  • 37K nodes
  • 88K edges
  • Compare to averaged perceptron

30
(Averaged) Perceptron
  • Perceptron for structured output [Collins 2002]
  • For each example:
  • Predict
  • Update
  • Output averaged parameters
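The predict/update/average loop above can be sketched as follows. This is a toy illustration, not the experimental setup from the slides: the feature map counts (observation, label) pairs, prediction is a brute-force argmax over all label sequences, and the data are made up.

```python
from itertools import product


def features(x, y, n_labels, n_obs):
    """Counts of (observation, label) pairs; a hypothetical feature map."""
    f = [0.0] * (n_obs * n_labels)
    for obs, label in zip(x, y):
        f[obs * n_labels + label] += 1.0
    return f


def score(w, x, y, n_labels, n_obs):
    return sum(wi * fi for wi, fi in zip(w, features(x, y, n_labels, n_obs)))


def predict(w, x, n_labels, n_obs):
    """Brute-force argmax over all label sequences (toy inputs only)."""
    candidates = product(range(n_labels), repeat=len(x))
    return max(candidates, key=lambda y: score(w, x, y, n_labels, n_obs))


def averaged_perceptron(examples, n_labels, n_obs, epochs=5):
    w = [0.0] * (n_obs * n_labels)
    w_sum = [0.0] * len(w)
    steps = 0
    for _ in range(epochs):
        for x, y in examples:
            y_hat = predict(w, x, n_labels, n_obs)          # Predict
            if tuple(y_hat) != tuple(y):                    # Update on mistakes
                f_gold = features(x, y, n_labels, n_obs)
                f_hat = features(x, y_hat, n_labels, n_obs)
                w = [wi + g - h for wi, g, h in zip(w, f_gold, f_hat)]
            w_sum = [a + b for a, b in zip(w_sum, w)]
            steps += 1
    return [a / steps for a in w_sum]                       # Averaged parameters


# Toy data: observation 0 should get label 1, observation 1 should get label 2.
examples = [((0, 1, 0), (1, 2, 1)), ((1, 1, 0), (2, 2, 1))]
w_avg = averaged_perceptron(examples, n_labels=3, n_obs=2)
unseen = predict(w_avg, (1, 0), n_labels=3, n_obs=2)
```

In a real structured model the brute-force `predict` would be replaced by the model's own inference routine, such as Viterbi, a matching solver or a min-cut.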

31
Matchings
32
Min-cuts
33
Conclusion
  • General technique for structured large-margin
    estimation
  • Exact, compact, convex formulations
  • Allow efficient use of kernels
  • Tractable when other estimation methods are not
  • Memory efficient learning algorithms
  • See http://www.cs.berkeley.edu/taskar for paper