Title: Structured Prediction, Dual Extragradient and Bregman Projections
1. Structured Prediction, Dual Extragradient and Bregman Projections
- Ben Taskar
- Simon Lacoste-Julien
- Michael Jordan
- UC Berkeley
2. Handwriting Recognition
[Figure: input x is an image of the handwritten word "brace"; output y is the letter sequence]
Sequential structure
3. Natural Language Parsing
[Figure: input x is the sentence "The screen was a sea of red"; output y is its parse tree]
Recursive structure
4. Object Segmentation
[Figure: input x is a scanned scene; output y is its segmentation into object classes]
Spatial structure
5. Bilingual Word Alignment
[Figure: input x is the sentence pair "En vertu des nouvelles propositions, quel est le coût prévu de perception des droits ?" / "What is the anticipated cost of collecting fees under the new proposal?"; output y is the word alignment between them]
Combinatorial structure
6. Linear Structured Models
- Scoring function s_w(x, y) over the space of feasible outputs Y(x)
- Assumption:
- the score is a linear combination of features: s_w(x, y) = w^T f(x, y)
7. Chain Markov Net (aka CRF)
[Figure: chain-structured Markov network over the label sequence y with observations x] [Lafferty et al. 01]
9. CFG Parsing
(NP → DT NN) (PP → IN NP) (NN → sea)
10. Bilingual Word Alignment
[Figure: candidate alignment edges between English word j and French word k for the sentence pair of slide 5]
- Features for each candidate edge (j, k):
- association
- position
- orthography
11. Ising Models and Min Cuts
[Figure: binary (0/1) labeling of scan points]
- Point features: spin image
- Edge features: length, angle
- Find max y via min-cut if edge scores are non-negative
- Restrict edge features and weights accordingly
12. Linear Structured Models
- Scoring function s_w(x, y) over the space of feasible outputs Y(x)
- Assumptions:
- the score is a linear combination of features: s_w(x, y) = w^T f(x, y)
- the features decompose as a sum of part scores: f(x, y) = sum_p f_p(x, y_p) (see the sketch below)
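To make the two assumptions concrete, here is a minimal Python sketch (the helper name part_features and the chain example in the comment are illustrative, not from the slides) of a score that is linear in w and decomposes over parts:

```python
import numpy as np

def score(w, part_features):
    """Linear structured score: the sum over parts of  w . f_p(x, y_p).

    part_features: list of per-part feature vectors, already evaluated at
    (x, y_p) -- e.g. one vector per node and per edge of a chain.
    """
    return sum(np.dot(w, f_p) for f_p in part_features)
```

For a length-n chain, the parts are the n nodes and the n-1 edges, so evaluating the score is linear in n once the part features are extracted.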
13. Learning w
- Training examples: pairs (x_i, y_i)
- Probabilistic approach (maximum conditional likelihood):
- Computing the partition function Z_w(x) can be #P-complete
- Tractable models, but intractable estimation
- Large margin approach:
- Exact and efficient when prediction is tractable
14. Alignment Example: Loss
- We want a loss l(y, y') that measures how far a predicted alignment y' is from the true alignment y
- Structured loss options:
- Precision, Recall, F1, Hamming (Hamming sketched below)
[Figure: candidate alignments of "What is the" / "Quel est le" with Hamming losses 0, 1, 2, 2]
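A minimal sketch of the Hamming-style loss for alignments, assuming an alignment is represented as a set of (j, k) index pairs (a representation chosen here for illustration):

```python
def hamming_loss(true_edges, pred_edges):
    """Number of alignment edges the two alignments disagree on: edges the
    prediction misses plus edges it adds incorrectly (symmetric difference)."""
    return len(set(true_edges) ^ set(pred_edges))

# Example: if the true alignment of "What is the" / "Quel est le" is
# {(0, 0), (1, 1), (2, 2)}, predicting {(0, 0), (1, 1), (2, 0)} gives loss 2.
print(hamming_loss({(0, 0), (1, 1), (2, 2)}, {(0, 0), (1, 1), (2, 0)}))
```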
15. Alignment Example: Constraints
[Figure: the true alignment of "What is the" / "Quel est le" compared against alternative candidate alignments]
- Exponential number of constraints
16. Large Margin Estimation
- Approximation: constraint generation / sampling [Collins 02; Altun et al. 03; Tsochantaridis et al. 04; Joachims 05]
- Alternative approach:
- Hinge loss
- Min-max formulation (reconstructed below)
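A hedged reconstruction of the hinge-loss / min-max estimation problem, in the notation of the accompanying paper (m training pairs, feature map f, loss ℓ, norm-ball regularization):

```latex
% Large-margin estimation: the true output should outscore every alternative
% by a margin that grows with the loss; equivalently, minimize the structured
% hinge loss over a norm ball:
\min_{\|\mathbf{w}\|_2 \le \gamma} \;
  \sum_{i=1}^{m} \Big[
    \max_{y' \in \mathcal{Y}(x_i)}
      \big( \mathbf{w}^\top \mathbf{f}(x_i, y') + \ell(y_i, y') \big)
    \;-\; \mathbf{w}^\top \mathbf{f}(x_i, y_i) \Big]
% The inner maximization (loss-augmented inference) is the "min-max" part:
% it is rewritten as an LP on slide 18.
```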
17. Alternatives: Constraint Generation
- Add the most violated constraint at each iteration (sketched below)
- Handles more general loss functions
- Only a polynomial number of constraints needed
- But: need to re-solve the QP many times
- Worst-case number of constraints is larger than in the factored formulation
[Collins 02; Altun et al. 03; Tsochantaridis et al. 04]
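A minimal sketch of the "most violated constraint" step, done here by brute-force enumeration over an explicit candidate set (the cited methods use efficient loss-augmented inference instead; feat, loss, and candidates are user-supplied placeholders):

```python
import numpy as np

def most_violated_constraint(w, x_i, y_i, candidates, feat, loss):
    """Loss-augmented inference by brute force: return the output y' that
    maximizes  w . feat(x_i, y') + loss(y_i, y'), i.e. the margin constraint
    currently violated the most, to be added to the working set of the QP."""
    return max(candidates,
               key=lambda y: float(np.dot(w, feat(x_i, y))) + loss(y_i, y))
```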
18. Min-max Formulation
- Structured loss (Hamming)
- Key step: replace inference over discrete outputs y (discrete optimization) with LP inference over relaxed variables z (continuous optimization), reconstructed below
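A hedged reconstruction of the key step, using the y → z indicator map of the next slide (F_i collects the part features of example i, c_i is the linear encoding of the Hamming loss):

```latex
% Loss-augmented inference as a linear program over a relaxation Z_i whose
% vertices are exactly the valid discrete outputs:
\max_{y' \in \mathcal{Y}(x_i)}
  \Big[ \mathbf{w}^\top \mathbf{f}(x_i, y') + \ell(y_i, y') \Big]
  \;=\;
\max_{\mathbf{z}_i \in \mathcal{Z}_i}
  \big( F_i^\top \mathbf{w} + \mathbf{c}_i \big)^\top \mathbf{z}_i
  \;+\; \text{const}
% so the discrete optimization over y' becomes a continuous optimization over z_i.
```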
19. y → z Map for Markov Nets
20. Markov Net Inference LP
- Constraints: normalization (node marginals sum to one) and agreement (edge marginals consistent with node marginals); see the reconstruction below
- Has integral solutions z for chains and trees
- Can be fractional for untriangulated networks
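A hedged reconstruction of the LP behind the "normalization" and "agreement" labels, with node indicators z_j(α), edge indicators z_jk(α, β), and node/edge scores θ:

```latex
\max_{\mathbf{z} \ge 0} \;
  \sum_{j,\alpha} \theta_j(\alpha)\, z_j(\alpha)
  + \sum_{(j,k),\,\alpha,\beta} \theta_{jk}(\alpha,\beta)\, z_{jk}(\alpha,\beta)
\quad \text{s.t.} \quad
  \sum_{\alpha} z_j(\alpha) = 1 \;\;\forall j
  \;\;\text{(normalization)}, \qquad
  \sum_{\beta} z_{jk}(\alpha,\beta) = z_j(\alpha) \;\;\forall (j,k),\,\alpha
  \;\;\text{(agreement)}
```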
21. Matching Inference LP
[Figure: alignment variables z_jk between English word j and French word k for the sentence pair of slide 5, with degree constraints on each word]
- Constraints: degree (each word participates in at most one alignment edge); see the reconstruction below
- Has integral solutions z (the degree-constraint matrix is totally unimodular)
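A hedged reconstruction of the matching LP, with z_jk indicating that English word j is aligned to French word k and s_jk the learned edge score:

```latex
\max_{0 \le \mathbf{z} \le 1} \;
  \sum_{j,k} s_{jk}\, z_{jk}
\quad \text{s.t.} \quad
  \sum_{k} z_{jk} \le 1 \;\;\forall j, \qquad
  \sum_{j} z_{jk} \le 1 \;\;\forall k
  \quad \text{(degree constraints)}
% Total unimodularity of the degree constraints makes every vertex of this
% polytope, and hence every LP optimum, integral.
```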
22. Saddle-point Problem
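The slide's formula did not survive extraction; a hedged reconstruction from the pieces above (norm-ball constraint on w, one relaxation polytope Z_i per training example):

```latex
\min_{\|\mathbf{w}\|_2 \le \gamma} \;
\max_{\mathbf{z}_i \in \mathcal{Z}_i,\; i=1,\dots,m} \;
  \sum_{i=1}^{m} \Big[
    \mathbf{w}^\top F_i \mathbf{z}_i + \mathbf{c}_i^\top \mathbf{z}_i
    - \mathbf{w}^\top \mathbf{f}(x_i, y_i) \Big]
% A bilinear saddle-point problem over the product of a norm ball and polytopes:
% this is the object the (dual) extragradient methods below solve.
```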
23. First Try: Projected Gradient
- Gradient steps on w and z with Euclidean projection onto the feasible sets
- Can oscillate! No convergence guarantee (toy illustration below)
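A toy illustration (not from the slides) of the oscillation: plain simultaneous gradient descent-ascent on the bilinear saddle point min_u max_v uv spirals away from the solution (0, 0), and projecting onto a large enough feasible set does not fix this:

```python
import numpy as np

# Plain gradient descent-ascent on  min_u max_v  u*v  (saddle point at (0, 0)).
u, v, eta = 1.0, 1.0, 0.1
for _ in range(100):
    gu, gv = v, u                       # d(uv)/du = v,  d(uv)/dv = u
    u, v = u - eta * gu, v + eta * gv   # descent in u, ascent in v
print(np.hypot(u, v))                   # distance from (0, 0) has grown (> sqrt(2))
```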
24. Dual Extragradient for Structured Prediction
- State: cumulative gradient
- Start (initialize the cumulative gradient to zero)
- Prediction
- Correction
- Cumulative gradient update
- Output: averaged iterates
- O(1/ε) convergence rate [Nesterov 03]
(sketched below)
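A minimal sketch of the dual extragradient scheme on the same toy bilinear problem (the box projection stands in for the actual projections onto the norm ball for w and the polytopes Z_i; step size and iteration count are illustrative):

```python
import numpy as np

def project_box(u, lo=-1.0, hi=1.0):
    """Euclidean projection onto a box (stand-in for the real feasible sets)."""
    return np.clip(u, lo, hi)

def dual_extragradient(F, u_hat, eta, num_iters, project):
    """Dual extragradient sketch: prediction, correction, cumulative gradient
    update; the output is the average of the correction iterates."""
    s = np.zeros_like(u_hat)              # state: cumulative gradient
    avg = np.zeros_like(u_hat)
    for t in range(num_iters):
        v = project(u_hat + eta * s)      # prediction
        u = project(v - eta * F(v))       # correction
        s = s - F(u)                      # cumulative gradient update
        avg += (u - avg) / (t + 1)        # running average of the iterates
    return avg                            # output

# Toy bilinear saddle point min_u max_v u*v on the box [-1, 1]^2; the gradient
# operator pairs the descent direction in u with the ascent direction in v.
F = lambda uv: np.array([uv[1], -uv[0]])
print(dual_extragradient(F, np.array([1.0, 1.0]), eta=0.5,
                         num_iters=1000, project=project_box))
# -> close to the saddle point (0, 0), unlike plain gradient descent-ascent
```

In contrast to the previous sketch, the averaged iterates approach the saddle point, matching the O(1/ε) guarantee.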
25. Projection for Bipartite Matchings: Min Cost Flow
[Figure: flow network with source s and sink t, one node per word on each side]
- All capacities 1
- Min-cost quadratic flow computes the projection
- O(N^3) complexity for fixed precision (N = number of nodes)
- Well-studied problem, free code available [Guerreiro & Tseng 02]
- See the paper for the flow reduction for min-cuts
26. Non-Euclidean Dual Extragradient
- d(·, ·): a Bregman divergence replaces the squared Euclidean distance in the projections
- Same scheme: prediction, correction, cumulative dual gradient update, output
27. Bregman Divergence Updates
- Squared distance → Euclidean projections
- KL divergence → multiplicative update + normalization (sketched below)
- For sequences and trees, the updates can be computed via forward-backward and inside-outside
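A minimal sketch of the KL case for a single distribution (a node marginal over three labels in this hypothetical usage); for sequences and trees the slides note the same update is carried out implicitly via forward-backward / inside-outside:

```python
import numpy as np

def kl_prox_step(p, grad, eta):
    """KL (entropic) analogue of a projected gradient step on the simplex:
    a multiplicative update followed by normalization."""
    q = p * np.exp(-eta * grad)   # multiplicative update
    return q / q.sum()            # normalization

# Hypothetical usage: one step on a uniform 3-class marginal.
p = np.full(3, 1.0 / 3.0)
print(kl_prox_step(p, grad=np.array([0.5, -0.2, 0.0]), eta=1.0))
```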
28. Memory-efficient Version
- Required memory:
- for the parameter side (w): proportional to the number of parameters
- for the example side (z): proportional to the number and size of the examples
- Luckily, we don't need to maintain the z iterates explicitly
- Sufficient to maintain state with memory proportional to the number of parameters
- A similar trick works in Exponentiated Gradient [Bartlett et al. 04], which requires decomposable models
29. Experiments
- Word Alignment
- Training data
- 5000 sentences
- 555K edges
- Object Segmentation
- Training data
- 5 scenes
- 37K nodes
- 88K edges
- Compare to averaged perceptron
30. (Averaged) Perceptron
- Perceptron for structured output [Collins 02]
- For each example (x_i, y_i):
- Predict: y_hat = argmax_y w·f(x_i, y)
- Update: w ← w + f(x_i, y_i) − f(x_i, y_hat)
- Output averaged parameters (sketched below)
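A minimal sketch of this baseline; feat (the feature map) and predict (the argmax inference routine) are placeholders the slides leave implicit, and dim is the number of parameters:

```python
import numpy as np

def averaged_structured_perceptron(data, feat, predict, dim, num_epochs=10):
    """Structured perceptron [Collins 02] with parameter averaging.

    data:    list of (x, y) training pairs
    feat:    feat(x, y) -> feature vector of length dim
    predict: predict(w, x) -> argmax_y  w . feat(x, y)   (inference routine)
    """
    w = np.zeros(dim)
    w_sum = np.zeros(dim)
    n_steps = 0
    for _ in range(num_epochs):
        for x, y in data:
            y_hat = predict(w, x)                 # Predict
            if y_hat != y:
                w += feat(x, y) - feat(x, y_hat)  # Update on mistakes
            w_sum += w
            n_steps += 1
    return w_sum / n_steps                        # Output averaged parameters
```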
31. Matchings
32. Min-cuts
33. Conclusion
- General technique for structured large-margin estimation
- Exact, compact, convex formulations
- Allow efficient use of kernels
- Tractable when other estimation methods are not
- Memory-efficient learning algorithms
- See http://www.cs.berkeley.edu/~taskar for the paper