Title: Inversion Transduction Grammar with Linguistic Constraints
1. Inversion Transduction Grammar with Linguistic Constraints
- Colin Cherry
- University of Alberta
2. Edmonton Weather (Tuesday)
3. Outline
- Bitext and Bitext Parsing
- Inversion Transduction Grammar (ITG)
- ITG with Linguistic Constraints
- Discriminative ITG with Linguistic Features
- Other Projects
4. Statistical Machine Translation
- Input: a source-language sentence E
- Goal: produce a well-formed target-language sentence F with the same meaning as E
- Process: decoding searches for an operation sequence O that transforms E into F
- Weights on individual operations are determined empirically from examples of translation
5. Bitext
- Text in English; the same text in French
- A valuable resource for training and testing statistical machine translation systems
- Large-scale examples of translation
- Needs analysis to determine the small-scale operations that generalize to unseen sentences
6. Word Alignment
- Given a sentence and its translation, find the word-to-word connections
[Figure: "the minister in charge of the Canadian Wheat Board" shown beside "le ministre chargé de la Commission Canadienne du blé"]
7. Word Alignment
- Given a sentence and its translation, find the word-to-word connections
- Link: a single word-to-word connection
[Figure: the same sentence pair, with one word-to-word link drawn]
8. Given a Word Alignment
- Extract bilingual phrase pairs for phrasal SMT (Koehn et al. 2003)
- Add in a parse tree and:
  - Extract treelet pairs for dependency translation (Quirk et al. 2005)
  - Extract rules for a tree transducer (Galley et al. 2004)
- Other fun things:
  - Train monolingual paraphrasers (Quirk et al. 2004, Callison-Burch et al. 2005)
9. Bitext Parsing
- Assume a context-free grammar generates two languages at once
- Like joint models, but the position of the words in both languages is controlled by the grammar
10. Monolingual Parsing
- Non-terminals: S, NP, VP, V, Det, Adj, N, Adv
- Example production: NP → Adj N
- Terminals: he, always, verbs, the, adjective, noun
11. Another View
- S → NP VP
- VP → V NP
[Figure: parse tree for "he always verbs the adjective noun"]
12–18. Bitext Parsing is in 2D
[Figure, built up over seven slides: a 2D chart with English on one axis and French on the other; S splits into NP and VP, then V, NP, Adv, Det, Adj, N, until "he always verbs the adjective noun" is parsed simultaneously with "il verbe toujours le nom adjectif"]
19. Why Bitext Parsing?
- Established polynomial algorithms
- Flexible framework; easy to add information:
  - Parse given an alignment
  - Align given a parse (this work)
- Discoveries can be ported to parser-based decoders (Zens et al. 2004, Melamed 2004)
- Advances in parsing can be ported to word alignment
20. Outline
- Bitext and Bitext Parsing ✓
- Inversion Transduction Grammar (ITG)
- ITG with Linguistic Constraints
- Discriminative ITG with Linguistic Features
- Other Projects
21. Inversion Transduction Grammar
- Introduced by Wu (1997)
- Transduction: N → noun / nom
- Inversion:
  - NP → Det NP (straight)
  - NP → ⟨Adj N⟩ (inverted)
[Figure: the straight and inverted expansions of N → noun / nom]
22. Binary Bracketing
- A → A A
- A → ⟨A A⟩
- A → e/f
- No linguistic meaning to A
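The three bracketing rules can be read as a tiny interpreter: straight productions keep target-side order, inverted ones flip it. A minimal sketch (the tuple encoding of derivations and the example tree are mine, not from the talk; the sentence pair comes from the earlier 2D parsing slides):

```python
# A minimal sketch of the binary-bracketing ITG above.
# A derivation node is one of:
#   ("term", e, f)            -- A -> e/f
#   ("straight", left, right) -- A -> A A   (same order in both languages)
#   ("inverted", left, right) -- A -> <A A> (target order reversed)

def itg_yield(node):
    """Return (source_words, target_words) produced by an ITG derivation."""
    kind = node[0]
    if kind == "term":
        _, e, f = node
        return [e], [f]
    _, left, right = node
    le, lf = itg_yield(left)
    re, rf = itg_yield(right)
    if kind == "straight":
        return le + re, lf + rf
    return le + re, rf + lf  # inverted: flip only the target side

tree = ("straight",
        ("term", "the", "le"),
        ("inverted",
         ("term", "adjective", "adjectif"),
         ("term", "noun", "nom")))

src, tgt = itg_yield(tree)
print(src)  # ['the', 'adjective', 'noun']
print(tgt)  # ['le', 'nom', 'adjectif']
```

The single inverted node is what lets "adjective noun" come out as "nom adjectif" while the rest of the sentence stays in order.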
23. Tree Visualization
24. Pros and Cons of Bracketing
- Pros:
  - Language independent
  - Straightforward and fast
  - Symbols are minimally restrictive
- Cons:
  - The grammar is meaningless
  - The ITG constraint
25. ITG Constraint
[Figure: the phrases "are acceptable", "to the commission", "Mr Burton", "fully or in part" shown under two reorderings; one is reachable with ITG, while the inside-out reordering is ruled out]
26. Outline
- Bitext and Bitext Parsing ✓
- Inversion Transduction Grammar (ITG) ✓
- ITG with Linguistic Constraints
- Discriminative ITG with Linguistic Features
- Other Projects
27. Some Questions
- Those ITG constraints are kind of scary. How bad are they? Do they ever help?
- Can we inject some linguistics into this otherwise purely syntactic process?
- A linguistic grammar would limit the trees that can be built, and therefore limit the alignments
28. Alignment Spaces
- The set of feasible alignments for a sentence pair
- Described by how links interact
- If links don't interact, the problem loses its structure
- Should encourage competition between links (guidance)
- Should not eliminate correct alignments (expressiveness)
29. ITG Space
- Rules out inside-out alignments
- Limits how concepts can be re-ordered during translation
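The "inside-out" alignments ruled out here correspond to permutations containing the 2413 or 3142 pattern; everything else can be built with straight and inverted binary rules. A small sketch of a reachability test, using a block-merging reduction (my formulation for illustration, not an algorithm from the talk):

```python
# Sketch: is a target-position permutation reachable under ITG?
# Repeatedly merge adjacent blocks whose positions form a contiguous range;
# the permutation is ITG-reachable iff everything collapses to one block.

def itg_reachable(perm):
    blocks = [(p, p) for p in perm]  # (min, max) position of each block
    changed = True
    while changed and len(blocks) > 1:
        changed = False
        for i in range(len(blocks) - 1):
            lo = min(blocks[i][0], blocks[i + 1][0])
            hi = max(blocks[i][1], blocks[i + 1][1])
            size = (blocks[i][1] - blocks[i][0] + 1) + (blocks[i + 1][1] - blocks[i + 1][0] + 1)
            if hi - lo + 1 == size:  # together they cover a contiguous range
                blocks[i:i + 2] = [(lo, hi)]
                changed = True
                break
    return len(blocks) == 1

print(itg_reachable([2, 0, 3, 1]))  # False: the classic inside-out (3-1-4-2) case
print(itg_reachable([1, 0, 3, 2]))  # True: two inverted pairs under a straight root
```

The failing case is exactly the four-phrase reordering shown on the ITG-constraint slide.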
30. Permutation Space
- One-to-one: each word is in at most one link
- Allows any permutation of concepts
- Reduces to weighted maximum matching if each link can be scored independently
[Figure: "the tax causes unrest" aligned with "l'impôt cause le malaise"]
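The "reduces to weighted maximum matching" claim can be illustrated on this slide's sentence pair: with independently scored links, the best one-to-one alignment simply maximizes the sum of chosen scores. The association scores below are invented, and brute force over permutations stands in for a real matching algorithm at this toy size:

```python
# Sketch: one-to-one alignment as weighted maximum matching.
from itertools import permutations

english = ["the", "tax", "causes", "unrest"]
french = ["l'", "impôt", "cause", "le", "malaise"]

# score[i][j]: made-up association score between english[i] and french[j]
score = [
    [0.9, 0.1, 0.0, 0.6, 0.0],  # "the"
    [0.1, 0.8, 0.1, 0.0, 0.0],  # "tax"
    [0.0, 0.1, 0.9, 0.0, 0.1],  # "causes"
    [0.0, 0.0, 0.0, 0.1, 0.7],  # "unrest"
]

def best_matching(score):
    """Brute-force the matching that maximizes summed link scores."""
    n, m = len(score), len(score[0])
    best, best_links = float("-inf"), None
    for cols in permutations(range(m), n):  # each English word picks one French word
        total = sum(score[i][j] for i, j in enumerate(cols))
        if total > best:
            best, best_links = total, list(enumerate(cols))
    return best_links

print(best_matching(score))
# [(0, 0), (1, 1), (2, 2), (3, 4)]: the-l', tax-impôt, causes-cause, unrest-malaise
```

A real system would use a polynomial matching algorithm (e.g. the Hungarian method) instead of enumeration.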
31–32. Linguistic Source: Dependencies
- Tree structure defines dependencies between words
- Subtrees define contiguous phrases
[Figure: dependency tree over "the minister in charge of ..."]
33. Phrasal Cohesion
- Syntactic phrases in a tree tend to stay together after translation (Fox 2002)
- We can use this idea to constrain an alignment, given an English dependency tree
- Shown to improve alignment quality (Lin and Cherry 2003)
34–35. Example
[Figure: dependency tree over "the tax causes unrest", aligned with "l'impôt cause le malaise"; a candidate link joins "the" to "le"]
- We can rule out the link, even with no one-to-one violation
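The cohesion check behind this example can be sketched as follows. The dependency encoding and the link sets are my rendering of the slide's sentence pair; the test is that every subtree's image in the translation forms a block with no intruding link from outside the subtree:

```python
# Sketch of the phrasal-cohesion check on a dependency tree.

def subtree(deps, head):
    """Collect head plus all descendants. deps[i] is the head of word i (-1 = root)."""
    nodes = {head}
    added = True
    while added:
        added = False
        for child, h in enumerate(deps):
            if h in nodes and child not in nodes:
                nodes.add(child)
                added = True
    return nodes

def cohesive(deps, links):
    """links: (source_index, target_index) pairs. Cohesive iff every subtree's
    target positions form a block that no outside link intrudes on."""
    for head in range(len(deps)):
        inside = subtree(deps, head)
        tgt = [t for s, t in links if s in inside]
        if not tgt:
            continue
        lo, hi = min(tgt), max(tgt)
        for s, t in links:
            if s not in inside and lo <= t <= hi:
                return False
    return True

# "the tax causes unrest" vs. "l' impôt cause le malaise"
deps = [1, 2, -1, 2]  # the->tax, tax->causes, causes=root, unrest->causes
good = [(0, 0), (1, 1), (2, 2), (3, 4)]
bad = [(0, 3), (1, 1), (2, 2), (3, 4)]  # "the"-"le" splits the [the tax] subtree

print(cohesive(deps, good))  # True
print(cohesive(deps, bad))   # False
```

Note that the bad alignment is still one-to-one; only the cohesion constraint rules it out, which is the point of the example.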
36. ITG vs. Dependency
- Both limit movement with phrasal cohesion
- ITG: cohesive in some binary tree
- Dep: cohesive in the provided dependency tree
- Neither is a subspace of the other:
  - "the big red dog": Dep ✓, ITG ✗
  - "the dog ate it": Dep ✗, ITG ✓
37. D-ITG Space
- Forces ITG to maintain phrasal cohesion with a provided dependency tree
- Intersects the ITG and Dependency spaces
- Adds a linguistic dependency tree to ITG parsing
38. Chart Modification Solution
- Eliminate structures that allow "tax" to invert away from "the"
[Figure: dependency tree over "the tax causes unrest"]
39–40. Effect on Parser
[Figure, two steps: ITG chart cells over "the tax causes unrest" × "l'impôt cause le malaise"; the analysis grouping "causes" with "tax" away from "the" is marked ✗, while the analyses keeping "the tax" together are marked ✓]
41. Continuum of Constraints
- From least to most constrained: Unconstrained → Permutation → ITG → D-ITG
42. Experimental Setup
- English-French parliamentary debates
- 500-sentence labeled test set (Och and Ney, 2003)
- Dependency parses from Minipar
43. Guidance Test
- Does the space stop incorrect alignments?
- Use a weighted link score built from:
  - Bilingual correlations between words
  - The relative position of tokens
- Maximize summed link scores in all spaces, and check alignment error rate
- AER: combined precision and recall; lower is better
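AER, as defined by Och and Ney (2003), combines sure links S and possible links P (with S ⊆ P) against the proposed alignment A. A quick sketch of the arithmetic, with made-up link sets:

```python
# Sketch of alignment error rate (AER); lower is better.
# Link sets are invented to show the computation, not taken from the talk.

def aer(A, S, P):
    """AER = 1 - (|A∩S| + |A∩P|) / (|A| + |S|), with sure S ⊆ possible P."""
    A, S, P = set(A), set(S), set(P)
    return 1.0 - (len(A & S) + len(A & P)) / (len(A) + len(S))

S = {(0, 0), (1, 1), (2, 2)}       # sure links
P = S | {(3, 4)}                    # possible links include all sure links
A = {(0, 0), (1, 1), (3, 4), (3, 3)}  # proposed alignment

print(round(aer(A, S, P), 3))  # 0.286
```

A perfect alignment that covers all sure links and proposes nothing outside P gets an AER of 0.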
44. Guidance Results
45. Expressiveness Test
- Given a strong model, does the space hold us back?
- Use a cooked link score from the gold standard: only correct links are given positive scores
- The best space is the unconstrained space
- Maximize summed link scores in all spaces, and check recall
46. Expressiveness Results
47. Contributions
- Algorithmic: a method to inject ITG with linguistic constraints
- Experimental:
  - ITG constraints provide guidance, with virtually no loss in expressiveness (French-English)
  - Dependency cohesion constraints provide greater guidance, at the cost of some expressiveness
48. Outline
- Bitext and Bitext Parsing ✓
- Inversion Transduction Grammar (ITG) ✓
- ITG with Linguistic Constraints ✓
- Discriminative ITG with Linguistic Features
- Other Projects
49. Remaining Problems
- Dependency cohesion stops correct links: parse errors, paraphrase, exceptions
- Would like a soft constraint
- I'm not doing much learning: χ² competitive linking with an ITG search
50. Soft Constraint
- Invalid spans need not be disallowed
- Instead, the parser could incur a penalty
- Easy to incorporate the penalty into the DP
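A minimal sketch of the penalty idea: an analysis that violates cohesion keeps its chart item but pays a fixed cost, so it can still win when its base score is high enough. The penalty value and scoring function here are invented; in the talk's setting the penalty weight would be learned, not hand-set:

```python
# Sketch: soft cohesion constraint as a score penalty instead of pruning.

COHESION_PENALTY = 2.5  # illustrative; would be a learned weight

def span_score(base_score, violates_cohesion, penalty=COHESION_PENALTY):
    """Score a chart item; cohesion-violating spans survive but pay a price."""
    return base_score - (penalty if violates_cohesion else 0.0)

# A violating analysis can still beat a cohesive one if its evidence is strong:
print(span_score(10.0, True) > span_score(6.0, False))  # True: 7.5 > 6.0
print(span_score(5.0, True) > span_score(6.0, False))   # False: 2.5 < 6.0
```

This is exactly what turns the hard D-ITG constraint into a feature the learner can weigh against the link scores.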
51. ITG Learning
- Zhang and Gildea 2004, 2005, 2006:
  - Expectation Maximization to parameterize a stochastic grammar, unsupervised
  - Driven by an expensive 2D inside-outside algorithm
  - Not doing much better than I am with χ²
- Meanwhile, EMNLP 2005 is happening: Moore 2005, Taskar et al. 2005
- Suddenly it's okay to use some training data
52. Discriminative Matching (Taskar et al. 2005)
- Example link "causes" ↔ "cause":
  - Features: χ² 0.767, DIST 0.050, LCSR 0.833, HMM 0.0
  - Learned weights: 6.0, −0.9, 0.2, 2.0
  - Link score (dot product): 4.72
- Max matching finds the alignment that maximizes the sum of link scores
- The entire alignment y can be given a feature vector Φ(y) according to the features of the links in y
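The link score is just a dot product between a link's feature vector and the learned weights. The feature values below are the ones on the slide; the weights are my reading of the slide's digit-garbled numbers, so treat them as illustrative:

```python
# Sketch of a discriminative link score: features · learned weights.

features = {"chi2": 0.767, "DIST": 0.050, "LCSR": 0.833, "HMM": 0.0}
weights = {"chi2": 6.0, "DIST": -0.9, "LCSR": 0.2, "HMM": 2.0}  # assumed values

def link_score(features, weights):
    """Dot product of a link's feature vector with the weight vector."""
    return sum(weights[name] * value for name, value in features.items())

print(round(link_score(features, weights), 2))  # 4.72
```

Summing these scores over all links in an alignment gives the whole-alignment score w · Φ(y) that the matching algorithm maximizes.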
53. Learning Objective
- Find weights w such that, for each example i and every incorrect output y:
  w · Φ(yᵢ) ≥ w · Φ(y) + Δ(yᵢ, y)
  where Δ is the structured distance, Φ the features, and w the learned weights
- Can formulate as a constrained optimization problem and do max-margin training
- Problem: an exponential number of wrong answers
54. SVM Struct (Tsochantaridis et al. 2004)
- Constrained optimization over the weights w
- Start with an empty constraint set
- Repeat: search for the most violated constraint, add it to the accumulated constraints, and re-solve
- The theory of constraint generation in constrained optimization guarantees convergence
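The constraint-generation loop can be sketched on a toy three-class problem. Everything here is invented for illustration: the features, the 0/1 loss, and the crude subgradient update that stands in for SVM Struct's real QP solver. What it does preserve is the shape of the loop: find the most violated margin constraint, accumulate it, re-fit, and stop when no constraint is violated:

```python
# Toy sketch of cutting-plane constraint generation for a structured SVM.

def feats(y):  # feature vector for candidate output y (made up)
    return {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [0.7, 0.7]}[y]

def loss(y_true, y):  # structured loss, 0/1 here for simplicity
    return 0.0 if y == y_true else 1.0

def score(w, y):
    return sum(wi * fi for wi, fi in zip(w, feats(y)))

def most_violated(w, y_true):
    # argmax over wrong answers of loss-augmented score
    return max((y for y in (0, 1, 2) if y != y_true),
               key=lambda y: loss(y_true, y) + score(w, y))

def fit(y_true, rounds=50, lr=0.1):
    w = [0.0, 0.0]
    constraints = []  # start with no constraints
    for _ in range(rounds):
        y_bad = most_violated(w, y_true)
        margin = score(w, y_true) - score(w, y_bad)
        if margin >= loss(y_true, y_bad):  # nothing violated: converged
            break
        constraints.append(y_bad)
        for y in constraints:  # crude update toward satisfying all constraints
            for i, (ft, fb) in enumerate(zip(feats(y_true), feats(y))):
                w[i] += lr * (ft - fb)
    return w

w = fit(y_true=0)
print(score(w, 0) > score(w, 1) and score(w, 0) > score(w, 2))  # True
```

In the real algorithm the inner step solves a quadratic program over the accumulated constraints, which is what yields the max-margin guarantee.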
55. Similarities to Averaged Perceptron
- An online method driven by comparisons of the current output to the correct answer
- But it:
  - Allows a notion of structural distance
  - Returns a max-margin solution (with slacks) at each step
  - Remembers all of its past mistakes
56. SVM-ITG
- Can learn ITG parameters discriminatively
- Link productions A → e/f are scored as in discriminative matching
- Non-terminal productions A → A A, ⟨A A⟩ are scored with two features:
  - Is it inverted?
  - Does it cover a span that would usually be illegal?
[Figure: the "causes" ↔ "cause" link and its features, rewritten as the production A → causes / cause]
57. Experimental Setup
- Identical to Taskar et al.: 100 training, 37 development, 347 test sentence pairs
- Same unsupervised text as before to derive features: 50k Hansards data
58. Results
[Chart: bipartite matching SVM (Permutation) versus SVM weights with a hard constraint (D-ITG)]
59. Results
[Chart: bipartite matching SVM, SVM weights with a hard constraint, and the ITG SVM with a soft cohesion feature]
60. Contributions
- Algorithmic: a discriminative learning method for ITGs
- Experimental:
  - The value of hard constraints is reduced in the presence of a strong link score
  - Integrating the constraint as a feature during training can recover the value of the constraints, improving AER and recall
61. Other Projects
- Applying techniques from SMT to new domains:
  - Unsupervised pronoun resolution
- Discriminative structured learning:
  - Discriminative parsing
62. Unsupervised Pronoun Resolution (Cherry and Bergsma, CoNLL 2005)
- "The president entered the arena with his family."
- Input: a pronoun in context, and a list of candidates
  - his: family, arena, president
- Output: the correct candidate (president)
- Big idea: formulate a generative model in which a candidate generates the pronoun and context, and run EM
- Similar to IBM Model 1: align pronouns to candidates
63. Pronoun Resolution Innovations
- Used linguistics to limit the candidate list: binding theory, known noun genders
- Used unambiguous cases to initialize EM
- Re-weighted the component models discriminatively with maximum entropy
- End result: within 5% of a supervised system, with the re-weighted model matching supervised performance
64. Discriminative Parsing (Wang, Cherry, Lizotte and Schuurmans, CoNLL 2006)
- Input: a segmented Chinese string
- Output: a dependency parse tree
- Big idea: score each link independently, with an SVM weighting features on links (McDonald 2005), but generalize without part-of-speech tags
- Learn a weight for every word pair seen in training
65. Parsing Innovations
- To promote generalization, altered the large-margin portion of the SVM objective so that semantically similar word pairs have similar weights
- Tried two constraint types:
  - Local: link scores constrained so that links present in the gold standard score higher than those absent
  - Global: SVM Struct-style constraint generation
66. Others in Brief
- Dependency treelet decoder (here)
- Sequence tagging:
  - Biomedical term recognition: highlight gene names and proteins in medical texts
  - Character-based syllabification: find syllable breaks in written words
67. Outline
- Bitext and Bitext Parsing ✓
- Inversion Transduction Grammar (ITG) ✓
- ITG with Linguistic Constraints ✓
- Discriminative ITG with Linguistic Features ✓
- Other Projects ✓
69. Connecting E and F
- One language generates the other: IBM models (Brown et al. 1993), HMM (Vogel et al. 1996), tree-to-string model (Yamada and Knight 2001)
- Both languages generated simultaneously: joint model (Melamed 2000), phrasal joint model (Marcu and Wong 2002)
- S and T generate an alignment: conditional model (Cherry and Lin 2003), discriminative models (Taskar et al. 2005, Moore 2005)
70. Phrases Agree, Not Trees
[Figure: dependency tree over "he ran here quickly"]
- The dependencies state that "ran" is modified by "here" and "quickly" separately
- We allow the ITG to state that "ran" is modified by "here quickly"
- Also tested these additional head constraints
71. Effect on Parser
[Figure: ITG chart cells over "the tax causes unrest" × "l'impôt cause le malaise"; the cell grouping "unrest causes tax" apart from "the" is marked ✗, while the cell for "the" is marked ✓]
72. Custom Grammar Solution
- What trees force "the" and "tax" to stay together?
- A custom recursive grammar
- Same alignment space, canonical tree
[Figure: "the" combines with "tax" first; the ITG then builds freely over "the tax", "causes", "unrest"]
73. Guidance Results
74. Expressiveness Results
75. Expressiveness Analysis
- HD-ITG has systematic violations
- Discontinuous constituents (Melamed, 2003)
- It maintains distance to the head, which is not always maintained in translation
[Figure: "Canadian Wheat Board" aligned with "Commission Canadienne du blé"]
76. Discriminative Alignment
- Alignment can be viewed as multi-class classification
[Figure: input sentence pair "the tax causes unrest" / "l'impôt cause le malaise"; one correct alignment shown against many wrong-answer alignments of the same pair]
77. Problem
- Exponential number of incorrect alignments
- One solution: take advantage of properties of the matching algorithm to factor the constraints
- Doing the same factorization on ITG could be a lot of work; we need something more modular
  - Averaged perceptron?
  - Structured SVM
78. Final Challenge
- We need gold-standard trees to train on, but only have gold-standard alignments
- The versatility of ITG makes this easy:
  - Search for the best parse given an alignment
  - Select the parse with the fewest cohesion violations and fewest inversions
79. Redundancy
- Using A → A A, ⟨A A⟩, e/f, several parses produce the same alignment
- Wu provides a canonical-form grammar that creates only one parse per alignment
- Useful for:
  - Counting methods like EM
  - Detecting arbitrary bracketing decisions
80. Results Table
81. Guidance Results
82. Expressiveness Results
83. SVM Objective
minimize over w, ξ:  ½‖w‖² + C Σᵢ ξᵢ   (ξᵢ = slack)
subject to:  w · Φ(yᵢ) − w · Φ(y) ≥ Δ(yᵢ, y) − ξᵢ  for all y ≠ yᵢ
(Δ = structured loss, Φ = feature representation)