Inversion Transduction Grammar with Linguistic Constraints

1
Inversion Transduction Grammar with Linguistic
Constraints
  • Colin Cherry
  • University of Alberta

2
Edmonton Weather (Tuesday)
3
Outline
  • Bitext and Bitext Parsing
  • Inversion Transduction Grammar (ITG)
  • ITG with Linguistic Constraints
  • Discriminative ITG with Linguistic Features
  • Other Projects

4
Statistical Machine Translation
  • Input: source-language sentence E
  • Goal: produce a well-formed target-language sentence F
    with the same meaning as E
  • Process
  • Decoding: search for an operation sequence O that
    transforms E into F
  • Weights on individual operations are determined
    empirically from examples of translation

5
Bitext
Text in English
Same text, in French
  • Valuable resource for training and testing
    statistical machine translation systems
  • Large-scale examples of translation
  • Needs analysis to determine small-scale
    operations that generalize to unseen sentences

6
Word Alignment
  • Given a sentence and its translation, find the
    word-to-word connections

English: the minister in charge of the Canadian Wheat Board
French: le ministre chargé de la Commission Canadienne du blé
7
Word Alignment
  • Given a sentence and its translation, find the
    word-to-word connections
  • Link: a single word-to-word connection

English: the minister in charge of the Canadian Wheat Board
French: le ministre chargé de la Commission Canadienne du blé
8
Given a Word Alignment
  • Extract bilingual phrase pairs for phrasal SMT
    (Koehn et al. 2003)
  • Add in a parse tree and
  • Extract treelet pairs for dependency translation
    (Quirk et al. 2005)
  • Extract rules for a tree transducer (Galley et
    al. 2004)
  • Other fun things
  • Train monolingual paraphrasers (Quirk et al.
    2004, Callison-Burch et al. 2005)

9
Bitext Parsing
  • Assume a context-free grammar generates two
    languages at once
  • Like joint models, but position of words in both
    languages is controlled by grammar

10
Monolingual Parsing
Non-terminals: S, NP, VP, V, Adv, Det, Adj, N
Production: NP → Adj N
Terminals: he, always, verbs, the, adjective, noun
[Figure: parse tree for "he always verbs the adjective noun"]
11
Another view
S → NP VP
VP → V NP
[Figure: nested view of the same parse. S covers the whole sentence; NP covers "he", V covers "always verbs", NP covers "the adjective noun"]
12
Bitext Parsing is in 2D
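[Figure: 2D parse chart over English and French. Node labels: S]
13
Bitext Parsing is in 2D
[Figure: same chart. Node labels: VP, NP]
14
Bitext Parsing is in 2D
[Figure: same chart. Node labels: NP, V, NP]
15
Bitext Parsing is in 2D
[Figure: same chart. Node labels: NP, V, Adv, NP]
16
Bitext Parsing is in 2D
[Figure: same chart. Node labels: NP, Det, V, Adv, NP]
17
Bitext Parsing is in 2D
[Figure: same chart. Node labels: N, Adj, Det, V, Adv, NP]
18
Bitext Parsing is in 2D
[Figure: same chart with terminals. English: he (NP) always (Adv) verbs (V) the (Det) adjective (Adj) noun (N). French: il verbe toujours le nom adjectif]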
19
Why Bitext Parsing?
  • Established polynomial algorithms
  • Flexible framework, easy to add info
  • Parse given an alignment
  • Align given a parse (this work)
  • Discoveries can be ported to parser-based
    decoders (Zens et al. 2004, Melamed 2004)
  • Advances in parsing can be ported to word
    alignment

20
Outline
  • Bitext and Bitext Parsing
  • Inversion Transduction Grammar (ITG)
  • ITG with Linguistic Constraints
  • Discriminative ITG with Linguistic Features
  • Other Projects

21
Inversion Transduction Grammar
  • Introduced by Wu (1997)
  • Transduction
  • N → noun / nom
  • Inversion
  • NP → Det NP (straight)
  • NP → <Adj N> (inverted)

22
Binary Bracketing
  • A → A A
  • A → <A A>
  • A → e/f
  • No linguistic meaning to A
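
The straight and inverted productions can be illustrated with a small derivation tree that yields both languages at once. This is only a sketch: the `Leaf` and `Node` classes and the `yield_pair` function are hypothetical names, and the tree is one possible bracketing of the example sentence pair used earlier in the deck.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    """Terminal production A -> e/f (names are illustrative, not from the slides)."""
    e: str
    f: str


@dataclass
class Node:
    """Binary production: A -> A A (straight) or A -> <A A> (inverted)."""
    left: "Tree"
    right: "Tree"
    inverted: bool = False


Tree = Union[Leaf, Node]


def yield_pair(tree: Tree) -> Tuple[List[str], List[str]]:
    """Read off the English and French strings generated by a derivation."""
    if isinstance(tree, Leaf):
        return [tree.e], [tree.f]
    left_e, left_f = yield_pair(tree.left)
    right_e, right_f = yield_pair(tree.right)
    english = left_e + right_e  # English order is never changed
    # An inverted node swaps the order of its children on the French side only.
    french = right_f + left_f if tree.inverted else left_f + right_f
    return english, french


# Assumed bracketing of "he always verbs the adjective noun".
tree = Node(
    Leaf("he", "il"),
    Node(
        Node(Leaf("always", "toujours"), Leaf("verbs", "verbe"), inverted=True),
        Node(
            Leaf("the", "le"),
            Node(Leaf("adjective", "adjectif"), Leaf("noun", "nom"), inverted=True),
        ),
    ),
)

english, french = yield_pair(tree)
print(" ".join(english))
print(" ".join(french))
```

Two inverted nodes are enough to produce the French word order here; every other node is straight.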

23
Tree visualization
24
Pros and Cons of Bracketing
  • Pros
  • Language independent
  • Straightforward and fast
  • Symbols are minimally restrictive
  • Cons
  • Grammar is meaningless
  • ITG Constraint

25
ITG Constraint
[Figure: example using the phrases "12 are acceptable", "to the commission", "Mr Burton", "fully or in part"]
26
Outline
  • Bitext and Bitext Parsing
  • Inversion Transduction Grammar (ITG)
  • ITG with Linguistic Constraints
  • Discriminative ITG with Linguistic Features
  • Other Projects

27
Some questions
  • Those ITG constraints are kind of scary. How bad
    are they? Do they ever help?
  • Can we inject some linguistics into this
    otherwise purely syntactic process?
  • Linguistic grammar would limit trees that can be
    built - and therefore limit alignments

28
Alignment Spaces
  • Set of feasible alignments for a sentence pair
  • Described by how links interact
  • If links don't interact, the problem loses its
    structure
  • Should encourage competition between links
    (Guidance)
  • Should not eliminate correct alignments
    (Expressiveness)

29
ITG Space
  • Rules out inside-out alignments
  • Limits how concepts can be re-ordered during
    translation
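
A minimal sketch of what "inside-out" means in practice, assuming a one-to-one alignment written as a permutation of target positions. The function name `is_itg_permutation` is hypothetical. A permutation is reachable by binary straight or inverted merges exactly when it can be split recursively into two adjacent source spans that each cover a contiguous block of target positions.

```python
from functools import lru_cache
from itertools import permutations


def is_itg_permutation(perm):
    """True if perm can be built with binary straight/inverted merges only."""
    perm = tuple(perm)

    @lru_cache(maxsize=None)
    def ok(i, j):
        block = perm[i:j]
        # The source span [i, j) must map onto a contiguous target block.
        if max(block) - min(block) != j - i - 1:
            return False
        if j - i == 1:
            return True
        # Try every binary split; two contiguous halves of a contiguous
        # block are necessarily adjacent (straight or inverted).
        return any(ok(i, k) and ok(k, j) for k in range(i + 1, j))

    return ok(0, len(perm))


# The two inside-out patterns on four words are the only ones ruled out.
print(is_itg_permutation((2, 4, 1, 3)))  # inside-out
print(is_itg_permutation((3, 1, 4, 2)))  # inside-out
print(sum(is_itg_permutation(p) for p in permutations(range(4))))
```

For four words, 22 of the 24 permutations pass the check, so the restriction is mild for short spans and grows stricter as sentences get longer.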

30
Permutation Space
  • One-to-one: each word in at most one link
  • Allows any permutation of concepts
  • Reduces to weighted maximum matching if each link
    can be scored independently
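
As a rough illustration of search in the permutation space, the sketch below finds the best one-to-one alignment by brute force over a tiny score matrix. The scores are made up for illustration, `best_matching` is a hypothetical name, and a real system would use a polynomial-time maximum matching algorithm instead of enumeration.

```python
from itertools import permutations


def best_matching(score):
    """Brute-force maximum-weight one-to-one alignment.

    score[i][j] is the score of linking English word i to French word j.
    Assumes there are at least as many French words as English words.
    Links with non-positive score are dropped, so words may stay unlinked.
    """
    n_e, n_f = len(score), len(score[0])
    assert n_e <= n_f
    best_total, best_links = float("-inf"), []
    for assignment in permutations(range(n_f), n_e):
        links = [(i, j) for i, j in enumerate(assignment) if score[i][j] > 0]
        total = sum(score[i][j] for i, j in links)
        if total > best_total:
            best_total, best_links = total, links
    return best_total, best_links


english = ["the", "tax", "causes", "unrest"]
french = ["l'", "impôt", "cause", "le", "malaise"]
# Hypothetical link scores; rows are English words, columns are French words.
score = [
    [0.6, 0.0, 0.0, 0.7, 0.0],  # the
    [0.1, 0.9, 0.0, 0.0, 0.0],  # tax
    [0.0, 0.0, 0.8, 0.0, 0.0],  # causes
    [0.0, 0.0, 0.0, 0.2, 0.5],  # unrest
]

total, links = best_matching(score)
for i, j in links:
    print(english[i], "-", french[j])
```

With these invented scores the matching links "the" to "le", because nothing in the permutation space discourages that link from crossing over "cause".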

English: the tax causes unrest
French: l' impôt cause le malaise
31
Linguistic source Dependencies
  • Tree structure defines dependencies between words
  • Subtrees define contiguous phrases

[Figure: dependency tree over "the minister in charge of"]
32
Linguistic source Dependencies
  • Tree structure defines dependencies between words
  • Subtrees define contiguous phrases

[Figure: dependency tree over "the minister in charge of"]
33
Phrasal Cohesion
  • Syntactic phrases in tree tend to stay together
    after translation (Fox 2002)
  • We can use this idea to constrain an alignment
    given an English dependency tree
  • Shown to improve alignment quality (Lin and Cherry 2003)

34
Example
English: the tax causes unrest
French: l' impôt cause le malaise
35
Example
English: the tax causes unrest
French: l' impôt cause le malaise
We can rule out the link, even with no one-to-one
violation
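
A simplified sketch of a phrasal cohesion check, under stated assumptions: the dependency tree is given as a list of head indices, the alignment as a set of (English, French) index pairs, and a violation is reported when the projected French spans of a head word and its modifier subtrees overlap. The function name and the choice of which link is being ruled out ("the" to "le") are assumptions made for illustration.

```python
def violates_cohesion(heads, links):
    """Return True if the alignment breaks phrasal cohesion.

    heads[i] is the index of the head of English word i, or -1 for the root.
    links is a set of (english_index, french_index) pairs.
    """
    n = len(heads)
    children = {i: [] for i in range(n)}
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    def subtree(i):
        nodes = [i]
        for c in children[i]:
            nodes.extend(subtree(c))
        return nodes

    def span(nodes):
        # Smallest French interval covering everything these words link to.
        positions = [f for e, f in links if e in nodes]
        return (min(positions), max(positions)) if positions else None

    for h in range(n):
        parts = [span([h])] + [span(subtree(c)) for c in children[h]]
        parts = sorted(p for p in parts if p is not None)
        # After sorting, any overlap shows up between neighbouring intervals.
        for (_, hi1), (lo2, _) in zip(parts, parts[1:]):
            if lo2 <= hi1:
                return True
    return False


# "the tax causes unrest": the -> tax, tax -> causes, unrest -> causes.
heads = [1, 2, -1, 2]
# French positions: l'=0, impôt=1, cause=2, le=3, malaise=4.
good = {(0, 0), (1, 1), (2, 2), (3, 4)}
bad = {(0, 3), (1, 1), (2, 2), (3, 4)}  # "the" linked to "le" (assumed example)

print(violates_cohesion(heads, good))
print(violates_cohesion(heads, bad))
```

The second alignment is still one-to-one, but the subtree "the tax" projects onto a span that swallows "cause", the translation of its head, so the link can be rejected on cohesion grounds alone.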
36
ITG Dependency
  • Both limit movement with phrasal cohesion
  • ITG: cohesive in some binary tree
  • Dep: cohesive in the provided dependency tree
  • Not subspaces of each other

[Figure: two example alignments. "the big red dog": Dep ✓, ITG ✗. "the dog ate it": Dep ✗, ITG ✓]
37
D-ITG Space
  • Force ITG to maintain phrasal cohesion with a
    provided dependency tree
  • Intersects ITG and Dependency spaces
  • Adds linguistic dependency tree to ITG parsing

38
Chart Modification Solution
  • Eliminate structures that allow "tax" to invert
    away from "the"

[Figure: "the tax causes unrest"]
39
Effect on Parser
[Figure: parse chart for "the tax causes unrest" / "l' impôt cause le malaise"; two candidate A constituents are marked ✓ and one is marked ✗]
40
Effect on Parser
[Figure: parse chart for "the tax causes unrest" / "l' impôt cause le malaise"; two candidate A constituents are marked ✓]
41
Continuum of constraints
Permutation
ITG
D-ITG
Unconstrained
42
Experimental Setup
  • English-French Parliamentary debates
  • 500-sentence labeled test set (Och and Ney, 2003)
  • Dependency parses from Minipar

43
Guidance Test
  • Does the space stop incorrect alignments?
  • Use a weighted link score built from
  • Bilingual correlations between words
  • Relative position of tokens
  • Maximize summed link scores in all spaces, check
    alignment error rate
  • AER (alignment error rate): combined precision and
    recall; lower is better
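
For reference, a common definition of AER is the one from Och and Ney (2003), which distinguishes sure links S from possible links P in the gold standard: AER = 1 - (|A ∩ S| + |A ∩ P|) / (|A| + |S|). The sketch below assumes alignments are sets of index pairs; the function name is hypothetical.

```python
def alignment_error_rate(predicted, sure, possible):
    """AER as defined by Och and Ney (2003); lower is better.

    predicted, sure, possible: sets of (source_index, target_index) links.
    Sure links are always treated as possible links too.
    """
    possible = possible | sure
    denominator = len(predicted) + len(sure)
    if denominator == 0:
        return 0.0  # nothing predicted and nothing to find
    hits = len(predicted & sure) + len(predicted & possible)
    return 1.0 - hits / denominator


sure = {(0, 0), (1, 1)}
possible = {(2, 2)}
predicted = {(0, 0), (2, 2), (3, 3)}
print(alignment_error_rate(predicted, sure, possible))
```

In this toy case one sure link and one possible link are found, one sure link is missed, and one predicted link is wrong, giving 1 - 3/5.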

44
Guidance Results
45
Expressiveness Test
  • Given a strong model, does the space hold us
    back?
  • Use a cooked link score from the gold standard
  • Only correct links are given positive scores
  • Best space is unconstrained space
  • Maximize summed link scores in all spaces, check
    recall

46
Expressiveness Results
47
Contributions
  • Algorithmic
  • Method to inject ITG with linguistic constraints
  • Experimental
  • ITG constraints provide guidance, with virtually
    no loss in expressiveness (French-English)
  • Dependency cohesion constraints provide greater
    guidance, at the cost of some expressiveness

48
Outline
  • Bitext and Bitext Parsing
  • Inversion Transduction Grammar (ITG)
  • ITG with Linguistic Constraints
  • Discriminative ITG with Linguistic Features
  • Other Projects

49
Remaining Problems
  • Dependency cohesion stops correct links
  • Parse errors, Paraphrase, Exceptions
  • Would like a soft constraint
  • I'm not doing much learning
  • φ² competitive linking with an ITG search

50
Soft Constraint
  • Invalid spans need not be disallowed
  • Instead parser could incur a penalty
  • Easy to incorporate penalty into DP
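
A minimal sketch of how such a penalty could enter the dynamic program, under simplifying assumptions that are not from the slides: both sentences have the same length, every word is linked, and the set of invalid English spans is supplied by the caller. The name `itg_best_score` and the toy scores are hypothetical.

```python
from functools import lru_cache


def itg_best_score(score, invalid_spans=frozenset(), penalty=0.0):
    """Best total link score over ITG-reachable permutations.

    score[i][k]: score for linking English word i to French word k.
    invalid_spans: English spans (i, j) that break cohesion (assumed given).
    penalty: amount subtracted each time an invalid span is used as a
             constituent; float("inf") turns it into a hard constraint.
    """
    n = len(score)
    assert all(len(row) == n for row in score)

    @lru_cache(maxsize=None)
    def best(i, j, k):
        # English span [i, j) aligned to the French span [k, k + (j - i)).
        size = j - i
        if size == 1:
            total = score[i][k]
        else:
            total = float("-inf")
            for s in range(i + 1, j):
                left = s - i
                straight = best(i, s, k) + best(s, j, k + left)
                inverted = best(i, s, k + size - left) + best(s, j, k)
                total = max(total, straight, inverted)
        if (i, j) in invalid_spans:
            total -= penalty
        return total

    return best(0, n, 0)


# Toy 3x3 scores: the best unconstrained parse swaps the last two words,
# which requires the English span (1, 3) as a constituent.
score = [
    [5, 0, 0],
    [0, 3, 4],
    [0, 4, 3],
]
print(itg_best_score(score))
print(itg_best_score(score, invalid_spans=frozenset({(1, 3)}), penalty=1.0))
print(itg_best_score(score, invalid_spans=frozenset({(1, 3)}), penalty=5.0))
```

With a small penalty the parser still uses the invalid span and pays for it; with a large penalty it falls back to the monotone alignment, and an infinite penalty reproduces the hard constraint.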

51
ITG Learning
  • Zhang and Gildea 2004, 2005, 2006
  • Expectation Maximization to parameterize a
    stochastic grammar unsupervised
  • Driven by expensive 2D inside-outside
  • Not doing much better than I am with φ²
  • Meanwhile, EMNLP05 is happening
  • Moore 2005, Taskar et al. 2005
  • Suddenly it's okay to use some training data

52
Discriminative matching (Taskar et al. 05)
[Figure: candidate link causes / cause]
Features: φ² 0.767, DIST 0.050, LCSR 0.833, HMM 0.0
Learned weights: 60 -09 02 20
Link score: 47.2
Max matching finds the alignment that maximizes the
sum of link scores. An entire alignment y can be
given a feature vector Ψ(y) according to the features
of the links in y.
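
The link score can be read as a dot product between a link's feature values and the learned weights. In the sketch below the feature values are the ones on the slide, but the weights (60, -9, 2, 20) are only one assumed reading of the garbled weight row; they happen to reproduce a score that rounds to 47.2. The function names are hypothetical.

```python
FEATURE_NAMES = ["phi2", "DIST", "LCSR", "HMM"]

# Feature values for the candidate link causes / cause, as on the slide.
causes_cause = [0.767, 0.050, 0.833, 0.0]

# ASSUMPTION: one possible reading of the garbled weight row on the slide.
weights = [60.0, -9.0, 2.0, 20.0]


def link_score(features, w):
    """Score of one link: dot product of its features with the weights."""
    return sum(wi * fi for wi, fi in zip(w, features))


def alignment_features(link_feature_vectors):
    """Feature vector of a whole alignment: sum of its links' features."""
    return [sum(column) for column in zip(*link_feature_vectors)]


score = link_score(causes_cause, weights)
print(round(score, 1))

# Scoring the summed vector equals summing the individual link scores.
other_link = [0.2, 0.1, 0.5, 1.0]  # hypothetical second link
total = link_score(alignment_features([causes_cause, other_link]), weights)
print(abs(total - (score + link_score(other_link, weights))) < 1e-9)
```

Because the alignment score is linear in the weights, the same weight vector can be used both inside the matching search and in the learning objective.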
53
Learning objective
  • Find weights w, such that for each example i
  • Can formulate as constrained optimization
    problem, do max margin training
  • Problem: exponential number of wrong answers

[Equation labels: structured distance, features, learned weights]
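
The equation itself did not survive in the transcript. As a point of reference, the standard margin-rescaling formulation from Tsochantaridis et al. (2004) is sketched below; the slide's exact form may differ. Here w are the learned weights, Ψ is the feature representation, Δ is the structured distance (loss) between the correct answer y_i and a candidate y, and ξ_i are slack variables. The symbols C and n (regularization constant and number of training examples) are assumptions.

```latex
\min_{w,\ \xi \ge 0} \quad \frac{1}{2}\lVert w \rVert^2 + \frac{C}{n}\sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \qquad
\forall i,\ \forall y \ne y_i:\quad
w \cdot \Psi(x_i, y_i) - w \cdot \Psi(x_i, y) \;\ge\; \Delta(y_i, y) - \xi_i
```

There is one constraint per wrong answer y, which is why the number of constraints is exponential in sentence length.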
54
SVM Struct (Tsochantaridis et al. 2004)
[Figure: loop in which constrained optimization produces w and a search finds the most violated constraint; the constraint set starts empty and accumulates]
Theory of constraint generation in constrained
optimization guarantees convergence
55
Similarities to Averaged Perceptron
  • Online method driven by comparisons of current
    output to correct answer
  • But
  • Allows a notion of structural distance
  • Returns a max margin solution (with slacks) at
    each step
  • Remembers all of its past mistakes

56
SVM-ITG
  • Can learn ITG parameters discriminatively
  • Link productions A → e/f are scored as in
    discriminative matching
  • Non-terminal productions A → A A | <A A> are scored
    with two features
  • Is it inverted?
  • Does it cover a span that would usually be
    illegal?

[Figure: link production A → causes / cause]
Features: φ² 0.767, DIST 0.050, LCSR 0.833, HMM 0.0
Learned weights: 60 -09 02 20
Link score: 47.2
57
Experimental Setup
  • Identical to Taskar et al.
  • 100 training
  • 37 development
  • 347 test
  • Same unsupervised text as before to derive
    features
  • 50k Hansards data

58
Results
[Chart legend: Bipartite matching SVM (Permutation); SVM weights with hard constraint (D-ITG)]
59
Results
[Chart legend: Bipartite matching SVM; SVM weights with hard constraint; ITG SVM with soft cohesion feature]
60
Contributions
  • Algorithmic
  • Discriminative learning method for ITGs
  • Experimental
  • Value of hard constraints is reduced in the
    presence of a strong link score
  • Integrating the constraint as a feature during
    training can recover the value of constraints and
    improve AER and recall

61
Other Projects
  • Applying techniques from SMT to new domains
  • Unsupervised pronoun resolution
  • Discriminative Structured Learning
  • Discriminative parsing

62
Unsupervised Pronoun Resolution (Cherry and
Bergsma, CoNLL05)
  • The president entered the arena with his family.
  • Input
  • A pronoun in context, and a list of candidates
  • his family, arena, president
  • Output: the correct candidate (president)
  • Big Idea
  • Formulate a generative model, where a candidate
    generates the pronoun and context; run EM
  • Similar to IBM-1: align pronouns to candidates

63
Pronoun Resolution Innovations
  • Used linguistics to limit candidate list
  • Binding theory, known noun genders
  • Used unambiguous cases to initialize EM
  • Re-weighted component models discriminatively
    with maximum entropy
  • End result
  • Within 5 of a supervised system, with
    re-weighted model matching supervised performance

64
Discriminative Parsing (Wang, Cherry, Lizotte and
Schuurmans, CoNLL06)
  • Input: segmented Chinese string
  • Output: dependency parse tree
  • Big Idea
  • Score each link independently, with SVM weighting
    features on links (McDonald 2005), but
    generalize without part-of-speech tags
  • Learn a weight for every word-pair seen in
    training

65
Parsing Innovations
  • To promote generalization
  • Altered large margin portion of SVM objective
    so semantically similar word pairs have similar
    weights
  • Tried two constraint types
  • Local: link scores constrained so links present
    in gold standard score higher than those absent
  • Global: SVM Struct-style constraint generation

66
Others in brief
  • Dependency treelet decoder (here)
  • Sequence tagging
  • Biomedical Term recognition
  • Highlight gene names, proteins in medical texts
  • Character-based Syllabification
  • Find syllable breaks in written words

67
Outline
  • Bitext and Bitext Parsing
  • Inversion Transduction Grammar (ITG)
  • ITG with Linguistic Constraints
  • Discriminative ITG with Linguistic Features
  • Other Projects

69
Connecting E and F
  • One language generates the other
  • IBM models (Brown et al. 1993), HMM (Vogel et al.
    1996), Tree-to-string model (Yamada and Knight
    2001)
  • Both languages generated simultaneously
  • Joint model (Melamed 2000), Phrasal joint model
    (Marcu and Wong 2002)
  • S and T generate an alignment
  • Conditional model (Cherry and Lin 2003),
    Discriminative models (Taskar et al. 2005, Moore
    2005)

70
Phrases agree, not trees
[Figure: dependency tree over "he ran here quickly"]
Dependencies state that "ran" is modified by "here" and
"quickly" separately. We allow ITG to state that "ran"
is modified by "here quickly". Also tested these
additional head constraints.
71
Effect on Parser
[Figure: parse chart for "the tax causes unrest" / "l' impôt cause le malaise"; one candidate A constituent is marked ✗ and one is marked ✓]
72
Custom Grammar Solution
  • What trees force "the" and "tax" to stay together?
  • Custom recursive grammar
  • Same alignment space, canonical tree

[Figure: "the tax causes unrest", with ITG subtrees labeled "tax causes unrest" and "the tax"]
73
Guidance Results
74
Expressiveness Results
75
Expressiveness Analysis
  • HD-ITG has systematic violations
  • Discontinuous Constituents (Melamed, 2003)
  • Maintains distance to head - not always
    maintained in translation

[Figure: "Canadian Wheat Board" (shown twice) aligned with "Commission Canadienne du blé"]
76
Discriminative Alignment
  • Alignment can be viewed as multi-class
    classification

[Figure: the input sentence pair "the tax causes unrest" / "l' impôt cause le malaise", its correct alignment (Correct Answer), and several incorrect alignments of the same pair (Wrong Answers)]
77
Problem
  • Exponential number of incorrect alignments
  • One solution
  • Take advantage of properties of matching
    algorithm
  • Factor constraints
  • Doing the same factorization on ITG could be a
    lot of work - need something more modular
  • Averaged perceptron?
  • Structured SVM

78
Final Challenge
  • Need gold standard trees to train on, only have
    gold standard alignments
  • Versatility of ITG makes this easy
  • Search for best parse given an alignment
  • Select the parse with fewest cohesion violations
    and fewest inversions

79
Redundancy
  • Using A → A A | <A A> | e/f
  • Several parses produce the same alignment
  • Wu provides a canonical-form grammar
  • Creates only one parse per alignment
  • Useful for
  • Counting methods like EM
  • Detecting arbitrary bracketing decisions

80
Results Table
81
Guidance Results
82
Expressiveness Results
83
SVM Objective
[Equation labels: slack, structured loss, feature representation]