Generation in the Context of MT

About This Presentation

Title:

Generation in the Context of MT

Description:

Generation in the Context of MT Final Report The Team Senior members & affiliate members Jan Hajič, Charles Univ., Prague Drago Radev, Univ. of Michigan Gerald Penn ... – PowerPoint PPT presentation

Slides: 133

Provided by: janh5

Learn more at: https://www.cs.jhu.edu

Transcript and Presenter's Notes

1
Generation in the Context of MT

Final Report

2
The Team

Senior members & affiliate members
Jan Hajič, Charles Univ., Prague
Drago Radev, Univ. of Michigan
Gerald Penn, Univ. of Toronto
Jason Eisner, Johns Hopkins Univ.
Owen Rambow, Univ. of Pennsylvania
Dan Gildea, Univ. of Pennsylvania
Bonnie Dorr, Univ. of Maryland
Students
Yuan Ding, Univ. of Pennsylvania
Martin Čmejrek, Charles Univ., Prague
Terry Koo, MIT
Kristen Parton, Stanford Univ.
Jan Curín, Charles University
Ivona Kučerová, Charles University
Pre-workshop work (Charles University)
Zdeněk Žabokrtský, Petr Pajas
Václav Honetschläger, Alena Böhmová
Vladislav Kuboň, Jiří Havelka

3
The Goal

Generate English (linear surface form)
from syntactic-semantic sentence representation
(so-called tectogrammatical, or TR)
Possible application setting
machine translation
other uses
Front-end for QA systems, summarization
Evaluate under various circumstances

4
Tectogrammatical Representation
According to his opinion UAL's executives were
misinformed about the financing of the original
transaction
5
Tectogrammatical Representation
According to he opinion UALs executive were
misinform about the financing of the original
transaction
6
TR in Machine Translation
Vedení UAL bylo podle jeho názoru o financování
původní transakce nesprávně informováno.
NULL
7
The MT Framework
Source language text: CZECH
8
The MT Framework
AR trees
CZECH
ENGLISH
9
Translating trees
[Figure: aligned Czech and English trees, nodes a-f and A-F]
10
Tools and Data Resources

Tools
WS98 Czech parser, other Czech tools (tagger)
GIZA (WS99), ISI decoder
Data
PTB (40k sentences)
PTB translation to Czech (11k sentences)
Prague Dependency Treebank 1.0 (90k sentences)
Prague Dependency Treebank 2.0 preliminary
15k sentences manually annotated
Monolingual data

11
The Evaluation Metric BLEU

Plain English output (MT, Generation)
difficult and/or expensive to evaluate
subjectively
BLEU (IBM)
automatic method, score 0..1
relative scores ≈ subjective human evaluation
needs several reference translations (gold standards)
n-gram-based metric with a penalty for too-short output (brevity penalty)
Different local evaluations throughout, too
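To make the metric concrete, here is a minimal sentence-level BLEU sketch: clipped n-gram precisions for n = 1..4, combined by an unweighted geometric mean and multiplied by a brevity penalty. It assumes whitespace-tokenized input and applies no smoothing; IBM's BLEU is normally computed over a whole test corpus rather than per sentence, so this is an illustration, not the workshop's scoring tool.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (a sketch)."""
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        # Clip each candidate n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if clipped == 0 or total == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: compare with the reference length closest to the candidate's.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

For example, scoring "the girl kissed her" against the single reference "the girl kissed her cat" gives perfect n-gram precisions, so the score is just the brevity penalty exp(1 - 5/4) ≈ 0.78.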

12
Presentation Outline

The Systems and Their Inputs
Getting the data & tools ready
The Statistical Generation System
The channel model
Word order, Punctuation, Morphology
The Hybrid Approach
Evaluation Results
Student Project Proposals
Conclusions and Future Directions

13
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
CZECH
ENGLISH
14
The Systems and Their Inputs

Martin Čmejrek

15
WS02GMT

System 1: statistical
System 2: hybrid
Output: English linear surface form
Input 1: automatically created English TR
Input 2: manually created English TR
Input 3: improved automatic English TR (PropBank)
Input 4: Czenglish TR (simple translation)

16
Input 1 Automatic English TR

Penn Treebank v. 3
heads (Jason Eisner's code modifications)
lemmatization
word IDs
rule-based transformation to English AR, TR
(by Kučerová & Žabokrtský)
→ English TR (I1), size 40k sentences

17
Input 2 Manual English TR

Penn Treebank v. 3
Input 1
manual annotation (correction) (IK)
including
deep word order, conversion of grammatical
codes
→ English TR (I2), size 1.5k sentences

18
Input 3 Enhanced Automatic English TR

Penn Treebank v. 3
Input 1
PropBank
additional sources
→ English TR (I3), size 40k sentences

19
Input 4 Automatic Czenglish TR

Linear Surface Czech
Czech tagging & lemmatization
Parsed to Czech AR, Czech TR
Simple Transfer (Lemma translation)
- lexical replacement
dictionary collected from web, MRDs
trained on TR lemmas by GIZA
→ Czenglish TR (I4), 11k sentences

20
Dictionary Filtering
Frequencies on English Monolingual Corpus (North
American News Text) 365 M words
4 Czech/English Dictionary Sources (WinGED,
GNU/FDL, PCTrans, EuroWordNet)
Merging, Pruning
Czech POS
English POS
Czech/English parallel Penn TreeBank Corpus
GIZA Training
Czech/English Dictionary for Transfer
Input Data Source Output Data Tools
21
Word-by-word translation of TR lemmas

Word-by-word dictionary: 42 835 entries, 65 408 translations
format:
<e>tečka<t>N
<tr>spot<trt>N<prob>0.353598
<tr>dot<trt>N<prob>0.28792
<tr>full@stop<trt>N<prob>0.28729
1-1, 1-2 (2-1 translations not yet implemented)
packed forest representation for multiple translation choices
simplified version: choose the first best
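A minimal sketch of reading this dictionary format and doing the simplified first-best lemma transfer. It assumes one tag group per line, that an `<e>` line starts a new Czech entry followed by its `<tr>` translation lines, and that `@` joins the words of a 1-2 translation; the function names and the pass-through fallback for unknown lemmas are hypothetical.

```python
import re

# Assumed line layout, inferred from the example entry:
#   <e>LEMMA<t>POS                      starts a Czech entry
#   <tr>WORD<trt>POS<prob>P             one English translation with its probability
ENTRY = re.compile(r"<e>(?P<lemma>[^<]+)<t>(?P<pos>\S+)")
TRANS = re.compile(r"<tr>(?P<word>[^<]+)<trt>(?P<pos>[^<]+)<prob>(?P<prob>[0-9.]+)")


def parse_dictionary(lines):
    """Map (czech_lemma, pos) -> list of (english, pos, prob)."""
    entries = {}
    key = None
    for line in lines:
        line = line.strip()
        if m := ENTRY.fullmatch(line):
            key = (m["lemma"], m["pos"])
            entries.setdefault(key, [])
        elif (m := TRANS.fullmatch(line)) and key is not None:
            # "@" is assumed to join the words of a multiword (1-2) translation.
            english = m["word"].replace("@", " ")
            entries[key].append((english, m["pos"], float(m["prob"])))
    return entries


def first_best(entries, lemma, pos):
    """Simplified transfer: keep only the most probable translation."""
    candidates = entries.get((lemma, pos))
    if not candidates:
        return lemma  # hypothetical fallback: pass the lemma through untranslated
    return max(candidates, key=lambda t: t[2])[0]
```

With the entry above, `first_best(entries, "tečka", "N")` returns `"spot"`; the full packed-forest version would keep all three candidates instead of discarding them.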

22
Where are we?
w/additional info
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
CZECH
ENGLISH
23
Automatically Annotating a Tectogrammatical Corpus

Owen Rambow

24
Goal

Use PropBank annotations to
Improve automatic construction of English TRs
Allow generation from generic pred-arg
structures

25
Types of Corpus Annotation

Surface Syntax
Deep Syntax
Local Lexical Semantics
Global Lexical Semantics
Hybrid Deep Syntactic/Global Semantic
Tectogrammatical level used here

26
Surface Syntax: e.g., Penn Treebank
[Figure: surface dependency tree rooted in "loaded", with arcs subj, prepobj, comp over hay, is, into, by, trucks, John]
Hay is loaded into trucks by John
27
Deep Syntax: e.g., TAG
John loads hay into trucks
Hay is loaded into trucks by John
28
Local Semantics: Penn PropBank (brand new)
John loads hay into trucks
John loads trucks with hay
29
Global Semantics: LCS (U. Md.)
John loads hay into trucks
John throws hay into trucks
30
Tectogrammatical Representation

First two syntactic arguments of verb: deep-syntactic
All other arguments: global semantic

[Figure: TR trees for the three sentences below (verbs load, throw; nodes John, hay, truck; arc labels act, pat, dir3, acmp)]
John loads hay into trucks
John throws hay into trucks
John loads trucks with hay
31
Why Use TR? Research Hypothesis

Replacing function words by TR arc labels makes transfer easier
Choice of realization is target-language-dependent
Deep-syntactic labels for first two arguments: realization is more verb-specific
Global semantic labels on remaining arguments: realization is just label-specific

32
Available Resources for Input 3

Surface syntax: PTB corpus (hand, checked)
Deep syntax: derived automatically from PTB (Chen01)
Local semantics: PropBank corpus and frame lexicon (hand, checked)
Global semantics: LCS lexicon (partially hand, partially checked)
TR: PTB subset corpus (hand), PropBank → TR dictionary (hand, not checked) (I. Kučerová)

33
Experiment: Machine Learning of TR Labels Using Ripper

Ripper (Cohen 1996): greedy symbolic rule learner, set- and bag-valued features
Features
Surface, deep syntactic info
Local, global semantic info
Kučerová's PropBank → TR dictionary (hand-crafted)
Input 1 (Automatic English TR)

34
Results (TR Label Error Rates)
Syntax \ Semantics    PB→TR dict   all    local-global   local   none
none                  37.7         22.6   23.7           25.9    58.8
Input 1               17.1         15.9   16.3           17.7    19.5
surface-deep          16.2         16.7   17.1           16.4    16.5
surface-deep-Inp1     14.4         16.1   16.2           15.9    15.5
Average error rate (%) on 5-fold cross-validation (1326 data points)
35
Conclusions

Machine learning can improve on hand-written conversion rules (Input 1)
PropBank is useful
Best results:
All syntactic features + PropBank → TR dictionary
Future work: use PropBank → LCS dictionary (developed during workshop)

36
Where are we?
Transfer
Deep syntax (Czech)
CZECH
ENGLISH
37
The MAGENTA System

Statistically based
The pipeline
TR to AR by a channel model
Word order by reordering on dep. trees
Punctuation insertion
Morphology

38
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
CZECH
ENGLISH
39
The Tree-to-Tree Transductions

Jason Eisner

[Figure: aligned tree pair with nodes a-f and A-F; unaligned prep and det nodes]
40
Translating trees
[Figure: aligned trees, nodes a-f and A-F]
learn this 2:1 mapping (or find it in the dictionary)
Also 1:2, 2:0, etc., rearrangements ...
0:1 mapping
41
Translating trees
[Figure: aligned trees, nodes a-f and A-F]
42
Statistical: need a model of tree pairs
Mainly interested in (TR,AR) pairs, but our techniques are quite general. E.g., the example below is not a (TR,AR) pair:
the girl kissed her kitty cat
the girl gave a kiss to her cat
43
Training: our team has many tree pairs. They should be nicer to model than string pairs (that's why we built them!). What Czech trees went with what English trees in training? ... Learn parameters θ of a joint model Pθ(T1,T2).
the girl kissed her kitty cat
[Figure: dependency tree with nodes Pred,kissed; Subj,girl; Obj,cat; Det,the; Det,her; kitty]
44
Decoding: complete a tree pair
Training: given T1 and T2, find θ to maximize Pθ(T1,T2)
Decoding: given T1 and θ, find T2 to maximize Pθ(T1,T2)
Horrible sparse data problem: can't just do tree lookup.
the girl kissed her kitty cat
??
45
How should a model of tree pairs look?
Joint model Pθ(T1,T2).
Wise to use noisy-channel form Pθ(T1 | T2) · Pθ(T2).
But any joint model will do.
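A toy sketch of noisy-channel decoding: the joint score is log P(T1 | T2) + log P(T2), and decoding picks the candidate T2 with the best joint score. The probabilities are made up, and whole trees are stood in for by their sentence strings; a real decoder searches a packed forest of candidate trees rather than a short list.

```python
import math

# Made-up probabilities for one input T1; candidate outputs T2 are stood in
# for by their sentence strings rather than real trees.
T1 = "the girl kissed her kitty cat"
CHANNEL = {  # channel model P(T1 | T2)
    (T1, "the girl gave a kiss to her cat"): 0.4,
    (T1, "the girl kissed her cat"): 0.5,
}
TARGET = {  # target-side model P(T2)
    "the girl gave a kiss to her cat": 0.3,
    "the girl kissed her cat": 0.1,
}


def joint_logprob(t1, t2):
    """log P(T1, T2) = log P(T1 | T2) + log P(T2) (noisy-channel factorization)."""
    return math.log(CHANNEL[(t1, t2)]) + math.log(TARGET[t2])


def decode(t1, candidates):
    """Decoding: with the parameters fixed, pick the T2 maximizing P(T1, T2)."""
    return max(candidates, key=lambda t2: joint_logprob(t1, t2))


print(decode(T1, list(TARGET)))  # the girl gave a kiss to her cat
```

Here the channel alone prefers "kissed" (0.5 vs 0.4), but the target-side model outweighs it (0.4 × 0.3 = 0.12 vs 0.5 × 0.1 = 0.05), which is the usual reason for splitting the joint model this way.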
46
How should a model Pθ(T1,T2) of tree pairs look?
Intuition some kind of correspondence between
words.
Try to learn correspondence using EM
alignment (could seed with a dictionary).
the girl kissed her kitty cat
the girl gave a kiss to her cat
47
How should a model Pθ(T1,T2) of tree pairs look?
Intuition some kind of correspondence between
words.
Try to learn correspondence using EM
alignment (could seed with a dictionary).
the girl kissed her kitty cat
the girl gave a kiss to her cat
different, bad alignment!
48
How should a model Pθ(T1,T2) of tree pairs look?
Intuition some kind of correspondence between
words.
Try to learn correspondence using EM
alignment (could seed with a dictionary).

So model must consider alignment: Pθ(T1,T2,A)
Why A is complicated:
The correspondence isn't 1 to 1

Also need to model word order (indeed topology)

49
Solution: use the right grammar formalism
Grammars can assemble words or phrases into trees. Let's work up to the right formalism.

Model must consider alignment: Pθ(T1,T2,A)
Why A is complicated:
The correspondence isn't 1 to 1

Also need to model word order (indeed topology)

kiss ↔ gave a kiss
cat ↔ kitty cat
∅ ↔ to

the girl kissed her kitty cat
the girl gave a kiss to her cat
50
Context-Free Grammar
the girl kissed her cat
S
etc.
51
Augment CFG nonterminals with headwords
the girl kissed her cat
S
etc.
52
Augment CFG nonterminals with headwords
the girl kissed her cat
S
look at all the rules headed by kissed ...
etc.
53
Lexicalized Tree Substitution Grammar
the girl kissed her cat
[Figure: lexicalized tree for the sentence, with nodes S,kissed; VP,kissed; V,kissed; NP]
look at all the rules headed by kissed ...
a natural chunk
open role waiting to be filled
can fill open roles higher up
etc.
54
Lexicalized Tree Substitution Grammar
the girl kissed her cat
55
Lexicalized Tree Substitution Grammar
[Figure: elementary trees for "the girl kissed her cat" (S, NP, VP, V, Det, N nodes)]
56
Dependency-Style Lexicalized Tree Substitution
Grammar
Simplify structure: eliminate extra internal nodes
Just one node per word (dependency style)
Yields the kind of AR and TR trees we actually have
57
Dependency-Style Lexicalized Tree Substitution
Grammar
the girl kissed her kitty cat
58
Synchronous Dependency-Style Lexicalized Tree
Substitution Grammar
the girl kissed her kitty cat
the girl gave a kiss to her cat
59
Synchronous Dependency-Style Lexicalized Tree
Substitution Grammar
the girl kissed her kitty cat
the girl gave a kiss to her cat
60
Synchronous Dependency-Style Lexicalized Tree
Substitution Grammar
the girl kissed her kitty cat
the girl gave a kiss to her cat
Det,a
61
Synchronous Dependency-Style Lexicalized Tree
Substitution Grammar
the girl kissed her kitty cat
the girl gave a kiss to her cat
Det,a
62
P(T1, T2, A) = ∏_n p(t1, t2, a | n)
So any aligned BIG TREE PAIR is built from a set of aligned LITTLE TREE PAIRS
Det,a
63
P(T1, T2, A) = ∏_n p(t1, t2, a | n): How This Simplifies Things

Alignment: find A to max Pθ(T1,T2,A)
Decoding: find T2, A to max Pθ(T1,T2,A)
Training: find θ to max Σ_A Pθ(T1,T2,A)
Do everything on little trees instead!
Only need to train & decode a model of pθ(t1,t2,a)
But not sure how to break up big tree correctly
So try all possible little trees & all ways of combining them, by dynamic prog.
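The three tasks differ only in whether they maximize or sum over the products of little-tree probabilities. This sketch makes that concrete for a single big tree pair with a hand-listed set of candidate alignments; the alignment names and probabilities are made up, and the explicit enumeration stands in for the dynamic programming the system actually uses to cover all ways of breaking up the trees.

```python
import math

# Each candidate alignment A of one big tree pair, listed as the probabilities
# p(t1, t2, a | n) of the little tree pairs it is built from (made-up numbers).
DERIVATIONS = {
    "kissed<->gave a kiss, kitty cat<->cat": [0.2, 0.5, 0.3],
    "kissed<->gave, kitty cat<->a kiss": [0.1, 0.05, 0.3],
}


def joint(little_probs):
    """P(T1, T2, A) = product of the little-tree-pair probabilities."""
    return math.prod(little_probs)


def best_alignment(derivations):
    """Alignment: the A that maximizes P(T1, T2, A)."""
    return max(derivations, key=lambda a: joint(derivations[a]))


def training_objective(derivations):
    """Training maximizes the sum over A of P(T1, T2, A)."""
    return sum(joint(p) for p in derivations.values())
```

With these numbers the first alignment wins (0.03 vs 0.0015), and the quantity training would push up is their sum, 0.0315.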

64
System Architecture
Probability Model pθ(t1,t2,a) of Little Trees
score little trees: find p(...)
propose little translations t2: make p(...) big
update parameters θ: raise p(...)
Decoder
Trainer
Decoder: scores all alignments between a big tree T1 & a forest of big trees T2
Trainer: scores all alignments of two big trees T1,T2
dynamic programming engine
65
System Architecture
Probability Model pθ(t1,t2,a) of Little Trees
score little trees
propose little translations t2
update parameters θ
Decoder
Trainer
dynamic programming engine
output
66
Related Work

Synchronous grammars (Shieber & Schabes 1990)
Statistical work has allowed only 1:1 (isomorphic trees)
Stochastic inversion transduction grammars (Wu 1995)
Head transducer grammars (Alshawi et al. 2000)
Statistical tree translation
Noisy channel model (Yamada & Knight 2000)
Infers tree: trains on (string, tree) pair, not (tree, tree) pair
But again, allows only 1:1, plus 1:0 at leaves
Statistical tree generation - find most prob. expressing meaning
Dynamic prog. search in packed forest (Langkilde 2000)
Stack decoder (Ratnaparkhi 2000)

67
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
CZECH
ENGLISH
68
The Little Trees

Jan Hajič

[Figure: little tree pair and its probability pθ(...)]
69
[Figure: little tree pair and its probability pθ(...)]

Data still sparse, but better than for big trees
No alignment needed - already hypothesized for us

70
Form of the model for 1:1 (AR ↔ TR)

Base form
p(cat,PL,PAT,cat,NNS,Obj,alignment)
High-level backoff
p(cat,cat) · p(PL,NNS) · p(PAT,Obj) · p(alignment)
Low-level backoff
p(align) · 1/(L·T·F), where
(L = size of <Tlemma,Alemma>, etc.)
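The slide gives the three levels but not how they are combined. This sketch assumes strict backoff: use the relative frequency of the whole event if it was seen, else the product of the pair probabilities if all pairs were seen, else p(alignment) times a uniform 1/(L·T·F). The class name and event layout are hypothetical, alignment types are assumed to have been seen in training, and with no discounting the result is not a normalized distribution.

```python
from collections import Counter

# An event is (tr_lemma, tr_gram, tr_functor, ar_lemma, ar_tag, ar_afun, alignment),
# e.g. ("cat", "PL", "PAT", "cat", "NNS", "Obj", "1:1").


class BackoffModel:
    """Hypothetical strict-backoff estimator for 1:1 TR/AR little trees."""

    def __init__(self, events, L, T, F):
        self.n = len(events)
        self.full = Counter(events)
        self.lemmas = Counter((e[0], e[3]) for e in events)
        self.grams = Counter((e[1], e[4]) for e in events)
        self.funcs = Counter((e[2], e[5]) for e in events)
        self.aligns = Counter(e[6] for e in events)
        self.L, self.T, self.F = L, T, F  # sizes of the lemma/gram/functor pair sets

    def prob(self, e):
        if self.full[e]:
            # Base form: relative frequency of the whole event.
            return self.full[e] / self.n
        factors = [self.lemmas[(e[0], e[3])], self.grams[(e[1], e[4])],
                   self.funcs[(e[2], e[5])], self.aligns[e[6]]]
        if all(factors):
            # High-level backoff: product of independent pair probabilities.
            p = 1.0
            for count in factors:
                p *= count / self.n
            return p
        # Low-level backoff: p(align) times a uniform 1/(L*T*F).
        return (self.aligns[e[6]] / self.n) / (self.L * self.T * self.F)
```

For instance, trained on two copies each of the "cat" event above and a parallel "dog"/ACT/Sb event, an unseen mix such as ("cat", "SG", "ACT", "cat", "NN", "Sb", "1:1") falls to the high-level backoff and gets 0.5 × 0.5 × 0.5 × 1.0 = 0.125.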

71
Non-11 Correspondences

Joint model
0:1
p(to,TO,AuxY,alignment) k01
1:0
p(Gen,NULL,ACT,align) k10
1:2
p(home,SG,LOC,in,IN,AuxP,home,NNS,Adv,alignment) k12
etc. + corresponding backoff scheme

72
Smoothing issues

Other backoff schemes?
Too many to do all
Graphical models?
Derive from (manual) alignments
esp. for types of alignment the model cannot handle (1:4, for example)

73
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
CZECH
ENGLISH
74
The Proposer

Yuan Ding

75
Map TR to AR
76
Proposer for Decoder

Collecting feature patterns on TR
Construct AR using the observed possible TR-AR transforms
For unobserved TR patterns, use a naïve mapping onto AR
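The proposer's two modes can be sketched as a lookup table from TR feature patterns to the AR transforms seen in training, with a naïve 1:1 mapping when the pattern was never observed. Which TR features form the pattern, and the string encoding of transforms, are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical representation: a TR node is a dict of features, and an AR
# "transform" is a short string describing the AR subtree built for it.
PATTERN_FEATURES = ("functor", "pos")


def pattern(tr_node):
    """The feature pattern used to index observed transforms."""
    return tuple(tr_node[f] for f in PATTERN_FEATURES)


class Proposer:
    def __init__(self):
        self.observed = defaultdict(set)  # pattern -> AR transforms seen in training

    def observe(self, tr_node, ar_transform):
        """Training: record which TR-AR transform occurred for this pattern."""
        self.observed[pattern(tr_node)].add(ar_transform)

    def propose(self, tr_node):
        """Decoding: all observed transforms, or a naive 1:1 mapping if unseen."""
        transforms = self.observed.get(pattern(tr_node))
        if transforms:
            return sorted(transforms)
        return ["1:1 copy"]
```

After observing a LOC noun realized both as a bare node and with an inserted preposition, `propose` returns both options for any LOC noun, while an unseen pattern such as an ACT noun gets only the naïve copy.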

77
Proposer Observes during Training
78
Proposer During Decoding
79
Example
State
80
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
Evaluation
CZECH
ENGLISH
81
The Classifier(s)

Terry Koo

82
Tree Transduction Model
Tree Transduction Models Decoder
Proposer

Global information in labels suppresses proposals

83
Preposition Insertion Labeler

C5.0 decision tree classifier
Labels nothing, insert_of,

84
Preposition Insertion Labeler

Trained on Input 1 (Automatic English TR)

85
Preposition Insertion Labeler

Some TR nodes should be ignored
fly to Baltimore and from Boston

86
Boosting Insertion Recall

Overgenerating is better than undergenerating
Using C5.0's misclassification costs to discourage the label "nothing"
Training on preposition-only data

87
Boosting Insertion Recall

N Best Labels
Confidence Threshold
N = average number of labels
Aggressive Confidence Threshold
N = average number of labels
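A sketch of the first two label-selection strategies, given a classifier's probability distribution over labels for one node, together with how recall and the average number of labels N can be measured. The function names are hypothetical, and the "aggressive" threshold variant is not sketched because the slides do not define it.

```python
def ranked(dist):
    """Labels sorted by decreasing classifier probability."""
    return sorted(dist, key=dist.get, reverse=True)


def n_best(dist, n):
    """Propose the n most probable labels."""
    return ranked(dist)[:n]


def confidence_threshold(dist, tau):
    """Propose every label with probability >= tau; always keep the top label."""
    labels = ranked(dist)
    chosen = [lab for lab in labels if dist[lab] >= tau]
    return chosen or labels[:1]


def recall_and_average_n(proposals, gold):
    """Fraction of gold labels among the proposals, and mean proposals per node."""
    hits = sum(g in p for p, g in zip(proposals, gold))
    return hits / len(gold), sum(map(len, proposals)) / len(proposals)
```

N-best always proposes exactly n labels, whereas the threshold proposes more labels only where the classifier is unsure, which is why the threshold strategies are plotted against the average N they produce.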

88
Insertion Recall vs N
[Chart: insertion recall R (%) vs. N for N Best, Confidence Threshold, and Aggressive Confidence Threshold; labeled points: N = 5, R = 84.35; N = 3, R = 80.26; N = 4, R = 80.59; N = 3, R = 76.39]
89
What should be done next?

Clustering TR Lemmas into a tractable number of
classes
Ripper instead of C5.0

90
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
CZECH
ENGLISH
91
Word Order

Dan Gildea

92
Word order

Tree-based models
Analytical level: surface dependency, tree-based
Collins model
Uses function information (Sb, Obj, Atr, ...), POS, lemmas
94% of nodes have correct ordering of children (chance: 68%)
No punctuation (inserted later)
Input order completely irrelevant
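To illustrate ordering the children of one head with a tree-based model, here is a deliberately simplified stand-in: a bigram model over the sequence of children's function labels, with the head marked as HEAD, and brute-force search over placements. This is not the Collins-style model used in the workshop; the probabilities are made up, and brute force is only viable for a handful of children. Because every permutation is scored, the input order indeed has no effect.

```python
import itertools
import math

# Made-up bigram probabilities over function labels around one head;
# "<s>" and "</s>" mark the edges of the head's local phrase.
BIGRAM = {
    ("<s>", "Sb"): 0.7, ("Sb", "HEAD"): 0.8, ("HEAD", "Obj"): 0.6,
    ("Obj", "Adv"): 0.4, ("Adv", "</s>"): 0.5, ("Obj", "</s>"): 0.5,
}
FLOOR = 1e-4  # probability assigned to unseen bigrams


def score(labels):
    padded = ["<s>", *labels, "</s>"]
    return sum(math.log(BIGRAM.get(pair, FLOOR)) for pair in zip(padded, padded[1:]))


def order_children(head, children):
    """Place the head and its (word, function) children in the best-scoring order."""
    items = children + [(head, "HEAD")]
    best = max(itertools.permutations(items),
               key=lambda perm: score([label for _, label in perm]))
    return [word for word, _ in best]
```

For example, `order_children("realize", [("mistake", "Obj"), ("she", "Sb")])` yields she, realize, mistake, whichever order the children are passed in.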

93
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
CZECH
ENGLISH
94
Punctuation & Morphology

Kristen Parton

95
Punctuation Insertion Motivation

Important for sentence meaning, understanding
BLEU: n-gram statistics
commas are the most frequent lemma in WSJ
Focusing on commas (95% of intra-sentence punctuation)
Difficulties
English comma usage very flexible
varies with style, meaning of sentence
quotes not marked in TR trees

96
Why insert commas separately?

Commas depend not only on underlying
syntax/semantics but also on the surface
realization of the sentence.
Soon, she will realize her mistake.
? Soon she will realize her mistake.
She will soon realize her mistake.
?? She will, soon, realize her mistake.
She will soon, realize her mistake.
She will, soon realize her mistake.
Channel model deals with unordered trees
Easier to do comma insertion after surface
ordering

97
Commas in AR Trees

TR tree: only autosemantic words, commas deleted
AR tree: commas are AuxX or Apos (apposition governors)
Input data: ordered, unpunctuated AR tree, with AR and TR functors, POS
Task: insert AuxX nodes into the AR tree, and link them in the correct surface order.

98
Another Example
99
Comma Insertion Model

C5.0 decision tree classifier
Trained on English AR trees with TR functors (sect. 0-19 WSJ) with punctuation stripped
Node labels: NO-ACTION, INSERT-RIGHT
Feature vectors:
Local features (AFun, TFun, POS)
For node, left/right brother, parent, grandparent
Global features (Zhang 02) (position in sentence,
)
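A sketch of building the local part of such a feature vector from an ordered AR tree: AFun, TFun and POS for the node, its nearest left and right brothers, its parent and its grandparent, plus relative sentence position as one example of a global feature. The `Node` class, feature names and the "-" filler for missing neighbours are hypothetical; the C5.0 classifier itself is not reproduced.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    form: str
    afun: str   # AR function, e.g. "Sb"
    tfun: str   # TR functor, e.g. "ACT"
    pos: str
    parent: Optional[int]  # index of the parent in surface order; None for the root


EMPTY = ("-", "-", "-")  # filler when a neighbour does not exist


def attrs(nodes, i):
    return EMPTY if i is None else (nodes[i].afun, nodes[i].tfun, nodes[i].pos)


def features(nodes, i):
    """Local features for node i and its tree neighbours, plus relative position."""
    parent = nodes[i].parent
    grandparent = None if parent is None else nodes[parent].parent
    siblings = [j for j, n in enumerate(nodes) if n.parent == parent and j != i]
    left = max((j for j in siblings if j < i), default=None)
    right = min((j for j in siblings if j > i), default=None)
    feats = {}
    for role, j in (("self", i), ("left", left), ("right", right),
                    ("parent", parent), ("grandparent", grandparent)):
        for name, value in zip(("afun", "tfun", "pos"), attrs(nodes, j)):
            feats[f"{role}_{name}"] = value
    feats["position"] = i / len(nodes)  # one example of a global feature
    return feats
```

For "Soon she will realize her mistake", the vector for "Soon" records no left brother, the subject "she" as right brother and the verb as parent; a classifier trained on such vectors would then decide whether to emit INSERT-RIGHT (a comma after "Soon").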

100
Decision Tree Model Results
Preliminary results: still based on hand-parsed WSJ

Evaluation metric is sentence accuracy
What is the (human) upper bound?
Systems are hard to compare: models and data sets are very different

101
Results for Generation

Comma insertion improves BLEU score
Possible improvements
Adding n-gram information to insertion model
Trying with other punctuation marks

102
Surface Morphology

Morphology dictionary: 365 M words (Curín)
morpha (morph analyzer): lemmatize words, keep counts
Word → POS, surface_form, lemma, frequency
POS   surface_form    lemma           frequency
NN    want-you-babe   want-you-babe   1
VBD   wanted          want            45595
VBD   wanting         want            1
VBG   wanting         want            3708
Task: lemma + POS → surface form (reverse lookup)
Clashes resolved by frequency
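The reverse lookup can be sketched by inverting the analysis rows into a (lemma, POS) → surface form table, keeping the more frequent form when two rows clash. The rows are the ones listed above; the function names and the lemma-as-default fallback for OOV pairs (following the observation that for many OOV words the surface form equals the lemma) are illustrative assumptions.

```python
# Rows of (POS, surface form, lemma, corpus frequency), as listed above.
ROWS = [
    ("NN", "want-you-babe", "want-you-babe", 1),
    ("VBD", "wanted", "want", 45595),
    ("VBD", "wanting", "want", 1),
    ("VBG", "wanting", "want", 3708),
]


def build_generation_table(rows):
    """Invert the analysis dictionary: (lemma, POS) -> most frequent surface form."""
    best = {}
    for pos, surface, lemma, freq in rows:
        key = (lemma, pos)
        if key not in best or freq > best[key][1]:
            best[key] = (surface, freq)  # clash: keep the more frequent form
    return {key: surface for key, (surface, _) in best.items()}


def generate(table, lemma, pos):
    """Reverse lookup; for OOV pairs fall back to the lemma itself."""
    return table.get((lemma, pos), lemma)
```

The clash for (want, VBD) between "wanted" (45595) and the mis-tagged "wanting" (1) is resolved in favor of "wanted".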

103
Morphology Dictionary

Initial tests: half of the errors came from the ambiguity in the verb "to be" between singular and plural.
Be VBP → (I) am or (we/they) are ??
Be VBD → (I/he/she/it) was or (we/you/they) were ??
Introduced an entry "be2" corresponding to a plural subject.
Test: use the full dictionary (plus "be2" entries), 902,220 entries total, to generate surface forms for the lemmatized version of WSJ sections 0-21.

Be2 VBP → are    Be2 VBD → were
Be VBP → am      Be VBD → was
104
Surface Morphology Results

OOV rate: 1.69%
For many OOV words, surface form = lemma, so correct by default
English morphology not complex: most OOV are proper nouns
Non-OOV words: 99.74% accuracy
86% of mistakes were contractions ('m vs. am, 've vs. have, etc.), which are actually correct. Ignoring these: 99.96% correct rate
Overall, ignoring contractions: error rate of 0.03%
High accuracy rate, fast runtime, good coverage: unnecessary to improve more

105
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
CZECH
ENGLISH
106
Improving Czech Parsing (AR-TR)

Gerald Penn

107
Improving Czech TR Parsing

Pre-workshop state
Czech Deep Syntax mapping AR to TR (Boehmova,
Honetschlaeger, Zabokrtsky)
Two parts of the system
rule-based
19 transformations by order-dependent Perl code
statistical
C4.5-based labeling of TR functions: 84% accuracy

108
Czech Deep Syntax mapping AR to TR

New statistical system
tree transduction model has same form as for
generation
little-tree model reversed for parsing (AR to
TR mapping)
initial EM pass uses simple model based on PDT
(manual) node ID alignment
(reversed) proposer not finished yet

109
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
CZECH
ENGLISH
110
The Hybrid Approach to Generation: The ARGENT System

Dragomir Radev

111
Example
112
Alan Spoon, recently named Newsweek president,
said Newsweek's ad rates would increase 5 pct in
January.
113
NLG architectures

Statistical approaches
MAGENTA Hajič et al. 02
Rule-based approaches
FUF/Surge Elhadad 93, Elhadad and Robin 98
KPML Bateman 97
Hybrid approaches
NitroGen Knight and Hatzivassiloglou 95
HaloGen Langkilde and Knight 00
Fergus Bangalore and Rambow 00
ARGENT

114
FUF/Surge

What FUF can do (given sufficient control
information)
Maps FUF-style thematic structure onto syntactic
roles
Performs syntactic paraphrasing and alternations
(e.g., dative move, passive)
Provides defaults for syntactic features (e.g.,
present tense, third person)
Propagates agreement features
Selects closed class words
Inflects words
Provides linear precedence constraints among
syntactic constituents
What FUF cannot do
convert dependency to phrase-structure
provide control for syntactic paraphrasing
provide control for lexical features
(conditionals, past tense, )
choose determiners
provide a robust grammar

115
(setq r
  '((process ((lex "say")
              (tense past)
              (object-clause that)))
    (circum ((time ((cat pp)
                    (prep ((lex "in")))
                    (np ((lex "January")
                         (determiner none)))))))
    (partic ((affected ((cat clause)
                        (process ((lex "increase")
                                  (tense past)))
                        (partic ((created ((cat measure)
                                           (quantity ((value 5)))
                                           (unit ((lex "pct")))))
                                 (agent ((cat np)
                                         (head ((lex "rate")
                                                (number plural)
                                                (determiner none)))
                                         (classifier ((lex "ad")
                                                      (determiner none)))
                                         (possessor ((lex "Newsweek")
                                                     (determiner none)))))))))
             (agent ((complex apposition)
                     (punctuation ((after ",")))
                     (distinct (((lex "Spoon")
                                 (classifier ((lex "Alan")))
                                 (determiner none))
                                ((lex "name")
                                 (classifier ((lex "president")))
                                 (determiner none))))))))))
116
(No Transcript)
117
Grammar development

translating TG → FUF (deterministic channel)
write high-coverage rules first
problem: no aligned training data
four types of rules (Langkilde-Geary 02): recasting, ordering, filling, morphing
Three modules
Top-level
Recursion
Bottom-level

118
Evaluation

Robustness
ARGENT: 245/248 sentences (98.7%)
HaloGen: 80%
Speed
ARGENT: 1.4-2.9 sec/sentence
HaloGen: 28.9-55.5 sec/sentence
BLEU score -- later

119
Future work

Complete grammar
improve coverage
use other grammatemes
degree of comparison (comparative), sentmod (interrogative), verbmod (imperative)
Better error recovery
inconsistent PTB markup, TR transformation,
translation
Grammar induction
N-gram based insertion of missing words
Integrate with MAGENTA

120
Where are we?
Transfer
English TR to AR
Deep syntax (Czech)
Word Order
Punctuation
Morphology
Evaluation
CZECH
ENGLISH
121
The Implemented Systems: Creating Data for Generation