Generation in the Context of MT - PowerPoint PPT Presentation


Transcript and Presenter's Notes

Title: Generation in the Context of MT


1
Generation in the Context of MT
  • Final Report

2
The Team
  • Senior members & affiliate members
  • Jan Hajič, Charles Univ., Prague
  • Dragomir Radev, Univ. of Michigan
  • Gerald Penn, Univ. of Toronto
  • Jason Eisner, Johns Hopkins Univ.
  • Owen Rambow, Univ. of Pennsylvania
  • Dan Gildea, Univ. of Pennsylvania
  • Bonnie Dorr, Univ. of Maryland
  • Students
  • Yuan Ding, Univ. of Pennsylvania
  • Martin Čmejrek, Charles Univ., Prague
  • Terry Koo, MIT
  • Kristen Parton, Stanford Univ.
  • Jan Cuřín, Charles University
  • Ivona Kučerová, Charles University
  • Pre-workshop work (Charles University)
  • Zdeněk Žabokrtský, Petr Pajas
  • Václav Honetschläger, Alena Böhmová
  • Vladislav Kuboň, Jiří Havelka

3
The Goal
  • Generate English (linear surface form)
  • from a syntactic-semantic sentence representation
    (so-called tectogrammatical, or TR)
  • Possible application settings
  • machine translation
  • other uses
  • front-end for QA systems, summarization
  • Evaluate under various circumstances

4
Tectogrammatical Representation
According to his opinion UAL's executives were misinformed about the financing of the original transaction
5
Tectogrammatical Representation
According to he opinion UALs executive were
misinform about the financing of the original
transaction
6
TR in Machine Translation
Vedení UAL bylo podle jeho názoru o financování původní transakce nesprávně informováno.
("According to his opinion, UAL's executives were misinformed about the financing of the original transaction.")
7
The MT Framework
[Diagram: source language text (CZECH) entering the MT pipeline]
8
The MT Framework
[Diagram: the MT pipeline via AR trees, from CZECH to ENGLISH]
9
Translating trees
[Diagram: a source dependency tree with nodes a-f aligned to a target tree with nodes A-F]
10
Tools and Data Resources
  • Tools
  • WS'98 Czech parser & other Czech tools (tagger)
  • GIZA (WS'99), ISI decoder
  • Data
  • PTB (40k sentences)
  • PTB translation to Czech (11k sentences)
  • Prague Dependency Treebank 1.0 (90k sentences)
  • Prague Dependency Treebank 2.0 preliminary
  • 15k sentences manually annotated
  • Monolingual data

11
The Evaluation Metric: BLEU
  • Plain English output (MT, Generation)
  • difficult and/or expensive to evaluate subjectively
  • BLEU (IBM)
  • automatic method, score 0..1
  • relative scores track subjective human evaluation
  • needs several reference (gold-standard) translations
  • n-gram-based metric with a brevity (length) penalty (sketched below)
  • Different local evaluations throughout, too
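A minimal sketch of the BLEU idea (clipped n-gram precisions for n = 1..4 combined with a brevity penalty); the zero-count smoothing and single-sentence scope are our simplifications, not IBM's implementation:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, references, max_n=4):
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty.  hypothesis/references are token lists."""
    log_prec_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hypothesis, n))
        # Clip each n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = max(1, sum(hyp_counts.values()))
        log_prec_sum += math.log(max(clipped, 1e-9) / total)  # crude zero smoothing
    # Brevity penalty: punish hypotheses shorter than the closest reference.
    ref_len = min((abs(len(r) - len(hypothesis)), len(r)) for r in references)[1]
    bp = 1.0 if len(hypothesis) > ref_len else math.exp(1 - ref_len / max(1, len(hypothesis)))
    return bp * math.exp(log_prec_sum / max_n)

hyp = "the cat is on the mat".split()
refs = ["the cat sat on the mat".split(), "there is a cat on the mat".split()]
print(round(bleu(hyp, refs), 3))
```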

12
Presentation Outline
  • The Systems and Their Inputs
  • Getting the data & tools ready
  • The Statistical Generation System
  • The channel model
  • Word order, Punctuation, Morphology
  • The Hybrid Approach
  • Evaluation Results
  • Student Project Proposals
  • Conclusions and Future Directions

13
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH]
14
The Systems and Their Inputs
  • Martin Cmejrek

15
WS02GMT
  • System 1: statistical
  • System 2: hybrid
  • Output: English linear surface form
  • Input 1: automatically created English TR
  • Input 2: manually created English TR
  • Input 3: improved automatic English TR (PropBank)
  • Input 4: Czenglish TR (simple translation)

16
Input 1: Automatic English TR
  • Penn Treebank v. 3
  • heads (Jason Eisner's code + modifications)
  • lemmatization
  • word IDs
  • rule-based transformation to English AR, TR
  • (by Kučerová & Žabokrtský)
  • → English TR (I1), size: 40k sentences

17
Input 2: Manual English TR
  • Penn Treebank v. 3
  • Input 1
  • manual annotation (correction) (IK)
  • including
  • deep word order, conversion of grammatical
    codes
  • → English TR (I2), size: 1.5k sentences

18
Input 3: Enhanced Automatic English TR
  • Penn Treebank v. 3
  • Input 1
  • PropBank
  • additional sources
  • → English TR (I3), size: 40k sentences

19
Input 4: Automatic Czenglish TR
  • Linear surface Czech
  • Czech tagging & lemmatization
  • Parsed to Czech AR, Czech TR
  • Simple transfer (lemma translation)
  • lexical replacement
  • dictionary collected from web, MRDs
  • trained on TR lemmas by GIZA
  • → Czenglish TR (I4), size: 11k sentences

20
Dictionary Filtering
[Flowchart (input data sources → tools → output): 4 Czech/English dictionary sources (WinGED, GNU/FDL, PCTrans, EuroWordNet), with Czech POS and English POS, pass through merging and pruning; frequencies from an English monolingual corpus (North American News Text, 365 M words) inform the pruning; GIZA training on the Czech/English parallel Penn Treebank corpus then yields the Czech/English dictionary for transfer]
21
Word-by-word translation of TR lemmas
  • Word-by-word dictionary: 42,835 entries, 65,408 translations
  • format:
    <e>tecka<t>N
    <tr>spot<trt>N<prob>0.353598
    <tr>dot<trt>N<prob>0.28792
    <tr>full@stop<trt>N<prob>0.28729
  • 1-1, 1-2 (2-1 translations not yet implemented)
  • packed-forest representation for multiple translation choices
  • simplified version: choose the first (best) translation (see the sketch below)
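A sketch of first-best lookup over entries like the one above; the in-memory structure and fallback behaviour are our assumptions (the full system keeps all candidates in a packed forest instead):

```python
# Czech lemma + POS -> translations sorted by GIZA-trained probability.
DICTIONARY = {
    ("tecka", "N"): [("spot", "N", 0.353598),
                     ("dot", "N", 0.28792),
                     ("full@stop", "N", 0.28729)],
}

def translate_lemma(lemma, pos):
    """Simplified version: choose the first (most probable) translation."""
    candidates = DICTIONARY.get((lemma, pos))
    if candidates is None:
        return lemma, pos          # fall back: copy the source lemma through
    best_lemma, best_pos, _prob = candidates[0]
    return best_lemma, best_pos

print(translate_lemma("tecka", "N"))   # -> ('spot', 'N')
```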

22
Where are we?
[Pipeline diagram, now with additional info: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH]
23
Automatically Annotating a Tectogrammatical Corpus
  • Owen Rambow

24
Goal
  • Use PropBank annotations to
  • Improve automatic construction of English TRs
  • Allow generation from generic pred-arg
    structures

25
Types of Corpus Annotation
  • Surface Syntax
  • Deep Syntax
  • Local Lexical Semantics
  • Global Lexical Semantics
  • Hybrid Deep Syntactic/Global Semantic
  • Tectogrammatical level used here

26
Surface Syntax: E.g., Penn Treebank
[Diagram: surface dependency tree for the sentence below - is/loaded with subj, comp, and prepobj arcs over hay, into, by, John, trucks]
Hay is loaded into trucks by John
27
Deep Syntax: E.g., TAG
John loads hay into trucks
Hay is loaded into trucks by John
28
Local Semantics: Penn PropBank (brand new)
John loads hay into trucks
John loads trucks with hay
29
Global Semantics: LCS (U. Md.)
John loads hay into trucks
John throws hay into trucks
30
Tectogrammatical Representation
  • First two syntactic arguments of the verb: deep-syntactic
  • All other arguments: global semantic
[TR diagrams for three sentences: 'John loads hay into trucks' (load: act = John, pat = hay, dir3 = truck); 'John loads trucks with hay' (load: act = John, pat = truck, acmp = hay); 'John throws hay into trucks' (throw: act = John, pat = hay, dir3 = truck)]
31
Why Use TR? Research Hypothesis
  • Replacing function words by TR arc labels makes transfer easier
  • Choice of realization: target-language-dependent
  • Deep-syntactic labels for the first two arguments: realization more verb-specific
  • Global semantic labels on the remaining arguments: realization just label-specific

32
Available Resources for Input 3
  • Surface syntax: PTB corpus (hand-annotated, checked)
  • Deep syntax: derived automatically from the PTB (Chen 01)
  • Local semantics: PropBank corpus and frame lexicon (hand-annotated, checked)
  • Global semantics: LCS lexicon (partially hand-annotated, partially checked)
  • TR: PTB subset corpus (hand-annotated); PropBank → TR dictionary (hand-crafted, not checked) (I. Kučerová)

33
Experiment: Machine Learning of TR Labels Using Ripper
  • Ripper (Cohen 1996): greedy symbolic rule learner, set- and bag-valued features
  • Features:
  • surface, deep syntactic info
  • local, global semantic info
  • Kučerová's PropBank → TR dictionary (hand-crafted)
  • Input 1 (Automatic English TR)

34
Results (TR Label Error Rates)
Average TR-label error rate (%), 5-fold cross-validation (1,326 data points):

Syntax \ Semantics    none   local  local+global  all    PB→TR dict
none                  58.8   25.9   23.7          22.6   37.7
Input 1               19.5   17.7   16.3          15.9   17.1
surface-deep          16.5   16.4   17.1          16.7   16.2
surface-deep + Inp1   15.5   15.9   16.2          16.1   14.4
35
Conclusions
  • Machine learning can improve on hand-written conversion rules (= Input 1)
  • PropBank is useful
  • Best results:
  • all syntactic features + PropBank → TR dictionary
  • Future work: use the PropBank → LCS dictionary (developed during the workshop)

36
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → ENGLISH]
37
The MAGENTA System
  • Statistically based
  • The pipeline:
  • TR to AR by a channel model
  • Word order by reordering of dependency trees
  • Punctuation insertion
  • Morphology

38
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → ENGLISH]
39
The Tree-to-Tree Transductions
  • Jason Eisner

[Diagram: a source tree (nodes a-f, with det and prep arcs) aligned to a target tree (nodes A-F)]
40
Translating trees
[Diagram: aligned trees in which source nodes b + c map to a single target node]
learn this 2-1 mapping (or in dictionary)
Also 1-2, 2-0, etc., rearrangements ...
0-1 mapping
41
Translating trees
[Diagram: the aligned tree pair again, with the correspondences marked]
42
Statistical: Need a model of tree pairs
Mainly interested in (TR, AR) pairs. But our techniques are quite general; e.g., the example below is not a (TR, AR) pair:
the girl kissed her kitty cat
the girl gave a kiss to her cat
43
Training: Our team has many tree pairs
Should be nicer to model than string pairs - that's why we built them! What Czech trees went with what English trees in training? ... Learn parameters θ of a joint model P_θ(T1, T2).
the girl kissed her kitty cat
[Diagram: dependency tree for the sentence above, with labeled nodes: Pred = kissed, Subj = girl, Obj = cat, Det = the, Det = her, kitty]
44
Decoding Complete a tree pair
Training: given T1 and T2, find θ to maximize P_θ(T1, T2). Decoding: given T1 and θ, find T2 to maximize P_θ(T1, T2). Horrible sparse-data problem - can't just do tree lookup.
[Diagram: TR tree for 'the girl kissed her kitty cat'; the corresponding AR tree is unknown]
45
How should a model of tree pairs look?
Joint model P_θ(T1, T2).
Wise to use noisy-channel form P_θ(T1 | T2) · P_θ(T2).
But any joint model will do.
46
How should a model P_θ(T1, T2) of tree pairs look?
Intuition: some kind of correspondence between words.
Try to learn the correspondence using EM alignment (could seed with a dictionary).
the girl kissed her kitty cat
the girl gave a kiss to her cat
47
How should a model P_θ(T1, T2) of tree pairs look?
Intuition: some kind of correspondence between words.
Try to learn the correspondence using EM alignment (could seed with a dictionary).
the girl kissed her kitty cat
the girl gave a kiss to her cat
different, bad alignment!
48
How should a model P_θ(T1, T2) of tree pairs look?
Intuition: some kind of correspondence between words.
Try to learn the correspondence using EM alignment (could seed with a dictionary; a word-level sketch follows the list below).
  • So the model must consider the alignment: P_θ(T1, T2, A)
  • Why A is complicated:
  • The correspondence isn't 1-to-1
  • Also need to model word order (indeed topology)
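The EM-alignment idea can be sketched at the word level, IBM-Model-1 style; the toy corpus, uniform initialization, and iteration count are illustrative assumptions (a dictionary seed would replace the uniform start):

```python
from collections import defaultdict

# Toy parallel data; the second pair gives EM co-occurrence contrasts to learn from.
pairs = [
    ("the girl kissed her kitty cat".split(),
     "the girl gave a kiss to her cat".split()),
    ("the boy kissed her dog".split(),
     "the boy gave a kiss to her dog".split()),
]

t = defaultdict(lambda: 1.0)           # t(w2 | w1), uniform start

for _ in range(20):                    # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for s1, s2 in pairs:
        for w2 in s2:                  # E-step: expected alignment counts
            z = sum(t[(w1, w2)] for w1 in s1)
            for w1 in s1:
                c = t[(w1, w2)] / z
                count[(w1, w2)] += c
                total[w1] += c
    for key, c in count.items():       # M-step: re-estimate t
        t[key] = c / total[key[0]]

# Consistent co-occurrence wins: 'kissed' prefers 'kiss' over 'dog'.
print(t[("kissed", "kiss")] > t[("kissed", "dog")])   # True
```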

49
Solution: Use the right grammar formalism
Grammars can assemble words or phrases into trees. Let's work up to the right formalism.
  • Model must consider the alignment: P_θ(T1, T2, A)
  • Why A is complicated:
  • The correspondence isn't 1-to-1
  • Also need to model word order (indeed topology)
  • kiss ↔ gave a kiss
  • cat ↔ kitty cat
  • ∅ ↔ to
the girl kissed her kitty cat
the girl gave a kiss to her cat
50
Context-Free Grammar
the girl kissed her cat
[Diagram: CFG parse tree of the sentence, root S, plus further rules]
51
Augment CFG nonterminals with headwords
the girl kissed her cat
[Diagram: parse tree with headword-annotated nonterminals, root S,kissed]
52
Augment CFG nonterminals with headwords
the girl kissed her cat
[Diagram: parse tree, root S,kissed]
look at all the rules headed by kissed ...
53
Lexicalized Tree Substitution Grammar
the girl kissed her cat
[Diagram: the elementary tree headed by kissed - S,kissed over an NP slot and VP,kissed; VP,kissed over V,kissed and an NP slot - with callouts:]
look at all the rules headed by kissed ... a natural chunk
open role waiting to be filled
can fill open roles higher up
54
Lexicalized Tree Substitution Grammar
the girl kissed her cat
55
Lexicalized Tree Substitution Grammar
[Diagram: the full parse of 'the girl kissed her cat' assembled from elementary trees (S, NP, VP, V, Det, N nodes)]
56
Dependency-Style Lexicalized Tree Substitution
Grammar
Simplify structure: eliminate extra internal nodes; just one node per word (dependency style). Yields the kind of AR and TR trees we actually have.
57
Dependency-Style Lexicalized Tree Substitution
Grammar
the girl kissed her kitty cat
58
Synchronous Dependency-Style Lexicalized Tree
Substitution Grammar
the girl kissed her kitty cat
the girl gave a kiss to her cat
59
Synchronous Dependency-Style Lexicalized Tree
Substitution Grammar
the girl kissed her kitty cat
the girl gave a kiss to her cat
60
Synchronous Dependency-Style Lexicalized Tree
Substitution Grammar
the girl kissed her kitty cat
the girl gave a kiss to her cat
61
Synchronous Dependency-Style Lexicalized Tree
Substitution Grammar
the girl kissed her kitty cat
the girl gave a kiss to her cat
62
P(T1, T2, A) = ∏_n p(t1, t2, a)
So any aligned BIG TREE PAIR is built from a set of aligned LITTLE TREE PAIRS
63
P(T1, T2, A) = ∏_n p(t1, t2, a): How This Simplifies Things
  • Alignment: find A to max P_θ(T1, T2, A)
  • Decoding: find T2, A to max P_θ(T1, T2, A)
  • Training: find θ to max Σ_A P_θ(T1, T2, A)
  • Do everything on little trees instead!
  • Only need to train & decode a model of p_θ(t1, t2, a)
  • But we are not sure how to break up a big tree correctly
  • So try all possible little trees & all ways of combining them, by dynamic programming (see the sketch below)
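A minimal monolingual sketch of "all possible little trees & all ways of combining them": each subset of cut edges yields one decomposition of the big tree into little trees, and their probabilities are summed. The toy tree and the placeholder elementary-tree score are our assumptions, and brute-force enumeration stands in for the packed-forest dynamic program:

```python
from itertools import combinations

# Toy dependency tree: head -> list of dependents.
TREE = {"kissed": ["girl", "cat"], "girl": ["the"], "cat": ["her", "kitty"],
        "the": [], "her": [], "kitty": []}
EDGES = [(h, d) for h, ds in TREE.items() for d in ds]

def component(root, cut):
    """Nodes of the little tree rooted here (stop at cut edges)."""
    nodes = [root]
    for d in TREE[root]:
        if (root, d) not in cut:
            nodes += component(d, cut)
    return nodes

def p_little_tree(nodes):
    """Placeholder elementary-tree score; the real system scores aligned
    little TREE PAIRS with the trained model p_theta(t1, t2, a)."""
    return 0.1 * 0.5 ** len(nodes)

def total_probability():
    """Sum the score over all 2^|edges| ways to cut the big tree into
    little trees.  The real system computes such sums with a packed-forest
    dynamic program instead of this enumeration."""
    total = 0.0
    for k in range(len(EDGES) + 1):
        for cut in combinations(EDGES, k):   # each edge: cut or not
            cut = set(cut)
            roots = ["kissed"] + [d for (_, d) in cut]
            score = 1.0
            for r in roots:                  # one little tree per root
                score *= p_little_tree(component(r, cut))
            total += score
    return total

print(total_probability())
```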

64
System Architecture
[Diagram: the probability model p_θ(t1, t2, a) of little trees sits between the Decoder and the Trainer. The Decoder asks it to score little trees (find p(...)) and proposes little translations t2 (to make p(...) big); the Trainer updates the parameters θ (to raise p(...)). A shared dynamic programming engine scores all alignments between a big tree T1 and a forest of big trees T2, and all alignments of two big trees T1, T2]
65
System Architecture
[Diagram: the same architecture, with the dynamic programming engine producing the output]
66
Related Work
  • Synchronous grammars (Shieber & Schabes 1990)
  • Statistical work has allowed only 1-1 (isomorphic trees)
  • Stochastic inversion transduction grammars (Wu 1995)
  • Head transducer grammars (Alshawi et al. 2000)
  • Statistical tree translation
  • Noisy channel model (Yamada & Knight 2000)
  • Infers trees; trains on (string, tree) pairs, not (tree, tree) pairs
  • But again, allows only 1-1, plus 1-0 at leaves
  • Statistical tree generation - find the most probable realization expressing a meaning
  • Dynamic programming search in a packed forest (Langkilde 2000)
  • Stack decoder (Ratnaparkhi 2000)

67
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH]
68
The Little Trees
  • Jan Hajič

p_θ( [diagram: a little tree pair] )
69
p_θ( [diagram: a little tree pair] )
  • Data still sparse, but better than for big trees
  • No alignment needed - already hypothesized for us

70
Form of the model for 1-1 (AR-TR)
  • Base form:
  • p(⟨cat, PL, PAT⟩, ⟨cat, NNS, Obj⟩, alignment)
  • High-level backoff:
  • p(cat, cat) · p(PL, NNS) · p(PAT, Obj) · p(alignment)
  • Low-level backoff:
  • p(align) · (1/L)(1/T)(1/F), where
  • L = size of the ⟨Tlemma, Alemma⟩ vocabulary, etc. (a sketch of the interpolation follows)
71
Non-1-1 Correspondences
  • Joint model
  • 0-1:
  • p(⟨to, TO, AuxY⟩, alignment) · k0-1
  • 1-0:
  • p(⟨GenNULL, ACT⟩, align) · k1-0
  • 1-2:
  • p(⟨home, SG, LOC⟩, ⟨in, IN, AuxP⟩, ⟨home, NNS, Adv⟩, alignment) · k1-2
  • etc.; corresponding backoff scheme

72
Smoothing issues
  • Other backoff schemes?
  • Too many to do all
  • Graphical models?
  • Derive from (manual) alignments
  • esp. for types of alignment the model cannot
    handle (1-4, for example)

73
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH]
74
The Proposer
  • Yuan Ding

75
Map TR to AR
76
Proposer for Decoder
  • Collecting feature patterns on TR
  • Construct AR using observed possible TR-AR transforms
  • For unobserved TR patterns, use a naïve mapping onto AR

77
Proposer Observes during Training
78
Proposer During Decoding
79
Example
80
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH, followed by evaluation]
81
The Classifier(s)
  • Terry Koo

82
Tree Transduction Model
[Diagram: the Tree Transduction Model's Decoder consulting the Proposer]
  • Global information in labels suppresses proposals

83
Preposition Insertion Labeler
  • C5.0 decision tree classifier
  • Labels: nothing, insert_of, ... (see the sketch below)
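C5.0 itself is a commercial tool; a rough stand-in using scikit-learn's DecisionTreeClassifier (the features, labels beyond the two named above, and training examples are invented for illustration):

```python
# Sketch of the preposition-insertion labeler with scikit-learn standing
# in for C5.0; feature set and training data are invented toy examples.
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier

# One feature dict per TR node: its functor, POS, and the parent's lemma.
train_features = [
    {"tfun": "PAT", "pos": "NNS", "parent": "financing"},
    {"tfun": "ACT", "pos": "NNP", "parent": "misinform"},
    {"tfun": "LOC", "pos": "NN",  "parent": "fly"},
]
train_labels = ["insert_of", "nothing", "insert_to"]

vec = DictVectorizer()
X = vec.fit_transform(train_features)
clf = DecisionTreeClassifier().fit(X, train_labels)

test = vec.transform([{"tfun": "PAT", "pos": "NNS", "parent": "financing"}])
print(clf.predict(test))   # -> ['insert_of']
```

In the same spirit, class weights in the stand-in could play the role of C5.0's misclassification costs, used on a later slide to boost insertion recall.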

84
Preposition Insertion Labeler
  • Trained on Input 1 (Automatic English TR)

85
Preposition Insertion Labeler
  • Some TR nodes should be ignored
  • fly to Baltimore and from Boston

86
Boosting Insertion Recall
  • Overgenerating better than undergenerating
  • Using C5.0's misclassification costs to discourage "nothing"
  • Training on preposition-only data

87
Boosting Insertion Recall
  • N Best Labels
  • Confidence Threshold
  • N Average of Labels
  • Aggressive Confidence Threshold
  • N Average of Labels

88
Insertion Recall vs N
[Chart: insertion recall R vs. N for the N-Best, Confidence Threshold, and Aggressive Confidence Threshold strategies. Data points: N = 3, R = 76.39% (N Best / Confidence Threshold); N = 3, R = 80.26%; N = 4, R = 80.59%; N = 5, R = 84.35% (Aggressive Confidence Threshold)]
89
What should be done next?
  • Clustering TR Lemmas into a tractable number of
    classes
  • Ripper instead of C5.0

90
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH]
91
Word Order
  • Dan Gildea

92
Word order
  • Tree-based models
  • Analytical level: surface dependency, tree-based
  • Collins model
  • Uses function information (Sb, Obj, Atr, ...), POS, lemmas
  • 94% of nodes have the correct ordering of children (chance: 68%)
  • No punctuation (inserted later)
  • Input order completely irrelevant (a sketch of the idea follows)
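One way to picture tree-based ordering: each head's children are ordered independently, here by maximizing a product of pairwise precedence scores over function labels. The scores below are invented placeholders; the actual system is a Collins-style model over functions, POS, and lemmas:

```python
from itertools import permutations

# Assumed precedence scores: p(label_a precedes label_b).
PRECEDES = {("Sb", "Obj"): 0.9, ("Obj", "Sb"): 0.1,
            ("Sb", "Adv"): 0.7, ("Adv", "Sb"): 0.3,
            ("Obj", "Adv"): 0.6, ("Adv", "Obj"): 0.4}

def order_children(children):
    """Pick the child order maximizing the product of pairwise
    precedence scores (function labels like Sb, Obj, Adv)."""
    def score(order):
        s = 1.0
        for i in range(len(order)):
            for j in range(i + 1, len(order)):
                s *= PRECEDES.get((order[i], order[j]), 0.5)
        return s
    return max(permutations(children), key=score)

print(order_children(["Obj", "Sb", "Adv"]))   # -> ('Sb', 'Obj', 'Adv')
```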

93
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH]
94
Punctuation & Morphology
  • Kristen Parton

95
Punctuation Insertion: Motivation
  • Important for sentence meaning, understanding
  • BLEU - n-gram statistics
  • commas are the most frequent lemma in the WSJ
  • Focusing on commas (95% of intra-sentence punctuation)
  • Difficulties
  • English comma usage is very flexible
  • varies with style, meaning of sentence
  • quotes not marked in TR trees

96
Why insert commas separately?
  • Commas depend not only on underlying
    syntax/semantics but also on the surface
    realization of the sentence.
  • Soon, she will realize her mistake.
  • ? Soon she will realize her mistake.
  • She will soon realize her mistake.
  • ?? She will, soon, realize her mistake.
  • She will soon, realize her mistake.
  • She will, soon realize her mistake.
  • Channel model deals with unordered trees
  • Easier to do comma insertion after surface
    ordering

97
Commas in AR Trees
  • TR tree - autosemantic words only; commas deleted
  • AR tree - commas are AuxX or Apos (apposition governors)
  • Input data: ordered, unpunctuated AR tree, with AR and TR functors, POS
  • Task: insert AuxX nodes into the AR tree, and link them in correct surface order

98
Another Example
99
Comma Insertion Model
  • C5.0 decision-tree classifier
  • Trained on English AR trees with TR functors (sections 00-19 of the WSJ) with punctuation stripped
  • Node labels: NO-ACTION, INSERT-RIGHT
  • Feature vectors (sketched below)
  • local features (AFun, TFun, POS)
  • for the node, left/right brother, parent, grandparent
  • global features (Zhang 02) (position in sentence, ...)

100
Decision Tree Model Results
Preliminary results - still based on hand-parsed WSJ
  • Evaluation metric is sentence accuracy
  • What is the (human) upper bound?
  • Systems are hard to compare: models and data sets very different

101
Results for Generation
  • Comma insertion improves BLEU score
  • Possible improvements
  • Adding n-gram information to insertion model
  • Trying with other punctuation marks

102
Surface Morphology
  • Morphology dictionary - 365 M words (Cuřín)
  • morpha (morphological analyzer) - lemmatize words, keep counts
  • Word → POS, surface_form, lemma, frequency:

    NN   want-you-babe   want-you-babe       1
    VBD  wanted          want            45595
    VBD  wanting         want                1
    VBG  wanting         want             3708

  • Task: lemma + POS → surface form (reverse lookup; sketched below)
  • Clashes resolved by frequency
103
Morphology Dictionary
  • Initial tests: half of the errors came from the ambiguity of the verb "to be" between singular and plural:
  • Be + VBP → (I) am or (we/they) are ??
  • Be + VBD → (I/he/she/it) was or (we/you/they) were ??
  • Introduced an entry "be2" corresponding to a plural subject:
  • Be VBP → am; Be VBD → was
  • Be2 VBP → are; Be2 VBD → were
  • Test: use the full dictionary (plus the "be2" entries) - 902,220 entries total - to generate surface forms for a lemmatized version of WSJ sections 0-21.
104
Surface Morphology Results
  • OOV rate: 1.69%
  • For many words, surface form = lemma, so correct by default
  • English morphology is not complex; most OOV words are proper nouns
  • Non-OOV words: 99.74% accuracy
  • 86% of mistakes were contractions: 'm = am, 've = have, etc. (actually correct). Ignoring these: 99.96% correct
  • Overall, ignoring contractions: error rate of 0.03%
  • High accuracy, fast runtime, good coverage - unnecessary to improve further

105
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH]
106
Improving Czech Parsing (AR → TR)
  • Gerald Penn

107
Improving Czech TR Parsing
  • Pre-workshop state
  • Czech deep syntax: mapping AR to TR (Böhmová, Honetschläger, Žabokrtský)
  • Two parts of the system
  • rule-based
  • 19 transformations by order-dependent Perl code
  • statistical
  • C4.5-based labeling of TR functions: 84% accuracy

108
Czech Deep Syntax: mapping AR to TR
  • New statistical system
  • tree transduction model has the same form as for generation
  • little-tree model reversed for parsing (AR-to-TR mapping)
  • initial EM pass uses a simple model based on PDT (manual) node ID alignment
  • (reversed) proposer not finished yet

109
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH]
110
The Hybrid Approach to Generation: The ARGENT System
  • Dragomir Radev

111
Example
112
Alan Spoon, recently named Newsweek president, said Newsweek's ad rates would increase 5 pct in January.
113
NLG architectures
  • Statistical approaches
  • MAGENTA (Hajič et al. 02)
  • Rule-based approaches
  • FUF/Surge (Elhadad 93; Elhadad and Robin 98)
  • KPML (Bateman 97)
  • Hybrid approaches
  • Nitrogen (Knight and Hatzivassiloglou 95)
  • HaloGen (Langkilde and Knight 00)
  • Fergus (Bangalore and Rambow 00)
  • ARGENT

114
FUF/Surge
  • What FUF can do (given sufficient control
    information)
  • Maps FUF-style thematic structure onto syntactic
    roles
  • Performs syntactic paraphrasing and alternations
    (e.g., dative move, passive)
  • Provides defaults for syntactic features (e.g.,
    present tense, third person)
  • Propagates agreement features
  • Selects closed class words
  • Inflects words
  • Provides linear precedence constraints among
    syntactic constituents
  • What FUF cannot do
  • convert dependency to phrase-structure
  • provide control for syntactic paraphrasing
  • provide control for lexical features (conditionals, past tense, ...)
  • choose determiners
  • provide a robust grammar

115
(setq r
  '((process ((lex "say")
              (tense past)
              (object-clause that)))
    (circum ((time ((cat pp)
                    (prep ((lex "in")))
                    (np ((lex "January")
                         (determiner none)))))))
    (partic ((affected ((cat clause)
                        (process ((lex "increase")
                                  (tense past)))
                        (partic ((created ((cat measure)
                                           (quantity ((value 5)))
                                           (unit ((lex "pct")))))
                                 (agent ((cat np)
                                         (head ((lex "rate")
                                                (number plural)
                                                (determiner none)))
                                         (classifier ((lex "ad")
                                                      (determiner none)))
                                         (possessor ((lex "Newsweek")
                                                     (determiner none)))))))))
    (agent ((complex apposition)
            (punctuation ((after ",")))
            (distinct (((lex "Spoon")
                        (classifier ((lex "Alan")))
                        (determiner none))
                       ((lex "name")
                        (classifier ((lex "president")))
                        (determiner none)))))))
117
Grammar development
  • translating TG → FUF (deterministic channel)
  • write high-coverage rules first
  • problem: no aligned training data
  • four types of rules (Langkilde-Geary 02): recasting, ordering, filling, morphing
  • Three modules:
  • Top-level
  • Recursion
  • Bottom-level

118
Evaluation
  • Robustness
  • ARGENT: 245/248 sentences (98.7%)
  • HaloGen: 80%
  • Speed
  • ARGENT: 1.4-2.9 sec/sentence
  • HaloGen: 28.9-55.5 sec/sentence
  • BLEU score -- later

119
Future work
  • Complete grammar
  • improve coverage
  • use other grammatemes
  • degree of comparison (comparative), sentmod
    (interrogative), verbmod (imperative)
  • Better error recovery
  • inconsistent PTB markup, TR transformation,
    translation
  • Grammar induction
  • N-gram based insertion of missing words
  • Integrate with MAGENTA

120
Where are we?
[Pipeline diagram: CZECH → deep syntax (Czech) → transfer → English TR to AR → word order → punctuation → morphology → ENGLISH, followed by evaluation]
121
The Implemented Systems: Creating Data for Generation
  • Czech Tagger & Parser (WS'98, pre-WS'02)
  • Czech-English Transfer (WS'99, pre-WS'02)
  • New Statistical Czech Parser to TR
  • Input 3: Improved English TR for training

122
The Generation Systems
  • Aligner and Decoder
  • Little-Tree Joint Model
  • Proposer
  • Preposition Classifier
  • Word Order by Tree LM
  • Comma Insertion
  • Morphology
  • The Hybrid System (TR-to-FUF translation)

123
Evaluation
  • Evaluation data for BLEU (1-4-grams)
  • devtest/evaltest: 248/249 sentences, 5 reference translations
  • Inputs:
  • 1: Automatic English TR
  • 2: Manual English TR
  • 3: Enhanced Automatic English TR
  • 4: Automatic Czenglish TR
  • Systems: Statistical, Hybrid (FUF-based)

124
Upper estimate
  • 5 reference translations
  • 1 original WSJ text from the PTB
  • 4 retranslations from Czech to English
  • 2 US, 2 Czech translators
  • Evaluate the translations (sketched below)
  • take one out
  • evaluate against the remaining 4
  • Average BLEU score: 0.556
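A sketch of the leave-one-out estimate, using NLTK's sentence_bleu as a stand-in scorer (the workshop used IBM's BLEU implementation; the toy references below are invented):

```python
from nltk.translate.bleu_score import sentence_bleu

references = [  # 5 reference translations of one sentence (toy data)
    "the executives were misinformed about the financing".split(),
    "executives were misinformed about the funding".split(),
    "the managers were wrongly informed about the financing".split(),
    "the executives had wrong information about the financing".split(),
    "management was misinformed about the original financing".split(),
]

scores = []
for i, held_out in enumerate(references):
    others = references[:i] + references[i + 1:]
    # Score the held-out reference against the remaining 4
    # (bigram BLEU here, so the toy example is not degenerate).
    scores.append(sentence_bleu(others, held_out, weights=(0.5, 0.5)))

print(sum(scores) / len(scores))   # the slide reports 0.556 on the real set
```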

125
Results
  • Input 1 (Automatic English TR)

126
Results
  • Input 2 (Manual English TR)

127
Results
  • Input 3 (Improved Auto English TR)

128
Results
  • Input 4 (CzenglishAutomatic TR)

Unigram BLEU score for the reference set: 0.844
129
Conclusions and Future Work
130
The Good News and the Bad News
  • Good news
  • End-to-end, tree-transformation system running
  • Written in 4 weeks, fully trainable from data
  • Generation from semantic (TR) English is significantly better than the baseline
  • Datasets developed for generation/MT, evaluation
  • Bad news
  • not fully integrated (proposer, little tree
    model)
  • on full MT, cannot beat baseline (and yes, GIZA)

131
Things To Do (1)
  • Integrate the proposer
  • Integrate the preposition classifier
  • Write more classifiers, integrate
  • classifiers running in parallel/sequentially?
  • True EM smoothing (by adaptation of aligner)
  • Make the system more modular
  • e.g., declarative specification of smoothing

132
Things To Do (2)
  • The aligner/decoder
  • Pruning during aligning/decoding
  • Better smoothing of the little tree model
  • More dependence among little trees
  • through shared nonterminals or lexicalized
    nonterminals
  • Little-tree joint model → noisy-channel model
  • i.e., integrate Gildea's tree LM directly
  • Better initial model for EM
  • ML training off manual alignments
  • Nondeterministic transfer

133
Things To Do (3)
  • Make use of TR's deep (discourse) word order
  • More experiments
  • with new smoothing, integrated proposer
  • different order of modules
  • other punctuation: classifier, or inside the model?
  • Different settings/applications
  • AR to TR (parsing)
  • AR to AR (surface translation)
  • TR to TR (translation); other languages

134
The End
The beginning!
135
Generation in the context of MT
  • Project summary
  • Explore ways of using semantic sentence
    representation for NL generation
  • Use it in the machine translation context
  • Evaluate / compare the results