Title: MT System Combination
System Combination in MT
- Methods of machine translation
  - Rule based
  - Example based
  - Statistical
  - Hierarchical
  - Syntax based
- Their output is different
- Make use of the individual strengths of the different systems to improve translation quality
- Just select the best output on a sentence-by-sentence basis, or build a synthetic combination of the output from the original systems?
Parallel Combination
[Diagram: several MT systems translate the source language text in parallel; a combination module merges their outputs into one translation.]
Serial Combination
[Diagram: the source language text passes through a chain of MT systems; the output of one system is the input of the next, yielding the final translation.]
Model Combination
[Diagram: the phrase tables, lexica, and reordering models of systems S1 and S2 are each merged (phrase table combination, lexicon combination, reordering combination) and fed into a single decoder that translates the source language text into the target language text.]
MT System Combination Approaches
- Parallel combination
  - Hypothesis selection
  - Lattice based combination
  - Confusion network (CN) based combination
- Serial combination
  - RBMT → SMT
- Cross combination: hypothesis selection + CN based combination
- Model level combination
  - Combine lexica
  - Combine phrase tables
  - Combine input systems' reorderings
System Combination in MT
[Example slide: outputs of an example based, a phrase based statistical, and a hierarchical statistical system for the same source sentence.]
System Combination in MT
Example output: "hoffman was mesmerized by drug but woke up in a timely manner to create career"
Hypothesis Selection
- How to decide which hypothesis to pick?
- System bias
  - Boost hypotheses from each system according to its overall (BLEU) score on development data
- MT system confidence score
  - The system reports how well it thinks it translated the sentence
  - Problematic, because these estimates are not comparable between systems, nor between sentences within one system
  - Need to be normalized
- Language model
Hypothesis Selection
- Pick the best hypothesis from the different systems for each source sentence
- Use an n-best list re-ranking approach
  - Add the n-best hypotheses from each system
  - Re-rank the joint n-best list
  - Find good features; add n-best list based features
N-best List Re-ranking Features
- Consistently calculated for the joint n-best list
  - Language model
  - Statistical word lexicon
  - Position dependent word agreement
  - Position independent n-gram agreement
  - N-best list n-gram probability
  - Sentence length features
  - Rank in the system's n-best list
  - System bias
- Minimum error rate training (MERT) to determine the feature weights on a development test set
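The re-ranking step can be sketched as a weighted linear model over the joint n-best list. The feature names, values, and weights below are purely illustrative; in the approach described here the weights would come from MERT on development data.

```python
def rerank(joint_nbest, weights):
    """joint_nbest: list of (hypothesis, feature_dict) pooled from all
    systems; returns hypotheses sorted by weighted feature sum, best first."""
    def score(entry):
        _, feats = entry
        return sum(weights.get(name, 0.0) * value
                   for name, value in feats.items())
    return sorted(joint_nbest, key=score, reverse=True)

# Toy joint n-best list for one source sentence (hypothetical features).
joint_nbest = [
    ("hoffman was mesmerized by drug",
     {"lm": -12.4, "word_agreement": 0.3, "system_A": 1.0}),
    ("hoffman was addicted to drugs",
     {"lm": -10.1, "word_agreement": 0.7, "system_B": 1.0}),
]
weights = {"lm": 1.0, "word_agreement": 5.0, "system_A": 0.2, "system_B": 0.4}
print(rerank(joint_nbest, weights)[0][0])  # hypothesis with the best score
```

The system indicator features (`system_A`, `system_B`) model the system bias; all other features are shared across systems so scores stay comparable.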
LM and Lexicon
- Language model
  - Kneser-Ney smoothing
  - Sentence score is normalized by the sentence length
  - As large as possible, e.g. the complete LDC Gigaword corpus: 2.7 billion words of English data, 5-gram LM
  - SRI LM toolkit
  - LDC = Linguistic Data Consortium
LM and Lexicon
- Statistical word lexica
  - Lexicon probability sum
  - Lexicon probability maximum
  - Both language directions
  - From IBM Model 4 GIZA++ training
  - On all bilingual data you can get, e.g. 260 million words Chinese-English
  - Sentence score is normalized by the sentence length
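A minimal sketch of the two lexicon features for one language direction, assuming a toy p(e|f) word lexicon (the entries and probabilities are invented; in practice they come from Model 4 training):

```python
import math

# Hypothetical p(target | source) word lexicon entries.
LEX = {
    ("霍夫曼", "hoffman"): 0.9,
    ("毒品", "drug"): 0.6,
    ("毒品", "drugs"): 0.3,
}

def lexicon_scores(source, target, lex, floor=1e-9):
    """Length-normalized log scores: IBM-1 style probability sum over all
    source words vs. the single best link (probability maximum)."""
    log_sum = log_max = 0.0
    for e in target:
        probs = [lex.get((f, e), 0.0) for f in source]
        log_sum += math.log(max(sum(probs) / len(source), floor))
        log_max += math.log(max(max(probs), floor))
    return log_sum / len(target), log_max / len(target)

s_sum, s_max = lexicon_scores(["霍夫曼", "毒品"], ["hoffman", "drugs"], LEX)
print(s_sum, s_max)
```

The maximum-based score is never below the averaged sum, since the best single link dominates the average over all source words.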
Position Dependent Word Agreement
- N-best list based feature
- Relative frequency of n-best list entries containing word e in the same position
- Very restrictive; to loosen the restriction:
  - Use a window around the original position
  - Window sizes t = 0, 1, 2 as three separate features
- Sentence score = average word score
Position Dependent Word Agreement
[Example: n-best list for one source sentence; agreement 30% at t = 0, loosened with windows t = 1 and t = 2.]
- Majority vote on words (ROVER style)
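A sketch of the feature on a toy n-best list (sentences are invented examples in the spirit of the slides):

```python
def word_agreement(hyp, nbest, t=0):
    """Average over hyp's words of the fraction of n-best entries that
    contain the word within +/- t of the same position."""
    total = 0.0
    for i, word in enumerate(hyp):
        votes = sum(word in other[max(0, i - t):i + t + 1] for other in nbest)
        total += votes / len(nbest)
    return total / len(hyp)

nbest = [
    "hoffman was addicted to drugs".split(),
    "hoffman was mesmerized by drug".split(),
    "hoffman were obsessed by drugs".split(),
]
# Exact-position agreement (t = 0) for the first hypothesis:
print(round(word_agreement(nbest[0], nbest, t=0), 2))
```

Widening the window (t = 1, 2) can only keep or add votes, so the loosened features are never smaller than the strict t = 0 score.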
Position Independent N-gram Agreement
- N-best list based feature
- Relative frequency of n-best list entries containing the n-gram
- Sentence score = average n-gram score
- Uni-gram up to 6-gram as 6 separate features
Position Independent N-gram Agreement
[Example: n-best list for one source sentence; n = 1: word agreement 90%; n = 3: tri-gram agreement 60%; n = 5: 5-gram agreement 40%.]
- Agreement scores for n = 1 to 6 as separate features
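The same toy n-best list illustrates the position independent variant; as in the example above, agreement drops as n grows because longer n-grams are shared by fewer entries:

```python
def ngram_agreement(hyp, nbest, n):
    """Average over hyp's n-grams of the fraction of n-best entries that
    contain the n-gram anywhere in the sentence."""
    def ngrams(words):
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}
    hyp_ngrams = [tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1)]
    if not hyp_ngrams:
        return 0.0
    total = sum(sum(g in ngrams(other) for other in nbest) / len(nbest)
                for g in hyp_ngrams)
    return total / len(hyp_ngrams)

nbest = [
    "hoffman was addicted to drugs".split(),
    "hoffman was mesmerized by drug".split(),
    "hoffman were obsessed by drugs".split(),
]
print(round(ngram_agreement(nbest[0], nbest, 1), 2))  # unigram agreement
print(round(ngram_agreement(nbest[0], nbest, 3), 2))  # trigram agreement
```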
N-best List N-gram Probability
- N-best list based feature
- Standard language model probability, estimated on the n-best hypotheses for one source sentence
- No smoothing, because counts are never zero
- Sentence score = average word score
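A sketch of this feature as an unsmoothed n-gram model estimated on the n-best list itself (the list entries are invented); counts are never zero because the scored hypothesis is part of the list its n-grams were counted from:

```python
import math
from collections import Counter

def nbest_ngram_logprob(hyp, nbest, n=3):
    """Length-normalized log probability of hyp under an unsmoothed n-gram
    LM estimated on the n-best list for this source sentence."""
    counts, contexts = Counter(), Counter()
    for sent in nbest:
        padded = ["<s>"] * (n - 1) + sent + ["</s>"]
        for i in range(n - 1, len(padded)):
            counts[tuple(padded[i - n + 1:i + 1])] += 1
            contexts[tuple(padded[i - n + 1:i])] += 1
    padded = ["<s>"] * (n - 1) + hyp + ["</s>"]
    logp = sum(math.log(counts[tuple(padded[i - n + 1:i + 1])]
                        / contexts[tuple(padded[i - n + 1:i])])
               for i in range(n - 1, len(padded)))
    return logp / (len(hyp) + 1)  # normalize by length incl. </s>

nbest = [
    "hoffman was addicted to drugs".split(),
    "hoffman was mesmerized by drug".split(),
]
print(round(nbest_ngram_logprob(nbest[0], nbest, n=3), 3))
```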
N-gram Agreement vs. N-gram Probability
[Example "San Francisco", n-best list for one source sentence: n = 2 bi-gram agreement 30%, but P(Francisco | San) = 3/3, i.e. bi-gram probability 100%.]
- The LM n-gram probability gives information on word order.
Sentence Length Features
- Deviation of the sentence length from the average hypothesis length over all hypotheses in the n-best list for this source sentence
- Ratio between the source and the target sentence length
  - Deviation from the overall source-target token count ratio in the bilingual training corpus for this language pair
  - Deviation of the source/target ratio from the average source-to-target length ratio within its class
    - Divide the training corpus into several buckets by sentence length
    - Calculate the average source/target length ratio or average target length per bucket
    - The source/target ratio might be quite different for short, medium and long sentences
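The first three length features can be sketched directly (the corpus-level ratio here is a made-up stand-in for the value estimated from the bilingual training corpus):

```python
def length_features(src, hyp, nbest, corpus_ratio=1.2):
    """src, hyp: token lists; nbest: all hypotheses (token lists) for this
    source sentence. corpus_ratio is a hypothetical target/source token
    ratio from the bilingual training corpus."""
    avg_len = sum(len(h) for h in nbest) / len(nbest)
    ratio = len(hyp) / len(src)
    return {
        "len_dev": abs(len(hyp) - avg_len),      # vs. average n-best length
        "ratio": ratio,                          # target/source length ratio
        "ratio_dev": abs(ratio - corpus_ratio),  # vs. corpus-level ratio
    }

nbest = [h.split() for h in [
    "hoffman was addicted to drugs",
    "hoffman was mesmerized by drug",
    "hoffman were obsessed by drugs",
]]
feats = length_features("霍夫曼 沉迷 毒品".split(), nbest[0], nbest)
print(feats["len_dev"])  # 0.0: this hypothesis has the average length
```

The bucketed variant would simply look up `corpus_ratio` in a table keyed by the source-length bucket instead of using one global value.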
System Bias
- Boost each hypothesis according to its rank in the original system's n-best list
- Boost each translation coming from the best system
- Boost each hypothesis according to its system's performance on development data
- Add a system indicator feature to the feature set and optimize the weights using MERT
System Combination in MT
Example (Chinese-English MT06): "hoffman was mesmerized by drug fortunately awakening in a timely manner to create career in performing arts"
Lattice-based Example
[Figure: translation lattice for one source sentence.]
Lattice-based Combination
"hoffman was addicted to drugs, fortunately awaking in a timely manner to begin an acting career"
[Lattice figure over nodes 1-6: alternative edges include "hoffman" / "were", "was addicted to drugs", "was obsessed", "was mesmerized by drug", and "previously enamored drug".]
Lattice-based Combination
- Build a lattice from the multiple system outputs
  - Needs phrase translation boundary and source alignment information
- Can also combine complete system-internal translation lattices (if available)
  - The systems' internal scores are most likely not comparable; consistent scoring is difficult
  - The same translation proposed by several systems is not preferred/boosted
- Add LM score
- Re-decode
Confusion Network based Combination
[Confusion network figure over nodes 1-6; arcs carry words with vote counts, e.g. "hoffman (5)" / "hoffmann (1)", "was (3)" / "were (1)" / "has (1)", "obsessed (2)" / "mesmerized (1)" / "previously (1)", "enamored (1)" / "fortunately (1)", "drug (6)", plus empty-word arcs "e (1)", "e (2)", "e (3)".]
Lattice vs. CN
[Side-by-side figure: the translation lattice from the lattice-based combination example next to the confusion network built for the same source sentence.]
Word Level Confusion Network Decoding
- Choose one translation hypothesis as the skeleton (determines word order)
- Align each hypothesis to the skeleton using TER, ITGs, or statistical word alignment
- Build the confusion network with consensus votes
- Add the LM score into the network
- Train system weights, add them into the network
- Choose the best path through the network (decode)
- Output the consensus translation
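Once the network is built, decoding reduces to picking the best arc per slot. A minimal sketch with pure consensus votes (a real decoder would also fold in LM and system-weight scores before taking the best path; the vote counts below are loosely modeled on the figure):

```python
def decode_confusion_network(cn):
    """cn: list of slots, each a dict word -> vote count; the empty string
    represents the empty-word (epsilon) arc."""
    words = [max(slot, key=slot.get) for slot in cn]
    return " ".join(w for w in words if w)  # drop empty-word arcs

cn = [
    {"hoffman": 5, "hoffmann": 1},
    {"was": 3, "were": 1, "has": 1, "": 1},
    {"obsessed": 2, "mesmerized": 1, "previously": 1, "": 1},
    {"enamored": 1, "fortunately": 1, "": 3},
    {"drug": 6},
]
print(decode_confusion_network(cn))  # "hoffman was obsessed drug"
```

Note how the fourth slot's epsilon arc wins, so the consensus translation can be shorter than any single input hypothesis.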
Confusion Network Decoding
[Figure: one hypothesis is chosen as the skeleton; the skeleton determines the word order of the consensus translation.]
Confusion Network Decoding
- Biggest challenge: word alignment
- Pairwise vs. incremental alignment
- TER alignment
  - Use morphology, synonyms, POS tags
  - Go to phrases (without source-target phrase alignment available)
Confusion Network Decoding
- Comparison: pairwise vs. incremental alignment
- Incremental: the next hypothesis is aligned to the existing network, not to the skeleton
  - The order of adding hypotheses does make a difference, e.g. use increasing TER / decreasing BLEU of the systems
  - But the choice of the skeleton is not that crucial any more
Pairwise vs. Incremental Alignment
[Figure: example confusion networks built with pairwise vs. incremental alignment.]
Serial Combination
[Diagram: source language text → RBMT system → SMT system → translation.]
Serial Combination
- RBMT and SMT are good at very different things
  - RBMT produces very good translations if its rules cover the sentence well, and fails utterly e.g. for long, complicated sentences
  - SMT produces more or less erroneous output on everything
- Serial combination
  - Translate the entire training corpus with the RBMT system
  - Train the SMT system on the parallel corpus RBMT-translation → English
  - The SMT system acts as an automatic post-editor for the RBMT output
  - Smooths out RBMT problems without losing its strengths
  - Gives the RBMT strengths a better chance than in parallel combination, because there the statistical models bias towards SMT
Cross Combination
- The hypothesis selection output serves as the skeleton for CN generation
- A smart choice of the skeleton for CN generation has impact on translation quality, e.g. because it determines the word order
- The reverse order works as well
  - Each of the n input systems serves as the skeleton for one of n confusion networks
  - Hypothesis selection then selects from the combined n-best lists of the CN decoding
[Diagram: the outputs of several MT systems for the source language text are first combined via hypothesis selection; the result feeds into combination via confusion network decoding.]
Model Combination
[Diagram: the phrase tables, lexica, and reordering models of systems S1 and S2 are merged (phrase table combination, lexicon combination, reordering constraint) and fed into a single decoder that translates the source language text into the target language text.]
Lexicon Combination
- Combine the systems' lexica
  - Re-estimate joint probabilities
  - Only useful if the systems have different training data available
- Train a test-set-specific lexicon
  - Treat the source and its various translations as special training data
  - Build a lexicon only with entries from the systems' translations
  - If available, use the systems' phrase alignment to constrain the word alignment
  - Interpolate this sharp, test-set-specific lexicon with a large lexicon trained on all training data
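The final interpolation step can be sketched as a simple linear mixture; the interpolation weight and the toy entries are assumptions, not values from the slides:

```python
def interpolate_lexicons(sharp, large, lam=0.7):
    """Linear interpolation of a test-set-specific ('sharp') lexicon with a
    large general lexicon; both map (source, target) pairs to probabilities.
    lam (weight of the sharp lexicon) is illustrative; tune on dev data."""
    return {pair: lam * sharp.get(pair, 0.0) + (1 - lam) * large.get(pair, 0.0)
            for pair in set(sharp) | set(large)}

sharp = {("毒品", "drugs"): 0.8, ("毒品", "drug"): 0.2}
large = {("毒品", "drug"): 0.5, ("毒品", "medicine"): 0.5}
lex = interpolate_lexicons(sharp, large)
print(round(lex[("毒品", "drug")], 2))  # 0.7*0.2 + 0.3*0.5 = 0.29
```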
Phrase Table Combination
- Source-target phrase pairs are available for the test set from the input systems
- Combine with the full baseline phrase table, with adjusted weights for the phrase counts from the systems' output
- Rescore phrases using the scaled system total or confidence score
- Agreement boost for phrases coming from several systems
  - Exact match
  - Same phrase, different distortion
  - Overlapping source interval with the same target words
  - Overlapping target words
- Prune the phrase table
  - From the full phrase table, only keep phrases covered by one of the systems' outputs
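A rough sketch of the count combination with an exact-match agreement boost (the up-weighting and boost factors are invented placeholders, and only the exact-match case of the agreement boost is shown):

```python
from collections import defaultdict

def combine_phrase_tables(baseline, system_tables, sys_weight=2.0, boost=1.5):
    """baseline and each system table map (src, tgt) phrase pairs to counts.
    System counts are up-weighted, and pairs proposed by more than one
    system get an extra exact-match agreement boost."""
    counts = defaultdict(float, baseline)
    votes = defaultdict(int)
    for table in system_tables:
        for pair, c in table.items():
            counts[pair] += sys_weight * c
            votes[pair] += 1
    for pair, v in votes.items():
        if v > 1:  # exact match across several systems
            counts[pair] *= boost
    return dict(counts)

baseline = {("毒品", "drug"): 10.0}
sys_a = {("毒品", "drugs"): 1.0}
sys_b = {("毒品", "drugs"): 1.0, ("毒品", "narcotics"): 1.0}
table = combine_phrase_tables(baseline, [sys_a, sys_b])
print(table[("毒品", "drugs")])  # (2.0 + 2.0) * 1.5 = 6.0
```

The looser agreement cases (same phrase with different distortion, overlapping source intervals or target words) would need the phrases' source spans, which this sketch omits.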
Phrase Table Combination
- Several rule-based systems as input, so no phrase pairs are available
- Train word alignment on a parallel corpus
- Align the systems' output to the source with the statistically learned alignments
- Extract a test-set-specific phrase table from the systems' translations (Moses phrase extraction)
- Interpolate with the baseline statistical phrase table
- Build a translation lattice
- Re-decode
Reordering
- While re-decoding:
  - Restrict to reorderings used by one of the input systems
  - Boost the word order chosen by one of the systems
MT System Combination Approaches
- Parallel combination
  - Hypothesis selection
  - Lattice based combination
  - CN based combination
- Serial combination
  - RBMT → SMT
- Cross combination: hypothesis selection + CN based combination
- Model level combination
  - Combine lexica
  - Combine phrase tables
  - Combine input systems' reorderings
Hypothesis Selection
- 6 large-scale Chinese-English MT systems
  - 3 translation research groups
  - 4 MT decoders
  - Phrase-based, hierarchical, and example-based systems
[Table: scores in BLEU.]
Questions to Answer
- N-best list size per system
  - Do n-best translations help at all?
  - If yes, how many?
- How many systems to include
  - Are more systems always better?
  - Does a low-quality system hurt the combination?
- Feature impact
  - Which features are the most useful?
- How does this compare to MBR on the n-best list?
N-best List Size
[Plot: BLEU on Chinese MT06 as a function of the n-best list size (up to 50 per system); baseline 31.45.]
Combining All Systems
- Adding the systems one by one to the combination
- Ordered by their BLEU score on the unseen test set
[Plot: BLEU on MT06; baseline 31.45, best combination +2.27 BLEU.]
Feature Impact
- Compare to two additional baselines
  - LM re-ranking only
  - LM + statistical word lexicon
- 23 features, 5 feature groups; can't run all combinations
- Remove one feature group at a time
  - LM
  - Lexicon
  - Position dependent word agreement
  - Position independent n-gram agreement
  - N-best list LM probability
Feature Impact
[Table: scores in BLEU for the 8-feature vs. the full 23-feature setup.]
Contribution of the Systems to the Combination
[Chart: Chinese-English contributions per system.]
Comparison to MBR
- Chinese-English
- 623-sentence tuning set / 588-sentence blind test set
- Normalized system-specific cost
- 200-best per system
- Hypothesis selection without the normalized system cost as an additional feature
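For reference, MBR selection over a joint n-best list can be sketched as follows. For brevity a unigram-F1 similarity stands in for sentence-level BLEU and the posterior over hypotheses is uniform; a real setup would use BLEU and the normalized system costs.

```python
def similarity(a, b):
    """Unigram F1 between two hypotheses (stand-in for sentence BLEU)."""
    sa, sb = set(a.split()), set(b.split())
    return 2 * len(sa & sb) / (len(sa) + len(sb))

def mbr_select(hyps):
    """Pick the hypothesis with maximum expected similarity to all
    hypotheses, i.e. minimum expected loss under a uniform posterior."""
    return max(hyps, key=lambda h: sum(similarity(h, o) for o in hyps))

hyps = [
    "hoffman was addicted to drugs",
    "hoffman was mesmerized by drug",
    "hoffman was obsessed by drugs",
]
print(mbr_select(hyps))  # the hypothesis closest to the overall consensus
```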
Cross Combination
- The hypothesis selection output serves as the skeleton for CN generation (JHU CN-based combination)
- A smart choice of the skeleton for CN generation has impact on translation quality