1
Hybrid Data-Driven Models of Machine Translation

Andy Way (& Declan Groves)
National Centre for Language Technology,
School of Computing,
Dublin City University, Dublin 9, Ireland
away_at_computing.dcu.ie

2
Outline

Motivations
Example-Based Machine Translation
Marker-Based EBMT
Statistical Machine Translation
Experiments
Language Pairs & Corpora Used
EBMT and PBSMT baseline systems
Hybrid System Experiments
Making use of merged data sets
Phrases, Chunks and Training-Test Corpora
Conclusions
Future Work

3
Motivations

Most MT research carried out today is
corpus-based
Example-Based Machine Translation (EBMT)
Statistical Machine Translation (SMT)
Lack of comparative research
Relative unavailability of EBMT systems
Lack of participation of EBMT researchers in
competitive evaluations
Dominance of the SMT approach

4
Example-Based Machine Translation

As with SMT, EBMT makes use of information
extracted from sententially-aligned bilingual
corpora. In general:
SMT only uses parameters, throws away data
EBMT makes use of linguistic units directly
During translation:
Source side of bitext is searched for close
matches
Source-target subsentential links are determined
Relevant target fragments retrieved and
recombined to derive final translation.

5
EBMT: An Example

Assumes an aligned bilingual corpus of examples
against which input text is matched
Best match is found using a similarity metric
based on word co-occurrence, POS, generalized
templates and bilingual dictionaries (exact and
fuzzy matching)
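
The matching step can be illustrated with a minimal sketch. It assumes a purely word-level similarity computed with Python's difflib; the slide's metric also draws on POS, generalized templates and bilingual dictionaries, which are not modelled here. The function names and the toy example base are hypothetical.

```python
from difflib import SequenceMatcher

def word_similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; 1.0 means an exact match."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()

def best_match(source: str, examples: list[tuple[str, str]]) -> tuple[float, str, str]:
    """Return (score, example_source, example_target) for the closest example."""
    return max((word_similarity(source, src), src, tgt) for src, tgt in examples)

# Toy example base (illustrative only, not taken from a real corpus).
EXAMPLES = [
    ("John went to the bakers", "Jean est allé à la boulangerie"),
    ("on Monday", "lundi"),
]

score, src, tgt = best_match("John went to the bakers on Monday", EXAMPLES)
print(f"{score:.2f}  {src}  ->  {tgt}")
```

A score of 1.0 corresponds to exact matching; anything lower is a fuzzy match whose usefulness depends on a threshold that would have to be tuned.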

8
EBMT: An Example

Identify useful fragments:

on Monday → lundi
John went to → Jean est allé à
the bakers → la boulangerie

Recombination depends on nature of examples used

9
Marker-Based EBMT at DCU

Gaijin: Veale & Way, RANLP 97
Gough et al., AMTA 02
wEBMT: Way & Gough, Comp. Linguistics 03
Gough & Way, EAMT 04
Way & Gough, TMI 04
Gough, PhD Thesis 05
Way & Gough, Natural Language Engineering 05
Way & Gough, Machine Translation 05
Groves & Way, ACL w/shop on Data-Driven MT 05
Groves & Way, Machine Translation, EAMT 06
MaTrEx: Armstrong et al., TC-STAR OpenLab 06
Stroppa et al., NIST MT-Eval 06, AMTA 06,
IWSLT-06

10
System Development
14
Marker-Based EBMT

The Marker Hypothesis states that all natural
languages have a closed set of specific words or
morphemes which appear in a limited set of
grammatical contexts and which signal that
context (Green, 1979).
Universal psycholinguistic constraint: languages
are marked for syntactic structure at surface
level by a closed set of lexemes or morphemes

The Dearborn, Mich., energy company stopped paying
a dividend in the third quarter of 1984 because
of troubles at its Midland nuclear plant.

Three NPs start with determiners, one with a
possessive pronoun
Nominal element will appear soon to the right
Sets of determiners and possessive pronouns are
small and finite

15
Marker-Based EBMT

The same sentence contains four prepositional
phrases, with prepositional heads
NP object will appear soon to the right
Set of prepositions is small and finite

16
Marker-Based EBMT: Chunking

Use a set of closed-class marker words to segment
aligned source and target sentences during a
pre-processing stage
<PUNC> now used as end-of-chunk marker

English marker words extracted from CELEX

17
Marker-Based EBMT: Chunking (2)

Enables the use of basic syntactic markup for
extraction of translation resources
Source-target sentence pairs are tagged with
marker categories in pre-processing stage

EN: <PRON> you click apply <PREP> to view <DET>
the effect <PREP> of <DET> the selection
FR: <PRON> vous cliquez <PREP> sur appliquer
<PREP> pour visualiser <DET> l'effet <PREP> de
<DET> la sélection

Aligned source-target chunks created by
segmenting sentences based on these marker tags
along with cognate and word co-occurrence
information

<PRON> you click apply : <PRON> vous cliquez sur appliquer
<PREP> to view : <PREP> pour visualiser
<DET> the effect : <DET> l'effet
<PREP> of the selection : <PREP> de la sélection

Chunks must contain at least one non-marker
word; this ensures chunks contain
useful contextual information
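
The segmentation rule can be sketched as follows. This is only an illustration under stated assumptions: the marker lexicon is a tiny hypothetical stand-in for the CELEX-derived list, the "LEX" label for chunks that do not start with a marker word is an assumption, and <PUNC> handling and source-target alignment are left out.

```python
# Hypothetical toy marker lexicon; the real system uses a much larger list.
MARKERS = {"you": "PRON", "to": "PREP", "of": "PREP", "the": "DET"}

def marker_chunks(sentence, markers=MARKERS):
    """Split a sentence at marker words, keeping >= 1 non-marker word per chunk."""
    chunks, current = [], []
    has_content = False
    for word in sentence.lower().split():
        tag = markers.get(word)
        # A marker word opens a new chunk only if the current chunk
        # already holds at least one non-marker word.
        if tag is not None and has_content:
            chunks.append(current)
            current, has_content = [], False
        current.append(word)
        if tag is None:
            has_content = True
    if current:
        if has_content or not chunks:
            chunks.append(current)
        else:
            # Trailing marker words with no content join the previous chunk.
            chunks[-1].extend(current)
    # Label each chunk with the tag of its first word ("LEX" is an assumption).
    return [(markers.get(c[0], "LEX"), " ".join(c)) for c in chunks]

for tag, text in marker_chunks("you click apply to view the effect of the selection"):
    print(f"<{tag}> {text}")
```

Note how "of the selection" stays together: "the" is a marker word, but the chunk opened by "of" does not yet contain a non-marker word, so no new chunk is started.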

19
Marker-Based EBMT: Lexicon & Template Extraction

Chunks containing only one non-marker word in
both source and target languages can then be used
to extract a word-level lexicon
<PREP> to : <PREP> pour
<LEX> view : <LEX> visualiser
<LEX> effect : <LEX> effet
<DET> the : <DET> l'
<PREP> of : <PREP> de
In a final pre-processing stage, we produce a set
of generalized marker templates by replacing
marker words with their tags
<PRON> click apply : <PRON> cliquez sur appliquer
<PREP> view : <PREP> visualiser
<DET> effect : <DET> effet
<PREP> the selection : <PREP> la sélection
Any marker word pair can now be inserted at the
appropriate tag location.
More general examples add flexibility to the
matching process and improve coverage (and
quality)
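
Both extraction steps can be sketched in a few lines. The marker dictionaries, function names and pre-tokenized chunk pairs below are hypothetical; the sketch extracts only content-word lexicon entries (not the marker-word pairs also listed on the slide) and generalizes only the chunk-initial marker word.

```python
# Hypothetical toy marker lexicons for the slide's example sentence.
EN_MARKERS = {"you": "PRON", "to": "PREP", "of": "PREP", "the": "DET"}
FR_MARKERS = {"vous": "PRON", "sur": "PREP", "pour": "PREP",
              "de": "PREP", "la": "DET", "l'": "DET"}

def content_words(chunk, markers):
    """Return the non-marker words of a chunk."""
    return [w for w in chunk.split() if w not in markers]

def extract_lexicon(aligned_chunks):
    """Keep chunk pairs with exactly one non-marker word on each side."""
    lexicon = {}
    for src, tgt in aligned_chunks:
        s = content_words(src, EN_MARKERS)
        t = content_words(tgt, FR_MARKERS)
        if len(s) == 1 and len(t) == 1:
            lexicon[s[0]] = t[0]
    return lexicon

def generalize(chunk, markers):
    """Replace a chunk-initial marker word with its tag: 'to view' -> '<PREP> view'."""
    words = chunk.split()
    if not words or words[0] not in markers:
        return chunk
    return " ".join([f"<{markers[words[0]]}>", *words[1:]])

CHUNKS = [
    ("you click apply", "vous cliquez sur appliquer"),
    ("to view", "pour visualiser"),
    ("the effect", "l' effet"),
    ("of the selection", "de la sélection"),
]
lexicon = extract_lexicon(CHUNKS)
templates = [(generalize(s, EN_MARKERS), generalize(t, FR_MARKERS)) for s, t in CHUNKS]
```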

20
Marker-Based EBMT

During translation:
Resources are searched from maximal (specific
source-target sentence-pairs) to minimal context
(word-for-word translation).
Retrieved example translation candidates are
recombined, along with their weights, based on
source sentence order
System outputs n-best list of translations
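
The search from maximal to minimal context can be sketched as a simple back-off. This is a simplified illustration, not the system's decoder: it assumes the input is already chunked, uses plain dictionaries as hypothetical resources, skips the generalized-template level, and ignores weights and n-best output.

```python
def translate(chunks, sentence_tm, chunk_tm, lexicon):
    """Back off from whole-sentence match to chunk match to word-for-word."""
    sentence = " ".join(chunks)
    if sentence in sentence_tm:              # maximal context: whole sentence
        return sentence_tm[sentence]
    output = []
    for chunk in chunks:                     # recombine in source order
        if chunk in chunk_tm:                # chunk-level match
            output.append(chunk_tm[chunk])
        else:                                # minimal context: word for word
            output.append(" ".join(lexicon.get(w, w) for w in chunk.split()))
    return " ".join(output)

# Hypothetical toy resources.
chunk_tm = {"to view": "pour visualiser"}
lexicon = {"the": "l'", "effect": "effet"}
result = translate(["to view", "the effect"], {}, chunk_tm, lexicon)
print(result)
```

Unknown words are passed through unchanged here; a real system would need an explicit policy for them.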

21
Phrase-Based SMT

SMT translation and language models now make use
of phrase translations in the TM, along with word
correspondences, to improve translation output.
Better modelling of syntax and local
word-reordering
Phrase extraction heuristics based on word
alignments shown to be better than more
syntactically motivated approaches (Koehn et al.,
2003):
Perform word alignment in both source-target and
target-source directions
Take intersection of unidirectional alignments
Extend the intersection iteratively into the
union by adding adjacent alignments within the
alignment space (Och & Ney, 2003; Koehn et al.,
2003)
Extract all possible phrases from sentence pairs
which correspond to these alignments
Phrase probabilities can be calculated from
relative frequencies
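
The symmetrization step can be sketched as below. This is a simplified growing heuristic written for illustration, assuming alignments are sets of (source index, target index) pairs; it is not claimed to reproduce the exact refined method of the cited papers, and phrase extraction itself is omitted.

```python
# Horizontal, vertical and diagonal neighbours in the alignment matrix.
NEIGHBOURS = [(-1, 0), (1, 0), (0, -1), (0, 1),
              (-1, -1), (-1, 1), (1, -1), (1, 1)]

def symmetrize(src2tgt, tgt2src):
    """Start from the intersection and grow towards the union."""
    alignment = set(src2tgt) & set(tgt2src)
    union = set(src2tgt) | set(tgt2src)
    added = True
    while added:
        added = False
        for i, j in sorted(union - alignment):
            src_free = all(i != a for a, _ in alignment)
            tgt_free = all(j != b for _, b in alignment)
            adjacent = any((i + di, j + dj) in alignment for di, dj in NEIGHBOURS)
            # Add a union point only if it touches an existing point and
            # covers a source or target word that is still unaligned.
            if adjacent and (src_free or tgt_free):
                alignment.add((i, j))
                added = True
    return alignment

# Toy alignments in the two directions (hypothetical indices).
s2t = {(0, 0), (1, 1), (2, 2), (5, 0)}
t2s = {(0, 0), (1, 1), (2, 3)}
merged = symmetrize(s2t, t2s)
```

In the toy data the isolated point (5, 0) is never added, because it has no neighbour in the growing alignment.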

23
Experiments
24
EBMT vs. WB-SMT

Way & Gough, 05 (cf. talk here in May 05): on the
203K-sentence Sun TM (4.8M words), and a
4K-sentence test set (ave. sentence length 13.1
words EN, 15.2 words FR), EBMT > vanilla WB-SMT
(Giza, CMU-Cambridge statistical toolkit, ISI
ReWrite Decoder) for FR↔EN
Best BLEU scores:
EN→FR: .453 EBMT, .338 WB-SMT
FR→EN: .461 EBMT, .446 WB-SMT

25
EBMT & PB-SMT (on Sun TM): English-French

The Phrase-Based system using GIZA-Data
outperforms the same system seeded with EBMT-Data
on all metrics, bar Precision (0.6598 vs. 0.6661)
Marker-Based EBMT system beats both Phrase-Based
SMT systems, particularly for BLEU (0.4409 vs.
0.3758) and Recall (0.6877 vs. 0.5759).

26
EBMT & PB-SMT (on Sun TM): French-English

Scores for all systems are better for FR→EN than
for EN→FR
Again, the Phrase-Based system using GIZA data
outperforms the same system seeded with EBMT
data.
As for EN→FR, the Marker-Based EBMT system
significantly outperforms both Phrase-Based SMT
systems for FR→EN.

27
Towards Hybridity

Decided to merge data sources
Combine parts of EBMT sub-sentential alignments
with parts of the data induced using GIZA
Performed a number of experiments using:
EBMT phrases + GIZA words (SEMI-HYBRID)
Investigate if quality of EBMT phrases is better
than GIZA phrases
All data (HYBRID): GIZA words + phrases + EBMT
words + phrases
EBMT phrases will be used instead of SMT n-grams
EBMT phrases should add extra probability to
more useful SMT phrases, i.e. the probabilities
of the phrases in the intersection of these two
sets are boosted
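
The boosting effect can be illustrated with a small sketch. It assumes that merging means pooling phrase-pair counts from both sources before re-estimating relative frequencies; the slides do not spell out the exact procedure, and the toy phrase pairs are hypothetical.

```python
from collections import Counter

def relative_frequencies(counts):
    """Estimate p(target | source) from phrase-pair counts."""
    totals = Counter()
    for (src, _), c in counts.items():
        totals[src] += c
    return {(src, tgt): c / totals[src] for (src, tgt), c in counts.items()}

def merge_phrase_counts(smt_pairs, ebmt_pairs):
    """Pool the phrase pairs from both sources (assumed merging strategy)."""
    return Counter(smt_pairs) + Counter(ebmt_pairs)

# Toy data: one pair is found by both methods, one by SMT only.
smt = [("la sélection", "the selection"), ("la sélection", "selection")]
ebmt = [("la sélection", "the selection")]

smt_only = relative_frequencies(Counter(smt))
hybrid = relative_frequencies(merge_phrase_counts(smt, ebmt))

pair = ("la sélection", "the selection")
print(smt_only[pair], hybrid[pair])
```

The pair found by both methods rises from 1/2 to 2/3, while the SMT-only alternative drops from 1/2 to 1/3.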

[Diagram labels: EBMT Phrases, Giza Phrases]
28
Merging Data Sources: EN→FR Results

Using EBMT phrases + GIZA words improves
significantly on using EBMT data alone
Merging all the EBMT and GIZA data improves on
all metrics, most significantly for BLEU score
(0.4259 vs. 0.3643 SEMI-HYBRID).
EBMT system still wins out for BLEU score, Recall
and WER

29
Merging Data Sources: FR→EN Results

Using EBMT phrases + GIZA words shows
improvements on PBSMT system seeded with EBMT
data, but improves only on the GIZA-seeded
system's BLEU score (0.4888 vs. 0.4198).
However, merging all data improves on both PBSMT
systems on all metrics
EBMT system beats Hybrid system only on Recall
and WER

30
Results Discussion

PBSMT
Best PBSMT BLEU scores (with Giza data only):
0.375 (E-F), 0.420 (F-E)
Seeding PBSMT with EBMT data gets good scores
for BLEU: 0.364 (E-F), 0.395 (F-E); note
differences in data size (1.73M vs. 403K)
PBSMT loses out to EBMT system

Semi-Hybrid System
Seeding Pharaoh with SMT words and EBMT phrases
improves over baseline Giza-seeded system
Data size diminishes considerably (430K vs.
1.73M)
Worse results than for EBMT system.

Fully-Hybrid System
Better results than for semi-hybrid system: E-F
0.426 (0.396), F-E 0.489 (0.427)
Data size increases to 2.04M phrase table entries
For F-E, Hybrid system beats EBMT on BLEU (0.4888
vs. 0.4611) & Precision (0.6927 vs. 0.6782); EBMT
ahead for Recall & WER.

31
EBMT & PB-SMT (on Europarl)

Groves & Way, 06a/b
Added SMT-chunks to EBMT system → hybrid
statistical EBMT system
New domain: Europarl (FR↔EN, 322K sentences) (Koehn, 05)
Extracted training data from designated training
sets, filtering based on sentence length and
relative sentence length (ratio of 1.5 used).
Allowed us to extract high-quality training sets
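
The filtering step can be sketched as follows. The ratio of 1.5 comes from the slide; the absolute length cap `max_len=40` and the whitespace tokenization are assumptions made only for illustration.

```python
def keep_pair(src, tgt, max_len=40, max_ratio=1.5):
    """Keep a sentence pair if both sides are non-empty, not too long,
    and their lengths differ by at most a factor of max_ratio."""
    s, t = len(src.split()), len(tgt.split())
    if s == 0 or t == 0:
        return False
    if max(s, t) > max_len:          # max_len is a hypothetical cap
        return False
    return max(s, t) / min(s, t) <= max_ratio

pairs = [
    ("a b c", "x y z w"),            # ratio 1.33: kept
    ("a b", "x y z w"),              # ratio 2.0: dropped
]
filtered = [p for p in pairs if keep_pair(*p)]
```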

For testing, randomly extracted 5000
sentences from the Europarl common
test set. Avg. sentence lengths 20.5
words (French), 19.0 words (English)

32
EBMT vs. PBSMT

Compared the performance of our Marker-Based EBMT
system against that of a PB-SMT system built
using:
Pharaoh Phrase-Based Decoder (Koehn, 04)
SRI LM toolkit (Stolcke, 02)
Refined alignment strategy (Och & Ney, 03)
Trained on incremental data sets, tested on 5000
sentence test set
Effect of increasing training data on translation
quality
Performed translation for FR↔EN
Evaluated translation quality automatically using
BLEU (Papineni et al., 02), Precision & Recall
(GTM toolkit, Turian et al., 03) and Word-error
rate (WER)
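
Of these metrics, WER is the simplest to show. A minimal sketch, assuming whitespace tokenization and a single reference: word-level edit distance divided by reference length. Evaluation toolkits may differ in tokenization and normalization.

```python
def wer(hypothesis, reference):
    """Word error rate: word-level edit distance / reference length."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference must not be empty")
    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # reference word missing
                            curr[j - 1] + 1,          # extra hypothesis word
                            prev[j - 1] + (r != h)))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)

print(wer("the cat sat on mat", "the cat sat on the mat"))
```

Lower is better, and the value can exceed 1.0 when the hypothesis is much longer than the reference.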

33
EBMT vs. PBSMT: French-English

Doubling the amount of data improves performance
across the board for both EBMT and PBSMT
PBSMT system clearly outperforms EBMT system, on
average scoring 0.07 BLEU higher
PBSMT achieves a significantly lower WER (e.g.
68.55 vs. 82.43 for the 322K data set)
Increasing amount of training data results in:
3-5% increase in relative BLEU for PBSMT
6.2% to 10.3% relative BLEU score improvement
for EBMT

[Chart: results for the 78K, 156K and 322K training sets]
34
EBMT vs. PBSMT: English-French

PBSMT continues to outperform EBMT system by some
distance
e.g. 0.1933 vs. 0.1488 BLEU score, 0.518 vs.
0.4578 Recall for 322K data set
Difference between systems is somewhat less for
EN→FR than for FR→EN
EBMT system performance much more consistent for
both directions
PBSMT system performs 2% BLEU score worse (10%
relative) for EN→FR than for FR→EN
French-English is easier:
Fewer agreement errors, problems with boundary
friction, e.g. le → the (FR→EN),
the → le, la, les, l' (EN→FR)
EBMT scores higher for EN→FR than for
FR→EN in terms of BLEU score
Cf. Callison-Burch et al., 06, on BLEU for
evaluating non-n-gram-based systems

[Chart: results for the 78K, 156K and 322K training sets]
35
Hybrid System Experiments

Decided to merge elements of EBMT marker-based
alignments with PBSMT phrases and words induced
via GIZA
Number of Hybrid Systems:
LEX-EBMT: Replaced EBMT lexicon with higher
quality PBSMT word-alignments, to lower WER
H-EBMT vs. H-PBSMT: Merged PBSMT words and
phrases with EBMT data (words and phrases) and
passed resulting data to baseline EBMT and
baseline PBSMT systems
H-EBMT-LM: Reranked the output of H-EBMT systems
using the PBSMT system's equivalent language model
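
The reranking idea behind H-EBMT-LM can be sketched with a toy language model. The experiments used a model built with the SRI LM toolkit; the add-one-smoothed bigram model below is a stand-in chosen only to keep the example self-contained, and it ignores the translation scores that a full reranker would also take into account.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Return a log-probability function for an add-one-smoothed bigram LM."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        toks = ["<s>"] + s.split() + ["</s>"]
        unigrams.update(toks[:-1])               # history counts
        bigrams.update(zip(toks, toks[1:]))
    vocab = len(set(unigrams) | {"</s>"})

    def logprob(sentence):
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
                   for a, b in zip(toks, toks[1:]))
    return logprob

def rerank(nbest, logprob):
    """Order candidate translations by language-model score, best first."""
    return sorted(nbest, key=logprob, reverse=True)

# Toy training text and n-best list (illustrative only).
lm = train_bigram_lm(["the food crisis in the 1990s",
                      "the lesson of the food crisis"])
ranked = rerank(["food the crisis", "the food crisis"], lm)
print(ranked[0])
```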

39
Hybrid Experiments: French-English

Use of the improved lexicon (LEX-EBMT) leads to
only slight improvements (average relative
increase of 2.9% BLEU)
Adding Hybrid data improves above baselines, for
both EBMT (H-EBMT) and PBSMT (H-PBSMT)
H-PBSMT system achieves higher BLEU score trained
on 78K & 156K compared with PBSMT system when
trained on twice as much data.
The addition of the language model to the H-EBMT
system helps guide word order after lexical
selection and thus improves results further

40
Hybrid Experiments: English-French

We see similar results for EN→FR as for FR→EN
The more SMT-like the EBMT system becomes, the
more the BLEU scores fall in line with other
metrics, i.e. higher for FR→EN than for EN→FR
Using the hybrid data set we get a 15% average
relative increase in BLEU score for the EBMT
system, and 6.2% for the H-PBSMT system over its
baseline
The H-PBSMT system performs almost as well as the
baseline system trained on over 4 times the
amount of data

41
SMT phrases vs. EBMT chunks

Many more SMT phrases are derived than EBMT
chunks
Not reflected in scores
Doubling the amount of data doubles the amount of
sub-sentential alignments for both systems
Indicates the heterogeneous nature of the
Europarl corpus
Taking the 322K training set:
93.0% of SMT chunks found only once, 99.4% occur
< 10 times
96.6% of EBMT chunks found only once, 99.8% occur
< 10 times
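
Statistics of this kind can be computed with a few lines. The sketch assumes the percentages are taken over distinct chunks (types), which the slide does not state; the function name and toy data are hypothetical.

```python
from collections import Counter

def frequency_profile(chunks, threshold=10):
    """Return (% of distinct chunks seen once, % seen fewer than threshold times)."""
    counts = Counter(chunks)
    types = len(counts)
    if types == 0:
        return 0.0, 0.0
    once = sum(1 for c in counts.values() if c == 1)
    below = sum(1 for c in counts.values() if c < threshold)
    return 100 * once / types, 100 * below / types

# Toy chunk list: one frequent chunk, one rarer chunk, two singletons.
toy = ["of the"] * 12 + ["in the"] * 3 + ["member states", "that we"]
singletons, rare = frequency_profile(toy)
print(f"{singletons:.1f}% once, {rare:.1f}% fewer than 10 times")
```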

Of the top 10 most frequent chunks in SMT-only
set, 7 are made up solely of marker words:
du → of the
de la → of the
union européenne → union
états membres → member states
de l' → of the
dans le → in the
n' est → is
parlement européen → parliament
que nous → that we
que la → that the

42
Translation Examples

PBSMT: we have all accepted the lesson of the
food crisis the 1990s
H-PBSMT: we have all accepted the lesson of
the food crisis in the 1990s
REF: we have all learned our lesson from the
food crisis of the 90s
--------------------------------------------------
PBSMT: indeed if the second-pillar example
were less frequent there would be fewer poor
H-PBSMT: indeed if pensions for example were
less frequent there would be fewer poor
REF: if indeed for example pensions were less
inadequate there would be fewer poor people
--------------------------------------------------
PBSMT: in this regard the port controls there
should be making the regulations still more
stringent
H-PBSMT: when it comes to port controls we must
make the regulations still more stringent
REF: it is important to tighten up regulations
regarding the control of harbours and ports even
further
--------------------------------------------------
PBSMT: it also requires that we continue to
discussed the entry into force of fiscal
harmonization
H-PBSMT: we also need to continue to ask
ourselves questions about the implementation of
fiscal harmonization
REF: we also still need to continue to question
the implementing of fiscal harmonisation

43
Remarks

Groves & Way, 05 showed how an EBMT system
outperforms a PBSMT system when trained on the
Sun Microsystems data set
This time around, the baseline PBSMT system
achieves higher quality than all variants of the
EBMT system
Heterogeneous Europarl vs. homogeneous Sun data
Chunk coverage is lower on Europarl data set: 6
translations produced using chunks alone (Sun)
vs. 1 on Europarl
EBMT system considered 13 words on average for
direct translation (vs. 7 for Sun data)
Significant improvements seen when using
higher-quality lexicon
Improvements also seen when LM introduced
H-PBSMT system able to outperform baseline PBSMT
system
Further gains to be made from hybrid corpus-based
approaches
Small overlap on chunks extracted via EBMT and
SMT methods

44
Hybrid Example-Based SMT: The MaTrEx system
45
Hybrid Example-Based SMT

Armstrong et al., 06, OpenLab MT-EVAL (March
06): adding EBMT chunks to vanilla Pharaoh
PB-SMT system adds about 4 BLEU points for ES→EN
Stroppa et al., 06: adding EBMT chunks to
vanilla Pharaoh PB-SMT system adds about 5 BLEU
points for Basque→EN
Good performance in IWSLT-06

47
Phrases, Chunks and Training-Test Corpora

SMT phrases are contiguous sequences of n-grams
Typically, EBMT performance is comparable with
PB-SMT with fewer sub-sentential alignments
As EBMT chunks are different from SMT phrases,
use them if available in your PB-SMT systems (cf.
OpenLab ES→EN and AMTA Basque→EN results). They:
Provide longer sequences of context → better
translations
Reinforce probability of good but infrequent SMT
phrases
As SMT phrases are different from EBMT chunks,
use them if available in your EBMT systems
SMT phrases typically shorter than EBMT chunks,
so more useful where training/test material is
more heterogeneous: where EBMT chunks are too
long to cover the input data, SMT n-grams can
fill in before we need to resort to W2W
translation (always last resort)
cf. CMU findings in recent NIST MT-Eval

48
Phrases, Chunks and Training-Test Corpora

Looks like EBMT better on homogeneous training
data
EBMT > PB-SMT on Sun TM (EN↔FR)
EBMT > PB-SMT on EF TM (Basque→EN)
SMT better on (more) heterogeneous data
PB-SMT > EBMT on Europarl (EN↔FR)
Predictors of Usefulness of Approach given Text
Type:
Chunk coverage
Amount of W2W Translation

49
Conclusions

Combining SMT phrases and EBMT chunks in a
hybrid statistical EBMT or example-based SMT
system will improve your system output
Blind adherence to one approach will guarantee
that your performance is less than it could
otherwise be
John Hutchins: EBMT is Hybrid MT
Joe Olive: Need combination of rules and
statistics

50
Ongoing & Future Work

Automatic detection of Marker Words
Most common SMT phrases consist mainly of marker
words
Plan to increase levels of hybridity
Code a simple EBMT decoder, factoring in
Marker-Based recombination approach along with
probabilities
Use exact sentence matching in PBSMT, as in EBMT
Integration of generalized templates into PBSMT
system (and reintegrate them into EBMT system)
Integrate marker tag information into SMT
language and translation models
Hybrid EBMT-EBMT System (with CMU)?!
What's the contribution of EBMT chunks if an SMT
system is allowed as much training data as it
likes?