Hybrid Data-Driven Models of Machine Translation
- Andy Way (& Declan Groves)
- National Centre for Language Technology,
- School of Computing,
- Dublin City University, Dublin 9, Ireland
- away_at_computing.dcu.ie
Outline
- Motivations
- Example-Based Machine Translation
- Marker-Based EBMT
- Statistical Machine Translation
- Experiments
- Language Pairs & Corpora Used
- EBMT and PBSMT baseline systems
- Hybrid System Experiments
- Making use of merged data sets
- Phrases, Chunks and Training-Test Corpora
- Conclusions
- Future Work
Motivations
- Most MT research carried out today is corpus-based:
  - Example-Based Machine Translation (EBMT)
  - Statistical Machine Translation (SMT)
- Lack of comparative research:
  - Relative unavailability of EBMT systems
  - Lack of participation of EBMT researchers in competitive evaluations
  - Dominance of the SMT approach
Example-Based Machine Translation
- As with SMT, EBMT makes use of information extracted from sententially-aligned bilingual corpora. In general:
  - SMT only uses parameters and throws away the data
  - EBMT makes use of linguistic units directly
- During translation:
  - The source side of the bitext is searched for close matches
  - Source-target subsentential links are determined
  - Relevant target fragments are retrieved and recombined to derive the final translation.
EBMT: An Example
- Assumes an aligned bilingual corpus of examples against which input text is matched
- The best match is found using a similarity metric based on word co-occurrence, POS, generalized templates and bilingual dictionaries (exact and fuzzy matching)
EBMT: An Example
- Identify useful fragments
- Recombination depends on the nature of the examples used:
  - on Monday → lundi
  - John went to → Jean est allé à
  - the bakers → la boulangerie
Marker-Based EBMT at DCU
- Gaijin: Veale & Way, RANLP 97
- Gough et al., AMTA 02
- wEBMT: Way & Gough, Comp. Linguistics 03
- Gough & Way, EAMT 04
- Way & Gough, TMI 04
- Gough, PhD Thesis 05
- Way & Gough, Natural Language Engineering 05
- Way & Gough, Machine Translation 05
- Groves & Way, ACL w/shop on Data-Driven MT 05
- Groves & Way, Machine Translation, EAMT 06
- MaTrEx: Armstrong et al., TC-STAR OpenLab 06
- Stroppa et al., NIST MT-Eval 06, AMTA 06, IWSLT-06
System Development
Marker-Based EBMT
- The Marker Hypothesis states that all natural languages have:
  - a closed set of specific words or morphemes
  - which appear in a limited set of grammatical contexts
  - and which signal that context.
- (Green, 1979)
- A universal psycholinguistic constraint: languages are marked for syntactic structure at surface level by a closed set of lexemes or morphemes
The Dearborn, Mich., energy company stopped paying a dividend in the third quarter of 1984 because of troubles at its Midland nuclear plant.
- Three NPs start with determiners, one with a possessive pronoun
  - A nominal element will appear soon to the right
- The sets of determiners and possessive pronouns are small and finite
- Four prepositional phrases, with prepositional heads
  - An NP object will appear soon to the right
- The set of prepositions is small and finite
Marker-Based EBMT: Chunking
- Use a set of closed-class marker words to segment aligned source and target sentences during a pre-processing stage
  - <PUNC> is now used as an end-of-chunk marker
- English marker words are extracted from CELEX
Marker-Based EBMT: Chunking (2)
- Enables the use of basic syntactic markup for the extraction of translation resources
- Source-target sentence pairs are tagged with marker categories in a pre-processing stage:
  EN: <PRON> you click apply <PREP> to view <DET> the effect <PREP> of <DET> the selection
  FR: <PRON> vous cliquez <PREP> sur appliquer <PREP> pour visualiser <DET> l' effet <PREP> de <DET> la sélection
- Aligned source-target chunks are created by segmenting sentences based on these marker tags, along with cognate and word co-occurrence information:
  - <PRON> you click apply → <PRON> vous cliquez sur appliquer
  - <PREP> to view → <PREP> pour visualiser
  - <DET> the effect → <DET> l'effet
  - <PREP> of the selection → <PREP> de la sélection
- Chunks must contain at least one non-marker word, which ensures that they carry useful contextual information
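The chunking step can be sketched in Python as shown below. This is a minimal illustration, not the DCU implementation, and it rests on several assumptions:
- The tiny `MARKERS` dictionary stands in for the CELEX-derived marker list.
- Tokenisation is plain whitespace splitting.
- The function name `marker_chunk` and the `LEX` tag for chunks opened by a non-marker word are our own choices.

A marker word opens a new chunk only once the current chunk already holds a non-marker word, so sequences such as "of the" stay together. The cognate and co-occurrence based alignment of source and target chunks is not shown.

```python
from typing import Dict, List, Tuple

# Hypothetical, deliberately tiny marker lexicon for illustration only;
# the real system draws its English marker words from CELEX.
MARKERS: Dict[str, str] = {
    "i": "PRON", "you": "PRON", "we": "PRON", "it": "PRON",
    "the": "DET", "a": "DET", "an": "DET", "this": "DET",
    "to": "PREP", "of": "PREP", "in": "PREP", "on": "PREP", "at": "PREP",
    "and": "CONJ", "or": "CONJ", "but": "CONJ",
}
PUNCT = {".", ",", ";", ":", "!", "?"}


def marker_chunk(sentence: str) -> List[Tuple[str, str]]:
    """Segment a whitespace-tokenised sentence into (marker tag, chunk) pairs."""
    chunks: List[Tuple[str, str]] = []
    tag, words, has_content = "LEX", [], False  # "LEX": chunk opened by a non-marker word

    def close() -> None:
        nonlocal tag, words, has_content
        if words:
            chunks.append((tag, " ".join(words)))
        tag, words, has_content = "LEX", [], False

    for tok in sentence.split():
        low = tok.lower()
        if low in PUNCT:
            words.append(tok)
            if has_content:          # <PUNC> ends a chunk that already has content
                close()
        elif low in MARKERS:
            if has_content:          # a marker word starts a new chunk ...
                close()
            if not words:            # ... and the first marker gives the chunk its tag
                tag = MARKERS[low]
            words.append(tok)        # markers seen before any content word are absorbed
        else:
            words.append(tok)        # non-marker (content) word
            has_content = True
    close()
    return chunks


if __name__ == "__main__":
    for tag, text in marker_chunk("you click apply to view the effect of the selection"):
        print(f"<{tag}> {text}")
```

Run as a script, it prints the four English chunks from the example: `<PRON> you click apply`, `<PREP> to view`, `<DET> the effect` and `<PREP> of the selection`.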
Marker-Based EBMT: Lexicon & Template Extraction
- Chunks containing only one non-marker word in both source and target languages can then be used to extract a word-level lexicon:
  - <PREP> to → <PREP> pour
  - <LEX> view → <LEX> visualiser
  - <LEX> effect → <LEX> effet
  - <DET> the → <DET> l'
  - <PREP> of → <PREP> de
- In a final pre-processing stage, we produce a set of generalized marker templates by replacing marker words with their tags:
  - <PRON> click apply → <PRON> cliquez sur appliquer
  - <PREP> view → <PREP> visualiser
  - <DET> effect → <DET> effet
  - <PREP> the selection → <PREP> la sélection
- Any marker word pair can now be inserted at the appropriate tag location.
- More general examples add flexibility to the matching process and improve coverage (and quality)
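Below is a rough sketch of the two extraction steps. It assumes the aligned chunk pairs from the earlier example are already available. The marker sets, the `CHUNKS` list and the helpers `extract_lexicon`, `generalize` and `instantiate` are hypothetical. Pairing the two chunk-initial marker words with each other is our simplifying assumption about how entries such as <PREP> to → <PREP> pour arise.

```python
from typing import List, Set, Tuple

Entry = Tuple[str, str, str]  # (tag, source side, target side)

# Hypothetical marker sets covering just this example.
EN_MARKERS: Set[str] = {"you", "to", "the", "of"}
FR_MARKERS: Set[str] = {"vous", "sur", "pour", "l'", "de", "la"}

# Aligned chunk pairs from the example (French "l'" split off as its own token).
CHUNKS: List[Entry] = [
    ("PRON", "you click apply", "vous cliquez sur appliquer"),
    ("PREP", "to view", "pour visualiser"),
    ("DET", "the effect", "l' effet"),
    ("PREP", "of the selection", "de la sélection"),
]


def extract_lexicon(chunks: List[Entry], src_m: Set[str], tgt_m: Set[str]) -> List[Entry]:
    """Word-level entries from chunk pairs with exactly one non-marker word per side."""
    lexicon: List[Entry] = []
    for tag, src, tgt in chunks:
        s, t = src.split(), tgt.split()
        s_lex = [w for w in s if w not in src_m]
        t_lex = [w for w in t if w not in tgt_m]
        if len(s_lex) == 1 and len(t_lex) == 1:
            if s[0] in src_m and t[0] in tgt_m:
                lexicon.append((tag, s[0], t[0]))        # chunk-initial marker words
            lexicon.append(("LEX", s_lex[0], t_lex[0]))  # the lone content words
    return lexicon


def generalize(chunks: List[Entry], src_m: Set[str], tgt_m: Set[str]) -> List[Entry]:
    """Templates: drop the chunk-initial marker word on each side, keep its tag."""
    templates: List[Entry] = []
    for tag, src, tgt in chunks:
        s, t = src.split(), tgt.split()
        if s[0] in src_m and t[0] in tgt_m and len(s) > 1 and len(t) > 1:
            templates.append((tag, " ".join(s[1:]), " ".join(t[1:])))
    return templates


def instantiate(template: Entry, marker_pair: Entry) -> Tuple[str, str]:
    """Insert a marker word pair at the slot of a template with the same tag."""
    if template[0] != marker_pair[0]:
        raise ValueError("tag mismatch")
    return f"{marker_pair[1]} {template[1]}", f"{marker_pair[2]} {template[2]}"
```

On this input the templates match the four listed above. The lexicon also contains <LEX> selection → <LEX> sélection, which the slide does not list. `instantiate` simply concatenates strings, so French elision (l'effet) is not handled.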
Marker-Based EBMT
- During translation:
  - Resources are searched from maximal context (specific source-target sentence pairs) to minimal context (word-for-word translation).
  - Retrieved example translation candidates are recombined, along with their weights, based on source sentence order.
  - The system outputs an n-best list of translations.
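The search order can be illustrated with a greedy sketch:
1. Try an exact sentence match.
2. Otherwise, take the longest matching chunk at each position.
3. Fall back to word-for-word lookup.

Target fragments are concatenated in source order and their weights multiplied. This is a simplification under stated assumptions: it produces a single best output rather than an n-best list, and does no reordering or boundary smoothing. The table layout and the `translate` function are hypothetical. The example fragments are the "John went to / the bakers / on Monday" ones used earlier.

```python
from typing import Dict, Tuple

Table = Dict[str, Tuple[str, float]]  # source string -> (target string, weight)


def translate(sentence: str, sentences: Table, chunks: Table, words: Table) -> Tuple[str, float]:
    """Greedy maximal-to-minimal-context lookup; returns (translation, score)."""
    if sentence in sentences:                        # maximal context: exact sentence match
        return sentences[sentence]
    toks = sentence.split()
    max_n = max((len(k.split()) for k in chunks), default=0)
    out, score, i = [], 1.0, 0
    while i < len(toks):
        # Try the longest multi-word chunk starting at position i first.
        for n in range(min(max_n, len(toks) - i), 1, -1):
            src = " ".join(toks[i:i + n])
            if src in chunks:
                tgt, w = chunks[src]
                out.append(tgt)
                score *= w
                i += n
                break
        else:
            # Minimal context: word-for-word; unknown words are copied through unchanged.
            tgt, w = words.get(toks[i], (toks[i], 1.0))
            out.append(tgt)
            score *= w
            i += 1
    # Fragments are concatenated in source-sentence order (no reordering).
    return " ".join(out), score


if __name__ == "__main__":
    chunk_table = {"John went to": ("Jean est allé à", 0.5),
                   "the bakers": ("la boulangerie", 0.5),
                   "on Monday": ("lundi", 1.0)}
    print(translate("John went to the bakers on Monday", {}, chunk_table, {}))
```

Because chunks are tried before single words, SMT-style n-grams or EBMT chunks of any length can be dropped into `chunks`, with W2W lookup kept as the last resort.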
Phrase-Based SMT
- SMT translation and language models now make use of phrase translations in the TM, along with word correspondences, to improve translation output.
  - Better modelling of syntax and local word reordering
- Phrase extraction heuristics based on word alignments have been shown to be better than more syntactically motivated approaches (Koehn et al., 2003):
  - Perform word alignment in both source-target and target-source directions
  - Take the intersection of the unidirectional alignments
  - Extend the intersection iteratively into the union by adding adjacent alignments within the alignment space (Och & Ney, 2003; Koehn et al., 2003)
  - Extract all possible phrases from sentence pairs which correspond to these alignments
  - Phrase probabilities can be calculated from relative frequencies
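The pipeline above can be sketched compactly. This is a simplified version of the intersection-then-grow refinement: it resembles the "grow-diag" heuristic without its final step. It is followed by consistent phrase extraction and relative-frequency scoring, in the spirit of Och & Ney (2003) and Koehn et al. (2003). It is not the actual GIZA++/Pharaoh training code, and all function names are hypothetical. Unaligned boundary words are not expanded.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source index, target index)
NEIGHBOURS = [(-1, 0), (0, -1), (1, 0), (0, 1), (-1, -1), (-1, 1), (1, -1), (1, 1)]


def symmetrize(s2t: Alignment, t2s: Alignment) -> Alignment:
    """Start from the intersection and grow towards the union via adjacent points."""
    union = s2t | t2s
    aligned = set(s2t & t2s)
    grew = True
    while grew:
        grew = False
        for i, j in sorted(aligned):               # iterate over a snapshot
            for di, dj in NEIGHBOURS:
                p = (i + di, j + dj)
                if p in union and p not in aligned:
                    src_free = all(a != p[0] for a, _ in aligned)
                    tgt_free = all(b != p[1] for _, b in aligned)
                    if src_free or tgt_free:       # only link a still-unaligned word
                        aligned.add(p)
                        grew = True
    return aligned


def extract_phrases(src: List[str], tgt: List[str], align: Alignment,
                    max_len: int = 4) -> List[Tuple[str, str]]:
    """All phrase pairs (up to max_len words per side) consistent with the alignment."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            js = [j for i, j in align if i1 <= i <= i2]
            if not js:
                continue
            j1, j2 = min(js), max(js)
            if j2 - j1 >= max_len:
                continue
            # Consistent: no target word in [j1, j2] is linked outside [i1, i2].
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for i, j in align):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs


def phrase_probs(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Relative frequency estimate p(target | source) = count(s, t) / count(s)."""
    joint = Counter(pairs)
    src_counts = Counter(s for s, _ in pairs)
    return {(s, t): c / src_counts[s] for (s, t), c in joint.items()}
```

For example, with the alignment {(0, 0), (1, 1)}, `extract_phrases(["the", "effect"], ["l'", "effet"], ...)` yields "the → l'", "the effect → l' effet" and "effect → effet".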
Experiments
EBMT vs. WB-SMT
- Way & Gough, 05 (cf. talk here in May 05): on a 203K-sentence Sun TM (4.8M words) and a 4K-sentence test set (avg. sentence length 13.1 words EN, 15.2 words FR), EBMT > vanilla WB-SMT (GIZA++, CMU-Cambridge statistical toolkit, ISI ReWrite Decoder) for FR↔EN
- Best BLEU scores:
  - EN→FR: .453 EBMT, .338 WB-SMT
  - FR→EN: .461 EBMT, .446 WB-SMT
EBMT & PB-SMT (on Sun TM)
English-French
- The Phrase-Based system using GIZA data outperforms the same system seeded with EBMT data on all metrics, bar Precision (0.6598 vs. 0.6661)
- The Marker-Based EBMT system beats both Phrase-Based SMT systems, particularly for BLEU (0.4409 vs. 0.3758) and Recall (0.6877 vs. 0.5759).
EBMT & PB-SMT (on Sun TM)
French-English
- Scores for all systems are better for FR→EN than for EN→FR
- Again, the Phrase-Based system using GIZA data outperforms the same system seeded with EBMT data.
- As for EN→FR, the Marker-Based EBMT system significantly outperforms both Phrase-Based SMT systems for FR→EN.
Towards Hybridity
- Decided to merge data sources
  - Combine parts of the EBMT sub-sentential alignments with parts of the data induced using GIZA
- Performed a number of experiments using:
  - EBMT phrases + GIZA words (SEMI-HYBRID)
  - Investigate whether EBMT phrases are of better quality than GIZA phrases
  - All data (HYBRID): GIZA words & phrases + EBMT words & phrases
  - EBMT phrases will be used instead of SMT n-grams
  - EBMT phrases should add extra probability to more useful SMT phrases, i.e. the probabilities of the phrases in the intersection of these two sets are boosted
[Figure: overlapping sets of EBMT phrases and GIZA phrases]
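One way to realize the boosting described above, sketched under our own assumptions, is to pool the phrase-pair occurrences extracted by both aligners before computing relative frequencies. A pair found by both GIZA and EBMT then receives a larger count. The helper names and toy phrase lists are hypothetical, and the original systems may have combined the tables differently.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source phrase, target phrase)


def relative_freq(pairs: Iterable[Pair]) -> Dict[Pair, float]:
    """p(target | source) = count(source, target) / count(source)."""
    pairs = list(pairs)
    joint = Counter(pairs)
    marginal = Counter(s for s, _ in pairs)
    return {(s, t): c / marginal[s] for (s, t), c in joint.items()}


def merge_sources(*sources: List[Pair]) -> Dict[Pair, float]:
    """Pool phrase-pair occurrences from several aligners, then re-estimate."""
    return relative_freq(p for source in sources for p in source)


if __name__ == "__main__":
    giza = [("the effect", "l' effet"), ("the effect", "cet effet")]   # toy GIZA output
    ebmt = [("the effect", "l' effet")]                                # toy EBMT output
    print(relative_freq(giza))         # each translation gets 0.5
    print(merge_sources(giza, ebmt))   # the shared pair rises to 2/3
```

The shared pair's probability rises from 0.5 to 2/3, at the expense of the pair that only GIZA proposed.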
Merging Data Sources: EN→FR Results
- Using EBMT phrases + GIZA words improves significantly on using EBMT data alone
- Merging all the EBMT and GIZA data improves on all metrics, most significantly for BLEU score (0.4259 vs. 0.3643 SEMI-HYBRID).
- The EBMT system still wins out for BLEU score, Recall and WER
Merging Data Sources: FR→EN Results
- Using EBMT phrases + GIZA words shows improvements on the PBSMT system seeded with EBMT data, but improves only on the GIZA-seeded system's BLEU score (0.4888 vs. 0.4198).
- However, merging all data improves on both PBSMT systems on all metrics
- The EBMT system beats the Hybrid system only on Recall and WER
Results Discussion
- PBSMT
  - Best PBSMT BLEU scores (with GIZA data only): 0.375 (E-F), 0.420 (F-E)
  - Seeding PBSMT with EBMT data gets good BLEU scores: 0.364 (E-F), 0.395 (F-E); note the difference in data size (1.73M vs. 403K phrase table entries)
  - PBSMT loses out to the EBMT system
- Semi-Hybrid System
  - Seeding Pharaoh with SMT words and EBMT phrases improves over the baseline GIZA-seeded system
  - Data size diminishes considerably (430K vs. 1.73M)
  - Worse results than for the EBMT system
- Fully-Hybrid System
  - Better results than for the semi-hybrid system: E-F 0.426 (0.396), F-E 0.489 (0.427)
  - Data size increases to 2.04M phrase table entries
  - For F-E, the Hybrid system beats EBMT on BLEU (0.4888 vs. 0.4611) and Precision (0.6927 vs. 0.6782); EBMT is ahead for Recall and WER.
EBMT & PB-SMT (on Europarl)
- Groves & Way, 06a/b
- Added SMT chunks to the EBMT system → hybrid "statistical EBMT" system
- New domain: Europarl (FR↔EN, 322K sentence pairs; Koehn, 05)
  - Extracted training data from the designated training sets, filtering based on sentence length and relative sentence length (ratio of 1.5 used)
  - Allowed us to extract high-quality training sets
- For testing, randomly extracted 5000 sentences from the Europarl common test set. Avg. sentence lengths: 20.5 words (French), 19.0 words (English)
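The corpus preparation can be sketched as a length filter plus random sampling. The 1.5 ratio comes from the slide. The maximum length of 40 tokens, the seed and the function names are hypothetical placeholders, since the slide does not give the absolute length limit.

```python
import random
from typing import List, Tuple

Pair = Tuple[str, str]  # (French sentence, English sentence)


def keep_pair(src: str, tgt: str, max_len: int = 40, max_ratio: float = 1.5) -> bool:
    """Both sides non-empty, at most max_len tokens, and length ratio <= max_ratio."""
    ls, lt = len(src.split()), len(tgt.split())
    if ls == 0 or lt == 0 or ls > max_len or lt > max_len:
        return False
    return max(ls, lt) / min(ls, lt) <= max_ratio


def filter_corpus(pairs: List[Pair], max_len: int = 40, max_ratio: float = 1.5) -> List[Pair]:
    """Keep only the sentence pairs that pass the length filters."""
    return [(s, t) for s, t in pairs if keep_pair(s, t, max_len, max_ratio)]


def sample_test_set(pairs: List[Pair], n: int = 5000, seed: int = 0) -> List[Pair]:
    """Randomly draw n sentence pairs without replacement, reproducibly."""
    return random.Random(seed).sample(pairs, min(n, len(pairs)))
```

A 2-word vs. 3-word pair (ratio 1.5) is kept, while a 2-word vs. 4-word pair (ratio 2.0) is dropped.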
EBMT vs. PBSMT
- Compared the performance of our Marker-Based EBMT system against that of a PB-SMT system built using:
  - Pharaoh Phrase-Based Decoder (Koehn, 04)
  - SRI LM toolkit (Stolcke, 02)
  - Refined alignment strategy (Och & Ney, 03)
- Trained on incremental data sets, tested on the 5000-sentence test set
  - Effect of increasing training data on translation quality
- Performed translation for FR↔EN
- Evaluated translation quality automatically using BLEU (Papineni et al., 02), Precision & Recall (GTM toolkit; Turian et al., 03) and Word Error Rate (WER)
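For reference, WER is the word-level edit distance (substitutions, insertions and deletions) between hypothesis and reference, divided by the reference length. The sketch below reports it on a 0-100 scale, assuming that is the scale of figures such as 68.55. It handles a single reference and is not the scoring script used in these experiments.

```python
def wer(hypothesis: str, reference: str) -> float:
    """Word error rate in percent: word-level Levenshtein distance / reference length."""
    h, r = hypothesis.split(), reference.split()
    prev = list(range(len(h) + 1))          # distances from an empty reference prefix
    for i in range(1, len(r) + 1):
        cur = [i] + [0] * len(h)
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion of a reference word
                         cur[j - 1] + 1,      # insertion of a hypothesis word
                         prev[j - 1] + cost)  # match or substitution
        prev = cur
    return 100.0 * prev[len(h)] / max(len(r), 1)
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.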
EBMT vs. PBSMT: French-English
- Doubling the amount of data improves performance across the board for both EBMT and PBSMT
- The PBSMT system clearly outperforms the EBMT system, on average achieving a BLEU score 0.07 higher
- PBSMT achieves a significantly lower WER (e.g. 68.55 vs. 82.43 for the 322K data set)
- Increasing the amount of training data results in:
  - a 3-5% relative increase in BLEU for PBSMT
  - a 6.2% to 10.3% relative BLEU score improvement for EBMT
[Chart: results for the 78K, 156K and 322K training sets]
EBMT vs. PBSMT: English-French
- PBSMT continues to outperform the EBMT system by some distance
  - e.g. 0.1933 vs. 0.1488 BLEU score, 0.518 vs. 0.4578 Recall for the 322K data set
- The difference between the systems is somewhat smaller for EN→FR than for FR→EN
- EBMT system performance is much more consistent across both directions
- The PBSMT system performs 2 BLEU points worse (10% relative) for EN→FR than for FR→EN
  - French-English is easier: fewer agreement errors and fewer problems with boundary friction, e.g. le → the (FR→EN) vs. the → le, la, les, l' (EN→FR)
- EBMT scores higher for EN→FR than for FR→EN in terms of BLEU score
  - Cf. Callison-Burch et al., 06, on BLEU for evaluating non-n-gram-based systems
[Chart: results for the 78K, 156K and 322K training sets]
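We assume the relative improvements quoted in these slides are computed against the baseline score. As a worked example with the 322K EN→FR BLEU scores quoted above (PBSMT 0.1933, EBMT 0.1488):

```latex
\[
\Delta_{\text{rel}} = \frac{\mathrm{BLEU}_{\text{new}} - \mathrm{BLEU}_{\text{base}}}{\mathrm{BLEU}_{\text{base}}} \times 100\,\%,
\qquad
\frac{0.1933 - 0.1488}{0.1488} \times 100\,\% = \frac{0.0445}{0.1488} \times 100\,\% \approx 29.9\,\%
\]
```

So an absolute lead of 4.45 BLEU points corresponds to roughly a 30% relative difference.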
Hybrid System Experiments
- Decided to merge elements of EBMT marker-based alignments with PBSMT phrases and words induced via GIZA
- A number of hybrid systems:
  - LEX-EBMT: replaced the EBMT lexicon with higher-quality PBSMT word alignments, to lower WER
  - H-EBMT vs. H-PBSMT: merged PBSMT words and phrases with EBMT data (words and phrases) and passed the resulting data to the baseline EBMT and baseline PBSMT systems
  - H-EBMT-LM: reranked the output of the H-EBMT systems using the PBSMT system's equivalent language model
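Below is a minimal sketch of the H-EBMT-LM idea. Each n-best candidate is rescored by adding a weighted language-model log-probability to its translation score, and the list is re-sorted. The add-one smoothed bigram model is only a stand-in for the SRILM-trained PBSMT language model. The `lm_weight` value, the class name and the function name are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


class BigramLM:
    """Add-one smoothed bigram language model (illustrative stand-in only)."""

    def __init__(self, sentences: List[str]):
        self.context = Counter()   # how often each word is used as a bigram history
        self.bigram = Counter()
        vocab = set()
        for s in sentences:
            toks = ["<s>"] + s.split() + ["</s>"]
            vocab.update(toks)
            self.context.update(toks[:-1])
            self.bigram.update(zip(toks[:-1], toks[1:]))
        self.v = len(vocab)

    def logprob(self, sentence: str) -> float:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        return sum(math.log((self.bigram[(a, b)] + 1) / (self.context[a] + self.v))
                   for a, b in zip(toks[:-1], toks[1:]))


def rerank(nbest: List[Tuple[str, float]], lm: BigramLM,
           lm_weight: float = 1.0) -> List[Tuple[str, float]]:
    """nbest holds (translation, log-domain translation score); returns best first."""
    return sorted(nbest, key=lambda c: c[1] + lm_weight * lm.logprob(c[0]), reverse=True)
```

With `lm_weight=0.0` the original ranking is kept. A positive weight lets the LM promote a candidate with more fluent word order even if its translation score is slightly lower.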
Hybrid Experiments: French-English
- Use of the improved lexicon (LEX-EBMT) leads to only slight improvements (average relative increase of 2.9% in BLEU)
- Adding hybrid data improves above the baselines, for both EBMT (H-EBMT) and PBSMT (H-PBSMT)
  - The H-PBSMT system trained on 78K and 156K achieves a higher BLEU score than the PBSMT system trained on twice as much data.
- The addition of the language model to the H-EBMT system helps guide word order after lexical selection and thus improves results further
Hybrid Experiments: English-French
- We see similar results for EN→FR as for FR→EN
- The more SMT-like the EBMT system becomes, the more its BLEU scores fall in line with the other metrics, i.e. higher for FR→EN than for EN→FR
- Using the hybrid data set, we get a 15% average relative increase in BLEU score for the EBMT system, and 6.2% for the H-PBSMT system over its baseline
- The H-PBSMT system performs almost as well as the baseline system trained on over 4 times the amount of data
SMT phrases vs. EBMT chunks
- Many more SMT phrases are derived than EBMT chunks
  - This is not reflected in the scores
- Doubling the amount of data doubles the number of sub-sentential alignments for both systems
  - This indicates the heterogeneous nature of the Europarl corpus
- Taking the 322K training set:
  - 93.0% of SMT chunks are found only once; 99.4% occur < 10 times
  - 96.6% of EBMT chunks are found only once; 99.8% occur < 10 times
- Of the 10 most frequent chunks in the SMT-only set, 7 are made up solely of marker words:
  - du → of the
  - de la → of the
  - union européenne → union
  - états membres → member states
  - de l' → of the
  - dans le → in the
  - n'est → is
  - parlement européen → parliament
  - que nous → that we
  - que la → that the
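The frequency statistics above can be computed along these lines. We assume the percentages are taken over distinct chunk types, which the slide does not state. The function name and the toy input are hypothetical; the toy chunks are source sides from the list above.

```python
from collections import Counter
from typing import Iterable, Set, Tuple


def chunk_stats(chunks: Iterable[str], markers: Set[str],
                top_k: int = 10) -> Tuple[float, float, int]:
    """(share of types seen once, share seen < 10 times, marker-only types in the top k)."""
    freq = Counter(chunks)
    n_types = len(freq) or 1
    once = sum(1 for c in freq.values() if c == 1) / n_types
    under_ten = sum(1 for c in freq.values() if c < 10) / n_types
    top = [chunk for chunk, _ in freq.most_common(top_k)]
    marker_only = sum(1 for chunk in top if all(w in markers for w in chunk.split()))
    return once, under_ten, marker_only


if __name__ == "__main__":
    toy = ["du", "du", "de la", "union européenne", "états membres"]
    print(chunk_stats(toy, {"du", "de", "la"}, top_k=2))
```

On the toy input, three of the four chunk types occur once, and both of the two most frequent types consist only of marker words.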
Translation Examples
- PBSMT: we have all accepted the lesson of the food crisis the 1990s
- H-PBSMT: we have all accepted the lesson of the food crisis in the 1990s
- REF: we have all learned our lesson from the food crisis of the 90s
--------------------------------------------------
- PBSMT: indeed if the second-pillar example were less frequent there would be fewer poor
- H-PBSMT: indeed if pensions for example were less frequent there would be fewer poor
- REF: if indeed for example pensions were less inadequate there would be fewer poor people
--------------------------------------------------
- PBSMT: in this regard the port controls there should be making the regulations still more stringent
- H-PBSMT: when it comes to port controls we must make the regulations still more stringent
- REF: it is important to tighten up regulations regarding the control of harbours and ports even further
--------------------------------------------------
- PBSMT: it also requires that we continue to discussed the entry into force of fiscal harmonization
- H-PBSMT: we also need to continue to ask ourselves questions about the implementation of fiscal harmonization
- REF: we also still need to continue to question the implementing of fiscal harmonisation
Remarks
- Groves & Way, 05 showed that an EBMT system outperforms a PBSMT system when trained on the Sun Microsystems data set
- This time around, the baseline PBSMT system achieves higher quality than all variants of the EBMT system
  - Heterogeneous Europarl vs. homogeneous Sun data
  - Chunk coverage is lower on the Europarl data set: 6% of translations produced using chunks alone (Sun) vs. 1% on Europarl
  - The EBMT system considered 13 words on average for direct translation (vs. 7 for Sun data)
- Significant improvements seen when using a higher-quality lexicon
- Improvements also seen when the LM is introduced
- The H-PBSMT system is able to outperform the baseline PBSMT system
- Further gains are to be made from hybrid corpus-based approaches
  - Small overlap between the chunks extracted via EBMT and SMT methods
Hybrid Example-Based SMT: The MaTrEx System
Hybrid Example-Based SMT
- Armstrong et al., 06 (OpenLab MT-Eval, March 06): adding EBMT chunks to a vanilla Pharaoh PB-SMT system adds about 4 BLEU points for ES→EN
- Stroppa et al., 06: adding EBMT chunks to a vanilla Pharaoh PB-SMT system adds about 5 BLEU points for Basque→EN
- Good performance in IWSLT-06
Phrases, Chunks and Training-Test Corpora
- SMT phrases are contiguous sequences of words (n-grams)
- Typically, EBMT performance is comparable with PB-SMT while using fewer sub-sentential alignments
- As EBMT chunks are different from SMT phrases, use them if available in your PB-SMT systems (cf. OpenLab ES→EN and AMTA Basque→EN results). They:
  - provide longer sequences of context → better translations
  - reinforce the probability of good but infrequent SMT phrases
- As SMT phrases are different from EBMT chunks, use them if available in your EBMT systems
  - SMT phrases are typically shorter than EBMT chunks, so they are more useful where training/test material is more heterogeneous: where EBMT chunks are too long to cover the input data, SMT n-grams can fill in before we need to resort to W2W translation (always the last resort)
  - cf. CMU findings in the recent NIST MT-Eval
Phrases, Chunks and Training-Test Corpora
- EBMT looks better on homogeneous training data
  - EBMT > PB-SMT on Sun TM (EN→FR)
  - EBMT > PB-SMT on EF TM (Basque→EN)
- SMT is better on (more) heterogeneous data
  - PB-SMT > EBMT on Europarl (EN→FR)
- Predictors of the usefulness of an approach, given the text type:
  - Chunk coverage
  - Amount of W2W translation
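Both predictors can be read off a per-sentence translation trace. The trace format and the definitions below are our assumptions:
- Chunk coverage is the share of test sentences translated without any word-for-word step, as in the "translations produced using chunks alone" figure in the Remarks.
- The amount of W2W translation is the mean number of words per sentence translated word-for-word.

The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Trace = List[Tuple[str, str]]  # (source segment, resource: "sentence" | "chunk" | "word")


def usage_profile(trace: Trace) -> Dict[str, float]:
    """Share of source words translated by each resource type in one sentence."""
    counts = {"sentence": 0, "chunk": 0, "word": 0}
    for segment, resource in trace:
        counts[resource] += len(segment.split())
    total = sum(counts.values()) or 1
    return {k: v / total for k, v in counts.items()}


def corpus_predictors(traces: List[Trace]) -> Tuple[float, float]:
    """(share of sentences needing no W2W step, mean W2W words per sentence)."""
    if not traces:
        return 0.0, 0.0
    no_w2w = sum(1 for t in traces if all(r != "word" for _, r in t))
    w2w_words = sum(len(s.split()) for t in traces for s, r in t if r == "word")
    return no_w2w / len(traces), w2w_words / len(traces)
```

A high first value combined with a low second value would point towards chunk-friendly, homogeneous material.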
Conclusions
- Combining SMT phrases and EBMT chunks in a hybrid "statistical EBMT" or "example-based SMT" system will improve your system output
- Blind adherence to one approach guarantees that your performance is lower than it could otherwise be
- John Hutchins: "EBMT is Hybrid MT"
- Joe Olive: "Need combination of rules and statistics"
Ongoing & Future Work
- Automatic detection of marker words
  - The most common SMT phrases consist mainly of marker words
- Plan to increase levels of hybridity:
  - Code a simple EBMT decoder, factoring in the Marker-Based recombination approach along with probabilities
  - Use exact sentence matching in PBSMT, as in EBMT
  - Integrate generalized templates into the PBSMT system (and reintegrate them into the EBMT system)
  - Integrate marker tag information into SMT language and translation models
- Hybrid EBMT-EBMT system (with CMU)?!
- What's the contribution of EBMT chunks if an SMT system is allowed as much training data as it likes?
- Thank you for your attention.