1
Comparing Example-Based & Statistical Machine
Translation
  • Andy Way
  • Nano Gough, Declan Groves
  • National Centre for Language Technology
  • School of Computing, Dublin City University
  • {away,ngough,dgroves}@computing.dcu.ie
  • To appear in the Journal of Natural Language
    Engineering, June 2005
  • To appear in the Workshop on Building and
    Using Parallel Texts: Data-Driven Machine
    Translation and Beyond, ACL-05, June 2005

2
Plan of the Talk
  • Basic Situation in MT today
  • Statistical MT (SMT)
  • Example-Based MT (EBMT)
  • Differences between phrase-based SMT & EBMT.
  • Our Marker-based EBMT system.
  • Testing EBMT vs. word- & phrase-based SMT.
  • Results & Observations.
  • Concluding Remarks.
  • Future Research Avenues.

3
What is the Situation today in MT?
  • Most MT research undertaken today is corpus-based
    (as opposed to rule-based methods).
  • Two main data-driven approaches:
  • Example-Based MT (EBMT)
  • Statistical MT (SMT)
  • SMT is by far the more dominant paradigm.

4
How does EBMT work?
[Diagram: the input sentence EX is matched (search) against the example base; aligned target fragments (e.g. F2, F4) are recombined into the output FX]
5
A (much simplified) Example
  • Given in corpus:
    John went to school → Jean est allé à l'école.
    The butcher's is next to the baker's → La
    boucherie est à côté de la boulangerie.
  • Isolate useful fragments:
    John went to → Jean est allé à
    the baker's → la boulangerie
  • We can now translate (see the sketch below):
    John went to the baker's → Jean est allé à la
    boulangerie.
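To make the recombination step concrete, here is a minimal sketch, assuming a toy fragment table taken from this slide and a simple greedy longest-match strategy; both are illustrative assumptions, not the authors' implementation.

```python
# Toy EBMT recombination: greedy longest-match over a stored fragment table.
# The fragment table is the one isolated on this slide (illustrative only).
fragments = {
    "john went to": "jean est allé à",
    "the baker's": "la boulangerie",
}

def translate(sentence: str) -> str:
    words = sentence.lower().split()
    output, i = [], 0
    while i < len(words):
        # Try the longest source fragment starting at position i.
        for j in range(len(words), i, -1):
            chunk = " ".join(words[i:j])
            if chunk in fragments:
                output.append(fragments[chunk])
                i = j
                break
        else:
            output.append(words[i])  # no match: pass the word through
            i += 1
    return " ".join(output)

print(translate("John went to the baker's"))
# -> jean est allé à la boulangerie
```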

6
How does SMT work?
  • SMT deduces language and translation models from
    huge quantities of monolingual and bilingual data,
    using a range of theoretical approaches to
    probability distribution and estimation.
  • The translation model establishes the set of target
    language words (and, more recently, phrases) which
    are most likely to be useful in translating the
    source string.
  • It takes into account source and target word (and
    phrase) co-occurrence frequencies, sentence
    lengths and the relative sentence positions of
    source and target words.
  • The language model tries to assemble these words (and
    phrases) in the best order possible.
  • It is trained by determining all bigram and/or trigram
    frequency distributions occurring in the training
    data (a toy illustration follows).
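As a toy illustration of the language-model side, the sketch below trains unsmoothed maximum-likelihood bigram counts on two sentences and scores candidate word orders. Real SMT systems use smoothed models over far more data; this is only a bare-bones illustration.

```python
from collections import Counter
from math import log

# Train a maximum-likelihood bigram language model from a toy corpus.
corpus = [
    "jean est allé à l' école",
    "la boucherie est à côté de la boulangerie",
]

bigrams, unigrams = Counter(), Counter()
for line in corpus:
    tokens = ["<s>"] + line.split() + ["</s>"]
    unigrams.update(tokens[:-1])          # context counts
    bigrams.update(zip(tokens, tokens[1:]))

def score(sentence: str) -> float:
    """Log-probability of a sentence under the (unsmoothed) bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        if bigrams[(prev, cur)] == 0:
            return float("-inf")          # unseen bigram: zero probability
        logp += log(bigrams[(prev, cur)] / unigrams[prev])
    return logp

# The model prefers word orders it has seen in training data.
print(score("jean est allé à l' école"))   # finite log-probability
print(score("école l' à allé est jean"))   # -inf: unseen bigrams
```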

7
The Paradigms are Converging
  • It is harder than it has ever been to describe the
    differences between the two methods.
  • This used to be easy:
  • from the beginning, EBMT has sought to translate
    new texts by means of a range of sub-sentential
    data, both lexical and phrasal, stored in the
    system's memory;
  • until quite recently, SMT models of
    translation were based on the simple IBM word
    alignment models of [Brown et al., 1990].

8
From word- to phrase-based SMT
  • SMT systems now learn phrasal as well as lexical
    alignments, e.g. [Koehn, Och & Marcu, 2003; Och,
    2003].
  • Unsurprisingly, the quality of today's
    phrase-based SMT systems is considerably better
    than that of the earlier word-based models.
  • Despite the fact that EBMT models have been
    modelling lexical and phrasal correspondences for
    20 years, no papers on SMT acknowledge this debt
    to EBMT, nor describe their approach as
    'example-based'.

9
Differences between EBMT and Phrase-Based SMT?
  • EBMT alignments remain available for reuse in the
    system, whereas (similar) SMT alignments
    disappear in the probability models.
  • SMT systems never learn from previously
    encountered data, i.e. when SMT sees a string
    it's seen before, it processes it in the same way
    as unseen data; EBMT will just look up such
    strings in its databases and output the
    translation quite straightforwardly.
  • Depending on the model, EBMT builds in (some)
    syntax at its core; most SMT systems only use
    models of syntax in a post hoc reranking process,
    and even here, [Koehn et al., JHU Workshop 2003]
    demonstrated that bolting on syntax in this
    manner did not help improve translation quality.
  • Given (3), phrase-based SMT systems are likely to
    learn (some) chunks that EBMT systems would
    not.

10
SMT chunks are different from EBMT chunks
  • En: Mary did not slap the green witch →
  • Sp: Maria no dió una bofetada a la bruja verde.
  • (Lit: Mary not gave a slap to the witch green)
  • From this aligned example, an SMT system would
    potentially learn the following phrases (along
    with many others):
  • slap the → dió una bofetada a
  • slap the → dió una bofetada a la
  • the green witch → a la bruja verde
  • NB, SMT essentially learns n-gram sequences,
    rather than phrases per se (a simplified
    extraction sketch follows).
  • [Koehn & Knight, AMTA-04 SMT Tutorial Notes]
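The following simplified sketch shows the standard consistency criterion behind this kind of phrase extraction. The word-alignment links are hand-written guesses for illustration, and a full extractor would also grow pairs over unaligned boundary words, which is what yields the extra variants listed above.

```python
# Simplified phrase-pair extraction from one word-aligned sentence pair
# (in the style of Och/Koehn). Alignment links are illustrative guesses.
en = "mary did not slap the green witch".split()
sp = "maria no dió una bofetada a la bruja verde".split()
links = {(0, 0), (1, 1), (2, 1), (3, 2), (3, 3), (3, 4), (4, 6), (5, 8), (6, 7)}

def phrase_pairs(max_len: int = 4) -> list[tuple[str, str]]:
    pairs = []
    for i in range(len(en)):
        for j in range(i, min(i + max_len, len(en))):
            tgt = [t for s, t in links if i <= s <= j]
            if not tgt:
                continue
            lo, hi = min(tgt), max(tgt)
            # Consistency: no word inside the target span may align
            # to a source word outside [i, j].
            if all(i <= s <= j for s, t in links if lo <= t <= hi):
                pairs.append((" ".join(en[i:j + 1]), " ".join(sp[lo:hi + 1])))
    return pairs

for src, tgt in phrase_pairs():
    print(f"{src}  :  {tgt}")  # includes 'slap the : dió una bofetada a la'
```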

11
Our Marker-Based EBMT System
The Marker Hypothesis states that "all natural
languages have a closed set of specific words or
morphemes which appear in a limited set of
grammatical contexts and which signal that
context." [Green, 1979]
Markers for English (and French): [table not reproduced in this transcript]
12
An Example
  • En: you click apply to view the effect of the
    selection →
  • Fr: vous cliquez sur appliquer pour visualiser
    l'effet de la sélection
  • Source–target aligned sentences are traversed
    word by word and automatically tagged with their
    marker categories (a tagging sketch follows):
  • <PRON>you click apply <PREP>to view <DET>the
    effect <PREP>of <DET>the selection →
  • <PRON>vous cliquez <PREP>sur appliquer <PREP>pour
    visualiser <DET>l'effet <PREP>de <DET>la
    sélection
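A minimal sketch of the tagging step, assuming tiny illustrative marker sets rather than the full closed-class lists the system would use:

```python
# Marker tagging sketch: prefix each closed-class "marker" word with its tag.
# The marker sets below are tiny illustrative samples, not the full lists.
MARKERS = {
    "PRON": {"you", "vous", "he", "il"},
    "DET":  {"the", "la", "le", "l'"},
    "PREP": {"to", "of", "sur", "pour", "de", "à"},
}

def tag(sentence: str) -> str:
    out = []
    for word in sentence.split():
        for cat, words in MARKERS.items():
            if word.lower() in words:
                out.append(f"<{cat}>{word}")
                break
        else:
            out.append(word)  # content (non-marker) word: left untagged
    return " ".join(out)

print(tag("you click apply to view the effect of the selection"))
# -> <PRON>you click apply <PREP>to view <DET>the effect <PREP>of
#    <DET>the selection
```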

13
Deriving Sub-Sentential Source–Target Chunks
  • From these tagged strings, we generate the
    following aligned marker chunks:
  • <PRON> you click apply : vous cliquez sur
    appliquer
  • <PREP> to view : pour visualiser
  • <DET> the effect : l'effet
  • <PREP> of the selection : de la sélection
  • New source and target fragments (not necessarily
    aligned one-to-one!) begin where marker words are
    met and end at the next marker word; cognates,
    mutual information (MI) etc. then yield the
    source–target sub-sentential alignments.
  • One further constraint: each chunk must contain
    at least one non-marker word (cf. the 4th marker
    chunk; see the chunking sketch below).
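A minimal sketch of the chunking step on the source side; the source–target alignment of chunks (via relative position, cognates, MI) is not shown. The merge loop enforces the constraint that every chunk contains at least one non-marker word.

```python
# Marker-based chunking sketch: a new chunk opens at every marker-tagged
# word; a chunk consisting only of marker words is merged into the next
# one, enforcing the "at least one non-marker word" constraint.
def marker_chunks(tagged: str) -> list[str]:
    pieces: list[list[str]] = []
    for token in tagged.split():
        if token.startswith("<") or not pieces:
            pieces.append([token])        # marker word opens a new chunk
        else:
            pieces[-1].append(token)
    merged: list[list[str]] = []
    for piece in pieces:
        if merged and all(t.startswith("<") for t in merged[-1]):
            merged[-1].extend(piece)      # previous chunk had no content word
        else:
            merged.append(piece)
    return [" ".join(p) for p in merged]

src = ("<PRON>you click apply <PREP>to view "
       "<DET>the effect <PREP>of <DET>the selection")
print(marker_chunks(src))
# ['<PRON>you click apply', '<PREP>to view',
#  '<DET>the effect', '<PREP>of <DET>the selection']
```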

14
Deriving Lexical Mappings
  • Where chunks contain just one non-marker word in
    both source and target, we assume they are
    translations.
  • Thus we can extract the following word-level
    translations
  • <PREP> to : pour
  • <LEX> view : visualiser
  • <LEX> effect : effet
  • <PRON> you : vous
  • <DET> the : l'
  • <PREP> of : de
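A sketch of this extraction rule for a single aligned chunk pair. Pairing the marker words positionally, as done here, is a simplifying assumption for illustration.

```python
# Derive word-level entries from an aligned chunk pair when each side
# contains exactly one non-marker word.
def is_marker(tok: str) -> bool:
    return tok.startswith("<")

def word_pairs(src_chunk: str, tgt_chunk: str) -> list[tuple[str, str]]:
    src, tgt = src_chunk.split(), tgt_chunk.split()
    src_words = [t for t in src if not is_marker(t)]
    tgt_words = [t for t in tgt if not is_marker(t)]
    pairs = []
    if len(src_words) == len(tgt_words) == 1:
        pairs.append((src_words[0], tgt_words[0]))   # e.g. view : visualiser
        # Strip "<TAG>" from marker tokens and pair them in order
        # (a simplifying assumption).
        src_m = [t.split(">", 1)[1] for t in src if is_marker(t)]
        tgt_m = [t.split(">", 1)[1] for t in tgt if is_marker(t)]
        pairs.extend(zip(src_m, tgt_m))              # e.g. to : pour
    return pairs

print(word_pairs("<PREP>to view", "<PREP>pour visualiser"))
# [('view', 'visualiser'), ('to', 'pour')]
```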

15
Deriving Generalised Templates
  • In a final pre-processing stage, we produce a set
    of generalised marker templates by replacing
    marker words with their tags (sketch below):
  • <PRON> click apply : <PRON> cliquez sur appliquer
  • <PREP> view : <PREP> visualiser
  • <DET> effect : <DET> effet
  • <PREP> the selection : <PREP> la sélection
  • Any marker tag pair can now be inserted at the
    appropriate tag location.
  • More general examples add flexibility to the
    matching process and improve coverage (and
    quality).
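A sketch of template generalisation and re-instantiation; the regular expression and the filler dictionary are illustrative assumptions.

```python
import re

# Generalise a marker chunk by deleting each marker word and keeping only
# its tag; re-instantiate the template with any word of the same category.
def generalise(chunk: str) -> str:
    return re.sub(r"(<[A-Z]+>)\S+", r"\1", chunk)

def instantiate(template: str, fillers: dict[str, str]) -> str:
    return " ".join(fillers.get(tok, tok) for tok in template.split())

template = generalise("<PRON>you click apply")
print(template)                                    # <PRON> click apply
print(instantiate(template, {"<PRON>": "they"}))   # they click apply
```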

16
Summary of Knowledge Sources
  • the original sententially-aligned source–target
    pairs
  • the marker-aligned chunks
  • the generalised marker chunks
  • the word-level lexicon.
  • New strings are segmented into all possible
    n-grams that might be retrieved from the system's
    memories.
  • Resources are searched in the order given here,
    from maximal context (specific source–target
    sentence pairs) to minimal context (word-for-word
    translation); a back-off sketch follows.
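A minimal sketch of this back-off search order, with stub tables standing in for the four knowledge sources. Template matching, which really requires tag-aware matching rather than an exact string lookup, is simplified away here.

```python
from typing import Optional

# Back-off lookup over the four knowledge sources, most specific first.
# All four tables are illustrative stubs.
sentence_pairs = {
    "you click apply to view the effect of the selection":
        "vous cliquez sur appliquer pour visualiser l'effet de la sélection",
}
marker_chunks = {"to view": "pour visualiser"}
templates = {"<PREP> view": "<PREP> visualiser"}
lexicon = {"to": "pour", "view": "visualiser"}

def lookup(fragment: str) -> Optional[str]:
    for table in (sentence_pairs, marker_chunks, templates, lexicon):
        if fragment in table:
            return table[fragment]
    return None

print(lookup("to view"))  # hits the marker-chunk table: 'pour visualiser'
print(lookup("view"))     # falls through to the word-level lexicon
```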

17
Application Areas for our EBMT System
  • Seeding System Memories with Penn-II Treebank
    phrases and translations [AMTA-02].
  • Controlled Language EBMT [MT Summit-03,
    EAMT-04, MT Journal-05].
  • Integration with web-based MT Systems [CL
    Journal-03].
  • Using the Web for Translation Validation (and
    Correction, if required).
  • Scalable EBMT [TMI-04, NLE Journal-05, ACL-05].
  • Largest English→French EBMT System.
  • Robust, Wide-Coverage, Good Quality.
  • Outperforms good on-line MT Systems.

18
What are we interested in finding out?
  • Whether our marker-based EBMT system could
    outperform (1) word-based and (2) phrase-based
    SMT systems compiled from generally available
    tools
  • Whether such SMT systems outperform our EBMT
    system when given enough training text.
  • Whether seeding SMT (and EBMT) systems with SMT
    and/or EBMT data improves translation quality.
  • NB (astonishingly): there is no previously
    published research comparing EBMT and SMT.

19
What have we done vs. what are we doing?
  • WBSMT vs. EBMT
  • PBSMT seeded with:
  • SMT chunks
  • EBMT chunks
  • Both knowledge sources (Hybrid Example-Based
    SMT).
  • PBSMT vs. EBMT
  • Ongoing work:
  • EBMT seeded with:
  • SMT chunks
  • EBMT chunks
  • Merged knowledge sources (Hybrid Statistical
    EBMT).

20
Word-Based SMT vs. EBMT
  • Marker-Based EBMT system [Gough & Way, TMI-04]
  • To develop language and translation models for
    the WBSMT system, we used:
  • GIZA++ (for word alignment)
  • the CMU-Cambridge statistical toolkit (for
    computing the language and translation models)
  • the ISI ReWrite Decoder (for deriving
    translations)

21
Experiment 1: Set-Up
  • 207K-sentence English–French Sun Translation
    Memory (TM).
  • Randomly extracted a 4K-sentence test set.
  • Split the remaining sentences into three training
    sets of roughly 50K (1.1M words), 100K and 203K
    (4.8M words) sentence pairs, to test the impact
    of training set size.
  • Translation performed at each stage from
    English→French and French→English.
  • Resulting translations evaluated using a range of
    automatic metrics.

22
WBSMT vs. EBMT: English→French
  • All metrics bar one suggest that EBMT
    outperforms WBSMT for English→French.
  • The only exception is for TS1, where WBSMT
    outperforms EBMT in terms of Precision (.674
    compared to .653).

23
WBSMT vs. EBMT: English→French
  • In general, scores improve incrementally as
    training data increases.
  • But apart from SER, the metrics suggest that
    training on just over 100K sentence pairs yields
    better results than training on just over 200K.
  • Why? Maybe due to overfitting or odd data.
  • Surprising: it is generally assumed in Machine
    Learning approaches that increasing the training
    data will improve the quality of the output
    translations (cf. variance analysis: bootstrap
    resampling on the test set [Koehn, EMNLP-04];
    different test sets).
  • Note especially the similarity of the WER scores,
    and the difference in SER values: a much more
    significant improvement for EBMT (20.6%) than for
    WBSMT (0.1%).

24
WBSMT vs. EBMT: French→English
  • All WBSMT scores are higher than for
    English→French.
  • For EBMT, better translations from
    French→English for BLEU, Recall and SER; worse
    for WER (FR→EN .508, EN→FR .448) and Precision
    (FR→EN .678, EN→FR .736).

25
WBSMT vs. EBMT: French→English
  • For TS1, EBMT does not outperform WBSMT from
    French→English on any of the five metrics.
  • For TS2, EBMT beats WBSMT in terms of BLEU,
    Recall and SER (66.5% compared to 81.3% for
    WBSMT), while WBSMT gets better scores for
    Precision and WER (46.2% compared to 55.2%).
  • For TS3, WBSMT again beats EBMT in terms of
    Precision (by 2.5%) and WER (by 4%, both less
    significant differences than for TS1 and TS2),
    but EBMT wins out according to the other three
    metrics, notably by a huge 29.6% for SER.
  • BLEU: WBSMT obtains significantly higher scores
    for French→English compared to English→French: 8%
    higher for TS1, 6% higher for TS2, and 12% higher
    for TS3. Apart from TS1, the EBMT scores for the
    two language directions are much more in line,
    indicating perhaps that EBMT is more consistent
    across the two directions of the same language
    pair.

26
Summary of Results
  • Both EBMT & WBSMT achieve better translation
    quality from French→English than from
    English→French. Of the five automatic evaluation
    metrics for each of the three training sets, in
    9/15 cases WBSMT wins out over our EBMT system
    for French→English.
  • For English→French, in 14/15 cases EBMT beats
    WBSMT.
  • Summing these results together, EBMT outperforms
    WBSMT in 20 tests, while WBSMT does better in 10
    experiments.
  • Assuming all of these tests to be of equal
    importance, EBMT appears to outperform WBSMT by a
    factor of two to one.
  • While the results are a little mixed, it is clear
    that EBMT tends to outperform WBSMT on this
    sublanguage and on these training sets.

27
Experiment 2: Phrase-Based SMT vs. EBMT
  • Same EBMT system as for the WBSMT experiment.
  • To develop language and translation models for
    the SMT system, we used:
  • GIZA++ to extract word alignments
  • Refine these to extract GIZA++ phrase alignments
  • Construct probability tables
  • Pass these to the CMU-SRI statistical toolkit &
    Pharaoh decoder to derive translations.
  • Same translation pairs, training sets & test sets.
  • Resulting translations evaluated using a range of
    automatic metrics.

28
PBSMT vs. EBMT: English→French
  • PBSMT with GIZA++ sub-sentential alignments wins
    out over PBSMT with EBMT data, but cf. the size
    of the data sets:
  • EBMT: 403,317 items
  • PBSMT: 1.73M items
  • PBSMT beats WBSMT, notably for BLEU, but is 5%
    worse for WER. SER still (disappointingly) high.
  • EBMT beats PBSMT, esp. for BLEU, Recall, WER &
    SER.

29
PBSMT vs. EBMT: French→English
  • PBSMT with GIZA++ sub-sentential alignments wins
    out over PBSMT with EBMT data (with the same
    caveat).
  • PBSMT with both knowledge sources is better for
    F→E than for E→F.
  • PBSMT doesn't beat WBSMT (??)
  • EBMT beats PBSMT.

30
Experiment 3a: Seeding Pharaoh with GIZA++ Words
and EBMT Phrases, English→French
  • Hybrid PBSMT system beats baseline PBSMT for
    BLEU, Precision, Recall and SER; slightly worse
    WER.
  • Data size: 430K (cf. PBSMT 1.73M, EBMT 403K).
  • Still worse than the EBMT scores.

31
Experiment 3b: Seeding Pharaoh with GIZA++ Words
and EBMT Phrases, French→English
  • Hybrid PBSMT system beats baseline PBSMT for
    BLEU; slightly worse for Precision, Recall and
    SER; quite a bit worse for WER.
  • Still shy of the results for EBMT.

32
Experiment 4a: Seeding Pharaoh with All Data,
English→French
  • Hybrid system beats the semi-hybrid system on all
    metrics.
  • Loses out to the EBMT system, except for
    Precision.
  • Data set now >2M items.

33
Experiment 4b: Seeding Pharaoh with All Data,
French→English
  • Hybrid system beats the semi-hybrid system on all
    metrics.
  • Hybrid system beats EBMT on BLEU & Precision;
    EBMT ahead for Recall & WER, and still well ahead
    for SER.

34
Summary of Results: WBSMT vs. EBMT
  • None of these are bad systems: for TS3, the worst
    BLEU score is .322 (WBSMT, E→F).
  • WBSMT loses out to EBMT 2:1 (but is better
    overall for F→E).
  • For TS3, the WBSMT BLEU score of .446 and EBMT
    score of .461 are high scores.
  • For the WBSMT vs. EBMT experiments, an odd
    finding: higher scores for the 100K training set;
    to be investigated in future work.

35
Summary of Results: PBSMT vs. EBMT
  • PBSMT scores better than WBSMT, but an odd result
    for F→E (?!).
  • Best PBSMT BLEU scores (with GIZA++ data only):
    .375 (E→F), .420 (F→E).
  • Seeding PBSMT with EBMT data gets good scores for
    BLEU: .364 (E→F), .395 (F→E); note the difference
    in data size (1.73M vs. 403K).
  • PBSMT loses out to EBMT.
  • PBSMT SER still very high (83–87%).

36
Summary of Results: Semi-Hybrid Systems
  • Seeding Pharaoh with SMT words and EBMT phrases
    improves over the baseline GIZA++-seeded system.
  • Data size diminishes considerably (430K vs.
    1.73M).
  • Still a worse result for the semi-hybrid system
    for F→E than for WBSMT (?!).
  • Still worse results than for EBMT.

37
Summary of Results: Fully Hybrid Systems
  • Better results than for the semi-hybrid systems:
    E→F .426 (vs. .396), F→E .489 (vs. .427).
  • Data size increases.
  • For F→E, the hybrid system beats EBMT on BLEU
    (.461) & Precision; EBMT ahead for Recall & WER,
    and still well ahead (by 27%) for SER.

38
Concluding Remarks
  • Despite the convergence between EBMT and SMT,
    there are further gains to be made:
  • Merging GIZA++ and EBMT-induced data leads to an
    improved Hybrid Example-Based SMT system.
  • ⇒ Lesson for the SMT community: don't disregard
    the large body of work on EBMT!
  • We expect in further work that adding SMT
    sub-sentential data to our EBMT system will also
    lead to improvements.
  • ⇒ Lesson for EBMT-ers: SMT data can help you too!

39
Future Work
  • Carry out significance tests on these results.
  • Investigate what's going on in the second (100K)
    training set.
  • Develop the Statistical EBMT system as described.
  • Other issues in hybridity:
  • Use a target LM in EBMT
  • Replace the EBMT recombination process with an
    SMT decoder
  • Try different decoders, LMs and TMs
  • Factor marker tags into the SMT probability
    tables.
  • Experiment with other training data in other
    sublanguage domains, especially those where
    larger corpora are available (e.g. Canadian
    Hansards, European Parliament).
  • Try other language pairs.