Comparing Example-Based and Statistical Machine Translation
  • Andy Way
  • Nano Gough, Declan Groves
  • National Centre for Language Technology
  • School of Computing, Dublin City University
  • away,ngough,
  • To appear in the Journal of Natural Language
    Engineering, June 2005
  • To appear in the Workshop on Building and
    Using Parallel Texts: Data-Driven MT and Beyond,
    ACL-05, June 2005

Plan of the Talk
  • Basic Situation in MT today
  • Statistical MT (SMT)
  • Example-Based MT (EBMT)
  • Differences between Phrase-based SMT and EBMT.
  • Our Marker-based EBMT system.
  • Testing EBMT vs. word- and phrase-based SMT.
  • Results and Observations.
  • Concluding Remarks.
  • Future Research Avenues.

What is the Situation today in MT?
  • Most MT research undertaken today is corpus-based
    (compared with rule-based methods).
  • Two main data-driven approaches:
  • Example-Based MT (EBMT)
  • Statistical MT (SMT)
  • SMT by far the more dominant paradigm.

How does EBMT work?
[Diagram labels: EX (input), FX (output), F2, F4]
A (much simplified) Example
  • Given in corpus:
    John went to school → Jean est allé à l'école.
    The butcher's is next to the baker's → La boucherie
    est à côté de la boulangerie.
  • Isolate useful fragments:
    John went to → Jean est allé à
    the baker's → la boulangerie
  • We can now translate:
    John went to the baker's → Jean est allé à la boulangerie.
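
The fragment reuse in this example can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the fragment table is hand-built from the slide, and `translate` is a hypothetical greedy longest-match routine, not the recombination procedure of any actual EBMT system.

```python
# Hand-built fragment memory taken from the two corpus examples on the slide.
FRAGMENTS = {
    "John went to": "Jean est allé à",
    "the baker's": "la boulangerie",
}

def translate(sentence, fragments=FRAGMENTS):
    """Greedy longest-match recombination of stored fragments (toy sketch)."""
    words = sentence.split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest source fragment starting at position i first.
        for j in range(len(words), i, -1):
            candidate = " ".join(words[i:j])
            if candidate in fragments:
                output.append(fragments[candidate])
                i = j
                break
        else:
            # No stored fragment covers this word: pass it through unchanged.
            output.append(words[i])
            i += 1
    return " ".join(output)

print(translate("John went to the baker's"))  # Jean est allé à la boulangerie
```

Words with no stored fragment are simply copied through, which makes gaps in the memory visible in the output.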

How does SMT work?
  • SMT deduces language translation models from
    huge quantities of monolingual and bilingual data
    using a range of theoretical approaches to
    probability distribution and estimation.
  • Translation model establishes the set of target
    language words (and more recently, phrases) which
    are most likely to be useful in translating the
    source string.
  • The translation model takes into account source and
    target word (and phrase) co-occurrence frequencies,
    sentence lengths and the relative sentence positions
    of source and target words.
  • Language model tries to assemble these words (and
    phrases) in the best order possible.
  • The language model is trained by determining all
    bigram and/or trigram frequency distributions
    occurring in the training data.
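
As a rough illustration of what collecting bigram frequency distributions means, the sketch below counts bigrams over a tiny made-up corpus and turns them into unsmoothed maximum-likelihood probabilities. The function names and the toy corpus are assumptions for illustration. Real language-modelling toolkits add smoothing and back-off, which this sketch omits.

```python
from collections import Counter

def bigram_counts(sentences):
    """Count unigrams and bigrams, padding each sentence with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_prob(w1, w2, unigrams, bigrams):
    """Maximum-likelihood estimate of P(w2 | w1); no smoothing (an assumption)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

# Toy corpus, invented for illustration only.
corpus = ["la boulangerie est à côté", "la boucherie est à côté"]
uni, bi = bigram_counts(corpus)
print(bigram_prob("la", "boulangerie", uni, bi))  # 0.5
print(bigram_prob("est", "à", uni, bi))           # 1.0
```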

The Paradigms are Converging
  • Harder than it has ever been to describe the
    differences between the two methods.
  • This used to be easy:
  • from the beginning, EBMT has sought to translate
    new texts by means of a range of sub-sentential
    data, both lexical and phrasal, stored in the
    system's memory.
  • until quite recently, SMT models of
    translation were based on the simple IBM word
    alignment models of Brown et al. (1990).

From word- to phrase-based SMT
  • SMT systems now learn phrasal as well as lexical
    alignments, e.g. Koehn, Och & Marcu 2003; Och,
  • Unsurprisingly, the quality of today's
    phrase-based SMT systems is considerably better
    than that of the poorer word-based models.
  • Despite the fact that EBMT models have been
    modelling lexical and phrasal correspondences for
    20 years, no papers on SMT acknowledge this debt
    to EBMT, nor describe their approach as

Differences between EBMT and Phrase-Based SMT?
  • (1) EBMT alignments remain available for reuse in the
    system, whereas (similar) SMT alignments
    disappear in the probability models.
  • (2) SMT systems never learn from previously
    encountered data, i.e. when SMT sees a string
    it's seen before, it processes it in the same way
    as unseen data. EBMT will just look up such
    strings in its databases and output the
    translation quite straightforwardly.
  • (3) Depending on the model, EBMT builds in (some)
    syntax at its core. Most SMT systems only use
    models of syntax in a post hoc reranking process,
    and even here, Koehn et al. (JHU Workshop 2003)
    demonstrated that bolting on syntax in this
    manner did not help improve translation quality.
  • (4) Given (3), phrase-based SMT systems are likely to
    learn (some) chunks that EBMT systems would

SMT chunks are different from EBMT chunks
  • En: Mary did not slap the green witch →
  • Sp: Maria no dió una bofetada a la bruja verde.
  • (Lit.: Mary not gave a slap to the witch green)
  • From this aligned example, an SMT system would
    potentially learn the following phrases (along
    with many others):
  • slap the → dió una bofetada a
  • slap the → dió una bofetada a la
  • the green witch → a la bruja verde
  • NB, SMT essentially learns n-gram sequences,
    rather than phrases per se.
  • (Koehn & Knight, AMTA-04 SMT Tutorial)

Our Marker-Based EBMT System
The Marker Hypothesis states that all natural
languages have a closed set of specific words or
morphemes which appear in a limited set of
grammatical contexts and which signal that
context (Green, 1979).
Markers for English (and French)
An Example
  • En: you click apply to view the effect of the
    selection →
  • Fr: vous cliquez sur appliquer pour visualiser
    l'effet de la sélection
  • Source-target aligned sentences are traversed
    word by word and automatically tagged with their
    marker categories:
  • <PRON>you click apply <PREP>to view <DET>the
    effect <PREP>of <DET>the selection →
  • <PRON>vous cliquez <PREP>sur appliquer <PREP>pour
    visualiser <DET>l'effet <PREP>de <DET>la sélection
Deriving Sub-Sentential Source-Target Chunks
  • From these tagged strings, we generate the
    following aligned marker chunks:
  • <PRON> you click apply : vous cliquez sur appliquer
  • <PREP> to view : pour visualiser
  • <DET> the effect : l'effet
  • <PREP> of the selection : de la sélection
  • New source and target (not necessarily
    source-target!) fragments begin where marker
    words are met and end at the next marker word.
    Cognates, MI etc. → source-target sub-sentential
  • One further constraint: each chunk must contain
    at least one non-marker word (cf. 4th marker
    chunk).
Deriving Lexical Mappings
  • Where chunks contain just one non-marker word in
    both source and target, we assume they are
  • Thus we can extract the following word-level
  • <PREP> to : pour
  • <LEX> view : visualiser
  • <LEX> effect : effet
  • <PRON> you : vous
  • <DET> the : l'
  • <PREP> of : de

Deriving Generalised Templates
  • In a final pre-processing stage, we produce a set
    of generalised marker templates by replacing
    marker words with their tags:
  • <PRON> click apply : <PRON> cliquez sur appliquer
  • <PREP> view : <PREP> visualiser
  • <DET> effect : <DET> effet
  • <PREP> the selection : <PREP> la sélection
  • Any marker tag pair can now be inserted at the
    appropriate tag location.
  • More general examples add flexibility to the
    matching process and improve coverage (and
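
The generalisation step can be sketched as replacing the chunk-initial marker word with its tag on both sides of an aligned chunk. The marker table and `generalise` below are assumptions for illustration. The "the effect : l'effet" pair is left out because handling the elided article would need a tokeniser, which this sketch does not include.

```python
MARKERS = {
    "you": "PRON", "vous": "PRON",
    "to": "PREP", "pour": "PREP", "of": "PREP", "de": "PREP",
    "the": "DET", "la": "DET",
}

def generalise(chunk, markers=MARKERS):
    """Replace a chunk-initial marker word with its tag; keep all other words."""
    words = chunk.split()
    tag = markers.get(words[0].lower()) if words else None
    if tag is None:
        return chunk
    return " ".join([f"<{tag}>"] + words[1:])

aligned_chunks = [
    ("you click apply", "vous cliquez sur appliquer"),
    ("to view", "pour visualiser"),
    ("of the selection", "de la sélection"),
]
templates = [(generalise(src), generalise(tgt)) for src, tgt in aligned_chunks]
for src, tgt in templates:
    print(f"{src} : {tgt}")
```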

Summary of Knowledge Sources
  • the original sententially-aligned source-target
    sentence pairs
  • the marker-aligned chunks
  • the generalised marker chunks
  • the word-level lexicon.
  • New strings are segmented into all possible
    n-grams that might be retrieved from the system's
  • Resources searched in the order provided here,
    from maximal (specific source-target
    sentence-pairs) to minimal context (word-for-word
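
The ordered search over the four knowledge sources can be sketched as a simple cascade. Everything below is an assumption for illustration: the resource names follow the slide, but their contents are toy data, `all_ngrams` and `lookup` are hypothetical helpers, and matching against generalised chunks would in practice require tagging the input first, which this sketch skips.

```python
def all_ngrams(sentence):
    """Every contiguous word sequence of the input, ordered by start position."""
    words = sentence.split()
    return [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]

def lookup(segment, resources):
    """Return (resource_name, translation) from the first resource holding the segment."""
    for name, table in resources:
        if segment in table:
            return name, table[segment]
    return None

# Ordered from maximal to minimal context, as on the slide. Toy contents only.
RESOURCES = [
    ("sentence pairs", {
        "you click apply to view the effect of the selection":
            "vous cliquez sur appliquer pour visualiser l'effet de la sélection",
    }),
    ("marker chunks", {"to view": "pour visualiser"}),
    ("generalised chunks", {"<PREP> view": "<PREP> visualiser"}),
    ("lexicon", {"view": "visualiser", "to": "pour"}),
]

for segment in all_ngrams("to view"):
    print(segment, "->", lookup(segment, RESOURCES))
```

Because the tables are tried in order, "to view" is answered from the marker chunks even though both of its words are also in the lexicon.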

Application Areas for our EBMT System
  • Seeding System Memories with Penn-II Treebank
    phrases and translations (AMTA-02).
  • Controlled Language EBMT (MT Summit-03,
    EAMT-04, MT Journal-05).
  • Integration with web-based MT Systems CL
  • Using the Web for Translation Validation (and
    Correction, if required).
  • Scalable EBMT (TMI-04, NLE Journal-05, ACL-05).
  • Largest English-French EBMT System.
  • Robust, Wide-Coverage, Good Quality.
  • Outperforms good on-line MT Systems.

What are we interested in finding out?
  • Whether our marker-based EBMT system could
    outperform (1) word-based and (2) phrase-based
    SMT systems compiled from generally available
  • Whether such SMT systems outperform our EBMT
    system when given enough training text.
  • Whether seeding SMT (and EBMT) systems with SMT
    and/or EBMT data improves translation quality.
  • NB, (astonishingly), no previous published
    research on comparing EBMT and SMT

What have we done vs. what are we doing?
  • WBSMT vs. EBMT
  • PBSMT seeded with:
  • SMT chunks
  • EBMT chunks
  • Both knowledge sources (Hybrid Example-Based SMT)
  • PBSMT vs. EBMT
  • Ongoing work:
  • EBMT seeded with:
  • SMT chunks
  • EBMT chunks
  • Merged knowledge sources (Hybrid Statistical EBMT)

Word-Based SMT vs. EBMT
  • Marker-Based EBMT system (Gough & Way, TMI-04)
  • To develop language and translation models for
    the WBSMT system, we used:
  • Giza++ (for word-alignment)
  • the CMU-Cambridge statistical toolkit (for
    computing the language and translation models)
  • the ISI ReWrite Decoder (for deriving
    translations)

Experiment 1: Set-Up
  • 207K English-French Sun TM.
  • Randomly extracted 4K sentence test set.
  • Split remaining sentences into three training
    sets: roughly 50K (1.1M words), 100K and 203K
    (4.8M words) sentence-pairs, to test impact of
    training set size.
  • Translation performed at each stage from
    English→French and French→English.
  • Resulting translations evaluated using a range of
    automatic metrics.
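
Two of the metrics reported on the following slides, WER and SER, can be sketched as below. The slides do not spell out the exact definitions used, so this assumes the common ones: word error rate as word-level edit distance divided by reference length, and sentence error rate as the share of outputs that do not match their reference exactly. The example sentences are invented.

```python
def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def wer(references, hypotheses):
    """Corpus-level word error rate: total edits / total reference words."""
    edits = sum(edit_distance(r.split(), h.split())
                for r, h in zip(references, hypotheses))
    words = sum(len(r.split()) for r in references)
    return edits / words

def ser(references, hypotheses):
    """Sentence error rate: share of hypotheses that differ from their reference."""
    wrong = sum(r != h for r, h in zip(references, hypotheses))
    return wrong / len(references)

refs = ["Jean est allé à la boulangerie", "vous cliquez sur appliquer"]
hyps = ["Jean est allé à la boucherie", "vous cliquez sur appliquer"]
print(wer(refs, hyps), ser(refs, hyps))  # 0.1 0.5
```

A single wrong word costs little in WER but makes the whole sentence count as an error in SER, which is why the two metrics can diverge sharply.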

WBSMT vs. EBMT: English→French
  • All metrics bar one suggest that EBMT can
    outperform WBSMT from English→French
  • Only exception is for TS1, where WBSMT
    outperforms EBMT in terms of precision (.674
    compared to .653)

WBSMT vs. EBMT: English→French
  • In general, scores incrementally improve as
    training data increases.
  • But apart from SER, metrics suggest that training
    on just over 100K sentence pairs yields better
    results than training on just over 200K.
  • Why? Maybe due to overfitting or odd data
  • Surprising: it is generally assumed that increasing
    training data in Machine Learning approaches will
    improve the quality of the output translations
    (variance analysis: bootstrap-resampling on test
    set (Koehn, EMNLP-04); different test sets).
  • Note especially the similarity of the WER scores,
    and the difference in SER values. Much more
    significant improvement for EBMT (20.6) than for
    WBSMT (0.1).

WBSMT vs. EBMT: French→English
  • All WBSMT scores higher than for English→French
  • For EBMT, better translations from French→English
    for BLEU, Recall and SER; worse for WER (FR-EN
    .508, EN-FR .448) and precision (FR-EN .678,
    EN-FR .736)

WBSMT vs. EBMT: French→English
  • For TS1, EBMT does not outperform WBSMT from
    French→English for any of the five metrics.
  • For TS2, EBMT beats WBSMT in terms of BLEU,
    Recall and SER (66.5 compared to 81.3 for
    WBSMT), while WBSMT gets higher scores for
    Precision and WER (46.2 compared to 55.2).
  • For TS3, WBSMT again beats EBMT in terms of
    Precision (2.5) and WER (4), both less
    significant differences than for TS1 and TS2,
    but EBMT wins out according to the other three
    metrics, notably by a huge 29.6 for SER.
  • BLEU: WBSMT obtains significantly higher scores
    for French→English compared to English→French: 8
    higher for TS1, 6 higher for TS2, and 12 higher
    for TS3. Apart from TS1, the EBMT scores for the
    two different language directions are much more
    in line, indicating perhaps that EBMT may be more
    consistent even for the same language pair in
    different directions.

Summary of Results
  • Both EBMT and WBSMT achieve better translation
    quality from French→English compared to
    English→French.
  • For French→English, of the five automatic evaluation
    metrics for each of the three training sets, in
    9/15 cases WBSMT wins out over our EBMT system.
  • For English→French, in 14/15 cases EBMT beats
    WBSMT.
  • Summing these results together, EBMT outperforms
    WBSMT in 20 tests, while WBSMT does better in 10.
  • Assuming all of these tests to be of equal
    importance, EBMT appears to outperform WBSMT by a
    factor of two to one.
  • While the results are a little mixed, it is clear
    that EBMT tends to outperform WBSMT on this
    sublanguage and on these training sets.
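
The 20-to-10 tally can be checked with a few lines of arithmetic. The sketch assumes that the 9/15 figure refers to French→English (which is what the totals imply) and that no metric ended in a tie.

```python
# Five metrics for each of three training sets, per translation direction.
tests_per_direction = 5 * 3

# Win counts quoted on the slide; the complement goes to the other system
# (assumes no ties).
wbsmt_wins = {"French-English": 9, "English-French": tests_per_direction - 14}
ebmt_wins = {"French-English": tests_per_direction - 9, "English-French": 14}

ebmt_total = sum(ebmt_wins.values())
wbsmt_total = sum(wbsmt_wins.values())
print(ebmt_total, wbsmt_total, ebmt_total / wbsmt_total)  # 20 10 2.0
```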

Experiment 2: Phrase-Based SMT vs. EBMT
  • Same EBMT system as for WBSMT experiment
  • To develop language and translation models for
    the SMT system, we used:
  • Giza++ to extract word-alignments
  • Refine these to extract Giza++ phrase-alignments
  • Construct Probability Tables
  • Pass these to CMU-SRI statistical toolkit and
    Pharaoh Decoder to derive translations.
  • Same Translation Pairs, Training Sets, Test Sets
  • Resulting translations evaluated using a range of
    automatic metrics

PBSMT vs. EBMT: English→French
  • PBSMT with Giza++ sub-sentential alignments wins
    out over PBSMT with EBMT data, but cf. size of
    data sets:
  • EBMT: 403,317
  • PBSMT: 1.73M
  • PBSMT beats WBSMT, notably for BLEU, but 5 worse
    for WER. SER still (disappointingly) high
  • EBMT beats PBSMT, esp. for BLEU, Recall, WER

PBSMT vs. EBMT: French→English
  • PBSMT with Giza++ sub-sentential alignments wins
    out over PBSMT with EBMT data (with same caveat)
  • PBSMT with both knowledge sources better for F→E
    than for E→F
  • PBSMT doesn't beat WBSMT (??)
  • EBMT beats PBSMT

Experiment 3a: Seeding Pharaoh with Giza++ Words
and EBMT Phrases, English→French
  • Hybrid PBSMT system beats baseline PBSMT for
    BLEU, Precision, Recall and SER; slightly worse WER
  • Data size: 430K (cf. PBSMT 1.73M, EBMT 403K)
  • Still worse than EBMT scores

Experiment 3b: Seeding Pharaoh with Giza++ Words
and EBMT Phrases, French→English
  • Hybrid PBSMT system beats baseline PBSMT for
    BLEU; slightly worse for Precision, Recall and SER;
    quite a bit worse for WER
  • Still shy of the results for EBMT.

Experiment 4a: Seeding Pharaoh with All Data,
English→French
  • Hybrid System beats semi-hybrid system on all
    metrics
  • Loses out to EBMT system, except for Precision.
  • Data Set now >2M items.

Experiment 4b: Seeding Pharaoh with All Data,
French→English
  • Hybrid System beats semi-hybrid system on all
    metrics
  • Hybrid System beats EBMT on BLEU and Precision;
    EBMT ahead for Recall and WER; still well ahead for
    SER

Summary of Results: WBSMT vs. EBMT
  • None of these are bad systems: for TS3, worst
    BLEU score is for WBSMT, E→F, .322
  • WBSMT loses out to EBMT 2:1 (but better overall
    for F→E)
  • For TS3, WBSMT BLEU score of .446 and EBMT score
    of .461 are high scores
  • For WBSMT vs. EBMT experiments, odd finding:
    higher scores for 100K training set; investigate
    in future work.

Summary of Results: PBSMT vs. EBMT
  • PBSMT scores better than for WBSMT, but odd
    result for F→E ?!
  • Best PBSMT BLEU scores (with Giza++ data only):
    .375 (E→F), .420 (F→E)
  • Seeding PBSMT with EBMT data gets good scores:
    for BLEU, .364 (E→F), .395 (F→E); note
    differences in data size (1.73M vs. 403K)
  • PBSMT loses out to EBMT
  • PBSMT SER still very high (83-87).

Summary of Results: Semi-Hybrid Systems
  • Seeding Pharaoh with SMT words and EBMT phrases
    improves over baseline Giza++ seeded system
  • Data size diminishes considerably (430K vs.
    1.73M)
  • Still worse result for semi-hybrid system for
    F→E than for WBSMT ?!
  • Still worse results than for EBMT.

Summary of Results: Fully Hybrid Systems
  • Better results than for semi-hybrid systems:
    E→F .426 (.396), F→E .489 (.427)
  • Data size increases
  • For F→E, Hybrid system beats EBMT on BLEU (.461)
    and Precision; EBMT ahead for Recall and WER; still
    well ahead (27) for SER.

Concluding Remarks
  • Despite the convergence between EBMT and SMT,
    further gains to be made
  • Merging Giza++ and EBMT-induced data leads to an
    improved Hybrid Example-Based SMT system
  • → Lesson for SMT community: don't disregard the
    large body of work on EBMT!
  • We expect in further work that adding SMT
    sub-sentential data to our EBMT system will also
    lead to improvements
  • → Lesson for EBMT-ers: SMT data can help you too!

Future Work
  • Carry out significance tests on these results.
  • Investigate what's going on in 2nd 100K training
    set
  • Develop Statistical EBMT System as described
  • Other issues in hybridity
  • Use target LM in EBMT
  • Replace EBMT recombination process with SMT
  • Try different decoders, LMs and TMs
  • Factor in Marker Tags into SMT Probability
  • Experiment with other training data in other
    sublanguage domains, especially those where
    larger corpora are available (e.g. Canadian
    Hansards, European Parliament )
  • Try other language pairs.