1
Re-ranking for large-scale statistical machine translation
  • Kenji Yamada and Ion Muslea
  • Language Weaver
  • {kyamada, imuslea}@languageweaver.com

2
The Task
  • Re-rank the n-best output of a phrase-based statistical machine translation (SMT) system.
  • Possible gain from the re-ranker:
  • 1-best BLEU: 31.6
  • 256-best BLEU: 42.8
  • (see the JHU03 workshop report)
  • BLEU: geometric average of the 1- to 4-gram matches with the reference (sketch below).
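
The BLEU definition above can be made concrete with a minimal sketch: the geometric average of the clipped 1- to 4-gram precisions against a single reference. The brevity penalty and multi-reference handling are omitted, and the function names are illustrative.

```python
from collections import Counter
from math import exp, log

def ngrams(tokens, n):
    """All n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def sentence_bleu(hyp, ref, max_n=4):
    """Geometric average of the clipped 1- to 4-gram precisions of `hyp`
    against one reference (brevity penalty omitted)."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(matched / max(sum(hyp_counts.values()), 1))
    if min(precisions) == 0.0:
        return 0.0
    return exp(sum(log(p) for p in precisions) / max_n)

print(sentence_bleu("the cat is on the mat".split(),
                    "the cat is on the red mat".split()))
```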

3
How phrase-based SMT works
  • Use a sentence-aligned bilingual corpus.
  • Run a word-alignment algorithm (such as the IBM models).
  • Extract translation phrase-pairs (based on word-alignment heuristics).
  • Obtain phrase-level and word-level translation probabilities, and other feature values.
  • Build a log-linear model over the above probabilities, language models, and other feature values (sketch of the scoring below).
  • The model weights are tuned on a dev-corpus.
  • Translate (decode) by beam search in the model.
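
A minimal sketch of the log-linear scoring step described above, assuming the model is a weighted sum of feature values; the feature names and weight values are illustrative, not the actual feature set of the baseline system.

```python
def loglinear_score(features, weights):
    """Log-linear model: score(e, f) = sum_k w_k * h_k(e, f).
    `features` maps feature names to their values for one candidate translation."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Illustrative feature values for one candidate and illustrative tuned weights.
candidate = {
    "phrase_translation_logprob": -4.2,
    "word_translation_logprob": -6.1,
    "language_model_logprob": -12.3,
    "phrase_count": 5,
}
weights = {
    "phrase_translation_logprob": 1.0,
    "word_translation_logprob": 0.5,
    "language_model_logprob": 1.2,
    "phrase_count": -0.1,
}
print(loglinear_score(candidate, weights))  # the decoder searches for the best-scoring translation
```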

4
Junk phrase-pairs
  • Translation phrase-pairs are automatically extracted from the bilingual corpus.
  • For a Chinese-English corpus of 80 million words:
  • 12 million unique phrase-pairs are extracted.
  • They contain a lot of junk.
  • Idea: use each phrase-pair as a feature for the n-best re-ranker (sketch below).
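
A small sketch of this idea, assuming each n-best hypothesis records the IDs of the phrase-pairs the decoder used to build it; every ID becomes a sparse count feature for the re-ranker (feature names and IDs are hypothetical).

```python
from collections import Counter

def phrase_pair_features(phrase_pair_ids):
    """One sparse count feature per phrase-pair ID used in a hypothesis."""
    return Counter("pp_%d" % pid for pid in phrase_pair_ids)

# Hypothetical IDs of the phrase-pairs used to produce one hypothesis.
print(phrase_pair_features([104, 88213, 104, 7]))
# Counter({'pp_104': 2, 'pp_88213': 1, 'pp_7': 1})
```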

5
Re-ranking by Perceptron
  • A fast algorithm for huge amounts of data and large numbers of parameters.
  • Previous work:
  • PRank (Crammer and Singer, 2003)
  • OAP-BPM (Harrington, 2003)
  • Applied to SMT:
  • Shen and Joshi (2005)
  • Liang et al. (2006)

6
Our extension
  • Partial pair-wise comparison:
  • the oracle (the best hypothesis in the n-best) vs. each non-oracle hypothesis.
  • The strength of the weight update is proportional to the BP1 difference between them.
  • Ensemble training:
  • split the training data, train separately, and average the learned weights.

7
Prepare the training set for the re-ranker
  • Generate the n-best translations for each training sentence:
  • 4 million sentences × 200-best = 800 million data points.
  • Calculate the BP1 score for each hypothesis in the n-best:
  • BP1 = 1-floored BLEU score (per sentence).
  • Use the best-BP1 hypothesis as the reference (sketch below).
  • Extract features from each hypothesis:
  • decoder cost
  • phrase-pair IDs
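
A hedged sketch of the scoring step: the slide defines BP1 only as a "1-floored" per-sentence BLEU, so the flooring below (match counts floored at 1, which keeps every precision non-zero) is an assumption. The oracle is then simply the best-BP1 hypothesis in each n-best list.

```python
from collections import Counter
from math import exp, log

def bp1(hyp, ref, max_n=4):
    """Per-sentence BLEU variant with the n-gram match count floored at 1,
    so no precision is ever zero.  This is a guess at the BP1 score named
    on the slide; the exact definition is not given."""
    precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        precisions.append(max(matched, 1) / max(sum(hyp_counts.values()), 1))
    return exp(sum(log(p) for p in precisions) / max_n)

def pick_oracle(nbest, ref):
    """The best-BP1 hypothesis in the n-best list serves as the re-ranker's reference."""
    return max(nbest, key=lambda hyp: bp1(hyp, ref))
```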

8
Algorithm
  • Init w_0 = (1 for decoder cost, 0 for phrase-pairs)
  • For each epoch:
  •   For each sentence:
  •     For each non-oracle hypothesis x_i in the n-best:
  •       if w_t · x_i < w_t · x_oracle then  // mis-classification
  •         w_{t+1} = w_t + (x_i - x_oracle) × a, where a = BP1(x_oracle) - BP1(x_i)
  •       else w_{t+1} = w_t
  •       t = t + 1
  • Output w_t, or the average Σ w_i / t (Python sketch below)
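
A Python sketch of the loop above. The slide's notation lost its operators in extraction, so two readings are assumed here: w · x is treated as a decoder-style cost (lower is better), which makes the "<" test a mis-classification check, and the update is read as w_{t+1} = w_t + a(x_i - x_oracle). The hypothesis records, with sparse 'features' and a per-sentence 'bp1' score, are illustrative data structures.

```python
def dot(w, feats):
    """Sparse dot product w · x."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def perceptron_rerank_train(data, n_epochs=5):
    """Pairwise perceptron as sketched on the slide.  `data` is a list of
    n-best lists; each hypothesis is a dict with sparse 'features'
    (decoder cost plus phrase-pair indicators) and a per-sentence 'bp1'."""
    w = {"decoder_cost": 1.0}          # init: 1 for decoder cost, 0 for phrase-pairs
    w_sum, t = {}, 0                   # running sum of w_t for weight averaging
    for _ in range(n_epochs):
        for nbest in data:
            oracle = max(nbest, key=lambda h: h["bp1"])
            for hyp in nbest:
                if hyp is oracle:
                    continue
                # mis-classification: a non-oracle hypothesis received a lower cost
                if dot(w, hyp["features"]) < dot(w, oracle["features"]):
                    a = oracle["bp1"] - hyp["bp1"]   # update strength: the BP1 gap
                    for f, v in hyp["features"].items():
                        w[f] = w.get(f, 0.0) + a * v
                    for f, v in oracle["features"].items():
                        w[f] = w.get(f, 0.0) - a * v
                for f, v in w.items():               # accumulate for averaging
                    w_sum[f] = w_sum.get(f, 0.0) + v
                t += 1
    # "Output w_t or sum(w_i) / t": return the averaged weights here
    return {f: v / t for f, v in w_sum.items()}
```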

9
Parallel Training
  • Split the training data into X sets.
  • Train a perceptron on each split.
  • Average the learned weight vectors (sketch below).
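
A sketch of the split/train/average scheme; `train_fn` stands for any trainer that returns a feature-to-weight dict, for example the `perceptron_rerank_train` sketch from the algorithm slide.

```python
def parallel_train(data, n_splits, train_fn):
    """Split the training data, train one re-ranker per split (each call could
    run on its own machine), then uniformly average the learned weight vectors."""
    splits = [data[i::n_splits] for i in range(n_splits)]
    weight_vectors = [train_fn(split) for split in splits]
    averaged = {}
    for w in weight_vectors:
        for f, v in w.items():
            averaged[f] = averaged.get(f, 0.0) + v / len(weight_vectors)
    return averaged
```

The "non-uniform weight mixture" mentioned as future work on the conclusion slide would replace the uniform 1/X averaging above with learned mixture weights.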

10
Interleaving Dev-data
  • There is a distribution difference between the Train and the Dev/Test corpora.
  • Mix duplicated copies of the Dev data into the Train data (sketch below).
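
A minimal sketch of the interleaving step; the duplication factor is an assumption, since the slide does not give one.

```python
def interleave_dev(train_sentences, dev_sentences, dup_factor=8):
    """Mix several duplicated copies of the small Dev set into the large Train
    set so the Dev/Test distribution is better represented during training."""
    return train_sentences + dev_sentences * dup_factor
```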

11
Experiment Setup
  • Baseline: phrase-based SMT system
  • Chinese-to-English
  • Train: 80 million words
  • Dev: 993 sentences
  • Test: 919 sentences
  • Test BLEU: 31.19
  • The baseline system uses 12 million phrase-pairs,
  • extracted automatically from the Train corpus.
  • Only 4 million phrase-pairs are used for the re-ranker
  • (a phrase-pair is pruned if it appears >100k times or only once; sketch below).
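
A sketch of the pruning rule as read from this slide: a phrase-pair is dropped if it occurs more than 100k times or only once in the Train corpus, which cuts the 12 million extracted pairs down to the roughly 4 million used by the re-ranker.

```python
def prune_phrase_pairs(pair_counts, min_count=2, max_count=100_000):
    """Keep phrase-pairs whose training-corpus frequency lies in [min_count, max_count]."""
    return {pair for pair, count in pair_counts.items()
            if min_count <= count <= max_count}

# Illustrative counts: only ("zhongguo", "china") survives the pruning.
counts = {("zhongguo", "china"): 5321, ("de", "the"): 250_000, ("junk", "pair"): 1}
print(prune_phrase_pairs(counts))
```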

12
Result (953-split)
  [Plot: BLEU vs. training epochs]
13
Result (no split)
  [Plot: BLEU vs. training epochs]
14
Conclusion and Future Work
  • Large-scale n-best re-ranking for phrase-based SMT.
  • Parallel perceptron training.
  • Interleaving the dev-corpus helps.
  • Future work:
  • other feature types (e.g., n-grams)
  • non-uniform weight mixture

15
Thank You!
16
Result (953-split)
  [Plot: BLEU vs. training epochs]
17
Result (no-split)
  [Plot: BLEU vs. training epochs]
18
Using Dev-data only?
  • How much do Train and Dev each contribute?
  • Baseline: 31.19
  • Train only: 31.31
  • Dev only: 31.41
  • Train + Dev: 31.72