Search Applications: Machine Translation - PowerPoint PPT Presentation

Slides: 34
Provided by: Kathleen268



Search Applications: Machine Translation
  • Next time: Constraint Satisfaction
  • Reading for today: see the Machine Translation
    paper under links
  • Reading for next time: Chapter 5

Homework Questions?

  • Introduction to machine translation
  • Statistical approaches
  • Use of parallel data
  • Alignment
  • What functions must be optimized?
  • Comparison of A* and greedy local search (hill
    climbing) algorithms for translation
  • How they work
  • Their performance

Approach to Statistical MT
  • Translate from past experience
  • Observe how words, phrases, and sentences are
    translated
  • Given new sentences in the source language,
    choose the most probable translation in the
    target language
  • Data: a large corpus of parallel text
  • E.g., Canadian parliamentary proceedings

  • Example
  • Ce n'est pas clair.
  • It is not clear.
  • Quantity
  • 200 billion words (2004 MT evaluation)
  • Sources
  • Hansards: Canadian parliamentary proceedings
  • Hong Kong official documents published in
    multiple languages
  • Newspapers published in multiple languages
  • Religious and literary works

Alignment: the first step
  • Which sentences or paragraphs in one language
    correspond to which paragraphs or sentences in
    another language? (Or what words?)
  • Problems
  • Translators don't use word-for-word translations
  • Crossing alignments
  • Types of alignment
  • 1:1 (90% of the cases)
  • 1:2, 2:1
  • 3:1, 1:3

French | Word-for-word English gloss
Quant aux (les) eaux minérales et aux limonades, | With regard to the mineral waters and the lemonades,
elles rencontrent toujours plus d'adeptes. | they encounter still more users.
En effet, notre sondage fait ressortir | Indeed, our survey makes stand out
des ventes nettement supérieures | the sales clearly superior
à celles de 1987 | to those in 1987
pour les boissons à base de cola | for cola-based drinks
notamment. | especially.

English translation: According to our survey, 1988 sales of mineral water and soft drinks were much higher than in 1987, reflecting the growing popularity of these products. Cola drink manufacturers in particular achieved above average growth rates.

An example of 2:2 alignment
  • Fertility: a word may be translated by more than
    one word
  • notamment → in particular (fertility 2)
  • limonades → soft drinks
  • Fertility 0: a word translated by zero words
  • des ventes → sales
  • les boissons à base de cola → cola drinks
  • Many to many
  • elles rencontrent toujours plus d'adeptes → the
    growing popularity
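Fertility can be read directly off a set of word-alignment links by counting how many target words each source word is linked to. A minimal sketch, assuming links are given as hypothetical (source_index, target_index) pairs; the three-word "sentence" below is a toy stitched together from the examples above, not from a real corpus.

```python
from collections import Counter

def fertilities(source_words, links):
    """Return (word, fertility) for each source word.

    links: list of (source_index, target_index) alignment pairs.
    A source word with no links has fertility 0.
    """
    counts = Counter(i for i, _ in links)
    return [(word, counts[i]) for i, word in enumerate(source_words)]

# Toy example: "des" is dropped, "notamment" -> "in particular".
french = ["des", "ventes", "notamment"]
english = ["sales", "in", "particular"]
links = [(1, 0), (2, 1), (2, 2)]
print(fertilities(french, links))
# [('des', 0), ('ventes', 1), ('notamment', 2)]
```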

Bead for sentence alignment
  • A group of sentences in one language that
    corresponds in content to some group of sentences
    in the other language
  • Either group can be empty
  • How much content has to overlap between sentences
    to count it as alignment?
  • An overlapping clause can be sufficient

Methods for alignment
  • Length based
  • Offset alignment
  • Word based
  • Anchors (e.g., cognates)
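Length-based methods align sentences by dynamic programming over beads, preferring beads whose two sides have similar lengths. The sketch below is a simplified illustration in that spirit, not Gale and Church's probabilistic cost: the bead penalties and the normalized length-difference cost are made-up assumptions chosen only to show the recurrence.

```python
# Hypothetical penalties per bead type (source sentences, target sentences).
BEAD_PENALTY = {(1, 1): 0.0, (1, 0): 5.0, (0, 1): 5.0, (2, 1): 2.0, (1, 2): 2.0}

def align_by_length(src_lens, tgt_lens):
    """Return a list of beads ([source indices], [target indices]) of minimum cost."""
    n, m = len(src_lens), len(tgt_lens)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    # Row-major order works because every bead moves forward in i, j, or both.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == INF:
                continue
            for (di, dj), penalty in BEAD_PENALTY.items():
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                s, t = sum(src_lens[i:ni]), sum(tgt_lens[j:nj])
                c = cost[i][j] + penalty + abs(s - t) / max(s + t, 1)
                if c < cost[ni][nj]:
                    cost[ni][nj], back[ni][nj] = c, (i, j)
    # Trace back from the end to recover the bead sequence.
    beads, i, j = [], n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        beads.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    return beads[::-1]

# Sentence lengths in characters: the last two source sentences form a 2:1 bead.
print(align_by_length([20, 30, 10], [22, 41]))
# [([0], [0]), ([1, 2], [1])]
```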

Word Based Alignment
  • Assume the first and last sentences of the texts
    align (anchors).
  • Then, until most sentences are aligned:
  • Form an envelope of alignments from the Cartesian
    product of the list of sentences
  • Exclude alignments if they cross anchors or are too
    far from an anchor
  • Choose pairs of words that tend to occur in
    these candidate alignments
  • Find pairs of source and target sentences which
    contain many possible lexical correspondences.
  • The most reliable of these augment the set of anchors
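To decide which word pairs "tend to occur" together, one needs a co-occurrence score over the candidate sentence pairs. The slides do not name the measure; this sketch assumes the Dice coefficient, 2·c(s,t) / (c(s) + c(t)), where each count is the number of candidate sentence pairs containing the word(s). The toy sentence pairs are invented for illustration.

```python
from collections import Counter
from itertools import product

def dice_scores(sentence_pairs):
    """Dice coefficient for every (source word, target word) that co-occurs."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in sentence_pairs:
        s_set, t_set = set(src), set(tgt)   # count each word once per sentence pair
        src_count.update(s_set)
        tgt_count.update(t_set)
        joint.update(product(s_set, t_set))
    return {(s, t): 2 * c / (src_count[s] + tgt_count[t])
            for (s, t), c in joint.items()}

pairs = [
    ("le chat dort".split(), "the cat sleeps".split()),
    ("le chien dort".split(), "the dog sleeps".split()),
    ("un chat mange".split(), "a cat eats".split()),
]
scores = dice_scores(pairs)
print(scores[("chat", "cat")])     # 1.0
print(scores[("chat", "sleeps")])  # 0.5
```

High-scoring pairs such as (chat, cat) are the "possible lexical correspondences" used to pick reliable sentence pairs.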

The Noisy Channel Model for MT
Language model: $P(e)$
Translation model: $P(f \mid e)$
Decoder: $\hat{e} = \arg\max_e P(e \mid f) = \arg\max_e P(e)\,P(f \mid e)$
(Figure: the noisy channel)
The problem
  • Language model: constructed from a large corpus of
    English text
  • Bigram model: probability of word pairs
  • Trigram model: probability of three words in a row
  • From these, compute the sentence probability
  • Translation model: can be derived from the alignments
  • For any pair of English/French words, what is the
    probability that the pair is a translation?
  • Decoding is the problem: given an unseen French
    sentence, how do we determine the translation?
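Decoding under the noisy channel picks the English candidate that maximizes $P(e)\,P(f \mid e)$. A minimal sketch over a fixed candidate list, working in log space to avoid underflow; the candidate list and both probability tables are invented toy values, not estimates from any corpus.

```python
import math

def noisy_channel_decode(f, candidates, lm, tm):
    """Pick e maximizing log P(e) + log P(f | e); lm and tm are toy lookup tables."""
    return max(candidates, key=lambda e: math.log(lm[e]) + math.log(tm[(f, e)]))

f = "ce n'est pas clair"
candidates = ["it is not clear", "clear it is not", "it not is clear"]
lm = {"it is not clear": 1e-6, "clear it is not": 1e-9, "it not is clear": 1e-10}  # P(e)
tm = {(f, "it is not clear"): 1e-3,                                              # P(f|e)
      (f, "clear it is not"): 2e-3,
      (f, "it not is clear"): 3e-3}
print(noisy_channel_decode(f, candidates, lm, tm))  # it is not clear
```

Here the translation model alone would slightly prefer a scrambled order; the language model's preference for fluent English decides the result. Real decoders cannot enumerate candidates like this, which is why decoding is a search problem.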

Language Model
  • Predict the next word given the previous words
  • $P(w_n \mid w_1 \ldots w_{n-1})$
  • Markov assumption
  • Only the last few words affect the next word
  • Usual cases: bigram, trigram, 4-gram
  • Sue swallowed the large green ___.
  • Parameter estimation (vocabulary of 20,000 words)
  • Bigram: $20{,}000 \times 19{,}999 \approx 400$ million
  • Trigram: $20{,}000^2 \times 19{,}999 \approx 8$ trillion
  • 4-gram: $20{,}000^3 \times 19{,}999 \approx 1.6 \times 10^{17}$
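A bigram model estimated by maximum likelihood, plus the parameter counts above: each of the $V^{n-1}$ histories has $V - 1$ free probabilities, since its distribution must sum to 1. This is a minimal sketch without smoothing; the two-sentence corpus and the `<s>`/`</s>` boundary markers are assumptions for illustration.

```python
from collections import Counter

def train_bigram(sentences):
    """Maximum-likelihood bigram counts with <s> and </s> sentence markers."""
    history, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        history.update(words[:-1])              # every word that starts a bigram
        bigrams.update(zip(words, words[1:]))
    return history, bigrams

def sentence_prob(sentence, history, bigrams):
    """P(sentence) as a product of P(w_n | w_{n-1}) (Markov assumption)."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    p = 1.0
    for prev, word in zip(words, words[1:]):
        if history[prev] == 0:
            return 0.0                          # unseen history; no smoothing here
        p *= bigrams[(prev, word)] / history[prev]
    return p

def free_parameters(vocab_size, n):
    """V^(n-1) histories, each with V - 1 free probabilities."""
    return vocab_size ** (n - 1) * (vocab_size - 1)

history, bigrams = train_bigram(["it is not clear", "it is clear"])
print(sentence_prob("it is clear", history, bigrams))  # 0.5
print(free_parameters(20_000, 2))                      # 399980000
```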

Translation Model
  • For a particular word alignment, multiply the m
    translation probabilities
  • $P(\text{Jean aime Marie} \mid \text{John loves Mary})$
  • $P(\text{Jean} \mid \text{John}) \times P(\text{aime} \mid \text{loves}) \times P(\text{Marie} \mid \text{Mary})$
  • Then sum the probabilities over all alignments
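The "multiply, then sum over alignments" recipe can be written out directly. This sketch follows IBM Model 1 (named here as a stand-in; the slide does not say which model), where each French word aligns to one English word or NULL. Because of that independence, the sum over all alignments factors into a product of sums, and the code checks that both computations agree. Model 1's constant factor $\epsilon/(l+1)^m$ is omitted, and the translation table `t` holds invented values.

```python
from itertools import product
import math

def sum_over_alignments(f_words, e_words, t):
    """Brute force: each f_j picks one e_i (or NULL); sum the products."""
    e_all = ["NULL"] + e_words
    total = 0.0
    for alignment in product(range(len(e_all)), repeat=len(f_words)):
        p = 1.0
        for j, i in enumerate(alignment):
            p *= t.get((f_words[j], e_all[i]), 0.0)   # multiply the m probabilities
        total += p                                     # then sum over alignments
    return total

def sum_factorized(f_words, e_words, t):
    """Same quantity as a product of sums: prod_j sum_i t(f_j | e_i)."""
    e_all = ["NULL"] + e_words
    return math.prod(sum(t.get((f, e), 0.0) for e in e_all) for f in f_words)

t = {("Jean", "John"): 0.9, ("aime", "loves"): 0.8, ("Marie", "Mary"): 0.9,
     ("aime", "NULL"): 0.05, ("Jean", "Mary"): 0.01, ("Marie", "John"): 0.01}
f, e = "Jean aime Marie".split(), "John loves Mary".split()
print(sum_over_alignments(f, e, t), sum_factorized(f, e, t))  # both about 0.703885
```

The brute-force loop visits $(l+1)^m$ alignments (64 here); the factorized form needs only $m(l+1)$ lookups.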

Decoding is NP-complete
  • When any word reordering is considered
  • Swapped words
  • Words with fertility > n (insertions)
  • Words with fertility 0 (deletions)
  • Usual strategy: examine a subset of likely
    possibilities and choose from that
  • Search error: the decoder returns $e'$, but there exists
    some $e$ such that $P(e \mid f) > P(e' \mid f)$

Example Decoding Errors
  • Search error
    Permettez que je donne un exemple à la chambre.
    Let me give the House one example.
    Let me give an example in the House.
  • Model error
    Vous avez besoin de toute l'aide disponible.
    You need all the help you can get.
    You need of the whole benefits available.

  • Traditional decoding method: stack decoder
  • A* algorithm
  • Deeply explores each hypothesis
  • Fast greedy algorithm
  • Much faster than A*
  • How often does it fail?
  • Integer programming method
  • Transform into a Traveling Salesman Problem (see paper)
  • Very slow
  • Guaranteed to find the best choice

Large branching factors
  • Machine Translation
  • Input: a sequence of n words, each with up to 200
    possible target-word translations.
  • Output: a sequence of m words in the target
    language that has a high score under some goodness measure
  • Search space
  • A 6-word French sentence has $10^{300}$ distinct
    translation scores under the IBM Model 4 translation
    model [Soricut, Knight, Marcu, AMTA 2002]
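Even a far cruder count than the IBM Model 4 figure is already enormous. Assuming (our simplification) exactly one output word per input word, so m = n, and using the slide's 200 translations per word, the number of outputs is the word choices times the orderings, $200^n \cdot n!$. This ignores fertility, insertions, and alignments, which the Model 4 figure accounts for.

```python
import math

def crude_search_space(n_words, translations_per_word=200):
    """Word-choice combinations times orderings, one output word per input word."""
    return translations_per_word ** n_words * math.factorial(n_words)

print(crude_search_space(6))  # 46080000000000000, i.e. about 4.6 x 10^16
```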

Stack decoder (A*)
  • Initialize the stack with an empty hypothesis
  • Loop
  • Pop h, the best hypothesis, off the stack
  • If h is a complete sentence, output h and stop
  • For each possible next word w, extend h by adding
    w and push the resulting hypothesis onto the stack

  • It's not a simple left-to-right translation
  • Because we multiply probabilities as we add
    words, shorter hypotheses will always win
  • Use multiple stacks, one for each length
  • Given fertility possibilities, when we add a new
    target word for an input source word, how many do
    we add?
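A sketch of the "one stack per length" idea: `stacks[k]` holds hypotheses that have translated k source words, so hypotheses only compete with others of the same length. Assumptions, all ours: fertility is fixed at 1 (so k source words give k output words), any uncovered source word may be translated next, hypotheses with the same coverage and last word are recombined, and each stack is beam-pruned. With no future-cost heuristic this is a beam search rather than an admissible A*. The translation table `tt` and bigram table `lm` are invented toy values.

```python
import heapq
import math

def stack_decode(f_words, tt, lm, beam=5, floor=1e-6):
    """Multi-stack beam decoder; stacks[k] covers k source words."""
    n = len(f_words)
    stacks = [{} for _ in range(n + 1)]
    # key = (covered source positions, last English word); value = (log score, output)
    stacks[0][(frozenset(), "<s>")] = (0.0, ())
    for k in range(n):
        survivors = heapq.nlargest(beam, stacks[k].items(), key=lambda item: item[1][0])
        for (covered, last), (score, output) in survivors:
            for j in range(n):                   # any uncovered word may come next
                if j in covered:
                    continue
                for e, p_t in tt[f_words[j]]:
                    s = score + math.log(p_t) + math.log(lm.get((last, e), floor))
                    key = (covered | {j}, e)
                    if key not in stacks[k + 1] or s > stacks[k + 1][key][0]:
                        stacks[k + 1][key] = (s, output + (e,))
    finals = [(score + math.log(lm.get((last, "</s>"), floor)), output)
              for (_, last), (score, output) in stacks[n].items()]
    return " ".join(max(finals)[1])

tt = {"Jean": [("John", 0.9)], "aime": [("loves", 0.7), ("likes", 0.3)],
      "Marie": [("Mary", 0.9)]}
lm = {("<s>", "John"): 0.5, ("John", "loves"): 0.4, ("John", "likes"): 0.2,
      ("loves", "Mary"): 0.5, ("likes", "Mary"): 0.5, ("Mary", "</s>"): 0.6,
      ("<s>", "Mary"): 0.3, ("Mary", "loves"): 0.3, ("loves", "John"): 0.3,
      ("John", "</s>"): 0.5}
print(stack_decode(["Jean", "aime", "Marie"], tt, lm))  # John loves Mary
```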

Hill climbing
  • function HillClimbing(problem, initial-state, cost-fn)
  • node ← MakeNode(initial-state(problem))
  • while T do
  • next ← Best(SearchOperator-fn(node, cost-fn))
  • if IsBetter-fn(next, node) then node ← next; continue
  • else if GoalTest(node) then return node
  • else exit
  • end while
  • return Failure
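The hill-climbing pseudocode above, as a runnable sketch. The parameter names `neighbors`, `score`, and `is_goal` are our own stand-ins for SearchOperator-fn, cost-fn, and GoalTest; returning `None` plays the role of Failure.

```python
def hill_climbing(initial, neighbors, score, is_goal=lambda state: True):
    """Move to the best neighbor while it improves; stop at a local maximum."""
    node = initial
    while True:
        candidates = list(neighbors(node))
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) > score(node):
            node = best        # IsBetter: climb and continue
            continue
        break                  # no neighbor is better: local maximum
    return node if is_goal(node) else None

# Single peak: climbing from 0 reaches the maximum at 3.
print(hill_climbing(0, lambda x: [x - 1, x + 1], lambda x: -(x - 3) ** 2))  # 3

# Two peaks: starting at index 0 stops on the lower peak (index 1).
heights = [0, 2, 1, 5]
nbrs = lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(heights)]
print(hill_climbing(0, nbrs, lambda i: heights[i]))  # 1
```

The second example shows why greedy search can end on a local rather than global maximum.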

MT (Germann et al., ACL-2001)
  • node ← targetGloss(sourceSentence)
  • while T do
  • next ← Best(LocallyModifiedTranslationOf(node))
  • if IsBetter(next, node) then node ← next; continue
  • else print node; exit
  • end while
Types of changes
  • Translate one or two words (j1 e1 j2 e2)
  • Translate and insert (j e1 e2)
  • Remove a word of fertility 0 (i)
  • Swap segments (i1 i2 j1 j2)
  • Join words (i1 i2)
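A toy greedy decoder in the style of the MT pseudocode: start from the word-for-word gloss and repeatedly take the best local modification. Only two simplified operators are sketched: retranslating one word, and swapping two adjacent words (a special case of "swap segments"). The scoring function (log translation-table score plus log bigram LM), the helper names, and all table values are hypothetical.

```python
import math

def score(hyp, f_words, tt, lm, floor=1e-4):
    """Log translation-table score plus log bigram LM score of the English string."""
    tm = sum(math.log(dict(tt[f_words[i]])[e]) for i, e in hyp)
    words = ["<s>"] + [e for _, e in hyp] + ["</s>"]
    return tm + sum(math.log(lm.get(pair, floor)) for pair in zip(words, words[1:]))

def neighbors(hyp, f_words, tt):
    """hyp is [(source_index, english_word), ...] in output order."""
    for k, (i, e) in enumerate(hyp):            # retranslate one word
        for alt, _ in tt[f_words[i]]:
            if alt != e:
                yield hyp[:k] + [(i, alt)] + hyp[k + 1:]
    for k in range(len(hyp) - 1):               # swap two adjacent words
        yield hyp[:k] + [hyp[k + 1], hyp[k]] + hyp[k + 2:]

def greedy_decode(f_words, tt, lm):
    # targetGloss: most probable translation of each word, in source order
    node = [(i, max(tt[f], key=lambda pair: pair[1])[0]) for i, f in enumerate(f_words)]
    while True:
        best = max(neighbors(node, f_words, tt), default=None,
                   key=lambda h: score(h, f_words, tt, lm))
        if best is not None and score(best, f_words, tt, lm) > score(node, f_words, tt, lm):
            node = best                              # IsBetter: keep climbing
        else:
            return " ".join(e for _, e in node)      # local maximum: print node, exit

tt = {"le": [("the", 0.9)], "chat": [("cat", 0.9)],
      "noir": [("black", 0.8), ("dark", 0.2)]}
lm = {("<s>", "the"): 0.5, ("the", "cat"): 0.2, ("the", "black"): 0.1,
      ("black", "cat"): 0.3, ("cat", "black"): 0.01, ("cat", "</s>"): 0.3,
      ("black", "</s>"): 0.05}
print(greedy_decode(["le", "chat", "noir"], tt, lm))  # the black cat
```

The gloss "the cat black" is fixed by a single swap, after which no neighbor scores higher.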

  • Total of 77,421 possible translations attempted

How to search better?
  • MakeNode(initial-state(problem))
  • RemoveFront(Q)
  • SearchOperator-fn(node, cost-fn)
  • queuing-fn(problem, Q, (Next,Cost))
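The components listed above make up a generic search loop in which only the queuing function changes from one strategy to another. A sketch, assuming a priority queue where a hypothetical `priority_fn(cost, order)` stands in for queuing-fn: ordering by insertion gives breadth-first search, ordering by accumulated cost gives uniform-cost search. A* would additionally need a heuristic of the node itself. There is no repeated-state check, so this version suits only acyclic toy graphs like the one below.

```python
import heapq
import itertools

def general_search(initial, successors, is_goal, priority_fn):
    """Generic search loop; priority_fn(cost, order) alone decides the strategy."""
    order = itertools.count()
    frontier = []

    def enqueue(cost, node):                        # role of queuing-fn
        n = next(order)
        heapq.heappush(frontier, (priority_fn(cost, n), n, cost, node))

    enqueue(0.0, initial)                           # MakeNode(initial-state(problem))
    while frontier:
        _, _, cost, node = heapq.heappop(frontier)  # RemoveFront(Q)
        if is_goal(node):
            return node, cost
        for nxt, step in successors(node):          # SearchOperator-fn(node, cost-fn)
            enqueue(cost + step, nxt)
    return None

def breadth_first(cost, order):
    return order   # FIFO: oldest node first

def uniform_cost(cost, order):
    return cost    # cheapest accumulated cost first

graph = {"S": [("A", 1), ("B", 4)], "A": [("G", 10)], "B": [("G", 1)], "G": []}

def succ(node):
    return graph[node]

print(general_search("S", succ, lambda n: n == "G", breadth_first))  # ('G', 11.0)
print(general_search("S", succ, lambda n: n == "G", uniform_cost))   # ('G', 5.0)
```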

Example 1: Greedy search, MakeNode(initial-state(problem))
Machine Translation (Marcu and Wong, EMNLP-2002)
  • node ← targetGloss(sourceSentence)
  • while T do
  • next ← Best(LocallyModifiedTranslationOf(node))
  • if IsBetter(next, node) then node ← next; continue
  • else print node; exit
  • end while
Climbing the wrong peak
What sentence is more grammatical?
  1. better bart than madonna , i say
  2. i say better than bart madonna ,
Can you make a sentence with these words?
  a and apparently as be could dissimilar firing
  identical neural really so things thought
Language-model stress-testing
  • Input: a bag of words
  • Output: the best sequence according to a linear
    combination of
  • an n-gram LM
  • a syntax-based LM (Collins, 1997)
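For bags this small, the best ordering under a language model can be found exhaustively. Exact search rules out search error, so any remaining bad output is model error. A minimal sketch using only a toy bigram LM (not the slide's combination of n-gram and syntax-based models); the `lm` table and the `floor` probability for unseen bigrams are invented.

```python
from itertools import permutations
import math

def best_order(bag, lm, floor=1e-4):
    """Exhaustively score every ordering of the bag under a bigram LM."""
    def logprob(seq):
        words = ["<s>"] + list(seq) + ["</s>"]
        return sum(math.log(lm.get(pair, floor)) for pair in zip(words, words[1:]))
    return max(set(permutations(bag)), key=logprob)   # n! orderings: small bags only

lm = {("<s>", "better"): 0.3, ("better", "bart"): 0.2, ("bart", "than"): 0.4,
      ("than", "madonna"): 0.3, ("madonna", "</s>"): 0.5}
print(best_order(["madonna", "than", "bart", "better"], lm))
# ('better', 'bart', 'than', 'madonna')
```

At 7 words there are only 5,040 orderings, so exhaustive scoring is cheap; it grows factorially beyond that.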

Size: 3-7 words long
  • Best searched
  • 32.3: i say better than bart madonna ,
  • Original word order
  • 41.6: better bart than madonna , i say

SBLM trained on an additional 160k WSJ
End of Class Questions