1
Decoding and Reordering
  • Jiang Wenbin
  • 2007-10-30

2
Outline
  • A Probabilistic Approach to Syntax-based
    Reordering for Statistical Machine Translation
  • Binarizing Syntax Trees to Improve Syntax-Based
    Machine Translation Accuracy
  • Forest Rescoring: Faster Decoding with
    Integrated Language Models
  • An Efficient Two-Pass Approach to Synchronous-CFG
    Driven Statistical MT

3
Syntax-based Reordering for Phrase-based Decoding
  • Phrase-based decoding distinguishes global
    reordering from local reordering according to
    the distortion limit (see the distance-based
    reordering model of Koehn et al., 2003 for
    details)

4
Syntax-based Reordering for Phrase-based Decoding
  • Syntax, a potential solution to global
    reordering, gives the decoder a reordered input
  • If it works, how about an n-best list of
    reordered inputs?

5-8
Translation Models
  • one reordered input
  • n-best reordered inputs
[Figure series: the source sentence S is reordered into n-best inputs S1…Sn; each Si is translated by the phrase-based decoder into Ti; the best T is then selected among T1…Tn.]
9
Select the best translation
  • Definition of the best T
[Formula: the score combines P(C|E), P(E|C), and P(LM), just as in phrase-based SMT, together with P(S′|S), the probability of reordering S as S′.]
10
Select the best translation
  • Transformation

11
Acquisition of Reordering Knowledge
  • Given a node N on the parse tree of a
    source-language sentence, reordering knowledge
    can be extracted from the relative order of its
    child phrases pi and the corresponding
    target-language phrases T(pi)
  • Only binary nodes are considered, for simplicity

12
Acquisition of Reordering Knowledge
13
Two kinds of representations
  • Reordering rules
  • Z is the phrase label of a binary node
  • X and Y are the phrase labels of Zs children
  • Pr(IN-ORDER) and Pr(INVERTED) are the
    probabilities that X and Y are inverted or not in
    the target language.
  • Estimate the probability by Maximum Likelihood
    Estimation
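Below is a minimal Python sketch of this Maximum Likelihood estimate; the instance format (Z, X, Y, inverted) and the function name are assumptions for illustration, not the paper's interface.

from collections import defaultdict

def estimate_reordering_rules(instances):
    """MLE of Pr(IN-ORDER) and Pr(INVERTED) per rule Z -> X Y.
    `instances` is an assumed iterable of (Z, X, Y, inverted) tuples,
    one per binary node observed in the aligned training trees."""
    counts = defaultdict(lambda: [0, 0])   # (Z, X, Y) -> [#in-order, #inverted]
    for z, x, y, inverted in instances:
        counts[(z, x, y)][1 if inverted else 0] += 1
    rules = {}
    for key, (n_in, n_inv) in counts.items():
        total = n_in + n_inv
        rules[key] = {"IN-ORDER": n_in / total, "INVERTED": n_inv / total}
    return rules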

14
Two kinds of representations
  • Maximum Entropy Model
  • (binary classification problem)
  • Features that may be used:
  • Leftmost word
  • Rightmost word
  • Head word
  • Context word
  • POS tags
  • All features above can be extracted from source
    phrases as well as target phrases

15
The Application of Reordering Knowledge
  • Let Pr(p→p′) denote the probability of
    reordering a phrase p into p′; it is computed
    recursively over the parse tree:
  • unary node:  Pr(p→p′) = Pr(p1→p1′)
  • binary node: Pr(p→p′) = Pr(order) · Pr(p1→p1′) · Pr(p2→p2′),
    where order ∈ {IN-ORDER, INVERTED}
16
The Application of Reordering Knowledge
  • The number of reorderings S′ grows
    exponentially. Let R(N) be the number of
    reorderings of the phrase yielded by N
  • Traversing the source-language tree bottom-up,
    at each node we keep only the n reordered
    phrases p′ with the highest reordering
    probability (a sketch follows)
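A minimal sketch of this bottom-up beam, assuming trees are nested (label, children-or-word) tuples and pr_invert is a caller-supplied callback returning Pr(INVERTED) for a binary node (both assumptions, not the paper's interface):

import heapq

def nbest_reorderings(node, n, pr_invert):
    """Return up to n (phrase, probability) reorderings of the phrase
    yielded by `node`, keeping only the n best at every node."""
    label, payload = node
    if isinstance(payload, str):                        # leaf: a single word
        return [(payload, 1.0)]
    if len(payload) == 1:                               # unary: Pr(p->p') = Pr(p1->p1')
        return nbest_reorderings(payload[0], n, pr_invert)
    left = nbest_reorderings(payload[0], n, pr_invert)  # binary node
    right = nbest_reorderings(payload[1], n, pr_invert)
    p_inv = pr_invert(node)
    cands = []
    for l, pl in left:
        for r, pr in right:
            cands.append((l + " " + r, (1.0 - p_inv) * pl * pr))  # IN-ORDER
            cands.append((r + " " + l, p_inv * pl * pr))          # INVERTED
    return heapq.nlargest(n, cands, key=lambda c: c[1])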

17
Remedy data sparseness
  • If T(p1) and T(p2) overlap, the node N with
    children N1 and N2 is not taken as a training
    instance
  • The amount of training input is greatly reduced
  • Remove some less probable alignment points to
    minimize overlapping phrases

18
Decoding
  • The greedy reordering algorithm above tends to
    focus on a single clause of a long sentence: for
    many long sentences containing several clauses,
    only one of the clauses gets reordered. Hence
    the clause-level pipeline shown next.

19-26
Decoding
[Figure series: the sentence S is split into clauses C1…Cn (clause splitting); reordered candidates are generated for the clauses (clause reordering) and within each clause (in-clause reordering); each reordered clause candidate is translated; the best clause translation T(Cj2) is selected; clause translations are merged into T(Sj) and composed into the final translation T(S).]
27
Binarizing Syntax Trees for Syntax-Based MT
  • Substructures of the tree cannot be reused
  • A solution is to binarize syntax trees
  • Simple methods such as left-, right-, and
    head-binarization, and their combinations
28-29
Left/Right binarization
[Figures: a tree binarized by left binarization and by right binarization.]
30
Definition: Left Binarization
  • The left binarization of node n factorizes the
    leftmost r-1 children by forming a new node n′
    to dominate them, leaving the last child nr
    untouched, and then makes the new node n′ the
    left child of n

32
Definition: Right Binarization
  • The right binarization of node n factorizes the
    rightmost r-1 children by forming a new node n′
    to dominate them, leaving the first child n1
    untouched, and then makes the new node n′ the
    right child of n
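To make the two definitions concrete, here is a minimal sketch assuming trees are (label, children) tuples; the new node is labeled with a primed variant and is itself recursively binarized:

def left_binarize(label, children):
    """Factor the leftmost r-1 children under a new node label', leave
    the last child untouched, and make the new node the left child."""
    if len(children) <= 2:
        return (label, children)
    return (label, [left_binarize(label + "'", children[:-1]), children[-1]])

def right_binarize(label, children):
    """Factor the rightmost r-1 children under a new node label', leave
    the first child untouched, and make the new node the right child."""
    if len(children) <= 2:
        return (label, children)
    return (label, [children[0], right_binarize(label + "'", children[1:])])

# Example: right_binarize("NP", ["DT", "JJ", "NN"])
# -> ('NP', ['DT', ("NP'", ['JJ', 'NN'])])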

34
Definition Head Binarization
  • Left binarizes n if the head is the first child,
    otherwise right binarizes it. We prefer
    right-binarization if both applicable.
  • Keep the head be in the push-down part.

35
Parallel binarization
  • Transform a parse tree into a packed
    binarization forest
  • A packed forest is composed of additive forest
    nodes and multiplicative forest nodes

36
Procedure
  • Given a tree node n that has children n1,…,nr
  • Recursively parallel-binarize the children
    n1,…,nr, producing binarization forest nodes
  • Right-binarize n if the contiguous span of
    children n2,…,nr is factorizable: insert a new
    label n′, recursively parallel-binarize n′ to
    generate a binarization forest node, then form a
    multiplicative forest node as its parent
  • Left binarization is similar, except that the
    factorized span is n1,…,nr-1; it likewise forms
    a multiplicative forest node
  • Finally, form an additive forest node as the
    parent of the two multiplicative nodes (a sketch
    follows)
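A minimal sketch of this procedure, assuming forest nodes are tagged tuples ('*', …) for multiplicative and ('+', …) for additive, and that the caller supplies a factorizable predicate (e.g. alignment consistency); none of this is the paper's actual data structure:

def parallel_binarize(label, children, factorizable):
    """Binarize node `label` over the given subtree children into a
    packed forest, exploring both the left and the right binarization
    whenever the corresponding span of children is factorizable."""
    if len(children) <= 2:                  # already binary: one AND node
        return ('*', label, children)
    alts = []
    if factorizable(children[1:]):          # right binarization branch
        sub = parallel_binarize(label + "'", children[1:], factorizable)
        alts.append(('*', label, [children[0], sub]))
    if factorizable(children[:-1]):         # left binarization branch
        sub = parallel_binarize(label + "'", children[:-1], factorizable)
        alts.append(('*', label, [sub, children[-1]]))
    if not alts:                            # fallback: force right binarization
        sub = parallel_binarize(label + "'", children[1:], factorizable)
        alts.append(('*', label, [children[0], sub]))
    return alts[0] if len(alts) == 1 else ('+', alts)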

37
Example
[Figure: a packed binarization forest; additive nodes behave like OR, multiplicative nodes like AND.]
38-41
Extract translation rules: Condition 1
[Figure series: Procedure-1 is called at the root, then Procedure-2 is called recursively on the subtrees.]
42-45
Extract translation rules: Condition 2
[Figure series: Procedure-2 is called at the root, then Procedure-1 is called recursively on the subtrees.]
46
Extract translation rule
  • So we can build a derivation forest; by
    traversing the forest top-down recursively, we
    can extract rules at the admissible forest nodes

47
Learning how to binarize via the EM algorithm
  • Perform a set of binarization operations β on a
    parse tree t
  • Each binarization β is the sequence of
    binarizations applied to the necessary nodes of
    t in pre-order
  • Each binarization β results in a restructured
    tree tβ
  • Extract rules from (tβ, f, a), generating a
    translation model with parameters θ (i.e., rule
    probabilities)
  • Obtain the β that maximizes the likelihood under
    θ (a sketch of such a loop follows)
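A loose sketch of such a loop, in a hard-assignment (Viterbi-style) variant; every callback here (candidate_binarizations, restructure, train_model, log_likelihood) is a hypothetical placeholder for machinery the paper defines elsewhere:

def choose_binarizations(trees, candidate_binarizations, restructure,
                         train_model, log_likelihood, iters=10):
    """Alternate between (M) training rule probabilities theta from the
    currently chosen restructured trees and (E) re-picking, per tree,
    the binarization beta whose restructured tree scores best."""
    choice = {i: candidate_binarizations(t)[0] for i, t in enumerate(trees)}
    theta = None
    for _ in range(iters):
        # M-step: estimate theta from the chosen restructured trees
        theta = train_model([restructure(t, choice[i])
                             for i, t in enumerate(trees)])
        # E-step: pick, per tree, the best-scoring binarization under theta
        for i, t in enumerate(trees):
            choice[i] = max(candidate_binarizations(t),
                            key=lambda b: log_likelihood(restructure(t, b),
                                                         theta))
    return choice, theta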

48
Using the EM algorithm to choose restructuring
49
Forest Rescoring: Faster Decoding with Integrated
Language Models
  • Efficient decoding for phrase-based and
    syntax-based MT models is a difficult problem
  • If the language model is fully integrated into
    the decoder, maintaining target-language
    boundary words during decoding incurs an
    expensive overhead

50
Some alternative methods
  • Rescoring: produce a k-best list of candidate
    translations without the LM, then rerank the
    k-best list using the LM
  • Forest rescoring
  • Cube pruning
  • Cube growing

51-54
Cube pruning: some details
  • Avoid duplicate deductions
[Figure series: the first, second, and third items extracted from the cube.]
55
Cube pruning: some details
  • Where do we compute the LM score?
[Figure: cubes Cube1…Cuben feed candidate items through a heap and a stack; the question is at which point the LM score is computed.]
56
Cube pruning: some details
  • Suppose we are decoding with a hierarchical
    phrase-based model. The cube has at most 3
    dimensions, because each rule has at most two
    variables: the rule itself forms one dimension,
    and the two variables form the other two.
[Figure: a cube whose dimension 1 indexes the rules (X1 … X2), and whose dimensions 2 and 3 index the candidate items of the two variables.]
57
Cube pruning: some details
  • So when we extract the best derivation from the
    top of the heap, we push at most 3 of its
    neighbors into the heap as candidates (see the
    sketch below)
[Figure: from cube cell (i, j, k), the neighbors (i+1, j, k), (i, j+1, k), and (i, j, k+1) are pushed into the heap.]
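A minimal runnable sketch of this loop; the three axes follow the slide, and score stands in for the combined model-plus-LM cost (an assumption; lower is better):

import heapq

def cube_pruning_kbest(rules, xs1, xs2, score, k):
    """Pop the best cell (i, j, l) of the cube and push its at most
    three neighbors (i+1, j, l), (i, j+1, l), (i, j, l+1); `rules`,
    `xs1`, `xs2` are the sorted candidate lists of the 3 dimensions."""
    start = (0, 0, 0)
    heap = [(score(rules[0], xs1[0], xs2[0]), start)]
    seen = {start}
    out = []
    while heap and len(out) < k:
        cost, (i, j, l) = heapq.heappop(heap)
        out.append((cost, (i, j, l)))
        for ni, nj, nl in ((i + 1, j, l), (i, j + 1, l), (i, j, l + 1)):
            if (ni, nj, nl) not in seen and ni < len(rules) \
                    and nj < len(xs1) and nl < len(xs2):
                seen.add((ni, nj, nl))
                heapq.heappush(heap, (score(rules[ni], xs1[nj], xs2[nl]),
                                      (ni, nj, nl)))
    return out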
58-62
Cube growing
[Figure series: LazyJthBest(n) is called at the root; it fires the initial candidate via Fire(1, 1, cand), which in turn calls LazyJthBest(1) recursively on the antecedent nodes.]

63
Cube growing

LazyJthBest(n):
    while |D(v)| < n and cand is not empty:
        Pop-Min
        Fire(Nb1, cand); Fire(Nb2, cand); …; Fire(Nbm, cand)
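A compact Python sketch of this lazy enumeration, simplified to one hyperedge per node and additive costs without LM rescoring (so the search here is exact rather than approximate); the class and its fields are illustrative, not the paper's pseudocode:

import heapq

class LazyNode:
    """Enumerate the j-th best derivation cost at a node on demand,
    recursing into antecedent nodes only when a candidate needs them."""
    def __init__(self, leaf_costs=None, ants=None, edge_cost=0.0):
        self.leaf = sorted(leaf_costs) if leaf_costs is not None else None
        self.ants, self.edge = (ants or []), edge_cost
        self.D, self.cand, self.seen = [], [], set()
        if self.leaf is None:
            self._fire((0,) * len(self.ants))         # Fire(1, ..., 1, cand)

    def _fire(self, idx):
        """Push the candidate with antecedent ranks `idx`, if realizable."""
        if idx in self.seen:
            return
        costs = [a.jth(i) for a, i in zip(self.ants, idx)]
        if any(c is None for c in costs):             # an antecedent ran dry
            return
        self.seen.add(idx)
        heapq.heappush(self.cand, (self.edge + sum(costs), idx))

    def jth(self, j):
        """LazyJthBest: grow D(v) until it has j+1 items or cand is empty."""
        if self.leaf is not None:
            return self.leaf[j] if j < len(self.leaf) else None
        while len(self.D) <= j and self.cand:
            cost, idx = heapq.heappop(self.cand)      # Pop-Min
            self.D.append(cost)
            for d in range(len(idx)):                 # Fire(Nb_d, cand)
                self._fire(idx[:d] + (idx[d] + 1,) + idx[d + 1:])
        return self.D[j] if j < len(self.D) else None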
64
Two-Pass Approach to SCFG SMT
  • The first pass, corresponding to a severe
    parameterization of cube pruning, considers only
    the single best (LM-integrated) chart item in
    each cell, while maintaining unexplored
    alternatives for second-pass consideration
  • In the second pass, the search process is driven
    by the integration of long-distance and
    flexible-history n-gram LMs, rather than simply
    using such models for hypothesis rescoring

65
Thanks!
  • Questions?