1
Syntax for MT
  • EECS 767
  • Feb. 1, 2006

2
Outline
  • Motivation
  • Syntax-based translation model
  • Formalization
  • Training
  • Using syntax in MT
  • Using multiple features
  • Syntax-based features

3
The IBM Models
  • Word reordering
  • Single words, not groups
  • Conditioned on position of words
  • Null-word insertion
  • Uniform across position

4
The Alignment Template Model
  • Word Reordering
  • Phrases can be reordered in any way, but tend to
    stay in the same order as the source.
  • Reordering within phrases is defined by templates
  • Word Translations
  • Must match up; no null

5
Implied Assumptions
  • Word Order
  • Similar to source sentence
  • Translation
  • Near 1-1 correspondence

6
What goes wrong?
  • We see many errors in machine translation when we
    only look at the word level
  • Missing content words
  • MT: Condemns US interference in its internal
    affairs.
  • Human: Ukraine condemns US interference in its
    internal affairs.
  • Verb phrase
  • MT: Indonesia that oppose the presence of foreign
    troops.
  • Human: Indonesia reiterated its opposition to
    foreign military presence.

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
7
What goes wrong cont.
  • Wrong dependencies
  • MT: , particularly those who cheat the audience
    the players.
  • Human: , particularly those players who cheat
    the audience.
  • Missing articles
  • MT: , he is fully able to activate team.
  • Human: , he is fully able to activate the team.

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
8
What goes wrong cont.
  • Word salad
  • the world arena on top of the u . s . sampla
    competitors , and since mid july has not
    appeared in sports field , the wounds heal go
    back to the situation is very good , less than a
    half hours in the same score to eliminate 62 in
    light of the south african athletes to the second
    round .

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
9
How can we improve?
  • Relying on the language model to produce more
    accurate sentences is not enough
  • Many of the problems can be considered
    syntactic
  • Perhaps MT systems don't know enough about what
    is important to people
  • So, include syntax in MT
  • Build a model around syntax
  • Include syntax-based features in a model

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
10
A New Translation Story
  • You have a sentence and its parse tree
  • The children at each node in the tree are
    rearranged
  • New nodes may be inserted before or after a child
    node
  • These new nodes are assigned a translation
  • Each of the leaf lexical nodes is then translated

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
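A minimal Python sketch of this generative story, using a toy tree, invented
probability tables, and left-insertion only; it illustrates the process above
and is not Yamada's implementation.

import random

# Toy parse tree: a leaf is (POS, word); an internal node is (label, [children]).
tree = ("VB", [("PRP", "he"),
               ("VB1", [("VB", "adores")]),
               ("VB2", [("VB", "listening"),
                        ("TO", [("TO", "to"), ("NN", "music")])])])

t_table = {"he": ["kare"], "adores": ["daisuki"], "listening": ["kiku"],
           "to": ["wo"], "music": ["ongaku"]}      # word-for-word translations (T)
insert_words = ["wa", "ga", "no", "desu"]          # candidate null-generated words (N)

def translate_node(node):
    label, rest = node
    if isinstance(rest, str):                      # leaf: translate the word (T)
        return [random.choice(t_table.get(rest, [rest]))]
    children = list(rest)
    random.shuffle(children)                       # rearrange the children (R)
    out = []
    for child in children:
        if random.random() < 0.3:                  # maybe insert a new word (N)
            out.append(random.choice(insert_words))
        out.extend(translate_node(child))
    return out

print(" ".join(translate_node(tree)))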
11
A Syntax-based model
  • Assume word order is based on a reordering of
    source syntax tree.
  • Assume null-generated words happen at syntactic
    boundaries.
  • (For now) Assume a word translates into a single
    word.

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
12
Reorder
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
13
Insert
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
14
Translate
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
15
Parameters
  • Reorder (R): child node reordering
  • Can take any possible child node reordering
  • Defines word order in the translated sentence
  • Conditioned on the original child node order
  • Only applies to non-leaf nodes

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
16
Parameters cont.
  • Insertion (N): placement and translation
  • Left, right, or none
  • Defines word to be inserted
  • Place conditioned on current and parent labels
  • Word choice is unconditioned

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
17
Parameters cont.
  • Translation (T): 1-to-1
  • Conditioned only on source word
  • Can take on null
  • Translation (T): N-to-N
  • Consider word fertility (for 1-to-N mapping)
  • Consider phrase translation at each node
  • Limit size of possible phrases
  • Mix phrasal w/ word-to-word translation

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
18
Formalization
Set of nodes in the parse tree
Total probability
Assume node independence
Assume the random variables are independent of one
another and depend only on certain features
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
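A rough LaTeX sketch of the formulas this slide refers to, in the notation of
Yamada and Knight (2001), with the exact feature functions left abstract:

\begin{align*}
P(f \mid \varepsilon) &= \sum_{\theta:\, \mathrm{Str}(\theta(\varepsilon)) = f} P(\theta \mid \varepsilon) \\
P(\theta \mid \varepsilon) &= \prod_{i=1}^{n} P(\theta_i \mid \varepsilon_i)
  \qquad \text{(node independence)} \\
P(\theta_i \mid \varepsilon_i) &= n(\nu_i \mid \mathcal{N}(\varepsilon_i)) \cdot
  r(\rho_i \mid \mathcal{R}(\varepsilon_i)) \cdot
  t(\tau_i \mid \mathcal{T}(\varepsilon_i))
\end{align*}

Here \varepsilon_1, \dots, \varepsilon_n are the nodes of the source parse tree,
\theta_i = \langle \nu_i, \rho_i, \tau_i \rangle are the insertion, reorder, and
translation choices at node i, and \mathcal{N}, \mathcal{R}, \mathcal{T} select
the features each table is conditioned on.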
19
Training (EM)
  1. Initialize all probability tables (uniform)
  2. Reset all counters
  3. For each pair in the training corpus
    • Try all possible mappings of N, R, and T
    • Update the counts as seen in the mappings
  4. Normalize the probability tables with the new
     counts
  5. Repeat steps 2-4 several times

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
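The real E-step enumerates all (N, R, T) mappings on the parse tree; as a toy
Python sketch of the same count-and-normalize loop (steps 1-5 above), here is
EM for just a word-translation table over an invented corpus.

from collections import defaultdict

corpus = [(["the", "house"], ["la", "maison"]),
          (["the", "book"],  ["le", "livre"]),
          (["a", "book"],    ["un", "livre"])]

src_vocab = {w for e, _ in corpus for w in e}
tgt_vocab = {w for _, f in corpus for w in f}

# 1. Initialize the probability table uniformly.
t = {e: {f: 1.0 / len(tgt_vocab) for f in tgt_vocab} for e in src_vocab}

for iteration in range(5):                     # 5. repeat steps 2-4 several times
    counts = defaultdict(lambda: defaultdict(float))    # 2. reset all counters
    for e_sent, f_sent in corpus:              # 3. for each pair in the corpus...
        for f in f_sent:
            z = sum(t[e][f] for e in e_sent)   # ...consider the possible mappings
            for e in e_sent:
                counts[e][f] += t[e][f] / z    # ...and update expected counts
    for e in counts:                           # 4. normalize with the new counts
        total = sum(counts[e].values())
        t[e] = {f: c / total for f, c in counts[e].items()}

print({e: max(t[e], key=t[e].get) for e in t})   # most likely translation per word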
20
Decoding
  • Modify the original CFG with the new reorderings
    and their probabilities
  • Add in VP → VP X and X → word rules from N
  • Add lexical rules englishWord → foreignWord
  • Use the noisy-channel approach starting with a
    translated sentence
  • Proceed through the parse tree using a bottom-up
    beam search, keeping an N-best list of good
    partial translations for each subtree

Yamada & Knight A Decoder for Syntax-based
Statistical MT 2002
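A toy Python sketch of the bottom-up N-best idea: each leaf proposes scored
translation options and each node combines and prunes its children's lists.
The tree, option table, and scores are invented, and the real decoder also
applies the learned reorder/insert probabilities and a language model.

import itertools, heapq

N = 3  # beam width: keep an N-best list of partial translations per subtree

options = {"kare": [("he", -0.1), ("him", -1.2)],
           "ongaku": [("music", -0.2), ("sound", -1.5)],
           "daisuki": [("adores", -0.5), ("loves", -0.7)]}

def decode(node):
    """Return the N-best (log-prob, words) partial translations for a subtree."""
    if isinstance(node, str):                       # leaf: lexical options
        return sorted(((s, [w]) for w, s in options[node]), reverse=True)[:N]
    best = []
    for order in itertools.permutations(node):      # candidate child reorderings
        child_lists = [decode(child) for child in order]
        for combo in itertools.product(*child_lists):
            score = sum(s for s, _ in combo)        # reordering cost omitted here
            words = [w for _, ws in combo for w in ws]
            best.append((score, words))
    return heapq.nlargest(N, best)                  # prune to the beam

tree = ["kare", ["ongaku", "daisuki"]]              # toy source "tree"
for score, words in decode(tree):
    print(round(score, 2), " ".join(words))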
21
Decoding cont.
Yamada & Knight A Decoder for Syntax-based
Statistical MT 2002
22
Performance (Alignment)
Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
23
Performance (Alignment) cont.
  • Counting the number of individual alignments
  • Perfect means all alignments in a pair are correct

Yamada A Syntax-Based Statistical Translation
Model Thesis 2002
24
Performance cont.
  • Chinese-English BLEU scores

Yamada & Knight A Decoder for Syntax-based
Statistical MT 2002
25
Do we need the entire model to be based on syntax?
  • Good performance increase
  • Large computational cost
  • Many permutations of CFG rules (120K non-lexical)
  • How about trying something else?
  • Add syntax-based features that look for more
    specific things

26
Using Syntax in MT
  • Multiple Features
  • Formalization
  • Baseline
  • Training
  • Syntax-based Features
  • Shallow
  • Deep

27
Multiple Features (log-linear)
Calculate the probability using a variety of feature
functions, each parameterized by an associated weight
Find the translated sentence that maximizes the
weighted feature combination given the foreign sentence
JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
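The formulas themselves are not reproduced on this slide; in standard
log-linear notation they are roughly:

\begin{align*}
P(e \mid f) &= \frac{\exp\!\left(\sum_{m=1}^{M} \lambda_m h_m(e, f)\right)}
                    {\sum_{e'} \exp\!\left(\sum_{m=1}^{M} \lambda_m h_m(e', f)\right)} \\
\hat{e} &= \arg\max_{e} \sum_{m=1}^{M} \lambda_m h_m(e, f)
\end{align*}

where the h_m are the feature functions and the \lambda_m their weights.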
28
Baseline System
JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
29
Baseline System
JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
30
Baseline Features
  • Alignment template feature
  • Uses simple counts

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
31
Baseline Features
  • Word selection feature
  • Uses lexicon probability estimated by relative
    frequency

Additional feature capturing word reordering
within phrasal alignments
JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
32
Baseline Features
  • Phrase alignment feature
  • Measure of deviation from monotone alignment

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
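One common way to score deviation from a monotone phrase alignment, sketched
here in Python under my own assumptions about the exact definition (the
report's formula may differ), is to sum the jump distances between consecutive
source spans when the phrases are read in target order.

def monotone_deviation(phrase_alignment):
    """Sum of |start_of_next_source_span - end_of_previous - 1| over the
    target-order sequence of phrases; 0 means a fully monotone alignment."""
    cost, prev_end = 0, 0
    for src_start, src_end in phrase_alignment:   # 1-indexed source spans
        cost += abs(src_start - prev_end - 1)
        prev_end = src_end
    return cost

# Monotone alignment: no cost.  Swapped phrases: positive cost.
print(monotone_deviation([(1, 2), (3, 5), (6, 6)]))   # -> 0
print(monotone_deviation([(3, 5), (1, 2), (6, 6)]))   # -> 10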
33
Baseline Features
  • Language model feature
  • Standard back-off trigram probability
  • Word/Phrase penalty feature
  • Feature counting the number of words in the
    translated sentence
  • Feature counting the number of phrases in the
    translated sentence
  • Alignment lexicon feature
  • Feature counting the number of times something
    from a given alignment lexicon is used

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
34
A possible training method
  • Line optimization method

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
35
Use reranking of N-best lists
  • Feature functions do not need to be integrated
    into the dynamic programming search
  • A feature function can condition on any part of
    the English/Chinese sentence, parse tree, or
    chunks
  • Provides a simple software architecture
  • Using a fixed set of translations allows feature
    functions to be a vector of numbers
  • You are limited to improvements you see within
    the N-best lists

WS 2003 Syntax for Statistical Machine
Translation Final Presentation
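A minimal Python sketch of N-best reranking, assuming each candidate in the
fixed list has already been turned into a feature vector; the candidates,
feature values, and weights below are invented for illustration.

candidates = [
    ("he is fully able to activate team",     [-41.2, 6.0, 0.0]),
    ("he is fully able to activate the team", [-40.8, 7.0, 1.0]),
    ("fully he activate team is able",        [-55.3, 6.0, 0.0]),
]
# Hypothetical feature order: baseline model score, word count, article count.
weights = [1.0, 0.1, 0.5]

def rerank(nbest, w):
    """Return candidates sorted by the weighted feature score, best first."""
    score = lambda feats: sum(wi * fi for wi, fi in zip(w, feats))
    return sorted(nbest, key=lambda c: score(c[1]), reverse=True)

print(rerank(candidates, weights)[0][0])   # best candidate after reranking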
36
Syntax-based Features
  • Shallow
  • POS and Chunk Tag counts
  • Projected POS language model
  • Deep
  • Tree-to-string
  • Tree-to-tree
  • Verb arguments

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
37
Shallow Syntax-Based Features
  • POS and chunk tag count
  • Low-level syntactic problems with the baseline
    system: too many articles, commas, and singular
    nouns; too few pronouns, past tense verbs, and
    plural nouns
  • The reranker can learn balanced tag distributions
    from various features
  • Examples
  • Number of NPs in English
  • Difference in number of NPs between English and
    Chinese
  • Number of Chinese N tags translated to only non-N
    tags in English.

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
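A toy Python sketch of tag-count features of this kind; the (word, tag) pairs
and the specific tags are invented for illustration.

english = [("he", "PRP"), ("activates", "VBZ"), ("the", "DT"), ("team", "NN")]
chinese = [("ta", "PN"), ("jihuo", "VV"), ("duiwu", "NN")]

def count_tag(tagged, prefix):
    """Count tokens whose tag starts with the given prefix."""
    return sum(1 for _, t in tagged if t.startswith(prefix))

features = {
    "en_nouns": count_tag(english, "NN"),                              # tag count
    "noun_count_diff": count_tag(english, "NN") - count_tag(chinese, "NN"),
    "en_articles": count_tag(english, "DT"),                           # too many/few articles
}
print(features)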
38
Shallow Syntax-Based Features
  • Projected POS language model
  • Use word-level alignments to project Chinese POS
    tags onto the English words
  • Possibly keeping relative position within Chinese
    phrase
  • Possibly keeping NULLs in POS sequence
  • Possibly using lexicalized NULLs from English
    word
  • Use the POS tags to train a language model based
    on POS N-grams

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
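A rough Python sketch of the projection step and the bigram counts a POS
language model would be trained on; the sentences, tags, and alignment links
are invented, and NULL handling is simplified.

from collections import Counter

chinese_tags = ["PN", "VV", "NN"]            # POS tags of the Chinese sentence
english_words = ["he", "activates", "the", "team"]
alignment = {0: 0, 1: 1, 3: 2}               # English index -> Chinese index

projected = [chinese_tags[alignment[i]] if i in alignment else "NULL"
             for i in range(len(english_words))]
print(projected)                             # ['PN', 'VV', 'NULL', 'NN']

bigrams = Counter(zip(["<s>"] + projected, projected + ["</s>"]))
print(bigrams)                               # counts for the POS N-gram model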
39
Deep Syntax-Based Features
  • Tree to string
  • Uses the Syntax-based model we saw previously
  • Reduces computational cost by limiting size of
    reorderings
  • Add in a feature for the probability defined by
    the model and the probability of the Viterbi
    alignment under the model

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
40
Deep Syntax-Based Features
  • Tree to Tree
  • Uses tree transformation functions similar to
    those in the tree-to-string model
  • The probability of transforming a source tree
    into a target tree is modeled as a sequence of
    steps starting from the root of the target tree
    down.

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
41
Tree to Tree cont.
  • At each level of the tree
  • At most one of the current node's children is
    grouped with the current node into a single
    elementary tree, with its probability conditioned
    on the current node and its children.
  • An alignment of the children of the current
    elementary tree is chosen, with its probability
    conditioned on the current node and the children
    of the elementary tree. This is similar to the
    reorder operation in the tree-to-string model,
    but allows for node addition and removal.
  • Leaf-level parameters are ignored when
    calculating the tree-to-tree probability.

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
42
Verb Arguments
  • Idea: A feature that counts the difference in the
    number of arguments to the main verb between the
    Chinese and English sentences
  • Perform a breadth-first search traversal of the
    dependency trees
  • Mark the first verb encountered as the main verb
  • The number of arguments is equal to the number of
    its children

JHU WS 2003 Syntax for Statistical Machine
Translation Final Report
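A small Python sketch of this feature: breadth-first search of a toy
dependency tree, take the first verb found as the main verb, and count its
children. The tree encoding and tags are my own simplification.

from collections import deque

# Each node: (word, POS tag, [children]).
en_tree = ("activates", "VBZ", [("he", "PRP", []),
                                ("team", "NN", [("the", "DT", [])])])

def main_verb_arg_count(root, verb_prefix="V"):
    """BFS the dependency tree; the first verb found is the main verb."""
    queue = deque([root])
    while queue:
        word, tag, children = queue.popleft()
        if tag.startswith(verb_prefix):
            return len(children)             # its children are its arguments
        queue.extend(children)
    return 0

zh_arg_count = 1                             # assume computed the same way on the Chinese tree
print(abs(main_verb_arg_count(en_tree) - zh_arg_count))   # the feature value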
43
Performance
  • Some things helped, some things didn't
  • Is syntax useful? Necessary?

44
References
  • Kenji Yamada and Kevin Knight. 2001. A
    syntax-based statistical translation model. In
    Proceedings of ACL-01.
  • Kenji Yamada. 2002. A Syntax-Based Statistical
    Translation Model. Ph.D. thesis, University of
    Southern California.
  • Kenji Yamada and Kevin Knight. 2002. A decoder for
    syntax-based MT. In Proceedings of the 40th Annual
    Meeting of the Association for Computational
    Linguistics (ACL), Philadelphia, PA.
  • Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur,
    Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar
    Kumar, Libin Shen, David Smith, Katherine Eng,
    Viren Jain, Zhen Jin, and Dragomir Radev. 2004. A
    smorgasbord of features for statistical machine
    translation. In Proceedings of the Human Language
    Technology Conference / North American Chapter of
    the Association for Computational Linguistics
    Annual Meeting (HLT-NAACL), pages 161-168.
  • Franz Josef Och, Daniel Gildea, Sanjeev Khudanpur,
    Anoop Sarkar, Kenji Yamada, Alex Fraser, Shankar
    Kumar, Libin Shen, David Smith, Katherine Eng,
    Viren Jain, Zhen Jin, and Dragomir Radev. Final
    Report of the Johns Hopkins 2003 Summer Workshop
    on Syntax for Statistical Machine Translation.
  • Philipp Koehn, Franz Josef Och, and Daniel Marcu.
    2003. Statistical phrase-based translation. In
    Proceedings of the Human Language Technology
    Conference / North American Chapter of the
    Association for Computational Linguistics Annual
    Meeting (HLT-NAACL).