Segmentation for English-to-Arabic Statistical Machine Translation - PowerPoint PPT Presentation

About This Presentation

Title:

Segmentation for English-to-Arabic Statistical Machine Translation

Description:

Title: Building and Optimizing A Broad-Coverage English Arabic Phrase Based Statistical Machine Translation System Author: New1 Last modified by – PowerPoint PPT presentation

Slides: 10

Provided by: new1

Learn more at: http://www.cs.cmu.edu

Tags: arabic | english | machine | prefix | segmentation | statistical | suffix | translation

Transcript and Presenter's Notes

Title: Segmentation for English-to-Arabic Statistical Machine Translation

1
Segmentation for English-to-Arabic Statistical
Machine Translation

Ibrahim Badr, Rabih Zbib, James Glass

2
Introduction

Experiment on English-to-Arabic SMT.
Two domains: news text and spoken travel conversation.
Explore the effect of Arabic segmentation on translation quality.
Propose various schemes for recombining (not trivial!) the segmented Arabic.
Apply (basic) factored translation models.

3
Arabic Morphology

Arabic is a morphologically rich language.
Nouns and adjectives inflect for gender (m, f), number (sg, du, pl) and case (Nom, Acc, Gen);
all combinations are possible.
???? (a player, M), ????? (a player, F),
?????? (two players, M),
??????? (two players, F), ?????? (players,
M,P,Nom), ?????? (players, M,P,Acc or gen)
In addition to gender and number, verbs inflect for tense, voice, and person.
????? (play, past, plM3P), ?????? (play ,
present, plM3P), ??????? (played, plM3P)
Additional prefixes: conjunction ?, determiner ??, prepositions ? (with, in), ? (to, for), ??, ...
?????????
Additional suffixes:
- possessive pronouns (attach to nouns): ?? (their), ??? (your, pl, M), ??? (your, pl, F)
- object and subject pronouns (attach to verbs): ?? (me), ??? (them), ? (they)
??????????
Many surface forms sharing the same lemma!

4
Arabic segmentation

Use MADA for morphological decomposition of Arabic text.
(Typical) normalization: ? ? ?, ??? ? ?
Two proposed segmentation schemes:
S1: Split all clitics mentioned on the previous slide, except plural and subject pronoun morphemes.
S2: Same as S1, but the split clitics are glued into one prefix and one suffix,
so that word = prefix + stem + suffix.
Example:
???????? (and for his kids)
s1 ? ????? ? ?
s2 ?? ????? ?
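
The relation between S1 and S2 can be sketched in a few lines of Python. This is a minimal illustration, not MADA's output format: the `+` markers (trailing `+` on proclitics, leading `+` on enclitics), the function name `s1_to_s2`, and the Buckwalter-style transliteration used in the example are all assumptions made for readability.

```python
def s1_to_s2(tokens):
    """Glue S1 clitic tokens into at most one prefix and one suffix per word.

    Assumed convention (hypothetical, not from the slides): proclitics end
    with '+', enclitics start with '+', stems carry no marker.
    """
    out = []
    prefix, suffix, stem = "", "", None

    def flush():
        # Emit the word collected so far as prefix+, stem, +suffix.
        nonlocal prefix, suffix, stem
        if prefix:
            out.append(prefix + "+")
        if stem is not None:
            out.append(stem)
        if suffix:
            out.append("+" + suffix)
        prefix, suffix, stem = "", "", None

    for tok in tokens:
        if tok.endswith("+") and len(tok) > 1:      # proclitic
            if stem is not None:                    # a new word starts here
                flush()
            prefix += tok[:-1]
        elif tok.startswith("+") and len(tok) > 1:  # enclitic
            suffix += tok[1:]
        else:                                       # stem
            if stem is not None:                    # previous word had no prefix
                flush()
            stem = tok
    flush()
    return out


# Transliterated stand-in for the "and for his kids" example (assumed spelling).
s1 = ["w+", "l+", "AwlAd", "+h"]
print(s1_to_s2(s1))  # ['wl+', 'AwlAd', '+h']
```

S2 yields fewer, longer tokens per word than S1, which is one plausible reason it would interact differently with phrase extraction and the language model.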

5
Arabic Recombination

Segmented output needs recombination!
Why is it not trivial?
a) Letter ambiguity: we normalized ? ? ?
? ??? ?????
? ?? ? ???
b) Word ambiguity: some words can be grammatically recombined in more than one way.
???? ? 1 ???? 2 ?????
Propose two recombination schemes:
1. R: recombination rules defined manually.
Resolve (a): pick the most frequent stem form in the non-normalized data.
Resolve (b): pick the most frequent grammatical form.
2. T: build a table derived from the training set (surface, decomposed word).
More than one surface form → choose randomly.
Can help in recombining words that were segmented incorrectly.
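
Scheme T can be pictured as a dictionary lookup. The sketch below is an assumption-laden illustration rather than the authors' implementation: the names `build_table` and `recombine_T`, the `+` clitic markers, and the made-up Buckwalter-style training pairs are all hypothetical.

```python
import random
from collections import defaultdict


def build_table(pairs):
    """Map each decomposed word to the set of surface forms seen with it.

    `pairs` is an iterable of (surface, segments) taken from the training set.
    """
    table = defaultdict(set)
    for surface, segments in pairs:
        table[tuple(segments)].add(surface)
    return table


def recombine_T(segments, table, rng=random):
    """Return a surface form for `segments`, or None if it was never seen."""
    candidates = table.get(tuple(segments))
    if not candidates:
        return None
    # Slides: more than one surface form -> choose randomly.
    return rng.choice(sorted(candidates))


# Toy training pairs (invented for illustration only).
training = [
    ("wlAwlAdh", ["w+", "l+", "AwlAd", "+h"]),
    ("b>mr", ["b+", "Amr"]),   # two surface forms share one normalized
    ("b<mr", ["b+", "Amr"]),   # segmentation: the letter ambiguity case
]
table = build_table(training)

print(recombine_T(["w+", "l+", "AwlAd", "+h"], table))  # wlAwlAdh
print(recombine_T(["b+", "Amr"], table))                # one of the two forms
print(recombine_T(["w+", "byt"], table))                # None (unseen)
```

Because the table stores whatever segmentation the analyzer actually produced for each training word, a lookup can restore the right surface form even when that segmentation was wrong, which is the benefit the slide mentions.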

6
Factored Model Data

Factors
- Factors on the English side: surface form, POS.
- Factors on the Arabic side: surface form, POS+clitics.
- Build a 3-gram LM on the surface form and a 7-gram LM on POS+clitics.
- Generation model: Surface, POS+clitics → Surface.
Data: newswire and spoken dialogue (travel)
- Training data
Newswire: LDC, 3M, 1.6M and 600K words (avg. sentence length: 33 En, 25 Ar, 36 SegAr)
Spoken dialogue: IWSLT (2007), 200K words (avg. sentence length: 9 En, 8 Ar, 10 SegAr)
- LM
Newswire: 3M (Ar side) plus 30M from Arabic Gigaword
Spoken dialogue: 200K words (Ar side)
- Tuning and test sets (1 En ref)
Newswire: 2000 tune, 2000 test (chosen randomly, same source as training)
Spoken dialogue: 500 tune, 500 test

7
Setup Recombination

Setup
Use GIZA++ for alignment (both unseg Ar and SegAr); use MAXPHR 15 for SegAr!
Decode using Moses.
SRILM language models:
- Newswire: 4-gram (unseg Ar), 6-gram (SegAr).
- Spoken: 3-gram (unseg Ar), 4-gram (SegAr).
MERT for tuning, optimizing for BLEU.
Define two tuning schemes for SegAr:
- T1: use SegAr for the reference.
- T2: use unseg Ar for the reference; recombine before scoring the n-best list.
Recombination Results
- Tested on the newswire training and test sets (sentence error!).
- T was trained on the training set.
- Baseline: glue prefixes and suffixes.
- T+R: if the word was seen use T, else use R.
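
The baseline and the table-then-rules backoff can be sketched as follows. This is a rough illustration under stated assumptions: the `+` clitic markers, the function names, and the toy table entry are hypothetical; the manual rules R are not given in the slides, so the example passes the baseline `glue` in their place; and "sentence error" is taken here to mean the fraction of sentences that are not recombined exactly like the reference.

```python
def glue(segments):
    """Baseline: drop the assumed '+' markers and concatenate the pieces."""
    return "".join(seg.strip("+") for seg in segments)


def recombine_TR(segments, table, rules):
    """Use the table if the decomposed word was seen, else fall back to rules.

    `table` maps a tuple of segments to one surface form; `rules` is any
    callable standing in for the manually defined recombination rules.
    """
    surface = table.get(tuple(segments))
    return surface if surface is not None else rules(segments)


def sentence_error(hyp_sents, ref_sents):
    """Fraction of sentences whose recombined form differs from the reference."""
    wrong = sum(h != r for h, r in zip(hyp_sents, ref_sents))
    return wrong / len(ref_sents)


# Toy table entry: plain gluing would give 'lAwlAd', the table restores '>'.
table = {("l+", "AwlAd"): "l>wlAd"}
words = [["l+", "AwlAd"], ["w+", "byt"]]

baseline = " ".join(glue(w) for w in words)
backoff = " ".join(recombine_TR(w, table, glue) for w in words)
print(baseline)  # lAwlAd wbyt
print(backoff)   # l>wlAd wbyt
print(sentence_error([baseline], ["l>wlAd wbyt"]))  # 1.0
print(sentence_error([backoff], ["l>wlAd wbyt"]))   # 0.0
```

A sentence-level metric is strict: one wrongly recombined word makes the whole sentence count as an error, so small word-level differences between schemes show up clearly.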

8
Translation Results News

Results for Newswire (BLEU)
Segmentation helps, but the gain diminishes as
the training data size increases (less sparse
model).
Segmentation S2 is slightly better than S1.
Tuning scheme T2 performs better than T1.
Factored models perform best for the largest system (at higher cost!).

9
Translation Results Spoken Dialogue

Results for Spoken dialogue (BLEU)
S2 performs slightly better than S1
T1 is better than T2
Conclusions
- Recombination based on both the training data
and rules performs best.
- Segmentation helps, but the gain diminishes
as the training data size increases .
- Recombining the segmented output during
tuning helps.
- Factored models perform best for the largest system.
- What next: explore the effect of syntactic reordering on En→Ar MT.
Syntactic Phrase Reordering for English-to-Arabic Statistical Machine Translation,
Badr et al., EACL 2009.
