Title: A Hybrid Machine Translation System from Turkish to English
1A Hybrid Machine Translation System from Turkish
to English
- Ferhan Türe
- MSc Thesis, Sabancı University
- Advisor: Kemal Oflazer
2Introduction
- Goal: Create a machine translation system that translates Turkish text into English text
- Turkish has an agglutinative morphology
- evimdekine
- to the one at my home
- Turkish has free word order
- Ben eve gittim, Eve gittim ben, Gittim ben eve, ...
- I went to the house
- Idea
- Write rules to translate an analyzed Turkish sentence into English
3Outline
- Machine Translation (MT)
- Motivation
- Challenges in MT
- History of MT
- Classical Approaches to MT
- The Hybrid Approach
- Challenges
- Translation Steps
- Analysis and Preprocessing
- Transfer and Generation
- Decoding
- Evaluation
- Methods
- Experimental Results
- Examples
- Conclusions
4Machine Translation
- Translation
- Given: Input text s in source language S
- Find: A well-formed text in target language T that is equivalent to s
- Machine Translation (MT)
- Any system using an electronic computer to
perform translation
5Motivation
- Satisfy increasing demand for translation
- 100 languages with 5 million or more native speakers
- Reduce the cost and effort of human translation
- 13% of EU budget
- weeks vs. minutes
- Make information available to more people in less time
- translation of web sites automatically
- Exploring limits to computers' ability and linguistic challenges
6Challenges in MT
- Morphological issues
- Each language has a different morphology
- Syntactical issues
- Word order in sentences and noun phrases
- Language-specific features (narrative past tense in Turkish, distinguishing feminine and masculine nouns)
- Semantical issues
- Word sense ambiguities
- bank → geographical term OR financial institution?
- Idiomatic phrases
- kafa çekmek → pull head OR drink alcohol?
7History of MT
- Idea by Warren Weaver in 1945
- 1950s: Russian-English MT research during the Cold War between the US and USSR
- 1960s: Funding for research stopped due to failure
- Mid-1970s
- MÉTÉO: English-French MT in Canada
- Systran and Eurotra: Multi-lingual MT in Europe
- TITRAN and MU Project at Kyoto University, Japan
- After the 90s
- Statistical MT: Use statistics and large amounts of data
8MT between English and Turkish
- Morphological analyzer
- Oflazer, 1993.
- Morphological disambiguator
- Oflazer & Kuruöz, 1994.
- Hakkani-Tür et al., 2000.
- Yuret & Türe, 2006.
- English-to-Turkish MT
- Sagay, 1981.
- Hakkani et al., 1998.
- Keyder Turhan, 1997.
- No Turkish-to-English system
9Classical Approaches to MT
10Vauquois Triangle
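[Figure: Vauquois triangle. Analysis climbs the source side and generation descends the target side through the lexical, syntactic, and semantic levels, with transfer crossing between the two sides and interlingua at the apex.]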
11Word-by-word Translation
Source sentence
Bilingual Dictionary
Target sentence
Source sentence: Ali evdeki kediyi çok sevmez
Translation: Ali home cat very like
Reference: Ali does not like the cat at home very much
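The word-by-word scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the dictionary entries below are hypothetical and cover just the example sentence, and unknown words such as proper names are assumed to pass through unchanged.

```python
# Toy bilingual dictionary (hypothetical entries, just enough for the slide's sentence).
TOY_DICTIONARY = {
    "evdeki": "home",
    "kediyi": "cat",
    "çok": "very",
    "sevmez": "like",
}


def translate_word_by_word(sentence, dictionary):
    """Replace each source word with its dictionary entry; keep unknown words as-is."""
    return " ".join(dictionary.get(word, word) for word in sentence.split())


print(translate_word_by_word("Ali evdeki kediyi çok sevmez", TOY_DICTIONARY))
```

The output keeps the Turkish word order and loses the negation and case information, which is the weakness the slide's reference translation makes visible.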
12Direct Translation
Source sentence
Morphological Analyzer
Lexical Transfer
Local Reordering
Target sentence
Source: Ali evde -ki kediyi çok sevmez
Analysis: Ali ev+Loc +RelAdj kedi+Acc çok+Adv sev+Neg+Present
Lexical: Ali home+Loc at+Adj cat+Acc very much+Adv like+Neg+Present
Reorder: Ali at+Adj home+Loc cat+Acc like+Neg+Present very much+Adv
Generate: Ali at home cat not like very much
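The lexical transfer, local reordering, and generation steps of this example can be mimicked with a small Python sketch. Everything here is an assumption made for illustration: the token representation as (root, tags) pairs, the toy lexicon, the two swap rules, and the treatment of the relativizer as a separate "-ki" token are not taken from any real system.

```python
# Analyzed source tokens as (root, tags); a simplified version of the slide's analysis.
ANALYSIS = [
    ("Ali", ()),
    ("ev", ("Loc",)),
    ("-ki", ("Adj",)),
    ("kedi", ("Acc",)),
    ("çok", ("Adv",)),
    ("sev", ("Neg", "Present")),
]

# Hypothetical root-level lexicon.
LEXICON = {"ev": "home", "-ki": "at", "kedi": "cat", "çok": "very much", "sev": "like"}

# Swap two adjacent tokens when the left one has the first tag and the right one the second.
SWAP_RULES = [("Loc", "Adj"), ("Adv", "Present")]


def lexical_transfer(tokens, lexicon):
    """Translate roots and keep the morphological tags."""
    return [(lexicon.get(root, root), tags) for root, tags in tokens]


def local_reorder(tokens, swap_rules):
    """Apply adjacent-pair swaps in one left-to-right pass."""
    tokens = list(tokens)
    i = 0
    while i < len(tokens) - 1:
        left_tags, right_tags = tokens[i][1], tokens[i + 1][1]
        if any(a in left_tags and b in right_tags for a, b in swap_rules):
            tokens[i], tokens[i + 1] = tokens[i + 1], tokens[i]
            i += 2  # skip past the pair that was just swapped
        else:
            i += 1
    return tokens


def generate(tokens):
    """Emit surface words; a Neg tag is realized as a preceding 'not'."""
    words = []
    for root, tags in tokens:
        if "Neg" in tags:
            words.append("not")
        words.append(root)
    return " ".join(words)


def direct_translate(tokens):
    return generate(local_reorder(lexical_transfer(tokens, LEXICON), SWAP_RULES))


print(direct_translate(ANALYSIS))
```

With these toy rules the sketch reproduces the slide's output string, including its ungrammatical "cat not like", since nothing in a direct system inserts articles or do-support.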
13Transfer-based Translation
SL Grammar
TL Grammar
Transfer rules / Dictionary
Source sentence
Target sentence
SL Representation
TL Representation
14Transfer-based Translation
[Figure: the same transfer-based pipeline (source sentence, SL grammar, SL representation, transfer rules / dictionary, TL representation, TL grammar, target sentence), illustrated with parse trees for a noun phrase.]
Turkish NP: mavi evin duvarı (A mavi, N evin, N duvarı)
English NP: the wall of the blue house (Det the, N wall, Prep of, Det the, A blue, N house)
15Interlingual Translation
Source sentence
Target sentence
Interlingua
Analysis
Generation
Source: Ali evdeki kediyi çok sevmez
Interlingua: holds(in_general, like(subj Ali, obj cat(at home), degree very much))
Translation: Ali does not like the cat at home very much
16Statistical MT
Given a Turkish sentence t, find the English
sentence e that is the most likely translation
of t
17Statistical MT
Translation Model $P(t \mid e)$: built from Turkish-English aligned text; scores whether an English text e is a good translation of a Turkish text t
Language Model $P(e)$: built from English text; scores whether an English text e is well-formed English or not
Decoding: $\hat{e} = \arg\max_{e} P(e)\, P(t \mid e)$
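For readers who want the missing step, the decoding rule follows from Bayes' rule applied to the "most likely translation of t" objective. The derivation below uses only the symbols already on the slide (t, e, P(e), P(t|e)); the denominator P(t) is dropped because it does not depend on e.

```latex
\begin{align*}
\hat{e} &= \arg\max_{e} P(e \mid t) \\
        &= \arg\max_{e} \frac{P(e)\, P(t \mid e)}{P(t)} \\
        &= \arg\max_{e} P(e)\, P(t \mid e)
\end{align*}
```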
18Statistical MT
Ali çok açtı
Ali was so hungry
20The Hybrid Approach
21Why Hybrid?
- Classical transfer-based approaches are good at
- representing the structural differences between the source and target languages.
- and statistical methods are good at
- extracting knowledge from large amounts of data, about how well-formed a sentence is or how meaningful a translation is.
22Challenges
Morphological differences
Avrupalılaştıramadıklarımızdanmışsınız
You were among the ones who we were not able to cause to become European
- Extreme case of a word in an agglutinative language
- Each Turkish morpheme corresponds to one or more words in English
23Challenges
Morphological differences
arkadaşımdakiler
the ones at my friend
24Challenges
Structural differences
dinlemişsin → (someone told me that) you listened
dinledin → you listened
dinlettin → you made (someone) listen
dinlettirdin → you had (someone) make (someone) listen
dinlerim → I listen
dinlerdim → I used to listen
dinletebilirmişim → ???
25Challenges
Structural differences
Adam evde kitap okuyordu → The man was reading a book at home
SUBJ ADJCT OBJ V → SUBJ V OBJ ADJCT
mavi kitap → blue book
AP NP → AP NP
evdeki kitap → the book at home
AP NP → NP AP
kitabımın kapağı → my book's cover
NP1 NP2 → NP1 NP2
arkadaşımın yüzünden → because of my friend
NP1 NP2 → NP2 NP1
26Challenges
Ambiguities
- koyun
- sheep (or bosom)
- your bay
- your dark (one)
- of the bay
- put!
27Challenges
Ambiguities
- silahını evine koy
- put your gun to your home
- put your gun to his home
- put his gun to your home
- put his gun to his home
- put your gun to her home
- put her gun to your home
- put her gun to her home
- ...
28Challenges
Ambiguities
- kitabın kapağı
- the book's cover
- book's cover
- the cover of the book
29Challenges
Ambiguities
ev+Dative (gitti) → (went) to the house
masa+Dative (çıktı) → (jumped) on the table
adam+Dative (baktı) → (looked) at the man
30Challenges
Morphological differences: Use morphological analysis on the Turkish side and generation on the English side
Structural differences: Transfer rules can represent such transformations
Ambiguities: An English language model can determine the most probable translation statistically
31The Avenue Transfer System
- Avenue Project initiated by CMU LTI Group
- Grammar formalism, which allows one to manually create a parallel grammar between two languages, and
- Transfer engine, which transfers the source sentence into possible target sentence(s) using this parallel grammar
32Overview of Our Approach
Turkish sentence
Morphological Analyzer
Analysis
Preprocessor
Lattice
Transfer rules
Avenue Transfer Engine
English translations
...
English Language Model
Most probable English translation
33I. Analysis and Preprocessing
Morphological analyses of each word
A set of features describing the structural properties of the word
adam evde oğlunu yendi
34I. Analysis and Preprocessing
Lattice representation of the sentence
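[Figure: lattice over nodes 0 to 6 encoding the alternative morphological analyses of "adam evde oğlunu yendi". Edge labels: ada+N+P1Sg, adam+N+PNon, ev+N+Loc, oğul+N+P2Sg, oğul+N+P3Sg, ye+V, +Pass+V+Past, yen+N, +Zero+V+Past, yen+V+Past]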
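A lattice of this kind compactly stores every combination of per-word analyses. The sketch below enumerates all complete paths through a small lattice in Python. The edge labels are taken from the slide, but the node numbering of each edge is an assumption, since the figure's exact topology is not recoverable from the text.

```python
from collections import defaultdict

# (source node, target node, label); the node assignments are assumed for illustration.
EDGES = [
    (0, 1, "ada+N+P1Sg"), (0, 1, "adam+N+PNon"),
    (1, 2, "ev+N+Loc"),
    (2, 3, "oğul+N+P2Sg"), (2, 3, "oğul+N+P3Sg"),
    (3, 6, "yen+V+Past"),
    (3, 4, "ye+V"), (4, 6, "+Pass+V+Past"),
    (3, 5, "yen+N"), (5, 6, "+Zero+V+Past"),
]


def enumerate_paths(edges, start, end):
    """Return every label sequence leading from start to end in an acyclic lattice."""
    outgoing = defaultdict(list)
    for source, target, label in edges:
        outgoing[source].append((target, label))

    def walk(node, labels):
        if node == end:
            yield list(labels)
            return
        for target, label in outgoing[node]:
            yield from walk(target, labels + [label])

    return list(walk(start, []))


paths = enumerate_paths(EDGES, 0, 6)
print(len(paths))
for path in paths[:3]:
    print(" ".join(path))
```

Under this assumed topology the four-word sentence already yields twelve readings, which illustrates why the later steps need a way to rank alternatives.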
35I. Analysis and Preprocessing
Representation of IGs
36II. Transfer and Generation
37II. Transfer and Generation
38II. Transfer and Generation
[Figure: lexical categories N, N, N, V]
39II. Transfer and Generation
[Figure: N and V nodes on both sides. Turkish: adam evde oğlunu yendi; English words: man won son house]
40II. Transfer and Generation
[Figure: adds an NP node on each side; the English side inserts "the"]
41II. Transfer and Generation
[Figure: adds SUBJ above the NP on each side]
42II. Transfer and Generation
[Figure: adds a second NP on each side, with a second "the" in English]
43II. Transfer and Generation
[Figure: adds Adjct on each side; the English side inserts "at"]
44II. Transfer and Generation
[Figure: adds a third NP on each side; the English side inserts "his"]
45II. Transfer and Generation
[Figure: adds OBJ on each side]
46II. Transfer and Generation
[Figure: adds Vc above V on each side]
47II. Transfer and Generation
[Figure: adds Vfin above Vc on each side]
48II. Transfer and Generation
[Figure: adds S at the root of each tree]
49II. Transfer and Generation
[Figure: S on each side with children SUBJ, Adjct, OBJ, and Vfin]
50II. Transfer and Generation
[Figure: Turkish Adjunct over NP aligned with English Adjunct over "at" and NP]
{Adjunct,3}
Adjunct::Adjunct : [NP] -> ["at" NP]
(
  (x1::y2)
  (x0 = x1)
  ((x1 CASE) =c Loc)
  ((x1 poss) =c yes)
  (y0 = x0)
)
51II. Transfer and Generation
[Figure: Vfin over Vc on each side]
yendi -> won
{Vc,2}
Vc::Vc : [V] -> [V]
(
  (x1::y1)
  Analysis:
  (x0 = x1)
  Constraints:
  ((x1 lex) =c (or "yen" ...))
  ((x0 casev) <= Acc)
  ((x0 trans) <= yes)
  Transfer:
  ((y1 TENSE) = (x1 TENSE))
  ((y1 AGR-PERSON) = (x1 AGR-PERSON))
  ((y1 AGR-NUMBER) = (x1 AGR-NUMBER))
  ((y1 POLARITY) = (x1 POLARITY))
  Generation:
  (y0 = y1)
)
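The constraint, transfer, and generation parts of the verb rule can be imitated with plain dictionaries as feature structures. The Python sketch below is not the Avenue engine and does not use its API. The function name, the lexicon, and the feature values are hypothetical; only the copied feature names (TENSE, AGR-PERSON, AGR-NUMBER, POLARITY) come from the slide.

```python
# Hypothetical root-level lexicon for the rule's lexical constraint.
VERB_LEXICON = {"yen": "win"}

COPIED_FEATURES = ("TENSE", "AGR-PERSON", "AGR-NUMBER", "POLARITY")


def apply_vc_rule(x1, lexicon):
    """Apply a toy version of the Vc rule to a source verb feature structure x1."""
    # Constraint: the rule only fires for verbs its lexicon covers.
    if x1.get("lex") not in lexicon:
        return None
    # Transfer: translate the lexical item, then copy the listed features to y1.
    y1 = {"lex": lexicon[x1["lex"]]}
    for feature in COPIED_FEATURES:
        if feature in x1:
            y1[feature] = x1[feature]
    # Generation: the English Vc node takes over the features of its child y1.
    y0 = dict(y1)
    return y0


source_verb = {
    "lex": "yen",
    "TENSE": "Past",
    "AGR-PERSON": "3",
    "AGR-NUMBER": "Sg",
    "POLARITY": "Pos",
}
print(apply_vc_rule(source_verb, VERB_LEXICON))
```

A separate English morphological generator would then be needed to turn the lemma "win" with TENSE Past into the surface form "won".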
52III. Decoding
The transfer engine outputs n translations T1, ..., Tn.
We use an English language model to calculate the probability of each translation, and pick the one with the highest language model score.
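The selection step can be sketched with a tiny bigram language model in Python. This is an assumption-laden illustration: the training corpus, the candidate translations, and the add-one smoothing are all made up for the example, and the slides do not say which language model or smoothing the system actually uses.

```python
import math
from collections import Counter


def train_bigram_model(corpus):
    """Count bigrams and their left contexts over a list of sentences."""
    contexts, bigrams = Counter(), Counter()
    vocabulary = {"</s>"}
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocabulary.update(tokens[1:])
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocabulary)


def log_probability(sentence, model):
    """Log-probability of a sentence under the add-one smoothed bigram model."""
    contexts, bigrams, vocab_size = model
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for previous, current in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(previous, current)] + 1) / (contexts[previous] + vocab_size))
    return total


def pick_best(candidates, model):
    """Return the candidate translation with the highest language model score."""
    return max(candidates, key=lambda candidate: log_probability(candidate, model))


# Toy English corpus and two hypothetical outputs of a transfer engine.
CORPUS = [
    "the man went to the house",
    "the man is at the house",
    "he saw his son at the house",
]
CANDIDATES = [
    "man the won son his house the at",
    "the man won his son at the house",
]

model = train_bigram_model(CORPUS)
print(pick_best(CANDIDATES, model))
```

Both candidates contain the same words, so the language model prefers the second one purely because of word order.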
53III. Decoding
54III. Decoding
56Evaluation
57MT Evaluation
- Manual evaluation
- SSER (subjective sentence error rate)
- Correct/Incorrect
- Manual evaluations require human effort and time
- Automatic evaluation
- WER (word error rate)
- BLEU (Bilingual Evaluation Understudy)
- METEOR
58Automatic Evaluation
- Word Error Rate (WER)
- Number of insertions, deletions, and substitutions required to transform the reference translation into the system translation
- BLEU
- Number of common n-grams of words between the system translation S and a set of reference translations
- METEOR
- Similar to BLEU, considers roots and synonyms
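The two automatic metrics described above can be sketched in Python. Two assumptions are made here: WER is normalized by the reference length, which is the common convention but is not stated on the slide, and only the clipped n-gram precision component of BLEU is shown, without the geometric mean over n-gram orders or the brevity penalty of the full metric. The function names are illustrative.

```python
from collections import Counter


def edit_distance(reference, hypothesis):
    """Word-level Levenshtein distance between two token lists."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def word_error_rate(reference, hypothesis):
    """Edit distance divided by the number of reference words."""
    ref_tokens, hyp_tokens = reference.split(), hypothesis.split()
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(zip(*(tokens[i:] for i in range(n))))


def clipped_ngram_precision(references, hypothesis, n):
    """Share of hypothesis n-grams found in a reference, clipped by reference counts."""
    hyp_counts = ngram_counts(hypothesis.split(), n)
    max_ref_counts = Counter()
    for reference in references:
        for ngram, count in ngram_counts(reference.split(), n).items():
            max_ref_counts[ngram] = max(max_ref_counts[ngram], count)
    matched = sum(min(count, max_ref_counts[ngram]) for ngram, count in hyp_counts.items())
    total = sum(hyp_counts.values())
    return matched / total if total else 0.0


reference = "at the shopping world"
translation = "in the shopping world"
print(word_error_rate(reference, translation))
print(clipped_ngram_precision([reference], translation, 1))
print(clipped_ngram_precision([reference], translation, 2))
```

The reference and translation strings are one of the noun-phrase pairs from the Examples slides; a single substituted word costs one edit in WER and removes one unigram and one bigram match.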
59Experimental Results
- System contains over 200 transfer rules, and 20000 lexical rules
- It can parse and translate challenging sentences
- Translations are sound, but not complete
- We tested the system on 192 noun phrases, and 70 sentences.
- BLEU score for noun phrases: 60.38
- BLEU score for sentences: 33.17
60Examples
- Noun phrase: siyahlarla birlikte bir protesto yürüyüşünde
- Translation: in a protest walk with the blacks
- Reference: in a protest walk with the blacks
- Noun phrase: Elif'in arkasındaki kapıda
- Translation: at the door at the back of Elif
- Reference: on the door behind Elif
- Noun phrase: alışveriş dünyasında
- Translation: in the shopping world
- Reference: at the shopping world
61Examples
- Sentence: Bu tutku zamanla bana acı vermeye başladı
- Translation: This passion began to give pain to me with time
- Reference: In time this passion began to give me pain
- Sentence: Perşembe uzun yürüyüşler ve ziyaretler yapıyorum
- Translation: I am doing long walks and visits on Thursday
- Reference: On Thursdays I take long walks and make visits
- Sentence: Kaçtıkça daha büyüdü, bir tutku oldu
- Translation: It grew more as escaping, it became a passion
- Reference: He grew as he ran away, became an obsession
62Conclusions & Future Work
- A hybrid machine translation system from Turkish to English
- wide linguistic coverage by manually-crafted transfer rules in Avenue
- ambiguities handled by English language model
- computationally inefficient translation
- time-consuming development
- Future work
- further improvement of transfer rules
- learning rules automatically from parallel corpus
63Thank you!