A Hybrid Machine Translation System from Turkish to English

1
A Hybrid Machine Translation System from Turkish
to English
  • Ferhan Türe
  • MSc Thesis, Sabanci University
  • Advisor Kemal Oflazer

2
Introduction
  • Goal: Create a machine translation system that
    translates Turkish text into English text
  • Turkish has agglutinative morphology
  • evimdekine → to the one at my home
  • Turkish has free word order
  • Ben eve gittim, Eve gittim ben, Gittim ben eve,
    ... → I went to the house
  • Idea: Write rules to translate an analyzed Turkish
    sentence into English

3
Outline
  • Machine Translation (MT)
  • Motivation
  • Challenges in MT
  • History of MT
  • Classical Approaches to MT
  • The Hybrid Approach
  • Challenges
  • Translation Steps
  • Analysis and Preprocessing
  • Transfer and Generation
  • Decoding
  • Evaluation
  • Methods
  • Experimental Results
  • Examples
  • Conclusions

4
Machine Translation
  • Translation
  • Given: Input text s in source language S
  • Find: A well-formed text in target language T
    that is equivalent to s
  • Machine Translation (MT)
  • Any system using an electronic computer to
    perform translation

5
Motivation
  • Satisfy increasing demand for translation
  • 100 languages with 5 million or more native
    speakers
  • Reduce the cost and effort of human translation
  • 13% of the EU budget
  • weeks vs. minutes
  • Make information available to more people in less
    time
  • translation of web sites automatically
  • Exploring the limits of computers' abilities and
    linguistic challenges

6
Challenges in MT
  • Morphological issues
  • Each language has a different morphology
  • Syntactical issues
  • Word order in sentences and noun phrases
  • Language-specific features (narrative past tense
    in Turkish, distinguishing feminine and masculine
    nouns)
  • Semantical issues
  • Word sense ambiguities
  • bank → geographical term OR financial
    institution?
  • Idiomatic phrases
  • kafa çekmek → pull head OR drink alcohol?

7
History of MT
  • Idea by Warren Weaver in 1945
  • 1950s: Russian-English MT research during the Cold
    War between the US and the USSR
  • 1960s: Funding for research stopped due to
    failure
  • Mid-1970s:
  • MÉTÉO: English-French MT in Canada
  • Systran and Eurotra: Multilingual MT in Europe
  • TITRAN and the Mu Project at Kyoto University, Japan
  • After the 90s:
  • Statistical MT: Use statistics and large amounts
    of data

8
MT between English and Turkish
  • Morphological analyzer
  • Oflazer, 1993.
  • Morphological disambiguator
  • Oflazer & Kuruöz, 1994.
  • Hakkani-Tür et al., 2000.
  • Yuret & Türe, 2006.
  • English-to-Turkish MT
  • Sagay, 1981.
  • Hakkani et al., 1998.
  • Keyder Turhan, 1997.
  • No Turkish-to-English system

9
Classical Approaches to MT

10
Vauquois Triangle
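[Figure: Vauquois triangle. Analysis rises on the source side and Generation descends on the target side, with Transfer crossing between them. Levels from bottom to top: Lexical level, Syntactic level, Semantic level, with Interlingua at the apex.]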
11
Word-by-word Translation
  • Pipeline: Source sentence → Bilingual Dictionary → Target sentence
  • Source sentence: Ali evdeki kediyi çok sevmez
  • Translation: Ali home cat very like
  • Reference: Ali does not like the cat at home very much
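The word-by-word scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the system from the thesis: the dictionary entries are assumptions chosen to reproduce the slide's output, and `word_by_word` is a hypothetical helper name.

```python
# Toy bilingual dictionary keyed on surface forms (illustrative entries only).
BILINGUAL_DICT = {
    "Ali": "Ali",
    "evdeki": "home",
    "kediyi": "cat",
    "çok": "very",
    "sevmez": "like",
}


def word_by_word(sentence, dictionary):
    """Replace each source word with its dictionary entry, keeping source order.

    Unknown words are passed through unchanged.
    """
    return " ".join(dictionary.get(word, word) for word in sentence.split())


print(word_by_word("Ali evdeki kediyi çok sevmez", BILINGUAL_DICT))
```

Because nothing is analyzed or reordered, the negation and the locative in the source are lost, which is the weakness the slide points out.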
12
Direct Translation
  • Pipeline: Source sentence → Morphological Analyzer → Lexical Transfer → Local Reordering → Target sentence
  • Source: Ali evde -ki kediyi çok sevmez
  • Analysis: Ali ev+Loc +RelAdj kedi+Acc çok+Adv sev+Neg+Present
  • Lexical: Ali home+Loc at+Adj cat+Acc very much+Adv like+Neg+Present
  • Reorder: Ali at+Adj home+Loc cat+Acc like+Neg+Present very much+Adv
  • Generate: Ali at home cat not like very much
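The stages of direct translation can be traced with a small sketch. Everything here is an assumption made for illustration: the lexicon, the two local swap rules, and the function names are hypothetical, and the analysis is typed in by hand rather than produced by a morphological analyzer.

```python
# Hand-written analysis of "Ali evde -ki kediyi çok sevmez" as (root, tags) pairs.
ANALYSIS = [
    ("Ali", []),
    ("ev", ["Loc"]),
    ("-ki", ["RelAdj"]),
    ("kedi", ["Acc"]),
    ("çok", ["Adv"]),
    ("sev", ["Neg", "Present"]),
]

# Illustrative root-level lexicon.
LEXICON = {"Ali": "Ali", "ev": "home", "-ki": "at",
           "kedi": "cat", "çok": "very much", "sev": "like"}


def lexical_transfer(analysis):
    """Translate each root and rename RelAdj to Adj; other tags are kept."""
    return [(LEXICON[root], ["Adj" if t == "RelAdj" else t for t in tags])
            for root, tags in analysis]


def local_reorder(units):
    """Swap adjacent pairs: Loc noun + Adj, and Adv + present-tense verb."""
    units = list(units)
    i = 0
    while i < len(units) - 1:
        first, second = units[i][1], units[i + 1][1]
        if ("Loc" in first and "Adj" in second) or \
           ("Adv" in first and "Present" in second):
            units[i], units[i + 1] = units[i + 1], units[i]
            i += 2  # skip the pair that was just swapped
        else:
            i += 1
    return units


def generate(units):
    """Drop the tags, realizing Neg as a preceding 'not'."""
    return " ".join("not " + w if "Neg" in tags else w for w, tags in units)


print(generate(local_reorder(lexical_transfer(ANALYSIS))))
```

The output is still not fluent English: local swaps cannot build "does not like the cat at home", which is why the later slides move to structural transfer.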
13
Transfer-based Translation
  • Source sentence → SL Representation, using the SL Grammar
  • SL Representation → TL Representation, using Transfer rules / Dictionary
  • TL Representation → Target sentence, using the TL Grammar
14
Transfer-based Translation
  • Example: mavi evin duvarı → the wall of the blue house
  • [Figure: SL and TL parse trees for the example. Turkish leaves: A mavi, N evin, N duvarı, grouped under AP and NP nodes. English leaves: Det the, N wall, Prep of, Det the, A blue, N house, grouped under NP, PP, and AP nodes.]
15
Interlingual Translation
  • Pipeline: Source sentence → Analysis → Interlingua → Generation → Target sentence
  • Source: Ali evdeki kediyi çok sevmez
  • Interlingua: holds(in_general, like(subj: Ali, obj: cat(at home), degree: very much))
  • Translation: Ali does not like the cat at home very much
16
Statistical MT
Given a Turkish sentence t, find the English
sentence e that is the most likely translation
of t
17
Statistical MT
  • Language Model $P(e)$: trained on English text; measures whether an English text e is well-formed English or not
  • Translation Model $P(t \mid e)$: trained on Turkish-English aligned text; measures whether an English text e is a good translation of a Turkish text t
  • Decoding: $\hat{e} = \arg\max_{e} P(e)\, P(t \mid e)$
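The decoding rule can be illustrated with a toy search over a fixed candidate list. This is only a sketch: the probability tables below are invented numbers, not estimates from any corpus, and a real decoder searches a far larger hypothesis space. `decode`, `LM`, and `TM` are hypothetical names.

```python
import math

T = "Ali çok açtı"  # Turkish input

# Invented language model P(e) and translation model P(t|e) for three candidates.
LM = {
    "Ali was so hungry": 1e-6,
    "Ali so hungry was": 1e-9,
    "Ali opened a lot": 1e-6,
}
TM = {
    (T, "Ali was so hungry"): 0.30,
    (T, "Ali so hungry was"): 0.30,
    (T, "Ali opened a lot"): 0.05,
}


def decode(t, candidates, language_model, translation_model):
    """Return the candidate e maximizing P(e) * P(t|e), computed in log space."""
    def score(e):
        return math.log(language_model[e]) + math.log(translation_model[(t, e)])
    return max(candidates, key=score)


print(decode(T, list(LM), LM, TM))
```

The language model rules out the scrambled word order, while the translation model penalizes the fluent but unlikely alternative; neither factor alone is enough.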
18
Statistical MT
Ali çok açtı → Ali was so hungry
20
The Hybrid Approach

21
Why Hybrid?
  • Classical transfer-based approaches are good at
    representing the structural differences between
    the source and target languages,
  • and statistical methods are good at extracting
    knowledge from large amounts of data about how
    well-formed a sentence is or how meaningful a
    translation is.

22
Challenges
Morphological differences
  • Avrupalılaştıramadıklarımızdanmışsınız → You were
    among the ones who we were not able to cause to
    become European
  • Extreme case of a word in an agglutinative
    language
  • Each Turkish morpheme corresponds to one or more
    words in English

23
Challenges
Morphological differences
  • arkadaşımdakiler → the ones at my friend
24
Challenges
Structural differences
  • dinlemişsin → (someone told me that) you listened
  • dinledin → you listened
  • dinlettin → you made (someone) listen
  • dinlettirdin → you had (someone) make (someone) listen
  • dinlerim → I listen
  • dinlerdim → I used to listen
  • dinletebilirmişim → ???
25
Challenges
Structural differences
  • Adam evde kitap okuyordu → The man was reading a book at home
    (SUBJ ADJCT OBJ V → SUBJ V OBJ ADJCT)
  • mavi kitap → blue book
    (AP NP → AP NP)
  • evdeki kitap → the book at home
    (AP NP → NP AP)
  • kitabımın kapağı → my book's cover
    (NP1 NP2 → NP1 NP2)
  • arkadaşımın yüzünden → because of my friend
    (NP1 NP2 → NP2 NP1)
26
Challenges
Ambiguities
  • koyun
  • sheep (or bosom)
  • your bay
  • your dark (one)
  • of the bay
  • put!

27
Challenges
Ambiguities
  • silahını evine koy
  • put your gun to your home
  • put your gun to his home
  • put his gun to your home
  • put his gun to his home
  • put your gun to her home
  • put her gun to your home
  • put her gun to her home
  • ...
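The way such readings multiply can be made concrete by enumerating the combinations. This sketch assumes, for illustration only, that each of the two possessed nouns can independently take one of the three possessors shown on the slide; `POSSESSORS` and `readings` are hypothetical names.

```python
from itertools import product

# Possessors appearing in the slide's readings.
POSSESSORS = ["your", "his", "her"]


def readings():
    """Every pairing of a possessor for 'gun' with a possessor for 'home'."""
    return [f"put {gun} gun to {home} home"
            for gun, home in product(POSSESSORS, POSSESSORS)]


for reading in readings():
    print(reading)
```

Two ambiguous words already give nine candidate translations, so the transfer stage has to pass many alternatives on for later ranking.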

28
Challenges
Ambiguities
  • kitabın kapağı
  • the book's cover
  • book's cover
  • the cover of the book

29
Challenges
Ambiguities
  • ev+Dative (gitti) → (went) to the house
  • masa+Dative (çıktı) → (jumped) on the table
  • adam+Dative (baktı) → (looked) at the man
30
Challenges
  • Morphological differences: Use morphological analysis on the Turkish side and generation on the English side
  • Structural differences: Transfer rules can represent such transformations
  • Ambiguities: An English language model can determine the most probable translation statistically
31
The Avenue Transfer System
  • Avenue Project, initiated by the CMU LTI group
  • Grammar formalism, which allows one to manually
    create a parallel grammar between two languages,
    and
  • Transfer engine, which transfers the source
    sentence into possible target sentence(s) using
    this parallel grammar

32
Overview of Our Approach
  • Turkish sentence → Morphological Analyzer → Analysis
  • Analysis → Preprocessor → Lattice
  • Lattice + Transfer rules → Avenue Transfer Engine → English translations ...
  • English translations → English Language Model → Most probable English translation
33
I. Analysis and Preprocessing
  • Morphological analyses of each word
  • A set of features describing the structural properties of the word
  • Example sentence: adam evde oğlunu yendi
34
I. Analysis and Preprocessing
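Lattice representation of the sentence
  • Nodes: 0 to 6
  • Arc labels for adam: ada+N+P1Sg, adam+N+PNon
  • Arc label for evde: ev+N+Loc
  • Arc labels for oğlunu: oğul+N+P2Sg, oğul+N+P3Sg
  • Arc labels for yendi: ye+V followed by +Pass+V+Past; yen+N followed by +Zero+V+Past; yen+V+Past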
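A lattice like this can be stored as a list of labeled arcs and expanded into complete analyses by walking from the first node to the last. The sketch below is an assumption-heavy reconstruction: the transcript preserves only the node numbers and arc labels, so the start and end node of each arc are guessed here, and `ARCS` and `paths` are hypothetical names.

```python
from collections import defaultdict

# (from_node, to_node, label); the endpoints are assumed, not taken from the slide.
ARCS = [
    (0, 1, "ada+N+P1Sg"), (0, 1, "adam+N+PNon"),
    (1, 2, "ev+N+Loc"),
    (2, 3, "oğul+N+P2Sg"), (2, 3, "oğul+N+P3Sg"),
    (3, 4, "ye+V"), (4, 6, "+Pass+V+Past"),
    (3, 5, "yen+N"), (5, 6, "+Zero+V+Past"),
    (3, 6, "yen+V+Past"),
]


def paths(arcs, start, end):
    """List every label sequence leading from start to end in an acyclic lattice."""
    outgoing = defaultdict(list)
    for src, dst, label in arcs:
        outgoing[src].append((dst, label))

    def walk(node):
        if node == end:
            yield []
            return
        for nxt, label in outgoing[node]:
            for rest in walk(nxt):
                yield [label] + rest

    return list(walk(start))


for path in paths(ARCS, 0, 6):
    print(" ".join(path))
```

Under these assumptions the four-word sentence expands to 2 x 1 x 2 x 3 = 12 analyses, which shows why a compact lattice is preferable to listing each reading separately.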
35
I. Analysis and Preprocessing
Representation of inflectional groups (IGs)
36-49
II. Transfer and Generation
[Figure sequence, slides 38-49: parallel Turkish and English trees for "adam evde oğlunu yendi", built bottom-up. The words are tagged N or V and paired with English leaves in the order man, won, son, house. An NP labeled SUBJ is built first, adding "the"; then an NP labeled Adjct, adding "at" and "the"; then an NP labeled OBJ, adding "his". The verb is wrapped in Vc and then Vfin, and all constituents are joined under S on both sides.]
II. Transfer and Generation
[Figure: Adjunct over NP on the Turkish side; Adjunct over "at" NP on the English side]
{Adjunct,3}
Adjunct::Adjunct : [NP] -> ["at" NP]
(
  (x1::y2)
  (x0 = x1)
  ((x1 CASE) =c Loc)
  ((x1 poss) =c yes)
  (y0 = x0)
)
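To show roughly what such a rule does, here is a Python analogue of the Adjunct rule: check the constraints on the source NP, copy its features upward, and emit "at" followed by the translated NP. It is a loose sketch and not the Avenue engine: feature structures are plain dictionaries, and `apply_adjunct_rule` and `toy_translate_np` are hypothetical helpers.

```python
def toy_translate_np(np):
    """Stand-in for recursive NP transfer; returns a fixed English NP."""
    return ["the", "house"]


def apply_adjunct_rule(x1, translate_np):
    """Apply Adjunct -> NP  =>  Adjunct -> "at" NP, or return None if blocked.

    x1 is the feature dictionary of the source-side NP.
    """
    # Constraint equations: both values must already be present and match.
    if x1.get("CASE") != "Loc" or x1.get("poss") != "yes":
        return None
    x0 = dict(x1)  # (x0 = x1): the source Adjunct takes the NP's features
    y0 = dict(x0)  # (y0 = x0): the target Adjunct takes the source features
    # (x1::y2): the source NP aligns with the second target constituent.
    return {"cat": "Adjunct", "features": y0, "words": ["at"] + translate_np(x1)}


np_evde = {"lex": "ev", "CASE": "Loc", "poss": "yes"}
print(apply_adjunct_rule(np_evde, toy_translate_np)["words"])
```

An NP that fails either constraint returns None, so in this sketch the rule simply does not fire, leaving other rules to handle that constituent.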
51
II. Transfer and Generation
[Figure: Vfin over Vc on both the Turkish and English sides; yendi -> won]
{Vc,2}
Vc::Vc : [V] -> [V]
(
  (x1::y1)
  ; Analysis
  (x0 = x1)
  ; Constraints
  ((x1 lex) =c (or "yen" ...))
  ((x0 casev) <= Acc)
  ((x0 trans) <= yes)
  ; Transfer
  ((y1 TENSE) = (x1 TENSE))
  ((y1 AGR-PERSON) = (x1 AGR-PERSON))
  ((y1 AGR-NUMBER) = (x1 AGR-NUMBER))
  ((y1 POLARITY) = (x1 POLARITY))
  ; Generation
  (y0 = y1)
)
52
III. Decoding
  • The transfer engine outputs n translations T1, ..., Tn
  • We use an English language model to calculate the
    probability of each translation, and pick the one
    with the highest language model score
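The selection step can be sketched with a tiny bigram language model. The three training sentences, the add-one smoothing, and the candidate list are all assumptions made for illustration; the slides do not say which language model or smoothing method the system used.

```python
import math
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their left contexts over sentences padded with <s>, </s>."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def log_prob(sentence, lm):
    """Add-one smoothed bigram log-probability of a sentence."""
    bigrams, contexts, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(math.log((bigrams[(p, c)] + 1) / (contexts[p] + vocab_size))
               for p, c in zip(tokens, tokens[1:]))


def pick_best(translations, lm):
    """Return the translation with the highest language model score."""
    return max(translations, key=lambda t: log_prob(t, lm))


# Toy training text and candidate outputs (illustrative only).
LM = train_bigram_lm(["the man won the game",
                      "he met his son at the house",
                      "his son won"])
CANDIDATES = ["the man his son at the house won",
              "the man won his son at the house"]
print(pick_best(CANDIDATES, LM))
```

Both candidates contain the same words, so only word order separates them; the model prefers the order whose bigrams it has seen in English text.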
53
III. Decoding
54
III. Decoding
55
Outline
  • Machine Translation (MT)
  • Motivation
  • Challenges in MT
  • History of MT
  • Classical Approaches to MT
  • The Hybrid Approach
  • Challenges
  • Translation Steps
  • Analysis and Preprocessing
  • Transfer and Generation
  • Decoding
  • Evaluation
  • Methods
  • Experimental Results
  • Examples
  • Conclusions

56
Evaluation
57
MT Evaluation
  • Manual evaluation
  • SSER (subjective sentence error rate)
  • Correct/Incorrect
  • Manual evaluations require human effort and time
  • Automatic evaluation
  • WER (word error rate)
  • BLEU (Bilingual Evaluation Understudy)
  • METEOR

58
Automatic Evaluation
  • Word Error Rate (WER)
  • Number of insertions, deletions, and
    substitutions required to transform the reference
    translation into the system translation
  • BLEU
  • Number of common n-grams of words between the
    system translation S and a set of reference
    translations
  • METEOR
  • Similar to BLEU, considers roots and synonyms
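The two simplest metrics can be written down directly. The sketch below computes WER against a single reference and a clipped n-gram precision, which is only one ingredient of BLEU: full BLEU also combines several n-gram orders and applies a brevity penalty. Function names are hypothetical.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # delete a reference word
                           cur[j - 1] + 1,           # insert a system word
                           prev[j - 1] + (r != h)))  # substitute or match
        prev = cur
    return prev[-1]


def wer(reference, hypothesis):
    """Edit distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)


def ngram_precision(reference, hypothesis, n):
    """Fraction of system n-grams also found in the reference, with clipped counts."""
    ref, hyp = reference.split(), hypothesis.split()
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
    total = sum(hyp_counts.values())
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total if total else 0.0


reference = "at the shopping world"
translation = "in the shopping world"
print(wer(reference, translation))
print(ngram_precision(reference, translation, 1))
print(ngram_precision(reference, translation, 2))
```

For this pair, one substitution over four reference words gives a WER of 0.25, while three of four unigrams and two of three bigrams match.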

59
Experimental Results
  • The system contains over 200 transfer rules and
    20,000 lexical rules
  • It can parse and translate challenging sentences
  • Translations are sound, but not complete
  • We tested the system on 192 noun phrases and 70
    sentences.
  • BLEU score for noun phrases: 60.38
  • BLEU score for sentences: 33.17

60
Examples
  • Noun phrase: siyahlarla birlikte bir protesto
    yürüyüşünde
  • Translation: in a protest walk with the blacks
  • Reference: in a protest walk with the blacks
  • Noun phrase: Elif'in arkasındaki kapıda
  • Translation: at the door at the back of Elif
  • Reference: on the door behind Elif
  • Noun phrase: alışveriş dünyasında
  • Translation: in the shopping world
  • Reference: at the shopping world

61
Examples
  • Sentence: Bu tutku zamanla bana acı vermeye
    başladı
  • Translation: This passion began to give pain to
    me with time
  • Reference: In time this passion began to give me
    pain
  • Sentence: Perşembe uzun yürüyüşler ve ziyaretler
    yapıyorum
  • Translation: I am doing long walks and visits on
    Thursday
  • Reference: On Thursdays I take long walks and
    make visits
  • Sentence: Kaçtıkça daha büyüdü, bir tutku oldu
  • Translation: It grew more as escaping, it became
    a passion
  • Reference: He grew as he ran away, became an
    obsession

62
Conclusions & Future Work
  • A hybrid machine translation system from Turkish
    to English
  • wide linguistic coverage by manually-crafted
    transfer rules in Avenue
  • ambiguities handled by English language model
  • computationally inefficient translation
  • time-consuming development
  • Future work
  • further improvement of transfer rules
  • learning rules automatically from parallel corpus

63
Thank you!