Machine Translation Challenges and Language Divergences

Alon Lavie, Language Technologies Institute, Carnegie Mellon University. 11-731: Machine Translation


1
Machine Translation: Challenges and Language Divergences
  • Alon Lavie
  • Language Technologies Institute
  • Carnegie Mellon University
  • 11-731: Machine Translation
  • January 12, 2011

2
Major Sources of Translation Problems
  • Lexical Differences
  • Multiple possible translations for an SL word, or
    difficulty expressing the meaning of an SL word in
    a single TL word
  • Structural Differences
  • Syntax of SL differs from syntax of TL:
    word order, sentence and constituent structure
  • Differences in Mappings of Syntax to Semantics
  • Meaning in TL is conveyed using a different
    syntactic structure than in the SL
  • Idioms and Constructions

3
Lexical Differences
  • SL word has several different meanings that
    translate differently into TL
  • Ex: financial bank vs. river bank
  • Lexical Gaps: SL word reflects a unique meaning
    that cannot be expressed by a single word in TL
  • Ex: English snub doesn't have a corresponding
    verb in French or German
  • TL has finer distinctions than SL → SL word
    should be translated differently in different
    contexts
  • Ex: English wall can be German Wand (internal) or
    Mauer (external)

4
Google at Work
7
Lexical Differences
  • Lexical gaps
  • Examples: these have no direct equivalent in
    English
  • gratiner (v., French, to cook with a cheese
    coating)
  • otosanrin (n., Japanese, three-wheeled truck or
    van)

8
Lexical Differences
From Hutchins & Somers
9
MT Handling of Lexical Differences
  • Direct MT and Syntactic Transfer
  • Lexical Transfer stage uses bilingual lexicon
  • SL word can have multiple translation entries,
    possibly augmented with disambiguation features
    or probabilities
  • Lexical Transfer can involve use of limited
    context (on SL side, TL side, or both)
  • Lexical Gaps can partly be addressed via phrasal
    lexicons
  • Semantic Transfer
  • Ambiguity of SL word must be resolved during
    analysis → correct symbolic representation at
    semantic level
  • TL Generation must select appropriate word or
    structure for correctly conveying the concept in
    TL
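
A minimal sketch of the lexical transfer stage described on this slide, assuming a toy English-to-German lexicon. The probabilities, the cue words, and the function name choose_translation are made up for illustration and do not come from the lecture or from any real MT system.

```python
# Hypothetical toy bilingual lexicon: each SL word has several TL entries,
# each with an invented prior probability and a set of SL-side context cues.
LEXICON = {
    "wall": [
        {"tl": "Wand", "prob": 0.5, "cues": {"room", "kitchen", "inside", "picture"}},
        {"tl": "Mauer", "prob": 0.5, "cues": {"garden", "city", "outside", "brick"}},
    ],
    "bank": [
        {"tl": "Bank", "prob": 0.7, "cues": {"money", "account", "loan"}},
        {"tl": "Ufer", "prob": 0.3, "cues": {"river", "water", "fishing"}},
    ],
}


def choose_translation(word, context, lexicon=LEXICON, cue_bonus=1.0):
    """Pick the TL entry with the best score.

    score = prior probability + cue_bonus * (number of cue words found
    in the limited SL-side context). Unknown words are passed through.
    """
    entries = lexicon.get(word)
    if not entries:
        return word
    context_words = {w.lower() for w in context}

    def score(entry):
        return entry["prob"] + cue_bonus * len(entry["cues"] & context_words)

    return max(entries, key=score)["tl"]


print(choose_translation("wall", "the picture on the wall".split()))
print(choose_translation("bank", "we sat on the river bank".split()))
```

With no matching cue in the context, the entry with the highest prior wins, which is the "probabilities only" case; the cue sets play the role of the disambiguation features.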

10
Structural Differences
  • Syntax of SL differs from syntax of TL
  • Word order within constituents
  • English NPs: art adj n (the big boy)
  • Hebrew NPs: art n art adj (ha yeled ha gadol)
  • Constituent structure
  • English is SVO: Subj Verb Obj (I saw the man)
  • Modern Arabic is VSO: Verb Subj Obj
  • Different verb syntax
  • Verb complexes in English vs. in German
  • I can eat the apple: Ich kann den Apfel essen
  • Case marking and free constituent order
  • German and other languages that mark case
  • den Apfel esse ich: the(acc) apple eat I(nom)

14
MT Handling of Structural Differences
  • Direct MT Approaches
  • No explicit treatment: phrasal lexicons and
    sentence-level matches or templates
  • Syntactic Transfer
  • Structural Transfer Grammars
  • Trigger rule by matching against syntactic
    structure on SL side
  • Rule specifies how to reorder and re-structure
    the syntactic constituents to reflect syntax of
    TL side
  • Semantic Transfer
  • SL Semantic Representation abstracts away from SL
    syntax to functional roles → done during analysis
  • TL Generation maps semantic structures to correct
    TL syntax
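
The structural transfer rules above can be sketched as a lookup from an SL constituent pattern to a TL ordering. This is only an illustration under simplifying assumptions: constituents are flat lists of (tag, word) pairs, the rule table RULES and the function apply_transfer are hypothetical names, and the three-word lexicon holds just the Hebrew example from the Structural Differences slide.

```python
# Each rule maps (constituent label, SL tag pattern) to a TL order, given as
# indices into the matched SL constituents. Repeating an index copies a
# constituent, which is how the Hebrew NP gets its second article.
RULES = {
    ("NP", ("art", "adj", "n")): (0, 2, 0, 1),   # art adj n -> art n art adj
    ("S", ("subj", "verb", "obj")): (1, 0, 2),   # SVO -> VSO
}

# Toy lexical transfer table taken from the slide example only.
EN_TO_HE = {"the": "ha", "big": "gadol", "boy": "yeled"}


def apply_transfer(label, constituents, rules=RULES):
    """Reorder constituents if a rule matches the SL pattern; else copy as-is."""
    pattern = tuple(tag for tag, _ in constituents)
    order = rules.get((label, pattern))
    if order is None:
        return list(constituents)
    return [constituents[i] for i in order]


np_en = [("art", "the"), ("adj", "big"), ("n", "boy")]
np_he = apply_transfer("NP", np_en)
print(" ".join(EN_TO_HE[word] for _, word in np_he))  # ha yeled ha gadol
```

The rule fires on syntactic structure alone and never inspects the words, which is what separates these rules from the lexicalized rules needed for syntax-to-semantics differences.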

15
Syntax-to-Semantics Differences
  • Meaning in TL is conveyed using a different
    syntactic structure than in the SL
  • Changes in verb and its arguments
  • Passive constructions
  • Motion verbs and state verbs
  • Case creation and case absorption
  • Main Distinction from Structural Differences
  • Structural differences are mostly independent of
    lexical choices and their semantic meaning →
    addressed by transfer rules that are syntactic in
    nature
  • Syntax-to-semantics mapping differences are
    meaning-specific: they require the presence of
    specific words (and meanings) in the SL

16
Syntax-to-Semantics Differences
  • Structure-change example
  • I like swimming
  • Ich schwimme gern
  • (lit: I swim gladly)

18
Syntax-to-Semantics Differences
  • Verb-argument example
  • Jones likes the film.
  • Le film plaît à Jones.
  • (lit: the film pleases to Jones)
  • Use of case roles can eliminate the need for this
    type of transfer
  • Jones: Experiencer
  • film: Theme
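
One way to picture the case-role idea: both sentences share a single frame, and only the generation templates differ per language. The sketch below is an assumption-laden toy; the frame layout, the TEMPLATES and NP tables, and the generate function are all hypothetical and cover just this one example.

```python
# Language-neutral frame: a predicate plus case roles filled by concepts.
FRAME = {"predicate": "LIKE", "roles": {"experiencer": "JONES", "theme": "FILM"}}

# Hypothetical per-language generation templates for the predicate LIKE.
# English realizes the Experiencer as subject, French realizes the Theme
# as subject and the Experiencer as an indirect object.
TEMPLATES = {
    ("LIKE", "en"): "{experiencer} likes {theme}.",
    ("LIKE", "fr"): "{theme} plaît à {experiencer}.",
}

# Toy noun phrase realizations for each concept.
NP = {
    "JONES": {"en": "Jones", "fr": "Jones"},
    "FILM": {"en": "the film", "fr": "le film"},
}


def generate(frame, lang):
    """Fill the template for the frame's predicate with TL noun phrases."""
    template = TEMPLATES[(frame["predicate"], lang)]
    fillers = {role: NP[concept][lang] for role, concept in frame["roles"].items()}
    sentence = template.format(**fillers)
    return sentence[0].upper() + sentence[1:]


print(generate(FRAME, "en"))  # Jones likes the film.
print(generate(FRAME, "fr"))  # Le film plaît à Jones.
```

No English-to-French reordering rule is needed here, because the argument swap is absorbed by each language's own mapping from roles to syntax.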

21
Syntax-to-Semantics Differences
  • Passive Constructions
  • Example: French reflexive passives
  • Ces livres se lisent facilement
  • (lit: These books read themselves easily)
  • These books are easily read

24
Same intention, different syntax
  • rigly bitiwgacny
  • my leg hurts
  • candy wagac fE rigly
  • I have pain in my leg
  • rigly bitiClimny
  • my leg hurts
  • fE wagac fE rigly
  • there is pain in my leg
  • rigly bitinqaH calya
  • my leg bothers on me
  • Romanization of Arabic from CallHome Egypt.

25
MT Handling of Syntax-to-Semantics Differences
  • Direct MT Approaches
  • No explicit treatment: phrasal lexicons and
    sentence-level matches or templates
  • Syntactic Transfer
  • Lexicalized Structural Transfer Grammars
  • Trigger rule by matching against lexicalized
    syntactic structure on SL side: lexical and
    functional features
  • Rule specifies how to reorder and re-structure
    the syntactic constituents to reflect syntax of
    TL side
  • Semantic Transfer
  • SL Semantic Representation abstracts away from SL
    syntax to functional roles → done during analysis
  • TL Generation maps semantic structures to correct
    TL syntax
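
A lexicalized transfer rule differs from a purely structural one in that it only fires when specific words are present. The sketch below illustrates this with the "I like swimming" example from the structure-change slide. The tiny tables and the function transfer_like_gerund are hypothetical, and the rule is deliberately restricted to three-token inputs.

```python
# Toy English-to-German tables, only for this example.
# German verb forms are first person singular.
VERBS_1SG = {"swimming": "schwimme", "reading": "lese"}
PRONOUNS = {"I": "Ich"}


def transfer_like_gerund(tokens):
    """Lexicalized rule: [subj, 'like', gerund] -> [subj, verb, 'gern'].

    The rule is triggered by the lexical item 'like' together with a gerund
    complement. It returns None when the pattern does not match, so that a
    real system could fall back to other rules.
    """
    if len(tokens) != 3:
        return None
    subj, verb, complement = tokens
    if verb != "like" or subj not in PRONOUNS or complement not in VERBS_1SG:
        return None
    # The SL main verb 'like' becomes the TL adverb 'gern'; the SL gerund
    # becomes the TL main verb.
    return [PRONOUNS[subj], VERBS_1SG[complement], "gern"]


print(transfer_like_gerund(["I", "like", "swimming"]))
```

Replacing "like" with another verb makes the rule return None, which shows that the trigger depends on the lexical item and not just on the syntactic shape.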

26
Idioms and Constructions
  • Main Distinction: meaning of the whole is not
    directly compositional from the meaning of its
    sub-parts → no compositional translation
  • Examples
  • George is a bull in a china shop
  • He kicked the bucket
  • Can you please open the window?

28
Formulaic Utterances
  • Good night.
  • tisbaH cala xEr
  • (lit: waking up on good)
  • Romanization of Arabic from CallHome Egypt

29
Constructions
  • Identifying speaker intention rather than literal
    meaning for formulaic and task-oriented
    sentences.
  • How about ...: suggestion
  • Why don't you ...: suggestion
  • Could you tell me ...: request info.
  • I was wondering ...: request info.
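
The four constructions listed above could be recognized with simple surface patterns that map an utterance to a speaker intention. This is a minimal sketch, assuming regular-expression matching on the start of the utterance; the intention labels and the function speaker_intention are hypothetical names chosen for illustration.

```python
import re

# (pattern, intention) pairs for the constructions on this slide.
# The apostrophe in "don't" is optional to tolerate sloppy transcripts.
PATTERNS = [
    (re.compile(r"^how about\b", re.IGNORECASE), "suggestion"),
    (re.compile(r"^why don'?t you\b", re.IGNORECASE), "suggestion"),
    (re.compile(r"^could you tell me\b", re.IGNORECASE), "request-info"),
    (re.compile(r"^i was wondering\b", re.IGNORECASE), "request-info"),
]


def speaker_intention(utterance):
    """Return the intention label of the first matching construction, or None."""
    text = utterance.strip()
    for pattern, intention in PATTERNS:
        if pattern.search(text):
            return intention
    return None


print(speaker_intention("How about lunch on Friday?"))
print(speaker_intention("I was wondering when the train leaves."))
```

Once the intention is identified, the TL side can generate its own conventional construction for that intention instead of translating the SL words literally.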

31
MT Handling of Constructions and Idioms
  • Direct MT Approaches
  • No explicit treatment: phrasal lexicons and
    sentence-level matches or templates
  • Syntactic Transfer
  • No effective treatment
  • Highly Lexicalized Structural Transfer rules
    can handle some constructions
  • Trigger rule by matching against entire
    construction, including structure on SL side
  • Rule specifies how to generate the correct
    construction on the TL side
  • Semantic Transfer
  • Analysis must capture non-compositional
    representation of the idiom or construction →
    specialized rules
  • TL Generation maps construction semantic
    structures to correct TL syntax and lexical words
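
As an illustration of capturing an idiom as a single non-compositional unit during analysis, the sketch below does a longest-match lookup against a small idiom table and replaces the matched span with one concept symbol. The table contents, the concept labels, and the function mark_idioms are assumptions made for this example.

```python
# Hypothetical idiom table: token sequence -> non-compositional concept.
IDIOMS = {
    ("kicked", "the", "bucket"): "DIED",
    ("a", "bull", "in", "a", "china", "shop"): "VERY-CLUMSY",
}
MAX_LEN = max(len(key) for key in IDIOMS)


def mark_idioms(tokens):
    """Replace each idiom span with its concept symbol (longest match first)."""
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest possible span starting at position i first.
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            key = tuple(t.lower() for t in tokens[i:i + n])
            if key in IDIOMS:
                output.append(IDIOMS[key])
                i += n
                break
        else:
            # No idiom starts here: keep the token and move on.
            output.append(tokens[i])
            i += 1
    return output


print(mark_idioms("He kicked the bucket".split()))
print(mark_idioms("George is a bull in a china shop".split()))
```

A sentence such as "He kicked the ball" passes through unchanged, so ordinary compositional translation still applies whenever no idiom entry matches.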

32
Take Home Messages
  • Remember these types of language divergences as
    you learn about and apply the various steps in
    the MT system pipelines of different approaches!
  • Ask yourself how capable these various steps and
    approaches are in addressing these types of
    divergences!
  • Can the step/approach handle these divergences?
  • If so, does it model the divergence at the
    appropriate level of abstraction?
  • Keep these language divergences in mind when you
    analyze the errors of the MT system that you have
    put together and trained!
  • Are the errors attributable to a particular
    divergence?
  • What would be required for the system to address
    this type of error?

33
Summary
  • Main challenges for current state-of-the-art MT
    approaches: Coverage and Accuracy
  • Acquiring broad-coverage, high-accuracy
    translation lexicons (for words and phrases)
  • Learning syntactic mappings between languages
    from parallel word-aligned data
  • Overcoming syntax-to-semantics differences and
    dealing with constructions
  • Effective Target Language Modeling

34
Homework Assignment 1
35
Questions