Language Divergences and Solutions - PowerPoint PPT Presentation

Provided by: clarkm

1
Language Divergences and Solutions
  • Advanced Machine Translation Seminar
  • Alison Alvarez

2
Overview
  • Introduction
  • Morphology Primer
  • Translation Mismatches
    • Types
    • Solutions
  • Translation Divergences
    • Types
    • Solutions
  • Different MT Systems
  • Generation-Heavy Machine Translation
  • DUSTer

3
Source ≠ Target
  • Languages don't encode the same information in
    the same way
  • Makes MT complicated
  • Keeps all of us employed

4
Morphology in a Nutshell
  • Morphemes are word parts
    • Work + er
    • Iki + ta + ku + na + ku + na + ri + ma + shi + ta
  • Types of morphemes
    • Derivational: makes a new word
    • Inflectional: adds information to an existing word

5
Morphology in a Nutshell
  • Analytic/Isolating
    • Little or no inflectional morphology; separate words
    • Vietnamese, Chinese
    • I was made to go
  • Synthetic
    • Lots of inflectional morphology
    • Fusional vs. agglutinating
    • Romance languages, Finnish, Japanese, Mapudungun
    • Ika (to go) + se (to make/let) + rare (passive) + ta (past tense)
    • He need + s (3rd person singular) it.

6
Translation Differences
  • Types
    • Translation mismatches: different information from source to target
    • Translation divergences: same information from source to target, but
      the meaning is distributed differently in each language

7
Translation Mismatches
  • The information conveyed is different
    in the source and target languages
  • Types
  • Lexical level
  • Typological level

8
Lexical Mismatches
  • A lexical item in one language may have more
    distinctions than in another

  • English "brother" corresponds to two Japanese words:
    • otouto: younger brother
    • ani-san: older brother

9
Typological Mismatches
  • Mismatch between languages with different levels
    of grammaticalization
  • One language may be more structurally complex
  • Examples: source marking, obligatory subjects

10
Typological Mismatches
  • Source marking: Quechua vs. English
    • (they say) s/he was singing --> takisharansi
    • taki (sing) + sha (progressive) + ra (past) + n (3rd sg) + si (reportative)
  • Obligatory arguments: English vs. Japanese
    • Kusuri wo nonda --> (I, you, etc.) took medicine.
    • Makasemasu! --> (I'll) leave (it) to (you)
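The Quechua example can be written out as a small gloss table. This is only a minimal sketch: the morphemes and glosses come from the slide, while the names `GLOSS`, `ENGLISH_EXPRESSED`, `surface_form`, and `interlinear` are illustrative. The set of features that plain English "s/he was singing" expresses is an assumption made for the example.

```python
# Morphemes and glosses as listed on the slide for "takisharansi".
GLOSS = [
    ("taki", "sing"),
    ("sha", "progressive"),
    ("ra", "past"),
    ("n", "3rd sg"),
    ("si", "reportative"),
]

# Assumption for illustration: the features that English "s/he was singing"
# expresses without the parenthetical "(they say)".
ENGLISH_EXPRESSED = {"sing", "progressive", "past", "3rd sg"}


def surface_form(gloss):
    """Concatenate the morphemes into the surface word."""
    return "".join(morpheme for morpheme, _ in gloss)


def interlinear(gloss):
    """Return the hyphenated morpheme line and the matching gloss line."""
    morphemes = "-".join(morpheme for morpheme, _ in gloss)
    labels = "-".join(label for _, label in gloss)
    return morphemes, labels


# Features marked in Quechua that the English rendering leaves out.
unexpressed = [label for _, label in GLOSS if label not in ENGLISH_EXPRESSED]

print(surface_form(GLOSS))
print(interlinear(GLOSS))
print(unexpressed)
```

Going from Quechua to English, the reportative can simply be dropped. Going the other way, the system has to decide whether to add it, which is the harder direction.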

11
Translation Mismatch Solutions
  • More information --> Less information (easy)
  • Less information --> More information (hard)
  • Context clues
  • Language Models
  • Generalization
  • Formal representations

12
Translation Divergences
  • The same information is conveyed in source and
    target texts
  • Divergences are quite common
  • They occur in about one out of every three sentences in
    the TREC El Norte newspaper corpus
    (Spanish-English)
  • Sentences can have multiple kinds of divergences

13
Translation Divergence Types
  • Categorial Divergence
  • Conflational Divergence
  • Structural Divergence
  • Head Swapping Divergence
  • Thematic Divergence

14
Categorial Divergence
  • Translation that uses different parts of speech
  • Tener hambre (have hunger) --> be hungry
  • Noun --> adjective

15
Conflational Divergence
  • One language expresses with a single word what the
    other needs two or more words to express
  • Can also be called a lexical gap
  • X stab Z --> X dar puñaladas a Z (X give stabs to Z)
  • glastuinbouw (Dutch) --> cultivation under glass

16
Structural Divergence
  • A difference in the realization of incorporated
    arguments
  • PP to Object
  • X entrar en Y (X enter in Y) --> X enter Y
  • X ask for a referendum --> X pedir un referendum
    (ask-for a referendum)

17
Head Swapping Divergence
  • Involves the demotion of a head verb and the
    promotion of a modifier verb to head position

  • English: I ran into the room. (S --> NP VP; NP --> N; VP --> V PP)
  • Spanish: Yo entro en el cuarto corriendo. (S --> NP VP; NP --> N; VP --> V PP VP)
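As a rough illustration of head swapping, the sketch below rewrites a toy dependency tree for "I run into the room" into one shaped like "Yo entro en el cuarto corriendo". The `Node` class, the `HEAD_SWAP` and `NOUNS` tables, and the role labels are all hypothetical. They hard-code this one example and are not a general transfer rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    word: str
    role: str
    children: List["Node"] = field(default_factory=list)


# Hypothetical toy lexicon: (manner verb, path preposition) maps to
# (path verb that becomes the head, preposition, demoted manner modifier).
HEAD_SWAP = {("run", "into"): ("entrar", "en", "corriendo")}
NOUNS = {"I": "yo", "room": "cuarto"}


def head_swap(tree):
    """Promote the path meaning to the head verb and demote the manner verb."""
    subj = next(c for c in tree.children if c.role == "subj")
    pp = next(c for c in tree.children if c.role == "pp")
    verb, prep, manner = HEAD_SWAP[(tree.word, pp.word)]
    new_pp = Node(prep, "pp", [Node(NOUNS[n.word], n.role) for n in pp.children])
    return Node(verb, "head", [Node(NOUNS[subj.word], "subj"), new_pp, Node(manner, "mod")])


english = Node("run", "head", [Node("I", "subj"), Node("into", "pp", [Node("room", "obj")])])
spanish = head_swap(english)
print(spanish.word, [child.word for child in spanish.children])
```

The English head verb "run" ends up as the modifier "corriendo", while the path meaning carried by "into" surfaces in the new head verb "entrar".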
18
Thematic Divergence
  • This divergence occurs when sentence arguments
    switch argument roles from one language to
    another
  • X gustar a Y (X please to Y) --> Y like X
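The role switch in the gustar/like pair can be sketched as a mapping between two argument frames. The frame layout (a dict with "verb", "subj", "iobj", and "obj" keys) and the function name are assumptions made for this example.

```python
def gustar_to_like(frame):
    """Map a Spanish-style 'X gustar a Y' frame to an English-style 'Y like X' frame.

    `frame` is a hypothetical dict with the keys "verb", "subj", and "iobj".
    """
    if frame["verb"] != "gustar":
        return dict(frame)  # this sketch only handles the one verb on the slide
    # The Spanish subject becomes the English object, and the Spanish
    # indirect object becomes the English subject.
    return {"verb": "like", "subj": frame["iobj"], "obj": frame["subj"]}


spanish = {"verb": "gustar", "subj": "X", "iobj": "Y"}
english = gustar_to_like(spanish)
print(english)
```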

19
Divergence Solutions and Statistical/EBMT Systems
  • Not really addressed explicitly in SMT
  • Covered in EBMT only if it is covered extensively
    in the data

20
Divergence Solutions and Transfer Systems
  • Hand-written transfer rules
  • Automatic extraction of transfer rules from
    bi-texts
  • Problematic with multiple divergences

21
Divergence Solutions and Interlingua Systems
  • Mel'čuk's Deep Syntactic Structure
  • Jackendoff's Lexical Semantic Structure
  • Both require explicit symmetric knowledge from
    both source and target language
  • Expensive

22
Divergence Solutions and Interlingua Systems
  • English: John swam across a river
  • Interlingua representation: [event CAUSE JOHN [event GO JOHN [path ACROSS JOHN
    [position AT JOHN RIVER]]] [manner SWIMMINGLY]]
  • Spanish: Juan cruza el río nadando

23
Generation-Heavy MT
  • Built to address language divergences
  • Designed for source-poor/target-rich translation
  • Non-Interlingual
  • Non-Transfer
  • Uses symbolic overgeneration to account for
    different translation divergences

24
Generation-Heavy MT
  • Source language
    • Syntactic parser
    • Translation lexicon
  • Target language
    • Lexical semantics, categorial variations, and
      subcategorization frames for overgeneration
    • Statistical language model

25
GHMT System

26
Analysis Stage
  • Independent of Target Language
  • Creates a deep syntactic dependency
  • Only argument structure, top-level conceptual
    nodes, and thematic-role information
  • Should normalize over syntactic and morphological
    phenomena

27
Translation Stage
  • Converts SL lexemes to TL lexemes
  • Maintains dependency structure

28
Analysis/Translation Stage
  • GIVE (v): cause go
    • I: agent
    • STAB (n): theme
    • JOHN: goal

29
Generation Stage
  • Lexical and Structural Selection
  • Conversion to a thematic dependency
  • Uses a syntactic-thematic linking map
  • Loose linking
  • Structural expansion
  • Addresses conflation and head-swapped divergences
  • Turns the thematic dependency into a TL syntactic
    dependency
  • Addresses categorial divergence
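As a rough illustration of structural expansion, the sketch below takes the GIVE/STAB dependency shown for the analysis/translation stage and overgenerates a conflated alternative in which the noun theme becomes the head verb. The dict layout, the `LIGHT_VERBS` set, and the `CATVAR_NOUN_TO_VERB` table are hypothetical stand-ins, not the GHMT system's actual data structures or resources.

```python
# Hypothetical resources for this sketch only.
LIGHT_VERBS = {"GIVE"}
CATVAR_NOUN_TO_VERB = {"STAB": "STAB"}  # noun lemma -> verb with the same root


def expand(dep):
    """Return the original dependency plus a conflated alternative, if one applies."""
    alternatives = [dep]
    theme = dep["args"].get("theme")
    if (dep["head"] in LIGHT_VERBS and theme is not None
            and theme["cat"] == "n" and theme["lemma"] in CATVAR_NOUN_TO_VERB):
        # Drop the noun theme; it is absorbed into the new head verb.
        args = {role: arg for role, arg in dep["args"].items() if role != "theme"}
        # Assumption: the former goal is relinked as the theme of the new verb.
        if "goal" in args:
            args["theme"] = args.pop("goal")
        alternatives.append(
            {"head": CATVAR_NOUN_TO_VERB[theme["lemma"]], "cat": "v", "args": args}
        )
    return alternatives


dependency = {
    "head": "GIVE",
    "cat": "v",
    "args": {
        "agent": {"lemma": "I", "cat": None},  # category not given on the slide
        "theme": {"lemma": "STAB", "cat": "n"},
        "goal": {"lemma": "JOHN", "cat": None},
    },
}
alternatives = expand(dependency)
for alt in alternatives:
    print(alt["head"], sorted(alt["args"]))
```

Both alternatives are kept on purpose. In an overgeneration setup, choosing between "give stabs to John" and "stab John" is left to a later ranking step.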

30
Generation Stage: Structural Expansion

31
Generation Stage
  • Linearization Step
  • Creates a word lattice to encode different
    possible realizations
  • Implemented using the oxyGen engine
  • Sentences are ranked and extracted
  • Nitrogen's statistical extractor
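The idea of ranking lattice paths with a language model can be sketched as follows. This is not the oxyGen or Nitrogen API. The lattice is simplified to a sequence of slots with alternatives, and the bigram log-probabilities are invented numbers used only for illustration.

```python
import itertools

# A lattice simplified to a sequence of slots; each slot lists alternative realizations.
LATTICE = [["I"], ["stabbed", "gave stabs to", "knifed"], ["John"]]

# Invented bigram log-probabilities; any unseen bigram gets a flat penalty.
BIGRAM_LOGP = {
    ("<s>", "I"): -0.5,
    ("I", "stabbed"): -1.0,
    ("stabbed", "John"): -1.2,
    ("I", "gave"): -1.5,
    ("gave", "stabs"): -4.0,
    ("stabs", "to"): -2.0,
    ("to", "John"): -1.5,
    ("John", "</s>"): -0.7,
}
UNSEEN_LOGP = -6.0


def score(tokens):
    """Sum bigram log-probabilities over the sentence, with boundary markers."""
    padded = ["<s>"] + tokens + ["</s>"]
    return sum(BIGRAM_LOGP.get(bigram, UNSEEN_LOGP) for bigram in zip(padded, padded[1:]))


def rank(lattice):
    """Enumerate every path through the lattice and sort from best to worst score."""
    candidates = [" ".join(path).split() for path in itertools.product(*lattice)]
    return sorted(candidates, key=score, reverse=True)


ranked = rank(LATTICE)
for tokens in ranked:
    print(round(score(tokens), 1), " ".join(tokens))
```

Enumerating every path is only feasible for a toy lattice. The scores are also not length-normalized, so longer realizations are penalized simply for having more bigrams.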

32
Generation Stage

33
GHMT Results
  • 4 of 5 Spanish-English divergences can be
    generated using structural expansion and categorial
    variations
  • The remaining 1 out of 5 needed more world
    knowledge or idiom handling
  • SL syntactic parser can still be hard to come by

34
Divergences and DUSTer
  • Helps to overcome divergences for word alignment
    and improve coder agreement
  • Changes an English sentence structure to resemble
    another language
  • More accurate alignment and projection of
    dependency trees without training on dependency
    tree data

35
DUSTer
  • Motivation for the development of automatic
    correction of divergences
  • Every Language Pair has translation divergences
    that are easy to recognize
  • Knowing what they are and how to accommodate
    them provides the basis for refined word level
    alignment
  • Refined word-level alignment results in
    improved projection of structural information
    from English to another language

36
DUSTer

37
DUSTer
  • Bi-text parsed on English side only
  • Linguistically motivated common search terms
  • Conducted on Spanish and Arabic (and later Chinese
    and Hindi)
  • Uses all of the divergences mentioned before,
    plus a light verb divergence
  • try <--> put to trying <--> poner a prueba

38
DUSTer Rule Development Methods
  • Identify canonical transformations for each
    divergence type
  • Categorize English sentences into divergence type
    or none
  • Apply appropriate transformations
  • Humans align E <--> E' <--> foreign language

39
DUSTer Rules
  • "kill" => "LightVB kill(N)" (LightVB = light verb)
  • Presumably, this will work for "kill" => "give
    death to"
  • "borrow" => "take lent (thing) to"
  • "hurt" => "make harm to"
  • "fear" => "have fear of"
  • "desire" => "have interest in"
  • "rest" => "have repose on"
  • "envy" => "have envy of"
  • type1.B.X (English: 2 1 3; Spanish: 2 1 3 4 5)
  • Verb<1,i,CatVar:V_N> Noun<2,j,Subj>
    Noun<3,k,Obj> <-->
  • LightVB<1,Verb> Noun<2,j,Subj>
    Noun<3,i,Obj> Oblique<4,Pred,Prep>
    Noun<5,k,PObj>
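The light-verb rewrites listed on this slide can be sketched as a lookup that turns a simple subject-verb-object clause into the E' word order (subject, light verb, noun, preposition, object). This is not DUSTer's rule engine. The table name, the function, and the lemma-level token handling are assumptions, and the "borrow" entry is left out because its expansion contains a placeholder.

```python
# Verb -> (light verb, noun, preposition), taken from the examples on the slide.
LIGHT_VERB_REWRITES = {
    "kill": ("give", "death", "to"),
    "hurt": ("make", "harm", "to"),
    "fear": ("have", "fear", "of"),
    "desire": ("have", "interest", "in"),
    "rest": ("have", "repose", "on"),
    "envy": ("have", "envy", "of"),
}


def rewrite_clause(subj, verb, obj):
    """Return the E' token list for a subject-verb-object clause (lemmas only)."""
    if verb not in LIGHT_VERB_REWRITES:
        return [subj, verb, obj]  # no light-verb divergence known for this verb
    light_verb, noun, prep = LIGHT_VERB_REWRITES[verb]
    # Subject first, then light verb + noun, then the original object
    # demoted to the object of a preposition.
    return [subj, light_verb, noun, prep, obj]


print(rewrite_clause("John", "fear", "Mary"))
print(rewrite_clause("John", "see", "Mary"))
```

The rewritten clause has five tokens where the original had three, mirroring the longer right-hand side of the type1.B.X rule.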

40
DUSTer Results

41
Conclusion
  • Divergences are common
  • They are not handled well by most MT systems
  • GHMT can account for divergences, but still needs
    development
  • DUSTer can handle divergences through structure
    transformations, but requires a great deal of
    linguistic knowledge

42
The End
  • Questions?

43
References
  • Dorr, Bonnie J., "Machine Translation
    Divergences: A Formal Description and Proposed
    Solution," Computational Linguistics, 20(4), pp.
    597--633, 1994.
  • Dorr, Bonnie J. and Nizar Habash, "Interlingua
    Approximation: A Generation-Heavy Approach," In
    Proceedings of Workshop on Interlingua
    Reliability, Fifth Conference of the Association
    for Machine Translation in the Americas,
    AMTA-2002, Tiburon, CA, pp. 1--6, 2002.
  • Dorr, Bonnie J., Clare R. Voss, Eric Peterson,
    and Michael Kiker, "Concept Based Lexical
    Selection," Proceedings of the AAAI-94 Fall
    Symposium on Knowledge Representation for Natural
    Language Processing in Implemented Systems, New
    Orleans, LA, pp. 21--30, 1994.
  • Dorr, Bonnie J., Lisa Pearl, Rebecca Hwa, and
    Nizar Habash, "DUSTer: A Method for Unraveling
    Cross-Language Divergences for Statistical
    Word-Level Alignment," Proceedings of the Fifth
    Conference of the Association for Machine
    Translation in the Americas, AMTA-2002, Tiburon,
    CA, pp. 31--43, 2002.
  • Habash, Nizar and Bonnie J. Dorr, "Handling
    Translation Divergences: Combining Statistical
    and Symbolic Techniques in Generation-Heavy
    Machine Translation," In Proceedings of the Fifth
    Conference of the Association for Machine
    Translation in the Americas, AMTA-2002, Tiburon,
    CA, pp. 84--93, 2002.
  • Haspelmath, Martin. Understanding Morphology.
    Oxford University Press, 2002.
  • Kameyama, Megumi, Ryo Ochitani, and Stanley
    Peters, "Resolving Translation Mismatches With
    Information Flow," Annual Meeting of the
    Association for Computational Linguistics, 1991.

44
Other Divergences
  • Idioms
  • Aspectual Divergences
  • Knowledge outside of Lexical Semantics
